LLMs don’t ‘remember the entire contents of each book they read’. The data are used to train the LLM’s ability to predict sequences of words (or, more accurately, tokens). In a sense, it develops a lossy model of its training data, not a literal database. LLMs also sample stochastically, which means you’ll get different results each time you ask a given question, not deterministic regurgitation of ‘read texts’. This is why it’s a transformative process, and also why LLMs can hallucinate nonsense.
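To make the ‘stochastic’ part concrete, here’s a toy sketch (the probabilities are made up for illustration, not any real model’s output) of how sampling the next token works:

```python
import random

# Pretend the model has produced this next-token distribution after the
# prompt "The cat sat on the". A real LLM computes these probabilities
# from billions of learned parameters, not from a lookup table.
next_token_probs = {
    "mat": 0.55,
    "sofa": 0.20,
    "roof": 0.15,
    "moon": 0.10,
}

def sample_next_token(probs):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Run it a few times: you usually get "mat", but not always.
for _ in range(5):
    print(sample_next_token(next_token_probs))
```

Because the output is drawn from a distribution rather than looked up, two runs with the same prompt can differ, and a plausible-but-wrong token can be picked just as easily as a correct one.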
This stuff is counter-intuitive. Below is a very good, in-depth explanation that really helped me get a sense of how these things work. Highly recommended if you can spare the 3 hours (!):
https://www.youtube.com/watch?v=7xTGNNLPyMI&list=PLMtPKpcZqZMzfmi6lOtY6dgKXrapOYLlN