LLMs don’t ‘remember the entire contents of each book they read’. The data are used to train the LLM’s ability to predict sequences of words (or, more accurately, tokens). In a sense, it develops a lossy model of its training data, not a literal database. LLMs also sample stochastically, which means you’ll get different results each time you ask a given question, not deterministic regurgitation of ‘read texts’. This is why it’s a transformative process, and also why LLMs can hallucinate nonsense.
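To make the ‘stochastic’ part concrete, here’s a toy sketch (the probabilities are made up for illustration, not any real model’s output) of how sampling the next token works:

```python
import random

# Pretend the model has produced this next-token distribution after the
# prompt "The cat sat on the". A real LLM computes these probabilities
# from billions of learned parameters, not from a lookup table.
next_token_probs = {
    "mat": 0.55,
    "sofa": 0.20,
    "roof": 0.15,
    "moon": 0.10,
}

def sample_next_token(probs):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# Run it a few times: you usually get "mat", but not always.
for _ in range(5):
    print(sample_next_token(next_token_probs))
```

Because the output is drawn from a distribution rather than looked up, two runs with the same prompt can differ, and a plausible-but-wrong token can be picked just as easily as a correct one.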
This stuff is counter-intuitive. Below is a very good, in-depth explanation that really helped me get a sense of how these things work. Highly recommended if you can spare the 3 hours (!):
https://www.youtube.com/watch?v=7xTGNNLPyMI&list=PLMtPKpcZqZMzfmi6lOtY6dgKXrapOYLlN