LLMs work very well on many tasks. It is striking how much plain next-token prediction has achieved, and we should not take this for granted. Here we collect the popular hypotheses for why LLMs work.
Next-token prediction as a massive multi-task learning problem (https://x.com/_jasonwei/status/1729585618311950445, https://www.youtube.com/watch?v=kYWUEV_e2ss)
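A toy sketch (not from the linked sources) of the idea: each training string implicitly encodes a different task, yet all of them are trained with one and the same next-token objective. The example strings and the lookup "model" below are hypothetical, chosen only to make the point concrete.

```python
import math

# Each (prompt, next token) pair implicitly defines a different task.
examples = [
    ("The capital of France is", " Paris"),   # factual recall
    ("2 + 2 =", " 4"),                        # arithmetic
    ("'chat' in English is", " cat"),         # translation
    ("She go to school ->", " goes"),         # grammar correction
]

def next_token_loss(model, prompt, target):
    """One shared objective: cross-entropy on the next token.
    `model(prompt)` is assumed to return a {token: probability} dict."""
    return -math.log(model(prompt).get(target, 1e-9))

# A hypothetical "model" that has memorized the data perfectly.
memorized = {p: {t: 1.0} for p, t in examples}
model = lambda prompt: memorized[prompt]

losses = [next_token_loss(model, p, t) for p, t in examples]
# Every task, from arithmetic to translation, reduces to the same loss.
print(losses)  # all 0.0 for the memorizing model
```

Under this view, minimizing one scalar loss over diverse text is implicitly fitting many tasks at once.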
The Linear Representation Hypothesis and the Geometry of Large Language Models (https://arxiv.org/pdf/2311.03658)
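A minimal sketch of what "linear representation" means, using made-up 2-D embeddings (the numbers are hypothetical, for illustration only): if a concept such as gender corresponds to a single linear direction, then adding that direction to one word's embedding should land near its counterpart.

```python
import numpy as np

# Toy 2-D "embeddings": one axis loosely encodes royalty, the other gender.
emb = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, -0.8]),
}

gender_direction = emb["woman"] - emb["man"]  # the linear "concept" direction
candidate = emb["king"] + gender_direction    # move king along that direction

# Under the hypothesis, this lands near "queen".
closest = min(emb, key=lambda w: np.linalg.norm(emb[w] - candidate))
print(closest)  # -> queen
```

The paper studies when and in what geometry such directions exist in real LLM representation spaces; this sketch only illustrates the basic intuition.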
The Platonic Representation Hypothesis (https://arxiv.org/abs/2405.07987)
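The claim here is that representations of different models converge toward a shared geometry. One way to quantify this, sketched below with synthetic data, is a mutual nearest-neighbor alignment score, similar in spirit to the metric used in the paper (the function name and data are assumptions of this sketch): two spaces are aligned when the same inputs have the same neighbors in both.

```python
import numpy as np

def mutual_knn_alignment(A, B, k=2):
    """Average fraction of shared k-nearest neighbors between two
    representation spaces A and B (rows = same inputs)."""
    def knn(X):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)  # exclude each point itself
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn(A), knn(B)
    return float(np.mean([len(set(na[i]) & set(nb[i])) / k
                          for i in range(len(A))]))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                  # "inputs" in one model's space
R = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # a random rotation
Y = X @ R                                     # rotated copy: same geometry

print(mutual_knn_alignment(X, Y))                         # 1.0: identical neighborhoods
print(mutual_knn_alignment(X, rng.normal(size=(50, 8))))  # near chance level
```

A rotation preserves all pairwise distances, so the two spaces look different coordinate-wise yet score as perfectly aligned; unrelated random features score near chance.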