Compression Theory

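Compression Theory for Large Language Models

Compression theory is perhaps the most fundamental theory of LLMs to date. Its core claim is that compressing the next token is enough to lead to AGI. Researchers at leading institutions such as OpenAI, DeepMind, and Moonshot have discussed the theory in depth. Relevant material includes, but is not limited to:

- DeepMind, Jack Rae, Compression for AGI
- OpenAI, Ilya Sutskever, An observation on generalization
- DeepMind, Delétang et al., Language Modeling Is Compression
- Moonshot, 周昕宇 (Zhou Xinyu), 压缩下一个 token 通向超过人类的智能 (Compressing the next token leads to intelligence beyond the human level)

Preliminary: Arithmetic coding

- Arithmetic coding is an algorithm for losslessly encoding data.
- It relies on a probabilistic model.
- The higher the likelihood the model assigns to the data, the better arithmetic coding compresses it.

Language modeling as lossless compression

- A language model compresses data losslessly, not lossily.
- It does so by performing arithmetic coding on the data.
- Measured as compressed size relative to the original (lower is better), gzip achieves a compression rate of 32%, a Transformer of size 200K achieves 30%, and Chinchilla 7B achieves 10....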
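The link between likelihood and code length stated in the arithmetic coding preliminary can be made concrete with a toy coder. The sketch below is illustrative only, not the implementation used in any of the works cited above. It assumes a fixed per-symbol probability table instead of a language model's context-dependent next-token distribution, it assumes the decoder is told the message length, and it uses exact `Fraction` arithmetic rather than the finite-precision tricks of practical coders. The names `intervals`, `encode`, and `decode` are hypothetical.

```python
from fractions import Fraction
from math import log2


def intervals(probs):
    """Assign each symbol a [low, high) slice of [0, 1) sized by its probability."""
    table, low = {}, Fraction(0)
    for sym, p in probs.items():
        table[sym] = (low, low + p)
        low += p
    return table


def encode(message, probs):
    """Narrow [0, 1) once per symbol, then name a point inside the final interval."""
    table = intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = table[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    # width equals the model's likelihood of the whole message.
    # Find the smallest bit count whose grid spacing is at most width / 2,
    # so that some grid point is guaranteed to fall inside the interval.
    nbits = 1
    while Fraction(1, 2**nbits) > width / 2:
        nbits += 1
    code = -((-low * 2**nbits) // 1)  # ceil(low * 2**nbits), computed exactly
    return format(int(code), f"0{nbits}b")


def decode(bits, probs, length):
    """Recover `length` symbols by repeatedly locating the point and rescaling."""
    table = intervals(probs)
    x = Fraction(int(bits, 2), 2 ** len(bits))
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in table.items():
            if s_low <= x < s_high:
                out.append(sym)
                x = (x - s_low) / (s_high - s_low)
                break
    return "".join(out)


message = "aaaaaaab"
# Two hypothetical models: one that fits the data well, one that is uniform.
good = {"a": Fraction(7, 8), "b": Fraction(1, 8)}
uniform = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

for name, model in [("good", good), ("uniform", uniform)]:
    bits = encode(message, model)
    assert decode(bits, model, len(message)) == message  # lossless round trip
    print(name, len(bits), "bits")
```

In this construction the output length is always below -log2 P(message) + 2 bits, where P is the likelihood the model assigns to the message, so the model that fits the data better yields the shorter code. With a language model, the fixed table would be replaced at each step by the model's predicted distribution over the next token given the preceding ones.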