Notes on LLM architectures

Transformer architecture (a.k.a. transformer models)

from Transformers explained:

MosaicML MPT-30B:

https://medium.com/@mysocial81/mpt-30b-the-open-source-llm-that-outperforms-gpt-3-and-other-llms-in-text-and-code-generation-90fd1dbd4a7

In June 2023, MosaicML announced the release of a new and more powerful member of the Foundation Series that has 30 billion parameters and an 8k token context window. This new model is known as ‘MPT-30B’. ...
MPT-30B is a large language model (LLM) that can generate natural language text on various topics and domains. It is based on the transformer architecture, a neural network that uses attention mechanisms to learn the relationships between words and sentences. ...
MPT-30B is a decoder-style model based on a modified transformer architecture: it uses only the decoder part of the transformer to predict the next word given the previous words. This makes it well suited to text generation and completion, but not to text understanding and transformation.
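
As a rough illustration of that decoder-only, next-word-prediction loop, here is a minimal sketch using the HuggingFace transformers library. It loads "gpt2" as a small stand-in checkpoint (my assumption for illustration; loading "mosaicml/mpt-30b" follows the same pattern but needs far more memory and, as I understand it, trust_remote_code=True). Assumes transformers and PyTorch are installed.

    # Minimal sketch of decoder-style next-token prediction.
    # "gpt2" is a small stand-in checkpoint chosen for illustration.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The transformer architecture uses attention to"
    inputs = tokenizer(prompt, return_tensors="pt")

    # generate() repeatedly predicts the next token given the previous
    # ones -- the decoder-only loop described in the quote above.
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))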

HuggingFace Transformers:

https://huggingface.co/docs/transformers/task_summary#what-transformers-can-do
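
The task summary linked above is organized around the library's pipeline() helper, which wraps model loading, tokenization, and inference for a given task. A small sketch, assuming transformers and PyTorch are installed (the prompts and the example output comment are mine, for illustration):

    # Sketch of the high-level pipeline() API from transformers.
    from transformers import pipeline

    # Text classification: downloads a default sentiment model on first use.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers make attention-based models easy to use."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

    # Text generation with a decoder-style model, as discussed above.
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Attention mechanisms let a model", max_new_tokens=15))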

