LLM 2.0?

Ajay Arunachalam

An introduction to LARGE CONCEPT MODELS (LCMs)

Layman’s Overview

Large Concept Models (LCMs) build upon the foundation of Large Language Models (LLMs) like GPT. While LLMs focus on understanding and generating text based on patterns in language data, LCMs aim to model and reason about higher-level concepts and their relationships.

Unlike Large Language Models (LLMs), LCMs operate on higher-level semantics, processing concepts instead of tokens. This brings them closer to human reasoning and may position them as a future rival to the current token-based LLM architecture. Note that an LCM is trained to perform autoregressive sentence prediction in an embedding space.

In this post, we will explore a recent research paper from Meta titled “Large Concept Models: Language Modeling in a Sentence Representation Space”, which introduces a new architecture called Large Concept Models (LCMs for short). Unlike traditional large language models that process tokens, large concept models work with concepts.

Processing Concepts instead of Tokens — How beneficial is it?

Let’s start by understanding what it means to process concepts instead of tokens. But first, what are concepts?

  • Concepts represent the semantics of higher-level ideas or actions. They are not specific to single words, nor even to a particular language, and can be derived from multiple modalities: for example, the concept behind a certain sentence may remain the same whether it is expressed in English or any other language, and whether it arrives as text or as speech.
  • Concepts, rather than sub-word tokens, allow better hierarchical reasoning. Let’s understand this with an example from their paper.
Visualization of reasoning in an embedding space of concepts (Copyright, LCM Team)

The above figure provides an illustration of reasoning in an embedding space of concepts for a summarization task. On the left, we have embeddings of five sentences, which represent our concepts. To create a summary for these sentences, the concepts are mapped into two other concept representations, which represent the summary.
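To make the shapes concrete, here is a toy numpy sketch of that mapping: five sentence embeddings (the source concepts) are transformed into two summary concepts. The fixed linear map is a hypothetical stand-in for the trained LCM, and the embedding dimension is arbitrarily small; nothing here is the actual SONAR or LCM implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8                              # toy embedding dimension (SONAR's is far larger)
source = rng.normal(size=(5, d))   # 5 sentence embeddings = 5 source concepts

# Hypothetical stand-in for the trained LCM: a fixed linear map that
# compresses the 5 source concepts into 2 summary concepts.
W = rng.normal(size=(2, 5))
summary = W @ source               # shape (2, d): two summary concepts

print(source.shape, summary.shape)  # (5, 8) (2, 8)
```

The point is only the change of sequence length in concept space: the summarization happens as a transformation over whole-sentence embeddings, never over individual tokens.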

  • Another benefit is better handling of long context inputs. The reason is that the concept sequence is an order of magnitude shorter than the token sequence for the same input, which significantly reduces the challenge of dealing with long sequence lengths.

Let’s understand this with an example. Imagine a presenter giving a 10-minute talk. The presenter typically wouldn’t prepare a detailed speech by writing out every single word; instead, they would outline a flow of higher-level ideas to communicate during the speech. Should the presenter give the same talk multiple times, the actual words spoken may differ from one talk to the next, and the talk could even be delivered in different languages, but the flow of higher-level abstract ideas remains the same. That flow is what working in the concept space means.
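A rough illustration of the sequence-length gap: splitting the same passage at the token level versus the sentence (concept) level. Whitespace splitting is a crude stand-in for a real sub-word tokenizer, which would produce even more tokens per sentence.

```python
# Compare token-level vs. concept-level sequence lengths for the same text.
text = (
    "Large Concept Models operate on sentences rather than tokens. "
    "Each sentence is encoded into a single concept embedding. "
    "The model therefore sees a much shorter sequence. "
    "This eases long-context processing considerably."
)

tokens = text.split()                            # token-level units (crude)
sentences = [s for s in text.split(". ") if s]   # concept-level units

print(len(tokens), len(sentences))  # 31 4
```

With a real tokenizer and SONAR-style sentence embeddings the ratio is typically around an order of magnitude, which is exactly the long-context benefit the paper describes.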

LCM by Meta — Model Architecture

Let’s review the high-level architecture of large concept models using the following figure from their paper.

High-Level Architecture of Large Concept Models (LCMs) (Copyright: LCM Team)

We can see that an input sequence of words, divided into sentences, is assumed to form the basic building blocks that represent concepts. These sentences are first passed through a concept encoder, which encodes them into concept embeddings. The encoder and decoder used in their paper are an existing component called “SONAR”, which remains fixed during large concept model training. SONAR supports 200 languages as text input and output, more than double the number of languages supported by most large language models today, and it also accepts 76 languages as speech input.

The sequence of concepts is then processed by the large concept model to generate a new sequence of concepts at the output. The main component, the large concept model itself, operates solely in the embedding space and is therefore not dependent on a specific language or modality; this approach can be extended beyond text and speech, as explored in their paper. Finally, the generated concepts are decoded back into language, again using SONAR. The decoder can convert the output of the large concept model into more than one language, or even more than one modality.

Notice that the hierarchical structure is explicit in the architecture: first extract concepts, then reason over these concepts, and finally generate output, possibly multiple times, without needing to run the large concept model again. Let’s now dive deeper into the inner architecture of the large concept model as seen below.
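The encode → reason → decode pipeline can be sketched as below. Every function here (`sonar_encode`, `lcm`, `sonar_decode`) is a hypothetical stub illustrating the data flow, not Meta's actual API; the key property shown is that the reasoning step touches only embeddings, so the same output concepts can be decoded into different languages without re-running the model.

```python
# Hypothetical sketch of the LCM pipeline; all function names are
# illustrative stand-ins, not the real SONAR/LCM interfaces.

def sonar_encode(sentences):
    """Map each sentence to a fixed-size concept embedding (stub)."""
    return [[float(len(s))] * 4 for s in sentences]  # toy 4-dim embeddings

def lcm(concepts):
    """Reason in concept space: here, a stub that keeps every other concept."""
    return concepts[::2]

def sonar_decode(concepts, language="en"):
    """Decode concept embeddings back into sentences (stub)."""
    return [f"<{language} sentence from concept {i}>" for i, _ in enumerate(concepts)]

sentences = ["First idea.", "A detail.", "Second idea.", "Another detail."]
concepts = sonar_encode(sentences)        # text -> concept space
new_concepts = lcm(concepts)              # language-agnostic reasoning
summary_en = sonar_decode(new_concepts)                  # decode to English
summary_fr = sonar_decode(new_concepts, language="fr")   # same concepts, French

print(summary_en)
print(summary_fr)
```

Note how the decoding step runs twice on the same `new_concepts` without invoking `lcm` again, mirroring the "generate output possibly multiple times" property of the architecture.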

Base-LCM Architecture (Copyright: Team LCM, Source)

The Base-LCM, shown in the above figure from their paper, takes a sequence of concepts; this sequence, excluding the last concept, is fed into the model to predict the next concept. The output is compared to the actual next concept, which was not included in the model input, and a mean squared error loss is used to train the model. The model has a main Transformer decoder component along with smaller components before and after the Transformer, referred to as PreNet and PostNet. The former normalizes the concept embeddings received from SONAR and maps them into the Transformer’s dimension, while the latter does the opposite, projecting the model output back to SONAR’s dimension.

A potential shortcoming of this approach is that, unlike large language models that learn a distribution for next-token prediction, here we train the model to output one very specific concept, even though many other concepts could plausibly make sense. This challenge of having many plausible outputs for a given input leads us to the next version of the large concept model architecture.

Such a shortcoming has already been tackled in the image generation domain by diffusion models, and inspired by this, diffusion-based LCM architectures are also explored. In a nutshell: “Diffusion models are advanced machine learning algorithms that uniquely generate high-quality data by progressively adding noise to a dataset and then learning to reverse this process.”
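The forward (noising) half of that idea can be sketched in a few lines; this is a generic DDPM-style corruption of a concept embedding, not the paper's specific noise schedule, and `alpha_bar` (the cumulative signal-retention coefficient) is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

x0 = rng.normal(size=4)   # a clean concept embedding (toy 4-dim)

def noisy_concept(x0, alpha_bar, rng):
    """DDPM-style forward step: blend the clean embedding with Gaussian noise.

    alpha_bar near 1.0 keeps mostly signal; near 0.0, mostly noise.
    """
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

x_early = noisy_concept(x0, alpha_bar=0.99, rng=rng)  # lightly corrupted
x_late = noisy_concept(x0, alpha_bar=0.01, rng=rng)   # almost pure noise

# A denoising network would be trained to recover x0 (or the noise) from
# corrupted samples like x_late: that is the "learning to reverse" part.
print(x_early.shape, x_late.shape)
```

The diffusion-based LCMs train exactly such a denoiser over concept embeddings, which is what lets them represent many plausible next concepts instead of a single deterministic one.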

The two types of diffusion-based large concept models proposed in their paper are: 1) One-Tower LCM and 2) Two-Tower LCM, as shown below.

Illustration of the two types of diffusion-based LCMs (Copyright: Team LCM, Source)

For more technical details, refer to their paper.

Key Characteristics of Large Concept Models:

1. Concept-Centric Reasoning:

  • LCMs focus on understanding and manipulating abstract concepts (e.g., “justice,” “energy,” or “evolution”) rather than just processing text.
  • They aim to capture the relationships between concepts, such as hierarchies, dependencies, and analogies.

2. Cross-Domain Knowledge:

  • LCMs are designed to integrate knowledge from diverse fields (e.g., science, art, philosophy) to enable more generalized reasoning.
  • This contrasts with LLMs, which often rely on patterns in text data without deeply understanding the underlying concepts.

3. Structured Knowledge Representation:

  • LCMs may use structured representations of knowledge, such as ontologies, knowledge graphs, or symbolic reasoning, to model concepts and their relationships.
  • This allows for more interpretable and explainable reasoning compared to purely neural approaches.

4. Human-Like Abstraction:

  • LCMs aim to mimic human-like abstraction and generalization, enabling them to apply learned concepts to new, unseen scenarios.

5. Integration with LLMs:

  • LCMs can be seen as complementary to LLMs. While LLMs excel at processing and generating text, LCMs provide a deeper understanding of the concepts behind the text.
  • For example, an LCM might help an LLM reason about the ethical implications of a decision by understanding the abstract concept of “ethics.”

Potential Applications of LCMs:

  • Scientific Discovery: Identifying connections between concepts in different fields to generate new hypotheses.
  • Education: Teaching and explaining complex concepts by breaking them down into simpler, related ideas.
  • Decision-Making: Assisting in high-level reasoning tasks, such as policy analysis or strategic planning.
  • AI Explainability: Providing clearer explanations of AI decisions by referencing underlying concepts.

Challenges:

  • Defining Concepts: Concepts are often abstract and context-dependent, making them difficult to model formally.
  • Scalability: Combining structured knowledge with neural approaches can be computationally expensive.
  • Evaluation: Measuring the performance of LCMs is challenging, as concept understanding is harder to quantify than text generation.

Relation to Other AI Models:

  • Large Language Models (LLMs): LCMs build on LLMs but focus on conceptual understanding rather than text patterns.
  • Knowledge Graphs: LCMs may use knowledge graphs to represent relationships between concepts.
  • Neuro-Symbolic AI: LCMs align with neuro-symbolic approaches, combining neural networks with symbolic reasoning.

Key findings for a summarization task:

  • Abstractive Summaries: LCMs tend to generate more abstractive summaries rather than extractive ones.
  • Repetition Rate: LCMs produce fewer repetitions compared to LLMs, with repetition rates closer to the ground truth.
  • Fluency: LCMs generate less fluent summaries than LLMs, though even human-generated summaries scored lower than LLM outputs.

In summary, Large Concept Models (LCMs) represent a shift toward AI systems that can reason about abstract ideas and their inter-connections, potentially enabling more human-like understanding and problem-solving. However, this field is still in its early stages, and significant research is needed to realize its full potential.

Reference

[2412.08821] Large Concept Models: Language Modeling in a Sentence Representation Space

GitHub — facebookresearch/large_concept_model: Large Concept Models: Language modeling in a sentence representation space

Large Concept Models (LCMs) by Meta: The Era of AI After LLMs? | by AI Papers Academy | Jan, 2025 | Medium
