AI use case

LTM-2-mini 100M Token Context Code Generation Model

Magic.dev's LTM-2-mini achieves a 100M token context window for code comprehension using a sequence-dimension algorithm that reduces per-token inference cost by 1000x compared t…

At a glance

Company/Organization: Magic AI
Industry: Internet Software & Services
Location: San Francisco

Record fields

Content

Magic AI has developed LTM-2-mini, a code generation model trained to reason over up to 100 million tokens of context supplied at inference time. One hundred million tokens is roughly 10 million lines of code, or about 750 novels. The model represents a fundamentally different approach to AI capabilities: rather than relying on fuzzy memorisation during training, LTM (Long-Term Memory) models are designed to process and reason over vast amounts of information provided at inference time.

The architecture behind LTM-2-mini uses a sequence-dimension algorithm that is roughly 1,000 times cheaper than the attention mechanism in Llama 3.1 405B for a 100 million token context window. The gap in memory requirements is even larger. Running Llama 3.1 405B with a 100 million token context requires 638 H100 GPUs just to store the KV cache. In contrast, LTM requires a small fraction of a single H100's HBM per user for the same context.

To evaluate ultra-long context models, Magic developed HashHop, a benchmark built from random, incompressible hashes, which removes the semantic shortcuts that plague traditional evaluations such as Needle In A Haystack. The model is prompted with hash pairs and asked to complete the value for a randomly selected hash key. For more complex multi-step reasoning, chains of hashes are used, requiring the model to build inference circuits across the full context. Writing out all intermediate hashes is similar to how chain of thought lets a model spread its reasoning over time. Magic also proposes a more challenging variant in which the model skips the intermediate steps, so the architecture must attend to and jump across multiple points of the entire context in a single operation.

LTM-2-mini was also trained on text-to-diff data with the ultra-long context mechanism.
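
The HashHop setup described above can be illustrated with a small sketch. This is not Magic's benchmark code: the function names (build_hashhop_task, resolve), the "key = value" prompt format, and the 16-character hex hashes are assumptions chosen for illustration only. It builds shuffled hash pairs that form chains, then shows the two answer styles: writing out every intermediate hash, and skipping straight to the final hash.

```python
import random

HEX_DIGITS = "0123456789abcdef"


def random_hash(rng, length=16):
    # Random hex strings carry no semantic hints, so they cannot be guessed.
    return "".join(rng.choice(HEX_DIGITS) for _ in range(length))


def build_hashhop_task(num_chains=4, hops=3, seed=0):
    """Build a toy HashHop-style task (hypothetical format, for illustration)."""
    rng = random.Random(seed)
    seen = set()

    def fresh_hash():
        # Guard against the (unlikely) case of a repeated hash.
        while True:
            h = random_hash(rng)
            if h not in seen:
                seen.add(h)
                return h

    # Each chain is h0 -> h1 -> ... -> h_hops.
    chains = [[fresh_hash() for _ in range(hops + 1)] for _ in range(num_chains)]
    # Flatten the chains into (key, value) pairs and shuffle them, so that
    # consecutive hops are scattered across the context.
    pairs = [(c[i], c[i + 1]) for c in chains for i in range(hops)]
    rng.shuffle(pairs)
    target = rng.choice(chains)
    return {
        "context": "\n".join(f"{k} = {v}" for k, v in pairs),
        "query": target[0],
        "step_by_step_answer": target[1:],  # every intermediate hash written out
        "skip_answer": target[-1],          # only the final hash
    }


def resolve(context, key, hops):
    # Reference solver: follow the chain one lookup at a time.
    table = dict(line.split(" = ") for line in context.splitlines())
    for _ in range(hops):
        key = table[key]
    return key


if __name__ == "__main__":
    task = build_hashhop_task(num_chains=5, hops=3, seed=1)
    print(task["context"])
    print("query:", task["query"])
    print("expected final hash:", task["skip_answer"])
```

In this toy version the reference solver performs one lookup per hop, which corresponds to the step-by-step variant. The skip variant would ask a model to produce skip_answer directly from the query, without emitting the intermediate hashes.
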
While the prototype model was several orders of magnitude smaller than frontier models and its code synthesis abilities were not yet production-ready, it demonstrated real capabilities. It created a calculator using a custom in-context GUI framework, showing that it can learn a new framework from context in real time. It also implemented a password strength meter for the open source repository Documenso without human intervention, editing a complex codebase unassisted.

Magic has raised a total of 515 million dollars, including a recent 320 million dollar investment from Eric Schmidt, Jane Street, Sequoia, Atlassian, and other investors, alongside existing investors Nat Friedman, Daniel Gross, Elad Gil, and CapitalG. The company is building two new supercomputers on Google Cloud: Magic-G4, powered by NVIDIA H100 Tensor Core GPUs, and Magic-G5, powered by NVIDIA GB200 NVL72, with the ability to scale to tens of thousands of Blackwell GPUs over time.

Magic's vision centres on inference-time compute as the next frontier in AI: the idea that a sufficiently advanced AI should reliably produce high-quality outputs when given enough time and compute at inference, rather than relying solely on what was memorised during pre-training. The company has 23 people and is hiring engineers and researchers to accelerate training of the larger LTM-2 model on its new supercomputer. The long-term goal is to enable AI that can spend 100 dollars and 10 minutes on a coding issue and reliably produce a great pull request for an entire feature.
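
As a rough check on the KV cache comparison with Llama 3.1 405B given earlier in this record, the arithmetic can be sketched as follows. The layer count, KV head count, and head dimension are the publicly documented Llama 3.1 405B configuration. The 16-bit cache precision and the 80 GB of HBM per H100 are assumptions, since the record does not state how the 638 figure was derived. Under these assumptions the estimate comes out at 646 GPUs, close to but not identical to the quoted number.

```python
import math

# Llama 3.1 405B configuration (publicly documented values).
NUM_LAYERS = 126
NUM_KV_HEADS = 8      # grouped-query attention: 8 key/value heads
HEAD_DIM = 128        # 16384 model dimension / 128 attention heads

# Assumptions not stated in the record.
BYTES_PER_VALUE = 2               # 16-bit cache entries
H100_HBM_BYTES = 80_000_000_000   # 80 GB per GPU, all of it used for cache

CONTEXT_TOKENS = 100_000_000


def kv_bytes_per_token() -> int:
    # One key vector and one value vector per KV head, per layer.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def gpus_for_kv_cache(tokens: int = CONTEXT_TOKENS,
                      hbm_bytes: int = H100_HBM_BYTES) -> int:
    # Number of GPUs needed just to hold the cache, rounded up.
    return math.ceil(kv_bytes_per_token() * tokens / hbm_bytes)


if __name__ == "__main__":
    total_tb = kv_bytes_per_token() * CONTEXT_TOKENS / 1e12
    print(f"KV cache per token: {kv_bytes_per_token()} bytes")
    print(f"KV cache at 100M tokens: {total_tb:.1f} TB")
    print(f"H100s needed for the cache alone: {gpus_for_kv_cache()}")
```

The point of the estimate is the order of magnitude: the cache alone runs to tens of terabytes for a single user, which is why the record contrasts it with a fraction of one GPU's HBM for LTM.
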