Story

Opening the briefing

Loading the article brief, supporting context, and related editorial blocks.

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality | AI BriefWire

Original article excerpt

Server-side extracted preview paragraphs from the original source.

A Blog post by IBM Granite on Hugging Face

In this post: Enterprise-Ready by Design · A Strong Sub-100M Multilingual Model · What Changed from R1 · Training the Full-Size 311M Model · Building the compact 97M Multilingual Model · Benchmark Results · Matryoshka Embeddings · Deployment Options · For Framework Integrators · Which Model Should You Use? · Try The Models

Multilingual embedding models face a persistent tension: broad language coverage usually comes at the cost of model size, and small models usually sacrifice languages. If you work across languages — retrieval-augmented generation over multilingual corpora, cross-lingual search, code retrieval in international teams — you've likely had to choose between a model that's fast enough and one that's good enough.

The Granite Embedding Multilingual R2 release narrows that gap considerably. We're releasing two new multilingual embedding models:

Both models support 200+ languages with enhanced retrieval quality for 52 languages and programming code, handle context lengths up to 32,768 tokens (a 64x increase over their R1 predecessors), and are released under the Apache 2.0 license. They work out of the box with sentence-transformers and transformers, require no task-specific instructions, and are compatible as drop-in replacements in LangChain, LlamaIndex, Haystack, and Milvus with a one-line model name change. For frameworks currently using an English-only default, that one line gives every user in your community support for 200+ languages — no API changes, no new dependencies, no code changes required on their end. Both models ship with ONNX and OpenVINO weights for CPU-optimized inference.

The underlying encoder was pretrained on text from 200+ languages, producing general-purpose embeddings for any of them. The following 52 languages receive explicit retrieval-pair and cross-lingual training for higher-quality retrieval:

Albanian (sq), Arabic (ar), Azerbaijani (az), Bengali (bn), Bulgarian (bg), Catalan (ca), Chinese (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), Georgian (ka), German (de), Greek (el), Hebrew (he), Hindi (hi), Hungarian (hu), Icelandic (is), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Khmer (km), Korean (ko), Latvian (lv), Lithuanian (lt), Malay (ms), Marathi (mr), Norwegian (no), Persian (fa), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Tagalog (tl), Telugu (te), Thai (th), Turkish (tr), Ukrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi).

Opening the briefing

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

Original article excerpt

Today is the last day to apply to speak at TechCrunch Disrupt 2026

This is the gaming laptop that makes desktops obsolete for me

How I prep my solar power stations for weather emergencies - before it's too late