How Microsoft Trained a 270M-Pair AI to Power Smarter Search

Hacker Noon
February 28, 2026 at 02:41 PM
Researchers at Microsoft introduce E5, a text embedding model trained on CCPairs, a curated corpus of 270M web text pairs. Trained with contrastive learning using in-batch negatives, E5 is the first unsupervised embedding model to outperform BM25 on the BEIR benchmark. After fine-tuning, it tops the MTEB leaderboard, beating models 40× larger on retrieval, clustering, classification, and semantic similarity tasks.
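The "contrastive learning with in-batch negatives" objective mentioned above can be sketched as follows: each query in a batch is paired with its own passage as the positive, and every other passage in the same batch serves as a negative. This is a minimal numpy illustration of that style of loss (an InfoNCE-like softmax cross-entropy); the function name and the temperature value are illustrative assumptions, not details from the E5 paper.

```python
import numpy as np

def in_batch_negatives_loss(q, p, tau=0.05):
    """Contrastive loss with in-batch negatives.

    q, p: (B, d) arrays of query and passage embeddings, where the
    i-th passage is the positive for the i-th query and the other
    B-1 passages act as negatives. tau is an assumed temperature.
    """
    # Cosine similarity via L2 normalization of both sides.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    logits = q @ p.T / tau  # (B, B) similarity matrix

    # Softmax cross-entropy where the diagonal is the target class.
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
# Aligned pairs (each query matches its own passage) give a low loss;
# misaligned pairs (positives shuffled) give a higher one.
low = in_batch_negatives_loss(queries, queries)
high = in_batch_negatives_loss(queries, queries[::-1])
```

The batch size matters here: a larger batch supplies more negatives per query for free, which is one reason this objective scales well to web-scale pair corpora like CCPairs.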
