Version: Latest

What is the MoM Model Family?

The MoM (Mixture of Models) Model Family is a curated collection of specialized, lightweight models designed for intelligent routing, content safety, and semantic understanding. These models power the core capabilities of Semantic Router, enabling fast, accurate, and privacy-preserving AI operations.

Overview

The MoM family consists of purpose-built models that handle specific tasks in the routing pipeline:

Classification Models: Domain detection, PII identification, jailbreak detection
Embedding Models: Semantic similarity, caching, retrieval
Safety Models: Hallucination detection, content moderation
Feedback Models: User intent understanding, conversation analysis

All MoM models are:

Lightweight: 33M-600M parameters for fast inference
Specialized: Fine-tuned for specific routing tasks
Efficient: Many use LoRA adapters for minimal memory footprint
Open Source: Available on HuggingFace for transparency and customization

Model Categories

1. Classification Models

Domain/Intent Classifier

Model ID: models/mmbert32k-intent-classifier-merged
HuggingFace: llm-semantic-router/mmbert32k-intent-classifier-merged
Purpose: Classify user queries into 14 MMLU categories (math, science, history, etc.)
Architecture: mmBERT-32K merged classifier (307M)
Use Case: Route queries to domain-specific models or experts

PII Detector

Model ID: models/mmbert32k-pii-detector-merged
HuggingFace: llm-semantic-router/mmbert32k-pii-detector-merged
Purpose: Detect 17 PII entity types across 35 BIO labels
Architecture: mmBERT-32K merged token classifier (307M)
Use Case: Privacy protection, compliance, data masking
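
The label count above is consistent with the standard BIO tagging scheme: each of the 17 entity types gets a B- (begin) and an I- (inside) label, plus a single O (outside) label, giving 17 × 2 + 1 = 35. The sketch below illustrates that arithmetic and how BIO tags are usually grouped into spans. The entity type names are placeholders, since this page does not list the actual types, and the decoding logic is a generic illustration rather than the router's own post-processing.

```python
# Placeholder entity type names: the page states only that there are 17 types.
ENTITY_TYPES = [f"TYPE_{i}" for i in range(1, 18)]


def build_bio_labels(entity_types):
    """Return one 'O' label plus a B-/I- pair for every entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


def decode_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any span that is still open.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # 'O' or an inconsistent I- tag closes the open span.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


labels = build_bio_labels(ENTITY_TYPES)
print(len(labels))  # 35
```

For example, `decode_spans(["Call", "Jane", "Doe", "now"], ["O", "B-PERSON", "I-PERSON", "O"])` yields a single `("PERSON", "Jane Doe")` span, which a masking step could then redact.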

Jailbreak Detector

Model ID: models/mmbert32k-jailbreak-detector-merged
HuggingFace: llm-semantic-router/mmbert32k-jailbreak-detector-merged
Purpose: Detect prompt injection and jailbreak attempts
Architecture: mmBERT-32K merged classifier (307M)
Use Case: Content safety, prompt security

Feedback Detector

Model ID: models/mmbert32k-feedback-detector-merged
HuggingFace: llm-semantic-router/mmbert32k-feedback-detector-merged
Purpose: Classify user feedback into 4 types (satisfied, need clarification, wrong answer, want different)
Architecture: mmBERT-32K merged classifier (307M)
Use Case: Adaptive routing, conversation improvement

2. Embedding Models

Embedding Pro (High Quality)

Model ID: models/mom-embedding-pro
HuggingFace: Qwen/Qwen3-Embedding-0.6B
Purpose: High-quality embeddings with 32K context support
Architecture: Qwen3 (600M parameters)
Embedding Dimension: 1024
Use Case: Long-context semantic search, high-accuracy caching

Embedding Flash (Balanced)

Model ID: models/mom-embedding-flash
HuggingFace: google/embeddinggemma-300m
Purpose: Fast embeddings with Matryoshka support
Architecture: Gemma (300M parameters)
Embedding Dimension: 768 (supports 512/256/128 via Matryoshka)
Use Case: Balanced speed/quality, multilingual support
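
Matryoshka-trained embeddings are designed so that a prefix of the full vector is itself a usable embedding. A minimal sketch of how a 768-dimensional vector could be reduced to one of the smaller sizes follows. It assumes the common convention of keeping the leading components and re-normalizing to unit length; the function names are illustrative and the input vector is a dummy value, not real model output.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and L2-normalize the result."""
    if dim <= 0 or dim > len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Dummy 768-dimensional vector standing in for a real embedding.
full = [math.sin(i) for i in range(1, 769)]
small = truncate_embedding(full, 256)
print(len(small))  # 256
```

Smaller dimensions reduce storage and similarity-search cost, usually at some loss of retrieval quality, so the right size depends on the cache or index being served.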

Embedding Ultra (Default)

Model ID: models/mom-embedding-ultra
HuggingFace: llm-semantic-router/mmbert-embed-32k-2d-matryoshka
Purpose: Long-context multilingual semantic similarity with 2D Matryoshka support
Architecture: mmBERT 2D Matryoshka (307M parameters)
Embedding Dimension: 768 (supports lower dimensions via Matryoshka)
Use Case: Default semantic caching, retrieval, and tools similarity

3. Hallucination Detection Models

Halugate Sentinel

Model ID: models/mom-halugate-sentinel
HuggingFace: LLM-Semantic-Router/halugate-sentinel
Purpose: First-stage hallucination screening
Architecture: BERT-base (110M)
Use Case: Fast hallucination detection, pre-filtering

Halugate Detector

Model ID: models/mom-halugate-detector
HuggingFace: KRLabsOrg/lettucedect-base-modernbert-en-v1
Purpose: Accurate hallucination verification
Architecture: ModernBERT-base (149M)
Context Length: 8192 tokens
Use Case: Factual accuracy verification, grounding check

Halugate Explainer

Model ID: models/mom-halugate-explainer
HuggingFace: tasksource/ModernBERT-base-nli
Purpose: Explain hallucination reasoning via NLI
Architecture: ModernBERT-base (149M)
Classes: 3 (entailment/neutral/contradiction)
Use Case: Explainable AI, hallucination analysis
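
The three Halugate models are described as a screening stage, a verification stage, and an explanation stage, which suggests a cascade in which the cheaper model runs first. The sketch below shows one way such a cascade could be wired. Everything in it is an assumption made for illustration: the callable signatures, the score semantics (higher means more likely hallucinated), and both thresholds are hypothetical and are not taken from the router's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class HallucinationReport:
    flagged: bool                      # True if the answer is treated as hallucinated
    stage: str                         # Last stage that ran
    explanation: Optional[str] = None  # NLI label from the explainer, if it ran


def check_answer(
    context: str,
    answer: str,
    sentinel: Callable[[str, str], float],   # hypothetical: fast screening score
    detector: Callable[[str, str], float],   # hypothetical: accurate verification score
    explainer: Callable[[str, str], str],    # hypothetical: returns an NLI label
    sentinel_threshold: float = 0.5,         # hypothetical default
    detector_threshold: float = 0.5,         # hypothetical default
) -> HallucinationReport:
    """Run the stages in order and stop as soon as a stage clears the answer."""
    if sentinel(context, answer) < sentinel_threshold:
        return HallucinationReport(flagged=False, stage="sentinel")
    if detector(context, answer) < detector_threshold:
        return HallucinationReport(flagged=False, stage="detector")
    label = explainer(context, answer)
    return HallucinationReport(flagged=True, stage="explainer", explanation=label)


# Stub callables standing in for the real models.
report = check_answer(
    "Paris is the capital of France.",
    "The capital of France is Lyon.",
    sentinel=lambda c, a: 0.9,
    detector=lambda c, a: 0.8,
    explainer=lambda c, a: "contradiction",
)
print(report.flagged, report.stage, report.explanation)
```

The design choice illustrated here is early exit: most answers are cleared by the first stage, so the larger models only run on the cases that need them.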

Model Selection Guide

By Use Case

Use Case	Recommended Model	Why
Domain routing	mmbert32k-intent-classifier-merged	14 MMLU categories, 32K context
Privacy protection	mmbert32k-pii-detector-merged	17 entity types, 35 BIO labels, 32K context
Content safety	mmbert32k-jailbreak-detector-merged	Prompt injection detection with merged mmBERT
Semantic caching	mom-embedding-ultra	Default 32K multilingual embeddings
Long-context search	mom-embedding-pro	32K context, 1024-dim
Hallucination check	mom-halugate-detector	ModernBERT, 8K context
User feedback	mmbert32k-feedback-detector-merged	4 feedback types, merged mmBERT

By Performance Requirements

Requirement	Model Tier	Examples
Ultra-fast (<10ms)	Light	mom-embedding-flash, mmbert32k-jailbreak-detector-merged
Balanced (10-50ms)	Default	mom-embedding-ultra, mmbert32k-intent-classifier-merged
High-quality (50-200ms)	Pro	mom-embedding-pro, mom-halugate-detector
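
As a rough illustration of how the table above could be applied, the following sketch picks the highest-quality tier whose upper latency bound still fits a given budget. The tier names, bounds, and example models are copied from the table; the selection function itself is a hypothetical helper and not part of the router.

```python
# (tier name, upper latency bound in ms, example models), taken from the table.
TIERS = [
    ("Light", 10, ["mom-embedding-flash", "mmbert32k-jailbreak-detector-merged"]),
    ("Default", 50, ["mom-embedding-ultra", "mmbert32k-intent-classifier-merged"]),
    ("Pro", 200, ["mom-embedding-pro", "mom-halugate-detector"]),
]


def pick_tier(latency_budget_ms):
    """Return (name, models) for the best tier that fits the budget, or None."""
    chosen = None
    for name, upper_bound_ms, models in TIERS:
        # Tiers are ordered from fastest to slowest, so the last fit wins.
        if upper_bound_ms <= latency_budget_ms:
            chosen = (name, models)
    return chosen


print(pick_tier(60)[0])  # Default
```

A budget below 10 ms returns `None` in this sketch, signalling that no listed tier is guaranteed to fit.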

Configuration

Using MoM Models in Router

MoM models are configured through the canonical global.model_catalog block, with module-level settings living under global.model_catalog.modules:

global:
  model_catalog:
    system:
      domain_classifier: "models/mmbert32k-intent-classifier-merged"
      pii_classifier: "models/mmbert32k-pii-detector-merged"
      prompt_guard: "models/mmbert32k-jailbreak-detector-merged"
    modules:
      classifier:
        domain:
          model_ref: "domain_classifier"
          threshold: 0.6
          use_cpu: true
        pii:
          model_ref: "pii_classifier"
          threshold: 0.9
          use_cpu: true
      prompt_guard:
        model_ref: "prompt_guard"
        threshold: 0.7
        use_cpu: true
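
To make the indirection in this configuration concrete, the sketch below mirrors the YAML above as a Python dictionary and resolves a module's model_ref to its model path. It assumes that model_ref names a key under system and that threshold is a minimum confidence for accepting a prediction; both readings are inferred from the structure of the example and are not confirmed by this page. The helper names are hypothetical.

```python
# The YAML example above, expressed as a plain dictionary.
CONFIG = {
    "global": {
        "model_catalog": {
            "system": {
                "domain_classifier": "models/mmbert32k-intent-classifier-merged",
                "pii_classifier": "models/mmbert32k-pii-detector-merged",
                "prompt_guard": "models/mmbert32k-jailbreak-detector-merged",
            },
            "modules": {
                "classifier": {
                    "domain": {"model_ref": "domain_classifier", "threshold": 0.6, "use_cpu": True},
                    "pii": {"model_ref": "pii_classifier", "threshold": 0.9, "use_cpu": True},
                },
                "prompt_guard": {"model_ref": "prompt_guard", "threshold": 0.7, "use_cpu": True},
            },
        }
    }
}


def resolve_module(config, *path):
    """Follow `path` under modules and resolve its model_ref via system (assumed semantics)."""
    catalog = config["global"]["model_catalog"]
    module = catalog["modules"]
    for key in path:
        module = module[key]
    return {
        "model": catalog["system"][module["model_ref"]],
        "threshold": module["threshold"],
        "use_cpu": module["use_cpu"],
    }


def passes_threshold(confidence, settings):
    """Assumed meaning of threshold: accept predictions at or above it."""
    return confidence >= settings["threshold"]


pii = resolve_module(CONFIG, "classifier", "pii")
print(pii["model"])  # models/mmbert32k-pii-detector-merged
```

Under this reading, a PII prediction with confidence 0.85 would be rejected by the 0.9 threshold, while the same confidence would pass the 0.6 threshold on the domain classifier.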

Custom System Bindings

Override the built-in system-model bindings in your config.yaml:

global:
  model_catalog:
    system:
      domain_classifier: "models/your-domain-classifier"
      pii_classifier: "models/your-pii-classifier"
      prompt_guard: "models/your-prompt-guard"

Model Architecture

LoRA-Based Models

Many MoM models use LoRA (Low-Rank Adaptation) for efficiency:

Base Model: BERT-base-uncased (110M parameters)
LoRA Adapters: <1M parameters per task
Memory Footprint: ~440MB base + ~4MB per adapter
Inference Speed: Same as base model (~10-20ms on CPU)
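
The memory figures above follow from simple arithmetic if the weights are stored as 32-bit floats. The sketch below reproduces them, assuming 4 bytes per parameter, 1 MB = 10^6 bytes, and an adapter at the stated upper bound of 1M parameters. The comparison against separately fine-tuned full models is an illustration added here and is not a measurement.

```python
BYTES_PER_FP32_PARAM = 4  # assumption: weights stored as 32-bit floats


def footprint_mb(num_params):
    """Approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_FP32_PARAM / 1e6


BASE_MB = footprint_mb(110_000_000)   # BERT-base: 440.0
ADAPTER_MB = footprint_mb(1_000_000)  # upper bound per LoRA adapter: 4.0


def shared_base_total_mb(num_tasks):
    """One shared base model plus one adapter per task."""
    return BASE_MB + num_tasks * ADAPTER_MB


def separate_models_total_mb(num_tasks):
    """One fully fine-tuned copy of the base model per task."""
    return num_tasks * BASE_MB


print(shared_base_total_mb(3))      # 452.0
print(separate_models_total_mb(3))  # 1320.0
```

With three tasks, sharing one base model keeps the total close to the size of a single model, which is the efficiency argument behind the adapter approach.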

ModernBERT Models

Newer models use ModernBERT for better performance:

Architecture: ModernBERT-base (149M parameters)
Context Length: 8192 tokens (vs 512 for BERT)
Performance: Better accuracy on long-context tasks
Use Cases: Hallucination detection, feedback classification

Next Steps

Signal-Driven Decisions - Learn how MoM models power routing decisions
Domain - Use mmbert32k-intent-classifier-merged for routing
PII - Configure mmbert32k-pii-detector-merged
RAG - Use MoM embedding models for route-local retrieval
