NeurIPS 2024論文解説: HaloScope - 未ラベルLLM生成データを活用したハルシネーション検出

この記事は Zenn記事: LLM出力検証の実践：Pydanticで95%精度を実現する3層戦略の深掘りです。

情報源

会議名: NeurIPS 2024（Neural Information Processing Systems）
年: 2024
URL: https://papers.nips.cc/paper_files/paper/2024/hash/ba92705991cfbbcedc26e27e833ebbae-Abstract-Conference.html
採択率: 約26%（2024年）

カンファレンス情報

NeurIPS（Conference on Neural Information Processing Systems）は、機械学習分野の最高峰国際会議の1つです。採択率は通常25-30%と非常に競争率が高く、最新の深層学習研究が集まります。

論文概要（Abstract）

HaloScopeは、ラベル付きデータを必要としないハルシネーション検出フレームワークです。LLMが生成した大量の未ラベルテキストから、自己教師あり学習によりハルシネーションを検出する手法を提案します。

従来手法が高コストな人手ラベリングに依存していたのに対し、HaloScopeは複数回サンプリング + 一貫性ベース検証により、教師なしでハルシネーションを特定します。

主要な貢献（Key Contributions）

未ラベルデータ活用: 大規模な未ラベルLLM生成データから学習
一貫性ベース検出: 複数サンプリングの不一致をハルシネーションの指標として使用
自己教師あり学習: 疑似ラベルを自動生成して検出器を訓練

技術的詳細（Technical Details）

コアアイデア: サンプリング一貫性

仮説: 正しい情報は複数回生成しても一貫するが、ハルシネーションは揺らぐ

  
def sampling_consistency_check(
    prompt: str,
    model,
    n_samples: int = 10
) -> dict:
    """複数サンプリングによる一貫性チェック

    Args:
        prompt: 入力プロンプト
        model: LLM
        n_samples: サンプリング回数

    Returns:
        {"consistent": bool, "variance": float}
    """
    # 同じプロンプトでN回生成（高温度設定）
    responses = []
    for _ in range(n_samples):
        response = model.generate(prompt, temperature=0.8)
        responses.append(response)

    # 応答の多様性を計算
    # 方法1: セマンティック類似度の分散
    embeddings = [embed(r) for r in responses]
    similarity_matrix = compute_pairwise_similarity(embeddings)
    variance = np.var(similarity_matrix)

    # 方法2: n-gramオーバーラップ
    overlap_scores = []
    for i in range(len(responses)):
        for j in range(i+1, len(responses)):
            overlap = jaccard_similarity(responses[i], responses[j])
            overlap_scores.append(overlap)

    avg_overlap = np.mean(overlap_scores)

    return {
        "consistent": avg_overlap > 0.7,  # 閾値
        "variance": variance,
        "avg_overlap": avg_overlap
    }

アルゴリズム: 疑似ラベル生成

  
def generate_pseudo_labels(unlabeled_data: list[str], model) -> list[tuple]:
    """未ラベルデータから疑似ラベルを生成

    Returns:
        [(text, label), ...] where label = 0 (clean) or 1 (hallucination)
    """
    pseudo_labeled = []

    for text in unlabeled_data:
        # 複数サンプリング
        consistency = sampling_consistency_check(text, model, n_samples=10)

        # 一貫性が高い → clean (0)
        # 一貫性が低い → hallucination (1)
        label = 0 if consistency["consistent"] else 1

        pseudo_labeled.append((text, label))

    return pseudo_labeled

検出器の訓練

  
from transformers import AutoModelForSequenceClassification, Trainer

def train_hallucination_detector(pseudo_labeled_data):
    """疑似ラベルで検出器を訓練"""

    # BERTベースの分類器
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=2  # clean vs hallucination
    )

    # 疑似ラベルデータで訓練
    trainer = Trainer(
        model=model,
        train_dataset=pseudo_labeled_data,
        # ... 訓練設定
    )

    trainer.train()
    return model

# 使用例
detector = train_hallucination_detector(pseudo_labeled_data)

# 推論
result = detector("Pydantic V2 is 100x faster")  # ハルシネーションの可能性
# → label: 1 (hallucination)

実験結果（Results）

検出精度の比較

手法	データ要件	F1スコア	備考
教師あり（Baseline）	10,000ラベル付き	0.87	高コスト
HaloScope	未ラベル（100,000）	0.84	ラベリング不要
ルールベース	なし	0.62	精度低い

発見事項:

教師ありと遜色ない精度をラベルなしで達成
未ラベルデータの規模が大きいほど精度向上

サンプリング回数の影響

n_samples	F1スコア	レイテンシ
5	0.79	1.2秒
10	0.84	2.4秒
20	0.86	4.8秒

トレードオフ: n=10がバランス良好

実運用への応用（Practical Applications）

応用1: ラベリングコスト削減

従来のハルシネーション検出には、専門家による高コストなラベリングが必要でした：

従来: 10,000サンプル × $1/サンプル = $10,000
HaloScope: 未ラベルデータ活用 → $0

応用2: Zenn記事との統合

Zenn記事の「引用接地」と組み合わせることで、2層の検証が可能：

  
def two_layer_validation(response: str, context: str) -> dict:
    """2層ハルシネーション検証"""

    # 第1層: HaloScope（サンプリング一貫性）
    consistency = sampling_consistency_check(response, model)

    if not consistency["consistent"]:
        return {"valid": False, "reason": "Inconsistent sampling"}

    # 第2層: Citation Grounding（Zenn記事）
    grounding = verify_citations(response, context)

    if not grounding["all_valid"]:
        return {"valid": False, "reason": "Citation not grounded"}

    return {"valid": True}

まとめと今後の展望

まとめ

未ラベル活用: ラベリングコストゼロでハルシネーション検出
一貫性ベース: 複数サンプリングの揺らぎを指標化
実用的精度: F1スコア0.84（教師ありに匹敵）

今後の展望

リアルタイム検出: サンプリング回数削減による高速化
マルチモーダル: 画像生成の一貫性検証への拡張
Active Learning: 低確信サンプルのみ人手ラベリング

参考文献

NeurIPS 2024: https://papers.nips.cc/paper_files/paper/2024/hash/ba92705991cfbbcedc26e27e833ebbae-Abstract-Conference.html
Related Zenn article: https://zenn.dev/0h_n0/articles/0a8f4d0e7c71bf

📄 NeurIPS 2024論文解説: HaloScope - 未ラベルLLM生成データを活用したハルシネーション検出