Chain-of-Verification Reduces Hallucination in Large Language Models

Introduction

Hallucinations occur when the answer appears plausible but is factually incorrect.
Meta AI → CoVe
4 스텝
- Initial baseline response creation
  - 걍 baseline
- verification question generation: 확인하는 질문을 생성을 하네
  - prefix로 original query와 baseline response를 받고, verify하는 질문을 생성
- execute verification
  - external tool(web search)
  - approaches
    - joint: 질문 생성과 확인 답변을 같이
    - 2 step: 질문 생성 + 확인 답변
    - factored: answering each question separately(이해 못함)
    - factored + revise: check if the answers match with the baseline response. Compare the answers to the baseline response.
- final refined answer generation
Large Language Model (LLM) can be used to validate itself (self-verification)
Implementation (Python 🐍 + Langchain 🔗 + OpenAI 🦾 + Search Tool🔍)
verification 질문은 3가지
1. Wiki Data & Wiki Category List: 엔티티 목록을 표시하는 형태의 답변
2. Multi-Span QA: 서로 다른 인접하지 않은 섹션에서 나온 여러 개의 독립적인 답변
3. Long-form Generation: biographical questions
Routing mechanism으로 질문 카테고리에 따라 적절한 chain으로 연결
질문 생성과 확인 실행 부분이 나와있긴 한데, 잘 이해가 안됨.
how to improve?
1. prompt engineering
2. external tools
3. more chains
4. haman in loop (HIL)
  - 인간 전문가가 인공지능 학습 과정 중에 중간 결과물을 확인하고, 학습 데이터를 조정하는 과정
    
    instruction에 따라 3가지로 나누고, 다른 chain of thoughts 방법을 사용하는 것이 신기하네.
Limitation
- Incomplete Removal of Hallucinations
- Limited Scope of Hallucination Mitigation
- Increased Computational Expense
- Upper Bound on Improvement