Build the self‑checking heart of our RAG system—an AI critic that can spot hallucinations, request fresh evidence, and keep the agent’s answers grounded in truth.
You will build our first end-to-end, low-latency critic loop that operates at inference time, combining lightweight transformer inference with dynamic retrieval re-ranking into a self-correcting generation pipeline.
Self‑Critiquing Retrieval‑Augmented Generation
From: Retrieval Unreliability and Knowledge Base Corruption
The critic module is the final safeguard that ensures generated content aligns with retrieved evidence. In this role, you will design, train, and deploy a lightweight critic that evaluates faithfulness, triggers re-retrieval, and closes the loop with the LLM.
You will deliver a modular critic model (e.g., a LoRA-adapted BERT or Tiny-Critic), an evaluation pipeline that scores faithfulness, and an automated re-retrieval/re-generation loop that operates in real time, as sketched below.
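To make the loop concrete, here is a minimal Python sketch of the re-retrieval/re-generation cycle. The callables (retrieve, generate, critique), the CriticVerdict type, and the threshold and round budget are illustrative assumptions, not an existing interface:

```python
# Minimal sketch of the self-critiquing loop. All names here
# (retrieve, generate, critique, CriticVerdict) are illustrative
# placeholders, not part of an existing codebase.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CriticVerdict:
    faithfulness: float   # 0.0 (unsupported) .. 1.0 (fully grounded)
    needs_evidence: bool  # True when the critic requests fresh evidence

def critic_loop(
    question: str,
    retrieve: Callable[[str, int], List[str]],   # (query, round) -> passages
    generate: Callable[[str, List[str]], str],   # (question, passages) -> answer
    critique: Callable[[str, List[str]], CriticVerdict],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Generate, score faithfulness, and re-retrieve until grounded."""
    passages = retrieve(question, 0)
    answer = generate(question, passages)
    for round_ in range(1, max_rounds + 1):
        verdict = critique(answer, passages)
        if verdict.faithfulness >= threshold and not verdict.needs_evidence:
            return answer  # critic accepts the answer as grounded
        # Critic rejected the draft: fetch fresh evidence and regenerate.
        passages = retrieve(question, round_)
        answer = generate(question, passages)
    return answer  # best effort once the round budget is spent
```

Capping the number of rounds bounds worst-case latency, which matters for the sub-second inference target described below.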
Master’s or PhD in Computer Science or AI with a focus on natural language processing or machine learning.
Reduce hallucination rates by at least 40% and raise faithfulness scores above 0.8 as measured by RAGAS within a year, while keeping end-to-end inference latency under one second against a one-billion-token corpus.
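For reference, faithfulness can be scored with the open-source RAGAS library. The sketch below follows the 0.1-style API (newer releases reorganize this interface), and evaluate() relies on an LLM judge, so credentials for the backing model are assumed:

```python
# Hedged sketch: scoring faithfulness with RAGAS (0.1-style API).
# evaluate() calls an LLM judge under the hood (an OpenAI key is
# expected by default), so treat this as illustrative only.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

samples = Dataset.from_dict({
    "question": ["Who wrote the retrieved design doc?"],
    "answer":   ["The design doc was written by the platform team."],
    "contexts": [["Design doc v2, authored by the platform team in 2023."]],
})

scores = evaluate(samples, metrics=[faithfulness])
print(scores)  # e.g. {'faithfulness': 0.93}; higher means more grounded
```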
Lead the expansion of the critic framework to multi-modal evidence (images, PDFs) and to cross-domain compliance use cases, making it the flagship self-correcting component of our AI platform.
If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.