Build the self‑checking heart of our RAG system—an AI critic that can spot hallucinations, request fresh evidence, and keep the agent’s answers grounded in truth.
You will build our first end-to-end, low-latency critic loop that operates at inference time, combining lightweight transformer inference with dynamic retrieval re-ranking into a self-correcting generation pipeline.
Self‑Critiquing Retrieval‑Augmented Generation
From: Retrieval Unreliability and Knowledge Base Corruption
The critic module is the final safeguard that ensures generated content aligns with retrieved evidence. In this role, you will design, train, and deploy a lightweight critic that evaluates faithfulness, triggers re-retrieval, and closes the loop with the LLM.
You will deliver a modular critic model (e.g., a LoRA-adapted BERT or Tiny-Critic), an evaluation pipeline that scores faithfulness, and an automated re-retrieval/re-generation loop that operates in real time, as sketched below.
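To make the loop concrete, here is a minimal Python sketch of the re-retrieval/re-generation cycle. The callables (retrieve, generate, critique), the CriticVerdict type, and the threshold and round budget are illustrative assumptions, not an existing interface:

```python
# Minimal sketch of the self-critiquing loop. All names here
# (retrieve, generate, critique, CriticVerdict) are illustrative
# placeholders, not part of an existing codebase.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CriticVerdict:
    faithfulness: float   # 0.0 (unsupported) .. 1.0 (fully grounded)
    needs_evidence: bool  # True when the critic requests fresh evidence

def critic_loop(
    question: str,
    retrieve: Callable[[str, int], List[str]],   # (query, round) -> passages
    generate: Callable[[str, List[str]], str],   # (question, passages) -> answer
    critique: Callable[[str, List[str]], CriticVerdict],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Generate, score faithfulness, and re-retrieve until grounded."""
    passages = retrieve(question, 0)
    answer = generate(question, passages)
    for round_ in range(1, max_rounds + 1):
        verdict = critique(answer, passages)
        if verdict.faithfulness >= threshold and not verdict.needs_evidence:
            return answer  # critic accepts the answer as grounded
        # Critic rejected the draft: fetch fresh evidence and regenerate.
        passages = retrieve(question, round_)
        answer = generate(question, passages)
    return answer  # best effort once the round budget is spent
```

Capping the number of rounds bounds worst-case latency, which matters for the sub-second inference target described below.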
Master’s or PhD in Computer Science or AI with a focus on natural language processing or machine learning.
Reduce hallucination rates by at least 40% and raise faithfulness scores above 0.8 as measured by RAGAS within a year, while keeping end-to-end inference latency under one second against a one-billion-token corpus.
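For reference, faithfulness can be scored with the open-source RAGAS library. The sketch below follows the 0.1-style API (newer releases reorganize this interface), and evaluate() relies on an LLM judge, so credentials for the backing model are assumed:

```python
# Hedged sketch: scoring faithfulness with RAGAS (0.1-style API).
# evaluate() calls an LLM judge under the hood (an OpenAI key is
# expected by default), so treat this as illustrative only.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

samples = Dataset.from_dict({
    "question": ["Who wrote the retrieved design doc?"],
    "answer":   ["The design doc was written by the platform team."],
    "contexts": [["Design doc v2, authored by the platform team in 2023."]],
})

scores = evaluate(samples, metrics=[faithfulness])
print(scores)  # e.g. {'faithfulness': 0.93}; higher means more grounded
```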
Lead the expansion of the critic framework to multi-modal evidence (images, PDFs) and to cross-domain compliance use cases, making it the flagship self-correcting component of our AI platform.
If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.