AI Interview Simulator

A voice app to practice technical interviews. You give it your CV and a job description, set up the interviewer, and then talk through a real interview with an AI. When you finish, a second AI gives you a clear report on how you did, with examples from your own answers. It runs on your own computer to keep your data private, and you can move it to the cloud by changing a setting — no rewrite needed.

React · TanStack Start · FastAPI · LangGraph · Qdrant · Ollama · Docker

CONTEXT

When the interviewer was a bot

I built this after a job interview where I never spoke to a person. Being interviewed by a bot was deeply frustrating: it was slow to answer, quick to talk over me, and it cut my answers short before I could finish. It is the kind of AI-led first interview people joke about online, until it happens to you.

I had just been interviewed by an AI. So I built a better one to practice with.

Every frustration from that call became a design choice — I wanted mine to be everything that bot was not. It waits until you have truly finished before it replies, and answers one sentence at a time, so a turn feels like a real conversation instead of a fight to be heard.

WHAT I BUILT

A voice interviewer, plus an AI that scores you

You give it your CV and a job description, then talk through a real voice interview. Four parts work together: three on every turn, and one when the interview ends.

Listen — your speech becomes text with Faster-Whisper.
Lead — a LangGraph agent asks questions matched to your CV and the job (searched with Qdrant).
Speak — its replies are spoken back to you with Piper.
Score — when you finish, a second AI writes a report that links each point to something you actually said, not just a number.
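One way to keep the report honest is to check every point against the transcript before showing it. The sketch below is only an illustration of that idea: the `Turn` and `Finding` classes and the `grounded` function are hypothetical names, not the app's real schema. It keeps a finding only if its quote appears in the candidate turn it cites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    index: int
    speaker: str  # "interviewer" or "candidate"
    text: str


@dataclass(frozen=True)
class Finding:
    point: str       # the feedback itself
    turn_index: int  # which turn the feedback refers to
    quote: str       # the words the candidate is said to have used


def grounded(findings: list[Finding], transcript: list[Turn]) -> list[Finding]:
    """Keep only findings whose quote appears in the cited candidate turn."""
    by_index = {turn.index: turn for turn in transcript}
    kept = []
    for finding in findings:
        turn = by_index.get(finding.turn_index)
        quote = finding.quote.strip().lower()
        if (
            turn is not None
            and turn.speaker == "candidate"
            and quote  # an empty quote proves nothing
            and quote in turn.text.lower()
        ):
            kept.append(finding)
    return kept
```

A finding that cites words the candidate never said is dropped, so the scoring AI cannot pad the report with invented evidence.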

Video: a short clip of a live voice interview, where your speech goes to the AI and its replies are spoken back.

ARCHITECTURE

Local first, and easy to move to the cloud

The backend is FastAPI (data in SQLModel, migrations with Alembic), the front end is React with TanStack Start, and the AI models run locally on Ollama. Docker Compose runs the API, database, vector store, and speech services together.

One rule holds it together: every part talks through a clear interface. Speech-to-text, text-to-speech, and turn detection are used only through STTProvider, TTSProvider, and TurnDetector, and a setting picks which one. Moving the model, vector store, or database to the cloud is a config change — not a rewrite.
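Here is a minimal sketch of that rule in Python. The interface names `STTProvider` and `TTSProvider` come from the project; everything else is an assumption made for illustration: the method names, the `STT_PROVIDER` setting, and the two stand-in classes, which fake a local and a cloud engine so the example runs without any speech library.

```python
from typing import Callable, Protocol


class STTProvider(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TTSProvider(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class EchoSTT:
    """Stand-in for a local engine: treats the bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperSTT:
    """Stand-in for a cloud engine: same interface, different behavior."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").upper()


# The only place that knows which implementations exist.
STT_REGISTRY: dict[str, Callable[[], STTProvider]] = {
    "local": EchoSTT,
    "cloud": UpperSTT,
}


def build_stt(settings: dict[str, str]) -> STTProvider:
    """Pick the speech-to-text implementation named by the setting."""
    name = settings.get("STT_PROVIDER", "local")
    try:
        return STT_REGISTRY[name]()
    except KeyError:
        raise ValueError(
            f"Unknown STT_PROVIDER {name!r}; expected one of {sorted(STT_REGISTRY)}"
        ) from None
```

The rest of the app would call `build_stt(settings).transcribe(...)` and never import an engine directly, which is what makes the swap a config change.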

How the parts fit together: the front end, the gateway, the two AIs, the parts you can swap, and storage.

How a single turn works

One turn, step by step. Sound streams both ways, so the wait feels short. When the interview ends, the second AI runs.
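Replying one sentence at a time means cutting the model's token stream into sentences as it arrives, so speech can start before the full reply exists. The sketch below shows one simple way to do that. It is an assumption about the approach, not the app's actual code, and the `sentences` function is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each sentence once the stream has moved past its end."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for done in parts[:-1]:
            if done:
                yield done
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can go straight to text-to-speech while the model is still writing the next one. A rule this simple also splits after abbreviations such as "e.g.", so a real version would need a few exceptions.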

TRADE-OFFS

The hard choices, and what they cost

Ollama native, not in Docker. On Apple Silicon, Docker cannot use the GPU, so Ollama in Docker runs CPU-only and crawls. Running it natively uses the Apple GPU (Metal) and is far faster. Cost: no single docker compose up — but for real use, the speed is worth it.
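With Ollama on the host and the API in a container, the API needs an address for the host. A small sketch, assuming Docker Desktop (which resolves `host.docker.internal` to the host machine) and Ollama's default port 11434. The `OLLAMA_BASE_URL` variable name and the `ollama_base_url` function are hypothetical, chosen for this example.

```python
import os
from typing import Mapping, Optional

# Docker Desktop maps this hostname to the host; 11434 is Ollama's default port.
DEFAULT_OLLAMA_URL = "http://host.docker.internal:11434"


def ollama_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model server address, preferring the (hypothetical) env override."""
    source = os.environ if env is None else env
    return source.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_URL).rstrip("/")
```

Pointing the same variable at a hosted endpoint is all it would take to move the model off the laptop.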

Turn detection: a simple rule, not a big model. By default the app uses a light rule over the words so far — no heavy model, no restrictive license bundled in. You can opt into a stronger model instead; I documented the trade-off for each:

Model	Input	Size / placement	License
Pipecat Smart Turn v3	Audio / waveform	~8M params, CPU	BSD-2-Clause — permissive
LiveKit turn-detector	Text (partial transcript)	~0.1B (Qwen2.5-0.5B), INT8 ONNX, CPU	Code Apache-2.0; weights under restricted LiveKit license
TEN Turn Detection	Text	8B (Qwen2.5-7B), GPU only	Apache-2.0 with extra restrictions

They take different inputs — some audio, some text — so the TurnDetector interface carries both. Any of them drops in without touching the rest of the app.
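To make that concrete, here is a sketch of an interface that carries both kinds of input, with a light rule-based detector behind it. Only the `TurnDetector` name is from the project. The `TurnInput` fields, the pause thresholds, the word list, and the use of silence length next to the words are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class TurnInput:
    transcript: str                # words recognized so far
    silence_ms: int                # time since the last detected speech
    audio: Optional[bytes] = None  # raw audio, for detectors that read the waveform


class TurnDetector(Protocol):
    def is_turn_complete(self, turn: TurnInput) -> bool: ...


# Words that usually mean the speaker is mid-thought (illustrative list).
_TRAILING = {"and", "but", "so", "because", "or", "um", "uh", "like", "the", "a", "to"}


class RuleTurnDetector:
    """Text-only rule: ignores the audio field entirely."""

    def __init__(self, short_pause_ms: int = 700, long_pause_ms: int = 2000) -> None:
        self.short_pause_ms = short_pause_ms
        self.long_pause_ms = long_pause_ms

    def is_turn_complete(self, turn: TurnInput) -> bool:
        words = turn.transcript.lower().split()
        if not words:
            return False  # nothing said yet
        if turn.silence_ms >= self.long_pause_ms:
            return True  # a long silence always ends the turn
        last = words[-1].strip(",.;:!?")
        if last in _TRAILING:
            return False  # "…and", "…because": keep waiting
        return turn.silence_ms >= self.short_pause_ms
```

An audio-based detector would implement the same method and read `turn.audio` instead, so the caller does not change.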

Voice is an optional extra. It installs on demand (uv sync --extra voice) and loads only when needed, so the text-only version and the tests stay small and fast. Cost: voice users run one extra install step.
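Loading only when needed usually means importing the speech library inside the voice path instead of at the top of the file. A minimal sketch of that pattern follows. The `load_optional` and `get_whisper` helpers are hypothetical; `faster_whisper` is the import name of the Faster-Whisper package.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import a module on first use, with a hint if the extra is missing."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{module_name!r} is not installed. Run: uv sync --extra {extra}"
        ) from exc


def get_whisper() -> ModuleType:
    # Called from the voice path only, so text-only runs never import it.
    return load_optional("faster_whisper", extra="voice")
```

Text-only runs and the test suite never call `get_whisper`, so they never pay for the import or need the package installed.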

OUTCOME

Private by default, measured, and free to run

Because the AI runs locally, no data leaves your computer and there is no per-request bill. The same code runs on a laptop or in the cloud, with no change to the app logic.

And I measure it instead of guessing. The app times each step — turn_detection_ms, stt_ms, graph_ms, tts_ms, plus time-to-first-audio — sends them to LangSmith, and warns me when a step runs slow. On a local GPU the interviewer replies about 1–3 seconds after you stop (≈0.7–1s in the cloud).
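Per-step timing like this can be done with a small context manager. The sketch below records each step's duration and logs a warning when it goes over a budget. The step names match the ones above, but the budget values and the `timed` helper are assumptions, and sending the numbers to LangSmith is left out.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

log = logging.getLogger("latency")

# Hypothetical per-step budgets in milliseconds.
BUDGET_MS = {"turn_detection_ms": 50, "stt_ms": 500, "graph_ms": 1500, "tts_ms": 400}


@contextmanager
def timed(step: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record how long the wrapped block takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = (time.perf_counter() - start) * 1000
        timings[step] = elapsed
        budget = BUDGET_MS.get(step)
        if budget is not None and elapsed > budget:
            log.warning("%s took %.0f ms (budget %d ms)", step, elapsed, budget)
```

Wrapping each stage, for example `with timed("stt_ms", timings): ...`, leaves one dictionary per turn that can be attached to whatever tracing tool is in use.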

Image: the final report, where each point links back to a moment in the interview, not just a score.

WHAT I LEARNED

Three things I learned

Clear interfaces from the start let the app run both locally and in the cloud without writing the code twice.
How fast it feels beats the total time. Speaking back one sentence at a time helped more than any other speed fix.
Measure before optimizing. Timing each step in LangSmith showed the real slow point, instead of guessing.

