CONTEXT
When the interviewer was a bot
I built this after a job interview where I never spoke to a person. Being interviewed by a bot was deeply frustrating — it was slow to answer, quick to talk over me, and cut my answers short before I could finish. It is the kind of AI-led first interview people joke about online, until it happens to you.
I had just been interviewed by an AI. So I built a better one to practice with.
Every frustration from that call became a design choice — I wanted mine to be everything that bot was not. It waits until you have truly finished before it replies, and answers one sentence at a time, so a turn feels like a real conversation instead of a fight to be heard.
WHAT I BUILT
A voice interviewer, plus an AI that scores you
You give it your CV and a job description, then talk through a real voice interview. Four parts work together on every turn:
- Listen — your speech becomes text with Faster-Whisper.
- Lead — a LangGraph agent asks questions matched to your CV and the job (searched with Qdrant).
- Speak — its replies are spoken back to you with Piper.
- Score — when you finish, a second AI writes a report that links each point to something you actually said, not just a number.
Video: a short clip of a live voice interview. Your speech goes to the AI, and its replies are spoken back.
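One way to picture a report that "links each point to something you actually said" is a small data structure plus a check. This is a minimal sketch under stated assumptions, not the project's code: `Turn`, `ReportPoint`, and `unsupported_points` are hypothetical names, and the rule that a quote must appear verbatim in a candidate turn is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Turn:
    index: int
    speaker: str  # "candidate" or "interviewer"
    text: str


@dataclass(frozen=True)
class ReportPoint:
    comment: str     # what the scoring model says about the answer
    turn_index: int  # which turn the comment refers to
    quote: str       # the words the comment is based on


def unsupported_points(
    points: Sequence[ReportPoint], transcript: Sequence[Turn]
) -> List[ReportPoint]:
    """Return the points whose quote is not found in the candidate turn they cite."""
    by_index = {turn.index: turn for turn in transcript}
    rejected = []
    for point in points:
        turn = by_index.get(point.turn_index)
        supported = (
            turn is not None
            and turn.speaker == "candidate"
            and point.quote.strip() != ""
            and point.quote.lower() in turn.text.lower()
        )
        if not supported:
            rejected.append(point)
    return rejected
```

A check like this could run on the scorer's output before the report is shown, so a point with no matching quote is dropped or sent back for another attempt.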
ARCHITECTURE
Local first, and easy to move to the cloud
The backend is FastAPI (data in SQLModel, migrations in Alembic), the front end is React with TanStack Start, and the AI models run locally on Ollama. Docker Compose runs the API, database, vector store, and speech services together.
One rule holds it together: every part talks through a clear interface. Speech-to-text, text-to-speech, and turn detection are used only through STTProvider, TTSProvider, and TurnDetector, and a setting picks which one. Moving the model, vector store, or database to the cloud is a config change — not a rewrite.
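A minimal sketch of that rule, to show why the swap can stay in config. Only the interface names `STTProvider` and `TTSProvider` come from the text above. The method names, the `Settings` class, the registry, and the two stand-in engines are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class STTProvider(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TTSProvider(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class EchoSTT:
    """Stand-in for a local engine: treats the audio bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperSTT:
    """Stand-in for a cloud engine, so the swap is visible in the output."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8").upper()


# Hypothetical registry: setting value -> factory for a provider.
STT_REGISTRY: Dict[str, Callable[[], STTProvider]] = {
    "local": EchoSTT,
    "cloud": UpperSTT,
}


@dataclass(frozen=True)
class Settings:
    stt_provider: str = "local"


def build_stt(settings: Settings) -> STTProvider:
    """Pick the implementation from config; callers only see STTProvider."""
    try:
        return STT_REGISTRY[settings.stt_provider]()
    except KeyError:
        raise ValueError(f"unknown STT provider: {settings.stt_provider!r}") from None
```

Because callers depend only on `transcribe`, changing `stt_provider` from `"local"` to `"cloud"` changes the engine without touching the code that uses it.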
How a single turn works
TRADE-OFFS
The hard choices, and what they cost
Ollama native, not in Docker. On Apple Silicon, Docker cannot use the GPU, so Ollama in Docker runs CPU-only and crawls. Running it natively uses the Apple GPU (Metal) and is far faster. Cost: no single docker compose up — but for real use, the speed is worth it.
Turn detection: a simple rule, not a big model. By default the app uses a light rule over the words so far — no heavy model, no restrictive license bundled in. You can opt into a stronger model instead; I documented the trade-off for each:
| Model | Input | Size / placement | License |
|---|---|---|---|
| Pipecat Smart Turn v3 | Audio / waveform | ~8M params, CPU | BSD-2-Clause — permissive |
| LiveKit turn-detector | Text (partial transcript) | ~0.1B (Qwen2.5-0.5B), INT8 ONNX, CPU | Code Apache-2.0; weights under restricted LiveKit license |
| TEN Turn Detection | Text | 8B (Qwen2.5-7B), GPU only | Apache-2.0 with extra restrictions |
They take different inputs — some audio, some text — so the TurnDetector interface carries both. Any of them drops in without touching the rest of the app.
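A sketch of an interface that carries both kinds of input, with a light rule as the default. Only the name `TurnDetector` comes from the text. `TurnInput`, the silence thresholds, and the filler-word list are assumptions for illustration; the app's actual rule is not described in detail here.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class TurnInput:
    transcript: str = ""           # partial transcript, for text-based detectors
    audio: Optional[bytes] = None  # raw audio, for waveform-based detectors
    silence_ms: int = 0            # trailing silence, measured by the caller


class TurnDetector(Protocol):
    def is_turn_complete(self, turn: TurnInput) -> bool: ...


# Words that suggest the speaker is mid-thought (assumed list).
TRAILING_FILLERS = {"and", "but", "so", "or", "because", "um", "uh", "like"}


class RuleBasedTurnDetector:
    """Text-only rule; it ignores turn.audio entirely."""

    def __init__(self, min_silence_ms: int = 700, max_silence_ms: int = 2000) -> None:
        self.min_silence_ms = min_silence_ms
        self.max_silence_ms = max_silence_ms

    def is_turn_complete(self, turn: TurnInput) -> bool:
        words = turn.transcript.lower().split()
        if not words:
            return False  # nothing said yet
        if turn.silence_ms >= self.max_silence_ms:
            return True   # long pause: hand the turn over regardless
        if turn.silence_ms < self.min_silence_ms:
            return False  # still within a normal breath
        last_word = words[-1].strip(".,!?;:")
        return last_word not in TRAILING_FILLERS
```

An audio-based detector would implement the same `is_turn_complete` method and read `turn.audio` instead, which is what lets either kind drop in behind one interface.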
Voice is an optional extra. It installs on demand (uv sync --extra voice) and loads only when needed, so the text-only version and the tests stay small and fast. Cost: voice users run one extra install step.
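Loading only when needed usually means the import happens inside a function rather than at the top of a module. The sketch below shows that pattern under stated assumptions: `require_extra`, `MissingExtraError`, and `load_stt_backend` are hypothetical names, and `faster_whisper` is assumed to be one of the packages the voice extra installs.

```python
import importlib
from types import ModuleType


class MissingExtraError(RuntimeError):
    """Raised when an optional dependency group has not been installed."""


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import a module on first use, with an actionable error if it is absent."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise MissingExtraError(
            f"'{module_name}' is not installed; run: uv sync --extra {extra}"
        ) from exc


def load_stt_backend() -> ModuleType:
    # Called on the first voice request, never at app start-up,
    # so text-only runs and tests do not import the speech stack.
    return require_extra("faster_whisper", "voice")
```

With this shape, a text-only install fails only if someone actually asks for voice, and the error message names the missing install step.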
OUTCOME
Private by default, measured, and free to run
Because the AI runs locally, no data leaves your computer and there is no per-request bill. The same code runs on a laptop or in the cloud, with no change to the app logic.
And I measure it instead of guessing. The app times each step — turn_detection_ms, stt_ms, graph_ms, tts_ms, plus time-to-first-audio — sends them to LangSmith, and warns me when a step runs slow. On a local GPU the interviewer replies about 1–3 seconds after you stop (≈0.7–1s in the cloud).
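Per-step timing of this kind can be sketched with a context manager. The step names match the ones listed above; the `TurnTimings` class, the budget values, and the logger name are assumptions. Sending the numbers to LangSmith is left out, since that depends on how the app is wired to it.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional

logger = logging.getLogger("turn_timing")

# Hypothetical per-step budgets in milliseconds.
DEFAULT_BUDGETS_MS: Dict[str, float] = {
    "turn_detection_ms": 50.0,
    "stt_ms": 500.0,
    "graph_ms": 1500.0,
    "tts_ms": 500.0,
}


class TurnTimings:
    """Collects wall-clock timings for the steps of one turn."""

    def __init__(self, budgets: Optional[Dict[str, float]] = None) -> None:
        self.values: Dict[str, float] = {}
        self.budgets = DEFAULT_BUDGETS_MS if budgets is None else budgets

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the step raises, so failures are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.values[name] = elapsed_ms
            budget = self.budgets.get(name)
            if budget is not None and elapsed_ms > budget:
                logger.warning(
                    "%s took %.0f ms (budget %.0f ms)", name, elapsed_ms, budget
                )

    def slow_steps(self) -> List[str]:
        """Names of the steps that exceeded their budget."""
        return [
            name
            for name, ms in self.values.items()
            if ms > self.budgets.get(name, float("inf"))
        ]
```

A turn handler would wrap each stage in `with timings.step("stt_ms"):` and so on, then pass `timings.values` to whatever tracing backend is in use.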
Screenshot: the final report. Each point links back to a moment in the interview, not just a score.
WHAT I LEARNED
Three things I learned
- Clear interfaces from the start let the app run both locally and in the cloud without writing the code twice.
- How fast it feels beats the total time. Speaking back one sentence at a time helped more than any other speed fix.
- Measure before optimizing. Timing each step in LangSmith showed the real slow point, instead of guessing.
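The second lesson, speaking back one sentence at a time, can be illustrated with a small generator that turns a stream of text chunks into whole sentences, so speech synthesis can start before the full reply exists. This is a sketch only: `sentences_from_stream` is a hypothetical name, and the boundary rule (a full stop, exclamation mark, or question mark followed by whitespace) is a simplifying assumption that would split on abbreviations such as "e.g. ".

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (simplified rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a chunked stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]  # keep the unfinished tail for the next chunk
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


if __name__ == "__main__":
    stream = ["Thanks for", " that. Can you", " tell me more", "? Take your time."]
    for sentence in sentences_from_stream(stream):
        print(sentence)  # each line could be handed to speech synthesis at once
```

The first sentence is available after the second chunk arrives, which is why time-to-first-audio can drop even when the total generation time does not change.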