What We Learned After Hiring 3 Bad Developers Through AI Video Screening


Key Takeaways

  • AI video screening optimizes for communication skills (fluency, confidence, vocabulary), not actual engineering ability, producing high-scoring candidates who can't execute in real environments

  • Strong interview performance can be gamed (memorization, AI-assisted answers, rehearsed responses), creating a false signal that doesn't translate to on-the-job performance

  • The core failure is evaluating theory instead of execution: explaining solutions is fundamentally different from debugging, building, and shipping under real constraints

  • Real-world, task-based assessments (debugging, deploying, optimizing) quickly expose true capability by revealing how candidates think, act, and use tools in practice

  • Hiring accuracy improves dramatically when you simulate actual work: shifting from "watch them talk" to "watch them execute" leads to faster, more reliable hiring decisions

We burned $240,000 and six months of runway before admitting the obvious: our AI video screening tool was giving us polished liars, not engineers. All three hires passed automated interviews with flying colors. All three were gone within 90 days. Here's what actually happened and why we got it so wrong.

The Setup: Why We Trusted The Machine

Our hiring pipeline was drowning. 180 applicants per backend role. Our tech lead was spending 25 hours a week in interviews. We needed a filter, and AI video screening promised exactly that: automated technical interviews, scored responses, ranked candidates, zero human time.

The tool asked solid questions. "Explain microservices architecture." "How would you optimize database queries?" "Walk through your debugging process." Candidates recorded video answers. The AI analyzed their words, facial expressions, confidence levels, and technical vocabulary. It gave us scores and ranked lists.

We hired the top three scorers over four months. Two backend engineers, one DevOps.

What The Scores Missed

Hire #1: The Pattern Memorizer

Interview score: 94/100. Could recite SOLID principles flawlessly. Explained CAP theorem like a textbook. Failed to debug a simple null pointer exception in production code during week two. Couldn't explain why his own PR was causing memory leaks.

He had memorized answers. He could talk about system design but couldn't read a stack trace.

Hire #2: The AI Parrot

Interview score: 91/100. Delivered word-perfect explanations of Redis caching strategies and load balancing approaches. Three weeks in, we discovered he'd fed the interview questions into ChatGPT and read the responses on camera.

When asked to implement the caching layer he'd described so eloquently, he copied Stack Overflow examples that didn't compile.

Hire #3: The Talker

Interview score: 89/100. Confident, articulate, great presence. Spoke in complete sentences about CI/CD pipelines and container orchestration. Couldn't write a working Dockerfile. Took five days to add logging to a service because he didn't understand how our deployment pipeline actually worked.

He knew the words. He didn't know the work.

The Real Problem With AI Video Screening

These tools measure the wrong signal entirely. They optimize for:

  • Verbal fluency → Not code quality

  • Confidence → Not competence

  • Theoretical knowledge → Not practical execution

  • Performance under recording → Not performance under pressure

They're evaluating a speech contest, not engineering ability.

Here's the gap we missed: explaining how to fix a database bottleneck is completely different from actually connecting to the database, adding indexes, changing queries, and confirming latency drops. One is theory. The other is work.
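To make that gap concrete, here's a toy sketch of the "work" side, using Python's built-in sqlite3 and an invented orders table (none of this is our actual stack): run the slow query, add the index the WHERE clause needs, then confirm the query plan actually changed.

```python
import sqlite3
import time

# Hypothetical "orders" table standing in for a slow production query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(100_000)],
)

query = "SELECT COUNT(*), SUM(total) FROM orders WHERE customer_id = ?"

def latency_seconds(q, arg):
    """Time a single execution of the query."""
    start = time.perf_counter()
    conn.execute(q, (arg,)).fetchone()
    return time.perf_counter() - start

before = latency_seconds(query, 42)  # full table scan

# The actual "work": add the index the filter needs.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

after = latency_seconds(query, 42)  # indexed lookup

# Confirm the optimizer now uses the index, not a scan.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchone()
print(f"before={before * 1000:.2f}ms after={after * 1000:.2f}ms plan={plan[-1]}")
```

Explaining "add an index" takes ten seconds. Actually checking the plan and the latency is the part that separates the talkers from the engineers.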

All three of our bad hires could explain what should happen. None could make it happen.

What Actually Predicts Performance

After the third termination, we rebuilt our screening process. We stopped asking candidates to talk about work and started watching them work.

We gave them real tasks:

  • Fix a failing deployment on a live staging server

  • Debug an API endpoint returning 500 errors using actual logs

  • Optimize a slow database query in our codebase

  • Refactor a service to reduce Docker image size

Thirty minutes. Screen recording. Live environment. All tools available, including AI assistants.
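For a flavor of how small these tasks can be, here's a toy sketch of the log-triage step using only Python's standard library (the log lines and endpoint names are invented for illustration): find which endpoint is actually throwing the 500s.

```python
import re
from collections import Counter

# Hypothetical access-log excerpt; in the real task this comes from staging.
LOG = """\
10.0.0.5 - [12/Jan:10:01] "GET /api/orders HTTP/1.1" 200
10.0.0.5 - [12/Jan:10:02] "POST /api/checkout HTTP/1.1" 500
10.0.0.9 - [12/Jan:10:03] "POST /api/checkout HTTP/1.1" 500
10.0.0.9 - [12/Jan:10:04] "GET /api/orders HTTP/1.1" 200
10.0.0.7 - [12/Jan:10:05] "POST /api/checkout HTTP/1.1" 500
"""

PATTERN = re.compile(r'"(?P<method>\w+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def failing_endpoints(log_text):
    """Count 5xx responses per endpoint so the worst offender surfaces first."""
    errors = Counter()
    for match in PATTERN.finditer(log_text):
        if match["status"].startswith("5"):
            errors[(match["method"], match["path"])] += 1
    return errors.most_common()

print(failing_endpoints(LOG))  # → [(('POST', '/api/checkout'), 3)]
```

A candidate who has done this work before reaches for something like this in a minute or two. A candidate who has only talked about it freezes at the terminal.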

The difference was immediate and brutal.

Candidates who interviewed well but couldn't execute dropped out fast. They'd freeze when faced with an actual terminal, real error messages, ambiguous requirements.

Candidates who never would have scored high on verbal explanations solved problems methodically, asked clarifying questions, explained tradeoffs, used AI effectively to speed up boilerplate, and delivered working solutions.

We saw how they thought. How they debugged. How they handled constraints. How they used tools. How they communicated while working, not performing.

The Expensive Lesson

AI video screening selects for people who are good at AI video screening. That skill has almost zero correlation with being a good engineer.

If your assessment doesn't replicate actual job conditions, you're not assessing job performance. You're assessing test-taking ability.

We wasted half a year and nearly a quarter of a million dollars learning what should have been obvious: watching someone talk about work is not the same as watching someone work.

The fix isn't better AI interviews. It's a better question: does this assessment show me what this person will actually do on the job, or does it show me how well they perform for a camera?

We're now three hires in using task-based screening. All three are still here. All three shipped features in their first two weeks. Not one required a 90-day performance review to figure out if they could actually code.

Stop asking candidates to explain. Start watching them execute.

Zubin leverages his engineering background and a decade of B2B SaaS experience to drive GTM as Co-founder of Utkrusht. He previously founded Zaminu, serving 25+ B2B clients across the US, Europe, and India.
