June 25, 2025

Human-Scored Coding Assessments: How Woven’s Engineers Deliver 24-Hour, AI-Proof Results

By Wes Winham

Hiring platforms love to brag about “automated scoring,” yet most still boil a candidate’s work down to a pass/fail badge or a single number. That’s fine if you’re screening interns, but fatal when you’re hiring a senior engineer who will architect critical systems.

At Woven, every assessment is double-blind reviewed by two certified engineers—not algorithms—within 24 hours. Our scorers benchmark each submission against industry pay bands and your own bar, surface line-by-line feedback, and flag any AI-generated code or plagiarism along the way.

The result: a nuanced strengths-and-gaps report your team can trust, plus the speed today’s hiring market demands.

In this post, we’ll pull back the curtain on how we recruit, train, and certify our scoring panel—and why that human rigor is the secret to hiring engineers who actually thrive on the job.

Scorer Background and Selection:

Our scorers have backgrounds in computer science, software development, or software engineering. They undergo a rigorous selection process, including a work simulation in which candidates score dummy assessments; approximately 1 in 5 candidates clear this initial bar. Successful candidates then complete a multi-week onboarding program, working through training modules and scoring live assessments as a redundant third scorer until they earn certification.

Certification Process:

Scorers who achieve a sufficiently low error rate (currently 6% or lower for initial certification, tightening to 5% or lower in subsequent months) are certified to score in production. Day-to-day, scorers receive detailed reports on their mistakes along with improvement notes from the QA team, and Woven management reviews each scorer's performance monthly. Scorers whose error rates exceed the threshold receive coaching.

Reliability and Consistency:

Scorers operate in a double-blind environment, grading specific true/false rubric items within a scenario. Although the two scores should align, occasional discrepancies occur, and a reconciliation scorer (a third scorer) decides the final score. Reconciliation errors are tracked and reported monthly for transparency. Because the two scores are produced independently, the resulting error rate is approximately 1%.
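The error-reduction effect of independent double-scoring can be illustrated with a toy probability model. The rates below are assumptions chosen for illustration, not Woven's internal figures: an undetected error requires both scorers to make the same mistake, since any disagreement is escalated to the reconciliation scorer.

```python
# Toy model: two scorers independently grade each true/false rubric item.
# A wrong final score requires either (a) both scorers making the same
# mistake, so no discrepancy is flagged, or (b) a flagged discrepancy that
# the reconciliation scorer then resolves incorrectly.

def combined_error_rate(p1: float, p2: float, p_recon: float) -> float:
    """Probability a rubric item ends up scored wrong.

    p1, p2:  per-scorer error rates (assumed independent)
    p_recon: reconciliation scorer's error rate on disputed items
    """
    both_wrong = p1 * p2                       # same error slips through
    disagree = p1 * (1 - p2) + (1 - p1) * p2   # discrepancy -> 3rd scorer
    return both_wrong + disagree * p_recon

# Example: two certified scorers at the 5% bar, reconciler also at 5%.
rate = combined_error_rate(0.05, 0.05, 0.05)
print(f"{rate:.4f}")  # well under 1% (about 0.7%)
```

Under these assumed rates, two 5%-error scorers plus reconciliation yield a combined error rate under 1%, consistent with the figure above.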

ChatGPT & AI Detection:

Scorers review the video playback of each candidate submission and flag cheating, AI usage, or plagiarism. (Candidates are informed of the code of conduct before starting their assessment.) Scorers evaluate candidate behavior and final responses against guidelines our team has defined and tested. In other words, if a candidate tries to use outside help (like AI or copied code), Woven will know, protecting the integrity of your hiring.

Learn more about the AI & Cheating Detection Process >>

The Results? A Detailed Analysis of Every Candidate Benchmarked To Your Hiring Bar:

Want To See It In Action?