Afsal
afsal@policyinsure.com
Partial evidence · Needs follow-up
AI CODING ASSESSMENT · 4 SESSIONS · 11 COMMITS

Artifact-oriented burst builder

The candidate ships in bursts, grounds agents in references, and often codifies and verifies outputs, though the captured evidence is partial and decision outcomes are unknown.

How much did they ship?
21,659 lines
11 commits across 1 repo
What kind of builder are they?
Burst builder
75% of work blocks were short bursts
Longest single session?
6h 0m
2 deep sessions (>=1h focus)
How often do they plan first?
0% in plan mode
4 main sessions analyzed
How many agents at once?
1 at once
peak concurrent main sessions · 60 subagent runs
Longest streak?
3 days straight
consecutive days shipping
How often do they change course?
2%
2 redirects across 64 traced sessions
How long are their prompts?
33 words on average
16 prompts a session
How much time did they put in?
7h 5m
4 main sessions
What kind of work is it?
Unclassified commits
commits don't follow a feat/fix naming convention
How they work with AI
Steering
The candidate shows some deliberate steering through reference grounding and reusable artifacts, while the low redirect rate suggests they generally let the agent continue once oriented. There is little evidence of explicit upfront plan mode, so this reads as reference-led and artifact-led steering rather than formal planning.
Solid
partial evidence
Verification
The candidate shows recurring verification behavior: transcripts exhibit verify-before-ship, product QA, and second-opinion patterns, and the repository contains a notable test footprint. Because the evidence tier is partial and decision outcomes are unknown, this supports a solid verification habit but not a definitive claim about quality outcomes.
Solid
partial evidence
Debugging & recovery
There is some debugging signal through 4 recorded debugging decisions, command-heavy tool use, and occasional tight-iteration and before/after-proof patterns. The debugging sample is smaller than architecture and scope decisions, so the evidence supports an emerging read rather than a stronger characterization.
Emerging
partial evidence
Decision-making
The candidate makes many recorded architecture and scope decisions and appears to use references and second opinions to support some choices. However, all 75 decision outcomes are unknown, so the data shows decision activity and some decision supports, not whether those decisions worked well.
Solid
partial evidence
Delivery
The candidate delivered a substantial amount of linked work in a short window, combining burst-style blocks with two deep sessions and a 3-day shipping streak. The 60 subagent runs indicate internal AI fan-out supporting the work, while the candidate’s own peak was 1 concurrent main session.
Solid
partial evidence
Signature patterns
Codifies lessons into artifacts
turns learnings into reusable skills/docs/config
seen in 4 sessions
Grounds the agent in a reference
anchors work to a concrete example/benchmark/reference product
seen in 3 sessions
Verifies before shipping
checks/tests output before accepting
seen in 3 sessions
Seeks a second opinion
brings in another model or an explicit review pass
seen in 2 sessions
Tests their own product
runs and QAs the thing end-to-end
seen in 2 sessions
Decision patterns
75 decisions across architecture, scope, quality, debugging
architecture · 35
scope · 26
quality · 10
debugging · 4
outcomes recorded: 0 positive, 0 negative, 75 unknown
None of the 75 decisions has a recorded outcome, so outcome quality needs interview follow-up.
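The category mix above can be restated as shares of the total, which makes the weighting toward architecture and scope easier to read. The sketch below is only arithmetic on the counts reported here; the variable names are illustrative and are not fields of the underlying payload.

```python
# Decision counts per category, copied from the report above.
# The dict and variable names are illustrative, not payload field names.
decision_counts = {"architecture": 35, "scope": 26, "quality": 10, "debugging": 4}

# Total number of recorded decisions.
total = sum(decision_counts.values())

# Share of each category as a percentage of all decisions, to one decimal place.
shares = {name: round(100 * count / total, 1) for name, count in decision_counts.items()}

# Outcomes as reported: none positive, none negative, all unknown.
outcomes = {"positive": 0, "negative": 0, "unknown": 75}
known_outcomes = outcomes["positive"] + outcomes["negative"]

for name, pct in shares.items():
    print(f"{name}: {decision_counts[name]} of {total} ({pct}%)")
print(f"decisions with a known outcome: {known_outcomes} of {total}")
```

Architecture and scope together account for 61 of the 75 decisions, while debugging accounts for 4, which is why the debugging read rests on a much smaller sample.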

Strengths

Substantial linked delivery in the captured project
21,659 lines shipped in 11 commits across 1 repo, with 11 linked commits and join_confidence 0.8931.
Turns learnings into reusable artifacts
Signature pattern appears in 4 sessions: Codifies lessons into artifacts.
Uses multiple verification loops
Signature patterns include Verifies before shipping in 3 sessions, Tests their own product in 2 sessions, and Seeks a second opinion in 2 sessions.
Anchors agent work with references and low observed rework
Grounds the agent in a reference in 3 sessions, with 2 redirects across 64 traced sessions.
Maintains a visible test footprint
Quality dimensions show test_file_ratio 0.3218 across 174 files; commits include test-related changes such as 48 test files in commit 5e2504 and 2 test files in commit a346d6.
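As a rough sanity check on the test footprint, the reported ratio can be turned back into an approximate file count. This sketch assumes test_file_ratio means test files divided by total files, which the report does not state explicitly; the variable names are illustrative.

```python
# Values copied from the report; the interpretation of the ratio
# (test files / total files) is an assumption, not stated in the report.
total_files = 174
test_file_ratio = 0.3218

# Approximate number of test files implied by the ratio.
implied_test_files = round(total_files * test_file_ratio)

# Test files named in the two commits cited above (listed as examples only,
# so this is not necessarily the full set).
cited_test_files = 48 + 2

print(f"implied test files: {implied_test_files} of {total_files}")
print(f"test files in the two cited commits: {cited_test_files}")
```

Under that assumption the ratio corresponds to about 56 test files, of which 50 are accounted for by the two commits cited.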

Risks & growth areas

Evidence base is narrow
4 main sessions in 1 repo, with all per-axis confidence marked partial.
Decision quality cannot be inferred from outcomes
75 decisions recorded with 0 positive, 0 negative, and 75 unknown outcomes.
Formal upfront planning is not visible in the captured sessions
0% plan mode across 4 main sessions, despite 35 architecture decisions and 26 scope decisions.
Change intent may be harder to assess from commit metadata alone
All commits are unclassified: 0 feature commits and 0 fix commits. The metric card notes that commits do not follow a feat/fix naming convention.
Suggested interview questions
01 The candidate had 0% plan mode across 4 main sessions but showed reference grounding in 3 sessions; how do they normally frame an AI agent before implementation when no formal plan mode is used?
02 The candidate had only 2 redirects across 64 traced sessions; can they describe a case where an agent went in the wrong direction and how they intervened?
03 The candidate showed verify-before-shipping in 3 sessions and product QA in 2 sessions; what exact checks, tests, or manual review steps did they run before considering the work done?
04 The candidate used second opinions in 2 sessions; when do they decide to bring in another model or review pass, and how do they resolve conflicting recommendations?
05 The candidate recorded 35 architecture and 26 scope decisions, but all 75 decision outcomes are unknown; which architectural or scope decision from this project would they defend, and what tradeoff did they make?
06 The candidate shipped 21,659 lines across 11 commits with a 32.18% test-file ratio; what parts of the shipped work were generated by agents versus manually reviewed or rewritten by them?
07 The candidate used 60 subagent runs while maintaining a peak of 1 concurrent main session; how did they coordinate, merge, and verify work produced through subagent fan-out?
08 The candidate’s commits do not follow a feat/fix naming convention; how do they usually structure commits and communicate change intent to reviewers?
Evidence & method
4 sessions · 11 commits · 11 linked · 0.8931 join confidence · 75 decisions

Generated by VibeHire from a privacy-safe payload. Raw source, diffs, and full transcripts never left the candidate's machine — only redacted behavioral metadata and scores. Bands are provisional and uncalibrated: decision support, not a hire/no-hire verdict.