Evaluating Performance and Making Decisions
A whiteboard challenge is designed to reveal something important, but only if you know how to extract and interpret what you see. Bridging the gap between observing a challenge and making a hiring decision requires careful, structured evaluation.
Immediately After the Challenge
Take Detailed Notes (Within 15 minutes)
While the experience is fresh, capture:
Objective Observations:
- Time spent on each phase (problem analysis, research, ideation, solution)
- Key decisions made and why they were made
- Questions asked during clarification
- How they handled being challenged or asked follow-up questions
- Moments of confidence or uncertainty
Specific Examples:
- "When I asked about mobile vs desktop, they said 'I assumed desktop because...' showing they made explicit assumptions"
- "When I challenged their navigation approach, they said 'Let me reconsider' and actually changed their direction"
- "They spent 12 minutes on research but only 4 on ideation, suggesting they value thorough analysis"
Avoid Generalizations:
- Not: "Good communication"
- Instead: "Explained their user research rationale clearly, used specific examples to support their reasoning"
What NOT to Do
- Don't make your final decision immediately
- Don't rely only on how you "felt" about them
- Don't compare to other candidates yet
- Don't score before you've had time to process
Scoring and Assessment (24 hours later)
Use Your Rubric
Return to your assessment framework (problem comprehension, research, ideation, solution quality, communication) and score each dimension.
Scoring Scale (Example):
- 5: Exceptional. Clear strength in this dimension.
- 4: Strong. Well above what we need.
- 3: Adequate. Meets our needs.
- 2: Weak. Below expectations.
- 1: Poor. Significant gap.
Score Each Dimension Independently
Look at your notes and score problem comprehension first, without thinking about solution quality. This prevents "halo effect" (being influenced by overall impression).
Example Scoring Process:
Problem Comprehension:
- Notes: "Asked about user demographics, business goals, and competitive landscape. Wrote down key constraints before starting"
- Spent 5 minutes on analysis, which is right for this challenge
- Had a clear hypothesis before moving forward
- Score: 4 (Strong—asked good questions, thorough understanding, but could have dug deeper on constraints)
Research & Discovery:
- Notes: "Identified 3 distinct user scenarios, talked about their needs and pain points"
- Spent 4 minutes on research
- Considered but didn't deeply explore competitive solutions
- Score: 3 (Adequate—good user focus, but could have researched competitive landscape more)
Ideation & Exploration:
- Notes: "Sketched 2 distinct concepts, discussed trade-offs, chose one with clear reasoning"
- Spent 7 minutes on this phase
- Good rationale for chosen direction
- Score: 4 (Strong—multiple concepts, clear thinking)
Solution Quality:
- Notes: "Solution was comprehensive, addressed core problem, thought through user flow and edge cases"
- Good level of detail without over-specifying
- Clear rationale for design decisions
- Score: 4 (Strong—solid solution, well-reasoned)
Communication:
- Notes: "Clear explanation of thinking, easy to follow, used appropriate design language"
- Thought out loud effectively
- Responded well when I challenged them
- Score: 4 (Strong—articulate and confident)
Calculate Weighted Score
If your weights are (they must sum to 100%):
- Problem Comprehension (20%): 4 × 0.20 = 0.80
- Research & Discovery (15%): 3 × 0.15 = 0.45
- Ideation & Exploration (15%): 4 × 0.15 = 0.60
- Solution Quality (25%): 4 × 0.25 = 1.00
- Communication (25%): 4 × 0.25 = 1.00
- Total Score: 3.85/5
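This calculation is easy to script if you track scores in a spreadsheet or tool. A minimal sketch in Python; the dimension keys are mine, and the weights (which must sum to 1.0) are illustrative, not fixed rules:

```python
# Sketch of the weighted-score calculation for a five-dimension rubric.
# Weights are illustrative; adjust to your own rubric, keeping the sum at 1.0.
WEIGHTS = {
    "problem_comprehension": 0.20,
    "research_discovery": 0.15,
    "ideation_exploration": 0.15,
    "solution_quality": 0.25,
    "communication": 0.25,
}

def weighted_score(scores: dict) -> float:
    """Combine per-dimension scores (1-5) into one weighted total."""
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("rubric weights must sum to 1.0")
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

example = {
    "problem_comprehension": 4,
    "research_discovery": 3,
    "ideation_exploration": 4,
    "solution_quality": 4,
    "communication": 4,
}
print(f"Total: {weighted_score(example):.2f}/5")
```

The weight check catches the most common rubric mistake: dimension weights that quietly sum to something other than 100%, which silently deflates or inflates every candidate's total.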
Interpreting Your Score
What Each Score Range Means
4.5-5.0: Exceptional
- Clear strength across almost all dimensions
- Demonstrates mastery of fundamental design thinking
- Would likely succeed in the role
4.0-4.4: Strong
- Solid across all dimensions
- Has necessary fundamentals and would grow in the role
- This is your target hire for most positions
3.5-3.9: Borderline
- Adequate in some areas, weak in others
- Could succeed with strong management and support
- Requires careful evaluation—don't hire just because they're not bad
3.0-3.4: Weak
- Below expectations on key dimensions
- Likely to struggle in the role without significant coaching
- Unless there are strong mitigating factors from other interviews, recommend not hiring
Below 3.0: Poor
- Significant gaps across multiple dimensions
- Strong indicators they're not ready for this role
- Clear recommendation: don't hire
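If you encode these bands in a script, use inclusive lower bounds so that scores falling between the printed ranges (a 3.45, say, between 3.4 and 3.5) still land in a band. A sketch; the function name and band labels mirror the ranges above:

```python
def interpret(score: float) -> str:
    """Map a weighted total (1-5) to an interpretation band.

    Inclusive lower bounds close the gaps between the printed
    ranges (e.g. a 4.45 counts as Strong, not unclassified).
    """
    if score >= 4.5:
        return "Exceptional"
    if score >= 4.0:
        return "Strong"
    if score >= 3.5:
        return "Borderline"
    if score >= 3.0:
        return "Weak"
    return "Poor"
```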
The Panel Discussion (If Using Multiple Evaluators)
Before Discussing, Score Independently
Each panelist should score before discussing. This prevents groupthink and ensures diverse perspectives are heard.
Discussing Significant Differences
If one panelist scored 4.5 and another 3.0, discuss why:
- "I scored their solution quality high because it addressed the core problem thoroughly"
- "I scored it lower because they didn't consider the mobile experience"
This discussion usually reveals important nuances and leads to better consensus.
Reaching Consensus (Or Documenting Disagreement)
- Aim for consensus if possible
- If disagreement persists, you can note it ("Scored 4.0 and 3.5; noted area of disagreement: depth of competitive research")
- Use the average of panel scores if you can't reach consensus
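The aggregation step can be sketched as a small helper. The one-point discussion threshold here is my assumption, not a rule from the text; pick whatever spread your panel considers worth talking through:

```python
from statistics import mean

def panel_summary(scores: list, threshold: float = 1.0) -> dict:
    """Average panel scores and flag wide spreads for discussion.

    `threshold` (assumed here to be 1.0 point) is the spread at
    which a score difference warrants explicit panel discussion.
    """
    spread = max(scores) - min(scores)
    return {
        "average": round(mean(scores), 2),
        "spread": spread,
        "needs_discussion": spread >= threshold,
    }

print(panel_summary([4.5, 3.0]))
```

With the example panel above (4.5 vs. 3.0), the 1.5-point spread is flagged, which matches the advice to discuss significant differences before settling on an average.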
Other Interview Data
Whiteboard Challenge + Behavioral Interview
How does the challenge score relate to their behavioral interview performance?
- If strong challenge + strong behavioral interview: Hire with confidence
- If strong challenge + weak behavioral interview: They can do the work but may struggle with collaboration or communication on the job
- If weak challenge + strong behavioral interview: They communicate well but might struggle with core design thinking
- If weak on both: Strong indicator they're not ready
Whiteboard Challenge + Portfolio Review
What does the challenge reveal that the portfolio doesn't?
- Strong portfolio + strong challenge: Excellent candidate
- Strong portfolio + weak challenge: They may have had great teams/mentorship but might not deliver individually
- Weak portfolio + strong challenge: High potential but less experience; could be great hire for right role
- Weak on both: Unlikely to succeed
Making the Final Decision
Decision Framework
Consider all data, not just challenge score:
- Challenge Assessment: 70% weight (most predictive of role success)
- Behavioral Interview: 15% weight (demonstrates collaboration and culture fit)
- Portfolio: 10% weight (shows track record but less predictive than challenge)
- References/Others: 5% weight (validation and context)
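Blending these sources is the same arithmetic as the rubric itself. A sketch; the source keys are shortened, and the weights are the document's suggested starting point rather than fixed rules:

```python
# Suggested starting weights for combining interview data sources.
SOURCE_WEIGHTS = {
    "challenge": 0.70,
    "behavioral": 0.15,
    "portfolio": 0.10,
    "references": 0.05,
}

def overall_score(source_scores: dict) -> float:
    """Blend per-source scores (1-5) using the weights above."""
    return sum(source_scores[s] * w for s, w in SOURCE_WEIGHTS.items())

print(f"{overall_score({'challenge': 4.0, 'behavioral': 4.0, 'portfolio': 3.0, 'references': 3.0}):.2f}")
```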
Clear Cases
Clear Hire (Score 4.0+):
- Strong across multiple evaluation methods
- Clear readiness for the role
- Low risk hire
- Decision: Hire
Clear Don't Hire (Score below 3.0):
- Significant gaps in key areas
- Weak across multiple evaluation methods
- Unlikely to succeed without substantial support
- Decision: Don't Hire
Borderline Cases (Score 3.0-3.9)
Ask These Questions:
- Is the gap due to nervousness or a real skill gap? (Assume a real skill gap: whiteboard challenges are designed to assess performance under pressure)
- Can we mitigate the weakness through team placement or mentorship? (Maybe—but risky)
- Are there other strong signals from interviews/references? (If yes, this helps tip the decision)
- Is this person in a high-visibility role where weakness could be problematic? (Higher risk if yes)
- How critical is the weak skill to success in this role?
Decision Path for Borderline Cases:
- If weakness is in a critical area → Don't hire
- If weakness is in a secondary area + strong signals elsewhere → Could hire with clear understanding of gap
- If uncertain → Don't hire (hire the person who's clearly good, not the person you hope will be good)
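Writing the decision path down as an explicit rule helps keep borderline calls consistent across candidates. A sketch; the boolean inputs are my framing of the questions above:

```python
def borderline_decision(weakness_is_critical: bool,
                        strong_signals_elsewhere: bool) -> str:
    """Encode the borderline decision path as an explicit rule."""
    if weakness_is_critical:
        # Weakness in a critical area: don't hire.
        return "Don't hire"
    if strong_signals_elsewhere:
        # Secondary weakness offset by strong signals: hire,
        # with the gap documented and understood.
        return "Hire with documented gap"
    # Uncertain: default to not hiring.
    return "Don't hire"
```

Note that the default branch is "Don't hire": uncertainty resolves against the hire, matching the advice to hire the person who's clearly good rather than the one you hope will be.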
Documenting Your Decision
Document:
- Candidate name and date
- Challenge presented
- Scores on each dimension
- Weighted total score
- Key strengths and growth areas
- How this score compared to other evaluators (if panel)
- How this relates to other interview data
- Final recommendation and rationale
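A fixed schema keeps these records consistent across candidates and evaluators. A hypothetical sketch; the field names are mine, mapped from the list above:

```python
from dataclasses import dataclass

@dataclass
class ChallengeRecord:
    """Hypothetical schema for documenting a whiteboard-challenge decision."""
    candidate: str
    date: str
    challenge: str            # which challenge was presented
    dimension_scores: dict    # e.g. {"communication": 4}
    weighted_total: float
    strengths: list
    growth_areas: list
    panel_notes: str = ""     # how scores compared across evaluators
    other_data: str = ""      # relation to behavioral/portfolio signals
    recommendation: str = ""
    rationale: str = ""
```

Because every record has the same fields, quarterly reviews (comparing predicted to actual performance) become a matter of reading structured data rather than re-deciphering free-form notes.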
Why Documentation Matters:
- Defensibility: Shows your decision-making was structured and fair
- Learning: Lets you track which indicators predict actual on-the-job success
- Continuity: If others need to understand your decision, documentation helps
- Improvement: Patterns in hiring data reveal what you should weight more/less
After the Hire (Or Rejection)
For Hired Candidates
- Share key feedback from the challenge (strengths and growth areas)
- Use this to inform their onboarding and initial projects
- Track their actual performance and compare to challenge predictions
For Candidates You Didn't Hire
- If they ask, you can share general feedback
- Focus on growth areas: "We noticed you could strengthen your research phase. Here are some resources..."
- Leave the door open: "These skills develop with practice. We'd love to see you again in a year"
Continuous Improvement
- Every quarter, review hiring decisions against performance
- Did people who scored 4.5 actually perform well? Did they exceed expectations?
- Did people who scored 3.0 struggle?
- Adjust weights in your rubric based on what predicts success
The best hiring decisions come from structured evaluation combined with multiple data points. Whiteboard challenges are powerful because they reveal core capability. But they're most powerful when you know how to interpret what you see and make decisions based on evidence, not intuition.