I’m Trey, the Head of Engineering at Clipboard Health, and I want you to use an LLM to apply to our Software Engineering roles.
At Clipboard Health, we’ve long been believers in hiring using assessments that are as close as possible to the actual work you would do on the job (we call these “Case Studies”, sometimes folks call them “take-home tests”). These show us how you actually think and work (instead of where you’ve worked), and are far more predictive of how you’ll perform in our fast-paced, customer-obsessed, async/remote environment than an ordinary interview.
We’re in the process of changing our evaluation process for hiring new software engineers, and we’re rolling out new Case Studies. For a time, we avoided engineering Case Studies in favor of pure synchronous interviews. This was driven by two big fears from our team:
1. A fear of driving away good candidates who don’t want to do an async problem that takes many hours
2. A fear that LLMs would make the exercise too easy to pass, costing us signal
However, by avoiding Case Studies, we created some new problems:
1. We interviewed in a style that was totally different from how we work as a fully async, remote company; we weren’t exposing candidates to our culture or evaluating them under the work conditions they can actually expect on the job
2. We made it much harder to have consistently well-calibrated evaluations of candidates’ work; a large portion of an interview’s outcome reflected the interviewer rather than the candidate
3. We spent a lot of engineering time interviewing candidates who weren’t at the right skill level for our role
4. To avoid #3, we ended up using proxy thinking (Where did they work before? Where did they go to school?) as a significant factor in deciding who to interview
To solve those problems, we re-introduced an asynchronous programming problem, but this time using a vendor who promised both to increase submission rates by time-limiting the exercise and to detect and prevent LLM usage.
This has worked pretty well, but we think we can do better. Specifically, we are trying to solve a few more problems:
1. We’d like to evaluate the actual work submitted for clues that the engineer is aligned with our culture rather than trying to save this for some kind of “culture fit interview” – after all, culture is what you do every day, not how you talk about it
2. We want to rely even more on asynchronous work samples than we do today, for the reasons listed above
3. We don’t care if engineers use LLMs to solve our asynchronous problems
That 3rd point may be surprising; it took some hard swallowing for our team to accept. But what we realized is that LLMs are part of a modern engineer’s toolkit. I polled our engineering team and over half use an LLM-enhanced IDE every day (and we’ve had a few more converts since then)!
LLMs have become an important enough part of our everyday toolkit that we rolled out an allowance for the company to pay for the LLM-based tools of each engineer’s choosing (Cursor + Claude 3.5 is the current most popular). We also just open-sourced a collection of prompts that our team uses to help our LLMs generate code that adheres to our best practices.
LLMs, at least as they exist today, are not a replacement for software engineers. But they are an extremely valuable tool that engineers can use to accelerate their work, in many ways no different from an IDE or a linter. If that’s the case, then our job in evaluating software engineers is to figure out who is a good software engineer even when they’re using an LLM. If our Case Study can be trivially solved with an LLM, it is plainly not a good Case Study for evaluating whether an engineer will be a good fit on our team. We want to evaluate engineers in an environment as close to their real-world work environment as possible, which means an LLM may be a part of it.
In our new initial Case Study, we give candidates 90 minutes (to limit the investment you’re making before you meet us) to solve a real-world programming problem in a toy version of our application using whatever tools they normally would.
So go ahead, use an LLM during our evaluation process. We want great engineers, no matter what tools they use.