Hit 95% accuracy on my hiring algorithm and still got burned by bias

I was training a resume-screening model for a mid-size company in Austin. Got it to 95% accuracy on the test set. Felt great until an auditor ran it through a fairness toolkit and found it was rejecting women for warehouse roles at twice the rate it rejected men. The overall accuracy number just hid the problem. Has anyone else had a high-performing model fail a fairness check in a way you didn't expect?

3 comments


martin.riley 1mo ago

Accuracy numbers are a trap if you don't dig into what they're actually measuring. 95% doesn't mean much when the model is basically learning to reject people based on zip codes or names that correlate with gender. The real problem is that most folks stop at overall accuracy and never check for these slices. You gotta run the data through a fairness tool from day one, not as an afterthought. That's the only way to catch this stuff before it's live and causing damage.
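For anyone who wants to see what checking the slices looks like, here is a minimal sketch in plain Python. The data is made up for illustration, and the function names (`selection_rates`, `disparate_impact_ratio`) are hypothetical helpers, not part of any fairness toolkit. It assumes a binary prediction where 1 means "advance the applicant" and that a group label is available for auditing.

```python
from collections import defaultdict

def accuracy(labels, predictions):
    """Overall fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)

def selection_rates(groups, predictions):
    """Fraction of positive (advance) predictions within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for g, p in zip(groups, predictions):
        totals[g] += 1
        selected[g] += int(p == 1)
    return {g: selected[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest one."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else float("nan")

# Toy data (invented): 10 men, 10 women. The labels come from skewed
# historical decisions, so matching them closely still gives skewed output.
groups      = ["m"] * 10 + ["f"] * 10
labels      = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7
predictions = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

rates = selection_rates(groups, predictions)
print("accuracy:", accuracy(labels, predictions))
print("selection rate by group:", rates)
print("disparate impact ratio:", disparate_impact_ratio(rates))
```

On this toy data the model scores 0.95 overall while selecting women at half the rate of men (0.3 vs 0.6). A ratio below 0.8 is a commonly cited rule-of-thumb threshold (the "four-fifths rule") for taking a closer look, not proof of bias on its own.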

elliotm57 1mo ago

Throw a fairness tool on that thing and watch the accuracy number tank while the model suddenly stops being racist. Feels like 95% is just code for "we didn't look hard enough," right?


ward.jamie 1mo ago

That zip code thing Martin mentioned is real, I've seen it too. A buddy of mine was building a model for a retail chain in Atlanta and it learned to flag certain zip codes as "high risk." Turned out those zip codes were mostly Black neighborhoods. The model was 94% accurate but it was basically redlining applicants. Here's what I keep coming back to though: when you found that gender bias in the warehouse roles, did you trace it back to a specific feature, or was it something more hidden, like the way the job descriptions were written? I've noticed a lot of these bias issues sneak in through the training data itself, not just the model. If the warehouse historically hired mostly men, the model just learns that pattern and then you're stuck trying to untangle it. Did you ever figure out whether it was the data or the model architecture that caused the problem?
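A rough way to start separating "the data" from "the model" is to look at the training set before any model is involved. The sketch below is an illustration under stated assumptions: the rows, the field names (`zip`, `gender`, `hired`), and both helper functions are hypothetical. It checks two things: whether the historical labels are already skewed by group, and whether a feature like zip code can guess the group much better than chance, which would make it a proxy even if the gender column is dropped.

```python
from collections import Counter, defaultdict

def positive_rate_by_group(rows, group_key, outcome_key):
    """Share of rows with outcome == 1 within each group."""
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row[group_key]] += 1
        positives[row[group_key]] += int(row[outcome_key] == 1)
    return {g: positives[g] / totals[g] for g in totals}

def proxy_strength(rows, feature_key, group_key):
    """Return (baseline, with_feature) accuracy for guessing the group.

    baseline: always guess the overall majority group.
    with_feature: guess the majority group within each feature value.
    A large gap suggests the feature acts as a proxy for the group.
    """
    by_value = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        by_value[row[feature_key]][row[group_key]] += 1
        overall[row[group_key]] += 1
    n = sum(overall.values())
    baseline = max(overall.values()) / n
    with_feature = sum(max(c.values()) for c in by_value.values()) / n
    return baseline, with_feature

# Invented historical hiring records, for illustration only.
history = [
    {"zip": "A", "gender": "m", "hired": 1},
    {"zip": "A", "gender": "m", "hired": 1},
    {"zip": "A", "gender": "m", "hired": 1},
    {"zip": "A", "gender": "m", "hired": 0},
    {"zip": "A", "gender": "f", "hired": 0},
    {"zip": "B", "gender": "m", "hired": 1},
    {"zip": "B", "gender": "f", "hired": 1},
    {"zip": "B", "gender": "f", "hired": 0},
    {"zip": "B", "gender": "f", "hired": 0},
    {"zip": "B", "gender": "f", "hired": 0},
]

label_rates = positive_rate_by_group(history, "gender", "hired")
baseline, with_zip = proxy_strength(history, "zip", "gender")
print("historical hire rate by gender:", label_rates)
print("guessing gender: baseline", baseline, "vs using zip", with_zip)
```

If the historical hire rates already differ sharply by group, the pattern is sitting in the labels and any model that fits them well will likely reproduce it. If zip code guesses gender far above the baseline, removing the gender column alone probably won't help. Neither check rules out the model making things worse, so comparing these label rates against the model's own per-group prediction rates is a sensible next step.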