A March Madness Bracket disguised as a data science project
I’m trying to get hired as a data scientist after my military career concludes, and I’ve been told that building a portfolio of personal data projects can help. So I used data analytics to determine which teams in the tournament are a threat to make noise and which ones are seeded too high and doomed to exit early.
The full article I wrote up is linked, but here’s some of the analysis I wanted to share:
I always hear teams evaluated in one of three ways: by an “eye test”, by asking “who would win on a neutral field”, or by the teams’ body of work. So I made a new system that combines all three, using national polls for the eye test, multiple advanced metrics for the neutral-site test, and a mathematical formula based on quad wins/losses for the resume check.
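To give a rough idea of how three separate scores can be rolled into one number, here is a minimal sketch. It is not the exact formula from the article: the `TeamScores` fields, the min-max rescaling, and the equal default weights are all assumptions made for illustration, and the three inputs are assumed to be already computed so that higher is better.

```python
from dataclasses import dataclass


@dataclass
class TeamScores:
    name: str
    eye_test: float      # assumed: derived from national polls, higher = better
    neutral_site: float  # assumed: derived from advanced metrics, higher = better
    resume: float        # assumed: derived from quad wins/losses, higher = better


def min_max(values):
    """Rescale a list of numbers to the 0..1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every team has the same value, so none gets an edge.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(teams, weights=(1.0, 1.0, 1.0)):
    """Return {team name: weighted average of the three rescaled scores}."""
    eye = min_max([t.eye_test for t in teams])
    neutral = min_max([t.neutral_site for t in teams])
    resume = min_max([t.resume for t in teams])
    w_eye, w_neutral, w_resume = weights
    total = w_eye + w_neutral + w_resume
    return {
        t.name: (w_eye * e + w_neutral * n + w_resume * r) / total
        for t, e, n, r in zip(teams, eye, neutral, resume)
    }
```

Rescaling each component first keeps one input from dominating just because it is measured on a bigger scale. The weights are the knob to turn if, say, the resume check should count for more than the polls.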
Once I scored each team, I ran through the bracket using historical data from mcubed.net on seed vs seed matchups and basic probability tables to determine where upsets should occur, then used my scoring system to decide where in the bracket to place those upsets.
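The two-step idea (history says how many upsets to expect, the scores say where to put them) can be sketched roughly like this. This is a simplified illustration rather than the exact procedure from the article: the function name, the tuple layout, and the rule of rounding the expected count to the nearest whole number are assumptions, and the historical rate is passed in as a parameter rather than taken from mcubed.net.

```python
import math


def place_upsets(games, underdog_win_rate):
    """Pick winners for a set of same-seed-matchup games.

    games: list of (favorite, underdog, favorite_score, underdog_score)
    underdog_win_rate: historical share of these matchups won by the
        lower seed (supplied by the caller; no real data is baked in)
    """
    # Step 1: history decides HOW MANY upsets to call.
    expected = len(games) * underdog_win_rate
    n_upsets = math.floor(expected + 0.5)  # round half up (assumption)

    # Step 2: the scoring system decides WHERE they go. Rank games by how
    # close (or ahead) the underdog is relative to the favorite.
    ranked = sorted(games, key=lambda g: g[3] - g[2], reverse=True)
    upset_picks = {g[1] for g in ranked[:n_upsets]}

    # Return the predicted winner of each game, in the original order.
    return [g[1] if g[1] in upset_picks else g[0] for g in games]
```

For example, with four games of the same seed matchup and a placeholder rate of 0.35 (a made-up number, not a real historical figure), the expected count is 1.4, so one upset gets called and it lands on the game where the underdog's score is closest to the favorite's.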
The results were fascinating (to me at least), especially where the two models worked against each other.