Quora: How many employed data scientists are able to solve problems from online competitions such as Kaggle’s?

by Jeong-Yoon Lee

Before reading further, please watch this video (only 1m 47s long), which shows how an average man compares to a football player in the 40-yard dash.

When I talk to data science professionals, especially senior ones with more experience, I often encounter optimism about their own competitiveness: “I know what I am doing and can build good models at work, maybe better than others.”

Online competitions provide objective measures for at least a few criteria, such as prediction accuracy, time to build a good model, reproducibility, etc.

For most data scientists, including myself, working on competitions is a reality check and a humbling experience:

Long tenure doesn’t guarantee superior performance. As summarized by Dr. Ericsson in his bestselling book, Peak, the doctor, teacher, or driver with twenty years of experience is likely to be worse than the one with only five, because performance deteriorates gradually over years of routine, automated work in the absence of deliberate efforts to improve.

Going back to the original question, employed data scientists “without learning from competitions” are likely to do very poorly in competitions.

The learning doesn’t need to come from participating in competitions. Out of 1MM+ Kaggle users, only 65K+ participate in competitions, while others learn cutting-edge algorithms and best practices from tutorials, from solutions shared by others, from working on open data sets, etc.

Whenever I talk to someone who discounts the benefits of competitions without any competition experience, and yet is very confident in her/his modeling capability, I can’t help thinking about the average-man-vs-football-player video above, and just smile. :)

Competing over a 0.1% improvement in accuracy? Criticizing that is like criticizing Olympic 100m sprinters for competing over 0.1 sec. That’s not for most of us. Don’t worry about it until you get close. We have a much longer way to go.

Footnotes

  1. Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction

  2. Rank