This year I gave my “Winning Data Science Competitions” talk on several occasions: at Microsoft, KSEA-SWC 2017, USC Applied Statistics Club, Spark SC, and Whisper.

I am grateful for all these opportunities to share what I enjoy with the data science community.

I truly believe that working on competitions regularly can make us better data scientists. I hope my talk and slides help other data scientists.

My talk is outlined as follows:

  • Why compete
    • For fun
    • For experience
    • For learning
    • For networking
  • Data science competition intro
    • Competitions
    • Structure
    • Kaggle
  • Misconceptions of data science competitions
    • No ETL?
    • No EDA?
    • Not worth it?
    • Not for production?
  • Best practices
    • Feature engineering
    • Diverse algorithms
    • Cross-validation
    • Ensemble
    • Collaboration
  • Personal tips
  • Additional resources

You can find the latest slides here.