Solution Sharing for the Talking Data Competition at Kaggle


At the Talking Data Competition at Kaggle, I teamed up with Luca, Hang, Mert, Erkut, and Damien.  We finished 37th out of 1,689 teams.  I originally posted this to the forum here:

Hi, I’d like to share the solution and framework of my team, ensemble.

The code is available on GitLab:

and the team’s internal leaderboard (LB) is available here:

We joined the competition late, and had just enough time to build and run the end-to-end framework without much feature engineering. So feature-wise, there is nothing fancy, but I hope that you can find the framework itself helpful. 🙂

As you can see, it uses Makefiles to chain feature generation, single-model training, and ensemble training into one pipeline. The main benefits of a Makefile-based framework are:

  • It’s language agnostic – You can use any language for any part of the pipeline. Although this specific version uses Python throughout, I used to mix R, Python, and other executables in the same pipeline.
  • It checks dependencies automatically – It checks whether the previous steps have been completed, and if not, it runs those steps first.
  • It’s modular – When working with others, it’s easy to split tasks across team members so that each one can focus on a different part of the pipeline.
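To make the dependency check more concrete, here is a minimal Python sketch of the rule Make applies: a target is rebuilt when it is missing or older than any of its prerequisites, and prerequisites are brought up to date first. This is only an illustration, not code from our repository. The file names (`train.csv`, `feature.txt`, `model.txt`, `ensemble.txt`) and the `concat_recipe` placeholder are hypothetical, and real Make does much more (pattern rules, variables, parallel jobs).

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# A recipe takes (target, prerequisites) and writes the target file.
Recipe = Callable[[Path, List[Path]], None]
# Each target maps to its prerequisites and the recipe that builds it.
Rules = Dict[Path, Tuple[List[Path], Recipe]]


def is_stale(target: Path, prerequisites: List[Path]) -> bool:
    """True if the target is missing or older than any prerequisite."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime_ns
    return any(p.stat().st_mtime_ns > target_mtime for p in prerequisites)


def build(target: Path, rules: Rules, log: List[str]) -> None:
    """Bring `target` up to date, building its prerequisites first."""
    if target not in rules:
        # No rule: this is a source file (e.g. raw data) and must already exist.
        if not target.exists():
            raise FileNotFoundError(f"no rule to make {target}")
        return
    prerequisites, recipe = rules[target]
    for prerequisite in prerequisites:
        build(prerequisite, rules, log)
    if is_stale(target, prerequisites):
        recipe(target, prerequisites)
        log.append(target.name)  # record which steps actually ran


def concat_recipe(target: Path, prerequisites: List[Path]) -> None:
    """Hypothetical placeholder for a real feature/model/ensemble script."""
    text = "".join(p.read_text() for p in prerequisites)
    target.write_text(f"{target.name}({text})")


def make_rules(workdir: Path) -> Rules:
    """Hypothetical three-step pipeline: features -> single model -> ensemble."""
    raw = workdir / "train.csv"
    feature = workdir / "feature.txt"
    model = workdir / "model.txt"
    ensemble = workdir / "ensemble.txt"
    return {
        feature: ([raw], concat_recipe),
        model: ([feature], concat_recipe),
        ensemble: ([model], concat_recipe),
    }
```

Calling `build(workdir / "ensemble.txt", make_rules(workdir), log)` on a fresh directory containing only `train.csv` runs all three steps in order. Calling it again runs nothing, and updating `train.csv` triggers a full rebuild. In the actual framework, Make does this bookkeeping for you, and each recipe can be a script in whatever language you like.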

If you are new to Makefiles, here are some references:

Enjoy. 🙂

Kaggler. Data Scientist.