Kaggler 0.3.7 Released

Changes:

  • Cython optimization for performance – the boundscheck(False), wraparound(False), and cdivision(True) compiler directives are used.
  • Adaptive learning rate – instead of \frac{1}{\sqrt{n_i} + 1}, \frac{1}{\sqrt{\sum g_i^2} + 1} is used, where g_i is the gradient of the associated weight (see the sketch after this list).
  • Type correction – the type of the index was changed from double to int.
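
To make the adaptive learning rate concrete, here is a minimal Python sketch of the AdaGrad-style update described above – this is not the package's actual Cython code, and the function and variable names are hypothetical:

import numpy as np

def update_weight(w, g2_sum, i, g):
    """One SGD step for weight i with gradient g, using the rate
    1 / (sqrt(sum of squared gradients seen so far for weight i) + 1)."""
    g2_sum[i] += g * g                        # accumulate squared gradients
    eta = 1.0 / (np.sqrt(g2_sum[i]) + 1.0)    # adaptive learning rate
    w[i] -= eta * g                           # gradient descent step

w = np.zeros(2 ** 20)        # weight vector
g2_sum = np.zeros(2 ** 20)   # accumulated squared gradients per weight
update_weight(w, g2_sum, i=42, g=0.5)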

You can upgrade Kaggler either by using pip:

$ (sudo) pip install -U Kaggler

or from the source on GitHub (assuming you already have a local clone):

$ git fetch origin
$ git rebase origin/master
$ python setup.py build_ext --inplace
$ (sudo) python setup.py install

I haven’t had a chance to use it with real competition data yet – after the Avazu competition, I deleted the whole build directory 🙁 – so I don’t have numbers on how much faster (or slower?!) it is after these changes.

I will jump into another competition soon and let you know how it works. 🙂

Kaggler. Data Scientist.

Kaggler – Python Package for Kagglers

This article was originally posted on Kaggle’s Avazu competition forum and reposted here with a few edits.

Here I’d like to share what I’ve put together for online learning as a Python package – named Kaggler.

You can install it with pip as follows:

$ pip install -U Kaggler

Then import the algorithm classes as follows:

from kaggler.online_model import SGD, FTRL, FM, NN, NN_H2

Currently it supports 4 online learning algorithms – SGD, FTRL, FM, and NN (with 1 or 2 ReLU hidden layers) – and 1 batch learning algorithm – NN trained with L-BFGS to optimize AUC.
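
As a rough usage sketch, an online model is trained one example at a time. The constructor arguments and the read_sparse/predict/update method names below are assumptions – check the package documentation for the exact interface:

from kaggler.online_model import FTRL

clf = FTRL(a=0.1, b=1.0, l1=1.0, l2=1.0)    # hypothetical hyperparameter values
for x, y in clf.read_sparse('train.sps'):   # assumed generator yielding (features, label)
    p = clf.predict(x)                      # predicted probability for this example
    clf.update(x, p - y)                    # update the weights with the error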

It uses the liblinear-style sparse input format, chosen so that the same input file can be used with other popular tools such as XGBoost, VW, libFM, SVMLight, etc.
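
For reference, each line of such a file holds a label followed by index:value pairs for the non-zero features, for example:

1 1:1 9:0.5 23:1
0 2:1 9:0.3 40:1

In the liblinear/SVMLight convention, feature indices are 1-based and listed in ascending order.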

Code and examples are available at https://github.com/jeongyoonlee/Kaggler, and package documentation is available at http://pythonhosted.org//Kaggler/.

Kaggler. Data Scientist.