by Jeong-Yoon Lee
I am glad to announce the release of Kaggler 0.5.0. Kaggler 0.5.0 has a significant improvement in the performance of the FTRL algorithm thanks to Po-Hsien Chu (github, kaggle, linkedin).
We increase the train speed by up to 100 times compare to 0.4.x. Our benchmark shows that one epoch with 1MM records with 8 features takes 1.2 seconds with 0.5.0 compared to 98 seconds with 0.4.x on an i7 CPU.
The FTRL algorithm has been a popular algorithm since its first appearance on a paper published by Google. It is suitable for highly sparse data, so it has been widely used for click-through-rate (CTR) prediction in online advertisement. Many Kagglers use FTRL as one of their base algorithms in CTR prediction competitions. Therefore, we want to improve our FTRL implementation and benefit Kagglers who use our package.
We profile the code with cProfile and resolve the overheads one by one:
__getitems__, resulting in a lot of function calls. In
fit(), we know that we’re fetching exactly each row, and it is very unlikely to exceed the bound, so we can fetch the indexes of each row in a faster way. This enhancement makes our FTRL 10x faster.
Po-Hsien Chu (github, kaggle, linkedin)