NIPS 2017 Notes

Jeong and I attended NIPS 2017 in December 2017. Our notes are as follows.

Take-Aways for Professionals

As shown in the statistics shared by the organizers during the opening remarks, the majority of NIPS papers are from academia. Even the papers from industry, which are only a small fraction, are mostly from research organizations. What can professionals take away from such an academic conference? In my experience, people from industry can get the following benefits from NIPS.

  • Cutting-edge research: This might not be applicable in practice immediately, but can still provide important perspectives and directions on each problem.
  • Recruiting: I would say that 90% of the sponsors are focused on hiring. All the big companies held their own after-parties (a.k.a. recruiting events).
  • Networking: For some people, this is the most important benefit of NIPS. With over 7,000 attendees, NIPS 2017 was the largest academic conference in machine learning. Every day I enjoyed conversations with people in the same field at the poster sessions, at after-parties, and even on the way back home via UberPool.

[Photo: Registration line at NIPS 2017. There were over 7,000 attendees.]

Technical Trends

We noticed the following technical trends.

  • Meta-learning
  • Interpretability
  • ML systems (or systems for ML)
  • Bayesian modeling
  • Unsupervised learning
  • Probabilistic programming

Below are areas that I would like to investigate further in 2018.

  • Model Interpretation
  • Attention models
  • Online learning
  • Reinforcement learning

Detailed Session-by-Session Notes

Below are more detailed notes:

On 12/4 (Mon)

Tutorials

  • Deep Learning: Practice and Trends
    A very good summary of deep learning's current status and trends. CNNs, RNNs, adversarial networks, and unsupervised domain adaptation are closer to actual application; these models should be in professionals' tool boxes. Meta-learning and graph networks are interesting but further away from application.
  • Deep Probabilistic Modeling with Gaussian Processes
    This talk makes an important point: in real-world applications, we need not only pointwise predictions but also the level of uncertainty in those predictions to support decision making. (See the sketch after this list.)
  • Geometric Deep Learning on Graphs and Manifolds (Michael Bronstein)
    This talk focuses on an interesting trend in deep learning: applying deep learning to graph data. In my opinion, there is still a long way to go before this field yields real applications.
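
To make the uncertainty point concrete, here is a minimal sketch of regression with a Gaussian process that returns both a prediction and a per-point standard deviation. The toy data, kernel choice, and variable names are my own assumptions, not anything from the tutorial.

```python
# Minimal sketch: pointwise predictions plus uncertainty with a Gaussian process.
# Toy data and kernel choice are my own assumptions, not the tutorial's code.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D regression data
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

# RBF kernel for smooth functions plus a white-noise term for observation noise
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)

# predict() can return a standard deviation per point, i.e. the uncertainty
X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)
for m, s in zip(mean, std):
    print(f"prediction = {m:.2f} +/- {s:.2f}")
```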

Opening Remarks / Invited Talk

  • Opening Remarks & Powering the Next 100 Years
    The opening remarks shared several interesting statistics about NIPS and showed that it is a very academia-centric conference.
    The invited talk explained the huge amount of energy humanity needs and the limitations of fossil fuels and current low-carbon technology. It then presented ideas for how machine learning can help new energy sources (fusion) over the next 100 years and have a big impact: exploring and doing inference on experimental data, and adding human (domain-expert) preferences into the ML approach, with several Bayesian methods mentioned. It is applied machine learning in physics, which can have a large impact on the world. Thanks to the many open-source frameworks, it has become much easier to apply ML to different problems; ML is becoming a major tool and will have a huge impact across domains.

Poster Sessions

  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
    Google's blog post and paper on understanding deep learning models; it can also be used to improve prediction performance. The key idea is to use singular vector canonical correlation analysis (SVCCA) to analyze and compare hidden-layer representations. (A rough sketch of the idea follows this list.)
  • DropoutNet: Addressing Cold Start in Recommender Systems
    This focuses only on the item cold start. It needs a metadata-based vector representation of new items.
  • LightGBM: A Highly Efficient Gradient Boosting Decision Tree
    This paper explains the implementation of LightGBM, which uses a different approximation approach from XGBoost's. (A minimal usage sketch also follows this list.)
  • Discovering Potential Correlations via Hypercontractivity
    An interesting idea for finding potential relationships within subsets of the data.
  • Other interesting papers
    * Learning Hierarchical Information Flow with Recurrent Neural Modules
    * Learning ReLUs via Gradient Descent
    * Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling
    * Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
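
To make the SVCCA entry above concrete, here is a rough sketch of its two steps (SVD per layer, then CCA between the reduced subspaces). This is my own simplification with made-up shapes and names, not the authors' released implementation.

```python
# Rough sketch of the SVCCA idea: SVD each layer's activations, then run CCA
# between the reduced subspaces and average the correlations. This is a
# simplification for illustration, not the paper's reference implementation.
import numpy as np
from sklearn.cross_decomposition import CCA

def svcca_similarity(acts_a, acts_b, var_kept=0.99, n_components=10):
    """acts_* are (num_datapoints, num_neurons) activation matrices."""
    def top_directions(acts):
        acts = acts - acts.mean(axis=0)
        u, s, _ = np.linalg.svd(acts, full_matrices=False)
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept)) + 1
        return u[:, :k] * s[:k]          # keep directions explaining var_kept

    a_red, b_red = top_directions(acts_a), top_directions(acts_b)
    n = min(n_components, a_red.shape[1], b_red.shape[1])
    a_c, b_c = CCA(n_components=n, max_iter=2000).fit_transform(a_red, b_red)
    corrs = [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(n)]
    return float(np.mean(corrs))         # higher = more similar representations

# Example: compare two random "layers" observed on the same 500 inputs
rng = np.random.RandomState(0)
print(svcca_similarity(rng.randn(500, 64), rng.randn(500, 32)))
```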
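
And for the LightGBM entry, a minimal usage sketch via its scikit-learn-style API; the dataset and hyperparameters are placeholders of mine, unrelated to the paper's experiments.

```python
# Minimal LightGBM usage sketch via its scikit-learn API; the dataset and
# hyperparameters here are placeholders, not the paper's experimental setup.
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = lgb.LGBMClassifier(n_estimators=200, num_leaves=31, learning_rate=0.05)
model.fit(X_tr, y_tr)

print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```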

On 12/5 (Tue)

Invited Talk

  • Why AI Will Make It Possible to Reprogram the Human Genome
    This is one of the most impactful areas of AI/DL. Lately, AI/DL has been used to tackle many challenges in healthcare and has shown promising results.
  • Test of Time Award: Random Features for Large-Scale Kernel Machines
    This was the spotlight talk of NIPS 2017 and stirred a lot of discussion online. I highly recommend watching the video; points from both sides of the discussion are valid. Some related discussions:
    * Yann LeCun's rebuttal to Ali's talk
    * Alchemy, Rigour and Engineering
  • The Trouble with Bias
    This is a good topic. The data collection and creation process can introduce strong, undesirable bias into a data set, and ML algorithms can reproduce and even reinforce such bias. This is more than a technical problem.

Poster Sessions

  • A Unified Approach to Interpreting Model Predictions
    Uses expectations and Shapley values to interpret model predictions, unifying several previous approaches including LIME. Code: https://github.com/slundberg/shap (a minimal usage sketch follows this list).
  • Positive-Unlabeled Learning with Non-Negative Risk Estimator
    One-class classification is very useful in the real world (e.g., clicking on ads, watching content). This paper uses a different loss function for PU learning.
  • An Applied Algorithmic Foundation for Hierarchical Clustering
    There are several papers on hierarchical clustering; this is one of them. Hierarchical clustering is also very useful in the real world. This paper focuses more on the foundation (the objective function) of the problem.
  • Affinity Clustering: Hierarchical Clustering at Scale
    Another hierarchical clustering paper: a bottom-up approach that makes many merge decisions in each round.
  • Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
    An interesting semi-supervised deep learning approach. I feel the student is used to prevent overfitting; the teacher and student improve each other in a virtuous cycle.
  • Unbiased estimates for linear regression via volume sampling
    Choosing samples wisely can achieve performance similar to using the entire data set. This is useful in scenarios where labels are costly to obtain.
  • A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
    There are several papers on multi-armed bandits (MAB); this is one of them. MAB can be very useful in website optimization.
  • Other interesting papers
    * Streaming Weak Submodularity: Interpreting Neural Networks on the Fly
    * Generalization Properties of Learning with Random Features
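
To make the SHAP entry above concrete, here is a minimal usage sketch of the linked shap package on a tree model; the dataset and model are placeholders of mine, not from the paper.

```python
# Minimal SHAP usage sketch for a tree model; the dataset and model here are
# placeholders of mine, not from the paper.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shapley-value-based attributions for each feature of each prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view of which features matter most
shap.summary_plot(shap_values, X)
```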

On 12/6 (Wed)

Invited Talk

  • The Unreasonable Effectiveness of Structure
    This talk discussed structure in inputs and outputs, then described a way to represent "structure" in data (Probabilistic Soft Logic, http://psl.linqs.org/).
  • Deep Learning for Robotics
    If you work in robotics, this is a must-attend talk. It discussed many unsolved pieces of the AI robotics puzzle and how DL (deep reinforcement learning, meta-learning, etc.) can help. Some of the ideas might be useful in other domains.

Poster Sessions

  • Clustering with Noisy Queries
    This paper describes and analyzes a way to gather answers to a clustering problem: instead of asking "does element u belong to cluster A?", it suggests asking "do elements u and v belong to the same cluster?"
  • End-to-End Differentiable Proving
    A very interesting paper that tries to combine neural networks with a first-order-logic expert system by learning vector representations of symbols.
  • ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
    Looks like a fun place to try out AI. :)
  • Attention Is All You Need
    A new, simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. (See the attention sketch after this list.)
  • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
    Measuring uncertainty is very important. This paper describes a simple non-Bayesian baseline for estimating it. (See the ensemble sketch after this list.)
  • Other interesting papers
    * Train longer, generalize better: closing the generalization gap in large batch training of neural networks
    * Unsupervised Image-to-Image Translation Networks
    * A simple neural network module for relational reasoning
    * Style Transfer from Non-parallel Text by Cross-Alignment
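
To make the Transformer entry above concrete, here is a small NumPy sketch of scaled dot-product attention, attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes and data are made up, and this is not the paper's implementation.

```python
# Scaled dot-product attention, the building block of the Transformer:
# attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. Toy shapes only.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ V                                 # weighted sum of values

rng = np.random.RandomState(0)
seq_len, d_k, d_v = 5, 16, 16
Q, K, V = rng.randn(seq_len, d_k), rng.randn(seq_len, d_k), rng.randn(seq_len, d_v)
print(scaled_dot_product_attention(Q, K, V).shape)     # (5, 16)
```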
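
And for the deep-ensembles entry, a toy sketch of the basic idea: train several networks independently and use the spread of their predictions as an uncertainty signal. This is my own simplification and omits the paper's proper scoring rules and adversarial training.

```python
# Toy deep-ensembles-style sketch: train several networks independently and
# use the spread of their predictions as an uncertainty estimate.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

members = []
for seed in range(5):                       # independent random initializations
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=seed)
    members.append(net.fit(X, y))

X_new = np.array([[0.0], [5.0]])            # 5.0 lies far outside the training range
preds = np.stack([m.predict(X_new) for m in members])
print("mean:", preds.mean(axis=0), "std (uncertainty):", preds.std(axis=0))
```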

On 12/7 (Thu)

Invited Talk

  • Learning State Representations
    A very interesting talk. It tried to peel back the layers of how humans make decisions and learn. The researcher also designed experiments to test the hypothesis that "we cluster experiences together into task states based on similarity, and learning happens within a cluster, not across cluster borders," and then tried to design model structures that represent these clusters (states).
  • On Bayesian Deep Learning and Deep Bayesian Learning
    This talk is about combining Bayesian learning and deep learning, a topic that can be very useful in the future. It also covered several projects in this area.

Symposium – Interpretable ML

  • About this symposium
    I think interpretability is a very important property of models. As mentioned in one of the talks, interpretability is not a purely computational problem; it goes beyond the technology. The ultimate goal is still to untangle (understand) causal impact, and model interpretability is valuable in at least two ways: debugging model predictions, and helping generate hypotheses for controlled experiments.
  • Invited talk - The Role of Causality for Interpretability
    This talk discussed how to use causality in model interpretability.
  • Invited talk - Interpretable Discovery in Large Image Data Sets
    This talk presented DEMUD (an SVD-based method plus explanations) for interpreting image data sets.
  • Posters
    * Detecting Bias in Black-Box Models Using Transparent Model Distillation
    * The Intriguing Properties of Model Explanations
    * Feature importance scores and lossless feature pruning using Banzhaf power indices
  • Debate about whether interpretability is necessary for machine learning
    An interesting debate about interpretability. Worth watching.

Other Resources

NIPS videos, slides, and notes are available as follows.