NeurIPS 2017 Notes

by Hang Li

Jeong and I attended NeurIPS 2017 in December, 2017. Our notes are as follows.

NeurIPS 2017 Notes
Take-Aways for Professionals
Technical Trends
Detailed Session-by-Session Notes
Other Resources

Take-Aways for Professionals

As shown in the statistics shared by organizers during opening remarks, the majority of NeurIPS papers are from academia. Even papers from industry, which are only a small fraction, are mostly from research organizations. What can professionals take away from this academic conference? In my experience, people from industry can get following benefits from NeurIPS.

Cutting-edge research: This might not be applicable in practice immediately, but can still provide important perspectives and directions on each problem.
Recruiting: I would say that 90% of sponsors are focus on hiring. All big companies had their after-parties (a.k.a. recruiting events).
Networking: For some people, this is the most important benefit at NeurIPS. With over 7,000 attendees, NeurIPS 2017 was the largest academic conference in Machine Learning. Everyday I enjoyed conversations with many people in the same field at the poster sessions, after-parties, and even on the way back home with uberpool.

Technical Trends

We noticed technical trends as follows.

Meta-learning
Interpretability
ML systems (or systems for ML)
Bayesian modeling
Unsupervised learning
Probabilistic programming

Below are areas that I would like to investigate further in 2018.

Model Interpretation
Attention models
Online learning
Reinforcement learning

Detailed Session-by-Session Notes

Below are more detailed notes:

On 12/4 (Mon)

TUTORIALS

Title	Comments
Deep Learning: Practice and Trends	Very good summary of deep learning’s current status and trends. CNN, RNN, adversarial networks and unsupervised domain adaptation are closer to actual application. These models should be in professionals’ tool boxes. Meta learning and graph networks are interesting but further away from application.
Deep Probabilistic Modeling with Gaussian Processes	This talk brings an important point. In real world applications, we need to know not only pointwise predictions, but also the level of uncertainty in predictions to support decision making.
Geometric Deep Learning on Graphs and Manifolds by Michael Bronstein	This talk focuses on a interesting trend in deep learning, which uses deep learning on graphs data. In my opinion, there is still a long way to have real applications out of this field.

OPENING REMARKS/ INVITED TALK

| Title | Comments | | ————- | ————- | | Opening Remarks & Powering the next 100 years | Opening remarks has several interesting statistics of NeurIPS. It shows that NeurIPS is a very academia-centric conference. The invited talk, explains the huge amount of energy human need and the limitation of fossil fuel and low-carbon tech. Some ideas of how machine learning can help new energy (fusion) next 100 years and have big impact. Including: exploration and inference experiments data. Adding human (domain experts) preferences into ML approach. Mentioned several Bayesian approaches. It is about applied machine learning in physics which can impact world a lot. Thanks to many open source frameworks, it gets much easier to apply ML to different problem. ML becomes a major tool and will have huge impact across different domains. |

POSTER SESSIONS

Title	Comments
SvCCa: Singular vector Canonical Correlation analysis for Deep understanding and improvement	Google’s blog and paper to understand deep learning models. It can be used to improve prediction performance. The key idea is using Singular vector Canonical Correlation (SvCC) to analysis hidden layer parameters.
Dropoutnet: addressing Cold Start in recommender Systems	This focuses only on the item cold start. It need a metadata based vector representative of new items.
LightGBM: A Highly Efficient Gradient Boosting Decision Tree	This paper explains the implementation of LightGBM. It uses different approximate approach from XGBoost’s.
Discovering Potential Correlations via Hypercontractivity	An interesting idea to find potential relationship in the subset of data.
Other interesting papers	Learning Hierarchical Information Flow with Recurrent Neural Modules. Learning ReLUs via Gradient Descent. Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

On 12/5 (Tue)

INVITED TALK

Title	Comments
Why AI Will Make it Possible to Reprogram the Human Genome	This is one of the most impactful areas of AI/DL. Lately, AI/DL has been used to tackle many challenges in healthcare and shown some promising results.
Test Of Time Award: Random Features for Large-Scale Kernel Machines	This is the spotlight talk of NeurIPS 2017. It stirred a lot of discussions online. I highly recommend that you watch the video. Points from both sides of discussion are valid. Some related discussions: Yann LeCun’s rebuttal to Ali’s talk Alchemy, Rigour and Engineering
The Trouble with Bias	This is a good topic. Data collection and creation process can introduce strong undesirable bias to the data set. ML algorithms can reproduce and even reinforce such bias. This is more than a technical problem.

POSTER SESSIONS

Title	Comments
A Unified Approach to Interpreting Model Predictions	Use expectations and Shapley values to interpret model prediction. Unified several previous approaches including LIME. https://github.com/slundberg/shap
Positive-Unlabeled Learning with Non-Negative Risk Estimator	1 class classification is very useful in real world, e.g. click ads, watch content, etc. This paper use a different loss function in PU learning.
An Applied Algorithmic Foundation for Hierarchical Clustering	There are several papers on hierarchical clustering. This is just one of them. Hierarchical clustering is also very useful in real world. In this paper it more focus on the foundation(objective function) of this problem.
Affinity Clustering: Hierarchical Clustering at Scale	Another hierarchical clustering paper. A bottom-up hierarchical clustering. Each time make many merge decisions.
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results	This is an interesting semi-supervised deep learning approach. I feel it used students to prevent overfitting. Teacher and student improve each other in a virtuous cycle.
Unbiased estimates for linear regression via volume sampling	Choose samples wisely can get similar (not bad) performance w entire data set. This will be useful in the scenarios which is costly to get labels.
A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control	There are several papers of MAB(Multi-armed bandit), this is one of them. MAB can be very useful in website optimization.
Other interesting papers	Streaming Weak Submodularity: Interpreting Neural Networks on the Fly. Generalization Properties of Learning with Random Features

On 12/6 (Wed)

INVITED TALK

Title	Comments
The Unreasonable Effectiveness of Structure	This talk discussed the structure in input and output. Then describe a way to describe “structure” in data. (Probabilistic Soft Logic http://psl.linqs.org/ )
Deep Learning for Robotics	If working in robotics domain, this is a must attend talk. This talk discussed many unsolved pieces to the AI robotics puzzle and how DL (deep reinforcement learning, meta learning, etc ) can help. Some ideas might be useful in other domain.

POSTER SESSIONS

Title	Comments
Clustering with Noisy Queries	This paper describe and analysis a way of how to gather answers of a clustering problem. Instead of asking “do element u belong to cluster A” this paper suggest asking “do elements u and v belong to the same cluster?”
End-to-End Differentiable Proving	Very interesting paper which try to combine NN and 1st order logic expert system. Learn vector representation of symbols.
ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games	Looks like a fun place to try AI(:)).
Attention Is All You Need	A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles	Measure the uncertainty is very important. This paper describe a way (simple non-Bayesian baseline) to measure uncertainty.
Other interesting papers	Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Unsupervised Image-to-Image Translation Networks. A simple neural network module for relational reasoning. Style Transfer from Non-parallel Text by Cross-Alignment

On 12/7 (Thu)

INVITED TALK

Title	Comments
Learning State Representations	This is a very interesting talk. It tried to peel the onions of how human make decision and learn stuff. The researcher also design experiments to prove the hypothesis of “we cluster experiences together into task states based on similarity and learning happens within a cluster, not across cluster borders”. Then try to design model structure to represent this cluster(state).
On Bayesian Deep Learning and Deep Bayesian Learning	This talk is about combine Bayesian Learning and Deep Learning. This topic can be very useful in the future. It also include several projects in this area.

SYMPOSIUM – INTERPRETABLE ML

Title	Comments
About this symposium	I think interpretability is a very important part of models. As be mentioned in one talk of this symposium interpretability is not a purely computational problem and beyond tech. The final goal still be untangle(understand) causal impact, model interpretability can be valuable in at least 2 aspects: debug model predict, help generate hypotheses to do controlled experiment.
Invited talk - The role of causality for interpretability.	This talk discussed how to use causality in model interpretability.
Invited talk - Interpretable Discovery in Large Image Data Sets	This talk present a DEMUD(SVOD-based plus explanations) method to interprete image data sets.
Poster	Detecting Bias in Black-Box Models Using Transparent Model Distillation. The Intriguing Properties of Model Explanations. Feature importance scores and lossless feature pruning using Banzhaf power indices
Debate about whether or not interpretability is necessary for machine learning	Interesting debates about interpretability. Worth to watch.

Other Resources

NeurIPS videos, slides and notes are available as follows.

NeurIPS 2017 Proceedings
Slides
Videos
- https://www.youtube.com/results?search_query=NeurIPS+2017
- https://nips.cc/Conferences/2017/Videos
Curated Resources
- https://github.com/kihosuh/nips_2017
- https://github.com/hindupuravinash/nips2017
Notes