# NeurIPS 2017 Notes

by Hang Li

Jeong and I attended NeurIPS 2017 in December, 2017. Our notes are as follows.

# Take-Aways for Professionals

As shown in the statistics shared by organizers during opening remarks, the majority of NeurIPS papers are from academia. Even papers from industry, which are only a small fraction, are mostly from research organizations. What can professionals take away from this academic conference? In my experience, people from industry can get following benefits from NeurIPS.

**Cutting-edge research**: This might not be applicable in practice immediately, but can still provide important perspectives and directions on each problem.**Recruiting**: I would say that 90% of sponsors are focus on hiring. All big companies had their after-parties (a.k.a. recruiting events).**Networking**: For some people, this is the most important benefit at NeurIPS. With over 7,000 attendees, NeurIPS 2017 was the largest academic conference in Machine Learning. Everyday I enjoyed conversations with many people in the same field at the poster sessions, after-parties, and even on the way back home with uberpool.

# Technical Trends

We noticed technical trends as follows.

- Meta-learning
- Interpretability
- ML systems (or systems for ML)
- Bayesian modeling
- Unsupervised learning
- Probabilistic programming

Below are areas that I would like to investigate further in 2018.

- Model Interpretation
- Attention models
- Online learning
- Reinforcement learning

# Detailed Session-by-Session Notes

Below are more detailed notes:

## On 12/4 (Mon)

### TUTORIALS

Title | Comments |
---|---|

Deep Learning: Practice and Trends | Very good summary of deep learning’s current status and trends. CNN, RNN, adversarial networks and unsupervised domain adaptation are closer to actual application. These models should be in professionals’ tool boxes. Meta learning and graph networks are interesting but further away from application. |

Deep Probabilistic Modeling with Gaussian Processes | This talk brings an important point. In real world applications, we need to know not only pointwise predictions, but also the level of uncertainty in predictions to support decision making. |

Geometric Deep Learning on Graphs and Manifolds by Michael Bronstein | This talk focuses on a interesting trend in deep learning, which uses deep learning on graphs data. In my opinion, there is still a long way to have real applications out of this field. |

### OPENING REMARKS/ INVITED TALK

| Title | Comments | | ————- | ————- | | Opening Remarks & Powering the next 100 years | Opening remarks has several interesting statistics of NeurIPS. It shows that NeurIPS is a very academia-centric conference. The invited talk, explains the huge amount of energy human need and the limitation of fossil fuel and low-carbon tech. Some ideas of how machine learning can help new energy (fusion) next 100 years and have big impact. Including: exploration and inference experiments data. Adding human (domain experts) preferences into ML approach. Mentioned several Bayesian approaches. It is about applied machine learning in physics which can impact world a lot. Thanks to many open source frameworks, it gets much easier to apply ML to different problem. ML becomes a major tool and will have huge impact across different domains. |

### POSTER SESSIONS

Title | Comments |
---|---|

SvCCa: Singular vector Canonical Correlation analysis for Deep understanding and improvement | Google’s blog and paper to understand deep learning models. It can be used to improve prediction performance. The key idea is using Singular vector Canonical Correlation (SvCC) to analysis hidden layer parameters. |

Dropoutnet: addressing Cold Start in recommender Systems | This focuses only on the item cold start. It need a metadata based vector representative of new items. |

LightGBM: A Highly Efficient Gradient Boosting Decision Tree | This paper explains the implementation of LightGBM. It uses different approximate approach from XGBoost’s. |

Discovering Potential Correlations via Hypercontractivity | An interesting idea to find potential relationship in the subset of data. |

Other interesting papers | Learning Hierarchical Information Flow with Recurrent Neural Modules. Learning ReLUs via Gradient Descent. Clone MCMC: Parallel High-Dimensional Gaussian Gibbs Sampling Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems |

## On 12/5 (Tue)

### INVITED TALK

Title | Comments |
---|---|

Why AI Will Make it Possible to Reprogram the Human Genome | This is one of the most impactful areas of AI/DL. Lately, AI/DL has been used to tackle many challenges in healthcare and shown some promising results. |

Test Of Time Award: Random Features for Large-Scale Kernel Machines | This is the spotlight talk of NeurIPS 2017. It stirred a lot of discussions online. I highly recommend that you watch the video. Points from both sides of discussion are valid. Some related discussions: Yann LeCun’s rebuttal to Ali’s talk Alchemy, Rigour and Engineering |

The Trouble with Bias | This is a good topic. Data collection and creation process can introduce strong undesirable bias to the data set. ML algorithms can reproduce and even reinforce such bias. This is more than a technical problem. |

### POSTER SESSIONS

Title | Comments |
---|---|

A Unified Approach to Interpreting Model Predictions | Use expectations and Shapley values to interpret model prediction. Unified several previous approaches including LIME. https://github.com/slundberg/shap |

Positive-Unlabeled Learning with Non-Negative Risk Estimator | 1 class classification is very useful in real world, e.g. click ads, watch content, etc. This paper use a different loss function in PU learning. |

An Applied Algorithmic Foundation for Hierarchical Clustering | There are several papers on hierarchical clustering. This is just one of them. Hierarchical clustering is also very useful in real world. In this paper it more focus on the foundation(objective function) of this problem. |

Affinity Clustering: Hierarchical Clustering at Scale | Another hierarchical clustering paper. A bottom-up hierarchical clustering. Each time make many merge decisions. |

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results | This is an interesting semi-supervised deep learning approach. I feel it used students to prevent overfitting. Teacher and student improve each other in a virtuous cycle. |

Unbiased estimates for linear regression via volume sampling | Choose samples wisely can get similar (not bad) performance w entire data set. This will be useful in the scenarios which is costly to get labels. |

A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control | There are several papers of MAB(Multi-armed bandit), this is one of them. MAB can be very useful in website optimization. |

Other interesting papers | Streaming Weak Submodularity: Interpreting Neural Networks on the Fly. Generalization Properties of Learning with Random Features |

## On 12/6 (Wed)

### INVITED TALK

Title | Comments |
---|---|

The Unreasonable Effectiveness of Structure | This talk discussed the structure in input and output. Then describe a way to describe “structure” in data. (Probabilistic Soft Logic http://psl.linqs.org/ ) |

Deep Learning for Robotics | If working in robotics domain, this is a must attend talk. This talk discussed many unsolved pieces to the AI robotics puzzle and how DL (deep reinforcement learning, meta learning, etc ) can help. Some ideas might be useful in other domain. |

### POSTER SESSIONS

Title | Comments |
---|---|

Clustering with Noisy Queries | This paper describe and analysis a way of how to gather answers of a clustering problem. Instead of asking “do element u belong to cluster A” this paper suggest asking “do elements u and v belong to the same cluster?” |

End-to-End Differentiable Proving | Very interesting paper which try to combine NN and 1st order logic expert system. Learn vector representation of symbols. |

ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games | Looks like a fun place to try AI(:)). |

Attention Is All You Need | A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. |

Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles | Measure the uncertainty is very important. This paper describe a way (simple non-Bayesian baseline) to measure uncertainty. |

Other interesting papers | Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Unsupervised Image-to-Image Translation Networks. A simple neural network module for relational reasoning. Style Transfer from Non-parallel Text by Cross-Alignment |

## On 12/7 (Thu)

### INVITED TALK

Title | Comments |
---|---|

Learning State Representations | This is a very interesting talk. It tried to peel the onions of how human make decision and learn stuff. The researcher also design experiments to prove the hypothesis of “we cluster experiences together into task states based on similarity and learning happens within a cluster, not across cluster borders”. Then try to design model structure to represent this cluster(state). |

On Bayesian Deep Learning and Deep Bayesian Learning | This talk is about combine Bayesian Learning and Deep Learning. This topic can be very useful in the future. It also include several projects in this area. |

### SYMPOSIUM – INTERPRETABLE ML

Title | Comments |
---|---|

About this symposium | I think interpretability is a very important part of models. As be mentioned in one talk of this symposium interpretability is not a purely computational problem and beyond tech. The final goal still be untangle(understand) causal impact, model interpretability can be valuable in at least 2 aspects: debug model predict, help generate hypotheses to do controlled experiment. |

Invited talk - The role of causality for interpretability. | This talk discussed how to use causality in model interpretability. |

Invited talk - Interpretable Discovery in Large Image Data Sets | This talk present a DEMUD(SVOD-based plus explanations) method to interprete image data sets. |

Poster | Detecting Bias in Black-Box Models Using Transparent Model Distillation. The Intriguing Properties of Model Explanations. Feature importance scores and lossless feature pruning using Banzhaf power indices |

Debate about whether or not interpretability is necessary for machine learning | Interesting debates about interpretability. Worth to watch. |

# Other Resources

NeurIPS videos, slides and notes are available as follows.

- NeurIPS 2017 Proceedings
- Slides
- Videos
- https://www.youtube.com/results?search_query=NeurIPS+2017
- https://nips.cc/Conferences/2017/Videos

- Curated Resources
- https://github.com/kihosuh/nips_2017
- https://github.com/hindupuravinash/nips2017

- Notes
- NeurIPS 2017 – Day 1 Highlights by Emmanuel Ameisen
- NeurIPS 2017 – Day 2 Highlights by Emmanuel Ameisen
- NeurIPS 2017 – Day 3 Highlights by Emmanuel Ameisen
- Highlights from My First NeurIPS by Ryan Rosario
- NeurIPS 2017 Notes by David Abel (pdf)
- NeurIPS 2017 Reports by Viktoriya Krakovna
- NeurIPS 2017 notes and thoughts by Olga Liakhovich