Xavier Amatriain’s Machine Learning and Artificial Intelligence 2019 Year-end Roundup

As we learned in “To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks”, it usually pays off to fine-tune models for specific tasks.

Our team at Curai came to that exact conclusion when comparing general language models to those trained on the medical domain in “Domain-Relevant Embeddings for Medical Question Similarity”.

So, we are still far from having general-purpose language models that can tell good stories and adapt to different tasks and domains.

Finally, I could not finish this paragraph on the limitations of language models without mentioning Merity’s great “Stop Thinking With Your Head”, where he shows that for many tasks a simple LSTM model can perform almost as well as the most complex Transformers.

In 2019 we continued to hear loud voices advocating for AI not to get stuck in a deep learning local maximum.

According to many, myself included, we should be able to combine data-intensive deep learning approaches with more knowledge-intensive methods to add some form of innate structure.

While it is true that there is a lot of work to be done in that space, we did see many examples of research combining deep learning and more “traditional” AI.

In “Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems”, Salesforce presents a state-of-the-art approach for slot-filling task-oriented dialogue systems that combines deep learning with more traditional conversational methods.

“Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning” is a recent paper from Google that also combines several deep and knowledge-intensive approaches for the same purpose.

“Wizard of Wikipedia: Knowledge-Powered Conversational Agents” is Facebook’s response in that same space.

“ERNIE: A knowledge graph-enhanced language model” is a novel approach in which a language model is trained not only on natural language data, but also on a knowledge graph.

Interestingly though, it turns out that while language models might benefit from being trained on knowledge graphs, they themselves also encode knowledge and can be used as knowledge bases (see “Language Models as Knowledge Bases?”).

Similarly, deep learning models like BERT or ELMo encode not only knowledge, but also syntax.

The Stanford NLP team showed how syntax trees can be directly inferred from such models.

If I had to name two important fundamental trends behind many of the advances highlighted above, those would be transfer learning and self-supervision.

Transfer learning (the idea that you can train a model on one dataset and apply the resulting model elsewhere) is a pretty obvious idea behind language models, but it was also behind earlier image models trained on ImageNet and the like.

The idea of self-supervision might be a bit less obvious.

Maybe that is why some are calling it The Quiet Revolution, despite Yann LeCun having screamed it out loud for years to anyone who was listening.

In any case, self-supervision, the idea that you can train a model on unlabelled data by exploiting the context in the data itself, is catching on.

Not only do language models like BERT or ALBERT use the idea extensively, but the same notion is being applied to other domains, making it easier to train on large corpora without spending huge effort on annotation.
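The core trick is easy to sketch. Below is a toy, BERT-style masking function (hypothetical illustration, not code from any of the papers): the labels are simply the original tokens, so any unlabeled corpus becomes supervised training data for free.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Turn an unlabeled sentence into a (masked input, targets) pair.
    The 'labels' are the original tokens themselves; that is the whole
    point of self-supervision."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok  # the model must recover the original token
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens(
    "the patient reports mild chest pain".split(), mask_prob=0.3
)
```

A model trained to fill the blanks back in never needs a human annotator, which is why the approach scales to web-sized corpora.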

For example, self-supervision is being used to improve image classification models.

See for example “Self-Supervised Learning of Pretext-Invariant Representations”, “Data-Efficient Image Recognition with Contrastive Predictive Coding”, or the recent “Self-training with Noisy Student improves ImageNet classification”.

All of these approaches improve on SOTA supervised methods while using much less labeled data.

A fascinating application of self-supervision that takes the idea a step further is Facebook’s “Unsupervised Question Answering by Cloze Translation”, where they split the question answering problem into two steps.

The first step generates synthetic training data with a model that synthesizes fill-in-the-gap questions from documents.

The second step uses a traditional Q&A model.
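As a rough sketch of that first step (toy code, not the authors’; the real system uses NER to pick answer spans and a translation model to turn the cloze into a natural-language question):

```python
def make_cloze(sentence, answer):
    """Synthesize a (cloze question, answer) training pair from raw text
    by blanking out a chosen answer span. No human labels required."""
    if answer not in sentence:
        raise ValueError("answer span must appear in the sentence")
    return sentence.replace(answer, "___", 1), answer

q, a = make_cloze(
    "Penicillin was discovered by Alexander Fleming in 1928.",
    "Alexander Fleming",
)
# q == "Penicillin was discovered by ___ in 1928."
```

The second-stage Q&A model then trains on these synthetic pairs as if they were human-written questions.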

This is similar to our “Learning from the experts” where we sidestep the need for costly and noisy labeling of medical data by generating synthetic training data.

   The year also came with other advances that don’t neatly fit into the main trends of combining knowledge with deep learning, or self-supervision.

What follows are some of my favorite highlights in this miscellaneous category.

In “The Lottery Ticket Hypothesis” the authors show a fascinating result: due to sheer chance, some subnetworks with far fewer parameters than the original network achieve comparable accuracy.

For some reason, their connections have initial weights that make training much more effective.

The authors also present an algorithm to identify those “winning tickets”.
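The pruning step at the heart of that algorithm is plain magnitude pruning; here is a minimal sketch over a flat weight list (illustrative only; the paper prunes iteratively, layer by layer, and rewinds the surviving weights to their initial values before retraining):

```python
def winning_ticket_mask(weights, keep_frac=0.2):
    """Keep only the largest-magnitude weights; zero out the rest.
    Returns a 0/1 mask over the weight list."""
    k = max(1, int(len(weights) * keep_frac))
    threshold = sorted(abs(w) for w in weights)[-k]
    return [1 if abs(w) >= threshold else 0 for w in weights]

mask = winning_ticket_mask([0.9, -0.05, 0.02, -0.8, 0.1], keep_frac=0.4)
# keeps the two largest-magnitude weights -> [1, 0, 0, 1, 0]
```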

In the same vein of finding more efficient yet performing models, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks” introduces an approach to uniformly scale all dimensions in a CNN.
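The core of EfficientNet is a single compound coefficient φ that scales all three dimensions together; a sketch of the idea (the base coefficients below are the ones I recall from the paper’s B0 grid search, so treat them as illustrative):

```python
def compound_scaling(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Scale network depth, width, and input resolution jointly by one
    coefficient phi, with alpha * beta**2 * gamma**2 ~= 2 so that each
    unit of phi roughly doubles FLOPs."""
    return {
        "depth": alpha ** phi,       # multiplier on number of layers
        "width": beta ** phi,        # multiplier on channels per layer
        "resolution": gamma ** phi,  # multiplier on input image size
    }

scales = compound_scaling(phi=2)
```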

Rectified Adam is a variation of the well-known Adam optimizer that results in better training and higher accuracy (if you don’t know Adam, you probably should, since according to Chip Huyen it is the most commonly asked-about topic in machine learning interviews).
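The key observation in Rectified Adam is that the adaptive learning rate has dangerously high variance during the first steps. Below is a sketch of the rectification term as I understand it from the paper (verify against the paper before relying on the exact formula):

```python
import math

def radam_rectifier(t, beta2=0.999):
    """Variance rectification at step t: return None while the
    second-moment estimate is too unreliable to use (early steps),
    otherwise a factor r_t < 1 that damps the adaptive update."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t <= 4.0:
        return None  # fall back to an un-adapted (SGD-style) step
    return math.sqrt(
        (rho_t - 4) * (rho_t - 2) * rho_inf
        / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
    )
```

In effect this builds a principled warm-up into Adam instead of hand-tuning a warm-up schedule.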

In “Classification Accuracy Score for Conditional Generative Models” the authors present a new way to evaluate generative models by training a classifier on synthetic data but predicting labels on real data.

While this is not, strictly speaking, a novel idea (see e.g. “LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation”), and the paper applies it only to GANs for images, it does show an interesting path for evaluating other generative models in different domains.
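In miniature, the evaluation protocol looks like this (a toy nearest-centroid classifier stands in for a real one; the paper of course uses deep classifiers on ImageNet-scale data):

```python
def classification_accuracy_score(synthetic, real):
    """Fit a toy nearest-centroid classifier on synthetic (generated)
    samples, then report accuracy on real samples. Both arguments are
    lists of (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for x, y in synthetic:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    correct = sum(
        1 for x, y in real
        if min(centroids, key=lambda c: sqdist(centroids[c], x)) == y
    )
    return correct / len(real)

synthetic = [([0.0, 0.1], "a"), ([0.1, 0.0], "a"), ([1.0, 0.9], "b")]
real = [([0.05, 0.05], "a"), ([0.9, 1.0], "b")]
cas = classification_accuracy_score(synthetic, real)  # 1.0 on this toy data
```

A high score means the generator captured enough of the real distribution for a classifier trained on its samples to transfer back to real data.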

In “Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation” we see how injecting synthetic noise during training can improve the quality of the trained models.
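A minimal illustration of the idea (hypothetical code; the paper uses a richer set of noise types, such as deletions, insertions, and keyboard-neighbor substitutions):

```python
import random

def add_synthetic_noise(sentence, swap_prob=0.1, seed=42):
    """Corrupt clean training text with adjacent-character swaps,
    one simple class of synthetic 'typo' noise."""
    rng = random.Random(seed)
    noisy_words = []
    for w in sentence.split():
        if len(w) > 3 and rng.random() < swap_prob:
            i = rng.randrange(len(w) - 1)
            w = w[:i] + w[i + 1] + w[i] + w[i + 2:]  # swap chars i, i+1
        noisy_words.append(w)
    return " ".join(noisy_words)
```

Training on a mix of clean and corrupted source sentences teaches the translation model not to fall apart on real-world typos.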

I personally believe that the use of synthetic data plus noise is going to bring a lot of advances in AI in the near future.

But, maybe I am just biased because our own publication “Learning from the experts: From expert systems to machine-learned diagnosis models” already proposed a combination of these two techniques.

Another important line of research is how to apply learned models “in the wild”, by modeling uncertainty and detecting out-of-distribution inputs.

In real life, it is important to understand the uncertainty of model predictions and whether a data point lies outside the distribution on which the model was trained.
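To make the problem concrete, here is a deliberately naive out-of-distribution check (hypothetical code; methods like likelihood ratios are far more principled, this only shows the shape of the question):

```python
import statistics

def looks_out_of_distribution(train_points, x, z_thresh=3.0):
    """Flag x if any feature lies more than z_thresh standard
    deviations away from that feature's mean in the training set."""
    for feature_values, xv in zip(zip(*train_points), x):
        mu = statistics.mean(feature_values)
        sd = statistics.pstdev(feature_values)
        if sd == 0:
            if xv != mu:
                return True
        elif abs(xv - mu) / sd > z_thresh:
            return True
    return False

train = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
```

A deployed diagnosis model would use such a check to abstain rather than guess on inputs unlike anything it was trained on.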

A few papers in this space were presented at NeurIPS 2019 (see e.g. “Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections” and “Likelihood Ratios for Out-of-Distribution Detection”).

This is also a very important aspect to tackle in healthcare, and it is indeed the focus of our recent paper “Open Set Medical Diagnosis”.

Finally, I should note that there have been a lot of publications in the broad space of what I would call human-AI interaction, which includes research areas like fairness, bias, and interpretability.

It is hard to pick the most impactful works in this space, but I will highlight two with almost opposite takeaways.

AI2 presented AllenNLP-Interpret, a toolkit for interactive model interpretations and explanations.

This work won the EMNLP best demo award.

On the other extreme, in “Manipulating and Measuring Model Interpretability” Microsoft researchers surprisingly concluded that model transparency and interpretability not only did not help, but could actually hamper users’ ability to detect model mistakes.

And one last, and very recent, piece of news on this space of human-AI is Facebook’s announcement of a $1M deep-fake detection challenge.

Clearly detecting fake content is going to be a huge deal in the future, and it is good to see that we are already putting efforts into this.

It has been more than three years since AlphaGo beat Lee Sedol, but we are still feeling the aftershocks of that feat, with Sedol recently announcing his retirement because of the defeat.

And, while it might seem like there is not much progress left to be made in AI for games, computers insist on getting better at more, and more complex, games.

This year we saw two major feats, with DeepMind reaching human-level performance in Quake III Arena Capture the Flag and winning the StarCraft competition with AlphaStar.

Both of these advances demonstrate the ability of algorithms not only to master complicated but highly structured games like Go, but also to adapt to fuzzier strategic goals that even require collaboration.

A final, and pretty recent, advance in this space is “Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model”, where DeepMind again shows how a combination of search and a learned model can be used to achieve superhuman performance not just in a single game, but across a range of games.

Of course, the deep learning revolution is having an impact well beyond text and images.

I will focus on the two areas I follow most closely: recommender systems and healthcare.

Interestingly, I have seen a similar pattern in both areas this year (warning: you should know that we “scientists” see patterns all around us).

In recommender systems, deep learning has been impacting the research community for some time now, probably ever since YouTube published their first paper on using deep learning for recommendations in 2016.

Maybe unsurprisingly, most of the deep learning results continue to come from industry.

My former team at Netflix has definitely jumped on the DL train, and they have been speaking publicly about different deep learning enhancements to the Netflix recommender system.

See this recent tutorial by Anoop Deoras on using Deep Learning for recommendations.

Even Facebook, who are honestly not very active in the recommendation arena, made quite a splash this year by open-sourcing a Deep Learning recsys model/framework.

But, not everything is shiny and bright on the deep side of the recsys street.

As a matter of fact, the best paper award at the RecSys conference went to a paper that questions most of the recent advances in deep learning approaches and shows how simpler methods obtain similar or better results.

 In healthcare, the deep learning revolution has already been here for some time too.

There have been many research papers at this intersection. So many, in fact, that Stanford/Google felt we are at a point where we can even publish “A guide to deep learning in healthcare”.

Truth be told, the most interesting and credible applications of deep learning in healthcare are still those involving images or text (see some of our own examples in “Domain-Relevant Embeddings for Medical Question Similarity” or “Prototypical Clustering Networks for Dermatological Disease Diagnosis”).

However, when applied to more complex data like Electronic Health Records (EHRs), we show that much simpler models perform just as well as deep neural networks (see our upcoming “The accuracy vs. coverage trade-off in patient-facing diagnosis models”).

Unsurprisingly, the “AI framework war” that I already mentioned in last year’s round-up has not cooled down.

The two main contenders continue to be Google’s TensorFlow and Facebook’s PyTorch.

Who will win the war remains to be seen, but according to some data, PyTorch continues to win the research battle, while TensorFlow dominates in production-ready systems.

This last year, TensorFlow released the highly anticipated TF 2.0, its main highlights being tight integration with Keras, eager execution by default, and more Pythonic function execution.

In other words, TF is trying to become more usable and friendly… more PyTorch-like.

TensorFlow also introduced Swift for TensorFlow, which immediately attracted many programmers.

Even fast.ai announced that it would embrace Swift, and would maybe question its use of PyTorch over time, which would be a huge deal.

On the other hand, PyTorch has continued to evolve.

Probably the most notable news late this year was that Chainer, another popular library, is merging into PyTorch.

 There was a lot of movement outside of the two big players too.

Microsoft recently announced a really interesting serving infrastructure that provides highly efficient, production-ready serving of models trained with any framework supporting the ONNX standard.

This could be a huge win for PyTorch, whose serving infrastructure certainly lags behind TensorFlow’s.

In the NLP-frameworks arena, I obviously have to mention the great work by the folks at Hugging Face, who seem to release the code for any NLP advance before the paper even hits arXiv.

This is a huge accomplishment, especially given that all the big players are interested in this space.

Even Uber published their own framework for dialogue system research.

Thanks for making it this far! I know this is a long post.

Plus, I am always much better at explaining the past than predicting the future.

So, I won’t keep you here much longer.

I don’t have risky predictions for what 2020 will bring, but I am sure of a few things. Hope you enjoyed the post, and I look forward to your feedback and comments!

Bio: Xavier Amatriain is the Cofounder and CTO at Curai.
