By Benedict Neo, Data Science enthusiast and blogger.
Photo by Arseny Togulev on Unsplash.
Machine learning is a trendy topic in this age of Artificial Intelligence.
The fields of computer vision and Natural Language Processing (NLP) are making breakthroughs that no one could’ve predicted.
We see both of them in our lives more and more: facial recognition in smartphones, language translation software, self-driving cars, and so on.
What might seem sci-fi is becoming a reality, and it is only a matter of time before we attain Artificial General Intelligence.
In this article, I will be covering Jeff Dean’s keynote on the advancements of computer vision and language models and how ML will progress towards the future from the perspective of model building.
Photo by Alex Knight on Unsplash.
The field of Machine learning is experiencing exponential growth today, especially in the subject of computer vision.
Today, the error rate of computer vision models on image classification is only about 3%, below the roughly 5% commonly cited for humans on the same task.
This means computers are already better at recognizing and analyzing images than humans.
What an amazing feat! Decades ago, computers were hunks of machinery the size of a room; today, they can perceive the world around us in ways that we never thought possible.
The progress we’ve made from 26% error in 2011 to 3% error in 2016 is hugely impactful.
“The way I like to think is, computers have now evolved eyes that work.” - Jeff Dean
Now, this achievement, made possible by advancements in machine learning, isn’t just a celebration for computer geeks and AI experts; it has real-world applications that save lives and make the world a better place.
Before I blab about a life-saving application of computer vision, let me illustrate to you the power of computer vision.
Let’s say I give you 10,000 pictures of dogs and ask you to classify them by breed. Are you able to do that? Well, you can, but you have to be a dog expert, and it’ll take you days to finish.
But for a computer (with a GPU), this takes mere minutes.
This incredible capability of computer vision opens up a profusion of applications.
Application of computer vision
One quintessential application for computer vision given by Jeff Dean is in diabetic retinopathy, a diabetes complication that affects the eye.
Now to diagnose it, an extensive eye exam is required.
In third-world countries and rural villages where there is a paucity of doctors, a machine learning model that uses computer vision to make a diagnosis will be extremely beneficial.
As in other medical imaging fields, a computer vision model can also serve as a second opinion for domain experts, helping them verify their diagnoses.
Generally, the purpose of computer vision in the medical field is to replicate the expertise of specialists and deploy it in places where people need it the most.
Photo by VanveenJF on Unsplash.
Language models are algorithms that help machines understand text and perform all kinds of operations on it, such as translation.
According to Jeff Dean, a lot of progress has been made in language models.
Today, computers can understand paragraphs of text at a much deeper level than they could before.
Even though they aren’t at the level of reading an entire book and understanding it the way we humans do, the ability to understand a few paragraphs of text is fundamental to things such as improving the Google search system.
BERT, the latest Natural Language Processing (NLP) model that Google announced, has been put to use in its search ranking algorithms. This helped improve the search results for many kinds of queries that were previously very difficult.
In other words, the search system can now better understand different kinds of searches done by users and help provide better and more accurate answers.
“Deep learning and machine learning architectures are going to change a lot in the next few years. You can see a lot of this already, where now with NLP, the only game in town basically is Transformer networks.” - Yann LeCun
These Transformer-based models for translation are showing spectacular gains in the BLEU score, which is a measurement of translation quality.
So, Machine Learning architectures that utilize transformers such as BERT are increasing in popularity and functionality.
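To give a feel for what the BLEU score measures, here is a minimal sketch of a simplified, single-reference BLEU: clipped n-gram precision combined with a brevity penalty. The function names (`ngrams`, `simple_bleu`) and the default of `max_n=2` are assumptions made for illustration. Standard BLEU uses up to 4-grams, supports multiple references, and is usually computed over a whole corpus, so treat this as a toy version rather than the metric used in published results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU: clipped n-gram precision + brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # this sketch applies no smoothing
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "the cat is on the mat"
for candidate in ("the cat is on the mat", "the cat sat on the mat", "dog"):
    print(f"{candidate!r}: {simple_bleu(candidate, reference):.3f}")
```

A perfect match scores 1, a translation with one wrong word scores lower, and a translation sharing no words with the reference scores 0.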
Photo by Charles on Unsplash.
In the keynote, the Google Senior Fellow mentioned atomic models that Machine Learning developers use today to perform all kinds of unit tasks.
He believes these models are inefficient and computationally expensive, and more effort is required to achieve good results in those tasks.
To elaborate, in the ML world today, experts find a problem that they want to solve, and they focus on finding the right dataset to train the model and perform that particular task.
Dean argues that by doing so, they basically start from zero: they initialize the parameters of the model with random floating-point values and then try to learn everything about that task from the dataset.
To elaborate on this matter, he gives an excellent comparison that goes like this:
“It’s akin to when you want to learn something new, you forget all your education and you go back to being an infant, and now you try to learn everything about this task.”
He compares this methodology to humans becoming infants every time we want to learn something new, taking one brain out and putting a different one in.
Not only is this method computationally expensive, but more effort is also required to achieve good outcomes in those tasks.
And Jeff Dean proposes a solution.
Photo by Marius Masalar on Unsplash.
Jeff believes the future of ML lies in a great big model, a multi-functioning model that can do plenty of things.
This uber model would eliminate the need to create a separate model for each specific task. Instead, one large model would be trained with many different pieces of expertise.
Imagine a computer vision model that can diagnose diabetic retinopathy, classify different breeds of dogs, recognize your face and be used in self-driving cars and drones at the same time.
He also explained that the model would operate by sparsely activating only the pieces of the model that are required.
The model will be 99% idle most of the time, and you only have to call upon the right fragment of expertise when needed.
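The idea of sparse activation can be sketched in a few lines. The example below is a toy, mixture-of-experts-style model, which is an assumption on my part and not necessarily the design Dean has in mind. A router scores every expert for a given input, and only the `top_k` best-scoring experts actually run. The class name `SparseModel`, the hand-set router weights, and the four arithmetic "experts" are all hypothetical; in a real system both the router and the experts would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SparseModel:
    """Toy sparsely activated model: a router picks top_k experts per input."""

    def __init__(self, router_weights, experts, top_k=1):
        self.router_weights = router_weights  # one weight vector per expert
        self.experts = experts                # one callable per expert
        self.top_k = top_k
        self.calls = [0] * len(experts)       # how often each expert actually ran

    def forward(self, x):
        # Router: one dot-product score per expert, turned into gate probabilities.
        scores = [sum(w * v for w, v in zip(weights, x)) for weights in self.router_weights]
        gates = softmax(scores)
        # Keep only the top_k experts; every other expert stays idle for this input.
        ranked = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)
        chosen = ranked[: self.top_k]
        norm = sum(gates[i] for i in chosen)
        output = 0.0
        for i in chosen:
            self.calls[i] += 1
            output += (gates[i] / norm) * self.experts[i](x)
        return output, chosen


# Hypothetical experts and hand-set router weights, for illustration only.
experts = [
    lambda x: 10.0 * x[0],
    lambda x: 20.0 * x[1],
    lambda x: 30.0 * x[2],
    lambda x: 40.0 * x[3],
]
router_weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

model = SparseModel(router_weights, experts, top_k=1)
for x in ([0.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0]):
    output, chosen = model.forward(x)
    print(x, "->", output, "via expert", chosen)
print("calls per expert:", model.calls)
```

With `top_k=1`, three of the four experts do no work for any given input, which is the toy analogue of a model that is mostly idle and only calls on the right fragment of expertise.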
Challenges
Dean believes this uber model is a promising direction for ML and the engineering challenges are very interesting.
To build a model like this would engender lots of interesting computer systems and machine learning problems such as scalability and the structure of the model.
The main question posed is:
How will the model learn to route each input to the pieces of the model that are most appropriate?
To achieve a breakthrough like this will require more advancements in machine learning research as well as in mathematics.
Computer vision and NLP will continue to play a significant role in our lives.
But there are adverse implications to this advancement as well, such as China using facial recognition to implement a rating system for its people (straight out of an episode of the TV show Black Mirror) and the proliferation of fake news.
We must progress in Machine Learning while taking into account algorithmic biases and ethics that remind us of our place, a creation of God and not creators.
As for the uber model, there is evidence suggesting we are inching closer and closer towards it.
For example, transfer learning, a way of reusing a model for a different purpose, achieves good results with less data. Multi-task learning, where one model handles a small number of related tasks (five or six), also tends to make things work well.
Thus, it’s logical to say that the realization of an uber model is plausible by extending those two ideas, transfer learning and multi-task learning, and developing them further. It’s only a matter of when, not how.
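The benefit of transfer learning over starting from random parameters can be illustrated with a deliberately tiny example. The sketch below fits a one-dimensional linear model by gradient descent, once from a random initialization and once from parameters already trained on a related task, and counts the steps each needs to reach the same loss. The two tasks, the learning rate, and the tolerance are made-up values chosen for illustration; real transfer learning reuses the weights of large neural networks, and this sketch does not cover multi-task learning.

```python
import random


def make_task(slope, intercept):
    """Build a tiny noiseless regression dataset for y = slope * x + intercept."""
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    return [(x, slope * x + intercept) for x in xs]


def mse(params, data):
    """Mean squared error of the linear model (w, b) on the dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr=0.1, tol=1e-6, max_steps=10_000):
    """Run gradient descent until the loss drops below tol; return (params, steps)."""
    w, b = params
    for step in range(max_steps):
        if mse((w, b), data) < tol:
            return (w, b), step
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b), max_steps


rng = random.Random(0)
random_init = (rng.uniform(-1, 1), rng.uniform(-1, 1))  # "starting from zero"

task_a = make_task(2.0, 1.0)  # hypothetical task the model is first trained on
task_b = make_task(2.2, 1.3)  # hypothetical related task we actually care about

pretrained, _ = train(random_init, task_a)
_, steps_scratch = train(random_init, task_b)
_, steps_transfer = train(pretrained, task_b)
print(f"from scratch: {steps_scratch} steps, from pretrained: {steps_transfer} steps")
```

Because the pretrained parameters already sit close to the solution of the related task, the second run needs fewer steps than the run from scratch, which is the same intuition behind needing less data and compute when a model is reused.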
Thanks for reading my excerpt on the future of ML and my synopsis of Jeff Dean’s keynote.
I hope you got a glimpse of what is to come in Machine Learning and AI.
Watch the full video here.
Original.
Reposted with permission.