Deep Multi-Task Learning – 3 Lessons Learned

We could tune a separate learning rate for each of the “heads” (task-specific subnets), and another rate for the shared subnet.

Though it may sound complicated, it’s actually pretty simple.

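Usually when training a NN in TensorFlow you create an optimizer such as tf.train.AdamOptimizer and call its minimize method on the loss. AdamOptimizer defines how gradients should be applied, and minimize computes and applies them (it simply combines compute_gradients and apply_gradients).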

We can replace minimize with our own implementation that computes the gradients and then applies each one with the learning rate appropriate to its variable (shared subnet or a specific head) in our computational graph.

By the way, this trick can also be useful for single-task networks.
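The compute-then-apply split can be sketched without TensorFlow. Below is a minimal, framework-free Python sketch, assuming a hypothetical toy model with one scalar shared weight, one scalar weight per head, and squared-error losses. The names (shared, head_a, head_b) and the learning-rate values are illustrative assumptions, and it uses plain gradient descent instead of Adam so the arithmetic stays visible. In TensorFlow 1.x, one way to get the same effect would be to call tf.gradients once over all variables, split the result by group, pass each slice to its own optimizer's apply_gradients, and combine the resulting ops with tf.group.

```python
# Hypothetical toy model: a shared weight w_s feeds two task heads.
#   pred_a = w_a * (w_s * x),  pred_b = w_b * (w_s * x)
#   loss   = (pred_a - y_a)**2 + (pred_b - y_b)**2


def compute_gradients(params, x, y_a, y_b):
    """The 'compute' half of minimize: d(loss)/d(param) for every parameter."""
    w_s, w_a, w_b = params["shared"], params["head_a"], params["head_b"]
    h = w_s * x               # shared representation
    err_a = w_a * h - y_a     # residual of head A
    err_b = w_b * h - y_b     # residual of head B
    return {
        # Both heads backprop into the shared weight.
        "shared": 2 * err_a * w_a * x + 2 * err_b * w_b * x,
        "head_a": 2 * err_a * h,
        "head_b": 2 * err_b * h,
    }


def apply_gradients(params, grads, learning_rates):
    """The 'apply' half: plain SGD, each variable using its own group's rate."""
    return {name: params[name] - learning_rates[name] * grads[name] for name in params}


params = {"shared": 1.0, "head_a": 0.5, "head_b": 2.0}
learning_rates = {"shared": 0.01, "head_a": 0.1, "head_b": 0.05}  # hypothetical values

grads = compute_gradients(params, x=1.0, y_a=1.0, y_b=1.0)
params = apply_gradients(params, grads, learning_rates)
print({k: round(v, 6) for k, v in params.items()})
# → {'shared': 0.965, 'head_a': 0.6, 'head_b': 1.9}
```

Keeping the learning rates in a dictionary keyed like the parameters makes it easy to tune the shared subnet and each head independently without touching the gradient computation.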

Once we’re past the first phase of creating a NN that predicts multiple tasks, we might want to use our estimate for one task as a feature for another.

In the forward pass that’s really easy: the estimate is a Tensor, so we can wire it in just like any other layer’s output.

But what happens in backprop? Say the estimate for task A is passed as a feature to task B.

We probably wouldn’t want to propagate the gradients from task B back to task A, as we already have a label for A.

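Don’t worry, TensorFlow’s API has tf.stop_gradient for exactly this reason: wrapping a Tensor in tf.stop_gradient makes TensorFlow treat it as a constant when computing gradients. Similarly, tf.gradients accepts a stop_gradients argument, a list of Tensors you wish to treat as constants when computing the gradients. Either way, that is exactly what we need.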
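To see what stopping the gradient changes, here is a framework-free Python sketch with hand-written derivatives. It assumes a hypothetical linear chain in which task B's prediction is its own weight times task A's estimate, and both tasks use squared error; the function names and toy model are illustrative assumptions. The only difference between the two modes is whether the term coming back from task B's loss reaches task A's weight. In TensorFlow you would get the stopped behaviour by feeding tf.stop_gradient applied to task A's estimate into task B instead of the raw estimate.

```python
# Hypothetical toy chain with hand-written derivatives:
#   task A:  a_hat = w_a * x
#   task B:  b_hat = w_b * a_hat   (task A's estimate used as task B's feature)
#   loss   = (a_hat - y_a)**2 + (b_hat - y_b)**2


def grad_w_a(w_a, w_b, x, y_a, y_b, stop_gradient):
    """d(loss)/d(w_a), optionally treating a_hat as a constant inside task B."""
    a_hat = w_a * x
    b_hat = w_b * a_hat
    # Task A's own loss always trains w_a.
    grad = 2 * (a_hat - y_a) * x
    if not stop_gradient:
        # Term flowing back from task B's loss through a_hat;
        # stopping the gradient at a_hat removes exactly this term.
        grad += 2 * (b_hat - y_b) * w_b * x
    return grad


def grad_w_b(w_a, w_b, x, y_a, y_b):
    """d(loss)/d(w_b): unaffected by the stop, since a_hat is only an input to task B."""
    a_hat = w_a * x
    b_hat = w_b * a_hat
    return 2 * (b_hat - y_b) * a_hat


args = dict(w_a=1.0, w_b=2.0, x=1.0, y_a=0.5, y_b=1.0)
print(grad_w_a(**args, stop_gradient=False))  # → 5.0 (task B's error leaks into w_a)
print(grad_w_a(**args, stop_gradient=True))   # → 1.0 (only task A's label drives w_a)
print(grad_w_b(**args))                       # → 2.0 (task B still trains its own weight)
```

Note that stopping the gradient does not freeze task B: its own weight still receives the full gradient, and only the path back into task A is cut.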

Again, this is useful not only in MTL networks.

This technique can be used whenever you want to compute a value with TensorFlow but need to treat that value as a constant during backprop.

For example, when training the discriminator of a Generative Adversarial Network (GAN), you don’t want to backprop through the process that generated the fake example.

Our models are up and running, and the Taboola feed is being personalized.

However, there is still a lot of room for improvement, and lots of interesting architectures to explore.

In our use case, predicting multiple tasks also means we make a decision based on multiple KPIs.

That can be a bit more tricky than using a single KPI… but that’s already a whole new topic.

Thanks for reading, I hope you found this post useful!

Bio: Zohar Komarovsky is an Algorithms Developer at Taboola and works on Machine Learning applications for Recommendation Systems.


Reposted with permission.
