Machine Learning Strategy (Part 3)

In this third and last blog post about machine learning strategy, I will talk about the problems caused by different distributions of the training, development and test sets, and about learning from multiple tasks.

Different distributions of training and test/dev data sets

In many cases the training set has a different distribution than the test set (which should be similar to the data the final model will be applied to). Ideally, the dev set should be similar to the test set. When the training set and the dev/test sets have different distributions, it is not clear where a difference in performance comes from:

→ Is the difference in performance caused by variance or by the different distributions?

Solution: Create a training-dev set, which has the same distribution as the training set but is not used for training; it is held out for evaluation only. Also keep a standard dev set with the same distribution as the test set. The gap between training error and training-dev error shows the variance (both sets share one distribution), while the gap between training-dev error and dev error shows how much of the difference comes from the different distributions (data mismatch).
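To make the diagnosis concrete, here is a minimal Python sketch, assuming the data are plain lists of examples and that error rates have already been measured on each set. The function names `split_with_training_dev` and `diagnose` and the default 10% training-dev fraction are hypothetical choices for illustration, not a fixed recipe.

```python
import random

def split_with_training_dev(train_data, dev_test_data, train_dev_fraction=0.1, seed=0):
    """Hold out part of the training distribution as a training-dev set and
    split the target-distribution data into a dev set and a test set."""
    rng = random.Random(seed)
    train = list(train_data)
    rng.shuffle(train)
    n_train_dev = int(len(train) * train_dev_fraction)
    training_dev = train[:n_train_dev]  # same distribution as training, never trained on
    training = train[n_train_dev:]
    target = list(dev_test_data)
    rng.shuffle(target)
    half = len(target) // 2
    dev, test = target[:half], target[half:]  # dev and test share the target distribution
    return training, training_dev, dev, test


def diagnose(train_error, train_dev_error, dev_error):
    """Attribute error gaps to variance or to data mismatch."""
    return {
        "variance": train_dev_error - train_error,           # same distribution, unseen data
        "data_mismatch": dev_error - train_dev_error,        # unseen data, different distribution
    }
```

For example, a training error of 1%, a training-dev error of 1.5% and a dev error of 10% would point mainly to data mismatch rather than to variance, so collecting more training data from the target distribution would likely help more than regularization.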

Learning from multiple tasks

Here I will briefly introduce the topic of learning from multiple tasks by mentioning some subtopics of this vast field.

Transfer Learning

Reuse the architectures and trained models of similar problems and adapt them to the current problem. This is especially useful if much more data is available for the similar problems than for the problem at hand.
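In deep learning this usually means taking a network pretrained on a large dataset and retraining (some of) its layers on the new data. As a much smaller stand-in, the sketch below, assuming synthetic data and a plain NumPy logistic regression, pretrains on a large "source" problem and then fine-tunes on a small, slightly different "target" problem by warm-starting from the source weights. All names and data here are illustrative assumptions.

```python
import numpy as np

def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    """Gradient-descent logistic regression; w is an optional warm start."""
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w

def accuracy(w, X, y):
    return float(np.mean((X @ w > 0) == (y == 1)))

rng = np.random.default_rng(0)
source_w = np.array([2.0, -1.0, 0.5])
# Large dataset from a related (source) problem
X_src = rng.normal(size=(5000, 3))
y_src = (X_src @ source_w > 0).astype(float)
# Small dataset for the problem at hand, with a slightly shifted decision rule
target_w = source_w + np.array([0.3, 0.0, -0.2])
X_tgt = rng.normal(size=(30, 3))
y_tgt = (X_tgt @ target_w > 0).astype(float)
X_test = rng.normal(size=(1000, 3))
y_test = (X_test @ target_w > 0).astype(float)

w_src = train_logreg(X_src, y_src)                            # pre-training on the source task
w_transfer = train_logreg(X_tgt, y_tgt, w=w_src, epochs=20)   # fine-tuning on the target task
w_scratch = train_logreg(X_tgt, y_tgt, epochs=20)             # baseline without transfer

print("transfer:", accuracy(w_transfer, X_test, y_test))
print("scratch: ", accuracy(w_scratch, X_test, y_test))
```

The warm start carries over what was learned from the 5000 source examples, so the 30 target examples only need to adjust the weights rather than learn them from zero.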

Multi-task Learning

Learn several tasks simultaneously with one model:

  • E.g. if the goal is to predict several classes at once, the problem can be converted into a multilabel or multivariate task.
  • It can be useful to combine all tasks in a single loss function
  • This only makes sense if the tasks are similar and there is a connection between them
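A common way to write the single loss function for a multilabel task is to sum one binary cross-entropy term per task. The sketch below assumes one model that outputs one logit per task (for instance, whether an image contains a car, a pedestrian or a traffic sign); the function name and the masking convention for missing labels are illustrative assumptions.

```python
import numpy as np

def multitask_loss(logits, labels, mask=None):
    """Sum of per-task binary cross-entropy losses, averaged over examples.

    logits, labels: arrays of shape (n_examples, n_tasks); labels are 0/1.
    mask: optional 0/1 array of the same shape; a 0 marks a missing label,
    which is then left out of the loss.
    """
    p = 1.0 / (1.0 + np.exp(-logits))   # per-task sigmoid, tasks are not mutually exclusive
    eps = 1e-12                          # avoids log(0)
    bce = -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    if mask is None:
        mask = np.ones_like(bce)
    return float(np.sum(bce * mask) / logits.shape[0])
```

Because every task shares the same model and the losses are simply added, the gradient of each task also shapes the shared representation used by the others, which is why the tasks should be related.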

End-to-end Learning

End-to-end learning models the whole pipeline with just one model:

  • The alternative is to divide the task into separate modeling steps
  • Example: Identifying people with a camera. This task can be divided into two steps:
    1. Detect the face in the picture and zoom in on it
    2. Use an algorithm to identify the person from the zoomed picture

      → In this case each single step is much easier to learn than one model that learns everything at once
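The two-step pipeline can be sketched as the composition of two separate functions. In the toy version below, assuming a grayscale image stored as a NumPy array, `detect_and_crop` and `identify` are hypothetical stand-ins: a real system would use a trained face detector for step 1 and a trained recognition model for step 2.

```python
import numpy as np

def detect_and_crop(image, threshold=0.5):
    """Step 1 (stand-in for a face detector): crop the bounding box of bright pixels."""
    rows, cols = np.nonzero(image > threshold)
    if rows.size == 0:
        return None  # nothing detected
    return image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

def identify(crop, gallery):
    """Step 2 (stand-in for face recognition): nearest neighbour on the zoomed crop."""
    best_name, best_dist = None, np.inf
    for name, reference in gallery.items():
        if reference.shape != crop.shape:
            continue  # this toy comparison needs equally sized crops
        dist = np.linalg.norm(crop - reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

def pipeline(image, gallery):
    """Chain the two separately designed steps."""
    crop = detect_and_crop(image)
    return None if crop is None else identify(crop, gallery)
```

Each function can be developed and tested on its own, which is exactly what the split into single steps buys compared with one end-to-end model.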

Advantages and disadvantages of end-to-end learning

Advantages:

  • Let the data speak
  • No manual design of intermediate steps or features is necessary

Disadvantages:

  • Usually more data is necessary
  • External knowledge (not available in the data) cannot be incorporated
  • Manually designed pipelines/features can possibly incorporate this knowledge

The strategies and methods mentioned here are applicable to many machine learning problems, from classical statistical problems to complex deep learning pipelines.

The end

This blog post is partly based on information from a tutorial about deep learning that I took recently. Hence, a lot of the credit for this post goes to Andrew Ng, who held this tutorial.

Feel free to post your questions and comments below.

Written on August 19, 2021