Machine Learning Strategy (Part 2)

This is the second blog post about machine learning strategy. It covers human-level performance, the bias-variance tradeoff, and how to improve your algorithm iteratively.

Human-Level Performance

  • Human-level performance is the performance that can be achieved by a human, e.g. by looking at the data.
  • An algorithm should at least reach human-level performance; if not, there is potential for improvement
  • Improvement can be achieved by e.g.
    • Generating more training data (e.g. using humans)
    • A manual analysis of the observations on which the algorithm performs badly
    • An analysis of the bias-variance tradeoff (see the next section)
  • The data may need to be structured differently for humans and computers; whereas humans can take in the information of a picture with their eyes, for computers this information has to be transformed into numbers.
  • The Bayes error is defined as the lowest error that could theoretically be achieved by any algorithm on the task; it is therefore always at or below the human-level error

Bias and Variance

There is a tradeoff between bias and variance when training machine learning algorithms. I will differentiate here between avoidable bias and variance:

  • Avoidable bias: difference between the training error and the Bayes error (in practice approximated by the human-level error)
  • Variance: difference between the training error and the development (dev) error.

[Graphic: avoidable bias and variance]
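
The following minimal sketch shows how these two quantities can be computed; the error values are made up for illustration, with the human-level error used as a proxy for the (unknown) Bayes error.

```python
# Minimal sketch with made-up error values.
human_error = 0.01   # assumed human-level error (proxy for the Bayes error)
train_error = 0.05   # assumed error on the training set
dev_error = 0.10     # assumed error on the dev set

avoidable_bias = train_error - human_error  # gap to the best achievable error
variance = dev_error - train_error          # gap between train and dev error

print(f"avoidable bias: {avoidable_bias:.2f}")  # 0.04
print(f"variance:       {variance:.2f}")        # 0.05
```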

Possible improvements:

[Graphic: possible improvements for reducing bias and variance]

Notes:

  • The avoidable bias should be small, but not too small (otherwise there is a risk of overfitting)
  • Depending on the human-level, training and dev errors, either bias reduction or variance reduction should be targeted (a sketch of this decision follows these notes)
  • If the algorithm's dev-set performance is better than human-level performance, there is no clear indicator of whether bias or variance should be reduced
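
As a rough sketch of this decision, assuming the three error rates have already been measured; the example values and the suggested remedies are illustrative, not a fixed recipe.

```python
def diagnose(human_error: float, train_error: float, dev_error: float) -> str:
    """Suggest whether to focus on bias or variance reduction."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    if dev_error < human_error:
        # Better than human-level performance: no clear indicator anymore.
        return "better than human-level performance, no clear indicator"
    if avoidable_bias >= variance:
        return "focus on reducing avoidable bias (e.g. bigger model, train longer)"
    return "focus on reducing variance (e.g. more data, regularisation)"

print(diagnose(human_error=0.01, train_error=0.05, dev_error=0.06))  # bias
print(diagnose(human_error=0.01, train_error=0.02, dev_error=0.10))  # variance
```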

Examples:

[Graphic: examples of bias/variance diagnosis]

Error analysis

The following steps can be taken in an error analysis:

  • Collect the observations that cause problems (misclassifications, large deviations)
  • Identify the cause of the error for each of them
  • Create a table with the percentage of the total error attributable to each cause (a small sketch follows the cat example below)
  • Identify the main causes that are responsible for the poor performance

Example for image classification with cats:

[Graphic: error analysis table for the cat classifier]
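
A minimal sketch of how such a table can be produced; the causes and the tagged misclassifications below are invented for illustration and are not taken from the graphic.

```python
from collections import Counter

# Hypothetical tags assigned to each misclassified image during manual review.
misclassified = [
    "blurry image", "dog labelled as cat", "blurry image", "incorrect label",
    "great cat (lion, panther)", "blurry image", "dog labelled as cat",
]

counts = Counter(misclassified)
total = sum(counts.values())

print(f"{'cause':28s}{'count':>7s}{'% of errors':>14s}")
for cause, count in counts.most_common():
    print(f"{cause:28s}{count:7d}{100 * count / total:13.1f}%")
```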

Concentrate on the problems that cause the lion's share of the poor performance:

  • Can the problems be solved by adjusting the algorithm? (new features, other hyperparameters, other algorithms)
  • Can the algorithm be improved by removing certain observations from the dataset (e.g. outliers, incorrectly labelled observations)?

General advice:

  • Set up the train/dev/test data sets (a minimal split sketch follows this list)
  • Build the first algorithm quickly (on train)
  • Execute a bias-variance and error analysis to detect the main problems (on train/dev)
  • Make improvements to the data set/algorithm and analyse the resulting algorithm again (on train/dev)
  • Do the final evaluation on test
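
A minimal sketch of the first step, assuming scikit-learn is available; the random data and the 80/10/10 split ratio are placeholders, not a recommendation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)        # placeholder features
y = np.random.randint(0, 2, 1000)   # placeholder binary labels

# First split off the training set, then split the rest into dev and test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

print(len(X_train), len(X_dev), len(X_test))  # 800 100 100
```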

→ "Build your first system quickly, then iterate to improve it"

→ In most cases data scientists build procedures/algorithms that are too complex rather than too simple, so think about the complexity of the machine learning system that you have built.

Next blog post

In the next blog post I will talk about what to do when the train/dev/test sets come from different distributions, how to learn multiple tasks at once, and the advantages and disadvantages of end-to-end learning.

This blog post is partly based on information contained in a tutorial about deep learning on coursera.org that I took recently. Hence, a lot of credit for this post goes to Andrew Ng, who taught this tutorial.

Written on August 13, 2021