Relations between machine learning problems

Workshop Description


The workshop focuses on relations between machine learning problems. We use “relation” quite generally to include (but not limit ourselves to) notions such as:
  • one type of problem being viewed as a special case of another type (e.g., classification as thresholded probability estimation);
  • reductions between learning problems (e.g., transforming ranking problems into classification problems);
  • the use of surrogate losses (e.g., replacing misclassification loss with some other, convex loss);
  • relations between sets of learning problems, such as those studied in the (old) theory of “comparison of experiments”;
  • connections between machine learning problems and what could be construed as “economic learning problems”, such as prediction markets and forecast elicitation.
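As a concrete illustration of the reduction mentioned above, the classical pairwise reduction turns a ranking problem into a binary classification problem: every pair of items with different relevance yields one binary example whose feature is the difference of the item features. A minimal Python sketch (the function name and toy data are ours, purely illustrative):

```python
def ranking_to_pairwise(items):
    """Reduce a ranking problem to binary classification.

    items: list of (feature_vector, relevance) pairs for one query.
    Returns a list of (feature_difference, label) binary examples,
    where label = +1 if the first item should rank above the second.
    """
    examples = []
    for i, (xi, yi) in enumerate(items):
        for xj, yj in items[i + 1:]:
            if yi == yj:
                continue  # ties carry no preference information
            diff = [a - b for a, b in zip(xi, xj)]
            label = 1 if yi > yj else -1
            examples.append((diff, label))
    return examples

# Toy query: three documents with 2-d features and graded relevance.
docs = [([1.0, 0.0], 2), ([0.0, 1.0], 1), ([0.5, 0.5], 1)]
pairs = ranking_to_pairwise(docs)
# The tied pair is skipped, leaving two informative binary examples.
```

A binary classifier trained on such pairs induces a scoring function whose pairwise errors relate to the ranking loss; making such relations explicit is exactly the kind of connection the workshop is about.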


The point of studying relations between machine learning problems is that doing so offers a plausible route to understanding the field of machine learning as a whole. It could serve to prevent re-invention and accelerate the growth of new methods. The motivation is not dissimilar to Hal Varian’s notion of combinatorial innovation. Another analogy is the development of function theory in the 19th century: rapid advances were made possible by functional analysis, which, rather than studying individual functions, studied operators that transform one function into another.

Much recent work in machine learning, and indeed some older work, can be interpreted as relations between problems. For example:
  • learning with real-valued functions in the presence of noise can be reduced to multiclass classification;
  • Comparison of Experiments involves comparison of families of machine learning problems (where the comparison has to hold for all loss functions).
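The surrogate-loss relation listed earlier is easy to state concretely: convex losses such as the hinge loss and a suitably scaled logistic loss upper-bound the misclassification (0-1) loss as functions of the margin, which is what makes minimising them a sensible proxy. A small self-contained sketch (illustrative, not part of the workshop text):

```python
import math

def zero_one_loss(margin):
    # Misclassification loss on the margin y * f(x).
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin):
    # Convex surrogate used by support vector machines.
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    # Logistic loss, scaled by log 2 so it upper-bounds the 0-1 loss.
    return math.log2(1.0 + math.exp(-margin))

# Both surrogates dominate the 0-1 loss at every margin value checked.
for m in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert hinge_loss(m) >= zero_one_loss(m)
    assert logistic_loss(m) >= zero_one_loss(m)
```

Replacing the 0-1 loss by such a surrogate is itself a relation between problems: a non-convex classification problem is traded for a convex optimisation problem whose minimiser still yields a good classifier.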
If one attempts to construct a catalogue of machine learning problems at present, one is rapidly overwhelmed by the complexity. And it is not at all clear (on the basis of the usual descriptions) whether or not two problems with different names are really different.

(If the reader is unconvinced, consider the following partial list: batch, online, transductive, off-training-set, semi-supervised, noisy (label noise, attribute noise, constant / variable noise, data of variable quality), data of different costs, weighted loss functions, active, distributed, classification (binary, weighted binary, multi-class), structured output, probabilistic concepts / scoring rules, class probability estimation, learning with statistical queries, Neyman-Pearson classification, regression, ordinal regression, ranked regression, ranking, ranking the best, optimising the ROC curve, optimising the AUC, selection, novelty detection, multi-instance learning, minimum volume sets, density level sets, regression level sets, sets of quantiles, quantile regression, density estimation, data segmentation, clustering, co-training, co-validation, learning with constraints, conditional estimators, estimated loss, confidence / hedging estimators, hypothesis testing, distributional distance estimation, learning relations, learning total orders, learning causal relationships, and estimating performance (cross-validation)!)

Current Attempts

There are a few current attempts to build a better understanding of machine learning via relations between problems. One such attempt (by some of the organisers) is the Reconceiving Machine Learning project.

Desired Outcomes

  • New relations between learning problems, not individual solutions to individual problems;
  • Visibility and promulgation of the “meme” of relating problems;
  • Potential agreement on a shared community effort to build a comprehensive map of the relations between machine learning problems.