# Representation

Learning relies on the extraction and representation of the information given by a set of examples in order to understand the phenomena they describe (e.g. the discrimination between healthy and diseased people). It imitates a process of reasoning. An appropriate mathematical representation of objects is crucial for this purpose. In this project, representations alternative to feature-based descriptions are studied. They are based on the concept of proximity (e.g. dissimilarity), which is the underlying principle for considering objects as members of the same class. An object is then characterized in a relative way, i.e. as a vector of dissimilarities to a set R of n prototypes.
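The relative characterization above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the Euclidean distance is only a placeholder for whatever dissimilarity measure is appropriate to the data, and the prototype set `R` is assumed given.

```python
import math

def euclidean(a, b):
    """Placeholder dissimilarity measure (assumption: any d(x, r) could be used)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def dissimilarity_vector(x, prototypes, d):
    """Represent object x relatively: as its dissimilarities to each prototype in R."""
    return [d(x, r) for r in prototypes]

# A hypothetical representation set R of n = 3 prototypes.
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
x = (1.0, 1.0)
rep = dissimilarity_vector(x, R, euclidean)  # a length-3 vector describing x
```

The object `x` is thus no longer described by its own features, but by the 3-dimensional vector of its distances to the prototypes.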

Concerning the classification aspects, a discrimination function is built on the N x n proximity matrix D(T,R), relating N training objects to n prototypes. Novel objects are classified by their dissimilarities to the set R. The k-nearest neighbor (k-NN) rule, which assigns an object to the class most frequently represented among its k nearest neighbors, is a simple and intuitive approach. It performs well, provided that a large representation set R is available. It is, however, sensitive to noisy objects, due to its reliance on local neighborhoods. A more global approach is to map the dissimilarity data onto a (pseudo-)Euclidean space such that the dissimilarities are preserved. A classifier built in such a space can then outperform the k-NN rule.
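The k-NN rule operating directly on dissimilarities can be sketched as below. This is a minimal, assumed implementation: `diss_row` stands for one row of D(x, R) for a novel object x, and `labels` for the (assumed known) class labels of the prototypes in R.

```python
from collections import Counter

def knn_classify(diss_row, labels, k):
    """k-NN on precomputed dissimilarities.

    diss_row[j] is the dissimilarity of the novel object to prototype r_j;
    labels[j] is the class of r_j. The object is assigned to the class most
    frequently represented among its k nearest prototypes.
    """
    nearest = sorted(range(len(diss_row)), key=lambda j: diss_row[j])[:k]
    votes = Counter(labels[j] for j in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical dissimilarities of one object to 4 prototypes with known labels.
predicted = knn_classify([0.2, 0.9, 0.1, 0.8], ["a", "b", "a", "b"], k=3)
```

Note that only the dissimilarity values are used; the rule never needs the objects' original features, which is what makes the representation applicable to raw measurements.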

Another possibility is to interpret the dissimilarities as coordinates in a dissimilarity space, where each dimension corresponds to the dissimilarity to a particular prototype. The n prototypes thereby span an n-dimensional space in which traditional decision rules may be used. Our experiments show that, for example, linear or quadratic classifiers built in such a space may outperform the NN rule.

Essential research questions include:

- the study of discriminative similarity and dissimilarity measures,
- the selection or generation of an informative yet small representation set from a given training set (for the construction of a dissimilarity space or for the distance-preserving mapping),
- the use of non-metric dissimilarity measures and their possible corrections,
- the construction of classifiers specifically sensitive to dissimilarity data.

New applications are also to be explored, especially for raw measurements such as spectra or images.
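To make the representation-set selection question concrete, one simple candidate strategy is a greedy farthest-first traversal: repeatedly add the training object farthest from the prototypes chosen so far. This is only one of many possible selection schemes, sketched here as an assumption rather than the project's chosen method; `D` is the N x N dissimilarity matrix D(T,T) among training objects.

```python
def farthest_first(D, k):
    """Greedy selection of k prototype indices from D(T, T).

    Starts from object 0 (an arbitrary assumption) and repeatedly adds the
    object whose dissimilarity to its nearest already-chosen prototype is
    largest, spreading the representation set over the training data.
    """
    chosen = [0]
    while len(chosen) < k:
        best = max(range(len(D)),
                   key=lambda i: min(D[i][j] for j in chosen))
        chosen.append(best)
    return chosen

# Hypothetical D(T, T) for 4 objects on a line at positions 0, 1, 2, 10.
D = [[0, 1, 2, 10],
     [1, 0, 1, 9],
     [2, 1, 0, 8],
     [10, 9, 8, 0]]
prototypes = farthest_first(D, 2)
```

Already-chosen objects have zero dissimilarity to themselves, so they are never re-selected; the resulting indices define the columns of the reduced matrix D(T, R).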

In general, the research focuses on methodologies, designed anew or suitably adapted, for proximity representations, so that standard pattern recognition problems such as classification, novelty detection, clustering, and active learning can be solved.