Improvement of classifier selection using meta-learning
Jesse H. Krijthe
In pattern recognition, we are presented with the problem of mapping objects to classes. Research in pattern recognition, machine learning and statistics has given us many methods of learning this mapping from training examples. This leaves open the question of which method is most applicable to a given dataset. My research concentrated on whether our current procedures for making this choice can be improved. Quite surprisingly, this turned out to be the case.
Using so-called meta-learning techniques we were able to improve on the standard procedure for selecting the best classifier: cross-validation. Meta-learning essentially means treating the classifier selection problem as yet another classification problem. In some sense we are trying to replace the pattern recognition researcher by a pattern recognition procedure! Although this approach has its own drawbacks, for instance the need to collect many datasets, the main result of my work is that, contrary to what is generally assumed, additional information that is not present in regular cross-validation can improve the classifier choice.
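The idea can be sketched with a toy example (assuming scikit-learn is available; the meta-features and base classifiers below are illustrative choices, not the ones used in this work): describe each dataset by a few summary statistics, label it with the base classifier that cross-validates best on it, and train a meta-classifier on these (meta-feature, best-classifier) pairs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
meta_X, meta_y = [], []
for i in range(20):
    # generate a small synthetic dataset with random size and dimensionality
    n_samples = rng.randint(40, 200)
    n_features = rng.randint(2, 10)
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               n_informative=2, n_redundant=0, random_state=i)
    # meta-features: simple statistics describing the dataset
    meta_X.append([n_samples, n_features, abs(y.mean() - 0.5)])
    # meta-label: which base classifier wins under cross-validation
    score_knn = cross_val_score(KNeighborsClassifier(), X, y, cv=3).mean()
    score_nb = cross_val_score(GaussianNB(), X, y, cv=3).mean()
    meta_y.append(int(score_knn >= score_nb))

# the meta-classifier recommends a base classifier for an unseen dataset
meta_model = DecisionTreeClassifier(random_state=0).fit(meta_X, meta_y)
```

For a new dataset, one would compute the same meta-features and call `meta_model.predict` to obtain a recommendation; the actual work used richer dataset characteristics than these three.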
Most of this work was carried out during a six-month research visit to the Statistics and Learning group at Alcatel-Lucent Bell Labs in New Jersey. Besides being an amazing place to learn about research, it also offered plenty of opportunities to explore nearby New York City. Ultimately, the work resulted in a conference publication that I was able to present at the International Conference on Pattern Recognition in Tsukuba, Japan. This project was supervised by Dr. Tin Kam Ho of Bell Labs and Dr. Marco Loog of Delft University of Technology.
Sample reusability in importance-weighted active learning
Gijs van Tulder
Which example should we label next?
Active learning sounds like a wonderful idea: select the most interesting examples to learn the best classifier with the least effort. Not every example is equally helpful for training your classifier. If labelling examples is expensive, it makes sense to label only those examples that you expect will give a large improvement to your classifier.
Unfortunately, using a non-random and unrepresentative selection of samples violates a basic assumption of machine learning: that the training examples are drawn from the same distribution as the data the classifier will later be applied to. It is therefore not surprising that active learning can sometimes lead to poor results, because its unnatural sample selection produces the wrong classifier.
Modern active learning algorithms try to avoid these problems and can be reasonably successful at it. One of the remaining problems is that of sample reusability: if you used active learning to select a dataset tailored for one type of classifier, can you also use that same dataset to train another type of classifier?
In my thesis I investigate the reusability of samples selected by the importance-weighted active learning algorithm, one of the current state-of-the-art active learning algorithms. I conclude that importance-weighted active learning does not solve the sample reusability problem completely: there are combinations of datasets and classifiers for which it does not work.
In fact, as I argue in the second part of my thesis, I think it is impossible to have an active learning algorithm that can always guarantee sample reusability between every pair of classifiers. To specialise your dataset for one classifier, you must necessarily exclude samples that could be useful for others. If you want to do active learning, decide what classifier you want to use before you start selecting your samples.
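The core mechanism behind importance-weighted active learning can be sketched in a few lines (a toy one-dimensional illustration, not the published algorithm, which maintains a hypothesis set rather than the single fixed boundary assumed here): each point is queried for its label with some probability, and labelled points receive weight 1/p so that weighted statistics stay unbiased despite the non-random selection.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy 1-D data with a true decision boundary at x = 0
X = rng.uniform(-1, 1, 500)
y = (X > 0).astype(int)

# importance weighting: query each point with probability p_t and give
# labelled points weight 1/p_t, so that weighted estimates stay unbiased
queried_x, labels, weights = [], [], []
boundary = 0.0  # assumed current hypothesis, fixed for this illustration
for x_t, y_t in zip(X, y):
    # query probability: high near the decision boundary, low far away
    p_t = max(0.1, 1.0 - abs(x_t - boundary))
    if rng.random() < p_t:
        queried_x.append(x_t)
        labels.append(y_t)
        weights.append(1.0 / p_t)

# despite the biased selection, the weighted positive-class rate is close
# to the true rate of 0.5
est = np.average(labels, weights=weights)
```

Without the 1/p weights, the estimate would be dominated by points near the boundary; the weights correct exactly for the selection bias, which is what makes reusing the labelled sample for another classifier plausible at all.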
This project was supervised by Marco Loog. Gijs is now doing his PhD at the Biomedical Imaging Group Rotterdam (Erasmus MC).
Robust automatic detection in a maritime environment
Automatic object detection in a maritime environment is a complex problem that is of growing importance to the Royal Netherlands Navy. Complicating factors are camera motion, the highly dynamic background, the variety in objects and their appearance, and the diversity in both meteorological and environmental circumstances. Due to these factors the problem is far too complex for the conventional indoor and outdoor detection schemes described in the literature.
Although an initially developed detection algorithm based on polynomial background estimation is well capable of detecting a variety of objects in various circumstances, it also produces an extensive number of false detections. During the project it was investigated whether these false detections could be successfully eliminated by classifying the detections as either 'target' or 'background'. To this end, the initial detection algorithm was optimised to detect as many objects as possible in a carefully constructed dataset of eight hundred visible-light images. The resulting detections from the optimised algorithm were then used to train and test various basic classifiers, using a set of features found in the literature as a starting point. Finally, the performance of the new two-stage detection algorithm was analysed in detail with respect to its parameters. Results show that the developed classification approach is capable of eliminating many false detections while retaining a majority of the true detections. Even though a significant performance improvement was achieved, the solution is still not perfect and opportunities for improvement remain. Further research is therefore recommended; suggested topics for closer examination are separate classifiers for the sea and sky parts of the image, inclusion of the time dimension, optimisation of the classifier's operating point, and pre-processing steps.
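The two-stage idea, detecting candidates first and then classifying each detection as 'target' or 'background', can be sketched as follows (assuming scikit-learn; the detection features and classifier here are hypothetical stand-ins for the ones evaluated in the project):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# hypothetical feature vectors for candidate detections from the first
# stage (e.g. contrast, size, vertical position);
# labels: 1 = true target, 0 = background / false detection
true_targets = rng.normal([2.0, 1.5, 0.0], 1.0, size=(150, 3))
false_alarms = rng.normal([0.0, 0.0, 1.0], 1.0, size=(150, 3))
X = np.vstack([true_targets, false_alarms])
y = np.array([1] * 150 + [0] * 150)

# second stage: classify each detection as 'target' or 'background'
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# keep only the detections classified as targets
keep = clf.predict(X) == 1
```

Moving the operating point of `clf` (e.g. thresholding `predict_proba` instead of using the default 0.5) trades retained true detections against eliminated false ones, which is the parameter analysis the summary refers to.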
Random subspace method for one-class classifiers
Prime Vision specializes in the development of optical character recognition (OCR) techniques, including the automatic sorting of parcels. The first step in sorting a parcel is to find the location where the address is written. This is done by searching for "interesting" rectangular blocks on the parcel, which probably contain text and, therefore, an address. Once an "interesting" block is located, it can be run through OCR to obtain an address. However, locating only address blocks is challenging, because parcels often contain several "interesting" blocks, including irrelevant (outlier) blocks such as stamps or barcodes. Eventually, these outliers are discarded because OCR does not return meaningful text, or the text that is returned does not form a meaningful address. Unfortunately, time is wasted processing these outliers in OCR; detecting them before the OCR step would therefore be advisable.
The project consisted of investigating whether a one-class classifier would be able to distinguish text blocks from everything else. Such a classifier only needs examples of target data, i.e. only text blocks, in contrast to a traditional classifier, which would need examples of all types of data. This property of outlier detectors is particularly useful in situations where it is too difficult to create a description of the outliers. This may be the case when too few outlier examples are available, or when the outlier examples are of many different types, such as barcodes, stamps and so forth.
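A minimal sketch of the one-class approach (assuming scikit-learn, with `OneClassSVM` as an illustrative stand-in for the classifiers used in the project, and synthetic features in place of the real block features):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# synthetic stand-ins: target (text-block) features cluster together,
# while outliers (stamps, barcodes, ...) lie far from that cluster
targets = rng.normal(0.0, 1.0, size=(200, 5))
outliers = rng.normal(6.0, 1.0, size=(50, 5))

# the one-class classifier is trained on target examples only;
# nu bounds the fraction of training targets treated as outliers
clf = OneClassSVM(nu=0.05, gamma="scale").fit(targets)

# predict: +1 = accepted as target, -1 = rejected as outlier
accepted = (clf.predict(targets) == 1).mean()
rejected = (clf.predict(outliers) == -1).mean()
```

Note that the outliers never appear during training; they are only used here to measure how many would be filtered out before the OCR step.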
Experiments showed that by applying a one-class classifier to the parcel images, more than 90% of outliers could be filtered out successfully. As only already-available features were used, this detector had almost no overhead, because no extra feature-extraction step was needed. Further investigation showed that training the classifiers on random subsets of the available features and then combining their predictions further improved the results.
This project was performed in 2010 under the supervision of David M.J. Tax and Theo van der Donk (Prime Vision). A related publication can be found here.
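The random subspace combination can be sketched as follows (synthetic data again, with `OneClassSVM` as an illustrative member classifier): each member is trained on a random subset of the features, and the ensemble decides by majority vote.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
n_features = 10
targets = rng.normal(0.0, 1.0, size=(300, n_features))
outliers = rng.normal(4.0, 1.0, size=(60, n_features))

# random subspace method: train each one-class member on a random
# 5-feature subset, then combine the members by majority vote
subspaces = [rng.choice(n_features, size=5, replace=False) for _ in range(11)]
members = [OneClassSVM(nu=0.05, gamma="scale").fit(targets[:, s])
           for s in subspaces]

def vote(X):
    # each member votes +1 (target) or -1 (outlier); with 11 members the
    # summed vote is never zero, so np.sign gives a clean majority decision
    votes = np.stack([m.predict(X[:, s]) for m, s in zip(members, subspaces)])
    return np.sign(votes.sum(axis=0))

outlier_rejection = (vote(outliers) == -1).mean()
target_acceptance = (vote(targets) == 1).mean()
```

The vote averages out the idiosyncratic mistakes of individual members, each of which only sees part of the feature space, which is one plausible reading of why the combination improved on the single classifier.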