Finished projects

Natural language processing and machine learning for finance

D.M.J. Tax

1. Natural Language Processing and Artificial Intelligence; deals with the application of computational models to textual information. This involves investigating how to structure statistical models of natural language, and how to design and implement AI algorithms. Our goal is to parse financial deals by analyzing the loan agreements.

2. Machine learning; This assignment includes the development of algorithms that discover knowledge from specific textual data combined with quantitative data, based on statistical and computational principles. It also includes the development of learning processes that illustrate the nature of the computations and experience necessary for successful learning in humans and machines. Our goal is to predict, through a clustering algorithm, which financial agreements are best suited to each client.
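The clustering step mentioned in assignment 2 could, as a first baseline, be something as simple as k-means. The sketch below is a minimal pure-Python version run on hypothetical two-feature client profiles; the data and the feature choice (loan size, risk score) are illustrative assumptions, not part of the assignment.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means clustering; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid per point (squared Euclidean)
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

# Hypothetical client profiles: (normalized loan size, risk score)
clients = [(1.0, 0.1), (1.2, 0.2), (0.9, 0.15),
           (5.0, 0.9), (5.2, 0.8), (4.8, 0.95)]
centroids, labels = kmeans(clients, k=2)
```

A real system would of course cluster on many more attributes and validate the number of clusters; this only shows the mechanic of grouping similar clients.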

Maintenance prediction for trains

D.M.J. Tax

Introduction: NedTrain is the part of the Dutch Railways (NS Group) responsible for the cleaning, maintenance & service, and overhaul of rolling stock. To ensure all trains are reliable and safe to operate at the lowest cost, NedTrain is continuously optimizing the maintenance schedule to plan when and what to maintain.

Project: The trains carry diagnosis systems that continuously register the diagnosis codes of various modules such as bogie, traction, climate, brake electronics and so forth. A diagnosis code is generated by a rule-based system when abnormal or unusual behaviour occurs on a module's sensors. Based on the diagnosis codes and/or sensor measurements, we would like to build a prognostic model of train conditions to prevent train failures during operation.
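As a purely illustrative baseline (not NedTrain's actual system), a first prognostic signal could be a sliding-window count of diagnosis codes per module, flagging modules whose recent code frequency reaches a threshold. All module names and numbers below are hypothetical.

```python
from collections import Counter

def flag_modules(events, window, threshold):
    """events: list of (timestamp, module) diagnosis codes.  Flag modules
    whose code count within `window` time units of the latest event
    reaches `threshold` (a crude frequency-based prognostic baseline)."""
    if not events:
        return []
    latest = max(t for t, _ in events)
    recent = Counter(m for t, m in events if latest - t <= window)
    return sorted(m for m, n in recent.items() if n >= threshold)

# Hypothetical event log: (hour, module)
log = [(1, "brake"), (2, "climate"), (95, "traction"),
       (96, "traction"), (98, "traction"), (99, "bogie")]
flag_modules(log, window=10, threshold=3)  # → ["traction"]
```

A learned model would replace the fixed threshold with a classifier trained on historical failures, but the windowed code counts themselves are a plausible input representation.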

Transaction monitoring

D.M.J. Tax

The issue: Shady transactions. They happen all over the world, but for us they will become a recurring, frequent issue that we need to mitigate. Whether it's terrorist funding or wire transfers to that eccentric Nigerian prince, we have a responsibility to tackle these types of transactions and spot them from miles away.

What we need: A monitoring system that filters out any fishy transactions that happen within our banking environment. Proper detection and alerts when something strange happens are our first line of defence in ridding the world of white-collar criminality or other glitches in the compliance matrix. You will design and build the system, of course in cooperation with your personal supervisor. If you deliver great work, we'll naturally implement it in our systems.
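One simple starting point for such a filter (a statistical first pass, far from a complete monitoring system) is flagging transactions whose amount deviates strongly from the median, using the robust median absolute deviation (MAD). The amounts and the cutoff are illustrative assumptions.

```python
import statistics

def flag_transactions(amounts, cut=5.0):
    """Flag indices whose amount deviates from the median by more than
    `cut` times the median absolute deviation (MAD).  MAD is used instead
    of the standard deviation because a single huge outlier would inflate
    the standard deviation and mask itself."""
    med = statistics.median(amounts)
    mad = statistics.median(abs(a - med) for a in amounts)
    if mad == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(a - med) / mad > cut]

# Hypothetical daily wire amounts in euros
amounts = [120, 95, 110, 130, 105, 98, 50_000]
flag_transactions(amounts)  # → [6]
```

In practice one would combine many features (counter-party, frequency, geography) and a trained detector, but a robust univariate filter like this is a common first alerting rule.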

Suggested reading: • Intelligent agents and financial risk monitoring systems.

What will the work environment look like for the student? You'll join in on the fun of 35 smart, dedicated people (10+ nationalities) that are on a common mission to revolutionise the financial sector. You'll be based in Amsterdam, right next to Sloterdijk station. We'll of course supply you with a laptop and a solid monthly pay (€500). Being a tech startup, we have all the perks to match, from pingpong and free beer to epic bi-monthly team events.

What are your expectations of the student? Be passionate about your work and have a drive to learn. That's all we expect. We value creativity and smart solutions. In return for your hard work, we give you a personal supervisor that knows what he is talking about. Any questions? Don't hesitate to get in touch!

Offshore Subsea Object Detection

D.M.J. Tax

Project: ECE Offshore is developing a new sensor line with its first use case in offshore cable installation. The sensor collects subsea point cloud data, which contains information on the seabed and the objects in the measurement range. The goal of the sensor is to estimate the shape and location of the subsea objects. The data are also disturbed by various physical processes; these disturbances can be addressed by model-based solutions, and the objects must be classified in real time. This classification is especially challenging and a good topic for in-depth research.

ECE Offshore: ECE Offshore is a young and fast-growing player in the offshore installation market. ECE Offshore provides engineering, consultancy and equipment solutions to the offshore oil, gas and renewable industries. Our company size and the flexible mentality of our engineers allow us to act with agility on technically challenging and innovative opportunities in the offshore industry. You can work in our open-office space in Wateringen, where you’ll be part of our team and experience real-world offshore engineering.

Credit Risk and Modelling at ABN-AMRO

D.M.J. Tax

Within the banking industry, risk management has been transformed over the past decade, largely due to regulations that arose from the financial crisis. At ABN AMRO we believe that risk management could undergo even more change in the next decade. Trends that shape the risk function of ABN AMRO come from many directions. Within Credit Risk & Modelling we focus on evolving technology and advanced analytics, such as machine learning, that are enabling new risk management techniques. Currently, we are experimenting with self-learning algorithms in credit loss prediction, with encouraging results. We are therefore looking for three highly skilled students who will help us develop a text mining framework, grounded in academic research, to enable future success.

Within Credit Risk and Modelling there are three challenging internship assignments:

  1. Natural Language Processing and Artificial Intelligence; deals with the application of computational models to textual information. This involves investigating how to structure statistical models of natural language, and how to design and implement AI algorithms.
  2. Text Mining framework; development and implementation of a text mining framework. This includes research in text mining techniques used to discover useful knowledge implicit in text sources.
  3. Machine learning; This assignment includes the development of algorithms that discover knowledge from specific textual data based on statistical and computational principles. It also includes the development of learning processes that illustrate the nature of the computations and experience necessary for successful learning in humans and machines. This assignment specifically includes the development of an artificial neural network.
Improvement of classifier selection using meta-learning

    Jesse H. Krijthe

    In pattern recognition, we are presented with the problem of mapping objects to classes. Research in pattern recognition, machine learning and statistics has given us many methods of learning this mapping from training examples. This leaves the question of which method is most applicable to a given dataset. My research concentrated on whether our current procedures for making this choice can be improved. Quite surprisingly, this turned out to be the case.

    Using so-called meta-learning techniques, we were able to improve the selection of the best classifier over the standard procedure: cross-validation. Meta-learning essentially means treating the classifier selection problem as another classification problem. In some sense we are trying to replace the pattern recognition researcher by a pattern recognition procedure! Although this approach has its own drawbacks, for instance the need to collect many datasets, the main result of my work is that, contrary to what is generally assumed, information that is not present in regular cross-validation can improve the classifier choice.
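The standard procedure mentioned above, cross-validation, can be sketched in a few lines. The toy setup below (two simple classifiers and synthetic, well-separated data, all illustrative assumptions) selects the classifier with the highest k-fold cross-validated accuracy:

```python
import random

def nearest_mean_fit(X, y):
    """Fit a nearest-mean classifier; returns a predict function."""
    means = {}
    for c in set(y):
        pts = [x for x, label in zip(X, y) if label == c]
        means[c] = tuple(sum(v) / len(pts) for v in zip(*pts))
    return lambda x: min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))

def one_nn_fit(X, y):
    """Fit a 1-nearest-neighbour classifier; returns a predict function."""
    return lambda x: min(zip(X, y), key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p[0])))[1]

def cv_select(X, y, fitters, folds=5, seed=0):
    """Return (best_name, scores): k-fold cross-validated accuracy per fitter."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = {}
    for name, fit in fitters.items():
        correct = 0
        for f in range(folds):
            test = set(idx[f::folds])
            train = [i for i in idx if i not in test]
            predict = fit([X[i] for i in train], [y[i] for i in train])
            correct += sum(predict(X[i]) == y[i] for i in test)
        scores[name] = correct / len(X)
    return max(scores, key=scores.get), scores

# Hypothetical, well-separated two-class toy data
rng = random.Random(1)
X, y = [], []
for _ in range(20):
    X.append((rng.uniform(-1, 1), rng.uniform(-1, 1))); y.append(0)
    X.append((8 + rng.uniform(-1, 1), 8 + rng.uniform(-1, 1))); y.append(1)
best, scores = cv_select(X, y, {"nearest-mean": nearest_mean_fit, "1-nn": one_nn_fit})
```

Meta-learning replaces the `max` over these scores with a learned decision that can also use dataset properties not visible to cross-validation.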

    Most of this work was carried out during a six-month research visit to the Statistics and Learning group at Alcatel-Lucent Bell Labs in New Jersey. Besides being an amazing place to learn about research, it also offered plenty of opportunities to explore nearby New York City.

    Ultimately, the work resulted in a conference publication that I was able to present at the International Conference on Pattern Recognition in Tsukuba, Japan. This project was supervised by Dr. Tin Kam Ho of Bell Labs and Dr. Marco Loog of Delft University of Technology.


    Sample reusability in importance-weighted active learning

    Gijs van Tulder

    Which example should we label next?

    Active learning sounds like a wonderful idea: select the most interesting examples to learn the best classifier with the least effort. Not every example is equally helpful for training your classifier. If labelling examples is expensive, it makes sense to label only those examples that you expect will give a large improvement to your classifier.

    Unfortunately, using a non-random and unrepresentative selection of samples violates the basic rules of machine learning. It is therefore not surprising that active learning can sometimes lead to poor results because its unnatural sample selection produces the wrong classifier.

    Modern active learning algorithms try to avoid these problems and can be reasonably successful at it. One of the remaining problems is that of sample reusability: if you used active learning to select a dataset tailored for one type of classifier, can you also use that same dataset to train another type of classifier?

    In my thesis I investigate the reusability of samples selected by the importance-weighted active learning algorithm, one of the current state-of-the-art active learning algorithms. I conclude that importance-weighted active learning does not solve the sample reusability problem completely. There are datasets and classifiers where it does not work.

    In fact, as I argue in the second part of my thesis, I think it is impossible to have an active learning algorithm that can always guarantee sample reusability between every pair of classifiers. To specialise your dataset for one classifier, you must necessarily exclude samples that could be useful for others. If you want to do active learning, decide what classifier you want to use before you start selecting your samples.
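The importance-weighting idea behind the algorithm studied here can be illustrated with a small sketch: examples are kept with some probability p and, when kept, receive weight 1/p, so that weighted statistics remain unbiased even under a heavily skewed selection. The selection rule below is an arbitrary illustration, not the actual querying strategy of importance-weighted active learning.

```python
import random

def iw_sample(pool, prob_fn, seed=0):
    """Keep each example with probability prob_fn(x); attach importance
    weight 1/p so weighted statistics over the sample stay unbiased."""
    rng = random.Random(seed)
    sample = []
    for x in pool:
        p = prob_fn(x)
        if rng.random() < p:
            sample.append((x, 1.0 / p))
    return sample

# Hypothetical pool: half zeros, half ones; the (arbitrary) selection
# rule keeps ones far more often than zeros.
pool = [0] * 1000 + [1] * 1000
sample = iw_sample(pool, lambda x: 0.9 if x == 1 else 0.1)

# The unweighted average over the sample is badly biased towards 1;
# the importance-weighted average recovers the true proportion (0.5).
naive = sum(x for x, w in sample) / len(sample)            # close to 0.9
weighted = sum(x * w for x, w in sample) / sum(w for _, w in sample)  # ≈ 0.5
```

This unbiasedness is what lets importance-weighted active learning keep consistency guarantees despite its non-random sample; the reusability question is whether a *different* classifier trained on the same weighted sample still benefits.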

    This project was supervised by Marco Loog. Gijs is now doing his PhD at the Biomedical Imaging Group Rotterdam (Erasmus MC).


    Robust automatic detection in a maritime environment

    Maarten Hartemink

    Automatic object detection in a maritime environment is a complex problem that is of growing importance to the Royal Netherlands Navy. Complicating factors are camera motion, the highly dynamic background, the variety in objects and their appearance, and the diversity in both meteorological and environmental circumstances. Due to these factors, the problem is far too complex for conventional indoor and outdoor detection schemes as described in the literature.

    Although an initially developed detection algorithm based on polynomial background estimation is well capable of detecting a variety of objects in various circumstances, it also produces an extensive number of false detections. The project investigated whether these false detections can be successfully eliminated by classifying the detections as either 'target' or 'background'. To this end, the initial detection algorithm was optimised to detect as many objects as possible in a carefully constructed dataset of eight hundred visible-light images. The resulting detections from the optimised algorithm were then used to train and test various basic classifiers, using a set of features found in the literature as a starting point. Finally, the performance of the new two-stage detection algorithm was analysed in detail with respect to its parameters. Results show that the developed classification approach is capable of eliminating many false detections while retaining a majority of the true detections. Even though a significant performance improvement has been achieved, the solution is still not perfect and opportunities for improvement are left unexploited. Further research is therefore recommended; suggested for closer examination are: separate classifiers for the sea and sky parts, inclusion of the time dimension, optimisation of the operating point of the classifier, and pre-processing steps.


    Random subspace method for one-class classifiers

    Veronika Cheplygina

    Prime Vision specializes in the development of optical character recognition (OCR) techniques, including the automatic sorting of parcels. The first step in sorting a parcel is to find the location where the address is written. This is done by searching for "interesting" rectangular blocks on the parcel, which probably contain text and therefore an address. Once an "interesting" block is located, it can be run through OCR to obtain an address. However, locating only address blocks is challenging, because parcels often contain several "interesting" blocks, including irrelevant (outlier) blocks such as stamps or barcodes. Eventually, these outliers are discarded because OCR does not return meaningful text, or the text that is returned does not form a meaningful address. Unfortunately, time is wasted processing these outliers in OCR; detecting them before the OCR step would therefore be advisable.

    The project consisted of investigating whether a one-class classifier would be able to distinguish text blocks from everything else. Such a classifier only needs examples of target data, i.e. only text blocks, in contrast to a traditional classifier, which would need examples of all types of data. This property of outlier detectors is particularly useful in situations where it is too difficult to create a description of the outliers. This may be the case when too few outlier examples are available, or when the outlier examples are of many different types, such as barcodes, stamps and so forth.

    Experiments showed that by applying a one-class classifier to the parcel images, more than 90% of the outliers could be filtered out successfully. Since only features that were already available were used, the detector added almost no overhead: no extra feature extraction step was needed. Further investigation showed that training the classifiers on random subsets of the available features and then combining their predictions improved the results further.
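The random subspace idea can be sketched as follows: train several very simple one-class detectors (here, bounding boxes with a margin, a deliberate simplification of the classifiers used in the project), each on a random subset of the features, and accept a sample only if a majority of the detectors accept it. All feature values are hypothetical.

```python
import random

def fit_subspace_boxes(targets, n_detectors=10, subspace=2, margin=0.1, seed=0):
    """Train simple one-class detectors (bounding boxes around the target
    class), each on a random subset of the features."""
    rng = random.Random(seed)
    dims = len(targets[0])
    detectors = []
    for _ in range(n_detectors):
        feats = rng.sample(range(dims), subspace)
        box = {f: (min(t[f] for t in targets) - margin,
                   max(t[f] for t in targets) + margin) for f in feats}
        detectors.append(box)
    return detectors

def is_target(x, detectors):
    """Combine the subspace detectors by majority vote."""
    votes = sum(all(lo <= x[f] <= hi for f, (lo, hi) in d.items()) for d in detectors)
    return votes > len(detectors) / 2

# Hypothetical 4-D feature vectors for target ("text") blocks
targets = [(0.2, 0.3, 0.1, 0.4), (0.3, 0.2, 0.2, 0.5), (0.25, 0.4, 0.15, 0.45)]
detectors = fit_subspace_boxes(targets)
inlier = is_target((0.25, 0.3, 0.15, 0.45), detectors)   # True
outlier = is_target((5.0, 5.0, 5.0, 5.0), detectors)     # False
```

The combining step is the essential part: individually weak subspace detectors can together give a more robust accept/reject decision than one detector on all features at once.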

    This project was performed in 2010 under the supervision of David M.J. Tax and Theo van der Donk (Prime Vision). A related publication can be found here.

    Classification of noise events

    D.M.J. Tax and J. Koolhaas (Geluidsnet Sensornet B.V.)

    Since 2003, the company Geluidsnet has provided reliable, objective data on noise levels and patterns in residential, industrial and entertainment areas. Geluidsnet specializes in long-term, reliable measurements of environmental sounds and in effective reporting of current and historical measurement results and analyses to customers. These customers include civil governments, which use the data for policy making, information provision and enforcement, as well as lobby groups, house owners, the catering industry and festival organizers, who use Geluidsnet to show noise levels and patterns. More information is available at

    The research focuses on the classification of different noise sources. The goal is to find a set of generic feature vectors that can be used to train one-class classifiers to recognize different noise sources from audio recordings. A labeled training set is available, including airplane, helicopter, train, automobile and motorbike noise. This extension to our current measuring services will enable us to separate the investigated noise source from background noise or other noise sources with our low-tech, cost-efficient measuring units.

    The generic approach enables us to use the same feature vectors to train classifiers for as-yet-unknown future customer requests. The trained pattern recognizer should be computationally efficient, so that it can run on the embedded computers in our 100 measuring units in the Netherlands.
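Cheap, generic frame-level audio features of the kind such an embedded unit could afford might look like the sketch below (log energy and zero-crossing rate per frame); the frame length and feature choice are illustrative assumptions, not Geluidsnet's actual features.

```python
import math

def frame_features(signal, frame_len=256):
    """Split a raw signal into fixed-length frames and compute two cheap,
    generic features per frame: log energy and zero-crossing rate.
    Both are O(n) per frame, which suits an embedded measuring unit."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (frame_len - 1)
        feats.append((math.log(energy + 1e-12), zcr))
    return feats

# Sanity check on synthetic tones: a high-frequency tone yields a much
# higher zero-crossing rate than a low-frequency tone of equal amplitude.
low = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
high = [math.sin(2 * math.pi * 50 * t / 256) for t in range(256)]
low_feats, high_feats = frame_features(low), frame_features(high)
```

Feature vectors like these could then feed the one-class classifiers described above, one per noise source.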

    Computer-aided detection and characterization of interstitial lung disease

    D.M.J. Tax

    The goal of this project is to automatically detect interstitial lung diseases. The interstitium of the lung is the tissue connecting the blood vessels and the alveoli, the tiny air sacs. Interstitial illnesses often reveal themselves through a changed 'textural appearance' of the lungs. To detect these interstitial diseases, radiographs (X-rays) have to be made, and uncharacteristic textures have to be characterized and detected. Due to advances in computed tomography (CT, which uses special X-ray equipment to obtain many images from different angles that are then combined into a much higher resolution image), it is now possible to replace the radiographs by much higher resolution scans. In these scans, textures can be characterized much better than with the currently available systems.

    Judgment of steam cleaned panels by means of machine vision

    D.M.J. Tax

    The goal is the development of machine vision applications for our Physical Laboratory. This lab carries out hundreds of visual judgments per month, involving the determination of rating scores for cracks, blisters, delamination, haze, etc. on coated test panels according to standard methods. We are in the process of developing vision applications; to reach the target, at least another five years of research will be required. The development of proprietary vision methods arises from the fact that commercially available instruments fail to deliver satisfactory test results.

    The Physical and Analytical Laboratories of AkzoNobel Automotive & Aerospace Coating in Sassenheim were founded in 1939 and have played an important role in the development and production of our coating products since. Over the years, capabilities were expanded with a Microscopy Laboratory, an Expertise Group and a group specialized in High Throughput Experimentation (HTE). All groups are now part of the R&D Service Unit of Automotive & Aerospace Coatings (A&AC). The groups within the R&D Service Unit of A&AC maintain a wide range of technical capabilities and expertise to support our customers in the field of coating research. Over the years we have developed particular areas of expertise where the combination of technical skills is crucial. The strength of the combined Physical, Analytical, Microscopy, HTE and Expertise groups lies in our ability to pull these skills together to meet the needs of any given customer problem related to coatings.

    For more detailed information, see here.

    Personalization by learning from negative examples.

    D.M.J. Tax, Bob Duin, and A. Ypma (GN ReSound, Eindhoven)

    Modern digital hearing aids contain algorithm parameters that are preset to values that should ideally match the preferences of the user. To a certain extent, this can indeed be done in a fitting session, e.g. at a hearing aid dispenser. However, not every individual user preference can be put into the device in this manner: some particularities of the user may be hard to encode in the algorithm, the user's typical sound environments may be mismatched or changing, and their preference patterns may change over time. Therefore, one would like to further personalize the instrument to the user's preferences during usage.

    The question arises whether one can successfully adapt parameters when the user can only issue negative feedback. We represent the problem as the identification of an admissible area in some parameter space, where the negative examples indicate unfavorable settings. First, positive examples and prior knowledge are used to initialize the admissible area. Next, the model in the parameter space is updated incrementally, based on negative examples only. Advanced pattern recognition techniques such as one-class support vector machines will be used for this purpose. The method will be calibrated on simulated data and evaluated on a real-world dataset of hearing aid preference data. Furthermore, the method has to be low-complexity and should return appropriate parameter settings in terms of probabilities. We aim for a Matlab implementation that demonstrably captures the user preference better after learning, e.g. as verified by a listening test.
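The admissible-area idea can be sketched with a deliberately simple stand-in for the one-class models mentioned above: start from a prior box in parameter space and incrementally exclude a ball around each negative example. The parameter names, bounds and radius are hypothetical.

```python
import math

class AdmissibleRegion:
    """Admissible settings = an initial box in parameter space minus a
    ball of radius r around every negative example seen so far."""

    def __init__(self, lo, hi, radius):
        self.lo, self.hi, self.r = lo, hi, radius
        self.negatives = []

    def add_negative(self, x):
        """Incremental update: record one more disliked setting."""
        self.negatives.append(x)

    def admissible(self, x):
        in_box = all(l <= v <= h for v, l, h in zip(x, self.lo, self.hi))
        far = all(math.dist(x, n) > self.r for n in self.negatives)
        return in_box and far

# Hypothetical 2-D parameter space (e.g. gain, compression ratio)
region = AdmissibleRegion(lo=(0.0, 0.0), hi=(10.0, 1.0), radius=1.5)
before = region.admissible((5.0, 0.5))     # True: nothing excluded yet
region.add_negative((5.0, 0.5))            # user dislikes this setting
after = region.admissible((5.0, 0.5))      # False: inside the excluded ball
elsewhere = region.admissible((8.0, 0.5))  # True: far from the negative
```

A one-class support vector machine would replace the hard box-minus-balls rule with a smooth boundary and could return the probabilistic scores the project asks for, but the incremental negative-only update pattern is the same.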

    The graduation project will mainly be executed in Delft, under supervision of David Tax and Bob Duin. Several visits to the GN ReSound site at Eindhoven should ensure the relevance of the method for the problem of hearing aid personalization. GN ReSound will offer a working place, supervision during the visits, and preparation of a real-world dataset of user preference data. The student will receive a financial compensation from GN ReSound.

    GN ReSound is one of the world’s largest manufacturers of digital hearing aids. It is based in Copenhagen and has a Research and Development site in Eindhoven, where advanced algorithms for digital signal processing and machine learning are being developed and tested.