Example Projects for TU Delft/EEMCS Master Students

Below you can find some possible M.Sc. projects that can be done within the Pattern Recognition Laboratory. Please note that these are example projects. If you have your own ideas for a pattern recognition or machine learning project, you are most welcome to bring them in. The best thing to do is simply to contact the teachers of the PRLab and discuss the project possibilities. The same holds if you do not yet have a clear idea of what the exact topic should be. In any case, if you would like to graduate in pattern recognition (PR) or machine learning (ML), do mail us!

Finding train and track defects based on spectrograms of currents at electricity substations

David M.J. Tax

DEKRA is an internationally active service provider. DEKRA Rail is the European rail division of DEKRA for Testing, Inspection, Certification and Research and employs about 120 professionals. The core of DEKRA Rail was formerly NS Technical Research (part of Dutch Railways), with over 100 years of experience.

The Netherlands has a dense railway network. Most of the network is electrified, and electricity substations power the overhead lines. During operation, DC current flows from the overhead line to the electric train and then back through the track (the return current). Depending on the state of the train, this return current contains components of different frequencies and amplitudes.

DEKRA Rail has installed a monitoring system in several electricity substations in the Netherlands that continuously measures the return current of trains. Failures in a train's systems can introduce an AC component into the return current. When certain criteria are met, this can indicate a defect in the train or the track. At the time t0 at which one of these criteria is met, a measurement covering 30 s before t0 to 30 s after t0 is saved and sent to the DEKRA Rail servers. The measurement is then transformed into a spectrogram, as shown in Fig. 1.
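The preprocessing described above (cutting out the samples from 30 s before to 30 s after t0 and turning them into a spectrogram) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not DEKRA Rail's actual pipeline: the sampling rate, window length, hop size, Hann taper and the synthetic test signal are all hypothetical, and the short-time DFT is computed naively for clarity rather than speed.

```python
import cmath
import math


def extract_window(samples, fs, t0_index, seconds=30):
    """Return the samples from `seconds` before to `seconds` after sample index t0_index."""
    half = int(seconds * fs)
    start = max(0, t0_index - half)
    return samples[start:t0_index + half]


def hann(n):
    """Symmetric Hann window of length n (a hypothetical choice of taper)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def spectrogram(x, fs, win_len=64, hop=32):
    """Magnitude spectrogram computed with a naive short-time DFT.

    Returns (freqs, frames): freqs[k] is the frequency of bin k in Hz and
    frames[t][k] is the magnitude of bin k in frame t.
    """
    mean = sum(x) / len(x)
    x = [v - mean for v in x]  # remove the DC level so the AC components stand out
    w = hann(win_len)
    n_bins = win_len // 2 + 1  # real input: keep non-negative frequencies only
    freqs = [k * fs / win_len for k in range(n_bins)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(win_len)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                    for n in range(win_len)))
            for k in range(n_bins)
        ])
    return freqs, frames


# Hypothetical test signal: a constant DC level (arbitrary units) plus a small 50 Hz AC component.
fs = 400  # samples per second (assumed)
current = [1000 + 5 * math.sin(2 * math.pi * 50 * n / fs) for n in range(800)]
freqs, frames = spectrogram(current, fs)
peak = max(range(len(freqs)), key=lambda k: frames[0][k])
print(freqs[peak])  # 50.0
```

A classifier for the defect patterns would then take such frames (or features derived from them) as input; which classifier works best is exactly the open question of this project.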

In such a spectrogram it is possible to recognize patterns that correspond to defects of trains or tracks. We receive over 100 measurements each day, of which only a few are actually related to a defect. We would like to classify the measurements automatically, so that trains and tracks with defects can be found easily.

For more information, please contact Léanneke Loeve at leanneke.loeve@dekra.com

Feature extraction for an educational program recommender system

M. Loog

Partners: StudyPortals (http://www.studyportals.com/) is an international study choice platform that offers web-based services to students and academic institutions. For students, it provides portals that categorize and summarize Bachelor, Master, and PhD programs. For institutions, it promotes their educational programs and analyzes their marketing and student recruitment efforts.

Introduction: This project explores ways to offer students the best-fitting study programs by enriching the descriptions on the study pages. At the moment, the StudyPortals educational program search engine relies on the descriptions provided by the universities. For instance, a user enters a query on the website and, based on the keywords in that query, the search engine presents a list of programs at different universities.

Task: However, the descriptions provided by the universities are sometimes limited, which leads to unsatisfactory results. This project will propose a content framework to the institutions based on a data-driven investigation, such as feature extraction from the study pages. The proposed framework will be tested using A/B testing, and its performance will be measured by metrics such as the click-through rate. During the project, the student is free to focus on exploratory analysis of the data with some machine learning components, or to focus fully on developing new machine learning techniques for better information retrieval and feature extraction.
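The text does not say how the A/B test will be evaluated. One common choice for comparing the click-through rate (clicks divided by page views) of the current study pages (variant A) with pages following a proposed content framework (variant B) is a two-sided two-proportion z-test. The sketch below assumes that choice; the function names and the counts are hypothetical.

```python
import math


def click_through_rate(clicks, impressions):
    """Fraction of impressions (page views) that led to a click."""
    return clicks / impressions


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rate between variants A and B.

    Returns (z, p_value); z > 0 means variant B has the higher click-through rate.
    """
    p_a = click_through_rate(clicks_a, n_a)
    p_b = click_through_rate(clicks_b, n_b)
    pooled = (clicks_a + clicks_b) / (n_a + n_b)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in clicks; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Made-up counts: 1000 page views per variant.
z, p = two_proportion_z_test(clicks_a=100, n_a=1000, clicks_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The pooled standard error is used because, under the null hypothesis, both variants share one click-through rate; in practice the required number of page views should be fixed before the test starts.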

For more information, contact M.Loog@tudelft.nl

Natural language processing and machine learning for finance

D.M.J. Tax

Partner: For the Loan Syndications department of Corporate & Institutional Banking, we have two challenging internship assignments. The Loan Syndications team structures complex debt transactions and raises financing by way of syndicated loans. The team has in-depth product and sector knowledge that can assist large firms in funding both organic business growth and strategic events. They are also specialists in export finance (Export Credit Agency), taking care of contract financing for capital goods and services with cross-border components. We are therefore looking for two highly skilled students who will help us develop a machine learning framework, grounded in academic research, to enable future success for the Loan Syndications team.

1. Natural Language Processing and Artificial Intelligence: this assignment deals with the application of computational models to textual information. This involves investigating how to structure statistical models of natural language, and how to design and implement AI algorithms. Our goal is to parse financial deals by analyzing the loan agreements.

2. Machine Learning: this assignment includes the development of algorithms that, based on statistical and computational principles, discover knowledge from specific textual data combined with quantitative data. It also includes the development of learning processes that illustrate the nature of the computations and experience necessary for successful learning in humans and machines. Our goal is to accurately predict the best financial agreements suited for each client through a clustering algorithm.

This program offers a challenging, demanding 6- to 8-month internship in natural language processing and machine learning, set in a complex real-world environment. The assignment offers the opportunity to take on significant responsibility. We require students to actively participate in the team in order to grow and share their knowledge and expertise, and to develop their interpersonal skills, while providing sufficient opportunities to write a high-quality thesis.

For more information, contact D.M.J.Tax@tudelft.nl

Energy disaggregation using non-intrusive load monitoring

D.M.J. Tax

Introduction: Greeniant is a small company based in Rijswijk that works on creating consumer-focused energy awareness using data from smart meters. Specifically, we are developing a technology for energy disaggregation, also known as non-intrusive load monitoring.

Energy disaggregation is the practice of estimating the energy consumed by each appliance in a household by analyzing only the total household energy consumption measured at the smart meter, which avoids the need to install separate hardware (plugs/sensors) on the individual appliances. For example, it extracts the energy consumed by washing machines, dryers, dishwashers, ovens, etc. from the aggregate electricity meter reading of a household. Energy disaggregation gives the consumer a better understanding of how the individual appliances contribute to the total energy consumption and bills. This increased awareness of the energy consumption of the appliances helps consumers take effective measures to improve their energy efficiency.
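To make the idea concrete, the sketch below implements combinatorial optimization, a simple classic baseline for non-intrusive load monitoring: at each time step it picks the on/off combination of known appliances whose summed power is closest to the smart meter reading. This is not Greeniant's method, and the appliance power levels and readings are hypothetical. The baseline also ignores how appliance states evolve over time, which is what the (semi-)Markov models in the requirements add, and its number of combinations grows exponentially with the number of appliances.

```python
from itertools import product


def disaggregate(aggregate, appliances):
    """Combinatorial-optimization baseline for energy disaggregation.

    aggregate:  total power readings (W) from the smart meter, one per time step.
    appliances: maps an appliance name to its possible power levels (W); 0 means 'off'.
    Returns one dict per reading with the estimated power draw of each appliance.
    """
    names = list(appliances)
    # Every combination of appliance states, e.g. (fridge on, kettle off, ...).
    combos = list(product(*(appliances[name] for name in names)))
    estimates = []
    for total in aggregate:
        # Pick the combination whose summed power best explains the meter reading.
        best = min(combos, key=lambda combo: abs(total - sum(combo)))
        estimates.append(dict(zip(names, best)))
    return estimates


# Hypothetical power levels (W) and meter readings (W).
appliances = {"fridge": [0, 150], "kettle": [0, 2000], "washing_machine": [0, 500]}
readings = [0, 150, 2150, 650, 2660]
for total, estimate in zip(readings, disaggregate(readings, appliances)):
    print(total, estimate)
```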

Project: Currently, we are looking for an intelligent student to join our science team and work on our non-intrusive load monitoring algorithms. The student's task is to develop algorithms that learn the energy consumption patterns of different appliances, using techniques from machine learning and pattern recognition.

Requirements: We are looking for a student with the following background:

  • Working on MSc degree in computer science, mathematics or related field.
  • Has knowledge/experience in machine learning/pattern recognition, particularly using Bayesian Networks (Hidden Markov Models, Hidden semi-Markov Models).
  • Has programming experience in languages such as Java or C++; experience with Matlab and/or Python + NumPy/SciPy is a plus.
  • Creative and motivated, capable of working in teams.
  • Good command of either Dutch or English.

Mobility and traffic flow

D.M.J. Tax

Team Smart Mobility of the Mobility department at TNO is looking for smart and enthusiastic students who are interested in doing state-of-the-art research in pattern recognition and machine learning applied to mobility and traffic flow in the Netherlands. The Dutch highway network is densely monitored by induction loop detectors (ILDs). Roughly every 500 meters there is an ILD that measures, every minute, the average speed (km/h) and the intensity (veh/h). Furthermore, TNO maps and fuses other data sources, such as rainfall and incident information, onto the ILD locations. We are looking for an automated way to classify and understand traffic jams in general. For example, we are interested in distinguishing traffic jams that recur every day from those that do not.
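As a first, rule-based baseline for separating recurring from non-recurring jams, one could flag a minute at a detector as congested when the average speed drops below a threshold, and call a (detector, time-of-day) slot recurring when it is congested on a large fraction of days. The sketch below assumes hypothetical record fields, a 50 km/h threshold and a 60 % fraction; it uses only the speed measurement, whereas a learned classifier could also exploit intensity, rainfall and incident data.

```python
from collections import defaultdict


def recurring_slots(records, n_days, speed_threshold=50.0, min_fraction=0.6):
    """Return the (detector, minute_of_day) slots congested on at least
    `min_fraction` of the observed days.

    records: iterable of (day, detector_id, minute_of_day, avg_speed_kmh) tuples.
    """
    congested_days = defaultdict(set)
    for day, detector, minute, speed in records:
        if speed < speed_threshold:
            congested_days[(detector, minute)].add(day)
    return {slot for slot, days in congested_days.items()
            if len(days) / n_days >= min_fraction}


def label_jams(records, n_days, speed_threshold=50.0, min_fraction=0.6):
    """Label every congested measurement as 'recurring' or 'non-recurring'."""
    records = list(records)  # iterated twice below
    slots = recurring_slots(records, n_days, speed_threshold, min_fraction)
    return {
        (day, detector, minute): "recurring" if (detector, minute) in slots else "non-recurring"
        for day, detector, minute, speed in records
        if speed < speed_threshold
    }


# Hypothetical data: detector "A" is slow at 08:00 (minute 480) on 4 of 5 days,
# detector "B" only once.
records = [(d, "A", 480, 30.0 if d < 4 else 100.0) for d in range(5)]
records += [(d, "B", 480, 25.0 if d == 2 else 95.0) for d in range(5)]
print(label_jams(records, n_days=5))
```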


Social Signal Processing

D.M.J. Tax

In the European Union project Social Signal Processing, the goal is to automatically extract, segment, and classify social signals that appear in the interaction between people. The pattern recognition group supports the social scientists in analyzing the behavioral patterns and signs that are used in normal social interaction. Questions like "Who is opposing whom in a discussion? Who is agreeing? Who is nodding at what time?" have to be answered.

The pattern recognition research focuses on the more fundamental problems in processing (huge) video sequence data. Is it possible to segment the audio and the video without expert interaction? How should the image and audio data be represented to allow for this? Is it possible to automatically extract how many people are present in a video? What do they look like, and what do they sound like? Which speaker is speaking at what time? How can we classify sequences of variable length? Can we find out when something unexpected, something exciting happens?

Active Learning

Marco Loog

Active learning does not allude to the activity, or absence of it, of the person interested in this project. Rather, "active" is meant in the sense of effective. More precisely, the adjective refers to the learning phase of a classifier and how, indeed, it can be made more effective.

An example.

You trained a classifier that should support a medical expert in coming to diagnoses, but you are not really satisfied with its performance (the classifier's, that is). One way to potentially improve the accuracy is to provide the classifier with additional labeled examples to train on. Getting labeled examples, however, is rather expensive, as you need the medical expert to provide the correct label, so you would like to achieve as much gain in classification performance with as few additional labeled examples as possible. This is the challenge active learning tries to address, by enabling the current classifier to give its user active feedback on which unlabeled samples would be most informative for it to have labeled.
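A minimal sketch of pool-based active learning with uncertainty sampling, one common query strategy among several: in each round the current classifier asks for the label of the unlabeled sample it is least sure about. A nearest-mean classifier is used only to keep the example short; the data and the `expert` function standing in for the medical expert are hypothetical.

```python
import math


def train_nearest_mean(X, y):
    """Fit a nearest-mean classifier: one mean vector per class label."""
    means = {}
    for label in set(y):
        points = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(coord) / len(points) for coord in zip(*points)]
    return means


def margin(means, x):
    """Gap between the distances to the closest and second-closest class mean.
    A small margin means x lies near the decision boundary (the classifier is uncertain)."""
    d = sorted(math.dist(x, m) for m in means.values())
    return d[1] - d[0]


def query_most_uncertain(means, pool):
    """Uncertainty sampling: index of the unlabeled sample with the smallest margin."""
    return min(range(len(pool)), key=lambda i: margin(means, pool[i]))


def active_learning(X, y, pool, oracle, budget):
    """Repeatedly retrain, query the most uncertain sample, and ask the oracle
    (the medical expert in the example) for its label."""
    X, y, pool = list(X), list(y), list(pool)
    for _ in range(min(budget, len(pool))):
        means = train_nearest_mean(X, y)
        x = pool.pop(query_most_uncertain(means, pool))
        X.append(x)
        y.append(oracle(x))  # the expensive step: one expert label per query
    return train_nearest_mean(X, y), X, y


# Hypothetical one-dimensional data: two labeled samples and an unlabeled pool.
X, y = [(0.0,), (10.0,)], [0, 1]
pool = [(1.0,), (4.5,), (9.0,), (20.0,)]
expert = lambda x: int(x[0] > 5)  # stand-in for the medical expert
model, X_new, y_new = active_learning(X, y, pool, expert, budget=1)
print(X_new[-1], y_new[-1])  # (4.5,) 0
```

The sample at 4.5 is queried because it lies closest to the current decision boundary at 5; the samples at 1 and 9 would add little, as the classifier already labels them confidently.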

Active learning is a rather novel research direction and a valuable approach in general, not only for developing medical expert systems such as the one featured above. Applications like image segmentation and tasks like speech tagging, for instance, can also benefit from it. Active learning is broadly applicable, and both more applied and more fundamental master's projects are possible in this area of pattern recognition.

Real-time classification of rodent behavior

D.M.J. Tax, Elsbeth van Dam (Noldus, Wageningen)

Observation of rodent behavior is important to many fields of research in the life sciences. Rats and mice are used as models for human diseases, and their behavior is studied in labs around the world in order to find new drugs that cure psychiatric and neurological disorders. Automating these measurements is crucial to advances in pharmaceutical research as well as to animal welfare. In this assignment, Noldus Information Technology BV (www.noldus.com) aims to advance the state of the art in computer vision and behavior recognition, at the cutting edge between science and application.

The state of the art in behavior recognition concerns the detection of human behavior, mostly based on spatial measurements such as pose and speed. Unfortunately, these techniques are not suitable for detecting subtle rodent behaviors such as grooming, sniffing, and eating. A few systems described in the literature can recognize rodent behavior from a side view. In a recent study, features are generated based on a computational model of motion processing in the human brain. These features and their temporal context are then classified using advanced event recognition techniques (HMMSVM).
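The HMMSVM approach itself is beyond a short example, but the underlying idea of classifying a variable-length sequence of per-frame features while taking temporal context into account can be sketched with plain hidden Markov models: train one HMM per behavior and assign a sequence to the behavior whose model gives it the highest likelihood. The sketch below shows only the scoring step (the scaled forward algorithm); the discretized motion symbols, behavior labels and model parameters are hypothetical, and training the models is not shown.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-observation HMM.

    start[i]:    probability of starting in hidden state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][o]:  probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_p = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow on long sequences
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify_sequence(obs, models):
    """Return the label whose HMM assigns obs (of any length) the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# Hypothetical per-frame symbols: 0 = "little motion", 1 = "head motion".
# Each model is (start, trans, emit); "sniffing" uses two hidden states.
models = {
    "grooming": ([1.0], [[1.0]], [[0.9, 0.1]]),
    "sniffing": ([0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.2, 0.8], [0.6, 0.4]]),
}
print(classify_sequence([0, 0, 0, 1], models))  # grooming
```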

We aim to apply these techniques from the literature to top-view recordings of rats in infrared light. The capability of the trained classification modules will be evaluated in an automated recognition system for real-time operation in a noisy environment. For this research, a large, high-quality, manually labeled dataset is available.

We are looking for students who are interested in computer vision and pattern recognition applied to behavioral research. Knowledge of Matlab is preferred.

Multiple Instance Learning

David Tax and Marco Loog

Multi-Instance Learning: A Survey

A Wiki orphan

Semi-Supervised Learning

Marco Loog

Semi-Supervised Learning Literature Survey

A brief Wiki note