Example Projects for TU Delft/EEMCS Master Students

Below you can find some possible M.Sc. projects that can be done within the Pattern Recognition Laboratory. Please note that these are example projects. If you have your own ideas for a pattern recognition or machine learning project, you are most welcome to bring them in. The best thing to do is to contact the teachers from the PRLab and discuss the project possibilities. The same holds if you do not yet have a clear idea of what the exact topic should be. But if you would like to graduate in PR or ML, do mail us!

Feature extraction for an educational program recommender system

M. Loog

Partners: StudyPortals (http://www.studyportals.com/) is an international study choice platform that offers web-based services to students and academic institutions. For students, it provides portals that categorize and summarize Bachelor, Master, and PhD programs. For institutions, it promotes their educational programs and analyzes their marketing and student recruitment efforts.

Introduction: This project explores ways to offer students the best-fitting studies by enriching the study page descriptions. At the moment, the StudyPortals educational program search engine relies on descriptions provided by the universities. For instance, a user who comes to the website enters a query and, based on the keywords in that query, the search engine presents a list of programs at different universities.

Task: The descriptions provided by the universities are sometimes limited and lead to unsatisfactory search results. This project will propose a content framework to the institutions based on a data-driven investigation, such as feature extraction from the study pages. The proposed framework will be tested using A/B testing, and its performance will be measured by metrics such as the click-through rate. During the project, the student is free either to focus on exploratory data analysis with some machine learning components or to focus fully on developing new machine learning techniques for better information retrieval and feature extraction.
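As an illustration of how an A/B test on click-through rate could be evaluated, the sketch below compares two page variants with a two-proportion z-test. This is only one common way to analyse such a test and is not taken from the project description; the function names and the click counts are made up for the example.

```python
import math


def click_through_rate(clicks, impressions):
    """Fraction of impressions that resulted in a click."""
    return clicks / impressions


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants have the same CTR."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled CTR under the null hypothesis that A and B do not differ.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A is the current page, B the enriched page.
z, p = two_proportion_z_test(clicks_a=100, n_a=1000, clicks_b=150, n_b=1000)
```

A small p-value would suggest that the difference in click-through rate between the two variants is unlikely to be due to chance alone; it says nothing about why the enriched page performs differently.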

For more information, contact M.Loog@tudelft.nl

Two Master's thesis projects in collaboration with Michigan State University, USA:

H. Hung

In Isolated, Confined and Extreme (ICE) conditions, such as long-term space missions and undersea missions, ensuring that teams work effectively is vitally important. In collaboration with Michigan State University, which has been working with NASA to understand the role of wearable sensors in helping teams function, the following projects are offered:

Novel computational methods for detecting affect from sparse wearable sensor data: This project will investigate novel methods for predicting the affective state of people during interactions that involve negotiation. The task will involve analysing existing sensor data associated with an interaction and then predicting how a person feels, as reported in survey data.

Novel computational methods for detecting affect from noisy labels: In this project, wearable sensor data has been recorded for two different teams during a two-week sea mission. A few times a day, the team members are asked to report on their feelings. While the times at which the experiences are sampled are known, the events that contribute to each judgement are not. The aim of this project is to develop novel computational methods to handle this type of noisy data by considering which salient moments leading up to a label could contribute to predicting it.

Novel computational methods for analysing non-verbal behaviour in relation to social and romantic attraction

Using data captured from 100 people taking part in speed-date events, this thesis will investigate the relationship between non-verbal behaviour and perceptions of attraction, as captured from video and wearable sensors. These topics will be related to devising statistical models to automatically predict socially relevant phenomena from real behavioural data.

For more information, contact Hayley Hung (h.hung@tudelft.nl)

Novel computational methods for predicting individual influence on group decisions

This is a collaboration with the University of Innsbruck, School of Management. This project will involve the analysis of audio and video data recorded during group discussions for a specific task. The goal is to see whether the verbal and non-verbal behaviour of the meeting participants can be used to estimate who the most influential person was. Researchers at the University of Innsbruck have already collected 800 minutes of data from 40 groups of 5 people. The project will involve a combination of computer vision, audio processing, and machine learning. An interest in social and behavioural psychology is a bonus. This project will also involve collaboration with Organisational Psychologist Dr. Clemens Hutzinger.

For more information, contact Hayley Hung (h.hung@tudelft.nl)

Semantic text analysis for people analytics

Introduction: This is a project on natural language processing. In particular, the focus is on understanding free text input in surveys. Employees in large companies are regularly asked to report on their job satisfaction and working conditions, with the possibility to leave additional comments. There are often many of these comments, which creates a lot of manual labor for human resource specialists.

Task: The goal is now to develop a topic / keyword extraction technique for document summarization with a possible extension to sentiment analysis in order to automatically process these comment sections on surveys. The company can use the extracted topics, keywords or sentiments to identify issues in the workplace that require attention. The student is expected to have a working understanding of Matlab and an interest in pattern recognition, machine learning or statistics. An interest in or experience with natural language processing is a bonus.

Partners: Focus Orange is a company that develops software for analyzing Human Resources in major corporations (https://www.crunchrapps.com/). They focus on creating a digital overview of the organization, generating reports on key statistics, identifying talent among employees and finding successors for internal positions. They use and develop state-of-the-art data analysis tools in order to perform accurate predictions on demand.

For more information, contact Marco Loog (M.Loog@tudelft.nl)

Positive unlabeled learning for people analytics

Introduction: Positive Unlabeled Learning is a pattern recognition problem where some samples are labeled as positive and for all other samples we don't know if they are positive or negative. In this particular project, we are looking at the suitability of a person for a particular job. Some positions in the company are vital and require excellent successors, which the company wants to recruit internally from a set of suitable candidates. It is possible to perform predictions by looking at the characteristics of people who previously held this position (positively labeled points). But there is little data available on candidates who are not suited for the position (negatively labeled points).

Task: All employees are described by characteristics such as age, total employment duration, performance metrics, etc. The people who previously held this position in the company are labeled as positive and all other employees are unlabeled. The goal is to identify which candidate(s) would be the best successor(s), that is, which unlabeled point(s) are most likely positive. The student is expected to have a working understanding of Matlab and an interest in pattern recognition, machine learning or statistics.
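To make the setting concrete, the sketch below ranks unlabeled employees by their distance to the centroid of the positively labeled ones. This is a deliberately naive baseline and not a proper positive unlabeled learning method; it only illustrates the shape of the data and of the desired output. The function names and the two example features (age, years employed) are assumptions made for the illustration.

```python
import math
import statistics


def standardize(rows):
    """Scale every feature to zero mean and unit variance over all rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def rank_unlabeled(positives, unlabeled):
    """Return unlabeled indices, ordered from most to least similar to the positives."""
    scaled = standardize(positives + unlabeled)
    pos, unl = scaled[:len(positives)], scaled[len(positives):]
    centroid = [statistics.fmean(c) for c in zip(*pos)]
    distances = [math.dist(row, centroid) for row in unl]
    return sorted(range(len(unl)), key=lambda i: distances[i])


# Made-up data: each row is (age, years employed).
previous_holders = [[50, 20], [52, 22]]
candidates = [[51, 21], [25, 2], [40, 10]]
ranking = rank_unlabeled(previous_holders, candidates)
```

A real solution would have to account for the fact that the unlabeled set contains both suitable and unsuitable candidates, which this distance-based ranking ignores.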

Partners: Focus Orange is a company that develops software for analyzing Human Resources in major corporations (https://www.crunchrapps.com/). They focus on creating a digital overview of the organization, generating reports on key statistics, identifying talent among employees and finding successors for internal positions. They use and develop state-of-the-art data analysis tools in order to perform accurate predictions on demand.

For more information, contact Marco Loog (M.Loog@tudelft.nl)

Automatic cell type recognition in microscopy images of breast tissue

D.M.J. Tax

Introduction: In breast cancer diagnosis, a biopsy is performed when other tests such as mammograms deliver inconclusive results. During a biopsy, a sample of the breast tissue is removed for further inspection. A pathologist then examines the sample under a microscope to identify the presence or absence (and the type) of cancer-related lesions in order to make a diagnosis.

Project: The current examination of the microscopy images is based on human assessment and grading of a number of predefined features (e.g. growth pattern of cancer cells). As a result, the grading is not necessarily consistent across images due to both inter- and intra-observer variability. Automating the grading task can help reduce variability and thus increase the robustness of the medical assessment. In this project, we will address the first step towards assessment automation by developing a method for the recognition of the different cell types found in microscopy images of breast cancer screening, namely: milk ducts, cancer cells, breast tissue cells, fatty cells, and blood vessels. This task, known as object recognition, faces challenges that arise from variations in image features such as scale and deformation (among others) between objects of the same type.

Partners: This project is a collaboration between the Pattern Recognition and Bioinformatics group at TU Delft (David Tax, Joana Gonçalves), the Computational Cancer Biology group at the NKI (Lodewyk Wessels), and the Wesseling group / Molecular Pathology at the NKI (Jelle Wesseling, Lindy Visser).

Energy disaggregation using non-intrusive load monitoring

D.M.J. Tax

Introduction: Greeniant is a small company based in Rijswijk working on creating consumer focused energy awareness using data from smart meters. Specifically, we are developing a technology for energy disaggregation, also known as non-intrusive load monitoring.

Energy disaggregation is the practice of estimating the energy consumed by each appliance in a household by analyzing only the total household energy consumption measured at the smart meter, which avoids the need to install separate hardware (plugs/sensors) on the individual appliances. For example, it extracts the energy consumed by washing machines, dryers, dishwashers, ovens, etc. from the aggregate electricity meter reading of a household. Energy disaggregation gives the consumer a better understanding of how the individual appliances contribute to the total energy consumption and bills. The increased awareness of the energy consumption of the appliances helps consumers take effective measures to improve their energy efficiency.
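The basic idea can be sketched as a small combinatorial search: given a single aggregate power reading, find the combination of switched-on appliances whose rated powers add up most closely to it. This is a toy illustration only, not Greeniant's algorithm; the function name and the power ratings are invented for the example, and real appliances have multiple operating states and temporal patterns that this sketch ignores.

```python
from itertools import product


def disaggregate(total_watts, appliance_watts):
    """Pick the on/off combination whose summed power is closest to the reading."""
    names = list(appliance_watts)
    best_states, best_error = None, float("inf")
    # Try every on/off assignment; feasible only for a handful of appliances.
    for states in product((0, 1), repeat=len(names)):
        estimate = sum(on * appliance_watts[n] for on, n in zip(states, names))
        error = abs(total_watts - estimate)
        if error < best_error:
            best_states, best_error = states, error
    return {n: bool(on) for n, on in zip(names, best_states)}


# Hypothetical rated powers in watts.
appliances = {"oven": 2000, "washing machine": 500, "dryer": 1200}
states = disaggregate(2500, appliances)
```

Because the search grows exponentially with the number of appliances and treats every reading in isolation, practical approaches usually model how appliance states evolve over time instead.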

Project: Currently, we are looking for an intelligent student to join our science team to work on developing our non-intrusive load monitoring algorithms. The task of the student is to work on developing algorithms to learn the energy consumption patterns of different appliances, using techniques from machine learning and pattern recognition.

Requirements: We are looking for a student with the following background:

  • Working on MSc degree in computer science, mathematics or related field.
  • Has knowledge/experience in machine learning/pattern recognition, particularly using Bayesian Networks (Hidden Markov Models, Hidden semi-Markov Models).
  • Has experience in programming languages such as Java or C++; experience with Matlab and/or Python + Numpy/Scipy is a plus.
  • Creative and motivated, capable of working in teams.
  • Good command of either Dutch or English.

Automatic Recognition of Cancer Cells in Blood at IMEC

Laurens van der Maaten


Mobility and traffic flow

D.M.J. Tax

Team Smart Mobility of the Mobility department at TNO is looking for smart and enthusiastic students who are interested in doing state-of-the-art research in Pattern Recognition and Machine Learning applied to mobility and traffic flow in the Netherlands. The Dutch highway network is densely monitored by Inductive Loop Detectors (ILDs). Roughly every 500 meters there is an ILD that measures the average speed (km/h) and intensity (veh/h) each minute. Furthermore, TNO maps and fuses different data sources, such as rainfall and incident information, in the vicinity of the ILD locations. We are looking for an automated way to classify and understand traffic jams in general. We are, for example, interested in distinguishing traffic jams that recur each day from those that do not.
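As a minimal sketch of what separating recurring from non-recurring congestion could look like on per-minute speed measurements from a single detector, the code below flags minutes with a low average speed and then keeps the minutes of the day that are congested on most days. The 50 km/h threshold, the 80% fraction and the function names are assumptions chosen for the illustration, not values used by TNO.

```python
def jam_minutes(speeds_kmh, threshold_kmh=50.0):
    """Return the set of minute indices at which the average speed is below the threshold."""
    return {minute for minute, speed in enumerate(speeds_kmh) if speed < threshold_kmh}


def recurring_minutes(days, threshold_kmh=50.0, min_fraction=0.8):
    """Minutes of the day that are congested on at least min_fraction of the days."""
    counts = {}
    for speeds in days:
        for minute in jam_minutes(speeds, threshold_kmh):
            counts[minute] = counts.get(minute, 0) + 1
    return {m for m, c in counts.items() if c / len(days) >= min_fraction}


# Made-up speeds (km/h) for four consecutive minutes on three days.
days = [
    [100, 40, 30, 100],
    [100, 45, 90, 100],
    [100, 35, 95, 20],
]
recurring = recurring_minutes(days)
```

Congested minutes that fall outside the recurring set would then be candidates for incident-related or weather-related jams, which is where the fused rainfall and incident data could come in.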

For more detailed information, see here.

Judgment of steam cleaned panels by means of machine vision

D.M.J. Tax

The goal is the development of machine vision applications for our Physical Laboratory. This lab carries out hundreds of visual judgments per month. These involve determining rating scores for cracks, blisters, delamination, haze, etc. on coated test panels according to standard methods. We are in the process of developing vision applications. To reach the target, at least another five years of research will be required. We develop proprietary vision methods because commercially available instruments fail to deliver satisfactory test results.

The Physical and Analytical Laboratories of AkzoNobel Automotive & Aerospace Coating in Sassenheim were founded in 1939 and have played an important role in the development and production of our coating products since. Over the years, capabilities were expanded with a Microscopy Laboratory, an Expertise Group and a group specialized in High Throughput Experimentation (HTE). All groups are now part of the R&D Service Unit of Automotive & Aerospace Coatings (A&AC). The groups within the R&D Service Unit of A&AC maintain a wide range of technical capabilities and expertise to support our customers in the field of coating research. Over the years we have developed particular areas of expertise where the combination of technical skills is crucial. The strength of the combined Physical, Analytical, Microscopy, HTE and Expertise groups lies in our ability to pull these skills together to meet the needs of any given customer problem related to coatings.

For more detailed information, see here.

Computer-aided detection and characterization of interstitial lung disease

D.M.J. Tax

The goal of this project is to automatically detect interstitial lung diseases. The interstitium of the lung is the tissue connecting the blood vessels and the alveoli, the tiny air sacs. Interstitial illnesses often reveal themselves by a changed 'textural appearance' of the lungs. In order to detect these interstitial diseases, radiographs (X-rays) have to be made, and uncharacteristic textures have to be characterized and detected. Due to advances in computed tomography (CT, which uses special X-ray equipment to obtain many images from different angles that are then joined to show a much higher resolution image), it is currently possible to replace the radiographs by much higher resolution scans. In these scans textures can be characterized much better than with the currently available systems.

Social Signal Processing

D.M.J. Tax

In the European Union project Social Signal Processing the goal is to automatically extract, segment and classify social signals that appear in the interaction between people. The pattern recognition group supports the social scientists in analyzing the behavioral patterns and signs that are used in normal social interaction. Questions like "Who is opposing who in a discussion? Who is agreeing? Who is nodding at what time?" have to be answered.

The pattern recognition research focuses on the more fundamental problems in the processing of (huge) video sequence data. Is it possible to segment the audio and the video without expert interaction? How should the image and audio data be represented to allow for this? Is it possible to automatically extract how many people are present in a video? What do they look like, and what do they sound like? Which speaker is speaking at what time? How can we classify sequences of variable length? Can we find out when something unexpected, something exciting happens?

Active Learning

Marco Loog

Active learning does not allude to the activity, or absence of it, of the person interested in this project. It refers more to active in the sense of being effective. More precisely, the adjective refers to the learning phase of a classifier and how, indeed, it can be made more effective.

An example.

You trained a classifier that should support a medical expert in coming to diagnoses, but you are not really satisfied with its (that is, the classifier's) performance. One way to potentially improve the accuracy is to provide the classifier with additional labeled examples to train on. Getting labeled examples, however, is rather expensive as you need the medical expert to provide the correct label, so you would like to achieve as much gain in classification performance with as few additional labeled examples as possible. This is the challenge active learning tries to solve by enabling the current classifier to provide active feedback to its user on which unlabeled samples would be most informative for it to have labeled.
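One common way for a classifier to indicate which samples it would like to have labeled is uncertainty sampling: ask the expert about the unlabeled samples on which the current predictions are least certain. The sketch below implements this with the entropy of the predicted class probabilities. It is one query strategy among many, the function names are chosen for the illustration, and the probabilities are made up.

```python
import math


def entropy(probabilities):
    """Shannon entropy of a predicted class distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_queries(predicted, budget):
    """Return indices of the `budget` unlabeled samples with the highest entropy."""
    ranked = sorted(
        range(len(predicted)),
        key=lambda i: entropy(predicted[i]),
        reverse=True,
    )
    return ranked[:budget]


# Made-up class probabilities predicted by the current classifier
# for four unlabeled samples (two classes).
predicted = [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2], [0.6, 0.4]]
to_label = select_queries(predicted, budget=2)
```

After the expert has labeled the selected samples, the classifier is retrained and the selection step is repeated until the labeling budget is exhausted.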

Active learning is a rather novel research direction and a valuable approach in general, not only for developing medical expert systems such as the one that featured above. Applications like image segmentation and tasks like speech tagging, for instance, can also benefit from it. Active learning is broadly applicable and both more applied and more fundamental master's projects are possible in this area of pattern recognition.

Real-time classification of rodent behavior

D.M.J. Tax, Elsbeth van Dam (Noldus, Wageningen)

Observation of rodent behavior is important to many fields of research in the life sciences. Rats and mice are used as models for human diseases and their behavior is studied in labs around the world in order to find new drugs that cure psychiatric and neurological disorders. The automation of these measurements is crucial to advances in pharmaceutical research as well as animal welfare. In this assignment Noldus Information Technology BV (www.noldus.com) tries to create state of the art advances in computer vision and behavior recognition on the cutting edge between science and application.

State-of-the-art in behavior recognition is the detection of behavior of humans, mostly based on spatial measurements like pose and speed. Unfortunately these techniques are not suitable for the detection of subtle rodent behaviors like grooming, sniffing and eating. A few systems have been described in literature that can recognize rodent behavior from a side view. In a recent study features are generated based on a computational model of motion processing in the human brain. Classification of these features and their temporal context is done using advanced event recognition techniques (HMMSVM).

We aim to apply these techniques from literature to top view recordings of rats in infrared light. The capability of the trained classification modules will be evaluated in an automated recognition system for real-time operation in a noisy environment. For this research we have a large and manually labeled dataset of high quality available.

We are looking for students who are interested in computer vision and pattern recognition applied to behavioral research. Knowledge of Matlab is preferred.

Multiple Instance Learning

David Tax and Marco Loog

Multi-Instance Learning: A Survey

A Wiki orphan

Semi-Supervised Learning

Marco Loog

Semi-Supervised Learning Literature Survey

A brief Wiki note