Student research opportunities
Space/time efficient privacy-preserving medical data representation
Project Code: CECS_1113
This project is available at the following levels:
CS single semester, Honours, Masters
Keywords:
Medical data, longitudinal data, data structures, privacy, data types, masking functions
Supervisors:
Professor Peter Christen, Dr Dinusha Vatsalan
Outline:
Medical data are recorded to serve a variety of purposes, including better patient care, resource management, advanced treatments, detection and prevention of diseases, risk management, clinical trials, and research and development through the aggregation and analysis of data about populations of individuals.
Increasing privacy and confidentiality concerns, however, preclude the exchange or sharing of such medical data across organizations for data aggregation and analysis. Techniques are therefore required to conduct data matching or aggregation on masked medical data such that no sensitive information is revealed to any party involved in the linkage or to any external party.
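As an illustration of matching on masked data, one widely used masking approach is a keyed hash (HMAC) of a normalised identifier: organizations that share a secret key can compare masked values for equality without exchanging the raw identifiers. The sketch below is a minimal example under our own assumptions; the function names, the identifier format, and the key `KEY` are hypothetical and are not a technique prescribed by this project.

```python
import hashlib
import hmac


def mask_identifier(identifier: str, secret_key: bytes) -> str:
    """Mask a patient identifier with HMAC-SHA256 under a shared secret key."""
    # Lower-case and collapse whitespace so trivial formatting differences
    # between organizations do not change the masked value.
    normalised = " ".join(identifier.lower().split())
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def match_masked(masks_a: list[str], masks_b: list[str]) -> set[str]:
    """Return the masked identifiers that appear in both parties' datasets."""
    return set(masks_a) & set(masks_b)


# Hypothetical secret key, assumed to be agreed between the two
# organizations out of band and never shared with the linkage party.
KEY = b"example-shared-secret"

hospital_a = [mask_identifier(p, KEY) for p in ["Jane Smith 1980-02-01", "Tom Lee 1975-07-30"]]
hospital_b = [mask_identifier(p, KEY) for p in ["jane  smith 1980-02-01", "Ann Wu 1990-11-11"]]

common = match_masked(hospital_a, hospital_b)
print(len(common))  # 1
```

This only supports exact matching: a single typo in an identifier produces a completely different mask, which is one reason approximate, privacy-preserving encodings are an active research topic.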
A medical datum is a single observation of a patient that generally comprises four elements: 1) the patient in question, 2) the parameter being observed, 3) the value of the parameter, and 4) the time of the observation. Medical data consist of multiple such observations: several different observations made concurrently, observations of the same patient parameter made at several points in time, or both. Medical data are therefore often longitudinal, and they span many types, ranging from narrative text to numerical measurements, recorded signals, drawings, images, and videos.
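The four-element structure of a medical datum maps naturally onto a small record type, and longitudinal data then become time-ordered series of such records per patient and parameter. The following is a minimal sketch under our own assumptions: the class name `Observation`, its field names, and the restriction of values to numbers or text are hypothetical, and signals, drawings, images, and videos are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Union

# Assumption: a value is either numeric (e.g. a lab result) or text (e.g. a note).
Value = Union[float, str]


@dataclass(frozen=True)
class Observation:
    """One medical datum: patient, observed parameter, value, and time."""
    patient_id: str
    parameter: str
    value: Value
    observed_at: datetime


def to_longitudinal(observations):
    """Group observations into time-ordered series keyed by (patient, parameter)."""
    series = defaultdict(list)
    for obs in observations:
        series[(obs.patient_id, obs.parameter)].append(obs)
    for obs_list in series.values():
        obs_list.sort(key=lambda o: o.observed_at)  # oldest observation first
    return dict(series)


records = [
    Observation("p1", "systolic_bp", 128.0, datetime(2024, 3, 1, 9, 0)),
    Observation("p1", "systolic_bp", 135.0, datetime(2024, 1, 15, 8, 30)),
    Observation("p1", "note", "mild headache", datetime(2024, 3, 1, 9, 5)),
]

longitudinal = to_longitudinal(records)
print([o.value for o in longitudinal[("p1", "systolic_bp")]])  # [135.0, 128.0]
```

Keeping each series sorted by time makes it straightforward to examine how a single parameter evolves for a patient, which is the longitudinal view described above.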
Representing such complex data efficiently in terms of space and time is a challenging problem that has been researched for several decades. However, efficient and privacy-preserving (i.e. masked) representation of medical data is a research direction that needs more attention in order to enable privacy-preserving medical data linkage, mining, and analysis.
Goals of this project
The aim of this project is to research and develop efficient (in terms of memory space and computational complexity) data structures and masking functions for representing medical data in a privacy-preserving manner. Specific goals are:
1. Conduct a literature review to compare and assess existing data structures and techniques for masking medical data.
2. Develop novel techniques that represent longitudinal medical data efficiently using suitable masking functions and data structures.
3. Empirically evaluate and compare the existing and the developed techniques on real and/or synthetic datasets in terms of space and time requirements for storage and processing, quality (or utility) of representation, and privacy guarantees.
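To make the combination of a masking function and a space-efficient data structure concrete, the sketch below shows Bloom filter encoding of q-grams, a widely used masking technique in privacy-preserving record linkage. It is offered only as an illustrative baseline under our own assumptions, not as the technique this project will develop: the function names, the filter length `m`, the number of hash functions `k`, the bigram setting `q = 2`, and the key `KEY` are hypothetical choices.

```python
import hashlib
import hmac


def qgrams(text: str, q: int = 2) -> set[str]:
    """Return the set of q-grams of a lower-cased string padded with '_'."""
    pad = "_" * (q - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(text: str, key: bytes, m: int = 1000, k: int = 10, q: int = 2) -> int:
    """Mask text as an m-bit Bloom filter, stored as the bits of a Python int.

    Each q-gram sets k bit positions derived from a keyed HMAC-SHA256,
    so a party without the key cannot recompute the encoding directly.
    """
    bits = 0
    for gram in qgrams(text, q):
        for i in range(k):
            digest = hmac.new(key, f"{i}:{gram}".encode("utf-8"), hashlib.sha256).digest()
            position = int.from_bytes(digest[:8], "big") % m
            bits |= 1 << position
    return bits


def dice_similarity(a: int, b: int) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the set bits of two filters."""
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * bin(a & b).count("1") / total if total else 0.0


KEY = b"example-shared-secret"  # hypothetical shared secret key

bf_original = bloom_encode("hypertension", KEY)
bf_typo = bloom_encode("hypertenson", KEY)  # misspelt variant
bf_other = bloom_encode("asthma", KEY)

print(f"{dice_similarity(bf_original, bf_typo):.2f}")   # high: most bigrams shared
print(f"{dice_similarity(bf_original, bf_other):.2f}")  # low: no bigrams shared
```

Every encoded value occupies exactly `m` bits regardless of its length, which gives a simple handle on the space requirement in goal 3, while the Dice score between filters serves as one possible measure of how much matching utility the masked representation preserves. Bloom filter encodings are known to be vulnerable to some frequency-based attacks, so the privacy evaluation in goal 3 would still be needed.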
Requirements/Prerequisites
This project is available as a one-semester Computer Science project for both undergraduate and MComp students, or as a one-year honours project.
Interested students should have good programming skills (ideally in Python) and background knowledge in algorithms and data structures, data mining, and privacy.
It is an advantage if students have knowledge of medical data storage, analysis, and mining and/or have successfully completed courses on databases, data mining, or document computing.
Student Gain
Medical data mining is a promising research field that is in wide demand in real-world health applications. This project gives the student exposure to medical data storage and processing, and to the privacy aspects of medical data mining, which will help them contribute to applied research in healthcare. The project will contribute a baseline for privacy-preserving medical data mining, which could have a high impact in the healthcare and research industries.
Background Literature
The following materials provide background on medical data, the data structures used to store and represent medical data, and the masking functions used for privacy preservation, all of which will be needed to conduct the project.
Links
A taxonomy of privacy-preserving record linkage techniques (Dinusha Vatsalan et al., 2013)
Data driven analytics in Healthcare: Problems, Challenges, and Future Directions (Fei Wang, CIKM 2014)
Medical Data: Their acquisition, storage, and use (Shortliffe and Barnett)
Standardized vectorial representation of medical data in patient records (Orthuber and Papavramidis, 2010)