By Leo Barolo
Genetic and genomic technologies generate massive amounts of data that can no longer be analyzed manually. Machine learning has revolutionized research by using computational methods to reveal patterns and relationships in large datasets. Researchers are now breaking barriers in biomedical research using machine learning approaches, including providing clues to understanding complex diseases that have no cure. One of these researchers is Ting Jin, a third-year Ph.D. student in the Biomedical Data Science program.
Originally from a city in northern China (which she says has similar weather to Madison), Jin is now developing “interpretable” machine learning models to understand functional genomics and gene regulation. Whereas many machine learning methods are known for serving as “black boxes” that simply present patterns in large datasets, interpretable models are intended to show how they arrive at their results, which facilitates hypothesis-generating insights. Jin is developing models aimed at single-cell datasets from complex biological systems, especially for brain diseases, in Daifeng Wang’s lab in the Department of Biostatistics & Medical Informatics (BMI). Dr. Wang is also a member of the Waisman Center and an affiliate member of the Center for Genomic Science Innovation.
As Jin continues her research at UW-Madison, we asked her a few questions.
What are the main goals of your research?
Currently, the diagnosis of some complex brain diseases, such as schizophrenia and Alzheimer’s disease, is based on imaging data and behavioral observation. We have limited knowledge about the mechanisms of these diseases at the cellular and molecular levels. My research aims to provide deeper mechanistic insight into how genetic links and risks are associated with disease development, particularly at the cell-type level. A better understanding of these diseases may ultimately lead to precision medicine and treatments that help more people.
Can you give us a basic description of what machine learning is and why it is a powerful tool for analyzing genomic datasets?
Machine learning, which aims to develop computer algorithms that improve through experience and the use of data, enables computers to assist in the analysis of large and complex datasets. As genomic data volumes continue to expand, especially at cellular resolution, machine learning can make accurate predictions and efficiently use data collected from millions of cells, even though such data are noisy, high-dimensional, and incomplete.
What is the single person, event, or experience that most influenced your trajectory to where you are today?
I think that my PI, Daifeng Wang, influenced my trajectory. My background is in Electrical Engineering, Signal and Image Processing, and I have experience in applying machine learning in big data analytics. However, before I worked with Professor Wang, I had no experience in the field of functional genomics. He taught me everything, starting from gene expression levels, which are the most basic concept in genomics, and none of my work would have been done and published without his help. The most important lesson I learned from him about growing as a researcher is that everyone has their own timing, so I should stay calm and focus on my own pace. I don’t think that I would be where I am today without his help and encouragement.
What advice would you have for a young person interested in graduate school or research?
Research is sometimes unpredictable. No one knows in advance which method will be the most suitable for a specific dataset or problem. We may put a lot of effort into something, and the results may still not be good. So my advice is: do not give up easily, be patient, and discuss your work with your PI or labmates to come up with great ideas.
With both programming skills and a strong engineering background, Jin hopes to develop data-driven machine learning systems for real-world applications. After graduation, she plans to find a research-related job in industry, and this summer she will be interning at biotech giant Merck.