Ancestry.com is the world’s largest provider of consumer DNA testing, using genetic information to reveal key heritage data that far exceeds what is available through family and historical records. But while the genetic inheritance of single segments of DNA is relatively well understood, humanity’s long and complex history of interactions makes the task of summarizing a person’s ancestry extremely challenging. UW-Madison alumnus Dr. Keith Noto, Principal Scientist at AncestryDNA, is addressing this issue by spearheading the development of new methods to characterize a shared family history.
“As grandiose as it sounds, the history of the human race is encoded in our DNA, just waiting to be organized and explained, and it’s fascinating work to try to do that,” he explains.
Dr. Noto received his bachelor’s, master’s, and PhD degrees in computer science at UW-Madison. He was also a trainee in the Computation and Informatics in Biology and Medicine (CIBM) program, where he honed his computational science skills in the group of Mark Craven in the Departments of Biostatistics and Medical Informatics and Computer Sciences.
As he reminisced about his past, we asked Dr. Noto about his current position, his trajectory, and the advice he would give to students interested in pursuing careers in the genomics field.
What technologies and approaches are used to predict ancestry?
Ancestry offers a DNA test (yes, you spit in a tube and send it to a lab) which determines the alleles (A, C, G, or T) you inherited at about 800,000 genomic sites where human genomes are known to differ. Each of these single nucleotide polymorphisms (SNPs) has an origin in the distant past and a complex history of inheritance that involves millions of people and countless generations. Each allele you have provides the basic unit of insight into your genetic history. We compare your DNA to a curated collection of individuals of known origin to estimate the specific mixture of ancestry you inherited from each parent. We also identify small segments of your DNA that are already present in our database (which now has over 20 million people) to find genetic relatives ranging from close family to distant cousins that share less than 1% of your DNA. The network of people connected by shared DNA actually has a rich structure, which allows us to connect you to groups of people with similar stories in their ancestry, which we can often describe in some detail. For instance, I’m connected to a group whose ancestors lived near western Norway and moved to the American Midwest in the late nineteenth century.
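To make the idea of matching DNA segments concrete, here is a minimal Python sketch of one simple, textbook-style heuristic. It is not Ancestry's method. It assumes each person's genotype is stored as a list of counts (0, 1, or 2) of one chosen allele at each SNP, and it flags runs of consecutive SNPs where two people could share at least one allele. The function name `candidate_shared_segments` and the `min_sites` threshold are hypothetical, chosen only for illustration.

```python
def candidate_shared_segments(a, b, min_sites=3):
    """Return (start, end) index ranges where genotypes a and b are compatible.

    a, b: lists of 0/1/2 allele counts, one entry per SNP (assumed encoding).
    Two genotypes are incompatible at a SNP when one is 0 and the other is 2,
    because the two people then cannot share an allele at that site.
    Ranges are half-open: the segment covers SNPs start .. end - 1.
    """
    segments = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        compatible = abs(a[i] - b[i]) < 2
        if compatible:
            if start is None:
                start = i  # a new compatible run begins here
        else:
            # the run (if any) ends just before this incompatible site
            if start is not None and i - start >= min_sites:
                segments.append((start, i))
            start = None
    # close a run that extends to the last SNP
    if start is not None and n - start >= min_sites:
        segments.append((start, n))
    return segments


person_a = [0, 0, 1, 2, 2, 0, 1]
person_b = [0, 1, 1, 2, 0, 0, 1]
segments = candidate_shared_segments(person_a, person_b, min_sites=3)
```

With real data, runs would span thousands of SNPs and be measured in genetic distance rather than site counts; the short lists here only show the mechanics.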
How have you innovated this process?
Research into ancestry using SNP genotype data has been underway for decades. In fact, when I started my career at Ancestry, we were using a published approach that calculated the frequency of alleles in various populations and then estimated the statistically most likely proportion of ancestry inherited from each population, but it treated each SNP as independent information. Our team developed a completely new machine learning approach to this problem with models of our own design that are much more representative of the way DNA is inherited and varies within populations. Our analysis of the shared DNA network, which connects people to ancestor stories, was developed completely by my team at Ancestry. One of our more recent innovations allows us to separate the alleles inherited from each parent (the technology we use to genotype can’t do that by itself). This is an important innovation because it means that we can start to explain how you inherited the DNA for any of the insights we provide. As we continue to innovate, one of my goals is to segment the DNA of any individual even further, according to which ancestor it was inherited from.
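The earlier, frequency-based style of estimate described above can be illustrated with a short Python sketch. It is a generic maximum-likelihood estimate computed by expectation-maximization, not the specific published method or Ancestry's current models, which the interview does not detail. It assumes allele frequencies for each reference population are already known, treats every SNP as independent, and encodes a genotype as the count (0, 1, or 2) of a reference allele at each SNP. The function name `estimate_admixture` and the toy frequencies are hypothetical.

```python
def estimate_admixture(genotype, freqs, iterations=200):
    """Estimate ancestry proportions, treating each SNP independently.

    genotype: list of 0/1/2 counts of the reference allele per SNP.
    freqs: one list per population of reference-allele frequencies per SNP.
    Returns a list of proportions (one per population) that sums to 1.
    """
    k = len(freqs)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(iterations):
        totals = [0.0] * k
        for j, g in enumerate(genotype):
            # g copies of the reference allele, 2 - g of the alternate allele
            for count, is_ref in ((g, True), (2 - g, False)):
                if count == 0:
                    continue
                like = [
                    q[i] * (freqs[i][j] if is_ref else 1.0 - freqs[i][j])
                    for i in range(k)
                ]
                norm = sum(like)
                if norm == 0.0:
                    continue  # allele impossible under every population
                # E-step: share each allele copy among the populations
                for i in range(k):
                    totals[i] += count * like[i] / norm
        total = sum(totals)
        if total == 0.0:
            break
        # M-step: proportions are the normalized expected allele counts
        q = [t / total for t in totals]
    return q


# Toy example: two populations with very different frequencies at 5 SNPs.
pop_freqs = [[0.9] * 5, [0.1] * 5]
mostly_first = estimate_admixture([2, 2, 2, 2, 2], pop_freqs)
balanced = estimate_admixture([1, 1, 1, 1, 1], pop_freqs)
```

Because each SNP contributes to the likelihood on its own, this kind of model ignores the fact that neighboring SNPs are inherited together in segments, which is the limitation Dr. Noto points to.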
What is your job like on a daily basis?
There are about a dozen full-time scientists on our team, and I consult on several of their research projects in addition to my own. I am also heavily involved with building the products that take advantage of our discoveries. My day often involves coordination with engineering teams at Ancestry as well as research meetings, but I spend most of my time planning and executing my own research projects.
What about your training at UW-Madison was the most impactful in preparing you for your career?
I’ve been successful throughout my career at designing and developing new machine learning approaches to various problems, usually in biology. It began with my thesis work on DNA binding site discovery, and whether I knew it or not at the time, my education at Wisconsin set me on track for similarly successful research. The strong background in machine learning that I received at UW, collaboration with my colleagues, and a little bit of creativity are what it takes to build these research projects. Any success that I’ve had, I owe in large part to the excellent teachers and mentors that I’ve had at Wisconsin. I am grateful to them, and proud to be a Badger.
What are the most exciting challenges and opportunities in your field and company?
It’s been interesting to watch data science evolve and mature over the past several decades, and to see the difference that extremely large data sets can make. We can prove that many of the technologies that we have developed do not start to work until there are tens of millions of examples. DNA data are more expensive to acquire than many other types of data. I certainly expect DNA technologies to continue to improve and the data to become more ubiquitous, but one of the advantages of doing research at Ancestry is that it already has such a large database, not only of DNA but also of the family histories that millions of people have researched and built there.
What is your advice to current grad students interested in pursuing a career in your industry/field?
Follow your passion, and follow your nose! Something I didn’t observe and perhaps didn’t realize until I was out of an academic setting is how many different ways there are to be a successful scientist, and how many different ways there are to turn a scientific background into a successful career. I’ve worked with people who are great at testing hypotheses in the lab, and others who are great at managing teams, or managing projects, or building an idea into a new product, or even a new business. And I’ve seen bright scientists struggle, waiting for the right problem to come along. It is easier said than done, I know, but the people that I’ve seen become the most successful let whatever drives them keep doing the driving.
What is the single person, event, or experience that most influenced your trajectory to where you are today?
The event that comes to mind is a career choice that I made several years ago. I was a director at Ancestry and managed a team of scientists, but I found it difficult to carry out my research vision in that position. I became a Principal Scientist with no one reporting directly to me so that I could focus all of my time on new research. I think the decision was viewed by many as a lateral move, but it was not a difficult decision for me. Now that I have hindsight, it was the right one. I’m happy to have chosen my own career ambitions, and I believe I’ve been able to accomplish much more by working in the style that makes me most effective.