Research Profile - Computing cancer

Dr. Sohrab Shah
BC researchers are making more accurate deductions from the myriad of genetic clues that next-generation sequencing technology provides.
The best way to appreciate the current revolution in cancer research is to look back a few hundred years, says the University of British Columbia's Dr. Sohrab Shah.
"A good analogy is the 1600s, when microscopes first came into use," says Dr. Shah, a computer scientist with the BC Cancer Agency. "Scientists of that day were looking down microscopes and - for the first time - seeing the presence of bacteria and micro-organisms."
Technological advances, he says, have created a similar scenario in the early 21st century. Next-generation sequencing machines now produce millions of pieces of information about genetic mutations, which sophisticated computer programs then analyze, allowing scientists to study cancers in entirely new ways.
At a Glance
Who – Dr. Sohrab Shah, Assistant Professor, University of British Columbia; Scientist, BC Cancer Agency. Dr. Samuel Aparicio, Nan and Lorraine Robertson Chair of Breast Cancer Research, University of British Columbia/BC Cancer Agency; Canada Research Chair in Molecular Oncology.
Issue – New technology called next-generation sequencing allows scientists to study genetic alterations in cancer tumours at unprecedented resolution. It produces vast amounts of data that can only be interpreted through computer-based algorithms. However, current methods often produce numerous false predictions of genetic alterations that, upon further analysis, don't hold up as valid. Also, important indicators of mutations may be overlooked in the mountain of data produced.
Approach – By developing and refining algorithms to improve computational analysis, researchers will be able to study genetic mutations faster and more accurately.
Impact – Knowledge gained will lead to better clinical management of cancers and improved outcomes for cancer patients.
The ability to observe single-nucleotide variants in a DNA strand - essentially, changes to a single A, T, C or G base at one position in the genetic sequence - is shedding new light on the mechanisms by which different cancers form and evolve.
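To make the idea of a single-nucleotide variant concrete, here is a minimal Python sketch that compares a tumour sequence with a reference sequence and lists every position where a single base differs. It assumes the two sequences are already aligned to the same length; real pipelines get there by mapping millions of short sequencing reads onto a reference genome, which this toy skips entirely. The function name `find_snvs` and the example sequences are hypothetical and are not the researchers' method.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned and of equal length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]

# Hypothetical sequences for illustration only.
reference = "ATGCGTACCA"
tumour = "ATGCATACCG"
print(find_snvs(reference, tumour))  # [(4, 'G', 'A'), (9, 'A', 'G')]
```

Each tuple records one single-base change: at position 4 the reference G reads as A in the tumour, and at position 9 an A reads as G.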
"For example, in studying breast cancers, we've learned how different they are in terms of the number of mutations and the gene content," says Dr. Shah. "We're seeing this for the first time."
As with any new technology, however, there are bugs to be worked out. Next-generation sequencing and the computational algorithms used to interpret its data can produce "false positive" indicators of genetic alterations. And because sequencing spits out millions of pieces of information, the algorithms used to sort through it may simply miss important alterations.
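Both problems, false alarms and missed alterations (false negatives), are typically measured by comparing an algorithm's predicted calls against a set of calls confirmed by validation. A minimal Python sketch follows, assuming each variant is identified by a hypothetical string of the form `chromosome:position:ref>alt` and that the validated set is treated as the truth. The function name and all variant identifiers are illustrative, not from the source.

```python
def evaluate_calls(predicted: set[str], validated: set[str]) -> dict:
    """Compare predicted variant calls with calls confirmed by validation."""
    true_positives = predicted & validated   # predicted and confirmed
    false_positives = predicted - validated  # predicted but not confirmed
    false_negatives = validated - predicted  # real but missed by the algorithm
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(validated) if validated else 0.0
    return {
        "true_positives": len(true_positives),
        "false_positives": len(false_positives),
        "false_negatives": len(false_negatives),
        "precision": precision,
        "recall": recall,
    }

# Hypothetical calls written as chromosome:position:ref>alt.
predicted = {"chr1:1000:G>A", "chr2:500:C>T", "chr3:42:A>G", "chr5:77:T>C"}
validated = {"chr1:1000:G>A", "chr3:42:A>G", "chr7:9:G>T"}
result = evaluate_calls(predicted, validated)
print(result)
```

In this toy case two of the four predictions are confirmed (precision 0.5) and two of the three real variants are found (recall about 0.67). Reducing false positives raises precision; catching missed alterations raises recall.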
"Think of it as punching in 100 phone numbers," says Dr. Samuel Aparicio, one of Canada's leading molecular oncologists and a partner with Dr. Shah in a CIHR-funded project to improve the computational prediction power of next-generation sequencing. "One or two times out of 100 you are going to get a number out of sequence and end up with the wrong phone number. When you're sequencing a cancer genome, you are gathering millions of pieces of information. So even an error rate of 0.01% is going to produce thousands of errors which have to be sorted out."
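The arithmetic behind Dr. Aparicio's point is that the number of errors grows in step with the volume of data. Assuming each piece of information is independently wrong with probability p, the expected number of errors among N pieces is simply N × p. The short Python sketch below applies the 0.01% rate from the quote to a few illustrative data volumes; the choice of volumes, including roughly three billion for the number of base pairs in a human genome, is an assumption for illustration.

```python
def expected_errors(num_items: int, error_rate: float) -> float:
    """Expected number of errors if each item is wrong independently with error_rate."""
    return num_items * error_rate

ERROR_RATE = 0.0001  # 0.01%, the rate quoted above

for n in (1_000_000, 10_000_000, 3_000_000_000):
    print(f"{n:>13,} items -> {expected_errors(n, ERROR_RATE):,.0f} expected errors")
```

The three volumes give about 100, 1,000 and 300,000 expected errors respectively: even a very small error rate leaves a large number of calls to sort out once the data reach genome scale.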
What is needed, say Drs. Shah and Aparicio, are better computer algorithms. And in a field as complex as molecular biology, it seems that practice makes perfect.
"It's a cyclical process," says Dr. Shah. "We do a lot of computation and then we perform extensive validation of the results. From that we learn which elements of computation were correct and which ones weren't. We design algorithms that can learn from those validations and become better at predicting mutations."
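The compute, validate and learn cycle Dr. Shah describes resembles a standard supervised-learning loop. The Python sketch below is a hypothetical illustration, not the group's actual algorithm. It assumes each candidate mutation is summarized by two made-up features (the fraction of reads supporting the variant and a normalized read depth), fits a logistic regression by gradient descent on calls already checked in the lab, scores new candidates, and then folds their validation results back into the training set for the next round. The `validate` argument is a stand-in for laboratory validation, and all names and numbers are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def score(weights: list[float], x: tuple[float, ...]) -> float:
    """Probability that candidate x is a real mutation (weights[0] is the bias)."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))

def train_logistic(features, labels, learning_rate=1.0, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on mean log-loss."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            error = score(weights, x) - y  # derivative of log-loss w.r.t. the logit
            gradient[0] += error
            for j, xi in enumerate(x):
                gradient[j + 1] += error * xi
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, gradient)]
    return weights

def validation_cycle(training, candidates, validate):
    """One pass of compute -> validate -> learn.

    training:   list of (features, label) pairs already checked in the lab
    candidates: new feature tuples to score
    validate:   stand-in for lab validation; returns 1 (real) or 0 (false positive)
    """
    weights = train_logistic([x for x, _ in training], [y for _, y in training])
    predictions = [score(weights, x) for x in candidates]
    # Validated outcomes are appended so the next cycle can learn from them.
    updated = training + [(x, validate(x)) for x in candidates]
    return predictions, updated

# Hypothetical features: (variant allele fraction, normalized read depth).
training = [
    ((0.45, 0.9), 1), ((0.40, 0.8), 1), ((0.35, 0.7), 1),
    ((0.05, 0.2), 0), ((0.08, 0.3), 0), ((0.03, 0.1), 0),
]
candidates = [(0.425, 0.85), (0.04, 0.15)]
predictions, training = validation_cycle(
    training, candidates, validate=lambda x: 1 if x[0] > 0.3 else 0)
print([round(p, 3) for p in predictions])
print(len(training))  # 8 validated calls available for the next cycle
```

Logistic regression is used here only because it is small enough to write out in full. The point of the sketch is the loop structure: each round of validation enlarges the labelled set, so the next model is trained on more confirmed examples than the last.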
Dr. Shah switched from biology to computer science in the late 1990s because the Human Genome Project was transforming life science. The work he and Dr. Aparicio are doing is already yielding important discoveries. Their methods were pivotal to their landmark 2009 Nature paper showing how a breast cancer tumour evolved. With colleague Dr. David Huntsman, they are also co-authors of two New England Journal of Medicine papers that identify mutations in the ARID1A and FOXL2 genes in ovarian cancer.
Improving the computational analysis of sequencing data to understand mutations could have a widespread impact across oncology, says Dr. Shah.
"The computational work applies to many cancer types. Whether you're sequencing lymphomas, ovarian cancers, breast cancers or other tumour types, the algorithms apply equally."
"We're finding mutations in genes that we didn't expect to see. This is telling us new things about the way cells can be transformed to become malignant and how they evolve."
- Dr. Samuel Aparicio, University of British Columbia