Using machine learning and systems-biology approaches to analyse next-generation sequence data in cancers

The availability of exome sequence data for thousands of cancer samples has enabled the investigation of the sequence-level mutations that contribute to cancer. There is a need for strategies to analyse sequence data to gain new biological and clinical insights. This thesis investigates the use of m...

Bibliographic Details
Main Author:	Sutherland, Russel David
Other Authors:	Lewis, Cathryn Mair ; Dobson, Richard James Butler
Published:	King's College London (University of London) 2016
Subjects:	616.99
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.700764

id	ndltd-bl.uk-oai-ethos.bl.uk-700764
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-700764 2018-06-06T15:32:52Z Using machine learning and systems-biology approaches to analyse next-generation sequence data in cancers Sutherland, Russel David Lewis, Cathryn Mair ; Dobson, Richard James Butler 2016 616.99 King's College London (University of London) http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.700764 https://kclpure.kcl.ac.uk/portal/en/theses/using-machine-learning-and-systemsbiology-approaches-to-analyse-nextgeneration-sequence-data-in-cancers(44ff20d1-dbf0-43f7-a5ad-18759598ec6b).html Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	616.99
description	The availability of exome sequence data for thousands of cancer samples has enabled the investigation of the sequence-level mutations that contribute to cancer. There is a need for strategies to analyse sequence data to gain new biological and clinical insights. This thesis investigates the use of machine learning and network-based methods to identify the mutated genes associated with important clinical features and cancer types, and to aid candidate gene prioritisation in colorectal cancer and rheumatoid arthritis. Firstly, tumour/normal exome sequence data were analysed to identify the mutated genes associated with cancer grade and cancer stage across and within three adenocarcinomas. Tumour grading is an important prognostic indicator, but it is based upon subjective assessment by pathologists and is not standardised across cancer types. Despite this, the study found that protein-coding mutations within TP53 were indicative of high-grade status across three adenocarcinomas once adjusted for age, gender, stage, and tumour type. Secondly, Random Forest models were used to identify the mutations that discriminate each of five high-order cancer types. Based on this work, a Random Forest approach was used to investigate whether exome sequence data could be used to assign cancers to their tissue of origin without prior knowledge, for future use as a classifier for cancers of unknown primary origin. Finally, a network-based method for candidate disease gene prioritisation, called ‘k-pseudo cliques analysis’, was developed. The method identifies sets of highly interacting proteins that are enriched for low gene-level p-values. In tests, the identified gene sets outperformed a univariate test for general cancer gene enrichment. As part of the final chapter, a network-based method called ‘Region Growing Analysis’ was used to perform candidate disease gene prioritisation of rheumatoid arthritis genome-wide association study data. The findings and methods developed in this thesis can provide insights into the genetic correlates of cancer phenotypes and suggest new candidate disease genes.
author2	Lewis, Cathryn Mair ; Dobson, Richard James Butler
author_facet	Lewis, Cathryn Mair ; Dobson, Richard James Butler Sutherland, Russel David
author	Sutherland, Russel David
author_sort	Sutherland, Russel David
title	Using machine learning and systems-biology approaches to analyse next-generation sequence data in cancers
publisher	King's College London (University of London)
publishDate	2016
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.700764

Similar Items