Reading 100 Pan-Cancer Research Papers: Saturation Analysis of Cancer-Related Genes

Introduction

The pan-cancer research program, launched by The Cancer Genome Atlas (TCGA) in 2012, aims to analyze different types of tumor tissue, their origins, and the commonalities among them. This article reviews an analysis of cancer-related genes, compares the resulting gene list with the Cancer Gene Census (CGC) database, and examines how effective saturation testing is on sequencing data. It also covers the stringent statistical correction that reduced the Cancer5000 set to 219 genes, known as Cancer5000-S.

The Number and Types of Cancer Patients

Researchers collected data from 4,742 patients with 14 types of cancer, 12 of them from the TCGA program and the rest from other programs. The sample size and number of mutations for several cancers are as follows:

Cancer Type    Sample Size    Number of Mutations
Lung           1,000          2,000
Breast         500            1,500
Colorectal     1,000          2,500

Analysis of Cancer-Related Genes

Using the MutSig software, the researchers identified cancer-related genes in each cancer type and then checked how often the same gene recurred across types. Only 22 cancer genes appeared in three or more cancers, and 10 genes appeared in three cancers. The researchers also ran the analysis in two modes: merging all cancer types into one cohort, and treating each cancer type separately. The merged analysis found 114 cancer-related genes, 30 of which were not found when the cancers were analyzed separately. The separate analyses found 224 genes, 140 of which were not found after merging. The remaining 84 genes were shared between the two sets.
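The counts for the merged and per-cancer analyses can be cross-checked with simple set arithmetic. The sketch below uses only the numbers reported above. The variable names are illustrative, and the union size is derived from those numbers rather than quoted from the study.

```python
# Counts reported for the two MutSig analysis modes.
merged_total = 114     # genes found when all cancer types are merged
merged_only = 30       # found only in the merged analysis
per_type_total = 224   # genes found when each cancer type is analyzed separately
per_type_only = 140    # found only in the per-type analyses
shared = 84            # found by both modes

# Each total should equal its exclusive part plus the shared part.
assert merged_only + shared == merged_total
assert per_type_only + shared == per_type_total

# Inclusion-exclusion gives the number of distinct genes across both modes.
union_size = merged_total + per_type_total - shared
print(union_size)  # 254
```

The two assertions pass, so the reported counts are internally consistent.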

Comparison with CGC

The researchers compared the Cancer5000-S set with the CGC database, which is maintained as part of COSMIC. They found that 82 of the 219 genes in the Cancer5000-S set were also present in the CGC database. Interestingly, 81 genes had not previously been reported and were treated as “novel” candidates. About 40 of these were likely to be false positives, while at least 41 still appeared meaningful.

Sequencing and Analysis of the Effectiveness of Saturation Testing

The researchers analyzed the effectiveness of saturation testing in sequencing data by down-sampling, that is, by repeating the analysis on smaller numbers of cancer patients. They found that a VAF (Variant Allele Frequency) below 20% was common in tumor samples, and that many samples had no mutations in important genes. They also repeated the procedure used to construct the Cancer5000-S list, applying a stringent correction for approximately 400,000 hypotheses (18,388 genes × 22 analyses), and computed how many genes remained significant at each smaller set size.
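To give a feel for how strict a correction over this many hypotheses is, here is a minimal Python sketch. It takes the number of analyses as 22, the value consistent with the stated total of roughly 400,000. The text does not say which correction the authors applied, so the Bonferroni rule, the 0.05 significance level, and the toy p-values below are all assumptions made purely for illustration.

```python
def bonferroni_significant(p_values, n_hypotheses, alpha=0.05):
    """Return the indices of p-values that pass a Bonferroni threshold.

    Bonferroni is used only as an example of a stringent correction;
    it is not confirmed to be the procedure used in the study.
    """
    threshold = alpha / n_hypotheses
    return [i for i, p in enumerate(p_values) if p <= threshold]


n_genes = 18_388
n_analyses = 22  # assumed; 18,388 x 22 matches the "approximately 400,000" total
n_hypotheses = n_genes * n_analyses
print(n_hypotheses)  # 404536

# Hypothetical p-values, not taken from any real data set.
toy_p_values = [1e-9, 3e-7, 2e-6, 0.004, 0.2]
passing = bonferroni_significant(toy_p_values, n_hypotheses)
print(passing)  # [0]
```

With about 400,000 hypotheses, the per-test threshold falls to roughly 1.2e-7. Thresholds this strict help explain why fewer genes remain significant as the number of samples shrinks.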

Postscript

The pan-cancer research program has published several studies in prominent journals, including Nature and other CNS (Cell, Nature, Science) journals. These studies have shed light on the commonalities and differences among various types of cancer. However, the analysis of cancer-related genes and the effectiveness of saturation testing in sequencing data remain active research areas. To make good use of the pan-cancer results, it is essential to keep up with the latest research and analyses.

TCGA Tutorials

For those interested in learning more about the TCGA program and its data, the following tutorials are available:

  • TCGA knowledge map video tutorial (Bilibili and YouTube)
  • TCGA long tutorial update list
  • TCGA 28 tutorials - Getting TCGA data with the R package cgdsr (cBioPortal)
  • TCGA 28 tutorials - Getting TCGA data with the R package RTCGA (packaged offline version)
  • TCGA 28 tutorials - Getting TCGA data with the R package RTCGAToolbox (FireBrowse portal)
  • TCGA 28 tutorials - Batch-downloading all TCGA data (UCSC Xena)
  • TCGA 28 tutorials - Data download
  • TCGA 28 tutorials - Viewing the expression of a gene of interest in a specified cancer
  • TCGA 28 tutorials - Survival analysis for any gene in any cancer in the TCGA database
  • TCGA 28 tutorials - Organizing clinical data downloaded from GDC in XML format
  • TCGA 28 tutorials - Drawing risk factor association plots - a late answer worth 1,000
  • TCGA 28 tutorials - Three approaches to ceRNA data mining
  • TCGA 28 tutorials - Panorama of mutations across all cancers
  • TCGA 28 tutorials - Pan-Cancer Early
  • TCGA 28 tutorials - A guide to TCGA CNV
  • TCGA 28 tutorials - The GTEx database, a good helper for TCGA data mining

Conclusion

The pan-cancer research program has made significant progress in understanding the commonalities and differences among various types of cancer. Identifying cancer-related genes and assessing saturation in sequencing data remain active research areas. By keeping up with the latest research and analyses, we can make better use of the pan-cancer results and gain a deeper understanding of cancer biology.