Gene chip analysis process

Reading array data

To read array data, first determine the array platform and series. Then use the corresponding package to read the raw data (CEL files for Affymetrix arrays) and process it into an expression matrix. The affy package (Affymetrix platform) generally handles the HG-U95 and HG-U133 series. The oligo package (Affymetrix platform) can process Affymetrix Gene ST arrays, such as the [HuGene-1_1-st] Affymetrix Human Gene 1.1 ST Array. For the Illumina platform, you can use beadarray or lumi.

Some introductions to microarrays:

1. http://homer.ucsd.edu/homer/basicTutorial/affymetrix.html

2. https://qiubio.com/new/book/chapter-03/#%E7%AC%AC%E4%BA%8C%E7%AB%A0-%E5%9F%BA%E5%9B%A0%E8%8A%AF%E7%89%87%E5%88%86%E6%9E%90chapter-1-microarray-analysis

Analyzing arrays with affy

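# biocLite() is deprecated; install Bioconductor packages with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("affy", "hgu133plus2.db"))  # hgu133plus2.db: annotation package for the HG-U133 Plus 2.0 array
library(affy)
library(hgu133plus2.db)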
Read CEL files
# List the CEL files in the data directory and read them into an AffyBatch object
celFiles <- list.celfiles(path = "/yourdataPATH/", full.names=TRUE)
data.affy <- ReadAffy(filenames = celFiles)
data.affy
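If you want explicit control over which files are picked up, the listing step can be sketched with base R's list.files(). The pattern below (matching .CEL and .CEL.gz, case-insensitive) is an assumption for illustration and not necessarily the pattern affy's list.celfiles() uses internally; the directory and file names are made up and created in a temporary folder.

```r
# Sketch: list raw CEL files (plain or gzipped) with base R only.
# Hypothetical demo directory and file names, created under tempdir().
demo.dir <- file.path(tempdir(), "cel_demo")
dir.create(demo.dir, showWarnings = FALSE)
invisible(file.create(file.path(demo.dir, c("GSM0001.CEL", "GSM0002.cel.gz",
                                            "README.txt", "GSM0003.CEL.gz"))))

demo.files <- list.files(demo.dir,
                         pattern = "\\.cel(\\.gz)?$",  # .CEL or .CEL.gz at the end of the name
                         ignore.case = TRUE,
                         full.names = TRUE)
basename(demo.files)  # README.txt is skipped
```

The resulting full paths can be passed to ReadAffy(filenames = ...) in the same way as the output of list.celfiles().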
Normalization
# Normalize with RMA (background correction, quantile normalization, log2 summarization)
data.rma <- rma(data.affy)
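rma() bundles three steps: background correction, quantile normalization across arrays, and median-polish summarization of probes into probe sets on the log2 scale. Below is a minimal base-R sketch of the quantile normalization step on a made-up 4 x 3 matrix. The function name quantile_normalize is hypothetical, ties are broken by order of appearance for simplicity, and affy uses its own compiled implementation rather than this code.

```r
# Minimal sketch of quantile normalization (one step of RMA), base R only.
# Rows are probes, columns are arrays; the values are made up.
quantile_normalize <- function(x) {
  # Rank of each value within its own column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across arrays
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value with the reference value at the same rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

toy.raw <- matrix(c(5, 2, 3, 4,
                    4, 1, 4, 2,
                    3, 4, 6, 8),
                  ncol = 3,
                  dimnames = list(paste0("probe", 1:4), c("A1", "A2", "A3")))
toy.qn <- quantile_normalize(toy.raw)
toy.qn
# Every column now contains the same set of values
apply(toy.qn, 2, sort)
```

After normalization all arrays share one intensity distribution, so overall brightness differences between arrays no longer dominate the comparison.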
Get the expression matrix
expr.rma <- exprs(data.rma) # log2 expression values: rows are probe sets, columns are samples
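Annotate the probes
# Map each probe set ID to its RefSeq ID, gene symbol and gene name
Annot <- data.frame(REFSEQ = sapply(contents(hgu133plus2REFSEQ), paste, collapse = ", "),
                    SYMBOL = sapply(contents(hgu133plus2SYMBOL), paste, collapse = ", "),
                    DESC   = sapply(contents(hgu133plus2GENENAME), paste, collapse = ", "))
# Merge annotation and expression on their row names (probe set IDs), like a database table join
annot.expr <- merge(Annot, expr.rma, by.x = 0, by.y = 0, all = TRUE)
# Drop the probe ID (Row.names), REFSEQ and DESC columns, keeping SYMBOL and the samples
annot.expr <- annot.expr[, c(-1:-2, -4)]
# Drop probes without a gene symbol (paste() turns NA into the string "NA")
annot.expr <- annot.expr[which(annot.expr$SYMBOL != "NA"), ]
# Use the mean of all probe sets mapping to the same symbol as the gene's expression
annot.expr <- aggregate(. ~ SYMBOL, data = annot.expr, FUN = mean)
# Use gene symbols instead of probe set IDs as row names
rownames(annot.expr) <- annot.expr$SYMBOL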
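To see what the merge, filter and aggregate steps above do without installing an annotation package, here is a toy run on made-up probe IDs, symbols and values. All object names (toy.annot, toy.expr, toy.genes) are hypothetical and chosen so they do not overwrite the real objects.

```r
# Toy version of the probe-to-gene collapsing steps (base R only, made-up data).
# Annotation: probe set ID -> gene symbol; "NA" mimics what paste() makes of a missing symbol
toy.annot <- data.frame(SYMBOL = c("GENE1", "GENE1", "GENE2", "NA"),
                        row.names = c("p1_at", "p2_at", "p3_at", "AFFX_ctrl"))
# Log2 expression matrix: probe sets x samples
toy.expr <- matrix(c(8, 10, 5, 12,
                     6,  8, 7, 11),
                   ncol = 2,
                   dimnames = list(c("p1_at", "p2_at", "p3_at", "AFFX_ctrl"),
                                   c("S1", "S2")))

# Join on row names; the first column of the result is "Row.names"
toy.merged <- merge(toy.annot, toy.expr, by.x = 0, by.y = 0, all = TRUE)
toy.merged <- toy.merged[, -1]                                # drop the probe ID column
toy.merged <- toy.merged[which(toy.merged$SYMBOL != "NA"), ]  # drop unannotated probes
# Average probe sets that share a symbol
toy.genes <- aggregate(. ~ SYMBOL, data = toy.merged, FUN = mean)
rownames(toy.genes) <- toy.genes$SYMBOL
toy.genes
```

GENE1 is measured by two probe sets, so its values are the averages (9 and 7); the control probe without a symbol disappears.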

For the subsequent differential expression analysis, refer to the limma package.
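limma fits a linear model per gene and uses empirical Bayes moderated t-statistics. To show only the shape of a differential expression result (log2 fold change, p-value and Benjamini-Hochberg adjusted p-value per gene), here is a simplified base-R stand-in using ordinary Welch t-tests on simulated data. This is NOT what limma computes; the group labels, data and object names are hypothetical, and the column names merely echo those of limma's topTable() output.

```r
# Simplified stand-in for differential expression (NOT limma): per-gene Welch t-test.
set.seed(1)
# Simulated log2 expression: 100 genes x 6 samples (3 control, 3 treatment)
sim.expr <- matrix(rnorm(600, mean = 8), nrow = 100,
                   dimnames = list(paste0("gene", 1:100), paste0("S", 1:6)))
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
sim.expr[1:5, group == "trt"] <- sim.expr[1:5, group == "trt"] + 3  # 5 truly changed genes

# One row per gene: log2 fold change (trt - ctrl) and Welch t-test p-value
de.sketch <- t(apply(sim.expr, 1, function(x) {
  tt <- t.test(x[group == "trt"], x[group == "ctrl"])
  c(logFC = mean(x[group == "trt"]) - mean(x[group == "ctrl"]),
    P.Value = tt$p.value)
}))
de.sketch <- as.data.frame(de.sketch)
# Benjamini-Hochberg adjustment across all genes
de.sketch$adj.P.Val <- p.adjust(de.sketch$P.Value, method = "BH")
head(de.sketch[order(de.sketch$P.Value), ])
```

With only three samples per group, ordinary t-tests have very unstable variance estimates; borrowing information across genes is exactly what limma's moderated statistics add.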

Analyzing arrays with oligo

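# biocLite() is deprecated; install Bioconductor packages with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("oligo")
library(oligo)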
# Directory containing the CEL files
data.dir <- "../../../test/GSE81580_RAW-2/"
## Read the CEL files (here gzipped, *.CEL.gz)
celfiles <- list.files(data.dir, pattern = "\\.CEL\\.gz$", ignore.case = TRUE)
data.raw <- read.celfiles(filenames = file.path(data.dir, celfiles))
#Expression calculation
data.eset <- oligo::rma(data.raw) # RMA: background correction, quantile normalization and summarization
data.exprs <- exprs(data.eset) #Extract expression matrix
write.csv(data.exprs,"../../../test/GSE81580expr.csv")
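The summarization step of RMA fits, for each probe set, an additive model (log2 intensity = overall level + probe effect + array effect) by median polish, and reports overall level + array effect as the probe set's expression on each array. A minimal sketch for a single probe set, using stats::medpolish() on made-up log2 values; the matrix and object names are hypothetical, and oligo::rma() runs its own implementation over every probe set.

```r
# Minimal sketch of RMA-style summarization for ONE probe set with median polish.
# Rows are the probes of the probe set, columns are arrays; values are made-up log2 intensities.
probe.log2 <- matrix(c(7.1, 6.8, 7.5, 6.9,
                       8.0, 7.7, 8.4, 7.8,
                       7.2, 7.0, 7.6, 7.1),
                     nrow = 4,
                     dimnames = list(paste0("probe", 1:4), c("A1", "A2", "A3")))

# medpolish() is in base R's stats package; trace.iter = FALSE silences iteration output
mp.fit <- medpolish(probe.log2, trace.iter = FALSE)
# Expression of the probe set on each array = overall effect + array (column) effect
probeset.expr <- mp.fit$overall + mp.fit$col
round(probeset.expr, 3)
```

Because median polish uses medians rather than means, a single saturated or defective probe has little influence on the summarized value.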
Annotate the probes

Find the annotation R package that corresponds to the array. The summary printed by show() includes the annotation (platform design package) that oligo used when reading the CEL files:

show(data.raw)

Some URLs for querying and downloading annotation files:

Annotation file download: https://www.thermofisher.com/cn/zh/home/life-science/microarray-analysis/microarray-data-analysis/genechip-array-annotation-files.html

http://www.affymetrix.com/support/technical/byproduct.affx?product=mo_trans_assay

Annotation packages for each species: https://blog.csdn.net/weixin_40739969/article/details/103186027

References:

https://bioconductor.org/packages/release/bioc/vignettes/oligo/inst/doc/oug.pdf
http://bioinfo.au.tsinghua.edu.cn/member/cye/ref/Microarray.pdf
https://y570pc.github.io/%E4%BD%BF%E7%94%A8oligo%E5%8C%85%E5%A4%84%E7%90%86%E5%9F%BA%E5%9B%A0%E8%8A%AF%E7%89%87%E6%95%B0%E6%8D%AE/
https://blog.csdn.net/tommyhechina/article/details/80409983

Source: Gene Chip Analysis Process, Tencent Cloud + Community, https://cloud.tencent.com/developer/article/1625373