To read array data, first identify the array platform and chip series, then use the corresponding package to read the raw CEL files and process them into an expression matrix. The affy package (Affymetrix platform) generally handles the HG-U95 and HG-U133 series. The oligo package (Affymetrix platform) handles Affymetrix Gene ST arrays, such as [HuGene-1_1-st] Affymetrix Human Gene 1.1 ST Array. For the Illumina platform, you can use beadarray or lumi.
An introduction to arrays: http://homer.ucsd.edu/homer/basicTutorial/affymetrix.html
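# Install with BiocManager (the successor to biocLite())
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("affy", "hgu133plus2.db"))  # hgu133plus2.db is the annotation package
library(affy)
library(hgu133plus2.db)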
# Read CEL files
celFiles <- list.celfiles(path = "/yourdataPATH/", full.names = TRUE)
data.affy <- ReadAffy(filenames = celFiles)
data.affy  # print a summary of the AffyBatch object
# Background-correct, normalize and summarize with RMA
data.rma <- rma(data.affy)
# Extract the expression matrix (probe sets in rows, samples in columns)
expr.rma <- exprs(data.rma)
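RMA combines background correction, quantile normalization and probe set summarization. As a rough illustration of the normalization step only, the sketch below applies quantile normalization to a small made-up matrix in base R. The function name `quantile_normalize` and the toy values are assumptions for illustration, and the tie handling is simplified; this is not the code that affy or oligo run internally.

```r
# Illustrative quantile normalization on a toy matrix (base R only).
# Hypothetical helper; affy::rma() and oligo::rma() use their own implementations.
quantile_normalize <- function(x) {
  # Rank of each value within its column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Sort each column, then average across columns at each rank position
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)
  # Replace every value by the target value at its rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Made-up intensities: 4 probes (rows) by 3 arrays (columns)
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 6, 2,
                3, 4, 6, 8),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), c("A", "B", "C")))

toy.qn <- quantile_normalize(toy)
print(toy.qn)
```

After normalization every column holds the same set of values, so the arrays share one intensity distribution while each probe keeps its rank within its array.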
# Map probe set IDs to RefSeq IDs, gene symbols and gene descriptions
Annot <- data.frame(
  REFSEQ = sapply(contents(hgu133plus2REFSEQ), paste, collapse = ", "),
  SYMBOL = sapply(contents(hgu133plus2SYMBOL), paste, collapse = ", "),
  DESC   = sapply(contents(hgu133plus2GENENAME), paste, collapse = ", ")
)
# Merge annotation and expression values by row name (like a database table join)
all <- merge(Annot, expr.rma, by.x = 0, by.y = 0, all = TRUE)
# Drop the probe ID, REFSEQ and DESC columns, keeping SYMBOL and the samples
all <- all[, c(-1:-2, -4)]
# Drop probe sets without a symbol (paste() turns a missing symbol into the string "NA")
all <- all[which(all[, 1] != "NA"), ]
# Use the mean across probe sets as the expression value of each gene
all <- aggregate(. ~ SYMBOL, all, mean)
# Use gene symbols as row names
rownames(all) <- all$SYMBOL
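To make the probe-to-gene collapsing step easier to follow, here is a minimal base R sketch on a made-up table. The probe IDs, symbols and values are hypothetical; only the filtering and `aggregate()` logic mirrors the workflow above.

```r
# Toy probe-level table: hypothetical probe IDs, symbols and log2 values
probes <- data.frame(
  SYMBOL  = c("TP53", "TP53", "EGFR", "NA"),
  sample1 = c(8.0, 9.0, 6.5, 3.0),
  sample2 = c(7.0, 8.0, 6.0, 2.0),
  row.names = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

# Drop probes whose symbol is the string "NA" (no annotation available)
probes <- probes[probes$SYMBOL != "NA", ]

# Average all probes that map to the same gene symbol
genes <- aggregate(. ~ SYMBOL, data = probes, FUN = mean)

# Move the symbols into the row names and keep a numeric matrix
rownames(genes) <- genes$SYMBOL
genes$SYMBOL <- NULL
gene.matrix <- as.matrix(genes)
print(gene.matrix)
```

Averaging is one of several possible choices; keeping the probe with the highest mean or variance per gene is another common option, depending on the analysis.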
See the limma package for the subsequent differential expression analysis.
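As a starting point for that analysis, the sketch below builds a two-group design matrix in base R and computes a plain per-gene log2 fold change. The simulated matrix, sample names and group labels are assumptions for illustration. It does not run limma itself; with limma installed, a design like this is typically passed to `lmFit()`, followed by `makeContrasts()`, `contrasts.fit()`, `eBayes()` and `topTable()`.

```r
# Simulated log2 expression matrix: 6 genes by 4 samples (made-up data)
set.seed(1)
expr <- matrix(rnorm(6 * 4, mean = 8), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Hypothetical sample groups, in the same order as the columns of expr
group <- factor(c("control", "control", "treated", "treated"))

# Design matrix with one indicator column per group (no intercept)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
print(design)

# On the log2 scale, the difference of group means is the log2 fold change
logFC <- rowMeans(expr[, group == "treated"]) - rowMeans(expr[, group == "control"])
print(round(logFC, 3))
```

The fold change alone carries no measure of uncertainty, which is why a moderated test such as the one limma provides is usually applied on top of the same design.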
# Install and load oligo
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("oligo")
library(oligo)
# File location
data.dir <- "../../../test/GSE81580_RAW-2/"
# Read the CEL files (the pattern matches file names ending in .CEL.gz)
celfiles <- list.files(data.dir, "\\.CEL\\.gz$")
data.raw <- read.celfiles(filenames = file.path(data.dir, celfiles))
# Compute expression values: RMA covers background correction, normalization and summarization
data.eset <- oligo::rma(data.raw)
# Extract the expression matrix
data.exprs <- exprs(data.eset)
write.csv(data.exprs, "../../../test/GSE81580expr.csv")
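The file selection step relies on a regular expression, so it is worth checking the pattern before reading anything. The sketch below tests one pattern that matches only names ending in `.CEL.gz`, with both dots escaped, against a few hypothetical file names.

```r
# Hypothetical file names, as they might appear in a GEO RAW download folder
files <- c("GSM1_a.CEL.gz", "GSM2_b.CEL.gz", "notes.txt", "GSM3.CEL", "badCEL_gz")

# Escaped dots match literal "."; "$" anchors the match at the end of the name
pattern <- "\\.CEL\\.gz$"

cel <- files[grepl(pattern, files)]
print(cel)
```

The same pattern can be passed as the `pattern` argument of `list.files()`.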
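Find the annotation R package that corresponds to the array. It can be queried with ALL:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ALL")
library(ALL)
show(data.raw)  # print a summary of the raw data object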
Some URLs for querying and downloading annotation files. Annotation file download: https://www.thermofisher.com/cn/zh/home/life-science/microarray-analysis/microarray-data-analysis/genechip-array-annotation-files.html
http://www.affymetrix.com/support/technical/byproduct.affx?product=mo_trans_assay
Lookup of annotation packages for each species: https://blog.csdn.net/weixin_40739969/article/details/103186027
References:
https://bioconductor.org/packages/release/bioc/vignettes/oligo/inst/doc/oug.pdf
http://bioinfo.au.tsinghua.edu.cn/member/cye/ref/Microarray.pdf
https://y570pc.github.io/%E4%BD%BF%E7%94%A8oligo%E5%8C%85%E5%A4%84%E7%90%86%E5%9F%BA%E5%9B%A0%E8%8A%AF%E7%89%87%E6%95%B0%E6%8D%AE/
https://blog.csdn.net/tommyhechina/article/details/80409983