Remove mapping bias

In allele-specific analyses, one important step is removing alignment (mapping) bias. Humans are diploid organisms, so every position on a chromosome is covered by a pair of alleles. Usually the two alleles are identical (homozygous); when one of them carries a variant (for example a SNP, single nucleotide polymorphism), the position becomes heterozygous.

Suppose a person is heterozygous at a locus, say A/G, where A matches the reference genome and G is the variant allele. When reads are aligned with software such as bowtie2 or bwa, reads carrying A align easily, while reads carrying G are harder to align because they differ from the reference at that base and incur a mismatch penalty. As a result, the two alleles end up with different read counts, which biases any downstream allelic estimate.
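
A simple way to see this bias in numbers at a single heterozygous site is to compare the fraction of reads carrying the reference allele with the 0.5 expected when both alleles align equally well. The sketch below is only an illustration and not part of any method discussed here: the function names are hypothetical, the 0.5 expectation assumes a truly heterozygous site with no genuine allelic imbalance (in allelic expression data, real imbalance is exactly what one wants to measure, which is why alignment bias must be removed first), and the test is an exact two-sided binomial test written in plain Python.

```python
from math import comb


def ref_fraction(ref_count: int, alt_count: int) -> float:
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_count / total


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of k successes out of n.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one (hypothetical helper, illustration only).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps floating-point ties from being dropped.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


if __name__ == "__main__":
    # 60 reads carry the reference allele, 40 carry the alternative allele.
    print(ref_fraction(60, 40))
    print(round(binom_two_sided_p(60, 100), 4))
```

For example, 60 reference and 40 alternative reads give a reference fraction of 0.6 and a two-sided p-value of roughly 0.057. A systematic excess of reference reads across many heterozygous sites, rather than at any single one, is the usual signature of mapping bias.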

Below are several methods I have come across for removing alignment bias:

  1. Align to an N-masked genome, i.e. replace every SNP position in the reference genome with N. This is an older method and does not work well;
  2. Build a personal genome: replace the reference base at each SNP position with the non-reference allele (the alternative allele). This method is very commonly used;
  3. Simulate data: generate simulated reads carrying the SNP alleles, align them, and discard sites that are prone to misalignment;
  4. Build parental genomes: if the parents' genomes are known, align the individual's reads to the parental genomes. The range of application is relatively narrow;
  5. WASP, published in Nature Methods in 2015: for each read that overlaps SNPs, it constructs versions of the read carrying every combination of alleles and realigns them. If all versions map to the same position, the read is kept; otherwise it is discarded. This works relatively well.
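
Methods 1 and 2 both start from a modified reference sequence. The following minimal Python sketch shows only that substitution step for single-base SNPs, assuming a hypothetical `Snp` record with 0-based positions and a reference held in memory as a string. Real pipelines work from VCF and FASTA files (for example with `bcftools consensus`), must also handle indels and both haplotypes, and still need to realign the reads to the modified sequence afterwards.

```python
from typing import Iterable, NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record; pos is a 0-based position on the sequence."""
    pos: int
    ref: str  # base in the reference genome
    alt: str  # alternative (non-reference) base


def apply_snps(reference: str, snps: Iterable[Snp], mode: str) -> str:
    """Return a modified copy of `reference`.

    mode="mask": replace each SNP position with N (method 1, N-masked genome).
    mode="alt":  replace each SNP position with the alt base (method 2, personal genome).
    """
    if mode not in ("mask", "alt"):
        raise ValueError(f"unknown mode: {mode}")
    seq = list(reference)
    for snp in snps:
        # Sanity check: the SNP's ref allele must match the sequence.
        if seq[snp.pos].upper() != snp.ref.upper():
            raise ValueError(
                f"reference mismatch at {snp.pos}: {seq[snp.pos]} != {snp.ref}"
            )
        seq[snp.pos] = "N" if mode == "mask" else snp.alt
    return "".join(seq)


if __name__ == "__main__":
    snps = [Snp(2, "G", "A")]
    print(apply_snps("ACGTACGT", snps, "mask"))  # N-masked
    print(apply_snps("ACGTACGT", snps, "alt"))   # alternative allele
```

The ref-allele check is a deliberate design choice: a VCF built against a different genome build would otherwise silently corrupt the modified reference.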

In 2015, an article in Genome Biology, "Tools and best practices for data processing in allelic expression analysis", compared the effectiveness of the methods available at the time.

In that comparison, WASP removed the bias well while discarding the fewest reads.
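
The allele-swapping idea behind WASP (item 5 above) can be illustrated with a toy example. This is not the WASP implementation: `allele_versions`, `passes_wasp_filter` and `toy_aligner` are hypothetical names, the toy aligner is a brute-force Hamming-distance search standing in for bowtie2 or bwa, and the SNP table is assumed to map 0-based genome positions to (ref, alt) pairs.

```python
from itertools import product
from typing import Callable, Optional

# An aligner takes a read and returns its mapped start, or None if unmapped/ambiguous.
Aligner = Callable[[str], Optional[int]]


def allele_versions(read: str, read_start: int,
                    snps: dict[int, tuple[str, str]]) -> list[str]:
    """All versions of `read` with each overlapped SNP set to ref or alt."""
    overlapped = [p for p in sorted(snps) if read_start <= p < read_start + len(read)]
    versions = []
    for choice in product((0, 1), repeat=len(overlapped)):  # 0 = ref, 1 = alt
        bases = list(read)
        for pos, allele_idx in zip(overlapped, choice):
            bases[pos - read_start] = snps[pos][allele_idx]
        versions.append("".join(bases))
    return versions


def passes_wasp_filter(read: str, read_start: int,
                       snps: dict[int, tuple[str, str]], align: Aligner) -> bool:
    """Keep the read only if every allele version maps back to read_start."""
    return all(align(v) == read_start for v in allele_versions(read, read_start, snps))


def toy_aligner(genome: str, max_mismatches: int = 1) -> Aligner:
    """Hypothetical stand-in for a real aligner: unique best Hamming-distance hit."""
    def align(read: str) -> Optional[int]:
        hits = [
            (sum(a != b for a, b in zip(read, genome[s:s + len(read)])), s)
            for s in range(len(genome) - len(read) + 1)
        ]
        if not hits:
            return None
        best = min(m for m, _ in hits)
        best_starts = [s for m, s in hits if m == best]
        # Too many mismatches or a tie between positions counts as unmapped.
        if best > max_mismatches or len(best_starts) > 1:
            return None
        return best_starts[0]
    return align


if __name__ == "__main__":
    # Unique region: both allele versions map back to position 2, so the read is kept.
    print(passes_wasp_filter("TTACAG", 2, {5: ("C", "T")}, toy_aligner("GATTACAGCTTGCA")))
    # A second copy of the region matches the alt allele exactly, so the read is discarded.
    print(passes_wasp_filter("CCATGG", 0, {2: ("A", "G")}, toy_aligner("CCATGGTTTTCCGTGG")))
```

In the second example, the G-carrying version of the read maps to position 10 instead of 0. Discarding the read regardless of which allele it actually carries removes it for both alleles alike, so the surviving reads no longer favour the reference allele at that site.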

Reference: https://cloud.tencent.com/developer/article/1607861 (Remove mapping bias, Tencent Cloud Developer Community)