Analysis of Variance in Python Data Analysis

Problem

A nursery is testing 5 different treatment methods for the seeds of a flowering tree. Each method is applied to 6 seeds in a seedling experiment, and the height of the seedlings is measured one year later, giving the data shown in the table below. Apart from the treatment method, all other seedling-raising conditions are the same, and seedling height is approximately normally distributed with equal variance across groups. Judge, at the 95% confidence level, whether the seed treatment method has a significant effect on seedling growth.

Data preprocessing

  1. As in any analysis of variance, first state the null hypothesis H0: the different treatment methods have no significant effect on the growth of the seedlings.
  2. Look at the data provided by the course teacher.

The copied data is in a very unfriendly format, so I wrote a short Python script to convert it:

import csv

src_path = 'C:/Users/Administrator/Desktop/Analysis of variance.txt'
dst_path = 'C:/Users/Administrator/Desktop/Analysis of variance.csv'

# Each input line holds the comma-separated heights of one treatment group.
# Write one (value, group) row per height; the group is the 1-based line number.
with open(src_path, 'r') as f, open(dst_path, 'wt', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for i, fs in enumerate(f, start=1):
        contents = fs.strip().split(',')
        for content in contents:
            writer.writerow((content, i))

This converts the data into the following format (one row per seedling, holding its height and its group number), which is convenient for running an analysis of variance in Python:
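To make the target layout concrete, here is a minimal sketch of the same conversion run in memory. The input text and the helper name `to_long_rows` are made up for illustration; they are not the data or code from the course.

```python
import csv
import io

# Hypothetical input: one line per treatment group, heights separated by commas.
raw_text = "1.0,2.0,3.0\n4.0,5.0,6.0\n"

def to_long_rows(lines):
    """Yield (value, group) pairs; group is the 1-based line number."""
    for group, line in enumerate(lines, start=1):
        for value in line.strip().split(','):
            yield value, group

rows = list(to_long_rows(io.StringIO(raw_text)))

# Write the rows as CSV text so the resulting layout can be inspected.
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Each output row holds a single value followed by its group number, which matches the two columns that are later named 'value' and 'group'.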

Python ANOVA

import pandas as pd
from scipy import stats

# Load the long-format data: one row per seedling with its height and group number.
df = pd.read_excel('C:/Users/Administrator/Desktop/Analysis of variance.xls', header=None, names=['value', 'group'])

# Collect the heights of each of the 5 treatment groups.
args = [df[df['group'] == g]['value'] for g in range(1, 6)]

# One-way ANOVA: returns the F statistic and its p-value.
f, p = stats.f_oneway(*args)
print(f, p)

The result is shown in the figure:
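For readers who want to see what `stats.f_oneway` computes, the following sketch works out the F statistic by hand as the ratio of the between-group mean square to the within-group mean square, then compares it with SciPy's value. The heights and the helper name `one_way_f` are hypothetical; this is not the course data.

```python
import numpy as np
from scipy import stats

# Hypothetical heights for three treatment groups (not the data from the article).
groups = [
    np.array([10.0, 12.0, 11.0, 13.0]),
    np.array([14.0, 15.0, 13.0, 16.0]),
    np.array([9.0, 11.0, 10.0, 12.0]),
]

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

f_manual, df_between, df_within = one_way_f(groups)
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy, p_scipy)
```

The two F values should agree up to floating-point error, which is a quick sanity check that the library call is doing the expected calculation.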

Conclusion

With 5 groups and 30 observations, the degrees of freedom are 5 - 1 = 4 between groups and 30 - 5 = 25 within groups. Looking up the F table gives F0.05(4,25) = 2.76. Because F = Sb²/Sw² = 4.38 > F0.05(4,25) = 2.76, the hypothesis H0 is rejected; that is, the differences in seedling height growth caused by the different treatment methods are significant.
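The table lookup can also be done in code. The sketch below assumes SciPy's F distribution (`stats.f`) and takes the observed F = 4.38 from the conclusion above; the variable names are illustrative.

```python
from scipy import stats

alpha = 0.05
df_between = 5 - 1        # 5 treatment methods
df_within = 5 * 6 - 5     # 30 seedlings minus 5 groups
f_observed = 4.38         # F statistic quoted in the conclusion

# Critical value: the point the F distribution exceeds with probability alpha.
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

# Equivalent decision via the p-value (upper-tail probability of the observed F).
p_value = stats.f.sf(f_observed, df_between, df_within)

reject_h0 = f_observed > f_critical
print(round(f_critical, 2), p_value, reject_h0)
```

Comparing F with the critical value and comparing the p-value with alpha lead to the same decision, so either check can be used.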

Reference: https://cloud.tencent.com/developer/article/1197125 Analysis of Variance in Python Data Analysis-Cloud + Community-Tencent Cloud