Stop Thinking, Just Do!

Sungsoo Kim's Blog

Boxplots comparing two different populations in Python

tagsTags

11 January 2014


Boxplots comparing two different populations

A common scenario in research is trying to determine whether two groups systematically differ on some characteristic. We can simulate this situation by generating normally distributed random variates for imaginary groups 1 and 2. In fact there is a difference: group 2 has a higher mean of 105 compared to 95. Parallel boxplots are a good way of showing the distribution of scores across the two groups.

import pylab
import random

popSize = 100
category1 = []
category2 = []

## Start by generating scores for category 1 individuals
## Their mean score is 95
for i in range(popSize):
    category1.append(random.normalvariate(95,10))

## Now generate scores for category 2 individuals
## They have a higher mean of 105
for i in range(popSize):
    category2.append(random.normalvariate(105,10))

scores = [category1,category2]

pylab.boxplot(scores)
pylab.savefig('boxplots.png')
pylab.show()

Line graphs

To show how to plot a line graph in Python, we can generate a fictional time series. Imagine a 100-day time span, and that the price of some commodity tends to increase by about 0.2 units per day.

```python import pylab import random

duration = 100 meanIncrease = 0.2 stDevIncrease = 1.2

Here we generate a fictional time series, for a

variable that generally increases over time but

has significant noise.

x = range(duration) y = [] yNow = 0

for i in x: nextDelta = random.normalvariate(meanIncrease,stDevIncrease) yNow += nextDelta y.append(yNow)

pylab.plot(x,y) pylab.xlabel(“Time”) pylab.ylabel(“Value”) pylab.savefig(“lineGraph.png”) pylab.show()

References

[1] Ken Lambert, Fundamentals of Python: First Programs, 2012.


comments powered by Disqus