#Plotting

One area where R has had a huge advantage over Python is plotting and making attractive graphics. Python recently got its own port of R's popular ggplot2 library, called ggplot, which we'll be working with today.

import pandas as pd
from ggplot import *

my_plot = ggplot(aes(x='Expenditure', y='Species'), data=df) + geom_point() + xlim(0, 350) + ylim(0, 21)

my_plot

Easy, a scatter plot. The syntax here might look a little alien. Plots, in Python and R, are generally built by combining objects (or "layers") with + for some effect. In our plot, ggplot() says that the data come from the dataframe df, with the columns Expenditure and Species as the X and Y axes. geom_point() draws the points, and xlim() and ylim() set the axis limits.
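If the + syntax still feels odd, a toy version may help. The sketch below is not how ggplot is implemented; the Plot and Layer classes are hypothetical names made up for illustration. It only shows the general idea that + can return a new object with one more layer attached.

```python
# Toy illustration only: Plot and Layer are hypothetical classes,
# NOT ggplot's real implementation.
class Layer:
    def __init__(self, name, **options):
        self.name = name
        self.options = options

    def __repr__(self):
        return "Layer(%r, %r)" % (self.name, self.options)


class Plot:
    def __init__(self, data, mapping, layers=()):
        self.data = data
        self.mapping = mapping
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new Plot so the original is left untouched.
        return Plot(self.data, self.mapping, self.layers + [layer])


# Made-up data points, just so the toy plot has something to hold.
base = Plot(data=[(10, 3), (200, 12)],
            mapping={"x": "Expenditure", "y": "Species"})
toy_plot = (base + Layer("point")
            + Layer("xlim", low=0, high=350)
            + Layer("ylim", low=0, high=21))

print([layer.name for layer in toy_plot.layers])
print(len(base.layers))
```

Each + adds one layer and leaves the earlier object alone, which is why you can keep a base plot around and try different layers on top of it.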

You'll notice that my_plot has opened in a viewer window. We can save it to a file instead:

ggsave(my_plot, "my_plot", "png")

OK, that's all well and good. Let's try this with some Biggish Data. Let's say we've got a big matrix. Half the data was collected by me, half by my labmate. I'm suspicious that we haven't collected it in exactly the same way. What I want to do is load up my data, add a column that shows who collected what, and plot the two subsets against each other.

big_matrix = pd.read_csv('/home/april/Documents/ccbb_pythonspring2014/random_mat', index_col=None, header=None)

big_matrix['1001'] = '1001'
# .loc slices include both endpoints: rows 0-249 are mine, rows 250-499 are my labmate's
big_matrix.loc[0:249, '1001'] = 'Me'
big_matrix.loc[250:499, '1001'] = 'My labmate'

Print out that matrix. Does it look like you thought it might? What's with the row and column names? Let's check and make sure that the reassignment worked.
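Before plotting, a quick count is one way to check the labels. The sketch below does not read the tutorial's file. It builds a random stand-in, and the 500 x 5 shape and the seed are assumptions for illustration. It labels rows with .loc and then counts how many rows carry each label.

```python
import numpy as np
import pandas as pd

# Stand-in for the tutorial's file: random values between 0 and 1.
# The shape (500 x 5) and the seed are assumptions for illustration.
rng = np.random.RandomState(0)
big_matrix = pd.DataFrame(rng.rand(500, 5))

big_matrix['1001'] = '1001'
big_matrix.loc[0:249, '1001'] = 'Me'
big_matrix.loc[250:499, '1001'] = 'My labmate'

# Every row should now carry one of the two labels, 250 each.
counts = big_matrix['1001'].value_counts()
print(counts)

# The rows around the boundary are the ones most likely to be mislabelled.
print(big_matrix.loc[248:251, '1001'])
```

If a leftover '1001' shows up in the counts, some row was skipped by the slices. The plot is the visual version of the same check: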

ggplot(aes(x=big_matrix.loc[0:499, 2], y=range(0, 500), color='1001'), data=big_matrix) + geom_point() + geom_jitter() + ylab("Y Axis") + xlab("X Axis") + ylim(0, 500) + xlim(0, 1) + theme_bw()

OK, that looks good.

Let's actually overlay my data and hers:

ggplot(aes(x=big_matrix.loc[0:499, 2], y=big_matrix.loc[:, 1], color='1001'), data=big_matrix) + geom_point() + geom_jitter() + ylab("Y Axis") + xlab("X Axis") + ylim(0, 1) + xlim(0, 1) + theme_bw()

Hmmm, some of her points are right on mine, some aren't. It looks like maybe we had some different practices in data collection.
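A plot gives an impression, and a per-collector summary can back it up with numbers. This is a minimal sketch on a random stand-in (the 500 x 5 shape and the seed are assumptions, not the tutorial's data), so the two groups here will look alike. On real data, a gap in the means or spreads would support the suspicion.

```python
import numpy as np
import pandas as pd

# Stand-in data: the shape and seed are assumptions for illustration.
rng = np.random.RandomState(0)
big_matrix = pd.DataFrame(rng.rand(500, 5))
big_matrix['1001'] = ['Me'] * 250 + ['My labmate'] * 250

# Compare the two collectors on the columns that were plotted (1 and 2).
summary = big_matrix.groupby('1001')[[1, 2]].agg(['mean', 'std'])
print(summary)
```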

##Exercise:

Try removing some of the geoms. What happens? I've attached a cheatsheet, 'Plot_types.md'. Try a couple of the different options and see what they do. What types of plots would be good for your data?