Skip to topic | Skip to bottom
Search:
Home

Chris and Janet's website

Home


Start of topic | Skip to actions

Multiple plots

To print 4 histograms (with 100 classes) of random normal distribution samples on the same plot, where n=351, mean=160, sd=6.03:

> par(mfrow=c(2,2)) 
> for (i in 1:4){
   hist(rnorm(351, 160, 6.03), 100)
}

Gives the following plot (yours may differ because of randomness): Multi

To produce a jpeg:

> jpeg("~/multiplot.jpg")
> par(mfrow=c(3,3))
> for (i in 1:9){
 hist(rnorm(351, 160, 6.03), 100)
}
> dev.off()

Which produces the following image:

Multi

Note: The dimensions of the image are 480x480 pixels, hence the titles being truncated. The following is more appropriate:

> jpeg("~/multiplot.jpg", 600, 600)

an


Cumalative plot

coal.txt contains the interval in days between successive mine explosions. The following command can be used to produce a cumulative plot:

> x<-scan("coal.txt") //read data

> time<-cumsum(x) //cumulative 'running' sum of the data

> number<-1:length(x) //sequence from 1:190

> plot(time, number) //giving:

Coal

> scatter.smooth(x=time, y=number) //or a scatter with smoothed curve overlay:

coal


Poisson distribution

To plot a poisson probability function Poission(mu) from the data in dec20.txt, issue the following commands:

> dec20<-scan("dec20.txt") //load in data

> barplot(dpois(1:max(dec20),summary(dec20)['Mean'])) //prints the following:

dec

Note: the syntax to get the mean from the summary is not straight forward. Because summary(...) returns an atomic vector, hence the $ notation cannot be used to (de-)reference the internals of the returned data structure, like it can from say hist(x,y)$counts. The array syntax [] should be used, hence: summary(dec20)['Mean'] to get hold of the mean/mu. You might find it enlightening to issue the following commands: mode(summary(dec20)), mode(hist(x)) to get a feel for the different data structures returned from various functions. You might also like to try attributes(...).


Histograms

> hist(dec20) //plots the following

dec

But can be improved with the following:

> hist(dec20, 23, right=F)

dec

Notice that the second plot has more groups/classes, and the frequency for 0 is now plotted separately, right=F.


Poisson process

earth.txt contains intervals in days between major earthquakes.

> e<-scan("earth.txt")

The following finds the probability that exactly 5 earthquakes happen in a typical 4 year period:

> dpois(5,((365*3)+366)/summary(e)['Mean'])

And the following finds the probability that fewer than 3 earthquakes happen in a 4 year period:

> ppois(2,((365*3)+366)/summary(e)['Mean'])

The waiting times between successive earthquakes can be described through an exponential distribution, with parameter lambda (rate, number of earthquakes a day).

The probability that the waiting time will be less than the 30 days:

> pexp(30, 62/27107)

and will exceed two years:

> 1-pexp(2*365, 62/27107)


Quantiles for an exponential distribution

The mean of the interval between serious earthquakes is 437 days (see above for data). The median can be calculated using P(X<= x)=0.5:

> qexp(.5, 1/437)

which results in approx 303 days. 1/437 is the rate of earthquakes (1/mu). Any quantile can be calculated in this manner.


Cycling accident data for 2005

See also: Times online, Gov Direct, Repackaged csv.

Load the following file 2005.csv

> stuff<-scan("2005.csv", skip=1, what=list(year=0,north=0, east=0), sep=',', quiet=T)

Create a density plot of the co-ordinate data (densities are calculated over a 500x500 grid):

> blah<-kde2d(stuff$north, stuff$east, n=500)

blah is a list containing 3 items, x and y, representing the grid, and z a 500x500 matrix containing the kernel density for each cell in the grid:

Then plot the result:

> image(blah, col=c(0, rainbow(50)))

accident

You can see that the above plot is a lot clearer than the following:

> plot(stuff$north, stuff$east)

latlong


If you want to add a colour legend to the z-plot above, you'll need to install the fields package:

> install.packages("fields")

> library(fields)

> image.plot(blah, col=c(0, rainbow(50)))

legend


To add a few place names...

> points(537500, 169500);text(537500, 169500, pos=2,"Beckenham");points(534500,184500);text(534500,184500, pos=2, "Hackney");points(532108,178763);text(532108,178763, pos=2, "E&C")
> points(531496,181359)
> text(531496,181359, "Shoe Lane (The City)", pos=4)


The following plot show the accident hotspot for London in 2005 was approximately in the region of Shoe Lane:

accident

and Victoria Emabankment in 2007:

blah


TODO nice to add UK boundary to the plots.



Testing the central limit theorem

I wanted to test that the mean over a large number of samples can be modeled adequately with a normal distribution.

Further, I wanted to see if the theorem held up when the original distribution was heavily skew.

To investigate:

means<-numeric(10000) //somewhere to store the means

for(i in 1:10000){means[i]<-mean(rbinom(n=10000, prob=.1, size=5))} //generate 10000 random binomial distributions, storing each of their means, note I use prob=.1 producing a heavily skew distribution

h<-hist(means) //plot the distribution

Now lets overlay the idealised normal curve.

x<-seq(0.4766, 0.5262,.0001) //generate the x values, use range(means)

y<-dnorm(x,mean=mean(means),sd(means))*.005*10000 //scale the y values by the bin width and the number of samples (look at the =h= for bin width).
central

Normal approximation looks pretty good.

And the probabilty plot looks like this:

plot(sort(means), qnorm(seq(1/10000, 1,1/10000)))

prob

Again looks very good.

Data frames and Combinations

Create a list containing two columns with a count form 1 to 31

L<-list(A=c(1:31),B=c(1:31))

Create a data frame that contains all the combinations 1 to 31 G<-expand.grid(L)

Add a column, excel style:

G$fraction <- G$A/G$B

Filter by column:

G[G$fraction <= 1,]

Count the number of rows of a dataframe:

nrow (G[G$fraction <= 1,])

to top

You are here: Home > WebLeftBar > TechStuff > R

to top