<<O>>  Difference Topic R (r1.13 - 29 Jan 2010 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 228 to 228

Again looks very good.

Added:
>
>

Data frames and Combinations

Create a list containing two columns with a count form 1 to 31

L<-list(A=c(1:31),B=c(1:31))

Create a data frame that contains all the combinations 1 to 31 G<-expand.grid(L)

Add a column, excel style:

G$fraction <- G$A/G$B

Filter by column:

G[G$fraction <= 1,]

Count the number of rows of a dataframe:

nrow (G[G$fraction <= 1,])

META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.12 - 17 May 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 191 to 191

Added:
>
>

Testing the central limit theorem

I wanted to test that the mean over a large number of samples can be modeled adequately with a normal distribution.

Further, I wanted to see if the theorem held up when the original distribution was heavily skew.

To investigate:

means<-numeric(10000) //somewhere to store the means

for(i in 1:10000){means[i]<-mean(rbinom(n=10000, prob=.1, size=5))} //generate 10000 random binomial distributions, storing each of their means, note I use prob=.1 producing a heavily skew distribution

h<-hist(means) //plot the distribution

Now lets overlay the idealised normal curve.

x<-seq(0.4766, 0.5262,.0001) //generate the x values, use range(means)

y<-dnorm(x,mean=mean(means),sd(means))*.005*10000 //scale the y values by the bin width and the number of samples (look at the =h= for bin width).
central

Normal approximation looks pretty good.

And the probabilty plot looks like this:

plot(sort(means), qnorm(seq(1/10000, 1,1/10000)))

prob

Again looks very good.


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
Line: 211 to 248

META FILEATTACHMENT accident2005Legend.jpg attr="h" comment="accident" date="1240054379" path="accident2005Legend.jpg" size="163511" user="ChrisJones" version="1.2"
META FILEATTACHMENT accident2005Lengend.jpg attr="h" comment="accident" date="1240054423" path="accident2005Lengend.jpg" size="28333" user="ChrisJones" version="1.1"
META FILEATTACHMENT accidentHotspot2007.jpg attr="h" comment="blah" date="1240058769" path="~accidentHotspot2007.jpg" size="27318" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT central.jpg attr="h" comment="central" date="1242552167" path="central.jpg" size="19316" user="ChrisJones" version="1.1"
META FILEATTACHMENT probplot.jpg attr="h" comment="prob" date="1242554963" path="probplot.jpg" size="16307" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.11 - 18 Apr 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 127 to 127

Cycling accident data for 2005

Changed:
<
<
See also: Time online, Gov Direct, Repackaged csv.
>
>
See also: Times online, Gov Direct, Repackaged csv.

Load the following file 2005.csv

Line: 137 to 137

> blah<-kde2d(stuff$north, stuff$east, n=500)

Changed:
<
<
blah is a list containing 3 items, x and y, representing the grid, and z a 500x500 matrix containing the density value for each item in the grid:
>
>
blah is a list containing 3 items, x and y, representing the grid, and z a 500x500 matrix containing the kernel density for each cell in the grid:

Then plot the result:

Line: 175 to 175


Changed:
<
<
The following plots show the accident hotspot for London, in 2005 (and 2007):
>
>
The following plot show the accident hotspot for London in 2005 was approximately in the region of Shoe Lane:

accident

Changed:
<
<
and 2007:
>
>
and Victoria Emabankment in 2007:

blah

 <<O>>  Difference Topic R (r1.10 - 18 Apr 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 173 to 173

> text(531496,181359, "Shoe Lane (The City)", pos=4)
Added:
>
>

The following plots show the accident hotspot for London, in 2005 (and 2007):


accident
Changed:
<
<

>
>
and 2007:

Changed:
<
<
>
>
blah


TODO nice to add UK boundary to the plots.

Added:
>
>


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
Line: 202 to 210

META FILEATTACHMENT 2007.csv attr="h" comment="2009" date="1240052016" path="2007.csv" size="312058" user="ChrisJones" version="1.1"
META FILEATTACHMENT accident2005Legend.jpg attr="h" comment="accident" date="1240054379" path="accident2005Legend.jpg" size="163511" user="ChrisJones" version="1.2"
META FILEATTACHMENT accident2005Lengend.jpg attr="h" comment="accident" date="1240054423" path="accident2005Lengend.jpg" size="28333" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT accidentHotspot2007.jpg attr="h" comment="blah" date="1240058769" path="~accidentHotspot2007.jpg" size="27318" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.9 - 18 Apr 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 163 to 163

legend

Added:
>
>

To add a few place names...

> points(537500, 169500);text(537500, 169500, pos=2,"Beckenham");points(534500,184500);text(534500,184500, pos=2, "Hackney");points(532108,178763);text(532108,178763, pos=2, "E&C")
> points(531496,181359)
> text(531496,181359, "Shoe Lane (The City)", pos=4)

accident




TODO nice to add UK boundary to the plots.

META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
Line: 181 to 199

META FILEATTACHMENT accident2005.jpg attr="h" comment="accident" date="1239964793" path="accident2005.jpg" size="19448" user="ChrisJones" version="1.1"
META FILEATTACHMENT latlong2005.jpg attr="h" comment="latlong" date="1239964993" path="latlong2005.jpg" size="31799" user="ChrisJones" version="1.1"
META FILEATTACHMENT legend2005.jpg attr="h" comment="legend" date="1239965425" path="legend2005.jpg" size="22219" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT 2007.csv attr="h" comment="2009" date="1240052016" path="2007.csv" size="312058" user="ChrisJones" version="1.1"
META FILEATTACHMENT accident2005Legend.jpg attr="h" comment="accident" date="1240054379" path="accident2005Legend.jpg" size="163511" user="ChrisJones" version="1.2"
META FILEATTACHMENT accident2005Lengend.jpg attr="h" comment="accident" date="1240054423" path="accident2005Lengend.jpg" size="28333" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.8 - 17 Apr 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 123 to 123

which results in approx 303 days. 1/437 is the rate of earthquakes (1/mu). Any quantile can be calculated in this manner.

Added:
>
>

Cycling accident data for 2005

See also: Time online, Gov Direct, Repackaged csv.

Load the following file 2005.csv

> stuff<-scan("2005.csv", skip=1, what=list(year=0,north=0, east=0), sep=',', quiet=T)

Create a density plot of the co-ordinate data (densities are calculated over a 500x500 grid):

> blah<-kde2d(stuff$north, stuff$east, n=500)

blah is a list containing 3 items, x and y, representing the grid, and z a 500x500 matrix containing the density value for each item in the grid:

Then plot the result:

> image(blah, col=c(0, rainbow(50)))

accident

You can see that the above plot is a lot clearer than the following:

> plot(stuff$north, stuff$east)

latlong


If you want to add a colour legend to the z-plot above, you'll need to install the fields package:

> install.packages("fields")

> library(fields)

> image.plot(blah, col=c(0, rainbow(50)))

legend

TODO nice to add UK boundary to the plots.


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
Line: 135 to 177

META FILEATTACHMENT dec20hist.jpg attr="h" comment="dec" date="1236629031" path="dec20hist.jpg" size="13918" user="ChrisJones" version="1.1"
META FILEATTACHMENT dec20hist1.jpg attr="h" comment="dec" date="1236629116" path="dec20hist1.jpg" size="15759" user="ChrisJones" version="1.1"
META FILEATTACHMENT earth.txt attr="h" comment="earth" date="1236633123" path="earth.txt" size="299" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT 2005.csv attr="h" comment="Cycling accident data" date="1239961818" path="2005.csv" size="319469" user="ChrisJones" version="1.1"
META FILEATTACHMENT accident2005.jpg attr="h" comment="accident" date="1239964793" path="accident2005.jpg" size="19448" user="ChrisJones" version="1.1"
META FILEATTACHMENT latlong2005.jpg attr="h" comment="latlong" date="1239964993" path="latlong2005.jpg" size="31799" user="ChrisJones" version="1.1"
META FILEATTACHMENT legend2005.jpg attr="h" comment="legend" date="1239965425" path="legend2005.jpg" size="22219" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.7 - 10 Mar 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 113 to 113

> 1-pexp(2*365, 62/27107)

Added:
>
>

Quantiles for an exponential distribution

The mean of the interval between serious earthquakes is 437 days (see above for data). The median can be calculated using P(X<= x)=0.5:

> qexp(.5, 1/437)

which results in approx 303 days. 1/437 is the rate of earthquakes (1/mu). Any quantile can be calculated in this manner.


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.6 - 10 Mar 2009 - ChrisJones)

META TOPICPARENT TechStuff
TOC: No TOC in "Home.R"
Line: 41 to 41

Cumalative plot

Changed:
<
<
Given the data in the coal.txt, the following command can be used to produce a cumulative plot:
>
>
coal.txt contains the interval in days between successive mine explosions. The following command can be used to produce a cumulative plot:

> x<-scan("coal.txt") //read data

Changed:
<
<
> time<-cumsum(x) //cumulative sum of the data
>
>
> time<-cumsum(x) //cumulative 'running' sum of the data

> number<-1:length(x) //sequence from 1:190

 <<O>>  Difference Topic R (r1.5 - 09 Mar 2009 - ChrisJones)

META TOPICPARENT TechStuff
Added:
>
>
TOC: No TOC in "Home.R"

Multiple plots

To print 4 histograms (with 100 classes) of random normal distribution samples on the same plot, where n=351, mean=160, sd=6.03:

Line: 69 to 71

Note: the syntax to get the mean from the summary is not straight forward. Because summary(...) returns an atomic vector, hence the $ notation cannot be used to (de-)reference the internals of the returned data structure, like it can from say hist(x,y)$counts. The array syntax [] should be used, hence: summary(dec20)['Mean'] to get hold of the mean/mu. You might find it enlightening to issue the following commands: mode(summary(dec20)), mode(hist(x)) to get a feel for the different data structures returned from various functions. You might also like to try attributes(...).

Added:
>
>

Histograms

> hist(dec20) //plots the following

dec

But can be improved with the following:

> hist(dec20, 23, right=F)

dec

Notice that the second plot has more groups/classes, and the frequency for 0 is now plotted separately, right=F.


Poisson process

earth.txt contains intervals in days between major earthquakes.

> e<-scan("earth.txt")

The following finds the probability that exactly 5 earthquakes happen in a typical 4 year period:

> dpois(5,((365*3)+366)/summary(e)['Mean'])

And the following finds the probability that fewer than 3 earthquakes happen in a 4 year period:

> ppois(2,((365*3)+366)/summary(e)['Mean'])

The waiting times between successive earthquakes can be described through an exponential distribution, with parameter lambda (rate, number of earthquakes a day).

The probability that the waiting time will be less than the 30 days:

> pexp(30, 62/27107)

and will exceed two years:

> 1-pexp(2*365, 62/27107)


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
Line: 77 to 121

META FILEATTACHMENT coal1.jpg attr="h" comment="coal" date="1236616595" path="coal1.jpg" size="15856" user="ChrisJones" version="1.1"
META FILEATTACHMENT dec20.txt attr="h" comment="dec" date="1236621090" path="dec20.txt" size="396" user="ChrisJones" version="1.1"
META FILEATTACHMENT dec20.jpg attr="h" comment="dec" date="1236621306" path="dec20.jpg" size="10971" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT dec20hist attr="h" comment="sdw" date="1236628989" path="dec20hist" size="13918" user="ChrisJones" version="1.1"
META FILEATTACHMENT dec20hist.jpg attr="h" comment="dec" date="1236629031" path="dec20hist.jpg" size="13918" user="ChrisJones" version="1.1"
META FILEATTACHMENT dec20hist1.jpg attr="h" comment="dec" date="1236629116" path="dec20hist1.jpg" size="15759" user="ChrisJones" version="1.1"
META FILEATTACHMENT earth.txt attr="h" comment="earth" date="1236633123" path="earth.txt" size="299" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.4 - 09 Mar 2009 - ChrisJones)

META TOPICPARENT TechStuff

Multiple plots

Line: 55 to 55

coal

Added:
>
>

Poisson distribution

To plot a poisson probability function Poission(mu) from the data in dec20.txt, issue the following commands:

> dec20<-scan("dec20.txt") //load in data

> barplot(dpois(1:max(dec20),summary(dec20)['Mean'])) //prints the following:

dec

Note: the syntax to get the mean from the summary is not straight forward. Because summary(...) returns an atomic vector, hence the $ notation cannot be used to (de-)reference the internals of the returned data structure, like it can from say hist(x,y)$counts. The array syntax [] should be used, hence: summary(dec20)['Mean'] to get hold of the mean/mu. You might find it enlightening to issue the following commands: mode(summary(dec20)), mode(hist(x)) to get a feel for the different data structures returned from various functions. You might also like to try attributes(...).


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
META FILEATTACHMENT coal.txt attr="h" comment="Coal" date="1236615897" path="coal.txt" size="855" user="ChrisJones" version="1.1"
META FILEATTACHMENT coal.jpg attr="h" comment="Coal" date="1236616511" path="coal.jpg" size="15147" user="ChrisJones" version="1.1"
META FILEATTACHMENT coal1.jpg attr="h" comment="coal" date="1236616595" path="coal1.jpg" size="15856" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT dec20.txt attr="h" comment="dec" date="1236621090" path="dec20.txt" size="396" user="ChrisJones" version="1.1"
META FILEATTACHMENT dec20.jpg attr="h" comment="dec" date="1236621306" path="dec20.jpg" size="10971" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.3 - 09 Mar 2009 - ChrisJones)

META TOPICPARENT TechStuff
Added:
>
>

Multiple plots


To print 4 histograms (with 100 classes) of random normal distribution samples on the same plot, where n=351, mean=160, sd=6.03:


Changed:
<
<
par(mfrow=c(2,2)) for (i in 1:4){
>
>
> par(mfrow=c(2,2))
> for (i in 1:4){

hist(rnorm(351, 160, 6.03), 100) }
Line: 15 to 17

To produce a jpeg:


Changed:
<
<
jpeg("~/multiplot.jpg") par(mfrow=c(3,3)) for (i in 1:9){
>
>
> jpeg("~/multiplot.jpg")
> par(mfrow=c(3,3))
> for (i in 1:9){

hist(rnorm(351, 160, 6.03), 100) }
Changed:
<
<
dev.off()
>
>
> dev.off()

Which produces the following image:

Line: 29 to 31

Note: The dimensions of the image are 480x480 pixels, hence the titles being truncated. The following is more appropriate:

Changed:
<
<
jpeg("~/multiplot.jpg", 600, 600)
>
>
> jpeg("~/multiplot.jpg", 600, 600)

an

Added:
>
>

Cumalative plot

Given the data in the coal.txt, the following command can be used to produce a cumulative plot:

> x<-scan("coal.txt") //read data

> time<-cumsum(x) //cumulative sum of the data

> number<-1:length(x) //sequence from 1:190

> plot(time, number) //giving:

Coal

> scatter.smooth(x=time, y=number) //or a scatter with smoothed curve overlay:

coal


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT coal.txt attr="h" comment="Coal" date="1236615897" path="coal.txt" size="855" user="ChrisJones" version="1.1"
META FILEATTACHMENT coal.jpg attr="h" comment="Coal" date="1236616511" path="coal.jpg" size="15147" user="ChrisJones" version="1.1"
META FILEATTACHMENT coal1.jpg attr="h" comment="coal" date="1236616595" path="coal1.jpg" size="15856" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.2 - 06 Mar 2009 - ChrisJones)

META TOPICPARENT TechStuff
To print 4 histograms (with 100 classes) of random normal distribution samples on the same plot, where n=351, mean=160, sd=6.03:
Line: 12 to 12

Gives the following plot (yours may differ because of randomness): Multi
Changed:
<
<
If you want to produce a jpeg do the following:
>
>
To produce a jpeg:

jpeg("~/multiplot.jpg")
Line: 20 to 20

for (i in 1:9){ hist(rnorm(351, 160, 6.03), 100) }
Added:
>
>
dev.off()

Which produces the following image:

Multi

Added:
>
>
Note: The dimensions of the image are 480x480 pixels, hence the titles being truncated. The following is more appropriate:

jpeg("~/multiplot.jpg", 600, 600)

an


META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
Added:
>
>
META FILEATTACHMENT multiplot4.jpg attr="h" comment="an" date="1236375152" path="multiplot4.jpg" size="66058" user="ChrisJones" version="1.1"
 <<O>>  Difference Topic R (r1.1 - 06 Mar 2009 - ChrisJones)
Line: 1 to 1
Added:
>
>
META TOPICPARENT TechStuff
To print 4 histograms (with 100 classes) of random normal distribution samples on the same plot, where n=351, mean=160, sd=6.03:

par(mfrow=c(2,2)) 
for (i in 1:4){
   hist(rnorm(351, 160, 6.03), 100)
}

Gives the following plot (yours may differ because of randomness): Multi

If you want to produce a jpeg do the following:

jpeg("~/multiplot.jpg")
par(mfrow=c(3,3))
for (i in 1:9){
 hist(rnorm(351, 160, 6.03), 100)
}

Which produces the following image:

Multi

META FILEATTACHMENT multiplot.jpg attr="h" comment="Multi" date="1236370301" path="multiplot.jpg" size="34299" user="ChrisJones" version="1.1"
META FILEATTACHMENT multiplot1.jpg attr="h" comment="another" date="1236370740" path="multiplot1.jpg" size="49539" user="ChrisJones" version="1.1"
Revision r1.1 - 06 Mar 2009 - 19:59 - ChrisJones
Revision r1.13 - 29 Jan 2010 - 10:00 - ChrisJones