%TOC%

---+ Multiple plots

To print 4 histograms (each with 100 classes) of random normal samples on the same plot, where n=351, mean=160, sd=6.03:

<verbatim>
> par(mfrow=c(2,2))
> for (i in 1:4){ hist(rnorm(351, 160, 6.03), 100) }
</verbatim>

This gives the following plot (yours may differ because of randomness):

<img src="%ATTACHURLPATH%/multiplot.jpg" alt="Multi" width="480" height="480" />

To produce a jpeg:

<verbatim>
> jpeg("~/multiplot.jpg")
> par(mfrow=c(3,3))
> for (i in 1:9){ hist(rnorm(351, 160, 6.03), 100) }
> dev.off()
</verbatim>

which produces the following image:

<img src="%ATTACHURLPATH%/multiplot1.jpg" alt="Multi" width="480" height="480" />

*Note:* the image is 480x480 pixels, hence the truncated titles. The following is more appropriate:

=> jpeg("~/multiplot.jpg", 600, 600)=

<img src="%ATTACHURLPATH%/multiplot4.jpg" alt="an" width="600" height="600" />

---

---+ Cumulative plot

[[%ATTACHURL%/coal.txt][coal.txt]] contains the intervals in days between successive mine explosions. The following commands produce a cumulative plot:

=> x<-scan("coal.txt") //read data=

=> time<-cumsum(x) //cumulative 'running' sum of the data=

=> number<-1:length(x) //sequence from 1 to 190=

=> plot(time, number) //giving:=

<img src="%ATTACHURLPATH%/coal.jpg" alt="Coal" width="480" height="480" />

=> scatter.smooth(x=time, y=number) //or a scatter plot with a smoothed curve overlaid:=

<img src="%ATTACHURLPATH%/coal1.jpg" alt="coal" width="480" height="480" />

---

---+ Poisson distribution

To plot a Poisson probability function Poisson(mu) from the data in [[%ATTACHURL%/dec20.txt][dec20.txt]], issue the following commands:

=> dec20<-scan("dec20.txt") //load in data=

=> barplot(dpois(1:max(dec20),summary(dec20)['Mean'])) //prints the following:=

<img src="%ATTACHURLPATH%/dec20.jpg" alt="dec" width="480" height="480" />

*Note:* the syntax to get the mean from the summary is not straightforward. =summary(...)= returns an atomic vector, so the =$= notation cannot be used to reference the internals of the returned data structure, as it can with, say, =hist(x,y)$counts=. The array syntax =[]= must be used instead, hence =summary(dec20)['Mean']= to get hold of the mean/mu. You might find it enlightening to issue the commands =mode(summary(dec20))= and =mode(hist(x))= to get a feel for the different data structures returned by various functions. You might also like to try =attributes(...)=.

---

---+ Histograms

=> hist(dec20) //plots the following=

<img src="%ATTACHURLPATH%/dec20hist.jpg" alt="dec" width="480" height="480" />

but it can be improved with the following:

=> hist(dec20, 23, right=F)=

<img src="%ATTACHURLPATH%/dec20hist1.jpg" alt="dec" width="480" height="480" />

Notice that the second plot has more groups/classes, and that with =right=F= the intervals are right-open, so the frequency for 0 is now plotted separately.

---

---+ Poisson process

[[%ATTACHURL%/earth.txt][earth.txt]] contains intervals in days between major earthquakes.

=> e<-scan("earth.txt")=

The following finds the probability that exactly 5 earthquakes happen in a typical 4-year period:

=> dpois(5,((365*3)+366)/summary(e)['Mean'])=

And the following finds the probability that fewer than 3 earthquakes happen in a 4-year period:

=> ppois(2,((365*3)+366)/summary(e)['Mean'])=

The waiting times between successive earthquakes can be described by an exponential distribution with parameter lambda (the rate, i.e. the number of earthquakes per day).
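The rate can be estimated straight from the data, since one earthquake occurs per mean interval. A minimal sketch (assuming =e= is the vector loaded above; the =62/27107= used below appears to be exactly this ratio, 62 intervals totalling 27107 days, which matches the 437-day mean quoted further down):

<verbatim>
> lambda <- length(e)/sum(e)   # events per day: number of intervals / total days, i.e. 1/mean(e)
> lambda                       # for this data set, 62/27107
</verbatim>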
The probability that the waiting time will be less than 30 days:

=> pexp(30, 62/27107)=

and that it will exceed two years:

=> 1-pexp(2*365, 62/27107)=

---

---+ Quantiles for an exponential distribution

The mean of the interval between serious earthquakes is 437 days (see above for the data). The median can be calculated using P(X <= x) = 0.5:

=> qexp(.5, 1/437)=

which gives approximately 303 days; 1/437 is the rate of earthquakes (1/mu). Equivalently, the median of an exponential distribution is ln(2)/lambda = 0.693 * 437, which is about 303. Any quantile can be calculated in this manner.

---

---+ Cycling accident data for 2005

See also: [[http://labs.timesonline.co.uk/blog/2009/03/11/uk-cycling-accidents/][Times online]], [[http://innovate.direct.gov.uk/2009/03/10/pedalling-some-raw-data/][Gov Direct]], [[http://po-ru.com/files/pedal-cycle-accident-locations.zip][Repackaged csv]].

Load the following file: [[%ATTACHURL%/2005.csv][2005.csv]]

=> stuff<-scan("2005.csv", skip=1, what=list(year=0,north=0, east=0), sep=',', quiet=T)=

Create a density plot of the co-ordinate data (densities are calculated over a 500x500 grid). =kde2d= lives in the =MASS= package, so load that first:

=> library(MASS)=

=> blah<-kde2d(stuff$north, stuff$east, n=500)=

=blah= is a list containing 3 items: =x= and =y=, representing the grid, and =z=, a 500x500 matrix containing the kernel density for each cell in the grid. Then plot the result:

=> image(blah, col=c(0, rainbow(50)))=

<img src="%ATTACHURLPATH%/accident2005.jpg" alt="accident" width="480" height="480" />

You can see that the above plot is a lot clearer than the following:

=> plot(stuff$north, stuff$east)=

<img src="%ATTACHURLPATH%/latlong2005.jpg" alt="latlong" width="480" height="480" />

---

If you want to add a colour legend to the z-plot above, you'll need to install the =fields= package:

=> install.packages("fields")=

=> library(fields)=

=> image.plot(blah, col=c(0, rainbow(50)))=

<img src="%ATTACHURLPATH%/legend2005.jpg" alt="legend" width="480" height="480" />

---

To add a few place names (see the helper sketch at the end of this section):

<verbatim>
> points(537500, 169500); text(537500, 169500, pos=2, "Beckenham")
> points(534500, 184500); text(534500, 184500, pos=2, "Hackney")
> points(532108, 178763); text(532108, 178763, pos=2, "E&C")
> points(531496, 181359)
> text(531496, 181359, "Shoe Lane (The City)", pos=4)
</verbatim>

---

The following plot shows that the accident hotspot for London in 2005 was approximately in the region of Shoe Lane:

<img src="%ATTACHURLPATH%/accident2005Lengend.jpg" alt="accident" width="480" height="480" />

and Victoria Embankment in 2007:

<img src="%ATTACHURLPATH%/accidentHotspot2007.jpg" alt="blah" width="480" height="480" />

---

TODO: it would be nice to add the UK boundary to the plots.

---

   * [[%ATTACHURL%/2007.csv][2007.csv]]
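Each label above repeats the same =points()=/=text()= pair, so it can be worth wrapping the pair in a small helper. A minimal sketch (the function name =labelPoint= is my own, not from the original):

<verbatim>
> # plot a marker at (x, y) and label it; pos=2 puts the text to the left, pos=4 to the right
> labelPoint <- function(x, y, name, pos=2){ points(x, y); text(x, y, name, pos=pos) }
> labelPoint(537500, 169500, "Beckenham")
> labelPoint(531496, 181359, "Shoe Lane (The City)", pos=4)
</verbatim>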
---

---+ Testing the central limit theorem

I wanted to test that the mean over a large number of samples can be modelled adequately by a normal distribution, and to see whether the theorem held up when the original distribution was heavily skewed. To investigate:

=means<-numeric(10000) //somewhere to store the means=

<verbatim>
for(i in 1:10000){ means[i]<-mean(rbinom(n=10000, prob=.1, size=5)) }
</verbatim>

This generates 10000 random binomial samples, storing each of their means; note the use of prob=.1, producing a heavily skewed distribution.

=h<-hist(means) //plot the distribution=

Now let's overlay the idealised normal curve:

<verbatim>
x<-seq(0.4766, 0.5262, .0001)                        # generate the x values; use range(means)
y<-dnorm(x, mean=mean(means), sd(means))*.005*10000  # scale the y values by the bin width and the number of samples (look at h for the bin width)
lines(x, y)                                          # draw the curve over the histogram
</verbatim>

<img src="%ATTACHURLPATH%/central.jpg" alt="central" width="480" height="480" />

The normal approximation looks pretty good. The probability plot looks like this (note that qnorm(1) is Inf, so the final point is dropped from the plot):

=plot(sort(means), qnorm(seq(1/10000, 1, 1/10000)))=

<img src="%ATTACHURLPATH%/probplot.jpg" alt="prob" width="480" height="480" />

Again, this looks very good.

---+ Data frames and Combinations

Create a list containing two columns, each a count from 1 to 31:

=L<-list(A=c(1:31),B=c(1:31))=

Create a data frame that contains all the combinations of 1 to 31:

=G<-expand.grid(L)=

Add a column, Excel style:

=G$fraction <- G$A/G$B=

Filter by column:

<verbatim>
G[G$fraction <= 1,]
</verbatim>

Count the number of rows of a data frame:

<verbatim>
nrow(G[G$fraction <= 1,])
</verbatim>
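As a worked example of the same pipeline, the filtered frame can be summarised further. A minimal sketch (assuming =G= as built above; the name =H= is my own):

<verbatim>
H <- G[G$fraction <= 1,]        # keep only the fractions A/B with A <= B
nrow(H)                         # 496 rows: 31*32/2 combinations have A <= B
length(unique(H$fraction))      # number of distinct fraction values among them
</verbatim>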