Section 16 R Markdown: Example

Here is a simple example of an R Markdown document that reads the mtcars data, calculates summary statistics, plots several variables, and fits a statistical model to investigate how fuel use relates to car weight and transmission type.

The R Markdown code is provided below. Copy and paste it into a text file and save the file with the extension .Rmd.

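Open the file in RStudio and use the Knit button to compile it to HTML. Because word_document is listed first in the YAML header, choose Knit to HTML from the Knit button's drop-down menu; clicking the button directly renders the first format listed. Note that the data file cars.csv must be in the same folder as your Rmd file.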
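The report reads a file named cars.csv. A minimal sketch for creating it, assuming the file is simply an export of R's built-in mtcars data with the car names written as the first column (the column positions used in the report, such as DF[,c(2:12)], are consistent with that layout, but it remains an assumption):

```r
# Assumption: cars.csv is an export of the built-in mtcars data,
# with the row names (car names) written as the first column.
csv_path <- "cars.csv"   # save next to the .Rmd file

write.csv(mtcars, file = csv_path, row.names = TRUE)

# Read the file back the same way the report does and inspect its layout
check <- read.csv(file = csv_path)
dim(check)          # number of rows and columns
names(check)[1:3]   # the first column holds the car names
```

With this layout, columns 2 to 12 of the data frame are the eleven mtcars variables, starting with mpg.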

You can also download the R Markdown file report.Rmd.

The final output is also available as an HTML file.
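The Knit button calls the render function of the rmarkdown package, so the same file can also be compiled from the R console. A minimal sketch, assuming the file was saved as report.Rmd in the working directory next to cars.csv, and that rmarkdown and pandoc are installed (the guard below skips the call otherwise):

```r
rmd_file <- "report.Rmd"   # assumed file name
out_file <- NULL

# Only attempt to render when everything the report needs is available
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available() &&
  file.exists(rmd_file) &&
  file.exists("cars.csv")

if (can_render) {
  # output_format selects one of the formats declared in the YAML header
  out_file <- rmarkdown::render(rmd_file, output_format = "html_document")
}

out_file   # path of the HTML file, or NULL if nothing was rendered
```

Passing output_format = "word_document" instead would produce the Word version declared in the same header.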

16.1 R Markdown Example file


 ---
 title: "Relationship of fuel use with car weight and transmission type"
 author: "My Name"
 date: '`r format(Sys.Date(), "%d %B %Y")`'
 output:
   word_document:
     toc: yes
   html_document:
     number_sections: yes
     theme: cerulean
     toc: yes
 ---
 
 ```{r label='input', eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 # Library
 
 library(knitr)
 
 
 # Read data
 
 DF <- read.csv(file='cars.csv')
 
 DF$am <- as.factor(DF$am)
 
 ```
 
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 # Summary statistics of a numeric vector x.
 # Extra arguments (e.g. na.rm=TRUE) are passed on through ...
 
 fnSummary <- function(x, ...){
 
     n1 <- length(x)
     n2 <- sum(!is.na(x))
   
     x.am <- sum(x, ...) / n2
   
     # Geometric mean of the positive, non-missing values only
     x.pos <- x[!is.na(x) & x > 0]
     x.gm <- exp(mean(log(x.pos)))
   
     x.hm <- 1 / (sum(1/x, ...)/n2)
   
     x.var <- sum((x - x.am)^2, ...) / (n2-1)
   
     x.sd <- sqrt(x.var)
   
   
     x.min <- min(x, ...)
     x.max <- max(x, ...)
     x.rng <- x.max - x.min
   
     x.q <- quantile(x, probs=c(0.25,0.50,0.75), ...)
     x.iqr <- unname(x.q[3] - x.q[1])
   
     x.cv <- x.sd / x.am
   
     # Summary as a numeric vector
     Summary <- c(N=n1, N_excl_NA=n2,
                     Min=x.min, Max=x.max,
                     AM=x.am, GM=x.gm, HM=x.hm,
                     Q1=unname(x.q[1]), 
                     Q2=unname(x.q[2]), 
                     Q3=unname(x.q[3]),
                     Range=x.rng, IQR=x.iqr,
                     Var=x.var, SD=x.sd, CV=x.cv)
   
     return(Summary)
 
 }
 
 ```
 
 <br>
 
 <br>
 
 <br>
 
 # Description of the data
 
 
 The data is referred to as _mtcars_ in the R environment. 
 
 The data was extracted from the 1974 Motor Trend US magazine.
 
 It contains performance data for `r nrow(DF)` automobiles (1973-74 models).
 
 It includes fuel consumption and `r ncol(DF) - 2` other aspects of design and performance of these models.
 
 
 <br>
 
 
 # Summary statistics
 
 <br>
 
 ## Continuous variables
 
 
 Summary statistics of continuous variables are presented in the following table.
 
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 sDF <- sapply(DF[,c(2:8)], FUN = fnSummary, na.rm=TRUE)
 sDF <- as.data.frame(sDF)
 
 kable(x=sDF, format='markdown', digits=2,
       row.names=TRUE, col.names=colnames(sDF))
 
 
 ```
 
 <br>
 
 
 ## Variables _mpg_ and _wt_ by _am_
 
 Means of _mpg_ and _wt_ by transmission type (_am_) are presented in the following table.
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 sDF <- aggregate(cbind(mpg,wt) ~ am, data=DF, FUN=mean, na.rm=TRUE)
 
 ```
 
 am    | mpg                    | wt      
 ----- | -----------------------| -----------------------
 0     | `r round(sDF$mpg[1],2)`| `r round(sDF$wt[1],2)`
 1     | `r round(sDF$mpg[2],2)`| `r round(sDF$wt[2],2)`
 
 
 
 <br>
 
 # Summary plots
 
 ## Pairwise plots
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE, fig.width=18, fig.height=14}
 
 pairs(DF[,c(2:12)])
 
 ```
 
 ## Scatter plot of _mpg_ and _wt_
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 plot(mpg ~ wt, data=DF)
 
 ```
 
 
 ## Boxplot of _mpg_ by _am_
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 plot(mpg ~ am, data=DF)
 
 ```
 
 <br>
 
 # Linear Regression Model
 
 * We modelled the fuel efficiency (_mpg_) to investigate the effect of:
 
       + weight of the car (_wt_) and
       
       + transmission type (_am_) 
 
 * We fitted a linear regression model with _wt_ and _am_ as predictors. 
 
 * The model also included the two-way interaction between _wt_ and _am_.
 
 
 <br>
 
 ## Model in R
 
 ```{r eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, comment=NA}
 
 fm.lm <- lm(mpg ~ wt + am + wt:am, data=DF)
 
 ```
 
 <br>
 
 ## R Outputs
 
 ```{r eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, comment=NA}
 
 summary(fm.lm)
 
 ```
 
 
 <br>
 
 ## Summary
 
 Table: Estimates and standard errors of fitted linear regression model
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 sDF <- summary(fm.lm)$coefficients
 
 # Keep the row names so that each estimate is labelled with its model term
 kable(x=sDF, format='markdown', digits=6,
       row.names = TRUE, col.names=colnames(sDF))
 
 
 ```
 
 <br>
 
 ## Residual Plot
 
 Model assumptions were checked using diagnostic plots of the residuals.
 
 ```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
 
 plot(fm.lm)
 
 ```
 
 <br>
 
 ## Conclusion
 
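 We observed that the two-way interaction between _wt_ and _am_ was statistically significant, i.e. the effect of weight on fuel efficiency differed between the two transmission types.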
 
 
 <br>
 
 <br>
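
The model in the report, mpg ~ wt + am + wt:am, allows the effect of weight on fuel efficiency to differ between the two transmission types. The sketch below illustrates this, using the built-in mtcars data as an assumed stand-in for cars.csv: the slope for am = 0 is the wt coefficient, the slope for am = 1 adds the interaction coefficient, and both should agree with simple regressions fitted separately to each group.

```r
# Assumption: the built-in mtcars data stands in for cars.csv
DF <- mtcars
DF$am <- as.factor(DF$am)

fm.lm <- lm(mpg ~ wt + am + wt:am, data = DF)
b <- coef(fm.lm)

# Slope of mpg on wt within each transmission type
slope.am0 <- unname(b["wt"])
slope.am1 <- unname(b["wt"] + b["wt:am1"])

# The same slopes from two separate simple regressions
sep.am0 <- unname(coef(lm(mpg ~ wt, data = DF, subset = am == "0"))["wt"])
sep.am1 <- unname(coef(lm(mpg ~ wt, data = DF, subset = am == "1"))["wt"])

c(am0 = slope.am0, am1 = slope.am1)
```

A significant interaction term therefore indicates that the two fitted lines are not parallel.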