Section 16 R Markdown: Example
Here is a simple example of an R Markdown document that reads the mtcars data, calculates summary statistics, plots several variables, and fits a statistical model to investigate how fuel use relates to car weight and transmission type.
The R Markdown code is provided below. Copy and paste it into a text file and save the file with the extension .Rmd.
Open the file in RStudio and use the Knit button to compile it to HTML. Note that the data file must be in the same folder as your Rmd file.
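The data file itself is not shown in this section. As a minimal sketch, and assuming that cars.csv is simply the built-in mtcars data set written out with the car names in the first column (an assumption inferred from the column indices 2:8 and 2:12 used in the example file), it could be created like this:

```r
# Sketch: create cars.csv from the built-in mtcars data set.
# Assumption: the example file expects the car names in column 1,
# followed by the 11 mtcars variables (mpg, cyl, ..., carb).
csv_file <- "cars.csv"   # should sit in the same folder as the .Rmd file

# row.names = TRUE writes the car names as the first (unnamed) column
write.csv(mtcars, file = csv_file, row.names = TRUE)

# Read the file back the same way the report does and inspect the layout
DF <- read.csv(file = csv_file)
str(DF[, 1:3])   # car names (read in as column X), then mpg and cyl
dim(DF)          # number of rows and columns
```

Run this once from the folder that holds your Rmd file, so that the `read.csv(file='cars.csv')` call in the report can find the data.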
You can also download the R Markdown file report.Rmd.
The final output as an HTML file is here.
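The same HTML output can also be produced from the R console rather than with the Knit button. The following is a minimal sketch, assuming that the file is saved as report.Rmd in the current working directory and that the rmarkdown package is installed:

```r
# Sketch: knit the report from the R console instead of the Knit button.
# Assumes report.Rmd (and its data file) are in the current working
# directory and that the rmarkdown package is installed.
input <- "report.Rmd"

if (file.exists(input) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Render only the HTML format declared in the YAML header
  out <- rmarkdown::render(input, output_format = "html_document")
  message("Report written to: ", out)
} else {
  message("Skipping: ", input, " or the rmarkdown package was not found.")
}
```

Passing `output_format = "all"` instead would render every format listed under `output:` in the YAML header, here both the Word and the HTML versions.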
16.1 R Markdown Example file
---
title: "Relationship between fuel use with weight of car and transmission type"
author: "My Name"
date: '`r format(Sys.Date(), "%d %B %Y")`'
output:
  word_document:
    toc: yes
  html_document:
    number_sections: yes
    theme: cerulean
    toc: yes
---
```{r label='input', eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
# Library
library(knitr)
# Read data
DF <- read.csv(file='cars.csv')
DF$am <- as.factor(DF$am)
```
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
# My function
fnSummary <- function(x, ...){
  n1 <- length(x)
  n2 <- sum(!is.na(x))
  x.am <- sum(x, ...) / n2
  x.gm <- exp(sum(log(x[x > 0]), ...) / n2)
  x.hm <- 1 / (sum(1/x, ...)/n2)
  x.var <- sum((x - x.am)^2, ...) / (n2-1)
  x.sd <- sqrt(x.var)
  x.min <- min(x, ...)
  x.max <- max(x, ...)
  x.rng <- x.max - x.min
  x.q <- quantile(x, probs=c(0.25,0.50,0.75), ...)
  x.iqr <- unname(x.q[3] - x.q[1])
  x.cv <- x.sd / x.am
  # Summary as a numeric vector
  Summary <- c(N=n1, N_excl_NA=n2,
               Min=x.min, Max=x.max,
               AM=x.am, GM=x.gm, HM=x.hm,
               Q1=unname(x.q[1]),
               Q2=unname(x.q[2]),
               Q3=unname(x.q[3]),
               Range=x.rng, IQR=x.iqr,
               Var=x.var, SD=x.sd, CV=x.cv)
  return(Summary)
}
```
<br>
<br>
<br>
# Description of the data
The data set is known as _mtcars_ in the R environment.
The data were extracted from the 1974 Motor Trend US magazine.
It contains performance data for `r nrow(DF)` automobiles (1973-74 models).
It also includes fuel consumption and `r ncol(DF) - 2` other aspects of these models.
<br>
# Summary statistics
<br>
## Continuous variables
Summary statistics of continuous variables are presented in the following table.
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
sDF <- sapply(DF[,c(2:8)], FUN = fnSummary, na.rm=TRUE)
sDF <- as.data.frame(sDF)
kable(x=sDF, format='markdown', digits=2,
      row.names=TRUE, col.names=colnames(sDF))
```
<br>
## Variables _mpg_ and _wt_ by _am_
Summary statistics of _mpg_ and _wt_ by _am_ are presented in the following table.
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
sDF <- aggregate(cbind(mpg,wt) ~ am, data=DF, FUN=mean, na.rm=TRUE)
```
am | mpg | wt
----- | -----------------------| -----------------------
0 | `r round(sDF$mpg[1],2)`| `r round(sDF$wt[1],2)`
1 | `r round(sDF$mpg[2],2)`| `r round(sDF$wt[2],2)`
<br>
# Summary plots
## Pairwise plots
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE, fig.width=18, fig.height=14}
pairs(DF[,c(2:12)])
```
## Scatter plot of _mpg_ and _wt_
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
plot(mpg ~ wt, data=DF)
```
## Boxplot of _mpg_ by _am_
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
plot(mpg ~ am, data=DF)
```
<br>
# Linear Regression Model
* We modelled fuel efficiency (_mpg_) to investigate the effect of:
    + weight of the car (_wt_) and
    + transmission type (_am_)
* We fitted a linear regression model with _wt_ and _am_ as predictors.
* The model also included the two-way interaction term between _wt_ and _am_.
<br>
## Model in R
```{r eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, comment=NA}
fm.lm <- lm(mpg ~ wt + am + wt:am, data=DF)
```
<br>
## R Outputs
```{r eval=TRUE, echo=TRUE, warning=FALSE, message=FALSE, comment=NA}
summary(fm.lm)
```
<br>
## Summary
Table: Estimates and standard errors of fitted linear regression model
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
sDF <- summary(fm.lm)$coefficients
# Keep the row names so that each estimate is labelled with its coefficient
kable(x=sDF, format='markdown', digits=6,
      row.names=TRUE, col.names=colnames(sDF))
```
<br>
## Residual Plot
Model assumptions are checked by different plots of the residuals.
```{r eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
plot(fm.lm)
```
<br>
## Conclusion
We observed that the two-way interaction effect of _wt_ and _am_ was significant.
<br>
<br>