Section 33 Continuous & Categorical Variables

33.1 Tabulation

To capture the relationship between a continuous and a categorical variable, we compute summary statistics of the continuous variable within each level of the categorical variable.

In general, tables of summary statistics provide more detail than plots, but plots reveal trends and patterns more easily than tables.

weather <- read.csv('weather.csv')

with(data=weather, tapply(X = humidity, INDEX = rain, FUN = median))
Dry Wet 
 82  91 
with(data=weather, tapply(X = temp, INDEX = rain, FUN = mean))
     Dry      Wet 
3.901460 3.964516 
with(data=weather, tapply(X = windsp, INDEX = winddir, FUN = summary))
$Calm
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 

$East
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   5.00    8.75   11.50   12.65   15.25   24.00 

$North
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.00    8.50   14.00   12.43   16.00   19.00 

$South
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.000   5.000   7.000   6.176   7.000   9.000 

$West
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.00    9.25   12.00   13.01   16.00   25.00 
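
The tapply() calls above can be checked by hand on a small data set. The sketch below uses a made-up data frame named toy (hypothetical, not the weather.csv data) that borrows the column names humidity and rain. It also shows aggregate(), which is one alternative that returns the same summary as a data frame.

```r
# Hypothetical toy data; the values are invented for illustration
# and are NOT taken from weather.csv.
toy <- data.frame(
  humidity = c(80, 84, 90, 92, 95, 70),
  rain     = c("Dry", "Dry", "Wet", "Wet", "Wet", "Dry")
)

# Median humidity within each level of rain (a named array)
med <- with(toy, tapply(X = humidity, INDEX = rain, FUN = median))
print(med)

# The same summary through the formula interface, returned as a data frame
agg <- aggregate(humidity ~ rain, data = toy, FUN = median)
print(agg)
```

The tapply() result is convenient for quick inspection, while the data frame from aggregate() is easier to merge or pass on to other functions.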

33.2 Plot

Plot the continuous variable so that the levels of the categorical variable can be told apart. Common choices are:

  • Box plot
  • Histogram and Density plots
  • A combination of these plots
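
As a rough illustration of the first two plot types, the sketch below draws a grouped box plot and overlaid density curves with base R graphics. The data frame toy is simulated (hypothetical values, not weather.csv), and the column names humidity and rain are reused only for continuity with the tabulation examples.

```r
set.seed(1)

# Simulated, hypothetical data: 50 "Dry" and 50 "Wet" observations
toy <- data.frame(
  humidity = c(rnorm(50, mean = 80, sd = 5), rnorm(50, mean = 90, sd = 4)),
  rain     = factor(rep(c("Dry", "Wet"), each = 50))
)

# Box plot: one box per level of the categorical variable
bp <- boxplot(humidity ~ rain, data = toy,
              xlab = "rain", ylab = "humidity")

# Density plot: estimate one density per level, then overlay the curves
dens <- lapply(split(toy$humidity, toy$rain), density)
ymax <- max(sapply(dens, function(d) max(d$y)))
plot(dens$Dry, xlim = range(toy$humidity), ylim = c(0, ymax),
     main = "Humidity by rain", xlab = "humidity")
lines(dens$Wet, lty = 2)
legend("topright", legend = names(dens), lty = 1:2)
```

The box plot compares medians and spread at a glance, while the overlaid densities show the shape of each group's distribution.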