Section 31 Two or More Categorical Variables

31.1 Tabulation

To examine the relationship between two categorical variables, it is useful to produce a table of counts: the number of observations that fall into each combination of groups. Adding margins (row and column totals) often makes the table more informative.

weather <- read.csv('weather.csv')

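# Cross-tabulate wind direction (rows) against rain (columns)
t1 <- with(data = weather, table(winddir, rain))

# Append the row and column totals
t2 <- addmargins(t1)
t2

# Equivalent one-liner:
# with(data = weather, addmargins(table(winddir, rain)))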

       rain
winddir Dry Wet Sum
  Calm   13   1  14
  East   14   6  20
  North  18   5  23
  South  15   2  17
  West   77  17  94
  Sum   137  31 168
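
Because weather.csv is not reproduced here, the counts in the table above can be entered by hand to experiment with the same functions. The following is a minimal sketch in base R; the object names counts and with_totals are introduced only for illustration and are not part of the original analysis.

```r
# Counts copied from the table above
# (rows: wind direction, columns: rain)
counts <- as.table(matrix(
  c(13,  1,
    14,  6,
    18,  5,
    15,  2,
    77, 17),
  ncol = 2, byrow = TRUE,
  dimnames = list(winddir = c("Calm", "East", "North", "South", "West"),
                  rain    = c("Dry", "Wet"))
))

# addmargins() appends a "Sum" row and a "Sum" column
with_totals <- addmargins(counts)
with_totals

# margin = 1 sums over the rows, adding only a "Sum" row of column totals;
# margin = 2 sums over the columns, adding only a "Sum" column of row totals
addmargins(counts, margin = 1)
addmargins(counts, margin = 2)
```

Entering the counts as a matrix and converting with as.table() gives an object that behaves like the result of table(), so addmargins() and prop.table() can be applied to it in the same way.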

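The table above is easier to use for comparisons when the counts are expressed as percentages. The function prop.table() converts counts to proportions, and its margin argument sets the denominator. With no margin, each count is divided by the grand total (168). With margin = 1, each count is divided by its row total, which gives percentages of the number of times the wind blew from each direction. With margin = 2, each count is divided by its column total, which gives percentages of all dry and of all wet observations. The three tables below show these cases in that order.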

# Percentages of the grand total (all cells sum to 100)
round(100 * prop.table(x = t1), digits = 2)

       rain
winddir   Dry   Wet
  Calm   7.74  0.60
  East   8.33  3.57
  North 10.71  2.98
  South  8.93  1.19
  West  45.83 10.12

# Row percentages: each wind direction sums to 100
round(100 * prop.table(x = t1, margin = 1), digits = 2)

       rain
winddir   Dry   Wet
  Calm  92.86  7.14
  East  70.00 30.00
  North 78.26 21.74
  South 88.24 11.76
  West  81.91 18.09

# Column percentages: Dry and Wet each sum to 100
round(100 * prop.table(x = t1, margin = 2), digits = 2)

       rain
winddir   Dry   Wet
  Calm   9.49  3.23
  East  10.22 19.35
  North 13.14 16.13
  South 10.95  6.45
  West  56.20 54.84
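
To make the role of the margin argument concrete, the row and column percentages can also be computed by hand. The following is a sketch under the assumption that the counts are re-entered from the first table rather than read from weather.csv; the names counts, row_pct, col_pct, row_by_hand and col_by_hand are hypothetical.

```r
# Counts re-entered from the first table in this section
counts <- as.table(matrix(
  c(13,  1,
    14,  6,
    18,  5,
    15,  2,
    77, 17),
  ncol = 2, byrow = TRUE,
  dimnames = list(winddir = c("Calm", "East", "North", "South", "West"),
                  rain    = c("Dry", "Wet"))
))

# Row and column percentages via prop.table()
row_pct <- round(100 * prop.table(counts, margin = 1), digits = 2)
col_pct <- round(100 * prop.table(counts, margin = 2), digits = 2)

# The same quantities by hand: divide each count by its row total ...
row_by_hand <- round(100 * counts / rowSums(counts), digits = 2)
# ... or by its column total (sweep() divides column j by colSums(counts)[j])
col_by_hand <- round(100 * sweep(counts, 2, colSums(counts), "/"), digits = 2)

# Both comparisons should be TRUE
all.equal(as.vector(row_pct), as.vector(row_by_hand))
all.equal(as.vector(col_pct), as.vector(col_by_hand))

# Before rounding, each row (margin = 1) or column (margin = 2)
# of proportions sums to 1
rowSums(prop.table(counts, margin = 1))
colSums(prop.table(counts, margin = 2))
```

Row percentages answer the question "given the wind direction, how often was it wet?", whereas column percentages answer "given that it was wet, where was the wind blowing from?". Which margin to use depends on which of these comparisons is of interest.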