Section 31 Two or More Categorical Variables
31.1 Tabulation
When looking at the relationship between two categorical variables, it is useful to produce a table (a contingency table) giving the number of units (data values, samples) in each combination of categories. Adding margins, that is, row and column totals, often makes the table more informative.
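# Read the data and cross-tabulate wind direction against rain
weather <- read.csv('weather.csv')
t1 <- with(data=weather, table(winddir, rain))
# Append the row and column totals (labelled "Sum")
t2 <- addmargins(t1)
t2
# One-line equivalent: with(data=weather, addmargins(table(winddir, rain)))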
       rain
winddir Dry Wet Sum
  Calm   13   1  14
  East   14   6  20
  North  18   5  23
  South  15   2  17
  West   77  17  94
  Sum   137  31 168
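If weather.csv is not at hand, the minimal sketch below rebuilds the same counts directly from the table printed above (assuming those printed counts are the full data), so the example can be run on its own. It also shows that the margins added by addmargins() are simply the row totals, column totals and grand total, which rowSums(), colSums() and sum() compute directly. The object name counts is introduced here for illustration only.

```r
# Hypothetical stand-in for with(weather, table(winddir, rain)):
# the counts are copied from the printed table above.
counts <- matrix(c(13, 14, 18, 15, 77,   # Dry
                    1,  6,  5,  2, 17),  # Wet
                 ncol = 2,
                 dimnames = list(winddir = c("Calm", "East", "North", "South", "West"),
                                 rain = c("Dry", "Wet")))
t1 <- as.table(counts)

# addmargins() appends a "Sum" row and a "Sum" column
t2 <- addmargins(t1)
t2

# The margins are just the totals of the counts
rowSums(t1)   # number of times the wind blew from each direction
colSums(t1)   # totals of dry and wet observations
sum(t1)       # grand total
```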
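The counts are easier to compare when expressed as percentages, but the result depends on which total is used as the denominator. The first table below gives each count as a percentage of all 168 observations. The second expresses each count as a percentage of the number of times the wind blew from that direction (row percentages, so each row sums to 100). The third expresses each count as a percentage of the dry or wet total (column percentages, so each column sums to 100).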
       rain
winddir   Dry   Wet
  Calm   7.74  0.60
  East   8.33  3.57
  North 10.71  2.98
  South  8.93  1.19
  West  45.83 10.12

       rain
winddir   Dry   Wet
  Calm  92.86  7.14
  East  70.00 30.00
  North 78.26 21.74
  South 88.24 11.76
  West  81.91 18.09

       rain
winddir   Dry   Wet
  Calm   9.49  3.23
  East  10.22 19.35
  North 13.14 16.13
  South 10.95  6.45
  West  56.20 54.84
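Percentage tables like the three above can be obtained in base R with prop.table(), whose margin argument selects the denominator: the default (NULL) divides by the grand total, margin = 1 by the row totals and margin = 2 by the column totals. The sketch below is one way to do it, not necessarily how the tables above were produced. It rebuilds t1 from the printed counts so it runs without weather.csv, and the names p_all, p_row and p_col are hypothetical.

```r
# Stand-in for t1: counts copied from the wind direction by rain table
t1 <- as.table(matrix(c(13, 14, 18, 15, 77, 1, 6, 5, 2, 17), ncol = 2,
  dimnames = list(winddir = c("Calm", "East", "North", "South", "West"),
                  rain = c("Dry", "Wet"))))

# Percentages of the grand total: all ten cells sum to 100
p_all <- round(100 * prop.table(t1), 2)

# Row percentages: each wind direction's row sums to 100
p_row <- round(100 * prop.table(t1, margin = 1), 2)

# Column percentages: the Dry and Wet columns each sum to 100
p_col <- round(100 * prop.table(t1, margin = 2), 2)

p_all
p_row
p_col
```

Rounding to two decimals is applied only for display; keep the unrounded prop.table() result if the percentages feed into further calculations.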