Summary: A brief module demonstrating the table, prop.table, and related functions in R to complement the content found in Collaborative Statistics.
The examples in the previous section provided you with contingency tables and asked you to calculate row, column, and cell percentages and totals. In practice, you will most likely start with raw data that must be summarized before a contingency table can be created. This section introduces some of the many methods available in R for calculating frequencies and creating contingency tables.
Here, we will generate some sample data and view the first and last few lines of the dataset. A real dataset would probably contain additional variables, in which case you would have to specify which columns to tabulate when creating your tables.
set.seed(1)
myDF = data.frame(car.phone.use = sample(c(TRUE, FALSE), 755,
                                         replace=TRUE, prob=c(.4, .6)),
                  speed.violation = sample(c(TRUE, FALSE), 755,
                                           replace=TRUE, prob=c(.1, .9)))
# First few cases
head(myDF)
##   car.phone.use speed.violation
## 1         FALSE           FALSE
## 2         FALSE           FALSE
## 3         FALSE           FALSE
## 4          TRUE            TRUE
## 5         FALSE           FALSE
## 6          TRUE           FALSE
# Last few cases
tail(myDF)
##     car.phone.use speed.violation
## 750          TRUE           FALSE
## 751         FALSE           FALSE
## 752          TRUE           FALSE
## 753         FALSE           FALSE
## 754         FALSE           FALSE
## 755         FALSE           FALSE
R has several useful built-in functions for tabulation and contingency tables. In particular, the functions table() and prop.table() are a good starting point. The table() function creates a basic cross table of the specified variables. The prop.table() function takes a table (or matrix) as its input and is usually used to return cell percentages (no second argument), row percentages (1 as the second argument), or column percentages (2 as the second argument). Note that the values are returned as proportions between 0 and 1 (for example, 0.25 rather than 25), so multiply by 100 if you want them on a percentage scale.
# Simple tabulation of the two variables
myTable = table(myDF)
myTable
##              speed.violation
## car.phone.use FALSE TRUE
##         FALSE   411   48
##         TRUE    264   32
# Adding row and column sums
addmargins(myTable)
##              speed.violation
## car.phone.use FALSE TRUE Sum
##         FALSE   411   48 459
##         TRUE    264   32 296
##         Sum     675   80 755
# Cell percentages of total
prop.table(myTable)
##              speed.violation
## car.phone.use   FALSE    TRUE
##         FALSE 0.54437 0.06358
##         TRUE  0.34967 0.04238
# Row percentages
prop.table(myTable, 1)
##              speed.violation
## car.phone.use  FALSE   TRUE
##         FALSE 0.8954 0.1046
##         TRUE  0.8919 0.1081
# Column percentages
prop.table(myTable, 2)
##              speed.violation
## car.phone.use  FALSE   TRUE
##         FALSE 0.6089 0.6000
##         TRUE  0.3911 0.4000
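As noted at the start of this section, a real dataset usually contains more variables than the two being tabulated, so the columns have to be named explicitly. The following is a minimal sketch of three common ways to do that in base R. The extra column `age` is a hypothetical variable added purely for illustration, and the object names `tab1`, `tab2`, `tab3`, and `rowPct` are likewise assumptions, not part of the original example.

```r
# Rebuild the sample data used in this section
set.seed(1)
myDF = data.frame(car.phone.use = sample(c(TRUE, FALSE), 755,
                                         replace=TRUE, prob=c(.4, .6)),
                  speed.violation = sample(c(TRUE, FALSE), 755,
                                           replace=TRUE, prob=c(.1, .9)))

# Hypothetical extra variable (an assumption for illustration only)
myDF$age = sample(18:80, nrow(myDF), replace=TRUE)

# Option 1: subset the data frame to the two columns of interest
tab1 = table(myDF[, c("car.phone.use", "speed.violation")])

# Option 2: pass the two columns to table() directly
tab2 = with(myDF, table(car.phone.use, speed.violation))

# Option 3: use the formula interface of xtabs()
tab3 = xtabs(~ car.phone.use + speed.violation, data = myDF)

# Row proportions converted to percentages, rounded to one decimal place
rowPct = round(100 * prop.table(tab1, 1), 1)
rowPct
```

All three options should produce the same counts; which one to use is largely a matter of taste. The formula interface tends to be convenient when the same data frame is tabulated in several different ways.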
An alternative to creating these tables separately is to use the CrossTable() function from the "gmodels" package. The package needs to be installed only once, using install.packages("gmodels"), but it must be loaded with library(gmodels) once per R session.
# Uncomment the following to install the
# `gmodels` package if not yet installed.
# install.packages('gmodels')
library(gmodels)
CrossTable(myTable)
##
##
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table:  755
##
##
##               | speed.violation
## car.phone.use |     FALSE |      TRUE | Row Total |
## --------------|-----------|-----------|-----------|
##         FALSE |       411 |        48 |       459 |
##               |     0.001 |     0.008 |           |
##               |     0.895 |     0.105 |     0.608 |
##               |     0.609 |     0.600 |           |
##               |     0.544 |     0.064 |           |
## --------------|-----------|-----------|-----------|
##          TRUE |       264 |        32 |       296 |
##               |     0.002 |     0.013 |           |
##               |     0.892 |     0.108 |     0.392 |
##               |     0.391 |     0.400 |           |
##               |     0.350 |     0.042 |           |
## --------------|-----------|-----------|-----------|
##  Column Total |       675 |        80 |       755 |
##               |     0.894 |     0.106 |           |
## --------------|-----------|-----------|-----------|
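The "Chi-square contribution" row in each cell is the only quantity in this output that table() and prop.table() do not provide. As a rough sketch, it can be reproduced in base R from the counts shown above, assuming the usual definition of (observed - expected)^2 / expected, with expected counts computed under independence. The object names `expected` and `contribution` are chosen here for illustration, and the counts are entered by hand rather than regenerated from the sample data.

```r
# Counts copied from the contingency table shown in this section
myTable = matrix(c(411, 264, 48, 32), nrow = 2,
                 dimnames = list(car.phone.use = c("FALSE", "TRUE"),
                                 speed.violation = c("FALSE", "TRUE")))

# Expected counts under independence:
# (row total * column total) / grand total
expected = outer(rowSums(myTable), colSums(myTable)) / sum(myTable)

# Per-cell chi-square contribution: (observed - expected)^2 / expected
contribution = (myTable - expected)^2 / expected
round(contribution, 3)
```

Rounded to three decimal places, these values agree with the second line of each cell in the CrossTable() output (0.001, 0.008, 0.002, and 0.013). Their sum is the Pearson chi-square statistic without continuity correction.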