18 Example: Summary Statistics in R

  • Both R and Python (pandas) provide functions or methods to summarise data
  • In R, a function is applied to the column(s), e.g. mean(DF$SepalLength)
  • In Python pandas, a method is called on the column(s) of the dataframe, e.g. DF['SepalLength'].mean()
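A minimal sketch of the function-style approach in R. It assumes R's built-in iris data frame as a stand-in for iris.csv, so it runs without the file (note that the built-in columns use dots, e.g. Sepal.Length). The column vector is passed as the argument to a summary function, and sapply() applies one function to several columns at once.

```r
# iris is R's built-in copy of the data (assumed stand-in for iris.csv)
# A summary function takes the column vector as its argument
sepal_mean = mean(iris$Sepal.Length)

# sapply() calls the same function on each selected column
col_means = sapply(iris[, c("Sepal.Length", "Petal.Length")], mean)

print(sepal_mean)
print(col_means)
```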

18.1 R

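Set the working directory to the folder that contains iris.csv (for example with setwd()), then read the file into a data frame named DF. The columns in this file are named SepalLength, SepalWidth, PetalLength and PetalWidth, without the dots used in R's built-in iris dataset.

# Read the CSV file into a data frame
DF = read.csv('iris.csv')

cat('Number of rows in iris data:', nrow(DF), '\n')

cat('Number of columns in iris data:', ncol(DF), '\n')
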
cat('First five rows of the data:\n')
head(DF, n = 5)

cat('Last five rows of the data:\n')
tail(DF, n = 5)

# Descriptive statistics for a single column
cat('Minimum value of Sepal Length:', min(DF$SepalLength), '\n')

cat('Maximum value of Sepal Length:', max(DF$SepalLength), '\n')

cat('Mean of Sepal Length:', mean(DF$SepalLength), '\n')

cat('Median of Sepal Length:', median(DF$SepalLength), '\n')

cat('Standard deviation of Sepal Length:', sd(DF$SepalLength), '\n')

cat('Summary Statistics: Sepal Length\n')
summary(DF$SepalLength)

cat('Summary Statistics: All\n')
summary(DF)

# Group-wise statistics with aggregate()
cat('Mean Sepal Length by Species:\n')
aggregate(SepalLength ~ Species, data = DF, FUN = mean, na.rm = TRUE)

cat('Mean Sepal & Petal Length by Species:\n')
aggregate(cbind(SepalLength, PetalLength) ~ Species, data = DF, FUN = mean, na.rm = TRUE)

cat('Summary Statistics of Sepal Length by Species:\n')
aggregate(SepalLength ~ Species, data = DF, FUN = summary, na.rm = TRUE)

# Missing-value counts
cat('Number of missing values of Sepal Length:', sum(is.na(DF$SepalLength)), '\n')

cat('Number of non-missing values of Sepal Length:', sum(!is.na(DF$SepalLength)), '\n')

cat('Counts for different Species:\n')
table(DF$Species)

cat('Calculate a new column with log-transformed SepalLength\n')
# log() is the natural logarithm
DF$log_SepalLength = log(DF$SepalLength)
head(DF)

cat('Sort SepalLength in descending order\n')
head(DF[order(DF$SepalLength, decreasing = TRUE), ])

Output:

Number of rows in iris data: 150
Number of columns in iris data: 5
First five rows of the data:
  SepalLength SepalWidth PetalLength PetalWidth Species
1         5.1        3.5         1.4        0.2  setosa
2         4.9        3.0         1.4        0.2  setosa
3         4.7        3.2         1.3        0.2  setosa
4         4.6        3.1         1.5        0.2  setosa
5         5.0        3.6         1.4        0.2  setosa
Last five rows of the data:
    SepalLength SepalWidth PetalLength PetalWidth   Species
146         6.7        3.0         5.2        2.3 virginica
147         6.3        2.5         5.0        1.9 virginica
148         6.5        3.0         5.2        2.0 virginica
149         6.2        3.4         5.4        2.3 virginica
150         5.9        3.0         5.1        1.8 virginica
Minimum value of Sepal Length: 4.3
Maximum value of Sepal Length: 7.9
Mean of Sepal Length: 5.843333
Median of Sepal Length: 5.8
Standard deviation of Sepal Length: 0.8280661
Summary Statistics: Sepal Length
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.300   5.100   5.800   5.843   6.400   7.900 
Summary Statistics: All
  SepalLength      SepalWidth     PetalLength      PetalWidth   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
   Species         
 Length:150        
 Class :character  
 Mode  :character  
Mean Sepal Length by Species:
     Species SepalLength
1     setosa       5.006
2 versicolor       5.936
3  virginica       6.588
Mean Sepal & Petal Length by Species:
     Species SepalLength PetalLength
1     setosa       5.006       1.462
2 versicolor       5.936       4.260
3  virginica       6.588       5.552
Summary Statistics of Sepal Length by Species:
     Species SepalLength.Min. SepalLength.1st Qu. SepalLength.Median
1     setosa            4.300               4.800              5.000
2 versicolor            4.900               5.600              5.900
3  virginica            4.900               6.225              6.500
  SepalLength.Mean SepalLength.3rd Qu. SepalLength.Max.
1            5.006               5.200            5.800
2            5.936               6.300            7.000
3            6.588               6.900            7.900
Number of missing values of Sepal Length: 0
Number of non-missing values of Sepal Length: 150
Counts for different Species:

    setosa versicolor  virginica 
        50         50         50 
Calculate a new column with log-transformed SepalLength
  SepalLength SepalWidth PetalLength PetalWidth Species log_SepalLength
1         5.1        3.5         1.4        0.2  setosa        1.629241
2         4.9        3.0         1.4        0.2  setosa        1.589235
3         4.7        3.2         1.3        0.2  setosa        1.547563
4         4.6        3.1         1.5        0.2  setosa        1.526056
5         5.0        3.6         1.4        0.2  setosa        1.609438
6         5.4        3.9         1.7        0.4  setosa        1.686399
Sort SepalLength in descending order
    SepalLength SepalWidth PetalLength PetalWidth   Species log_SepalLength
132         7.9        3.8         6.4        2.0 virginica        2.066863
118         7.7        3.8         6.7        2.2 virginica        2.041220
119         7.7        2.6         6.9        2.3 virginica        2.041220
123         7.7        2.8         6.7        2.0 virginica        2.041220
136         7.7        3.0         6.1        2.3 virginica        2.041220
106         7.6        3.0         6.6        2.1 virginica        2.028148
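sd() returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. The short check below recomputes it from that definition, assuming R's built-in iris data as a stand-in for DF (same measurements, dotted column names), and shows the slightly smaller population version for comparison.

```r
x = iris$Sepal.Length
n = length(x)

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
sd_manual = sqrt(sum((x - mean(x))^2) / (n - 1))

# Population standard deviation (divide by n), for comparison only
sd_population = sqrt(sum((x - mean(x))^2) / n)

print(c(sd_manual = sd_manual, sd_builtin = sd(x), sd_population = sd_population))
```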
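The quartiles printed by summary() come from quantile(), which by default uses type 7 interpolation. As a sketch, again assuming the built-in iris data as a stand-in for DF, the five-number part of the Sepal Length summary can be reproduced directly:

```r
x = iris$Sepal.Length

# Minimum, 1st quartile, median, 3rd quartile, maximum (default type = 7)
q = quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

print(q)
print(summary(x))
```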
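aggregate() returns a data frame; tapply() is a base-R alternative that returns one value per group as a named vector. A minimal sketch, assuming the built-in iris data as a stand-in for DF, gives the same group means as the first aggregate() call above:

```r
# tapply(values, grouping variable, function): one mean per Species
means_by_species = tapply(iris$Sepal.Length, iris$Species, mean)

print(means_by_species)
```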
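The missing-value count for Sepal Length is 0 here, but when data do contain NA values, functions such as mean() return NA unless na.rm = TRUE is given. The vector below is hypothetical and only illustrates that behaviour.

```r
# Hypothetical vector with one missing value
x = c(5.1, 4.9, NA, 4.7)

n_missing = sum(is.na(x))            # number of missing values
mean_all = mean(x)                   # NA: the missing value propagates
mean_observed = mean(x, na.rm = TRUE) # mean of the three observed values

print(c(n_missing = n_missing, mean_all = mean_all, mean_observed = mean_observed))
```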
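In the sorted output, the four rows tied at 7.7 keep their original order because order() leaves unresolved ties as they were. order() also accepts several keys, and negating a numeric column sorts it in descending order. A sketch, assuming the built-in iris data as a stand-in for DF, that breaks the ties by Sepal Width (largest first):

```r
# Sort by Sepal.Length (descending), then by Sepal.Width (descending) within ties
idx = order(-iris$Sepal.Length, -iris$Sepal.Width)

sorted = iris[idx, c("Sepal.Length", "Sepal.Width")]
print(head(sorted))
```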