18 Example: Summary Statistics in R
- Both R and Python (pandas) include functions or methods to summarise the data
  - R applies functions to the column(s)
  - Python pandas applies methods to the column(s) of the dataframe
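As a minimal sketch of the R side of this contrast, the snippet below applies a function to one column and then to every numeric column. The data frame `toy` and its values are made up for illustration and are not part of the iris data.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(
  length = c(5.1, 4.9, 6.3, 5.8),
  width  = c(3.5, 3.0, 3.3, 2.7),
  group  = c("a", "a", "b", "b")
)

# A function applied to a single column
mean_length <- mean(toy$length)

# The same function applied to every numeric column
numeric_cols <- sapply(toy, is.numeric)
col_means <- sapply(toy[, numeric_cols], mean)

print(mean_length)
print(col_means)
```

In pandas the equivalent operations would be written as methods on the dataframe or its columns rather than as free-standing functions.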
18.1 R
Set the working directory to the data folder and read the iris dataset as an R object DF.
DF = read.csv('iris.csv')
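The file `iris.csv` itself is not shown here. Judging from the output further down, its columns are named SepalLength, SepalWidth, PetalLength, PetalWidth and Species. If the file is not to hand, an equivalent one could probably be produced from R's built-in `iris` data set, as sketched below. The temporary file path and the renaming step are assumptions made for illustration, not the author's own preparation.

```r
# Start from R's built-in iris data (columns are named Sepal.Length, etc.)
iris_copy <- iris

# Drop the dots so the names match the ones used in this section (assumed layout)
names(iris_copy) <- gsub(".", "", names(iris_copy), fixed = TRUE)

# Write to a temporary location; in practice this would be the data folder
csv_path <- file.path(tempdir(), "iris.csv")
write.csv(iris_copy, csv_path, row.names = FALSE)

# Read it back the same way as above
DF <- read.csv(csv_path)
str(DF)
```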
cat('Number of rows in iris data:', nrow(DF))
cat('Number of columns in iris data:', ncol(DF))
cat('First five rows of the data:')
head(DF, n = 5)
cat('Last five rows of the data:')
tail(DF, n = 5)
cat('Minimum value of Sepal Length:', min(DF$SepalLength))
cat('Maximum value of Sepal Length:', max(DF$SepalLength))
cat('Mean of Sepal Length:', mean(DF$SepalLength))
cat('Median of Sepal Length:', median(DF$SepalLength))
cat('Standard deviation of Sepal Length:', sd(DF$SepalLength))
cat('Summary Statistics: Sepal Length')
summary(DF$SepalLength)
cat('Summary Statistics: All')
summary(DF)
cat('Mean Sepal Length by Species:')
aggregate(SepalLength ~ Species, data = DF, FUN = mean, na.rm = TRUE)
cat('Mean Sepal & Petal Length by Species:')
aggregate(cbind(SepalLength, PetalLength) ~ Species, data = DF, FUN = mean, na.rm = TRUE)
cat('Summary Statistics of Sepal Length by Species:')
aggregate(SepalLength ~ Species, data = DF, FUN = summary, na.rm = TRUE)
cat('Number of missing values of Sepal Length:', sum(is.na(DF$SepalLength)))
cat('Number of non-missing values of Sepal Length:', sum(!is.na(DF$SepalLength)))
cat('Counts for different Species:')
table(DF$Species)
cat('Calculate a new column with log-transformed SepalLength')
DF$log_SepalLength = log(DF$SepalLength)
head(DF)
cat('Sort SepalLength in descending order')
head(DF[order(DF$SepalLength, decreasing = TRUE),])
Number of rows in iris data: 150
Number of columns in iris data: 5
First five rows of the data:
  SepalLength SepalWidth PetalLength PetalWidth Species
1         5.1        3.5         1.4        0.2  setosa
2         4.9        3.0         1.4        0.2  setosa
3         4.7        3.2         1.3        0.2  setosa
4         4.6        3.1         1.5        0.2  setosa
5         5.0        3.6         1.4        0.2  setosa
Last five rows of the data:
    SepalLength SepalWidth PetalLength PetalWidth   Species
146         6.7        3.0         5.2        2.3 virginica
147         6.3        2.5         5.0        1.9 virginica
148         6.5        3.0         5.2        2.0 virginica
149         6.2        3.4         5.4        2.3 virginica
150         5.9        3.0         5.1        1.8 virginica
Minimum value of Sepal Length: 4.3
Maximum value of Sepal Length: 7.9
Mean of Sepal Length: 5.843333
Median of Sepal Length: 5.8
Standard deviation of Sepal Length: 0.8280661
Summary Statistics: Sepal Length
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900
Summary Statistics: All
  SepalLength      SepalWidth     PetalLength      PetalWidth
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
   Species
 Length:150
 Class :character
 Mode  :character
Mean Sepal Length by Species:
     Species SepalLength
1     setosa       5.006
2 versicolor       5.936
3  virginica       6.588
Mean Sepal & Petal Length by Species:
     Species SepalLength PetalLength
1     setosa       5.006       1.462
2 versicolor       5.936       4.260
3  virginica       6.588       5.552
Summary Statistics of Sepal Length by Species:
     Species SepalLength.Min. SepalLength.1st Qu. SepalLength.Median
1     setosa            4.300               4.800              5.000
2 versicolor            4.900               5.600              5.900
3  virginica            4.900               6.225              6.500
  SepalLength.Mean SepalLength.3rd Qu. SepalLength.Max.
1            5.006               5.200            5.800
2            5.936               6.300            7.000
3            6.588               6.900            7.900
Number of missing values of Sepal Length: 0
Number of non-missing values of Sepal Length: 150
Counts for different Species:
    setosa versicolor  virginica
        50         50         50
Calculate a new column with log-transformed SepalLength
  SepalLength SepalWidth PetalLength PetalWidth Species log_SepalLength
1         5.1        3.5         1.4        0.2  setosa        1.629241
2         4.9        3.0         1.4        0.2  setosa        1.589235
3         4.7        3.2         1.3        0.2  setosa        1.547563
4         4.6        3.1         1.5        0.2  setosa        1.526056
5         5.0        3.6         1.4        0.2  setosa        1.609438
6         5.4        3.9         1.7        0.4  setosa        1.686399
Sort SepalLength in descending order
    SepalLength SepalWidth PetalLength PetalWidth   Species log_SepalLength
132         7.9        3.8         6.4        2.0 virginica        2.066863
118         7.7        3.8         6.7        2.2 virginica        2.041220
119         7.7        2.6         6.9        2.3 virginica        2.041220
123         7.7        2.8         6.7        2.0 virginica        2.041220
136         7.7        3.0         6.1        2.3 virginica        2.041220
106         7.6        3.0         6.6        2.1 virginica        2.028148
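One detail of the by-species summary above is easy to miss: when `FUN` returns several values, `aggregate()` stores them in a single matrix column, which is why the printed names look like `SepalLength.Min.`. A minimal sketch of flattening that result into ordinary columns follows. It uses R's built-in `iris` data (so the column is `Sepal.Length` rather than `SepalLength`), and the names `agg` and `flat` are illustrative.

```r
# Summary statistics of sepal length per species, using the built-in iris data
agg <- aggregate(Sepal.Length ~ Species, data = iris, FUN = summary)

# The result has only two columns; the second one is a 3 x 6 matrix
ncol(agg)
dim(agg$Sepal.Length)

# Flatten the matrix column into six ordinary numeric columns
flat <- do.call(data.frame, agg)
names(flat)
```

Flattening is mainly useful when the summary has to be written to a file or merged with other tables, where a matrix column can be awkward to handle.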