19 Example: Summary Statistics in Python

  • Both R and Python (pandas) include functions or methods to summarise the data
  • R applies functions to the column(s)
  • Python pandas applies methods to the column(s) of the dataframe
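The difference between the two styles can be sketched with a toy example. The small data frame below is made up for illustration and is not the iris data. `np.mean` stands in for the R-style "function applied to a column", which is only an analogy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the iris dataset
toy = pd.DataFrame({"length": [1.0, 2.0, 3.0, 6.0]})

# pandas style: the summary is a method called on the column
method_mean = toy["length"].mean()

# Function style (closer in spirit to R's mean(DF$length)):
# the column is passed as an argument to a function
function_mean = np.mean(toy["length"])

print(method_mean, function_mean)
```

Both calls give the same number; the choice is mostly about which style reads more naturally in the surrounding code.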

19.1 Python

Set the working directory to the data folder and read the iris dataset into a pandas DataFrame named DF.

import pandas as pd

DF = pd.read_csv('iris.csv')
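The working-directory step is not shown in the script above. One possible way to do it is sketched here, assuming the standard-library `os.chdir` is acceptable. To keep the sketch self-contained, a temporary folder and a two-row file called `example.csv` (both hypothetical) stand in for the real data folder and `iris.csv`.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the data folder: a temporary directory holding a tiny CSV.
# In practice this would be the folder that contains iris.csv.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "example.csv"), "w") as f:
    f.write("SepalLength,Species\n5.1,setosa\n7.0,versicolor\n")

os.chdir(data_dir)      # set the working directory
print(os.getcwd())      # confirm where Python is now looking

# A relative file name is resolved against the working directory
example = pd.read_csv("example.csv")
print(example.shape)
```

An alternative that avoids changing the working directory is to pass the full path to `pd.read_csv` directly.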

import pandas as pd

print('Minimum value of', DF[['SepalLength']].min())

print('Maximum value of', DF[['SepalLength']].max())

print('Mean of', DF[['SepalLength']].mean())

print('Median of', DF[['SepalLength']].median())

print('Standard deviation of', DF[['SepalLength']].std())

print('Summary Statistics of', DF[['SepalLength']].describe())

print('Summary Statistics: All')
DF.describe()

print('Mean Sepal Length by Species:')
DF.groupby('Species')[['SepalLength']].mean()

print('Mean Sepal & Petal Length by Species:')
DF.groupby('Species')[['SepalLength', 'PetalLength']].mean()

print('Summary Statistics of Sepal Length by Species:')
DF.groupby('Species')[['SepalLength']].describe()

print('Number of missing values of Sepal Length = ', DF.SepalLength.isnull().sum())

print('Number of non-missing values of Sepal Length = ', DF.SepalLength.notnull().sum())

print('Counts for different Species:')
DF.Species.value_counts()

print('Calculate a new column with log-transformed SepalLength')
import numpy as np
DF['log_SepalLength'] = np.log(DF['SepalLength'])
DF.head()

print('Sort SepalLength in descending order')
DF.sort_values(by = "SepalLength", ascending = False)
Minimum value of SepalLength    4.3
dtype: float64
Maximum value of SepalLength    7.9
dtype: float64
Mean of SepalLength    5.843333
dtype: float64
Median of SepalLength    5.8
dtype: float64
Standard deviation of SepalLength    0.828066
dtype: float64
Summary Statistics of        SepalLength
count   150.000000
mean      5.843333
std       0.828066
min       4.300000
25%       5.100000
50%       5.800000
75%       6.400000
max       7.900000
Summary Statistics: All
       SepalLength  SepalWidth  PetalLength  PetalWidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.057333     3.758000    1.199333
std       0.828066    0.435866     1.765298    0.762238
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000
Mean Sepal Length by Species:
            SepalLength
Species                
setosa            5.006
versicolor        5.936
virginica         6.588
Mean Sepal & Petal Length by Species:
            SepalLength  PetalLength
Species                             
setosa            5.006        1.462
versicolor        5.936        4.260
virginica         6.588        5.552
Summary Statistics of Sepal Length by Species:
           SepalLength                                            
                 count   mean       std  min    25%  50%  75%  max
Species                                                           
setosa            50.0  5.006  0.352490  4.3  4.800  5.0  5.2  5.8
versicolor        50.0  5.936  0.516171  4.9  5.600  5.9  6.3  7.0
virginica         50.0  6.588  0.635880  4.9  6.225  6.5  6.9  7.9
Number of missing values of Sepal Length =  0
Number of non-missing values of Sepal Length =  150
Counts for different Species:
setosa        50
versicolor    50
virginica     50
Name: Species, dtype: int64
Calculate a new column with log-transformed SepalLength
   SepalLength  SepalWidth  PetalLength  PetalWidth Species  log_SepalLength
0          5.1         3.5          1.4         0.2  setosa         1.629241
1          4.9         3.0          1.4         0.2  setosa         1.589235
2          4.7         3.2          1.3         0.2  setosa         1.547563
3          4.6         3.1          1.5         0.2  setosa         1.526056
4          5.0         3.6          1.4         0.2  setosa         1.609438
Sort SepalLength in descending order
     SepalLength  SepalWidth  ...    Species  log_SepalLength
131          7.9         3.8  ...  virginica         2.066863
135          7.7         3.0  ...  virginica         2.041220
122          7.7         2.8  ...  virginica         2.041220
117          7.7         3.8  ...  virginica         2.041220
118          7.7         2.6  ...  virginica         2.041220
..           ...         ...  ...        ...              ...
41           4.5         2.3  ...     setosa         1.504077
42           4.4         3.2  ...     setosa         1.481605
38           4.4         3.0  ...     setosa         1.481605
8            4.4         2.9  ...     setosa         1.481605
13           4.3         3.0  ...     setosa         1.458615

[150 rows x 6 columns]
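The grouped summaries and missing-value counts above can also be combined into fewer calls. The following is a minimal sketch, not part of the original script. It uses a made-up four-row data frame that borrows the iris column names, with one value deliberately missing, and relies on the pandas methods `isna` and `agg`.

```python
import pandas as pd

# Hypothetical toy data reusing the iris column names; one value is missing
toy = pd.DataFrame({
    "Species": ["setosa", "setosa", "virginica", "virginica"],
    "SepalLength": [5.0, 5.2, 6.4, None],
})

# Number of missing values in every column at once
missing = toy.isna().sum()
print(missing)

# Several statistics per group in a single call
by_species = toy.groupby("Species")["SepalLength"].agg(["count", "mean", "max"])
print(by_species)
```

Note that `count` reports only non-missing values, so the group containing the missing entry shows a count of 1 rather than 2.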

19.2 Note