16 Data: Slice & Dice
- There are multiple ways to slice and dice a dataframe
- Indexing can be done at both the row level and the column level
16.1 R
Set the working directory to the data folder and read the iris dataset as an R object DF.
DF = read.csv('iris.csv')
| Approach | Explanation | Example |
|---|---|---|
| Select row(s) | | |
| [i, ] | Get the row indexed by integer value i | DF[4, ] |
| Name | Get the row(s) by name | DF[c('3', '4'), ] |
| [-i, ] | Remove the row indexed by integer value i | DF[-3, ] |
| logical operator | Get the rows where the condition is TRUE (note: a logical vector shorter than the number of rows is recycled) | DF[DF$SepalLength >= 7.5, ] |
| Select column(s) | | |
| $ | Get the named column | DF$Species |
| [, j] | Get the column indexed by integer value j | DF[, 4] |
| Name | Get the column(s) by name string | DF['Species']; DF[, c('SepalLength', 'Species')] |
| [, -j] | Remove the column indexed by integer value j | DF[, -4] |
| logical operator | Get the columns where the condition is TRUE (note: a logical vector shorter than the number of columns is recycled) | DF[, c(T,F,F,F,T)] |
16.2 Python
Set the working directory to the data folder and read the iris dataset as a pandas DataFrame DF.
import pandas as pd
DF = pd.read_csv('iris.csv')
Remember: In Python, indexing starts at zero, and slicing at the row or column level follows the
start, stop, step format. For position-based slices, the stop position is excluded.
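The start, stop, step convention can be sketched on a small stand-in table. The `toy` DataFrame below is an assumption made so the example runs without iris.csv; its column names simply mirror the ones used in this chapter.

```python
import pandas as pd

# Hypothetical stand-in for iris.csv so the sketch runs on its own.
toy = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.6, 6.3, 7.7, 5.8],
    "Species": ["setosa", "setosa", "virginica",
                "versicolor", "virginica", "virginica"],
})

first_row = toy.iloc[0]        # indexing starts at zero, so this is the first row
first_three = toy.iloc[0:3]    # start=0, stop=3 (excluded): positions 0, 1, 2
every_other = toy.iloc[0:6:2]  # step=2: positions 0, 2, 4

print(first_three.index.tolist())
print(every_other.index.tolist())
```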
| Approach | Explanation | Example |
|---|---|---|
| Select row(s) | | |
| iloc[index] | Get the row(s) by integer position (index-based location) | DF.iloc[10]; DF.iloc[0:10]; DF.iloc[0:10:2] |
| logical operator | Get the rows where the condition is TRUE (note: the Boolean mask needs one value per row; pandas does not recycle a shorter mask) | DF[DF.SepalLength >= 7.5] |
| drop with axis 0 | Drop the rows with the given index labels | DF.drop([0,1], axis = 0) |
| Select column(s) | | |
| . | Get the named column | DF.Species |
| [Name] | Get the named column(s) | DF['Species']; DF[['SepalLength', 'Species']] |
| loc[:, Name] | Get the named column(s) (name-based location) | DF.loc[:, ['Species']]; DF.loc[:, ['SepalLength', 'Species']]; DF.loc[:, 'SepalWidth':'Species'] |
| drop with axis 1 | Drop the named columns | DF.drop(['SepalLength', 'Species'], axis = 1) |
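The row and column selections for pandas can be tried on a small table. This is a minimal sketch, assuming a hypothetical five-row `toy` DataFrame in place of iris.csv; only the column names are borrowed from the chapter.

```python
import pandas as pd

# Hypothetical stand-in for iris.csv; column names follow the chapter's examples.
toy = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.6, 6.3, 7.7],
    "SepalWidth": [3.5, 3.0, 3.0, 3.3, 3.8],
    "Species": ["setosa", "setosa", "virginica", "versicolor", "virginica"],
})

# Rows: a Boolean mask with one value per row keeps the rows marked True.
long_sepals = toy[toy["SepalLength"] >= 7.5]

# Rows: drop by index label (axis=0); toy itself is left unchanged.
without_first_two = toy.drop([0, 1], axis=0)

# Columns: a single name gives a Series, a list of names gives a DataFrame.
species = toy["Species"]
two_cols = toy[["SepalLength", "Species"]]

# Columns: a label slice with loc includes both end points.
width_to_species = toy.loc[:, "SepalWidth":"Species"]

# Columns: drop by name (axis=1).
only_width = toy.drop(["SepalLength", "Species"], axis=1)

print(long_sepals)
print(list(only_width.columns))
```

Unlike R, a Boolean list of the wrong length is not recycled here; pandas raises an error instead.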
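DF = read.csv('./data/iris.csv')
DF[4, ]                             # row by integer index
DF[c('3', '4'), ]                   # rows by name
DF[-3, ]                            # all rows except the third
DF[DF$SepalLength >= 7.5, ]         # rows where the condition is TRUE
DF$Species                          # named column
DF[, 4]                             # column by integer index
DF['Species']                       # column by name string
DF[, c('SepalLength', 'Species')]   # several columns by name
DF[, -4]                            # all columns except the fourth
DF[, c(T,F,F,F,T)]                  # columns flagged TRUE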