16 Data: Slice & Dice
- There are multiple ways to slice and dice a data frame.
- Indexing can be done at both the row level and the column level.
16.1 R
Set the working directory to the data folder and read the iris dataset as an R object DF.
DF = read.csv('iris.csv')
Approach | Explanation | Example |
---|---|---|
Select row(s) | ||
[i, ] | Get the row indexed by integer value i | DF[4, ] |
Name | Get the row(s) by name | DF[c('3', '4'), ] |
[-i, ] | Remove the row indexed by integer value i | DF[-3, ] |
logical operator | Get the rows where the condition is TRUE (a logical vector shorter than the number of rows is recycled) | DF[DF$SepalLength >= 7.5, ] |
Select column(s) | ||
$ | Get the named column | DF$Species |
[, j] | Get the column indexed by integer value j | DF[, 4] |
Name | Get the column(s) by name string | DF['Species']; DF[, c('SepalLength', 'Species')] |
[, -j] | Remove the column indexed by integer value j | DF[, -4] |
logical operator | Get the columns where the condition is TRUE (a logical vector shorter than the number of columns is recycled) | DF[, c(T,F,F,F,T)] |
16.2 Python
Set the working directory to the data folder and read the iris dataset as a pandas DataFrame DF.
import pandas as pd
DF = pd.read_csv('iris.csv')
Remember: in Python, indexing starts at zero, and slicing at the row or column level follows the start:stop:step format. For position-based slices the stop position is excluded.
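The start, stop, step convention can be sketched on a small frame. This is a minimal illustration, not the iris data itself: the twelve-row DF built here is a hypothetical stand-in so that the snippet runs without iris.csv.

```python
import pandas as pd

# Hypothetical stand-in for the iris data: 12 rows, two columns.
DF = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8],
    "Species": ["setosa"] * 12,
})

first_row = DF.iloc[0]         # position 0 is the first row (returned as a Series)
first_ten = DF.iloc[0:10]      # positions 0 through 9; the stop position is excluded
every_other = DF.iloc[0:10:2]  # positions 0, 2, 4, 6, 8 (step of 2)
last_two = DF.iloc[-2:]        # negative positions count from the end
```

Omitting start or stop, as in DF.iloc[-2:], falls back to the beginning or end of the frame, in the same way as slicing a Python list.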
Approach | Explanation | Example |
---|---|---|
Select row(s) | ||
iloc[index] | Get the row(s) by the index value (index-based location) | DF.iloc[10]; DF.iloc[0:10]; DF.iloc[0:10:2] |
logical operator | Get the rows where the condition is True (the Boolean mask must have one entry per row; it is not recycled) | DF[DF.SepalLength >= 7.5] |
drop with axis 0 | Drop the indexed rows | DF.drop([0,1], axis = 0) |
Select column(s) | ||
. | Get the named column as an attribute | DF.Species |
[Name] | Get the named column(s) | DF['Species']; DF[['SepalLength', 'Species']] |
loc[:, Name] | Get the named column(s) (name-based location) | DF.loc[:, ['Species']]; DF.loc[:, ['SepalLength', 'Species']]; DF.loc[:, 'SepalWidth':'Species'] |
drop with axis 1 | Drop the named columns | DF.drop(['SepalLength', 'Species'], axis = 1) |
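As a rough sketch of how the row and column operations in the table behave, the snippet below applies them to a small hand-made frame. The five rows of values are hypothetical, and the column names PetalLength and PetalWidth are assumptions about iris.csv; only SepalLength, SepalWidth and Species appear in the examples above.

```python
import pandas as pd

# Hypothetical five-row sample; PetalLength and PetalWidth are assumed column names.
DF = pd.DataFrame({
    "SepalLength": [5.1, 7.7, 6.3, 7.9, 5.8],
    "SepalWidth": [3.5, 3.8, 3.3, 3.8, 2.7],
    "PetalLength": [1.4, 6.7, 6.0, 6.4, 5.1],
    "PetalWidth": [0.2, 2.2, 2.5, 2.0, 1.9],
    "Species": ["setosa", "virginica", "virginica", "virginica", "virginica"],
})

# Rows: a Boolean mask keeps the rows where the condition holds.
long_sepals = DF[DF["SepalLength"] >= 7.5]

# Columns: a list of names selects several columns at once.
two_cols = DF[["SepalLength", "Species"]]

# Label-based slices with loc include both endpoints.
col_range = DF.loc[:, "SepalWidth":"Species"]

# drop returns a new frame; DF itself is left unchanged.
without_first_rows = DF.drop([0, 1], axis=0)
without_cols = DF.drop(["SepalLength", "Species"], axis=1)
```

Note the contrast with iloc: a label slice such as 'SepalWidth':'Species' keeps the Species column, whereas a positional slice stops one short of its stop value.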
# R examples from section 16.1
DF = read.csv('./data/iris.csv')
DF[4, ]
DF[c(3, 4), ]
DF[-3, ]
DF[DF$SepalLength >= 7.5, ]
DF$Species
DF[, 4]
DF['Species']
DF[, c('SepalLength', 'Species')]
DF[, -4]
DF[, c(T,F,F,F,T)]