16 Data: Slice & Dice

  • There are multiple ways to slice and dice a data frame
  • Indexing can be done at both the row level and the column level

16.1 R

Set the working directory to the data folder and read the iris dataset as an R object DF.

DF = read.csv('iris.csv')

Approach | Explanation | Example
Select row(s)
[i, ] | Get the row indexed by integer value i | DF[4, ]
Name | Get the row(s) by row name | DF[c('3', '4'), ]
[-i, ] | Remove the row indexed by integer value i | DF[-3, ]
logical operator | Get the rows where the condition is TRUE (a logical vector shorter than the number of rows is recycled) | DF[DF$SepalLength >= 7.5, ]
Select column(s)
$ | Get the named column | DF$Species
[, j] | Get the column indexed by integer value j | DF[, 4]
Name | Get the column(s) by name string | DF['Species']; DF[, c('SepalLength', 'Species')]
[, -j] | Remove the column indexed by integer value j | DF[, -4]
logical operator | Get the columns where the condition is TRUE (a logical vector shorter than the number of columns is recycled) | DF[, c(T, F, F, F, T)]

16.2 Python

Set the working directory to the data folder and read the iris dataset as a pandas DataFrame DF.

import pandas as pd

DF = pd.read_csv('iris.csv')

Remember: In Python, indexing starts at zero, and positional slices at the row or column level follow the start:stop:step format, with the stop position excluded.
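The start, stop, step convention can be sketched with iloc on a small data frame. The six rows below are hypothetical values standing in for iris.csv (only the column names come from the text), so the block runs without the data file.

```python
import pandas as pd

# Hypothetical stand-in for iris.csv: six made-up rows, same column names as the text.
DF = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.6, 6.3, 7.7, 5.0],
    "SepalWidth": [3.5, 3.0, 3.0, 2.9, 3.8, 3.6],
    "PetalLength": [1.4, 1.4, 6.6, 5.6, 6.7, 1.4],
    "PetalWidth": [0.2, 0.2, 2.1, 1.8, 2.2, 0.2],
    "Species": ["setosa", "setosa", "virginica",
                "virginica", "virginica", "setosa"],
})

first_row = DF.iloc[0]           # position 0 is the first row (returned as a Series)
first_three = DF.iloc[0:3]       # rows at positions 0, 1, 2; stop (3) is excluded
every_other = DF.iloc[0:6:2]     # step of 2: positions 0, 2, 4
last_two_cols = DF.iloc[:, -2:]  # the same slice rule applies to columns
```

Omitting start, stop, or step falls back to the defaults (first position, last position, step of 1), which is why DF.iloc[:, -2:] keeps every row.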

Approach | Explanation | Example
Select row(s)
iloc[index] | Get the row(s) by the index value (index-based location) | DF.iloc[10]; DF.iloc[0:10]; DF.iloc[0:10:2]
logical operator | Get the rows where the condition is True (the Boolean mask needs one value per row; unlike R, it is not recycled) | DF[DF.SepalLength >= 7.5]
drop with axis 0 | Drop the indexed rows | DF.drop([0, 1], axis = 0)
Select column(s)
. | Get the named column (attribute access) | DF.Species
[Name] | Get the named column(s) | DF['Species']; DF[['SepalLength', 'Species']]
loc[:, Name] | Get the named column(s) (name-based location) | DF.loc[:, ['Species']]; DF.loc[:, ['SepalLength', 'Species']]; DF.loc[:, 'SepalWidth':'Species']
drop with axis 1 | Drop the named columns | DF.drop(['SepalLength', 'Species'], axis = 1)
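The row and column selections listed for Python can be tried together in a short sketch. As an assumption for illustration, DF is built inline from six made-up rows rather than read from iris.csv; only the column names match the text.

```python
import pandas as pd

# Hypothetical stand-in for iris.csv: six made-up rows, same column names as the text.
DF = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.6, 6.3, 7.7, 5.0],
    "SepalWidth": [3.5, 3.0, 3.0, 2.9, 3.8, 3.6],
    "PetalLength": [1.4, 1.4, 6.6, 5.6, 6.7, 1.4],
    "PetalWidth": [0.2, 0.2, 2.1, 1.8, 2.2, 0.2],
    "Species": ["setosa", "setosa", "virginica",
                "virginica", "virginica", "setosa"],
})

# Rows: a Boolean mask with one True/False value per row.
mask = DF["SepalLength"] >= 7.5
long_sepals = DF[mask]

# Columns: a list of names inside [] returns a data frame with those columns.
two_cols = DF[["SepalLength", "Species"]]

# Columns by label range: with loc, both ends of the slice are included.
col_range = DF.loc[:, "PetalWidth":"Species"]

# drop returns a new data frame; DF itself is left unchanged.
trimmed = DF.drop([0, 1], axis=0).drop(["SepalWidth"], axis=1)
```

Note the difference from iloc: a label slice such as 'PetalWidth':'Species' keeps the Species column, whereas a positional slice excludes its stop position.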
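# R code for the examples in Section 16.1
DF = read.csv('./data/iris.csv')

# Select row(s)
DF[4, ]                              # row at position 4
DF[c(3, 4), ]                        # rows at positions 3 and 4

DF[-3, ]                             # all rows except row 3

DF[DF$SepalLength >= 7.5, ]          # rows where the condition is TRUE

# Select column(s)
DF$Species                           # named column, returned as a vector
DF[, 4]                              # column at position 4

DF['Species']                        # named column, returned as a data frame

DF[, c('SepalLength', 'Species')]    # several columns by name

DF[, -4]                             # all columns except column 4

DF[, c(T, F, F, F, T)]               # columns flagged TRUE (first and fifth)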