16 Data: Slice & Dice

There are multiple ways to slice and dice a data frame.
Indexing can be done at both the row level and the column level.

16.1 R

Set the working directory to the data folder and read the iris dataset as an R object DF.

DF = read.csv('iris.csv')

Approach	Explanation	Example
Select row(s)
`[i, ]`	Get the row at integer position i	`DF[4, ]`
`Name`	Get the row(s) by row name	`DF[c('3', '4'), ]`
`[-i, ]`	Remove the row at integer position i	`DF[-3, ]`
`logical operator`	Get the rows where the condition is TRUE (a shorter logical vector is recycled)	`DF[DF$SepalLength >= 7.5, ]`
Select column(s)
`$`	Get the named column	`DF$Species`
`[, j]`	Get the column at integer position j	`DF[, 4]`
`Name`	Get the column(s) by name	`DF['Species']; DF[, c('SepalLength', 'Species')]`
`[, -j]`	Remove the column at integer position j	`DF[, -4]`
`logical operator`	Get the columns where the logical vector is TRUE (a shorter vector is recycled)	`DF[, c(T, F, F, F, T)]`

16.2 Python

Set the working directory to the data folder and read the iris dataset as a pandas DataFrame DF.

import pandas as pd

DF = pd.read_csv('iris.csv')

Remember: in Python, indexing starts at zero, and slicing at the row or column level follows the start:stop:step format, where the stop position is excluded.
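To make the zero-based start:stop:step rule concrete, here is a minimal sketch. Since iris.csv may not be at hand, it builds a small DataFrame with the same iris-style column names as the tables in this chapter but with made-up values (hypothetical data, not the real iris rows). It contrasts position-based `iloc`, where the stop position is excluded, with label-based `loc`, where the stop label is included.

```python
import pandas as pd

# Hypothetical iris-style data: column names match the tables, values are made up
DF = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.6, 6.3, 7.7, 5.0],
    "SepalWidth": [3.5, 3.0, 3.0, 3.3, 3.8, 3.6],
    "PetalLength": [1.4, 1.4, 6.6, 6.0, 6.7, 1.4],
    "PetalWidth": [0.2, 0.2, 2.1, 2.5, 2.2, 0.2],
    "Species": ["setosa", "setosa", "virginica", "virginica", "virginica", "setosa"],
})

# iloc is position-based and zero-based: position 0 is the first row
first_row = DF.iloc[0]

# start:stop:step with the stop excluded -> positions 0, 1, 2
first_three = DF.iloc[0:3]

# a step of 2 -> positions 0, 2, 4
every_other = DF.iloc[0:6:2]

# the same rule applies to columns -> column positions 0 and 1
first_two_cols = DF.iloc[:, 0:2]

# loc is label-based and includes the stop label -> index labels 0, 1, 2, 3
by_label = DF.loc[0:3]

print(len(first_three), list(every_other.index), list(first_two_cols.columns), len(by_label))
# prints: 3 [0, 2, 4] ['SepalLength', 'SepalWidth'] 4
```

The `loc` line returns four rows, not three: with the default integer index, labels and positions coincide, so the inclusive label slice is an easy source of off-by-one surprises.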

Approach	Explanation	Example
Select row(s)
`iloc[index]`	Get the row(s) by integer position (index-based location)	`DF.iloc[10]; DF.iloc[0:10]; DF.iloc[0:10:2]`
`logical operator`	Get the rows where the condition is True (unlike R, the boolean mask is not recycled and must match the number of rows)	`DF[DF.SepalLength >= 7.5]; DF.loc[DF.SepalLength >= 7.5, :]`
`drop with axis 0`	Drop the rows with the given index labels	`DF.drop([0, 1], axis = 0)`
Select column(s)
`.`	Get the named column	`DF.Species`
`[Name]`	Get the named column(s)	`DF['Species']; DF[['SepalLength', 'Species']]`
`loc[:, Name]`	Get the named column (name-based location)	`DF.loc[:, ['Species']]; DF.loc[:, ['SepalLength', 'Species']]; DF.loc[:, 'SepalWidth':'Species']`
`drop with axis 1`	Drop the named columns	`DF.drop(['SepalLength', 'Species'], axis = 1)`
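The operations in the table above can be tried end to end with the following sketch. It assumes the same iris-style column names as the table and uses a small made-up frame instead of iris.csv, so the values are illustrative only. The last lines show one practical difference from R: pandas rejects a boolean mask that is shorter than the number of rows instead of recycling it.

```python
import pandas as pd

# Hypothetical iris-style data: column names match the table, values are made up
DF = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.6, 6.3, 7.7, 5.0],
    "SepalWidth": [3.5, 3.0, 3.0, 3.3, 3.8, 3.6],
    "PetalLength": [1.4, 1.4, 6.6, 6.0, 6.7, 1.4],
    "PetalWidth": [0.2, 0.2, 2.1, 2.5, 2.2, 0.2],
    "Species": ["setosa", "setosa", "virginica", "virginica", "virginica", "setosa"],
})

# --- Select row(s) ---
row_3 = DF.iloc[3]                           # a single row comes back as a Series
big_sepals = DF[DF.SepalLength >= 7.5]       # boolean mask, same as DF.loc[mask, :]
without_first_two = DF.drop([0, 1], axis=0)  # drops index *labels* 0 and 1

# --- Select column(s) ---
species = DF.Species                         # attribute access returns a Series
species_too = DF["Species"]                  # the same Series via []
two_cols = DF[["SepalLength", "Species"]]    # a list of names returns a DataFrame
as_frame = DF.loc[:, ["Species"]]            # a one-element list keeps a DataFrame
label_range = DF.loc[:, "SepalWidth":"Species"]  # label slice, stop label included
trimmed = DF.drop(["SepalLength", "Species"], axis=1)

# Unlike R, a short logical vector is not recycled: pandas raises instead
recycle_error = None
try:
    DF[[True, False]]  # only 2 flags for 6 rows
except (IndexError, ValueError) as err:
    recycle_error = err
print("short mask rejected:", type(recycle_error).__name__)
```

Attribute access such as `DF.Species` only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute; `DF['Species']` works for any column name.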

# Read the iris data
DF = read.csv('./data/iris.csv')

# Select row(s)
DF[4, ]
DF[c(3, 4), ]

DF[-3, ]

DF[DF$SepalLength >= 7.5, ]

# Select column(s)
DF$Species
DF[, 4]

DF['Species']

DF[, c('SepalLength', 'Species')]

DF[, -4]

DF[, c(T, F, F, F, T)]
