15 Data: Overview

  • Set the current working directory to read the data file

  • Check all objects in the current working directory

  • The base R format for tabular data is the data.frame

  • The Python format is the DataFrame from the pandas library

  • In Python, import pandas as pd is the conventional import statement

  • The functions below explore R and Python data frames
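
As a rough Python sketch of the first two bullets, the standard-library os module can set the working directory and list what is in it. This reads "objects" in the second bullet as files, which is an assumption. The folder data_dir and the placeholder file contents are also hypothetical: a temporary directory stands in for a real data folder so that the sketch runs anywhere.

```python
import os
import tempfile

# Hypothetical data folder: a temporary directory stands in for the real one.
data_dir = tempfile.mkdtemp()

# Put a tiny placeholder file in it so the listing is not empty.
with open(os.path.join(data_dir, "iris.csv"), "w") as f:
    f.write("sepal_length,species\n5.1,setosa\n")

original_dir = os.getcwd()      # remember where we started
os.chdir(data_dir)              # set the current working directory
files = sorted(os.listdir())    # list everything in the working directory
print(os.getcwd())
print(files)
os.chdir(original_dir)          # restore the original working directory
```

Once the working directory is set, a relative file name such as 'iris.csv' resolves against that folder.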


15.1 R

Set the working directory to the data folder and read the iris dataset as an R object DF.

DF = read.csv('iris.csv')

Function         Explanation                                    Example
dim              Dimension of the data.frame                    dim(DF)
nrow             Number of rows in the data.frame               nrow(DF)
ncol             Number of columns in the data.frame            ncol(DF)
head             First n (default = 6) rows of the data.frame   head(DF)
tail             Last n (default = 6) rows of the data.frame    tail(DF)
rownames         Row names of the data.frame                    rownames(DF)
colnames, names  Column names of the data.frame                 names(DF)
str              Structure of the data.frame                    str(DF)

15.2 Python

Set the working directory to the data folder and read the iris dataset as a pandas DataFrame DF.

import pandas as pd

DF = pd.read_csv('iris.csv')

Method / attribute   Explanation                                   Example
shape                Dimensions of the DataFrame (rows, columns)   DF.shape
shape with index 0   Number of rows in the DataFrame               DF.shape[0]
shape with index 1   Number of columns in the DataFrame            DF.shape[1]
head                 First n (default = 5) rows of the DataFrame   DF.head()
tail                 Last n (default = 5) rows of the DataFrame    DF.tail()
index                Row labels (index) of the DataFrame           DF.index
columns              Column names of the DataFrame                 DF.columns
info, dtypes         Column types of the DataFrame                 DF.info(), DF.dtypes
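
The attributes and methods in the table can be tried without the real file. The following is a minimal sketch, assuming a four-row stand-in for iris.csv held in a string. The column names and values are illustrative, not the actual contents of the dataset.

```python
import io
import pandas as pd

# Assumption: a four-row stand-in for iris.csv with illustrative column names.
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
7.0,3.2,versicolor
6.3,3.3,virginica
"""
DF = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = DF.shape        # (rows, columns) tuple
first_two = DF.head(2)           # first n rows; n defaults to 5
last_two = DF.tail(2)            # last n rows; n defaults to 5
row_labels = list(DF.index)      # default integer labels starting at 0
col_names = list(DF.columns)     # column names
col_types = DF.dtypes            # dtype of each column, as a Series

print(n_rows, n_cols)
print(col_names)
DF.info()                        # prints a summary and returns None
```

Note that shape, index, columns and dtypes are attributes and take no parentheses, while head, tail and info are methods and must be called.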

15.3 Note