Section 48 Vectorised Operation

48.1 Use Vectorised code

Perform operation on all elements at once
Use logical indexing to handle parallel cases
Complete sequential steps in one go

48.2 Vectorised Operation: Numeric vector

Arithmatic and Statistical functions + Elementwise operation on a vector + Both vectors are of the same length + The longer vector is a multiple of the shorter vector + The longer vector is NOT a multiple of the shorter vector

48.3 Vectorised Operation: Logical vector

Logical operation: == != < <= > >= & | ! + Elementwise operation on a vector + Both vectors are of the same length + The longer vector is a multiple of the shorter vector + The longer vector is NOT a multiple of the shorter vector
Implicit and explicit coercion + mathematical operation (e.g. sum)

48.4 Vectorised Operation: Missing values (`NA`)

Logical operation: == != < <= > >= & | ! + Elementwise operation on a vector + Both vectors are of the same length + The longer vector is a multiple of the shorter vector + The longer vector is NOT a multiple of the shorter vector
Calling function with a vector containing NA

48.5 Vectorised operation: Example 1

x <- c(1:10)

y <- x + 10
z <- x^2 - 5*x - 10

sum(y)

sum(z)

y + z
sum(y + z)

sum(x > 5)

sum(x[x>5]^2)

48.6 Vectorised operation: Example 2

Generate a vector of 20 random number \(x\) from the following Normal distribution:

\[ \large x \sim N(10,2^2) \]

Calculate AM, GM and HM using the following formula.

Arithmetic Mean (AM)

\[ \large AM = (x_1 + x_2 + ... + x_n)/n = \frac{1}{n}\sum\limits_{i=1}^{n} x_{i}\]

Geometric Mean (GM)

\[ \large GM = \sqrt[n]{(x_1 x_2 ... x_n)} = \left( \prod \limits_{i=1}^{n} x_{i} \right) ^{\frac{1}{n}} \]

Harmonic Mean (HM)

\[ \large HM = \frac{n}{(\frac{1}{x_1} + \frac{1}{x_2} + ... + \frac{1}{x_n})} = \frac{1}{\frac{1}{n}\sum\limits_{i=1}^{n} \frac{1}{x_i}}\]

set.seed(12345)

x <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

x.am <- sum(x)/n
mean(x)

x.gm <- prod(x)^(1/n)

x.gm <- exp(sum(log(x[x > 0]), na.rm=TRUE) / length(x))

x.hm <- 1 / (sum(1/x)/n)

48.7 Vectorised operation: Example 3

Use the above data to calculate sample variance and standard deviation.

Sample Variance

\[ \large Var(x) = s_x^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2 \]

Sample Standard Deviation

\[ \large s_x = \sqrt{s_x^2} = \sqrt{Var(x)} \]

x <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

mean.x <- sum(x)/n

var.x <- sum( (x - mean.x)^2 ) / (n-1) 
var(x)

sd.x <- sqrt(var.x)
sd(x)

48.8 Vectorised operation: Example 4

Explain the operations of the following R codes.

x <- c(1:10)

sum(x<5)
sum(x[x<5])


meanx <- sum(x[x<5]) / sum(x<5)
mean(x[x<5])


x[x<5]
x[x<5]^2 - 5*x[x<5] - 10

sum((x[x<5]-meanx)^2)/(sum(x<5)-1)
var(x[x<5])

48.9 Vectorised operation: Example 5

Generate a vector of 20 random number \(x\) from the following Normal distribution:
Generate a vector of 20 random number \(y\) from the following Normal distribution:
Calculate the correlation coefficient between \(x\) and \(y\) using the following formula.
Also check the estimate using the cor function

\[ \large r_{xy} = \frac{Cov(x,y)}{\sqrt{(Var(x)Var(y)}} = \frac{Cov(x,y)}{s_xs_y} \]

\[ \large Cov(x,y) = s_{xy} = \frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) \]

\[ \large Var(x) = s_x^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2 \]

\[ \large Var(y) = s_y^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n} (y_i-\bar{y})^2 \]

set.seed(12345)

x <- rnorm(n=20, mean=12, sd=2)
y <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

mean.x <- sum(x)/n
mean(x)
mean.y <- sum(y)/n
mean(y)

var.x <- sum( (x - mean.x)^2 ) / (n-1) 
var.x
var(x)

var.y <- sum( (y - mean.y)^2 ) / (n-1) 
var.y
var(y)

cov.xy <- sum((x-mean.x)*(y-mean.y)) / (n-1)
cov.xy
cov(x,y)

cor.xy <- cov.xy / sqrt(var.x*var.y)
cor.xy
cor(x,y)

48.10 Vectorised operation: Example 6

Skewness

\[ \large Skewness = \frac{m^3}{s^3} = \frac{\frac{1}{n}\sum\limits_{i=1}^{n}(x_i-\bar{x})^3} {[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2]^{3/2}} \]

Kurtosis

\[ \large Kurtosis = \frac{m^4}{s^4} = \frac{\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^4} {[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2]^{2}} \]

where \(\large \bar{x}\) is the sample mean, \(\large s\) is the sample standard deviation, and the numerator \(\large m^3\) is the sample third central moment and \(\large m^4\) is the sample fourth central moment.

x <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

# Skewness
skx <- (sum((x - mean(x))^3)/n)/((sum((x - mean(x))^2)/n)^(3/2))

# Kurtosis
kurtx <- (sum((x - mean(x))^4)/n)/((sum((x - mean(x))^2)/n)^2)

48.11 Vectorised operation: Example 7

Generate a vector of 20 random number \(x_1\) from the following Normal distribution:

\[ \large x_1 \sim N(12,2^2) \]

Generate a vector of 15 random number \(x_2\) from the following Normal distribution:

\[ \large x_2 \sim N(10,2^2) \]

Calculate the following:

Sample Mean: \[ \large \bar{x_1} = \frac{1}{n}\sum\limits_{i=1}^{n} x_{1i} \] \[ \large \bar{x_2} = \frac{1}{n}\sum\limits_{i=1}^{n} x_{2i} \]

Pooled Sample Standard Deviation: \[ \large s = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\] Where: \(\large s_1^2\) and \(\large s_2^2\) are variances from Sample 1 and Sample 2, respectively.

Test statistic \(t_{Cal}\) under \(\large H_O: \mu_1 = \mu_2\):

\[ \large t_{Cal} = \frac{(\bar{x_1}-\bar{x_2})} {s\sqrt{1/n_1+1/n_2}} \]

Two-tailed probability under the distribution of the test statistic: \[\large 2*pt(q = |t_{Cal}|, df = n1+n2-2, lower.tail=FALSE)\]

Compare the results based on t.test.

set.seed(12345)

x1 <- rnorm(n=20, mean=12, sd=2)
x2 <- rnorm(n=15, mean=10, sd=2)

n1 <- length(x1)
n2 <- length(x2)
df <- n1 + n2 - 2

mean.x1 <- sum(x1)/n1
mean(x1)
mean.x2 <- sum(x2)/n2
mean(x2)

var.x1 <- sum( (x1 - mean.x1)^2 ) / (n1-1) 
var.x1
var(x1)

var.x2 <- sum( (x2 - mean.x2)^2 ) / (n2-1) 
var.x2
var(x2)

s2 <- ((n1-1)*var.x1 + (n2-1)*var.x2) / df
s <- sqrt(s2)

# s2 <- var.x1/n1 + var.x2/n2
# s <- sqrt(s2)


t_cal <- (mean.x1 - mean.x2) / (s * sqrt(1/n1 + 1/n2))
t_cal

prob <- 2 * pt(q = t_cal, df = df, lower.tail = FALSE)
prob

t.test(x = x1, y = x2, var.equal = TRUE, paired = FALSE)


# methods(t.test) 
# getAnywhere(t.test.default)