Section 48 Vectorised Operation


48.1 Use Vectorised code

  • Perform operation on all elements at once

  • Use logical indexing to handle parallel cases

  • Complete sequential steps in one go

48.2 Vectorised Operation: Numeric vector

  • Arithmatic and Statistical functions + Elementwise operation on a vector + Both vectors are of the same length + The longer vector is a multiple of the shorter vector + The longer vector is NOT a multiple of the shorter vector

48.3 Vectorised Operation: Logical vector

  • Logical operation: == != < <= > >= & | ! + Elementwise operation on a vector + Both vectors are of the same length + The longer vector is a multiple of the shorter vector + The longer vector is NOT a multiple of the shorter vector

  • Implicit and explicit coercion + mathematical operation (e.g. sum)

48.4 Vectorised Operation: Missing values (NA)

  • Logical operation: == != < <= > >= & | ! + Elementwise operation on a vector + Both vectors are of the same length + The longer vector is a multiple of the shorter vector + The longer vector is NOT a multiple of the shorter vector

  • Calling function with a vector containing NA


48.5 Vectorised operation: Example 1

x <- c(1:10)

y <- x + 10
z <- x^2 - 5*x - 10

sum(y)

sum(z)

y + z
sum(y + z)

sum(x > 5)

sum(x[x>5]^2)


48.6 Vectorised operation: Example 2

  • Generate a vector of 20 random number \(x\) from the following Normal distribution:

\[ \large x \sim N(10,2^2) \]

  • Calculate AM, GM and HM using the following formula.

Arithmetic Mean (AM)

\[ \large AM = (x_1 + x_2 + ... + x_n)/n = \frac{1}{n}\sum\limits_{i=1}^{n} x_{i}\]

Geometric Mean (GM)

\[ \large GM = \sqrt[n]{(x_1 x_2 ... x_n)} = \left( \prod \limits_{i=1}^{n} x_{i} \right) ^{\frac{1}{n}} \]

Harmonic Mean (HM)

\[ \large HM = \frac{n}{(\frac{1}{x_1} + \frac{1}{x_2} + ... + \frac{1}{x_n})} = \frac{1}{\frac{1}{n}\sum\limits_{i=1}^{n} \frac{1}{x_i}}\]

set.seed(12345)

x <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

x.am <- sum(x)/n
mean(x)

x.gm <- prod(x)^(1/n)

x.gm <- exp(sum(log(x[x > 0]), na.rm=TRUE) / length(x))

x.hm <- 1 / (sum(1/x)/n)


48.7 Vectorised operation: Example 3

  • Use the above data to calculate sample variance and standard deviation.

Sample Variance

\[ \large Var(x) = s_x^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2 \]

Sample Standard Deviation

\[ \large s_x = \sqrt{s_x^2} = \sqrt{Var(x)} \]

x <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

mean.x <- sum(x)/n

var.x <- sum( (x - mean.x)^2 ) / (n-1) 
var(x)

sd.x <- sqrt(var.x)
sd(x)


48.8 Vectorised operation: Example 4

Explain the operations of the following R codes.

x <- c(1:10)

sum(x<5)
sum(x[x<5])


meanx <- sum(x[x<5]) / sum(x<5)
mean(x[x<5])


x[x<5]
x[x<5]^2 - 5*x[x<5] - 10

sum((x[x<5]-meanx)^2)/(sum(x<5)-1)
var(x[x<5])


48.9 Vectorised operation: Example 5

  • Generate a vector of 20 random number \(x\) from the following Normal distribution:

  • Generate a vector of 20 random number \(y\) from the following Normal distribution:

  • Calculate the correlation coefficient between \(x\) and \(y\) using the following formula.

  • Also check the estimate using the cor function

\[ \large r_{xy} = \frac{Cov(x,y)}{\sqrt{(Var(x)Var(y)}} = \frac{Cov(x,y)}{s_xs_y} \]

\[ \large Cov(x,y) = s_{xy} = \frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) \]

\[ \large Var(x) = s_x^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2 \]

\[ \large Var(y) = s_y^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n} (y_i-\bar{y})^2 \]

set.seed(12345)

x <- rnorm(n=20, mean=12, sd=2)
y <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

mean.x <- sum(x)/n
mean(x)
mean.y <- sum(y)/n
mean(y)

var.x <- sum( (x - mean.x)^2 ) / (n-1) 
var.x
var(x)

var.y <- sum( (y - mean.y)^2 ) / (n-1) 
var.y
var(y)

cov.xy <- sum((x-mean.x)*(y-mean.y)) / (n-1)
cov.xy
cov(x,y)

cor.xy <- cov.xy / sqrt(var.x*var.y)
cor.xy
cor(x,y)


48.10 Vectorised operation: Example 6

Skewness

\[ \large Skewness = \frac{m^3}{s^3} = \frac{\frac{1}{n}\sum\limits_{i=1}^{n}(x_i-\bar{x})^3} {[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2]^{3/2}} \]

Kurtosis

\[ \large Kurtosis = \frac{m^4}{s^4} = \frac{\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^4} {[\frac{1}{n}\sum\limits_{i=1}^{n} (x_i-\bar{x})^2]^{2}} \]

where \(\large \bar{x}\) is the sample mean, \(\large s\) is the sample standard deviation, and the numerator \(\large m^3\) is the sample third central moment and \(\large m^4\) is the sample fourth central moment.

x <- rnorm(n=20, mean=10, sd=2)

n <- length(x)

# Skewness
skx <- (sum((x - mean(x))^3)/n)/((sum((x - mean(x))^2)/n)^(3/2))

# Kurtosis
kurtx <- (sum((x - mean(x))^4)/n)/((sum((x - mean(x))^2)/n)^2)


48.11 Vectorised operation: Example 7

  • Generate a vector of 20 random number \(x_1\) from the following Normal distribution:

\[ \large x_1 \sim N(12,2^2) \]

  • Generate a vector of 15 random number \(x_2\) from the following Normal distribution:

\[ \large x_2 \sim N(10,2^2) \]

  • Calculate the following:

Sample Mean: \[ \large \bar{x_1} = \frac{1}{n}\sum\limits_{i=1}^{n} x_{1i} \] \[ \large \bar{x_2} = \frac{1}{n}\sum\limits_{i=1}^{n} x_{2i} \]

Pooled Sample Standard Deviation: \[ \large s = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\] Where: \(\large s_1^2\) and \(\large s_2^2\) are variances from Sample 1 and Sample 2, respectively.

Test statistic \(t_{Cal}\) under \(\large H_O: \mu_1 = \mu_2\):

\[ \large t_{Cal} = \frac{(\bar{x_1}-\bar{x_2})} {s\sqrt{1/n_1+1/n_2}} \]

Two-tailed probability under the distribution of the test statistic: \[\large 2*pt(q = |t_{Cal}|, df = n1+n2-2, lower.tail=FALSE)\]

Compare the results based on t.test.

set.seed(12345)

x1 <- rnorm(n=20, mean=12, sd=2)
x2 <- rnorm(n=15, mean=10, sd=2)

n1 <- length(x1)
n2 <- length(x2)
df <- n1 + n2 - 2

mean.x1 <- sum(x1)/n1
mean(x1)
mean.x2 <- sum(x2)/n2
mean(x2)

var.x1 <- sum( (x1 - mean.x1)^2 ) / (n1-1) 
var.x1
var(x1)

var.x2 <- sum( (x2 - mean.x2)^2 ) / (n2-1) 
var.x2
var(x2)

s2 <- ((n1-1)*var.x1 + (n2-1)*var.x2) / df
s <- sqrt(s2)

# s2 <- var.x1/n1 + var.x2/n2
# s <- sqrt(s2)


t_cal <- (mean.x1 - mean.x2) / (s * sqrt(1/n1 + 1/n2))
t_cal

prob <- 2 * pt(q = t_cal, df = df, lower.tail = FALSE)
prob

t.test(x = x1, y = x2, var.equal = TRUE, paired = FALSE)


# methods(t.test) 
# getAnywhere(t.test.default)