Title: | Statistical and Geometrical Tools |
---|---|
Description: | A collection of statistical and geometrical tools including the aligned rank transform (ART; Higgins et al. 1990 <doi:10.4148/2475-7772.1443>; Peterson 2002 <doi:10.22237/jmasm/1020255240>; Wobbrock et al. 2011 <doi:10.1145/1978942.1978963>), 2-D histograms and histograms with overlapping bins, a function for making all possible formulae within a set of constraints, amongst others. |
Authors: | Adam B. Smith [aut, cre] |
Maintainer: | Adam B. Smith <[email protected]> |
License: | GPL (>=3) |
Version: | 1.0.5 |
Built: | 2025-01-03 05:13:35 UTC |
Source: | https://github.com/adamlilith/statisfactory |
This function performs the aligned rank transform (ART) on non-parametric data so that the result can be analyzed further with parametric techniques such as ANOVA.
art( x, response = names(x)[1], factors = names(x)[2:ncol(x)], subject = NULL, fun = function(x) mean(x, na.rm = TRUE), verbose = FALSE )
x |
Data frame. |
response |
Character. Names of column of |
factors |
Character list. Names of columns of |
subject |
|
fun |
Function. Function used to calculate cell centering statistic (the default is to use: |
verbose |
Logical. If TRUE then display progress. |
The function successfully re-creates rankings given by ARTool (Wobbrock et al. 2011) of data in Higgins et al. (1990) for data with 2 and 3 factors.
If response is ranks and the set of ranks in each cell is the same (e.g., each cell has ranks 1, 2, and 3, but not necessarily in that order), then all values will be equal across the different ART variables. This occurs because the center of each cell (e.g., the mean) is the same as the grand mean, so the aligned values are simply the residuals. An ANOVA on this data yields no variance across cells, so the F tests are invalid.
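The degenerate case described above can be checked with base R alone. The following is a minimal sketch on made-up data (the data frame ranks and its column names are assumptions for illustration, not part of the package): every cell holds the ranks 1, 2, and 3, so each cell mean equals the grand mean and every estimated effect is zero.

```r
# Hypothetical 2 x 2 design in which every cell contains the ranks 1, 2, 3
ranks <- data.frame(
  f1 = rep(c('up', 'down'), each = 6),
  f2 = rep(rep(c('high', 'low'), each = 3), times = 2),
  response = c(1, 2, 3, 3, 1, 2, 2, 3, 1, 1, 3, 2)
)

# cell centers (using the mean) and the grand mean
cellMeans <- tapply(ranks$response, list(ranks$f1, ranks$f2), mean)
grandMean <- mean(ranks$response)

# estimated main effect of f1: marginal means minus the grand mean
effectF1 <- tapply(ranks$response, ranks$f1, mean) - grandMean

# residuals: responses minus their own cell mean
residuals <- ranks$response - ave(ranks$response, ranks$f1, ranks$f2)

cellMeans
effectF1
```

Because every effect estimate is zero here, adding it back to the residuals changes nothing, which is why the aligned values coincide with the residuals for every effect.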
Data frame.
Higgins, J.J., Blair, R.C., and Tashtoush, S. 1990. The aligned rank transform procedure. Proceedings of the Conference on Applied Statistics in Agriculture. Manhattan, Kansas: Kansas State University, pp. 185-195. doi:10.4148/2475-7772.1443
Peterson, K. 2002. Six modifications of the aligned rank transform test for interaction. Journal of Modern Applied Statistical Methods 1:100-109. doi:10.22237/jmasm/1020255240
Wobbrock, J.O., Findlater, L., Gergle, D., and Higgins, J.J. 2011. The aligned rank transform for nonparametric factorial analysis using only ANOVA procedures. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2011). Vancouver, British Columbia (May 7-12, 2011). New York: ACM Press, pp. 143-146. doi:10.1145/1978942.1978963.
    x <- data.frame(
        subject=c('a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'),
        factor1=c('up', 'up', 'up', 'up', 'up', 'up', 'down', 'down', 'down', 'down', 'down', 'down'),
        factor2=c('high', 'med', 'low', 'high', 'med', 'low', 'high', 'med', 'low', 'high', 'med', 'low'),
        response=c(1, 17, 1, 1, 0, 4, 5, 6, 3, 7, 100, 70)
    )
    art(x=x, response='response', factors=c('factor1', 'factor2'))
This function back-transforms principal component scores to their original values.
backTransPCA(pca, x = NULL)
pca |
Object of class |
x |
Either |
Numeric vector.
    x <- data.frame(
        x1 = 1:20 + rnorm(20),
        x2 = 1:20 + rnorm(20, 0, 5),
        x3 = sample(20, 20)
    )
    pca1 <- prcomp(x, center=FALSE, scale=FALSE)
    pca2 <- prcomp(x, center=TRUE, scale=FALSE)
    pca3 <- prcomp(x, center=TRUE, scale=TRUE)
    backTransPCA(pca1)
    backTransPCA(pca2)
    backTransPCA(pca3)
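For readers who want to see what back-transformation involves, here is a minimal base-R sketch. It is not the package's implementation; the name backTransSketch is hypothetical, and it assumes a prcomp object that retains its scores (the default retx = TRUE). It undoes the rotation, then the scaling, then the centering.

```r
# Sketch: recover the original data from the scores of a prcomp object
backTransSketch <- function(pca) {
  # undo the rotation (the rotation matrix is orthonormal)
  out <- pca$x %*% t(pca$rotation)
  # prcomp stores FALSE when no scaling/centering was applied
  if (is.numeric(pca$scale)) out <- sweep(out, 2, pca$scale, `*`)
  if (is.numeric(pca$center)) out <- sweep(out, 2, pca$center, `+`)
  out
}

set.seed(1)
dat <- data.frame(
  x1 = 1:20 + rnorm(20),
  x2 = 1:20 + rnorm(20, 0, 5),
  x3 = sample(20, 20)
)
pcaScaled <- prcomp(dat, center = TRUE, scale. = TRUE)
recovered <- backTransSketch(pcaScaled)
head(recovered)
```

The order matters: prcomp centers before it scales, so the inverse multiplies by the scale first and adds the center last.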
This function calculates the number of objects formed by one or more adjacent cells that touch on their edges (i.e., not just at a corner). One (inefficient) way to solve this is to use an "ink-spreading" algorithm that accumulates adjacent cells until all are accounted for, then counts this as a single component. This function uses an efficient solution based on the Euler characteristic.
countConnected(x, count = 1)
x |
Matrix |
count |
Value to count as a "presence" in the matrix. All other values will be assumed to be not part of a component. |
Inspired by an answer by Alon Amit to the question on Quora, "What are some programming problems that look hard at a first glance but are actually easy?".
An integer (the number of connected, non-conterminous components).
    v <- c(
        1, 1, 0, 1,
        1, 1, 0, 0,
        1, 0, 0, 0,
        0, 0, 0, 1,
        0, 0, 1, 1,
        1, 0, 0, 0,
        0, 0, 0, 0)
    x <- matrix(v, ncol=4, byrow=TRUE)
    x
    countConnected(x)
    ## Not run: 
    # will break because of connection at a vertex
    v <- c(
        1, 1, 0, 1,
        1, 1, 0, 0,
        1, 0, 0, 0,
        0, 0, 0, 1,
        0, 0, 1, 1,
        1, 0, 0, 0,
        0, 1, 0, 0)
    x <- matrix(v, ncol=4, byrow=TRUE)
    x
    countConnected(x)
    ## End(Not run)
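The Euler-characteristic idea mentioned in the description can be sketched as follows. This is an illustration, not the package's code: the name eulerCount is hypothetical, and the sketch assumes that no component encloses a hole and that no two cells touch only at a corner. Under those assumptions each component is a topological disc with characteristic 1, so the number of components equals vertices minus edges plus faces, where faces are the "presence" cells and vertices and edges are the distinct corners and sides of those cells.

```r
# Sketch: count components as V - E + F on the grid of "presence" cells
eulerCount <- function(x, count = 1) {
  p <- x == count
  p[is.na(p)] <- FALSE
  nr <- nrow(p)
  nc <- ncol(p)
  # pad with a border of empty cells so every corner has four neighbors
  pad <- matrix(FALSE, nr + 2, nc + 2)
  pad[2:(nr + 1), 2:(nc + 1)] <- p
  r <- 1:(nr + 1)
  k <- 1:(nc + 1)
  faces <- sum(p)
  # a lattice corner is used if any of the four cells around it is present
  vertices <- sum(pad[r, k, drop = FALSE] | pad[r + 1, k, drop = FALSE] |
    pad[r, k + 1, drop = FALSE] | pad[r + 1, k + 1, drop = FALSE])
  # a horizontal side is used if the cell above or below it is present
  hEdges <- sum(pad[r, 2:(nc + 1), drop = FALSE] | pad[r + 1, 2:(nc + 1), drop = FALSE])
  # a vertical side is used if the cell left or right of it is present
  vEdges <- sum(pad[2:(nr + 1), k, drop = FALSE] | pad[2:(nr + 1), k + 1, drop = FALSE])
  vertices - (hEdges + vEdges) + faces
}

v <- c(
  1, 1, 0, 1,
  1, 1, 0, 0,
  1, 0, 0, 0,
  0, 0, 0, 1,
  0, 0, 1, 1,
  1, 0, 0, 0,
  0, 0, 0, 0)
m <- matrix(v, ncol = 4, byrow = TRUE)
eulerCount(m)
```

When two cells meet only at a corner they share a vertex, so the formula treats them as one piece. This is consistent with the warning in the example above that a connection at a vertex breaks the count.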
Euclidean distance in one or more dimensions.
euclid(a, b, na.rm = FALSE)
a |
Numeric vector. |
b |
Numeric vector of same length as |
na.rm |
Logical. If |
Numeric.
    euclid(0, 5)
    euclid(c(0, 0), c(1, 1))
    euclid(c(0, 0, 0), c(1, 1, 1))
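As a point of reference, the distance is the square root of the summed squared differences between corresponding elements. A one-line base-R sketch follows; the name euclidSketch is hypothetical, and dropping NA terms from the sum is an assumption about how na.rm is meant to behave.

```r
# Sketch: Euclidean distance between two numeric vectors of equal length
euclidSketch <- function(a, b, na.rm = FALSE) {
  sqrt(sum((a - b)^2, na.rm = na.rm))
}

euclidSketch(0, 5)
euclidSketch(c(0, 0), c(1, 1))
```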
Calculates the fuzzy Jaccard index. The "normal" Jaccard index is given by sum(A intersect B) / sum(A union B), where A and B are sets. Typically, A and B are binary outcomes, but the fuzzy version can accommodate values in [0, 1] and/or binary outcomes. The computationally efficient and equivalent method is sum(pmin(A, B)) / (sum(A) + sum(B) - sum(pmin(A, B))). If A and B are both binary, the outcome is the same as the "plain" Jaccard index.
fuzzyJaccard(a, b)
a , b
|
Vectors of binary and/or values in the range [0, 1]. The vectors must be of the same length. |
Numeric in the range [0, 1].
    a <- c(0.3, 0, 0.9, 0.5)
    b <- c(1, 1, 0, 0)
    fuzzyJaccard(a, b)
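The formula given in the description translates directly into base R. The sketch below uses a hypothetical name (jaccardSketch) and is meant only to show that, for binary inputs, the fuzzy form reduces to intersection over union.

```r
# Sketch of the formula in the description:
# sum(pmin(A, B)) / (sum(A) + sum(B) - sum(pmin(A, B)))
jaccardSketch <- function(a, b) {
  shared <- sum(pmin(a, b))
  shared / (sum(a) + sum(b) - shared)
}

# fuzzy membership values
fuzzy <- jaccardSketch(c(0.3, 0, 0.9, 0.5), c(1, 1, 0, 0))

# binary case: one shared element out of three in the union
binary <- jaccardSketch(c(1, 1, 0, 0), c(1, 0, 1, 0))

fuzzy
binary
```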
Geometric mean, with optional removal of NAs and propagation of zeros.
geoMean(x, prop0 = FALSE, na.rm = TRUE)
x |
Numeric list. |
prop0 |
Logical, if |
na.rm |
Logical, if |
Adapted from Paul McMurdie on StackOverflow.
Numeric.
    x <- seq(0.01, 1, by=0.01)
    mean(x)
    geoMean(x)
    x <- seq(0, 1, by=0.01)
    mean(x)
    geoMean(x)
    geoMean(x, prop0=TRUE)
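The geometric mean is usually computed on the log scale as exp(mean(log(x))). The sketch below illustrates this and why zeros need special handling: a single zero makes the log-scale mean -Inf and the result 0. The function name and the propagateZeros argument are hypothetical, and treating "no propagation" as "drop zeros before averaging" is an assumption for illustration; the package's prop0 may behave differently.

```r
# Sketch: geometric mean on the log scale
geoMeanSketch <- function(x, propagateZeros = FALSE, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  # assumption: when zeros are not propagated they are simply dropped
  if (!propagateZeros) x <- x[is.na(x) | x != 0]
  exp(mean(log(x)))
}

geoMeanSketch(c(1, 4, 16))
geoMeanSketch(c(0, 4, 16), propagateZeros = TRUE)
geoMeanSketch(c(0, 4, 16), propagateZeros = FALSE)
```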
Two-dimensional histogram
hist2d(x, breaks1 = "Sturges", breaks2 = "Sturges", right = TRUE, ...)
x |
Data frame or matrix with at least two columns. Only first two columns are used to tally frequencies. |
breaks1 |
One of the following describing how breaks for the first variable are calculated:
|
breaks2 |
Same as |
right |
Logical, if |
... |
Arguments to pass to |
Object of class matrix and histogram2d. Columns pertain to bins of x1 and rows to bins of x2. Column names and row names are mid-points of bins.
    x1 <- rnorm(1000)
    x2 <- 0.5 * x1 * rnorm(1000)
    x <- data.frame(x1=x1, x2=x2)
    hist2d(x)
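A two-dimensional tally of this kind can be sketched with hist(), cut(), and table() from base R. The name hist2dSketch is hypothetical and the code is an approximation of the behavior described above (columns for x1, rows for x2, bin mid-points as names), not the package's implementation.

```r
# Sketch: 2-D frequency table with bin mid-points as dimnames
hist2dSketch <- function(x, breaks1 = "Sturges", breaks2 = "Sturges", right = TRUE) {
  x1 <- x[, 1]
  x2 <- x[, 2]
  # let hist() choose the break points for each variable
  h1 <- hist(x1, breaks = breaks1, plot = FALSE)
  h2 <- hist(x2, breaks = breaks2, plot = FALSE)
  bin1 <- cut(x1, h1$breaks, right = right, include.lowest = TRUE)
  bin2 <- cut(x2, h2$breaks, right = right, include.lowest = TRUE)
  # rows are bins of x2, columns are bins of x1
  out <- unclass(table(bin2, bin1))
  dimnames(out) <- list(h2$mids, h1$mids)
  out
}

set.seed(123)
x1 <- rnorm(1000)
x2 <- 0.5 * x1 * rnorm(1000)
counts <- hist2dSketch(data.frame(x1 = x1, x2 = x2))
dim(counts)
```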
Histogram of number of values in overlapping bins.
histOverlap(x, breaks, right = TRUE, graph = TRUE, indices = FALSE)
x |
Numeric values. |
breaks |
One integer, three numeric values, or a matrix or data frame with at least two columns:
|
right |
Logical, if |
graph |
Logical, if |
indices |
Logical, if |
Matrix
    set.seed(123)
    x <- rnorm(1000)
    histOverlap(x, breaks=10, graph=TRUE)
    histOverlap(x, breaks=c(0, 1, 10), graph=TRUE)
    mat <- matrix(c(seq(0, 1, by=0.1), seq(0.3, 1.3, by=0.1)), ncol=2)
    histOverlap(x, breaks=mat, graph=TRUE)
    histOverlap(x, breaks=mat, indices=TRUE)
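To make the idea of overlapping bins concrete, here is a small base-R sketch that counts values in bins defined by a two-column matrix. The name countOverlap is hypothetical, and reading the first column as each bin's lower limit and the second as its upper limit is an assumption suggested by the matrix in the example above. Because bins overlap, a value can be counted more than once.

```r
# Sketch: tally values in possibly overlapping bins (one bin per row)
countOverlap <- function(x, bins, right = TRUE) {
  counts <- apply(bins, 1, function(b) {
    if (right) {
      sum(x > b[1] & x <= b[2], na.rm = TRUE)  # bins of the form (lower, upper]
    } else {
      sum(x >= b[1] & x < b[2], na.rm = TRUE)  # bins of the form [lower, upper)
    }
  })
  cbind(lower = bins[, 1], upper = bins[, 2], count = counts)
}

bins <- rbind(c(0, 2), c(1, 3), c(2, 4))
vals <- c(0.5, 1.5, 2.5, 3.5)
tally <- countOverlap(vals, bins)
tally
```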
This function is the inverse of logitAdj. That function calculates the logit of values but is robust to cases where the operand is 0 or 1. The adjusted inverse logit is equal to (base^x + epsilon * base^x - epsilon) / (base^x + 1).
invLogitAdj(x, epsilon = 0.01, base = 10, auto = FALSE)
x |
Numeric vector. |
epsilon |
Value or character. If a numeric value (typically ~0.01 or smaller), then this is added/subtracted from |
base |
Base of logarithm. Use |
auto |
If |
Numeric.
    x <- seq(0, 1, by=0.1)
    y <- logitAdj(x)
    xx <- invLogitAdj(y, auto = TRUE)
This function returns the logit value (log(x / (1 - x))) where a small value can be added to x to avoid problems of calculating the log when x equals 0 or 1.
logitAdj(x, epsilon = 0.01, base = 10)
x |
Numeric vector. |
epsilon |
Value to add/subtract from x to ensure log of 0 or 1 is not taken (usually a small number). If |
base |
Base of logarithm. |
Numeric equal to log((x + epsilon)/(1 - x + epsilon), base=base).
    set.seed(123)
    x <- seq(0, 1, by=0.01)
    logitAdj(x)
    logitAdj(x, 0.001)
    invLogitAdj(x, 0.001)
    invLogitAdj(x, auto = TRUE)
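The two formulas quoted in this section and the previous one are inverses of each other when the same epsilon and base are used. Writing r = base^y = (x + epsilon) / (1 - x + epsilon) and solving for x gives x = (r + epsilon * r - epsilon) / (r + 1). The sketch below checks this numerically with hypothetical function names (logitSketch, invLogitSketch); it does not cover the auto option.

```r
# Sketch of the adjusted logit and its inverse, as given in the text
logitSketch <- function(x, epsilon = 0.01, base = 10) {
  log((x + epsilon) / (1 - x + epsilon), base = base)
}

invLogitSketch <- function(y, epsilon = 0.01, base = 10) {
  r <- base^y
  (r + epsilon * r - epsilon) / (r + 1)
}

p <- seq(0, 1, by = 0.1)
y <- logitSketch(p)        # finite even at 0 and 1
pBack <- invLogitSketch(y) # recovers the original proportions
cbind(p, y, pBack)
```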
makeFormulae( formula, intercept = TRUE, interceptOnly = TRUE, linearOnly = TRUE, quad = TRUE, ia = TRUE, verboten = NULL, verbotenCombos = NULL, minTerms = NULL, maxTerms = NULL, returnFx = stats::as.formula, verbose = FALSE )
formula |
A |
intercept |
Logical: If |
interceptOnly |
Logical: If |
linearOnly |
Logical: If |
quad |
Logical: If |
ia |
Logical: If |
verboten |
Character vector of terms that should not appear in the models. Ignored if
|
Calculate the mode of numeric, character, or factor data
mmode(x, na.rm = FALSE)
x |
Numeric, character, or factor vector. |
na.rm |
Logical. If |
Numeric, character, or factor value.
    mmode(round(10 * rnorm(1000, 2)))
    mmode(c('a', 'b', 'b', 'b', 'c', 'd', 'd'))
Nagelkerke's / Cragg & Uhler's R2
nagelR2(likeNull, likeFull, n)
likeNull |
Likelihood (not log-likelihood) of the null model or an object of class |
likeFull |
Likelihood (not log-likelihood) of the "full" model or an object of class |
n |
Sample size. |
Numeric.
    # create data
    x <- 1:100
    y <- 2 + 1.7 * x + rnorm(100, 0, 30)
    # models
    nullModel <- lm(y ~ 1)
    fullModel <- lm(y ~ x)
    # plot
    plot(x, y)
    abline(nullModel, col='red')
    abline(fullModel, col='blue')
    legend('bottomright', legend=c('Null', 'Full'), lwd=1, col=c('red', 'blue'))
    # R2
    likeNull <- exp(as.numeric(logLik(nullModel)))
    likeFull <- exp(as.numeric(logLik(fullModel)))
    nagelR2(likeNull, likeFull, 100)
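For orientation, the standard definition of this statistic is the Cox & Snell R2, 1 - (L0 / L1)^(2 / n), divided by its maximum attainable value, 1 - L0^(2 / n), where L0 and L1 are the null and full likelihoods. The sketch below works from log-likelihoods, which avoids underflow when likelihoods are very small. The function name and its argument names are hypothetical, and the package may compute the value differently.

```r
# Sketch: Nagelkerke's R2 from log-likelihoods
nagelSketch <- function(logLikNull, logLikFull, n) {
  coxSnell <- 1 - exp((2 / n) * (logLikNull - logLikFull))
  maxCoxSnell <- 1 - exp((2 / n) * logLikNull)
  coxSnell / maxCoxSnell
}

# toy likelihoods: L0 = 0.25, L1 = 0.5, n = 2
r2 <- nagelSketch(log(0.25), log(0.5), n = 2)
r2
```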
This function is similar to pmax or pmin, except that it returns the element-wise sum of values. If the input is a matrix or data.frame, the output is the same as colSums.
psum(..., na.rm = FALSE)
... |
A set of vectors of the same length, a |
na.rm |
If |
Adapted from answer by Ben Bolker on StackOverflow.
A numeric vector.
    x1 <- 1:10
    x2 <- runif(10)
    psum(x1, x2)
    x <- cbind(x1, x2)
    psum(x)
    x2[3] <- NA
    psum(x1, x2)
    psum(x1, x2, na.rm=TRUE)
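For the vector case, an element-wise sum can be sketched by binding the vectors as columns and summing across each row. The name psumSketch is hypothetical; the sketch covers only vectors of equal length and makes no claim about how the package handles matrix or data frame input.

```r
# Sketch: element-wise sum of several equal-length vectors
psumSketch <- function(..., na.rm = FALSE) {
  unname(rowSums(cbind(...), na.rm = na.rm))
}

psumSketch(1:3, 4:6)
psumSketch(c(1, NA), c(2, 3))
psumSketch(c(1, NA), c(2, 3), na.rm = TRUE)
```

With na.rm = TRUE a missing value contributes nothing to its element's sum, in the same way that pmax and pmin skip missing values.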
This function ranks values in a data frame or matrix by more than one field, with ties in one field broken by subsequent fields.
rankMulti(x, cols = 1:ncol(x), ...)
x |
Data frame or matrix. |
cols |
Names or indices of columns by which to rank, with first one gaining preference over the second, second over the third, etc. |
... |
Arguments to pass to |
Numeric vector of ranks.
    x <- data.frame(x1=c('a', 'b', 'b', 'c', 'a', 'a'), x2=c(11, 2, 1, NA, 10, 11))
    rankMulti(x)
    rankMulti(x, c('x2', 'x1'))
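Ranking by several fields can be sketched with order(), which accepts multiple keys and uses later keys to break ties in earlier ones. The name rankSketch is hypothetical. In this sketch rows that tie on every field are ranked by their position, which may differ from how the package treats complete ties.

```r
# Sketch: rank rows by several columns, earlier columns taking precedence
rankSketch <- function(x, cols = seq_len(ncol(x))) {
  keys <- unname(as.list(as.data.frame(x)[cols]))
  ord <- do.call(order, keys)     # row indices in sorted order
  ranks <- integer(length(ord))
  ranks[ord] <- seq_along(ord)    # invert the ordering to obtain ranks
  ranks
}

dat <- data.frame(x1 = c('a', 'b', 'b', 'c', 'a', 'a'), x2 = c(11, 2, 1, NA, 10, 11))
byX1 <- rankSketch(dat)
byX1
```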
Calculate the root-mean-square deviation (sqrt(mean((x1 - x2)^2))). If non-constant weights w are supplied, then the calculation is sqrt(sum(w * (x1 - x2)^2) / sum(w)). Alternatively, w can be a function, in which case the returned value is equal to sqrt(mean(w((x1 - x2)^2))).
rmsd(x1, x2, w = NULL, na.rm = FALSE)
x1 |
Numeric vector, matrix, or data frame. |
x2 |
Numeric vector the same length as |
w |
Weights or a function defining weights. If |
na.rm |
Logical, if |
Numeric.
    set.seed(123)
    # numeric vectors
    x1 <- 1:20
    x2 <- 1:20 + rnorm(20)
    rmsd(x1, x2)
    x1[1] <- NA
    rmsd(x1, x2)
    rmsd(x1, x2, na.rm=TRUE)
    # matrices
    x1 <- matrix(1:20, ncol=5)
    x2 <- matrix(1:20 + rnorm(20), ncol=5)
    rmsd(x1, x2)
    x1[1, 1] <- NA
    rmsd(x1, x2)
    rmsd(x1, x2, na.rm=TRUE)
    # weights as values
    x1 <- matrix(1:20, ncol=5)
    x2 <- matrix(1:20 + rnorm(20, 0, 2), ncol=5)
    w <- matrix(1:20, ncol=5)
    rmsd(x1, x2)
    rmsd(x1, x2, w)
    # weights as a function
    x1 <- matrix(1:20, ncol=5)
    x2 <- matrix(20:1, ncol=5)
    w <- function(x) 1 - exp(-x)
    rmsd(x1, x2)
    rmsd(x1, x2, w)
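The three formulas in the description can be written out in a few lines of base R. The name rmsdSketch is hypothetical, and dropping pairs with a missing squared difference or weight when na.rm is TRUE is an assumption made for this sketch.

```r
# Sketch: unweighted, weighted, and function-weighted RMSD
rmsdSketch <- function(x1, x2, w = NULL, na.rm = FALSE) {
  sq <- (x1 - x2)^2
  if (is.function(w)) {
    sqrt(mean(w(sq), na.rm = na.rm))
  } else if (is.null(w)) {
    sqrt(mean(sq, na.rm = na.rm))
  } else {
    if (na.rm) {
      keep <- !is.na(sq) & !is.na(w)
      sq <- sq[keep]
      w <- w[keep]
    }
    sqrt(sum(w * sq) / sum(w))
  }
}

a <- c(1, 2, 3)
b <- c(1, 2, 5)
rmsdSketch(a, b)
rmsdSketch(a, b, w = c(1, 1, 2))
rmsdSketch(a, b, w = function(d) 1 - exp(-d))
```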
This function permutes values across two or more vectors or columns between two or more data frames or matrices. If vectors, then all values are swapped randomly and the output is a list object with vectors of the same length. If data frames or matrices, then values in selected columns are swapped across the data frames or matrices and the output is a list object with data frames or matrices of the same dimension as the originals.
sampleAcross(..., by = NULL, replace = FALSE)
... |
One or more vectors, data frames, or matrices (all objects must be the same class). |
by |
Character list or list of integers. Names of columns or column numbers to permute (only used if |
replace |
Logical. If |
A list object with the same number of elements as in ... and with the original dimensions. The order is the same as in ... (e.g., if the call is sampleAcross(a, b, c), then the output will be a list with permuted versions of a, b, and c, in that order).
    x1 <- 1:5
    x2 <- 6:10
    x3 <- 50:60
    sampleAcross(x1, x2, x3)
    sampleAcross(x1, x2, x3, replace=TRUE)
    a <- data.frame(x=1:10, y=letters[1:10])
    b <- data.frame(x=11:20, y=letters[11:20])
    sampleAcross(a, b, by='y')
    sampleAcross(a, b)
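For the vector case, the permutation can be sketched as pooling all values, shuffling the pool, and cutting it back into pieces of the original lengths. The name permuteAcross is hypothetical; the sketch assumes non-empty vectors and does not cover data frames or matrices.

```r
# Sketch: permute values across several non-empty vectors
permuteAcross <- function(..., replace = FALSE) {
  vecs <- list(...)
  pooled <- unlist(vecs)
  shuffled <- pooled[sample.int(length(pooled), replace = replace)]
  # cut the shuffled pool back into pieces of the original lengths
  ends <- cumsum(lengths(vecs))
  starts <- ends - lengths(vecs) + 1
  Map(function(s, e) shuffled[s:e], starts, ends)
}

set.seed(42)
v1 <- 1:5
v2 <- 6:10
v3 <- 50:60
swapped <- permuteAcross(v1, v2, v3)
swapped
```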
This function scrambles values of a given column of a data frame in a stratified manner with respect to one or more other "covariate" columns. The covariate columns can be specified, as well as the width of the range of each covariate around each focal value from which to sample candidates for swapping.
sampleStrat( x, col, w = function(x) stats::sd(x, na.rm = TRUE)/(max(x, na.rm = TRUE) - min(x, na.rm = TRUE)), d = 0.1, by = "all", permuteBy = TRUE )
x |
Data frame containing at least two columns, one with numeric values and at least one more with numeric or factor values. |
col |
Character or integer, name or number of column in |
w |
Function or numeric value >0, sets window size of non-factor covariates as a proportion of their range. If using a function it must work on a list of values. It can be helpful if this function accepts the argument |
d |
Numeric > 0, if no swappable value is found within |
by |
Character vector or integers. Name(s) or columns numbers of covariates by which to stratify the target column. Can also specify |
permuteBy |
Logical, if |
The script starts by randomly selecting a value v_i from the target column. It then finds the value of covariate c_j that is associated with v_i. Call the particular value of c_j associated with v_i c_j:i. If c_j is a continuous variable, it then finds all values c_{v} that fall within [c_j:i - w, c_j:i + w], where w is a proportion of the range of c_j.
The function then randomly selects a value v_k from those associated with this range of c_j and swaps v_i with this value. Depending on the random number generator, v_i can equal v_k and in fact be the same value. If no values of c_j other than the one associated with v_i are found within this range, then the window is expanded iteratively by a factor of w * (1 + d) until at least one more value that has yet to be swapped has been found. The procedure then finds a window around v_k as described above (or randomly selects a new v_i if v_i was v_k) and continues. If there is an odd number of values then the last value is kept as is (not scrambled).
If c_j is a categorical variable (a factor), then the script finds all values of v in the same factor level as v_i. Swaps of v occur within this level of c_j. However, if there are <2 values in the level (including the value associated with v_i), then the script looks to the next factor level. The "next" is taken to be the factor level with the least difference between v_i and the average of values of v associated with the potential "next" factor level. The "window" for a factor level is thus the level plus one or more levels with the closest average values of v, given that there is >1 value of v within this group that has yet to be swapped.
If there is more than one covariate, then these steps are repeated iteratively for each covariate (i.e., selecting values of v given the stratum identified in covariate c_j, then among these values those also in the stratum identified in covariate c_k, and so on). In this case the order in which the covariates are listed in by can affect the outcome. The order can be permuted for each value of v_i if permuteBy is TRUE.
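The windowed swapping described above can be sketched for the simplest case of one continuous covariate. This is a simplified illustration, not the package's algorithm: the name stratSwapSketch is hypothetical, a fresh focal value is drawn at random for every swap instead of continuing from the previous partner, a value is never swapped with itself, and the window is widened by multiplying its half-width by (1 + d) whenever it contains no unswapped candidate.

```r
# Sketch: swap values of v between rows whose covariate values are close
stratSwapSketch <- function(v, covar, w = 0.2, d = 0.1) {
  stopifnot(length(v) == length(covar), w > 0, d > 0)
  halfWidth <- w * diff(range(covar))  # window as a proportion of the range
  unswapped <- rep(TRUE, length(v))
  while (sum(unswapped) >= 2) {
    pool <- which(unswapped)
    i <- pool[sample.int(length(pool), 1)]  # focal value
    others <- setdiff(pool, i)
    win <- halfWidth
    repeat {
      cand <- others[abs(covar[others] - covar[i]) <= win]
      if (length(cand) > 0) break
      win <- win * (1 + d)  # widen the window until a candidate appears
    }
    k <- cand[sample.int(length(cand), 1)]  # swap partner
    tmp <- v[i]
    v[i] <- v[k]
    v[k] <- tmp
    unswapped[c(i, k)] <- FALSE
  }
  v  # with an odd number of values, the last one is left in place
}

set.seed(1)
target <- 1:20
covariate <- 1:20
scrambled <- stratSwapSketch(target, covariate, w = 0.1)
cor(scrambled, covariate)
```

Narrow windows keep swap partners close in covariate space, so the association between the target and the covariate is only partly broken; wider windows weaken it further.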
A data frame with one column swapped in a stratified manner relative another column or set of columns.
    # Example #1: Scramble column 1 with respect to columns 2 and 3.
    # Note in the output high values of "a" tend to be associated with
    # high values of "b" and low values of "c". This tendency decreases
    # as "w" increases.
    x <- data.frame(a=1:20, b=1:20, c=20:1, d=c(rep('a', 10), rep('b', 10)))
    x$d <- as.factor(x$d)
    x
    # scramble by all other columns
    sampleStrat(x=x, col=1, w=0.2, by='all', d=0.1)
    # scramble by column "d"
    sampleStrat(x=x, col=1, w=0.2, by='d', d=0.1)
    # Example #2: The target variable and covariate are equal
    # (perfectly collinear). How wide must the window (set by
    # argument "w") be to reduce the average correlation
    # between them to an arbitrary low level?
    df <- data.frame(a=1:100, b=1:100)
    cor(df) # perfect correlation
    corFrame <- data.frame()
    for (w in seq(0.1, 1, 0.1)) {
        for (countRep in 1:10) {
            df2 <- sampleStrat(x=df, col=1, w=w)
            corFrame <- rbind(corFrame, data.frame(w=w, cor=cor(df2)[1, 2]))
        }
    }
    boxplot(cor ~ w, data=corFrame, xlab='w', ylab='correlation coefficient')
Calculate the standard error of the mean.
se(x, na.rm = FALSE)
x |
Numeric vector. |
na.rm |
Logical. If TRUE then remove |
Numeric.
See also: sd (in package stats).
se(1:100)
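The standard error of the mean is the sample standard deviation divided by the square root of the number of observations. A minimal sketch follows; the name seSketch is hypothetical, and counting only non-missing values when na.rm is TRUE is an assumption.

```r
# Sketch: standard error of the mean
seSketch <- function(x, na.rm = FALSE) {
  n <- if (na.rm) sum(!is.na(x)) else length(x)
  stats::sd(x, na.rm = na.rm) / sqrt(n)
}

obs <- c(2, 4, 4, 4, 5, 5, 7, 9)
seSketch(obs)
seSketch(c(obs, NA), na.rm = TRUE)
```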