Getting started with functional statistical testing
Data simulation
The functions below simulate clustered trajectories of functional data, each associated with a spatial location.
Simulate a single trajectory
The functional data simulation process is described in [1, section 3.1].
simu_vec <- simul_traj(100)
plot(simu_vec, xlab = "point", ylab = "value")
Simulate trajectories from two samples diverging by a delta
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
str(simu_data)
#> List of 4
#> $ mat_sample1: num [1:100, 1:50] 10 10 9.99 9.99 9.99 ...
#> $ mat_sample2: num [1:100, 1:75] 0 0.00555 0.01109 0.0166 0.02208 ...
#> $ c_val : num 10
#> $ delta_shape: chr "constant"
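The str() output shows the layout used throughout this document: each sample is a numeric matrix with one row per time point and one column per trajectory. As a rough illustration, a toy stand-in with the same layout can be built in base R. The helper make_walks() and the object toy_data are hypothetical names, and the curves are plain Gaussian random walks, not the simulation process of [1].

```r
# Toy stand-in mimicking the layout returned by simul_data().
# Assumption: random walks replace the process described in [1].
set.seed(123)
n_point <- 100

# hypothetical helper: one random walk per column
make_walks <- function(n_obs) {
  noise <- matrix(rnorm(n_point * n_obs, sd = 0.1), nrow = n_point)
  apply(noise, 2, cumsum)
}

toy_data <- list(
  mat_sample1 = make_walks(50) + 10, # first sample, shifted by a constant
  mat_sample2 = make_walks(75),      # second sample, no shift
  c_val = 10,
  delta_shape = "constant"
)

# rows are time points, columns are trajectories
dim(toy_data$mat_sample1)
dim(toy_data$mat_sample2)

# draw all trajectories of both samples on a common scale
matplot(
  cbind(toy_data$mat_sample1, toy_data$mat_sample2),
  type = "l", lty = 1,
  col = rep(c("steelblue", "tomato"), times = c(50, 75)),
  xlab = "point", ylab = "value"
)
```

The later statistics all take two such matrices, which must share the same number of rows.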
Graphical representation of simulated spatial clustered data
# constant delta
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5,
delta_shape = "constant", distrib = "normal"
)
plot_simu(simu_data)
# linear delta
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5,
delta_shape = "linear", distrib = "normal"
)
plot_simu(simu_data)
# quadratic delta
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 5,
delta_shape = "quadratic", distrib = "normal"
)
plot_simu(simu_data)
Statistics
MO median statistic
The \(MO\) median statistic [1] is implemented in the stat_mo() function.
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_mo(MatX, MatY)
#> [1] 0.9989697
MED median statistic
The \(MED\) median statistic [1] is implemented in the stat_med() function.
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_med(MatX, MatY)
#> [1] 0.9988727
WMW statistic
The Wilcoxon-Mann-Whitney statistic [2] (noted \(WMW\) in [1]) is implemented in the stat_wmw() function.
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_wmw(MatX, MatY)
#> [1] 0.9985697
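To make the quantity more concrete, here is a minimal base R sketch of a spatial-sign statistic in the spirit of [2]. Each difference between a trajectory of the second sample and a trajectory of the first is scaled to unit norm, these directions are averaged over all pairs, and the norm of the average is returned. The helper wmw_sketch(), the toy random-walk data and the plain Euclidean norm on the discretization grid are assumptions made for illustration. stat_wmw() may differ in details such as the sign convention or the treatment of null differences.

```r
# Hypothetical helper, not part of the package: spatial-sign
# two-sample statistic on discretized curves.
# Rows are time points, columns are observations.
wmw_sketch <- function(MatX, MatY) {
  n_x <- ncol(MatX)
  n_y <- ncol(MatY)
  sign_sum <- numeric(nrow(MatX))
  for (i in seq_len(n_x)) {
    # differences between every Y curve and the i-th X curve
    diffs <- MatY - MatX[, i]
    # Euclidean norm of each difference (one value per column)
    norms <- sqrt(colSums(diffs^2))
    keep <- norms > 0
    # scale each kept column to unit norm and accumulate
    unit_dirs <- t(t(diffs[, keep, drop = FALSE]) / norms[keep])
    sign_sum <- sign_sum + rowSums(unit_dirs)
  }
  # norm of the average direction over all pairs
  sqrt(sum((sign_sum / (n_x * n_y))^2))
}

# toy data (assumption): random walks, first sample shifted by 10
set.seed(1)
MatX <- apply(matrix(rnorm(100 * 50, sd = 0.1), nrow = 100), 2, cumsum) + 10
MatY <- apply(matrix(rnorm(100 * 75, sd = 0.1), nrow = 100), 2, cumsum)
res_wmw <- wmw_sketch(MatX, MatY)
res_wmw
```

Because it is the norm of an average of unit vectors, this quantity lies between 0 and 1. Values near 1 indicate that nearly all pairwise differences point in the same direction, which is what a large constant shift between the samples produces.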
HKR statistics
The Horváth-Kokoszka-Reeder statistics [3] (noted \(HKR1\) and \(HKR2\) in [1]) are implemented in the stat_hkr() function.
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_hkr(MatX, MatY)
#> $T1
#> [1] 1.310621e+13
#>
#> $T2
#> [1] 293499
#>
#> $eigenval
#> [1] 3.669029e+01 3.995394e+00 1.592520e+00 5.617462e-01 3.323612e-01
#> [6] 1.369763e-01 8.771022e-02 6.388348e-02 2.889530e-02 1.551277e-02
#> [11] 7.677951e-03 6.838997e-03 2.466240e-03 1.652934e-03 1.114547e-03
#> [16] 3.785597e-04 1.305596e-04 9.480272e-06 1.318984e-11
CFF statistic
The Cuevas-Febrero-Fraiman statistic [4] (noted \(CFF\) in [1]) is implemented in the stat_cff() function.
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
stat_cff(MatX, MatY)
#> [1] 509471.6
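For two samples, the ANOVA-type statistic of [4] reduces to the size of the first sample times the squared norm of the difference between the two mean curves. The sketch below computes that quantity in base R. The helper cff_sketch() is hypothetical, and the norm is taken as a plain sum of squares over the grid with no integration weight, which is an assumption. stat_cff() may use a different normalization.

```r
# Hypothetical helper, not part of the package.
# Assumption: squared norm = unweighted sum of squares over the grid.
cff_sketch <- function(MatX, MatY) {
  # pointwise difference between the two mean curves
  mean_diff <- rowMeans(MatX) - rowMeans(MatY)
  # first sample size times the squared norm of that difference
  ncol(MatX) * sum(mean_diff^2)
}

# noise-free check: 50 curves equal to 10 versus 75 curves equal to 0,
# on 100 points, gives 50 * (100 * 10^2) = 500000
res_cff <- cff_sketch(matrix(10, nrow = 100, ncol = 50),
                      matrix(0, nrow = 100, ncol = 75))
res_cff
```

Unlike a statistic built from unit-norm directions, this one is unbounded and grows with both the sample size and the squared size of the shift.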
Compute multiple statistics
The stat_val() function computes several of the statistics defined above in a single call on the same data.
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
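MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
# the `stat` argument name and its values are assumed here; see ?stat_val
res <- stat_val(MatX, MatY, stat = c("mo", "med", "wmw", "hkr", "cff"))
res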
Permutation-based computation of p-values
simu_data <- simul_data(
n_point = 100, n_obs1 = 50, n_obs2 = 75, c_val = 10,
delta_shape = "constant", distrib = "normal"
)
MatX <- simu_data$mat_sample1
MatY <- simu_data$mat_sample2
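The general principle of a permutation p-value can be sketched in base R, independently of the package. The columns of both samples are pooled, the sample labels are reshuffled n_perm times, the statistic is recomputed on each reshuffled split, and the p-value is the proportion of permuted values at least as large as the observed one. The names perm_pval_sketch(), stat_fun and mean_diff_stat() are hypothetical, and the "+1" correction in the p-value is one common convention, not necessarily the one used by the package.

```r
# Hypothetical helper, not part of the package.
# stat_fun: any function of two matrices returning a single number,
# with large values indicating a difference between samples.
perm_pval_sketch <- function(MatX, MatY, stat_fun, n_perm = 200) {
  n_x <- ncol(MatX)
  pooled <- cbind(MatX, MatY)
  n_tot <- ncol(pooled)
  observed <- stat_fun(MatX, MatY)
  permuted <- replicate(n_perm, {
    # random reassignment of the columns to the two samples
    idx <- sample.int(n_tot)
    stat_fun(
      pooled[, idx[seq_len(n_x)], drop = FALSE],
      pooled[, idx[-seq_len(n_x)], drop = FALSE]
    )
  })
  # assumed convention: count the observed split among the permutations
  (1 + sum(permuted >= observed)) / (1 + n_perm)
}

# toy statistic (assumption): squared distance between mean curves
mean_diff_stat <- function(MatX, MatY) {
  sum((rowMeans(MatX) - rowMeans(MatY))^2)
}

# toy data (assumption): Gaussian noise, first sample shifted by 1
set.seed(42)
MatX <- matrix(rnorm(20 * 10), nrow = 20) + 1
MatY <- matrix(rnorm(20 * 12), nrow = 20)
p_shift <- perm_pval_sketch(MatX, MatY, mean_diff_stat, n_perm = 199)
p_shift
```

With this convention the smallest attainable p-value is 1 / (1 + n_perm), so n_perm sets the resolution of the test.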
References
[1] Smida, Z., Cucala, L., Gannoun, A., & Durif, G. (2022). A median test for functional data. Journal of Nonparametric Statistics, 34(2), 520–553. <DOI:10.1080/10485252.2022.2064997>
[2] Chakraborty, A., & Chaudhuri, P. (2015). A Wilcoxon–Mann–Whitney-type test for infinite-dimensional data. Biometrika, 102(1), 239–246. <DOI:10.1093/biomet/asu072>
[3] Horváth, L., Kokoszka, P., & Reeder, R. (2013). Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 75(1), 103–122.
[4] Cuevas, A., Febrero, M., & Fraiman, R. (2004). An anova test for functional data. Computational Statistics & Data Analysis, 47(1), 111–122. <DOI:10.1016/j.csda.2003.10.021>