Title: | A Bias Bound Approach to Non-Parametric Inference |
---|---|
Description: | A novel bias-bound approach for non-parametric inference is introduced, focusing on both density and conditional expectation estimation. It constructs valid confidence intervals that account for the presence of a non-negligible bias and thus make it possible to perform inference with optimal mean squared error minimizing bandwidths. This package is based on Schennach (2020) <doi:10.1093/restud/rdz065>. |
Authors: | Xinyu DAI [aut, cre], Susanne M Schennach [aut] |
Maintainer: | Xinyu DAI <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2025-02-17 02:43:53 UTC |
Source: | https://github.com/cran/rbbnp |
Estimates the conditional expectation of Y given X at a given point or across a range, and provides visualization options for the conditional mean, bias, and confidence intervals.
biasBound_condExpectation( Y, X, x = NULL, h = 0.09, alpha = 0.05, est_Ar = NULL, resol = 100, xi_lb = NULL, xi_ub = NULL, methods_get_xi = "Schennach", if_plot_ft = FALSE, ora_Ar = NULL, if_plot_conditional_mean = TRUE, kernel.fun = "Schennach2004", if_approx_kernel = TRUE, kernel.resol = 1000 )
Y |
A numerical vector of sample data. |
X |
A numerical vector of sample data. |
x |
Optional. A scalar or range of points where the conditional expectation is estimated. If NULL, a range is automatically generated. |
h |
A scalar bandwidth parameter. |
alpha |
Significance level for the confidence intervals; the intervals have nominal coverage 1 - alpha. Default is 0.05. |
est_Ar |
Optional list of estimates for A and r. If NULL, they are computed using |
resol |
Resolution for the estimation range. Default is 100. |
xi_lb |
Optional. Lower bound for the interval of the Fourier transform frequency xi. Used to determine the range over which A and r are estimated. If NULL, it is determined automatically according to methods_get_xi. |
xi_ub |
Optional. Upper bound for the interval of the Fourier transform frequency xi. Like xi_lb, it defines the upper end of the range used for A and r estimation. If NULL, it is determined automatically according to methods_get_xi. |
methods_get_xi |
A string specifying the method used to determine the xi interval automatically when xi_lb and xi_ub are NULL. Options are "Schennach" and "Schennach_loose". With "Schennach", the range is selected based on Theorem 2 in Schennach (2020); with "Schennach_loose", it is the initial interval given in Theorem 2, without selecting xi_n. |
if_plot_ft |
Logical. If TRUE, plots the Fourier transform. |
ora_Ar |
Optional list of oracle values for A and r. |
if_plot_conditional_mean |
Logical. If TRUE, plots the conditional mean estimation. |
kernel.fun |
A string specifying the kernel function to be used. Options are "Schennach2004", "sinc", "normal", "epanechnikov". |
if_approx_kernel |
Logical. If TRUE, uses approximations for the kernel function. |
kernel.resol |
The resolution for kernel function approximation. See |
A list containing various outputs including estimated values, plots, and intervals.
# Example 1: point estimation of conditional expectation of Y on X
biasBound_condExpectation(
  Y = sample_data$Y,
  X = sample_data$X,
  x = 1,
  h = 0.09,
  kernel.fun = "Schennach2004"
)

# Example 2: conditional expectation of Y on X with manually selected range of xi
# biasBound_condExpectation(
#   Y = sample_data$Y,
#   X = sample_data$X,
#   h = 0.09,
#   xi_lb = 1,
#   xi_ub = 12,
#   kernel.fun = "Schennach2004"
# )
Estimates the density at a given point or across a range, and provides visualization options for density, bias, and confidence intervals.
biasBound_density( X, x = NULL, h = 0.09, alpha = 0.05, resol = 100, xi_lb = NULL, xi_ub = NULL, methods_get_xi = "Schennach", if_plot_density = TRUE, if_plot_ft = FALSE, ora_Ar = NULL, kernel.fun = "Schennach2004", if_approx_kernel = TRUE, kernel.resol = 1000 )
X |
A numerical vector of sample data. |
x |
Optional. A scalar or range of points where the density is estimated. If NULL, a range is automatically generated. |
h |
A scalar bandwidth parameter. |
alpha |
Significance level for the confidence intervals; the intervals have nominal coverage 1 - alpha. Default is 0.05. |
resol |
Resolution for the estimation range. Default is 100. |
xi_lb |
Optional. Lower bound for the interval of the Fourier transform frequency xi. Used to determine the range over which A and r are estimated. If NULL, it is determined automatically according to methods_get_xi. |
xi_ub |
Optional. Upper bound for the interval of the Fourier transform frequency xi. Like xi_lb, it defines the upper end of the range used for A and r estimation. If NULL, it is determined automatically according to methods_get_xi. |
methods_get_xi |
A string specifying the method used to determine the xi interval automatically when xi_lb and xi_ub are NULL. Options are "Schennach" and "Schennach_loose". With "Schennach", the range is selected based on Theorem 2 in Schennach (2020); with "Schennach_loose", it is the initial interval given in Theorem 2, without selecting xi_n. |
if_plot_density |
Logical. If TRUE, plots the density estimation. |
if_plot_ft |
Logical. If TRUE, plots the Fourier transform. |
ora_Ar |
Optional list of oracle values for A and r. |
kernel.fun |
A string specifying the kernel function to be used. Options are "Schennach2004", "sinc", "normal", "epanechnikov". |
if_approx_kernel |
Logical. If TRUE, uses approximations for the kernel function. |
kernel.resol |
The resolution for kernel function approximation. See |
A list containing various outputs including estimated values, plots, and intervals.
# Example 1: Specifying x for point estimation with manually selected xi range from 1 to 12
biasBound_density(
  X = sample_data$X,
  x = 1,
  h = 0.09,
  xi_lb = 1,
  xi_ub = 12,
  if_plot_ft = TRUE,
  kernel.fun = "Schennach2004"
)

# Example 2: Density estimation with manually selected xi range from 1 to 12 (via xi_lb and xi_ub)
# biasBound_density(
#   X = sample_data$X,
#   h = 0.09,
#   xi_lb = 1,
#   xi_ub = 12,
#   if_plot_ft = FALSE,
#   kernel.fun = "Schennach2004"
# )

# Example 3: Density estimation with automatically selected xi range via Theorem 2 in Schennach (2020)
# biasBound_density(
#   X = sample_data$X,
#   h = 0.09,
#   methods_get_xi = "Schennach",
#   if_plot_ft = TRUE,
#   kernel.fun = "Schennach2004"
# )
This variable provides the path to the data folder within the package.
The path to the package's internal data folder as a character string.
Epanechnikov Kernel
epanechnikov_kernel(u)
u |
A numerical value or vector representing the input to the kernel function. |
Returns the value of the Epanechnikov kernel function at the given input.
Fourier Transform Epanechnikov Kernel
epanechnikov_kernel_ft(xi)
xi |
A numerical value or vector representing the frequency domain. |
Returns the value of the Fourier transform of the Epanechnikov kernel at the given frequency/frequencies.
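As a rough illustration of the kernel and transform pair described above, the following base R sketch uses the textbook Epanechnikov kernel, 0.75 * (1 - u^2) on [-1, 1]. The names epan and epan_ft are hypothetical, and the package's own functions may use a different scaling or normalization.

```r
# Textbook Epanechnikov kernel (assumed form; the package may scale it differently).
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Closed-form Fourier transform of that kernel.
# At xi = 0 the expression is 0/0, so the limiting value 1 is used instead.
epan_ft <- function(xi) {
  out <- rep(1, length(xi))
  nz <- xi != 0
  out[nz] <- 3 * (sin(xi[nz]) - xi[nz] * cos(xi[nz])) / xi[nz]^3
  out
}

# Numerical cross-check: the kernel is even, so its transform is a cosine integral.
xi0 <- 2.5
num_ft <- integrate(function(u) epan(u) * cos(xi0 * u), -1, 1)$value
total_mass <- integrate(epan, -1, 1)$value
```

Comparing num_ft with epan_ft(xi0) is a quick way to confirm that a kernel and its claimed transform agree.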
This variable provides the path to the extdata folder within the package, where non-standard R data files are stored.
The path to the package's external data folder (for non-standard R data files) as a character string.
This function provides a lookup-based approximation for calculations that are computationally intensive. Once computed, it stores the results in an environment and uses linear interpolation for new data points to speed up subsequent computations.
fun_approx(u, u_lb = -100, u_ub = 100, resol = 1000, fun = W_kernel)
u |
A vector of values where the function should be evaluated. |
u_lb |
Lower bound for the precomputed range. Defaults to -100. |
u_ub |
Upper bound for the precomputed range. Defaults to 100. |
resol |
The resolution or number of sample points in the precomputed range. Defaults to 1000. |
fun |
A function for which the approximation is computed. Defaults to the W_kernel function. |
The fun_approx function works by initially creating a lookup table of function values based on the range specified by u_lb and u_ub and the resolution resol. This precomputation only happens once for a given set of parameters (u_lb, u_ub, resol, and fun). Subsequent calls to fun_approx with the same parameters use the lookup table to find the closest precomputed points to the requested u values and then return an interpolated result.
Linear interpolation is used between the two closest precomputed points in the lookup table. This ensures a smooth approximation for values in between sample points.
This function is especially useful for computationally intensive functions where recalculating function values is expensive or time-consuming. By using a combination of precomputation and interpolation, fun_approx provides a balance between accuracy and speed.
A vector of approximated function values corresponding to u.
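The caching and interpolation idea described above can be sketched in base R as follows. This is only an illustration under stated assumptions: lookup_cache, lookup_approx, and slow_fun are hypothetical names, the cache key is built from the deparsed function, and the package's internal implementation may differ.

```r
# Environment that stores one lookup table per parameter combination.
lookup_cache <- new.env()

lookup_approx <- function(u, u_lb = -100, u_ub = 100, resol = 1000, fun) {
  # Key the table on the grid parameters and the function definition.
  key <- paste(u_lb, u_ub, resol, paste(deparse(fun), collapse = ""), sep = "|")
  if (!exists(key, envir = lookup_cache, inherits = FALSE)) {
    grid <- seq(u_lb, u_ub, length.out = resol)
    # The expensive evaluation happens only here, once per key.
    assign(key, list(x = grid, y = fun(grid)), envir = lookup_cache)
  }
  tab <- get(key, envir = lookup_cache, inherits = FALSE)
  # Linear interpolation between the two nearest grid points;
  # values of u outside [u_lb, u_ub] come back as NA.
  approx(tab$x, tab$y, xout = u)$y
}

slow_fun <- function(x) sin(x) / (1 + x^2)  # stand-in for a costly function
u <- c(-1.234, 0.5, 3.21)
first  <- lookup_approx(u, -5, 5, 1001, slow_fun)  # builds the table
second <- lookup_approx(u, -5, 5, 1001, slow_fun)  # reuses the table
```

The second call skips the evaluation of slow_fun on the grid entirely, which is where the speed-up comes from when the function is expensive.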
This function generates sample data for experiments.
gen_sample_data(size, dgp, seed = NULL)
size |
The sample size. |
dgp |
The data generating process. Options are "normal", "chisq", "mixed", "poly", and "2_fold_uniform". |
seed |
The random seed. |
A numeric vector of length size. The elements of the vector are generated according to the specified dgp:
normal |
Normally distributed values with mean 0 and standard deviation 2. |
chisq |
Chi-squared distributed values with df = 10. |
mixed |
Half normally distributed (mean 0, sd = 2) and half chi-squared distributed (df = 10) values. |
poly |
Values from a polynomial cumulative distribution function on [0,1]. |
2_fold_uniform |
Sum of two uniformly distributed random numbers. |
Computes the point estimate using the specified kernel function.
get_avg_f1x(X, x, h, inf_k)
X |
A numerical vector of sample data. |
x |
A scalar representing the point where the density is estimated. |
h |
A scalar bandwidth parameter. |
inf_k |
Kernel function used for the computation. |
A scalar representing the kernel density estimate at point x.
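For orientation, a kernel density estimate at a single point is the average of rescaled kernel weights. The sketch below assumes the standard estimator and uses a Gaussian kernel by default; kde_point is a hypothetical name, and the package's get_avg_f1x may differ in its details.

```r
# Standard kernel density estimate at one point x:
# average of K((x - X_i) / h), divided by the bandwidth h.
kde_point <- function(X, x, h, kernel = dnorm) {
  mean(kernel((x - X) / h)) / h
}

set.seed(1)
X <- rnorm(5000)
est <- kde_point(X, x = 0, h = 0.3)

# With a Gaussian kernel the estimator targets the N(0, 1 + h^2) density,
# which makes the smoothing bias at x = 0 explicit.
smoothed_truth <- dnorm(0, sd = sqrt(1 + 0.3^2))
```

The gap between smoothed_truth and dnorm(0) is the kind of bias that the bias-bound approach is designed to account for.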
Computes the point estimate using the specified kernel function.
get_avg_fyx(Y, X, x, h, inf_k)
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
x |
A scalar representing the point where the density is estimated. |
h |
A scalar bandwidth parameter. |
inf_k |
Kernel function used for the computation. |
A scalar representing the kernel density estimate at point x.
Compute Sample Average of Fourier Transform Magnitude
get_avg_phi(Y = 1, X, xi)
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
xi |
A single numerical value representing the frequency at which the Fourier transform is computed. |
Returns the sample estimate of the expected Fourier transform at frequency xi.
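One common way to estimate such a quantity is the modulus of the sample average of Y * exp(i * xi * X). The sketch below assumes that form; avg_phi is a hypothetical name, and the sign convention or normalization used by get_avg_phi may differ.

```r
# Modulus of the empirical Fourier transform at a single frequency xi.
# With Y = 1 this is the modulus of the empirical characteristic function of X.
avg_phi <- function(X, xi, Y = 1) {
  Mod(mean(Y * exp(1i * xi * X)))
}

set.seed(1)
X <- rnorm(10000)
phi_hat <- avg_phi(X, xi = 1)

# For standard normal data the characteristic function is exp(-xi^2 / 2).
phi_true <- exp(-0.5)
```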
Compute the log of the modulus of the sample average of the Fourier transform
get_avg_phi_log(Y = 1, X, ln_xi)
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
ln_xi |
A single numerical value representing the log frequency at which the Fourier transform is computed. |
Returns the log of the sample estimate of the expected Fourier transform at frequency xi.
Get the conditional variance of Y given X at a given x
get_conditional_var(X, Y, x, h, kernel_func)
X |
A numerical vector representing the sample data of variable X. |
Y |
A numerical vector representing the sample data of variable Y. |
x |
The specific point at which the conditional variance is to be calculated. |
h |
A bandwidth parameter used in the kernel function for smoothing. |
kernel_func |
A kernel function used to weigh observations in the neighborhood of point x. |
Returns a scalar representing the estimated conditional variance of Y given X at the point x.
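A conditional variance can be estimated from two kernel regressions, one for Y and one for Y^2. The sketch below assumes a Nadaraya-Watson estimator with a Gaussian kernel; nw_mean and cond_var are hypothetical names, the uniform components of X are assumed to be U(0, 1), and the package's own functions may be implemented differently.

```r
# Nadaraya-Watson estimate of E[Y | X = x]: a kernel-weighted average of Y.
nw_mean <- function(X, Y, x, h, kernel = dnorm) {
  w <- kernel((x - X) / h)
  sum(w * Y) / sum(w)
}

# Conditional variance as E[Y^2 | X = x] - (E[Y | X = x])^2.
cond_var <- function(X, Y, x, h, kernel = dnorm) {
  m1 <- nw_mean(X, Y, x, h, kernel)
  m2 <- nw_mean(X, Y^2, x, h, kernel)
  m2 - m1^2
}

# Data in the spirit of sample_data: X is a sum of two uniforms (assumed U(0, 1)),
# and Y = -X^2 + 3*X + noise * X, so Var(Y | X = x) = x^2.
set.seed(2)
n <- 20000
X <- runif(n) + runif(n)
Y <- -X^2 + 3 * X + rnorm(n) * X
m_hat <- nw_mean(X, Y, x = 1, h = 0.1)
v_hat <- cond_var(X, Y, x = 1, h = 0.1)
```

At x = 1 the true conditional mean is 2 and the true conditional variance is 1, so both estimates should land close to those values, up to smoothing bias and sampling noise.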
This function estimates the parameters A and r by optimizing an objective function over a specified range of frequency values and r values.
get_est_Ar(Y = 1, X, xi_interval, r_stepsize = 150)
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
xi_interval |
A list with elements xi_lb and xi_ub, the lower and upper bounds of the xi interval. |
r_stepsize |
An integer value representing the number of steps in the r range. This controls the granularity of the estimation. Higher values lead to finer granularity but increase computation time. |
The function internally defines a range for the natural logarithm of frequency values (ln_xi_range) and a range for the parameter r (r_range). It then defines an optimization function optim_ln_A to minimize the integral of a given function over the ln_xi_range. The actual estimation is done by finding the r and A values that minimize the area under the line, subject to the constraint that the line does not go below the Fourier transform curve.
A named vector with elements est_A and est_r representing the estimated values of A and r, respectively.
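The constrained fit described above can be illustrated on a log-log scale, where a bound of the form A * xi^(-r) becomes the straight line ln(A) - r * ln(xi). The sketch below is one possible reading of that description, not the package's code: fit_envelope is a hypothetical name, r is searched on a plain grid, and for each r the smallest intercept that keeps the line above the curve is used.

```r
# Fit the line ln(A) - r * t (with t = ln(xi)) that lies on or above the
# curve ln_phi and has the smallest area under it over the range of t.
fit_envelope <- function(ln_xi, ln_phi, r_grid) {
  a <- min(ln_xi)
  b <- max(ln_xi)
  best <- NULL
  for (r in r_grid) {
    # Smallest intercept such that the line is never below the curve.
    ln_A <- max(ln_phi + r * ln_xi)
    # Exact integral of ln_A - r * t over [a, b].
    area <- ln_A * (b - a) - r * (b^2 - a^2) / 2
    if (is.null(best) || area < best$area) {
      best <- list(ln_A = ln_A, r = r, area = area)
    }
  }
  c(est_A = exp(best$ln_A), est_r = best$r)
}

# Toy curve that is exactly 2 * xi^(-3), so the fit should recover A = 2, r = 3.
ln_xi  <- seq(log(1), log(20), length.out = 200)
ln_phi <- log(2) - 3 * ln_xi
fit <- fit_envelope(ln_xi, ln_phi, r_grid = seq(0, 6, length.out = 151))
```

In this toy case the envelope coincides with the curve itself; with an estimated, noisy Fourier transform the fitted line would instead touch the curve at a few points and sit above it elsewhere.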
Get the estimate of B
get_est_B(Y)
Y |
A numerical vector representing the sample data of variable Y. |
The mean of the absolute values of the elements in Y, representing the estimated value of B.
Computes the bias estimate for given parameters.
get_est_b1x(X, ...)
X |
A numerical vector representing the sample data of variable X. |
... |
Additional arguments passed to other methods. |
A scalar representing the bias b1x estimate.
Estimation of bias byx
get_est_byx(Y, X, ...)
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
... |
Additional arguments passed to other methods. |
A scalar representing the bias byx estimate.
Get the estimate of Vy
get_est_vy(Y)
Y |
A numerical vector representing the sample data of variable Y. |
Computes the sigma estimate for given parameters.
get_sigma(X, x, h, inf_k)
X |
A numerical vector of sample data. |
x |
A scalar representing the point where the density is estimated. |
h |
A scalar bandwidth parameter. |
inf_k |
Kernel function used for the computation. |
A scalar representing the sigma estimate at point x.
Estimation of sigma_yx
get_sigma_yx(Y, X, x, h, inf_k)
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
x |
The specific point at which sigma_yx is to be estimated. |
h |
A bandwidth parameter used in the kernel function for smoothing. |
inf_k |
A kernel function used to weigh observations in the neighborhood of point x. |
Returns a scalar representing the estimated value of sigma_yx at the point x.
Get the xi interval
get_xi_interval(Y = 1, X, methods = "Schennach")
Y |
A numerical vector representing the sample data of variable Y. |
X |
A numerical vector representing the sample data of variable X. |
methods |
A character string indicating the method to use for calculating the xi interval. Supported methods are "Schennach" and "Schennach_loose". Defaults to "Schennach". |
The "Schennach" method computes the xi interval by performing a test based on Schennach's theorem, adjusting the upper bound xi_ub if the test condition is met.
The "Schennach_loose" method provides a looser calculation of the xi interval without performing Schennach's test.
A list containing the lower (xi_lb) and upper (xi_ub) bounds of the xi interval.
Kernel Regression function
kernel_reg(X, Y, x, h, kernel_func)
X |
A numerical vector representing the sample data of variable X. |
Y |
A numerical vector representing the sample data of variable Y. |
x |
The point at which the regression function is to be estimated. |
h |
A bandwidth parameter that determines the weight assigned to each observation in X. |
kernel_func |
A function that computes the weight of each observation based on its distance to x. |
Returns a scalar representing the estimated value of the regression function at the point x.
Normal Kernel Function
normal_kernel(u)
u |
A numerical value or vector representing the input to the kernel function. |
Returns the value of the Normal kernel function at the given input.
Fourier Transform of Normal Kernel
normal_kernel_ft(xi)
xi |
A numerical value or vector representing the frequency domain. |
Returns the value of the Fourier transform of the Normal kernel at the given frequency/frequencies.
Plot the Fourier Transform of the
plot_ft(X, xi_interval, ft_plot.resol = 500)
X |
A numerical vector of sample data. |
xi_interval |
A list containing the lower (xi_lb) and upper (xi_ub) bounds of the xi interval. |
ft_plot.resol |
An integer representing the resolution of the plot, specifically the number of points used to represent the Fourier transform. Defaults to 500. |
C = 1 is a parameter of the procedure; see Schennach (2020) doi:10.1093/restud/rdz065 for details.
A ggplot object representing the plot of the Fourier transform.
plot_ft( sample_data$X, xi_interval = list(xi_lb = 1, xi_ub = 50), ft_plot.resol = 1000 )
Generate n samples from the distribution
rpoly01(n, k = 5)
n |
The number of samples to generate. |
k |
The exponent in the distribution function, defaults to 5. |
A vector of n samples from the specified polynomial distribution.
CDF: f(x) = (x-1)^k + 1
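Given this CDF, samples can be drawn by inverse transform sampling. The sketch below assumes k is odd, so that (x-1)^k + 1 increases from 0 to 1 on [0,1]; rpoly_sketch is a hypothetical name and is not necessarily how rpoly01 is implemented.

```r
# Inverse transform sampling for F(x) = (x - 1)^k + 1 on [0, 1], k odd.
# Solving F(x) = u gives (x - 1)^k = -(1 - u), hence x = 1 - (1 - u)^(1 / k).
rpoly_sketch <- function(n, k = 5) {
  stopifnot(k %% 2 == 1)  # for even k the formula above is not a valid CDF
  u <- runif(n)
  1 - (1 - u)^(1 / k)
}

set.seed(1)
s <- rpoly_sketch(1e5)

# Compare the empirical CDF with the formula at one point.
emp_cdf  <- mean(s <= 0.5)
true_cdf <- (0.5 - 1)^5 + 1
```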
Sample Data
sample_data
A data frame with 1000 rows and 2 variables:
X |
Numeric vector, generated from the 2-fold uniform distribution. |
Y |
Numeric vector, Y = -X^2 + 3*X + rnorm(1000)*X. |
Infinite Order Kernel Function
sinc(u)
u |
A numerical value or vector where the sinc function is evaluated. |
The value of the sinc function at each point in u.
Define the closed form FT of the infinite order kernel sin(x)/(pi*x)
sinc_ft(x)
x |
A numerical value or vector where the Fourier Transform is evaluated. |
The value of the Fourier Transform of the sinc function at each point in x.
True density of 2-fold uniform distribution
true_density_2fold(x)
x |
A numerical value or vector where the true density function is evaluated. |
The value of the true density of the 2-fold uniform distribution at each point in x.
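If the two uniform components are taken to be independent U(0, 1) variables, which is an assumption here since the support is not stated, their sum has the triangular density on [0, 2]. The sketch below uses the hypothetical name tri_density and compares it with simulated data.

```r
# Density of the sum of two independent U(0, 1) variables (assumed components):
# a triangle on [0, 2] that peaks at x = 1.
tri_density <- function(x) pmax(0, 1 - abs(x - 1))

set.seed(3)
s <- runif(1e5) + runif(1e5)

# Empirical density in a small window around x = 1, and the matching
# window average of the triangular density.
emp_peak  <- mean(abs(s - 1) < 0.05) / 0.1
true_peak <- integrate(tri_density, 0.95, 1.05)$value / 0.1
total     <- integrate(tri_density, 0, 2)$value
```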
Define the inverse Fourier transform function of W
W_kernel(u, L = 10)
u |
A numerical value or vector representing the time or space domain. |
L |
The limit for numerical integration, defines the range of integration as |
A numerical value or vector representing the inverse Fourier transform of the infinite order kernel at the given time or space point(s).
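A kernel defined through its Fourier transform can be recovered by numerical integration. The sketch below assumes the transform is real and even, so the inverse transform reduces to a cosine integral over [0, L]; inv_ft and box_ft are hypothetical names, and the integration range is an assumption rather than the package's documented behavior. The check uses the sinc kernel sin(x)/(pi*x), whose transform is the indicator of [-1, 1].

```r
# Inverse Fourier transform of a real, even function ft by numerical integration:
# K(u) = (1 / pi) * integral from 0 to L of ft(xi) * cos(xi * u) d xi.
inv_ft <- function(u, ft, L = 10) {
  vapply(u, function(ui) {
    integrate(function(xi) ft(xi) * cos(xi * ui), lower = 0, upper = L)$value / pi
  }, numeric(1))
}

# Transform of the sinc kernel: 1 on [-1, 1], 0 elsewhere.
box_ft <- function(xi) as.numeric(abs(xi) <= 1)

u <- c(0.5, 2)
# L = 1 matches the support of box_ft and avoids integrating across the jump.
k_num   <- inv_ft(u, box_ft, L = 1)
k_exact <- sin(u) / (pi * u)
```

The same routine applies to any transform that vanishes outside [-L, L], which is why a finite integration limit is enough for kernels with compactly supported transforms.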
Define the Fourier transform of an infinite order kernel proposed in Schennach (2004)
W_kernel_ft(xi, xi_lb = 0.5, xi_ub = 1.5)
xi |
A numerical value or vector representing the frequency domain. |
xi_lb |
The lower bound for the frequency domain. Defaults to 0.5. |
xi_ub |
The upper bound for the frequency domain. Defaults to 1.5. |
A numerical value or vector representing the Fourier transform of the infinite order kernel at the given frequency/frequencies.