Estimate Propensity Scores Using LBC-Net

`lbc_net` estimates propensity scores using the LBC-Net model, a deep learning method designed to enhance covariate balance. It integrates a Variational Autoencoder (VAE) with a customized neural network to estimate treatment probabilities for causal inference.

When an outcome `Y` is supplied, `lbc_net` also computes inverse probability weighted (IPW) estimates of the causal effect (ATE or ATT) and, when enabled, corresponding influence-function–based standard errors and confidence intervals.

Usage

lbc_net(
  data = NULL,
  formula = NULL,
  Z = NULL,
  Tr = NULL,
  Y = NULL,
  estimand = c("ATE", "ATT", "Y"),
  K = 99,
  rho = 0.15,
  na.action = na.fail,
  gpu = 0,
  show_progress = TRUE,
  ...,
  setup_lbcnet_args = list()
)

Arguments

data

an optional data frame, list, or environment (or an object coercible by `as.data.frame` to a data frame) containing the variables specified in `formula`. If a variable is not found in `data`, it is taken from `environment(formula)`, typically the environment from which `lbc_net()` is called. If `formula` is not provided, `Z` (a matrix of covariates) and `Tr` (a numeric treatment assignment vector) must be explicitly supplied.

formula

An object of class `"formula"` (or one that can be coerced to that class), specifying a symbolic description of the model in the form `Tr ~ X1 + X2 + ...`. If `formula` is provided, `Z` and `Tr` are extracted from `data`. If omitted, `Z` and `Tr` must be supplied explicitly.

Z

A numeric matrix or data frame of covariates. Required if `formula` is not provided. Each row represents an observation, and each column represents a covariate. If `formula` is used, `Z` is extracted from `data` automatically.

Tr

A numeric vector representing treatment assignment (typically 0/1). Required if `formula` is not provided. Must have the same number of rows as `Z`. If `formula` is used, `Tr` is extracted from `data` automatically.

Y

Optional numeric vector of observed outcomes. If provided, `lbc_net` will, in addition to estimating propensity scores, compute inverse probability weighted estimates of a causal estimand (e.g., ATE or ATT) using the fitted LBC-Net model. `Y` can be continuous or binary; the IPW formula is the same, only the interpretation differs.

estimand

Character string specifying the target estimand when an outcome `Y` is supplied. Available options are:

`"ATE"`: Average Treatment Effect. The frequency weight function $\omega^{*}(p_i) = 1$ targets the combined population.
`"ATT"`: Average Treatment Effect on the Treated. The frequency weight function $\omega^{*}(p_i) = p_i$ upweights units that are likely to be treated.
`"Y"`: Weighted mean outcome among the treated group, using IPW weights derived from the fitted propensity scores.

If `Y` is `NULL`, `estimand` is ignored and `lbc_net` only fits the propensity model. See **Details** for more information on ATT, ATE, and their corresponding weighting schemes.

K

an integer specifying the number of grid center points used to compute kernel weights in the local neighborhood. These weights are used to assess balance and calibration conditions. If specified, `ck` is automatically computed as `k / (K + 1)`, where `k = 1, ..., K`. The default is `99`, generating grid points from `0.01` to `0.99`. See **Details** for more information on kernel weights.

rho

a numeric value specifying the span used to determine the adaptive bandwidth `h` when `h` is not provided. The span controls the proportion of data included in the local neighborhood, ensuring a sufficient sample size for accurate training. The choice of `rho` influences local balance assessment and should be selected based on the data structure. While cross-validation can be used to approximate an optimal span, user discretion is advised. The default is `0.15`.

na.action

A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is `na.omit`, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

gpu

An integer specifying the GPU device ID for computation if CUDA is available. If set to `0`, the function will attempt to use the default GPU. If CUDA is unavailable, computations will automatically fall back to the CPU.

show_progress

A logical value indicating whether to display a progress bar during training. If `TRUE` (default), displays elapsed time, remaining time, loss values, and training speed (iterations per second). This helps monitor training progress efficiently. Set to `FALSE` to disable the display.

...

Additional parameters for model tuning, including:

`ck`

a numeric vector of kernel center points. Values should be strictly between `0` and `1`. If `NULL`, `ck` is automatically computed using the default `K`. If provided, user-defined grid points should adhere to the constraint `0 < ck < 1`.

`h`

A numeric vector of bandwidth values for kernel weighting. By default, an adaptive bandwidth is automatically computed via preliminary probabilities estimated using logistic regression (`glm`), given `ck` and the span parameter `rho`. If `NULL`, `h` is computed using span_bw.

`kernel`

A character string specifying the kernel function used for local weighting. The default is `"gaussian"`. Available options include:

`"gaussian"` (default): Ensures smooth weighting and continuity.
`"uniform"`: Assigns equal weight to all observations within a fixed bandwidth.
`"epanechnikov"`: Gives higher weight to observations closer to the kernel center.

See **Details** for more information on kernel weighting and its role in local balance estimation.

`seed`

An integer specifying the random seed for reproducibility in training the neural network. This seed is applied to PyTorch using `torch.manual_seed(seed)`, ensuring consistent results across runs when using stochastic optimization methods.

`hidden_dim`

An integer specifying the number of hidden units in the LBC-Net model. A rule of thumb is to set this as two to three times the number of covariates, but it should be significantly smaller than the sample size to prevent overfitting. Default is `100`.

`num_hidden_layers`

An integer specifying the number of hidden layers in the LBC-Net model. Default is `1`, which results in a three-layer network overall (input layer, one hidden layer, and output layer).

`vae_epochs`

An integer specifying the number of training epochs for the Variational Autoencoder (VAE). This determines how long the VAE component of the model is trained before being used in LBC-Net. Default is `250`.

`vae_lr`

A numeric value specifying the learning rate for training the VAE using the Adam optimizer. Controls how quickly the model updates its parameters during training. Default is `0.01`.

`max_epochs`

An integer specifying the maximum number of training epochs for LBC-Net. Early stopping is applied based on `lsd_threshold` to prevent unnecessary training. Default is `5000`.

`lr`

A numeric value specifying the initial learning rate for LBC-Net training using the Adam optimizer. The learning rate controls how much the model updates during each training step. Default is `0.05`.

`weight_decay`

A numeric value specifying the regularization parameter in the Adam optimizer for LBC-Net. Helps prevent overfitting by penalizing large weights in the model. Default is `1e-5`.

`balance_lambda`

A numeric value controlling the relative contributions of the local balance loss ($Q_1$) and calibration loss ($Q_2$) in the objective function, where the total loss is defined as $Q = Q_1 + \lambda Q_2$. Default is `1.0`.

`epsilon`

A small numeric value controlling the lower and upper bounds of the estimated propensity scores. The default is `0.001`, ensuring scores remain within $[\epsilon, 1 - \epsilon]$ for numerical stability, particularly in cases of poor overlap. Setting `epsilon = 0` reverts to the standard logit link function. See **Details** for more on its role in model stabilization.

`lsd_threshold`

A numeric value defining the stopping criterion based on the Local Standardized mean Difference (LSD). Training stops when the rolling average of the maximum local balance falls below this threshold. The default `lsd_threshold = 2` balances efficiency and precision. In cases of poor overlap or small sample sizes, a more relaxed threshold (e.g., `5%` or `10%`) may be used to allow more flexibility in training.

`rolling_window`

An integer specifying the number of recent epochs used to compute the rolling average of the maximum local balance. Default is `5`. The early stopping mechanism is triggered when the rolling average of the maximum LSD over the most recent `rolling_window` epochs falls below `lsd_threshold`. Specifically, at every 200-epoch step, the maximum local balance is calculated, and a rolling average over the last `rolling_window` steps is updated. Training halts when this rolling average drops below `lsd_threshold`, or when the predefined maximum epochs is reached, ensuring sufficient learning capacity.

setup_lbcnet_args

List. Optional arguments passed to setup_lbcnet for configuring the Python environment. If Python is not set up, `setup_lbcnet()` is automatically called using these parameters. Default is `list(envname = "r-lbcnet", create_if_missing = TRUE)`, meaning it will attempt to use the virtual environment `"r-lbcnet"` and create it if missing.

Value

An object of class `"lbc_net"`, containing:

`fitted.values`: Estimated propensity scores.
`weights`: Inverse probability weights (IPW).

Other model components (e.g., `losses`, `parameters`, `Z`, `Tr`) are accessible via `$` or the recommended getLBC function. While direct access (e.g., `fit$fitted.values`) is possible, using `getLBC(fit, "fitted.values")` is recommended for stability and future-proofing.

Details

This function optimizes the objective function $Q(\theta)$ using a feed-forward neural network with batch normalization after each layer. Rectified Linear Unit (ReLU) activations are used in hidden layers, while the output layer employs a modified sigmoid activation to ensure propensity scores remain bounded between $\epsilon$ and $1-\epsilon$:

$$ p(\mathbf{Z}) = \epsilon + (1 - 2\epsilon) \frac{\exp(S(\mathbf{Z}; \theta))}{1 + \exp(S(\mathbf{Z}; \theta))} $$

In well-overlapping distributions, $\epsilon = 0$ (logit link function) is effective, while for poor overlap, $\epsilon = 0.001$ stabilizes computation by preventing extreme probabilities (0 or 1). The default $\epsilon = 0.001$ works well in most cases.

If categorical covariates with more than two levels are included in `formula` or `Z`, users must manually convert them into dummy (one-hot encoded) variables before fitting the model. This can be done as follows:


data$cate <- factor(data$cate)
dummy_vars <- model.matrix(~ cate - 1, data = data)
data <- cbind(data, dummy_vars)
data$cate <- NULL

Optimization Process: The model is trained using Adaptive Moment Estimation (ADAM) optimization. Given the complexity of the objective function, a pre-training phase using a Variational Autoencoder (VAE) is incorporated to enhance feature representation. After pre-training, the encoder weights are transferred to initialize the LBC-Net. This initialization improves training stability and propensity score estimation.

Kernel Weighting & Local Inverse Probability Weights (IPW): To weigh observations in local neighborhoods, we use kernel smoothing to define local balance weights. The default choice in this package is the Gaussian kernel due to its smoothness and continuity, but users may specify alternative kernels.

The general form of the kernel weighting function is: $$\omega(c_k, x) = h_k^{-1} K\left( \frac{c_k - x}{h_k} \right)$$ where $h_k$ is the location-specific bandwidth, and $K(x)$ is the kernel function. The default Gaussian kernel is given by: $$K(x) = (2\pi)^{-1/2} \exp(-x^2/2)$$

Alternative Kernel Choices: Users can modify the kernel function for different weighting schemes:

Uniform Kernel: $$K(x) = 0.5 \times \mathbf{1}(|x| \leq 1)$$
Epanechnikov Kernel: $$K(x) = 0.75 (1 - x^2) \mathbf{1}(|x| \leq 1)$$

The Local Inverse Probability Weight (LIPW) at each local grid center $c_k$ is: $$ W_k(p_i) = \frac{ \omega(c_k, p_i)\omega^*(p_i) }{ T_i p_i + (1 - T_i)(1 - p_i) }, \quad i = 1, 2, ..., N. $$

The frequency weight function $\omega^{*}(p_i)$ determines the target population: setting $\omega^{*}(p_i) = 1$ yields the Average Treatment Effect (ATE), while choosing $\omega^{*}(p_i) = p_i$ results in the Average Treatment Effect on the Treated (ATT).

Training Considerations & Tuning:

- Poor Overlap Situations: If groups have poor overlap (see mirror_hist), achieving the minimum local balance may be difficult. In such cases, relax `lsd_threshold` and increase `max_epochs`.

- Tuning Neural Network Parameters: The local balance (`LSD`) and loss from summary can guide tuning. However, the default values generally work well in most cases.

- The LSD metric is used to evaluate local balance and guide hyperparameter tuning.

- During training, the model tracks LSD values to determine convergence.

These can be retrieved using: getLBC(object, "max_lsd"): Returns the maximum LSD at last epoch training; getLBC(object, "mean_lsd"): Returns the mean LSD.

- For final evaluation:

lsd(object): Computes the final LSD values based on the fitted model;
plot.lsd(object): Generates visualizations for LSD values across covariates;
plot_calib(object): Assesses the local calibration for estimated propensity scores.
mirror_hist(object): Mirror histogram for propensity score distribution between groups.

Examples