Hurdle Demand Models

Introduction

The beezdemand package includes functionality for fitting two-part mixed effects hurdle demand models using Template Model Builder (TMB). This approach is particularly useful when:

Zero consumption values are informative (i.e., represent true non-consumption rather than censored data)
You want to model both the probability of consuming and the intensity of consumption simultaneously
Individual-level heterogeneity is important for both parts of the model

When to Use Hurdle Models vs. Standard Models

Scenario	Recommended Approach
Few zeros, zeros are measurement artifacts	`fit_demand_fixed()` or `fit_demand_mixed()`
Many zeros, zeros represent true non-consumption	`fit_demand_hurdle()`
Need to understand factors affecting whether someone consumes	`fit_demand_hurdle()`
Need individual-level estimates of “quitting price”	`fit_demand_hurdle()`

Model Specification

The hurdle model has two parts:

Part I: Binary Model (Probability of Zero Consumption)

\[\text{logit}(\pi_{ij}) = \beta_0 + \beta_1 \cdot \log(\text{price} + \epsilon) + a_i\]

Where:

\(\pi_{ij}\) = probability of zero consumption for subject \(i\) at price \(j\)
\(\beta_0\) = intercept (baseline log-odds of zero consumption)
\(\beta_1\) = slope (effect of log-price on log-odds of zero)
\(\epsilon\) = small constant (default 0.001) for handling zero prices
\(a_i\) = subject-specific random intercept

Part II: Continuous Model (Consumption Given Positive)

With 3 random effects: \[\log(Q_{ij}) = (\log Q_0 + b_i) + k \cdot (\exp(-\alpha_i \cdot \text{price}) - 1) + \varepsilon_{ij}, \qquad \alpha_i = \exp(\log\alpha + c_i)\]

With 2 random effects: \[\log(Q_{ij}) = (\log Q_0 + b_i) + k \cdot (\exp(-\alpha \cdot \text{price}) - 1) + \varepsilon_{ij}, \qquad \alpha = \exp(\log\alpha)\]

Where:

\(Q_0\) = intensity (consumption at price 0)
\(k\) = scaling parameter for exponential decay
\(\alpha\) = elasticity parameter, estimated in log space (\(\log\alpha\)) so that \(\alpha > 0\)
\(b_i\) = subject-specific random effect on log-intensity
\(c_i\) = subject-specific random effect on log-elasticity (3-RE model only). It is multiplicative on the natural scale: \(\alpha_i = \exp(\log\alpha + c_i)\), which keeps each subject’s elasticity positive.

Random Effects Structure

The random effects follow a multivariate normal distribution:

2-RE model: \((a_i, b_i) \sim \text{MVN}(0, \Sigma_{2 \times 2})\)
3-RE model: \((a_i, b_i, c_i) \sim \text{MVN}(0, \Sigma_{3 \times 3})\)

Getting Started

Data Requirements

Your data should be in long format with columns for:

Subject identifier
Price
Consumption (including zeros)

library(beezdemand)

# Example data structure
knitr::kable(
    head(apt, 10),
    caption = "Example APT data structure (first 10 rows)"
)

Example APT data structure (first 10 rows)
id	x	y	group
19	0.0	10	a
19	0.5	10	a
19	1.0	10	a
19	1.5	8	a
19	2.0	8	a
19	2.5	8	a
19	3.0	7	a
19	4.0	7	a
19	5.0	7	a
19	6.0	6	a

Basic Model Fitting

# Fit 2-RE model (simpler, faster)
fit2 <- fit_demand_hurdle(
    data = apt,
    y_var = "y",
    x_var = "x",
    id_var = "id",
    random_effects = c("zeros", "q0"), # 2 random effects
    verbose = 0
)

# Fit 3-RE model (more flexible)
fit3 <- fit_demand_hurdle(
    data = apt,
    y_var = "y",
    x_var = "x",
    id_var = "id",
    random_effects = c("zeros", "q0", "alpha"), # 3 random effects
    verbose = 1
)

Interpreting Output

# View summary
summary(fit2)
#> 
#> Two-Part Mixed Effects Hurdle Demand Model
#> ============================================
#> 
#> Call:
#> fit_demand_hurdle(data = apt, y_var = "y", x_var = "x", id_var = "id", 
#>     random_effects = c("zeros", "q0"), verbose = 0)
#> 
#> Convergence: Yes 
#> Number of subjects: 10 
#> Number of observations: 160 
#> Random effects: 2 (zeros, q0) 
#> 
#> Fixed Effects:
#> --------------
#>              Estimate Std. Error z value
#> beta0      -392.08424  146.77064  -2.671
#> beta1       135.79403   50.43489   2.692
#> log_q0        1.93813    0.11712  16.548
#> log_k         0.55505    0.09013   6.158
#> log_alpha    -2.31478    0.16822 -13.760
#> logsigma_a    5.76791    1.48287   3.890
#> logsigma_b   -1.04114    0.22836  -4.559
#> logsigma_e   -1.63184    0.06063 -26.914
#> rho_ab_raw   -0.19191    0.23754  -0.808
#> 
#> Variance Components:
#> --------------------
#>           Estimate  Std. Error
#> alpha       0.0988      0.0166
#> k           1.7420      0.1570
#> var_a  102316.4496 303443.2535
#> var_b       0.1246      0.0569
#> cov_ab    -21.4106     12.6866
#> var_e       0.0382      0.0046
#> 
#> Correlations:
#> -------------
#>        Estimate Std. Error
#> rho_ab  -0.1896      0.229
#> 
#> Model Fit:
#> ----------
#>   Log-likelihood: 2.31
#>   AIC: 13.38
#>   BIC: 41.06
#> 
#> Demand Metrics (Group-Level):
#> -----------------------------
#>   Pmax (price at max expenditure): 20.0000
#>   Omax (max expenditure): 30.9810
#>   Q at Pmax: 1.5491
#>   Elasticity at Pmax: -0.4772
#>   Method: numerical_optimize_observed_domain
#> 
#> Derived Parameters (Individual-Level Summary):
#> ----------------------------------------------
#>   Q0 (Intensity):
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   3.622   5.496   7.585   7.365   8.544  11.623 
#>   Alpha:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.09879 0.09879 0.09879 0.09879 0.09879 0.09879 
#>   Breakpoint:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   7.591  13.549  16.183  17.986  21.796  34.419 
#>   Pmax:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>      20      20      20      20      20      20 
#>   Omax:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   16.15   24.51   33.83   32.85   38.11   51.85

# Extract coefficients
coef(fit2)
#>         beta0         beta1    logsigma_a    logsigma_b    logsigma_e 
#> -392.08424414  135.79402761    5.76791287   -1.04113716   -1.63183781 
#>    rho_ab_raw            Q0         alpha             k 
#>   -0.19191230    6.94572447    0.09878809    1.74202549

# Standardized tidy summaries
tidy(fit2) |> head()
#> # A tibble: 6 × 9
#>   term        estimate std.error statistic  p.value component     estimate_scale
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl> <chr>         <chr>         
#> 1 beta0      -392.      147.         -2.67 7.55e- 3 zero_probabi… logit         
#> 2 beta1       136.       50.4         2.69 7.09e- 3 zero_probabi… logit         
#> 3 Q0            6.95      0.813      16.5  1.65e-61 consumption   natural       
#> 4 k             1.74      0.157       6.16 7.34e-10 consumption   natural       
#> 5 alpha         0.0988    0.0166    -13.8  4.43e-43 consumption   natural       
#> 6 logsigma_a    5.77      1.48        3.89 1.00e- 4 variance      natural       
#> # ℹ 2 more variables: term_display <chr>, estimate_internal <dbl>
glance(fit2)
#> # A tibble: 1 × 9
#>   model_class   backend  nobs n_subjects n_random_effects converged logLik   AIC
#>   <chr>         <chr>   <int>      <int>            <int> <lgl>      <dbl> <dbl>
#> 1 beezdemand_h… TMB_hu…   160         10                2 TRUE        2.31  13.4
#> # ℹ 1 more variable: BIC <dbl>

# Get subject-specific parameters
head(get_subject_pars(fit2))
#>    id       a_i        b_i        Q0      alpha breakpoint Pmax     Omax
#> 1  19 -88.44328  0.5148913 11.623368 0.09878809  34.419417   20 51.84539
#> 2  30 -21.33116 -0.6512299  3.621529 0.09878809  20.997058   20 16.15363
#> 3  38  40.35748 -0.2348862  5.491712 0.09878809  13.330758   20 24.49548
#> 4  60  31.73720  0.1663613  8.202899 0.09878809  14.204504   20 36.58857
#> 5  68  31.19351  0.4228981 10.601806 0.09878809  14.261495   20 47.28877
#> 6 106 116.82464 -0.2317221  5.509116 0.09878809   7.590564   20 24.57311
#>   Pmax_unconditional Omax_unconditional
#> 1          19.999999           51.84539
#> 2          19.999999           16.13182
#> 3          12.765578           20.06101
#> 4          13.605392           30.70281
#> 5          13.660199           39.74372
#> 6           7.270466           16.35863

Diagnostics

Use check_demand_model() and the plotting helpers for quick post-fit checks.

check_demand_model(fit2)
#> 
#> Model Diagnostics
#> ================================================== 
#> Model class: beezdemand_hurdle 
#> 
#> Convergence:
#>   Status: Converged
#> 
#> Random Effects:
#> 
#> Residuals:
#>   Mean: -0.008684
#>   SD: 0.9478
#>   Range: [-3.129, 2.549]
#>   Outliers: 2 observations
#> 
#> --------------------------------------------------
#> Issues Detected (1):
#>   1. Detected 2 potential outliers (|resid| > 3)
#> 
#> Recommendations:
#>   - Investigate outlying observations
plot_residuals(fit2)$fitted
#> NULL
plot_qq(fit2)

Understanding Results

Fixed Effects

Parameter	Interpretation
`beta0`	Part I intercept: baseline log-odds of zero consumption
`beta1`	Part I slope: change in log-odds per unit increase in log(price)
`logQ0`	Log of intensity parameter (population average)
`k`	Scaling parameter for demand decay
`alpha`	Population elasticity \(\exp(\log\alpha)\); subject values are \(\exp(\log\alpha + c_i)\) in the 3-RE model

Subject-Specific Parameters

The subject_pars data frame contains:

Parameter	Description
`a_i`	Random effect for Part I (zeros probability)
`b_i`	Random effect for Part II (intensity)
`c_i`	Random effect for alpha (3-RE model only)
`Q0`	Individual intensity: \(\exp(\log Q_0 + b_i)\)
`alpha`	Individual elasticity: \(\exp(\log\alpha + c_i)\) (or \(\exp(\log\alpha)\) for 2-RE)
`breakpoint`	Price where P(zero) = 0.5: \(\exp(-(\beta_0 + a_i) / \beta_1) - \epsilon\)
`Pmax`	Price at maximum expenditure
`Omax`	Maximum expenditure

Model Selection: 2-RE vs 3-RE

When to Use Each

2-RE model: When you believe elasticity (\(\alpha\)) is relatively constant across subjects
3-RE model: When you believe elasticity varies meaningfully between subjects

Likelihood Ratio Test

# Compare nested models
compare_hurdle_models(fit3, fit2)

# Unified model comparison (AIC/BIC + LRT when appropriate)
compare_models(fit3, fit2)

# Output:
# Likelihood Ratio Test
# =====================
#            Model n_RE    LogLik df     AIC     BIC
#    Full (3 RE)    3 -1234.56 12 2493.12 2543.21
# Reduced (2 RE)    2 -1245.78  9 2509.56 2547.89
#
# LR statistic: 22.44
# df: 3
# p-value: 5.2e-05

A significant p-value suggests the 3-RE model provides a better fit.

Predicting and Scoring

A hurdle model carries two different “predicted consumption” quantities, and choosing the wrong one quietly distorts any score computed against observed data:

type = "demand" (the default since beezdemand 0.3.0) is the marginal expectation (1 - P0) * E[Y | Y > 0]. Observed consumption includes zeros, so this is the quantity to score against raw data (cross-validation error, calibration checks, or model comparison on predictions).
type = "response" is the conditional positive mean E[Y | Y > 0]: expected consumption given that any is purchased. Scoring raw data with it systematically overstates prediction error wherever the probability of zero consumption is large (typically at high prices).

# Marginal expectation (the default): score this against observed y
pred <- predict(fit2, type = "demand", prices = c(0.5, 1, 5, 10))
pred[, c("x", "prob_zero", "predicted_consumption", ".fitted")]
#> # A tibble: 40 × 4
#>        x prob_zero predicted_consumption .fitted
#>    <dbl>     <dbl>                 <dbl>   <dbl>
#>  1   0.5 3.54e-250                 10.9    10.9 
#>  2   0.5 4.96e-221                  3.39    3.39
#>  3   0.5 3.07e-194                  5.15    5.15
#>  4   0.5 5.53e-198                  7.69    7.69
#>  5   0.5 3.21e-198                  9.94    9.94
#>  6   0.5 4.97e-161                  5.16    5.16
#>  7   0.5 3.35e-228                  8.11    8.11
#>  8   0.5 5.98e-224                  7.66    7.66
#>  9   0.5 2.75e-212                  6.56    6.56
#> 10   0.5 7.71e-184                  4.48    4.48
#> # ℹ 30 more rows

The prob_zero and predicted_consumption columns expose the two parts; .fitted is their product (1 - prob_zero) * predicted_consumption. See the Scoring predictions section of ?predict.beezdemand_hurdle for details.

Visualization

# Population demand curve
plot(fit2, type = "demand")

Population demand curve from 2-RE hurdle model.

# Probability of zero consumption
plot(fit2, type = "probability")

Probability of zero consumption as a function of price.

# Distribution of individual parameters
plot(fit2, type = "parameters")
plot(fit2, type = "parameters", parameters = c("Q0", "alpha", "Pmax"))

# Individual demand curves
plot(fit2, type = "individual", subjects = c("1", "2", "3", "4"))

# Single subject with observed data
plot_subject(fit2, subject_id = "1", show_data = TRUE, show_population = TRUE)

Simulation and Validation

Simulating Data

The simulate_hurdle_data() function generates data from the hurdle model:

# Simulate with default parameters
sim_data <- simulate_hurdle_data(
    n_subjects = 100,
    seed = 123
)

head(sim_data)
#   id    x         y delta       a_i        b_i
# 1  1 0.00 12.345678     0 -0.523456  0.1234567
# 2  1 0.50 11.234567     0 -0.523456  0.1234567
# ...

# Custom parameters
sim_custom <- simulate_hurdle_data(
    n_subjects = 100,
    logQ0 = log(15), # Q0 = 15
    alpha = 0.1, # Lower elasticity
    k = 3, # Higher k (ensures Pmax exists)
    stop_at_zero = FALSE, # All prices for all subjects
    seed = 456
)

Monte Carlo Simulation Studies

The run_hurdle_monte_carlo() function assesses model performance through simulation.

Note: Monte Carlo simulations are computationally intensive and not run during vignette building. The example below shows typical usage and expected output format.

# Run Monte Carlo study
mc_results <- run_hurdle_monte_carlo(
    n_sim = 100, # Number of simulations
    n_subjects = 100, # Subjects per simulation
    n_random_effects = 2, # 2-RE model
    verbose = TRUE,
    seed = 123
)

# View summary
print_mc_summary(mc_results)

# Monte Carlo Simulation Summary
# ==============================
#
# Simulations: 100 attempted, 95 converged (95.0%)
#
#    Parameter True Mean_Est   Bias Rel_Bias% Emp_SE Mean_SE SE_Ratio Coverage_95%  N
#        beta0 -2.0    -2.01  -0.01      0.5   0.12    0.11     0.92         94.7 95
#        beta1  1.0     1.02   0.02      2.0   0.08    0.08     1.00         95.8 95
#        logQ0  2.3     2.29  -0.01     -0.4   0.05    0.05     1.00         94.7 95
#            k  2.0     2.03   0.03      1.5   0.15    0.14     0.93         93.7 95
#        alpha  0.5     0.51   0.01      2.0   0.04    0.04     1.00         95.8 95
# ...

Interpreting Monte Carlo Results

Metric	Target	Interpretation
Bias	~0	Estimates should be unbiased
Relative Bias	< 5%	Acceptable bias relative to true value
SE Ratio	~1.0	Model SEs match empirical variability
Coverage 95%	~95%	CIs should contain true value 95% of time

Integration with beezdemand Workflow

Combining with Other Analyses

# Fit hurdle model
hurdle_fit <- fit_demand_hurdle(data,
    y_var = "y", x_var = "x", id_var = "id",
    random_effects = c("zeros", "q0"), verbose = 0
)

# Extract subject parameters
hurdle_pars <- get_subject_pars(hurdle_fit)

# These can be merged with other analyses
# e.g., correlate with individual characteristics
cor(hurdle_pars$Q0, subject_characteristics$age)
cor(hurdle_pars$breakpoint, subject_characteristics$dependence_score)

Exporting Results

# Subject parameters
write.csv(get_subject_pars(hurdle_fit), "hurdle_subject_parameters.csv")

# Summary statistics
summ <- get_hurdle_param_summary(hurdle_fit)
print(summ)

Technical Details

TMB Backend

The hurdle model is implemented using Template Model Builder (TMB), which provides:

Efficient Laplace approximation for marginal likelihood
Automatic differentiation for fast optimization
Standard errors via the delta method

Control Parameters

fit <- fit_demand_hurdle(
    data,
    y_var = "y",
    x_var = "x",
    id_var = "id",
    epsilon = 0.001, # Constant for log(price + epsilon)
    tmb_control = list(
        max_iter = 200, # Maximum iterations
        eval_max = 1000, # Maximum function evaluations
        trace = 0 # Optimization trace level
    ),
    verbose = 1 # 0 = silent, 1 = progress, 2 = detailed
)

Custom Starting Values

For difficult optimization problems:

custom_starts <- list(
    beta0 = -3.0,
    beta1 = 1.5,
    logQ0 = log(8),
    k = 2.5,
    alpha = 0.1,
    logsigma_a = 0.5,
    logsigma_b = -0.5,
    logsigma_e = -1.0,
    rho_ab_raw = 0
)

fit <- fit_demand_hurdle(data,
    y_var = "y", x_var = "x", id_var = "id",
    start_values = custom_starts, verbose = 1
)

References

Zhao, T., Luo, X., Chu, H., Le, C.T., Epstein, L.H. and Thomas, J.L. (2016), A two-part mixed effects model for cigarette purchase task data. Jrnl Exper Analysis Behavior, 106: 242-253. https://doi.org/10.1002/jeab.228

Hursh, S. R., & Silberberg, A. (2008). Economic demand and essential value. Psychological Review, 115(1), 186-198.

id	x	y	group
19	0.0	10	a
19	0.5	10	a
19	1.0	10	a
19	1.5	8	a
19	2.0	8	a
19	2.5	8	a
19	3.0	7	a
19	4.0	7	a
19	5.0	7	a
19	6.0	6	a

id	x	y	group
19	0.0	10	a
19	0.5	10	a
19	1.0	10	a
19	1.5	8	a
19	2.0	8	a
19	2.5	8	a
19	3.0	7	a
19	4.0	7	a
19	5.0	7	a
19	6.0	6	a