--- title: "Modeling delay-discounting indifference points with bounded error distributions" author: "Brent Kaplan" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Modeling delay-discounting indifference points with bounded error distributions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.2) set.seed(1) has_tmb <- requireNamespace("TMB", quietly = TRUE) library(beezdiscounting) ``` ## Why indifference points need a bounded error distribution A delay-discounting task asks a person to trade a smaller-sooner reward against a larger-later one across a range of delays. The **indifference point** (IP) at each delay is the smaller-sooner amount that feels equivalent to the later reward, expressed as a *proportion of that later reward*. By construction an IP lives on the closed interval $[0, 1]$: - $y = 1$: the person is indifferent only when the immediate amount equals the full later reward (**no discounting**). - $y = 0$: any immediate amount, however small, is preferred (**complete discounting**). Real titration data pile up at both ends, and the example data shipped with this package (`data(dd_ip)`) reproduce the pattern: a meaningful fraction of observations sit *exactly* at 0 or 1. That single fact rules out the two error distributions people reach for first: - **Gaussian / nonlinear least squares** treats residuals as unbounded and homoscedastic. It can predict an IP below 0 or above 1, and it assumes the noise is the same size near the bounds as in the middle, which is never true for a proportion. - **Standard beta regression** is the natural model for a proportion, but its density is *undefined* at exactly 0 and 1 (the log-likelihood is $-\infty$). Every boundary observation has to be discarded or nudged, which biases the fit. The **Scale-Location-Truncated beta (SLT-beta)** distribution (Kim, Koffarnus & Franck, 2024; Kim, Kaplan, Koffarnus & Franck, 2025) keeps the beta's natural fit to proportion data but assigns *finite probability* to the endpoints, so 0 and 1 are modeled rather than thrown away. ## What the SLT-beta density is Write the mean of the IP at delay $D$ as a discounting function of a single rate $k$, using either Mazur's hyperbola $\mu = 1/(1 + kD)$ or the exponential $\mu = e^{-kD}$, and let $\phi$ be a precision parameter (larger $\phi$ gives a tighter spread around the curve). With shape parameters $a = \mu\phi$ and $b = (1-\mu)\phi$, and two *fixed* micro-constants $s = 1{+}10^{-7}$ and $l = 10^{-8}$, the log-density of an observed IP $y \in [0,1]$ is $$ \log f(y \mid \mu, \phi) = \underbrace{\log\Gamma(a{+}b) - \log\Gamma(a) - \log\Gamma(b)}_{-\log B(a,b)} + (a{-}1)\log\!\big(\tfrac{y}{s}{+}l\big) + (b{-}1)\log\!\big(1 - \big(\tfrac{y}{s}{+}l\big)\big) - \log s - \log Z, $$ where the truncation normalizer is $Z = \mathrm{pbeta}(\tfrac{1}{s}{+}l,\,a,\,b) - \mathrm{pbeta}(l,\,a,\,b)$. The intuition is a *microscopic stretch*: the data window $y\in[0,1]$ is mapped just inside the open interval where the ordinary beta density is finite; the kernel is therefore never evaluated at exactly 0 or 1. As $s\to 1$ and $l\to 0$ the SLT-beta collapses to the ordinary beta. ```{r density, fig.cap = "SLT-beta (solid) vs. ordinary beta (dashed). The SLT-beta carries finite density to y = 0 and y = 1; the ordinary beta diverges or vanishes there."} slt_logpdf <- function(y, mu, phi, s = 1.0000001, l = 1e-8) { a <- mu * phi; b <- (1 - mu) * phi lgamma(a + b) - lgamma(a) - lgamma(b) + (a - 1) * log(y / s + l) + (b - 1) * log(1 - (y / s + l)) - log(s) - log(pbeta(1 / s + l, a, b) - pbeta(l, a, b)) } yy <- seq(0, 1, length.out = 400) op <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1)) for (p in list(c(mu = 0.5, phi = 4), c(mu = 0.15, phi = 6))) { plot(yy, exp(slt_logpdf(yy, p["mu"], p["phi"])), type = "l", lwd = 2, xlab = "indifference point", ylab = "density", main = sprintf("mu = %.2f, phi = %g", p["mu"], p["phi"])) lines(yy, dbeta(yy, p["mu"] * p["phi"], (1 - p["mu"]) * p["phi"]), lty = 2) points(c(0, 1), exp(slt_logpdf(c(0, 1), p["mu"], p["phi"])), pch = 16) } par(op) ``` Because the variance of a beta is $\mu(1-\mu)/(\phi+1)$, the SLT-beta automatically gives **less** noise near the floor and ceiling and **more** in the middle: the heteroscedastic pattern real IP data actually show, and exactly what a Gaussian model gets wrong. ## The boundary problem, in the example data The package ships `dd_ip`, a synthetic delay-discounting dataset (100 subjects across six delays, in the `id`, `x` (delay), `y` (indifference point) long format) that reproduces the boundary pattern of real titration data. A handful of synthetic values fall just above 1 and are clamped to the ceiling. ```{r load-data} data(dd_ip) dd_ip$id <- factor(dd_ip$id) dd_ip$y <- pmin(dd_ip$y, 1) # clamp a few synthetic values to the [0, 1] ceiling c(observations = nrow(dd_ip), subjects = nlevels(dd_ip$id), at_floor_or_ceiling = sum(dd_ip$y %in% c(0, 1))) ``` Many observations sit at exactly 0 and 1. Counting subjects with at least one boundary value, and asking which error models can even evaluate them: ```{r boundary-demo} beta_logpdf <- function(y, mu, phi) { # ordinary beta: -Inf at y in {0,1} a <- mu * phi; b <- (1 - mu) * phi lgamma(a + b) - lgamma(a) - lgamma(b) + (a - 1) * log(y) + (b - 1) * log(1 - y) } ids <- levels(dd_ip$id) yof <- function(i) dd_ip$y[dd_ip$id == i] bnd <- vapply(ids, function(i) any(yof(i) %in% c(0, 1)), logical(1)) beta_ok <- vapply(ids, function(i) all(is.finite(beta_logpdf(yof(i), 0.5, 8))), logical(1)) slt_ok <- vapply(ids, function(i) all(is.finite(slt_logpdf(yof(i), 0.5, 8))), logical(1)) data.frame( subjects = length(ids), with_boundary_IP = sum(bnd), evaluable_under_ordinary_beta = sum(beta_ok), evaluable_under_SLT_beta = sum(slt_ok) ) ``` Every subject with a boundary value is lost to ordinary beta regression and retained by the SLT-beta. Fitting each subject individually (estimating $\log k$ and $\log\phi$ so that both stay positive) and comparing the recovered rate to nonlinear least squares confirms that the SLT-beta agrees with NLS on the bulk of subjects *and* succeeds on the boundary subjects that NLS-plus-beta could not model: ```{r per-subject-fit} fit_slt <- function(y, D) { # per-subject SLT MLE (log k, log phi) nll <- function(th) { mu <- pmin(pmax(1 / (1 + exp(th[1]) * D), 1e-6), 1 - 1e-6) v <- -sum(slt_logpdf(y, mu, exp(th[2]))); if (is.finite(v)) v else 1e10 } o <- optim(c(log(0.01), log(8)), nll, control = list(reltol = 1e-11)) exp(o$par[1]) } nls_k <- function(y, D) { # multi-start NLS (single start is fragile) for (st in c(exp(-10), 1e-3, 1e-2, 1e-1)) { m <- tryCatch(nls(y ~ 1 / (1 + k * D), start = list(k = st)), error = function(e) NULL) if (!is.null(m)) return(coef(m)[["k"]]) } NA_real_ } xof <- function(i) dd_ip$x[dd_ip$id == i] k_slt <- vapply(ids, function(i) fit_slt(yof(i), xof(i)), numeric(1)) k_nls <- vapply(ids, function(i) nls_k(yof(i), xof(i)), numeric(1)) sprintf("cor(log k): SLT vs NLS = %.3f | median k: SLT = %.4f, NLS = %.4f", cor(log(k_slt), log(k_nls), use = "complete.obs"), median(k_slt), median(k_nls, na.rm = TRUE)) ``` A caveat the next section resolves: fit *one subject at a time* and the SLT-beta likelihood can occasionally run off to a degenerate solution. For a person whose indifference points are almost all 0s and 1s, a fit with near-zero precision that places all its mass on the endpoints can score a higher likelihood than any sensible discounting curve. The fix is not to fit people in isolation. ## Fitting the mixed-effects model A mixed-effects model shares information across people. The subject random intercept on $\log k$ acts as a prior that pulls each person's estimate toward the population; the degenerate per-subject solutions described above are thus regularized away, which is one of the main reasons to prefer `fit_dd_tmb()` over fitting each subject separately. The following chunks require TMB (a compiled C++ template; `has_tmb = TRUE` on this build): ```{r fit-tmb, eval = has_tmb} fit <- fit_dd_tmb( data = dd_ip, # columns: id, x (delay), y (indifference point) equation = "mazur", # or "exponential" family = "sltb", # default; "gaussian" reproduces the NLS-style fit random_effects = k ~ 1 # random intercept on log k (fit internally on log scale) ) summary(fit) tidy(fit) # population log k, precision; back-transformed k ranef(fit) # per-subject discount rates ``` Group contrasts use the same estimated-marginal-means surface computed on the `log k` scale and back-transformed, so a between-group comparison reads as a **ratio of discount rates**. The example below adds a synthetic two-group factor for illustration: ```{r emmeans, eval = has_tmb} set.seed(2) grp <- data.frame( id = levels(dd_ip$id), group = factor(sample(c("A", "B"), nlevels(dd_ip$id), replace = TRUE)) ) dd_grp <- merge(dd_ip, grp, by = "id") fit_grp <- fit_dd_tmb(dd_grp, equation = "mazur", factors = "group") get_dd_param_emms(fit_grp) # EMM of k per group get_dd_comparisons(fit_grp, compare_specs = ~group, adjust = "holm") # ratio-of-k contrasts ``` ## Choosing a family | | `family = "sltb"` (default) | `family = "gaussian"` | |---|---|---| | Respects $[0,1]$ | yes (incl. boundary mass at 0 and 1) | no (can predict outside) | | Variance | shrinks near bounds, grows mid-range | constant | | Boundary data | modeled | only via the bounded mean | | Matches legacy `fit_dd()` | — | yes (NLS-style) | Use `"sltb"` for inference on real IP data; use `"gaussian"` when continuity with a classic least-squares discounting analysis is required or as a quick baseline. ## How much to trust it The SLT-beta density was validated before release: it integrates to one, has finite density at both boundaries, reproduces the published moments, and recovers known discount rates from simulated data (rank correlation 0.99 for Mazur, 0.95 for the exponential). On the bundled example data, every boundary subject is fit successfully where ordinary beta regression cannot be evaluated at all. The full Monte-Carlo study lives in the package's `dev/` directory. ## References - Kim, M., Koffarnus, M. N., & Franck, C. T. (2024). *Thinking Inside the Bounds: Improved Error Distributions for Indifference Point Data Analysis and Simulation via Beta Regression Using Common Discounting Functions.* - Kim, M., Kaplan, B. A., Koffarnus, M. N., & Franck, C. T. (2025). *Scale-Location-Truncated Beta Regression: Expanding Beta Regression to Accommodate 0 and 1.* arXiv:2509.13167.