Data balancing across log- or linear-scaled bins and fitting via SMA regression

balanced_scaling() partitions a data set into equal‑width bins on a log or linear axis, upsamples each bin so that all bins contain the same number of observations, and then fits a standardised major axis (SMA) model to each balanced bootstrap replicate.
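The binning and upsampling step can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's create_bins(): the hypothetical balance_bins_sketch() uses a fixed number of bins (n_bins) instead of deriving them from min_per_bin, and it upsamples every bin to the size of the most populated bin, which is an assumed target size.

```r
# Hypothetical sketch, NOT the package's create_bins(): split x into
# n_bins equal-width bins on a log (or linear) axis, then resample every
# bin with replacement up to the size of the most populated bin (assumed).
balance_bins_sketch <- function(x, n_bins = 5, log_scale = TRUE, base = 10) {
  axis <- if (log_scale) log(x, base = base) else x
  width <- diff(range(axis)) / n_bins
  # Bin index 1..n_bins; the maximum value is folded into the last bin.
  bin <- pmin(floor((axis - min(axis)) / width) + 1, n_bins)
  groups <- split(seq_along(x), bin)   # row indices of each non-empty bin
  target <- max(lengths(groups))       # common bin size after upsampling
  idx <- unlist(lapply(groups, function(i) {
    # Index into i rather than calling sample(i) directly, which would
    # misbehave when a bin holds a single row.
    i[sample.int(length(i), target, replace = TRUE)]
  }), use.names = FALSE)
  data.frame(row = idx, bin = bin[idx])
}
```

Indexing the original predictor and response with out$row then yields one balanced replicate, in which every bin carries equal weight in the subsequent fit.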

Usage

balanced_scaling(
  data,
  var_x,
  var_y,
  min_per_bin = 100,
  n_boot = 100,
  base = 10,
  seed = 1,
  model_type = c("power", "exp", "linear")
)

Arguments

data: A data frame or tibble.
var_x, var_y: Unquoted column names for the predictor and response.
min_per_bin: Minimum number of observations per bin. Default 100.
n_boot: Number of bootstrap iterations. Default 100.
base: Logarithm base used when binning and fitting on a log scale. Default 10.
seed: Base seed for reproducibility; iteration i uses seed + i. Set to NULL to disable seeding.
model_type: "power" (default), "exp", or "linear".
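The documented seeding rule (iteration i uses seed + i) makes each replicate reproducible on its own. In this hypothetical sketch, boot_indices_sketch() calls set.seed(seed + i) at the start of iteration i; whether balanced_scaling() seeds at exactly this point is an assumption.

```r
# Hypothetical illustration of the seeding rule: iteration i seeds the RNG
# with seed + i before resampling, so replicate i can be regenerated
# without rerunning iterations 1..(i - 1). seed = NULL skips seeding.
boot_indices_sketch <- function(n, n_boot = 3, seed = 1) {
  lapply(seq_len(n_boot), function(i) {
    if (!is.null(seed)) set.seed(seed + i)
    sample.int(n, n, replace = TRUE)   # one bootstrap draw of row indices
  })
}
```

For example, set.seed(1 + 2) followed by sample.int(50, 50, replace = TRUE) reproduces the second replicate of boot_indices_sketch(50, seed = 1).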

Value

A list with:

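stats: A data frame with one row per bootstrap iteration (iter), giving the slope and intercept (i.e., elevation) with their lower and upper confidence limits (slope_lo, slope_hi, intercept_lo, intercept_hi), r² (r2), the p-value (pval), and the number of observations in the balanced replicate (n).
first_boot: The first bootstrap-balanced dataset generated by the function. Useful for plotting, or for statistical comparison with the original, imbalanced data.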
bins: The exact output from create_bins() — a list with data (binned input rows) and summary (one‑row‑per‑bin metadata including bin_width, bin_class, bootstrap_count, etc.).
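Because stats holds one row per bootstrap iteration, a typical next step is to summarise the replicates. The helper summarise_boot_sketch() is hypothetical and not part of the package; it assumes the slope and intercept column names shown in the Examples output and reports percentiles of each across iterations.

```r
# Hypothetical helper (not part of the package): collapse the
# per-iteration rows of a stats data frame into the 2.5 %, 50 % and
# 97.5 % percentiles of each requested coefficient column.
summarise_boot_sketch <- function(stats, cols = c("slope", "intercept"),
                                  probs = c(0.025, 0.5, 0.975)) {
  # sapply() returns one column per coefficient; t() gives one row each.
  t(sapply(cols, function(col) quantile(stats[[col]], probs = probs)))
}
```

With res from the Examples section, summarise_boot_sketch(res$stats) returns a small matrix with rows slope and intercept and columns 2.5%, 50% and 97.5%.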

Details

Three model types are supported:

"power": Power‑law model y = a xᵇ (log10–log10).
"exp": Exponential model y = a exp(b x) (log–linear).
"linear": Ordinary linear model y = a + b x.
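Each model type reduces to a straight-line SMA fit after transformation: log y = log a + b log x for "power", ln y = ln a + b x for "exp", and y = a + b x for "linear". The sketch is a minimal closed-form SMA, assuming the standard estimator slope = sign(r) * sd(Y) / sd(X) and intercept = mean(Y) - slope * mean(X). It is not smatr::sma(), computes no confidence limits, and the use of natural logs for "exp" is an assumption.

```r
# Minimal closed-form SMA sketch (not smatr::sma): transform the axes
# according to model_type, then fit Y = intercept + slope * X by SMA.
sma_fit_sketch <- function(x, y, model_type = c("power", "exp", "linear"),
                           base = 10) {
  model_type <- match.arg(model_type)
  X <- switch(model_type, power = log(x, base = base), exp = x, linear = x)
  # Natural log for "exp" is an assumption, so slope estimates b in exp(b x).
  Y <- switch(model_type, power = log(y, base = base), exp = log(y),
              linear = y)
  r <- cor(X, Y)
  slope <- sign(r) * sd(Y) / sd(X)        # SMA slope
  intercept <- mean(Y) - slope * mean(X)  # line passes through the centroid
  data.frame(slope = slope, intercept = intercept, r2 = r^2,
             pval = cor.test(X, Y)$p.value, n = length(X))
}
```

Applying sma_fit_sketch() to each balanced replicate and stacking the rows gives a table shaped like stats, minus the confidence-limit columns.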

References

Simovic, M., & Michaletz, S.T. (2025). Harnessing the Full Power of Data to Characterise Biological Scaling Relationships. Global Ecology and Biogeography, 34(2). https://doi.org/10.1111/geb.70019

Warton, D.I., Duursma, R.A., Falster, D.S., & Taskinen, S. (2012). smatr 3 – an R package for estimation and inference about allometric lines. Methods in Ecology and Evolution, 3(2), 257–259. https://doi.org/10.1111/j.2041-210X.2011.00153.x

Author

Simovic, M. milos.simovic@botany.ubc.ca; Michaletz, S.T. sean.michaletz@ubc.ca

Examples

if (requireNamespace("smatr", quietly = TRUE)) {
  data(xylem_scaling_simulation_dataset)
  res <- balanced_scaling(
    data        = xylem_scaling_simulation_dataset,
    var_x       = L,
    var_y       = DAVG,
    min_per_bin = 100,
    n_boot      = 10,
    seed        = 1,
    model_type  = "power"
  )
  head(res$stats)
}
#>   iter     slope  slope_lo  slope_hi intercept intercept_lo intercept_hi
#> 1    1 0.2294432 0.2291341 0.2297526 0.9566411    0.9562315    0.9570507
#> 2    2 0.2294725 0.2291634 0.2297820 0.9567315    0.9563217    0.9571413
#> 3    3 0.2294092 0.2291003 0.2297186 0.9567903    0.9563808    0.9571998
#> 4    4 0.2294480 0.2291389 0.2297574 0.9566746    0.9562649    0.9570843
#> 5    5 0.2294091 0.2291002 0.2297184 0.9570781    0.9566687    0.9574876
#> 6    6 0.2294702 0.2291611 0.2297798 0.9571014    0.9566916    0.9575112
#>          r2 pval       n
#> 1 0.2698124    0 1544056
#> 2 0.2696528    0 1544056
#> 3 0.2701138    0 1544056
#> 4 0.2698243    0 1544056
#> 5 0.2704510    0 1544056
#> 6 0.2694203    0 1544056