Data balancing across log- or linear-scaled bins and fitting via SMA regression
Source: R/balanced_scaling.R
balanced_scaling() partitions a data set into equal‑width bins on a
log or linear axis, upsamples each bin so that all bins contain the same
number of observations, and then fits a standardised major axis (SMA)
model to every balanced bootstrap replicate.
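The binning and upsampling steps can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the toy data, the choice of four bins, and the rule of upsampling every bin to the size of the largest bin are all hypothetical.

```r
# Hypothetical illustration only; not the package's internal code.
set.seed(1)

# Skewed toy data: most x values are small, so low bins are over-represented.
x <- 10^rexp(500, rate = 2)
y <- 2 * x^0.75 * exp(rnorm(500, sd = 0.2))
toy <- data.frame(x = x, y = y)

# Equal-width bins on a log10 axis (4 bins is an arbitrary choice).
n_bins <- 4
log_x  <- log10(toy$x)
breaks <- seq(min(log_x), max(log_x), length.out = n_bins + 1)
toy$bin <- cut(log_x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Assumed balancing rule: resample every non-empty bin, with replacement,
# up to the size of the largest bin.
target <- max(table(toy$bin))
balanced <- do.call(rbind, lapply(split(toy, toy$bin), function(d) {
  d[sample(nrow(d), size = target, replace = TRUE), , drop = FALSE]
}))

table(toy$bin)       # imbalanced counts before resampling
table(balanced$bin)  # every bin now holds `target` rows
```

Repeating the resampling step with a different seed would give one balanced replicate per bootstrap iteration, each of which could then be passed to an SMA fit.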
Usage
balanced_scaling(
data,
var_x,
var_y,
min_per_bin = 100,
n_boot = 100,
base = 10,
seed = 1,
model_type = c("power", "exp", "linear")
)
Arguments
- data
A data frame or tibble.
- var_x, var_y
Unquoted column names for the predictor and response.
- min_per_bin
Minimum number of observations per bin. Default 100.
- n_boot
Number of bootstrap iterations. Default 100.
- base
Logarithmic base when scale = "log". Default 10.
- seed
Base seed for reproducibility; iteration i uses seed + i. Set NULL for no seeding.
- model_type
"power" (default), "exp", or "linear".
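The per-iteration seeding described for seed can be illustrated with a small sketch. The helper boot_indices() is hypothetical and not part of the package; it only shows how seeding with seed + i makes each iteration reproducible on its own, and how seed = NULL skips seeding.

```r
# Hypothetical helper, not part of the package: draw one bootstrap sample of
# row indices for iteration i, seeding with seed + i when a seed is supplied.
boot_indices <- function(n, i, seed = 1) {
  if (!is.null(seed)) set.seed(seed + i)
  sample.int(n, size = n, replace = TRUE)
}

first_run  <- boot_indices(20, i = 3, seed = 1)
second_run <- boot_indices(20, i = 3, seed = 1)
other_iter <- boot_indices(20, i = 4, seed = 1)

identical(first_run, second_run)  # TRUE: same seed + i, same resample
identical(first_run, other_iter)  # compares draws made under a different seed
```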
Value
A list with:
stats
Regression statistics for each bootstrap iteration: slope and intercept (i.e., elevation) with their lower and upper bounds, r², p-value, and n.
first_boot
The first bootstrap-balanced dataset generated by the function. Useful for plotting or for statistical comparison with the original, imbalanced data.
bins
The exact output from create_bins(): a list with data (binned input rows) and summary (one‑row‑per‑bin metadata including bin_width, bin_class, bootstrap_count, etc.).
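One way to use stats is to summarise the slope across bootstrap iterations. The sketch below does not call the package; it rebuilds a small data frame from the slope and r2 values printed in the Examples output on this page, and the choice of summaries (mean, range, percentile interval) is an assumption about what a reader might want.

```r
# Values copied from the Examples output on this page (first six iterations).
stats <- data.frame(
  iter  = 1:6,
  slope = c(0.2294432, 0.2294725, 0.2294092, 0.2294480, 0.2294091, 0.2294702),
  r2    = c(0.2698124, 0.2696528, 0.2701138, 0.2698243, 0.2704510, 0.2694203)
)

# Summarise the bootstrap distribution of the slope.
slope_mean  <- mean(stats$slope)
slope_range <- range(stats$slope)
slope_q     <- quantile(stats$slope, probs = c(0.025, 0.975))

slope_mean
slope_range
slope_q
```

With only six iterations the percentile interval is rough; a larger n_boot gives a more stable summary.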
Details
Three model types are supported:
"power"
Power‑law model y = a xᵇ (log10–log10).
"exp"
Exponential model y = a exp(b x) (log–linear).
"linear"
Ordinary linear model y = a + b x.
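Each model type becomes a straight line after the transformation named in parentheses, so the same SMA line can be fitted in every case. The sketch below uses the textbook closed form for the SMA line (slope = sign(r) · sd(Y) / sd(X)) rather than the package or smatr, and the helper sma_fit(), the toy data, and the use of the natural log for the exponential model are assumptions made for illustration.

```r
# SMA line for already-transformed variables (textbook closed form).
# Hypothetical helper; the package itself fits SMA models differently.
sma_fit <- function(X, Y) {
  slope <- sign(cor(X, Y)) * sd(Y) / sd(X)
  c(intercept = mean(Y) - slope * mean(X), slope = slope)
}

# Noise-free toy data, so the known parameters are recovered.
x <- c(1, 2, 4, 8, 16)
a <- 2
b <- 0.75

# "power":  log10(y) = log10(a) + b * log10(x)
power_fit <- sma_fit(log10(x), log10(a * x^b))

# "exp":    log(y) = log(a) + b * x   (natural log assumed here)
exp_fit <- sma_fit(x, log(a * exp(0.1 * x)))

# "linear": y = a + b * x, no transformation
linear_fit <- sma_fit(x, a + b * x)

10^power_fit[["intercept"]]  # back-transformed coefficient a of the power law
power_fit[["slope"]]         # exponent b of the power law
```

For the power and exponential models the fitted intercept is on the log scale, so it has to be back-transformed to recover a.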
References
Simovic, M., & Michaletz, S.T. (2025). Harnessing the Full Power of Data to Characterise Biological Scaling Relationships. Global Ecology and Biogeography, 34(2). https://doi.org/10.1111/geb.70019
Warton, D.I., Duursma, R.A., Falster, D.S., & Taskinen, S. (2012). smatr 3 – an R package for estimation and inference about allometric lines. Methods in Ecology and Evolution, 3(2), 257–259. https://doi.org/10.1111/j.2041-210X.2011.00153.x
Author
Simovic, M. <milos.simovic@botany.ubc.ca>; Michaletz, S.T. <sean.michaletz@ubc.ca>
Examples
if (requireNamespace("smatr", quietly = TRUE)) {
  data(xylem_scaling_simulation_dataset)
  res <- balanced_scaling(
    data = xylem_scaling_simulation_dataset,
    var_x = L,
    var_y = DAVG,
    min_per_bin = 100,
    n_boot = 10,
    seed = 1,
    model_type = "power"
  )
  head(res$stats)
}
#> iter slope slope_lo slope_hi intercept intercept_lo intercept_hi
#> 1 1 0.2294432 0.2291341 0.2297526 0.9566411 0.9562315 0.9570507
#> 2 2 0.2294725 0.2291634 0.2297820 0.9567315 0.9563217 0.9571413
#> 3 3 0.2294092 0.2291003 0.2297186 0.9567903 0.9563808 0.9571998
#> 4 4 0.2294480 0.2291389 0.2297574 0.9566746 0.9562649 0.9570843
#> 5 5 0.2294091 0.2291002 0.2297184 0.9570781 0.9566687 0.9574876
#> 6 6 0.2294702 0.2291611 0.2297798 0.9571014 0.9566916 0.9575112
#> r2 pval n
#> 1 0.2698124 0 1544056
#> 2 0.2696528 0 1544056
#> 3 0.2701138 0 1544056
#> 4 0.2698243 0 1544056
#> 5 0.2704510 0 1544056
#> 6 0.2694203 0 1544056