create_bins() finds the largest number of equal-width bins (on a log or linear axis) such that every bin still contains at least min_per_bin observations. It then reports, for each bin, how many additional rows would need to be bootstrapped so that every bin matches the size of the largest bin.

Usage

create_bins(
  data,
  var,
  min_per_bin = 100,
  scale = "log",
  base = 10,
  right = TRUE
)

Arguments

data

A data frame or tibble.

var

Unquoted name of the numeric column to bin (tidy evaluation).

min_per_bin

Minimum number of observations per bin. Default 100.

scale

Binning scale: "log" (default) or "linear".

base

Logarithmic base when scale = "log". Default 10.

right

Logical; should bins be right-closed? Default TRUE.

Value

A list with two tibbles:

data

The original data with an added bin_class column.

summary

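Per-bin counts (original_count), rows to bootstrap (bootstrap_count), and the bin width (bin_width), which is expressed on the binning scale and is therefore in log units when scale = "log".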
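
The bin search described at the top of this page can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: sketch_bins() is a hypothetical helper, it assumes the bins span the observed range of the (optionally log-transformed) values, and it assumes bootstrap_count is the gap between each bin and the largest bin.

```r
# Hypothetical sketch; NOT the create_bins() source code.
# Assumes x is numeric, has no missing values, is not constant,
# and is strictly positive when scale = "log".
sketch_bins <- function(x, min_per_bin = 100, scale = "log",
                        base = 10, right = TRUE) {
  # Work on the binning axis (log or linear)
  z <- if (scale == "log") log(x, base = base) else x
  rng <- range(z)
  best <- NULL

  # No more than n / min_per_bin bins can possibly satisfy the constraint
  max_k <- max(1, floor(length(z) / min_per_bin))

  # Try every candidate bin count and keep the largest one that passes
  for (k in seq_len(max_k)) {
    breaks <- seq(rng[1], rng[2], length.out = k + 1)
    bin <- cut(z, breaks = breaks, include.lowest = TRUE, right = right)
    counts <- as.integer(table(bin))
    if (all(counts >= min_per_bin)) {
      best <- list(breaks = breaks, counts = counts)
    }
  }
  if (is.null(best)) stop("fewer than min_per_bin observations in total")

  data.frame(
    bin             = seq_along(best$counts),
    original_count  = best$counts,
    bootstrap_count = max(best$counts) - best$counts,  # gap to largest bin
    bin_width       = diff(best$breaks)[1]             # width on the binning axis
  )
}

set.seed(1)
x <- 10^runif(2000, min = -2, max = 1)  # simulated data, for illustration only
res <- sketch_bins(x, min_per_bin = 100)
res
```

The loop checks every candidate bin count rather than stopping at the first failure, because a finer grid can occasionally satisfy the minimum even when a coarser one does not.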

References

Simovic, M., & Michaletz, S.T. (2025). Harnessing the Full Power of Data to Characterise Biological Scaling Relationships. Global Ecology and Biogeography, 34(2). https://doi.org/10.1111/geb.70019

Author

Simovic, M. <milos.simovic@botany.ubc.ca>; Michaletz, S.T. <sean.michaletz@ubc.ca>

Examples

data(xylem_scaling_simulation_dataset)

bins <- create_bins(
  data         = xylem_scaling_simulation_dataset,
  var          = L,
  min_per_bin  = 100,
  scale        = "log",
  base         = 10
)

head(bins$data)      # Binned dataset
#> # A tibble: 6 × 4
#>   Organ      L  DAVG bin_class                          
#>   <chr>  <dbl> <dbl> <fct>                              
#> 1 Branch 0.465  1.96 0.347372278686722 - 1.0296528323398
#> 2 Branch 0.790  1.97 0.347372278686722 - 1.0296528323398
#> 3 Branch 0.720  1.96 0.347372278686722 - 1.0296528323398
#> 4 Branch 0.465  2.01 0.347372278686722 - 1.0296528323398
#> 5 Branch 0.465  1.99 0.347372278686722 - 1.0296528323398
#> 6 Branch 0.720  1.99 0.347372278686722 - 1.0296528323398
bins$summary         # Counts & bootstrap totals
#> # A tibble: 8 × 4
#>   bin_class                             original_count bootstrap_count bin_width
#>   <fct>                                          <int>           <int>     <dbl>
#> 1 0.00449998963837901 - 0.013338536290…           1009          191998     0.472
#> 2 0.0133385362903635 - 0.0395370111931…           1068          191939     0.472
#> 3 0.0395370111931877 - 0.1171924130250…           1625          191382     0.472
#> 4 0.117192413025076 - 0.347372278686722          12393          180614     0.472
#> 5 0.347372278686722 - 1.0296528323398            97276           95731     0.472
#> 6 1.0296528323398 - 3.05201370458668            193007               0     0.472
#> 7 3.05201370458668 - 9.04653234607028            53084          139923     0.472
#> 8 9.04653234607028 - 26.8150617438916            82442          110565     0.472
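
The summary printed above can be checked by hand. The short sketch below copies two bin edges and the count columns from that output. It assumes, as the numbers suggest, that bin_width is measured in log10 units and that bootstrap_count is the difference between each bin and the largest bin.

```r
# Values copied from the example output above
lower <- 0.347372278686722
upper <- 1.0296528323398
original_count  <- c(1009, 1068, 1625, 12393, 97276, 193007, 53084, 82442)
bootstrap_count <- c(191998, 191939, 191382, 180614, 95731, 0, 139923, 110565)

# Bin width on the log10 axis
width <- log10(upper) - log10(lower)
round(width, 3)
#> [1] 0.472

# Rows needed to bring each bin up to the size of the largest bin
gap <- max(original_count) - original_count
all(gap == bootstrap_count)
#> [1] TRUE
```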