Log-likelihood for classIntervals objects

Usage

# S3 method for class 'classIntervals'
logLik(object, ...)

Arguments

object

A classIntervals object

...

Ignored.

Value

A `logLik` object (see `stats::logLik`).

Details

In general, the likelihood favors interval choices with a small standard deviation within each interval. The AIC adds a per-interval penalty, so it can be used to maximize the information and self-similarity of the data in each interval without simply adding more intervals.
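As a minimal sketch of how the per-interval penalty enters, the AIC can be rebuilt by hand from the `logLik` value and its `df` attribute. This assumes the `classInt` package is installed and that `AIC()` dispatches to the default `stats::AIC` method with its default penalty of 2 per degree of freedom; the seed and the name `aic_by_hand` are illustrative only.

```r
library(classInt)

set.seed(1)  # arbitrary seed, only to make this sketch reproducible
x <- classIntervals(rnorm(100), n = 5, style = "fisher")

ll <- logLik(x)
df <- attr(ll, "df")  # degrees of freedom reported by logLik()

# Default AIC: -2 * log-likelihood + 2 * degrees of freedom
aic_by_hand <- -2 * as.numeric(ll) + 2 * df

# Should agree with the value returned by AIC()
all.equal(aic_by_hand, AIC(x))
```

In the Examples output, `df` equals the number of intervals, so under this penalty an extra interval lowers the AIC only when it raises the log-likelihood by more than 1.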

Following Birgé and Rozenholc (2006) and Davies et al. (2009) (see References), candidate binnings can be compared by likelihood to choose the number of intervals for a set of data. The `logLik()` function (and the associated `AIC()` function) can be used to optimize binning by maximizing the likelihood, or minimizing the AIC, across choices of intervals.
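The comparison described above can be sketched as a loop over candidate interval counts. This is an illustration under stated assumptions, not a recommended procedure: the candidate range `2:4`, the `"jenks"` style, the seed, and the names `values`, `candidate_n`, `fits`, `aic_values`, and `best_n` are all chosen for the example.

```r
library(classInt)

set.seed(42)  # arbitrary seed for this sketch
# Three tight groups near 1, 2, and 3 (same construction as in the Examples)
values <- rep(1:3, each = 100) + runif(n = 300, min = -0.01, max = 0.01)

candidate_n <- 2:4  # hypothetical candidate numbers of intervals

# Fit one classIntervals object per candidate
fits <- lapply(candidate_n, function(k) {
  classIntervals(values, n = k, style = "jenks")
})

# AIC() works on each fit because a logLik method is available
aic_values <- vapply(fits, AIC, numeric(1))
names(aic_values) <- candidate_n

# Candidate with the lowest AIC
best_n <- candidate_n[which.min(aic_values)]

aic_values
best_n
```

The lowest AIC is evidence, not proof: `best_n` need not equal the number of groups used to generate `values`.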

Likelihood-based methods can provide evidence toward optimizing the binning, but they are not infallible for bin selection. In the examples below, the AIC comparison of 2, 3, and 4 intervals does not select 3 intervals even though the data have 3 groups.

References

Lucien Birgé, Yves Rozenholc. How many bins should be put in a regular histogram. ESAIM: Probability and Statistics. 31 January 2006. 10:24-45. url: https://www.esaim-ps.org/articles/ps/abs/2006/01/ps0322/ps0322.html. doi:10.1051/ps:2006001

Laurie Davies, Ursula Gather, Dan Nordman, Henrike Weinert. A comparison of automatic histogram constructions. ESAIM: Probability and Statistics. 11 June 2009. 13:181-196. url: https://www.esaim-ps.org/articles/ps/abs/2009/01/ps0721/ps0721.html. doi:10.1051/ps:2008005

Examples

x <- classIntervals(rnorm(100), n=5, style="fisher")
logLik(x)
#> 'log Lik.' 8.000225 (df=5)
AIC(x) # Because a logLik method exists, AIC.default is used.
#> [1] -6.00045

# When every interval contains a single discrete value, the logLik is zero by
# definition: the standard deviation is zero, which gives a Dirac function at
# the discrete value, with a density of 1 and a log-density of zero.
x <- classIntervals(rep(1:2, each=10), n=2, style="jenks")
#> Warning: n same as number of different finite values\neach different finite value is a separate class
logLik(x)
#> 'log Lik.' 0 (df=2)
x <- classIntervals(rep(1:3, each=10), n=2, style="jenks")
logLik(x)
#> 'log Lik.' -14.52876 (df=2)

# With slight jitter around three clear categories (at 1, 2, and 3), the AIC
# drops sharply from 2 to 3 intervals, although 4 intervals still has the
# lowest AIC here.
data <- rep(1:3, each=100) + runif(n=300, min=-0.01, max=0.01)
x_2 <- classIntervals(data, n=2, style="jenks")
x_3 <- classIntervals(data, n=3, style="jenks")
x_4 <- classIntervals(data, n=4, style="jenks")
AIC(x_2, x_3, x_4)
#>     df        AIC
#> x_2  2  -456.1001
#> x_3  3 -2237.4472
#> x_4  4 -2384.8774