What are the theoretical guarantees for bagging?

What are the theoretical guarantees of bagging?

I've heard (roughly) that:

Bagging is a technique to reduce the variance of a predictor / estimator / learning algorithm.

However, I have never seen a formal mathematical proof of this statement. Does anyone know why it is mathematically true? It seems to be such a widely accepted / known fact that I would expect a direct reference for it; I would be surprised if there isn't one. Also, does anyone know what effect bagging has on bias?

Are there other theoretical guarantees about this practice that people consider important and would like to share?


The main use case for bagging is to reduce the variance of low-bias models by aggregating them. This was studied empirically in the seminal work "An empirical comparison of voting classification algorithms: bagging, boosting, and variants" by Bauer and Kohavi. It usually works as advertised.
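To see the usual variance-reduction story in action, here is a minimal numpy sketch (my own, not from the papers cited here): it uses a 1-nearest-neighbour regressor as the low-bias, high-variance base learner and compares the prediction variance of a single fit against a bagged ensemble across many independent training sets:

```python
import numpy as np

rng = np.random.default_rng(0)

def nn_predict(x_train, y_train, x0):
    # 1-nearest-neighbour regression: low bias, high variance
    return y_train[np.argmin(np.abs(x_train - x0))]

def bagged_predict(x_train, y_train, x0, n_bags=50):
    # Average the base learner over bootstrap resamples of the training set
    n = len(x_train)
    preds = [nn_predict(x_train[i], y_train[i], x0)
             for i in (rng.integers(0, n, n) for _ in range(n_bags))]
    return np.mean(preds)

# Estimate the variance of each predictor at x0 = 0.5 across
# 200 independent training sets drawn from the same distribution
x0, single, bagged = 0.5, [], []
for _ in range(200):
    x = rng.uniform(0, 1, 50)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 50)
    single.append(nn_predict(x, y, x0))
    bagged.append(bagged_predict(x, y, x0))

var_single, var_bagged = np.var(single), np.var(bagged)
print(var_single, var_bagged)  # bagging typically gives the smaller variance
```

With this setup the bagged predictor's variance comes out well below the single predictor's, which matches the empirical behaviour Bauer and Kohavi report. It is an illustration, not a proof.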

Contrary to popular belief, however, there is no guarantee that bagging will reduce the variance. A more recent and, in my opinion, better explanation is that bagging reduces the influence of leverage points. Leverage points are points that disproportionately affect the resulting model, e.g. outliers in least-squares regression. It is rare but possible for leverage points to positively affect the resulting model; in that case, bagging decreases performance. See "Bagging Equalizes Influence" by Grandvalet.
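The leverage-point view can be illustrated with a small sketch (my own, assuming mean-aggregated least-squares slopes as the bagged estimator): a single high-leverage outlier drags the full-sample OLS slope far from the true value, while bagging dilutes its influence because the point is absent from roughly 37% of bootstrap samples:

```python
import numpy as np

rng = np.random.default_rng(42)

# Clean data on a line with slope 2, plus one high-leverage outlier
x = rng.uniform(0, 1, 50)
y = 2 * x + rng.normal(0, 0.1, 50)
x = np.append(x, 5.0)    # far from the bulk of the data in x: high leverage
y = np.append(y, -10.0)  # and far off the true line

def ols_slope(x, y):
    # Least-squares slope of a line fit with intercept
    return np.polyfit(x, y, 1)[0]

slope_full = ols_slope(x, y)  # dragged far below the true slope of 2

# Bagged slope: the outlier is missing from about (1 - 1/m)^m ~ 37% of
# bootstrap samples, so its influence on the aggregate is diluted
m = len(x)
slope_bagged = np.mean([ols_slope(x[idx], y[idx])
                        for idx in (rng.integers(0, m, m)
                                    for _ in range(2000))])

print(slope_full, slope_bagged)  # the bagged slope sits closer to 2
```

Note this only shows dilution of one outlier's influence in one setup; Grandvalet's point is more subtle, namely that bagging equalizes influence, which can hurt when a leverage point happens to be beneficial.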

To finish answering your question: the impact of bagging largely depends on the leverage points. There are few theoretical guarantees beyond the fact that bagging increases computation time linearly in the bag size! That said, it is still a widely used and very powerful technique. For example, if you are learning with label noise, bagging can produce more robust classifiers.

Rao and Tibshirani give a Bayesian interpretation in "The out-of-bootstrap method for model averaging and selection":

In this sense, the bootstrap distribution represents an (approximate) nonparametric, non-informative posterior distribution for our parameter. But this bootstrap distribution is obtained painlessly, without having to formally specify a prior and without having to sample from the posterior distribution. Hence we might think of the bootstrap distribution as a poor man's "Bayes posterior".
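The "poor man's Bayes posterior" analogy is easy to check numerically. The sketch below (my own, assuming a normal model with known unit variance and a flat prior on the mean, for which the exact posterior is N(x̄, σ²/n)) compares the bootstrap distribution of the sample mean against that posterior's moments:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 1.0, 100
data = rng.normal(loc=2.0, scale=sigma, size=n)

# Bootstrap distribution of the sample mean
boot_means = np.array([rng.choice(data, size=n, replace=True).mean()
                       for _ in range(5000)])

# Exact Bayes posterior for the mean under a flat prior and known sigma:
# N(xbar, sigma^2 / n)
xbar = data.mean()
posterior_sd = sigma / np.sqrt(n)

print(boot_means.mean(), xbar)         # centres agree closely
print(boot_means.std(), posterior_sd)  # spreads agree approximately
```

The two distributions match in centre and, approximately, in spread, which is the sense in which the bootstrap distribution approximates a non-informative posterior.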
