This page is dedicated to discussion of the article "Compressed Least Squares Regression".
Feel free to post any comment, suggestion, question, correction, or extension... I will enjoy discussing this with you.
- Abstract:
"We consider the problem of learning, from K data, a regression function in a linear space of high dimension N using projections onto a random subspace of lower dimension M. From any algorithm minimizing the (possibly penalized) empirical risk, we provide bounds on the excess risk of the estimate computed in the projected subspace (compressed domain) in terms of the excess risk of the estimate built in the high-dimensional space (initial domain). We show that solving the problem in the compressed domain instead of the initial domain reduces the estimation error at the price of an increased (but controlled) approximation error. We apply the analysis to Least-Squares (LS) regression and discuss the excess risk and numerical complexity of the resulting "Compressed Least Squares Regression" (CLSR) in terms of N, K, and M. When we choose M=O(\sqrt{K}), we show that CLSR has estimation error of order O(\log K / \sqrt{K}) and provides an interesting alternative to usual penalization techniques where one first selects a random subspace of lower dimension and then performs empirical risk minimization in this subspace."
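To make the setting concrete, here is a minimal sketch of the procedure the abstract describes: compress the N-dimensional features through a random projection to dimension M = O(\sqrt{K}), then run ordinary least squares in the compressed domain. This is an illustrative reading, not the paper's exact construction; the Gaussian projection, the synthetic data, and all variable choices below are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions follow the abstract's notation (assumed values):
# K data points, initial dimension N, compressed dimension M = O(sqrt(K)).
K, N = 400, 1000
M = int(np.sqrt(K))

# Synthetic regression data: y = <x, w*> + noise (hypothetical example).
X = rng.standard_normal((K, N))
w_star = rng.standard_normal(N) / np.sqrt(N)
y = X @ w_star + 0.1 * rng.standard_normal(K)

# Random projection onto an M-dimensional subspace (the "compressed domain");
# a Gaussian matrix is one common choice, assumed here for illustration.
A = rng.standard_normal((N, M)) / np.sqrt(M)
X_compressed = X @ A  # shape (K, M)

# Empirical risk minimization (ordinary least squares) in the compressed domain.
beta, *_ = np.linalg.lstsq(X_compressed, y, rcond=None)

# New inputs are predicted by compressing them with the same A.
y_hat = X_compressed @ beta
mse = np.mean((y_hat - y) ** 2)
print(M, mse)
```

The point of the sketch is the trade-off stated in the abstract: the compressed problem has only M parameters (lower estimation error, cheaper to solve) but the projection introduces an approximation error relative to regression in the full N-dimensional space.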
- Discussion:
The term appearing in the bound is crucial for the method to be useful, and similar terms seem to appear in many works dealing with regression. So it would be nice to make the connection to previous work more explicit, to look for possibly different interpretations, and of course to use approximation theory to determine under which assumptions this term can be controlled.
- Future work: Possible extensions include:
  - Compressed kernel regression.
  - Compressed density estimation.
  - New interpretations of compressed learning.