Akakpo, N. (2012). Adaptation to anisotropy and inhomogeneity via dyadic piecewise polynomial selection. Math. Methods Statist. 21 1–28. MR2901269 https://doi.org/10.3103/S1066530712010012
Antoniadis, A. and Sapatinas, T. (2001). Wavelet shrinkage for natural exponential families with quadratic variance functions. Biometrika 88 805–820. MR1859411 https://doi.org/10.1093/biomet/88.3.805
Baraud, Y. and Birgé, L. (2014). Estimating composite functions by model selection. Ann. Inst. Henri Poincaré Probab. Stat. 50 285–314. MR3161532 https://doi.org/10.1214/12-AIHP516
Baraud, Y. and Birgé, L. (2018). Rho-estimators revisited: General theory and applications. Ann. Statist. 46 3767–3804. MR3852668 https://doi.org/10.1214/17-AOS1675
Baraud, Y., Birgé, L. and Sart, M. (2017). A new method for estimation and model selection: ρ-estimation. Invent. Math. 207 425–517. MR3595933 https://doi.org/10.1007/s00222-016-0673-5
Baraud, Y. and Chen, J. (2020). Robust estimation of a regression function in exponential families. arXiv preprint. Available at arXiv:2011.01657.
Bartlett, P.L., Maiorov, V. and Meir, R. (1998). Almost linear VC-dimension bounds for piecewise polynomial networks. Neural Comput. 10 2159–2173.
Bartlett, P.L., Harvey, N., Liaw, C. and Mehrabian, A. (2019). Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. J. Mach. Learn. Res. 20 Paper No. 63, 17 pp. MR3960917
Brown, L.D., Cai, T.T. and Zhou, H.H. (2010). Nonparametric regression in exponential families. Ann. Statist. 38 2005–2046. MR2676882 https://doi.org/10.1214/09-AOS762
Chen, J. (2024). Supplement to “Estimating a regression function in exponential families by model selection.” https://doi.org/10.3150/23-BEJ1649SUPP
Dahmen, W., DeVore, R. and Scherer, K. (1980). Multidimensional spline approximation. SIAM J. Numer. Anal. 17 380–402. MR0581486 https://doi.org/10.1137/0717033
Daubechies, I., DeVore, R., Foucart, S., Hanin, B. and Petrova, G. (2022). Nonlinear approximation and (deep) ReLU networks. Constr. Approx. 55 127–172. MR4376561 https://doi.org/10.1007/s00365-021-09548-z
Fryzlewicz, P. and Nason, G.P. (2001). Poisson intensity estimation using wavelets and the Fisz transformation. Technical Report 01/10, Department of Mathematics, Univ. Bristol, United Kingdom.
Fryzlewicz, P. and Nason, G.P. (2004). A Haar-Fisz algorithm for Poisson intensity estimation. J. Comput. Graph. Statist. 13 621–638. MR2087718 https://doi.org/10.1198/106186004X2697
Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. Adaptive Computation and Machine Learning. Cambridge, MA: MIT Press. MR3617773
Hochmuth, R. (2002). Wavelet characterizations for anisotropic Besov spaces. Appl. Comput. Harmon. Anal. 12 179–208. MR1884234 https://doi.org/10.1006/acha.2001.0377
Horowitz, J.L. and Mammen, E. (2007). Rate-optimal estimation for a general class of nonparametric regression models with unknown link functions. Ann. Statist. 35 2589–2619. MR2382659 https://doi.org/10.1214/009053607000000415
Ibragimov, I.A. and Has’minskiĭ, R.Z. (1984). More on the estimation of distribution densities. J. Sov. Math. 25 1155–1165.
Jia, J., Xie, F. and Xu, L. (2019). Sparse Poisson regression with penalized weighted score function. Electron. J. Stat. 13 2898–2920. MR3998931 https://doi.org/10.1214/19-EJS1580
Kolaczyk, E.D. and Nowak, R.D. (2005). Multiscale generalised linear models for nonparametric function estimation. Biometrika 92 119–133. MR2158614 https://doi.org/10.1093/biomet/92.1.119
Kroll, M. (2019). Non-parametric Poisson regression from independent and weakly dependent observations by model selection. J. Statist. Plann. Inference 199 249–270. MR3857826 https://doi.org/10.1016/j.jspi.2018.07.003
Li, Y. and Cevher, V. (2015). Consistency of ℓ1-regularized maximum-likelihood for compressive Poisson regression. In 2015 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) 3606–3610.
Nunes, M.A. and Nason, G.P. (2009). A multiscale variance stabilization for binomial sequence proportion estimation. Statist. Sinica 19 1491–1510. MR2589194
Nussbaum, M. (1987). Nonparametric estimation of a regression function that is smooth in a domain in R^k. Theory Probab. Appl. 31 108–115.
Schmidt-Hieber, J. (2020). Nonparametric regression using deep neural networks with ReLU activation function. Ann. Statist. 48 1875–1897. MR4134774 https://doi.org/10.1214/19-AOS1875
Schumaker, L.L. (1981). Spline Functions: Basic Theory. Pure and Applied Mathematics. New York: Wiley. MR0606200
Stone, C.J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040–1053. MR0673642
Suzuki, T. (2019). Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: Optimal rate and curse of dimensionality. In 7th International Conference on Learning Representations, ICLR.
Suzuki, T. and Nitanda, A. (2021). Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space. In 35th Advances in Neural Information Processing Systems, NeurIPS.
Triebel, H. (2006). Theory of Function Spaces. III. Monographs in Mathematics 100. Basel: Birkhäuser. MR2250142
van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes: With applications to statistics. Springer Series in Statistics. New York: Springer. MR1385671 https://doi.org/10.1007/978-1-4757-2545-2
Yamaguti, M. and Hata, M. (1983). Weierstrass’s function and chaos. Hokkaido Math. J. 12 333–342. MR0719972 https://doi.org/10.14492/hokmj/1470081010