Abstract: | Optimal a priori estimates are derived for the generalization error of regularized two-layer network and deep residual network models. For two-layer neural networks, the path norm is used as the regularization term. For residual networks, we define and use the "weighted path norm", which treats the skip connections and the nonlinearities differently so that paths with more nonlinearities are regularized by larger weights. The error estimates are a priori in the sense that they depend only on the target function, not on the parameters obtained in the training process. The estimates are optimal, in a high-dimensional setting, in the sense that the bounds for both the approximation error and the estimation error are comparable to the Monte Carlo error rates. Comparisons are made with existing norm-based generalization error bounds. |