The conditional gradient method (CGM) is widely used in large-scale sparse convex optimization, having a low per-iteration computational cost for structured sparse regularizers and a greedy approach for collecting nonzeros. We explore the sparsity-acquiring properties of a general penalized CGM (P-CGM) for convex regularizers and a reweighted penalized CGM (RP-CGM) for nonconvex regularizers, replacing the usual convex constraints with gauge-inspired penalties. This generalization does not increase the per-iteration complexity noticeably. Without assuming bounded iterates or using line search, we show an O(1/t) convergence rate of the gap of each subproblem, which measures distance to a stationary point. We couple this with a screening rule that is safe in the convex case and that identifies the support of the solution in finitely many iterations in the nonconvex case, though it is not necessarily safe at intermediate iterates.
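To make the template concrete, here is a minimal sketch of one penalized/generalized conditional gradient iteration for the l1-penalized least-squares problem min_x 0.5*||Ax - b||^2 + lam*||x||_1. It illustrates the general recipe (greedy atom selection plus open-loop convex-combination steps), not the paper's exact P-CGM: the function name pcgm_l1 is hypothetical, and the a-priori radius rho bounding the iterates is an assumption that the paper's gauge-penalized formulation is specifically designed to remove.

```python
import numpy as np

# Sketch of a generalized/penalized conditional gradient method for
#     min_x  0.5 * ||A x - b||_2^2 + lam * ||x||_1 .
# Atoms are the signed coordinates +/- e_i, and `rho` is an assumed
# a-priori bound on the l1 norm of the solution (an assumption the
# paper's P-CGM avoids).
def pcgm_l1(A, b, lam, rho, n_iters=500):
    n = A.shape[1]
    x = np.zeros(n)
    for t in range(n_iters):
        g = A.T @ (A @ x - b)              # gradient of the smooth part
        i = int(np.argmax(np.abs(g)))      # greedy atom selection (LMO)
        s = np.zeros(n)
        if abs(g[i]) > lam:                # atom enters only if it beats
            s[i] = -rho * np.sign(g[i])    # its penalty; else s = 0 shrinks x
        gamma = 2.0 / (t + 2.0)            # open-loop step, no line search
        x = (1.0 - gamma) * x + gamma * s  # convex-combination update
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 200))
    x_true = np.zeros(200)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(100)
    x_hat = pcgm_l1(A, b, lam=1.0, rho=10.0)
    print("recovered support size:", int(np.sum(np.abs(x_hat) > 1e-3)))
```

The single argmax per iteration is what keeps the per-iteration cost low, and coordinates whose gradient entries never exceed the threshold lam are never activated; this hints at the kind of structure a screening rule can exploit and certify.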
@article{OJMO_2022__3__A3_0,
  author    = {Sun, Yifan and Bach, Francis},
  title     = {Screening for a {Reweighted} {Penalized} {Conditional} {Gradient} {Method}},
  journal   = {Open Journal of Mathematical Optimization},
  eid       = {3},
  pages     = {1--35},
  publisher = {Universit\'e de Montpellier},
  volume    = {3},
  year      = {2022},
  doi       = {10.5802/ojmo.14},
  language  = {en},
  url       = {https://www.numdam.org/articles/10.5802/ojmo.14/}
}
Sun, Yifan; Bach, Francis. Screening for a Reweighted Penalized Conditional Gradient Method. Open Journal of Mathematical Optimization, Volume 3 (2022), article no. 3, 35 p. doi: 10.5802/ojmo.14. https://www.numdam.org/articles/10.5802/ojmo.14/