## ICML 2016

### Workshop — Friday, June 24, 2016

#### Optimization Methods for the Next Generation of Machine Learning

Abstract: The future of optimization for machine learning lies in the design of methods for nonconvex optimization problems, such as those arising through the use of deep neural networks. Nonconvex formulations lead to more powerful predictive models, but at the cost of much more challenging optimization problems. This workshop will bring together experts from the machine learning and optimization communities whose research focuses on optimization methodologies that carry recent trends in optimization for machine learning (stochasticity, parallel and distributed computing, and second-order information) into nonconvex settings.
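As a purely illustrative aside, the sketch below shows the kind of nonconvex, stochastic problem the workshop targets: minibatch SGD on a tiny tanh network fit to synthetic data. All names, data, and hyperparameters are hypothetical and not drawn from any workshop talk.

```python
# Illustrative only: minibatch SGD on a small nonconvex least-squares problem
# (a one-hidden-layer tanh network). Everything here is a made-up example,
# not code from the workshop or its speakers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))               # synthetic inputs
y = np.sin(X @ rng.normal(size=10))          # synthetic nonlinear targets

W1 = rng.normal(scale=0.1, size=(10, 16))    # hidden-layer weights
w2 = rng.normal(scale=0.1, size=16)          # output weights

def loss_and_grads(Xb, yb, W1, w2):
    """Mean squared error of a tiny tanh network and its gradients."""
    H = np.tanh(Xb @ W1)                     # hidden activations
    err = H @ w2 - yb
    loss = 0.5 * np.mean(err ** 2)
    gw2 = H.T @ err / len(yb)
    gH = np.outer(err, w2) * (1.0 - H ** 2)  # backprop through tanh
    gW1 = Xb.T @ gH / len(yb)
    return loss, gW1, gw2

lr, batch = 0.1, 32
for step in range(500):
    idx = rng.choice(len(X), size=batch, replace=False)
    loss, gW1, gw2 = loss_and_grads(X[idx], y[idx], W1, w2)
    W1 -= lr * gW1                           # stochastic gradient step
    w2 -= lr * gw2
```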

#### Organizers

- Katya Scheinberg, Lehigh University
- Yoshua Bengio, University of Montreal
- Frank E. Curtis, Lehigh University
- Jorge Nocedal, Northwestern University

#### Schedule

Below is an outline of the schedule; further details will be posted as they become available. The poster spotlights will consist of short presentations by a few selected poster contributors, to be determined.

Slides are available! Click an invited speaker’s name for a link to the slides.

| Time | Session |
| --- | --- |
| 08:30–09:00 | Yoshua Bengio |
| 09:00–09:30 | Coralia Cartis |
| 09:30–10:00 | Poster Spotlights |
| 10:00–10:30 | Coffee Break |
| 10:30–11:00 | Elad Hazan |
| 11:00–11:30 | Josh Griffin |
| 11:30–02:00 | Lunch Break and Poster Session 1 |
| 02:00–02:30 | Leon Bottou |
| 02:30–03:00 | Raghu Pasupathy |
| 03:00–03:30 | Coffee Break |
| 03:30–04:00 | Ben Recht |
| 04:00–04:30 | Mark Schmidt |
| 04:30–06:00 | Poster Session 2 |

#### Speakers: Titles and Abstracts

- On an ADMM framework for the resolution of complementarity formulations of the l0-norm minimization problem: Preliminary work
- AIDE: Fast and Communication Efficient Distributed Optimization
- Optimizing binary autoencoders using auxiliary coordinates, with application to learning binary hashing
- Empirical Investigation on Second-order Integrators for Non-Convex Stochastic Optimization
- Faster Asynchronous SGD
- Parallel SGD: When does averaging help?
- Large Scale Distributed Hessian-Free Optimization for Deep Neural Network
- Convergence Rate Analysis of a Stochastic Trust Region Method for Nonconvex Optimization
- A Trust Region Method with Complexity of $\mathcal{O}(\epsilon^{-3/2})$ for Nonconvex Optimization