Improving the Adam optimizer via projection-based gradient correction in deep learning
Deep neural networks (DNNs) are widely used for large-scale learning tasks because of their ability to model complex relationships within data. The Adaptive Moment Estimation (Adam) optimizer is a popular choice for training DNNs; however, its generalization performance can be suboptimal on challenging datasets. To address this limitation, we propose three modified Adam variants (Adam-V1, Adam-V2, and Adam-V3) that incorporate a projection-based gradient-correction mechanism inspired by quasi-Newton and conjugate gradient methods. This correction introduces curvature awareness without requiring full Hessian computations, improving convergence stability and reducing the tendency to settle at sharp or poorly generalizing minima. The proposed methods were systematically evaluated on both low- and high-dimensional tasks, including one- and two-variable non-convex functions, two-dimensional image segmentation, image classification using CNNs on MNIST, CIFAR-10, and the more challenging CIFAR-100 datasets, as well as ResNet-based architectures on CIFAR-10. In addition, robustness on non-stationary real-world signals was assessed through ECG beat classification using the MIT-BIH Arrhythmia dataset. Experimental results demonstrate consistent improvements over baseline Adam. On CNN models trained on MNIST, Adam-V2 achieved the highest accuracy of 97.93%, surpassing standard Adam (96.48%) and highlighting the benefit of combining gradient correction with adaptive step-size adjustment in lower-dimensional settings. For CNNs trained on CIFAR-10, Adam-V3 attained a validation accuracy of 73.59%, improving generalization relative to Adam (72.44%). On the more complex CIFAR-100 dataset, the proposed variants consistently outperformed baseline Adam and recent adaptive optimizers in terms of accuracy and F1-score. Using a ResNet-50 model on CIFAR-10, Adam-V1 reached the highest accuracy of 79.9%, while Adam-V3 achieved the best F1-score of 0.704, demonstrating strong performance in deeper network architectures. These results show that curvature-aware gradient corrections enhance convergence speed, stability, and generalization in deep learning tasks with minimal additional computational overhead. The proposed optimizers offer practical advantages for both shallow and deep architectures, providing a simple and effective improvement to existing adaptive optimization methods.
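The abstract does not spell out the exact form of the gradient correction, so the following is only a minimal sketch of the general idea: a conjugate-gradient-style projection that removes the component of the current gradient conflicting with the previous one before the standard Adam moment updates. The function name `adam_with_projection`, the specific projection rule, and the test function are illustrative assumptions, not the authors' Adam-V1/V2/V3 updates.

```python
import numpy as np

def adam_with_projection(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999,
                         eps=1e-8, steps=1000):
    """Adam-style optimizer with an illustrative projection-based gradient
    correction (hypothetical form; the paper's exact rule is not given in
    the abstract)."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)       # first-moment estimate
    v = np.zeros_like(x)       # second-moment estimate
    g_prev = np.zeros_like(x)  # previous gradient, used for the correction

    for t in range(1, steps + 1):
        g = grad_fn(x)

        # Conjugate-gradient-inspired correction (assumed form): when the new
        # gradient opposes the previous one, project out the conflicting
        # component to stabilize the search direction.
        denom = g_prev @ g_prev
        if denom > 0.0 and (g @ g_prev) < 0.0:
            g = g - (g @ g_prev) / denom * g_prev

        # Standard Adam moment updates with bias correction.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)

        g_prev = g

    return x

# Example: minimize the one-variable non-convex function f(x) = x^4 - 3x^2 + x,
# whose gradient is 4x^3 - 6x + 1.
if __name__ == "__main__":
    f_grad = lambda x: 4 * x**3 - 6 * x + 1
    print(adam_with_projection(f_grad, x0=np.array([2.0]), lr=0.05, steps=2000))
```

The projection adds only a dot product and a vector subtraction per step, which is consistent with the abstract's claim of curvature-aware behavior at minimal extra computational cost.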