This project aims to improve the Sharpness-Aware Minimization (SAM) method to further enhance the generalization of deep learning models, and applies the improved method to various applications.
In this research, we observed that, to find the worst-case parameters, SAM's perturbation depends on the normalized gradient of the cross-entropy loss and a predefined, constant neighborhood radius. Near the optimum, the gradient of the cross-entropy loss is very small and fluctuates around the optimum point, which makes the direction of the perturbation unstable. A second issue is that, in the same regime, the gradient's magnitude keeps shrinking and risks reaching zero, which could cause a division-by-zero problem.
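The instability can be seen directly from SAM's perturbation formula. The numpy sketch below (function name and constants are illustrative, not from the papers) computes the standard perturbation and shows that two near-zero gradients differing only by noise produce full-radius perturbations pointing in opposite directions.

```python
import numpy as np

def sam_perturbation(grad, rho=0.05, eps=1e-12):
    """Standard SAM perturbation: scale the normalized loss gradient
    to the fixed neighborhood radius rho. The small eps guards the
    division, but the *direction* still comes entirely from grad."""
    return rho * grad / (np.linalg.norm(grad) + eps)

# Near the optimum the cross-entropy gradient is tiny and noisy:
# two almost-identical parameter states can yield opposite perturbations.
g1 = np.array([ 1e-8, -1e-9])
g2 = np.array([-1e-8,  1e-9])   # same magnitude, sign flipped by noise
e1 = sam_perturbation(g1)
e2 = sam_perturbation(g2)
print(e1, e2)  # both have norm ~rho, but point in opposite directions
```

Because the gradient is renormalized to the radius, a vanishing gradient does not shrink the perturbation; it only makes its direction arbitrary, and an exactly zero gradient would divide by zero without the guard term.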
Hence, we introduce a method that addresses the challenges of SAM's perturbation step while satisfying the required properties of the perturbation. Our approach alters the loss function used to compute the perturbation vector. Rather than relying on the cross-entropy loss, which diminishes as the model trains, we propose a novel loss function named Adaptive Adversarial Cross-Entropy (AACE). This loss is designed to grow in magnitude as the model approaches convergence.
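To illustrate the qualitative behavior described above (this toy example is not AACE's actual definition, which is given in the ICIP 2024 paper), compare the standard cross-entropy on the true class with a cross-entropy taken toward the non-true classes: as the predicted probability of the true class rises, the former vanishes while the latter grows.

```python
import numpy as np

# Illustrative only: a stand-in showing how an "adversarial" cross-entropy
# can grow as the model converges, while the standard one shrinks.
def ce_true(p_true):
    return -np.log(p_true)          # standard CE: shrinks as p_true -> 1

def ce_adversarial(p_true):
    return -np.log(1.0 - p_true)    # grows as the model becomes confident

for p in (0.5, 0.9, 0.99):
    print(p, ce_true(p), ce_adversarial(p))
```

A loss with this growth profile keeps supplying a strong, stable gradient signal for the perturbation step precisely when the ordinary cross-entropy gradient is fading.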
We propose to replace the standard cross-entropy loss in SAM's perturbation step with the AACE loss, which ensures a more consistent perturbation direction and prevents the diminishing-gradient problem.
Moreover, we further improve the method by integrating label smoothing into the weight-updating step, reducing overconfidence in predictions. The combination of the AACE loss function and label smoothing leads to a further improvement in model performance.
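Label smoothing itself is standard; the sketch below (parameter names are illustrative) shows the usual formulation, mixing the one-hot target with a uniform distribution so that even a perfectly confident prediction incurs nonzero loss.

```python
import numpy as np

def smooth_labels(one_hot, alpha=0.1):
    """Label smoothing: mix the one-hot target with the uniform
    distribution over k classes, so no class gets probability 1.0."""
    k = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / k

def cross_entropy(probs, target):
    return -np.sum(target * np.log(probs))

target = np.array([0.0, 1.0, 0.0])
smoothed = smooth_labels(target, alpha=0.1)
print(smoothed)

# Even a near-perfectly confident prediction now incurs nonzero loss,
# which discourages overconfident outputs during the weight update.
confident = np.array([1e-6, 1.0 - 2e-6, 1e-6])
print(cross_entropy(confident, smoothed))
```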
The empirical results confirm our hypothesis about AACE's characteristics and its improved generalization over the original SAM.
In this research, we propose FlatFace, a novel training framework for face recognition that incorporates weight perturbation into the training process. FlatFace consists of two key steps: the perturbation step, which perturbs the model parameters of both the feature extractor and the class weights toward the worst-case scenario, and the weight-updating step, which uses the loss gradient at the perturbed feature extractor and class weights to update the parameters.
By guiding the model toward flatter regions of the loss landscape, FlatFace improves generalization and accuracy, particularly in open-set face recognition tasks. The empirical experiments confirm the decrease in the generalization gap and the improvement in overall performance.
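The two-step update described above can be sketched as follows. This is a minimal SAM-style loop on a toy quadratic loss with an analytic gradient (all names and constants are illustrative; FlatFace applies the same pattern jointly to the feature extractor and class weights).

```python
import numpy as np

# Toy loss with analytic gradient; stands in for the face-recognition
# loss over feature-extractor and class-weight parameters.
def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w

def sam_step(w, rho=0.05, lr=0.1, eps=1e-12):
    """One SAM-style update:
    1) perturbation step: move the parameters toward the local
       worst case within a radius-rho neighborhood;
    2) weight-updating step: apply the gradient evaluated at the
       perturbed parameters back to the original parameters."""
    g = grad(w)
    e = rho * g / (np.linalg.norm(g) + eps)   # worst-case perturbation
    g_adv = grad(w + e)                        # gradient at perturbed point
    return w - lr * g_adv                      # update the unperturbed weights

w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w)
print(loss(w))
```

The key design choice is that the gradient is evaluated at the perturbed point but applied to the unperturbed weights, so the update penalizes sharp minima whose loss rises steeply within the rho-neighborhood.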
1. Tanapat Ratchatorn and Masayuki Tanaka, “Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization”,
IEEE International Conference on Image Processing (ICIP), October 2024.
[Paper]
[GitHub]
2. Tanapat Ratchatorn and Masayuki Tanaka, “Improving Sharpness-Aware Minimization Using Label Smoothing and Adaptive Adversarial Cross-Entropy Loss”,
IEEE Access, June 2025.
[Paper]
[GitHub]
3. Tanapat Ratchatorn and Masayuki Tanaka, “FlatFace: Improve Face Recognition by Sharpness-Aware Minimization”,
Electronic Imaging (EI), March 2026.
[Paper link TBA]
[GitHub]