IMPROVEMENTS AND APPLICATIONS
FOR SHARPNESS-AWARE MINIMIZATION

This project aims to further improve the Sharpness-Aware Minimization (SAM) method to enhance the generalization of deep learning models, and applies the method to various applications.

  1. Adaptive Adversarial Cross-Entropy Loss
  2. FlatFace


1. Adaptive Adversarial Cross-Entropy Loss [1, 2]

Figure: Comparison of loss and gradient between the standard cross-entropy loss and the Adaptive Adversarial Cross-Entropy loss at the early and later stages of training.

In this research, we found that SAM's perturbation toward the worst-case parameters depends on the normalized gradient of the cross-entropy loss and a pre-defined, constant neighborhood radius. Near the optimum, the gradient of the cross-entropy loss is very small and fluctuates around the optimum point, which makes the direction of the perturbation unstable. Another notable issue is that, near the optimum, the magnitude of the cross-entropy gradient keeps shrinking and risks becoming zero, which could cause a division-by-zero problem.
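The perturbation step described above can be sketched as follows. This is a minimal PyTorch illustration, not SAM's official implementation; `rho` is the neighborhood radius, and the small `eps` term is one common guard against the division-by-zero risk when the gradient norm approaches zero.

```python
import torch

def sam_perturbation(grads, rho=0.05, eps=1e-12):
    """Sketch of SAM's perturbation: e = rho * g / (||g|| + eps).

    The eps term guards against division by zero when the
    cross-entropy gradient norm shrinks toward zero near the optimum.
    """
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    return [rho * g / (grad_norm + eps) for g in grads]

# Even for a tiny, noise-dominated gradient, the perturbation is
# rescaled to length rho -- its direction is then driven by noise,
# which is the instability discussed above.
torch.manual_seed(0)
g_small = [1e-8 * torch.randn(3)]  # near-zero gradient, illustrative
e = sam_perturbation(g_small)
print(torch.norm(e[0]).item())  # approximately rho = 0.05
```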

Figure: Loss calculation of the standard cross-entropy loss versus the Adaptive Adversarial Cross-Entropy loss.

Hence, we introduce a method that addresses the challenges of SAM's perturbation step while satisfying the required properties of the perturbation. Our approach alters the loss function used to calculate the perturbation vector. Rather than relying on the cross-entropy loss, which diminishes as the model is trained, we propose a novel loss function named Adaptive Adversarial Cross-Entropy (AACE), designed to increase in magnitude as the model approaches convergence.

Figure: Diagram illustrating the perturbation step and the updating step of the original SAM and our proposed method.

We propose replacing the standard cross-entropy loss in SAM's perturbation step with the AACE loss, which ensures a more consistent perturbation direction and prevents the diminishing-gradient problem.
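The key property of AACE, a loss that grows rather than vanishes as the model converges, can be demonstrated with a simplified stand-in. The sketch below is NOT the paper's exact AACE formula; cross-entropy toward the strongest wrong class is used purely as an illustrative loss with the same qualitative behavior.

```python
import torch
import torch.nn.functional as F

def adversarial_ce_sketch(logits, targets):
    """Illustrative stand-in for AACE (not the paper's formula):
    cross-entropy toward the strongest NON-target class. As the model
    grows confident on the true class, the probability of any wrong
    class shrinks, so this loss grows instead of vanishing."""
    masked = logits.clone()
    masked.scatter_(1, targets.unsqueeze(1), float("-inf"))  # hide true class
    adv_targets = masked.argmax(dim=1)                       # strongest wrong class
    return F.cross_entropy(logits, adv_targets)

targets = torch.tensor([0])
early = torch.tensor([[1.0, 0.5, 0.2]])  # early training: soft logits
late = torch.tensor([[8.0, 0.5, 0.2]])   # near convergence: confident logits

ce_early, ce_late = F.cross_entropy(early, targets), F.cross_entropy(late, targets)
adv_early, adv_late = adversarial_ce_sketch(early, targets), adversarial_ce_sketch(late, targets)
assert ce_late < ce_early    # standard CE diminishes with confidence
assert adv_late > adv_early  # adversarial-style loss grows instead
```

Because such a loss keeps a usable gradient magnitude near convergence, the normalized perturbation direction stays stable where standard cross-entropy would fluctuate.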

Moreover, we further improve the method by integrating label smoothing into the weight-updating step, reducing overconfidence in predictions. The combination of the AACE loss and label smoothing leads to further improvement in model performance.
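Label smoothing in the updating step can be expressed with PyTorch's built-in `label_smoothing` argument to `F.cross_entropy`; this is a generic illustration of the technique, not the project's exact training code.

```python
import torch
import torch.nn.functional as F

# Label smoothing mixes the hard one-hot target with a uniform
# distribution, penalizing overconfident predictions.
logits = torch.tensor([[6.0, 0.1, 0.1]])  # a very confident prediction
target = torch.tensor([0])

hard = F.cross_entropy(logits, target)                         # plain CE
smooth = F.cross_entropy(logits, target, label_smoothing=0.1)  # smoothed CE

# The smoothed loss stays bounded away from zero even when the
# prediction is correct, discouraging overconfidence.
assert smooth > hard
```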

The empirical results confirm our hypothesis about AACE's characteristics and demonstrate its improved generalizability over the original SAM.

Table: Accuracies (%) of SAM and SAM with AACE, using Wide ResNet on CIFAR-100, with and without label smoothing. SCE and SAACE denote applying label smoothing to the CE loss and the AACE loss, respectively.

Table: Top-1 and Top-5 accuracies (%) of models trained with SGD, SAM, and our proposed method on ImageNet-1k with ResNet-50 and label smoothing.

2. FlatFace [3]

Figure: Training process comparison between a normal margin-based face recognition (FR) model and FlatFace.

In this research, we propose FlatFace, a novel training framework for face recognition that incorporates weight perturbation into the training process. FlatFace consists of two key steps: a perturbation step, which perturbs the model parameters of both the feature extractor and the class weights toward the worst case, and a weight-updating step, which uses the loss gradient at the perturbed feature extractor and class weights to update the parameters.
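The two steps can be sketched as a SAM-style update over both parameter groups. This is an illustrative sketch under assumed names, with plain SGD and a hypothetical cosine-similarity classification loss standing in for the margin-based loss; it is not the paper's implementation.

```python
import torch

def flatface_step_sketch(backbone, class_weights, images, labels, loss_fn,
                         rho=0.05, lr=0.1, eps=1e-12):
    """Illustrative SAM-style step over BOTH the feature extractor
    and the class weights (names and loss_fn are assumptions)."""
    params = list(backbone.parameters()) + [class_weights]

    # Perturbation step: move all parameters toward the worst case.
    loss = loss_fn(backbone(images), class_weights, labels)
    grads = torch.autograd.grad(loss, params)
    norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbs = [rho * g / (norm + eps) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, perturbs):
            p.add_(e)

    # Updating step: gradient at the perturbed point, applied to the
    # restored parameters (plain SGD here for brevity).
    loss_adv = loss_fn(backbone(images), class_weights, labels)
    grads_adv = torch.autograd.grad(loss_adv, params)
    with torch.no_grad():
        for p, e, g in zip(params, perturbs, grads_adv):
            p.sub_(e)       # restore the original weights
            p.sub_(lr * g)  # descend using the worst-case gradient
    return loss_adv.item()

# Toy usage with a hypothetical cosine-similarity classification loss.
torch.manual_seed(0)
backbone = torch.nn.Linear(4, 3)                     # stand-in feature extractor
class_weights = torch.randn(5, 3, requires_grad=True)  # 5 identity classes

def cosine_ce(feats, weights, labels):
    logits = torch.nn.functional.normalize(feats, dim=1) @ \
             torch.nn.functional.normalize(weights, dim=1).t()
    return torch.nn.functional.cross_entropy(10.0 * logits, labels)

x, y = torch.randn(8, 4), torch.randint(0, 5, (8,))
loss_value = flatface_step_sketch(backbone, class_weights, x, y, cosine_ce)
```

Note that, unlike SAM applied only to the backbone, the perturbation here also covers the class weights, matching the description of FlatFace above.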

Figure: Accuracy comparison of models trained on the CASIA dataset, tested on different benchmarks.

By guiding the model toward flatter regions of the loss landscape, FlatFace improves generalization and accuracy, particularly for open-set face recognition tasks. The empirical experiments confirm a smaller generalization gap and improved overall performance.

Figure: Comparison of generalization gaps between a normal margin-based FR model and FlatFace, trained on the CASIA dataset.

Publications

1. Tanapat Ratchatorn and Masayuki Tanaka, “Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization”, IEEE International Conference on Image Processing (ICIP), October 2024.
[Paper] [GitHub]

2. Tanapat Ratchatorn and Masayuki Tanaka, “Improving Sharpness-Aware Minimization Using Label Smoothing and Adaptive Adversarial Cross-Entropy Loss”, IEEE Access, June 2025.
[Paper] [GitHub]

3. Tanapat Ratchatorn and Masayuki Tanaka, “FlatFace: Improve Face Recognition by Sharpness-Aware Minimization”, Electronic Imaging (EI), March 2026.
[Paper link TBA] [GitHub]