Fed-Focal Loss Explained: What It Does and Why It Matters
Fed-Focal Loss is my most-cited paper (93 citations as of 2026, presented at an IJCAI 2020 workshop). This is a plain-language explanation for founders, CTOs, and anyone who wants to understand what it does without reading the paper.
The problem
In federated learning, multiple clients (hospitals, phones, branches) each train a model on their own data and share only the model updates — not the data itself. The server aggregates these updates into a better model.
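To make the aggregation step concrete, here is a minimal sketch of the weighted averaging that FedAvg performs on the server. It assumes each client sends back a flat list of parameters plus its local sample count. The function name `fedavg` and the toy numbers are illustrative, not taken from the paper.

```python
from typing import List


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Three hypothetical clients, each holding a toy 2-parameter model.
updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sizes = [100, 300, 600]  # local dataset sizes (assumed)

global_weights = fedavg(updates, sizes)
print(global_weights)
```

The server only ever sees parameter vectors and sample counts, never the underlying records. A real system would apply the same average to every tensor in the model.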
The problem: most real-world data is imbalanced. In a fraud detection system, 99% of transactions are legitimate and 1% are fraud. In a medical diagnosis system, 95% of patients are healthy and 5% have the condition. Standard federated learning (FedAvg) struggles with this — the majority class dominates, and the model becomes good at predicting the common case and bad at predicting the rare case.
The challenge is worse in federated settings because you cannot see the global data distribution (that’s the whole point of FL — data stays local). You cannot rebalance the data because you don’t have it.
The solution
Fed-Focal Loss adapts focal loss (originally from object detection) to federated learning. The idea:
- Down-weight easy examples: examples the model already classifies correctly get a lower weight in the loss function. The model spends less effort on what it already knows.
- Up-weight hard examples: examples the model gets wrong (typically the minority class) keep most of their weight, so they count for far more than the easy ones in relative terms. The model focuses on what it doesn’t know.
- No global data needed: the focal loss modifier is computed locally on each client’s data. No global data distribution is needed. This preserves the privacy guarantee of FL.
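The three points above can be sketched with the standard binary focal loss from the object-detection literature, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the defaults `gamma=2.0` and `alpha=1.0` and the helper `local_focal_loss` are choices made for this example.

```python
import math
from typing import List

EPS = 1e-12  # guards against log(0); an assumption of this sketch


def cross_entropy(p_true: float) -> float:
    """Plain cross-entropy for one example, given the probability of its true class."""
    return -math.log(max(p_true, EPS))


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Cross-entropy scaled by (1 - p_true)^gamma, which shrinks the loss on easy examples."""
    return alpha * (1.0 - p_true) ** gamma * cross_entropy(p_true)


def local_focal_loss(probs_true: List[float], gamma: float = 2.0) -> float:
    """Mean focal loss over one client's local batch; needs nothing from other clients."""
    return sum(focal_loss(p, gamma) for p in probs_true) / len(probs_true)


easy, hard = 0.95, 0.10  # confident-and-correct vs. badly wrong
for name, p in [("easy", easy), ("hard", hard)]:
    print(name, "cross-entropy:", cross_entropy(p), "focal:", focal_loss(p))
```

With `gamma=0` the focal term disappears and the loss is ordinary cross-entropy. Larger values of `gamma` push the easy examples closer to zero. Because the modifier depends only on the model's own prediction for each local example, every client can compute it without knowing the global class balance.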
The result: the model learns the minority class better without requiring access to the global data distribution. Balanced accuracy and F1-score improve, particularly for the minority class.
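To see why balanced accuracy and F1 are the metrics to watch, here is a small sketch that compares them with plain accuracy on a 1%-fraud dataset. The data is synthetic and the two "models" are hypothetical stand-ins invented for illustration. None of the numbers are results from the paper.

```python
from typing import List, Tuple


def confusion(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Mean of recall on the rare class and recall on the common class."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (recall + specificity) / 2


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    tp, fp, fn, tn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# 1,000 synthetic transactions: 10 fraud (label 1), 990 legitimate (label 0).
y_true = [1] * 10 + [0] * 990
always_legit = [0] * 1000  # majority-class baseline
detector = [1] * 7 + [0] * 3 + [1] * 20 + [0] * 970  # hypothetical: 7 caught, 20 false alarms

for name, pred in [("always_legit", always_legit), ("detector", detector)]:
    print(name, accuracy(y_true, pred), balanced_accuracy(y_true, pred), f1_score(y_true, pred))
```

The baseline that never flags fraud scores higher on plain accuracy than the detector that catches most of it, which is why accuracy alone is misleading on imbalanced data.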
Why it matters
Fed-Focal Loss matters for three real-world use cases:
- Healthcare: rare diseases are the hardest to diagnose and the most important to get right. Fed-Focal Loss improves rare-disease detection in federated hospital networks without requiring hospitals to share patient data.
- Fraud detection: fraudulent transactions are <1% of all transactions. Fed-Focal Loss improves fraud detection across banks that cannot share transaction data.
- Any FL deployment with class imbalance: which is most of them. Real-world data is almost always imbalanced.
The paper
- Title: “Fed-Focal Loss for imbalanced data classification in Federated Learning”
- Authors: Dipankar Sarkar, Ankur Narang (DeepCoreX), Sumit Rai
- Venue: IJCAI 2020 Workshop on Federated Learning for Data Privacy and Confidentiality
- arXiv: 2011.06283
- Citations: 93 (as of 2026)
- Full paper: dipankar.cc/publication/fed-focal-loss/
The companion paper
CatFedAvg (4 citations) is the companion paper, published in the same November 2020 cycle. It addresses the other big FL challenge: communication efficiency. Together, the two papers cover two of the most pressing practical challenges in real-world federated learning: class imbalance and communication cost.
How to use this in your system
If you’re building a federated learning system and dealing with class imbalance, the Federated Learning Implementation consulting engagement can help. Fed-Focal Loss is one of the techniques we apply, alongside CatFedAvg, differential privacy, and secure aggregation.
— Dipankar Sarkar, Author of Fed-Focal Loss