FedGraph-VASP: Privacy-Preserving Federated Graph Learning for AML
The 20-Second Summary
Money laundering is fundamentally cross-institutional, but VASPs can’t pool raw transaction graphs without exposing sensitive customer and proprietary information. FedGraph-VASP uses federated graph learning and exchanges only boundary embeddings (not raw graphs), securing those exchanges with post-quantum cryptography so cross-entity patterns can be learned without giving up data sovereignty.
The Problem
Money laundering is a graph problem. Criminals move funds through chains of transactions, often spanning multiple financial institutions, to obscure the origin and destination of illicit money. The most effective laundering patterns (layering, structuring, chain-hopping) deliberately exploit institutional boundaries: a transaction that looks benign within one institution may be part of a suspicious pattern when viewed across two or three.
Virtual Asset Service Providers (VASPs)—cryptocurrency exchanges, custodial wallets, payment processors—face this problem acutely. The Travel Rule (FATF Recommendation 16) requires VASPs to share certain transaction metadata, but raw transaction graphs are far more revealing than the mandated fields. Sharing them would expose customer wallet associations and transaction patterns, internal risk scoring models and thresholds, proprietary liquidity flows and trading strategies, and personally identifiable information subject to GDPR/CCPA or local privacy law.
So VASPs are stuck: train alone and miss cross-entity patterns, or share data and violate privacy and compliance constraints. Neither option is acceptable.
Why Existing Approaches Fall Short
| Approach | Limitation |
|---|---|
| Centralized graph learning | Requires pooling all transaction data; a privacy and regulatory non-starter |
| Standard federated learning (FedAvg) | Designed for tabular/image data; doesn’t handle graph topology |
| Secure multi-party computation | Computationally expensive for graph operations; doesn’t scale to large transaction networks |
| Rule-based cross-institution sharing | Shares only flagged transactions; misses structural patterns in the graph |
| Homomorphic encryption on graphs | Prohibitive overhead for GNN operations |
The core issue is that graph learning requires structural information (neighborhoods, edge patterns, motifs) that vanilla FL aggregation can’t convey. A node’s embedding depends on its neighbors, and in cross-institution graphs those neighbors may live at a different VASP.
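To make that dependence concrete, here is a toy mean-aggregation step (a simplified GraphSAGE-style update with made-up numbers, not the paper's model): the same account gets a noticeably different embedding depending on whether its cross-VASP neighbor is visible.

```python
# Toy illustration (assumed values): one mean-aggregation GNN layer for a single
# account, with and without its cross-VASP neighbor in view.
import numpy as np

W = np.eye(3)                       # identity weights, purely for illustration
h_self = np.array([0.2, 0.1, 0.0])  # the account's own features
local_neighbors = [np.array([0.1, 0.0, 0.3])]   # neighbors held at this VASP
remote_neighbor = np.array([0.9, 0.8, 0.7])     # neighbor held at another VASP

def gnn_layer(h, neighbors):
    """h' = ReLU(W * mean([h] + neighbors)) -- a simplified GraphSAGE-style update."""
    agg = np.mean([h] + list(neighbors), axis=0)
    return np.maximum(W @ agg, 0.0)

print(gnn_layer(h_self, local_neighbors))                      # local-only view
print(gnn_layer(h_self, local_neighbors + [remote_neighbor]))  # full cross-VASP view
```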
Our Approach: FedGraph-VASP
FedGraph-VASP is built around a practical insight: in a cross-institution transaction graph, the nodes that matter most are the boundary nodes—accounts that transact across institutional boundaries. These are exactly the points where laundering signals move from one VASP’s partial visibility to another’s.
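As an illustration, boundary-node identification reduces to a simple graph query once each account is tagged with the VASP that holds it; the sketch below uses networkx, and the attribute and account names are hypothetical.

```python
# Sketch of boundary-node identification on a toy edge list; the "vasp" node
# attribute and account IDs are illustrative, not the paper's schema.
import networkx as nx

G = nx.Graph()
G.add_nodes_from([("acct_1", {"vasp": "A"}), ("acct_2", {"vasp": "A"}),
                  ("acct_3", {"vasp": "B"})])
G.add_edges_from([("acct_1", "acct_2"), ("acct_2", "acct_3")])  # acct_2 <-> acct_3 crosses VASPs

def boundary_nodes(graph, my_vasp):
    """Accounts at my_vasp with at least one edge to an account held elsewhere."""
    return {u for u, v in graph.edges()
            if graph.nodes[u]["vasp"] == my_vasp and graph.nodes[v]["vasp"] != my_vasp} | \
           {v for u, v in graph.edges()
            if graph.nodes[v]["vasp"] == my_vasp and graph.nodes[u]["vasp"] != my_vasp}

print(boundary_nodes(G, "A"))  # {'acct_2'}
```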
Architecture
The framework has three moving parts. Each VASP trains a local GNN on its own transaction graph (so the raw graph never leaves the institution), capturing intra-institution patterns like unusual volumes, rapid fund movements, and proximity to known suspicious addresses. For accounts that transact across institutional boundaries, each party computes and exchanges boundary embeddings—compressed representations of those boundary nodes’ local neighborhoods—so cross-institution structure can be learned without sharing raw transactions. Because transaction data often has multi-year retention and compliance requirements, those embedding exchanges are protected with post-quantum cryptography to reduce “harvest now, decrypt later” risk.
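For a sense of the per-VASP component, here is a minimal local GNN sketch in PyTorch Geometric. The two-layer GraphSAGE encoder, layer sizes, and class names are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a per-VASP local GNN: a GraphSAGE encoder whose node
# embeddings double as boundary embeddings, plus a suspiciousness head.
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class LocalVASPGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=64, emb_dim=32):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, emb_dim)      # node embeddings (boundary nodes included)
        self.classifier = torch.nn.Linear(emb_dim, 2)   # licit vs. suspicious

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)                   # embeddings stay inside the institution
        return h, self.classifier(h)
```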
The Federated Training Loop
1. Each VASP trains GNN on local transaction graph
2. Boundary nodes identified (accounts with cross-VASP transactions)
3. Boundary embeddings computed and encrypted with PQ-secure scheme
4. Encrypted embeddings exchanged between relevant VASPs
5. Each VASP incorporates received embeddings into its local GNN
6. Model parameters (not data) aggregated across VASPs
7. Repeat until convergence (see the sketch below)
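A heavily simplified skeleton of that loop follows. Every piece is a stand-in: the "models" are plain weight matrices, the local update is a placeholder, and the post-quantum encryption of step 3 is omitted here (it is sketched separately under Key Results).

```python
# Hedged skeleton of the federated loop above; all helpers are stand-ins.
import numpy as np

rng = np.random.default_rng(0)

class VASP:
    """Stand-in for one participant; a real version would wrap a local GNN."""
    def __init__(self, name, n_nodes, dim=8):
        self.name = name
        self.features = rng.normal(size=(n_nodes, dim))  # local node features
        self.weights = rng.normal(size=(dim, dim))       # placeholder for GNN parameters
        self.boundary_idx = [0, 1]                       # accounts with cross-VASP edges (step 2)
        self.context = None

    def boundary_embeddings(self):
        # Step 3: embeddings for boundary nodes only (PQ encryption omitted in this sketch).
        return self.features[self.boundary_idx] @ self.weights

    def local_train_step(self, received):
        # Steps 1 & 5: placeholder update; a real loop would train the local GNN
        # with received boundary embeddings attached as virtual neighbors.
        if received:
            self.context = np.vstack(received)
        self.weights += 0.01 * rng.standard_normal(self.weights.shape)

vasps = [VASP("A", 10), VASP("B", 10)]
for _ in range(3):                                                        # step 7
    shared = {v.name: v.boundary_embeddings() for v in vasps}             # steps 2-4: exchange
    for v in vasps:
        v.local_train_step([e for n, e in shared.items() if n != v.name]) # steps 1 & 5
    avg = np.mean([v.weights for v in vasps], axis=0)                     # step 6: parameter averaging
    for v in vasps:
        v.weights = avg
```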
Why Boundary Embeddings?
Boundary embeddings aim for a privacy–utility sweet spot. Sharing raw subgraphs is a privacy non-starter; sharing only model parameters loses the structural signal at the boundary (VASP A can’t “see” the neighborhood structure at VASP B). Boundary embeddings exchange a lossy summary of what the local model learned about the boundary neighborhood, without exposing individual transactions, amounts, or counterparties.
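One way to read "lossy summary": the vector that crosses institutions could be something like a pooled, normalized view of the boundary node's local neighborhood embeddings, as in the illustrative snippet below (not the paper's exact construction). Individual transactions, amounts, and counterparties never appear in it.

```python
# Illustrative only: a boundary "summary" as the mean-pooled, L2-normalized view
# of a boundary node's 1-hop neighborhood embeddings. Only this fixed-size
# vector leaves the institution.
import numpy as np

neighborhood_embs = np.random.default_rng(1).normal(size=(5, 32))  # 5 local neighbors, dim 32

def boundary_summary(embs):
    pooled = embs.mean(axis=0)          # lossy: many neighborhoods map to the same vector
    return pooled / (np.linalg.norm(pooled) + 1e-8)

shared_vector = boundary_summary(neighborhood_embs)  # the 32 floats that cross VASP boundaries
```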
How We Evaluated
The evaluation asks whether federated graph learning across VASPs improves AML detection compared to isolated local training while maintaining data privacy.
Setting: Transaction graph scenarios simulating multiple VASPs with overlapping customer bases and cross-institution flows, with laundering chains deliberately spanning institutional boundaries.
We report detection performance (precision/recall/F1), cross-institution gain over local-only training, privacy analysis of information leakage from boundary embeddings, communication overhead of embedding exchange, and the added cost of post-quantum protection.
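For reference, the detection metrics are the standard ones and can be computed as below (toy labels only; the actual evaluation data is not reproduced here).

```python
# How the reported detection metrics are typically computed (toy example).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = suspicious account
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(precision_score(y_true, y_pred),  # fraction of flagged accounts that are truly suspicious
      recall_score(y_true, y_pred),     # fraction of suspicious accounts that get flagged
      f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```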
Key Results
The paper reports concrete gains on the Elliptic Bitcoin dataset under Louvain partitioning: FedGraph-VASP reaches F1 = 0.508 versus F1 = 0.453 for FedSage+, a 12.1% relative improvement, and in a high-connectivity regime it matches centralized performance (F1 = 0.620) in the reported setting. On an Ethereum dataset, the trade-off flips in the paper's results: FedGraph-VASP reports F1 = 0.635 while FedSage+ reports F1 = 0.855, suggesting that when connectivity is very sparse or modular, different design choices can dominate.
On the post-quantum side, the reported Kyber-512 sizes are 800 bytes (public key) and 768 bytes (ciphertext). Encryption is ~0.10 ms per embedding; batching 1,000 boundary nodes completes in ~95 ms (throughput > 10,000 embeddings/sec), adding <0.5% overhead to training time.
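A hedged sketch of how such a KEM-plus-symmetric hybrid could wrap an embedding batch is shown below. It assumes the liboqs-python bindings (`oqs`) with the Kyber512 mechanism enabled (newer liboqs builds name it ML-KEM-512) plus the `cryptography` package; the paper's exact protocol may differ, and the variable names are illustrative.

```python
# Hedged sketch: Kyber512 KEM to establish a shared secret, AES-GCM to encrypt
# the batch of boundary embeddings. Assumes liboqs-python and `cryptography`.
import os
import numpy as np
import oqs
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

embeddings = np.random.default_rng(0).normal(size=(1000, 32)).astype(np.float32)

with oqs.KeyEncapsulation("Kyber512") as receiver:
    recv_pk = receiver.generate_keypair()                     # ~800-byte public key

    # Sender side: encapsulate a fresh 32-byte shared secret, then AES-GCM the batch.
    with oqs.KeyEncapsulation("Kyber512") as sender:
        kem_ct, shared_secret = sender.encap_secret(recv_pk)  # ~768-byte KEM ciphertext
    nonce = os.urandom(12)
    payload = AESGCM(shared_secret).encrypt(nonce, embeddings.tobytes(), None)

    # Receiver side: recover the secret and decrypt the batch.
    secret = receiver.decap_secret(kem_ct)
    recovered = np.frombuffer(AESGCM(secret).decrypt(nonce, payload, None),
                              dtype=np.float32).reshape(1000, 32)
```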
For privacy positioning, embeddings are treated as informative but not “perfectly one-way”: the paper reports partial success in reconstruction (R² = 0.32) in its inversion test.
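The inversion test can be pictured as a reconstruction probe: train a regressor to map shared embeddings back to private node features and measure R². The snippet below runs that kind of probe on synthetic data; the 0.32 figure is the paper's reported result, not something this code reproduces.

```python
# Minimal embedding-inversion probe on synthetic data: can an attacker regress
# private node features back from the shared embeddings?
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 16))         # "private" node features
mix = rng.normal(size=(16, 32))
embeddings = np.tanh(features @ mix)           # lossy embedding of those features

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, features, random_state=0)
attacker = Ridge().fit(X_tr, y_tr)             # attacker's reconstruction model
print("reconstruction R^2:", r2_score(y_te, attacker.predict(X_te)))
```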
Discussion
The core motivation is partial visibility: laundering patterns span multiple institutions, while data sharing is constrained by privacy and regulation. Boundary embeddings are one way to exchange limited cross-institution signals without transferring raw graphs.
Designs that avoid raw graph sharing may be easier to evaluate under privacy and regulatory requirements, and boundary exchange resembles existing inter-VASP communication patterns (like Travel Rule messaging) at a higher abstraction level. The post-quantum layer targets long retention windows for financial data, where confidentiality needs to survive beyond current cryptographic assumptions.
Limitations and Next Steps
The approach depends on local training quality: sparse or noisy graphs produce less informative embeddings. The threat model also doesn’t fully cover adversarial VASPs—malicious participants could submit misleading embeddings to disrupt detection or probe others’ models. Scalability to hundreds of VASPs and millions of boundary nodes needs further optimization of the embedding exchange protocol, and the evaluation relies on simulated VASP partitions of public datasets rather than real institutional graphs; validation on real VASP data remains future work (subject to access agreements).
Natural next steps are adding formal DP guarantees for boundary embeddings, adopting Byzantine-robust aggregation for adversarial participants, running deployment pilots under regulatory sandbox frameworks, and extending to temporal graphs (since laundering strategies evolve over time).