Publication Details
ECHO: Effective Coreset-Driven Learning via Hierarchical Optimizations
Published in: IEEE International Conference on Data Mining 2025
Despite driving record performance, the increasing reliance of deep learning on ever-larger datasets has led to prohibitively high storage and management costs that threaten continued progress. While coreset selection offers a promising solution to this challenge, existing methods often rely on expensive iterative optimization procedures or fail to select samples that allow strong generalization across tasks. In this work, we introduce ECHO, a coreset construction and augmentation strategy that leverages the relational properties inherent to a dataset to find its most representative samples. Unlike prior methods, our approach constructs a structured graph that encodes intrinsic dataset patterns, based on which influential samples are identified and augmented to maximize generalization performance. Extensive experiments across five benchmark datasets and against eighteen different coreset selection baselines show that ECHO achieves up to 60% accuracy gains under extreme compression, while being orders of magnitude faster than state-of-the-art alternatives. These results establish a new benchmark for data-efficient learning, particularly under tight coreset budgets, and showcase the benefits of structured coreset selection for effective generalization.
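The general idea of graph-based coreset selection can be illustrated with a minimal sketch. This is not ECHO's algorithm, which the abstract does not specify. It assumes a symmetric k-nearest-neighbour graph over feature vectors and uses a PageRank-style centrality as a stand-in for "influence". The function names (`knn_graph`, `centrality`, `select_coreset`) and all parameters are hypothetical, and the augmentation step is omitted.

```python
import numpy as np

def knn_graph(features, k):
    """Symmetric k-nearest-neighbour adjacency matrix (squared Euclidean distance)."""
    n = features.shape[0]
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint chose it

def centrality(adj, damping=0.85, iters=100):
    """PageRank-style scores via power iteration (assumed influence proxy)."""
    n = adj.shape[0]
    transition = adj / adj.sum(axis=0)  # normalise each column to sum to 1
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1.0 - damping) / n + damping * transition @ scores
    return scores

def select_coreset(features, budget, k=5):
    """Indices of the `budget` most central samples in the k-NN graph."""
    scores = centrality(knn_graph(features, k))
    return np.argsort(-scores)[:budget]

# Usage: pick 10 of 50 synthetic 8-dimensional samples.
rng = np.random.default_rng(0)
samples = rng.normal(size=(50, 8))
coreset_indices = select_coreset(samples, budget=10)
```

Selecting purely by centrality tends to favour dense regions of the data, so a practical method would likely also need a diversity or coverage term. The sketch only shows how a relational structure over samples can drive selection without iterative model training.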