A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data
A. Tharwat, W. Schenck, Mathematics 10 (2022).
Article
| Published
| English
Author(s)
Tharwat, Alaa;
Schenck, Wolfram
Abstract
Despite the availability of large amounts of free unlabeled data, collecting sufficient training data for supervised learning models is challenging because of the time and cost involved in the labeling process. The active learning technique we present here provides a solution by querying a small but highly informative set of unlabeled data. It ensures high generalizability across the instance space, improving classification performance on previously unseen test data. Most active learners query either the most informative or the most representative data for annotation. The proposed algorithm combines these two criteria in two phases: exploration and exploitation. The exploration phase explores the instance space by visiting new regions at each iteration. The exploitation phase attempts to select highly informative points in uncertain regions. Without any prior knowledge, such as initial training data, these two phases improve the search strategy of the proposed algorithm so that it can explore the minority class space of imbalanced data using a small query budget. Further, some pseudo-labeled points geometrically located in trusted explored regions around the newly labeled points are added to the training data, but with lower weights than the original labeled points. These pseudo-labeled points play several roles in our model, such as (i) increasing the size of the training data and (ii) decreasing the size of the version space by reducing the number of hypotheses that are consistent with the training data. Experiments on synthetic and real datasets with different imbalance ratios and dimensions show that the proposed algorithm has significant advantages over various well-known active learners.
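The two-phase idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`explore`, `exploit`, `active_learn`), the farthest-point exploration rule, the inverse-distance vote used as a classifier, and the parameters `radius` and `pseudo_weight` are all assumptions chosen only to make the idea concrete. Pseudo-labeled points are also removed from the query pool here, which is a simplification.

```python
import math

def explore(pool, queried):
    """Exploration: index of the pool point farthest from all queried points."""
    if not queried:
        return 0  # no prior knowledge: start from an arbitrary point
    return max(range(len(pool)),
               key=lambda i: min(math.dist(pool[i], q) for q in queried))

def class_scores(x, train):
    """Weighted inverse-distance vote over (point, label, weight) triples."""
    scores = {}
    for p, y, w in train:
        scores[y] = scores.get(y, 0.0) + w / (math.dist(x, p) + 1e-9)
    return scores

def predict(x, train):
    """Label with the highest weighted vote."""
    scores = class_scores(x, train)
    return max(scores, key=scores.get)

def exploit(pool, train):
    """Exploitation: index of the pool point with the smallest normalized margin."""
    def margin(i):
        s = sorted(class_scores(pool[i], train).values(), reverse=True)
        runner_up = s[1] if len(s) > 1 else 0.0
        return (s[0] - runner_up) / sum(s)
    return min(range(len(pool)), key=margin)

def active_learn(points, oracle, budget, n_explore, radius=0.5, pseudo_weight=0.3):
    """Query up to `budget` labels: first explore, then exploit (assumed schedule)."""
    pool, train, queried = list(points), [], []
    for t in range(budget):
        if not pool:
            break
        use_explore = t < n_explore or not train
        i = explore(pool, queried) if use_explore else exploit(pool, train)
        x = pool.pop(i)
        y = oracle(x)  # the only place a true label is requested
        queried.append(x)
        train.append((x, y, 1.0))
        # Pseudo-label unqueried neighbors of x with a lower weight (assumed rule).
        near = [p for p in pool if math.dist(p, x) <= radius]
        train.extend((p, y, pseudo_weight) for p in near)
        pool = [p for p in pool if math.dist(p, x) > radius]
    return train, queried

# Toy imbalanced data: 9 majority points near the origin, 2 minority points far away.
majority = [(0.4 * i, 0.4 * j) for i in range(3) for j in range(3)]
minority = [(5.0, 5.0), (5.3, 5.0)]
oracle = lambda x: int(x[0] > 2.5)  # hypothetical labeling function
train, queried = active_learn(majority + minority, oracle, budget=4, n_explore=2)
print([oracle(x) for x in queried])
```

In this toy setting the exploration step is what reaches the distant minority cluster with very few queries; the exploitation step then spends the remaining budget where the weighted vote is least decisive.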
Year of publication
Journal title
Mathematics
Volume
10
Issue
7
Article number
1068
1068
eISSN
FH-PUB-ID
Cite this
Tharwat, Alaa ; Schenck, Wolfram: A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data. In: Mathematics vol. 10, MDPI AG (2022), no. 7
Tharwat A, Schenck W. A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data. Mathematics. 2022;10(7). doi:10.3390/math10071068
Tharwat, A., & Schenck, W. (2022). A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data. Mathematics, 10(7). https://doi.org/10.3390/math10071068
@article{Tharwat_Schenck_2022, title={A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data}, volume={10}, DOI={10.3390/math10071068}, number={7}, journal={Mathematics}, publisher={MDPI AG}, author={Tharwat, Alaa and Schenck, Wolfram}, year={2022} }
Tharwat, Alaa, and Wolfram Schenck. “A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data.” Mathematics 10, no. 7 (2022). https://doi.org/10.3390/math10071068.
A. Tharwat and W. Schenck, “A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data,” Mathematics, vol. 10, no. 7, 2022.
Tharwat, Alaa, and Wolfram Schenck. “A Novel Low-Query-Budget Active Learner with Pseudo-Labels for Imbalanced Data.” Mathematics, vol. 10, no. 7, 1068, MDPI AG, 2022, doi:10.3390/math10071068.
All files available under the following license(s):
Creative Commons Attribution 4.0 International Public License (CC BY 4.0):
Link(s) to full text(s)
Access Level
Closed Access