Loss-Aware Histogram Binning and Principal Component Analysis for Customer Fleet Analytics

Title details

Ling, Kunxiong ; Thiele, Jan ; Setzer, Thomas:
Loss-Aware Histogram Binning and Principal Component Analysis for Customer Fleet Analytics.
In: IEEE Open Journal of Intelligent Transportation Systems. 5 (15 February 2024), pp. 160-173.
ISSN 2687-7813

Full text

Open Access
Text (PDF)
Available under the following license: Creative Commons: Attribution 4.0 International (CC BY 4.0).

Link to full text (external URL):
https://doi.org/10.1109/OJITS.2024.3366279

Abstract

We propose a method to estimate information loss when conducting histogram binning and principal component analysis (PCA) sequentially, as is usually done in practice for fleet analytics. Coarser-grained histogram binning results in less data volume and fewer dimensions but more information loss. Considering fewer principal components (PCs) results in fewer data dimensions but increased information loss. Although the information loss of each step is well understood, little guidance exists on the overall information loss when conducting both steps sequentially. We use Monte Carlo simulations to regress information loss on the number of bins and PCs, given a few parameters of a dataset related to its scale and correlation structure. A sensitivity study shows that information loss can be approximated well given sufficiently large datasets. Using the number of bins, the number of PCs, and two correlation measures, we derive an empirical loss model with high accuracy. Furthermore, we demonstrate the benefits of estimating information losses and the representativeness of total loss in evaluating the accuracy of k-means clustering for a real-world customer fleet dataset. For preprocessing sensor data that are aggregated from a sufficient number of samples, are continuously distributed, and can be represented by Beta distributions, we recommend not coarsening the histogram binning before PCA.
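
The two-step pipeline described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' method: the abstract does not define the loss measure, so the sketch uses a stand-in (squared reconstruction error relative to the variance of the fine-grained histograms). The function names, the Beta parameter ranges, the fleet size, and the uniform "refine" step are all hypothetical choices made for illustration.

```python
import numpy as np

def fleet_histograms(n_vehicles, n_samples, n_bins, rng):
    """Simulate one normalized histogram per vehicle from Beta-distributed readings."""
    # Assumption: each vehicle gets its own Beta shape parameters.
    a = rng.uniform(1.0, 5.0, size=n_vehicles)
    b = rng.uniform(1.0, 5.0, size=n_vehicles)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    hists = np.empty((n_vehicles, n_bins))
    for i in range(n_vehicles):
        readings = rng.beta(a[i], b[i], size=n_samples)
        counts, _ = np.histogram(readings, bins=edges)
        hists[i] = counts / n_samples
    return hists

def coarsen(hists, factor):
    """Merge every `factor` adjacent bins by summing their relative frequencies."""
    n, m = hists.shape
    return hists.reshape(n, m // factor, factor).sum(axis=2)

def refine(coarse, factor):
    """Spread each coarse bin uniformly over the fine bins it replaced."""
    return np.repeat(coarse / factor, factor, axis=1)

def pca_reconstruct(x, n_pcs):
    """Project onto the first n_pcs principal components and map back."""
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[:n_pcs].T
    return centered @ v @ v.T + mean

def total_loss(fine, factor, n_pcs):
    """Stand-in loss: squared error after binning and PCA, relative to fine variance."""
    approx = refine(pca_reconstruct(coarsen(fine, factor), n_pcs), factor)
    return np.sum((fine - approx) ** 2) / np.sum((fine - fine.mean(axis=0)) ** 2)

rng = np.random.default_rng(0)
fine = fleet_histograms(n_vehicles=200, n_samples=5000, n_bins=32, rng=rng)
for factor in (1, 4):
    for n_pcs in (2, 8):
        loss = total_loss(fine, factor, n_pcs)
        print(f"bins={32 // factor:2d} PCs={n_pcs}: loss={loss:.4f}")
```

Comparing the printed values for a fixed number of PCs gives a rough feel for how coarsening the bins before PCA adds to the overall loss under this stand-in measure; the paper's own loss model and simulation design may differ.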

Further details

Publication type: Article
Language of entry: English
University institutions: Wirtschaftswissenschaftliche Fakultät > Betriebswirtschaftslehre > ABWL und Wirtschaftsinformatik
DOI / URN / ID: 10.1109/OJITS.2024.3366279
Open Access (full text freely accessible?): Yes
Peer-reviewed journal: Yes
Publisher: IEEE (Institute of Electrical and Electronics Engineers, Inc.)
The journal is indexed in:
Publication originated at KU: Yes
KU.edoc-ID: 33074
Deposited on: 18 Mar 2024 11:46
Last modified: 18 Mar 2024 11:46
URL of this record: https://edoc.ku.de/id/eprint/33074/