Identifying the Root Causes of Wait States in Large-Scale Parallel Applications

Title details

Böhme, David ; Geimer, Markus ; Arnold, Lukas ; Voigtlaender, Felix ; Wolf, Felix:
Identifying the Root Causes of Wait States in Large-Scale Parallel Applications.
In: ACM Transactions on Parallel Computing. 3 (2016) 2: 11. - 24 pp.
ISSN 2329-4949 ; 2329-4957

Full text

Link to full text (external URL):
https://doi.org/10.1145/2934661

Abstract

Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira, Jr., et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances, even for runs with hundreds of thousands of processes.

Further details

Publication type: Article
Language of the entry: English
University institutions: Faculty of Mathematics and Geography > Mathematics > Chair of Reliable Machine Learning
DOI / URN / ID: 10.1145/2934661
Open access (full text freely accessible?): No
Peer-reviewed journal: Yes
Publisher: ACM
Title originated at the KU: No
KU.edoc ID: 29920
Deposited on: 01 Apr 2022 12:37
Last modified: 05 Apr 2022 12:18
URL of this record: https://edoc.ku.de/id/eprint/29920/