
Deterministic Replay Using Global Clock

Chen, Yunji ; Chen, Tianshi ; Li, Ling ; Wu, Ruiyang ; Liu, Daofu ; Hu, Weiwu

ACM Transactions on Architecture and Code Optimization, 2013-04, Vol.10 (1), p.1-28 [Peer-reviewed journal]

ACM

  • Title:
    Deterministic Replay Using Global Clock
  • Authors: Chen, Yunji ; Chen, Tianshi ; Li, Ling ; Wu, Ruiyang ; Liu, Daofu ; Hu, Weiwu
  • Subjects: Chips ; CMP ; design for debug ; Deterministic replay ; global clock ; pending period ; physical time order
  • Is part of: ACM Transactions on Architecture and Code Optimization, 2013-04, Vol.10 (1), p.1-28
  • Description: Debugging parallel programs is a well-known difficult problem. A promising way to facilitate it is to use hardware support to achieve deterministic replay on a Chip Multi-Processor (CMP). As a Design-For-Debug (DFD) feature, a practical hardware-assisted deterministic replay scheme should have low design and verification costs, as well as a small log size. To achieve these goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that, instead of recording the logical time orders between instructions or instruction blocks as previous investigations did, LReplay is built upon recording the pending period information infused by the global clock. From the recorded pending period information, about 99% of execution orders are inferrable, implying that LReplay only needs to directly record the residual 1% of noninferrable execution orders in a production run. The 1% of noninferrable orders can be addressed by a simple yet cost-effective direction prediction technique, which further reduces the log size of LReplay. Benefiting from the preceding innovations, the overall log size of LReplay over the SPLASH-2 benchmarks is about 0.17B/K-Inst (bytes per thousand instructions) for sequential consistency, and 0.57B/K-Inst for the Godson-3 consistency. Such log sizes are an order of magnitude smaller than those of previous deterministic replay schemes that incur no performance loss. Furthermore, LReplay consumes only about 0.5% of the area of the Godson-3 CMP, since it requires only trivial modifications to existing components of Godson-3. The features of LReplay demonstrate the potential of integrating hardware support for deterministic replay into future industrial processors.
  • Publisher: ACM
  • Language: English
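
The idea in the description, that most execution orders can be inferred from pending periods stamped by a global clock, can be illustrated with a small software sketch. This is not the LReplay hardware and is not taken from the article: the `MemOp` type, the `classify_orders` function, the interpretation of a pending period as a closed interval of global-clock cycles, and the sample cycle numbers are all assumptions made for illustration. Under those assumptions, if one operation's pending period ends before another's begins, their order follows from physical time; if the periods overlap, the order is not inferrable and would have to be logged explicitly.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MemOp:
    """Hypothetical memory operation with an assumed pending period."""
    name: str
    start: int  # global-clock cycle at which the operation became pending (assumed)
    end: int    # global-clock cycle at which the operation completed (assumed)


def classify_orders(ops):
    """Split every pair of operations into inferrable and noninferrable orders.

    A pair is inferrable when the two pending periods do not overlap:
    the operation whose period ends first must have executed first.
    Overlapping pairs cannot be ordered from timing alone, so a
    recorder would need to log them directly.
    """
    inferred = []  # (earlier, later) pairs ordered by physical time
    logged = []    # pairs whose order must be recorded explicitly
    for a, b in combinations(ops, 2):
        if a.end < b.start:
            inferred.append((a.name, b.name))
        elif b.end < a.start:
            inferred.append((b.name, a.name))
        else:
            logged.append((a.name, b.name))
    return inferred, logged


# Illustrative data only; the cycle numbers are invented.
ops = [
    MemOp("st_x@core0", 10, 14),
    MemOp("ld_x@core1", 20, 23),
    MemOp("st_x@core2", 22, 30),
]
inferred, logged = classify_orders(ops)
print("inferred:", inferred)
print("must log:", logged)
```

In this toy input, the store on core 0 finishes before either other operation becomes pending, so two of the three pairwise orders are inferred from the clock alone, and only the overlapping pair on cores 1 and 2 needs a log entry. A real scheme would consider only conflicting accesses and would compress the logged cases further, for example with the direction prediction mentioned in the description.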
