Project Details

System-Physician-on-a-Chip (SPOC): Chip Health-Monitoring Infrastructure IP and Run-Time Adaptation

Subject Area: Computer Architecture, Embedded and Massively Parallel Systems; Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Term: 2015 to 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG), Project number 269744693
 
Final Report Year 2021

Final Report Abstract

Design-time solutions and guard-bands for resilience are no longer sufficient for nanoscale integrated circuits (ICs). Because of process variations, each chip is born with a unique personality (“nature”), and because of operating conditions, environment, and workload, it grows uniquely (“nurture”). This project is motivated by the need to guarantee that each system behaves acceptably despite its particular nature and nurture (“resilience”). Resilience has been defined as the persistence of a performance level that can justifiably be trusted in the presence of change. Hence, static solutions based on predetermined adaptation strategies cannot provide adequate resilience as systems evolve over time. While today’s ICs incorporate a large number of sensors (thermal, voltage, delay, etc.) for runtime monitoring, breakthroughs are needed to extract useful information from sensor data, perform real-time analysis, and make decisions about online adaptation. Appropriate reasoning methods are also needed to deal with inconsistent or contradictory sensor data caused by stress-, process-, and workload-induced spatiotemporal variations. It is important to predict the system state so that countermeasures can be taken before a failure occurs. The research focuses on data-driven techniques for guiding dynamic adaptation policies. This level of dynamic decision-making and prediction-based control is a significant step towards resilient systems. The intellectual merit lies in the advancement of data analytics solutions for reasoning about on-chip behavior, the integration of prediction-based adaptation, and the update of adaptation strategies based on the success, or lack thereof, of past adaptation decisions. The project results in a health-monitoring infrastructure IP that lets a system respond to changes in behavior occurring at different time scales. A hybrid hardware/software implementation is considered; for real-time decisions (e.g., response to voltage droop), the IP is designed purely in hardware.
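
The idea of predicting system state so that a countermeasure is taken before a failure occurs can be illustrated with a minimal sketch. It is not the project's method: the class name PredictiveMonitor, the least-squares trend extrapolation, the sample readings, and the threshold are all assumptions chosen for illustration, and a real design would rely on the learned models described in the publications listed under Publications.

```python
from collections import deque


class PredictiveMonitor:
    """Hypothetical monitor (illustration only): extrapolates a sensor trend
    and requests a countermeasure before the predicted value reaches a limit."""

    def __init__(self, limit, window=4, horizon=2):
        self.limit = limit        # assumed failure threshold, e.g. normalized path delay
        self.horizon = horizon    # number of future samples to look ahead
        self.samples = deque(maxlen=window)  # most recent sensor readings

    def predict(self):
        """Fit a least-squares line to the window and evaluate it
        `horizon` samples past the newest reading."""
        n = len(self.samples)
        if n == 0:
            return None
        if n == 1:
            return self.samples[0]
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(self.samples) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.samples))
        slope = cov_xy / var_x
        return mean_y + slope * ((n - 1 + self.horizon) - mean_x)

    def update(self, reading):
        """Record a reading; return True if a countermeasure should be taken now."""
        self.samples.append(reading)
        predicted = self.predict()
        return predicted is not None and predicted >= self.limit


# Made-up readings of a slowly degrading delay sensor (normalized units).
monitor = PredictiveMonitor(limit=0.99, window=4, horizon=2)
readings = [0.80, 0.84, 0.88, 0.92]
actions = [monitor.update(r) for r in readings]
print(actions)
```

In this sketch the countermeasure is requested at the fourth reading, while the measured value (0.92) is still below the assumed limit, because the extrapolated trend reaches the limit within the look-ahead horizon. Updating the adaptation strategy based on the outcome of past decisions, as the abstract describes, is outside the scope of the sketch.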

Publications

  • "Self-awareness and self-learning for resiliency in real-time systems", IEEE International On-Line Testing Symposium (IOLTS), 2015, Greece
    M.B. Tahoori, A. Chatterjee, K. Chakrabarty, A. Koneru, A. Vijayan and D. Banerjee
    (See online at https://doi.org/10.1109/IOLTS.2015.7229845)
  • "On-chip Droop-induced Circuit Delay Prediction Based on Support-Vector Machines", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2016
    F. Ye, F. Firouzi, Y. Yang, K. Chakrabarty, and M.B. Tahoori
    (See online at https://doi.org/10.1109/TCAD.2015.2474392)
  • "Online Soft-Error Vulnerability Estimation for Memory Arrays", in Proceedings of the VLSI Test Symposium (VTS), 2016, USA
    A. Vijayan, A. Koneru, M. Ebrahimi, K. Chakrabarty, and M.B. Tahoori
    (See online at https://doi.org/10.1109/VTS.2016.7477301)
  • "Fine-Grained Aging-Induced Delay Prediction Based on the Monitoring of Run-Time Stress", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2017
    A. Vijayan, A. Koneru, S. Kiamehr, K. Chakrabarty, and M.B. Tahoori
    (See online at https://doi.org/10.1109/TCAD.2016.2620903)
  • "Online Soft-Error Vulnerability Estimation for Memory Arrays and Logic Cores", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2017
    A. Vijayan, S. Kiamehr, M. Ebrahimi, K. Chakrabarty, and M.B. Tahoori
    (See online at https://doi.org/10.1109/TCAD.2017.2706558)
  • "Run-time hardware trojan detection using performance counters", Proc. IEEE International Test Conference, 2017
    R. Elnaggar, K. Chakrabarty and M. Tahoori
    (See online at https://doi.org/10.1109/TEST.2017.8242063)
  • "Workload-aware Static Aging Monitoring of Timing Critical Flip-flops", in Proceedings of the Asia and South Pacific Design Automation Conference (ASPDAC), 2017, Japan
    A. Vijayan, S. Kiamehr, F. Oboril, K. Chakrabarty, and M.B. Tahoori
    (See online at https://doi.org/10.1109/ASPDAC.2017.7858316)
  • "Workload-aware Static Aging Monitoring and Mitigation of Timing-critical Flip-flops", in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018
    A. Vijayan, S. Kiamehr, F. Oboril, K. Chakrabarty, and M.B. Tahoori
    (See online at https://doi.org/10.1109/TCAD.2017.2778254)
  • "Hardware trojan detection using changepoint-based anomaly detection techniques", in IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2019
    R. Elnaggar, K. Chakrabarty and M. B. Tahoori
    (See online at https://doi.org/10.1109/TVLSI.2019.2925807)
  • "Runtime Identification of Hardware Trojans by Feature Analysis on Gate-level Unstructured Data and Anomaly Detection", in ACM Transactions on Design Automation of Electronic Systems (TODAES), 2020
    A. Vijayan, M. Tahoori, and K. Chakrabarty
    (See online at https://doi.org/10.1145/3391890)
 
 
