

# Latency, Power, and Security Optimization in Distributed Reconfigurable Embedded Systems

#### Hyunsuk Nam and Roman Lysecky

Electrical and Computer Engineering, University of Arizona, Tucson, AZ, hnam@email.arizona.edu





# Outline

- Introduction and Motivation
- Related Work
- Research Objectives
- Modeling and Optimization of Distributed Heterogeneous Embedded Systems
- Experimental Results
- Conclusions and Future Work



#### **Introduction – Distributed Heterogeneous Embedded Systems**



Distributed embedded systems are composed of heterogeneous computing resources, including processors, FPGAs, and custom HW



#### **Introduction – Distributed Heterogeneous Embedded Systems**



Traditional approaches to security focus only on cryptography for inter-device communication



## **Introduction - Malware**





McAfee Labs Threats Report: Nov. 2015

- Malware growing at an alarming rate
  - 100,000 new malware every day
- Malware can affect both SW and HW
  - FPGAs are reconfigurable and can potentially be reconfigured by malicious software




# Introduction – Integrating Security within Design Process

- Goal: Security needs to be integrated within the design and optimization process
  - As important as other evaluation metrics (e.g., latency, energy)
- Need for a method to quantify security within the design process
  - Enables analysis of the impact of different cryptographic implementations





#### **Related Work - Integrating Security within Design Process (1)**

- Hardware/software co-design for secure automotive systems [Jiang et al., DATE 2012]
  - Mapping and scheduling of tasks for ECUs
  - Cryptography used for inter-ECU communication
  - Goal is to optimize the number of AES cryptography rounds



K. Jiang, P. Eles, and Z. Peng, "Co-Design Techniques for Distributed Real-Time Embedded Systems with Communication Security Constraints," *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, pp. 947-952, March 2012.



#### **Related Work - Integrating Security within Design Process (2)**

- Task allocation and network scheduling [Selicean & Pop, ACM TECS 2015]
  - Design methodology for security-aware authentication supporting FlexRay and the Time-Triggered Protocol
  - Determines task allocation, priority assignment, network scheduling, and key release
  - Goal is to minimize the sum of the worst-case latencies
  - Seeks to optimize the cryptography and authentication methods utilized within distributed automotive electronics
  - Does not consider intra-device cryptography, wireless communication, or energy constraints

D. T. Selicean and D. P. Pop, "Design Optimization of Mixed-Criticality Real-Time Embedded Systems," ACM Transactions on Embedded Computing Systems (TECS), vol. 14, no. 50, May 2015.



# **Objectives**

- Objectives and goals:
  - Design methodology for optimizing dataflow applications using distributed, heterogeneous, and reconfigurable embedded systems
  - Consider embedded devices incorporating reconfigurable FPGAs, supporting mapping of tasks between SW and HW alternatives
  - Support cryptography between all task implementations, including inter- and intra-device, SW and HW
  - Develop an integrated modeling framework for computation, communication, security, and power
  - Optimize security, latency, and power consumption given constraints on the other metrics
  - Define security levels to quantify the trade-off between power and security





# **Security Levels**

- Security Level
  - Defines a relative ranking of strength of the selected cryptography method
  - Can be used to rank different cryptographic alternatives and configurations thereof

| Security Level | Key Size (bits) | Number of Rounds |
|----------------|-----------------|------------------|
| 12                | 256                | 14               |
| 11                | 256                | 13               |
| 10                | 256                | 12               |
| 9                 | 256                | 11               |
| 8                 | 256                | 10               |
| 7                 | 192                | 12               |
| 6                 | 192                | 11               |
| 5                 | 192                | 10               |
| 4                 | 192                | 9                |
| 3                 | 128                | 10               |
| 2                 | 128                | 9                |
| 1                 | 128                | 8                |
| 0                 | 0                  | 0                |
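The table above can be encoded as a lookup so that an optimizer can treat the security level as a single integer design variable. The following is a minimal Python sketch; the names `CryptoConfig`, `SECURITY_LEVELS`, and `min_level_for` are hypothetical and are not taken from the original framework.

```python
from typing import NamedTuple


class CryptoConfig(NamedTuple):
    key_bits: int  # key size in bits (0 means no encryption)
    rounds: int    # number of cipher rounds


# Security level -> (key size, rounds), copied from the table above.
SECURITY_LEVELS = {
    12: CryptoConfig(256, 14),
    11: CryptoConfig(256, 13),
    10: CryptoConfig(256, 12),
    9: CryptoConfig(256, 11),
    8: CryptoConfig(256, 10),
    7: CryptoConfig(192, 12),
    6: CryptoConfig(192, 11),
    5: CryptoConfig(192, 10),
    4: CryptoConfig(192, 9),
    3: CryptoConfig(128, 10),
    2: CryptoConfig(128, 9),
    1: CryptoConfig(128, 8),
    0: CryptoConfig(0, 0),
}


def min_level_for(key_bits: int, rounds: int) -> int:
    """Return the lowest security level whose key size and round count
    both meet the requested minimums (hypothetical helper)."""
    candidates = [
        level
        for level, cfg in SECURITY_LEVELS.items()
        if cfg.key_bits >= key_bits and cfg.rounds >= rounds
    ]
    if not candidates:
        raise ValueError("no security level satisfies the request")
    return min(candidates)
```

Because the levels are a relative ranking, a constraint such as "at least a 192-bit key and 10 rounds" reduces to a single lower bound on the level.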



## **Parameterized Dataflow Application Model**



- Use the parameterized synchronous dataflow (PSDF) model
  - Specify
    - System tasks
    - Parameterizable data sizes
    - Tokens transmitted between tasks
- Dataflow model for a video-based object detection and tracking application

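[Figure: dataflow model of the video-based object detection and tracking application; labels include "Target ID" and "Target Classified images"]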
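One way to capture tasks, parameterizable data sizes, and tokens in code is a small graph structure in which each edge computes its token size from the model parameters. This is only a sketch: the task names, parameters, and token sizes below are hypothetical placeholders and do not reproduce the actual application graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Params = Dict[str, int]


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    # Token size in bytes as a function of the model parameters.
    token_bytes: Callable[[Params], int]


# Hypothetical tasks and edges; not the actual graph from the slides.
TASKS: List[str] = ["capture", "detect", "track"]
EDGES: List[Edge] = [
    # One 8-bit grayscale frame per iteration (assumed).
    Edge("capture", "detect", lambda p: p["width"] * p["height"]),
    # 16 bytes of metadata per detected target (assumed).
    Edge("detect", "track", lambda p: 16 * p["max_targets"]),
]


def bytes_per_iteration(params: Params) -> int:
    """Total bytes moved across all edges for one graph iteration."""
    return sum(edge.token_bytes(params) for edge in EDGES)
```

Changing a parameter such as the frame size then changes every dependent token size without editing the graph itself.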



# **Execution Latency Model**



- Specifies software and hardware task alternatives
  - Assumes all tasks can be implemented in HW or SW
- Software Latency
  - Latency of a task is based upon physical measurement from specific device
  - Linear scaling is applied to adjust for specific processor frequency
- Hardware Latency
  - Latency is measured in clock cycles based on RTL simulation
  - Frequency of hardware is limited by the ED's system bus or synthesis results
  - Hardware size is constrained to the size of the reconfigurable region/tile
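The two latency rules above can be sketched as follows. The function names are hypothetical, and clocking a hardware task at the lower of the bus frequency and the synthesis result is an assumed reading of "limited by".

```python
def sw_latency(measured_s: float, measured_hz: float, target_hz: float) -> float:
    """Scale a measured software latency linearly to another clock frequency."""
    if measured_s < 0 or measured_hz <= 0 or target_hz <= 0:
        raise ValueError("latency must be >= 0 and frequencies must be > 0")
    return measured_s * measured_hz / target_hz


def hw_latency(cycles: int, synth_fmax_hz: float, bus_hz: float) -> float:
    """Convert an RTL-simulated cycle count to seconds.

    Assumption: the task runs at the lower of the synthesis fmax
    and the embedded device's system bus frequency.
    """
    if cycles < 0 or synth_fmax_hz <= 0 or bus_hz <= 0:
        raise ValueError("cycles must be >= 0 and frequencies must be > 0")
    clock_hz = min(synth_fmax_hz, bus_hz)
    return cycles / clock_hz
```

For example, a task measured on one processor can be re-estimated for a device with a faster clock by calling `sw_latency` with the two frequencies.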



# **Communication Latency Model**

Latency functions labeled in the figure: CL<sub>SS</sub>(w), CL<sub>SH</sub>(w), CL<sub>HS</sub>(w), CL<sub>HH</sub>(w), CLD<sub>SS</sub>(w), CLD<sub>SH</sub>(w), CLD<sub>HS</sub>(w), CLD<sub>HH</sub>(w)



- Communication latency model
  - Use physical measurements to determine latency for each communication mode and token size
  - Using IEEE 802.11g
  - Eight possible communication modes for transferring data between tasks, depending on the task implementations
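Selecting among the eight modes can be sketched as a function of where the two tasks run and how they are implemented. This sketch assumes that the `CL` functions cover intra-device transfers and the `CLD` functions cover inter-device transfers, and that the subscript gives the source and destination implementation (`S` for SW, `H` for HW); the function and variable names are hypothetical.

```python
from itertools import product

# All eight mode names: {CL, CLD} x {S, H} source x {S, H} destination.
ALL_MODES = [f"{p}_{s}{d}" for p, s, d in product(("CL", "CLD"), "SH", "SH")]


def comm_mode(src_dev: str, src_impl: str, dst_dev: str, dst_impl: str) -> str:
    """Return the mode name for a transfer, e.g. 'CL_SH' or 'CLD_HS'.

    Assumption: 'CL' = same device, 'CLD' = different devices.
    """
    if src_impl not in ("S", "H") or dst_impl not in ("S", "H"):
        raise ValueError("implementation must be 'S' (SW) or 'H' (HW)")
    prefix = "CL" if src_dev == dst_dev else "CLD"
    return f"{prefix}_{src_impl}{dst_impl}"
```

A measured latency table keyed by these mode names and by token size could then be looked up for every edge of the dataflow graph.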



Inter-device communication



# **Power Model**



- SW Power (P<sub>SW</sub>)
  - Characterizes the active and idle power consumption of each μP
- HW Power (P<sub>HW</sub>)
  - RTL implementation for each hardware task
  - Post-synthesis power estimation
- Communication Power (P<sub>c</sub>)
  - Physical measurements of communication middleware on EDs
  - Latency based on data transferred, operating frequency, and communication mode
- Security Power consumption (P<sub>s</sub>)
  - Utilized prototype SW and HW implementations for each SL
  - Created regression model based on key size, number of rounds, and data size
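The form of the regression model is not given here. As one possibility, a linear model in key size, number of rounds, and data size can be fitted by ordinary least squares. The sketch below assumes that linear form; `fit_linear` and `predict` are hypothetical names, and any measurements passed in must come from the prototype implementations.

```python
from typing import List, Sequence


def fit_linear(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Each row is (key_bits, rounds, data_size); an intercept column is added.
    Returns [intercept, b_key, b_rounds, b_data].
    """
    X = [[1.0, *map(float, r)] for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coef: Sequence[float], key_bits: float, rounds: float,
            data_size: float) -> float:
    """Evaluate the fitted linear model for one configuration."""
    return coef[0] + coef[1] * key_bits + coef[2] * rounds + coef[3] * data_size
```

The normal equations are adequate for a handful of well-scaled samples; a library solver would be preferable for larger or poorly conditioned data sets.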




## **DESIGN SPACE EXPLORATION METHODOLOGY**





# **Experimental Setup**



- Base latency constraint: 8 sec
- Relaxed latency constraint: 12 sec



#### Example task mapping for a particular system configuration





THE UNIVERSITY OF ARIZONA College of Engineering

#### **Experimental Results – Genetic Optimization Algorithm**



- For the base constraint, all population sizes reach within 0.1% of optimal after 50 generations
- For the relaxed constraint, population sizes of 75 and 100 reach within 0.1% of optimal after 100 generations
- Configured the genetic optimization algorithm to use a population size of 75 and 100 generations
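A genetic search over SW/HW task mappings with these settings can be sketched as follows. Only the population size (75) and generation count (100) come from the configuration above. The elitist selection, single-point crossover, mutation rate, penalty weight, and the `options` data layout are assumptions chosen for illustration, and the genome covers only the SW/HW choice, not the security level.

```python
import random
from typing import List, Sequence, Tuple

# options[i][g] = (latency_s, power_w) of task i with g = 0 (SW) or 1 (HW).
Options = Sequence[Tuple[Tuple[float, float], Tuple[float, float]]]


def fitness(genome: Sequence[int], options: Options, max_latency: float) -> float:
    """Total power plus a penalty for exceeding the latency constraint (lower is better)."""
    latency = sum(options[i][g][0] for i, g in enumerate(genome))
    power = sum(options[i][g][1] for i, g in enumerate(genome))
    return power + 1000.0 * max(0.0, latency - max_latency)


def optimize(options: Options, max_latency: float, pop_size: int = 75,
             generations: int = 100, mutation_rate: float = 0.1,
             seed: int = 0) -> List[int]:
    """Return the best SW/HW mapping found (requires at least two tasks)."""
    rng = random.Random(seed)  # seeded for reproducible runs
    n = len(options)

    def score(genome: Sequence[int]) -> float:
        return fitness(genome, options, max_latency)

    pop = [[rng.randrange(2) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        elite = pop[: pop_size // 5]  # keep the best 20% unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each gene independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=score)
```

Keeping the elite unchanged guarantees that the best mapping found so far is never lost between generations, which makes convergence curves monotonic and easy to compare across population sizes.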



### **Experimental Results - Power vs. Security Level**



- Each doubling of key size (with the same number of rounds) increases power by an average of 0.2% (e.g., SL 3 to SL 8)
- Each increase in the number of rounds (with the same key size) increases power by an average of 1.6% (e.g., SL 8 to SL 12)



#### **Experimental Results – Hardware/Security/Power Tradeoffs**



Increasing the number of hardware accelerators results in lower power








# **Conclusions and Future Work**

#### Conclusions

- Application modeling and optimization framework for dataflow applications
- Different cryptographic configurations to achieve different security levels
- Evaluated the security, hardware, and power tradeoffs, demonstrating the power reductions that can be achieved using reconfigurable hardware and in some cases using a higher security level

#### Future Work

- Utilizing multi-objective optimization metrics
- Integrating dynamic profiling and system observation methods to monitor system execution and detect deviations
- Explore the effectiveness of the proposed approach both for different applications and different heterogeneous system architectures



# Thank you

# Questions?
