

#### A Fully Parameterized Virtual Coarse-Grained Reconfigurable Array for HPC Applications

<u>Amit Kulkarni</u> (Amit.Kulkarni@UGent.be), Elias Vansteenkiste and Dirk Stroobandt, Ghent University, Belgium



Andreas Brokalakis and Antonios Nikitakis, Synilixis Solutions Ltd, Greece



**RAW 2016** 


# Virtual Coarse-Grained Reconfigurable Array (VCGRA)

CGRA: Domain-specific architecture

Virtual CGRA: CGRA implemented on FPGA

Example: CGRA of mathematical operators to compute mathematical functions
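To make the example concrete, the toy model below treats a CGRA configuration as a list of processing elements (PEs), each set to one operator and wired either to a primary input or to an earlier PE's output. It is a minimal software sketch under assumed names (`OPS`, `run_cgra`, the `(op, src_a, src_b)` tuple format); it is not the VCGRA from this work and ignores timing, switch blocks and hardware entirely.

```python
import operator

# Hypothetical operator set for each PE; the real VCGRA operator library may differ.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_cgra(config, inputs):
    """Evaluate a feed-forward array of PEs.

    config: list of (op_name, src_a, src_b); a source is either an input
    name (str) or the index (int) of an earlier PE's output.
    """
    values = []

    def fetch(src):
        # Route either a primary input or a previously computed PE output.
        return inputs[src] if isinstance(src, str) else values[src]

    for op_name, a, b in config:
        values.append(OPS[op_name](fetch(a), fetch(b)))
    return values[-1]  # the last PE drives the array output

# Configuration computing (x + 1) * (x - 1) with three PEs.
config = [("add", "x", "one"), ("sub", "x", "one"), ("mul", 0, 1)]
print(run_cgra(config, {"x": 3, "one": 1}))  # 8
```

Switching the array to another function only means changing `config`, which is the higher-abstraction view a CGRA offers compared with redesigning FPGA logic.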



# Why use a VCGRA?

CGRA versus processor:

- Better performance, power efficiency, ...

CGRA versus FPGA/ASIC:

- Higher abstraction level for programmers
- Higher abstraction level for development tools (faster P&R tools)
- Lower development cost

Implemented on FPGA:

- Advantages of a (customized) CGRA without the cost of an ASIC

# Conventional implementation of PEs and SBs on FPGA

- Static FPGA configuration
- Application updated by writing settings registers



# Implementation of PEs and SBs using Parameterized Configurations

- FPGA configuration is a function of the settings
- Application updated by reconfiguring the FPGA
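With parameterized configurations, configuration bits are Boolean functions of the settings, and specializing means evaluating those functions for concrete setting values. The sketch below shows the idea for a single tunable LUT (TLUT): its truth-table bits depend on a hypothetical 1-bit setting `p`, so that `p = 0` yields an AND gate and `p = 1` an XOR gate. The names (`TLUT_BITS`, `specialize`, `lut_eval`) and the chosen functions are illustrative assumptions.

```python
# Hypothetical TLUT: each truth-table bit is a Boolean function of the setting p.
# p = 0 gives a 2-input AND, p = 1 gives a 2-input XOR.
TLUT_BITS = {
    (0, 0): lambda p: 0,      # 0 for both AND and XOR
    (0, 1): lambda p: p,      # AND -> 0, XOR -> 1
    (1, 0): lambda p: p,      # AND -> 0, XOR -> 1
    (1, 1): lambda p: 1 - p,  # AND -> 1, XOR -> 0
}

def specialize(tlut, params):
    """Evaluate every tunable truth-table bit for concrete parameter values."""
    return {inputs: f(**params) for inputs, f in tlut.items()}

def lut_eval(truth_table, a, b):
    """Look up the output of a specialized 2-input LUT."""
    return truth_table[(a, b)]

and_gate = specialize(TLUT_BITS, {"p": 0})
xor_gate = specialize(TLUT_BITS, {"p": 1})
print([lut_eval(xor_gate, a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

No extra logic selects between AND and XOR at run time; the choice lives entirely in the configuration bits.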



# Parameterizing the intra-connects of the PE
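In a conventional PE, operand selection is usually a multiplexer driven by a settings register, which costs LUTs. Parameterizing the intra-connects instead makes each routing switch a tunable connection (TCON) whose on/off state is a Boolean function of the setting, so the selection is resolved in the routing itself. The sketch assumes a hypothetical operand port with four candidate sources and a select setting `sel`; it models only which switches are on, not real FPGA routing.

```python
# Hypothetical PE operand port with four candidate sources.
SOURCES = ["in_north", "in_east", "in_south", "in_west"]

def tcon_switches(sel):
    """On/off state of each tunable connection as a function of the setting sel."""
    return {src: int(sel == i) for i, src in enumerate(SOURCES)}

def routed_value(values, sel):
    """Value seen at the operand port: only the enabled wire drives it."""
    switches = tcon_switches(sel)
    driving = [values[src] for src, on in switches.items() if on]
    if len(driving) != 1:
        raise ValueError("exactly one source must drive the operand port")
    return driving[0]

values = {"in_north": 10, "in_east": 20, "in_south": 30, "in_west": 40}
print(routed_value(values, 2))  # 30
```

The multiplexer disappears from the logic, at the price that changing `sel` now requires reconfiguring the routing.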





# Parameterized Configurations VCGRA tool flow



\*Usually custom synthesis, mapping and P&R tools
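In the parameterized-configuration approach, the compute-intensive tools run once offline to produce a parameterized configuration, and a new setting is later turned into a concrete configuration by evaluating Boolean functions. The sketch below illustrates only that online step, plus one plausible refinement: rewriting just the configuration frames whose bits changed. The bit functions, setting names (`op`, `sel`) and the 2-bit frame size are made up for illustration and do not describe any real device or tool.

```python
# Hypothetical parameterized configuration: one Boolean function per bit.
PARAM_CONFIG = [
    lambda s: 0,                      # static bit, never changes
    lambda s: int(s["op"] == "mul"),  # tunable bit of an operator TLUT
    lambda s: int(s["sel"] == 0),     # tunable connection for source 0
    lambda s: int(s["sel"] == 1),     # tunable connection for source 1
    lambda s: 1,                      # static bit
    lambda s: int(s["op"] != "mul"),  # another operator-dependent bit
]
FRAME_SIZE = 2  # illustrative only; real configuration frames are far larger

def specialize(param_config, settings):
    """Online step: evaluate every bit function for one setting."""
    return [bit(settings) for bit in param_config]

def frames_to_rewrite(old_bits, new_bits, frame_size):
    """Indices of the frames that contain at least one changed bit."""
    return sorted({i // frame_size
                   for i, (old, new) in enumerate(zip(old_bits, new_bits))
                   if old != new})

old = specialize(PARAM_CONFIG, {"op": "add", "sel": 0})
new = specialize(PARAM_CONFIG, {"op": "add", "sel": 1})
print(frames_to_rewrite(old, new, FRAME_SIZE))  # [1]
```

Evaluating the functions is cheap compared with synthesis and P&R, which is why the heavy tools can stay outside the application development cycle.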

# Why (not) use Parameterized Configurations?

Advantages:

- Lower implementation overhead of VCGRAs
- Compute-intensive FPGA tool flow runs outside the application development cycle

Disadvantages:

- Slower reconfiguration of the VCGRA: no cycle-by-cycle context switching
- Tools currently work for academic FPGA architectures and (with limitations) for Xilinx FPGAs

# HPC: Retinal Vessel Segmentation

Resource utilization and P&R results of a PE (MAC operator):

| VCGRA               | LUTs (TLUTs) | TCONs | Logic depth (levels) |
|---------------------|--------------|-------|----------------------|
| Conventional        | 2522 (0)     | 0     | 36                   |
| Fully Parameterized | 1802 (526)   | 568   | 33                   |

30% fewer slices, 31% fewer intra-connects, lower delay (logic depth of 33 instead of 36 levels)
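The relative savings can be recomputed from the LUT and logic-depth columns of the PE table. The slice and intra-connect percentages on the slide come from data not shown in the table, so they are not reproduced by this small calculation.

```python
# Values copied from the PE (MAC operator) table on this slide.
conventional = {"luts": 2522, "logic_depth": 36}
parameterized = {"luts": 1802, "logic_depth": 33}

def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

lut_saving = reduction_percent(conventional["luts"], parameterized["luts"])
depth_saving = reduction_percent(conventional["logic_depth"],
                                 parameterized["logic_depth"])
print(f"{lut_saving:.1f}% fewer LUTs")            # 28.5% fewer LUTs
print(f"{depth_saving:.1f}% fewer logic levels")  # 8.3% fewer logic levels
```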

| 4x4 VCGRA Grid      | Inter-Network | Settings Register |
|---------------------|---------------|-------------------|
| Conventional        | 41            | 25                |
| Fully Parameterized | 0             | 0                 |

- No logic resources (LUTs and FFs) used for switch blocks
- No settings registers and shift-register chain

# Conclusion

- Applied the technique of parameterized configurations to VCGRAs
- Significantly more efficient implementation of VCGRAs on FPGAs is possible
- The necessary algorithms have been developed, but the tools still need to be adapted for commercial FPGAs

# Thank you

- For more information, please visit my poster

