Customized Parallel Computing (CPC) group

This is the home page of the CPC research group of Tampere University. The group's name in Finnish is Räätälöity rinnakkaislaskenta. CPC's main research focus is on the design and programming methodologies of customized parallel computing platforms and on real-time implementations of computationally challenging algorithms.


In addition to the publications and theses listed here as academic contributions, CPC has made major open-source contributions in the field of portable and customized heterogeneous computing: the group created OpenASIP and the Portable Computing Language (PoCL), which are widely used as research platforms and even in products. CPC also created the prototype HIPCL tool, which evolved into chipStar, a portable CUDA/HIP implementation built on open standards.


One algorithm domain with extreme computational demands that CPC has been particularly interested in over the past years is real-time ray tracing. In 2015, a separate focus group was formed to find algorithmic, parallel/heterogeneous implementation, and custom hardware solutions to its challenges. The group's web pages are here.

The CPC group in Fall 2023.

News

Dec 11th, 2024: Doctoral thesis on energy-efficient CGRAs utilizes OpenASIP

Barry de Bruin from Eindhoven University of Technology (TU/e) defended his doctoral thesis on energy-efficient coarse-grained reconfigurable arrays (CGRAs). In the thesis, titled "Design of Energy‐Efficient CGRA‐based Systems", Barry leveraged OpenASIP's flexible framework and used its retargetable compiler backend to compile C programs for the designed CGRAs. This made the CGRAs more flexible and easier to program than other CGRA implementations. As a doctoral student, Barry also visited the CPC group and worked as one of its members. The group's leader Pekka Jääskeläinen was a copromotor (co-supervisor) of the thesis. Congratulations Barry!

Barry's dissertation

Nov 27th, 2024: Two publications in NorCAS

The CPC group published two papers at this year's IEEE Nordic Circuits and Systems Conference (NorCAS). Kari, the first author of both papers, attended the conference in Lund and presented both. The first paper builds on the recent interest in using AI-based methods for processor design space exploration, a field where evaluating design points quickly is key to fast exploration. The paper, done in collaboration with the Robot Learning team at Aalto University, describes a machine-learning-based method to estimate cycle counts of application-specific, static multi-issue architectures. The paper is titled "Cycle Count Estimation of VLIW Processors Using Machine Learning".

Cycle count estimation

The second paper, "Fully Automatic Compiler Retargeting and CV-X-IF Hardware Interface Generation for RISC-V Custom Instructions", concerns CPC's efforts in the TRISTAN project. Since developing, verifying, and possibly certifying processor IPs is time-consuming and expensive, the RISC-V community is working to specify and implement standardized coprocessor/accelerator interfaces for existing processors. Once such an interface is in place, the same processor IP can be combined with different coprocessor/accelerator IPs. In this work, we leveraged OpenASIP's hardware generation capabilities to automatically generate CV-X-IF-based coprocessors, whose operations can be defined with OpenASIP's processor designer (ProDE). The paper also describes improvements to OpenASIP's RISC-V support: operations in C code can now be automatically mapped to suitable custom operations in the coprocessor.

CV-X-IF coprocessor

Nov 22nd, 2024: Programmable Instruction Dictionary Compression in Springer DAES

Instruction compression has been used in a variety of ways to mitigate the overheads of programmability in processors. We proposed programmable instruction dictionary compression, aiming to improve the dynamic compression ratio and energy efficiency, and compared our approach to "traditional" instruction stream components. The article, titled "Energy-efficient instruction compression with programmable dictionaries", was published in Springer Design Automation for Embedded Systems (DAES).

Parallel dictionaries
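The news item does not spell out the encoding used in the article, so the following Python sketch only illustrates the general idea behind dictionary-based instruction compression and why a reprogrammable dictionary can help. Everything in it is an assumption for illustration: the 32-bit instruction width, the 1-bit hit/escape flag, the most-frequent-words selection policy, modelling "programmable" as reloading the dictionary between program phases, and the made-up instruction trace. The helper names (`build_dictionary`, `encode`, `decode`, `compressed_bits`) are hypothetical, and the article's energy model is not represented.

```python
import math
from collections import Counter

WORD_BITS = 32  # assumed uncompressed instruction width (hypothetical)


def build_dictionary(trace, size):
    """Hypothetical policy: keep the `size` most frequently fetched words."""
    return [word for word, _ in Counter(trace).most_common(size)]


def encode(trace, dictionary):
    """Encode each fetched word as a dictionary index or an escaped raw word.

    Every entry carries a 1-bit flag (1 = dictionary hit, 0 = raw word follows).
    Returns the encoded stream and its size in bits.
    """
    index = {word: i for i, word in enumerate(dictionary)}
    index_bits = max(1, math.ceil(math.log2(len(dictionary)))) if dictionary else 0
    stream, bits = [], 0
    for word in trace:
        if word in index:
            stream.append((1, index[word]))
            bits += 1 + index_bits
        else:
            stream.append((0, word))
            bits += 1 + WORD_BITS
    return stream, bits


def decode(stream, dictionary):
    """Invert `encode`: look up hits, pass raw words through unchanged."""
    return [dictionary[value] if hit else value for hit, value in stream]


def compressed_bits(phases, dict_size, reprogram):
    """Bits fetched for a dynamic trace split into phases.

    Includes the cost of loading dictionary contents (one raw word per entry).
    reprogram=False: one fixed dictionary for the whole trace.
    reprogram=True: the dictionary is rebuilt and reloaded before each phase.
    """
    groups = phases if reprogram else [[w for phase in phases for w in phase]]
    total = 0
    for trace in groups:
        dictionary = build_dictionary(trace, dict_size)
        stream, bits = encode(trace, dictionary)
        assert decode(stream, dictionary) == trace  # lossless round trip
        total += bits + len(dictionary) * WORD_BITS
    return total


# Hypothetical dynamic trace: two loop bodies executed back to back.
phase_a = [0x00A00093, 0x00108113] * 16
phase_b = [0x40B50533, 0x00C58593, 0xFFF60613] * 8
raw_bits = (len(phase_a) + len(phase_b)) * WORD_BITS

for reprogram in (False, True):
    bits = compressed_bits([phase_a, phase_b], dict_size=4, reprogram=reprogram)
    print(f"reprogram={reprogram}: compression ratio {raw_bits / bits:.2f}")
# Prints 3.34 for the fixed dictionary and 6.05 with per-phase reloading.
```

With these made-up numbers, a fixed four-entry dictionary leaves one hot instruction of the second phase uncompressed, while reloading the dictionary per phase turns every fetch into a hit, even after paying for the reloads. Real trade-offs depend on dictionary size, reload frequency, and hardware costs that this sketch ignores.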

Oct 10th, 2024: FPGA bitstream database paper in IEEE VLSI

Implementing applications efficiently on FPGAs requires knowledge not only of the algorithms used in the application, but also of RTL description and FPGA EDA tools. To separate the tasks of the SW designer from those of the HW designer, Topi Leppänen proposes using pre-generated bitstream databases together with partial FPGA reconfiguration. The SW designer can implement an application by picking kernels from the database, without needing expertise in RTL or FPGA design. Our proposed tool, AFOCL, downloads the bitstreams and reconfigures the FPGA automatically. The article "Bitstream Database-Driven FPGA Programming Flow Based on Standard OpenCL" was published in IEEE Transactions on Very Large Scale Integration (VLSI) Systems. The code is released as open source and is available here.

FPGA bitstream database flow

Following a successful tape-out of the Headsail SoC, the Beaivi DSP is up and running after initial testing! Read the detailed news item here.

Headsail chip

July 2nd, 2024: CPC at RISC-V Summit Europe

A delegation of three CPC members (Pekka, Kari and Joonas) participated in RISC-V Summit Europe 2024 in Munich. The hero of the trip was Kari, who delivered both a poster and an excellent talk about OpenASIP's RISC-V support. Check the slides here.

June 16th, 2024: PoCL-R enables adaptable AI offloading from nanodrone in AISA Y3 Demonstrator

The third-year demonstrator of the AISA project was presented live last Friday at Paidia in Tampere, Finland. It features adaptable AI compute offloading from a nanodrone to remote servers: the Crazyflie nanodrone offloads an object detection algorithm to a remote server via PoCL-R and adapts to the network quality by adjusting, on the fly, the compression rate of the images it sends to the server. You can watch the demonstrator videos on YouTube.

April 29th, 2024: The Impact of Wireless Channel Impairments on Computer Vision Accuracy in WCNC 2024

When computer vision (CV) computation is offloaded from a small device, such as a drone, to a remote server, a stream of images needs to be sent over a wireless network channel. Traditional entropy-coded bitstreams, such as JPEG, transmitted over a digital channel are prone to the so-called "digital cliff": a sudden drop in reconstructed image quality due to data corruption caused by channel noise and lost packets. To avoid the digital cliff, Linear Coding and Transmission (LCT) schemes were pioneered by SoftCast in 2010; with LCT, the reconstructed image quality degrades smoothly as channel impairments increase. So far, however, the impact of LCT and channel impairments on CV accuracy has received little study. Jakub Žádník recently presented the paper "Performance of Linear Coding and Transmission in Low-Latency Computer Vision Offloading" at the WCNC 2024 conference in Dubai (UAE), in which he studied the impact of LCT processing, wireless channel noise, and packet losses on the accuracy of semantic segmentation and object detection tasks. A thorough evaluation over a wide range of LCT configurations confirmed the absence of a digital cliff in task accuracy. The findings were further strengthened by a realistic 5G channel simulation and by retraining the CV tasks to account for the distortions caused by LCT and the noisy channel.

April 12th, 2024: OpenCL pipe specification improvements in IWOCL 2024

An OpenCL pipe is a memory object used for passing data between kernels. It is useful in streaming-style applications, where data is forwarded from one task to another. Because a pipe can be implemented in multiple ways and OpenCL is intended as a programming model for heterogeneous platforms, the performance of pipe implementations can vary widely. Topi Leppänen's PhD thesis work has yielded insights into how the pipe specification could be improved, especially for FPGAs. Topi presented these findings, along with suggestions for the OpenCL specification, at IWOCL 2024. Read the publication here.

April 5th, 2024: Adding fault tolerance to OpenCL

The modern computing landscape includes a variety of platforms. In addition to general-purpose devices, specialized processors are used to increase efficiency in various application domains and use cases. The OpenCL standard provides a unified way to program these heterogeneous devices, and the CPC group's PoCL is a vendor-independent, open-source implementation of the standard. In his MSc thesis "Adding fault tolerance to OpenCL" (2023), Robin Bijl added a mechanism to PoCL for achieving robust computation. This enables fault-tolerant, reliable computing even on heterogeneous platforms. Read the thesis here.

December 11th, 2023: Improving IoT device capabilities by offloading OpenCL kernels to edge servers

The Internet of Things (IoT) consists of an enormous number of devices, ranging in size from large to extremely tiny. While it may be desirable to have complex functionality even in the tiniest devices, this is often infeasible simply due to a lack of resources. However, offloading the computation to a (nearby) server or a larger device enables resource sharing and seemingly allows even small devices to perform demanding computations. In his MSc thesis "Offloading Computation with a Minimized OpenCL Runtime from a Nano Drone" (2022), Jyry Uitto created a proof-of-concept implementation of a nano drone that can offload OpenCL kernel execution to an edge server. Read the thesis here.

November 30th, 2023: Dual-IS article in IEEE TC

Static multi-issue processors exploit instruction-level parallelism efficiently because they lack the dynamic hardware that schedules instructions at run time. However, their instruction stream energy consumption is significantly higher than that of their dynamic multi-issue or single-issue counterparts. Processor designers must choose between the benefits of static multi-issue capabilities and higher code density, but is it too much to ask for both? In our latest article, we introduce an energy-efficient dual-mode architecture (a RISC-V single-issue mode and an exposed-datapath VLIW mode) that exploits instruction-level parallelism statically when the program offers it, without suffering from VLIW's poor code density when it does not. The flexibility of the architecture is utilized by a novel compilation method that generates code for both instruction sets with fine-grained mode switching. Read more in the article.

November 16th, 2023: BrainTTA presentation in IEEE ICCD 2023

Our Dutch colleague Maarten Molendijk from TU Eindhoven presented a co-authored paper, "BrainTTA: A 28.6 TOPS/W Compiler Programmable Transport-Triggered NN SoC", at IEEE ICCD 2023. The publication resulted from a successful collaboration between our CPC group and PARSE/TU/e, in which a programmable TTA/SIMD-based accelerator was designed for ultra-low-power, low-precision AI inference. The design was done with the OpenASIP tools, with the design work conducted by Molendijk et al. Read more about it in the preprint. The presentation slides are available here.

November 6th, 2023: New publications added
  • Topi Leppänen, Joonas Multanen, Leevi Leppänen, Pekka Jääskeläinen:
    AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management
    in IEEE Nordic Circuits and Systems Conference (NorCAS 2023) (download).
  • Niklas Rother, Leonard Mätzner, Pekka Jääskeläinen, Topi Leppänen, Jens Karsten Schleusner, Holger Christoph Blume:
    Synthetic Aperture Radar Algorithms on Transport Triggered Architecture Processors using OpenCL
    International Radar Conference 2023
  • Maarten Molendijk, Floran de Putter, Manil Dev Gomony, Pekka Jääskeläinen and Henk Corporaal:
    BrainTTA: A 28.6 TOPS/W Compiler Programmable Transport-Triggered NN SoC
    IEEE International Conference on Computer Design (ICCD 2023)
  • Panagiotis Mousouliotis, Topi Leppänen, Pekka Jääskeläinen, Nikos Petrellis, Panagiotis Christakos, Georgios Keramidas, Christos Antonopoulos, Nikolaos Voros:
    On the OpenCL Support for Streaming Fixed-Function Accelerators on Embedded SoC FPGAs
    The 19th International Symposium on Applied Reconfigurable Computing (ARC 2023)
November 1st, 2023: AFOCL presentation in NorCAS 2023 conference

Our doctoral researcher Topi Leppänen presented the paper "AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management" at NorCAS 2023. AFOCL lets FPGA users avoid vendor lock-in and separates the roles of the software engineer and the FPGA engineer. Behind the curtain, the OpenCL implementation automatically selects IPs from a precompiled bitstream database and handles FPGA reconfiguration. Details are in the paper.

August 24th, 2023: Final demonstrator video for the CPSoSAware EU project available

Check out the video below of the final demonstrator for the CPSoSAware EU project. The work was a collaboration with the University of Peloponnese. The demonstrator features a nanodrone that wirelessly offloads processing to edge resources using PoCL-R.

August 8th, 2023: Added a publication from 2022 missing from the web page
  • Topi Leppänen, Atro Lotvonen, Pekka Jääskeläinen:
    "Cross-vendor programming abstraction for diverse heterogeneous platforms"
    in Frontiers in Computer Science, Vol. 4, Oct. 2022 (download).
June 15th, 2023: Two new publications added
  • Topi Leppänen, Atro Lotvonen, Panagiotis Mousouliotis, Joonas Multanen, Georgios Keramidas, Pekka Jääskeläinen:
    "Efficient OpenCL system integration of non-blocking FPGA accelerators"
    in Microprocessors and Microsystems (MICPRO), Vol. 97, Mar. 2023 (download).
  • Alex Hirvonen, Topi Leppänen, Kari Hepola, Joonas Multanen, Joost Hoozemans, Pekka Jääskeläinen:
    "AEX: Automated High-Level Synthesis of Compiler Programmable Co-processors"
    in Journal of Signal Processing Systems (JSPS), Feb. 2023 (download).
October 17th, 2022: A master's thesis and a new publication added
  • Kari Hepola:
    Generation of Customized RISC-V Implementations
    (2022) (link)
  • Kanishkan Vadivel, Barry de Bruin, Pekka Jääskeläinen, Roel Jordans and Henk Corporaal:
    "Prebypass: Software Register File Bypassing for Reduced Interconnection Architecture"
    in Euromicro Conference on Digital Systems Design (DSD 2022) (download).
September 22nd, 2022: New publications added
  • Jakub Žádník, Markku Mäkitalo, Pekka Jääskeläinen:
    "Pruned Lightweight Encoders for Computer Vision"
    in IEEE 24th International Workshop on Multimedia Signal Processing (MMSP 2022) (download poster).
  • Kari Hepola, Joonas Multanen and Pekka Jääskeläinen:
    "Dual-IS: Instruction Set Modality for Efficient Instruction Level Parallelism"
    in 35th GI/ITG International Conference on Architecture of Computing Systems (ARCS 2022) (download).
  • Kari Hepola, Joonas Multanen and Pekka Jääskeläinen:
    "OpenASIP 2.0: Co-Design Toolset for RISC-V Application-Specific Instruction-Set Processors"
    in 33rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2022) (download).

(older news here)

Social Media

Follow the CPC group on Twitter/X: https://twitter.com/CustomParComp

Contact Us

Send email to to contact us.