This is the home page of the CPC research group of Tampere University. The group's name in Finnish is Räätälöity rinnakkaislaskenta (Customized Parallel Computing). CPC's main research focus is on design and programming methodologies for customized parallel computing platforms and on real-time implementations of challenging algorithms.
In addition to the publications and theses listed here as academic contributions, CPC has made major open-source contributions in the field of portable and customized heterogeneous computing: the group has created OpenASIP and Portable Computing Language (PoCL), which are widely used as research platforms and even in products. CPC also created the prototype HIPCL tool, which evolved into chipStar, a portable CUDA/HIP implementation using open standards.
An algorithm domain with extreme computational demands that CPC has been very interested in over the past years is real-time ray tracing. In 2015, a separate focus group was formed to find algorithmic, parallel/heterogeneous implementation, and custom hardware solutions to its challenges. The group's web pages are here.
An OpenCL pipe is a memory object used for passing data between kernels. It is useful in streaming-style applications, where data is forwarded from one task to another. Since a pipe can be implemented in multiple ways, and OpenCL is intended as a programming model for heterogeneous platforms, the performance of pipe implementations can vary widely. The PhD thesis work of Topi Leppänen has resulted in insights into how the pipe specification could be improved, especially in the context of FPGAs. Topi presented these findings, along with suggestions for the OpenCL specification, at IWOCL 2024. Read the publication here.
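The streaming pattern described above can be illustrated with a small host-side model. This is only a sketch in Python, not OpenCL code and not the implementation discussed in the publication: the names `PipeSketch`, `producer`, `consumer`, and `run_stream` are hypothetical, and the pipe is modeled as a bounded FIFO of packets. The only detail borrowed from OpenCL is the convention that the built-in `read_pipe` and `write_pipe` functions return 0 on success and a negative value on failure.

```python
from collections import deque


class PipeSketch:
    """Hypothetical model of an OpenCL pipe: a bounded FIFO of packets."""

    def __init__(self, max_packets):
        if max_packets < 1:
            raise ValueError("a pipe must hold at least one packet")
        self.max_packets = max_packets
        self.packets = deque()

    def write(self, packet):
        # Same convention as write_pipe: 0 on success, negative if the pipe is full.
        if len(self.packets) >= self.max_packets:
            return -1
        self.packets.append(packet)
        return 0

    def read(self):
        # Returns (status, packet); status is negative if the pipe is empty.
        if not self.packets:
            return -1, None
        return 0, self.packets.popleft()


def producer(pipe, data):
    """Stands in for the upstream kernel: writes until the pipe is full."""
    written = 0
    for value in data:
        if pipe.write(value) != 0:
            break
        written += 1
    return written


def consumer(pipe, transform):
    """Stands in for the downstream kernel: drains the pipe and processes packets."""
    results = []
    while True:
        status, packet = pipe.read()
        if status != 0:
            break
        results.append(transform(packet))
    return results


def run_stream(data, max_packets, transform):
    """Alternates producer and consumer until all data has passed through the pipe."""
    pipe = PipeSketch(max_packets)
    results = []
    position = 0
    while position < len(data):
        position += producer(pipe, data[position:])
        results.extend(consumer(pipe, transform))
    return results
```

In this model the two tasks simply take turns. On a real device the kernels may run concurrently, and how the pipe is backed (for example by global memory or by an on-chip FIFO) is up to the implementation, which is one reason performance can differ so much between platforms.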
The modern computing landscape includes a variety of platforms. In addition to general-purpose devices, specialized processors are used to increase efficiency in various application domains and use cases. The OpenCL standard presents a unified way to program these heterogeneous devices, and the CPC group's PoCL is a vendor-independent, open-source implementation of the standard. In his MSc thesis "Adding fault tolerance to OpenCL" (2023), Robin Bijl added a mechanism to achieve robust computation with PoCL. This enables fault-tolerant, reliable computing even on heterogeneous platforms. Read the thesis here.
The Internet of Things (IoT) consists of an enormous number of devices, ranging in size from large to extremely tiny. While it may be desirable to have complex functionality in even the tiniest devices, this is often not feasible simply due to the lack of available resources. However, offloading the computation to a (nearby) server or a larger device enables sharing of resources and seemingly allows even small devices to perform demanding computations. In his MSc thesis "Offloading Computation with a Minimized OpenCL Runtime from a Nano Drone" (2022), Jyry Uitto created a proof-of-concept implementation of a nano drone that can offload OpenCL kernel execution onto an edge server. Read the thesis here.
Static multi-issue processors exploit instruction-level parallelism efficiently because they lack the dynamic hardware that schedules instructions at run time. However, their instruction stream energy consumption is significantly higher than that of their dynamic multi- or single-issue counterparts. Processor designers must choose between the benefits of static multi-issue capabilities and higher code density, but is it too much to ask for both? In our latest article, we introduce an energy-efficient dual-mode architecture (RISC-V single-issue and exposed-datapath VLIW) that exploits instruction-level parallelism statically when it is available in the program, without suffering from VLIW’s poor code density when it is not. The flexibility of the architecture is utilized by a novel compilation method that can generate code for both instruction sets with fine-grained mode switching. Read more in the article.
Our Dutch colleague Maarten Molendijk from TU Eindhoven presented a co-authored paper, "BrainTTA: A 28.6 TOPS/W Compiler Programmable Transport-Triggered NN SoC", at IEEE ICCD 2023. The publication was the result of a successful collaboration between our CPC group and PARSE/TUE, in which a programmable TTA/SIMD-based accelerator was designed for ultra-low-power AI inference in low-precision use cases. The design work was conducted by Molendijk et al. using the OpenASIP tools. Read more about it in the preprint. The presentation slides are available here.
Our doctoral researcher Topi Leppänen presented the paper "AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management" at NorCAS 2023. AFOCL allows FPGA device users to avoid vendor lock-in and separates the roles of software engineer and FPGA engineer. Behind the curtain, the OpenCL implementation automatically selects IPs from a precompiled bitstream database and handles FPGA reconfiguration. Details are in the paper.
Check out the video below of the final demonstrator for the CPSoSAware EU project. The work was a collaboration with the University of Peloponnese. The demonstrator features a nano drone that wirelessly offloads processing to edge resources using PoCL-R.
Social Media
Follow the CPC group on Twitter/X: https://twitter.com/CustomParComp