Whether in the home, in the office, or for high-performance industrial and research simulations, improved performance from our computing equipment is highly desirable. Up until the mid-2000s, improved performance in single-core processing was achieved through frequency scaling, i.e. effectively increasing the number of calculations a processor can make per second. In the early 1980s processor speeds were around 4MHz; by 1995 they had reached 100MHz; in 2000 AMD reached 1GHz; and speeds are typically around 4GHz at present (the current record, set in 2011, stands at a little over 8.8GHz). However, despite the continued shrinking of transistor size described by Moore's Law, further frequency scaling leads to prohibitive power draw, making it infeasible to build faster processors that do not overheat. Instead, improved performance is now achieved by multi-core processors that do not aim for higher frequencies, and in fact may run at lower speeds to retain energy efficiency, but are designed so that many calculations can be carried out at the same time, in parallel. This enlarges the scope for performance improvements.
One of the most computationally expensive everyday computing tasks is processing graphical images. A typical image can easily consist of several million pixels, and each pixel can require a number of calculations in order to be rendered correctly. To perform these tasks efficiently, specialised graphics processing units (GPUs) were developed. More recently, GPUs have increasingly been used for mainstream computing tasks that require highly parallel processing, as modern GPUs offer thousands of parallel processing elements.
Merely being able to perform many different calculations in parallel is only one part of the challenge; the much more complex aspect is managing the processing so that the multiple parallel processors are used efficiently. If a standard computer program were run on a GPU without modification, all of its calculations would be performed in sequence on a single processing element. Instead, programs need to be constructed in a specific way so that the GPU can correctly execute the software in parallel, as the simplified example below illustrates. This can be done automatically using a specialised compiler, or manually by the programmer. The manual approach is a very complex task, but it can lead to better performance than automated compilation tools can achieve.
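To make this concrete, here is a minimal, hypothetical sketch, written in CUDA-style code, of how a calculation that a conventional program would perform one element at a time can be restructured so that each GPU thread handles a single element; the kernel name, array names and launch configuration are invented for illustration rather than taken from any real application.

    // Sequential CPU version: one processing element visits every element in turn.
    //   for (int i = 0; i < n; ++i) out[i] = 2.0f * in[i];
    //
    // GPU version: many threads are launched, and each handles one element.
    __global__ void scale(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique index for this thread
        if (i < n)                                      // guard against running off the end
            out[i] = 2.0f * in[i];                      // each thread computes one element
    }

    // Host-side launch: enough blocks of 256 threads to cover all n elements.
    //   scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

Because every thread writes to a different element of the output array, the threads never interfere with one another, and this independence is what allows the GPU to run thousands of them at the same time.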
Parallel processing, and how to support developers in writing correct parallel programs, are questions that have driven the research of Dr Alastair Donaldson in recent years. Using funding from the EPSRC IAA, he has been developing GPUVerify, a tool that helps programmers by identifying errors in their parallel code. For example, one of the most common problems is the data race. This occurs when, while manipulating large volumes of data, two processors try to access and process the same or an overlapping chunk of data simultaneously, without synchronisation to control the order in which the accesses occur. Data races can lead to software that does not compute reliable results: each time the software is executed, the results may differ depending on the outcomes of the races (a simplified example is sketched below). GPUVerify is designed to tell programmers where errors like this exist in their code. So far the tool has generated great interest among software development firms: Imagination Technologies recommend GPUVerify through a third-party showcase, and ARM have incorporated GPUVerify support into the latest release of their Mali Graphics Debugger. Alastair is now looking to develop his other, complementary tool, CLsmith, which aims to check for errors that arise during the compilation of GPU source code into machine code.
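As a rough illustration of the kind of defect GPUVerify is designed to flag, the hypothetical kernel below lets every thread update the same element of an output array without any synchronisation, so the final value depends on which threads happen to read and write when; the kernel and variable names are invented for the example.

    // Every thread performs an unsynchronised read-modify-write on result[0].
    // Different runs can interleave these accesses differently and so
    // produce different answers: a data race.
    __global__ void racy_sum(const float *in, float *result, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            result[0] = result[0] + in[i];  // racy update shared by all threads
    }

A race-free version would combine the contributions atomically (for example with CUDA's atomicAdd) or restructure the computation as a reduction so that no two threads ever write to the same location, and it is exactly this kind of distinction that a tool such as GPUVerify helps programmers to get right.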