11. simple_cl Module

This module provides a convenient way to work with OpenCL via pyopencl: it simplifies access to devices, allows the floating-point precision to be chosen at run time, and provides common complex-variable functions.

It also simplifies function calls by assuming that the desired number of work-groups is the number of compute units, and that the desired number of work-items per group is the maximum work-group size. This means that looping over the data needs to be handled in the kernel, as in the Example Usage section below.
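The in-kernel looping pattern can be modelled in plain Python. This is only a sketch of the index arithmetic, not OpenCL code: global_size stands in for what get_global_size(0) would return, and both function names are made up for illustration.

```python
def indices_for_work_item(global_id, global_size, n):
    """Indices one work-item visits: global_id, global_id + global_size, ..."""
    return list(range(global_id, n, global_size))


def simulate_kernel(global_size, n):
    """Run every work-item in turn and count how often each index is visited."""
    visits = [0] * n
    for global_id in range(global_size):
        for i in indices_for_work_item(global_id, global_size, n):
            visits[i] += 1
    return visits


# 4 work-items sharing 10 elements.
print(indices_for_work_item(1, 4, 10))   # [1, 5, 9]
print(simulate_kernel(4, 10))            # every entry is 1
```

Because each work-item strides by the total number of work-items, every element is handled exactly once regardless of how the array length compares to the launch size.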

11.1. Overview

CLSession([device, use_doubles, group_size, ...]) Create an OpenCL session on the specified device, while defining float precision and some utility functions.
acquire_opencl_devices(devname) Find OpenCL devices with matching descriptions.
is_job_done(event) Check if an OpenCL task is done.
get_device_info(device, field) Get information about an OpenCL device.

11.2. Example Usage

from ilpm import simple_cl
import numpy as np
import time

NP = 10**6
REPEATS = 10

#Here we are creating an OpenCL context on the CPU.  Why bother?  This
#  causes the code execution to be multithreaded, which should speed it up!
ctx = simple_cl.CLSession(device='cpu', use_doubles=False)

#OpenCL code
ctx.compile('''
    __kernel void dot_product(__global REAL* a, __global REAL* b, int np, __global REAL* result)
    {
        for(int i = get_global_id(0); i < np; i += get_global_size(0)) {
            result[i] = dot(get_3D(a, i), get_3D(b, i));
        }
    }''')

#Initial data
a = np.random.rand(NP, 3)
b = np.random.rand(NP, 3)

#Create version of data on device
a_cl = ctx.to_device(a)
b_cl = ctx.to_device(b)
result_cl = ctx.empty(NP)

#Run in OpenCL
start = time.time()
for n in range(REPEATS):
    event = ctx.dot_product(a_cl, b_cl, NP, result_cl)
event.wait()
print('OpenCL: %5.1f ms' % ((time.time() - start)*1E3))

result = result_cl.get()


#Run in numpy
start = time.time()
for n in range(REPEATS):
    result_n = (a*b).sum(-1)
print(' Numpy: %5.1f ms' % ((time.time() - start)*1E3))


#Find error; note that the OpenCL version is single precision, as we specified
#  use_doubles=False on the context creation.  This causes an error on the
#  order of single precision rounding (machine epsilon is about 1.2E-7).
print('Dot product error: %g' % (np.abs(result_n - result).sum() / NP))

Say you want to run your code on the computer's GPU rather than its CPU. This option is available on all recent high-end Macs (though perhaps not the MacBook Airs). To find the GPU's name for use in your code, list the available OpenCL devices; this can be done without using simple_cl.
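One way to list the device names is to ask pyopencl directly. The sketch below is an assumption about how this might be done, not code from the module: the helper names are made up, and only pyopencl.get_platforms(), Platform.get_devices() and the name attributes are relied on.

```python
def device_names(platforms):
    """Collect (platform name, device name) pairs from pyopencl platforms."""
    found = []
    for platform in platforms:
        for device in platform.get_devices():
            found.append((platform.name, device.name))
    return found


def print_devices():
    """Print one line per OpenCL device visible on this machine."""
    # Imported here so that device_names() can be used without pyopencl.
    import pyopencl as cl
    for platform_name, device_name in device_names(cl.get_platforms()):
        print('%s: %s' % (platform_name, device_name))
```

Calling print_devices() prints every device name; any of them, or a distinctive part of one, can then be passed as the device argument.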

We can then choose from the list of devices. For instance, my MacBook Pro has 3 devices: the CPU, an Iris graphics card, and an AMD graphics card. So I could say

ctx = simple_cl.CLSession(device='AMD Radeon R9 M370X Compute Engine', use_doubles=True)

11.3. Classes and Functions

ilpm.simple_cl.acquire_opencl_devices(devname)

Find OpenCL devices with matching descriptions.

Parameters:

devname : string

A partial name for an OpenCL device, or “cpu” or “gpu”.

Returns:

devices : list of OpenCL devices

All matching devices. May be empty.

class ilpm.simple_cl.CLSession(device=['cpu', 'gpu'], use_doubles=False, group_size=None, show_warnings=False)

Bases: object

Create an OpenCL session on the specified device, while defining float precision and some utility functions.

The actual creation of the OpenCL context is delayed until necessary. This allows the context parameters to be modified before it is used, which is useful for external modules where the user may wish to change the device used for calculations.
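The delayed-creation idea can be illustrated with a small stand-in class. This is a sketch of the general pattern only: LazySession and its attributes are hypothetical and are not taken from the simple_cl source.

```python
class LazySession(object):
    """Hypothetical stand-in for a session that defers context creation."""

    def __init__(self, device='cpu'):
        self.device = device      # may still be changed by the caller
        self._context = None      # nothing is created yet

    @property
    def initialized(self):
        return self._context is not None

    def initialize(self):
        """Create the context once, using the current value of self.device."""
        if self._context is None:
            # A real session would build an OpenCL context here.
            self._context = 'context on %s' % self.device
        return self._context

    def run(self):
        """Any real work triggers initialization on first use."""
        return self.initialize()


session = LazySession(device='cpu')
session.device = 'gpu'        # allowed: the context does not exist yet
print(session.run())          # context on gpu
```

Once the context exists, later changes to the device attribute have no effect in this sketch, which is why such changes need to happen before first use.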

The following data types are defined as members of the class:

Name      OpenCL type                    Size (bytes)
real      REAL (float or double)         4 or 8
complex   COMPLEX (float2 or double2)    8 or 16
char      char                           1
uchar     uchar                          1
short     short                          2
ushort    ushort                         2
int       int                            4
uint      uint                           4
long      long                           8
ulong     ulong                          8

The following functions and constants will also be made available to any kernel compiled via the session:

Type      Name             Parameters              Description
COMPLEX   c_mul            COMPLEX x, COMPLEX y    x * y
COMPLEX   c_div            COMPLEX x, COMPLEX y    x / y
COMPLEX   conj             COMPLEX x               x* (complex conjugate)
COMPLEX   c_exp            COMPLEX x               exp(x)
REAL      c_angle          COMPLEX x               arg(x)
REAL      c_abs            COMPLEX x               |x|
REAL      c_abs_sq         COMPLEX x               |x|^2
COMPLEX   c_exp_i          REAL x                  exp(ix)
COMPLEX   native_exp_i     REAL x                  (uses native precision math)
COMPLEX   c_exp_i_T        REAL x                  exp(ix)
COMPLEX   native_exp_i_T   REAL x                  (uses native precision math)
REAL      PI               constant                π
REAL      TWO_PI           constant                2π
REAL      EULER_CONSTANT   constant                γ = 0.57721...
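For reference, the arithmetic behind a few of these functions can be written out in Python. This is a sketch, not the OpenCL source: it assumes a COMPLEX value stores the real part in its first component and the imaginary part in its second, and represents it here as a 2-tuple.

```python
import math


def c_mul(x, y):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc)"""
    return (x[0] * y[0] - x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def c_div(x, y):
    """x / y, computed as x * conj(y) / |y|^2"""
    d = y[0] * y[0] + y[1] * y[1]
    return ((x[0] * y[0] + x[1] * y[1]) / d, (x[1] * y[0] - x[0] * y[1]) / d)


def c_abs_sq(x):
    """|x|^2 = re^2 + im^2"""
    return x[0] * x[0] + x[1] * x[1]


def c_exp_i(t):
    """exp(it) = cos(t) + i sin(t)"""
    return (math.cos(t), math.sin(t))


# Compare against Python's built-in complex type.
x, y = (1.0, 2.0), (3.0, -4.0)
print('c_mul: %r vs %r' % (c_mul(x, y), complex(*x) * complex(*y)))
print('c_div: %r vs %r' % (c_div(x, y), complex(*x) / complex(*y)))
```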

Parameters:

device : OpenCL device, description string or list of strings

The OpenCL device to use. If a string or list of strings is passed, acquire_opencl_devices() will be used to find devices. The search stops at the first matching device, so a list acts as an ordered set of fallbacks.

use_doubles : bool (default: False)

Specifies whether the REAL and COMPLEX types are double precision.

group_size : int (default: max for device)

The default group_size used in kernel calls. If not specified, it is determined by the max_workgroup_size of the device (recommended). If the device type is CPU, it is set to 1, because CPUs sometimes erroneously report more than 1 for this value.

show_warnings : bool, optional (default: False)

If True, warnings are displayed in build error messages.
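The way the device and group_size parameters are described above can be sketched as follows. Everything here is illustrative: the function names, the dictionary records and their values are made up, and the real module works with pyopencl device objects rather than dictionaries.

```python
def matching_devices(devices, devname):
    """Devices whose type equals devname ('cpu'/'gpu') or whose name contains it."""
    key = devname.lower()
    return [d for d in devices
            if key == d['type'] or key in d['name'].lower()]


def first_available(devices, wanted):
    """Try each description in order; return the first device that matches."""
    if isinstance(wanted, str):
        wanted = [wanted]
    for devname in wanted:
        found = matching_devices(devices, devname)
        if found:
            return found[0]
    return None


def default_group_size(device, group_size=None):
    """Explicit value wins; CPUs get 1; otherwise use the device maximum."""
    if group_size is not None:
        return group_size
    return 1 if device['type'] == 'cpu' else device['max_work_group_size']


# Made-up device records standing in for real OpenCL devices.
devices = [
    {'name': 'Intel Core i7', 'type': 'cpu', 'max_work_group_size': 1024},
    {'name': 'AMD Radeon R9 M370X Compute Engine', 'type': 'gpu',
     'max_work_group_size': 256},
]

chosen = first_available(devices, ['nvidia', 'gpu', 'cpu'])
print(chosen['name'])                 # the AMD entry: 'nvidia' matched nothing
print(default_group_size(chosen))     # 256
```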

Methods

compile(code) Compile OpenCL kernel code (as a string) on the present device.
compile_file(fn) Compile OpenCL kernels from file.
device_info()
empty(shape[, dtype]) Create an empty array on the OpenCL device.
empty_like(arr) Create an empty array with copied shape/dtype.
enqueue_copy(dst, src) Call pyopencl.enqueue_copy with the session queue.
fft(arr[, sign, inplace, swap, max_threads]) Perform a multidimensional FFT on a pyopencl array.
get_device_info(field) Get info about the OpenCL device for the session.
initialize([context]) Initialize the OpenCL context, if not already done.
local_memory(size) Create a local memory object on the OpenCL device.
ones(shape[, dtype]) Create a one-filled array on the OpenCL device.
ones_like(arr) Create a one-filled array with copied shape/dtype.
to_device(X) Make a copy of a local array on the OpenCL device.
zeros(shape[, dtype]) Create a zeroed array on the OpenCL device.
zeros_like(arr) Create a zeroed array with copied shape/dtype.

initialize(context=None)

Initialize the OpenCL context, if not already done.

Parameters:

context : None or pyopencl Context

The OpenCL context to use. If None, one is created using the preset “device” attribute.

to_device(X)

Make a copy of a local array on the OpenCL device.

Parameters:

X : numpy array

Returns:

X_cl : pyopencl array

A version of the array stored on the OpenCL device. If the data type is float or double, it will be converted to the precision of the device context.

compile_file(fn)

Compile OpenCL kernels from file. See compile() for details.

compile(code)

Compile OpenCL kernel code (as a string) on the present device.

Compilation is delayed if this session is not initialized. If you require access to the Program object (which is usually not important), call initialize() first.

Returns:

program : pyopencl Program or None

The compiled program. It may be used to make function calls, although the contained kernels automatically become callable as methods of the session (as with ctx.dot_product in the example above). If the context is not yet initialized, None is returned.

empty(shape, dtype=None)

Create an empty array on the OpenCL device.

Parameters:

shape : tuple

dtype : numpy data type

Returns:

arr : pyopencl array

An empty array of the specified type.

empty_like(arr)

Create an empty array with copied shape/dtype.

zeros(shape, dtype=None)

Create a zeroed array on the OpenCL device.

Parameters:

shape : tuple

dtype : numpy data type

Returns:

arr : pyopencl array

A zeroed array of the specified type.

ones_like(arr)

Create a one-filled array with copied shape/dtype.

ones(shape, dtype=None)

Create a one-filled array on the OpenCL device.

Parameters:

shape : tuple

dtype : numpy data type

Returns:

arr : pyopencl array

A one-filled array of the specified type.

zeros_like(arr)

Create a zeroed array with copied shape/dtype.

local_memory(size)

Create a local memory object on the OpenCL device.

Parameters:

size : int

The size (in bytes) of the local memory required.

Returns:

mem : cl.LocalMemory

A local memory object for an OpenCL kernel.
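Since size is given in bytes, it usually has to be computed from an element count and an element size. The helper below is a hypothetical convenience, not part of simple_cl; its sizes are copied from the data-type table for CLSession.

```python
def itemsize(type_name, use_doubles=False):
    """Size in bytes of one element of the named session data type."""
    fixed = {'char': 1, 'uchar': 1, 'short': 2, 'ushort': 2,
             'int': 4, 'uint': 4, 'long': 8, 'ulong': 8}
    if type_name == 'real':
        return 8 if use_doubles else 4
    if type_name == 'complex':
        return 16 if use_doubles else 8
    return fixed[type_name]


def local_bytes(n_items, type_name, use_doubles=False):
    """Bytes of local memory needed to hold n_items elements."""
    return n_items * itemsize(type_name, use_doubles)


print(local_bytes(256, 'real'))                       # 1024
print(local_bytes(256, 'complex', use_doubles=True))  # 4096
```

The result would then be passed on, for example as ctx.local_memory(local_bytes(256, 'real')) for a single-precision session.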

enqueue_copy(dst, src)

Call pyopencl.enqueue_copy with the session queue.

fft(arr, sign=1, inplace=True, swap=None, max_threads=None)

Perform a multidimensional FFT on a pyopencl array. Note: each array dimension size must be a product of powers of 2, 3, 5 and 7.

Parameters:

arr : pyopencl array, or an object castable to one (e.g. numpy array)

sign : integer (+1 or -1)

Forward or backward transform.

inplace : bool

In place transform? If not, a copy is made first. Note: if a numpy array is passed, the transform will never be in place!

swap : pyopencl array (default: None)

The swap buffer used in the calculation. If not specified, one will be created. It should have the same dimensions and type as arr.

max_threads : int (default: self.max_work_group_size)

The max_thread argument used for the fft code generation. Note that if a given size FFT has already been generated, this will be ignored. (Generally, this should be left unset, unless there isn’t enough local memory to transform multiple blocks.)

Returns:

arr : pyopencl array

The output array; will match the input array if inplace.
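A size check can be done before calling fft. The sketch below assumes the note on dimension sizes means that each size must factor entirely into 2, 3, 5 and 7; that reading is an assumption, and both helper names are hypothetical.

```python
def is_fft_size(n):
    """True if n is a positive product of powers of 2, 3, 5 and 7 only."""
    if n < 1:
        return False
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1


def next_fft_size(n):
    """Smallest acceptable size that is >= n, e.g. for zero-padding."""
    n = max(n, 1)
    while not is_fft_size(n):
        n += 1
    return n


print(is_fft_size(1000))     # True:  1000 = 2^3 * 5^3
print(is_fft_size(22))       # False: 22 = 2 * 11
print(next_fft_size(1021))   # 1024
```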

get_device_info(field)

Get info about the OpenCL device for the session. See ilpm.simple_cl.get_device_info().

ilpm.simple_cl.get_device_info(device, field)

Get information about an OpenCL device.

Parameters:

device : OpenCL device

The device to query.

field : A valid field from pyopencl.device_info or str

The field to obtain. If specified as a string, name should match a member of pyopencl.device_info, ignoring case. Examples include: "global_mem_size" and "max_compute_units".

Returns:

value : varies (usually int)

The value returned by calling device.get_info(...)
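The case-insensitive lookup of a field name might work roughly as below. This is a guess at the behaviour described, not the module's code: resolve_field and query are hypothetical names, and the sketch relies only on pyopencl.device_info having upper-case members and on Device.get_info().

```python
def resolve_field(info_enum, field):
    """Map a name such as 'max_compute_units' to the attribute of the same
    name (ignoring case) on info_enum; non-string values pass through."""
    if isinstance(field, str):
        return getattr(info_enum, field.upper())
    return field


def query(device, field):
    """Look up field on a pyopencl device, accepting a string or an enum value."""
    # Imported here so that resolve_field() can be used without pyopencl.
    import pyopencl as cl
    return device.get_info(resolve_field(cl.device_info, field))
```

With a real device, query(device, "global_mem_size") would then correspond to device.get_info(pyopencl.device_info.GLOBAL_MEM_SIZE).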

ilpm.simple_cl.is_job_done(event)

Check if an OpenCL task is done.

Parameters:

event : pyopencl Event

Returns:

done : bool
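A check like this is typically used for polling, so that other work can continue while a kernel runs. The helper below is a hypothetical sketch: it takes any zero-argument callable, so it does not depend on pyopencl itself.

```python
import time


def wait_until(done, timeout=5.0, interval=0.01):
    """Poll done() until it returns True or timeout seconds have passed.

    Returns True if the job finished, False if the timeout was reached.
    """
    deadline = time.time() + timeout
    while not done():
        if time.time() >= deadline:
            return False
        time.sleep(interval)
    return True
```

With an event returned by a kernel call, this could be used as wait_until(lambda: simple_cl.is_job_done(event)).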