11. simple_cl Module
This module provides a convenient way to work with OpenCL via pyopencl. It simplifies access to devices, allows the floating-point precision to be chosen at run time, and provides common complex-variable functions.
It also simplifies kernel calls by assuming that the desired number of work-groups is the number of compute units, and that the number of work-items per work-group is the maximum work-group size. This means any looping over the data must be handled in the kernel, as in the example below.
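As a rough illustration of that launch shape, the sketch below emulates it in plain NumPy. It does not use OpenCL or simple_cl. It assumes a hypothetical device with `n_units` compute units and a work-group size of `group_size`, so `n_units * group_size` work-items run in total. Each work-item visits indices `gid, gid + global_size, ...`, mirroring a kernel loop built from `get_global_id(0)` and `get_global_size(0)`. Every index ends up visited exactly once, whether the array is longer or shorter than the number of work-items.

```python
import numpy as np


def grid_stride_indices(global_id, global_size, n):
    """Indices handled by one work-item in a grid-stride loop."""
    return range(global_id, n, global_size)


def emulate_launch(n, n_units, group_size):
    """Count how often each array index is visited by all work-items.

    Assumes n_units work-groups of group_size work-items each
    (hypothetical numbers standing in for a real device's values).
    """
    global_size = n_units * group_size
    visits = np.zeros(n, dtype=int)
    for gid in range(global_size):
        for i in grid_stride_indices(gid, global_size, n):
            visits[i] += 1
    return global_size, visits


global_size, visits = emulate_launch(n=1000, n_units=4, group_size=64)
print(global_size, visits.min(), visits.max())  # 256 1 1
```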
11.1. Overview
CLSession([device, use_doubles, group_size, ...]) | Create an OpenCL session on the specified device, while defining float precision and some utility functions.
acquire_opencl_devices(devname) | Find OpenCL devices with matching descriptions.
is_job_done(event) | Check if an OpenCL task is done.
get_device_info(device, field) | Get information about an OpenCL device.
11.2. Example Usage
from __future__ import print_function  # lets the prints below run on Python 2 and 3
import time

import numpy as np
from ilpm import simple_cl

NP = 10**6
REPEATS = 10

# Here we create an OpenCL context on the CPU. Why bother? This causes the
# code execution to be multithreaded, which should speed it up!
ctx = simple_cl.CLSession(device='cpu', use_doubles=False)

# OpenCL code: each work-item strides through the array by the total number
# of work-items. vload3(i, p) is standard OpenCL C and reads p[3*i], p[3*i+1],
# p[3*i+2], i.e. row i of an (NP, 3) array.
ctx.compile('''
__kernel void dot_product(__global REAL* a, __global REAL* b, int np, __global REAL* result)
{
    for (int i = get_global_id(0); i < np; i += get_global_size(0)) {
        result[i] = dot(vload3(i, a), vload3(i, b));
    }
}''')

# Initial data
a = np.random.rand(NP, 3)
b = np.random.rand(NP, 3)

# Create versions of the data on the device
a_cl = ctx.to_device(a)
b_cl = ctx.to_device(b)
result_cl = ctx.empty(NP)

# Run in OpenCL; the scalar kernel argument is passed as an explicit int32
start = time.time()
for n in range(REPEATS):
    event = ctx.dot_product(a_cl, b_cl, np.int32(NP), result_cl)
    event.wait()
print('OpenCL: %5.1f ms' % ((time.time() - start) * 1E3))
result = result_cl.get()

# Run in numpy
start = time.time()
for n in range(REPEATS):
    result_n = (a * b).sum(-1)
print(' Numpy: %5.1f ms' % ((time.time() - start) * 1E3))

# Find the error; note that the OpenCL version is single precision, as we
# specified use_doubles=False when creating the context. This gives an error
# on the order of single-precision round-off.
print('Dot product error:', np.abs(result_n - result).sum() / NP)
Say you want to run your code on the computer’s GPU, not its CPU. This option is available on recent high-end Macs (possibly not on MacBook Airs). How would I find out my GPU’s name to use it in my code? I’ll do it below without using simple_cl.
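One way to do this is with pyopencl directly: each platform lists its devices through `get_devices()`, and each device has a `name`. The helper below is a hypothetical sketch, not part of simple_cl. It imports pyopencl lazily and accepts an optional list of platform-like objects, so the logic can be tried without OpenCL installed.

```python
def list_opencl_devices(platforms=None):
    """Return (platform name, device name) pairs for all OpenCL devices.

    `platforms` defaults to pyopencl.get_platforms(); any objects with a
    `.name` attribute and a `.get_devices()` method also work.
    """
    if platforms is None:
        import pyopencl as cl  # lazy import: only needed for real hardware
        platforms = cl.get_platforms()
    pairs = []
    for platform in platforms:
        for device in platform.get_devices():
            # Some drivers pad names with spaces, so strip them.
            pairs.append((platform.name.strip(), device.name.strip()))
    return pairs
```

With pyopencl installed, `for platform, name in list_opencl_devices(): print(platform, '->', name)` prints one line per device. The printed name, or a distinctive part of it, can then be passed as `device` to `CLSession`.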
We can choose from the list. For instance, my MacBook Pro has 3 devices: the CPU, an Iris graphics card, and an AMD graphics card. So I could say:
ctx = simple_cl.CLSession(device='AMD Radeon R9 M370X Compute Engine', use_doubles=True)
11.3. Classes and Functions
ilpm.simple_cl.acquire_opencl_devices(devname)
Find OpenCL devices with matching descriptions.
Parameters: devname : string
A partial name for an OpenCL device, or “cpu” or “gpu”.
Returns: devices : list of OpenCL devices
All matching devices. May be empty.
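Since the result may be empty, callers often try several descriptions in order; the `device` parameter of `CLSession` documented below does this, stopping at the first description that matches. A sketch of that fallback rule follows. `pick_first_device` is a hypothetical helper, not part of simple_cl, and it takes the device-finding function as an argument so it can be tried without OpenCL hardware.

```python
def pick_first_device(descriptions, find_devices):
    """Return the first device found by the earliest matching description.

    Hypothetical helper: `find_devices` is any callable mapping a
    description string to a (possibly empty) list of devices, for example
    ilpm.simple_cl.acquire_opencl_devices.
    """
    if isinstance(descriptions, str):
        descriptions = [descriptions]
    for description in descriptions:
        devices = find_devices(description)
        if devices:
            return devices[0]  # stop at the first match
    raise LookupError('no OpenCL device matches any of %r' % (descriptions,))
```

For example, `pick_first_device(['AMD', 'gpu', 'cpu'], simple_cl.acquire_opencl_devices)` would try an AMD device first, then any GPU, then the CPU.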
class ilpm.simple_cl.CLSession(device=['cpu', 'gpu'], use_doubles=False, group_size=None, show_warnings=False)
Bases: object
Create an OpenCL session on the specified device, while defining float precision and some utility functions.
The actual creation of the OpenCL context is delayed until it is needed. This allows the context parameters to be modified before it is used, which is useful for external modules where the user may wish to change the device used for calculations.
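This delayed creation is a lazy-initialization pattern. A minimal sketch of the idea in plain Python follows. `LazySession` and its `make_context` factory are hypothetical stand-ins (simple_cl builds a real pyopencl context instead), so this illustrates the pattern rather than the module's code.

```python
class LazySession(object):
    """Hypothetical stand-in showing delayed context creation."""

    def __init__(self, device='cpu', make_context=None):
        self.device = device              # can still be changed before first use
        self._make_context = make_context
        self._context = None              # nothing is created yet

    def initialize(self, context=None):
        """Create (or adopt) the context once; later calls change nothing."""
        if self._context is None:
            if context is None:
                context = self._make_context(self.device)
            self._context = context
        return self._context

    @property
    def context(self):
        # Anything that needs the context triggers initialization on demand.
        return self.initialize()


def fake_make_context(device):
    # Hypothetical factory; a real session would build a pyopencl Context here.
    return ('context for', device)


session = LazySession(make_context=fake_make_context)
session.device = 'gpu'      # e.g. an external module redirects the session
print(session.context)      # ('context for', 'gpu')
```

Once the context exists, changing `session.device` no longer has any effect, which is why the device has to be chosen before the first operation that needs it.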
The following data types are defined as members of the class:
Name | OpenCL type | Size (bytes)
real | REAL (float or double) | 4 or 8
complex | COMPLEX (float2 or double2) | 8 or 16
char | char | 1
uchar | uchar | 1
short | short | 2
ushort | ushort | 2
int | int | 4
uint | uint | 4
long | long | 8
ulong | ulong | 8
The following functions and constants will also be made available to any kernel compiled via the session:
COMPLEX c_mul(COMPLEX x, COMPLEX y): x·y
COMPLEX c_div(COMPLEX x, COMPLEX y): x/y
COMPLEX conj(COMPLEX x): x* (complex conjugate)
COMPLEX c_exp(COMPLEX x): exp(x)
REAL c_angle(COMPLEX x): arg(x)
REAL c_abs(COMPLEX x): |x|
REAL c_abs_sq(COMPLEX x): |x|²
COMPLEX c_exp_i(REAL x): exp(ix)
COMPLEX native_exp_i(REAL x): exp(ix), using native-precision math
COMPLEX c_exp_i_T(REAL x): exp(−ix)
COMPLEX native_exp_i_T(REAL x): exp(−ix), using native-precision math
REAL PI: the constant π
REAL TWO_PI: the constant 2π
REAL EULER_CONSTANT: the constant γ = 0.57721...
Parameters: device : OpenCL device, description string, or list of strings
The OpenCL device to use. If a string or list is passed, acquire_opencl_devices() is used to find devices. The search stops at the first matching device, so a list can be used to specify fallbacks.
use_doubles : bool (default: False)
If True, the REAL and COMPLEX types are double precision; otherwise they are single precision.
group_size : int (default: max for device)
The default work-group size used in kernel calls. If not specified, it is determined by the device's max_work_group_size (recommended). If the device type is CPU, it is set to 1 (CPUs sometimes erroneously report more than 1 for this value).
show_warnings : bool, optional (default: False)
If True, compiler warnings are displayed in build error messages.
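On the device, a COMPLEX value is a float2 (or double2) holding the real and imaginary parts. For checking kernel results on the host, the sketch below gives NumPy reference versions of a few of the complex helpers listed above, working on arrays whose last axis has length 2. They follow the standard formulas named in the function list; they are not taken from simple_cl's kernel source, and `as_pairs` is a hypothetical convenience function.

```python
import numpy as np


def as_pairs(z):
    """Convert a numpy complex array to (..., 2) real/imaginary pairs."""
    return np.stack([z.real, z.imag], axis=-1)


def c_mul(x, y):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    a, b = x[..., 0], x[..., 1]
    c, d = y[..., 0], y[..., 1]
    return np.stack([a * c - b * d, a * d + b * c], axis=-1)


def conj(x):
    """Complex conjugate: negate the imaginary part."""
    return np.stack([x[..., 0], -x[..., 1]], axis=-1)


def c_abs_sq(x):
    """|x|^2 = re^2 + im^2, with no square root."""
    return x[..., 0] ** 2 + x[..., 1] ** 2


def c_div(x, y):
    """x / y = x * conj(y) / |y|^2."""
    return c_mul(x, conj(y)) / c_abs_sq(y)[..., None]


def c_exp_i(theta):
    """exp(i*theta) = cos(theta) + i*sin(theta) for real theta."""
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)


z = np.array([1 + 2j, -0.5 + 0.25j])
w = np.array([3 - 1j, 2 + 2j])
print(np.allclose(c_mul(as_pairs(z), as_pairs(w)), as_pairs(z * w)))  # True
```

Using |y|² in `c_div` avoids a square root, which is the usual reason a separate `c_abs_sq` helper exists alongside `c_abs`.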
Methods
compile(code): Compile OpenCL kernel code (as a string) on the present device.
compile_file(fn): Compile OpenCL kernels from a file.
device_info()
empty(shape[, dtype]): Create an empty array on the OpenCL device.
empty_like(arr): Create an empty array with copied shape/dtype.
enqueue_copy(dst, src): Call pyopencl.enqueue_copy with the session queue.
fft(arr[, sign, inplace, swap, max_threads]): Perform a multidimensional FFT on a pyopencl array.
get_device_info(field): Get info about the OpenCL device for the session.
initialize([context]): Initialize the OpenCL context, if not already done.
local_memory(size): Create a local memory object on the OpenCL device.
ones(shape[, dtype]): Create a one-filled array on the OpenCL device.
ones_like(arr): Create a one-filled array with copied shape/dtype.
to_device(X): Make a copy of a local array on the OpenCL device.
zeros(shape[, dtype]): Create a zeroed array on the OpenCL device.
zeros_like(arr): Create a zeroed array with copied shape/dtype.
initialize(context=None)
Initialize the OpenCL context, if not already done.
Parameters: context : None or pyopencl Context
The OpenCL context to use. If None, one is created using the preset device attribute.
to_device(X)
Make a copy of a local array on the OpenCL device.
Parameters: X : numpy array
Returns: X_cl : pyopencl array
A version of the array stored on the OpenCL device. If the data type is float or double, it will be converted to the precision of the device context.
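The precision rule for to_device can be sketched on the host side. `prepare_for_device` is a hypothetical helper, not simple_cl's implementation. It casts real floating-point arrays to float32, or to float64 when use_doubles is True (matching the 4- or 8-byte REAL type), and leaves other dtypes untouched. Complex arrays are also left alone here, since the text above only mentions float and double. This rounding is one source of the single-precision error in the example above when use_doubles=False.

```python
import numpy as np


def prepare_for_device(arr, use_doubles=False):
    """Hypothetical sketch of the documented float conversion rule."""
    arr = np.asarray(arr)
    if arr.dtype.kind == 'f':  # real floating point: float16/32/64
        real = np.float64 if use_doubles else np.float32
        arr = arr.astype(real)
    return arr  # ints, complex, etc. keep their dtype in this sketch


x = np.random.rand(4, 3)                      # float64 on the host
print(prepare_for_device(x).dtype)            # float32
print(prepare_for_device(x, True).dtype)      # float64
```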
compile(code)
Compile OpenCL kernel code (as a string) on the present device.
Compilation is delayed if this session is not initialized. If you require access to the Program object (which is usually not important), call initialize() first.
Returns: program : pyopencl Program or None
The compiled program. It may be used to make function calls, although the kernels it contains automatically become methods of the session object. If the context is not initialized yet, returns None.
empty(shape, dtype=None)
Create an empty array on the OpenCL device.
Parameters: shape : tuple
dtype : numpy data type
Returns: arr : pyopencl array
An empty array of the specified type.
zeros(shape, dtype=None)
Create a zeroed array on the OpenCL device.
Parameters: shape : tuple
dtype : numpy data type
Returns: arr : pyopencl array
A zeroed array of the specified type.
ones(shape, dtype=None)
Create a one-filled array on the OpenCL device.
Parameters: shape : tuple
dtype : numpy data type
Returns: arr : pyopencl array
A one-filled array of the specified type.
local_memory(size)
Create a local memory object on the OpenCL device.
Parameters: size : int
The size (in bytes) of the local memory required.
Returns: mem : cl.LocalMemory
A local memory object for an OpenCL kernel.
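Because local_memory takes a size in bytes, the byte count has to be worked out from the element type. Below is a hypothetical helper (not part of simple_cl), assuming each element is one REAL (4 or 8 bytes) or one COMPLEX (8 or 16 bytes), as in the data-type table above.

```python
import numpy as np


def local_bytes(n_items, use_doubles=False, components=1):
    """Bytes needed for n_items REAL (components=1) or COMPLEX
    (components=2) values at the session's precision."""
    real_size = np.dtype(np.float64 if use_doubles else np.float32).itemsize
    return n_items * components * real_size


print(local_bytes(256))                 # 1024: one single-precision REAL per work-item
print(local_bytes(256, True, 2))        # 4096: one double-precision COMPLEX each
```

The result could then be passed on, e.g. `ctx.local_memory(local_bytes(256))`.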
fft(arr, sign=1, inplace=True, swap=None, max_threads=None)
Perform a multidimensional FFT on a pyopencl array. Note: each dimension size must have no prime factors other than 2, 3, 5 and 7.
Parameters: arr : pyopencl array, or an object castable to one (e.g. numpy array)
sign : integer (±1)
Forward or backward transform.
inplace : bool
If True, transform in place; otherwise a copy is made first. Note: if a numpy array is passed, the transform is never in place!
swap : pyopencl array (default: None)
The swap buffer used in the calculation. If not specified, one will be created. It should have the same dimensions/type as arr.
max_threads : int (default: self.max_work_group_size)
The maximum number of threads used when generating the FFT code. Note that if an FFT of a given size has already been generated, this is ignored. (Generally, this should be left unset, unless there isn’t enough local memory to transform multiple blocks.)
Returns: arr : OpenCL array
The output array; this is the input array if inplace.
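Reading the size restriction as "every dimension size has no prime factors other than 2, 3, 5 and 7", a small hypothetical helper (not part of simple_cl) can check a shape before calling fft, or find the next valid length to zero-pad to. This is a sketch under that reading.

```python
def is_fft_size_ok(n, radices=(2, 3, 5, 7)):
    """True if n >= 1 has no prime factors other than those in radices."""
    if n < 1:
        return False
    for r in radices:
        while n % r == 0:
            n //= r
    return n == 1


def next_fft_size(n, radices=(2, 3, 5, 7)):
    """Smallest m >= n that passes is_fft_size_ok (for zero-padding)."""
    m = max(n, 1)
    while not is_fft_size_ok(m, radices):
        m += 1
    return m


shape = (1000, 22)
print([is_fft_size_ok(s) for s in shape])   # [True, False]
print(next_fft_size(1001))                  # 1008
```

Here 1000 = 2³·5³ is allowed, 22 = 2·11 is not, and 1008 = 2⁴·3²·7 is the first valid length at or above 1001.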
get_device_info(field)
Get info about the OpenCL device for the session. See ilpm.simple_cl.get_device_info().
ilpm.simple_cl.get_device_info(device, field)
Get information about an OpenCL device.
Parameters: field : a valid field from pyopencl.device_info, or str
The field to obtain. If specified as a string, the name should match a member of pyopencl.device_info, ignoring case. Examples include "global_mem_size" and "max_compute_units".
Returns: value : varies (usually int)
The value returned by calling device.get_info(...)
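The case-insensitive string lookup can be sketched as follows. `resolve_device_info_field` is hypothetical, not simple_cl's code. pyopencl exposes the fields as upper-case members of `pyopencl.device_info` (e.g. `MAX_COMPUTE_UNITS`), so the sketch upper-cases a string and looks it up there. The `info_enum` argument exists only so the function can be tried without pyopencl.

```python
def resolve_device_info_field(field, info_enum=None):
    """Map a case-insensitive field name to a pyopencl.device_info member.

    Non-string fields are assumed to already be device_info values and are
    returned unchanged. `info_enum` defaults to pyopencl.device_info.
    """
    if not isinstance(field, str):
        return field
    if info_enum is None:
        import pyopencl as cl  # lazy import: only needed for the real enum
        info_enum = cl.device_info
    try:
        return getattr(info_enum, field.upper())
    except AttributeError:
        raise ValueError('unknown device_info field: %r' % (field,))
```

With a pyopencl `device`, `device.get_info(resolve_device_info_field('max_compute_units'))` should return the number of compute units. pyopencl also exposes the same value as the attribute `device.max_compute_units`.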