Over 60 trainings all over Europe for universities and industry
On-site trainings on the whole range of GPU computing technologies
Each lecture accompanied by a practical session on a remote GPU cluster
Best recipes for GPU code optimization, based on our 5-year development experience
We have multiple training programs and even books! Check out our catalogue here.

Using CUDA device functions from OpenACC

Category: OpenACC
Published: Friday, 16 September 2016
    

 

OpenACC enables a rapid transition from serial C/C++/Fortran code to GPU-enabled parallel code. However, because of its high-level nature, OpenACC does not offer access to GPU-specific features that are useful for debugging, optimization and other purposes. In this article we demonstrate how to call CUDA device functions from within OpenACC kernels, using two examples: GPU compute grid retrieval and printf.

In the OpenACC source file, make forward declarations of the CUDA device functions:
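The full listing is behind the link below. As a rough sketch only, a forward declaration of this kind might look as follows in C. The names `device_square` and `fill_squares`, and the host fallback body, are hypothetical and not taken from the article. The `routine seq` directive tells an OpenACC compiler that the function may be called sequentially by each loop iteration inside a kernel; a compiler without OpenACC support ignores the pragmas and runs the loop serially on the host.

```c
#include <stddef.h>

/* Forward declaration of the function called from the OpenACC kernel.
   "routine seq" marks it as callable from device code, one call per
   loop iteration. The name device_square is hypothetical. */
#pragma acc routine seq
int device_square(int x);

/* Assumption: a fallback body so that this sketch builds with a plain
   C compiler. In a CUDA setup the body would instead live in a .cu file
   as an extern "C" __device__ function. */
int device_square(int x)
{
    return x * x;
}

/* Fills out[0..n-1] with squares, calling the declared function
   from inside an OpenACC kernels region. */
void fill_squares(int *out, int n)
{
    #pragma acc kernels copyout(out[0:n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; i++)
            out[i] = device_square(i);
    }
}
```

Calling `fill_squares(out, 5)` from a host `main` fills `out` with the squares of 0 to 4. When the function really is implemented in CUDA, the fallback body is dropped and the separately compiled device code is typically linked with relocatable device code enabled; the exact flags depend on the toolchain.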

Read more: Using CUDA device functions from OpenACC

OpenACC workshop at the University of Mainz

Category: OpenACC
Published: Friday, 16 September 2016
    

 

Applied Parallel Computing LLC delivered the OpenACC Workshop at the University of Mainz, Germany. The workshop was kindly supported by NVIDIA and Fluidyna GmbH.

Workshop program

Calling CUDA device function from OpenACC Fortran kernel

Category: OpenACC
Published: Friday, 11 July 2014

OpenACC is known to be a fast way of developing reasonably efficient GPU-enabled applications. It is also possible to mix CUDA kernels and libraries with OpenACC kernels in a single source file. But did you know that it is also possible to call CUDA device functions from within OpenACC kernels? This feature enables, for instance, the use of libraries with device functions (e.g. CURAND) and dynamic parallelism in OpenACC. We've developed an example program to show how a CUDA device subroutine can be called from an OpenACC kernel in Fortran:

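program main
  use openacc
  ! Module assumed to provide the CUDA device subroutine called below
  use device_subroutine_module
  implicit none

  ! Tell OpenACC that device_subroutine is a sequential device routine
  !$acc routine (device_subroutine) seq
  integer, dimension(:), allocatable :: a
  integer, dimension(:), allocatable :: b
  integer :: i

  allocate(a(10), b(10))

  ! Run the loop on the GPU and copy a and b back to the host afterwards
  !$acc kernels copyout (a,b)
  !$acc loop
  do i = 1, 10
    call device_subroutine(i, a, b)
  enddo
  !$acc end kernels

  print *, a

  deallocate(a, b)

end program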

Read more: Calling CUDA device function from OpenACC Fortran kernel