
NVIDIA Visual Profiler can connect to a 64-bit Linux server from 32-bit Windows

Category: CUDA
Published: Sunday, 13 July 2014

The CUDA 6.0 release added an extremely handy feature to Visual Profiler: support for remote profiling. This means that you can run the profiler GUI on your local machine, such as a laptop or tablet (even one without a GPU!), while the profiling itself is performed on a remote machine, e.g. a company server or a university cluster.

Moreover, it does not matter which operating system the local machine is running. In our screenshot, a 64-bit Linux server is profiled remotely from Visual Profiler running on 32-bit Windows 8.


Jetson K1: bandwidthTest

Category: CUDA
Published: Sunday, 15 June 2014


The chart on the left shows the bandwidths of memory transfers on Jetson K1. As a baseline, we also added the GTX680M's host-device and device-host bandwidths (its peak device-device bandwidth is ~90K, too large for this chart).

Summary of Jetson K1's bandwidthTest results, two of which are quite unusual:

  • peak device-device is ~7.5x smaller than GTX680M's
  • peak pinned host-device is ~12x smaller than GTX680M's
  • peak pinned device-host is only ~2x smaller than GTX680M's (!)
  • peak pageable host-device is ~1.5x higher than pinned host-device (!)


Download raw data table in ODS format.

Jetson K1: from unboxing straight to CUDA in 5 steps

Category: CUDA
Published: Saturday, 14 June 2014


We finally got the most wanted Jetson K1 board in the house! In this post we show how to turn a freshly unboxed tiny board into a fully functional CUDA development node.

From now on our training center also offers CUDA developer certification on Jetson K1.



How to break Ubuntu 13.04/14.04 with vanilla CUDA driver and unbreak it back

Category: CUDA
Published: Sunday, 01 June 2014

After installing the CUDA driver from the NVIDIA website, the Ubuntu 13.04/14.04 window manager decorations (Unity, via Compiz) may stop working properly on Optimus machines (primary low-end Intel GPU + secondary high-end NVIDIA GPU).

This tutorial explains how to bring the window manager decorations back.

(Also available as PDF)



Improving CUDA profiler output for an MPI-CUDA program

Category: CUDA
Published: Thursday, 24 April 2014

Suppose we need to profile the following MPI-CUDA program on a GPU cluster. The most obvious way to profile this code on a console-only cluster would be to invoke the profiler inside the mpirun command:

Read more: Improving CUDA profiler output of the MPI-CUDA program

One non-obvious reason for "Illegal instruction" in GPU code

Category: CUDA
Published: Saturday, 12 April 2014

If cuda-gdb reports "Program received signal CUDA_EXCEPTION_4, Warp Illegal Instruction." for the following instruction:

[Switching focus to CUDA kernel 295, grid 148, block (0,2,3), thread (0,0,0), device 0, sm 0, warp 26, lane 0]
0x00000000010b39f0 in cos ()
(cuda-gdb) disass
Dump of assembler code for function cos:
   0x00000000010b39e8 <+376>:	LDC.64 R32, c[0x3][R12]

(note that the debugger always points to the address following the problematic instruction, i.e. 0xe8 + 0x8 = 0xf0 in this case)

then this means that the register index used is outside the legal bounds set by the kernel's register count.
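
The address arithmetic from the note above can be automated with a small helper. What follows is only a sketch in Python, not part of cuda-gdb: the function names are made up for illustration, and the line format is assumed to match the disass output quoted above. It picks the last instruction located before the reported program counter and lists the highest register index it names, which can then be compared by hand against the kernel's register count.

```python
import re

# Assumed line format, taken from the cuda-gdb excerpt quoted in the post:
#    0x00000000010b39e8 <+376>:	LDC.64 R32, c[0x3][R12]
# An optional "=>" current-instruction marker is tolerated as well.
DISASS_LINE = re.compile(r"^\s*(?:=>\s*)?(0x[0-9a-fA-F]+)\s+<\+\d+>:\s*(.+?)\s*$")


def parse_disass(text):
    """Return (address, instruction) pairs found in `disass` output."""
    pairs = []
    for line in text.splitlines():
        match = DISASS_LINE.match(line)
        if match:
            pairs.append((int(match.group(1), 16), match.group(2)))
    return pairs


def faulting_instruction(reported_pc, disass_text):
    """Pick the last instruction located before the reported program counter."""
    earlier = [pair for pair in parse_disass(disass_text) if pair[0] < reported_pc]
    return max(earlier) if earlier else None


def highest_register(instruction):
    """Highest general-purpose register index (Rn) named in the instruction."""
    indices = [int(n) for n in re.findall(r"\bR(\d+)\b", instruction)]
    return max(indices) if indices else None


# Values copied from the session above: the debugger stops at 0x10b39f0.
disass = "   0x00000000010b39e8 <+376>:\tLDC.64 R32, c[0x3][R12]"
address, instruction = faulting_instruction(0x10b39f0, disass)
print(hex(address), instruction)      # 0x10b39e8 LDC.64 R32, c[0x3][R12]
print(highest_register(instruction))  # 32
```

For the session quoted above, the helper selects the LDC.64 instruction at 0x10b39e8, and the highest register it names is R32.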