Register pressure occurs when there are not enough registers available for a given task. Even though each multiprocessor contains thousands of 32-bit registers (see Section F.1 of the CUDA C Programming Guide), these are partitioned among concurrent threads. To prevent the compiler from allocating too many registers, use the -maxrregcount=N compiler command-line option (see NVCC below) or the __launch_bounds__ kernel definition qualifier (see Section B.17 of the CUDA C Programming Guide) to control the maximum number of registers allocated per thread.
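As a rough illustration of the launch bounds approach, the sketch below declares a kernel with __launch_bounds__ and then asks the runtime how many registers per thread the compiler actually assigned. The kernel name scaleKernel, its body, and the bound of 256 threads per block are assumptions chosen for this example only; they are not taken from the guide.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration.
// __launch_bounds__(256) promises the compiler that this kernel is never
// launched with more than 256 threads per block, so it can limit the
// registers used per thread to keep such a block resident.
__global__ void __launch_bounds__(256) scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Query the compiled kernel's resource usage.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // numRegs is the number of registers each thread of the kernel uses;
    // maxThreadsPerBlock is the largest block size this kernel can run with.
    printf("registers per thread:  %d\n", attr.numRegs);
    printf("max threads per block: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```

The same file could instead be built with a file-wide cap, for example nvcc -maxrregcount=32 example.cu (the file name and the value 32 are placeholders). Adding -Xptxas -v to the nvcc command line makes the assembler report per-kernel register usage at compile time, which is a convenient way to check whether either limit had an effect. Note that capping registers too aggressively may cause spilling to local memory, so the reported numbers are worth comparing before and after.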