Article Source

Title: GPU Accelerated Computing with C and C++
Source: NVIDIA CUDA Zone

GPU Accelerated Computing with C and C++

With the CUDA Toolkit from NVIDIA, you can accelerate your C or C++ code by moving its computationally intensive portions to an NVIDIA GPU. In addition to drop-in library acceleration, the toolkit lets you tap the massively parallel power of a GPU using a few new syntactic elements and calls to functions from the CUDA Runtime API.
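
As a rough illustration of those "few new syntactic elements", the sketch below adds two vectors on the GPU. `__global__`, the `<<<blocks, threads>>>` launch syntax, and the built-in index variables are CUDA C/C++ extensions; `cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, and `cudaFree` come from the CUDA Runtime API. The names `add_vectors` and `add_on_gpu` and the block size of 256 are choices made for this example only, not anything the toolkit prescribes.

```cuda
#include <cuda_runtime.h>
#include <vector>

// __global__ marks a function that runs on the GPU and is launched from host code.
__global__ void add_vectors(const float *a, const float *b, float *out, int n)
{
    // Each thread handles one element; built-in variables give its global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper (not a toolkit function): copies the inputs to the
// GPU, launches the kernel, and copies the result back.
// Returns false if the sizes differ or any CUDA call fails.
bool add_on_gpu(const std::vector<float> &a, const std::vector<float> &b,
                std::vector<float> &out)
{
    if (a.size() != b.size()) return false;
    const int n = static_cast<int>(a.size());
    out.assign(n, 0.0f);
    if (n == 0) return true;

    const size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;                        // threads per block (example value)
        const int blocks = (n + threads - 1) / threads; // enough blocks to cover n
        add_vectors<<<blocks, threads>>>(d_a, d_b, d_out, n);
        // cudaMemcpy waits for the kernel to finish before copying.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return ok;
}
```

To try it, save the code in a `.cu` file, call `add_on_gpu` from your own `main`, and build with `nvcc`.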

The CUDA Toolkit from NVIDIA is free and includes:

Visual and command-line debugger
Visual and command-line GPU profiler
Many GPU optimized libraries
The CUDA C/C++ compiler
GPU management tools
Lots of other features

Getting Started:

Make sure you have an understanding of what CUDA is.
- Read through the Introduction to CUDA C/C++ series on Mark Harris’ Parallel Forall blog.
Try CUDA by taking a self-paced lab on nvidia.qwiklab.com. These labs require only a supported web browser and a network that allows Web Sockets. To verify that your network and system support Web Sockets, click here and check the section “Web Sockets (Port 80)”; all check marks should be green.
Download and install the CUDA Toolkit.
- A quick how-to video for Windows walks through this process.
- Also see Getting Started Guides for Windows, Mac, and Linux.
Watch the accompanying video to see how to quickly write your first CUDA C program.
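
If you want something small to compile right after installing the toolkit, a device listing is one reasonable starting point. This is a sketch of our own, not the program from the video: it uses the Runtime API calls `cudaGetDeviceCount`, `cudaGetDeviceProperties`, and `cudaGetErrorString`, and the function name `list_cuda_devices` is made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a toolkit function): prints a one-line summary of
// each CUDA device and returns the device count, or -1 if the runtime
// reports an error (for example, no driver or no CUDA-capable GPU).
int list_cuda_devices()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("CUDA runtime error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            return -1;
        }
        std::printf("Device %d: %s, compute capability %d.%d, %zu MiB global memory\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return count;
}
```

Call `list_cuda_devices` from `main` in a `.cu` file and build it with `nvcc`; a return value of at least 1 suggests the driver and toolkit can see your GPU.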

Learning CUDA:

Take the easily digestible, high-quality, and free Udacity Intro to Parallel Programming course, which uses CUDA as its parallel programming platform.
Visit docs.nvidia.com for CUDA C/C++ documentation.
Work through hands-on examples:
- Adding two vectors together
Look through the code samples that come installed with the CUDA Toolkit.
If you are working in C++, you should definitely check out the Thrust parallel template library.
Browse and ask questions on stackoverflow.com or NVIDIA’s DevTalk forum.
Learn more by:
- Reading the CUDA C Programming Guide
- Reading the CUDA C Best Practices Guide
- Watching the many hours of recorded sessions from the gputechconf.com site.
- Participating in training sessions offered at conferences such as Supercomputing, International Supercomputing, the GPU Technology Conference, and many others.
- Browsing here for more learning opportunities.
Look at the following for more advanced hands-on examples:
- A 1D Stencil example, including shared memory and synchronized threads.
- Optimizing a Jacobi Point Iterative method.
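
To give a flavor of the 1D stencil example mentioned above, here is a minimal sketch that uses shared memory and `__syncthreads()`. It is not the linked example itself. The assumptions are ours: a radius of 3, a block size of 256, out-of-range inputs treated as zero, and the names `stencil_1d` and `run_stencil`.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define RADIUS 3        // example value: neighbors on each side
#define BLOCK_SIZE 256  // example value: must match the launch configuration

// Each output element is the sum of the 2*RADIUS+1 inputs centered on it.
// Inputs outside the array count as zero (an assumption of this sketch).
__global__ void stencil_1d(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK_SIZE + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // index into the tile

    // Stage this block's elements in fast on-chip shared memory.
    tile[l] = (g < n) ? in[g] : 0;

    // The first RADIUS threads also load the halo cells on both sides.
    if (threadIdx.x < RADIUS) {
        int left = g - RADIUS;
        int right = g + BLOCK_SIZE;
        tile[l - RADIUS] = (left >= 0) ? in[left] : 0;
        tile[l + BLOCK_SIZE] = (right < n) ? in[right] : 0;
    }

    // Wait until every thread in the block has finished loading the tile.
    __syncthreads();

    if (g < n) {
        int sum = 0;
        for (int k = -RADIUS; k <= RADIUS; ++k) {
            sum += tile[l + k];
        }
        out[g] = sum;
    }
}

// Hypothetical host helper; returns false if any CUDA call fails.
bool run_stencil(const std::vector<int> &in, std::vector<int> &out)
{
    const int n = static_cast<int>(in.size());
    out.assign(n, 0);
    if (n == 0) return true;

    const size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
        stencil_1d<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

The point of the tile is that each input is read from global memory about once per block and then reused by up to seven threads from shared memory. Without the `__syncthreads()` barrier, a thread could read a tile cell before its neighbor has written it.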

So, now you’re ready to deploy your application? You can register today for free access to NVIDIA Tesla K40 GPUs. Develop your code on what NVIDIA describes as the fastest accelerator in the world, and see how a Tesla K40 GPU can speed up your development.

Availability

The CUDA Toolkit is a free download from NVIDIA and is supported on Windows, Mac, and most standard Linux distributions.

Starting with CUDA 5.5, CUDA also supports the ARM architecture.
For the host-side code in your application, the nvcc compiler will use your default host compiler.

30 April 2014
