Course description

Computational thinking

Professor Hwu is co-author of the first book on GPU programming in the English language.
Parallelism transformations for performance
Avoidance of conflicts in resources
An efficient technique for non-uniform data is sorting by binning.
Dealing with data efficiently

In this section, we discuss the particular limitations and constraints of many-core hardware, and which computational patterns are desirable or undesirable, from the architectural point of view. Among the main obstacles to performance, we discuss: conflicts in critical resources leading to serialization, load imbalance, and memory bandwidth bottlenecks.
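As a concrete illustration of the first obstacle, the sketch below (our own example, not from the course materials) shows "privatization", a common way to avoid serialization on a contended resource: rather than having every worker update one shared histogram under a lock, each worker fills a private copy and the copies are merged in a single short reduction step.

```python
# A minimal sketch of privatization, assuming a histogram as the contended
# resource. Each worker bins its own chunk privately (no shared state, no
# lock); the only serial section is the final merge.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def private_histogram(chunk):
    # Private per-worker histogram: no write conflicts with other workers.
    return Counter(chunk)

def histogram(data, workers=4):
    # Split the input, bin each piece independently, then do one merge pass.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(private_histogram, chunks))
    total = Counter()
    for p in partials:
        total.update(p)  # the short serial reduction
    return total

assert histogram([1, 2, 2, 3, 3, 3]) == Counter({1: 1, 2: 2, 3: 3})
```

On a GPU the same idea appears as per-block histograms in on-chip memory, combined at the end, instead of global atomic updates that serialize under contention.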

With these constraints in mind, we aim to discover how computational thinking enables one to transform the structure of a problem by: identifying the inherently serial parts, finding the parts that are amenable to parallel execution, and choosing the methods and compromises for converting one into the other. We observe that only a modest number of fundamental algorithm strategies appear in successful GPU programs.

Often domain problems have inherent parallelism that needs to be recognized. The most efficient implementation that exploits the problem’s parallelism may be non-intuitive. For example, two alternative thread arrangements that appear in electrostatics calculations have, respectively, scatter and gather memory access behavior. The first is more intuitive, but the second is much more efficient on the GPU architecture.

The GPU architecture is characterized by memory access bandwidth that, although fast, is often limiting in comparison to compute throughput. Thus, achieving performance critically depends on finding ways to reduce and regularize global memory access. Three important algorithmic strategies for conserving bandwidth are “register/memory tiling”, “layout transformation” and “thread coarsening”. These come at a cost of increased on-chip memory usage, which is also a limited resource. We will discuss a variety of examples from PDE solvers, linear algebra, and convolution.
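The tiling idea can be sketched with a blocked matrix multiply (a CPU-side Python sketch under assumed tile size T, not a real kernel): each tile of the inputs is loaded once and reused for many partial products, the way GPU threads would reuse a tile staged in on-chip shared memory, and T stands in for the limited on-chip memory budget the text mentions.

```python
# A minimal sketch of memory tiling: blocked matrix multiply with tile size T.
def matmul_tiled(A, B, T=2):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, T):
        for kk in range(0, m, T):
            for jj in range(0, p, T):
                # Each (ii, kk, jj) block touches only T*T elements of A and B,
                # reusing every loaded element T times before moving on.
                for i in range(ii, min(ii + T, n)):
                    for k in range(kk, min(kk + T, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + T, p)):
                            C[i][j] += a * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_tiled(A, B) == [[19, 22], [43, 50]]
```

Larger T means more reuse per global load but more on-chip memory consumed per tile, which is exactly the cost trade-off described above.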

Regular computation and regular data access are the ideal combination for GPUs. Domain problems, however, present data to the programmer in a variety of ways: data is often sparse, non-uniform, or dynamic. Specific techniques such as data binning, compaction, and queuing are effective on GPU architectures. We will present examples from medical imaging and graph problems to illustrate these techniques.
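Data binning can be sketched as follows (an assumed 1-D example of ours, not taken from the lectures): points at non-uniform positions are sorted into fixed-width bins, so that a later pass, say one GPU thread block per bin, works on regular, bounded chunks and a range query scans only the few bins it overlaps instead of the whole irregular list.

```python
# A minimal sketch of data binning for non-uniform 1-D points.
def bin_points(points, width):
    bins = {}
    for p in points:
        bins.setdefault(int(p // width), []).append(p)
    return bins

def neighbors(points, x, radius, width):
    # Only the bins overlapping [x - radius, x + radius] are scanned.
    bins = bin_points(points, width)
    lo, hi = int((x - radius) // width), int((x + radius) // width)
    return sorted(p for b in range(lo, hi + 1)
                    for p in bins.get(b, ())
                    if abs(p - x) <= radius)

pts = [0.1, 0.9, 3.7, 4.2, 9.5]
assert neighbors(pts, 4.0, 1.0, width=1.0) == [3.7, 4.2]
```

The same pattern, with 2-D or 3-D cells, underlies the binning used for non-uniform data in applications such as the medical-imaging examples mentioned above.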

Requirements: skill expert
Price: Free
Lectures: 6
Skill level: Beginner
Expiry period: Lifetime
Certificate: Yes
