Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/15830
Title: Intelligent Visual Supercomputing on Hybrid Graphical Multiprocessor Environments
Other Titles: Parallelization of multicamera algorithms for intelligent supercomputing on hybrid multiprocessor environments
Authors: ROGMANS, Sammy 
Advisors: BEKAERT, Philippe
Issue Date: 2013
Abstract: In this dissertation we investigate the principles of efficiently utilizing massively parallel computational power, more specifically the resources available in the Graphics Processing Unit (GPU). The continuous, exponential increase in processing power makes fully exploiting this computational capacity a challenging problem to this day. To tackle it, a different mindset is needed, most notably when designing new algorithms that are compatible with such architectures and paradigms. With this philosophy in mind, we redesign and innovate algorithms in the domain of visual computing, focusing on the latest trends in multicamera applications. First, because of their requirement for massive data processing, these applications can be seen as an ideal case study for mastering the skills of high-performance computing. Second, thanks to the latest application trends in multicamera systems, this domain receives a lot of research attention, especially where real-time or high-speed performance is concerned.
We first introduce the fundamentals of GPU computing, while optimizing a number of basic processing modules in the context of multicamera capturing and correction. We propose a number of rules on how to optimize local data communication within computational kernels, how to properly (SIMD) vectorize given data structures, how to maximally exploit the small available caches in a spatiotemporal way, and how to exploit specialized hardware in conjunction with algorithmic redesign to create efficient algorithmic building blocks. This set of rules is used throughout the book to move to a more advanced level, where we focus on multicamera analysis and reconstruction. Besides being algorithmically consecutive to multicamera capturing, this stage provides both algorithmic and implementation challenges. We therefore introduce a novel stereo matching algorithm that extracts depth from two parallel input camera images and is designed from the ground up with the fundamentals of GPU computing in mind. In doing so, we achieve very high algorithmic throughput while jointly providing superior quality compared to many of its competitors.
Furthermore, we identify and investigate the most computationally intensive part, i.e. matching cost aggregation in the form of a multidimensional convolution. This research allows us to propose a set of transformation rules that indicate which computational model to use depending on the size of the convolution kernel, while also taking future-proof scalability into account. We further present the possibilities that arise when more cameras are introduced to relax the algorithmic requirements of this processing step, as additional cameras allow more confident reconstructions without relying on aggregation. In this context, we present an initial prototype of eye-gaze-corrected video chat that is already able to synthesize basic views using the developed and proposed technology. Consequently, we investigate multicamera rendering and synthesis in depth, as the end-to-end result of multicamera applications, within the context of a generic framework. This enables us to design advanced and very effective multicamera synthesis approaches that properly tackle the shortcomings of the preceding processing blocks.
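To make the cache-exploitation and cost-aggregation ideas above concrete, the following is a minimal CUDA sketch, not taken from the dissertation itself: it aggregates absolute-difference matching costs over a horizontal window for a single disparity hypothesis, staging the required row segments of both images in on-chip shared memory so that neighbouring threads reuse the same cached pixels instead of re-reading global memory. The window radius, block width, and border handling are illustrative assumptions.

    #define RADIUS 4      // aggregation window radius (assumed for illustration)
    #define BLOCK_W 128   // threads per block along the image row

    // One horizontal pass of a separable box-filter aggregation of absolute-difference
    // matching costs, for a single disparity hypothesis. The row segments (plus the
    // halo needed by the window) are staged in shared memory.
    __global__ void aggregate_costs_row(const float* __restrict__ left,
                                        const float* __restrict__ right,
                                        float* __restrict__ cost,
                                        int width, int disparity)
    {
        __shared__ float sL[BLOCK_W + 2 * RADIUS];
        __shared__ float sR[BLOCK_W + 2 * RADIUS];

        const int y = blockIdx.y;                          // one block row per image row
        const int x = blockIdx.x * BLOCK_W + threadIdx.x;  // output column of this thread

        // Cooperative load of the block's row segment plus the left/right halo.
        for (int i = threadIdx.x; i < BLOCK_W + 2 * RADIUS; i += BLOCK_W) {
            int gx  = blockIdx.x * BLOCK_W + i - RADIUS;
            gx      = min(max(gx, 0), width - 1);               // clamp at the image border
            int gxd = min(max(gx - disparity, 0), width - 1);   // matching column in the right view
            sL[i] = left[y * width + gx];
            sR[i] = right[y * width + gxd];
        }
        __syncthreads();

        if (x >= width) return;

        // Box-filter aggregation of the per-pixel absolute-difference cost.
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k) {
            const int i = threadIdx.x + RADIUS + k;
            sum += fabsf(sL[i] - sR[i]);
        }
        cost[y * width + x] = sum;
    }

In a full cost-volume pipeline such a pass would be launched once per disparity hypothesis, with blockDim.x equal to BLOCK_W and a grid of ((width + BLOCK_W - 1) / BLOCK_W, height); a second, vertical pass would then complete a separable two-dimensional aggregation.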
Furthermore, we investigate multicamera output as a driver for stereoscopic perception, extending our framework with the capability of rendering biologically-aware content that depends on the viewer and his or her position, in order to maintain visual comfort. In the following part, we exploit the framework to again perform optimizations, but at a holistic level, i.e. with an effect on the entire application chain. In this respect, we propose a set of high-level transformation rules to reshape the data flow of the algorithm in order to maximize arithmetic intensity and throughput. Furthermore, we introduce a possible additional control-loop module that strips away a large number of the computations of the typically brute-force backbone of these parallel algorithms. Next, we show that completely migrating to or using next-generation GPU computing paradigms does not necessarily lead to the best performance results, but that hybrid implementations that exploit the benefits of both worlds are able to perform optimally. In a final part, we exploit our framework to investigate and optimize the end-to-end performance by relying on a large set of building blocks and different algorithmic approaches, and by measuring the sensitivity of the output to the intermediate results. A rich set of design rules is therefore proposed, leading to multiple forms of quality-speed trade-offs and graceful degradation.
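As a concrete, hypothetical illustration of reshaping the data flow to raise arithmetic intensity, the sketch below fuses two consecutive per-pixel stages, a simple photometric correction followed by a matching-cost computation, into a single CUDA kernel. The stages and parameter names are assumptions for the sake of the example, not the dissertation's actual pipeline; the point is that the intermediate corrected values stay in registers, so the only global-memory traffic is one read per input pixel and one write per output pixel.

    // Two per-pixel stages fused into one kernel: a photometric correction
    // (gain/offset, assumed purely for illustration) followed immediately by the
    // matching-cost computation. The corrected values never leave the registers,
    // so no intermediate image is written to and read back from global memory.
    __global__ void correct_and_match_fused(const float* __restrict__ left,
                                            const float* __restrict__ right,
                                            float* __restrict__ cost,
                                            float gain, float offset,
                                            int width, int height, int disparity)
    {
        const int x = blockIdx.x * blockDim.x + threadIdx.x;
        const int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        // Stage 1: photometric correction of both views, kept in registers.
        const float l = gain * left[y * width + x] + offset;
        const int  xr = max(x - disparity, 0);               // clamp at the image border
        const float r = gain * right[y * width + xr] + offset;

        // Stage 2: per-pixel matching cost, the only value written back to memory.
        cost[y * width + x] = fabsf(l - r);
    }

Compared with running the two stages as separate kernels with an intermediate buffer, the fused version performs the same arithmetic while moving considerably less data, which is the kind of transformation that raises the computation-to-memory ratio of a brute-force parallel backbone.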
Document URI: http://hdl.handle.net/1942/15830
Category: T1
Type: Theses and Dissertations
Appears in Collections: PhD theses
Research publications

Files in This Item:
File: SammyRogmans-PhD2013-Printed.pdf
Size: 34.66 MB
Format: Adobe PDF
