I have created a BibTeX file which has an entry for every chapter of the two "Pearls" books, the Xeon Phi books (the original Knights Corner and the Knights Landing versions), and the Structured Parallel Programming book. I also included entries for all the other books I've been involved with, including TBB, VTune and Multithreading for VfX, and much more.
The entries include DOI numbers for the chapters of the Xeon Phi books (the original Knights Corner and the Knights Landing versions), the two "Pearls" books, and the Structured Parallel Programming book.
This is a resource for the many people who have contributed to these books, and anyone who would like to cite these works.
I will gladly take feedback, and update the file from time to time based on feedback and new publications.
This "version 4" now includes all page numbers and DOI information for our latest book covering Knights Landing.
Download "v4" of ThinkParallel.bib (ZIP)
Parallelism Pearls for Multicore and Many-core Programming
Our first "Pearls" book has been translated to Korean (고성능 병렬화 핵심 가이드: 멀티 코어와 매니 코어 프로그래밍 접근법). We are very thankful to our Korean friends who made this happen!
The download file for Pearls 2 is complete
Note: I have moved the ZIP file to this server instead of Dropbox, based on feedback that some employers block Dropbox access.
This download has the code (1.2GB in size), complete with Makefiles and build instructions, used in our book "High Performance Parallelism Pearls Volume Two" - for the whole book (I posted this complete version in January 2016). Call this "version 3." We hope you find it useful. Please drop us a note with any feedback or suggestions!
DOWNLOAD CODE - 1.4GB ZIP FILE LINK
Thanks to Ryan Coleman, at Sandia National Labs, we also have this on GitHub at https://github.com/ryancoleman/lotsofcoresbook2code
Code from Volume One is a separate download (90MB in size), complete with Makefiles and build instructions, with the code used in our book "High Performance Parallelism Pearls." Please drop us a note with any feedback or suggestions!
DOWNLOAD CODE - 90MB ZIP FILE LINK
Thanks to Ryan Coleman, at Sandia National Labs, we also have this on GitHub at https://github.com/ryancoleman/lotsofcoresbook1code
We have created PowerPoint summaries of the High Performance Parallelism Pearls books. If you expand on these, please share with us! I will be happy to grow, expand, and correct these PowerPoint decks. I have uploaded completely open and unlocked PPTX files. The files are a bit large, but I did not want to over-compress the images. I doubt anyone would ever use more than a quarter of the slides in any one talk, probably less, but having them all is useful.
PowerPoint for Pearls Volume Two (14.5MB ZIP file)
PowerPoint for Pearls Volume One (20.6MB PPTX file)
An article about our discussion of the work from Chapter 10 ran in HPCwire: COSMOS Team Achieves 100x Speedup on Cosmology Code. Unknown to us at the time, Tiffany Trader at HPCwire attended our talk at IDF in San Francisco on August 19, 2015. She enjoyed our talk... I think our enthusiasm about this work showed!
The "100X" speed-up is real, and it compares Intel to Intel. It is not a comparison of products from different companies, and nothing in it was an attempt to mislead anyone.
The team truly gets their analysis done 100X faster than when they started. It's a great example of "code modernization" - and the authors shared their thinking step by step as they made nine distinct changes to their code, discussing each one, on the path to higher performance on processors and the Intel Xeon Phi coprocessor. It is also remarkable how closely the performance improvements on both track each other with the same changes. There is a lot to learn from their example. In fact, readers of our Pearls books know that both volumes are full of teaching examples like this. It's "just parallelism," as we are guilty of saying on occasion. It's not easy - but neither is regular programming.
We really like how the article captured our enthusiasm in presenting this work.
We got our first copies of our latest book today!
We have ALL the figures from both Volumes of High Performance Parallelism Pearls available for download.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example:
All figures are available in TIFF format; many are also available in EPS format. For most uses, the TIFF files are what you want. The four download files available are:
Also, all the figures, tables, charts and drawings in the original "Intel Xeon Phi Coprocessor High Performance Programming" are available for download. See figures-tables-charts-drawings
I had the privilege of giving a talk today in Maryland that covered many topics, including parallelism, Intel Xeon Phi, Intel Parallel Studio XE (tools), and our books. I have posted the slides for the students and anyone else who is interested.
Jim Dempsey provided this video related to his chapter: High Performance Parallelism Pearls, Chapter 5, Plesiochronous Phasing Barriers, by Jim Dempsey.
This is a video of the Plesiochronous Phasing Barriers in action. The video is not annotated, nor does it have a voice-over; a short explanation is provided below the video.
The left half of the screen represents the optimized tiled version and the right half represents the plesiochronous version. Each half is divided into two parts:
Top) A view of the Y/Z plane with the X dimension into the screen. Each pixel in the top portion of each side changes color upon completion of the computation of a column along X. Color changes are an indication of the rate of computation; the position of a change indicates where and when in the Y/Z plane the computation occurred.
Bottom) Each thread displays an individual line progressing in time from left to right, and wrapping around (raster-like), with two different colors: green for thread computing, red for in barrier wait (red “ticks” may appear dark rather than red).
In the left half (traditional tiled), you can note that the Y/Z columns of X appear in at most two colors (time phases). The bottom of the left half shows that the traditional tiled method runs well until the point where the threads start completing their designated tile(s) and reach the barrier. It looks like a cascade of cars reaching a traffic jam, which doesn’t clear until all threads reach the barrier.
In the right half (plesiochronous), you can note that the Y/Z columns of X appear in at most three colors (time phases). The bottom half shows that the barrier wait times of the threads are, for the most part, not synchronized. You may notice that four threads appear to be synchronized, and they are. These are the threads of the same core, and the plesiochronous barrier scheme uses core barriers. These threads are not adjacent because of KMP_AFFINITY=scatter. You may also note that each thread computes its X columns along the Y direction; essentially, a thread's tile is not rectangular. You may also notice that the time-domain edge is ragged, indicating the time skew between threads. Occasionally you will also notice threads getting delayed, presumably by worst-case memory latencies due to evictions.
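To see why the four synchronized threads are not adjacent in the display, it may help to sketch how a scatter-style placement spreads thread numbers across cores. The snippet below is only a simplified model, not the OpenMP runtime's actual algorithm: it assumes 60 cores with 4 hardware threads each (illustrative numbers, not taken from the video) and places thread t on core t mod 60. The function names `core_of` and `siblings` are hypothetical.

```python
# Simplified model of a scatter-style thread placement (an assumption for
# illustration; the real runtime also takes the machine topology into account).
NUM_CORES = 60          # assumed core count, for illustration only
THREADS_PER_CORE = 4    # assumed hardware threads per core


def core_of(thread_id, num_cores=NUM_CORES):
    """Model scatter as round-robin: consecutive threads go to different cores."""
    return thread_id % num_cores


def siblings(thread_id, num_cores=NUM_CORES, threads_per_core=THREADS_PER_CORE):
    """Return all thread numbers that share a core with thread_id in this model."""
    core = core_of(thread_id, num_cores)
    return [core + k * num_cores for k in range(threads_per_core)]


if __name__ == "__main__":
    # Threads sharing core 0 are 60 apart in thread numbering, so their
    # lines in a per-thread display are far apart rather than adjacent.
    print(siblings(0))
```

Under these assumptions, the threads that share a core, and therefore a core barrier, are spaced one core count apart in thread numbering, which would match four synchronized lines that are spread out in the display.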
The programs were instrumented to collect time stamp counter (RDTSC) information for each thread as it entered and left a computational region. The time interval between computational regions is the barrier wait time.
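The arithmetic behind that measurement can be sketched as follows. This is a minimal illustration, not Jim's instrumentation: it assumes each thread's log is a list of (enter, leave) tick pairs in execution order, and the function names and sample values are made up.

```python
# Sketch: derive barrier wait times from per-thread region timestamps.
# Each record is (enter, leave) in timer ticks for one computational region,
# listed in the order the thread executed them. This data layout is assumed.


def compute_times(regions):
    """Time spent computing inside each region."""
    return [leave - enter for enter, leave in regions]


def barrier_waits(regions):
    """Wait time = gap between leaving one region and entering the next."""
    waits = []
    for (_, leave), (next_enter, _) in zip(regions, regions[1:]):
        waits.append(next_enter - leave)
    return waits


if __name__ == "__main__":
    # Made-up tick values for one thread, purely for illustration.
    thread_regions = [(100, 400), (450, 700), (720, 1000)]
    print(compute_times(thread_regions))  # [300, 250, 280]
    print(barrier_waits(thread_regions))  # [50, 20]
```

In the video's terms, the compute intervals would correspond to the green stretches of a thread's line and the waits to the red ticks.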
You may click on the video to bring it up full size (double the width and height from that shown here).
Jim and I got to see the first copies of the new book today - together. They are here in time for SC'14. We have a book signing in the Intel booth on Thursday (Nov 20, 2014) at noon (drop by with your copy and we can sign it! - hopefully some of our coauthors will be there too.) Many thanks to the amazing team at Morgan Kaufmann Publishing, and to the wonderful contributors who worked so hard to share their work.
Link to the PUBLISHER'S WEB SITE: for High Performance Parallelism Pearls
October 22: update... all editing is done... it heads to be printed now. 548 pages by my count.
We have a publication date: November 17! (and ISBN number: 978-0128021187)
High Performance Parallelism Pearls: Multicore and Many-core Programming Approaches
Where to order:
There are some early reviews/write-ups based on a draft of the book:
Teaching The World About Intel Xeon Phi
The Unabridged Chapter 1 Introduction To High Performance Parallelism Pearls
From ‘Correct’ to ‘Correct & Efficient’: a Hydro2D case study with Godunov’s scheme
(check for even more being posted for other chapters... Xeon Phi articles)
Colfax Research has just posted the 280-slide deck from their “Parallel Programming and Optimization with Intel Xeon Phi Coprocessors” developer training program.
All the figures, tables, charts and drawings are available for download. Please use them freely with attribution. You should find them all to be high quality artwork, suitable for presentations and other uses.
Suggested attribution: (c) 2013 Jim Jeffers and James Reinders, used with permission. Feel free to mention the book too: "Intel Xeon Phi Coprocessor High Performance Programming."
If you like our book - please let others know! If you have suggestions or feedback, please let us know!
GZipped TAR file: XeonPhiBookFiguresEtc.tar.gz
ZIP file: XeonPhiBookFiguresEtc.zip
Our book has been reviewed at Dr. Dobb's - online at http://www.drdobbs.com/tools/developer-reading-list/240152134
I was excited to get a copy (sent to each author express from the printer) this week. It is available for purchase from many stores including http://store.elsevier.com/product.jsp?isbn=9780124104143
As of today - the book is in final production steps... we have proofreading to do still, but everything is in the production department at Morgan Kaufmann - on track to see books in February 2013.
As a teaser - here is the outline for the book:
Foreword
Preface
Chapters:
Chapter 1 - Introduction
Chapter 2 - High Performance Closed Track Test Drive!
Chapter 3 - A Friendly Country Road Race
Chapter 4 - Driving Around Town: Optimizing A Real-World Code Example
Chapter 5 - Lots of Data (Vectors)
Chapter 6 - Lots of Tasks (not Threads)
Chapter 7 - Offload
Chapter 8 - Coprocessor Architecture
Chapter 9 - Coprocessor System Software
Chapter 10 - Linux on the Coprocessor
Chapter 11 - Math Library
Chapter 12 - MPI
Chapter 13 - Profiling and Timing
Chapter 14 - Summary
Glossary
Index
We expect that to come out just over 400 pages.
This book belongs on the bookshelf of every HPC professional. Not only does it successfully and accessibly teach us how to use and obtain high performance on the Intel MIC architecture, it is about much more than that. It takes us back to the universal fundamentals of high-performance computing including how to think and reason about the performance of algorithms mapped to modern architectures, and it puts into your hands powerful tools that will be useful for years to come.
—Robert J. Harrison
Institute for Advanced Computational Science,
Stony Brook University
(this will be in the Preface to the book)
Our book Intel Xeon Phi Coprocessor High Performance Programming (ISBN 978-0-124-10414-3) will be available from the publisher Morgan Kaufmann in February 2013, and from many booksellers (including Amazon.com).
Pushing computing to new heights is one of the most exciting human endeavors, both for the thrill of doing it and the thrill of what it makes possible. The Intel® Many Integrated Core (MIC) architecture and the first Intel® Xeon Phi™ coprocessor have brought us one of those rare, and very important, new chapters in this quest to push computing to new limits.
Jim and James spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel® Xeon Phi™ coprocessor. They have distilled their own experiences, coupled with insights from many expert customers, Intel Field Engineers, Application Engineers and Technical Consulting Engineers, to create this authoritative first book on programming for this new architecture and these new products.
This book is useful even before you ever touch a system with an Intel® Xeon Phi™ coprocessor. The key techniques emphasized in this book are essential to programming any modern parallel computing system, whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these techniques will generally increase your program performance on any system, and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture.