Errata: Intel Xeon Phi Coprocessor High Performance Programming

This is a list of known errata for the first printing of Intel Xeon Phi Coprocessor High Performance Programming, Jim Jeffers and James Reinders; Morgan Kaufmann (Elsevier), 2013.
If you find other items we should review for possible inclusion, please email us.

General comments:

  • none

Typographical errors:

  • Page 38
    • "Petal to the metal" should read "Pedal to the metal"
  • Page 39
    • Figure 2.5. The “0” under Core 1, Thread 0 in the “Scatter Affinity” portion of the figure should be “1”
  • Page 99
    • Line 2 of the terminal output, "% export OMP_NUM__THREADS = 244", has an extra underscore before THREADS. It should read:
      % export OMP_NUM_THREADS = 244
    • Line 9 of the terminal output, "% Export OMP_NUM_THREADS = 183": the word "export" should not be capitalized. It should read:
      % export OMP_NUM_THREADS = 183
  • Page 141
    • Line 5 is missing a space: "the-vec-report6 output" should be:
      "the -vec-report6 output".
  • Page 381
    • Last line of the page: "widely in used" should be:
      "widely used"

Code errors:

  • Page 352 - MPI + Offload Trapezoidal Rule Source Code

    The MPI + Offload Trapezoidal Rule example no longer works with the latest Intel compiler versions, because the Intel compiler no longer supports function inlining in offload regions. This is the corrected code:

    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    #define NUM_TRAPEZOIDS 1000000000

    __attribute__((target(mic))) inline double f(double x) {
        return 1.00 * x*x * exp(-(x-0.0)*(x-0.0)/(2.0*0.25*0.25))
             + 0.50 * x*x * exp(-(x-0.2)*(x-0.2)/(2.0*0.50*0.50))
             + 0.50 * x*x * exp(-(x+0.2)*(x+0.2)/(2.0*0.50*0.50))
             + 0.25 * x*x * exp(-(x-0.4)*(x-0.4)/(2.0*1.00*1.00))
             + 0.25 * x*x * exp(-(x+0.4)*(x+0.4)/(2.0*1.00*1.00));
    }

    __attribute__((target(mic))) double kernel(const int chunk_size, const double x0, const double width) {
        double integral = 0;

        #pragma omp parallel
        #pragma omp for reduction(+:integral)
        for (int i = 0; i < chunk_size; i++) {
            integral += 0.5 * width * (f(x0+width*i) + f(x0+width*(i+1)));
        }

        return integral;
    }

    int main (int argc, char *argv[]) {
        int namelen, rank, size;
        char name[MPI_MAX_PROCESSOR_NAME];
        double upper_bound = 5.0, lower_bound = -5.0;
        double x0, x1, width;
        double integral = 0;
        double compute_time, total_time;
        int chunk_size;

        MPI_Init(&argc, &argv);

        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &namelen);

        chunk_size = NUM_TRAPEZOIDS / size;

        x0 = lower_bound + (upper_bound - lower_bound)*rank/size;
        x1 = x0 + (upper_bound - lower_bound)/size;
        width = (x1-x0)/chunk_size;

        MPI_Barrier(MPI_COMM_WORLD);

        compute_time = total_time = MPI_Wtime();
        #pragma offload target(mic)
        integral = kernel(chunk_size, x0, width);
        compute_time = MPI_Wtime() - compute_time;

        MPI_Allreduce(MPI_IN_PLACE, &integral, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        total_time = MPI_Wtime() - total_time;

        printf("rank %d of %d running on %s: %f seconds\n", rank, size, name, compute_time);

        if (rank == 0) {
            printf("integral = %f, time = %f seconds\n", integral, total_time);
        }

        MPI_Finalize();

        return(0);
    }

  • Page 232 - Fortran asynchronous data transfer code example
    The code example contains two errors.

    Line 04 should be:
    integer:: signal_1 = 1, signal_2 = 2

    Explanation: For the signaling to work properly with the latest compiler version, each signal variable must be assigned a value.

    Line 09 should be:
    f1 = 1.0

    Explanation: f1 was inadvertently omitted when the book was printed.