Reinders' Blogs


Open Source Project: Intel Data Analytics Acceleration Library (DAAL)

April 15, 2016 - 10:16am

We have created a data analytics acceleration project on github. To create this project, we placed the Intel® Data Analytics Acceleration Library (Intel® DAAL), our high performance analytics (for "Big Data") library for x86 and x86-64, into open source.

Intel DAAL helps accelerate big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages (preprocessing, transformation, analysis, modeling, validation, and decision making) for batch, online and distributed processing modes of computation. It’s designed for use with popular data platforms including Hadoop*, Spark*, R, and Matlab* for highly efficient data access. Intel DAAL is available for Linux*, OS X* and Windows* and is licensed with the Apache 2.0 license. The DAAL project is available on github for download, feedback and contributions.

Intel DAAL has benefited from customer feedback since its initial release in 2015. Following a year of intense feedback and additional development as a full product, we are excited to introduce it as a very solid open source project ready for use and participation. Intel DAAL remains an integral part of Intel's software developer tools and is backed by Intel with support and future development investments.

ACCELERATE DATA ANALYTICS

The Intel Data Analytics Acceleration Library (Intel DAAL) is a library delivering high performance machine learning and data analytics algorithms. Intel DAAL is an essential component of Intel’s overall machine learning solution, including the Intel® Xeon® Processor E7 Family, the Trusted Analytics Platform, and Intel® Xeon Phi™ Processors (Knights Landing). Intel DAAL works with a wide selection of data platforms and programming languages including Hadoop, Spark, Python, Java and C++.

Intel DAAL was first released in 2015 without source code to give us time to evolve some interfaces on our path to open sourcing this year. We appreciate the many users who have given feedback and encouraged us to get where we are today.

Previous versions of Intel DAAL required separate installation of the Intel Math Kernel Library (Intel MKL) and Intel Integrated Performance Primitives (Intel IPP). The latest version of Intel DAAL comes with the necessary binary parts of Intel MKL (for BLAS and LAPACK) and of Intel IPP (compression and decompression), so the tremendous performance of these key routines is available automatically with no additional downloads needed! To make the most of multicore and many-core parallelism, and for superior threading interoperability, the threading in Intel DAAL relies on the open source project known as "TBB" (Intel Threading Building Blocks).

EXPERIENCE PERFORMANCE

In the exciting and rapidly-evolving data analytics market, this key Intel performance library can really boost performance. At the Intel Developer Forum in 2015, Capital One discussed significant acceleration (over 200X - see slide 26) as an early user of Intel DAAL. We've seen numerous examples of substantial performance improvements using Intel DAAL across many industries in the first year of the product - it is definitely worth a try!

Many more details about the product are available on the product page, including benchmarking data that illustrates the potential performance gains from using DAAL.

SPEEDING TOWARD 2017 - JOIN US!

DAAL is currently speeding toward a "2017" release (expected in late Q3 2016) in conjunction with Intel's award-winning Intel Parallel Studio suite of developer tools. Precompiled binaries with installers are available for free as part of the beta program. Registration for the beta is available at tinyurl.com/ipsbeta2017.

The open source project feeds the product; there are no features held back exclusively for the product version. The only difference when purchased is that Intel Premier Support is included for the entire product.

Support for all users of Intel DAAL is available online through the online Intel DAAL forum.

How to detect Knights Landing AVX-512 support (Intel Xeon Phi processor)

February 22, 2016 - 9:09am

The Intel Xeon Phi processor, code named Knights Landing, is part of the second generation of Intel Xeon Phi products.  Knights Landing supports AVX-512 instructions, specifically AVX-512F (foundation), AVX-512CD (conflict detection), AVX-512ER (exponential and reciprocal) and AVX-512PF (prefetch).

If we want an application to run everywhere, then before using these instructions in a program we need to make sure that both the operating system and the processor support them when the application is run.

The Intel compiler provides a single function _may_i_use_cpu_feature that does all this easily. This program shows how we can use it to test for the ability to use AVX-512F, AVX-512ER, AVX-512PF and AVX-512CD instructions.

#include <immintrin.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    const unsigned long knl_features = (_FEATURE_AVX512F |
                                        _FEATURE_AVX512ER |
                                        _FEATURE_AVX512PF |
                                        _FEATURE_AVX512CD);
    if (_may_i_use_cpu_feature(knl_features))
        printf("This CPU supports AVX-512F+CD+ER+PF as introduced in Knights Landing\n");
    else
        printf("This CPU does not support all Knights Landing AVX-512 features\n");
    return 1;
}

If we compile with the -xMIC-AVX512 flag, the Intel compiler will automatically protect the binary and such checking is not necessary. For instance, if we compile and run as follows, we can see the result of running on a machine other than a Knights Landing.

icc -xMIC-AVX512 -o sample sample.c
./sample

Please verify that both the operating system and the processor support Intel(R) MOVBE, F16C, AVX, FMA, BMI, LZCNT, AVX2, AVX512F, ADX, RDSEED, AVX512ER, AVX512PF and AVX512CD instructions.


In order to run on all processors, we compile and run as follows:

icc -axMIC-AVX512 -o sample sample.c
./sample

When we run on a Knights Landing it prints:
This CPU supports AVX-512F+CD+ER+PF as introduced in Knights Landing

When we run on a processor that lacks AVX-512 support at least equivalent to Knights Landing, it prints:
This CPU does not support all Knights Landing AVX-512 features

If we want to support compilers other than Intel's, the code is slightly more complex because the function _may_i_use_cpu_feature is not standard (and neither are the __builtin functions in gcc and clang/LLVM). The following code works with at least the Intel, gcc, clang/LLVM and Microsoft compilers.

#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 1300)

#include <immintrin.h>

int has_intel_knl_features() {
    const unsigned long knl_features = (_FEATURE_AVX512F |
                                        _FEATURE_AVX512ER |
                                        _FEATURE_AVX512PF |
                                        _FEATURE_AVX512CD);
    return _may_i_use_cpu_feature(knl_features);
}

#else /* non-Intel compiler */

#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

void run_cpuid(uint32_t eax, uint32_t ecx, uint32_t* abcd) {
#if defined(_MSC_VER)
    __cpuidex(abcd, eax, ecx);
#else
    uint32_t ebx, edx;
#if defined(__i386__) && defined(__PIC__)
    /* in case of PIC under 32-bit EBX cannot be clobbered */
    __asm__ ("movl %%ebx, %%edi \n\t cpuid \n\t xchgl %%ebx, %%edi"
             : "=D" (ebx),
#else
    __asm__ ("cpuid"
             : "+b" (ebx),
#endif
               "+a" (eax), "+c" (ecx), "=d" (edx));
    abcd[0] = eax; abcd[1] = ebx; abcd[2] = ecx; abcd[3] = edx;
#endif
}

int check_xcr0_zmm() {
    uint32_t xcr0;
    /* opmask (bits 5-7) plus ymm (bit 2) and xmm (bit 1) state */
    uint32_t zmm_ymm_xmm = (7 << 5) | (1 << 2) | (1 << 1);
#if defined(_MSC_VER)
    xcr0 = (uint32_t)_xgetbv(0); /* min VS2010 SP1 compiler is required */
#else
    __asm__ ("xgetbv" : "=a" (xcr0) : "c" (0) : "%edx");
#endif
    /* check that xmm, ymm and zmm state are enabled in XCR0 */
    return ((xcr0 & zmm_ymm_xmm) == zmm_ymm_xmm);
}

int has_intel_knl_features() {
    uint32_t abcd[4];
    uint32_t osxsave_mask = (1 << 27);  /* CPUID.1:ECX.OSXSAVE */
    uint32_t knl_mask = (1 << 16) |  /* AVX-512F  */
                        (1 << 26) |  /* AVX-512PF */
                        (1 << 27) |  /* AVX-512ER */
                        (1 << 28);   /* AVX-512CD */

    /* step 1 - must ensure OS supports extended processor state management */
    run_cpuid(1, 0, abcd);
    if ((abcd[2] & osxsave_mask) != osxsave_mask)
        return 0;

    /* step 2 - must ensure OS supports ZMM registers (and YMM, and XMM) */
    if (!check_xcr0_zmm())
        return 0;

    /* step 3 - must ensure the processor reports the AVX-512 features (CPUID leaf 7, EBX) */
    run_cpuid(7, 0, abcd);
    if ((abcd[1] & knl_mask) != knl_mask)
        return 0;

    return 1;
}

#endif /* non-Intel compiler */

static int can_use_intel_knl_features() {
    static int knl_features_available = -1;
    /* test is performed once */
    if (knl_features_available < 0)
        knl_features_available = has_intel_knl_features();
    return knl_features_available;
}

#include <stdio.h>

int main(int argc, char *argv[]) {
    if (can_use_intel_knl_features())
        printf("This CPU supports AVX-512F+CD+ER+PF as introduced in Knights Landing\n");
    else
        printf("This CPU does not support all Knights Landing AVX-512 features\n");
    return 1;
}

Acknowledgment: Thank you to Max Locktyukhin (Intel) for his article 'How to detect New Instruction support in the 4th generation Intel® Core™ processor family' which served as the model for my Knights Landing detection code.

Python accelerated (using Intel MKL)

January 3, 2016 - 2:23pm

Python can be accelerated by having the numerical libraries, NumPy and SciPy, use the Intel® Math Kernel Library (MKL).  This requires no change to your Python application, and instantly speeds up performance on Intel processors, including Intel® Xeon Phi™ processors (codenamed Knights Landing).

There are several ways to do this, the easiest being simply to use a distribution which already optimizes Python libraries with Intel MKL.

Here is a list of free distributions that offer accelerated Python performance:

You can also build the libraries yourself to use Intel MKL.  Instructions for doing so, along with other performance oriented tuning advice/tips, can be found at https://software.intel.com/runtimes/python.
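As a quick sanity check (a minimal sketch; assumes NumPy is installed), you can ask NumPy which BLAS/LAPACK it was built against - on an MKL-accelerated distribution the build configuration mentions "mkl" - and confirm that ordinary array code needs no changes:

```python
import numpy as np

# Print the build configuration; on an MKL-accelerated distribution
# the BLAS/LAPACK entries mention "mkl".
np.show_config()

# Ordinary NumPy code is unchanged: this matrix multiply is dispatched
# to whatever BLAS library NumPy was linked against (MKL if present).
a = np.arange(6.0).reshape(2, 3)
b = np.arange(6.0).reshape(3, 2)
c = a @ b
print(c)
```

The same script runs unchanged on any distribution; only the speed of the underlying BLAS calls differs.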

 

No Cost Options for Intel Integrated Performance Primitives Library (IPP), Support yourself, Royalty-Free

September 14, 2015 - 12:29pm

The Intel® Integrated Performance Primitives Library (Intel® IPP), a high performance library with thousands of optimized functions for x86 and x86-64, is available for free for everyone (click here to register and download). Purchasing is only necessary if you want access to Intel® Premier Support (direct 1:1 private support from Intel), older versions of the library or access to other tools in Intel® Parallel Studio XE or Intel® System Studio. Intel continues to actively develop and support this very powerful library - and everyone can benefit from that!

Intel® IPP is an extensive library which includes thousands of optimized functions covering frequently used fundamental algorithms including those for creating digital media, enterprise data, embedded, communications, and scientific/technical applications.  Intel IPP includes routines for Image Processing, Computer Vision, Data Compression, Signal Processing and (with an optional add-on) Cryptography. Intel IPP is available for Linux*, OS X* and Windows* under the Community Licensing program currently.

Intel® IPP is shipped with the Intel® Compilers and all the other Intel® Performance Libraries in various products from Intel. It can be obtained with tools for analysis, debugging and tuning, tools for MPI and the Intel® MPI Library by acquiring the Intel® Parallel Studio XE or with Android support with Intel® System Studio. Did you know that some of these are available for free?

Here is a guide to various ways to obtain the latest version of the Intel® Integrated Performance Primitives Library (Intel® IPP) for free without access to Intel® Premier Support (get support by posting to the Intel Integrated Performance Primitives Library forum). Anytime you want, the full suite of tools (Intel® Parallel Studio XE or Intel® System Studio) with Intel® Premier Support and access to previous library versions can be purchased worldwide.

Community Licenses for Everyone

What is free:
Intel® Integrated Performance Primitives (Intel® IPP - Linux*, Windows* or OS X* versions)
Intel® Data Analytics Acceleration Library (Intel® DAAL - Linux*, Windows* or OS X* versions)
Intel® Math Kernel Library (Intel® MKL - Linux* or Windows* versions)
Intel® Threading Building Blocks (Intel® TBB - Linux*, Windows* or OS X* versions)

Information: Community Licensing for Intel® Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of the libraries, no Intel Premier Support access. Forums for discussion and support are open to everyone.

Where: Community Licensing for Intel Performance Libraries

Evaluation Copies for Everyone

What is free: Intel® Integrated Performance Primitives (Intel® IPP) along with compilers, libraries and analysis tools (most everything!) – Intel® Parallel Studio for Linux, Windows or OS X; Intel® System Studio for Android, Linux or Windows.

Information: Evaluation copies – try before you buy.

Where: Try Intel Parallel Studio (with Intel IPP) before you buy: Linux, Windows or OS X. Try Intel System Studio (with Intel IPP) before you buy: Android, Linux or Windows.

Use as an Academic Researcher

What is free: Linux, Windows or OS X versions of Intel® Integrated Performance Primitives, Intel® Data Analytics Acceleration Library, Intel® Math Kernel Library, Intel® Threading Building Blocks, and the Intel® MPI Library (not available for OS X).

Information: If you will use the libraries in conjunction with academic research at institutions of higher education. (Linux, Windows or OS X versions, except the Intel® MPI Library, which is not supported on OS X, and Intel® MKL, which is not available standalone on OS X.)

Where: Qualify for Use as an Academic Researcher

Use as a Student

What is free: Intel® Integrated Performance Primitives (Intel® IPP) along with compilers, libraries and analysis tools (most everything!) – Intel® Parallel Studio for Linux, Windows or OS X; Intel® System Studio for Android, Linux or Windows.

Information: If you are a current student at a degree-granting institution.

Where: Qualify for Use as a Student

Use as a Teacher

What is free: Intel® Integrated Performance Primitives (Intel® IPP) along with compilers, libraries and analysis tools (most everything!) – Intel® Parallel Studio for Linux, Windows or OS X; Intel® System Studio for Android, Linux or Windows.

Information: If you will use the tools in a teaching curriculum.

Where: Qualify for Use as an Educator

Use as an Open Source Contributor

What is free: Intel® Integrated Performance Primitives (Intel® IPP) along with all of Intel® Parallel Studio XE Professional Edition for Linux (Linux versions).

Information: If you are a developer actively contributing to open source projects – and that is why you will use the tools.

Where: Qualify for Use as an Open Source Contributor

Free licenses for certain users have always been an important dimension of our offerings. One thing that really distinguishes Intel is that we sell excellent tools and provide second-to-none support for software developers who buy our tools. We provide multiple options - and we hope you will find exactly what you need in one of our options.

 

No Cost Options for Intel Data Analytics Acceleration Library (DAAL), Support yourself, Royalty-Free

September 4, 2015 - 8:56am

The Intel® Data Analytics Acceleration Library (Intel® DAAL), the high performance analytics (for "Big Data") library for x86 and x86-64, is available for free for everyone (click here now to register and download). Purchasing is only necessary if you want access to Intel® Premier Support (direct 1:1 private support from Intel), older versions of the library or access to other tools in Intel® Parallel Studio XE. Intel continues to actively develop and support this very powerful library - and everyone can benefit from that!

Intel® DAAL is a library product from Intel that accelerates big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages (preprocessing, transformation, analysis, modeling, validation, and decision making) for offline, streaming and distributed analytics usages. It’s designed for use with popular data platforms including Hadoop*, Spark*, R, and Matlab* for highly efficient data access. Intel DAAL is available for Linux*, OS X* and Windows*.

Intel® DAAL is shipped with the Intel® Compilers and all the other Intel® Performance Libraries in various products from Intel. It can be obtained with tools for analysis, debugging and tuning, tools for MPI and the Intel® MPI Library by acquiring the Intel® Parallel Studio XE. Did you know that some of these are available for free?

Here is a guide to various ways to obtain the latest version of the Intel® Data Analytics Acceleration Library (Intel® DAAL) for free without access to Intel® Premier Support (get support by posting to the Intel Data Analytics Acceleration Library forum). Anytime you want, the full suite of tools (Intel® Parallel Studio XE) with Intel® Premier Support and access to previous library versions can be purchased worldwide.

Community Licenses for Everyone

What is free:
Intel® Data Analytics Acceleration Library (Intel® DAAL - Linux*, Windows* or OS X* versions)
Intel® Math Kernel Library (Intel® MKL - Linux* or Windows* versions)
Intel® Threading Building Blocks (Intel® TBB - Linux*, Windows* or OS X* versions)
Intel® Integrated Performance Primitives (Intel® IPP - Linux*, Windows* or OS X* versions)

Information: Community Licensing for Intel® Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of the libraries, no Intel Premier Support access. Forums for discussion and support are open to everyone.

Where: Community Licensing for Intel Performance Libraries

Evaluation Copies for Everyone

What is free: Intel® Data Analytics Acceleration Library (Intel® DAAL) along with compilers, libraries and analysis tools (most everything!) – Linux, Windows or OS X versions.

Information: Evaluation copies – try before you buy.

Where: Try before you buy

Use as an Academic Researcher

What is free: Linux, Windows or OS X versions of Intel® Data Analytics Acceleration Library, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives, and the Intel® MPI Library (not available for OS X).

Information: If you will use the libraries in conjunction with academic research at institutions of higher education. (Linux, Windows or OS X versions, except the Intel® MPI Library, which is not supported on OS X.)

Where: Qualify for Use as an Academic Researcher

Use as a Student

What is free: Intel® Data Analytics Acceleration Library (Intel® DAAL) along with compilers, libraries and analysis tools (most everything!) – Linux, Windows or OS X versions.

Information: If you are a current student at a degree-granting institution.

Where: Qualify for Use as a Student

Use as a Teacher

What is free: Intel® Data Analytics Acceleration Library (Intel® DAAL) along with compilers, libraries and analysis tools (most everything!) – Linux, Windows or OS X versions.

Information: If you will use the tools in a teaching curriculum.

Where: Qualify for Use as an Educator

Use as an Open Source Contributor

What is free: Intel® Data Analytics Acceleration Library (Intel® DAAL) along with all of Intel® Parallel Studio XE Professional Edition for Linux (Linux versions).

Information: If you are a developer actively contributing to open source projects – and that is why you will use the tools.

Where: Qualify for Use as an Open Source Contributor

Free licenses for certain users have always been an important dimension of our offerings. One thing that really distinguishes Intel is that we sell excellent tools and provide second-to-none support for software developers who buy our tools. We provide multiple options - and we hope you will find exactly what you need in one of our options.

 

MKL Community Edition for OS X availability

August 31, 2015 - 2:27pm

Hello! I don't see MKL listed in my available downloads for "Community Licensing for Intel® Performance Libraries for OS X". It appears to be available for Linux and Windows, though. Will MKL be made available for OS X?

Thanks!
Tim

No Cost Options for Intel Math Kernel Library (MKL), Support yourself, Royalty-Free

August 31, 2015 - 10:36am

The Intel® Math Kernel Library (Intel® MKL), the high performance math library for x86 and x86-64, is available for free for everyone (click here now to register and download). Purchasing is only necessary if you want access to Intel® Premier Support (direct 1:1 private support from Intel), older versions of the library or access to other tools in Intel® Parallel Studio XE. Intel continues to actively develop and support this very powerful library - and everyone can benefit from that!

Intel® Math Kernel Library (Intel® MKL) is a very popular library product from Intel that accelerates math processing routines to increase application performance. Intel® MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions. The easiest way to take advantage of all of that processing power is to use a carefully optimized computing math library; even the best compiler can’t compete with the level of performance possible from a hand-optimized library. If your application already relies on the BLAS or LAPACK functionality, simply re-link with Intel® MKL to get better performance on Intel and compatible architectures.

Intel® MKL is most often obtained with the Intel® Compilers and all the other Intel® Performance Libraries in various products from Intel. It can be obtained with tools for analysis, debugging and tuning, tools for MPI and the Intel® MPI Library by acquiring the Intel® Parallel Studio XE. Did you know that some of these are available for free?

Here is a guide to various ways to obtain the latest versions of the Intel® Math Kernel Library (Intel® MKL) for free without access to Intel® Premier Support (get support by posting to the Intel Math Kernel Library forum). Anytime you want, the full suite of tools (Intel® Parallel Studio XE) with Intel® Premier Support and access to previous library versions can be purchased worldwide.

Community Licenses for Everyone

What is free:
Intel® Math Kernel Library (Intel® MKL)
Intel® Data Analytics Acceleration Library (Intel® DAAL)
Intel® Threading Building Blocks (Intel® TBB)
Intel® Integrated Performance Primitives (Intel® IPP)

Information: Community Licensing for Intel® Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of the libraries, no Intel Premier Support access. (Linux*, Windows* or OS X* versions.) Forums for discussion and support are open to everyone.

Where: Community Licensing for Intel Performance Libraries

Evaluation Copies for Everyone

What is free: Intel® Math Kernel Library (Intel® MKL) along with compilers, libraries and analysis tools (most everything!) – Linux, Windows or OS X versions.

Information: Evaluation copies – try before you buy.

Where: Try before you buy

Use as an Academic Researcher

What is free: Linux, Windows or OS X versions of Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives, and the Intel® MPI Library (not available for OS X).

Information: If you will use the libraries in conjunction with academic research at institutions of higher education. (Linux, Windows or OS X versions, except the Intel® MPI Library, which is not supported on OS X.)

Where: Qualify for Use as an Academic Researcher

Use as a Student

What is free: Intel® Math Kernel Library (Intel® MKL) along with compilers, libraries and analysis tools (most everything!) – Linux, Windows or OS X versions.

Information: If you are a current student at a degree-granting institution.

Where: Qualify for Use as a Student

Use as a Teacher

What is free: Intel® Math Kernel Library (Intel® MKL) along with compilers, libraries and analysis tools (most everything!) – Linux, Windows or OS X versions.

Information: If you will use the tools in a teaching curriculum.

Where: Qualify for Use as an Educator

Use as an Open Source Contributor

What is free: Intel® Math Kernel Library (Intel® MKL) along with all of Intel® Parallel Studio XE Professional Edition for Linux (Linux versions).

Information: If you are a developer actively contributing to open source projects – and that is why you will use the tools.

Where: Qualify for Use as an Open Source Contributor

Free licenses for certain users have always been an important dimension of our offerings. One thing that really distinguishes Intel is that we sell excellent tools and provide second-to-none support for software developers who buy our tools. We provide multiple options - and we hope you will find exactly what you need in one of our options.

 

Intel Data Analytics Acceleration Library

August 25, 2015 - 9:14am

The Intel® Data Analytics Acceleration Library (Intel® DAAL) helps speed big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages (preprocessing, transformation, analysis, modeling, validation, and decision making) for offline, streaming and distributed analytics usages. It’s designed for use with popular data platforms including Hadoop, Spark, R, and Matlab for highly efficient data access.

Intel DAAL is available for Linux, OS X and Windows. 

DAAL is more than MKL for Big Data

Like the Intel® Math Kernel Library (Intel® MKL), Intel DAAL is a highly optimized library of computationally intensive routines supporting Intel architecture including Intel® Xeon® processors, Intel® Core processors, Intel® Atom processors and Intel® Xeon Phi™ processors (codenamed Knights Landing).

Indeed, data scientists have been using Intel MKL to help with big data problems for some time. There are algorithms in Intel DAAL that have been in Intel MKL for years, such as matrix decomposition and low order moments. However, most of Intel MKL was designed for the case when the entire data set fits in memory. Intel DAAL can handle situations when data is too big to fit in memory all at once, which can be referred to as ‘out of core’ algorithms: Intel DAAL provides for data to be available in chunks rather than all at once. Intel DAAL is designed for use with popular data platforms including Hadoop, Spark, R, Matlab, etc. for highly efficient data access. Intel DAAL has data management built in, so that applications can directly access data from various kinds of sources including files, in-memory buffers, SQL databases, HDFS, etc.

Intel® DAAL supports three processing modes:

  • Batch processing – When all data fits in memory, a function is called to process the data all at once.
  • Online processing (also called streaming) – When all data does not fit in memory, Intel® DAAL can process data chunks individually and combine all partial results at a finalizing stage.
  • Distributed processing – Intel® DAAL supports a model similar to MapReduce. Consumers in a cluster process local data (map stage), and then the Producer process collects and combines partial results from the Consumers (reduce stage). DAAL offers flexibility in this mode by leaving the communication functions completely to the developer: developers can use the data movement in a framework such as Hadoop or Spark, or code the communications explicitly, most likely with MPI.
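To make the online (streaming) mode concrete, here is a small sketch - plain Python, not the Intel DAAL API - of the idea behind it: each chunk is reduced to a partial result, and a finalizing step combines the partials into the same answer a batch computation would give. Here the partial results are running sums for the mean and variance:

```python
def partial_result(chunk):
    """Process one chunk of data: return (count, sum, sum of squares)."""
    n = len(chunk)
    s = sum(chunk)
    ss = sum(x * x for x in chunk)
    return n, s, ss

def finalize(partials):
    """Combine partial results into the overall mean and variance."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    variance = ss / n - mean * mean  # population variance
    return mean, variance

# The data never has to fit in memory at once: chunks arrive one at a time.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
partials = [partial_result(c) for c in chunks]
mean, variance = finalize(partials)
print(mean, variance)  # same result as processing all six values in one batch
```

The distributed mode follows the same pattern, except that the partial results come from different nodes and the developer chooses how they are communicated.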
Rich Set of Algorithms, with more to come!

Intel® DAAL provides a rich set of algorithms, ranging from the most basic descriptive statistics for datasets to more advanced data mining and machine learning algorithms.

We love feedback, and encourage it! Feedback from the beta this year means we will be adding more customer requested algorithms in the upcoming months. The initial release, available as of August 25, 2015, of Intel® DAAL includes the following algorithms:

  • Low Order Moments – Includes computing min, max, mean, standard deviation, variance, etc. for a dataset. These are fundamental; in a sense, they are the bread and butter of any data analysis.
  • Quantiles – Splitting observations into equal-sized groups defined by quantile orders. My spell checker wants to correct ‘quantiles’ to ‘quartiles’, which is simply a 4-quantile.
  • Correlation matrix and variance-covariance matrix – A basic tool for understanding statistical dependence among variables. The degree of correlation indicates the tendency of a change in one variable to indicate a likely change in another. For instance, I notice that as the temperature rises outside, sales increase at our local ice cream shop.
  • Correlation distance matrix – Measuring pairwise distance between items using correlation distance. Zero means true independence, an improvement over Pearson's correlation, which is biased toward linear relationships.
  • Cosine distance matrix – Measuring pairwise distance using cosine distance. Very commonly used in information retrieval, for example to compare whether two text documents are similar to each other. Plagiarists beware: big data is watching.
  • Data transformation through matrix decomposition – Intel® DAAL algorithms operate on tabular data (conceptually), and when the data is homogeneous (of the same data type) it is essentially a 2D matrix. Many algorithms depend on matrix decomposition. Intel DAAL provides Cholesky, QR, and SVD decomposition algorithms. Cholesky is used to solve symmetric linear systems; QR is used in least squares problems and linear regression; SVD is used for principal component analysis.
  • Principal Component Analysis (PCA) – The most popular algorithm for dimensionality reduction. Useful to reduce the dimensions (the number of columns) of a dataset so that it is easier to handle and smaller to carry around.
  • Outlier detection – Identifying observations that are abnormally distant from the typical distribution of other observations. This can be useful to detect erroneous data, or abnormal events such as system failures or erratic behaviors.
  • Association rules mining – Detecting co-occurrence patterns, commonly known as “shopping basket mining.” It’s the type of mining that Pandora.com uses to predict what song you want to listen to next, that Amazon.com uses to predict what else you are likely to buy, and that Target may use to predict whether your coupon book should include baby items because you are pregnant.
  • Linear regression – The simplest regression method: fitting a linear equation to model the relationship between dependent variables (things to be predicted) and explanatory variables (things known). A linear model might be used to predict highway traffic growth as a result of the population increase of a city.
  • Classification – Building a model to assign items to different labeled groups. Intel DAAL provides multiple algorithms in this area, including the Naïve Bayes classifier, Support Vector Machine, and multi-class classifiers. Classification includes “junk filters” that label an email message as spam or not spam, or a bank rating a loan as high, medium or low risk.
  • Clustering – Grouping data into unlabeled groups. This is a typical technique used in “unsupervised learning,” where there is no established model to rely on. Intel DAAL provides two algorithms for clustering: K-Means and “EM for GMM.” Clustering might be used in the analysis of clinical data from many patients, where we may discover one group predominated by overweight males with high cholesterol levels and another predominated by slim females with low cholesterol.
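As an illustration of what the PCA building block computes (a minimal sketch using NumPy's SVD, not the Intel DAAL API), one can center the data and take its singular value decomposition; the right singular vectors are the principal directions, and projecting onto the top few of them reduces the number of columns:

```python
import numpy as np

def pca(data, n_components):
    """Project data (rows = observations) onto its top principal components."""
    centered = data - data.mean(axis=0)              # center each column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]                   # principal directions
    return centered @ components.T                   # reduced-dimension data

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))                        # 100 observations, 5 columns
reduced = pca(x, 2)
print(reduced.shape)  # → (100, 2)
```

Intel DAAL's PCA does the equivalent work with kernels optimized for Intel architecture, and in the online and distributed modes it does so without requiring the full matrix in memory.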

Intel DAAL includes C++ and Java interfaces. To maximize performance, all the compute kernels in Intel DAAL are implemented in C++; Java is supported via wrappers around the high performance C++ implementation. The Java interface interacts with the C++ kernels through the JNI (Java Native Interface). Users do not need to write any JNI code; it’s included with Intel DAAL.

Ease of use and Performance

The performance advantages of Intel® DAAL can be substantial. Our comparison of the Principal Component Analysis in Intel DAAL vs. Spark + MLlib is shown here:

Our 4X – 7X result is based on this very specific benchmark. Of course, your results may vary. For instance, consider this quote from a customer:

“Using a pre-release version of Intel® Data Analytics Acceleration Library, we’ve seen an up to 200x improved performance for the Alternating Least Square prediction algorithm powering our recommendation engine compared with the latest open source Spark + MLlib baseline.  At Capital One™, this lets us realize our vision of transforming the way our customers manage their finances through personalized and meaningful interaction.”

- Ilya Ganelin, Senior Data Engineer, Capital One Data Innovation Lab

Learn more about Intel® DAAL

You can download an evaluation copy of Intel DAAL today.

A series of webinars being held starting in September 2015 covers many topics related to Intel Parallel Studio XE 2016. On September 29, 2015 (9am-10am Pacific Time) there is one entitled "Faster Big Data Analytics Using New Intel® Data Analytics Acceleration Library." The webinars can be attended live, and offer interactive question and answer time. They will also be available for replay after each live webinar is held.

If you want to look at the mechanics of using Intel DAAL, take a look at the Intel DAAL Code Samples (posted to the Intel DAAL Forum on the Intel website), which show some integration examples: basic usage in C++, and Java code examples for both Apache Spark* (interacting with Spark Resilient Distributed Datasets (RDDs)) and Apache Hadoop* (using DAAL functions with Hadoop MapReduce, including interacting with HDFS). The Intel DAAL Code Samples include three code samples:

  1. Principal Component Analysis (PCA) - This C++ code illustrates the basic usage of the DAAL API.
  2. Apache Spark* example - This Java code shows how to interact with Spark* RDD (Resilient Distributed Datasets).
  3. Apache Hadoop* example - This Java code shows how to use DAAL functions with Hadoop MapReduce and how to interact with HDFS.
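For intuition about what the PCA sample computes, here is a back-of-the-envelope sketch of PCA via SVD in Python with NumPy. This is not the DAAL API or the C++ sample itself, and the tiny data set is invented for illustration.

```python
import numpy as np

def pca(data, n_components):
    """PCA via SVD of the mean-centered data matrix: returns the top
    principal directions and the data projected onto them."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return components, centered @ components.T

# Points spread roughly along the line y = x: the first component
# should point along (1, 1) / sqrt(2), up to sign.
data = np.array([[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0]])
components, projected = pca(data, 1)
print(components[0])
```

DAAL's PCA offers the same mathematics behind batch, online and distributed interfaces, with the heavy lifting done by optimized BLAS/LAPACK kernels.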
Machine learning/deep learning

The Intel MKL optimizes many routines critical for machine learning and deep learning. Intel Fellow Pradeep Dubey gave an excellent talk at the Intel Developer Forum in San Francisco (August 18-20, 2015), which he summarizes in his blog “Pushing Machine Learning to a New Level with Intel® Xeon® and Intel® Xeon Phi™ Processors.” His presentation “Technology Insight: Data Analytics and Machine Learning” covers this topic. As Pradeep notes, even though results today offer record-breaking performance, future releases of both Intel MKL and Intel DAAL will feature additional improvements for CNN/DNN.

Another presentation you may want to examine is “Accelerating Machine Learning with Intel® Tools and Libraries” created by Fred Magnotta, Zhang Zhang, and Vikram Saletore of Intel with Ilya Ganelin, Sr. Data Engineer, Capital One.

Download Intel DAAL today

Intel DAAL is available for Linux, OS X and Windows.

An evaluation copy of Intel DAAL can be obtained by requesting an evaluation copy of Intel® Parallel Studio XE 2016. It is available for purchase worldwide as a stand-alone library, or as part of Intel® Parallel Studio XE 2016.

Intel DAAL is also available via Community Licensing for Intel Performance Libraries. Under this option, the library is free for anyone who registers, with no royalties and no restrictions on company or project size. The community licensing program offers the current versions of Intel DAAL without Intel Premier Support access. (Intel Premier Support offers exclusive 1-on-1 support via an interactive and secure web site where you can submit questions or problems and monitor previously submitted issues. It requires registration after purchase of the software, or special qualification offered to students, educators, academic researchers and open source contributors.)

No Cost Options for Intel Parallel Studio XE, Support yourself, Royalty-Free

August 25, 2015 - 9:14am

Intel® Parallel Studio XE is a very popular product from Intel that includes the Intel Compilers, Intel Performance Libraries, tools for analysis, debugging and tuning, tools for MPI and the Intel MPI Library. Did you know that some of these are available for free?

Here is a guide to “what is available free” from the Intel Parallel Studio XE suites.

Community Licenses (for everyone)

  • What is free: Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, Intel® Threading Building Blocks, and Intel® Integrated Performance Primitives (Linux, Windows or OS X versions).
  • Details: Community Licensing for Intel Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of the libraries, no Intel Premier Support access.
  • Where: Community Licensing for Intel Performance Libraries.

Evaluation Copies (for everyone)

  • What is free: Compilers, libraries and analysis tools (most everything!) (Linux, Windows or OS X versions).
  • Details: Evaluation copies – try before you buy.
  • Where: Try before you buy.

Use as an Academic Researcher

  • What is free: Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives, and the Intel® MPI Library (Linux, Windows or OS X versions, except the Intel® MPI Library, which is not supported on OS X).
  • Details: If you will use them in conjunction with academic research at an institution of higher education.
  • Where: Qualify for Use as an Academic Researcher.

Use as a Student

  • What is free: Compilers, libraries and analysis tools (most everything!) (Linux, Windows or OS X versions).
  • Details: If you are a current student at a degree-granting institution.
  • Where: Qualify for Use as a Student.

Use as an Educator

  • What is free: Compilers, libraries and analysis tools (most everything!) (Linux, Windows or OS X versions).
  • Details: If you will use them in a teaching curriculum.
  • Where: Qualify for Use as an Educator.

Use as an Open Source Contributor

  • What is free: Intel® Parallel Studio XE Professional Edition for Linux (Linux versions only).
  • Details: If you are a developer actively contributing to an open source project – and that is why you will use the tools.
  • Where: Qualify for Use as an Open Source Contributor.

Free licenses for certain users have always been an important dimension in our offerings. One thing that really distinguishes Intel is that we sell excellent tools and provide second-to-none support for software developers who buy our tools. We provide multiple options - and we hope you will find exactly what you need in one of them.

 

Intel® Parallel Studio XE 2016: High Performance for HPC Applications and Big Data Analytics

August 25, 2015 - 9:14am

Intel® Parallel Studio XE 2016, launched on August 25, 2015, is the latest installment in the premier developer toolkit for high performance computing (HPC) and technical computing applications. This suite of compilers, libraries, debugging facilities, and analysis tools targets Intel® architecture, including support for the latest Intel® Xeon® processors (codenamed Skylake) and Intel® Xeon Phi™ processors (codenamed Knights Landing). Intel® Parallel Studio XE 2016 helps software developers design, build, verify and tune code in Fortran, C++, C, and Java.

There are four things which stand out for me when I describe this year’s tool release:

  1. Intel® Data Analytics Acceleration Library
  2. Vectorization Advisor
  3. MPI Performance Snapshot
  4. High performance support for industry standards, the latest processors, operating systems and their related development environments.
Intel Data Analytics Acceleration Library (Intel® DAAL)

Data Scientists are finding the new Intel® DAAL very exciting because it helps speed big data analytics. It’s designed for use with popular data platforms including Hadoop, Spark, R, and Matlab for highly efficient data access. We’ve seen Intel DAAL accelerate PCA by 4-7X, and one customer has seen 200X for the Alternating Least Square prediction algorithm when compared with the latest open source Spark + MLlib (details for both claims are in my blog about DAAL). Intel DAAL was created by the renowned team that creates the Intel® Math Kernel Library (Intel® MKL). Intel DAAL can be thought of as “Intel MKL for Big Data” – but it is actually much more! Many more details on Intel DAAL, including ways to download it today for free, are in my blog about DAAL. Intel DAAL is available for Linux, OS X and Windows.

Vectorization Advisor

Vectorization is the process of using SIMD instructions in processors. In the quest to “modernize” applications to get top performance out of any modern processor, a software developer needs to tackle multithreading, vectorization and fabric scaling. Intel® Advisor XE 2016 provides tools to help with multithreading and vectorization:

  • Vectorization Advisor is an analysis tool that helps identify the loops that will benefit most from vectorization, identifies the obstacles to vectorization particular to your program, explores the benefit of alternative data reorganizations, and increases your confidence that transformations aimed at increasing vectorization will preserve the correctness of your original program.
  • Threading Advisor is a threading design and prototyping tool that lets you analyze, design, tune, and check threading design options rapidly.

Threading Advisor has gained a reputation over the past five years for helping find the right choice for multithreading an application more quickly and without costly oversights. The experience of refining this ‘advisor,’ and our knowledge of the best ways to give advice based on program analysis, helped us create this new advisor for vectorization.

Vectorization Advisor cannot tell you anything we could not show you how to do yourself. However, when I teach ‘vectorization’ I tend to rattle off a list of things to check, and each item involves using a tool in a particular way. Bringing all of that into one tool makes life easier and definitely makes the process faster and more efficient. One of the key Vectorization Advisor features is a Survey Report that offers integrated compiler report data and performance data all in one place, including GUI-embedded advice on how to fix vectorization issues specific to your code, augmented with links to web-based vectorization resources.

An excellent 12 minute introduction to the Vectorization Advisor is available as a video online.

MPI Performance Snapshot

The MPI Performance Snapshot is a scalable lightweight performance tool for MPI applications. It collects a variety of MPI application statistics (such as communication, activity, and load balance) and presents them in an easy-to-read format. The tool is not available separately but is provided as part of the Intel® Parallel Studio XE 2016 Cluster Edition.

The MPI Performance Snapshot aims to solve the following problems in the analysis of MPI applications when scaling out to thousands of ranks:

  1. Cluster sizes continue to grow, so applications are becoming more and more scalable
  2. Large amounts of data are collected when profiling at larger scale - data that can easily become unmanageable
  3. It is hard to identify the key metrics to track when you gather so much data

By addressing these three items, MPI Performance Snapshot improves scaling to at least 32K ranks, an order of magnitude above what is tolerable with the prior Intel Trace Analyzer and Collector. When aiming to optimize a large-scale run (anything above 1,000 MPI ranks), we therefore now suggest starting with the MPI Performance Snapshot capability to figure out where you need to dig deeper (which processes are slowing you down, where the peaks in your memory usage are, etc.). Then, do another run with the Intel Trace Analyzer and Collector on a subset of selected ranks to get more detailed per-process information, in order to visualize how a communication algorithm is implemented and to see if there are apparent bottlenecks.

MPI Performance Snapshot combines lightweight statistics from the Intel® MPI Library with OS and hardware-level counters to provide you with high-level categorization of your application: MPI vs. OpenMP load imbalance info, memory usage, and a break-down of MPI vs. computation vs. serial time.

For more details, you should check out the full MPI Performance Snapshot User's Guide and Analyzing MPI Applications with MPI Performance Snapshot on the Intel Trace Analyzer and Collector documentation page.

High performance support for… The latest processors...

are supported, including support for the Skylake microarchitecture and the Knights Landing microarchitecture.

The latest industry standards...

We take pride in having very strong support for industry standards – we aim to be a leader and maintain our reputation of being second-to-none. 

Our Fortran support even includes a feature from the draft Fortran 2015 standard which can help MPI-3 users. The current status of features of Fortran can be found in Dr. Fortran’s blog “Intel® Fortran Compiler - Support for Fortran language standards.”

The current status of C/C++ standard support features can be found in Jennifer’s blogs “C++14 Features Supported by Intel® C++ Compiler” and “C11 Support in Intel C++ Compiler.” 

Our OpenMP support is detailed in the latest user guide for the C/C++ compiler and the latest user guide for the Fortran compiler.

Operating system support includes Debian 7.0, 8.0; Fedora 21, 22; Red Hat Enterprise Linux 5, 6, 7; SuSE LINUX Enterprise Server 11, 12; Ubuntu 12.04 LTS (64-bit only), 13.10, 14.04 LTS, 15.04; OS X 10.10; Windows 7 through 10; and Windows Server 2008-2012. These are just the versions we have tested; many additional operating systems should work (for instance, CentOS).

Learn More

A series of webinars being held starting in September 2015 covers many topics related to Intel Parallel Studio XE 2016. The webinars can be attended live, and offer interactive question and answer time. They will also be available for replay after each live webinar is held. The first webinar is on September 1 – “What’s New in Intel® Parallel Studio XE 2016?”

Many more ways to learn more are on the Intel® Parallel Studio XE 2016 website.

There are many new features that I did not dive into, including great new support for MPI+OpenMP tuning with Intel VTune Amplifier XE, as well as a number of enhancements to Intel® Threading Building Blocks, including the increasingly popular flow graph capabilities and task arenas.

Download Intel® Parallel Studio XE 2016 today

An evaluation copy can be obtained by requesting an evaluation copy of Intel® Parallel Studio XE 2016. It is available for purchase worldwide.

Students, educators, academic researchers and open source contributors may qualify for some free tools.

The Intel Performance Libraries are also available via Community Licensing for Intel Performance Libraries. Under this option, the libraries are free for anyone who registers, with no royalties and no restrictions on company or project size. The community licensing program offers the current versions of the libraries without Intel Premier Support access. (Intel Premier Support offers exclusive 1-on-1 support via an interactive and secure web site where you can submit questions or problems and monitor previously submitted issues. It requires registration after purchase of the software, or special qualification offered to students, educators, academic researchers and open source contributors.)

 

 

 

ModernCode Project - Intel and Partners Helping You Make Today’s Software Be Ready for Tomorrow’s Computers

July 13, 2015 - 5:16am

Today, we introduced the Intel® Modern Code Developer Community, which focuses on the pursuit of parallel programming.  In addition to the online community, we have an exciting contest for a very special and worthy cause, in partnership with the OHSU Knight Cancer Institute and CERN (Intel® Modern Code Challenge 2015), coming this fall. The community includes our very successful series of Modern Code Live Workshops taught around the world and our upcoming Intel® HPC Developer Conferences. Our community includes the Intel® Parallel Computer Centers (IPCCs) located at institutions around the world with the goal to modernize key technical codes, and experts from around the world including the Intel® Black Belt Software Developers.

Encouraging and Educating Parallel Programming

The end of rising clock rates, a decade ago, has ushered in an era of parallelism driven by the continued rise in transistor count in keeping with half a century of Moore’s Law. Today multicore and many-core processors offer amazing capabilities which are maximized by parallel programming.

Modern Code – architecting and optimizing for today and the future

“Modern Code” is code that has been re-architected and optimized, for parallelism, to run on today’s and tomorrow’s computers, including supercomputers, thus increasing application performance. These efforts benefit from the fruits of the Intel® Parallel Computing Centers (IPCCs) that we established with universities and other institutions around the world with the goal to modernize key technical codes. Many examples of successful techniques, including many from the IPCCs, are captured in content on the web site (Code Modernization Library), with more to come. You will also find excellent material on modernizing code in the series of “Pearls” books edited by myself and Jim Jeffers.

The Intel Modern Code Community hosts a growing collection of tools, training and support. We proudly feature an elite group of experts in parallelism and HPC, from Intel and the industry worldwide, that we call Intel® Black Belt Software Developers. Intel is partnering with these experts to train and support the broader community on modern code techniques.

Intel has been helping educate and encourage parallel programming. We have our very successful series of Modern Code Live Workshops taught around the world in conjunction with our training partners. Later this year, we will hold Intel® HPC Developer Conferences. We will keep you updated through the Intel® Modern Code Developer Community online (see "Upcoming Events").

Online Community - find us in person too!

To join the Intel Modern Code Community or find out more, visit the Intel® Modern Code Developer Community online, or find us here at the International Supercomputing Conference (July 13-15, 2015). There will be many more opportunities in the future to engage us in person, including the Intel® Developer Forum (IDF) in San Francisco, August 18-20, 2015 as well as the Supercomputing Conference 2015 in Austin, November 14-20, 2015.  Personally, I’ll be at all these and I would be very interested in discussing code modernization with you. I’ll also be at SIGGRAPH to teach a tutorial “Multithreading for Visual Effects” with five experts on Visual Effects on August 12 in Los Angeles, and I’m speaking at ATPESC 2015 the week prior.

Intel® Modern Code Challenge 2015
Coding for Science to Build a Better Tomorrow

As a way to test out newly acquired Modern Code skills and techniques, while contributing to a social cause, developers can participate in the Intel® Modern Code Challenge 2015. Intel is launching this challenge in partnership with the OHSU Knight Cancer Institute and CERN.

Parallel computing plays a role in advancing scientific research in key areas like cancer research, physics and climate modeling which rely on parallel computing to push the envelope in performance. Developers that participate in the Intel Modern Code Challenge will have an opportunity to help the industry make the best use of the computers we have available today to enable scientific breakthroughs.

Prizes will include a trip to CERN’s facilities and to SC15 in Austin, Texas. The top student participants will be eligible for scholarships. To receive additional details sign up at software.intel.com/moderncode/challenge.

Intel® HPC Developer Conferences

We will hold three Intel HPC Developer Conferences this year – one in the U.S., one in China and one in India.  We will announce more details on the Modern Code website soon.  I am the overall technical committee chair, and I’m very excited by the speakers and content we already have lined up. I expect to be able to announce complete details late this summer. We will keep you updated through the Intel® Modern Code Developer Community online (see "Upcoming Events").

Modern Code Live Workshops

In conjunction with partners, we have been holding hands-on training around the world for developers and partners. The classes are enabled with remote access to Intel® Xeon® processor and Xeon Phi™ coprocessor-based clusters. You can learn more at the Modern Code Live Workshops website. Training and resources cover architecture overviews, memory optimizations, multithreading, vectorization, Intel® Math Kernel Library, Intel® Threading Building Blocks, Intel® Parallel Studio XE and much more.

Join Us

Parallelism has long been embraced in High Performance Computing (HPC) for programming the world’s most powerful computers, often called supercomputers. It is fitting that, today, at the International Supercomputing Conference, we launched the Intel® Modern Code Developer Community with many resources to help HPC developers get the most out of their applications on modern hardware.

I encourage you to come take advantage of one or more of the many benefits of the Intel® Modern Code Developer Community.

Meet the Experts - James Reinders

June 30, 2015 - 3:00pm

 

James R. Reinders
Chief Evangelist, HPC and Parallel Programming

James is involved in multiple engineering, research and educational efforts to increase use of parallel programming throughout the industry. Joining Intel Corporation in 1989, his contributions have included working on the world's first TeraFLOP/s supercomputer (ASCI Red) and the world's first TeraFLOP/s microprocessor (Intel® Xeon Phi™ coprocessor). James has been an author on numerous technical books, including Intel® Threading Building Blocks (O'Reilly Media, 2007), Structured Parallel Programming (Morgan Kaufmann, 2012), Intel® Xeon Phi™ Coprocessor High Performance Programming (Morgan Kaufmann, 2013), Multithreading for Visual Effects (A K Peters/CRC Press, 2014), and High Performance Parallelism Pearls Volumes One and Two (Morgan Kaufmann, 2015).

Useful Links

Books

High Performance Parallelism Pearls Volume Two

 

High Performance Parallelism Pearls Volume One

 

Intel Xeon Phi Coprocessor High-Performance Programming

 

Structured Parallel Programming: Patterns for Efficient Computation

 

Meet the experts

New Book, features TBB: Multithreading for Visual Effects

August 22, 2014 - 12:30pm

I have a copy of my latest book (with 6 wonderful co-authors)! Based on the SIGGRAPH tutorial we did last year, it reviews successful techniques for parallel programming in visual effects applications (think: animated movies!). The most referenced technique is TBB, although other methods, including OpenCL, are also discussed.

There is a nice write-up associated with its release at SIGGRAPH 2014.

 

New book: Multithreading for Visual Effects

August 4, 2014 - 12:03pm

Several authors from DreamWorks Animation, Pixar, Side Effects, AMD and Intel got together to write a book based on the Siggraph 2013 course on Multithreading in Visual Effects. The material in the book is greatly expanded and updated from the course material, and includes an additional chapter on OpenSubdiv, authored by Manuel Kraemer of Pixar. Ron Henderson received a Technical Achievement Award earlier this year (Feb 2014) for the development of the FLUX gas simulation system (Chapter 5 in our book).

The book has just been published.  You can order it from many places including the publisher CRC Press and Amazon.  It will also be in an e-book form for Kindle, Google Play and Nook soon.

I won't be at SIGGRAPH 2014, but several of my co-authors will be. On August 11, 2014, there is a “SIGGRAPH 2014 Birds of a Feather” session. The book will also be available on the SIGGRAPH 2014 show floor at the Taylor & Francis booth (Booth 1213).

Chapter/Author list

Multithreading Introduction and Overview
James Reinders, Intel Corporation

Houdini: Multithreading existing software
Jeff Lait, Side Effects Software Inc

The Presto Execution System: Designing for Multithreading
George ElKoura, Pixar Animation Studios

LibEE: Parallel Evaluation of Character Rigs
Martin Watt, Dreamworks Animation

Fluids: Simulation on the CPU
Ron Henderson, Dreamworks Animation

Bullet Physics: Simulation with OpenCL
Erwin Coumans, Advanced Micro Devices, Inc.

OpenSubdiv: Interoperating GPU Compute and Drawing
Manuel Kraemer, Pixar

Additional AVX-512 instructions

July 17, 2014 - 11:21pm

Additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

The Intel® Architecture Instruction Set Extensions Programming Reference includes the definition of additional Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions.

As I discussed in my first blog about Intel® AVX-512 last year, Intel AVX-512 will be first implemented in the future Intel® Xeon Phi™ processor and coprocessor known by the code name Knights Landing.

We had committed that Intel® AVX-512 would also be supported by some future Intel® Xeon® processors scheduled to be introduced after Knights Landing. The additional instructions we have now documented will appear in such processors, together with most of the previously published Intel® AVX-512 instructions.

These new instructions enrich the operations available as part of Intel® AVX-512. They are provided in two groups. A group of byte and word (8- and 16-bit) operations known as Byte and Word Instructions, indicated by the AVX512BW CPUID flag, enhances integer operations; notably, these instructions make use of all 64 bits in the mask registers. A group of doubleword and quadword (32- and 64-bit) operations known as Doubleword and Quadword Instructions, indicated by the AVX512DQ CPUID flag, enhances integer and floating-point operations.

An additional orthogonal capability known as Vector Length Extensions provides for most AVX-512 instructions to operate on 128 or 256 bits, instead of only 512. Vector Length Extensions can currently be applied to most Foundation Instructions, the Conflict Detection Instructions, and the new Byte, Word, Doubleword and Quadword instructions; they are indicated by the AVX512VL CPUID flag. Vector Length Extensions extend most AVX-512 operations to also operate on XMM (128-bit, SSE) and YMM (256-bit, AVX) registers, allowing the capabilities of EVEX encodings - including the use of mask registers and access to registers 16..31 - to be applied to XMM and YMM registers instead of only to ZMM registers.
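On Linux, these subsets show up as feature flags on the "flags:" line of /proc/cpuinfo, so a quick way to see which AVX-512 groups a machine advertises is to scan that line. Here is an illustrative Python sketch; the flag spellings follow the Linux kernel's naming, and the sample flag line is a made-up abbreviation rather than real /proc/cpuinfo output.

```python
# AVX-512 subsets and the flag names the Linux kernel uses for them.
AVX512_SUBSETS = {
    "avx512f": "Foundation",
    "avx512cd": "Conflict Detection",
    "avx512er": "Exponential & Reciprocal",   # Knights Landing
    "avx512pf": "Prefetch",                   # Knights Landing
    "avx512bw": "Byte and Word",              # Xeon
    "avx512dq": "Doubleword and Quadword",    # Xeon
    "avx512vl": "Vector Length Extensions",   # Xeon
}

def avx512_subsets(flags_line):
    """Return the AVX-512 subset names present in a /proc/cpuinfo
    'flags:' line (whitespace-separated feature flags)."""
    flags = set(flags_line.split())
    return sorted(name for flag, name in AVX512_SUBSETS.items() if flag in flags)

# A Knights-Landing-style flag line (abbreviated, hypothetical sample):
knl = "fpu sse2 avx2 avx512f avx512cd avx512er avx512pf"
print(avx512_subsets(knl))
```

On a real system you would read the flags line with, for example, `open("/proc/cpuinfo").read()`; the baseline rule described above means seeing `avx512f` plus `avx512cd` on any AVX-512 part, with `er`/`pf` or `bw`/`dq`/`vl` identifying the family.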

Emulation for Testing, Prior to Product

In order to help with testing of support, the Intel® Software Development Emulator has been extended to include these new Intel AVX-512 instructions and will be available very soon at http://www.intel.com/software/sde.

Intel AVX-512 Family of instructions

Intel AVX-512 Foundation Instructions will be included in all implementations of Intel AVX-512. While the Intel AVX-512 Conflict Detection Instructions are documented as optional extensions, the value for compiler vectorization has proven strong enough that they will be included in Intel Xeon processors that support Intel AVX-512. This makes Foundation Instructions and Conflict Detection Instructions both part of all Intel® AVX-512 support for both future Intel Xeon Phi coprocessor and processors and future Intel Xeon processors.

Knights Landing will support Intel AVX-512 Exponential & Reciprocal Instructions and Intel AVX-512 Prefetch Instructions, while the first Intel Xeon processors with Intel AVX-512 will support Intel AVX-512 Doubleword and Quadword Instructions, Intel AVX-512 Byte and Word Instructions and Intel AVX-512 Vector Length Extensions. Future Intel® Xeon Phi™ coprocessors and processors, after Knights Landing, may offer additional Intel AVX-512 instructions but should maintain a level of support at least that of Knights Landing (Foundation Instructions, Conflict Detection Instructions, Exponential & Reciprocal Instructions, and Prefetch Instructions). Likewise, the level of Intel AVX-512 support in the Intel Xeon processor family should include at least Foundation Instructions, Conflict Detection Instructions, Byte and Word Instructions, Doubleword and Quadword Instructions and Vector Length Extensions whenever Intel AVX-512 instructions are supported. Assuming these baselines in each family simplifies compiler designs and should be done.

Intel AVX-512 support

Release of detailed information on these additional Intel AVX-512 instructions helps enable support in tools, applications and operating systems by the time products appear. We are working with open source projects, application providers and tool vendors to help incorporate support. The Intel compilers, libraries, and analysis tools have strong support for Intel AVX-512 today and updates, planned for November 2014, will provide support for these additional instructions as well.

Intel AVX-512 documentation

The Intel AVX-512 instructions are documented in the Intel® Architecture Instruction Set Extensions Programming Reference. Intel AVX-512 is detailed in Chapters 2-7.

 

When is AVX 512 on a chip, not just an emulator?

July 1, 2014 - 5:06am

I'm having a really hard time finding anything other than rumors about this. I have seen the official statement that Broadwell chips will be available before Christmas, but I can't tell if Broadwell includes the AVX 512 extensions or not (I've heard both ways).  Anyone know for sure? Better yet can anyone point me to a link on intel.com that provides a definitive answer?

#1 System, 3rd time in a row, uses Intel Xeon Phi coprocessors!

June 23, 2014 - 1:53am

The world's fastest computer, for the third time in a row on biannual Top500 list, uses Intel Xeon Phi coprocessors to make it possible.
Intel Xeon Phi coprocessors are used in the #1, #7, #15, #39, #50, #51, #65, #92, #101, #102, #103, #134, #157, #186, #235, #251 and #451 systems.
No wonder we are working on another book about programming for highly parallel systems!

what is the relation between "hardware thread" and "hyperthread"?

May 15, 2014 - 2:31pm

Dear Forum,

One of the Intel TBB webpages states that "a typical Xeon Phi coprocessor has 60 cores, and 4 hyperthreads/core". But this blog from Intel emphasizes that "The Xeon Phi co-processor utilizes multi-threading on each core as a key to masking the latencies inherent in an in-order micro-architecture. This should not be confused with hyper-threading on Xeon processors that exists primarily to more fully feed a dynamic execution engine."

I'm confused with these two conflicting statements. Could anyone explain the difference/similarity between hyperthread and hardware thread?

Besides, the software developer's guide says MIC has hardware multithreading by replicating complete architectural state 4 times (has this been used in xeon's hyperthreading, where one physical core is seen as two logical cores?), and further, MIC implements a “smart” round-robin multithreading. Could you explain the relation between these two multithreading techniques?

Thanks a lot!

 

Structured Parallel Programming: Tutorial (materials posted) and Discounts

November 17, 2013 - 9:45am

We taught a one-day tutorial at Supercomputing 2013 (in Denver) on Sunday, November 17, 2013, based on the principles in the book. The presentation material we used is available here. Also, during November 2013, the book is available with free shipping as part of the Supercomputing conference (25th anniversary) special our publisher is doing on buying books. Their paper flyer (PDF here) describes our book and the special, or you can just jump to the web site http://store.elsevier.com/sc13 for the ordering information.

Full Presentation in PDF (3.1MB)

The following details about the tutorial, which used this presentation, are from the SC13 website:

Structured Parallel Programming with Patterns

SESSION: Structured Parallel Programming with Patterns

Tutorial, 8:30am-5:00pm, Room 302, November 17, 2013 (SC13 - Denver)

Presenters:
Michael D. McCool - Intel Corporation
James R. Reinders - Intel Corporation
Arch Robison - Intel Corporation
Michael Hebenstreit - Intel Corporation

ABSTRACT:
Parallel programming is important for performance, and developers need a comprehensive set of strategies and technologies for tackling it. This tutorial is intended for C++ programmers who want to better grasp how to envision, describe, and write efficient parallel algorithms at the single shared-memory node level. This tutorial will present a set of algorithmic patterns for parallel programming. Patterns describe best known methods for solving recurring design problems. Algorithmic patterns in particular are the building blocks of algorithms. Using these patterns to develop parallel algorithms will lead to better structured, more scalable, and more maintainable programs. This course will discuss when and where to use a core set of parallel patterns, how to best implement them, and how to analyze the performance of algorithms built using them. Patterns to be presented include map, reduce, scan, pipeline, fork-join, stencil, tiling, and recurrence. Each pattern will be demonstrated using working code in one or more of Cilk Plus, Threading Building Blocks, OpenMP, or OpenCL. Attendees will also have the opportunity to test the provided examples themselves on an HPC cluster for the duration of the SC13 conference.

As I mentioned above, until November 30, 2013, as part of the Supercomputing conference's 25th anniversary, our publisher is running a special on book purchases. Their paper flyer (PDF here) describes our book and the special, or you can jump straight to the web site http://store.elsevier.com/sc13 for ordering information.
