PLEASE BE ADVISED: THE ACCELERATED VERSION OF R IS CURRENTLY NOT SUPPORTED. DUE TO LACK OF PARTICIPATION, THE STATLAB WILL NOT KEEP ACCELERATED R ON THE UPGRADE PATH. THE VERSION LISTED BELOW IS STILL AVAILABLE.
Using the Accelerated Version of R
General Usage Information
The 64-bit accelerated version of R (currently R-2.8.1a) provides accelerated performance for some R applications that make extensive use of BLAS and LAPACK. This version is ONLY available on the 64-bit Intel servers. It cannot be used on 32-bit systems, and it should not be used on 64-bit systems based on AMD Opteron processors. bigmem01 and bigmem02 are the "public" 64-bit Intel servers available to everyone. Other 64-bit Intel servers that can use accelerated R are:
diabetes01-04, darwin01-12, hercules01-04, star01-02, mars01, athena, zeus01-07 and dots01
The 64-bit Opteron servers (desk00 and durga) should NOT be used with R-2.8.1a.
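If you are not sure which kind of processor a server has, a short check like the following may help. It is a minimal Python sketch, not a StatLab tool, and it assumes a Linux server, where /proc/cpuinfo reports vendor_id as GenuineIntel or AuthenticAMD and lists the lm ("long mode") flag for 64-bit capable CPUs. The function names are illustrative, and the check looks at the processor only, not at whether the installed operating system is 64-bit.

```python
# Sketch: check whether a Linux host looks suitable for accelerated R,
# i.e. a 64-bit capable Intel CPU. ASSUMPTION: Linux /proc/cpuinfo format,
# where vendor_id is "GenuineIntel" or "AuthenticAMD" and the "lm" (long mode)
# flag marks a 64-bit capable CPU. Function names are illustrative.
import os


def cpu_info(text: str) -> dict:
    """Return the vendor_id and flag set of the first processor listed."""
    vendor = ""
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return {"vendor": vendor, "flags": flags}


def suits_accelerated_r(text: str) -> bool:
    """True for a 64-bit capable Intel CPU; False for AMD (e.g. Opteron)
    or for CPUs without the 64-bit "lm" flag."""
    info = cpu_info(text)
    return info["vendor"] == "GenuineIntel" and "lm" in info["flags"]


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("suitable for accelerated R:", suits_accelerated_r(f.read()))
```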
The executable for the new version is at:
It is also possible to always get the executable for the latest supported version of accelerated R by using:
Performance improvements are NOT guaranteed; in some cases, performance may decrease. It is also important to check that results from the accelerated version are accurate. So it is a good idea to run your application with the non-accelerated version first, using the current default version of R (on the same 64-bit server):
This will provide a verified result and an estimate of non-accelerated timing. Then run the accelerated version for comparison. You can then decide which version is best for your R application.
For some simple, well-structured problems (cross products of large arrays, for example), 40X speedups are possible with the accelerated version of R. For more typical problems with large arrays, speedups of 2X to 7X are possible. For some smaller problems, and for problems that are not structured to take advantage of the acceleration, slowdowns of up to 10% are possible.
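The comparison described above comes down to simple arithmetic on the two wall-clock times. The Python sketch below assumes you have already timed both runs on the same server (for example with R's system.time() or the shell's time command); the function names are illustrative, and treating any speedup above 1.0 as "faster" is an assumption you may want to raise to allow for timing noise.

```python
# Sketch: decide between default and accelerated R from measured timings.
# ASSUMPTION: both runs were timed on the same 64-bit server; names are illustrative.


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Speedup factor: above 1.0 means the accelerated run was faster,
    below 1.0 means it was slower."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / accelerated_seconds


def choose_version(baseline_seconds: float, accelerated_seconds: float,
                   results_match: bool, min_speedup: float = 1.0) -> str:
    """Prefer accelerated R only if its results were verified against the
    default version AND its speedup exceeds min_speedup."""
    if not results_match:
        return "default"
    if speedup(baseline_seconds, accelerated_seconds) > min_speedup:
        return "accelerated"
    return "default"


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measurements from this page.
    print(speedup(120.0, 30.0))                  # 4.0
    print(choose_version(120.0, 30.0, True))     # accelerated
    print(choose_version(100.0, 110.0, True))    # default
```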
Additional Optimization Information
Accelerated R uses the Math Kernel Library (MKL) supported by Intel Corp. Intel provides documentation on how to structure problems in ways that work best with MKL. Links to the documentation and discussion forums for MKL can be found on the Intel BLAS/LAPACK overview pages. The main recommendations concern array alignment and leading-dimension sizes:
• arrays are aligned at 16-byte boundaries
• leading dimension values (n*element_size, in bytes) of two-dimensional arrays are divisible by 16
• for two-dimensional arrays, leading dimension values (in bytes) divisible by 2048 are avoided.
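So, where possible, choosing R array dimensions that are multiples of 16 may be advantageous. R stores matrices in column-major order, so the leading dimension of an R matrix is its number of rows. For double-precision (8-byte) values, a row count that is a multiple of 256 gives a leading dimension that is a multiple of 2048 bytes, which the last rule advises against.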
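The arithmetic behind the two leading-dimension rules can be sketched as below. This Python helper (padded_rows is a hypothetical name) finds the smallest row count at or above n that is a multiple of 16 and whose leading dimension in bytes is not a multiple of 2048, assuming 8-byte double-precision elements as in an R numeric matrix. It does not address the 16-byte alignment rule, which depends on how memory is allocated rather than on the dimensions you choose.

```python
# Sketch of the MKL leading-dimension rules for an R numeric matrix.
# ASSUMPTION: 8-byte double elements; R matrices are column-major, so the
# leading dimension is the number of rows. padded_rows is a hypothetical helper.
DOUBLE_BYTES = 8


def padded_rows(n: int, multiple: int = 16, element_size: int = DOUBLE_BYTES) -> int:
    """Smallest row count >= n that is a multiple of `multiple` and whose
    leading dimension in bytes (rows * element_size) is not divisible by 2048."""
    if n < 1:
        raise ValueError("n must be positive")
    if (multiple * element_size) % 2048 == 0:
        raise ValueError("every multiple would hit the 2048-byte rule")
    rows = -(-n // multiple) * multiple  # round n up to a multiple
    while (rows * element_size) % 2048 == 0:
        rows += multiple  # step past a leading dimension divisible by 2048 bytes
    return rows


if __name__ == "__main__":
    for n in (1000, 256, 4000):
        print(n, "->", padded_rows(n))
```

With the defaults, 1000 rows become 1008, 256 rows become 272 (since 256 * 8 = 2048 bytes), and 4000 rows are left unchanged.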
By default, R-2.8.1a sets the multithreading level to 1 thread (single-threaded). This is the best option for most cases. However, there are special cases where multithreading can improve performance. IF your R application is the ONLY one running on a multicore 64-bit server, setting the multithreading level to 2, 3 or 4 will typically reduce completion time at the expense of using more of the server's resources. To get this advantage, run only ONE instance of your R job. If you run more, competition among your multiple multithreaded R jobs may actually increase completion time compared with running the same jobs with the default single threading. Performance improvements are NOT linear. In a benchmark evaluation, the following improvements were measured:
2 threads - ideal is 2X speedup - actual was 1.4X - or about 70% of ideal
3 threads - ideal is 3X speedup - actual was 1.6X - or about 53% of ideal
4 threads - ideal is 4X speedup - actual was 1.8X - or about 45% of ideal
The benefit of using 3 or 4 threads is marginal compared with 2 threads, and with 2 threads it is still possible to share the server's resources with other jobs. I recommend using 2 threads when multithreading.
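The "percent of ideal" figures above are simply the measured speedup divided by the number of threads. The Python sketch below (names are illustrative) reproduces them from the benchmark numbers quoted on this page and also shows the extra speedup each additional thread buys, which is the reasoning behind recommending 2 threads.

```python
# Sketch: parallel efficiency and marginal gain from the benchmark above.
# Speedups are the measured values quoted on this page; names are illustrative.
MEASURED_SPEEDUP = {2: 1.4, 3: 1.6, 4: 1.8}  # threads -> speedup vs 1 thread


def efficiency(threads: int, speedup: float) -> float:
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup / threads


def marginal_gains(speedups: dict) -> dict:
    """Extra speedup per added thread, starting from 1 thread = 1.0X."""
    gains = {}
    prev_threads, prev_speedup = 1, 1.0
    for threads in sorted(speedups):
        gains[threads] = (speedups[threads] - prev_speedup) / (threads - prev_threads)
        prev_threads, prev_speedup = threads, speedups[threads]
    return gains


if __name__ == "__main__":
    gains = marginal_gains(MEASURED_SPEEDUP)
    for threads, s in sorted(MEASURED_SPEEDUP.items()):
        print(f"{threads} threads: {efficiency(threads, s):.0%} of ideal, "
              f"+{gains[threads]:.1f}X from the last added thread")
```

This prints 70%, 53% and 45% of ideal, and shows the second thread adding about 0.4X while the third and fourth add only about 0.2X each.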
To use multithreading at the recommended level, add the following command near the beginning of your R application:
When using multithreading, it is especially important to verify the accuracy of the result. So please make sure you have a baseline solution against which you can verify that results are equivalent.
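"Equivalent" here should usually mean equal within a small tolerance rather than bit-for-bit identical, because optimized or multithreaded BLAS routines may add terms in a different order, which changes the last bits of floating-point results. Within R, all.equal() performs this kind of comparison (its default numeric tolerance is about 1.5e-8). The Python sketch below illustrates the same idea; the function name and tolerance defaults are illustrative assumptions.

```python
# Sketch: compare a result against a verified baseline with tolerances.
# Exact equality is too strict: optimized or multithreaded BLAS may add terms
# in a different order, which changes the last bits of floating-point results.
# Tolerance defaults are illustrative assumptions, not StatLab guidance.
import math
from typing import Sequence


def results_equivalent(baseline: Sequence[float], candidate: Sequence[float],
                       rel_tol: float = 1e-8, abs_tol: float = 1e-12) -> bool:
    """True if both results have the same length and every pair of values
    agrees within the relative/absolute tolerance."""
    if len(baseline) != len(candidate):
        return False
    return all(math.isclose(b, c, rel_tol=rel_tol, abs_tol=abs_tol)
               for b, c in zip(baseline, candidate))


if __name__ == "__main__":
    a = (0.1 + 0.2) + 0.3   # same sum, different order of additions
    b = 0.1 + (0.2 + 0.3)
    print(a == b)                          # False
    print(results_equivalent([a], [b]))    # True
```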
Issues with swap
If you submit multiple large R jobs on a single 64-bit multicore server, you may exhaust the typical 8 GB of "real" (physical) memory and start to use swap. This applies to both the accelerated and non-accelerated versions of R. Watch swap usage by running a tool such as "top" in a separate window. If the amount of swap in use starts increasing, expect performance to degrade. If this occurs, you can either reduce the number of jobs you submit or accept the resulting degradation.
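As a scriptable complement to watching top, the following Python sketch reads swap usage from /proc/meminfo. It assumes a Linux server, where that file reports SwapTotal and SwapFree in kB; the function names and the 1024 kB growth threshold are illustrative choices, not StatLab settings.

```python
# Sketch: measure swap in use on Linux as a scriptable complement to top.
# ASSUMPTION: Linux /proc/meminfo, which reports SwapTotal and SwapFree in kB.
# Function names and the growth threshold are illustrative.
import os


def swap_used_kb(meminfo_text: str) -> int:
    """Swap currently in use, SwapTotal - SwapFree, in kB."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key.strip() in ("SwapTotal", "SwapFree"):
            values[key.strip()] = int(rest.split()[0])
    return values["SwapTotal"] - values["SwapFree"]


def swap_is_growing(samples_kb: list, threshold_kb: int = 1024) -> bool:
    """True if swap use rose by more than threshold_kb across the samples."""
    return len(samples_kb) >= 2 and samples_kb[-1] - samples_kb[0] > threshold_kb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print("swap in use (kB):", swap_used_kb(f.read()))
```

Sampling swap_used_kb every minute or so while your jobs run, and passing the samples to swap_is_growing, gives roughly the same signal as watching the swap line in top.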
- 18 Jan 2011