Intel® Developer Zone:
Performance

Highlights

Just published! Intel® Xeon Phi™ Coprocessor High Performance Programming 
Learn the fundamentals of programming for this new architecture and these new products. New!
Intel® System Studio
Intel® System Studio is a comprehensive suite of integrated software development tools that can accelerate time to market, strengthen system reliability, and improve energy efficiency and performance. New!
In case you missed it – Replay of the two-day live webinar
Introduction to developing high-performance applications for Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.
Structured Parallel Programming
Authors Michael McCool, Arch D. Robison, and James Reinders use an approach based on structured patterns that should make the subject accessible to every software developer.

Optimize your applications' performance with parallel programming and the help of Intel's innovative resources.

Development resources


Development tools

 

Intel® Parallel Studio

Intel® Parallel Studio brings simplified, end-to-end parallelism to Microsoft Visual Studio* C/C++ developers and provides advanced tools for optimizing client applications for multicore and manycore processing.

Intel® Software Development Products ›

Explore all the tools that can help you optimize your applications for Intel architecture. Some tools are available for a free 45-day evaluation period.

Tools knowledge base

Find guides and support information for Intel tools.

List of Useful Power and Power Management Articles, Blogs and References
By Taylor Kidd (Intel), posted 04/16/2014
INTRODUCTION AND PURPOSE: This article endeavors to provide a single point of reference to Power Management blogs, articles and other resources relevant to the Intel® Xeon Phi™ coprocessor. There are many excellent resources out there on power, power management and tools; this article cannot ho...
A Parallel Stable Sort Using C++11 for TBB, Cilk Plus, and OpenMP
By Arch D. Robison (Intel), posted 04/11/2014
This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics. It also points out a scalability trap to watch out for with C++. The attached code has implementations in Intel® Threading ...
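As a quick illustration of the basic structure such a sort builds on, here is a hypothetical C++ sketch (not the article's attached code): it parallelizes only the two recursive halves with OpenMP tasks and leaves the stable merge serial, so the article's attached code remains the one to study for the scalable version.

// Minimal sketch only: a stable, task-parallel merge sort. std::stable_sort
// handles small ranges; std::inplace_merge is stable, so the whole sort stays
// stable. The merge here is still serial, which limits scalability.
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename It>
void parallel_merge_sort(It first, It last, std::ptrdiff_t cutoff = 2048) {
    std::ptrdiff_t n = last - first;
    if (n <= cutoff) { std::stable_sort(first, last); return; }
    It mid = first + n / 2;
    #pragma omp task                        // sort the left half in a new task
    parallel_merge_sort(first, mid, cutoff);
    parallel_merge_sort(mid, last, cutoff); // sort the right half here
    #pragma omp taskwait
    std::inplace_merge(first, mid, last);   // serial, stable merge
}

int main() {
    std::vector<int> v(1 << 20);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = int(v.size() - i);
    #pragma omp parallel
    #pragma omp single
    parallel_merge_sort(v.begin(), v.end());
    return std::is_sorted(v.begin(), v.end()) ? 0 : 1;
}

Compile with an OpenMP-enabled compiler (e.g. -fopenmp); the cutoff of 2048 elements is an arbitrary tuning value, not a recommendation from the article.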
Intel® Software Development Tools 2015 Beta
By Gergana Slavova (Intel), posted 03/27/2014
Intel® Software Development Tools 2015 Beta. What's new in the 2015 Beta: this suite of products brings together exciting new technologies along with improvements to Intel's existing software development tools. Get guidance on how to boost performance safely without creating threading bugs using...
Resource Guide for People Investigating the Intel® Xeon Phi™ Coprocessor
By Taylor Kidd (Intel), posted 03/25/2014
This article identifies resources for anyone investigating the value to their organization of the Intel® Xeon Phi™ coprocessor, which is based on the Intel® Many Integrated Core (Intel® MIC) architecture. It is one of three such guides, each for people in one of the following specific roles: Adm...


[Acceler8 '12] Scaling fast sequential algorithms using MapReduce
By seviyor, posted 05/30/2012
Parallel algorithm vs. work in parallel: As many of the forum posts have shown, fast algorithms for solving the problem of maximal common substrings gave good results on the benchmark but didn't really scale with the number of threads. This is because those sub-quadratic (linear or n·log n) algorithm...
Create a Ubuntu 11.04 LiveUSB to use Intel® Parallel Studio XE
By Xavier H. (Intel), posted 05/14/2012
You need a license for Intel® Parallel Studio XE for Linux and at least a 4 GB USB key. Get an ISO image of Ubuntu 11.04. Create a new Ubuntu 11.04 LiveUSB with persistence mode enabled (you can specify a size of 1 MB for the persistence file; you will overwrite it with a ~3 GB file in the next...
Getting system parameters in order to improve data structures
By andreib, posted 05/13/2012
Dear programmer, there are a lot of situations where you have to deal with very efficient data structures to get good performance. An important characteristic of a data structure is granularity. How big should the data structure be? What is the optimum size of its elements? Of course there is ...
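One concrete starting point is to query the cache geometry at run time and size blocks or elements accordingly. The sketch below is a hedged example assuming Linux/glibc; the _SC_LEVEL1_* sysconf parameters are glibc extensions and are not available everywhere, and the "half of L1" policy is just an illustrative choice.

// Hypothetical sketch: query L1 data-cache parameters with sysconf() and
// derive a block size for a cache-friendly data structure. The constants may
// return 0 or -1 if the information is unavailable, so fall back to defaults.
#include <unistd.h>
#include <cstdio>

int main() {
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); // bytes per cache line
    long l1   = sysconf(_SC_LEVEL1_DCACHE_SIZE);     // total L1D size in bytes
    if (line <= 0) line = 64;                        // common Intel default
    if (l1   <= 0) l1   = 32 * 1024;
    // Example policy: keep a per-thread working block within half of L1D and
    // pad elements to the cache-line size to avoid false sharing.
    long block_bytes = l1 / 2;
    std::printf("cache line: %ld B, L1D: %ld B, suggested block: %ld B\n",
                line, l1, block_bytes);
    return 0;
}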
Acceler'8 contest: lessons learned
By Maxime RIVIERE, posted 02/01/2012
The latest edition of the Acceler'8 contest ended a little over a month ago. Unlike the previous contest, we have not published an article; we should do so at some point, as it was an interesting part of the previous contest. The constraints of everyday life are taking...


Subscribe to Intel® Developer Zone Blogs
Intel® Parallel Studio XE SP1 & Intel® Cluster Studio XE SP1
By kathy-farrel (Intel)
Intel® Parallel Studio XE SP1 & Intel® Cluster Studio XE SP1 - What's New - Webinar, Tuesday, September 17, 9am PDT. Please join us for a technical presentation on the new features found in the recently released Intel® Parallel Studio XE 2013 SP1 and Intel® Cluster Studio XE SP1. This release includes support for compilers and performance analysis on Intel® Xeon Phi™ on Windows*. The technical presentation will briefly cover new features for both C++ and Fortran on Linux*, Windows*, and OS X* operating systems, as well as the error-checking and performance-profiling tools. Learn how to efficiently boost your application performance! Not too late - register now. Learn about upcoming webinars.
Responsive OpenMP Threads in Hybrid Parallel Environment
By Don K.
I have a Fortran code that runs both MPI and OpenMP. I have done some profiling of the code on an 8-core Windows laptop, varying the number of MPI tasks vs. OpenMP threads, and have some understanding of where some performance bottlenecks for each parallel method might surface. The problem I am having is when I port over to a Linux cluster with several 8-core nodes. Specifically, my OpenMP thread parallelism performance is very poor. Running 8 MPI tasks per node is significantly faster than 8 OpenMP threads per node (1 MPI task), but even 2 OpenMP threads + 4 MPI tasks was running very slowly, more so than I could solely attribute to a thread starvation issue. I saw a few related posts in this area and am hoping for further insight and recommendations on this issue. What I have tried so far ... 1. setenv OMP_WAIT_POLICY active ## seems to make sense 2. setenv KMP_BLOCKTIME 1 ## this is counter to what I have read, but when I set this to a large number (2500...
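Beyond the wait-policy settings, the usual suspect on clusters is placement: if all OpenMP threads of a rank end up pinned to the same core, or ranks overlap, hybrid runs crawl. A small, hypothetical diagnostic like the following (C++ with MPI and OpenMP, using the glibc-specific sched_getcpu()) can confirm where every thread actually lands before tuning anything else; the original code is Fortran, so this is only a standalone check, not a patch.

// Hypothetical placement check, not a fix: print which core every OpenMP
// thread of every MPI rank runs on. If threads stack on one core, adjust
// affinity (e.g. KMP_AFFINITY / OMP_PLACES and the MPI launcher's binding
// options) before worrying about OMP_WAIT_POLICY or KMP_BLOCKTIME.
#include <mpi.h>
#include <omp.h>
#include <sched.h>   // sched_getcpu(), glibc-specific
#include <cstdio>

int main(int argc, char** argv) {
    int provided = 0, rank = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int cpu = sched_getcpu();          // core this thread is executing on
        #pragma omp critical
        std::printf("rank %d thread %d on cpu %d\n", rank, tid, cpu);
    }

    MPI_Finalize();
    return 0;
}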
Optimizing Cilk with ternary conditional
By Fabio G.
What is the best way to optimize the loop cilk_for(i=0;i<n;i++){ x[i]=x[i]<0?0:x[i]; } or something like that? Thanks, Fabio
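For what it's worth, a clamp like this is usually already branch-free once vectorized. One hedged way to express it (a sketch assuming x is a float array and a Cilk Plus-capable compiler such as icc, or gcc with -fcilkplus) is to keep cilk_for and write the body as a max, which the compiler can map to a single vector max instruction:

// Sketch: same result as x[i] = x[i]<0 ? 0 : x[i], expressed without a
// data-dependent branch. Whether it beats the ternary depends on the
// compiler; the vectorization report is the decisive test.
#include <cilk/cilk.h>
#include <algorithm>

void clamp_to_zero(float* x, int n) {
    cilk_for (int i = 0; i < n; ++i)
        x[i] = std::max(x[i], 0.0f);
}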
Optimizing reduce_by_key implementation using TBB
By Shruti R.
Hello everyone, I'm quite new to TBB and have been trying to optimize a reduce_by_key implementation using TBB constructs. However, the serial STL code is always outperforming the TBB code! It would be helpful to get an idea of how reduce_by_key can be improved using tbb::parallel_scan. Any help at the earliest would be much appreciated. Thanks.
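One hedged way to build reduce_by_key on tbb::parallel_scan is a segmented scan: carry a (starts-new-segment, running-sum) pair whose combine operation restarts the sum at key boundaries, then harvest the last element of each run. The sketch below assumes a TBB version that provides the functional parallel_scan overload; names such as SegSum and reduce_by_key are made up here, and the final harvest pass is left serial for brevity.

// Hypothetical sketch of reduce_by_key via a segmented tbb::parallel_scan.
// SegSum records whether the scanned prefix crossed a key boundary and the
// sum accumulated since the most recent boundary; seg_combine is associative.
#include <tbb/parallel_scan.h>
#include <tbb/blocked_range.h>
#include <cstddef>
#include <utility>
#include <vector>

struct SegSum { bool head; long sum; };

static SegSum seg_combine(SegSum a, SegSum b) {
    // If b starts a new segment, its sum replaces a's running sum.
    return SegSum{ a.head || b.head, b.head ? b.sum : a.sum + b.sum };
}

std::vector<std::pair<int, long>>
reduce_by_key(const std::vector<int>& keys, const std::vector<long>& vals) {
    const std::size_t n = keys.size();
    std::vector<long> seg(n);   // segmented inclusive sums

    tbb::parallel_scan(
        tbb::blocked_range<std::size_t>(0, n), SegSum{false, 0},
        [&](const tbb::blocked_range<std::size_t>& r, SegSum run, bool is_final) {
            for (std::size_t i = r.begin(); i != r.end(); ++i) {
                SegSum elem{ i == 0 || keys[i] != keys[i - 1], vals[i] };
                run = seg_combine(run, elem);
                if (is_final) seg[i] = run.sum;   // write only on the final pass
            }
            return run;
        },
        seg_combine);

    std::vector<std::pair<int, long>> out;        // one (key, sum) per key run
    for (std::size_t i = 0; i < n; ++i)
        if (i + 1 == n || keys[i + 1] != keys[i])
            out.push_back({keys[i], seg[i]});
    return out;
}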
reading a shared variable
By VIKRANT G.
Hello everyone, I am relatively new to parallel programming and have the following question: is reading a shared variable (one that is not modified by any thread) without using locks a good practice? Thanks for the help in advance.
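Short answer for the common case, as a hedged C++ sketch: if the variable is fully written before the reading threads are created (or before they are released past a barrier), lock-free reads are fine; if any thread might still write it, make it std::atomic or protect it with a lock.

// Sketch of the two situations. `config` is written once before the threads
// start, so reading it without a lock is safe; thread creation provides the
// needed ordering. `counter` is modified concurrently, so it must be atomic.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int config = 0;                    // written only before threads start
std::atomic<long> counter{0};      // written by many threads

int main() {
    config = 42;                   // "publish" before creating readers

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t)
        workers.emplace_back([] {
            int local = config;            // plain read: safe, no writer now
            counter.fetch_add(local);      // concurrent write: needs atomic
        });
    for (auto& w : workers) w.join();

    std::printf("%ld\n", counter.load()); // 4 * 42
    return 0;
}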
Weird OpenMP bug
By Cheng C.
Dear all, I want to combine OpenMP with the RSA_public_encrypt and RSA_private_decrypt routines. However, I have been confused by a weird bug for a few days. In the attached program, if I generate 2 threads for parallel encryption and decryption, everything works well. If I generate 3 or more threads, the RSA_public_encrypt routine works fine and all strings are successfully encrypted (encrypt_len=256). However, the RSA_private_decrypt routine goes wrong: only one thread works properly, and all the other threads fail to decrypt some of the strings (decrypt_len=-1, rsa_eay_private_decrypt padding check failed). With 1000 strings and 4 threads, the total number of strings that fail to decrypt is around 710 (sometimes as low as around 200). As expected, if I use 4 threads for parallel RSA_public_encrypt and one thread for RSA_private_decrypt, nothing goes wrong. It would be great if you could give some ideas. Thanks very much. #include <openssl/rsa.h> #include <...
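Without the attachment it is hard to be definitive, but one classic cause with OpenSSL 1.0.x is several threads sharing a single RSA structure, whose internal blinding state is not thread-safe unless the locking callbacks are installed. A hedged workaround sketch, giving each OpenMP thread its own copy of the private key via RSAPrivateKey_dup, looks like this (the function decrypt_all and its containers are made up for illustration):

// Hypothetical sketch (OpenSSL 1.0.x API): decrypt a batch of ciphertexts in
// parallel with one private-key copy per thread, so no RSA state is shared.
// An alternative is to keep one shared key but install OpenSSL's locking
// callbacks (CRYPTO_set_locking_callback) for the whole program.
#include <openssl/rsa.h>
#include <omp.h>
#include <vector>

void decrypt_all(RSA* key,
                 const std::vector<std::vector<unsigned char>>& cipher,
                 std::vector<std::vector<unsigned char>>& plain) {
    plain.resize(cipher.size());
    #pragma omp parallel
    {
        RSA* local = RSAPrivateKey_dup(key);     // per-thread key copy
        #pragma omp for
        for (long i = 0; i < (long)cipher.size(); ++i) {
            plain[i].resize(RSA_size(local));
            int len = RSA_private_decrypt((int)cipher[i].size(),
                                          cipher[i].data(), plain[i].data(),
                                          local, RSA_PKCS1_PADDING);
            if (len >= 0) plain[i].resize(len);  // success
            else          plain[i].clear();      // decrypt/padding failure
        }
        RSA_free(local);
    }
}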
performance loss
By Bo W.
Hi, some interesting performance loss showed up in my measurements. I have a system with two sockets; each socket is an E5-2680 processor. Each processor has 8 cores with Hyper-Threading (Hyper-Threading was ignored). On this system, I started a program 16 times at the same time, each time pinning the program to a different core. At first, I set all cores to 2.7 GHz and saw: Program 0 runtime 7.7 s, Program 8 runtime 7.63 s. Then I set the cores on the second socket to 1.2 GHz and saw: Program 0 runtime 12.18 s, Program 8 runtime 15.73 s. Program 8 ran slower, which is clear, because core 8 had a lower frequency. But why was program 0 also slower? Its frequency wasn't touched. Regards, Bo


Subscribe to Forums
