
Chinese Supercomputers. They Are Now the First in the World.

Giuseppe Sandro Mela.


 Tianhe-2 Supercomputer


«In computing, FLOPS is an abbreviation of FLoating Point Operations Per Second and indicates the number of floating-point operations executed in one second by the CPU.»


Many modern microprocessors can execute as many as four floating-point operations simultaneously per clock cycle.

One petaFLOPS corresponds to 10^15 (ten to the fifteenth power) floating-point operations per second, that is, one thousand trillion FLOPS.
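As a rough illustration of these magnitudes, the sketch below estimates a theoretical peak as cores times clock frequency times operations per cycle, and then counts how many such chips would add up to one petaFLOPS. The chip parameters (4 cores, 3 GHz) and the function name `peak_flops` are assumptions chosen for the example, not figures from any real product.

```python
import math

PETA = 10**15  # one petaFLOPS, as defined in the text above


def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock frequency x operations per cycle."""
    return cores * clock_hz * flops_per_cycle


# Hypothetical desktop chip (assumed values, for illustration only):
# 4 cores at 3 GHz, each completing 4 floating-point operations per cycle.
chip_flops = peak_flops(cores=4, clock_hz=3.0e9, flops_per_cycle=4)

# Number of such chips needed to add up to one petaFLOPS of peak performance.
chips_per_petaflops = math.ceil(PETA / chip_flops)

print(f"one chip: {chip_flops / 1e9:.0f} GFLOPS")
print(f"chips for 1 PFLOPS: {chips_per_petaflops}")
```

This is a theoretical ceiling only; real workloads rarely keep every floating-point unit busy on every cycle.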


These are dizzying figures, and many may wonder what such computing power could be used for.


A great many problems in physics, chemistry and related sciences, meteorology for example, are modeled by sets of equations that have no closed-form solution and must instead be solved numerically, which requires an extremely large number of operations.

For example, designing the profile of an aircraft wing requires simulating increasing speeds in an atmosphere of decreasing density. It is a particularly long computation, all the more so if one wants to vary the design in search of the optimal profile. Such calculations would have been impossible only twenty years or so ago.

Another example? Decrypting a message written in an unknown cipher.

Yet another example? Querying a database made up of millions of billions of records and obtaining the answer in real time.


The strategic value of an efficient supercomputer is evident.

The American countermeasure follows directly from this:

«A year ago, we revealed that the U.S. State Department blocked the further sales of Intel Xeon and Xeon Phi processors to Chinese institutions, most notably the Tianhe-2 supercomputer»

* * * * * *

The Chinese response followed just as directly.

Instead of buying American hardware, China designed and built its own.

The result is that China now has the most powerful supercomputer in the world.

Nothing rules out that it will be overtaken within a few years, but equally nothing rules out that the Chinese will keep improving the Tianhe-2.

In this sector, too, China has become self-sufficient.

The era in which the Chinese produced rush baskets is over.


And Italy? The Cineca project aims to reach 50/60 PFLOPS by 2020.

Only God and the Secretariat of the Democratic Party know how much it will cost the taxpayer.


VrWorld. 2016-04-15. 100 PFLOPS: China’s Supercomputer Circumvents U.S. Sales Ban.

A year ago, we revealed that the U.S. State Department blocked the further sales of Intel Xeon and Xeon Phi processors to Chinese institutions, most notably the Tianhe-2 supercomputer. The U.S. Administration also blocked a move in which a China-based investment fund would have invested in AMD, one of the original reasons for Radeon Technologies Group, which even without that investment is performing above and beyond its financial capabilities.

The reason to move against Tianhe-2 is complicated yet simple: ever since its debut in June 2013, the Tianhe-2 supercomputer from NUDT (National University of Defense Technology) has sat on top of the list of the world’s 500 fastest computers. From the looks of it, Tianhe-2 (the name translates to ‘Milky Way’) looks set to stay on top even after the launch of the U.S. supercomputers Summit and Sierra (IBM + Nvidia), as well as Aurora and Theta (Intel).

With its 32,000 Intel Xeon E5-2692 v2 processors and 48,000 Intel Xeon Phi 31S1P co-processors, Tianhe-2 delivers a fantastic peak performance of 54.9 PFLOPS and a sustained performance of 33.86 PFLOPS. What is little known is that Tianhe-2 is not a fully built supercomputer. In fact, Tianhe-2 has operated at 50% capacity, as the original target for the system was 100 PFLOPS peak and 80 PFLOPS sustained.
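The figures in the paragraph above can be cross-checked with a few lines of arithmetic. The sketch below uses only the four numbers quoted in the article; the variable names are illustrative, and "efficiency" here simply means sustained performance divided by peak performance.

```python
# Figures quoted in the article, all in PFLOPS.
peak_current = 54.9
sustained_current = 33.86
peak_target = 100.0
sustained_target = 80.0

# Fraction of the theoretical peak actually sustained.
efficiency_current = sustained_current / peak_current
efficiency_target = sustained_target / peak_target

# How far the installed system is from its original targets.
peak_vs_target = peak_current / peak_target
sustained_vs_target = sustained_current / sustained_target

print(f"current efficiency:  {efficiency_current:.1%}")
print(f"targeted efficiency: {efficiency_target:.1%}")
print(f"peak vs target:      {peak_vs_target:.1%}")
print(f"sustained vs target: {sustained_vs_target:.1%}")
```

On these numbers the installed machine sits at roughly 55% of the peak target and 42% of the sustained target, which is consistent with the article's round figure of 50% capacity.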

According to our sources, China did not react in a way the current administration expected. Rather than pressuring with (empty) threats that affect the commerce between two of the world’s largest economies, China invested all the funds intended for Intel and other foreign vendors into the development of in-house Alpha and ARM superprocessors, which have the potential to beat the traditional x86 architecture. In terms of funds, NUDT planned to buy 32,000 more Xeon processors (this time, based on Haswell-E) and 48,000 more Xeon Phi co-processors. We’ve been hearing that over $500 million was invested in bringing the Chinese silicon from a prototype phase to production-grade level.

The New Tianhe-2: Meet the 100 PFLOPS Supercomputer

At the 2016 Supercomputing Frontiers conference in Singapore, we learned the first details of the fully developed Tianhe-2 supercomputer, scheduled to debut in June 2016 during the 2016 International Supercomputing Conference in Frankfurt, Germany. This system is expected to deliver over 100 PFLOPS peak performance, and keep the crown of the world’s fastest (super)computer.

The new Tianhe-2 represents a hybrid design, featuring two new additions, as the old Xeon Phi cards are being phased out. Phytium Technologies recently delivered their “Mars” processors in the form of PCI Express cards that replaced the Xeon Phi cards, and motherboards to upgrade the system. Given that there are 48,000 add-in boards installed, the new 64-core design enables the system to reach its original performance targets. With the three million new ARM cores inside the Tianhe-2, its estimated Rpeak performance in the Linpack benchmark should exceed 100 PFLOPS.

Should Tianhe-2 reach its full deployment of 32,000 Xeons, 32,000 ShenWei processors, and 96,000 Phytium accelerator cards, we might see an upgrade in the range of 200-300 PFLOPS, provided the building can withstand the associated thermal and power challenges.

In August 2015, a little-known company, Phytium Technologies, planned to demonstrate its “Mars” processors at the HotChips conference in Cupertino, CA. However, its lead scientist was denied a visa to enter the U.S. and we could not see the physical boards which featured this extremely powerful processor. The slide above shows the base architecture of the initial engineering sample, with the final delivered boards featuring significantly higher performance specifications.

While we were not privy to the final silicon, we know that the performance went up almost threefold, and that the final production board delivers 1.5 TFLOPS of compute power, most probably in a dual-chip arrangement (akin to Tesla K80 and FirePro S9300 x2).
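The article's "three million ARM cores" and its 100 PFLOPS estimate can be sanity-checked against the per-board figures it quotes. The sketch below assumes 64 cores per add-in board (the reading that matches the three-million total) and assumes every one of the 48,000 boards delivers the quoted 1.5 TFLOPS; both are assumptions for illustration, not confirmed specifications.

```python
# Figures quoted in the article.
boards = 48_000
tflops_per_board = 1.5
target_peak_pflops = 100.0

# Assumption: 64 ARM cores per add-in board.
cores_per_board = 64

total_cores = boards * cores_per_board

# Aggregate accelerator throughput, converted from TFLOPS to PFLOPS.
accelerator_pflops = boards * tflops_per_board / 1000

# What the rest of the system would have to contribute to reach the target.
remaining_pflops = target_peak_pflops - accelerator_pflops

print(f"total ARM cores:      {total_cores:,}")
print(f"accelerator peak:     {accelerator_pflops:.1f} PFLOPS")
print(f"needed from the rest: {remaining_pflops:.1f} PFLOPS")
```

Under these assumptions the accelerator cards alone account for about 72 PFLOPS, so the remaining 28 PFLOPS or so would have to come from the host processors and any other additions.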

There are several implementations of this processor in Tianhe-2: an add-in card that replaces the Xeon Phi, and motherboards featuring upgradable memory, all using very affordable DDR3-1600 memory. Phytium Technology delivered motherboards with multiple processors and up to 256 GB per Mars processor. In a typical implementation the company achieves a "triple 64": 64-bit ARM cores inside a 64-core processor attached to 64 GB of memory. This uses an 8-channel memory interface, not the 16-channel interface mentioned in the slides, which is for onboard (G)DDR memory.

The bottom line is that the sales restriction enabled a small startup to deliver a product which achieves higher performance than the products it was supposed to replace. All in all, a win for NUDT and for a small company that ‘no one ever heard of’. We will see how the market develops, and whether there is space for Phytium Technology in the supercomputing market. Tianhe-2 might be just the beginning.

Also, this is not the only development coming from mainland China. Jiāngnán Computing Lab successfully developed a new multi-core Alpha processor. Considered a sixth generation design, ShenWei Alpha processors achieve more than 1 TFLOPS of compute performance. However, we were not able to confirm what volumes are involved with the new batch of ShenWei processors. What makes them mysterious is the fact that Wikipedia only lists three generations of their Alpha processors, while the scientists are talking about fifth, sixth and seventh generations.


VrWorld. 2016-04-07. Uncle Sam Shocks Intel With a Ban on Xeon Supercomputers in China.

Just as Intel’s (NASDAQ: INTC) CEO Brian Krzanich opens the regular staff meetings before a dramatically reduced IDF2015 Shenzhen conference, it is a good time to review how government and enterprises don’t see eye to eye when it comes to strategic business.

Remember the Tianhe-2 machine at Guangzhou Supercomputer Center, the current world’s number one according to the Top 500 Supercomputer list? Unlike some other Chinese supercomputers, Tianhe-2 is a fully Intel-based machine, the world’s largest assembly of Intel Xeon CPUs and Xeon Phi accelerators.

Even after Intel ‘opened the kimono’ and gave a nearly 70% discount on its processors and accelerators, the deal has given Intel, and therefore the US technology sector, a major foothold in China and the Asian region as a whole. Over the course of the past two years, we were involved in a lot of discussions with Intel staff who were not privy to the financial impact of the deal, and who even disputed our undoubtedly solid information. We’re not here to report how things should be, or how they appear in marketing and investor presentations to its numerous staff, but how things really are.

During 2015, the Tianhe-2 supercomputer was supposed to be doubled in size, up to 110 PFLOPS peak, again using the very same Intel processors and accelerators. Since these are now mature products with lower real manufacturing cost for Intel, they could finally make some real money.

Well, it was not to be: our tweety bird from the window chirped to us that Uncle Sam has put this supercomputer centre, together with the National University of Defense Technology in Changsha, the system’s creators, and the Tianjin centre, among others, on a so-called “Denial List”, which prevents any high technology from the USA from being sold to these sites. Our sources used even harsher words.

Knowing that these several sites alone are expected to order some 250+ PFLOPS of compute in the next few years (around 500,000 top-end Broadwell-EP Xeon E5v4 processors, or approximately $1 billion at high-margin list price), and that they were THE Intel-friendly ones, this is quite a loss to Intel, thanks to Uncle Sam.
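The order size quoted above implies a per-processor throughput and price that are easy to derive. The sketch below takes the article's three round figures at face value (250 PFLOPS, 500,000 processors, $1 billion) and divides them out; the results are implied averages, not published specifications or prices.

```python
# Round figures quoted in the article.
total_pflops = 250
processors = 500_000
list_price_usd = 1_000_000_000

# Implied average throughput per processor, in TFLOPS (1 PFLOPS = 1000 TFLOPS).
tflops_per_processor = total_pflops * 1000 / processors

# Implied average list price per processor, in US dollars.
usd_per_processor = list_price_usd / processors

# Implied list price per PFLOPS of compute, in US dollars.
usd_per_pflops = list_price_usd / total_pflops

print(f"per processor: {tflops_per_processor} TFLOPS, ${usd_per_processor:,.0f}")
print(f"per PFLOPS:    ${usd_per_pflops:,.0f}")
```

That works out to about half a TFLOPS and $2,000 per processor, or $4 million of list price per PFLOPS.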

But the worse strategic loss over time is that, using this decision as an excuse, the makers of indigenous Chinese high-end processor architectures can now push the government to gradually remove any dependence on the US. This means just one thing: AMD or Intel x86 processor technology is increasingly becoming errata non grata. Should the Chinese government react in force, it will give the Chinese vendors blank-check support to go all the way in developing their Alpha, POWER and MIPS processors for both government and mainstream commercial use.

You may think they are not up to the mark, but remember how fast British ARM architecture became the dominant processing architecture in the world. And this group doesn’t need to worry about the antiquated x86 ISA, worry about satisfying the dumbed down shareholder masses, or overpaying their marketing and sales staff, as well as the fat check, golden parachute-protected CxOs.

They have taken the best that the USA has developed and then discarded due to corporate shenanigans (some of the key Alpha, GPGPU and MIPS architects left the US over the course of the past four years, a lot of them due to non-renewed visas), and have continued developing it much further than anyone expected on both the hardware and the software side.

So, thanks to Uncle Sam, China might not have a 110 PFLOPS Intel-based supercomputer, but it definitely will launch a 100 PFLOPS system based on the upcoming 64-core, TFLOPS-class ShenWei Alpha, with true blue CPUs possibly faster per socket than even the next-generation Xeon Phi or Volta/Pascal-based Teslas. Next, of course, comes a 100 PFLOPS Chinese POWER8 or 9 (thank you IBM), and then possibly even Loongson MIPS, which may come back into the high-end field with renewed government support because of this Uncle Sam move. All are clean, elegant, scalable high-end RISC architectures.

So who are the winners and losers from this?

NUDT and Tianhe may be the losers for now, but only short term. They will simply speed up their HPC ARM plan.

Intel comes out of this the big loser, and by a lot: who will want to do a phased deployment of a large x86 machine in China now, and worry about future phases? Then comes Uncle Sam himself: they lost even that little bit of influence on high-end HPC in China. How is that for “cutting off your nose to spite your face”?


Rai News. 2016-04-14. Research: “Marconi”, the supercomputer, arrives at Cineca.

Everything is ready for the birth of ‘Marconi’: installation of the new Italian supercomputer for research will begin in mid-April. The system was co-designed by the Cineca consortium and will work alongside its ‘sibling’ Galileo. Cineca is the inter-university computing consortium based in Casalecchio di Reno. Founded in 1969 as a non-profit, it is made up of 70 Italian universities, 5 research institutions and the Ministry of Education, University and Research, and supports the research activities of the scientific community through supercomputing. After a selection process that began over a year ago through a European tender, the infrastructure contract was awarded in December 2015 to Lenovo, one of the three largest global vendors in the market for x86-based servers and the leader of the PC market for more than two years. The overall plan provides for an investment of 50 million euros in two phases. The first will give the scientific community a computing power of about 20 Pflop/s and a data storage capacity of over 20 petabytes, and will be in production in 2017. The second phase, which will begin in 2019, will focus on increasing the available power to reach 50/60 Pflop/s by 2020.