Tuesday, August 22nd, 2017 | Gadgets

Intel presented Knights Mill at Hot Chips 29 in Cupertino, California. It is the next generation of Xeon Phi, Intel's many-core compute accelerators for servers, and this time it has been developed with a focus on deep-learning training. Compared with Knights Landing, aka Xeon Phi 7200, the cores have been reworked so that they can process half-precision (16-bit) data.

    The chip again consists of 36 tiles and 72 cores. (Picture: Intel)

The basic design remains the same: Knights Mill consists of 36 tiles that communicate over a mesh interconnect. Each tile contains two cores with their vector processing units (VPUs) and 1 MByte of L2 cache. The vector units execute the AVX-512 instructions, and each of the 72 cores supports four-way Hyper-Threading to keep them well utilized.
For deep learning, the VPUs support so-called Quad FMA instructions, which perform four fused multiply-add operations in single-precision floating point (FP32) as a single instruction. As a result, FP32 throughput per clock is doubled compared with Knights Landing ("pumped", as in the Pentium 4). Because one of the double-precision ports has been dropped, FP64 performance is halved. In its place, Intel integrated four of the new VNNI units.
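
To illustrate the Quad FMA idea, here is a hypothetical, simplified Python model in which one "instruction" performs four multiply-adds in a row into the same accumulator. The function name `qfma`, the 16-lane width and the operand layout (four vector sources, each scaled by one scalar) are assumptions for illustration, loosely modeled on Intel's published AVX512_4FMAPS description rather than its exact semantics. Python floats are FP64, so FP32 rounding is not modeled.

```python
from typing import List, Sequence

LANES = 16  # assumed: one 512-bit vector holds 16 FP32 values


def qfma(acc: Sequence[float], srcs: Sequence[Sequence[float]],
         scalars: Sequence[float]) -> List[float]:
    """Hypothetical model: acc += srcs[i] * scalars[i] for i = 0..3, as one operation."""
    if len(srcs) != 4 or len(scalars) != 4:
        raise ValueError("Quad FMA takes exactly four sources and four scalars")
    out = list(acc)
    for src, s in zip(srcs, scalars):  # four chained iterations
        if len(src) != len(out):
            raise ValueError("all vectors must have the same lane count")
        # One multiply-add per lane (Python does not fuse the rounding step).
        out = [a + x * s for a, x in zip(out, src)]
    return out


acc = [0.0] * LANES
srcs = [[float(i + 1)] * LANES for i in range(4)]  # lanes filled with 1.0, 2.0, 3.0, 4.0
result = qfma(acc, srcs, [1.0, 1.0, 1.0, 1.0])
print(result[0])  # 1 + 2 + 3 + 4 = 10.0
```

Each call in this model performs four multiply-adds, i.e. eight floating-point operations per lane, instead of the two that a single FMA would contribute.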

    Knights Mill achieves twice the FP32 and half the FP64 performance, and also handles INT16 at four times the speed. (Picture: Intel)

Surprisingly, the Vector Neural Network Instructions (VNNI) do not work with half-precision floating point (FP16) but with a variable-precision, less flexible fixed-point format. With INT16 inputs and INT32 outputs with 31 usable bits, however, Intel achieves sufficient accuracy for training neural networks.
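
A simplified sketch of INT16-in, INT32-out arithmetic: pairs of signed 16-bit values are multiplied and their sum is added to a signed 32-bit accumulator lane. The function names `dot_int16_pairs` and `wrap_int32`, the pairing layout and the wrap-around behavior on overflow are assumptions for illustration; this shows the general idea, not the exact definition of Intel's VNNI instructions.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1


def wrap_int32(x: int) -> int:
    """Wrap an arbitrary Python int to signed 32-bit two's complement."""
    return (x + 2**31) % 2**32 - 2**31


def dot_int16_pairs(acc: Sequence[int], a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Hypothetical lane model: acc[j] += a[2j]*b[2j] + a[2j+1]*b[2j+1]."""
    if len(a) != len(b) or len(a) != 2 * len(acc):
        raise ValueError("need two INT16 inputs per INT32 accumulator lane")
    for v in list(a) + list(b):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError(f"{v} does not fit in INT16")
    out = []
    for j, s in enumerate(acc):
        pair = a[2 * j] * b[2 * j] + a[2 * j + 1] * b[2 * j + 1]
        out.append(wrap_int32(s + pair))  # assumed: wrap on overflow, no saturation
    return out


# Inputs assumed to be already quantized to INT16; the scale factor is not modeled.
print(dot_int16_pairs([0, 100], [3, -4, 1000, 2000], [5, 6, 7, 8]))  # [-9, 23100]
```

In this model the extreme case of two products of -32768 × -32768 sums to 2^31, one more than a signed 32-bit value can hold, so it wraps; real hardware may handle this corner case differently.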

Assuming 1.5 GHz and 72 cores, as with Knights Landing, Knights Mill would theoretically reach the following peak performance: 13.8 instead of 6.9 teraflops in single precision (FP32), 1.7 instead of 3.5 teraflops in double precision (FP64), and 27.6 tera-operations per second at half precision (INT16).
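
These figures can be reproduced with a short back-of-the-envelope calculation. The per-core assumptions (two VPUs per core, 16 FP32 or 8 FP64 values per 512-bit vector, an FMA counted as two operations) are the usual Knights Landing accounting and are not stated in the article. The Knights Mill factors (FP32 doubled, FP64 halved) come from the article; "INT16 at quadruple speed" is interpreted here as four times Knights Landing's FP32 rate, which matches the 27.6 figure.

```python
# Hypothetical peak-throughput estimate for Knights Landing (KNL) vs. Knights Mill (KNM).
# Assumed (not from the article): 2 VPUs per core, 512-bit vectors, FMA = 2 operations.

CORES = 72
CLOCK_HZ = 1_500_000_000  # the article's assumed 1.5 GHz

VPUS_PER_CORE = 2
FP32_LANES = 512 // 32  # 16 single-precision values per vector
FP64_LANES = 512 // 64  # 8 double-precision values per vector
OPS_PER_FMA = 2         # one multiply plus one add


def tera(ops_per_core_per_clock: int) -> float:
    """Peak rate in tera-operations per second, rounded to one decimal place."""
    return round(CORES * CLOCK_HZ * ops_per_core_per_clock / 1e12, 1)


knl_fp32 = VPUS_PER_CORE * FP32_LANES * OPS_PER_FMA  # 64 ops per core per clock
knl_fp64 = VPUS_PER_CORE * FP64_LANES * OPS_PER_FMA  # 32 ops per core per clock

# Knights Mill ratios relative to Knights Landing, as described in the article.
knm_fp32 = 2 * knl_fp32   # FP32 doubled
knm_fp64 = knl_fp64 // 2  # FP64 halved
knm_int16 = 4 * knl_fp32  # INT16 at four times KNL's FP32 rate (interpretation)

peaks = {
    "KNL FP32": tera(knl_fp32),
    "KNL FP64": tera(knl_fp64),
    "KNM FP32": tera(knm_fp32),
    "KNM FP64": tera(knm_fp64),
    "KNM INT16": tera(knm_int16),
}

for name, value in peaks.items():
    print(f"{name}: {value}")
```

Running it prints 6.9, 3.5, 13.8, 1.7 and 27.6, matching the theoretical values above; real sustained throughput would be lower.
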
Apart from the cores, Knights Mill brings no innovations. The chip is paired with 16 GB of MCDRAM (a modified Hybrid Memory Cube) on the package. In its socketed version, the Xeon Phi accesses DDR4 memory via six channels and provides 36 PCIe Gen3 lanes.


