GeForce GTX 1080
GeForce GTX 1080 is built around the GP104 GPU, the second chip in the Pascal line by seniority. GP104 is almost half the size of GP100 in both transistor count and die area. If we take the Maxwell line as a reference, the new chip occupies an intermediate position between GM204, which NVIDIA uses in the GeForce GTX 970/980, and GM200 (GeForce GTX 980 Ti and GeForce GTX TITAN X), both in terms of the physical die parameters and in the number of CUDA cores and texture units. The back-end configuration of GP104 unmistakably marks it as a successor to GM204: it, too, is equipped with a 256-bit memory bus split between eight controllers, and 64 ROP units.
In terms of the layout of its computing units, the Pascal architecture as implemented in GP104 closely follows the principles laid down in Maxwell. All the computing logic is concentrated in structures called Graphics Processing Clusters (GPCs); there are four of them in this processor. Each GPC contains five Streaming Multiprocessors, each of which includes 128 CUDA cores, 8 texture units, and an L1 cache partition that has grown from 24 to 48 KB compared to Maxwell. Each GPC also includes a Raster Engine, which performs the initial stages of rendering: setting up triangle edges, projection, and culling invisible pixels.
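The per-cluster figures multiply out to the chip's totals; a quick sanity check in Python, with the hierarchy constants taken straight from the description above:

```python
# GP104 unit hierarchy as described above
gpcs = 4          # Graphics Processing Clusters
sms_per_gpc = 5   # Streaming Multiprocessors per GPC
cuda_per_sm = 128 # CUDA cores per SM
tex_per_sm = 8    # texture units per SM

cuda_cores = gpcs * sms_per_gpc * cuda_per_sm    # 2560
texture_units = gpcs * sms_per_gpc * tex_per_sm  # 160
print(cuda_cores, texture_units)  # 2560 160
```

These totals (2560 CUDA cores, 160 texture units) are indeed the headline figures of the GTX 1080.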
The main payoff of the 16 nm process technology here shows up in clock frequencies, which are far above those of the GeForce GTX 980: base clock 1607 MHz, Boost Clock 1733 MHz (and since the latter is the average frequency in typical applications, the GTX 1080 is capable of briefly ramping to even higher values).
The GP104 processor performs double-precision (FP64) calculations at 1/32 of the FP32 rate, following the second- and lower-tier chips of the Maxwell family in this respect. The Pascal architecture can also perform FP16 operations at twice the FP32 rate, whereas Maxwell executes them at the same speed. In terms of power consumption, the GeForce GTX 1080 belongs to the same class as the GeForce GTX 980: 180 W. Based on this figure and the declared TFLOPS performance of the GTX 980 and GTX 1080, we get a 63% gain in Pascal's energy efficiency over Maxwell. The RAM amounts to 8 GB of GDDR5X, a capacity previously the prerogative of AMD video cards based on the Hawaii GPU, which carries a 512-bit memory bus.
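The throughput ratios just quoted can be written as simple multipliers of the FP32 rate. A minimal sketch; the 8-TFLOPS baseline here is purely illustrative, not a figure from the text:

```python
# Relative arithmetic throughput on GP104, per the ratios above
rates = {
    "FP32": 1.0,      # baseline single precision
    "FP64": 1.0 / 32, # double precision runs at 1/32 of FP32
    "FP16": 2.0,      # half precision runs at twice the FP32 rate
}

fp32_tflops = 8.0  # hypothetical FP32 baseline for illustration only
for prec, mult in rates.items():
    print(f"{prec}: {fp32_tflops * mult:.2f} TFLOPS")
```

On Maxwell the FP16 multiplier would be 1.0 rather than 2.0, which is the architectural difference the paragraph describes.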
One of the key differences between GDDR5X and GDDR5 is the ability to transfer four bits of data per signal cycle (QDR, Quad Data Rate) rather than two (DDR, Double Data Rate), as was the case in all previous modifications of DDR SDRAM. The physical frequencies of the memory cores and of the data-transfer interface remain in roughly the same range as in GDDR5 chips.
Characteristics of the GeForce GTX 1080 (specification table: chip, frequencies, memory, interface and TDP)
And to keep the chips' increased bandwidth fed with data, GDDR5X doubles the data prefetch from 8n to 16n. With each chip's 32-bit interface, this means the controller fetches 64 bytes rather than 32 per memory access cycle. As a result, the interface bandwidth reaches 10-14 Gbit/s per pin at a CK (command clock) frequency of 1250-1750 MHz; this is the frequency shown by monitoring and overclocking utilities such as GPU-Z. At least, those are the figures written into the standard today, but in the future Micron plans to push the rate as high as 16 Gbit/s.
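These figures fit together arithmetically: the quoted CK range maps onto the 10-14 Gbit/s range at eight bits per pin per CK cycle, and multiplying across GP104's 256-bit bus gives the card's total bandwidth. A sketch, assuming the 8-bits-per-CK relationship implied by the numbers above:

```python
def gddr5x_bandwidth(ck_mhz: float, bus_bits: int = 256):
    """Per-pin data rate and total bus bandwidth for GDDR5X.

    GDDR5X moves 8 bits per pin per CK cycle, which matches the
    quoted 10 Gbit/s at CK = 1250 MHz and 14 Gbit/s at 1750 MHz.
    """
    per_pin_gbps = ck_mhz * 8 / 1000         # Gbit/s per pin
    total_gbs = per_pin_gbps * bus_bits / 8  # GB/s across the whole bus
    return per_pin_gbps, total_gbs

print(gddr5x_bandwidth(1250))  # (10.0, 320.0) — the GTX 1080's rated 320 GB/s
```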
The next advantage of GDDR5X is the increased chip density: from 8 to 16 Gbit. The GeForce GTX 1080 carries eight 8-Gbit chips, but in the future graphics card makers will be able to double the amount of RAM as denser chips become available. Like GDDR5, GDDR5X allows two chips to share one 32-bit controller in so-called clamshell mode, which as a result makes it possible to address up to 32 GB of memory on GP104's 256-bit bus.
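The capacity arithmetic is easy to verify: the number of 32-bit controllers, chips per controller, and chip density determine total VRAM. A sketch using the configurations mentioned above:

```python
def vram_gb(controllers: int = 8, chips_per_controller: int = 1,
            chip_gbit: int = 8) -> float:
    """Total VRAM in GB from chip count and density (Gbit -> GB)."""
    return controllers * chips_per_controller * chip_gbit / 8

print(vram_gb())  # 8.0 — the GTX 1080 as shipped: eight 8-Gbit chips
print(vram_gb(chips_per_controller=2, chip_gbit=16))  # 32.0 — clamshell with 16-Gbit chips
```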
The Maxwell architecture already gave NVIDIA the broadest lineup of GPUs on the market with support for the new rendering features of the DirectX 12 standard (feature level 12_1). Pascal adds several more capabilities to this arsenal, some of which also have potential in the VR field. Async Compute is one of the DirectX 12 features, previously characteristic only of AMD processors on the GCN architecture, which allows GPU resources to be allocated dynamically between graphics and compute workloads, so that resources freed when one task completes can immediately be thrown at the remaining one.
While AMD has switched to synchronization over the PCI Express bus in multi-GPU configurations, NVIDIA still uses a separate interface for SLI. It has, however, escaped public attention that at sufficiently high screen resolutions NVIDIA's individual GPUs also exchange some data over PCI Express. This suggests that SLI, in the form implemented in previous NVIDIA architectures, has already hit its bandwidth ceiling. As far as we know, that ceiling is 1 GB/s, which is no longer enough to exchange frames at a resolution of 3840x2160 and 60 Hz.
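The 1 GB/s figure is easy to put in context: shipping fully rendered frames between GPUs at 4K and 60 Hz needs roughly twice that. A sketch assuming 4 bytes per pixel (the pixel size is an assumption here, not a figure from the text):

```python
def frame_stream_gbs(width: int, height: int,
                     bytes_per_pixel: int = 4, fps: int = 60) -> float:
    """Bandwidth (GB/s) needed to ship rendered frames between GPUs."""
    return width * height * bytes_per_pixel * fps / 1e9

need = frame_stream_gbs(3840, 2160)
print(f"{need:.2f} GB/s")  # 1.99 GB/s — about double the ~1 GB/s SLI link
```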
But rather than switch entirely to PCI Express, NVIDIA reworked the existing interface in Pascal. Traditionally, its graphics cards carry two SLI connectors, which work simultaneously so that a GPU can communicate with its neighbors in triple or quad configurations, but only one channel is used for data transfer in a dual-GPU setup. Using both channels in a two-GPU tandem is the most obvious way to raise throughput, and that is exactly what Pascal does.
NVIDIA has also released a special bridge, available in several lengths, whose improved physical characteristics allow the interface to run at 650 MHz, up from the previous 400 MHz.