GeForce RTX 4070 Ti
The GeForce RTX 4070 Ti is based on the AD104 GPU. It features 7680 CUDA cores delivering 40 FP32 shader teraflops for graphics rendering; 240 fourth-generation Tensor Cores delivering 641 trillion sparse matrix operations per second for AI and DLSS processing; 60 third-generation Ada RT cores delivering 93 RT-TFLOPS for next-generation ray-traced graphics; and 12 GB of GDDR6X memory. Like all GeForce RTX 40 series GPUs, the RTX 4070 Ti includes Ada innovations such as Shader Execution Reordering (SER), a new optical flow engine, new RT cores and DLSS 3.
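The 40-teraflop shader figure follows from a simple peak-throughput calculation. The sketch below assumes each CUDA core completes one fused multiply-add (counted as two floating-point operations) per clock, and it uses a 2.61 GHz boost clock, a value taken from NVIDIA's published specification rather than from this article. The function name is illustrative only.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak FP32 TFLOPS = cores * 2 FLOPs per clock * clock rate.

    Assumption: every core retires one fused multiply-add (FMA) per clock,
    and an FMA counts as two floating-point operations (multiply + add).
    """
    flops_per_clock = cuda_cores * 2
    return flops_per_clock * boost_clock_ghz * 1e9 / 1e12


# 7680 CUDA cores come from the text; the 2.61 GHz boost clock is an assumption.
print(f"{peak_fp32_tflops(7680, 2.61):.1f} TFLOPS")  # -> 40.1 TFLOPS
```

Real workloads rarely issue an FMA on every core every cycle, so this is an upper bound rather than a measured rate.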
NVIDIA Ada Architecture
The NVIDIA Ada architecture is a giant leap in performance, with numerous improvements that make it NVIDIA's fastest and most advanced architecture. The RTX 4070 Ti is manufactured on TSMC's custom 4N process and contains 35.8 billion transistors and 7680 CUDA cores. Ada also brings hardware-accelerated ray tracing, fourth-generation Tensor Cores for improved AI performance, eighth-generation encoders with AV1 encoding and decoding support, and DLSS enhancements that deliver high frame rates both in competitive gaming and at ultra settings with ray tracing enabled.
NVIDIA Ada streaming multiprocessor
RTX graphics cards have three main processors. Programmable general-purpose CUDA cores run shaders and general-purpose CUDA applications. RT cores accelerate the calculation of intersections between rays and triangles or bounding volumes; in the Ada architecture, their ray-triangle intersection rate has been doubled. The third processor type is the AI processing pipeline, the Tensor Cores.
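To make "intersections of rays with bounding volumes" concrete, here is the classic slab test for a ray against an axis-aligned bounding box, the kind of box test that BVH traversal repeats many times per ray. This is a minimal software sketch with hypothetical names; RT cores perform such tests in fixed-function hardware, and NVIDIA does not document their exact algorithm.

```python
import math


def ray_aabb_hit(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) if the ray hits the box, else None.

    t_near may be negative when the ray origin lies inside the box.
    """
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: miss unless origin is between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None  # the per-axis intervals do not overlap
    if t_far < 0.0:
        return None  # box lies entirely behind the ray origin
    return t_near, t_far


# Ray along +z toward a unit box centred at the origin
print(ray_aabb_hit((0, 0, -5), (0, 0, 1), (-1, -1, -1), (1, 1, 1)))
```

A BVH stores such boxes in a tree, so a ray that misses a parent box can skip every triangle beneath it.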
Ada improves all three RTX processors
Programmable Shaders: 40 shader teraflops, compared with 21.7 teraflops on the RTX 3070 Ti. The Ada shader processor includes an important new technology called Shader Execution Reordering (SER), which reorders work on the fly and provides a substantial speedup for ray-tracing shaders. SER is as big an innovation for GPUs as out-of-order execution once was for CPUs.
4th Generation Tensor Cores: The new Ada Tensor Core includes the FP8 Transformer Engine from NVIDIA Hopper, delivering up to 641 tensor teraflops at FP8 precision on sparse matrices in the RTX 4070 Ti for AI training and inference, up from 174 sparse tensor teraflops on the RTX 3070 Ti. Compared with FP16, FP8 halves memory requirements and doubles AI performance.
Gen 3 RT Cores: The new Opacity Micromap Engine speeds up intersection testing on alpha-tested surfaces (surfaces whose transparency comes from a texture) by about 2x on average when developers use the feature, and the new Micro-Mesh Engine increases geometric detail without the cost of BVH building and storage. Ada's intersection-test throughput is 93 RT-TFLOPS, compared with 42.5 RT-TFLOPS on the RTX 3070 Ti.
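The reordering idea behind SER, described under Programmable Shaders above, can be illustrated with a toy model. Threads in a 32-wide warp run in lockstep, so a warp whose rays hit many different materials must execute several shaders one after another. The sketch below counts distinct shaders per warp before and after grouping hits by a shader ID. The 8-material scene, the sort-based grouping and all names are assumptions for illustration; real SER is performed on the GPU and is not a plain sort.

```python
import random

WARP_SIZE = 32  # threads that execute in lockstep on NVIDIA GPUs


def divergence(shader_ids, warp_size=WARP_SIZE):
    """Sum over warps of the number of distinct shaders each warp must run.

    A value equal to the number of warps means every warp is fully coherent.
    """
    return sum(len(set(shader_ids[i:i + warp_size]))
               for i in range(0, len(shader_ids), warp_size))


def reorder_by_shader(hits):
    """Hypothetical software analogue of SER: group hits that use the same shader."""
    return sorted(hits, key=lambda hit: hit["shader_id"])


random.seed(0)
# 1024 hypothetical ray hits, each landing on one of 8 materials/shaders
hits = [{"ray": i, "shader_id": random.randrange(8)} for i in range(1024)]

before = divergence([h["shader_id"] for h in hits])
after = divergence([h["shader_id"] for h in reorder_by_shader(hits)])
print(f"shader runs per 32 warps: before={before}, after={after}")
```

After grouping, each warp runs close to one shader, which is the coherence SER aims to recover for ray-tracing workloads.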
4th Generation Tensor Cores
Tensor Cores are specialized high-performance compute cores designed for matrix multiply-and-add operations, used in AI and high-performance computing applications. They deliver breakthrough performance for the matrix computations that are critical both for training multilayer neural networks and for inference on already trained networks. One example of an inference application is NVIDIA DLSS 3 for gamers, in which a dedicated neural network generates high-quality frames, all powered by NVIDIA Tensor Cores. DLSS has become so popular that more than 250 games already support it, letting gamers double their performance with one click. In addition, many creative applications have begun to use AI features to help artists create content faster and at higher quality. Today, more than 110 popular creative applications use acceleration on the Tensor and RT cores of RTX graphics cards, and exclusive NVIDIA applications such as Broadcast and Canvas offer noise removal, virtual backgrounds and many other AI-powered effects for video streaming and conferencing.
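The core operation named here is a matrix multiply-accumulate, D = A x B + C, on small tiles. The sketch below uses plain Python lists and hypothetical function names (it calls no real Tensor Core API) to show how a larger matrix product can be built by accumulating tile-sized MMA steps, roughly the way GEMM kernels feed Tensor Cores. The 4x4 tile is an arbitrary choice for readability, not the hardware's tile shape.

```python
def mma(a, b, c):
    """D = A x B + C for small row-major matrices (lists of lists).

    Stands in for the fused multiply-accumulate a Tensor Core does on one tile.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a) and len(c) == m
    return [[c[i][j] + sum(a[i][p] * b[p][j] for p in range(k))
             for j in range(n)]
            for i in range(m)]


def tiled_matmul(a, b, tile=4):
    """Compute A x B by accumulating tile x tile MMA steps.

    Assumes every matrix dimension is a multiple of `tile`.
    """
    m, n = len(a), len(b[0])
    k = len(b)
    d = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            acc = [row[j0:j0 + tile] for row in d[i0:i0 + tile]]  # output tile
            for p0 in range(0, k, tile):
                a_t = [row[p0:p0 + tile] for row in a[i0:i0 + tile]]
                b_t = [row[j0:j0 + tile] for row in b[p0:p0 + tile]]
                acc = mma(a_t, b_t, acc)  # accumulate one tile product
            for i in range(tile):
                d[i0 + i][j0:j0 + tile] = acc[i]
    return d


print(mma([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]]))
```

Each output tile stays in an accumulator while tiles of A and B stream through, which is why multiply-accumulate, rather than multiply alone, is the primitive.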
The fourth-generation Ada Tensor Core builds on the capabilities of the previous Ampere GPUs, which supported many new data types and added structured sparsity acceleration, doubling throughput over the preceding Turing Tensor Cores. Ada's Tensor Cores support the FP8 data format first introduced in the NVIDIA Hopper GPU architecture. Compared with FP16, FP8 halves storage requirements and doubles AI performance. With FP8 and sparsity, the GeForce RTX 4070 Ti delivers 641 TFLOPS for AI workloads.
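Structured sparsity here refers to the 2:4 pattern introduced with Ampere: in every group of four consecutive weights, at most two are nonzero, so the hardware can skip half of the multiplications. The sketch below applies simple magnitude-based 2:4 pruning and computes the storage saving of FP8 over FP16. The function names and the 7-billion-parameter model size are hypothetical, and in practice a network is usually fine-tuned after pruning to recover accuracy, a step omitted here.

```python
def prune_2_of_4(weights):
    """2:4 structured sparsity: in each group of 4 consecutive weights,
    keep the 2 with the largest magnitude and set the other 2 to zero."""
    assert len(weights) % 4 == 0, "length must be a multiple of 4"
    pruned = []
    for g in range(0, len(weights), 4):
        group = weights[g:g + 4]
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


def storage_bytes(num_values, bits_per_value):
    """Bytes needed to store num_values values at the given bit width."""
    return num_values * bits_per_value // 8


weights = [0.1, -0.9, 0.5, 0.2, 3.0, -4.0, 1.0, 0.0]
print(prune_2_of_4(weights))  # half of each group of four is zeroed

params = 7_000_000_000  # hypothetical model size
print(storage_bytes(params, 16) / 1e9, "GB in FP16")
print(storage_bytes(params, 8) / 1e9, "GB in FP8")
```

Halving the nonzero operands (sparsity) and halving the bits per operand (FP8 versus FP16) are independent factors, which is how the two features combine in the headline figure.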
3rd Generation RT Cores
Ada's third-generation RT cores are specialized hardware units that accelerate BVH (bounding volume hierarchy) traversal and ray-triangle intersection calculations, which are critical to ray-tracing performance. The RT cores in RTX graphics cards work independently: they perform all BVH traversal and intersection calculations themselves, offloading the streaming multiprocessors (SMs) and freeing their CUDA cores for other tasks such as pixel shading, vertex shading and general-purpose computation.
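The ray-triangle test that RT cores accelerate can be written in software with the well-known Möller-Trumbore algorithm. The sketch below is a plain Python version for illustration only, with hypothetical helper names; NVIDIA does not publish the exact method its hardware uses.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore test: return distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:  # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None  # ignore hits behind the origin


print(ray_triangle((0, 0, -5), (0, 0, 1), (-1, -1, 0), (1, -1, 0), (0, 1, 0)))
```

In a full tracer this test runs only on triangles in BVH leaves that survive the box tests, which is exactly the work the RT cores take off the SMs.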
Ada's RT cores perform ray-triangle intersection tests up to 2x faster than NVIDIA Ampere GPUs, allowing developers to add more detail to their virtual worlds. They also include new Opacity Micromap Engine blocks that speed up tracing of alpha-tested geometry by up to 2x, which helps in tracing-intensive scenes with foliage and particle effects. In addition, the new RT cores include Displaced Micro-Mesh Engine blocks, which generate micro-meshes on the fly to create additional geometry.
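Without opacity micromaps, every ray hit on alpha-tested geometry such as a leaf must run a shader that samples the texture's alpha channel. An opacity micromap stores a precomputed opacity state for each micro-triangle, so only hits on partially covered micro-triangles still need that shader. The sketch below is a simplified three-state model (the Vulkan VK_EXT_opacity_micromap extension, for example, defines 2-state and 4-state formats) with hypothetical names and made-up alpha values.

```python
OPAQUE, TRANSPARENT, UNKNOWN = "opaque", "transparent", "unknown"


def classify_micro_triangle(alpha_samples, cutoff=0.5):
    """Precompute an opacity state from the alpha texels a micro-triangle covers."""
    if all(a >= cutoff for a in alpha_samples):
        return OPAQUE
    if all(a < cutoff for a in alpha_samples):
        return TRANSPARENT
    return UNKNOWN  # mixed coverage: still needs the alpha-test shader


def any_hit_shader_calls(hit_micro_triangles, states):
    """Count ray hits that cannot be resolved from the precomputed states."""
    return sum(1 for tri in hit_micro_triangles if states[tri] == UNKNOWN)


# Hypothetical alpha samples for four micro-triangles of a leaf quad
texels = [[1.0, 0.95, 0.9], [0.0, 0.02, 0.1], [0.0, 0.8, 1.0], [0.7, 0.6, 0.99]]
states = [classify_micro_triangle(t) for t in texels]
hits = [0, 1, 2, 3, 2, 0, 1]  # micro-triangle index hit by each ray
print(states, any_hit_shader_calls(hits, states), "of", len(hits), "hits need the shader")
```

Hits on fully opaque or fully transparent micro-triangles are resolved during traversal, which is where the speedup for foliage-heavy scenes comes from.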
Together, these ray-tracing technologies give the Ada architecture plenty of headroom for the future. As new games adopt Ada's features, RTX 40 series cards should become faster still and pull further ahead of the previous-generation RTX 30 series. A recent example is Portal with RTX, a remaster built on RTX Remix, in which NVIDIA uses new Ada features such as the OMM and SER engines (both of which can be disabled in the settings). Together they allow the RTX 4090 to be up to 3 times faster than the RTX 3080 Ti without DLSS, and with the DLSS 3 frame generator the advantage grows to as much as 5 times.