
GeForce 4 Ti 4200 architecture features

GeForce 4 Ti - evolution of GeForce 3 Ti

Key architectural innovations of the NV25 (compared to NV20)

Two independent display controllers (CRTC), allowing two independent frame buffers - each with its own resolution and content - to be output to any available display device.
Two full-fledged 350 MHz RAMDAC integrated into the chip (with a 10-bit palette).
TV-Out interface integrated into the chip.
TMDS transmitter integrated into the chip (for the DVI interface).
Two vertex shader units (instruction decode and execution). These promise a significant speed-up when processing scenes with complex geometry. The units cannot run different shader microcode; the sole purpose of the duplication is to process two vertices simultaneously, i.e. to increase performance.
Improved shading pipelines provide hardware support for pixel shaders up to and including version 1.3.
According to NVIDIA, the effective fill rate in MSAA modes has been increased: the 2x AA and Quincunx AA modes now cause a significantly smaller performance hit. Quincunx AA has been slightly improved (the sample positions have been shifted), and a new AA mode, 4xS, has been added.
Improved split caching system (4 separate caches for geometry, textures, frame buffer and Z buffer).
Advanced lossless Z compression (4:1) and fast Z-buffer clearing.
Improved algorithm for rejecting invisible surfaces (Z Cull HSR).

To summarize this list: the changes are evolutionary rather than revolutionary compared to NVIDIA's previous design (NV20). This is not surprising - historically, NVIDIA has first offered a product introducing many new technologies and then, while that product was on the market, released a more advanced (optimized) version based on it, eliminating the shortcomings that had drawn the most attention.

Block diagram of the NV25


According to testing carried out after the cards' release, the GeForce4 Ti turned out to be noticeably faster than the GeForce3 Ti. This impressive performance gap was achieved not through some fundamentally new technology, but through further debugging and tuning of technologies already present in the GeForce3 (NV20). Remarkably, the GeForce4 core was only about 5% larger than the NV20 core on the same 0.15-micron process.


nfiniteFX II Vertex Shaders

Where the GeForce3 had a single vertex shader unit, the GeForce4 Ti has two. This is unlikely to come as a surprise, since NVIDIA's chip for the Microsoft Xbox also contains two vertex shader units; in the NV25, however, the units were improved.

Obviously, two parallel vertex shader units can process more vertices per unit of time. To achieve this, the chip itself splits the vertex stream into two threads, so the new mechanism is transparent to applications and APIs. The NV25 handles instruction dispatch and must ensure that each vertex shader unit works on its own vertex. Improvements to the vertex shader units since the GeForce3 have also reduced instruction latency.
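The dispatch scheme described above can be sketched as follows. This is an illustrative model, not NVIDIA's actual scheduling logic: vertices are dealt out round-robin to two identical units running the same program, so the split stays invisible to the application.

```python
def dispatch(vertices, shader, units=2):
    """Round-robin vertices across identical shader units,
    then reassemble the results in their original order."""
    lanes = [[] for _ in range(units)]
    for i, v in enumerate(vertices):
        lanes[i % units].append((i, v))   # each lane models one vertex unit
    out = [None] * len(vertices)
    for lane in lanes:
        for i, v in lane:
            out[i] = shader(v)            # both units run the same microcode
    return out

# usage: a toy "shader" that doubles every coordinate
result = dispatch([(1, 2), (3, 4), (5, 6)],
                  lambda v: tuple(2 * c for c in v))
```

The key property the sketch demonstrates is that output order matches input order, which is what makes the duplication transparent to the API.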

As a result, the GeForce4 Ti 4600 could process approximately three times as many vertices as the GeForce3 Ti 500, thanks to its two vertex shader units, their internal improvements, and a higher clock frequency.

nfiniteFX II Pixel Shaders

nVidia also improved the pixel shader functionality in the GeForce4 Ti.
The new chip supports pixel shaders 1.2 and 1.3, but not version 1.4, which ATi introduced with the Radeon 8500.

Below are the new pixel shader modes.
OFFSET_PROJECTIVE_TEXTURE_2D_NV
OFFSET_PROJECTIVE_TEXTURE_2D_SCALE_NV
OFFSET_PROJECTIVE_TEXTURE_RECTANGLE_NV
OFFSET_PROJECTIVE_TEXTURE_RECTANGLE_SCALE_NV
OFFSET_HILO_TEXTURE_2D_NV
OFFSET_HILO_TEXTURE_RECTANGLE_NV
OFFSET_HILO_PROJECTIVE_TEXTURE_2D_NV
OFFSET_HILO_PROJECTIVE_TEXTURE_RECTANGLE_NV
DEPENDENT_HILO_TEXTURE_2D_NV
DEPENDENT_RGB_TEXTURE_3D_NV
DEPENDENT_RGB_TEXTURE_CUBE_MAP_NV
DOT_PRODUCT_TEXTURE_1D_NV
DOT_PRODUCT_PASS_THROUGH_NV
DOT_PRODUCT_AFFINE_DEPTH_REPLACE_NV

We will not describe each new mode, but it is worth noting that the GeForce4 Ti introduced support for z-correct bump mapping. This eliminates the artifacts that appear where a bump-mapped surface meets other geometry - for example, where the water of a lake or river meets the shore.

nVidia was also able to improve the pixel shader pipeline, which noticeably sped up the rendering of scenes with three or four textures per pixel.


Accuview - improved anti-aliasing

At the time of the GeForce3's release, nVidia announced HRAA - high-resolution anti-aliasing based on multi-sample full-screen anti-aliasing. The GeForce4 introduced Accuview anti-aliasing, essentially an improved multi-sample anti-aliasing in terms of both quality and performance.

nVidia shifted the sample positions, which improves anti-aliasing quality by accumulating fewer errors, especially with Quincunx anti-aliasing. nVidia released documentation on this procedure, but it explained little. A new filtering step is applied each time samples are combined into the final anti-aliased frame; it saves one full write to the frame buffer, which significantly improves anti-aliasing performance.
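A resolve pass in the Quincunx style can be sketched as below. The weights are an assumption based on the commonly cited pattern (center sample 1/2, four diagonal neighbours 1/8 each), not taken from NVIDIA documentation:

```python
# Assumed quincunx filter kernel: center 1/2, four diagonals 1/8 each.
QUINCUNX_WEIGHTS = {(0, 0): 0.5,
                    (-1, -1): 0.125, (1, -1): 0.125,
                    (-1, 1): 0.125, (1, 1): 0.125}

def resolve(samples, x, y):
    """Weighted blend of a pixel's own sample and four diagonal samples."""
    acc = 0.0
    for (dx, dy), w in QUINCUNX_WEIGHTS.items():
        # clamp at the frame edge so border pixels still normalize to 1
        sx = min(max(x + dx, 0), len(samples[0]) - 1)
        sy = min(max(y + dy, 0), len(samples) - 1)
        acc += w * samples[sy][sx]
    return acc
```

Since the weights sum to 1, a region of constant color resolves to itself; edges get a blur that suppresses aliasing at the cost of some sharpness.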


LMA II - new memory architecture

It was thanks to improvements in the memory architecture that the GeForce4 Ti showed such a strong lead over the GeForce3.

In the GeForce3 and GeForce4, the memory controller is divided into four independent controllers, each with its own dedicated 32-bit DDR bus; all memory requests are distributed among these controllers.

In LMA II, almost every component has been improved. The four caches deserve attention, although caching is by no means an exclusive feature of the GeForce - the Radeon 8500 had similar caches. In general, caching in graphics chips received much less attention than caches in CPUs, since the caches were not that large. The reason is clear: graphics chips at the time ran slower than their memory buses, while central processors ran 2-16 times faster than memory, so the cache played a far more important role there.

Crossbar memory controller
The GeForce3 already had this controller, which allowed 64-bit and 128-bit transfers alongside regular 256-bit ones, greatly improving effective throughput. In LMA II, nVidia improved the load-balancing algorithms across memory sections and modernized the priority scheme.
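The benefit of the crossbar can be sketched with a toy routing function. The addressing scheme here (simple word interleaving across the four controllers) is a hypothetical illustration, not the NV25's actual mapping:

```python
def route(addr_bytes, length_bytes, word=4, controllers=4):
    """Split an access into 32-bit words and route each word to its
    controller by address interleaving. Returns one queue per controller."""
    queues = [[] for _ in range(controllers)]
    for a in range(addr_bytes, addr_bytes + length_bytes, word):
        queues[(a // word) % controllers].append(a)
    return queues

# a full 128-bit access busies all four controllers for one word each;
# a narrow 64-bit access occupies only two, leaving the rest free
full = route(0, 16)
narrow = route(0, 8)
```

The point of the sketch: with a single monolithic controller, the narrow access would still tie up the whole bus; with the crossbar, the idle controllers can service other requests in parallel.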

Visibility subsystem - discarding occluded pixels
This technology already existed in the GeForce3, but was improved in the NV25 to perform more accurate pixel rejection while using less memory bandwidth. Rejection is performed using a special on-chip cache, which reduces accesses to the graphics card's external memory. As Anandtech's research showed, the GeForce4 rejected occluded pixels 25% more effectively than the GeForce3 at the same clock speed.
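The idea of coarse on-chip Z rejection can be sketched as below. This is an illustrative model, not the disclosed hardware algorithm: a small cache keeps, per screen tile, the farthest depth that could still be visible, and a fragment behind that bound is discarded without touching external memory.

```python
class ZCullCache:
    """Per-tile conservative depth bound; smaller z = nearer."""
    def __init__(self, clear_z=1.0):
        self.clear_z = clear_z
        self.tile_max = {}            # tile coord -> farthest surviving depth

    def set_tile_max(self, tile, z):
        # updated when a tile is known to be fully covered at depth <= z
        self.tile_max[tile] = z

    def maybe_visible(self, tile, frag_z):
        """False means the fragment is provably occluded and can be culled
        without any external memory access."""
        return frag_z <= self.tile_max.get(tile, self.clear_z)
```

Note the conservative direction: the cache may let an occluded fragment through (costing a full Z test later), but it never culls a visible one.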

Lossless Z-buffer compression
Again, this feature existed in the GeForce3, but thanks to the new compression algorithm in LMA II, the full 4:1 compression was achieved more often.
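One plausible way such lossless tile compression can work is delta encoding (an assumption for illustration; NVIDIA did not disclose its scheme). Depth varies almost linearly across a triangle, so a tile often compresses to one anchor value plus tiny per-pixel deltas; a tile that does not fit is stored raw, keeping the scheme lossless:

```python
def compress_tile(zs, delta_bits=6):
    """Try to encode a tile of depth values as anchor + small deltas.
    Falls back to raw storage, so the scheme stays lossless."""
    anchor = zs[0]
    deltas = [z - anchor for z in zs]
    limit = 1 << (delta_bits - 1)          # signed delta range
    if all(-limit <= d < limit for d in deltas):
        return ("compressed", anchor, deltas)  # ~4:1 vs full-width values
    return ("raw", zs)

def decompress_tile(encoded):
    if encoded[0] == "compressed":
        _, anchor, deltas = encoded
        return [anchor + d for d in deltas]
    return list(encoded[1])
```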

Vertex cache
Stores vertices after they have been sent over AGP. The cache improves AGP utilization because it avoids transferring duplicate vertices (for example, when primitives share edges).
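The saving can be sketched with a small index-keyed cache (the FIFO eviction and the cache size are assumptions for illustration). Two triangles sharing an edge reference six indices but only four distinct vertices, so only four fetches cross the bus:

```python
def fetch_with_cache(indices, fetch, cache_size=16):
    """Return (vertices in index order, number of actual bus fetches)."""
    cache, fifo, fetched, out = {}, [], 0, []
    for i in indices:
        if i not in cache:
            if len(cache) >= cache_size:
                cache.pop(fifo.pop(0))    # evict oldest (FIFO sketch)
            cache[i] = fetch(i)
            fifo.append(i)
            fetched += 1                  # a real AGP transfer happened
        out.append(cache[i])
    return out, fetched

# two triangles sharing edge (1, 2): six references, four distinct vertices
verts, fetches = fetch_with_cache([0, 1, 2, 2, 1, 3], lambda i: i)
```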

Primitive cache
Accumulates processed primitives (after the vertex shader) into fundamental primitives for transfer to the triangle setup unit.

Dual texture cache
This cache already existed on the GeForce3. The new algorithms work better with multitexturing or high-quality filtering; thanks to this, the GeForce4 Ti shows significantly improved performance when applying three or four textures.

Pixel cache
The cache sits at the end of the rendering pipeline, much like the write buffers in Intel/AMD processors: it accumulates a certain number of pixels and then writes them to memory in a batch.
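Write batching of this kind can be sketched as below (an illustrative model; the batch size and flush policy are assumptions). Many small writes are collected and emitted as a few burst transactions, which use the DDR bus far more efficiently:

```python
class PixelWriteBuffer:
    """Collect pixel writes and flush them to memory in bursts."""
    def __init__(self, memory, batch=8):
        self.memory, self.batch = memory, batch
        self.pending = []
        self.bursts = 0                   # count of burst transactions

    def write(self, addr, color):
        self.pending.append((addr, color))
        if len(self.pending) >= self.batch:
            self.flush()

    def flush(self):
        if self.pending:
            for addr, color in self.pending:  # one burst transaction
                self.memory[addr] = color
            self.bursts += 1
            self.pending = []
```

Sixteen single-pixel writes become just two bursts, instead of sixteen individual memory transactions.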

Automatic precharge
Before a memory bank can be read, it must be precharged, which introduces delays. The GeForce4 Ti could precharge banks ahead of time using a special prediction algorithm.

Quick Z-clearing (Z-clear)
This feature had been known for some time and had already been used in other chips; fast Z-clearing first appeared in an ATi Radeon chip. Instead of actually filling a section of the Z-buffer with the clear value, the chip simply sets a flag for that section, which saves memory bandwidth.
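The flag mechanism can be sketched as below (an illustrative model with per-tile flags; tile granularity and bookkeeping are assumptions). Clearing touches only the flags, not the pixel data, so it is O(tiles) rather than O(pixels):

```python
class ZBuffer:
    """Z-buffer with fast clear: a per-tile flag stands in for the data."""
    def __init__(self, tiles, clear_z=1.0):
        self.clear_z = clear_z
        self.cleared = {t: True for t in range(tiles)}  # flags, not data
        self.data = {}

    def clear(self):
        for t in self.cleared:            # O(tiles), no pixel writes
            self.cleared[t] = True

    def read(self, tile, offset):
        if self.cleared[tile]:
            return self.clear_z           # flag substitutes for the data
        return self.data.get((tile, offset), self.clear_z)

    def write(self, tile, offset, z):
        self.cleared[tile] = False        # tile now holds real data
        self.data[(tile, offset)] = z
```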


Characteristics of NVIDIA GeForce 4 Ti 4200

Name GeForce 4 Ti 4200
Core NV25
Process technology (µm) 0.15
Transistors (millions) 63
Core frequency (MHz) 250
Memory frequency (MHz, DDR) 222 (444)
Bus and memory type DDR, 128-bit
Bandwidth (GB/s) 7.1
Pixel pipelines 4
TMUs per pipeline 2
Textures per clock 8
Textures per pass 4
Vertex pipelines 2
Pixel Shaders 1.3
Vertex Shaders 1.1
Fill Rate (Mpix/s) 1000
Fill Rate (Mtex/s) 2000
DirectX 8.0
Anti-Aliasing (max) MS 4x
Anisotropic Filtering (max) 8x
Memory Capacity 64 / 128 MB
Interface AGP 4x
RAMDAC 2x 350 MHz

The GeForce4 Ti 4200 is a cut-down version of the GeForce4 Ti 4600 and 4400 cards: it had lower clock frequencies, but was also much cheaper.
In many ways, the GeForce4 Ti 4200 can be considered a potential "gravedigger" for the GeForce3 Ti 500 line. If the Ti 4200, combining high performance with a low price, had been released simultaneously with the more expensive GeForce4 Ti 4600 and 4400 models, the situation would clearly not have favored the latter. NVIDIA therefore delayed the release of the Ti 4200 until sales of the GeForce3 line had declined significantly.

Screenshot: Mafia