Yes, I finally purchased the EVGA GeForce RTX 3090 K|NGP|N. This is a genuinely hand-picked graphics card for overclocking, with the core boost clock set to 1920 MHz. That is the highest boost clock of any GeForce RTX 3090, and the card boasts the highest performance of them all (side by side with the GALAX HOF OC Lab Edition). As you know, semiconductors are in short supply worldwide at the moment, so it was very difficult to obtain this top-of-the-line graphics card.
This graphics card is equipped with a 360 mm AIO water cooler for the core and an air cooler for the other components. When I tried overclocking, this cooling setup performed better than I had imagined, and I think it needs no modification for basic overclocking. The only thing that needs changing is the vBIOS*.
*You should flash a modified vBIOS to lift the power limit. The default power limit tops out at 450 W, but a modified vBIOS raises it to 1000 W. Modified vBIOSes and instructions for flashing a vBIOS to NVIDIA graphics cards are easy to find on various websites.
I put together a PC for validation. I already have a PC, but it isn’t suitable for validation because its hardline loop makes swapping any part very difficult. So I had wanted to build an environment dedicated to validating PC parts. A validation PC needs parts that are easy to swap and a simple configuration.
Memory: [G.Skill] F4-3600C14Q-32GTZN (using only 2 modules)
M/B: [ASUS] ROG Crosshair VIII Hero (Wi-fi)
GPU: [MSI] Geforce GT1030 2G LP OC
SSD: [Plextor] PX-1TM8VC+
PSU: [CoolerMaster] V1200 Platinum
Cooler: [EVGA] CLC 360 Liquid CPU Cooler
Case: [Streacom] BC1 Open Bench Table
Thermal paste: [Shinwa Sangyo] OC Master SMZ-01R
I overclocked the CPU and memory easily in this environment and ran some benchmarks to validate performance and durability (stability under load). In conclusion, durability is good, but there is room for improvement in performance. I’ll keep tuning this environment and try to achieve high performance without sacrificing durability for daily use.
I lined up early in the morning in Akihabara and purchased a Ryzen 9 5950X.
I’ve already built a hard-loop PC with the Ryzen 9 3950X, so for the time being I plan to use this processor for validation.
But I don’t have enough parts to build another PC, so I’ll have to wait before using this processor. I’m sad. That said, I’m looking forward to examining the Ryzen 9 5950X, because I heard that Zen 3 processors can reach a higher Infinity Fabric clock than Zen 2, even though they carry the same I/O die in terms of process node.
The lot number is “2037 SUS.” This means the processor was assembled in China (= SU) and the I/O die was manufactured in Saratoga (= S). For the former, the alternative is Malaysia (= PG); for the latter, the alternative is Texas (= T). In other words, there are four combinations of production sites: SUT / SUS / PGT / PGS.
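The suffix decoding described above can be sketched in a few lines of Python. This is only an illustration of the mapping given in this post: the function name `decode_lot_suffix` and the two lookup tables are hypothetical, and the leading "2037" part of the lot number is deliberately left uninterpreted.

```python
# Site codes as described in this post (not an official AMD reference).
ASSEMBLY_SITES = {"SU": "China", "PG": "Malaysia"}
IO_DIE_SITES = {"S": "Saratoga", "T": "Texas"}


def decode_lot_suffix(lot_number):
    """Return (assembly site, I/O die site) for a lot number such as '2037 SUS'."""
    suffix = lot_number.split()[-1]
    assembly_code, io_die_code = suffix[:2], suffix[2:]
    if assembly_code not in ASSEMBLY_SITES or io_die_code not in IO_DIE_SITES:
        raise ValueError(f"unknown lot suffix: {suffix!r}")
    return ASSEMBLY_SITES[assembly_code], IO_DIE_SITES[io_die_code]


print(decode_lot_suffix("2037 SUS"))
```

Any suffix outside the four combinations listed above raises a `ValueError` rather than guessing.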
I purchased DRAM from G.Skill: the F4-3600C16D-32GTZN kit. I already have the F4-3600C14Q-32GTZN, but those are single-rank modules. I want to compare four single-rank modules against two dual-rank modules at the same memory capacity, the same memory frequency, and similar timings. In addition, the F4-3600C16D-32GTZN (as a two-module configuration) is on the QVL of the ASUS Crosshair VIII Hero (Wi-fi) that I use in my PC, so I can run it with relatively little worry. The F4-3600C16D-32GTZN uses Samsung B-die and runs at the lowest timings (16-16-16-36) available for dual-rank memory at 3600 MHz. I’m looking forward to tuning it, because the XMP profile runs at 1.35 V, which leaves headroom up to the voltage limit of Samsung B-die.
In this article, I will write about DRAM timings. DRAM plays an important role in how the Zen architecture operates, so DRAM tuning is necessary if you want to extract better performance from Ryzen. DRAM tuning mainly focuses on memory frequency and DRAM timings (memory latency). On Zen 2, the optimal memory clock is already settled (3733 MHz or 3800 MHz), so tuning the DRAM timings is what matters. To tune them well, you should understand to some extent what they are. So in this article I will introduce some background on DRAM timings and write down the guidelines I follow when setting them.
DRAM timings are divided into primary, secondary, and tertiary timings. Tuning usually goes only as far as the secondary timings (DRAM Calculator for Ryzen also covers only up to the secondary timings). Therefore, this article covers the primary and secondary timings.
Knowledge about Operation of SDRAM for Understanding the Primary Timings
The primary timings consist of tRCD, tRP, and tRAS, plus tCL (CAS latency), which is the most important DRAM timing (CR is often included as well). Here I will outline the primary timings while introducing the basic operation of DRAM, specifically SDRAM including DDR4.
SDRAM (Synchronous Dynamic Random Access Memory) operates on a clock signal and synchronizes with the clock of the memory controller (on the CPU), the system bus (the transmission path connecting the chipset and the CPU), and so on. DDR SDRAM (Double Data Rate SDRAM), including DDR4, achieves twice the bandwidth per clock of SDR SDRAM (Single Data Rate SDRAM, which transfers data only on the rising edge of the clock): the CPU distributes read/write work across two or more memory banks, and each bank transfers data on both the rising and the falling edge of the clock. In addition, the next command can be issued without waiting for the previous command to finish (for example, a pre command, described later, can be issued before all memory banks have finished reading data). Furthermore, when DRAM selects a cell to read or write, it selects the row first and the column second. A feature called burst mode then selects the following columns automatically once the first column has been chosen, which saves time by omitting the commands that would select the second and subsequent columns.
To understand the DRAM timings, you need to know some of the basic commands in DRAM operation.
CS (chip select): It selects which memory chip is accessed.
act (activate): It selects a bank address and a row address, which opens the row.
read/write: It selects a column address and reads or writes data. It can also include an auto precharge.
pre (precharge): It closes the row (a row stays open until a precharge command is received), terminates all operations, and returns the memory chip to the standby state.
ref (refresh): The capacitors used in DRAM are volatile (they leak electric charge unless rewritten), so data is lost if they are left untouched. To prevent this, the precharged memory chip must be rewritten (refreshed) periodically.
Figure: A very simplified diagram of DDR SDRAM operation
Figure: A very simplified diagram of SDRAM state transitions
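The act, read/write, pre, and ref commands described above can be pictured as a tiny state machine for a single bank. The following Python sketch is a heavy simplification for illustration only: the two states, the `TRANSITIONS` table, and the function `run_commands` are hypothetical names, and real SDRAM has many more states and timing constraints.

```python
from enum import Enum


class BankState(Enum):
    IDLE = "idle"      # precharged, no row open
    ACTIVE = "active"  # a row is open


# (current state, command) -> next state; anything missing is treated as illegal.
TRANSITIONS = {
    (BankState.IDLE, "act"): BankState.ACTIVE,     # open a row
    (BankState.IDLE, "ref"): BankState.IDLE,       # refresh needs a precharged bank
    (BankState.IDLE, "pre"): BankState.IDLE,       # precharging an idle bank changes nothing
    (BankState.ACTIVE, "read"): BankState.ACTIVE,  # the row stays open
    (BankState.ACTIVE, "write"): BankState.ACTIVE,
    (BankState.ACTIVE, "pre"): BankState.IDLE,     # close the row
}


def run_commands(commands, state=BankState.IDLE):
    """Apply a sequence of commands to one bank and return its final state."""
    for command in commands:
        key = (state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"{command!r} is not allowed while the bank is {state.value}")
        state = TRANSITIONS[key]
    return state


print(run_commands(["act", "read", "write", "pre", "ref"]))
```

Issuing a read before any act command raises an error in this model, which mirrors the rule that a row must be opened before a column can be selected.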
Meaning of the Primary Timings
The primary timings are the most basic and important DRAM timings. Because there are only a few of them, the basic tuning method is to start from the DRAM’s XMP profile and lower the values one at a time while balancing them against the DRAM voltage.
tCL or tCAS (Column Address Strobe Latency): This means the number of clocks from when a column is selected by a read/write command to when the data starts being read or written. DRAM read/write transactions come in three types depending on the state of the bank (page): Page-Hit Access (the target row is already open, so no act command is required); Page-Empty Access (the target bank is closed, so an act command is required); and Page-Miss Access (the wrong row is open, so the open row must be closed and a different row opened, which requires a pre command followed by an act command). tCL is part of the latency in all three cases, so it greatly affects DRAM performance. It is therefore important to prioritize reducing tCL.
tRAS (Row Address Strobe Latency, also known as Row Active Time): This means the minimum number of clocks required from selecting a row with an act command to issuing a pre command. The target row stays open during column selection and data reading/writing, so tRAS spans the period up to the precharge command.
tRCD (Row Address to Column Address Delay): This means the minimum number of clocks required from selecting a row with an act command to issuing a read/write command. There are two kinds of tRCD: tRCDWR applies when a write command is issued, and tRCDRD applies when a read command is issued.
tRP (Row Precharge Time): This means the minimum number of clocks required to complete a precharge.
CR (Command Rate): This means the number of consecutive clocks for which the memory controller presents a command to a memory bank. CR is written as 1T or 2T (often 1N or 2N), meaning one clock and two clocks respectively. With 1T, each command is presented for one clock (so a one-clock delay occurs); with 2T, each command is presented for two clocks (so a two-clock delay occurs). Because CR adds delay to every command, it significantly influences DRAM performance, so you should prefer 1T (1N).
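To see why tCL appears in every access type, here is a rough Python sketch of the read latency for the three page states described under tCL. It uses the common first-order approximation (an assumption, not something measured here) that a page hit costs tCL, a page-empty access costs tRCD + tCL, and a page miss costs tRP + tRCD + tCL. The function name `access_latency_clocks` is hypothetical, and command rate and other overheads are ignored.

```python
def access_latency_clocks(page_state, tCL, tRCD, tRP):
    """First-order read latency in clocks for one access, by page state."""
    if page_state == "hit":      # row already open: read only
        return tCL
    if page_state == "empty":    # bank closed: act, then read
        return tRCD + tCL
    if page_state == "miss":     # wrong row open: pre, act, then read
        return tRP + tRCD + tCL
    raise ValueError(f"unknown page state: {page_state!r}")


# Example with 16-16-16 primary timings (tCL-tRCD-tRP).
for state in ("hit", "empty", "miss"):
    print(state, access_latency_clocks(state, tCL=16, tRCD=16, tRP=16))
```

With 16-16-16 timings this gives 16, 32, and 48 clocks, so lowering tCL by one clock helps all three cases, while tRCD and tRP only help the slower two.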
The latencies of SDRAM commands (DRAM timings) are expressed in clocks, because SDRAM operates in sync with a clock signal. For example, tCL 2 means that the latency between selecting a column and reading/writing data is 2 clocks. tCL can also be expressed in actual time, which is called true latency and is calculated as follows.
True Latency (ns) = tCL * 2000 / memory frequency (MHz)
Here the memory frequency is the effective DDR rate printed on the DRAM (for example, 3600), so the actual clock is half of it.
When comparing DRAM with different memory frequencies, the same number of clocks at Column Address Strobe corresponds to a different amount of time. So if you want to compare the performance of DRAM with different memory frequencies and DRAM timings, you need to look at True Latency. A higher memory frequency and lower DRAM timings mean better performance, and therefore the DRAM with the smaller True Latency is the better one.
For Zen 2, the recommended memory frequency is 3733 MHz or 3800 MHz, and it is fine either to overclock DRAM rated at 3200 MHz or to downclock DRAM rated at 4000 MHz. True Latency is helpful when choosing DRAM to use with a Zen 2 processor. For example, for (I) F4-3200C14-8gGFX (3200 MHz, CL14), (II) F4-3600C15-8gGTZ (3600 MHz, CL15), and (III) F4-4000C17-8GTZR (4000 MHz, CL17), the true latencies are (I) 8.75 ns, (II) 8.33 ns, and (III) 8.5 ns, so the performance when operating at XMP is (II) > (III) > (I).
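The comparison of the three kits above can be reproduced with a short Python sketch. It assumes the usual relation for DDR memory, true latency in ns = tCL * 2000 / effective memory frequency, which matches the three values quoted above. The function name `true_latency_ns` and the `kits` dictionary are only illustrative.

```python
def true_latency_ns(cl, memory_frequency_mhz):
    """True latency in nanoseconds for a given CL and effective DDR frequency."""
    # The actual clock is half the effective frequency, hence 2000 instead of 1000.
    return cl * 2000 / memory_frequency_mhz


# (effective memory frequency in MHz, CL) for the three kits compared above.
kits = {
    "I": (3200, 14),
    "II": (3600, 15),
    "III": (4000, 17),
}

# Smaller true latency is better, so sort ascending.
ranked = sorted(kits, key=lambda name: true_latency_ns(kits[name][1], kits[name][0]))

for name in ranked:
    frequency, cl = kits[name]
    print(f"({name}) {true_latency_ns(cl, frequency):.2f} ns")
```

The sorted order comes out as (II), (III), (I), the same ranking as in the text.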
Meaning of the Secondary Timings and Standards for Setting Them
tRC (Row Cycle Time): This means the minimum number of clocks from opening a row to completing its precharge (one row cycle). tRC = tRP + tRAS.
tRRD (RAS to RAS Delay): This is the number of clocks required between two consecutive RAS signals (RAS is asserted between the act command and the pre command). There are two kinds of tRRD: tRRDS applies between different memory bank groups, and tRRDL applies within the same memory bank group. The values I often use are: tRRDS -> 4; tRRDL -> 6.
tFAW (Four Activate Window): DRAM can activate up to four memory banks (and rows) per memory rank in quick succession, and tFAW is the window those four activations must fit in before the next memory bank can be opened. tFAW = 4 * tRRDS.
tWTR (Write to Read Delay): This means the number of clocks required after a write command has completed successfully before a read command can be issued. There are two kinds of tWTR: tWTRS applies between different memory bank groups, and tWTRL applies within the same memory bank group. The values I often use are: tWTRS -> 6; tWTRL -> 8.
tWR (Write Recovery Time): This means the number of clocks required after a write command has completed successfully before a pre command can be issued. The value I often use is: tWR -> 12.
tRFC (Refresh Cycle Time): This means the number of clocks required from when a ref command is issued to when the next act command can be issued. tRFC = n * tRC, n = 6 or 7. One of the strengths of Samsung B-die is that tRFC can be reduced significantly, so it is best to target less than 300 when tuning DRAM that uses it. (tRFC has two other variants, but they aren’t used on Ryzen: tRFC2 = tRFC / 1.346 (in double frequency mode); tRFC4 = tRFC2 / 1.625 (in quad frequency mode).)
tCWL (CAS Write Latency): This is the number of clocks required from activating a column of the DRAM to executing the write command. In practice it is set equal to tCL. tCWL = tCL.
tRTP (Read to Precharge Delay): This means the number of clocks required from issuing a read command to issuing a pre command in the same memory rank. tRTP = tWR / 2.
tCKE (Clock Enable Time): This means the number of clocks required to issue a CKE signal. DRAM commands are accepted only while the CKE signal is high (that is, commands are not read while the CKE signal is low). Therefore, when PowerDownMode (the function that drives the CKE signal low) is off, the value of tCKE has no effect on DRAM operation, because the CKE signal never goes low.
tRDRD (Read to Read Delay): This means the number of clocks required from when a read command is issued to when the next read command can be issued. There are four kinds of tRDRD: tRDRDSc applies between different memory bank groups; tRDRDScL applies within the same memory bank group; tRDRDSd applies between different memory ranks; tRDRDDd applies between different DRAM modules (DIMMs). The values I often use are: tRDRDSc -> 1; tRDRDScL -> 4; tRDRDSd -> 4; tRDRDDd -> 4.
tWRWR (Write to Write Delay): This means the number of clocks required from when a write command is issued to when the next write command can be issued. There are four kinds of tWRWR: tWRWRSc applies between different memory bank groups; tWRWRScL applies within the same memory bank group; tWRWRSd applies between different memory ranks; tWRWRDd applies between different DRAM modules (DIMMs). The values I often use are: tWRWRSc -> 1; tWRWRScL -> 4; tWRWRSd -> 6; tWRWRDd -> 6.
tRDWR (Read Write Command Spacing): This means the number of clocks required from when a read command is issued to when the next write command can be issued in the same memory rank. The value I often use is: tRDWR -> 8.
tWRRD (Write Read Command Spacing): This means the number of clocks required from when a write command is issued to when the next read command can be issued in the same memory rank. The value I often use is: tWRRD -> 2.
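Several of the secondary timings above follow directly from other values through the rules of thumb in this list (tRC = tRP + tRAS, tFAW = 4 * tRRDS, tRFC = n * tRC, tCWL = tCL, tRTP = tWR / 2). As a minimal sketch, the following Python function applies those rules to a set of primary timings. The function name `derive_secondary_timings` and its parameter names are hypothetical, the defaults are simply the values I often use, and the results are starting points to be verified with stability tests rather than guaranteed-stable settings.

```python
def derive_secondary_timings(tCL, tRP, tRAS, tRRDS=4, tWR=12, rfc_multiplier=6):
    """Apply the rules of thumb from this article to get starting values."""
    if rfc_multiplier not in (6, 7):
        raise ValueError("this article uses n = 6 or 7 for tRFC = n * tRC")
    tRC = tRP + tRAS
    return {
        "tRC": tRC,                     # tRC = tRP + tRAS
        "tFAW": 4 * tRRDS,              # tFAW = 4 * tRRDS
        "tRFC": rfc_multiplier * tRC,   # tRFC = n * tRC
        "tCWL": tCL,                    # tCWL = tCL
        "tRTP": tWR // 2,               # tRTP = tWR / 2
    }


# Example: the XMP primaries 16-16-16-36 (tCL-tRCD-tRP-tRAS); tRCD is not needed here.
for name, value in derive_secondary_timings(tCL=16, tRP=16, tRAS=36).items():
    print(f"{name} -> {value}")
```

For these inputs the sketch gives tRC 52, tFAW 16, tRFC 312, tCWL 16, and tRTP 6. Tighter primary timings would bring tRFC under the 300 target mentioned for Samsung B-die.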