Birentech, a small enterprise based in Shanghai, China, has released the country's most powerful General-Purpose GPU, the Biren BR100.
China Makes Its Most Powerful General-Purpose GPU To Date, The Birentech BR100 With 77 Billion Transistors
The Birentech BR100 is China's flagship general-purpose GPU, built on an in-house GPU architecture, manufactured on a 7nm process node, and packing 77 billion transistors. The chip uses TSMC's 2.5D CoWoS packaging and comes with 300 MB of on-chip cache, 64 GB of HBM2e with a quoted memory bandwidth of 2.3 TB/s (the specification table below lists 1.64 TB/s), and PCIe Gen 5.0 with support for the CXL interconnect protocol.



During the announcement, Birentech disclosed various performance metrics for the chip. It offers up to 2,048 TOPS (INT8), 1,024 TFLOPS (BF16), 512 TFLOPS (TF32+), and 256 TFLOPS (FP32). Based on these figures, the chip looks faster than the NVIDIA Ampere A100, at least on paper, while the Hopper H100 GPU offers roughly 2x to 2.5x the performance in the same metrics. The chip also supports 64-channel video encoding and 512-channel video decoding.
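
The quoted peak figures follow a simple pattern: each step to a lower-precision format doubles the throughput. A minimal sketch in Python, using only the numbers quoted above (the variable names are ours, purely for illustration):

```python
# Peak throughput figures quoted for the BR100 (TFLOPS, or TOPS for INT8),
# listed from highest precision to lowest.
br100_peak = {"FP32": 256, "TF32+": 512, "BF16": 1024, "INT8": 2048}

formats = list(br100_peak)
# Ratio of each format's peak to the peak of the format listed just before it.
step_ratios = {
    f"{lower}/{higher}": br100_peak[lower] / br100_peak[higher]
    for higher, lower in zip(formats, formats[1:])
}

for pair, ratio in step_ratios.items():
    print(f"{pair}: {ratio:.1f}x")
```

These are peak, on-paper numbers, so the ratios say nothing about sustained performance in real workloads.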

What's interesting is that the BR100 isn't far behind the NVIDIA H100 in overall transistor count. The H100 features 80 billion transistors on the newer N4 process node, whereas the BR100 is only 3 billion transistors behind while using a 7nm process node. Fitting nearly as many transistors on the less dense node implies a much larger total die area.


| Biren BR100 | |
|---|---|
| Process | 7nm |
| System interface, bandwidth, interconnection protocol | PCIe 5.0 x16, 128GB/s, supports CXL |
| FP32 TFLOPS (peak) | 256 |
| TF32+ TFLOPS (peak) | 512 |
| BF16 TFLOPS (peak) | 1,024 |
| INT8 TOPS (peak) | 2,048 |
| Memory capacity, interface bit width, bandwidth | 64GB HBM2E; 4,096bit, 1.64TB/s |
| Interconnection | 512GB/s BLink™, supports 8 x8 ports |
| Secure virtual instance | Up to 8 instances |
| Video codec (FHD@30fps) | 64-channel HEVC/H.264 encoding, 512-channel HEVC/H.264 decoding |
| TDP | 550W |
| Product form | OAM module |
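
A few of the table's bandwidth figures can be sanity-checked with simple division. The sketch below is our own back-of-the-envelope arithmetic, not anything Birentech has published: `per_pin_gbps` is a hypothetical helper, 1.64 TB/s is taken as 1,640 GB/s, and the BLink figure is assumed to split evenly across its 8 ports.

```python
def per_pin_gbps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Implied data rate per pin in Gbit/s: total bits per second over pin count."""
    return bandwidth_gb_s * 8 / bus_width_bits

# Figures from the BR100 table (1.64 TB/s taken as 1,640 GB/s).
hbm_rate = per_pin_gbps(1640, 4096)

# Assumption: the 512 GB/s BLink bandwidth is shared evenly by the 8 ports.
blink_per_port = 512 / 8

# Quoted PCIe bandwidth divided by lane count (the quoted 128 GB/s is
# most likely the total for both directions combined).
pcie_per_lane = 128 / 16

print(f"HBM2E per-pin rate: {hbm_rate:.2f} Gbit/s")
print(f"BLink per port: {blink_per_port:.0f} GB/s")
print(f"PCIe per lane: {pcie_per_lane:.0f} GB/s")
```

The implied memory data rate of about 3.2 Gbit/s per pin is in the range HBM2E parts are rated for, so the 1.64 TB/s figure is consistent with the 4,096-bit interface.
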

The Biren BR100 isn't the only chip that the China-based company has announced. There's also the Biren BR104, which offers half the peak performance of the BR100. Full details on the second chip are still limited, but unlike the BR100, which uses a chiplet design, the BR104 is a monolithic die and comes in a standard PCIe form factor with a TDP of 300W.

| Biren BR104 | |
|---|---|
| Process | 7nm |
| System interface, bandwidth, interconnection protocol | PCIe 5.0 x16, 128GB/s, supports CXL |
| FP32 TFLOPS (peak) | 128 |
| TF32+ TFLOPS (peak) | 256 |
| BF16 TFLOPS (peak) | 512 |
| INT8 TOPS (peak) | 1,024 |
| Memory capacity, interface bit width, bandwidth | 32GB HBM2E; 2,048bit, 819GB/s |
| Interconnection | 192GB/s BLink™, supports 3 x8 ports |
| Secure virtual instance | Up to 4 instances |
| Video codec (FHD@30fps) | 32-channel HEVC/H.264 encoding, 256-channel HEVC/H.264 decoding |
| TDP | 300W |
| Product form | Full-height full-length, dual-slot PCIe card |
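
To see how closely the BR104 tracks "half a BR100", the two spec tables can be compared directly. This is a rough sketch using only values from the tables in this article; the dictionary layout and key names are ours, and the per-watt figure is a paper calculation from peak throughput and TDP, not a measurement.

```python
# Spec values copied from the two tables in this article.
br100 = {"FP32 TFLOPS": 256, "INT8 TOPS": 2048, "Memory GB": 64,
         "Memory GB/s": 1640, "BLink GB/s": 512, "TDP W": 550}
br104 = {"FP32 TFLOPS": 128, "INT8 TOPS": 1024, "Memory GB": 32,
         "Memory GB/s": 819, "BLink GB/s": 192, "TDP W": 300}

# BR104 value as a fraction of the BR100 value, per spec line.
ratios = {key: br104[key] / br100[key] for key in br100}
for key, ratio in ratios.items():
    print(f"{key}: BR104 is {ratio:.0%} of BR100")

# Peak INT8 throughput per watt of TDP (on paper only).
int8_per_watt = {
    name: spec["INT8 TOPS"] / spec["TDP W"]
    for name, spec in (("BR100", br100), ("BR104", br104))
}
for name, value in int8_per_watt.items():
    print(f"{name}: {value:.2f} TOPS/W")
```

Compute and memory land at half, while the BLink bandwidth and TDP do not scale by the same factor, which is why the per-watt numbers differ between the two parts.
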

The company states that a chip with 77 billion transistors can mimic the nerve cells of the human brain. The chip itself is aimed at DNN and AI workloads, so it is positioned to reduce China's dependence on NVIDIA's AI GPUs.

Pictures shown during the event reveal that the GPU will come on an OAM form factor board and will use the company's own tower-style passive cooler.










