Apple M1 ULTRA plays "dual-chip fusion"

At its spring event on March 9, Apple announced a number of new products, the most eye-catching of which was its top flagship chip, the M1 ULTRA. So, as the last chip in the M1 series, what does the M1 ULTRA have to offer?

1 M1 ULTRA = 2 M1 Max

Summed up in one sentence, the M1 ULTRA is two M1 Max dies combined into a single chip through packaging.

The M1 Max incorporates a breakthrough die-to-die interconnect, so it can be scaled up to the M1 ULTRA: two M1 Max dies joined by UltraFusion, Apple's custom packaging architecture, for up to twice the performance. UltraFusion uses a silicon interposer to connect the dies across more than 10,000 signals, providing 2.5TB/s of low-latency inter-processor bandwidth between the two dies.
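To put those two figures in proportion, here is a back-of-the-envelope sketch of the average data rate each signal would have to carry. It assumes exactly 10,000 signals, all carrying data, with the bandwidth split evenly and 1 TB taken as 10^12 bytes; none of these details come from Apple, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope: average data rate each interposer signal must carry.
# Assumptions (not from Apple): exactly 10,000 signals, all carrying data,
# bandwidth split evenly, and 1 TB = 10**12 bytes.

TOTAL_BANDWIDTH_BYTES_PER_S = 2.5e12  # 2.5 TB/s, the figure quoted in the article
SIGNAL_COUNT = 10_000                 # "more than 10,000", so a lower bound


def per_signal_gbit_per_s(total_bytes_per_s: float, signals: int) -> float:
    """Average rate per signal in Gbit/s under an even split."""
    bits_per_s = total_bytes_per_s * 8
    return bits_per_s / signals / 1e9


rate = per_signal_gbit_per_s(TOTAL_BANDWIDTH_BYTES_PER_S, SIGNAL_COUNT)
print(f"{rate:.1f} Gbit/s per signal")  # prints "2.0 Gbit/s per signal"
```

Under these assumptions each signal only needs a modest rate; the headline bandwidth comes from the sheer number of parallel connections that a silicon interposer makes possible.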

Judging from the information released so far, Apple had this "dual-chip fusion" scheme in mind as early as the development of the M1 Max and reserved space in the chip layout for a future die-to-die connection. So what is the benefit of doing this?

Performance boost

Apple already uses TSMC's 5nm process, the most advanced in the industry, and TSMC's 3nm process is still on the way. So what can Apple do if it wants to launch a more powerful chip? There are two options:

The first is to design a larger chip.

The second is to combine existing chips, that is, to use two at a time.

Readers familiar with the semiconductor industry will know how expensive 5nm chips are; designing an even larger one would be hard to justify on cost, even for Apple. That leaves the second option: combining two or more existing chips.

If you want to connect two chips together, the current mainstream practice in the industry is to connect them through the motherboard PCB.

For example, this ASUS WS C621E SAGE is a dual-socket motherboard, designed from the outset to run two CPUs at the same time.

However, the disadvantages of this approach are also obvious. The two CPU sockets and the wiring between them take up a large area of the PCB, so the product ends up large. And because the two CPUs are connected by PCB traces, the latency between them becomes very high.
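As a rough illustration of why trace length matters, the sketch below estimates one-way signal propagation time over a board trace and over a very short in-package link. The trace lengths (15 cm and 1 mm) and the dielectric constant are assumptions chosen for illustration, and real-world latency also includes serialization and protocol overhead that this simple model ignores.

```python
# Rough time-of-flight estimate for a signal on a board trace vs. an in-package link.
# All numbers are illustrative assumptions, not measurements.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
RELATIVE_PERMITTIVITY = 4.0  # assumed dielectric constant, used for both cases


def propagation_delay_ns(length_m: float, eps_r: float = RELATIVE_PERMITTIVITY) -> float:
    """One-way propagation delay in nanoseconds for a trace of the given length."""
    velocity = SPEED_OF_LIGHT_M_PER_S / eps_r ** 0.5
    return length_m / velocity * 1e9


board_trace = propagation_delay_ns(0.15)    # assumed 15 cm between two sockets
package_link = propagation_delay_ns(0.001)  # assumed 1 mm between adjacent dies

print(f"board trace:  {board_trace:.3f} ns")
print(f"package link: {package_link:.4f} ns")
print(f"ratio:        {board_trace / package_link:.0f}x")
```

In this model the delay scales linearly with length, so shortening the connection by a factor of 150 cuts the time of flight by the same factor.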

It is not hard to see that the shortcomings of connecting two CPUs through the motherboard PCB mostly come from the length of the connection. So can they be solved by shortening the distance between the two chips? This is where the "glue dual-core" comes in.

A "glue dual-core" is a processor in which two or more chips are packaged together using a special method. Because this method resembles gluing two or more cores together, it earned the name "glue dual-core".

The "glue dual-core" technology dates back to Intel's Pentium Pro in 1995, but it is by no means a relic of the past.

For example, AMD's first-generation EPYC processor uses this so-called "glue dual-core" technology. As the picture clearly shows, it packages 4 dies (bare chips) into one CPU. Such a 4-die product is undoubtedly more powerful than a 1-die product built from the same die.

In a delidding video of AMD's first-generation EPYC processor, we can see a millimeter-scale gap between the dies that is visible to the naked eye. Apple has taken this one step further and packaged the two dies "face-to-face", making the connection between them even shorter. And thanks to the interface reserved in the original M1 Max design, the two dies can communicate at higher speed, ultimately achieving a connection bandwidth of 2.5TB/s between them.

Save costs

Everyone knows that taping out a chip is very expensive, but what is really expensive is the up-front cost. Once a chip enters mass production, the marginal cost is relatively low. Making a chip is like making a plastic washbasin: the production line is very expensive to set up, but once it is built, the materials needed for each additional washbasin cost very little.

If you then want to produce a plastic washbasin in a different size, you need to modify the production line or even build a new one, which is expensive.

For Apple, then, designing a separate, larger chip that is still built on 5nm would be very costly. Apple's final choice was therefore to package two M1 Max dies together, so that the new product can keep using the existing M1 Max production line and only needs a special packaging step for the bare dies at the end.

It is like a manufacturer of plastic washbasins that used to pack one washbasin per carton and now packs two. All it needs to do is order a larger carton.
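The washbasin argument can be sketched as a toy cost model: a one-off fixed cost amortized over the production volume, plus a marginal cost per unit. Every number below is a made-up placeholder in arbitrary units, and the assumption that a double-sized die costs twice as much per unit to produce is a simplification; only the structure of the comparison matters.

```python
# Toy cost model: a one-off fixed cost spread over the volume, plus a per-unit cost.
# All figures are made-up placeholders in arbitrary units.


def cost_per_unit(fixed_cost: float, marginal_cost: float, volume: int) -> float:
    """Average cost of one chip once the fixed cost is amortized over `volume` units."""
    return fixed_cost / volume + marginal_cost


FIXED_COST_PER_DESIGN = 1_000_000  # hypothetical design, tape-out and line setup
MARGINAL_COST_MAX_DIE = 100        # hypothetical per-die production cost
EXTRA_PACKAGING_COST = 50          # hypothetical cost of joining two dies
ULTRA_VOLUME = 10_000              # hypothetical number of large chips wanted

# Option A: a separate large die needs its own design and production setup.
# Its per-die cost is assumed to be twice that of the small die.
separate_design = cost_per_unit(
    FIXED_COST_PER_DESIGN, 2 * MARGINAL_COST_MAX_DIE, ULTRA_VOLUME
)

# Option B: reuse the existing die. Its fixed cost is already paid,
# so only two dies plus the special packaging are charged.
reuse_two_dies = cost_per_unit(
    0, 2 * MARGINAL_COST_MAX_DIE + EXTRA_PACKAGING_COST, ULTRA_VOLUME
)

print(f"separate large die: {separate_design:.0f} per chip")  # prints 300
print(f"two reused dies:    {reuse_two_dies:.0f} per chip")   # prints 250
```

With these placeholder numbers the reused-die option wins, and the gap widens as the production volume of the large chip shrinks, because the fixed cost is spread over fewer units.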

The other consideration is die yield: the larger the die area, the lower the yield.

Suppose a small wafer can hold at most 4 M1 Max dies. As shown in the figure, when one "dead spot" appears on the wafer, only 3 M1 Max dies can be produced, for a yield of 75%. When there are two "dead spots" on different dies, only 2 M1 Max dies can be produced, for a yield of 50%. But what would happen if a die as large as the M1 ULTRA were produced directly?

In that case, the same wafer can hold at most 2 M1 ULTRA dies. As shown in the figure, when there is one "dead spot" on the wafer, only 1 M1 ULTRA can be produced, for a yield of 50%. When there are two "dead spots" on different dies, no M1 ULTRA can be produced at all, and the yield is 0%.

This shows that, all else being equal, the larger the die area, the lower the yield. Conversely, if a large chip is split into two smaller chips, the yield improves, and a higher yield means lower cost.
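The wafer example above can be reproduced with a few lines of Python. This is a minimal sketch that treats the wafer as a one-dimensional strip divided into equal dies, where any defect ruins the die it lands on; the defect positions are hypothetical and chosen so that the two defects fall on different dies, as in the example.

```python
# Count good dies on a wafer when each defect ruins the die it lands on.
# Mirrors the 4-die vs. 2-die example above; defect positions are hypothetical.


def die_index(defect_position: float, dies_per_wafer: int) -> int:
    """Map a defect position in [0, 1) across the wafer to the die it falls on."""
    return int(defect_position * dies_per_wafer)


def yield_rate(dies_per_wafer: int, defective_dies: set[int]) -> float:
    """Fraction of dies that are usable, given the indices of dies hit by a defect."""
    good = dies_per_wafer - len(defective_dies)
    return good / dies_per_wafer


defects = [0.10, 0.60]  # two defects at assumed positions on the wafer

for name, dies in (("M1 Max", 4), ("M1 ULTRA-sized die", 2)):
    hit = {die_index(pos, dies) for pos in defects}
    print(f"{name}: {yield_rate(dies, hit):.0%} yield")
# prints:
# M1 Max: 50% yield
# M1 ULTRA-sized die: 0% yield
```

The same two defects cost half of the small dies but all of the large ones, which is the effect described above.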


To close, here are a few thoughts and some additional information.

  • As a fabless company, Apple can do nothing about bottlenecks in semiconductor process development. It can only wait for foundries such as TSMC and Samsung to develop more advanced processes.
  • Once process development hits a bottleneck, Apple's approach to improving chip performance is simple, blunt, and effective.
  • In essence, die-to-die technology splits one large die (bare chip) into several smaller dies, which is cheaper to manufacture.
  • The industry as a whole is now trending toward this kind of design: what would have been one large chip is split into several smaller chips for design and manufacture.
  • The technology is not perfect. For one thing, it adds to the heat dissipation burden. When two CPUs are connected through the motherboard PCB, heat dissipation is much less of a problem, since the spacing is large enough for each CPU to have its own heat sink.
  • My prediction: the real-world performance of the M1 ULTRA will not be twice that of the M1 Max, but it should exceed 1.5 times. Apple does have a precedent of "cooling an i9 with a single heat pipe", so how it handles heat dissipation here remains to be seen once actual machines are released.
