Questions about Apple Display Engines

theorist9

Site Champ
What determines how rapidly a Mac can respond graphically to user input (scrolling through long files, Mission Control, window tiling, scrolling through Finder's Gallery View, etc.)? Is it the GPU, the display engine, or both?

For instance, I've read that some Macs can feel sluggish on their last supported OS, because that OS may have graphical features that overtax the device's capabilities, such that it's recommended to turn off optional graphical features (e.g., image transparency). Which of the two components above would determine the device's performance in this area?

As a concrete example: Let's assume the external display engines Apple uses in the M3 Pro and M3 Max have the same capabilities (both can drive up to 6k@60Hz over TB, and 8k@60Hz or 4k@240Hz over HDMI). Suppose it's 2028 and you've just installed macOS 18 on your M3 Pro, and you find that its graphical responsiveness feels a bit sluggish. If that responsiveness is determined principally by the display engine, then an M3 Max would suffer from the same issues. But if the GPU plays an important role, then the M3 Max should perform better.

Here's another example: Suppose I'm using Finder's preview (the "eye" icon) while scrolling through the files on my desktop, in order to find a screenshot I took (and didn't name). My iMac can't update the previews as fast as I can scroll, since it can't switch between images quickly enough. Same thing if I instead scroll through them using Finder's Gallery View. Is this limitation due to the display engine or the GPU?
 

mr_roboto

Site Champ
UI responsiveness has nothing to do with the display engine.

I'm not a real expert on it, but the Apple Silicon pixel pipeline looks something like this:

1. Applications use CPU and/or GPU cores to write pixel values into window buffers
2. WindowServer (part of macOS) uses the GPU to composite all visible window buffers into a frame buffer
3. Display engine reads from frame buffer to refresh the display

Window and frame buffers are just chunks of RAM.

The fanciness of the graphics computed in #1 doesn't scale the difficulty of tasks #2 and #3; those are mainly about schlepping pixel values computed by somebody else to the display. There is some computation involved in #2, since it does window-level transparency effects (such as alpha blending the drop shadows surrounding each window), but the amount of work per unit of window area rarely goes up. (The notable exceptions: when Retina displays quadrupled the pixel count per unit area, and when Apple decided to greatly expand the use of transparency + blur effects to let the colors behind a window bleed through a bit. After some backlash they had to limit that, tone down what remained, and give us a switch to turn it off, so it's possible current macOS does less of that out of the box than versions from several years ago.)
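To make step #2 concrete, here's a minimal sketch of the per-pixel work, assuming premultiplied alpha and window buffers the same size as the frame buffer. The types and loop are invented for illustration, not WindowServer's actual code:

```swift
// One pixel: premultiplied alpha, linear color (illustrative type only).
struct Pixel {
    var r, g, b, a: Float
}

// Porter-Duff "over": lay a window's pixel on top of whatever is already
// in the frame buffer (another window, the desktop, ...).
func over(_ src: Pixel, _ dst: Pixel) -> Pixel {
    let k = 1 - src.a
    return Pixel(r: src.r + dst.r * k,
                 g: src.g + dst.g * k,
                 b: src.b + dst.b * k,
                 a: src.a + dst.a * k)
}

// Blend every visible window buffer, back to front, into the frame buffer
// the display engine will later scan out. Note the cost depends on window
// area, not on how expensive the windows' contents were to draw.
func composite(windows: [[Pixel]], into frame: inout [Pixel]) {
    for window in windows {
        for i in frame.indices {
            frame[i] = over(window[i], frame[i])
        }
    }
}
```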

If you ever ran into problems with the display engine (#3) not keeping up, you wouldn't experience display lag. Display refresh is a hard-realtime task; if the display engine falls behind you'll get to see wrong pixel values (possibly random garbage). For this reason, Apple's display engines should have the highest memory read QoS in the system, or close to it.
 

jbailey

Power User
Apple calls their display engine the DCP (display coprocessor), according to the Asahi Linux developers. Their work is the best place to get info on this component.

From their 2021 progress report:
One of the biggest challenges for Asahi Linux is making the M1’s GPU work. But what most people think of as a “GPU” is actually two completely distinct pieces of hardware: the GPU proper, which is in charge of rendering frames in memory, and the display controller, which is in charge of sending those rendered frames from memory to the display.

And
before we can use the GPU to render anything, we need a way to put it on the screen! Up until now, we’ve been using the firmware-provided framebuffer, which is just an area of memory where we can write pixels to be shown on the screen, but this won’t cut it for a real desktop. We need features such as displaying new frames without tearing, support for hardware sprites such as the mouse cursor, switching resolutions and configuring multiple outputs, and more. This is the job of the display controller.

On most mobile SoCs, the display controller is just a piece of hardware with simple registers. While this is true on the M1 as well, Apple decided to give it a twist. They added a coprocessor to the display engine (called DCP), which runs its own firmware (initialized by the system bootloader), and moved most of the display driver into the coprocessor. But instead of doing it at a natural driver boundary… they took half of their macOS C++ driver, moved it into the DCP, and created a remote procedure call interface so that each half can call methods on C++ objects on the other CPU! Talk about overcomplicating things…

Basically, Apple's DCP is a complicated beast, but it seems well protected from obsolescence, since the resolution and framerate limits are part of the driver firmware (which can be updated) rather than hard-wired.
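Just to illustrate the split-driver shape the quote describes: the AP-side half ends up looking like an ordinary driver object, but each method is really a remote procedure call into the firmware half. This is a hypothetical Swift sketch; every name in it is invented, and the real interface is a large C++ method surface:

```swift
// Calls the AP-side half can make on objects living in DCP firmware.
enum DCPCall: Codable {
    case setPowerState(on: Bool)
    case swapBegin(bufferID: UInt32)
}

// Transport abstraction, e.g. a shared-memory mailbox plus a doorbell IRQ.
protocol DCPChannel {
    func send(_ call: DCPCall)
}

// The AP-side proxy: ordinary-looking method calls become RPCs into the
// coprocessor, which is what makes the driver feel "split in half".
struct DCPProxy {
    let channel: DCPChannel

    func setPowerState(on: Bool) {
        channel.send(.setPowerState(on: on))
    }

    func swapBegin(bufferID: UInt32) {
        channel.send(.swapBegin(bufferID: bufferID))
    }
}
```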
 

leman

Site Champ
As @mr_roboto says, display engines are responsible for sending the final image to the display. No matter how complex your image is, the amount of work for display engines does not change. The only time when I imagine this step having performance issues is if your system is memory bandwidth starved and cannot supply the frame buffer data quickly enough.
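For a sense of scale, here's a rough back-of-envelope of the bandwidth that final scan-out step costs (blanking ignored; the panel numbers are just an example):

```swift
// Scan-out read bandwidth for a 6016x3384 "6K" panel at 60 Hz, 4 bytes/pixel.
let width = 6016.0, height = 3384.0
let bytesPerPixel = 4.0, refreshHz = 60.0

let bytesPerFrame = width * height * bytesPerPixel   // ≈ 81 MB per frame
let gbPerSecond = bytesPerFrame * refreshHz / 1e9    // ≈ 4.9 GB/s

print(gbPerSecond)  // the display engine must sustain this read rate, always
```

That ≈5 GB/s is a small fraction of the memory bandwidth on any Apple Silicon chip, which is why this stage only hurts when the system is already bandwidth starved.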

Regarding your question, there are multiple pieces to the puzzle. I don't think one can confidently state that component X is responsible without thorough profiling of the entire graphical pipeline. Just some potential causes for slowdowns:

- GPU not being able to process graphical commands quickly enough
- Algorithmic complexity (e.g. a layout computation system that cannot handle a large number of UI elements)
- Software design (e.g. if resizing is expensive for some window due to some custom computation, trying to continuously resize and render might be choppy; see the sketch after this list)
- Animation design (certain styles of transitions and/or animations can appear sluggish under certain circumstances)

etc...
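Here's that toy sketch of the software-design case: an expensive synchronous recomputation during live resize blocks every frame, no matter how much GPU or display-engine headroom there is. The class and method names are hypothetical:

```swift
import AppKit

final class ChartView: NSView {
    // Called on every live-resize step; anything slow here drops frames.
    override func layout() {
        super.layout()
        recomputeEverything()  // expensive synchronous CPU work on the
                               // main thread makes resizing feel choppy
    }

    private func recomputeEverything() {
        // imagine re-laying-out thousands of data points here
    }
}
```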
 

theorist9

Site Champ
Thanks everyone for your explanations.
Regarding your question, there are multiple pieces to the puzzle.
That certainly seems to apply to my last example (the rate at which images can change while scrolling through a directory using Gallery View).

As a test, I used a directory containing 109 JPEG photos (510.3 MB total => 4.68 MB/pic)

I put Finder into Gallery View, held down the arrow key, and counted how many different photos flashed up on the screen as I scanned through the entire directory. The test was repeated under four separate conditions. In all cases the scanning speed remained the same: 9 s to scan through the directory (12 photos/s).

On my 218 ppi Retina display, I only saw ≈30 different photos (which means it was able to display new images at a rate of 3.3 photos/s). When I switched to my 94 ppi WUXGA display (which requires only 1/5 the pixels/image), my system was able to display ≈50 different photos (5.6 photos/s). This suggests a GPU limitation.

As I was doing this, I noticed (from Activity Monitor) that disk read rates increased markedly, indicating (as makes sense) that the computer needed to access the SSD to read the files and generate a Gallery preview image. To see whether disk access speed was also a limiting factor, I repeated the test after copying the directory to a RAM disk. I found the number of photos/s increased, which suggests disk access times may also play a role:
Retina display: ≈40 photos (4.4 photos/s); WUXGA display: ≈60 photos (6.7 photos/s)

[Note: I repeated each of the four tests four times, and got consistent results.]
 

theorist9

Site Champ
Another display engine question: Based on Apple's specs, the M2 and M3 Pro MBPs can either drive two lower-bandwidth displays (2 x 6k@60, or 1 x 6k@60 + 1 x 4k@144) or a single higher-bandwidth display (8k@60 or 4k@240). And we see a similar one-display decrease when the M2 and M3 Max MBPs are driving a higher-bandwidth display.

Consider the M3 Pro. That chip has two external display engines (plus an internal display engine for the laptop screen) ( https://techboards.net/threads/m3-core-counts-and-performance.4282/page-21#post-149581 ). Why does it lose its ability to drive a 2nd external display when it's driving an 8k@60 or a 4k@240? These are the only two possibilities I can think of, but I have no idea if either one applies:

1) It's bridging those two display engines together in order to drive that single display.

2) It needs to (and is somehow able to) internally reallocate PCIe lanes to the HDMI port when it drives a higher-bandwidth display, thus leaving it without enough I/O for a 2nd external monitor.

Here's the display support verbiage for the M3 Pro and Max MBPs, from Apple's tech specs; that for the M2 is identical:
[Attached screenshot: display-support section of Apple's MacBook Pro tech specs]
 

Nycturne

Elite Member
As I was doing this, I noticed (from Activity Monitor) that disk read rates increased markedly, indicating (as makes sense) that the computer needed to access the SSD to read the files and generate a Gallery preview image. To see whether disk access speed was also a limiting factor, I repeated the test after copying the directory to a RAM disk. I found the number of photos/s increased, which suggests disk access times may also play a role:
Retina display: ≈40 photos (4.4 photos/s); WUXGA display: ≈60 photos (6.7 photos/s)

This is a common scrolling scenario. It's one I need to dig into more deeply in my own app, which does something similar with album artwork. I haven't yet found the right balance in my use cases to ensure a great scrolling experience, but I also haven't gotten to the point where it makes sense to sit down with Instruments and profile it to understand what I fat-fingered.

Disk access times absolutely play a role, as the balance is between memory usage, responsiveness, and CPU usage. A gallery is effectively an unbounded list of images, but I want to render thumbnails for each. Because the list is unbounded, pre-loading the images into buffers is not only memory intensive, it's super slow. So you need to be able to stream the images from disk, and manage how much memory is used to cache the image buffers, keeping the memory profile flat in the face of the unbounded list (i.e. memory should be O(1) rather than O(n), where n is the number of images). Even when I want to display a single large image at full resolution, things can get messy as my images get large. Think a 100MP panorama from a DSLR (or a single medium format image) that represents over a GB of uncompressed image data.

A common approach for cases like this is an asynchronous pipeline feeding thumbnails to the UI thread during scrolling. If an image's buffer isn't ready in time, it "misses the train". If you load images on the UI thread instead, you get stuttering in your scrolling rather than pop-in, because the scroll event runs for longer than the ~16 ms it takes to keep a 60Hz display fed.
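Here's roughly what that pipeline can look like on macOS, sketched with Apple's QuickLookThumbnailing framework; the cache policy and numbers are mine, for illustration only:

```swift
import AppKit
import QuickLookThumbnailing

// Async thumbnail pipeline: generation happens off the main thread and
// results land in a bounded cache. If a thumbnail isn't cached when its
// cell scrolls in, that cell shows a placeholder ("misses the train")
// until the completion fires.
final class ThumbnailStore {
    static let shared = ThumbnailStore()
    private let cache = NSCache<NSURL, NSImage>()

    private init() {
        cache.countLimit = 200   // keep memory O(1), not O(n) in file count
    }

    func thumbnail(for url: URL, side: CGFloat,
                   completion: @escaping (NSImage?) -> Void) {
        if let hit = cache.object(forKey: url as NSURL) {
            completion(hit)      // made the train: no pop-in for this cell
            return
        }
        let request = QLThumbnailGenerator.Request(
            fileAt: url,
            size: CGSize(width: side, height: side),
            scale: 2.0,          // assume a Retina backing scale
            representationTypes: .thumbnail)
        QLThumbnailGenerator.shared.generateBestRepresentation(for: request) {
            representation, _ in
            let image = representation.map {
                NSImage(cgImage: $0.cgImage, size: .zero)
            }
            if let image = image {
                self.cache.setObject(image, forKey: url as NSURL)
            }
            DispatchQueue.main.async { completion(image) }  // UI thread only
        }
    }
}
```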

But generally with UIKit/AppKit/CoreAnimation type stuff, if you have places where performance dips, it's a surprisingly safe bet that it's related to how quickly the layout and rasterization engine can respond to changes as it fills the texture buffers before the compositing engine goes to work. A lot of it is very fast these days, but images especially still tend to be an Achilles' heel, simply because so much of the initial processing (decompression) needs to be done on the CPU, because it can be bottlenecked by I/O, and because management of caches etc. is a per-app exercise. It's also pretty easy for developers to take the easy way out and read images into buffers in ways that aren't performant.
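On that last point, the classic fix for image loading is to ask the decoder for a pre-sized thumbnail instead of decoding the full bitmap and scaling it down. A sketch using ImageIO (option values are illustrative):

```swift
import Foundation
import ImageIO

// Decode straight to a downsampled CGImage so the full-resolution bitmap
// (potentially a GB+ for a big panorama) never has to exist in memory.
func downsampledImage(at url: URL, maxPixelSize: Int) -> CGImage? {
    // Don't eagerly decode anything when opening the file.
    let sourceOptions = [kCGImageSourceShouldCache: false] as CFDictionary
    guard let source = CGImageSourceCreateWithURL(url as CFURL, sourceOptions)
    else { return nil }

    let thumbnailOptions = [
        kCGImageSourceCreateThumbnailFromImageAlways: true,
        kCGImageSourceCreateThumbnailWithTransform: true,  // honor EXIF rotation
        kCGImageSourceShouldCacheImmediately: true,  // decode now, off-main,
                                                     // not lazily at draw time
        kCGImageSourceThumbnailMaxPixelSize: maxPixelSize
    ] as CFDictionary
    return CGImageSourceCreateThumbnailAtIndex(source, 0, thumbnailOptions)
}
```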

And just to add to mr_roboto's excellent summary: in UIKit and modern AppKit, you can do GPU compositing inside windows as well. But there are a bunch of caveats, with AppKit + CoreAnimation especially. iOS has the same issues, since UIViews are always backed by CoreAnimation layers, but it's a lot less common to use iOS on a non-Retina display. Because you can have a CALayer with a clear background, or with rotation/scaling applied to it, there were many cases where subpixel AA would get disabled once you started backing NSViews with CALayers, or font rasterization would get weird because pixels in the layer's texture weren't 1:1 with pixels in the screen's framebuffer. This ties into Mojave: to get the blur effects they added, Apple needed to back things with CALayers that historically weren't, which is why subpixel AA went out the window at that point in time.
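A tiny sketch of that caveat in AppKit terms, purely illustrative:

```swift
import AppKit

// Layer-backed text: glyphs are rasterized into the layer's texture,
// not directly into the window's backing store.
let label = NSTextField(labelWithString: "Subpixel AA casualty")
label.wantsLayer = true

// With no opaque background, the rasterizer can't know what's behind the
// glyphs, so subpixel AA is off; grayscale smoothing only.
label.drawsBackground = false

// Scaling/rotating the layer breaks the 1:1 mapping between texture pixels
// and framebuffer pixels, which is when font rendering "gets weird".
label.layer?.setAffineTransform(CGAffineTransform(rotationAngle: .pi / 64))
```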

Another display engine question: Based on Apple's specs, the M2 and M3 Pro MBPs can either drive two lower-bandwidth displays (2 x 6k@60, or 1 x 6k@60 + 1 x 4k@144) or a single higher-bandwidth display (8k@60 or 4k@240). And we see a similar one-display decrease when the M2 and M3 Max MBPs are driving a higher-bandwidth display.

Consider the M3 Pro. That chip has two external display engines (plus an internal display engine for the laptop screen) ( https://techboards.net/threads/m3-core-counts-and-performance.4282/page-21#post-149581 ). Why does it lose its ability to drive a 2nd external display when it's driving an 8k@60 or a 4k@240? These are the only two possibilities I can think of, but I have no idea if either one applies:

For 8K60, it's likely bridging the two display engines together. It's likely using tricks similar to the custom timing controllers in the 5K iMac, which used two DisplayPort streams to drive a single 5K display back in the day, but here using the output of both display engines to feed a single HDMI signal. How it's all wired together is well beyond my pay grade.
 

theorist9

Site Champ
For 8K60, it's likely bridging the two display engines together. It's likely using tricks similar to the custom timing controllers in the 5K iMac, which used two DisplayPort streams to drive a single 5K display back in the day, but here using the output of both display engines to feed a single HDMI signal. How it's all wired together is well beyond my pay grade.
I'd also asked jovet, who posts on MR and is the developer of the AllRez display utility, about this. He just replied, agreeing that scenario #1 is a "reasonable guess".

But, interestingly, he pointed out that we don't actually know the M2/M3 Pro can't drive a 2nd display when the other one is an 8k60 until someone actually tests it, since Apple sometimes under-reports its display capacity. As an example, he mentioned that "Apple said the Vega MPX module can't do three 6K displays but then someone tried it and it works fine." He explained that Apple can do this because, even if a configuration works, the performance reduction or power requirements may be unacceptable (or it may work sometimes but not others).

He also found this 2019 presentation, by Manasi Navare, on how Intel GPUs do 8K on Linux:
https://lpc.events/event/5/contributions/317/attachments/435/689/XDC2019_8K_Trans_Port_Sync.pptx.pdf
 

Nycturne

Elite Member
He also found this 2019 presentation, by Manasi Navare, on how Intel GPUs do 8K on Linux:
https://lpc.events/event/5/contributions/317/attachments/435/689/XDC2019_8K_Trans_Port_Sync.pptx.pdf

This is very similar (if not nearly identical) to how Apple does 5K/6K over DP1.2 for the iMac 5K and the XDR display. And a lot of the sync work can be done internally to the system when feeding the HDMI 2.1 port, masking the complexity of such a system.

While I believe jovet's comments about Apple under-reporting, and agree that testing would give us more details (especially since there are tools to dump details about the graphics subsystem that would be illuminating), my skepticism in this particular case stems from two points:

1) As the presentation makes clear as well, 8K over DP1.4 requires DSC, or two DP1.4 feeds. And Apple feeds the HDMI port using DisplayPort.
2) Apple seems to still be using DP1.4.

My suspicion is that, at least for now, unless you use an 8K HDMI display that supports DSC, you need two streams of DP1.4 to feed it. What would need testing is whether a display that supports DSC can be brought down low enough in bandwidth to free up the second display controller.
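Some back-of-envelope numbers behind those two points (blanking overhead is ignored, so real link budgets are tighter, and the 3:1 DSC ratio is just a typical figure, not Apple's):

```swift
// 8K60 payload vs. what a single DP1.4 link can carry.
let pixelsPerSecond = 7680.0 * 4320.0 * 60.0   // ≈ 1.99e9 pixels/s
let rawGbps = pixelsPerSecond * 24 / 1e9       // 24-bit color ≈ 47.8 Gbit/s

let dp14PayloadGbps = 25.92    // HBR3 x 4 lanes, after 8b/10b line coding
let dscGbps = rawGbps / 3      // DSC at a typical 3:1 ≈ 15.9 Gbit/s

print(rawGbps > dp14PayloadGbps)  // true: uncompressed 8K60 needs two links
print(dscGbps < dp14PayloadGbps)  // true: with DSC, one link would suffice
```

Which is consistent with the suspicion: without DSC, 8K60 eats both DP streams; with it, one should do.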
 