In theory, you could blur the distinction of "core", replacing it with a bunch of code stream handlers that each have their own register sets and handle their own branches and perhaps simple math but share the heavier compute resources (FP and SIMD units) with other code stream handlers. Basically a sort of secretary pool, and each stream grabs a unit to do a thing or puts its work into a queue. It might work pretty well.
The tricky part is memory access. If you are running heterogeneous tasks on one work blob, you basically have to have enough logically discrete load/store units to handle address-map resolution for each individual task, because modern operating systems give different tasks different address maps. So each task has to constrain itself to a single specific logical LSU for memory access, so that it reads the right data and isn't stepping into another task's address space.
It is a difficult choice to make, whether to maintain strict core separation or to share common resources. Each strategy has advantages and drawbacks, and it is not really possible to assess how good a design is in terms of throughput and performance per watt (P/W) without actually building one. Building a full-scale prototype is expensive, and no one wants to spend that kind of money on a thing that might be a dud.
AMD tried something slightly along those lines with Bulldozer, where two integer cores in a module shared one FP/SIMD unit, and it didn't work out so well for them.