Jimmyjames (Elite Member; joined Jul 13, 2022; 1,109 posts)
Had a chance to briefly check out the 13” earlier. Really nice screen and very nice and light to hold.
Seriously? They have not figured out how to do unimplemented instruction exceptions? What, is this kindergarteners-with-crayons software design?
My naive assumption was that you should be able to trap on an unimplemented instruction, and then reschedule that thread on a more capable core. No emulation of unimplemented instructions needed. A non-stupid OS would also flag that thread to only execute on the better cores from then on. Is there something about AVX512 that makes this hard?
I have to raise eyebrows on this one. Especially in places where SIMD-type instructions get used, shortcuts are most likely to happen to reduce overhead and improve performance, and then you get to live with the legacy of those decisions many years later.
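The trap-and-reschedule idea quoted above is easy to state, so here's a toy C model of the policy as I read it, just to make it concrete. When a thread faults on an instruction its current core lacks, narrow its affinity to the cores that do implement it (sticky, as suggested) and re-run it there; only if no core implements it does the thread get an ordinary SIGILL. Everything here is hypothetical: the 4-core topology (cores 0-1 big, 2-3 small), the struct, the function names, and especially `insn_mask`, which assumes the kernel has some way of knowing which cores support the faulting instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a "trap, migrate, pin" policy. Hypothetical throughout:
 * real kernels track affinity and core capabilities very differently. */

#define NCORES 4

/* Bit i set => core i implements the wide (AVX-512-class) instructions.
 * Assumed topology: cores 0-1 are big cores, cores 2-3 are small cores. */
static const uint32_t wide_capable_mask = 0x3u;

struct toy_thread {
    uint32_t allowed;  /* bitmask of cores this thread may run on */
    int      core;     /* core it is currently running on */
};

enum trap_result { RESCHEDULED, DELIVER_SIGILL };

/* Lowest-numbered core in a mask, or -1 if the mask is empty. */
static int first_core(uint32_t mask) {
    for (int i = 0; i < NCORES; i++)
        if (mask & (1u << i)) return i;
    return -1;
}

/* Called when a thread traps on an instruction its current core lacks.
 * insn_mask (assumed to be known) = cores that DO implement it. */
static enum trap_result on_undefined_insn(struct toy_thread *t, uint32_t insn_mask) {
    uint32_t capable = t->allowed & insn_mask;
    if (capable == 0)
        return DELIVER_SIGILL;        /* illegal on every allowed core: normal SIGILL */
    t->allowed = capable;             /* sticky: never schedule back onto a small core */
    t->core = first_core(capable);    /* the faulting instruction re-executes there */
    return RESCHEDULED;
}
```

A thread sitting on core 2 that hits an AVX-512 instruction ends up on core 0 with its mask reduced to `0x3`, so it stays on the big cores from then on. The model conveniently skips the hard questions raised in the replies, such as where `insn_mask` comes from.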
The last time I even had to think about this stuff, the pattern was: check for availability, pick which implementation to use based on that, and cache the result so you didn't get all hung up on recalculating which intrinsics to use every time. Heterogeneous compute like we see today wasn't even a thing yet (GPGPU compute aside). AVX512 is a more recent (and niche) case, but even if we assume libraries get updated to emulate AVX512 with other instructions when the exception fires, it gets messy:
A) What's the overhead of emulating AVX512 from the exception handler? Is it worse than just using 256-bit AVX2 everywhere?
B) Who owns the state of which intrinsics are being used? That needs to be updated to avoid repeated exceptions if the performance of the above isn't sufficient.
C) If the library implementing the intrinsics (assuming one is used) holds the state, and the library is properly updated to use TLS (thread-local storage) for that state, will the app be updated too? Is the library statically linked, requiring the app that uses it to be rebuilt?
D) Does the app using said library even use best practices?
E) Finally, if we don't have a good way to "upgrade" back to AVX512 on the large cores and update the state again, why bother?
On one hand, I get that it'd be great if a thread could degrade gracefully on the small cores, but I get why Intel went with the more compatible approach. Maybe I've just spent too much time in legacy software stacks.
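A rough sketch of the "check once, cache the result" pattern, with the cache moved into thread-local storage (C11 `_Thread_local`) so that points B, C and E in the list above have somewhere to live. Assumptions: GCC or Clang on x86 for `__builtin_cpu_supports` (other targets just report "no AVX-512"); both kernels are plain C stand-ins for AVX2/AVX-512 intrinsic code; and `downgrade_after_trap()` / `forget_choice()` are hypothetical hooks that nothing in a real OS calls today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long (*sum_fn)(const int *v, size_t n);

/* Plain-C stand-in for a SIMD kernel: accumulate in `width` lanes
 * (8 x 32-bit ~ one 256-bit register, 16 ~ one 512-bit register). */
static long sum_in_lanes(const int *v, size_t n, size_t width) {
    long lanes[16] = {0};                 /* width must be <= 16 */
    size_t i = 0;
    for (; i + width <= n; i += width)
        for (size_t k = 0; k < width; k++) lanes[k] += v[i + k];
    long s = 0;
    for (size_t k = 0; k < width; k++) s += lanes[k];
    for (; i < n; i++) s += v[i];         /* scalar tail */
    return s;
}
static long sum_narrow(const int *v, size_t n) { return sum_in_lanes(v, n, 8); }  /* "AVX2" path */
static long sum_wide(const int *v, size_t n)   { return sum_in_lanes(v, n, 16); } /* "AVX-512" path */

static bool cpu_has_avx512f(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    /* As far as I know, this reads feature bits the compiler runtime
     * collected once at start-up; it does not re-ask the current core. */
    return __builtin_cpu_supports("avx512f");
#else
    return false;
#endif
}

/* Per-thread cached choice; NULL means "not decided yet". */
static _Thread_local sum_fn cached_sum = NULL;

static sum_fn pick_sum(void) {
    if (cached_sum == NULL)
        cached_sum = cpu_has_avx512f() ? sum_wide : sum_narrow;
    return cached_sum;
}

/* Hypothetical hooks (point B/C): something would have to call these. */
static void downgrade_after_trap(void) { cached_sum = sum_narrow; }
static void forget_choice(void)        { cached_sum = NULL; }  /* point E: re-decide later */

long vec_sum(const int *v, size_t n) { return pick_sum()(v, n); }
```

The downgrade half is easy once the library hears about the trap. The upgrade half (point E) is where it falls apart: `forget_choice()` only helps if the re-check actually reflects the core the thread is now on, and a start-up snapshot of CPUID doesn't.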
Welcome!
Love to see new people not spouting gibberish!
Thanks... I eventually found this place (a couple of months ago, and have occasionally lurked since then) from discussions in MR. SNR there is depressing.
The new keyboard is nice - having an escape key feels like a breath of fresh air. I fired up vim (my muscle memory still works!) and ESC does what ESC is supposed to do. No more Cmd-. for me.
When on the lock screen and authenticated, ESC also brings you to the home screen, so you don’t have to swipe up on the screen.
Well, the binary is there and the instructions are encoded in the binary. Lots of ways to do it, and yes, many are messy. But Intel isn’t helping.
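Roughly what the crudest form of "scan the binary" could look like: a byte-level search for things that look like an EVEX prefix (the encoding AVX-512 instructions use) in 64-bit code. This is a sketch, not a detector: it checks only the 0x62 escape byte plus two bits that, as I understand the EVEX layout, are constrained in the following payload bytes; it doesn't decode instruction boundaries; and it can't see code generated at run time. The function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Crude heuristic: count byte positions that look like the start of an
 * EVEX-encoded (AVX-512) instruction in 64-bit code. No decoding is done,
 * so a 0x62 inside an immediate, displacement or data is a false positive. */
static size_t count_evex_like(const uint8_t *code, size_t len) {
    size_t hits = 0;
    for (size_t i = 0; i + 3 < len; i++) {    /* need 3 payload bytes + opcode */
        if (code[i] != 0x62) continue;        /* EVEX escape byte */
        uint8_t p0 = code[i + 1], p1 = code[i + 2];
        if ((p0 & 0x03) == 0) continue;       /* opcode-map bits must be non-zero */
        if ((p1 & 0x04) == 0) continue;       /* fixed bit in 2nd payload byte is 1 */
        hits++;
    }
    return hits;
}
```

For example, `62 F1 74 48 58 C2` (`vaddps zmm0, zmm1, zmm2`) registers one hit, but so does `B8 62 F1 74 00` (`mov eax, 0x74F162`), which is the messiness in a nutshell: doing it properly means running a real disassembler over every executable region.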
In some cases it's even worse. IIRC, for NumPy you need to choose whether or not to use some CPU features at compile time.
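For contrast with runtime dispatch, a generic C illustration of choosing CPU features at compile time (not NumPy's build system, just the general pattern): GCC and Clang define `__AVX512F__` / `__AVX2__` only when the build enables those extensions (e.g. `-mavx512f`, `-mavx2`, or a `-march` that implies them), so the choice is frozen into the binary before anyone knows which core will run it. `simd_path()` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Compile-time selection: decided by the build flags, never by the
 * machine (or core) the binary eventually runs on. */
static const char *simd_path(void) {
#if defined(__AVX512F__)
    return "avx512";
#elif defined(__AVX2__)
    return "avx2";
#else
    return "baseline";
#endif
}
```

Built with `-mavx512f`, this returns `"avx512"` unconditionally, and the compiler is also free to emit AVX-512 instructions elsewhere in that translation unit, with no fallback if it lands on a core that lacks them.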
Hm, that sounds like something that could work (re: trapping on an unimplemented instruction and rescheduling the thread on a more capable core).
Having Face ID on the correct edge has also saved me at least 5 bouts of annoyance so far - on my old iPad, it seems like I always had my hand in the way.
Can relate.
As far as I can tell, it only goes to the Home Screen from the Lock Screen.
It took me a while to find out how to do ESC on the old keyboard, because some sites seem to think that you are always surfing with a computer.
I'm always using Cmd-H for the home screen, but simply using ESC might be quicker.
How does iPadOS decide whether ESC should be used to "escape" something or go back to the home screen?
I am old enough to remember when ESC actually meant something. We used to use it, on some systems, to enter commands. Like, ESC followed by C was the "copy" command. I suppose there may still be some venerable COBOL-based hardware that works that way.
I've forgotten if Forth or APL use it. Probably.
The original versions of Microsoft’s spreadsheet (Multiplan) and word processor (Word) used ESC to change focus to the menu. To save your file, you’d do ESC-T-S (the “T” stood for “Transfer”).
If I were designing a CPU where, for some dumb reason, the cores were sufficiently heterogeneous that only certain instructions could execute on certain cores, and if I couldn’t get the OS makers to deal with it properly in software, I guess there are a couple of things I could do. The most likely would be to detect the illegal instruction during decoding and trap it myself, sending a message to a core-scheduling block that lives outside any core and coordinates shifting things between cores. I’d make sure cores are virtualized, so that the OS can only request certain properties (e.g. “priority/speed”) when issuing threads, but cannot rely on the CPU actually picking any specific core. The CPU would dynamically move threads to cores that can handle them. Each time a thread had to move there would be a performance penalty of roughly twice a branch misprediction, but any given thread would presumably only move once (because if the illegal instruction shows up in its instruction stream once, you have to assume it will happen again). You’d essentially flush the pipelines as on a branch mispredict, then write out the register file and program counter to the new core.
Yeah, scanning the binary was my first thought as well, but I immediately thought of a few potential issues:
- The scheduling would need to be per-app instead of per-thread, so using AVX512 in one part of the app might keep other, unrelated non-AVX512 parts of the app off the small cores. Seems like the granularity is too coarse.
- What about dynamically linked libraries that use AVX512 while the main app doesn't?
- What about JIT compilers?
Possible, but yeah, messy. I think Intel is in a bad position to attempt to fix this.
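On the dynamic-library point: at run time a process can at least enumerate what's currently loaded. A Linux/glibc sketch using `dl_iterate_phdr()` to list the executable segments of the main program and each shared library, i.e. the regions a binary scanner would have to cover. It also shows the gaps raised above: anything `dlopen()`ed later isn't in the snapshot, and JIT output sits in anonymous memory that `dl_iterate_phdr()` never reports. The struct and function names are mine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct exec_stats { size_t objects; size_t exec_segments; size_t exec_bytes; };

/* Called once per loaded ELF object (program, shared libraries, vDSO). */
static int visit(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    struct exec_stats *st = data;
    st->objects++;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD && (ph->p_flags & PF_X)) {  /* executable segment */
            st->exec_segments++;
            st->exec_bytes += ph->p_memsz;
            const char *name = (info->dlpi_name && info->dlpi_name[0])
                                   ? info->dlpi_name : "(main program)";
            printf("%-40s exec segment at %p, %zu bytes\n", name,
                   (void *)(uintptr_t)(info->dlpi_addr + ph->p_vaddr),
                   (size_t)ph->p_memsz);
        }
    }
    return 0;  /* 0 = keep iterating */
}

/* Snapshot only: libraries loaded later and JIT code are not included. */
static struct exec_stats scan_loaded_objects(void) {
    struct exec_stats st = {0, 0, 0};
    dl_iterate_phdr(visit, &st);
    return st;
}
```

Even with a perfect per-segment scanner, this only answers "does something loaded right now contain AVX-512?", which leads straight back to the granularity problem in the first bullet.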