As I understand it, modern x86 decode doesn't work like this at all.
Instead, they use brute-force speculation. They just toss in N copies of the early decode logic, where N is a fairly large number. The first of these assumes that byte offset 0 of the slab of bytes provided by the fetcher is the start of a valid x86 instruction, the second early decoder starts from byte offset 1, the third from offset 2, and so forth. There is no horizontal borrowing between these decoders, just parallel decode of every possible starting byte position.
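A toy software sketch of that first stage might look like the following. Every name here is invented, and decode_length() only recognizes a few easy opcodes, whereas the real length logic has to chew through prefixes, opcode maps, ModRM, SIB, displacements, and immediates:

```c
#include <stddef.h>
#include <stdint.h>

#define FETCH_WIDTH 16   /* assumed slab size; not any specific core's */

/* Stand-in for one early decoder: guess the length of an instruction
   assumed to start at bytes[0]. Deliberately toy. */
static size_t decode_length(const uint8_t *bytes)
{
    uint8_t op = bytes[0];
    if (op == 0x90 || op == 0xC3) return 1;  /* nop, ret */
    if (op >= 0xB8 && op <= 0xBF) return 5;  /* mov r32, imm32 */
    if (op == 0xE9)               return 5;  /* jmp rel32 */
    return 1;  /* toy fallback; real hardware handles every encoding */
}

/* One "decoder" per byte offset. In silicon these are FETCH_WIDTH
   parallel copies of the length logic all firing in the same cycle;
   this loop is just the software stand-in for that. */
static void early_decode(const uint8_t slab[FETCH_WIDTH],
                         size_t lengths[FETCH_WIDTH])
{
    for (size_t i = 0; i < FETCH_WIDTH; i++)
        lengths[i] = decode_length(&slab[i]);
}
```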
The next layer of decode logic takes the lengths determined by all the early decoders and starts from the first offset known to be a correct boundary (say, a branch target, or the continuation point from the previous slab). That instruction's length identifies which early decoder started on the correct second boundary; that one in turn locates the third true boundary, and so forth.
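Continuing the same toy sketch, this chaining step amounts to a pointer-chase through the length array; everything not on the chain is dead speculation:

```c
/* Chain speculative lengths into true boundaries, starting from a
   known-good offset (assumed to be 0 here). Each confirmed instruction's
   length names the next real boundary. */
static size_t pick_boundaries(const size_t lengths[FETCH_WIDTH],
                              size_t boundaries[FETCH_WIDTH])
{
    size_t n = 0, pos = 0;           /* pos 0: first known-correct boundary */
    while (pos < FETCH_WIDTH) {
        boundaries[n++] = pos;       /* this offset really starts an insn   */
        pos += lengths[pos];         /* its length locates the next one     */
    }
    return n;  /* results from every other offset are simply discarded */
}
```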
At this point, the output of all the loser early decoders which started on false boundaries is simply thrown away. It was purely speculative work, and as with all forms of speculation, when your guess doesn't work, goodbye!
After early decode, things narrow down to the nominal number of instructions decoded per cycle for that processor, e.g. 6 in Intel's modern performance cores. It's here that the pipeline begins the more committal tasks: assigning rename registers (or the equivalent), cracking instructions into µops, and so forth.
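And the narrowing step, again purely illustrative: DECODE_WIDTH = 6 just mirrors the figure above, and main() runs the three toy stages end to end:

```c
#include <stdio.h>

#define DECODE_WIDTH 6   /* nominal decode width, per the figure above */

/* Hand at most DECODE_WIDTH confirmed instructions per cycle on to the
   committal stages (rename, µop cracking); the rest wait their turn. */
static size_t take_decode_group(const size_t boundaries[], size_t n,
                                size_t group[DECODE_WIDTH])
{
    size_t k = (n < DECODE_WIDTH) ? n : DECODE_WIDTH;
    for (size_t i = 0; i < k; i++)
        group[i] = boundaries[i];
    return k;
}

int main(void)
{
    /* nop; mov eax, 1; ret; then nop padding */
    uint8_t slab[FETCH_WIDTH] = { 0x90, 0xB8, 1, 0, 0, 0, 0xC3,
                                  0x90, 0x90, 0x90, 0x90, 0x90,
                                  0x90, 0x90, 0x90, 0x90 };
    size_t lengths[FETCH_WIDTH], boundaries[FETCH_WIDTH], group[DECODE_WIDTH];

    early_decode(slab, lengths);                        /* all offsets, in "parallel" */
    size_t n = pick_boundaries(lengths, boundaries);    /* chain the true boundaries  */
    size_t k = take_decode_group(boundaries, n, group); /* cap at machine width       */
    for (size_t i = 0; i < k; i++)
        printf("insn at offset %zu, length %zu\n", group[i], lengths[group[i]]);
    return 0;
}
```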