I'd be interested in learning more about this. As I understand it, a central function of the extra bits is to offset transient DC biases caused by an imbalance between 0's and 1's in the data stream. This gives a statistical probability that the DC bias won't exceed a certain threshold. I'd like to understand how this probability is calculated from PCIe's encoding algorithm.
There's another key reason for the extra bits: you want to limit the run length of all-zeroes or all-ones to guarantee a minimum frequency of transitions. This is important for the CDR (Clock Data Recovery) circuit in the receiver. CDR consists of some analog or mixed-signal wizardry I've never learned the details of, and as the name implies, it reconstructs the transmitter's clock signal from the data signal. For that to work, the data signal can't be static. It has to transition often enough for CDR to lock on and track. It's a fair bet that some kind of PLL is involved, but like I say, I never learned the details.
You mention statistical probability and DC bias. This is actually a key difference between 8b10b and 128b130b.
8b10b splits each incoming byte into a 5-bit and a 3-bit group, encodes the 5-bit group in 6 bits, and the 3-bit group in 4 bits. The 6-bit and 4-bit alphabets contain alternate encodings for some (not all) input values. If a code word has an equal number of ones and zeroes, there is only one encoding; if the counts are unequal, they always differ by exactly 2, and for these values the alternate encoding is the bitwise inversion, which has the opposite ones/zeroes imbalance. Example: 011011 (+2 disparity) and 100100 (-2 disparity) both decode to the same 5-bit value. (Note: 8b10b doesn't use all possible 2^6 / 2^4 bit patterns in its symbol alphabets, only those with -2, 0, or +2 disparity.)
You can probably guess where that's going. 8b10b encoders maintain a "running disparity" counter. It's always -1 or +1. Whenever input data selects one of the symbol table entries where you have a choice between a +2 and -2 disparity encoding, if the RD counter is -1 it chooses the +2 encoding, and if the RD counter is +1 it chooses the -2 encoding.
So, 8b10b isn't merely statistically DC balanced, it is mathematically guaranteed to be essentially perfectly balanced. There is no possible input sequence which can force an 8b10b encoder to make its running disparity reach any value more positive than +1 or more negative than -1.
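A minimal sketch in Python of that selection logic, using two real 5b/6b table entries (D.00 and D.03); the full code tables and the 3b/4b half are omitted:

```python
# Two real 5b/6b entries (abcdei bit order): value -> (RD=-1 code, RD=+1 code).
# D.00 has a +2/-2 encoding pair; D.03 is balanced, so both columns are identical.
TABLE_5B6B = {
    0b00000: ("100111", "011000"),  # D.00
    0b00011: ("110001", "110001"),  # D.03
}

def disparity(code):
    # Count of ones minus count of zeroes in the code word.
    return code.count("1") - code.count("0")

def encode_5b(value, rd):
    """Pick the encoding based on current running disparity (rd is -1 or +1)."""
    code = TABLE_5B6B[value][0 if rd < 0 else 1]
    return code, rd + disparity(code)  # rd stays pinned to -1 or +1
```

Starting from rd = -1, encoding D.00 emits 100111 (+2 disparity) and flips rd to +1; the next unbalanced symbol will then be chosen with -2 disparity, and so on, which is exactly why the running disparity can never escape the [-1, +1] band.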
Other properties of 8b10b encoding guarantee that the maximum run length (consecutive bits with the same value) is 5 bits.
8b10b was designed when logic was really expensive, there wasn't as much pressure to extract as much of the raw channel bandwidth as possible, and CDR wasn't as good as it is now. It's costly; 20% is a lot of overhead. But in return you get very low implementation complexity (a couple of very small ROMs and some simple control logic) and essentially perfect disparity and run-length guarantees.
64b66b and 128b130b are modern era updates on the concept, and these are where statistical behavior comes in. Both eschew the concept of tracking running disparity. They even give up on short guaranteed run length limits, because state of the art CDR has gotten a lot better at surviving long runs.
Worst case run length is bounded by occasionally inserting a guaranteed transition, 01 or 10, into the output stream. These sync headers are the extra 2 bits per input word in both 64b66b and 128b130b (they also tell the receiver whether the block carries data or control information).
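To see how those two header bits bound the run length, here's a quick sketch using the 64b66b numbers ('01' marks a data block; '10' marks a control block). Even if every payload bit comes out as a one, the header's 0 breaks the run:

```python
import itertools

def frame_64b66b(payload_bits):
    # Prepend the 2-bit sync header for a data block. The 0->1 transition
    # in the header is the only edge the receiver is guaranteed to get.
    assert len(payload_bits) == 64
    return [0, 1] + payload_bits

# Worst case: the scrambled payload happens to be all ones, block after block.
stream = []
for _ in range(3):
    stream += frame_64b66b([1] * 64)

longest = max(len(list(g)) for _, g in itertools.groupby(stream))
assert longest == 65  # 64 payload ones plus the header's trailing '1'
```

The same arithmetic gives 129 for 128b130b, which is where the worst-case run lengths mentioned below come from.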
The next piece of the puzzle is scrambling: XORing the 64 or 128 payload bits with the output of a linear feedback shift register (LFSR). In 64b66b the scrambler is self-synchronizing: the receiver feeds incoming data through essentially the same LFSR circuit, and thanks to the properties of LFSRs its feedback XORs cancel out the transmitter's scrambling, so the receiver locks onto the scrambling sequence without any side channel. (PCIe's 128b130b, strictly speaking, uses an additive scrambler whose LFSR is synchronized by ordered sets rather than by the data itself, but the effect on the bit stream is the same.)
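Here's a sketch of a self-synchronizing scrambler/descrambler pair using the 64b66b polynomial x^58 + x^39 + 1 (the bit-at-a-time form; real hardware does this wider). The receiver starts with an arbitrary, wrong register state and still recovers the payload once 58 bits have flushed through:

```python
import random

MASK = (1 << 58) - 1  # 58-bit shift register

def scramble(bits, state=0):
    # Self-synchronizing scrambler, taps per x^58 + x^39 + 1.
    # The register holds the previous 58 *output* bits.
    out = []
    for b in bits:
        s = b ^ ((state >> 57) & 1) ^ ((state >> 38) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK
    return out

def descramble(bits, state=0):
    # Same taps, but the register holds the previous 58 *input* bits,
    # so the feedback XORs cancel the transmitter's scrambling.
    out = []
    for b in bits:
        out.append(b ^ ((state >> 57) & 1) ^ ((state >> 38) & 1))
        state = ((state << 1) | b) & MASK
    return out

random.seed(0)
data = [random.getrandbits(1) for _ in range(256)]
# Transmitter seeded arbitrarily; receiver starts at zero, i.e. out of sync.
rx = descramble(scramble(data, state=0x2AAAAAAAAAAAAAA), state=0)
assert rx[58:] == data[58:]  # in sync after 58 bits, whatever the seeds
```

Once 58 bits have shifted through, both registers hold the same 58 transmitted bits, so the receiver's taps reproduce exactly what the transmitter XORed in.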
Since LFSRs produce balanced pseudorandom sequences with small run lengths, scrambling reduces the average run length well below the worst case of 65 or 129 bits. It also provides statistical bounds on running disparity; because the LFSR sequence is balanced the pattern on the link should also be close to balanced.
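That near-balance is easy to check empirically. Here's a toy 7-bit maximal-length LFSR (taps at bits 7 and 6, a primitive polynomial, so it visits all 127 nonzero states); over one full period it emits 64 ones and 63 zeroes and never runs longer than 7 identical bits:

```python
import itertools

def lfsr7(seed=1, n=127):
    # 7-bit Fibonacci LFSR, taps at bits 7 and 6 (a standard maximal
    # tap set, period 2^7 - 1 = 127).
    state = seed & 0x7F
    out = []
    for _ in range(n):
        out.append((state >> 6) & 1)            # bit shifted out this step
        fb = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two taps
        state = ((state << 1) | fb) & 0x7F
    return out

seq = lfsr7()
assert seq.count(1) == 64 and seq.count(0) == 63   # nearly balanced
longest = max(len(list(g)) for _, g in itertools.groupby(seq))
assert longest <= 7                                # short runs
```

The real scramblers use much longer registers (period 2^58 - 1 and up), but the same two properties, near-balance and short typical runs, are what they contribute to the link.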
The worst possible case for 64b66b and 128b130b is when the data payload is the same LFSR sequence (or its inverse), and is synchronized to the scrambler, such that the scrambled data is all zeroes or all ones. I don't remember if there are any tricks to try to mend DC balance here, but I'm fairly sure that CDR is expected and engineered to live off the worst case diet of two edges per 66 or 130 bits.