some rough numbers:
20 ticks to process the isr itself is reasonable but let's budget for 80 ticks.
XC8 1.12 free mode, PIC12F675 @ 1 MIPS, 2x 1 kHz pulse trains (= 1 encoder): cpu load is 25 - 28%. That translates into about 75 ticks per isr invocation, vs. the 80 ticks budgeted above. (edit: I should add that the implementation processes both edges using IOC interrupts, entirely in the isr.)
the gist of the code contains 3 lines:
//read the enc port -> done in clearing the gpif flag
//update encoder state
_enc_state <<= 2; //left shift for new input
_enc_state |= (IO_GET(tmp, ENC_A)?0x02:0) | (IO_GET(tmp, ENC_B)?0x01:0);
return _enc_dir[_enc_state & 0x0f];
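For anyone who wants to try the idea off-target, here is a minimal host-side sketch of the same shift-and-lookup scheme. The post does not show the contents of _enc_dir, so the table below is an assumption (the usual 16-entry quadrature transition table), and enc_update() with plain a/b arguments is a hypothetical stand-in for the IO_GET(tmp, ENC_A) / IO_GET(tmp, ENC_B) reads. The sign convention is arbitrary too; swap A and B to flip it.

```c
#include <stdint.h>

/* Assumed transition table, indexed by (previous AB << 2) | current AB.
 * -1 / +1 = one valid step in either direction,
 *  0      = no change, or an invalid two-bit jump. */
static const int8_t enc_dir[16] = {
     0, -1, +1,  0,
    +1,  0,  0, -1,
    -1,  0,  0, +1,
     0, +1, -1,  0
};

/* Rolling history of AB samples; only the low 4 bits are ever used. */
static uint8_t enc_state;

/* Hypothetical helper: feed the current levels of A and B,
 * get back the step (-1, 0 or +1) for this transition. */
int8_t enc_update(uint8_t a, uint8_t b)
{
    enc_state <<= 2;                               /* make room for new input */
    enc_state |= (a ? 0x02 : 0) | (b ? 0x01 : 0);  /* append current AB */
    return enc_dir[enc_state & 0x0f];              /* look up prev+current */
}
```

In a real isr the return value would just be added to a position counter. A repeated sample (no edge) indexes a zero entry, so spurious invocations cost time but do not corrupt the count.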
1 kHz per pulse train, 2 pulse trains per encoder, 6 encoders total means 1k x 2 x 6 x 100, or 1.2 MIPS on the high side and 0.5 MIPS on the low side.
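The high-side figure is just a product, which can be sketched as below. This assumes the final factor of 100 is the number of ticks charged per pulse; the function and parameter names are made up for illustration.

```c
#include <stdint.h>

/* Instructions per second spent servicing encoder pulses.
 * ticks_per_pulse is an assumed reading of the "x 100" factor. */
uint32_t enc_load_ips(uint32_t pulse_hz, uint32_t trains_per_encoder,
                      uint32_t encoders, uint32_t ticks_per_pulse)
{
    return pulse_hz * trains_per_encoder * encoders * ticks_per_pulse;
}
```

enc_load_ips(1000, 2, 6, 100) gives 1200000 instructions per second, i.e. the 1.2 MIPS quoted for the high side.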
1.2 MIPS for 6 encoders seems pretty accurate.

There are chances for improvement, however.