As you can see, most of us certainly wouldn't go down the same road developing this project. However, for a one-off, or at least very small volume production, a lot of the suggestions to use a better chip and cram more functionality into it are not particularly helpful.
You have a PICkit 2 and a fully set up development environment you are familiar with. To use Microchip's latest and greatest chips, you'd need to spend approximately US$50 on a PICkit 3, plus however many days it takes to get the current MPLAB X set up and working. There's also the risk that you will be one of the unlucky few who end up having to find a different PC because, for some nebulous and inscrutable reason, MPLAB X isn't stable (or plain doesn't work) on your current one. Saving a few dollars on a couple of tubes of chips, at the cost of taking the time to learn a new processor core, just doesn't make sense, especially when you already have stock in hand. Upgrading is an investment in the future, not in this project.
However, your board is over-complex. Given a 20 MHz clock, a PIC16F876 can produce a 19.5 kHz, 10-bit hardware PWM. The lowest PWM frequency it can produce directly with the same clock is 1.2 kHz, but if you diddle with the PWM registers in the Timer 2 ISR, you can skip cycles with the output held at either '1' or '0'. With a six-cycle counter in the ISR and a bit of code, you could therefore generate the 200 Hz low-speed PWM mostly using the hardware PWM (1.22 kHz / 6 ≈ 203 Hz). The only extra complication would be switching a low-pass filter in and out when you transition between fast PWM (to generate an analog voltage) and slow PWM. You can actually do that directly by returning the filter cap's ground leg to a PIC I/O pin rather than to ground: set the pin low to connect the cap and enable low-pass filtering, or set it as an input (hi-Z) to disconnect it and disable filtering. That gets the part count down to two chips per channel.
Next, make each channel *ENTIRELY* self-contained, so adding another channel is a simple copy/paste. That means ditching the ULN2003 chips and replacing them with sensitive gate MOSFETs. You will need clamping diodes as well, but those are best fitted across the relay coils, not at the driver. For the same reason, move the channel address decoder to the top left of the board so its outputs can run across the board near the other horizontal tracks and enter each channel module block at the same relative location. Reallocate PIC pins as needed to make these changes happen.
As others have suggested, use stitched ground fill on both sides, local 0.1uF decoupling for both the PIC and the op-amp, and individual pullups within each channel, and just about all your routing problems should simply go away.
How I would have done it, constrained by a PICkit 2 debugger for development:
There is no need to run high-speed clocks over board-to-board interconnects; that's a potential EMI nightmare. Each board with PICs on it would have its own 20 MHz crystal clock. The first PIC would have the crystal, and the others would have a daisy-chained clock running from OSC2 (out) to OSC1 (in). You don't need to sync the PWMs between boards, and in fact it's an advantage not to: if they are synced, their switching edges line up, which increases the stress on the PSU.
I wouldn't have used a parallel data bus. I would have used a one-wire, logic-level serial bus with RS232 framing, probably at 19.2 kbaud (roughly 1,920 bytes/s with 8N1 framing, so check that covers the data transfer rate needed), connecting a Schottky diode between TX and RX to effectively make the TX pin open-drain so all the channels can share the bus with a simple pullup. /MCLR would go to all devices (as you have it) for initialisation and emergency stop. To identify each chip, there would be a daisy-chained ID line from each PIC to the next, with a pullup between chips. The master would set it to '0' at the first chip's input side. When /MCLR is released, if ID_in is high, the PIC loops waiting for it to go low. As soon as it goes low, the PIC waits for the master controller to send it an address. It acknowledges, stores that address, sets its ID_out low, and from then on responds only to commands prefixed with that address. The master then sends the next address to the next PIC in the chain. When the master sends an address but doesn't get an acknowledgement, it knows all the PICs have been enumerated.
Each PIC only replies in response to a command from the master. The master can send a break on the line to end a command and reply prematurely if it needs to free the bus. Therefore one doesn't have to worry about bus collisions.
I *MIGHT* wire-OR an 'attention' (IRQ) line back to the master so each PIC can urgently report fault conditions. To do so, when the master receives the attention signal, it would set the ID line high. All the PICs pass the ID line change along as soon as possible, but a PIC that has signalled for attention first reports its address and fault code, then releases the attention IRQ line, and only then passes the ID line change along. Once the attention IRQ line goes high, the master knows all reports have been received, sets ID low again, and business as usual resumes. As the comms are in a format that can easily be generated by a PC, I wouldn't need to write the master code to test all the slaves, but could start testing with a terminal program on a PC that lets me manually control the handshake lines.
I'd use PIC16F886 chips; they are 1/3 the price of the PIC16F876 and less than half the price of the PIC16F876A. As all these chips have two hardware PWM modules (but only one PWM timer, so both channels would share the same PWM frequency), after the initial one-channel prototype had been fully tested, I'd have reworked the code to put two channels in one PIC, with a quad op-amp for the filter/buffer, to get it down to one chip per channel. Each PIC would have two addresses, and from the master's and motor drive board's viewpoints it would be indistinguishable from a daisy-chained pair of single-channel PICs.
All code would be written in C for easier portability to different PICs and much better maintainability than MPASM assembler. Data structures would be defined in a header shared between the slave code, the master PIC's code (if present) and the PC application's source code.
If you *did* want to go to mass production, then it makes sense to upgrade to a PIC16F1xxx part with more PWM channels, and to put the output stages, relays and a 5V regulator on the same board as the PIC, so you have one board with everything needed for four to six channels. That would be a 4-layer surface-mount board, prototyped with interior power/ground planes for ease of hack-and-patch rework, but respun with interior signals and exterior planes to reduce EMI before production.