Disclaimer: I don't do PICs (just ARM, a little AVR, old time 8 bitters).
So some parts of the code are just inscrutable to me (#pragmas etc.).
"I have this feeling that i'm using wrong delays in my code"
But yes, I think you missed something in your lcd_e_toggle() macro.
The macro as it is will simply toggle the E line (H->L or L->H):
#define lcd_e_toggle() toggle (LCD_E_PORT, LCD_E_PIN)
#define toggle(LAT,PIN) LAT ^= _BV(PIN)
But in your code you use it as if it pulsed the E line (a full high-then-low cycle):
/* output high nibble first */
[...large inefficient way to write a contiguous series of bits...]
lcd_e_toggle();
/* output low nibble */
[...large inefficient way to write a contiguous series of bits...]
lcd_e_toggle();
This also happens in the initialization routine, AFAICS, but not in the read routine (where a different mistake is found).
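To make the difference concrete, here is a minimal sketch that uses a plain variable as a stand-in for the PIC latch register. fake_lat, E_PIN, set_high(), set_low(), lcd_e_pulse() and e_is_high() are made-up names, not from your code, and _BV is defined locally because I don't know whether your toolchain provides it. A single toggle only flips the line; a full strobe needs a high followed by a low.

```c
#include <stdint.h>

#define _BV(bit) (1u << (bit))
#define E_PIN 2                         /* hypothetical bit position of E */

static volatile uint8_t fake_lat = 0;   /* stand-in for the real LAT register */

#define toggle(LAT, PIN)   ((LAT) ^= _BV(PIN))
#define set_high(LAT, PIN) ((LAT) |= _BV(PIN))
#define set_low(LAT, PIN)  ((LAT) &= (uint8_t)~_BV(PIN))

/* Placeholder: replace with a real delay of at least the E pulse width. */
static void lcd_e_delay(void) { }

/* One complete strobe: E goes high, stays high for the delay, then goes low. */
static void lcd_e_pulse(void)
{
    set_high(fake_lat, E_PIN);
    lcd_e_delay();
    set_low(fake_lat, E_PIN);
}

/* Helper for inspecting the simulated line state. */
static int e_is_high(void)
{
    return (fake_lat & _BV(E_PIN)) != 0;
}
```

With this, one call to toggle() leaves E high (if it started low), while lcd_e_pulse() always leaves it low again, which is what the write code expects.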
The E line of a 44780 needs to go High, stay High for a minimum of 500ns (at 5V, more at a lower supply), then go Low.
For writing you need to set E High, set the output data, wait a little (maybe not needed, I don't know your PIC's speed), then set it Low.
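A rough sketch of that write sequence in 4-bit mode, run against a tiny simulated LCD instead of real pins. Everything here (fake_data, fake_e, latched[], lcd_put_nibble(), lcd_write_nibble(), lcd_write_byte()) is hypothetical naming of mine, and the simulation only models one thing: the controller captures the data lines on the falling edge of E.

```c
#include <stdint.h>

static uint8_t fake_data = 0;     /* stand-in for the four data lines */
static uint8_t fake_e = 0;        /* stand-in for the E line */
static uint8_t latched[2];        /* nibbles the simulated LCD captured */
static int     latched_count = 0;

/* Placeholder: replace with a real delay of at least the E pulse width. */
static void lcd_e_delay(void) { }

static void lcd_e_high(void) { fake_e = 1; }

/* Simulated falling edge: the LCD captures whatever is on the data lines. */
static void lcd_e_low(void)
{
    if (fake_e && latched_count < 2)
        latched[latched_count++] = fake_data;
    fake_e = 0;
}

/* Put a nibble on the (simulated) data lines. */
static void lcd_put_nibble(uint8_t nibble) { fake_data = nibble & 0x0F; }

/* E high, data out, wait, E low: one nibble transferred. */
static void lcd_write_nibble(uint8_t nibble)
{
    lcd_e_high();
    lcd_put_nibble(nibble);
    lcd_e_delay();
    lcd_e_low();
}

/* High nibble first, then low nibble, each with its own full E pulse. */
static void lcd_write_byte(uint8_t value)
{
    lcd_write_nibble(value >> 4);
    lcd_write_nibble(value & 0x0F);
}
```

On real hardware lcd_put_nibble() would write your port bits and lcd_e_delay() would need an actual delay; the ordering of the four steps is the point.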
For reading: set E High, wait a little (see above), read the data, set E Low. What you are doing instead is: read data, E=H, wait, E=L. That will not work, since the 44780's internal state machine is tied to E transitions:
data = LCD_PORT << 4; /* read high nibble first */
lcd_e_high();
lcd_e_delay();
lcd_e_low();
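For comparison, a sketch of the read done in the right order, again against a simulated LCD rather than a real port. The names sim_value, sim_phase, sim_e, lcd_port_read(), lcd_read_nibble() and lcd_read_byte() are mine, and the behaviour of the idle pins (reading back as 0x0F when E is low) is just an assumption of the simulation.

```c
#include <stdint.h>

static uint8_t sim_value = 0;   /* byte the simulated LCD will return */
static int     sim_phase = 0;   /* 0 = high nibble next, 1 = low nibble next */
static uint8_t sim_e = 0;       /* stand-in for the E line */

/* Placeholder: replace with a real delay covering the data access time. */
static void lcd_e_delay(void) { }

static void lcd_e_high(void) { sim_e = 1; }

/* Simulated falling edge: the LCD moves on to the other nibble. */
static void lcd_e_low(void)
{
    if (sim_e)
        sim_phase ^= 1;
    sim_e = 0;
}

/* The LCD only drives the data pins while E is high.
   0x0F models undriven pins (an assumption of this simulation). */
static uint8_t lcd_port_read(void)
{
    if (!sim_e)
        return 0x0F;
    return sim_phase ? (uint8_t)(sim_value & 0x0F) : (uint8_t)(sim_value >> 4);
}

/* E high, wait, sample the data while E is still high, then E low. */
static uint8_t lcd_read_nibble(void)
{
    uint8_t n;
    lcd_e_high();
    lcd_e_delay();
    n = lcd_port_read() & 0x0F;
    lcd_e_low();
    return n;
}

/* High nibble first, then low nibble. */
static uint8_t lcd_read_byte(void)
{
    uint8_t hi = lcd_read_nibble();
    uint8_t lo = lcd_read_nibble();
    return (uint8_t)((hi << 4) | lo);
}
```

Sampling the port before raising E, as in the snippet quoted above, would read the pins while the LCD is not driving them.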
These are the first two things I found. I did not check much further (e.g. init sequence, display addressing) but these should get you going.
Another remark:
The code tries to take care of data bits spread across different ports and positions, but is not very consistent, e.g. in lcd_read() if the bits are on the same port they are assumed to be adjacent. I would write one version for the generic case and one for the case where the bits are adjacent on the same port, and select between them at compile time.
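Something along these lines is what I have in mind, as a sketch only. The switch LCD_DATA_ADJACENT, the pin positions, and the fake_port_a / fake_port_b variables (stand-ins for real port registers) are all invented for illustration; your actual pin map will differ.

```c
#include <stdint.h>

/* Stand-ins for two real port registers (hypothetical). */
volatile uint8_t fake_port_a = 0;
volatile uint8_t fake_port_b = 0;

#define LCD_DATA_ADJACENT 1   /* set to 0 when the data pins are scattered */

#if LCD_DATA_ADJACENT

/* Fast path: D4..D7 are four consecutive bits of one port. */
#define LCD_DATA_PORT  fake_port_a
#define LCD_DATA_SHIFT 2      /* D4 on bit 2, D5..D7 on bits 3..5 (assumed) */
#define LCD_DATA_MASK  (0x0Fu << LCD_DATA_SHIFT)

static void lcd_put_nibble(uint8_t n)
{
    /* Clear the four data bits, then OR in the new nibble in one write. */
    LCD_DATA_PORT = (uint8_t)((LCD_DATA_PORT & ~LCD_DATA_MASK) |
                              ((n & 0x0Fu) << LCD_DATA_SHIFT));
}

#else

/* Generic path: each data bit may live on any port and any pin. */
static void put_bit(volatile uint8_t *port, uint8_t pin, uint8_t on)
{
    if (on)
        *port |= (uint8_t)(1u << pin);
    else
        *port &= (uint8_t)~(1u << pin);
}

static void lcd_put_nibble(uint8_t n)
{
    put_bit(&fake_port_a, 0, n & 0x01);   /* D4 (assumed position) */
    put_bit(&fake_port_b, 5, n & 0x02);   /* D5 (assumed position) */
    put_bit(&fake_port_a, 7, n & 0x04);   /* D6 (assumed position) */
    put_bit(&fake_port_b, 1, n & 0x08);   /* D7 (assumed position) */
}

#endif
```

The rest of the driver then only ever calls lcd_put_nibble() (and a matching read helper), so the pin layout is decided in one place.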
Hope this helps.