Receive your bytes into a buffer, wait for a termination sequence (e.g. CRLF) to arrive, and then process the string in its entirety. That's basically how most text-based protocols are handled, because you then have the full state and contents of the request or response to examine at will.
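As a rough sketch of the buffering idea, something like the following could work. The names `line_buf_t`, `line_buf_push` and `LINE_BUF_SIZE` are made up for illustration, and it assumes you call the function once per received byte from wherever your UART bytes arrive (an interrupt handler or a polling loop).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BUF_SIZE 64  /* assumed maximum line length, including the NUL */

typedef struct {
    char   data[LINE_BUF_SIZE];
    size_t len;
} line_buf_t;

/* Feed one received byte into the buffer.
 * Returns true when a complete CRLF-terminated line is available in
 * lb->data (NUL-terminated, CRLF stripped). The caller must process
 * lb->data before pushing the next byte, since the buffer is reused. */
static bool line_buf_push(line_buf_t *lb, uint8_t byte)
{
    if (lb->len >= LINE_BUF_SIZE - 1) {
        /* Overflow: throw away what was collected so far. The rest of the
         * oversized line will show up as a bogus line and should simply
         * fail whatever validation you do on it. */
        lb->len = 0;
    }

    lb->data[lb->len++] = (char)byte;

    if (lb->len >= 2 &&
        lb->data[lb->len - 2] == '\r' &&
        lb->data[lb->len - 1] == '\n') {
        lb->data[lb->len - 2] = '\0'; /* strip CRLF, terminate the string */
        lb->len = 0;                  /* next byte starts a new line */
        return true;
    }
    return false;
}
```

Start with a zeroed `line_buf_t` and, whenever `line_buf_push` returns true, hand `lb.data` to your message handling code.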
This can still be tied into a state machine to ensure that the correct sequence of commands/responses is transmitted/received. You just won't be processing in real time, byte by byte, trying to keep track of where you are within a particular sequence of bytes, which should make things much simpler.
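To illustrate the state machine part, here is a minimal sketch that advances one state per complete line rather than per byte. The protocol is invented for the example (a `HELO` line followed by a line starting with `DATA`), as are the names `proto_state_t` and `proto_step`; substitute whatever sequence your device actually uses.

```c
#include <string.h>

/* Hypothetical two-step exchange: expect "HELO", then "DATA...". */
typedef enum {
    ST_WAIT_HELLO,
    ST_WAIT_DATA,
    ST_DONE,
    ST_ERROR
} proto_state_t;

/* Given the current state and one complete line (terminator already
 * stripped), return the next state. Anything out of sequence is an error. */
static proto_state_t proto_step(proto_state_t state, const char *line)
{
    switch (state) {
    case ST_WAIT_HELLO:
        return strcmp(line, "HELO") == 0 ? ST_WAIT_DATA : ST_ERROR;
    case ST_WAIT_DATA:
        /* Only the first 4 bytes identify the message here. */
        return strncmp(line, "DATA", 4) == 0 ? ST_DONE : ST_ERROR;
    default:
        /* ST_DONE and ST_ERROR are terminal in this sketch. */
        return state;
    }
}
```

Because each call sees a whole message, the states only describe where you are in the conversation, not where you are inside a message.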
You might only look at the first 4 bytes, for example, to see if they match any of the 4-byte commands/responses you are expecting, or maybe you look at 5, or more, or some combination. It all depends on what you need to check to verify a correct command/response at any given state of your state machine. If there are "parameters" contained within the message as well, you should be able to find them by parsing the string and picking out the pieces which live between the appropriate delimiter characters.
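A sketch of that kind of check and parameter extraction might look like this. The command `SETV`, the comma delimiter, and the names `parse_params` and `MAX_PARAMS` are assumptions for the example, not part of any particular protocol.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 4  /* assumed upper limit on parameters per message */

/* Checks that `line` starts with `cmd`, then splits the rest in place at
 * each `delim`, storing a pointer to every parameter in `params`.
 * Returns the number of parameters found, or -1 if the command does not
 * match. The line buffer is modified (delimiters become NULs). Fields
 * beyond max_params are left attached to the last parameter. */
static int parse_params(char *line, const char *cmd, char delim,
                        char *params[], int max_params)
{
    size_t cmd_len = strlen(cmd);
    if (strncmp(line, cmd, cmd_len) != 0) {
        return -1;
    }

    char *p = line + cmd_len;
    if (*p != '\0' && *p != delim) {
        return -1; /* e.g. "SETVX" is not "SETV" */
    }

    int count = 0;
    while (*p == delim && count < max_params) {
        *p++ = '\0';              /* terminate the previous field */
        params[count++] = p;      /* this field starts after the delimiter */
        char *next = strchr(p, delim);
        if (next == NULL) {
            break;                /* last field runs to the end of the line */
        }
        p = next;
    }
    return count;
}
```

For a line such as `SETV,12,34` this yields two parameter strings, which you can then convert with `atoi` or `strtol` as needed.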
It's going to be fiddly and require effort, but that's what you get with microcontrollers.
If what you want is something more cookie-cutter and ready to go, you might want to look at an Arduino or maybe a Raspberry Pi or similar, where there may be other libraries or languages available that are easier to work with and require less groundwork to get your project up and running.