It isn't as nasty as it may sound.
The lexer (the code that uses the compiled regular expressions _command and _comment) can emulate how a typical G-code parser parses the code. That is, it basically just needs to recognize the next token and extract it from the line.
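A minimal sketch of that kind of lexer, for illustration only. The real _command and _comment patterns aren't shown in this thread, so the two regexps below are assumptions: a command word is a letter followed by a signed number, and a comment is either a parenthesized span or everything after a semicolon. tokenize_gcode() is a hypothetical name.

```python
import re

# Assumed patterns; the thread does not show the real _command and _comment.
# A command word is a letter followed by a signed number, e.g. G1, X-2.5, F.5
_command = re.compile(r"\s*([A-Za-z])\s*([-+]?(?:\d+\.?\d*|\.\d+))")
# A comment is a parenthesized span or everything after a semicolon.
_comment = re.compile(r"\s*(\([^)]*\)|;.*)")


def tokenize_gcode(line):
    """Yield ("word", (letter, value)) and ("comment", text) tokens, left to right.

    At each position, recognize the next token, consume it and move on,
    the way a typical G-code parser would.
    """
    pos = 0
    while pos < len(line):
        m = _comment.match(line, pos)
        if m:
            yield ("comment", m.group(1))
        else:
            m = _command.match(line, pos)
            if m:
                yield ("word", (m.group(1).upper(), float(m.group(2))))
            elif line[pos:].strip() == "":
                return  # only trailing whitespace is left
            else:
                raise ValueError(f"unexpected input at column {pos}: {line[pos:]!r}")
        pos = m.end()


print(list(tokenize_gcode("G1 X10.5 Y-2 (move) ; done")))
# [('word', ('G', 1.0)), ('word', ('X', 10.5)), ('word', ('Y', -2.0)),
#  ('comment', '(move)'), ('comment', '; done')]
```

Anything the two patterns don't recognize raises ValueError right away instead of being skipped, which is usually what you want from a lexer.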
...
Sure, I have never said it can't be done. However, if I faced a task like that where a library was an option, I would completely forget about futzing around with regexps and splitting strings and use a proper parser library to write the lexer instead, e.g. PLY (https://www.dabeaz.com/ply/ply.html), textX (https://github.com/textX/textX) or maybe pyparsing (https://github.com/pyparsing/pyparsing). It would be much cleaner and more maintainable.
Or look at that gcode parsing library someone mentioned above.
However, that's getting waaay too far ahead of what the OP was after.
How can it be a byte array?
Like I wrote, if you obtain the G-code data via a socket (a TCP/IP connection, a Unix domain socket, or even a character device), you almost always need to open it in binary mode. Instead of forcing the user to remember to convert the bytearray to str via decode(), I added the two lines I thought would avoid that misstep.
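For illustration, a sketch of reading G-code from a socket opened in binary mode and decoding it once, at the I/O boundary, so the rest of the code only ever sees str. read_gcode_lines() is a hypothetical helper, and ASCII as the wire encoding is an assumption about the data source.

```python
import socket


def read_gcode_lines(sock, encoding="ascii"):
    """Hypothetical helper: yield G-code lines from a connected socket as str.

    The socket is read in binary mode, so every line arrives as bytes;
    it is decoded here, once, instead of relying on callers to remember.
    """
    with sock.makefile("rb") as stream:
        for raw in stream:                       # raw is bytes, e.g. b"G21\r\n"
            yield raw.decode(encoding).rstrip("\r\n")


# Usage with a local socket pair standing in for a real connection:
a, b = socket.socketpair()
a.sendall(b"G21\r\nM6 T2\n")
a.close()                                        # EOF for the reading side
print(list(read_gcode_lines(b)))                 # ['G21', 'M6 T2']
b.close()
```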
Look. You need to think about what kind of code others will write based on your code. Consider the case when a user supplies "a random object" to the GCodeLine() constructor. The only use case that makes sense is when the user intends that data to be treated as an ASCII string and then parsed as G-code.
This is my assumption, based on my experience with how others use the awk and Python code I've written to mangle HPGL.
I could be wrong, but that is the basis for that initial choice.
The problem is that Python isn't awk. If someone gets a binary buffer from a socket, then the proper way is to run decode("ascii") on it and be done with it, not to blindly convert random objects to strings all over the place. The issue is not the user intentionally passing e.g. an object to the GCodeLine constructor, but doing it by mistake (easy to do, with Python being dynamically typed and function arguments not having declared types). Instead of getting an immediate exception, your code will happily convert it and continue running - making for a ton of head scratching later when trying to figure out why it isn't doing what it is supposed to do. Imagine the "fun" of finding a bug like that where only 2-3 lines of a huge file get corrupted/misparsed like this. That's what makes your approach a really terrible example.
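To make the point concrete: the GCodeLine class below is a hypothetical, stripped-down stand-in (the real constructor isn't shown here) that rejects anything that isn't already str. Compare that with what blind conversion does to a bytes object: str(b"G1 X10") returns "b'G1 X10'", with the b prefix and quotes included, and a parser would then happily chew on that.

```python
class GCodeLine:
    """Hypothetical, minimal stand-in for the GCodeLine class discussed above.

    It refuses anything that isn't already str, so a wrong argument fails
    loudly at the call site instead of being silently stringified.
    """

    def __init__(self, text):
        if not isinstance(text, str):
            raise TypeError(
                f"GCodeLine expects str, got {type(text).__name__}; "
                "decode binary data first, e.g. data.decode('ascii')"
            )
        self.text = text


print(GCodeLine("G1 X10").text)   # G1 X10

# What blind str() conversion would have produced instead:
print(str(b"G1 X10"))             # b'G1 X10'

try:
    GCodeLine(b"G1 X10")          # the mistake now fails immediately
except TypeError as exc:
    print(exc)
```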
The problem is that Python by default decodes text input from the character set of the user's locale into Unicode. For example, if I have a file with the byte \xA4 in it, my Python code will get it in a string as U+20AC (€) if my locale uses ISO-8859-15, but as U+00A4 (¤) if my locale uses Windows-1252. If the locale uses UTF-8, Python will raise UnicodeDecodeError. Because of this, I wanted the code to strip only those Unicode characters that correspond to ASCII whitespace.
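A small demonstration of both points, using the 0xA4 byte from the example. strip_ascii_ws() is a hypothetical helper, not the code from the thread; the detail that matters is that str.strip() with no argument also removes Unicode whitespace such as U+00A0 (no-break space), while passing string.whitespace limits it to ASCII whitespace. Output goes through ascii() so it prints the same regardless of the console encoding.

```python
import string


def strip_ascii_ws(text):
    """Hypothetical helper: strip only ASCII whitespace (string.whitespace) from both ends."""
    return text.strip(string.whitespace)


raw = b"\xa4"
print(ascii(raw.decode("iso8859_15")))  # '\u20ac'  EURO SIGN
print(ascii(raw.decode("cp1252")))      # '\xa4'    CURRENCY SIGN
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8:", exc.reason)         # a lone 0xA4 byte is not valid UTF-8

line = "\u00a0G1 X10\u00a0\r\n"         # no-break spaces around the code
print(ascii(line.strip()))              # 'G1 X10'  (U+00A0 removed as well)
print(ascii(strip_ascii_ws(line)))      # '\xa0G1 X10\xa0'
```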
If you had done the conversion correctly, i.e. used decode("ascii", "replace"), you wouldn't get a decoding error on bytes that aren't valid ASCII (they would be replaced with the U+FFFD replacement character, or dropped entirely if you use "ignore" instead of "replace"), and you wouldn't need to work around one bug by introducing a potential second one.
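For reference, this is what the two error handlers do when decoding a line with one non-ASCII byte in it. The sample line is made up for illustration.

```python
buffer = b"G1 X10 (\xa4)\n"   # one byte that is not valid ASCII

replaced = buffer.decode("ascii", "replace")  # invalid byte -> U+FFFD REPLACEMENT CHARACTER
ignored = buffer.decode("ascii", "ignore")    # invalid byte silently dropped

print(ascii(replaced))  # 'G1 X10 (\ufffd)\n'
print(ascii(ignored))   # 'G1 X10 ()\n'
```

Either way, no exception is raised and the result is a plain str that only contains characters the parser can reason about.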
Heck, even your approach with str() can do it, because str() has the errors argument too, so str(buffer, "ascii", "replace") works as well - with the caveat that the one-argument str(obj) will happily convert arbitrary objects (and not just the intended buffers) and hide bugs in the code, as pointed out elsewhere.
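A quick illustration of the difference between the two forms of str(). The buffer contents are made up. With an encoding, str() decodes just like bytes.decode() and rejects non-buffer objects; without one, it converts anything.

```python
buffer = bytearray(b"M6 T2 (\xa4)")   # e.g. what a binary socket read might give you

# With an encoding, str() decodes exactly like bytes.decode():
text = str(buffer, "ascii", "replace")
print(ascii(text))                    # 'M6 T2 (\ufffd)'

# With an encoding, non-buffer objects are rejected...
try:
    str(42, "ascii")
except TypeError as exc:
    print("TypeError:", exc)

# ...but the one-argument form converts anything, bytes included:
print(str(42))                        # 42
print(str(b"M6 T2"))                  # b'M6 T2'
```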
I am interested in discussing what kind of choices make sense, but honestly, I'm getting pretty pissed off at those choices being called "bugs" even when I've already explained their rationale. Instead of discussing that, you keep calling the code "buggy" and "overly complicated". OP is doing this to learn, not to just catch tool changes!
I've tried being civil and tried to get something constructive going, but nothing seems to work with you, so I'll just ignore you from now on.
I am sorry? You do realize that explaining a rationale for something doesn't make the code any less incorrect, right? I am trying to be constructive here, explaining my reasoning at length and giving examples of how it actually should be done instead. I guess you have missed that part, being busy getting offended. But what do I know, only having used Python professionally for some 18-something years ...
I rest my case, I hope the OP got what they needed. rx8pilot, feel free to PM me if you have any other Python-related questions.