
Python text file processing - extracting data based on data found


Maybe this is too simple a solution, but let's try:
install the pygcode library:
pip install pygcode
and use this simple script:

--- Code: ---from pygcode import Line

# read G-code lines from the file 'part.gcode':
with open('part.gcode', 'r') as fh:
    commands = []  # flat list of collected G-code words and comments
    # scan the whole file for G-code commands
    for line_text in fh.readlines():
        line = Line(line_text)  # parse one line of G-code

        # print(line)               # prints the line (with cosmetic changes)
        # line.block.gcodes         # the line's list of gcodes
        # line.block.modal_params   # parameters not assigned to a gcode,
        #                           # assumed to be motion modal parameters
        if line.block.words:  # the line contains at least one G-code word
            commands = commands + line.block.words  # add the words to the list
        # if line.block.modal_params:
        #     print("M ", line.block.modal_params)
        # if line.block.gcodes:
        #     print("G ", line.block.gcodes)
        if line.comment:
            # line.comment.text holds the comment text
            commands.append(line.comment)  # add the comment to the list

    # scan is done: print the whole list of commands, one per line:
    print(*commands, sep='\n')

--- End code ---

It will scan/parse the 'part.gcode' file and build a list in which each element is a separate G-code command.
You will then be able to walk the list, check (using a regexp) whether an element is the token you are searching for, and (as you requested) scan the list backward/forward to see whether your other token (Mxx, Ty) is nearby.
Not an elegant or efficient solution, but a workable one.
P.S.: the parser is more intelligent than my script shows: you can uncomment some of the commented lines to get some hints.
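To illustrate the backward/forward scan described above, here is a minimal sketch. It assumes `commands` is the flat list built by the script; the list contents and the helper name `find_near` are just illustrative, not part of pygcode:

```python
import re

# Hypothetical result of the pygcode scan above (tool change T4 then M06):
commands = ['T4', 'M06', 'G00', 'X1.0', 'M03']

def find_near(commands, token_re, neighbour_re, window=2):
    """Return (index, neighbour_index) pairs where an element matching
    token_re has an element matching neighbour_re within 'window' positions."""
    hits = []
    for i, cmd in enumerate(commands):
        if re.fullmatch(token_re, str(cmd)):
            lo = max(0, i - window)
            hi = min(len(commands), i + window + 1)
            for j in range(lo, hi):
                if j != i and re.fullmatch(neighbour_re, str(commands[j])):
                    hits.append((i, j))
    return hits

print(find_near(commands, r'M0?6', r'T\d+'))  # [(1, 0)]: M06 at index 1, T4 at index 0
```

Since pygcode objects stringify to their G-code text, `str(cmd)` lets the same regexp logic work on both words and comments.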

Nominal Animal:
Good point, eliocor.  If one were to parse, say, CSV or XML files in Python, one should definitely use the existing libraries, too.

Even if one wants to write a parser from scratch, say for learning purposes, or just to get a better understanding of g-code, looking at existing code, and especially the issues others have found, is always useful.

The comments in the code also reveal a lot about different g-code dialects, and how different machines parse and process g-code.  Even if not writing your own, reading the comments in the code is quite informative.

BTW, the libraries will normalize the G-code, converting it from e.g. 'M6' to 'M06', which eases the token analysis!
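For readers who want to do the same normalization without the library, a minimal sketch of the idea (this is not pygcode's actual code, just a zero-padding illustration):

```python
import re

def normalize_word(word):
    """Zero-pad single-digit G/M codes so 'M6' becomes 'M06' and 'g1' becomes 'G01'.
    Sketch only; a real normalizer would also handle decimals like G38.2."""
    m = re.fullmatch(r'([GMgm])(\d)', word)
    if m:
        return f"{m.group(1).upper()}0{m.group(2)}"
    return word.upper()

print(normalize_word('M6'))   # M06
print(normalize_word('g1'))   # G01
print(normalize_word('G90'))  # G90 (already two digits, only uppercased)
```

With the codes normalized, token comparison becomes a plain string equality check instead of a regexp with alternatives.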


--- Quote from: Nominal Animal on October 17, 2019, 07:16:27 pm ---It isn't as nasty as it may sound.

The lexer (the code that uses the compiled regular expressions _command and _comment) can emulate how a typical G-code parser parses the code.  That is, it basically just needs to understand what the next token is, and extract it from the line.

--- End quote ---

Sure, I have never said it can't be done. However, if I faced a task like that where this was possible, I would completely forget about futzing around with regexps and splitting strings, and instead use a proper parser library to write the lexer, e.g. PLY, textX, or maybe pyparsing. It would be much cleaner and more maintainable.

Or look at that gcode parsing library someone mentioned above.

However, that's getting waaay too far ahead of what the OP was after.

--- Quote from: Nominal Animal on October 17, 2019, 07:16:27 pm ---
--- Quote from: janoc on October 17, 2019, 02:00:41 pm ---How can it be a byte array?
--- End quote ---
Like I wrote, if you obtain the G-code data via a socket (a TCP/IP connection, a Unix domain socket, or even a character device), you almost always need to open it in binary mode.  Instead of forcing the user to remember to convert the bytearray to str via decode(), I added the two lines I thought would avoid that misstep.

Look.  You need to think about what kind of code others will write based on your code.  Consider the case when a user supplies "a random object" to the GCodeLine() constructor.  The only use case that makes sense is when the user intends that data to be treated as an ASCII string, then parsed as G-code.
This is my assumption based on my experience on how others use awk and python code I've written to mangle HPGL.
I could be wrong, but that is the basis for that initial choice.

--- End quote ---

The problem is that Python isn't awk. If someone gets a binary buffer from a socket then the proper way is to run decode("ascii") on it and be done with it. Not blindly convert random objects to strings all over the place. The issue is not the user intentionally passing e.g. an object to the GCodeLine constructor but doing it by mistake (easy to do, with Python being dynamically typed and function arguments not having types declared). Instead of getting an immediate exception your code will happily convert it and continue running - making for a ton of head scratching later when trying to figure out why it isn't doing what it is supposed to do. Imagine the "fun" of finding a bug like that where only 2-3 lines of a huge file get corrupted/misparsed like this. That's what makes your approach a really terrible example.
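The contrast between an explicit decode() and a blanket str() conversion can be shown in a few lines (a sketch of the failure mode described above, not code from either poster):

```python
data = b'G01 X1.0\n'

# Explicit conversion: bytes -> str, and it fails loudly on the wrong type.
line = data.decode('ascii')
assert line == 'G01 X1.0\n'

# Passing a wrong object raises immediately at the conversion site:
try:
    (42).decode('ascii')  # ints have no decode() method
except AttributeError as exc:
    print('caught:', exc)

# Whereas str() happily "converts" anything, hiding the bug until later:
print(str(42))    # '42' -- silently fed onward as (bogus) g-code
print(str(data))  # "b'G01 X1.0\n'" -- the b'...' wrapper leaks into the text!
```

With str(), the error surfaces far from its cause, in whatever code later chokes on the mangled text; with decode(), the traceback points at the actual mistake.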

--- Quote from: Nominal Animal on October 17, 2019, 07:16:27 pm ---Problem is that Python converts text input from the character set used by the user's locale to Unicode.  For example, if I have a file with \xA4 in it, my Python code will provide it in a string as U+20AC (€) if my locale uses ISO-8859-15, but as U+00A4 if my locale uses Windows-1252.  Using UTF-8, Python will raise UnicodeDecodeError.  Because of this, I wanted the code to strip only those Unicode characters that correspond to ASCII whitespace.

--- End quote ---

If you had done the conversion correctly, i.e. used decode("ascii", "replace"), then you wouldn't get a decoding error on the characters that aren't valid ASCII (they would be replaced with the Unicode replacement character U+FFFD, or dropped entirely if you use "ignore" instead of "replace"), and you wouldn't need to work around one bug by introducing a potential second one.

Heck, even your approach with str() can do it, because str() has the errors argument too, so str(buffer, "ascii", "replace") works as well - with the caveat that it will happily convert arbitrary objects (and not just the intended buffers) and hide bugs in the code, as pointed out elsewhere.
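The error-handler behaviour discussed above, demonstrated on a byte that is not valid ASCII:

```python
raw = b'G00 \xa4 X1'  # \xa4 is outside the ASCII range

print(raw.decode('ascii', 'replace'))  # 'G00 \ufffd X1' -- U+FFFD replacement char
print(raw.decode('ascii', 'ignore'))   # 'G00  X1'       -- offending byte dropped
print(str(raw, 'ascii', 'replace'))    # same result via the str() constructor

try:
    raw.decode('ascii')                # strict mode (the default): raises
except UnicodeDecodeError as exc:
    print('strict decode failed:', exc.reason)
```

For G-code this is usually safe, since anything outside ASCII can only appear in comments anyway.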

--- Quote from: Nominal Animal on October 17, 2019, 07:16:27 pm ---I am interested in discussing what kind of choices make sense, but honestly, I'm getting pretty pissed off at those choices being called "bugs" even when I've already explained their rationale.  Instead of discussing that, you keep calling the code "buggy" and "overly complicated".  OP is doing this to learn, not to just catch tool changes!

I've tried being civil, and try to get something constructive going, but nothing seems to work with you, so I'll just ignore you from now on.  :-+

--- End quote ---

I am sorry? You do realize that explaining the rationale for something doesn't make the code any less incorrect, right?  I am trying to be constructive here, explaining my reasoning at length and giving examples of how it actually should be done instead. I guess you have missed that part, being busy getting offended. But what do I know, having only used Python professionally for some 18-odd years ...

I rest my case, I hope the OP got what they needed. rx8pilot, feel free to PM me if you have any other Python-related questions.


--- Quote ---Or look at that gcode parsing library someone mentioned above. 
However, that's getting waaay too far ahead of what the OP was after.
--- End quote ---

I'm sorry, but it is EXACTLY what the OP asked for:
--- Quote --- I also used what appears to be a simpler method of .find() where I could get the index position of a string and presumably walk around that index until I find what I am looking for.
--- End quote ---

Maybe not elegant, but good enough without ranting about it!

