
Python text file processing - extracting data based on data found


Python is fairly new to me. I have a considerable amount of experience in lower-level languages, generally for embedded designs.

The goal is to extract pieces of data from a text-based G-Code program. In general, I am looking for certain strings and then gathering other elements before and after each instance. I am hoping for an overview of the typical ways to do this in Python with reasonable elegance.

For the past couple of days, I have been doing a general education effort in Python and trying to get my head separated from my usual C/C++ thoughts. I have opened a few sample text files and used some standard library methods like find, index, and regular expressions. That has given me some ideas on how to iterate through a text file in a basic way, find things in a basic way, etc.

Some sample data:

--- Code: ---O0401( PGM-M05-610-0325 SB2 V-LOCK H-PLATE B-V2C S1 )
( DATE - SEP. 04 2018 )
( TIME - 9:24 PM )
G0 G17 G40 G80 G90 G94 G98
G0 G28 G91 Z0.
(  TOOL-2  3/8 FLAT ENDMILL VIPER    D OFF-2  LEN-2  DIA-.375 )
T2 M6
G0 G90 G54 X-2.8392 Y1.6562 S15000 M3
G43 H2 Z.24 /M8
G1 Z-.02 F150.
X2.725 F200.
--- End code ---

In this example, I would be looking for the string "M6", get its index, and search forward/backward until I find the 'T' followed by a number, that number would become an element needed later.

G-Code has flexibility and programmers have various styles. M6 calls a tool change on a CNC mill. T is the number of the tool. It can be written in a few ways.
T6 M6
M6 T6
....are all valid commands to call a tool change for the #6 tool.

Ultimately, I would be extracting a lot more of the individual elements of the g-code for analysis but if I can find the M6 and the T6 close to it, I can deal with all the rest. There is a lot of repetition, so in general, I would be looking for an event and then examining the data surrounding that event.

Regular Expressions seem to be a powerful way to filter and search text files. Not sure if they are the best for this type of application. The learning curve on regular expressions is not trivial - the powerful nature ensures a long list of syntax rules.

I also used what appears to be a simpler method, .find(), where I could get the index position of a string and presumably walk around that index until I find what I am looking for.
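A minimal sketch of that .find() idea, for comparison with the regex route. The function name tool_near_m6 is made up for illustration, and it assumes the T word is separated from M6 by whitespace, so "T6M6" written without a space would be missed. Rather than walking character by character around the index, it uses .find() only to decide whether the line is interesting and then checks the whitespace-separated words.

```python
def tool_near_m6(line):
    """Return the tool number on a line that contains M6, or None."""
    if line.find("M6") == -1:      # .find() returns -1 when the text is absent
        return None
    for word in line.split():      # split() never yields empty words
        # A tool word is "T" followed only by digits, e.g. "T2" or "T06"
        if word[0] == "T" and word[1:].isdigit():
            return int(word[1:])   # int() drops leading zeroes
    return None
```

Both "T6 M6" and "M6 T6" give 6 here, while a comment line such as "(  TOOL-2 ... )" gives None because it contains no M6.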

I feel like my experience in C has me over-thinking this in Python terms where there are libraries galore to deal with this sort of problem. Since I am trying to drastically improve my Python skill, the goal here is NOT to have someone code it for me, but rather point me in the right direction. I need to learn this, not just copy/paste code that I do not understand.

Grateful in advance for any guidance.

Python is all and good, and I don't know the exact end goal but .....
egrep "M6|M06"  filenamehere will give you a list of all of 'em

( egrep -n  "M6|M06"  filenamehere   will show line numbers)

Useless in enhancing your Python skills, but you do end up with every M6 and M06 in your file in case you have work to do :)
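For what it is worth, roughly the same listing can be sketched in Python. This is only an approximation of the egrep -n command above, not a replacement for it, and the names M6_PATTERN and grep_m6 are invented for the example.

```python
import re

# Same alternation as the egrep command; like egrep, it also hits "M60" etc.
M6_PATTERN = re.compile(r"M6|M06")

def grep_m6(lines):
    """Return (line_number, text) pairs, numbered from 1 like egrep -n."""
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start=1)
            if M6_PATTERN.search(line)]
```

An open file can be passed in directly, since iterating over a file yields its lines: open the file with a with statement, call grep_m6(f), and print each pair.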

I might be one of the few that doesn't do Python, so don't expect any code coming your way, but I have screwed around with strings.
What it sounds like you want to do is find out whether a line string contains one of the tool-change strings you are looking for and, if it does, subtract the tool-change string from the line string to leave you with the tool-selection string.

If that's what you are after ...

don't take the link personally :) but these should get you going :

Here is a small example for Python 3 (make sure you have a recent version due to the use of f-strings!) that I have concocted:

--- Code: ---import re

# Add new patterns as required here (raw strings keep \d and \s intact)
regexs = [re.compile(r"T(\d+)\s*?M6"),      # T followed by a number, followed by optional whitespace and M6
          re.compile(r"M6\s*?T(\d+)")]           # M6 followed by optional whitespace, then T and a number

with open("test.gcode") as f:
    lines = f.readlines()       # Let's be lazy and read the entire file into memory
                                # It can obviously be done line by line too, since we are analyzing
                                # the content by lines anyway

    for line_no, line in enumerate(lines):
        for r in regexs:
            match = r.search(line)  # Search through the string to see if there is a match for our regex

            if match is not None:
                # assume that there is only a single match on a line - group #1
                group = match.group(1)  # Group 0 is everything matched by the regex; groups 1-n are the contents of the parentheses
                number = int(group)           # Get rid of any leading zeroes and converts it to number

                print(f"{line_no}: Match at characters {match.start(1)}:{match.end(1)}, found: {number}")

--- End code ---

(if the formatting/whitespace gets mangled by the forum, here is a better copy: )

It is not the only way to do it, likely not the most efficient either, and there is zero error checking for brevity, but it does what you are after. It assumes your GCode is in a file "test.gcode" in the same directory. If I run it on your example GCode, it prints:

--- Code: ---$ python3
10: Match at characters 1:2, found: 2
--- End code ---

(10 is a line number, 1:2 - character position where the match on that line is and finally the number following the T)

It uses regular expressions, which are probably the easiest way to code this if there are multiple ways the things could be written - different order, optional whitespace, leading zeroes or not, etc. Doing this manually by searching through the string would get painful really fast, even though Python has good facilities to do that.

The script has the patterns in a list and runs through them, the idea is that you could have a lot of different patterns so a single regex with alternatives would get unwieldy really fast. Instead of printing you can then call some function to process the data or even modify the string and output a modified version - up to you.
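As a rough sketch of the "call some function instead of printing" idea: the version below collects the results in a list so other code can use them. The names TOOL_CHANGE_PATTERNS and find_tool_changes are invented here, and the patterns are a variation on the ones in the script, with M0?6 added on the assumption that "M06" should count as well, and \b so that something like "M60" does not.

```python
import re

# Hypothetical pattern list; extend it as more writing styles turn up.
TOOL_CHANGE_PATTERNS = [
    re.compile(r"T(\d+)\s*M0?6\b"),   # T number, optional whitespace, M6 or M06
    re.compile(r"\bM0?6\s*T(\d+)"),   # M6 or M06, optional whitespace, T number
]

def find_tool_changes(lines):
    """Return (line_index, tool_number) for every tool change found."""
    changes = []
    for line_no, line in enumerate(lines):   # line_index counts from 0
        for pattern in TOOL_CHANGE_PATTERNS:
            match = pattern.search(line)
            if match is not None:
                changes.append((line_no, int(match.group(1))))
                break   # assumes at most one tool change per line
    return changes
```

The returned list can then be fed to whatever analysis comes next, for example looking up the spindle speed on the lines after each tool change.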

Regular expressions are not that complex if you keep it to small patterns. I have found this tool really useful for testing stuff out quickly:

If you try the expressions I have used there you will get a detailed explanation of what they do as well.


--- Quote from: janoc on October 13, 2019, 09:26:18 pm ---Here is a small example for Python 3 (make sure you have a recent version due to the use of f-strings!) that I have concocted:
--- End quote ---

Wow, thanks.....trying this example. RegEx looks like it is worth the cost of learning for this type of data extraction. I ran my full file through it and it works well. The code is concise.

Since it returns the line number, I can use that to look for additional data before and after the M6. It also looks like RegEx is an easy way to ignore g-code comments that are encapsulated in parentheses:  (comment with M6 T6)
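Both ideas can be sketched in a few lines. This is only one way to do it, under the assumption that comments are never nested and always close on the same line; strip_comments and context are names made up for the example.

```python
import re

# "(" followed by anything except ")" and then ")": one unnested comment
COMMENT = re.compile(r"\([^)]*\)")

def strip_comments(line):
    """Remove parenthesised comments and surrounding whitespace."""
    return COMMENT.sub("", line).strip()

def context(lines, line_no, before=1, after=2):
    """Return the lines around index line_no, clipped at the file start."""
    start = max(0, line_no - before)   # avoid a negative slice start
    return lines[start:line_no + after + 1]
```

Running strip_comments on each line before searching means a comment like (comment with M6 T6) can no longer produce a false match, and context(lines, line_no) hands back the neighbouring lines for further inspection.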

...continuing to experiment.

