Python text file processing - extracting data based on data found


--- Quote from: eliocor on October 18, 2019, 07:48:01 pm ---
--- Quote ---Or look at that gcode parsing library someone mentioned above. 
However, that's getting waaay too far ahead of what the OP was after.
--- End quote ---

I'm sorry, but it is EXACTLY what the OP asked for:

--- End quote ---

For clarity's sake, parsing G-code is the exercise I chose in order to learn how to parse just about any text data. The primary end product is:

--- Code: ---from eevblog import skills

--- End code ---

At the end of the day, I am looking to have a solid toolbox to develop code to parse all sorts of data, where G-code is simply the first step. I had found the pygcode library but was not able to fully understand it at first. After spending a number of hours studying general Python syntax and libraries, I think I can follow along now and learn something from how the author approached the task. Others have already suggested stepping through other parsing libraries for various formats - I agree that is a good way to expose some of the key concepts and how they are practically implemented in a Python environment.

Now that the weekend is here... I can leave all the C coding at the office and dive back into Python. This conversation has warmed up nicely and I look forward to some learning experiments.


--- Quote from: janoc on October 15, 2019, 06:54:51 pm ---While it is possible to create a regular expression that would grab only the ones that end with 6 and ignore everything else it will be needlessly complex and it is unlikely you will only ever be interested in the M6 command. If you are going to look for M1, M4 or others later, you will have to define specific regexp for each = pain in the butt to write, slow (matching regexps is fairly expensive) and not maintainable code, with a ton of regexps that do almost the same thing, differing only in the value they are searching for.
--- End quote ---
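What you are searching for might be a word boundary, "\b". It matches the position between a word character ("\w": a letter, digit or underscore) and a non-word character, or the start or end of the line. That is what actually splits the line where you want it (usually at non-alphanumeric characters such as spaces).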

So lines with M6 or M06 in them would be found using this regex:

--- Code: ---/\bM[0]*6\b/

--- End code ---
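
In Python that could look something like the following. This is only a minimal sketch using the standard re module; the sample lines and the find_tool_changes name are made up for illustration and are not from anyone's actual file. Note that \b needs a non-word character (or the start or end of the line) on each side, so tightly packed code such as T1M6 would not match with this pattern.

```python
import re

# Word-boundary pattern from the post: M, any number of zeros, then 6.
M6_PATTERN = re.compile(r"\bM0*6\b")

# Hypothetical sample lines, made up for illustration.
sample_lines = [
    "N10 T1 M6",
    "N20 M06 T2",
    "N30 M60",
    "N40 M16",
    "N50 G1 X6 Y0",
]

def find_tool_changes(lines):
    """Return the lines that contain M6 or M06 as a whole word."""
    return [line for line in lines if M6_PATTERN.search(line)]

matches = find_tool_changes(sample_lines)
print(matches)
```

M60 and M16 are skipped because the boundary after the 6 (or the missing 6 right after the zeros) stops the match, which is the whole point of using \b here.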

Of course it is also possible to build the pattern from a variable, e.g. /\b$str\b/, where $str can be a regex of its own.
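
The same idea in Python, as a rough sketch: the helper names word_pattern and literal_pattern are my own invention, and the sample lines are made up. If the variable holds plain text rather than a regex, re.escape keeps any special characters from being interpreted.

```python
import re

def word_pattern(fragment):
    """Wrap a regex fragment in word boundaries."""
    # The non-capturing group keeps alternations such as "M0*3|M0*4"
    # inside the two boundaries.
    return re.compile(rf"\b(?:{fragment})\b")

def literal_pattern(text):
    """Same idea, but treat text as a plain string rather than a regex."""
    return re.compile(rf"\b{re.escape(text)}\b")

# M3, M4 or M5, with optional leading zeros (hypothetical example).
spindle = word_pattern(r"M0*[345]")

print(bool(spindle.search("N60 M03 S1200")))
print(bool(spindle.search("N70 M30")))
print(bool(literal_pattern("M6").search("N10 T1 M6")))
```

One pattern built this way can cover a whole family of commands, so there is no need for a separate hand-written regex per command.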

