EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: westfw on November 11, 2019, 10:11:08 am

Title: C++ Template Meta Programming for regular expressions?
Post by: westfw on November 11, 2019, 10:11:08 am
As I understand things, regular expressions require that the standard format we're used to be "compiled" into an FSM before actual processing occurs.  I'm curious whether anyone has written a C++ library that does the FSM generation at compile time rather than during execution (presumably using template meta programming?)
Title: Re: C++ Template Meta Programming for regular expressions?
Post by: Ash on November 11, 2019, 10:28:09 am
It is possible, and there is a pretty amazing library under development:

https://github.com/hanickadot/compile-time-regular-expressions

I've not used it, but I've been following its development.. there are a number of CppCon videos on Youtube about it.

*edit* Just a warning you will need a modern compiler.. C++17 minimum I think.

Enjoy,
Ash.
Title: Re: C++ Template Meta Programming for regular expressions?
Post by: magic on November 11, 2019, 10:14:34 pm
It's such a basic problem I'm sure there are tools which take regular expression and spit out a C function which recognizes it.
Title: Re: C++ Template Meta Programming for regular expressions?
Post by: SiliconWizard on November 11, 2019, 11:48:49 pm
As it's possible to program a Tetris game with C++ templates apparently, what you're after should be perfectly possible. I don't know of any such available solution, but I'm sure it's possible.

Apart from performance reasons, what would be your rationale for using that as opposed to a generic regex library? Code size I suppose?

Edit: just found this: https://github.com/NTSFka/template_regex
Title: Re: C++ Template Meta Programming for regular expressions?
Post by: Cerebus on November 12, 2019, 12:08:18 am
It's such a basic problem I'm sure there are tools which take regular expression and spit out a C function which recognizes it.

Since the very earliest versions of Unix, it's called 'lex'. Not the easiest thing to use, but it's there and pretty much always has been.
Title: Re: C++ Template Meta Programming for regular expressions?
Post by: westfw on November 12, 2019, 12:36:18 am
Quote
what would be your rationale for using that as opposed to a generic regex library? Code size I suppose?
Mainly code and data memory size.  For the traditional "generate first", you need the FSM generation code present, and the resulting FSM will end up in RAM.  Doing things at compile time, you could at least theoretically eliminate the generation code and the original string, and put the whole FSM in ROMish memory.

(Mostly, this seemed like an "obvious over-optimization" that someone would have done already.)

Quote
it's called 'lex'
Hmm.  I haven't actually used LEX, but I never got the impression that it could be used similarly to modern regexp libraries, or that it aimed at producing particularly efficient code.


Title: Re: C++ Template Meta Programming for regular expressions?
Post by: Cerebus on November 12, 2019, 01:21:42 am
Hmm.  I haven't actually used LEX, but I never got the impression that it could be used similarly to modern regexp libraries, or that it aimed at producing particularly efficient code.

Lex is a bit specialised, in that it's intended to really be used to produce the lexical front end for parsing tools. But at its heart it's just a regex => FSM generator. The generated code is very fast, table based, and as efficient as you're going to get without actually generating object code that directly executes the FSM. More modern implementations such as GNU flex are functionally similar but a bit easier to use. Lex and flex are really at home when you're trying to evaluate several regexes in parallel and finding which one matches, which is what they are designed for.

If you're desperate for a solution, you could cobble one with lex or flex, but, were it me, it's not the route I'd take if I could find something better fitted. As someone else said, it's unlikely that there isn't a tool out there that's a better fit but, offhand, I don't know of one.

By pure chance it happens that I may well be starting on writing a lex/flex style tool myself in the near future. If and when I do I'll consider whether it's practical to produce a sideline to it that's targetted at generating simple standalone recogniser routines from regexes.
Title: Re: C++ Template Meta Programming for regular expressions?
Post by: magic on November 13, 2019, 09:41:12 am
Another RE to C compiler.

https://re2c.org/manual/manual.html