EEVblog Electronics Community Forum

Topic started by: rodolfok on March 03, 2020, 05:36:53 pm

Title: C++ regular expressions: how to match spaces
Post by: rodolfok on March 03, 2020, 05:36:53 pm
Hello,
I am trying to match spaces in a regular expression in C++ using the regex library.
I want to match a floating-point number followed by one or more spaces, followed in turn by other floating-point numbers (or letters).
I tried e.g.:

"(0[.]0*[1-9]+)|[1-9][0-9]*([.][0-9]+)?\\s+A"

where the final letter 'A' is just for testing, but this doesn't work. It doesn't work with [\\s]+ instead of \\s+ either.
I searched for tutorials on the Internet and in books, but I don't understand where the error is.
Can someone help please?
Thanks in advance.
Title: Re: C++ regular expressions: how to match spaces
Post by: HwAoRrDk on March 03, 2020, 05:57:11 pm
Which regex library, specifically? The one in C++11's stdlib?

Are you sure that implementation supports the '\s' meta sequence? Some regular expression libraries do not support these backslash meta sequences. Perhaps you need to set an option to switch from 'basic' expressions to 'extended' ones. Have you tried substituting a literal space character for '\s' to see if that works?

A handy site for testing and figuring out regexes is https://regex101.com/. If I try yours there, the expression mostly works as expected (it doesn't match numbers consisting only of zero digits, e.g. "0.00"). However, that site tests against a PCRE-compatible implementation, and I don't know whether the stdlib's (or whichever library you're using) behaves the same way.
Title: Re: C++ regular expressions: how to match spaces
Post by: mrflibble on March 03, 2020, 05:58:54 pm
There are so many things that could be causing it... Quick debug list:
- Replace '\\s' with '.' (yes, match any character). Does it still not do what you'd expect? --> The problem is not specific to matching spaces.
- Replace '\\s' with ' ' (yes, match just the space character). Does it suddenly work for test cases with a regular space at that location? --> Either \s is not implemented, or the escaping is wrong.
- Replace '\\s' with '\\\\s'. Suddenly works? --> Have fun escaping.
- Replace all instances of (x)+ with (x)(x)*. Fixed? --> The + operator is not working as you expect.
etc.

Does it really work correctly for all other cases? That is, if you remove the '\\s+' from your regex and change all test cases to match (i.e. no whitespace there), does it work?
Title: Re: C++ regular expressions: how to match spaces
Post by: rodolfok on March 03, 2020, 06:51:56 pm
I am using Dev-C++ on Windows; the <regex> library is the one it provides.
I tried your suggestions but got no different results. There must be some implementation issue, but at the moment I can't figure out what it is.
Although I'm curious to know why it doesn't work, I changed my regex to use a '+' instead of whitespace, since its only purpose is to separate the two parts of the string. The escape '\\+' is recognized correctly, so I use

"(0[.]0*[1-9]+)|[1-9][0-9]*([.][0-9]+)?\\+A"

now, which works.
Thanks for your help.
Title: Re: C++ regular expressions: how to match spaces
Post by: ve7xen on March 03, 2020, 09:31:20 pm
If you are using <regex>, this is the C++ standard library's std::regex. Multiple grammars are available, including ECMAScript and both POSIX variants (basic and extended). The default is ECMAScript (the same family as JavaScript), so browser-based tools like regexr.com (http://regexr.com) (my favourite) should be accurate for the most part.

To avoid escaping issues, I recommend using C++11's raw string literals (https://docs.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=vs-2019) instead of ordinary double-quoted strings; this is probably one of the main reasons they were added to the language:

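#include <iostream>
#include <regex>
int main() {
    std::regex myre(R"((0[.]0*[1-9]+)|[1-9][0-9]*([.][0-9]+)?\s+A)");  // raw literal: \s reaches the engine as-is
    std::cout << std::boolalpha
              << std::regex_search("0.123", myre) << '\n'    // true
              << std::regex_search("51.1  A", myre) << '\n'; // true
}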