Regular expressions in c++11 - c++

I want to parse CPU info on Linux. I wrote this code:
// Returns full data of the file in a string
std::string filedata = readFile("/proc/cpuinfo");
std::cmatch results;
// In file that string looks like: 'model name : Intel ...'
std::regex reg("model name: *");
std::regex_search(filedata.c_str(), results, reg);
std::cout << results[0] << " " << results[1] << std::endl;
But it prints an empty string. What's wrong?

Not all compilers support the full C++11 specification yet. Notably, regex_search does not work in GCC (as of version 4.7.1), but it does in VC++ 2010.

You didn't specify any capture in your expression.
Given the structure of /proc/cpuinfo, I'd probably prefer a
line-oriented input, using std::getline, rather than trying to do
everything at once. So you'd end up with something like:
std::string line;
while ( std::getline( input, line ) ) {
    static std::regex const procInfo( "model name\\s*: (.*)" );
    std::smatch results;   // smatch, because line is a std::string
    if ( std::regex_match( line, results, procInfo ) ) {
        std::cout << "???" << " " << results[1] << std::endl;
    }
}
It's not clear to me what you wanted as output. Probably, you also
have to capture the processor line as well, and output that at the
start of the processor info line.
The important things to note are:
You need to accept varying amounts of white space: use "\\s*" for 0 or more, "\\s+" for one or more whitespace characters.
You need to use parentheses to delimit what you want to capture.
(FWIW: I'm actually basing my statements on boost::regex, since I
don't have access to std::regex. I think that they're pretty similar,
however, and that my statements above apply to both.)
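For what it's worth, here is a minimal sketch of the same line-oriented approach written against std::regex. The helper name model_names is made up for illustration, and it assumes a standard library with a working std::regex (for GCC that means 4.9 or later). It reads from any std::istream, so it can be tried on a string before pointing it at /proc/cpuinfo:

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original posts): collects the text
// that follows "model name :" on every matching line of the input.
std::vector<std::string> model_names(std::istream& input)
{
    // \s* accepts any amount of whitespace (including the tab that
    // /proc/cpuinfo uses) around the colon; (.*) captures the rest.
    static const std::regex proc_info(R"(model name\s*:\s*(.*))");

    std::vector<std::string> names;
    std::string line;
    while (std::getline(input, line)) {
        std::smatch results;
        // regex_match must match the whole line, which (.*) guarantees.
        if (std::regex_match(line, results, proc_info)) {
            names.push_back(results[1].str());
        }
    }
    return names;
}
```

Passing a std::ifstream opened on /proc/cpuinfo should give one entry per logical processor; a std::istringstream works the same way for testing.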

Try std::regex reg("model name *: *"). In my cpuinfo there are spaces before the colon.

Related

How to match complex strings with regular expressions

I am a newbie in C++. I am using the regular expression functions, but I have not been able to get the result I want.
C++ code:
#include <regex>
std::string str = "[game.exe+009E820C]+338";
std::smatch result;
std::regex pattern("\\[([^\\[\\]]+)\\]");
std::regex_match(str, result, pattern);
// no result
std::cout << result[1] << std::endl;
I am familiar with JavaScript regular expressions, where I can get the value I want:
'[game.exe+009E820C]+338'.match(/\[([^\[\]]+)\]/)[1] => game.exe+009E820C
Is my C++ code doing something wrong?
If you want to access the capture groups, note that regex_match only succeeds when the pattern matches the entire input. Also, to avoid getting bogged down by a negated character class which includes a closing square bracket, I recommend using the Perl lazy dot instead. Putting all this together:
std::string str = "[game.exe+009E820C]+338";
std::smatch result;
std::regex pattern(".*\\[(.*?)\\].*");
std::regex_match(str, result, pattern);
std::cout << result[1] << std::endl;
This prints:
game.exe+009E820C
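As an alternative sketch: std::regex_search looks for the pattern anywhere in the input, which is closer to what JavaScript's match does, so the original pattern can be kept without padding it with .* on both sides. The helper name first_bracketed is invented for this example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text inside the first [...] pair,
// or an empty string when the input contains no such pair.
std::string first_bracketed(const std::string& text)
{
    // Same pattern as in the question, written as a raw string literal
    // so the backslashes do not need doubling.
    static const std::regex pattern(R"(\[([^\[\]]+)\])");

    std::smatch result;
    // regex_search succeeds if any substring matches the pattern.
    if (std::regex_search(text, result, pattern)) {
        return result[1].str();
    }
    return std::string();
}
```

Calling first_bracketed("[game.exe+009E820C]+338") should return game.exe+009E820C, the same value the JavaScript version produces.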

Properly handle escape sequences in strings from argv in C++

I'm writing a larger program that takes arguments from the command line after the executable. Some of the arguments are expected to be passed after the equals sign of an option. For instance, the output to the log is a comma separated vector by default, but if the user wants to change the separator to a period or something else instead of a comma, they might give the argument as:
./main --separator="."
This works fine, but if a user wants the delimiter be a special character (for example: tab), they might expect to pass the escape sequence in one of the following ways:
./main --separator="\t"
./main --separator='\t'
./main --separator=\t
It doesn't behave the way I want it to (to interpret \t as a tab) and instead prints out the string as written (sans quotes, and with no quotes it just prints 't'). I've tried using double slashes, but I think I might just be approaching this incorrectly and I'm not sure how to even ask the question properly (I tried searching).
I've recreated the issue in a dummy example here:
#include <string>
#include <iostream>
#include <cstdlib>   // exit
// Pull the string value after the equals sign
std::string get_option( std::string input );
// Verify that the input is a valid option
bool is_valid_option( std::string input );
int main ( int argc, char** argv )
{
    if ( argc != 2 )
    {
        std::cerr << "Takes exactly two arguments. You gave " << argc << "." << std::endl;
        exit( -1 );
    }
    // Convert from char* to string
    std::string arg ( argv[1] );
    if ( !is_valid_option( arg ) )
    {
        std::cerr << "Argument " << arg << " is not a valid option of the form --<argument>=<option>." << std::endl;
        exit( -2 );
    }
    std::cout << "You entered: " << arg << std::endl;
    std::cout << "The option you wanted to use is: " << get_option( arg ) << "." << std::endl;
    return 0;
}
std::string get_option( std::string input )
{
    // find() returns std::string::size_type, so don't store it in an int
    std::size_t index = input.find( '=' );
    std::string opt = input.substr( index + 1 ); // We want everything after the '='
    return opt;
}
bool is_valid_option( std::string input )
{
    std::size_t equals_index = input.find('=');
    // Valid only if '=' exists and is followed by at least one character
    return ( equals_index != std::string::npos && equals_index < input.length() - 1 );
}
I compile like this:
g++ -std=c++11 dummy.cpp -o dummy
With the following commands, it produces the following outputs.
With double quotes:
./dummy --option="\t"
You entered: --option=\t
The option you wanted to use is: \t.
With single quotes:
./dummy --option='\t'
You entered: --option=\t
The option you wanted to use is: \t.
With no quotes:
./dummy --option=\t
You entered: --option=t
The option you wanted to use is: t.
My question is: Is there a way to specify that it should interpret the substring \t as a tab character (or other escape sequences) rather than the string literal "\t"? I could parse it manually, but I'm trying to avoid re-inventing the wheel when I might just be missing something small.
Thank you very much for your time and answers. This is something so simple that it's been driving me crazy that I'm not sure how to fix it quickly and simply.
The shell you use has already processed the command line before your program starts, and whatever it leaves is what arrives in your argv array.
As you noticed, only the quoted versions let the two characters \t (the string "\\t" in C++ source) reach your main(); without quotes the shell removes the backslash.
Most shells treat an unquoted real TAB character as whitespace between arguments, so you would not see it in your command-line arguments either.
So this is mainly a question of how the shell interprets the command line, and what is left over for your program's arguments, rather than of how to handle it in C++ or C.
My question is: Is there a way to specify that it should interpret the substring \t as a tab character (or other escape sequences) rather than the string literal "\t"? I could parse it manually, but I'm trying to avoid re-inventing the wheel when I might just be missing something small.
You actually need to scan the argument for the two-character sequence that is written as the string literal
"\\t"
in your C++ code, and replace it with a real tab yourself.
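A minimal sketch of that scan, assuming you only care about a handful of sequences. As far as I know the standard library has no ready-made function for this, so the helper below (unescape is a made-up name) handles \t, \n and \\ by hand and copies everything else through unchanged:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts the two-character sequences \t, \n and
// \\ in the input into the single characters they name. Any other
// backslash sequence is copied unchanged.
std::string unescape(const std::string& input)
{
    std::string output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '\\' && i + 1 < input.size()) {
            const char next = input[i + 1];
            if (next == 't')  { output += '\t'; ++i; continue; }
            if (next == 'n')  { output += '\n'; ++i; continue; }
            if (next == '\\') { output += '\\'; ++i; continue; }
        }
        output += input[i];  // ordinary character or unknown sequence
    }
    return output;
}
```

In the dummy program this would be applied to the result of get_option( arg ), so that ./dummy --option='\t' ends up with a one-character string containing a tab.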

How to match absolute value using regex

I am having trouble with absolute value in regex in C++. This is what I have as the pattern:
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|"); // load -|M(x)|
I am trying to use std::tr1::regex_match( IR, result, loadAbsNM ) to match, but it is not matching anything, even though it should.
I'm using the Visual Studio 2010 compiler.
Shortened version of the program (iostream and regex are included above it):
int main()
{
std::string IR = "load -|M(x)|";
std::smatch result;
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|");
if( std::tr1::regex_match( IR , result, loadAbsNM ) )
{
int x = 2;
std::cout << "matched!" << std::endl;
}
else
{
std::cout << "!UNABLE TO DECODE INSTRUCTION!" << std::endl;
}
}
Output produced:
!UNABLE TO DECODE INSTRUCTION!
Note that with your code as written, you're not going to get a match. The letter x won't match the regex \d+.
Also, you need a backslash in front of each pipe character. An unescaped pipe (|) separates alternatives: (a|b) means a or b.
Finally, since there is a pipe at the end, the expression has an empty alternative and so matches the empty string, which is often a bad idea.
I would suggest something like this:
"load -\\|M\\((\\d+)\\)\\|"
But that won't match:
"load -|M(x)|"
You'd need to use a number instead of 'x' as in:
"load -|M(123)|"

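Here is a small sketch of the escaped pattern in use. It is written against std::regex rather than std::tr1::regex, on the assumption that a C++11 standard library is available, and the helper name decode_load_abs is invented for the example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the digits inside "load -|M(<digits>)|",
// or an empty string when the instruction does not have that shape.
std::string decode_load_abs(const std::string& instruction)
{
    // Both pipes and both parentheses are escaped so that they match
    // literally; the unescaped (\d+) is the capture group.
    static const std::regex load_abs(R"(load -\|M\((\d+)\)\|)");

    std::smatch result;
    if (std::regex_match(instruction, result, load_abs)) {
        return result[1].str();
    }
    return std::string();
}
```

With this, "load -|M(123)|" yields 123, while "load -|M(x)|" is rejected because x is not a digit.
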
reading a "\n" string and writing to textfile?

I'm struggling with the following: I'm reading from an XML file the following std::stringstream
"sigma=0\nreset"
Which after some copying&processing is written to a text-file. And I was hoping for the following
sigma=0
reset
But sadly I only get
sigma=0\nreset
but when I directly stream
out << "sigma=0\nreset"
I get:
sigma=0
reset
I currently suspect that some qualifier of the "\n" is lost during the "copy&processing"... is this possible? How to track down a "\n" in the stream which isn't a linefeed anymore?
Thank you!
That's because the output functions don't handle escape sequences like '\n'; the compiler does, and only in literals. The compiler knows nothing about the contents of strings built at run time, so it cannot translate "\n" into a newline in data that was read from a file.
You have to parse the string itself, and write out newlines when appropriate.
Assuming that the std::stringstream actually contains what is equivalent to the literal "sigma=0\\nreset" (length = 14 characters) and not "sigma=0\nreset" (length = 13 characters), you'll have to replace it yourself. Doing so is not very difficult: either use boost's replace_all (http://www.boost.org/doc/libs/1_53_0/doc/html/boost/algorithm/replace_all.html), or std::string::find and std::string::replace:
std::stringstream inStream;
inStream.str ("sigma=0\\nreset");
std::string content = inStream.str();
size_t index = content.find("\\n",0);
while(index != std::string::npos)
{
content.replace(index, 2, "\n");
index = content.find("\\n",index);
}
std::cout << content << '\n';
Note: you may want to consider cases when the system end-of-line is something other than "\n"
If the std::stringstream actually contains "sigma=0\nreset", then please post the code that does the copying/processing and the writing to the text file.

Boost xpressive regex results in garbage character

I am trying to write some code that changes a string like "/path/file.extension" to another specified extension. I am trying to use boost::xpressive to do so. But, I am having problems. It appears that a garbage character appears in the output:
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace boost::xpressive;
using namespace std;
int main()
{
std::string str( "xml.xml.xml.xml");
sregex date = sregex::compile( "(\\.*)(\\.xml)$");
std::string format( "\1.zipxml");
std::string str2 = regex_replace( str, date, format );
std::cout << "str = " << str << "\n";
std::cout << "str2 = " << str2 << "\n";
return 0;
}
Now compile and run it:
[bitdiot@kantpute foodir]$ g++ badregex.cpp
[bitdiot@kantpute foodir]$ ./a.out > output
[bitdiot@kantpute foodir]$ less output
[bitdiot@kantpute foodir]$ cat -vte output
str = xml.xml.xml.xml$
str2 = xml.xml.xml^A.zipxml$
In the above example, I redirect the output to a file and use cat to show the non-printable character. Notice the ctrl-A in str2.
Anyway, am I using the boost libraries incorrectly? Is this a boost bug? Is there another regular expression I can use that would let me replace the ".tail" with some other string? (It's fixed in my example.)
Thanks.
At least as I'm reading things, the culprit is right here: std::string format( "\1.zipxml");.
You forgot to escape the backslash, so \1 is giving you a control-A. You almost certainly want \\1.
Alternatively (if your compiler is new enough) you could use a raw string instead, so it would be something like: R"(\1.zipxml)", and you wouldn't have to escape your backslashes. I probably wouldn't bother to mention this, except for the fact that if you're writing REs in C++ strings, raw strings are pretty much your new best friend (IMO, anyway).
As Jerry Coffin pointed out to me, it was a stupid mistake on my part.
The errant code is the following:
std::string format( "\1.zipxml");
This should be replaced with:
std::string format( "$1.zipxml");
Thanks for your help everyone.
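For comparison, a rough sketch of the same replacement using std::regex instead of boost::xpressive (an assumption on my part, since the question is about xpressive). With the default ECMAScript format rules the back-reference is also spelled $1, and a raw string literal keeps the pattern free of doubled backslashes. The helper name replace_xml_extension is made up:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces a trailing ".xml" with new_extension.
// Strings that do not end in ".xml" are returned unchanged.
// Note: a '$' inside new_extension would be read as format syntax.
std::string replace_xml_extension(const std::string& path,
                                  const std::string& new_extension)
{
    // Group 1 captures everything before the final ".xml".
    static const std::regex tail(R"((.*)\.xml$)");

    // "$1" refers to group 1; it is plain text as far as the C++
    // compiler is concerned, unlike "\1", which is the character 0x01.
    return std::regex_replace(path, tail, "$1" + new_extension);
}
```

For example, replace_xml_extension("xml.xml.xml.xml", ".zipxml") should give xml.xml.xml.zipxml with no stray control character.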