set boost regex from foreign source


I need to parse a log file and I have a working regex, but now I need to read the regex from a config file, and that is where the problem starts.
int logParser()
{
    std::string bd_regex; // this is read from the config file in another part of the program
    boost::regex parsReg;
    //("(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])");
    try
    {
        parsReg.assign(bd_regex, boost::regex_constants::icase);
    }
    catch (boost::regex_error& e)
    {
        std::cout << bd_regex << " is not a valid regular expression: \""
                  << e.what() << "\"" << std::endl;
    }
    std::cout << parsReg << std::endl;
    // here it prints exactly:
    // "("(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])");"
    int count = 0;
    std::ifstream in(bd_log_path.c_str()); // bd_log_path is defined elsewhere
    std::string s;
    while (std::getline(in, s)) // loop on getline rather than on eof()
    {
        boost::smatch m;
        if (boost::regex_search(s, m, parsReg)) // this "if" is never entered
        {
            std::string name, diagnosis;
            name.assign(m[2]);
            diagnosis.assign(m[4]);
            // bd_scan_results is an array of structs with C-string fields, defined elsewhere
            strcpy(bd_scan_results[count].file_name, name.c_str());
            strcpy(bd_scan_results[count].out, diagnosis.c_str());
            strcat(bd_scan_results[count].out, " ");
            count++;
        }
    }
    return count;
}
I really don't know why the same regex doesn't work when I read it from the config variable.
Any help would be appreciated.

On your direct question: try storing the regex without the C++ escapes in the config file:
(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])
Besides that, it looks like you wanted to match backslashes here:
C:.tmp.bd.
In the config file, write:
C:\\tmp\\bd\\
In a C++ string literal that would be:
"C:\\\\tmp\\\\bd\\\\"

@sehe gives the correct answer.
If this line of code were parsed by the C++ compiler,
str = "(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])";
it would unescape the escape sequence \\ into a single backslash \, and then
assign the result to the variable 'str'. Inside the variable 'str', it now looks like this:
(C:.tmp.bd.*?)+(([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,4})+(.+[a-zA-Z0-9_])
But you are reading this text from a file, so no parsing in the language sense takes place.
You are assigning a raw line of text to 'str', one that has not been processed by the C++ compiler.

Related

C++14 extract quoted strings verbatim including the quotes

Let me have a string:
string tstring = "Some arbitrarily long string which has \"double quotes\" which has to be printed verbatim";
I tried to use stringstreams and std::quoted to extract the words:
stringstream stream(tstring);
string temp;
while(stream >> std::quoted(temp))
    cout << temp << endl;
But the above strips the quotes from the quoted string:
Some
arbitrarily
.
.
double quotes
.
.
verbatim
I want the quoted string printed verbatim with the quotes included
Some
arbitrarily
.
.
"double quotes"
.
.
verbatim
How do I do this using std::quoted, or, if that is not possible, is there a better way to do it (apart from, of course, reading character by character and doing all the work myself)?
EDIT:
Here is the MCVE as requested
#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
using namespace std;
int main(){
string sspace = "Hi this is \"Real Madrid\"";
stringstream stream(sspace);
string fpart;
while(stream >> quoted(fpart)){
cout << fpart << endl;
}
return 0;
}

I don't think std::quoted is the right tool for the job here, because there's no easy way to tell whether the next string had quotes that were stripped before you print it (it discards the delimiter, which is '\"' by default).
I think we can safely fall back on std::string's find method:
Write a subroutine that prints all (space-delimited) words that aren't within quotes.
Then repeatedly search for the next quote character with find, printing the unquoted words before it and the quoted section itself.
Full code:
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
// Prints every space-delimited word of _in on its own line
void PrintUnquoted(std::string _in)
{
    std::istringstream ss(_in);
    std::string temp;
    while(ss >> temp)
    {
        std::cout << temp << '\n';
    }
}
int main(){
    std::string sspace = "Hi this is \"Real Madrid\" etc.";
    size_t start = 0;
    size_t nextQuote = 0;
    // find the next opening quote; everything before it is unquoted text
    while(nextQuote = sspace.find('\"', start), nextQuote != std::string::npos)
    {
        size_t endQuote = sspace.find('\"', nextQuote+1);
        if (endQuote == std::string::npos)
        {
            throw std::logic_error("Unmatched quotes");
        }
        PrintUnquoted(sspace.substr(start, nextQuote-start));
        // print the quoted section including both quote characters
        std::cout << sspace.substr(nextQuote, endQuote-nextQuote+1) << std::endl;
        start = endQuote+1;
    }
    if (start < sspace.size())
    {
        PrintUnquoted(sspace.substr(start));
    }
    return 0;
}
If you need to store the quoted section in a variable instead of printing it, the line
std::cout << sspace.substr(nextQuote, endQuote-nextQuote+1) << std::endl;
should be easy to modify to do that.

When used in input, std::quoted removes unescaped quotes from the string and un-escapes escaped quotes. So a string like this:
"some \"string with\" inner quotes"
becomes this when read in:
some "string with" inner quotes
But for this to work, the string must actually be quoted and escaped in the stream. If you do this:
std::string str = "string \"with some\" quotes";
std::stringstream ss (str);
std::cout << "stream contents: " << ss.str() << std::endl;
the stream contents will actually be:
string "with some" quotes
The escaping you're doing when declaring str doesn't end up in the stream, it's there only for the parser. If you want it to be written exactly like that in the output stream you would have to write it like this instead:
std::string str = "\"string \\\"with some\\\" quotes\"";
or better yet:
std::string str = "string \"with some\" quotes";
ss << std::quoted(str);
and let std::quoted do its job.

How do I parse a line into pieces and ignore parts of it?

Sorry, I wasn't clear previously. I have a file that contains data in the following format:
A(3)
B(4),A
C(2),A
E(5),A
G(3),A
J(8),B,H
H(7),C,E,G
I(6),G
F(5),H
...
These data represent a graph.
I will use the critical path method to work through the data in this text file. On each line:
the first char is the step,
the int is the duration of that task,
the remaining chars are the steps that must come before the first char.
So I have created the class Tache (task) to read the file, and its constructor has the following parameters:
Tache::Tache(char step2, int duration, list<Task*> precedentTask)
{
    this->step = step2;
    this->duration = duration;
    // copy from the parameter; iterating over the (still empty) member copies nothing
    for(list<Task*>::iterator it = precedentTask.begin(); it != precedentTask.end(); it++)
    {
        this->precedentTask.push_back(*it);
    }
}
In main I added:
string line;
list<Task> *allTaches = new list<Task>();
while(getline(file, line, ','))
{
    //I want to be able to receive the parsed line from the file and add it like
    //allTaches->push_back(line)
    //But the format needs to look like (Char, duration, <a list of> PrecedentChar)
    cout << line << endl;
}
When I print each piece like this, it prints:
A(3)
B(4)
A
C(2)
A
E(5)
A
So I'm really not sure what to do.

You can use a regular expression to parse out the pieces you need and then pass them to Task.
In C++ that is done using std::regex.
The code below will help you understand how to parse out the pieces. Applying them to Task is a simple step from there, but it is best done by you to make sure the concept is clear.
First we need a regular expression that grabs each piece. Each piece is grabbed by a capture group, and all that is needed for that is a pair of parentheses.
If we break down what you have, it is:
Something, an open paren we don't want, Something, a close paren we don't want, a comma we don't want, and Something
In simple regex that would be:
(.*)\((.*)\),(.*)
But things are never so simple.
The first Something ends at the open paren, so we want everything except the open paren: ([^(]*). The square brackets [] define a set of characters, a ^ at the start of the set means "not", and * repeats it.
The second Something ends at the close paren, so we have ([^)]*).
The third Something follows an optional comma, so we use (.*) and wrap it together with the comma in a group followed by *, which allows the whole group to be absent (there is likely a better way to do this).
We also need to escape the literal parentheses with \, and that backslash must itself be doubled: once for the compiler and once for the regex engine.
We also need to allow for people entering random spaces, so we add " *" (a space followed by *) at every break.
This leads to our regex, written as a C++ string literal:
" *([^(]*) *\\( *([^)]*) *\\) *(, *(.*))*"
Then we search, and if there is a match the pieces will be in the result, which we can iterate over.
Then we search and if found it will be in the result and we can iterate it to get the pieces.
#include <iostream>
#include <string>
#include <regex>
int main()
{
// std::string seq = "A(4),B";
std::string seq = "A(4)";
try {
std::regex rgx(" *([^(]*) *\\( *([^)]*) *\\) *(, *(.*))*");
std::smatch result;
if(std::regex_search(seq, result, rgx))
{
std::cout << "Size=" << result.size() << std::endl;
for(size_t i=0; i<result.size(); ++i)
{
std::cout << result[i] << std::endl;
}
}
else
{
std::cout << "NO MATCH" << std::endl;
}
} catch (std::regex_error& e) {
std::cout << "BAD REGEX" << std::endl;
}
}

What you're actually looking to do here is create an extraction operator for your Tache object. I'm going to assume that your code looks something like this:
#include <algorithm>
#include <istream>
#include <iterator>
#include <limits>
#include <list>
#include <regex>
#include <sstream>
#include <string>
using namespace std;
typedef char Task;
struct Tache {
    char step;
    int duration;
    list<Task> precedentTask;
};
Your extraction operator will be a free function rather than a method of Tache, because its left-hand operand is the stream. Its brute-force implementation will look something like this:
istream& operator>>(istream& lhs, Tache& rhs) {
    string line;
    getline(lhs, line, '\n');
    stringstream ss(line);
    ss >> rhs.step;
    ss.ignore(numeric_limits<streamsize>::max(), '(');   // skip up to and including '('
    ss >> rhs.duration;
    ss.ignore(numeric_limits<streamsize>::max(), ')');   // skip up to and including ')'
    const regex re("\\s*,\\s*([a-zA-Z])");                // ", X" -> capture X
    string precedentTasks;
    getline(ss, precedentTasks);
    rhs.precedentTask.clear();
    // keep the single captured character of each predecessor
    transform(sregex_token_iterator(cbegin(precedentTasks), cend(precedentTasks), re, 1), sregex_token_iterator(), back_insert_iterator<list<Task>>(rhs.precedentTask), [](const string& i) {
        return i.front();
    });
    return lhs;
}

How to use boost split to split a string and ignore empty values?

I am using boost::split to parse a data file. The data file contains lines such as the following.
data.txt
1:1~15 ASTKGPSVFPLAPSS SVFPLAPSS -12.6 98.3
The whitespace between the items consists of tabs. The code I have to split the above line is as follows.
std::string buf;
/*Assign the line from the file to buf*/
std::vector<std::string> dataLine;
boost::split( dataLine, buf , boost::is_any_of("\t "), boost::token_compress_on); //Split data line
cout << dataLine.size() << endl;
With the code above I should get a printout of 5, but I get 6. I have read through the documentation and this call seems as though it should do what I want, so clearly I am missing something. Thanks!
Edit:
Running the following for loop over dataLine gives the output below.
cout << "****" << endl;
for(size_t i = 0 ; i < dataLine.size() ; i ++) cout << dataLine[i] << endl;
cout << "****" << endl;
****
1:1~15
ASTKGPSVFPLAPSS
SVFPLAPSS
-12.6
98.3
****

Even though "adjacent separators are merged together", it seems that the trailing delimiters cause the problem: even when several of them are merged into one, that one delimiter still produces an (empty) token after it.
So your problem cannot be solved with split() alone. Luckily, Boost String Algo has trim() and trim_if(), which strip whitespace or delimiters from the beginning and end of a string. So just call trim() on buf first, like this:
#include <boost/algorithm/string.hpp>
std::string buf = "1:1~15 ASTKGPSVFPLAPSS SVFPLAPSS -12.6 98.3 ";
std::vector<std::string> dataLine;
boost::trim_if(buf, boost::is_any_of("\t ")); // could also use plain boost::trim
boost::split(dataLine, buf, boost::is_any_of("\t "), boost::token_compress_on);
std::cout << dataLine.size() << std::endl;
This question was already asked: boost::split leaves empty tokens at the beginning and end of string - is this desired behaviour?

I would recommend using C++ String Toolkit Library. This library is much faster than Boost in my opinion. I used to use Boost to split (aka tokenize) a line of text but found this library to be much more in line with what I want.
One of the great things about strtk::parse is its conversion of tokens into their final value and checking the number of elements.
you could use it as so:
std::vector<std::string> tokens;
// multiple delimiters should be treated as one
if( !strtk::parse( dataLine, "\t", tokens ) )
{
std::cout << "failed" << std::endl;
}
--- another version
std::string token1;
std::string token2;
std::string token3;
float value1;
float value2;
if( !strtk::parse( dataLine, "\t", token1, token2, token3, value1, value2) )
{
std::cout << "failed" << std::endl;
// fails if the number of elements is not what you want
}
Online documentation for the library: String Tokenizer Documentation
Link to the source code: C++ String Toolkit Library

Leading and trailing whitespace is intentionally left alone by boost::split because it does not know if it is significant or not. The solution is to use boost::trim before calling boost::split.
#include <boost/algorithm/string/trim.hpp>
....
boost::trim(buf);

Error recovering values from boost::unordered::unordered_map using std::string keys

I'm storing in an unordered_map the results I get from a regex match.
Printing the sub-matches m[1].str() and m[2].str() with std::cout shows the key-value pair correctly.
However, when I store them in an unordered_map, I always get an exception reporting that the key wasn't found. This is the code:
boost::unordered::unordered_map<std::string, std::string>
loadConfigFile(std::string pathToConfFile) throw(std::string){
std::fstream fs;
fs.open(pathToConfFile.c_str());
if(!fs)
throw std::string("Cannot read config file.");
boost::unordered::unordered_map<std::string, std::string> variables;
while(!fs.eof())
{
std::string line;
std::getline(fs, line);
//std::cout << line << std::endl;
boost::regex e("^(.+)\\s*=\\s*(.+)");
boost::smatch m; //This creates a boost::match_results
if(boost::regex_match(line, m, e)){
std::cout << m[1].str() << " " << m[2].str() << std::endl;
variables[m[1].str()] = m[2].str();
}
}
std::cout << variables.at(std::string("DEPOT_PATH")) << std::endl; //Here I get the exception
return variables;
}
DEPOT_PATH is the name of a "variable" in a config file. std::cout << m[1].str() shows it perfectly, but not found in the unordered_map.
Any ideas?

Most likely, the key you put in the unordered map contains whitespace (which you don't see when outputting it) and therefore is not found later.
In your regex ^(.+)\\s*=\\s*(.+), the first (.+) will greedily match as many characters as possible, including leading and trailing whitespace. The \\s* following it will always match an empty string. To prevent this, you can use (\\S+) for non-whitespace only, or use a non-greedy (.+?).
By the way, while (!fs.eof()) is wrong. Use while (std::getline(fs, line)) {...} instead.

I can't add a new line to c++ string

How do you add a new line to a C++ string? I'm trying to read a file, but when I try to append '\n' it doesn't work.
std::string m_strFileData;
while( DataBegin != DataEnd ) {
m_strFileData += *DataBegin;
m_strFileData += '\n';
DataBegin++;
}

If you have a lot of lines to process, using stringstream could be more efficient.
ostringstream lines;
lines << "Line 1" << endl;
lines << "Line 2" << endl;
cout << lines.str(); // .str() is a string
Output:
Line 1
Line 2

Sorry about the late answer, but I had a similar problem until I realised that the Visual Studio 2010 char* visualiser ignores \r and \n characters. They are completely omitted from it.
Note: By visualiser I mean what you see when you hover over a char* (or string).

Just a guess, but perhaps you should change the character to a string:
m_strFileData += '\n';
to be this:
m_strFileData += "\n";

This would append a newline after each character or string, depending on what type DataBegin actually is, so your problem does not lie in the code example you gave. It would be more useful if you gave your expected and actual results, and the data types of the variables used.

Try this:
ifstream inFile;
inFile.open(filename);
std::string entireString = "";
std::string line;
while (getline(inFile,line))
{
entireString.append(line);
entireString.append("\n");
}
