C++ regex, parsing

C++ regex, parsing - c++

I'm pretty new with regexp and I can't get my function doing what I would like.
I have a long string, and I want to extract from it, 3 variables.
My string looks :
Infoname/info :
Input_Device_Name GTape Buffer_Size 16384 Acquisition_Event_Rate 163691.000000
Acquisition_Buffer_Rate 14873.333008 Acquisition_Succes_Rate 100.000000
And my goal is to store 163691.000000, 14873.333008 and 100.000000 in three differents variables.
What is the fastest and nicest way to do it please ?
Thank you,
eo

You could use the following regex to look for it:
"Input_Device_Name\s+GTape\s+Buffer_Size\s+[0-9.]+\s+Acquisition_Event_Rate\s+([0-9.]+)\s+Acquisition_Buffer_Rate\s+([0-9.]+)\s+Acquisition_Succes_Rate\s+([0-9.]+)"
This should catch the three values assuming that your text stays the same and that your numbers always take this form (i.e. are positive and not in exponential form.) Note that only the last three numbers are captured by putting brackets round them.
If you use boost regex, you could do something like this:
#include <boost/regex.hpp>
...
boost::smatch what;
static const boost::regex pp("Input_Device_Name\\s+GTape\s+Buffer_Size\\s+[0-9.]+\\s+Acquisition_Event_Rate\\s+([0-9.]+)\\s+Acquisition_Buffer_Rate\\s+([0-9.]+)\\s+Acquisition_Succes_Rate\\s+([0-9.]+)");
if ( boost::regex_match(inputTextString, what, pp) )
{
if ( what.size() == 4 )
{
double d1 = strtod(static_cast<const string&>( what[1] ).c_str(), NULL, 0);
double d2 = strtod(static_cast<const string&>( what[2] ).c_str(), NULL, 0);
double d3 = strtod(static_cast<const string&>( what[3] ).c_str(), NULL, 0);
// These are your doubles, do some stuff with them.
}
}
Where inputTextString contains the line of text you want to parse, so if this is coming from a file say, you would want to place this code in a loop. The what variable is a vector of all the matching text though what[0] contains the whole line and so can be ignored unless you need it. Last but not least, remember to double escape the 'space' character class otherwise it will already be processed (or generate an error or warning) by the compiler prior to being presented to the regex processor. Also, please note that I've not had time to compile this, though it is based on working code
Watch out for trailing, leading space on your input file and use ^ and $ to mark the beginning or end of the line respectively if it helps.

Just search for [0-9\.]+ as long as it returns any results. And, for example, if you would like to refuse 16384 as a variable you don't need, test every search result for having a dot in it.

Related

In c++, how do you get the input of a string, float and integer from 1 line?

An input file is entered with the following data:
Juan Dela Cruz 150.50 5
'Juan Dela Cruz' is a name that I would like to assign to string A,
'150.50' is a number I would like to assign to float B
and 5 is a number I would like to assign to int C.
If I try cin, it is delimited by the spaces in between.
If I use getline, it's getting the whole line as a string.
What would be the correct syntax for this?

If we analyze the string, then we can make the following observation. At the very end, we have an integer. In front of the integer we have a space. And in front of that the float value. And again in fron of that a space.
So, we can simply look from the back of the string for the 2nd last space. This can easily be achieved by
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
We need a nested statement of rfind please see here, version no 3.
Then we build a substring with the name. From start of the string up to the found position.
For the numbers, we put the rest of the original string into an std::istringstream and then simply extract from there.
Please see the following simple code, which has just a few lines of code.
#include <iostream>
#include <string>
#include <cctype>
#include <sstream>
int main() {
// This is the string that we read via getline or whatever
std::string lineFromeFile("Juan Dela Cruz 150.50 5");
// Let's search for the 2nd last space
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
// Get the name as a substring from the original string
std::string name = lineFromeFile.substr(0, position);
// Put the numbers in a istringstream for better extraction
std::istringstream iss(lineFromeFile.substr(position));
// Get the rest of the values
float fValue;
int iValue;
iss >> fValue >> iValue;
// Show result to use
std::cout << "\nName:\t" << name << "\nFloat:\t" << fValue << "\nInt:\t" << iValue << '\n';
return 0;
}

Probably simplest in this case would be to read whole line into string and then parse it with regex:
const std::regex reg("\\s*(\\S.*)\\s+(\\d+(\\.\\d+)?)\\s+(\\d+)\\s*");
std::smatch match;
if (std::regex_match( input, match, reg)) {
auto A = match[1];
auto B = std::stof( match[2] );
auto C = std::stoi( match[4] );
} else {
// error invalid format
}
Live example

As always when the input does not (or sometimes does not) match a strict enough syntax, read the whole line and then apply the rules which to a human are "obvious".
In this case (quoting comment by john):
Read the whole string as a single line. Then analyze the string to work out where the breaks are between A, B and C. Then convert each part to the type you require.
Specifically, you probably want to use reverse searching functions (e.g. https://en.cppreference.com/w/cpp/string/byte/strrchr ), because the last parts of the input seem the most strictly formatted, i.e. easiest to parse. The rest is then the unpredictable part at the start.

either try inputting the different data type in different lines and then use line breaks to input different data types or use the distinction to differentiate different data types like adding a . or comma

use the same symbol after each data package, for example, Juan Dela Cruz;150.50;5 then you can check for a ; and separate your string there.
If you want to use the same input format you could use digits as an indicator to separate them

Why a "no matching function" error for call by reference with literal number?

The problem asks to create a program that asks the user to enter some text and that text will be surrounded by asterisks depending on the width of the screen for example if the user inputs "Hello world" the output should be:
****************
* Hello World! *
****************
I've tried to create the functions but I'm stuck becaus of a compiler error with the shown minimal code.
Question: Why does it tell me no matching function for within_width(text, 80)?
Some of the code I have is below:
#include <iostream>
#include <string>
void display_header (std::string &header) {
std::string text;
header = text;
}
bool within_width (std::string& text, unsigned short int& max_width) {
}
int main() {
std::string text;
std::cout << "Please enter header text: ";
std::getline(std::cin, text);
if (within_width(text, 80)) {
// call the display_header function and pass in the text
// inputted by the user
} else {
std::cout << text;
}
return 0;
}

This declaration of the function
bool within_width (std::string& text, unsigned short int& max_width)
asks for an unsigned short int variable, because it has a reference parameter, see the second &.
To satisfy it, you need to put the value 80 into a variable and give the variable as parameter.
unsigned short int MyWidth=80;
if (within_width(text, MyWidth))
Alternatively (but I assume you are not allowed) you can use a call by value parameter
bool within_width (std::string& text, unsigned short int max_width)
Then you could call as shown.

I won't give a full answer to the exercise here, just some clues.
the display_header() and within_width() functions need to know the string given in parameters but may not modify it ; thus the type of this parameter should be const std::string & (the const was missing).
the second parameter of the within_width() function is just an integer that will be compared to the length of the string ; you don't need to pass it by reference (or at least const), rather by value. Here, the (non-const) reference prevents from passing the literal constant 80.
(it seems to be the main concern of the question after edition)
You need to reason step by step.
all of this depends on the size of the string (12 for Hello World!) ; this information is available via size(text) (or text.size())
(https://en.cppreference.com/w/cpp/iterator/size)
(https://en.cppreference.com/w/cpp/string/basic_string/size)
This size will have to be compared to max_width
Displaying the line with header will require 4 more characters because * will be prepended and * will be appended.
Thus the two surrounding lines will have the length size(header)+4 too.
In order to create such a string made of *, you could use a constructor of std::string taking two parameters : the count of characters and the character to be repeated.
(https://en.cppreference.com/w/cpp/string/basic_string/basic_string)
Send all of this to std::cout in the correct order.

Edit: Just noticing that this answer probably goes far beyond the scope of the task you have been given (just filling in some skeleton that has been provided by your teacher).
I'll still leave it here to illustrate what could be done with arbitrary input. Maybe you want to experiment a little further than what you have been asked...
bool within_width(...)
Pretty simple: string.length() <= max – just wait a second, you need to consider asterisks and spaces at beginning and end of output, so: max - 4
But you can do better, you can split the string, best at word boundaries. That's a bit difficult more difficult, though:
std::vector<std::string> lines;
// we'll be starting with an initially empty line:
auto lineBegin = text.begin();
auto lineEnd = text.begin();
for(auto i = text.begin(); i != text.end(); ++)
// stop condition empty: we'll stop from inside the loop...
{
// ok, we need to find next whitespace...
// we might try using text.find_first_of("..."), but then we
// need to know any whitespace characters ourselves, so I personally
// would rather iterate manually and use isspace function to determine;
// advantage: we can do other checks at the same time, too
auto distance = std::distance(lineBegin, i);
if(std::distance(lineBegin, i) > maxLineLength)
{
if(lineEnd == lineBegin)
{
// OK, now we have a problem: the word itself is too long
// decide yourself, do you want to cut the word somewhere in the
// middle (you even might implement syllable division...)
// or just refuse to print (i. e. throw an exception you catch
// elsewhere) - decide yourself...
}
else
{
lines.emplace_back(lineBegin, lineEnd);
lineBegin = lineEnd; // start next line...
}
}
// OK, now handle current character appropriately
// note: no else: we need to handle the character in ANY case,
// if we terminated the previous line or not
if(std::isspace(static_cast<unsigned char>(*i)))
{
lineEnd = i;
}
// otherwise, we're inside a word and just go on
}
// last line hasn't been added!
lines.emplace_back(lineBegin, lineEnd);
Now you can calculate maximum length over all the strings contained. Best: Do this right when adding a new line to the vector, then you don't need a separate loop...
You might have noticed that I didn't remove whitespace at the end of the strings, so you wouldn't need to add you own one, apart, possibly, from the very last string (so you might add a lines.back() += ' ';).
The ugly part, so far, is that I left multiple subsequent whitespace. Best is removing before splitting into lines, but be aware that you need to leave at least one. So:
auto end = text.begin();
bool isInWord = false; // will remove leading whitespace, if there is
for(auto c : text)
{
if(std::isspace(static_cast<unsigned char>(c)))
{
if(isInWord)
{
*end++ = ' '; // add a single space
isInWord = false;
}
}
else
{
*end++ = c;
isInWord = true;
}
}
This would have moved all words towards the beginning of the string, but we yet to drop the surplus part of the string yet contained:
text.erase(end, text.end());
Fine, the rest is pretty simple:
iterate over maximum length, printing a single asterisk in every loop
iterate over all of your strings in the vector: std::cout << "* " << line << "*\n";
repeat the initial loop to print second line of asterisks
Finally: You introduced a fix line limit of 80 characters. If console is larger, you just won't be using the entire available width, which yet might be acceptable, if it is smaller, you will get lines broken at the wrong places.
You now could (but that's optional) try to detect the width of the console – which has been asked before, so I won't go any deeper into.
Final note: The code presented above is untested, so no guarantee to be bugfree!

How can I read CSV file in to vector in C++

I'm doing the project that convert the python code to C++, for better performance. That python project name is Adcvanced EAST, for now, I got the input data for nms function, in .csv file like this:
"[ 5.9358170e-04 5.2773970e-01 5.0061589e-01 -1.3098677e+00
-2.7747922e+00 1.5079222e+00 -3.4586751e+00]","[ 3.8175487e-05 6.3440394e-01 7.0218205e-01 -1.5393494e+00
-5.1545496e+00 4.2795391e+00 -3.4941311e+00]","[ 4.6003381e-05 5.9677261e-01 6.6983813e-01 -1.6515008e+00
-5.1606908e+00 5.2009044e+00 -3.0518508e+00]","[ 5.5172237e-05 5.8421570e-01 5.9929764e-01 -1.8425952e+00
-5.2444854e+00 4.5013981e+00 -2.7876694e+00]","[ 5.2929961e-05 5.4777789e-01 6.4851379e-01 -1.3151239e+00
-5.1559062e+00 5.2229333e+00 -2.4008298e+00]","[ 8.0250458e-05 6.1284608e-01 6.1014801e-01 -1.8556541e+00
-5.0002270e+00 5.2796564e+00 -2.2154367e+00]","[ 8.1256607e-05 6.1321974e-01 5.9887391e-01 -2.2241254e+00
-4.7920742e+00 5.4237065e+00 -2.2534993e+00]
one unit is 7 numbers, but a '\n' after first four numbers,
I wanna read this csv file into my C++ project,
so that I can do the math work in C++, make it more fast.
using namespace std;
void read_csv(const string &filename)
{
//File pointer
fstream fin;
//open an existing file
fin.open(filename, ios::in);
vector<vector<vector<double>>> predict;
string line;
while (getline(fin, line))
{
std::istringstream sin(line);
vector<double> preds;
double pred;
while (getline(sin, pred, ']'))
{
preds.push_back(preds);
}
}
}
For now...my code emmmmmm not working ofc,
I'm totally have no idea with this...
please help me with read the csv data into my code.
thanks

Unfortunately parsing strings (and consequently files) is very tedious in C++.
I highly recommend using a library, ideally a header-only one, like this one.
If you insist on writing it yourself, maybe you can draw some inspiration from this StackOverflow question on how to parse general CSV files in C++.

You could look at getdelim(',', fin, line),
But the other issue will be those quotes, unless you /know/ the file is always formatted exactly this way, it becomes difficult.
One hack I have used in the past that is NOT PERFECT, if the first character is a quote, then the last character before the comma must also be a matching quote, and not escaped.
If it is not a quote then getdelim() some more, but the auto-alloc feature of getdelim means you must use another buffer. In C++ I end up with a vector of all the pieces of getdelim results that then need to be concatenated to make the final string:
std::vector<char*> gotLine;
gotLine.push_back(malloc(2));
*gotLine.back() = fgetch();
gotLine.back()[1] = 0;
bool gotquote = *gotLine.back() == '"'; // perhaps different classes of quote
if (*gotLine.back() != ',')
for(;;)
{
char* gotSub= nullptr;
gotSub=getdelim(',');
gotLine.push_back(gotSub);
if (!gotquote) break;
auto subLen = strlen(gotSub);
if (subLen>1 && *(gotSub-1)=='"') // again different classes of quote
if (sublen==2 || *(gotSub-2)!='\\') // needs to be a while loop
break;
}
Then just concatenate all these string segments back together.
Note that getdelim supports null bytes. If you expect null bytes in the content, and not represented by the character sequences \000 or \# you need to store the actual length returned by getdelim, and use memcpy to concatenate them.
Oh, and if you allow utf-8 extended quotes it gets very messy!
The case this doesn't cover is a string that ends \\" or \\\\". Ideally you need to while count the number of leading backslashes, and accept the quote if the count is even.
Note that this leave the issue of unescaping the quoted content, i.e. converting any \" into ", and \\ into \, etc. Also discarding the enclosing quotes.
In the end a library may be easier if you need to deal with completely arbitrary content. But if the content is "known" you can live without.

Extracting certain integers from string C++

Good day to all,
I am having a hard time trying to extract desired integers from a string. I am given the following to read in from a file:
itemnameitemnumber price percentmarkup
examples
Gowns-u2285 24.22 37%
TwoB1Ask1-m1275 90.4 1%
What I have been trying to do is get the item number separated from the item name so that I can store the item number as a reference for sorting. As you can see the first example itemnameitemnumber is a clear cut character to digit separation, whereas the next example has numbers within its item name.
I have tried several different approaches, however with certain item names having integers apart of their name is proving to be beyond my experience.
If anyone can help me with this I would be greatly appreciative for their time and knowledge.

Good day,
I don't know, if you have a fixed number of digits for itemnumber, but i am going to assume that you don't.
This is a simple approach; first you have to separate the words of your line. For example, use std::istringstream.
When you have the line split to words, for example by giving its iterators to a vector, or reading it with operator>>, you start to check the first word from backwards, until you find anything that is not one of "0123456789 " (note the whitespace at the end).
After you've done this, you get the iterator about where these digits end (from backwards), and cut your original string, or if you have the opportunity, the already split string. Voilá! You have yourself your item name and item number.
For the record, i am going to do this whole thing, utilising the same technique for the percent markup too, of course with the exception characters being "% ".
#define VALID_DIGITS "0123456789 "
#define VALID_PERCENTAGE "% "
struct ItemData {
std::string Name;
int Count;
double Price;
double PercentMarkup;
};
int ExtractItemData(std::string Line, ItemData & Output) {
std::istringstream Stream( Line );
std::vector<std::string> Words( Stream.begin(), Stream.end() );
if (Words.size() < 3) {
/* somebody gave us a malformed line with less than needed words */
return -1;
}
// Search from backwards, until you do not find anything that is not digits (0-9) or a whitespace
std::size_t StartOfDigits = Words[0].find_last_not_of( VALID_DIGITS );
if (StartOfDigits == std::string::npos) {
/* error; your item name is invalid */
return -2;
}
else {
// Separate the string into 2 parts
Output.Name = Words[0].substr(0, StartOfDigits); // Get the first part
Output.Count = std::stoi( Words[0].substr(StartOfDigits, Words[0].length() - StartOfDigits) );
Output.Price = std::stod( Words[1] );
// Search from backwards, until we do not find anything that is not '%' or ' '
std::size_t StartOfPercent = Words[2].find_last_not_of(VALID_PERCENTAGE);
Output.PercentMarkup = std::stod( Words[2].substr(0, StartOfPercent) );
}
return 0;
}
Code requies includes sstream, vector, string, and cstdint if you do not have size_t defined
Hope the answer was useful.
Best of luck, COlda.
PS.: My first answer on stack overflow ^^;

you can iterate on the string pushing the numbers to a vector then use stringstream to convert them to integers

C : Using substr to parse a text file

I just need a little help with file parsing. We have to parse a file that has 6 string entries per row in the format:
"string1", "string2", "string3", "string4", "string5", "string6"
My instructor recently gave us a little piece of code as a "hint," and I'm supposed to use it. Unfortunately, I can't figure out how to get it to work. Here's my file parsing function.
void parseData(ifstream &myFile, Book bookPtr[])
{
string bookInfo;
int start, end;
string bookData[6];
getline(myFile, bookInfo);
start = -2;
myFile.open("Book List.txt");
for (int j = 0; j < 6; j++)
{
start += 3;
end = bookInfo.find('"', start);
bookData[j] = bookInfo.substr(start, end-start);
start = end;
}
}
So I'm trying to read the 6 strings into an array of strings. Can someone please help walk me through the process?

start = -2;
for (int j = 0; j < 6; j++)
{
start += 3;
end = bookInfo.find('"', start);
bookData[j] = bookInfo.substr(start, end-start);
start = end;
}
So ", " is four characters. The leading closing quote is 3 characters behind the opening closing quote.
At entry to the loop start is pointing to the last closing quote. (On first entry to loop it is faked as -2 to be pointing to the closing quote of the imaginary "-1th" element.)
So we advance from the last closing quote to the following opening quote:
start += 3;
Then we use std::string::find to find the closing quote:
end = bookInfo.find('"', start);
The offset tells it to ignore all characters up to and including that position.
We then have the two quote positions, start..end, so we use substr to extract the string:
bookData[j] = bookInfo.substr(start, end-start);
And we then update start for the next loop to be the last closing quote:
start = end

Please, for your own sake, create a minimal example. This starts with a string like the line you gave as example and ends with the different parts in an array. Leave the loading from a file out for now, getline() seems to work for you, or? Then, do not declare every variable you might want to use at the beginning of a function. This is not ancient C, where you simply had to do that or introduce additional {} blocks. There is another thing odd, and that is the Book bookPtr[]. This is indeed just a Book* bookPtr, i.e. you are not passing an array to a function but just a pointer. Don't fall for this misleading syntax, it's a lie! Anyway, you don't seem to be using that pointer to the object(s) of the unknown type anyway.
Concerning the splitting of a line into strings, one approach is to locate pairs of double quotes. Everything in between is one of the strings, everything without is irrelevant. The string class has a find() function which optionally takes a starting position. Starting position is always one behind the previously found position.
Your code above seems to assume that there is exactly one double quote, a comma, a space and another double quote that separates two strings. This isn't 100% clear, I would also be prepared for handling multiple spaces or no space at all. Also, is the comma guaranteed? Are the double quotes guaranteed? Anyway, keep it simple. Unless you get a better spec on the input, just assume that only the parts between the quotes is what differs.
Then, what exactly works and what doesn't? You need to ask more specific questions and give more detailed information. The code above doesn't look broken per se, although there are a few things a bit off. For example, you don't typically pass ifstreams to a function, but use the istream baseclass. In your case, you read a line from that file and then open another file using the same fstream object, which doesn't make sense to me, since you don't use it after that. If you only needed that stream locally, you would create and open it there (handling errors of course!) and pass in the filename as parameter only.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++ regex, parsing - c++

Just search for [0-9\.]+ as long as it returns any results. And, for example, if you would like to refuse 16384 as a variable you don't need, test every search result for having a dot in it.

Related

In c++, how do you get the input of a string, float and integer from 1 line?

Why a "no matching function" error for call by reference with literal number?

How can I read CSV file in to vector in C++

Extracting certain integers from string C++

C : Using substr to parse a text file

Categories

Resources