How do I do string parsing in C++? [duplicate] - c++

I have the following example:
string name_data = "John:Green;96";
I need to parse the string name, the string surname and int data. I've tried using sscanf, but it isn't working!
How should I do this?

You can use strtok() to first extract the element that ends with : and then the one that ends with ;. What remains will be the 96.
sscanf() is another option, although I don't consider it to be quite as flexible.
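The strtok() approach described above can be sketched as follows. This is only an illustration under assumptions: the ParsedName struct and the parseWithStrtok function are hypothetical names, not part of the question, and the input is assumed to follow the name:surname;number layout.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result type, not part of the original question.
struct ParsedName {
    std::string name;
    std::string surname;
    int data = 0;
    bool ok = false;  // true only if all three fields were found
};

ParsedName parseWithStrtok(const std::string& input) {
    ParsedName result;
    // strtok overwrites delimiters with '\0', so tokenize a mutable copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    const char* name = std::strtok(buffer.data(), ":");
    const char* surname = std::strtok(nullptr, ";");
    const char* data = std::strtok(nullptr, ";");
    if (name == nullptr || surname == nullptr || data == nullptr) {
        return result;  // a field is missing; ok stays false
    }
    result.name = name;
    result.surname = surname;
    result.data = std::atoi(data);
    result.ok = true;
    return result;
}
```

Calling parseWithStrtok("John:Green;96") should fill name, surname and data. Keep in mind that strtok keeps hidden static state, so it is not safe to interleave two tokenizations or to call it from several threads.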

If you want to do it with sscanf, you can do it as follows:
string name_data = "John:Green;96";
char name_[256], surname_[256];
int data;
sscanf(name_data.c_str(), "%255[^:]:%255[^;];%d", name_, surname_, &data);
string name = name_, surname = surname_;
Note that this assumes the name and surname are at most 255 characters each; the %255 width keeps sscanf from writing past the 256-byte buffers. For longer fields, enlarge the temporary buffers and the widths together.
"%255[^:]:%255[^;];%d" means: read up to 255 characters until you find a ':', skip that ':', read up to 255 characters until you find a ';', skip that ';', and then read an integer. sscanf returns the number of fields it assigned, so a result of 3 means the whole string was parsed.
The sscanf documentation lists the additional specifiers.

Without using any function calls other than those of the string class, we can do it like this:
size_t index = 0;
string firstName, surname, dataStr;
string tmpInput;
while (index!=name_data.length())
{
// First identify the first name
if (firstName.empty())
{
if (name_data[index] == ':')
{
firstName = tmpInput;
tmpInput.clear();
}
else
tmpInput.push_back(name_data[index]);
}
// Next identify the surname
else if (surname.empty())
{
if (name_data[index] == ';')
{
surname = tmpInput;
tmpInput.clear();
}
else
tmpInput.push_back(name_data[index]);
}
// Finally identify the integer as a string object
else
{
dataStr.push_back(name_data[index]);
}
index++;
}
To convert dataStr to an int, call std::stoi(dataStr) or atoi(dataStr.c_str()), or use a stringstream.
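For the final conversion step, a stringstream-based helper might look like the sketch below. The name toIntOr and the fallback parameter are assumptions made for this illustration; unlike atoi, this version rejects input with trailing garbage such as "96abc".

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the parsed integer, or `fallback`
// if `text` is not exactly one integer (optionally padded by whitespace).
int toIntOr(const std::string& text, int fallback) {
    std::istringstream in(text);
    int value = 0;
    if (!(in >> value)) {
        return fallback;  // no digits, or the value does not fit in an int
    }
    in >> std::ws;        // skip trailing whitespace, if any
    return in.eof() ? value : fallback;  // anything left over is an error
}
```

With the loop above, the call would be something like toIntOr(dataStr, -1), where -1 is just an arbitrary marker for "not a number".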

You can use stringstream to get your tokens. This program demonstrates how you can do that:
#include <sstream>
#include <iostream>
#include <string>
using namespace std;
int main() {
string x = "John:Green;96";
stringstream str(x);
std::string first;
std::string second;
int age;
std::getline(str, first, ':');
std::getline(str, second,';');
str >> age;
cout << "First: " << first << " Last: " << second << " Age: " << age << endl;
}
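The same getline-based idea can be wrapped in a function that also reports whether every field was present. This is a sketch only: the NameRecord struct and parseRecord function are hypothetical names introduced for the example.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type for this illustration.
struct NameRecord {
    std::string first;
    std::string last;
    int age = 0;
    bool ok = false;  // true only if all three reads succeeded
};

NameRecord parseRecord(const std::string& text) {
    NameRecord r;
    std::istringstream in(text);
    // Each step is evaluated only if the previous one succeeded.
    r.ok = std::getline(in, r.first, ':') &&
           std::getline(in, r.last, ';') &&
           (in >> r.age);
    return r;
}
```

Checking the ok flag avoids printing an uninitialized or stale age when the input is malformed, for example when the part after the ';' is not a number.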

Related

C++ Using getline() inside loop to read in CSV file

I'm trying to read in a CSV file that contains rows of 3 people/patients, where col 1 is userid, col 2 is fname, col 3 is lname, col 4 is insurance, and col 5 is version that looks something like below.
Edit: Apologies, I simply copy/pasted my CSV spreadsheet in here, so it didn't show the commas before. Wouldn't it look something more like below? John below also pointed out that there are no commas after the version, and this seemed to fix the issue! Thanks so much John! ( trying to figure out how I can accept your answer :) )
nm92,Nate,Matthews,Aetna,1
sc91,Steve,Combs,Cigna,2
ml94,Morgan,Lands,BCBS,3
I'm trying to use getline() inside of a loop to read everything in, and it works fine for the first iteration, but getline() seems to be causing it to skip a value on the next iterations. Any idea how I can solve this?
I'm also not sure why the output looks like below, because I'm not seeing where the lines w/ "sc91" and "ml94" are being printed in the code. This is what the output of the current code looks like.
userid is: nm92
fname is: Nate
lname is: Matthews
insurance is: Aetna
version is: 1
sc91
userid is: Steve
fname is: Combs
lname is: Cigna
insurance is: 2
ml94
version is: Morgan
userid is: Lands
fname is: BCBS
lname is: 3
insurance is:
version is:
I've done a ton of research on differences between getline() and the >> stream operator, but most of the getline() materials seem to revolve around getting input from cin rather than reading from a file like here, so I'm thinking there's something going on w/ getline() and how it's reading the file that I'm not understanding. Unfortunately when I tried >> operator, that forces me to use the strtok() function, and I was struggling a lot with c strings and assigning them to an array of C++ strings.
#include <iostream>
#include <string> // for strings
#include <cstring> // for strtok()
#include <fstream> // for file streams
using namespace std;
struct enrollee
{
string userid = "";
string fname = "";
string lname = "";
string insurance = "";
string version = "";
};
int main()
{
const int ENROLL_SIZE = 1000; // used const instead of #define since the performance diff is negligible,
const int numCols = 5; // while const allows for greater utility/debugging bc it is known to the compiler ,
// while #define is a preprocessor directive
ifstream inputFile; // create input file stream for reading only
struct enrollee enrollArray[ENROLL_SIZE]; // array of structs to store each enrollee and their respective data
int arrayPos = 0;
// open the input file to read
inputFile.open("input.csv");
// read the file until we reach the end
while(!inputFile.eof())
{
//string inputBuffer; // buffer to store input, which will hold an entire excel row w/ cells delimited by commas
// must be a c string since strtok() only takes c string as input
string tokensArray[numCols];
string userid = "";
string fname = "";
string lname = "";
string insurance = "";
string sversion = "";
//int version = -1;
//getline(inputFile,inputBuffer,',');
//cout << inputBuffer << endl;
getline(inputFile,userid,',');
getline(inputFile,fname,',');
getline(inputFile,lname,',');
getline(inputFile,insurance,',');
getline(inputFile,sversion,',');
enrollArray[0].userid = userid;
enrollArray[0].fname = fname;
enrollArray[0].lname = lname;
enrollArray[0].insurance = insurance;
enrollArray[0].version = sversion;
cout << "userid is: " << enrollArray[0].userid << endl;
cout << "fname is: " << enrollArray[0].fname << endl;
cout << "lname is: " << enrollArray[0].lname << endl;
cout << "insurance is: " << enrollArray[0].insurance << endl;
cout << "version is: " << enrollArray[0].version << endl;
}
}
Your problem is that there is no comma after the final data item in each line, so
getline(inputFile,sversion,',');
is incorrect because it reads to the next comma, which is actually on the next line after the user id of the next patient. This explains the output you see, where the user id of the next patient gets output with the version.
To fix this simply replace the code above with
getline(inputFile,sversion);
which will read to the end of line as required.
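One way to make the loop more robust is to read a whole line first and then split that line, so a short or malformed row cannot pull fields from the next row. The sketch below assumes a hypothetical EnrolleeRow struct and readEnrollees function; it takes any std::istream, so an std::ifstream opened on the CSV file can be passed in.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row type mirroring the five CSV columns.
struct EnrolleeRow {
    std::string userid, fname, lname, insurance, version;
};

std::vector<EnrolleeRow> readEnrollees(std::istream& in) {
    std::vector<EnrolleeRow> rows;
    std::string line;
    // Loop on the read itself instead of on eof().
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        EnrolleeRow e;
        if (std::getline(fields, e.userid, ',') &&
            std::getline(fields, e.fname, ',') &&
            std::getline(fields, e.lname, ',') &&
            std::getline(fields, e.insurance, ',') &&
            std::getline(fields, e.version)) {  // last field: rest of the line
            rows.push_back(e);
        }
        // Rows with fewer than five fields are skipped.
    }
    return rows;
}
```

Because the function returns a vector, each record gets its own slot rather than every row overwriting element 0 of a fixed-size array.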
Regarding your function: if you look at the structure of the source file, you will see that each line contains 5 strings separated by ",". So it is a typical CSV file.
A call to std::getline without a delimiter reads a complete line with all 5 strings. In your code you call std::getline once per string, each time reading up to a comma. There is no comma after the last string, so that will not work. You should use getline to read a complete line instead.
You need to read the whole line and then tokenize it.
I will show you an example of how to do that with std::sregex_token_iterator. That is very simple. Additionally, we will overload the inserter and extractor operators. With that, you can easily read and write "enrollee" data like Enrollee e{}; std::cout << e;
Additionally, I use C++ algorithms, which makes life very easy. Input and output are each a one-liner in main.
Please see:
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
struct Enrollee
{
// Data
std::string userid{};
std::string fname{};
std::string lname{};
std::string insurance{};
std::string version{};
// Overload Extractor Operator to read data from somewhere
friend std::istream& operator >> (std::istream &is, Enrollee& e) {
std::vector<std::string> wordsInLine{}; // Here we will store all words that we read in one line
std::string wholeLine; // Temporary storage for the complete line that we will get by getline
std::regex separator("[ \\;\\,]"); // Separators for a CSV file: space, semicolon or comma
std::getline(is, wholeLine); // Read one complete line and split it into parts
std::copy(std::sregex_token_iterator(wholeLine.begin(), wholeLine.end(), separator, -1), std::sregex_token_iterator(), std::back_inserter(wordsInLine));
// If we have read all expected strings, then store them in our struct
if (wordsInLine.size() == 5) {
e.userid = wordsInLine[0];
e.fname = wordsInLine[1];
e.lname = wordsInLine[2];
e.insurance = wordsInLine[3];
e.version = wordsInLine[4];
}
return is;
}
// Overload Inserter operator. Insert data into output stream
friend std::ostream& operator << (std::ostream& os, const Enrollee& e) {
return os << "userid is: " << e.userid << "\nfname is: " << e.fname << "\nlname is: " << e.lname << "\ninsurance is: " << e.insurance << "\nversion is: " << e.version << '\n';
}
};
int main()
{
// Here we will store all Enrollee data in a dynamically growing vector
std::vector<Enrollee> enrollmentData{};
// Define inputFileStream and open the csv
std::ifstream inputFileStream("r:\\input.csv");
// If we could open the file
if (inputFileStream) {
// Then read all csv data
std::copy(std::istream_iterator<Enrollee>(inputFileStream), std::istream_iterator<Enrollee>(), std::back_inserter(enrollmentData));
// For Debug Purposes: Print all data to cout
std::copy(enrollmentData.begin(), enrollmentData.end(), std::ostream_iterator<Enrollee>(std::cout, "\n"));
}
else {
std::cerr << "Could not open file 'input.csv'\n";
}
}
This will read the input file "input.csv" containing
nm92,Nate,Matthews,Aetna,1
sc91,Steve,Combs,Cigna,2
ml94,Morgan,Lands,BCBS,3
And show as output:
userid is: nm92
fname is: Nate
lname is: Matthews
insurance is: Aetna
version is: 1
userid is: sc91
fname is: Steve
lname is: Combs
insurance is: Cigna
version is: 2
userid is: ml94
fname is: Morgan
lname is: Lands
insurance is: BCBS
version is: 3
That is only an idea, but it could help you. It's a piece of code of one project I am working on:
std::vector<std::string> ARDatabase::split(const std::string& line, char delimiter)
{
std::vector<std::string> tokens;
std::string token;
std::istringstream tokenStream(line);
while (std::getline(tokenStream, token, delimiter))
{
tokens.push_back(token);
}
return tokens;
}
void ARDatabase::read_csv_map(std::string root_csv_map)
{
qDebug() << "Starting to read the people database...";
std::ifstream file(root_csv_map);
std::string str;
while (std::getline(file, str))
{
std::vector<std::string> tokens = split(str, ' ');
std::vector<std::string> splitnames = split(tokens.at(1), '_');
std::string name_w_spaces;
for(auto i: splitnames) name_w_spaces = name_w_spaces + i + " ";
people_names.insert(std::make_pair(stoi(tokens.at(0)), name_w_spaces));
people_images.insert(std::make_pair(stoi(tokens.at(0)), std::string("database/images/" + tokens.at(2))));
}
}
Instead of std::vector, you might want to use another container more suitable for your case. The last example is written for the input format of my case, but you can easily adapt it to your code.

Read comma separated values with stray whitespaces from a textfile in c++

I have a file that contains string,int,int values in multiple lines.
Delhi,12,13
Mumbai,100 , 101
Kolkata,11, 12
The values are separated by commas, but there can be stray whitespace in between. My current code is this:
#include<cstdio>
#include<iostream>
#include<string>
using namespace std;
int main()
{
FILE *f = fopen("input.txt","r");
int lines = 0;
char c = getc(f);
while(c != EOF)
{
if(c == '\n')
{
lines++;
}
c = getc(f);
}
lines++;
string arr[lines];
int t1[lines];
int t2[lines];
char s1[100],s2[100],s3[100];
int x,y;
fclose(f);
f = fopen("input.txt","r");
while (fscanf(f,"%99[^,],%99[^,],%99[^,]", s1, s1, s2)==3)
{
cout << s1 << s2 << s3 << endl;
}
}
This doesn't seem to quite properly read the values and display on the screen first of all. How do I read the string and the integer values here(which may have stray whitespaces) and store them into an array (three arrays to be precise) ?
Try doing this:
fscanf(f,"%99[^, ]%*[ ,]%d%*[ ,]%d ", s1, &x, &y);
%99[^, ] => reads up to 99 characters that are neither , nor <space> and stores them in s1 (the width keeps the read within the 100-byte buffer)
%*[ ,] => searches for , and <space> but does not store it anywhere (the * ensures that)
%d => stores the number
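To see the format string at work without touching a file, the same conversion can be applied to a single line with sscanf. This is a sketch under assumptions: CityRow and parseCityRow are hypothetical names, and the city name is assumed to contain no spaces or commas (a name like "New Delhi" would need a different pattern).

```cpp
#include <cstdio>
#include <string>

// Hypothetical row type: one name followed by two integers.
struct CityRow {
    std::string name;
    int first = 0;
    int second = 0;
    bool ok = false;  // true only if all three fields were converted
};

CityRow parseCityRow(const std::string& line) {
    CityRow row;
    char name[100];
    // Leading space: skip initial whitespace. %99[^, ]: the name.
    // %*[ ,]: consume separators without storing them. %d: an integer.
    const int matched = std::sscanf(line.c_str(), " %99[^, ]%*[ ,]%d%*[ ,]%d",
                                    name, &row.first, &row.second);
    row.ok = (matched == 3);  // suppressed %* fields are not counted
    if (row.ok) {
        row.name = name;
    }
    return row;
}
```

Comparing the return value with 3 is what detects a short or malformed line; the values in the struct should only be trusted when ok is true.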
The problem is on this line:
while (fscanf(f,"%99[^,],%99[^,],%99[^,]", s1, s1, s2)==3)
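It tries to scan up to the next comma character ',', which occurs on the next line. It also passes s1 twice, so s3 is never filled. Replace the last specifier with %99[^\n], pass s1, s2 and s3, and add a leading space to the format so that the newline left over from the previous line is skipped:
while (fscanf(f," %99[^,],%99[^,],%99[^\n]", s1, s2, s3)==3)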
Why are you using FILE* and friends in C++?
The other answers specify the problem with your code, so I'm writing this answer to show you how to improve it.
std::ifstream file("input.txt");
std::string name, value0, value1;
while (std::getline(file, name, ',')) {
// Get the value strings from the stream.
std::getline(file, value0, ',');
std::getline(file, value1); // the last field ends at the newline, not at a comma
// These will throw an exception when given invalid input.
int v0 = std::stoi(value0);
int v1 = std::stoi(value1);
// Do stuff with the strings
}
std::getline can be used to extract a string from a stream up until a certain delimiter. It does not strip whitespace, but std::stoi skips leading whitespace and stops at the first character that is not part of the number, so stray spaces around the integers are harmless (the name may still carry them). The return value of std::getline is the stream passed in, and it has an operator bool() that allows us to use it as a boolean expression. The value becomes false once a read fails, for example at the end of the file or when the stream is in some erroneous state.
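Note that the above is roughly similar in behavior to the following, except that here the loop body still runs once after the final failed read, so the result of each getline would have to be checked: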
while (file) {
std::getline(file, name, ',');
// ...
}
I'm pretty sure this must be a whole lot more readable than a string like "%99[^,],%99[^,],%99[^,]".
Cheers~
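Since getline keeps any stray whitespace in the string fields, a small trimming helper can be applied to each extracted field. The function name trim is an assumption for this sketch; the standard library has no built-in equivalent.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `s` without leading or
// trailing spaces, tabs, carriage returns or newlines.
std::string trim(const std::string& s) {
    const char* whitespace = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // the string is empty or all whitespace
    }
    const std::string::size_type last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}
```

In the loop above this would be used as name = trim(name) right after the first getline, before storing the name in an array.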

How to extract a substring from a string in C++?

I've looked through a thousand questions and answers about what I'm going to ask, but I still haven't found a way to do what I'm about to explain.
I have a text file from which I have to extract information about several things, all of them with the following format:
"string1":"string2"
And after that, there is more information, I mean:
The text file is something like this:
LINE 1
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string1":"string2"XXXXXXXXXXXXXXXXXXXXXXXXXX"string3":"string4"XXXXXXXXXXXXXXXXXXXXXXXXXXXX...('\n')
LINE 2
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string5":"string6"XXXXXXXXXXXXXXXXXXXXXXXXXX"string7":"string8"XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
XXX represents irrelevant information I do not need, and theEntireString (string used in the code example) stores all the information of a single line, not all the information of the text file.
I first have to find the content of string1 and then store the content of string2 into another string without the quotes. The problem is that I have to stop when I reach the last quote, and I don't know exactly how to do this. I suppose I have to use the functions find() and substr(), but despite having tried repeatedly, I did not succeed.
What I have done is something like this:
string extractInformation(string theEntireString)
{
string s = "\"string1\":\"";
string result = theEntireString.find(s);
return result;
}
But this way I suppose I store into the string the last quote and the rest of the string.
The find function only gives you the position of the matched string; to get the resulting string you need to use the substr function. Assuming theEntireString holds exactly one "string1":"string2" pair, try this:
string start,end;
size_t colon = theEntireString.find(":");
start = theEntireString.substr(1, colon-2);
end = theEntireString.substr(colon+2, theEntireString.size()-colon-3);
That will solve your problem.
Assuming neither the key nor the value contains a quotation mark, the following will output the value after the ":". You can also use it in a loop to repeatedly extract the value field if you have multiple key-value pairs in the input string, provided that you keep a record of the position of the last found instance.
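#include <iostream>
#include <string>
using namespace std;
// p is the position at which the search starts; pass 0 to search from the beginning
string extractInformation(size_t p, string key, const string& theEntireString)
{
string s = "\"" + key +"\":\"";
auto p1 = theEntireString.find(s, p);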
if (string::npos != p1)
p1 += s.size();
auto p2 = theEntireString.find_first_of('\"',p1);
if (string::npos != p2)
return theEntireString.substr(p1,p2-p1);
return "";
}
int main() {
string data = "\"key\":\"val\" \"key1\":\"val1\"";
string res = extractInformation(0,"key",data);
string res1 = extractInformation(0,"key1",data);
cout << res << "," << res1 << endl;
}
Outputs:
val,val1
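The looping variant mentioned above, which keeps track of the last found position, could look roughly like this. It is a sketch under the assumption that the filler text contains no quotation marks; extractPairs is a hypothetical name.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every "key":"value" pair found in `text`.
std::vector<std::pair<std::string, std::string>> extractPairs(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> pairs;
    const std::string sep = "\":\"";  // closing quote, colon, opening quote
    std::string::size_type pos = 0;
    while ((pos = text.find(sep, pos)) != std::string::npos) {
        // The key starts after the nearest quote before the separator.
        const std::string::size_type keyQuote =
            (pos == 0) ? std::string::npos : text.rfind('"', pos - 1);
        const std::string::size_type valueStart = pos + sep.size();
        const std::string::size_type valueEnd = text.find('"', valueStart);
        if (valueEnd == std::string::npos) {
            break;  // unterminated value: stop scanning
        }
        if (keyQuote != std::string::npos) {
            pairs.emplace_back(text.substr(keyQuote + 1, pos - keyQuote - 1),
                               text.substr(valueStart, valueEnd - valueStart));
        }
        pos = valueEnd + 1;  // resume the search after this pair
    }
    return pairs;
}
```

Each element of the result holds the key in first and the value in second, both without quotes, so one line of the file can be processed in a single call.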
Two steps:
First we have to find the position of the : and split the string into two parts:
string first = theEntireString.substr(0, theEntireString.find(":"));
string second = theEntireString.substr(theEntireString.find(":") + 1);
Now, we have to remove the surrounding quotes:
string final_first(first.begin() + 1, first.end() - 1);
string final_second(second.begin() + 1, second.end() - 1);
You don't need any string operations. Provided the XXXXX doesn't contain any '"', you can read both strings directly from the file:
ifstream file("input.txt");
for( string s1,s2; getline( getline( file.ignore( numeric_limits< streamsize >::max(), '"' ), s1, '"' ) >> Char<':'> >> Char<'"'>, s2, '"' ); )
cout << "S1=" << s1 << " S2=" << s2 << endl;
The small helper function Char, which must be declared before the loop above, is:
template< char C >
std::istream& Char( std::istream& in )
{
char c;
if( in >> c && c != C )
in.setstate( std::ios_base::failbit );
return in;
}
#include <regex>
#include <iostream>
using namespace std;
const string text = R"(
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string1":"string2"XXXXXXXXXXXXXXXXXXXXXXXXXX"string3" :"string4" XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string5": "string6"XXXXXXXXXXXXXXXXXXXXXXXXXX"string7" : "string8" XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
)";
int main() {
const regex pattern{R"~("([^"]*)"\s*:\s*"([^"]*)")~"};
for (auto it = sregex_iterator(begin(text), end(text), pattern); it != sregex_iterator(); ++it) {
cout << it->format("First: $1, Second: $2") << endl;
}
}
Output:
First: string1, Second: string2
First: string3, Second: string4
First: string5, Second: string6
First: string7, Second: string8
Running (with clang and libc++): http://coliru.stacked-crooked.com/a/f0b5fd383bc227fc
This is how raw string literals look in an editor that understands them: http://bl.ocks.org/anonymous/raw/9442865/

Reading from a CSV/text file with quotes in C++

I have a working function that reads lines from a text file (CSV), but I need to modify it to be able to read double quotes (I need to have these double quotes because some of my string values contain commas, so I am using double-quotes to denote the fact that the read function should ignore commas between the double-quotes). Is there a relatively simple way to modify the function below to accommodate the fact that some of the fields will be enclosed in double quotes?
A few other notes:
I could have all of the fields enclosed in double-quotes fairly easily if that helps (rather than just the ones that are strings, as is currently the case)
I could also change the delimiter fairly easily from a comma to some other character (like a pipe), but was hoping to stick with CSV if it's easy to do so
Here is my current function:
void ReadLoanData(vector<ModelLoanData>& mLoan, int dealnum) {
// Variable declarations
fstream InputFile;
string CurFileName;
ostringstream s1;
string CurLineContents;
int LineCounter;
char * cstr;
vector<string> currow;
const char * delim = ",";
s1 << "ModelLoanData" << dealnum << ".csv";
CurFileName = s1.str();
InputFile.open(CurFileName, ios::in);
if (InputFile.is_open()) {
LineCounter = 1;
while (InputFile.good()) {
// Grab the line
while (getline (InputFile, CurLineContents)) {
// Create a c-style string so we can tokenize
cstr = new char [CurLineContents.length()+1];
strcpy (cstr, CurLineContents.c_str());
// Need to resolve the "blank" token issue (strtok vs. strsep)
currow = split(cstr,delim);
// Assign the values to our model loan data object
mLoan[LineCounter] = AssignLoanData(currow);
delete[] cstr;
++LineCounter;
}
}
// Close the input file
InputFile.close();
}
else
cout << "Error: File Did Not Open" << endl;
}
The following works with the given input: a,b,c,"a,b,c","a,b",d,e,f
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
std::string line;
while(std::getline(cin, line, '"')) {
std::stringstream ss(line);
while(std::getline(ss, line, ',')) {
cout << line << endl;
}
if(std::getline(cin, line, '"')) {
cout << line;
}
}
}
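A variant that returns the fields of one line, rather than printing them, is a small state machine that tracks whether the current character is inside quotes. The sketch below uses a hypothetical function name, splitCsvLine, and additionally treats a doubled quote inside a quoted field as a literal quote, which is a common CSV convention but an assumption about the data here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into fields.
// Commas inside double quotes do not end a field.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c != '"') {
                current += c;
            } else if (i + 1 < line.size() && line[i + 1] == '"') {
                current += '"';  // doubled quote stands for one literal quote
                ++i;
            } else {
                inQuotes = false;  // closing quote
            }
        } else if (c == '"') {
            inQuotes = true;       // opening quote
        } else if (c == ',') {
            fields.push_back(current);  // field boundary
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // the last field has no trailing comma
    return fields;
}
```

In ReadLoanData this could take the place of the split(cstr, delim) call, with CurLineContents passed directly, which also removes the need for the temporary C string. Unlike strtok, it keeps empty fields.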

CString Parsing Carriage Returns

Let's say I have a string that has multiple carriage returns in it, i.e:
394968686
100630382
395950966
335666021
I'm still pretty amateur hour with C++, would anyone be willing to show me how you go about: parsing through each "line" in the string ? So I can do something with it later (add the desired line to a list). I'm guessing using Find("\n") in a loop?
Thanks guys.
while (!str.IsEmpty())
{
CString one_line = str.SpanExcluding(_T("\r\n"));
// do something with one_line
str = str.Right(str.GetLength() - one_line.GetLength()).TrimLeft(_T("\r\n"));
}
Blank lines will be eliminated with this code, but that's easily corrected if necessary.
You could try it using stringstream. Notice that getline takes an optional third argument, so you can use any delimiter you want.
string line;
stringstream ss;
ss << yourstring;
while ( getline(ss, line, '\n') )
{
cout << line << endl;
}
Alternatively you could use the boost library's tokenizer class.
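For the stringstream approach, text that comes from Windows controls often uses "\r\n" line endings, in which case getline leaves a trailing '\r' on every line. A sketch that collects the lines and strips that character, assuming the hypothetical name splitLines and a std::string input (a CString would have to be converted first):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into lines, accepting both
// "\n" and "\r\n" line endings.
std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // drop the CR of a CRLF pair
        }
        lines.push_back(line);
    }
    return lines;
}
```

Blank lines are kept as empty strings, so the caller can decide whether to skip them.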
You can use stringstream class in C++.
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
int main()
{
string str =
"394968686\n"
"100630382\n"
"395950966\n"
"335666021";
stringstream ss(str);
vector<string> v;
string token;
// extract whitespace-separated tokens (here, one per line)
while (ss >> token)
{
// insert current line into a std::vector
v.push_back(token);
// print out current line
cout << token << endl;
}
}
Output of the program above:
394968686
100630382
395950966
335666021
Note that operator>> splits on any whitespace, so no whitespace is included in the parsed tokens, and a line that contains spaces would be split into several tokens.
If your string is stored in a std::string (or you construct one from a c-style char*), then you can simply search for \n.
std::string s;
size_t pos = s.find('\n');
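You can use string::substr() to get each substring and store it in a list. A sketch:
std::string s = " .... ";
std::list<std::string> lines;
size_t begin = 0;
// search from begin each time, otherwise the same '\n' is found forever
for(size_t pos; std::string::npos != (pos = s.find('\n', begin)); begin = pos + 1)
{
lines.push_back(s.substr(begin, pos - begin)); // second argument is a length
}
lines.push_back(s.substr(begin)); // text after the last newline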