populating a string vector with tab delimited text - c++

I'm very new to C++.
I'm trying to populate a vector with elements from a tab delimited file. What is the easiest way to do that?
Thanks!

There are many ways to do it; a simple Google search will give you a solution.
Here is an example from one of my projects. It uses getline to read a comma-separated (CSV) file; I'll leave it to you to change it to read a tab-delimited file.
// assumes <fstream>, <string> and <vector> are included and `using namespace std;`
ifstream fin(filename.c_str());
string buffer;
while (getline(fin, buffer))
{
    // skip empty lines instead of asserting on them
    if (buffer.empty())
        continue;
    size_t prev_pos = 0, curr_pos = 0;
    vector<string> tokenlist;
    string token;
    // tokenize string buffer; use '\t' instead of ',' for tab-delimited input
    curr_pos = buffer.find(',', prev_pos);
    while (true) {
        if (curr_pos == string::npos)
            curr_pos = buffer.length();
        // could be zero (empty field)
        size_t token_length = curr_pos - prev_pos;
        // create new token and add it to tokenlist
        token = buffer.substr(prev_pos, token_length);
        tokenlist.push_back(token);
        // reached end of the line
        if (curr_pos == buffer.length())
            break;
        prev_pos = curr_pos + 1;
        curr_pos = buffer.find(',', prev_pos);
    }
    // tokenlist now holds the fields of this line
}
UPDATE: Improved while condition.
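Adapted to the tab-delimited case from the question, the same find/substr logic might be wrapped in a small function like the sketch below. The name split_line and its default '\t' delimiter are my own choices, not part of the answer above; like the loop above, it keeps empty fields.

```cpp
#include <string>
#include <vector>

// Splits one line on a single-character delimiter (default '\t'),
// keeping empty fields, in the same find/substr style as the loop above.
std::vector<std::string> split_line(const std::string& line, char delim = '\t')
{
    std::vector<std::string> fields;
    std::size_t prev_pos = 0;
    while (true) {
        std::size_t curr_pos = line.find(delim, prev_pos);
        if (curr_pos == std::string::npos) {
            // the last field runs to the end of the line
            fields.push_back(line.substr(prev_pos));
            break;
        }
        fields.push_back(line.substr(prev_pos, curr_pos - prev_pos));
        prev_pos = curr_pos + 1;
    }
    return fields;
}
```

Call it once per line read with getline, and push the result into whatever container holds your rows.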

This is probably the easiest way to do it, but vcp's approach can be more efficient.
std::vector<std::string> tokens;
std::string token;
while (std::getline(infile, token, '\t'))
{
    tokens.push_back(token);
}
Done. You can actually get this down to about three lines of code with an input iterator and a back inserter, but why?
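If the file is split into lines, with tabs separating the fields on each line, you also have to handle the line delimiters (otherwise the newline ends up inside a token, merging the last field of one line with the first field of the next). You just do the above twice: an outer loop for lines and an inner loop to parse the tabs.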
// std::stringstream needs <sstream>
std::vector<std::string> tokens;
std::string line;
while (std::getline(infile, line))
{
    std::stringstream instream(line);
    std::string token;
    while (std::getline(instream, token, '\t'))
    {
        tokens.push_back(token);
    }
}
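A self-contained sketch of that nested loop, assuming you want to keep each line's fields in their own row (a vector<vector<string>>) rather than in one flat vector. read_tsv is a hypothetical name; it takes any std::istream, so you can pass the std::ifstream from your own code or, for testing, a std::istringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a tab-delimited stream into one vector of fields per line.
std::vector<std::vector<std::string>> read_tsv(std::istream& infile)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(infile, line))                  // outer loop: lines
    {
        std::istringstream instream(line);
        std::vector<std::string> fields;
        std::string token;
        while (std::getline(instream, token, '\t'))     // inner loop: tabs
        {
            fields.push_back(token);
        }
        rows.push_back(fields);
    }
    return rows;
}
```

One caveat: std::getline with '\t' does not report a trailing empty field, so a line ending in a tab yields one field fewer than you might expect; the find/substr version in the first answer keeps it.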
And if you needed to do line, then tabs, then... I dunno... quotes? Three loops. But to be honest by three I'm probably looking at writing a state machine. I doubt your teacher wants anything like that at this stage.
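For reference, the "three lines with an input iterator and a back inserter" version mentioned above might look like the sketch below (read_words is a hypothetical name). Note that it is not equivalent to the getline loops: operator>> splits on any whitespace (spaces, tabs and newlines), so it only gives the same result when no field contains a space.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copies every whitespace-separated word from the stream into a vector.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> tokens;
    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::back_inserter(tokens));
    return tokens;
}
```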

Related

How to find certain substring in string and then go back to certain character?

I save messages in a string, and I need to write a filter function that finds a user-specified word in those messages. I've separated the messages with '\n', so an example chat would be:
user1:Hey, man\nuser2:Hey\nuser1:What's up?\nuser2:Nothing, wbu?\n etc.
Now the user could ask to search for the word up, and I've implemented a search like this:
for (auto it = msg.cbegin(); (it = std::search(it, msg.cend(), str.cbegin(), str.cend())) != msg.cend(); ++it)
I could put that string into a stringstream and use getline up to \n, but how do I go backwards to the previous \n so I can get the full message? Also, what about the first message, since it doesn't start with \n?
Since you said you split the strings, I imagine you have a vector of strings in which you want to find, for example, up. You would do something like this:
for (const auto& my_string: vector_of_strings){
    if (my_string.find("up") != string::npos) {
        // message containing up is my_string
    }
}
In case you haven't split the strings into a vector, you can use this function, inspired by this:
vector<string> split(const string& s, const string& delimiter){
    vector<string> ret;
    size_t last = 0;
    size_t next = 0;
    while ((next = s.find(delimiter, last)) != string::npos) {
        ret.emplace_back(s.substr(last, next - last));
        // skip the whole delimiter, not just one character
        last = next + delimiter.length();
    }
    ret.emplace_back(s.substr(last));
    return ret;
}
If this function doesn't work, you can always take a look at "How do I iterate over the words of a string?"
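If you'd rather keep everything in one string, the backwards search from the question can be done with std::string::rfind, which looks for the last '\n' at or before a given position; when it returns std::string::npos, the match is in the first message, which therefore starts at index 0. A minimal sketch, assuming the searched word itself contains no '\n' (messages_containing is a hypothetical name):

```cpp
#include <string>
#include <vector>

// Returns every full message (text between two '\n') that contains word.
std::vector<std::string> messages_containing(const std::string& msg,
                                             const std::string& word)
{
    std::vector<std::string> found;
    std::size_t pos = msg.find(word);
    while (pos != std::string::npos) {
        // go back to the previous '\n', or to the start for the first message
        std::size_t prev_nl = msg.rfind('\n', pos);
        std::size_t start = (prev_nl == std::string::npos) ? 0 : prev_nl + 1;
        // go forward to the next '\n', or to the end of the string
        std::size_t end = msg.find('\n', pos);
        if (end == std::string::npos)
            end = msg.size();
        found.push_back(msg.substr(start, end - start));
        // resume after this message so it is reported only once
        pos = (end == msg.size()) ? std::string::npos : msg.find(word, end + 1);
    }
    return found;
}
```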

Separate multiple inputs from one line in a file C++

I have this file that I need to input into my code. The ^ and the + are operators in this case.
AB+^AB+^A^B
AB^C^D+AB^CD+^A^B^CD
AB^C^D+^AB^C^D+A^B^C^D
B^D+^B^D
^A^BD+^A^B^D
B^D+^A^BD+A^B^C
^B^C+BCD+B^C^D
A^C+ACD+^A^CD
AB^D+^ABD+A^BD+^A^B^D
B^D+^A^CD+^A^B^C^D
I want to separate each node at the '+', but I also want to keep the lines separate. For example, the first line would be separated into AB, ^AB, ^A^B and would be kept separate from the second line. I am aware of the getline(file, string, '+') function, but I do not know how to distinguish the lines using that method. Any help would be appreciated!
Start by using getline to read all individual lines in the file. For each of the lines, split the line into a vector of operands:
// copied from my answer on Code Review: https://codereview.stackexchange.com/a/238026
// needs <string>, <string_view> and <vector>
auto split(std::string_view s, std::string_view delimiter)
{
    std::vector<std::string> result;
    std::size_t pos_start = 0, pos_end;
    while ((pos_end = s.find(delimiter, pos_start)) != s.npos) {
        // emplace_back: std::string's constructor from string_view is explicit
        result.emplace_back(s.substr(pos_start, pos_end - pos_start));
        pos_start = pos_end + delimiter.size();
    }
    result.emplace_back(s.substr(pos_start));
    return result;
}
This function uses the find and substr methods of std::string_view. Then you can do:
std::ifstream file{"filename"};
std::vector<std::vector<std::string>> data;
for (std::string line; std::getline(file, line);) {
    data.push_back(split(line, "+"));
}
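Since the question mentions getline with '+', here is a sketch of that variant for comparison: an outer getline reads lines, and an inner getline on a std::istringstream splits each line on '+', so terms from different lines never mix. read_terms is a hypothetical name, and it takes any std::istream so the file stream above (or a string stream) can be passed in.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One inner vector per line, one string per '+'-separated term.
std::vector<std::vector<std::string>> read_terms(std::istream& in)
{
    std::vector<std::vector<std::string>> data;
    for (std::string line; std::getline(in, line);) {        // split on '\n'
        std::istringstream line_stream(line);
        std::vector<std::string> terms;
        for (std::string term; std::getline(line_stream, term, '+');) {  // split on '+'
            terms.push_back(term);
        }
        data.push_back(terms);
    }
    return data;
}
```

Unlike split(), this version silently drops an empty term after a trailing '+'.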

Tokenize elements from a text file by removing comments, extra spaces and blank lines in C++

I'm trying to eliminate comments, blank lines and extra spaces within a text file, then tokenize the remaining elements. Each token needs a space before and after.
exampleFile.txt
var
/* declare variables */a1 ,
b2a , c,
Here's what's working so far:
string line; //line: with '\0' as the delimiter, getline reads the whole file at once
ifstream InputFile("exampleFile.txt", ios::in); //read from exampleFile.txt
//Remove comments
while (InputFile && getline(InputFile, line, '\0'))
{
    while (line.find("/*") != string::npos)
    {
        size_t Begin = line.find("/*");
        line.erase(Begin, (line.find("*/", Begin) - Begin) + 2);
        // Start at Begin, erase from Begin to where */ is found
    }
}
This removes comments, but I can't seem to figure out a way to tokenize while this is happening.
So my questions are:
Is it possible to remove comments, spaces, and empty lines and tokenize all in this while statement?
How can I implement a function to add spaces in between each token before they are tokenized? Tokens like c, need to be recognized as c and , individually.
Thank you in advance for the help!
If you need to skip whitespace characters and you don't care about newlines, then I'd recommend reading the file with operator>>.
You could simply write:
std::string word;
bool isComment = false;
while(file >> word)
{
    if (isInsideComment(word, isComment))
        continue;
    // do processing of the tokens here
    std::cout << word << std::endl;
}
Where the helper function could be implemented as follows:
// needs <algorithm> (std::equal) and <string>
bool isInsideComment(std::string &word, bool &isComment)
{
    const std::string tagStart = "/*";
    const std::string tagStop = "*/";
    // match start marker (the size check keeps std::equal from reading past a short word)
    if (word.size() >= tagStart.size() &&
        std::equal(tagStart.rbegin(), tagStart.rend(), word.rbegin())) // ends with tagStart
    {
        isComment = true;
        if (word == tagStart)
            return true;
        word = word.substr(0, word.find(tagStart));
        return false;
    }
    // match end marker
    if (isComment)
    {
        if (word.size() >= tagStop.size() &&
            std::equal(tagStop.begin(), tagStop.end(), word.begin())) // starts with tagStop
        {
            isComment = false;
            word = word.substr(tagStop.size());
            return false;
        }
        return true;
    }
    return false;
}
For your example this would print out:
var
a1
,
b2a
,
c,
The above logic should also handle multiline comments if you're interested.
However, note that the function implementation should be adapted to your assumptions about the comment tokens. For instance, are they always separated from other words by whitespace? Or could an expression like var1/*comment*/var2 need to be parsed? The above example won't work in such a situation.
Hence, another option (the one you've already started implementing) is to read lines, or even larger chunks of data, from the file (so that begin and end comment markers are matched), locate the comment markers with find or a regex, and remove them afterwards.
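A minimal sketch of that find-based option, assuming the whole file has already been read into one string (for example with getline(InputFile, line, '\0') as in the question). It also assumes that identifiers consist of letters, digits and underscores and that every other non-space character, such as ',', is a token of its own, which is what splits c, into c and ,. The function names strip_comments and tokenize are hypothetical. Each comment is replaced by a single space so that var1/*comment*/var2 still yields two tokens.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Replaces every /* ... */ comment with one space; an unterminated
// comment is treated as running to the end of the text.
std::string strip_comments(std::string text)
{
    std::size_t begin;
    while ((begin = text.find("/*")) != std::string::npos) {
        std::size_t end = text.find("*/", begin + 2);
        if (end == std::string::npos) {
            text.erase(begin);
            break;
        }
        text.replace(begin, end + 2 - begin, " ");
    }
    return text;
}

// Runs of letters/digits/'_' form one token; any other non-space
// character is a token on its own; whitespace (incl. blank lines) is dropped.
std::vector<std::string> tokenize(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : text) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || ch == '_') {
            current += ch;
            continue;
        }
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
        if (!std::isspace(c))
            tokens.push_back(std::string(1, ch));
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

To get the "space before and after each token" format, join the returned tokens with ' '.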

What types of indicators are there for the end of a string when tokenizing a sentence?

I am trying to take a string holding a sentence and break it up by words to add to a linked list class called wordList.
When dealing with strings in C++, what is the indicator that you have reached the end of a string? I searched here and found that C strings are null-terminated, i.e. they end with a '\0', but these solutions give me errors.
I know there are other ways to do this (like scanning through individual characters), but I am fuzzy on how to implement them.
void lineScan( string line) // Adds words to wordList from line of a file
{
    istringstream iss(line);
    string lineWord;
    getline(iss, lineWord, ' ');
    wrds.addWords( lineWord );
    while( lineWord!= NULL )
    {
        getline(iss, lineWord, ' ');
        wrds.addWords( lineWord );
    }
}
You probably want to skip all whitespace rather than use a single space as the separator (with ' ' as the delimiter, consecutive spaces produce empty tokens).
But you're not really dealing with strings here, and in particular not with C strings.
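Since you're using istringstream, you're looking for the end of a stream, and that works like any other input stream: the extraction fails once there is nothing left to read, and the stream then converts to false in the while condition.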
void lineScan(string line) // Adds words to wordList from line of a file
{
    istringstream iss(line);
    string word;
    while (iss >> word)
    {
        wrds.addWords(word);
    }
}
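Because the asker's wordList class isn't shown, the self-contained sketch below collects the words into a std::vector<std::string> instead (scan_words is a hypothetical name); otherwise it is the same while (iss >> word) loop.

```cpp
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> scan_words(const std::string& line)
{
    std::istringstream iss(line);
    std::vector<std::string> words;
    std::string word;
    // operator>> skips any run of whitespace; the loop ends when extraction
    // fails at the end of the stream, so no '\0' or NULL check is needed
    while (iss >> word)
        words.push_back(word);
    return words;
}
```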

How to split a string on every 25th line break(\n) if it contains multiple \n

I've read a file and stored its contents in a std::string variable BUF, and now I want to split the data into small blocks of 25 lines each.
I see two options:
construct std::istringstream from BUF (or use the stream with which you read the file) and use std::getline to read line by line and append to previous lines, while counting up to 25 and so on...
use std::string::find in a loop counting to 25 and then use std::string::substr or std::find and construction from an iterator range.
I think that's sufficient hint.
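The second option might look like the sketch below. split_into_blocks is a hypothetical name; it assumes you want the line break that ends each block removed (keep it by deleting the pop_back) and that the last block may hold fewer than 25 lines.

```cpp
#include <string>
#include <vector>

// Splits buf into blocks of at most lines_per_block lines, using find() to
// locate each '\n' and substr() to cut the block out of the buffer.
std::vector<std::string> split_into_blocks(const std::string& buf,
                                           std::size_t lines_per_block = 25)
{
    std::vector<std::string> blocks;
    if (lines_per_block == 0)
        return blocks;              // avoid an endless loop on a bad argument
    std::size_t start = 0;
    while (start < buf.size()) {
        std::size_t end = start;    // one past the last character of the block
        std::size_t lines = 0;
        while (lines < lines_per_block && end < buf.size()) {
            std::size_t nl = buf.find('\n', end);
            end = (nl == std::string::npos) ? buf.size() : nl + 1;
            ++lines;
        }
        std::string block = buf.substr(start, end - start);
        if (!block.empty() && block.back() == '\n')
            block.pop_back();       // drop the line break that ends the block
        blocks.push_back(block);
        start = end;
    }
    return blocks;
}
```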
I made it!
if (N > 25) // N: number of lines
{
    int i = 0;
    std::istringstream ss(text);
    text = "";
    string line, part = "";
    while (std::getline(ss, line, '\n')) {
        part += line + "\n";
        if (i == 24)
        {
            // 25 lines collected: drop the trailing '\n' and store the block
            i = 0;
            part.erase(part.size() - 1);
            d.Insert(part, time);
            cout << part;
            part = "";
        }
        else {
            ++i;
        }
    }
    // store the remaining (fewer than 25) lines, if any
    if (i != 0)
    {
        part.erase(part.size() - 1);
        d.Insert(part, time);
    }
}
else {
    d.Insert(text, time);
}