Separate multiple inputs from one line in a file C++ - c++

I have this file that I need to input into my code. The ^ and the + are operators in this case.
AB+^AB+^A^B
AB^C^D+AB^CD+^A^B^CD
AB^C^D+^AB^C^D+A^B^C^D
B^D+^B^D
^A^BD+^A^B^D
B^D+^A^BD+A^B^C
^B^C+BCD+B^C^D
A^C+ACD+^A^CD
AB^D+^ABD+A^BD+^A^B^D
B^D+^A^CD+^A^B^C^D
I want to separate each term between the '+' signs, but I also want to keep the lines separate. For example, the first line would be split into AB, ^AB, and ^A^B, and those would be kept apart from the terms of the second line. I am aware of the getline(string,file,"+") function, but I do not know how to tell the lines apart using that method. Any help would be appreciated!

Start by using getline to read all individual lines in the file. For each of the lines, split the line into a vector of operands:
// copied from my answer on Code Review: https://codereview.stackexchange.com/a/238026
auto split(std::string_view s, std::string_view delimiter)
{
    std::vector<std::string> result;
    std::size_t pos_start = 0, pos_end;
    while ((pos_end = s.find(delimiter, pos_start)) != s.npos) {
        // emplace_back, because std::string's constructor from std::string_view is explicit
        result.emplace_back(s.substr(pos_start, pos_end - pos_start));
        pos_start = pos_end + delimiter.size();
    }
    result.emplace_back(s.substr(pos_start));
    return result;
}
This function uses the find and substr member functions of std::string_view (C++17). Then, you can do
std::ifstream file{"filename"};
std::vector<std::vector<std::string>> data;
for (std::string line; std::getline(file, line);) {
data.push_back(split(line, "+"));
}

Related

How to find certain substring in string and then go back to certain character?

I save messages in string and I need to make filter function that finds user specified word in those messages. I've split each message by '\n' so the example of one chat would be:
user1:Hey, man\nuser2:Hey\nuser1:What's up?\nuser2:Nothing, wbu?\n etc.
Now user could ask to search for word up and I've implemented a search like this:
for (auto it = msg.cbegin(); (it = std::find(it, msg.cend(), str)) != msg.cend(); it++)
and I could put that string into stringstream and use getline to \n, but how do I go backwards to previous \n so I can get full message? Also, what about first message, cause it doesn't start with \n?
Since you said you split the strings, I imagine you have a vector of strings in which you want to find, for example, up. You would do something like this:
for (const auto& my_string: vector_of_strings){
if (my_string.find("up") != string::npos) {
// message containing up is my_string
}
}
In case you haven't split the strings into a vector yet, you can use a function like this:
vector<string> split(const string& s, const string& delimiter){
vector<string> ret;
size_t last = 0;
size_t next = 0;
while ((next = s.find(delimiter, last)) != string::npos) {
ret.emplace_back(s.substr (last, next - last));
last = next + delimiter.size(); // skip the whole delimiter, not just one character
}
ret.emplace_back(s.substr(last));
return ret;
}
If this function doesn't work you can always take a look at How do I iterate over the words of a string?

How to create a function to extract field in string inside of vector? [closed]

I want to make a function like this.
using namespace std;
vector<string> hj;
vector<string> feldseperator(const string& s, char delimiter) {
size_t pos = 0;
string token;
while ((pos = s.find(delimiter)) != string::npos)
{
token = s.substr(0, pos);
cout << token << endl;
hj.push_back(token);
s.erase(); // I WANT TO DELETE THE FIRST FIELD + CHAR
}
return hj;
}
int main()
{
string s = "dog;cat;fish;fax;fox;fast;";
char f = ';';
feldseperator(s, f);
cin.get();
}
Ah, I see, you're trying to split a string on a delimiter. Your problem is that you want to remove the prefix of the string up to the first occurrence of the delimiter, as stated by the comment in your code. So, you could:
Use the second parameter of std::string::find, which is the "position of the first character in the string to be considered in the search", and update your code like this:
size_t last_pos = 0 , pos = 0;
while ((pos = s.find(delimiter , last_pos)) != string::npos)
{
token = s.substr(last_pos, pos - last_pos);
last_pos = pos + 1; //pos is position of delimiter, you want next search to begin from character which comes after delimiter.
..
}
Since you already have the position of the delimiter, you can instead reassign the string as s = s.substr(pos+1 , s.size() - pos ), but then you'd have to remove the const qualifier from the parameter, which makes option 1 the better choice.
Try this.
std::string::erase (overload 3) deletes the range [first, last), hence the +1.
using namespace std;
vector<string> hj;
vector<string> feldseperator(const string& s, char delimiter) {
auto copy = s;
size_t pos = 0;
string token;
while ((pos = copy.find(delimiter)) != string::npos)
{
token = copy.substr(0, pos); // take the field from the shrinking copy, not from s
cout << token << endl;
hj.push_back(token);
copy.erase(copy.begin(), copy.begin()+pos+1); // I WANT TO DELETE THE FIRST FIELD + CHAR
}
return hj;
}
int main()
{
string s = "dog;cat;fish;fax;fox;fast;";
char f = ';';
feldseperator(s, f);
cin.get();
}
Note that the function works on a copy of the string rather than on the reference it receives. You may want to change that as required.
So, as I understand it, you want to split a string that contains substrings delimited by a ";". This process is called tokenizing, because you split a string into smaller tokens.
Modern C++ has built-in functionality designed for exactly that purpose. It is called std::sregex_token_iterator. What is this thing?
As its name says, it is an iterator. It will iterate over a string (hence the 's' in its name) and return the split-up tokens. The tokens will be matched against a regular expression. Or, negatively, the delimiter will be matched and the rest will be seen as tokens and returned. This is controlled via the last parameter of its constructor.
Let's have a look at this constructor:
hj(std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1), {});
The first parameter is where it should start in the source string, and the 2nd parameter is the end position, up to which the iterator should work. The last parameter is:
0, if you want the parts that match the regex (1, 2, ... select the corresponding capture group instead)
-1, if you want everything that does not match the regex
And last but not least, the third parameter is the regex itself. Please read up on regexes on the net; there are tons of pages available.
So, then we check the one liner for the field extraction.
std::vector<std::string> hj(std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1), {});
What is that? The first part is obvious. We define a std::vector<std::string> with the name "hj". As with any variable definition, we call a constructor to construct the std::vector<std::string>.
If you look at the std::vector range constructor (no. 5), you will see that we can initialize the vector with another pair of iterators (begin and end) and copy the values from there. The begin iterator is given explicitly, and the end iterator is given by {}, because a default-constructed std::sregex_token_iterator is equal to its "end()".
That's it, everything done with a one-liner.
Please see:
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
int main() {
// The string to split
std::string s("dog;cat;fish;fax;fox;fast;");
// The delimiter
std::regex delimiter(";");
// Tokenize and store result in vector
std::vector<std::string> hj(std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1), {});
std::cin.get();
}
By the way, if you have an existing vector, then you can copy the result into that vector:
std::copy(std::sregex_token_iterator(s.begin(), s.end(), delimiter, -1), {}, std::back_inserter(hj));
I hope that you can see the simplicity of that approach.
Of course there are many other possible solutions, and everybody can select whichever they prefer.

Can I use 2 or more delimiters in C++ function getline? [duplicate]

I would like to know how I can use 2 or more delimiters in the getline function. Here is my problem:
The program reads a text file... each line is going to be like:
New York, Paris, 100
CityA, CityB, 200
I am using getline(file, line), but I got the whole line, when I want to to get CityA, then CityB and then the number; and if I use ',' delimiter, I won't know when is the next line, so I'm trying to figure out some solution..
Though, how could I use comma and \n as a delimiter?
By the way, I'm working with the string type, not char arrays, so strtok is not possible :/
some scratch:
string line;
ifstream file("text.txt");
if(file.is_open())
while(!file.eof()){
getline(file, line);
// here I need to get each string before comma and \n
}
You can read a line using std::getline, then pass the line to a std::stringstream and read the comma-separated values off it:
string line;
ifstream file("text.txt");
if(file.is_open()){
    while(getline(file, line)){ // get a whole line
        std::stringstream ss(line);
        string field;
        while(getline(ss, field, ',')){
            // field now holds one comma-separated entity
        }
    }
} // closes if(file.is_open())
No, std::getline() only accepts a single character, to override the default delimiter. std::getline() does not have an option for multiple alternate delimiters.
The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream from it, and then parse it further into comma-separated values.
However, if you are truly parsing comma-separated values, you should be using a proper CSV parser.
Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each of the blocks, splitting them up into smaller parts, and so on.
An alternative to this is to tokenize like strtok does -- from the beginning of input, handling one token at a time until the end of input is encountered. This may be preferred when parsing simple inputs, because it is straightforward to implement. This style can also be used when parsing inputs with nested structure, but this requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or limited region of code.
Someone relying on the C++ std library usually ends up using a std::stringstream, along with std::getline to tokenize string input. But, this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So, they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.
But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position nearest to the beginning of the string containing a character from the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.
Another option is the <regex> library, which can do anything you want, but it is new and you will need to get used to its syntax.
But, with very little effort, you can leverage existing functions in std::string to perform tokenizing tasks, and without resorting to streams. Here is a simple example. get_to() is the tokenizing function and tokenize demonstrates how it is used.
The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenizing is impossible. It wouldn't even be that much more complicated than this -- you would just keep track of your current position, use this as the start argument in std::string member functions, and never alter the source string. And even better techniques exist, no doubt.
To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.
#include <iostream>
#include <string>
#include <utility>
namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
auto space_is_it = [] (char c) {
// A few asks:
// * Suppress criticism WRT localization concerns
// * Avoid jumping to conclusions! And seeing monsters everywhere!
// Things like...ah! Believing "thoughts" that assumptions were made
// regarding character encoding.
// * If an obvious, portable alternative exists within the C++ Standard Library,
// you will see it in 2.0, so no new defect tickets, please.
// * Go ahead and ignore the rumor that using lambdas just to get
// local function definitions is "cheap" or "dumb" or "ignorant."
// That's the latest round of FUD from...*mumble*.
return c > '\0' && c <= ' ';
};
for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
if(!space_is_it(*rit)) {
if(rit != str.rbegin()) {
str.erase(&*rit - &*str.begin() + 1);
}
for(auto fit=str.begin(); fit != str.end(); ++fit) {
if(!space_is_it(*fit)) {
if(fit != str.begin()) {
str.erase(str.begin(), fit);
}
return;
} } } }
str.clear();
}
// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurrence of one
// from a set of delimiters. All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
const auto pos = str.find_first_of(std::forward<D>(delimiters));
if(pos == std::string::npos) {
// When none of the delimiters are present,
// clear the string and return its last value.
// This effectively makes the end of a string an
// implied delimiter.
// This behavior is convenient for parsers which
// consume chunks of a string, looping until
// the string is empty.
// Without this feature, it would be possible to
// continue looping forever, when an iteration
// leaves the string unchanged, usually caused by
// a syntax error in the source string.
// So the implied end-of-string delimiter takes
// away the caller's burden of anticipating and
// handling the range of possible errors.
found_delimiter = '\0';
std::string result;
std::swap(result, str);
trim(result);
return result;
}
found_delimiter = str[pos];
auto left = str.substr(0, pos);
trim(left);
str.erase(0, pos + 1);
return left;
}
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
char discarded_delimiter;
return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}
inline std::string pad_right(const std::string& str,
std::string::size_type min_length,
char pad_char=' ')
{
if(str.length() >= min_length ) return str;
return str + std::string(min_length - str.length(), pad_char);
}
inline void tokenize(std::string source) {
std::cout << source << "\n\n";
bool quote_opened = false;
while(!source.empty()) {
// If we just encountered an open-quote, only include the quote character
// in the delimiter set, so that a quoted token may contain any of the
// other delimiters.
const char* delimiter_set = quote_opened ? "'" : ",'{}";
char delimiter;
auto token = get_to(source, delimiter_set, delimiter);
quote_opened = delimiter == '\'' && !quote_opened;
std::cout << " " << pad_right('[' + token + ']', 16)
<< " " << delimiter << '\n';
}
std::cout << '\n';
}
}
int main() {
string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}
This outputs:
{1.5, null, 88, 'hi, {there}!'}
[] {
[1.5] ,
[null] ,
[88] ,
[] '
[hi, {there}!] '
[] }
I don't think that's how you should attack the problem (even if you could do it); instead:
1. Use what you have to read in each line
2. Then split up that line by the commas to get the pieces that you want.
If strtok will do the job for #2, you can always convert your string into a char array.

populating a string vector with tab delimited text

I'm very new to C++.
I'm trying to populate a vector with elements from a tab delimited file. What is the easiest way to do that?
Thanks!
There are many ways to do it; a simple Google search will give you a solution.
Here is an example from one of my projects. It uses getline to read a comma-separated (CSV) file; I'll let you adapt it to read a tab-delimited file.
ifstream fin(filename.c_str());
string buffer;
while(getline(fin, buffer)) // the result of getline already covers end-of-file
{
size_t prev_pos = 0, curr_pos = 0;
vector<string> tokenlist;
string token;
// check string
assert(buffer.length() != 0);
// tokenize string buffer.
curr_pos = buffer.find(',', prev_pos);
while(1) {
if(curr_pos == string::npos)
curr_pos = buffer.length();
// could be zero
size_t token_length = curr_pos - prev_pos;
// create new token and add it to tokenlist.
token = buffer.substr(prev_pos, token_length);
tokenlist.push_back(token);
// reached end of the line
if(curr_pos == buffer.length())
break;
prev_pos = curr_pos+1;
curr_pos = buffer.find(',', prev_pos);
}
}
UPDATE: Improved while condition.
This is probably the easiest way to do it, but vcp's approach can be more efficient.
std::vector<std::string> tokens;
std::string token;
while (std::getline(infile, token, '\t'))
{
    tokens.push_back(token);
}
Done. You can actually get this down to about three lines of code with an input iterator and a back inserter, but why?
Now if the file is cut up into lines and separated by tabs on those lines, you also have to handle the line delimiters. Now you just do the above twice, one loop for lines and an inner loop to parse the tabs.
std::vector<std::string> tokens;
std::string line;
while (std::getline(infile, line))
{
    std::stringstream instream(line);
    std::string token;
    while (std::getline(instream, token, '\t'))
    {
        tokens.push_back(token);
    }
}
And if you needed to do line, then tabs, then... I dunno... quotes? Three loops. But to be honest by three I'm probably looking at writing a state machine. I doubt your teacher wants anything like that at this stage.

break long string into multiple c++

I have a string that is received from third party. This string is actually the text from a text file and it may contain UNIX LF or Windows CRLF for line termination. How can I break this into multiple strings ignoring blank lines? I was planning to do the following, but am not sure if there is a better way. All I need to do is read line by line. Vector here is just a convenience and I can avoid it.
* Unfortunately I do not have access to the actual file. I only receive the string object. *
string textLine;
vector<string> tokens;
size_t pos = 0;
while( true ) {
size_t nextPos = textLine.find( pos, '\n\r' );
if( nextPos == textLine.npos )
break;
tokens.push_back( string( textLine.substr( pos, nextPos - pos ) ) );
pos = nextPos + 1;
}
You could use std::getline as you're reading from the file instead of reading the whole thing into a string. That will break things up line by line by default. You can simply not push_back any string that comes up empty.
string line;
vector<string> tokens;
while (getline(file, line))
{
if (!line.empty()) tokens.push_back(line);
}
UPDATE:
If you don't have access to the file, you can use the same code by initializing a stringstream with the whole text. std::getline works on all stream types, not just files.
I'd use getline to create new strings based on \n, and then manipulate the line endings.
string textLine;
vector<string> tokens;
istringstream sTextLine(textLine); // read from the received text
string line;
while(getline(sTextLine, line)) {
if(line.empty()) continue;
if(line[line.size()-1] == '\r') line.resize(line.size()-1);
if(line.empty()) continue;
tokens.push_back(line);
}
EDIT: Use istringstream instead of stringstream.
I would use the approach given here (std::getline on a std::istringstream)...
Splitting a C++ std::string using tokens, e.g. ";"
... except omit the ';' parameter to std::getline.
A lot depends on what is already present in your toolkit. I work a lot
with files which come from Windows and are read under Unix, and vice
versa, so I have most of the tools for converting CRLF into LF at hand.
If you don't have any, you might want a function along the lines of:
void addLine( std::vector<std::string>& dest, std::string line )
{
if ( !line.empty() && *(line.end() - 1) == '\r' ) {
line.erase( line.end() - 1 );
}
if ( !line.empty() ) {
dest.push_back( line );
}
}
to do your insertions. As for breaking the original text into lines,
you can use std::istringstream and std::getline, as others have
suggested; it's simple and straightforward, even if it is overkill.
(The std::istringstream is a fairly heavy mechanism, since it supports
all sorts of input conversions you don't need.) Alternatively, you
might consider a loop along the lines of:
std::string::const_iterator start = textLine.begin();
std::string::const_iterator end = textLine.end();
std::string::const_iterator next = std::find( start, end, '\n' );
while ( next != end ) {
addLine( tokens, std::string( start, next ) );
start = next + 1;
next = std::find( start, end, '\n' );
}
addLine( tokens, std::string( start, end ) );
Or you could break things down into separate operations:
textLine.erase(
std::remove( textLine.begin(), textLine.end(), '\r'),
textLine.end() );
to get rid of all of the CR's,
std::vector<std::string> tokens( split( textLine, '\n' ) );
to break it up into lines, where split is a generalized function
along the lines of the above loop (a useful tool to add to your
toolkit), and finally:
tokens.erase(
std::remove_if( tokens.begin(), tokens.end(),
boost::bind( &std::string::empty, _1 ) ),
tokens.end() );
(Generally speaking: if this is a one-off situation, use the
std::istringstream based solution. If you think you may have to do
something like this from time to time in the future, add the split
function to your toolkit, and use it.)
You could use strtok.
Split string into tokens
A sequence of calls to this function
split str into tokens, which are
sequences of contiguous characters
separated by any of the characters
that are part of delimiters.
I would put the string in a stringstream and then use the getline method like the previous answer mentioned. Then, you could just act like you were reading the text in from a file when it really comes from another string.