Reading an iostream until a string delimiter is found

I currently have a function that reads from a stream until a predefined stream-stopper is found. The only way I could currently get it up and running is by using std::getline and having one character followed by a newline (in my case char(3)) as my stream-stopper.
std::string readuntil(std::istream& in) {
    std::string text;
    std::getline(in, text, char(3));
    return text;
}
Is there any way to achieve the same but with a longer string as my stream-stopper? I don't mind it having to be followed by a newline, but I want my delimiter to be a random string of some size so that the probability of it occurring by chance in the stream is very, very low.
Any idea how to achieve this?

I assume that your requirements are:
a function taking an istream ref and a string as parameter
the string is a delimiter and the function must return a string containing all the characters that arrived before it
the stream must be positioned immediately after the delimiter for further processing.
AFAIK, neither the C++ nor the C standard library contains a function for that. I would just:
read up to the last character of the delimiter into a temporary string
append that to an accumulated result string
repeat the two steps above while the result string does not end with the delimiter
optionally remove the delimiter from the end of the result string
return the result string
A possible C++ implementation is:
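#include <istream>
#include <string>

std::string readuntil(std::istream& in, const std::string& delimiter) {
    std::string cr;
    if (delimiter.empty())
        return cr; // nothing to search for
    char delim = delimiter.back();
    std::size_t sz = delimiter.size(), tot;
    do {
        std::string temp;
        if (!std::getline(in, temp, delim))
            return cr; // stream failed or ended before the delimiter was found
        cr += temp;
        if (in.eof())
            return cr; // input ended without the final delimiter character
        cr += delim; // getline discarded it, so put it back
        tot = cr.size();
    } while ((tot < sz) || (cr.compare(tot - sz, sz, delimiter) != 0));
    return cr.substr(0, tot - sz); // or return cr; if you want to keep the delimiter
}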

Related

How can I erase first and last character in string?

This function returns an array of strings with a list of files in a folder. It looks like this:
"folder//xyz.txt"
How can I make it look like this?
folder//xyz.txt
It's the same but without the quotes.
vector<string> list_of_files(string folder_name)
{
vector<string> files;
string path = folder_name;
for (const auto& entry : fs::directory_iterator(path))
{
stringstream ss;
ss << entry.path(); //convert entry.path() to string
string str = ss.str();
files.push_back(ss.str());
}
return files;
}

Erasing the first and last characters of a string is easy:
if (str.size() >= 1)
str.erase(0, 1); // from 1st char (#0), len 1; bit verbose as not designed for this
if (str.size() >= 1)
str.pop_back(); // chop off the end
Your quotes come from inserting the path into a stream: the stream insertion operator for paths writes them quoted (via std::quoted), which helps prevent bugs due to spaces down the line.
Fortunately, you don't need any of this: as explored in the comments, the stringstream is entirely unnecessary; the path already converts to a string if you ask it to:
vector<string> list_of_files(string folder_name)
{
    vector<string> files;
    for (const auto& entry : fs::directory_iterator(folder_name))
        files.push_back(entry.path().string());
    return files;
}

how do you split a string embedded in a delimiter in C++?

I understand how to split a string by a delimiter in C++, but how do you split out strings embedded between delimiters? For example, how do I split "~!hello~! random junk... ~!world~!" by the string "~!" into an array of ["hello", " random junk...", "world"]? Are there any C++ standard library functions for this or, if not, an algorithm that could achieve it?

#include <iostream>
#include <string>
#include <vector>
using namespace std;

vector<string> split(string s, string delimiter) {
    vector<string> res;
    s += delimiter; // adding delimiter at end of string
    string word;
    size_t pos = s.find(delimiter); // size_t, so the comparison with npos is exact
    while (pos != string::npos) {
        word = s.substr(0, pos); // The word that comes before the delimiter
        res.push_back(word); // Push the word to our final vector
        s.erase(0, pos + delimiter.length()); // Delete the delimiter and repeat till end of string to find all words
        pos = s.find(delimiter); // Update pos to hold position of next delimiter in our string
    }
    res.push_back(s); // push whatever is left after the last delimiter
    return res;
}

int main() {
    string s = "~!hello~!random junk... ~!world~!";
    vector<string> words = split(s, "~!");
    for (size_t i = 0; i < words.size(); i++)
        std::cout << words[i] << std::endl;
    return 0;
}
The above program finds all the words that occur before, between and after the delimiter you specify. Because the example input starts and ends with the delimiter, the result also contains empty strings. With minor changes (for example, skipping empty words, or ignoring whatever comes before the first or after the last delimiter) you can make the function suit your needs.
I hope this answers your question!

How to cleanly extract a string delimited string from an istream in c++

I am trying to extract a string from an istream with strings as delimiters, yet I haven't found any istream operations whose behavior is close to string operations such as find() or substr().
Here is an example istream content:
delim_oneFUUBARdelim_two
and my goal is to get FUUBAR into a string with as few workarounds as possible.
My current solution is to copy all of the istream content into a string (using this solution for it) and then extract the part I need with string operations. Is there a way to avoid this unnecessary copying and read only as much from the istream as needed, so that all content after the delimited string is preserved in case there are more to be found in a similar fashion?

You can easily create a type that will consume the expected separator or delimiter:
struct Text
{
    std::string t_;
};

std::istream& operator>>(std::istream& is, Text& t)
{
    is >> std::skipws;
    for (char c: t.t_)
    {
        if (is.peek() != c)
        {
            is.setstate(std::ios::failbit);
            break;
        }
        is.get(); // throw away known-matching char
    }
    return is;
}
See it in action on ideone
This suffices when the previous stream extraction naturally stops without consuming the delimiter (e.g. an int extraction followed by a delimiter that doesn't start with a digit), which will typically be the case unless the previous extraction is of a std::string. Single-character delimiters can be passed to getline, but say your delimiter is "</block>" and the stream contains "<black>metallic</black></block>42": you'd want something to extract "<black>metallic</black>" into a string, throw away the "</block>" delimiter, and leave the "42" on the stream:
struct Until_Delim {
    Until_Delim(std::string& s, std::string delim) : s_(s), delim_(delim) { }
    std::string& s_;
    std::string delim_;
};

std::istream& operator>>(std::istream& is, const Until_Delim& ud)
{
    std::istream::sentry sentry(is); // skips leading whitespace if skipws is set
    size_t in_delim = 0;
    for (char c = is.get(); is; c = is.get())
    {
        if (c == ud.delim_[in_delim])
        {
            if (++in_delim == ud.delim_.size())
                break;
            continue;
        }
        if (in_delim) // was part-way into delimiter match...
        {
            ud.s_.append(ud.delim_, 0, in_delim);
            in_delim = 0;
            // NOTE: c is not re-tested against the start of the delimiter, so a
            // delimiter whose prefix repeats (e.g. "ab" in "aab") is missed
        }
        ud.s_ += c;
    }
    // may need to trim trailing whitespace...
    if (is.flags() & std::ios_base::skipws)
        while (!ud.s_.empty() && std::isspace(static_cast<unsigned char>(ud.s_.back())))
            ud.s_.pop_back();
    return is;
}
This can then be used as in:
string a_string;
if (some_stream >> Until_Delim(a_string, "</block>") >> whatevers_after)
...
This notation might seem a bit hackish, but there's precedent in Standard Library's std::quoted().
You can see the code running here.

Standard streams are equipped with locales that can do classification, namely the std::ctype<> facet. We can use this facet to ignore() characters in a stream while a certain classification is not present in the next available character. Here's a working example:
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

using mask = std::ctype_base::mask;

template<mask m>
void scan_classification(std::istream& is)
{
    auto& ctype = std::use_facet<std::ctype<char>>(is.getloc());
    // skip characters until the next one has classification m (or the stream ends)
    while (is.peek() != std::char_traits<char>::eof() &&
           !ctype.is(m, std::char_traits<char>::to_char_type(is.peek())))
        is.ignore();
}

int main()
{
    std::istringstream iss("some_string_delimiter3.1415another_string");
    double d;
    scan_classification<std::ctype_base::digit>(iss);
    if (iss >> d)
        std::cout << std::to_string(d); // std::to_string uses six decimals, e.g. "3.141500"
}

wordwrap function fix to preserve whitespace between words

Some time ago I was looking for a snippet to word-wrap text to a given line length without breaking up words. It was working well enough, but now that I have started using it in an edit control, I noticed it eats up runs of multiple whitespace characters between words. I am wondering how to fix it, or whether to get rid of it completely if wstringstream is not suitable for the task. Maybe someone out there has a similar function?
void WordWrap2(const std::wstring& inputString, std::vector<std::wstring>& outputString, unsigned int lineLength)
{
std::wstringstream iss(inputString);
std::wstring line;
std::wstring word;
while(iss >> word)
{
if (line.length() + word.length() > lineLength)
{
outputString.push_back(line+_T("\r"));
line.clear();
}
if( !word.empty() ) {
if( line.empty() ) line += word; else line += +L" " + word;
}
}
if (!line.empty())
{
outputString.push_back(line+_T("\r"));
}
}
The line delimiter symbol should remain \r.

Instead of reading a word at a time, and adding words until you'd exceed the desired line length, I'd start from the point where you want to wrap, and work backwards until you find a white-space character, then add that entire chunk to the output.
#include <cwctype>
#include <string>
#include <vector>

void WordWrap2(const std::wstring& inputString,
               std::vector<std::wstring>& outputString,
               unsigned int lineLength) {
    size_t last_pos = 0;
    size_t pos;
    for (pos = lineLength; pos < inputString.length(); pos += lineLength) {
        // walk back from the wrap point to the nearest whitespace character
        while (pos > last_pos && !std::iswspace(inputString[pos]))
            --pos;
        outputString.push_back(inputString.substr(last_pos, pos - last_pos));
        last_pos = pos;
        // skip the whitespace at the break
        while (std::iswspace(inputString[last_pos]))
            ++last_pos;
    }
    outputString.push_back(inputString.substr(last_pos));
}
As it stands, this will fail if it encounters a single word that's longer than the line length you've specified: the loop stops making progress and keeps pushing empty lines. In such a case it probably should just break in the middle of the word, but it currently doesn't.
I've also written it to skip over whitespace between words when it falls at a line break. If you really don't want that, just eliminate the:
while (std::iswspace(inputString[last_pos]))
    ++last_pos;

If you don't want to lose space characters, you need to add the following line before doing any reads:
iss >> std::noskipws;
But then using >> with a string as the second argument won't work well with respect to spaces.
You'll have to resort to reading chars and managing them yourself in an ad hoc manner.

How to tokenize (words) classifying punctuation as space

Based on this question which was closed rather quickly:
Trying to create a program to read a users input then break the array into seperate words are my pointers all valid?
Rather than closing I think some extra work could have gone into helping the OP to clarify the question.
The Question:
I want to tokenize user input and store the tokens in an array of words.
I want to use punctuation (.,-) as delimiters and thus remove it from the token stream.
In C I would use strtok() to break an array into tokens and then manually build an array.
Like this:
The main Function:
char **findwords(char *str);
int main()
{
int test;
char words[100]; //an array of chars to hold the string given by the user
char **word; //pointer to a list of words
int index = 0; //index of the current word we are printing
char c;
cout << "die monster !";
//a loop to place the charecters that the user put in into the array
do
{
c = getchar();
words[index] = c;
}
while (words[index] != '\n');
word = findwords(words);
while (word[index] != 0) //loop through the list of words until the end of the list
{
printf("%s\n", word[index]); // while the words are going through the list print them out
index ++; //move on to the next word
}
//free it from the list since it was dynamically allocated
free(word);
cin >> test;
return 0;
}
The line tokenizer:
char **findwords(char *str)
{
int size = 20; //original size of the list
char *newword; //pointer to the new word from strok
int index = 0; //our current location in words
char **words = (char **)malloc(sizeof(char *) * (size +1)); //this is the actual list of words
/* Get the initial word, and pass in the original string we want strtok() *
* to work on. Here, we are seperating words based on spaces, commas, *
* periods, and dashes. IE, if they are found, a new word is created. */
newword = strtok(str, " ,.-");
while (newword != 0) //create a loop that goes through the string until it gets to the end
{
if (index == size)
{
//if the string is larger than the array increase the maximum size of the array
size += 10;
//resize the array
char **words = (char **)malloc(sizeof(char *) * (size +1));
}
//asign words to its proper value
words[index] = newword;
//get the next word in the string
newword = strtok(0, " ,.-");
//increment the index to get to the next word
++index;
}
words[index] = 0;
return words;
}
Any comments on the above code would be appreciated.
But, additionally, what is the best technique for achieving this goal in C++?

Have a look at boost tokenizer for something that's much better in a C++ context than strtok().

How to tokenize a stream in C++ is already covered by a lot of questions.
Example: How to read a file and get words in C++
But what is harder to find is how to get the same functionality as strtok():
Basically, strtok() allows you to split the string on a whole set of user-defined characters, while the C++ stream only allows you to use white space as a separator. Fortunately, what counts as white space is defined by the locale, so we can modify the locale to treat other characters as space, and this will then allow us to tokenize the stream in a more natural fashion.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>
// This is my facet that will treat the ,.- as space characters and thus ignore them.
class WordSplitterFacet: public std::ctype<char>
{
    public:
        typedef std::ctype<char> base;
        typedef base::char_type char_type;

        WordSplitterFacet(std::locale const& l)
            : base(table)
        {
            std::ctype<char> const& defaultCType = std::use_facet<std::ctype<char> >(l);

            // Copy the default value from the provided locale
            static char data[256];
            for(int loop = 0;loop < 256;++loop) { data[loop] = loop;}
            defaultCType.is(data, data+256, table);

            // Modifications to default to include extra space types.
            table[','] |= base::space;
            table['.'] |= base::space;
            table['-'] |= base::space;
        }
    private:
        base::mask table[256];
};
We can then use this facet in a locale like this:
std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));
<stream>.imbue(std::locale(std::locale(), wordSplitter));
The next part of your question is how I would store these words in an array. Well, in C++ you would not. You would delegate this functionality to std::vector/std::string. Reading your code, you will see that it is doing two major things in the same part of the code:
It is managing memory.
It is tokenizing the data.
There is a basic principle, Separation of Concerns, under which a piece of code should do only one of two things: either resource management (memory management in this case) or business logic (tokenization of the data). By separating these into different parts of the code you make the code easier to use and easier to write. Fortunately, in this example all the resource management is already done by std::vector/std::string, which allows us to concentrate on the business logic.
As has been shown many times, the easy way to tokenize a stream is using operator >> and a string. This will break the stream into words. You can then use iterators to loop across the stream automatically, tokenizing it as you go.
std::vector<std::string> data;
for(std::istream_iterator<std::string> loop(<stream>); loop != std::istream_iterator<std::string>(); ++loop)
{
    // In here loop is an iterator that has tokenized the stream using the
    // operator >> (which for std::string reads one space separated word).
    data.push_back(*loop);
}
We can combine this with some standard algorithms to simplify the code:
std::copy(std::istream_iterator<std::string>(<stream>), std::istream_iterator<std::string>(), std::back_inserter(data));
Now combining all the above into a single application:
int main()
{
    // Create the facet.
    std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));

    // Here I am using a string stream.
    // But any stream can be used. Note you must imbue a stream before it is used.
    // Otherwise the imbue() will silently fail.
    std::stringstream teststr;
    teststr.imbue(std::locale(std::locale(), wordSplitter));

    // Now that it is imbued we can use it.
    // If this was a file stream then you could open it here.
    teststr << "This, stri,plop";

    std::cout << "die monster !";
    std::vector<std::string> data;
    std::copy(std::istream_iterator<std::string>(teststr), std::istream_iterator<std::string>(), std::back_inserter(data));

    // Copy the array to cout one word per line
    std::copy(data.begin(), data.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
