How to read only some previously known lines using ifstream (C++) - c++

By preprocessing the file I found some lines for further processing, and now I want to read those lines. Is there any faster solution than reading lines one by one using ifstream::getline(...)?
For example, I know that I want only the lines whose numbers are multiples of 4 (0-4-8-12-16-...), or special line numbers stored in a vector...
Now I'm doing this:
string line;
int counter = 0;
while (getline(ifstr, line)) {
    if (counter++ % 4 == 0) {
        // some code working with line
    }
}
But I want something like this (if it is faster):
while (getline(ifstr, line)) {
    // some code working with line
    while (++counter % 4 != 0) { // or checking against the index vector
        skipline(ifstr);
    }
}
Let me mention again that I have some line indices (sorted, but not as regular as this); I use the multiples-of-4 example only for simplicity.
Edit: and I want to jump ahead at the beginning; for example, I know that I need to start reading from line number 2000, so how do I skip the first 1999 lines quickly?
Thanks all

Because #caps said this left him with the feeling there's nothing in the standard library to help with this kind of task, I felt compelled to demonstrate otherwise :)
Live On Coliru
template <typename It, typename Out, typename Filter = std::vector<int> >
Out retrieve_lines(It begin, It const end, Filter lines, Out out, char const* delim = "\\n") {
    if (lines.empty())
        return out;

    // make sure input is orderly
    assert(std::is_sorted(lines.begin(), lines.end()));
    assert(lines.front() >= 0);

    std::regex re(delim);
    std::regex_token_iterator<It> line(begin, end, re, -1), eof;

    // make lines into incremental offsets
    std::adjacent_difference(lines.begin(), lines.end(), lines.begin());

    // iterate, advancing by each offset requested
    auto advanced = [&line, eof](size_t n) { while (line != eof && n--) ++line; return line; };
    for (auto offset = lines.begin(); offset != lines.end() && advanced(*offset) != eof; ++offset) {
        *out++ = *line;
    }
    return out;
}
This is noticeably more generic. The trade-off (for now) is that the tokenizing iterator requires a random access iterator. I find this a good trade-off because "random access" on files really asks for memory-mapped files anyway.
Live Demo 1: from string to vector<string>
Live On Coliru
int main() {
    std::vector<std::string> output_lines;
    std::string is(" a b c d e\nf g hijklmnop\nqrstuvw\nxyz");
    retrieve_lines(is.begin(), is.end(), {0, 3, 999}, back_inserter(output_lines));
    // for debug purposes
    for (auto& line : output_lines)
        std::cout << line << "\n";
}
Prints
a b c d e
xyz
Live Demo 2: From file to cout
Live On Coliru
#include <boost/iostreams/device/mapped_file.hpp>

int main() {
    boost::iostreams::mapped_file_source is("/etc/dictionaries-common/words");
    retrieve_lines(is.begin(), is.end(), {13, 784, 9996},
                   std::ostream_iterator<std::string>(std::cout, "\n"));
}
Prints e.g.
ASL's
Apennines
Mercer's
The use of boost::iostreams::mapped_file_source can easily be replaced with straight-up ::mmap, but I found that uglier in the presentation sample.

Store std::fstream::streampos instances corresponding to the line beginnings of your file in a std::vector, and then you can access a specific line using an index into this vector. A possible implementation follows:
class file_reader {
public:
    // load file streampos offsets during construction
    explicit file_reader(const std::string& filename)
        : fs(filename) { cache_file_streampos(); }

    std::size_t lines() const noexcept { return line_streampos_vec.size() - 1; }

    // get a std::string representation of a specific line in the file
    std::string read_line(std::size_t n) {
        if (n >= line_streampos_vec.size() - 1)
            throw std::out_of_range("out of bounds");
        navigate_to_line(n);
        std::string rtn_str;
        std::getline(fs, rtn_str);
        return rtn_str;
    }

private:
    std::fstream fs;
    std::vector<std::fstream::streampos> line_streampos_vec;
    const std::size_t max_line_length = 4096; // some sensible value

    // store file streampos instances in the vector
    void cache_file_streampos() {
        std::string s;
        s.reserve(max_line_length);
        line_streampos_vec.push_back(fs.tellg()); // beginning of line 0
        while (std::getline(fs, s))
            line_streampos_vec.push_back(fs.tellg());
    }

    // go to a specific line in the file stream
    void navigate_to_line(std::size_t n) {
        fs.clear();
        fs.seekg(line_streampos_vec[n]);
    }
};
Then you can read a specific line of your file via:
file_reader fr("filename.ext");
for (std::size_t i = 0; i < fr.lines(); ++i) {
    if (!(i % 4)) {
        std::string line_contents = fr.read_line(i); // then do something with the string
    }
}

ArchbishopOfBanterbury's answer is nice, and I would agree with him that you will get cleaner code and better efficiency by just storing the character positions of the beginning of each line when you do your preprocessing.
But, supposing that is not possible (perhaps the preprocessing is handled by some other API, or is from user input), there is a solution that should do the minimal amount of work necessary to read in only the specified lines.
The fundamental problem is that, given a file with variable line lengths, you cannot know where each line begins and ends, since a line is defined as a sequence of characters that ends in '\n'. So you must examine every character to see whether it is '\n', and if so, advance your line counter and read in the line if the counter matches one of your desired inputs.
auto retrieve_lines(std::ifstream& file_to_read, std::vector<int> line_numbers_to_read) -> std::vector<std::string>
{
    auto begin = std::istreambuf_iterator<char>(file_to_read);
    auto end = std::istreambuf_iterator<char>();

    auto current_line = 0;
    auto next_line_num = std::begin(line_numbers_to_read);

    auto output_lines = std::vector<std::string>();
    output_lines.reserve(line_numbers_to_read.size()); // this may be a silly "optimization," since all the strings are still separate unreserved buffers

    // we can bail if we've reached the end of the lines we want to read, even if there are lines remaining in the stream
    // we *must* bail if we've reached the end of the stream, even if there are supposedly lines left to read; that input must have been incorrect
    while (begin != end && next_line_num != std::end(line_numbers_to_read))
    {
        if (current_line == *next_line_num)
        {
            auto matching_line = std::string();
            if (*begin != '\n')
            {
                // potential optimization: reserve matching_line to something that you expect will fit most/all of your input lines
                while (begin != end && *begin != '\n')
                {
                    matching_line.push_back(*begin++);
                }
            }
            output_lines.emplace_back(matching_line);
            ++next_line_num;
        }
        else
        {
            // skip this "line" by finding the next '\n'
            while (begin != end && *begin != '\n')
            {
                ++begin;
            }
        }
        // either code path in the previous if/else leaves us staring at the '\n' at the end of a line,
        // which is not the right state for the next iteration of the loop.
        // So skip this '\n' to get to the beginning of the next line
        if (begin != end && *begin == '\n')
        {
            ++begin;
        }
        ++current_line;
    }
    return output_lines;
}
Here it is live on Coliru, along with the input I tested it with. As you can see, it correctly handles empty lines as well as correctly handling being told to grab more lines than are in the file.

Related

What is the reason behind the debugging getting stopped abruptly in the following code?

Here is the code to count how many matches of a string, which is input from the user, can be found in the file temp.txt. If, for example, we want love to be counted, then matches like love, lovely, and beloved should be considered. We also want to count the total number of words in the temp.txt file.
I am doing line-by-line reading here, not word-by-word.
Why does the debugging stop at totalwords += counting(line)?
/* this code is not working to count the words */
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int totalwords{0};

int counting(string line){
    int wordcount{0};
    if(line.empty()){
        return 1;
    }
    if(line.find(" ")==string::npos){ wordcount++; }
    else{
        while(line.find(" ")!=string::npos){
            int index=0;
            index = line.find(" ");
            line.erase(0,index);
            wordcount++;
        }
    }
    return wordcount;
}
int main() {
    ifstream in_file;
    in_file.open("temp.txt");
    if(!in_file){
        cerr<<"PROBLEM OPENING THE FILE"<<endl;
    }
    string line{};
    int counter{0};
    string word {};
    cout<<"ENTER THE WORD YOU WANT TO COUNT IN THE FILE: ";
    cin>>word;
    int n {0};
    n = ( word.length() - 1 );
    while(getline(in_file>>ws,line)){
        totalwords += counting(line);
        while(line.find(word)!=string::npos){
            counter++;
            int index{0};
            index = line.find(word);
            line.erase(0,(index+n));
        }
    }
    cout<<endl;
    cout<<counter<<endl;
    cout<<totalwords;
    return 0;
}
line.erase(0, index); doesn't erase the space, you need
line.erase(0, index + 1);
Your code reveals a few problems...
At very first, counting a single word for an empty line doesn't appear correct to me. Second, erasing again and again from the string is pretty inefficient, with every such operation all of the subsequent characters are copied towards the front. If you indeed wanted to do so you might rather want to search from the end of the string, avoiding that. But you can actually do so without ever modifying the string if you use the second parameter of std::string::find (which defaults to 0, so has been transparent to you...):
int index = line.find(' ' /*, 0 */); // first call; 0 is the default, thus implicit
index = line.find(' ', index + 1);   // subsequent calls
Note that using the character overload is more efficient if you search for a single character anyway. However, this variant doesn't consider other whitespace like e. g. tabulators.
Additionally, the variant as posted in the question doesn't consider more than one subsequent whitespace! In your erasing variant – which erases one character too few, by the way – you would need to skip incrementing the word count if you find the space character at index 0.
However, I'd go with a totally new approach, looking at each character separately; you need a stateful loop in that case, though, i.e. you need to remember whether you already are within a word or not. It might look e.g. like this:
size_t wordCount = 0; // note: prefer an unsigned type, negative values are
                      // meaningless anyway; size_t is especially fine as it
                      // is guaranteed to be large enough to hold any count
                      // of characters the string might ever contain
bool inWord = false;
for (char c : line)
{
    if (isspace(static_cast<unsigned char>(c)))
    // you can check for *any* white space that way...
    // note the cast to unsigned, which is necessary as isspace accepts an
    // int and a bare char *might* be signed, thus could result in negative
    // values
    {
        // no word any more...
        inWord = false;
    }
    else if (inWord)
    {
        // well, nothing to do, we already discovered a word earlier!
        //
        // as we actually don't do anything here you might just skip
        // this block and check for the opposite: if (!inWord)
    }
    else
    {
        // OK, this is the start of a word, so now we need to count a new one!
        ++wordCount;
        inWord = true;
    }
}
Now you might want to break words at punctuation characters as well, so you might actually want to check for:
if (isspace(static_cast<unsigned char>(c)) || ispunct(static_cast<unsigned char>(c)))
A bit shorter is the following variant:
if (/* space or punctuation */)
{
    inWord = false;
}
else
{
    wordCount += !inWord; // adds 1 only at the start of a word
    inWord = true;
}
Finally: All code is written freely, thus unchecked – if you find a bug, please fix yourself...
debugging getting stopped abruptly
Does debugging indeed stop at the indicated line? I observed instead that the program hangs within the while loop in counting. You may make this visible by inserting an indicator output (marked by HERE in following code):
int counting(string line){
    int wordcount{0};
    if(line.empty()){
        return 1;
    }
    if(line.find(" ")==string::npos){ wordcount++; }
    else{
        while(line.find(" ")!=string::npos){
            int index=0;
            index = line.find(" ");
            line.erase(0,index);
            cout << '.'; // <--- HERE: indicator output
            wordcount++;
        }
    }
    return wordcount;
}
As Jarod42 pointed out, the erase call you are using misses the space itself. That's why you are finding spaces and “counting words” forever.
There is also an obvious misconception about words and separators of words visible in your code:
empty lines don't contain words
consecutive spaces don't indicate words
words may be separated by non-spaces (parentheses for example)
Finally, as already mentioned: if the problem is about counting total words, it's not necessary to discuss the other parts. And after the test (see HERE) above, it also appears to be independent of file input. So your code could be reduced to something like this:
#include <iostream>
#include <string>

int counting(std::string line) {
    int wordcount = 0;
    if (line.empty()) {
        return 1;
    }
    if (line.find(" ") == std::string::npos) {
        wordcount++;
    } else {
        while (line.find(" ") != std::string::npos) {
            int index = 0;
            index = line.find(" ");
            line.erase(0, index);
            wordcount++;
        }
    }
    return wordcount;
}

int main() {
    int totalwords = counting("bla bla");
    std::cout << totalwords;
    return 0;
}
And in this form, it's much easier to see if it works. We expect to see a 2 as output. To get there, it's possible to try correcting your erase call, but the result would then still be wrong (1) since you are actually counting spaces. So it's better to take the time and carefully read Aconcagua's insightful answer.

How to know it's the end of the line in c++ reading file

I need to know how to stop when it's the end of the line.
I need the int value for the last element, which is 15,
and also for the other lines I need to know whether a given value is the last element in the line.
I tried getline(file, line),
but it just gives me a string!
And line.length(); gives me the length of the file.
You should either use std::getline and parse the string you get, or go char-by-char and treat CR/LF as end of line. Then just move back from it until you get a space.
You should continue to use your getline function; from that you will have your std::string. Then, to pull the last number, use:
string str = <getline() result>
size_t last_index = str.find_last_not_of("0123456789");
string result = str.substr(last_index + 1);
int num = std::stoi(result);
This will give you your int representation of your last number.
This will extract the last number on each line:
ifstream f("./numbers");
vector<int> v;
int i;
while (f >> i) {
    char c = f.get();
    if (f.eof() || c == '\r' || c == '\n') {
        v.push_back(i);
    }
}
f.eof() may be true if the file is not terminated by a new line, in which case the f.get() call may fail and set the eofbit.
std::ifstream file{"your-file"};
std::string line;
while (getline(file, line)) {
    std::istringstream iss{line};
    std::vector<int> vec{std::istream_iterator<int>{iss},
                         std::istream_iterator<int>{}};
    if (vec.size() == 1) {
        // only one element
    }
    vec.back(); // last value of the line (/!\ check not empty first)
}

Traversing a Fasta file in C/C++

I'm looking to write a program in C/C++ to traverse a Fasta file formatted like:
>ID and header information
SEQUENCE1
>ID and header information
SEQUENCE2
and so on
in order to find all unique sequences (check if subset of any other sequence)
and write unique sequences (and all headers) to an output file.
My approach was:
Copy all sequences to an array/list at the beginning (more efficient way to do this?)
Grab header, append it to output file, compare sequence for that header to everything in the list/array. If unique, write it under the header, if not delete it.
However, I'm a little unsure as to how to approach reading the lines in properly. I need to read the top line for the header, and then "return?" to the next line to read the sequence. Sometimes the sequence spans more than two lines, so would I use > (from the example above) as a delimiter? If I use C++, I imagine I'd use iostreams to accomplish this?
If anybody could give me a nudge in the right direction as to how I would want to read the information I need to manipulate/how to carry out the comparison, it'd be greatly appreciated.
First, rather than write your own FASTA reading routine you probably want to use something that already exists; for example, see: http://lh3lh3.users.sourceforge.net/parsefastq.shtml
Internally you'll have the sequences without newlines, and that is probably helpful. I think the simplest approach from a high level is:
loop over the fasta and write out the sequences to a file
sort that file
with the sorted file it becomes easier to pick out subsequences, so write a program to find the "unique ids"
using the unique ids, go back to the original fasta and get whatever additional information you need.
Your approach is usable. Below is an implementation of it.
However, I'm a little unsure as to how to approach reading the lines
in properly. ... Sometimes the sequence spans more then two lines, so would I use > (from the example above) as a delimiter?
That's right; in addition, there's just the EOF which has to be checked.
I wrote the function getd() for that, which reads a single-line description or concatenated lines of sequence data and returns a pointer to the string it allocated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *getd()
{
    char *s = NULL, *eol;
    size_t i = 0, size = 1; // 1 for null character
    int c;
#define MAXLINE 80 // recommended max. line length; longer lines are okay
    do // read single-line description or concatenated lines of sequence data
    {
        do // read a line (until '\n')
        {
            s = realloc(s, size += MAXLINE+1); // +1 for newline character
            if (!s) puts("out of memory"), exit(1);
            if (!fgets(s+i, MAXLINE+2, stdin)) break; // fill only the new chunk
            eol = strchr(s+i, '\n');
            i += MAXLINE+1;
        } while (!eol);
        if (!i) { free(s); return NULL; } // nothing read
        if (*s == '>') return s; // single-line description
        i = eol-s; // continue writing over the newline, concatenating lines
        ungetc(c = getchar(), stdin); // peek at next character
    } while (c != '>' && c != EOF);
    return s;
}

int main()
{
    char *s;
    struct seq { char *head, *data; } *seq = NULL;
    int n = 0, i, j;

    while (s = getd())
        if (*s == '>')
        {   // new sequence: make room, store header
            seq = realloc(seq, ++n * sizeof *seq);
            if (!seq) puts("out of memory"), exit(1);
            seq[n-1] = (struct seq){ s, "" };
        }
        else
            if (n) // store sequence data if at least one header present
                seq[n-1].data = s;

    for (i = 0; i < n; ++i)
    {
        const int max = 70; // reformat output data to that line length max.
        printf("%s", seq[i].head);
        for (s = seq[i].data, j = 0; j < n; ++j)
            if (j != i) // compare sequence to the others, delete if not unique
                if (strstr(seq[j].data, s)) { s = seq[i].data = ""; break; }
        for (; strlen(s) > max && s[max] != '\n'; s += max)
            printf("%.*s\n", max, s);
        printf("%s", s);
    }
}

Getting the nth line of a text file in C++

I need to read the nth line of a text file (e.g. textfile.findline(0) would find the first line of the text file loaded with ifstream textfile). Is this possible?
I don't need to put the contents of the file in an array/vector; I just need to assign a specific line of the text file to a variable (specifically an int).
P.S. I am looking for the simplest solution that would not require me to use any big external library (e.g. Boost).
Thanks in advance.
How about this?
std::string ReadNthLine(const std::string& filename, int N)
{
    std::ifstream in(filename.c_str());

    std::string s;
    // for performance
    s.reserve(some_reasonable_max_line_length);

    // skip N lines
    for (int i = 0; i < N; ++i)
        std::getline(in, s);

    std::getline(in, s);
    return s;
}
If you want to read the start of the nth line, you can use istream::ignore to skip over the first n-1 lines, then read from the next line to assign to the variable.
template<typename T>
void readNthLine(istream& in, int n, T& value) {
    for (int i = 0; i < n - 1; ++i) {
        in.ignore(numeric_limits<streamsize>::max(), '\n');
    }
    in >> value;
}
Armen's solution is the correct answer, but I thought I'd throw out an alternative, based on jweyrich's caching idea. For better or for worse, this reads in the entire file at construction, but only saves the newline positions (doesn't store the entire file, so it plays nice with massive files.) Then you can simply call ReadNthLine, and it will immediately jump to that line, and read in the one line you want. On the other hand, this is only optimal if you want to get only a fraction of the lines at a time, and the line numbers are not known at compile time.
class TextFile {
    std::ifstream file_stream;
    std::vector<std::ifstream::streampos> linebegins;
    TextFile& operator=(TextFile& b) = delete;
public:
    TextFile(std::string filename)
        : file_stream(filename)
    {
        // this chunk stolen from Armen's
        std::string s;
        // for performance
        s.reserve(some_reasonable_max_line_length);
        while (file_stream) {
            linebegins.push_back(file_stream.tellg());
            std::getline(file_stream, s);
        }
    }
    TextFile(TextFile&& b)
        : file_stream(std::move(b.file_stream)),
          linebegins(std::move(b.linebegins))
    {}
    TextFile& operator=(TextFile&& b)
    {
        file_stream = std::move(b.file_stream);
        linebegins = std::move(b.linebegins);
        return *this;
    }
    std::string ReadNthLine(int N) {
        if (N >= linebegins.size() - 1)
            throw std::runtime_error("File doesn't have that many lines!");
        std::string s;
        // clear EOF and error flags
        file_stream.clear();
        file_stream.seekg(linebegins[N]);
        std::getline(file_stream, s);
        return s;
    }
};
It's certainly possible. There are (n-1) '\n' characters preceding the nth line. Read lines until you reach the one you're looking for. You can do this on the fly without storing anything except the current line being considered.

Selective iterator

FYI: no boost, yes it has this, I want to reinvent the wheel ;)
Is there some form of a selective iterator (if that is possible) in C++? What I want is to separate strings like this:
some:word{or other
to a form like this:
some : word { or other
I can do that with two loops and find_first_of(":") and ("{"), but this seems (very) inefficient to me. I thought that maybe there would be a way to create/define/write an iterator that would iterate over all these values with for_each. I fear this will have me writing a full-fledged, way-too-complex custom iterator class for a std::string.
So I thought maybe this would do:
std::vector<size_t> list;
size_t index = mystring.find(":");
while (index != std::string::npos)
{
    list.push_back(index);
    index = mystring.find(":", list.back() + 1); // +1, or we find the same ':' forever
}
std::for_each(list.begin(), list.end(), addSpaces(mystring));
This looks messy to me, and I'm quite sure a more elegant way of doing this exists. But I can't think of it. Anyone have a bright idea? Thanks
PS: I did not test the code posted, just a quick write-up of what I would try
UPDATE: after taking all your answers into account, I came up with this, and it works to my liking :). This does assume the last char is a newline or the like; otherwise an ending {, }, or : won't get processed.
void tokenize(string& line)
{
    char oneBack = ' ';
    char twoBack = ' ';
    char current = ' ';
    size_t length = line.size();
    for (size_t index = 0; index < length; ++index)
    {
        twoBack = oneBack;
        oneBack = current;
        current = line.at(index);
        if (isSpecial(oneBack))
        {
            if (!isspace(twoBack)) // insert before
            {
                line.insert(index - 1, " ");
                ++index;
                ++length;
            }
            if (!isspace(current)) // insert after
            {
                line.insert(index, " ");
                ++index;
                ++length;
            }
        }
    }
}
Comments are welcome as always :)
That's relatively easy using the std::istream_iterator.
What you need to do is define your own class (say Term). Then define how to read a single "word" (term) from the stream using the operator >>.
I don't know your exact definition of a word, so I am using the following definition:
Any consecutive sequence of alphanumeric characters is a term.
Any single non-whitespace character that is also not alphanumeric is a term.
Try this:
#include <string>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>

class Term
{
    public:
        // This cast operator is not required but makes it easy to use
        // a Term anywhere that a string can normally be used.
        operator std::string const&() const { return value; }
    private:
        // A term is just a string.
        // And we friend the operator >> to make sure we can read it.
        friend std::istream& operator>>(std::istream& inStr, Term& dst);
        std::string value;
};
Now all we have to do is define an operator >> that reads a word according to the rules:
// This function could be a lot neater using some boost regular expressions.
// I just do it manually to show it can be done without boost (as requested).
std::istream& operator>>(std::istream& inStr, Term& dst)
{
    // Note the >> operator drops all preceding white space,
    // so we get the first non white space character.
    char first;
    inStr >> first;

    // If the stream is in any bad state then stop processing.
    if (inStr)
    {
        if (std::isalnum(first))
        {
            // Alphanumeric, so read a sequence of characters.
            dst.value = first;

            // This is ugly. And needs re-factoring.
            while ((first = inStr.get(), inStr) && std::isalnum(first))
            {
                dst.value += first;
            }
            // Take into account the special case of EOF and bad stream states.
            if (inStr)
            {
                // The stream is fine; the last character read was simply not
                // part of the word, so put it back for use by the next call
                // to read from the stream.
                inStr.putback(first);
            }
            else if (inStr.eof())
            {
                // We know that we have a word, so clear the eof state to make
                // sure it is used. Let the next attempt to read a word (term)
                // fail at the outer if.
                inStr.clear();
            }
        }
        else
        {
            // It was not alphanumeric, so it is a one character word.
            dst.value = first;
        }
    }
    return inStr;
}
So now we can use it in standard algorithms by just employing the istream_iterator
int main()
{
    std::string data = "some:word{or other";
    std::stringstream dataStream(data);

    std::copy( // Read the stream one Term at a time.
        std::istream_iterator<Term>(dataStream),
        std::istream_iterator<Term>(),
        // Note the ostream_iterator is using a std::string.
        // This works because a Term can be converted into a string.
        std::ostream_iterator<std::string>(std::cout, "\n")
    );
}
The output:
> ./a.exe
some
:
word
{
or
other
std::string const str = "some:word{or other";
std::string result;
result.reserve(str.size());
for (std::string::const_iterator it = str.begin(), end = str.end();
     it != end; ++it)
{
    if (isalnum(*it))
    {
        result.push_back(*it);
    }
    else
    {
        result.push_back(' '); result.push_back(*it); result.push_back(' ');
    }
}
Insert version for speed-up
std::string str = "some:word{or other";
for (std::string::iterator it = str.begin(), end = str.end(); it != end; ++it)
{
    if (!isalnum(*it))
    {
        it = str.insert(it, ' ') + 2;
        it = str.insert(it, ' ');
        end = str.end();
    }
}
Note that std::string::insert inserts BEFORE the iterator passed and returns an iterator to the newly inserted character. Assigning is important since the buffer may have been reallocated at another memory location (the iterators are invalidated by the insertion). Also note that you can't keep end for the whole loop, each time you insert you need to recompute it.
a more elegant way of doing this exists.
I do not know how BOOST implements that, but the traditional way is by feeding the input string character by character into an FSM which detects where tokens (words, symbols) start and end.
I can do that with two loops and find_first_of(":") and ("{")
One loop with std::find_first_of() should suffice.
Though I'm still a huge fan of FSMs for such parsing tasks.
P.S. Similar question
How about something like:
std::string::const_iterator it, end = mystring.end();
for (it = mystring.begin(); it != end; ++it) {
    if (!isalnum(*it))
        list.push_back(it);
}
This way, you'll only iterate once through the string, and isalnum from ctype.h seems to do what you want. Of course, the code above is very simplistic and incomplete and only suggests a solution.
Are you looking to tokenize the input string, ala strtok?
If so, here is a tokenizing function that you can use. It takes an input string and a string of delimiters (each char in the string is a possible delimiter), and it returns a vector of tokens. Each token is a pair holding the delimited string and the delimiter used in that case:
#include <cstdlib>
#include <vector>
#include <string>
#include <functional>
#include <iostream>
#include <algorithm>
using namespace std;

// FUNCTION    : tokenize(const string& str, const string& delims)
// PARAMETERS  : str     string to be tokenized.
//               delims  string of individual delimiter characters -- each character in the string is a delimiter
// DESCRIPTION : Tokenizes a string, much in the same way as strtok does. The input string is not modified. The
//               function is called once to tokenize a string, and all the tokens are returned at once.
// RETURNS     : Returns a vector of tokens. Each element in the vector is one token. The delimiter character is
//               not included in the string. The number of elements in the vector is N+1, where N is the number
//               of times a delimiter character is found in the string. If one token is an empty string (as with the
//               string "string1##string3", where the delimiter character is '#'), then that element in the vector
//               is an empty string.
typedef pair<char, string> token; // first = delimiter, second = data
inline vector<token> tokenize(const string& str, const string& delims, bool bCaseSensitive = false)
{
    (void)bCaseSensitive; // currently unused
    vector<token> vRet;
    // tokenize input string
    for (string::const_iterator itA = str.begin(), it = itA; it != str.end();
         it = find_first_of(++it, str.end(), delims.begin(), delims.end()))
    {
        // find end of token
        string::const_iterator itEnd = find_first_of(it + 1, str.end(), delims.begin(), delims.end());
        // add string to output
        if (it == itA) vRet.push_back(make_pair(0, string(it, itEnd)));
        else           vRet.push_back(make_pair(*it, string(it + 1, itEnd)));
    }
    return vRet;
}
int main()
{
    string input = "some:word{or other";

    typedef vector<token> tokens;
    tokens toks = tokenize(input, " :{");

    cout << "Input: '" << input << "' # Tokens: " << toks.size() << "\n";
    for (tokens::iterator it = toks.begin(); it != toks.end(); ++it)
    {
        cout << " Token : '" << it->second << "', Delimiter: '" << it->first << "'\n";
    }
    return 0;
}