How to take formatted input from ifstream - c++

I have a text file with a set of names formatted in the following way:
"MARY","PATRICIA","LINDA","BARBARA","ELIZABETH"
and so on. I want to open the file using ifstream and read the names into a string array (without quotes, commas). I somehow managed to do it by checking the input stream character by character. Is there an easier way to take this formatted input?
EDIT:
I heard that you can use something like
fscanf (f, "\"%[a-zA-Z]\",", str);
in C, but is there such a method for ifstream?

That input should be parsable with std::getline or std::regex_token_iterator (though the latter is shooting sparrows with artillery).
Examples:
Regex
Quick and dirty, yet heavyweight solution (using boost so most compilers eat this)
#include <boost/regex.hpp>
#include <iostream>
int main() {
const std::string s = "\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"";
boost::regex re("\"(.*?)\"");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, 1), end;
it != end; ++it)
{
std::cout << *it << std::endl;
}
}
Output:
MARY
PATRICIA
LINDA
BARBARA
ELIZABETH
Alternatively, you can use
boost::regex re(",");
for (boost::sregex_token_iterator it(s.begin(), s.end(), re, -1), end;
to let it split along commas (note also the -1) or other regexes.
getline
getline solution (whitespace allowed)
#include <sstream>
#include <iostream>
int main() {
std::stringstream ss;
ss.str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");
std::string curr;
while (std::getline (ss, curr, ',')) {
size_t from = 1 + curr.find_first_of ('"'),
to = curr.find_last_of ('"');
std::cout << curr.substr (from, to-from) << std::endl;
}
}
Output is the same.
getline
getline solution (whitespace not allowed)
The loop becomes almost trivial:
std::string curr;
while (std::getline (ss, curr, ',')) {
std::cout << curr.substr (1, curr.length()-2) << std::endl;
}
homebrew solution
Least wasteful w.r.t. performance (especially when you wouldn't store those strings, but iterators or indices instead)
#include <iostream>
int main() {
const std::string str ("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\",\"ELIZABETH\"");
size_t i = 0;
while (i != std::string::npos) {
size_t begin = str.find ('"', i) + 1, // one behind initial '"'
end = str.find ('"', begin),
comma = str.find (',', end);
i = comma;
std::cout << str.substr(begin, end-begin) << std::endl;
}
}

As far as I know, there is no tokenizer in the STL. But if you are willing to use boost, there's a very good tokenizer class there. Other than that, character by character is your best C++ way of addressing it (unless you are willing to go the C route, and use strtok_t on your raw char * strings).

A simple tokenizer should do the trick; no need for something heavy-weight like regular expressions. C++ doesn't have a built-in one, but it's easy enough to write. Here's one which I myself stole off the internet so long ago I don't even remember who wrote it, so apologies for the blatant plagiarism:
#include <vector>
#include <string>
std::vector<std::string>
tokenize(const std::string & str, const std::string & delimiters)
{
std::vector<std::string> tokens;
// Skip delimiters at beginning.
std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
// Find first "non-delimiter".
std::string::size_type pos = str.find_first_of(delimiters, lastPos);
while (std::string::npos != pos || std::string::npos != lastPos)
{
// Found a token, add it to the vector.
tokens.push_back(str.substr(lastPos, pos - lastPos));
// Skip delimiters. Note the "not_of"
lastPos = str.find_first_not_of(delimiters, pos);
// Find next "non-delimiter"
pos = str.find_first_of(delimiters, lastPos);
}
return tokens;
}
Usage: std::vector<std::string> words = tokenize(line, ",");

Actually, because I was interested, I worked out how to do this using Boost.Spirit.Qi:
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace boost::spirit::qi;
int main() {
// our test-string
std::string data("\"MARY\",\"PATRICIA\",\"LINDA\",\"BARBARA\"");
// this is where we will store the names
std::vector<std::string> names;
// parse the string
phrase_parse(data.begin(), data.end(),
( lexeme['"' >> +(char_ - '"') >> '"'] % ',' ),
space, names);
// print what we have parsed
std::copy(names.begin(), names.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
To check if an error occurred during parsing, simply store the iterators over the string in variables, and compare them afterwards. If they are equal, the whole string was matched, if not, the begin-iterator will point to the error site.

Related

Reading all the words in a text file in C++

I have a large .txt file and I want to read all of the words inside it and print them on the screen. The first thing I did was to use std::getline() in this way:
std::vector<std::string> words;
std::string line;
while(std::getline(std::cin,line)){
words.push_back(line);
}
and then I printed out all the words present in the vector words. The .txt file is passed from command line as ./a.out < myTxt.txt.
The problem is that each component of the vector is a whole line, and so I am not reading each word.
The problem, I guess, is the spaces between words: how can I tell the code to ignore them? More specifically, is there any function that I can use in order to read each word from a .txt file?
UPDATE:
I'm trying to avoid all the commas ., but also ? ! (). I used find_first_of(), but my program doesn't work. Also, I don't know how to set what are the characters I don't want to be read, i.e. ., ?, !, and so on
std::vector<std::string> my_vec;
std::string line;
while(std::cin>>line){
std::size_t pos = line.find_first_of("!");
std::string line = line.substr(pos);
my_vec.push_back(line);
}
'>>' operator of type string exactly fills your requirements.
std::vector<std::string> words;
std::string line;
while (std::cin >> line) {
words.push_back(line);
}
If you need remove some noisy characters, e.g. ',','.', you can replace them with space character first.
#include <iostream>
#include <sstream>
#include <vector>
#include <algorithm>
int main() {
std::vector<std::string> words;
std::string line;
while (getline(std::cin, line)) {
std::transform(line.begin(), line.end(), line.begin(),
[](char c) { return std::isalnum(c) ? c : ' '; });
std::stringstream linestream(line);
std::string w;
while (linestream >> w) {
std::cout << w << "\n";
words.push_back(w);
}
}
}
cppreference
The getline function, as it sounds, only returns a whole line. You can split each line on spaces after reading it, or you can read word by word using operator>>:
string word;
while (cin >> word){
cout << word << "\n";
words.push_back(word);
}
Use operator>> instead of std::getline(). The operator will read individual whitespace-separated substrings for you.
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> my_vec;
std::string s;
while (std::cin >> s){
// use s as needed...
}
However, you may still end up receiving strings that have punctuation in them without any surrounding whitespace, ie hello,world, so you will have to manually split those strings as needed, eg:
#include <iostream>
#include <string>
#include <vector>
#include <cctype>
std::vector<std::string> my_vec;
std::string s;
while (std::cin >> s){
std::string::size_type start = 0, pos;
while ((pos = s.find_first_of(".,?!()", start)) != std::string::npos){
my_vec.push_back(s.substr(start, pos-start));
start = s.find_first_not_of(".,?!() \t\f\r\n\v", pos+1);
}
if (start == 0)
my_vec.push_back(s);
else if (start != std::string::npos)
my_vec.push_back(s.substr(start));
}

Using strtok() to parse text file

I've been trying to make a program that parses a text file and feeds 6 pieces of information into an array of objects. The problem for me is that I'm having issues figuring out how to process the text file. I was told that the first step I needed to do was to write some code that counted how many letters long each entry was. The txt file is in this format:
"thing1","thing2","thing3","thing4","thing5","thing6"
This is the current version of my code:
#include<iostream>
#include<string>
#include<fstream>
#include<cstring>
using namespace std;
int main()
{
ifstream myFile("Book List.txt");
while(myFile.good())
{
string line;
getline(myFile, line);
char *sArr = new char[line.length() + 1];
strcpy(sArr, line.c_str());
char *sPtr;
sPtr = strtok(sArr, " ");
while(sPtr != NULL)
{
cout << strlen(sPtr) << " ";
sPtr = strtok(NULL, " ");
}
cout << endl;
}
myFile.close();
return 0;
}
So there are two things making it hard for me right now.
1) How do I deal with the delimiters?
2) How do I deal with "skipping" the first quotation mark in each line?
Read in a string instead of a c-style string. This means that you can use the handy std methods.
The std::string::find() method should help you out with finding each thing that you want to parse.
http://www.cplusplus.com/reference/string/string/find/
You can use this to find all the commas, which will give you the starts of all the things.
Then you can use std::string::substr() to cut up the string into each piece.
http://www.cplusplus.com/reference/string/string/substr/
You can manage to get rid of the quotation marks by passing in 1 more than the start and 1 less than the length of the thing, you can also use
If you have to use strtok then this code snippet should give enough to modify your program to parse your data:
#include <cstdio>
#include <cstring>
int main ()
{
char str[] ="\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str,"\",");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, ",\"");
}
return 0;
}
If you do not have to use strtok then you should use std::string as others have advised. Using std::string and std::istringstream:
#include <string>
#include <sstream>
#include <vector>
#include <iostream>
int main ()
{
std::string str2( "\"thing1\",\"thing2\",\"thing3\",\"thing4\",\"thing5\"" ) ;
std::istringstream is(str2);
std::string part;
while (getline(is, part, ','))
std::cout << part.substr(1,part.length()-2) << std::endl;
return 0;
}
For starters, don't use strtok if you can avoid it (and you easily can here - and you can even avoid using the find series of functions as well).
If you want to read in the whole line and then parse it:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
// defines a new ctype that treats commas as whitespace
struct csv_reader : std::ctype<char>
{
csv_reader() : std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
rc[','] = std::ctype_base::space;
return &rc[0];
}
};
int main()
{
std::ifstream fin("yourFile.txt");
std::string line;
csv_reader csv;
std::vector<std::vector<std::string>> values;
while (std::getline(fin, line))
{
istringstream iss(line);
iss.imbue(std::locale(std::locale(), csv));
std::vector<std::string> vec;
std::copy(std::istream_iterator<std::string>(iss), std::istream_iterator<std::string>(), std::back_inserter(vec));
values.push_back(vec);
}
// values now contains a vector for each line that has the strings split by their commas
fin.close();
return 0;
}
That answers your first question. For your second, you can skip all the quotation marks by adding them to the rc mask (also treating them as whitespace) or you can strip them out afterwards (either directly or by using a transform):
std::transform(vec.begin(), vec.end(), vec.begin(), [](std::string& s)
{
std::string::iterator pend = std::remove_if(s.begin(), s.end(), [](char c)
{
return c == '"';
});
s.erase(pend, s.end());
});

Splitting std::string and inserting into a std::set

As per request of the fantastic fellas over at the C++ chat lounge, what is a good way to break down a file (which in my case contains a string with roughly 100 lines, and about 10 words in each line) and insert all these words into a std::set?
The easiest way to construct any container from a source that holds a series of that element, is to use the constructor that takes a pair of iterators. Use istream_iterator to iterate over a stream.
#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
using namespace std;
int main()
{
//I create an iterator that retrieves `string` objects from `cin`
auto begin = istream_iterator<string>(cin);
//I create an iterator that represents the end of a stream
auto end = istream_iterator<string>();
//and iterate over the file, and copy those elements into my `set`
set<string> myset(begin, end);
//this line copies the elements in the set to `cout`
//I have this to verify that I did it all right
copy(myset.begin(), myset.end(), ostream_iterator<string>(cout, "\n"));
return 0;
}
http://ideone.com/iz1q0
Assuming you've read your file into a string, boost::split will do the trick:
#include <set>
#include <boost/foreach.hpp>
#include <boost/algorithm/string.hpp>
std::string astring = "abc 123 abc 123\ndef 456 def 456"; // your string
std::set<std::string> tokens; // this will receive the words
boost::split(tokens, astring, boost::is_any_of("\n ")); // split on space & newline
// Print the individual words
BOOST_FOREACH(std::string token, tokens){
std::cout << "\n" << token << std::endl;
}
Lists or Vectors can be used instead of a Set if necessary.
Also note this is almost a dupe of:
Split a string in C++?
#include <set>
#include <iostream>
#include <string>
int main()
{
std::string temp, mystring;
std::set<std::string> myset;
while(std::getline(std::cin, temp))
mystring += temp + ' ';
temp = "";
for (size_t i = 0; i < mystring.length(); i++)
{
if (mystring.at(i) == ' ' || mystring.at(i) == '\n' || mystring.at(i) == '\t')
{
myset.insert(temp);
temp = "";
}
else
{
temp.push_back(mystring.at(i));
}
}
if (temp != " " || temp != "\n" || temp != "\t")
myset.insert(temp);
for (std::set<std::string>::iterator i = myset.begin(); i != myset.end(); i++)
{
std::cout << *i << std::endl;
}
return 0;
}
Let's start at the top. First off, you need a few variables to work with. temp is just a placeholder for the string while you build it from each character in the string you want to parse. mystring is the string you are looking to split up and myset is where you will be sticking the split strings.
So then we read the file (input through < piping) and insert the contents into mystring.
Now we want to iterate down the length of the string, searching for spaces, newlines, or tabs to split the string up with. If we find one of those characters, then we need to insert the string into the set, and empty our placeholder string, otherwise, we add the character to the placeholder, which will build up the string. Once we finish, we need to add the last string to the set.
Finally, we iterate down the set, and print each string, which is simply for verification, but could be useful otherwise.
Edit: A significant improvement on my code provided by Loki Astari in a comment which I thought should be integrated into the answer:
#include <set>
#include <iostream>
#include <string>
int main()
{
std::set<std::string> myset;
std::string word;
while(std::cin >> word)
{
myset.insert(std::move(word));
}
for(std::set<std::string>::const_iterator it=myset.begin(); it!=myset.end(); ++it)
std::cout << *it << '\n';
}

CString Parsing Carriage Returns

Let's say I have a string that has multiple carriage returns in it, i.e:
394968686
100630382
395950966
335666021
I'm still pretty amateur hour with C++, would anyone be willing to show me how you go about: parsing through each "line" in the string ? So I can do something with it later (add the desired line to a list). I'm guessing using Find("\n") in a loop?
Thanks guys.
while (!str.IsEmpty())
{
CString one_line = str.SpanExcluding(_T("\r\n"));
// do something with one_line
str = str.Right(str.GetLength() - one_line.GetLength()).TrimLeft(_T("\r\n"));
}
Blank lines will be eliminated with this code, but that's easily corrected if necessary.
You could try it using stringstream. Notice that you can overload the getline method to use any delimeter you want.
string line;
stringstream ss;
ss << yourstring;
while ( getline(ss, line, '\n') )
{
cout << line << endl;
}
Alternatively you could use the boost library's tokenizer class.
You can use stringstream class in C++.
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
int main()
{
string str = "\
394968686\
100630382\
395950966\
335666021";
stringstream ss(str);
vector<string> v;
string token;
// get line by line
while (ss >> token)
{
// insert current line into a std::vector
v.push_back(token);
// print out current line
cout << token << endl;
}
}
Output of the program above:
394968686
100630382
395950966
335666021
Note that no whitespace will be included in the parsed token, with the use of operator>>. Please refer to comments below.
If your string is stored in a c-style char* or std::string then you can simply search for \n.
std::string s;
size_t pos = s.find('\n');
You can use string::substr() to get the substring and store it in a list. Pseudo code,
std::string s = " .... ";
for(size_t pos, begin = 0;
string::npos != (pos = s.find('\n'));
begin = ++ pos)
{
list.push_back(s.substr(begin, pos));
}

Splitting a C++ std::string using tokens, e.g. ";" [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to split a string in C++?
Best way to split a string in C++? The string can be assumed to be composed of words separated by ;
From our guide lines point of view C string functions are not allowed and also Boost is also not allowed to use because of security conecerns open source is not allowed.
The best solution I have right now is:
string str("denmark;sweden;india;us");
Above str should be stored in vector as strings. how can we achieve this?
Thanks for inputs.
I find std::getline() is often the simplest. The optional delimiter parameter means it's not just for reading "lines":
#include <sstream>
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<string> strings;
istringstream f("denmark;sweden;india;us");
string s;
while (getline(f, s, ';')) {
cout << s << endl;
strings.push_back(s);
}
}
You could use a string stream and read the elements into the vector.
Here are many different examples...
A copy of one of the examples:
std::vector<std::string> split(const std::string& s, char seperator)
{
std::vector<std::string> output;
std::string::size_type prev_pos = 0, pos = 0;
while((pos = s.find(seperator, pos)) != std::string::npos)
{
std::string substring( s.substr(prev_pos, pos-prev_pos) );
output.push_back(substring);
prev_pos = ++pos;
}
output.push_back(s.substr(prev_pos, pos-prev_pos)); // Last word
return output;
}
There are several libraries available solving this problem, but the simplest is probably to use Boost Tokenizer:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/foreach.hpp>
typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
std::string str("denmark;sweden;india;us");
boost::char_separator<char> sep(";");
tokenizer tokens(str, sep);
BOOST_FOREACH(std::string const& token, tokens)
{
std::cout << "<" << *tok_iter << "> " << "\n";
}