How to let std::istringstream treat a given character as white space? - c++

Using std::istringstream it is easy to read words separated by white space. But to parse the following line, I need the character / to be treated like white space.
f 104/387/104 495/574/495 497/573/497
How can I read values separated by either slash or white space?

One way is to define a ctype facet that classifies / as white-space:
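#include <algorithm>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>
class my_ctype : public std::ctype<char> {
public:
    // static, so it can safely be called before the base class is constructed
    static mask const *get_table() {
        static std::vector<std::ctype<char>::mask>
            table(classic_table(), classic_table()+table_size);
        table['/'] = (mask)space;
        return &table[0];
    }
    my_ctype(size_t refs=0) : std::ctype<char>(get_table(), false, refs) { }
};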
From there, imbue the stream with a locale using that ctype facet, then read words:
int main() {
std::string input("f 104/387/104 495/574/495 497/573/497");
std::istringstream s(input);
s.imbue(std::locale(std::locale(), new my_ctype));
std::copy(std::istream_iterator<std::string>(s),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
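If you want the numbers rather than the words, the same facet trick works with int extraction. Here is a minimal sketch, assuming the line always starts with a one-word tag such as f. The names slash_is_space and parse_face_indices are made up for this example.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: the classic table, plus '/' classified as white space.
struct slash_is_space : std::ctype<char> {
    static const mask* make_table() {
        // Copy the classic "C" classification table, then mark '/' as space.
        static std::vector<mask> table(classic_table(),
                                       classic_table() + table_size);
        table['/'] |= space;
        return table.data();
    }
    explicit slash_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}
};

// Hypothetical helper: skips the leading tag and returns every index in order.
std::vector<int> parse_face_indices(const std::string& line) {
    std::istringstream in(line);
    // The locale takes ownership of the facet and deletes it when no longer used.
    in.imbue(std::locale(in.getloc(), new slash_is_space));
    std::string tag;
    in >> tag;  // assumed to be the "f" at the start of the line
    std::vector<int> indices;
    for (int value; in >> value;) indices.push_back(value);
    return indices;
}
```

Calling parse_face_indices on the line from the question should give the nine integers in the order they appear.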

If boost is available, then boost::split() would be a possible solution. Populate a std::string using std::getline() and then split the line:
#include <iostream>
#include <vector>
#include <string>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/split.hpp>
int main()
{
std::vector<std::string> tokens;
std::string line("f 104/387/104 495/574/495 497/573/497");
boost::split(tokens, line, boost::is_any_of("/ "));
for (auto& token: tokens) std::cout << token << "\n";
return 0;
}
Output:
f
104
387
104
495
574
495
497
573
497

If you know in advance where each delimiter occurs, you can call std::getline with an explicit delimiter:
std::istringstream is("f 104/387/104 495/574/495 497/573/497");
std::string f, i, j, k;
std::getline(is, f, ' ');
std::getline(is, i, '/');
std::getline(is, j, '/');
std::getline(is, k, ' ');
Alternatively, you can use formatted input and discard the slashes manually:
std::string f;
int i, j, k;
char slash;
is >> f >> i >> slash >> j >> slash >> k;
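The formatted-input variant extends naturally to a loop over every i/j/k group on the line. This is only a sketch: the Triple struct and the parse_face function are hypothetical names, and it assumes every group has exactly three integers separated by slashes.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one "a/b/c" group.
struct Triple { int a, b, c; };

// Reads the leading tag, then as many a/b/c groups as the line contains.
std::vector<Triple> parse_face(const std::string& line) {
    std::istringstream in(line);
    std::string tag;
    in >> tag;  // the leading "f"
    std::vector<Triple> groups;
    Triple t;
    char s1, s2;
    // Each extraction skips leading white space; the chars soak up the slashes.
    while (in >> t.a >> s1 >> t.b >> s2 >> t.c) {
        if (s1 != '/' || s2 != '/') break;  // stop on an unexpected separator
        groups.push_back(t);
    }
    return groups;
}
```

Compared with the fixed sequence of reads above, the loop stops on its own when the line runs out, so you don't need to know the number of groups beforehand.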

I'm sure this isn't the best way at all, but I was working on an exercise in the book Programming Principles and Practice Using C++ 2nd Ed. by Bjarne Stroustrup and I came up with a solution that might work for you. I searched around to see how others were doing it (which is how I found this thread) but I really didn't find anything.
First of all, here's the exercise from the book:
Write a function vector<string> split(const string& s, const string&
w) that returns a vector of whitespace-separated substrings from the
argument s, where whitespace is defined as "ordinary whitespace" plus
the characters in w.
Here's the solution that I came up with, which seems to work well. I tried commenting it to make it more clear. Just want to mention I'm pretty new to C++ (which is why I'm reading this book), so don't go too hard on me. :)
// split a string into its whitespace-separated substrings and store
// each string in a vector<string>. Whitespace can be defined in argument
// w as a string (e.g. ".;,?-'")
vector<string> split(const string& s, const string& w)
{
string temp{ s };
// go through each char in temp (or s)
for (char& ch : temp) {
// check if any characters in temp (s) are whitespace defined in w
for (char white : w) {
if (ch == white)
ch = ' '; // if so, replace them with a space char (' ')
}
}
vector<string> substrings;
stringstream ss{ temp };
for (string buffer; ss >> buffer;) {
substrings.push_back(buffer);
}
return substrings;
}
Then you can do something like this to use it:
cout << "Enter a string and substrings will be printed on new lines:\n";
string str;
getline(cin, str);
vector<string> substrings = split(str, ".;,?-'");
cout << "\nSubstrings:\n";
for (string s : substrings)
cout << s << '\n';
I know you aren't wanting to split strings, but this is just an example of how you can treat other characters as whitespace. Basically, I'm just replacing those characters with ' ' so they literally do become whitespace. When using that with a stream, it works pretty well. The for loop(s) might be the relevant code for your case.

Related

Reading all the words in a text file in C++

I have a large .txt file and I want to read all of the words inside it and print them on the screen. The first thing I did was to use std::getline() in this way:
std::vector<std::string> words;
std::string line;
while(std::getline(std::cin,line)){
words.push_back(line);
}
and then I printed out all the words present in the vector words. The .txt file is passed from command line as ./a.out < myTxt.txt.
The problem is that each component of the vector is a whole line, and so I am not reading each word.
The problem, I guess, is the spaces between words: how can I tell the code to ignore them? More specifically, is there any function that I can use in order to read each word from a .txt file?
UPDATE:
I'm trying to skip punctuation such as . and , but also ? ! ( ). I used find_first_of(), but my program doesn't work. Also, I don't know how to specify which characters I don't want to be read, i.e. ., ?, !, and so on:
std::vector<std::string> my_vec;
std::string line;
while(std::cin>>line){
std::size_t pos = line.find_first_of("!");
std::string line = line.substr(pos);
my_vec.push_back(line);
}
operator>> for std::string does exactly what you need:
std::vector<std::string> words;
std::string line;
while (std::cin >> line) {
words.push_back(line);
}
If you need to remove some noisy characters, e.g. ',' and '.', you can replace them with a space character first.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
int main() {
    std::vector<std::string> words;
    std::string line;
    while (getline(std::cin, line)) {
        // take unsigned char: passing a negative char to std::isalnum is undefined
        std::transform(line.begin(), line.end(), line.begin(),
            [](unsigned char c) -> char { return std::isalnum(c) ? static_cast<char>(c) : ' '; });
        std::stringstream linestream(line);
        std::string w;
        while (linestream >> w) {
            std::cout << w << "\n";
            words.push_back(w);
        }
    }
}
The getline function, as it sounds, only returns a whole line. You can split each line on spaces after reading it, or you can read word by word using operator>>:
string word;
while (cin >> word){
cout << word << "\n";
words.push_back(word);
}
Use operator>> instead of std::getline(). The operator will read individual whitespace-separated substrings for you.
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> my_vec;
std::string s;
while (std::cin >> s){
// use s as needed...
}
However, you may still end up receiving strings that have punctuation in them without any surrounding whitespace, e.g. hello,world, so you will have to manually split those strings as needed, for example:
#include <iostream>
#include <string>
#include <vector>
int main() {
    std::vector<std::string> my_vec;
    std::string s;
    while (std::cin >> s){
        std::string::size_type start = 0, pos;
        while ((pos = s.find_first_of(".,?!()", start)) != std::string::npos){
            // skip empty tokens caused by leading or adjacent punctuation
            if (pos > start)
                my_vec.push_back(s.substr(start, pos-start));
            start = s.find_first_not_of(".,?!() \t\f\r\n\v", pos+1);
        }
        if (start == 0)
            my_vec.push_back(s);
        else if (start != std::string::npos)
            my_vec.push_back(s.substr(start));
    }
}
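The same find_first_of / find_first_not_of pairing can be packaged as a small function that treats whitespace and punctuation alike. This is a sketch, not the only way to do it, and split_any is a made-up name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the non-empty runs of characters in text
// that contain none of the characters listed in delims.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims) {
    std::vector<std::string> tokens;
    // Jump to the first character that is not a delimiter.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter (or at the end of the string).
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        // Skip the run of delimiters that follows the token.
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Because each token starts at a non-delimiter, empty tokens can't occur, even when several delimiters are adjacent.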

Extract all numbers from stringstream

I want to read a string and extract all the numbers.
Input: 5a3 1f a0aaaa f1fg3
Output: 53 1 0 13
I tried this code:
string s;
getline(cin, s);
stringstream str_strm(s);
int found;
string temp;
while (!str_strm.eof()) {
str_strm >> temp;
if (stringstream(temp) >> found)
{
cout << found << endl;
}
}
but once it finds the 5 (from the example), it automatically moves on to the next string. How can I extract all numbers?
Here's a possible solution: the while loop separates the input into whitespace-delimited strings, and then the non-digits are removed from each substring.
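#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
    stringstream ss("5a3 1f a0aaaa f1fg3");
    string str;
    while (getline(ss, str, ' ') ){
        // erase-remove: drop every character that is not a digit
        str.erase(std::remove_if(str.begin(), str.end(), [](unsigned char c) { return !std::isdigit(c); }), str.end());
        cout << str << " ";
    }
}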
You could read each space-separated word, and then remove the non-digits, like this:
std::string word;
while (std::cin >> word)
{
word.erase(std::remove_if(word.begin(), word.end(),
[](unsigned char c) { return not std::isdigit(c); }),
word.end());
std::cout << word << " ";
}
For the input of 5a3 1f a0aaaa f1fg3, it prints 53 1 0 13.
This admittedly odd way of removing elements from a range is a common idiom, known as erase-remove.
You could even avoid the loop entirely if you have the input on a single line:
std::string word;
std::getline(std::cin, word);
word.erase(std::remove_if(word.begin(), word.end(),
[](unsigned char c) { return not std::isdigit(c)
and not std::isspace(c); }),
word.end());
std::cout << word;
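To make the two halves of the erase-remove idiom easier to see, here is a sketch that splits them into separate statements. The function name keep_digits_and_spaces is just for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of text with everything except
// digits and white space removed.
std::string keep_digits_and_spaces(std::string text) {
    auto unwanted = [](unsigned char c) {
        return !std::isdigit(c) && !std::isspace(c);
    };
    // Step 1: remove_if moves the characters to keep to the front and
    // returns an iterator to the new logical end. The string's size is unchanged.
    auto new_end = std::remove_if(text.begin(), text.end(), unwanted);
    // Step 2: erase actually shrinks the string, dropping the leftover tail.
    text.erase(new_end, text.end());
    return text;
}
```

For the input 5a3 1f a0aaaa f1fg3 this returns 53 1 0 13.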
Here is a very simple example. (There is an even simpler solution at the bottom of this post.)
It uses modern C++ elements and algorithms, and needs only a few lines of code.
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <algorithm>
#include <sstream>
int main() {
// Read a string from the console
if (std::string line{}; std::getline(std::cin, line)) {
// Put the complete line into a std::istringstream
std::istringstream iss{line};
// Print result
std::transform(std::istream_iterator<std::string>(iss), {}, std::ostream_iterator<std::string>(std::cout, " "),
[](const std::string& s) { return std::regex_replace(s, std::regex{ R"([^\d])" }, ""); });
}
return 0;
}
So, what's going on here? Let us look at it statement by statement. First:
if (std::string line{}; std::getline(std::cin, line)) {
This is an if statement with an initializer (available since C++17): an additional initialization statement can appear as the first part of the if. Why use it? Because it limits scope. The variable line is only used within the if statement and is not needed outside it. Functionally, it is the same as writing:
std::string line{};
if (std::getline(std::cin, line)) {
But then line would also be visible outside the if statement. Because we want to avoid polluting the enclosing scope, we use the initializer form.
Next is std::getline. It reads a complete line from the input stream, here the console (std::cin), and puts it into the string. std::getline returns a reference to the stream. The stream has an explicit conversion to bool that yields false if the operation failed (for example, at end of file). So the if statement checks whether the input operation worked. By the way, all I/O operations should be checked for failure.
Good, now we have the complete line of the user input in our variable "line".
With
std::istringstream iss{line};
we put the string into a std::istringstream. We do this because we want to make use of the C++ iostream library. The std::istringstream behaves like any other stream, for example std::cin, and you can extract whitespace-separated values from it, as in std::cin >> v1 >> v2. The disadvantage of that approach is that you need to know the number of values in advance, or use a dynamically growing container and a loop.
And this brings us to the next construct that I want to explain. You may have heard about iterators. Iterators are like pointers, and a pair of them delimits a range of elements. If you have a std::vector or any other container, you can iterate over all elements with the begin() and end() iterators without knowing how many elements it contains.
For input streams we have something similar: std::istream_iterator. It iterates over the elements in the std::istringstream and returns values of the type given in its template parameter, by repeatedly calling the extraction operator >>. Here, in our case, that is a std::string. You may now ask: until when? Where is the end? If you look at the description of constructor number 1 of std::istream_iterator, you will see that the default constructor constructs the end-of-stream iterator, and a default-constructed iterator can be created with the empty braced initializer {}. So {} is the end iterator.
If we want to read all std::strings from the std::istringstream, then we read between
std::istream_iterator<std::string>(iss) and {}. That is every string in the std::istringstream.
Good. Next, there is a similar thing for output, std::ostream_iterator. It calls the insertion operator << for every element assigned to it. We can specify which stream it should send the data to, here std::cout, and additionally a separator string, which is appended to each written value.
OK, next: std::transform. As its name says, it transforms the elements of a range, between a begin and an end iterator, into another range. So it transforms the elements read from the std::istringstream as shown above and sends them to the std::ostream_iterator. We read the source value, transform it, then write it.
But how to transform? For the transformation we pass a simple lambda that calls std::regex_replace. This is a standard function that replaces parts of a string with other string data. What gets replaced is specified by a std::regex, a pattern written in a small meta-language that matches specified parts of a string. In our case we use [^\d], which means: not a digit.
Taken together, this explains the solution above.
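If you would rather collect the results than print them, the same building blocks can be arranged as a function. This is a sketch under a couple of assumptions: the name digits_per_word is hypothetical, the pattern is written as [^0-9] instead of [^\d], and the regex is built once rather than on every call of the lambda.

```cpp
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for each whitespace-separated word in line,
// returns the word with all non-digit characters removed.
std::vector<std::string> digits_per_word(const std::string& line) {
    // Constructing a std::regex is relatively expensive, so do it only once.
    static const std::regex non_digit{"[^0-9]"};
    std::istringstream iss{line};
    std::vector<std::string> result;
    // A default-constructed istream_iterator marks the end of the stream.
    std::istream_iterator<std::string> it{iss}, end{};
    for (; it != end; ++it) {
        result.push_back(std::regex_replace(*it, non_digit, ""));
    }
    return result;
}
```

Note that a word without any digits produces an empty string in the result, just as the std::transform version prints an empty field for it.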
All this can be reduced further to two statements:
#include <iostream>
#include <string>
#include <regex>
int main() {
// Read a string from the console
if (std::string line{}; std::getline(std::cin, line)) {
// Remove unnecessary characters
std::cout << std::regex_replace(line, std::regex{ R"([^\d ])" }, "") << "\n";
}
return 0;
}
I cannot think of a simpler solution.
In case of questions, please ask.
You can use get from istream to get each character, including whitespace, and then isdigit to check for a digit character...
#include <iostream>
#include <cctype>
int main()
{
char ch;
std::cin.get(ch);
while (!std::cin.eof())
{
if (isdigit(ch) || ch == ' ' || ch == '\n')
{
std::cout << ch;
}
std::cin.get(ch);
}
return 0;
}
However, you can avoid testing std::cin.eof() in the while condition by using the result of get() directly, as follows...
#include <iostream>
#include <cctype>
int main()
{
char ch;
while (std::cin.get(ch))
{
if (isdigit(ch) || ch == ' ' || ch == '\n')
{
std::cout << ch;
}
}
return 0;
}
Regular expression pattern matching can be used to find all the digits in the input string.
Here is an example program to find the digits:
// C++ program to find all digits in a string
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string inputString;
cout << "Enter the input string: ";
getline(cin, inputString);
cout << "Digits found: ";
// Define the regular expression matcher and pattern
smatch matcher;
regex pattern("[[:digit:]]");
while (regex_search(inputString, matcher, pattern)) {
// Show the match
cout << matcher.str(0);
// Continue searching the rest of the string
inputString = matcher.suffix().str();
}
return 0;
}
Output:
Enter the input string: sdfh354 eutyt;ljkn756897490uiotureu 587689jkgf 90
Digits found: 35475689749058768990
Here is another approach of finding the numbers in the string, without using the regular expression pattern matching:
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string rawInput;
cout <<"Enter input string: ";
getline(cin, rawInput);
// Get all words from the input string
stringstream allWords(rawInput);
// Find and print digits in each word
string word;
while(allWords >> word) {
for(int i = 0; word[i]; i++) {
// Print only the numbers in the word
if(isdigit(word[i])) {
cout<<word[i];
}
}
cout<<" ";
}
cout<<"\n";
return 0;
}
Output:
Enter input string: ghjg45 jsdfj 897897 343yut45 90
45 897897 34345 90
How can I extract all numbers?
When you KNOW that the input numbers are all hex values ... (and how many)
stringstream ss ("5a3 1f a0aaaa f1fg3");
for (int i=0; i<4; ++i)
{
int k;
ss >> hex >> k;
cout << k << endl;
}
with output
1443
31
10529450
3871

Split string by the first N occurrences of a delimiter using boost::split

I know that maybe this is a silly question, but I didn't find an answer yet. Let's suppose I have the following string:
std::string str("1,2,3,4,5,6,7,8,9,10");
I would like to split it using the comma as a delimiter, but only at the first N occurrences. For instance, for N = 2 I should obtain as a result:
{ "1", "2", "3,4,5,6,7,8,9,10" }.
Is it possible to do it with boost::split? I have always used it in the following form:
std::vector<std::string> values;
boost::split(values, str, boost::is_any_of(","), boost::token_compress_on);
which splits the string on every occurrence of the delimiter.
In case it is not possible, can someone suggest an efficient way to do it?
I would like to avoid dealing with regexes, i.e. I would like to write a function like:
std::vector<std::string> tokenize(const std::string& to_split, const std::string& delimiter, int occurrences)
Thanks in advance for the help.
You can use:
find_nth(str, ",", 1);
from <boost/algorithm/string/find.hpp> to find the 2nd (or nth) occurrence of the character of interest (the index is zero-based).
Then use the traditional substr() at that position.
Then tokenize the first part of the string with boost::split.
And add the last part to the vector.
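If you'd rather not pull in Boost for this, the same steps can be sketched with plain std::string::find. Note this swaps find_nth and boost::split for a simple loop. The signature is the one from the question, and the behaviour for an empty delimiter (return the whole string) is my assumption.

```cpp
#include <string>
#include <vector>

// Splits to_split at the first `occurrences` occurrences of delimiter.
// Whatever follows the last split point is returned as the final element.
std::vector<std::string> tokenize(const std::string& to_split,
                                  const std::string& delimiter,
                                  int occurrences) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    // Assumption: an empty delimiter means "do not split at all".
    while (occurrences-- > 0 && !delimiter.empty()) {
        std::string::size_type pos = to_split.find(delimiter, start);
        if (pos == std::string::npos) break;  // fewer delimiters than requested
        parts.push_back(to_split.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    parts.push_back(to_split.substr(start));  // the unsplit remainder
    return parts;
}
```

With str from the question, tokenize(str, ",", 2) gives { "1", "2", "3,4,5,6,7,8,9,10" }.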
You can make use of stringstream and a modified locale:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
struct csv_whitespace : std::ctype<char> {
static const mask* make_table() {
static std::vector<mask> v(classic_table(),
classic_table() + table_size);
v[','] |= space;
v[' '] &= ~space;
return &v[0];
}
csv_whitespace(std::size_t refs = 0)
: std::ctype<char>{ make_table(), false, refs } { }
};
int main() {
std::string str("1,2,3,4,5,6,7,8,9,10");
std::stringstream ss(str);
auto loc = ss.getloc();
ss.imbue(std::locale(loc, new csv_whitespace));
std::size_t N = 3;
while(N--) {
std::string a;
ss >> a;
std::cout << a << '\n';
}
ss.ignore();
ss.imbue(loc);
std::string rest;
ss >> rest;
std::cout << "Rest: " << rest << "\n";
}
The trick here is to treat commas as whitespace, which can be done in the ctype<> facet.

Read comma separated values with stray whitespaces from a textfile in c++

I have a file that contains string,int,int values in multiple lines.
Delhi,12,13
Mumbai,100 , 101
Kolkata,11, 12
The values are separated by commas, but there can be stray whitespace in between. My current code is this:
#include<cstdio>
#include<iostream>
#include<string>
using namespace std;
int main()
{
FILE *f = fopen("input.txt","r");
int lines = 0;
char c = getc(f);
while(c != EOF)
{
if(c == '\n')
{
lines++;
}
c = getc(f);
}
lines++;
string arr[lines];
int t1[lines];
int t2[lines];
char s1[100],s2[100],s3[100];
int x,y;
fclose(f);
f = fopen("input.txt","r");
while (fscanf(f,"%99[^,],%99[^,],%99[^,]", s1, s1, s2)==3)
{
cout << s1 << s2 << s3 << endl;
}
}
To begin with, this doesn't read the values properly or display them on the screen. How do I read the string and the integer values here (which may have stray whitespace) and store them into arrays (three arrays, to be precise)?
Try doing this:
fscanf(f,"%99[^, ]%*[ ,]%d%*[ ,]%d ", s1, &x, &y);
%99[^, ] => reads everything except , and <space> (at most 99 characters, so s1 cannot overflow) and stores it in s1
%*[ ,] => reads , and <space> but does not store them anywhere (the * ensures that)
%d => stores the number
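To try the format string without a file, you can run the same conversions through std::sscanf on a single line. This is just a sketch. Record and parse_record are hypothetical names, and the trailing space of the fscanf format is dropped because there is no newline to skip inside one line.

```cpp
#include <cstdio>

// Hypothetical holder for one "name,int,int" record.
struct Record {
    char name[100];
    int first;
    int second;
};

// Returns true when all three fields were converted.
bool parse_record(const char* line, Record& out) {
    // %99[^, ]  the name, up to the first comma or space (at most 99 chars)
    // %*[ ,]    a run of commas and spaces, read and discarded
    // %d        an integer
    return std::sscanf(line, "%99[^, ]%*[ ,]%d%*[ ,]%d",
                       out.name, &out.first, &out.second) == 3;
}
```

One caveat with this format: because the name stops at the first space, a name that itself contains a space would not parse correctly.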
The problem is on this line:
while (fscanf(f,"%99[^,],%99[^,],%99[^,]", s1, s1, s2)==3)
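The third %99[^,] scans up to the next comma character ',', which occurs on the following line. Replace it with %99[^\n] to fix this problem. Also pass three distinct buffers (the original passes s1 twice), and add a leading space to the format so that the newline left over from the previous line is skipped:
while (fscanf(f," %99[^,],%99[^,],%99[^\n]", s1, s2, s3)==3)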
Why are you using FILE* and friends in C++?
The other answers specify the problem with your code, so I'm writing this answer to show you how to improve it.
std::ifstream file("input.txt");
std::string name, value0, value1;
while (std::getline(file, name, ',')) {
    // Get the value strings from the stream.
    std::getline(file, value0, ',');
    // The last field ends at the newline, not at a comma.
    std::getline(file, value1);
    // These will throw an exception when given invalid input.
    int v0 = std::stoi(value0);
    int v1 = std::stoi(value1);
    // Do stuff with the values
}
std::getline extracts a string from a stream up to a given delimiter (a newline by default). It does not strip whitespace itself, but std::stoi skips leading whitespace and stops at the first character that is not part of the number, so stray spaces around the integers do no harm. The return value of std::getline is the stream passed in, and it has an operator bool() that allows us to use it as a boolean expression. The value becomes false once an extraction fails, for example at end of file, or when the stream is in some other erroneous state.
Note that the above should be similar in behavior to:
while (file) {
std::getline(file, name, ',');
// ...
}
I'm pretty sure this must be a whole lot more readable than a string like "%99[^,],%99[^,],%99[^,]".
Cheers~
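A slightly more defensive variant of the std::getline approach is to read one whole line at a time and then split that line, so a malformed row cannot bleed into the next one. This is only a sketch. City and read_cities are made-up names, and skipping bad rows silently is my assumption about what you'd want.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for one "name,int,int" row.
struct City {
    std::string name;
    int first;
    int second;
};

std::vector<City> read_cities(std::istream& in) {
    std::vector<City> cities;
    std::string line;
    while (std::getline(in, line)) {           // one row per line
        std::istringstream fields(line);
        std::string name, a, b;
        if (std::getline(fields, name, ',') &&
            std::getline(fields, a, ',') &&
            std::getline(fields, b)) {         // last field: rest of the line
            try {
                // std::stoi skips leading spaces and ignores trailing ones.
                cities.push_back({name, std::stoi(a), std::stoi(b)});
            } catch (const std::exception&) {
                // Assumption: rows with non-numeric values are skipped.
            }
        }
    }
    return cities;
}
```

It takes any std::istream, so you can pass a std::ifstream for input.txt or a std::istringstream when testing.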

c++ inputing a string - not with getline() but with cin

Inputting a string with spaces!
Here is what I was thinking:
string name;
std::cout << "Please enter your full name: ";
std::cin >> std::noskipws;
while (std::cin >> name >> std::ws) {
full_name += name + " ";
}
Say your name was Bill Billy Bobby Bronson Billson.
or maybe something like adding:
if (name == "\n")
break;
With getline(), it's one statement. However, I don't want to use getline() for research reasons.
Can it be done?
Update:
If I try my code I get an infinite loop no matter what I change.
I don't see why you'd really want to do so, but yes, this is possible.
operator>> for std::string reads input characters until it encounters a white-space character. A stream has a ctype facet that it uses to determine whether a character is white space or not.
In this case, you want a ctype facet that only classifies \n as white space.
struct line_reader: std::ctype<char> {
line_reader(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
You imbue your input stream with an instance of a locale that includes that ctype facet:
int main() {
std::vector<std::string> lines;
// Tell the stream to use our facet, so only '\n' is treated as a space.
std::cin.imbue(std::locale(std::locale(), new line_reader()));
// to keep things at least a little interesting, we'll copy lines from input
// to output if (and only if) they contain at least one space character:
std::copy_if(std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"),
[](std::string const &s) {
return s.find(' ') != std::string::npos;
});
}
Here I've used std::istream_iterator, which uses the extraction operator for the specified type (std::string in this case) to read the data.
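If you want the lines back in a container instead of copying them to std::cout, the same idea can be wrapped in a function. Treat this as a sketch: newline_only and read_lines are hypothetical names, and restoring the previous locale afterwards is a design choice of mine, not something the approach requires.

```cpp
#include <istream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: only '\n' counts as white space.
struct newline_only : std::ctype<char> {
    static const mask* make_table() {
        // Start with an all-zero table, so no character has any class.
        static std::vector<mask> table(table_size, mask());
        table['\n'] = space;
        return table.data();
    }
    explicit newline_only(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}
};

// Reads every non-empty line from in using operator>> only.
std::vector<std::string> read_lines(std::istream& in) {
    // imbue returns the locale that was installed before.
    const std::locale previous =
        in.imbue(std::locale(in.getloc(), new newline_only));
    std::istream_iterator<std::string> first(in), last;
    std::vector<std::string> lines(first, last);
    in.imbue(previous);  // put the original locale back
    return lines;
}
```

One difference from std::getline: since operator>> skips leading white space, consecutive newlines collapse and empty lines never show up in the result.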