I have a string str ( "1 + 2 = 3" ). I want to obtain the individual numbers of the string in their decimal values( not ASCII ). I have tried atoi and c_str(). But both them require the entire string to consist of only numbers. I am writing my code in C++.
Any help would be great.
My challenge is to evaluate a prefix expression. I am reading from a file where each line contains a prefix expression. My code snippet to tokenize and and store the variables is as shown below. Each line of the file contains numbers and operators(+,-,*) which are separated by a space.
Ex - line = ( * + 2 3 4);
ifstream file;
string line;
file.open(argv[1]);
while(!file.eof())
{
getline(file,line);
if(line.length()==0)
continue;
else
{
vector<int> vec;
string delimiters = " ";
size_t current;
size_t next = -1;
do
{
current = next + 1;
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
vec.push_back(atoi((line.substr( current, next - current )).c_str()));
}while (next != string::npos);
cout << vec[0] << endl;
}
}
file.close();
In this case vec[0] prints 50 not 2.
You need to learn to delimit a string. Your delimiting characters would be mathematical operators (ie:
C: creating array of strings from delimited source string
http://www.gnu.org/software/libc/manual/html_node/Finding-Tokens-in-a-String.html
In the case of the second link, you would do something like:
const char delimiters[] = "+-=";
With this knowledge, you can create an array of strings, and call atoi() on each string to get the numeric equivalent. Then you can use the address (array index) of each delimiter to determine which operator is there.
For just things like addition and subtraction, this will be dead simple. If you want order of operations and multiplication, parentheses, etc, your process flow logic will be more complicated.
For a more in-depth example, please see this final link. A simple command-line calculator in C. That should make it crystal clear.
http://stevehanov.ca/blog/index.php?id=26
You will not fall into your if, since your next position will be at a delimiter.
string delimiters = " ";
...
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
...
Since your delimiters consist of " ", then line[next] will be a space character.
From the description of your problem, you are missing code that will save away your operators. There is no code to attempt to find the operators.
You don't have to assume ASCII for testing for a digit. You can use is_digit() for example, or you can compare against '9' and '0'.
When you print your vector element, you may be accessing the vector inappropriately, because no item may have ever been inserted into the array.
Don't use fin.eof() to control a loop. That function is only useful after a read has failed.
There are a number of ways to get ints from a std::string, I'm choosing std::stoi() from the C++11 standard in this case.
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
typedef std::vector<int> ints;
bool is_known_operator(std::string const& token)
{
static char const* tokens[] = {"*", "/", "+", "-"};
return std::find(std::begin(tokens), std::end(tokens), token) != std::end(tokens);
}
ints tokenise(std::string const& line)
{
ints vec;
std::string token;
std::istringstream iss(line);
while (iss >> token)
{
if (is_known_operator(token))
{
std::cout << "Handle operator [" << token << "]" << std::endl;
}
else
{
try
{
auto number = std::stoi(token);
vec.push_back(number);
}
catch (const std::invalid_argument&)
{
std::cerr << "Unexpected item in the bagging area ["
<< token << "]" << std::endl;
}
}
}
return vec;
}
int main(int, const char *argv[])
{
std::ifstream file(argv[1]);
std::string line;
ints vec;
while (std::getline(file, line))
{
vec = tokenise(line);
}
std::cout << "The following " << vec.size() << " numbers were read:\n";
std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, "\n"));
}
Related
I would like to extract several sets of numbers from a string and paste them into another string accordingly.
Let's say there is a dialog and we get the WndCaption as String A (input string).
String A: Voltage: 2.0V, Current:0.4A, Resistance: 5.0Ω, Power: 1.5W.
String A is a dynamic input, it depends on the WndCaption of the dialog. For example, String A also can be The apple is inside column: 12, row: 3, box: 5.
String B (input reference string) is exactly the same as String A except the numbers are replaced by delimiter. It is used to extract the numbers from String A.
String B: Voltage: %fV, Current:%fA, Resistance: %fΩ, Power: %fW.
Then this is String C (output reference string),
String C: The answers are %fV; %fA; %fΩ; %fW.
String B and String C are paired together that get from a database (.txt file) using std::vector<>::data.
Question is: how can I extract that 4 sets of number and paste them into String C and get the final output as The answers are 2.0V; 0.4A; 5.0Ω; 1.5W.?
I tried to implement the split and merge method but it seems not possible for this situation. Any idea?
I've impelemented code below as an example what can be done.
I didn't understand what exact steps to achieve your task, but I understood that you need two things - first finding and extracting matches of some substring inside input string using regular expressions or something, second compose new string by putting inside some substrings found on first stage.
In my code below I implemented both stages - first extracting substrings using regular expression, second composing new string.
I used functions and objects from standard C++ module std::regex. Also I coded a special helper function string_format(format, args...) to be able to do formatting of composed result, formatting is achieved through described here formatting specifiers.
You may achieve your goal by repeating next code few times by first extracting necessary substring and the composing result formatted string.
Next code can be also run online here! or here!
#include <regex>
#include <string>
#include <regex>
#include <iostream>
#include <stdexcept>
#include <memory>
using namespace std;
// Possible formatting arguments are described here https://en.cppreference.com/w/cpp/io/c/fprintf
template<typename ... Args>
std::string string_format( const std::string& format, Args ... args )
{
size_t size = snprintf( nullptr, 0, format.c_str(), args ... ) + 1; // Extra space for '\0'
if( size <= 0 ){ throw std::runtime_error( "Error during formatting." ); }
std::unique_ptr<char[]> buf( new char[ size ] );
snprintf( buf.get(), size, format.c_str(), args ... );
return std::string( buf.get(), buf.get() + size - 1 ); // We don't want the '\0' inside
}
int main() {
try {
string str = "abc12345DEF xyz12ABcd";
cout << "Input string [" << str << "]." << endl;
string sre0 = "\\d+[A-Z]+";
// https://en.cppreference.com/w/cpp/header/regex
std::regex re0(sre0);
vector<string> elems;
std::sregex_token_iterator iter(str.begin(), str.end(), re0, 0);
std::sregex_token_iterator end;
while (iter != end) {
elems.push_back(*iter);
++iter;
cout << "matched [" << elems.back() << "] " << endl;
}
string fmt = "New String part0 \"%s\" and part1 \"%s\" and some number %d.";
string formatted = string_format(fmt.c_str(), elems.at(0).c_str(), elems.at(1).c_str(), 987);
cout << "Formatted [" << formatted << "]." << endl;
return 0;
} catch (exception const & ex) {
cout << "Exception: " << ex.what() << endl;
return -1;
}
}
Code output:
Input string [abc12345DEF xyz12ABcd].
matched [12345DEF]
matched [12AB]
Formatted [New String part0 "12345DEF" and part1 "12AB" and some number 987.].
You should definitely use regex to perform that task. They're more flexible and can parse almost anything.
In case the format is very limited, and will never ever change, you might get away by using sscanf. Here's an example program :
#include <iostream>
#include <string>
int main()
{
char input[100];
int param1, param2, param3;
char output[100];
sprintf( input, "%s", "AA123BBB45CCCCC6789DDDD" ); // your input
sscanf( input, "AA%dBBB%dCCCCC%dDDDD", ¶m1, ¶m2, ¶m3); // parse the parameters
sprintf(output, "E%dFF%dGGG%dHHHH", param1, param2, param3); // print in output with your new format
printf("%s", output);
return(0);
}
Post-edit :
That answer doesn't make any sense anymore, but I'll leave it like that.
My advice remains : use regexes.
I want to read string and extract all numbers.
Input: 5a3 1f a0aaaa f1fg3
Output: 53 1 0 13
I tried this code:
string s;
getline(cin, s);
stringstream str_strm(s);
int found;
string temp;
while (!str_strm.eof()) {
str_strm >> temp;
if (stringstream(temp) >> found)
{
cout << found << endl;
}
}
but when found 5 (from example)after that automatically start to check the other string. How can I extract all numbers?
Here's a possible solution - while loop is used to separate strings with whitespaces, after that digits are extracted from the sub-strings.
int main()
{
stringstream ss("5a3 1f a0aaaa f1fg3");
string str;
while (getline(ss, str, ' ') ){
str.erase(std::remove_if(str.begin(), str.end(), [](unsigned char c) { return !std::isdigit(c); }), str.end());
cout << str << " ";
}
}
You could read each space separated word, and then remove the non-digits, like this
std::string word;
while (std::cin >> word)
{
word.erase(std::remove_if(word.begin(), word.end(),
[](unsigned char c) { return not std::isdigit(c); }),
word.end());
std::cout << word << " ";
}
For the input of 5a3 1f a0aaaa f1fg3, it prints 53 1 0 13.
The admittedly odd way of removing elements of a range, is a common idiom.
You could even avoid the loop entirely, if you have the input on a single line
std::string word;
std::getline(std::cin, word);
word.erase(std::remove_if(word.begin(), word.end(),
[](unsigned char c) { return not std::isdigit(c)
and not std::isspace(c); }),
word.end());
std::cout << word;
Please see here the ultra simple example. (There is an even simpler solution at the bottom of this post)
It is using modern C++ elements and algorithms. And has only a few lines of code.
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <algorithm>
#include <sstream>
int main() {
// Read a string from the console
if (std::string line{}; std::getline(std::cin, line)) {
// Put the complete line into a std::istringstream
std::istringstream iss{line};
// Print result
std::transform(std::istream_iterator<std::string>(iss), {}, std::ostream_iterator<std::string>(std::cout, " "),
[](const std::string& s) { return std::regex_replace(s, std::regex{ R"([^\d])" }, ""); });
}
return 0;
}
So, what's going on here. Let us look at it statement by statement. So, first:
if (std::string line{}; std::getline(std::cin, line)) {
This is a if-statement with initializer. If you look up if in the C++ reference, here, then you can see, that we can now have an additional initialization statement as the first part in the if. And why are we using that? Because it is an additional measure for scoping. The variable "line" is only used within the scope of the if statement. It is not needed outside the if. From the functionality point of view, it is the same as writing:
std::string line{};
if (std::getline(std::cin, line)) {
But then, "line" would be also visible outside of the if statement. And, because we want to prevent the pollution of outer namespace, we select this method.
Next is std::getline. This will read a complete line from the input stream, so, from the console (std::cin)and put it into the string. The std::getline returns a reference to the stream. The stream has an overloaded bool operator, that returns, if there was a failure (or end of file) or not. So, the if statement checks, if the input operation works. By the way. All IO-opereations should be checked, if they work or fail.
Good, now we have the complete line of the user input in our variable "line".
With
std::istringstream iss{line};
we put the string into an std::istringstream. We do this, because we want to make use of the C++ "iostream" library. The std::istringstream behaves as any other stream, for example std::cin and you can extract values from it that are separated by a white space. Like in std::cin >> v1 >> v2. The disadvantage for such an approach is, that you need to know the number of values in advance or use a dynamic growing container and a loop.
And this brings ud to our next construct that I want to explain. You may have heard about "iterators". Iterators are like pointers and can point to a range of elements. If you have a std::vector or any other container, then you can iterate with the begin() and end() iterator over all elements in the std::vector without knowing, how many elements are in the std::vector, without knowing how many elements it contains.
And for input streams, we have something similar: The std::istream_iterator. This iterator will iterate over the elements in the std::sitringstream and returns the type of variable given in its template parameter, by repeatedly calling the extractor operator >>. Here, in our case, a std::string. You may know ask: Until when? Where is the end. If you look in the description of the constructor number 1 of the std::istream_operator then you will see, that the default constructor Constructs the end-of-stream iterator. and the default construct can be generated by using the empty braced {} initializer. So {} is the end iterator.
If we want to read all std::strings from the std::istringstream, then we read between
std::istream_iterator<std::string>(iss) and {}. So every string that is in the std::istringstream.
Good, next, there is a similar thing for output, the std::ostream_iterator. This will call the inserter operator "<<" for all elements in a given range. And, we can can specify, to which stream it should send the data, here std::cout and additionally a separator-string, which will be appended to the outputted value.
OK, next: std::transform. As it names says, it will transform the elements in a range of elements, between a begin() and end() iterator, to a other range. So, it will transform the elements as shown above from the std::istringstream and send them to the std::ostream iterator. So, we read the source value, transform it, then write it.
But, how to transform. For the transformation, we give a simple lambda function, which calls the std::regex_replace function. This is a standard function, to replace parts of a string with other string data. And, the what that will be replaced is specified by a std::regex. This is a special pattern that is defined in some kind of meta language and matches specified parts of a string. in our case we use [^\d] which means, not a digit. You can test regexes here. You can also lean about them here.
And now, all together, explains the above solution.
All this can be further optimized to 2 statements:
#include <iostream>
#include <string>
#include <regex>
int main() {
// Read a string from the console
if (std::string line{}; std::getline(std::cin, line)) {
// Remove unnecessary characters
std::cout << std::regex_replace(line, std::regex{ R"([^\d ])" }, "") << "\n";
}
return 0;
}
I cannot think of a more simpler solution.
In case of questions, please ask.
You can use get from istream to get each character, including whitespace, and then isdigit to check for a digit character...
#include <iostream>
#include <cctype>
int main()
{
char ch;
std::cin.get(ch);
while (!std::cin.eof())
{
if (isdigit(ch) || ch == ' ' || ch == '\n')
{
std::cout << ch;
}
std::cin.get(ch);
}
return 0;
}
However, you can avoid using std::cin.eof() for your expression for your While loop as follows...
#include <iostream>
#include <cctype>
int main()
{
char ch;
while (std::cin.get(ch))
{
if (isdigit(ch) || ch == ' ' || ch == '\n')
{
std::cout << ch;
}
}
return 0;
}
Regular expression pattern matching can be used to find all the digits in the input string.
Here is an example program to find the digits:
// C++ program to find all digits in a string
#include <bits/stdc++.h>
using namespace std;
int main() {
string inputString;
cout << "Enter the input string: ";
getline(cin, inputString);
cout << "Digits found: ";
// Define the regular expression matcher and pattern
smatch matcher;
regex pattern("[[:digit:]]");
while (regex_search(inputString, matcher, pattern)) {
// Show the match
cout << matcher.str(0);
// Continue searching the rest of the string
inputString = matcher.suffix().str();
}
return 0;
}
Output:
Enter the input string: sdfh354 eutyt;ljkn756897490uiotureu 587689jkgf 90
Digits found: 35475689749058768990
Here is another approach of finding the numbers in the string, without using the regular expression pattern matching:
#include <iostream>
#include <cctype>
#include <bits/stdc++.h>
using namespace std;
int main() {
string rawInput;
cout <<"Enter input string: ";
getline(cin, rawInput);
// Get all words from the input string
stringstream allWords(rawInput);
// Find and print digits in each word
string word;
while(allWords >> word) {
for(int i = 0; word[i]; i++) {
// Print only the numbers in the word
if(isdigit(word[i])) {
cout<<word[i];
}
}
cout<<" ";
}
cout<<"\n";
return 0;
}
Output:
Enter input string: ghjg45 jsdfj 897897 343yut45 90
45 897897 34345 90
How can I extract all numbers?
When you KNOW that the input numbers are all hex values ... (and how many)
stringstream ss ("5a3 1f a0aaaa f1fg3");
for (int i=0; i<4; ++i)
{
int k;
ss >> hex >> k;
cout << k << endl;
}
with output
1443
31
10529450
3871
I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.
This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.
You have:
string l; // string to hold a line
cin >> l; // get line
The last line does not read a line unless the entire line has non-white space characters. To read a line of text, use:
std::getline(std::cin, l);
It's hard telling whether that is tripping your code up since you haven't posted any sample input.
I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
while (std::getline(std::cin, temp, '.'))
s.push_back(temp);
std::transform(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"),
[](std::string const &s) { return s.substr(s.find_first_not_of(" \t\n")); });
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.
I'm doing THIS UVa problem, which takes in the following input:
This is fun-
ny! Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
This is fun-
ny! Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
and produces this output:
1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1
1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1
In the input, # divides the cases. I'm supposed to get the length of each word and count the frequency of each different length (as you see in the output, a word of length 1 occurs once, length 2 occurs three times, 3 occurs twice, and so on).
My problem is this: When reading in cin, before.Crazy is counted as one word, since there is no space dividing them. It should then be as simple as splitting the string on certain punctuation ({".",",","!","?"} for example)...but C++ seems to have no simple way to split the string.
So, my question: How can I split the string and send in each returned string to my function that handles the rest of the problem?
Here's my code:
int main()
{
string input="";
while(cin.peek()!=-1)
{
while(cin >> input && input!="#")
{
lengthFrequency(input);
cout << input << " " << input.length() << endl;
}
if(cin.peek()!=-1) cout << endl;
lengthFrequencies.clear();
}
return 0;
}
lengthFrequency is a map<int,int>.
You can redefine what a stream considers to be a whitespace character using a std::locale with a custom std::ctype<char> facet. Here is corresponding code which doesn't quite do the assignment but demonstrates how to use the facet:
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
struct ctype
: std::ctype<char>
{
typedef std::ctype<char> base;
static base::mask const* make_table(char const* spaces,
base::mask* table)
{
base::mask const* classic(base::classic_table());
std::copy(classic, classic + base::table_size, table);
for (; *spaces; ++spaces) {
table[int(*spaces)] |= base::space;
}
return table;
}
ctype(char const* spaces)
: base(make_table(spaces, table))
{
}
base::mask table[base::table_size];
};
int main()
{
std::cin.imbue(std::locale(std::locale(), new ctype(".,!?")));
for (std::string s; std::cin >> s; ) {
std::cout << "s='" << s << "'\n";
}
}
Before counting the frequencies, you could parse the input string and replace all the {".",",","!","?"} characters with spaces (or whatever separation character you want to use). Then your existing code should work.
You may want to handle some characters differently. For example, in the case of before.Crazy you would replace the '.' with a space, but for something like 'ny! ' you would remove the '!' altogether because it is already followed by a space.
How about this (using the STL, comparators and functors)?
NOTE: All assumptions and explanations are in the source code itself.
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <utility>
#include <string.h>
bool compare (const std::pair<int, int>& l, const std::pair<int, int>& r) {
return l.first < r.first;
}
//functor/unary predicate:
struct CompareFirst {
CompareFirst(int val) : val_(val) {}
bool operator()(const std::pair<int, int>& p) const {
return (val_ == p.first);
}
private:
int val_;
};
int main() {
char delims[] = ".,!?";
char noise[] ="-'";
//I'm assuming you've read the text from some file, and that information has been stored in a string. Or, the information is a string (like below):
std::string input = "This is fun-\nny, Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\nThis is fun-\nny! Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\n";
std::istringstream iss(input);
std::string temp;
//first split the string by #
while(std::getline(iss, temp, '#')) {
//find all the occurences of the hypens as it crosses lines, and remove the newline:
std::string::size_type begin = 0;
while(std::string::npos != (begin = temp.find('-', begin))) {
//look at the character in front of the current hypen and erase it if it's a newline, if it is - remove it
if (temp[begin+1] == '\n') {
temp.erase(begin+1, 1);
}
++begin;
}
//now, erase all the `noise` characters ("'-") as these count as these punctuation count as zero
for (int i = 0; i < strlen(noise); ++i) {
//this replaces all the hyphens and apostrophes with nothing
temp.erase(std::remove(temp.begin(), temp.end(), noise[i]), temp.end());//since hyphens occur across two lines, you need to erase newlines
}//at this point, everything is dandy for complete substitution
//now try to remove any other delim chracters by replacing them with spaces
for (int i = 0; i < strlen(delims); ++i) {
std::replace(temp.begin(), temp.end(), delims[i], ' ');
}
std::vector<std::pair<int, int> > occurences;
//initialize another input stringstream to make use of the whitespace
std::istringstream ss(temp);
//now use the whitespace to tokenize
while (ss >> temp) {
//try to find the token's size in the occurences
std::vector<std::pair<int, int> >::iterator it = std::find_if(occurences.begin(), occurences.end(), CompareFirst(temp.size()));
//if found, increment count by 1
if (it != occurences.end()) {
it->second += 1;//increment the count
}
//this is the first time it has been created. Store value, and a count of 1
else {
occurences.push_back(std::make_pair<int, int>(temp.size(), 1));
}
}
//now sort and output:
std::stable_sort(occurences.begin(), occurences.end(), compare);
for (int i = 0; i < occurences.size(); ++i) {
std::cout << occurences[i].first << " " << occurences[i].second << "\n";
}
std::cout << "\n";
}
return 0;
}
91 lines, and all vanilla C++98.
A rough outline of what I did is:
Since hyphens occur across two lines, find all hyphens and remove any newlines that follow them.
There are characters that don't add to the length of a word such as the legitimate hypenated words and the apostrophe. Find these and erase them as it makes tokenizing easier.
All the other remaining delimiters can now be found and replaced with whitespace. Why? Because we can use the whitespace to our advantage by using streams (whose default action is to skip whitespace).
Create a stream and tokenize the text via whitespace as per the previous.
Store the lengths of the tokens and their occurrences.
Sort the lengths of the tokens, and then output the token length and corresponding occurrences.
REFERENCES:
https://stackoverflow.com/a/5815875/866930
https://stackoverflow.com/a/12008126/866930
I'm trying to read just the integers from a text file structured like this....
ALS 46000
BZK 39850
CAR 38000
//....
using ifstream.
I've considered 2 options.
1) Regex using Boost
2) Creating a throwaway string ( i.e. I read in a word, don't do anything with it, then read in the score ). However, this is a last resort.
Are there any ways to express in C++ that I want the ifstream to only read in text that is an integer? I'm reluctant to use regular expressions if it turns out that there is a much simpler way to accomplish this.
why to make simple things complicated?
whats wrong in this :
ifstream ss("C:\\test.txt");
int score;
string name;
while( ss >> name >> score )
{
// do something with score
}
Edit:
it's in fact possible to work on streams directly with spirit than I suggested previously, with a parser:
+(omit[+(alpha|blank)] >> int_)
and one line of code(except for variable definitions):
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
you get all numbers into a vector at once with one line, couldn't be simpler.
if you do not mind using boost.spirit2. the parser to get numbers from a line only is
omit[+(alpha|blank)] >> int_
to extract everything is
+(alpha|blank) >> int_
See the whole program below(Test with VC10 Beta 2):
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <cstring>
#include <vector>
#include <fstream>
#include <algorithm>
#include <iterator>
using std::cout;
using namespace boost::spirit;
using namespace boost::spirit::qi;
void extract_everything(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
std::string s;
int i;
parse(it_begin, it_end, +(alpha|blank)>>int_, s, i);
cout << "string " << s
<< "followed by nubmer " << i
<< std::endl;
}
void extract_number(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
int i;
parse(it_begin, it_end, omit[+(alpha|blank)] >> int_, i);
cout << "number only: " << i << std::endl;
}
void extract_line()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
std::string s;
int i;
// iterated file line by line
while(getline(f, s))
{
cout << "parsing " << s << " yields:\n";
extract_number(s); //
extract_everything(s);
}
}
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
int main(int argc, char * argv[])
{
extract_line();
extract_file();
return 0;
}
outputs:
parsing ALS 46000 yields:
number only: 46000
string ALS followed by nubmer 46000
parsing BZK 39850 yields:
number only: 39850
string BZK followed by nubmer 39850
parsing CAR 38000 yields:
number only: 38000
string CAR followed by nubmer 38000
46000, 39850, 38000,
You can call ignore to have in skip over a specified number of characters.
istr.ignore(4);
You can also tell it to stop at a delimiter. You would still need to know the maximum number of characters the leading string could be, but this would also work for shorter leading strings:
istr.ignore(10, ' ');
You could also write a loop that just reads characters until you see the first digit character:
char c;
while (istr.getchar(c) && !isdigit(c))
{
// do nothing
}
if (istr && isdigit(c))
istr.putback(c);
here goes :P
private static void readFile(String fileName) {
try {
HashMap<String, Integer> map = new HashMap<String, Integer>();
File file = new File(fileName);
Scanner scanner = new Scanner(file).useDelimiter(";");
while (scanner.hasNext()) {
String token = scanner.next();
String[] split = token.split(":");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
} else {
split = token.split("=");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
}
}
}
scanner.close();
System.out.println("Counts:" + map);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
readFile("test.txt");
}
}
fscanf(file, "%*s %d", &num);
or %05d if you have leading zeros and fixed width of 5....
sometimes the fastest way to do things in C++ is to use C. :)
You can create a ctype facet that classifies letters as white space. Create a locale that uses this facet, then imbue the stream with that locale. Having that, you can extract numbers from the stream, but all letters will be treated as white space (i.e. when you extract numbers, the letters will be ignored just like a space or a tab would be):
Such a locale can look like this:
#include <iostream>
#include <locale>
#include <vector>
#include <algorithm>
struct digits_only: std::ctype<char>
{
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
if (rc['0'] == std::ctype_base::space)
std::fill_n(&rc['0'], 9, std::ctype_base::mask());
return &rc[0];
}
};
Sample code to use it could look like this:
int main() {
std::cin.imbue(std::locale(std::locale(), new digits_only()));
std::copy(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, "\n"));
}
Using your sample data, the output I get from this looks like this:
46000
39850
38000
Note that as it stands, I've written this to accept only digits. If (for example) you were reading floating point numbers, you'd also want to retain '.' (or the locale-specific equivalent) as the decimal point. One way to handle things is to start with a copy of the normal ctype table, and then just set the things you want to ignore as space.