Comparing elements of text file - c++

I am trying to compare blocks of four digits in a text file and write a new output file containing only the blocks that meet one requirement: a four-digit number in which all digits are the same.
This is my code for the input file:
int main()
{
ofstream outfile ("text.txt");
outfile << "1111 1212 4444 \n 2222 \n \n 8888 4567" <<endl;
I want to split this into blocks of four like "1111", "1212" and so on, so that I can write only the ones that meet the requirement to the new output file. I decided to convert the whole file into an integer vector to be able to compare them.
char digit;
ifstream file("text.txt");
vector <int> digits;
while(file>>digit)
{
digits.push_back(digit - '0');
}
and I suppose that the method that compares them would look something like this:
bool IsValid(vector<int> digits){
for (int i=0; i<digits.size() i++)
{
if(digits[0] == digits[1] == digits[2] == digits [3])
return true;
else
{
return false;
}
}
}
However, this would only compare the first block. Would you do it differently, or should I keep going with the vector idea?

Hm, everything I have seen so far is rather complicated.
Obviously you want to check for a pattern in a string, and patterns are usually matched with regular expressions.
This gives you an extremely short solution. Use std::regex. Regular expressions are part of the C++ standard library, and they are easy to use. For your case the regex is (\d)\1{3}: a digit followed by 3 more of the same digit.
The program then boils down to one statement:
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <regex>
std::istringstream testData{R"(1111 1212 444414 555
2222
8888 4567)"};
int main()
{
std::copy_if(
std::istream_iterator<std::string>(testData),
{},
std::ostream_iterator<std::string>(std::cout,"\n"),
[](const std::string& s){
return std::regex_match(s,std::regex(R"((\d)\1{3})"));
}
);
return 0;
}
Of course, you may use any std::fstream instead of the std::istringstream.
And of course, this is only one of many possible solutions, and maybe not the best one.
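As a hedged sketch of the same idea in reusable form, the check can be pulled out of the lambda into a small function. The name is_four_equal_digits is hypothetical (it does not appear in the answer above); the pattern is the one from the answer, and the regex object is built only once instead of on every call.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `token` is exactly four identical digits.
bool is_four_equal_digits(const std::string& token) {
    // Built once and reused; the answer's lambda constructs it on every call.
    static const std::regex pattern{R"((\d)\1{3})"};
    // regex_match requires the whole token to match, so "444414" is rejected.
    return std::regex_match(token, pattern);
}
```

Such a function could be passed to std::copy_if in place of the lambda.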

I decided to convert the whole file into an integer vector to be able to compare them.
If you want integers, you can extract ints from the stream directly (file >> int_variable) and check whether each one is a multiple of 1111.
Suggestions in code:
#include <fstream>
#include <iomanip>
#include <iostream>
#include <vector>
bool IsValid(int number) {
// Check that number is in the valid range and that it's a multiple of 1111.
return number >= 0 && number <= 9999 && (number / 1111) * 1111 == number;
}
// A function to process the values in a stream
std::vector<int> process_stream(std::istream& is) {
std::vector<int> digits;
int number;
while(is >> number) {
if(IsValid(number)) // Only save valid numbers
digits.push_back(number);
}
return digits;
}
int main() {
std::vector<int> digits;
// Check that opening the file succeeds before using it
if(std::ifstream file = std::ifstream("text.txt")) {
digits = process_stream(file);
}
// Print the collected ints
for(int x : digits) {
std::cout << std::setw(4) << std::setfill('0') << x << '\n';
}
}

Another approach is to simply handle each input as a string, and then loop over each character in the string, validating that it is a digit and equal to the previous character. If it fails either test, then what was read wasn't an integer with all digits equal.
For example you could do:
#include <iostream>
#include <sstream>
#include <string>
#include <cctype>
int main (void) {
std::string s;
std::stringstream ss { "1 11 1111 foo 2222\nbar 1212\n4444\n8888\n4567\n"
"3433333 a8\n9999999999999999999\n" };
while (ss >> s) { /* read each string */
bool equaldigits = true; /* flags equal digits */
for (size_t i = 1; i < s.length(); i++) /* loop 1 - length */
/* validate previous & current digits & equal */
if (!isdigit(s[i-1]) || !isdigit(s[i]) || s[i-1] != s[i]) {
equaldigits = false; /* if not set flag false */
break; /* break loop */
}
/* handle empty-string or single char case */
if (!s.length() || (s.length() == 1 && !isdigit(s[0])))
equaldigits = false;
if (equaldigits) /* if all digits & equal */
std::cout << s << '\n'; /* output string */
}
}
The std::stringstream above simply provides simulated input for the program.
(note: you can loop with std::string::iterator if you like, or use a range-based for loop and prev char to store the last seen. Here, it's just as easy to iterate over indexes)
Using std::string find_first_not_of
Using existing string functions provides another way. After comparing that the first character is a digit, you can use std::basic_string::find_first_not_of to scan the rest of the string for a character that isn't the same as the first -- if the result isn't std::string::npos, then your string isn't all the same digit.
#include <iostream>
#include <sstream>
#include <string>
#include <cctype>
int main (void) {
std::string s;
std::stringstream ss { "1 11 1111 foo 2222\nbar 1212\n4444\n8888\n4567\n"
"3433333 a8\n9999999999999999999\n" };
while (ss >> s) { /* read each string */
if (!isdigit(s.at(0))) /* 1st char digit? */
continue;
/* if remainder of chars not equal 1st char - not equal digits */
if (s.find_first_not_of(s.at(0)) != std::string::npos)
continue;
std::cout << s << '\n';
}
}
Both approaches produce the same output.
Example Use/Output
$ ./bin/intdigitssame
1
11
1111
2222
4444
8888
9999999999999999999
There are many other ways to do this as shown by the other good answers. It's worth understanding each approach.
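One more variant, as a minimal sketch: the index loop can be replaced by std::all_of, comparing every character with the first one. The function name all_same_digit is hypothetical and not part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true when `s` is non-empty and consists of one repeated decimal digit.
bool all_same_digit(const std::string& s) {
    // The cast avoids undefined behaviour in isdigit for negative char values.
    if (s.empty() || !std::isdigit(static_cast<unsigned char>(s.front())))
        return false;
    // Every character must equal the first one, which is known to be a digit.
    return std::all_of(s.begin(), s.end(),
                       [&s](char c) { return c == s.front(); });
}
```

Like the two programs above, this accepts strings of any length, so a length check would still be needed if only four-digit blocks should pass.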

Related

Extract all numbers from stringstream

I want to read a string and extract all the numbers.
Input: 5a3 1f a0aaaa f1fg3
Output: 53 1 0 13
I tried this code:
string s;
getline(cin, s);
stringstream str_strm(s);
int found;
string temp;
while (!str_strm.eof()) {
str_strm >> temp;
if (stringstream(temp) >> found)
{
cout << found << endl;
}
}
but after it finds 5 (in the example) it moves straight on to the next string. How can I extract all the numbers?
Here's a possible solution - while loop is used to separate strings with whitespaces, after that digits are extracted from the sub-strings.
int main()
{
stringstream ss("5a3 1f a0aaaa f1fg3");
string str;
while (getline(ss, str, ' ') ){
str.erase(std::remove_if(str.begin(), str.end(), [](unsigned char c) { return !std::isdigit(c); }), str.end());
cout << str << " ";
}
}
You could read each space separated word, and then remove the non-digits, like this
std::string word;
while (std::cin >> word)
{
word.erase(std::remove_if(word.begin(), word.end(),
[](unsigned char c) { return not std::isdigit(c); }),
word.end());
std::cout << word << " ";
}
For the input of 5a3 1f a0aaaa f1fg3, it prints 53 1 0 13.
The admittedly odd-looking way of removing elements from a range is a common idiom, known as erase-remove.
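To make the two halves of that idiom visible, here is a small sketch that splits them into separate statements. The function name keep_digits is hypothetical; the calls are the same std::remove_if and erase used above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `word` with every non-digit removed.
std::string keep_digits(std::string word) {
    // Step 1: remove_if only moves the kept characters to the front and
    // returns an iterator to the new logical end; the size is unchanged.
    auto new_end = std::remove_if(word.begin(), word.end(),
        [](unsigned char c) { return !std::isdigit(c); });
    // Step 2: erase drops the leftover tail after the logical end.
    word.erase(new_end, word.end());
    return word;
}
```

For example, keep_digits("f1fg3") yields "13".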
You could even avoid the loop entirely, if you have the input on a single line
std::string word;
std::getline(std::cin, word);
word.erase(std::remove_if(word.begin(), word.end(),
[](unsigned char c) { return not std::isdigit(c)
and not std::isspace(c); }),
word.end());
std::cout << word;
Please see here the ultra simple example. (There is an even simpler solution at the bottom of this post)
It is using modern C++ elements and algorithms. And has only a few lines of code.
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <algorithm>
#include <sstream>
int main() {
// Read a string from the console
if (std::string line{}; std::getline(std::cin, line)) {
// Put the complete line into a std::istringstream
std::istringstream iss{line};
// Print result
std::transform(std::istream_iterator<std::string>(iss), {}, std::ostream_iterator<std::string>(std::cout, " "),
[](const std::string& s) { return std::regex_replace(s, std::regex{ R"([^\d])" }, ""); });
}
return 0;
}
So, what's going on here. Let us look at it statement by statement. So, first:
if (std::string line{}; std::getline(std::cin, line)) {
This is an if statement with an initializer (available since C++17). It lets us put an additional initialization statement as the first part of the if. Why use it? Because it tightens scoping: the variable "line" is only used within the if statement and is not needed outside it. Functionally, it is the same as writing:
std::string line{};
if (std::getline(std::cin, line)) {
But then "line" would also be visible outside of the if statement. Because we want to avoid polluting the enclosing scope, we choose the initializer form.
Next is std::getline. This reads a complete line from the input stream, here the console (std::cin), and puts it into the string. std::getline returns a reference to the stream. The stream has an explicit conversion to bool that yields false if the operation failed (for example at end of file). So the if statement checks whether the input operation worked. By the way, all I/O operations should be checked for success or failure.
Good, now we have the complete line of user input in our variable "line".
With
std::istringstream iss{line};
we put the string into a std::istringstream. We do this because we want to make use of the C++ iostream library. The std::istringstream behaves like any other input stream, for example std::cin, and you can extract whitespace-separated values from it, as in std::cin >> v1 >> v2. The disadvantage of that approach is that you need to know the number of values in advance, or use a dynamically growing container and a loop.
And this brings us to the next construct that I want to explain. You may have heard about iterators. Iterators are like pointers, and a pair of them delimits a range of elements. If you have a std::vector or any other container, you can iterate over all elements with the begin() and end() iterators without knowing how many elements it contains.
And for input streams we have something similar: the std::istream_iterator. This iterator iterates over the elements in the std::istringstream and returns values of the type given in its template parameter by repeatedly calling the extraction operator >>. Here, in our case, that is a std::string. You may now ask: until when? Where is the end? If you look at the description of constructor number 1 of std::istream_iterator, you will see that the default constructor constructs the end-of-stream iterator, and a default-constructed iterator can be created with the empty braced initializer {}. So {} is the end iterator.
If we want to read all std::strings from the std::istringstream, then we read between std::istream_iterator<std::string>(iss) and {}, that is, every string that is in the std::istringstream.
Good. Next, there is a similar thing for output, the std::ostream_iterator. This calls the insertion operator << for every element written to it. We can specify the stream it should send the data to, here std::cout, and additionally a separator string, which is appended after each output value.
OK, next: std::transform. As its name says, it transforms the elements in a range, between a begin and an end iterator, and writes the results to another range. So it transforms the elements read from the std::istringstream as shown above and sends them to the std::ostream_iterator: we read the source value, transform it, then write it.
But how do we transform? For the transformation we supply a simple lambda function, which calls std::regex_replace. This is a standard function that replaces parts of a string with other string data. What gets replaced is specified by a std::regex, a pattern written in a kind of meta-language that matches specified parts of a string. In our case we use [^\d], which means "not a digit".
All of this together explains the above solution.
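The stream-iterator pair described above can also be tried in isolation. The following is only a sketch, and split_words is a hypothetical name: it feeds the iterator range into a std::vector constructor instead of std::transform, so the tokens can be inspected.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on whitespace using stream iterators.
std::vector<std::string> split_words(const std::string& line) {
    std::istringstream iss{line};
    // The first iterator extracts with operator>>; the default-constructed
    // second one is the end-of-stream marker.
    return std::vector<std::string>(std::istream_iterator<std::string>{iss},
                                    std::istream_iterator<std::string>{});
}
```

For the input "5a3 1f a0aaaa f1fg3" this returns four strings, one per word.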
All this can be further optimized to 2 statements:
#include <iostream>
#include <string>
#include <regex>
int main() {
// Read a string from the console
if (std::string line{}; std::getline(std::cin, line)) {
// Remove unnecessary characters
std::cout << std::regex_replace(line, std::regex{ R"([^\d ])" }, "") << "\n";
}
return 0;
}
I cannot think of a simpler solution.
In case of questions, please ask.
You can use get from istream to get each character, including whitespace, and then isdigit to check for a digit character...
#include <iostream>
#include <cctype>
int main()
{
char ch;
std::cin.get(ch);
while (!std::cin.eof())
{
if (isdigit(ch) || ch == ' ' || ch == '\n')
{
std::cout << ch;
}
std::cin.get(ch);
}
return 0;
}
However, you can avoid using std::cin.eof() in the condition of your while loop, as follows:
#include <iostream>
#include <cctype>
int main()
{
char ch;
while (std::cin.get(ch))
{
if (isdigit(ch) || ch == ' ' || ch == '\n')
{
std::cout << ch;
}
}
return 0;
}
Regular expression pattern matching can be used to find all the digits in the input string.
Here is an example program to find the digits:
// C++ program to find all digits in a string
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
string inputString;
cout << "Enter the input string: ";
getline(cin, inputString);
cout << "Digits found: ";
// Define the regular expression matcher and pattern
smatch matcher;
regex pattern("[[:digit:]]");
while (regex_search(inputString, matcher, pattern)) {
// Show the match
cout << matcher.str(0);
// Continue searching the rest of the string
inputString = matcher.suffix().str();
}
return 0;
}
Output:
Enter the input string: sdfh354 eutyt;ljkn756897490uiotureu 587689jkgf 90
Digits found: 35475689749058768990
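The program above matches one digit at a time and copies the remaining suffix after every match. As an alternative sketch (not the method used above), std::sregex_iterator can walk over all matches without modifying the input, and the pattern "[[:digit:]]+" keeps each run of digits together. The name digit_runs is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every maximal run of digits in `text`.
std::vector<std::string> digit_runs(const std::string& text) {
    static const std::regex pattern{"[[:digit:]]+"};
    std::vector<std::string> runs;
    // A default-constructed std::sregex_iterator marks the end of the matches.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        runs.push_back(it->str());
    }
    return runs;
}
```

Note that this splits on every non-digit, so "5a3" gives "5" and "3" rather than "53".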
Here is another approach of finding the numbers in the string, without using the regular expression pattern matching:
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string rawInput;
cout <<"Enter input string: ";
getline(cin, rawInput);
// Get all words from the input string
stringstream allWords(rawInput);
// Find and print digits in each word
string word;
while(allWords >> word) {
for(int i = 0; word[i]; i++) {
// Print only the numbers in the word
if(isdigit(word[i])) {
cout<<word[i];
}
}
cout<<" ";
}
cout<<"\n";
return 0;
}
Output:
Enter input string: ghjg45 jsdfj 897897 343yut45 90
45 897897 34345 90
How can I extract all numbers?
When you KNOW that the input numbers are all hex values ... (and how many)
stringstream ss ("5a3 1f a0aaaa f1fg3");
for (int i=0; i<4; ++i)
{
int k;
ss >> hex >> k;
cout << k << endl;
}
with output
1443
31
10529450
3871

Program is counting consonants wrong

I'm trying to make a program that counts all the vowels and all the consonants in a text file. However, if the file has a word such as cat it says that there are 3 consonants and 1 vowel when there should be 2 consonants and 1 vowel.
#include <iostream>
#include <fstream>
#include <string>
#include <cassert>
#include <cstdio>
using namespace std;
int main(void)
{
int i, j;
string inputFileName;
ifstream fileIn;
char ch;
cout<<"Enter the name of the file of characters: ";
cin>>inputFileName;
fileIn.open(inputFileName.data());
assert(fileIn.is_open());
i=0;
j=0;
while(!(fileIn.eof())){
ch=fileIn.get();
if (ch == 'a'||ch == 'e'||ch == 'i'||ch == 'o'||ch == 'u'||ch == 'y'){
i++;
}
else{
j++;
}
}
cout<<"The number of Consonants is: " << j << endl;
cout<<"The number of Vowels is: " << i << endl;
return 0;
}
Here you check if the eof state is set, then try to read a char. eof will not be set until you try to read beyond the end of the file, so reading a char fails, but you'll still count that char:
while(!(fileIn.eof())){
ch=fileIn.get(); // this fails and sets eof when you're at eof
So, if your file only contains 3 chars, c, a and t and you've read the t you'll find that eof() is not set. It'll be set when you try reading the next char.
A better way is to check if fileIn is still in a good state after the extraction:
while(fileIn >> ch) {
With that in place the counting should add up. All special characters will be counted as consonants though. To improve on that, you could check that the char is a letter:
#include <cctype>
// ...
while(fileIn >> ch) {
if(std::isalpha(ch)) { // only count letters
ch = std::tolower(ch); // makes it possible to count uppercase letters too
if(ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u' || ch == 'y') {
i++;
} else {
j++;
}
}
}
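To see the off-by-one in isolation, here is a sketch that runs both loop styles over any std::istream. All three function names are hypothetical. The vowel set, including 'y', is the one from the question, and neither version filters non-letters, so the only difference between them is the loop condition.

```cpp
#include <istream>
#include <sstream>
#include <utility>

// Same vowel set as the question (it treats 'y' as a vowel).
bool is_question_vowel(char ch) {
    return ch == 'a' || ch == 'e' || ch == 'i' ||
           ch == 'o' || ch == 'u' || ch == 'y';
}

// Hypothetical helper reproducing the question's loop: returns {vowels, consonants}.
std::pair<int, int> count_with_eof_loop(std::istream& in) {
    int vowels = 0, consonants = 0;
    while (!in.eof()) {
        // The final get() fails and returns EOF, but the result is still counted.
        char ch = static_cast<char>(in.get());
        if (is_question_vowel(ch)) ++vowels; else ++consonants;
    }
    return {vowels, consonants};
}

// Hypothetical helper using the checked read: returns {vowels, consonants}.
std::pair<int, int> count_with_checked_read(std::istream& in) {
    int vowels = 0, consonants = 0;
    char ch;
    while (in >> ch) {  // the body runs only after a successful extraction
        if (is_question_vowel(ch)) ++vowels; else ++consonants;
    }
    return {vowels, consonants};
}
```

Fed a std::istringstream containing "cat", the first function reports 3 consonants and the second reports 2.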
Your program doesn't check for numbers and special characters, or for uppercase letters. Plus, .eof() is misused: the loop reaches the last character of the file, loops again, tries to read one more character, and only then detects that it is at the end of the file, which produces the extra consonant. Consider using while((ch = fileIn.get()) != EOF), with ch declared as int.
I would use a different approach, searching strings:
const std::string vowels = "aeiou";
int vowel_quantity = 0;
int consonant_quantity = 0;
char c;
while (file >> c)
{
if (isalpha(c))
{
if (vowels.find(c) != std::string::npos)
{
++vowel_quantity;
}
else
{
++consonant_quantity;
}
}
}
Note: in the above code fragment, each character is first tested for being alphabetic, because characters such as a period or a question mark are not letters. Your code counts periods as consonants.
Edit 1: character arrays
If you are not allowed to use std::string, you could also use character arrays (a.k.a. C-Strings):
static const char vowels[] = "aeiou";
int vowel_quantity = 0;
int consonant_quantity = 0;
char c;
while (file >> c)
{
if (isalpha(c))
{
if (strchr(vowels, c) != NULL)
{
++vowel_quantity;
}
else
{
++consonant_quantity;
}
}
}
I first thought my very first comment to your question was just a sidenote, but in fact it's the reason for the results you're getting. Your reading loop
while(!(fileIn.eof())){
ch=fileIn.get();
// process ch
}
is flawed. At the end of the file you'll check for EOF with !fileIn.eof() but you haven't read past the end yet so your program enters the loop once again and fileIn.get() will return EOF which will be counted as a consonant. The correct way to read is
while ((ch = file.get()) != EOF) {
// process ch
}
with ch declared as integer or
while (file >> ch) {
// process ch
}
with ch declared as char. To limit the scope of ch to the loop consider using a for-loop:
for (int ch{ file.get() }; ch != EOF; ch = file.get()) {
// process ch;
}
As @TedLyngmo pointed out in the comments, EOF could be replaced by std::char_traits<char>::eof() for consistency, although it is specified to return EOF.
Also your program should handle everything that isn't a letter (numbers, signs, control characters, ...) differently from vowels and consonants. Have a look at the functions in <cctype>.
In addition to the issue described in "Why !.eof() inside a loop condition is always wrong", you have another test or two to implement in order to count all vowels and consonants. As mentioned in the comments, use tolower() (from <cctype>) to convert each char to lowercase before your if statement, so that both upper- and lower-case vowels are classified.
In addition to testing for vowels, you need an else if (isalpha(c)) test. You don't want to classify whitespace or punctuation as consonants.
Additionally, unless you were told to treat 'y' as a vowel, it technically isn't one. I'll leave that up to you.
Adding the tests, you could write a short implementation as:
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
int main (void) {
size_t cons = 0, vowels = 0;
std::string ifname {};
std::ifstream fin;
std::cout << "enter filename: ";
if (!(std::cin >> ifname)) {
std::cerr << "(user canceled input)\n";
exit (EXIT_FAILURE);
}
fin.open (ifname);
if (!fin.is_open()) {
std::cerr << "error: file open failed '" << ifname << "'\n";
exit (EXIT_FAILURE);
}
/* loop reading each character in file */
for (int c = fin.get(); !fin.eof(); c = fin.get()) {
c = tolower(c); /* convert to lower */
if (c=='a' || c=='e' || c=='i' || c=='o' || c=='u')
vowels++;
else if (isalpha(c)) /* must be alpha to be consonant */
cons++;
}
std::cout << "\nIn file " << ifname << " there are:\n " << vowels
<< " vowels, and\n " << cons << " consonants\n";
}
(also worth reading Why is “using namespace std;” considered bad practice?)
Example Input File
$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.
Example Use/Output
$ ./bin/vowelscons
enter filename: dat/captnjack.txt
In file dat/captnjack.txt there are:
25 vowels, and
34 consonants
Which if you count and classify each character gives the correct result.
Look things over and let me know if you have any questions.
I know that the following will be hard to digest. I want to show it anyway, because it is the "more modern C++" solution.
So I will first develop an algorithm and then use modern C++ elements to implement it.
First, the algorithm. Consider how ASCII encodes letters:
The ASCII codes for an uppercase letter and its lowercase counterpart differ only in bit 5; their lower 5 bits are identical. So if we mask the ASCII code with 0x1F, as in char c{'a'}; unsigned int x = c & 0x1F;, we get values between 1 and 26. That gives us a 5-bit value for each letter. If we now mark all vowels with a 1, we can build a 32-bit binary number (an unsigned int) with a bit set at each position that corresponds to a vowel. We then get something like
Bit position
3322 2222 2222 1111 1111 1100 0000 0000
1098 7654 3210 9876 5432 1098 7654 3210
Position with vowels:
0000 0000 0010 0000 1000 0010 0010 0010
This number can be written as 0x208222. If we now want to find out whether a letter (regardless of whether it is upper- or lowercase) is a vowel, we mask out the unneeded bits of the character (c & 0x1F) and shift the binary number to the right by as many positions as the resulting letter code. If the bit at the LSB position is then set, we have a vowel. This know-how is decades old.
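As a small sketch of this bit trick, the mask can be derived from the vowels themselves rather than written down as a constant. The names build_vowel_mask and is_ascii_vowel are hypothetical, and the code assumes ASCII input that is already known to be a letter (a character such as '!' would alias to 'a').

```cpp
#include <cstdint>

// Hypothetical helper: sets one bit per vowel, at the position given by the
// lower 5 bits of its ASCII code ('a' -> bit 1, 'e' -> bit 5, and so on).
std::uint32_t build_vowel_mask() {
    std::uint32_t mask = 0;
    for (const char* p = "aeiou"; *p != '\0'; ++p)
        mask |= std::uint32_t{1} << (*p & 0x1F);
    return mask;
}

// Hypothetical helper: `letter` must be an ASCII letter, upper- or lowercase.
bool is_ascii_vowel(char letter) {
    // Shift the mask so the letter's bit lands at the LSB, then test it.
    return ((build_vowel_mask() >> (letter & 0x1F)) & 1u) != 0;
}
```

build_vowel_mask() evaluates to 0x208222, the constant derived in the table above.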
Aha. Not so easy, but it works for ASCII-coded letters.
Next, we create a lambda that takes a string consisting purely of letters and counts the vowels. Whatever is not a vowel is a consonant (because we have letters only).
Then we use modern C++ elements to calculate the requested values.
The result is some elegant C++ code with only a few lines.
Please see
#include <utility>
#include <iterator>
#include <algorithm>
#include <string>
#include <iostream>
#include <fstream>
#include <cctype>
int main() {
// Lambda for counting vowels and consonants in a string consisting of letters only
auto countVowelsAndConsonants = [](std::string& s) -> std::pair<size_t, size_t> {
size_t numberOfVowels = std::count_if(s.begin(), s.end(), [](const char c) { return (0x208222 >> (c & 0x1f)) & 1; });
return { numberOfVowels, s.size() - numberOfVowels }; };
// Inform the user what to do: He should enter a valid filename
std::cout << "\nCount vowels and consonants.\n\nEnter a valid filename with the source text: ";
// Read the filename
if (std::string fileName{}; std::cin >> fileName) {
// Now open the file and check, if it could be opened
if (std::ifstream sourceFileStream(fileName); sourceFileStream) {
// Read the complete source text file into a string. But only letters
std::string completeSourceTextFile{};
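// std::isalpha is wrapped in a lambda: passing the overloaded name directly may fail to compile
std::copy_if(std::istreambuf_iterator<char>(sourceFileStream), {}, std::back_inserter(completeSourceTextFile), [](unsigned char c) { return std::isalpha(c) != 0; });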
// Now count the corresponding vowels and consonants
const auto [numberOfVowels, numberOfConsonants] = countVowelsAndConsonants(completeSourceTextFile);
// Show result to user:
std::cout << "\n\nNumber of vowels: " << numberOfVowels << "\nNumber of consonants: " << numberOfConsonants << "\n\n";
}
else {
std::cerr << "\n*** Error. Could not open source text file '" << fileName << "'\n\n";
}
}
else {
std::cerr << "\n*** Error. Could not get file name for source text file\n\n";
}
return 0;
}
Please note:
There are a million possible solutions. Everybody can do what they want.
Some people are still more in a C-style mode, and others prefer to program in C++.

How to look up a pattern in string input?

How can I parse a string that looks like "xxxx-xxxx" and get those xxxx parts as numbers? For example, the user will type in "9349-2341" and I will get those numbers as two different integers.
I need to do that for a random number generator, which chooses a number between these two values.
Thanks.
You can use std::stringstream to extract numbers from string. It looks like that:
std::stringstream str_stream;
std::string str_to_parse = "1234-5678";
int num[2];
str_stream << str_to_parse;
str_stream >> num[0];
str_stream.ignore(1); // otherwise it will extract negative number (-5678)
str_stream >> num[1];
Also, there are C functions, like sscanf(). For example, your pattern can be extracted with this format: "%d-%d".
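A slightly more defensive variant of the stream approach, as a sketch only: instead of ignoring one character, read the separator into a char and verify it, and report failure for malformed input. The function name parse_range and its bool-plus-output-parameter signature are assumptions made for illustration.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: parses text such as "1234-5678" into two ints.
// Returns false (leaving `out` untouched) when the input is malformed.
bool parse_range(const std::string& text, std::pair<int, int>& out) {
    std::istringstream in{text};
    int first = 0, second = 0;
    char dash = '\0';
    // Reading the separator into a char lets us check that it really is '-'.
    if (!(in >> first >> dash >> second) || dash != '-')
        return false;
    out = {first, second};
    return true;
}
```

With "9349-2341" this fills the pair with 9349 and 2341; with "9349" alone it returns false.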
std::string str = "1234-5678";
std::string str1 = str.substr (0,4);
std::string str2 = str.substr(5, 4);
int n1 = std::stoi(str1);
int n2 = std::stoi(str2);
// do your random number generation between n1 and n2
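For the random number step left as a comment above, one possible sketch uses the <random> header. The function name random_between is hypothetical. Because the example input "9349-2341" has the larger value first, the bounds are ordered before use.

```cpp
#include <algorithm>
#include <random>

// Hypothetical helper: uniformly distributed integer in the closed range
// between a and b, in whichever order they are given.
int random_between(int a, int b, std::mt19937& engine) {
    // std::uniform_int_distribution requires low <= high.
    std::uniform_int_distribution<int> dist(std::min(a, b), std::max(a, b));
    return dist(engine);
}
```

The engine would typically be created once, for example seeded from std::random_device, and then passed to every call.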
Using regular expression
If your input is assured to resemble "xxxx-xxxx" where 'x' represents a digit, you can simply use the following program:
#include <iostream>
#include <string>
#include <regex>
using namespace std;
int main()
{
string input = "9349-2341";
// This pattern matches 4 digits, a hyphen, and 4 more digits; each group of digits is captured
string pattern = "([0-9]{4})-([0-9]{4})";
smatch matcher;
regex prog (pattern);
if (regex_search(input, matcher, prog))
{
// matcher[1] and matcher[2] hold the two captured digit groups as strings
cout << matcher[1] << " " << matcher[2] << endl;
}
else
{
cout << "Invalid input!" << endl;
}
return 0;
}
As for how to convert string to number, check out this article, from which the following segment is quoted:
string Text = "456";//string containing the number
int Result;//number which will contain the result
stringstream convert(Text); // stringstream used for the conversion initialized with the contents of Text
if ( !(convert >> Result) )//give the value to Result using the characters in the string
Result = 0;//if that fails set Result to 0
//Result now equal to 456
Or, more simply:
Using sscanf
#include <cstdio>
using namespace std;
int main(int argc, char ** argv)
{
char input[] = "1234-5678";
int result, suffix;
// %d reads decimal only; %i would parse a part with a leading 0 (e.g. "0123") as octal
sscanf(input, "%d-%d", &result, &suffix);
printf("Output: '%i-%i'.\n", result, suffix);
return 0;
}
You should check out C++ reference websites, such as CPlusPlus.

Splitting a String in C++ (using cin)

I'm doing THIS UVa problem, which takes in the following input:
This is fun-
ny! Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
This is fun-
ny! Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
and produces this output:
1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1
1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1
In the input, # divides the cases. I'm supposed to get the length of each word and count the frequency of each different length (as you see in the output, a word of length 1 occurs once, length 2 occurs three times, 3 occurs twice, and so on).
My problem is this: When reading in cin, before.Crazy is counted as one word, since there is no space dividing them. It should then be as simple as splitting the string on certain punctuation ({".",",","!","?"} for example)...but C++ seems to have no simple way to split the string.
So, my question: How can I split the string and send in each returned string to my function that handles the rest of the problem?
Here's my code:
int main()
{
string input="";
while(cin.peek()!=-1)
{
while(cin >> input && input!="#")
{
lengthFrequency(input);
cout << input << " " << input.length() << endl;
}
if(cin.peek()!=-1) cout << endl;
lengthFrequencies.clear();
}
return 0;
}
lengthFrequency is a map<int,int>.
You can redefine what a stream considers to be a whitespace character using a std::locale with a custom std::ctype<char> facet. Here is corresponding code which doesn't quite do the assignment but demonstrates how to use the facet:
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
struct ctype
: std::ctype<char>
{
typedef std::ctype<char> base;
static base::mask const* make_table(char const* spaces,
base::mask* table)
{
base::mask const* classic(base::classic_table());
std::copy(classic, classic + base::table_size, table);
for (; *spaces; ++spaces) {
table[int(*spaces)] |= base::space;
}
return table;
}
ctype(char const* spaces)
: base(make_table(spaces, table))
{
}
base::mask table[base::table_size];
};
int main()
{
std::cin.imbue(std::locale(std::locale(), new ctype(".,!?")));
for (std::string s; std::cin >> s; ) {
std::cout << "s='" << s << "'\n";
}
}
Before counting the frequencies, you could parse the input string and replace all the {".",",","!","?"} characters with spaces (or whatever separation character you want to use). Then your existing code should work.
You may want to handle some characters differently. For example, in the case of before.Crazy you would replace the '.' with a space, but for something like 'ny! ' you would remove the '!' altogether because it is already followed by a space.
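A minimal sketch of this replace-then-split idea follows. The function name length_frequencies is hypothetical, and the sketch deliberately ignores the hyphen and apostrophe rules of the actual problem. Leaving a doubled space after something like "ny! " does no harm here, since operator>> skips runs of whitespace.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: maps word length -> number of words of that length.
std::map<std::size_t, int> length_frequencies(std::string text) {
    // Turn each delimiter into a space so that operator>> splits on it.
    const std::string delims = ".,!?";
    std::replace_if(text.begin(), text.end(),
        [&delims](char c) { return delims.find(c) != std::string::npos; },
        ' ');
    std::map<std::size_t, int> counts;
    std::istringstream words{text};
    for (std::string word; words >> word; )
        ++counts[word.size()];  // std::map keeps the lengths sorted
    return counts;
}
```

For "before.Crazy eh?" this yields one word each of length 2, 5 and 6.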
How about this (using the STL, comparators and functors)?
NOTE: All assumptions and explanations are in the source code itself.
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <utility>
#include <string.h>
bool compare (const std::pair<int, int>& l, const std::pair<int, int>& r) {
return l.first < r.first;
}
//functor/unary predicate:
struct CompareFirst {
CompareFirst(int val) : val_(val) {}
bool operator()(const std::pair<int, int>& p) const {
return (val_ == p.first);
}
private:
int val_;
};
int main() {
char delims[] = ".,!?";
char noise[] ="-'";
//I'm assuming you've read the text from some file, and that information has been stored in a string. Or, the information is a string (like below):
std::string input = "This is fun-\nny, Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\nThis is fun-\nny! Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\n";
std::istringstream iss(input);
std::string temp;
//first split the string by #
while(std::getline(iss, temp, '#')) {
//find all the occurrences of the hyphens where a word crosses lines, and remove the newline:
std::string::size_type begin = 0;
while(std::string::npos != (begin = temp.find('-', begin))) {
//look at the character after the current hyphen and erase it if it's a newline
if (temp[begin+1] == '\n') {
temp.erase(begin+1, 1);
}
++begin;
}
//now erase all the `noise` characters ("'-"), as this punctuation counts as zero length
for (int i = 0; i < strlen(noise); ++i) {
//this replaces all the hyphens and apostrophes with nothing
temp.erase(std::remove(temp.begin(), temp.end(), noise[i]), temp.end());//since hyphens occur across two lines, you need to erase newlines
}//at this point, everything is dandy for complete substitution
//now try to remove any other delim characters by replacing them with spaces
for (int i = 0; i < strlen(delims); ++i) {
std::replace(temp.begin(), temp.end(), delims[i], ' ');
}
std::vector<std::pair<int, int> > occurences;
//initialize another input stringstream to make use of the whitespace
std::istringstream ss(temp);
//now use the whitespace to tokenize
while (ss >> temp) {
//try to find the token's size in the occurences
std::vector<std::pair<int, int> >::iterator it = std::find_if(occurences.begin(), occurences.end(), CompareFirst(temp.size()));
//if found, increment count by 1
if (it != occurences.end()) {
it->second += 1;//increment the count
}
//this is the first time it has been created. Store value, and a count of 1
else {
occurences.push_back(std::make_pair<int, int>(temp.size(), 1));
}
}
//now sort and output:
std::stable_sort(occurences.begin(), occurences.end(), compare);
for (int i = 0; i < occurences.size(); ++i) {
std::cout << occurences[i].first << " " << occurences[i].second << "\n";
}
std::cout << "\n";
}
return 0;
}
91 lines, and all vanilla C++98.
A rough outline of what I did is:
Since hyphens occur across two lines, find all hyphens and remove any newlines that follow them.
There are characters that don't add to the length of a word, such as the hyphen in legitimately hyphenated words and the apostrophe. Find these and erase them, as it makes tokenizing easier.
All the other remaining delimiters can now be found and replaced with whitespace. Why? Because we can use the whitespace to our advantage by using streams (whose default action is to skip whitespace).
Create a stream and tokenize the text via whitespace, as described in the previous step.
Store the lengths of the tokens and their occurrences.
Sort the lengths of the tokens, and then output the token length and corresponding occurrences.
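For comparison, here is a compact sketch of the same outline. It is a variant, not the code above: it counts with a std::map (so the result comes out ordered by word length), and the function name and the contents of `noise` and `delims` are my assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Sketch of the outline: maps word length -> number of words of that length.
// `noise` and `delims` hold assumed values, not the answer's definitions.
std::map<std::size_t, int> word_length_counts(std::string text)
{
    const std::string noise  = "'-";      // characters that count as zero length
    const std::string delims = ".,!?;:";  // characters that separate words

    // 1. Rejoin words hyphenated across a line break: drop a newline after '-'.
    for (std::string::size_type pos = text.find("-\n");
         pos != std::string::npos;
         pos = text.find("-\n", pos)) {
        text.erase(pos + 1, 1);
    }

    // 2. Erase the noise characters.
    for (std::string::size_type i = 0; i < noise.size(); ++i) {
        text.erase(std::remove(text.begin(), text.end(), noise[i]), text.end());
    }

    // 3. Turn the remaining delimiters into spaces.
    for (std::string::size_type i = 0; i < delims.size(); ++i) {
        std::replace(text.begin(), text.end(), delims[i], ' ');
    }

    // 4-5. Tokenize on whitespace and count the tokens by length.
    std::map<std::size_t, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word.size()];
    }
    return counts;
}
```

Calling word_length_counts("This is fun-\nny, Mr.P") treats "fun-\nny" as the single five-letter word "funny" and splits "Mr.P" into "Mr" and "P".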
REFERENCES:
https://stackoverflow.com/a/5815875/866930
https://stackoverflow.com/a/12008126/866930

Reading integers from a text file with words

I'm trying to read just the integers from a text file structured like this....
ALS 46000
BZK 39850
CAR 38000
//....
using ifstream.
I've considered 2 options.
1) Regex using Boost
2) Creating a throwaway string ( i.e. I read in a word, don't do anything with it, then read in the score ). However, this is a last resort.
Are there any ways to express in C++ that I want the ifstream to only read in text that is an integer? I'm reluctant to use regular expressions if it turns out that there is a much simpler way to accomplish this.
Why make simple things complicated?
What's wrong with this:
ifstream ss("C:\\test.txt");
int score;
string name;
while( ss >> name >> score )
{
// do something with score
}
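A self-contained version of that loop might look like the sketch below. The function name read_scores is mine, and it takes any std::istream so it can be tried on a string as well as a file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads "NAME SCORE" pairs from any input stream and keeps only the scores.
// read_scores is a hypothetical helper name, not something from the question.
std::vector<int> read_scores(std::istream& in)
{
    std::vector<int> scores;
    std::string name;  // read and then simply ignored
    int score;
    // The loop stops at end of input or at the first line that does not
    // match the "word number" shape.
    while (in >> name >> score) {
        scores.push_back(score);
    }
    return scores;
}
```

For a quick check, wrap the sample data in a std::istringstream and pass it in; for the real file, pass the std::ifstream instead.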
Edit:
It is in fact possible to work on streams directly with Spirit, more simply than I suggested previously, using this parser:
+(omit[+(alpha|blank)] >> int_)
and one line of code (apart from the variable definitions):
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
You get all the numbers into a vector at once with one line; it couldn't be simpler.
If you do not mind using Boost.Spirit 2, the parser to get only the number from a line is
omit[+(alpha|blank)] >> int_
and the parser to extract everything is
+(alpha|blank) >> int_
See the whole program below (tested with VC10 Beta 2):
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <cstring>
#include <vector>
#include <fstream>
#include <algorithm>
#include <iterator>
using std::cout;
using namespace boost::spirit;
using namespace boost::spirit::qi;
void extract_everything(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
std::string s;
int i;
parse(it_begin, it_end, +(alpha|blank)>>int_, s, i);
cout << "string " << s
<< "followed by nubmer " << i
<< std::endl;
}
void extract_number(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
int i;
parse(it_begin, it_end, omit[+(alpha|blank)] >> int_, i);
cout << "number only: " << i << std::endl;
}
void extract_line()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
std::string s;
int i;
// iterated file line by line
while(getline(f, s))
{
cout << "parsing " << s << " yields:\n";
extract_number(s);
extract_everything(s);
}
}
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
int main(int argc, char * argv[])
{
extract_line();
extract_file();
return 0;
}
outputs:
parsing ALS 46000 yields:
number only: 46000
string ALS followed by nubmer 46000
parsing BZK 39850 yields:
number only: 39850
string BZK followed by nubmer 39850
parsing CAR 38000 yields:
number only: 38000
string CAR followed by nubmer 38000
46000, 39850, 38000,
You can call ignore to have it skip over a specified number of characters.
istr.ignore(4);
You can also tell it to stop at a delimiter. You would still need to know the maximum number of characters the leading string could be, but this would also work for shorter leading strings:
istr.ignore(10, ' ');
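If you don't want to guess a maximum length, you can pass the largest possible count instead. A minimal sketch, assuming each name contains no spaces and is separated from its number by a space (the function name is made up for the example):

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <vector>

// Skips everything up to and including the next space, then reads the number.
// read_scores_with_ignore is a hypothetical name used only for this sketch.
std::vector<int> read_scores_with_ignore(std::istream& istr)
{
    std::vector<int> scores;
    int score;
    // numeric_limits<streamsize>::max() means "no limit on skipped characters".
    while (istr.ignore(std::numeric_limits<std::streamsize>::max(), ' ')
           && istr >> score) {
        scores.push_back(score);
    }
    return scores;
}
```

The newline left after each number is skipped by the next ignore call together with the following name, so no extra handling of line ends is needed.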
You could also write a loop that just reads characters until you see the first digit character (this needs <cctype> for isdigit):
char c;
while (istr.get(c) && !isdigit(static_cast<unsigned char>(c)))
{
// do nothing
}
if (istr && isdigit(static_cast<unsigned char>(c)))
istr.putback(c);
here goes :P
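import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
// Class name is an assumption; the original snippet omitted the class header.
public class ReadFile {
private static void readFile(String fileName) {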
try {
HashMap<String, Integer> map = new HashMap<String, Integer>();
File file = new File(fileName);
Scanner scanner = new Scanner(file).useDelimiter(";");
while (scanner.hasNext()) {
String token = scanner.next();
String[] split = token.split(":");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
} else {
split = token.split("=");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
}
}
}
scanner.close();
System.out.println("Counts:" + map);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
readFile("test.txt");
}
}
fscanf(file, "%*s %d", &num);
or %5d if the numbers have a fixed width of 5 (%d accepts leading zeros either way).
Sometimes the fastest way to do things in C++ is to use C. :)
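Here is a small sketch of that format string in a loop. To keep it runnable without a file it uses sscanf on an in-memory buffer, with %n added to find out how far each call got; with a FILE* you would call fscanf with "%*s %d" in the same loop and drop the %n bookkeeping. The function name is mine.

```cpp
#include <cstdio>
#include <vector>

// Extracts the number from each "WORD NUMBER" pair in a C string.
// scores_from_text is a hypothetical name used only for this sketch.
std::vector<int> scores_from_text(const char* text)
{
    std::vector<int> scores;
    int num = 0;
    int consumed = 0;
    // %*s reads a word and discards it, %d stores the number, and %n stores
    // how many characters were consumed so far. Only %d counts towards the
    // return value, so a successful call returns 1.
    while (std::sscanf(text, "%*s %d%n", &num, &consumed) == 1) {
        scores.push_back(num);
        text += consumed;  // continue right after the number just read
    }
    return scores;
}
```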
You can create a ctype facet that classifies letters as white space. Create a locale that uses this facet, then imbue the stream with that locale. Having that, you can extract numbers from the stream, but all letters will be treated as white space (i.e. when you extract numbers, the letters will be ignored just like a space or a tab would be).
Such a facet can look like this:
#include <iostream>
#include <locale>
#include <vector>
#include <algorithm>
#include <iterator>
struct digits_only: std::ctype<char>
{
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// clear the space flag for all ten digits '0'..'9'
if (rc['0'] == std::ctype_base::space)
std::fill_n(&rc['0'], 10, std::ctype_base::mask());
return &rc[0];
}
};
Sample code to use it could look like this:
int main() {
std::cin.imbue(std::locale(std::locale(), new digits_only()));
std::copy(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, "\n"));
}
Using your sample data, the output I get from this looks like this:
46000
39850
38000
Note that as it stands, I've written this to accept only digits. If (for example) you were reading floating point numbers, you'd also want to retain '.' (or the locale-specific equivalent) as the decimal point. One way to handle things is to start with a copy of the normal ctype table, and then just set the things you want to ignore as space.
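A rough sketch of that second approach, under a few assumptions of mine: the facet and function names are made up, the letters are the ASCII ranges a-z and A-Z, and the numbers are read as double to show that '.' survives.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: a copy of the classic "C" table in which the ASCII
// letters are reclassified as white space. Everything else, including '.',
// keeps its normal classification.
struct letters_as_space : std::ctype<char>
{
    letters_as_space() : std::ctype<char>(get_table()) {}

    static std::vector<std::ctype_base::mask> make_table()
    {
        const std::ctype_base::mask* classic = std::ctype<char>::classic_table();
        std::vector<std::ctype_base::mask> table(
            classic, classic + std::ctype<char>::table_size);
        // Assumes an ASCII-compatible character set (contiguous letters).
        for (char c = 'a'; c <= 'z'; ++c)
            table[static_cast<unsigned char>(c)] = std::ctype_base::space;
        for (char c = 'A'; c <= 'Z'; ++c)
            table[static_cast<unsigned char>(c)] = std::ctype_base::space;
        return table;
    }

    static const std::ctype_base::mask* get_table()
    {
        static const std::vector<std::ctype_base::mask> table = make_table();
        return &table[0];
    }
};

// Pulls every number out of `text`, skipping letters as if they were blanks.
std::vector<double> extract_numbers(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet, so no delete is needed.
    in.imbue(std::locale(std::locale::classic(), new letters_as_space));
    return std::vector<double>(std::istream_iterator<double>(in),
                               std::istream_iterator<double>());
}
```

With input such as "ALS 46000.5\nBZK 39850.25" the names are skipped and the two values come back with their fractional parts intact.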