Function that counts the number of integers in a text file? - c++

I need to write just a function that counts the number of integers in an already opened and good text file.
a. Assume there is a text file with a large number of integers divided by spaces
b. Write a function called analyzeFile that accepts a previously opened ifstream
file object as a parameter, and counts the number of integers in the file.
c. It does not need to do anything with the integers, but it must count exactly the
correct number of integers in the file and return that number to the calling
function.
d. It also does not need to manipulate the file operations themselves, so it does not
need to close the file or conduct any other actions other than counting the integers
and returning the number of them.
Thank you for any help on my problem!
Edit:
Here is what I have as a function so far. Is it right? I don't know:
int analizeFile (ifstream &inf, const string &fileName) {
int count = 1;
int num;
fin.open(fileName.c_str() );
fin >> num;
while (fin.good() ) {
fin>> num;
count ++;
}
return count;
}

Comments:
int analizeFile (ifstream &inf, const string &fileName) {
Since the count is always a non-negative quantity, I'd prefer to use size_t rather than int. Nit: You may want to change the name of the function to analyzeFile.
int count = 1;
Problem starts here: If your file does not have any integer then you return a wrong result.
int num;
fin.open(fileName.c_str() );
No need to call open. This would typically be called by the ifstream ctor.
fin >> num;
while (fin.good() ) {
Again, this is not required. You can extract from the stream and test in the while condition -- something which is more frequently used.
fin>> num;
count ++;
}
return count;
}

You can use a functional approach too
#include <istream>
#include <iterator>
// it was previously opened, so you don't need a filename.
int analyzeFile (std::istream &inf) {
std::istream_iterator<int> b(inf), e;
return static_cast<int>(std::distance(b, e));
}
If the iterator cannot read an integer, it will set the fail state on the stream and will compare equal to the end iterator. distance then returns the number of iteration steps it took to reach the end iterator.

Many years later, a more modern solution is possible.
Associative containers such as std::map or std::unordered_map are the usual choice for counting. This is more or less the standard approach.
Many new and powerful library features are also available.
Using those, you could come up with something like:
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <map>
#include <cctype>
using Counter = std::map<char, std::size_t>;
const std::string fileName{"test.txt"};
int main() {
// Open file and check, if it could be opened
if (std::ifstream ifs{fileName}; ifs) {
// Read all data
std::string text{std::istream_iterator<char>(ifs),{}};
// Define the counters
Counter upperCaseLettterCount{},lowerCaseLettterCount{};
// Iterate over all characters in the string and count
for (const char c : text) {
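// Cast to unsigned char: passing a negative char to isupper/islower is undefined
if (std::isupper(static_cast<unsigned char>(c))) upperCaseLettterCount[c]++;
if (std::islower(static_cast<unsigned char>(c))) lowerCaseLettterCount[c]++;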
}
// Show result
std::cout << "\nUppercase count:\n\n";
for (const auto& [letter,count] : upperCaseLettterCount) std::cout << letter << " -> " << count << '\n';
std::cout << "\nLowercase count:\n\n";
for (const auto& [letter,count] : lowerCaseLettterCount) std::cout << letter << " -> " << count << '\n';
}
else
// Error, file could not be opened
std::cerr << "\n\n*** Error: Text file '" << fileName << "' could not be opened\n\n";
}

Related

How skip reading " : " while taking input from a file in C++

Lets say I want to input the hours, minutes and seconds from the first line of a file and store them to 3 different variables, hrs, mins and sec respectively.
I can't figure out an easy way to skip reading the colon character (":").
Input file example:
12:49:00
Store:
hrs = 12
mins = 49
sec = 00
You can use std::regex to match, range-check and validate your input all at once.
#include <iostream>
#include <regex>
#include <string>
int main()
{
const std::regex time_regex("(\\d|[01]\\d|2[0-3]):([0-5]\\d):([0-5]\\d)");
std::smatch time_match;
std::string line;
while (std::getline(std::cin, line))
{
if (std::regex_match(line, time_match, time_regex))
{
int hours = std::stoi(time_match[1]);
int minutes = std::stoi(time_match[2]);
int seconds = std::stoi(time_match[3]);
std::cout << "h=" << hours << " m=" << minutes << " s=" << seconds << std::endl;
}
else
{
std::cout << "Invalid time: " << line << std::endl;
}
}
return 0;
}
Breaking down the regular expression (\\d|[01]\\d|2[0-3]):([0-5]\\d):([0-5]\\d):
\d|[01]\d|2[0-3] matches the hour (24-hour time), which is one of:
\d : 0-9
[01]\d : 00-19
2[0-3] : 20-23
[0-5]\d matches the minutes: two digits 00-59
[0-5]\d matches the seconds: two digits 00-59, as above.
An alternative not using a temporary character for skipping the colon:
#include <iostream>
int main()
{
int h,m,s;
std::cin >> h;
std::cin.ignore(1) >> m;
std::cin.ignore(1) >> s;
std::cout << h << ':' << m << ':' << s << std::endl;
return 0;
}
This seems to work:
int h, m, s;
char c;
cin >> h >> c >> m >> c >> s;
You just skip : symbol this way. I don't know whether it's a good solution.
With cin.ignore:
cin >> h;
cin.ignore(1);
cin >> m;
cin.ignore(1);
cin >> s;
There are already several good answers, and one has already been accepted; however, I would like to propose my solution not only as a valid answer to your problem but also with regard to good design practice. IMHO, when it involves reading information from a file and storing its contents in variables or data structures, I prefer to do it in a specific way. I like to separate the functionality and responsibility of specific operations into their own functions:
1: I first like to have a function to open a file, read the contents and to store the information into either a string, a stream or some large buffer. Once the appropriate amount of information is read from the file, then the function will close the file handle as we are done with it and then return back the results. There are several ways to do this yet they are all similar.
a: Read a single line from the file and return back a string or a stream.
b: Read in all information from the file line by line and store each line into its own string or stream and return back a vector of those strings or streams.
c: Read in all of the contents of the file into a single string, stream or large buffer and return that back.
2: After I have the contents of that file then I will typically call a function that will parse that data and these functions will vary depending on the type of content that needs to be parsed based on the data structures that will be used. Also, these parsing functions will call a function that will split the string into a vector of strings called tokens. After the split string function is called then the parsing of data will use the string manipulators-converters to convert a string to the required built in types that are needed for the current data structure that is in use and store them into the data structure that is passed in by reference.
3: There are two variations of my splitString function.
a: One takes a single character as a delimiter.
b: The other will take a string as its delimiter.
c: Both functions will return a vector of strings, based on the delimiter used.
Here is an example of my code using this text file for input.
time.txt
4:32:52
main.cpp
#include <vector>
#include <string>
#include <sstream>
#include <fstream>
#include <iostream>
#include <stdexcept> // std::runtime_error
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE
struct Time {
int hours;
int minutes;
int seconds;
};
std::vector<std::string> splitString( const std::string& s, char delimiter ) {
std::vector<std::string> tokens;
std::string token;
std::istringstream tokenStream( s );
while( std::getline( tokenStream, token, delimiter ) ) {
tokens.push_back( token );
}
return tokens;
}
std::string getLineFromFile( const char* filename ) {
std::ifstream file( filename );
if( !file ) {
std::stringstream stream;
stream << "failed to open file " << filename << '\n';
throw std::runtime_error( stream.str() );
}
std::string line;
std::getline( file, line );
file.close();
return line;
}
void parseLine( const std::string& fileContents, Time& time ) {
std::vector<std::string> output = splitString( fileContents, ':' );
// This is where you would want to do your sanity check to make sure
// that the contents from the file are valid inputs before converting
// them to the appropriate types and storing them into your data structure.
time.hours = std::stoi( output.at( 0 ) );
time.minutes = std::stoi( output.at( 1 ) );
time.seconds = std::stoi( output.at( 2 ) );
}
int main() {
try {
Time t;
std::string line = getLineFromFile( "time.txt" );
parseLine( line, t );
std::cout << "Hours: " << t.hours << '\n'
<< "Minutes: " << t.minutes << '\n'
<< "Seconds: " << t.seconds << "\n\n";
} catch( std::runtime_error& e ) {
std::cerr << e.what() << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Output:
Hours: 4
Minutes: 32
Seconds: 52
Now, as you can see, in this particular situation the functions used here are designed only to read a single line from the file, namely the very first line. I have other functions in my library, not shown here, that will read each line of a file until there are no more lines to read, or read all of the file into a single buffer. I have another version of split string that takes a string as its delimiter instead of a single character. Finally, each parsing function will end up being unique, because it relies on the data structure that you are trying to fill.
This keeps the code readable, as each function does what it is supposed to do and nothing more. I prefer this design over trying to get information from a file and parse it while the file is still open. Too many things can go wrong while the file is open, and if the data is read wrong or is corrupted in a way that raises no error, then your variables or data structures may contain invalid information without you being aware of it. At least in this way you can open the file, get what you need from it, store it into a string or a vector of strings, close the file when done reading and return the contents. Then it becomes the parsing function's responsibility to test the data after it has been tokenized. In the parsing function shown above I did not do any sanity check, to keep things simple, but that is where you would test your data to see if the information is valid before returning your populated data structure.
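As an illustration of the sanity check mentioned above, here is a minimal sketch of a parsing function that validates the tokens before storing them. The names toIntInRange and parseTimeChecked are hypothetical, and the accepted ranges (0-23, 0-59, 0-59) are an assumption about what counts as a valid time.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Time { int hours; int minutes; int seconds; };

// Hypothetical helper: converts one token to int and rejects non-numeric
// text, trailing junk, and values outside [lo, hi].
int toIntInRange(const std::string& token, int lo, int hi) {
    int value = 0;
    std::size_t used = 0;
    try {
        value = std::stoi(token, &used);
    } catch (const std::logic_error&) {  // invalid_argument or out_of_range
        throw std::runtime_error("not a number: '" + token + "'");
    }
    if (used != token.size() || value < lo || value > hi) {
        throw std::runtime_error("invalid time field: '" + token + "'");
    }
    return value;
}

// Splits on ':' and validates the field count and each field's range.
Time parseTimeChecked(const std::string& line) {
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream stream(line);
    while (std::getline(stream, token, ':')) {
        tokens.push_back(token);
    }
    if (tokens.size() != 3) {
        throw std::runtime_error("expected hh:mm:ss, got '" + line + "'");
    }
    Time t;
    t.hours = toIntInRange(tokens[0], 0, 23);
    t.minutes = toIntInRange(tokens[1], 0, 59);
    t.seconds = toIntInRange(tokens[2], 0, 59);
    return t;
}
```

All failures surface as std::runtime_error, so a catch block like the one in main above would handle them without changes.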
If you are interested in another version of this where there are multiple lines being read in from the file, just comment a request and I will append it to this answer.

Unknown number of strings (in letters) in the input

I want to write a program in which the names of n different chemical elements are read in the same line in the input (where 1 ≤ n ≤ 17 and n is also read in the input) (the names are made apart by a space). The names of the chemical elements should be stored in different strings for further uses.
As n is unknown, I don't know how to make something like an "array of strings". Of course I should not make 17 different strings st1,st2,st3,... :D.
Can you please help me? Any help will be highly appreciated.
Thank you in advance.
It sounds like you want to read in a line and split it with spaces. Try something such as this:
#include <iostream>
#include <string>
#include <sstream>
#include <vector>
int main()
{
std::string input;
std::getline(std::cin, input); // takes one line, stops when enter is pressed
std::stringstream ss(input); // makes a stream using the string
std::vector<std::string> strings;
while (ss >> input) { // while there's data left in the stream, store it in a new string and add it to the vector of strings
strings.push_back(input);
}
for (std::string s : strings) {
std::cout << "string: " << s << std::endl;
}
}
You give input such as H He Li, terminated by hitting enter, and the strings are stored in strings (printed in last loop for demonstration).
Edit:
I now see that you want to read the number n in the input, too. In this case, you don't need the stringstream solution. You can do this instead:
int main()
{
int amount;
std::cin >> amount; // read in the amount
std::vector<std::string> strings;
for (int i = 0; i < amount; i++) {
std::string s;
std::cin >> s; // read in the nth string
strings.push_back(s); // add it to the vector
}
for (std::string s : strings) {
std::cout << "string: " << s << std::endl;
}
}
And pass input such as 3 H He Li.

.back and sort function

I am a complete beginner to c++ and for an assignment I have been asked to sort a text file of superheroes and output them to another text file e.g.
Unsorted.txt
Deadpool_8
Phoenix_9
Toad_4
Jubilee_3
alphabetically and numerically.
I have attempted to use .back on each line for the numerical sorting, but it is not accepted and returns an error (which I have included in the code). Also, while writing to the console with cout works in full, attempting to write to a text file results in only the last line, e.g.
Toad_4
(all my mistakes have been commented out and it is currently only sorting alphabetically)
I'm paranoid about asking the same question as someone else, but I have not been able to find anything which solves my problem.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <fstream>
#include <limits>
using namespace std;
// Empty vector holding names from file
vector<string> names;
string word;
string number;
string filename;
string sortChoice;
string lastChar;
bool alphaSortFinished = false; //bool added to prevent unnecessary looping
bool sortFinished = false;
void sortNumerically()
{
//word = word.back; returns this error
/*Error 1 error C3867: 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>::back': function call missing argument list; use
'&std::basic_string<char,std::char_traits<char>,std::allocator<char>>::back' to create a pointer to member
d:\visual studio 2013\assessment one mdu118\strings, classes assessment one\source.cpp 20 1 Strings, Classes Assessment One*/
cout << "Please specify the file you would like to open\n" << endl;
cin >> filename;
// Read names from specified file
ifstream inFile(filename);
while (!inFile && sortFinished == false)
{
cout << "Unable to open file\n";
inFile.close();
sortNumerically();
}
while (getline(inFile, word)) //get lines of the string, store them in string word;
{
names.push_back(word);
}
sort(names.begin(), names.end());
// Loop to print names
for (size_t i = 0; i < names.size(); i++)
{
//ofstream writeToFile;
//writeToFile.open("NumericalSort.txt");
//writeToFile << names[i] << '\n';
//writeToFile.close();
cout << names[i] << '\n';
}
sortFinished = true;
inFile.close();
}
sorry there's probably a lot of crap that refers to other functions I didn't include. Am I taking the wrong route?
Thank you in advance
To populate vector of strings
vector<string> names;
ifstream inFile(filename);
copy(istream_iterator<string>(inFile), istream_iterator<string>(), back_inserter(names));
To sort lexicographically
sort(names.begin(), names.end());
To sort numerically
sort(names.begin(), names.end(), cmpr());
cmpr is a custom comparator defined to compare numerical part of strings.
To get the numerical part of string, use
int num = stoi(s.substr(s.find_last_of('_') + 1));
An example in C++11
sort(names.begin(), names.end(), [](const string & a, const string & b) {
int ia = stoi(a.substr(a.find_last_of('_') + 1));
int ib = stoi(b.substr(b.find_last_of('_') + 1));
return ia < ib;
});
See http://ideone.com/1Wcnvq demo
Read the names from the file to std::vector<string>.
sort(names.begin(), names.end()).
Now,
sort(names.begin(), names.end(), comparator)
The comparator should take 2 strings as arguments and compare the number part of the strings.

Iterating over a text file using Fortran like format in C++

I am making an application that deals with txt file data.
The idea is that txt files may come in different formats, and it should be read into C++.
One example might be 3I2, 3X, I3, which should be read as: "first we have 3 integers of length 2, then we have 3 empty spots, then we have 1 integer of length 3".
Is it best to iterate over the file, yielding lines, and then iterate over the lines as strings? What would be an effective approach for iterating smartly while leaving out the 3 spots to be ignored?
E.g.
101112---100
102113---101
103114---102
to:
10, 11, 12, 100
10, 21, 13, 101
10, 31, 14, 102
The link given by Kyle Kanos is a good one; *scanf/*printf format strings map pretty well onto fortran format strings. It's actually easier to do this using C-style IO, but using C++ style streams is doable as well:
#include <cstdio>
#include <iostream>
#include <fstream>
#include <string>
int main() {
std::ifstream fortranfile;
fortranfile.open("input.txt");
if (fortranfile.is_open()) {
std::string line;
// Test the read itself, so a last line without a trailing newline is not lost
while (std::getline(fortranfile, line)) {
char dummy[4];
int i1, i2, i3, i4;
// sscanf returns the number of fields converted; skip malformed lines
if (sscanf(line.c_str(), "%2d%2d%2d%3s%3d", &i1, &i2, &i3, dummy, &i4) != 5) continue;
std::cout << "Line: '" << line << "' -> " << i1 << " " << i2 << " "
<< i3 << " " << i4 << std::endl;
}
}
fortranfile.close();
return 0;
}
Running gives
$ g++ -o readinput readinput.cc
$ ./readinput
Line: '101112---100' -> 10 11 12 100
Line: '102113---101' -> 10 21 13 101
Line: '103114---102' -> 10 31 14 102
Here the format string we're using is %2d%2d%2d%3s%3d - 3 copies of %2d (decimal integer of width 2) followed by %3s (string of width 3, which we read into a variable we never use) followed by %3d (decimal integer of width 3).
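For comparison, the same fixed-width layout can be read with plain string operations. This is only a sketch with the 3I2, 3X, I3 layout hard-coded, and the function name parseRecord is my own. One practical difference: %3s skips leading whitespace, so the sscanf version would not behave the same if the three ignored columns were blanks instead of dashes, while slicing by position does not care what the skipped columns contain.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits one record laid out as 3I2, 3X, I3: three 2-character integers,
// three skipped columns, one 3-character integer.
// parseRecord is a hypothetical name and the layout is hard-coded.
// A line that is too short or has a non-numeric field makes substr or
// std::stoi throw.
std::vector<int> parseRecord(const std::string& line) {
    std::vector<int> values;
    std::size_t pos = 0;
    for (int i = 0; i < 3; ++i, pos += 2) {
        values.push_back(std::stoi(line.substr(pos, 2)));  // I2
    }
    pos += 3;                                              // 3X: skip
    values.push_back(std::stoi(line.substr(pos, 3)));      // I3
    return values;
}
```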
Given that you wish to dynamically parse Fortran Format specifier flags, you should note that: you've immediately walked into the realm of parsers.
In addition to the other methods of parsing such input that others have noted here:
By using Fortran and C/C++ bindings to do the parsing for you.
Using pure C++ to parse it for you by writing a parser using a combination of:
sscanf
streams
My proposal is that if boost is available to you, you can use it to implement a simple parser for on-the-fly operations, using a combination of Regexes and STL containers.
From what you've described, and what is shown in different places, you can construct a naive implementation of the grammar you wish to support, using regex captures:
(\\d{0,8})([[:alpha:]])(\\d{0,8})
Where the first group is the number of that variable type.
The second is the type of the variable.
and the third is the length of variable type.
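To illustrate just the three capture groups, here is a small sketch that uses std::regex from the standard library rather than Boost. That substitution is mine, as are the names Descriptor and parseFormat. Like the Boost code in this answer, it treats a missing number as 1.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One edit descriptor, e.g. "3I2" -> count 3, type 'I', width 2.
struct Descriptor {
    std::size_t count;
    char type;
    std::size_t width;
};

// Parses a comma-separated format such as "3I2, 3X, I3".
// parseFormat is a hypothetical name. A missing count or width defaults
// to 1, so "3X" means three repetitions of a 1-character skip.
std::vector<Descriptor> parseFormat(const std::string& fmt) {
    static const std::regex re("\\s*(\\d{0,8})([[:alpha:]])(\\d{0,8})\\s*");
    std::vector<Descriptor> out;
    std::istringstream tokens(fmt);
    std::string token;
    while (std::getline(tokens, token, ',')) {
        std::smatch m;
        if (!std::regex_match(token, m, re)) {
            throw std::invalid_argument("bad descriptor: " + token);
        }
        Descriptor d;
        d.count = m[1].length() > 0 ? std::stoul(m[1].str()) : 1;
        d.type = m[2].str()[0];
        d.width = m[3].length() > 0 ? std::stoul(m[3].str()) : 1;
        out.push_back(d);
    }
    return out;
}
```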
Using this reference for the Fortran Format Specifier Flags, you can implement a naive solution as shown below:
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <cstdlib>
#include <boost/regex.hpp>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
//A POD Data Structure used for storing Fortran Format Tokens into their relative forms
typedef struct FortranFormatSpecifier {
char type;//the type of the variable
size_t number;//the number of times the variable is repeated
size_t length;//the length of the variable type
} FFlag;
//This class implements a rudimentary parser to parse Fortran Format
//Specifier Flags using Boost regexes.
class FormatParser {
public:
//typedefs for further use with the class and class methods
typedef boost::tokenizer<boost::char_separator<char> > bst_tokenizer;
typedef std::vector<std::vector<std::string> > vvstr;
typedef std::vector<std::string> vstr;
typedef std::vector<std::vector<int> > vvint;
typedef std::vector<int> vint;
FormatParser();
FormatParser(const std::string& fmt, const std::string& fname);
void parse();
void printIntData();
void printCharData();
private:
bool validateFmtString();
size_t determineOccurence(const std::string& numStr);
FFlag setFortranFmtArgs(const boost::smatch& matches);
void parseAndStore(const std::string& line);
void storeData();
std::string mFmtStr; //this holds the format string
std::string mFilename; //the name of the file
FFlag mFmt; //a temporary FFlag variable
std::vector<FFlag> mFortranVars; //this holds all the flags and details of them
std::vector<std::string> mRawData; //this holds the raw tokens
//this is where you will hold all the types of data you wish to support
vvint mIntData; //this holds all the int data
vvstr mCharData; //this holds all the character data (stored as strings for convenience)
};
FormatParser::FormatParser() : mFmtStr(), mFilename(), mFmt(), mFortranVars(), mRawData(), mIntData(), mCharData() {}
FormatParser::FormatParser(const std::string& fmt, const std::string& fname) : mFmtStr(fmt), mFilename(fname), mFmt(), mFortranVars(), mRawData(), mIntData(), mCharData() {}
//this function determines the number of times that a variable occurs
//by parsing a numeric string and returning the associated output
//based on the grammar
size_t FormatParser::determineOccurence(const std::string& numStr) {
size_t num = 0;
//this case means that no number was supplied in front of the type
if (numStr.empty()) {
num = 1;//hence, the default is 1
}
else {
//attempt to parse the numeric string and find its equivalent
//integer value (since all occurrences are whole numbers)
size_t n = atoi(numStr.c_str());
//this case covers if the numeric string is explicitly 0
//hence, logically, it doesn't occur, set the value accordingly
if (n == 0) {
num = 0;
}
else {
//set the value to its converted representation
num = n;
}
}
return num;
}
//from the boost::smatches, determine the set flags, store them
//and return it
FFlag FormatParser::setFortranFmtArgs(const boost::smatch& matches) {
FFlag ffs = {0};
std::string fmt_number, fmt_type, fmt_length;
fmt_number = matches[1];
fmt_type = matches[2];
fmt_length = matches[3];
ffs.type = fmt_type.c_str()[0];
ffs.number = determineOccurence(fmt_number);
ffs.length = determineOccurence(fmt_length);
return ffs;
}
//since the format string is CSV, split the string into tokens
//and then, validate the tokens by attempting to match them
//to the grammar (implemented as a simple regex). If the number of
//validations match, everything went well: return true. Otherwise:
//return false.
bool FormatParser::validateFmtString() {
boost::char_separator<char> sep(",");
bst_tokenizer tokens(mFmtStr, sep);
mFmt = FFlag();
size_t n_tokens = 0;
std::string token;
for(bst_tokenizer::const_iterator it = tokens.begin(); it != tokens.end(); ++it) {
token = *it;
boost::trim(token);
//this "grammar" is based on the Fortran Format Flag Specification
std::string rgx = "(\\d{0,8})([[:alpha:]])(\\d{0,8})";
boost::regex re(rgx);
boost::smatch matches;
if (boost::regex_match(token, matches, re, boost::match_extra)) {
mFmt = setFortranFmtArgs(matches);
mFortranVars.push_back(mFmt);
}
++n_tokens;
}
return mFortranVars.size() != n_tokens ? false : true;
}
//Now, parse each input line from a file and try to parse and store
//those variables into their associated containers.
void FormatParser::parseAndStore(const std::string& line) {
int offset = 0;
int integer = 0;
std::string varData;
std::vector<int> intData;
std::vector<std::string> charData;
offset = 0;
for (std::vector<FFlag>::const_iterator begin = mFortranVars.begin(); begin != mFortranVars.end(); ++begin) {
mFmt = *begin;
for (size_t i = 0; i < mFmt.number; offset += mFmt.length, ++i) {
varData = line.substr(offset, mFmt.length);
//now store the data, based on type:
switch(mFmt.type) {
case 'X':
break;
case 'A':
charData.push_back(varData);
break;
case 'I':
integer = atoi(varData.c_str());
intData.push_back(integer);
break;
default:
std::cerr << "Invalid type!\n";
}
}
}
mIntData.push_back(intData);
mCharData.push_back(charData);
}
//Open the input file, and attempt to parse the input file line-by-line.
void FormatParser::storeData() {
mFmt = FFlag();
std::ifstream ifile(mFilename.c_str(), std::ios::in);
std::string line;
if (ifile.is_open()) {
while(std::getline(ifile, line)) {
parseAndStore(line);
}
}
else {
std::cerr << "Error opening input file!\n";
exit(3);
}
}
//If character flags are set, this function will print the character data
//found, line-by-line
void FormatParser::printCharData() {
vvstr::const_iterator it = mCharData.begin();
vstr::const_iterator jt;
size_t linenum = 1;
std::cout << "\nCHARACTER DATA:\n";
for (; it != mCharData.end(); ++it) {
std::cout << "LINE " << linenum << " : ";
for (jt = it->begin(); jt != it->end(); ++jt) {
std::cout << *jt << " ";
}
++linenum;
std::cout << "\n";
}
}
//If integer flags are set, this function will print all the integer data
//found, line-by-line
void FormatParser::printIntData() {
vvint::const_iterator it = mIntData.begin();
vint::const_iterator jt;
size_t linenum = 1;
std::cout << "\nINT DATA:\n";
for (; it != mIntData.end(); ++it) {
std::cout << "LINE " << linenum << " : ";
for (jt = it->begin(); jt != it->end(); ++jt) {
std::cout << *jt << " ";
}
++linenum;
std::cout << "\n";
}
}
//Attempt to parse the input file, by first validating the format string
//and then, storing the data accordingly
void FormatParser::parse() {
if (!validateFmtString()) {
std::cerr << "Error parsing the input format string!\n";
exit(2);
}
else {
storeData();
}
}
int main(int argc, char **argv) {
if (argc < 3 || argc > 3) {
std::cerr << "Usage: " << argv[0] << "\t<Fortran Format Specifier(s)>\t<Filename>\n";
exit(1);
}
else {
//parse and print stuff here
FormatParser parser(argv[1], argv[2]);
parser.parse();
//print the data parsed (if any)
parser.printIntData();
parser.printCharData();
}
return 0;
}
This is standard c++98 code and can be compiled as follows:
g++ -Wall -std=c++98 -pedantic fortran_format_parser.cpp -lboost_regex
BONUS
This rudimentary parser also works on Characters too (Fortran Format Flag 'A', for up to 8 characters). You can extend this to support whatever flags you may like by editing the regex and performing checks on the length of captured strings in tandem with the type.
POSSIBLE IMPROVEMENTS
If C++11 is available to you, you can use lambdas in some places and substitute auto for the iterators.
If this is running in a limited memory space and you have to parse a large file, a vector can run into trouble, because it stores its elements in one contiguous block and reallocates as it grows. It may be better to use deques instead. For more on that, see the discussion here:
http://www.gotw.ca/gotw/054.htm
And, if the input file is large and file I/O is a bottleneck, you can improve performance by modifying the size of the ifstream buffer:
How to get IOStream to perform better?
DISCUSSION
What you will notice is that: the types that you're parsing must be known at runtime, and any associated storage containers must be supported in the class declaration and definition.
As you would imagine, supporting all types in one main class isn't efficient. However, as this is a naive solution, an improved full solution can be specialized to support these cases.
Another suggestion is to use Boost::Spirit. But, as Spirit uses a lot of templates, debugging such an application is not for the faint of heart when errors can and do occur.
PERFORMANCE
Compared to @Jonathan Dursi's solution, this solution is slow:
For 10,000,000 lines of randomly generated output (a 124MiB file) using this same line format ("3I2, 3X, I3"):
#include <cstdio> // printf
#include <fstream>
#include <cstdlib>
#include <ctime>
using namespace std;
int main(int argc, char **argv) {
srand(time(NULL));
if (argc < 2 || argc > 2) {
printf("Invalid usage! Use as follows:\t<Program>\t<Output Filename>\n");
exit(1);
}
ofstream ofile(argv[1], ios::out);
if (ofile.is_open()) {
for (int i = 0; i < 10000000; ++i) {
ofile << (rand() % (99-10+1) + 10) << (rand() % (99-10+1) + 10) << (rand() % (99-10+1)+10) << "---" << (rand() % (999-100+1) + 100) << endl;
}
}
ofile.close();
return 0;
}
My solution:
0m13.082s
0m13.107s
0m12.793s
0m12.851s
0m12.801s
0m12.968s
0m12.952s
0m12.886s
0m13.138s
0m12.882s
Clocks an average walltime of 12.946s
Jonathan Dursi's solution:
0m4.698s
0m4.650s
0m4.690s
0m4.675s
0m4.682s
0m4.681s
0m4.698s
0m4.675s
0m4.695s
0m4.696s
Blazes with average walltime of 4.684s
His is roughly 2.8 times as fast as mine (4.684s against 12.946s on average), with both compiled at O2.
However, since you don't have to modify the source code every time you want to parse a different format, this solution is more flexible.
Note: you can implement a solution that involves sscanf / streams that only requires you to know what type of variable you wish to read (much like mine), but the additional checks such as verifying the type(s) bloats development time. (This is why I offer my solution in Boost, because of the convenience of tokenizers and regexes - which makes the development process easier).
REFERENCES
http://www.boost.org/doc/libs/1_34_1/libs/regex/doc/character_class_names.html
You could translate 3I2, 3X, I3 in a scanf format.
Given that Fortran is easily callable from C, you could write a little Fortran function to do this "natively." The Fortran READ function takes a format string as you describe, after all.
If you want this to work, you'll need to brush up on Fortran just a tiny bit (http://docs.oracle.com/cd/E19957-01/806-3593/2_io.html), plus learn how to link Fortran and C++ using your compiler. Here are a few tips:
The Fortran symbols may be implicitly suffixed with underscore, so MYFUNC may be called from C as myfunc_().
Multi-dimensional arrays have the opposite ordering of dimensions.
Declaring a Fortran (or C) function in a C++ header requires placing it in an extern "C" {} scope.
If your user is actually supposed to enter it in the Fortran format, or if you very quickly adapt or write Fortran code to do this, I would do as John Zwinck and M.S.B. suggest. Just write a short Fortran routine to read the data into an array, and use "bind(c)" and the ISO_C_BINDING types to set up the interface. And remember that the array indexing is going to change between Fortran and C++.
Otherwise, I would recommend using scanf, as mentioned above:
http://en.cppreference.com/w/cpp/io/c/fscanf
If you don't know the number of items per line you need to read, you might be able to use vscanf instead:
http://en.cppreference.com/w/cpp/io/c/vfscanf
However, although it looks convenient, I've never used this, so YMMV.
Thought about this some today but had no time to write an example. @jrd1's example and analysis are on track, but I'd try to make the parsing more modular and object oriented. The format string parser could build a list of item parsers that then work more or less independently, which allows adding new ones, like floating point, without changing old code. I think a particularly nice interface would be an iomanip initialized with a format string, so that the UI would be something like
cin >> f77format("3I2, 3X, I3") >> a >> b >> c >> d;
On implementation I'd have f77format parse the bits and build the parser by components, so it would create 3 fixed width int parsers, a devNull parser and another fixed width parser that would then consume the input.
Of course if you want to support all of the edit descriptors, it would be a big job! And in general it wouldn't just be passing the rest of the string on to the next parser since there are edit descriptors that require re-reading the line.

Counting occurrences of letter in a file

I'm trying to count the number of times each letter appears in a file. When I run the code below it counts "Z" twice. Can anyone explain why?
The test data is:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
#include <iostream> //Required if your program does any I/O
#include <iomanip> //Required for output formatting
#include <fstream> //Required for file I/O
#include <string> //Required if your program uses C++ strings
#include <cmath> //Required for complex math functions
#include <cctype> //Required for letter case conversion
using namespace std; //Required for ANSI C++ 1998 standard.
int main ()
{
string reply;
string inputFileName;
ifstream inputFile;
char character;
int letterCount[127] = {};
cout << "Input file name: ";
getline(cin, inputFileName);
// Open the input file.
inputFile.open(inputFileName.c_str()); // Need .c_str() to convert a C++ string to a C-style string
// Check the file opened successfully.
if ( ! inputFile.is_open())
{
cout << "Unable to open input file." << endl;
cout << "Press enter to continue...";
getline(cin, reply);
exit(1);
}
while ( inputFile.peek() != EOF )
{
inputFile >> character;
//toupper(character);
letterCount[static_cast<int>(character)]++;
}
for (int iteration = 0; iteration <= 127; iteration++)
{
if ( letterCount[iteration] > 0 )
{
cout << static_cast<char>(iteration) << " " << letterCount[iteration] << endl;
}
}
system("pause");
exit(0);
}
As others have pointed out, you have two Qs in the input. The reason you have two Zs is that the last
inputFile >> character;
(probably when there's just a newline character left in the stream, hence not EOF) fails to extract anything, so 'character' still holds the 'Z' from the previous iteration and it gets counted again. Try inspecting inputFile.fail() afterwards to see this:
while (inputFile.peek() != EOF)
{
inputFile >> character;
if (!inputFile.fail())
{
letterCount[static_cast<int>(character)]++;
}
}
The idiomatic way to write the loop, which also fixes your 'Z' problem, is:
while (inputFile >> character)
{
letterCount[static_cast<int>(character)]++;
}
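To see the difference between the two loop shapes in isolation, here is a small sketch that runs both against any std::istream, so it can be tried on an in-memory std::istringstream instead of a file. The function names countWithPeek and countWithExtraction are made up for this illustration.

```cpp
#include <cstdio>    // EOF
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Loop shape from the question: test peek() first, then extract unconditionally.
std::map<char, int> countWithPeek(std::istream& in)
{
    std::map<char, int> counts;
    char character = 0;
    while (in.peek() != EOF)
    {
        in >> character;      // fails if only whitespace is left before EOF
        ++counts[character];  // ...in which case the stale value is counted again
    }
    return counts;
}

// Idiomatic shape: the extraction itself is the loop condition.
std::map<char, int> countWithExtraction(std::istream& in)
{
    std::map<char, int> counts;
    char character = 0;
    while (in >> character)
    {
        ++counts[character];
    }
    return counts;
}
```

For the input "XYZ\n", countWithPeek reports two Zs, because the trailing newline makes peek() succeed while the extraction that follows fails. countWithExtraction reports one. Without the trailing newline both versions agree, which is why the bug depends on how the file ends.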
There are two Q's in your uppercase string. I believe the reason you get two counts for Z is that you should check for EOF after reading the character, not before, but I am not sure about that.
Well, others already have pointed out the error in your code.
But here is one elegant way you can read the file and count the letters in it:
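#include <algorithm> // std::fill, std::for_each
#include <cctype>    // std::tolower
#include <fstream>
#include <iostream>
#include <iterator>  // std::istream_iterator
#include <locale>
#include <map>
#include <vector>
struct letter_only: std::ctype<char>
{
    letter_only(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        // Start by classifying every character as whitespace, so operator>> skips it...
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size,std::ctype_base::space);
        // ...then mark only the letters. Two ranges are needed, because filling 'A'..'z'
        // in one go would also let the punctuation between 'Z' and 'a' through.
        std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha);
        return &rc[0];
    }
};
struct Counter
{
    std::map<char, int> letterCount;
    void operator()(char item)
    {
        // The stream only delivers letters, so no further filtering is needed here.
        ++letterCount[static_cast<char>(std::tolower(static_cast<unsigned char>(item)))]; //remove tolower if you want case-sensitive solution!
    }
    operator std::map<char, int>() { return letterCount ; }
};
int main()
{
    std::ifstream input;
    input.imbue(std::locale(std::locale(), new letter_only())); //enable reading letters only!
    input.open("filename.txt");
    std::istream_iterator<char> start(input);
    std::istream_iterator<char> end;
    std::map<char, int> letterCount = std::for_each(start, end, Counter());
    for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
    {
        std::cout << it->first <<" : "<< it->second << std::endl;
    }
}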
This is a modified (untested) version of this solution:
Elegant ways to count the frequency of words in a file
For one thing, you do have two Q's in the input.
Regarding Z, @Jeremiah is probably right in that it is doubly counted due to it being the last character, and your code not detecting EOF properly. This can be easily verified by e.g. changing the order of input characters.
As a side note, here
for (int iteration = 0; iteration <= 127; iteration++)
your index goes out of bounds; either the loop condition should be iteration < 127, or your array declared as int letterCount[128].
Given that you apparently only want to count English letters, it seems like you should be able to simplify your code considerably:
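#include <cctype>
#include <fstream>
#include <iostream>
int main(int argc, char **argv) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <file>\n";
        return 1;
    }
    std::ifstream infile(argv[1]);
    char ch;
    static int counts[26];
    while (infile >> ch)
        if (isalpha(static_cast<unsigned char>(ch))) // the cast avoids undefined behaviour for negative char values
            ++counts[tolower(static_cast<unsigned char>(ch))-'a'];
    for (int i=0; i<26; i++)
        std::cout << static_cast<char>('A' + i) << ": " << counts[i] <<"\n"; // without the cast, 'A' + i prints as a number
    return 0;
}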
Of course, there are quite a few more possibilities. Compared to @Nawaz's code (for example), this is obviously quite a bit shorter and simpler -- but it's also more limited (e.g., as it stands, it only works with un-accented English characters). It's pretty much restricted to the basic ASCII letters -- EBCDIC encoding, ISO 8859-x, or Unicode will break it completely.
His also makes it easy to apply the "letters only" filtration to any file. Choosing between them depends on whether you want/need/can use that flexibility or not. If you only care about the letters mentioned in the question, and only on typical machines that use some superset of ASCII, this code will handle the job more easily -- but if you need more than that, it's not suitable at all.