For a school assignment I need to check whether a string entered by the user is stored in a pre-defined word array.
I want to implement a function to perform the check, that may look like this:
bool exists(dict words, char check) { /* how to implement this? */ }
But I have no clue whether this will work or how to implement it. Can anyone help?
Here's my code:
#include <iostream>
#include <string>
using namespace std;
struct dict {
string word;
};
int main() {
dict words[5];
words[0].word = 'abc';
words[1].word = 'bcd';
words[2].word = 'cde';
words[3].word = 'def';
words[4].word = 'efg';
char user_input[100];
cin.getline(user_input, 100);
if (...) { // how do I check if the user input is in my word array?
cout << "found\n";
}
else {
cout << "not found\n";
}
}
First of all, dict is a structure and char is a type that holds a single character, so you would rather need:
bool exists(const dict* words, const string& check);
From this point, I would say that:
const dict* should be changed to const vector<dict>&.
std::getline is able to read input directly into string, so no plain char array is needed.
But since it's a school assignment, I suppose you have some limitations (and can use neither std::vector nor std::find, which would do the job). So:
bool exists(const dict* words, size_t count, const std::string& check)
{
for(size_t n = 0; words && (n < count); ++n)
{
if(words[n].word == check)
return true;
}
return false;
}
Example:
dict langs[3];
langs[0].word = "C++";
langs[1].word = "Java";
langs[2].word = "Python";
std::string s_1 = "Java";
std::string s_2 = "C++ 11";
printf("exists(%s) : %s\n", s_1.c_str(), exists(langs, 3, s_1) ? "yes" : "no");
printf("exists(%s) : %s\n", s_2.c_str(), exists(langs, 3, s_2) ? "yes" : "no");
Output:
exists(Java) : yes
exists(C++ 11) : no
Link to sample code.
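If the assignment does allow the standard containers and algorithms, the hand-written loop collapses to a single call. This is only a minimal sketch, assuming C++11 and the dict struct from the question. Because each word is wrapped in a struct, std::any_of with a predicate stands in for a plain std::find here:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Same struct as in the question.
struct dict {
    std::string word;
};

// Returns true when some entry's `word` member equals `check`.
bool exists(const std::vector<dict>& words, const std::string& check)
{
    return std::any_of(words.begin(), words.end(),
                       [&check](const dict& d) { return d.word == check; });
}
```

With a vector holding "C++", "Java" and "Python", exists(langs, "Java") is true and exists(langs, "C++ 11") is false, matching the output above.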
As the other answer has already pointed out, you should add a size parameter to the function signature in order to be able to iterate the array (especially to know when to stop iteration.). Then a simple loop with a comparison will do the trick.
Note that you shouldn't normally need to use raw arrays in C++, but rather one of the containers from the standard library, e.g., std::vector. Also, you should use std::string and std::getline() for your user input, and you should fix your string literals (use double quotes "..." instead of single quotes '...'). Further, you should avoid using namespace std; carelessly. Have a look at the links at the end of this post for some further reading on these points.
Example code:
#include <iostream>
#include <string>
#include <vector>
bool exists(std::string const & user_input,
std::vector<std::string> const & words)
{
for (std::size_t i = 0; i < words.size(); i++)
if (user_input == words[i])
return true;
return false;
}
int main() {
std::vector<std::string> words(5);
words[0] = "abc";
words[1] = "bcd";
words[2] = "cde";
words[3] = "def";
words[4] = "efg";
std::string user_input;
std::getline(std::cin, user_input);
if (exists(user_input, words))
std::cout << "found\n";
else
std::cout << "not found\n";
}
Example output:
$ g++ test.cc && echo "abc" | ./a.out
found
The following might be beyond the scope of your school assignment, but maybe this will be helpful for future visitors to this question.
Note that an array (which is what std::vector is) is not the most efficient data structure for this sort of task, as you may have to iterate over the entire array and check every single item (linear complexity).
The C++ standard library also provides the container types std::set and std::unordered_set (the latter since C++11). Here the search space is organized in a special way (binary search tree: logarithmic complexity, hash table: constant complexity on average) to improve lookup time of the key type (std::string in this case).
Here's an example:
#include <iostream>
#include <string>
#include <set>
typedef std::set<std::string> set_type;
bool input_exists(std::string input, set_type const & words) {
return words.find(input) != words.end();
}
int main() {
set_type words = {"abc", "bcd", "cde", "def", "efg"};
std::string input;
if (std::getline(std::cin, input)) {
std::cout << "input: '" << input << "' ";
if (input_exists(input, words))
std::cout << "found\n";
else
std::cout << "not found\n";
}
}
Example output:
$ g++ test.cc -std=c++11
$ echo "abc" | ./a.out
input: 'abc' found
$ echo "abcdefg" | ./a.out
input: 'abcdefg' not found
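The hash-table variant mentioned above looks almost the same. The following is a sketch rather than a drop-in replacement: it assumes C++11 and reuses the name input_exists from the std::set example, with only the container type swapped:

```cpp
#include <string>
#include <unordered_set>

// Average constant-time lookup; count() is 1 when the key is present, else 0.
bool input_exists(const std::string& input,
                  const std::unordered_set<std::string>& words)
{
    return words.count(input) != 0;
}
```

Given the same five words as before, input_exists("abc", words) is true and input_exists("abcdefg", words) is false.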
For reference:
http://en.cppreference.com/w/cpp/container/vector
http://en.cppreference.com/w/cpp/string/basic_string
http://en.cppreference.com/w/cpp/string/basic_string/getline
http://en.cppreference.com/w/cpp/language/string_literal
Why is "using namespace std" considered bad practice?
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/Hash_table
http://en.cppreference.com/w/cpp/container/set
http://en.cppreference.com/w/cpp/container/set/find
http://en.cppreference.com/w/cpp/container/unordered_set
http://en.cppreference.com/w/cpp/container/unordered_set/find
http://en.wikipedia.org/wiki/Computational_complexity_theory
I have a std::string
I want to keep the string after two spaces like in Newa and Newb
std::string a = "Command send SET Command comes here";
std::string b = "Command GET Command comes here";
std::string Newa = "SET Command comes here";
std::string Newb = "Command comes here";
What comes to my mind is that I can call std::string::find(' ') two times and use std::string::substr to get the desired result.
Can we do it in a more refined manner?
The following is a more generalized approach. find_n returns an iterator one past the n-th element that equals the value being searched for. It can be used to split after two spaces, three spaces, etc. You can in fact use it as a building block for other algorithms. The split function returns the input string unchanged if it contains fewer than two spaces.
#include <cstddef>
#include <iostream>
#include <string>
template<class InputIt, class T>
InputIt find_n(InputIt first, InputIt last, const T& value, size_t n)
{
size_t count{0};
while (first != last && count < n) {
if (*first++ == value) ++count;
}
return first;
}
std::string split_after_two_spaces(const std::string& s)
{
const auto it{find_n(s.begin(), s.end(), ' ', 2)};
if (it != s.end()) {
return std::string(it, s.end());
}
return s;
}
int main()
{
const std::string a = "Command send SET Command comes here";
const std::string b = "Command GET Command comes here";
const std::string c = "Command GET";
std::cout << split_after_two_spaces(a) << '\n';
std::cout << split_after_two_spaces(b) << '\n';
std::cout << split_after_two_spaces(c) << '\n';
return 0;
}
I wanted to verify how much worse this approach would be in terms of performance compared to the straightforward double find() and substr() approach. It turns out to be a bit faster for small strings and slower for longer input strings. std::string::find() is likely to be faster than std::find because implementations tend to optimize it for strings specifically.
Short string benchmark: http://quick-bench.com/sgKeT333zoXYBQS_1EXFzprCNrY
Longer string benchmark: http://quick-bench.com/FEhPTU4YfDPvemWMqgg4oTHZlLU
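For comparison, the "straightforward double find() and substr()" baseline is not shown in this answer. A minimal sketch of it might look like the following; the function name after_two_spaces is made up for illustration, and it returns the input unchanged when there are fewer than two spaces:

```cpp
#include <string>

// Hypothetical helper: everything after the second space, or the whole
// input when it contains fewer than two spaces.
std::string after_two_spaces(const std::string& s)
{
    const std::string::size_type first = s.find(' ');
    if (first == std::string::npos) return s;
    const std::string::size_type second = s.find(' ', first + 1);
    if (second == std::string::npos) return s;
    return s.substr(second + 1);
}
```

For the strings from the question this yields "SET Command comes here" and "Command comes here", and "Command GET" comes back unchanged.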
Update: The following implementation of find_n is more efficient, but also a bit more complex unfortunately. It has nicer semantics in the sense that it returns an iterator to the n-th matching element, instead of an iterator one past the n-th matching element.
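template<class InputIt, class T>
InputIt find_n(InputIt first, InputIt last, const T& value, size_t n)
{
if (first != last && n > 0)
{
size_t count{0};
// Search for the next match; stop once the n-th one has been found.
while ((first = std::find(first, last, value)) != last && ++count < n)
{
++first; // step past this match so the next search does not find it again
}
}
return first;
}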
There's no need to construct sstream, you can just use find() function twice.
The function below can remove the first n words from your string (based on spaces, but it can also be parametrized). All you need is to find the first space occurrence using find, replace the input string with a substring (starting from the character after that space) and repeat the procedure depending on the number of words you want to remove.
#include <iostream>
#include <string>
std::string removeWords(std::string s, int n) {
for(int i=0; i<n; ++i) {
const auto spaceIdx = s.find(' ');
if (spaceIdx == std::string::npos) break; // fewer than n spaces: keep what is left
s = s.substr(spaceIdx + 1);
}
return s;
}
int main() {
std::cout << removeWords("Command send SET Command comes here", 2) << '\n';
std::cout << removeWords("Command GET Command comes here", 2) << '\n';
return 0;
}
You can try this, with the help of sstream and getline:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main() {
string a = "Command send SET Command comes here";
stringstream ss(a);
string w1, w2, w3, Newa;
ss >> w1 >> w2 >> w3;
getline(ss, Newa); //get rest of the string!
cout << w3 + Newa << endl;
return 0;
}
"SET Command comes here"
Hm, maybe I have a gross misunderstanding. I see many lines of descriptions, many lines of code, votings and an accepted answer.
But normally such text replacements can always be done with a one-liner. There is a dedicated function available for such tasks: std::regex_replace.
There are no requirements regarding performance, so I would recommend to use the dedicated function. Please see the following one-liner:
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main()
{
std::string a{"Command send SET Command comes here"};
std::regex_replace(std::ostreambuf_iterator<char>(std::cout), a.begin(), a.end(), std::regex(R"(.*? .*? (.*$))"), "$1");
return 0;
}
I guess that, because of its simplicity, further explanations are not needed.
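If the result is needed as a string rather than written straight to std::cout, the overload of std::regex_replace that returns a std::string can be used with the same pattern. A small sketch; the function name drop_two_words is an assumption made for this example:

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper around the pattern used above. When the input has
// fewer than two spaces the pattern does not match and the input is
// returned unchanged.
std::string drop_two_words(const std::string& s)
{
    static const std::regex re(R"(.*? .*? (.*$))");
    return std::regex_replace(s, re, "$1");
}
```

Note that constructing a std::regex is comparatively expensive, which is why the sketch keeps it in a function-local static instead of rebuilding it on every call.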
I am trying to write a program that eliminates blank spaces using a range-based for loop in C++. For example, if the input is "What is your name?", the output should be "Whatisyourname?". However, when I run the code below, the output it gives is "Whatisyourname?me?". Why is that?
int main()
{
string s = "What is your name?";
int write_index = 0;
for (const char &c : s)
{
if (c != ' ')
{
s[write_index++] = c;
}
}
cout << s << endl;
system("pause");
}
Add after the loop the following statement
s.erase( write_index );
or
s.resize( write_index );
to remove redundant characters from the string.
The general approach to such tasks is the following
#include <algorithm>
#include <string>
//...
s.erase( std::remove( s.begin(), s.end(), ' ' ), s.end() );
The reason is that string s is still as long as the original string, "What is your name?". You wrote over every character in the string except for the last three. What you could do is erase those last three characters after you're done removing the spaces. This is untested, but something like this should work:
s.erase(write_index, s.length() - write_index);
Your range-based for loop usage is correct. Just keep in mind that you're looping over all the input characters (as though you were looping with for (int i = 0; i < s.length(); i++)), but you're not outputting as many characters as you're reading.
So the equivalent for loop would be like this:
for (int i = 0; i < s.length(); i++) {
const char& c = s[i];
if (c != ' ') {
s[write_index++] = c;
}
}
Here are two useful little functions:
template<class C, class F>
bool remove_erase_if( C& c, F&& f ) {
using std::begin; using std::end;
auto it = std::remove_if( begin(c), end(c), std::forward<F>(f) );
if ( it == c.end())
return false;
c.erase( it, c.end() );
return true;
}
template<class C, class T>
bool remove_erase( C& c, T&& t ) {
using std::begin; using std::end;
auto it = std::remove( begin(c), end(c), std::forward<T>(t) );
if ( it == c.end())
return false;
c.erase( it, c.end() );
return true;
}
These both take a container, and either a test or an element.
They then remove and erase any elements that pass the test, or equal the element.
Your code emulated the remove part of the above code, and did not do the erase part. So the characters remaining at the end ... remained.
remove (or your code) simply moves all the "kept" data to the front of the container. The stuff left over at the end ... stays there. The erase step then tells the container that the stuff after the stuff you kept should be discarded. If you don't discard it, it ... remains ... and you get your bug.
With the above two functions, you can do this:
int main() {
std::string s = "What is your name?";
remove_erase( s, ' ' );
std::cout << s << '\n';
}
and you are done.
As an aside, using namespace std; is often a bad idea. And std::endl forces a buffer flush, so I prefer '\n'. Finally, system("pause") can be replaced by running your IDE in a mode that leaves your command window open (Ctrl-F5), instead of adding it to your code.
You can keep track of the number of spaces you have and resize the string at the end.
int main()
{
string s = "What is your name?";
int length = s.length();
int write_index = 0;
for (const char &c : s)
{
if (c != ' ')
{
s[write_index++] = c;
}
else
{
length -= 1;
}
}
s.resize(length);
cout << s << endl;
}
Try this:
#include <string>
#include <iostream>
using namespace std;
int main()
{
string s = "What is your name?";
std::string aux(s.size(),' ');
int write_index = 0;
for (const char &c : s)
{
if (c != ' ')
{
aux[write_index++] = c;
}
}
aux.resize(write_index); // drop the unused padding at the end
cout << s << endl;
cout << aux << endl;
system("pause");
}
Now, I don't personally code C++, but this looks eerily similar to a for-each loop in C#, Java, and JavaScript; so I'll give it a go.
Let's first break down your code to see what's going on
int main() {
// creates a string object which is essentially a glorified array of chars
string s = "What is your name?";
int write_index = 0;
// for every char "c" in the char-array "s"
for (const char &c : s) {
// if c isn't a space
if (c != ' ') {
// write c to s at index "write_index" then increment "write_index"
s[write_index++] = c;
}
}
std::cout << s << std::endl;
system("pause");
}
The logic seems good, so why does "what is your name?" turn into "whatisyourname?me?"? Simple. Because you're overwriting the existing array.
"what is your name?" is 18 characters long, and since you're only writing a non-space character to the array if it's not a space you're essentially copying characters one space left for every space in your text.
For example here's what happens after you run this code over the first 7 characters: "whatiss your name?", and after the first 12: "whatisyourur name?", and finally after all 18: "whatisyourname?me?". The length of the string never really changes.
So you have a number of options to solve this issue:
Build a new string from the old one with a string-builder (if such a thing exists in C++) and return the freshly created string.
Count the number of spaces you encounter and return a substring that is that many characters shorter (original is 18 chars - 3 spaces = new is 15 chars).
Reduce the length of the string by the required amount of characters (Thanks Yakk for this one)
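The first option needs no dedicated string-builder class in C++, since std::string itself can be appended to. A minimal sketch; the function name without_spaces is invented for this example:

```cpp
#include <string>

// Builds a fresh string that contains every character of `s` except ' '.
std::string without_spaces(const std::string& s)
{
    std::string result;
    result.reserve(s.size()); // at most as long as the input
    for (const char c : s) {
        if (c != ' ') {
            result.push_back(c);
        }
    }
    return result;
}
```

Because the characters go into a separate string, nothing is left over at the end: without_spaces("What is your name?") gives "Whatisyourname?".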
This is a basic application of the copy_if algorithm from the standard library.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>
#include <string>
int main()
{
std::string s = "What is your name?";
std::copy_if(s.begin(), s.end(), std::ostream_iterator<char>(std::cout),
[](unsigned char c){ return !std::isspace(c); }); // unsigned char: isspace is undefined for negative values
return 0;
}
outputs:
Whatisyourname?
If you actually need to remove them from the original string, then use the algorithm remove_if followed by erase.
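A sketch of that in-place variant, assuming C++11; the function name strip_whitespace is only illustrative:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// remove_if moves the kept characters to the front and returns the new
// logical end; erase then cuts off the leftovers behind it.
void strip_whitespace(std::string& s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
}
```

Unlike the loop in the question, this removes tabs and newlines as well, because it tests with std::isspace rather than comparing against ' '.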
Can someone explain to me how to properly search for a "tab" character stored in a string class?
For example:
text.txt contents:
std::cout << "Hello"; // contains one tab space
User enters on prompt: ./a.out < text.txt
main.cpp:
string arrey;
getline(cin, arrey);
int i = 0;
while( i != 10){
if(arrey[i] == "\t") // error here
{
std::cout << "I found a tab!!!!"
}
i++;
}
Since there is only one tab space in the textfile, I am assuming it is stored in index [0], but the problem is that I can't seem to make a comparison and I don't know any other way of searching it. Can someone help explain an alternative?
Error: ISO C++ forbids comparison between pointer and integer
First of all, what is i? And secondly, when you use array-indexing of a std::string object, you get a character (i.e. a char) and not a string.
The char is converted to an int and then the compiler tries to compare that int with the pointer to the string literal, and you can't compare plain integers with pointers.
You can however compare a character with another character, like in
arrey[i] == '\t'
std::string::find() might help.
Try this:
...
if(arrey.find('\t') != string::npos)
{
std::cout << "I found a tab!!!!";
}
More info on std::string::find is available here.
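find also accepts a start position, so it can report every tab rather than just the first one. A small sketch under that assumption; the function name tab_positions is made up here:

```cpp
#include <string>
#include <vector>

// Collects the index of every '\t' in `line` by restarting the search
// one character after the previous hit.
std::vector<std::string::size_type> tab_positions(const std::string& line)
{
    std::vector<std::string::size_type> positions;
    for (std::string::size_type pos = line.find('\t');
         pos != std::string::npos;
         pos = line.find('\t', pos + 1)) {
        positions.push_back(pos);
    }
    return positions;
}
```

For a line that starts with a tab, the first reported position is 0, which confirms the assumption made in the question.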
Why not use what the C++ standard library provides? You could do it like this:
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main()
{
string arrey;
getline(cin, arrey);
if (arrey.find("\t") != std::string::npos) {
std::cout << "found a tab!" << '\n';
}
return 0;
}
The code is based on this answer. Here is the ref for std::find.
About your edit: how can you be sure that the input is going to be 10 characters long? That might be too few or too many! If it is less than the actual size of the input, you won't look at all the characters of the string, and if it is too big, you are going to read past the end of the string!
You could use .size(), which gives the length of the string, and use a for loop like this:
#include <iostream>
#include <string>
using namespace std;
int main() {
string arrey;
getline(cin, arrey);
for(unsigned int i = 0; i < arrey.size(); ++i) {
if (arrey[i] == '\t') {
std::cout << "I found a tab!!!!";
}
}
return 0;
}
How can I compare a single character from a string, and another string (which may or may not be greater than one character)
This program gives me almost 300 lines of random errors. The errors don't reference a specific line number either, just a lot of stuff about "char* ", "", or "std::to_string".
#include <iostream>
#include <string>
using std::cout;
using std::string;
int main() {
string str = "MDCXIV";
string test = "D";
if (test == str[4]) { // This line causes the problems
cout << test << endl;
}
return 0;
}
str[4] is a char type, which will not compare with a string.
Compare apples with apples.
Use
test[0] == str[4]
instead.
You need to convert str[4] (which is a char) to a string before you can compare it to another string. Here's a simple way to do this
if (test == string(1, str[4])) {
You're comparing a char to a std::string, this is not a valid comparison.
You're looking for std::string::find, as follows:
if( test.find( str[4] ) != std::string::npos ) cout << test << "\n";
Note that this will return true if test contains str[4].
You're mixing types. It doesn't know how to compare a string (test) to a char (str[4]).
If you change test to a char that will work fine. Or reference the specific character within test you want to compare such as if (test[0] == str[4]) it should compile and run.
However, as this is merely an example and not really the true question, what you'll want to do is look at the functionality that the std::string class supplies.
Also you need "D" to be a char value not a string value if you are comparing it like that.
std::string myString = "Hello World";
const char *myStringChars = myString.c_str();
You have to turn it into a char array before you can access it, unless you do:
str.at(i);
which you can also write as
str[i] <-- what you did.
Essentially, this all boils down to this: test needs to be initialized like char test = 'D';
Final Output..
#include <iostream>
#include <string>
using std::cout;
using std::string;
int main() {
string str = "MDCXIV";
char test = 'D';
if (test == str[4]) { // This line causes NO problems
cout << test << endl;
}
return 0;
}
I think you might be mixing Python with C++. In C++, 'g' is a single character (a char), not a string of length 1. "g" is a string literal: an array of const char that holds 'g' followed by a terminating '\0'. So, as you can see, if you compare a single character to a string, no matter if the string is a single character long, this operation is not defined.
This will work if you define it yourself by building a class that is able to compare a string of one character to a single character. Or just overload the == operator to do just that.
Example:
#include <iostream>
#include <string>
using std::cout;
using std::string;
using std::endl;
bool operator == ( const string &lh, const char &rh) {
if (lh.length() == 1) return lh[0] == rh;
return false;
}
int main() {
string str = "MDCXIV";
string test = "D";
if (test == str[4]) {
cout << test << endl;
}
else cout << "Not a match\n";
return 0;
}
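A named function is a more conservative alternative to overloading == for standard-library types, since it cannot change the meaning of comparisons elsewhere in the program. A minimal sketch; the name equals_char is an assumption made for this example:

```cpp
#include <string>

// True only when `s` consists of exactly the one character `c`.
bool equals_char(const std::string& s, char c)
{
    return s.size() == 1 && s[0] == c;
}
```

With str = "MDCXIV" and test = "D", equals_char(test, str[1]) is true, while equals_char(test, str[4]) is false because str[4] is 'I'.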
I am making an application that deals with txt file data.
The idea is that txt files may come in different formats, and it should be read into C++.
One example might be 3I2, 3X, I3, which should be read as: "first we have 3 integers of length 2, then we have 3 empty spots, then we have 1 integer of length 3."
Is it best to iterate over the file, yielding lines, and then iterate over each line as a string? What would be an effective approach for skipping the 3 spots that are to be ignored?
E.g.
101112---100
102113---101
103114---102
to:
10, 11, 12, 100
10, 21, 13, 101
10, 31, 14, 102
The link given by Kyle Kanos is a good one; *scanf/*printf format strings map pretty well onto Fortran format strings. It's actually easier to do this using C-style IO, but using C++ style streams is doable as well:
#include <cstdio>
#include <iostream>
#include <fstream>
#include <string>
int main() {
std::ifstream fortranfile;
fortranfile.open("input.txt");
if (fortranfile.is_open()) {
std::string line;
getline(fortranfile, line);
while (fortranfile.good()) {
char dummy[4];
int i1, i2, i3, i4;
sscanf(line.c_str(), "%2d%2d%2d%3s%3d", &i1, &i2, &i3, dummy, &i4);
std::cout << "Line: '" << line << "' -> " << i1 << " " << i2 << " "
<< i3 << " " << i4 << std::endl;
getline(fortranfile, line);
}
}
fortranfile.close();
return 0;
}
Running gives
$ g++ -o readinput readinput.cc
$ ./readinput
Line: '101112---100' -> 10 11 12 100
Line: '102113---101' -> 10 21 13 101
Line: '103114---102' -> 10 31 14 102
Here the format string we're using is %2d%2d%2d%3s%3d - 3 copies of %2d (decimal integer of width 2) followed by %3s (string of width 3, which we read into a variable we never use) followed by %3d (decimal integer of width 3).
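One caveat with %3s is that it skips leading whitespace and stops at whitespace, so it only works when the ignored columns hold non-blank filler such as "---". If the skipped columns may be blanks, %*3c is a possible substitute: it consumes exactly three characters of any kind and stores nothing. The following sketch assumes every integer field is fully populated with digits; the function name parse_record is hypothetical:

```cpp
#include <array>
#include <cstdio>
#include <string>

// Parses one "3I2, 3X, I3" record. sscanf returns the number of stored
// values; the suppressed %*3c field is not counted, so 4 means success.
bool parse_record(const std::string& line, std::array<int, 4>& out)
{
    return std::sscanf(line.c_str(), "%2d%2d%2d%*3c%3d",
                       &out[0], &out[1], &out[2], &out[3]) == 4;
}
```

Checking the return value also catches short or malformed lines, which the loop above would silently print with stale values.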
Given that you wish to dynamically parse Fortran format specifier flags, you should note that you've immediately walked into the realm of parsers.
In addition to the other methods of parsing such input that others have noted here:
By using Fortran and C/C++ bindings to do the parsing for you.
Using pure C++ to parse it for you by writing a parser using a combination of:
sscanf
streams
My proposal is that if boost is available to you, you can use it to implement a simple parser for on-the-fly operations, using a combination of Regexes and STL containers.
From what you've described, and what is shown in different places, you can construct a naive implementation of the grammar you wish to support, using regex captures:
(\\d{0,8})([[:alpha:]])(\\d{0,8})
Where the first group is the number of that variable type.
The second is the type of the variable.
and the third is the length of variable type.
Using this reference for the Fortran Format Specifier Flags, you can implement a naive solution as shown below:
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <cstdlib>
#include <boost/regex.hpp>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
//A POD Data Structure used for storing Fortran Format Tokens into their relative forms
typedef struct FortranFormatSpecifier {
char type;//the type of the variable
size_t number;//the number of times the variable is repeated
size_t length;//the length of the variable type
} FFlag;
//This class implements a rudimentary parser to parse Fortran Format
//Specifier Flags using Boost regexes.
class FormatParser {
public:
//typedefs for further use with the class and class methods
typedef boost::tokenizer<boost::char_separator<char> > bst_tokenizer;
typedef std::vector<std::vector<std::string> > vvstr;
typedef std::vector<std::string> vstr;
typedef std::vector<std::vector<int> > vvint;
typedef std::vector<int> vint;
FormatParser();
FormatParser(const std::string& fmt, const std::string& fname);
void parse();
void printIntData();
void printCharData();
private:
bool validateFmtString();
size_t determineOccurence(const std::string& numStr);
FFlag setFortranFmtArgs(const boost::smatch& matches);
void parseAndStore(const std::string& line);
void storeData();
std::string mFmtStr; //this holds the format string
std::string mFilename; //the name of the file
FFlag mFmt; //a temporary FFlag variable
std::vector<FFlag> mFortranVars; //this holds all the flags and details of them
std::vector<std::string> mRawData; //this holds the raw tokens
//this is where you will hold all the types of data you wish to support
vvint mIntData; //this holds all the int data
vvstr mCharData; //this holds all the character data (stored as strings for convenience)
};
FormatParser::FormatParser() : mFmtStr(), mFilename(), mFmt(), mFortranVars(), mRawData(), mIntData(), mCharData() {}
FormatParser::FormatParser(const std::string& fmt, const std::string& fname) : mFmtStr(fmt), mFilename(fname), mFmt(), mFortranVars(), mRawData(), mIntData(), mCharData() {}
//this function determines the number of times that a variable occurs
//by parsing a numeric string and returning the associated output
//based on the grammar
size_t FormatParser::determineOccurence(const std::string& numStr) {
size_t num = 0;
//this case means that no number was supplied in front of the type
if (numStr.empty()) {
num = 1;//hence, the default is 1
}
else {
//attempt to parse the numeric string and find its equivalent
//integer value (since all occurrences are whole numbers)
size_t n = atoi(numStr.c_str());
//this case covers if the numeric string is explicitly 0
//hence, logically, it doesn't occur, set the value accordingly
if (n == 0) {
num = 0;
}
else {
//set the value to its converted representation
num = n;
}
}
return num;
}
//from the boost::smatches, determine the set flags, store them
//and return it
FFlag FormatParser::setFortranFmtArgs(const boost::smatch& matches) {
FFlag ffs = {0};
std::string fmt_number, fmt_type, fmt_length;
fmt_number = matches[1];
fmt_type = matches[2];
fmt_length = matches[3];
ffs.type = fmt_type.c_str()[0];
ffs.number = determineOccurence(fmt_number);
ffs.length = determineOccurence(fmt_length);
return ffs;
}
//since the format string is CSV, split the string into tokens
//and then, validate the tokens by attempting to match them
//to the grammar (implemented as a simple regex). If the number of
//validations match, everything went well: return true. Otherwise:
//return false.
bool FormatParser::validateFmtString() {
boost::char_separator<char> sep(",");
bst_tokenizer tokens(mFmtStr, sep);
mFmt = FFlag();
size_t n_tokens = 0;
std::string token;
for(bst_tokenizer::const_iterator it = tokens.begin(); it != tokens.end(); ++it) {
token = *it;
boost::trim(token);
//this "grammar" is based on the Fortran Format Flag Specification
std::string rgx = "(\\d{0,8})([[:alpha:]])(\\d{0,8})";
boost::regex re(rgx);
boost::smatch matches;
if (boost::regex_match(token, matches, re, boost::match_extra)) {
mFmt = setFortranFmtArgs(matches);
mFortranVars.push_back(mFmt);
}
++n_tokens;
}
return mFortranVars.size() == n_tokens;
}
//Now, parse each input line from a file and try to parse and store
//those variables into their associated containers.
void FormatParser::parseAndStore(const std::string& line) {
int offset = 0;
int integer = 0;
std::string varData;
std::vector<int> intData;
std::vector<std::string> charData;
offset = 0;
for (std::vector<FFlag>::const_iterator begin = mFortranVars.begin(); begin != mFortranVars.end(); ++begin) {
mFmt = *begin;
for (size_t i = 0; i < mFmt.number; offset += mFmt.length, ++i) {
varData = line.substr(offset, mFmt.length);
//now store the data, based on type:
switch(mFmt.type) {
case 'X':
break;
case 'A':
charData.push_back(varData);
break;
case 'I':
integer = atoi(varData.c_str());
intData.push_back(integer);
break;
default:
std::cerr << "Invalid type!\n";
}
}
}
mIntData.push_back(intData);
mCharData.push_back(charData);
}
//Open the input file, and attempt to parse the input file line-by-line.
void FormatParser::storeData() {
mFmt = FFlag();
std::ifstream ifile(mFilename.c_str(), std::ios::in);
std::string line;
if (ifile.is_open()) {
while(std::getline(ifile, line)) {
parseAndStore(line);
}
}
else {
std::cerr << "Error opening input file!\n";
exit(3);
}
}
//If character flags are set, this function will print the character data
//found, line-by-line
void FormatParser::printCharData() {
vvstr::const_iterator it = mCharData.begin();
vstr::const_iterator jt;
size_t linenum = 1;
std::cout << "\nCHARACTER DATA:\n";
for (; it != mCharData.end(); ++it) {
std::cout << "LINE " << linenum << " : ";
for (jt = it->begin(); jt != it->end(); ++jt) {
std::cout << *jt << " ";
}
++linenum;
std::cout << "\n";
}
}
//If integer flags are set, this function will print all the integer data
//found, line-by-line
void FormatParser::printIntData() {
vvint::const_iterator it = mIntData.begin();
vint::const_iterator jt;
size_t linenum = 1;
std::cout << "\nINT DATA:\n";
for (; it != mIntData.end(); ++it) {
std::cout << "LINE " << linenum << " : ";
for (jt = it->begin(); jt != it->end(); ++jt) {
std::cout << *jt << " ";
}
++linenum;
std::cout << "\n";
}
}
//Attempt to parse the input file, by first validating the format string
//and then, storing the data accordingly
void FormatParser::parse() {
if (!validateFmtString()) {
std::cerr << "Error parsing the input format string!\n";
exit(2);
}
else {
storeData();
}
}
int main(int argc, char **argv) {
if (argc < 3 || argc > 3) {
std::cerr << "Usage: " << argv[0] << "\t<Fortran Format Specifier(s)>\t<Filename>\n";
exit(1);
}
else {
//parse and print stuff here
FormatParser parser(argv[1], argv[2]);
parser.parse();
//print the data parsed (if any)
parser.printIntData();
parser.printCharData();
}
return 0;
}
This is standard c++98 code and can be compiled as follows:
g++ -Wall -std=c++98 -pedantic fortran_format_parser.cpp -lboost_regex
BONUS
This rudimentary parser also works on characters (Fortran Format Flag 'A', for up to 8 characters). You can extend this to support whatever flags you may like by editing the regex and performing checks on the length of captured strings in tandem with the type.
POSSIBLE IMPROVEMENTS
If C++11 is available to you, you can use lambdas in some places and substitute auto for the iterators.
If this is running in a limited memory space and you have to parse a large file, the vectors may fail to grow, because std::vector stores its elements in one contiguous block and reallocates that block as it grows. It may be better to use deques instead. For more on that, see the discussion here:
http://www.gotw.ca/gotw/054.htm
And, if the input file is large, and file I/O is a bottleneck, you can improve performance by modifying the size of the ifstream buffer:
How to get IOStream to perform better?
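Beyond lambdas and auto, C++11 also ships std::regex, which could stand in for Boost.Regex when matching a single format token. This is an assumption on my part rather than something tested against the full parser above. FormatToken, number_or_one and parse_token are names invented for this sketch; the grammar and the default of 1 for a missing count or width are taken from the code above:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical counterpart of FFlag.
struct FormatToken {
    char type;          // descriptor letter, e.g. 'I', 'X', 'A'
    std::size_t number; // repeat count
    std::size_t length; // field width
};

// An empty count or width defaults to 1, as determineOccurence does.
std::size_t number_or_one(const std::string& digits)
{
    return digits.empty() ? 1 : static_cast<std::size_t>(std::stoul(digits));
}

// Matches one already-trimmed token such as "3I2" against the same grammar.
bool parse_token(const std::string& token, FormatToken& out)
{
    static const std::regex re("(\\d{0,8})([[:alpha:]])(\\d{0,8})");
    std::smatch m;
    if (!std::regex_match(token, m, re)) return false;
    out.number = number_or_one(m[1].str());
    out.type = m[2].str()[0];
    out.length = number_or_one(m[3].str());
    return true;
}
```

For "3I2" this yields number 3, type 'I', length 2; for "I3" the missing count falls back to 1.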
DISCUSSION
What you will notice is that the types you're parsing must be known at runtime, and any associated storage containers must be supported in the class declaration and definition.
As you would imagine, supporting all types in one main class isn't efficient. However, as this is a naive solution, an improved full solution can be specialized to support these cases.
Another suggestion is to use Boost::Spirit. But, as Spirit uses a lot of templates, debugging such an application is not for the faint of heart when errors can and do occur.
PERFORMANCE
Compared to @Jonathan Dursi's solution, this solution is slow:
For 10,000,000 lines of randomly generated output (a 124MiB file) using this same line format ("3I2, 3X, I3"):
#include <cstdio>
#include <fstream>
#include <cstdlib>
#include <ctime>
using namespace std;
int main(int argc, char **argv) {
srand(time(NULL));
if (argc < 2 || argc > 2) {
printf("Invalid usage! Use as follows:\t<Program>\t<Output Filename>\n");
exit(1);
}
ofstream ofile(argv[1], ios::out);
if (ofile.is_open()) {
for (int i = 0; i < 10000000; ++i) {
ofile << (rand() % (99-10+1) + 10) << (rand() % (99-10+1) + 10) << (rand() % (99-10+1)+10) << "---" << (rand() % (999-100+1) + 100) << endl;
}
}
ofile.close();
return 0;
}
My solution:
0m13.082s
0m13.107s
0m12.793s
0m12.851s
0m12.801s
0m12.968s
0m12.952s
0m12.886s
0m13.138s
0m12.882s
Clocks an average walltime of 12.946s
Jonathan Dursi's solution:
0m4.698s
0m4.650s
0m4.690s
0m4.675s
0m4.682s
0m4.681s
0m4.698s
0m4.675s
0m4.695s
0m4.696s
Blazes with average walltime of 4.684s
His is about 2.8 times as fast as mine (12.946 s vs. 4.684 s on average), with both compiled at -O2.
However, since you don't have to modify the source code every time you want to parse an additional format flag, this solution is more flexible.
Note: you can implement a solution that involves sscanf / streams that only requires you to know what type of variable you wish to read (much like mine), but the additional checks such as verifying the type(s) bloats development time. (This is why I offer my solution in Boost, because of the convenience of tokenizers and regexes - which makes the development process easier).
REFERENCES
http://www.boost.org/doc/libs/1_34_1/libs/regex/doc/character_class_names.html
You could translate 3I2, 3X, I3 into a scanf format.
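As a rough sketch of such a translation, the function below handles only the I (integer) and X (skip) descriptors and returns an empty string for anything else. The names to_scanf_format and read_number are made up for this example, and mapping nX to %*nc (read n characters and discard them) is my own choice:

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Reads a run of digits starting at token[i] and advances i past them.
// Returns false when there were no digits at that position.
static bool read_number(const std::string& token, std::size_t& i, std::size_t& value)
{
    const std::size_t start = i;
    value = 0;
    while (i < token.size() && std::isdigit(static_cast<unsigned char>(token[i]))) {
        value = value * 10 + static_cast<std::size_t>(token[i] - '0');
        ++i;
    }
    return i != start;
}

std::string to_scanf_format(const std::string& spec)
{
    std::istringstream tokens(spec);
    std::string token, format;
    while (std::getline(tokens, token, ',')) {
        std::size_t i = token.find_first_not_of(' ');
        if (i == std::string::npos) return "";          // blank token
        std::size_t count = 0, width = 0;
        if (!read_number(token, i, count)) count = 1;   // no repeat count given
        if (i == token.size()) return "";               // descriptor letter missing
        const char type = token[i++];
        const bool has_width = read_number(token, i, width);
        if (token.find_first_not_of(' ', i) != std::string::npos) return "";
        if (type == 'I' && has_width && width > 0) {
            for (std::size_t n = 0; n < count; ++n)
                format += "%" + std::to_string(width) + "d";
        } else if (type == 'X' && !has_width && count > 0) {
            format += "%*" + std::to_string(count) + "c"; // skip `count` characters
        } else {
            return "";                                  // unsupported descriptor
        }
    }
    return format;
}
```

For "3I2, 3X, I3" this produces "%2d%2d%2d%*3c%3d". The remaining difficulty is that sscanf still needs one pointer argument per converted field, so a fully dynamic format needs some extra machinery on the calling side.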
Given that Fortran is easily callable from C, you could write a little Fortran function to do this "natively." The Fortran READ function takes a format string as you describe, after all.
If you want this to work, you'll need to brush up on Fortran just a tiny bit (http://docs.oracle.com/cd/E19957-01/806-3593/2_io.html), plus learn how to link Fortran and C++ using your compiler. Here are a few tips:
The Fortran symbols may be implicitly suffixed with underscore, so MYFUNC may be called from C as myfunc_().
Multi-dimensional arrays have the opposite ordering of dimensions.
Declaring a Fortran (or C) function in a C++ header requires placing it in an extern "C" {} scope.
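The tip about dimension ordering can be made concrete with a little index arithmetic. Fortran stores arrays column-major and, by default, 1-based, so an element A(i, j) of an array with `rows` rows sits at the flat offset computed below. The helper name fortran_offset is invented for this sketch:

```cpp
#include <cstddef>

// Offset of the Fortran element A(i, j) (1-based, column-major) inside the
// flat buffer that C++ sees; `rows` is the extent of the first dimension.
inline std::size_t fortran_offset(std::size_t i, std::size_t j, std::size_t rows)
{
    return (i - 1) + (j - 1) * rows;
}
```

For a 3-row array, A(2, 1) is at offset 1 and A(1, 2) is at offset 3, whereas a C++ int a[3][N] would place a[0][1] directly after a[0][0].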
If your user is actually supposed to enter it in the Fortran format, or if you very quickly adapt or write Fortran code to do this, I would do as John Zwinck and M.S.B. suggest. Just write a short Fortran routine to read the data into an array, and use "bind(c)" and the ISO_C_BINDING types to set up the interface. And remember that the array indexing is going to change between Fortran and C++.
Otherwise, I would recommend using scanf, as mentioned above:
http://en.cppreference.com/w/cpp/io/c/fscanf
If you don't know the number of items per line you need to read, you might be able to use vscanf instead:
http://en.cppreference.com/w/cpp/io/c/vfscanf
However, although it looks convenient, I've never used this, so YMMV.
Thought about this some today but no time to write an example. @jrd1's example and analysis are on track, but I'd try to make the parsing more modular and object oriented. The format string parser could build a list of item parsers that then work more or less independently, allowing new ones like floating point to be added without changing old code. I think a particularly nice interface would be an iomanip initialized with a format string, so that the usage would be something like
cin >> f77format("3I2, 3X, I3") >> a >> b >> c >> d;
On implementation I'd have f77format parse the bits and build the parser by components, so it would create 3 fixed width int parsers, a devNull parser and another fixed width parser that would then consume the input.
Of course if you want to support all of the edit descriptors, it would be a big job! And in general it wouldn't just be passing the rest of the string on to the next parser since there are edit descriptors that require re-reading the line.