Reading integers from a text file with words - c++

I'm trying to read just the integers from a text file structured like this:
ALS 46000
BZK 39850
CAR 38000
//....
I'm reading the file with an ifstream.
I've considered 2 options.
1) Regex using Boost
2) Creating a throwaway string (i.e. I read in a word, don't do anything with it, then read in the score). However, this is a last resort.
Are there any ways to express in C++ that I want the ifstream to only read in text that is an integer? I'm reluctant to use regular expressions if it turns out that there is a much simpler way to accomplish this.

Why make simple things complicated?
What's wrong with this:
ifstream ss("C:\\test.txt");
int score;
string name;
while( ss >> name >> score )
{
// do something with score
}

Edit:
It is in fact possible to work on streams directly with Spirit, which is simpler than what I suggested previously, using this parser:
+(omit[+(alpha|blank)] >> int_)
and one line of code (apart from the variable definitions):
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
You get all the numbers into a vector at once with one line; it couldn't be simpler.
If you do not mind using Boost.Spirit 2, the parser to get only the number from a line is
omit[+(alpha|blank)] >> int_
and the parser to extract everything is
+(alpha|blank) >> int_
See the whole program below (tested with VC10 Beta 2):
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <cstring>
#include <vector>
#include <fstream>
#include <algorithm>
#include <iterator>
using std::cout;
using namespace boost::spirit;
using namespace boost::spirit::qi;
void extract_everything(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
std::string s;
int i;
parse(it_begin, it_end, +(alpha|blank)>>int_, s, i);
cout << "string " << s
<< "followed by nubmer " << i
<< std::endl;
}
void extract_number(std::string& line)
{
std::string::iterator it_begin = line.begin();
std::string::iterator it_end = line.end();
int i;
parse(it_begin, it_end, omit[+(alpha|blank)] >> int_, i);
cout << "number only: " << i << std::endl;
}
void extract_line()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
std::string s;
int i;
// iterated file line by line
while(getline(f, s))
{
cout << "parsing " << s << " yields:\n";
extract_number(s); //
extract_everything(s);
}
}
void extract_file()
{
std::ifstream f("E:/dd/dd.trunk/sandbox/text.txt");
boost::spirit::istream_iterator it_begin(f), it_end;
// extract all numbers into a vector
std::vector<int> vi;
parse(it_begin, it_end, +(omit[+(alpha|blank)] >> int_), vi);
// print them to verify
std::copy(vi.begin(), vi.end(),
std::ostream_iterator<int>(std::cout, ", " ));
}
int main(int argc, char * argv[])
{
extract_line();
extract_file();
return 0;
}
outputs:
parsing ALS 46000 yields:
number only: 46000
string ALS followed by nubmer 46000
parsing BZK 39850 yields:
number only: 39850
string BZK followed by nubmer 39850
parsing CAR 38000 yields:
number only: 38000
string CAR followed by nubmer 38000
46000, 39850, 38000,

You can call ignore to have it skip over a specified number of characters.
istr.ignore(4);
You can also tell it to stop at a delimiter. You would still need to know the maximum number of characters the leading string could be, but this would also work for shorter leading strings:
istr.ignore(10, ' ');
You could also write a loop that just reads characters until you see the first digit character:
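char c;
// get() reads a single character; the cast keeps isdigit() well defined for negative char values
while (istr.get(c) && !isdigit(static_cast<unsigned char>(c)))
{
// do nothing
}
if (istr && isdigit(static_cast<unsigned char>(c)))
istr.putback(c);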

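Here goes :P (a Java version; note that it expects ';'-separated key:value or key=value tokens rather than the format in the question)
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Scanner;
public class ReadFile { // the class name is arbitrary
private static void readFile(String fileName) {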
try {
HashMap<String, Integer> map = new HashMap<String, Integer>();
File file = new File(fileName);
Scanner scanner = new Scanner(file).useDelimiter(";");
while (scanner.hasNext()) {
String token = scanner.next();
String[] split = token.split(":");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
} else {
split = token.split("=");
if (split.length == 2) {
Integer count = map.get(split[0]);
map.put(split[0], count == null ? 1 : count + 1);
System.out.println(split[0] + ":" + split[1]);
}
}
}
scanner.close();
System.out.println("Counts:" + map);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
readFile("test.txt");
}
}

fscanf(file, "%*s %d", &num);
Leading zeros are already accepted by %d; use %5d if the number has a fixed width of 5 (the 0 flag is only meaningful for printf, not scanf).
sometimes the fastest way to do things in C++ is to use C. :)
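As a sketch of how that call could be used in a loop: the helper name read_scores_c is hypothetical, and the file is assumed to contain only word/number pairs separated by whitespace.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: reads "WORD NUMBER" records from an open C stream.
std::vector<int> read_scores_c(std::FILE* file)
{
    std::vector<int> scores;
    int num;
    // %*s reads a word and discards it. fscanf returns the number of
    // assigned fields, so 1 means the integer was read successfully.
    while (std::fscanf(file, "%*s %d", &num) == 1)
        scores.push_back(num);
    return scores;
}
```

Checking the return value against 1 is what ends the loop at end of file or at the first malformed record.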

You can create a ctype facet that classifies letters as white space. Create a locale that uses this facet, then imbue the stream with that locale. After that, you can extract numbers from the stream, and all letters will be treated as white space (i.e. when you extract numbers, the letters will be ignored just like a space or a tab would be).
Such a facet can look like this:
#include <iostream>
#include <iterator>
#include <locale>
#include <vector>
#include <algorithm>
struct digits_only: std::ctype<char>
{
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
// start with every character classified as white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// clear the mask for all ten digits, '0' through '9'
if (rc['0'] == std::ctype_base::space)
std::fill_n(&rc['0'], 10, std::ctype_base::mask());
return &rc[0];
}
};
Sample code to use it could look like this:
int main() {
std::cin.imbue(std::locale(std::locale(), new digits_only()));
std::copy(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, "\n"));
}
Using your sample data, the output I get from this looks like this:
46000
39850
38000
Note that as it stands, I've written this to accept only digits. If (for example) you were reading floating point numbers, you'd also want to retain '.' (or the locale-specific equivalent) as the decimal point. One way to handle things is to start with a copy of the normal ctype table, and then just set the things you want to ignore as space.

Related

Split string by the first N occurrences of a delimiter using boost::split

I know that maybe this is a silly question, but I didn't find an answer yet. Let's suppose I have the following string:
std::string str("1,2,3,4,5,6,7,8,9,10");
I would like to split it using the comma as a delimiter, but only at the first N occurrences. For instance, for N = 2 I should obtain as a result:
{ "1", "2", "3,4,5,6,7,8,9,10" }.
Is it possible to do it with boost::split? I have always used it in the following form:
std::vector<std::string> values;
boost::split(values, str, boost::is_any_of(","), boost::token_compress_on);
which splits the string on every occurrence of the delimiter.
In case it is not possible, can someone suggest an efficient way to do it?
I would like to avoid dealing with regexes, i.e. I would like to write a function like:
std::vector<std::string> tokenize(const std::string& to_split, const std::string& delimiter, int occurrences)
Thanks in advance for the help.
You can use:
find_nth(str, ",", 1);
from <boost/algorithm/string/find.hpp> to find the 2nd (or nth) occurrence of the character of interest.
Then use the traditional substr() at that position.
Then tokenize the first part of the string with boost::split.
And add the last part to the vector.
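For comparison, here is a sketch of the function signature from the question written with the standard library only, so it uses std::string::find rather than find_nth and boost::split. Unlike boost::is_any_of, it treats the delimiter as one whole string, and it does not compress adjacent delimiters.

```cpp
#include <string>
#include <vector>

// Splits to_split at the first `occurrences` delimiters only; whatever
// follows the last split point is returned as one final element.
std::vector<std::string> tokenize(const std::string& to_split,
                                  const std::string& delimiter,
                                  int occurrences)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (occurrences-- > 0 && !delimiter.empty()) {
        std::string::size_type pos = to_split.find(delimiter, start);
        if (pos == std::string::npos)
            break;                              // fewer delimiters than requested
        parts.push_back(to_split.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    parts.push_back(to_split.substr(start));    // the unsplit remainder
    return parts;
}
```

With the string from the question and occurrences set to 2, this yields the three elements "1", "2" and "3,4,5,6,7,8,9,10".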
You can make use of stringstream and a modified locale:
#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <sstream>
struct csv_whitespace : std::ctype<char> {
static const mask* make_table() {
static std::vector<mask> v(classic_table(),
classic_table() + table_size);
v[','] |= space;
v[' '] &= ~space;
return &v[0];
}
csv_whitespace(std::size_t refs = 0)
: std::ctype<char>{ make_table(), false, refs } { }
};
int main() {
std::string str("1,2,3,4,5,6,7,8,9,10");
std::stringstream ss(str);
auto loc = ss.getloc();
ss.imbue(std::locale(loc, new csv_whitespace));
std::size_t N = 3;
while(N--) {
std::string a;
ss >> a;
std::cout << a << '\n';
}
ss.ignore();
ss.imbue(loc);
std::string rest;
ss >> rest;
std::cout << "Rest: " << rest << "\n";
}
The trick here is to treat commas as whitespace, which can be done in the ctype<> facet.

How to read input from the text file until/after a specific location [C++]?

In my project I need to read data about a circuit from a text file, parse it, and produce output data.
Here is some sample data from the text file:
AND1 Adder1-3 Adder1-4 // Means AND1 gate gets its first input from Adder1's 3rd output and its second input from Adder1's 4th output
AND2 Adder1-4 Adder1-2
OR1 AND1-1 AND2-1 //OR1's two inputs are from AND1's 1st output and AND2's 1st output
Now I need to read the component name first, which is easy:
infile>>componentName;
But for the second part I thought I could do this in two ways:
1) Read the whole token and separate it into two parts: ComponentName - thOutput.
2) Read until "-" and put that into a string variable, then read what comes after "-" into an integer variable, and repeat this for all lines.
I tried the first way but I got stuck converting the string into an integer (I tried using stoi but it's for C++ only :( and I also ran into a couple of other problems). I thought the second way would be much easier, but I couldn't figure out how to do it.
So can you help me with this?
Basically I need to put the component name (before "-") into a string variable and the integer (after "-") into an integer variable.
NOTE: Sorry for my poor English, I'm not a native speaker. Since the project is large, I didn't include the unrelated code above.
This is rather basic, but should get you started. I prefer the scanf family for these tasks.
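#include <stdio.h>
#include <iostream>
using namespace std;
int main() {
FILE *fp=fopen("tmpfile", "r");
if (!fp) {
cerr<<"cannot open tmpfile"<<endl;
return 1;
}
char oper[5], dev1[10], dev2[10];
int op1, op2;
// field widths keep fscanf from writing past the end of each buffer;
// the return value is the number of fields successfully assigned
if (fscanf(fp, "%4s %9[^-]-%d %9[^-]-%d", oper, dev1, &op1, dev2, &op2) == 5) {
cout<<oper<<endl;
cout<<dev1<<endl;
cout<<op1<<endl;
cout<<dev2<<endl;
cout<<op2<<endl;
}
fclose(fp);
}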
Output Produced for AND2 Adder1-4 Adder1-2:
AND2
Adder1
4
Adder1
2
You could use Boost.Spirit for this parsing task. It allows you to parse the result directly into a struct.
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
struct Entry
{
std::string op;
std::string dev1;
int n1;
std::string dev2;
int n2;
};
BOOST_FUSION_ADAPT_STRUCT(
Entry,
op,
dev1, n1,
dev2, n2)
Entry parse(std::string const &input)
{
auto iter = input.begin();
using namespace boost::spirit::x3;
auto op = rule<class op, std::string>{}
= lexeme[upper >> *(upper | digit)];
auto dev = rule<class dev, std::string>{}
= lexeme[upper >> *alnum];
Entry result;
bool r = phrase_parse(iter, input.end(),
op > dev > lit('-') > int_ > dev > lit('-') > int_,
space, result);
if (!r)
{
std::string rest(iter, input.end());
throw std::invalid_argument("Parsing failed at " + rest);
}
return result;
}
int main()
{
// This could be a file instead with std::ifstream
std::istringstream input;
input.str(
"AND1 Adder1-3 Adder1-4 // Means AND1 gate gets its first input from Adder1's 3rd output and its second input from Adder1's 4th output\n"
"AND2 Adder1-4 Adder1-2\n"
"OR1 AND1-1 AND2-1 //OR1's two inputs are from AND1's 1st output and AND2's 1st output\n");
for (std::string line; std::getline(input, line); )
{
Entry e = parse(line);
std::cout << "Found the following entries:\n"
<< " Operation: " << e.op << "\n"
<< " Device 1: " << e.dev1 << "-" << e.n1 << "\n"
<< " Device 2: " << e.dev2 << "-" << e.n2 << "\n";
}
}

What is an efficient method for extracting data from a string into a Map?

This is in C++. Let's say I have a string that looks like this "[05]some words here [13]some more words here [17]and so on"
I want to split this string into a Map<int, std::string> with the number as the key and the text up to the next code as the value. The brackets are to be completely ignored.
So far I've been getting by with the standard library and SDL (I'm making a small game), but I'm willing to install boost or any other library that would help.
My first thought was either to use some of Boost's regex functions for a kind of regex find and replace, or simply to convert it to a char array and go through every character, looking for the brackets and recording the number inside. That seems like it would be inefficient, though, especially since I'm sure there's a popular method to do this in C++.
You can use a regex_token_iterator for this. Here's the basic idea:
#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <regex>
using namespace std;
map<int, string> extract( const std::string & s )
{
map<int, string> m;
static const regex r( "\\s*\\[(\\d+)\\]" );
sregex_token_iterator tok( s.begin(), s.end(), r, { -1, 1 } );
tok++; // Skip the (empty) text that precedes the first match.
for( sregex_token_iterator end; tok != end; )
{
int num = stoi( *tok, nullptr, 10 );
if( ++tok != end )
{
m.emplace( make_pair( num, *tok++ ) );
}
}
return m;
}
int main()
{
auto m = extract("[05]some words here [13]some more words here [17]and so on");
for( auto & p : m ) cout << p.first << ": '" << p.second << "'" << endl;
return 0;
}
Here, this is searching for and extracting the pattern \s*\[(\d+)\], which means it will drop any whitespace before the square brackets, and create a matching group to match at least one digit.
By using {-1, 1} on the iterator, we're asking for the iteration sequence to provide all text prior to the match, followed by matching group 1.
Output:
5: 'some words here'
13: 'some more words here'
17: 'and so on'
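To see what the { -1, 1 } selection actually produces, a small sketch like the one below can help. The helper raw_tokens is hypothetical; it uses the same pattern as the code above and simply collects every token the iterator yields.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the raw token sequence produced by { -1, 1 }.
std::vector<std::string> raw_tokens(const std::string& s)
{
    static const std::regex r("\\s*\\[(\\d+)\\]");
    std::vector<std::string> out;
    std::sregex_token_iterator tok(s.begin(), s.end(), r, { -1, 1 });
    for (std::sregex_token_iterator end; tok != end; ++tok)
        out.push_back(tok->str());
    return out;
}
```

For the input "[05]a [13]b" the sequence is an empty string (the text before the first match), then "05", "a", "13" and "b". The leading empty token is why the extraction code advances the iterator once before entering its loop.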
You can utilize substr() and find_first_of() to extract the actual data from a string as follows:
#include <string>
#include <iostream>
#include <map>
using std::string;
using std::cout;
using std::endl;
using std::map;
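map<int,string> StrToMap(const string& str)
{
map<int, string> temMap;
string::size_type open = str.find('[');
while (open != string::npos) {
string::size_type close = str.find(']', open);
if (close == string::npos)
break; // unterminated tag: stop
// the number between '[' and ']'
int idx = std::stoi(str.substr(open + 1, close - open - 1));
// the text runs up to the next '[' (or to the end of the string)
string::size_type next = str.find('[', close + 1);
string tempStr = str.substr(close + 1, next == string::npos ? string::npos : next - close - 1);
// drop the single space that separates the text from the next tag
if (!tempStr.empty() && tempStr[tempStr.size() - 1] == ' ')
tempStr.erase(tempStr.size() - 1);
temMap[idx] = tempStr;
open = next;
}
return temMap;
}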
int main(int argc, char* argv[])
{
map<int, string> temMap = StrToMap("[05]some words here [13]some more words here [17]and so on");
for (std::map<int, string>::const_iterator it=temMap.begin(); it!=temMap.end(); ++it)
std::cout << it->first << " " << it->second << '\n';
return 0;
}
The result is
5 some words here
13 some more words here
17 and so on
You can split the string on '[' characters and collect the parts in a vector. Then split each element of the vector into two parts (before ']' and after), convert the first part to a number, and put everything in a map. All of this can be done with standard library methods.
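A minimal sketch of that idea, assuming well-formed input in which every number is followed by ']'. The name split_tagged is invented here, and std::getline with a custom delimiter stands in for the intermediate vector.

```cpp
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>

// Splits on '[' first, then splits each chunk on ']' into number and text.
std::map<int, std::string> split_tagged(const std::string& s)
{
    std::map<int, std::string> result;
    std::istringstream chunks(s);
    std::string chunk;
    while (std::getline(chunks, chunk, '[')) {
        std::string::size_type close = chunk.find(']');
        if (close == std::string::npos)
            continue;                   // text before the first '[' (or malformed)
        int key = std::atoi(chunk.substr(0, close).c_str());
        std::string text = chunk.substr(close + 1);
        // Trim the blanks that separated this text from the next '['.
        std::string::size_type last = text.find_last_not_of(' ');
        text.erase(last == std::string::npos ? 0 : last + 1);
        result[key] = text;
    }
    return result;
}
```

Each chunk is scanned once, so the work is linear in the length of the input string.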

Splitting a String in C++ (using cin)

I'm doing THIS UVa problem, which takes in the following input:
This is fun-
ny! Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
This is fun-
ny! Mr.P and I've never seen
this ice-cream flavour
before.Crazy eh?
#
and produces this output:
1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1
1 1
2 3
3 2
4 3
5 3
6 1
7 1
8 1
In the input, # divides the cases. I'm supposed to get the length of each word and count the frequency of each different length (as you see in the output, a word of length 1 occurs once, length 2 occurs three times, 3 occurs twice, and so on).
My problem is this: When reading in cin, before.Crazy is counted as one word, since there is no space dividing them. It should then be as simple as splitting the string on certain punctuation ({".",",","!","?"} for example)...but C++ seems to have no simple way to split the string.
So, my question: How can I split the string and send in each returned string to my function that handles the rest of the problem?
Here's my code:
int main()
{
string input="";
while(cin.peek()!=-1)
{
while(cin >> input && input!="#")
{
lengthFrequency(input);
cout << input << " " << input.length() << endl;
}
if(cin.peek()!=-1) cout << endl;
lengthFrequencies.clear();
}
return 0;
}
lengthFrequencies is a map<int,int>.
You can redefine what a stream considers to be a whitespace character using a std::locale with a custom std::ctype<char> facet. Here is corresponding code which doesn't quite do the assignment but demonstrates how to use the facet:
#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
struct ctype
: std::ctype<char>
{
typedef std::ctype<char> base;
static base::mask const* make_table(char const* spaces,
base::mask* table)
{
base::mask const* classic(base::classic_table());
std::copy(classic, classic + base::table_size, table);
for (; *spaces; ++spaces) {
table[int(*spaces)] |= base::space;
}
return table;
}
ctype(char const* spaces)
: base(make_table(spaces, table))
{
}
base::mask table[base::table_size];
};
int main()
{
std::cin.imbue(std::locale(std::locale(), new ctype(".,!?")));
for (std::string s; std::cin >> s; ) {
std::cout << "s='" << s << "'\n";
}
}
Before counting the frequencies, you could parse the input string and replace all the {".",",","!","?"} characters with spaces (or whatever separation character you want to use). Then your existing code should work.
You may want to handle some characters differently. For example, in the case of before.Crazy you would replace the '.' with a space, but for something like 'ny! ' you would remove the '!' altogether because it is already followed by a space.
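As a rough sketch of the replace-then-tokenize idea: the function names are made up, and the hyphen and apostrophe rules of the actual problem are deliberately left out. Because operator>> collapses runs of whitespace, turning the '!' in 'ny! ' into a blank is harmless here, so no special case is needed for punctuation that is already followed by a space.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>

// True for the punctuation that should separate words.
bool is_separator(char c)
{
    return c == '.' || c == ',' || c == '!' || c == '?';
}

// Hypothetical helper: maps word length -> number of words of that length.
std::map<int, int> length_frequencies(std::string text)
{
    // Turn every separator into a blank so operator>> splits on it.
    std::replace_if(text.begin(), text.end(), is_separator, ' ');
    std::map<int, int> counts;
    std::istringstream words(text);
    for (std::string word; words >> word; )
        ++counts[static_cast<int>(word.length())];
    return counts;
}
```

For "before.Crazy eh?" this counts one word each of lengths 6, 5 and 2.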
How about this (using the STL, comparators and functors)?
NOTE: All assumptions and explanations are in the source code itself.
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
#include <algorithm>
#include <cctype>
#include <utility>
#include <string.h>
bool compare (const std::pair<int, int>& l, const std::pair<int, int>& r) {
return l.first < r.first;
}
//functor/unary predicate:
struct CompareFirst {
CompareFirst(int val) : val_(val) {}
bool operator()(const std::pair<int, int>& p) const {
return (val_ == p.first);
}
private:
int val_;
};
int main() {
char delims[] = ".,!?";
char noise[] ="-'";
//I'm assuming you've read the text from some file, and that information has been stored in a string. Or, the information is a string (like below):
std::string input = "This is fun-\nny, Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\nThis is fun-\nny! Mr.P and I've never seen\nthis ice-cream flavour\nbefore.Crazy eh?\n#\n";
std::istringstream iss(input);
std::string temp;
//first split the string by #
while(std::getline(iss, temp, '#')) {
//find all the occurrences of hyphens that end a line, and remove the newline:
std::string::size_type begin = 0;
while(std::string::npos != (begin = temp.find('-', begin))) {
//look at the character right after the current hyphen and erase it if it's a newline
if (temp[begin+1] == '\n') {
temp.erase(begin+1, 1);
}
++begin;
}
//now, erase all the `noise` characters ("-'"), since this punctuation counts as zero length
for (int i = 0; i < strlen(noise); ++i) {
//this replaces all the hyphens and apostrophes with nothing
temp.erase(std::remove(temp.begin(), temp.end(), noise[i]), temp.end());
}//at this point, everything is ready for the remaining substitutions
//now remove any other delimiter characters by replacing them with spaces
for (int i = 0; i < strlen(delims); ++i) {
std::replace(temp.begin(), temp.end(), delims[i], ' ');
}
std::vector<std::pair<int, int> > occurences;
//initialize another input stringstream to make use of the whitespace
std::istringstream ss(temp);
//now use the whitespace to tokenize
while (ss >> temp) {
//try to find the token's size in the occurences
std::vector<std::pair<int, int> >::iterator it = std::find_if(occurences.begin(), occurences.end(), CompareFirst(temp.size()));
//if found, increment count by 1
if (it != occurences.end()) {
it->second += 1;//increment the count
}
//this is the first time it has been created. Store value, and a count of 1
else {
occurences.push_back(std::make_pair<int, int>(temp.size(), 1));
}
}
//now sort and output:
std::stable_sort(occurences.begin(), occurences.end(), compare);
for (int i = 0; i < occurences.size(); ++i) {
std::cout << occurences[i].first << " " << occurences[i].second << "\n";
}
std::cout << "\n";
}
return 0;
}
91 lines, and all vanilla C++98.
A rough outline of what I did is:
1) Since hyphens occur across two lines, find all hyphens and remove any newlines that follow them.
2) Some characters don't add to the length of a word, such as the hyphen in legitimately hyphenated words and the apostrophe. Find these and erase them, as it makes tokenizing easier.
3) All the remaining delimiters can now be found and replaced with whitespace. Why? Because we can use the whitespace to our advantage by using streams (whose default action is to skip whitespace).
4) Create a stream and tokenize the text via whitespace, as per the previous step.
5) Store the lengths of the tokens and their occurrences.
6) Sort the lengths of the tokens, and then output each token length and its corresponding number of occurrences.
REFERENCES:
https://stackoverflow.com/a/5815875/866930
https://stackoverflow.com/a/12008126/866930

converting individual string elements to their decimal equivalents in c++

I have a string str ( "1 + 2 = 3" ). I want to obtain the individual numbers of the string in their decimal values( not ASCII ). I have tried atoi and c_str(). But both them require the entire string to consist of only numbers. I am writing my code in C++.
Any help would be great.
My challenge is to evaluate a prefix expression. I am reading from a file where each line contains a prefix expression. My code snippet to tokenize and store the variables is shown below. Each line of the file contains numbers and operators (+, -, *) which are separated by a space.
Ex - line = ( * + 2 3 4);
ifstream file;
string line;
file.open(argv[1]);
while(!file.eof())
{
getline(file,line);
if(line.length()==0)
continue;
else
{
vector<int> vec;
string delimiters = " ";
size_t current;
size_t next = -1;
do
{
current = next + 1;
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
vec.push_back(atoi((line.substr( current, next - current )).c_str()));
}while (next != string::npos);
cout << vec[0] << endl;
}
}
file.close();
In this case vec[0] prints 50 not 2.
You need to learn how to delimit (tokenize) a string. Your delimiting characters would be the mathematical operators. See, for example:
C: creating array of strings from delimited source string
http://www.gnu.org/software/libc/manual/html_node/Finding-Tokens-in-a-String.html
In the case of the second link, you would do something like:
const char delimiters[] = "+-=";
With this knowledge, you can create an array of strings, and call atoi() on each string to get the numeric equivalent. Then you can use the address (array index) of each delimiter to determine which operator is there.
For just things like addition and subtraction, this will be dead simple. If you want order of operations and multiplication, parentheses, etc, your process flow logic will be more complicated.
For a more in-depth example, please see this final link. A simple command-line calculator in C. That should make it crystal clear.
http://stevehanov.ca/blog/index.php?id=26
You will not fall into your if, since your next position will be at a delimiter.
string delimiters = " ";
...
next = line.find_first_of( delimiters, current );
if((line[next] <=57)&&(line[next] >=48))
...
Since your delimiters consist of " ", then line[next] will be a space character.
From the description of your problem, you are missing code that will save away your operators. There is no code to attempt to find the operators.
You don't have to assume ASCII when testing for a digit. You can use isdigit(), for example, or you can compare against '0' and '9'.
When you print your vector element, you may be accessing the vector inappropriately, because no item may ever have been inserted into the vector.
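To make these points concrete, here is a sketch of how the question's loop could be restructured so that it inspects the first character of each token instead of the delimiter. The function name numbers_in is hypothetical, and only non-negative integers are recognized; operators are simply skipped rather than stored.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical rewrite of the question's loop: collects the non-negative
// integers from a space-separated line and ignores every other token.
std::vector<int> numbers_in(const std::string& line)
{
    std::vector<int> vec;
    const std::string delimiters = " ";
    std::string::size_type current = line.find_first_not_of(delimiters);
    while (current != std::string::npos) {
        std::string::size_type next = line.find_first_of(delimiters, current);
        // Inspect the first character of the token, not the delimiter.
        if (std::isdigit(static_cast<unsigned char>(line[current])))
            vec.push_back(std::atoi(line.substr(current, next - current).c_str()));
        current = line.find_first_not_of(delimiters, next);
    }
    return vec;
}
```

For the line "* + 2 3 4" the returned vector holds 2, 3 and 4, so the first element is the number 2 rather than a character code. Callers should still check that the vector is not empty before indexing into it.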
Don't use file.eof() to control a loop. That function is only useful after a read has failed.
There are a number of ways to get ints from a std::string, I'm choosing std::stoi() from the C++11 standard in this case.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
typedef std::vector<int> ints;
bool is_known_operator(std::string const& token)
{
static char const* tokens[] = {"*", "/", "+", "-"};
return std::find(std::begin(tokens), std::end(tokens), token) != std::end(tokens);
}
ints tokenise(std::string const& line)
{
ints vec;
std::string token;
std::istringstream iss(line);
while (iss >> token)
{
if (is_known_operator(token))
{
std::cout << "Handle operator [" << token << "]" << std::endl;
}
else
{
try
{
auto number = std::stoi(token);
vec.push_back(number);
}
catch (const std::invalid_argument&)
{
std::cerr << "Unexpected item in the bagging area ["
<< token << "]" << std::endl;
}
}
}
return vec;
}
int main(int, const char *argv[])
{
std::ifstream file(argv[1]);
std::string line;
ints vec;
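while (std::getline(file, line))
{
// append this line's numbers instead of overwriting earlier ones
ints numbers = tokenise(line);
vec.insert(vec.end(), numbers.begin(), numbers.end());
}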
std::cout << "The following " << vec.size() << " numbers were read:\n";
std::copy(vec.begin(), vec.end(), std::ostream_iterator<int>(std::cout, "\n"));
}