Reading a file of mixed data into a C++ string - c++

I need to use C++ to read in text with spaces, followed by a numeric value.
For example, data that looks like:
text1
1.0
text two
2.1
text2 again
3.1
can't be read in with 2 "infile >>" statements. I'm not having any luck with getline
either. I ultimately want to populate a struct with these 2 data elements. Any ideas?

The standard IO library isn't going to do this for you alone, you need some sort of simple parsing of the data to determine where the text ends and the numeric value begins. If you can make some simplifying assumptions (like saying there is exactly one text/number pair per line, and minimal error recovery) it wouldn't be too bad to getline() the whole thing into a string and then scan it by hand. Otherwise, you're probably better off using a regular expression or parsing library to handle this, rather than reinventing the wheel.

Why? You can call getline with a space as the delimiter, then stitch the extracted parts back together until the next part is a number.

If you can be sure that your input is well-formed, you can try something like this sample:
#include <iostream>
#include <sstream>
#include <string>
int main()
{
    std::istringstream iss("text1 1.0 text two 2.1 text2 again 3.1");
    for ( ;; )
    {
        double x;
        if ( iss >> x )
        {
            // The next token parsed as a number
            std::cout << x << std::endl;
        }
        else
        {
            // Not a number: reset the error state and discard one token
            iss.clear();
            std::string junk;
            if ( !(iss >> junk) )
                break;
        }
    }
}
If you do have to validate input (instead of just trying to parse anything looking like a double from it), you'll have to write some kind of parser, which is not hard but boring.

Pseudocode.
This should work. It assumes you have text/numbers in pairs, however. You'll have to do some jimmying to get all the typing happy, also.
while( ! eof)
getline(textbuffer)
getline(numberbuffer)
stringlist = tokenize(textbuffer)
number = atof(numberbuffer)
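The pseudocode above could be fleshed out roughly like this. It is only a minimal sketch, assuming the file strictly alternates one line of text and one line holding a number, as in the question's sample. The names Record and readRecords are made up for illustration, and the tokenize step is left out because the question only asks for the text and the number.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>   // only needed if you feed the function from a string
#include <string>
#include <vector>

// Hypothetical record type: one line of text plus the number that follows it.
struct Record {
    std::string text;
    double value;
};

// Reads alternating text/number lines. Stops as soon as either line of a
// pair is missing. Assumes well-formed input: a number line that does not
// start with a number simply yields 0.0 (that is how strtod behaves).
std::vector<Record> readRecords(std::istream& in) {
    std::vector<Record> records;
    std::string textLine;
    std::string numberLine;
    while (std::getline(in, textLine) && std::getline(in, numberLine)) {
        records.push_back({textLine, std::strtod(numberLine.c_str(), nullptr)});
    }
    return records;
}
```

Testing the result of getline directly in the loop condition avoids the usual problems with looping on eof. You can pass a std::ifstream, or a std::istringstream when trying it out.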

Related

In c++, how do you get the input of a string, float and integer from 1 line?

An input file is entered with the following data:
Juan Dela Cruz 150.50 5
'Juan Dela Cruz' is a name that I would like to assign to string A,
'150.50' is a number I would like to assign to float B
and 5 is a number I would like to assign to int C.
If I try cin, it is delimited by the spaces in between.
If I use getline, it's getting the whole line as a string.
What would be the correct syntax for this?
If we analyze the string, we can make the following observation. At the very end, we have an integer. In front of the integer there is a space, and in front of that the float value. In front of the float there is again a space.
So, we can simply look from the back of the string for the 2nd last space. This can easily be achieved by
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
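We need nested calls to rfind: the inner call finds the last space, and the outer call searches backward starting from the character just before it (see the rfind reference, version no. 3).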
Then we build a substring with the name. From start of the string up to the found position.
For the numbers, we put the rest of the original string into an std::istringstream and then simply extract from there.
Please see the following simple code, which has just a few lines of code.
#include <iostream>
#include <string>
#include <cctype>
#include <sstream>
int main() {
// This is the string that we read via getline or whatever
std::string lineFromeFile("Juan Dela Cruz 150.50 5");
// Let's search for the 2nd last space
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
// Get the name as a substring from the original string
std::string name = lineFromeFile.substr(0, position);
// Put the numbers in a istringstream for better extraction
std::istringstream iss(lineFromeFile.substr(position));
// Get the rest of the values
float fValue;
int iValue;
iss >> fValue >> iValue;
// Show result to user
std::cout << "\nName:\t" << name << "\nFloat:\t" << fValue << "\nInt:\t" << iValue << '\n';
return 0;
}
Probably simplest in this case would be to read whole line into string and then parse it with regex:
const std::regex reg("\\s*(\\S.*)\\s+(\\d+(\\.\\d+)?)\\s+(\\d+)\\s*");
std::smatch match;
if (std::regex_match( input, match, reg)) {
auto A = match[1];
auto B = std::stof( match[2] );
auto C = std::stoi( match[4] );
} else {
// error invalid format
}
As always when the input does not (or sometimes does not) match a strict enough syntax, read the whole line and then apply the rules which to a human are "obvious".
In this case (quoting comment by john):
Read the whole string as a single line. Then analyze the string to work out where the breaks are between A, B and C. Then convert each part to the type you require.
Specifically, you probably want to use reverse searching functions (e.g. https://en.cppreference.com/w/cpp/string/byte/strrchr ), because the last parts of the input seem the most strictly formatted, i.e. easiest to parse. The rest is then the unpredictable part at the start.
Either put each value on its own line and use the line breaks to separate the different data types, or add something that distinguishes them, such as a . or a comma between values.
Use the same symbol after each data item, for example Juan Dela Cruz;150.50;5, then you can look for a ; and split your string there.
If you want to keep the same input format, you could use the digits as the indicator for where the name ends and the numbers begin.
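For the delimiter idea, here is a rough sketch, assuming every line has exactly the form name;float;int as in Juan Dela Cruz;150.50;5. Entry and parseEntry are names invented for this example, and the validation is deliberately minimal.

```cpp
#include <sstream>
#include <string>

// Hypothetical target struct for one input line.
struct Entry {
    std::string name;
    float amount;
    int count;
};

// Splits "name;float;int" on ';' and converts the two numeric fields.
// Returns false if a field is missing or a number cannot be parsed.
bool parseEntry(const std::string& line, Entry& out) {
    std::istringstream iss(line);
    std::string amountField;
    std::string countField;
    if (!std::getline(iss, out.name, ';')) return false;
    if (!std::getline(iss, amountField, ';')) return false;
    if (!std::getline(iss, countField)) return false;

    std::istringstream amountStream(amountField);
    std::istringstream countStream(countField);
    if (!(amountStream >> out.amount)) return false;
    if (!(countStream >> out.count)) return false;
    return true;
}
```

Because the name is read up to the first ;, it can contain any number of spaces.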

trying to read and store data from a file where there are multiple spaces between each intended data

Here's some simple code showing how I read and store the data. I have a text file, and I want to read its contents into number and text. The code runs fine if the text file contains something like 2 HelloWorld1: 2 is stored into number and HelloWorld1 is stored into text.
But what if the text in the file is 2 Hello World 1, with spaces between Hello, World and 1? Would it be possible for 2 to be stored in number and Hello World 1 to be stored in text? I understand that, because of the spaces, only 2 and Hello are stored in number and text respectively. Is there a way to overcome this?
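#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(){
    ifstream theFile("key.txt");
    int number;
    string text;
    // >> stops at whitespace, so text only receives a single word
    while(theFile>>number>>text){
        cout<<number<<" and "<<text<<endl;
    }
}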
You are out of luck with the default stream operator >> (if that is indeed your case).
1: Know the format
The way forward is to know the format which judging from your post you are somewhat uncertain about.
2: Use the best tools for the job
After that, you choose the right tool for the job. That could involve std::getline and hand parsing, perhaps a regex (in your case, a fairly simple one), boost::spirit, tokenization techniques, boost::string_algo, lex/bison, and more.
I would add that customizing stream operator functionality (while possible) is rarely the straightforward choice.
3: Design your format to match
As an alternative to knowing the format, if you can design it, so much the better. If you have a record-style format, an easy way to handle strings with spaces is to put the string last, then put each record on its own line. That way you can first read each line using e.g. std::getline and then just use stream operators for the rest, knowing the string will come last. Delimiters other than linefeed are certainly doable as well.
I would like to add an example to the very good answer of @darune.
It all depends on the input format.
Assuming that your line starts with a number and then ends with a string, you can use the following approach:
First read a number with extractor operator >>
Use getline to read the rest
Please see:
#include <iostream>
#include <string>
#include <sstream>
#include <cctype>
#include <algorithm>
#include <regex>
std::istringstream testData (
R"#(1 data1
2 data2 data3
3 data 4
)#");
int main()
{
// Definition of variables
int number{};
std::string str{};
// Read file
// Read the number
while (testData >> number)
{
// Read the rest of the line in a string
getline(testData, str);
// Remove leading and trailing spaces and collapse runs of spaces into one
str = std::regex_replace(str, std::regex("^ +| +$|( ) +"), "$1");
// Show result
std::cout << number << ' ' <<str << '\n';
}
return 0;
}
Result:
1 data1
2 data2 data3
3 data 4
But as said, it strongly depends on the input format

c++ if(cin>>input) doesn't work properly in while loop

I'm new to C++ and I'm trying to solve exercise 6 from chapter 4 of Bjarne Stroustrup's book "Programming: Principles and Practice Using C++", and I don't understand why my code doesn't work.
The exercise:
Make a vector holding the ten string values "zero", "one", ...,
"nine". Use that in a program that converts a digit to its
corresponding spelled-out value: e.g., the input 7 gives the output
seven. Have the same program, using the same input loop, convert
spelled-out numbers into their digit form; e.g., the input seven gives
the output 7.
My loop only executes one time for a string and one time for an int, the loop seems to continue but it doesn't matter which input I'm giving, it doesn't do what it's supposed to do.
One time it worked for multiple int inputs, but only every second time. It's really weird and I don't know how to solve this in a different way.
It would be awesome if someone could help me out.
(I'm also not a native speaker, so sorry, if there are some mistakes)
The library in this code is a library provided with the book, to make the beginning easier for us noobies I guess.
#include "std_lib_facilities.h"
int main()
{
vector<string>s = {"zero","one","two","three","four","five","six","seven","eight","nine"};
string input_string;
int input_int;
while(true)
{
if(cin>>input_string)
{
for(int i = 0; i<s.size(); i++)
{
if(input_string == s[i])
{
cout<<input_string<<" = "<<i<<"\n";
}
}
}
if(cin>>input_int)
{
cout<<input_int<<" = "<<s[input_int]<<"\n";
}
}
return 0;
}
When you (successfully) read input from std::cin, the input is extracted from the buffer. The input in the buffer is removed and can not be read again.
And when you first read as a string, that will read any possible integer input as a string as well.
There are two ways of solving this:
Attempt to read as int first. And if that fails clear the errors and read as a string.
Read as a string, and try to convert to an int. If the conversion fails you have a string.
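The first option might look something like the sketch below. It assumes whitespace-separated tokens on any input stream, and convertAll is a name made up for this example. It collects the converted values in a vector rather than printing them, just to keep it easy to check.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // only needed if you feed the function from a string
#include <string>
#include <vector>

// Converts every token: a digit 0..9 becomes its spelled-out name,
// a spelled-out name becomes its digit. Anything else is skipped.
std::vector<std::string> convertAll(std::istream& in) {
    const std::vector<std::string> names = {"zero", "one", "two", "three", "four",
                                            "five", "six", "seven", "eight", "nine"};
    std::vector<std::string> results;
    for (;;) {
        int digit;
        if (in >> digit) {
            // Reading as int succeeded
            if (digit >= 0 && digit < static_cast<int>(names.size()))
                results.push_back(names[digit]);
            continue;
        }
        // Reading as int failed: clear the error flags and retry as a string
        in.clear();
        std::string word;
        if (!(in >> word))
            break;  // nothing left to read at all
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (names[i] == word)
                results.push_back(std::to_string(i));
        }
    }
    return results;
}
```

The important part is the call to clear() between the two attempts. Without it the stream stays in the failed state and every later read fails too, which is what makes the loop in the question appear to do nothing.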
if(cin >> input) doesn't work properly in while loop?
A possible implementation of the input of your program would look something like:
std::string sentinel = "|";
std::string input;
// read whole line, then check if exit command
while (getline(std::cin, input) && input != sentinel)
{
// use string stream to check whether input digit or string
std::stringstream ss(input);
// if string, convert to digit
// else if digit, convert to string
// else clause containing a check for invalid input
}
To discriminate between int and string value you could use peek(), for example.
Preferably the last two actions of conversion (between int and string) are done by separate functions.
Assuming the inclusion of the headers:
#include <iostream>
#include <sstream>
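As a rough illustration of the peek() idea, here is a sketch of a function that handles a single line. convertLine is a hypothetical helper, not something from the book's library, and it returns an empty string for anything it does not recognise.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Converts one line of input: "7" -> "seven", "seven" -> "7".
// Returns an empty string for invalid input.
std::string convertLine(const std::string& line) {
    static const std::vector<std::string> names = {"zero", "one", "two", "three", "four",
                                                   "five", "six", "seven", "eight", "nine"};
    std::stringstream ss(line);
    ss >> std::ws;  // skip leading whitespace so peek() sees the first real character
    const int next = ss.peek();
    if (next == std::char_traits<char>::eof())
        return "";  // empty or all-whitespace line

    if (std::isdigit(next)) {
        // Starts with a digit: treat the input as a number
        int digit;
        if (ss >> digit && digit < static_cast<int>(names.size()))
            return names[digit];
        return "";
    }

    // Otherwise treat it as a spelled-out number
    std::string word;
    ss >> word;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == word)
            return std::to_string(i);
    }
    return "";
}
```

You would call this from inside the getline loop shown above and print the result, or an error message if the result is empty.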

C++ fstream - problems reading only certain variable types

I am using fstream to read a notepad file containing numerical data. I am using dynamic memory allocation and the data is type float.
However there is rogue data in the form of characters in my file - how would I write code that searches for and ignores the characters in the file, and only reads in the numbers?
I am assuming I will need to use either ignore or peek?
fstream myfile("data1");
myfile.ignore ();
or myfile.peek ();
But am a bit unsure. Any help is appreciated!
If the file always has this format, with words and numbers separated by whitespace, you can simply read it one string at a time and let a std::istringstream do the parsing. When the extraction fails, you know the token is not a number:
std::string word;
while (myfile >> word) {
std::istringstream is(word);
double d;
if (is >> d) {
// found a number
std::cout << d << '\n';
} else {
// something's wrong
std::cerr << word << '\n';
}
}
Update:
Due to popular demand: a stringstream works like any other stream (std::cin, std::cout or std::fstream). The main difference is that a stringstream operates on strings. This means input comes from a string instead of a file or standard input, or output goes to a string, much like it goes to standard output or a file.
Parsing input like this typically requires that you extract the tokens into a string and test the content of the string against your parsing requirements. For example, after extracting into the string, you can run a function which inserts it into a std::stringstream, then extracts into the data type you're testing against, and see whether that succeeds.
Another option is to check if the string is not a certain string, and convert back to the desired data type if so:
while (f >> str)
{
    // compare the extracted token, not the stream itself
    if (str != "badInput")
    {
        // convert to double and add to array
    }
}
Fortunately you can use the Boost.Regex facilities to avoid having to do most of the work yourself. Here's an example similar to yours:
#include <boost/regex.hpp>
#include <fstream>
#include <string>
int main()
{
    std::fstream f("test.txt");
    std::string token;
    // optional sign, optional integer part, optional dot, then digits
    boost::regex floatingPoint("((\\+|-)?[0-9]+)?(\\.)?([0-9]+)");
    while (f >> token)
    {
        if (boost::regex_match(token, floatingPoint))
        {
            // convert to double using lexical_cast<> and add to array
        }
    }
}
Thanks for the help everybody - But all this seems a bit advanced for someone of my poor capability! We have been suggested to use the following - but am unsure how you would do this to distinguish between words and numbers:
fstream myfile("data1");
myfile.eof ();
myfile.good ();
myfile.fail ();
myfile.clear ();
myfile.ignore ();
myfile.close ();
myfile.peek ();
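For what it's worth, those member functions are enough on their own. Below is a minimal sketch, assuming the words and numbers are separated by whitespace: try to read a float, and when that fails use clear(), peek() and ignore() to throw away the offending word. readNumbers is a name made up for this example, and it returns a std::vector instead of using manual dynamic allocation.

```cpp
#include <cctype>
#include <istream>
#include <sstream>   // only needed if you feed the function from a string
#include <vector>

// Collects every float in the stream and skips everything else.
std::vector<float> readNumbers(std::istream& in) {
    std::vector<float> values;
    float value;
    for (;;) {
        if (in >> value) {
            values.push_back(value);  // good(): we got a number
            continue;
        }
        if (in.eof())
            break;                    // ran out of input
        // fail(): the next token is not a number.
        in.clear();                   // reset the error flags first
        // Then discard characters one at a time up to the next whitespace.
        for (int next = in.peek();
             next != std::istream::traits_type::eof() && !std::isspace(next);
             next = in.peek()) {
            in.ignore();
        }
    }
    return values;
}
```

It works with the fstream from the question as well as with a stringstream. A token that starts with digits, such as 12abc, would still give you 12 before the rest is skipped, so this is only suitable if that is acceptable for your data.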

How to read a file and get words in C++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So, to cut a long story short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
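#include <iostream>
#include <iterator>
#include <string>
// my_file_stream is assumed to be an already-open input stream
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt = 0;
for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
    std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}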
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
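To illustrate the point about plugging into generic code, here is a small sketch. readWords and countLongWords are hypothetical helpers: the first builds a vector straight from a pair of stream iterators, the second hands the words to std::count_if from <algorithm>.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // only needed if you feed the functions from a string
#include <string>
#include <vector>

// Builds a vector of words directly from a pair of stream iterators.
std::vector<std::string> readWords(std::istream& in) {
    std::istream_iterator<std::string> first(in), last;
    return std::vector<std::string>(first, last);
}

// Counts the words that have at least minLength characters.
std::size_t countLongWords(const std::vector<std::string>& words, std::size_t minLength) {
    return static_cast<std::size_t>(
        std::count_if(words.begin(), words.end(),
                      [minLength](const std::string& w) { return w.size() >= minLength; }));
}
```

Punctuation stays attached to the words here, so "day;" is read as a single word.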
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
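If the punctuation mentioned in the question is a concern, one possible extension is to strip it from each word after reading. This is only a sketch: readCleanWords is a made-up name, and it removes every character that std::ispunct reports, so a date like 06/05/1992 ends up as 06051992, which may or may not be what you want.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <sstream>   // only needed if you feed the function from a string
#include <string>
#include <vector>

// Reads whitespace-separated words and removes punctuation from each one.
// Words that consist only of punctuation are dropped.
std::vector<std::string> readCleanWords(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        word.erase(std::remove_if(word.begin(), word.end(),
                                  [](unsigned char c) { return std::ispunct(c) != 0; }),
                   word.end());
        if (!word.empty())
            words.push_back(word);
    }
    return words;
}
```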
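You can use getline with a space character as the delimiter, e.g. stream.getline(buffer, 1000, ' '), where buffer is a char array with room for at least 1000 characters. Note that with a space as the delimiter, newlines are no longer treated as separators.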
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
You can use the scanner technique to grab words, numbers, dates etc. It is very simple and flexible. The scanner normally returns tokens (word, number, real, keyword etc.) to a parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)