Trying to read and store data from a file where there are multiple spaces between the intended data items - C++

Here's the simple code I use to read and store the data. I have a text file whose contents I want to read into two variables, number and text. The code runs fine if the text file contains a line such as 2 HelloWorld1: 2 is stored in number and HelloWorld1 is stored in text.
But what if the line in the text file is 2 Hello World 1, with spaces between Hello, World and 1? Is it possible for 2 to be stored in number and Hello World 1 to be stored in text? I understand that, because of the spaces, only 2 and Hello are stored in number and text respectively. Is there a way to overcome this?
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main(){
ifstream theFile("key.txt");
int number;
string text;
while(theFile>>number>>text){
cout<<number<<" and "<<text<<endl;
}
}

You are out of luck with the default stream operator >> (if that is indeed your case).
1: Know the format
The way forward is to know the format which judging from your post you are somewhat uncertain about.
2: Use the best tools for the job
After that you choose the right tool for the job. That could involve: std::getline and hand parsing, perhaps using a regex (in your case, fairly simple ones), boost::spirit, tokenization techniques, boost::string_algo, lex/bison and more.
I would add that customizing stream operator functionality (while possible) is rarely the straightforward choice.
3: Design your format to match
As an alternative to knowing the format, if you can design it, so much the better. If you have a record-style format, an easy way to handle strings with spaces is to put the string last and then put each record on its own line. That way you can first go over each line using e.g. std::getline and then just use stream operators for the rest, knowing the string will come last. Delimiters other than linefeed are certainly doable as well.
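The "one record per line" idea can be sketched roughly as follows. This is only an illustration, not the answerer's code: it assumes each line starts with an integer followed by free text, and the name readRecords is made up for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads records of the form "<number> <text...>",
// one record per line, where the text may itself contain spaces.
std::vector<std::pair<int, std::string>> readRecords(std::istream& in) {
    std::vector<std::pair<int, std::string>> records;
    std::string line;
    while (std::getline(in, line)) {            // one record per line
        std::istringstream fields(line);
        int number{};
        if (!(fields >> number)) continue;      // skip lines without a leading number
        std::string text;
        std::getline(fields >> std::ws, text);  // skip leading spaces, take the rest
        records.emplace_back(number, text);
    }
    return records;
}
```

Passing an std::ifstream opened on key.txt to readRecords should give one (number, text) pair per line; a line such as 2 Hello World 1 would yield 2 and Hello World 1.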

I would like to add an example to the very good answer of @darune.
It all depends on the input format.
Assuming that your line starts with a number and then ends with a string, you can use the following approach:
First read a number with extractor operator >>
Use getline to read the rest
Please see:
#include <iostream>
#include <string>
#include <sstream>
#include <cctype>
#include <algorithm>
#include <regex>
std::istringstream testData (
R"#(1 data1
2 data2 data3
3 data 4
)#");
int main()
{
// Definition of variables
int number{};
std::string str{};
// Read file
// Read the number
while (testData >> number)
{
// Read the rest of the line in a string
getline(testData, str);
// Remove leading and trailing spaces and collapse runs of spaces into one
str = std::regex_replace(str, std::regex("^ +| +$|( ) +"), "$1");
// Show result
std::cout << number << ' ' <<str << '\n';
}
return 0;
}
Result:
1 data1
2 data2 data3
3 data 4
But, as said, it strongly depends on the input format.

Related

In c++, how do you get the input of a string, float and integer from 1 line?

An input file is entered with the following data:
Juan Dela Cruz 150.50 5
'Juan Dela Cruz' is a name that I would like to assign to string A,
'150.50' is a number I would like to assign to float B
and 5 is a number I would like to assign to int C.
If I try cin, it is delimited by the spaces in between.
If I use getline, it's getting the whole line as a string.
What would be the correct syntax for this?
If we analyze the string, then we can make the following observation. At the very end, we have an integer. In front of the integer we have a space. And in front of that the float value. And again in front of that a space.
So, we can simply look from the back of the string for the 2nd last space. This can easily be achieved by
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
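We need two nested calls to rfind: the inner one finds the last space, and the outer one searches backward starting just before that position (the original answer linked to the rfind documentation, version no. 3).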
Then we build a substring with the name. From start of the string up to the found position.
For the numbers, we put the rest of the original string into an std::istringstream and then simply extract from there.
Please see the following simple code, which has just a few lines of code.
#include <iostream>
#include <string>
#include <cctype>
#include <sstream>
int main() {
// This is the string that we read via getline or whatever
std::string lineFromeFile("Juan Dela Cruz 150.50 5");
// Let's search for the 2nd last space
size_t position = lineFromeFile.rfind(' ', lineFromeFile.rfind(' ')-1);
// Get the name as a substring from the original string
std::string name = lineFromeFile.substr(0, position);
// Put the numbers in a istringstream for better extraction
std::istringstream iss(lineFromeFile.substr(position));
// Get the rest of the values
float fValue;
int iValue;
iss >> fValue >> iValue;
// Show result to user
std::cout << "\nName:\t" << name << "\nFloat:\t" << fValue << "\nInt:\t" << iValue << '\n';
return 0;
}
Probably the simplest approach in this case is to read the whole line into a string and then parse it with a regex:
const std::regex reg("\\s*(\\S.*)\\s+(\\d+(\\.\\d+)?)\\s+(\\d+)\\s*");
std::smatch match;
if (std::regex_match( input, match, reg)) {
auto A = match[1];
auto B = std::stof( match[2] );
auto C = std::stoi( match[4] );
} else {
// error invalid format
}
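For completeness, here is a self-contained sketch of the regex approach wrapped in a function. The pattern is the one from the snippet above; the struct ParsedLine and the function parseWithRegex are names invented for this illustration. Note that because the first group is greedy, extra spaces between the name and the float would end up at the end of the name.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for a line like "Juan Dela Cruz 150.50 5".
struct ParsedLine {
    std::string name;
    float amount = 0.0f;
    int count = 0;
};

// Returns true and fills 'out' if the line matches the expected format.
bool parseWithRegex(const std::string& input, ParsedLine& out) {
    static const std::regex reg("\\s*(\\S.*)\\s+(\\d+(\\.\\d+)?)\\s+(\\d+)\\s*");
    std::smatch match;
    if (!std::regex_match(input, match, reg)) {
        return false;                          // invalid format
    }
    out.name = match[1].str();                 // everything before the two numbers
    out.amount = std::stof(match[2].str());    // group 3 is the optional fraction
    out.count = std::stoi(match[4].str());
    return true;
}
```

A caller would read each line with std::getline and pass it to parseWithRegex, treating a false return value as a malformed line.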
As always when the input does not (or sometimes does not) match a strict enough syntax, read the whole line and then apply the rules which to a human are "obvious".
In this case (quoting comment by john):
Read the whole string as a single line. Then analyze the string to work out where the breaks are between A, B and C. Then convert each part to the type you require.
Specifically, you probably want to use reverse searching functions (e.g. https://en.cppreference.com/w/cpp/string/byte/strrchr ), because the last parts of the input seem the most strictly formatted, i.e. easiest to parse. The rest is then the unpredictable part at the start.
Either put the different data types on different lines and use the line breaks to separate them, or use a distinct delimiter, such as a . or a comma, to tell the values apart.
Use the same symbol after each data item, for example Juan Dela Cruz;150.50;5; then you can check for a ; and split your string there.
If you want to keep the same input format, you could use the digits as an indicator of where to separate the fields.
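The delimiter suggestion could look roughly like this. It is a sketch that assumes the input format was changed to name;float;int as in the example above; DelimitedRecord and parseDelimited are hypothetical names.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical record for a line like "Juan Dela Cruz;150.50;5".
struct DelimitedRecord {
    std::string name;
    float amount = 0.0f;
    int count = 0;
};

// Splits the line at ';' and converts the last two fields to numbers.
bool parseDelimited(const std::string& line, DelimitedRecord& out) {
    std::istringstream fields(line);
    std::string amountText, countText;
    if (!std::getline(fields, out.name, ';')) return false;    // up to first ';'
    if (!std::getline(fields, amountText, ';')) return false;  // up to second ';'
    if (!std::getline(fields, countText)) return false;        // rest of the line
    try {
        out.amount = std::stof(amountText);
        out.count = std::stoi(countText);
    } catch (const std::exception&) {
        return false;                                          // not a number
    }
    return true;
}
```

Because the name is cut off at the first ;, it may contain any number of spaces without confusing the parser.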

How to get more performance when reading file

My program downloads files from a site (via curl, every 30 minutes). The size of these files can reach 150 MB.
So I thought that getting data from these files could be inefficient (searching for a line every 5 seconds).
These files can have ~10.000 lines.
To parse such a file (values are separated by ","), I use a regex:
regex wzorzec("(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*)");
There are 8 values.
Now I have to push it to a vector:
allys.push_back({ std::stoi(std::string(wynik[1])), nick, tag, stoi(string(wynik[4])), stoi(string(wynik[5])), stoi(string(wynik[6])), stoi(string(wynik[7])), stoi(string(wynik[8])) });
I use std::async to do that, but for 3 files (~7 MB) CPU usage jumps to 80% and the operation takes about 10 seconds. I read from an SSD, so slow I/O is not to blame.
I'm reading the data line by line with fstream.
How can I speed up this operation?
Maybe I should parse these values and push them to SQL?
Best Regards
You can probably get some performance boost by avoiding regex, and use something along the lines of std::strtok, or else just hard-code a search for commas in your data. Regex has more power than you need just to look for commas. Next, if you use vector::reserve before you begin a sequence of push_back for any given vector, you will save a lot of time in both reallocation and moving memory around. If you are expecting a large vector, reserve room for it up front.
This may not cover all available performance ideas, but I'd bet you will see an improvement.
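As a rough sketch of the "hard-code a search for commas" idea combined with reserve: the function name splitCsvLine and the default of 8 expected fields (taken from the question) are assumptions for illustration, and quoted fields containing commas are deliberately not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one line at every ',' without using a regex.
std::vector<std::string> splitCsvLine(const std::string& line,
                                      std::size_t expectedFields = 8) {
    std::vector<std::string> fields;
    fields.reserve(expectedFields);            // avoid repeated reallocation
    std::size_t start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));   // last field
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;                     // continue after the comma
    }
    return fields;
}
```

The numeric fields can then be converted with std::stoi as before. The same reserve idea applies to the allys vector itself if the approximate number of lines is known in advance.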
Your problem here is most likely additional overhead introduced by the regular expression, since you're using many variable length and greedy matches (the regex engine will try different alignments for the matches to find the largest matching result).
Instead, you might want to try to manually parse the lines. There are many different ways to achieve this. Here's one quick and dirty example (it's not flexible and has quite some duplicate code in there, but there's lots of room for optimization). It should explain the basic idea though:
#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>
const char *input = "1,Mario,Stuff,4,5,6,7,8";
struct data {
int id;
std::string nick;
std::string tag;
} myData;
int main(int argc, char **argv){
char buffer[256];
std::istringstream in(input);
// Read an entry and convert/store it:
in.get(buffer, 256, ','); // read
myData.id = atoi(buffer); // convert and store
// Skip the comma
in.seekg(1, std::ios::cur);
// Read the next entry and convert/store it:
in.get(buffer, 256, ','); // read
myData.nick = buffer; // store
// Skip the comma
in.seekg(1, std::ios::cur);
// Read the next entry and convert/store it:
in.get(buffer, 256, ','); // read
myData.tag = buffer; // store
// Skip the comma
in.seekg(1, std::ios::cur);
// Some test output
std::cout << "id: " << myData.id << "\nnick: " << myData.nick << "\ntag: " << myData.tag << std::endl;
return 0;
}
Note that there isn't any error handling in case entries are too long or too short (or broken in some other way).
Console output:
id: 1
nick: Mario
tag: Stuff

What's the correct way to read a text file in C++?

I need to make a program in C++ that must read and write text files line by line with a specific format. The problem is that on my PC I work in Windows, while in college they have Linux, and I am having problems because line endings are different in these operating systems.
I am new to C++ and don't know how I could make my program able to read the files no matter whether they were written in Linux or Windows. Can anybody give me some hints? Thanks!
The input is like this:
James White 34 45.5 10 black
Miguel Chavez 29 48.7 9 red
David McGuire 31 45.8 10 blue
Each line being a record of a struct of 6 variables.
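Using the std::getline overload without the last (i.e. delimiter) parameter should take care of the platform's own end-of-line conversion automatically when the file is opened in text mode; note that a file with Windows line endings read on Linux will still leave a trailing "\r" on each line: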
std::ifstream in("TheFile.txt");
std::string line;
while (std::getline(in, line)) {
// Do something with 'line'.
}
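Here's a simple way to strip an extra "\r" from the end of a string:
std::ifstream in("TheFile.txt");
std::string line;
std::getline(in, line);
// Check for an empty line first, so that we never index before the start
if (!line.empty() && line[line.size() - 1] == '\r')
    line.resize(line.size() - 1);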
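The same idea can be packaged as a small wrapper around std::getline, sketched below. The names getlineAnyEol and splitLinesAnyEol are made up for this example, and the string-based helper exists only so the sketch can be tried without a file. Files that use a lone "\r" as the line ending are not handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: reads one line and drops a trailing '\r', so that
// files with "\r\n" and files with "\n" line endings give the same result.
std::istream& getlineAnyEol(std::istream& in, std::string& line) {
    if (std::getline(in, line) && !line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return in;
}

// Convenience helper for trying it out on an in-memory text.
std::vector<std::string> splitLinesAnyEol(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getlineAnyEol(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

With a file, the loop would be while (getlineAnyEol(in, line)) on an std::ifstream, in place of the plain std::getline loop.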
If you can already read the files, just check for all of the newline characters like "\n" and "\r". I originally thought that Linux uses "\r\n" as the newline character, but that is not the case.
You can read this page: http://en.wikipedia.org/wiki/Newline
and here is a list of all the ascii codes including the newline characters:
http://www.asciitable.com/
Edit: Linux uses "\n", Windows uses "\r\n", and classic Mac OS (before OS X) used "\r". Thanks to Seth Carnegie
Since the result will be CR LF, I would add something like the following to consume the extras if they exist. So once you have read your record, call this before trying to read the next one.
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
If you know the number of values you are going to read for each record you could simply use the ">>" method. For example:
std::fstream f("input.txt", std::ios::in);
std::string tempStr;
double tempVal;
for (int i = 0; i < numberOfRecords; ++i) { // numberOfRecords: the known record count
// read the first name
f >> tempStr;
// read the last name
f >> tempStr;
// read the number
f >> tempVal;
// and so on.
}
Shouldn't that suffice ?
Hi, I will give you the answer in stages. Please go through them in order to understand the code.
Stage 1: Design our program:
Our program based on the requirements should...:
...include a definition of a data type that would hold the data, i.e. our structure of 6 variables.
...provide user interaction, i.e. the user should be able to give the program the file name and its location.
...be able to open the chosen file.
...be able to read the file data and write/save them into our structure.
...be able to close the file after the data is read.
...be able to print out the saved data.
Usually you should split your code into functions representing the above.
Stage 2: Create an array of the chosen structure to hold the data
...
#define MAX 10
...
strPersonData sTextData[MAX];
...
Stage 3: Enable user to give in both the file location and its name:
.......
string sFileName;
cout << "Enter a file name: ";
getline(cin,sFileName);
ifstream inFile(sFileName.c_str(),ios::in);
.....
->Note 1 for stage 3. The user types the path exactly as it is, for example:
c:\SomeFolder\someTextFile.txt
Doubled backslashes (\\) are needed only when a path is written as a string literal in the source code, because there a single \ starts an escape sequence; input read at run time with getline is not processed that way.
->Note 2 for stage 3. We use ifstream, i.e. input file stream, because we want to read data from a file. Before C++11 its constructor expected the file name as a C-style string instead of a C++ string (since C++11 a std::string is accepted as well). For this reason we use:
..sFileName.c_str()..
Stage 4: Read all data of the chosen file:
...
while (inFile >> ...) { //we loop while another complete record can be read
...
}
...
So finally the code is as follows:
#include <iostream>
#include <fstream>
#include <string>
#define MAX 10
using namespace std;
int main()
{
string sFileName;
struct strPersonData {
char c1stName[25];
char c2ndName[30];
int iAge;
double dSomeData1; //I had no idea what the next 2 numbers represent in your code :D
int iSomeDate2;
char cColor[20]; //I don't remember the lengths of the different colors.. :D
};
strPersonData sTextData[MAX];
cout << "Enter a file name: ";
getline(cin,sFileName);
ifstream inFile(sFileName.c_str(),ios::in);
int i=0;
//loop while there is room in the array and another complete record can be read
//(testing eof() before reading would process one bogus record at the end of the file)
while (i<MAX && inFile >>sTextData[i].c1stName>>sTextData[i].c2ndName>>sTextData[i].iAge
>>sTextData[i].dSomeData1>>sTextData[i].iSomeDate2>>sTextData[i].cColor) {
++i;
}
inFile.close();
cout << "Reading the file finished. See it yourself: \n"<< endl;
for (int j=0;j<i;j++) {
cout<<sTextData[j].c1stName<<"\t"<<sTextData[j].c2ndName
<<"\t"<<sTextData[j].iAge<<"\t"<<sTextData[j].dSomeData1
<<"\t"<<sTextData[j].iSomeDate2<<"\t"<<sTextData[j].cColor<<endl;
}
return 0;
}
I am going to give you some exercises now :D :D
1) In the last loop:
for (int j=0;j<i;j++) {
cout<<sTextData[j].c1stName<<"\t"<<sTextData[j].c2ndName
<<"\t"<<sTextData[j].iAge<<"\t"<<sTextData[j].dSomeData1
<<"\t"<<sTextData[j].iSomeDate2<<"\t"<<sTextData[j].cColor<<endl;}
Why do I use the variable i instead of, let's say, MAX?
2) Could you change the program, based on stage 1, into something like:
int main(){
function1()
function2()
...
functionX()
...return 0;
}
I hope i helped...

How to read a file and get words in C++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So, to cut a long story short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
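#include <iostream>
#include <iterator>
#include <string>
// my_file_stream is an already opened input stream, e.g. a std::ifstream
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt = 0;
for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}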
The std::string template argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
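The question also mentions punctuation, which operator>> leaves attached to the words. One possible way to deal with it, sketched here under the assumption that only punctuation at the start or end of a token should be dropped (so a date like 06/05/1992 stays intact), is shown below; trimPunctuation and splitWords are names made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: removes punctuation from both ends of a token.
std::string trimPunctuation(const std::string& token) {
    std::size_t first = 0;
    std::size_t last = token.size();
    while (first < last && std::ispunct(static_cast<unsigned char>(token[first]))) {
        ++first;
    }
    while (last > first && std::ispunct(static_cast<unsigned char>(token[last - 1]))) {
        --last;
    }
    return token.substr(first, last - first);
}

// Splits a text at whitespace and keeps the non-empty trimmed words.
std::vector<std::string> splitWords(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string token;
    while (in >> token) {
        const std::string word = trimPunctuation(token);
        if (!word.empty()) words.push_back(word);
    }
    return words;
}
```

When reading from a file, the same trimPunctuation call can be applied to currentWord inside the while(file >> currentWord) loop.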
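You can use the stream's getline member function with a space character as the delimiter, e.g. file.getline(buffer,1000,' '); where file is your input stream and buffer is a char array of at least 1000 characters.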
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
You can use the scanner technique to grab words, numbers, dates, etc. It is very simple and flexible. The scanner normally returns tokens (word, number, real, keyword, etc.) to a parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

Reading a file of mixed data into a C++ string

I need to use C++ to read in text with spaces, followed by a numeric value.
For example, data that looks like:
text1
1.0
text two
2.1
text2 again
3.1
can't be read in with 2 "infile >>" statements. I'm not having any luck with getline
either. I ultimately want to populate a struct with these 2 data elements. Any ideas?
The standard IO library isn't going to do this for you alone, you need some sort of simple parsing of the data to determine where the text ends and the numeric value begins. If you can make some simplifying assumptions (like saying there is exactly one text/number pair per line, and minimal error recovery) it wouldn't be too bad to getline() the whole thing into a string and then scan it by hand. Otherwise, you're probably better off using a regular expression or parsing library to handle this, rather than reinventing the wheel.
Why? You can use getline with a space as the separator. Then stitch the extracted parts together until the next part is a number.
If you can be sure that your input is well-formed, you can try something like this sample:
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::istringstream iss("text1 1.0 text two 2.1 text2 again 3.1");
for ( ;; )
{
double x;
if ( iss >> x )
{
std::cout << x << std::endl;
}
else
{
iss.clear();
std::string junk;
if ( !(iss >> junk) )
break;
}
}
}
If you do have to validate input (instead of just trying to parse anything looking like a double from it), you'll have to write some kind of parser, which is not hard but boring.
Pseudocode.
This should work. It assumes you have text/numbers in pairs, however. You'll have to do some jimmying to get all the typing happy, also.
while( ! eof)
getline(textbuffer)
getline(numberbuffer)
stringlist = tokenize(textbuffer)
number = atof(numberbuffer)
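The pseudocode could be turned into real code along these lines. This is a sketch that assumes, as in the question's sample, that every text line is followed by exactly one line holding a number; the struct Entry and the function readEntries are names invented for the illustration, and the tokenize step is left out.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical struct holding one text/number pair.
struct Entry {
    std::string text;
    double value = 0.0;
};

// Reads alternating lines: a text line (may contain spaces), then a number line.
std::vector<Entry> readEntries(std::istream& in) {
    std::vector<Entry> entries;
    std::string textLine, numberLine;
    while (std::getline(in, textLine) && std::getline(in, numberLine)) {
        std::istringstream number(numberLine);
        Entry entry;
        entry.text = textLine;
        if (number >> entry.value) {      // keep the pair only if the number parses
            entries.push_back(entry);
        }
    }
    return entries;
}
```

Testing the result of std::getline directly in the loop condition replaces the while( ! eof) check, so the loop stops cleanly when a pair is incomplete.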