Fastest way to read in a file in C++

I would like to read in a file like this:
13.3027 29.2191 2.39999
13.3606 29.1612 2.39999
13.3586 29.0953 2.46377
13.4192 29.106 2.37817
It has more than 1 million lines.
My current C++ code is:
bool loadCloud(const string &filename, PointCloud<PointXYZ> &cloud)
{
print_info("\nLoad the Cloud .... (this takes some time!!!) \n");
ifstream fs;
fs.open(filename.c_str(), ios::binary);
if (!fs.is_open() || fs.fail())
{
PCL_ERROR(" Could not open file '%s'! Error : %s\n", filename.c_str(), strerror(errno));
fs.close();
return (false);
}
string line;
vector<string> st;
while (!fs.eof())
{
getline(fs, line);
// Ignore empty lines
if (line == "")
{
std::cout << " this line is empty...." << std::endl;
continue;
}
// Tokenize the line
boost::trim(line);
boost::split(st, line, boost::is_any_of("\t\r "), boost::token_compress_on);
cloud.push_back(PointXYZ(float(atof(st[0].c_str())), float(atof(st[1].c_str())), float(atof(st[2].c_str()))));
}
fs.close();
std::cout<<" Size of loaded cloud: " << cloud.size()<<" points" << std::endl;
cloud.width = uint32_t(cloud.size()); cloud.height = 1; cloud.is_dense = true;
return (true);
}
Reading this file currently takes a very long time. I would like to speed it up. Any ideas on how to do that?

You can just read the numbers instead of the whole line plus parsing, as long as the numbers always come in sets of three.
void readFile(const std::string& fileName, PointCloud<PointXYZ>& cloud)
{
    std::ifstream infile(fileName);
    float vertex[3];
    int coordinateCounter = 0;
    // operator>> skips whitespace (including newlines), so no line splitting is needed
    while (infile >> vertex[coordinateCounter])
    {
        coordinateCounter++;
        if (coordinateCounter == 3)
        {
            cloud.push_back(PointXYZ(vertex[0], vertex[1], vertex[2]));
            coordinateCounter = 0;
        }
    }
}

Are you running optimised code? On my machine your code reads a million values in 1800ms.
The trim and the split are probably taking most of the time. If there is white space at the beginning of the string, trim has to copy the whole string contents to erase the first characters. split creates new string copies; you can optimise this by using string_view to avoid the copies.
As your separators are white space you can avoid all the copies with code like this:
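// Requires <charconv> (C++17). std::from_chars does not skip leading whitespace,
// so each line must start directly with a number.
bool loadCloud(const string &filename, std::vector<std::array<float, 3>> &cloud)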
{
ifstream fs;
fs.open(filename.c_str(), ios::binary);
if (!fs)
{
fs.close();
return false;
}
string line;
while (getline(fs, line))
{
// Ignore empty lines
if (line == "")
{
continue;
}
const char* first = &line.front();
const char* last = first + line.length();
std::array<float, 3> arr;
for (float& f : arr)
{
auto result = std::from_chars(first, last, f);
if (result.ec != std::errc{})
{
return false;
}
first = result.ptr;
while (first != last && isspace(static_cast<unsigned char>(*first)))
{
first++;
}
}
if (first != last)
{
return false;
}
cloud.push_back(arr);
}
fs.close();
return true;
}
On my machine this code runs in 650ms. About 35% of the time is used by getline, 45% by parsing the floats, the remaining 20% is used by push_back.
A few notes:
I've fixed the while(!fs.eof()) issue by checking the state of the stream after calling getline.
I've changed the result to an array because your example wasn't an MCVE, so I didn't have a definition of PointCloud or PointXYZ; it's possible that these types are the cause of your slowness.
If you know the number of lines (or at least an approximation) in advance, reserving the size of the vector would improve performance.
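To illustrate the last note, here is a minimal sketch of reserving capacity before reading. It is not the original poster's code: Point is a stand-in for PointXYZ, and load_points and expected_points are hypothetical names. It reads from any std::istream so it can be tried without a file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

using Point = std::array<float, 3>;  // stand-in for PointXYZ (assumption)

// Reads whitespace-separated triples; expected_points is only a hint for reserve().
std::vector<Point> load_points(std::istream& in, std::size_t expected_points)
{
    std::vector<Point> cloud;
    cloud.reserve(expected_points);  // allocate once up front instead of regrowing repeatedly
    Point p;
    while (in >> p[0] >> p[1] >> p[2])
        cloud.push_back(p);
    return cloud;
}
```

With a real file you would pass a std::ifstream and a rough estimate, for example the file size divided by a typical line length. An estimate that is too low still works; the vector simply grows as usual.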

Related

How to get line in a file partially by C++

I want to read the data in an input file in parts. For example, the input file is 1 GB and I want to read only 100 MB at a time and store it in a vector. How can I continue reading from the next line after the first loop? As you can see in my code below, after the first iteration of i the vector v may hold 1000 lines from the input file. I'm not sure whether, in the next iteration of i, while(std::getline(infile, line)) will continue reading from line 1001 of the input file. If not, how can I modify my code to read the lines in several groups (1~1000), (1001~2000), (2001~3000)... and store each group in vector v?
#define FILESIZE 1000000000 // size of the file on disk
#define TOTAL_MEM 100000 // max items the memory buffer can hold
void ExternalSort(std::string infilepath, std::string outfilepath)
{
std::vector<std::string> v;
int runs_count;
std::ifstream infile;
if(!infile.is_open())
{
std::cout << "Unable to open file\n";
}
infile.open(infilepath, std::ifstream::in);
if(FILESIZE % TOTAL_MEM > 0)
runs_count = FILESIZE/TOTAL_MEM + 1;
else
runs_count = FILESIZE/TOTAL_MEM;
// Iterate through the elements in the file
for(i = 0; i < runs_count; i++)
{
// Step 1: Read M-element chunk at a time from the file
for (j = 0; j < (TOTAL_MEM < FILESIZE ? TOTAL_MEM : FILESIZE); j++)
{
while(std::getline(infile, line))
{
// If line is empty, ignore it
if(line.empty())
continue;
new_line = line + "\n";
// Line contains string of length > 0 then save it in vector
if(new_line.size() > 0)
v.push_back(new_line);
}
}
// Step 2: Sort M elements
sort(v.begin(), v.end()); //sort(v.begin(), v.end(), compare);
// Step 3: Create temporary files and write sorted data into those files.
std::ofstream tf;
tf.open(tfile + ToString(i) + ".txt", std::ofstream::out | std::ofstream::app);
std::ostream_iterator<std::string> output_iterator(tf, "\n");
std::copy(v.begin(), v.end(), output_iterator);
v.clear();
//for(std::vector<std::string>::iterator it = v.begin(); it != v.end(); ++it)
// tf << *it << "\n";
tf.close();
}
infile.close();
I didn’t have the patience to check the whole code. It was easier to write a splitter from scratch. Here are some observations, anyhow:
std::ifstream infile;
if (!infile.is_open())
{
std::cout << "Unable to open file\n";
}
infile.open(infilepath, std::ifstream::in);
You will always get the message since you check before opening the file. One correct way to open a file is:
std::ifstream infile(infilepath);
if (!infile)
throw "could not open the input file";
if (infile.peek() == std::ifstream::traits_type::eof())
This will be true, for instance, even for nonexistent files. The algorithm should work for empty files, too.
if(FILESIZE % TOTAL_MEM > 0)
runs_count = FILESIZE/TOTAL_MEM + 1;
else
runs_count = FILESIZE/TOTAL_MEM;
Why do you need the number of resulting files before generating them? You will never be able to calculate it correctly, since it depends on how long the lines are (you cannot read half of a line just to fit it into TOTAL_MEM). You should read at most TOTAL_MEM bytes from the input file (but at least one line), sort and save them, and then continue from where you left off (see the loop in execute, below).
How can I continue reading the next line after the first loop?
If you do not close the input stream, the next read will continue from exactly where you left off.
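As a minimal sketch of that point (read_chunk and max_lines are hypothetical names, not part of the question's code), a helper that takes the stream by reference can be called repeatedly, and each call picks up where the previous one stopped:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads up to max_lines non-empty lines, starting wherever the stream currently is.
std::vector<std::string> read_chunk(std::istream& in, std::size_t max_lines)
{
    std::vector<std::string> chunk;
    std::string line;
    // The size check comes first, so no line is consumed once the chunk is full.
    while (chunk.size() < max_lines && std::getline(in, line))
    {
        if (!line.empty())
            chunk.push_back(line);
    }
    return chunk;
}
```

Calling read_chunk(infile, 1000) in a loop until it returns an empty vector yields lines 1 to 1000, then 1001 to 2000, and so on, as long as the same open stream is passed each time.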
A solution:
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
std::vector<std::string> split_file(const char* fn, std::size_t mem); // see the implementation below
int main()
{
const std::size_t max_mem = 8;
auto r = split_file("input.txt", max_mem);
std::cout << "generated files:" << std::endl;
for (const auto& fn : r)
std::cout << fn << std::endl;
}
class split_file_t
{
public:
split_file_t(std::istream& is, std::size_t mem) :is_{ is }, mem_{ mem }
{
// nop
}
std::vector<std::string> execute()
{
while (make_file())
;
return std::move(ofiles_);
}
protected:
std::istream& is_;
std::size_t mem_;
std::vector<std::string> ofiles_;
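static std::string make_temp_file()
{
std::string fn(512, 0);
tmpnam_s(&fn.front(), fn.size()); // MSVC-specific; other platforms need a different call
fn.resize(fn.find('\0')); // drop the unused part of the buffer so the name has no trailing null characters
std::ofstream os(fn);
os.close();
return fn;
}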
bool make_file()
{
using namespace std;
// read lines
vector<string> lines;
{
streamsize max_gpos = is_.tellg() + streamsize(mem_);
string line;
while (is_.tellg() < max_gpos && getline(is_, line))
lines.push_back(line);
}
//
if (lines.empty())
return false;
// sort lines
sort(lines.begin(), lines.end());
// save lines
{
string ofile = make_temp_file();
ofstream os{ ofile };
if (!os)
throw "could not open output file";
copy(lines.begin(), lines.end(), ostream_iterator<string>(os, "\n"));
ofiles_.push_back(ofile);
}
//
return bool(is_);
}
};
std::vector<std::string> split_file(const char* fn, std::size_t mem)
{
using namespace std;
ifstream is{ fn };
if (!is)
return vector<string>();
return split_file_t{ is, mem }.execute();
}

How to read strings line by line from a text file?

This code reads and calculates only the first input in the input.txt file and ignores the rest. I have been trying to fix it so that it reads and calculates all of the inputs.
This is my code; I think there is something wrong with it.
I have tried several looping methods.
int main()
{
string inputLine;
ifstream file ("input.txt");// input file to be read
ofstream file1;
file1.open("output.txt");
freopen("output.txt", "w", stdout);// store all the output to this file
while (std::getline (file, inputLine)) // read the strings in the input file
{
if( strncmp( "----", inputLine.c_str(), 4 ) == 0 )
continue;
//calculating binary and hexadecimal values
char *opr = "^+-/%*=,()";
std::string::iterator end_pos = std::remove(inputLine.begin(),
inputLine.end(), ' ');
inputLine.erase(end_pos, inputLine.end());
string str=inputLine;
string str2="";
int length=str.length();
char t[length];
str.copy(t, length);
t[length] = '\0';
char* tok;
char *cop=new char [length];
str.copy(cop,length);
char *w = strtok_fixed( t, opr );
while (w!=NULL)
{
string w2=w;
std::stringstream tr;
tr << w2;
w2.clear();
tr >> w2;
int x=w2.length();
int y=x-3;
string check= w2.substr(0,3);
string check1=w2.substr(0,x);
if(check.find("0x") != std::string::npos)
{
unsigned int x= strtol(w2.c_str(), NULL, 0);
std::ostringstream s;
s << x;
const std::string ii(s.str());
str2=str2+ ii;
}
else if (check1.find("b")!=std::string::npos)
{
w2.pop_back();
long bin=std::strtol(w2.c_str(),0,2);
std::ostringstream s2;
s2<<bin;
const std::string t2(s2.str());
//inputLine.replace(inputLine.find(w2),(w2.length()+1),t2);
str2=str2+t2;
}
else
{
str2=str2+w2;
}
char a =cop[w-t+strlen(w)];
string s1="";
s1=s1+a;
std::stringstream tr1;
tr1 << s1;
s1.clear();
tr1 >> s1;
str2=str2+s1;
w = strtok_fixed (NULL, opr);
}
//str2 should be taken to the parser for final evaluations
Parser p(str2);
double value = p.Evaluate ();
std::cout<<"----------------------"<<endl;
std::cout << "Result = " << value << std::endl;
std::cout<<"----------------------"<<endl;
return 0;
}
}
The problem is at the end
return 0;
}
}
should be
}
return 0;
}
You are returning from inside your while loop instead of after your while loop finishes.
You should spend the time to indent your code correctly. It will help you spot this kind of error. You should also learn to break up your code into smaller functions. Again this will help you understand your own code a bit better.

Txt to 2 different arrays c++

I have a txt file with a lot of things in it.
The lines have this pattern: 6 spaces then 1 int, 1 space, then a string.
Also, the 1st line has the amount of lines that the txt has.
I want to put the integers in an array of ints and the string on an array of strings.
I can read it and put it into an array, but only if I treat the ints as chars and put everything into one array of strings. When I try to separate things, I have no idea how to do it. Any ideas?
The code I used for putting everything in an array was this:
int size()
{
ifstream sizeX;
int x;
sizeX.open("cities.txt");
sizeX>>x;
return x;
};
int main(void)
{
int size = size();
string words[size];
ifstream file("cities.txt");
file.ignore(100000,'\n');
if(file.is_open())
{
for(int i=0; i<size; i++)
{
getline(file,words[i]);
}
}
}
Just to start, I'm going to provide some tips about your code:
int size = size();
Why do you need to open the file, read the first line and then close it? That process can be done opening the file just once.
The code string words[size]; is not legal standard C++. You cannot declare a variable-length array in C++; that C feature is not part of the C++ standard, although some compilers accept it as an extension. I suggest replacing it with std::vector, which is the idiomatic C++ choice.
Here is a function that does what you need.
int parse_file(const std::string& filename,
std::vector<std::string>* out_strings,
std::vector<int>* out_integers) {
assert(out_strings != nullptr);
assert(out_integers != nullptr);
std::ifstream file;
file.open(filename, std::ios_base::in);
if (file.fail()) {
// handle the error
return -1;
}
// Local variables
int num_rows;
std::string line;
// parse the first line
std::getline(file, line);
if (line.size() == 0) {
// file empty, handle the error
return -1;
}
num_rows = std::stoi(line);
// reserve memory
out_strings->clear();
out_strings->reserve(num_rows);
out_integers->clear();
out_integers->reserve(num_rows);
for (int row = 0; row < num_rows; ++row) {
// read the line
std::getline(file, line);
if (line.size() == 0) {
// unexpected end of line, handle it
return -1;
}
// get the integer
out_integers->push_back(
std::stoi(line.substr(6, line.find(' ', 6) - 6)));
// get the string
out_strings->push_back(
line.substr(line.find(' ', 6) + 1, std::string::npos));
}
file.close();
return 0;
}
You can definitely improve it, but I think it's a good starting point.
One last suggestion: to make your code more robust, you can match each line against a regular expression. That way you can be sure each line is formatted exactly as you expect.
For example:
std::regex line_pattern("\\s{6}[0-9]+\\s[^\\n]+");
if (!std::regex_match(line, line_pattern)) {
// oops... the line is not formatted the way you need
// this is an error
}
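Going one step further, the same kind of pattern can also do the splitting if it has capture groups. This is only a sketch under the format described in the question (six spaces, an integer, one space, then the string); parse_line is a hypothetical helper and not part of the function above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: splits "      42 Some City" into 42 and "Some City".
// Returns false if the line does not have the expected layout.
bool parse_line(const std::string& line, int& number, std::string& name)
{
    // Six spaces, digits (group 1), one space, the rest of the line (group 2).
    static const std::regex pattern(R"( {6}([0-9]+) (.+))");
    std::smatch match;
    if (!std::regex_match(line, match, pattern))
        return false;
    number = std::stoi(match[1].str());  // throws std::out_of_range for values that do not fit in int
    name = match[2].str();
    return true;
}
```

This validates and extracts in one pass, at the cost of std::regex being noticeably slower than substr and find on large files.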

C++ Counting words in a file between two words

I am currently trying to count the number of words in a file. After this, I plan to make it count the words between two words in the file. For example, my file may contain "Hello my name is James". I want to count the words, so 5. Then I would like to count the number of words between "Hello" and "James", so the answer would be 3. I am having trouble accomplishing both tasks.
This is mainly because I am not sure how to structure my code.
Any help on here would be greatly appreciated. The code I am currently using is using spaces to count the words.
Here is my code:
readwords.cpp
string ReadWords::getNextWord()
{
bool pWord = false;
char c;
while((c = wordfile.get()) !=EOF)
{
if (!(isspace(c)))
{
nextword.append(1, c);
}
return nextword;
}
}
bool ReadWords::isNextWord()
{
if(!wordfile.eof())
{
return true;
}
else
{
return false;
}
}
main.cpp
main()
{
int count = 0;
ReadWords rw("hamlet.txt");
while(rw.isNextWord()){
rw.getNextWord();
count++;
}
cout << count;
rw.close();
}
What it does at the moment is count the number of characters. I'm sure it's just a simple fix and something silly that I'm missing, but I've been trying for long enough that I've decided to look for some help.
Any help is greatly appreciated. :)
Rather than parse the file character-by-character, you can simply use istream::operator>>() to read whitespace-separated words. >> returns the stream, which evaluates to true as a bool as long as the read succeeded.
vector<string> words;
string word;
while (wordfile >> word)
words.push_back(word);
There is a common formulation of this using the <iterator> and <algorithm> utilities, which is more verbose, but can be composed with other iterator algorithms:
istream_iterator<string> input(wordfile), end;
copy(input, end, back_inserter(words));
Then you have the number of words and can do with them whatever you like:
words.size()
If you want to find "Hello" and "James", use find() from the <algorithm> header to get iterators to their positions:
// Find "Hello" anywhere in 'words'.
const auto hello = find(words.begin(), words.end(), "Hello");
// Find "James" anywhere after 'hello' in 'words'.
const auto james = find(hello, words.end(), "James");
If they’re not in the vector, find() will return words.end(); ignoring error checking for the purpose of illustration, you can count the number of words between them by taking their difference, adjusting for the inclusion of "Hello" in the range:
const auto count = james - (hello + 1);
You can use operator-() here because std::vector::iterator is a “random-access iterator”. More generally, you could use std::distance() from <iterator>:
const auto count = distance(hello, james) - 1;
Which has the advantage of being more descriptive of what you’re actually doing. Also, for future reference, this kind of code:
bool f() {
if (x) {
return true;
} else {
return false;
}
}
Can be simplified to just:
bool f() {
return x;
}
Since x is already being converted to bool for the if.
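For completeness, here is a sketch of the find/distance approach from this answer with the error checking filled in. The function name words_between and the choice to return -1 when either word is missing are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Returns the number of words strictly between the first occurrence of `first`
// and the next occurrence of `last` after it, or -1 if either word is not found.
long words_between(std::istream& in, const std::string& first, const std::string& last)
{
    std::istream_iterator<std::string> input(in), end;
    const std::vector<std::string> words(input, end);

    const auto hello = std::find(words.begin(), words.end(), first);
    if (hello == words.end())
        return -1;
    // Start searching one past `first`, so the two words may be equal.
    const auto james = std::find(hello + 1, words.end(), last);
    if (james == words.end())
        return -1;
    return static_cast<long>(std::distance(hello, james)) - 1;
}
```

For a file you would pass a std::ifstream; for the example sentence "Hello my name is James" with "Hello" and "James" the result is 3.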
To count:
std::ifstream infile("hamlet.txt");
std::size_t count = 0;
for (std::string word; infile >> word; ++count) { }
To count only between start and stop:
std::ifstream infile("hamlet.txt");
std::size_t count = 0;
bool active = false;
for (std::string word; infile >> word; )
{
    if (!active)
    {
        // Start counting after the start word, without counting the start word itself
        if (word == "Hello") { active = true; }
        continue;
    }
    if (word == "James") break;
    ++count;
}
I think "return nextword;" should instead be "else return nextword;" or else you are returning from the function getNextWord every time, no matter what the char is.
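string ReadWords::getNextWord()
{
    int c; // int rather than char, so that EOF can be told apart from valid characters
    while((c = wordfile.get()) != EOF)
    {
        if (!(isspace(c)))
        {
            nextword.append(1, static_cast<char>(c));
        }
        else return nextword; // only returns on a space
    }
    return nextword; // end of file: return whatever has been collected
}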
To count all words:
std::ifstream f("hamlet.txt");
std::cout << std::distance (std::istream_iterator<std::string>(f),
std::istream_iterator<std::string>()) << '\n';
To count between two words:
std::ifstream f("hamlet.txt");
std::istream_iterator<std::string> it(f), end;
int count = 0;
// Assign the result of find back to it, so that it really points at "Hello"
while ((it = std::find(it, end, "Hello")) != end)
    while (++it != end && *it != "James")
        ++count;
std::cout << count;
Try this:
below the line
nextword.append(1, c);
add
continue;

How to get the last non-empty line in a txt file

I want to get the last non-empty line in a txt file.
This is my code:
string line1, line2;
ifstream myfile(argv[1]);
if(myfile.is_open())
{
while( !myfile.eof() )
{
getline(myfile, line1);
if( line1 != "" || line1 != "\t" || line1 != "\n" || !line1.empty() )
line2 = line1;
}
myfile.close();
}
else
cout << "Unable to open file";
The problem is that I cannot detect the empty lines.
Okay, let's start with the obvious part. This: while( !myfile.eof() ) is essentially always wrong, so you're not going to detect the end of the file correctly. Since you're using getline to read the data, you want to check its return value:
while (getline(myfile, line1)) // ...
Likewise, the logic here:
if( line1 != "" || line1 != "\t" || line1 != "\n" || !line1.empty() )
line2 = line1;
...is clearly wrong. I'm guessing you really want && instead of || for this. As it stands, the result is always true, because no matter what value line1 contains, it must be unequal to at least one of those values (i.e., it can't simultaneously contain only a tab and contain only a new-line and contain nothing at all -- but that would be necessary for the result to be false). Testing for both !line1.empty() and line1 != "" appears redundant as well.
Why not read the file backwards? That way you don't have to scan the entire file to accomplish this. Seems like it ought to be possible.
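int main(int argc, char **argv)
{
if(argc < 2)
{
std::cerr<<"Usage: "<<argv[0]<<" <file>"<<std::endl;
return 1;
}
const std::string fn = argv[1];
std::cout<<"Opening "<<fn<<std::endl;
std::fstream fin(fn.c_str(), std::ios_base::in);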
//go to end
fin.seekg(0, std::ios_base::end);
int currpos = fin.tellg();
//go to 1 before end of file
if(currpos > 0)
{
//collect the chars here...
std::vector<char> chars;
fin.seekg(currpos - 1);
currpos = fin.tellg();
while(currpos > 0)
{
char c = fin.get();
if(!fin.good())
{
break;
}
chars.push_back(c);
currpos -= 1;
fin.seekg(currpos);
}
//do whatever u want with chars...
//this is the reversed order
for(std::vector<char>::size_type i = 0; i < chars.size(); ++i)
{
std::cout<<chars[i];
}
//this is the forward order...
for(std::vector<char>::size_type i = chars.size(); i != 0; --i)
{
std::cout<<chars[i-1];
}
}
return 0;
}
It wouldn't be enough to change your ||'s to &&'s to check if the line is empty. What if there are seven spaces, a tab character, another 3 spaces and finally a newline? You can't list all the ways of getting only whitespace in a line. Instead, check every character in the line to see if it is whitespace.
In this code, is_empty will be false if any non-space character is found in the line.
bool is_empty = true;
for (std::size_t i = 0; i < line.size(); i++) {
unsigned char ch = line[i]; // isspace requires a non-negative value
is_empty = is_empty && isspace(ch);
}
Full solution:
#include <iostream>
#include <fstream>
#include <cctype>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
string line;
string last_line;
ifstream myfile(argv[1]);
if(myfile.is_open())
{
while( getline(myfile, line) ) {
bool is_empty = true;
for (std::size_t i = 0; i < line.size(); i++) {
unsigned char ch = line[i]; // isspace requires a non-negative value
is_empty = is_empty && isspace(ch);
}
if (!is_empty) {
last_line = line;
}
}
myfile.close();
cout << "Last line: " << last_line << endl;
}
else {
cout << "Unable to open file";
}
return 0;
}
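The per-character whitespace check in this answer can also be expressed with std::all_of. This is just an alternative sketch of the same test; is_blank is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True if the line is empty or consists only of whitespace characters.
bool is_blank(const std::string& line)
{
    return std::all_of(line.begin(), line.end(),
                       [](unsigned char ch) { return std::isspace(ch) != 0; });
}
```

Inside the reading loop the condition then becomes if (!is_blank(line)) last_line = line;.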
In addition to what the others said:
You can avoid reading whitespace by doing myfile >> std::ws before you call std::getline(). This will consume all leading whitespace, including empty lines.
Then your condition reduces to !line1.empty(). This also works when a line contains nothing but whitespace, a case for which your version fails.
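A minimal sketch of that idea, assuming a helper named last_non_empty_line (hypothetical) that works on any std::istream:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the last line that contains something other than whitespace,
// or an empty string if there is no such line.
std::string last_non_empty_line(std::istream& in)
{
    std::string line, last;
    // std::ws discards leading whitespace, including blank lines,
    // so every line getline returns here starts with a non-space character.
    while (in >> std::ws && std::getline(in, line))
        last = line;
    return last;
}
```

One side effect to be aware of: leading whitespace of the returned line is stripped as well, while trailing whitespace is kept.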
I wasn't able to find an appropriate get_last_line function for my needs, so here is what I came up with. You can even read multiple non-empty last lines by calling the stream version of get_last_line again without resetting the position. It also supports a file that contains only one character. I added the reset parameter, which can be set to ios_base::end to allow output operations after reading the last line(s).
#include <cctype>
#include <fstream>
#include <istream>
#include <string>

// The result is returned by value: binding a temporary std::string to a
// non-const reference default argument is not standard C++.
// The stream is expected to be positioned at the end (e.g. opened with ios_base::ate).
std::string get_last_line(
    std::istream& in_stream,
    std::ios_base::seekdir reset = std::ios_base::cur)
{
    std::string output;
    std::streambuf& buf = *in_stream.rdbuf();
    bool text_found = false;
    // Step backwards one character at a time from the current position
    while(buf.pubseekoff(-1, std::ios_base::cur) >= 0)
    {
        const int c = buf.sgetc(); // peeks without advancing
        if(c == std::char_traits<char>::eof())
            break;
        if(!std::isspace(c))
            text_found = true;
        if(text_found)
        {
            if(c == '\n')
                break;
            output.insert(0, 1, static_cast<char>(c));
        }
    }
    buf.pubseekoff(0, reset);
    return output;
}

std::string get_last_line(const std::string& file_name)
{
    std::ifstream file_in(
        file_name.c_str(),
        std::ios_base::in | std::ios_base::ate);
    if(!file_in.is_open())
        return std::string();
    return get_last_line(file_in);
}