When I try to read a file to a buffer, it always appends random characters to the end of the buffer.
char* thefile;
std::streampos size;
std::fstream file(_file, std::ios::in | std::ios::ate);
if (file.is_open())
{
    size = file.tellg();
    std::cout << "size: " << size;
    thefile = new char[size]{0};
    file.seekg(0, std::ios::beg);
    file.read(thefile, size);
    std::cout << thefile;
}
int x = 0;
While the original text in my file is: "hello"
The output becomes: "helloýýýý««««««««þîþîþ"
Could anyone help me as to what is happening here? Thanks
From the C++ docs: http://cplusplus.com/reference/istream/istream/read
"This function simply copies a block of data, without checking its contents nor appending a null character at the end."
So your buffer is missing the trailing null character that marks the end of the string. cout therefore keeps printing whatever bytes happen to lie in memory beyond thefile until it stumbles on a zero byte.
Add a '\0' at the end of your string.
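For example, a minimal fix for the snippet in the question (everything else unchanged) is to allocate one extra byte and terminate the buffer after the read:
thefile = new char[size + 1]{0}; // one extra byte for the terminator, zero-initialized
file.seekg(0, std::ios::beg);
file.read(thefile, size);
thefile[file.gcount()] = '\0';   // terminate after however many chars were actually read
std::cout << thefile;            // printing now stops at the terminator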
If the file is not opened in ios::binary mode, you cannot assume that the position returned by tellg() equals the number of chars you will read. Text-mode operation may transform the stream (e.g. on Windows it converts "\r\n" in the file to "\n", so tellg() may report 2 chars where read() delivers only 1).
Anyway, read() doesn't add a null terminator.
Finally, you must allocate one more character than the size that you expect due to the null terminator that you have to add. Otherwise you risk a buffer overflow when you add it.
You should verify how many chars were really read with gcount(), and set a null terminator to your string accordingly.
thefile = new char[size + 1]{0}; // one more for the trailing null
file.seekg(0, std::ios::beg);
if (file.read(thefile, size))
    thefile[size] = 0;            // successful read: all size chars were read
else
    thefile[file.gcount()] = 0;   // fewer chars were read due to text mode
Here's a better way of reading your collection:
#include <vector>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <cstdint>
#include <iterator>
#include <algorithm>
#include <stdexcept>
template<class T>
void Write(std::string const & path, T const & value, std::ios_base::openmode mode)
{
    if (auto stream = std::ofstream(path, mode))
    {
        Write(stream, value);
        stream.close();
    }
    else
    {
        throw std::runtime_error("failed to create/open stream");
    }
}
template<class T>
void Write(std::ostream & stream, T const & value)
{
    std::copy(value.begin(), value.end(), std::ostreambuf_iterator<char>(stream));
    if (!stream)
    {
        throw std::runtime_error("failed to write");
    }
}
template<class T>
void Read(std::istream & stream, T & output)
{
    auto eof = std::istreambuf_iterator<char>();
    output = T(std::istreambuf_iterator<char>(stream), eof);
    if (!stream)
    {
        throw std::runtime_error("failed to read stream");
    }
}
template<class T>
void Read(std::string const & path, T & output)
{
    if (auto stream = std::ifstream(path, std::ios::in | std::ios::binary))
    {
        Read(stream, output);
        stream.close();
    }
    else
    {
        throw std::runtime_error("failed to create stream");
    }
}
int main(void)
{
    // Write and read back text.
    {
        auto const s = std::string("I'm going to write this string to a file");
        Write("temp.txt", s, std::ios_base::trunc | std::ios_base::out);
        auto t = std::string();
        Read("temp.txt", t);
    }
    // Write and read back a set of ints. Note that Write/Read move one byte
    // per element here, so this only round-trips values that fit in a char.
    {
        auto const v1 = std::vector<int>{ 10, 20, 30, 40, 50 };
        Write("temp.txt", v1, std::ios_base::trunc | std::ios_base::out | std::ios_base::binary);
        auto v2 = std::vector<int>();
        Read("temp.txt", v2);
    }
    return 0;
}
Pass in an iterable container rather than using "new".
Related
I want to read data from an input file partially. For example, if the input file is 1GB, I want to read only 100MB at a time and store it in a vector. How can I continue reading from the next line after the first loop? As you can see in my code below, after the first iteration of i, vector v may have stored 1000 lines from the input file. I'm not sure whether, on the next iteration of i, while(std::getline(infile, line)) will continue reading from line 1001 of the input file. If not, how can I modify my code to get lines from the input in several groups (1~1000), (1001~2000), (2001~3000)... and store them in vector v?
#define FILESIZE 1000000000 // size of the file on disk
#define TOTAL_MEM 100000    // max items the memory buffer can hold
void ExternalSort(std::string infilepath, std::string outfilepath)
{
    std::vector<std::string> v;
    int runs_count;
    std::ifstream infile;
    if (!infile.is_open())
    {
        std::cout << "Unable to open file\n";
    }
    infile.open(infilepath, std::ifstream::in);
    if (FILESIZE % TOTAL_MEM > 0)
        runs_count = FILESIZE/TOTAL_MEM + 1;
    else
        runs_count = FILESIZE/TOTAL_MEM;
    // Iterate through the elements in the file
    for (i = 0; i < runs_count; i++)
    {
        // Step 1: Read M-element chunk at a time from the file
        for (j = 0; j < (TOTAL_MEM < FILESIZE ? TOTAL_MEM : FILESIZE); j++)
        {
            while (std::getline(infile, line))
            {
                // If line is empty, ignore it
                if (line.empty())
                    continue;
                new_line = line + "\n";
                // Line contains string of length > 0 then save it in vector
                if (new_line.size() > 0)
                    v.push_back(new_line);
            }
        }
        // Step 2: Sort M elements
        sort(v.begin(), v.end()); //sort(v.begin(), v.end(), compare);
        // Step 3: Create temporary files and write sorted data into those files.
        std::ofstream tf;
        tf.open(tfile + ToString(i) + ".txt", std::ofstream::out | std::ofstream::app);
        std::ostream_iterator<std::string> output_iterator(tf, "\n");
        std::copy(v.begin(), v.end(), output_iterator);
        v.clear();
        //for(std::vector<std::string>::iterator it = v.begin(); it != v.end(); ++it)
        //    tf << *it << "\n";
        tf.close();
    }
    infile.close();
}
I didn’t have the patience to check the whole code. It was easier to write a splitter from scratch. Here are some observations, anyhow:
std::ifstream infile;
if (!infile.is_open())
{
    std::cout << "Unable to open file\n";
}
infile.open(infilepath, std::ifstream::in);
You will always get the message since you check before opening the file. One correct way to open a file is:
std::ifstream infile(infilepath);
if (!infile)
    throw "could not open the input file";
if (infile.peek() == std::ifstream::traits_type::eof())
This will be true, for instance, even for nonexistent files. The algorithm should work for empty files, too.
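To make the distinction explicit, here is a minimal sketch (assuming the stream was constructed just above, inside a function returning the list of generated files) that separates "could not open" from "opened but empty":
std::ifstream infile(infilepath);
if (!infile)
    throw std::runtime_error("could not open the input file");
// Only now does peek() hitting EOF really mean the file is empty.
if (infile.peek() == std::ifstream::traits_type::eof())
    return {}; // empty input: no output files, but not an error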
if (FILESIZE % TOTAL_MEM > 0)
    runs_count = FILESIZE/TOTAL_MEM + 1;
else
    runs_count = FILESIZE/TOTAL_MEM;
Why do you need the number of resulting files before generating them? You will never be able to calculate it correctly, since it depends on how long the lines are (you cannot read half a line just to fit it into TOTAL_MEM). You should read at most TOTAL_MEM bytes from the input file (but at least one line), sort & save them, and then continue from where you left off (see the loop in execute, below).
How can I continue reading the next line after the first loop?
If you do not close the input stream, the next read will continue from exactly where you left.
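A minimal sketch of that behaviour (input.txt and the batch size of 1000 are placeholders): the second loop resumes at line 1001 because the stream keeps its read position between loops.
#include <fstream>
#include <string>
#include <vector>
int main()
{
    std::ifstream infile("input.txt"); // placeholder file name
    std::vector<std::string> v;
    std::string line;
    // First batch: lines 1..1000.
    for (int n = 0; n < 1000 && std::getline(infile, line); ++n)
        v.push_back(line);
    // ... sort v and write it to a temporary file here ...
    v.clear();
    // Second batch: the stream kept its position, so this resumes at line 1001.
    for (int n = 0; n < 1000 && std::getline(infile, line); ++n)
        v.push_back(line);
}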
A solution:
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
std::vector<std::string> split_file(const char* fn, std::size_t mem); // see the implementation below
int main()
{
    const std::size_t max_mem = 8;
    auto r = split_file("input.txt", max_mem);
    std::cout << "generated files:" << std::endl;
    for (const auto& fn : r)
        std::cout << fn << std::endl;
}
class split_file_t
{
public:
    split_file_t(std::istream& is, std::size_t mem) : is_{ is }, mem_{ mem }
    {
        // nop
    }
    std::vector<std::string> execute()
    {
        while (make_file())
            ;
        return std::move(ofiles_);
    }
protected:
    std::istream& is_;
    std::size_t mem_;
    std::vector<std::string> ofiles_;
    static std::string make_temp_file()
    {
        std::string fn(512, 0);
        tmpnam_s(&fn.front(), fn.size()); // this might be system dependent
        std::ofstream os(fn);
        os.close();
        return fn;
    }
    bool make_file()
    {
        using namespace std;
        // read lines
        vector<string> lines;
        {
            streamsize max_gpos = is_.tellg() + streamsize(mem_);
            string line;
            while (is_.tellg() < max_gpos && getline(is_, line))
                lines.push_back(line);
        }
        //
        if (lines.empty())
            return false;
        // sort lines
        sort(lines.begin(), lines.end());
        // save lines
        {
            string ofile = make_temp_file();
            ofstream os{ ofile };
            if (!os)
                throw "could not open output file";
            copy(lines.begin(), lines.end(), ostream_iterator<string>(os, "\n"));
            ofiles_.push_back(ofile);
        }
        //
        return bool(is_);
    }
};
std::vector<std::string> split_file(const char* fn, std::size_t mem)
{
    using namespace std;
    ifstream is{ fn };
    if (!is)
        return vector<string>();
    return split_file_t{ is, mem }.execute();
}
I would like to read in a file like this:
13.3027 29.2191 2.39999
13.3606 29.1612 2.39999
13.3586 29.0953 2.46377
13.4192 29.106 2.37817
It has more than 1 million lines.
My current C++ code is:
bool loadCloud(const string &filename, PointCloud<PointXYZ> &cloud)
{
    print_info("\nLoad the Cloud .... (this takes some time!!!) \n");
    ifstream fs;
    fs.open(filename.c_str(), ios::binary);
    if (!fs.is_open() || fs.fail())
    {
        PCL_ERROR(" Could not open file '%s'! Error : %s\n", filename.c_str(), strerror(errno));
        fs.close();
        return (false);
    }
    string line;
    vector<string> st;
    while (!fs.eof())
    {
        getline(fs, line);
        // Ignore empty lines
        if (line == "")
        {
            std::cout << " this line is empty...." << std::endl;
            continue;
        }
        // Tokenize the line
        boost::trim(line);
        boost::split(st, line, boost::is_any_of("\t\r "), boost::token_compress_on);
        cloud.push_back(PointXYZ(float(atof(st[0].c_str())), float(atof(st[1].c_str())), float(atof(st[2].c_str()))));
    }
    fs.close();
    std::cout << " Size of loaded cloud: " << cloud.size() << " points" << std::endl;
    cloud.width = uint32_t(cloud.size()); cloud.height = 1; cloud.is_dense = true;
    return (true);
}
Reading this file currently takes a really long time. I would like to speed this up; any ideas how to do that?
You can just read the numbers directly instead of reading whole lines and then parsing them, as long as the numbers always come in sets of three.
void readFile(const std::string& fileName, PointCloud<PointXYZ>& cloud)
{
    std::ifstream infile(fileName);
    float vertex[3];
    int coordinateCounter = 0;
    while (infile >> vertex[coordinateCounter]) // stops on EOF or a malformed number
    {
        coordinateCounter++;
        if (coordinateCounter == 3)
        {
            cloud.push_back(PointXYZ(vertex[0], vertex[1], vertex[2]));
            coordinateCounter = 0;
        }
    }
}
Are you running optimised code? On my machine your code reads a million values in 1800ms.
The trim and the split are probably taking most of the time. If there is whitespace at the beginning of the string, trim has to copy the whole string contents to erase the first characters. split creates new string copies; you can optimise this by using string_view to avoid the copies.
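For illustration, a minimal sketch of whitespace splitting with std::string_view (C++17); split_ws is a hypothetical helper, and each view points into the original line, which must outlive the tokens:
#include <string_view>
#include <vector>
// Split a line into whitespace-separated tokens without copying characters.
std::vector<std::string_view> split_ws(std::string_view line)
{
    std::vector<std::string_view> tokens;
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t start = line.find_first_not_of(" \t\r", pos);
        if (start == std::string_view::npos) break;
        std::size_t end = line.find_first_of(" \t\r", start);
        if (end == std::string_view::npos) end = line.size();
        tokens.push_back(line.substr(start, end - start));
        pos = end;
    }
    return tokens;
}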
As your separators are white space you can avoid all the copies with code like this:
#include <array>
#include <cctype>
#include <charconv>
#include <fstream>
#include <string>
#include <vector>

bool loadCloud(const std::string &filename, std::vector<std::array<float, 3>> &cloud)
{
    std::ifstream fs;
    fs.open(filename.c_str(), std::ios::binary);
    if (!fs)
    {
        fs.close();
        return false;
    }
    std::string line;
    while (std::getline(fs, line))
    {
        // Ignore empty lines
        if (line == "")
        {
            continue;
        }
        const char* first = &line.front();
        const char* last = first + line.length();
        std::array<float, 3> arr;
        for (float& f : arr)
        {
            auto result = std::from_chars(first, last, f);
            if (result.ec != std::errc{})
            {
                return false;
            }
            first = result.ptr;
            while (first != last && std::isspace(static_cast<unsigned char>(*first)))
            {
                first++;
            }
        }
        if (first != last)
        {
            return false;
        }
        cloud.push_back(arr);
    }
    fs.close();
    return true;
}
On my machine this code runs in 650ms. About 35% of the time is used by getline, 45% by parsing the floats, the remaining 20% is used by push_back.
A few notes:
I've fixed the while(!fs.eof()) issue by checking the state of the stream after calling getline.
I've changed the result to an array, as your example wasn't an MCVE so I didn't have a definition of PointCloud or PointXYZ; it's possible that these types are the cause of your slowness.
If you know the number of lines (or at least an approximation) in advance, then reserving the capacity of the vector would improve performance, as in the sketch below.
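A minimal sketch of that last point (the 1000000 is a placeholder estimate):
std::vector<std::array<float, 3>> cloud;
cloud.reserve(1000000); // rough line-count estimate: avoids repeated reallocations in push_back
loadCloud("cloud.txt", cloud);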
I want to write/read data from a file. Is it possible to divide the file (inside the code) into multiple strings/sections? Or to read data until a specific line?
Just like: "Read the data until line 32, put it inside a string, read the next 32 lines and put them into another string".
I already know how to read and find data with seekp, but I don't really like it because my code always gets too long.
I already found some code, but I don't understand how it works:
dataset_t* DDS::readFile(std::string filename)
{
    dataset_t* dataset = NULL;
    std::stringstream ss;
    std::ifstream fs;
    uint8_t tmp_c;
    try
    {
        fs.open(filename.c_str(), std::ifstream::in);
        if (!fs)
        {
            std::cout << "File not found: " << filename << std::endl;
            return NULL;
        }
        while (fs.good())
        {
            fs.read((char*)&tmp_c, 1);
            if (fs.good()) ss.write((char*)&tmp_c, 1);
        }
        fs.close();
        dataset = new dataset_t();
        const uint32_t bufferSize = 32;
        char* buffer = new char[bufferSize];
        uint32_t count = 1;
        while (ss.good())
        {
            ss.getline(buffer, bufferSize);
            dataitem_t dataitem;
            dataitem.identifier = buffer;
            dataitem.count = count;
            dataset->push_back(dataitem);
            count++;
        }
        return dataset;
    }
    catch (std::exception e)
    {
        cdelete(dataset);
        return NULL;
    }
}
The code edits a binary save file.
Or can someone link me a website where I can learn more about buffers and stringstreams?
You could create some classes to model your requirement: a take<N> for 'grab 32 lines', and a lines_from to iterate over lines.
Your lines_from class would take any std::istream: something encoded, something zipped, ... as long as it gives you a series of characters. The take<N> would convert that into array<string, N> chunks.
Here's a snippet that illustrates it:
int main() {
    auto lines = lines_from{std::cin};
    while (lines.good()) {
        auto chunk = take<3>(lines);
        std::cout << chunk[0][0] << chunk[1][0] << chunk[2][0] << std::endl;
    }
}
And here are the supporting classes and functions:
#include <iostream>
#include <array>
class lines_from {
public:
    std::istream &in;
    using value_type = std::string;
    std::string operator*() {
        std::string line;
        std::getline(in, line);
        return line;
    }
    bool good() const {
        return in.good();
    }
};
template<int N, class T>
auto take(T &range) {
    std::array<typename T::value_type, N> value;
    for (auto &e : value) { e = *range; }
    return value;
}
(demo on cpp.sh)
I've made this class to read binary files and store their data.
FileInput.h:
#pragma once
#include <Windows.h>
#include <fstream>
using namespace std;
class FileInput
{
public:
    FileInput(LPSTR Filename);
    FileInput(LPWSTR Filename);
    ~FileInput();
    operator char*();
    explicit operator bool();
    size_t Size;
private:
    __forceinline void Read();
    ifstream File;
    char* Data;
};
FileInput.cpp
#include "FileInput.h"
FileInput::FileInput(LPSTR Filename)
{
    File.open(Filename, ios::binary);
    Read();
}

FileInput::FileInput(LPWSTR Filename)
{
    File.open(Filename, ios::binary);
    Read();
}

FileInput::~FileInput()
{
    if (Data) delete[] Data;
}

FileInput::operator char*()
{
    return Data;
}

FileInput::operator bool()
{
    return (bool)Data;
}

void FileInput::Read()
{
    if (!File)
    {
        Data = nullptr, Size = 0;
        return;
    }
    File.seekg(0, ios::end);
    Size = (size_t)File.tellg();
    File.seekg(0, ios::beg);
    Data = new char[Size];
    File.read(Data, Size);
    File.close();
}
Then I use it like this:
FileInput File(Filename); // This reads the file and allocates memory
if (!File) // This is for error checking
{
    // Do something
}
if (File.Size >= sizeof(SomeStruct))
{
    char FirstChar = File[0]; // Gets a single character
    SomeStruct *pSomeStruct = reinterpret_cast<SomeStruct*>(&File[0]); // Gets a structure
}
So, is there any possibility that this class may be unsafe?
A reinterpret_cast<SomeStruct*>(&File) or other nonsense statement doesn't count.
EDIT: By unsafe I mean "doing unexpected or 'dangerous' things".
This is major overkill. If you want to copy a file into a buffer, you can use an istreambuf_iterator:
std::ifstream inFile(fileName);
std::vector<char> fileBuffer ( (std::istreambuf_iterator<char>(inFile)),
std::istreambuf_iterator<char>() );
Then you can read from fileBuffer as necessary.
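And if you then need a structure out of the buffer, a sketch using std::memcpy (SomeStruct here is a stand-in for your own trivially copyable type) sidesteps the alignment and aliasing problems of the reinterpret_cast in the question:
#include <cstring> // std::memcpy

struct SomeStruct { int a; float b; }; // stand-in for your own type

// Copying the bytes out is well-defined even if the buffer
// is not suitably aligned for SomeStruct.
SomeStruct s;
if (fileBuffer.size() >= sizeof(SomeStruct))
    std::memcpy(&s, fileBuffer.data(), sizeof(SomeStruct));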
I want to read line by line from a file in C or C++, and I know how to do that when I assume some fixed size of a line, but is there a simple way to calculate or get the exact size needed for a line, or for all lines in the file? (Reading word by word until a newline is also good for me if anyone can do it that way.)
If you use a streamed reader, all this will be hidden from you. See getline. The example below is based on the code here.
// getline with strings
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
    string str;
    ifstream ifs("data.txt");
    getline(ifs, str);
    cout << "first line of the file is " << str << ".\n";
}
In C, if you have POSIX 2008 libraries (more recent versions of Linux, for example), you can use the POSIX getline() function. If you don't have the function in your libraries, you can implement it easily enough, which is probably better than inventing your own interface to do the job.
In C++, you can use std::getline().
Even though the two functions have the same basic name, the calling conventions and semantics are quite different (because the languages C and C++ are quite different) - except that they both read a line of data from a file stream, of course.
There isn't an easy way to tell how big the longest line in a file is - except by reading the whole file to find out, which is kind of wasteful.
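For reference, a minimal sketch of the POSIX getline() in use (assuming a POSIX system; data.txt is a placeholder file name). It grows the buffer for you, so no maximum line size is needed:
#include <cstdio>
#include <cstdlib>
#include <sys/types.h> // ssize_t

int main()
{
    FILE* f = std::fopen("data.txt", "r"); // placeholder file name
    if (!f)
        return 1;
    char* line = nullptr; // getline allocates and grows this buffer
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, f)) != -1)
        std::fwrite(line, 1, (size_t)len, stdout); // len includes the '\n'
    std::free(line);
    std::fclose(f);
    return 0;
}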
I would use an ifstream and getline to read from a file.
http://www.cplusplus.com/doc/tutorial/files/
int main () {
    string line;
    ifstream myfile ("example.txt");
    if (myfile.is_open())
    {
        while (getline(myfile, line)) // loop on the read itself, not on good()
        {
            cout << line << endl;
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}
You can't get the length of line until after you read it in. You can, however, read into a buffer repeatedly until you reach the end of line.
For programming in C, try using fgets to read in a line. It will read at most n-1 characters or stop at a newline. You can keep reading into a small buffer of size n until the last character in the string is the newline.
See the link above for more information.
Here is an example of how to read and display a full line of a file using a small buffer:
#include <stdio.h>
#include <string.h>

int main()
{
    FILE * pFile;
    const int n = 5;
    char mystring[n];
    int lineLength = 0;
    pFile = fopen("myfile.txt", "r");
    if (pFile == NULL)
    {
        perror("Error opening file");
    }
    else
    {
        do
        {
            if (fgets(mystring, n, pFile) == NULL) // stop on EOF or read error
                break;
            fputs(mystring, stdout); // puts() would insert a newline after every chunk
            lineLength += strlen(mystring);
        } while (mystring[strlen(mystring) - 1] != '\n' && !feof(pFile));
        fclose(pFile);
    }
    printf("\nLine Length: %d\n", lineLength);
    return 0;
}
In C++ you can use the std::getline function, which takes a stream and reads up to the first '\n' character. In C, I would just use fgets and keep reallocating a buffer until the last character is the '\n', then we know we have read the entire line.
C++:
std::ifstream file("myfile.txt");
std::string line;
std::getline(file, line);
std::cout << line;
C:
// I didn't test this code I just made it off the top of my head.
FILE* file = fopen("myfile.txt", "r");
size_t cap = 256;
size_t len = 0;
char* line = malloc(cap);
line[0] = '\0'; // keep strlen safe even if the first fgets fails
for (;;) {
    if (fgets(&line[len], cap - len, file) == NULL)
        break; // EOF or read error
    len = strlen(line);
    if (len > 0 && line[len - 1] != '\n' && !feof(file)) {
        cap <<= 1; // buffer too small for the line: double it and keep reading
        line = realloc(line, cap);
    } else {
        break;
    }
}
printf("%s", line);
free(line);
getline is POSIX-only; here is an ANSI C version (no max-line-size info needed!):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Compile in strict ANSI mode (e.g. -std=c89) so this getline
   does not clash with the POSIX declaration in <stdio.h>. */
const char* getline(FILE *f, char **r)
{
    char t[100];
    if (feof(f))
        return 0;
    **r = 0;
    while (fgets(t, 100, f))
    {
        char *p = strchr(t, '\n');
        if (p)
        {
            *p = 0;
            if ((p = strchr(t, '\r'))) *p = 0;
            *r = realloc(*r, strlen(*r) + 1 + strlen(t));
            strcat(*r, t);
            return *r;
        }
        else
        {
            if ((p = strchr(t, '\r'))) *p = 0;
            *r = realloc(*r, strlen(*r) + 1 + strlen(t));
            strcat(*r, t);
        }
    }
    return feof(f) ? (**r ? *r : 0) : *r;
}
and now it's easy and short in your main:
char *line, *buffer = malloc(100);
FILE *f = fopen("yourfile.txt", "rb");
if (!f) return;
setvbuf(f, 0, _IOLBF, 4096);
while ((line = getline(f, &buffer)))
    puts(line);
fclose(f);
free(buffer);
It works on Windows for both Windows and Unix text files, and on Unix for both Unix and Windows text files.
Here is a C++ way of reading the lines, using std algorithms and iterators:
#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
struct getline :
    public std::iterator<std::input_iterator_tag, std::string>
{
    std::istream* in;
    std::string line;
    getline(std::istream& in) : in(&in) {
        ++*this;
    }
    getline() : in(0) {
    }
    getline& operator++() {
        if (in && !std::getline(*in, line)) in = 0;
        return *this; // the increment must return the iterator
    }
    std::string operator*() const {
        return line;
    }
    bool operator!=(const getline& rhs) const {
        return !in != !rhs.in;
    }
};
int main() {
    std::vector<std::string> v;
    std::copy(getline(std::cin), getline(), std::back_inserter(v));
}