I am taking input from a file in binary mode using C++; I read the data into unsigned ints, process them, and write them to another file. The problem is that sometimes, at the end of the file, there might be a little bit of data left that isn't large enough to fit into an int; in this case, I want to pad the end of the file with 0s and record how much padding was needed, until the data is large enough to fill an unsigned int.
Here is how I am reading from the file:
std::ifstream fin;
fin.open('filename.whatever', std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//TODO: read the remaining data and pad it here prior to processing
} else {
//output to error stream and exit with failure condition
}
The TODO in the code is where I'm having trouble. After the file input finishes and the loop exits, I need to read in the remaining data at the end of the file that was too small to fill an unsigned int. I need to then pad the end of that data with 0's in binary, recording enough about how much padding was done to be able to un-pad the data in the future.
How is this done, and is this already done automatically by C++?
NOTE: I cannot read the data into anything but an unsigned int, as I am processing the data as if it were an unsigned integer for encryption purposes.
EDIT: It was suggested that I simply read what remains into an array of chars. Am I correct in assuming that this will read in ALL remaining data from the file? It is important to note that I want this to work on any file that C++ can open for input and/or output in binary mode. Thanks for pointing out that I failed to include the detail of opening the file in binary mode.
EDIT: The files my code operates on are not created by anything I have written; they could be audio, video, or text. My goal is to make my code format-agnostic, so I can make no assumptions about the amount of data within a file.
EDIT: ok, so based on constructive comments, this is something of the approach I am seeing, documented in comments where the operations would take place:
std::ifstream fin;
fin.open('filename.whatever', std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//1: declare Char array
//2: fill it with what remains in the file
//3: fill the rest of it until it's the same size as an unsigned int
} else {
//output to error stream and exit with failure condition
}
The question, at this point, is this: is this truly format-agnostic? In other words, are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size? I should know this, I know, but I've got to ask it anyway.
Are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size?
No data type can be less than a byte, and your file is represented as an array of char meaning each character is one byte. Thus it is impossible to not get a whole number measure in bytes.
Here is step one, two, and three as per your post:
while (fin >> m)
{
// ...
}
std::ostringstream buffer;
buffer << fin.rdbuf();
std::string contents = buffer.str();
// fill with 0s
std::fill(contents.begin(), contents.end(), '0');
Related
Right so I am working on reading a large text file (100K+ Characters) and storing it into a large character array.
I have been brainstorming to find the best possible algorithm to do this. Now I know that file reading is an expensive operation in terms of time complexity.
So far I have come up with this using C++:
//Opening the text file from the path given
char * dataset;
int dataLength;
ifstream in(path);
if (!in) {
cout << "Could not open the file." << endl;
exit(-1);
}
//Calculating the number of characters in the file
in.ignore(numeric_limits<streamsize>::max());
dataLength = in.gcount();
//Clearing since EOF bit is set after ignore method above
in.clear();
in.seekg(0, ifstream::beg);
//Initializing the dataset array and copying the data
dataset = new T[dataLength + 1];
in.read(dataset, dataLength);
dataset[dataLength] = '\0';
I am not reading character by character as that renders a very costly operation. Now further trying to improve it, I thought if there would be a way to not read the file twice.
In my algo, I have to read it first (using .ignore) to count the number of characters so I can make a dynamic character array. And then I have to read it to store it into the character array.
Is there any way that when I read the file the first time, I could store the data temporarily into a stream, initialize an array and then ask the stream to copy data into my data structure (array)
I have 640*480 numbers. I need to write them into a file. I will need to read them later. What is the best solution? Numbers are between 0 - 255.
For me the best solution is to write them binary(8 bits). I wrote the numbers into txt file and now it looks like 1011111010111110 ..... So there are no questions where the number starts and ends.
How am I supposed to read them from the file?
Using c++
It's not good idea to write bit values like 1 and 0 to text file. The file size will bigger in 8 times. 1 byte = 8 bits. You have to store bytes, 0-255 - is byte. So your file will have size 640*480 bytes instead of 640*480*8. Every symbol in text file has size of 1 byte minimum. If you want to get bits, use binary operators of programming language that you use. To read bytes much easier. Use binary file for saving your data.
Presumably you have some sort of data structure representing your image, which somewhere inside holds the actual data:
class pixmap
{
public:
// stuff...
private:
std::unique_ptr<std::uint8_t[]> data;
};
So you can add a new constructor which takes a filename and reads bytes from that file:
pixmap(const std::string& filename)
{
constexpr int SIZE = 640 * 480;
// Open an input file stream and set it to throw exceptions:
std::ifstream file;
file.exceptions(std::ios_base::badbit | std::ios_base::failbit);
file.open(filename.c_str());
// Create a unique ptr to hold the data: this will be cleaned up
// automatically if file reading throws
std::unique_ptr<std::uint8_t[]> temp(new std::uint8_t[SIZE]);
// Read SIZE bytes from the file
file.read(reinterpret_cast<char*>(temp.get()), SIZE);
// If we get to here, the read worked, so we move the temp data we've just read
// into where we'd like it
data = std::move(temp); // or std::swap(data, temp) if you prefer
}
I realise I've assumed some implementation details here (you might not be using a std::unique_ptr to store the underlying image data, though you probably should be) but hopefully this is enough to get you started.
You can print the number between 0-255 as the char value in the file.
See the below code. in this example I am printing integer 70 as char.
So this result in print as 'F' on the console.
Similarly you can read it as char and then convert this char to integer.
#include <stdio.h>
int main()
{
int i = 70;
char dig = (char)i;
printf("%c", dig);
return 0;
}
This way you can restrict the file size.
string lineValue;
ifstream myFile("file.txt");
if (myFile.is_open()) {
//getline(myFile, lineValue);
//cout<<lineValue;
while (getline(myFile, lineValue)) {
cout << lineValue << '\n';
}
myFile.close();
}
else cout << "Unable to open file";
The txt file formate is like this
0 1
1 2
2 3
3 4
4 5
5 5
6 6
7 7
8 8
9 9
Above code is reading data from a text file line by line, but the text file size is quite large (10GB).
So how to read data from the file in chunks/blocks with less I/O and efficiently ?
If you are thinking of reading in large chunks of data then you will be using a technique called buffering. However, ifstream already provides buffering so my first step would be to see if you can get ifstream doing the job for you.
I would set a much larger buffer than the default in you're ifstream. Something like
const int BUFSIZE = 65536;
std::unique_ptr<char> buffer(new char[BUFSIZE]);
std::ifstream is;
is.rdbuf()->pubsetbuf(buffer.get(), BUFSIZE);
is.open(filename.c_str());
const int LINESIZE = 256;
char line[LINESIZE];
if (is) {
for (;;) {
is.getline(line, LINESIZE);
// check for errors and do other work here, (and end loop at some point!)
}
}
is.close();
Make sure your buffer lives as long as the ifstream object that uses it.
If you find the speed of this is still insufficient, then you can try reading chunks of data with ifstream::read. There is no guarantee it will be faster, you'll have to time and compare the options. You use ifstream::read something like this.
const int BUFSIZE = 65536;
std::unique_ptr<char> buffer(new char[BUFSIZE]);
is.read(buffer.get(), BUFSIZE);
You'll have to take care writing the code to call ifstream.read taking care to deal with the fact that a 'line' of input may get split across consecutive blocks (or even across more than two blocks depending upon your data and buffer size). That's why you want to modify ifstream's buffer as you're first option.
If and only if the text lines are the same length, you could simply read the file in using std::istream::read();.
The size of the block to read would be:
block_size = text_line_length * number_of_text_lines;
If you are brave enough to handle more complexity or your text lines are not equal lengths, you could read an arbitrary length of characters into a vector and process the text from the vector.
The complexities come into play when a text line overflows a block. Think of handling the case where only part of the sentence is available at the end of the block.
Does anyone know how to read in a file with raw encoding? So stumped.... I am trying to read in floats or doubles (I think). I have been stuck on this for a few weeks. Thank you!
File that I am trying to read from:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.raw
Description of raw encoding:
hello://teem.sourceforge.net/nrrd/format.html#encoding (change hello to http to go to page)
- "raw" - The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread().
Info of file:
http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.nhdr - I think the only things that matter here are the big endian (still trying to understand what that means from google) and raw encoding.
My current approach, uncertain if it's correct:
//Function ripped off from example of c++ ifstream::read reference page
void scantensor(string filename){
ifstream tdata(filename, ifstream::binary); // not sure if I should put ifstream::binary here
// other things I tried
// ifstream tdata(filename) ifstream tdata(filename, ios::in)
if(tdata){
tdata.seekg(0, tdata.end);
int length = tdata.tellg();
tdata.seekg(0, tdata.beg);
char* buffer = new char[length];
tdata.read(buffer, length);
tdata.close();
double* d;
d = (double*) buffer;
} else cerr << "failed" << endl;
}
/* P.S. I attempted to print the first 100 elements of the array.
Then I print 100 other elements at some arbitrary array indices (i.e. 9,900 - 10,000). I actually kept increasing the number of 0's until I ran out of bound at 100,000,000 (I don't think that's how it works lol but I was just playing around to see what happens)
Here's the part that makes me suspicious: so the ifstream different has different constructors like the ones I tried above.
the first 100 values are always the same.
if I use ifstream::binary, then I get some values for the 100 arbitrary printing
if I use the other two options, then I get -6.27744e+066 for all 100 of them
So for now I am going to assume that ifstream::binary is the correct one. The thing is, I am not sure if the file I provided is how binary files actually look like. I am also unsure if these are the actual numbers that I am supposed to read in or just casting gone wrong. I do realize that my casting from char* to double* can be unsafe, and I got that from one of the threads.
*/
I really appreciate it!
Edit 1: Right now the data being read in using the above method is apparently "incorrect" since in paraview the values are:
Dxx,Dxy,Dxz,Dyy,Dyz,Dzz
[0, 1], [-15.4006, 13.2248], [-5.32436, 5.39517], [-5.32915, 5.96026], [-17.87, 19.0954], [-6.02961, 5.24771], [-13.9861, 14.0524]
It's a 3 x 3 symmetric matrix, so 7 distinct values, 7 ranges of values.
The floats that I am currently parsing from the file right now are very large (i.e. -4.68855e-229, -1.32351e+120).
Perhaps somebody knows how to extract the floats from Paraview?
Since you want to work with doubles, I recommend to read the data from file as buffer of doubles:
const long machineMemory = 0x40000000; // 1 GB
FILE* file = fopen("c:\\data.bin", "rb");
if (file)
{
int size = machineMemory / sizeof(double);
if (size > 0)
{
double* data = new double[size];
int read(0);
while (read = fread(data, sizeof(double), size, file))
{
// Process data here (read = number of doubles)
}
delete [] data;
}
fclose(file);
}
I'm trying to copy a file, but whatever I try, the copy seems to be a few bytes short.
_file is an ifstream set to binary mode.
void FileProcessor::send()
{
//If no file is opened return
if(!_file.is_open()) return;
//Reset position to beginning
_file.seekg(0, ios::beg);
//Result buffer
char * buffer;
char * partBytes = new char[_bufferSize];
//Packet *p;
//Read the file and send it over the network
while(_file.read(partBytes,_bufferSize))
{
//buffer = Packet::create(Packet::FILE,std::string(partBytes));
//p = Packet::create(buffer);
//cout<< p->getLength() << "\n";
//writeToFile(p->getData().c_str(),p->getLength());
writeToFile(partBytes,_bufferSize);
//delete[] buffer;
}
//cout<< *p << "\n";
delete [] partBytes;
}
_writeFile is the file to be written to.
void FileProcessor::writeToFile(const char *buffer,unsigned int size)
{
if(_writeFile.is_open())
{
_writeFile.write(buffer,size);
_writeFile.flush();
}
}
In this case I'm trying to copy a zip file.
But opening both the original and copy in notepad I noticed that while they look identical , It's different at the end where the copy is missing a few bytes.
Any suggestions?
You are assuming that the file's size is a multiple of _bufferSize. You have to check what's left on the buffer after the while:
while(_file.read(partBytes,_bufferSize)) {
writeToFile(partBytes,_bufferSize);
}
if(_file.gcount())
writeToFile(partBytes, _file.gcount());
Your while loop will terminate when it fails to read _bufferSize bytes because it hits an EOF.
The final call to read() might have read some data (just not a full buffer) but your code ignores it.
After your loop you need to check _file.gcount() and if it is not zero, write those remaining bytes out.
Are you copying from one type of media to another? Perhaps different sector sizes are causing the apparent weirdness.
What if _bufferSize doesn't divide evenly into the size of the file...that might cause extra bytes to be written.
You don't want to always do writeToFile(partBytes,_bufferSize); since it's possible (at the end) that less than _bufferSize bytes were read. Also, as pointed out in the comments on this answer, the ifstream is no longer "true" once the EOF is reached, so the last chunk isn't copied (this is your posted problem). Instead, use gcount() to get the number of bytes read:
do
{
_file.read(partBytes, _bufferSize);
writeToFile(partBytes, (unsigned int)_file.gcount());
} while (_file);
For comparisons of zip files, you might want to consider using a non-text editor to do the comparison; HxD is a great (free) hex editor with a file compare option.