C++ Make a file of a specific size

Here is my current problem: I am trying to create a file of x MB in C++. The user will enter the file name, then enter a number between 5 and 10 for the size of the file they want created. Later on in this project I'm going to do other things with it, but I'm stuck on the first step of creating the darn thing.
My problem code (so far):
char empty[1024];
for (int i = 0; i < 1024; i++)
{
    empty[i] = 0;
}

fileSystem = fopen(argv[1], "w+");
for (int i = 0; i < 1024 * fileSize; i++) {
    int temp = fputs(empty, fileSystem);
    if (temp > -1) {
        // Success!
    }
    else {
        cout << "error" << endl;
    }
}
Now, if I'm doing my math correctly, 1 char is 1 byte. There are 1024 bytes in 1 KB and 1024 KB in 1 MB. So if I wanted a 2 MB file, I'd have to write 1024*1024*2 bytes to this file. Yes?
I don't encounter any errors, but I end up with a file of 0 bytes... I'm not sure what I'm doing wrong here, so any help would be greatly appreciated!
Thanks!

Potentially sparse file
This creates output.img of size 300 MB:
#include <fstream>

int main()
{
    std::ofstream ofs("output.img", std::ios::binary | std::ios::out);
    ofs.seekp((300 << 20) - 1);  // seek to the last byte of a 300 MB file
    ofs.write("", 1);            // write a single NUL byte there
}
Note that, technically, this is a good way to trigger your filesystem's support for sparse files: on filesystems that support them, the skipped 300 MB may never be physically allocated on disk.
Dense file - filled with 0's
Functionally identical to the above, but filling the file with 0's:
#include <iostream>
#include <fstream>
#include <vector>

int main()
{
    std::vector<char> empty(1024, 0);
    std::ofstream ofs("output.img", std::ios::binary | std::ios::out);
    for (int i = 0; i < 1024 * 300; i++)  // 300 MB = 300 * 1024 blocks of 1 KB
    {
        if (!ofs.write(&empty[0], empty.size()))
        {
            std::cerr << "problem writing to file" << std::endl;
            return 255;
        }
    }
}

Your code doesn't work because you are using fputs, which writes a null-terminated string to the output. But you are trying to write all nulls, so it stops as soon as it looks at the first byte of your string and ends up writing nothing.
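For reference, a minimal sketch of a fix that keeps the question's structure (assuming the same fileSystem and fileSize variables from the question): fwrite takes an explicit byte count, so the embedded zero bytes are written out instead of terminating the output early.

// fwrite writes sizeof(empty) raw bytes per call, NULs included,
// so 1024 * fileSize iterations produce fileSize MB of zero bytes.
char empty[1024] = {0};
for (int i = 0; i < 1024 * fileSize; i++) {
    if (fwrite(empty, 1, sizeof(empty), fileSystem) != sizeof(empty)) {
        cout << "error" << endl;
        break;
    }
}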
Now, to create a file of a specific size, all you need to do is call the truncate function (or _chsize on Windows) exactly once, passing the size you want the file to be.
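A sketch of that approach on POSIX (note that truncate fails if the file does not exist yet, so this opens with O_CREAT and uses the ftruncate variant; my_file.bin and the 5 MB size are placeholders):

#include <fcntl.h>    // open, O_CREAT
#include <unistd.h>   // ftruncate, close

int main()
{
    // Create the file if needed, then set its length in a single call.
    int fd = open("my_file.bin", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return 1;
    int rc = ftruncate(fd, 5 * 1024 * 1024);  // 5 MB; contents read back as zeros
    close(fd);
    return rc == 0 ? 0 : 1;
}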
Good luck!

To make a 2 MB file you have to seek to offset 2*1024*1024 - 1 and write a single byte, as in the answer above. fputs()-ing an empty string will do no good no matter how many times you do it, because the string is empty: C strings are 0-terminated.

Related

C++ storing 0 and 1 more efficiently, like in a binary file?

I want to store multiple arrays whose entries all consist of either 0 or 1.
This file would be quite large if I do it the way I currently do it.
I made a minimalist version of what I currently do:
#include <iostream>
#include <fstream>

using namespace std;

int main(){
    ofstream File;
    File.open("test.csv");
    int array[4] = {1, 0, 0, 1};
    for(int i = 0; i < 4; ++i){
        File << array[i] << endl;
    }
    File.close();
    return 0;
}
So basically, is there a way of storing this in a binary file or something, since my data is 0 or 1 in the first place anyway?
If yes, how do I do this? Can I also still have line breaks and maybe even commas in that file? If either of the latter does not work, that's also fine. More importantly, how do I store this as a binary file that holds only the 0s and 1s, so my file is smaller?
Thank you very much!
The obvious solution is to take 64 characters, say A-Z, a-z, 0-9, and + and /, and have each character code for six entries in your table. There is, in fact, a standard for this called Base64. In Base64, A encodes 0,0,0,0,0,0 while / encodes 1,1,1,1,1,1. Each combination of six zeroes or ones has a corresponding character.
This still leaves commas, spaces, and newlines free for your use as separators.
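A sketch of that packing (not a full Base64 codec, no '=' padding handling; bitsToBase64 is a hypothetical helper name):

#include <cstddef>
#include <string>
#include <vector>

// Packs a vector of 0/1 ints into Base64-alphabet characters,
// six entries per character; a short tail is padded with zeros.
std::string bitsToBase64(const std::vector<int>& bits)
{
    static const char* alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < bits.size(); i += 6)
    {
        int value = 0;
        for (std::size_t j = 0; j < 6; j++)  // most significant bit first
            value = (value << 1) | (i + j < bits.size() ? bits[i + j] : 0);
        out += alphabet[value];
    }
    return out;
}

With this scheme {0,0,0,0,0,0} maps to 'A' and {1,1,1,1,1,1} maps to '/', and commas and newlines remain free to use as separators.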
If you want to store the data as compactly as possible, I'd recommend storing it as binary data, where each bit in the binary file represents one boolean value. This will allow you to store 8 boolean values for each byte of disk space you use up.
If you want to store arrays whose lengths are not multiples of 8, it gets a little bit more complicated since you can't store a partial byte, but you can solve that problem by storing an extra byte of meta-data at the end of the file that specifies how many bits of the final data-byte are valid and how many are just padding.
Something like this:
#include <iostream>
#include <fstream>
#include <cstdint>
#include <vector>
#include <sys/types.h>  // for ssize_t (POSIX)

using namespace std;

// Given an array of ints that are either 1 or 0, returns a packed-array
// of uint8_t's containing those bits as compactly as possible.
vector<uint8_t> packBits(const int * array, size_t arraySize)
{
    const size_t vectorSize = ((arraySize+7)/8)+1;  // round up, then +1 for the metadata byte
    vector<uint8_t> packedBits;
    packedBits.resize(vectorSize, 0);

    // Store 8 boolean-bits into each byte of (packedBits)
    for (size_t i=0; i<arraySize; i++)
    {
        if (array[i] != 0) packedBits[i/8] |= (1<<(i%8));
    }

    // The last byte in the array is special; it holds the number of
    // valid bits that we stored to the byte just before it.
    // That way if the number of bits we saved isn't an even multiple of 8,
    // we can use this value later on to calculate exactly how many bits we should restore
    packedBits[vectorSize-1] = arraySize%8;
    return packedBits;
}

// Given a packed-bits vector (i.e. as previously returned by packBits()),
// returns the vector-of-integers that was passed to the packBits() call.
vector<int> unpackBits(const vector<uint8_t> & packedBits)
{
    vector<int> ret;
    if (packedBits.size() < 2) return ret;

    const size_t validBitsInLastByte = packedBits[packedBits.size()-1]%8;
    const size_t numValidBits = 8*(packedBits.size()-((validBitsInLastByte>0)?2:1)) + validBitsInLastByte;
    ret.resize(numValidBits);
    for (size_t i=0; i<numValidBits; i++)
    {
        ret[i] = (packedBits[i/8] & (1<<(i%8))) ? 1 : 0;
    }
    return ret;
}

// Returns the size of the specified file in bytes, or -1 on failure
static ssize_t getFileSize(ifstream & inFile)
{
    if (inFile.is_open() == false) return -1;
    const streampos origPos = inFile.tellg();  // record current seek-position
    inFile.seekg(0, ios::end);                 // seek to the end of the file
    const ssize_t fileSize = inFile.tellg();   // the end-position equals the file size
    inFile.seekg(origPos);                     // so we won't change the file's read-position as a side effect
    return fileSize;
}

int main(){
    // Example of packing an array-of-ints into packed-bits form and saving it
    // to a binary file
    {
        const int array[] = {0,0,1,1,1,1,1,0,1,0};

        // Pack the int-array into packed-bits format
        const vector<uint8_t> packedBits = packBits(array, sizeof(array)/sizeof(array[0]));

        // Write the packed-bits to a binary file
        ofstream outFile;
        outFile.open("test.bin", ios::binary);
        outFile.write(reinterpret_cast<const char *>(&packedBits[0]), packedBits.size());
        outFile.close();
    }

    // Now we'll read the binary file back in, unpack the bits to a vector<int>,
    // and print out the contents of the vector.
    {
        // open the file for reading
        ifstream inFile;
        inFile.open("test.bin", ios::binary);
        const ssize_t fileSizeBytes = getFileSize(inFile);
        if (fileSizeBytes < 0)
        {
            cerr << "Couldn't read test.bin, aborting" << endl;
            return 10;
        }

        // Read in the packed-binary data
        vector<uint8_t> packedBits;
        packedBits.resize(fileSizeBytes);
        inFile.read(reinterpret_cast<char *>(&packedBits[0]), fileSizeBytes);

        // Expand the packed-binary data back out to one-int-per-boolean
        vector<int> unpackedInts = unpackBits(packedBits);

        // Print out the int-array's contents
        cout << "Loaded-from-disk unpackedInts vector is " << unpackedInts.size() << " items long:" << endl;
        for (size_t i=0; i<unpackedInts.size(); i++) cout << unpackedInts[i] << " ";
        cout << endl;
    }
    return 0;
}
(You could probably make the file even more compact than that by running zip or gzip on the file after you write it out :) )
You can indeed write and read binary data. However, having line breaks and commas would be difficult. Imagine you save your data as boolean data, so only ones and zeros; then having a comma would require a special character, but you have only ones and zeros! The next best thing would be to make an object of two booleans, one holding the actual data and the other marking whether a comma follows (C++ would then read the data in pairs of bits), but I doubt this is what you need. If you want to do something like a CSV, it would be easier to fix the size of each column (an int would be 4 bytes, a string no more than 32 chars, for example) and then read and write accordingly.
Suppose you have your data as an array of objects, say pets. To initially save the array you would use:
FILE *apFile;
apFile = fopen(FILENAME, "wb+");  // binary mode, so the bytes are written verbatim
fwrite(ARRAY_OF_PETS, sizeof(Pet), SIZE_OF_ARRAY, apFile);
fclose(apFile);
To access the idx-th pet, you would use:
Pet m;
ifstream input_file(FILENAME, ios::in | ios::binary | ios::ate);
input_file.seekg(sizeof(Pet) * idx, ios::beg);
input_file.read((char*) &m, sizeof(Pet));
input_file.close();
You can also add data at the end, change data in the middle, and so on.
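As a sketch (assuming the same Pet type, FILENAME, idx, and m as above, and that Pet is a fixed-size, trivially copyable record), an in-place update of record idx would look like:

fstream io_file(FILENAME, ios::in | ios::out | ios::binary);
io_file.seekp(sizeof(Pet) * idx, ios::beg);  // jump to the record to overwrite
io_file.write((char*) &m, sizeof(Pet));      // overwrite it in place
io_file.close();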

C++ reading large files part by part

I've been having a problem that I have not been able to solve yet. The problem is related to reading files; I've looked at threads, even on this website, and they do not seem to solve it. That problem is reading files that are larger than a computer's system memory. Simply put, when I asked this question a while ago, I was referred to the following code.
string data("");
getline(cin,data);
std::ifstream is (data);//, std::ifstream::binary);
if (is)
{
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
// allocate memory:
char * buffer = new char [length];
// read data as a block:
is.read (buffer,length);
is.close();
// print content:
std::cout.write (buffer,length);
delete[] buffer;
}
system("pause");
This code works well, apart from the fact that it eats memory like a fat kid in a candy store.
So after a lot of ghetto and unrefined programming, I was able to figure out a way to sort of fix the problem. However, I more or less traded one problem for another in my quest.
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <windows.h>
#include <cstdlib>
#include <thread>

using namespace std;

/*======================================================*/
string *fileName = new string("tldr");
char data[36];
int filePos(0); // The pos of the file
int tmSize(0);  // The total size of the file
int split(32);
char buff;
int DNum(0);
/*======================================================*/

int getFileSize(std::string filename) // path to file
{
    FILE *p_file = NULL;
    p_file = fopen(filename.c_str(), "rb");
    fseek(p_file, 0, SEEK_END);
    int size = ftell(p_file);
    fclose(p_file);
    return size;
}

void fs()
{
    tmSize = getFileSize(*fileName);

    int AX(0);
    ifstream fileIn;
    fileIn.open(*fileName, ios::in | ios::binary);

    int n1, n2, n3;
    n1 = tmSize / 32;

    // Does the processing
    while(filePos != tmSize)
    {
        fileIn.seekg(filePos, ios_base::beg);
        buff = fileIn.get();

        // To take into account small files
        if(tmSize < 32)
        {
            int Count(0);
            char MT[40];
            if(Count != tmSize)
            {
                MT[Count] = buff;
                cout << MT[Count]; // << endl;
                Count++;
            }
        }
        // Anything larger than 32
        else
        {
            if(AX != split)
            {
                data[AX] = buff;
                AX++;
                if(AX == split)
                {
                    AX = 0;
                }
            }
        }
        filePos++;
    }

    int tz(0);
    filePos = filePos - 12;
    while(tz != 2)
    {
        fileIn.seekg(filePos, ios_base::beg);
        buff = fileIn.get();
        data[tz] = buff;
        tz++;
        filePos++;
    }
    fileIn.close();
}

int main()
{
    fs();
    cout << tmSize << endl;
    system("pause");
}
What I tried to do with this code is work around the memory issue. Rather than allocating enough memory for a large file, memory that simply does not exist on my system, I tried to use the memory I had instead, which is about 8 GB; but I only wanted to use maybe a few kilobytes of it, if at all possible.
To give you a layout of what I am talking about, I am going to write a line of text.
"Hello my name is cake please give me cake"
Basically, what I did was read said piece of text letter by letter. Then I put those letters into a box that could store 32 of them; from there I could use something like XOR and then write them onto another file.
The idea in a way works, but it is horribly slow and leaves off parts of files.
So basically, how can I make something like this work without going slow or cutting off files? I would love to see how XOR works with very large files.
So if anyone has a better idea than what I have, I would be very grateful for the help.
To read and process the file piece-by-piece, you can use the following snippet:
// Buffer size 1 Megabyte (or any number you like)
size_t buffer_size = 1 << 20;
char *buffer = new char[buffer_size];

std::ifstream fin("input.dat");
while (fin)
{
    // Try to read next chunk of data
    fin.read(buffer, buffer_size);

    // Get the number of bytes actually read
    size_t count = fin.gcount();

    // If nothing has been read, break
    if (!count)
        break;

    // Do whatever you need with first count bytes in the buffer
    // ...
}
delete[] buffer;
The buffer size of 32 bytes, as you are using, is definitely too small. You make too many calls to library functions, and the library, in turn, makes calls to the OS (although probably not on every call), which are typically slow since they cause context switches. There is also no need for tell/seek.
If you don't need all of the file content simultaneously, reduce the working set first. Since XOR can be applied sequentially, you can work with a fixed-size buffer, say 4 kilobytes, instead of the whole file.
Now you have the option to call is.read() in a loop and process a small chunk of data each iteration, or use mmap() to map the file content as a memory pointer, through which you can perform both read and write operations.
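To show how the XOR fits in, here is a sketch that layers a byte-wise XOR with a single-byte key onto the chunked loop above; the file names input.dat and output.dat and the key value are placeholders:

#include <fstream>
#include <vector>

int main()
{
    const std::size_t buffer_size = 1 << 20;   // 1 MB chunks
    const char key = 0x5A;                     // placeholder XOR key
    std::vector<char> buffer(buffer_size);

    std::ifstream fin("input.dat", std::ios::binary);
    std::ofstream fout("output.dat", std::ios::binary);
    while (fin)
    {
        fin.read(buffer.data(), buffer.size());
        std::streamsize count = fin.gcount();  // bytes actually read
        if (count == 0)
            break;
        for (std::streamsize i = 0; i < count; i++)
            buffer[i] ^= key;  // XOR is byte-wise, so chunk boundaries don't matter
        fout.write(buffer.data(), count);
    }
}

Because each output byte depends only on the corresponding input byte, this processes files of any size in constant memory.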

Does std::ofstream write sequential data to disk or does it only use the disk's free space?

Let's say that I want to create a 1 GB binary file of all 1's, for whatever reason, using ofstream. And let's say, for the sake of argument, that the particular drive I'm going to be creating this file on is heavily fragmented and only has 1 GB of free space left on the disk. Here's a basic example of what it would look like:
#include <fstream>
#include <cstdint>

int main(int argc, char *argv[])
{
    int length_of_file = 1073741823; // 1024^3 - 1 bytes
    uint8_t data_block = 0xff;
    std::ofstream os;
    os.open("C:\\foo.bin", std::ios::trunc | std::ios::binary);
    while (os.good()) {
        for (int i = 0; i < length_of_file; ++i) {
            os.write(reinterpret_cast<const char*>(&data_block), sizeof(uint8_t));
        }
    }
    os.close();
    return 0;
}
This should write 1 GB worth of 1's to the file "foo.bin", but if ofstream writes sequential data to the drive then this would overwrite files on the disk with 1's.
So my question is: will this method overwrite any files in the hard drive?
No, this method won't overwrite any files on your hard drive (other than C:\foo.bin). The OS ensures that your files are independent. Most likely you will get an error during your run where the disk drive complains about running out of space and your drive will be nearly completely full.
Note, the way you've structured your loops is a bit strange and probably not what you intended. You probably want to eliminate your outer loop and move the call to os.good() into the inner loop:
for (int i = 0; os.good() && i < length_of_file; ++i) {
    os.write(reinterpret_cast<const char*>(&data_block), sizeof(uint8_t));
}

How to create a text file of specific size c++ [duplicate]

This question already has answers here:
C++ Make a file of a specific size
(3 answers)
Closed 9 years ago.
I have to create a text file of a specific size; the user enters the size. All I need to know is how to create the file faster. Currently, creating a 10 MB file takes about 15 seconds; I have to decrease this to 5 seconds at most. How can I do that? Currently this is how I am making my file:
void create_file()
{
    int size, file_size;
    cout << "Enter size of File in MB's : ";
    cin >> file_size;
    size = 1024 * 1024 * file_size; // 1 MB = 1024 * 1024 bytes

    ofstream pFILE("my_file.txt", ios::out);
    for(int i = 0; i < size; i++)  // outputting spaces to create file
        pFILE << ' ';
    pFILE.close();
}
Update: this is what I am using now, but I get garbage values written to the file as well.
void f_c()
{
    int i, size;
    cin >> size;

    FILE * new_file = fopen("FILE TEST.txt", "w");
    char buffer[1024];
    memset(buffer, ' ', 1024);  // note: buffer is never null-terminated
    for(i = 0; i < 1024 * size; i++)
        fputs(buffer, new_file);
    getchar();
}
You are filling it one character at a time. Instead, you could allocate a larger chunk of memory using new and then write larger chunks at once to speed up the process. You could use memset on the allocated memory to prevent garbage characters from ending up in the file (in your update, the garbage appears because fputs expects a null-terminated string, and your buffer of spaces is not null-terminated, so fputs reads past its end). Also look at the comment about the duplicate question; there are even faster methods if the file needn't have specific content initially.
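A sketch of that chunked approach (the function name, the 1 MB chunk size, and my_file.txt are placeholders; ios::binary avoids newline translation changing the size on Windows):

#include <fstream>
#include <vector>

void create_file_fast(int size_mb)
{
    // One 1 MB block of spaces, written size_mb times.
    std::vector<char> chunk(1024 * 1024, ' ');
    std::ofstream out("my_file.txt", std::ios::binary);
    for (int i = 0; i < size_mb; i++)
        out.write(chunk.data(), chunk.size());
}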
Here is a simple sample, but without error checking.
Let's say you want a size of 1000:
#include <fstream>

int main() {
    int size = 1000;
    const char* filename = "file.txt";

    std::ofstream fout(filename);
    fout.fill(' ');    // pad with spaces
    fout.width(size);  // minimum field width of `size` characters
    fout << " ";       // one char, padded out to `size` bytes
    return 0;
}
You don't have to fill all the bytes in the file if you just want to create a big file; that filling is what makes it slow. Just tell the file system how big you want the file to be.
On Linux, use truncate (note it requires the file to already exist; otherwise create it first, e.g. with open and O_CREAT):
truncate("data.txt", 1024*1024*1024);
On Windows, use SetFilePointer followed by SetEndOfFile, which commits the new length:
SetFilePointer(hFile, 1024*1024*1024, NULL, FILE_BEGIN);
SetEndOfFile(hFile);
Both of these can create an uninitialized file of several gigabytes in less than a second.
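For completeness, a full Windows sketch of the same idea (data.txt and the 1 GB size are placeholders; hFile in the lines above is assumed to come from CreateFile):

#include <windows.h>

int main()
{
    HANDLE hFile = CreateFileA("data.txt", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    // Move the file pointer to the desired size, then set the
    // end-of-file marker there; the contents are left uninitialized.
    SetFilePointer(hFile, 1024 * 1024 * 1024, NULL, FILE_BEGIN);
    SetEndOfFile(hFile);
    CloseHandle(hFile);
    return 0;
}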

What is the proper method of reading and parsing data files in C++?

What is an efficient, proper way of reading in a data file with mixed content? For example, I have a data file that contains a mixture of data loaded from other files: 32-bit integers, characters, and strings. Currently, I am using an fstream object, but it gets stopped once it hits an int32 or the end of a string. If I add random data onto the end of the string in the data file, it seems to follow through with the rest of the file. This leads me to believe that the null termination added onto strings is messing it up. Here's an example of loading in the file:
int main()
{
    fstream fin("C://mark.dat", ios::in | ios::binary | ios::ate);
    char *mymemory = 0;
    int size;
    size = 0;

    if (fin.is_open())
    {
        size = static_cast<int>(fin.tellg());
        mymemory = new char[static_cast<int>(size + 1)];
        memset(mymemory, 0, static_cast<int>(size + 1));
        fin.seekg(0, ios::beg);
        fin.read(mymemory, size);
        fin.close();
        printf(mymemory);

        std::string hithere;
        hithere = cin.get();
    }
}
Why might this code stop after reading in an integer or a string? How might one get around this? Is this the wrong approach when dealing with these types of files? Should I be using fstream at all?
Have you ever considered that the file reading is working perfectly and it is printf(mymemory) that is stopping at the first null?
Have a look with the debugger and see if I am right.
Also, if you want to print someone else's buffer, use puts(mymemory) or printf("%s", mymemory). Don't accept someone else's input as the format string; it could crash your program.
Try
for (int i = 0; i < size; ++i)
{
    // %02X means: pad with 0s, two digits, hex with capital A-F (0A, 1B, etc.)
    // The unsigned char cast keeps high bytes from sign-extending to FFFFFFxx.
    printf("%02X ", (unsigned char)mymemory[i]);
    if ((i + 1) % 32 == 0)
        printf("\n"); // new line every 32 bytes
}
as a way to dump your data file back out as hex.