ofstream writes null characters on a file in binary mode - c++

We are facing an issue after a device reboot. Our application runs on Linux on a Raspberry Pi board. We maintain a log file to which we append records every 10 seconds with the code below. A single write can contain one or more records in pBuffer.
bool FileOP::Append(const std::string & PathName, const char * pBuffer, uint64_t Size)
{
    bool AppendSuccessful = false;
    std::ofstream File;
    try
    {
        File.exceptions(std::ofstream::badbit | std::ofstream::failbit);
        File.open(PathName.c_str(), std::ofstream::out | std::ofstream::binary | std::ofstream::app);
        File.write(pBuffer, Size);
        File.close();
        AppendSuccessful = true;
    }
    catch (std::exception & e)
    {
        std::cout << "Error when appending string to file: " << PathName
                  << std::strerror(errno) << " Exception : " << e.what() << std::endl;
    }
    return AppendSuccessful;
}
We have observed that if the board reboots (power is removed) exactly while the data is being written, we get a record consisting entirely of NULL characters. The file size still increases by the record size: for example, if we write 100 bytes, the file size becomes header size (100) + old data size (100) + new data (100) = 300 bytes. When we then read the file, the last 100 bytes are all NULL characters.
If the record is not written completely, why does the file size increase?
How exactly does the record get filled with NULLs? We have verified that no record we write contains NULL characters.

This will depend on the filesystem in use, but what is likely happening here is that the filesystem is committing the change to the file metadata (in this case, its length) before all the data is written. If you require the file to be consistent even in case of a crash, and are using ext4, try mounting with the data=journal option. Note that this has performance impacts due to disabling delayed allocation.
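For illustration, a hypothetical /etc/fstab entry enabling full data journalling on a dedicated data partition (the device name and mount point are assumptions; adjust for your board's partition layout):

/dev/mmcblk0p3  /data  ext4  defaults,data=journal  0  2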

If the record is not written completely, why does the file size increase?
If null bytes are written, those increase the file size just as much as any other byte.
How exactly does the record get filled with NULLs?
It can potentially happen on the following line:
File.write(pBuffer, Size);
If pBuffer contains null chars, then those null chars are written to the file.
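As an aside (a different technique from the answers above), flushing the data to the medium before closing narrows the power-loss window considerably. A minimal POSIX sketch, assuming a record small enough that write() accepts it in one call:

#include <fcntl.h>
#include <unistd.h>
#include <string>

// Hedged sketch: append a record and force it to disk with fsync()
// before closing, so a power cut right after the call loses nothing.
bool AppendDurable(const std::string &path, const char *buf, size_t size)
{
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;
    bool ok = write(fd, buf, size) == static_cast<ssize_t>(size)
              && fsync(fd) == 0;   // flush file data and metadata to the medium
    close(fd);
    return ok;
}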

Related

Cannot serve png files and other binary files in hobby HTTP server

I am writing an HTTP server in C++, and serving static files is mostly OK; however, when reading .PNG files or other binaries, every method I have tried fails. My main problem is that when I open up Dev tools, reading an example image gives a transferred size of 29.56 kB, and a size of 29.50 kB for my current method. The sizes given also do not match up with what du -sh gives, which is 32 kB.
My first method was to push the contents of a file onto a string, and call a function to serve that. However, this would also serve only ~6 kB, if memory serves correctly.
My current method is to read the file using std::ifstream in binary mode. I am getting the size of the file using C++17's filesystem header and std::filesystem::file_size. I read the contents into a buffer and then call a function to send the buffer contents 1 byte at a time.
void WebServer::sendContents(std::string contents) {
    if (send(this->newFd, contents.c_str(), strlen(contents.c_str()), 0) == -1) {
        throw std::runtime_error("Server accept: " + std::string(strerror(errno)));
    }
}
void WebServer::sendFile(std::string path) {
    path = "./" + path;
    std::string fileCont; //File contents
    std::string mimeType; //Mime type of the file
    std::string contLength;
    std::string::size_type idx = path.rfind('.');
    if (idx != std::string::npos) mimeType = this->getMimeType(path.substr(idx + 1));
    else mimeType = "text/html";
    std::filesystem::path reqPath = std::filesystem::path("./" + path).make_preferred();
    std::filesystem::path parentPath = std::filesystem::path("./");
    std::filesystem::path actualPath = std::filesystem::canonical(parentPath / reqPath);
    if (!this->isSubDir(actualPath, parentPath)) { this->sendRoute("404"); return; }
    std::ifstream ifs;
    ifs.open(actualPath, std::ios::binary);
    if (ifs.is_open()) {
        //Get the size of the static file being served
        std::filesystem::path staticPath{path};
        std::size_t length = std::filesystem::file_size(staticPath);
        char* buffer = new char[length];
        *buffer = { 0 }; //Initialize the buffer that will send the static file
        ifs.read(buffer, sizeof(char) * length); //Read the buffer
        std::string resp = "HTTP/1.0 200 OK\r\n"
                           "Server: webserver-c\r\n"
                           "Content-Length: " + std::to_string(length) + "\r\n"
                           "Content-type: " + mimeType + "\r\n\r\n";
        if (!ifs) std::cout << "Error! Only " << std::to_string(ifs.gcount()) << " could be read!" << std::endl;
        this->sendContents(resp); //Send the headers
        for (size_t i = 0; i < length; i++) {
            std::string byte = std::string(1, buffer[i]);
            this->sendContents(byte);
        }
        delete[] buffer; //We do not need megs of memory stacking up, that will grow quick
        buffer = nullptr;
    } else {
        this->sendContents("HTTP/1.1 500 Error\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n");
        return;
    }
    ifs.close();
}
It should be noted that this->newFd is a socket descriptor.
It should also be noted that I have tried to take a look at this question here; however, the same problem still occurs for me.
if (send(this->newFd, contents.c_str(), strlen(contents.c_str()), 0) == -1) {
There are two bugs for the price of one, here.
This is used, apparently, to send the contents of the binary file, one byte at a time. That is horribly inefficient, but it's not the bug. The first bug is as follows.
Your binary file has plenty of bytes that are 00.
In that case, contents will proudly contain this 00 byte, here. c_str() returns a pointer to it. strlen() then reaches the conclusion that it is receiving an empty string, for input, and makes a grandiose announcement that the string contains 0 characters.
In the end, send's third parameter will be 0.
No bytes will get sent, at all, instead of the famous 00 byte.
The second bug will come into play once the inefficient algorithm gets fixed, and sendContents gets used to send more than one byte at a time.
send() holds a secret: this system call may return values other than -1 (the failure indicator), such as the actual number of bytes that were sent. So, if send() was called to send, say, 100 bytes, it may decide to send only 30 bytes, return 30, and leave you holding the bag with the remaining 70 unsent bytes.
This is actually, already, an existing bug in the shown code. sendContents() also gets used to send the entire resp string, which is, eh, in the neighborhood of 100 bytes. Give or take a dozen.
You are relying on this house of cards: on send() always doing its complete job, in this particular case, not slacking off, and actually sending the entire HTTP/1.0 response string.
But send() is a famous slacker, and you have no guarantees, whatsoever, that this will actually happen. And I have it on good authority that on some upcoming Friday the 13th your send() will decide to slack off, all of a sudden.
So, to fix the shown code:
Implement the appropriate logic to handle the return value from send() (a sketch follows below).
Do not use c_str(), followed by strlen(), because: A) it's broken for strings containing binary data, B) this elaborate routine simply reinvents a wheel called size(). You will be happy to know that size() does exactly what its name claims.
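A minimal sketch of such a send loop (sendAll is a hypothetical helper, not part of the original code):

#include <sys/socket.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Keeps calling send() until every byte has gone out.
// Uses size() and data(), so embedded 0 bytes are handled correctly.
void sendAll(int fd, const std::string &contents)
{
    const char *p = contents.data();
    size_t remaining = contents.size();
    while (remaining > 0) {
        ssize_t sent = send(fd, p, remaining, 0);
        if (sent == -1) {
            if (errno == EINTR) continue; // interrupted by a signal: retry
            throw std::runtime_error(std::string("send: ") + std::strerror(errno));
        }
        p += sent;                        // advance past the bytes already sent
        remaining -= static_cast<size_t>(sent);
    }
}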
One other bug:
char* buffer = new char[length];
It is possible for an exception to get thrown from the subsequent code. This memory gets leaked, because delete never gets called.
C++ gurus know a weird trick: they rarely use new, but instead use containers, like std::vector, and they don't have to worry about leaking memory, because of that.
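For instance, the new[]/read/delete[] sequence in the shown code could become (a sketch, keeping the surrounding names):

#include <vector>

// drop-in replacement for the raw buffer in the shown code
std::vector<char> buffer(length);            // zero-initialized, freed automatically
ifs.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
// no delete needed: the vector releases its memory even if an exception is thrown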

C++ get the size of a file while it's being written to

I have a recording application that is reading data from a network stream and writing it to a file. It works very well, but I would like to display the file size as the data is written. Every second the GUI thread refreshes the status bar with the elapsed recording time; at that point I would also like to display the current file size.
I originally consulted this question and have tried both the stat method:
struct stat stat_buf;
int rc = stat(recFilename.c_str(), &stat_buf);
std::cout << recFilename << " " << stat_buf.st_size << "\n";
(no error checking for simplicity) and the fseek method:
FILE *p_file = NULL;
p_file = fopen(recFilename.c_str(),"rb");
fseek(p_file,0,SEEK_END);
int size = ftell(p_file);
fclose(p_file);
but either way, I get 0 for the file size. When I go back and look at the file I write to, the data is there and the size is correct. The recording happens on a separate thread.
I know that bytes are being written because I can print the size of the data as it is written, alongside the output of the methods shown above.
The filename plus the 0 is what I print from the GUI thread; 'Bytes written x' comes from the recording thread.
You can read all about C++ file manipulations here http://www.cplusplus.com/doc/tutorial/files/
This is an example of how I would do it.
#include <fstream>

std::ifstream::pos_type filesize(const char* file)
{
    std::ifstream in(file, std::ifstream::ate | std::ifstream::binary);
    return in.tellg();
}
Hope this helps.
As a desperate alternative, you can use ftell in the write-data thread, or maybe a variable to track the amount of data that is written. But getting to the real problem: you must be making a mistake somewhere; maybe fopen never opens the file, or something like that.
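A minimal sketch of the counter idea, assuming the recording thread can share an atomic with the GUI thread (the names here are hypothetical):

#include <atomic>
#include <cstdint>
#include <cstdio>

std::atomic<std::uint64_t> g_bytesWritten{0};  // shared between threads

// called from the recording thread after each write
void recordChunk(FILE *f, const char *data, std::size_t n)
{
    std::size_t written = fwrite(data, 1, n, f);
    g_bytesWritten += written;  // the GUI thread reads this instead of calling stat()
}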
I'll copy a test code to show that this works at least in a singlethread app
int _tmain(int argc, _TCHAR* argv[])
{
    FILE * mFile;
    FILE * mFile2;
    mFile = fopen("hi.txt", "a+");
    // fseek(mFile, 0, SEEK_END);
    // ## this is to make sure that fputs and fwrite work equally
    // fputs("fopen example", mFile);
    fwrite("fopen ex", 1, 9, mFile);
    fseek(mFile, 0, SEEK_END);
    std::cout << ftell(mFile) << ":";
    mFile2 = fopen("hi.txt", "rb");
    fseek(mFile2, 0, SEEK_END);
    std::cout << ftell(mFile2) << std::endl;
    fclose(mFile2);
    fclose(mFile);
    getchar();
    return 0;
}
Just use the freopen function before calling stat. It seems freopen refreshes the file length.
I realize this post is rather old at this point, but in response to @TylerD007: while that works, it is incredibly expensive if all you're trying to do is get the number of bytes written.
In C++17 and later, you can simply use the <filesystem> header and call:
auto fileSize {std::filesystem::file_size(filePath)};
Now the variable fileSize holds the actual size of the file.
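A sketch using the error_code overload, which avoids an exception if the recorder has not created the file yet (my assumption about the failure mode):

#include <cstdint>
#include <filesystem>
#include <system_error>

std::uintmax_t currentSize(const std::filesystem::path &p)
{
    std::error_code ec;
    std::uintmax_t n = std::filesystem::file_size(p, ec);
    return ec ? 0 : n;   // report 0 until the file exists
}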

Size error on read file

RESOLVED
I'm trying to make a simple file loader.
I aim to get the text from a shader file (plain text file) into a char* that I will compile later.
I've tried this function:
char* load_shader(char* pURL)
{
    FILE *shaderFile;
    char* pShader;
    // File opening
    fopen_s(&shaderFile, pURL, "r");
    if (shaderFile == NULL)
        return "FILE_ER";
    // File size
    fseek(shaderFile, 0, SEEK_END);
    int lSize = ftell(shaderFile);
    rewind(shaderFile);
    // Allocating size to store the content
    pShader = (char*) malloc(sizeof(char) * lSize);
    if (pShader == NULL)
    {
        fputs("Memory error", stderr);
        return "MEM_ER";
    }
    // copy the file into the buffer:
    int result = fread(pShader, sizeof(char), lSize, shaderFile);
    if (result != lSize)
    {
        // size of file 106/113
        cout << "size of file " << result << "/" << lSize << endl;
        fputs("Reading error", stderr);
        return "READ_ER";
    }
    // Terminate
    fclose(shaderFile);
    return 0;
}
But as you can see in the code, I get a strange size difference at the end of the process, which makes my function crash.
I must say I'm quite a beginner in C, so I might have missed some subtleties regarding memory allocation, types, pointers...
How can I solve this size issue?
EDIT 1:
First, I shouldn't return 0 at the end but pShader; that seemed to be what crashed the program.
Then, I changed the type of result to size_t, and added an end character to pShader, adding pShader[result] = '\0'; after its declaration, so I can display it correctly.
Finally, as @JamesKanze suggested, I turned fopen_s into fopen, as the former was not useful in my case.
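For reference, a sketch of the corrected loader described in this edit; the +1 in the malloc is my addition so the terminator has room, and "rb" reflects the binary-mode advice in the answer below:

#include <cstdio>
#include <cstdlib>

char* load_shader(const char* pURL)
{
    FILE* shaderFile = fopen(pURL, "rb");        // binary mode: no CRLF translation
    if (shaderFile == NULL)
        return NULL;
    fseek(shaderFile, 0, SEEK_END);
    long lSize = ftell(shaderFile);
    rewind(shaderFile);
    char* pShader = (char*) malloc(lSize + 1);   // +1 for the terminator
    if (pShader == NULL) { fclose(shaderFile); return NULL; }
    size_t result = fread(pShader, 1, lSize, shaderFile);
    pShader[result] = '\0';                      // terminate after the bytes actually read
    fclose(shaderFile);
    return pShader;                              // return the buffer, not 0
}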
First, for this sort of raw access, you're probably better off using the system-level functions: CreateFile or open, ReadFile or read, and CloseHandle or close, with GetFileSize or stat to get the size. Using FILE* or std::filebuf will only introduce an additional level of buffering and processing, for no gain in your case.
As to what you are seeing: there is no guarantee that an ftell will return anything exploitable as a numeric value; it could very well be just a magic cookie. On most current systems, it is a byte offset into the physical file, but on any non-Unix system, the offset into the physical file will not map directly to the logical file you are reading unless you open the file in binary mode. If you use "rb" to open the file, you'll probably see the same values. (Theoretically, you could get extra 0's at the end of the file, but practically, the OS's where that happened are either extinct, or only used on legacy mainframes.)
EDIT:
Since the answer stating this has been deleted: you should loop on the fread until it returns 0 (setting errno to 0 before each call, and checking it after the return to see whether the function returned because of an error or because it reached the end of file). Having said this: if you're on one of the usual Windows or Unix systems, and the file is local to the machine, and not too big, fread will read it all in one go. The difference in size you are seeing (given the numerical values you posted) is almost certainly due to the fact that the two-byte Windows line endings are being mapped to a single '\n' character. To avoid this, you must open in binary mode; alternatively, if you really are dealing with text (and want this mapping), you can just ignore the extra bytes in your buffer, setting the '\0' terminator after the last byte actually read.
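A minimal sketch of the suggested read loop (the function name and fixed capacity are my assumptions):

#include <cerrno>
#include <cstdio>

// Read up to 'capacity' bytes from fp, looping until fread() returns 0,
// and distinguish a real error from end-of-file via ferror().
size_t readAll(FILE* fp, char* buf, size_t capacity)
{
    size_t total = 0;
    while (total < capacity) {
        errno = 0;
        size_t n = fread(buf + total, 1, capacity - total, fp);
        total += n;
        if (n == 0) {
            if (ferror(fp))
                perror("fread");  // real error, not just EOF
            break;
        }
    }
    return total;  // number of bytes actually read
}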

C++ copying files. Short on data

I'm trying to copy a file, but whatever I try, the copy seems to be a few bytes short.
_file is an ifstream set to binary mode.
void FileProcessor::send()
{
    //If no file is opened return
    if(!_file.is_open()) return;
    //Reset position to beginning
    _file.seekg(0, ios::beg);
    //Result buffer
    char * buffer;
    char * partBytes = new char[_bufferSize];
    //Packet *p;
    //Read the file and send it over the network
    while(_file.read(partBytes, _bufferSize))
    {
        //buffer = Packet::create(Packet::FILE, std::string(partBytes));
        //p = Packet::create(buffer);
        //cout << p->getLength() << "\n";
        //writeToFile(p->getData().c_str(), p->getLength());
        writeToFile(partBytes, _bufferSize);
        //delete[] buffer;
    }
    //cout << *p << "\n";
    delete [] partBytes;
}
_writeFile is the file to be written to.
void FileProcessor::writeToFile(const char *buffer, unsigned int size)
{
    if(_writeFile.is_open())
    {
        _writeFile.write(buffer, size);
        _writeFile.flush();
    }
}
In this case I'm trying to copy a zip file.
But opening both the original and the copy in Notepad, I noticed that while they look identical, they differ at the end, where the copy is missing a few bytes.
Any suggestions?
You are assuming that the file's size is a multiple of _bufferSize. You have to check what's left in the buffer after the while:
while(_file.read(partBytes, _bufferSize)) {
    writeToFile(partBytes, _bufferSize);
}
if(_file.gcount())
    writeToFile(partBytes, _file.gcount());
Your while loop will terminate when it fails to read _bufferSize bytes because it hits an EOF.
The final call to read() might have read some data (just not a full buffer) but your code ignores it.
After your loop you need to check _file.gcount() and if it is not zero, write those remaining bytes out.
Are you copying from one type of media to another? Perhaps different sector sizes are causing the apparent weirdness.
What if _bufferSize doesn't divide evenly into the size of the file...that might cause extra bytes to be written.
You don't want to always do writeToFile(partBytes,_bufferSize); since it's possible (at the end) that less than _bufferSize bytes were read. Also, as pointed out in the comments on this answer, the ifstream is no longer "true" once the EOF is reached, so the last chunk isn't copied (this is your posted problem). Instead, use gcount() to get the number of bytes read:
do
{
    _file.read(partBytes, _bufferSize);
    writeToFile(partBytes, (unsigned int)_file.gcount());
} while (_file);
For comparisons of zip files, you might want to consider using a non-text editor to do the comparison; HxD is a great (free) hex editor with a file compare option.

Ubuntu 10.04, error in using MAP_HUGETLB with MAP_SHARED

Following is the code that I am using for mmapping a file in Ubuntu with huge pages, but the call is failing with the error "invalid argument". However, when I pass the MAP_ANON flag with no file descriptor parameter to mmap, it works. I am unable to understand the possible reason behind this.
Secondly, I am not able to understand why mmapping a file is allowed with MAP_PRIVATE when this flag itself means that no change will be written back to the file. This can always be accomplished using MAP_ANON, or is there something I am missing?
Can someone help me with these?
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>

int32_t main(int32_t argc, char** argv) {
    int32_t map_length = 16*1024*1024; // 16 MB, huge page size is 2 MB
    int32_t protection = PROT_READ | PROT_WRITE;
    int32_t flags      = MAP_SHARED | MAP_HUGETLB;
    int32_t file__     = open("test", O_RDWR | O_CREAT | O_LARGEFILE, S_IRWXU | S_IRGRP | S_IROTH);
    if(file__ < 0) {
        std::cerr << "Unable to open file\n";
        return -1;
    }
    if (ftruncate(file__, map_length) < 0) {
        std::cerr
            << "main :: unable to truncate the file\n"
            << "main :: " << strerror(errno) << "\n"
            << "main :: error number is " << errno << "\n";
        return -1;
    }
    void *addr = mmap(NULL, map_length, protection, flags, file__, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    const char* msg = "Hello World\n";
    int32_t len = strlen(msg);
    memcpy(addr, msg, len);
    munmap(addr, map_length);
    close(file__);
    return 0;
}
Both your questions come down to the same point: Using mmap() you can obtain two kinds of mappings: anonymous memory and files.
Anonymous memory is (as stated in the man page) not backed by any file in the file system. Instead the memory you get back from an MAP_ANON call to mmap() is plain system memory. The main user of this interface is the C library which uses it to obtain backing storage for malloc/free. So, using MAP_ANON is explicitly saying that you don't want to map a file.
File-backed memory blends a file (or portions of it) into the address space of your application. In this case, the memory content is actually backed by the file's content. Think of the MAP_PRIVATE flag as first allocating memory for the file and then copying the content into this memory. In truth this is not what the kernel actually does, but let's just pretend.
HUGE_TLB is a feature the kernel provides for anonymous memory (see Documentation/vm/hugetlbpage.txt as referenced in the mmap() man page). This should be the reason your mmap() call fails when using HUGETLB for a file. *Edit: not completely correct. There is a RAM file system (hugetlbfs) that does support huge pages. However, huge_tlb mappings won't work on arbitrary files, as I understand the docs.*
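A minimal sketch of an anonymous huge-page mapping, which is the variant that works here (assumes huge pages have been reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages):

#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main()
{
    const size_t len = 16 * 1024 * 1024;  // a multiple of the 2 MB huge page size
    // No file descriptor: MAP_ANONYMOUS memory is not backed by any file,
    // so MAP_HUGETLB is accepted.
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) { std::perror("mmap"); return 1; }
    std::memcpy(addr, "Hello World\n", 12);
    munmap(addr, len);
    return 0;
}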
For details on how to use HUGE_TLB and the corresponding in-memory file system (hugetlbfs), you might want to consider the following articles on LWN:
Huge Pages, Part 1 (Intro)
Huge Pages, Part 2 (Interfaces)
Huge Pages, Part 3 (Administration)
Huge Pages, Part 4 (Benchmarking)
Huge Pages, Part 5 (TLB costs)
Adding MAP_PRIVATE to the flags fixed this for me.