Ubuntu 10.04, error in using MAP_HUGETLB with MAP_SHARED - c++

Following is the code I am using to mmap a file on Ubuntu with huge pages, but the call fails with "invalid argument". However, when I pass the MAP_ANON flag (with no file descriptor) to mmap, it works. I cannot see the reason behind this.
Secondly, I do not understand why mapping a file is allowed with MAP_PRIVATE at all, when that flag means no change will ever be written back to the file. Couldn't this always be accomplished with MAP_ANON, or is there something I am missing?
Can someone help me with these?
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>

int32_t main(int32_t argc, char** argv) {
    int32_t map_length = 16*1024*1024; // 16 MB, huge page size is 2 MB
    int32_t protection = PROT_READ | PROT_WRITE;
    int32_t flags = MAP_SHARED | MAP_HUGETLB;
    int32_t file__ = open("test", O_RDWR | O_CREAT | O_LARGEFILE, S_IRWXU | S_IRGRP | S_IROTH);
    if (file__ < 0) {
        std::cerr << "Unable to open file\n";
        return -1;
    }
    if (ftruncate(file__, map_length) < 0) {
        std::cerr
            << "main :: unable to truncate the file\n"
            << "main :: " << strerror(errno) << "\n"
            << "main :: error number is " << errno << "\n";
        return -1;
    }
    void* addr = mmap(NULL, map_length, protection, flags, file__, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    const char* msg = "Hello World\n";
    int32_t len = strlen(msg);
    memcpy(addr, msg, len);
    munmap(addr, map_length);
    close(file__);
    return 0;
}

Both your questions come down to the same point: Using mmap() you can obtain two kinds of mappings: anonymous memory and files.
Anonymous memory is (as stated in the man page) not backed by any file in the file system. Instead the memory you get back from an MAP_ANON call to mmap() is plain system memory. The main user of this interface is the C library which uses it to obtain backing storage for malloc/free. So, using MAP_ANON is explicitly saying that you don't want to map a file.
File-backed memory kind of blends in a file (or portions of it) into the address space of your application. In this case, the memory content is actually backed by the file's content. Think of the MAP_PRIVATE flag as first allocating memory for the file and then copying the content into this memory. In truth this will not be what the kernel is doing, but let's just pretend.
HUGE_TLB is a feature the kernel provides for anonymous memory (see Documentation/vm/hugetlbpage.txt, as referenced in the mmap() man page). This should be the reason your mmap() call fails when using HUGETLB for a file. *Edit: not completely correct. There is a RAM file system (hugetlbfs) that does support huge pages. However, huge_tlb mappings won't work on arbitrary files, as I understand the docs.*
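For illustration, here is a minimal sketch of an anonymous huge-page mapping along the lines described above. It is not the asker's program; it assumes the headers define MAP_HUGETLB and that huge pages have been reserved via /proc/sys/vm/nr_hugepages (otherwise mmap() typically fails with ENOMEM):

#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main() {
    size_t length = 16 * 1024 * 1024; // a multiple of the 2 MB huge page size
    void *addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap"); // typically ENOMEM when no huge pages are reserved
        return 1;
    }
    memcpy(addr, "Hello huge pages\n", 18);
    munmap(addr, length);
    return 0;
}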
For details on how to use HUGE_TLB and the corresponding in-memory file system (hugetlbfs), you might want to consider the following articles on LWN:
Huge Pages, Part 1 (Intro)
Huge Pages, Part 2 (Interfaces)
Huge Pages, Part 3 (Administration)
Huge Pages, Part 4 (Benchmarking)
Huge Pages, Part 5 (TLB costs)

Adding MAP_PRIVATE to the flags fixed this for me.
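In terms of the asker's code, that change amounts to the following line (everything else left as posted):

int32_t flags = MAP_PRIVATE | MAP_HUGETLB; // was MAP_SHARED | MAP_HUGETLB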

Related

File Mapping, How to open a file (txt) from a specific location

I have a big .txt file (over 1 GB). While searching for a way to open it quickly I found memory mapping.
I managed to use CreateFile(), then I made a char buffer[] and finally put the file contents into the buffer with ReadFile(). The problem is that the file is too big, so I can't load it all at once into the buffer, because I can't make an array that big.
I think the solution would be to open the file at specified locations and read a portion of its contents each time. The only source I found explaining mapping was on MSDN, but I can't work out how to do it.
So in the end, how do I read a big file with a mapping?
HANDLE my_File = CreateFileA("words.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (my_File == INVALID_HANDLE_VALUE)
{
    cout << "Failed to open file" << endl;
    return 0;
}
constexpr size_t BUFFSIZE = 1000000;
char buffer[BUFFSIZE];
DWORD dwBytesToRead = BUFFSIZE - 1;
DWORD dwBytesRead = 0;
BOOL my_Bool = ReadFile(my_File, (void*)buffer, dwBytesToRead, &dwBytesRead, NULL);
if (dwBytesRead > 0)
{
    buffer[dwBytesRead] = '\0';
    cout << "FILE IS: " << buffer << endl;
}
CloseHandle(my_File);
I think you are confused. The whole purpose of mapping part or all of a file into memory is to avoid the need to buffer the data yourself. Instead, the OS takes care of that for you, allowing you to access the contents of the file via a pointer, just like you would any other in-memory data structure.
Only you can decide if that's the best solution for you. In a 32 bit app, 1GB is a lot of addressing space to find. In a 64 bit app there is no such problem. As mentioned in the comments, reading the file in chunks into a smaller buffer can be a better bet, especially if you want to process it sequentially.
For some example code on how to memory map a file, see:
How to CreateFileMapping in C++?
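For a rough feel of what that looks like, here is a minimal read-only sketch (error handling trimmed, "words.txt" reused from the question as a placeholder); it is one possible shape of the approach, not a drop-in solution:

#include <windows.h>
#include <iostream>

int main()
{
    HANDLE file = CreateFileA("words.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!mapping) { CloseHandle(file); return 1; }

    // Map the whole file; the OS pages it in on demand, no manual buffering needed.
    const char *data = static_cast<const char *>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0));
    if (data) {
        LARGE_INTEGER size{};
        GetFileSizeEx(file, &size);
        // Print the first few bytes just to show the contents are directly addressable.
        LONGLONG toShow = size.QuadPart < 100 ? size.QuadPart : 100;
        std::cout.write(data, static_cast<std::streamsize>(toShow));
        UnmapViewOfFile(data);
    }
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}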

ofstream writes null characters on a file in binary mode

We are facing an issue around device reboots. Our application runs on Linux on a Raspberry Pi board. We maintain a log file to which we append records every 10 seconds with the code below. One write can contain one or more records in pBuffer.
bool FileOP::Append(const std::string & PathName, const char * pBuffer, uint64_t Size)
{
    bool AppendSuccessful = false;
    std::ofstream File;
    try
    {
        File.exceptions(std::ofstream::badbit | std::ofstream::failbit);
        File.open(PathName.c_str(), std::ofstream::out | std::ofstream::binary | std::ofstream::app);
        File.write(pBuffer, Size);
        File.close();
        AppendSuccessful = true;
    }
    catch (std::exception & e)
    {
        std::cout << "Error when appending string to file: " << PathName
                  << std::strerror(errno) << " Exception : " << e.what() << std::endl;
    }
    return AppendSuccessful;
}
We have observed that if the board is rebooted (power removed) at exactly the moment we are writing, we get a record consisting entirely of NULL characters. The file size still grows by the record size; for example, if we write 100 bytes the file size becomes header size (100) + old data (100) + new data (100) = 300 bytes. When we then read the file, the last 100 bytes are all NULL characters.
If the record is not written completely, how is the file size increasing?
How exactly does the record end up filled with NULLs? We have verified that no record we write contains NULL characters.
This will depend on the filesystem in use, but what is likely happening here is that the filesystem is committing the change to the file metadata (in this case, its length) before all the data is written. If you require the file to be consistent even in case of a crash, and are using ext4, try mounting with the data=journal option. Note that this has performance impacts due to disabling delayed allocation.
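For reference, data=journal is an ext4 mount option, set for example in /etc/fstab; the device and mount point below are placeholders for your actual partition:

/dev/mmcblk0p2  /data  ext4  defaults,data=journal  0  2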
If the record is not written completely, how is the file size increasing?
If null bytes are written, those increase file size just as much as any other byte.
How exactly does the record end up filled with NULLs?
It can potentially happen on the following line:
File.write(pBuffer, Size);
If pBuffer contains null chars, then those null chars are written to the file.
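If you want to rule that out at the source, a small hypothetical helper (not part of the original code) could check each record before it is handed to File.write():

#include <algorithm>
#include <cstdint>

// Hypothetical sanity check: returns true if the record contains no embedded null bytes.
// Call it at the top of FileOP::Append() and log or skip the record if it returns false.
static bool RecordHasNoNulls(const char *pBuffer, uint64_t Size)
{
    return std::find(pBuffer, pBuffer + Size, '\0') == pBuffer + Size;
}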

File read() hangs on binary large file

I'm working on a benchmark program. Upon making the read() system call, the program appears to hang indefinitely. The target file is 1 GB of binary data and I'm attempting to read directly into buffers that can be 1, 10 or 100 MB in size.
I'm using std::vector<char> to implement dynamically-sized buffers and handing off &vec[0] to read(). I'm also calling open() with the O_DIRECT flag to bypass kernel caching.
The essential coding details are captured below:
std::string fpath{"/path/to/file"};
size_t tries{};
int fd{};

while (errno == EINTR && tries < MAX_ATTEMPTS) {
    fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    tries++;
}

// Throw exception if error opening file
if (fd == -1) {
    ostringstream ss {};
    switch (errno) {
    case EACCES:
        ss << "Error accessing file " << fpath << ": Permission denied";
        break;
    case EINVAL:
        ss << "Invalid file open flags; system may also not support O_DIRECT flag, required for this benchmark";
        break;
    case ENAMETOOLONG:
        ss << "Invalid path name: Too long";
        break;
    case ENOMEM:
        ss << "Kernel error: Out of memory";
    }
    throw invalid_argument {ss.str()};
}

size_t buf_sz{1024*1024};          // 1 MiB buffer
std::vector<char> buffer(buf_sz);  // Creates vector pre-allocated with buf_sz chars (bytes)
                                   // Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
Poking through the executable with gdb shows that buffers are allocated correctly, and the file I've tested with checks out in xxd. I'm using g++ 7.3.1 (with C++11 support) to compile my code on a Fedora Server 27 VM.
Why is read() hanging on large binary files?
Edit: Code example updated to more accurately reflect error checking.
There are multiple problems with your code.
This code will never work properly if errno ever has a value equal to EINTR:
while (errno == EINTR && tries < MAX_ATTEMPTS) {
    fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    tries++;
}
That loop won't stop once the file has been successfully opened: as long as errno stays EINTR it keeps reopening the file over and over, leaking file descriptors.
This would be better:
do
{
    fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    tries++;
}
while ( ( -1 == fd ) && ( EINTR == errno ) && ( tries < MAX_ATTEMPTS ) );
Second, as noted in the comments, O_DIRECT can impose alignment restrictions on memory. You might need page-aligned memory:
So
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
becomes
size_t buf_sz{1024*1024}; // 1 MiB buffer
// mmap returns page-aligned memory, which satisfies O_DIRECT's alignment requirement
void *buffer = mmap(nullptr, buf_sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
auto bytes_read = read(fd, buffer, buf_sz);
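Another common way to get suitably aligned memory (my suggestion, not part of the answer above) is posix_memalign(); the 4096-byte alignment here is an assumption, the real requirement depends on the filesystem and device:

#include <cstdlib>
#include <unistd.h>

size_t buf_sz{1024 * 1024};   // 1 MiB buffer
void *buf{nullptr};
// Request 4096-byte alignment; querying the device's logical block size would be safer.
if (posix_memalign(&buf, 4096, buf_sz) != 0) {
    // handle allocation failure
}
auto bytes_read = read(fd, buf, buf_sz); // fd assumed opened with O_DIRECT as above
free(buf);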
Note also that the Linux implementation of O_DIRECT can be very dodgy. It's been getting better, but there are still potential pitfalls that aren't well documented at all. Along with the alignment restrictions, if the last chunk of data in the file isn't a full page, for example, you may not be able to read it if the filesystem's direct IO implementation only allows reads of full pages (or some other block size). Likewise for write() calls: you may not be able to write just any number of bytes; you might be constrained to something like a 4k page.
This is also critical:
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
Some devices simply do not support direct IO. They should return an error, but again, the O_DIRECT implementation on Linux can be very hit-or-miss.
Pasting your program and running it on my Linux system, it works and does not hang.
The most likely cause of the failure is that the file is not an ordinary file-system item, or that it sits on hardware which is not working.
Try with a smaller size to confirm, and try on a different machine to help diagnose.
My complete code (with no error checking)
#include <vector>
#include <string>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>

int main( int argc, char ** argv )
{
    std::string fpath{"myfile.txt"};
    auto fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
    size_t buf_sz{1024*1024};          // 1 MiB buffer
    std::vector<char> buffer(buf_sz);  // Creates vector pre-allocated with buf_sz chars (bytes)
                                       // Result is 0-filled buffer of size buf_sz
    auto bytes_read = read(fd, &buffer[0], buf_sz);
}
myfile.txt was created with
dd if=/dev/zero of=myfile.txt bs=1024 count=1024
If the file is not 1Mb in size, it may fail.
If the file is a pipe, it can block until the data is available.
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
O_DIRECT flag is useful for filesystems and block devices. With this flag people normally map pages into the user space.
For sockets, pipes and serial devices it is plain useless because the kernel does not cache that data.
Your updated code hangs because fd is initialized to 0, which is STDIN_FILENO; the loop never opens the file, so the program ends up reading from stdin and blocks there.
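A one-line guard against that failure mode (my suggestion, not quoted from the question) is to initialize the descriptor to an invalid value, so a skipped loop is caught by the fd == -1 check instead of silently reading stdin:

int fd{-1}; // never a valid descriptor; the later (fd == -1) check now catches a skipped open()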

Read memory from application which does not allow it

I am currently trying to read the entirety of the memory of a game which blocks calls to OpenProcess and ReadProcessMemory (I believe this is done through a windows driver/service, although I'm not sure how).
I use the following code to try to open the process and read its memory to a file:
HANDLE process = OpenProcess(PROCESS_VM_READ, 0, pid);
if (!process) {
    cout << "Failed to open process.";
    return 1;
}
cout << "Successfully opened process." << endl << "Dumping memory to mem.dmp..." << endl;

ofstream fout;
fout.open("mem.dmp");
char *base = (char *)0;
char *readCount = (char *)0;
do {
    char buffer[PAGE_SIZE];
    if (ReadProcessMemory(process, base, buffer, PAGE_SIZE, NULL) != 0)
    {
        fout << buffer;
    }
    base += PAGE_SIZE;
    readCount++;
} while (base != 0);
if (readCount == 0) {
    cout << "Warning: No memory was read from the process." << endl;
}
fout.flush();
fout.close();
However, when run, this cannot even open the process.
The only way to get past the driver blocking the process from being opened for memory reading is to dump the entirety of physical memory to a file. I have no idea how to do this, other than setting Windows to dump all physical memory on a blue screen and then forcing my computer to crash with one. This is obviously quite inconvenient, as I will want to analyse the application's memory quite frequently.
Is there any way to dump all of the physical memory without using this method on Windows? I know virtually nothing about the driver or how it works so it would be almost impossible to work out another way of bypassing it.
You are trying to read from memory address 0, which is not possible (the OS does not allow it):
char *base = (char *) 0;
You should correctly set the address you want to read from, and that address must be readable. Check the ReadProcessMemory doc here
lpBaseAddress [in]: A pointer to the base address in the specified
process from which to read. Before any data transfer occurs, the
system verifies that all data in the base address and memory of the
specified size is accessible for read access, and if it is not
accessible the function fails.
Check also the examples in this post here.
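One common way to pick readable addresses instead of starting blindly at 0 (not taken from the answer above, just a sketch) is to walk the target's regions with VirtualQueryEx and only read the committed, readable ones; it assumes the process handle was obtained with PROCESS_QUERY_INFORMATION | PROCESS_VM_READ:

#include <windows.h>
#include <fstream>
#include <vector>

// Hypothetical helper: dumps every committed, readable region of the target process.
void DumpReadableMemory(HANDLE process, std::ofstream &fout)
{
    MEMORY_BASIC_INFORMATION mbi{};
    char *addr = nullptr;
    std::vector<char> buffer;
    while (VirtualQueryEx(process, addr, &mbi, sizeof(mbi)) != 0) {
        bool readable = (mbi.State == MEM_COMMIT) &&
                        !(mbi.Protect & (PAGE_NOACCESS | PAGE_GUARD));
        if (readable) {
            buffer.resize(mbi.RegionSize);
            SIZE_T got = 0;
            if (ReadProcessMemory(process, mbi.BaseAddress, buffer.data(),
                                  mbi.RegionSize, &got) && got > 0) {
                fout.write(buffer.data(), static_cast<std::streamsize>(got));
            }
        }
        addr = static_cast<char *>(mbi.BaseAddress) + mbi.RegionSize;
    }
}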

C++ get the size of a file while it's being written to

I have a recording application that reads data from a network stream and writes it to a file. It works very well, but I would like to display the file size as the data is being written. Every second the GUI thread updates the status bar with the elapsed recording time; at that point I would also like to display the current file size.
I originally consulted this question and have tried both the stat method:
struct stat stat_buf;
int rc = stat(recFilename.c_str(), &stat_buf);
std::cout << recFilename << " " << stat_buf.st_size << "\n";
(no error checking for simplicity) and the fseek method:
FILE *p_file = NULL;
p_file = fopen(recFilename.c_str(),"rb");
fseek(p_file,0,SEEK_END);
int size = ftell(p_file);
fclose(p_file);
but either way, I get 0 for the file size. When I go back and look at the file I write to, the data is there and the size is correct. The recording is happening on a separate thread.
I know that bytes are being written because I can print the size of the data as it is written in conjunction with the output of the methods shown above.
The filename plus the 0 is what I print out from the GUI thread; 'Bytes written x' comes from the recording thread.
You can read all about C++ file manipulations here http://www.cplusplus.com/doc/tutorial/files/
This is an example of how I would do it.
#include <fstream>

std::ifstream::pos_type filesize(const char* file)
{
    std::ifstream in(file, std::ifstream::ate | std::ifstream::binary);
    return in.tellg();
}
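Usage is then just (the file name here is a placeholder, and <iostream> is needed for the output):

#include <iostream>

std::cout << filesize("recording.dat") << " bytes\n";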
Hope this helps.
As a last resort, you can use ftell in the writing thread, or keep a variable that tracks the amount of data written. But coming back to the real problem: you must be making a mistake somewhere, e.g. fopen never actually opens the file, or something like that.
I'll paste test code to show that this works, at least in a single-threaded app.
int _tmain(int argc, _TCHAR* argv[])
{
    FILE * mFile;
    FILE * mFile2;

    mFile = fopen("hi.txt", "a+");
    // fseek(mFile, 0, SEEK_END);
    // ## this is to make sure that fputs and fwrite works equal
    // fputs("fopen example", mFile);
    fwrite("fopen ex", 1, 9, mFile);
    fseek(mFile, 0, SEEK_END);
    std::cout << ftell(mFile) << ":";

    mFile2 = fopen("hi.txt", "rb");
    fseek(mFile2, 0, SEEK_END);
    std::cout << ftell(mFile2) << std::endl;

    fclose(mFile2);
    fclose(mFile);
    getchar();
    return 0;
}
Just use the freopen function before calling stat. It seems freopen refreshes the file length.
I realize this post is rather old at this point, but in response to #TylerD007: while that works, it is incredibly expensive if all you're trying to do is get the number of bytes written.
In C++17 and later, you can simply use the <filesystem> header and call
auto fileSize {std::filesystem::file_size(filePath)};
and now the variable fileSize holds the actual size of the file.
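A small self-contained version of that (the path is a placeholder), using the error_code overload so a missing file doesn't throw:

#include <filesystem>
#include <iostream>
#include <system_error>

int main() {
    std::error_code ec;
    auto fileSize = std::filesystem::file_size("recording.dat", ec);
    if (ec) {
        std::cerr << "file_size failed: " << ec.message() << '\n';
        return 1;
    }
    std::cout << fileSize << " bytes\n";
}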