In the following simple program:
# include <sys/mman.h>
# include <fcntl.h>
# include <cstdlib>
# include <cassert>
struct rgn_desc
{
size_t end_;
char data[];
};
int main(int argc, const char *argv[])
{
int fd = open("foo.mm", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0700);
assert(fd != -1);
void * ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_POPULATE, fd, 0);
assert(ptr != (void*) -1);
rgn_desc * rgn_ptr = (rgn_desc*) ptr;
rgn_ptr->end_ = 0; // <-- bus error
}
Basically, I want to manage a simple mmap-backed arena allocator and store, at the start of the mapping, the number of bytes I have allocated. That way, when I reopen the file, I know how many bytes were in use.
However, the last line gives me a bus error. Could someone explain why and, if possible, suggest a way to avoid it? I am running Linux on a 32-bit Pentium and compiling with clang++.
According to the doc, a sig bus can trigger if:
SIGBUS
Attempted access to a portion of the buffer that does not
correspond to the file (for example, beyond the end of the
file, including the case where another process has truncated
the file).
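In your snippet the file size does not match the mmap() size (0 bytes versus 4096 bytes): O_TRUNC leaves the file empty, so the whole mapping lies beyond the end of the file. You can use ftruncate() to grow the file to the mapped size before touching the mapping: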
ftruncate(fd, 4096);
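Putting the fix in context, a minimal sketch of the corrected setup could look like the following. The helper name map_arena, the 0600 mode and the bare "return nullptr" error handling are assumptions made for illustration, and the flexible array member from the question is left out because it is not standard C++.

```cpp
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

struct rgn_desc
{
    size_t end_;
};

// Hypothetical helper: creates (or truncates) the file at `path`, grows it to
// `size` bytes and maps it shared. Returns nullptr on any failure.
rgn_desc* map_arena(const char* path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
    if (fd == -1)
        return nullptr;

    // Grow the file first: touching pages beyond end-of-file raises SIGBUS.
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return nullptr;
    }

    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (ptr == MAP_FAILED)
        return nullptr;

    return static_cast<rgn_desc*>(ptr);
}
```

With this helper, map_arena("foo.mm", 4096) should return a pointer whose end_ field can be written without a bus error; the caller is expected to munmap() the region when done.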
I am working with mmap() to read big files quickly, basing my code on the answer to this question (Fast textfile reading in c++).
I am using the second version from sehe's answer:
#include <algorithm>
#include <iostream>
#include <cstring>
// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
const char* map_file(const char* fname, size_t& length);
int main()
{
size_t length;
auto f = map_file("test.cpp", length);
auto l = f + length;
uintmax_t m_numLines = 0;
while (f && f!=l)
if ((f = static_cast<const char*>(memchr(f, '\n', l-f))))
m_numLines++, f++;
std::cout << "m_numLines = " << m_numLines << "\n";
}
void handle_error(const char* msg) {
perror(msg);
exit(255);
}
const char* map_file(const char* fname, size_t& length)
{
int fd = open(fname, O_RDONLY);
if (fd == -1)
handle_error("open");
// obtain file size
struct stat sb;
if (fstat(fd, &sb) == -1)
handle_error("fstat");
length = sb.st_size;
const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
if (addr == MAP_FAILED)
handle_error("mmap");
// TODO close fd at some point in time, call munmap(...)
return addr;
}
and it works just great.
But if I implement it over a loop of several files (I just change the main() function name to:
void readFile(std::string &nomeFile) {
and then get the file content in "f" object in main() function with:
size_t length;
auto f = map_file(nomeFile.c_str(), length);
auto l = f + length;
and call it from main() on a loop over a filenames list), after a while I got:
open: Too many open files
I imagine there would be a way to close the open() call after working on a file, but I can not figure out how and where to put it exactly. I tried:
int fc = close(fd);
at the end of the readFile() function, but it changed nothing.
Thanks a lot in advance for any help!
EDIT:
After the helpful suggestions I received, I compared the performance of several approaches based on mmap() and std::cin; see "fast file reading in C++, comparison of different strategies with mmap() and std::cin() results interpretation" for the results.
Limit to the number of concurrently open files
As you can imagine, keeping a file open consumes resources. So there is in any case a practical limit to the number of open file descriptors on your system. This is why it's highly recommended to close files that you no longer need.
The exact limit depends on the OS and the configuration. If you want to know more, there are already a lot of answers available for this kind of question.
Special case of mmap
With mmap() you first have to open() a file, and each open() consumes a file descriptor. Doing so repeatedly in a loop without releasing anything will sooner or later hit the file descriptor limit, as you experienced.
The idea of closing the file is sound, but close() on its own does not release the mapping. This is specified in the POSIX documentation:
The mmap() function adds an extra reference to the file associated
with the file descriptor fildes which is not removed by a subsequent
close() on that file descriptor. This reference is removed when there
are no more mappings to the file.
Why? Because mmap() ties the file to the virtual memory management of your system, and the file is needed for as long as you use the address range it was mapped to.
So how do you remove those mappings? The answer is to use munmap():
The function munmap() removes any mappings for those entire pages
containing any part of the address space of the process starting at
addr and continuing for len bytes.
And of course, close() the file descriptor that you no longer need. A prudent approach would be to close after munmap(), but in principle, at least on a POSIX compliant system, it should not matter when you're closing. Nevertheless, check your latest OS documentation to be on the safe side :-)
Note: file mapping is also available on Windows; the documentation about closing the handles is ambiguous about potential memory leaks if mappings remain. This is why I recommend prudence about when to close.
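As a rough sketch of how the pieces fit together, the function below maps a file, counts its newlines, and releases both the mapping and the descriptor before returning, so calling it in a loop should not accumulate open files. The name count_lines and the -1 error convention are assumptions for illustration, not part of the original code.

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstddef>

// Hypothetical helper: counts '\n' bytes in the file `fname`.
// Returns -1 if the file cannot be opened, inspected or mapped.
long count_lines(const char* fname)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }
    size_t length = static_cast<size_t>(sb.st_size);
    if (length == 0) {            // mmap() rejects zero-length mappings
        close(fd);
        return 0;
    }

    void* addr = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    // frees the descriptor; the mapping survives
    if (addr == MAP_FAILED)
        return -1;

    const char* f = static_cast<const char*>(addr);
    const char* l = f + length;
    long lines = 0;
    while (f != l && (f = static_cast<const char*>(memchr(f, '\n', l - f)))) {
        ++lines;
        ++f;
    }

    munmap(addr, length);         // removes the mapping and its file reference
    return lines;
}
```

Here close() is called right after mmap(), which POSIX permits; moving it after munmap() is the more cautious ordering mentioned above and works just as well.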
Good evening, I am attempting to read some binary information from a .img file. I can retrieve 16-bit numbers (uint16_t) from ntohs(), but when I try to retrieve from the same position using ntohl(), it gives me 0 instead.
Here are the critical pieces of my program.
#include <iostream>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <arpa/inet.h>
#include <cmath>
int fd;
struct blockInfo {
long blockSize = 0;
long blockCount = 0;
long fatStart = 0;
long fatBlocks = 0;
long rootStart = 0;
long rootBlocks = 0;
long freeBlocks = 0;
long resBlocks = 0;
long alloBlocks = 0;
};
int main(int argc, char *argv[]) {
fd = open(argv[1], O_RDWR);
// Get file size
struct stat buf{};
stat(argv[1], &buf);
size_t size = buf.st_size;
// A struct to hold data retrieved from a big endian image.
blockInfo info;
auto mapPointer = (char*) mmap(nullptr, size,
(PROT_READ | PROT_WRITE), MAP_PRIVATE, fd, 0);
info.blockSize = ntohs((uint16_t) mapPointer[12]);
long anotherBlockSize = ntohl((uint32_t) mapPointer[11]);
printf("%ld", info.blockSize); // == 512, correct
printf("%ld", anotherBlockSize); // == 0, what?
}
I understand that blockSize and anotherBlockSize are not supposed to be equal, but anotherBlockSize should be non-zero at the least, right?
Also, when I read ntohs(pointer[16]), which should return 2, I get 0 as well. What is going on here? Any help would be appreciated.
No, anotherBlockSize will not necessarily be non-zero
info.blockSize = ntohs((uint16_t) mapPointer[12]);
This code reads a single char at offset 12 relative to mapPointer, converts that one value to uint16_t, and applies ntohs() to it.
long anotherBlockSize = ntohl((uint32_t) mapPointer[11]);
This code reads a single char at offset 11 relative to mapPointer, converts that one value to uint32_t, and applies ntohl() to it.
Obviously, you are reading non-overlapped data (different chars) from the mapped memory, so you should not expect blockSize and anotherBlockSize to be connected.
If you are trying to read the same memory in different ways (as uint32_t and uint16_t), you must do some pointer casting:
info.blockSize = ntohs( *((uint16_t*)&mapPointer[12]));
Note that such code is generally platform dependent: a cast that works perfectly on x86 may fail on ARM, where an unaligned access can fault.
auto mapPointer = (char*) ...
This declares mapPointer to be a char *.
... ntohl((uint32_t) mapPointer[11]);
Your obvious intent here is to use mapPointer to retrieve a 32 bit value, a four-byte value, from this location.
Unfortunately, because mapPointer is a plain, garden-variety char *, the expression mapPointer[11] evaluates to a single, lonely char value. One byte. That's what the code reads from the mmapped memory block, at offset 11 from the start of the block. The (uint32_t) cast does not read a uint32_t from the address mapPointer+11. mapPointer[11] reads a single char value from mapPointer+11, because mapPointer is a pointer to char; that value is then converted to a uint32_t and fed to ntohl().
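Both answers come down to reading several bytes from the mapping rather than one char. One way to do that without relying on an unaligned pointer cast is to memcpy() the bytes into a properly typed variable first. The helper names read_be16 and read_be32 below are made up for this sketch.

```cpp
#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: copy the raw bytes out of the mapping, then convert
// from network (big-endian) byte order to host order. memcpy() sidesteps the
// alignment problems of casting the char pointer to a wider pointer type.
uint16_t read_be16(const char* base, size_t offset)
{
    uint16_t raw;
    std::memcpy(&raw, base + offset, sizeof raw);
    return ntohs(raw);
}

uint32_t read_be32(const char* base, size_t offset)
{
    uint32_t raw;
    std::memcpy(&raw, base + offset, sizeof raw);
    return ntohl(raw);
}
```

In the question's code this would be used as info.blockSize = read_be16(mapPointer, 12); whether offset 12 really holds the block size depends on the image format, which the thread does not spell out.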
I'm working on a benchmark program. Upon making the read() system call, the program appears to hang indefinitely. The target file is 1 GB of binary data and I'm attempting to read directly into buffers that can be 1, 10 or 100 MB in size.
I'm using std::vector<char> to implement dynamically-sized buffers and handing off &vec[0] to read(). I'm also calling open() with the O_DIRECT flag to bypass kernel caching.
The essential coding details are captured below:
std::string fpath{"/path/to/file"};
size_t tries{};
int fd{};
while (errno == EINTR && tries < MAX_ATTEMPTS) {
fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
tries++;
}
// Throw exception if error opening file
if (fd == -1) {
ostringstream ss {};
switch (errno) {
case EACCES:
ss << "Error accessing file " << fpath << ": Permission denied";
break;
case EINVAL:
ss << "Invalid file open flags; system may also not support O_DIRECT flag, required for this benchmark";
break;
case ENAMETOOLONG:
ss << "Invalid path name: Too long";
break;
case ENOMEM:
ss << "Kernel error: Out of memory";
}
throw invalid_argument {ss.str()};
}
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
Poking through the executable with gdb shows that buffers are allocated correctly, and the file I've tested with checks out in xxd. I'm using g++ 7.3.1 (with C++11 support) to compile my code on a Fedora Server 27 VM.
Why is read() hanging on large binary files?
Edit: Code example updated to more accurately reflect error checking.
There are multiple problems with your code.
This code will never work properly if errno ever has a value equal to EINTR:
while (errno == EINTR && tries < MAX_ATTEMPTS) {
fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
tries++;
}
That code never checks whether open() succeeded: once errno is EINTR, it keeps reopening the file on every iteration, leaking a file descriptor each time, until tries reaches MAX_ATTEMPTS.
This would be better:
do
{
fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
tries++;
}
while ( ( -1 == fd ) && ( EINTR == errno ) && ( tries < MAX_ATTEMPTS ) );
Second, as noted in the comments, O_DIRECT can impose alignment restrictions on memory. You might need page-aligned memory:
So
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
becomes
size_t buf_sz{1024*1024}; // 1 MiB buffer
// page-aligned buffer
char *buffer = static_cast<char*>(mmap( nullptr, buf_sz, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0 )); // compare against MAP_FAILED before use
auto bytes_read = read(fd, buffer, buf_sz);
Note also that the Linux implementation of O_DIRECT can be very dodgy. It's been getting better, but there are still potential pitfalls that aren't very well documented at all. Along with alignment restrictions, if the last chunk of data in the file isn't a full page, for example, you may not be able to read it if the filesystem's implementation of direct IO doesn't allow you to read anything but full pages (or some other block size). Likewise for write() calls: you may not be able to write just any number of bytes, and you might be constrained to something like a 4k page.
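If mapping anonymous memory feels heavy-handed, posix_memalign() is another way to get an aligned buffer. The sketch below assumes that the caller knows a suitable alignment (4096 is a common guess, but the real requirement depends on the filesystem and device); the helper name alloc_aligned is hypothetical.

```cpp
#include <stdlib.h>
#include <cstddef>

// Hypothetical helper: allocates `size` bytes aligned to `alignment`, which
// must be a power of two and a multiple of sizeof(void*). Returns nullptr on
// failure. Release the buffer with free().
char* alloc_aligned(size_t size, size_t alignment)
{
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        return nullptr;
    return static_cast<char*>(p);
}
```

The returned pointer can be passed straight to read(fd, buf, buf_sz). With O_DIRECT the length and the file offset may need to be multiples of the same block size as well.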
This is also critical:
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
Some devices simply do not support direct IO. They should return an error, but again, the O_DIRECT implementation on Linux can be very hit-or-miss.
I pasted your program and ran it on my Linux system; it worked and did not hang.
The most likely cause of the failure is that the file is not a regular file on a filesystem, or that it sits on hardware which is not working properly.
Try a smaller size to confirm, and try a different machine to help diagnose the problem.
My complete code (with no error checking)
#include <vector>
#include <string>
#include <unistd.h>
#include <stdio.h>
#include <fcntl.h>
int main( int argc, char ** argv )
{
std::string fpath{"myfile.txt" };
auto fd = open(fpath.c_str(), O_RDONLY | O_DIRECT | O_LARGEFILE);
size_t buf_sz{1024*1024}; // 1 MiB buffer
std::vector<char> buffer(buf_sz); // Creates vector pre-allocated with buf_sz chars (bytes)
// Result is 0-filled buffer of size buf_sz
auto bytes_read = read(fd, &buffer[0], buf_sz);
}
myfile.txt was created with
dd if=/dev/zero of=myfile.txt bs=1024 count=1024
If the file is not 1 MiB in size, it may fail.
If the file is a pipe, read() can block until data is available.
Most examples of read() hanging appear to be when using pipes or non-standard I/O devices (e.g., serial). Disk I/O, not so much.
O_DIRECT flag is useful for filesystems and block devices. With this flag people normally map pages into the user space.
For sockets, pipes and serial devices it is plain useless because the kernel does not cache that data.
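Your updated code hangs because fd is initialized to 0, which is STDIN_FILENO. Unless errno already happens to be EINTR, the while condition is false the first time it is evaluated, so the loop body never runs and the file is never opened; read() then blocks waiting for input on stdin.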
When I want to map some shared memory in linux, I do:
hFileMap = open(MapName, O_RDWR | O_CREAT, 438);
pData = mmap(NULL, Size, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED, hFileMap, 0);
and it works just fine. It maps the memory properly. However three things arise that I don't like.
It creates a physical file on the disc. I'd have to remove this file manually or using remove function. I like that on Windows there is no physical file unless I map a file physically myself. I'd like to do the same on linux.
I have to use ftruncate to set the length of the file. Otherwise memcpy will segfault when copying data into the file. I mean, it doesn't make much sense for the file to have 0 space when I had to specify the size to mmap in the first place..
The size is fixed. I don't need it resizing so there should be no need for ftruncate?
Is there any way at all to map memory without a physical file and still have other processes be able to access it? What would be the disadvantages of such a solution?
I don't really care too much about the ftruncate but is there a way to also remove the call? It just bothers me a tiny bit that I have to do this when I don't have to on Windows.
shm_open will still create a file in the file system to represent the shared memory object.
You can call mmap with MAP_ANONYMOUS and MAP_SHARED, and it will not create any files. However, the other processes must be children of the current process, and the mapping must be set up before fork is called.
If that won't work then shm_open is your best bet.
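To illustrate the anonymous variant, here is a small sketch in which a child process writes into a MAP_SHARED | MAP_ANONYMOUS page and the parent reads the value back. The function name share_with_child is invented for the example, and synchronization is reduced to waiting for the child to exit.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: the child stores `value` in shared anonymous memory and
// the parent returns what it sees there afterwards (-1 on mmap/fork failure).
int share_with_child(int value)
{
    void* mem = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    pid_t pid = fork();           // the mapping must exist before fork()
    if (pid == -1) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {               // child: write into the shared page, exit
        *shared = value;
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);     // parent: wait until the child has finished
    int seen = *shared;
    munmap(mem, sizeof(int));
    return seen;
}
```

No file is created and no ftruncate() call is needed, since the size passed to mmap() is the size of the region. The price is that unrelated processes have no name by which to attach to it.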
You can use shm_open to create a shared memory region. The following code segment demonstrates the use of shm_open() to create a shared memory object which is then sized using ftruncate() before being mapped into the process address space using mmap():
#include <unistd.h>
#include <sys/mman.h>
...
#define MAX_LEN 10000
struct region { /* Defines "structure" of shared memory */
int len;
char buf[MAX_LEN];
};
struct region *rptr;
int fd;
/* Create shared memory object and set its size */
fd = shm_open("/myregion", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
if (fd == -1)
/* Handle error */;
if (ftruncate(fd, sizeof(struct region)) == -1)
/* Handle error */;
/* Map shared memory object */
rptr = mmap(NULL, sizeof(struct region),
PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (rptr == MAP_FAILED)
/* Handle error */;
/* Now we can refer to mapped region using fields of rptr;
for example, rptr->len */
...
ftruncate is there to set the size of the file. If you don't want to call it, you can instead write zero-valued bytes to the file until it reaches the required size.
I want to read status information that an application provides via shared memory. I want to use C++ in order to read the content of that named shared memory and then call it with pinvoke from a C#-class.
From the software I know that it has a certain file structure: A struct STATUS_DATA with an array of four structs of SYSTEM_CHARACTERISTICS.
I'm not (yet) familiar with C++, so I basically tried to follow MSDN. To find the size of the file to be mapped, I added up the sizes of the struct members, as can be seen in the code below. This results in ACCESS DENIED, so I figured that the size computed from the structs is too large. When I use sizeof(STATUS_DATA) (I added the struct to my source), it still ends in ACCESS DENIED. If I try something lower, like 1024 bytes, the only thing I can see in pBuf while debugging is a <.
This is what I got so far:
#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <tchar.h>
#include <iostream>
#pragma comment(lib, "user32.lib")
using namespace std;
signed int BUF_SIZE = 4 * (10368 + 16 + 4 + 16 + 4 + 16 + 4 + 1 + 4); // sizeof(STATUS_DATA);
TCHAR szName[]=TEXT("ENGINE_STATUS");
int main()
{
HANDLE hMapFile;
unsigned char* pBuf;
hMapFile = OpenFileMapping(
FILE_MAP_READ, // read access
FALSE, // do not inherit the name
szName); // name of mapping object
if (hMapFile == NULL)
{
_tprintf(TEXT("Could not open file mapping object (%d).\n"),
GetLastError());
return 1;
}
pBuf = (unsigned char*) MapViewOfFile(hMapFile, // handle to map object
FILE_MAP_READ, // read-only access
0,
0,
BUF_SIZE); // 1024);
if (pBuf == NULL)
{
_tprintf(TEXT("Could not map view of file (%d).\n"),
GetLastError());
CloseHandle(hMapFile);
return 1;
}
UnmapViewOfFile(pBuf);
CloseHandle(hMapFile);
return 0;
}
I also made sure that this Shared Mem "is there" by following that hint. Can somebody give me a hint, what I'm missing? Thanks!
The last parameter to MapViewOfFile (dwNumberOfBytesToMap) must not exceed the maximum size specified when the mapping was created. Since we don't know what that size is, it seems fair to assume that BUF_SIZE exceeds it and 1024 doesn't. Specifying 0 for this parameter is an easy way to map the entire file into a single view.
Most (all?) C++ debuggers assume that a pointer to char is a null-terminated string, so when you view the mapped data they display it only up to the first zero byte. Depending on what data is in the file mapping, this could well be the second byte, which explains why you aren't seeing much information. You would be better off casting the returned pointer to STATUS_DATA* and viewing the individual members.
In short:
Specify zero (0) for dwNumberOfBytesToMap
Cast the returned pointer to STATUS_DATA* instead of unsigned char*