Is seek_ptr unique per file? - c++

Sorry, but I didn't find a clear answer to my question. I know that each open file has its own seek_ptr. Let's suppose the main process opens a connection to file_A and then, before doing anything else, calls fork().
The forked process then reads 2 chars. Which is correct?
1) seek_ptr will be equal to 2 for both processes;
2) seek_ptr will be equal to 2 for the child process and still 0 for the main process.
Only if the answer is 1): how can I open 2 files in Notepad and have each file keep its indicator/cursor at a different location?

In Unix, (pid, fd) acts as a pointer into the kernel's table of open file descriptions. When a process is forked, the child process will have a different PID, call it pid2. So (pid2, fd) is a different key from (pid, fd). However, these two pointers actually point to the same open file description: fork does not fork the open file descriptions themselves. Therefore, they share a single offset. If one process seeks, it affects the other process as well. If one process reads, it affects the other process as well.
However, either process is free to call close to dissociate fd from the existing open file description, then call open to create a new open file description which may refer to the same file. After this is done, the two processes will have different open file descriptions, and seeking in one does not affect the other.
Each successful call to open always creates a new open file description.

Related

What "gotchas" should I be aware of when writing to the same file descriptor in a parent and child process?

Background: I'm working in C (and very C-ish C++) on Linux. The parent process has an open file descriptor (edit: not file descriptor, actually a FILE pointer) that it writes data to in a "sectioned" format. The child process uses it for this same purpose. As long as the child process is running, it is guaranteed that the parent will not attempt to write more data to its copy of the FILE pointer. The child exits, the parent waits for it, and then it writes more data to the file.
It appears to be working correctly, but I'm still suspicious of it. Do I need to re-seek to the end in the parent? Are there any synchronization issues I need to handle?
The question changed from 'file descriptors' to 'file pointers' or 'file streams'. That complicates any answer.
File Descriptors
File descriptors come from Unix in general, and Linux in particular, and a lot of the behaviour is standardized by POSIX. You say that the parent and child processes share the same file descriptor. That's not possible; file descriptors are specific to one process.
Suppose the parent opens a file (assume the file descriptor is 3) and therefore there is also a new open file description; then the parent process forks. After the fork, each process has a separate file descriptor (but they're both using file descriptor 3), but they share the same open file description. Yes: 'open file descriptors' and 'open file descriptions' are different! Every open file descriptor has an open file description, but a single open file description can be associated with many open file descriptors, and those descriptors need not all be associated with the same process.
One of the critical bits of data in the open file description is the current position (for reading or writing — writing is what matters here).
Consequently, when the child writes, the current position moves for both the parent and the child. Therefore, whenever the parent does write, it writes after the location where the child finished writing.
What you are seeing is guaranteed. At least under the circumstances where the parent opens the file and forks.
Note that in the scenario discussed, the O_APPEND flag was not needed or relevant. However, if you are worried about it, you could open the file with the O_APPEND flag, and then each normal write (not written via pwrite()) will write the data at the current end of the file. This will work even if the two processes do not share the same open file description.
POSIX specification:
open()
fork()
write()
dup2()
pwrite()
File Streams
File streams come with buffering which makes their behaviour more complex than file descriptors (which have no buffering).
Suppose the scenario is like this pseudo-code (error handling omitted):
FILE *fp = fopen(filename, "w");
…code block 1…
pid_t pid = fork();
if (pid == 0)
{
    …child writes to file…
    …child exits…
}
else
{
    …parent waits for child to exit…
    …parent writes to file…
}
The de facto implementation of a file stream uses a file descriptor (and you can find the file descriptor using fileno(fp)). When the fork() occurs, it is important that there is no pending data in the file stream — use fflush(fp) before the fork() if any of the code in '…code block 1…' has written to the file stream.
After the fork(), the two processes share the same open file description, but they have independent file descriptors. Of necessity, they have identical copies of the file stream structure.
When the child writes to its copy of the file stream, the data is stored in its buffer. When the buffer fills, or when the child closes the stream (which happens on a clean exit, but not via _exit() or its relatives, and not on death by a signal), the child's buffered data is written to the file. That write moves the current position in the shared open file description.
When the parent is notified that the child has exited, then it can write to its file buffer. That information will be written to disk when the buffer fills or is flushed, or when the parent closes the file stream. Since it will be using the same open file description as the child was using, the write position will be where the child left it.
So, as before, what you're seeing is guaranteed as long as you are careful enough.
In particular, calling fflush(fp) before the fork() is crucial if the file stream has been used by the parent before the fork(). If you don't ensure that the stream is flushed, you will get unflushed data written twice, once by the child and once by the parent.
It is also crucial that the child exits cleanly — closing the file stream and hence flushing any unwritten data to the file. If the child does not exit cleanly, there may be buffered data that never gets written to the file. Similarly, if the parent doesn't exit cleanly, there may be buffered data from the parent that never gets written to the file.
If you are talking about POSIX file descriptors, then each write call to a file descriptor is atomic and affects the underlying kernel resource object independently of what other processes might do with file descriptors that refer to the same object. If two processes call write at approximately the same time, the operations will be ordered by the kernel, with one happening completely (though it might write less data than requested) and then the other.
In your case, it sounds like you are synchronizing such that all parent writes happen either before the child has started (before fork) or after it has completed (after wait), which guarantees the ordering of the write calls.

Access mmap memory from another process

I've started playing with mmap. I'm trying to create an example workspace that will then be extended to the real case.
This is what I want to achieve:
PROCESS 1:
mmap a file (actually a device, but it's okay to generate an example with a text file)
PROCESS 2: (not forked from process 1; just an independent process)
read the memory mapped by process 1
change some bits
write it to a new file
I've read several examples and documentation pages, but I still haven't found how to achieve this. What I'm missing is:
how can process 2 access the memory mapped by process 1, without knowing anything about the opened file?
how can I put the mmap content in a new file? I suppose I have to ftruncate a new file, mmap this file and memcpy the content of process 1 memory map to process 2 memory map (then msync)
Side info, I have a message queue opened between the two processes, so they can share some messages if needed (ex. the memory address/size, ...).
Any hints?
Thanks in advance!
MIX
This answer assumes you are doing this on Linux/Unix.
how can process 2 access the memory mapped by process 1, without knowing anything about the opened file?
Process 1 passes the MAP_SHARED flag to mmap[1].
You can:
A) Share the file descriptor using Unix domain sockets[2].
B) Send the name of the file using the queues you mentioned at the end of your message.
Process 2 calls mmap on the same file, also with the MAP_SHARED flag. Modifications to the mmapped memory made by process 1 will be visible to process 2. If you need fine control over when the changes from process 1 are flushed to the file, you should control it with msync[3].
how can I put the mmap content in a new file? I suppose I have to ftruncate a new file, mmap this file and memcpy the content of process 1 memory map to process 2 memory map (then msync)
Why not just write() the mmapped memory to the new file, the same as you would any regular memory?
[1]http://man7.org/linux/man-pages/man2/mmap.2.html
[2]Portable way to pass file descriptor between different processes
[3]http://man7.org/linux/man-pages/man2/msync.2.html

File pointers after returning from a forked child process

Is it normal, for a given file descriptor shared between a forked parent and child process, that the file position in the parent process remains the same after a child process reads from the same file descriptor?
This is happening for me. Here's the setup:
I am writing a C++ CGI program, so it reads HTTP requests from stdin. When processing a multipart_form, I process stdin with an intermediary object (Multipart_Pull) that has a getc() method that detects the boundary strings and returns EOF at the end of each field, so I can pretend a field's contents are a file. When the field is a file upload, I fork twice in order to pipe the results of Multipart_Pull::getc to the stdin of a child process that runs ssconvert to make a CSV file from an Excel file for further processing. I wrote the child process to leave the file pointer at the position where the parent could pick it up. The parent process uses wait() to ensure the child processes are done before continuing.
For testing while developing Multipart_Pull, I am faking stdin by opening a disk file that was copied from a real multipart_form request.
When faking stdin, and after the child process returns, the first character read in the parent process is the same first character that the child process read when it started. That is, the file pointer didn't move in the parent's copy of the file.
I have confirmed that the child process actually reads the data by running gdb and following the appropriate child process by using set follow-fork-mode child, and also confirmed the file position of the parent on return by comparing the characters read against the file from which the data is read.
When I am really reading from stdin, I don't expect that this will be a problem because (correct me if I'm wrong here), when you read a character from stdin, it's gone forever.
I realize that there are workarounds to solve this particular problem, the easiest being to just ignore any fields that follow a file upload on a multipart_form, i.e. the parent doesn't try to continue reading after the fork. However, I hate to cripple the production code or impose unnecessary restrictions, and mainly I really just want to understand what's happening.
Thanks in advance.
Is it normal, for a given file descriptor shared between a forked parent and child process, that the file position in the parent process remains the same after a child process reads from the same file descriptor?
Since you bring up fork(), I presume you are working with a POSIX-compliant system. Otherwise, the answer is subject to the specific details of your C++ implementation.
In POSIX terminology, file descriptors and streams are both types of "handles" on an underlying "open file description". There may be multiple distinct handles on the same open file description, potentially held by different processes. The fork() function is one way in which such a situation may arise.
In the event that multiple handles on the same open file description are manipulated, POSIX explicitly declares the results unspecified except under specific conditions. Your child processes satisfy their part of those requirements by closing their streams, either explicitly or as a consequence of normal process termination. According to POSIX, however, for the parent's subsequent use of its stream to have specified behavior, it "shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location."
In other words, the parent process cannot rely on the child processes' manipulation of the file offset to automatically be visible to it, and in fact cannot rely on any particular offset at all after the children manipulate their copies of the stream.

fopen/fwrite and multi-threading?

fopen/fwrite and multi-threading?
Some multi-threaded programs open the same file, and each thread creates its own file pointer to that file.
There is one thread, created by a particular program, that updates the file at some random time, while other threads, created by a different program, simply read the contents of the file.
I guess this creates a race/data-inconsistency problem if the writing thread changes the contents of the file while the other threads are trying to read them.
The problem here is that the updating thread is compiled into a different executable than the program that creates the reading threads, so within-program thread control is impossible.
My solution is to create a very small "flag" file on the hard disk that indicates one of 3 states of the file:
1) a writing thread is updating the contents of the file;
2) reading threads are reading the contents of the file;
3) neither 1) nor 2).
I use this flag file to block threads whenever necessary.
Is there a more compact/neat solution to this problem?
It might be easier to use a process-global "named" semaphore that all the processes know about. Plus then you could use thread/process-blocking semaphore mechanisms instead of spin-looping on file-open-close and file contents...

UNIX File Descriptors Reuse

Though I'm reasonably used to UNIX and have programmed on it for a long time, I'm not used to file manipulation.
I know that 0/1/2 file descriptors are standard in, out, and error. I'm aware that whenever a process opens a file, it is given a descriptor with the smallest value that isn't yet used - and I understand some things about using dup/dup2.
I get confused about file descriptors between processes, though. Does each process have its own 0/1/2 descriptors for in/out/error, or are those 3 descriptors shared between all processes? If they are shared, how come you can run 3 programs in 3 different shells and each gets only its own program's output?
If two programs open myfile.txt after start-up, will they both use file descriptor #3, or would the second program use #4 since 3 was taken?
I know I asked the same question in a couple of ways there, but I just wanted to be clear. The more detail the better :) I've never run into problems with these things while programming, but I'm reading through a UNIX book to understand more, and I suddenly realized this confused me a lot and I'd never thought about it in detail before.
Each file descriptor is local to the process. However, some file descriptors can refer to the same file - for example, if you create a child process using fork() it would share the files opened by the parent. It would have its own set of file descriptors, initially identical to the parent's ones, but they can change with closing/dup-ing, etc.
If two programs open the same file, in general they get separate file descriptors, pointing to separate internal structures. However, using certain techniques (fork, FD passing, etc.) you can have file descriptors in different processes point to the same internal entity. Generally, though, it is not the case.
Answering your question: both programs would get FD #3 for the newly opened file, because descriptors are allocated per process and 0-2 are already taken in each.
File descriptors in Unix (normally) persist through fork() and exec() calls. So yes, several processes can share file descriptors.
For example, a shell might do a command like:
foo | bar
In this case, foo's stdout must be connected to bar's stdin. To do this, the shell will most likely use pipe() to create reader and writer file descriptors. It then fork()s twice, and the descriptors persist across both forks. The child that will run foo does close(1); dup(writer_fd); to make writer_fd become descriptor 1, then calls exec(), so process foo writes its output to the pipe we created. For bar, we do close(0); dup(reader_fd); then exec(). And voilà, foo's output goes to bar.
Don't confuse the file descriptors with the resources they represent. You can have ten different processes, each with a file descriptor of '3' open, and each refer to a different open file. When a process does I/O using its file descriptor, the OS knows which process is doing the I/O and is able to disambiguate which file is being referred to.