stream to byte array - c++

I'm creating a program in native C++ (no CLR). I'm using a toolkit which converts data and normally writes it to a file or stdout.
The issue is that I want to write it to an array, and I don't know the size of the data that will be sent.
The toolkit requires a "FILE *" parameter and cannot be modified.
Basically working code:
FILE * ofile = fopen("yourfile.dat", "wb");
toolkit::function(ofile);
fclose(ofile);
To write to stdout instead, the first line would be
FILE * ofile = stdout;
What I want now is to run the function and, in the end, have a pointer to an array of bytes (e.g. char *) and its size.
I've been looking around and can't find a solution.
Writing to a file first is not an option.

If I got you right, you want a FILE* object that will store all bytes that were written to it in a memory buffer, right?
fmemopen does exactly this job, but it is POSIX.1-2008 and, according to its manpage, not widely available.
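A minimal sketch of how that could look, assuming a POSIX system where fmemopen is available (the buffer size and the fputs call stand in for the toolkit's output):
#include <stdio.h>
#include <string.h>

int main() {
    char buf[4096];                          // fixed-size backing buffer; must be large enough
    memset(buf, 0, sizeof(buf));
    FILE *ofile = fmemopen(buf, sizeof(buf), "wb");
    if (!ofile) return 1;
    fputs("example bytes", ofile);           // stand-in for toolkit::function(ofile)
    fflush(ofile);
    long size = ftell(ofile);                // number of bytes written so far
    fclose(ofile);
    // buf now holds `size` bytes of output
    return 0;
}
Since you don't know the output size up front, the POSIX.1-2008 sibling open_memstream(), which allocates and grows the buffer for you, may fit even better.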

Related

Knowing current compressed file size using gzwrite (zlib)

I'm using zlib for C++.
Quote from
http://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/zlib-gzwrite-1.html regarding gzwrite function:
The gzwrite() function shall write data to the compressed file referenced by file, which shall have been opened in a write mode (see gzopen() and gzdopen()). On entry, buf shall point to a buffer containing len bytes of uncompressed data. The gzwrite() function shall compress this data and write it to file. The gzwrite() function shall return the number of uncompressed bytes actually written.
I interpret this as the return value will NOT tell me how much larger the file became when writing. Only how much data was compressed into the file.
The only way to know how large the file is would then be to close it, and read the size from the file system. I have a requirement to only continue to write to the file until it reaches a certain size. Can this be achieved without closing the file?
A workaround would be to write until the uncompressed size reaches my limit and then close the file, read the size from file system and update my best guess of file size based on that, and then re-open the file and continue writing. This would make me close and open the file a few times towards the end (as I'm approaching the size limit).
Another workaround, which would give more of an estimate (which is not really what I want), would be to write until the uncompressed size reaches the limit, close the file, read the file size from the file system and calculate the compression ratio so far. Then I can use this compression ratio to calculate a new limit for the uncompressed file size where the compression should get me down to the limit for the compressed file size. If I repeat this, the estimate would improve, but again, it's not what I'm looking for.
Are there better options?
The preferred option would be if zlib could tell me the compressed file size while the file is still open. I don't see why this information would not be available inside zlib at this point, since compression happens when I call gzwrite and not when I close the file.
zlib provides the function gzoffset(), which does exactly what you're asking.
If for some reason you are stuck with a version of zlib that is more than about eight years old, predating the addition of gzoffset(), then this is easy to do with gzdopen(). You open the output file with fopen() or open(), and provide the file descriptor (using fileno() and dup() if you used fopen()), and then provide that descriptor to gzdopen(). Then you can use ftell() or lseek() at any time to see how much has been written. Be careful not to double-close the descriptor. See the comments for gzdopen().
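A minimal sketch of the gzoffset() approach (assumes zlib >= 1.2.4; the file name, chunk size and limit are made up, and error handling is omitted):
#include <zlib.h>

int main() {
    gzFile zf = gzopen("out.gz", "wb");
    if (!zf) return 1;
    char buf[1024] = {0};
    for (int i = 0; i < 1000; i++) {
        gzwrite(zf, buf, sizeof(buf));
        // Compressed bytes written to the file so far; zlib buffers
        // internally, so this can lag slightly behind the gzwrite calls.
        if (gzoffset(zf) > 16 * 1024) break;  // stop near the size limit
    }
    gzclose(zf);
    return 0;
}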
You can work around this issue by using a pipe. The idea is to write the compressed data into a pipe. After that, you read the data from the other end of the pipe, count it and write it to the actual file.
To set this up you need to first open the file to write to via a plain open. Then create a pipe via pipe2 and initialize zlib by passing the pipe's write end to gzdopen:
int out = open("/path/to/file", O_WRONLY | O_CREAT | O_TRUNC, 0644); // O_CREAT needs a mode
int p[2];
pipe2(p, O_NONBLOCK);
gzFile zFile = gzdopen(p[1], "w"); // p[1] is the pipe's write end
You can now write the data first to the pipe and then splice it from the pipe to the out file:
gzwrite(zFile, buf, 1024); // or any other length
ssize_t bytesWritten = 0;
do {
    bytesWritten = splice(p[0], NULL, out, NULL, 1024, SPLICE_F_NONBLOCK | SPLICE_F_MORE); // p[0] is the read end
} while (bytesWritten == 1024);
As you can see, you now have bytesWritten to tell you how much data was actually written. Simply sum it up in another variable and stop splicing as soon as you have written as much data as you need. (Or splice it in one go, by writing everything to zFile and then splicing once, with the amount of data you are allowed to store as the fifth parameter. If you want to avoid compressing unnecessary data, do it in chunks as shown above.)
A note on splice: splice is Linux-specific, and is basically just a very efficient copy. You can always replace it with a simple read-and-write combo, i.e. read data from p[0] into a buffer and then write that buffer to out; splice is just faster and less code.
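A sketch of that portable fallback, draining whatever the non-blocking pipe currently holds (same p, out and bytesWritten as above):
char tmp[1024];
ssize_t n;
while ((n = read(p[0], tmp, sizeof(tmp))) > 0) { // returns <= 0 once the pipe is empty
    write(out, tmp, n);
    bytesWritten += n;                           // running total of compressed bytes on disk
}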

Write at specific position at a file with open()

Hello, I am trying to simulate two programs that send and receive files over the network in C++, something like a client and a server. To begin with, I have to split a file into pages of 4096 bytes and send them to the other program in order to recreate the file. I send and receive files through the network with write and read. So in the client program I must create a function that receives the packages and puts them into a file. I cannot figure out a way to put the packages into the file. For example, if a file has 2 pages, I must create another file using these 2 pages. Also, I cannot know if they come in order, so I must create the file and put them in the right position.
/* assume the connections are OK and the file's name is in char *name */
int file = open(name, O_CREAT | O_WRONLY, 0666);
char buffer[4096];
int pagenumber;
for (int i = 0; i < page_number; i++) {
    read(socket, &pagenumber, sizeof(int));
    read(socket, buffer, sizeof(int));
    write(file(pagenumber*4096), buffer, 4096);
}
This code works for pagenumber=0 but for pagenumber=1 nothing happens! Can you help me? Thanks in advance!
To write at a certain position in the file you must use lseek:
off_t lseek(int fd, off_t offset, int whence);
It takes the descriptor, the offset, and a final parameter that is one of these constants:
SEEK_SET The offset is set to offset bytes.
SEEK_CUR The offset is set to its current location plus offset bytes.
SEEK_END The offset is set to the size of the file plus offset bytes.
If you know how big the file is going to be, you can use ftruncate for it:
int ftruncate(int fd, off_t length);
Anyway, even if you create a file that is huge, since most filesystems on Linux support sparse files, the actual file on disk will only occupy the blocks that have actually been written.
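Applied to the loop from the question, each iteration could look like this (a sketch; error checking omitted):
read(socket, &pagenumber, sizeof(int));          // which page is coming
read(socket, buffer, sizeof(buffer));            // the page's 4096 bytes
lseek(file, (off_t)pagenumber * 4096, SEEK_SET); // jump to that page's slot
write(file, buffer, sizeof(buffer));             // write it in place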
The first argument to write() is a file descriptor, which you obtained with open(). So it should be
int file = open(...);
...
write(file,buffer,4096);
not
write(file(pagenumber*4096),buffer,4096);
Regarding the question of how to write at a specific position: you can prepare the file beforehand with write, and then use lseek() to position the file offset where you want to write. See the lseek man page for a description.
Mario, first of all, let's not rely on garbage in 'pagenumber' to control the loop (which is what happens when the loop's boundary condition is checked for the first time). Now, if you receive page number '0' and then the page following it, pagenumber will be initialized to 0 and your loop will exit. Also, please check the number of bytes actually written and read by the write and read system calls, respectively.
Try pwrite:
/* assume the connections are OK and the file's name is in char *name */
int file = open(name, O_CREAT | O_WRONLY, 0666);
char buffer[4096];
int pagenumber;
for (int i = 0; i < page_number; i++) {
    read(socket, &pagenumber, sizeof(int));
    read(socket, buffer, sizeof(buffer));                 // read the whole page, not sizeof(int)
    pwrite(file, buffer, 4096, (off_t)pagenumber * 4096); // write at that page's offset
}

What does fd represent when typing: int fd = open("file");?

I am looking at I/O operations in C++ and I have a question.
When opening a file like:
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    unsigned char buffer[16];
    int fd = open(argv[1], O_RDONLY);
    read(fd, buffer, sizeof(buffer));
    return 0;
}
How can the variable fd represent a file as an integer when passing it to the open method? Is it representing a file in the current folder? If I print the fd variable, it prints 3. What does that mean?
P.S. I know there are several other ways to handle files, like stdio.h, fstream, etc., but that is out of the scope of this question.
How can the variable fd represent a file as an integer when passing it to the open method?
It's a handle that identifies the open file; it's generally called a file descriptor, hence the name fd.
When you open the file, the operating system creates some resources that are needed to access it. These are stored in some kind of data structure (perhaps a simple array) that uses an integer as a key; the call to open returns that integer so that when you pass it to read, the operating system can use it to find the resources it needs.
Is it representing a file in the current folder?
It's representing the file that you opened; its filename was argv[1], the first of the arguments that was passed to the program when it was launched. If that file doesn't exist, or open failed for some reason, then it has the value -1 and doesn't represent any file; you really should check for that before you try to do anything with it.
If I print the fd variable, it prints 3. What does that mean?
It doesn't have any particular meaning; but it has that value because it was the fourth file (or file-like thing) that was opened, after the input (0), output (1) and error (2) streams that are used by cin, cout and cerr in C++.
Because that is the index into the table of resources stored for your current process.
Each process has its own resource table, so you just need to pass the index to the read/write/etc. functions.
Generally, a file descriptor is an index for an entry in a kernel-resident data structure containing the details of all open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table. The user application passes the abstract key to the kernel through a system call, and the kernel will access the file on behalf of the application, based on the key. The application itself cannot read or write the file descriptor table directly.
from: http://en.wikipedia.org/wiki/File_descriptor
open() returns the file descriptor of the file, which is of the C type int. To learn more about file descriptors, see http://en.wikipedia.org/wiki/File_descriptor.
"fd" stands for file descriptor. It is a value identifying a file. It is often an index (in the global table), an offset, or a pointer. Different APIs use different types. WinAPI, for example, uses different types of handles (HANDLE, HGDI, etc.), which are essentially typedefs for int/void*/long, and so on.
Using naked types like "int" is usually not a good idea, but if the implementation tells you to do so (like POSIX in this case), you should keep it.
The simplified answer is that fd is just an index into some array of file descriptors.
When most processes are started, they are given three open file descriptors to begin with: stdin (0), stdout (1), and stderr (2). So when you open your first file, the next available array entry is 3.
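A minimal sketch illustrating both points, checking for -1 and seeing that the first descriptor you open is usually 3:
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; } // always check before using fd
    printf("fd = %d\n", fd); // typically 3: 0, 1 and 2 are stdin, stdout, stderr
    close(fd);
    return 0;
}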

Emulate a file pointer C++?

I am trying to load a bitmap from an archive. The bitmap class I have takes a character pointer to a filename and then loads it if it is in the same directory. The bitmap loading class is well tested and I don't want to mess with it too much. Problem is it uses a file pointer to load and do all of its file manipulation. Is there any way to emulate a file pointer and actually have it read from a chunk in memory instead?
Sorry if this is a bizarre question.
Refactor it and create functions that take the exact same parameters as before: if it used fopen, fread and fseek to read from disk, create mopen, mread and mseek that read the file from memory. You'll only have to change the names of the functions.
It should be easy, without risk, and the code won't look like a dirty hack in the end.
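A hypothetical sketch of what such a replacement might look like (MFILE, mopen, mread and mseek are illustrative names, not an existing API):
#include <string.h>
#include <stdio.h> // for SEEK_SET/SEEK_CUR/SEEK_END

struct MFILE {                // stands in for FILE: a buffer plus a cursor
    const unsigned char *data;
    size_t size;
    size_t pos;
};

MFILE mopen(const unsigned char *data, size_t size) {
    return MFILE{data, size, 0};
}

size_t mread(void *dst, size_t bytes, MFILE *f) {
    size_t left = f->size - f->pos;
    size_t n = bytes < left ? bytes : left;  // never read past the end
    memcpy(dst, f->data + f->pos, n);
    f->pos += n;
    return n;
}

int mseek(MFILE *f, long off, int whence) { // mirrors fseek's whence values
    long base = whence == SEEK_SET ? 0
              : whence == SEEK_CUR ? (long)f->pos : (long)f->size;
    if (base + off < 0 || (size_t)(base + off) > f->size) return -1;
    f->pos = (size_t)(base + off);
    return 0;
}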
You can also use a pipe. A pipe is a piece of memory where you can read and write using file primitives, which is basically what you want.
(Assuming a POSIX operating system.)
create a pipe:
int p[2];
pipe(p);
use fdopen() to turn the pipe's read end into a FILE*:
FILE *emulated_file = fdopen(p[0], "r");
then write whatever you want to the write end of the pipe:
write(p[1], "whateveryouwant", 16); // 15 characters plus the terminating NUL
Now:
char buf[32];
fread(buf, 1, 16, emulated_file);
cout << buf << endl;
will output "whateveryouwant".
Check out John Ratcliff's File Interface replacement for standard file I/O. It supports the feature you need.
You'll still need to refactor the bitmap loading code to use the new interface. However, this interface supports loading from file on disk, or memory chunk in memory (as well as writing to file on disk, or to expandable memory chunks).

feof() returning true when EOF is not reached

I'm trying to read from a file at a specific offset (simplified version):
typedef unsigned char u8;
FILE *data_fp = fopen("C:\\some_file.dat", "r");
fseek(data_fp, 0x004d0a68, SEEK_SET); // move filepointer to offset
u8 *data = new u8[0x3F0];
fread(data, 0x3F0, 1, data_fp);
delete[] data;
fclose(data_fp);
The problem is that data will not contain 1008 bytes, but 529 (seemingly random). Once it has read those 529 bytes, calls to feof(data_fp) start returning true.
I've also tried to read in smaller chunks (8 bytes at a time), but it just looks like it's hitting EOF when it's not there yet.
A simple look in a hex editor shows there are plenty of bytes left.
Opening a file in text mode, like you're doing, makes the library translate some of the file contents, potentially triggering an unwarranted EOF or bad offset calculations. (On Windows, for example, a 0x1A/Ctrl-Z byte in text mode is treated as end-of-file, and CRLF pairs are collapsed to a single character, which fits the symptoms here.)
Open the file in binary mode by passing the "b" flag in the fopen call:
fopen(filename, "rb");
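For reference, the snippet from the question with just the mode changed and the read count checked (a sketch; size and count are swapped in fread so the return value counts bytes):
FILE *data_fp = fopen("C:\\some_file.dat", "rb"); // "b": no text-mode translation
fseek(data_fp, 0x004d0a68, SEEK_SET);
u8 *data = new u8[0x3F0];
size_t got = fread(data, 1, 0x3F0, data_fp);      // got should now be 0x3F0 (1008)
delete[] data;
fclose(data_fp);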
Is the file being written to in parallel by some other application? Perhaps there's a race condition, so that the file ends wherever the read stops while the read is running, but later, when you inspect it, the rest has been written. That would explain the randomness, too.
Maybe it's a difference between text and binary mode. If you're on Windows, newlines are CRLF, which is two characters in the file but converted to only one when read. Try using fopen(..., "rb").
I can't see your link from work, but if your computer claims no more bytes exist, I'd tend to believe it. Why don't you print the size of the file rather than checking by hand in a hex editor?
Also, you'd be better off using level 2 I/O; the f-calls are ancient C ugliness, and you're using C++, since you have new.
int fh = open(filename, O_RDONLY);
struct stat s;
fstat(fh, &s); // fstat takes a pointer to the stat struct
cout << "size=" << hex << s.st_size << "\n";
Now do your seeking and reading using level 2 I/O calls, which are faster anyway, and let's see what the size of the file really is.