Capnp: Move to previous position in BufferedInputStreamWrapper - c++

I have a binary file with multiple Capnp messages which I want to read. Reading sequentially works well, but I have the use case that I want to jump to a previously known position.
The data consists of sequential images with metadata, including their timestamps. I would like to be able to jump back and forth (like in a video player).
This is what I have tried:
#include <fcntl.h>
#include <kj/io.h>
#include <capnp/serialize-packed.h>

int fd = open(filePath.c_str(), O_RDONLY);
kj::FdInputStream fdStream(fd);
kj::BufferedInputStreamWrapper bufferedStream(fdStream);

for (;;) {
    kj::ArrayPtr<const kj::byte> framePtr = bufferedStream.tryGetReadBuffer();
    if (framePtr != nullptr) {
        capnp::PackedMessageReader message(bufferedStream);
        // This should reset the buffer to the last read message?
        bufferedStream.read((void*)framePtr.begin(), framePtr.size());
        // ...
    } else {
        // reset to beginning
    }
}
But I get this error:
capnp/serialize.c++:186: failed: expected segmentCount < 512; Message has too many segments
I was assuming that tryGetReadBuffer() returns the position and size of the next packed message. But then again, how would the BufferedInputStream know what "a message" is?
Question: How can I get position and size of messages and read these messages later on from the BufferedInputStreamWrapper?
Alternative: reading the whole file once, taking ownership of the data, and saving it to a vector, as described here (https://groups.google.com/forum/#!topic/capnproto/Kg_Su1NnPOY). Better solution all along?

BufferedInputStream is not seekable. In order to seek backwards, you will need to destroy bufferedStream and then seek the underlying file descriptor, e.g. with lseek(), then create a new buffered stream.
Note that reading the current position (in order to pass to lseek() later to go back) is also tricky if a buffered stream is present, since the buffered stream will have read past the position in order to fill the buffer. You could calculate it by subtracting off the buffer size, e.g.:
// Determine the current file position, so that we can seek to it later.
off_t messageStartPos = lseek(fd, 0, SEEK_CUR) -
    bufferedStream.tryGetReadBuffer().size();

// Read a message.
{
    capnp::PackedMessageReader message(bufferedStream);
    // ... do stuff with `message` ...

    // Note that `message` is destroyed at this `}`. It's important that this
    // happens before querying the buffered stream again, because
    // PackedMessageReader updates the buffer position in its destructor.
}

// Determine the end position of the message (if you need it).
off_t messageEndPos = lseek(fd, 0, SEEK_CUR) -
    bufferedStream.tryGetReadBuffer().size();
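Putting it together, jumping back to a recorded position later might look like this (a sketch; the old bufferedStream must go out of scope first, since it holds buffered data that would be out of sync after the seek):

// Rewind the underlying file descriptor to the recorded message start...
lseek(fd, messageStartPos, SEEK_SET);

// ...and build a fresh buffered stream over it, then re-read the message.
kj::BufferedInputStreamWrapper freshStream(fdStream);
capnp::PackedMessageReader reread(freshStream);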
bufferedStream.read((void*)framePtr.begin(), framePtr.size());
FWIW, the effect of this line is "advance past the current buffer and on to the next one". You don't want to do this when using PackedMessageReader, as it will already have advanced the stream itself. In fact, because PackedMessageReader might have already advanced past the current buffer, framePtr may now be invalid, and this line might segfault.
Alternative: reading the whole file once, taking ownership of the data, and saving it to a vector, as described here (https://groups.google.com/forum/#!topic/capnproto/Kg_Su1NnPOY). Better solution all along?
If the file fits comfortably in RAM, then reading it upfront is usually fine, and probably a good idea if you expect to be seeking back and forth a lot.
Another option is to mmap() it. This makes it appear as if the file is in RAM, but the operating system will actually read in the contents on-demand when you access them.
However, I don't think this will actually simplify the code much. Now you'll be dealing with an ArrayInputStream (a subclass of BufferedInputStream). To "seek" you would create a new ArrayInputStream based on a slice of the buffer starting at the point where you want to start.
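For illustration, the mmap() route might look roughly like this (a sketch with error handling omitted; messageStartPos is assumed to be an offset you recorded earlier):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <kj/io.h>
#include <capnp/serialize-packed.h>

int fd = open(filePath.c_str(), O_RDONLY);
struct stat st;
fstat(fd, &st);

// Map the whole file read-only; pages are faulted in on demand.
const kj::byte* data = reinterpret_cast<const kj::byte*>(
    mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));

// "Seek" by constructing a new ArrayInputStream over a slice that starts
// at the previously recorded message offset.
kj::ArrayInputStream stream(
    kj::arrayPtr(data + messageStartPos, st.st_size - messageStartPos));
capnp::PackedMessageReader message(stream);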

What is the best way to cut the end off of a fstream file in C++ 11
I am writing a data persistence class to store audio for my audio editor. I have chosen to use fstream (possibly a bad idea) to create a random-access binary read/write file.
Each time I record a little sound into my file I simply tack it onto the end of this file. Another internal data structure / file, contains pointers into the audio file and keeps track of edits.
When I undo a recording action and then do something else, the last bit of the audio file becomes irrelevant. It is not referenced in the current state of the document, and you cannot redo your way back to a state where you could ever see it again. So I want to chop this part of the file off and start recording at the new end. I don't need to cut out bits in the middle, just off the end.
When the user quits this file will remain and be reloaded when they open the project up again.
In my application I expect the user to do this all the time and being able to do this might save me as much as 30% of the file size. This file will be long, potentially very, very long, so rewriting it to another file every time this happens is not a viable option.
Rewriting it when the user saves could be an option but it is still not that attractive.
I could stick a value at the start that says how long the file is supposed to be, and then overwrite the end to recycle the space. But if I wanted to continually update the data store file in case of a crash, this would mean rewriting the start over and over again, and I worry that might be bad for flash drives. I could also recompute the end of the useful part of the file on load by analyzing the pointer file, but in the meantime I would be wasting all that space, and that is complicated.
Is there a simple call for this in the fstream API?
Am I using the wrong library? Note that I want to stick to something generic; the STL is preferred, so I can keep the code as cross-platform as possible.
I can't seem to find it in the documentation and have looked for many hours. It is not the end of the earth, but it would make this a little simpler and potentially more efficient. Maybe I am just missing it somehow.
Thanks for your help
Andre’
Is there a simple call for this in the fstream API?
If you have a C++17 compiler then use std::filesystem::resize_file. In previous standards there was no such thing in the standard library.
With older compilers: on Windows you can use SetFilePointer or SetFilePointerEx to set the current position to the size you want, then call SetEndOfFile. On Unixes you can use truncate or ftruncate. If you want portable code, you can use Boost.Filesystem; from it, it is simplest to migrate to std::filesystem in the future, because std::filesystem was mostly specified based on it.
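For example (a minimal sketch; the path and target size are illustrative):

#include <filesystem>

int main() {
    // Shrinks (or extends) the file to exactly 1,000,000 bytes; throws
    // std::filesystem::filesystem_error on failure.
    std::filesystem::resize_file("/home/user/my-audio/my-file.dat", 1000000);
}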
If you have a variable that contains your current position in the file, you could seek back by the length of your "unneeded chunk" and just continue to write from there.
#include <fstream>
#include <unistd.h>   // for truncate()

// Somewhere at the beginning of your code:
std::ofstream file;
file.open("/home/user/my-audio/my-file.dat", std::ios::binary);

// ...... long story of writing data .......

// Let's say we are at one million bytes now (in the file):
int current_file_pos = 1000000;

// Your last chunk size:
int last_chunk_size = 12345;

// Your chunk that you are saving:
char *last_chunk = get_audio_chunk_to_save();

// Writing the chunk:
file.write(last_chunk, last_chunk_size);

// Moving the pointer:
current_file_pos += last_chunk_size;

// Let's undo it now!
current_file_pos -= last_chunk_size;
file.seekp(current_file_pos);

// Now you can write new chunks from the place where you were before
// writing and undoing the last one!
// .....

// When you want to finally write the file to disk, you just close it:
file.close();

// And then truncate it to the size of current_file_pos:
truncate("/home/user/my-audio/my-file.dat", current_file_pos);
Unfortunately, you'll have to write a cross-platform truncate function that calls SetEndOfFile on Windows and truncate on Linux. It's easy enough using preprocessor macros.
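One possible shape for that wrapper (a sketch, untested; the name truncate_file is made up):

#ifdef _WIN32
#include <windows.h>

bool truncate_file(const char* path, long long size) {
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER li;
    li.QuadPart = size;
    // Move the file pointer to the desired size, then cut the file there.
    bool ok = SetFilePointerEx(h, li, NULL, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok;
}
#else
#include <unistd.h>

bool truncate_file(const char* path, long long size) {
    return truncate(path, size) == 0;
}
#endif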

Seek in libarchive, how to reset header?

Is it possible to read a decompressed file once again?
Let's imagine I used archive_read_next_header(a, &entry),
and I read an unknown number of bytes using archive_read_data(a, ptr_to_buffer, buffer_size). Now I want to reset it and start reading again from the beginning. I am trying to override seekoff(std::streamoff off, std::ios_base::seekdir way, std::ios_base::openmode which). I understand that it might be impossible to just seek inside the decompressed data because of the inner workings of compression algorithms, and because the data is not stored anywhere except for a limited number of bytes in libarchive's internal buffer.
The idea is to just reset it all and read std::streamoff off bytes; that way I could implement a backward seek. A forward seek would be easy: just read std::streamoff off bytes. It's really inefficient, but let's hope seek won't be used much.
The whole archive structure was initialized this way:
archive_read_set_read_callback(a, read_callback);
archive_read_set_callback_data(a, container);
archive_read_set_seek_callback(a, seek_callback);
archive_read_set_skip_callback(a, skip_callback);
int r = (archive_read_open1(a));
where container contains, among other things, a std::istream, and the callbacks are functions which manipulate that stream.
Template of what I would like to achieve:
std::streampos seek_beg(std::streamoff off) {
    if (off >= 0) {
        // read/skip 'off' bytes
    } else {
        // reset (a)
        // read/skip 'off' bytes
    }
    // return position
}
Also, my underflow() method is implemented this way:
int underflow() {
    int r = archive_read_data(ar, ptr, BUFFER_SIZE);
    if (r < 0) {
        throw std::runtime_error("ERROR");
    } else if (r == 0) {
        return std::streambuf::traits_type::eof();
    } else {
        setg(ptr, ptr, ptr + r);
    }
    return std::streambuf::traits_type::to_int_type(*ptr);
}
The libarchive documentation, more precisely the wishlist in the libarchive wiki on GitHub, says:
A few people have asked for the ability to efficiently "re-read"
particular archive entries. This is a tricky subject. For many
formats, the performance gains from this would be very modest. For
example, with a little performance work, the seeking Zip reader could
support very fast re-reading from the beginning since it only involves
re-parsing the central directory. The cases where there would be real
gains (e.g., tar.gz) are going to be very difficult to handle. The
most likely implementation would be some form of checkpointing so that
clients can explicitly ask for a checkpoint object and then restore
back to that checkpoint. The checkpoint object could be complex if you
have a series of stacked read filters plus state in the format handler
itself.
As far as I can see, seeking in archives is not currently possible with libarchive, so the solution to my problem was to remember all read data whenever I suspected I would want to re-read it, and push it back into the stream as needed.
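Since libarchive cannot rewind a compressed stream in place, one concrete way to implement the "reset" branch of seek_beg is to tear the reader down and rebuild it. A rough sketch (Container is a stand-in for the question's container type, assumed to expose the underlying std::istream; the support_* calls are assumed from a typical setup; error handling is omitted):

#include <archive.h>
#include <archive_entry.h>
#include <istream>

struct archive* reopenArchive(struct archive* old, Container* container) {
    archive_read_free(old);                 // tear down the old reader

    // Rewind the underlying istream that the callbacks read from.
    container->stream->clear();
    container->stream->seekg(0, std::ios::beg);

    // Rebuild the reader the same way it was first initialized.
    struct archive* a = archive_read_new();
    archive_read_support_filter_all(a);
    archive_read_support_format_all(a);
    archive_read_set_read_callback(a, read_callback);
    archive_read_set_callback_data(a, container);
    archive_read_set_seek_callback(a, seek_callback);
    archive_read_set_skip_callback(a, skip_callback);
    archive_read_open1(a);

    // Re-position on the entry that was being read.
    struct archive_entry* entry;
    archive_read_next_header(a, &entry);
    return a;
}

A backward seek then becomes: reopen, then read and discard bytes with archive_read_data() until the target offset is reached.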

C++ Winsock Download File Cut off HTTP Header

I'm downloading the bytes of a file from the web using Winsock2. So far, so good.
The problem is that my download includes the HTTP header, which I don't need and which corrupts the file's bytes.
For example, I know I can find the position where the header ends by searching for "\r\n\r\n". But somehow I can't find it, or at least can't cut it off... :(
int iResponseBytes = 0;
ofstream ofDownloadedFile;
ofDownloadedFile.open(pathonclient, ios::binary);

do {
    iResponseBytes = recv(this->Socket, responseBuffer, pageBufferSize, 0);
    if (iResponseBytes > 0) // if bytes received
    {
        // Write only the bytes actually received, not the whole buffer.
        ofDownloadedFile.write(responseBuffer, iResponseBytes);
    }
    else if (iResponseBytes == 0) // done
    {
        break;
    }
    else // fail
    {
        cout << "Error while downloading" << endl;
        break;
    }
} while (iResponseBytes > 0);
I tried searching the array / the pointer using strncmp etc.
Hopefully someone can help me.
Best regards
You have no guarantees, whatsoever, that the \r\n\r\n sequence will be received completely within a single recv() call.
For example, the first recv() call could end up reading everything up to and including the first two characters of the sequence, \r\n; then your code goes around the loop again, and the second recv() call receives the remaining \r\n (followed by the first part of the actual content). There is only a small chance that this happens, but it cannot be ignored, and it must be handled correctly.
If your goal is to trim everything up until the \r\n\r\n, your current approach is not going to work very well.
Instead, what you should do is invest some time studying how file stream buffering actually works. Ponder, for a moment, how std::istream/std::ostream read/write large chunks of data at a time, but provide a character-oriented interface. std::istream, for example, reads a buffer's worth of file data at a time, placing it into an internal buffer, which your code can then retrieve one character at a time (if it wishes to). How does that work? Think about it.
To do this correctly, you need to implement the same algorithm yourself: recv() from the socket a buffer at a time, then provide a byte-oriented interface, to return the received contents one byte at a time.
Then, the main code becomes a simple loop, reading the streamed socket contents one byte at a time, at which point discarding everything up until the code sees \r\n\r\n becomes trivial (although there are still a few non-obvious gotchas in doing this right, but that can be a new question).
Of course, once the \r\n\r\n gets processed, it is certainly possible to optimize things going forward, by flushing out whatever's still buffered internally, to the output file, and then resume reading from the socket a whole buffer-at-a-time, and copying it to the output file without burning CPU cycles dealing with the byte-oriented interface.
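As an illustration of the buffering idea, here is a minimal sketch (the function name downloadBody and the two-phase structure are my own; a simpler variant of the byte-oriented approach that accumulates bytes until the terminator is seen, even when it spans recv() calls, then streams the body straight to the file):

#include <winsock2.h>
#include <fstream>
#include <string>

bool downloadBody(SOCKET sock, std::ofstream& out) {
    std::string header;   // bytes read so far, still searching for the header end
    char buf[4096];

    // Phase 1: accumulate until "\r\n\r\n" appears, however it is split
    // across recv() calls.
    for (;;) {
        int n = recv(sock, buf, sizeof(buf), 0);
        if (n <= 0) return false;            // closed or failed before the header ended
        header.append(buf, n);
        std::size_t pos = header.find("\r\n\r\n");
        if (pos != std::string::npos) {
            // Everything after the terminator is file content.
            out.write(header.data() + pos + 4, header.size() - pos - 4);
            break;
        }
    }

    // Phase 2: the header is gone; copy the rest straight to the file.
    for (;;) {
        int n = recv(sock, buf, sizeof(buf), 0);
        if (n == 0) return true;             // orderly shutdown: done
        if (n < 0) return false;             // error
        out.write(buf, n);
    }
}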

Reading the contents of the dynamically created buffer giving wrong address of memory for second call to function

Using Visual C++, I am trying to read an image from a stream. I do this by storing the stream in a buffer. I know at what location in the buffer the image is (it's the first file in the stream, and I know the size of the image, so I read and store the image in the buffer up to the size of the file; I am sure that part is correct). The first time I read the image there is no problem; it works correctly. The code is as follows:
void ReadFromStream(IStream *pStream)
{
    // This stream contains the file contents.
    ULONG cbRead;
    int size = 5348928;
    char *buffer = new char[size + 1];

    // Store the stream in the buffer. Now all the data is in the buffer.
    HRESULT hr = pStream->Read(buffer, size, &cbRead);
    buffer[cbRead] = '\0';

    int location = 512;
    char FileContents[107643];

    // Here I have the contents of the image in FileContents. I am sure about
    // its location. For the first call to ReadFromStream() it works fine.
    // (SizeOfFile is defined elsewhere.)
    memcpy(FileContents, &buffer[location], SizeOfFile);

    delete[] buffer; // the buffer was leaked in the original; free it here
}
But I have to read the image a second time during the same run of the program. When I call ReadFromStream() a second time (with the same stream value; I can see while debugging that it is the same), the buffer shows contents from a location far away from the image (the stream had the image file as its first file, but on the second call the buffer points to the data of another file). So the question is: why does the buffer show data from a location far from the starting index on the second call? Is some memory allocated that must be deleted, and if so, where and how? Maybe on the second call the buffer already has some memory allocated, i.e., it points to an address that doesn't start from zero (but I think it should).
Streams are like normal files in that they're sequential in nature and once you've read data, the "read cursor" is advanced and another call to Read() will read more data, and so on.
To seek backwards to re-read the same data again, use IStream::Seek(). For example, to go back to the start of the stream:
LARGE_INTEGER li = { 0 };
HRESULT hr = pStream->Seek(li, STREAM_SEEK_SET, NULL);
Not all streams support seeking so you should always check the return code for error.
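Putting it together (a sketch): rewind between the two reads so the second call sees the image at the start of the stream again.

ReadFromStream(pStream);                     // first read: works

LARGE_INTEGER li = { 0 };
HRESULT hr = pStream->Seek(li, STREAM_SEEK_SET, NULL);
if (SUCCEEDED(hr)) {
    ReadFromStream(pStream);                 // second read now starts from byte 0
}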

Is ftruncate() asynchronous?

I am attempting to write a class in C++ that provides a means of atomically appending to a file, even in the case of power failure mid-write.
First, I write my current file position (a 64-bit offset from the beginning of the file, in bytes) to a separate journal file. Then, I write the requested data to the end of the data file. Finally, I call ftruncate() (setting the truncated size to 0) on the journal file.
The main idea is that if this class is ever asked to open a file that has a non-empty journal file, then you know a write was interrupted, and you can read the position of the last write from the journal file and fseek to that spot. You lose the last partial write, but the file should not be corrupted.
Unfortunately, it seems like ftruncate() is asynchronous. In practice, even if I call fflush() and fsync() after ftruncate(), I see the journal grow to hundreds of bytes while doing lots of writes. It always ultimately ends up at 0, but I expected to see it at either size 0 or size 8 at all times.
Is it possible to make ftruncate completely synchronous? Or is there a better way to use the journal?
ftruncate() does not change your file descriptor's write offset in the file. If you are leaving the file open and writing the next entry after calling ftruncate(), then what's happening is that the file's offset is still increasing. When you write, the file's length is extended back out to that offset, and your bytes are written there.
Probably what you want to do is call lseek(fd, 0, SEEK_SET) after you call ftruncate() so that the next write to the file will take place at the beginning of the file.
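For illustration, the journaled append might look like this with raw POSIX file descriptors (a sketch; the function and variable names are made up, and error handling is reduced to early returns):

#include <unistd.h>
#include <cstddef>
#include <cstdint>

bool atomicAppend(int journal_fd, int data_fd, const void* buf, size_t len) {
    // 1. Record where this write starts, so a crash can be rolled back to it.
    int64_t startPos = lseek(data_fd, 0, SEEK_END);
    if (startPos < 0) return false;
    if (write(journal_fd, &startPos, sizeof(startPos)) != (ssize_t)sizeof(startPos))
        return false;
    if (fsync(journal_fd) != 0) return false;  // the journal entry must be durable first

    // 2. Append the payload and flush it.
    if (write(data_fd, buf, len) != (ssize_t)len) return false;
    if (fsync(data_fd) != 0) return false;

    // 3. Clear the journal, then move its offset back to the start: as the
    //    answer notes, ftruncate() does not reset the write offset, so
    //    without the lseek() the next entry would land past the truncated
    //    end and the journal would appear to keep growing.
    if (ftruncate(journal_fd, 0) != 0) return false;
    if (lseek(journal_fd, 0, SEEK_SET) < 0) return false;
    return true;
}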