Caching images in c++. Using buffer_body or other things instead of file_body? - c++

I have slightly modified version of this https://www.boost.org/doc/libs/develop/libs/beast/example/http/server/async/http_server_async.cpp.
What it does:
According to the correctness of the request it returns the required image or an error.
What I'm going to do:
I want to keep frequently requesting images in local cache like an LRU cache to decrease response time
What I've tried:
I wanted to use buffer_body instead of file_body but some difficulties occurred with respond part, so I discarded this idea.
I tried to decode an png image to std::string, I thought this way I could keep it in std::unordered_map easier, but again problems arose with response part of the code
Here is the response part:
http::response<http::file_body> res {
std::piecewise_construct,
std::make_tuple(std::move(body)),
std::make_tuple(http::status::ok, req.version()) };
res.set(http::field::content_type, "image/png");
res.content_length(size);
res.keep_alive(req.keep_alive());
return send(std::move(res));
If doing it by encoding and decoding the image as string is ok I provide below the code where I read it to a string:
std::unordered_map<std::string, std::string> cache;
std::string load_file_contents(const std::string& filepath)
{
static const size_t MAX_LOAD_DATA_SIZE = 1024 * 1024 * 8 ; // 8 Mbytes.
std::string result;
static const size_t BUFF_SIZE = 8192; // 8 Kbytes
char buf[BUFF_SIZE];
FILE* file = fopen( filepath.c_str(), "rb" ) ;
if ( file != NULL )
{
size_t n;
while( result.size() < MAX_LOAD_DATA_SIZE )
{
n = fread( buf, sizeof(char), BUFF_SIZE, file);
if (n == 0)
break;
result.append(buf, n);
}
fclose(file);
}
return result;
}
template<class Body, class Allocator, class Send>
void handle_request(
beast::string_view doc_root,
http::request<Body, http::basic_fields<Allocator>>&& req,
Send&& send)
{
.... // skipping this part not to paste all the code
if(cache.find(path) == cache.end())
{
// if not in cache
std::ifstream image(path.c_str(), std::ios::binary);
// not in the cache and could open, so get it and decode it as a binary file
cache.emplace(path, load_file_contents(path));
}
.... // repsonse part (provided above) response should take from cache
}
ANY HELP WILL BE APPRECIATED! THANK YOU!

Sometimes there is no need to cache theseĀ files, for example, in my case changing file_body to vector_body or string_body were enough to speed up respond time almost twice

Related

Decoding with OGG/Vorbis gives no sound

I'd like to play an Ogg/Vorbis audio/video file, but right now I can't get to read audio from a file.
My algorithm to read audio is:
Initialize required structures:
vorbis_info info;
vorbis_comment comment;
vorbis_dsp_state dsp;
vorbis_block block;
vorbis_info_init(&info);
vorbis_comment_init(&comment);
Read headers:
Call vorbis_synthesis_headerin(&info, &comment, packet); until it returns OV_ENOTVORBIS
vorbis_synthesis_init(&dsp, &info);
vorbis_block_init(&dsp, &block);
Pass the first non-header packet to function below
Parse packets, do it until audioReady == READY
putPacket(ogg_packet *packet) {
int ret;
ret = vorbis_synthesis(&block, packet);
if( ret == 0 ) {
ret = vorbis_synthesis_blockin(&dsp, &block);
audioReady = (ret == 0) ? READY : NOT_READY;
} else {
audioReady = NOT_READY;
}
}
Read audio data:
float** rawData = nullptr;
readSamples = vorbis_synthesis_pcmout(&dsp, &rawData);
if( readSamples == 0 ) {
audioReady = NOT_READY;
return;
}
int16_t* newData = new int16_t[readSamples * getChannels()];
int16_t* dst = newData;
for(unsigned int i=0; i<readSamples; ++i) {
for(unsigned char ch=0; ch<getChannels(); ++ch) {
*(dst++) = math::clamp<int16_t>(rawData[ch][i]*32767 + 0.5f, -32767, 32767);
}
}
audioData.push_back({readSamples * getChannels() , newData});
vorbis_synthesis_read(&dsp, static_cast<int>(readSamples));
audioReady = NOT_READY;
This is where it gets wrong: after examining the newData contents it is revealed that it contains a very silent sound. I doubt if it is the right data which means somewhere along my algorithm I did something wrong.
I tried to find some examples of similar programs, but all I got are sources with very spaghetti-like code, which seems to do the same algorithm like mine, yet they do their job. (There is one off such library: https://github.com/icculus/theoraplay )
Is there any reason why I'm getting (almost) silence in my application?
PS: If you are wondering if I might getting OGG packets wrong, then I assure you this part of my code is working right, as I'm also reading video data from the same file, using the same code and it shows the video right.
I've found it: during reading packets I assumed that one Ogg Page = one Ogg packet. I's wrong: for audio one page can contain many packets. To read it properly one has to make a code like:
do{
putPacket(&packet);
}while( ogg_stream_packetout(&state, &packet) == 1 );
I did this mistake because for video packets (which I did first) a page contains only one packet.

Can not "read" anything through the FUSE file system

I use fuse to build my own file system in MIT 6.824 lab, and the read operation is implemented in this function.
void
fuseserver_read(fuse_req_t req, fuse_ino_t ino, size_t size,
off_t off, struct fuse_file_info *fi)
{
std::string buf;
int r;
if ((r = yfs->read(ino, size, off, buf)) == yfs_client::OK) {
char* retbuf = (char *)malloc(buf.size());
memcpy(retbuf,buf.data(),buf.size());
//Print the information of the result.
printf("debug read in fuse: the content of %lu is %s, size %lu\n",ino,retbuf, buf.size());
fuse_reply_buf(req,retbuf,buf.size());
} else {
fuse_reply_err(req, ENOENT);
}
//global definition
//struct fuse_lowlevel_ops fuseserver_oper;
//In main()
// fuseserver_oper.read = fuseserver_read;
I print the information of the buf before it return.
The write operation is also implemented, of course.
Then I run a simple test to read out some words.
//test.c
int main(){
//./yfs1 is the mount point of my filesystem
int fd = open("./yfs1/test-file",O_RDWR | O_CREAT,0777);
char* buf = "123";
char* readout;
readout = (char *)malloc(3);
int writesize = write(fd,buf,3);
int readsize = read(fd,readout,3);
printf("%s,%d\n",buf,writesize);
printf("%s,%d\n",readout,readsize);
close(fd);
}
I can get nothing by read(fd,readout,3), but the information printed by the fuseserver_read shows that the buffer is read out successfully before fuse_reply_buf
$ ./test
123,3
,0
debug read in fuse: the content of 2 is 123, size 3
So why the read() in test.c can not read anything from my file system??
Firstly, I've made a mistake to write my test file. The file pointer will point to the end of the file after "write" and of course can read nothing later. So simply reopen the file can make the test work.
Secondly, before read() operation of FUSE, the FUSE will getattr() first and truncate the result of the read() operation with the "size" attribute of the file. So it must be very careful to manipulate the attribute of a file.
There is also a need to notify that you have finished reading by sending an empty buffer, as an "EOF". You can do that by using reply_buf_limited.
Take a look at hello_ll example in the fuse source tree:
static void tfs_read(fuse_req_t req, fuse_ino_t ino, size_t size,
off_t off, struct fuse_file_info *fi) {
(void) fi;
assert(ino == FILE_INO);
reply_buf_limited(req, file_contents, file_size, off, size);
}
static int reply_buf_limited(fuse_req_t req, const char *buf, size_t bufsize,
off_t off, size_t maxsize)
{
if (off < bufsize)
return fuse_reply_buf(req, buf + off,
min(bufsize - off, maxsize));
else
return fuse_reply_buf(req, NULL, 0);
}

c++ Protocol buffer sending over network [duplicate]

I'm trying to read / write multiple Protocol Buffers messages from files, in both C++ and Java. Google suggests writing length prefixes before the messages, but there's no way to do that by default (that I could see).
However, the Java API in version 2.1.0 received a set of "Delimited" I/O functions which apparently do that job:
parseDelimitedFrom
mergeDelimitedFrom
writeDelimitedTo
Are there C++ equivalents? And if not, what's the wire format for the size prefixes the Java API attaches, so I can parse those messages in C++?
Update:
These now exist in google/protobuf/util/delimited_message_util.h as of v3.3.0.
I'm a bit late to the party here, but the below implementations include some optimizations missing from the other answers and will not fail after 64MB of input (though it still enforces the 64MB limit on each individual message, just not on the whole stream).
(I am the author of the C++ and Java protobuf libraries, but I no longer work for Google. Sorry that this code never made it into the official lib. This is what it would look like if it had.)
bool writeDelimitedTo(
const google::protobuf::MessageLite& message,
google::protobuf::io::ZeroCopyOutputStream* rawOutput) {
// We create a new coded stream for each message. Don't worry, this is fast.
google::protobuf::io::CodedOutputStream output(rawOutput);
// Write the size.
const int size = message.ByteSize();
output.WriteVarint32(size);
uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
if (buffer != NULL) {
// Optimization: The message fits in one buffer, so use the faster
// direct-to-array serialization path.
message.SerializeWithCachedSizesToArray(buffer);
} else {
// Slightly-slower path when the message is multiple buffers.
message.SerializeWithCachedSizes(&output);
if (output.HadError()) return false;
}
return true;
}
bool readDelimitedFrom(
google::protobuf::io::ZeroCopyInputStream* rawInput,
google::protobuf::MessageLite* message) {
// We create a new coded stream for each message. Don't worry, this is fast,
// and it makes sure the 64MB total size limit is imposed per-message rather
// than on the whole stream. (See the CodedInputStream interface for more
// info on this limit.)
google::protobuf::io::CodedInputStream input(rawInput);
// Read the size.
uint32_t size;
if (!input.ReadVarint32(&size)) return false;
// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
input.PushLimit(size);
// Parse the message.
if (!message->MergeFromCodedStream(&input)) return false;
if (!input.ConsumedEntireMessage()) return false;
// Release the limit.
input.PopLimit(limit);
return true;
}
Okay, so I haven't been able to find top-level C++ functions implementing what I need, but some spelunking through the Java API reference turned up the following, inside the MessageLite interface:
void writeDelimitedTo(OutputStream output)
/* Like writeTo(OutputStream), but writes the size of
the message as a varint before writing the data. */
So the Java size prefix is a (Protocol Buffers) varint!
Armed with that information, I went digging through the C++ API and found the CodedStream header, which has these:
bool CodedInputStream::ReadVarint32(uint32 * value)
void CodedOutputStream::WriteVarint32(uint32 value)
Using those, I should be able to roll my own C++ functions that do the job.
They should really add this to the main Message API though; it's missing functionality considering Java has it, and so does Marc Gravell's excellent protobuf-net C# port (via SerializeWithLengthPrefix and DeserializeWithLengthPrefix).
I solved the same problem using CodedOutputStream/ArrayOutputStream to write the message (with the size) and CodedInputStream/ArrayInputStream to read the message (with the size).
For example, the following pseudo-code writes the message size following by the message:
const unsigned bufLength = 256;
unsigned char buffer[bufLength];
Message protoMessage;
google::protobuf::io::ArrayOutputStream arrayOutput(buffer, bufLength);
google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);
codedOutput.WriteLittleEndian32(protoMessage.ByteSize());
protoMessage.SerializeToCodedStream(&codedOutput);
When writing you should also check that your buffer is large enough to fit the message (including the size). And when reading, you should check that your buffer contains a whole message (including the size).
It definitely would be handy if they added convenience methods to C++ API similar to those provided by the Java API.
IsteamInputStream is very fragile to eofs and other errors that easily occurs when used together with std::istream. After this the protobuf streams are permamently damaged and any already used buffer data is destroyed. There are proper support for reading from traditional streams in protobuf.
Implement google::protobuf::io::CopyingInputStream and use that together with CopyingInputStreamAdapter. Do the same for the output variants.
In practice a parsing call ends up in google::protobuf::io::CopyingInputStream::Read(void* buffer, int size) where a buffer is given. The only thing left to do is read into it somehow.
Here's an example for use with Asio synchronized streams (SyncReadStream/SyncWriteStream):
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
using namespace google::protobuf::io;
template <typename SyncReadStream>
class AsioInputStream : public CopyingInputStream {
public:
AsioInputStream(SyncReadStream& sock);
int Read(void* buffer, int size);
private:
SyncReadStream& m_Socket;
};
template <typename SyncReadStream>
AsioInputStream<SyncReadStream>::AsioInputStream(SyncReadStream& sock) :
m_Socket(sock) {}
template <typename SyncReadStream>
int
AsioInputStream<SyncReadStream>::Read(void* buffer, int size)
{
std::size_t bytes_read;
boost::system::error_code ec;
bytes_read = m_Socket.read_some(boost::asio::buffer(buffer, size), ec);
if(!ec) {
return bytes_read;
} else if (ec == boost::asio::error::eof) {
return 0;
} else {
return -1;
}
}
template <typename SyncWriteStream>
class AsioOutputStream : public CopyingOutputStream {
public:
AsioOutputStream(SyncWriteStream& sock);
bool Write(const void* buffer, int size);
private:
SyncWriteStream& m_Socket;
};
template <typename SyncWriteStream>
AsioOutputStream<SyncWriteStream>::AsioOutputStream(SyncWriteStream& sock) :
m_Socket(sock) {}
template <typename SyncWriteStream>
bool
AsioOutputStream<SyncWriteStream>::Write(const void* buffer, int size)
{
boost::system::error_code ec;
m_Socket.write_some(boost::asio::buffer(buffer, size), ec);
return !ec;
}
Usage:
AsioInputStream<boost::asio::ip::tcp::socket> ais(m_Socket); // Where m_Socket is a instance of boost::asio::ip::tcp::socket
CopyingInputStreamAdaptor cis_adp(&ais);
CodedInputStream cis(&cis_adp);
Message protoMessage;
uint32_t msg_size;
/* Read message size */
if(!cis.ReadVarint32(&msg_size)) {
// Handle error
}
/* Make sure not to read beyond limit of message */
CodedInputStream::Limit msg_limit = cis.PushLimit(msg_size);
if(!msg.ParseFromCodedStream(&cis)) {
// Handle error
}
/* Remove limit */
cis.PopLimit(msg_limit);
Here you go:
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/io/coded_stream.h>
using namespace google::protobuf::io;
class FASWriter
{
std::ofstream mFs;
OstreamOutputStream *_OstreamOutputStream;
CodedOutputStream *_CodedOutputStream;
public:
FASWriter(const std::string &file) : mFs(file,std::ios::out | std::ios::binary)
{
assert(mFs.good());
_OstreamOutputStream = new OstreamOutputStream(&mFs);
_CodedOutputStream = new CodedOutputStream(_OstreamOutputStream);
}
inline void operator()(const ::google::protobuf::Message &msg)
{
_CodedOutputStream->WriteVarint32(msg.ByteSize());
if ( !msg.SerializeToCodedStream(_CodedOutputStream) )
std::cout << "SerializeToCodedStream error " << std::endl;
}
~FASWriter()
{
delete _CodedOutputStream;
delete _OstreamOutputStream;
mFs.close();
}
};
class FASReader
{
std::ifstream mFs;
IstreamInputStream *_IstreamInputStream;
CodedInputStream *_CodedInputStream;
public:
FASReader(const std::string &file), mFs(file,std::ios::in | std::ios::binary)
{
assert(mFs.good());
_IstreamInputStream = new IstreamInputStream(&mFs);
_CodedInputStream = new CodedInputStream(_IstreamInputStream);
}
template<class T>
bool ReadNext()
{
T msg;
unsigned __int32 size;
bool ret;
if ( ret = _CodedInputStream->ReadVarint32(&size) )
{
CodedInputStream::Limit msgLimit = _CodedInputStream->PushLimit(size);
if ( ret = msg.ParseFromCodedStream(_CodedInputStream) )
{
_CodedInputStream->PopLimit(msgLimit);
std::cout << mFeed << " FASReader ReadNext: " << msg.DebugString() << std::endl;
}
}
return ret;
}
~FASReader()
{
delete _CodedInputStream;
delete _IstreamInputStream;
mFs.close();
}
};
I ran into the same issue in both C++ and Python.
For the C++ version, I used a mix of the code Kenton Varda posted on this thread and the code from the pull request he sent to the protobuf team (because the version posted here doesn't handle EOF while the one he sent to github does).
#include <google/protobuf/message_lite.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/io/coded_stream.h>
bool writeDelimitedTo(const google::protobuf::MessageLite& message,
google::protobuf::io::ZeroCopyOutputStream* rawOutput)
{
// We create a new coded stream for each message. Don't worry, this is fast.
google::protobuf::io::CodedOutputStream output(rawOutput);
// Write the size.
const int size = message.ByteSize();
output.WriteVarint32(size);
uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
if (buffer != NULL)
{
// Optimization: The message fits in one buffer, so use the faster
// direct-to-array serialization path.
message.SerializeWithCachedSizesToArray(buffer);
}
else
{
// Slightly-slower path when the message is multiple buffers.
message.SerializeWithCachedSizes(&output);
if (output.HadError())
return false;
}
return true;
}
bool readDelimitedFrom(google::protobuf::io::ZeroCopyInputStream* rawInput, google::protobuf::MessageLite* message, bool* clean_eof)
{
// We create a new coded stream for each message. Don't worry, this is fast,
// and it makes sure the 64MB total size limit is imposed per-message rather
// than on the whole stream. (See the CodedInputStream interface for more
// info on this limit.)
google::protobuf::io::CodedInputStream input(rawInput);
const int start = input.CurrentPosition();
if (clean_eof)
*clean_eof = false;
// Read the size.
uint32_t size;
if (!input.ReadVarint32(&size))
{
if (clean_eof)
*clean_eof = input.CurrentPosition() == start;
return false;
}
// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit = input.PushLimit(size);
// Parse the message.
if (!message->MergeFromCodedStream(&input)) return false;
if (!input.ConsumedEntireMessage()) return false;
// Release the limit.
input.PopLimit(limit);
return true;
}
And here is my python2 implementation:
from google.protobuf.internal import encoder
from google.protobuf.internal import decoder
#I had to implement this because the tools in google.protobuf.internal.decoder
#read from a buffer, not from a file-like objcet
def readRawVarint32(stream):
mask = 0x80 # (1 << 7)
raw_varint32 = []
while 1:
b = stream.read(1)
#eof
if b == "":
break
raw_varint32.append(b)
if not (ord(b) & mask):
#we found a byte starting with a 0, which means it's the last byte of this varint
break
return raw_varint32
def writeDelimitedTo(message, stream):
message_str = message.SerializeToString()
delimiter = encoder._VarintBytes(len(message_str))
stream.write(delimiter + message_str)
def readDelimitedFrom(MessageType, stream):
raw_varint32 = readRawVarint32(stream)
message = None
if raw_varint32:
size, _ = decoder._DecodeVarint32(raw_varint32, 0)
data = stream.read(size)
if len(data) < size:
raise Exception("Unexpected end of file")
message = MessageType()
message.ParseFromString(data)
return message
#In place version that takes an already built protobuf object
#In my tests, this is around 20% faster than the other version
#of readDelimitedFrom()
def readDelimitedFrom_inplace(message, stream):
raw_varint32 = readRawVarint32(stream)
if raw_varint32:
size, _ = decoder._DecodeVarint32(raw_varint32, 0)
data = stream.read(size)
if len(data) < size:
raise Exception("Unexpected end of file")
message.ParseFromString(data)
return message
else:
return None
It might not be the best looking code and I'm sure it can be refactored a fair bit, but at least that should show you one way to do it.
Now the big problem: It's SLOW.
Even when using the C++ implementation of python-protobuf, it's one order of magnitude slower than in pure C++. I have a benchmark where I read 10M protobuf messages of ~30 bytes each from a file. It takes ~0.9s in C++, and 35s in python.
One way to make it a bit faster would be to re-implement the varint decoder to make it read from a file and decode in one go, instead of reading from a file and then decoding as this code currently does. (profiling shows that a significant amount of time is spent in the varint encoder/decoder). But needless to say that alone is not enough to close the gap between the python version and the C++ version.
Any idea to make it faster is very welcome :)
Just for completeness, I post here an up-to-date version that works with the master version of protobuf and Python3
For the C++ version it is sufficient to use the utils in delimited_message_utils.h, here a MWE
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/util/delimited_message_util.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
template <typename T>
bool writeManyToFile(std::deque<T> messages, std::string filename) {
int outfd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC);
google::protobuf::io::FileOutputStream fout(outfd);
bool success;
for (auto msg: messages) {
success = google::protobuf::util::SerializeDelimitedToZeroCopyStream(
msg, &fout);
if (! success) {
std::cout << "Writing Failed" << std::endl;
break;
}
}
fout.Close();
close(outfd);
return success;
}
template <typename T>
std::deque<T> readManyFromFile(std::string filename) {
int infd = open(filename.c_str(), O_RDONLY);
google::protobuf::io::FileInputStream fin(infd);
bool keep = true;
bool clean_eof = true;
std::deque<T> out;
while (keep) {
T msg;
keep = google::protobuf::util::ParseDelimitedFromZeroCopyStream(
&msg, &fin, nullptr);
if (keep)
out.push_back(msg);
}
fin.Close();
close(infd);
return out;
}
For the Python3 version, building on #fireboot 's answer, the only thing thing that needed modification is the decoding of raw_varint32
def getSize(raw_varint32):
result = 0
shift = 0
b = six.indexbytes(raw_varint32, 0)
result |= ((ord(b) & 0x7f) << shift)
return result
def readDelimitedFrom(MessageType, stream):
raw_varint32 = readRawVarint32(stream)
message = None
if raw_varint32:
size = getSize(raw_varint32)
data = stream.read(size)
if len(data) < size:
raise Exception("Unexpected end of file")
message = MessageType()
message.ParseFromString(data)
return message
Was also looking for a solution for this. Here's the core of our solution, assuming some java code wrote many MyRecord messages with writeDelimitedTo into a file. Open the file and loop, doing:
if(someCodedInputStream->ReadVarint32(&bytes)) {
CodedInputStream::Limit msgLimit = someCodedInputStream->PushLimit(bytes);
if(myRecord->ParseFromCodedStream(someCodedInputStream)) {
//do your stuff with the parsed MyRecord instance
} else {
//handle parse error
}
someCodedInputStream->PopLimit(msgLimit);
} else {
//maybe end of file
}
Hope it helps.
Working with an objective-c version of protocol-buffers, I ran into this exact issue. On sending from the iOS client to a Java based server that uses parseDelimitedFrom, which expects the length as the first byte, I needed to call writeRawByte to the CodedOutputStream first. Posting here to hopegully help others that run into this issue. While working through this issue, one would think that Google proto-bufs would come with a simply flag which does this for you...
Request* request = [rBuild build];
[self sendMessage:request];
}
- (void) sendMessage:(Request *) request {
//** get length
NSData* n = [request data];
uint8_t len = [n length];
PBCodedOutputStream* os = [PBCodedOutputStream streamWithOutputStream:outputStream];
//** prepend it to message, such that Request.parseDelimitedFrom(in) can parse it properly
[os writeRawByte:len];
[request writeToCodedOutputStream:os];
[os flush];
}
Since I'm not allowed to write this as a comment to Kenton Varda's answer above; I believe there is a bug in the code he posted (as well as in other answers which have been provided). The following code:
...
google::protobuf::io::CodedInputStream input(rawInput);
// Read the size.
uint32_t size;
if (!input.ReadVarint32(&size)) return false;
// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
input.PushLimit(size);
...
sets an incorrect limit because it does not take into account the size of the varint32 which has already been read from input. This can result in data loss/corruption as additional bytes are read from the stream which may be part of the next message. The usual way of handling this correctly is to delete the CodedInputStream used to read the size and create a new one for reading the payload:
...
uint32_t size;
{
google::protobuf::io::CodedInputStream input(rawInput);
// Read the size.
if (!input.ReadVarint32(&size)) return false;
}
google::protobuf::io::CodedInputStream input(rawInput);
// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
input.PushLimit(size);
...
You can use getline for reading a string from a stream, using the specified delimiter:
istream& getline ( istream& is, string& str, char delim );
(defined in the header)

Libzip - read file contents from zip

I using libzip to work with zip files and everything goes fine, until i need to read file from zip
I need to read just a whole text files, so it will be great to achieve something like PHP "file_get_contents" function.
To read file from zip there is a function "int
zip_fread(struct zip_file *file, void *buf, zip_uint64_t nbytes)".
Main problem what i don't know what size of buf must be and how many nbytes i must read (well i need to read whole file, but files have different size). I can just do a big buffer to fit them all and read all it's size, or do a while loop until fread return -1 but i don't think it's rational option.
You can try using zip_stat to get file size.
http://linux.die.net/man/3/zip_stat
I haven't used the libzip interface but from what you write it seems to look very similar to a file interface: once you got a handle to the stream you keep calling zip_fread() until this function return an error (ir, possibly, less than requested bytes). The buffer you pass in us just a reasonably size temporary buffer where the data is communicated.
Personally I would probably create a stream buffer for this so once the file in the zip archive is set up it can be read using the conventional I/O stream methods. This would look something like this:
struct zipbuf: std::streambuf {
zipbuf(???): file_(???) {}
private:
zip_file* file_;
enum { s_size = 8196 };
char buffer_[s_size];
int underflow() {
int rc(zip_fread(this->file_, this->buffer_, s_size));
this->setg(this->buffer_, this->buffer_,
this->buffer_ + std::max(0, rc));
return this->gptr() == this->egptr()
? traits_type::eof()
: traits_type::to_int_type(*this->gptr());
}
};
With this stream buffer you should be able to create an std::istream and read the file into whatever structure you need:
zipbuf buf(???);
std::istream in(&buf);
...
Obviously, this code isn't tested or compiled. However, when you replace the ??? with whatever is needed to open the zip file, I'd think this should pretty much work.
Here is a routine I wrote that extracts data from a zip-stream and prints out a line at a time. This uses zlib, not libzip, but if this code is useful to you, feel free to use it:
#
# compile with -lz option in order to link in the zlib library
#
#include <zlib.h>
#define Z_CHUNK 2097152
int unzipFile(const char *fName)
{
z_stream zStream;
char *zRemainderBuf = malloc(1);
unsigned char zInBuf[Z_CHUNK];
unsigned char zOutBuf[Z_CHUNK];
char zLineBuf[Z_CHUNK];
unsigned int zHave, zBufIdx, zBufOffset, zOutBufIdx;
int zError;
FILE *inFp = fopen(fName, "rbR");
if (!inFp) { fprintf(stderr, "could not open file: %s\n", fName); return EXIT_FAILURE; }
zStream.zalloc = Z_NULL;
zStream.zfree = Z_NULL;
zStream.opaque = Z_NULL;
zStream.avail_in = 0;
zStream.next_in = Z_NULL;
zError = inflateInit2(&zStream, (15+32)); /* cf. http://www.zlib.net/manual.html */
if (zError != Z_OK) { fprintf(stderr, "could not initialize z-stream\n"); return EXIT_FAILURE; }
*zRemainderBuf = '\0';
do {
zStream.avail_in = fread(zInBuf, 1, Z_CHUNK, inFp);
if (zStream.avail_in == 0)
break;
zStream.next_in = zInBuf;
do {
zStream.avail_out = Z_CHUNK;
zStream.next_out = zOutBuf;
zError = inflate(&zStream, Z_NO_FLUSH);
switch (zError) {
case Z_NEED_DICT: { fprintf(stderr, "Z-stream needs dictionary!\n"); return EXIT_FAILURE; }
case Z_DATA_ERROR: { fprintf(stderr, "Z-stream suffered data error!\n"); return EXIT_FAILURE; }
case Z_MEM_ERROR: { fprintf(stderr, "Z-stream suffered memory error!\n"); return EXIT_FAILURE; }
}
zHave = Z_CHUNK - zStream.avail_out;
zOutBuf[zHave] = '\0';
/* copy remainder buffer onto line buffer, if not NULL */
if (zRemainderBuf) {
strncpy(zLineBuf, zRemainderBuf, strlen(zRemainderBuf));
zBufOffset = strlen(zRemainderBuf);
}
else
zBufOffset = 0;
/* read through zOutBuf for newlines */
for (zBufIdx = zBufOffset, zOutBufIdx = 0; zOutBufIdx < zHave; zBufIdx++, zOutBufIdx++) {
zLineBuf[zBufIdx] = zOutBuf[zOutBufIdx];
if (zLineBuf[zBufIdx] == '\n') {
zLineBuf[zBufIdx] = '\0';
zBufIdx = -1;
fprintf(stdout, "%s\n", zLineBuf);
}
}
/* copy some of line buffer onto the remainder buffer, if there are remnants from the z-stream */
if (strlen(zLineBuf) > 0) {
if (strlen(zLineBuf) > strlen(zRemainderBuf)) {
/* to minimize the chance of doing another (expensive) malloc, we double the length of zRemainderBuf */
free(zRemainderBuf);
zRemainderBuf = malloc(strlen(zLineBuf) * 2);
}
strncpy(zRemainderBuf, zLineBuf, zBufIdx);
zRemainderBuf[zBufIdx] = '\0';
}
} while (zStream.avail_out == 0);
} while (zError != Z_STREAM_END);
/* close gzip stream */
zError = inflateEnd(&zStream);
if (zError != Z_OK) {
fprintf(stderr, "could not close z-stream!\n");
return EXIT_FAILURE;
}
if (zRemainderBuf)
free(zRemainderBuf);
fclose(inFp);
return EXIT_SUCCESS;
}
With any streaming you should consider the memory requirements of your app.
A good buffer size is large, but you do not want to have too much memory in use depending on your RAM usage requirements. A small buffer size will require you call your read and write operations more times which are expensive in terms of time performance. So, you need to find a buffer in the middle of those two extremes.
Typically I use a size of 4096 (4KB) which is sufficiently large for many purposes. If you want, you can go larger. But at the worst case size of 1 byte, you will be waiting a long time for you read to complete.
So to answer your question, there is no "right" size to pick. It is a choice you should make so that the speed of your app and the memory it requires are what you need.

Why am i getting not great compression with zlib?

By default i output a file that is 120mb. Here i have a input and output buffer thats double that. When i run this code i get an output of 10mb (default gives me 11mb). When i zip the raw 128mb file i get 700kb. Why am i getting 11mb instead of <1mb like zip gives me? Using 7-zip manager i asked it to compress with gzip using deflate and it give me a 4.6mb file which is still much smaller. I'm very curious why this is happening. It feels like i am doing something wrong.
static UInt32 len=0;
static char buf[1024*1024*256];
static char buf2[1024*1024*256];
static char *curbuf=buf;
z_stream strm;
void initzstuff()
{
strm.zalloc = 0;
strm.zfree = 0;
strm.opaque = 0;
int ret = deflateInit(&strm, Z_BEST_COMPRESSION);
if (ret != Z_OK)
return;
}
void flush_file(MyOstream o, bool end){
strm.avail_in = len;
strm.next_in = (UInt8*)buf;
strm.avail_out = sizeof(buf2);
strm.next_out = (UInt8*)buf2;
int ret = deflate(&strm, (end ? Z_FINISH : Z_NO_FLUSH));
assert(ret != Z_STREAM_ERROR);
int have = sizeof(buf2) - strm.avail_out;
fwrite(buf2, 1, have, o);
if(end)
{
(void)deflateEnd(&strm);
}
len=0;
curbuf=buf;
/*
fwrite(buf, 1, len, o);
len=0;
curbuf=buf;
//*/
}
Zip can use Deflate64 or other compression algorithm (like BZip2), and when your file is very sparce that can result in such difference.
Also, standard for ZLib tells only about the format of compressed data, and how the data is compressed is chosen by archivators, so 7-zip can use some heuristics which makes the ouput smaller.
Probably chunk-size? zlib.net/zpipe.c gives a fairly good example.
You'll probably get better performance too if you chunk rather than try to do the entire stream.