How to pass a large amount of data (unknown size, minimum 10 GB) using gRPC in C++

I receive streaming data from a source. Its total size is not known until the final processing finishes, but it is at least 10 GB. I have to send this large amount of data using gRPC.
I should mention that the data will be passed through gRPC once the processing of the stream is done. Until then, I plan to store all the values in a vector.
To get an idea of how to send a large amount of data, I searched and found:
This post, which says not to pass large data using gRPC and suggests another messaging protocol instead. However, I am restricted to gRPC (at least for now).
From this post I tried to learn how chunked messages can be sent, but I am not sure whether it relates to my problem.
The first post, where I found a blog about streaming data using Go.
This one is a Python version of that post, but it is also incomplete.
The gRPC example could be a good start, but I cannot decode it due to my lack of C++ knowledge.
From here on, I have made a huge update to the question, but its main theme has not changed.
What I have done so far, and some points about my project (the GitHub repo is available here):
A unary RPC is present in the project.
I know that my new bidirectional RPC will take some time. I do not want the unary RPC to wait for the bidirectional RPC to complete. Right now I am thinking synchronously, where the unary RPC waits for the streaming one to complete before reporting its status.
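One way to keep the unary call from waiting is to run the streaming call on its own thread. The sketch below uses only the standard library: `run_streaming_call`, `run_unary_call`, `CallResults` and `run_both` are hypothetical stand-ins, not gRPC calls. In a real client each concurrent call would need its own `grpc::ClientContext`, while the channel and stub can be shared.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the long-running bidirectional streaming RPC.
long long run_streaming_call(int chunk_count) {
    assert(chunk_count >= 0);
    long long processed = 0;
    for (int i = 0; i < chunk_count; ++i) {
        // Pretend that sending one chunk takes a little while.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++processed;
    }
    return processed;
}

// Hypothetical stand-in for the short unary RPC.
std::string run_unary_call(const std::string& name) {
    return "address of " + name;
}

struct CallResults {
    std::string unary_reply;
    long long chunks_streamed;
};

// Start the streaming call on its own thread, issue the unary call right
// away, and only afterwards wait for the streaming result.
CallResults run_both(int chunk_count, const std::string& name) {
    std::future<long long> streaming =
        std::async(std::launch::async, run_streaming_call, chunk_count);
    std::string reply = run_unary_call(name);  // does not wait for the stream
    long long chunks = streaming.get();        // blocks until the stream is done
    return CallResults{std::move(reply), chunks};
}
```

The unary reply is available as soon as `run_unary_call` returns; only `streaming.get()` blocks on the long call.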
I am omitting the unnecessary lines of the C++ code, but giving the whole proto files.
// big_data.proto
syntax = "proto3";
package demo_grpc;
message Large_Data {
    repeated int32 large_data_collection = 1 [packed=true];
    int32 data_chunk_number = 2;
}
// addressbook.proto
syntax = "proto3";
package demo_grpc;
import "myproto/big_data.proto";
message S_Response {
    string name = 1;
    string street = 2;
    string zip = 3;
    string city = 4;
    string country = 5;
    int32 double_init_val = 6;
}
message C_Request {
    uint32 choose_area = 1;
    string name = 2;
    int32 init_val = 3;
}
service AddressBook {
    rpc GetAddress(C_Request) returns (S_Response) {}
    rpc Stream_Chunk_Service(stream Large_Data) returns (stream Large_Data) {}
}
#include <big_data.pb.h>
#include <addressbook.grpc.pb.h>
#include <grpcpp/grpcpp.h>
#include <grpcpp/create_channel.h>
#include <iostream>
#include <numeric>
using namespace std;
// This function prompts the user to set value for the required area
void Client_Request(demo_grpc::C_Request &request_)
// do processing for unary rpc. Intentionally avoided here
// According to Client Request this function display the value of protobuf message
void Server_Response(demo_grpc::C_Request &request_, const demo_grpc::S_Response &response_)
// do processing for unary rpc. Intentionally avoided here
// following function make large vector and then chunk to send via stream from client to server
void Stream_Data_Chunk_Request(demo_grpc::Large_Data &request_,
demo_grpc::Large_Data &response_,
uint64_t preferred_chunk_size_in_kibyte)
// A dummy vector which in real case will be the large data set's container
std::vector<int32_t> large_vector;
// iterate it now for 1024*10 times
for(int64_t i = 0; i < 1024 * 10; i++)
uint64_t preferred_chunk_size_in_kibyte_holds_integer_num = 0; // the number of integers that one chunk will contain is stored here
// total chunk number will be updated here
uint32_t total_chunk = total_chunk_counter(large_vector.size(), preferred_chunk_size_in_kibyte, preferred_chunk_size_in_kibyte_holds_integer_num);
// A temp counter to trace the index of the large_vector
int32_t temp_count = 0;
// loop will start if the total num of chunk is greater than 0. After each iteration total_chunk will be decremented
while(total_chunk > 0)
for (int64_t i = temp_count * preferred_chunk_size_in_kibyte_holds_integer_num; i < preferred_chunk_size_in_kibyte_holds_integer_num + temp_count * preferred_chunk_size_in_kibyte_holds_integer_num; i++)
// the repeated field large_data_collection is taking value from the large_vector
std::string ip_address = "localhost:50051";
auto channel = grpc::CreateChannel(ip_address, grpc::InsecureChannelCredentials());
std::unique_ptr<demo_grpc::AddressBook::Stub> stub = demo_grpc::AddressBook::NewStub(channel);
grpc::ClientContext context;
std::shared_ptr<::grpc::ClientReaderWriter< ::demo_grpc::Large_Data, ::demo_grpc::Large_Data> > stream(stub->Stream_Chunk_Service(&context));
// Once the size of each chunk is reached, this repeated field is cleared. I am not sure whether the
// value can be transferred to the server before this or not, but my assumption is that it should be.
int main(int argc, char* argv[])
std::string client_address = "localhost:50051";
std::cout << "Address of client: " << client_address << std::endl;
// The following part for the Unary RPC
demo_grpc::C_Request query;
demo_grpc::S_Response result;
// This part for the streaming chunk data (Bi directional Stream RPC)
demo_grpc::Large_Data stream_chunk_request_;
demo_grpc::Large_Data stream_chunk_response_;
uint64_t preferred_chunk_size_in_kibyte = 64;
Stream_Data_Chunk_Request(stream_chunk_request_, stream_chunk_response_, preferred_chunk_size_in_kibyte);
// Call
auto channel = grpc::CreateChannel(client_address, grpc::InsecureChannelCredentials());
std::unique_ptr<demo_grpc::AddressBook::Stub> stub = demo_grpc::AddressBook::NewStub(channel);
grpc::ClientContext context;
grpc::Status status = stub->GetAddress(&context, query, &result);
// the following status is for unary rpc as far I have understood the structure
if (status.ok())
Server_Response(query, result);
std::cout << status.error_message() << std::endl;
return 0;
helper function total_chunk_counter
#include <cmath>
#include <cstdint>
uint32_t total_chunk_counter(uint64_t num_of_container_content,
                             uint64_t preferred_chunk_size_in_kibyte,
                             uint64_t &preferred_chunk_size_in_kibyte_holds_integer_num)
{
    uint64_t container_size_in_kibyte = (32ULL * num_of_container_content) / 1024;
    preferred_chunk_size_in_kibyte_holds_integer_num = (num_of_container_content * preferred_chunk_size_in_kibyte) / container_size_in_kibyte;
    float total_chunk = static_cast<float>(num_of_container_content) / preferred_chunk_size_in_kibyte_holds_integer_num;
    return std::ceil(total_chunk);
}
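A note on the arithmetic in this helper: it multiplies by 32, which is the width of an int32_t in bits rather than in bytes, and it goes through float, which cannot represent every count exactly once the container holds more than 2^24 values. If the intent is a chunk size in KiB, the same numbers can be computed with integer arithmetic only. The sketch below is one way to do that; `values_per_chunk` and `chunk_count` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Number of int32_t values that fit in one chunk of the preferred size.
// Sizes are computed in bytes: 1 KiB = 1024 bytes, one int32_t = 4 bytes.
std::uint64_t values_per_chunk(std::uint64_t preferred_chunk_size_in_kibyte) {
    assert(preferred_chunk_size_in_kibyte > 0);
    return preferred_chunk_size_in_kibyte * 1024 / sizeof(std::int32_t);
}

// Total number of chunks needed for value_count values, using integer
// ceiling division so that no floating-point rounding is involved.
std::uint64_t chunk_count(std::uint64_t value_count,
                          std::uint64_t preferred_chunk_size_in_kibyte) {
    const std::uint64_t per_chunk = values_per_chunk(preferred_chunk_size_in_kibyte);
    return (value_count + per_chunk - 1) / per_chunk;
}
```

With a preferred chunk size of 64 KiB, one chunk holds 16384 values, so the 10240-element test vector fits in a single chunk.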
server.cpp which is totally incomplete
#include <myproto/big_data.pb.h>
#include <myproto/addressbook.grpc.pb.h>
#include <grpcpp/grpcpp.h>
#include <grpcpp/server_builder.h>
#include <iostream>
class AddressBookService final : public demo_grpc::AddressBook::Service {
    public:
        virtual ::grpc::Status GetAddress(::grpc::ServerContext* context, const ::demo_grpc::C_Request* request, ::demo_grpc::S_Response* response)
        {
            switch (request->choose_area())
            {
                // do processing for unary rpc. Intentionally avoided here
            }
            std::cout << "Information of " << request->choose_area() << " is sent to Client" << std::endl;
            return grpc::Status::OK;
        }
        // Bi-directional streaming chunk data
        virtual ::grpc::Status Stream_Chunk_Service(::grpc::ServerContext* context, ::grpc::ServerReaderWriter< ::demo_grpc::Large_Data, ::demo_grpc::Large_Data>* stream)
        {
            // stream->Large_Data;
            return grpc::Status::OK;
        }
};
void RunServer()
{
    std::cout << "grpc Version: " << grpc::Version() << std::endl;
    std::string server_address = "localhost:50051";
    std::cout << "Address of server: " << server_address << std::endl;
    grpc::ServerBuilder builder;
    builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
    AddressBookService my_service;
    // Register the service, then block until the server is shut down.
    builder.RegisterService(&my_service);
    std::unique_ptr<grpc::Server> server(builder.BuildAndStart());
    server->Wait();
}
int main(int argc, char* argv[])
{
    RunServer();
    return 0;
}
In summary, what I want:
I need to pass the content of large_vector through the repeated field large_data_collection of the message Large_Data. I should split large_vector into chunks and populate large_data_collection with one chunk at a time.
On the server side, all chunks will be concatenated, keeping the exact order of large_vector. Some processing will be done on them (e.g. doubling the value at each index). Then the whole data will be sent back to the client as a chunk stream.
It would be great if the existing unary RPC did not wait for the bidirectional RPC to complete.
A solution with an example would be really helpful. Thanks in advance. The GitHub repo is available here.
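The chunking and ordered reassembly described in the summary can be sketched without gRPC. In the code below, `Chunk` is a hypothetical plain-C++ mirror of the Large_Data message, and `split_into_chunks` and `reassemble_and_double` are hypothetical helper names. It is only an illustration of the bookkeeping, under the assumption that each chunk carries its index in data_chunk_number.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical plain-C++ mirror of the Large_Data message.
struct Chunk {
    std::int32_t data_chunk_number;
    std::vector<std::int32_t> large_data_collection;
};

// Client side: cut the large vector into consecutively numbered chunks.
std::vector<Chunk> split_into_chunks(const std::vector<std::int32_t>& data,
                                     std::size_t values_per_chunk) {
    assert(values_per_chunk > 0);
    std::vector<Chunk> chunks;
    for (std::size_t begin = 0; begin < data.size(); begin += values_per_chunk) {
        const std::size_t end = std::min(begin + values_per_chunk, data.size());
        Chunk c;
        c.data_chunk_number = static_cast<std::int32_t>(chunks.size());
        c.large_data_collection.assign(
            data.begin() + static_cast<std::ptrdiff_t>(begin),
            data.begin() + static_cast<std::ptrdiff_t>(end));
        chunks.push_back(std::move(c));
    }
    return chunks;
}

// Server side: put the chunks back in chunk-number order and double each value.
std::vector<std::int32_t> reassemble_and_double(const std::vector<Chunk>& received) {
    std::map<std::int32_t, const Chunk*> ordered;  // sorted by chunk number
    for (const Chunk& c : received) {
        ordered[c.data_chunk_number] = &c;
    }
    std::vector<std::int32_t> out;
    for (const auto& entry : ordered) {
        for (std::int32_t v : entry.second->large_data_collection) {
            out.push_back(v * 2);
        }
    }
    return out;
}
```

In a real client it would be cheaper to fill one Large_Data message, write it to the stream, clear it and reuse it, rather than materialising every chunk up front. Messages on a single gRPC stream arrive in the order they were written, so the chunk number mainly serves as a sanity check.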


Reason for losing messages over NNG sockets in raw mode

Some context to my problem:
I need to establish inter-process communication using C++ and sockets, and I picked the NNG library for that, along with the nngpp C++ wrapper. I need to use the push/pull protocol, so no context handling is available to me. I wrote some code based on the raw example from the nngpp demo. The difference here is that, because I use the push/pull protocol, I split this into two separate programs: one for sending and one for receiving.
Problem description:
I need to receive, let's say, a thousand or more messages per second. For now, all messages are captured only when I send about 50 per second. That is way too slow, and I believe it can be done faster. The faster I send, the more I lose. At the moment, when sending 1000 msg/s, I lose about 150 messages.
Some words about the code:
The code may use the C++17 standard. It is written in an object-oriented manner, so in the end I want to have a class with a "receive" method that simply gives me the received messages. For now, I just print the results on screen. Below, I supply some parts of the project with descriptions:
NOTE: msgItem is a struct like this:
struct msgItem {
    nng::aio aio;
    nng::msg msg;
    nng::socket_view itemSock;
    explicit msgItem(nng::socket_view sock) : itemSock(sock) {}
};
It is taken from the example mentioned above.
The callback function is executed when a message is received by one of the aios (the callback is passed in the constructor of the aio object). It checks whether the transmission succeeded, retrieves my Payload (just a string for now), and passes it to a queue while setting a flag. I then want to print those messages from the queue using a separate thread.
void ReceiverBase<Payload>::aioCallback(void *arg) try {
    msgItem *msgItem = (struct msgItem *)arg;
    Payload retMsg{};
    auto result = msgItem->aio.result();
    if (result != nng::error::success) {
        throw nng::exception(result);
    }
    //Here we extract the message
    auto msg = msgItem->aio.release_msg();
    auto const *data = static_cast<typename Payload::value_type *>(msg.body().data());
    auto const count = msg.body().size()/sizeof(typename Payload::value_type);
    std::copy(data, data + count, std::back_inserter(retMsg));
    std::lock_guard<std::mutex> lk(m_msgMx);
    newMessageFlag = true;
} catch (const nng::exception &e) {
    fprintf(stderr, "server_cb: %s: %s\n", e.who(), e.what());
} catch (...) {
    fprintf(stderr, "server_cb: unknown exception\n");
}
A separate thread listens for the flag change and prints. The while loop at the end keeps the program running. I use msgCounter to count successfully received messages.
void ReceiverBase<Payload>::start() {
auto listenerLambda = [](){
std::string temp;
while (true) {
std::lock_guard<std::mutex> lg(m_msgMx);
if(newMessageFlag) {
temp = std::move(m_messageQueue.front());
std::cout << msgCounter << "\n";
newMessageFlag = false;
std::thread listenerThread (listenerLambda);
while (true) {
This is my sender application. I tweak the frequency of message sending by changing the value in std::chrono::milliseconds(val).
int main (int argc, char *argv[])
std::string connection_address{"ipc:///tmp/async_demo1"};
std::string longMsg{" here normally I have some long test text"};
std::cout << "Trying connecting sender:";
StringSender sender(connection_address);
for (int i=0; i<1000; ++i) {
And this is receiver:
int main (int argc, char *argv[])
std::string connection_address{"ipc:///tmp/async_demo1"};
std::cout << "Trying connecting receiver:";
StringReceiver receiver(connection_address);
std::cout<< "Connection set up. \n";
return 0;
Nothing special in those two applications, as you can see. The setup method of StringReceiver looks something like this:
bool ReceiverBase<Payload>::setupConnection() {
m_connected = false;
try {
for (size_t i = 0; i < m_parallel; ++i) { = std::make_unique<msgItem>(m_sock);>aio =
m_connected = true;
for (size_t i = 0; i < m_parallel; ++i) {>itemSock.recv(>aio);
} catch (const nng::exception &e) {
printf("%s: %s\n", e.who(), e.what());
return m_connected;
Do you have any suggestions as to why the performance is so low? Do I use lock_guards properly here? What I want them to do is basically lock the flag and the queue so that only one side has access to them at a time.
NOTE: Adding more listener threads does not affect the performance either way.
NOTE 2: newMessageFlag is atomic.
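One thing visible in the posted code is that the listener holds the mutex while it spins on the flag and handles at most one queued message each time the flag is set, so a burst of callbacks can outrun it. A possible alternative, sketched below under the assumption that the payload is a std::string, is to let the queue itself wake the consumer through a std::condition_variable. `MessageQueue` and its methods are hypothetical names, not part of NNG or nngpp.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical replacement for the flag-plus-queue pair: the queue is the
// only shared state, and "is it non-empty" replaces the boolean flag.
class MessageQueue {
public:
    // Called by the producer, e.g. from the aio callback.
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(msg));
        }
        m_ready.notify_one();  // wake one waiting consumer
    }

    // Called by the consumer; sleeps (without spinning) until a message exists.
    std::string pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_queue.empty(); });
        assert(!m_queue.empty());
        std::string msg = std::move(m_queue.front());
        m_queue.pop();
        return msg;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_queue.size();
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::string> m_queue;
};
```

The callback would call push() and the listener thread would loop on pop(), so every queued message is consumed regardless of how many arrived between wake-ups. Whether this accounts for all of the missing messages is something only a measurement can tell.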

serialize complex C++ structures between client and server

I am writing C++ ZeroMQ client and server programs for the same platform. I need to trigger some functions with arguments on the server. The arguments are complex structures. I have just started to try this out. I am trying to fill a structure and copy it into a char* buffer to see whether the bytes are laid out in sequence as per the structure.
But when I try to print the buffer, it prints garbage. Please advise what might be wrong. And is this an elegant way to do this? I cannot use gRPC or Protobuf as the message contains complex structures.
struct employee {
    uint8_t byt;
    int arr[10] = {0};
    int number;
    uint32_t acct;
};
int main ()
// Prepare our context and socket
zmq::context_t context (1);
zmq::socket_t socket (context, ZMQ_PAIR);
struct employee *e = new employee;
e->byt = 0xff;
e->arr[0] = 15;
e->number = 25555;
e->acct = 45;
std::cout << "Connecting to hello world server…" << std::endl;
socket.connect ("tcp://localhost:5555");
char *temp = (char*)malloc(sizeof(employee));
zmq::message_t request(sizeof(employee));
char *temp1 = temp;
for (int i = 0;i<sizeof(employee);i++) {
memcpy ((void *)request.data(), (void*)temp, sizeof(employee));
socket.send (request);
// Get the reply.
zmq::message_t reply;
socket.recv (&reply);
return 0;
Two points I would like to share here.
The buffer (temp) contains the binary representation of your data structure. If you want to check whether the content is meaningful, you can cast the pointer back to its original type:
struct employee * employeePtr = reinterpret_cast< struct employee *>(temp);
std::cout << employeePtr->number;
The way you serialize is okay when the object you are trying to
serialize occupies contiguous memory. When that is not the case, you will have to handle it some other way (using a stream, for example). Examples of such cases include:
when you have a pointer, shared_ptr, etc. to some allocated memory
container classes
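To make both points concrete, here is a minimal sketch that copies the struct into a byte buffer and back, and renders the bytes as hex rather than as characters (raw bytes such as 0xff are not printable, which is one reason the output looks like garbage). The helper names `to_bytes`, `from_bytes` and `to_hex` are hypothetical, and the approach assumes sender and receiver share the same platform and struct layout, as stated in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

struct employee {
    std::uint8_t byt;
    int arr[10];
    int number;
    std::uint32_t acct;
};

// memcpy-based serialization is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<employee>::value,
              "employee must be trivially copyable");

// Copy the object representation (including padding) into a byte buffer.
std::vector<unsigned char> to_bytes(const employee& e) {
    std::vector<unsigned char> buffer(sizeof(employee));
    std::memcpy(buffer.data(), &e, sizeof(employee));
    return buffer;
}

// Rebuild the struct from a buffer produced by to_bytes.
employee from_bytes(const std::vector<unsigned char>& buffer) {
    assert(buffer.size() == sizeof(employee));
    employee e;
    std::memcpy(&e, buffer.data(), sizeof(employee));
    return e;
}

// Render each byte as two hex digits so the buffer can be inspected.
std::string to_hex(const std::vector<unsigned char>& buffer) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (unsigned char b : buffer) {
        out << std::setw(2) << static_cast<int>(b);
    }
    return out.str();
}
```

The buffer is sizeof(employee) bytes long, which is typically more than the sum of the member sizes because the compiler inserts padding after byt.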

c++ Protocol buffer sending over network [duplicate]

I'm trying to read / write multiple Protocol Buffers messages from files, in both C++ and Java. Google suggests writing length prefixes before the messages, but there's no way to do that by default (that I could see).
However, the Java API in version 2.1.0 received a set of "Delimited" I/O functions which apparently do that job:
Are there C++ equivalents? And if not, what's the wire format for the size prefixes the Java API attaches, so I can parse those messages in C++?
These now exist in google/protobuf/util/delimited_message_util.h as of v3.3.0.
I'm a bit late to the party here, but the below implementations include some optimizations missing from the other answers and will not fail after 64MB of input (though it still enforces the 64MB limit on each individual message, just not on the whole stream).
(I am the author of the C++ and Java protobuf libraries, but I no longer work for Google. Sorry that this code never made it into the official lib. This is what it would look like if it had.)
bool writeDelimitedTo(
    const google::protobuf::MessageLite& message,
    google::protobuf::io::ZeroCopyOutputStream* rawOutput) {
  // We create a new coded stream for each message. Don't worry, this is fast.
  google::protobuf::io::CodedOutputStream output(rawOutput);
  // Write the size.
  const int size = message.ByteSize();
  output.WriteVarint32(size);
  uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
  if (buffer != NULL) {
    // Optimization: The message fits in one buffer, so use the faster
    // direct-to-array serialization path.
    message.SerializeWithCachedSizesToArray(buffer);
  } else {
    // Slightly-slower path when the message is multiple buffers.
    message.SerializeWithCachedSizes(&output);
    if (output.HadError()) return false;
  }
  return true;
}
bool readDelimitedFrom(
    google::protobuf::io::ZeroCopyInputStream* rawInput,
    google::protobuf::MessageLite* message) {
  // We create a new coded stream for each message. Don't worry, this is fast,
  // and it makes sure the 64MB total size limit is imposed per-message rather
  // than on the whole stream. (See the CodedInputStream interface for more
  // info on this limit.)
  google::protobuf::io::CodedInputStream input(rawInput);
  // Read the size.
  uint32_t size;
  if (!input.ReadVarint32(&size)) return false;
  // Tell the stream not to read beyond that size.
  google::protobuf::io::CodedInputStream::Limit limit =
      input.PushLimit(size);
  // Parse the message.
  if (!message->MergeFromCodedStream(&input)) return false;
  if (!input.ConsumedEntireMessage()) return false;
  // Release the limit.
  input.PopLimit(limit);
  return true;
}
Okay, so I haven't been able to find top-level C++ functions implementing what I need, but some spelunking through the Java API reference turned up the following, inside the MessageLite interface:
void writeDelimitedTo(OutputStream output)
/* Like writeTo(OutputStream), but writes the size of
the message as a varint before writing the data. */
So the Java size prefix is a (Protocol Buffers) varint!
Armed with that information, I went digging through the C++ API and found the CodedStream header, which has these:
bool CodedInputStream::ReadVarint32(uint32 * value)
void CodedOutputStream::WriteVarint32(uint32 value)
Using those, I should be able to roll my own C++ functions that do the job.
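For reference, the varint encoding itself is small enough to sketch without the protobuf library: each byte carries 7 bits of the value, least significant group first, and the high bit is set on every byte except the last. The function names below are hypothetical; real code would use CodedOutputStream::WriteVarint32 and CodedInputStream::ReadVarint32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append value to out as a base-128 varint (at most 5 bytes for 32 bits).
void write_varint32(std::uint32_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        // Low 7 bits plus the continuation bit.
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode one varint starting at in[pos]. On success, stores the result in
// value, leaves pos just past the varint and returns true. Returns false if
// the input ends early or the varint is longer than 5 bytes.
bool read_varint32(const std::vector<std::uint8_t>& in, std::size_t& pos,
                   std::uint32_t& value) {
    std::uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= in.size()) return false;
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint32_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            value = result;
            return true;
        }
    }
    return false;
}

// Frame an already-serialized payload: varint length first, then the bytes.
void write_delimited(const std::vector<std::uint8_t>& payload,
                     std::vector<std::uint8_t>& out) {
    assert(payload.size() <= 0xFFFFFFFFu);
    write_varint32(static_cast<std::uint32_t>(payload.size()), out);
    out.insert(out.end(), payload.begin(), payload.end());
}
```

For example, the value 300 encodes to the two bytes 0xAC 0x02.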
They should really add this to the main Message API though; it's missing functionality considering Java has it, and so does Marc Gravell's excellent protobuf-net C# port (via SerializeWithLengthPrefix and DeserializeWithLengthPrefix).
I solved the same problem using CodedOutputStream/ArrayOutputStream to write the message (with the size) and CodedInputStream/ArrayInputStream to read the message (with the size).
For example, the following pseudo-code writes the message size followed by the message:
const unsigned bufLength = 256;
unsigned char buffer[bufLength];
Message protoMessage;
google::protobuf::io::ArrayOutputStream arrayOutput(buffer, bufLength);
google::protobuf::io::CodedOutputStream codedOutput(&arrayOutput);
When writing you should also check that your buffer is large enough to fit the message (including the size). And when reading, you should check that your buffer contains a whole message (including the size).
It definitely would be handy if they added convenience methods to C++ API similar to those provided by the Java API.
IstreamInputStream is very fragile with respect to EOFs and other errors that easily occur when it is used together with std::istream. After such an error the protobuf streams are permanently damaged and any already buffered data is destroyed. There is proper support for reading from traditional streams in protobuf.
Implement google::protobuf::io::CopyingInputStream and use that together with CopyingInputStreamAdaptor. Do the same for the output variants.
In practice a parsing call ends up in google::protobuf::io::CopyingInputStream::Read(void* buffer, int size) where a buffer is given. The only thing left to do is read into it somehow.
Here's an example for use with Asio synchronous streams (SyncReadStream/SyncWriteStream):
#include <boost/asio.hpp>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>
using namespace google::protobuf::io;
template <typename SyncReadStream>
class AsioInputStream : public CopyingInputStream {
    public:
        AsioInputStream(SyncReadStream& sock);
        int Read(void* buffer, int size);
    private:
        SyncReadStream& m_Socket;
};
template <typename SyncReadStream>
AsioInputStream<SyncReadStream>::AsioInputStream(SyncReadStream& sock) :
    m_Socket(sock) {}
template <typename SyncReadStream>
int
AsioInputStream<SyncReadStream>::Read(void* buffer, int size)
{
    std::size_t bytes_read;
    boost::system::error_code ec;
    bytes_read = m_Socket.read_some(boost::asio::buffer(buffer, size), ec);
    if(!ec) {
        return bytes_read;
    } else if (ec == boost::asio::error::eof) {
        return 0;
    } else {
        return -1;
    }
}
template <typename SyncWriteStream>
class AsioOutputStream : public CopyingOutputStream {
    public:
        AsioOutputStream(SyncWriteStream& sock);
        bool Write(const void* buffer, int size);
    private:
        SyncWriteStream& m_Socket;
};
template <typename SyncWriteStream>
AsioOutputStream<SyncWriteStream>::AsioOutputStream(SyncWriteStream& sock) :
    m_Socket(sock) {}
template <typename SyncWriteStream>
bool
AsioOutputStream<SyncWriteStream>::Write(const void* buffer, int size)
{
    boost::system::error_code ec;
    // Write() has to consume the whole buffer, so use boost::asio::write,
    // which loops until everything is sent, instead of a single write_some.
    boost::asio::write(m_Socket, boost::asio::buffer(buffer, size), ec);
    return !ec;
}
// Usage:
AsioInputStream<boost::asio::ip::tcp::socket> ais(m_Socket); // Where m_Socket is an instance of boost::asio::ip::tcp::socket
CopyingInputStreamAdaptor cis_adp(&ais);
CodedInputStream cis(&cis_adp);
Message protoMessage;
uint32_t msg_size;
/* Read message size */
if(!cis.ReadVarint32(&msg_size)) {
    // Handle error
}
/* Make sure not to read beyond limit of message */
CodedInputStream::Limit msg_limit = cis.PushLimit(msg_size);
if(!protoMessage.ParseFromCodedStream(&cis)) {
    // Handle error
}
/* Remove limit */
cis.PopLimit(msg_limit);
Here you go:
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/message.h>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
using namespace google::protobuf::io;
class FASWriter
{
    std::ofstream mFs;
    OstreamOutputStream *_OstreamOutputStream;
    CodedOutputStream *_CodedOutputStream;
public:
    FASWriter(const std::string &file) : mFs(file,std::ios::out | std::ios::binary)
    {
        _OstreamOutputStream = new OstreamOutputStream(&mFs);
        _CodedOutputStream = new CodedOutputStream(_OstreamOutputStream);
    }
    inline void operator()(const ::google::protobuf::Message &msg)
    {
        // Write the varint size prefix that FASReader expects, then the message.
        _CodedOutputStream->WriteVarint32(msg.ByteSize());
        if ( !msg.SerializeToCodedStream(_CodedOutputStream) )
            std::cout << "SerializeToCodedStream error " << std::endl;
    }
    ~FASWriter()
    {
        delete _CodedOutputStream;
        delete _OstreamOutputStream;
    }
};
class FASReader
{
    std::ifstream mFs;
    IstreamInputStream *_IstreamInputStream;
    CodedInputStream *_CodedInputStream;
public:
    FASReader(const std::string &file) : mFs(file,std::ios::in | std::ios::binary)
    {
        _IstreamInputStream = new IstreamInputStream(&mFs);
        _CodedInputStream = new CodedInputStream(_IstreamInputStream);
    }
    template<class T>
    bool ReadNext()
    {
        T msg;
        uint32_t size;
        bool ret;
        if ( (ret = _CodedInputStream->ReadVarint32(&size)) )
        {
            CodedInputStream::Limit msgLimit = _CodedInputStream->PushLimit(size);
            if ( (ret = msg.ParseFromCodedStream(_CodedInputStream)) )
            {
                _CodedInputStream->PopLimit(msgLimit);
                std::cout << " FASReader ReadNext: " << msg.DebugString() << std::endl;
            }
        }
        return ret;
    }
    ~FASReader()
    {
        delete _CodedInputStream;
        delete _IstreamInputStream;
    }
};
I ran into the same issue in both C++ and Python.
For the C++ version, I used a mix of the code Kenton Varda posted on this thread and the code from the pull request he sent to the protobuf team (because the version posted here doesn't handle EOF while the one he sent to github does).
#include <google/protobuf/message_lite.h>
#include <google/protobuf/io/zero_copy_stream.h>
#include <google/protobuf/io/coded_stream.h>
bool writeDelimitedTo(const google::protobuf::MessageLite& message,
    google::protobuf::io::ZeroCopyOutputStream* rawOutput)
{
    // We create a new coded stream for each message. Don't worry, this is fast.
    google::protobuf::io::CodedOutputStream output(rawOutput);
    // Write the size.
    const int size = message.ByteSize();
    output.WriteVarint32(size);
    uint8_t* buffer = output.GetDirectBufferForNBytesAndAdvance(size);
    if (buffer != NULL)
    {
        // Optimization: The message fits in one buffer, so use the faster
        // direct-to-array serialization path.
        message.SerializeWithCachedSizesToArray(buffer);
    }
    else
    {
        // Slightly-slower path when the message is multiple buffers.
        message.SerializeWithCachedSizes(&output);
        if (output.HadError())
            return false;
    }
    return true;
}
bool readDelimitedFrom(google::protobuf::io::ZeroCopyInputStream* rawInput, google::protobuf::MessageLite* message, bool* clean_eof)
{
    // We create a new coded stream for each message. Don't worry, this is fast,
    // and it makes sure the 64MB total size limit is imposed per-message rather
    // than on the whole stream. (See the CodedInputStream interface for more
    // info on this limit.)
    google::protobuf::io::CodedInputStream input(rawInput);
    const int start = input.CurrentPosition();
    if (clean_eof)
        *clean_eof = false;
    // Read the size.
    uint32_t size;
    if (!input.ReadVarint32(&size))
    {
        // Nothing consumed means the stream ended cleanly between messages.
        if (clean_eof)
            *clean_eof = input.CurrentPosition() == start;
        return false;
    }
    // Tell the stream not to read beyond that size.
    google::protobuf::io::CodedInputStream::Limit limit = input.PushLimit(size);
    // Parse the message.
    if (!message->MergeFromCodedStream(&input)) return false;
    if (!input.ConsumedEntireMessage()) return false;
    // Release the limit.
    input.PopLimit(limit);
    return true;
}
And here is my python2 implementation:
from google.protobuf.internal import encoder
from google.protobuf.internal import decoder

#I had to implement this because the tools in google.protobuf.internal.decoder
#read from a buffer, not from a file-like object
def readRawVarint32(stream):
    mask = 0x80 # (1 << 7)
    raw_varint32 = []
    while 1:
        b = stream.read(1)
        #eof
        if b == "":
            break
        raw_varint32.append(b)
        if not (ord(b) & mask):
            #we found a byte starting with a 0, which means it's the last byte of this varint
            break
    return raw_varint32

def writeDelimitedTo(message, stream):
    message_str = message.SerializeToString()
    delimiter = encoder._VarintBytes(len(message_str))
    stream.write(delimiter + message_str)

def readDelimitedFrom(MessageType, stream):
    raw_varint32 = readRawVarint32(stream)
    message = None
    if raw_varint32:
        size, _ = decoder._DecodeVarint32(raw_varint32, 0)
        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")
        message = MessageType()
        message.ParseFromString(data)
    return message

#In place version that takes an already built protobuf object
#In my tests, this is around 20% faster than the other version
#of readDelimitedFrom()
def readDelimitedFrom_inplace(message, stream):
    raw_varint32 = readRawVarint32(stream)
    if raw_varint32:
        size, _ = decoder._DecodeVarint32(raw_varint32, 0)
        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")
        message.ParseFromString(data)
        return message
    return None
It might not be the best looking code and I'm sure it can be refactored a fair bit, but at least that should show you one way to do it.
Now the big problem: It's SLOW.
Even when using the C++ implementation of python-protobuf, it's one order of magnitude slower than in pure C++. I have a benchmark where I read 10M protobuf messages of ~30 bytes each from a file. It takes ~0.9s in C++, and 35s in python.
One way to make it a bit faster would be to re-implement the varint decoder to make it read from a file and decode in one go, instead of reading from a file and then decoding as this code currently does. (profiling shows that a significant amount of time is spent in the varint encoder/decoder). But needless to say that alone is not enough to close the gap between the python version and the C++ version.
Any idea to make it faster is very welcome :)
Just for completeness, I post here an up-to-date version that works with the master version of protobuf and Python 3.
For the C++ version it is sufficient to use the utilities in delimited_message_util.h; here is an MWE:
#include <google/protobuf/io/zero_copy_stream_impl.h>
#include <google/protobuf/util/delimited_message_util.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <deque>
#include <iostream>
#include <string>
template <typename T>
bool writeManyToFile(std::deque<T> messages, std::string filename) {
    // O_CREAT requires a mode argument; 0644 is used here.
    int outfd = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    google::protobuf::io::FileOutputStream fout(outfd);
    bool success = true;
    for (auto msg: messages) {
        success = google::protobuf::util::SerializeDelimitedToZeroCopyStream(
            msg, &fout);
        if (! success) {
            std::cout << "Writing Failed" << std::endl;
            break;
        }
    }
    // Flushes buffered data and closes the file descriptor.
    fout.Close();
    return success;
}
template <typename T>
std::deque<T> readManyFromFile(std::string filename) {
    int infd = open(filename.c_str(), O_RDONLY);
    google::protobuf::io::FileInputStream fin(infd);
    bool keep = true;
    bool clean_eof = true;
    std::deque<T> out;
    while (keep) {
        T msg;
        keep = google::protobuf::util::ParseDelimitedFromZeroCopyStream(
            &msg, &fin, nullptr);
        if (keep)
            out.push_back(msg);
    }
    fin.Close();
    return out;
}
For the Python 3 version, building on @fireboot's answer, the only thing that needed modification is the decoding of raw_varint32:
import six

def getSize(raw_varint32):
    result = 0
    shift = 0
    b = six.indexbytes(raw_varint32, 0)
    result |= ((ord(b) & 0x7f) << shift)
    return result

def readDelimitedFrom(MessageType, stream):
    raw_varint32 = readRawVarint32(stream)
    message = None
    if raw_varint32:
        size = getSize(raw_varint32)
        data = stream.read(size)
        if len(data) < size:
            raise Exception("Unexpected end of file")
        message = MessageType()
        message.ParseFromString(data)
    return message
Was also looking for a solution for this. Here's the core of our solution, assuming some java code wrote many MyRecord messages with writeDelimitedTo into a file. Open the file and loop, doing:
if(someCodedInputStream->ReadVarint32(&bytes)) {
  CodedInputStream::Limit msgLimit = someCodedInputStream->PushLimit(bytes);
  if(myRecord->ParseFromCodedStream(someCodedInputStream)) {
    //do your stuff with the parsed MyRecord instance
  } else {
    //handle parse error
  }
  someCodedInputStream->PopLimit(msgLimit);
} else {
  //maybe end of file
}
Hope it helps.
Working with an Objective-C version of protocol buffers, I ran into this exact issue. When sending from the iOS client to a Java-based server that uses parseDelimitedFrom, which expects the length as the first byte, I needed to call writeRawByte on the CodedOutputStream first. Posting here to hopefully help others who run into this issue. While working through this issue, one would think that Google protobufs would come with a simple flag that does this for you...
Request* request = [rBuild build];
[self sendMessage:request];
- (void) sendMessage:(Request *) request {
    //** get length
    NSData* n = [request data];
    uint8_t len = [n length];
    PBCodedOutputStream* os = [PBCodedOutputStream streamWithOutputStream:outputStream];
    //** prepend it to message, such that Request.parseDelimitedFrom(in) can parse it properly
    [os writeRawByte:len];
    [request writeToCodedOutputStream:os];
    [os flush];
}
Since I'm not allowed to write this as a comment to Kenton Varda's answer above; I believe there is a bug in the code he posted (as well as in other answers which have been provided). The following code:
google::protobuf::io::CodedInputStream input(rawInput);
// Read the size.
uint32_t size;
if (!input.ReadVarint32(&size)) return false;
// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
    input.PushLimit(size);
sets an incorrect limit because it does not take into account the size of the varint32 which has already been read from input. This can result in data loss/corruption as additional bytes are read from the stream which may be part of the next message. The usual way of handling this correctly is to delete the CodedInputStream used to read the size and create a new one for reading the payload:
uint32_t size;
{
  google::protobuf::io::CodedInputStream input(rawInput);
  // Read the size.
  if (!input.ReadVarint32(&size)) return false;
}
google::protobuf::io::CodedInputStream input(rawInput);
// Tell the stream not to read beyond that size.
google::protobuf::io::CodedInputStream::Limit limit =
    input.PushLimit(size);
You can use getline for reading a string from a stream, using the specified delimiter:
istream& getline ( istream& is, string& str, char delim );
(defined in the <string> header)

How do I make an in-place modification on an array using grpc and google protocol buffers?

I'm having a problem with a const request with the google protocol buffers using grpc. Here is my problem:
I would like to make an in-place modification of an array's values. For that I wrote this simple example where I try to pass an array and sum all of its content. Here's my code:
syntax = "proto3";
option java_package = "io.grpc.examples";
package adder;
// The greeter service definition.
service Adder {
    // Sends a greeting
    rpc Add (AdderRequest) returns (AdderReply) {}
}
// The request message containing the user's name.
message AdderRequest {
    repeated int32 values = 1;
}
// The response message containing the greetings
message AdderReply {
    int32 sum = 1;
}
// Created by Eric Reis on 7/6/16.
#include <iostream>
#include <grpc++/grpc++.h>
#include "adder.grpc.pb.h"
class AdderImpl final : public adder::Adder::Service
{
public:
    grpc::Status Add(grpc::ServerContext* context, const adder::AdderRequest* request,
                     adder::AdderReply* reply) override
    {
        int sum = 0;
        for(int i = 0, sz = request->values_size(); i < sz; i++)
        {
            request->set_values(i, 10); // -> this gives an error caused by the const declaration of the request variable
            // error: "Non-const function 'set_values' is called on the const object"
            sum += request->values(i); // -> this works fine
        }
        return grpc::Status::OK;
    }
};
void RunServer()
{
    std::string server_address("");
    AdderImpl service;
    grpc::ServerBuilder builder;
    // Listen on the given address without any authentication mechanism.
    builder.AddListeningPort(server_address, grpc::InsecureServerCredentials());
    // Register "service" as the instance through which we'll communicate with
    // clients. In this case it corresponds to a *synchronous* service.
    builder.RegisterService(&service);
    // Finally assemble the server.
    std::unique_ptr<grpc::Server> server(builder.BuildAndStart());
    std::cout << "Server listening on " << server_address << std::endl;
    // Wait for the server to shutdown. Note that some other thread must be
    // responsible for shutting down the server for this call to ever return.
    server->Wait();
}
int main(int argc, char** argv)
{
    RunServer();
    return 0;
}
// Created by Eric Reis on 7/6/16.
#include <iostream>
#include <memory>
#include <grpc++/grpc++.h>
#include "adder.grpc.pb.h"
class AdderClient
{
public:
    AdderClient(std::shared_ptr<grpc::Channel> channel) : stub_(adder::Adder::NewStub(channel)) {}
    int Add(int* values, int sz) {
        // Data we are sending to the server.
        adder::AdderRequest request;
        for (int i = 0; i < sz; i++)
        {
            request.add_values(values[i]);
        }
        // Container for the data we expect from the server.
        adder::AdderReply reply;
        // Context for the client. It could be used to convey extra information to
        // the server and/or tweak certain RPC behaviors.
        grpc::ClientContext context;
        // The actual RPC.
        grpc::Status status = stub_->Add(&context, request, &reply);
        // Act upon its status.
        if (status.ok())
        {
            return reply.sum();
        }
        else {
            std::cout << "RPC failed" << std::endl;
            return -1;
        }
    }
private:
    std::unique_ptr<adder::Adder::Stub> stub_;
};
int main(int argc, char** argv) {
    // Instantiate the client. It requires a channel, out of which the actual RPCs
    // are created. This channel models a connection to an endpoint (in this case,
    // localhost at port 50051). We indicate that the channel isn't authenticated
    // (use of InsecureChannelCredentials()).
    AdderClient adder(grpc::CreateChannel("localhost:50051",
                                          grpc::InsecureChannelCredentials()));
    int values[] = {1,2};
    int sum = adder.Add(values, 2);
    std::cout << "Adder received: " << sum << std::endl;
    return 0;
}
My error happens when I try to call the method set_values() on the request object, which is declared const. I understand why this error occurs, but I can't figure out a way to overcome it without making a copy of the array.
I tried removing the const qualifier, but the RPC call fails when I do that.
Since I'm new to RPC in general, and even newer to gRPC and Google Protocol Buffers, I'd like to ask for your help. What is the best way to solve this problem?
Please see my answer here. The server receives a copy of the AdderRequest sent by the client. If you were to modify it, the client's original AdderRequest would not be modified. If by "in place" you mean the server modifies the client's original memory, no RPC technology can truly accomplish that, because the client and server run in separate address spaces (processes), even on different machines.
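The copy semantics can be illustrated without any RPC library. In the sketch below, Request and Reply are hypothetical stand-ins for the generated AdderRequest and AdderReply classes, and passing the request by value stands in for serialization on the client followed by deserialization on the server. It is only an illustration under those assumptions, not how gRPC is implemented.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the generated message classes.
struct Request { std::vector<int> values; };
struct Reply { int sum = 0; };

// Plays the role of the server handler. It receives its own copy of the
// request, so changing it has no effect on the caller's object.
void handle_add(Request server_copy, Reply& reply) {
    for (int& v : server_copy.values) {
        v = 10;          // modifies the server-side copy only
        reply.sum += v;
    }
}

// Plays the role of the transport: pass-by-value mimics sending the request
// over the wire and rebuilding it in the server's address space.
Reply call_add(const Request& client_request) {
    Reply reply;
    handle_add(client_request, reply);
    return reply;
}
```

After call_add returns, the client's vector still holds its original values; anything the server wants the client to see has to travel back in the reply.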
If you truly need the server to modify the client's memory:
Ensure the server and client run on the same machine.
Use OS-specific shared-memory APIs such as shm_open() and mmap() to map the same chunk of physical memory into the address spaces of both the client and the server.
Use RPC to transmit the identifier (name) of the shared memory (not the actual data in the memory) and to invoke the server's processing.
When both client and server have opened and mapped the memory, they both have pointers (likely with different values in the different address spaces) to the same physical memory, so the server will be able to read what the client writes there (with no copying or transmitting) and vice versa.
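The steps above use shm_open() so that two unrelated processes can find the same memory by name. As a smaller illustration of the same effect, the sketch below substitutes an anonymous MAP_SHARED mapping plus fork() (an assumption made to keep the example self-contained; it targets POSIX systems such as Linux), with the child process playing the server and the parent playing the client. The helper names map_shared_ints and set_all_in_child are hypothetical.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Maps `count` ints into memory that remains shared across fork().
// Returns nullptr on failure.
int* map_shared_ints(std::size_t count) {
    void* p = mmap(nullptr, count * sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<int*>(p);
}

// Forks a child (the "server") that overwrites every element in place,
// then waits for it. Returns true if the child exited cleanly.
bool set_all_in_child(int* shared, std::size_t count, int new_value) {
    const pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        // Child process: writes go straight to the shared physical pages.
        for (std::size_t i = 0; i < count; ++i) shared[i] = new_value;
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

If the parent fills the mapping with {1, 2} and calls set_all_in_child(values, 2, 10), it then reads 10 in both slots without any data having been copied or transmitted. With a separate server process, the mapping would instead come from shm_open() and only the object's name would go through the RPC.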

Boost.Asio local TCP Sockets in C++ - cannot write successfully more than once?

I'm trying to make an audio plugin which can connect to a local Java server and send it data through a socket (TCP). As I heard many nice things about it, I'm using Boost's ASIO library to do the work.
I'm having quite a strange bug in my code: my AudioUnit C++ client (which I use from inside a DAW; I'm testing with Ableton Live and Logic Pro) can connect to my Java server fine, but my write seems to be executed correctly only once (I can monitor any incoming message on my Java server, and only the first message is seen).
I'm using the following code:
-- Inside the header :
boost::asio::io_service io_service;
boost::asio::ip::tcp::socket mySocket(io_service);
boost::asio::ip::tcp::endpoint myEndpoint(boost::asio::ip::address::from_string(""), 9001);
boost::system::error_code ignored_error;
-- Inside my plugin's constructor
-- And when I try to send :
boost::asio::write(mySocket, boost::asio::buffer(datastring), ignored_error);
(you will notice that I do not close my socket, because I'd like it to live forever)
I don't think the problem comes from my Java server (though I could be wrong!), because I found a way to make my C++ plugin "work correctly" and send all the messages I want:
If I don't open my socket when initializing my plugin, but instead right when I send the message, every message is received by my remote server. That is, every time I call sendMessage(), I do the following:
try {
// Connect to the Java application
// Write the data
boost::asio::write(mySocket, boost::asio::buffer(datastring), ignored_error);
// Disconnect
} catch (const std::exception & e) {std::cout << "Couldn't initialize socket\n";}
Still, I'm not too happy with this code: I have to send about 1000 messages per second. That might not be a huge number, but I don't think opening the socket and connecting to the endpoint every time is efficient (connecting is a blocking operation too).
Any input which could lead me in the right direction would be greatly appreciated!
For more information, here's my code in a slightly more complete version (with the useless stuff trimmed to keep it short):
#include <cmath>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <boost/asio.hpp>
#include "PluginProcessor.h"
#include "PluginEditor.h"
#include "SignalMessages.pb.h"
using boost::asio::local::stream_protocol;
// Default parameter values
const int defaultAveragingBufferSize = 256;
const int defaultMode = 0;
const float defaultInputSensitivity = 1.0;
const int defaultChannel = 1;
const int defaultMonoStereo = 1; //Mono processing
// Variables used by the audio algorithm
int nbBufValProcessed = 0;
float signalSum = 0;
// Used for beat detection
float signalAverageEnergy = 0;
float signalInstantEnergy = 0;
const int thresholdFactor = 5;
const int averageEnergyBufferSize = 11025; //0.25 seconds
// Socket used to forward data to the Processing application, and the variables associated with it
boost::asio::io_service io_service;
boost::asio::ip::tcp::socket mySocket(io_service);
boost::asio::ip::tcp::endpoint myEndpoint(boost::asio::ip::address::from_string(""), 9001);
boost::system::error_code ignored_error;
averagingBufferSize = defaultAveragingBufferSize;
inputSensitivity = defaultInputSensitivity;
mode = defaultMode;
monoStereo = defaultMonoStereo;
channel = defaultChannel;
// Connect to the remote server
// Note for stack overflow: this is where I'd like to connect to my server!
void SignalProcessorAudioProcessor::processBlock (AudioSampleBuffer& buffer, MidiBuffer& midiMessages)
{
    // In case we have more outputs than inputs, clear any output
    // channels that don't contain input data
    for (int i = getNumInputChannels(); i < getNumOutputChannels(); ++i)
        buffer.clear (i, 0, buffer.getNumSamples());
    // This is the most important part of my code, audio processing takes place here !
    // Note for stack overflow : this shouldn't be very interesting, as it is not related to my current problem
    for (int channel = 0; channel < getNumInputChannels(); ++channel)
    {
        const float* channelData = buffer.getReadPointer (channel);
        for (int i=0; i<buffer.getNumSamples(); i++) {
            signalSum += std::abs(channelData[i]);
            signalAverageEnergy = ((signalAverageEnergy * (averageEnergyBufferSize-1)) + std::abs(channelData[i])) / averageEnergyBufferSize;
        }
    }
    nbBufValProcessed += buffer.getNumSamples();
    if (nbBufValProcessed >= averagingBufferSize) {
        signalInstantEnergy = signalSum / (averagingBufferSize * monoStereo);
        // If the instant signal energy is thresholdFactor times greater than the average energy, consider that a beat is detected
        if (signalInstantEnergy > signalAverageEnergy*thresholdFactor) {
            //Set the new signal Average Energy to the value of the instant energy, to avoid having bursts of false beat detections
            signalAverageEnergy = signalInstantEnergy;
            //Create an impulse signal - note for stack overflow : these are Google Protocol buffer messages, serialization is faster this way
            Impulse impulse;
            std::string datastringImpulse;
        }
        nbBufValProcessed = 0;
        signalSum = 0;
    }
}
void SignalProcessorAudioProcessor::sendMessage(std::string datastring) {
    try {
        // Write the data
        boost::asio::write(mySocket, boost::asio::buffer(datastring), ignored_error);
    } catch (const std::exception & e) {
        std::cout << "Caught an error while trying to initialize the socket - the Java server might not be ready\n";
        std::cerr << e.what();
    }
}
// This creates new instances of the plugin..
AudioProcessor* JUCE_CALLTYPE createPluginFilter()
{
    return new SignalProcessorAudioProcessor();
}