Suppose I have a proto structure that looks like the following:
message TMessage {
    optional TDictionary dictionary = 1;
    optional int32 specificField1 = 2;
    optional TOtherMessage specificField2 = 3;
    ...
}
Suppose I am using C++. This is the message stub that the master process uses to send information to a bunch of nodes over the network. In particular, the dictionary field is 1) pretty heavy and 2) common to all the serialized messages, while the specific fields that follow are filled with relatively small information specific to the destination node.
Of course, the dictionary is built only once, but it turns out that the major part of the running time is spent serializing the common dictionary part again and again for each new node.
An obvious optimization would be to pre-serialize the dictionary into a byte string and put it into TMessage as a bytes field, but this looks a bit nasty to me.
Am I right that there is no built-in way to pre-serialize a message field without ruining the message structure? It sounds like an idea for a good plugin for the proto compiler.
Protobuf is designed such that concatenation === composition, at least for the root message. That means you can serialize an object with just the dictionary and snapshot the bytes somewhere. Then, for each of the real messages, you paste down that snapshot and serialize an object with just the other fields, whacking it straight after the snapshot: no additional syntax is required. This is semantically identical to serializing them all at the same time. In fact, since this retains the field order, it should actually produce identical bytes too.
It helps that you used "optional" throughout :)
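For concreteness, here is a minimal sketch of that trick in C++. The header name tmessage.pb.h and the helper functions are assumptions for illustration; SerializeToString and AppendToString are standard protobuf message methods:

#include <string>

#include "tmessage.pb.h"  // assumed name of the generated header

// Serialize the heavy dictionary part once and keep the bytes around.
std::string MakeDictionarySnapshot(const TDictionary& dict) {
    TMessage dict_only;
    *dict_only.mutable_dictionary() = dict;
    std::string snapshot;
    dict_only.SerializeToString(&snapshot);
    return snapshot;
}

// Per node: paste down the snapshot, then append the small specific part.
std::string SerializeForNode(const std::string& snapshot, int specific1) {
    TMessage specific_only;              // dictionary deliberately left unset
    specific_only.set_specificfield1(specific1);
    std::string out = snapshot;          // bytes of the dictionary-only message
    specific_only.AppendToString(&out);  // concatenation === composition
    return out;                          // parses as one complete TMessage
}

Because both halves are valid TMessage serializations, the concatenated bytes parse as a single TMessage with both the dictionary and the specific fields set.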
Marc's answer is perfect for your use case. Here is just another option:
The field must be a submessage, like your TDictionary is.
Have another variant of the outer message, with bytes in place of the submessage you want to preserialize:
message TMessage_preserialized {
    optional bytes dictionary = 1;
    ...
}
Now you can serialize the TDictionary separately and put the resulting data in the bytes field. On the wire, submessage fields and bytes fields are written out the same way. This means you can serialize a TMessage_preserialized and still deserialize it as a normal TMessage.
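A minimal sketch of that approach, assuming both message variants are generated into the same header (the names below are otherwise hypothetical):

#include <string>

#include "tmessage.pb.h"  // assumed to declare TMessage_preserialized too

// Serialize the dictionary once, into raw bytes.
std::string PreserializeDictionary(const TDictionary& dict) {
    std::string bytes;
    dict.SerializeToString(&bytes);
    return bytes;
}

// Build the per-node message, reusing the preserialized dictionary bytes.
// Submessage and bytes fields share the same length-delimited wire format,
// so the output is indistinguishable from a regular TMessage.
std::string SerializeForNode(const std::string& dict_bytes, int specific1) {
    TMessage_preserialized msg;
    msg.set_dictionary(dict_bytes);
    msg.set_specificfield1(specific1);
    std::string out;
    msg.SerializeToString(&out);
    return out;
}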
Related
I have many map fields defined in my protocol buffer messages. The messages are populated in C++ and received in a different C++ component, which reads their content using the Descriptor and Reflection APIs.
Given a map field, say:
map<int32, int32> my_map = 1;
This is transported in the same way as something like this:
message my_map_entry {
    int32 key = 1;
    int32 value = 2;
}

repeated my_map_entry my_map = 1;
As I understand the current limitations of the Descriptor and Reflection APIs, I have to perform lookups by iterating over the received data. Of course, I could put all the data into a more suitable data structure, such as a std::unordered_map, if I wanted to do many lookups in the received map field, but I generally do only one lookup per received map field.
Can I assume anything about the order in which the data is received? Are the repeated my_map_entry messages perhaps ordered, because of the underlying data structure used in the protocol buffer implementation? If so, a lookup for an integer key could stop as soon as a larger key is found. That would give me a potential optimization when processing the received map fields in my application.
You cannot assume anything about the order of the map entries after serialization.
The following quote is taken from the protobuf website:
Wire format ordering and map iteration ordering of map values is undefined, so you cannot rely on your map items being in a particular order.
In general, protobuf is free to serialize fields in any order.
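A lookup therefore has to scan every entry. As a minimal sketch, assuming an int32-to-int32 map with the standard entry field names key and value, a reflection-based scan might look like this:

#include <cstdint>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

// Linear scan over a received map field via the Reflection API.
// No early exit based on key order is possible: entry order is undefined.
bool LookupInt32Map(const google::protobuf::Message& msg,
                    const google::protobuf::FieldDescriptor* map_field,
                    std::int32_t wanted_key, std::int32_t* value_out) {
    const google::protobuf::Reflection* refl = msg.GetReflection();
    const int n = refl->FieldSize(msg, map_field);
    for (int i = 0; i < n; ++i) {
        const google::protobuf::Message& entry =
            refl->GetRepeatedMessage(msg, map_field, i);
        const google::protobuf::Descriptor* d = entry.GetDescriptor();
        const google::protobuf::Reflection* er = entry.GetReflection();
        if (er->GetInt32(entry, d->FindFieldByName("key")) == wanted_key) {
            *value_out = er->GetInt32(entry, d->FindFieldByName("value"));
            return true;
        }
    }
    return false;
}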
I am working on a Wireshark dissector generator for a senior project. I have done some reading, but I have a question about the VoidString parameter of the ProtoField object. The documentation wasn't too clear on this particular value or what it's used for.
Our generator uses C++ so that our client can modify it after the project is complete. I read in another thread here that this parameter can be passed a table of key/value pairs. Are there other structures or kinds of information it is used for? We're trying to design a data structure to contain the parse of a file passed by the user, and we're trying to determine how best to build this object. Would it be better to allow a template object to be passed here instead, or is the table sufficient?
I'm not sure I understand your needs, but according to the Wireshark source code (wslua_proto_fields.c), the definition of the VoidString parameter is:
#define WSLUA_OPTARG_ProtoField_new_VALUESTRING 4 /* A table containing the text that
corresponds to the values, or a table containing unit name for the values if base is
`base.UNIT_STRING`, or one of `frametype.NONE`, `frametype.REQUEST`, `frametype.RESPONSE`,
`frametype.ACK` or `frametype.DUP_ACK` if field type is ftypes.FRAMENUM. */
So the table will be "cast" according to the field type and printed in the given base representation.
Is there any way to do partial deserialization of a std::map that was serialized with boost::archive::text_oarchive and then saved to a file?
For example, we have a big serialized and saved map whose key is an integer and whose value is some structure, and now we need to get it back in parts: load the first 100 records, then the next 100 records, and so on.
Are there any libraries, Boost classes, or other solutions for this?
Normally the same serialize() function is called both to serialize and to deserialize. If you want to get it back in parts, you should serialize it in parts in the first place.
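A minimal sketch of serializing in parts, assuming a map from int to std::string (any serializable value type works the same way): write an explicit count followed by individual pairs, so the reader can stop after any number of records.

#include <cstddef>
#include <fstream>
#include <map>
#include <string>
#include <utility>

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>
#include <boost/serialization/utility.hpp>  // std::pair support

// Save the map as a count followed by individual pairs.
void SaveMapInParts(const std::map<int, std::string>& m, const char* path) {
    std::ofstream os(path);
    boost::archive::text_oarchive oa(os);
    const std::size_t count = m.size();
    oa << count;
    for (const auto& kv : m) {
        const std::pair<int, std::string> p = kv;  // copy to drop the const key
        oa << p;
    }
}

// Load up to `limit` records from the front of the archive.
std::map<int, std::string> LoadFirstRecords(const char* path, std::size_t limit) {
    std::ifstream is(path);
    boost::archive::text_iarchive ia(is);
    std::size_t count = 0;
    ia >> count;
    std::map<int, std::string> result;
    for (std::size_t i = 0; i < count && i < limit; ++i) {
        std::pair<int, std::string> p;
        ia >> p;
        result.insert(p);
    }
    return result;
}

To resume with the next 100 records you would keep the stream and archive alive between batches; a text archive is not seekable, so you cannot jump into the middle of the file.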
I need to know an efficient mechanism for transmitting data structures in socket programming. Let's consider the example of car manufacturing on an assembly line.
Initially the conveyor is empty; then I start adding different parts dynamically. How can I transmit my data to the server using TCP/UDP? What can I do so that the server recognizes when I add a new part dynamically? And after its calculations, the server should return the data to the client in the same structure, so that the client can put the calculated data in the exact position of the component.
Is it possible to arrange this data using a B-tree or B+-tree structure? Is it possible to reconstruct the same tree on the server side? What other approaches could work here?
You need to serialize whatever data you send to the server into a text or binary blob. Yes, it is possible to serialize an interrelated data structure, e.g. by assigning an ID to each item and then referencing items by that ID. For C++ serialization I would recommend having a look at Boost.Serialization.
The simplest ID is the memory address on the serializer (sender) side, a kind of unique identifier that is ready to use. Of course, on the deserializer side it must be treated as just an ID and not a memory address.
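As a hypothetical sketch of the ID idea, each part could carry its own ID and reference its neighbours by ID, so the structure can be rebuilt on the server side (all names are made up for illustration):

#include <cstdint>
#include <string>
#include <vector>

#include <boost/serialization/string.hpp>
#include <boost/serialization/vector.hpp>

// References to other parts are stored as IDs (e.g. the sender-side memory
// address), never as raw pointers, so the graph survives transport.
struct Part {
    std::uint64_t id = 0;                 // unique on the sender side
    std::string name;
    std::vector<std::uint64_t> attached;  // IDs of connected parts

    // Boost.Serialization hook: one function both serializes and deserializes.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & id & name & attached;
    }
};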
We are planning to build a dynamic data import tool: basically, taking information in on one end in a specified format (Access, Excel, CSV) and uploading it into a web service.
The situation is that we do not know the export field names, so the application will need to be able to inspect the WSDL definition and map to the valid entries at the other end.
In the import section we can define most of the fields, but usually there are a few that are custom, which I see no problem with.
I just wonder if there is a design pattern that fits this type of application or would help with its development.
I am not sure where the complexity is in your application, so I will just give an example of how I have used patterns for importing data in different formats. I created a factory which takes the file format as an argument and returns a parser for that format. Then I use the Builder pattern: the parser is given a builder, which it calls as it parses the file to construct the desired data objects in the application.
// In this example the file format describes a house (a complex data object)
AbstractReader reader = factory.createReader("name of file format");
AbstractBuilder builder = new HouseBuilder(list_of_houses);
reader.import(text_stream, builder);
// now list_of_houses should contain an extra house,
// as defined in the text_stream
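A slightly fuller sketch of that structure in C++ (all names are hypothetical, with a toy one-field CSV reader standing in for a real format parser):

#include <istream>
#include <memory>
#include <string>
#include <vector>

struct House { std::string address; };

// Builder interface: parsers call it as they recognize fields and records.
class AbstractBuilder {
public:
    virtual ~AbstractBuilder() = default;
    virtual void addField(const std::string& name, const std::string& value) = 0;
    virtual void endRecord() = 0;
};

class HouseBuilder : public AbstractBuilder {
public:
    explicit HouseBuilder(std::vector<House>& out) : out_(out) {}
    void addField(const std::string& name, const std::string& value) override {
        if (name == "address") current_.address = value;
    }
    void endRecord() override { out_.push_back(current_); current_ = {}; }
private:
    std::vector<House>& out_;
    House current_;
};

// Parser interface: one concrete reader per file format.
class AbstractReader {
public:
    virtual ~AbstractReader() = default;
    virtual void parse(std::istream& in, AbstractBuilder& builder) = 0;
};

class CsvReader : public AbstractReader {
public:
    void parse(std::istream& in, AbstractBuilder& builder) override {
        std::string line;
        while (std::getline(in, line)) {
            builder.addField("address", line);  // toy: one field per line
            builder.endRecord();
        }
    }
};

// Factory: maps a format name to the matching reader.
std::unique_ptr<AbstractReader> createReader(const std::string& format) {
    if (format == "csv") return std::make_unique<CsvReader>();
    return nullptr;  // unknown format
}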
I would say the Adapter pattern, as you are "adapting" the data from a file to an object, like the SqlDataAdapter does from a SQL table to a DataTable.
Have a different adapter for each file type/format. For example, SqlDataAdapter and MySqlDataAdapter handle the same commands but different data sources, to achieve the same output DataTable.
Adapter pattern
HTH
Bones
Probably Bridge could fit, since you have to deal with different file formats.
And Façade to simplify the usage. Handle my reply with care, I'm just learning design patterns :)
You will probably also need the Abstract Factory and Command patterns.
If the data doesn't match the input format, you will probably need to transform it somehow.
That's where the Command pattern comes in. Because the formats are dynamic, you will need to base the commands you generate on the input. That's where Abstract Factory is useful.
Our situation is that we need to import parametric shapes from competitors' files. The layout of their screens and data fields is similar but different enough that a conversion process is required. In addition, we have over half a dozen competitors, and maintenance would be a nightmare if done through code alone. Since most of them use tables to store the parameters for their shapes, we wrote a general-purpose collection of objects to convert X into Y.
In my CAD/CAM application, the file import is a Command. The conversion magic, however, is done by a RuleSet via the following steps:
1. Import the data into a table. The field names are pulled in as well, depending on the format.
2. Pass the table to a RuleSet. I will explain the structure of the RuleSet in a minute.
3. The RuleSet transforms the data into a new set of objects (or tables), which we retrieve.
4. Pass the result to the rest of the software.
A RuleSet is composed of a set of Rules. A Rule can contain other Rules. A Rule has a CONDITION that it tests and a MAP TABLE.
The MAP TABLE maps an incoming field to a field (or property) in the result. There can be one mapping or a multitude. A mapping doesn't have to involve just poking the input value into an output field; we have a syntax for calculations and string concatenation as well.
This syntax is also used in the CONDITION and can incorporate multiple fields, like ([INFIELD1] & "-" & [INFIELD2]) = "A-B" or [DIM1] + [DIM2] > 10. Anything between brackets is substituted with an incoming field.
Rules can contain other Rules. The way this works is that for a sub-Rule's mapping to apply, both its own condition and those of its parent (or parents) have to be true. If a sub-Rule has a mapping that conflicts with a parent's mapping, the sub-Rule's mapping applies.
If two Rules on the same level have conditions that are true and have conflicting mappings, then the Rule with the higher index (or lower on the list, if you are looking at a tree view) has its mapping applied.
Nested Rules are the equivalent of ANDs, while Rules on the same level are the equivalent of ORs.
The result is a mapping table that is applied to the incoming data to transform it to the needed output.
It is amenable to being displayed in a UI: namely, a tree view showing the rule hierarchy and a side panel showing the mapping table and conditions of the selected rule. Just as importantly, you can create wizards that automate common rule structures.
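A hypothetical sketch of the data structure this description implies (the evaluation machinery for conditions and mapping expressions is omitted):

#include <string>
#include <vector>

// One mapping routes an input expression to one output field.
struct Mapping {
    std::string output_field;
    std::string expression;  // e.g. ([INFIELD1] & "-" & [INFIELD2])
};

struct Rule {
    std::string condition;        // e.g. [DIM1] + [DIM2] > 10
    std::vector<Mapping> map_table;
    std::vector<Rule> sub_rules;  // apply only if this condition also holds (AND)
};

// Rules at the same level are alternatives (OR); on conflicting mappings,
// the Rule with the higher index wins.
struct RuleSet {
    std::vector<Rule> rules;
};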