Problems with parsing a text header packet in c++ - c++

I am trying to parse the header of a SIP packet. SIP is a text-based protocol, similar to HTTP.
The header fields have no fixed order.
For example, if there are three fields f1, f2 and f3, they can arrive in any order and any number of times, say f3, f2, f1, f1.
This complicates my parser, since I don't know which field will come first.
What should I do to manage this complexity?

Ultimately, you simply need to decouple your processing from the order of receipt. To do that, have a loop that repeats while fields are encountered, and inside the loop determine which field type it is, then dispatch to the processing for that field type. If you can process the fields immediately great, but if you need to save the potentially multiple values given for a field type you might - for example - put them into a vector or even a shared multimap keyed on the field name or id.
Pseudo-code:
Field x;
while (x = get_next_field(input))
{
    switch (x.type())
    {
        case Type1: field1_values.push_back(x.value()); break;
        case Type2: field2 = x.value(); break; // just keep the last value seen...
        default: throw std::runtime_error("unsupported field type");
    }
}
// use the field1_values / field2 etc. variables....
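As a rough, runnable counterpart to the pseudo-code, here is a minimal sketch of the "multimap keyed on the field name" idea. The helper name collectFields is hypothetical, and the sketch assumes the request or status line has already been removed and that every header line has the plain "Name: value" form. It ignores SIP details such as case-insensitive names, compact forms and folded lines.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: collects "Name: value" lines into a multimap so that
// repeated fields are all kept, whatever order they arrive in.
std::multimap<std::string, std::string> collectFields(const std::string& header) {
    std::multimap<std::string, std::string> fields;
    std::istringstream in(header);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) break;                     // a blank line ends the header
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;    // not "Name: value"; skipped in this sketch
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(" \t"));  // drop leading whitespace
        fields.emplace(line.substr(0, colon), value);
    }
    return fields;
}
```

After the loop, fields.equal_range("Via") would give every value seen for that name, so the processing code no longer depends on the order of receipt.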

Tony already gave the main idea, I'll get more specific.
The basic idea in parsing is that it is generally separated into several phases. In your case you need to separate the lexing part (extracting the tokens) from the semantic part (acting on them).
You can proceed in different ways. Since I prefer a structured approach, let us suppose that we have a simple struct representing the header:
struct SipHeader {
    int field1;
    std::string field2;
    std::vector<int> field3;
};
Now we create a function that takes a field name and its value, and fills in the corresponding field of the SipHeader structure.
void parseField(std::string const& name, std::string const& value, SipHeader& sh) {
    if (name == "Field1") {
        sh.field1 = std::stoi(value);
        return;
    }
    if (name == "Field2") {
        sh.field2 = value;
        return;
    }
    if (name == "Field3") {
        // ...
        return;
    }
    throw std::runtime_error("Unknown field");
}
Then you iterate over the lines of the header, split each line into its name and value, and call this function.
There are obviously refinements:
- instead of an if-chain you can use a map of functors, or you can fully tokenize the source and store the fields in a std::map<std::string, std::string>
- you can use a state-machine technique to act on each field immediately, without copying
but the essential advice is the same:
To manage complexity, you need to separate the task into orthogonal subtasks.
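The "map of functors" refinement could look roughly like the sketch below. The helper fieldHandlers and the alias FieldHandler are made-up names, and the Field3 handling (append the parsed integer) is an assumption, since the answer leaves that branch open.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct SipHeader {
    int field1 = 0;
    std::string field2;
    std::vector<int> field3;
};

using FieldHandler = std::function<void(const std::string&, SipHeader&)>;

// Lookup table from field name to the code that stores its value.
const std::map<std::string, FieldHandler>& fieldHandlers() {
    static const std::map<std::string, FieldHandler> handlers = {
        {"Field1", [](const std::string& v, SipHeader& sh) { sh.field1 = std::stoi(v); }},
        {"Field2", [](const std::string& v, SipHeader& sh) { sh.field2 = v; }},
        // Assumption: a repeated Field3 appends to the vector.
        {"Field3", [](const std::string& v, SipHeader& sh) { sh.field3.push_back(std::stoi(v)); }},
    };
    return handlers;
}

void parseField(const std::string& name, const std::string& value, SipHeader& sh) {
    const auto it = fieldHandlers().find(name);
    if (it == fieldHandlers().end()) throw std::runtime_error("Unknown field: " + name);
    it->second(value, sh);  // dispatch without an if-chain
}
```

Supporting a new field then means adding one table entry; parseField itself stays untouched.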

Related

Tuples: No matching function for call to 'get'

I have 3 structs : Student, Citizen, Employee. I want user to be able to choose what struct they want to work with (std::vector of structs, actually). Since there's no way to define type at runtime, I created all 3 vectors, but will use only one of them (depending on the user's choice), others will stay empty:
std::vector<Student> container_student;
std::vector<Citizen> container_citizen;
std::vector<Employee> container_employee;
auto containers = make_tuple(container_student, container_citizen, container_employee);
std::cout << "Enter:\n0 to operate \"Student\" struct\n1 to operate \"Citizen\" struct\n2 to operate \"Employee\" struct\n";
std::cin >> container_type;
auto container = std::get<container_type>(containers);
But I get No matching function for call to 'get', even though container_type is an int and containers is a tuple.
Edit: understandable, auto can't make magic and I still try to make container's type to depend on runtime. But even if I try to use std::get<container_type>(containers) (probably define would help) instead of container in functions etc., I get the same error, which is not understandable.
Unfortunately, what you're proposing isn't possible in C++. The C++ typing and template system works at compile-time, where information read in from the user isn't available. As a result, anything passed into a template's angle braces needs to be determinable at compile-time. In your case, the number the user enters, indicating which option they want to select, is only knowable at runtime.
There are some routes you could take to achieve the same result, though. For example, one option would be to do something like this:
if (container_type == 0) {
auto container = std::get<0>(containers);
/* ... */
} else if (container_type == 1) {
auto container = std::get<1>(containers);
/* ... */
} /* etc */
Here, all the template angle braces are worked out at compile-time. (Then again, if this is what you're going to be doing, you wouldn't need the tuple at all. ^_^)
Another option would be to use templates, like this:
template <typename T> void doSomething(std::vector<T>& container) {
/* Put your code here */
}
/* Then, back in main... */
if (container_type == 0) {
doSomething(container_student);
} else if (container_type == 1) {
doSomething(container_citizen);
} /* etc */
This still requires you to insert some code to map from integer types to the functions you want to call, but it leaves you the freedom to have a container variable (the one in doSomething) that you can treat generically at that point.
It's basically the Fundamental Theorem of Software Engineering in action - all problems can be solved by adding another layer of indirection. :-)
Hope this helps!
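If C++17 is available, a std::variant can hold whichever vector was chosen at runtime, which keeps the integer-to-type mapping in one place. This is only a sketch: the struct bodies are placeholders, and makeContainer and addOne are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Placeholder types standing in for the question's structs.
struct Student  { std::string name; };
struct Citizen  { std::string name; };
struct Employee { std::string name; };

using Container = std::variant<std::vector<Student>,
                               std::vector<Citizen>,
                               std::vector<Employee>>;

// Maps the runtime choice to one alternative; every type named here is
// fixed at compile time, only the branch taken is decided at runtime.
Container makeContainer(int choice) {
    switch (choice) {
        case 0: return std::vector<Student>{};
        case 1: return std::vector<Citizen>{};
        case 2: return std::vector<Employee>{};
        default: throw std::out_of_range("choice must be 0, 1 or 2");
    }
}

// Generic code, instantiated once per alternative by std::visit.
std::size_t addOne(Container& c, const std::string& name) {
    return std::visit([&](auto& vec) {
        vec.push_back({name});
        return vec.size();
    }, c);
}
```

The generic lambda plays the same role as doSomething above: it is written once and compiled for each vector type.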

AST: get leaf value when leafs are of different types

I need to represent in a AST a structure like this:
struct {
int data;
double doubleDataArray[10];
struct {
int nestedData;
};
};
I'm creating an AST like this one:
I need to retrieve data from leaves. The problem is that leaves contain heterogeneous data. A leaf can represent an integer value, a double, a string and so on.
I can create classes like IntValue and DoubleValue that inherit from Value and store the respective data, then use dynamic_cast to convert a Value to the type named in its type attribute. Something like
switch (value->getType()) {
    case Type::Int: {
        auto iv = dynamic_cast<IntValue*>(value);
        int i = iv->getValue();
    } break;
    case Type::Double: {
        auto dv = dynamic_cast<DoubleValue*>(value);
        double d = dv->getValue();
    } break;
    //…
}
but I'd like to know if there's a better way, because a switch like that one is not easy to maintain or read.
I've seen examples, such as in boost::program_options, that look like:
int i = value->getValue().as<int>();
Is that a better way? How can I reproduce it?
You could do something like this using C++17:
struct node {
    //... other stuff
    std::variant</*your types of nodes here*/> type;
};
then call this visitor on your nodes
std::visit([](auto&& node) {
    if constexpr(std::is_same_v<std::decay_t<decltype(node)>, /* your type here */>) {
        // ...
    }
    else if constexpr(/* ... */) {
        // ...
    }
}, node0.type);
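To make this concrete, here is a small sketch with an assumed set of leaf types (int, double, std::string). The names Leaf, as and describe are illustrative; the as<T>() member only mimics the boost::program_options style mentioned in the question.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Assumed leaf payload types; extend the list as needed.
using Value = std::variant<int, double, std::string>;

struct Leaf {
    Value value;

    // Typed access in the style of as<int>(); throws std::bad_variant_access
    // if the leaf currently holds a different type.
    template <typename T>
    const T& as() const { return std::get<T>(value); }
};

// Visitor covering every alternative; if constexpr picks the branch
// for each instantiation at compile time.
std::string describe(const Leaf& leaf) {
    return std::visit([](const auto& v) -> std::string {
        using T = std::decay_t<decltype(v)>;
        if constexpr (std::is_same_v<T, int>)         return "int: " + std::to_string(v);
        else if constexpr (std::is_same_v<T, double>) return "double: " + std::to_string(v);
        else                                          return "string: " + v;
    }, leaf.value);
}
```

Compared with the dynamic_cast switch, adding a type to Value and forgetting to handle it in a visitor tends to show up as a compile error or an obviously wrong fallback branch, not as a silent runtime miss.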
Going on a tangent for a slightly different flavor of a solution, how about doing it the way capnproto does it? Capnproto's own schema compiler represents the AST in memory using the Capnproto wire encoding. The schema supports tagged unions. The lexer and parser for the schema are built using combinators (although I presume that you already have a good parser in place that produces the AST).
The structure could be expressed as follows using capnp schema:
# MyAst.capnp
struct Struct {
  fields @0 :List(Field);
}
struct Field {
  name @4 :Text;
  union {
    integer @0 :List(Int32);
    fpoint @1 :List(Float64);
    text @2 :List(Text);
    structure @3 :Struct;
  }
}
The schema compiler would generate C++ code for this, with the following important classes Struct::Reader, Struct::Builder, Field::Reader and Field::Builder. Whatever makes the AST would use the Struct::Builder type to make a structure instance, with its data. Then, you'd traverse the structure as follows:
void processData(Struct::Reader reader) {
auto fields = reader.getFields();
for (auto &field : fields) {
if (field.hasInteger()) {
int32_t val = field.getInteger();
...
} else if (field.hasFpoint()) {
double val = field.getFpoint();
...
} else if (field.hasText()) {
kj::StringPtr val = field.getText();
...
} else if (field.hasStructure()) {
processData(field.getStructure());
}
}
}
The kj framework (included in capnproto) has quite a few compiler-building goodies, such as memory arenas. A Foo::Builder would then be obtained from an Orphan<Foo>, and the orphan is produced by an orphanage that carves out memory from an arena allocator. With your entire AST built in an arena with one or few large, contiguous segments, this would perform better than allocating all those types on general-purpose heap (assuming that your AST is not tiny). This representation also serializes directly to disk or network with no transcoding: you can do a binary dump of an orphanage's arena, then later load it directly and you get all your data back with zero effort and zero transcoding. The Foo::Reader and Foo::Builder types provide very fast accessors that don't do any data decoding nor translation - that's the advantage of the capnproto encoding. If you modify the data in the AST, the orphanage may grow, but it also provides a copy operation that copies only the referenced areas (a copying GC, if you will) - and that's blazing fast, too, since no transcoding is done. Chunks of verbatim binary data are copied with very little traversal overhead.

How to make a string into a reference?

I have looked into this, but it's not what I wanted: Convert string to variable name or variable type
I have code that reads an ini file, stores data in a QHash table, and checks the values of the hash key, (see below) if a value is "1" it's added to World.
Code Examples:
World theWorld;
AgentMove AgentMovement(&theWorld);
if(rules.value("AgentMovement") == "1")
theWorld.addRule(&AgentMovement);
INI file:
AgentMovement=1
What I want to do is, dynamically read from the INI file and set a reference to a hard coded variable.
for(int j = 0; j < ck.size(); j++)
if(rules.value(ck[j]) == "1")
theWorld.addRule("&" + ck[j]);
^
= &AgentMovement
How would you make a string into a reference as noted above?
This is a common theme in programming: a value which can only be one of a set (an enum, one of a finite set of ints, a set of possible string values, or even a number of buttons in a GUI) is used as a criterion to perform some kind of action. The simplistic approach is to use a switch (for atomic types) or an if/else chain for complex types. That is what you are currently doing, and there is nothing wrong with it as such:
if(rules.value(ck[j]) == "1") theWorld.addRule(&AgentMovement);
else if(rules.value(ck[j]) == "2") theWorld.addRule(&AgentEat);
else if(rules.value(ck[j]) == "3") theWorld.addRule(&AgentSleep);
// etc.
else error("internal error: weird rules value %s\n", rules.value(ck[j]));
The main advantage of this pattern, in my experience, is that it is crystal clear: anybody, including you in a year, understands immediately what's going on and can see which criterion leads to which action. It is also trivial to debug, which can be a surprising advantage: you can break at a specific action, and only at that action.
The main disadvantage is maintainability. If the same criterion (enum or whatever) is used to switch between different things in various places, all these places have to be maintained, for example when a new enum value is added. An action may come with a sound, an icon, a state change, a log message, and so on. If these do not happen at the same time (in the same switch), you'll end up switching multiple times over the action enum (or if/then/else over the string values). In that case it's better to bundle all information connected to an action in a data structure and put the structures in a map/hash table with the actions as keys. All the switches collapse to single calls. The initialization of such a map in code could look like this:
struct ActionDataT { Rule* rule; Icon icon; Sound sound; };  // pointer, to match addRule(&AgentMovement)
map<string, ActionDataT> actionMap
= {
    {"1", {&AgentMovement, moveIcon, moveSound} },
    {"2", {&AgentEat, eatIcon, eatSound } },
    // ...
};
The usage would be like
for(int j = 0; j < ck.size(); j++)
theWorld.addRule(actionMap[rules.value(ck[j])].rule);
And elsewhere, for example:
if(actionFinished(action)) removeIcon(actionMap[action].icon);
This is fairly elegant. It demonstrates two principles of software design: 1. "All problems in computer science can be solved by another level of indirection" (David Wheeler), and 2. There is often a choice between more data or more code. The simplistic approach is code-oriented, the map approach is data oriented.
The data-centrist approach is indispensable if switches occur in more than one situation, because coding them out each time would be a maintenance nightmare.
Note that with the data-centrist approach none of the places where an action is used has to be touched when a new action is added. This is essential. The mechanism resembles (in principle and implementation, actually) the call of a virtual member function. The calling code doesn't know and isn't really interested in what is actually done. Responsibility is transferred to the object. The calling code may perform actions later in the life cycle of a program which didn't exist when it was written. By contrast, compare it to a program with many explicit switches where every single use must be examined when an action is added.
The indirection involved in the data-centrist approach is its disadvantage though, and the only problem which cannot be solved by another level of indirection, as Wheeler remarked. The code becomes more abstract and hence less obvious and harder to debug.
You have to provide the mapping from the names to the object by yourself. I would wrap it into a class, something like this:
#include <map>
#include <string>

template <typename T>
struct ObjectMap {
    void addObject(const std::string& name, T* obj) {
        m[name] = obj;
    }
    // Not const: it may hand out a mutable reference to the dummy member.
    T& getRef(const std::string& name) {
        auto x = m.find(name);
        if (x != m.end()) { return *(x->second); }
        else { return dummy; }
    }
private:
    std::map<std::string, T*> m;
    T dummy;  // returned when the name is not in the map
};
The problem with this approach is that you have to decide what to do if an object is requested that is not actually in the map. A reference always has to refer to something (in contrast to a pointer, which can be null). I decided to return a reference to a dummy object. However, you might want to consider using pointers instead of references. Another option might be to throw an exception if the object is not in the map.
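The pointer-based alternative mentioned above might be sketched as follows. RuleRegistry is a hypothetical name, and Rule here is only a stand-in for whatever base type theWorld.addRule actually accepts.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for the question's rule base class (assumption).
struct Rule {
    virtual ~Rule() = default;
};

// Name-to-object registry that reports an unknown name as nullptr
// instead of handing back a dummy object.
class RuleRegistry {
public:
    void add(const std::string& name, Rule* rule) { rules_[name] = rule; }

    Rule* find(const std::string& name) const {
        const auto it = rules_.find(name);
        return it == rules_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, Rule*> rules_;  // non-owning pointers
};
```

In the question's loop this would presumably be used as: look up ck[j] (converted with toStdString() if it is a QString), and call theWorld.addRule only when the result is not null.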

How replace If-else block condition

In my code I have an if-else block condition like this:
public String method (Info info) {
if (info.isSomeBooleanCondition) {
return "someString";
}
else if (info.isSomeOtherCondition) {
return "someOtherString";
}
else if (info.anotherCondition) {
return "anotherStringAgain";
}
else if (lastCondition) {
return "string ...";
}
else return "lastButNotLeastString";
}
Each conditional branch returns a String.
Since if-else statements are difficult to read, test and maintain, how can I replace?
I was thinking to use Chain Of Responsability Pattern, is it right in this case?
Is there any other elegant way that I can do that?
I assume that your code does not live in the Info class, since info is passed in and referenced for all but the last condition. My first instinct would be to turn String OtherClass.method(Info) into String Info.method() and have it return the appropriate string.
Next, I would take a look at the conditions. Are they really conditions, or can they be mapped to a table? Whenever I see code performing a lookup such as this, I tend to try fitting it into a dictionary or map so I can look the value up.
If you are left with conditions that must be checked, then I would begin thinking about lambdas, delegates or a custom interface. A series of if..then checks on the same type can easily be represented that way. You would then collect the checks and execute them in order. IMO, this would make the if..then bunch much clearer. It is more code, but that is secondary at this point.
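interface IInfoCheck
{
    bool TryCheck(Info info, out string result);
}
// One small class per condition
class SomeBooleanCheck : IInfoCheck
{
    public bool TryCheck(Info info, out string result)
    {
        result = info.isSomeBooleanCondition ? "someString" : null;
        return result != null;
    }
}
public OtherClass()
{
    // Set up the checks, in priority order
    CheckerCollection.Add(new SomeBooleanCheck());
    // ... one entry per condition
}
public String method(Info info) {
    foreach (IInfoCheck ic in CheckerCollection)
    {
        String result = null;
        if (ic.TryCheck(info, out result))
        {
            return result;
        }
    }
    return "lastButNotLeastString"; // no check matched
}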
The problem statement does not fit an ideal chain of responsibility scenario, because these are either/or conditions that look 'chained' but actually are not. The reason: in the chain of responsibility pattern one processes all the chain-links irrespective of what happened in the previous links, i.e. no chain-links are skipped (you can configure which chain-links to process, but the execution of a chain-link still does not depend on the outcome of a previous one). In this if-else-if scenario, however, once an if condition matches, the remaining conditions are not evaluated.
I have thought of an alternative design which achieves the above without if-else. It is lengthier, but also more flexible.
Let's say we have a functional interface IfElseReplacer which takes an Info as input and gives a String as output.
public interface IfElseReplacer {
    String executeCondition(Info info);
}
Then the above conditions can be rephrased as lambda expressions that look like this:
(Info info) -> info.someCondition ? "someString" : "lastButNotLeastString"
(Info info) -> info.anotherCondition ? "someOtherString" : "lastButNotLeastString"
and so on...
Then we need a processConditions method to process these lambdas. It could be a default method in IfElseReplacer:
default String processConditions(List<IfElseReplacer> ifElseReplacerList, Info info) {
    String strToReturn = "lastButNotLeastString";
    for (IfElseReplacer ifElseRep : ifElseReplacerList) {
        strToReturn = ifElseRep.executeCondition(info);
        if (!"lastButNotLeastString".equals(strToReturn)) {
            break; // a condition matched (a value other than "lastButNotLeastString" came back), so exit the loop
        }
    }
    return strToReturn;
}
What remains (I am skipping the code for this; let me know if you need it), at the place where the if-else conditions need to be checked:
- Create the lambda expressions as explained above and add them to a list of type IfElseReplacer.
- Pass this list to the default method processConditions() along with an instance of Info.
- The default method returns the String value, which is the same as the result of the if-else-if block given in the problem statement.
I'd simply factor out the returns:
return
info.isSomeBooleanCondition ? "someString" :
info.isSomeOtherCondition ? "someOtherString" :
info.anotherCondition ? "anotherStringAgain" :
lastCondition ? "string ..." :
"lastButNotLeastString"
;
From the limited information about the problem and the code given, this looks like a case of type-switching. The default solution would be to use inheritance for that:
abstract class Info {
    public abstract String method();
}
class BooleanCondition extends Info {
    public String method() {
        return "something";
    }
}
class SomeOther extends Info {
    public String method() {
        return "somethingElse";
    }
}
Patterns which are interesting in this case are Decorator, Strategy and Template Method. Chain of Responsibility has a different focus: each element in the chain implements logic to process some commands, and when chained, an object forwards the command if it cannot process it. This gives a loosely coupled structure for processing commands where no central dispatch is needed.
If computing the string on the conditions is an operation, and from the name of the class I am guessing that it is probably an expression tree, you should look at the Visitor pattern.

How to select an action depending on the value of a pair of enums?

I'm implementing a model of a protocol in C++ (cache coherency protocol to be particular, but it does not matter for this question)
The protocol takes two values : a previous_state and a message_type. Both are enums. The protocol should select a unique action for each combination of the two inputs. Some combinations are invalid (an error should be displayed), and a few combinations are to be stalled.
What is a good way to code the above scenario in C++? I can think of : Two nested switch blocks to select an input combination, and call a particular action implemented as a function.
Is there some more elegant and flexible way to code the above scenario? It should ideally be easy to add/remove input combinations from the protocol.
Thanks for any advice. (I'm new to design patterns, and don't know any that fits here)
Let's assume the two enums are 32-bit values. I'd do something like this:
void doit(E1 previous_state, E2 message_type) {
// Pack both enum values into one 64-bit key; the macro must use its own arguments.
#define COMBINE(x, y) ((static_cast<int64_t>(x) << 32) | static_cast<int64_t>(y))
    switch (COMBINE(previous_state, message_type)) {
    case COMBINE(e1value1, e2value1):
        // ...
        break;
    case COMBINE(e1value4, e2value3):
        // ...
        break;
    // ... more cases ...
    default:
        // report error
        break;
    }
#undef COMBINE
}
Don't assume this will generate faster code though--switch statements are often optimized into jump tables, but tricks like this may defeat that. If you're mostly interested in the best possible performance, you'll have to experiment and find out which is best on your system (noting that changing the int64_t to a smaller type and minimizing the shifts in my example may have some effect).
Why not use a simple two-dimensional array? For example
enum Previous_state
{
    state_1 = 0,
    state_2,
    ...,
    state_n,
    PreviousLastValue
};
enum Message_type
{
    type_1 = 0,
    type_2,
    ...,
    type_n,
    TypeLastValue
};
...
Action actions[PreviousLastValue][TypeLastValue] = {NULL};
void SetAction(Previous_state state, Message_type type, Action action)
{
    actions[state][type] = action;
}
void RemoveAction(Previous_state state, Message_type type)
{
    actions[state][type] = 0;
}
Action GetAction(Previous_state state, Message_type type)
{
    if(actions[state][type] == 0)
    {
        //display error
    }
    return actions[state][type];
}
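When most combinations are invalid, a sparse table keyed on the pair of enums may be easier to maintain than a full array. The following is a sketch under assumptions: the enum values, the Outcome type and the TransitionTable class are invented for illustration and are not part of the protocol in the question.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Placeholder enums; the real protocol states and message types go here.
enum class State   { Invalid, Shared, Modified };
enum class Message { Read, Write, Evict };

enum class Outcome { Done, Stalled, Error };
using Action = std::function<Outcome()>;

class TransitionTable {
public:
    void set(State s, Message m, Action a) { table_[Key{s, m}] = std::move(a); }
    void remove(State s, Message m) { table_.erase(Key{s, m}); }

    // Combinations without an entry are treated as invalid.
    Outcome dispatch(State s, Message m) const {
        const auto it = table_.find(Key{s, m});
        return it == table_.end() ? Outcome::Error : it->second();
    }

private:
    using Key = std::pair<State, Message>;
    std::map<Key, Action> table_;
};
```

Adding or removing a combination is a single set or remove call, and a stalled combination is just an action that returns Outcome::Stalled.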