variably modified type in char[] - c++

I have this struct. What I am trying to do is to have a contiguous block of RAM so I can memcpy the nodes to the hard drive. I have a dynamically created string which I will use as a key. I want to create a struct that can do this, so I used templates and made this:
template <class ItemType> struct INXM_Node {
    ItemType key;
    int left;
    int right;
    int next; // Used for queue.
};
I was running:
INXM_Node<char[100]> *root = new INXM_Node<char[100]>();
Everything was fine until I tried to replace 100 with a variable. Then I got this error:
'char [(((long unsigned int)(((long int)attrLength) - 1)) + 1u)]' is a variably modified type
What I ran was:
sizeof(INXM_Node<char[attrLength]>);
I am taking attrLength as a function argument.
I need to generate multiple structs with different char arrays.

The problem is that the compiler needs to know what ItemType is at compile time; when you use a run-time variable, it cannot. The compiler generates code for each ItemType that is actually used in your program, and with a variable-length char array it does not know how much memory to allocate for that particular ItemType. You might consider using std::string instead.

The type you use to instantiate a template must be fixed at compile time. When you compile code that uses a template, the compiler emits specific code for each type the template is used with. This can't be done at run time (there might not even be a compiler available), and it would be unreasonable, indeed impossible, to do it for every possible type at compile time.
I think you're taking the wrong approach to your problem in general, though. It would be better to use std::string as the key if the size needs to vary at run time, and to use something like Boost.Serialization to save your data to disk portably and safely.
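As a minimal sketch of that suggestion (mine, not the answerer's): a node keyed by std::string handles any run-time key length, but note that such a node is no longer trivially copyable, so it cannot be memcpy'd to disk as a single block; the fields have to be written out individually or through a serialization library.
#include <string>
struct INXM_StringNode {
    std::string key;   // runtime-sized key instead of char[N]
    int left;
    int right;
    int next;          // Used for queue.
};
int main() {
    INXM_StringNode *root = new INXM_StringNode{"some-key", -1, -1, -1};
    delete root;
    return 0;
}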

Related

Build a compile time command look up table using template metaprogramming

I am trying to build a command parser for an embedded system (bare metal) that will receive commands via messages and call the corresponding function. The structure will look like:
struct cmdparse {
    char* commandname;
    function_pointer; // placeholder for some function pointer type
};
Initially the respective modules will register the commands they serve and the corresponding function pointers, and the command parser builds the look-up table during initialisation. Whenever a command is received, it scans the table and calls the corresponding function. Is it possible to build this look-up table at compile time using template metaprogramming? The main advantage I am expecting is that whenever a new command is added, I don't need to check the command parser to see whether the array size needs to be increased. Since it is an embedded system project, the usage of vector is banned due to its dynamic memory requirements. Also, if this look-up table goes to ROM instead of RAM, it adds a safety benefit by avoiding unintentional corruption.
If you have a decent compiler (enable at least C++11) you can build it at compile time with:
struct cmdparse {
    const char* commandname;
    void (*fn)();
};
void whatever1();
void whatever2();
constexpr cmdparse commands[] = { // <-- compile time
    cmdparse{"cmd1", &whatever1},
    cmdparse{"cmd2", &whatever2}
};
If you don't have a good compiler you may need to remove constexpr - but otherwise this method should work.
Making room for more commands at runtime is perhaps best done in a separate array:
std::array<cmdparse, 1024> dyn_commands; //<-- supports up to 1024 commands
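As a rough illustration (my sketch, not part of the answer), the lookup over the compile-time commands[] table could be a simple linear scan; the dispatch name below is made up:
#include <cstring>
// Scans the (possibly ROM-resident) table; returns false for unknown commands.
bool dispatch(const char* name) {
    for (const cmdparse& c : commands) {
        if (std::strcmp(c.commandname, name) == 0) {
            c.fn();
            return true;
        }
    }
    return false;
}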

Is gcc compile time proportional to number of executions or lines of code?

I am using gcc 4.8.5 to compile C++98 code. My C++ code statically initializes an unordered_map of unordered_maps with ~20,000 total key-value pairs, and an overloaded function which will take ~450 different types. This program will be executed on a continuous stream of data, and for every block of data, the overloaded function will return an output.
The problem is that gcc takes too long to compile due to initializing the ~20,000 key-value pairs.
The nested unordered_map has a structure of map< DATATYPE, map< key, value >>, and only one of the overloaded functions gets called for each data input. In other words, I do not need to statically initialize the entire nested map; I can instead dynamically define map<key, value> for the corresponding datatype when needed. For example, I can check for the definition of a map and, when it is undefined, populate it later at run time. This will result in a map with ~45 key-value pairs on average.
However, I know that dynamic initialization will require longer code. For a simple execution as described above (statically initializing the entire map), will another method such as dynamic initialization significantly reduce compile time? My understanding is that whatever alternative I take, I still need to write code to populate the entire set of key-value pairs. Also, the overhead and actual computation that goes into populating an unordered_map (hashmap) should not differ asymptotically in most cases, and should not be significantly different from running the same number of loops to increment a value.
For reference, I am writing a Python script that reads in multiple JSON files and prints out the C++ code, which then gets compiled using gcc. I am not reading the JSON directly from C++, so whatever I do, the C++ source will need to insert the key-value pairs one by one because it will not have access to the JSON files.
// Below is someEXE.cpp, which is generated by the Python script.
// Every line is inside Python's print "" (using Python 2.7)
// so that it writes out complete C++ that should compile.
someEXE.cpp
// example of an overloaded function among ~450
// takes in a pointer to the data and the exampleMap created in main
void exampleFunction(DIFFERENT_TYPE1 *data,
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> exampleMap) {
    printf("this is in specific format: %s", exampleMap["DATATYPE1"]
        [std::to_string(data->member_variable)].c_str());
    // ... more print functions below (~25 per datatype)
}
int main() {
    // current definition of the unordered_map (total ~20,000 pairs)
    std::unordered_map<std::string, std::unordered_map<std::string,
        std::string>> exampleMap = {
        {"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
    };
    // create the test call below for all ~450 types
    // when I run the program, the code will printf values to the screen
    DIFFERENT_TYPE1 testObj = {0};
    DIFFERENT_TYPE1 *testObjPointer = &testObj;
    exampleFunction(testObjPointer, exampleMap);
    return 0;
}
EDIT: My initial question was "Is CMAKE compile time proportional to...". I changed the term "CMAKE" to the actual compiler name, gcc 4.8.5, with help from the comments.
With the further code you posted, and Jonathan Wakely's answer on the specific issue with your compiler, I can make a suggestion.
When writing my own codegen, if possible, I prefer generating plain old data and leaving logic and behaviour in non-generated code. This way you get a small(er) piece of pure C++ code written in data-driven style, and a separate block of dumb, easy-to-generate data in declarative style.
For example, directly code this
// GeneratedData.h
#include <cstddef> // for size_t
namespace GeneratedData {
    struct Element {
        const char *type;
        const char *key;
        const char *val;
    };
    Element const *rawElements();
    size_t rawElementCount();
}
and this
// main.cpp
#include "GeneratedData.h"
#include <string>
#include <unordered_map>
using Map = std::unordered_map<std::string, std::string>;
using TypeMap = std::unordered_map<std::string, Map>;
TypeMap buildMap(GeneratedData::Element const *el, size_t count)
{
    TypeMap map;
    for (; count; ++el, --count) {
        // build the whole thing here, e.g. map[el->type][el->key] = el->val;
    }
    return map;
}
// rest of main can call buildMap once, and keep the big map.
// NB. don't pass it around by value!
and finally generate the big dumb file
// GeneratedData.cpp
#include "GeneratedData.h"
namespace {
    GeneratedData::Element const array[] = {
        // generated elements here
    };
}
namespace GeneratedData {
    Element const *rawElements() { return array; }
    size_t rawElementCount() { return sizeof(array)/sizeof(array[0]); }
}
If you really want to, you can separate even that logic from your codegen by just #include-ing it in the middle, but it's probably not necessary here.
Original answer
Is CMAKE
CMake.
... compile time
CMake configures a build system which then invokes your compiler. You haven't told us which build system it is configuring for you, but you could probably run it manually for the problematic object file(s), and see how much of the overhead is really CMake's.
... proportional to number of executions or lines of code?
No.
There is some overhead per-execution. Each executed compiler process has some overhead per line of code, but probably much more overhead per enabled optimization, and some optimizations may scale with cyclomatic complexity or other metrics.
statically initializes an unordered_map of unordered_maps with ~20,000 total key-value pairs
You should try to hide your giant initialization as much as possible - you haven't shown any code, but if it's only visible in one translation unit, only one object file will take a very long time to compile.
You could also probably use a codegen tool like gperf to build a perfect hash.
I can't give you a lot more detail without seeing at least a fragment of your actual code and some hint as to how your files and translation units are laid out.
Older versions of GCC take a very long time to compile large initializer-lists like this:
unordered_map<string, unordered_map<string, string>> exampleMap = {
    {"DATATYPE1", {{"KEY1", "VAL1"}, {"KEY2", "VAL2"}, /*...*/}}
};
The problem is that every new element in the initializer-list causes more code to be added to the block being compiled, and it gets bigger and bigger, needing to allocate more and more memory for the compiler's AST. Recent versions have been changed to process the initializer-list differently, although some problems still remain. Since you're using GCC 4.8.5 the recent improvements won't help you anyway.
However, I know that dynamic initialization will require longer code. For a simple execution as described above (statically initializing the entire map), will another method such as dynamic initialization significantly reduce compile time?
Splitting the large initializer-list into separate statements that insert elements one-by-one will definitely reduce the compile time when using older versions of GCC. Each statement can be compiled very quickly that way, instead of having to compile a single huge initialization that requires allocating more and more memory for each element.
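For illustration (my own sketch, not from the answer), the generated code could emit one small insertion statement per pair instead of one giant brace-initializer; the names mirror the question's example:
#include <string>
#include <unordered_map>
std::unordered_map<std::string, std::unordered_map<std::string, std::string>> exampleMap;
// Each statement is tiny, so the compiler never holds a huge initializer-list AST at once.
void populateExampleMap() {
    exampleMap["DATATYPE1"]["KEY1"] = "VAL1";
    exampleMap["DATATYPE1"]["KEY2"] = "VAL2";
    // ... ~20,000 generated lines like the above
}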

Use gcc plugins to modify the order of variable declarations

I know this is very hard to do, and that I should avoid that, but I have my reasons for this.
I want to modify the order of some field declarations at compile time. For example:
class A {
    char c;
    int i;
};
must turn into:
class A {
    int i;
    char c;
};
if I choose to swap the order of i and c.
I want to know how to change the location of a field declaration given its tree.
Does anyone know how I can do this?
Thanks!
I am using g++ 4.9.2 for the plugin.
If I was going to try this, I would try two different approaches.
Hook in to the PLUGIN_FINISH_TYPE event and rewrite the type there. To rewrite it, reorder the fields and force a relayout of the type. You'll have to read a bit of GCC source to understand how to invalidate the layout and force a new one.
If that didn't work, add a new pass that is run just after gimplification, and try to rewrite the types there. I suspect this is not likely to work, though.
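As a rough skeleton of the first approach (my own sketch; the exact set of headers and the relayout call depend on your GCC version, and the actual field reordering is only indicated by a comment):
// plugin.cpp -- minimal GCC plugin hooking PLUGIN_FINISH_TYPE
#include "gcc-plugin.h"
#include "plugin-version.h"
#include "tree.h"
int plugin_is_GPL_compatible;
static void handle_finish_type(void *gcc_data, void * /*user_data*/)
{
    tree type = (tree) gcc_data; // the type that was just finished
    if (TREE_CODE(type) != RECORD_TYPE)
        return;
    // Walk TYPE_FIELDS(type) via DECL_CHAIN(...), reorder the FIELD_DECLs here,
    // then force GCC to lay the type out again (see stor-layout.c in the GCC
    // sources for how the layout is computed).
}
int plugin_init(struct plugin_name_args *plugin_info,
                struct plugin_gcc_version * /*version*/)
{
    register_callback(plugin_info->base_name, PLUGIN_FINISH_TYPE,
                      handle_finish_type, NULL);
    return 0;
}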
Hook in to the PLUGIN_FINISH_TYPE event and rewrite the type there. To rewrite it, reorder the fields and force a relayout of the type. You'll have to read a bit of GCC source to understand how to invalidate the layout and force a new one.
This is implemented in randomize_layout_plugin.c in the Linux kernel.
This solution works, but it breaks the DWARF debug information: in the debug info the order of the members stays the same as initially defined in the source code, while the structure is shuffled in the binary.

small classes how to pass by value and register

I need to use small classes formed essentially from just an integer "handle" and be able to treat that as a class in order to be able to attach methods to it.
At the same time I also want to avoid passing just the address of the handle (the "this" pointer) from one function to another, because then, in order to read a handle that should simply be at hand, I would need to read a memory location to get it.
So essentially I need the "handle" passed by value, possibly in registers (depending on the calling convention).
Some clarifying code is:
struct F {
    int aa, bb, cc;
};
F A[0x100];
struct handle {
    int hhh;
    void elaborateHandle() { /* ... operations ... */ }
};
int main() {
    handle h; h.hhh = 3;
    h.elaborateHandle();
    // I need that call to pass essentially the number 3, and not the address of
    // where the number 3 was saved on the stack.
}
I think you shouldn't worry about it, because the performance loss here is very small; dereferencing one pointer is a cheap operation.
If you use an optimizing compiler, there is a good chance that your method call will be inlined into the caller.
Anyway, if you are trying to optimize performance, you should probably look elsewhere.
But if you really think it causes trouble, there is a way:
Declare the function outside your class (not as a member), and if it needs access to private data, declare it as a friend.
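A minimal sketch of that suggestion (my own wording, reusing the handle struct from the question):
struct handle {
    int hhh;
};
// Free function: the whole handle (one int) is passed by value, so with a
// register-based calling convention it can travel in a register instead of
// being reached through a pointer.
void elaborateHandle(handle h) {
    // ... operations on h.hhh ...
}
int main() {
    handle h{3};
    elaborateHandle(h); // passes the value 3, not the address where 3 is stored
}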
First, print the assembly language of the code that calls your function and the first part of your function.
The assembly language will show how the registers are used. Normally, compilers try to make best use of registers when passing values to functions.
To help the compiler better use registers:
Limit parameter quantities in functions.
The compiler reserves a limited quantity of registers for passing to functions. The more parameters a function has, the less probability that all parameters will be in registers.
Also, the compiler may need to save registers before calling a function in order to pass more parameters to the function.
Pass values that fit inside registers.
If the compiler can't fit a data type in one register, it may use two registers (such as passing 64 bit values on a 32-bit processor).
If the compiler can't fit the data type in two registers, it may push the data on the stack rather than passing by register. This means that the receiving function will have to copy the values from the stack.
Pass large items by reference or pointer. On most platforms, the compiler can store a pointer in a register and pass the register to the receiving function. This is a lot faster than pushing and popping values on the stack. Also, compilers may use pointers to implement references.
Suggest to the compiler to place values in registers.
Although the register keyword may not be available in more recent language versions, using the register keyword with variables suggests to the compiler that you would like to have the variable in a register. It is only a suggestion and the compiler can ignore it and you.
Define variables as close as possible to their point of usage. This allows the compiler to allocate registers when needed rather than reserving them for a while.
Create scope blocks. Using { and } to create new scope blocks will help the compiler allocate and deallocate registers that are used only in a limited area. So if variables are only used in a limited area in a function, place that area in a new scope block. You can even tag those local variables with the register keyword.
Compile with high optimization levels.
Set your compiler's optimization levels high, then check the assembly language.
The compiler may use memory for variable storage when optimization is at the lowest setting (debugging). At higher optimization levels, the compiler starts using registers more effectively.
Remember, print the assembly language of the functions and the calling code before and after playing with optimization levels.
Using the g++ compiler on the x86 platform, I found that the flag "-freg-struct-return" has a different effect than described in the documentation. According to my tests, that flag obliges the compiler to pass structures by value (I didn't verify the exact limit, but this is probably only valid when structures are smaller than a specific size; I checked up to 64 bits and it works when compiling with -m32).
Differently from what the documentation says, structs aren't passed in registers unless a register passing convention is used.
That behaviour also holds for declared or compiler-recognized const methods of structures (or classes).
So if a method doesn't change the structure, then the structure is passed by value (in stack-allocated space or in registers, depending e.g. on whether the function is defined with __attribute__((regparm(3)))).
If instead a structure is modified by a method, then its address is passed to the method rather than the value of the struct (as it should be).
The documentation of that flag is misleading because it says: "Return struct and union values in registers when possible. This is more efficient for small structures than -fpcc-struct-return.
If you specify neither -fpcc-struct-return nor -freg-struct-return, GCC defaults to whichever convention is standard for the target. If there is no standard convention, GCC defaults to -fpcc-struct-return, except on targets where GCC is the principal compiler. In those cases, we can choose the standard, and we chose the more efficient register return alternative."
The testing code I used is below; the effects can be seen by looking at what the disassembler shows.
#include <stdio.h>
int a;
int aa[100];
struct Token {
    short int ind; short int ind1; short int ind2;
    int v() const { return aa[ind]; }
    __attribute__((noinline)) void setind(int i) { ind = i; }
    __attribute__((noinline)) int tok() { return ind; }
};
__attribute__((noinline)) void showIt(Token t) {
    t.ind += t.ind;
    a += t.ind;
    t.ind = 8;
}
Token t0 = {.ind=15};
Token t1 = {.ind=99};
int main(int argc, char **argv)
{
    t0.setind(10);
    int x = 19;
    x = t0.tok();
    showIt(t0);
    t1.setind(20 + x);
    showIt(t1);
    printf("%i\n", a);
    return 0;
}

How do I treat string variables as actual code?

That probably wasn't very clear. Say I have a char *a = "reg". Now, I want to write a function that, on reading the value of a, instantiates an object of a particular class and names it reg.
So for example, say I have a class Register, and a separate function create(char *). I want something like this:
void create(char *s) // s == "reg"
{
    // what goes here?
    Register reg; // <- this should be the result
}
It should be reusable:
void create(char *s) // s == "second"
{
    // what goes here?
    Register second; // <- this should be the result
}
I hope I've made myself clear. Essentially, I want to treat the value in a variable as a separate variable name. Is this even possible in C/C++? If not, anything similar? My current solution is to hash the string, and the hash table would store the relevant Register object at that location, but I figured that was pretty unnecessary.
Thanks!
Variable names are compile-time artifacts. They don't exist at runtime. It doesn't make sense in C++ to create a dynamically-named variable. How would you refer to it?
Let's say you had this hypothetical create function, and wrote code like:
create("reg");
reg.value = 5;
This wouldn't compile, because the compiler doesn't know what reg refers to in the second line.
C++ doesn't have any way to look up variables at runtime, so creating them at runtime is a nonstarter. A hash table is the right solution for this. Store objects in the hash table and look them up by name.
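A minimal sketch of that (mine, reusing the answer's hypothetical value member):
#include <string>
#include <unordered_map>
struct Register { int value; };
std::unordered_map<std::string, Register> registers;
int main() {
    registers["reg"].value = 5;      // "create" and set the object named "reg"
    int v = registers["reg"].value;  // look it up again by name
    return v == 5 ? 0 : 1;
}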
This isn't possible. C++ does not offer any facilities to process code at runtime. Given the nature of a typical C++ implementation (which compiles to machine code ahead of time, losing all information about source code), this isn't even remotely feasible.
Like I said in my comment:
What's the point? A variable name is something the compiler, but most importantly you, the programmer, should care about. Once the application is compiled, the variable name could be anything; it could be mangled and senseless, it doesn't matter anymore.
You read/write code, including var-names. Once compiled, it's down to the hardware to deal with it.
Neither C nor C++ has an eval function.
Simply because you only compile what you need; eval implies input later on that may make no sense, or may require other dependencies.
C and C++ are compiled ahead of time, while eval implies evaluation at run time. The C process would then have to preprocess, compile and link the string in such a way that it still becomes part of the current process...
Even if it were possible, eval is always said to be evil, and that goes double for languages like the C family that are meant to run reliably and are often used for time-critical operations. The right tool for the job and all that...
A hash table whose entries hold a hash, a key, a Register and collision links is the sensible thing to do. It's not that much overhead anyway...
Still feel like you need this?
Look into the vast number of scripting languages that are out there. Perl, Python... they're all better suited to this type of stuff.
If you need some variable creation and lookup you can either:
Use one of the scripting languages, as suggested by others
Do the lookup explicitly yourself. The simplest approach is to use a map which maps a string to your Register object. Then you can have:
#include <map>
#include <string>

// Key by std::string so lookups compare the characters, not the pointer values.
std::map<std::string, Register*> table;

Register* create(const char* name) {
    Register* r = new Register();
    table[name] = r;
    return r;
}

Register* lookup(const char* name) {
    return table[name]; // note: inserts a null entry if the name is unknown
}

void destroy(const char* name) {
    delete table[name];
    table.erase(name);
}
Obviously, each time you want to access a variable created this way, you have to go through the call to lookup.
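For example (my own usage sketch, assuming the Register class from the question):
Register* r = create("reg");     // "creates" the variable named "reg"
Register* same = lookup("reg");  // later, retrieve it by name
destroy("reg");                  // and remove it when no longer needed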