Why is std::map making my code so bloated?

My latest recreational project is to write a Brainfuck interpreter in C++. It was straightforward enough, but today I decided to extend it with a compilation step. The eventual goal is to be able to create executables, but right now all it does is some basic optimization. For instance, +++++ is turned from five "add 1" commands into a single "add 5", and so on.
While this works well, I've noticed that the size of my executable after being stripped has gone from 9k to 12k. After some research I've determined the function below is to blame, specifically the map. I do not understand why.
void Brainfuck::compile(const string& input) {
    map<char, pair<Opcode, int>> instructions {
        { '<', make_pair(Opcode::MOVL, 1) },
        { '>', make_pair(Opcode::MOVR, 1) },
        { '+', make_pair(Opcode::INCR, 1) },
        { '-', make_pair(Opcode::DECR, 1) },
        { '[', make_pair(Opcode::WHILE, 0) },
        { ']', make_pair(Opcode::WEND, 0) },
        { ',', make_pair(Opcode::INP, 0) },
        { '.', make_pair(Opcode::OUTP, 0) },
    };
    string::const_iterator c = begin(input);
    while (c != end(input)) {
        auto instruction = instructions.find(*c);
        if (instruction != end(instructions)) {
            // makeOp consumes the run starting at c and advances the iterator
            makeOp(c, instruction->second.first, instruction->second.second);
        } else {
            ++c;
        }
    }
}
The key in the map is one of the 8 valid Brainfuck operations. The function goes through the input string and looks for these valid characters; an invalid character is just skipped, as per the Brainfuck spec. If it finds a valid character, it passes the map value for that key to a function called makeOp, which does the optimization, turns it into an op struct, and appends it to the vector of instructions that my interpreter will actually execute.
The op struct has two members: an Opcode, which is a scoped enum based on uint8_t representing one of the 8 operations, and one int containing the operand. (One operand for now; a future, more sophisticated version might have instructions with multiple operands.) So in the +++++ example above, the struct would contain Opcode::INCR and 5.
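For reference, the struct looks roughly like this (a simplified sketch):

#include <cstdint>

// Simplified sketch of the struct described above.
enum class Opcode : uint8_t { MOVL, MOVR, INCR, DECR, WHILE, WEND, INP, OUTP };

struct Op {
    Opcode opcode;   // one of the 8 Brainfuck operations
    int    operand;  // e.g. 5 for the collapsed "+++++" run
};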
So the value of each map entry is a std::pair consisting of the Opcode, and the number of operands. I realize that some overhead is inevitable with a generic data structure but given the simplicity of the description above, don't you think 3k is a bit excessive?
Or perhaps this is not the right approach to achieve my aim efficiently? But if not a std::map then which data structure should I use?
Update:
Thanks to all those who responded. First, a few words about my motives. I don't use C++ in my day job that often, so I'm doing some recreational projects to keep my knowledge sharp. Having the absolute smallest code size is not as important as, say, clarity, but it is interesting to me to learn how this kind of bloat happens.
Following the advice given, my function now looks like this:
static const int MAXOPS = 8;

void Brainfuck::compile(const string& input) {
    array<tuple<char, Opcode, int>, MAXOPS> instructions {
        make_tuple('<', Opcode::MOVL, 1),
        make_tuple('>', Opcode::MOVR, 1),
        make_tuple('+', Opcode::INCR, 1),
        make_tuple('-', Opcode::DECR, 1),
        make_tuple('[', Opcode::WHILE, 0),
        make_tuple(']', Opcode::WEND, 0),
        make_tuple(',', Opcode::INP, 0),
        make_tuple('.', Opcode::OUTP, 0),
    };
    string::const_iterator c = begin(input);
    while (c != end(input)) {
        auto instruction = find_if(begin(instructions), end(instructions),
            [&c](const auto& i) { return *c == get<0>(i); });
        if (instruction != end(instructions)) {
            makeOp(c, get<1>(*instruction), get<2>(*instruction));
        } else {
            ++c;
        }
    }
}
I recompiled and... nothing happened. Then I remembered that I had another method which contained a map, and a (now deleted?) response suggested that merely having an instantiation of map would be enough to add code. So I changed that map to an array as well and recompiled again. This time the stripped size of my executable went from 12280 to 9152.

std::map internally uses a balanced tree for storing the elements. Balanced trees are fast but require some code overhead for re-balancing the tree on insertions and deletions. So 3k for this code seems reasonable.
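For comparison, here is a rough sketch of the same character-to-instruction lookup done with a plain switch, which pulls in no container machinery at all (the Opcode names are taken from the question; how much of the 3k it actually saves depends on what else in the binary still instantiates map):

#include <cstdint>

enum class Opcode : uint8_t { MOVL, MOVR, INCR, DECR, WHILE, WEND, INP, OUTP }; // as in the question

// Returns false for characters that are not Brainfuck commands.
bool lookup(char ch, Opcode& op, int& operand)
{
    switch (ch) {
    case '<': op = Opcode::MOVL;  operand = 1; return true;
    case '>': op = Opcode::MOVR;  operand = 1; return true;
    case '+': op = Opcode::INCR;  operand = 1; return true;
    case '-': op = Opcode::DECR;  operand = 1; return true;
    case '[': op = Opcode::WHILE; operand = 0; return true;
    case ']': op = Opcode::WEND;  operand = 0; return true;
    case ',': op = Opcode::INP;   operand = 0; return true;
    case '.': op = Opcode::OUTP;  operand = 0; return true;
    default:  return false;
    }
}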

Related

C++ if statement order

A portion of a program needs to check if two C-strings are identical while searching through an ordered list (e.g. {"AAA", "AAB", "ABA", "CLL", "CLZ"}). It is feasible that the list could get quite large, so small improvements in speed are worth some degradation of readability. Assume that you are restricted to C++ (please don't suggest switching to assembly). How can this be improved?
typedef char StringC[5];

void compare(const StringC stringX, const StringC stringY)
{
    // use a variable so compareResult won't have to be computed twice
    int compareResult = strcmp(stringX, stringY);
    if (compareResult < 0) // roughly 50% chance of being true, so check this first
    {
        // no match. repeat with a 'lower' value string
        compare(stringX, getLowerString());
    }
    else if (compareResult > 0) // roughly 49% chance of being true, so check this next
    {
        // no match. repeat with a 'higher' value string
        compare(stringX, getHigherString());
    }
    else // roughly 1% chance of being true, so check this last
    {
        // match
        reportMatch(stringY);
    }
}
You can assume that stringX and stringY are always the same length and you won't get any invalid data input.
From what I understand, a compiler will arrange the code so that the CPU checks the first if-statement and jumps if it's false, so it would be best if that first statement is the most likely to be true, as jumps interfere with the pipeline. I have also heard that when doing a compare, a[n Intel] CPU will do a subtraction and look at the status flags without saving the subtraction's result. Would there be a way to do the strcmp once, without saving the result into a variable, but still being able to check that result during both of the first two if-statements?
std::binary_search may help:
#include <algorithm>
#include <iostream>
#include <iterator>

bool cstring_less(const char (&lhs)[4], const char (&rhs)[4])
{
    return std::lexicographical_compare(std::begin(lhs), std::end(lhs),
                                        std::begin(rhs), std::end(rhs));
}

int main(int, char**)
{
    const char cstrings[][4] = {"AAA", "AAB", "ABA", "CLL", "CLZ"};
    const char lookFor[][4] = {"BBB", "ABA", "CLS"};

    for (const auto& s : lookFor)
    {
        if (std::binary_search(std::begin(cstrings), std::end(cstrings),
                               s, cstring_less))
        {
            std::cout << s << " Found.\n";
        }
    }
}
I think using hash tables can improve the speed of comparison drastically. Also, if your program is multithreaded, you can find some useful hash tables in the Intel Threading Building Blocks (TBB) library. For example, tbb::concurrent_unordered_map has the same API as std::unordered_map.
I hope this helps.
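For the single-threaded case, a minimal sketch with the standard library's unordered_set (the strings are the ones from the question):

#include <iostream>
#include <string>
#include <unordered_set>

int main()
{
    // Build the hash table once; each lookup is then O(1) on average.
    const std::unordered_set<std::string> strings{"AAA", "AAB", "ABA", "CLL", "CLZ"};

    for (std::string s : {"BBB", "ABA", "CLS"})
    {
        if (strings.count(s) != 0)
            std::cout << s << " found.\n";
    }
}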
If you try to compare all the strings to each other, you'll run into an O(N*(N-1)) problem. The best thing, since you say the lists can grow large, is to sort them (quicksort is O(N*log(N))) and then compare each element with the next one in the list, which adds O(N), giving O(N*log(N)) total complexity. Since you already have the list ordered, you can just traverse it (making the whole thing O(N)), comparing each element with the next. An example, valid in both C and C++, follows:
int i;
for (i = 0; i < N-1; i++)   /* one comparison less than the number of elements */
    if (strcmp(array[i], array[i+1]) == 0)
        break;
if (i < N-1) {   /* this is a premature exit from the loop, so we found a match */
    /* found a match, array[i] equals array[i+1] */
} else {         /* we exhausted all comparisons and got out of the loop normally */
    /* no match found */
}

Searching through a vector

I have a vector, vector<Gate*> PI_Gates, whose elements can have the values 0, 1, D, D` (D Not), and X. Depending on the combination of values in the vector, I will set an output to some value. Here is an example of what I have done so far:
if (GateType == GATE_NAND)
{
    for (int i = 0; i < PI_Gates.size(); i++)
    {
        if (PI_Gates[i]->getValue() == LOGIC_DBAR)
        {
            MyGate->setValue(LOGIC_D);
        }
        else if (PI_Gates[i]->getValue() == LOGIC_ZERO)
        {
            MyGate->setValue(LOGIC_ONE);
        }
        else if (PI_Gates[i]->getValue() == LOGIC_X)
        {
            MyGate->setValue(LOGIC_X);
        }
        else if (PI_Gates[i]->getValue() == LOGIC_ONE)
        {
        }
    }
}
This code is for analyzing a circuit at the gate level and outputting the results. For the cases where the input value is D, 0, or X, my output is determined at that time, since those values have logic priority. However, for this particular logic gate, if all my inputs are logic 1, then my output will be logic zero. Likewise, when all the inputs are logic D, my output will be logic D not. If the inputs are a combination of logic 1 and logic D, the output is still logic D not. As a result, I need to find a way to search through my vector and determine if my inputs are the following: all logic 1, all logic D, or a combination of logic 1 and D. And this is where I am stuck. I cannot come up with a good approach to this.
I guess in simpler terms, it's like you have a vector that contains the numbers 1, 2, 3, 4, and 5 and you want to check whether all the values are either 4 or 5, and if they are, do some operation once you are done iterating through the vector. The solution may be something simple, but I think I am overcomplicating it in my head.
Thanks in advance.
For what you are doing, a simple lookup would work:
const std::string valid_values[] =
    { "0", "1", "D", "/D" /* D Not */, "X" };
//...
const std::string* const end_ptr = valid_values + sizeof(valid_values) / sizeof(valid_values[0]);
const std::string value = PI_Gates[i]->getValue();
const std::string* position = std::find(&valid_values[0],  // begin
                                        end_ptr,           // end
                                        value);
if (position != end_ptr)
{
    // The gate value was found in the table
    MyGate->setValue(value);
}
If you want more versatility, you could use a std::map<string, function pointer> and associate functions with gate names. A table of structures containing the gate name and function pointer would work too.
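A rough sketch of that map-of-handlers idea (the handlers here are just placeholders printing a message; in the real program they would update MyGate):

#include <functional>
#include <iostream>
#include <map>
#include <string>

int main()
{
    // One handler per gate value; keys match the valid_values table above.
    std::map<std::string, std::function<void()>> dispatch {
        { "0", []{ std::cout << "handle 0\n"; } },
        { "1", []{ std::cout << "handle 1\n"; } },
        { "D", []{ std::cout << "handle D\n"; } },
    };

    const std::string value = "1";   // e.g. the result of PI_Gates[i]->getValue()
    auto it = dispatch.find(value);
    if (it != dispatch.end())
        it->second();                // run the handler associated with this value
}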

Need suggestion to improve speed for word break (dynamic programming)

The problem is: Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "hithere",
dict = ["hi", "there"].
Return true because "hithere" can be segmented as "hi there".
My implementation is as below. This code is ok for normal cases. However, it suffers a lot for input like:
s = "aaaaaaaaaaaaaaaaaaaaaaab", dict = {"aa", "aaaaaa", "aaaaaaaa"}.
I want to memoize the processed substrings; however, I cannot get it right. Any suggestions on how to improve? Thanks a lot!
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int len = s.size();
        if (len < 1) return true;
        for (int i(0); i < len; i++) {
            string tmp = s.substr(0, i + 1);
            if ((wordDict.find(tmp) != wordDict.end())
                && (wordBreak(s.substr(i + 1), wordDict)))
                return true;
        }
        return false;
    }
};
It's logically a two-step process. Find all dictionary words within the input, consider the found positions (begin/end pairs), and then see if those words cover the whole input.
So you'd get for your example
aa: {0,2}, {1,3}, {2,4}, ... {20,22}
aaaaaa: {0,6}, {1,7}, ... {16,22}
aaaaaaaa: {0,8}, {1,9} ... {14,22}
This is a graph, with nodes 0-23 and a bunch of edges. But node 23, the position just past the b, is entirely unreachable - it has no incoming edge. This is now a simple graph theory problem.
Finding all places where dictionary words occur is pretty easy if your dictionary is organized as a trie. But even a std::map is usable, thanks to its equal_range method. You have what appears to be an O(N*N) nested loop over begin and end positions, with an O(log N) lookup for each word. But you can quickly determine whether s.substr(begin, end) is still a viable prefix, and which dictionary words remain with that prefix.
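For instance, a viable-prefix test can look roughly like this (sketched here with a sorted std::set and lower_bound rather than a trie):

#include <set>
#include <string>

// A prefix is still viable if at least one dictionary word starts with it.
bool viablePrefix(const std::set<std::string>& dict, const std::string& prefix)
{
    auto it = dict.lower_bound(prefix);   // first word that is not less than the prefix
    return it != dict.end()
        && it->compare(0, prefix.size(), prefix) == 0;
}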
Also note that you can build the graph lazily. Starting at begin=0 you find edges {0,2}, {0,6} and {0,8} (and no others). You can now search from nodes 2, 6 and 8. You even have a good algorithm - A* - that suggests you try node 8 first (reachable in just 1 edge). Thus, you'll find edges {8,10}, {8,14}, {8,16} etc. As you can see, you'll never need to build the part of the graph that contains {1,3}, as it's simply unreachable.
Using graph theory, it's easy to see why your brute-force method breaks down. You arrive at node 8 (aaaaaaaa.aaaaaaaaaaaaaab) repeatedly, and each time search the subgraph from there on.
A further optimization is to run bidirectional A*. This would give you a very fast solution. In the second half of the first step, you look for edges leading into node 23, i.e. dictionary words ending in b. As none exist, you immediately know that node 23 is isolated.
In your code, you are not using dynamic programming because you are not remembering the subproblems that you have already solved.
You can enable this remembering, for example, by storing results keyed by the starting position of the string s within the original string, or even by its length (the strings you work with are all suffixes of the original string, so the length uniquely identifies one). Then, at the beginning of your wordBreak function, just check whether that length has already been processed and, if it has, do not rerun the computation; just return the stored value. Otherwise, run the computation and store the result.
Note also that your approach with unordered_set will not allow you to obtain the fastest solution. The fastest solution that I can think of is O(N^2) by storing all the words in a trie (not in a map!) and following this trie as you walk along the given string. This achieves O(1) per loop iteration not counting the recursion call.
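A minimal sketch of that memoization, keyed by the start index of the suffix (this shows the idea only, not a drop-in replacement for the asker's class):

#include <string>
#include <unordered_map>
#include <unordered_set>
using namespace std;

// memo[start] caches whether s[start..] can be segmented.
bool wordBreakFrom(const string& s, size_t start,
                   const unordered_set<string>& dict,
                   unordered_map<size_t, bool>& memo)
{
    if (start == s.size()) return true;
    auto cached = memo.find(start);
    if (cached != memo.end()) return cached->second;   // this suffix was already solved

    bool ok = false;
    for (size_t len = 1; !ok && start + len <= s.size(); ++len)
        if (dict.count(s.substr(start, len)))
            ok = wordBreakFrom(s, start + len, dict, memo);

    return memo[start] = ok;
}

bool wordBreak(const string& s, const unordered_set<string>& dict)
{
    unordered_map<size_t, bool> memo;
    return wordBreakFrom(s, 0, dict, memo);
}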
Thanks for all the comments. I changed my previous solution to the implementation below. At this point, I haven't explored optimizing the dictionary, but those insights are very valuable and very much appreciated.
For the current implementation, do you think it can be further improved? Thanks!
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict) {
        int len = s.size();
        if (len < 1) return true;
        if (wordDict.size() == 0) return false;
        vector<bool> dq(len + 1, false);
        dq[0] = true;
        for (int i(0); i < len; i++) {             // start point
            if (dq[i]) {
                for (int j(1); j <= len - i; j++) { // length of substring, 1..len
                    if (!dq[i + j]) {
                        auto pos = wordDict.find(s.substr(i, j));
                        dq[i + j] = dq[i + j] || (pos != wordDict.end());
                    }
                }
            }
            if (dq[len]) return true;
        }
        return false;
    }
};
Try the following:
class Solution {
public:
    bool wordBreak(string s, unordered_set<string>& wordDict)
    {
        if (s.empty()) return true;   // an empty remainder means the whole string was segmented
        for (const auto& w : wordDict)
        {
            auto pos = s.find(w);
            if (pos != string::npos)
            {
                if (wordBreak(s.substr(0, pos), wordDict) &&
                    wordBreak(s.substr(pos + w.size()), wordDict))
                    return true;
            }
        }
        return false;
    }
};
Essentially, once you find a match, remove the matching part from the input string and continue testing on the smaller input.

Design Pattern For Making An Assembler

I'm making an 8051 assembler.
Before everything there is a tokenizer which reads the next token, sets error flags, recognizes EOF, etc.
Then there is the main loop of the compiler, which reads the next tokens and checks for valid mnemonics:
mnemonic = NextToken();
if (mnemonic.Error)
{
    // throw some error
}
else if (mnemonic.Text == "ADD")
{
    ...
}
else if (mnemonic.Text == "ADDC")
{
    ...
}
And it continues like this for several more cases. Worse than that is the code inside each case, which checks for valid parameters and then converts them to compiled code. Right now it looks like this:
if (mnemonic.Text == "MOV")
{
    arg1 = NextToken();
    if (arg1.Error) { /* throw error */ break; }
    arg2 = NextToken();
    if (arg2.Error) { /* throw error */ break; }
    if (arg1.Text == "A")
    {
        if (arg2.Text == "B")
            output << 0x1234; // Example compiled code
        else if (arg2.Text == "#B")
            output << 0x5678; // Example compiled code
        else
            /* throw "Invalid parameters" */
    }
    else if (arg1.Text == "B")
    {
        if (arg2.Text == "A")
            output << 0x9ABC; // Example compiled code
        else if (arg2.Text == "#A")
            output << 0x0DEF; // Example compiled code
        else
            /* throw "Invalid parameters" */
    }
}
For each of the mnemonics I have to check for valid parameters and then create the correct compiled code. Very similar code for checking the valid parameters repeats in each mnemonic's case.
So is there a design pattern for improving this code?
Or simply a simpler way to implement this?
Edit: I accepted plinth's answer, thanks to him. Still, if you have other ideas on this, I will be happy to learn about them. Thanks, all.
I've written a number of assemblers over the years doing hand parsing and frankly, you're probably better off using a grammar language and a parser generator.
Here's why - a typical assembly line will probably look something like this:
[label:] [instruction|directive][newline]
and an instruction will be:
plain-mnemonic|mnemonic-withargs
and a directive will be:
plain-directive|directive-withargs
etc.
With a decent parser generator like Gold, you should be able to knock out a grammar for 8051 in a few hours. The advantage to this over hand parsing is that you will be able to have complicated enough expressions in your assembly code like:
.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)
which can be a real bear to do by hand.
If you want to do it by hand, make a table of all your mnemonics which also includes the addressing modes each one supports and, for each addressing mode, the number of bytes that variant takes and its opcode. Something like this:
typedef enum {
    Implied = 1, Direct = 2, Extended = 4, Indexed = 8 // etc
} AddressingMode;

/* For a 4 char mnemonic, this struct will be 5 bytes. A typical small processor
 * has on the order of 100 instructions, making this table come in at ~500 bytes when all
 * is said and done.
 * The time to binary search that will be, worst case, 8 compares on the mnemonic.
 * I claim that I/O will take way more time than lookup.
 * You will also need a table and/or a routine that, given a mnemonic and addressing mode,
 * will give you the actual opcode.
 */
struct InstructionInfo {
    char Mnemonic[4];
    char AddressingMode;
};

/* order them by mnemonic */
static InstructionInfo instrs[] = {
    { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
    { {'A', 'D', 'D', 'A'},  Direct|Extended|Indexed },
    { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
    { {'S', 'U', 'B', 'A'},  Direct|Extended|Indexed }
}; /* etc */

static int nInstrs = sizeof(instrs) / sizeof(InstructionInfo);

InstructionInfo *GetInstruction(char *mnemonic) {
    /* binary search for mnemonic */
}

int InstructionSize(AddressingMode mode)
{
    switch (mode) {
        case Implied: return 1;
        /* etc */
    }
}
Then you will have a list of every instruction which in turn contains a list of all the addressing modes.
So your parser becomes something like this:
char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &labelLen, nextStart, &nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
if (IsValidInstruction(mnemonic, info)) {
GenerateCode(mnemonic, info);
}
else throw new BadInstructionException(mnemonic, info);
}
else if (IsDirective()) { /* etc. */ }
Yes. Most assemblers use a table of data which describes the instructions: mnemonic, opcode, operand forms, etc.
I suggest looking at the source code for as. I'm having some trouble finding it though. Look here. (Thanks to Hossein.)
I think you should look into the Visitor pattern. It might not make your code that much simpler, but will reduce coupling and increase reusability. SableCC is a java framework to build compilers that uses it extensively.
When I was playing with a Microcode emulator tool, I converted everything into descendants of an Instruction class. From Instruction were category classes, such as Arithmetic_Instruction and Branch_Instruction. I used a factory pattern to create the instances.
Your best bet may be to get a hold of the assembly language syntax specification. Write a lexer to convert it to tokens (please, don't use if-elseif-else ladders). Then, based on the semantics, issue the code.
A long time ago, assemblers made a minimum of two passes: the first resolved constants and formed the skeletal code (including symbol tables); the second generated the more concrete or absolute values.
Have you read the Dragon Book lately?
Have you looked at the "Command Dispatcher" pattern?
http://en.wikipedia.org/wiki/Command_pattern
The general idea would be to create an object that handles each instruction (command), and create a look-up table that maps each instruction to the handler class. Each command class would have a common interface (Command.Execute( *args ) for example) which would definitely give you a cleaner / more flexible design than your current enormous switch statement.
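A rough sketch of that look-up-table idea (the handler interface and MovHandler are invented for illustration, not part of the asker's code):

#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Common interface: one handler object per mnemonic.
struct InstructionHandler {
    virtual ~InstructionHandler() = default;
    // Checks the operands and emits the compiled code for this mnemonic.
    virtual void Execute(const std::vector<std::string>& args) = 0;
};

struct MovHandler : InstructionHandler {
    void Execute(const std::vector<std::string>& args) override {
        if (args.size() == 2 && args[0] == "A" && args[1] == "B")
            std::cout << "emit 0x1234\n";          // example compiled code
        else
            std::cout << "invalid MOV operands\n"; // would throw in the real assembler
    }
};

int main()
{
    // The look-up table replaces the long if/else chain on mnemonic.Text.
    std::map<std::string, std::unique_ptr<InstructionHandler>> handlers;
    handlers["MOV"] = std::make_unique<MovHandler>();

    auto it = handlers.find("MOV");
    if (it != handlers.end())
        it->second->Execute({"A", "B"});
    else
        std::cout << "unknown mnemonic\n";
}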

Checking lists and running handlers

I find myself writing code that looks like this a lot:
set<int> affected_items;

while (string code = GetKeyCodeFromSomewhere())
{
    if (code == "some constant" || code == "some other constant") {
        affected_items.insert(some_constant_id);
    } else if (code == "yet another constant" || code == "the constant I didn't mention yet") {
        affected_items.insert(some_other_constant_id);
    } // else if etc...
}

for (set<int>::iterator it = affected_items.begin(); it != affected_items.end(); it++)
{
    switch (*it)
    {
        case some_constant_id:
            RunSomeFunction(with, these, params);
            break;
        case some_other_constant_id:
            RunSomeOtherFunction(with, these, other, params);
            break;
        // etc...
    }
}
The reason I end up writing this code is that I need to only run the functions in the second loop once even if I've received multiple key codes that might cause them to run.
This just doesn't seem like the best way to do it. Is there a neater way?
One approach is to maintain a map from strings to booleans. The main logic can start with something like:
if(done[code])
continue;
done[code] = true;
Then you can perform the appropriate action as soon as you identify the code.
Another approach is to store something executable (object, function pointer, whatever) into a sort of "to do list." For example:
while (string code = GetKeyCodeFromSomewhere())
{
todo[code] = codefor[code];
}
Initialize codefor to contain the appropriate function pointer, or object subclassed from a common base class, for each code value. If the same code shows up more than once, the appropriate entry in todo will just get overwritten with the same value that it already had. At the end, iterate over todo and run all of its members.
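A minimal sketch of that approach with std::function (the key codes and actions are placeholders standing in for the real ones):

#include <functional>
#include <iostream>
#include <map>
#include <string>

int main()
{
    // codefor: the action associated with each key code.
    std::map<std::string, std::function<void()>> codefor {
        { "some constant",        []{ std::cout << "RunSomeFunction\n"; } },
        { "some other constant",  []{ std::cout << "RunSomeFunction\n"; } },
        { "yet another constant", []{ std::cout << "RunSomeOtherFunction\n"; } },
    };

    // todo: filled while reading key codes; a repeated code just overwrites its own entry.
    std::map<std::string, std::function<void()>> todo;
    for (std::string code : {"some constant", "some constant", "yet another constant"})
    {
        auto it = codefor.find(code);
        if (it != codefor.end())
            todo[code] = it->second;
    }

    // At the end, run everything that was collected.
    for (const auto& entry : todo)
        entry.second();
}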
Since you don't seem to care about the actual values in the set you could replace it with setting bits in an int. You can also replace the linear time search logic with log time search logic. Here's the final code:
// Ahead of time you build a static map from your strings to bit values.
std::map<std::string, int> codesToValues;
codesToValues["some constant"] = 1;
codesToValues["some other constant"] = 1;
codesToValues["yet another constant"] = 2;
codesToValues["the constant I didn't mention yet"] = 2;

// When you want to do your work
int affected_items = 0;
while (string code = GetKeyCodeFromSomewhere())
    affected_items |= codesToValues[code];

if (affected_items & 1)
    RunSomeFunction(with, these, params);
if (affected_items & 2)
    RunSomeOtherFunction(with, these, other, params);
// etc...
It's certainly not neater, but you could maintain a set of flags that say whether you've called that specific function or not. That way you avoid having to save things off in a set; you just have the flags.
Since there is (presumably, from the way it is written) a number of different if/else blocks that is fixed at compile time, you can do this pretty easily with a bitset.
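A minimal sketch of the bitset variant (the action names and key codes are placeholders):

#include <bitset>
#include <string>

int main()
{
    enum { kSomeAction, kSomeOtherAction, kActionCount };   // one bit per if/else block
    std::bitset<kActionCount> pending;

    // While reading key codes, set the bit for the action that code maps to.
    for (std::string code : {"some constant", "yet another constant", "some constant"})
    {
        if (code == "some constant" || code == "some other constant")
            pending.set(kSomeAction);
        else if (code == "yet another constant")
            pending.set(kSomeOtherAction);
    }

    // Afterwards, each action runs at most once.
    if (pending.test(kSomeAction))      { /* RunSomeFunction(with, these, params); */ }
    if (pending.test(kSomeOtherAction)) { /* RunSomeOtherFunction(with, these, other, params); */ }
}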
Obviously, it will depend on the specific circumstances, but it might be better to have the functions that you call keep track of whether they've already been run and exit early if required.
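For example, a sketch of a handler that guards itself (the function body and the reset policy are illustrative only):

// The function remembers whether it has already run in the current batch.
void RunSomeFunction(int with, int these, int params)
{
    static bool alreadyRan = false;   // would need to be reset between batches
    if (alreadyRan)
        return;
    alreadyRan = true;
    // ... do the actual work once ...
}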