Design Pattern For Making An Assembler


I'm making an 8051 assembler.
A tokenizer runs first: it reads the next token, sets error flags, recognizes EOF, and so on.
Then there is the main loop of the assembler, which reads the next token and checks for a valid mnemonic:
mnemonic= NextToken();
if (mnemonic.Error)
{
//throw some error
}
else if (mnemonic.Text == "ADD")
{
...
}
else if (mnemonic.Text == "ADDC")
{
...
}
And it continues like this for many more cases. Worse is the code inside each case, which checks for valid parameters and then converts them to compiled code. Right now it looks like this:
if (mnemonic.Text == "MOV")
{
arg1 = NextToken();
if (arg1.Error) { /* throw error */ break; }
arg2 = NextToken();
if (arg2.Error) { /* throw error */ break; }
if (arg1.Text == "A")
{
if (arg2.Text == "B")
output << 0x1234; //Example compiled code
else if (arg2.Text == "#B")
output << 0x5678; //Example compiled code
else
/* throw "Invalid parameters" */
}
else if (arg1.Text == "B")
{
if (arg2.Text == "A")
output << 0x9ABC; //Example compiled code
else if (arg2.Text == "#A")
output << 0x0DEF; //Example compiled code
else
/* throw "Invalid parameters" */
}
}
For each mnemonic I have to check for valid parameters and then emit the correct compiled code. Very similar parameter-checking code is repeated in every case.
So is there a design pattern for improving this code?
Or simply a simpler way to implement this?
Edit: I accepted plinth's answer, thanks to him. Still, if you have ideas on this, I will be happy to learn them. Thanks all.

I've written a number of assemblers over the years doing hand parsing and frankly, you're probably better off using a grammar language and a parser generator.
Here's why - a typical assembly line will probably look something like this:
[label:] [instruction|directive][newline]
and an instruction will be:
plain-mnemonic|mnemonic-withargs
and a directive will be:
plain-directive|directive-withargs
etc.
With a decent parser generator like Gold, you should be able to knock out a grammar for 8051 in a few hours. The advantage over hand parsing is that you can support fairly complicated expressions in your assembly code, like:
.define kMagicNumber 0xdeadbeef
CMPA #(2 * kMagicNumber + 1)
which can be a real bear to do by hand.
If you want to do it by hand, make a table of all your mnemonics which will also include the various allowable addressing modes that they support and for each addressing mode, the number of bytes that each variant will take and the opcode for it. Something like this:
enum AddressingMode {
Implied = 1, Direct = 2, Extended = 4, Indexed = 8 // etc
};
/* for a 4 char mnemonic, this struct will be 5 bytes. A typical small processor
* has on the order of 100 instructions, making this table come in at ~500 bytes when all
* is said and done.
* The time to binary search that will be, worst case 8 compares on the mnemonic.
* I claim that I/O will take way more time than look up.
* You will also need a table and/or a routine that given a mnemonic and addressing mode
* will give you the actual opcode.
*/
struct InstructionInfo {
char Mnemonic[4];
char AddressingModes; /* bitmask of the AddressingMode values this mnemonic supports */
};
/* order them by mnemonic */
static InstructionInfo instrs[] = {
{ {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
{ {'A', 'D', 'D', 'A'}, Direct|Extended|Indexed },
{ {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
{ {'S', 'U', 'B', 'A'}, Direct|Extended|Indexed }
}; /* etc */
static int nInstrs = sizeof(instrs)/sizeof(InstructionInfo);
InstructionInfo *GetInstruction(char *mnemonic) {
/* binary search for mnemonic */
}
int InstructionSize(AddressingMode mode)
{
switch (mode) {
case Implied: return 1;
/* etc */
}
}
Then you will have a list of every instruction which in turn contains a list of all the addressing modes.
So your parser becomes something like this:
char *line = ReadLine();
int nextStart = 0;
int labelLen;
char *label = GetLabel(line, &labelLen, nextStart, &nextStart); // may be empty
int mnemonicLen;
char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart); // may be empty
if (IsOpcode(mnemonic, mnemonicLen)) {
AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
if (IsValidInstruction(mnemonic, info)) {
GenerateCode(mnemonic, info);
}
else throw new BadInstructionException(mnemonic, info);
}
else if (IsDirective()) { /* etc. */ }
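The table-driven idea above can be sketched for the MOV/ADD style of checks in the question. This is a minimal sketch under stated assumptions, not a full assembler: the names Form, Encoding, kTable, classify, operandValue and assemble are hypothetical, only a handful of instructions are listed, and the opcode bytes are believed to match the standard 8051 encoding but should be checked against the instruction set manual.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Operand forms this sketch distinguishes (hypothetical, not the full 8051 list).
enum class Form { None, Acc, Immediate, Direct };

struct Encoding {
    std::uint8_t opcode;  // first byte of the instruction
    int size;             // total instruction length in bytes
};

// Key: mnemonic plus the forms of up to two operands.
using Key = std::pair<std::string, std::pair<Form, Form>>;

// Opcode bytes are believed to match the 8051 encoding; verify before use.
static const std::map<Key, Encoding> kTable = {
    {{"NOP", {Form::None, Form::None}},     {0x00, 1}},
    {{"ADD", {Form::Acc, Form::Immediate}}, {0x24, 2}},
    {{"ADD", {Form::Acc, Form::Direct}},    {0x25, 2}},
    {{"MOV", {Form::Acc, Form::Immediate}}, {0x74, 2}},
    {{"MOV", {Form::Acc, Form::Direct}},    {0xE5, 2}},
};

// Classify one operand token by its spelling.
Form classify(const std::string& tok) {
    if (tok.empty()) return Form::None;
    if (tok == "A") return Form::Acc;
    if (tok[0] == '#') return Form::Immediate;
    return Form::Direct;  // anything else is treated as a direct address
}

// Parse a numeric operand such as "#0x12", "#18" or "0x30".
// Symbolic names would need a symbol-table lookup, which is omitted here.
int operandValue(const std::string& tok) {
    std::size_t start = (tok[0] == '#') ? 1 : 0;
    return std::stoi(tok.substr(start), nullptr, 0);  // base 0 accepts 0x.. and decimal
}

// Appends the encoded bytes to `out`; returns false for an invalid combination.
bool assemble(const std::string& mnemonic, const std::string& arg1,
              const std::string& arg2, std::vector<std::uint8_t>& out) {
    auto it = kTable.find({mnemonic, {classify(arg1), classify(arg2)}});
    if (it == kTable.end()) return false;  // "Invalid parameters"
    out.push_back(it->second.opcode);
    if (it->second.size == 2)
        out.push_back(static_cast<std::uint8_t>(operandValue(arg2)));
    return true;
}
```

With this shape, supporting another instruction means adding a row to the table rather than another branch, and the "Invalid parameters" error falls out of a failed lookup.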

Yes. Most assemblers use a table of data which describes the instructions: mnemonic, op code, operands forms etc.
I suggest looking at the source code for as. I'm having some trouble finding it though. Look here. (Thanks to Hossein.)

I think you should look into the Visitor pattern. It might not make your code that much simpler, but it will reduce coupling and increase reusability. SableCC is a Java framework for building compilers that uses it extensively.

When I was playing with a Microcode emulator tool, I converted everything into descendants of an Instruction class. From Instruction were category classes, such as Arithmetic_Instruction and Branch_Instruction. I used a factory pattern to create the instances.
Your best bet may be to get hold of the assembly language syntax specification. Write a lexer to convert the input to tokens (please, don't use if-elseif-else ladders). Then, based on semantics, issue the code.
A long time ago, assemblers needed a minimum of two passes: the first to resolve constants and form the skeletal code (including symbol tables), the second to generate more concrete or absolute values.
Have you read the Dragon Book lately?
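The two-pass idea mentioned above can be sketched as follows. The point it shows is why a second pass is needed: a line may refer to a label that is only defined later. The Line struct and both function names are hypothetical, and the lines are assumed to be already parsed with known sizes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One already-parsed source line (hypothetical shape for this sketch).
struct Line {
    std::string label;   // empty if the line defines no label
    std::string target;  // label this instruction refers to, or empty
    int size;            // encoded size in bytes
};

// Pass 1: walk the lines and assign an address to every label.
std::map<std::string, int> buildSymbolTable(const std::vector<Line>& lines) {
    std::map<std::string, int> symbols;
    int address = 0;
    for (const Line& l : lines) {
        if (!l.label.empty()) symbols[l.label] = address;
        address += l.size;
    }
    return symbols;
}

// Pass 2: resolve each referenced label to its address.
// Lines with no target, or with an undefined one, yield -1.
std::vector<int> resolveTargets(const std::vector<Line>& lines,
                                const std::map<std::string, int>& symbols) {
    std::vector<int> resolved;
    for (const Line& l : lines) {
        auto it = symbols.find(l.target);
        resolved.push_back(it == symbols.end() ? -1 : it->second);
    }
    return resolved;
}
```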

Have you looked at the "Command Dispatcher" pattern?
http://en.wikipedia.org/wiki/Command_pattern
The general idea would be to create an object that handles each instruction (command), and create a look-up table that maps each instruction to the handler class. Each command class would have a common interface (Command.Execute( *args ) for example) which would definitely give you a cleaner / more flexible design than your current enormous switch statement.
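A minimal sketch of that look-up table, assuming handlers are plain functions wrapped in std::function rather than full command classes. The names Handler, handleMov, dispatchTable and assembleLine are hypothetical, and the emitted values are the placeholder numbers from the question, not real opcodes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Args = std::vector<std::string>;
using Output = std::vector<std::uint16_t>;
// A handler validates its operands and appends code; it returns false on bad operands.
using Handler = std::function<bool(const Args&, Output&)>;

// Placeholder encodings copied from the question's example, not real opcodes.
bool handleMov(const Args& args, Output& out) {
    if (args.size() != 2) return false;
    if (args[0] == "A" && args[1] == "B")  { out.push_back(0x1234); return true; }
    if (args[0] == "A" && args[1] == "#B") { out.push_back(0x5678); return true; }
    if (args[0] == "B" && args[1] == "A")  { out.push_back(0x9ABC); return true; }
    if (args[0] == "B" && args[1] == "#A") { out.push_back(0x0DEF); return true; }
    return false;
}

// Maps each mnemonic to the handler responsible for it.
const std::unordered_map<std::string, Handler>& dispatchTable() {
    static const std::unordered_map<std::string, Handler> table = {
        {"MOV", handleMov},
    };
    return table;
}

// The main loop shrinks to one lookup and one call.
bool assembleLine(const std::string& mnemonic, const Args& args, Output& out) {
    auto it = dispatchTable().find(mnemonic);
    if (it == dispatchTable().end()) return false;  // unknown mnemonic
    return it->second(args, out);
}
```

The operand checks still live inside each handler, so this mainly removes the outer ladder; combining it with a data table per mnemonic would remove the inner one as well.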

Related

Automatically modify C++ Code: Transform code from a parse tree back to source code

I'm trying to modify C++ code. I get a piece of code and line numbers, and I need to insert code at the given line numbers.
Like this:
1 void foo(){
2 int a = 5;
3 int b = 10;
4 }
And the line numbers: 2, 3. Now I want to automatically insert code after those line numbers:
1 void foo(){
2 int a = 5;
3 newcode();
4 int b = 10;
5 newcode();
6 }
In another thread, people said ANTLR is a good way to do this, so I tried the ANTLR runtime API. It's easy to generate a parse tree, and I also found ways to modify it. But now I don't know how to get the source code back from the parse tree.
I don't really need the source code; it would be enough to compile the parse tree to an executable program. How can I do this?
Is there maybe an easier way to solve my problem? Maybe just read the code, count the \n characters, and insert my code after the 2nd and 3rd \n?
Edit:
For my bachelor thesis, I get a piece of parallel code and I need to force it to execute a given interleaving. So my job is to write a tool that automatically inserts instructions like "EnterCriticalSection(...)" and "LeaveCriticalSection(...)" at given lines in the code. Now I have another job: rename the main function and insert my own main function. I think this won't work with counting lines.

A possible solution could be to use the generated parse tree for token positions (each TerminalNode has a Token instance attached, with the information about where it is located in the original source code). With that at hand, you can start copying the unmodified text from the original source stream and then insert your own text, which belongs at this position. After that, copy the next unmodified code part and then insert your next modification. Do this in a loop until you reach EOF.
This scenario doesn't care about the final formatting, but I think that's probably not relevant: your task sounds like you are instrumenting code for some measurements.
Here's code I use to retrieve the original source code given two parse tree nodes:
std::string MySQLRecognizerCommon::sourceTextForRange(tree::ParseTree *start, tree::ParseTree *stop, bool keepQuotes) {
Token *startToken = antlrcpp::is<tree::TerminalNode *>(start) ? dynamic_cast<tree::TerminalNode *>(start)->getSymbol()
: dynamic_cast<ParserRuleContext *>(start)->start;
Token *stopToken = antlrcpp::is<tree::TerminalNode *>(stop) ? dynamic_cast<tree::TerminalNode *>(stop)->getSymbol()
: dynamic_cast<ParserRuleContext *>(stop)->stop;
return sourceTextForRange(startToken, stopToken, keepQuotes);
}
//----------------------------------------------------------------------------------------------------------------------
std::string MySQLRecognizerCommon::sourceTextForRange(Token *start, Token *stop, bool keepQuotes) {
CharStream *cs = start->getTokenSource()->getInputStream();
size_t stopIndex = stop != nullptr ? stop->getStopIndex() : std::numeric_limits<size_t>::max();
std::string result = cs->getText(misc::Interval(start->getStartIndex(), stopIndex));
if (keepQuotes || result.size() < 2)
return result;
char quoteChar = result[0];
if ((quoteChar == '"' || quoteChar == '`' || quoteChar == '\'') && quoteChar == result.back()) {
if (quoteChar == '"' || quoteChar == '\'') {
// Replace any double occurrence of the quote char by a single one.
replaceStringInplace(result, std::string(2, quoteChar), std::string(1, quoteChar));
}
return result.substr(1, result.size() - 2);
}
return result;
}
This code has been taken from the MySQL Workbench parser module.
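The copy-and-insert loop described in this answer can be sketched without any parser dependency. The sketch assumes the insertion points are already known as character offsets into the original source (with ANTLR they could be derived from token positions such as getStopIndex(), which the code above uses); Insertion and applyInsertions are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// An insertion is a character offset in the original source plus the text to add there.
using Insertion = std::pair<std::size_t, std::string>;

std::string applyInsertions(const std::string& source, std::vector<Insertion> edits) {
    // Process edits in ascending offset order; stable_sort keeps ties in input order.
    std::stable_sort(edits.begin(), edits.end(),
                     [](const Insertion& a, const Insertion& b) { return a.first < b.first; });
    std::string result;
    std::size_t copied = 0;  // how much of the original has been copied so far
    for (const Insertion& e : edits) {
        std::size_t at = std::min(e.first, source.size());  // clamp offsets past EOF
        result.append(source, copied, at - copied);          // unmodified part
        result += e.second;                                   // inserted text
        copied = at;
    }
    result.append(source, copied, std::string::npos);  // remainder up to EOF
    return result;
}
```

Because all offsets refer to the original text, the edits do not shift one another, which is the main advantage over inserting into the string in place.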

Why is std::map making my code so bloated?

My latest recreational project is to write a brainfuck interpreter in C++. It was straightforward enough, but today I decided to add a compilation step. The eventual goal is to be able to create executables, but right now all it does is some basic optimization. For instance, +++++ is turned from five "add 1" commands into a single "add 5", and so on.
While this works well, I've noticed that the size of my executable after being stripped has gone from 9k to 12k. After some research I've determined the function below is to blame, specifically the map. I do not understand why.
void Brainfuck::compile(const string& input) {
map<char, pair<Opcode, int>> instructions {
{ '<', make_pair(Opcode::MOVL, 1) },
{ '>', make_pair(Opcode::MOVR, 1) },
{ '+', make_pair(Opcode::INCR, 1) },
{ '-', make_pair(Opcode::DECR, 1) },
{ '[', make_pair(Opcode::WHILE, 0) },
{ ']', make_pair(Opcode::WEND, 0) },
{ ',', make_pair(Opcode::INP, 0) },
{ '.', make_pair(Opcode::OUTP, 0) },
};
string::const_iterator c = begin(input);
while (c != end(input)) {
if (instructions.find(*c) != end(instructions)) {
auto instruction = instructions[*c];
makeOp(c, instruction.first, instruction.second);
} else {
c++;
}
}
}
The key in the map is one of the 8 valid Brainfuck operations. The function goes through the input string and looks for these valid characters. An invalid character is just skipped, as per the Brainfuck spec. If it finds a valid one, it passes the map value for that key to a function called makeOp, which does the optimization, turns it into an op struct and adds it to the vector of instructions that my interpreter will actually execute.
The op struct has two members. An Opcode, which is a scoped enum based on a uint8_t representing one of the 8 operations, and one int containing the operand. (one operand for now. A future more sophisticated version might have instructions with multiple operands.) So in the +++++ example above the struct would contain Opcode::INCR and 5.
So the value of each map entry is a std::pair consisting of the Opcode, and the number of operands. I realize that some overhead is inevitable with a generic data structure but given the simplicity of the description above, don't you think 3k is a bit excessive?
Or perhaps this is not the right approach to achieve my aim efficiently? But if not a std::map then which data structure should I use?
Update:
Thanks to all those who responded. First, a few words about my motives. I don't use C++ in my day job that often. So I'm doing some recreational projects to keep my knowledge sharp. Having the absolutely smallest code size is not as important as e.g. clarity but it is interesting to me to learn how bloat and such things happen.
Following the advice given, my function now looks like this:
static const int MAXOPS = 8;
void Brainfuck::compile(const string& input) {
array<tuple<char, Opcode, int>, MAXOPS> instructions {
make_tuple('<', Opcode::MOVL, 1),
make_tuple('>', Opcode::MOVR, 1),
make_tuple('+', Opcode::INCR, 1),
make_tuple('-', Opcode::DECR, 1),
make_tuple('[', Opcode::WHILE, 0),
make_tuple(']', Opcode::WEND, 0),
make_tuple(',', Opcode::INP, 0),
make_tuple('.', Opcode::OUTP, 0),
};
string::const_iterator c = begin(input);
while (c != end(input)) {
auto instruction = find_if(begin(instructions), end(instructions),
[&instructions, &c](auto i) {
return *c == get<0>(i);
});
if (instruction != end(instructions)) {
makeOp(c, get<1>(*instruction), get<2>(*instruction));
} else {
c++;
}
}
}
I recompiled and... nothing happened. I remembered that I had another method which contained a map, and a (now deleted?) response suggested that merely having an instantiation of map would be enough to add code. So I changed that map to an array too and recompiled again. This time the stripped size of my executable went from 12280 to 9152.

map internally uses a balanced tree for storing its elements. Balanced trees are fast but require some extra code for re-balancing the tree on insertions and deletions. So 3k for this code seems reasonable.

How to Simplify C++ Boolean Comparisons

I'm trying to find a way to simplify the comparison cases of booleans. Currently, there are only three (as shown below), but I'm about to add a 4th option and this is getting very tedious.
bracketFirstIndex = message.indexOf('[');
mentionFirstIndex = message.indexOf('@');
urlFirstIndex = message.indexOf(urlStarter);
bool startsWithBracket = (bracketFirstIndex != -1);
bool startsWithAtSymbol = (mentionFirstIndex != -1);
bool startsWithUrl = (urlFirstIndex != -1);
if (!startsWithBracket)
{
if (!startsWithAtSymbol)
{
if (!startsWithUrl)
{
// No brackets, mentions, or urls. Send message as normal
cursor.insertText(message);
break;
}
else
{
// There's a URL, lets begin!
index = urlFirstIndex;
}
}
else
{
if (!startsWithUrl)
{
// There's an @ symbol, let's begin!
index = mentionFirstIndex;
}
else
{
// There's both an @ symbol and URL, pick the first one... let's begin!
index = std::min(urlFirstIndex, mentionFirstIndex);
}
}
}
else
{
if (!startsWithAtSymbol)
{
// There's a [, look down!
index = bracketFirstIndex;
}
else
{
// There's both a [ and @, pick the first one... look down!
index = std::min(bracketFirstIndex, mentionFirstIndex);
}
if (startsWithUrl)
{
// If there's a URL, pick the first one... then lets begin!
// Otherwise, just "lets begin!"
index = std::min(index, urlFirstIndex);
}
}
Is there a better or simpler way to compare several boolean values, or am I stuck with this format and should I just squeeze the 4th option into the appropriate locations?

Some types of text processing are fairly common, and for those, you should strongly consider using an existing library. For example, if the text you are processing uses the markdown syntax, consider using an existing library to parse the markdown into a structured format for you to interpret.
If this is completely custom parsing, then there are a few options:
For very simple text processing (like a single string expected to be in one of a few formats or containing a piece of subtext in an expected format), use regular expressions. In C++, the RE2 library provides very powerful support for matching and extracting using regexes.
For more complicated text processing, such as data spanning many lines or having a wide variety of content / syntax, consider using an existing lexer and parser generator. Flex and Bison are common tools (used together) to auto-generate logic for parsing text according to a grammar.
You can, by hand, as you are doing now, write your own parsing logic.
If you go with the last approach (hand-written parsing), there are a few ways to simplify things:
Separate the "lexing" (breaking up the input into tokens) and "parsing" (interpreting the series of tokens) into separate phases.
Define a "Token" class and a corresponding hierarchy representing the types of symbols that can appear within your grammar (like RawText, Keyword, AtMention, etc.)
Create one or more enums representing the states that your parsing logic can be in.
Implement your lexing and parsing logic as a state machine that transforms the state given the current state and the next token or letter. Building up a map from (state, token type) to next_state or from (state, token type) to handler_function can help you to simplify the structure.
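The map from (state, token type) to next_state mentioned above might look like the sketch below. It is only an illustration under assumptions: the states, token kinds and transitions are hypothetical and much smaller than a real message grammar, and '@' is assumed to be the mention marker.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class State { Text, Mention, Link };                          // hypothetical parser states
enum class Kind { Char, At, OpenBracket, CloseBracket, Space };    // token types

// Lexing step: classify a single character.
Kind classify(char c) {
    switch (c) {
        case '@': return Kind::At;
        case '[': return Kind::OpenBracket;
        case ']': return Kind::CloseBracket;
        case ' ': return Kind::Space;
        default:  return Kind::Char;
    }
}

// (state, token type) -> next state; pairs that are absent keep the current state.
static const std::map<std::pair<State, Kind>, State> kTransitions = {
    {{State::Text, Kind::At}, State::Mention},
    {{State::Text, Kind::OpenBracket}, State::Link},
    {{State::Mention, Kind::Space}, State::Text},
    {{State::Link, Kind::CloseBracket}, State::Text},
};

// Parsing step: returns the state the machine is in after each input character.
std::vector<State> run(const std::string& message) {
    std::vector<State> trace;
    State state = State::Text;
    for (char c : message) {
        auto it = kTransitions.find({state, classify(c)});
        if (it != kTransitions.end()) state = it->second;
        trace.push_back(state);
    }
    return trace;
}
```

Adding a 4th kind of marker then means one more Kind, one more State and a couple of table rows, instead of another level of nesting.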

Since you are switching only on the starting letter, use cases:
enum State { Start1, Start2, Start3, Start4};
State state;
if (startswithbracket) {
state = Start1;
} else {
.
.
.
}
switch (state) {
case Start1:
dosomething;
break;
case Start2:
.
.
.
}
More information about switch syntax and use cases can be found here.
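Looking back at the nested comparison in the question, every branch ends up selecting the smallest index that is not -1, and the "send as normal" case is the one where all searches failed. A minimal sketch of that observation, assuming indexOf returns -1 for "not found" as the question's code implies; firstMatch is a hypothetical helper name.

```cpp
#include <cassert>
#include <initializer_list>

// Returns the smallest index that is not -1, or -1 if every search failed.
int firstMatch(std::initializer_list<int> indices) {
    int best = -1;
    for (int i : indices) {
        if (i != -1 && (best == -1 || i < best)) best = i;
    }
    return best;
}
```

Under that assumption the whole ladder could become `index = firstMatch({bracketFirstIndex, mentionFirstIndex, urlFirstIndex});` followed by a single check for -1, and a 4th option is just one more element in the list.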

C++ - strcmp() does not work correctly?

There's something really weird going on: strcmp() returns -1 though both strings are exactly the same. Here is a snippet from the output of the debugger (gdb):
(gdb) print s[i][0] == grammar->symbols_from_int[107][0]
$36 = true
(gdb) print s[i][1] == grammar->symbols_from_int[107][1]
$37 = true
(gdb) print s[i][2] == grammar->symbols_from_int[107][2]
$38 = true
(gdb) print s[i][3] == grammar->symbols_from_int[107][3]
$39 = true
(gdb) print s[i][4] == grammar->symbols_from_int[107][4]
$40 = true
(gdb) print s[i][5] == grammar->symbols_from_int[107][5]
$41 = false
(gdb) print grammar->symbols_from_int[107][4]
$42 = 0 '\0'
(gdb) print s[i]
$43 = (char * const&) @0x202dc50: 0x202d730 "Does"
(gdb) print grammar->symbols_from_int[107]
$44 = (char * const&) @0x1c9fb08: 0x1c9a062 "Does"
(gdb) print strcmp(s[i],grammar->symbols_from_int[107])
$45 = -1
Any idea what's going on?
Thanks in advance,
Onur
Edit 1:
Here are some snippets of my code:
# include <unordered_map> // Used as hash table
# include <stdlib.h>
# include <string.h>
# include <stdio.h>
# include <vector>
using namespace std;
using std::unordered_map;
using std::hash;
struct eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};
...
<some other code>
...
class BPCFG {
public:
char *symbols; // Character array holding all grammar symbols, with NULL separating them.
char *rules; // Character array holding all rules, with NULL separating them.
unordered_map<char *, int , hash<char *> , eqstr> int_from_symbols; // Hash table holding the grammar symbols and their integer indices as key/value pairs.
...
<some other code>
...
vector<char *> symbols_from_int; // Hash table holding the integer indices and their corresponding grammar symbols as key/value pairs.
void load_symbols_from_file(const char *symbols_file);
}
void BPCFG::load_symbols_from_file(const char *symbols_file) {
char buffer[200];
FILE *input = fopen(symbols_file, "r");
int symbol_index = 0;
while(fscanf(input, "%s", buffer) > 0) {
if(buffer[0] == '/')
strcpy(symbols + symbol_index, buffer+1);
else
strcpy(symbols + symbol_index, buffer);
symbols_from_int.push_back(symbols + symbol_index);
int_from_symbols[symbols+symbol_index] = symbols_from_int.size()-1;
probs.push_back(vector<double>());
hyperprobs.push_back(vector<double>());
rules_from_IntPair.push_back(vector<char *>());
symbol_index += strlen(symbols+symbol_index) + 1;
}
fclose(input);
}
This last function (BPCFG::load_symbols_from_file) seems to be the only function I modify symbols_from_int in my whole code. Please tell me if you need some more code. I'm not putting everything because it's hundreds of lines.
Edit 2:
OK, I think I should add one more thing from my code. This is the constructor of BPCFG class:
BPCFG(int symbols_length, int rules_length, int symbol_count, int rule_count):
int_from_symbols(1.5*symbol_count),
IntPair_from_rules(1.5*rule_count),
symbol_after_dot(10*rule_count)
{
symbols = (char *)malloc(symbols_length*sizeof(char));
rules = (char *)malloc(rules_length*sizeof(char));
}
Edit 3:
Here is the code on the path to the point of error. It's not compilable, but it shows where the code stepped through (I checked with next and step commands in the debugger that the code indeed follows this route):
BPCFG my_grammar(2000, 5500, 194, 187);
my_grammar.load_symbols_from_file("random_50_1_words_symbols.txt");
<some irrelevant code>
my_grammar.load_rules_from_file("random_50_1_words_grammar.txt", true);
<some irrelevant code>
my_grammar.load_symbols_after_dots();
BPCFGParser my_parser(&my_grammar);
BPCFGParser::Sentence s;
// (Sentence is defined in the BPCFGParser class with
// typedef vector<char *> Sentence;)
Edge e;
try {
my_parser.parse(s, e);
}
catch(char *e) {fprintf(stderr, "%s", e);}
void BPCFGParser::parse(const Sentence & s, Edge & goal_edge) {
/* Initializing the chart */
chart::active_sets.clear();
chart::passive_sets.clear();
chart::active_sets.resize(s.size());
chart::passive_sets.resize(s.size());
// initialize(sentence, goal);
try {
initialize(s, goal_edge);
}
catch (char *e) {
if(strcmp(e, UNKNOWN_WORD) == 0)
throw e;
}
<Does something more, but the execution does not come to this point>
}
void BPCFGParser::initialize(const Sentence & s, Edge & goal_edge) {
// create a new chart and new agendas
/* For now, we plan to do this during constructing the BPCFGParser object */
// for each word w:[start,end] in the sentence
// discoverEdge(w:[start,end])
Edge temp_edge;
for(int i = 0;i < s.size();i++) {
temp_edge.span.start = i;
temp_edge.span.end = i+1;
temp_edge.isActive = false;
/* Checking whether the given word is ever seen in the training corpus */
unordered_map<char *, int , hash<char *> , eqstr>::const_iterator it = grammar->int_from_symbols.find(s[i]);
if(it == grammar->int_from_symbols.end())
throw UNKNOWN_WORD;
<Does something more, but execution does not come to this point>
}
}
Where I run the print commands in the debugger is the last
throw UNKNOWN_WORD;
command. I mean, I was stepping with next on GDB and after seeing this line, I ran all these print commands.
Thank you for your interest,
Onur

This sounds like s is a pointer to an array that was on the stack, which is overwritten as soon as a new function is called, i.e. strcmp().
What does the debugger say they are after the strcmp() call?

In recent Linux distributions strcmp is a symbol of type STT_GNU_IFUNC. This is not supported in the last release of GDB (7.2 at the time of writing). That may be the cause of your problem, although in your case the return value looks genuine.

Given that GDB output, the only possible cause I can see is that strcmp() is bugged.
You basically did in GDB what strcmp does: compare character by character until both are zero (at index 4).
Can you try print strcmp("Does", "Does")?
EDIT: also try:
print strncmp(s[i], grammar->symbols_from_int[107], 1);
print strncmp(s[i], grammar->symbols_from_int[107], 2);
print strncmp(s[i], grammar->symbols_from_int[107], 3);
print strncmp(s[i], grammar->symbols_from_int[107], 4);
print strncmp(s[i], grammar->symbols_from_int[107], 5);

The only way you'll ever figure this out is to step INTO strcmp with the debugger.

I would strongly suggest you zero out the memory before you start using it. I realize that the GDB output makes no sense because you do verify that they are null-terminated strings, but I've had a lot of bizarre string.h problems go away with a memset, bzero, calloc or whatever you want to use.
Specifically, zero out the memory in the constructor and the buffer you use when reading from the file.

As others have pointed out, it's near to impossible that there would be a problem with strcmp.
It would be best to strip down the problematic code to the absolute minimum required to reproduce the problem (also include instructions on how to compile - which compiler, flags, runtime library are you using?). Most likely you will find a bug in the process.
If not, you will receive a lot of credit for finding a bug in one of the most intensively used C functions ;-)

Maybe the sizes of the two strings are not the same.

Do strlen(s[i]) and strlen(grammar->symbols_from_int[107]) return the same value?
Also, I can't imagine this is the problem, but could you use a constant value rather than i, just to make sure that nothing weird is going on?

I would strongly recommend trying to solve the problem with regular STL strings first - you'll get cleaner code and automated memory management so you can concentrate on the parser logic. Only after the thing is working and profiling proves that string manipulation is the performance bottleneck I would look at all the algorithms used in greater detail, then at specialized string allocators, and - as the last resort - at manual character array manipulation, in that order.
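A minimal sketch of what the symbol table from the question could look like with std::string keys. SymbolTable and its members are hypothetical names, not the asker's code. One difference worth noting: std::hash<std::string> hashes the characters, whereas std::hash<char *> hashes the pointer value, so two distinct buffers holding "Does" would generally hash differently with char * keys. That may be relevant to the failing find() in the question, though the posted code alone does not confirm it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Symbol table with owning std::string keys (a sketch; names are hypothetical).
class SymbolTable {
public:
    // Adds a symbol if it is new and returns its index either way.
    int add(const std::string& symbol) {
        auto it = indexOf_.find(symbol);
        if (it != indexOf_.end()) return it->second;
        int index = static_cast<int>(symbols_.size());
        symbols_.push_back(symbol);
        indexOf_.emplace(symbol, index);
        return index;
    }
    // Returns the index of a symbol, or -1 if it was never added.
    int find(const std::string& symbol) const {
        auto it = indexOf_.find(symbol);
        return it == indexOf_.end() ? -1 : it->second;
    }
    // Reverse lookup: index to symbol text (throws std::out_of_range on a bad index).
    const std::string& name(int index) const { return symbols_.at(index); }

private:
    std::vector<std::string> symbols_;               // index -> symbol
    std::unordered_map<std::string, int> indexOf_;   // symbol -> index
};
```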

Thanks to all who spared time to answer. I wrote my own string comparison function and it worked correctly at the same point, so obviously it's a problem with strcmp(). Nevertheless, my code still does not work as I expect. But I'll request help for that only after I analyze it thoroughly. Thanks.

Mark your own strcmp implementation as inline and see what happens...
From GCC 4.3 Release Series Changes, New Features, and Fixes for GCC 4.3.4:
"During feedback directed optimizations, the expected block size the memcpy, memset and bzero functions operate on is discovered and for cases of commonly used small sizes, specialized inline code is generated."
Maybe some other related bugs exist...
Try switching compiler optimization and/or function inlining off.

Maybe there is a non-printable character in one of the strings?

Checking lists and running handlers

I find myself writing code that looks like this a lot:
set<int> affected_items;
while (string code = GetKeyCodeFromSomewhere())
{
if (code == "some constant" || code == "some other constant") {
affected_items.insert(some_constant_id);
} else if (code == "yet another constant" || code == "the constant I didn't mention yet") {
affected_items.insert(some_other_constant_id);
} // else if etc...
}
for (set<int>::iterator it = affected_items.begin(); it != affected_items.end(); it++)
{
switch(*it)
{
case some_constant_id:
RunSomeFunction(with, these, params);
break;
case some_other_constant_id:
RunSomeOtherFunction(with, these, other, params);
break;
// etc...
}
}
The reason I end up writing this code is that I need to only run the functions in the second loop once even if I've received multiple key codes that might cause them to run.
This just doesn't seem like the best way to do it. Is there a neater way?

One approach is to maintain a map from strings to booleans. The main logic can start with something like:
if(done[code])
continue;
done[code] = true;
Then you can perform the appropriate action as soon as you identify the code.
Another approach is to store something executable (object, function pointer, whatever) into a sort of "to do list." For example:
while (string code = GetKeyCodeFromSomewhere())
{
todo[code] = codefor[code];
}
Initialize codefor to contain the appropriate function pointer, or object subclassed from a common base class, for each code value. If the same code shows up more than once, the appropriate entry in todo will just get overwritten with the same value that it already had. At the end, iterate over todo and run all of its members.
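The "to do list" approach can be sketched as below. It is a sketch under assumptions: key codes arrive as a vector instead of from GetKeyCodeFromSomewhere(), and the todo list is keyed by an action id rather than by the code string, so that two different codes mapping to the same action still run it only once. runOnce, codeToAction and actions are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Action = std::function<void()>;

// Runs each action at most once, however many key codes map to it.
// `codeToAction` maps a key code to an action id; `actions` maps an id to the callable.
void runOnce(const std::vector<std::string>& codes,
             const std::map<std::string, int>& codeToAction,
             const std::map<int, Action>& actions) {
    std::map<int, Action> todo;  // keyed by id, so duplicates overwrite themselves
    for (const std::string& code : codes) {
        auto id = codeToAction.find(code);
        if (id == codeToAction.end()) continue;  // unknown code: ignore
        auto act = actions.find(id->second);
        if (act != actions.end()) todo[id->second] = act->second;
    }
    for (const auto& entry : todo) entry.second();  // run everything that was queued
}
```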

Since you don't seem to care about the actual values in the set you could replace it with setting bits in an int. You can also replace the linear time search logic with log time search logic. Here's the final code:
// Ahead of time you build a static map from your strings to bit values.
std::map< std::string, int > codesToValues;
codesToValues[ "some constant" ] = 1;
codesToValues[ "some other constant" ] = 1;
codesToValues[ "yet another constant" ] = 2;
codesToValues[ "the constant I didn't mention yet" ] = 2;
// When you want to do your work
int affected_items = 0;
while (string code = GetKeyCodeFromSomewhere())
affected_items |= codesToValues[ code ];
if( affected_items & 1 )
RunSomeFunction(with, these, params);
if( affected_items & 2 )
RunSomeOtherFunction(with, these, other, params);
// etc...

It's certainly not neater, but you could maintain a set of flags that say whether you've called that specific function or not. That way you avoid having to save things off in a set; you just have the flags.
Since there is (presumably from the way it is written), a fixed at compile time number of different if/else blocks, you can do this pretty easily with a bitset.
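A minimal sketch of the bitset variant, assuming the set of actions is fixed at compile time as described. The enum, actionFor and collect are hypothetical names, and the key-code strings are the placeholders from the question.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One enumerator per action; ActionCount doubles as the "no action" value.
enum Action : std::size_t { SomeAction = 0, SomeOtherAction = 1, ActionCount = 2 };

// Maps a key code to its action; returns ActionCount for unknown codes.
Action actionFor(const std::string& code) {
    if (code == "some constant" || code == "some other constant") return SomeAction;
    if (code == "yet another constant") return SomeOtherAction;
    return ActionCount;
}

// Collects one flag per action so that each can be run once afterwards.
std::bitset<ActionCount> collect(const std::vector<std::string>& codes) {
    std::bitset<ActionCount> pending;
    for (const std::string& code : codes) {
        Action a = actionFor(code);
        if (a != ActionCount) pending.set(a);
    }
    return pending;
}
```

After the loop, the caller would test each flag once, for example `if (pending.test(SomeAction)) RunSomeFunction(...)`.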

Obviously, it will depend on the specific circumstances, but it might be better to have the functions that you call keep track of whether they've already been run and exit early if required.
