I have this piece of code which accesses some information about a point on an 'x' and 'y' axis. This information is later used to draw some points onto the screen.
This is how the code works:
//MAX_X_COORD has a value of 169
//MAX_Y_COORD has a value of 55
void RedrawFromDATAtable()
{
COORD pos;
HANDLE tmpbuf = CreateConsoleScreenBuffer(GENERIC_WRITE , NULL, NULL, CONSOLE_TEXTMODE_BUFFER, NULL);
WriteConsoleA(tmpbuf, " ", 1, NULL, NULL);
if(SetConsoleActiveScreenBuffer(tmpbuf)==0)
{MessageBoxA(NULL, "ERROR", "ERROR", 0);return;}
bufferdata_ex * dptr;
//bufferdata_ex * y_dptr;
int * lcol(new int); //Increases speed by reducing function calls - Experienced about twice as fast drawing!
for(short x=0;x<MAX_X_COORD;x++)
{
//y_dptr = bridge->DATA[x];
for(short y=0;y<MAX_Y_COORD;y++)
{
//dptr = (y_dptr+y); //Rewrite to use a constant pointer!
dptr = &(_bridge->DATA[x][y]);
if(dptr->InUse==true)
{
{
pos.X = x;
pos.Y = y;
SetConsoleCursorPosition(output, pos);
//haschanged = false;
}
if(!(*lcol==dptr->color)) //Need for a new color?
{ SetConsoleTextAttribute(output, dptr->color);lcol = &dptr->color;}
char c((char)dptr->sym);
WriteConsoleA(output, &c, 1, NULL, NULL);
lcol = &dptr->color;
}
}
}
SetConsoleTextAttribute(output, bridge->current_color);
SetConsoleCursorPosition(output, last_mouse_position);
SetConsoleActiveScreenBuffer(output);
CloseHandle(tmpbuf);
delete lcol;
}
Cut to the chase!
Alright!
So recently I had the thought that accessing the array like that would slow down my code. As far as I know, whenever you access an element in an array, the processor takes the base address of the array and multiplies the size of the elements by the index to find the address of the specified element.
My thought here was that if I ask the processor to do that over and over, instead of just creating a pointer to the address once and then using that to process my elements, it would slow down my code.
So I rewrote the code to the following:
void RedrawFromDATAtable()
{
COORD pos;
HANDLE tmpbuf = CreateConsoleScreenBuffer(GENERIC_WRITE , NULL, NULL, CONSOLE_TEXTMODE_BUFFER, NULL);
WriteConsoleA(tmpbuf, " ", 1, NULL, NULL);
if(SetConsoleActiveScreenBuffer(tmpbuf)==0)
{MessageBoxA(NULL, "ERROR", "ERROR", 0);return;}
bufferdata_ex * dptr;
bufferdata_ex * y_dptr;
int * lcol(new int); //Increases speed by reducing function calls - Experienced about twice as fast drawing!
for(short x=0;x<MAX_X_COORD;x++)
{
y_dptr = _bridge->DATA[x];
for(short y=0;y<MAX_Y_COORD;y++)
{
dptr = (y_dptr+y); //Rewrite to use a constant pointer!
//dptr = &(bridge->DATA[x][y]);
if(dptr->InUse==true)
{
{
pos.X = x;
pos.Y = y;
SetConsoleCursorPosition(output, pos);
//haschanged = false;
}
if(!(*lcol==dptr->color)) //Need for a new color?
{ SetConsoleTextAttribute(output, dptr->color);lcol = &dptr->color;}
char c((char)dptr->sym);
WriteConsoleA(output, &c, 1, NULL, NULL);
lcol = &dptr->color;
}
}
}
SetConsoleTextAttribute(output, bridge->current_color);
SetConsoleCursorPosition(output, last_mouse_position);
SetConsoleActiveScreenBuffer(output);
CloseHandle(tmpbuf);
delete lcol;
}
The idea seems perfectly fine to me, but the problem is that the first piece of code is faster than the second piece of code!
So my question is: Why is it the first piece of code is faster than the second piece of code?
For those who don't like to read:
Why is the first piece of code faster than the other?
The first one takes 0.0919 seconds to finish where the second takes 0.226 seconds.
Also, this is the generated assembly for the pointer accesses:
//No pointers
dptr = &(bridge->DATA[x][y]);
001B41C6 mov eax,dword ptr [this]
001B41C9 mov ecx,dword ptr [eax+14h]
001B41CC movsx edx,word ptr [x]
001B41D0 imul edx,edx,370h
001B41D6 lea eax,[ecx+edx+1D4h]
001B41DD movsx ecx,word ptr [y]
001B41E1 shl ecx,4
001B41E4 add eax,ecx
001B41E6 mov dword ptr [dptr],eax
//With pointers
//Pointing to DATA[x]
012C41A5 mov eax,dword ptr [this]
012C41A8 mov ecx,dword ptr [eax+14h]
012C41AB movsx edx,word ptr [x]
012C41AF imul edx,edx,370h
012C41B5 lea eax,[ecx+edx+1D4h]
012C41BC mov dword ptr [y_dptr],eax
//Pointing to DATA[x]+y
012C41E0 movsx eax,word ptr [y]
012C41E4 shl eax,4
012C41E7 add eax,dword ptr [y_dptr]
012C41EA mov dword ptr [dptr],eax
Other than this part of the code, the rest is identical.
Looking only at the assembly we see an extra mov (the assignment of y_dptr).
Seeing how this is done on every iteration of the (outer) loop and there are no other differences in the code, this could be the reason for your performance decrease.
Other than that, there is really nothing in your code that takes advantage of the pointer magic you are trying to use.
E.g. you use dptr = (y_dptr+y);, where you could drop either dptr or y_dptr by incrementing the pointer directly (y_dptr++;). That is the pointer-arithmetic magic you are not using, and it is where the code could be improved, as sketched below.
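A minimal sketch of that idea, keeping just one running pointer per row (this assumes the same DATA layout and bufferdata_ex members as in your code):
// Sketch only: one pointer increment per cell instead of a per-cell multiply.
for (short x = 0; x < MAX_X_COORD; x++)
{
    bufferdata_ex *dptr = _bridge->DATA[x];          // start of row x
    for (short y = 0; y < MAX_Y_COORD; y++, dptr++)
    {
        if (dptr->InUse)
        {
            // ... same cursor / colour / WriteConsoleA code as before, via dptr ...
        }
    }
}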
map<int, set<int>**> st;
for (int i = 0; i < 10; i++) {
set<int>* temp = new set<int>();
set<int>** ptr = new set<int>*();
//set<int>** ptr = (set<int>**)malloc(sizeof(set<int>*));
ptr = &temp;
temp->insert(i);
st[i] = ptr; //at this stage st[0] .. st[i] points to the same ptr
}
The map entries st[0]...st[9] keep getting overwritten and all point to the newly generated ptr on every single loop iteration.
Am I missing something?
With
set<int>* temp = new set<int>();
set<int>** ptr = new set<int>*();
you are creating the pointers and the set freshly in each iteration of the loop. I.e. nothing is overwritten; it is even worse: the local pointers go out of scope after each iteration (and the heap-allocated objects are simply leaked).
That is why temp->insert(i); will always insert into a freshly created and still empty set.
The line ptr = &temp; however will always make the local pointer ptr point to the address of the loop-local pointer temp, which can easily end up being the same address in every iteration. (Think of the stack, though that behaviour is not strictly defined by the C++ standard.)
You store dangling pointers in your map st.
ptr = &temp;
will take the address of the local variable temp. In the next iteration of the loop a new value will be assigned to temp, but the address of temp will stay the same, thus ptr will always point to that same local variable.
When the loop ends and temp goes out of scope you have dangling pointers and undefined behaviour follows.
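If the double pointer really is needed, each map entry would have to refer to storage that outlives the loop. A minimal sketch of that, keeping the question's types and loop (and still leaking everything unless it is deleted later): the two marked lines are the only changes.
map<int, set<int>**> st;
for (int i = 0; i < 10; i++) {
    set<int>*  temp = new set<int>();       // the set itself lives on the heap
    set<int>** ptr  = new set<int>*(temp);  // a heap slot holding that pointer, not &temp
    temp->insert(i);
    st[i] = ptr;                            // each entry now has its own, still-valid slot
}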
I'm not really sure why you want to use pointers, especially double pointers here. I think you can drop all the pointers and let the map manage memory for you.
map<int, set<int>> st;
for (int i = 0; i < 10; i++) {
set<int> temp;
temp.insert(i);
st[i] = temp;
}
I am currently working on an Arduino platform and I am trying to replace all Strings with char [] arrays and pointers in order to avoid memory problems on my Arduino Nano. The following code was being used to generate a string and pass it to a function which expects a char *:
char * ptr = "";
strcpy(ptr, "AT+CWJAP=\"");
strcat(ptr, wifi_ssid);
strcat(ptr,"\",\"");
strcat(ptr,WIFI_PASS);
strcat(ptr,"\"");
Serial.println(ptr);
addToPipe(ptr);
where:
void ESP8266::addToPipe(char * cmd) {
for(pipeSlot = 0; pipeSlot < PIPEMAXSIZE; pipeSlot++) {
if(isCharArrayEmpty(pipe[pipeSlot])){
Serial.print("Slot is Empty. New data:");
Serial.println(cmd);
pipe[pipeSlot] = cmd;
pipeSlot = PIPEMAXSIZE; //for breaking loop
} else {
Serial.print("Slot is Full with:");
Serial.println(pipe[pipeSlot]);
}
}
printPipe();
}
but for no apparent reason the pointer ptr was being printed to the serial port continuously. However, changing the above string generation to the following:
char * ptr = malloc(1);
strcpy(ptr, "AT+CWJAP=\"");
strcat(ptr, wifi_ssid);
strcat(ptr,"\",\"");
strcat(ptr,WIFI_PASS);
strcat(ptr,"\"");
Serial.println(ptr);
addToPipe(ptr);
seems to get rid of the problem. The question is, what is the difference between:
1. char * ptr = "";
2. char * ptr = malloc(1);
3. char * ptr = NULL
Thanks in advance
When you do this:
char * ptr = malloc(1);
You're only allocating enough space for a single byte. When you then try to strcat or strcpy anything to it, you're writing past the bounds of allocated memory. This invokes undefined behavior.
Assigning "" to ptr also won't work because it now points to a (empty) string literal and string literals can't be modified. Assigning NULL also won't work because it's undefined behavior to dereference a NULL pointer.
You need to allocate enough space to hold the entire string plus the terminating null byte:
char *ptr = malloc(10 + strlen(wifi_ssid) + 3 + strlen(WIFI_PASS) + 1 + 1); /* 10 for "AT+CWJAP=\"", 3 for "\",\"", 1 for the closing "\"", 1 for '\0' */
strcpy(ptr, "AT+CWJAP=\"");
strcat(ptr, wifi_ssid);
strcat(ptr,"\",\"");
strcat(ptr,WIFI_PASS);
strcat(ptr,"\"");
How do I pass data around my program without copying it every time?
Specifically, when calling sim(ohlc) I want to just pass the pointer reference, I don't want to copy the data to the function.
This is the program I made, but I'm not sure this is the best way to do it (especially when it comes to speed and memory usage).
I think I'm not passing the pointer to sim(ohlc) like I should, but if I try sim(&ohlc) I don't know how to change the sim function to accept that.
struct ohlcS {
vector<unsigned int> timestamp;
vector<float> open;
vector<float> high;
vector<float> low;
vector<float> close;
vector<float> volume;
} ;
ohlcS *read_csv(string file_name) {
// open file and read stuff
if (read_error)
return NULL;
static ohlcS ohlc;
ohlc.timestamp.push_back(read_value);
return &ohlc;
}
int sim(ohlcS* ohlc) {
// do stuff
return 1;
}
main() {
ohlcS *ohlc = read_csv(input_file);
results = sim(ohlc);
}
It's C++, use a reference. It's safe, since you return a static object.
static ohlcS ohlc_not_found;
ohlcS &read_csv(string file_name) {
// open file and read stuff
if(error_while_opening)
{
return ohlc_not_found;
}
static ohlcS loc_ohlc;
loc_ohlc.timestamp.push_back(read_value);
return loc_ohlc;
}
int sim(const ohlcS& par_ohlc) {
// do stuff
return 1;
}
....
ohlcS& var_ohlc = read_csv(input_file);
if(&var_ohlc == &ohlc_not_found)
{
// error handling
return;
}
results = sim(var_ohlc);
If you want to modify par_ohlc in sim, do not make it const.
and it's not recommended to use ohlc for both class and variable name :(
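To tie that back to the names in the question, a minimal const-reference version (keeping the question's ohlcS, read_csv and input_file, and checking the NULL return instead of a sentinel object) could look like this:
int sim(const ohlcS &ohlc) {
    // read ohlc.close, ohlc.volume, ... here; the vectors are never copied
    return 1;
}

int main() {
    ohlcS *ohlc = read_csv(input_file);   // as in the question
    if (ohlc == NULL)
        return 1;                         // read error
    int results = sim(*ohlc);             // dereference once; only a reference is passed
    return 0;
}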
In line:
results = sim(ohlc);
you are passing the ohlc pointer to the sim function; no deep copy of the data is done, only the pointer value itself (4 bytes on a 32-bit build) is copied.
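A quick way to convince yourself of that (exact numbers vary by platform; the pointer is 4 bytes on a 32-bit build, 8 bytes on a 64-bit one, while the struct itself holds six vectors):
#include <cstdio>
#include <vector>
using std::vector;

struct ohlcS {
    vector<unsigned int> timestamp;
    vector<float> open, high, low, close, volume;
};

int main() {
    // Only the pointer is copied into sim(); the vectors (and their heap data) stay put.
    std::printf("sizeof(ohlcS)  = %zu\n", sizeof(ohlcS));
    std::printf("sizeof(ohlcS*) = %zu\n", sizeof(ohlcS *));
    return 0;
}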
This pushes the address (32 bit value) onto the stack.
results = sim(ohlc);
Like:
; ...
push eax ; addr of struct/class/whatever
call function ; jump to function
; ...
function:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; ebp+8 is the 32 bit value you pushed before onto the stack
; -> your pointer
Take a look at this and maybe that too.
Version 2
; ...
push eax ; addr of struct/class/whatever
jmp function ; jump to function
autolbl001:
; ...
function:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; ebp+8 is the 32 bit value you pushed before onto the stack
; ...
jmp autolbl001
Possible Duplicate:
Tail recursion in C++
I'm new to tail recursion in C++. My project requires that I make all my functions tail recursive. I've tested the following code and it works correctly. However, I'm not sure whether the way I've done it qualifies as tail recursion.
static int sum_helper(list_t hList, int accumulator){
if (list_isEmpty(hList))
return accumulator;
else {
accumulator += list_first(hList);
hList = list_rest(hList);
return sum_helper(hList, accumulator);
}
}
int sum(list_t list){
/*
// EFFECTS: returns the sum of each element in list
// zero if the list is empty.
*/
if (list_isEmpty(list))
return 0;
return sum_helper(list, 0);
}
Thanks!
In short, you don't do anything after the recursive call (sum_helper). This means that execution never needs to come back into the current invocation after the call, and thus its stack frame can be thrown away (or reused) before the call is made.
Take the example of the normal factorial function
int fact(int x)
{
if(x == 0)
return 1;
else
return x * fact(x-1);
}
This is not tail recursive, since the value of fact(x-1) needs to be returned and then multiplied by x. Instead, we can cheat a little and pass an accumulator too. See this:
int fact(int x, int acc)
{
if(x == 0)
return acc; // Technically, acc * 1, but that's the identity anyway.
else
return fact(x-1, acc*x);
}
Here, the last thing that happens in the control flow is the call fact(x-1, acc*x). Afterwards, we don't need the return value of the called function for anything else, hence we don't need to come back to the current frame. For this reason, we can throw away the stack frame and apply other optimisations.
Disclaimer: I've probably applied the factorial algorithm wrong, but you get the gist. Hopefully.
It's tail-recursion provided list_t doesn't have a non-trivial destructor. If it does have a non-trivial destructor, the destructor needs to run after the recursive call returns and before the function itself returns.
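For intuition, this is roughly the loop a compiler is allowed to turn your sum_helper into once it recognises the tail call (a sketch reusing the list_t API from the question):
static int sum_helper_as_loop(list_t hList, int accumulator){
    while (!list_isEmpty(hList)) {        // each "recursive call" becomes one iteration
        accumulator += list_first(hList);
        hList = list_rest(hList);
    }
    return accumulator;
}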
Bonus:
int sum(list_t hList, int accumulator = 0) {
return list_isEmpty(hList)
? accumulator
: sum(list_rest(hList), accumulator + list_first(hList));
}
But tastes vary; some people might like yours more.
From a theoretical point of view, yes, it's tail recursion (provided that list_t does not have a nontrivial destructor). But from a practical point of view it depends on your compiler and its settings. Let's take a look at the assembly generated for this simple code:
#include <cstdlib>
struct list{
int head;
list * tail;
};
int sum_helper(list * l, int accumulator){
if (l == NULL)
return accumulator;
else {
accumulator += l->head;
return sum_helper(l->tail, accumulator);
}
}
Optimisations ON : (g++ -O2 ..., boring part omitted):
testq %rdi, %rdi
movl %esi, %eax
je .L2
...
.L6:
...
jne .L6 <-- loop
.L2:
rep
ret
This is clearly a loop. But when you disable optimisations, you get:
_Z10sum_helperP4listi:
.LFB6:
...
jne .L2
movl -12(%rbp), %eax
jmp .L3
.L2:
...
call _Z10sum_helperP4listi <-- recursion
.L3:
leave
.cfi_def_cfa 7, 8
ret
Which is recursive.
The book 'Modern Compiler Design' is a nice book about compilers. Something in its source code that is bothering me is the AST, or Abstract Syntax Tree. Suppose we want to write a parser for parenthesized expressions which parses something like ((2+3)*4) * 2. The book says that we have an AST like:
        ((2+3)*4) * 2
         /    |    \
   (2+3)*4    *     2
    /  |  \
(2+3)  *   4
 / | \
2  +  3
So should I save a tree in memory, or just use recursive calls? Note: if I don't store it in memory, how can I convert it to machine code?
Parser code:
int parse(Expression &expr)
{
if(token.class=='D')
{
expr.type='D';
expr.value=token.val-'0';
get_next_token();
return 1;
}
if(token.class=='(')
{
expr.type='P';
get_next_token();
parse(&expr->left);
parse_operator(&expr->op);
parse(&expr->right);
if(token.class!=')')
Error("missing )");
get_next_token();
return 1;
}
return 0;
}
Grammar is:
expr -> digit | (expr op expr)
digit -> 0|1|2....|9
op -> +|*
You can store the tree in memory or you can directly produce the required output code. Storing the intermediate form is normally done to be able to do some processing on the code at a higher level before generating output.
In your case, for example, it would be simple to discover that your expression contains no variables and that the result is therefore a fixed number. Looking at only one node at a time, however, this is not possible. To be more explicit: if, after looking at "2*", you generate machine code for computing the double of something, that code is sort of wasted when the other part turns out to be, say, "3", because your program will compute "3" and then compute the double of that every time, while just loading "6" would be equivalent but shorter and faster.
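As a tiny illustration of why the stored tree helps here, a constant-folding pass over a hypothetical node layout (the fields below are assumptions for illustration, not the book's code) might look like this:
// Hypothetical node layout, purely to illustrate constant folding on a stored tree.
struct Node {
    char  type;          // 'D' = digit/literal, 'P' = parenthesized op node
    char  op;            // '+' or '*', valid when type == 'P'
    int   value;         // valid when type == 'D'
    Node *left, *right;  // valid when type == 'P'
};

// Collapse any operator node whose children are both literals into a single literal.
void fold(Node *n) {
    if (n == nullptr || n->type == 'D') return;
    fold(n->left);
    fold(n->right);
    if (n->left->type == 'D' && n->right->type == 'D') {
        n->value = (n->op == '+') ? n->left->value + n->right->value
                                  : n->left->value * n->right->value;
        n->type = 'D';   // the whole subtree is now just a constant
    }
}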
If you want to generate machine code, then you first need to know what kind of machine the code is going to be generated for... the simplest model uses a stack-based approach. In this case you need no register-allocation logic and it's easy to compile directly to machine code without the intermediate representation. Consider this small example that handles just integers, the four arithmetic operations, unary negation and variables... you will notice that no data structure is used at all: source code characters are read and machine instructions are written to the output...
#include <stdio.h>
#include <stdlib.h>
void error(const char *what) {
fprintf(stderr, "ERROR: %s\n", what);
exit(1);
}
void compileLiteral(const char *& s) {
int v = 0;
while (*s >= '0' && *s <= '9') {
v = v*10 + *s++ - '0';
}
printf(" mov eax, %i\n", v);
}
void compileSymbol(const char *& s) {
printf(" mov eax, dword ptr ");
while ((*s >= 'a' && *s <= 'z') ||
(*s >= 'A' && *s <= 'Z') ||
(*s >= '0' && *s <= '9') ||
(*s == '_')) {
putchar(*s++);
}
printf("\n");
}
void compileExpression(const char *&);
void compileTerm(const char *& s) {
if (*s >= '0' && *s <= '9') {
// Number
compileLiteral(s);
} else if ((*s >= 'a' && *s <= 'z') ||
(*s >= 'A' && *s <= 'Z') ||
(*s == '_')) {
// Variable
compileSymbol(s);
} else if (*s == '-') {
// Unary negation
s++;
compileTerm(s);
printf(" neg eax\n");
} else if (*s == '(') {
// Parenthesized sub-expression
s++;
compileExpression(s);
if (*s != ')')
error("')' expected");
s++;
} else {
error("Syntax error");
}
}
void compileMulDiv(const char *& s) {
compileTerm(s);
for (;;) {
if (*s == '*') {
s++;
printf(" push eax\n");
compileTerm(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" imul ebx\n");
} else if (*s == '/') {
s++;
printf(" push eax\n");
compileTerm(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" idiv ebx\n");
} else break;
}
}
void compileAddSub(const char *& s) {
compileMulDiv(s);
for (;;) {
if (*s == '+') {
s++;
printf(" push eax\n");
compileMulDiv(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" add eax, ebx\n");
} else if (*s == '-') {
s++;
printf(" push eax\n");
compileMulDiv(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" sub eax, ebx\n");
} else break;
}
}
void compileExpression(const char *& s) {
compileAddSub(s);
}
int main(int argc, const char *argv[]) {
if (argc != 2) error("Syntax: simple-compiler <expr>\n");
compileExpression(argv[1]);
return 0;
}
For example running the compiler with 1+y*(-3+x) as input you get as output
mov eax, 1
push eax
mov eax, dword ptr y
push eax
mov eax, 3
neg eax
push eax
mov eax, dword ptr x
mov ebx, eax
pop eax
add eax, ebx
mov ebx, eax
pop eax
imul ebx
mov ebx, eax
pop eax
add eax, ebx
However this approach of writing compilers doesn't scale well to an optimizing compiler.
While it's possible to get some optimization by adding a "peephole" optimizer in the output stage, many useful optimizations are possible only when looking at the code from a higher point of view.
Also, even the bare machine code generation could benefit from seeing more code, for example to decide which register to assign to what, or to decide which of the possible assembler implementations would be convenient for a specific code pattern.
For example the same expression could be compiled by an optimizing compiler to
mov eax, dword ptr x
sub eax, 3
imul dword ptr y
inc eax
Nine times out of ten you'll save the AST in memory for whatever you are doing after lexing and parsing are done.
Once you have an AST you can do a number of things:
Evaluate it directly (perhaps using recursion, perhaps using your own custom stack)
Transform it into some other output, such as code in another language or some other type of translation.
Compile it to your preferred instruction set
etc.
You can create an AST with Dijkstra's Shunting-yard algorithm.
At some point you will have the whole expression or AST in memory, though, unless you calculate intermediate results while parsing. That works for (sub-)expressions containing only literals or compile-time constants, but not for anything involving variables computed at runtime.
So should I save a tree in memory or just use recursive calls?
You'll use recursive calls in your parser to build the tree in memory.
And of course, you want to keep the tree in memory to process it.
An optimizing compiler keeps several representations of the code in memory (and transforms them).
The answer to the question depends on whether you want a compiler, an interpreter, or something in between (an interpreter wrapped around an intermediate language). If you want an interpreter, a recursive descent parser will at the same time evaluate the expression, so there is no need to hold it in memory. If you want a compiler, then a constant expression like the example can and should be optimised, but most expressions will operate on variables, and you need to convert to tree form as an intermediate step before converting to a linear form.
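To make the interpreter option concrete, here is a minimal evaluate-while-parsing sketch for the question's digit / (expr op expr) grammar; no tree is ever built, each recursive call just returns the value of its sub-expression:
#include <cstdio>
#include <cstdlib>

static const char *s;   // current position in the input

static int evalExpr() {
    if (*s >= '0' && *s <= '9')
        return *s++ - '0';                 // expr -> digit
    if (*s == '(') {                       // expr -> ( expr op expr )
        s++;
        int left = evalExpr();
        char op = *s++;                    // '+' or '*'
        int right = evalExpr();
        if (*s++ != ')') { std::fprintf(stderr, "missing )\n"); std::exit(1); }
        return op == '+' ? left + right : left * right;
    }
    std::fprintf(stderr, "syntax error\n");
    std::exit(1);
}

int main() {
    s = "((2+3)*4)";
    std::printf("%d\n", evalExpr());       // prints 20
}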
A hybrid compiler / interpreter will usually compile expressions, but it doesn't have to. A cheap way of writing a program which outputs an executable is simply to bundle the interpreter together with the source code. Matlab uses this technique - code used to be genuinely compiled, but there were problems with consistency with the interactive version. However, I wouldn't let the difficulty of generating a parse tree for expressions determine the issue.