How to save a value to a variable using MMX? (C++)

int A;
int32_t matr1[10] = { 3,10,100,1000,2,40,200,3}; // first matrix
int32_t result[10] = {}; //result
...
_asm{
lea eax,[matr1]
lea edx,[result]
movq mm0,[eax]
movd [edx],mm0
}
How can I continue the code so that, for example, the third element of the array is stored in the variable A? That is, I only need to take one element and save it to a variable.
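One way to sketch this is with the MMX intrinsics from <mmintrin.h> instead of inline assembly. The helper name thirdElement is hypothetical, and the __MMX__ guard (a GCC/Clang predefined macro) is an assumption of this sketch so that it still builds where MMX is unavailable. In the inline-assembly form used in the question, the equivalent would presumably be movd mm0,[eax+8] followed by movd A,mm0 and emms, since the third element sits 8 bytes into the array.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

// Hypothetical helper: copy element index 2 of the array through an MMX value.
std::int32_t thirdElement(const std::int32_t* matr) {
#if defined(__MMX__)
    __m64 v = _mm_cvtsi32_si64(matr[2]);   // corresponds to movd mm, m32
    std::int32_t a = _mm_cvtsi64_si32(v);  // corresponds to movd r32, mm (low 32 bits)
    _mm_empty();                           // emms: leave MMX state before any x87 code
    return a;
#else
    return matr[2];                        // plain fallback when MMX intrinsics are unavailable
#endif
}
```

The compiler is free to pick the actual registers and instructions; the intrinsics only express the intent.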

Related

Reverse "formula" based on a string

Is it possible to reverse the following process?
string dir = textenc.Text;
uint EDI = 0x1505;
uint EDX = 0;
byte ECX = 0;
for ( int i = 0; i < dir.Length; i++ )
{
    if ( dir[ i ] != '.' && dir[ i ] != '\\' )
    {
        EDX = EDI;
        EDX = EDX << 5;
        ECX = ( byte )dir[ i ];
        EDX = EDX + EDI;
        EDX = EDX + ECX;
        EDI = EDX;
    }
}
return EDI;
dir is a string; for example, when dir is "data\font\tahoma.ttf", the function returns 2114405758.
Is there a way to retrieve the original string given only the output number?

No.
The hash function ignores the characters . and \. You can add as many of these as you want, and it will still calculate the same value.
Note: As mentioned by others, this is a hash function, which will create infinitely many collisions.
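To make the collision argument concrete, here is a minimal C++ port of the loop from the question. The function name pathHash is hypothetical, and the port assumes 8-bit characters, whereas the C# original truncates UTF-16 code units to a byte. Because '.' and '\' are skipped, two different inputs can be shown to hash identically, so the output alone cannot identify the input.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical C++ port of the question's loop (assumes 8-bit characters).
std::uint32_t pathHash(const std::string& dir) {
    std::uint32_t edi = 0x1505;
    for (unsigned char c : dir) {
        if (c != '.' && c != '\\') {
            edi = (edi << 5) + edi + c;  // edi * 33 + c, wrapping modulo 2^32
        }
    }
    return edi;
}
```

For example, pathHash("data\\font\\tahoma.ttf") and pathHash("datafonttahomattf") return the same value, as does any other placement of the ignored characters.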

The short answer is no, it is not possible.
The long answer is that it is nearly impossible to tell whether the generated hash is unique to the given input.
The only way to know is to generate hashes for all possible strings until you hit a duplicate. If you don't hit a duplicate, you will run out of memory before you have covered every string. You might also go a long time without hitting a duplicate, which can mean that the hash function is quite good, but still does not mean it is reversible.

Dynamically Find the Edge of a Rectangle

I have 2 2D points which are jammed together into an array: int square[4]. These four numbers are interpreted as the definition of a rectangle with horizontal lines parallel to the X-axis and vertical lines parallel to the Y-axis. The elements of the array then respectively define:
Left edge's X coordinate
Bottom edge's Y coordinate
Right edge's X coordinate
Top edge's Y coordinate
I have defined a winding order in this enum:
enum WindingOrder {
BOTTOM = 0,
RIGHT,
TOP,
LEFT
};
The minimal, complete, verifiable example of my code is this: I am given a second, output array, int output[4], and an input WindingOrder edge. I need to populate output as follows:
switch(edge) {
case BOTTOM:
    output[0] = square[0]; output[1] = square[1]; output[2] = square[2]; output[3] = square[1];
    break;
case RIGHT:
    output[0] = square[2]; output[1] = square[1]; output[2] = square[2]; output[3] = square[3];
    break;
case TOP:
    output[0] = square[2]; output[1] = square[3]; output[2] = square[0]; output[3] = square[3];
    break;
case LEFT:
    output[0] = square[0]; output[1] = square[3]; output[2] = square[0]; output[3] = square[1];
    break;
}
I'm not married to a particular WindingOrder arrangement, nor do I care about the order of the points in output, so if changing those makes this solvable I'm down. What I want to know is: can I construct the square indexes to assign to output in a for loop, without an if/case/ternary statement (in other words, using bit-wise operations)?
So I'd want, given int i = 0 and WindingOrder edge to do bit-wise operations on them to find:
do {
output[i] = array[???];
} while(++i <= LEFT);
EDIT:
I've received a lot of static array answers (which I believe are the best way to solve this, so I've given a +1). But as a logic problem, I'm curious how few bit-wise operations are needed to find an element of a given edge dynamically. So for example, how should this function's body be written given an arbitrary edge and i: int getIndex(int i, int edge)

Here is a different solution. It is a variation on the static array approach, but without an actual array: the indexing matrix is inlined as a 32-bit unsigned integer computed as a constant expression. The column for the edge parameter is selected with a single shift; finally, the individual indices for each array element are selected via simple bit-shifting and masking.
This solution has some advantages:
it is simple to understand
it does not use tests
it does not use a static array, nor any other memory location
it is independent of the winding order and can be easily customized for any array component order
it does not use C99 specific syntax, which may not be available in C++.
This is as close as I could get to a bitwise solution.
#include <iostream>
enum WindingOrder { BOTTOM = 0, RIGHT, TOP, LEFT };
void BitwiseWind(int const *input, int *output, enum WindingOrder edge)
{
    unsigned bits = ((0x00010201 << BOTTOM * 2) |
                     (0x02010203 << RIGHT * 2) |
                     (0x02030003 << TOP * 2) |
                     (0x00030001 << LEFT * 2))
                    >> (edge * 2);
    output[0] = input[(bits >> 24) & 3];
    output[1] = input[(bits >> 16) & 3];
    output[2] = input[(bits >> 8) & 3];
    output[3] = input[(bits >> 0) & 3];
}
int main() {
    enum WindingOrder edges[4] = { BOTTOM, RIGHT, TOP, LEFT };
    int rect[4] = { 1, 3, 4, 5 };
    int output[4];
    for (int i = 0; i < 4; i++) {
        BitwiseWind(rect, output, edges[i]);
        std::cout << output[0] << output[1] << output[2] << output[3] << std::endl;
    }
    return 0;
}
Compiling BitwiseWind for x86-64 with clang -O3 generates 21 instructions, 6 more than the static array version, but without any memory reference. That's a little disappointing, but I hope it could generate fewer instructions for an ARM target, taking advantage of bit-field extraction opcodes. Incidentally, the inlined version using output[i] = array[(i+(i==winding)*2)&3]; produces 25 instructions without any jumps, and gcc -O3 does much worse: it generates a lot more code with 4 tests and jumps.
The generic getIndex function below compiles to just 6 x86 instructions:
int getIndex(int i, int edge) {
    return (((0x00010201 << BOTTOM * 2) |
             (0x02010203 << RIGHT * 2) |
             (0x02030003 << TOP * 2) |
             (0x00030001 << LEFT * 2))
            >> (edge * 2 + 24 - i * 8)) & 3;
}
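As a sanity check, the packed-constant getIndex can be compared against the index table implied by the switch statement in the question. The following is a small sketch; the names kExpected and matchesSwitch are hypothetical and not part of the answer.

```cpp
#include <cassert>

enum WindingOrder { BOTTOM = 0, RIGHT, TOP, LEFT };

// getIndex as given in the answer: each byte of the packed constant holds
// one output position, and each 2-bit field inside a byte holds one edge.
int getIndex(int i, int edge) {
    return (((0x00010201 << BOTTOM * 2) |
             (0x02010203 << RIGHT * 2) |
             (0x02030003 << TOP * 2) |
             (0x00030001 << LEFT * 2))
            >> (edge * 2 + 24 - i * 8)) & 3;
}

// Reference indices transcribed from the question's switch: kExpected[edge][i].
static const int kExpected[4][4] = {
    {0, 1, 2, 1},  // BOTTOM
    {2, 1, 2, 3},  // RIGHT
    {2, 3, 0, 3},  // TOP
    {0, 3, 0, 1},  // LEFT
};

// Returns true when getIndex agrees with the switch for all 16 inputs.
bool matchesSwitch() {
    for (int edge = 0; edge < 4; ++edge)
        for (int i = 0; i < 4; ++i)
            if (getIndex(i, edge) != kExpected[edge][i]) return false;
    return true;
}
```

Keeping such a table next to the bit-packed version documents the intent and guards against a mistyped constant.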

Is there a particular reason that this needs to use lots of bitwise operations? It seems quite a complex way to solve the problem?
You seem to be quite worried about speed, for example, you don't want to use modulo because it is expensive. This being the case, why not just use a really simple lookup and unroll the loops? Example on ideone as well.
EDIT: Thanks to chqrlie for input. Have updated answer accordingly.
#include <iostream>
using namespace std;
enum WindingOrder {
BOTTOM = 0,
RIGHT,
TOP,
LEFT
};
void DoWinding1(unsigned int const *const in, unsigned int *const out, const enum WindingOrder ord)
{
static const unsigned int order[4][4] = { [BOTTOM] = {0,1,2,1},
[RIGHT] = {2,1,2,3},
[TOP] = {2,3,0,3},
[LEFT] = {0,3,0,1} };
out[0] = in[order[ord][0]];
out[1] = in[order[ord][1]];
out[2] = in[order[ord][2]];
out[3] = in[order[ord][3]];
}
int main() {
unsigned int idx;
unsigned int rect[4] = {1, 3, 4, 5};
unsigned int out[4] = {0};
DoWinding1(rect, out, BOTTOM);
std::cout << out[0] << out[1] << out[2] << out[3] << std::endl;
return 0;
}

Is it possible to redefine WindingOrder's values? If so, here is my solution: encode the selection indexes in the WindingOrder values themselves, then decode the index into input[] by shifting and masking as the output[] index iterates.
(Thanks to chqrlie for providing the code base.)
#include <iostream>
enum WindingOrder {
// the least significant 4 bits hold the index into input[] for output[0]
// the most significant 4 bits hold the index into input[] for output[3]
BOTTOM = 0x1210,
RIGHT = 0x3212,
TOP = 0x3230,
LEFT = 0x3010
};
void BitwiseWind(int const *input, int *output, unsigned short edge)
{
for (size_t i = 0; i < 4; i++)
output[i] = input[(edge >> (i*4)) & 0x000F]; // decode
}
int main() {
enum WindingOrder edges[4] = { BOTTOM, RIGHT, TOP, LEFT };
int rect[4] = { 1, 3, 4, 5 };
int output[4];
for (int i = 0; i < 4; i++) {
BitwiseWind(rect, output, edges[i]);
std::cout << output[0] << output[1] << output[2] << output[3] << std::endl;
}
return 0;
}
The generic getIndex(int i,enum WindingOrder edge) would be:
int getIndex(int i,enum WindingOrder edge)
{
return ((edge >> (i*4)) & 0x000F);
}
I did not count how many instructions it uses, but I believe it would be quite few. It is also really easy to picture how it works. :)

This is untested and there might be a small mistake in some details but the general idea should work.
Copying the array to the output would use the indices {0,1,2,3}. To get a specific edge you have to do some transformations to the indices:
changed_pos changed_to
RIGHT : {2,1,2,3} 0 2
TOP : {0,3,2,3} 1 3
LEFT : {0,1,0,3} 2 0
BOTTOM: {0,1,2,1} 3 1
So basically you have to add 2 mod 4 for the specific position of your winding.
So the (like I said, untested) snippet could look like this:
for (size_t i=0; i<4; ++i) {
output[i] = array[(i+(i==edge)*2)%4];
}
If the comparison is true you add 1*2=2, else 0*2=0 to the index and do mod 4 to stay in the range.
Your enum has to look like this (but I guess you figured this out yourself):
enum WindingOrder {
RIGHT,
TOP,
LEFT,
BOTTOM
};
MWE:
#include <iostream>
#include <string>
#include <vector>
enum WindingOrder {
RIGHT=0,
TOP,
LEFT,
BOTTOM
};
int main()
{
std::vector<int> array = {2,4,8,9};
std::vector<int> output(4);
std::vector<WindingOrder> test = {LEFT,RIGHT,BOTTOM,TOP};
for (auto winding : test) {
for (size_t i=0; i<4; ++i) {
output[i] = array[(i+(i==winding)*2)%4];
}
std::cout << "winding " << winding << ": " << output[0] << output[1] << output[2] << output[3] << std::endl;
}
}

Judging from your own answer, you're close to the solution. I think what you need here is a Karnaugh map, which is a general method for simplifying Boolean expressions.
Suppose
The elements of the array then respectively define:
input[0]: Left edge's X coordinate
input[1]: Bottom edge's Y coordinate
input[2]: Right edge's X coordinate
input[3]: Top edge's Y coordinate
I have defined a winding order in this enum:
enum WindingOrder {
BOTTOM = 0,
RIGHT,
TOP,
LEFT
};
The for-loop may look like
for (int k = 0; k != 4; ++k) {
int i = getIndex(k, edge); // calculate i from k and edge
output[k] = square[i];
}
So the inputs are k (the index into output) and edge, and the output is i (the index into square). Because i has 2 bits, two logic functions are needed.
Here we use P = F1(A, B, C, D) and Q = F2(A, B, C, D) to represent the logic functions, in which A, B, C, D, P and Q are all single bit, and
k = (A << 1) + B;
edge = (C << 1) + D;
i = (P << 1) + Q;
Then what we need to do is just deduce the two logic functions F1 and F2 from the given conditions.
From the switch case statements you gave, we can easily get the truth table.
k\edge  0   1   3   2
0       0   2   0   2
1       1   1   3   3
3       1   3   1   3
2       2   2   0   0
Then separate this into two truth tables, one for each of the bits P and Q.
P   edge    0   1   3   2
k   AB\CD   00  01  11  10
0   00      0   1   0   1
1   01      0   0   1   1
3   11      0   1   0   1
2   10      1   1   0   0
Q   edge    0   1   3   2
k   AB\CD   00  01  11  10
0   00      0   0   0   0
1   01      1   1   1   1
3   11      1   1   1   1
2   10      0   0   0   0
These are the Karnaugh maps that I mentioned at the beginning. We can easily get the functions.
F1(A, B, C, D) = A~B~C + A~CD + ~B~CD + ~ABC + ~AC~D + BC~D
F2(A, B, C, D) = B
Then the program will be
int getIndex(int k, int edge) {
    int A = (k >> 1) & 1;
    int B = k & 1;
    int C = (edge >> 1) & 1;
    int D = edge & 1;
    int P = A&~B&~C | A&~C&D | ~B&~C&D | ~A&B&C | ~A&C&~D | B&C&~D;
    int Q = B;
    return (P << 1) + Q;
}
This passed the check here. Of course, you can simplify the function even more with XOR.
EDIT
Using XOR to simplify the expression is possible most of the time, since A^B == A~B + ~AB. But this may not be what you want. First, I think the performance differs only a little between the Sum of Products (SoP) expression and the further simplified version with XOR. Second, there is no universal method (as far as I know) for simplifying an expression with XOR, so you have to rely on your own experience to do this work.
There are sixteen possible logic functions of two variables, but in digital logic hardware, the simplest gate circuits implement only four of them: AND, OR, and the complements of those (NAND and NOR). Karnaugh maps are used to simplify real-world logic requirements so that they can be implemented using a minimum number of physical logic gates.
Two common forms are used here: Sum of Products and Product of Sums expressions. Both can be implemented directly using only AND and OR logic operators, and both can be deduced directly from a Karnaugh map.
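For illustration, here is one possible XOR form of P, found by inspecting the map by hand rather than by a systematic method: when A^B is 1 the P rows equal A^C, otherwise they equal C^D. The names getIndexSoP, getIndexXor and formsAgree are hypothetical; the sketch checks the XOR form against the Sum of Products version for all 16 inputs.

```cpp
#include <cassert>

// Sum of Products version, as derived from the Karnaugh map above.
int getIndexSoP(int k, int edge) {
    int A = (k >> 1) & 1, B = k & 1;
    int C = (edge >> 1) & 1, D = edge & 1;
    int P = A&~B&~C | A&~C&D | ~B&~C&D | ~A&B&C | ~A&C&~D | B&C&~D;
    return (P << 1) + B;
}

// One possible XOR form (hand-derived): P = (A^B) ? (A^C) : (C^D),
// written as a bitwise select so that no branch or ternary is needed.
int getIndexXor(int k, int edge) {
    int A = (k >> 1) & 1, B = k & 1;
    int C = (edge >> 1) & 1, D = edge & 1;
    int sel = A ^ B;                               // 1 when k is 1 or 2
    int P = ((A ^ C) & sel) | ((C ^ D) & (sel ^ 1));
    return (P << 1) + B;
}

// Returns true when both forms agree for every k and edge in 0..3.
bool formsAgree() {
    for (int k = 0; k < 4; ++k)
        for (int edge = 0; edge < 4; ++edge)
            if (getIndexSoP(k, edge) != getIndexXor(k, edge)) return false;
    return true;
}
```

Whether the shorter form is actually faster depends on the compiler and target, so it is worth measuring before preferring it.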

If you define the coordinates and directions in clockwise order starting at left,
#define LEFT 0
#define TOP 1
#define RIGHT 2
#define BOTTOM 3
you can use
void edge_line(int line[4], const int rect[4], const int edge)
{
line[0] = rect[ edge & 2 ];
line[1] = rect[ ((edge + 3) & 2) + 1 ];
line[2] = rect[ ((edge + 1) & 2) ];
line[3] = rect[ (edge & 2) + 1 ];
}
to copy the edge line coordinates (each line segment in clockwise winding order). It looks suboptimal, but using -O2, GCC-4.8, you get essentially
edge_line:
pushl %esi
pushl %ebx
movl 20(%esp), %ecx
movl 16(%esp), %edx
movl 12(%esp), %eax
movl %ecx, %esi
andl $2, %esi
movl (%edx,%esi,4), %ebx
movl %ebx, (%eax)
leal 3(%ecx), %ebx
addl $1, %ecx
andl $2, %ebx
andl $2, %ecx
addl $1, %ebx
movl (%edx,%ebx,4), %ebx
movl %ebx, 4(%eax)
movl (%edx,%ecx,4), %ecx
movl %ecx, 8(%eax)
movl 4(%edx,%esi,4), %edx
movl %edx, 12(%eax)
popl %ebx
popl %esi
ret
but on 64-bit, even better
edge_line:
movl %edx, %ecx
andl $2, %ecx
movslq %ecx, %rcx
movl (%rsi,%rcx,4), %eax
movl %eax, (%rdi)
leal 3(%rdx), %eax
addl $1, %edx
andl $2, %edx
andl $2, %eax
movslq %edx, %rdx
cltq
movl 4(%rsi,%rax,4), %eax
movl %eax, 4(%rdi)
movl (%rsi,%rdx,4), %eax
movl %eax, 8(%rdi)
movl 4(%rsi,%rcx,4), %eax
movl %eax, 12(%rdi)
ret
As you can see, there are no conditionals, and the binary operators combine and optimize to very few instructions.
Edited to add:
If we define a getIndex(i, edge) function, using three binary ANDs, one bit shift (right by 1), three additions, and one subtraction,
int getIndex(const int i, const int edge)
{
return (i & 1) + ((edge + 4 - (i & 1) + (i >> 1)) & 2);
}
with which edge_line() can be implemented as
void edge_line(int line[4], const int rect[4], const int edge)
{
line[0] = rect[ getIndex(0, edge) ];
line[1] = rect[ getIndex(1, edge) ];
line[2] = rect[ getIndex(2, edge) ];
line[3] = rect[ getIndex(3, edge) ];
}
we get the exact same results as before. Using GCC-4.8.4 and -O2 on AMD64/x86-64 compiles to
getIndex:
movl %edi, %edx
sarl %edi
andl $1, %edx
subl %edx, %esi
leal 4(%rsi,%rdi), %eax
andl $2, %eax
addl %edx, %eax
ret
and to
getIndex:
movl 4(%esp), %eax
movl 8(%esp), %edx
movl %eax, %ecx
andl $1, %ecx
subl %ecx, %edx
sarl %eax
leal 4(%edx,%eax), %eax
andl $2, %eax
addl %ecx, %eax
ret
on i686. Note that I arrived at the above form using the four-by-four result table; there are other, more rigorous ways to construct it, and there might even be a more optimal form. Because of this, I seriously recommend adding a big huge comment above the function, explaining the intent, and preferably also showing the result table. Something like
/* This function returns an array index:
* 0 for left
* 1 for top
* 2 for right
* 3 for bottom
* given edge:
* 0 for left
* 1 for top
* 2 for right
* 3 for bottom
* and i:
* 0 for initial x
* 1 for initial y
* 2 for final x
* 3 for final y
*
* The result table is
* | edge
* | 0 1 2 3
* ----+-------
* i=0 | 0 0 2 2
* i=1 | 3 1 1 3
* i=2 | 0 2 2 0
* i=3 | 1 1 3 3
*
* Apologies for the write-only code.
*/
Or something similar.

Let's call the goal variable, which will be used to index square, int index.
Now we'll create a table of the desired index for edge versus i, with one row per edge and one column per i:
║0│1│2│3
═╬═╪═╪═╪═
0║0│1│2│1
─╫─┼─┼─┼─
1║2│1│2│3
─╫─┼─┼─┼─
2║2│3│0│3
─╫─┼─┼─┼─
3║0│3│0│1
It is obvious from this that the least significant bit of index is always 1 for odd values of i and 0 for even values of i, so it equals i & 1. If we can find the most significant bit of index, we just need to OR it (shifted left by one) with i & 1 and we'll have our index. So let's make another table of just the most significant bit of index, for the same edge versus i layout:
║0│1│2│3
═╬═╪═╪═╪═
0║0│0│1│0
─╫─┼─┼─┼─
1║1│0│1│1
─╫─┼─┼─┼─
2║1│1│0│1
─╫─┼─┼─┼─
3║0│1│0│0
We can see several things here:
When i is 0 or 3 the columns are identical depending only on edge
These columns are set when edge is 1 or 2
When i is 1 or 2 the columns are inverse of each other
These columns are set when only edge's most significant bit or only i's most significant bit is set
So let's start by breaking edge and i into least significant and most significant bits:
const int ib0 = i & 1;
const int ib1 = (i & 2) >> 1;
const int eb0 = edge & 1;
const int eb1 = (edge & 2) >> 1;
From here we can easily tell whether i is 0 or 3 (iXor is 0) or i is 1 or 2 (iXor is 1):
const int iXor = ib0 ^ ib1;
For the 1/2 condition, used when iXor is 1:
const int iXorCondition = ib1 ^ eb1;
And for the 0/3 condition, used when iXor is 0:
const int iNXorCondition = eb0 ^ eb1;
Now we'll just need to combine those with their respective iXor and put back index's least significant bit:
const int index = ((iNXorCondition & ~iXor | iXorCondition & iXor) << 1) | ib0;
Putting this all together into a convenient function we get:
int getIndex(int i, int edge) {
    const int ib0 = i & 1;
    const int ib1 = (i & 2) >> 1;
    const int eb0 = edge & 1;
    const int eb1 = (edge & 2) >> 1;
    const int iXor = ib0 ^ ib1;
    const int iNXorCondition = eb0 ^ eb1;
    const int iXorCondition = ib1 ^ eb1;
    return ((iNXorCondition & ~iXor | iXorCondition & iXor) << 1) | ib0;
}
I've written a checking live example here.

What I want to know is can I construct the square indexes to assign to output in a for loop, without an if/case/ternary statement (in other words using bit-wise operations) ?
I would ask what you expect to achieve by doing that.
My view is that the switch-case construct will typically be completely reorganized by the compiler's optimizer. It's best, IMO, to leave that code alone and let the compiler do that.
There are only two conditions under which I'd change that view:
You were writing in OpenCL ( rather than C ) and wanted to optimize the code where decision-branch logic can be problematic for performance.
You wanted to use explicit coding for SIMD vectorization. There are some special operations that might help there, but it's a coding option that locks you into things that might not work well on hardware without SIMD instruction sets ( or perform quite differently on different hardware ). It's also worth noting that some compilers can auto-vectorize with the right coding.
I just see little or no advantage to coding these operations any other way than switch-case for C.

This is a way to achieve that:
do {
    output[i] = square[
        (edge & 1) * (
            !(i & 1) * ((edge + 1) & 2) +
            (i & 1) * (
                (!((edge - 1)/2)&1) * i +
                (((edge - 1)/2)&1) * (4-i)
            )
        ) +
        !(edge & 1) * (
            (i & 1) * (edge + 1) +
            !(i & 1) * ((edge & 2) - ((edge & 2)-1) * i)
        )
    ];
} while(++i <= LEFT);
To help you understand, I indented the code; you can obviously remove all the whitespace. I added a level of indentation wherever I wanted to separate two cases. By the way, as you can see, the calculation is split into two sections for two different cases (odd and even edge). The cases are symmetrical, but I solved each one with a different algorithm so you can see various ways to achieve this.

A variation of fibonacci assembly x86, have to call it from a c++ main method, kind of lost at few parts

This is the fibonacci code,
unsigned int fib(unsigned int n)
{
    if (n==1 || n ==2)
        return 1;
    else
        return fib(n-2) + fib(n-1);
}
But for my code, I have to change the formula to a new one,
f(n-2)/2 + f(n-1) * 2, so the sequence is 1, 2, 4, 9, 20, 44, 98, 218
I need to write a recursive function called Mobonacci in assembly to calculate the nth number in the sequence, and also a main function in C++ that reads a positive number n, calls the Mobonacci assembly function with parameter n, and then prints the result.
So I'm kind of confused: do I write the function in assembly like I did below, then write a C++ function to call it? And how would you change my code from Fibonacci to the new formula? Here is my code. What do I need to change, and do I need to create a new part that lets the code read input? Also, is my code too short? Do I need to add anything else?
.code
main PROC
mov ecx,0
push 4 ; calculate the nth fib
call Fib ; calculate fib (eax)
call WriteDec
call Crlf
exit
main ENDP
Fib PROC
add ecx,1
push ebp
mov ebp,esp
mov eax,[ebp+8] ; get n
cmp eax,2 ; n == 2?
je exception2
cmp eax,1 ; n == 1?
je exception2
dec eax
push eax ; Fib(n-1)
call fib
add eax,
jmp Quit
Exception2:
dec eax
Quit:
pop ebp ; return EAX
ret 4 ; clean up stack
Fib ENDP
END main

Depends on where you are trying to insert asm code into your c++ code...
For gcc/linux you can do something like :
//Simple example:
void *frame; /* Frame pointer */
__asm__ ("mov %%ebp,%0":"=r"(frame));
//Complicated example:
int foo(void) {
int joe=1234, fred;
__asm__(
" mov %1,%%eax\n"
" add $2,%%eax\n"
" mov %%eax,%0\n"
:"=r" (fred) /* %0: Out */
:"r" (joe) /* %1: In */
:"%eax" /* Overwrite */
);
return fred;
}
The important thing is to understand how to use your asm function in C++.
You can find some useful material on this subject here: https://www.cs.uaf.edu/2011/fall/cs301/lecture/10_12_asm_c.html
About the second part of your question:
To multiply, you can use the "mul" instruction, and to divide, "div".
So if you want to compute f(n-1) * 2,
you have to take the value in register %eax after the "call fib" and use mul on it.
Just have a look here:
http://www.tutorialspoint.com/assembly_programming/assembly_arithmetic_instructions.htm
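Before translating to assembly, it may help to have a plain C++ reference for the new recurrence to compare results against. This is a minimal sketch; the base cases f(1) = 1 and f(2) = 2 are read off the sequence given in the question, and the name mobonacci is just a lowercase stand-in for the assembly routine. The division is integer division, which is what produces 9 from 2/2 + 4*2 and 44 from 9/2 + 20*2.

```cpp
#include <cassert>

// Reference implementation of f(n) = f(n-2)/2 + f(n-1)*2 for n >= 1.
// Assumed base cases (taken from the sequence 1, 2, 4, 9, ...): f(1) = 1, f(2) = 2.
// n must be positive; n == 0 is not handled.
unsigned int mobonacci(unsigned int n) {
    if (n == 1) return 1;
    if (n == 2) return 2;
    return mobonacci(n - 2) / 2 + mobonacci(n - 1) * 2;
}
```

In the assembly version, the multiplication and the unsigned division by 2 can also be done with shl eax,1 and shr eax,1 instead of mul and div. If the routine ends with ret 4 as in the question, the callee removes its own argument, so the C++ declaration would presumably need a matching convention, for example extern "C" with __stdcall on MSVC.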

C++ Conditional Operator versus if-else

I have always wondered about this. Let's say we have a variable, string weight, and an input variable, int mode, which can be 1 or 0.
Is there a clear benefit to using:
weight = (mode == 1) ? "mode:1" : "mode:0";
over
if(mode == 1)
weight = "mode:1";
else
weight = "mode:0";
beyond code readability? Are speeds at all affected, is this handled differently by the compiler (such as the ability of certain switch statements to be converted to jump tables)?

The key difference between the conditional operator and an if/else block is that the conditional operator is an expression, rather than a statement. Thus, there are a few places where you can use the conditional operator but can't use an if/else. For example, initialization of constant objects, like so:
const double biasFactor = (x < 5) ? 2.5 : 6.432;
If you used if/else in this case, biasFactor would have to be non-const.
Additionally, constructor initializer lists call for expressions rather than statements as well:
X::X()
: myData(x > 5 ? 0xCAFEBABE : 0xDEADBEEF)
{
}
In this case, myData may not have any assignment operator or non-const member functions defined--its constructor may be the only way to pass any parameters to it.
Also, note that any expression can be turned into a statement by adding a semicolon at the end--the reverse is not true.
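Both cases can be put into a small compilable sketch. The function biasFactorFor and the exact shape of struct X (a const unsigned member and an int constructor parameter) are assumptions made for illustration; the answer does not show them.

```cpp
#include <cassert>

// A const local initialised by a single expression. Assigning it in a plain
// if/else afterwards would require dropping the const.
double biasFactorFor(int x) {
    const double biasFactor = (x < 5) ? 2.5 : 6.432;
    return biasFactor;
}

// Hypothetical class with a const member: it can only be given a value in the
// constructor's initializer list, which accepts expressions, not statements.
struct X {
    const unsigned myData;
    explicit X(int x) : myData(x > 5 ? 0xCAFEBABEu : 0xDEADBEEFu) {}
};
```

Here X(6).myData is 0xCAFEBABE and X(5).myData is 0xDEADBEEF, chosen without any statement-level branching in the source.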

No, this is purely about presenting the code to a human reader. I'd expect any compiler to generate identical code for these.

With mingw, the assembly code generated with
const char * testFunc()
{
int mode=1;
const char * weight = (mode == 1)? "mode:1" : "mode:0";
return weight;
}
is:
testFunc():
0040138c: push %ebp
0040138d: mov %esp,%ebp
0040138f: sub $0x10,%esp
10 int mode=1;
00401392: movl $0x1,-0x4(%ebp)
11 const char * weight = (mode == 1)? "mode:1" : "mode:0";
00401399: cmpl $0x1,-0x4(%ebp)
0040139d: jne 0x4013a6 <testFunc()+26>
0040139f: mov $0x403064,%eax
004013a4: jmp 0x4013ab <testFunc()+31>
004013a6: mov $0x40306b,%eax
004013ab: mov %eax,-0x8(%ebp)
12 return weight;
004013ae: mov -0x8(%ebp),%eax
13 }
And with
const char * testFunc()
{
const char * weight;
int mode=1;
if(mode == 1)
weight = "mode:1";
else
weight = "mode:0";
return weight;
}
is:
testFunc():
0040138c: push %ebp
0040138d: mov %esp,%ebp
0040138f: sub $0x10,%esp
11 int mode=1;
00401392: movl $0x1,-0x8(%ebp)
12 if(mode == 1)
00401399: cmpl $0x1,-0x8(%ebp)
0040139d: jne 0x4013a8 <testFunc()+28>
13 weight = "mode:1";
0040139f: movl $0x403064,-0x4(%ebp)
004013a6: jmp 0x4013af <testFunc()+35>
15 weight = "mode:0";
004013a8: movl $0x40306b,-0x4(%ebp)
17 return weight;
004013af: mov -0x4(%ebp),%eax
18 }
Pretty much the same code is generated. The performance of your application shouldn't depend on small details like this one.
So, no it doesn't make a difference.

Why are intermediate results of template recursion in object file?

While playing around with template recursion in D, I found that the intermediate results of the classical factorial are still in the object file. I suppose they are also in the executable...?
I can see that the actually executed code contains only the value (or a pointer to it) but:
Shouldn't there be a single mov statement without the intermediate data being saved for no reason?
This is the code:
int main()
{
static int x = factorial!(5);
return x;//factorial!(5);
}
template factorial(int n)
{
static if (n == 1)
const factorial = 1;
else
const factorial = n * factorial!(n-1);
}
and this is the output of obj2asm test.o:
( for your convenience: 1! = 1h, 2! = 2h, 3! = 6h, 4! = 18h, 5! = 78h )
FLAT group
;File = test_fac_01.d
extrn _main
public _deh_beg
public _deh_end
public _tlsstart
public _tlsend
public _D11test_fac_014mainFZi1xi
extrn _GLOBAL_OFFSET_TABLE_
public _Dmain
public _D11test_fac_0112__ModuleInfoZ
extrn _Dmodule_ref
public _D11test_fac_017__arrayZ
public _D11test_fac_018__assertFiZv
public _D11test_fac_0115__unittest_failFiZv
extrn _d_array_bounds
extrn _d_unittestm
extrn _d_assertm
.text segment
assume CS:.text
:
mov EAX,offset FLAT:_D11test_fac_0112__ModuleInfoZ[018h]#32
mov ECX,offset FLAT:_Dmodule_ref#32
mov RDX,[RCX]
mov [RAX],RDX
mov [RCX],RAX
ret
.text ends
.data segment
_D11test_fac_0112__ModuleInfoZ:
db 004h,000h,000h,0ffffff80h,000h,000h,000h,000h ;........
db 074h,065h,073h,074h,05fh,066h,061h,063h ;test_fac
db 05fh,030h,031h,000h,000h,000h,000h,000h ;_01.....
db 000h,000h,000h,000h,000h,000h,000h,000h ;........
dq offset FLAT:_D11test_fac_0112__ModuleInfoZ#64
.data ends
.bss segment
.bss ends
.rodata segment
.rodata ends
.tdata segment
_tlsstart:
db 000h,000h,000h,000h,000h,000h,000h,000h ;........
db 000h,000h,000h,000h,000h,000h,000h,000h ;........
.tdata ends
.tdata. segment
_D11test_fac_014mainFZi1xi:
db 078h,000h,000h,000h ;x...
.tdata. ends
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
mov RAX,FS:[00h]
mov RCX,_D11test_fac_014mainFZi1xi#GOTTPOFF[RIP]
mov EAX,[RCX][RAX]
pop RBP
ret
nop
nop
nop
.text._Dmain ends
.data._D11test_fac_0117__T9factorialVi5Z9factorialxi segment
_D11test_fac_0117__T9factorialVi5Z9factorialxi:
db 078h,000h,000h,000h ;x...
.data._D11test_fac_0117__T9factorialVi5Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi4Z9factorialxi segment
_D11test_fac_0117__T9factorialVi4Z9factorialxi:
db 018h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi4Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi3Z9factorialxi segment
_D11test_fac_0117__T9factorialVi3Z9factorialxi:
db 006h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi3Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi2Z9factorialxi segment
_D11test_fac_0117__T9factorialVi2Z9factorialxi:
db 002h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi2Z9factorialxi ends
.data._D11test_fac_0117__T9factorialVi1Z9factorialxi segment
_D11test_fac_0117__T9factorialVi1Z9factorialxi:
db 001h,000h,000h,000h ;....
.data._D11test_fac_0117__T9factorialVi1Z9factorialxi ends
.ctors segment
dq offset FLAT:#64
.ctors ends
.text._D11test_fac_017__arrayZ segment
assume CS:.text._D11test_fac_017__arrayZ
_D11test_fac_017__arrayZ:
push RBP
mov RBP,RSP
sub RSP,010h
mov RSI,RDI
mov RDI,offset FLAT:_D11test_fac_0112__ModuleInfoZ#64
call _d_array_bounds#PC32
nop
nop
.text._D11test_fac_017__arrayZ ends
.text._D11test_fac_018__assertFiZv segment
assume CS:.text._D11test_fac_018__assertFiZv
_D11test_fac_018__assertFiZv:
push RBP
mov RBP,RSP
sub RSP,010h
mov RSI,RDI
mov RDI,offset FLAT:_D11test_fac_0112__ModuleInfoZ#64
call _d_assertm#PC32
nop
nop
.text._D11test_fac_018__assertFiZv ends
.text._D11test_fac_0115__unittest_failFiZv segment
assume CS:.text._D11test_fac_0115__unittest_failFiZv
_D11test_fac_0115__unittest_failFiZv:
push RBP
mov RBP,RSP
sub RSP,010h
mov RSI,RDI
mov RDI,offset FLAT:_D11test_fac_0112__ModuleInfoZ#64
call _d_unittestm#PC32
leave
ret
.text._D11test_fac_0115__unittest_failFiZv ends
end

You shouldn't use templates when what you want is compile-time function execution. Just write the function as you would and call it in a static context.
int main()
{
static int x = factorial(5); // static causes CTFE
return x;
}
int factorial(int n)
{
if (n == 1)
return 1;
else
return n * factorial(n-1);
}
This won't result in any extra symbols because factorial is evaluated at compile time. There are no symbols other than factorial itself. Your template trick instantiates symbols to achieve the same effect, but those are symbols you don't want.
Alternatively, if you still want to use templates, but don't want symbols then you can use manifest constants via enum.
template factorial(int n)
{
static if (n == 1)
enum factorial = 1;
else
enum factorial = n * factorial!(n-1);
}
Notice the change from const to enum. enum values are purely compile-time, so they produce no symbols or data in the object files.
