The linker cannot find assembly function connected with C++ program - c++

I wrote an assembly function and a main in C in VS Code and compiled them with a standard command:
gcc charFull64.s file.c -gstabs -no-pie -o ex
The program linked and ran. Then I tried to add both files to a project in CLion. Unfortunately, I keep getting the same problem: the linker cannot find my assembly function. I tried adding some flags in CMake and searched the internet for an answer, but without success.
What is probably the relevant part of the assembly source (charFull64.s):
[...]
.text
.global fullChar
.type fullChar, @function
#int fullChar (unsigned char*);
fullChar:
push %rbp
movq %rsp, %rbp
pushq %rbx
pushq %r12
pushq %r13
movq %rdi, number # unsigned char* to number
[...]
And my main in file.c:
#include <stdio.h>
extern int fullChar (unsigned char*);
int main(){
unsigned char a[] = "514";
int size = fullChar(a);
for(int i = 0; i < size; i++)
printf(" %d", a[i]);
printf("\n");
return 0;
}
CLion still reports this:
CMakeFiles/program.dir/main.cpp.o: In function `main':
/home/stanley/CLionProjects/Factorization/main.cpp:17: undefined reference to `fullChar(unsigned char*)'
collect2: error: ld returned 1 exit status
CMakeFiles/program.dir/build.make:90: recipe for target 'program' failed
I don't know how to solve it anymore.

Why do global arrays not consume memory in read-only mode?

The following code declares a global array (256 MiB) and calculates the sum of its items.
This program consumes 188 KiB when running:
#include <cstdlib>
#include <iostream>
using namespace std;
const unsigned int buffer_len = 256 * 1024 * 1024; // 256 MiB
unsigned char buffer[buffer_len];
int main()
{
while(true)
{
int sum = 0;
for(int i = 0; i < buffer_len; i++)
sum += buffer[i];
cout << "Sum: " << sum << endl;
}
return 0;
}
The following code is like the one above, but sets the array elements to random values before calculating the sum.
This program consumes 256.1 MiB when running:
#include <cstdlib>
#include <iostream>
using namespace std;
const unsigned int buffer_len = 256 * 1024 * 1024; // 256 MiB
unsigned char buffer[buffer_len];
int main()
{
while(true)
{
// ********** Changing array items.
for(int i = 0; i < buffer_len; i++)
buffer[i] = std::rand() % 256;
// **********
int sum = 0;
for(int i = 0; i < buffer_len; i++)
sum += buffer[i];
cout << "Sum: " << sum << endl;
}
return 0;
}
Why do global arrays not consume memory in read-only mode (188 KiB vs. 256 MiB)?
My compiler: GCC
My OS: Ubuntu 20.04
Update:
In my real scenario I will generate the buffer with the xxd command, so its elements are not zero:
$ xxd -i buffer.dat buffer.cpp
There's much speculation in the comments that this behavior is explained by compiler optimizations, but OP's use of g++ (i.e. without optimization) doesn't support this, and neither does the assembly output on the same architecture, which clearly shows buffer being used:
buffer:
.zero 268435456
.section .rodata
...
.L3:
movl -4(%rbp), %eax
cmpl $268435455, %eax
ja .L2
movl -4(%rbp), %eax
cltq
leaq buffer(%rip), %rdx
movzbl (%rax,%rdx), %eax
movzbl %al, %eax
addl %eax, -8(%rbp)
addl $1, -4(%rbp)
jmp .L3
The real reason you're seeing this behavior is the use of Copy On Write in the kernel's VM system. Essentially for a large buffer of zeros like you have here, the kernel will create a single "zero page" and point all pages in buffer to this page. Only when the page is written will it get allocated.
This is actually true in your second example as well (i.e. the behavior is the same), but there you're writing data to every page, which forces physical memory to be allocated for the entire buffer. Try writing only buffer_len/2 elements, and you'll see that 128.2 MiB of memory gets allocated by the process, half of the true size of the buffer.
Here's also a helpful summary from Wikipedia:
The copy-on-write technique can be extended to support efficient memory allocation by having a page of physical memory filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space.

The -masm=intel flag is not working for running assembly language in gcc compiler with Intel syntax

I am trying to use the inline assembler __asm in my C program with Intel syntax as opposed to AT&T syntax. I am compiling with gcc -S -masm=intel test.c
but it gives errors. Below is my test.c file.
#include <stdio.h>
//using namespace std;
int AsmCode(int num,int power) {
__asm {
mov eax, num;
mov ecx, power;
shl eax, cl;
};
}
int main()
{
printf("eax value is %d\n",AsmCode(2,3));
//getchar();
return 0;
}
The expected result was eax value is 16, but I get errors such as unknown type name 'mov', unknown type name 'shl', etc.
Edit:
I have updated the code as:
int AsmCode(int num,int power) {
__asm__ (
"movl eax, num;"
"mov ecx, power;"
"shl eax, cl;"
);
}
int main()
{
printf("eax value is %d\n",AsmCode(2,3));
return 0;
}
And compiled this code with gcc -S -masm=intel test.c. This resulted in NO OUTPUT, whereas it should produce output as eax value is 16.
When compiled with gcc test.c it produced the errors:
Error: too many memory references for 'mov'
Error: too many memory references for 'shl'
Please help..
The most important error is the first one:
main.cpp:4:11: error: expected '(' before '{' token
__asm {
^
(
You're using the wrong syntax for GCC. You've used Microsoft Visual Studio syntax. So, your GCC doesn't know that you're trying to give it assembly instructions.
Instead of __asm { ... }, it should be __asm__ ( "..." ).
Like this:
int AsmCode(int num,int power) {
__asm__ (
"mov eax, num;"
"mov ecx, power;"
"shl eax, cl;"
);
}
Read more here.
Note that there are further issues with your ASM that you should ask about separately.

Tracking native instructions in Intel PIN [duplicate]

I am using the Intel PIN tool to do some analysis on the assembly instructions of a C program. I have a simple C program that prints "Hello World", which I have compiled into an executable. The disassembly from gdb looks like this:
Dump of assembler code for function main:
0x0000000000400526 <+0>: push %rbp
0x0000000000400527 <+1>: mov %rsp,%rbp
=> 0x000000000040052a <+4>: mov $0x4005c4,%edi
0x000000000040052f <+9>: mov $0x0,%eax
0x0000000000400534 <+14>: callq 0x400400 <printf@plt>
0x0000000000400539 <+19>: mov $0x0,%eax
0x000000000040053e <+24>: pop %rbp
0x000000000040053f <+25>: retq
End of assembler dump.
I ran a pintool with the executable as input; it performs an instruction trace and prints the number of instructions. I want to trace the instructions that come from my C program, get their machine opcodes, and do some kind of analysis. I am using a C++ PIN tool to count the number of instructions:
#include "pin.H"
#include <iostream>
#include <stdio.h>
UINT64 icount = 0;
using namespace std;
//====================================================================
// Analysis Routines
//====================================================================
void docount(THREADID tid) {
icount++;
}
//====================================================================
// Instrumentation Routines
//====================================================================
VOID Instruction(INS ins, void *v) {
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_THREAD_ID, IARG_END);
}
VOID Fini(INT32 code, VOID *v) {
printf("count = %ld\n",(long)icount);
}
INT32 Usage() {
PIN_ERROR("This Pintool failed\n"
+ KNOB_BASE::StringKnobSummary() + "\n");
return -1;
}
int main(int argc, char *argv[]) {
if (PIN_Init(argc, argv)) return Usage();
PIN_InitSymbols();
PIN_AddInternalExceptionHandler(ExceptionHandler,NULL);
INS_AddInstrumentFunction(Instruction, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
When I run my hello world program with this tool, I get a count of 81563. I understand that PIN adds its own instructions for analysis, but I don't understand how it adds so many when my C program has no more than 10 instructions. Also, is there a way to distinguish the assembly instructions that come from my code from the ones generated by PIN? I can't find a way to tell them apart. Please help!
You're not measuring what you think you're measuring. See my answer here for details:
What instructions 'instCount' Pin tool counts?
Pin does not count its own instructions. The large count is the result of preparation before and after main() and the call to printf().

Why are there 8 bytes between the end of a buffer and the saved frame pointer?

I am doing a stack-smashing exercise for coursework, and I have already completed the assignment, but there is one aspect that I do not understand.
Here is the target program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int bar(char *arg, char *out)
{
strcpy(out, arg);
return 0;
}
void foo(char *argv[])
{
char buf[256];
bar(argv[1], buf);
}
int main(int argc, char *argv[])
{
if (argc != 2)
{
fprintf(stderr, "target1: argc != 2\n");
exit(EXIT_FAILURE);
}
foo(argv);
return 0;
}
Here are the commands used to compile it, on an x86 virtual machine running Ubuntu 12.04, with ASLR disabled.
gcc -ggdb -m32 -g -std=c99 -D_GNU_SOURCE -fno-stack-protector -m32 target1.c -o target1
execstack -s target1
When I look at the memory of this program on the stack, I see that buf has the address 0xbffffc40. Moreover, the saved frame pointer is stored at 0xbffffd48, and the return address is stored at 0xbffffd4c.
These specific addresses are not relevant, but I observe that even though buf only has length 256, the distance 0xbffffd48 - 0xbffffc40 = 264. Symbolically, this computation is $fp - buf.
Why are there 8 extra bytes between the end of buf and the stored frame pointer on the stack?
Here is some disassembly of the function foo. I have already examined it, but I did not see any obvious usage of that memory region, unless it was implicit (ie a side effect of some instruction).
0x080484ab <+0>: push %ebp
0x080484ac <+1>: mov %esp,%ebp
0x080484ae <+3>: sub $0x118,%esp
0x080484b4 <+9>: mov 0x8(%ebp),%eax
0x080484b7 <+12>: add $0x4,%eax
0x080484ba <+15>: mov (%eax),%eax
0x080484bc <+17>: lea -0x108(%ebp),%edx
0x080484c2 <+23>: mov %edx,0x4(%esp)
0x080484c6 <+27>: mov %eax,(%esp)
0x080484c9 <+30>: call 0x804848c <bar>
0x080484ce <+35>: leave
0x080484cf <+36>: ret
Basile Starynkevitch gets the prize for mentioning alignment.
It turns out that gcc 4.7.2 defaults to aligning the frame boundary to a 4-word boundary. On 32-bit emulated hardware, that is 16 bytes. Since the saved frame pointer and the saved instruction pointer together only take up 8 bytes, the compiler put in another 8 bytes after the end of buf to align the top of the stack frame to a 16 byte boundary.
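With the following additional compiler flag, the 8 bytes disappear: it lowers the preferred stack alignment to 2^2 = 4 bytes, and the 8 bytes taken by the saved frame pointer and the return address already satisfy that without padding.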
-mpreferred-stack-boundary=2

Super weird segfault with gcc 4.7 -- Bug?

Here is a piece of code that I've been trying to compile:
#include <cstdio>
#define N 3
struct Data {
int A[N][N];
int B[N];
};
int foo(int uloc, const int A[N][N], const int B[N])
{
for(unsigned int j = 0; j < N; j++) {
for( int i = 0; i < N; i++) {
for( int r = 0; r < N ; r++) {
for( int q = 0; q < N ; q++) {
uloc += B[i]*A[r][j] + B[j];
}
}
}
}
return uloc;
}
int apply(const Data *d)
{
return foo(4,d->A,d->B);
}
int main(int, char **)
{
Data d;
for(int i = 0; i < N; ++i) {
for(int j = 0; j < N; ++j) {
d.A[i][j] = 0.0;
}
d.B[i] = 0.0;
}
int res = 11 + apply(&d);
printf("%d\n",res);
return 0;
}
Yes, it looks quite strange, and does not do anything useful at all at the moment, but it is the most concise version of a much larger program which I had the problem with initially.
It compiles and runs just fine with GCC(G++) 4.4 and 4.6, but if I use GCC 4.7, and enable third level optimizations:
g++-4.7 -g -O3 prog.cpp -o prog
I get a segmentation fault when running it. Gdb does not really give much information on what went wrong:
(gdb) run
Starting program: /home/kalle/work/code/advect_diff/c++/strunt
Program received signal SIGSEGV, Segmentation fault.
apply (d=d@entry=0x7fffffffe1a0) at src/strunt.cpp:25
25 int apply(const Data *d)
(gdb) bt
#0 apply (d=d@entry=0x7fffffffe1a0) at src/strunt.cpp:25
#1 0x00000000004004cc in main () at src/strunt.cpp:34
I've tried tweaking the code in different ways to see if the error goes away. It seems necessary to have all of the four loop levels in foo, and I have not been able to reproduce it by having a single level of function calls. Oh yeah, the outermost loop must use an unsigned loop index.
I'm starting to suspect that this is a bug in the compiler or runtime, since it is specific to version 4.7 and I cannot see what memory accesses are invalid.
Any insight into what is going on would be very much appreciated.
It is possible to get the same situation with the C-version of GCC, with a slight modification of the code.
My system is:
Debian wheezy
Linux 3.2.0-4-amd64
GCC 4.7.2-5
Okay so I looked at the disassembly offered by gdb, but I'm afraid it doesn't say much to me:
Dump of assembler code for function apply(Data const*):
0x0000000000400760 <+0>: push %r13
0x0000000000400762 <+2>: movabs $0x400000000,%r8
0x000000000040076c <+12>: push %r12
0x000000000040076e <+14>: push %rbp
0x000000000040076f <+15>: push %rbx
0x0000000000400770 <+16>: mov 0x24(%rdi),%ecx
=> 0x0000000000400773 <+19>: mov (%rdi,%r8,1),%ebp
0x0000000000400777 <+23>: mov 0x18(%rdi),%r10d
0x000000000040077b <+27>: mov $0x4,%r8b
0x000000000040077e <+30>: mov 0x28(%rdi),%edx
0x0000000000400781 <+33>: mov 0x2c(%rdi),%eax
0x0000000000400784 <+36>: mov %ecx,%ebx
0x0000000000400786 <+38>: mov (%rdi,%r8,1),%r11d
0x000000000040078a <+42>: mov 0x1c(%rdi),%r9d
0x000000000040078e <+46>: imul %ebp,%ebx
0x0000000000400791 <+49>: mov $0x8,%r8b
0x0000000000400794 <+52>: mov 0x20(%rdi),%esi
What should I see when I look at this?
Edit 2015-08-13: This seems to be fixed in g++ 4.8 and later.
You never initialized d. Its value is indeterminate, and trying to do math with its contents is undefined behavior. (Even trying to read its values without doing anything with them is undefined behavior.) Initialize d and see what happens.
Now that you've initialized d and it still fails, that looks like a real compiler bug. Try updating to 4.7.3 or 4.8.2; if the problem persists, submit a bug report. (The list of known bugs currently appears to be empty, or at least the link is going somewhere that only lists non-bugs.)
Unfortunately, it is indeed a bug in gcc. I have not the slightest idea what it is doing there, but this is the generated assembly for the apply function (I compiled it without main, by the way, and foo is inlined into it):
_Z5applyPK4Data:
pushq %r13
movabsq $17179869184, %r8
pushq %r12
pushq %rbp
pushq %rbx
movl 36(%rdi), %ecx
movl (%rdi,%r8), %ebp
movl 24(%rdi), %r10d
and it crashes exactly at the movl (%rdi,%r8), %ebp, since it adds a nonsensical 0x400000000 to %rdi (the first parameter, thus the pointer to Data) and dereferences the result.