Is main() really start of a C++ program? - c++

The section $3.6.1/1 from the C++ Standard reads,
A program shall contain a global
function called main, which is the
designated start of the program.
Now consider this code,
int square(int i) { return i*i; }
int user_main()
{
for ( int i = 0 ; i < 10 ; ++i )
std::cout << square(i) << endl;
return 0;
}
int main_ret= user_main();
int main()
{
return main_ret;
}
This sample code does what I intend it to do, i.e printing the square of integers from 0 to 9, before entering into the main() function which is supposed to be the "start" of the program.
I also compiled it with -pedantic option, GCC 4.5.0. It gives no error, not even warning!
So my question is,
Is this code really Standard conformant?
If it's standard conformant, then does it not invalidate what the Standard says? main() is not start of this program! user_main() executed before the main().
I understand that to initialize the global variable main_ret, the use_main() executes first but that is a different thing altogether; the point is that, it does invalidate the quoted statement $3.6.1/1 from the Standard, as main() is NOT the start of the program; it is in fact the end of this program!
EDIT:
How do you define the word 'start'?
It boils down to the definition of the phrase "start of the program". So how exactly do you define it?

You are reading the sentence incorrectly.
A program shall contain a global function called main, which is the designated start of the program.
The standard is DEFINING the word "start" for the purposes of the remainder of the standard. It doesn't say that no code executes before main is called. It says that the start of the program is considered to be at the function main.
Your program is compliant. Your program hasn't "started" until main is started. The function is called before your program "starts" according to the definition of "start" in the standard, but that hardly matters. A LOT of code is executed before main is ever called in every program, not just this example.
For the purposes of discussion, your function is executed prior to the 'start' of the program, and that is fully compliant with the standard.

No, C++ does a lot of things to "set the environment" prior to the call of main; however, main is the official start of the "user specified" part of the C++ program.
Some of the environment setup is not controllable (like the initial code to set up std::cout; however, some of the environment is controllable like static global blocks (for initializing static global variables). Note that since you don't have full control prior to main, you don't have full control on the order in which the static blocks get initialized.
After main, your code is conceptually "fully in control" of the program, in the sense that you can both specify the instructions to be performed and the order in which to perform them. Multi-threading can rearrange code execution order; but, you're still in control with C++ because you specified to have sections of code execute (possibly) out-of-order.

Your program will not link and thus not run unless there is a main. However main() does not cause the start of the execution of the program because objects at file level have constructors that run beforehand and it would be possible to write an entire program that runs its lifetime before main() is reached and let main itself have an empty body.
In reality to enforce this you would have to have one object that is constructed prior to main and its constructor to invoke all the flow of the program.
Look at this:
class Foo
{
public:
Foo();
// other stuff
};
Foo foo;
int main()
{
}
The flow of your program would effectively stem from Foo::Foo()

You tagged the question as "C" too, then, speaking strictly about C, your initialization should fail as per section 6.7.8 "Initialization" of the ISO C99 standard.
The most relevant in this case seems to be constraint #4 which says:
All the expressions in an initializer for an object that
has static storage duration shall be constant expressions or string literals.
So, the answer to your question is that the code is not compliant to the C standard.
You would probably want to remove the "C" tag if you were only interested to the C++ standard.

Section 3.6 as a whole is very clear about the interaction of main and dynamic initializations. The "designated start of the program" is not used anywhere else and is just descriptive of the general intent of main(). It doesn't make any sense to interpret that one phrase in a normative way that contradicts the more detailed and clear requirements in the Standard.

The compiler often has to add code before main() to be standard compliant. Because the standard specifies that initalization of globals/statics must be done before the program is executed. And as mentioned, the same goes for constructors of objects placed at file scope (globals).
Thus the original question is relevant to C as well, because in a C program you would still have the globals/static initialization to do before the program can be started.
The standards assume that these variables are initialized through "magic", because they don't say how they should be set before program initialization. I think they considered that as something outside the scope of a programming language standard.
Edit: See for example ISO 9899:1999 5.1.2:
All objects with static storage
duration shall be initialized (set to
their initial values) before program
startup. The manner and timing of such
initialization are otherwise
unspecified.
The theory behind how this "magic" was to be done goes way back to C's birth, when it was a programming language intended to be used only for the UNIX OS, on RAM-based computers. In theory, the program would be able to load all pre-initialized data from the executable file into RAM, at the same time as the program itself was uploaded to RAM.
Since then, computers and OS have evolved, and C is used in a far wider area than originally anticipated. A modern PC OS has virtual addresses etc, and all embedded systems execute code from ROM, not RAM. So there are many situations where the RAM can't be set "automagically".
Also, the standard is too abstract to know anything about stacks and process memory etc. These things must be done too, before the program is started.
Therefore, pretty much every C/C++ program has some init/"copy-down" code that is executed before main is called, in order to conform with the initialization rules of the standards.
As an example, embedded systems typically have an option called "non-ISO compliant startup" where the whole initialization phase is skipped for performance reasons, and then the code actually starts directly from main. But such systems don't conform to the standards, as you can't rely on the init values of global/static variables.

Your "program" simply returns a value from a global variable. Everything else is initialization code. Thus, the standard holds - you just have a very trivial program and more complex initialization.

main() is a user function called by the C runtime library.
see also: Avoiding the main (entry point) in a C program

Seems like an English semantics quibble. The OP refers to his block of code first as "code" and later as the "program." The user writes the code, and then the compiler writes the program.

Ubuntu 20.04 glibc 2.31 RTFS + GDB
glibc does some setup before main so that some of its functionalities will work. Let's try to track down the source code for that.
hello.c
#include <stdio.h>
int main() {
puts("hello");
return 0;
}
Compile and debug:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out
Now in GDB:
b main
r
bt -past-main
gives:
#0 main () at hello.c:3
#1 0x00007ffff7dc60b3 in __libc_start_main (main=0x555555555149 <main()>, argc=1, argv=0x7fffffffbfb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffbfa8) at ../csu/libc-start.c:308
#2 0x000055555555508e in _start ()
This already contains the line of the caller of main: https://github.com/cirosantilli/glibc/blob/glibc-2.31/csu/libc-start.c#L308.
The function has a billion ifdefs as can be expected from the level of legacy/generality of glibc, but some key parts which seem to take effect for us should simplify to:
# define LIBC_START_MAIN __libc_start_main
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char **),
int argc, char **argv,
{
/* Initialize some stuff. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
exit (result);
}
Before __libc_start_main are are already at _start, which by adding gcc -Wl,--verbose we know is the entry point because the linker script contains:
ENTRY(_start)
and is therefore is the actual very first instruction executed after the dynamic loader finishes.
To confirm that in GDB, we an get rid of the dynamic loader by compiling with -static:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o hello.out hello.c
gdb hello.out
and then make GDB stop at the very first instruction executed with starti and print the first instructions:
starti
display/12i $pc
which gives:
=> 0x401c10 <_start>: endbr64
0x401c14 <_start+4>: xor %ebp,%ebp
0x401c16 <_start+6>: mov %rdx,%r9
0x401c19 <_start+9>: pop %rsi
0x401c1a <_start+10>: mov %rsp,%rdx
0x401c1d <_start+13>: and $0xfffffffffffffff0,%rsp
0x401c21 <_start+17>: push %rax
0x401c22 <_start+18>: push %rsp
0x401c23 <_start+19>: mov $0x402dd0,%r8
0x401c2a <_start+26>: mov $0x402d30,%rcx
0x401c31 <_start+33>: mov $0x401d35,%rdi
0x401c38 <_start+40>: addr32 callq 0x4020d0 <__libc_start_main>
By grepping the source for _start and focusing on x86_64 hits we see that this seems to correspond to sysdeps/x86_64/start.S:58:
ENTRY (_start)
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp
/* Extract the arguments as encoded on the stack and set up
the arguments for __libc_start_main (int (*main) (int, char **, char **),
int argc, char *argv,
void (*init) (void), void (*fini) (void),
void (*rtld_fini) (void), void *stack_end).
The arguments are passed via registers and on the stack:
main: %rdi
argc: %rsi
argv: %rdx
init: %rcx
fini: %r8
rtld_fini: %r9
stack_end: stack. */
mov %RDX_LP, %R9_LP /* Address of the shared library termination
function. */
#ifdef __ILP32__
mov (%rsp), %esi /* Simulate popping 4-byte argument count. */
add $4, %esp
#else
popq %rsi /* Pop the argument count. */
#endif
/* argv starts just at the current stack top. */
mov %RSP_LP, %RDX_LP
/* Align the stack to a 16 byte boundary to follow the ABI. */
and $~15, %RSP_LP
/* Push garbage because we push 8 more bytes. */
pushq %rax
/* Provide the highest stack address to the user code (for stacks
which grow downwards). */
pushq %rsp
#ifdef PIC
/* Pass address of our own entry points to .fini and .init. */
mov __libc_csu_fini#GOTPCREL(%rip), %R8_LP
mov __libc_csu_init#GOTPCREL(%rip), %RCX_LP
mov main#GOTPCREL(%rip), %RDI_LP
#else
/* Pass address of our own entry points to .fini and .init. */
mov $__libc_csu_fini, %R8_LP
mov $__libc_csu_init, %RCX_LP
mov $main, %RDI_LP
#endif
/* Call the user's main function, and exit with its value.
But let the libc call main. Since __libc_start_main in
libc.so is called very early, lazy binding isn't relevant
here. Use indirect branch via GOT to avoid extra branch
to PLT slot. In case of static executable, ld in binutils
2.26 or above can convert indirect branch into direct
branch. */
call *__libc_start_main#GOTPCREL(%rip)
which ends up calling __libc_start_main as expected.
Unfortunately -static makes the bt from main not show as much info:
#0 main () at hello.c:3
#1 0x0000000000402560 in __libc_start_main ()
#2 0x0000000000401c3e in _start ()
If we remove -static and start from starti, we get instead:
=> 0x7ffff7fd0100 <_start>: mov %rsp,%rdi
0x7ffff7fd0103 <_start+3>: callq 0x7ffff7fd0df0 <_dl_start>
0x7ffff7fd0108 <_dl_start_user>: mov %rax,%r12
0x7ffff7fd010b <_dl_start_user+3>: mov 0x2c4e7(%rip),%eax # 0x7ffff7ffc5f8 <_dl_skip_args>
0x7ffff7fd0111 <_dl_start_user+9>: pop %rdx
By grepping the source for _dl_start_user this seems to come from sysdeps/x86_64/dl-machine.h:L147
/* Initial entry point code for the dynamic linker.
The C function `_dl_start' is the real entry point;
its return value is the user program's entry point. */
#define RTLD_START asm ("\n\
.text\n\
.align 16\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
movq %rsp, %rdi\n\
call _dl_start\n\
_dl_start_user:\n\
# Save the user entry point address in %r12.\n\
movq %rax, %r12\n\
# See if we were run as a command with the executable file\n\
# name as an extra leading argument.\n\
movl _dl_skip_args(%rip), %eax\n\
# Pop the original argument count.\n\
popq %rdx\n\
and this is presumably the dynamic loader entry point.
If we break at _start and continue, this seems to end up in the same location as when we used -static, which then calls __libc_start_main.
When I try a C++ program instead:
hello.cpp
#include <iostream>
int main() {
std::cout << "hello" << std::endl;
}
with:
g++ -ggdb3 -O0 -std=c++11 -Wall -Wextra -pedantic -o hello.out hello.cpp
the results are basically the same, e.g. the backtrace at main is the exact same.
I think the C++ compiler is just calling into hooks to achieve any C++ specific functionality, and things are pretty well factored across C/C++.
TODO:
commented on concrete easy-to-understand examples of what glibc is doing before main. This gives some ideas: What happens before main in C++?
make GDB show the source itself without us having to look at it separately, possibly with us building glibc ourselves: How to compile my own glibc C standard library from source and use it?
understand how the above source code maps to objects such as crti.o that can be seen with gcc --verbose main.c and which end up getting added to the final link

main is called after initializing all the global variables.
What the standard does not specify is the order of initialization of all the global variables of all the modules and statically linked libraries.

Yes, main is the "entry point" of every C++ program, excepting implementation-specific extensions. Even so, some things happen before main, notably global initialization such as for main_ret.

Related

Debug an AVR assembler project at source level

I want to debug a code of a plain assembler project for the ATmega2560. I want to use the Microchip debugger for that purpose. The goal is to get source level debugging with all variables and functions, breakpoints etc.
I managed to create a C "stub" file with a main() function that calls the assembler code.
extern int foo(int a);
int main(void)
{
int a = 0;
while (1)
{
a = foo(a);
}
}
But the assembler code also includes the interrupt vector table including the reset vector.
.extern main
.section .vectors
.global RESET_
RESET_: jmp WARM_0
.section code
.global foo
foo:
ret
WARM_0:
call main
ret
.end
Now I want to run the code from the label RESET_. The linker has placed the code in the section .vectors. That's okay so far, but the vector table from the GCC startup files are in that section before the vector table in my code. The GCC startup code must be removed to get my vector to the address 0. Therefore I activate the linker option "Do not use standard start files (-nostartfiles)". That gives the desired result: The reset vector has a jump to RESET_.
But this has the important side effect, that the debugger is not able to debug at source code level anymore. The C file with the main() function is still linked. But the source level debug support is lost.
How can I debug a plan assembler project with Microchip debugger/simulator for AVR8?
Remark: The code in the assembler file is not sufficient for a valid program. It's reduced to get a minimal example that should work in the Microchip environment.

A deeper look into variable initialization [duplicate]

In C, let's say you have a variable called variable_name. Let's say it's located at 0xaaaaaaaa, and at that memory address, you have the integer 123. So in other words, variable_name contains 123.
I'm looking for clarification around the phrasing "variable_name is located at 0xaaaaaaaa". How does the compiler recognize that the string "variable_name" is associated with that particular memory address? Is the string "variable_name" stored somewhere in memory? Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it, and if so, wouldn't it have to use memory in order to make that substitution?
Variable names don't exist anymore after the compiler runs (barring special cases like exported globals in shared libraries or debug symbols). The entire act of compilation is intended to take those symbolic names and algorithms represented by your source code and turn them into native machine instructions. So yes, if you have a global variable_name, and compiler and linker decide to put it at 0xaaaaaaaa, then wherever it is used in the code, it will just be accessed via that address.
So to answer your literal questions:
How does the compiler recognize that the string "variable_name" is associated with that particular memory address?
The toolchain (compiler & linker) work together to assign a memory location for the variable. It's the compiler's job to keep track of all the references, and linker puts in the right addresses later.
Is the string "variable_name" stored somewhere in memory?
Only while the compiler is running.
Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it, and if so, wouldn't it have to use memory in order to make that substitution?
Yes, that's pretty much what happens, except it's a two-stage job with the linker. And yes, it uses memory, but it's the compiler's memory, not anything at runtime for your program.
An example might help you understand. Let's try out this program:
int x = 12;
int main(void)
{
return x;
}
Pretty straightforward, right? OK. Let's take this program, and compile it and look at the disassembly:
$ cc -Wall -Werror -Wextra -O3 example.c -o example
$ otool -tV example
example:
(__TEXT,__text) section
_main:
0000000100000f60 pushq %rbp
0000000100000f61 movq %rsp,%rbp
0000000100000f64 movl 0x00000096(%rip),%eax
0000000100000f6a popq %rbp
0000000100000f6b ret
See that movl line? It's grabbing the global variable (in an instruction-pointer relative way, in this case). No more mention of x.
Now let's make it a bit more complicated and add a local variable:
int x = 12;
int main(void)
{
volatile int y = 4;
return x + y;
}
The disassembly for this program is:
(__TEXT,__text) section
_main:
0000000100000f60 pushq %rbp
0000000100000f61 movq %rsp,%rbp
0000000100000f64 movl $0x00000004,0xfc(%rbp)
0000000100000f6b movl 0x0000008f(%rip),%eax
0000000100000f71 addl 0xfc(%rbp),%eax
0000000100000f74 popq %rbp
0000000100000f75 ret
Now there are two movl instructions and an addl instruction. You can see that the first movl is initializing y, which it's decided will be on the stack (base pointer - 4). Then the next movl gets the global x into a register eax, and the addl adds y to that value. But as you can see, the literal x and y strings don't exist anymore. They were conveniences for you, the programmer, but the computer certainly doesn't care about them at execution time.
A C compiler first creates a symbol table, which stores the relationship between the variable name and where it's located in memory. When compiling, it uses this table to replace all instances of the variable with a specific memory location, as others have stated. You can find a lot more on it on the Wikipedia page.
All variables are substituted by the compiler. First they are substituted with references and later the linker places addresses instead of references.
In other words. The variable names are not available anymore as soon as the compiler has run through
This is what's called an implementation detail. While what you describe is the case in all compilers I've ever used, it's not required to be the case. A C compiler could put every variable in a hashtable and look them up at runtime (or something like that) and in fact early JavaScript interpreters did exactly that (now, they do Just-In-TIme compilation that results in something much more raw.)
Specifically for common compilers like VC++, GCC, and LLVM: the compiler will generally assign a variable to a location in memory. Variables of global or static scope get a fixed address that doesn't change while the program is running, while variables within a function get a stack address-that is, an address relative to the current stack pointer, which changes every time a function is called. (This is an oversimplification.) Stack addresses become invalid as soon as the function returns, but have the benefit of having effectively zero overhead to use.
Once a variable has an address assigned to it, there is no further need for the name of the variable, so it is discarded. Depending on the kind of name, the name may be discarded at preprocess time (for macro names), compile time (for static and local variables/functions), and link time (for global variables/functions.) If a symbol is exported (made visible to other programs so they can access it), the name will usually remain somewhere in a "symbol table" which does take up a trivial amount of memory and disk space.
Does the compiler just substitute variable_name for 0xaaaaaaaa whenever it sees it
Yes.
and if so, wouldn't it have to use memory in order to make that substitution?
Yes. But it's the compiler, after it compiled your code, why do you care about memory?

C++ / GCC - file-scope objects' constructors aren't being called

I'm beginning to develop for a Renesas RZ/A1L microcontroller. Renesas provide an IDE (e2 Studio - a modified version of Eclipse), set up to compile C / C++ with GCC. Everything works fine, but...
If I declare an object in file-scope (outside of any function), its constructor is never called. For instance:
class NewClass {
public:
int i;
NewClass() {
i = 4;
}
};
NewClass newInstance;
int main(void)
{
// My program...
}
I can tell that the constructor isn't being called because, using the in-circuit debugging setup supplied by Renesas, I can see that i is never set to 4 (even when I place further references to newInstance and i; I have optimisation switched off too). Sorry I can't do a simple cout of i's value - the code is being run in a microcontroller and I haven't worked out how to do that just yet.
If I instead place the NewClass newInstance; line inside of main(), then the problem goes away.
A further consequence of the problem is that, for inheriting classes, calling a virtual function on one (via a pointer of base-class type) causes a crash - I suspect due to the constructor having not executed and hence not written to memory some indicator of what class the object was.
By what mechanism would such a constructor normally be called? I did some Googling - would it be the ".ctors" list? (https://gcc.gnu.org/onlinedocs/gccint/Initialization.html)
Renesas's "template" C++ project does actually include code to call all of the ctors; however, from looking at my generated .map file for the project, I can see that no ctors are actually present. Does that narrow down the problem - is the GCC compiler not spitting them out when it should be?
Many thanks for your help.
Quoting the draft C++11 standard, N3337, we find that:
[basic.start.main]/1 A program shall contain a global function called
main, which is the designated start of the program. It is
implementation-defined whether a program in a freestanding environment
is required to define a main function. [ Note: In a freestanding
environment, start-up and termination is implementation-defined;
start-up contains the execution of constructors for objects of
namespace scope with static storage duration; termination contains the
execution of destructors for objects with static storage duration. —
end note ]
As you can see, it's implementation-defined in a free-standing environment. Therefore assuming you have a 32-bit x86 GCC toolchain...
It sounds like you're in a freestanding environment. If so, if you want to use global constructors, there is some boilerplate you need to implement. The initialization page you linked mentions a linker line, something that would look like this:
i686-elf-ld crt0.o crti.o crtbegin.o foo.o bar.o crtend.o crtn.o
Assuming foo.o and bar.o are part of your program, it's required that your linker line looks like this. Note that the compiler should provide its own crtbegin.o and crtend.o, so you can find the location of those using -print-file-name. In a Makefile, it'd look something like this:
CRTBEGIN_OBJ:=$(shell $(CC) $(CFLAGS) -print-file-name=crtbegin.o)
CRTEND_OBJ:=$(shell $(CC) $(CFLAGS) -print-file-name=crtend.o)
Now for the actual initialization function. In the same file as your kernel entry point, call _init before kernel_main (or whatever it's called.) Optionally _fini can be called after kernel_main but it's unlikely to be necessary. The exact code will depend on the architecture, but here is an example for 32-bit x86:
/* x86 crti.s */
.section .init
.global _init
.type _init, #function
_init:
push %ebp
movl %esp, %ebp
/* gcc will nicely put the contents of crtbegin.o's .init section here. */
.section .fini
.global _fini
.type _fini, #function
_fini:
push %ebp
movl %esp, %ebp
/* gcc will nicely put the contents of crtbegin.o's .fini section here. */
/* x86 crtn.s */
.section .init
/* gcc will nicely put the contents of crtend.o's .init section here. */
popl %ebp
ret
.section .fini
/* gcc will nicely put the contents of crtend.o's .fini section here. */
popl %ebp
ret

Why would inline assembly crash in release only? [duplicate]

I have a scenario in GCC causing me problems. The behaviour I get is not the behaviour I expect. To summarise the situation, I am proposing several new instructions for x86-64 which are implemented in a hardware simulator. In order to test these instructions I am taking existing C source code and handcoding the new instructions using hexidecimal. Because these instructions interact with the existing x86-64 registers, I use the input/output/clobber lists to declare dependencies for GCC.
What's happening is that if I call a function e.g. printf, the dependent registers aren't saved and restored.
For example
register unsigned long r9 asm ("r9") = 101;
printf("foo %s\n", "bar");
asm volatile (".byte 0x00, 0x00, 0x00, 0x00" : /* no output */ : "q" (r9) );
101 was assigned to r9 and the inline assembly (fake in this example) is dependent on r9. This runs correctly in the absence of the printf, but when it is there GCC does not save and restore r9 and another value is there by the time my custom instruction is called.
I thought perhaps that GCC might have secretly changed the assignment to the variable r9, but when I do this
asm volatile (".byte %0" : /* no output */ : "q" (r9) );
and look at the assembly output, it is indeed using %r9.
I am using gcc 4.4.5. What do you think might be happening? I thought GCC will always save and restore registers on function calls. Is there some way I can enforce it?
Thanks!
EDIT: By the way, I'm compiling the program like this
gcc -static -m64 -mmmx -msse -msse2 -O0 test.c -o test
The ABI, section 3.2.1 says:
Registers %rbp, %rbx and %r12 through %r15 “belong” to the calling function and the
called function is
required to preserve their values. In other words, a called function must preserve
these registers’ values for its caller. Remaining registers “belong” to the called
function. If a calling function wants to preserve such a register value across a
function call, it must save the value in its local stack frame.
so you shouldn't expect registers other than %rbp, %rbx and %r12 through %r15 to be preserved by a function call.
gcc will not make explicit-register variables like this callee-saved. Basically this register notation you're using makes the variable a direct alias for the register, with the assumption you want to be able to read back the value a callee leaves in the register. If you used a callee-saved register instead of a call-clobbered (caller-saved) register, the problem would go away.

Is it possible to write a program without using main() function?

I keep getting this question asked in interviews:
Write a program without using main() function?
One of my friends showed me some code using Macros, but i could not understand it.
So the question is:
Is it really possible to write and compile a program without main()?
No you cannot unless you are writing a program in a freestanding environment (embedded environment OS kernel etc.) where the starting point need not be main(). As per the C++ standard main() is the starting point of any program in a hosted environment.
As per the:
C++03 standard 3.6.1 Main function
1 A program shall contain a global function called main, which is the designated start of the program. It is implementation-defined whether a program in a freestanding environment is required to define a main function. [ Note: In a freestanding environment, start-up and
termination is implementation-defined; startup contains the execution of constructors for objects of namespace scope with static storage duration; termination contains the execution of destructors for objects with static storage duration.
What is freestanding Environment & What is Hosted Environment?
There are two kinds of conforming implementations defined in the C++ standard; hosted and freestanding.
A freestanding implementation is one that is designed for programs that are executed without the benefit of an operating system.
For Ex: An OS kernel or Embedded environment would be a freestanding environment.
A program using the facilities of an operating system would normally be in a hosted implementation.
From the C++03 Standard Section 1.4/7:
A freestanding implementation is one in which execution may take place without the benefit of an operating system, and has an implementation-defined set of libraries that includes certain language-support libraries.
Further,
Section: 17.4.1.3.2 Freestanding implementations quotes:
A freestanding implementation has an implementation-defined set of headers. This set shall include at least the following headers, as shown in Table:
18.1 Types <cstddef>
18.2 Implementation properties <limits>
18.3 Start and termination <cstdlib>
18.4 Dynamic memory management <new>
18.5 Type identification <typeinfo>
18.6 Exception handling <exception>
18.7 Other runtime support <cstdarg>
Within standard C++ a main function is required, so the question does not make sense for standard C++.
Outside of standard C++ you can for example write a Windows specific program and use one of Microsoft's custom startup functions (wMain, winMain, wWinmain). In Windows you can also write the program as a DLL and use rundll32 to run it.
Apart from that you can make your own little runtime library. At one time that was a common sport.
Finally, you can get clever and retort that according to the standard's ODR rule main isn't "used", so any program qualifies. Bah! Although unless the interviewers have unusual good sense of humor (and they wouldn't have asked the question if they had) they'll not think that that's a good answer.
Sample program without a visible main function.
/*
7050925.c
$ gcc -o 7050925 7050925.c
*/
#include <stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)
int begin()
{
printf("How mainless!\n");
}
From: http://learnhacking.in/c-program-without-main-function/
main means an entry point, a point from which your code will start executing. although main is not the first function to run. There are some more code which runs before main and prepares the environment to make your code run. This code then calls main . You can change the name of the main function by recompiling the code of the startup file crt0.c and changing the name of the main function. Or you can do the following:
#include <stdio.h>
extern void _exit (register int code);
_start()
{
int retval;
retval = my_main ();
_exit(retval);
}
int my_main(void)
{
printf("Hello\n");
return 0;
}
Compile the code with:
gcc -o no_main no_main.c -nostartfiles
The -nostartfiles will not include the default startup file. You point to the main entry file with the _start .
main is nothing but a predefined entrypoint for the user code. Therefore you can name it whatever, but at the end of the day you do need an entry point. In C/C++ and other languages the name is selected as main if you make another language or hack the sources of these language compilers then you can change the name of main to pain but it will bring pain, as it will violate the standards.
But manipulating the entry function name is useful for kernel code, the first function to run in the kernel, or code written for embedded systems.
They may refer to a program written for a freestanding implementation. The C++ Standard defines two sorts of implementations. One is a hosted implementation. Programs written for those implementations are required to have a main function. But otherwise, no main function is required if the freestanding implementation doesn't require one. This is useful for operation system kernels or embedded system programs that don't run under an operation system.
Yes
$ cat > hwa.S
write = 0x04
exit = 0xfc
.text
_start:
movl $1, %ebx
lea str, %ecx
movl $len, %edx
movl $write, %eax
int $0x80
xorl %ebx, %ebx
movl $exit, %eax
int $0x80
.data
str: .ascii "Hello, world!\n"
len = . -str
.globl _start
$ as -o hwa.o hwa.S
$ ld hwa.o
$ ./a.out
Hello, world!
The kernel that really runs an executable knows nothing about internal symbols, it just transfers to an entry point specified in binary in the executable image header.
The reason you need a main is because normally your "main program" is really just another module. The entry point is in library-provided startup code written in some combination of C and assembly and that library code just happens to call main so you normally need to provide one. But run the linker directly and you don't.
To include a C module1...
Mac:~/so$ cat > nomain.S
.text
.globl start
start:
call _notmain
Mac:~/so$ as -o nomain.o nomain.S
Mac:~/so$ cat > notmain.c
#include <unistd.h>
void notmain(void) {
write(1, "hi\n", 3);
_exit(0);
}
Mac:~/so$ cc -c notmain.c
Mac:~/so$ ld -w nomain.o notmain.o -lc
Mac:~/so$ ./a.out
hi
1. And I'm also switching to x86-64 here.
Yes it possible to compile with out main but you cannot pass the linking phase though.
g++ -c noMain.cpp -o noMain.o
"Without using main" might also mean that no logic is allowed within main, but the main itself exists. I can imagine the question had this cleared out, but since it's not cleared here, this is another possible answer:
struct MainSub
{
MainSub()
{
// do some stuff
}
};
MainSub mainSub;
int main(int argc, char *argv[]) { return 0; }
What will happen here is that the stuff in MainSub's constructor will execute before the unusable main is executed, and you can place the program's logic there. This of course requires C++, and not C (also not clear from the question).
As long as you are using g++ you could change your entry point with linker option -e, so the following code and compile command may let you create a program without a main() function:
#import <iostream>
class NoMain
{
public:
NoMain()
{
std::cout << "Hello World!" << std::endl;
exit(0);
}
} mainClass;
I gave file name as noname.cpp, and the compile option is:
g++ nomain.cpp -Wl,-e,_mainClass -v
To tell the truth, I didn't fully understand why the code can works fine. I suspect that the address of global variable mainClass is the same to constructor of NoMain class. However, I also have several reasons that I could tell my guess may not correct.
I think the macro reference was to renaming the main function, the following is not my code, and demonstrates this. The compiler still sees a main function though, but technically there's no main from a source point of view. I got it here http://www.exforsys.com/forum/c-and-c/96849-without-main-function-how-post412181.html#post412181
#include<stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)
int begin()
{
printf(" hello ");
}
Disregarding specific language standards, most linking loader provides some means to declare a function name (entry point) which must be executed when the binary is loaded is loaded into memory.
For old school c language, default was something like 'start' or '_start', defined in so-called crt (c runtime?), which does several householding jobs needed for c standard functions, such as preparing memory heap, initialising static variable areas, parsing command line into argc/argv, etc.
Possibly you could override the entry point function if you take enough care not to use the standard functions which requires those household things (e.g. malloc(), free(), printf(), any class definitions have custom constructor, ...)
Quite restrictive but not impossible if you use functions provided by o/s, not by standard c runtime.
For example, you can make a simple helloworld using write() function on descriptor 1.
When C or C++ code runs, it executes at a known start address, the code here initialises the run-time environment, initialises the stack pointer, performs data initialisation, calls static constructors, then jumps to main().
The code that does this is linked with your code at build time by the linker. In GCC it is usually in crt0.s, with a commercial compiler it is unlikely that this code will be available to you.
In the end, it has to start somewhere and main() is just a symbolic name for that location. It is specified by the language standard so that developers know what to call it, otherwise code would not be portable from one tool chain to another.
If you are writing code for a 'bare-metal' system with no OS or at least no OS in the sense of a process loader (embedded systems often include an RTOS kernel that is started after main()) , then you can in theory call the C code entry point whatever you wish since you usually have complete control over run-time start-up code. But do do so would be foolish and somewhat perverse.
Some RTOS environments such as VxWorks, and most application frameworks in general include main() )or its equivalent) within their own library code so that it runs before the user application code. For example VxWorks applications start from usrAppInit(), and Win32 applications start from WinMain().
Write a class and print your name in the constructor of that class and declare a GLOBAL OBJECT of that class. So the class' constructor gets executed before main. So you can leave the main empty and still print your name.
class MyClass
{
myClass()
{
cout << "printing my name..." <<endl;
}
};
MyClass gObj; // this will trigger the constructor.
int main()
{
// nothing here...
}
I realize this is an old question, but I just found this out and had to share. It will probably not work with all linkers, but it's at least possible to trick ld (I'm running version 2.24.51.20140918) into thinking there is a main-function by doing this:
int main[] {};
or even just
int main;
You can then apply one of the aforementioned tricks to let the program execute some code, e.g. through the use of a constructor:
struct Main
{
Main()
{
cout << "Hello World!\n";
exit(0);
}
} main_;
The exit(0) is to prevent the array from being "called". Good fun :-)
Yes you can do it by changing the entry point of the C language from main() to _start
Here is the code :
#include<stdio.h>
#include<stdlib.h>
int sid()
{
printf("Hallo World\n");
exit(0);
}
Then run the code using gcc complier.Let's assume that you have saved the file with the name of test.c then.
gcc -nostartfiles test.c
./a.out
Maybe it's possible to compile a .data section and fill it with code?
It depends what they mean.
Did they mean:
Write a program with no main() function.
Then generally speaking no.
But there are ways to cheat.
You can use the pre-processor to hide main() in plain sight.
Most compiler allow you to specify the entry point into your code.
By default it is main(int,char*[])
Or did they mean:
Write a program that runs code without using main (to run your code).
This is a relatively simple trick. All objects in the global namespace run their constructors before main() is entered and destruction after main() exits. Thus all you need to do is define a class with a constructor that runs the code you want, then create an object in the global namespace.
Note: The compiler is allowed to optimize these objects for delayed load (but usually does not) but to be safe just put the global in the same file as the main function (that can be empty).
Function main is only default label for address where program will start execution. So technically yes it`s possible, but you have to set name of function that will start execution in your environment.
1) Using a macro that defines main
#include<stdio.h>
#define fun main
int fun(void)
{
printf("stackoverfow");
return 0;
}
Output:
stackoverflow
2) Using Token-Pasting Operator
The above solution has word ‘main’ in it. If we are not allowed to even write main, we ca use token-pasting operator (see this for details)
#include<stdio.h>
#define fun m##a##i##n
int fun()
{
printf("stackoverflow");
return 0;
}
yes it is possible to write a program without main().
But it uses main() indirectly.
following program will help you to understand..
#include<stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,r,e)
int begin()
{
printf(” you are inside main() which is hidden“);
}
The ‘##‘ operator is called the token pasting or token merging operator. That is we can merge two or more characters with it.
In the 2nd line of the program-
define decode(s,t,u,m,p,e,d) m##s##u##t
What is the preprocessor doing here. The macro decode(s,t,u,m,p,e,d) is being expanded as “msut” (The ## operator merges m,s,u & t into msut). The logic is when you pass (s,t,u,m,p,e,d) as argument it merges the 4th,1st,3rd & the 2nd characters(tokens)
Now look at the third line of the program –
define begin decode(a,n,i,m,a,r,e)
Here the preprocessor replaces the macro “begin” with the expansion decode(a,n,i,m,a,r,e). According to the macro definition in the previous line the argument must be expanded so that the 4th,1st,3rd & the 2nd characters must be merged. In the argument(a,n,i,m,a,r,e) 4th,1st,3rd & the 2nd characters are ‘m’,’a’,’i’ & ‘n’.
so it will replace begin by main() by the preprocessor before the program is passed on for the compiler. That’s it…
By using C++ constructors you write a C++ program without the main function with nothing it it. Let's say for example we can print a hello world without writing anything in the main function as follows:
class printMe{
private:
//
public:
printMe(){
cout<<"Hello Wold! "<<endl;
}
protected:
//
}obj;
int main(){}
According to standards, main() is required and the starting point of hosted environments. That's why you've to use tricks to hide the obvious looking main, like the trick posted above.
#include <stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)
int begin()
{
printf(" hello ");
}
Here, main is written by the macro tricks. It might not be clear at once, but it eventually leads to main. If this is the valid answer for your question, this could be done much easily, like this.
# include <stdio.h>
# define m main
int m()
{
printf("Hell0");
}