Static Values in Assembler Code - c++

I have the following simple code:
#include <cmath>
struct init_sin
{
typedef double type;
static constexpr type value(int index) {
return 3*std::pow(std::sin(index * 2.0 * 3.1415 / 20.0),1.999);
}
};
int main(){
static double VALUE = init_sin::value(10);
double VALUE_NONSTAT = 3*std::pow(std::sin(10 * 2.0 * 3.1415 / 20.0),1.999);
return int(VALUE_NONSTAT);
}
I would like to find out what the assembler code generated for this piece means.
Here the link to the assembly: http://pastebin.com/211AfSYh
I thought that VALUE is computed at compile time and appears directly as a value in the assembler code, which should be in these lines if I am not mistaken:
33 .size main, .-main
34 .data
35 .align 8
36 .type _ZZ4mainE5VALUE, #object
37 .size _ZZ4mainE5VALUE, 8
38 _ZZ4mainE5VALUE:
39 0000 15143B78 .long 2017137685
40 0004 45E95B3E .long 1046210885
Why are there two values with .long? And why is the type long? (It's a double; maybe in assembler there is only long?)
Does that mean that the value of VALUE was generated at compile time?
Where is the result for VALUE_NONSTAT? This should be computed during run-time, right? I cannot quite see where.
Thanks a lot!

.long in this assembler syntax emits a 32-bit number. Because a double is 64 bits, what you're seeing there are the two 32-bit halves of VALUE in its double representation. You'll also notice above it that it's being aligned to an 8-byte boundary (through the .align directive) and that its size is 8 (through the .size directive). Also, it's in the .data segment, which is typically used for global-scope variables that are not initialised to zero (as a side note, .bss is typically used for zero-initialised global-scope variables).
The VALUE_NONSTAT can be seen being loaded into %rax here, which is the 64-bit version of the AX register:
V
20 0004 48B81514 movabsq $4493441537811354645, %rax
20 3B7845E9
20 5B3E
Recalling that 15143B7845E95B3E is the representation of the value of 3*std::pow(std::sin(index * 2.0 * 3.1415 / 20.0),1.999) when stored in a double, you can see the internal value in hex starting around where I inserted a V.
Later statements then push it onto the stack (movq %rax, -8(%rbp)), then load it into an FP register (movsd -8(%rbp), %xmm0) before converting it to an integer and storing it in %eax, which is the register for return values (cvttsd2si %xmm0, %eax) and then returning from the routine, using ret.
In any case, at the optimisation level you're using (and probably below), your compiler has figured out that VALUE_NONSTAT is a constant expression, and just inlined it at compile time instead, since the value is fully known at compile time.

Related

Which is the faster operation? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have two variables a and b. I have to write an if condition on variables a and b:
This is First Approach:
if(a > 0 || b >0){
//do some things
}
This is second Approach:
if((a+b) > 0){
//do some thing
}
Update: consider that a and b are unsigned. Then which will take less execution time: the logical OR (||) or the arithmetic (+) operator?
This condition will iterate around one million times.
Any help on this will be appreciated.
Your second condition is wrong. If a=1, b=-1000, it will evaluate to false, whereas your first condition will evaluate to true. In general, you shouldn't worry about the speed of these kinds of tests; the compiler optimizes the condition a lot, so a logical OR is super fast. People generally make bigger mistakes than leaving such conditions unoptimized, so don't try to optimize unless you really know what is going on. The compiler in general does a much better job than any of us.
In principle, in the first expression you have 2 CMPs and one OR, whereas in the second you have only one CMP and one ADD, so the second should be faster (even though the compiler does some short-circuiting in the first case, that cannot happen 100% of the time). However, in your case the expressions are not equivalent (well, they are for positive numbers...).
I decided to check this for C, but identical arguments apply to C++, and similar arguments apply to Java (except Java allows signed overflow). The following code was tested (for C++, replace _Bool with bool).
_Bool approach1(int a, int b) {
return a > 0 || b > 0;
}
_Bool approach2(int a, int b) {
return (a + b) > 0;
}
And this was the resulting disassembly.
.file "faster.c"
.text
.p2align 4,,15
.globl approach1
.type approach1, #function
approach1:
.LFB0:
.cfi_startproc
testl %edi, %edi
setg %al
testl %esi, %esi
setg %dl
orl %edx, %eax
ret
.cfi_endproc
.LFE0:
.size approach1, .-approach1
.p2align 4,,15
.globl approach2
.type approach2, #function
approach2:
.LFB1:
.cfi_startproc
addl %esi, %edi
testl %edi, %edi
setg %al
ret
.cfi_endproc
.LFE1:
.size approach2, .-approach2
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",#progbits
Those codes are quite different, even considering how clever compilers are these days. Why is that so? Well, the reason is quite simple: they aren't identical. If a is -42 and b is 2, the first approach will return true, and the second will return false.
Surely, you may think that a and b should be unsigned.
.file "faster.c"
.text
.p2align 4,,15
.globl approach1
.type approach1, #function
approach1:
.LFB0:
.cfi_startproc
orl %esi, %edi
setne %al
ret
.cfi_endproc
.LFE0:
.size approach1, .-approach1
.p2align 4,,15
.globl approach2
.type approach2, #function
approach2:
.LFB1:
.cfi_startproc
addl %esi, %edi
testl %edi, %edi
setne %al
ret
.cfi_endproc
.LFE1:
.size approach2, .-approach2
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",#progbits
It's quite easy to notice that approach1 is better here, because it doesn't do the pointless addition, which is in fact incorrect for this test (it can wrap around). It even optimizes the check to (a | b) != 0, which is a correct optimization.
In C, unsigned overflow is well defined: it wraps around, so the compiler has to handle the case when the values are very large (try UINT_MAX and 1 for approach2). Even assuming you know the numbers won't overflow, it's easy to notice approach1 is faster, because it simply tests whether both variables are 0.
Trust your compiler: it will optimize better than you, and without the small bugs you could accidentally write. Write code instead of asking yourself whether i++ or ++i is faster, or whether x >> 1 or x / 2 is faster (by the way, x >> 1 doesn't do the same thing as x / 2 for signed numbers, because of rounding behavior).
If you want to optimize something, optimize the algorithms you use. Instead of a worst-case O(N^4) sorting algorithm, use a worst-case O(N log N) algorithm. This will actually make the program faster, especially if you sort reasonably big arrays.
The real answer for this is always to do both and actually test which one runs faster. That's the only way to know for sure.
I would guess the second one would run faster, because an add is a quick operation but a mispredicted branch causes pipeline flushes and all sorts of nasty things. It would be data dependent, though. Also, it isn't exactly the same test: if a or b is allowed to be negative, or big enough to overflow, the two aren't equivalent.
Well, I wrote some quick code and disassembled:
public boolean method1(final int a, final int b) {
if (a > 0 || b > 0) {
return true;
}
return false;
}
public boolean method2(final int a, final int b) {
if ((a + b) > 0) {
return true;
}
return false;
}
These produce:
public boolean method1(int, int);
Code:
0: iload_1
1: ifgt 8
4: iload_2
5: ifle 10
8: iconst_1
9: ireturn
10: iconst_0
11: ireturn
public boolean method2(int, int);
Code:
0: iload_1
1: iload_2
2: iadd
3: ifle 8
6: iconst_1
7: ireturn
8: iconst_0
9: ireturn
So as you can see, they're pretty similar; the only difference is performing a > 0 test vs a + b; looks like the || got optimized away. What the JIT compiler optimizes these to, I have no clue.
If you wanted to get really picky:
Option 1: Always 1 load and 1 comparison, possible 2 loads and 2 comparisons
Option 2: Always 2 loads, 1 addition, 1 comparison
So really, which one performs better depends on what your data looks like and whether there is a pattern the branch predictor can use. If so, I could imagine the first method running faster because the processor basically "skips" the checks, and in the best case only has to perform half the operations the second option will. To be honest, though, this really seems like premature optimization, and I'm willing to bet that you're much more likely to get more improvement elsewhere in your code. I don't find basic operations to be bottlenecks most of the time.
Two things:
(a|b) > 0 is strictly better than (a+b) > 0, so replace it.
The above two only work correctly if the numbers are both unsigned.
If a and b have the potential to be negative numbers, the two choices are not equivalent, as has been pointed out by the answer by #vsoftco.
If both a and b are guaranteed to be non-negative integers, I would use
if ( (a|b) > 0 )
instead of
if ( (a+b) > 0 )
I think bitwise | is faster than integer addition.
Update
Use bitwise | instead of &.

Boolean multiplication in c++?

Consider the following:
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
The syntax of f2 is more compact, but does the standard guarantee that f1 and f2 are strictly equivalent?
Furthermore, if I want the compiler to optimize this expression if b and i are known at compile-time, which version should I prefer ?
Well, yes, both are equivalent. bool is an integral type and true is guaranteed to convert to 1 in integer context, while false is guaranteed to convert to 0.
(The reverse is also true, i.e. non-zero integer values are guaranteed to convert to true in boolean context, while zero integer values are guaranteed to convert to false in boolean context.)
Since you are working with unsigned types, one can easily come up with other, possibly bit-hack-based yet perfectly portable implementations of the same thing, like
i & -(unsigned) b
although a decent compiler should be able to choose the best implementation by itself for any of your versions.
P.S. Although to my great surprise, GCC 4.1.2 compiled all three variants virtually literally, i.e. it used machine multiplication instruction in multiplication-based variant. It was smart enough to use cmovne instruction on the ?: variant to make it branchless, which quite possibly made it the most efficient implementation.
Yes. It's safe to assume true is 1 and false is 0 when used in expressions as you do and is guaranteed:
C++11, Integral Promotions, 4.5:
An rvalue of type bool can be converted to an rvalue of type int, with
false becoming zero and true becoming one.
The compiler will use implicit conversion to make an unsigned int from b, so, yes, this should work. You're skipping the condition checking by simple multiplication. Which one is more effective/faster? Don't know. A good compiler would most likely optimize both versions I'd assume.
FWIW, the following code
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
int main()
{
volatile unsigned int i = f1(42, true);
volatile unsigned int j = f2(42, true);
}
compiled with gcc -O2 produces this assembly:
.file "test.cpp"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $42, 8(%esp) // i
movl $42, 12(%esp) // j
xorl %eax, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE2:
There's not much left of either f1 or f2, as you can see.
As far as C++ standard is concerned, the compiler is allowed to do anything with regards to optimization, as long as it doesn't change the observable behaviour (the as if rule).

Must I initialize floats using 0.f?

When I initialize float variables in my program, I commonly have vectors like:
Vector forward(0.f,0.f,-1.f),right(1.f,0.f,0.f),up(0.f,1.f,0.f)
(Vectors are just 3 floats like struct Vector{ float x,y,z; };)
This looks much easier to read as:
Vector forward(0,0,-1),right(1,0,0),up(0,1,0)
Must I initialize my float variables using floats? Am I losing anything or incurring some kind of penalty when I use integers (or doubles) to initialize a float?
There's no semantic difference between the two, though depending on the compiler it is possible for extra code to be generated. See also this and this SO question on the same topic.
I can confirm that gcc generates the same code for all variants of
int main()
{
float a = 0.0f; /* or 0 or 0.0 */
return 0;
}
and that this code is
.file "1.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0x00000000, %eax
movl %eax, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",#progbits
The relevant line is
movl $0x00000000, %eax
Changing a to 0.1 (or 0.1f) changes the line to
movl $0x3dcccccd, %eax
It seems that gcc is able to deduce the correct constant and doesn't generate extra code.
For a single literal constant, it shouldn't matter. In the context of an initializer, a constant of any numeric type will be implicitly converted to the type of the object being initialized. This is guaranteed by the language standard. So all of these:
float x = 0;
float x = 0.0;
float x = 0.0f;
float x = 0.0L; // converted from long double to float
are equally valid and result in the same value being stored in x.
A literal constant in a more complex expression can have surprising results, though.
In most cases, each expression is evaluated by itself, regardless of the context in which it appears. Any implicit conversion is applied after the subexpression has been evaluated.
So if you write:
float x = 1 / 2;
the expression 1 / 2 will be evaluated as an int, yielding 0, which is then converted to float. It will set x to 0.0f, not to 0.5f.
I think you should be safe using unsuffixed floating-point constants (which are of type double).
Incidentally, you might consider using double rather than float in your program. double, as I mentioned, is the type of an unsuffixed floating-point constant, and can be thought of in some sense as the "default" floating-point type. It usually has more range and precision than float, and there's typically not much difference in performance.
It could be good programming practice to always write 0.f, 1.f etc., even if gcc can often figure out what the programmer means by 1.0 et al.
The problematic cases are not so much trivial float variable initializations, but numeric constants in somewhat more complex formulae, where a combination of operators, float variables and said constants can easily lead to occurrence of unintended double valued calculations and costly float-double-float conversions.
Spotting these conversions without specifically checking the compiled code for them becomes very hard if the intended type for numeric values is mostly omitted in the code and instead only included when it's absolutely required. Hence I for one would choose the approach of just typing in the f's and getting used to having them around.

Function call cost

The application I am dealing with right now uses a brute-force numerical algorithm that calls many tiny functions billions of times. I was wondering how much the performance can be improved by eliminating function calls using inlining and static polymorphism.
What is the cost of each of the following, relative to a plain call of a non-inline, non-intrinsic function:
1) function call via function pointer
2) virtual function call
I know that it is hard to measure, but a very rough estimate would do it.
Thank you!
To make a plain member function call, the compiler needs to:
Fetch the address of the function -> Call the function
To call a virtual function, the compiler needs to:
Fetch the address of the vptr -> Fetch the address of the function -> Call the function
Note: the virtual mechanism is a compiler implementation detail, so it might differ between compilers; there may not even be a vptr or vtable at all. Having said that, compilers usually implement it with a vptr and vtable, and then the above holds true.
So there is certainly some overhead (one additional fetch). To know precisely how much it impacts you, you will have to profile your source code; there is no simpler way.
It depends on your target architecture and your compiler, but one thing you can do is write a small test and check the assembly generated.
I did one to do the test:
// test.h
#ifndef FOO_H
#define FOO_H
void bar();
class A {
public:
virtual ~A();
virtual void foo();
};
#endif
// main.cpp
#include "test.h"
void doFunctionPointerCall(void (*func)()) {
func();
}
void doVirtualCall(A *a) {
a->foo();
}
int main() {
doFunctionPointerCall(bar);
A a;
doVirtualCall(&a);
return 0;
}
Note that you don't even need to write test.cpp, since you just need to check the assembly for main.cpp.
To see the compiler assembly output, with gcc use the flag -S:
gcc main.cpp -S -O3
It will create a file main.s, with the assembly output.
Now we can see what gcc generated to the calls.
doFunctionPointerCall:
.globl _Z21doFunctionPointerCallPFvvE
.type _Z21doFunctionPointerCallPFvvE, #function
_Z21doFunctionPointerCallPFvvE:
.LFB0:
.cfi_startproc
jmp *%rdi
.cfi_endproc
.LFE0:
.size _Z21doFunctionPointerCallPFvvE, .-_Z21doFunctionPointerCallPFvvE
doVirtualCall:
.globl _Z13doVirtualCallP1A
.type _Z13doVirtualCallP1A, #function
_Z13doVirtualCallP1A:
.LFB1:
.cfi_startproc
movq (%rdi), %rax
movq 16(%rax), %rax
jmp *%rax
.cfi_endproc
.LFE1:
.size _Z13doVirtualCallP1A, .-_Z13doVirtualCallP1A
Note that here I'm using x86_64; the assembly will change for other architectures.
Looking to the assembly, it looks like it is using two extra movq for the virtual call, it probably is some offset in the vtable. Note that in a real code, it would need to save some registers (be it function pointer or virtual call), but the virtual call would still need two extra movq over function pointer.
Just use a profiler like AMD's CodeAnalyst (using IBS and TBS); otherwise you can go the more 'hardcore' route and give Agner Fog's optimization manuals a read (they will help both with precise instruction timings and with optimizing your code): http://www.agner.org/optimize/
Function calls are a significant overhead if the functions are small. CALL and RETURN, while optimized on modern CPUs, are still noticeable when many, many calls are made. The small functions could also be spread across memory, so CALL/RETURN may induce cache misses and excessive paging.
//code
int Add(int a, int b) { return a + b; }
int main() {
Add(1, Add(2, 3));
...
}
// NON-inline x86 ASM
Add:
MOV eax, [esp+4] // 1st argument a
ADD eax, [esp+8] // 2nd argument b
RET 8 // return and fix stack 2 args * 4 bytes each
// eax is the returned value
Main:
PUSH 3
PUSH 2
CALL [Add]
PUSH eax
PUSH 1
CALL [Add]
...
// INLINE x86 ASM
Main:
MOV eax, 3
ADD eax, 2
ADD eax, 1
...
If optimization is your goal and you're calling many small functions, it's always best to inline them. Sorry, I don't care for the ugly ASM syntax used by c/c++ compilers.

storing structs in ROM on ARM device

I have some constant data that I want to store in ROM since there is a fair amount of it and I'm working with a memory-constrained ARM7 embedded device. I'm trying to do this using structures that look something like this:
struct objdef
{
int x;
int y;
bool (*function_ptr)(int);
some_other_struct * const struct_array; // array of similar structures
const void* vp; // previously omitted to shorten code
}
which I then create and initialize as globals:
const objdef def_instance = { 2, 3, function, array, NULL };
However, this eats up quite a bit of RAM despite the const at the beginning. More specifically, it significantly increases the amount of RW data and eventually causes the device to lock up if enough instances are created.
I'm using uVision and the ARM compiler, along with the RTX real-time kernel.
Does anybody know why this doesn't work or know a better way to store structured heterogenous data in ROM?
Update
Thank you all for your answers and my apologies for not getting back to you guys earlier. So here is the score so far and some additional observations on my part.
Sadly, __attribute__ has zero effect on RAM vs ROM and the same goes for static const. I haven't had time to try the assembly route yet.
My coworkers and I have discovered some more unusual behavior, though.
First, I must note that for the sake of simplicity I did not mention that my objdef structure contains a const void* field. The field is sometimes assigned a value from a string table defined as
char const * const string_table [ROWS][COLS] =
{
{ "row1_1", "row1_2", "row1_3" },
{ "row2_1", "row2_2", "row2_3" },
...
}
const objdef def_instance = { 2, 3, function, array, NULL };//->ROM
const objdef def_instance = { 2, 3, function, array, string_table[0][0] };//->RAM
string_table is in ROM as expected. And here's the kicker: instances of objdef get put in ROM until one of the values in string_table is assigned to that const void* field. After that the struct instance is moved to RAM.
But when string_table is changed to
char const string_table [ROWS][COLS][MAX_CHARS] =
{
{ "row1_1", "row1_2", "row1_3" },
{ "row2_1", "row2_2", "row2_3" },
...
}
const objdef def_instance = { 2, 3,function, array, NULL };//->ROM
const objdef def_instance = { 2, 3, function, array, string_table[0][0] };//->ROM
those instances of objdef are placed in ROM despite that const void* assignment. I have no idea why this should matter.
I'm beginning to suspect that Dan is right and that our configuration is messed up somewhere.
I assume you have a scatterfile that separates your RAM and ROM sections. What you want to do is to specify your structure with an attribute for what section it will be placed, or to place this in its own object file and then specify that in the section you want it to be in the scatterfile.
__attribute__((section("ROM"))) const objdef def_instance = { 2, 3, function, array };
The C "const" keyword doesn't by itself cause the compiler to put something in the text or const section. It mainly allows the compiler to warn you of attempts to modify the object. (Note that while you can cast a pointer-to-const to a pointer-to-non-const, actually writing through it is undefined behaviour if the object was defined const, which is exactly what lets implementations place such objects in read-only memory.)
Your thinking is correct and reasonable. I've used Keil / uVision (this was v3, maybe 3 years ago?) and it always worked how you expected it to, i.e. it put const data in flash/ROM.
I'd suspect your linker configuration / script. I'll try to go back to my old work & see how I had it configured. I didn't have to add #pragma or __attribute__ directives, I just had it place .const & .text in flash/ROM. I set up the linker configuration / memory map quite a while ago, so unfortunately, my recall isn't very fresh.
(I'm a bit confused by people who are talking about casting & const pointers, etc... You didn't ask anything about that & you seem to understand how "const" works. You want to place the initialized data in flash/ROM to save RAM (not ROM->RAM copy at startup), not to mention a slight speedup at boot time, right? You're not asking if it's possible to change it or whatever...)
EDIT / UPDATE:
I just noticed the last field in your (const) struct is a some_other_struct * const (a constant pointer to a some_other_struct). You might want to try making it a constant pointer to a constant some_other_struct [some_other_struct const * const], assuming what it points to is indeed constant. In that case it might just work. I don't remember the specifics (see a theme here?), but this is starting to seem familiar. Even if your pointer target isn't a const item and you can't ultimately do this, try changing the struct definition, initializing it with a pointer to const, and just seeing if that drops it into ROM. Even though you have it as a const pointer that can't change once the structure is built, I seem to remember that if the target isn't also const, the linker doesn't think it can be fully initialized at link time and defers the initialization to when the C runtime startup code is executed, including the ROM-to-RAM copy of initialized RW memory.
You could always try using assembly language.
Put in the information using DATA statements and publish (make public) the starting addresses of the data.
In my experience, large Read-Only data was declared in a source file as static const. A simple global function inside the source file would return the address of the data.
If you are doing stuff on ARM you are probably using the ELF binary format. ELF files contain a number of sections, but constant data should find its way into the .rodata or .text sections of the ELF binary. You should be able to check this with the GNU utility readelf or the RVCT utility fromelf.
Now, assuming your symbols find themselves in the correct part of the ELF file, you need to find out how the RTX loader does its job. There is also no reason why the instances cannot share the same read-only memory, but this will depend on the loader. If the executable is stored in ROM, it may be run in place but may still be loaded into RAM. This also depends on the loader.
A complete example would have been best. If I take something like this:
typedef struct
{
char a;
char b;
} some_other_struct;
struct objdef
{
int x;
int y;
const some_other_struct * struct_array;
};
typedef struct
{
int x;
int y;
const some_other_struct * struct_array;
} tobjdef;
const some_other_struct def_other = {4,5};
const struct objdef def_instance = { 2, 3, &def_other};
const tobjdef tdef_instance = { 2, 3, &def_other};
unsigned int read_write=7;
And compile it with the latest codesourcery lite
arm-none-linux-gnueabi-gcc -S struct.c
I get
.arch armv5te
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 18, 4
.file "struct.c"
.global def_other
.section .rodata
.align 2
.type def_other, %object
.size def_other, 2
def_other:
.byte 4
.byte 5
.global def_instance
.align 2
.type def_instance, %object
.size def_instance, 12
def_instance:
.word 2
.word 3
.word def_other
.global tdef_instance
.align 2
.type tdef_instance, %object
.size tdef_instance, 12
tdef_instance:
.word 2
.word 3
.word def_other
.global read_write
.data
.align 2
.type read_write, %object
.size read_write, 4
read_write:
.word 7
.ident "GCC: (Sourcery G++ Lite 2010.09-50) 4.5.1"
.section .note.GNU-stack,"",%progbits
With the section marked as .rodata, which I would assume is desired. Then it is up to the linker script to make sure that ro data is put in rom. And note the read_write variable is after switching from .rodata to .data which is read/write.
So to make this a complete binary and see if it gets placed in rom or ram (.text or .data) then
start.s
.globl _start
_start:
b reset
b hang
b hang
b hang
b hang
b hang
b hang
b hang
reset:
hang: b hang
Then
# arm-none-linux-gnueabi-gcc -c -o struct.o struct.c
# arm-none-linux-gnueabi-as -o start.o start.s
# arm-none-linux-gnueabi-ld -Ttext=0 -Tdata=0x1000 start.o struct.o -o struct.elf
# arm-none-linux-gnueabi-objdump -D struct.elf > struct.list
And we get
Disassembly of section .text:
00000000 <_start>:
0: ea000006 b 20 <reset>
4: ea000008 b 2c <hang>
8: ea000007 b 2c <hang>
c: ea000006 b 2c <hang>
10: ea000005 b 2c <hang>
14: ea000004 b 2c <hang>
18: ea000003 b 2c <hang>
1c: ea000002 b 2c <hang>
00000020 <reset>:
20: e59f0008 ldr r0, [pc, #8] ; 30 <hang+0x4>
24: e5901000 ldr r1, [r0]
28: e5801000 str r1, [r0]
0000002c <hang>:
2c: eafffffe b 2c <hang>
30: 00001000 andeq r1, r0, r0
Disassembly of section .data:
00001000 <read_write>:
1000: 00000007 andeq r0, r0, r7
Disassembly of section .rodata:
00000034 <def_other>:
34: 00000504 andeq r0, r0, r4, lsl #10
00000038 <def_instance>:
38: 00000002 andeq r0, r0, r2
3c: 00000003 andeq r0, r0, r3
40: 00000034 andeq r0, r0, r4, lsr r0
00000044 <tdef_instance>:
44: 00000002 andeq r0, r0, r2
48: 00000003 andeq r0, r0, r3
4c: 00000034 andeq r0, r0, r4, lsr r0
And that achieved the desired result: the read_write variable is in RAM, the structs are in ROM. You need to make sure all the const declarations are in the right places; the compiler gives no warnings about, say, putting a const on some pointer to another structure that it cannot determine at compile time to be const. And even with all of that, getting the linker script (if you use one) to work as desired can take some effort. For example, this one seems to work:
MEMORY
{
bob(RX) : ORIGIN = 0x0000000, LENGTH = 0x8000
ted(WAIL) : ORIGIN = 0x2000000, LENGTH = 0x8000
}
SECTIONS
{
.text : { *(.text*) } > bob
.data : { *(.data*) } > ted
}