gdb: Cast depends on the language compiled - gdb

I was debugging a program written in Rust using GDB (arm-none-eabi-gdb). At one point, I wanted to write to a memory address as follows:
(gdb) set *((int *) 0x24040000) = 0x0000CAFE
syntax error in expression, near `) 0x24040000) = 0x0000CAFE'.
After several attempts, I found out that I was using a C-style cast and had to use a Rust-style cast instead, as follows:
set *(0x24040000 as *mut i32) = 0x0000CAFE
My question is how GDB interprets the different commands and why I get this error. Is it because the symbol (int) is not recognized? If so, how does gdb load the symbols? Does gdb need to parse the expression in the language of the binary running on the target?

Yes, it depends on the language, and the language is deduced from the filename of the loaded source file.
Quoting the manual:
print and many other GDB commands accept an expression and compute its value. Any kind of constant, variable or operator defined by the programming language you are using is valid in an expression in GDB. This includes conditional expressions, function calls, casts, and string constants.
And:
If you are not interested in seeing the value of the assignment, use the set command instead of the print command. set is really the same as print except that the expression’s value is not printed and is not put in the value history (see Value History). The expression is evaluated only for its effects.
And:
Language-specific information is built into GDB for some languages, allowing you to express operations like the above in your program’s native language, and allowing GDB to output values in a manner consistent with the syntax of your program’s native language. The language you use to build expressions is called the working language.
And:
There are two ways to control the working language—either have GDB set it automatically, or select it manually yourself. You can use the set language command for either purpose. On startup, GDB defaults to setting the language automatically.
[..] most of the time GDB infers the language from the name of the file.

Related

(v) is actually (*&v) since when?

Could C++ standards gurus please enlighten me:
Since which C++ standard version has this statement failed because (v) seems to be equivalent to (*&v)?
I.e. for example the code:
#define DEC(V) ( ((V)>0)? ((V)-=1) : 0 )
...{...
register int v=1;
int r = DEC(v) ;
...}...
This now produces warnings under -std=c++17 like:
cannot take address of register variable
left hand side of operand must be lvalue
Many C macros enclose ALL macro parameters in parentheses, of which the above is meant only to be a representative example.
The actual macros that produce warnings are for instance
the RTA_* macros in /usr/include/linux/rtnetlink.h.
Short of not using/redefining these macros in C++, is there any workaround?
If you look at the revision summary of the latest C++1z draft, you'd see this in [diff.cpp14.dcl.dcl]
[dcl.stc]
Change: Removal of register storage-class-specifier.
Rationale: Enable repurposing of deprecated keyword in future
revisions of this International Standard.
Effect on original feature: A valid C++ 2014 declaration utilizing the register
storage-class-specifier is ill-formed in this International Standard.
The specifier can simply be removed to retain the original meaning.
The warning may be due to that.
register is no longer a storage class specifier, so you should remove it. Compilers may not be issuing the right errors or warnings, but your code should not contain register to begin with.
The following is a quote from the standard telling people what they should do with regard to register in their code (relevant part emphasized). You probably have an old version of that file.
C.1.6 Clause 10: declarations [diff.dcl]
Change: In C++, register is not a storage class specifier.
Rationale: The storage class specifier had no effect in C++.
Effect on original feature: Deletion of semantically well-defined feature.
Difficulty of converting: Syntactic transformation.
How widely used: Common.
Your worry is unwarranted since the file in question does not actually contain the register keyword:
grep "register" /usr/include/linux/rtnetlink.h
outputs nothing. Either way, you shouldn't be receiving the warning since:
System headers don't emit warnings by default, at least in GCC
It isn't wise to try to compile a file that belongs to a systems project like the Linux kernel in C++ mode, as there may be subtle and nasty breaking changes
Just include the file normally or link the C code to your C++ binary. If you really are getting a warning that should normally be suppressed, report a bug to your compiler vendor.

Are variable identifiers totally needless, at the end of the day?

I've spent a good deal of time studying TOC and compiler design. I'm not done yet, but I feel comfortable with the concepts. On the other hand, I have only a very shallow knowledge of assembly and machine code, and I have always had the desire/need to connect the two sides (the HLL and LLL representations of the code), as I'm learning C++ while paying close attention to performance and optimization discussions.
C++ is a statically typed language:
My question is: when our variables are written as expressions in the statements of the code, do all these variables (and other entities with identifiers) become, at runtime, mere instructions that address positions in virtual memory (for statics and globals) or positions relative to the stack address (for local variables)?
I mean, after a successful compilation including semantic and syntactic verification, isn't it wise to treat data at runtime as guaranteed entities of target memory bytes, without any thought of identifiers or any checking, since the symbol table is no longer needed?
If my question appears to be the kind that comes from a lack of learning effort (which I hope it doesn't), please just tell me so and tell me where to read. If that is the case, it's honestly because I'm concentrating on C++ nowadays and haven't yet had the chance to gain a sound knowledge of low-level languages. I apologize for that in advance.
You're spot on. Once compiled to machine code, there is no longer any notion of a variable identifier (or variable type, for that matter). It's just bytes at a certain location. That location was determined by the compiler (when compiling) based on the variable name, or by the linker (when linking) in the case of global variables.
Of course, it can be useful to retain information such as identifiers, for debugging purposes. This is precisely what "compilation with debug information" means: when you do that, the compiler will somehow embed the (redundant) identifiers into the generated code such that a debugger can access them. Or put them in a separate file alongside; the details of that depend on the format of the debugging information.
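To make the idea concrete, here is a minimal toy sketch (not how any real compiler is implemented) of names being replaced by frame offsets. The function name lower_locals, the 4-byte slot size, and the ("store", offset, value) instruction format are all hypothetical choices made for illustration.

```python
def lower_locals(assignments, slot_size=4):
    """Toy lowering pass: replace each variable name with a frame offset.

    `assignments` is a list of (name, constant) pairs. The symbol table
    (`offsets`) exists only while this function runs; the returned
    "machine code" contains offsets and values, but no identifiers.
    """
    offsets = {}  # compile-time symbol table: name -> offset in the frame
    code = []
    for name, value in assignments:
        if name not in offsets:
            # First use of this name: give it the next free slot.
            offsets[name] = slot_size * len(offsets)
        code.append(("store", offsets[name], value))
    return code


program = [("count", 1), ("total", 2), ("count", 3)]
print(lower_locals(program))
# [('store', 0, 1), ('store', 4, 2), ('store', 0, 3)]
```

The names count and total appear nowhere in the output. Keeping the offsets table around and shipping it next to the code is, roughly, what debug information does.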
Yes, mostly. There are a few details that will make identifiers remain more than just addresses or stack offsets.
First, C++ has RTTI, which means that at runtime the names of at least some types may still be available. For example:
const std::type_info &info = typeid(*ptr_interface);
std::cout << info.name() << std::endl;
would print an implementation-defined (often mangled) name for the type of *ptr_interface (its dynamic type, if the class is polymorphic).
Second, due to the way a program is linked, the symbols from the object files may still be present in the executing image. The Linux kernel, for example, makes use of this: it can produce a stack backtrace that includes the function names. It also uses knowledge of function names in order to load and link modules. Similar functionality exists in the GNU C Library, which, when the program is linked appropriately, can retrieve function names for stack traces.
In normal cases, though, the code will not be affected by the original names of the variables (but the compiler will of course emit code suitable for the type the variable has).

gdb: size of a struct that isn't in context?

Sometimes I need to know size of a struct which is not in the scope (not even on the stack, i.e. frame-related commands won't help). E.g. it happens for debugging client + server communication, when restarting the apps to just break somewhere in context of the struct with the purpose of finding the size is uncomfortable and time consuming.
How do I find size of a struct defined in a header with disregard to my current context?
For C, gdb's "expression language" is just ordinary C expressions, with a few handy extensions for debugging. This is less true for C++, primarily because C++ is much more difficult to parse, so the expression language there tends to be a subset of C++ plus some gdb extensions.
So, the short answer is you can just type:
(gdb) print sizeof(mystruct)
However, there are caveats.
First, gdb's current language matters. You can find this with show language. In the case of a struct type, in C++ there is an automatic typedef, but in C there is not. So if you are using the auto language (and you usually should), and are stopped in a C frame, you will need to use the keyword:
(gdb) print sizeof(struct mystruct)
Now, this still may not work. The usual reason at this point is that the structure isn't used in your program, and so doesn't show up in the debug info. The debug info can be optimized out even if you think it ought to have been available, because it is up to the compiler. For example, I think (it's hard to remember for sure) that if a struct is only used in sizeof expressions, and no variable of that type is ever defined, GCC won't emit DWARF for it.
You can check to see if the type is available using readelf or dwgrep, like:
$ readelf -wi myexecutableorlibrary | grep mystruct
(Though in real life I usually use less and then examine the DWARF DIEs carefully. You will need to know a little DWARF to make sense of this.)
Sometimes in gdb it's handy to use the "filename" extension to specify exactly which entity you mean. Like:
(gdb) print 'myfile.c'::variable
Not sure if that works for types, and anyway it shouldn't usually be necessary for them.
In C/C++, you have the sizeof operator, which will give you the size of any type (including a struct) or variable.
I'm not sure if you can apply this while debugging but you could simply have a test program with the same headers (type definitions) tell you what the size of your types is.

Difference between newer implementation and older implementations

I am a newbie to Fortran. Please look at the code below:
c     main program
      call foo(2)
      print*, 2
      stop
      end
      subroutine foo(x)
      x = x + 1
      return
      end
In some implementations of Fortran IV, the above code would print a 3. Why is that? Can you suggest an explanation?
How do you suppose more recent Fortran implementations get around the problem?
Help is very much appreciated. Thank You.
The program breaks the language rules - the dummy argument x in the subroutine is modified via the line x = x + 1, but it is associated with something that is an expression (a simple constant). In general, values that result from expressions cannot be modified.
That specific code is still syntactically valid Fortran 2008. It remains a programming error in Fortran 2008 - as it was in Fortran IV/66. This isn't something that compilers are required to diagnose. Some may, perhaps with additional debugging options, and perhaps not till runtime.
Because the program breaks the language rules, anything could happen when you run the program. Exactly what depends on the code generated by the compiler. Compilers may have set aside modifiable storage for the value that results from the expression such that it internally looks like a variable (the program might print three and the program carries on), that modifiable storage might be shared across the program for other instances of the constant 2 (suddenly the value of 2 becomes three everywhere!), the storage for the value of the constant might be in non-modifiable memory (the program may crash), the compiler may issue an error message, the program may get upset and sulk in its bedroom, the program might declare war on a neighbouring nation - it is a programming error - what happens is unspecified.
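The first two outcomes can be imitated with a small simulation. This is only a sketch of the reasoning, not of any actual Fortran compiler: the Cell class, the pool dictionary and the shared_pool switch are hypothetical stand-ins for "storage for a constant" and "arguments passed by reference".

```python
class Cell:
    """A mutable storage location, standing in for a memory address."""

    def __init__(self, value):
        self.value = value


def run(shared_pool):
    """Simulate `call foo(2)` followed by `print*, 2`."""
    pool = {}  # hypothetical constant pool: literal -> storage cell

    def literal(n):
        if shared_pool:
            # One storage cell per distinct constant, reused everywhere.
            return pool.setdefault(n, Cell(n))
        # A fresh temporary for every use of the constant.
        return Cell(n)

    def foo(x):
        # x is passed by reference, as in the Fortran subroutine.
        x.value = x.value + 1

    foo(literal(2))          # call foo(2)
    return literal(2).value  # print*, 2


print(run(shared_pool=True))   # 3: the shared storage for "2" was overwritten
print(run(shared_pool=False))  # 2: only a temporary copy was modified
```

With a shared cell the later use of the constant 2 reads the modified storage, which is the "prints a 3" behaviour described in the question. With a fresh temporary per use, the modification is simply lost.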
As of Fortran 90, facilities were introduced into the language to allow programmers to write new code that is practical for compilers to check for errors such as these (and in some cases compilers are required to check for errors if they are to be regarded as standard conforming).
For the code as presented, the main program and the subroutine are to be regarded as separately compiled - the main program is unaware of the details of the subroutine and vice versa (it is possible that the subroutine could be compiled long after the main program, on a different machine, with the outputs of the two being linked together at some later stage - without fancy link time behaviour or static analysis it is therefore not possible to resolve errors such as this). Language rules are such that when compiling the main program the compiler must implicitly assume the details of the interface of the subroutine based only on the way the subroutine is referenced - inside the main program the subroutine has an implicit interface.
Fortran 90 introduced the concept of an explicit interface, where the compiler is explicitly told, in various ways, what the interface of the subroutine is, and can then check that any reference to the subroutine is consistent with that interface. If a procedure is a module procedure, internal procedure or intrinsic procedure, that interface is automatically realized; alternatively, for external subprograms, procedure pointers, etc., the programmer can explicitly describe the interface using an interface block.
In addition, Fortran 90 introduced the intent attribute - a characteristic of a dummy argument of a procedure that is also then a characteristic of the interface for a procedure. The intent of the argument indicates to the compiler whether the procedure may define the argument (it also may have implications for default initialization and component allocation status) and hence whether an expression could be a valid actual argument. x in subroutine foo would typically be declared INTENT(INOUT).
Collectively these new language features provide a robust defence against this sort of programming error when using compilers with a basic level of implementation quality. If you are starting with the language then it is recommended that these new features become part of your standard approach - i.e. use implicit none, all procedures should generally be module procedures or internal procedures, use external procedures only when absolutely required, always specify dummy argument intent, use free form source.

g++ generated Assembly looks ugly

I'm quite familiar with gcc assembly... Recently I was forced to use g++ for some code cleanup. Let me mention I'm very familiar with assembly, hence out of curiosity I often take a look at how good the compiler generated asm is.
But the naming conventions with g++ are just bizarre. I was wondering if there are any guidelines on how to read its asm output ?
Thanks a lot.
I don't find g++'s asm 'ugly' or hard to understand, though I've been working with GCC for over 8 years now.
On Linux, C++ function labels usually start with _ZN. The "_Z" prefix is a token that designates C++ name mangling (as opposed to C), and "N" opens a qualified name: first the namespace the function belongs to, then the function name, then template arguments, if any, and finally the argument types.
Example:
// tests::vec4::testEquality()
_ZN5tests4vec412testEqualityEv
_ZN - '_Z' marks C++ mangling, 'N' opens a nested (qualified) name (a const member function gets _ZNK)
5tests - length (5 chars) + name
4vec4 - length (4 chars) + nested namespace or class
12testEquality - length (12 chars) + function name
Ev - 'E' closes the nested name, 'v' is a void (empty) parameter list
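The breakdown above can be reproduced mechanically. The following is a minimal sketch that only understands the simple _ZN<length><name>...E shape shown here; the helper name split_nested_name is made up for illustration, and templates, substitutions, qualifiers and operators are deliberately out of scope (use c++filt for real demangling).

```python
def split_nested_name(mangled):
    """Split a simple _ZN...E mangled name into its components.

    Handles only plain <length><identifier> components.
    Returns (components, encoded_parameter_types).
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested mangled name")
    pos = 3
    parts = []
    while pos < len(mangled) and mangled[pos].isdigit():
        # Read the decimal length prefix.
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        # The next `length` characters are the identifier itself.
        parts.append(mangled[end:end + length])
        pos = end + length
    if pos >= len(mangled) or mangled[pos] != "E":
        raise ValueError("unsupported mangled name")
    # Everything after the closing 'E' encodes the parameter types.
    return parts, mangled[pos + 1:]


print(split_nested_name("_ZN5tests4vec412testEqualityEv"))
# (['tests', 'vec4', 'testEquality'], 'v')
```

Because each component carries its own length, identifiers that end in digits (such as vec4) do not confuse the parser.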
From man g++:
-fverbose-asm
Put extra commentary information in the generated assembly code to make it more
readable. This option is generally only of use to those who actually need to read the
generated assembly code (perhaps while debugging the compiler itself).
If you're looking at the naming convention for external symbols then this will follow the name mangling convention of the platform that you are using. It can be reversed with the c++filt program which will give you the human readable version of C++ function names, although they will (in all probability) no longer be valid linker symbols.
If you're just looking at local function labels, then you're out of luck. g++'s assembler output is for talking to the assembler and not really designed for ease of human comprehension. It's going to generate a set of relatively meaningless labels.
If the code has debugging information, objdump can provide a more helpful disassembly:
-S, --source Intermix source code with disassembly
-l, --line-numbers Include line numbers and filenames in output
For people who are working on demangling those names inside the program (like me), hopefully this thread helps.
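import subprocess

def demangle(name):
    # Run c++filt (from binutils) on one mangled name and return its output.
    result = subprocess.run(['c++filt', name],
                            capture_output=True, text=True, check=True)
    # c++filt prints one demangled name per line; keep the first.
    return result.stdout.splitlines()[0]

print(demangle('_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE17_M_stringbuf_initESt13_Ios_Openmode'))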