avr-gdb: No symbol "x" in current context - gdb

I'm debugging this function compiled with -g -ggdb -gstabs -Og on an AVR 8-bit controller:
usbMsgLen_t usbFunctionSetup(uchar data[8]) {
    usbRequest_t *req = (void *)data;
    if (req->bRequest == CUSTOM_REQ_TMP) {
        // temperature in °C multiplied by 10
        int16_t tmpx10 = (mVAvgTmp >> EWMA_BS) - 500;
        // relative humidity in % multiplied by 10
        uint32_t rhx10 = (mvAvgRh * 100 - (75750 << EWMA_BS)) / (318 << EWMA_BS);
        static char msg[16];
        snprintf(msg, sizeof(msg), "%d|%ld", tmpx10, rhx10);
        usbMsgPtr = (usbMsgPtr_t)msg;
        asm("break");
        return sizeof(msg);
    }
    // return no data for unimplemented requests
    return 0;
}
When execution stops at the manually inserted break instruction, avr-gdb prints the value of msg with the correct values of tmpx10 and rhx10:
(gdb) p msg
$1 = "200|528\000\000\000\000\000\000\000\000"
but it prints a wrong value for tmpx10, and for rhx10 it says No symbol "rhx10" in current context.:
(gdb) p tmpx10
$2 = 116
(gdb) p rhx10
No symbol "rhx10" in current context.
even though both variables are in the same scope as msg.
What am I missing?

-Og performs minimal optimisation, but it is not no optimisation. At the point where you are stopped, both tmpx10 and rhx10 are no longer needed, so it is possible that their locations have been reused for something else (tmpx10), or that the debug info no longer describes a location at all (rhx10).
You're right that -Og should ideally prevent this, and you might consider raising a compiler bug, but the reality is that, especially when optimisation is in use, getting these cases right all the time is hard.
I'm also curious why you're using -gstabs. If your compiler is GCC and you're debugging with GDB, I'd have thought the default DWARF format would give you better results.
Finally, when I compile for debugging I use -g3 -O0, which gives the maximum debug information and the least (no) optimisation. This might give you better results.


Stack Smashing in GCC vs Clang (Possibly due to canaries)

I am trying to understand possible sources for "stack smashing" errors in GCC, but not Clang.
Specifically, when I compile a piece of code with just debug symbols
set(CMAKE_CXX_FLAGS_DEBUG "-g")
and use the GCC C++ compiler (GNU 5.4.0), the application crashes with
*** stack smashing detected ***: ./testprogram terminated
Aborted (core dumped)
However, when I use Clang 3.8.0, the program completes without error.
My first thought was that perhaps the canaries of GCC are catching a buffer overrun that Clang isn't. So I added the additional debug flag
set(CMAKE_CXX_FLAGS_DEBUG "-g -fstack-protector-all")
But Clang still compiles a program that runs without errors. To me this suggests that the issue likely is not a buffer overrun (as you commonly see with stack smashing errors), but an allocation issue.
In any case, when I add in the ASAN flags:
set(CMAKE_CXX_FLAGS_DEBUG "-g -fsanitize=address")
Both compilers yield a program that crashes with an identical error. Specifically,
GCC 5.4.0:
==1143==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==1143==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Aborted (core dumped)
Clang 3.8.0:
==1387==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==1387==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
Aborted (core dumped)
Can somebody give me some hints on the likely source of this error? I am having an awfully hard time tracking down the line where this is occurring, as it is in a very large code base.
EDIT
The issue is unresolved, but it is isolated to the following function:
void get_sparsity(Data & data) {
    T x[n_vars] = {};
    T g[n_constraints] = {};
    for (Index j = 0; j < n_vars; j++) {
        const T x_j = x[j];
        x[j] = NAN;
        eval_g(n_vars, x, TRUE, n_constraints, g, &data);
        x[j] = x_j;
        std::vector<Index> nonzero_entries;
        for (Index i = 0; i < n_constraints; i++) {
            if (isnan(g[i])) {
                data.flattened_nonzero_rows.push_back(i);
                data.flattened_nonzero_cols.push_back(j);
                nonzero_entries.push_back(i);
            }
        }
        data.nonzeros.push_back(nonzero_entries);
    }
    int internal_debug_point = 5;
}
which is called like this:
get_sparsity(data);
int external_debug_point= 6;
However, when I put a breakpoint on the last line of the get_sparsity function, internal_debug_point = 5, it reaches that line without issue. It is only when exiting the function, before it hits the external debug point external_debug_point = 6, that it crashes with the error
received signal SIGABRT, Aborted.
0x00007ffffe315428 in __GI_raise (sig=sig#entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
My guess is that GCC is only checking the canaries when exiting that function, and hence the error is actually occurring inside the function. Does that sound reasonable? If so, then is there a way to get GCC or clang to do more frequent canary checks?
I suspect ASan is running out of memory.
I don't think the ASan errors mean your program is trying to allocate that memory, it means ASan is trying to allocate it for itself (it says "shadow memory" which is what ASan uses to keep track of the memory your program allocates).
If the number of iterations (and size of array) n_vars is large, then the function will use extra memory for a new std::vector in every loop, forcing ASan to track more and more memory.
You could try moving the local vector out of the loop (which will likely increase the performance of the function anyway):
std::vector<Index> nonzero_entries;
for (Index j = 0; j < n_vars; j++) {
    // ...
    for (Index i = 0; i < n_constraints; i++) {
        if (isnan(g[i])) {
            data.flattened_nonzero_rows.push_back(i);
            data.flattened_nonzero_cols.push_back(j);
            nonzero_entries.push_back(i);
        }
    }
    data.nonzeros.push_back(nonzero_entries);
    nonzero_entries.clear();
}
This will reuse the same memory for nonzero_entries instead of allocating and deallocating memory for a new vector every iteration.
Trying to figure out the source of the stack problems was getting nowhere, so I tried a different approach. Through debugging, I narrowed down the above function get_sparsity as the culprit. The debugger wasn't telling me exactly WHERE the problem was occurring, only that it was somewhere inside that function. With that information, I switched the only two stack variables in that function, x and g, to heap variables so that valgrind could help me find the error (sgcheck was coming up empty). Specifically, I modified the above code to
void get_sparsity(Data & data) {
    std::vector<T> x(n_vars, 0);
    std::vector<T> g(n_constraints, 0);
    /* However, for our purposes, it is easier to make an std::vector of Eigen
     * vectors, where the ith entry of "nonzero_entries" contains a vector of
     * indices in g for which g(indices) are nonzero when perturbing x(i).
     * If that sounds complicated, just look at the code and compare to
     * the code where we use the sparsity structure.
     */
    for (Index j = 0; j < n_vars; j++) {
        const T x_j = x[j];
        x[j] = NAN;
        Bool val = eval_g(n_vars, x.data(), TRUE, n_constraints, g.data(), &data);
        x[j] = x_j;
        std::vector<Index> nonzero_entries;
        for (Index i = 0; i < n_constraints; i++) {
            if (isnan(g[i])) {
                data.flattened_nonzero_rows.push_back(i);
                data.flattened_nonzero_cols.push_back(j);
                nonzero_entries.push_back(i);
            }
        }
        data.nonzeros.push_back(nonzero_entries);
    }
    int bob = 5;
    return;
}
and then ran it under valgrind to find the offending line. Now that I know where the problem is occurring, I can fix it.

Incorrect floating point behavior

When I run the C++ program below on a 32-bit PowerPC kernel with software floating-point emulation (hardware floating point disabled), I get an incorrect conditional evaluation. Can someone tell me what the potential problem is here?
#include <stdio.h>

int main() {
    int newmax = 1;
    if ((newmax + 0.0) > 256) {
        printf("\nShouldn't be here\n");
    } else {
        printf("\nShould be here\n");
    }
}
Compile:
powerpc-linux-g++ -msoft-float -c floating.cxx
powerpc-linux-g++ -o floating floating.o
Output in target system:
[linux:/]$ ./floating
Shouldn't be here
You should specify -msoft-float when linking as well.
Give us a disassembly with the -S flag: powerpc-linux-g++ -msoft-float -S floating.cxx -o floating.s
First, why is hardware floating point disabled?
Because of this, type conversions may be performed in an incorrect order.
(double)1 = 0x3FF0000000000000
(float) 1 = 0x3F800000
This is your condition:
if ((newmax + 0.0) > 256)
In your case:
1) newmax is converted to float or double;
2) 0.0 is added;
3) the resulting value is cast back to int.
It depends on your machine, but int is usually a 32-bit value. To check, you can use:
int i;
printf("%zu", sizeof(i));
Anyway, going back to your problem: after the calculated value is converted back to int, you get a big positive number. In your situation I would print it, or compare it not with 0x100 but with
0x3F800000,
0x3FF0000000000000,
0x3FF00000
to find out what happened, but disassembling is the best option.
This probably wasn't so helpful, but that was just my idea of what happened.
This could be anything from compiler error to assembler error to linker error to kernel error. As other people already pointed out: Compiler errors - which is the most likely source of this error - could be verified (or ruled out), if you provided the output of compiling with the -S option. If it is not a compiler error, a kernel error with the floating point emulation would be the next likely source of the problem.
The expression newmax + 0.0 in your code produces a float or double result, but it is compared with an integer value; hence this error.
Try this out:
int i = 1;
printf("%d", (i + 0.0));
Passing a double where %d expects an int is undefined behaviour, so you typically get a garbage value (often 0) no matter what the value of i is.
Whereas,
int i = 1;
printf("%f", (i + 0.0));
prints 1.000000.

set a breakpoint in malloc_error_break to debug in C++

I'm writing a program that takes 2 command line arguments: a and b respectively.
Everything is good as long as a <= 17.5
As soon as a > 17.5 the program throws the following error:
incorrect checksum for freed object - object was probably modified after being freed
I've narrowed the problem down to the following piece of code:
for (int a = 0; a < viBrickWall.size(); a++) {
    vector<int64_t> viTmp(iK-i);
    fill(viTmp.begin(), viTmp.end(), 2);
    for (int b = 0; b < viBrickWall[a].size(); b++) {
        viTmp[viBrickWall[a][b]] = 3;
    }
    viResult.push_back(viTmp);
    viTmp.clear();
}
Removing the latter piece of code gets rid of the error.
I'm also using valgrind to debug the memory, but I haven't been able to find any solution.
Here it is a copy of valgrind's report:
Report hosted in pastebin
EDIT
I compiled the program with debugging flags:
g++ -g -O0 -fno-inline program.cpp
and ran it with valgrind as follows:
valgrind --leak-check=full --show-reachable=yes --dsymutil=yes ./a.out 48 10
I noticed the following line:
==15318== Invalid write of size 8
==15318== at 0x100001719: iTileBricks(int) (test.cpp:74)
==15318== by 0x100001D7D: main (test.cpp:40)
Line 74 is:
viTmp[viBrickWall[a][b]] = 3;
and Line 40 is:
viBrickWall = iTileBricks(iPanelWidth);
You're causing an invalid write to heap memory with this line:
viTmp[viBrickWall[a][b]] = 3;
this implies that viBrickWall[a][b] is indexing outside of viTmp at that time. Add
int i = viBrickWall[a][b];
assert(0 <= i && i < viTmp.size());
before the store to viTmp[i] = 3.
HINT: maybe increasing the size of viTmp by one would fix it:
-vector<int64_t> viTmp(iK-i);
+vector<int64_t> viTmp(iK - i + 1);
I don't know the content of viBrickWall so this is just an educated guess from the Valgrind output.
I'm not sure whether you're using GNU libstdc++ or libc++ on Mac OS X. If you're using libstdc++, or have a Linux box handy, declaring viTmp as a std::__debug::vector would catch this problem quickly.

C++ and pin tool -- very weird DOUBLE variable issue with IF statement

I am working with a pin tool that simulates a processor, and I am having a very strange problem.
In the code snippet below, Router::Evaluate() is called repeatedly many times. After it is called several million times, strange behavior occurs intermittently: _cycles != 0 evaluates to true in the first IF statement but to false in the immediately following IF statement, falling into the ELSE block.
void Router::Evaluate( )
{
    //---------debug print code---------
    if (_cycles != 0) {
        cout << "not a zero" << endl;
        if (_cycles != 0) cout << "---not a zero" << endl;
        else cout << "---zero" << endl;
    }
    //----------------------------------
    _cycles += _speedup;
    while ( _cycles >= 1.0 ) {
        _Step();
        _cycles -= 1.0;
    }
}
//class definition
class Router : public TimedModule {
protected:
    double _speedup; //initialized to 1.0
    double _cycles;  //initialized to 0.0
    ...
};
Below is the output of the code, where "not a zero" followed by "---zero" is printed from time to time, seemingly at random:
not a zero
---zero
(...some other output...)
not a zero
---zero
(...some other output...)
How could this possibly happen? This is not a multi-threaded program, so synchronization is not an issue. The program is compiled with gcc 4.2.4 and executed on 32-bit CentOS. Does anybody have a clue?
Thanks.
--added---
I should have mentioned this, too. I did try printing the value of _cycles each time, and it is always 0.0, which should not be possible...
I also used the following g++ options: "-MM -MG -march=i686 -g -ggdb -g1 -finline-functions -O3 -fPIC"
Unless you have a horrible compiler bug, I would guess something like this is happening:
_cycles has some small fraction remaining after the subtractions. As long the compiler knows nothing else is changing its contents, it keeps its value in a higher precision floating point register. When it sees the I/O operation it is not certain the value of _cycles is needed elsewhere, so it makes sure to store its contents back to the double-precision memory location, rounding off the extra bits that were in the register. The next check assumes pessimistically the value might have changed during the I/O operation, and loads it back from memory, now without the extra bits that made it non-zero in the previous test.
As Daniel Fischer mentioned in a comment, using -ffloat-store inhibits the use of high-precision registers. If the problem goes away when using this option then the scenario I described is very likely. Check the assembly output of Router::Evaluate to be sure.

Running gives wrong numbers, but debugging works well?

I have a problem with my code. When I run it, the short opcode has the wrong value 52496. So I debugged the code step by step... and when I do this, opcode has the correct value 4624! Can someone give me a hint?
void packet_get()
{
    boost::shared_ptr<boost::array<unsigned char, 2>> opc(new boost::array<unsigned char, 2>);
    recv_two_bytes(opc);
    unsigned short opcode;
    unsigned char * test[2];
    test[0] = &opc->at(0); // *test[0] == 0x12
    test[1] = &opc->at(1); // *test[1] == 0x10
    opcode = 0;
    int i = 0;
    for (i = 0; i <= 1; i++)
    {
        opcode = (opcode << 8) | *(test[i]);
    }
    // opcode should now be short 4624
}
Usually, when the behavior of the program is different between normal and debug runs, it is due to an undefined behavior. One such common mistake is uninitialized variables.
When you execute a program, it is given a stack that is most likely uninitialized. In debug mode, it is possible for the debugger to initialize this stack. Therefore, an uninitialized variable can easily have different values in debug and normal execution (even 0 in debug mode, which most of the times is what you actually wanted to give the variable but forgot).
It seems like you have some error like that in your recv_two_bytes function. Enabling all warnings on your compiler will help pin down the problem if it is something trivial.
Also be on the lookout for other errors, such as indexing outside an array's bounds.