C++ backtrace with this=0x0 in various frames

I have a program running on a MIPS multicore system, and I get a backtrace from a core dump that is really hard to figure out (at least for me). I suspect that one of the other cores is writing to memory, but not all of the stack is corrupted, which makes it more confusing.
In frame #2 this is NULL, and in frame #0 this is NULL too (the cause of the core dump).
This is part of the backtrace:
#0 E::m (this=0x0, string=0x562f148 "", size=202) at E.cc:315
#1 0x00000000105c773c in P::e (this=0x361ecd00, string=0x562f148 "", size=202, offset=28) at P.cc:137
#2 0x00000000105c8c5c in M::e (this=0x0, id=7 '\a', r=2, string=0x562f148 "", size=202, oneClass=0x562f148 "", secondClass=0x14eff439 "",
offset=28) at M.cc:75
#3 0x0000000010596354 in m::find (this=0x4431fd70, string=0x562f148 "", size=202, oneClass=0x14eff438 "", secondClass=0x14eff439 "",
up=false) at A.cc:458
#4 0x0000000010597364 in A::trigger (this=0x4431fd70, triggerType=ONE, string=0x562f148 "", size=0, up=true) at A.cc:2084
#5 0x000000001059bcf0 in A::findOne (this=0x4431fd70, index=2, budget=0x562f148 "", size=202, up=true) at A.cc:1155
#6 0x000000001059c934 in A::shouldpathNow (this=0x4431fd70, index=2, budget=0x562f148 "", size=202, up=false, startAt=0x0, short=)
at A.cc:783
#7 0x00000000105a385c in A::shouldpath (this=0x4431fd70, index=2, rbudget=, rsize=, up=false,
direct=) at A.cc:1104
Here is the relevant part of the m::find function:
442 m_t m::find(unsigned char const *string, unsigned int size,
443 hClass_t *hClass, h_t *fHClass,
444 bool isUp) {
445
446
447 const Iterator &it=arr_[getIndex()]->getSearchIterator((char const*)value, len);
448
449 unsigned int const offset = value - engine_->getData();
450
451 int ret=UNKNOWN;
452 M *p;
453 for(const void* match=it.next();
454 ret == UNKNOWN && match != NULL;
455 match = it.next()){
456 p = (M*)match;
457 if(p->needMore()){
458 ret = p->e(id_, getIndex(), value, len, hClass, fHClass, offset);

this=0x0 can actually happen pretty easily. For example:
E *instance = NULL;
instance->method();
this will be NULL within method.
There's no need to assume that the memory has been corrupted or the stack has been overwritten. In fact, if the rest of the stack's contents seem to make sense (and you seem to think that they do), then the stack is probably fine.
Instead of necessarily looking for memory corruption, check your logic to see if you have an uninitialized (NULL) pointer or reference.
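A minimal sketch of the kind of guard this suggests. The struct E and its method m below are stand-ins, not the asker's real code, and safe_call is a hypothetical helper. Calling a member function through a null pointer is undefined behavior, so the check has to happen before the call:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the asker's class; the real E::m is not shown in the question.
struct E {
    std::size_t m(const std::string& s) const { return s.size(); }
};

// Hypothetical helper: refuses to call through a null pointer.
// Returns -1 when there is no object, otherwise the result of E::m.
long safe_call(const E* instance, const std::string& s) {
    if (instance == nullptr) {
        return -1;  // the caller decides how to report the missing object
    }
    return static_cast<long>(instance->m(s));
}
```

Putting such a check (or an assert) at each place where the pointer is obtained usually narrows down which lookup returned NULL.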

Without seeing all the code, it's kind of difficult to imagine what's happening. Could you also add the code for M::e() and P::e(), or at least the important parts?
Something that might just solve everything is to add a NULL check, as follows in m::find():
456 p = (M*)match;
if(!p) { return; /* or do whatever */ }
457 if(p->needMore()){
458 ret = p->e(id_, getIndex(), value, len, hClass, fHClass, offset);
If p were NULL, I would have expected it to have crashed calling p->needMore(), but depending on what that method does, it may not crash.

Related

In a multithreaded program, does bt on a core dump always show the culprit thread?

This is a somewhat general question.
I have a segfault in a multithreaded program, and bt on the core dump shows the following:
(gdb) bt full
#0 0x0000000000441540 in try_dequeue<std::shared_ptr<Frame> > (item=<synthetic pointer>, this=0xbe3c50) at /root/projects/active/user/include/third_party/concurrentqueue.h:1111
nonEmptyCount = 0
best = 0x0
bestSize = 0
#1 ConsumerNice::listening_nice (this=0xbe3c40) at /root/projects/active/user/include/concurrency/consumer_nice.h:45
frame = std::shared_ptr (empty) 0x0
#2 0x00000000004c0530 in execute_native_thread_routine ()
No symbol table info available.
#3 0x00007f3eb3f81e65 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f3ead70a88d in clone () from /lib64/libc.so.6
No symbol table info available.
So I went to look at the source code.
My code is as follows:
void listening_nice() {
while (true) {
std::shared_ptr<Frame> frame;
if (nice_queue.try_dequeue(frame)) {
on_frame_nice(frame);
}
}
}
and the relevant part of cameron314/concurrentqueue looks like this:
bool try_dequeue(U& item)
{
// Instead of simply trying each producer in turn (which could cause needless contention on the first
// producer), we score them heuristically.
size_t nonEmptyCount = 0;
ProducerBase* best = nullptr;
size_t bestSize = 0;
for (auto ptr = producerListTail.load(std::memory_order_acquire); nonEmptyCount < 3 && ptr != nullptr; ptr = ptr->next_prod()) {
auto size = ptr->size_approx();
if (size > 0) {
if (size > bestSize) {
bestSize = size;
best = ptr;
}
++nonEmptyCount;
}
}
This code doesn't seem able to cause a segfault, so I am wondering: does bt always show the culprit thread? Or is there a chance that the segfault is caused by a problem in some other thread, or even by the operating system?
Note that this program runs on 3 identically configured machines, but only one machine crashes, once a day; that is, it runs for 3 straight hours on that one machine and then crashes.

cygwin exception when assigning value to vector of strings

I am getting the following exception while the program runs:
0 [main] myFunction 5560 cygwin_exception::open_stackdumpfile: Dumping stack trace to myFunction.exe.stackdump
The contents of the stackdump file are as follows:
Stack trace:
Frame Function Args
00000223800 0018006FB93 (0060007AE38, 00600083EC8, 00600083EF8, 00600083F28)
00000000006 0018007105A (0060007BB78, 00600000000, 0000000014C, 00000000000)
000002239E0 0018011C6A7 (00600083048, 00600083078, 006000830A8, 006000830D8)
00000000041 001801198DE (0060007DCB8, 0060007DCE8, 00000000000, 0060007DD48)
0060008F2B0 00180119DAB (0060007E1F8, 0060007E228, 0060007E258, 00000000006)
0060008F2B0 00180119F7C (0060007CB38, 0060007CB68, 0060007CB98, 0060007CBC8)
0060008F2B0 0018011A23F (00180115A0B, 0060007CCE8, 006000885B0, 00000000000)
0060008F2B0 00180148A65 (003FC4AA93D, 00600083900, 00100439102, 0060007B080)
0060008F2B0 001800C1DB3 (00000000000, 00000223EE0, 0010042A2BC, 00000223E90)
0060008F2B0 00180115A0B (00000223EE0, 0010042A2BC, 00000223E90, 00000000017)
0060008F2B0 00600000001 (00000223EE0, 0010042A2BC, 00000223E90, 00000000017)
End of stack trace
Let me describe in detail the peculiar problem that happens at runtime. I am not able to describe it in words alone, so I am listing the scenarios in which the program works and in which it fails.
I have declared two vectors of strings in my header file and initialised them in the constructor as follows:
std::vector <std::string> symbolMap,localSymbolMap;
for(int i=0;i<100;i++){
symbolMap.push_back(" ");
localSymbolMap.push_back(" ");
}
I have defined a function that assigns the appropriate values to these variables later in the program, as follows:
void TestClient::setTickerMap(int j, std::string symbol, std::string localSymbol){
symbolMap[j] = symbol;
localSymbolMap[j]=localSymbol;
}
Now, in the main program, I call this function as follows:
TestClient client;
for(int j=0;j<27;j++){
std::cout<<j<<" "<<realTimeSymbols[j]<<" "<<getLocalSymbol(realTimeSymbols[j],date)<<std::endl;
client.setTickerMap(j,realTimeSymbols[j],getLocalSymbol(realTimeSymbols[j],date));
}
// Here, I have checked for each j, that values of realTimeSymbols and getLocalSymbol are proper.
When I run the program, I get the error described above. The program always crashes when j is equal to 24.
The following workaround works for now:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
if(j==24){
// symbolMap[j]="SYNDIBANK";
// localSymbolMap[j]="SYNDIBANK15MARFUT";
}
else{
symbolMap[j] = symbol;
localSymbolMap[j]=localSymbol;
}
if(j==1){
symbolMap[24]="SYNDIBANK";
localSymbolMap[24]="SYNDIBANK15MARFUT";
}
}
The following 3 variations of the above workaround do not work; they result in the original error:
Variation 1:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
if(j==24){
// symbolMap[j]="SYNDIBANK";
// localSymbolMap[j]="SYNDIBANK15MARFUT";
}
else{
symbolMap[j] = symbol;
localSymbolMap[j]=localSymbol;
}
if(j==25){
symbolMap[24]="SYNDIBANK";
localSymbolMap[24]="SYNDIBANK15MARFUT";
}
}
Variation 2:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
if(j==24){
symbolMap[j]="SYNDIBANK";
localSymbolMap[j]="SYNDIBANK15MARFUT";
}
else{
symbolMap[j] = symbol;
localSymbolMap[j]=localSymbol;
}
}
Variation 3:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
if(j==24){
symbolMap[j]="AB";
localSymbolMap[j]="SYNDIBANK15MARFUT";
}
else{
symbolMap[j] = symbol;
localSymbolMap[j]=localSymbol;
}
}
Now, if I assign a single character to symbolMap in variation 3 as follows :
symbolMap[j]="A";
then the code is able to run (although the result is not correct).
I am not able to figure out what exactly is causing this runtime error. I have checked the related question (Cygwin Exception : open stack dump file), and I do not have a separate session of Cygwin running. I have restarted my PC just to be extra sure. Still the problem persists. Any suggestions as to why this behaviour is seen on my PC?
UPDATE:
To be sure that the error is not caused by an out-of-range index, I checked that the following call from the main program works fine:
TestClient client;
for(int j=25;j<27;j++){
std::cout<<j<<" "<<realTimeSymbols[j]<<" "<<getLocalSymbol(realTimeSymbols[j],date)<<std::endl;
client.setTickerMap(j,realTimeSymbols[j],getLocalSymbol(realTimeSymbols[j],date));
}
The program also works fine when j is iterated from 24 to 27, but it fails when the loop starts from any number below 24 and runs to 27.
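One way to rule out an out-of-range index more directly is std::vector::at(), which throws std::out_of_range instead of silently writing past the end. A sketch, assuming TestClient holds the two vectors as described above; the class body here is reconstructed for illustration and the symbol() accessor is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Reconstructed for illustration; only setTickerMap and the two vectors
// come from the question.
class TestClient {
public:
    TestClient()
        : symbolMap(100, std::string(" ")),
          localSymbolMap(100, std::string(" ")) {}

    // at() throws std::out_of_range on a bad index; operator[] would not.
    void setTickerMap(std::size_t j, const std::string& symbol,
                      const std::string& localSymbol) {
        symbolMap.at(j) = symbol;
        localSymbolMap.at(j) = localSymbol;
    }

    // Hypothetical accessor, added so the result can be inspected.
    const std::string& symbol(std::size_t j) const { return symbolMap.at(j); }

private:
    std::vector<std::string> symbolMap, localSymbolMap;
};
```

If a version like this runs without throwing and the crash remains, the index is fine and the corruption most likely originates elsewhere.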
GDB OUTPUT
I do not have much experience with gdb, but here is its output in case it helps:
GNU gdb (GDB) 7.8
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-cygwin".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from order_trading2632_limit.exe...done.
(gdb) run
Starting program: /cygdrive/e/eclipse_workspace/testClient/Debug/testClient.exe
[New Thread 4832.0x11e4]
[New Thread 4832.0x1798]
Attempt 1 of 10000
[New Thread 4832.0x1020]
Connection successful
Program received signal SIGABRT, Aborted.
0x00000003fc4ab0e3 in cygstdc++-6!_ZNSs6assignERKSs () from /usr/bin/cygstdc++-6.dll
(gdb) bt
#0 0x00000003fc4ab0e3 in cygstdc++-6!_ZNSs6assignERKSs () from /usr/bin/cygstdc++-6.dll
#1 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) set $pc=*(void **)$rsp
(gdb) set $rsp=$rsp+8
(gdb) bt
#0 0x000007fefd3110ac in WaitForSingleObjectEx () from /cygdrive/c/Windows/system32/KERNELBASE.dll
#1 0x000000018011c639 in sig_send(_pinfo*, siginfo_t&, _cygtls*) () from /usr/bin/cygwin1.dll
#2 0x00000001801198de in _pinfo::kill(siginfo_t&) () from /usr/bin/cygwin1.dll
#3 0x0000000180119dab in kill0(int, siginfo_t&) () from /usr/bin/cygwin1.dll
#4 0x0000000180119f7c in raise () from /usr/bin/cygwin1.dll
#5 0x000000018011a23f in abort () from /usr/bin/cygwin1.dll
#6 0x0000000180148a65 in dlfree () from /usr/bin/cygwin1.dll
#7 0x00000001800c1db3 in free () from /usr/bin/cygwin1.dll
#8 0x0000000180115a0b in _sigfe () from /usr/bin/cygwin1.dll
#9 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Note that the stack trace is corrupted, and I have used a trick from the following question to print it (GDB corrupted stack frame - How to debug?). Please help me debug the program further.
UPDATE
The error does not happen only when the index is 24. Before calling the said loop, I initialize various arrays of int, double and string. Changing the number of initializations changes the index at which the error happens. Today I initialised vectors of length 24 before running this loop, and this time the error happened at index 3.
Implementing the workaround is really frustrating. I do not know whether there are other memory issues I am overlooking because of this. Please offer suggestions.
CODE
int main(int argc, char** argv) {
unsigned int port = 7900;
const char* host = "";
int clientId = 6;
int attempt = 0;
int MAX_ATTEMPTS=10000;
int NUMREALTIMESYMBOLS=37;
std::string realTimeSymbolsArr[]={"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","aa","bb","cc","dd","ee","ff","gg","hh","ii","jj","kk"};
std::vector <std::string> realTimeSymbols(realTimeSymbolsArr,realTimeSymbolsArr+NUMREALTIMESYMBOLS);
int isTradeable[]={1,0,0,0,1,0,1,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1};
int numSubscriptions[]={2,2,1,4,1,1,2,6,3,1,1,1,2,1,3,1,1,2,1,3,1,1,1,10,4,1,6,1,1,9,4,2,1,3,1,1,2};
int subscriptionList[NUMREALTIMESYMBOLS][100];
int subscriptionIndex[NUMREALTIMESYMBOLS][100];
subscriptionList[0][0]=0;subscriptionIndex[0][0]=0;
subscriptionList[0][1]=2;subscriptionIndex[0][1]=2;
subscriptionList[1][0]=2;subscriptionIndex[1][0]=1;
subscriptionList[1][1]=0;subscriptionIndex[1][1]=3;
subscriptionList[2][0]=2;subscriptionIndex[2][0]=0;
subscriptionList[3][0]=4;subscriptionIndex[3][0]=2;
subscriptionList[3][1]=31;subscriptionIndex[3][1]=2;
subscriptionList[3][2]=13;subscriptionIndex[3][2]=3;
subscriptionList[3][3]=34;subscriptionIndex[3][3]=3;
subscriptionList[4][0]=4;subscriptionIndex[4][0]=0;
subscriptionList[5][0]=9;subscriptionIndex[5][0]=2;
subscriptionList[6][0]=6;subscriptionIndex[6][0]=0;
subscriptionList[6][1]=8;subscriptionIndex[6][1]=2;
subscriptionList[7][0]=7;subscriptionIndex[7][0]=0;
subscriptionList[7][1]=8;subscriptionIndex[7][1]=1;
subscriptionList[7][2]=31;subscriptionIndex[7][2]=1;
subscriptionList[7][3]=36;subscriptionIndex[7][3]=1;
subscriptionList[7][4]=13;subscriptionIndex[7][4]=2;
subscriptionList[7][5]=34;subscriptionIndex[7][5]=2;
subscriptionList[8][0]=8;subscriptionIndex[8][0]=0;
subscriptionList[8][1]=7;subscriptionIndex[8][1]=1;
subscriptionList[8][2]=21;subscriptionIndex[8][2]=3;
subscriptionList[9][0]=9;subscriptionIndex[9][0]=0;
subscriptionList[10][0]=10;subscriptionIndex[10][0]=0;
subscriptionList[11][0]=11;subscriptionIndex[11][0]=0;
subscriptionList[12][0]=28;subscriptionIndex[12][0]=3;
subscriptionList[12][1]=33;subscriptionIndex[12][1]=3;
subscriptionList[13][0]=13;subscriptionIndex[13][0]=0;
subscriptionList[14][0]=33;subscriptionIndex[14][0]=1;
subscriptionList[14][1]=28;subscriptionIndex[14][1]=2;
subscriptionList[14][2]=15;subscriptionIndex[14][2]=3;
subscriptionList[15][0]=15;subscriptionIndex[15][0]=0;
subscriptionList[16][0]=16;subscriptionIndex[16][0]=0;
subscriptionList[17][0]=0;subscriptionIndex[17][0]=1;
subscriptionList[17][1]=11;subscriptionIndex[17][1]=2;
subscriptionList[18][0]=7;subscriptionIndex[18][0]=2;
subscriptionList[19][0]=6;subscriptionIndex[19][0]=3;
subscriptionList[19][1]=8;subscriptionIndex[19][1]=3;
subscriptionList[19][2]=16;subscriptionIndex[19][2]=3;
subscriptionList[20][0]=9;subscriptionIndex[20][0]=1;
subscriptionList[21][0]=21;subscriptionIndex[21][0]=0;
subscriptionList[22][0]=9;subscriptionIndex[22][0]=3;
subscriptionList[23][0]=6;subscriptionIndex[23][0]=1;
subscriptionList[23][1]=10;subscriptionIndex[23][1]=1;
subscriptionList[23][2]=27;subscriptionIndex[23][2]=1;
subscriptionList[23][3]=29;subscriptionIndex[23][3]=1;
subscriptionList[23][4]=16;subscriptionIndex[23][4]=2;
subscriptionList[23][5]=21;subscriptionIndex[23][5]=2;
subscriptionList[23][6]=2;subscriptionIndex[23][6]=3;
subscriptionList[23][7]=4;subscriptionIndex[23][7]=3;
subscriptionList[23][8]=7;subscriptionIndex[23][8]=3;
subscriptionList[23][9]=36;subscriptionIndex[23][9]=3;
subscriptionList[24][0]=24;subscriptionIndex[24][0]=0;
subscriptionList[24][1]=24;subscriptionIndex[24][1]=1;
subscriptionList[24][2]=24;subscriptionIndex[24][2]=2;
subscriptionList[24][3]=24;subscriptionIndex[24][3]=3;
subscriptionList[25][0]=29;subscriptionIndex[25][0]=3;
subscriptionList[26][0]=21;subscriptionIndex[26][0]=1;
subscriptionList[26][1]=0;subscriptionIndex[26][1]=2;
subscriptionList[26][2]=10;subscriptionIndex[26][2]=2;
subscriptionList[26][3]=15;subscriptionIndex[26][3]=2;
subscriptionList[26][4]=27;subscriptionIndex[26][4]=2;
subscriptionList[26][5]=33;subscriptionIndex[26][5]=2;
subscriptionList[27][0]=27;subscriptionIndex[27][0]=0;
subscriptionList[28][0]=28;subscriptionIndex[28][0]=0;
subscriptionList[29][0]=29;subscriptionIndex[29][0]=0;
subscriptionList[29][1]=4;subscriptionIndex[29][1]=1;
subscriptionList[29][2]=13;subscriptionIndex[29][2]=1;
subscriptionList[29][3]=16;subscriptionIndex[29][3]=1;
subscriptionList[29][4]=34;subscriptionIndex[29][4]=1;
subscriptionList[29][5]=6;subscriptionIndex[29][5]=2;
subscriptionList[29][6]=36;subscriptionIndex[29][6]=2;
subscriptionList[29][7]=27;subscriptionIndex[29][7]=3;
subscriptionList[29][8]=31;subscriptionIndex[29][8]=3;
subscriptionList[30][0]=30;subscriptionIndex[30][0]=0;
subscriptionList[30][1]=30;subscriptionIndex[30][1]=1;
subscriptionList[30][2]=30;subscriptionIndex[30][2]=2;
subscriptionList[30][3]=30;subscriptionIndex[30][3]=3;
subscriptionList[31][0]=31;subscriptionIndex[31][0]=0;
subscriptionList[31][1]=29;subscriptionIndex[31][1]=2;
subscriptionList[32][0]=11;subscriptionIndex[32][0]=3;
subscriptionList[33][0]=33;subscriptionIndex[33][0]=0;
subscriptionList[33][1]=15;subscriptionIndex[33][1]=1;
subscriptionList[33][2]=28;subscriptionIndex[33][2]=1;
subscriptionList[34][0]=34;subscriptionIndex[34][0]=0;
subscriptionList[35][0]=11;subscriptionIndex[35][0]=1;
subscriptionList[36][0]=36;subscriptionIndex[36][0]=0;
subscriptionList[36][1]=10;subscriptionIndex[36][1]=3;
double a1[]={720,0.0,750,0.0,900,0.0,760,360,120,390,600,360,0.0,760,0.0,140,660,0.0,0.0,0.0,0.0,720,0.0,0.0,100,0.0,0.0,120,320,40,100,500,0.0,630,570,0.0,100};
double a2[]={0.5,0.0,1.3,0.0,0.6,0.0,0.45,0.15,0.45,0.4,0.25,1.4,0.0,0.55,0.0,0.2,0.8,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.4,0.0,0.0,0.25,0.4,0.25,0.4,0.35,0.0,0.4,0.5,0.0,0.4};
double a3[]={1350,0.0,1250,0.0,300,0.0,1150,1400,900,1200,850,900,0.0,600,0.0,1450,1450,0.0,0.0,0.0,0.0,1000,0.0,0.0,1200,0.0,0.0,1150,350,1400,1200,1350,0.0,1500,300,0.0,1200};
double a4[]={0.6,0.0,0.7,0.0,0.2,0.0,0.3,0.55,0.4,0.8,0.25,0.7,0.0,0.25,0.0,0.55,0.5,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.7,0.0,0.0,0.65,0.55,0.45,0.7,0.6,0.0,0.4,0.4,0.0,0.7};
double a5[]={300,0.0,1300,0.0,1350,0.0,200,1100,1200,650,1500,1350,0.0,1050,0.0,1300,550,0.0,0.0,0.0,0.0,250,0.0,0.0,150,0.0,0.0,1250,700,1150,150,1250,0.0,1500,1500,0.0,150};
double a6[]={0.3,0.0,0.8,0.0,0.6,0.0,0.5,0.6,0.6,0.3,0.35,0.7,0.0,0.55,0.0,0.45,0.35,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.5,0.0,0.0,0.55,0.3,0.35,0.5,0.75,0.0,0.2,0.5,0.0,0.5};
double a7[]={1500,0.0,1500,0.0,1050,0.0,750,1100,1350,1350,100,1350,0.0,550,0.0,1400,1000,0.0,0.0,0.0,0.0,1000,0.0,0.0,1350,0.0,0.0,350,550,350,1350,500,0.0,1350,1250,0.0,1350};
double a8[]={0.9,0.0,0.9,0.0,0.8,0.0,0.6,0.35,0.7,0.2,0.15,0.7,0.0,0.3,0.0,0.55,0.5,0.0,0.0,0.0,0.0,0,0.0,0.0,0.3,0.0,0.0,0.4,0.3,0.5,0.3,0.35,0.0,0.5,0.5,0.0,0.3};
double a9[]={0.008,0.0,0.009,0.0,0.01,0.0,0.01,0.009,0.009,0.007,0.009,0.01,0.0,0.009,0.0,0.01,0.008,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.006,0.0,0.0,0.008,0.009,0.01,0.006,0.009,0.0,0.008,0.009,0.0,0.006};
double a10[]={0.008,0.0,0.009,0.0,0.008,0.0,0.008,0.008,0.009,0.008,0.008,0.006,0.0,0.009,0.0,0.01,0.008,0.0,0.0,0.0,0.0,0.005,0.0,0.0,0.009,0.0,0.0,0.01,0.008,0.009,0.009,0.009,0.0,0.008,0.01,0.0,0.009};
double a11[]={0.4,0.0,0.2,0.0,0.1,0.0,0.3,0.4,0.2,0.1,0.7,0.2,0.0,0,0.0,0.1,0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.2,0.0,0.0,0.3,0,0.2,0.2,0,0.0,0.7,0.1,0.0,0.2};
int a12[]={500,1000,8000,2000,4000,1000,1250,1000,1000,500,1000,125,2000,4000,1000,250,2000,250,1000,1250,500,2000,1000,500,0,250,500,4000,4000,1250,0,2000,500,500,4000,125,1000};
double a13[]={0.0013406,0.0020022,0.0018709,0.0018948,0.0017975,0.0014687,0.0011068,0.001355,0.0010891,0.00088151,0.0014294,0.0012989,0.0014205,0.0019711,0.0015365,0.0020505,0.0018961,0.00078672,0.0023114,0.0012203,0.0012849,0.0015674,0.0012844,0.0014197,0.0,0.00074657,0.00096164,0.0017109,0.0015385,0.00068178,0.0,0.0021815,0.00087359,0.00074349,0.0021645,0.001595,0.0014573};
int a14[]={14850,0,16500,0,13740,0,13740,24750,13740,14100,13740,30750,0,14400,0,13740,13740,0,0,0,0,14100,0,0,13500,0,0,13740,13740,25200,13500,13740,0,13740,13740,0,13740};
int a15[]={30750,0,35900,0,34950,0,35900,35900,35900,35900,30000,34950,0,26250,0,34000,35900,0,0,0,0,34500,0,0,13500,0,0,35900,34650,35900,13500,32700,0,35900,35900,0,33300};
Client client;
for (int i = 0; i < MAX_ATTEMPTS; i++) {
client.connect(host, port, clientId);
++attempt;
std::cout << "Attempt " << attempt << " of " << MAX_ATTEMPTS<< std::endl;
for (int j=0;j<NUMREALTIMESYMBOLS;j++){
if(j==24 || j==30)
continue;
std::cout<<j<<" "<<realTimeSymbols[j]<<" "<<getLocalSymbol(realTimeSymbols[j],date)<<std::endl;
client.setTickerMap(j,realTimeSymbols[j],getLocalSymbol(realTimeSymbols[j],date));
}
}
}
Constructor of Client:
Client::Client(){
for(int i=0;i<50;i++){
symbolMap.push_back(" ");
localSymbolMap.push_back(" ");
}
}
The above code fails at 24 and 30. Hence the loop skips j equal to 24 or 30 as a workaround.
std::string realTimeSymbolsArr[]={"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","aa","bb","cc","dd","ee","ff","gg","hh","ii","jj","kk"};
subscriptionList[24][1]=24;subscriptionIndex[24][1]=1;
subscriptionList[24][2]=24;subscriptionIndex[24][2]=2;
subscriptionList[24][3]=24;subscriptionIndex[24][3]=3;
subscriptionList[30][1]=30;subscriptionIndex[30][1]=1;
subscriptionList[30][2]=30;subscriptionIndex[30][2]=2;
subscriptionList[30][3]=30;subscriptionIndex[30][3]=3;
You have not posted the source for getLocalSymbol, but I must assume that it also uses the same data and has a flow of the following form:
getLocalSymbol(a, b) {
int i, j, old_i, old_j;
std::string value;
// Derive i and j from the parameters
// ...
// And build the String
do {
old_i = i;
old_j = j;
i = subscriptionList[old_i][old_j];
j = subscriptionIndex[old_i][old_j];
value += realTimeSymbolsArr[i];
} while(j > 0);
return value;
}
Got it right? That control flow, or something equivalent, appears to be part of it, either way.
This goes well for almost all values of i and j - except for the aforementioned values of 24 and 30 for i, and 1 to 3 for j.
With these values, i and j remain the same in every iteration and value becomes longer and longer, until eventually something breaks on the stack, which both overwrites j (thereby causing the loop to terminate) and corrupts value.
Either way, the std::string you returned is now corrupted as you exceeded some limit during that endless loop.
As for how to solve it, fix that infinite loop and fix your data.
For fixing the loop, limit the iteration count.
For fixing your data, well, now that you know why the data is causing the bug, you should be able to figure out yourself how to fix it.
Remember, you have to fix BOTH. If you don't fix the data, you will get an unreasonably long return value. And if you don't fix the iteration limit, it will crash again as soon as someone repeats a similar mistake when updating the data.
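A minimal sketch of the iteration limit, assuming getLocalSymbol really does follow links through the two tables as guessed above. The function name followLinks, the tables passed as vectors, and the kMaxSteps cap are all hypothetical:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using Table = std::vector<std::vector<int>>;

// Follows (i, j) -> (list[i][j], index[i][j]) until j reaches 0,
// appending names[i] after each step. Throws instead of looping forever.
std::string followLinks(const Table& list, const Table& index,
                        const std::vector<std::string>& names,
                        int i, int j) {
    const int kMaxSteps = 16;  // arbitrary cap chosen for this sketch
    std::string value;
    for (int step = 0; step < kMaxSteps; ++step) {
        // at() also rejects indices that fall outside the tables.
        const int nextI = list.at(i).at(j);
        const int nextJ = index.at(i).at(j);
        i = nextI;
        j = nextJ;
        value += names.at(i);
        if (j <= 0) {
            return value;
        }
    }
    throw std::runtime_error("link chain did not terminate");
}
```

An entry that points back at itself, like the [24][1] to [24][3] rows, then produces an exception rather than an ever-growing string.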

parameter value lost after call new_allocator in c++

I am seeing strange behavior in a C++11 program and cannot figure out what is going wrong. Please give me some advice. Thanks.
Basically, it is an OpenCL program.
struct memory_layout
{
public:
memory_layout(managed_device d);
scalar<int> s;
};
memory_layout::memory_layout(managed_device d) :
s(d)
{
}
class doer
{
public:
doer();
void go();
private:
managed_device dev;
memory_layout mem;
};
doer::doer():
dev(find_CPU()),
mem(dev)
{
}
void doer::go()
{
task t = copy(10,mem.s);
}
int main(){
doer d;
d.go();
return 0;
}
When it reaches the copy function, it fails with "Segmentation Fault".
Here is the definition of copy:
template <typename T>
task copy(const T& source, scalar<T>& sink, const std::vector<task>& deps = {} )
{
return sink.device().create_task( profile::copy<T>(source, sink), deps );
}
When I use gdb to debug:
Breakpoint 1, doer::go (this=0x7fffffffdc90) at main.cpp:79
79 task t = copy(10,mem.s); // device() original be 0x60f0d0
(gdb) p mem.s.device()
$1 = (cppcl::opencl_1_2::device::managed_device &) @0x7fffffffdc60: {_device = 0x60f0d0}
(gdb) s
std::vector<unsigned long, std::allocator<unsigned long> >::vector (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/stl_vector.h:249
249 : _Base() { }
(gdb)
std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_base (this=0x7fffffffdc50)
at /usr/include/c++/4.8.3/bits/stl_vector.h:125
125 : _M_impl() { }
(gdb)
std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl::_Vector_impl (this=0x7fffffffdc50)
at /usr/include/c++/4.8.3/bits/stl_vector.h:87
87 : _Tp_alloc_type(), _M_start(0), _M_finish(0), _M_end_of_storage(0)
(gdb)
std::allocator<unsigned long>::allocator (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/allocator.h:113
113 allocator() throw() { }
(gdb)
__gnu_cxx::new_allocator<unsigned long>::new_allocator (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/ext/new_allocator.h:80
warning: Source file is more recent than executable.
80
(gdb)
std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl::_Vector_impl (this=0x7fffffffdc50)
at /usr/include/c++/4.8.3/bits/stl_vector.h:88
88 { }
(gdb)
cppcl::opencl_1_2::device::copy<int> (source=@0x7fffffffdc6c: 10, sink=..., deps=std::vector of length 0, capacity 0)
at /usr/include/cppcl/1.2/device/buffer_templates.h:1233
warning: Source file is more recent than executable.
1233 return sink.device().create_task( profile::copy<T>(source, sink), deps );
(gdb) p sink.device()
$2 = (cppcl::opencl_1_2::device::managed_device &) @0x7fffffffdc60: {_device = 0x0}
After I step into the copy function, it first builds the "deps" parameter, and then the _device value changes to 0x0. I cannot figure out why this happens.
Thanks for any suggestions.
I'm assuming that you're not asking what's wrong with your code, that you're only asking how to figure out yourself what's wrong with your code. Otherwise, there's not enough information in your question.
This is a good first step in debugging. You've found clear indication that one value in memory is being changed. You've found a concrete object managed_device at address 0x7fffffffdc60 that contains a value that gets changed somehow.
Let me use a simple complete program:
#include <stdio.h>
int *p;
void f() {
++*p;
}
int main() {
int i = 3;
p = &i;
printf("%d\n", i); // i is 3 here.
f();
printf("%d\n", i); // Huh? i is 4 here.
}
Now, of course it is completely and utterly obvious why i changes in this program, but let's suppose that I completely overlooked it anyway.
If I set a breakpoint on line 13 (the call to f), and inspect i, I see that it is still 3.
Breakpoint 1, main () at test.cc:13
13 f();
(gdb) p i
$1 = 3
No surprise there. And I've already determined that the value will at some unknown point in the future get changed, I just don't know when.
I can now use the watch command to monitor that variable for changes:
(gdb) watch i
Hardware watchpoint 2: i
and then continue execution:
(gdb) cont
Continuing.
Hardware watchpoint 2: i
Old value = 3
New value = 4
f () at test.cc:7
7 }
(gdb) bt
#0 f () at test.cc:7
#1 0x004011e9 in main () at test.cc:13
Now, I have seen that the code that modified i was just before the closing brace in f.
This is what you'll need to do with your own code. It'll be a bit more complex than in this simple example, but you should be able to use it for your own code as well.
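One candidate to keep in mind when the watchpoint fires, offered as a guess because the question does not show scalar: the managed_device printed by gdb lives at 0x7fffffffdc60, below the doer object at 0x7fffffffdc90, so it is not the dev member. That would fit a scalar that stores a reference to the by-value constructor parameter d, which is destroyed when the constructor returns. The types below are simplified stand-ins, and the sketch shows the non-dangling variant, where the parameter is taken by reference:

```cpp
#include <cassert>

// Simplified stand-ins; the real cppcl types are not shown in the question.
struct managed_device {
    void* _device;
};

template <typename T>
class scalar {
public:
    explicit scalar(managed_device& d) : dev_(d) {}
    managed_device& device() { return dev_; }
private:
    managed_device& dev_;  // assumption: scalar keeps a reference
};

struct memory_layout {
    // Taking d by reference means s refers to the caller's object,
    // not to a parameter copy that dies when the constructor returns.
    explicit memory_layout(managed_device& d) : s(d) {}
    scalar<int> s;
};

class doer {
public:
    explicit doer(void* handle) : dev{handle}, mem(dev) {}
    bool device_is_own_member() { return &mem.s.device() == &dev; }
    void* handle() { return mem.s.device()._device; }
private:
    managed_device dev;  // declared before mem, so it is initialized first
    memory_layout mem;
};
```

If the real scalar stores a copy rather than a reference, this does not apply, and the watchpoint backtrace remains the way to find the writer.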

ByteSize() within Google Protocol Buffers

I am developing test code using GPB (Google Protocol Buffers) on QNX as follows:
Offer_event Offer;
string a = "127.0.0.7";
Offer.set_ipaddress(a);
Offer.set_port(9000);
BufSize = Offer.ByteSize();
Length_message = BufSize + Message_Header_Size;
Message->PayloadLength_of_Payload = BufSize;
PayloadBuffer = new char[BufSize];
Offer.SerializeToArray(PayloadBuffer, BufSize);
In that case I get an error that I cannot understand.
The error is as follows:
#0 std::string::size (this=0xcd21c0)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h:624
624 /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h: No such file or directory.
in /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h
(gdb) bt
#0 std::string::size (this=0xcd21c0)
at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h:624
#1 0x0067d6b0 in google::protobuf::internal::WireFormatLite::StringSize ()
#2 0x0063ecd0 in Offer_event::ByteSize ()
#3 0x00404f18 in AnalysisCmdC_Actor::TestGPB ()
from C:/QNX650/target/qnx6/armle-v7/lib/libc.so.3
#11 0x0004201a in ?? ()
Cannot access memory at address 0x0
Current language: auto; currently c++
(gdb)
I don't know why ByteSize has a problem.
If I delete the string part, it works well.
I think the usage of string is the problem.
What is wrong?

Identifying crash with hs_err_pid*.log and gdb

Update Sept. 12, 2011
I was able to get the core file and immediately disassembled the instruction that crashed. As advised, I tracked the value of r28 (by the way, no register values were logged to hs_err_pid*.log) and checked where the value came from (see the lines marked with <--- below). However, I was not able to determine the value of r32.
Could the reason for the misalignment be that r28 holds an 8-byte integer that is loaded into a 4-byte integer, r31?
;;; 1053 if( Transfer( len ) == FALSE ) {
0xc00000000c0c55c0:2 <TFM::PrintTrace(..)+0x32>: adds r44=0x480,r32;; <---
0xc00000000c0c55d0:0 <TFM::PrintTrace(..)+0x40>: ld8 r43=[ret2]
0xc00000000c0c55d0:1 <TFM::PrintTrace(..)+0x41>: (p6) st4 [r35]=ret3
0xc00000000c0c55d0:2 <TFM::PrintTrace(..)+0x42>: adds r48=28,r33
0xc00000000c0c55e0:0 <TFM::PrintTrace(..)+0x50>: mov ret0=0;;
0xc00000000c0c55e0:1 <TFM::PrintTrace(..)+0x51>: ld8.c.clr r62=[r45]
0xc00000000c0c55e0:2 <TFM::PrintTrace(..)+0x52>: cmp.eq.unc p6,p1=r0,r62
;;; 1056 throw MutexLock ;
0xc00000000c0c55f0:0 <TFM::PrintTrace(..)+0x60>: nop.m 0x0
0xc00000000c0c55f0:1 <TFM::PrintTrace(..)+0x61>: nop.m 0x0
0xc00000000c0c55f0:2 <TFM::PrintTrace(..)+0x62>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0x800;;
;;; 1057 }
0xc00000000c0c5600:0 <TFM::PrintTrace(..)+0x70>: adds r41=0x488,r32
0xc00000000c0c5600:1 <TFM::PrintTrace(..)+0x71>: adds r40=0x490,r32
0xc00000000c0c5600:2 <TFM::PrintTrace(..)+0x72>: br.call.dptk.many rp=0xc00000000c080620;;
;;; 1060 dwDataLen = len ;
0xc00000000c0c5610:0 <TFM::PrintTrace(..)+0x80>: ld8 r16=[r44] <---
0xc00000000c0c5610:1 <TFM::PrintTrace(..)+0x81>: mov gp=r36
0xc00000000c0c5610:2 <TFM::PrintTrace(..)+0x82>: (p1) mov r62=8;;
0xc00000000c0c5620:0 <TFM::PrintTrace(..)+0x90>: cmp.eq.unc p6=r0,r16
0xc00000000c0c5620:1 <TFM::PrintTrace(..)+0x91>: nop.m 0x0
0xc00000000c0c5620:2 <TFM::PrintTrace(..)+0x92>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0xda0;;
0xc00000000c0c5630:0 <TFM::PrintTrace(..)+0xa0>: adds r21=16,r16 <---
0xc00000000c0c5630:1 <TFM::PrintTrace(..)+0xa1>: (p1) mov r62=8;;
0xc00000000c0c5630:2 <TFM::PrintTrace(..)+0xa2>: nop.i 0x0
0xc00000000c0c5640:0 <TFM::PrintTrace(..)+0xb0>: ld8 r42=[r21];; <---
0xc00000000c0c5640:1 <TFM::PrintTrace(..)+0xb1>: cmp.eq.unc p6=r0,r42
0xc00000000c0c5640:2 <TFM::PrintTrace(..)+0xb2>: nop.i 0x0
0xc00000000c0c5650:0 <TFM::PrintTrace(..)+0xc0>: nop.m 0x0
0xc00000000c0c5650:1 <TFM::PrintTrace(..)+0xc1>: mov r47=5
0xc00000000c0c5650:2 <TFM::PrintTrace(..)+0xc2>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0xdf0;;
0xc00000000c0c5660:0 <TFM::PrintTrace(..)+0xd0>: ld4.a r27=[r48]
;;; 1064 if( dwDataLen <= dwViewLen ) {
0xc00000000c0c5660:1 <TFM::PrintTrace(..)+0xd1>: adds r28=28,r42 <--
0xc00000000c0c5660:2 <TFM::PrintTrace(..)+0xd2>: cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFM::PrintTrace(..)+0xe0>: ld4.sa r26=[r28],
0xc00000000c0c5670:1 <TFM::PrintTrace(..)+0xe1>: (p6) ld4 r31=[r28] <-- instruction that crashed
Let me know if register values are needed; I think I can get them with gdb's info reg command.
Here is the output of info registers (I excluded the prXXX and brXXX values). I have no idea how to map these to the disassembled instructions above.
gr1: 0x9fffffffbf716588
gr2: 0x9fffffff5f667c00
gr3: 0x9fffffff5f667c00
gr4: 0x6000000000e0b000
gr5: 0x9fffffff8adfe2e0
gr6: 0x9fffffff8ada9000
gr7: 0x9fffffff8ad7a000
gr8: 0x1
gr9: 0x9fffffff8adfd0f0
gr10: 0
gr11: 0xc000000000000690
gr12: 0x9fffffff8adfd140
gr13: 0x6000000001681510
gr14: 0x9fffffffbf7d8e98
gr15: 0x1a
gr16: 0x60000000044dac60
gr17: 0x1f
gr18: 0
gr19: 0x9fffffff8ad023f0
gr20: 0x9fffffff8adfd0e0
gr21: 0x60000000044dac70
gr22: 0x9fffffff5f668000
gr23: 0xd
gr24: 0x1
gr25: 0xc0000000004341f0
gr26: NaT
gr27: 0x63
gr28: 0xc00000000c5f801c
gr29: 0xc00000000029db20
gr30: 0xc00000000029db20
gr31: 0x288
gr32: 0x60000000044796d0
gr33: 0x6000000001a78910
gr34: 0x7e
gr35: 0x6000000001d03a90
gr36: 0x9fffffffbf716588
gr37: 0xc000000000000c9d
gr38: 0xc00000000c0c4f70
gr39: 0x9
gr40: 0x6000000004479b60
gr41: 0x6000000004479b58
gr42: 0xc00000000c5f8000
gr43: 0x9fffffffbf7144e0
gr44: 0x6000000004479b50
gr45: 0x6000000004479b68
gr46: 0x6000000001d03a90
gr47: 0x5
gr48: 0x6000000001a7892c
gr49: 0x9fffffff8adfe110
gr50: 0xc000000000000491
gr51: 0xc00000000c0c5520
gr52: 0xc00000000c07dd10
gr53: 0x9fffffff8adfe120
gr54: 0x9fffffff8adfe0a0
gr55: 0xc00000000000058e
gr56: 0xc00000000042be40
gr57: 0x39
gr58: 0x3
gr59: 0x33
gr60: 0
gr61: 0x9fffffffbf7d2598
gr62: 0x8
gr63: 0x9fffffffbf716588
gr64: 0xc000000000000f22
gr65: 0xc00000000c0c5610
This is an update to my previous post. Since I was furnished a copy
of the core file, I used gdb to examine it and executed
the following commands:
1) bt
2) frame n <- the frame where the abort occurred
3) disas
And here are the results.
(gdb) bt
#0 0xc0000000001e5350:0 in _lwp_kill+0x30 ()
from /usr/lib/hpux64/libpthread.so.1
#1 0xc00000000014c7b0:0 in pthread_kill+0x9d0 ()
from /usr/lib/hpux64/libpthread.so.1
#2 0xc0000000002e4080:0 in raise+0xe0 () from /usr/lib/hpux64/libc.so.1
#3 0xc0000000003f47f0:0 in abort+0x170 () from /usr/lib/hpux64/libc.so.1
#4 0xc00000000e65e0d0:0 in os::abort ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/os/hp-ux/vm/os_hp-ux.cpp:2033
#5 0xc00000000eb473e0:0 in VMError::report_and_die ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/share/vm/utilities/vmError.cpp:1008
#6 0xc00000000e66fc90:0 in os::Hpux::JVM_handle_hpux_signal ()
at /CLO/Components/JAVA_HOTSPOT/Src/src/os_cpu/hp-ux_ia64/vm/os_hp-ux_ia64.cpp:1051
#7 <signal handler called>
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1064
#9 0xc00000000c0c4f70:0 in FMLogger::WriteLog () at fmlogger.cpp:90
...
(gdb) frame 8
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1064
1064 if( dwDataLen <= dwViewLen ) {
Current language: auto; currently c++
(gdb) disas $pc-16*4 $pc+16*4
...
0xc00000000c0c5660:0 <TFMTrace::PrintTrace(...)+0xd0> : ld4.a r27=[r48] MII,
;;; 1064 if( dwDataLen <= dwViewLen ) {
0xc00000000c0c5660:1 <TFMTrace::PrintTrace(...)+0xd1> : adds r28=28,r42
0xc00000000c0c5660:2 <TFMTrace::PrintTrace(...)+0xd2> : cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFMTrace::PrintTrace(...)+0xe0> : ld4.sa r26=[r28] MMI,
0xc00000000c0c5670:1 <TFMTrace::PrintTrace(...)+0xe1> : (p6) ld4 r31=[r28]
0xc00000000c0c5670:2 <TFMTrace::PrintTrace(...)+0xe2> : adds r46=24,r42;;
0xc00000000c0c5680:0 <TFMTrace::PrintTrace(...)+0xf0> : (p6) st4 [r35]=r31 MI,I
0xc00000000c0c5680:1 <TFMTrace::PrintTrace(...)+0xf1> : adds r59=36,r42;;
0xc00000000c0c5680:2 <TFMTrace::PrintTrace(...)+0xf2> : nop.i 0x0
0xc00000000c0c5690:0 <TFMTrace::PrintTrace(...)+0x100>: ld4.c.clr r27=[r48] MIB,
;;; 1066 dwLen = dwTrcLen ;
0xc00000000c0c5690:1 <TFMTrace::PrintTrace(...)+0x101>: cmp4.eq.unc p6,p8=99,r27
0xc00000000c0c5690:2 <TFMTrace::PrintTrace(...)+0x102>: nop.b 0x0;;
0xc00000000c0c56a0:0 <TFMTrace::PrintTrace(...)+0x110>: (p8) ld4.c.clr r26=[r28] MMI
;;; 1067 }
0xc00000000c0c56a0:1 <TFMTrace::PrintTrace(...)+0x111>: (p6) st4 [r48]=r47
0xc00000000c0c56a0:2 <TFMTrace::PrintTrace(...)+0x112>: cmp4.geu.unc p7=r26,r27
End of assemb
A "normal" crash in native code causes a report like this:
C [libc.so.6+0x88368] strstr+0x64a
Note the small offset from the function (strstr in this case) to the crash point.
In your case, the JVM decided that the address 0xc00000000f675671 is inside libtracejni.so, but the closest function it could find is very far from the crash point (0x5065eff9 == 1.2 GB away).
Is your library really that big?
If it really is that big, chances are you have stripped it, so the symbol _NZ10TFM07PrintTraceEPi doesn't actually have anything to do with the problem (which is in code that is 1.2GB away).
You need to find out what code was really at address 0xc00000000f675671 at the time of the crash. Usually hs_err_pid*.log contains a list of load addresses for all the shared libraries. Find the load address of libtracejni.so and subtract it from pc. That should give you an address similar to 0x400...675671, which you should be able to look up in your unstripped version of libtracejni.so.
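The subtraction can be sketched in C++ as below. This is only an illustration: toLinkTimeAddress is a hypothetical helper, and the loadBase and linkBase values in the usage note are assumptions picked so that the arithmetic reproduces the two addresses of the crashing instruction that appear in this thread (0xc00000000c0c5670 in the core file, 0x4000000000099670 in the unstripped library). The real load address has to come from your own hs_err_pid*.log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map a runtime pc inside a shared library back to the
// address the same instruction has in the library file on disk.
//   loadBase: where the library's text was mapped at runtime
//             (assumed to be read from the library list in hs_err_pid*.log)
//   linkBase: the address the text has in the file itself (assumed known)
std::uint64_t toLinkTimeAddress(std::uint64_t pc,
                                std::uint64_t loadBase,
                                std::uint64_t linkBase) {
    // A pc below the load address cannot belong to this library.
    assert(pc >= loadBase);
    return (pc - loadBase) + linkBase;
}
```

With the assumed values loadBase = 0xc00000000c02c000 and linkBase = 0x4000000000000000, toLinkTimeAddress(0xc00000000c0c5670, loadBase, linkBase) gives 0x4000000000099670.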
Also note that crash address ends with ASCII "C8G", which may or may not be a coincidence.
Update 2011/08/05.
Now you know which instruction crashed:
0x4000000000099670:1 <TFMTrace::PrintTrace(...)+0xe1>: (p6) ld4 r31=[r28]
This is a load of a 4-byte integer from the memory pointed to by r28.
The next questions are: what is the value of r28 at crash point (should be logged in hs_err*.log), and also where did it come from (complete disassembly of TFM::PrintTrace will tell you that).
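As a rough cross-check, the register dump posted in the question can be lined up against the disassembly. The sketch below assumes that grN in the gdb dump is the same register as rN in the disassembly for the selected frame; addsResult and isAligned are hypothetical helper names, not part of any tool.

```cpp
#include <cassert>
#include <cstdint>

// Value written by an IA-64 "adds dst=imm,src" instruction: dst = src + imm.
std::uint64_t addsResult(std::uint64_t imm, std::uint64_t src) {
    return src + imm;
}

// True when addr is a multiple of size; size must be a power of two.
bool isAligned(std::uint64_t addr, std::uint64_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

With gr42 = 0xc00000000c5f8000 from the dump, addsResult(28, gr42) is 0xc00000000c5f801c, which matches the dumped gr28 and is what "adds r28=28,r42" should produce. isAligned(0xc00000000c5f801c, 4) is true, so by this arithmetic alone the address is suitably aligned for a 4-byte ld4. This says nothing about whether that memory was mapped or readable at the time, which still has to be checked separately.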