C++ backtrace with this=0x0 in various frames
I have a program running on a MIPS multicore system, and I get a backtrace from a core dump that is really hard to figure out (at least for me). I suspect that one of the other cores may be writing to this memory, but not all of the stack is corrupted, which makes it more confusing.
In frame #2 this is NULL, and in frame #0 this is NULL too (which is the cause of the core dump).
This is (part of) the backtrace:
#0 E::m (this=0x0, string=0x562f148 "", size=202) at E.cc:315
#1 0x00000000105c773c in P::e (this=0x361ecd00, string=0x562f148 "", size=202, offset=28) at P.cc:137
#2 0x00000000105c8c5c in M::e (this=0x0, id=7 '\a', r=2, string=0x562f148 "", size=202, oneClass=0x562f148 "", secondClass=0x14eff439 "",
offset=28) at M.cc:75
#3 0x0000000010596354 in m::find (this=0x4431fd70, string=0x562f148 "", size=202, oneClass=0x14eff438 "", secondClass=0x14eff439 "",
up=false) at A.cc:458
#4 0x0000000010597364 in A::trigger (this=0x4431fd70, triggerType=ONE, string=0x562f148 "", size=0, up=true) at A.cc:2084
#5 0x000000001059bcf0 in A::findOne (this=0x4431fd70, index=2, budget=0x562f148 "", size=202, up=true) at A.cc:1155
#6 0x000000001059c934 in A::shouldpathNow (this=0x4431fd70, index=2, budget=0x562f148 "", size=202, up=false, startAt=0x0, short=)
at A.cc:783
#7 0x00000000105a385c in A::shouldpath (this=0x4431fd70, index=2, rbudget=, rsize=, up=false,
direct=) at A.cc:1104
This is the m::find function (frame #3 in the backtrace):
442 m_t m::find(unsigned char const *string, unsigned int size,
443 hClass_t *hClass, h_t *fHClass,
444 bool isUp) {
445
446
447 const Iterator &it=arr_[getIndex()]->getSearchIterator((char const*)value, len);
448
449 unsigned int const offset = value - engine_->getData();
450
451 int ret=UNKNOWN;
452 M *p;
453 for(const void* match=it.next();
454 ret == UNKNOWN && match != NULL;
455 match = it.next()){
456 p = (M*)match;
457 if(p->needMore()){
458 ret = p->e(id_, getIndex(), value, len, hClass, fHClass, offset);
this=0x0 can actually happen pretty easily. For example:
E *instance = NULL;
instance->method();
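Inside method(), this will be NULL. Strictly speaking, calling a member function through a NULL pointer is undefined behavior, but in practice a non-virtual call often goes through and the crash only occurs once the function touches a member.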
There's no need to assume that the memory has been corrupted or the stack has been overwritten. In fact, if the rest of the stack's contents seem to make sense (and you seem to think that they do), then the stack is probably fine.
Instead of necessarily looking for memory corruption, check your logic to see if you have an uninitialized (NULL) pointer or reference.
Without seeing all the code, it's difficult to tell what's happening. Could you also add the code for M::e() and P::e(), or at least the important parts?
One thing that might solve the problem is to add a NULL check in m::find(), as follows:
456 p = (M*)match;
if(!p) { return; /* or do whatever */ }
457 if(p->needMore()){
458 ret = p->e(id_, getIndex(), value, len, hClass, fHClass, offset);
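If p were NULL, I would have expected the crash to occur at the call to p->needMore(), but depending on what that method does, it may not crash there. Note, though, that the loop condition on line 454 already tests match != NULL, so this extra check is only a safety net.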
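The guard pattern can be sketched in isolation. This is a minimal illustration under stated assumptions, not the asker's code: Matcher, firstDecision, and the UNKNOWN/ACCEPT values are hypothetical stand-ins for M, m::find, and the real result codes. It skips NULL entries instead of calling through them, and counts how many were skipped so that the bad data can be reported.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result codes standing in for the real ones.
enum Result { UNKNOWN = 0, ACCEPT = 1 };

// Hypothetical stand-in for class M.
struct Matcher {
    bool more;
    Result verdict;
    bool needMore() const { return more; }
    Result e() const { return verdict; }
};

// Walks the candidates in the same style as the loop in m::find, but
// never calls a member function through a NULL pointer.
Result firstDecision(const std::vector<const Matcher*>& candidates,
                     std::size_t& skippedNulls) {
    Result ret = UNKNOWN;
    skippedNulls = 0;
    for (std::size_t i = 0; ret == UNKNOWN && i < candidates.size(); ++i) {
        const Matcher* p = candidates[i];
        if (p == NULL) {      // guard: a NULL here would show up as this=0x0
            ++skippedNulls;
            continue;
        }
        if (p->needMore()) {
            ret = p->e();
        }
    }
    return ret;
}
```

A non-zero skippedNulls after the call is a hint that the container was populated incorrectly, which is usually more informative than a crash several frames further down.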
Related
In a multithreaded program, does bt on a core dump always show the culprit thread?
This is a somewhat general question. I have a segfault in a multithreaded program, and bt on the core dump shows the following:
(gdb) bt full
#0 0x0000000000441540 in try_dequeue<std::shared_ptr<Frame> > (item=<synthetic pointer>, this=0xbe3c50) at /root/projects/active/user/include/third_party/concurrentqueue.h:1111
nonEmptyCount = 0
best = 0x0
bestSize = 0
#1 ConsumerNice::listening_nice (this=0xbe3c40) at /root/projects/active/user/include/concurrency/consumer_nice.h:45
frame = std::shared_ptr (empty) 0x0
#2 0x00000000004c0530 in execute_native_thread_routine ()
No symbol table info available.
#3 0x00007f3eb3f81e65 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#4 0x00007f3ead70a88d in clone () from /lib64/libc.so.6
No symbol table info available.
So I went to look at the source code. My code is as follows:
void listening_nice() {
    while (true) {
        std::shared_ptr<Frame> frame;
        if (nice_queue.try_dequeue(frame)) {
            on_frame_nice(frame);
        }
    }
}
The relevant part of cameron314/concurrentqueue looks like this:
bool try_dequeue(U& item) {
    // Instead of simply trying each producer in turn (which could cause needless contention on the first
    // producer), we score them heuristically.
    size_t nonEmptyCount = 0;
    ProducerBase* best = nullptr;
    size_t bestSize = 0;
    for (auto ptr = producerListTail.load(std::memory_order_acquire); nonEmptyCount < 3 && ptr != nullptr; ptr = ptr->next_prod()) {
        auto size = ptr->size_approx();
        if (size > 0) {
            if (size > bestSize) {
                bestSize = size;
                best = ptr;
            }
            ++nonEmptyCount;
        }
    }
It doesn't seem possible for this code to cause a segfault, so I am wondering: does bt always show the culprit thread? Or is there a chance that the segfault is caused by some other problem in some other thread, or even by the operating system? Note that this program runs on three identically configured machines, but only one machine crashes, once a day; that is, it runs for 3 hours straight on that one machine and then crashes.
cygwin exception when assigning value to vector of strings
I get the following exception while the program is running:
0 [main] myFunction 5560 cygwin_exception::open_stackdumpfile: Dumping stack trace to myFunction.exe.stackdump
The contents of the stackdump file are as follows:
Stack trace:
Frame Function Args
00000223800 0018006FB93 (0060007AE38, 00600083EC8, 00600083EF8, 00600083F28)
00000000006 0018007105A (0060007BB78, 00600000000, 0000000014C, 00000000000)
000002239E0 0018011C6A7 (00600083048, 00600083078, 006000830A8, 006000830D8)
00000000041 001801198DE (0060007DCB8, 0060007DCE8, 00000000000, 0060007DD48)
0060008F2B0 00180119DAB (0060007E1F8, 0060007E228, 0060007E258, 00000000006)
0060008F2B0 00180119F7C (0060007CB38, 0060007CB68, 0060007CB98, 0060007CBC8)
0060008F2B0 0018011A23F (00180115A0B, 0060007CCE8, 006000885B0, 00000000000)
0060008F2B0 00180148A65 (003FC4AA93D, 00600083900, 00100439102, 0060007B080)
0060008F2B0 001800C1DB3 (00000000000, 00000223EE0, 0010042A2BC, 00000223E90)
0060008F2B0 00180115A0B (00000223EE0, 0010042A2BC, 00000223E90, 00000000017)
0060008F2B0 00600000001 (00000223EE0, 0010042A2BC, 00000223E90, 00000000017)
End of stack trace
Let me describe in detail the peculiar problem that happens at runtime. I am not able to describe it with words alone, so I am listing the scenarios in which the program works and those in which it fails.
I have created two vectors of strings in my header file and initialised them in the constructor as follows:
std::vector <std::string> symbolMap,localSymbolMap;
for(int i=0;i<100;i++){
    symbolMap.push_back(" ");
    localSymbolMap.push_back(" ");
}
I have defined a function to assign the appropriate values to these variables later in the program:
void TestClient::setTickerMap(int j, std::string symbol, std::string localSymbol){
    symbolMap[j] = symbol;
    localSymbolMap[j]=localSymbol;
}
Now, in the main program, I call this function as follows:
TestClient client;
for(int j=0;j<27;j++){
    std::cout<<j<<" "<<realTimeSymbols[j]<<" "<<getLocalSymbol(realTimeSymbols[j],date)<<std::endl;
    client.setTickerMap(j,realTimeSymbols[j],getLocalSymbol(realTimeSymbols[j],date));
}
// Here, I have checked for each j that the values of realTimeSymbols and getLocalSymbol are proper.
When I run the program, I get the error described above. The program always crashes when j is equal to 24. The following workaround works for now:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
    if(j==24){
        // symbolMap[j]="SYNDIBANK";
        // localSymbolMap[j]="SYNDIBANK15MARFUT";
    }
    else{
        symbolMap[j] = symbol;
        localSymbolMap[j]=localSymbol;
    }
    if(j==1){
        symbolMap[24]="SYNDIBANK";
        localSymbolMap[24]="SYNDIBANK15MARFUT";
    }
}
The following 3 variations of the above workaround do not work; they result in the original error.
Variation 1:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
    if(j==24){
        // symbolMap[j]="SYNDIBANK";
        // localSymbolMap[j]="SYNDIBANK15MARFUT";
    }
    else{
        symbolMap[j] = symbol;
        localSymbolMap[j]=localSymbol;
    }
    if(j==25){
        symbolMap[24]="SYNDIBANK";
        localSymbolMap[24]="SYNDIBANK15MARFUT";
    }
}
Variation 2:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
    if(j==24){
        symbolMap[j]="SYNDIBANK";
        localSymbolMap[j]="SYNDIBANK15MARFUT";
    }
    else{
        symbolMap[j] = symbol;
        localSymbolMap[j]=localSymbol;
    }
}
Variation 3:
void TestClient ::setTickerMap(int j, std::string symbol, std::string localSymbol){
    if(j==24){
        symbolMap[j]="AB";
        localSymbolMap[j]="SYNDIBANK15MARFUT";
    }
    else{
        symbolMap[j] = symbol;
        localSymbolMap[j]=localSymbol;
    }
}
Now, if I assign a single character to symbolMap in variation 3 as follows:
symbolMap[j]="A";
then the code is able to run (although the result is not correct). I am not able to figure out what exactly is causing this runtime error. I have checked the related question (Cygwin Exception : open stack dump file) and I do not have a separate session of cygwin running. I have restarted my PC just to be extra sure. Still the problem persists. Any suggestions as to why this behaviour is seen on my PC?
UPDATE: To be sure that the error is not related to an out-of-range index, I checked that the following call from the main program works fine:
TestClient client;
for(int j=25;j<27;j++){
    std::cout<<j<<" "<<realTimeSymbols[j]<<" "<<getLocalSymbol(realTimeSymbols[j],date)<<std::endl;
    client.setTickerMap(j,realTimeSymbols[j],getLocalSymbol(realTimeSymbols[j],date));
}
The program also works fine when j is iterated from 24 to 27, but it fails when the loop starts at any number before 24 and runs to 27.
GDB OUTPUT
I do not have much experience with gdb, but here is the gdb output in case it helps:
GNU gdb (GDB) 7.8
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-cygwin".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from order_trading2632_limit.exe...done.
(gdb) run
Starting program: /cygdrive/e/eclipse_workspace/testClient/Debug/testClient.exe
[New Thread 4832.0x11e4]
[New Thread 4832.0x1798]
Attempt 1 of 10000
[New Thread 4832.0x1020]
Connection successful
Program received signal SIGABRT, Aborted.
0x00000003fc4ab0e3 in cygstdc++-6!_ZNSs6assignERKSs () from /usr/bin/cygstdc++-6.dll
(gdb) bt
#0 0x00000003fc4ab0e3 in cygstdc++-6!_ZNSs6assignERKSs () from /usr/bin/cygstdc++-6.dll
#1 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) set $pc=*(void **)$rsp
(gdb) set $rsp=$rsp+8
(gdb) bt
#0 0x000007fefd3110ac in WaitForSingleObjectEx () from /cygdrive/c/Windows/system32/KERNELBASE.dll
#1 0x000000018011c639 in sig_send(_pinfo*, siginfo_t&, _cygtls*) () from /usr/bin/cygwin1.dll
#2 0x00000001801198de in _pinfo::kill(siginfo_t&) () from /usr/bin/cygwin1.dll
#3 0x0000000180119dab in kill0(int, siginfo_t&) () from /usr/bin/cygwin1.dll
#4 0x0000000180119f7c in raise () from /usr/bin/cygwin1.dll
#5 0x000000018011a23f in abort () from /usr/bin/cygwin1.dll
#6 0x0000000180148a65 in dlfree () from /usr/bin/cygwin1.dll
#7 0x00000001800c1db3 in free () from /usr/bin/cygwin1.dll
#8 0x0000000180115a0b in _sigfe () from /usr/bin/cygwin1.dll
#9 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Note that the stack trace is corrupted, and I have used the trick from the following question to print it (GDB corrupted stack frame - How to debug?). Please help me debug the program further.
UPDATE
It is not the case that the error happens only when the index is 24. Before calling the said loop, I initialize various arrays of int, double and string. Changing the number of initializations affects the index at which this error happens.
Today, I initialised vectors of length 24 before running this loop, and this time the error happened at index 3. This makes the workaround really frustrating to implement. I do not know whether there are some other memory issues that I am overlooking because of this. Please offer suggestions.
CODE
int main(int argc, char** argv) { unsigned int port = 7900; const char* host = ""; int clientId = 6; int attempt = 0; int MAX_ATTEMPTS=10000; int NUMREALTIMESYMBOLS=37; std::string realTimeSymbolsArr[]={"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","aa","bb","cc","dd","ee","ff","gg","hh","ii","jj","kk"}; std::vector <std::string> realTimeSymbols(realTimeSymbolsArr,realTimeSymbolsArr+NUMREALTIMESYMBOLS); int isTradeable[]={1,0,0,0,1,0,1,1,1,1,1,0,0,1,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1,1,0,1,0,1,1,0,1}; int numSubscriptions[]={2,2,1,4,1,1,2,6,3,1,1,1,2,1,3,1,1,2,1,3,1,1,1,10,4,1,6,1,1,9,4,2,1,3,1,1,2}; int subscriptionList[NUMREALTIMESYMBOLS][100]; int subscriptionIndex[NUMREALTIMESYMBOLS][100]; subscriptionList[0][0]=0;subscriptionIndex[0][0]=0; subscriptionList[0][1]=2;subscriptionIndex[0][1]=2; subscriptionList[1][0]=2;subscriptionIndex[1][0]=1; subscriptionList[1][1]=0;subscriptionIndex[1][1]=3; subscriptionList[2][0]=2;subscriptionIndex[2][0]=0; subscriptionList[3][0]=4;subscriptionIndex[3][0]=2; subscriptionList[3][1]=31;subscriptionIndex[3][1]=2; subscriptionList[3][2]=13;subscriptionIndex[3][2]=3; subscriptionList[3][3]=34;subscriptionIndex[3][3]=3; subscriptionList[4][0]=4;subscriptionIndex[4][0]=0; subscriptionList[5][0]=9;subscriptionIndex[5][0]=2; subscriptionList[6][0]=6;subscriptionIndex[6][0]=0; subscriptionList[6][1]=8;subscriptionIndex[6][1]=2; subscriptionList[7][0]=7;subscriptionIndex[7][0]=0; subscriptionList[7][1]=8;subscriptionIndex[7][1]=1; subscriptionList[7][2]=31;subscriptionIndex[7][2]=1; subscriptionList[7][3]=36;subscriptionIndex[7][3]=1; subscriptionList[7][4]=13;subscriptionIndex[7][4]=2;
subscriptionList[7][5]=34;subscriptionIndex[7][5]=2; subscriptionList[8][0]=8;subscriptionIndex[8][0]=0; subscriptionList[8][1]=7;subscriptionIndex[8][1]=1; subscriptionList[8][2]=21;subscriptionIndex[8][2]=3; subscriptionList[9][0]=9;subscriptionIndex[9][0]=0; subscriptionList[10][0]=10;subscriptionIndex[10][0]=0; subscriptionList[11][0]=11;subscriptionIndex[11][0]=0; subscriptionList[12][0]=28;subscriptionIndex[12][0]=3; subscriptionList[12][1]=33;subscriptionIndex[12][1]=3; subscriptionList[13][0]=13;subscriptionIndex[13][0]=0; subscriptionList[14][0]=33;subscriptionIndex[14][0]=1; subscriptionList[14][1]=28;subscriptionIndex[14][1]=2; subscriptionList[14][2]=15;subscriptionIndex[14][2]=3; subscriptionList[15][0]=15;subscriptionIndex[15][0]=0; subscriptionList[16][0]=16;subscriptionIndex[16][0]=0; subscriptionList[17][0]=0;subscriptionIndex[17][0]=1; subscriptionList[17][1]=11;subscriptionIndex[17][1]=2; subscriptionList[18][0]=7;subscriptionIndex[18][0]=2; subscriptionList[19][0]=6;subscriptionIndex[19][0]=3; subscriptionList[19][1]=8;subscriptionIndex[19][1]=3; subscriptionList[19][2]=16;subscriptionIndex[19][2]=3; subscriptionList[20][0]=9;subscriptionIndex[20][0]=1; subscriptionList[21][0]=21;subscriptionIndex[21][0]=0; subscriptionList[22][0]=9;subscriptionIndex[22][0]=3; subscriptionList[23][0]=6;subscriptionIndex[23][0]=1; subscriptionList[23][1]=10;subscriptionIndex[23][1]=1; subscriptionList[23][2]=27;subscriptionIndex[23][2]=1; subscriptionList[23][3]=29;subscriptionIndex[23][3]=1; subscriptionList[23][4]=16;subscriptionIndex[23][4]=2; subscriptionList[23][5]=21;subscriptionIndex[23][5]=2; subscriptionList[23][6]=2;subscriptionIndex[23][6]=3; subscriptionList[23][7]=4;subscriptionIndex[23][7]=3; subscriptionList[23][8]=7;subscriptionIndex[23][8]=3; subscriptionList[23][9]=36;subscriptionIndex[23][9]=3; subscriptionList[24][0]=24;subscriptionIndex[24][0]=0; subscriptionList[24][1]=24;subscriptionIndex[24][1]=1; 
subscriptionList[24][2]=24;subscriptionIndex[24][2]=2; subscriptionList[24][3]=24;subscriptionIndex[24][3]=3; subscriptionList[25][0]=29;subscriptionIndex[25][0]=3; subscriptionList[26][0]=21;subscriptionIndex[26][0]=1; subscriptionList[26][1]=0;subscriptionIndex[26][1]=2; subscriptionList[26][2]=10;subscriptionIndex[26][2]=2; subscriptionList[26][3]=15;subscriptionIndex[26][3]=2; subscriptionList[26][4]=27;subscriptionIndex[26][4]=2; subscriptionList[26][5]=33;subscriptionIndex[26][5]=2; subscriptionList[27][0]=27;subscriptionIndex[27][0]=0; subscriptionList[28][0]=28;subscriptionIndex[28][0]=0; subscriptionList[29][0]=29;subscriptionIndex[29][0]=0; subscriptionList[29][1]=4;subscriptionIndex[29][1]=1; subscriptionList[29][2]=13;subscriptionIndex[29][2]=1; subscriptionList[29][3]=16;subscriptionIndex[29][3]=1; subscriptionList[29][4]=34;subscriptionIndex[29][4]=1; subscriptionList[29][5]=6;subscriptionIndex[29][5]=2; subscriptionList[29][6]=36;subscriptionIndex[29][6]=2; subscriptionList[29][7]=27;subscriptionIndex[29][7]=3; subscriptionList[29][8]=31;subscriptionIndex[29][8]=3; subscriptionList[30][0]=30;subscriptionIndex[30][0]=0; subscriptionList[30][1]=30;subscriptionIndex[30][1]=1; subscriptionList[30][2]=30;subscriptionIndex[30][2]=2; subscriptionList[30][3]=30;subscriptionIndex[30][3]=3; subscriptionList[31][0]=31;subscriptionIndex[31][0]=0; subscriptionList[31][1]=29;subscriptionIndex[31][1]=2; subscriptionList[32][0]=11;subscriptionIndex[32][0]=3; subscriptionList[33][0]=33;subscriptionIndex[33][0]=0; subscriptionList[33][1]=15;subscriptionIndex[33][1]=1; subscriptionList[33][2]=28;subscriptionIndex[33][2]=1; subscriptionList[34][0]=34;subscriptionIndex[34][0]=0; subscriptionList[35][0]=11;subscriptionIndex[35][0]=1; subscriptionList[36][0]=36;subscriptionIndex[36][0]=0; subscriptionList[36][1]=10;subscriptionIndex[36][1]=3; double 
a1[]={720,0.0,750,0.0,900,0.0,760,360,120,390,600,360,0.0,760,0.0,140,660,0.0,0.0,0.0,0.0,720,0.0,0.0,100,0.0,0.0,120,320,40,100,500,0.0,630,570,0.0,100}; double a2[]={0.5,0.0,1.3,0.0,0.6,0.0,0.45,0.15,0.45,0.4,0.25,1.4,0.0,0.55,0.0,0.2,0.8,0.0,0.0,0.0,0.0,0.6,0.0,0.0,0.4,0.0,0.0,0.25,0.4,0.25,0.4,0.35,0.0,0.4,0.5,0.0,0.4}; double a3[]={1350,0.0,1250,0.0,300,0.0,1150,1400,900,1200,850,900,0.0,600,0.0,1450,1450,0.0,0.0,0.0,0.0,1000,0.0,0.0,1200,0.0,0.0,1150,350,1400,1200,1350,0.0,1500,300,0.0,1200}; double a4[]={0.6,0.0,0.7,0.0,0.2,0.0,0.3,0.55,0.4,0.8,0.25,0.7,0.0,0.25,0.0,0.55,0.5,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.7,0.0,0.0,0.65,0.55,0.45,0.7,0.6,0.0,0.4,0.4,0.0,0.7}; double a5[]={300,0.0,1300,0.0,1350,0.0,200,1100,1200,650,1500,1350,0.0,1050,0.0,1300,550,0.0,0.0,0.0,0.0,250,0.0,0.0,150,0.0,0.0,1250,700,1150,150,1250,0.0,1500,1500,0.0,150}; double a6[]={0.3,0.0,0.8,0.0,0.6,0.0,0.5,0.6,0.6,0.3,0.35,0.7,0.0,0.55,0.0,0.45,0.35,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.5,0.0,0.0,0.55,0.3,0.35,0.5,0.75,0.0,0.2,0.5,0.0,0.5}; double a7[]={1500,0.0,1500,0.0,1050,0.0,750,1100,1350,1350,100,1350,0.0,550,0.0,1400,1000,0.0,0.0,0.0,0.0,1000,0.0,0.0,1350,0.0,0.0,350,550,350,1350,500,0.0,1350,1250,0.0,1350}; double a8[]={0.9,0.0,0.9,0.0,0.8,0.0,0.6,0.35,0.7,0.2,0.15,0.7,0.0,0.3,0.0,0.55,0.5,0.0,0.0,0.0,0.0,0,0.0,0.0,0.3,0.0,0.0,0.4,0.3,0.5,0.3,0.35,0.0,0.5,0.5,0.0,0.3}; double a9[]={0.008,0.0,0.009,0.0,0.01,0.0,0.01,0.009,0.009,0.007,0.009,0.01,0.0,0.009,0.0,0.01,0.008,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.006,0.0,0.0,0.008,0.009,0.01,0.006,0.009,0.0,0.008,0.009,0.0,0.006}; double a10[]={0.008,0.0,0.009,0.0,0.008,0.0,0.008,0.008,0.009,0.008,0.008,0.006,0.0,0.009,0.0,0.01,0.008,0.0,0.0,0.0,0.0,0.005,0.0,0.0,0.009,0.0,0.0,0.01,0.008,0.009,0.009,0.009,0.0,0.008,0.01,0.0,0.009}; double a11[]={0.4,0.0,0.2,0.0,0.1,0.0,0.3,0.4,0.2,0.1,0.7,0.2,0.0,0,0.0,0.1,0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.2,0.0,0.0,0.3,0,0.2,0.2,0,0.0,0.7,0.1,0.0,0.2}; int 
a12[]={500,1000,8000,2000,4000,1000,1250,1000,1000,500,1000,125,2000,4000,1000,250,2000,250,1000,1250,500,2000,1000,500,0,250,500,4000,4000,1250,0,2000,500,500,4000,125,1000}; double a13[]={0.0013406,0.0020022,0.0018709,0.0018948,0.0017975,0.0014687,0.0011068,0.001355,0.0010891,0.00088151,0.0014294,0.0012989,0.0014205,0.0019711,0.0015365,0.0020505,0.0018961,0.00078672,0.0023114,0.0012203,0.0012849,0.0015674,0.0012844,0.0014197,0.0,0.00074657,0.00096164,0.0017109,0.0015385,0.00068178,0.0,0.0021815,0.00087359,0.00074349,0.0021645,0.001595,0.0014573}; int a14[]={14850,0,16500,0,13740,0,13740,24750,13740,14100,13740,30750,0,14400,0,13740,13740,0,0,0,0,14100,0,0,13500,0,0,13740,13740,25200,13500,13740,0,13740,13740,0,13740}; int a15[]={30750,0,35900,0,34950,0,35900,35900,35900,35900,30000,34950,0,26250,0,34000,35900,0,0,0,0,34500,0,0,13500,0,0,35900,34650,35900,13500,32700,0,35900,35900,0,33300}; Client client; for (int i = 0; i < MAX_ATTEMPTS; i++) { client.connect(host, port, clientId); ++attempt; std::cout << "Attempt " << attempt << " of " << MAX_ATTEMPTS<< std::endl; for (int j=0;j<NUMREALTIMESYMBOLS;j++){ if(j==24 || j==30) continue; std::cout<<j<<" "<<realTimeSymbols[j]<<" "<<getLocalSymbol(realTimeSymbols[j],date)<<std::endl; client.setTickerMap(j,realTimeSymbols[j],getLocalSymbol(realTimeSymbols[j],date)); } } } Constructor of Client: Client::Client(){ for(int i=0;i<50;i++){ symbolMap.push_back(" "); localSymbolMap.push_back(" "); } } The above code fails at 24 and 30. Hence, the loop to continue when j is 24 or 30 as workaround.
std::string realTimeSymbolsArr[]={"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","aa","bb","cc","dd","ee","ff","gg","hh","ii","jj","kk"};
subscriptionList[24][1]=24;subscriptionIndex[24][1]=1;
subscriptionList[24][2]=24;subscriptionIndex[24][2]=2;
subscriptionList[24][3]=24;subscriptionIndex[24][3]=3;
subscriptionList[30][1]=30;subscriptionIndex[30][1]=1;
subscriptionList[30][2]=30;subscriptionIndex[30][2]=2;
subscriptionList[30][3]=30;subscriptionIndex[30][3]=3;
You have not posted the source for getLocalSymbol, but I must assume that it uses the same data and has a flow of the following form:
getLocalSymbol(a, b) {
    int i, j, old_i, old_j;
    std::string value;
    // Derive i and j from the parameters
    // ...
    // And build the string
    do {
        old_i = i;
        old_j = j;
        i = subscriptionList[old_i][old_j];
        j = subscriptionIndex[old_i][old_j];
        value += realTimeSymbolsArr[i];
    } while(j > 0);
    return value;
}
Got it right? That control flow, or something equivalent, appears to be part of it either way.
This goes well for almost all values of i and j, except for the aforementioned values of 24 and 30 for i combined with 1 to 3 for j. With these values, i and j remain the same in every iteration and value grows longer and longer, until eventually something breaks on the stack, which overwrites j (and thereby causes the loop to terminate) and corrupts value. Either way, the std::string you returned is now corrupted, because you exceeded some limit during that endless loop.
As for how to solve it: fix that infinite loop and fix your data. To fix the loop, limit the iteration count. As for the data, now that you know why it is causing the bug, you should be able to figure out yourself how to fix it.
Remember, you have to fix BOTH. If you don't fix the data, you will get an unreasonably long return value. And if you don't add the iteration limit, it will crash again as soon as someone repeats a similar mistake when updating the data.
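The iteration limit could look roughly like the sketch below. Since getLocalSymbol was never posted, everything here is an assumption: Link, followChain, and maxSteps are hypothetical names, and the two tables are folded into one table of (next row, next column) pairs. The point is only that the loop gives up with an error after a fixed number of steps instead of growing the string forever.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical link in the lookup tables: where to go next.
struct Link {
    int nextRow;
    int nextCol;
};

// Follows links starting at (row, col) and concatenates the symbol of
// each row visited, in the spirit of the loop guessed at above.
// Throws instead of looping forever when the data contains a cycle.
std::string followChain(const std::vector<std::vector<Link> >& table,
                        const std::vector<std::string>& symbols,
                        int row, int col, std::size_t maxSteps) {
    std::string value;
    for (std::size_t step = 0; step < maxSteps; ++step) {
        const Link& link = table.at(row).at(col);  // bounds-checked access
        row = link.nextRow;
        col = link.nextCol;
        value += symbols.at(row);
        if (col <= 0) {
            return value;  // normal termination
        }
    }
    throw std::runtime_error(
        "followChain: iteration limit reached, data may contain a cycle");
}
```

Using at() rather than operator[] is a deliberate choice in this sketch: an out-of-range index then raises std::out_of_range at the faulty access instead of silently corrupting memory.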
parameter value lost after call new_allocator in c++
I have run into strange behavior in a C++11 program and cannot figure out what is going wrong. Please give me some advice. Thanks.
Basically, it is an OpenCL program:
struct memory_layout {
public:
    memory_layout(managed_device d);
    scalar<int> s;
};
memory_layout::memory_layout(managed_device d) : s(d) { }
class doer {
public:
    doer();
    void go();
private:
    managed_device dev;
    memory_layout mem;
};
doer::doer(): dev(find_CPU()), mem(dev) { }
void doer::go() {
    task t = copy(10,mem.s);
}
int main(){
    doer d;
    d.go();
    return 0;
}
When execution reaches the copy function, the program fails with "Segmentation Fault". Here is the definition of copy:
template <typename T>
task copy(const T& source, scalar<T>& sink, const std::vector<task>& deps = {} ) {
    return sink.device().create_task( profile::copy<T>(source, sink), deps );
}
When I use gdb to debug:
Breakpoint 1, doer::go (this=0x7fffffffdc90) at main.cpp:79
79 task t = copy(10,mem.s); // device() original be 0x60f0d0
(gdb) p mem.s.device()
$1 = (cppcl::opencl_1_2::device::managed_device &) @0x7fffffffdc60: {_device = 0x60f0d0}
(gdb) s
std::vector<unsigned long, std::allocator<unsigned long> >::vector (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/stl_vector.h:249
249 : _Base() { }
(gdb)
std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_base (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/stl_vector.h:125
125 : _M_impl() { }
(gdb)
std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl::_Vector_impl (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/stl_vector.h:87
87 : _Tp_alloc_type(), _M_start(0), _M_finish(0), _M_end_of_storage(0)
(gdb)
std::allocator<unsigned long>::allocator (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/allocator.h:113
113 allocator() throw() { }
(gdb)
__gnu_cxx::new_allocator<unsigned long>::new_allocator (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/ext/new_allocator.h:80
warning: Source file is more recent than executable.
80
(gdb)
std::_Vector_base<unsigned long, std::allocator<unsigned long> >::_Vector_impl::_Vector_impl (this=0x7fffffffdc50) at /usr/include/c++/4.8.3/bits/stl_vector.h:88
88 { }
(gdb)
cppcl::opencl_1_2::device::copy<int> (source=@0x7fffffffdc6c: 10, sink=..., deps=std::vector of length 0, capacity 0) at /usr/include/cppcl/1.2/device/buffer_templates.h:1233
warning: Source file is more recent than executable.
1233 return sink.device().create_task( profile::copy<T>(source, sink), deps );
(gdb) p sink.device()
$2 = (cppcl::opencl_1_2::device::managed_device &) @0x7fffffffdc60: {_device = 0x0}
After I step into the copy function, it first builds the deps parameter, and then the _device value changes to 0x0. I cannot figure out why this happens. Thanks for any suggestions.
I'm assuming that you're not asking what's wrong with your code, that you're only asking how to figure out yourself what's wrong with your code. Otherwise, there's not enough information in your question.
This is a good first step in debugging. You've found clear indication that one value in memory is being changed. You've found a concrete object managed_device at address 0x7fffffffdc60 that contains a value that gets changed somehow.
Let me use a simple complete program:
#include <stdio.h>

int *p;

void f() {
    ++*p;
}

int main() {
    int i = 3;
    p = &i;
    printf("%d\n", i); // i is 3 here.
    f();
    printf("%d\n", i); // Huh? i is 4 here.
}
Now, of course it is completely and utterly obvious why i changes in this program, but let's suppose that I completely overlooked it anyway.
If I set a breakpoint on line 13 (the call to f), and inspect i, I see that it is still 3.
Breakpoint 1, main () at test.cc:13
13 f();
(gdb) p i
$1 = 3
No surprise there. And I've already determined that the value will at some unknown point in the future get changed, I just don't know when.
I can now use the watch instruction to monitor that variable for changes:
(gdb) watch i
Hardware watchpoint 2: i
and then continue execution:
(gdb) cont
Continuing.
Hardware watchpoint 2: i
Old value = 3
New value = 4
f () at test.cc:7
7 }
(gdb) bt
#0 f () at test.cc:7
#1 0x004011e9 in main () at test.cc:13
Now, I have seen that the code that modified i was just before the closing brace in f.
This is what you'll need to do with your own code. It'll be a bit more complex than in this simple example, but you should be able to use it for your own code as well.
ByteSize() with in Google protocol buffer
I am developing test code using GPB (Google Protocol Buffers) on QNX, as follows:
Offer_event Offer;
string a = "127.0.0.7";
Offer.set_ipaddress(a);
Offer.set_port(9000);
BufSize = Offer.ByteSize();
Length_message = BufSize + Message_Header_Size;
Message->PayloadLength_of_Payload = BufSize;
PayloadBuffer = new char[BufSize];
Offer.SerializeToArray(PayloadBuffer, BufSize);
In this case I get an error that I cannot understand. The error is as follows:
#0 std::string::size (this=0xcd21c0) at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h:624
624 /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h: No such file or directory.
in /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h
(gdb) bt
#0 std::string::size (this=0xcd21c0) at /home/builder/hudson/650-gcc-4.4/svn/linux-x86-o-ntoarmeabi/arm-unknown-nto-qnx6.5.0eabi/pic/libstdc++-v3/include/bits/basic_string.h:624
#1 0x0067d6b0 in google::protobuf::internal::WireFormatLite::StringSize ()
#2 0x0063ecd0 in Offer_event::ByteSize ()
#3 0x00404f18 in AnalysisCmdC_Actor::TestGPB () from C:/QNX650/target/qnx6/armle-v7/lib/libc.so.3
#11 0x0004201a in ?? ()
Cannot access memory at address 0x0
Current language: auto; currently c++
(gdb)
I don't know why ByteSize has a problem. If I delete the string part, it works well, so I think the use of string is the problem. What is wrong?
Identifying crash with hs_err_pid*.log and gdb
Update Sept. 12, 2011
I was able to get the core file and immediately disassembled the instruction that crashed. As advised, I tracked the value of r28 (by the way, no register values were logged to hs_err_pid*.log) and checked where the value came from (see the lines marked with <--- below). However, I was not able to determine the value of r32. Could the reason for the misalignment be that r28 is an 8-byte integer loaded into a 4-byte integer r31?
;;; 1053 if( Transfer( len ) == FALSE ) {
0xc00000000c0c55c0:2 <TFM::PrintTrace(..)+0x32>: adds r44=0x480,r32;; <---
0xc00000000c0c55d0:0 <TFM::PrintTrace(..)+0x40>: ld8 r43=[ret2]
0xc00000000c0c55d0:1 <TFM::PrintTrace(..)+0x41>: (p6) st4 [r35]=ret3
0xc00000000c0c55d0:2 <TFM::PrintTrace(..)+0x42>: adds r48=28,r33
0xc00000000c0c55e0:0 <TFM::PrintTrace(..)+0x50>: mov ret0=0;;
0xc00000000c0c55e0:1 <TFM::PrintTrace(..)+0x51>: ld8.c.clr r62=[r45]
0xc00000000c0c55e0:2 <TFM::PrintTrace(..)+0x52>: cmp.eq.unc p6,p1=r0,r62
;;; 1056 throw MutexLock ;
0xc00000000c0c55f0:0 <TFM::PrintTrace(..)+0x60>: nop.m 0x0
0xc00000000c0c55f0:1 <TFM::PrintTrace(..)+0x61>: nop.m 0x0
0xc00000000c0c55f0:2 <TFM::PrintTrace(..)+0x62>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0x800;;
;;; 1057 }
0xc00000000c0c5600:0 <TFM::PrintTrace(..)+0x70>: adds r41=0x488,r32
0xc00000000c0c5600:1 <TFM::PrintTrace(..)+0x71>: adds r40=0x490,r32
0xc00000000c0c5600:2 <TFM::PrintTrace(..)+0x72>: br.call.dptk.many rp=0xc00000000c080620;;
;;; 1060 dwDataLen = len ;
0xc00000000c0c5610:0 <TFM::PrintTrace(..)+0x80>: ld8 r16=[r44] <---
0xc00000000c0c5610:1 <TFM::PrintTrace(..)+0x81>: mov gp=r36
0xc00000000c0c5610:2 <TFM::PrintTrace(..)+0x82>: (p1) mov r62=8;;
0xc00000000c0c5620:0 <TFM::PrintTrace(..)+0x90>: cmp.eq.unc p6=r0,r16
0xc00000000c0c5620:1 <TFM::PrintTrace(..)+0x91>: nop.m 0x0
0xc00000000c0c5620:2 <TFM::PrintTrace(..)+0x92>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0xda0;;
0xc00000000c0c5630:0 <TFM::PrintTrace(..)+0xa0>: adds r21=16,r16 <---
0xc00000000c0c5630:1 <TFM::PrintTrace(..)+0xa1>: (p1) mov r62=8;;
0xc00000000c0c5630:2 <TFM::PrintTrace(..)+0xa2>: nop.i 0x0
0xc00000000c0c5640:0 <TFM::PrintTrace(..)+0xb0>: ld8 r42=[r21];; <---
0xc00000000c0c5640:1 <TFM::PrintTrace(..)+0xb1>: cmp.eq.unc p6=r0,r42
0xc00000000c0c5640:2 <TFM::PrintTrace(..)+0xb2>: nop.i 0x0
0xc00000000c0c5650:0 <TFM::PrintTrace(..)+0xc0>: nop.m 0x0
0xc00000000c0c5650:1 <TFM::PrintTrace(..)+0xc1>: mov r47=5
0xc00000000c0c5650:2 <TFM::PrintTrace(..)+0xc2>: (p6) br.cond.dpnt.many _NZ10TFM07PrintTraceEPi+0xdf0;;
0xc00000000c0c5660:0 <TFM::PrintTrace(..)+0xd0>: ld4.a r27=[r48]
;;; 1064 if( dwDataLen <= dwViewLen ) {
0xc00000000c0c5660:1 <TFM::PrintTrace(..)+0xd1>: adds r28=28,r42 <--
0xc00000000c0c5660:2 <TFM::PrintTrace(..)+0xd2>: cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFM::PrintTrace(..)+0xe0>: ld4.sa r26=[r28],
0xc00000000c0c5670:1 <TFM::PrintTrace(..)+0xe1>: (p6) ld4 r31=[r28] <-- instruction that crashed
Let me know if register values are needed. I think I can get them using the info reg command of gdb.
This is the result of info registers (I excluded the values of prXXX and brXXX). I have no idea how to map these to the disassembled instructions above.
gr1: 0x9fffffffbf716588  gr2: 0x9fffffff5f667c00  gr3: 0x9fffffff5f667c00  gr4: 0x6000000000e0b000
gr5: 0x9fffffff8adfe2e0  gr6: 0x9fffffff8ada9000  gr7: 0x9fffffff8ad7a000  gr8: 0x1
gr9: 0x9fffffff8adfd0f0  gr10: 0  gr11: 0xc000000000000690  gr12: 0x9fffffff8adfd140
gr13: 0x6000000001681510  gr14: 0x9fffffffbf7d8e98  gr15: 0x1a  gr16: 0x60000000044dac60
gr17: 0x1f  gr18: 0  gr19: 0x9fffffff8ad023f0  gr20: 0x9fffffff8adfd0e0
gr21: 0x60000000044dac70  gr22: 0x9fffffff5f668000  gr23: 0xd  gr24: 0x1
gr25: 0xc0000000004341f0  gr26: NaT  gr27: 0x63  gr28: 0xc00000000c5f801c
gr29: 0xc00000000029db20  gr30: 0xc00000000029db20  gr31: 0x288  gr32: 0x60000000044796d0
gr33: 0x6000000001a78910  gr34: 0x7e  gr35: 0x6000000001d03a90  gr36: 0x9fffffffbf716588
gr37: 0xc000000000000c9d  gr38: 0xc00000000c0c4f70  gr39: 0x9  gr40: 0x6000000004479b60
gr41: 0x6000000004479b58  gr42: 0xc00000000c5f8000  gr43: 0x9fffffffbf7144e0  gr44: 0x6000000004479b50
gr45: 0x6000000004479b68  gr46: 0x6000000001d03a90  gr47: 0x5  gr48: 0x6000000001a7892c
gr49: 0x9fffffff8adfe110  gr50: 0xc000000000000491  gr51: 0xc00000000c0c5520  gr52: 0xc00000000c07dd10
gr53: 0x9fffffff8adfe120  gr54: 0x9fffffff8adfe0a0  gr55: 0xc00000000000058e  gr56: 0xc00000000042be40
gr57: 0x39  gr58: 0x3  gr59: 0x33  gr60: 0
gr61: 0x9fffffffbf7d2598  gr62: 0x8  gr63: 0x9fffffffbf716588  gr64: 0xc000000000000f22
gr65: 0xc00000000c0c5610

This is an update to my previous post. Since I was furnished a copy of the core file, I used gdb to examine it and executed the following commands:
1) bt
2) frame n <- the frame where the abort occurred
3) disas
Here are the results.
(gdb) bt
#0 0xc0000000001e5350:0 in _lwp_kill+0x30 () from /usr/lib/hpux64/libpthread.so.1
#1 0xc00000000014c7b0:0 in pthread_kill+0x9d0 () from /usr/lib/hpux64/libpthread.so.1
#2 0xc0000000002e4080:0 in raise+0xe0 () from /usr/lib/hpux64/libc.so.1
#3 0xc0000000003f47f0:0 in abort+0x170 () from /usr/lib/hpux64/libc.so.1
#4 0xc00000000e65e0d0:0 in os::abort () at /CLO/Components/JAVA_HOTSPOT/Src/src/os/hp-ux/vm/os_hp-ux.cpp:2033
#5 0xc00000000eb473e0:0 in VMError::report_and_die () at /CLO/Components/JAVA_HOTSPOT/Src/src/share/vm/utilities/vmError.cpp:1008
#6 0xc00000000e66fc90:0 in os::Hpux::JVM_handle_hpux_signal () at /CLO/Components/JAVA_HOTSPOT/Src/src/os_cpu/hp-ux_ia64/vm/os_hp-ux_ia64.cpp:1051
#7 <signal handler called>
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1064
#9 0xc00000000c0c4f70:0 in FMLogger::WriteLog () at fmlogger.cpp:90
...
(gdb) frame 8
#8 0xc00000000c0c5670:1 in TFMTrace::PrintTrace () at tfmtrace.cpp:1064
1064 if( dwDataLen <= dwViewLen ) {
Current language: auto; currently c++
(gdb) disas $pc-16*4 $pc+16*4
...
0xc00000000c0c5660:0 <TFMTrace::PrintTrace(...)+0xd0> : ld4.a r27=[r48] MII,
;;; 1064 if( dwDataLen <= dwViewLen ) {
0xc00000000c0c5660:1 <TFMTrace::PrintTrace(...)+0xd1> : adds r28=28,r42
0xc00000000c0c5660:2 <TFMTrace::PrintTrace(...)+0xd2> : cmp.ne.unc p6=r0,r46;;
0xc00000000c0c5670:0 <TFMTrace::PrintTrace(...)+0xe0> : ld4.sa r26=[r28] MMI,
0xc00000000c0c5670:1 <TFMTrace::PrintTrace(...)+0xe1> : (p6) ld4 r31=[r28]
0xc00000000c0c5670:2 <TFMTrace::PrintTrace(...)+0xe2> : adds r46=24,r42;;
0xc00000000c0c5680:0 <TFMTrace::PrintTrace(...)+0xf0> : (p6) st4 [r35]=r31 MI,I
0xc00000000c0c5680:1 <TFMTrace::PrintTrace(...)+0xf1> : adds r59=36,r42;;
0xc00000000c0c5680:2 <TFMTrace::PrintTrace(...)+0xf2> : nop.i 0x0
0xc00000000c0c5690:0 <TFMTrace::PrintTrace(...)+0x100>: ld4.c.clr r27=[r48] MIB,
;;; 1066 dwLen = dwTrcLen ;
0xc00000000c0c5690:1 <TFMTrace::PrintTrace(...)+0x101>: cmp4.eq.unc p6,p8=99,r27
0xc00000000c0c5690:2 <TFMTrace::PrintTrace(...)+0x102>: nop.b 0x0;;
0xc00000000c0c56a0:0 <TFMTrace::PrintTrace(...)+0x110>: (p8) ld4.c.clr r26=[r28] MMI
;;; 1067 }
0xc00000000c0c56a0:1 <TFMTrace::PrintTrace(...)+0x111>: (p6) st4 [r48]=r47
0xc00000000c0c56a0:2 <TFMTrace::PrintTrace(...)+0x112>: cmp4.geu.unc p7=r26,r27
End of assemb
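One way to relate the info registers output to the marked instructions is to replay the address arithmetic by hand. The sketch below is only an illustration: it assumes the dumped values reflect the register state at the faulting instruction, it takes the results of the two ld8 loads (gr16 and gr42) straight from the dump because the memory contents are not available here, and the helper names adds, is_aligned and check_chain are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Register values copied from the `info registers` output in the question.
constexpr std::uint64_t gr32 = 0x60000000044796d0ULL;
constexpr std::uint64_t gr44 = 0x6000000004479b50ULL;
constexpr std::uint64_t gr16 = 0x60000000044dac60ULL;  // result of ld8 r16=[r44]
constexpr std::uint64_t gr21 = 0x60000000044dac70ULL;
constexpr std::uint64_t gr42 = 0xc00000000c5f8000ULL;  // result of ld8 r42=[r21]
constexpr std::uint64_t gr28 = 0xc00000000c5f801cULL;

// Hypothetical helper: `adds rX=imm,rY` computes rY + imm.
constexpr std::uint64_t adds(std::uint64_t imm, std::uint64_t reg) {
    return reg + imm;
}

// Hypothetical helper: true when addr is a multiple of the access size
// (4 for ld4, 8 for ld8).
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t access_size) {
    return addr % access_size == 0;
}

// Each marked `adds` instruction, checked against the dumped result.
static_assert(adds(0x480, gr32) == gr44, "adds r44=0x480,r32");
static_assert(adds(16, gr16) == gr21, "adds r21=16,r16");
static_assert(adds(28, gr42) == gr28, "adds r28=28,r42");

// Runtime version of the same checks, plus the alignment of the ld4 address.
inline void check_chain() {
    assert(adds(0x480, gr32) == gr44);
    assert(adds(16, gr16) == gr21);
    assert(adds(28, gr42) == gr28);
    assert(is_aligned(gr28, 4));  // address used by (p6) ld4 r31=[r28]
}
```

Under those assumptions the dump is consistent with the marked chain r32 -> r44 -> r16 -> r21 -> r42 -> r28, and the address in r28 is a multiple of 4, so the width of the destination register would not by itself make this ld4 misaligned.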
A "normal" crash in native code causes a report like this:

C [libc.so.6+0x88368] strstr+0x64a

Note the small offset from the function (strstr in this case) to the crash point.

In your case, the JVM decided that the address 0xc00000000f675671 is inside libtracejni.so, but the closest function it could find is very far from the crash point (0x5065eff9 bytes, or about 1.2 GB, away). Is your library really that big?

If it really is that big, chances are you have stripped it, and so the symbol _NZ10TFM07PrintTraceEPi doesn't actually have anything to do with the problem (which is in code that is 1.2 GB away).

You need to find out what code was really at address 0xc00000000f675671 at the time of the crash. Usually hs_err_pid*.log contains a list of load addresses for all the shared libraries. Find the load address of libtracejni.so and subtract it from the pc. That should give you an address similar to 0x400...675671, which you should be able to look up in your unstripped version of libtracejni.so.

Also note that the crash address ends with ASCII "C8G", which may or may not be a coincidence.

Update 2011/08/05. Now you know which instruction crashed:

0x4000000000099670:1 <TFMTrace::PrintTrace(...)+0xe1>: (p6) ld4 r31=[r28]

This is a load of a 4-byte integer from the memory pointed to by r28. The next questions are: what is the value of r28 at the crash point (it should be logged in hs_err*.log), and where did it come from (a complete disassembly of TFM::PrintTrace will tell you that).
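The address arithmetic in this answer (subtract the library's load address from the pc, and judge how far the crash point is from the nearest symbol) can be sketched as follows. This is a minimal illustration, not a recipe for the actual process: assumed_load_base is a made-up placeholder, since the real load address of libtracejni.so has to be read from hs_err_pid*.log, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Crash pc and symbol distance as quoted in the answer.
constexpr std::uint64_t crash_pc = 0xc00000000f675671ULL;
constexpr std::uint64_t distance_from_symbol = 0x5065eff9ULL;

// HYPOTHETICAL load address, for illustration only. The real value must be
// taken from the shared-library list in hs_err_pid*.log.
constexpr std::uint64_t assumed_load_base = 0xc00000000f000000ULL;

// Offset of a runtime address inside a library loaded at load_base.
constexpr std::uint64_t offset_in_library(std::uint64_t pc,
                                          std::uint64_t load_base) {
    return pc - load_base;
}

// Converts a byte count to GiB (2^30 bytes) to judge how plausible a
// symbol-to-crash distance is.
constexpr double bytes_to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / static_cast<double>(1ULL << 30);
}

// Runtime sanity checks for the two computations.
inline void check_offsets() {
    // With the placeholder base, the offset keeps the low digits of the pc.
    assert(offset_in_library(crash_pc, assumed_load_base) == 0x675671ULL);
    // 0x5065eff9 bytes is a little over 1.25 GiB.
    assert(bytes_to_gib(distance_from_symbol) > 1.25);
    assert(bytes_to_gib(distance_from_symbol) < 1.26);
}
```

The resulting offset still has to be combined with the address at which the unstripped library was linked before it can be looked up in a disassembler or debugger; that base is not given in the thread, so it is left out here.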