I'm seeing a truly baffling series of error reports from Valgrind's Memcheck tool:
==29456== Invalid read of size 8
==29456== at 0x4D5C90: CkIndex_Ping1::_callthr_trecv_PingMsg(CkThrCallArg*) (in /scratch/phil/charm/net-linux-x86_64-bigsim/tests/charm++/pingpong/pgm)
==29456== by 0x503ECB: CthStartThread (libthreads-default.c:1690)
==29456== by 0x56A08AF: ??? (in /lib/x86_64-linux-gnu/libc-2.19.so)
==29456== Address 0x5b09a90 is 0 bytes inside a block of size 16 alloc'd
==29456== at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29456== by 0x4D5C14: CkIndex_Ping1::_call_trecv_PingMsg(void*, void*) (in /scratch/phil/charm/net-linux-x86_64-bigsim/tests/charm++/pingpong/pgm)
==29456== by 0x517D79: CkDeliverMessageFree (ck.C:593)
==29456== by 0x5378C3: CkLocRec_local::invokeEntry(CkMigratable*, void*, int, bool) (cklocation.C:1795)
==29456== by 0x537CA7: CkLocRec_local::deliver(CkArrayMessage*, CkDeliver_t, int) (cklocation.C:1862)
==29456== by 0x539977: CkLocMgr::deliver(CkMessage*, CkDeliver_t, int) (cklocation.C:2834)
==29456== by 0x51F091: CkLocMgr::deliverInline(CkMessage*) (cklocation.h:313)
==29456== by 0x51A6EF: _processArrayEltMsg(CkCoreState*, envelope*) (ck.C:1181)
==29456== by 0x51A8C8: _processHandler(void*, CkCoreState*) (ck.C:1266)
==29456== by 0x4EE7EF: BgProcessMessageDefault(threadInfo*, char*) (blue.C:1339)
==29456== by 0x5C5928: BgProcessMessageFreezeMode(threadInfo*, char*) (middle-ccs.C:165)
==29456== by 0x4F590D: workThreadInfo::scheduler(int) (bigsim_proc.C:282)
Note that it says the offending address is inside a still-allocated (i.e. not yet freed) block, and that the offset plus the read size (0 + 8 bytes) is well within the 16-byte block.
This is on Ubuntu Linux 14.04, with Valgrind version valgrind-3.10.0.SVN (package 1:3.10~20140411-0ubuntu1), and the code was compiled with gcc/g++ 4.8.4-2ubuntu1~14.04.
I've found a similar question, to which the answer was "this is a bug on Mac OS X". Am I really looking at a Valgrind bug here, or is there something else my code might have wrong?
Edit: I also found a mailing list post covering a similar environment - user-level threads that might be screwing with Valgrind's understanding. It doesn't seem to actually answer anything though.
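On the user-level-thread angle: Valgrind offers client requests that tell it about a stack the program switches to by itself. The sketch below shows one commonly suggested step. It is not a confirmed fix for this report, the UserThreadStack class is hypothetical, and the report does not show whether Charm++'s thread library already registers its stacks. It uses VALGRIND_STACK_REGISTER/VALGRIND_STACK_DEREGISTER from <valgrind/valgrind.h> and falls back to no-ops when that header is not installed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Use Valgrind's client requests when the header is installed; otherwise fall
// back to no-ops so the sketch still compiles on machines without Valgrind.
#ifdef __has_include
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif
#ifndef HAVE_VALGRIND_H
#  define VALGRIND_STACK_REGISTER(start, end) 0u
#  define VALGRIND_STACK_DEREGISTER(id) ((void)(id))
#endif

// Hypothetical heap-allocated stack for a user-level thread. Registering the
// range tells Valgrind it is a separate stack, so a jump of the stack pointer
// into it can be treated as a stack switch rather than as stack growth.
class UserThreadStack {
public:
    explicit UserThreadStack(std::size_t size)
        : size_(size), base_(static_cast<char*>(std::malloc(size))) {
        assert(size > 0 && "a stack needs at least one byte");
        if (base_ != nullptr) {
            // valgrind.h: start is the lowest and end the highest addressable byte.
            id_ = VALGRIND_STACK_REGISTER(base_, base_ + size_ - 1);
        }
    }
    ~UserThreadStack() {
        if (base_ != nullptr) {
            VALGRIND_STACK_DEREGISTER(id_);
            std::free(base_);
        }
    }
    UserThreadStack(const UserThreadStack&) = delete;
    UserThreadStack& operator=(const UserThreadStack&) = delete;

    char* base() const { return base_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    char* base_;
    unsigned id_ = 0;
};
```

When the program is not running under Valgrind, the client requests cost only a few instructions, so registration can stay enabled in normal builds.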
Related
I'm using AddressSanitizer (ASan) to detect memory issues. When the program exits, ASan reports the following:
==102121==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 537 byte(s) in 1 object(s) allocated from:
#0 0x75cb48 in operator new(unsigned long) (/home/app+0x75cb48)
#1 0x7dca83 in __gnu_cxx::new_allocator<char>::allocate(unsigned long, void const*) /opt/rh/devtoolset-7/root/usr/include/c++/7/ext/new_allocator.h:111
#2 0x7ce766 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/basic_string.tcc:1057
#3 0x7cc54d in std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (/home/app+0x7cc54d)
#4 0x7c1f2a in std::string::reserve(unsigned long) /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/basic_string.tcc:960
#5 0x7fa0a639c6f5 in std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::overflow(int) (/lib64/libstdc++.so.6+0x9b6f5)
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x75cec8 in operator new(unsigned long, std::nothrow_t const&) (/home/app+0x75cec8)
#1 0x7fa0a635df1d in __cxa_thread_atexit (/lib64/libstdc++.so.6+0x5cf1d)
Indirect leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x75cec8 in operator new(unsigned long, std::nothrow_t const&) (/home/app+0x75cec8)
#1 0x7fa0a635df1d in __cxa_thread_atexit (/lib64/libstdc++.so.6+0x5cf1d)
I've read on the ASan page that this can happen when the standard library is statically linked. In my case, however, it is linked dynamically.
The application is compiled with devtoolset-7 on RHEL.
Do you have any idea where the leak comes from?
You can get more info than
#0 0x75cb48 in operator new(unsigned long) (/home/app+0x75cb48)
by using llvm-symbolizer.
Download it, and set the environment variable
ASAN_SYMBOLIZER_PATH=/usr/where/ever/the/binary/is
If you are sure that the leak is a false alarm, you can use a suppression file.
Create a suppression text file containing the line: leak:__cxa_thread_atexit
Set the environment variable
LSAN_OPTIONS=suppressions=path/to/suppr.txt
and then run your app.
http://clang.llvm.org/docs/AddressSanitizer.html#symbolizing-the-reports
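As an alternative to the LSAN_OPTIONS suppression file, LeakSanitizer also lets a program compile its suppressions in by defining the optional __lsan_default_suppressions() hook (declared in <sanitizer/lsan_interface.h> in the LLVM sanitizer runtime). This is a minimal sketch; whether the sanitizer runtime shipped with devtoolset-7 honors this hook is an assumption you should verify.

```cpp
#include <cassert>
#include <cstring>

// LeakSanitizer calls this optional hook, if the program defines it, to obtain
// built-in suppressions. The format matches a suppressions file: one
// "leak:<pattern>" rule per line. `used` guards against it being discarded
// as unreferenced.
extern "C" __attribute__((used)) const char* __lsan_default_suppressions() {
    return "leak:__cxa_thread_atexit\n";
}
```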
I am running simple TensorFlow code that loads a GraphDef and creates a session from it:
tensorflow::NewSession(options, &session);
tensorflow::ReadBinaryProto(tensorflow::Env::Default(), "/home/ashok/eclipseWorkspace/faceRecognition-x86_64/Data/models/optimized_facenet.pb", &graph_def);
session->Create(graph_def);
But when I run Valgrind as follows:
valgrind --leak-check=full --show-leak-kinds=all --vex-guest-max-insns=25 ./faceRecognition-x86_64 -r -i
I get the following errors:
==12366== 16,000 bytes in 1 blocks are still reachable in loss record 47,782 of 47,905
==12366== at 0x4C2E19F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12366== by 0xBF875DC: std::vector<tensorflow::CostModel::MemUsage, std::allocator<tensorflow::CostModel::MemUsage> >::reserve(unsigned long) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0xBF90128: tensorflow::CostModel::InitFromGraph(tensorflow::Graph const&) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0xBEE48D3: tensorflow::SimpleGraphExecutionState::InitBaseGraph(tensorflow::BuildGraphOptions const&) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0xBEE52CF: tensorflow::SimpleGraphExecutionState::MakeForBaseGraph(tensorflow::GraphDef*, tensorflow::SimpleGraphExecutionStateOptions const&, std::unique_ptr<tensorflow::SimpleGraphExecutionState, std::default_delete<tensorflow::SimpleGraphExecutionState> >*) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0xBE68B9D: tensorflow::DirectSession::MaybeInitializeExecutionState(tensorflow::GraphDef const&, bool*) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0xBE68CF9: tensorflow::DirectSession::ExtendLocked(tensorflow::GraphDef const&) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0xBE68FC7: tensorflow::DirectSession::Create(tensorflow::GraphDef const&) (in /usr/lib/libtensorflow_cc.so)
==12366== by 0x26B899: TensorFlow::initializeRecognition() (in /home/ashok/eclipseWorkspace/faceRecognition-x86_64/Debug/faceRecognition-x86_64)
==12366== by 0x24197D: RecognitionWithImages::RecognitionWithImages() (in /home/ashok/eclipseWorkspace/faceRecognition-x86_64/Debug/faceRecognition-x86_64)
==12366== by 0x12F27C: main (in /home/ashok/eclipseWorkspace/faceRecognition-x86_64/Debug/faceRecognition-x86_64)
These types of errors are also generated when I call session->Run().
Because of this, the memory used by the program keeps growing over time, and after a while the application crashes because it runs out of memory.
Did you close the session? You also need to delete the Session pointer:
session->Close();
delete session;
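To make that cleanup automatic, the session can be owned by a std::unique_ptr with a deleter that calls Close() before deleting. This is a minimal sketch. FakeSession is a hypothetical stand-in for tensorflow::Session so the code compiles without TensorFlow, and CloseAndDelete and SessionPtr are illustrative names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for tensorflow::Session so the sketch compiles
// without the TensorFlow headers; it only counts Close() calls.
struct FakeSession {
    static int closeCount;
    void Close() { ++closeCount; }
};
int FakeSession::closeCount = 0;

// Deleter that follows the advice above: Close() the session, then delete it.
// std::unique_ptr never invokes the deleter on a null pointer.
// (tensorflow::Session::Close() returns a Status; a real deleter should at
// least log a non-OK result instead of silently dropping it.)
template <typename SessionT>
struct CloseAndDelete {
    void operator()(SessionT* session) const {
        session->Close();
        delete session;
    }
};

template <typename SessionT>
using SessionPtr = std::unique_ptr<SessionT, CloseAndDelete<SessionT>>;
```

With TensorFlow, the raw pointer filled in by tensorflow::NewSession(options, &raw) would be wrapped in a SessionPtr<tensorflow::Session> once the call succeeds, so Close() and delete run on every exit path.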
Valgrind, a well-known memory analysis tool, reports an invalid read in OCIStmtPrepare, an Oracle C API function. The same can be observed in several other Oracle C API functions.
Please refer to the following stack trace.
As far as I can tell, the application creates a buffer of 317 bytes. When it is passed to the Oracle library, the library copies it using the __intel_new_memcpy function. However, __intel_new_memcpy reads 8 bytes starting at offset 312, i.e. up to byte 320, while the allocated block is only 317 bytes.
Could you please confirm whether this behaviour is correct? What is going wrong here?
==22195== Invalid read of size 8
==22195== at 0x68CD2D9: __intel_new_memcpy (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x5D84158: kpurclientparse (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x5D878DE: kpureq (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x5D607FA: OCIStmtPrepare (in /x02/app/oracle/product/11.2.0/client_1/lib/libclntsh.so.11.1)
==22195== by 0x4099E0: DBCursor::Parse(char const*) (OCICPP.C:1020)
==22195== by 0x40CE29: DBCon::NewCursor(char const*, int) (OCICPP.C:753)
==22195== by 0x4047A6: main (main.cpp:59)
==22195== Address 0xa2e7e68 is 312 bytes inside a block of size 317 alloc'd
==22195== at 0x4C26E1C: operator new[](unsigned long) (vg_replace_malloc.c:305)
==22195== by 0x4EBD00F: String::Set(char const*, unsigned int) (String.cpp:544)
==22195== by 0x4EBD169: String::Set(char const*) (String.cpp:512)
==22195== by 0x4EBD188: String::operator=(char const*) (String.cpp:590)
==22195== by 0x404784: main (main.cpp:55)
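The numbers in the report fit together if the copy routine loads whole 8-byte words. The sketch below works through that arithmetic. It assumes this is how __intel_new_memcpy behaves, which the report does not confirm, and that the buffer starts 8-byte aligned. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not Intel's actual code): a copy routine that moves data
// in whole 8-byte words, starting from an 8-byte-aligned buffer.
constexpr std::size_t kWordSize = 8;

// Number of bytes actually touched when copying n bytes word by word.
constexpr std::size_t bytesTouched(std::size_t n) {
    return (n + kWordSize - 1) / kWordSize * kWordSize;
}

// Offset at which the last 8-byte load starts (0 for an empty copy).
constexpr std::size_t lastLoadOffset(std::size_t n) {
    return n == 0 ? 0 : bytesTouched(n) - kWordSize;
}

// How far that last load reaches past the end of an n-byte block.
constexpr std::size_t bytesPastEnd(std::size_t n) {
    return bytesTouched(n) - n;
}

// The figures from the report: a 317-byte block, an 8-byte read at offset 312.
static_assert(bytesTouched(317) == 320, "last word ends at byte 320");
static_assert(lastLoadOffset(317) == 312, "matches '312 bytes inside a block of size 317'");
static_assert(bytesPastEnd(317) == 3, "3 bytes beyond the allocation");
```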
I have a memory leak, and this is Valgrind's output:
==4501== 15,263,488 bytes in 59,623 blocks are definitely lost in loss record 5,941 of 5,942
==4501== at 0x4C2BBA0: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==4501== by 0x6CC78D1: newlocale (newlocale.c:201)
==4501== by 0x527EE7: app::TLocale::create(app::ELocale) (locale.cpp:141)
==4501== by 0x5276AD: app::TLocale::TLocale() (locale.cpp:38)
==4501== by 0x50091E: util::TDateTime::TDateTime(util::EDateTimeType) (datetime.cpp:828)
==4501== by 0x587EE4: util::TVariant::TVariant() (variant.cpp:74)
==4501== by 0x561215: data::TField::TField() (tables.cpp:193)
==4501== by 0x554EA7: sqlite::TQuery::assignFields() (sqlite.cpp:631)
==4501== by 0x553C80: sqlite::TQuery::next() (sqlite.cpp:415)
==4501== by 0x4F8E06: sql::TDataQuery::fetchAll() (database.cpp:219)
==4501== by 0x5CD499: app::TMain::qryPersonalData(std::string&, unsigned long, unsigned long, std::string const&, unsigned long) (main.cpp:1616)
==4501== by 0x5CCF86: app::TMain::getPersonalData(app::TThreadData&, char const*&, unsigned long&, util::TNamedVariants&) (main.cpp:1568)
Looking at the first lines, I would assume that newlocale() is called without a matching freelocale(). But I have checked the code again and again, and this is not the case.
Am I misinterpreting the output and the leakage might be somewhere else?
From the traceback it's not that obvious that the pairing of newlocale and freelocale is the culprit. There are a number of constructors (TLocale, TDateTime, TVariant and TField) whose objects would probably have to be destroyed as well (if freelocale is called from the destructor). You should check that none of those classes leak either (for example, if you create a TLocale with new and never delete it, you would get another memory leak reported).
More than that is hard to tell without seeing enough of the source code.
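The following sketch shows why the owning objects matter. It assumes that TLocale wraps a locale_t and calls freelocale() in its destructor; LocaleHandle is a hypothetical stand-in. freelocale() only runs if the owning object is actually destroyed, so a TLocale, TDateTime, TVariant or TField that is never deleted leaks the newlocale() allocation too.

```cpp
#include <cassert>
#include <locale.h>
#include <memory>
#include <stdexcept>

// Hypothetical RAII owner of a POSIX locale_t, standing in for the question's
// TLocale: newlocale() in the constructor, freelocale() in the destructor.
class LocaleHandle {
public:
    explicit LocaleHandle(const char* name)
        : loc_(newlocale(LC_ALL_MASK, name, static_cast<locale_t>(0))) {
        if (loc_ == static_cast<locale_t>(0)) {
            throw std::runtime_error("newlocale failed");
        }
    }
    ~LocaleHandle() { freelocale(loc_); }

    // Copying is disabled so that a copy or assignment can never overwrite a
    // handle without freeing it (or free the same locale twice).
    LocaleHandle(const LocaleHandle&) = delete;
    LocaleHandle& operator=(const LocaleHandle&) = delete;

    locale_t get() const { return loc_; }

private:
    locale_t loc_;
};

// Heap-allocated owners need an owner of their own: a bare
// `new LocaleHandle(...)` that is never deleted also leaks the newlocale()
// data, because the destructor that calls freelocale() never runs.
std::unique_ptr<LocaleHandle> makeLocale(const char* name) {
    return std::unique_ptr<LocaleHandle>(new LocaleHandle(name));
}
```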
In my project I am using jsoncpp, Boost and many other libraries. When I ran Valgrind on my program, it reported possible memory leaks in string creation in many places, including inside jsoncpp and Boost.
Here are the Valgrind error snippets:
==5506== 427,198 bytes in 489 blocks are possibly lost in loss record 8,343 of 8,359
==5506== at 0x4C2B1C7: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5506== by 0x9360A88: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==5506== by 0x55EB0BD: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (basic_string.tcc:140)
==5506== by 0x936261C: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
==5506== by 0x63FEB99: Json::Value::asString() const (json_value.cpp:611)
My question is: are these errors valid, or are they false positives?
Thanks in advance
To be completely sure, you can run the suspect code in a loop and check whether the process's memory usage keeps growing.
We had similar messages and they turned out to be false positives, so we added them to the suppression list.
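A looped test could look like the following Linux-only sketch. residentBytes() and rssGrowthBytes() are hypothetical helpers. Resident set size is a coarse signal because the allocator may keep freed memory. Still, a genuine leak typically keeps growing with the iteration count, while a false positive does not.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Linux-only: current resident set size in bytes, taken from the second
// field of /proc/self/statm (which is expressed in pages).
long residentBytes() {
    std::ifstream statm("/proc/self/statm");
    long totalPages = 0;
    long residentPages = 0;
    if (!(statm >> totalPages >> residentPages)) {
        return -1;  // not on Linux, or /proc is unavailable
    }
    return residentPages * sysconf(_SC_PAGESIZE);
}

// Hypothetical helper: run `work` `iterations` times and report how much the
// resident set grew, in bytes.
template <typename Work>
long rssGrowthBytes(Work work, std::size_t iterations) {
    work();  // warm-up call so one-time allocations are not counted
    const long before = residentBytes();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    return residentBytes() - before;
}
```

For example, you could wrap a call such as Json::Value::asString() in the work lambda and compare the growth for 10,000 and 1,000,000 iterations. That comparison would show whether those "possibly lost" strings really accumulate.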
Valgrind has some heuristics that reduce the number of false-positive "possibly lost" reports.
Among others, it has a heuristic to better detect std::string.
Use the following option to activate some of these heuristics:
--leak-check-heuristics=heur1,heur2,... which heuristics to use for
improving leak search false positive [none]
where heur is one of:
stdstring length64 newarray multipleinheritance all none
Note that in the upcoming 3.11 release, the default for this option
has been changed from 'none' to 'all'.
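The std::string case needs a heuristic because of interior pointers. In the old copy-on-write libstdc++ string (the ABI of libstdc++.so.6.0.16 above), the heap block starts with a header holding the length, capacity and reference count, and the string object points at the characters after that header. Memcheck reports a block that is reachable only through such a pointer into its middle as "possibly lost". The sketch below mimics that layout. CowLikeString and RepHeader are hypothetical and only approximate the real _Rep.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical header modeled on the pre-C++11 libstdc++ string _Rep:
// length, capacity and reference count, followed by the characters.
struct RepHeader {
    std::size_t length;
    std::size_t capacity;
    int refcount;
};

class CowLikeString {
public:
    explicit CowLikeString(const char* s) {
        const std::size_t n = std::strlen(s);
        void* block = ::operator new(sizeof(RepHeader) + n + 1);
        RepHeader* rep = static_cast<RepHeader*>(block);
        rep->length = n;
        rep->capacity = n;
        rep->refcount = 0;
        // The only stored pointer points past the header, i.e. into the
        // middle of the heap block: an "interior pointer" for Memcheck.
        data_ = reinterpret_cast<char*>(rep + 1);
        std::memcpy(data_, s, n + 1);
    }
    ~CowLikeString() { ::operator delete(header()); }
    CowLikeString(const CowLikeString&) = delete;
    CowLikeString& operator=(const CowLikeString&) = delete;

    const char* c_str() const { return data_; }
    std::size_t size() const { return header()->length; }

private:
    // Step back from the characters to the start of the block.
    RepHeader* header() const { return reinterpret_cast<RepHeader*>(data_) - 1; }
    char* data_;
};
```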