Running valgrind, I get loads of memory leaks in OpenCV, especially around the namedWindow function.
In main, I have two images, CSImg and PGImg:
std::string cs = "Computer Science Students";
std::string pg = "Politics and Government Students";
CSImg.displayImage(cs);
cv::destroyWindow(cs);
PGImg.displayImage(pg);
cv::destroyWindow(pg);
The displayImage function is:
void ImageHandler::displayImage(std::string& windowname) {
    namedWindow(windowname);
    imshow(windowname, m_image);
    waitKey(7000);
}
Valgrind reports enormous memory leaks whenever I call displayImage.
For example:
==6561== 2,359,544 bytes in 1 blocks are possibly lost in loss record 3,421 of 3,421
==6561== at 0x4C2B3F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6561== by 0x4F6C94C: cv::fastMalloc(unsigned long) (in /usr/lib/libopencv_core.so.2.3.1)
==6561== by 0x4F53650: cvCreateData (in /usr/lib/libopencv_core.so.2.3.1)
==6561== by 0x4F540F0: cvCreateMat (in /usr/lib/libopencv_core.so.2.3.1)
==6561== by 0x56435AF: cvImageWidgetSetImage(_CvImageWidget*, void const*) (in /usr/lib/libopencv_highgui.so.2.3.1)
==6561== by 0x5644C14: cvShowImage (in /usr/lib/libopencv_highgui.so.2.3.1)
==6561== by 0x5642AF7: cv::imshow(std::string const&, cv::_InputArray const&) (in /usr/lib/libopencv_highgui.so.2.3.1)
==6561== by 0x40CED7: ImageHandler::displayImage(std::string&) (imagehandler.cpp:33)
==6561== by 0x408CF5: main (randomU.cpp:601)
imagehandler.cpp, line 33 is:
imshow(windowname, m_image); //the full function is written above ^
randomU.cpp line 601 is:
CSImg.displayImage(cs);
Any help is appreciated.
Ask for any further info you need.
Sorry, but the stark reality is that OpenCV leaks. According to the Leaks instrument (Xcode tools), it also leaks on its Qt interface side due to self-references.
Further evidence that this is not just a false alarm: on my Mac, OpenCV 2.4.3 continuously grows in memory (according to Activity Monitor) when processing webcam input. (I am not using any pointers or data storage, so in theory my OpenCV program should stay at a constant size.)
Actually, you don't need to call namedWindow anymore; just call a "naked" cv::imshow(windowname, m_image). It works fine even if you overwrite an existing window.
REMARK:
waitKey has two usages:
1. To wait forever: waitKey(0).
2. To wait just a bit, e.g. because you are displaying input from your webcam: waitKey(30) (or less, depending on the fps of what you are playing; for movies, 30 is typical).
Computer:
Processor: Intel Xeon Silver 4114 CPU @ 2.19 GHz (2 processors)
RAM: 96 GB 2666 MHz (12 × 8 GB sticks)
OS: Windows 10
GPU: None
Hard drive: Samsung MZVLB512HAJQ-000H2 - 512 GB M.2 PCIe NVMe
IDE:
Visual Studio 2019
I am including what I am doing in case it is relevant. I am running Visual Studio code that reads data off a GSC PCI SIO4B Sync Card 256K. Using the API for this card (documentation: http://www.generalstandards.com/downloads/GscApi.1.6.10.1.pdf), I read 150 bytes of data at a rate of 100 Hz using the code below. That data is then split according to the message structure of my device. I can't give details on the message structure, but the data is combined into the various words using a union and added to an integer array int Data[100];
Union Example:
union data_set {
    unsigned int integer;
    unsigned char input[2];
} word;
Example of how the data is read:
PLX_PHYSICAL_MEM cpRxBuffer;
#define TEST_BUFFER_SIZE 0x400

// allocate and map memory for the buffer
cpRxBuffer.Size = TEST_BUFFER_SIZE;
status = GscAllocPhysicalMemory(BoardNum, &cpRxBuffer);
status = GscMapPhysicalMemory(BoardNum, &cpRxBuffer);
// note: memset with sizeof(cpRxBuffer) would only fill sizeof(PLX_PHYSICAL_MEM)
// bytes, not the whole buffer, so the buffer size is used here
memset((unsigned char*)cpRxBuffer.UserAddr, 0xa5, TEST_BUFFER_SIZE);

// start data reception:
status = GscSio4ChannelReceivePlxPhysData(BoardNum, iRxChannel, &cpRxBuffer, SetMaxBytes, &messageID);

// wait for the Rx operation to complete
status = GscSio4ChannelWaitForTransfer(BoardNum, iRxChannel, 7000, messageID, &amount);
if (status)
{
    // on error, "amount" contains the number of bytes actually transferred
    DisplayErrorMessage(status);
    printf("\n\t%04X bytes out of %04X transferred", amount, SetMaxBytes);
}
My issue is that this code works fine and keeps up for around 5 minutes; then it randomly stops being able to keep up, and the FIFO (first in, first out) register on the PCI card begins to fill faster than the code can process the data. To me this looks like a memory leak, since the code works fine for a long time and then starts to slow down when nothing has changed: all the code does is read the data off the card. We used to save the data in a really large array, but even after removing that we had the same issue.
I am unsure how to figure out exactly what is happening, and I'm hoping for a way to determine whether there is a memory leak and, if so, how to fix it.
A memory leak is only a guess, though; it could very well be something else, so any out-of-the-box suggestions for diagnosing the problem are also appreciated.
Similar to Paul's answer, but I like to strategically place two (or more) _CrtMemCheckpoint calls followed by _CrtMemDifference, to cut down the noise.
Memory leaks can be detected and reported on (in Debug builds) by calling the _CrtDumpMemoryLeaks function. When running under the debugger, this will tell you (in the output tab) how many allocations you have at the time that it is called and the file and line number that each was allocated from.
Call this right at the end of your program, after you (think you) have freed all the resources you use. Anything left over is a candidate for being a leak.
I am trying to track down a bug that occasionally crashes my app in the destructor of this trivial C++ class:
class CrashClass {
public:
    CrashClass(double r1, double s1, double r2, double s2, double r3, double s3, string dateTime)
        : mR1(r1), mS1(s1), mR2(r2), mS2(s2), mR3(r3), mS3(s3), mDateTime(dateTime) { }
    CrashClass() : mR1(0), mS1(0), mR2(0), mS2(0), mR3(0), mS3(0) { }
    ~CrashClass() { }

    string GetDateTime() { return mDateTime; }

private:
    double mR1, mS1, mR2, mS2, mR3, mS3;
    string mDateTime;
};
A bunch of those objects is stored in a standard C++ vector and used in a second class:
class MyClass {
(...)
private:
vector<CrashClass> mCrashClassVec;
};
MyClass is created and dealloc'd as required many times over.
The code is using C++17 on the latest Xcode 10.1 under macOS 10.14.4.
All of this is part of a computationally intensive simulation app running for multiple hours to days. On a 6-core i7 machine running 12 calculations in parallel (using macOS' GCD framework) this frequently crashes after a couple of hours with a
pointer being freed was not allocated
error when invoking mCrashClassVec.clear() on the member in MyClass, i.e.
frame #0: 0x00007fff769a72f6 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00000001004aa80d libsystem_pthread.dylib`pthread_kill + 284
frame #2: 0x00007fff769116a6 libsystem_c.dylib`abort + 127
frame #3: 0x00007fff76a1f977 libsystem_malloc.dylib`malloc_vreport + 545
frame #4: 0x00007fff76a1f738 libsystem_malloc.dylib`malloc_report + 151
frame #5: 0x0000000100069448 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__libcpp_deallocate(__ptr=<unavailable>) at new:236 [opt]
frame #6: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<char>::deallocate(__p=<unavailable>) at memory:1796 [opt]
frame #7: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator_traits<std::__1::allocator<char> >::deallocate(__p=<unavailable>) at memory:1555 [opt]
frame #8: 0x0000000100069443 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1941 [opt]
frame #9: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string() at string:1936 [opt]
frame #10: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #11: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] CrashClass::~CrashClass(this=<unavailable>) at CrashClass.h:61 [opt]
frame #12: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::allocator<CrashClass>::destroy(this=<unavailable>, __p=<unavailable>) at memory:1860 [opt]
frame #13: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::__destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1727 [opt]
frame #14: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] void std::__1::allocator_traits<std::__1::allocator<CrashClass> >::destroy<CrashClass>(__a=<unavailable>, __p=<unavailable>) at memory:1595 [opt]
frame #15: 0x0000000100069439 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::__destruct_at_end(this=<unavailable>, __new_last=0x00000001011ad000) at vector:413 [opt]
frame #16: 0x0000000100069429 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::__vector_base<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:356 [opt]
frame #17: 0x0000000100069422 BackTester`MyClass::DoStuff(int, int) [inlined] std::__1::vector<CrashClass, std::__1::allocator<CrashClass> >::clear(this=<unavailable>) at vector:749 [opt]
Side note: The vector being cleared might have no elements (yet).
In the stack trace (bt all) I can see other threads performing operations on their copies of CrashClass vectors, but as far as I can tell from comparing addresses in the stack trace, all of those are in fact private copies (as designed); none of this data is shared between threads.
Naturally the bug only occurs in full production mode, i.e. all attempts to reproduce the crash by
- running in DEBUG mode,
- running under Lldb's (Xcode's) Address Sanitizer (for many hours/overnight),
- running under Lldb's (Xcode's) Thread Sanitizer (for many hours/overnight),
- running a cut-down version of the class with just the critical code left/replicated
failed and did not trigger the crash.
Why might deallocating a simple member allocated on the stack fail with a pointer being freed was not allocated error?
Additional hints on how to debug this, or how to trigger the bug in a more robust way so it can be investigated further, are also very much welcome.
Update 5/2019
The bug is still around, intermittently crashing the app, and I'm starting to believe that the issues I'm experiencing are actually caused by Intel's data-corruption bug in recent CPU models:
https://mjtsai.com/blog/2019/05/17/microarchitectural-data-sampling-mds-mitigation/
https://mjtsai.com/blog/2017/06/27/bug-in-skylake-and-kaby-lake-hyper-threading/
https://www.tomshardware.com/news/hyperthreading-kaby-lake-skylake-skylake-x,34876.html
You might try a few tricks:
- Run the production version using a single thread for an even longer duration (say a week or two) to see if it crashes.
- Ensure that you don't consume all available RAM, taking into account that memory might be fragmented.
- Ensure that your program has no memory leak and that memory usage does not grow the longer it runs.
- Add some tracking: set members to a known value in the destructor, so you would recognize the pattern if you do a double delete.
- Try running the program on another platform with another compiler.
- Your compiler or library might contain bugs. Try another (more recent) version.
- Remove code from the original version until it no longer crashes. That works better if you can consistently reproduce the crash with a sequence that corrupts memory.
- Once you get a crash, run the program with the exact same data (for each thread) and see if it always crashes at the same location.
- Rewrite or validate any unsafe code in your application. Avoid casts, printf and other old-school variadic functions, and unsafe functions like strcpy.
- Use a checked STL version.
- Try an unoptimized release version.
- Try an optimized debug version.
- Learn the differences between the DEBUG and RELEASE versions for your compiler.
- Rewrite the problematic code from scratch. Maybe it won't have the bug.
- Inspect the data when it crashes to see if it looks good.
- Review your error/exception handling to see if you ignore some potential problem.
- Test how your program behaves when it runs out of memory or disk space, or when an exception is thrown.
- Ensure that your debugger stops at each thrown exception, handled or not.
- Ensure that your program compiles and runs without warnings, or that you understand them and are sure they do not matter.
- You might reserve memory up front to reduce fragmentation and reallocation. If your program runs for hours, the memory may become so fragmented that the system cannot find a big enough block.
- Since your program is multithreaded, ensure that your run-time library is compatible with that.
- Ensure that you don't share data across threads, or that any shared data is adequately protected.
I get a core dump on both Solaris and Linux platforms and I don't see the problem.
On the Linux platform, I have the following core:
(gdb) where
#0 0x001aa81b in do_lookup_x () from /lib/ld-linux.so.2
#1 0x001ab0da in _dl_lookup_symbol_x () from /lib/ld-linux.so.2
#2 0x001afa05 in _dl_fixup () from /lib/ld-linux.so.2
#3 0x001b5c90 in _dl_runtime_resolve () from /lib/ld-linux.so.2
#4 0x00275e4c in __gxx_personality_v0 () from /opt/gnatpro/lib/libstdc++.so.6
#5 0x00645cfe in _Unwind_RaiseException_Phase2 (exc=0x2a7b10, context=0xffd58434) at ../../../src/libgcc/../gcc/unwind.inc:67
#6 0x00646082 in _Unwind_RaiseException (exc=0x2a7b10) at ../../../src/libgcc/../gcc/unwind.inc:136
#7 0x0027628d in __cxa_throw () from /opt/gnatpro/lib/libstdc++.so.6
#8 0x00276e4f in operator new(unsigned int) () from /opt/gnatpro/lib/libstdc++.so.6
#9 0x08053737 in Receptor::receive (this=0x93c12d8, msj=...) at Receptor.cc:477
#10 0x08099666 in EventProcessor::run (this=0xffd75580) at EventProcessor.cc:437
#11 0x0809747d in SEventProcessor::run (this=0xffd75580) at SEventProcessor.cc:80
#12 0x08065564 in main (argc=1, argv=0xffd76734) at my_project.cc:20
On a Solaris platform I have another core:
$ pstack core.ultimo
core 'core.ultimo' of 9220: my_project_sun
----------------- lwp# 1 / thread# 1 --------------------
0006fa28 __1cDstdGvector4CpnMDistribuidor_n0AJallocator4C2___Dend6kM_pk2_ (1010144, 1ce84, ffbd0df8, ffb7a18c, fffffff8, ffbedc7c) + 30
0005d580 __1cDstdGvector4CpnMDistribuidor_n0AJallocator4C2___Esize6kM_I_ (1010144, 219, 1ce84, ffffffff, fffffff8, ffbedc7c) + 30
0005ab14 __1cTReceptorHreceive6MrnKMensaje__v_ (33e630, ffbede70, ffffffff, 33e634, 33e68c, 0) + 1d4
0015df78 __1cREventProcessorDrun6M_v_ (ffbede18, 33e630, dcc, 1, 33e730, 6e) + 350
00159a50 __1cWSEventProcessorDrun6M_v_ (da08000, 2302f7, 111de0c, 159980, ff1fa07c, cc) + 48
000b6acc main (1, ffbeef74, ffbeef7c, 250000, 0, 0) + 16c
00045e10 _start (0, 0, 0, 0, 0, 0) + 108
----------------- lwp# 2 / thread# 2 --------------------
...
The piece of code is:
...
msj2.tipo(UPDATE);
for (i = 0; i < distr.size(); ++i)
{
    distr[i]->insert(new Mensaje(msj2));   // --> Receptor.cc:477
}
...
This core dump happens randomly; sometimes the process runs for weeks before it occurs.
The size of the core is 4291407872 bytes.
I am running valgrind to see whether the heap is corrupted, but so far I have not encountered problems such as "Invalid read" or "Invalid write".
Also, while running valgrind I have twice found the following message:
==19002== Syscall param semctl(arg) points to uninitialised byte(s)
and I have located the lines of code, but could these errors lead to the core dump? I think I have seen such valgrind errors before, and they weren't as important as the ones that say "Invalid read/write".
If you have any idea how to solve this problem, it would be highly appreciated.
The core size is the clue. The largest 32-bit unsigned number is 4,294,967,295. Your core is quite close to that, indicating that the process ran out of memory. The most likely cause is a memory leak.
See my recent article Memory Leaks in C/C++
Valgrind will find the issue for you on Linux. You have to start it with the --leak-check option for this. It will check for leaks when the process exits gracefully so you will need a way to shut the process down.
Dtrace with dbx on Solaris will also likely work.
Also, when I was running valgrind I have found twice the following
message:
==19002== Syscall param semctl(arg) points to uninitialised byte(s)
and I have detected the lines of code but could these errors lead to
the core?
Yes: passing uninitialised data to a syscall is quite likely undefined behavior. (I'm not going to say it's definitely undefined behavior without seeing the actual code, but it likely is.) It's not likely that doing that causes the SIGSEGV, but then again the intermittent failure you're seeing doesn't happen all that often either. So you do need to fix that problem.
In addition to valgrind, on Solaris you can also use libumem and watchmalloc to check for problems managing heap memory. See the man pages for umem_debug and watchmalloc to get started.
To use dbx on Solaris, you need to have Solaris Studio installed (it's free). Solaris Studio also offers a way to use the run-time memory checking of dbx without having to directly invoke the dbx debugger. See the man page for bcheck. The bcheck man page will be in the Solaris Studio installation directory tree, in the man directory.
And if it is a memory leak, you should be able to see the process address space growing over time.
Before I ask my question let me explain my environment:
I have a C/C++ application that runs continuously (an infinite loop) inside an embedded Linux device.
The application records some data from the system and stores it in text files on an SD card (1 file per day).
The recording occurs on a specific trigger detected from the system (every 5 minutes, for example), and each trigger inserts a new line into the text files.
Typical datatypes used within the application are: (o/i)stream, char arrays, char*, the c_str() function, structs and struct*, static string arrays, #define, enums, FILE*, vector<>, and the usual ones (int, string, etc.). Some of these datatypes are passed as arguments to functions.
The application is cross compiled with a custom GCC compiler within a Buildroot and BusyBox package for the device's CPU Atmel AT91RM9200QU.
The application executes some system commands using popen, and the output is read through the resulting FILE*.
Now the application has been running for three days, and I noticed an increase of 32 KB in virtual memory size (VSZ from the top command) each day. By accident the device restarted; I launched the application again, and the VSZ value started from the usual value for a fresh start (about 2532 KB).
I developed another application that monitors the VSZ value of the main application; it is scheduled with crontab to run every hour. I noticed that the 32 KB daily increase actually happens as roughly 4 KB per hour at some points during the day.
So the main question is: what could be the reason the VSZ increases? Eventually it will reach a limit and crash the system, which is my concern, because the device has only approx. 27 MB of RAM.
Update: Besides the VSZ value, the RSS also increases. I ran the application under valgrind --leak-check=full, aborted it after the first recording, and the following message appeared many, many times.
==28211== 28 bytes in 1 blocks are possibly lost in loss record 15 of 52
==28211== at 0x4C29670: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==28211== by 0x4EF33D8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x4EF4B00: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x4EF4F17: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x403842: __static_initialization_and_destruction_0 (gatewayfunctions.h:28)
*==28211== by 0x403842: _GLOBAL__sub_I__Z18szBuildUDPTelegramSsii (gatewayfunctions.cpp:396)
==28211== by 0x41AE7C: __libc_csu_init (elf-init.c:88)
==28211== by 0x5676A94: (below main) (in /lib64/libc-2.19.so)
The same message appears repeatedly, except that the line marked with * shows a different file name each time. The other thing I notice is that line 28 of gatewayfunctions.h is a static string array declaration; this array is used in only two files. Any suggestions?
I successfully ported cudaDecodeGL from Windows to Linux. It works fine, but after checking for memory leaks with valgrind, I found that there is a lot of leakage.
I reviewed the code, and while looking for a solution I have some questions:
1) Should I delete every pointer declared in every function? That is, does failing to delete a pointer cause a memory leak?
2) Can porting a Windows program to Linux introduce memory leaks, for example because of differences between the Linux and Windows memory management mechanisms?
3) Can you describe the procedure you follow when you face a memory leak in valgrind? That is, what do you do when valgrind tells you that you have a memory leak like this?
Part of the valgrind log file:
...
==10468== 754,864 (4,088 direct, 750,776 indirect) bytes in 1 blocks are definitely lost in loss record 136 of 137
==10468== at 0x4A069EE: malloc (vg_replace_malloc.c:270)
==10468== by 0x5B0A366: cuvidCreateVideoParser (in /usr/lib64/libnvcuvid.so.319.17)
==10468== by 0x40929E: VideoParser::VideoParser(VideoDecoder*, FrameQueue*, CUctx_st**) (in /home/admin/testcuda/de_3/cudaDecodeGL/3_Imaging/cudaDecodeGL/Far_Decoder)
==10468== by 0x4063F3: initCudaVideo() (in /home/admin/testcuda/de_3/cudaDecodeGL/3_Imaging/cudaDecodeGL/Far_Decoder)
==10468== by 0x404E8B: initCudaResources(int, char**, int*) (in /home/admin/testcuda/de_3/cudaDecodeGL/3_Imaging/cudaDecodeGL/Far_Decoder)
==10468== by 0x40561B: main (in /home/admin/testcuda/de_3/cudaDecodeGL/3_Imaging/cudaDecodeGL/Far_Decoder)
LEAK SUMMARY:
==10468== definitely lost: 7,608 bytes in 148 blocks
==10468== indirectly lost: 988,728 bytes in 907 blocks
==10468== possibly lost: 2,307,388 bytes in 59 blocks
==10468== still reachable: 413,278 bytes in 198 blocks
==10468== suppressed: 0 bytes in 0 blocks
Let me know if you need more information. If you think I should add some info to make my question clearer, just let me know how; I would really appreciate it.
Update :
VideoParser::VideoParser(VideoDecoder *pVideoDecoder, FrameQueue *pFrameQueue, CUcontext *pCudaContext)
    : hParser_(0)
{
    assert(0 != pFrameQueue);
    oParserData_.pFrameQueue = pFrameQueue;
    assert(0 != pVideoDecoder);
    oParserData_.pVideoDecoder = pVideoDecoder;
    oParserData_.pContext = pCudaContext;

    CUVIDPARSERPARAMS oVideoParserParameters;
    memset(&oVideoParserParameters, 0, sizeof(CUVIDPARSERPARAMS));
    oVideoParserParameters.CodecType              = pVideoDecoder->codec();
    oVideoParserParameters.ulMaxNumDecodeSurfaces = pVideoDecoder->maxDecodeSurfaces();
    oVideoParserParameters.ulMaxDisplayDelay      = 1; // needed so the parser pushes frames out to the decoder as quickly as it can
    oVideoParserParameters.pUserData              = &oParserData_;
    oVideoParserParameters.pfnSequenceCallback    = HandleVideoSequence;  // called before decoding frames and/or whenever there is a format change
    oVideoParserParameters.pfnDecodePicture       = HandlePictureDecode;  // called when a picture is ready to be decoded (decode order)
    oVideoParserParameters.pfnDisplayPicture      = HandlePictureDisplay; // called whenever a picture is ready to be displayed (display order)

    CUresult oResult = cuvidCreateVideoParser(&hParser_, &oVideoParserParameters);
    assert(CUDA_SUCCESS == oResult);
}
As you can see, cuvidCreateVideoParser lives in the shared library; how can I fix this memory leak?
Well, documentation on nvcuvid does not appear on the first page of Google results, so a quick look at nvcuvid.h revealed:
CUresult CUDAAPI cuvidCreateVideoParser(CUvideoparser *pObj, CUVIDPARSERPARAMS *pParams);
CUresult CUDAAPI cuvidParseVideoData(CUvideoparser obj, CUVIDSOURCEDATAPACKET *pPacket);
CUresult CUDAAPI cuvidDestroyVideoParser(CUvideoparser obj);
Be sure to destroy your video parser handle via cuvidDestroyVideoParser in the destructor of your VideoParser class. From the small code block you provided it is not clear how long the VideoParser lives; I suppose (based on the valgrind output) it is created within a function scope and destroyed when the function returns. Without proper destruction of your object's cuvid resources you will leak memory.