HTML Tidy segfault on TidyBufFree - c++

I'm using Tidy to clean up lots of HTML. The function I'm using is:
std::string cleanHTML (std::string htmlcontent)
{
char* outputstr;
TidyBuffer output ={0};
uint buflen =0;
TidyBuffer errbuf;
int rc = -1;
Bool ok;
TidyDoc tdoc = tidyCreate(); // Initialize "document"
tidyBufInit( &errbuf );
ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes ); // Convert to XHTML
if ( ok )
rc = tidySetErrorBuffer( tdoc, &errbuf ); // Capture diagnostics
if ( rc >= 0 )
rc = tidyParseString( tdoc, htmlcontent.c_str() ); // Parse the input
if ( rc >= 0 )
rc = tidySaveBuffer (tdoc,&output ); // Tidy it up!
uint yy= output.size;
outputstr = (char*)malloc(yy+10);
uint xx=yy+10;
rc = tidySaveString (tdoc,outputstr,&xx);
std::string cleanedhtml (outputstr);
tidyBufFree(&output);
tidyBufFree(&errbuf);
tidyRelease(tdoc);
return cleanedhtml;
}
According to gdb, the program segfaults in tidyBufFree (&output) on one particular call (I don't think there is anything obviously distinctive about that call). There also seems to be a memory leak coming from this function.
Can anyone help?
EDIT:
I've used Valgrind as recommended and the output is below (can someone explain what it means?).
==7860== Process terminating with default action of signal 11 (SIGSEGV)
==7860== Access not within mapped region at address 0x0
==7860== at 0x428B00: tidyBufFree (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== by 0x405EC6: cleanHTML(std::string) (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== by 0x4048A3: get_tvseries(std::string) (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== by 0x403DE2: main (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== If you believe this happened as a result of a stack
==7860== overflow in your program's main thread (unlikely but
==7860== possible), you can try to increase the size of the
==7860== main thread stack using the --main-stacksize= flag.
==7860== The main thread stack size used in this run was 8388608.
==7860==
==7860== HEAP SUMMARY:
==7860== in use at exit: 2,285,594 bytes in 3,638 blocks
==7860== total heap usage: 102,543 allocs, 98,905 frees, 137,801,931 bytes allocated
==7860==
==7860== LEAK SUMMARY:
==7860== definitely lost: 0 bytes in 0 blocks
==7860== indirectly lost: 0 bytes in 0 blocks
==7860== possibly lost: 1,303,686 bytes in 114 blocks
==7860== still reachable: 981,908 bytes in 3,524 blocks
==7860== suppressed: 0 bytes in 0 blocks
==7860== Rerun with --leak-check=full to see details of leaked memory
==7860==
==7860== For counts of detected and suppressed errors, rerun with: -v
==7860== Use --track-origins=yes to see where uninitialised values come from
==7860== ERROR SUMMARY: 113 errors from 17 contexts (suppressed: 0 from 0)
Segmentation fault
SOLVED:
The segmentation fault was caused by calling tidyBufFree (&output) when output was empty, which dereferenced a null pointer.

Your code looks a lot like this example, but with a few important differences.
Note that in the example the author does not call tidyBufInit( &errbuf ); this may be the source of your memory leak. To be on the safe side, use a memory debugging tool such as valgrind. As for the segfault, the way you free output looks correct (at least according to the example), so my guess is that stack corruption may be causing the problem. Again, valgrind may help you find it.

The segmentation fault was caused by tidyBufFree (&output) when &output was empty causing a dereferencing of a null pointer. – user3083672
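Separately from the crash, the question's function calls malloc for outputstr and never frees it, which would account for a leak on every call. One way to avoid both the extra allocation and the empty-buffer case is to build the std::string directly from the output buffer, after checking that it holds anything. The sketch below is not Tidy code: ByteBuffer is a hypothetical stand-in that mirrors the bp and size fields which, as far as I know, TidyBuffer exposes, and bufferToString is an illustrative name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for TidyBuffer: only the two fields used here.
struct ByteBuffer {
    unsigned char* bp;   // start of the data, or nullptr if nothing was written
    unsigned int size;   // number of valid bytes at bp
};

// Copy the buffer contents into a std::string, tolerating an empty buffer.
std::string bufferToString(const ByteBuffer& buf)
{
    if (buf.bp == nullptr || buf.size == 0)
        return std::string();            // nothing was produced

    // The (pointer, length) constructor copies exactly buf.size bytes,
    // so no intermediate malloc'd C string is needed.
    std::string out(reinterpret_cast<const char*>(buf.bp), buf.size);
    assert(out.size() == buf.size);
    return out;
}
```

With something like this, the tidySaveString call and the malloc'd outputstr would no longer be needed, and the buffers would only be freed after the copy has been made.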

Related

C++ memory leak. Valgrind - mismatched delete

I receive objects from Thread #1, which is third-party library code that calls my callback.
The objects have fixed-length string fields wrapped like this:
typedef struct somestr_t {
char * Data;
int Len; } somestr_t;
I have to copy the objects by hand every time before I can pass them on to my own code. Among other things, I copy these strings too, using this helper:
inline void CopyStr(somestr_t * dest, somestr_t * src)
{
if (src->Len == 0) {
dest->Len = 0;
return;
}
char* data = new char[src->Len];
memcpy(data, src->Data, src->Len);
dest->Data = data;
dest->Len = src->Len;
}
Then somewhere down the road I delete the object and its string fields:
if (someobj != nullptr)
{
if (someobj ->somestr.Len != 0) delete someobj ->somestr.Data;
. . .
delete someobj ;
}
When I run valgrind I get these in places where I would expect the strings to be deleted:
==33332== Mismatched free() / delete / delete []
==33332== at 0x48478DD: operator delete(void*, unsigned long) (vg_replace_malloc.c:935)
==33332== by 0x41B517: cleanup() (Recorder.cpp:86)
==33332== by 0x41BB29: signal_callback(int) (Recorder.cpp:129)
==33332== by 0x4C11DAF: ??? (in /usr/lib64/libc.so.6)
==33332== by 0x4CD14D4: clock_nanosleep@@GLIBC_2.17 (clock_nanosleep.c:48)
==33332== by 0x4CD6086: nanosleep (nanosleep.c:25)
==33332== by 0x4D02DE8: usleep (usleep.c:32)
==33332== by 0x41C3EF: Logger(void*) (LogThreads.h:28)
==33332== by 0x4C5C6C9: start_thread (pthread_create.c:443)
==33332== by 0x4BFC2B3: clone (clone.S:100)
==33332== Address 0xd661260 is 0 bytes inside a block of size 12 alloc'd
==33332== at 0x484622F: operator new[](unsigned long) (vg_replace_malloc.c:640)
==33332== by 0x419E72: CopyStr (CbOverrides.h:23)
and summary report:
==34077== HEAP SUMMARY:
==34077== in use at exit: 328,520 bytes in 3,828 blocks
==34077== total heap usage: 124,774 allocs, 120,946 frees, 559,945,294 bytes allocated
==34077==
==34077== LEAK SUMMARY:
==34077== definitely lost: 0 bytes in 0 blocks
==34077== indirectly lost: 0 bytes in 0 blocks
==34077== possibly lost: 0 bytes in 0 blocks
==34077== still reachable: 328,520 bytes in 3,828 blocks
==34077== suppressed: 0 bytes in 0 blocks
I have never used valgrind (or any C++ tool) before, so I am not sure: why is a mismatched delete reported, and why is there 328K of unreleased memory on exit?
char* data = new char[src->Len];
and
if (someobj ->somestr.Len != 0) delete someobj ->somestr.Data;
That delete should be delete [].
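A minimal sketch of the matching pair, assuming the somestr_t layout from the question. FreeStr is a hypothetical helper name, and setting Data to nullptr for empty strings is an addition so that the cleanup does not need to test Len first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same layout as the struct in the question.
typedef struct somestr_t {
    char* Data;
    int Len;
} somestr_t;

// Deep-copy src into dest; the memory comes from new[].
inline void CopyStr(somestr_t* dest, const somestr_t* src)
{
    assert(dest != nullptr && src != nullptr);
    if (src->Len == 0) {
        dest->Data = nullptr;   // leave dest in a well-defined state
        dest->Len = 0;
        return;
    }
    char* data = new char[src->Len];
    std::memcpy(data, src->Data, static_cast<std::size_t>(src->Len));
    dest->Data = data;
    dest->Len = src->Len;
}

// Hypothetical helper: release what CopyStr allocated.
inline void FreeStr(somestr_t* s)
{
    delete[] s->Data;           // array form matches new char[...]; no-op on nullptr
    s->Data = nullptr;
    s->Len = 0;
}
```

If the struct layout is under your control, holding the copy in a std::string or std::vector<char> would remove the manual delete altogether.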
Why are there "still reachable: 425,333 bytes in 3,860 blocks"? Sorry, my crystal ball isn't working.
Normally Valgrind does give a hint as to what you need to do
==19283== Rerun with --leak-check=full to see details of leaked memory
It's a little bit mean in that after you've done that it will tell you about another option
==21816== Reachable blocks (those to which a pointer was found) are not shown.
==21816== To see them, rerun with: --leak-check=full --show-leak-kinds=all
Try those and start working through the non-freed memory.

C++ valgrind error message segmentation fault

I'm new to C++ and I'm trying to write an enigma machine simulator. I know I've done something funky with my pointers and I've run it through Valgrind, but I'm not sure what the error messages mean and where to begin fixing it? (What does suppressed mean in the leak summary?)
Here's part of the code where each component is created and where the error occurs.
Enigma::Enigma(int argc, char** argv){
errorCode = NO_ERROR;
plugboard = NULL;
*rotor = NULL; //WHERE THE ERROR OCCURS
reflector = NULL;
rotorCount = 0;
//first check how many rotors there are
if (argc >= 5)
rotorCount = argc - 4;
if (argc <= 4)
errorCode = INSUFFICIENT_NUMBER_OF_PARAMETERS;
//pass files into each component and check if well-formed
if (errorCode == NO_ERROR){
plugboard = new Plugboard(argv[1]);
errorCode = plugboard -> errorCode;
if (errorCode == NO_ERROR){
cout << "Plugboard configuration loaded successfully" << endl;
reflector = new Reflector(argv[2]);
errorCode = reflector -> errorCode;
if (errorCode == NO_ERROR){
cout << "Reflector configuration loaded successfully" << endl;
rotor = new Rotor*[rotorCount];
size_t i = 0;
while (i < rotorCount && errorCode == NO_ERROR) {
rotor[i] = new Rotor (argv[i+3]);
i++;
errorCode = rotor[i]-> errorCode;
//destructor if rotor loading was unsuccessful
if (errorCode != NO_ERROR){
for (int j=0; j<=i; j++)
delete rotor[j];
delete [] rotor;
Here's the Valgrind error message:
reflectors/I.rf rotors/I.rot rotors/II.rot rotors/III.rot rotors/I.pos
==68943== Memcheck, a memory error detector
==68943== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==68943== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==68943== Command: ./enigma plugboards/I.pb reflectors/I.rf rotors/I.rot rotors/II.rot rotors/III.rot rotors/I.pos
==68943==
--68943-- run: /usr/bin/dsymutil "./enigma"
==68943== Use of uninitialised value of size 8
==68943== at 0x100002598: Enigma::Enigma(int, char**) (enigma.cpp:17)
==68943== by 0x100002F92: Enigma::Enigma(int, char**) (enigma.cpp:12)
==68943== by 0x100000862: main (main.cpp:18)
==68943== Uninitialised value was created by a stack allocation
==68943== at 0x1000007D4: main (main.cpp:11)
==68943==
==68943== Invalid write of size 8
==68943== at 0x100002598: Enigma::Enigma(int, char**) (enigma.cpp:17)
==68943== by 0x100002F92: Enigma::Enigma(int, char**) (enigma.cpp:12)
==68943== by 0x100000862: main (main.cpp:18)
==68943== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==68943==
==68943==
==68943== Process terminating with default action of signal 11 (SIGSEGV)
==68943== Access not within mapped region at address 0x0
==68943== at 0x100002598: Enigma::Enigma(int, char**) (enigma.cpp:17)
==68943== by 0x100002F92: Enigma::Enigma(int, char**) (enigma.cpp:12)
==68943== by 0x100000862: main (main.cpp:18)
==68943== If you believe this happened as a result of a stack
==68943== overflow in your program's main thread (unlikely but
==68943== possible), you can try to increase the size of the
==68943== main thread stack using the --main-stacksize= flag.
==68943== The main thread stack size used in this run was 10022912.
==68943==
==68943== HEAP SUMMARY:
==68943== in use at exit: 18,685 bytes in 166 blocks
==68943== total heap usage: 187 allocs, 21 frees, 27,133 bytes allocated
==68943==
==68943== LEAK SUMMARY:
==68943== definitely lost: 0 bytes in 0 blocks
==68943== indirectly lost: 0 bytes in 0 blocks
==68943== possibly lost: 72 bytes in 3 blocks
==68943== still reachable: 200 bytes in 6 blocks
==68943== suppressed: 18,413 bytes in 157 blocks
==68943== Rerun with --leak-check=full to see details of leaked memory
==68943==
==68943== For counts of detected and suppressed errors, rerun with: -v
==68943== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 1 from 1)
Segmentation fault: 11
Thanks
Maybe you should take a look at: How do pointer to pointers work in C?
Basically, I see that rotor is a pointer to pointer or possibly an array of pointers (since you initialize *rotor to NULL and later also set rotor[i] = new Rotor).
Make sure you've initialized rotor properly. Is it pointing to a valid object? If not, you cannot expect *rotor = NULL /* or whatever value */; to work.
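Basically, suppressed means leaks that Valgrind has been told not to report, typically allocations inside system or shared libraries rather than in your code. As for *rotor = NULL: rotor doesn't point to any valid object at that point, so the write through it is invalid.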
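To make the point about initialization concrete, here is a sketch that swaps the raw Rotor** for a std::vector of std::unique_ptr, so there is no uninitialized pointer to write through. This is a different technique from the question's code, not a patch of it: Rotor here is a hypothetical stand-in, 0 plays the role of NO_ERROR, and rotorAt is an illustrative accessor. It also checks the rotor that was just created before moving to the next index.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the Rotor class from the question.
struct Rotor {
    explicit Rotor(const std::string& file) : configFile(file), errorCode(0) {}
    std::string configFile;
    int errorCode;              // 0 plays the role of NO_ERROR here
};

class Enigma {
public:
    // rotorFiles: one configuration path per rotor.
    explicit Enigma(const std::vector<std::string>& rotorFiles)
    {
        // The vector starts out empty, so nothing is dereferenced
        // before it refers to a real object.
        for (const std::string& file : rotorFiles) {
            rotors.push_back(std::unique_ptr<Rotor>(new Rotor(file)));
            // Check the rotor just created, not the next slot.
            if (rotors.back()->errorCode != 0) {
                rotors.clear(); // releases every rotor built so far
                break;
            }
        }
    }

    std::size_t rotorCount() const { return rotors.size(); }

    const Rotor& rotorAt(std::size_t i) const
    {
        assert(i < rotors.size());
        return *rotors[i];
    }

private:
    std::vector<std::unique_ptr<Rotor>> rotors;
};
```

If the raw pointer has to stay, the equivalent fix would be to assign the pointer itself (rotor = NULL;) rather than writing through it.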

Why does this Deque destructor have memory leak

I use doubly linked list to implement Deque in C++.
Destructor:
Deque::~Deque()
{
while (this->left_p)
{
node *temp = this->left_p;
this->left_p = this->left_p->next;
delete temp;
}
this->right_p = NULL;
}
When I use valgrind --leak-check=full ./a.out to check for memory leaks, just to test my destructor, I get the following output:
==2636==
==2636== HEAP SUMMARY:
==2636== in use at exit: 72,704 bytes in 1 blocks
==2636== total heap usage: 1,003 allocs, 1,002 frees, 97,760 bytes allocated
==2636==
==2636== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==2636== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2636== by 0x4EC3EFF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==2636== by 0x40106B9: call_init.part.0 (dl-init.c:72)
==2636== by 0x40107CA: call_init (dl-init.c:30)
==2636== by 0x40107CA: _dl_init (dl-init.c:120)
==2636== by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==2636==
==2636== LEAK SUMMARY:
==2636== definitely lost: 0 bytes in 0 blocks
==2636== indirectly lost: 0 bytes in 0 blocks
==2636== possibly lost: 0 bytes in 0 blocks
==2636== still reachable: 72,704 bytes in 1 blocks
==2636== suppressed: 0 bytes in 0 blocks
==2636==
==2636== For counts of detected and suppressed errors, rerun with: -v
==2636== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I can't figure out why ONE out of 1,003 allocations is still not being freed.
Why do I have one memory leak? What is wrong with my destructor?
Test code here:
/* Deque Test Program 6 */
#include <cstring>
#include <iostream>
#include "Deque.h"
using namespace std ;
int main (int argc, char * const argv[]) {
cout << "\n\nDeque Class Test Program 6 - START\n\n";
// Make a Deque
Deque * dq1 = new Deque();
for( int i = 0 ; i<1 ; i++ ){
dq1->push_left(1);
// dq1->display();
}
cout << "Size=" << dq1->size() << endl ;
// The destructor should delete all the nodes.
delete dq1 ;
cout << "\n\nDeque Class Test Program 6 - DONE\n\n";
return 0;
}
edit: remove implementation code.
Essentially, it's not your code's fault, it's valgrind's.
Check this other question that has had the same problem:
Valgrind: Memory still reachable with trivial program using <iostream>
Quoting from the post:
First of all: relax, it's probably not a bug, but a feature. Many implementations of the C++ standard libraries use their own memory pool allocators. Memory for quite a number of destructed objects is not immediately freed and given back to the OS, but kept in the pool(s) for later re-use. The fact that the pools are not freed at the exit of the program cause Valgrind to report this memory as still reachable. The behaviour not to free pools at the exit could be called a bug of the library though.
Hope that helps :)
The memory leak reported by valgrind does not appear to be in your code:
==2636== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==2636== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2636== by 0x4EC3EFF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==2636== by 0x40106B9: call_init.part.0 (dl-init.c:72)
==2636== by 0x40107CA: call_init (dl-init.c:30)
==2636== by 0x40107CA: _dl_init (dl-init.c:120)
This appears to be a heap allocation from within a constructor of a global object. (In theory, it could still come from your code if operator new is called as a tail call, so that it does not show up in the backtrace, but I don't see such an object declaration in your code.)
It is also not an actual leak, it is just some data allocated on the heap at program start. If you install debugging information for libstdc++, then you might get a hint of what is actually being allocated. Then you could also set a breakpoint on call_init and step through the early process initialization, to see the constructors that are called.
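One way to convince yourself that the destructor itself is fine, independently of what the runtime allocates at startup, is to count live nodes. The sketch below is an assumption-laden reconstruction: the node layout, the static live counter, and leakedNodesAfterDestroy are all hypothetical, and only the destructor loop is taken from the question.

```cpp
#include <cassert>

// Hypothetical node type with a counter of live instances.
struct node {
    int value;
    node* prev;
    node* next;
    static int live;            // nodes currently allocated
    explicit node(int v) : value(v), prev(nullptr), next(nullptr) { ++live; }
    ~node() { --live; }
};
int node::live = 0;

class Deque {
public:
    Deque() : left_p(nullptr), right_p(nullptr) {}
    ~Deque()
    {
        // Same loop as the destructor in the question.
        while (left_p) {
            node* temp = left_p;
            left_p = left_p->next;
            delete temp;
        }
        right_p = nullptr;
    }
    Deque(const Deque&) = delete;
    Deque& operator=(const Deque&) = delete;

    void push_left(int v)
    {
        node* n = new node(v);
        n->next = left_p;
        if (left_p) left_p->prev = n; else right_p = n;
        left_p = n;
    }

private:
    node* left_p;
    node* right_p;
};

// Number of nodes still alive after a Deque holding `count` elements
// has gone out of scope; 0 means the destructor freed all of them.
int leakedNodesAfterDestroy(int count)
{
    assert(count >= 0);
    {
        Deque dq;
        for (int i = 0; i < count; ++i) dq.push_left(i);
    }   // dq is destroyed here
    return node::live;
}
```

A result of 0 from leakedNodesAfterDestroy is consistent with the Valgrind report, where the one remaining block is attributed to libstdc++ initialization rather than to any node.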

GMP mpf functions causing a segmentation fault

I cannot figure out what is causing this error. I just installed GMP on Ubuntu. This is a 64-bit OS on an AMD CPU (not sure if it matters). I keep getting a segmentation fault.
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
#include <time.h>
int main(int argc, char** argv)
{
mpz_t sum, fac;
mpf_t fsum, ffac;
int i;
time_t t;
mpz_init_set_ui(sum, 1);
mpz_init_set_ui(fac, 1);
t = time(NULL);
for(i = 10000; i >= 1; --i)
{
mpz_mul_ui(fac, fac, i);
mpz_add(sum, sum, fac);
if(i % 10000 == 0)
{
printf("%d\n", i);
}
}
printf("Time %d\n", (time(0) - t));
mpf_init(fsum);
mpf_init(ffac);
mpf_set_z(fsum, sum);
mpf_set_z(ffac, fac);
mpz_clear(sum);
mpz_clear(fac);
mpf_div(fac, sum, fac);
mpf_out_str(stdout, 10, 50, fac);
mpf_clear(fsum);
mpf_clear(ffac);
return(EXIT_SUCCESS);
}
This code outputs the following...
10000
Time 0
Segmentation fault (core dumped)
I then tried to run this program with valgrind and this is the output.
==25427== Memcheck, a memory error detector
==25427== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==25427== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==25427== Command: /home/chase/NetBeansProjects/GmpECalc/dist/Debug/GNU-Linux-x86/gmpecalc
==25427==
10000
Time 1
==25427== Invalid read of size 8
==25427== at 0x4E8E590: __gmpn_copyi (in /usr/lib/x86_64-linux-gnu/libgmp.so.10.1.3)
==25427== by 0x400B27: main (main.c:40)
==25427== Address 0x73b0000073c is not stack'd, malloc'd or (recently) free'd
==25427==
==25427==
==25427== Process terminating with default action of signal 11 (SIGSEGV)
==25427== Access not within mapped region at address 0x73B0000073C
==25427== at 0x4E8E590: __gmpn_copyi (in /usr/lib/x86_64-linux-gnu/libgmp.so.10.1.3)
==25427== by 0x400B27: main (main.c:40)
==25427== If you believe this happened as a result of a stack
==25427== overflow in your program's main thread (unlikely but
==25427== possible), you can try to increase the size of the
==25427== main thread stack using the --main-stacksize= flag.
==25427== The main thread stack size used in this run was 8388608.
==25427==
==25427== HEAP SUMMARY:
==25427== in use at exit: 48 bytes in 2 blocks
==25427== total heap usage: 3,706 allocs, 3,704 frees, 27,454,096 bytes allocated
==25427==
==25427== LEAK SUMMARY:
==25427== definitely lost: 0 bytes in 0 blocks
==25427== indirectly lost: 0 bytes in 0 blocks
==25427== possibly lost: 0 bytes in 0 blocks
==25427== still reachable: 48 bytes in 2 blocks
==25427== suppressed: 0 bytes in 0 blocks
==25427== Rerun with --leak-check=full to see details of leaked memory
==25427==
==25427== For counts of detected and suppressed errors, rerun with: -v
==25427== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
The error seems to be occurring at the mpf_div function. However, if I remove this function the error will occur at mpf_out_str. I also tried initializing ffac and fsum to doubles (instead of setting them to the fac and sum) and I get the same error.
The problem is in these lines:
mpz_clear(sum); // You clear the variables, GMP deallocates their memory
mpz_clear(fac);
mpf_div(fac, sum, fac); // You use cleared variables, segfault
Maybe you meant:
mpf_div(ffac, fsum, ffac);
mpf_out_str(stdout, 10, 50, ffac); // the print call needs the mpf_t variable as well

Glib memory leak using valgrind investigation

I know there has been a similar thread here about this problem, and that it is explained on https://live.gnome.org/Valgrind. I wrote the simple program below:
#include <glib.h>
#include <glib/gprintf.h>
#include <iostream>
int main()
{
const gchar *signalfound = g_strsignal(1);
std::cout << signalfound<< std::endl;
return 0;
}
but when I checked it with valgrind using this command
G_DEBUG=gc-friendly G_SLICE=always-malloc valgrind --leak-check=full --leak-resolution=high ./g_strsignal
and here is the result
==30274== Memcheck, a memory error detector
==30274== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==30274== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==30274== Command: ./g_strsignal
==30274== Parent PID: 5201
==30274==
==30274==
==30274== HEAP SUMMARY:
==30274== in use at exit: 14,746 bytes in 18 blocks
==30274== total heap usage: 24 allocs, 6 frees, 23,503 bytes allocated
==30274==
==30274== LEAK SUMMARY:
==30274== definitely lost: 0 bytes in 0 blocks
==30274== indirectly lost: 0 bytes in 0 blocks
==30274== possibly lost: 0 bytes in 0 blocks
==30274== still reachable: 14,746 bytes in 18 blocks
==30274== suppressed: 0 bytes in 0 blocks
==30274== Reachable blocks (those to which a pointer was found) are not shown.
==30274== To see them, rerun with: --leak-check=full --show-reachable=yes
==30274==
==30274== For counts of detected and suppressed errors, rerun with: -v
==30274== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I noticed that valgrind said "Reachable blocks (those to which a pointer was found) are not shown.". I then checked the corresponding function in the gmem.c source, since I am using glib-2.35.4, and found the following code:
gpointer
g_malloc (gsize n_bytes)
{
if (G_LIKELY (n_bytes))
{
gpointer mem;
mem = glib_mem_vtable.malloc (n_bytes);
TRACE (GLIB_MEM_ALLOC((void*) mem, (unsigned int) n_bytes, 0, 0));
if (mem)
return mem;
g_error ("%s: failed to allocate %"G_GSIZE_FORMAT" bytes",
G_STRLOC, n_bytes);
}
TRACE(GLIB_MEM_ALLOC((void*) NULL, (int) n_bytes, 0, 0));
return NULL;
}
And my question is
Is it still normal for valgrind to say "Reachable blocks (those to which a pointer was found) are not shown."? I think this statement refers to the g_malloc function above, which returns mem, a gpointer variable. Is that right?
If not, are there any alternatives for resolving the "still reachable: 14,746 bytes in 18 blocks" that valgrind reported above?
I'm running x86 fedora 18
thanks
It most likely refers to dynamically allocated memory returned by the function g_strsignal().
valgrind says "Reachable blocks...." because a valid pointer (signalfound) still points to the dynamically allocated memory.
If Valgrind finds that a pointer to dynamic memory has been lost (overwritten), it reports a "definite leak...", since it can conclusively say that the block can never be freed. In your case the pointer still points to the block, so valgrind does not assume it is lost; it assumes this is probably by design.
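The two cases can be sketched in a few lines. This is not GLib code; g_cache, cachedMessage, releaseCache and loseBlock are hypothetical names chosen for illustration, and the classification in the comments is what Memcheck would normally report for an unoptimized build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// File-scope pointer: whatever it points to at exit is "still reachable".
static char* g_cache = nullptr;

// Allocates once and keeps the pointer for the life of the process.
const char* cachedMessage()
{
    if (g_cache == nullptr) {
        g_cache = new char[6];
        std::memcpy(g_cache, "hello", 6);   // 5 characters plus the terminator
    }
    return g_cache;
}

// Optional cleanup; without it the block shows up as still reachable.
void releaseCache()
{
    delete[] g_cache;
    g_cache = nullptr;
}

// Overwrites the only pointer to a block, so nothing can free it later.
// Memcheck would normally classify this block as "definitely lost".
void loseBlock(std::size_t bytes)
{
    assert(bytes > 0);
    char* p = new char[bytes];
    p[0] = '\0';
    p = nullptr;    // the last pointer to the block is gone
    (void)p;
}
```

A program that only calls cachedMessage and exits would be expected to show the allocation under "still reachable", while calling loseBlock would move the corresponding bytes to "definitely lost".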