std::list copy to std::vector skipping elements - c++

I've run across a rather bizarre exception while running C++ code in my objective-C application. I'm using libxml2 to read an XSD file. I then store the relevant tags as instances of the Tag class in an std::list. I then copy this list into an std::vector using an iterator on the list. However, every now and then some elements of the list aren't copied to the vector. Any help would be greatly appreciated.
printf("\n length list = %lu, length vector = %lu\n",XSDFile::tagsList.size(), XSDFile::tags.size() );
std::list<Tag>::iterator it = XSDFile::tagsList.begin();
//result: length list = 94, length vector = 0
/*
for(;it!=XSDFile::tagsList.end();++it)
{
XSDFile::tags.push_back(*it); //BAD_ACCESS code 1 . . very bizarre . . . . 25
}
*/
std::copy (XSDFile::tagsList.begin(), XSDFile::tagsList.end(), std::back_inserter (XSDFile::tags));
printf("\n Num tags in vector = %lu\n", XSDFile::tags.size());
if (XSDFile::tagsList.size() != XSDFile::tags.size())
{
printf("\n length list = %lu, length vector = %lu\n",XSDFile::tagsList.size(), XSDFile::tags.size() );
//result: length list = 94, length vector = 83
}

I've found the problem. The memory was corrupted causing the std::list to become corrupted during the parsing of the XSD. I parse the XSD using a function start_element.
xmlSAXHandler handler = {0};
handler.startElement = start_element;
I used malloc guard in xcode to locate the use of freed memory. It pointed to the line:
std::strcpy(message, (char*)name);
So I removed the malloc (actually commented in the code) and it worked. The std::vector now consistently copies all 94 entries of the list. If anyone has an explanation as to why this worked that would be great.
static void start_element(void * ctx, const xmlChar *name, const xmlChar **atts)
{
// int len = strlen((char*)name);
// char *message = (char*)malloc(len*sizeof(char));
// std::strcpy(message, (char*)name);
if (atts != NULL)
{
// atts[0] = type
// atts[1] = value
// len = strlen((char*)atts[1]);
// char *firstAttr = (char*)malloc(len*sizeof(char));
// std::strcpy(firstAttr, (char*)atts[1]);
if(strcmp((char*)name, "xs:include")==0)
{
XSDFile xsd;
xsd.ReadXSDTypes((char*)atts[1]);
}
else if(strcmp((char*)name, "xs:element")==0)
{
doElement(atts);
}
else if(strcmp((char*)name, "xs:sequence")==0)
{
//set the default values
XSDFile::sequenceMin = XSDFile::sequenceMax = 1;
if (sizeof(atts) == 4)
{
if(strcmp((char*)atts[3],"unbounded")==0)
XSDFile::sequenceMax = -1;
int i = 0;
while(atts[i] != NULL)
{
//atts[i] = name
//atts[i+i] = value
std::string name((char*)atts[i]);
std::string value((char*)atts[i+1]);
if(name=="minOccurs")
XSDFile::sequenceMin = (atoi(value.c_str()));
else if(name=="maxOccurs")
XSDFile::sequenceMax = (atoi(value.c_str()));
i += 2;
}
}
}
}
//free(message);
}

Related

MariaDB Connector C, mysql_stmt_fetch_column() and memory corruption

I'm working on a wrapper for MariaDB Connector C. There is a typical situation when a developer doesn't know a length of a data stored in a field. As I figured out, one of the ways to obtain a real length of the field is to pass a buffer of lengths to mysql_stmt_bind_result and then to fetch each column by calling mysql_stmt_fetch_column. But I can't understand how the function mysql_stmt_fetch_column works because I'm getting a memory corruption and app abortion.
Here is how I'm trying to reach my goal
// preparations here
...
if (!mysql_stmt_execute(stmt))
{
int columnNum = mysql_stmt_field_count(stmt);
if (columnNum > 0)
{
MYSQL_RES* metadata = mysql_stmt_result_metadata(stmt);
MYSQL_FIELD* fields = mysql_fetch_fields(metadata);
MYSQL_BIND* result = new MYSQL_BIND[columnNum];
std::memset(result, 0, sizeof (MYSQL_BIND) * columnNum);
std::vector<unsigned long> lengths;
lengths.resize(columnNum);
for (int i = 0; i < columnNum; ++i)
result[i].length = &lengths[i];
if (!mysql_stmt_bind_result(stmt, result))
{
while (true)
{
int status = mysql_stmt_fetch(stmt);
if (status == 1)
{
m_lastError = mysql_stmt_error(stmt);
isOK = false;
break;
}
else if (status == MYSQL_NO_DATA)
{
isOK = true;
break;
}
for (int i = 0; i < columnNum; ++i)
{
my_bool isNull = true;
if (lengths.at(i) > 0)
{
result[i].buffer_type = fields[i].type;
result[i].is_null = &isNull;
result[i].buffer = malloc(lengths.at(i));
result[i].buffer_length = lengths.at(i);
mysql_stmt_fetch_column(stmt, result, i, 0);
if (!isNull)
{
// here I'm trying to read a result and I'm getting a valid result only from the first column
}
}
}
}
}
}
If I put an array to the mysql_stmt_fetch_column then I'm fetching the only first field valid, all other fields are garbage. If I put a single MYSQL_BIND structure to this function, then I'm getting an abortion of the app on approximately 74th field (funny thing that it's always this field). If I use another array of MYSQL_BIND then the situation is the same as the first case.
Please help me to understand how to use it correctly! Thanks
Minimal reproducible example

Golang CGO with large char pointer - SEGSERV

I have a large amount of data being read from TagLib library and passed to GoLang (mpeg image data).
Here is where data is fetched:
void audiotags_mpeg_artwork(TagLib::MPEG::File *mpegFile, int id) {
TagLib::ID3v2::Tag *id3v2 = mpegFile->ID3v2Tag(false);
if (id3v2!=nullptr) {
const TagLib::ID3v2::FrameList frameList = id3v2->frameListMap()["APIC"];
for(auto it = frameList.begin(); it != frameList.end(); it++) {
TagLib::ID3v2::AttachedPictureFrame * frame = (TagLib::ID3v2::AttachedPictureFrame *)(*it);
if (frame!=nullptr && frame->size() > 0) {
const auto &apicBase64 = frame->picture().toBase64();
auto len = apicBase64.size();
if (len > 0) {
// Generate memory for key
char* key = new char[5];
memcpy(key, "APIC", 4);
key[4]='\0';
// Generate memory for picture data
char* val = new char[len];
memcpy (val, apicBase64.data(), len);
// Send to GoLang
go_map_audiotags(id, key, val);
// Free memory
delete[] key;
delete[] val;
}
}
}
}
}
At this point, go_map_autotags works (I use a similar method for other data). This also works for other picture data, however depending on the size this will crash with:
unexpected fault address 0x766a000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x766a000 pc=0x404530b]
Within GoLang, I have the following export:
//export go_map_audiotags
func go_map_audiotags(id C.int, key *C.char, val *C.char) {
m := maps[int(id)]
k := strings.ToLower(C.GoString(key))
log.Println("go_map_audiotags k:", k) // <--- works
v := C.GoString(val) // <--- crashes
log.Println("go_map_audiotags v:", v) // <--- Does not reach
m[k] = v
}
Is there a bette way I should be transporting this data? I assume what's happening is:
1) The C.char limit be being reached
2) C++ is, for some reason, recycling the memory before setting v in GoLang
The data stored in val is not null-terminated. In your C code, when you make a copy using memcpy, the null terminator is not included. In the C code, change the code to:
// Generate memory for picture data
char* val = new char[len+1];
memcpy (val, apicBase64.data(), len);
val[len] = '\0';

c++ find and handle each lines is slower than php

Hello i created a program to handle a config file line by checking each lines and get the config blocks but for first time i made it with php and the speed was amazing. we have some blocks like this
Block {
}
php program can read each line and detect about 50,000 of this blocks in just 1 second after that i went to c++ to create my program in c++ but i saw a very very bad problem. my program was too slow (read 50,000 of this blocks in 55 seconds) while my php codes was exactly the same of c++ codes (in action and activity). php was 55x faster than c++ while the codes are the same.
this is my code in php
const PATH = "conf.txt";
if(!file_exists(PATH)) die("path_not_found");
if(!is_readable((PATH))) die("path_not_readable");
$Lines = explode("\r\n", file_get_contents(PATH));
class Block
{
public $Name;
public $Keys = array();
public $Blocks = array();
}
function Handle(& $Lines, $Start, & $Return_block, & $End_on)
{
for ($i = $Start; $i < count($Lines); $i++)
{
while (trim($Lines[$i]) != "")
{
$Pos1 = strpos($Lines[$i], "{");
$Pos2 = strpos($Lines[$i], "}");
if($Pos1 !== false && ($Pos2 === false || $Pos2 > $Pos1)) // Detect { in less position
{
$thisBlock = new Block();
$thisBlock->Name = trim(substr($Lines[$i], 0, $Pos1));
$Lines[$i] = substr($Lines[$i], $Pos1 + 1);
Handle($Lines, $i, $thisBlock, $i);
$Return_block->Blocks[] = $thisBlock;
}
else { // Detect } in less position than {
$Lines[$i] = substr($Lines[$i], $Pos2 + 1);
$End_on = $i;
return;
}
}
}
}
$DefaultBlock = new Block();
Handle($Lines, 0, $DefaultBlock, $NullValue);
$OutsideKeys = $DefaultBlock->Keys;
$Blocks = $DefaultBlock->Blocks;
echo "Found (".count($OutsideKeys).") keys and (".count($Blocks).") blocks.<br><br>";
and this is my code in C++
string Trim(string & s)
{
auto wsfront = std::find_if_not(s.begin(), s.end(), [](int c) {return std::isspace(c); });
auto wsback = std::find_if_not(s.rbegin(), s.rend(), [](int c) {return std::isspace(c); }).base();
return (wsback <= wsfront ? std::string() : std::string(wsfront, wsback));
}
class Block
{
private:
string Name;
vector <Block> Blocks;
public:
void Add(Block & thisBlock) { Blocks.push_back(thisBlock); }
Block(string Getname = string()) { Name = Getname; }
int Count() { return Blocks.size(); }
};
void Handle(vector <string> & Lines, size_t Start, Block & Return, size_t & LastPoint, bool CheckEnd = true)
{
for (size_t i = Start; i < Lines.size(); i++)
{
while (Trim(Lines[i]) != "")
{
size_t Pos1 = Lines[i].find("{");
size_t Pos2 = Lines[i].find("}");
if (Pos1 != string::npos && (Pos2 == string::npos || Pos1 < Pos2)) // Found {
{
string Name = Trim(Lines[i].substr(0, Pos1));
Block newBlock = Block(Name);
Lines[i] = Lines[i].substr(Pos1 + 1);
Handle(Lines, i, newBlock, i);
Return.Add(newBlock);
}
else { // Found }
Lines[i] = Lines[i].substr(Pos2 + 1);
return;
}
}
}
}
int main()
{
string Cont;
___PATH::GetFileContent("D:\\conf.txt", Cont);
vector <string> Lines = ___String::StringSplit(Cont, "\r\n");
Block Return;
size_t Temp;
// The problem (low handle speed) start from here not from including or split
Handle(Lines, 0, Return, Temp);
cout << "Is(" << Return.Count() << ")" << endl;
return 0;
}
as you can see, this codes are exactly the same in action but i don't know why php handling in this code is 55x faster than my c++ codes. you can create a txt file and create about 50,000 of this block's
Block {
}
and test it yourself. please help me to fix this. i am really confused (same codes but not same performance
php = 50,000 blocks and detect in 1 second
c++ = 50,000 blocks and detect in 55 seconds (and maybe more) !
i have no problem in my program design. because i got my performance completely on php but my problem is on c++ that is 55x slower than php in same code action !
i am using (visual studio 2017) to compile this program (c++)
First, "code" is singular, not plural.
C++ is a very different language than php. It is not "the same code", and it is nowhere near the same in action.
For example, these two lines:
Block newBlock = Block(Name);
Return.Add(newBlock);
First create a Block on the stack, and then call Block's copy constructor to make another one inside the vector. You then throw away the stack object.
Also, vectors guarantee that they are contiguous, so as you add new Blocks via your Add method, vector will occasionally stop, allocate another chunk of memory (twice as big as the last one, iirc), copy everything over to that new chunk, and then free the old one. Either preallocate the vector (via vector::reserve()), or consider using something like a deque that doesn't guarantee continuity in memory if you don't need that property.
I also don't know what ___String::StringSplit does, but you are almost certain to have the same vector growth problem in reading your file.
Culprit is in these 2 lines:
Handle(Lines, i, newBlock, i);
Return.Add(newBlock);
Let's say you have 5 levels of 1 block each. What Happens on bottom one? You copy one instance of block. What happens on level 4? You copy 2 blocks (parent and its child). So for level 5 you make 15 copies - 1+2+3+4+5. Look at this diagram:
Handle level1 copies 5 blocks (`Return`->level4->level3->level4->level5)
Handle level2 copies 4 blocks (`Return`->level3->level4->level5)
Handle level3 copies 3 blocks (`Return`->level4->level5
Handle level4 copies 2 blocks (`Return`->level5)
Handle level5 copies 1 block (`Return`)
Formula is:
S = ( N + N^2 ) / 2
so for levels 20 you would do 210 copies and so on.
Suggestion is to use move semantics to avoid this copy:
// change method Add to this
void Add(Block thisBlock) { Blocks.push_back(std::move(thisBlock)); }
// and change this call
Return.Add( std::move( newBlock ) );
Or allocate blocks dynamically using smart pointers
Out of simple curiousity, try this Trim implementation instead:
void _Trim(std::string& result, const std::string& s) {
const auto* ptr = s.data();
const auto* left = ptr;
const auto* end = s.data() + s.size();
while (ptr < end && std::isspace(*ptr)) {
++ptr;
}
if (ptr == end) {
result = "";
return;
}
left = ptr;
while (end > left && std::isspace(*(end-1))) {
--end;
}
result = std::string(left, end);
}
std::string Trim(const std::string& s) {
// Not sure if RVO would fire for direct implementation of _Trim here
std::string result;
_Trim(result, s);
return result;
}
And another optimization:
void Add(Block& thisBlock) {
Blocks.push_back(std::move(thisBlock));
}
// Don't use thisBlock after call to this function. It is
// far from being pretty but it should avoid *lots* of copies.
I wonder if you'll get better result. Pls let me know.

Relocating memory for variable array size

Let's say we have got an array of villages. Each village is unique with regards to the array. A village can have multiple stores. The problem is that you only have limited (dynamic) memory and cannot recklessly initialise memory that you will not use. Therefore, If you create a new store in a village, you will need to relocate your village to a different place in the memory since the size of the village has grown. If you add a new village, you will have to relocate as well in order to accumulate for the extra size.
The problem is that you cannot simply use the following:
pVillage = (Village *)realloc( pVillage, ++MAX_VILLAGE * sizeof(Village));
This is because you will create memory for x villages containing 1 store per village. Since it could be the case that existing villages have multiple stores, the memory allocation will fail and could possibly overwrite memory which you shouldn't touch!
My example c++ code is as such:
class Store
{
char* m_pStoreName;
int m_nAmountOfProducts;
public:
Store(char* name, int number);
Store();
};
Store::Store(char* name, int number)
{
char *m_pStoreName = new char[10];
strcpy( m_pStoreName, name );
m_nAmountOfProducts = number;
}
Store::Store()
{
}
int nMaxStore = 1;
class Village
{
int m_nAmountOfStores;
Store *m_pStoreDb = (Store *)malloc( nMaxStore * sizeof(Store) );
public:
char* m_pVillageName;
Village(char* name);
Village();
addStore(Store* st);
};
Village::Village(char* name)
{
char *m_pVillageName = new char[10];
strcpy(m_pVillageName, name);
m_nAmountOfStores = 0;
}
Village::Village()
{
}
Village::addStore(Store* st)
{
if ( m_nAmountOfStores == nMaxStore )
{
//Need more memory
m_pStoreDb = (Store *)realloc( m_pStoreDb, ++nMaxStore * sizeof(Store) ); //Realocate and copy memory
}
//Add to db
*( m_pStoreDb + m_nAmountOfStores++ ) = *st;
}
int nAmountVillages = 0;
int nMaxVillages = 1;
Village *pVillageDB = (Village *)malloc( nMaxVillages * sizeof(Village) );
int main()
{
addNewVillage("Los Angeles", new Store("Store 1", 20));
addNewVillage("San Fransisco", new Store("Store 1", 10 ));
addNewVillage("New York", new Store("Store 1", 15));
addNewVillage("Los Angeles", new Store("Store 2", 12));
addNewVillage("Los Angeles", new Store("Store 3", 22));
addNewVillage("Amsterdam", new Store("Store 1", 212));
addNewVillage("Los Angeles", new Store("Store 4", 2));
return 0;
}
void addNewVillage(char* villageName, Store *store)
{
for ( int x = 0; x < nAmountVillages; x++ )
{
Village *pVil = (pVillageDB + x);
if ( !strcmp(pVil->m_pVillageName, villageName) )
{
//Village found, now add store
pVil->addStore( store );
return;
}
}
//Village appears to be new, add new one after expanding memory
if ( nAmountVillages == nMaxVillages )
{
//Need more memory
pVillageDB = (Village *)realloc( pVillageDB, ++nMaxVillages * sizeof(Village) ); //Realocate and copy memory
}
//Create new village and add store to village
Village *pNewVil = new Village( villageName );
pNewVil->addStore( store );
//Add village to DB
(pVillageDB + nAmountVillages++ ) = pNewVil;
}
My problem, question is:
What is the correct way to dynamically enlarge an array of custom
objects if not all objects have the same size, or if an existing
object in this array grows in size (new stores in a village)?
I have previously created the following function:
int Village::calculateStoreSize()
{
int nSize = 0;
for ( int x = 0; x < m_nAmountOfStores; x++ )
{
Store *pStore = ( m_pStoreDb + x );
nSize += sizeof(*pStore);
}
return nSize;
}
I tried using it in the following function to relocate the store array in a village object:
m_pStoreDb = (Store *)realloc( m_pStoreDb, Village::calculateStoreSize() + sizeof(Store) );
Unfortunately, when adding a new village, everything goes wrong with memory!
There are two types of arrays in C/C++: Static and Dynamic.
Unfortunately even though the latter one suggests that the size can be "easily adjusted" via calls like realloc(), the truth is, always reallocating all the objects of an array just to increase the size by 1 is tedious and should thus be avoided. However there is a workaround!
If I understand your question correctly, you want to dynamically allocate memory for objects of varying size. The casual way to do so is via linked lists. The most basic linked list would look something like this:
struct store {
char* name;
int products;
store* next;
};
class MyLinkedList {
private:
store* first;
public:
//Constructor
MyLinkedList();
//Destructor: iterate through the list and call delete on all elements.
~MyLinkedList();
void Add(const char* name, int products) {
if(this->first == NULL) {
store* new_store = new store;
new_store->name = new char[30];
strcpy(new_store->name, name);
new_store->products = products;
new_store->next = NULL;
this->first = new_store;
} else {
store* iter = this->first;
while(iter->next != NULL) {
iter = iter->next;
}
//Now iter points at the last element within the linked list.
store* new_store = new store;
new_store->name = new char[30];
strcpy(new_store->name, name);
new_store->products = products;
new_store->next = NULL;
iter->next = new_store;
}
}
//...
};
If you don't want to implement your own, you should have a look at the Standard Template Library's std::vector.
See Stackoverflow - Linked Lists in C++ or Wikipedia - Linked Lists for more detail on linked lists, or to get a good description of the STL's std::vector see cpluslus.com - std::vector.
Hope I could help, cheers!
lindebear
PS: You can also nest linked lists, see here.

Function has corrupt return value

I have a situation in Visual C++ 2008 that I have not seen before. I have a class with 4 STL objects (list and vector to be precise) and integers.
It has a method:
inline int id() { return m_id; }
The return value from this method is corrupt, and I have no idea why.
debugger screenshot http://img687.imageshack.us/img687/6728/returnvalue.png
I'd like to believe its a stack smash, but as far as I know, I have no buffer over-runs or allocation issues.
Some more observations
Here's something that puts me off. The debugger prints right values in the place mentioned // wrong ID.
m_header = new DnsHeader();
assert(_CrtCheckMemory());
if (m_header->init(bytes, size))
{
eprintf("0The header ID is %d\n", m_header->id()); // wrong ID!!!
inside m_header->init()
m_qdcount = ntohs(h->qdcount);
m_ancount = ntohs(h->ancount);
m_nscount = ntohs(h->nscount);
m_arcount = ntohs(h->arcount);
eprintf("The details are %d,%d,%d,%d\n", m_qdcount, m_ancount, m_nscount, m_arcount);
// copy the flags
// this doesn't work with a bitfield struct :(
// memcpy(&m_flags, bytes + 2, sizeof(m_flags));
//unpack_flags(bytes + 2); //TODO
m_init = true;
}
eprintf("Assigning an id of %d\n", m_id); // Correct ID.
return
m_header->id() is an inline function in the header file
inline int id() { return m_id; }
I don't really know how best to post the code snippets I have , but here's my best shot at it. Please do let me know if they are insufficient:
Class DnsHeader has an object m_header inside DnsPacket.
Main body:
DnsPacket *p ;
p = new DnsPacket(r);
assert (_CrtCheckMemory());
p->add_bytes(buf, r); // add bytes to a vector m_bytes inside DnsPacket
if (p->parse())
{
read_packet(sin, *p);
}
p->parse:
size_t size = m_bytes.size(); // m_bytes is a vector
unsigned char *bytes = new u_char[m_bytes.size()];
copy(m_bytes.begin(), m_bytes.end(), bytes);
m_header = new DnsHeader();
eprintf("m_header allocated at %x\n", m_header);
assert(_CrtCheckMemory());
if (m_header->init(bytes, size)) // just set the ID and a bunch of other ints here.
{
size_t pos = DnsHeader::SIZE; // const int
if (pos != size)
; // XXX perhaps generate a warning about extraneous data?
if (ok)
m_parsed = true;
}
else
{
m_parsed = false;
}
if (!ok) {
m_parsed = false;
}
return m_parsed;
}
read_packet:
DnsHeader& h = p.header();
eprintf("The header ID is %d\n", h.id()); // ID is wrong here
...
DnsHeader constructor:
m_id = -1;
m_qdcount = m_ancount = m_nscount = m_arcount = 0;
memset(&m_flags, 0, sizeof(m_flags)); // m_flags is a struct
m_flags.rd = 1;
p.header():
return *m_header;
m_header->init: (u_char* bytes, int size)
header_fmt *h = (header_fmt *)bytes;
m_id = ntohs(h->id);
eprintf("Assigning an id of %d/%d\n", ntohs(h->id), m_id); // ID is correct here
m_qdcount = ntohs(h->qdcount);
m_ancount = ntohs(h->ancount);
m_nscount = ntohs(h->nscount);
m_arcount = ntohs(h->arcount);
You seem to be using a pointer to an invalid class somehow. The return value shown is the value that VS usually uses to initialize memory with:
2^32 - 842150451 = 0xCDCDCDCD
You probably have not initialized the class that this function is a member of.
Without seeing more of the code in context.. it might be that the m_id is out of the scope you expect it to be in.
Reinstalled VC++. That fixed everything.
Thank you for your time and support everybody! :) Appreciate it!