VS2010 C code - String pooling - c++

The code below crashes in VS2010 when compiled with the following flags; if I add /GF- or remove the optimization flag, it doesn't crash. The crash occurs in the assembly generated for 'if( path[i] == '/' )'. I'd like to understand which optimization the compiler performs here that leads to the crash. Looking forward to some pointers.
-Karthik
cl.exe /MD /O2 test.c
// Test.c
#include <stdio.h>
#include <string.h>
void testpath(char* path, int bufsiz)
{
    int i;
    printf("%p\n", path);
    for( i=0; i < strlen(path); i++ ) {
        if( path[i] == '/' ) {
            path[i] = '\\';
        }
    }
}
int main()
{
    const char* path = "testexport.prj";
    char *path1 = "testexport.prj";
    printf("%p\n", path);
    printf("%p\n", path1);
    testpath(path, 1024);
}

Trying to modify the contents of a string literal invokes Undefined Behaviour.
From ISO C99 (Section 6.4.5/6)
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined
From ISO C++-98 (Section 2.13.4/2)
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
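On most implementations this results in a crash of your application. With MSVC specifically, /GF (string pooling, which /O1 and /O2 turn on) merges identical literals into a single read-only copy, so the write faults. With /GF- or without optimization, the literals end up in writable memory and the bug simply goes unnoticed.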

You try to modify a string literal, that's undefined behavior.
const char* path = "testexport.prj";
testpath(path, 1024);
// then later:
void testpath(char* path, int bufsiz)
{
    int i;
    for( i=0; i<strlen(path); i++ ) {
        if( path[i] == '/' ) {
            path[i] = '\\';// <<<<<< UB here
        }
    }
}
String literals are usually stored in read-only memory, so on your implementation an attempt to modify a string literal results in an access violation that crashes your program.

We have an application where, in VC6.0, there appears to be a bug in string pooling: two different strings appear to be pooled into one string, causing a crash. The crash does not occur in VS2010, nor in VC6.0 if /Zi is used instead of /ZI. I'm wondering whether string pooling is on by default in VC6.0 but not in VS2010, in which case the string pooling bug could still exist. If there are MANY strings that are nearly identical, my working theory (not yet vetted) is that an undetected hash collision in the string pooling eliminates one of the two strings. This would be visible in the generated assembly when looking at the pointers for two nearly identical strings.
In our case we are not modifying strings in VC6.0, just referencing them.

Related

How to create a function that removes all of a selected character in a C-string?

I want to make a function that removes all the characters of ch in a c-string.
But I keep getting an access violation error.
Unhandled exception at 0x000f17ba in testassignments.exe: 0xC0000005: Access violation writing location 0x000f787e.
void removeAll(char* &s, const char ch)
{
    int len=strlen(s);
    int i,j;
    for(i = 0; i < len; i++)
    {
        if(s[i] == ch)
        {
            for(j = i; j < len; j++)
            {
                s[j] = s[j + 1];
            }
            len--;
            i--;
        }
    }
    return;
}
I expected the c-string to not contain the character "ch", but instead, I get an access violation error.
In the debug I got the error on the line:
s[j] = s[j + 1];
I tried to modify the function but I keep getting this error.
Edit--
Sample inputs:
s = "abmas$sachus#settes";
ch = 'e': abmas$sachus#settes becomes abmas$sachus#stts
ch = 't': abmas$sachus#stts becomes abmas$sachus#ss
Instead of producing those outputs, I get the access violation error.
Edit 2:
If it's any help, I am using Microsoft Visual C++ 2010 Express.
Apart from the inefficiency of your function shifting the entire remainder of the string whenever encountering a single character to remove, there's actually not much wrong with it.
In the comments, people have assumed that you are reading off the end of the string with s[j+1], but that is untrue. They are forgetting that s[len] is completely valid because that is the string's null-terminator character.
So I'm using my crystal ball now, and I believe that the error is because you're actually running this on a string literal.
// This is NOT okay!
char* str = "abmas$sachus#settes";
removeAll(str, 'e');
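This code above is (sort of) not legal. The string literal "abmas$sachus#settes" has type const char[20] in C++ and should not be stored as a non-const char*. But for backward compatibility with C, where this is allowed (provided you don't attempt to modify the string), C++03 only deprecated the conversion, so compilers generally issue a warning instead of an error. (C++11 removed the conversion altogether, though many compilers still accept it with a warning.)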
However, you are really not allowed to modify the string. And your program is crashing the moment you try.
If you were to use the correct approach with a char array (which you can modify), then you have a different problem:
// This will result in a compiler error
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Results in
error: invalid initialization of non-const reference of type ‘char*&’ from an rvalue of type ‘char*’
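So why is that? Well, your function takes a char*& (a non-const reference to a pointer), which forces the caller to pass an actual pointer variable. An array name only converts to a temporary pointer value, and a non-const reference cannot bind to a temporary. By taking a reference, the function is making a contract that states "I can modify your pointer if I want to", even if it never does.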
There are two ways you can fix that error:
The TERRIBLE PLEASE DON'T DO THIS way:
// This compiles and works but it's not cool!
char str[] = "abmas$sachus#settes";
char *pstr = str;
removeAll(pstr, 'e');
The reason I say this is bad is because it sets a dangerous precedent. If the function actually did modify the pointer in a future "optimization", then you might break some code without realizing it.
Imagine that you later want to output the string with the characters removed, but the first character was removed and your function decided to modify the pointer to start at the second character instead. Now if you output str, you'll get a different result from using pstr.
And this example only assumes that you're storing the string in an array. Imagine if you had actually allocated the string dynamically, like this:
char *str = new char[strlen("abmas$sachus#settes") + 1];
strcpy(str, "abmas$sachus#settes");
removeAll(str, 'e');
Then if removeAll changes the pointer, you're going to have a BAD time when you later clean up this memory with:
delete[] str; //<-- BOOM!!!
The I ACKNOWLEDGE MY FUNCTION DEFINITION IS BROKEN way:
Quite simply, your function definition should take a pointer, not a reference to a pointer:
void removeAll(char* s, const char ch)
This means you can call it on any modifiable block of memory, including an array. And you can be comforted by the fact that the caller's pointer will never be modified.
Now, the following will work:
// This is now 100% legit!
char str[] = "abmas$sachus#settes";
removeAll(str, 'e');
Now that my free crystal-ball reading is complete, and your problem has gone away, let's address the elephant in the room:
Your code is needlessly inefficient!
You do not need to do the first pass over the string (with strlen) to calculate its length.
The inner loop effectively gives your algorithm a worst-case time complexity of O(N^2).
The little tricks modifying len and, worse than that, the loop variable i make your code more complex to read.
What if you could avoid all of these undesirable things!? Well, you can!
Think about what you're doing when removing characters. Essentially, the moment you have removed one character, then you need to start shuffling future characters to the left. But you do not need to shuffle one at a time. If, after some more characters you encounter a second character to remove, then you simply shunt future characters further to the left.
What I'm trying to say is that each character only needs to move once at most.
There is already an answer demonstrating this using pointers, but it comes with no explanation and you are also a beginner, so let's use indices because you understand those.
The first thing to do is get rid of strlen. Remember, your string is null-terminated. All strlen does is search through characters until it finds the null byte (otherwise known as 0 or '\0')...
[Note that real implementations of strlen are super smart (i.e. much more efficient than searching single characters at a time)... but of course, no call to strlen is faster]
All you need is your loop to look for the null terminator, like this:
for(i = 0; s[i] != '\0'; i++)
Okay, and now to ditch the inner loop, you just need to know where to stick each new character. How about just keeping a variable new_size in which you are going to count up how long the final string is.
void removeAll(char* s, char ch)
{
    int new_size = 0;
    for(int i = 0; s[i] != '\0'; i++)
    {
        if(s[i] != ch)
        {
            s[new_size] = s[i];
            new_size++;
        }
    }
    // You must also null-terminate the string
    s[new_size] = '\0';
}
If you look at this for a while, you may notice that it might do pointless "copies". That is, if i == new_size there is no point in copying characters. So, you can add that test if you want. I will say that it's likely to make little performance difference, and potentially reduce performance because of additional branching.
But I'll leave that as an exercise. And if you want to dream about really fast code and just how crazy it gets, then go and look at the source code for strlen in glibc. Prepare to have your mind blown.
You can make the logic simpler and more efficient by writing the function like this:
void removeAll(char * s, const char charToRemove)
{
    const char * readPtr = s;
    char * writePtr = s;
    while (*readPtr) {
        if (*readPtr != charToRemove) {
            *writePtr++ = *readPtr;
        }
        readPtr++;
    }
    *writePtr = '\0';
}

Why can't I read apostrophes using ifstream without it crashing?

I'm using this code:
std::string word;
std::ifstream f((file_name + ".txt").c_str());
while (f >> word) {
    good_input = true;
    for (int i = 0; i < word.length(); ++i) {
        if (ispunct(word.at(i))) {
            word.erase(i--, 1);
        }
        else if (isupper(word.at(i))){
            word.at(i) = tolower(word.at(i));
        }
    }
}
Every time I read the word "doesn't" from a text file, I get this error:
Debug Assertion Failed!
Program: directory\SortingWords(Length).exe
File: minkernel\crts\ucrt\src\appcrt\convert\istype.cpp
Line: 36
Expression: c >= -1 && c <= 255
For more information please visit... [etc.]
When I click "abort", my program exits with code 3. Don't know if that's helpful?
It looks like it's got something to do with the apostrophe, maybe? This code works fine for all the other words in my document up until this one. And it works great with documents that don't include apostrophes, even though they include plenty of other punctuation...
I tried changing the encoding of the text file (simply made with Notepad), but that didn't help. I found lots of complaints about apostrophes in general but no working answers. Can anyone help me figure out what's going on?
As the documentation for ispunct says:
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Visual C++ is nice enough to add an almost explicit message for this error if you link to the debug runtime (this is often the case with undefined behaviour - with the release runtime, it just crashes or behaves strangely; with the debug runtime, you get an error dialog box).
In theory, this means that in the character set used by your environment, ' is not representable as an unsigned char, i.e. its character code is too big or too low.
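In practice, this seems very unlikely and perhaps even impossible on Windows. It is much more likely that your file doesn't really contain an ASCII apostrophe but a character that merely looks like one, e.g. an accent: ´ (or a typographic apostrophe ’). Such characters are stored as bytes above 127 (in an 8-bit code page such as Windows-1252, and in UTF-8 as well). Because plain char is signed by default in Visual C++, word.at(i) hands ispunct a negative value, such as -76 for the byte 0xB4, which fails the c >= -1 check.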
Here's how you can reproduce the problem in a simple manner:
#include <ctype.h>
int main()
{
    ispunct('\'');
    ispunct('´'); // undefined behaviour (crash or error message with Visual C++)
}
isupper has the same problem.
You can use those functions safely with static_cast, e.g.:
if (ispunct(static_cast<unsigned char>(word.at(i))))
Of course, now ispunct will return zero for the character (at least in the default "C" locale). If you really need to cover ´, you have to do so explicitly, for example with a helper function like this:
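bool extended_ispunct(char c)
{
    // Cast before calling ispunct; compare the original char for the accent.
    // '´' must be a single byte here, i.e. the source file must be saved in a
    // single-byte code page such as Windows-1252.
    return ispunct(static_cast<unsigned char>(c)) || c == '´';
}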

Get signal SIGABRT

I'm trying to understand how to work with pointers, so I've written a test program where a name is split into labels by removing the separating dots. Each label is represented as a length/data pair as follows:
google.ru is represented as "\x06google\x02ru"
I get signal SIGABRT when I return from my test function.
I guess it's caused by my misuse of pointers.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void test(unsigned char* dns, unsigned char* host);
int main(int argc, const char * argv[])
{
    unsigned char host[]="google.ru";
    unsigned char* dnsTest=(unsigned char*)malloc(20);
    if(dnsTest==NULL)
    {
        exit(1);
    }
    test(dnsTest, host);
    printf("%s", dnsTest);
    free(dnsTest);
    return 0;
}
void test(unsigned char* dns, unsigned char* host)
{
    strcat((char*)host, ".");
    int lock=0;
    for(int i=0; i<strlen((char *)host);i++)
    {
        if(host[i]=='.')
        {
            *dns=i-lock;
            dns++;
            for(;lock<i; lock++)
            {
                *dns=host[lock];
                dns++;
            }
            lock++;
        }
    }
    *dns='\0';
}
Since you tagged c++ as well, you can use new/delete instead of malloc/free. new calls the constructor (that will be useful when you learn about classes), and new returns the exact type, so there is no need to cast (and casting is almost always a bad idea).
Then the check after memory allocation with new becomes redundant:
if(dnsTest==NULL)
since in C++ new throws an exception (std::bad_alloc) by default.
If you go further, you can use std::string: it's much simpler than C null-terminated strings (but not for the task of understanding pointers).
All the other comments (about the redundant strlen usage and the missing '\0' symbol) are correct too, but I want to give you two pieces of advice on how to grasp pointers.
First - read K&R - it's the Bible.
Second - if you use Microsoft Visual Studio, you can compile your code in Debug mode and use the Memory window. In Debug mode the Visual Studio runtime fills memory with magic values (for example 0xCD for freshly allocated heap memory, 0xDD for freed heap memory, 0xFD for the guard bytes around heap blocks, and 0xCC for uninitialized stack memory). They can help you understand where you are addressing unallocated memory and what your memory layout looks like exactly.

C++ code with GCC optimisation causes core with invalid free() on strings

I have C++ code that is built with gcc (4.1.2) with -O2.
When this code is compiled and run with no optimisation, the program executes without any issue.
When compiled with -O1/-O2/-O3, the code crashes, with valgrind reporting an invalid free.
This has been narrowed to the string variables inside the function.
The code will read in a file, and will iterate the contents.
I have removed all processing code, and the following code snippet causes the core...
int MyParser::iParseConfig(Config &inConfig)
{
    bool keepGoing = true;
    while(keepGoing)
    {
        string valueKey = "";
        keepGoing = false;
    }
    return 0;
}
When this is run with non-optimised, it works fine.
When I build and run this optimised, it will not work.
It looks to be an issue with the way GCC optimises the string class.
Any ideas how we can circumvent this?
If you are overflowing charIndex (when i gets higher than 99), who knows what state your program is in... the storage you declare is not very big (2 chars and a null).
I cannot explain exactly why this code crashes for you when compiled with optimizations. Perhaps i gets more than 2 digits and you have a buffer overflow, or maybe it's something different, but in any case I would change the code:
sprintf(charIndex, "%d", i++);
string valueKey = "";
valueKey.append("Value").append(charIndex);
string value = inConfig.sFindField(valueKey);
like this:
stringstream ss;   // needs #include <sstream>
ss << "Value" << i++;
string value = inConfig.sFindField(ss.str());
It is more C++-like and should work. Try it.
If you are curious whether this is really a buffer overflow, insert the line:
assert(i >= 0 && i <= 99);
before the call to sprintf. Or use snprintf:
snprintf(charIndex, sizeof(charIndex), "%d", i++);
Or make your buffer bigger.
This was an issue with header files being incorrectly included - there was a duplicate include of the MyParser.h file in the list of includes.
This caused some strange behaviour in the string handling at GCC's optimisation levels.

Problem with simple overloading of functions

The idea of the set function:
The first argument is a reference to a stringy. The function allocates space to hold a copy of testing, sets the str member of beany to point to the new block, copies testing to the new block, and sets the ct member of beany.
Problems:
1) Line that contains:
for (int i = 0; i < temp.length(); i++)
Error: expression must have a class type
2) Line that contains:
temp[i] = cstr[i];
Error: expression must have pointer-to-object type
3) The overload of show() for the stringy type can't find a matching function signature due to the presence of const.
I'm very new to these concepts; could someone explain the reason for these errors?
#include "stdafx.h"
#include <iostream>
using namespace std;
#include <cstring>
#include <cctype>
struct stringy {
    char * str; //points to a string
    int ct; //length of string(not counting '\0')
};
void set(stringy & obj, char cstr);
void show(const stringy & obj, int times=1);
void show(const char * cstr, int times = 1);
int _tmain(int argc, _TCHAR* argv[])
{
    string beany;
    char testing[] = "Reality isn't what it used to be.";
    set(beany, testing);
    show(beany);
    show(beany, 2);
    testing[0] = 'D';
    testing[1] = 'u';
    show(testing);
    show(testing, 3);
    show("Done");
    return 0;
}
void set(stringy & obj, char cstr)
{
    char * temp = new char[cstr];
    obj.str = temp;
    for (int i = 0; i < temp.length(); i++)
        temp[i] = cstr[i];
}
void show(const stringy & obj, int times)
{
    for (int i = 0; i < times; i++)
        cout << obj.str;
}
void show(const char * cstr, int times)
{
    for (int i = 0; i < times; i++)
        cout << cstr;
}
I hope you won't take this personally... but this code has so many errors on so many logical levels that in my opinion it's simply FUBAR.
Please do yourself a favor and start by reading a C++ book. A list of good ones can be found here and you can also find decent resources for free on the internet.
C++ is not a language that you (or anyone else, indeed) can hope to learn by just typing in some characters and looking at what happens... that is simply a suicidal approach to C++.
EDIT:
After doing some googling, it seems that you are indeed following a book. From a few excerpts I found on the net, it seems to be a book that teaches programming using C++. I don't think this is a good idea, because C++ is too complex and apparently illogical to be a programmer's first language; also, it's very, very easy to write programs that compile fine and then drive you crazy when you run them. There are some gurus, however, who think this is a viable approach.
Your book is indeed listed, not because it is good, but just because its title is close to that of a good book. Probably just a marketing trick to sell it.
EDIT2:
I felt a bit sorry for being so rude when your only fault is choosing a bad book to learn C++. To try to compensate, here is my attempt to list all the problems I think are present in your C++ code:
1. Learn standard C++
#include "stdafx.h"
If you are learning C++, then you should try to put aside everything that Microsoft tells you about the language. Standard C++ has never been important for Microsoft, probably because portable code is more a threat to Microsoft than good for them.
Once you know C++ (but only then), it's OK to write Microsoft-specific code if that is your platform. But it's important that you know what is MS-only and what is C++. There are cases in which the difference is just plain stupid and not worth considering (e.g. for-loop variable scoping or the handling of allocation failures), but sometimes you actually MUST use their variation of the language to work with Windows.
MS development tools are great (or at least they were... I was simply in love with VC6, for example), but they will always try to trick you into writing unportable code. This is done both in the IDEs and in the Windows API examples. Don't fall into those traps: write portable code unless you have a real need for platform-specific code, and always be conscious of it.
2. Don't pollute the global namespace
using namespace std;
This is a bad idea. Even if it's a bit annoying, it's much better to get used to writing std:: before standard names. The reasons are the complex rules of name lookup and overload resolution in the language, and all the names that you pull into your namespace without being conscious of them.
Saving typing time is not really that important in C++ (it's important in Perl if you are writing a throw-away script... but not for general programs). It is much more important to help whoever is reading your code (and this includes yourself), and using std:: does that.
3. Use a proper main declaration
This is again about not falling into stupid MS traps. The correct declaration for main is
int main(int argc, char *argv[])
(or simply int main() if you don't need the command-line arguments).
You should never use anything else when learning C++. If the MS tool you are using doesn't allow you to write a correct declaration (that wouldn't be a surprise), then just drop it now and learn C++ using a tool that shows some respect for the standard instead. Once you know C++, you can begin to use non-portable features if you really need to, but knowing that they are non-portable.
MinGW is a good free C++ compiler for Windows, and there are good free IDEs if you like them. Over the years I've come to prefer a good editor like emacs (vim is also OK, I used it for many years) and a command-line compiler, but mainly because I work in a variety of languages on several different operating systems and no single IDE can cover all that. I want to keep low-level knowledge (how to copy a piece of text, how to search for a string, how to ask for completion, how to open another file) at finger level, without having to think consciously about which IDE I am in just to find the proper command. You cannot really play Chopin if you have to think every time about where G# is on the keyboard.
Maybe I'm just old, however... ;-)
4. Pick a reasonable naming convention
struct stringy {
char * str; //points to a string
int ct; //length of string(not counting '\0')
};
In your code you are naming a class stringy. It's better to get used to the most common naming convention in C++ for classes, which is to name it Stringy instead.
The standard C++ library is not following this convention but those classes will always be prefixed by std:: anyway.
My advice is also NOT to use Systems Hungarian notation, i.e. naming variables after their C++ type (like iIndex, sFileName), which sometimes appears in MS documentation. That idea doesn't scale and simply means you will use bad names for all your variables.
5. Problems with set function
void set(stringy & obj, char cstr)
{
    char * temp = new char[cstr];
    obj.str = temp;
    for (int i = 0; i < temp.length(); i++)
        temp[i] = cstr[i];
}
In this function there are several errors:
You want to pass a char *, not a char. A char holds room for a single character, whereas you want to initialize your stringy instance with a sequence of characters. In C++ you can use a char pointer for that because there is specific support for char sequences in memory that are terminated by the special ASCII character NUL (note the single "L": the ASCII NUL character is spelled '\0' in C++ and is not to be confused with the NULL pointer). The preferred C++ way of handling sequences of characters is actually the std::string standard class, but NUL-terminated sequences of characters are also fully supported for backward compatibility with C.
A pointer, however, is just the address of a character... that character will be followed by other characters until the closing '\0', but a pointer has no length member (actually it has no members at all; it's a primitive type like int or double).
To know the length of a sequence of characters that has been passed using a pointer, there is the standard function strlen (which returns the number of characters in the sequence, excluding the terminating '\0'). So your code should probably be something like:
void set(stringy & obj, const char *cstr)
{
    char * temp = new char[1 + strlen(cstr)];
    obj.str = temp;
    strcpy(obj.str, cstr);
    obj.ct = strlen(cstr);   // the exercise also asks to set the length
}
I've also used the standard function strcpy, which copies a char sequence including the terminating '\0' marker. A possible implementation of strcpy (here just to show the idea of '\0'-terminated strings) is the following:
char *mystrcpy(char *dest, const char *src)
{
    int i = 0;
    while (src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return dest;
}
6. Memory allocation
The stringy class is badly designed (in C++ there isn't any big difference between struct and class, just the default visibility). To be specific, construction and destruction are not handled where they should be (inside the stringy class), and for a class designed this way, assignment and copy construction must also be handled or forbidden.
As a consequence, your program simply never deallocates, leaking memory (normally not a serious issue in main, but it's important to understand the problem).
Hopefully this problem is just because the book hasn't yet got round to explaining those concepts.
Anyway, I find it strange for a book to talk about new[] but not about delete[] (maybe there is a reason that your book is not listed as a good book).
A properly implemented stringy should IMO look something like:
struct stringy
{
    int size;   // Number of characters EXCLUDING ending '\0'
    char *ptr;  // Pointer to first character

    stringy(const char *s = "")
        : size(strlen(s)), ptr(new char[1 + size])
    {
        strcpy(ptr, s);
    }

    ~stringy()
    {
        delete[] ptr;
    }

    stringy(const stringy& other)
        : size(other.size), ptr(new char[1 + size])
    {
        strcpy(ptr, other.ptr);
    }

    stringy& operator=(const stringy& other)
    {
        char *newptr = new char[1 + other.size];
        strcpy(newptr, other.ptr);
        delete[] ptr;
        ptr = newptr;
        size = other.size;
        return *this;
    }
};
temp is a char*. That type does not provide any kind of length facility: it is not a class object and does not have a length() member function. Use a std::string; that is what it is for.