STAssertEquals reports equal strings as being different - unit-testing

I was trying to write a unit test for a library. A method from library returns string and I wanted to make sure it returns right string. But somewhat STAssertEquals macro in SenTestKit saw it as different value even though it was same.
You can see that description part clearly showed that two string values are same, yet this macro complained their values are different. When I return static string from method ( just like return #"op_user") it passed the test case. Anyone have an idea what cause to fail this test with same string value?

I think you want to use STAssertEqualObjects() instead of STAssertEquals(). The former is for Objective-C instances, the latter for primitive types.
From the docs:
STAssertEqualObjects
Fails the test case when two objects are different.
STAssertEquals
Fails the test case when two values are different.
If you compare Objective-C objects with STAssertEquals(), you're just comparing their pointer values. They could point to two different string objects that contain the same string. You would want them to compare as equal, even if their pointer values are different.
To compare the actual string contents, you'd use the isEqual: method, which is exactly what STAssertEqualObjects() does.

Not quite an answer (since Marc already replied correctly), but as a general rule, never ever use STAssertEquals!
It uses a questionable approach which first checks whether the types of the arguments are equal, using #encode to determine the type, and then basically performs a memcmp. So, variables defined as below
int i = 1;
unsigned int ui = 1;
assert(i == ui);
STAssertEquals(i, ui, #"");
will let STAssertEquals fail - even though the values are equal.
While
signed char sc = 1;
char c = 1;
STAssertEquals(sc, c, #"");
guess what?
(Hint: types are NOT equal!)
... will succeed, because #encode cannot differentiate between an unsigned char and an signed char, which however are distinct types!
Other issues are with padding in structs where memcmp may return non-zero but the struct values would actually compare equal.
So better don't use STAssertEquals, and as well its successor XCTAssertEqual which adopted the same ridiculous approach.

Related

How to print elements from tcl_obj in gdb?

I am debugging a c++-tcl interface application and I need to see the elements of Tcl_Obj objv.
I tried doing print *(objv[1]) and so on but it doesnt seem helping.
Is there any way to see Tcl_Obj elements in gdb?
It's not particularly easy to understand a Tcl_Obj * from GDB as the data structure uses polymorphic pointers with shrouded types. (Yeah, this is tricky C magic.) However, there are definitely some things you can try. (I'll pretend that the pointer is called objPtr below, and that it is of type Tcl_Obj *.)
Firstly, check out what the objPtr->typePtr points to, if anything. A NULL objPtr->typePtr means that the object just has something in the objPtr->bytes field, which is a UTF-8 string containing objPtr->length bytes with a \0 at objPtr->bytes[objPtr->length]. A Tcl_Obj * should never have both its objPtr->bytes and objPtr->typePtr being NULL at the same time.
If the objPtr->typePtr is not NULL, it points to a static constant structure that defines the basic polymorphic type operations on the Tcl_Obj * (think of it as being like a vtable). Of initial interest to you is going to be the name field though; that's a human-readable const char * string, and it will probably help you a lot. The other things in that structure include a definition of how to duplicate the object and how to serialize the object. (The objPtr->bytes field really holds the serialization.)
The objPtr->typePtr defines the interpretation of the objPtr->internalRep, which is a C union that is big enough to hold two generic pointers (and a few other things besides, like a long and double; you'll also see a Tcl_WideInt, which is probably a long long but that depends on the compiler). How this happens is up to the implementation of the type so it's difficult to be all-encompassing here, but it's basically the case that small integers have the objPtr->internalRep.longValue field as meaningful, floating point numbers have the objPtr->internalRep.doubleValue as meaningful, and more complex types hang a structure off the side.
With a list, the structure actually hangs off the objPtr->internalRep.twoPtrValue.ptr1 and is really a struct List (which is declared in tclInt.h and is not part of Tcl's public API). The struct List in turn has a variable-length array in it, the elements field; don't modify inside there or you'll break things. Dictionaries are similar, but use a struct Dict instead (which contains a variation on the theme of hash tables) and which is declared just inside tclDictObj.c; even the rest of Tcl's implementation can't see how they work internally. That's deliberate.
If you want to debug into a Tcl_Obj *, you'll have to proceed carefully, look at the typePtr, apply relevant casts where necessary, and make sure you're using a debug build of Tcl with all the symbol and type information preserved.
There's nothing about this that makes debugging a whole array of values particularly easy. The simplest approach is to print the string view of the object, like this:
print Tcl_GetString(objv[1])
Be aware that this does potentially trigger the serialization of the object (including memory allocation) so it's definitely not perfect. It is, however, really easy to do. (Tcl_GetString generates the serialization if necessary — storing it in the objPtr->bytes field of course — and returns a pointer to it. This means that the value returned is definitely UTF-8. Well, Tcl's internal variation on UTF-8 that's slightly denormalized in a couple of places that probably don't matter to you right now.)
Note that you can read some of this information from scripts in Tcl 8.6 (the current recommended release) with the ::tcl::unsupported::representation command. As you can guess from the name, it's not supported (because it violates a good number of Tcl's basic semantic model rules) but it can help with debugging before you break out the big guns of attaching gdb.

String literals that contain '\0' - why aren't they the same?

So I did the following test:
char* a = "test";
char* b = "test";
char* c = "test\0";
And now the questions:
1) Is it guaranteed that a==b? I know I'm comparing addresses. This is not meant to compare the strings, but whether identical string literals are stored in a single memory location
2) Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
3) Is an extra \0 appended at the end of c, even though it already contains one?
I didn't want to ask 3 different questions for this because they seem somehow related, sorry 'bout that.
Note: The tag is correct, I'm interested in C++. (although please specify if the behavior is different for C)
Is it guaranteed that a==b?
No. But it is allowed by §2.14.5/12:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
And as you can see from that last sentence using char* instead of char const* is a recipe for trouble (and your compiler should be rejecting it; make sure you have warnings enabled and high conformance levels selected).
Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
No, they're not required to be referring to same array of characters. One has five elements, the other six. An implementation could store the two in overlapping storage, but that's not required.
Is an extra \0 appended at the end of c, even though it already contains one?
Yes.
1 - absolutely not. a might == b though if the compiler chooses to share the same static string.
2 - because they are NOT referring to the same string
3 - yes.
The behavior is no different between C and C++ here except that C++ compilers should reject the assignment to non-const char*.
1) Is it guaranteed that a==b?
It is not. Note that you are comparing addresses and they could be pointing to different locations. Most smart compilers would fold this duplicate literal constant, so the pointers may compare equal, but again its not guaranteed by the standard.
2) Why doesn't a==c? Shouldn't the compiler be able to see that they're referring to the same string?
You are trying to compare pointers, they point to different memory locations. Even if you were comparing the content of such pointers, they are still unequal (see next question).
3) Is an extra \0 appended at the end of c, even though it already contains one?
Yes, there is.
First note that this should be const char* as that's what string literals decay to.
Both create arrays initialized with 't' 'e' 's' 't' folowed by a '\0' (length = 5). Comparing for equality will only tell you if they both start with the same pointer, not if they have the same contents (though logically, the two ideas follow each other).
A isn't equal to C because the same rules apply, a = 't' 'e' 's' 't' '\0' and b = 't' 'e' 's' 't' '\0' '\0'
Yes, the compiler always does it and you shouldn't expicitly do in if you're making a string like this. If you however crated an array and manually populated it, you need to ensure you add the \0.
Note that for my #3, const char[] = "Hello World" would also automatically get the \0 at the end, I was refferring to manually filling the array, not having the compiler work it out.
The problem here is you're mixing the concepts of pointer and textual equivalence.
When you say a == b or a == c you are asking if the pointers involved point to the same physical address. The test has nothing to do with the textual contents of the pointers.
To get textual equivalence you should use strcmp
If you are doing pointer comparisons than a != b, b != c, and c != a. Unless the compiler is smart enough to notice that your first two strings are the same.
If you do a strcmp(str, str) then all your strings will come back as matches.
I am not sure if the compiler will add an additional null termination to c, but I would guess that it would.
As has been said a few times in other answers, you are comparing pointers. However, I would add that strcmp(b,c) should be true, because it stops checking at the first \0.

C++ reinterpret_cast, making unique number

Recently, i'm using a code to make unique int number for my classes.
I used reinterpret_cast<int>(my_unique_name) where my_unique_name is a char [] variable with unique value. Something like below:
const char my_unique_name[] = "test1234";
int generate_unique_id_from_string(const char *str)
{
return reinterpret_cast<int>(str);
}
My question is, is the generated int really unique for all entry strings ?
No, it is not. You are casting the address of the string, not its contents.
To create a numeric value based on string input, use a hash function. However, this doesn't create a truly unique number, because of the so-called pigeonhole principle.
It would depend. Here's my interpretation of your question:
You're trying to assign a different number to each string. And identical strings from different sources will have different IDs.
Case 1:
If str happens to be a reusable buffer that you use to read in those strings from wherever. Then they'll all have the same base address. So no it will not be unique.
Case 2:
str happens to be a heap-allocated string. Furthermore, all the strings that will be ID'ed have overlapping lifetimes. Then yes, the IDs will be unique because they all reside in memory at the same time at different addresses.
EDIT:
If you want to generate a unique ID, but you want identical strings to have the same ID, then look at Greg's answer for a hash function.
It may not be always unique,
id1 = generate_unique_id_from_string("test12344444444444444");
id2 = generate_unique_id_from_string("Test12344444444444444");
Also, I think this will depend on the endianness of the platform.

Initializing a char array in C. Which way is better?

The following are the two ways of initializing a char array:
char charArray1[] = "foo";
char charArray2[] = {'f','o','o','\0'};
If both are equivalent, one would expect everyone to use the first option above (since it requires fewer key strokes). But I've seen code where the author takes the pain to always use the second method.
My guess is that in the first case the string "foo" is stored in the data segment and copied into the array at runtime, whereas in the second case the characters are stored in the code segment and copied into the array at runtime. And for some reason, the author is allergic to having anything in the data segment.
Edit: Assume the arrays are declared local to a function.
Questions: Is my reasoning correct? Which is your preferred style and why?
What about another possibility:
char charArray3[] = {102, 111, 111, 0};
You shouldn't forget the C char type is a numeric type, it just happens the value is often used as a char code. But if I use an array for something not related to text at all, I would would definitely prefer initialize it with the above syntax than encode it to letters and put them between quotes.
If you don't want the terminal 0 you also have to use the second form or in C use:
char charArray3[3] = "foo";
It is a a C feature that nearly nobody knows, but if the compiler does not have room enough to hold the final 0 when initializing a charArray, it does not put it, but the code is legal. However this should be avoided because this feature has been removed from C++, and a C++ compiler would yield an error.
I checked the assembly code generated by gcc, and all the different forms are equivalent. The only difference is that it uses either .string or .byte pseudo instruction to declare data. But tha's just a readability issue and does not make a bit of difference in the resulting program.
I think the second method is used mostly in legacy code where compilers didn't support the first method. Both methods should store the data in the data segments. I prefer the first method due to readability. Also, I needed to patch a program once (can't remember which, it was a standard UNIX tool) to not use /etc (it was for an embedded system). I had a very hard time finding the correct place because they used the second method and my grep couldn't find "etc" anywhere :-)

Why isn't ("Maya" == "Maya") true in C++?

Any idea why I get "Maya is not Maya" as a result of this code?
if ("Maya" == "Maya")
printf("Maya is Maya \n");
else
printf("Maya is not Maya \n");
Because you are actually comparing two pointers - use e.g. one of the following instead:
if (std::string("Maya") == "Maya") { /* ... */ }
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }
This is because C++03, §2.13.4 says:
An ordinary string literal has type “array of n const char”
... and in your case a conversion to pointer applies.
See also this question on why you can't provide an overload for == for this case.
You are not comparing strings, you are comparing pointer address equality.
To be more explicit -
"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined as to whether identical anonymous const char[m] will point to the same location in memory(a concept referred to as interning).
The function you want - in C - is strmp(char*, char*), which returns 0 on equality.
Or, in C++, what you might do is
#include <string>
std::string s1 = "foo"
std::string s2 = "bar"
and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
The output of your program is implementation-defined.
A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)
When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
To compare string's, use std::strcmp(), like this:
if (std::strcmp("Maya", "Maya") == 0) // same
Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:
if (std::string("Maya") == "Maya") // same
What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
Any idea why i get "Maya is not Maya" as a result
Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to. .
To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring>in C++.)
To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:
#include <string>
// ...
if (std::string("Maya") == "Maya")
std::cout << "Maya is Maya\n";
else
std::cout << "Maya is not Maya\n";
C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).
My compiler says they are the same ;-)
even worse, my compiler is certainly broken. This very basic equation:
printf("23 - 523 = %d\n","23"-"523");
produces:
23 - 523 = 1
Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)
This is one of the many reasons the std::string class in the Standard Template Library now exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.
Let me explain.
Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.
The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.
When you write this in your code:
"wibble"
Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order (the compiler adds a zero byte at the end, a "null terminator". In C a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
And when you start comparing expressions like that, what happens is this:
if ("Maya" == "Maya")
At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.
When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of the array of characters. Because the names of arrays are effectively pointers, pointing at the first character of the array, the type of the expression "Maya" is pointer to char.
When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.
If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed two by both "strings" (which, as I've explained, are just pointers to a block of memory) and go through the bytes, comparing them one-by-one, and tell you whether they're really the same.
Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.
Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!
This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)
If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!