C++ reinterpret_cast, making unique number - c++

Recently, I've been using some code to generate a unique int for each of my classes.
I use reinterpret_cast<int>(my_unique_name), where my_unique_name is a char[] variable with a unique value. Something like this:
const char my_unique_name[] = "test1234";
int generate_unique_id_from_string(const char *str)
{
return reinterpret_cast<int>(str);
}
My question is: is the generated int really unique for every input string?

No, it is not. You are casting the address of the string, not its contents.
To create a numeric value based on string input, use a hash function. However, a hash is not truly unique either: by the pigeonhole principle, there are more possible strings than int values, so some distinct strings must share a value.
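As a rough sketch of the difference, the two functions below compute an ID from the address and from the contents. The names address_id and content_hash are made up for this example, and 64-bit FNV-1a is just one simple hash picked for illustration. The sketch uses std::uintptr_t because a pointer does not fit in an int on typical 64-bit platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Address-based "ID" (hypothetical helper): says where the bytes live,
// not what they contain.
std::uintptr_t address_id(const char* str) {
    return reinterpret_cast<std::uintptr_t>(str);
}

// Content-based ID (hypothetical helper): 64-bit FNV-1a over the characters.
// Equal contents always give equal values; collisions between different
// strings remain possible.
std::uint64_t content_hash(std::string_view s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;                  // FNV-1a prime
    }
    return h;
}
```

Two separate std::string objects that both hold "test1234" get different address_id values but the same content_hash value.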

It would depend. Here's my interpretation of your question:
You're trying to assign a different number to each string. And identical strings from different sources will have different IDs.
Case 1:
If str happens to be a reusable buffer that you read each string into, then every string has the same base address, so no, the IDs will not be unique.
Case 2:
If str is a heap-allocated string and all the strings being ID'ed have overlapping lifetimes, then yes, the IDs will be unique, because the strings all reside in memory at the same time at different addresses.
EDIT:
If you want to generate a unique ID, but you want identical strings to have the same ID, then look at Greg's answer for a hash function.

It may not always be unique:
id1 = generate_unique_id_from_string("test12344444444444444");
id2 = generate_unique_id_from_string("Test12344444444444444");
Also, I think this will depend on the endianness of the platform.

Related

C++ - evaluating an input string as an internal code variable

Is there a way to take a string as an input argument to a c++ function and evaluate it as an internal argument e.g. the name of a structure or other variable?
For example (written in pseudo code)
int myFunction(string nameStructure){
nameStructure.field = 1234
}
The "take away" point is converting the input string as a variable within the code.
Mark
This type of question is often a symptom of an XY problem, so consider other options first. That being said, there's no such built-in mechanism in C++, but there is a simple workaround I can think of: use a dictionary (std::map / std::unordered_map) to store all your objects:
std::map<std::string, MyAwesomeObject> objects;
...
int myFunction(std::string nameStructure)
{
    objects[nameStructure].field = 1234;
    return 0;
}
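A slightly fuller sketch of the same idea follows. MyAwesomeObject is reduced to a single field, and setField is a hypothetical helper name. It uses find instead of operator[] so that a misspelled name does not silently create a new object.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the real object type; only the field used here is modeled.
struct MyAwesomeObject {
    int field = 0;
};

// Objects registered under the name the caller will pass in as a string.
std::map<std::string, MyAwesomeObject> objects;

// Returns true if an object with that name exists and was updated.
bool setField(const std::string& name, int value) {
    auto it = objects.find(name);
    if (it == objects.end()) {
        return false;  // unknown name: nothing is created
    }
    it->second.field = value;
    return true;
}
```

The object has to be registered once (for example objects["engine"] = MyAwesomeObject{};). After that, setField("engine", 1234) updates it by name.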
The names of local variables are just artifacts of the human-readable code and have no meaning in the compiled binary. Your int myIntVar's and char* myCharP's get turned into instructions like "four bytes starting at the location of the base pointer minus eight bytes, interpreted as a four-byte integer". They no longer have names as such.
If you export symbols from your binary, you can look them up in the export table at runtime, according to your binary format, and find the variable you want. But I bet you want something like access to a local variable, and that is not possible.
If you really need this functionality, take a look at more dynamic, interpreted languages such as PHP:
http://php.net/manual/en/language.variables.variable.php

String storage optimization

I'm looking for a C++ library that would help optimize memory usage by storing similar (not just identical) strings in memory only once. This is not Flyweight or string interning, which can only store exact duplicates once. The library should be able to analyze strings and recognize that, for example, two particular strings of different lengths share their first 100 characters, and store that substring only once.
Example 1:
std::string str1 = "http://www.contoso.com/some/path/app.aspx?ch=test1";
std::string str2 = "http://www.contoso.com/some/path/app.aspx?ch=test2";
In this case it is obvious that the only difference between the two strings is the last character, so it would save a lot of memory to hold only one copy of "http://www.contoso.com/some/path/app.aspx?ch=test" plus two additional strings, "1" and "2".
Example 2:
std::string str1 = "http://www.contoso.com/some/path/app.aspx?ch1=test1";
std::string str2 = "http://www.contoso.com/some/path/app.aspx?ch2=test2";
This is a more complicated case, with multiple identical substrings: one copy of "http://www.contoso.com/some/path/app.aspx?ch", then two strings "1" and "2", one copy of "=test", and since we already have the strings "1" and "2" stored, we don't need any additional strings.
So, is there such a library? Is there something that could help develop such a library relatively quickly? The strings are immutable, so there is no need to worry about updating indexes or about locks for thread safety.
If the strings share common prefixes, one solution is to represent them with a radix tree, a space-optimized trie (http://en.wikipedia.org/wiki/Radix_tree). Then you only need to store, per string, a pointer to the tree node where that string ends, and you recover the whole string by walking up to the tree root.
hello world
hello winter
hell
        [2]
       /
h-e-l-l-o-' '-w-o-r-l-d-[0]
               \
                i-n-t-e-r-[1]
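The idea of keeping one node pointer per string and walking up to the root can be sketched as follows. This is a plain, uncompressed trie rather than a true radix tree, and PrefixStore, Node, insert and str are hypothetical names chosen for illustration, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical prefix-sharing store. Every node remembers its parent, so a
// whole string is identified by the single node where it ends.
// Note: nodes point back at root_, so a PrefixStore must not be moved.
class PrefixStore {
public:
    struct Node {
        Node* parent = nullptr;
        char ch = '\0';
        std::map<char, std::unique_ptr<Node>> children;
    };

    // Inserts s, reusing nodes of any existing prefix, and returns a handle
    // to the node of its last character.
    const Node* insert(std::string_view s) {
        Node* cur = &root_;
        for (char c : s) {
            auto& slot = cur->children[c];
            if (!slot) {
                slot = std::make_unique<Node>();
                slot->parent = cur;
                slot->ch = c;
                ++node_count_;
            }
            cur = slot.get();
        }
        return cur;
    }

    // Rebuilds the string by walking from the handle up to the root.
    std::string str(const Node* n) const {
        std::string out;
        for (; n != &root_; n = n->parent) {
            out.push_back(n->ch);
        }
        std::reverse(out.begin(), out.end());
        return out;
    }

    std::size_t node_count() const { return node_count_; }

private:
    Node root_;
    std::size_t node_count_ = 0;
};
```

Inserting "hello world", "hello winter" and "hell" creates 16 character nodes instead of 27 stored characters. Each node here is much larger than one char, though, so an actual saving would need a radix tree that merges single-child chains into one node.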
Here is one more solution: http://en.wikipedia.org/wiki/Rope_(data_structure)
libstdc++ implementation: https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.3/a00223.html
SGI documentation: http://www.sgi.com/tech/stl/Rope.html
But I think you need to construct your strings in a particular way for a rope to work properly. Maybe find the longest common prefix and suffix that each new string shares with the previous string, and then express the new string as the concatenation of the previous string's prefix, the unique part, and the previous string's suffix.
For example 1, what I can come up with is a radix tree, a space-optimized version of a trie. A quick Google search turns up quite a few implementations in C++.
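The prefix/unique part/suffix decomposition could look roughly like this. It is only a sketch of that one step, not a rope, and Delta, encode and decode are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// A new string expressed relative to the previous one (hypothetical type).
struct Delta {
    std::size_t prefix_len;  // characters shared with the start of prev
    std::string middle;      // the part unique to the new string
    std::size_t suffix_len;  // characters shared with the end of prev
};

Delta encode(std::string_view prev, std::string_view next) {
    // Longest common prefix.
    auto diff = std::mismatch(prev.begin(), prev.end(), next.begin(), next.end());
    const auto prefix = static_cast<std::size_t>(diff.first - prev.begin());

    // Longest common suffix that does not overlap the prefix in either string.
    const std::size_t limit = std::min(prev.size(), next.size()) - prefix;
    std::size_t suffix = 0;
    while (suffix < limit &&
           prev[prev.size() - 1 - suffix] == next[next.size() - 1 - suffix]) {
        ++suffix;
    }
    return Delta{prefix,
                 std::string(next.substr(prefix, next.size() - prefix - suffix)),
                 suffix};
}

// Rebuilds the new string from the previous string and the delta.
std::string decode(std::string_view prev, const Delta& d) {
    std::string out(prev.substr(0, d.prefix_len));
    out += d.middle;
    out += prev.substr(prev.size() - d.suffix_len);
    return out;
}
```

For prev "abcXYZdef" and next "abc123def", encode yields a prefix length of 3, the middle "123" and a suffix length of 3, and decode(prev, delta) reproduces next.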
For example 2, I am also curious about the answer!
First of all, note that std::string is not immutable and you have to make sure that none of these strings are accidentally modified.
This depends on the pattern of the strings. I suggest using hash tables (std::unordered_map in C++11). The exact details depend on how you are going to access these strings.
The two strings you have provided differ only after the "?ch" part. If you expect many strings to have long common prefixes of almost the same size, you can do the following:
Let's say the size of a prefix is 42 chars. Let s be a string. Then we can consider s[0-41] a key into the hash table and the rest of the string the value.
For example, given "http://www.contoso.com/some/path/app.aspx?ch=test1" the key would be "http://www.contoso.com/some/path/app.aspx?" and "ch=test1" would be the value. If the key already exists in the hash table, you can just add the value to the collection of values associated with that key. Otherwise, add the key/value pair.
This is just an example; what the key is and what the value is depend on how you are going to access these strings.
Also, if all strings have "=test" in them, then you don't have to store it with every value. You can store it once and insert it when retrieving a string. So given the value "ch1=test1", what will be stored is just "ch11". Again, this depends on the pattern of the strings.
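A minimal sketch of the fixed-length-prefix table, assuming a prefix length of 42 (the length of the example key "http://www.contoso.com/some/path/app.aspx?"). PrefixTable, kPrefixLen and add are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed prefix length; in practice it depends on the data.
constexpr std::size_t kPrefixLen = 42;

// Shared prefix -> all remainders seen with that prefix.
using PrefixTable = std::unordered_map<std::string, std::vector<std::string>>;

// Splits s into prefix and remainder, storing the prefix only once per table.
void add(PrefixTable& table, const std::string& s) {
    std::string rest = s.size() > kPrefixLen ? s.substr(kPrefixLen) : std::string();
    table[s.substr(0, kPrefixLen)].push_back(std::move(rest));
}
```

Adding the two example URLs gives a single key with the values "ch=test1" and "ch=test2". How a particular string is found again later (by index, by handle, and so on) is left open here, as it is in the description above.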

STAssertEquals reports equal strings as being different

I was trying to write a unit test for a library. A method from the library returns a string, and I wanted to make sure it returns the right string. But somehow the STAssertEquals macro in SenTestingKit saw the values as different even though they were the same.
You can see that the description part clearly shows that the two string values are the same, yet the macro complains that they are different. When I return a static string from the method (just like return @"op_user"), the test passes. Does anyone have an idea what causes this test to fail with the same string value?
I think you want to use STAssertEqualObjects() instead of STAssertEquals(). The former is for Objective-C instances, the latter for primitive types.
From the docs:
STAssertEqualObjects
Fails the test case when two objects are different.
STAssertEquals
Fails the test case when two values are different.
If you compare Objective-C objects with STAssertEquals(), you're just comparing their pointer values. They could point to two different string objects that contain the same string. You would want them to compare as equal, even if their pointer values are different.
To compare the actual string contents, you'd use the isEqual: method, which is exactly what STAssertEqualObjects() does.
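The same distinction can be shown in C++ as an analogy. This is not SenTestingKit code, and same_pointer and same_contents are hypothetical helpers standing in for the two kinds of comparison.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Compares the pointer values only, like comparing two object references.
bool same_pointer(const char* a, const char* b) {
    return a == b;
}

// Compares the characters the pointers refer to, like isEqual: on strings.
bool same_contents(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Two distinct string objects that both hold "op_user" fail the first check and pass the second.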
Not quite an answer (since Marc already replied correctly), but as a general rule, never ever use STAssertEquals!
It uses a questionable approach which first checks whether the types of the arguments are equal, using @encode to determine the type, and then basically performs a memcmp. So, variables defined as below
int i = 1;
unsigned int ui = 1;
assert(i == ui);
STAssertEquals(i, ui, @"");
will make STAssertEquals fail, even though the values are equal.
While
signed char sc = 1;
char c = 1;
STAssertEquals(sc, c, @"");
guess what?
(Hint: types are NOT equal!)
... will succeed, because @encode cannot differentiate between a char and a signed char, which are nevertheless distinct types!
Other issues arise with padding in structs, where memcmp may return non-zero even though the struct values would actually compare equal.
So it's better not to use STAssertEquals, or its successor XCTAssertEqual, which adopted the same ridiculous approach.

What's a use case where an empty string and NULL should be treated differently?

What's an example of a scenario where an empty string and NULL should be treated as distinct values?
(I ask because Django and Oracle consider them indistinguishable, but some databases treat the empty string and NULL as 2 distinct values.)
In short, that allows for optional unique fields.
Let's take the example of financial securities: some are identified using multiple codes. Bloomberg codes, ISIN codes, Sedol codes, Reuters codes ... But rarely would all securities have registered for all types of codes at the same time.
However if one security is already assigned one type of code, you would not want another security to reuse the same value.
Hence the need for both uniqueness and optionality, which mixing '' and NULL prevents, because the db would complain if you were trying to insert '' twice for unassigned codes.
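A small in-memory model of such a column, assuming a database whose unique constraint ignores NULLs (many do, but not all). UniqueOptionalColumn is a hypothetical class, with std::nullopt playing the role of NULL.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical model of a UNIQUE, nullable text column.
class UniqueOptionalColumn {
public:
    // Returns false when the value would violate the uniqueness constraint.
    bool insert(const std::optional<std::string>& code) {
        if (!code) {
            return true;  // NULL means "no code assigned": never compared
        }
        return values_.insert(*code).second;  // "" is a real value like any other
    }

private:
    std::set<std::string> values_;
};
```

Any number of rows can carry std::nullopt, but a second "" is rejected just like a second copy of any other code.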
In my mind these two values are semantically different. An empty string is a valid string instance, while null means that the value has not been set at all. Null is not reserved for strings: it is a general concept that applies to any nullable type, so I don't treat null as a string value (a member of the set of all possible string values), whereas the empty string is specific to the string type and does belong to that set. To see what I mean, look at a nullable integer type: is there a difference between 0 and null? Obviously there is. The difference between the string and int examples is that in the real world 0 is much more useful than the empty string, and an empty string and null are very often treated as equivalent.
One use case when you may want empty string instead of null is when you want to remove all occurrences of a character/substring from a string. The easiest method is to replace what you want to remove with empty strings. I don't know what it would mean to replace a substring with a null value - it will probably behave differently depending on the language you are using. However, if I use the empty string I would expect the same behavior regardless of the language.
You could have a database table that represents an inheritance hierarchy (known as Table-Per-Hierarchy in .NET's Entity Framework). This means that a single table stores the state of all derived types in that hierarchy, as well as the base class.
Look at this class hierarchy:
public class Animal
{
int Id;
int NumberOfLegs;
}
public class Cat : Animal
{
string FurColor;
}
The base class Animal has no string properties. However, a derived type called Cat has a string property called FurColor. If we were using Table-Per-Hierarchy, then we would have the columns:
ID | NumberOfLegs | FurColor
We would probably also have what's called a discriminator column which helps differentiate between different types, but that's not important here.
Now, if you had an instance of the Animal base class, your object would only have 2 properties. When this object is stored in the database table, there is no value for FurColor because that property has nothing to do with the Animal class. Therefore NULL is the most suitable value, as it indicates that there is no explicit value for this property. An empty string, on the other hand, could be considered to have some meaning, which is structurally incorrect for the Animal object, as that property is nonexistent on it.
If you had an instance of the Cat class, then your object would have 3 properties. Saving this to the database table would conceptually require all 3 fields. If FurColor didn't have a particular value, empty string is perfectly valid and in my opinion a better option than NULL. This is because when you read the object back from the database, you don't have to specifically check the property for NULL before using it in operations such as string concatenation (which will throw something along the lines of a NULL reference exception in most statically typed languages such as C#).
So long story short, NULL can be considered 'value-less' where an empty string might be a perfectly valid value in your application.

How can I store an inventory-like list of numbers?

I've got a list of numbers that I need to keep track of. The numbers are loosely related, but represent distinctly different items. I'd like to keep a list of the numbers but be able to refer to them by name, so that I can easily use them where needed. Kind of like an inventory listing, where the numbers all refer to a part ID and I'd like to call them idPart1, idPart2, idPart3 so their purpose is easily identifiable when they are used.
What would be the best way to do this?
1) Define a structure. Say, Inventory. A number of int members will be included, part1, part2 etc. To use, an instance of the structure will be created, values assigned to the members, and the numbers will be used by saying struct.member as needed.
2) Define an enumeration. Use part1, part2 as the enum literals. Store the actual values in a vector or list, each one at the index corresponding to the value of the number's name within the enum. Use the enum literals to retrieve the values, list[enumLit].
3) Something completely different
There's nothing else I need to do with the numbers - just look them up every once in a while. Since there's no processing, I kind of think a new class for them is overkill, but I'm willing to be convinced otherwise.
Any suggestions?
Let me try to rephrase what you're trying to do here. You want developers who use your code to be able to refer to a pre-defined set of numeric values:
using intuitive names
that will be validated at compile time
and that the IDE will recognize for the sake of code completion.
If the values will not change at run-time, and they are integer values, you can use an enum as Mark Ransom showed.
If the values will not change at run-time, and they are non-integer values, you can use either #define or const variables:
#define PART1 1.3
#define PART2 "1233-456"
or
namespace PartNumbers
{
const double Part1 = 1.3;
const char* const Part2 = "123-456";
}
If the values may change at run-time, you can use either of the two options you identified. Both of these options have the drawback of requiring an object to be instantiated which holds the current part number values. The other options are simpler to implement and don't require any run-time lookup. Everything is resolved at compile time.
All of these options require users of your code to recompile if they are to access new part types. Your first option may require existing code to be recompiled when new part types are added, even if the existing code doesn't access them; it's more prone to memory layout changes.
You can use a map with a string as the key.
std::map<std::string, int> mymap;
mymap["part1"] = value1;
std::cout << mymap["part1"];
You could use a
std::map<std::string, int> someMapName;
with the string as the key and the actual number as the int. That way you could use
someMapName["idPart1"]
to grab the number.
EDIT:
If you are OK with enumerations, then option 2 would work perfectly with std::map too; the key would just be your enum type instead of a string.
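An enum-keyed map might look like the sketch below. PartId, kPartNumbers and partNumber are hypothetical names, and the part numbers are placeholder values.

```cpp
#include <cassert>
#include <map>

// Names for the parts; checked at compile time and visible to IDE completion.
enum class PartId { Part1, Part2, Part3 };

// Placeholder part numbers, filled in once.
const std::map<PartId, int> kPartNumbers = {
    {PartId::Part1, 123},
    {PartId::Part2, 456},
    {PartId::Part3, 987},
};

// at() throws std::out_of_range for a missing key instead of inserting one.
int partNumber(PartId id) {
    return kPartNumbers.at(id);
}
```

A lookup then reads partNumber(PartId::Part2), and a misspelled part name becomes a compile error rather than a failed runtime lookup.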
Based on your comments to the other answers, I'd say enums are the way to go, but I'd structure them a little differently.
namespace id {
enum {
part1 = 123,
part2 = 456,
part3 = 987,
...
};
}
cout << id::part1;
Use a database.
Specifically, a table like the following:
+------------------+-------------------+
| Item Name | Item Number |
+------------------+-------------------+
Internally, this can be represented as:
std::map<std::string, // The item name
unsigned int> // The number.
When you want the number, retrieve it using the name:
std::map<std::string, unsigned int> index_by_name;
//...
std::string part_name = "Part0123";
unsigned int part_number = 0;
part_number = index_by_name[part_name];
Seriously, use a database. Check out SQLite and MySQL.