c++ last element of a structure field - c++

I get a structure, and I don't know the size of it (every time it's different). I would like to set the last place in one of the fields of this structure to a certain value. In pseudocode, I mean something like this:
structureA.fieldB[end] = cert_value;
I'd do it in matlab however I cannot somehow find the proper syntax in c++, can you help me?

In Matlab, a structure data type holds key-value pairs where the "value" may be of different types. In C++, there are some key-value containers available (associative containers like set, map, multimap), but they usually store elements of a single type. What you need if I understood it right is something like
"one" : 1
"two" : [1,2,5]
"three" : "name"
Which means that your structure resembles a Python dictionary.
In C++, the only way I have heard of using containers with truly different types is by using boost::any, which is accepted as the answer to this question.
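If you pack a container with elements of different types, you can then reach its last element with the back() member function of a sequence container such as std::vector. Note that end() returns a past-the-end iterator, not the last element, so with iterators you would use std::prev(container.end()) instead.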

You need sizeof, which gives you the size of the array in bytes. Since you want the index of the last element, divide this number by the number of bytes of one element to get the element count, then subtract one because indices start at 0. This only works if fieldB is a true array whose size is known at compile time, not a pointer. You end up with:
int index_end = sizeof(structureA.fieldB) / sizeof(structureA.fieldB[0]) - 1;
structureA.fieldB[index_end] = new_value;
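If the size of the field really is different every time, a std::vector field is probably the closest match to the MATLAB array. The following is a minimal sketch under that assumption: the names StructureA and fieldB follow the question's pseudocode, while the element type int and the helper set_last are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical structure mirroring the question's pseudocode.
struct StructureA {
    std::vector<int> fieldB;  // size is only known at run time
};

// Sets the last element of fieldB; returns false if the field is empty.
bool set_last(StructureA& s, int cert_value) {
    if (s.fieldB.empty()) {
        return false;  // calling back() on an empty vector is undefined behaviour
    }
    s.fieldB.back() = cert_value;  // same as s.fieldB[s.fieldB.size() - 1]
    return true;
}
```

The emptiness check is the main design choice here: unlike MATLAB's end, back() gives no error on an empty container.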

Related

C++ unordered_map or unordered_set : What to use if I wish to keep an "isVisited" data structure

I want to keep a data structure for storing all the elements that I have seen till now. Considering that keeping an array for this is out of question as elements can be of the order of 10^9, what data structure should I use for achieving this : unordered_map or unordered_set in C++ ?
Maximum elements that will be visited in worst case : 10^5
-10^9 <= element <= 10^9
As @MikeCAT said in the comments, a map would only make sense if you wanted to store additional information about the element or the visitation. But if you wanted only to store the truth value of whether the element has been visited or not, the map would look something like this:
// if your elements were strings
std::unordered_map<std::string, bool> isVisited;
and then this would just be a waste of space. Storing the truth value is redundant, if the mere presence of the string within the map already indicates that it has been visited. Let's see a comparison:
std::unordered_map<std::string, bool> isVisitedMap;
std::unordered_set<std::string> isVisitedSet;
// Visit some places
isVisitedMap["madrid"] = true;
isVisitedMap["london"] = true;
isVisitedSet.insert("madrid");
isVisitedSet.insert("london");
// Maybe the information expires so you want to remove them
isVisitedMap["london"] = false;
isVisitedSet.erase("london");
Now the elements stored in each structure will be:
For the map:
{{"london", false}, {"madrid", true}} <--- 2 entries, 4 stored values
For the set:
{"madrid"} <--- 1 element. Much better
In a project in which I had a binary tree converted to a binary DAG for optimization purposes (GRAPHGEN) I passed the exploration function a map from node pointers to bool:
std::map<BinaryDrag<conact>::node*, bool> &visited_fl
The map kept track of the pointers in order not to go through the same nodes again when doing multiple passes.
You could use a std::unordered_map<Value, bool>.
I want to keep a data structure for storing all the elements that I have seen till now.
A way to re-phrase that is to say "I want a data structure to store the set of all elements that I've seen till now". The clue is in the name. Without more information, std::unordered_set seems like a reasonable choice to represent a set.
That said, in practice it depends on details like what you're planning to do with this set. Array can be a good choice as well (yes, even for billions of elements), other set implementations may be better and maps can be useful in some use cases.
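For the integer range given in the question, a set-based tracker could look roughly like the sketch below. The class name VisitedSet and its member names are hypothetical. std::int64_t is chosen only so that the whole range -10^9..10^9 fits comfortably.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Tracks which elements have been seen so far.
class VisitedSet {
public:
    // Marks x as visited; returns true only the first time x is seen.
    bool visit(std::int64_t x) { return seen_.insert(x).second; }

    // Reports whether x has been visited before.
    bool is_visited(std::int64_t x) const { return seen_.count(x) != 0; }

    // Number of distinct elements visited so far.
    std::size_t size() const { return seen_.size(); }

private:
    std::unordered_set<std::int64_t> seen_;
};
```

Because insert() reports whether the element was newly added, "check and mark" is a single lookup. With at most 10^5 visited elements the set stays small regardless of how large the value range is.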

Is it possible to create storage in for an n element array where the elements are tuples?

I'm making a class that is supposed to be able to store a 20 element array with each element being a tuple of four predefined types. Another catch is, I can't use parameters.
I can't find good online sources for this and the material provided from my university is honestly insufficient. I'm preparing for an exam and I'm stumped when it comes to objects in OCaml.
I was thinking of doing something like
val mutable arr = Array.make 20 (input 20 values)
but this seems too simplistic and inefficient to be a correct solution.
The fields of a class can have any type. This certainly includes an array type. Arrays, in turn, can contain any type, which includes tuples.
Any given mutable field and any given array is, of course, restricted to always contain values of the same type. This is what it means to have "strong" typing.
OCaml is a high-level language, so there's no need (or opportunity, really) to be concerned with too many details of representation. If you want a class with a field like you say, your proposed type sounds perfectly fine.
type mytuple = int * float * char
class myclass = object
val mutable myfield : mytuple array = [||]
end
You can find good documentation on OCaml at realworldocaml.org. There are more resources listed at ocaml.org.

Usual Terminology to name a list of list

I frequently have to deal with list of list of my item-type in my scripts.
Most of the time, I'm reducing the list of my item-type into items with a plural to indicate a collection. Following this convention, I would consequently name my list of list of my item-type : list_items
However I might as well have to deal with some list of list_items.
So I was wondering if there were any terminology, name, to use so as to indicate the concept list of list of (or even sequence of sequence of or generator of generator of)
I first thought of 2D-Array, but it's not appropriate since all lists may not have the same length.
Any idea ?
struct Value;
Value value;
list<Value> values;
typedef list<Value> ValueList;
ValueList valueList;
list<ValueList> valueLists;
I would not go any further. Usually you should not name your data types based on their content. The name should try to express the semantics.
Sure, you have a list of values, but what does it mean? Is it a list of features? Then you should call it that. What does the list of feature lists represent? Is it the ground-truth data of your classifier? I hope you get the point.

How to associate to a number another number without using array

Let's say we have read these values:
3
1241
124515
5322353
341
43262267234
1241
1241
3213131
And I have an array like this (with the elements above):
a[0]=1241
a[1]=124515
a[2]=43262267234
a[3]=3
...
The thing is that the elements' order in the array is not constant (I have to change it somewhere else in my program).
How can I know on which position does one element appear in the read document.
Note that I can not do:
vector <int> a[1000000000000];
a[number].push_back(all_positions);
Because a will be too large (there's a memory restriction). (Let's say I have only 3000 elements, but their values are from 0 to 2^32.)
So, in the example above, I would want to know all the positions 1241 is appearing on without iterating again through all the read elements.
In other words, how can I associate to the number "1241" the positions "1,6,7" so I can simply access them in O(1) (where 1 actually is the number of positions the element appears)
If there's no O(1) I want to know what's the optimal one ...
I don't know if I've made myself clear. If not, just say it and I'll update my question :)
You need to use some sort of dynamic array, like a vector (std::vector) or other similar containers (std::list, maybe, it depends on your needs).
Such data structures are safer and easier to use than C-style array, since they take care of memory management.
If you also need to look for an element in O(1) you should consider using some structures that will associate both an index to an item and an item to an index. I don't think STL provides any, but boost should have something like that.
If O(log n) is a cost you can afford, also consider std::map
You can use what is commonly referred to as a multimap. That is, it stores a key and multiple values. This has O(log n) lookup time.
If you're working with Visual Studio, it provides its own hash_multimap; otherwise, may I suggest using boost::unordered_map with a list as your value?
You don't need a sparse array of 1000000000000 elements; use an std::map to map positions to values.
If you want bi-directional lookup (that is, you sometimes want "what are the indexes for this value?" and sometimes "what value is at this index?") then you can use a boost::bimap.
Things get further complicated as you have values appearing more than once. You can sacrifice the bi-directional lookup and use a std::multimap.
You could use a map for that. Like:
std::map<int, std::vector<int>> MyMap;
So every time you encounter a value while reading the file, you append its position to the map. Say X is the value you read and Y is the position; then you just do
MyMap[X].push_back( Y );
Instead of your array, use
std::map<int, vector<int> > a;
You need an associative collection, but you might want to associate each key with multiple values.
You can use std::multimap< int, int >
or
you can use std::map< int, std::set< int > >
I have found in practice the latter is easier for removing items if you just need to remove one element. It is unique on key-value combinations but not on key or value alone.
If you need higher performance then you may wish to use a hash_map instead of map. For the inner collection, though, you will not gain much performance from using a hash, as you will have very few duplicates, so it is better to use std::set.
There are many implementations of hash_map, and it is in the new standard. If you don't have the new standard, go for boost.
It seems you need a std::map<int,int>. You can store the mapping such as 1241->0 124515->1 etc. Then perform a look up on this map to get the array index.
Besides the std::map solution offered by others here (O(log n)), there's the approach of a hash map (implemented as boost::unordered_map or std::unordered_map in C++0x, supported by modern compilers).
It would give you O(1) lookup on average, which often is faster than a tree-based std::map. Try for yourself.
You can use a std::multimap to store both a key (e.g. 1241) and multiple values (e.g. 1, 6 and 7).
An insert has logarithmic complexity, but you can speed it up if you give the insert method a hint where it can insert the item.
For O(1) lookup you could hash the number to find its entry (key) in a hash map (boost::unordered_map, dictionary, stdext::hash_map, etc.).
The value could be a vector of indices where the number occurs or a 3000 bit array (375 bytes) where the bit number for each respective index where the number (key) occurs is set.
boost::unordered_map<unsigned long, std::vector<unsigned long>> myMap;
for(unsigned long i = 0; i < sizeof(a)/sizeof(*a); ++i)
{
myMap[a[i]].push_back(i);
}
Instead of storing an array of integer, you could store an array of structure containing the integer value and all its positions in an array or vector.
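Putting the hash-map suggestions together, a complete sketch might look like the following. It uses std::unordered_map rather than the boost version, and the names PositionIndex and build_index are hypothetical. Positions are 0-based, which matches the "1,6,7" given for 1241 in the question, and unsigned long long is used because 43262267234 does not fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Maps each value to the list of positions at which it was read.
using PositionIndex =
    std::unordered_map<unsigned long long, std::vector<std::size_t>>;

// Builds the index in one pass over the values, in reading order.
PositionIndex build_index(const std::vector<unsigned long long>& values) {
    PositionIndex index;
    for (std::size_t i = 0; i < values.size(); ++i) {
        index[values[i]].push_back(i);  // operator[] creates the vector on first use
    }
    return index;
}
```

Since the index is keyed by value rather than by array slot, it stays valid when the array a is reordered later. Looking up a value is O(1) on average, plus the time to walk its positions.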

Given 200 strings, what is a good way to key a LUT of relationship values

I've got 200 strings. Each string has a relationship (measured by a float between 0 and 1) with every other string. This relationship is two-way; that is, relationship A/B == relationship B/A. This yields n(n-1)/2 relationships, or 19,900.
What I want to do is store these relationships in a lookup table so that given any two words I can quickly find the relationship value.
I'm using c++ so I'd probably use a std::map to store the LUT. The question is, what's the best key to use for this purpose.
The key needs to be unique and needs to be able to be calculated quickly from both words.
My approach is going to be to create a unique identifier for each word pair. For example given the words "apple" and "orange" then I combine them together as "appleorange" (alphabetical order, smallest first) and use that as the key value.
Is this a good solution or can someone suggest something more cleverer? :)
Basically you are describing a function of two parameters with the added property that order of parameters is not significant.
Your approach will work if there is no ambiguity between words when concatenating them (I would suggest putting a comma or a similar separator between the two words to remove possible ambiguities, since for example "ab" + "c" and "a" + "bc" both give "abc"). Any 2D array would also work.
I would probably convert each keyword to some unique identifier (using a simple map) before trying to find the relationship value, but it does not change much from what you are proposing.
If boost/tr1 is acceptable, I would go for an unordered_map with the pair of strings as key. The main question would then be: what with the order of the strings? This could be handled by the hash-function, which starts with the lexical first string.
Remark: this is just a suggestion after reading the design-issue, not a study.
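The idea of a pair key with a fixed order can be sketched as follows. To keep the example short it uses std::map instead of unordered_map, because std::pair has no std::hash specialisation in the standard library and a hashed version would need a custom hash functor. The names PairKey, make_key and RelationMap are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using PairKey = std::pair<std::string, std::string>;

// Builds a key with the lexicographically smaller word first,
// so (a, b) and (b, a) map to the same entry.
PairKey make_key(const std::string& a, const std::string& b) {
    return (a < b) ? PairKey(a, b) : PairKey(b, a);
}

// Lookup table from an unordered word pair to its relationship value.
using RelationMap = std::map<PairKey, double>;
```

Keeping the two words as separate members of the pair also avoids the ambiguity that plain concatenation can introduce.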
How "quickly" is quickly? Given you don't care about the order of the two words, you could try a map like this:
std::map<std::set<std::string>, double> lut;
Here the key is a set of the two words, so if you insert "apple" and "orange", then the order is the same as "orange" "apple", and given set supports the less than operator, it can function as a key in a map. NOTE: I intentionally did not use a pair for a key, given the order matters there...
I'd start with something fairly basic like this, profile and see how fast/slow the lookups etc. are before seeing if you need to do anything smarter...
If you create a sorted array with the 200 strings, then you can binary search it to find the matching indices of the two strings, then use those two indices in a 2D array to find the relationship value.
If your 200 strings are in an array, your 20,100 similarity values can be in a one dimensional array too. It's all down to how you index into that array. Say x and y are the indexes of the strings you want the similarity for. Swap x and y if necessary so that y>=x, then look at entry i= x + y(y+1)/2 in the large array.
(x,y) of (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),(0,3),(1,3)... will take you to entry 0,1,2,3,4,5,6,7...
So this uses space optimally and it gives faster look up than a map would. I'm assuming efficiency is at least mildly important to you since you are using C++!
[if you're not interested in self similarity values where y=x, then use i = x + y(y-1)/2 instead].
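The triangular indexing above could be wrapped up as in this sketch. The names tri_index and RelationTable are hypothetical, the diagonal (self-similarity) is included, and the strings are assumed to have already been converted to indices 0..n-1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Index into a packed triangular array, diagonal included.
std::size_t tri_index(std::size_t x, std::size_t y) {
    if (x > y) std::swap(x, y);  // ensure y >= x so argument order does not matter
    return x + y * (y + 1) / 2;
}

// Stores one float per unordered pair of indices in [0, n).
class RelationTable {
public:
    explicit RelationTable(std::size_t n) : values_(n * (n + 1) / 2, 0.0f) {}

    void set(std::size_t a, std::size_t b, float v) { values_[tri_index(a, b)] = v; }
    float get(std::size_t a, std::size_t b) const { return values_[tri_index(a, b)]; }
    std::size_t size() const { return values_.size(); }

private:
    std::vector<float> values_;
};
```

For n = 200 the table holds 200 * 201 / 2 = 20,100 floats, and a lookup is one multiplication, one addition and one array access.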