Which dataset should I use? - c++

The title may have been a bit vague, but I will appreciate some ideas for the current problem I have.
Here is a dataset:
1 1/1/2013
2 1/1/2013
3 1/1/2013
1 1/2/2013
2 1/2/2013
1 1/3/2013
2 1/3/2013
3 1/3/2013
I begin with the first record and check whether another 1 appears later in the list. If it does, I ignore this record and move on to the second record. If another 2 appears later in the list, I ignore that one too and move on to the third record, and so on.
The result I am looking for from this list is <1, 1/3/2013>, since no other record with key 1 exists below it.
Similarly, in this dataset:
1 1/1/2013
2 1/1/2013
3 1/1/2013
1 1/2/2013
2 1/2/2013
3 1/2/2013
4 1/2/2013
1 1/3/2013
2 1/3/2013
3 1/3/2013
The desired result would be <4, 1/2/2013>, since there is no other occurrence of 4 down the list.
My question is: how would I go about doing this, and which standard STL container can I use? Furthermore, these records are the results returned by a query.
I don't use Boost or any other third-party libraries, so I am looking to get this done with standard library (std) types only.

You can use two maps - one map to store mapping from the key (your first column) to the value (your second column) and second map to store mapping from the key (your first column) to the record number:
std::map<int, std::string> m1;
std::map<int, int> m2;
int counter = 0;
while (...)
{
    <...get record...>
    m1[record.key] = record.value;
    m2[record.key] = counter++;
}
Then you need to scan the second map m2 in order to find the key with minimal position:
int keyMin = <...big number...>, posMin = <...big number...>;
for (std::map<int, int>::const_iterator it = m2.begin(); it != m2.end(); ++it)
{
    if (it->second < posMin)
    {
        keyMin = it->first;
        posMin = it->second;
    }
}
The result will be the first key, for which there are no records with this key down the road. Using this key and the first map m1 you'll be able to find its corresponding value.
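Putting the two steps above together, here is a minimal runnable sketch. The Record struct, the function name, the std::vector used as the record source and the {-1, ""} return for empty input are all assumptions made for illustration; the original loop leaves the record source open.

```cpp
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: first column is the key, second the value.
struct Record {
    int key;
    std::string value;
};

// Returns the key/value of the first record whose key never appears again
// further down. Returns {-1, ""} for an empty input (an assumed convention).
std::pair<int, std::string> firstKeyNotRepeatedBelow(const std::vector<Record>& records) {
    std::map<int, std::string> m1; // key -> value of its last record
    std::map<int, int> m2;         // key -> position of its last record
    int counter = 0;
    for (const Record& record : records) {
        m1[record.key] = record.value;
        m2[record.key] = counter++;
    }
    if (m2.empty()) return std::make_pair(-1, std::string());

    // The key whose last position is smallest is the answer.
    int keyMin = -1;
    int posMin = std::numeric_limits<int>::max();
    for (std::map<int, int>::const_iterator it = m2.begin(); it != m2.end(); ++it) {
        if (it->second < posMin) {
            keyMin = it->first;
            posMin = it->second;
        }
    }
    return std::make_pair(keyMin, m1[keyMin]);
}
```

Because both maps are overwritten on every repeat of a key, they end up describing only the last record per key, which is exactly what the final scan needs.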

You can scan from the bottom and remember the first appearance of each id (that is, its last appearance when counting from the top). Once the scan is done, in O(n) time, take the last such appearance you found.

What does the query return? You can use std::vector<some-structure> if it returns a known structure, or std::vector<std::vector<std::string> > if it returns a list of strings.
Then, going from the bottom and remembering every unique id you have seen, you can get the last good value in O(n) time and O(n) memory.
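The bottom-up scan could look roughly like the sketch below. The Row struct and the function name are hypothetical, and std::unordered_set (C++11) is assumed to be available to keep the expected running time at O(n).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical row type for the query result.
struct Row {
    int id;
    std::string date;
};

// Returns the index of the first row whose id never appears further down,
// or -1 if rows is empty.
int firstRowNotRepeatedBelow(const std::vector<Row>& rows) {
    std::unordered_set<int> seen; // ids already met while walking upwards
    int best = -1;
    for (int i = static_cast<int>(rows.size()) - 1; i >= 0; --i) {
        // insert().second is true only the first time an id is met,
        // i.e. at that id's last occurrence counted from the top.
        if (seen.insert(rows[i].id).second) {
            best = i;
        }
    }
    return best;
}
```

The last index recorded while walking upwards is the topmost "final occurrence", which is the record the question asks for.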

Related

Map with a vector as a parameter [closed]

For this LeetCode problem:
There are n people whose IDs go from 0 to n - 1 and each person belongs exactly to one group. Given the array groupSizes of length n telling the group size each person belongs to, return the groups there are and the people's IDs each group includes.
You can return any solution in any order and the same applies for IDs. Also, it is guaranteed that there exists at least one solution.
Example 1:
Input: groupSizes = [2,1,3,3,3,2]
Output: [[1],[0,5],[2,3,4]]
class Solution {
public:
    vector<vector<int>> groupThePeople(vector<int>& groupSizes) {
        unordered_map<int,vector<int> > myMap;
        int n=groupSizes.size();
        vector<vector<int>> answer;
        for(int i=0;i<n;i++){
            myMap[groupSizes[i]].push_back(i); // myMap key/value ; key= group, value=index
            cout<<i<<endl;
            if(myMap[groupSizes[i]].size()==groupSizes[i]){
                cout<<"pushed "<<i<<endl;
                answer.push_back(myMap[groupSizes[i]]);
                myMap[groupSizes[i]]={};
            }
        }
        return answer;
    }
};
Does the map contain a bunch of different vectors, or is there only 1 vector?
Could you explain what exactly is being pushed? When you have map<int,vector<int>>; are you pushing the groupsize as the key, and then the value is the index?
Thus, would the map look like map[groupsize value, vector of indexes]?
How did the output get [1] as its first vector? Shouldn't the vector for value 2 have been pushed first?
Does the map contain a bunch of different vectors, or is there only 1
vector?
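For each key in your map there is exactly one vector as its value. In your example, three vectors are built up over the course of the loop (each one is copied into answer and reset to empty once it is full):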
Key "1", Vector ( 1 )
Key "2", Vector ( 0, 5 )
Key "3", Vector ( 2, 3, 4 )
Could you explain what exactly is being pushed? When you have
map<int,vector<int>>; are you pushing the groupsize as the key, and
then the value is the index?
What gets pushed, and where, depends on the current value of i. n is set to the size of groupSizes, so 6, and i ranges from 0 to 5. On the first iteration, push_back is called as follows:
myMap[groupSizes[0]].push_back(0);
groupSizes[0] has the value 2, since the element at index 0 of [2,1,3,3,3,2] is 2.
myMap[groupSizes[0]] looks up the key 2 in the map. If the key does not exist yet, it is inserted with a default-constructed value, here an empty vector. Either way, operator[] returns a reference to the vector stored under that key.
push_back(0) then appends the value 0 to the vector stored under key 2.
Thus, would the map look like map[groupsize value, vector of indexes]?
Yes, but note that each group size appears only once as a key in the map. The map contains only the keys 1, 2 and 3, not 2, 1, 3, 3, 3 and 2.
How did the output get [1] as its first vector? Shouldn't the vector
for value 2 have been pushed first?
I think your final output code is missing, but I would guess that it just iterates over the result vector. If so, the order follows from when each group fills up: the groupSize value tells you how many elements the vector must hold, so a smaller group usually completes sooner and is pushed into the result vector earlier. The group of size 1 holds a single element, so it is pushed as soon as person 1 is added, because the condition if(myMap[groupSizes[i]].size()==groupSizes[i]) is already true at that point. With this code the result is built in the order [1], [2,3,4], [0,5]; the problem accepts any order, so it does not have to match the sample output exactly.
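To make the push order concrete, here is a sketch of the same loop as a free function without the debug output. The free-function form, the name pending and the const reference parameter are choices made for this illustration, not part of the original submission.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Same grouping idea as the question's code: one pending vector per group size.
std::vector<std::vector<int>> groupThePeople(const std::vector<int>& groupSizes) {
    std::unordered_map<int, std::vector<int>> pending; // group size -> ids collected so far
    std::vector<std::vector<int>> answer;
    for (std::size_t i = 0; i < groupSizes.size(); ++i) {
        std::vector<int>& group = pending[groupSizes[i]]; // created empty on first use
        group.push_back(static_cast<int>(i));
        if (group.size() == static_cast<std::size_t>(groupSizes[i])) {
            answer.push_back(group); // a full group is copied into the result
            group.clear();           // and the pending vector starts over
        }
    }
    return answer;
}
```

For groupSizes = [2,1,3,3,3,2], the groups fill up at i = 1, i = 4 and i = 5, so the function returns [[1],[2,3,4],[0,5]].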

Every sum possibilities of elements

From a given array (call it numbers[]), I want another array (results[]) that contains every possible sum of elements of the first array.
For example, if I have numbers[] = {1,3,5}, results[] will be {1,3,5,4,8,6,9,0}.
There are 2^n possibilities.
It doesn't matter if a number appears twice, because results[] will be a set.
I did it for sums of pairs or triplets, and that is very easy. But I don't understand how to do it when we sum 0, 1, 2, ..., or n numbers.
This is what I did for pairs :
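#include <cstddef>
#include <unordered_set>
#include <vector>

std::unordered_set<int> pairPossibilities(const std::vector<int> &numbers) {
    std::unordered_set<int> results;
    // i + 1 < size() avoids the unsigned underflow of size() - 1 on an empty vector
    for (std::size_t i = 0; i + 1 < numbers.size(); i++) {
        for (std::size_t j = i + 1; j < numbers.size(); j++) {
            results.insert(numbers.at(i) + numbers.at(j));
        }
    }
    return results;
}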
Also, assuming that the numbers[] is sorted, is there any possibility to sort results[] while we fill it ?
Thanks!
This can be done with Dynamic Programming (DP) in O(n*W), where W = sum{numbers}, assuming the numbers are non-negative integers.
This is basically the same solution as for the Subset Sum Problem, exploiting the fact that the problem has optimal substructure:
DP[i, 0] = true
DP[-1, w] = false for w != 0
DP[i, w] = DP[i-1, w] OR DP[i-1, w - numbers[i]] (the second term applies only when w >= numbers[i])
Fill the table with the recurrence above for every i from 0 to n-1 and every w up to sum{numbers}.
As a result, you will get:
DP[n-1, w] = true if and only if w can be constructed from numbers
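The recurrence can be sketched with a single boolean row that is updated in place, which is a common space-saving form of the same table. The function name is hypothetical, and the code assumes every input is a non-negative integer.

```cpp
#include <numeric>
#include <vector>

// Returns every achievable subset sum in increasing order.
// Assumes all elements of numbers are non-negative.
std::vector<int> subsetSumsDP(const std::vector<int>& numbers) {
    const int total = std::accumulate(numbers.begin(), numbers.end(), 0);
    std::vector<bool> reachable(total + 1, false);
    reachable[0] = true; // the empty subset sums to 0

    for (int x : numbers) {
        // Walk w downwards so each number is used at most once.
        for (int w = total; w >= x; --w) {
            if (reachable[w - x]) {
                reachable[w] = true;
            }
        }
    }

    std::vector<int> sums;
    for (int w = 0; w <= total; ++w) {
        if (reachable[w]) {
            sums.push_back(w);
        }
    }
    return sums;
}
```

A side effect of reading the table from w = 0 upwards is that the sums come out already sorted, which also addresses the question about keeping results[] ordered.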
Following on from the Dynamic Programming answer, you could go with a recursive solution and then use memoization to cache the results: a top-down approach, in contrast to Amit's bottom-up one.
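#include <cstddef>
#include <vector>
using std::vector;

// Appends to ans the sum of every subset of nums[i..], each added to sum.
// Defined before subsetSum so that the call below compiles.
void generateSubsetSum(vector<int>& ans, int sum, const vector<int>& nums, std::size_t i)
{
    if (i == nums.size())
    {
        ans.push_back(sum);
        return;
    }
    generateSubsetSum(ans, sum + nums[i], nums, i + 1); // branch that includes nums[i]
    generateSubsetSum(ans, sum, nums, i + 1);           // branch that skips nums[i]
}

vector<int> subsetSum(const vector<int>& nums)
{
    vector<int> ans;
    generateSubsetSum(ans, 0, nums, 0);
    return ans;
}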
The result is {9 4 6 1 8 3 5 0} for the set {1,3,5}.
At each index i, this first adds nums[i] to the sum and recurses. Once that call returns, the second branch recurses with sum unchanged, that is, without nums[i] added. To memoize this, you would keep a cache that records which sums have already been reached at index i.
I would do something like this (it seems easier). I wanted to put this in a comment, but I can't show the shifting and the removal of one element at a time there; you might need a linked list.
1 3 5
3 5
-----
4 8
1 3 5
5
-----
6
1 3 5
3 5
5
------
9
Add 0 to the list at the end.
Another way to solve this is to build every subset of the elements and then sum up the contents of each subset.
e.g.
1 3 5 gives {1,3} + {1,5} + {3,5} + {1,3,5} after removing the single-element sets.
Keep in mind that this is always easier said than done. A single tiny mistake in the implementation can take a lot of debugging time to track down.
There has to be a binary chop version, as well. This one is a bit heavy-handed and relies on that set of answers you mention to filter repeated results:
Split the list into 2,
and generate the list of sums for each half
by recursion:
the minimum state is either
2 entries, with 1 result,
or 3 entries with 3 results
alternatively, take it down to 1 entry with 0 results, if you insist
Then combine the 2 halves:
All the returned entries from both halves are legitimate results
There are 4 additional result sets to add to the output result by combining:
The first half inputs vs the second half inputs
The first half outputs vs the second half inputs
The first half inputs vs the second half outputs
The first half outputs vs the second half outputs
Note that the outputs of the two halves may have some elements in common, but they should be treated separately for these combines.
The inputs can be scrubbed from the returned outputs of each recursion if the inputs are legitimate final results. If they are they can either be added back in at the top-level stage or returned by the bottom level stage and not considered again in the combining.
You could use a bitfield instead of a set to filter out the duplicates. There are reasonably efficient ways of stepping through a bitfield to find all the set bits. The max size of the bitfield is the sum of all the inputs.
There is no intelligence here, but lots of opportunity for parallel processing within the recursion and combine steps.
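A simplified variant of this split-and-combine idea is sketched below. It is not a literal transcription of the steps above: it assumes each half also reports the empty sum 0, so the inputs and partial results of each half are already contained in what it returns, and the four combine cases collapse into one pairwise combination. The function name and the use of std::set for de-duplication are choices made for this sketch.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Sums of all subsets (including the empty one) of numbers[first, last).
std::set<int> subsetSumsSplit(const std::vector<int>& numbers,
                              std::size_t first, std::size_t last) {
    std::set<int> result;
    if (last - first <= 1) {
        result.insert(0); // empty subset
        if (last - first == 1) {
            result.insert(numbers[first]); // the single element on its own
        }
        return result;
    }
    const std::size_t mid = first + (last - first) / 2;
    const std::set<int> left = subsetSumsSplit(numbers, first, mid);
    const std::set<int> right = subsetSumsSplit(numbers, mid, last);
    // Every sum is a left-half sum plus a right-half sum; 0 on either side
    // carries the other half's results through unchanged.
    for (int a : left) {
        for (int b : right) {
            result.insert(a + b);
        }
    }
    return result;
}
```

Calling subsetSumsSplit(numbers, 0, numbers.size()) yields the full set; the two recursive calls are independent, so they are the natural place to parallelise.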

Assertion Error, using STL Vector

for(myIterator = numbers.begin(); myIterator != numbers.end(); myIterator++)
{
    resultVect.push_back(*myIterator+2);
    numbers.erase(myIterator+2);
}
numbers consists of a series of numbers (e.g. 1,2,3,4,5,6,7).
Then I would like to erase every 3rd number.
Something like:
1 2 3 4 5 6 (first round -> 3 is out)
1 2 4 5 6 (second round -> 6 is out)
1 2 4 5 (third round -> 4 is out)
and so on.
I will store each number that goes out in another vector (resultVect).
I'm getting an assertion error. Please advise, thank you.
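When you call erase on a vector, it shifts the elements after the erase position forward, so iterators at or after that position are invalidated.
Second, myIterator + 2 can go beyond the end of the vector. Note also that *myIterator+2 adds 2 to the current value; it does not read the element two positions ahead, which would be *(myIterator+2).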
Removing an element from the vector invalidates all iterators to that element and beyond (in the current standard, there is an open issue to change this).
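The first question is how efficient you want the process to be. If you don't care (much) about performance, you can do a simple loop that takes out every third element in a single pass:
for (std::size_t i = 2; i < input.size(); i += 3) {
    result.push_back(input[i]);               // copy the 3rd, 6th, 9th, ... element
}
for (std::size_t k = input.size() / 3; k > 0; --k) {
    input.erase(input.begin() + (k * 3 - 1)); // erase back to front so earlier indices stay valid
}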
If performance is critical, you can take a look at the std::remove algorithm and use the same approach to avoid doing multiple copies of the elements while you run the algorithm. Basically you need a read and a write head into the original container and you only copy from the read to the write location if the condition is not met.
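The read/write-head idea can be sketched as follows for a single pass that takes out every third element. The function name is hypothetical; this hand-rolled loop mirrors how std::remove compacts a range, but it is written out so the removed values can be collected at the same time.

```cpp
#include <cstddef>
#include <vector>

// Moves every third element (3rd, 6th, 9th, ...) of input into the returned
// vector and compacts the remaining elements in place, in one pass.
std::vector<int> extractEveryThird(std::vector<int>& input) {
    std::vector<int> removed;
    std::size_t write = 0; // next slot for an element that is kept
    for (std::size_t read = 0; read < input.size(); ++read) {
        if ((read + 1) % 3 == 0) {
            removed.push_back(input[read]); // condition met: take it out
        } else {
            input[write++] = input[read];   // condition not met: keep it
        }
    }
    input.resize(write); // drop the leftover tail
    return removed;
}
```

Each element is copied at most once, instead of once per erase call as in the simple loop.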
Simply put: you cannot keep using an iterator after erasing the element it points to or any element before it. The iterator becomes invalid, and using it triggers the assertion.
To do what you want properly, you might consider building one vector with the values to keep and another with the values to remove, then replacing the numbers vector with the one holding the values to keep.
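If the rounds in the question are meant to wrap around (3 out, then 6, then 4), the process is a circular count, and indices avoid the iterator problem entirely. The sketch below assumes that reading and assumes the elimination continues until the vector is empty; the function name and the step parameter are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Repeatedly counts step elements around the vector, removes the element the
// count lands on, and continues counting from the element after it.
// Returns the removed values in removal order; numbers ends up empty.
std::vector<int> eliminateEveryNth(std::vector<int>& numbers, std::size_t step) {
    std::vector<int> removed;
    if (step == 0) {
        return removed; // nothing sensible to do for a zero step
    }
    std::size_t pos = 0; // index where the next count starts
    while (!numbers.empty()) {
        pos = (pos + step - 1) % numbers.size(); // wrap around the end
        removed.push_back(numbers[pos]);
        numbers.erase(numbers.begin() + pos);    // the next element slides into pos
    }
    return removed;
}
```

With numbers = {1,2,3,4,5,6} and step = 3, the removal order begins 3, 6, 4, matching the rounds listed in the question.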

C++ Equivalent of VLOOKUP function

I'm trying to create an equivalent of the Excel VLOOKUP function for a two-dimensional CSV file I have. Given a number such as 5, I would like to look at a column of a dynamic table and find the row with the highest number less than 5 in that column.
For example, using 5 on this table:
2 6
3 7
4 11
6 2
9 4
would return 11, the data paired with the highest entry below 5.
I have no idea how to go about doing this. If it helps, the entries in column one (the column I will be searching) will go from smallest to largest.
I am a beginner to C++ so I apologize if I'm missing some obvious method.
std::map can do this pretty easily:
You'd start by creating a map of the correct type, then populating it with your data:
std::map<int, int, std::greater<int> > data;
data[2] = 6;
data[3] = 7;
data[4] = 11;
data[6] = 2;
data[9] = 4;
Then you'd search for data with lower_bound or upper_bound:
std::cout << data.lower_bound(5)->second; // prints 11
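A couple of notes: First, note the use of std::greater<T> as the comparison function. This is necessary because lower_bound normally returns an iterator to the next item (instead of the previous one) if the key you're looking for isn't present in the map. Using std::greater<T> sorts the map in reverse, so the "next" item is the smaller one instead of the larger. If the key 5 itself were present, lower_bound(5) would return that entry, whereas upper_bound(5) returns the first key strictly below 5. Either call returns data.end() when no such key exists, so check for that before dereferencing.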
Second, note that this automatically sorts the data based on the keys, so it depends only on the data you insert, not the order of insertion.
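As an alternative sketch, the same lookup can be done on a map with the default ascending order by stepping one entry back from lower_bound. This swaps in a different technique from the std::greater map above; the function name and the bool-plus-output-parameter interface are assumptions made for illustration.

```cpp
#include <map>

// Finds the value paired with the largest key strictly below target.
// Returns false when no such key exists; otherwise stores the value in result.
bool lookupBelow(const std::map<int, int>& table, int target, int& result) {
    std::map<int, int>::const_iterator it = table.lower_bound(target); // first key >= target
    if (it == table.begin()) {
        return false; // every key is >= target, or the table is empty
    }
    --it; // step back to the largest key < target
    result = it->second;
    return true;
}
```

With the table from the question, looking up 5 stores 11 in result, and looking up 2 returns false because no key lies below 2.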

[only equal operator]what are the fast algorithms to find duplicate elements in a collection and group them?

Suppose we have a collection of elements that only support the equality operator, so it is impossible to sort them.
How can you pick out those with duplicates and put them into groups with the least number of comparisons? Preferably in C++, but the algorithm is more important than the language. For example, given {E1,E2,E3,E4,E4,E2,E6,E4,E3}, I wish to extract {E2,E2}, {E3,E3}, {E4,E4,E4}. Which data structure and algorithm would you choose?
EDIT
My scenario: if binary data 1 is equal to binary data 2, we can say the two elements are identical. But only == and != are meaningful.
element 1:
4 0 obj
<< /Type /Pages /Kids 5 0 R /Count 1 >>
stream
.....binary data 1....
endstream
endobj
element 2:
5 0 obj
<< /Type /Pages /Kids 5 0 R /Count 1 >>
stream
.....binary data 2....
endstream
endobj
It is sufficient to find any arbitrary predicate P such that P(a,a)==false, P(a,b) && P(b,a)==false, P(a,b) && P(b,c) implies P(a,c), and !P(a,b) && !P(b,a) implies a == b. Less-than satisfies these properties, as does greater-than. But they're far from the only possibilities.
You can now sort your collection by predicate P, and all elements which are equal will be adjacent. In your case, define P(E1,E2)=true, P(E2,E3)=true, etc.
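As one possible instance of such a predicate, the sketch below assumes each element's binary data can be held in a std::string, whose operator< compares the bytes lexicographically. The function name is hypothetical, and the byte ordering carries no meaning beyond making equal elements adjacent.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sorts a copy of items and returns only the groups of equal elements that
// contain more than one member.
std::vector<std::vector<std::string>> duplicateGroups(std::vector<std::string> items) {
    std::sort(items.begin(), items.end()); // equal elements become adjacent
    std::vector<std::vector<std::string>> groups;
    std::size_t start = 0;
    while (start < items.size()) {
        std::size_t end = start + 1;
        while (end < items.size() && items[end] == items[start]) {
            ++end; // extend the run of equal elements
        }
        if (end - start > 1) {
            groups.emplace_back(items.begin() + start, items.begin() + end);
        }
        start = end;
    }
    return groups;
}
```

Sorting costs O(n log n) comparisons, after which a single linear scan collects the groups.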
I am not 100% sure this is exactly what you want, but here is a suggestion.
If you want a good algorithm, try building a binary search tree (BST). According to the BST properties, equal elements end up together, so you can easily group them.
For Example
BST()
{
count = 0;
if(elementinserted)
count = 1;
if(newelement == already inserted element)
{
count++;
put element in array upto count value;
}
}
I hope this explanation can help you.
If all you have is an equality test, you have no hope of beating a comparison of every pair.
Suppose you have a situation where each element is unique, and another where exactly two elements are duplicates.
There are n(n-1)/2 situations of the second type, one for each pair. Each can only be distinguished from the first situation by one particular comparison. This means that in the worst case you must do all n(n-1)/2 comparisons: an exhaustive search over all pairs.
What you need to do is figure out what else you can really do, as having equality only is exceedingly rare.
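For the case where only == really is available, a straightforward sketch is to compare each element against one representative per group found so far. The function name is hypothetical. The cost is at most n*k comparisons for k distinct values, which degrades to the n(n-1)/2 worst case described above when every element is unique.

```cpp
#include <cstddef>
#include <vector>

// Groups equal elements using only operator== and returns the groups that
// have more than one member, in order of first appearance.
template <typename T>
std::vector<std::vector<T>> groupDuplicatesByEquality(const std::vector<T>& items) {
    std::vector<std::vector<T>> groups;
    for (const T& item : items) {
        bool placed = false;
        for (std::vector<T>& group : groups) {
            if (group.front() == item) { // one comparison per existing group
                group.push_back(item);
                placed = true;
                break;
            }
        }
        if (!placed) {
            groups.push_back(std::vector<T>(1, item)); // start a new group
        }
    }
    std::vector<std::vector<T>> duplicates;
    for (const std::vector<T>& group : groups) {
        if (group.size() > 1) {
            duplicates.push_back(group);
        }
    }
    return duplicates;
}
```

When there are few distinct values, this stays close to linear, so it is worth knowing roughly how many duplicates to expect before choosing it.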