recursively find subsets - c++

Here is a recursive function that I'm trying to create that finds all the subsets passed in an STL set. the two params are an STL set to search for subjects, and a number i >= 0 which specifies how big the subsets should be. If the integer is bigger then the set, return empty subset
I don't think I'm doing this correctly. Sometimes it's right, sometimes its not. The stl set gets passed in fine.
list<set<int> > findSub(set<int>& inset, int i)
{
list<set<int> > the_list;
list<set<int> >::iterator el = the_list.begin();
if(inset.size()>i)
{
set<int> tmp_set;
for(int j(0); j<=i;j++)
{
set<int>::iterator first = inset.begin();
tmp_set.insert(*(first));
the_list.push_back(tmp_set);
inset.erase(first);
}
the_list.splice(el,findSub(inset,i));
}
return the_list;
}

From what I understand you are actually trying to generate all subsets of 'i' elements from a given set right ?
Modifying the input set is going to get you into trouble, you'd be better off not modifying it.
I think that the idea is simple enough, though I would say that you got it backwards. Since it looks like homework, i won't give you a C++ algorithm ;)
generate_subsets(set, sizeOfSubsets) # I assume sizeOfSubsets cannot be negative
# use a type that enforces this for god's sake!
if sizeOfSubsets is 0 then return {}
else if sizeOfSubsets is 1 then
result = []
for each element in set do result <- result + {element}
return result
else
result = []
baseSubsets = generate_subsets(set, sizeOfSubsets - 1)
for each subset in baseSubssets
for each element in set
if no element in subset then result <- result + { subset + element }
return result
The key points are:
generate the subsets of lower rank first, as you'll have to iterate over them
don't try to insert an element in a subset if it already is, it would give you a subset of incorrect size
Now, you'll have to understand this and transpose it to 'real' code.

I have been staring at this for several minutes and I can't figure out what your train of thought is for thinking that it would work. You are permanently removing several members of the input list before exploring every possible subset that they could participate in.
Try working out the solution you intend in pseudo-code and see if you can see the problem without the stl interfering.

It seems (I'm not native English) that what you could do is to compute power set (set of all subsets) and then select only subsets matching condition from it.
You can find methods how to calculate power set on Wikipedia Power set page and on Math Is Fun (link is in External links section on that Wikipedia page named Power Set from Math Is Fun and I cannot post it here directly because spam prevention mechanism). On math is fun mainly section It's binary.

I also can't see what this is supposed to achieve.
If this isn't homework with specific restrictions i'd simply suggest testing against a temporary std::set with std::includes().

Related

Struggling to extract a section of a list in Haskell

I am trying to implement a basic function but I'm out of practice with Haskell and struggling so would really appreciate some help. My question is specifically how to select a section of a list by index. I know how to in other languages but have been struggling
[ x | x <- graph, x!! > 5 && x!! <10 ]
I have been fiddling around with basic list comprehension similar to what is above, and while I know that isn't right I was hoping a similarly simple solution would be available.
If anyone wants more information or felt like helping on the further question I have included more information below, thanks!
type Node = Int
type Branch = [Node]
type Graph= [Node]
next :: Branch -> Graph -> [Branch]
This is the individual question for the "next" function
This is the general set up information but most importantly that the graph is represented as a flattened adjacency matric
Apologies for the two pictures but it seemed the best way to convey the information.
As pointed out in the comments !! does not give you the index of a value in the way it seems you expect. It is just an infix for getting an element of a list.
There is no way to get the index of x like this in Haskell since the x object doesn't keep track of where it is.
To fix this we can make a list of objects that do keep track of where they were. This can be achieved with zip.
zip [0..] graph
This creates a list of tuples each containing their index and the value in graph.
So you can write your list comprehensions as
[ x | (index, x) <- zip [0..] graph, index > 5, index < 10 ]
Now this is not going to be terribly fast since it still needs to go through every element of the list despite the fact that we know no element after the 11th will be used. For speed we would want to use a combination of take and drop.
drop 5 (take 10 graph)
However if we wanted to do some other selections (e.g. all even indexes), we can still go back to the list comprehension.
In this case, you could drop 5 <&> take 4. As in drop 5 x & take 4. Drop skips the first few elements and take leaves out all but the first few left after the drop.

Search similar object

Assume I have the following array of objects:
Object 0:
[0]=1.1344
[1]=2.18
...
[N]=1.86
-----------
Object 1 :
[0]=1.1231
[1]=2.16781
...
[N]=1.8765
-------------
Object 2 :
[0]=1.2311
[1]=2.14781
...
[N]=1.5465
--------
Object 17:
[0]=1.31
[1]=2.55
...
[N]=0.75
How can I compare those objects?
You can see that object 0 and object 1 are very similar but object 17 not like any of them.
I would like to have algorithm tha twill give me all the similar object in my array
You tag this question with Algorithm (and I am not expert in C++) so lets give a pseudo code.
First, you should set a threshold which define 2 var with different under that threshold as similar. Second step will be to loop over all pair of elements and check for similarity.
Consider A to be array with n objects and m to be number of fields in each object.
threshold = 0.1
for i in (0, n):
for j in (i+1,n):
flag = true;
for k in (1,m):
if (abs(A[i][k] - A[j][k]) > threshold)
flag = false // if the absolute value of the diff is above the threshold object are not similar
break // no need to continue checks
if (flag)
print: element i and j similar // and do what ever
Time complexity is O(m * n^2).
Notice that you can use the same algorithm to sort the objects array - declare compare function as the max diff between field and then sort accordingly.
Hope that helps!
Your problem essentially boils down to nearest neighbor search which is a well researched problem in data mining.
There are diffent approaches to this problem.
I would suggest to decide first what number of similar elements you want OR to set a given threshold for the similarity. Than you have to iterate through all the vectors and compute a distance function between the query vector and each vector in the database.
I would suggest you to use Euclidean distance in your case since you have real nominal data.
You can read more about the topic of nearest neighbor search and Euclidean distancehere and here. Good luck!
What you need is a classifier, for your problem there are 2 algorithms depends on what you wanted.
If you need to find which object is most similar to the choosen object-m, you can use nearest neighbor algorithm or else if you need to find similar sets of objects you can use k-means algorithm to find k sets.

Ordering by sum of difference

I have a model that has one attribute with a list of floats:
values = ArrayField(models.FloatField(default=0), default=list, size=64, verbose_name=_('Values'))
Currently, I'm getting my entries and order them according to the sum of all diffs with another list:
def diff(l1, l2):
return sum([abs(v1-v2) for v1, v2 in zip(l1, l2)])
list2 = [0.3, 0, 1, 0.5]
entries = Model.objects.all()
entries.sort(key=lambda t: diff(t.values, list2))
This works fast if my numer of entries is very slow small. But I'm afraid with a large number of entries, the comparison and sorting of all the entries will get slow since they have to be loaded from the database. Is there a way to make this more efficient?
best way is to write it yourself, right now you are iterating over a list over 4 times!
although this approach looks pretty but it's not good.
one thing that you can do is:
have a variable called last_diff and set it to 0
iterate through all entries.
iterate though each entry.values
from i = 0 to the end of list, calculate abs(entry.values[i]-list2[i])
sum over these values in a variable called new_diff
if new_diff > last_diff break from inner loop and push the entry into its right place (it's called Insertion Sort, check it out!)
in this way, in average scenario, time complexity is much lower than what you are doing now!
and maybe you must be creative too. I'm gonna share some ideas, check them for yourself to make sure that they are fine.
assuming that:
values list elements are always positive floats.
list2 is always the same for all entries.
then you may be able to say, the bigger the sum over the elements in values, the bigger the diff value is gonna be, no matter what are the elements in list2.
then you might be able to just forget about whole diff function. (test this!)
The only way to makes this really go faster, is to move as much work as possible to the database, i.e. the calculations and the sorting. It wasn't easy, but with the help of this answer I managed to actually write a query for that in almost pure Django:
class Unnest(models.Func):
function = 'UNNEST'
class Abs(models.Func):
function = 'ABS'
class SubquerySum(models.Subquery):
template = '(SELECT sum(%(field)s) FROM (%(subquery)s) _sum)'
x = [0.3, 0, 1, 0.5]
pairdiffs = Model.objects.filter(pk=models.OuterRef('pk')).annotate(
pairdiff=Abs(Unnest('values')-Unnest(models.Value(x, ArrayField(models.FloatField())))),
).values('pairdiff')
entries = Model.objects.all().annotate(
diff=SubquerySum(pairdiffs, field='pairdiff')
).order_by('diff')
The unnest function turns each element of the values into a row. In this case it happens twice, but the two resulting columns are instantly subtracted and made positive. Still, there are as many rows per pk as there are values. These need to be summed, but that's not as easy as it sounds. The column can't be simply be aggregated. This was by far the most tricky part—even after fiddling with it for so long, I still don't quite understand why Postgres needs this indirection. Of the few options there are to make it work, I believe a subquery is the single one expressible in Django (and only as of 1.11).
Note that the above behaves exactly the same as with zip, i.e. the when one array is longer than the other, the remainder is ignored.
Further improvements
While it will be a lot faster already when you don't have to retrieve all rows anymore and loop over them in Python, it doesn't change yet that it results in a full table scan. All rows will have to be processed, every single time. You can do better, though. Have a look into the cube extension. Use it to calculate the L1 distance—at least, that seems what you're calculating—directly with the <#> operator. That will require the use of RawSQL or a custom Expression. Then add a GiST index on the SQL expression cube("values"), or directly on the field if you're able to change the type from float[] to cube. In case of the latter, you might have to implement your own CubeField too—I haven't found any package yet that provides it. In any case, with all that in place, top-N queries on the lowest distance will be fully indexed hence blazing fast.

Elixir: Find middle item in list

I'm trying to learn Elixir. In most other languages i've battled with, this would be an easy task.
However, i can't seem to figure out how to access a list item by index in Elixir, which i need for finding the median item in my list. Any clarification would be greatly appreciated!
You will want to look into Enum.at/3.
a = [1,2,3,4,5]
middle_index = a |> length() |> div(2)
Enum.at(a, middle_index)
Note: This is expensive as it needs to traverse the entire list to find the length of the list, and then traverse halfway through the list to find what the actual element is. Generally speaking, if you need random access to an item in a list, you should be looking for a different data structure.
This is how I would do it:
Enum.at(x, div(length(x), 2))
Enum.at/3 retrieves the value at a particular index of an enumerable. div/2 is the equivalent of the Python 2.x / integer division.

C++ sorting algorithm

Thanks for looking at this question in advance.
I am trying to order the following list of items:
Bpgvjdfj,Bvfbyfzc
Zjmvxouu,Fsmotsaa
Xocbwmnd,Fcdlnmhb
Fsmotsaa,Zexyegma
Bvfbyfzc,Qkignteu
Uysmwjdb,Wzujllbk
Fwhbryyz,Byoifnrp
Klqljfrk,Bpgvjdfj
Qkignteu,Wgqtalnh
Wgqtalnh,Coyuhnbx
Sgtgyldw,Fwhbryyz
Coyuhnbx,Zjmvxouu
Zvjxfwkx,Sgtgyldw
Czeagvnj,Uysmwjdb
Oljgjisa,Dffkuztu
Zexyegma,Zvjxfwkx
Fcdlnmhb,Klqljfrk
Wzujllbk,Oljgjisa
Byoifnrp,Czeagvnj
Into the following order:
Bpgvjdfj
Bvfbyfzc
Qkignteu
Wgqtalnh
Coyuhnbx
Zjmvxouu
Fsmotsaa
Zexyegma
Zvjxfwkx
Sgtgyldw
Fwhbryyz
Byoifnrp
Czeagvnj
Uysmwjdb
Wzujllbk
Oljgjisa
Dffkuztu
This is done by:
Taking the first pair and putting the names into a list
Using the second name of the pair, find the pair where it is used as the first name
Add the second name of that pair to the list
Repeat 2 & 3
I am populating an unordered_map with the pairs then sorting and adding each name to a list. This can be seen in the following code:
westIter = westMap.begin();
std::string westCurrent = westIter->second;
westList.push_front(westCurrent);
for(int i = 0; i < 30; i++)
{
if(westMap.find(westCurrent) != westMap.end())
{
//find pair in map where first iterator is equal to "westCurrent"
//append second iterator of pair to list
}
westIter++;
}
Note: I'm not sure if "push_front" is correct at this moment in time as I have only got the first value inserted.
My question is could someone give me some insight as to how I could go about this? As I am unsure of the best way and whether my thinking is correct. Any insight would be appreciated.
There is but one weakness in your plan. You need to first find the first person of the chain, the Mr New York.
Your algorithm assumes the line starts with the first guy. For that to work, you should first scan the entire map to find the one name that does not appear as a second element. That is Mr New York and you can proceed from there. push_back is what you would need to use here.
Create a data structure that stores a chain, its front and back. Store in a hash table with 'back' as key.
Create a bunch of singleton chains (one for each element)
Iteratively, pick a chain find its 'front' in the hash table (i.e. find another chain that has the same element as 'back') and merge them
Do it until you are left with only one chain