C++ sorting algorithm

Thanks for looking at this question in advance.
I am trying to order the following list of items:
Bpgvjdfj,Bvfbyfzc
Zjmvxouu,Fsmotsaa
Xocbwmnd,Fcdlnmhb
Fsmotsaa,Zexyegma
Bvfbyfzc,Qkignteu
Uysmwjdb,Wzujllbk
Fwhbryyz,Byoifnrp
Klqljfrk,Bpgvjdfj
Qkignteu,Wgqtalnh
Wgqtalnh,Coyuhnbx
Sgtgyldw,Fwhbryyz
Coyuhnbx,Zjmvxouu
Zvjxfwkx,Sgtgyldw
Czeagvnj,Uysmwjdb
Oljgjisa,Dffkuztu
Zexyegma,Zvjxfwkx
Fcdlnmhb,Klqljfrk
Wzujllbk,Oljgjisa
Byoifnrp,Czeagvnj
Into the following order:
Bpgvjdfj
Bvfbyfzc
Qkignteu
Wgqtalnh
Coyuhnbx
Zjmvxouu
Fsmotsaa
Zexyegma
Zvjxfwkx
Sgtgyldw
Fwhbryyz
Byoifnrp
Czeagvnj
Uysmwjdb
Wzujllbk
Oljgjisa
Dffkuztu
This is done by:
1. Taking the first pair and putting both names into a list
2. Using the second name of the pair, finding the pair where it is used as the first name
3. Adding the second name of that pair to the list
4. Repeating steps 2 and 3
I am populating an unordered_map with the pairs then sorting and adding each name to a list. This can be seen in the following code:
westIter = westMap.begin();
std::string westCurrent = westIter->second;
westList.push_front(westCurrent);
for(int i = 0; i < 30; i++)
{
    if(westMap.find(westCurrent) != westMap.end())
    {
        //find pair in map where first iterator is equal to "westCurrent"
        //append second iterator of pair to list
    }
    westIter++;
}
Note: I'm not sure if "push_front" is correct at this moment in time as I have only got the first value inserted.
My question is could someone give me some insight as to how I could go about this? As I am unsure of the best way and whether my thinking is correct. Any insight would be appreciated.

There is but one weakness in your plan. You need to first find the first person of the chain, the Mr New York.
Your algorithm assumes the line starts with the first guy, but an unordered_map has no meaningful order, so westMap.begin() can be any pair. For the walk to work, first scan the entire map to find the one name that never appears as a second element. That is Mr New York, and you can proceed from there. Use push_back, not push_front, to append each following name.
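A minimal sketch of that approach, assuming the pairs are available as a vector of (first, second) strings. The function name order_chain and its signature are hypothetical and not taken from the question's code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: returns the names in chain order, starting from the
// one name that is never the second element of a pair.
std::vector<std::string> order_chain(
    const std::vector<std::pair<std::string, std::string>>& pairs) {
    std::unordered_map<std::string, std::string> next;  // first -> second
    std::unordered_set<std::string> seconds;            // names used as a second element
    for (const auto& p : pairs) {
        next[p.first] = p.second;
        seconds.insert(p.second);
    }

    // Find the head of the chain.
    std::vector<std::string> ordered;
    for (const auto& p : pairs) {
        if (seconds.count(p.first) == 0) {
            ordered.push_back(p.first);
            break;
        }
    }
    if (ordered.empty()) return ordered;  // empty input, or a pure cycle with no head

    // Follow the links; the size bound stops the walk if the data contains a cycle.
    auto it = next.find(ordered.back());
    while (it != next.end() && ordered.size() <= pairs.size()) {
        ordered.push_back(it->second);
        it = next.find(ordered.back());
    }
    assert(ordered.size() <= pairs.size() + 1);  // n pairs link at most n + 1 names
    return ordered;
}
```

With the pairs listed in the question, the head found this way would be Xocbwmnd rather than Bpgvjdfj, because the pair Klqljfrk,Bpgvjdfj makes Bpgvjdfj a second element.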

Create a data structure that stores a chain together with its front and back, and store the chains in a hash table with 'back' as the key.
Create a bunch of singleton chains (one for each element).
Iteratively pick a chain, look up its 'front' in the hash table (i.e. find another chain whose 'back' is that same element), and merge the two.
Repeat until you are left with only one chain.

Related

Recursive backtracking, showing the best solution

For school I am supposed to use recursive backtracking to solve a Boat puzzle. The user inputs a maximum weight for the boat, the amount of item types, and a weight and value for each item type. More than one of each item type can be placed on the boat.
Our assignment states "The program should find a solution that fills the boat with selected valuable items such that the total value of the items in the boat is maximized while the total weight of the items stays within the weight capacity of the boat."
It also has a pretty specific template for the recursive backtracking algorithm.
Currently I am using contiguous lists of items to store the possible items and the items on the boat. The item struct includes int members for weight, value, count (of how many times it is used) and a unique code for printing purposes. I then have a Boat class which contains data members max_weight, current_weight, value_sum, and members for each of the contiguous lists, and then member functions needed to solve the puzzle. All of my class functions seem to be working perfectly and my recursion is indeed displaying the correct answer given the example input.
The thing I can't figure out is the condition for extra credit, which is, "Modify your program so that it displays the best solution, which has the lowest total weight. If there are two solutions with the same total weight, break the tie by selecting the solution with the least items in it." I've looked at it for a while, but I'm just not sure how I can change it to make sure the weight is minimized while also maximizing the value. Here is the code for my solution:
bool solve(Boat &boat) {
    if (boat.no_more()) {
        boat.print();
        return true;
    }
    else {
        int pos;
        for (int i = 0; i < boat.size(); i++){
            if (boat.can_place(i)) {
                pos = boat.add_item(i);
                bool solved = solve(boat);
                boat.remove_item(pos);
                if (solved) return true;
            }
        }
        return false;
    }
}
All functions do pretty much exactly what their name says. No more returns true if none of the possible items will fit on the boat. Size returns the size of the list of possible items. Adding and removing items change the item count data and also the Boat current_weight and value_sum members accordingly. Also the add_item, remove_item and can_place parameter is the index of the possible item that is being used. In order to make sure maximized value is found, the list of possible items is sorted in descending order by value in the Boat's constructor, which takes a list of possible items as a parameter.
Also here is an example of what input and output look like:
Any insight is greatly appreciated!
It turned out that the above solution was correct. The only reason I was getting an incorrect answer was my implementation of the no_more() function. In that function I was checking whether any item in the possible items list weighed less than the weight left on the boat. I should have been checking whether it weighed less than or equal to the weight left on the boat. A simple mistake.
The wikipedia entry was indeed of use and I enjoyed the comic :)
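For the extra-credit tie-break, one possible approach is to explore every fill and remember the best one instead of stopping at the first. The sketch below is not the assignment's template and does not use the Boat class. Item, Best, better and search are hypothetical names, and item weights are assumed to be positive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Item { int weight; int value; };  // hypothetical item type

struct Best {
    int value = 0;
    int weight = 0;
    int count = 0;  // total number of items on the boat
};

// True if the candidate (v, w, c) beats the best so far: higher value first,
// then lower total weight, then fewer items.
bool better(int v, int w, int c, const Best& b) {
    if (v != b.value) return v > b.value;
    if (w != b.weight) return w < b.weight;
    return c < b.count;
}

// Tries every multiset of items that fits; `start` keeps the search from
// revisiting the same items in a different order.
void search(const std::vector<Item>& items, std::size_t start, int capacity,
            int v, int w, int c, Best& best) {
    if (better(v, w, c, best)) {
        best.value = v;
        best.weight = w;
        best.count = c;
    }
    for (std::size_t i = start; i < items.size(); ++i) {
        assert(items[i].weight > 0);            // a zero weight would recurse forever
        if (w + items[i].weight <= capacity) {  // <=, not <: an exact fit is allowed
            search(items, i, capacity, v + items[i].value,
                   w + items[i].weight, c + 1, best);
        }
    }
}
```

A call would start from the empty boat, for example search(items, 0, max_weight, 0, 0, 0, best) with a default-constructed best.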

Finding a unique url from a large list of URLs in O(n) time in a single pass

Recently I was asked this question in an interview. I gave an answer in O(n) time but in two passes. Also he asked me how to do the same if the url list cannot fit into the memory. Any help is very much appreciated.
If it all fits in memory, then the problem is simple: Create two sets (choose your favorite data structure), both initially empty. One will contain unique URLs and the other will contain URLs that occur multiple times. Scan the URL list once. For each URL, if it exists in the unique set, remove it from the unique set and put it in the multiple set; otherwise, if it does not exist in the multiple set, add it to the unique set.
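A minimal C++ sketch of the two-set scan described above, assuming the URLs arrive as a vector of strings. The function name unique_urls is hypothetical.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns the URLs that occur exactly once, using one pass over the input.
std::unordered_set<std::string> unique_urls(const std::vector<std::string>& urls) {
    std::unordered_set<std::string> unique;    // seen exactly once so far
    std::unordered_set<std::string> repeated;  // seen two or more times
    for (const auto& url : urls) {
        if (unique.erase(url) > 0) {
            repeated.insert(url);  // second sighting: move it to the repeated set
        } else if (repeated.count(url) == 0) {
            unique.insert(url);    // first sighting
        }
    }
    return unique;
}
```

Each URL costs a constant number of hash operations on average, so the whole scan is expected O(n).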
If the set does not fit into memory, the problem is difficult. The requirement of O(n) isn't hard to meet, but the requirement of a "single pass" (which seems to exclude random access, among other things) is tough; I don't think it's possible without some constraints on the data. You can use the set approach with a size limit on the sets, but this would be easily defeated by unfortunate orderings of the data and would in any event only have a certain probability (<100%) of finding a unique element if one exists.
EDIT:
If you can design a set data structure that exists in mass storage (so it can be larger than would fit in memory) and can do find, insert, and deletes in O(1) (amortized) time, then you can just use that structure with the first approach to solve the second problem. Perhaps all the interviewer was looking for was to dump the URLs into a data base with a UNIQUE index for URLs and a count column.
One could try a trie to hold the data. It stores shared prefixes only once, so the common parts of URLs reuse memory and the whole set takes less space.
The loop would look like:
add string s to trie;
check that added string is not finished in existing node
internal node -> compress path
leaf node -> delete path
For the "fits-in-memory" case, you could use two hash-tables as follows (pseudocode):
hash-table uniqueTable = <initialization>;
hash-table nonUniqueTable = <initialization>;
for-each url in url-list {
    if (nonUniqueTable.contains(url)) {
        continue;
    }
    else if (uniqueTable.contains(url)) {
        nonUniqueTable.add(url);
        uniqueTable.remove(url);
    }
    else {
        uniqueTable.add(url);
    }
}
if (uniqueTable.size() > 0)
    return uniqueTable.first();
Python based
You have a list - not sure where it's "coming" from, but if you already have it in memory then:
from itertools import groupby
L.sort()
for key, vals in groupby(L):
    # groupby yields an iterator per group, so materialize it to count
    if len(list(vals)) == 1:
        print(key)
Otherwise use storage (possibly using):
import sqlite3
db = sqlite3.connect('somefile')
db.execute('create table whatever(key)')
Get your data into that, then execute "select key from whatever group by key having count(*) = 1"
This is actually a classic interview question, and the answer they were expecting was that you first sort the URLs and then do a binary search.
If it doesn't fit in memory, you can do the same thing with a file.

Is there a Java collection whose objects are unique (as in a set) but has the ability to get the index/position of a certain object(as in a list)?

I have an ordered, unique, set of objects. I am currently using a TreeSet in order to get the ordering correct. However, sets do not have the ability to get index.
My current implementation is fine, but not necessarily intuitive.
TreeSet<T> treeSet = new TreeSet<T>(comparator);
// Omitted: Add items to treeSet //
int index = new ArrayList<T>(treeSet).indexOf(object);
Is there an easier way to do this?
treeSet.headSet(object).size() should do the trick:
import java.util.SortedSet;
import java.util.TreeSet;
class Test {
    public static void main(String[] args) {
        SortedSet<String> treeSet = new TreeSet<String>();
        String first = "index 0";
        String second = "index 1";
        treeSet.add(first);
        treeSet.add(second);
        int one = treeSet.headSet(second).size();
        System.out.println(one);
        // 1
    }
}
I also faced the problem of finding element at a certain position in a TreeMap. I enhanced the tree with weights that allow accessing elements by index and finding elements at indexes.
The project is called indexed-tree-map https://github.com/geniot/indexed-tree-map . The implementation for finding index of an element or element at an index in a sorted map is not based on linear iteration but on a tree binary search. Updating weights of the tree is also based on climbing up the tree to the root. So no linear iterations.
Java doesn't have such a thing. Here are a few suggestions what you could do:
Leave it as it is, since this is not as bad as it could be ;)
Use an iterator to go through your elements
Write a wrapper class which extends TreeSet and adds the get functionality.
Check out Guava and see if they have something like this (I haven't used it, so I can't tell, sorry!)
create an array Object[] arrayView = mySet.toArray(); and then get the elements from that (this is kinda stupid in terms of performance and memory)

Searching data stored in a tree

I have this data that is hierarchical and so I store it in a tree. I want to provide a search function to it. Do I have to create a binary tree for that? I don't want to have thousands of nodes twice. Isn't there a kind of tree that allows me to both store the data in the order given and also provide me the binary tree like efficient searching setup, with little overhead?
Any other data structure suggestion will also be appreciated.
Thanks.
EDIT:
Some details: The tree is a very simple "hand made" tree that can be considered very very basic. The thing is, there are thousands of names and other text that will be entered as data that I want to search but I don't want to traverse the nodes in a traditional way and need a fast search like binary search.
Also, importantly, the user must be able to see the structure he has entered and NOT the sorted one. So I can't keep it sorted to support the search. That is why I said I don't want to have thousands of nodes twice.
If you don't want to change your tree's hierarchy, use a map to store pointers to the vertices: std::map<SearchKeyType,Vertex*> M.
Every time you add a vertex to your tree, add it to the map too. It's very easy: M[key]=&vertex. To find an element use M.find(key);, or M[key]; if you are sure that the key exists (M[key] inserts a null entry when it does not).
If your tree has duplicate keys, then you should use a multimap.
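A minimal sketch of keeping such an index next to the tree, assuming nodes are keyed by a name string. Vertex, IndexedTree, add and find are hypothetical names for illustration. A multimap is used so repeated names are allowed.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree node: children stay in the order the user entered them.
struct Vertex {
    std::string name;
    std::vector<std::unique_ptr<Vertex>> children;
};

// The tree plus a sorted index of non-owning pointers into it.
struct IndexedTree {
    Vertex root{"root", {}};
    std::multimap<std::string, Vertex*> index;  // sorted by name, duplicates allowed

    // Adds a child under `parent` and records it in the index.
    Vertex* add(Vertex& parent, const std::string& name) {
        parent.children.push_back(std::make_unique<Vertex>());
        Vertex* v = parent.children.back().get();
        v->name = name;
        index.emplace(name, v);
        return v;
    }

    // O(log n) lookup; returns nullptr when the name is absent.
    Vertex* find(const std::string& name) const {
        auto it = index.find(name);
        return it == index.end() ? nullptr : it->second;
    }
};
```

Because each node is heap-allocated through unique_ptr, the pointers stored in the index stay valid when a children vector grows. The index holds one pointer per node, not a second copy of the data.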
Edit: If your key is too big to copy, then you can use a pointer to the key instead of the key itself. The third template argument of std::map must be a type, so the comparison goes into a function object:
struct KeyPointerLess
{
    // compare the keys the pointers refer to, not the addresses
    bool operator()(const SearchKeyType * arg1, const SearchKeyType * arg2) const
    {
        return (*arg1)<(*arg2);
    }
};
std::map<const SearchKeyType *, Vertex *, KeyPointerLess> M;
To search for the element with value V, write the following:
Vertex * v = M[&V]; // assuming that element V exists in M

recursively find subsets

Here is a recursive function that I'm trying to create that finds all the subsets of an STL set that is passed in. The two params are an STL set to search for subsets, and a number i >= 0 which specifies how big the subsets should be. If the integer is bigger than the set, return an empty subset.
I don't think I'm doing this correctly. Sometimes it's right, sometimes it's not. The STL set gets passed in fine.
list<set<int> > findSub(set<int>& inset, int i)
{
    list<set<int> > the_list;
    list<set<int> >::iterator el = the_list.begin();
    if(inset.size()>i)
    {
        set<int> tmp_set;
        for(int j(0); j<=i;j++)
        {
            set<int>::iterator first = inset.begin();
            tmp_set.insert(*(first));
            the_list.push_back(tmp_set);
            inset.erase(first);
        }
        the_list.splice(el,findSub(inset,i));
    }
    return the_list;
}
From what I understand you are actually trying to generate all subsets of 'i' elements from a given set right ?
Modifying the input set is going to get you into trouble, you'd be better off not modifying it.
I think that the idea is simple enough, though I would say that you got it backwards. Since it looks like homework, i won't give you a C++ algorithm ;)
generate_subsets(set, sizeOfSubsets) # I assume sizeOfSubsets cannot be negative
                                     # use a type that enforces this for god's sake!
    if sizeOfSubsets is 0 then return {}
    else if sizeOfSubsets is 1 then
        result = []
        for each element in set do result <- result + {element}
        return result
    else
        result = []  # must not keep duplicates: {a} + b and {b} + a are the same subset
        baseSubsets = generate_subsets(set, sizeOfSubsets - 1)
        for each subset in baseSubsets
            for each element in set
                if element not in subset then result <- result + { subset + element }
        return result
The key points are:
generate the subsets of lower rank first, as you'll have to iterate over them
don't try to insert an element in a subset if it already is, it would give you a subset of incorrect size
Now, you'll have to understand this and transpose it to 'real' code.
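One possible transposition into C++, offered as a sketch and not as the answerer's own code. It assumes the result is held in a std::set of sets, so a subset reached through different element orders is stored only once. Size 0 is treated as yielding the single empty subset, which lets the size-1 case fall out of the general one.

```cpp
#include <cstddef>
#include <set>

// Builds all k-element subsets of `input` by extending the (k - 1)-element
// subsets with every element they do not already contain.
std::set<std::set<int>> generate_subsets(const std::set<int>& input, std::size_t k) {
    if (k == 0) return { std::set<int>{} };  // exactly one subset of size 0
    std::set<std::set<int>> result;
    for (const std::set<int>& base : generate_subsets(input, k - 1)) {
        for (int element : input) {
            if (base.count(element) == 0) {  // skip elements already in the subset
                std::set<int> extended = base;
                extended.insert(element);
                result.insert(extended);     // the outer set drops duplicates
            }
        }
    }
    return result;
}
```

The input set is taken by const reference and never modified. When k exceeds the size of the input, no subset can be extended and the result is empty.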
I have been staring at this for several minutes and I can't figure out what your train of thought is for thinking that it would work. You are permanently removing several members of the input list before exploring every possible subset that they could participate in.
Try working out the solution you intend in pseudo-code and see if you can see the problem without the stl interfering.
It seems (I'm not native English) that what you could do is to compute power set (set of all subsets) and then select only subsets matching condition from it.
You can find methods for calculating the power set on the Wikipedia Power set page and on Math Is Fun (the link is in the External links section of that Wikipedia page, named Power Set from Math Is Fun; I cannot post it here directly because of the spam prevention mechanism). On Math Is Fun, see mainly the section "It's binary".
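The binary idea can be sketched as follows: every subset corresponds to a bit mask, where bit j says whether item j is included. This is only a sketch. The name subsets_of_size is hypothetical, and the input is assumed to hold fewer than 32 distinct items so that a mask fits in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Enumerates the whole power set by bit mask and keeps the subsets of size k.
std::vector<std::vector<int>> subsets_of_size(const std::vector<int>& items,
                                              std::size_t k) {
    assert(items.size() < 32);  // stated assumption: one 32-bit mask per subset
    std::vector<std::vector<int>> result;
    const std::size_t n = items.size();
    for (std::uint32_t mask = 0; mask < (std::uint32_t{1} << n); ++mask) {
        std::vector<int> subset;
        for (std::size_t bit = 0; bit < n; ++bit) {
            if (mask & (std::uint32_t{1} << bit)) subset.push_back(items[bit]);
        }
        if (subset.size() == k) result.push_back(subset);
    }
    return result;
}
```

This visits all 2^n masks regardless of k, so it is only practical for small inputs.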
I also can't see what this is supposed to achieve.
If this isn't homework with specific restrictions i'd simply suggest testing against a temporary std::set with std::includes().