Searching data stored in a tree - C++

I have this data that is hierarchical, so I store it in a tree. I want to provide a search function for it. Do I have to create a binary tree for that? I don't want to have thousands of nodes twice. Isn't there a kind of tree that lets me both store the data in the order given and also get binary-tree-like search efficiency, with little overhead?
Any other data structure suggestion will also be appreciated.
Thanks.
EDIT:
Some details: The tree is a very simple "hand made" tree that can be considered very, very basic. The thing is, there are thousands of names and other text that will be entered as data that I want to search, but I don't want to traverse the nodes in the traditional way; I need a fast search like binary search.
Also, importantly, the user must be able to see the structure he has entered and NOT the sorted one. So I can't keep it sorted to support the search. That is why I said I don't want to have thousands of nodes twice.

If you don't want to change your tree's hierarchy, use a map to store pointers to the vertices: std::map<SearchKeyType,Vertex*> M.
Every time you add a vertex to your tree, add it to the map too. It's very easy: M[key]=&vertex. To find an element use M.find(key);, or M[key]; if you are sure that the key exists.
If your tree has duplicate keys, then you should use a multimap.
Edit: If your keys are large, you can store pointers to keys instead of the keys themselves. Note that the comparator passed to std::map must be a type, not a function, so wrap the comparison in a functor:
struct ComparePointees
{
    bool operator()(const SearchKeyType * arg1, const SearchKeyType * arg2) const
    {
        return (*arg1) < (*arg2);
    }
};
std::map<SearchKeyType *, Vertex *, ComparePointees> M;
To search for the element with value V you write the following:
Vertex * v = M.find(&V)->second; // assuming that element V exists in M; M[&V] would insert a null entry if it didn't


How to sort a List in StringTemplate (v4)?

Here is what I try to achieve:
In my Java code, at some point a Map (maybe a HashMap) is created that may not be sorted:
var map = Map.of("c","third entry","a","first entry","b","second entry");
var data = Map.of("map",map);
Later this map is passed to a stringtemplate loaded from a file with content similar to this:
delimiters "«", "»"
template(data) ::= <<
«data.map.keys:{key|• «data.map.(key)»}; separator="\n"»
>>
However, I want the generated list to be sorted by map key, but there seems to be no function like sort(list). Can I achieve what I want?
desired:
«sorted(data.map.keys):{key|• «data.map.(key)»}; separator="\n"»
I googled for this functionality for some time, and I really wonder whether nobody has wanted sorted lists before.
Of course, I could solve it by passing an extra (sorted) list of keys. But is there a straightforward approach?
best wishes :)

Finding a unique url from a large list of URLs in O(n) time in a single pass

Recently I was asked this question in an interview. I gave an answer in O(n) time but in two passes. Also he asked me how to do the same if the url list cannot fit into the memory. Any help is very much appreciated.
If it all fits in memory, then the problem is simple: Create two sets (choose your favorite data structure), both initially empty. One will contain unique URLs and the other will contain URLs that occur multiple times. Scan the URL list once. For each URL, if it exists in the unique set, remove it from the unique set and put it in the multiple set; otherwise, if it does not exist in the multiple set, add it to the unique set.
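The two-set scan described above might be sketched in Python like this (the function name is mine; it returns an arbitrary URL that occurred exactly once, or None if there is none):

```python
def find_unique(urls):
    unique = set()      # URLs seen exactly once so far
    multiple = set()    # URLs seen more than once
    for url in urls:
        if url in unique:
            # second sighting: move it to the multiple set
            unique.remove(url)
            multiple.add(url)
        elif url not in multiple:
            # first sighting
            unique.add(url)
    # any URL that occurred exactly once, or None
    return next(iter(unique), None)

print(find_unique(["a", "b", "a", "c", "c"]))  # b
```

Each URL triggers at most a constant number of hash-set operations, so the whole scan is O(n) in a single pass.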
If the set does not fit into memory, the problem is difficult. The requirement of O(n) isn't hard to meet, but the requirement of a "single pass" (which seems to exclude random access, among other things) is tough; I don't think it's possible without some constraints on the data. You can use the set approach with a size limit on the sets, but this would be easily defeated by unfortunate orderings of the data and would in any event only have a certain probability (<100%) of finding a unique element if one exists.
EDIT:
If you can design a set data structure that exists in mass storage (so it can be larger than would fit in memory) and can do find, insert, and deletes in O(1) (amortized) time, then you can just use that structure with the first approach to solve the second problem. Perhaps all the interviewer was looking for was to dump the URLs into a data base with a UNIQUE index for URLs and a count column.
One could try a trie structure for keeping the data. It is compressed, so it would take less memory, since common URL prefixes are stored only once.
The loop would look like:
add string s to the trie;
check whether the added string ends in an existing node:
    internal node -> compress the path
    leaf node -> delete the path
For the "fits-in-memory" case, you could use two hash tables as follows (pseudocode):
hash-table uniqueTable = <initialization>;
hash-table nonUniqueTable = <initialization>;
for-each url in url-list {
    if (nonUniqueTable.contains(url)) {
        continue;
    }
    else if (uniqueTable.contains(url)) {
        nonUniqueTable.add(url);
        uniqueTable.remove(url);
    }
    else {
        uniqueTable.add(url);
    }
}
if (uniqueTable.size() > 0)
    return uniqueTable.first();
Python based
You have a list - not sure where it's "coming" from, but if you already have it in memory then:
from itertools import groupby

L.sort()
for key, vals in groupby(L):
    if len(list(vals)) == 1:
        print(key)
Otherwise use external storage, for example sqlite:
import sqlite3

db = sqlite3.connect('somefile')
db.execute('create table whatever(key)')
Get your data into that, then execute "select key from whatever group by key having count(*) = 1".
This is actually a classic interview question, and the answer they were expecting was that you first sort the urls and then do a binary search.
If it doesn't fit in memory, you can do the same thing with a file.

Is there a Java collection whose objects are unique (as in a set) but has the ability to get the index/position of a certain object(as in a list)?

I have an ordered, unique set of objects. I am currently using a TreeSet in order to get the ordering correct. However, sets do not provide index-based access.
My current implementation is fine, but not necessarily intuitive.
TreeSet<T> treeSet = new TreeSet<T>(comparator);
// Omitted: Add items to treeSet //
int index = new ArrayList<T>(treeSet).indexOf(object);
Is there an easier way to do this?
treeSet.headSet(object).size() should do the trick:
import java.util.SortedSet;
import java.util.TreeSet;
class Test {
    public static void main(String[] args) {
        SortedSet<String> treeSet = new TreeSet<String>();
        String first = "index 0";
        String second = "index 1";
        treeSet.add(first);
        treeSet.add(second);
        int one = treeSet.headSet(second).size();
        System.out.println(one);
        // 1
    }
}
I also faced the problem of finding the element at a certain position in a TreeMap. I enhanced the tree with weights that allow accessing elements by index and finding the index of an element.
The project is called indexed-tree-map https://github.com/geniot/indexed-tree-map . The implementation for finding the index of an element, or the element at an index, in a sorted map is not based on linear iteration but on binary search within the tree. Updating the weights likewise just climbs from the node up to the root. So no linear iterations.
Java doesn't have such a thing. Here are a few suggestions for what you could do:
Leave it as it is, since this is not as bad as it could be ;)
Use an iterator to go through your elements
Write a wrapper class which extends TreeSet and adds the get functionality
Check out Guava and see if they have something like this (I haven't used it, so I can't tell, sorry!)
Create an array Object[] arrayView = mySet.toArray(); and then get the elements from that (this is kinda stupid in terms of performance and memory)

How to get vertex id by name?

If I want to get the name by vertex id I can use this function: VAS(g, "name", id)
but if I want to go the opposite way, to get the id by the name, how can I do that?
igraph doesn't provide, on its own, a means to look up vertices by name, and for good reason - mapping from name to ID is a more challenging problem than mapping from ID to name, which is a simple array lookup operation. You could iterate through all the vertices and stop at the one that matches, but this is inefficient for large graphs (O(n) in the number of vertices). A faster way is to use some sort of associative array data structure, such as the dict in #Jasc's answer, and use the names as keys and ID's as values. (You'll need to keep this index in sync with the graph if you change it.) C, on its own, or the standard C library provide no such data structure, but there are many implementations available, for instance the GHash structure found in glib.
I found the following on the igraph website or mailing list.
g = igraph.Graph(0, directed=True)
g.add_vertices(2)
g.vs[0]["name"] = "Bob"
g.vs[1]["name"] = "Bill"
# build a dict for all vertices to lookup ids by name
name2id = dict((v, k) for k, v in enumerate(g.vs["name"]))
# access specific vertices like this:
id_bob = name2id["Bob"]
print(g.vs[id_bob]["name"])

Making an index-creating class

I'm busy programming a class that creates an index for a text file (ASCII/binary).
My problem is that I don't really know how to start. I have already made some attempts, but none worked well for me.
I do NOT need to find the address of the file via the MFT. I just want to load the file and find things much faster by looking up the key in the index file and seeking to the address it gives in the text file.
The index-file should be built up as follows:
KEY ADDRESS
1 0xABCDEF
2 0xFEDCBA
. .
. .
We have a text-file with the following example value:
1, 8752 FW,
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++,
******************************************************************************,
------------------------------------------------------------------------------;
I hope that this explains my question a bit better.
Thanks!
It seems to me that all your class needs to do is store an array of pointers or file start offsets to the key locations in the file.
It really depends on what your Key locations represent.
I would suggest that you access the file through your class using some public methods. You can then more easily tie in Key locations with the data written.
For example, your Key locations may be where each new data block written into the file starts. E.g. first block 1000 bytes, key location 0; second block 2500 bytes, key location 1000; third block 550 bytes, key location 3500; the next block will be at 4050 - all assuming that 0 is the first byte.
Store the Key values in a variable length array and then you can easily retrieve the starting point for a data block.
If your Key point is signified by some key character then you can use the same class, but with a slight change to store where the Key value is stored. The simplest way is to step through the data until the key character is located, counting the number of characters checked as you go. The count is then used to produce your key location.
Your code snippet isn't so much an idea as the functionality you wish to have in the end.
Recognize that "indexing" merely means "remembering" where things are located. You can accomplish this using any data structure you wish... B-Tree, Red/Black tree, BST, or more advanced structures like suffix trees/suffix arrays.
I recommend you look into such data structures.
edit:
with the new information, I would suggest making your own key/value lookup. Build an array of keys and associate their values somehow. This may mean building a class or struct that contains both the key and the value, or one that contains the key and a pointer to a struct or class with the value, etc.
Once you have done this, sort the key array. Now, you have the ability to do a binary search on the keys to find the appropriate value for a given key.
You could build a hash table in a similar manner, or a BST or similar structure like I mentioned earlier.
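The sorted-array-plus-binary-search idea could be sketched in Python with the standard bisect module (the keys and hex addresses here are made up to match the question's example index file):

```python
import bisect

# hypothetical key -> address pairs, as in the question's index file
pairs = sorted([(2, "0xFEDCBA"), (1, "0xABCDEF"), (3, "0x123456")])
keys = [k for k, v in pairs]       # sorted keys, used for the binary search
addresses = [v for k, v in pairs]  # addresses aligned with keys

def lookup(key):
    """Binary search: return the address for key, or None if absent."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return addresses[i]
    return None

print(lookup(1))  # 0xABCDEF
```

After the one-time O(n log n) sort, each lookup is O(log n).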
I still don't really understand the question (work on your question asking skillz), but as far as I can tell the algorithm will be:
scan the file linearly; the first value, up to the first comma (','), is probably a key. All other keys occur wherever a ';' occurs, up to the next ',' (you might need to skip line breaks here). If it's a homework assignment, just use scanf() or something to read the key.
print the key and the byte position you found it at to your index file
AFAIUI that's the algorithm; I don't really see the problem here?
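Assuming the layout guessed from the sample data (each record ends with ';' and its key is the text up to the first ','), that scan might look like this in Python; the record format is an assumption, not something the question confirms:

```python
def build_index(path):
    """Scan the file once, mapping each record's key to the byte
    offset where that record starts."""
    index = {}
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while pos < len(data):
        # skip line breaks between records
        while pos < len(data) and data[pos:pos + 1] in (b"\r", b"\n"):
            pos += 1
        comma = data.find(b",", pos)
        end = data.find(b";", pos)
        if comma == -1 or end == -1:
            break  # no complete record left
        key = data[pos:comma].strip().decode("ascii")
        index[key] = pos  # byte position to seek() to later
        pos = end + 1
    return index
```

Writing the key/offset pairs out gives the index file; a later lookup can then seek() straight to the record in the text file.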