Here is what I try to achieve:
Somewhere in my Java code, a Map (possibly a HashMap) is created whose entries are not sorted:
var map = Map.of("c","third entry","a","first entry","b","second entry");
var data = Map.of("map",map);
Later, this map is passed to a StringTemplate template loaded from a file with content similar to this:
delimiters "«", "»"
template(data) ::= <<
«data.map.keys:{key|• «data.map.(key)»}; separator="\n"»
>>
However, I want the generated list to be sorted by map key, but there seems to be no function like sort(list). Is there a way to achieve this?
Desired:
«sorted(data.map.keys):{key|• «data.map.(key)»}; separator="\n"»
I searched for this functionality for some time, and I am surprised that nobody seems to have needed sorted lists before.
Of course, I could solve it by passing an extra (sorted) list of keys. But is there a straightforward approach?
best wishes :)
I have an array and a hash that I need to combine. What is the simplest way to do this?
array1 = [:user_id, :project_id, :task_id]
entry_hash = {"User"=>1, "Project"=>[8], "Task"=>[87]}
When they are combined, I want a hash like:
output = {"user_id"=>1, "project_id"=>8, "task_id"=>87}
Thanks for the help!
It's a bit unclear to me what you want to achieve here. Looking at your example, the easiest solution would be to change the keys of entry_hash using .downcase and appending _id. For each value, check whether it's an array and, if so, use its first element.
output = {}
entry_hash.each do |key, value|
output[key.downcase + '_id'] = value.kind_of?(Array) ? value[0] : value
end
This assumes, of course, that the keys in the hash are the nouns for the column names in the array. The code above will not work if the names are more complex (e.g. CamelCaseName or snake_case_id). Rails comes with ActiveSupport, which can help you there, but that is a separate question: "Converting camel case to underscore case in Ruby".
If the array and the hash don't share the same names, there is no easy way to do this automatically. Ruby hashes do preserve insertion order (since Ruby 1.9), but nothing ties that order to the order of the array, so iterating through both in parallel and pairing values by position won't work reliably.
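For comparison, here is a minimal Python translation of the same idea, assuming the array holds the target column names as strings. The helper names `to_snake` and `combine` are illustrative; the regex-based snake-casing stands in for what ActiveSupport's `underscore` does in Rails, and each derived key is checked against the array instead of relying on order.

```python
import re


def to_snake(name: str) -> str:
    """Convert CamelCase to snake_case, e.g. 'CamelCaseName' -> 'camel_case_name'."""
    # Insert '_' before every capital letter except one at the very start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def combine(columns, entry_hash):
    """Map each hash key to '<snake_key>_id', unwrapping list values to their first element."""
    output = {}
    for key, value in entry_hash.items():
        column = to_snake(key) + "_id"
        if column not in columns:
            # Names must line up; position in the array is deliberately ignored.
            raise KeyError(f"no column matches hash key {key!r}")
        output[column] = value[0] if isinstance(value, list) else value
    return output


array1 = ["user_id", "project_id", "task_id"]
entry_hash = {"User": 1, "Project": [8], "Task": [87]}
print(combine(array1, entry_hash))  # {'user_id': 1, 'project_id': 8, 'task_id': 87}
```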
I was recently asked this question in an interview. I gave an O(n) answer, but it needed two passes. The interviewer also asked how to do the same if the URL list cannot fit into memory. Any help is very much appreciated.
If it all fits in memory, then the problem is simple: Create two sets (choose your favorite data structure), both initially empty. One will contain unique URLs and the other will contain URLs that occur multiple times. Scan the URL list once. For each URL, if it exists in the unique set, remove it from the unique set and put it in the multiple set; otherwise, if it does not exist in the multiple set, add it to the unique set.
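A minimal Python sketch of the two-set approach, assuming the goal is the earliest URL that occurs exactly once. To recover "earliest", the unique set must remember insertion order, so this sketch uses a dict (insertion-ordered since Python 3.7) as an ordered set; the function name is illustrative.

```python
from typing import Iterable, Optional


def first_unique_url(urls: Iterable[str]) -> Optional[str]:
    """Return the earliest URL that occurs exactly once, scanning the input once."""
    unique = {}       # URLs seen exactly once so far; dict keys keep insertion order
    multiple = set()  # URLs seen two or more times
    for url in urls:
        if url in unique:
            # Second sighting: move it from the unique set to the multiple set.
            del unique[url]
            multiple.add(url)
        elif url not in multiple:
            # First sighting.
            unique[url] = None
        # Otherwise it is already known to repeat; nothing to do.
    # The first remaining key is the earliest URL that never repeated.
    return next(iter(unique), None)


print(first_unique_url(["a.com", "b.com", "a.com", "c.com", "b.com"]))  # c.com
```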
If the data does not fit into memory, the problem is difficult. The O(n) requirement isn't hard to meet, but the "single pass" requirement (which seems to exclude random access, among other things) is tough; I don't think it's possible without some constraints on the data. You could use the set approach with a size limit on the sets, but unfortunate orderings of the data would easily defeat it, and in any event it would only have a certain probability (<100%) of finding a unique element if one exists.
EDIT:
If you can design a set data structure that lives in mass storage (so it can be larger than memory) and supports find, insert, and delete in O(1) amortized time, then you can just use that structure with the first approach to solve the second problem. Perhaps all the interviewer was looking for was to dump the URLs into a database with a UNIQUE index on the URL and a count column.
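A minimal sketch of that database variant using Python's built-in sqlite3 module. The table and column names, the extra `first_seen` column (added so the earliest unique URL can be recovered), and the fresh-database assumption are all illustrative. Note that SQLite indexes are B-trees, so each lookup costs O(log n) rather than the O(1) mentioned above.

```python
import sqlite3
from typing import Iterable, Optional


def first_unique_url_db(urls: Iterable[str], path: str = ":memory:") -> Optional[str]:
    """Count URLs in an on-disk (or in-memory) SQLite table, then pick the earliest singleton."""
    db = sqlite3.connect(path)
    try:
        # Assumes a fresh database: the UNIQUE constraint creates the index on url,
        # hits is the count column, first_seen records the input position.
        db.execute(
            "CREATE TABLE urls ("
            " url TEXT NOT NULL UNIQUE,"
            " first_seen INTEGER NOT NULL,"
            " hits INTEGER NOT NULL DEFAULT 0)"
        )
        for pos, url in enumerate(urls):  # a single pass over the input
            # Insert the URL only if it is new, then bump its counter.
            db.execute("INSERT OR IGNORE INTO urls (url, first_seen) VALUES (?, ?)", (url, pos))
            db.execute("UPDATE urls SET hits = hits + 1 WHERE url = ?", (url,))
        db.commit()
        row = db.execute(
            "SELECT url FROM urls WHERE hits = 1 ORDER BY first_seen LIMIT 1"
        ).fetchone()
        return row[0] if row else None
    finally:
        db.close()


print(first_unique_url_db(["a.com", "b.com", "a.com", "c.com", "b.com"]))  # c.com
```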
One could try using a trie to store the data. A compressed trie stores common URL parts (such as a shared scheme and host prefix) only once, so it takes less memory.
The loop would look like:
add string s to the trie;
check whether the added string ends at an already existing node:
    internal node -> compress path
    leaf node -> delete path
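The steps above are terse, so here is a minimal Python sketch of the underlying idea. It is a simplification, not the answer's exact algorithm: it uses a plain (uncompressed) trie with a per-node counter instead of compressing or deleting paths, so a URL seen a third time is still recognised as a duplicate. The class and method names are illustrative.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.count = 0      # how many inserted strings end exactly at this node


class UrlTrie:
    """Plain (uncompressed) trie that counts how often each full string was added."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, s: str) -> int:
        """Insert s and return how many times s has now been seen (1 means new)."""
        node = self.root
        for ch in s:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.count += 1
        return node.count

    def count(self, s: str) -> int:
        """Return how many times s was added (0 if never, including bare prefixes)."""
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count


trie = UrlTrie()
for url in ["http://a.com/x", "http://a.com/y", "http://a.com/x"]:
    if trie.add(url) == 2:
        print("duplicate:", url)  # duplicate: http://a.com/x
```

Shared prefixes such as "http://a.com/" are stored once; a compressed (radix) trie would additionally merge single-child chains into one node.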
For the "fits-in-memory" case, you could use two hash-tables as follows (pseudocode):
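hash-table uniqueTable = <initialization>;     // must keep insertion order for first() to mean "earliest"
hash-table nonUniqueTable = <initialization>;
for-each url in url-list {
    if (nonUniqueTable.contains(url)) {
        continue;                              // already known to repeat
    }
    else if (uniqueTable.contains(url)) {
        nonUniqueTable.add(url);               // second sighting: move it over
        uniqueTable.remove(url);
    }
    else {
        uniqueTable.add(url);                  // first sighting
    }
}
if (uniqueTable.size() > 0)
    return uniqueTable.first();
return null;                                   // no URL occurs exactly once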
Python based
You have a list - not sure where it's "coming" from, but if you already have it in memory then:
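from itertools import groupby

L.sort()  # groupby only groups adjacent equal items, so sort first
for key, group in groupby(L):
    if len(list(group)) == 1:  # group is an iterator; materialize it to count
        print(key)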
Otherwise, use on-disk storage, for example SQLite:
import sqlite3
db = sqlite3.connect('somefile')
db.execute('create table whatever(key)')
Get your data into that (for example with db.executemany), then execute "select key from whatever group by key having count(*) = 1".
This is actually a classic interview question, and the answer they were expecting was that you first sort the URLs and then do a binary search.
If it doesn't fit in memory, you can do the same thing with a file, using an external (merge) sort.
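A minimal Python sketch of the file-based variant, assuming "the same thing with a file" means an external merge sort: sort memory-sized chunks into temporary files, merge them with heapq.merge, and then scan the merged stream. After sorting, duplicates are neighbours, so this sketch uses a linear scan over adjacent entries rather than a binary search. It reports every URL that occurs once, in sorted order, not the first one in input order; `chunk_size` and the function names are illustrative.

```python
import heapq
import os
import tempfile
from itertools import groupby, islice
from typing import Iterable, Iterator, List


def _spill_sorted_run(chunk: List[str]) -> str:
    """Sort one memory-sized chunk and write it to a temporary file; return its path."""
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        for url in sorted(chunk):
            f.write(url + "\n")  # assumes URLs contain no newline characters
    return path


def unique_urls_external(urls: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Yield URLs that occur exactly once, in sorted order, via an external merge sort."""
    source = iter(urls)
    runs = []
    # Phase 1: split the input into sorted runs that each fit in memory.
    while True:
        chunk = list(islice(source, chunk_size))
        if not chunk:
            break
        runs.append(_spill_sorted_run(chunk))
    files = [open(path) for path in runs]
    try:
        # Phase 2: k-way merge of the runs; equal URLs end up adjacent.
        merged = heapq.merge(*((line.rstrip("\n") for line in f) for f in files))
        # Phase 3: one scan over groups of adjacent equal entries.
        for url, group in groupby(merged):
            if sum(1 for _ in group) == 1:
                yield url
    finally:
        for f in files:
            f.close()
        for path in runs:
            os.remove(path)


print(list(unique_urls_external(["b", "a", "c", "a", "d", "b"], chunk_size=2)))  # ['c', 'd']
```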
If I want to get a vertex's name from its ID, I can use this function: VAS(g, "name", id)
But how can I do the opposite, i.e. get the ID from the name?
igraph doesn't, on its own, provide a means to look up vertices by name, and for good reason: mapping from name to ID is a harder problem than mapping from ID to name, which is a simple array lookup. You could iterate through all the vertices and stop at the one that matches, but this is inefficient for large graphs (O(n) in the number of vertices). A faster way is to use an associative array, such as the dict in Jasc's answer, with names as keys and IDs as values. (You'll need to keep this index in sync with the graph if you change it.) Neither C itself nor the standard C library provides such a data structure, but many implementations are available, for instance the GHashTable structure in GLib.
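A minimal Python sketch of keeping such an index in sync, independent of igraph itself. It assumes vertex IDs are positions 0..n-1 that shift down when a vertex is deleted (igraph keeps vertex IDs contiguous, so deletions renumber later vertices). The class `NameIndex` is hypothetical and not part of any igraph API; with a real graph you would call its methods alongside the graph's own add/delete calls.

```python
class NameIndex:
    """Hypothetical helper: vertex names by ID (list position) plus a name -> ID dict."""

    def __init__(self, names=()):
        self.names = []  # names[vid] is the name of vertex vid
        self.ids = {}    # name -> vid, the O(1) average reverse lookup
        for name in names:
            self.add(name)

    def add(self, name):
        """Register a new vertex at the end (the next free ID) and return its ID."""
        if name in self.ids:
            raise ValueError(f"duplicate vertex name {name!r}")
        self.ids[name] = len(self.names)
        self.names.append(name)
        return self.ids[name]

    def remove(self, name):
        """Remove a vertex; later IDs shift down by one, so their entries are rebuilt."""
        vid = self.ids.pop(name)
        del self.names[vid]
        for i in range(vid, len(self.names)):
            self.ids[self.names[i]] = i

    def id_of(self, name):
        return self.ids[name]


idx = NameIndex(["Bob", "Bill", "Ann"])
idx.remove("Bob")
print(idx.id_of("Ann"))  # 1
```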
I found the following on the igraph website or mailing list (it uses the Python interface):
import igraph

g = igraph.Graph(0, directed=True)
g.add_vertices(2)
g.vs[0]["name"] = "Bob"
g.vs[1]["name"] = "Bill"
# build a dict for all vertices to look up IDs by name
name2id = {name: vid for vid, name in enumerate(g.vs["name"])}
# access specific vertices like this:
id_bob = name2id["Bob"]
print(g.vs[id_bob]["name"])
Assuming I have lines of data like the following that show user names and their favorite fruits:
Alice\tApple
Bob\tApple
Charlie\tGuava
Alice\tOrange
I'd like to create a Pig query that shows the favorite fruit of each user. If a user appears multiple times, then I'd like to show "Multiple". For example, the result for the data above should be:
Alice\tMultiple
Bob\tApple
Charlie\tGuava
In SQL, this could be done something like this (although it wouldn't necessarily perform very well):
select user, case when count(fruit) > 1 then 'Multiple' else max(fruit) end
from FruitPreferences
group by user
But I can't figure out the equivalent in Pig Latin. Any ideas?
Write an "Aggregate Function" Pig UDF (see the "Aggregate Functions" section of the Pig UDF documentation). This is a user-defined function that takes a bag and outputs a scalar. Your UDF would take in the bag, check whether it contains more than one item, and return either that item or "Multiple" accordingly.
I can think of a way of doing this without a UDF, but it is definitely awkward. After your GROUP, use SPLIT to split your data set into two: one in which the count is 1 and one in which the count is more than one:
SPLIT grouped INTO one IF COUNT(fruit) == 1, more IF COUNT(fruit) > 1;
Then, separately use FOREACH ... GENERATE on each to transform it:
one = FOREACH one GENERATE name, MAX(fruit); -- hack using MAX to get the item
more = FOREACH more GENERATE name, 'Multiple';
Finally, union them back:
out = UNION one, more;
I haven't really found a better way of handling the same data set in two different ways based on a condition, like you want. I typically do some sort of split/recombine like I did here. I believe Pig will be smart and make a plan that doesn't use more than one MapReduce job.
Disclaimer: I can't actually test this code at the moment, so it may have some mistakes.
Update:
Looking harder, I was reminded of the bincond operator (? :), and I think it will work here:
b = FOREACH a GENERATE name, (COUNT(fruit) == 1 ? MAX(fruit) : 'Multiple');
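To sanity-check the expected result, here is the same group-then-choose logic as a small Python sketch (not Pig): it groups the tab-separated lines by user and applies the equivalent of COUNT(...) == 1 ? MAX(...) : 'Multiple'. The function name is illustrative.

```python
from collections import defaultdict


def favorite_fruits(lines):
    """Return {user: fruit}, or {user: 'Multiple'} when a user appears more than once."""
    fruits_by_user = defaultdict(list)  # plays the role of GROUP ... BY user
    for line in lines:
        user, fruit = line.rstrip("\n").split("\t")
        fruits_by_user[user].append(fruit)
    # Equivalent of the bincond: one fruit -> that fruit, otherwise 'Multiple'.
    return {
        user: (max(fruits) if len(fruits) == 1 else "Multiple")
        for user, fruits in fruits_by_user.items()
    }


data = ["Alice\tApple", "Bob\tApple", "Charlie\tGuava", "Alice\tOrange"]
print(favorite_fruits(data))  # {'Alice': 'Multiple', 'Bob': 'Apple', 'Charlie': 'Guava'}
```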
I have hierarchical data, so I store it in a tree, and I want to provide a search function for it. Do I have to create a binary tree for that? I don't want to store thousands of nodes twice. Is there a kind of tree that both keeps the data in the order given and provides binary-tree-like efficient search, with little overhead?
Any other data structure suggestion will also be appreciated.
Thanks.
EDIT:
Some details: the tree is a very basic, hand-made tree. There are thousands of names and other text entries that I want to search, but I don't want to traverse the nodes one by one; I need a fast search like binary search.
Also, importantly, the user must be able to see the structure he has entered and NOT the sorted one, so I can't keep it sorted to support the search. That is why I said I don't want to store thousands of nodes twice.
If you don't want to change your tree's hierarchy, use a map that stores pointers to vertices: std::map<SearchKeyType, Vertex*> M.
Every time you add a vertex to your tree, add it to the map too, which is easy: M[key] = &vertex. To find an element, use M.find(key), or M[key] if you are sure the key exists (operator[] inserts a null pointer for a missing key). The map holds only pointers, so the vertices themselves are not duplicated.
If your tree has duplicate keys, use a std::multimap instead.
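The same idea as a minimal Python sketch: the hierarchy keeps children in the order they were entered, and a separate dict maps each key to the nodes carrying it (a list per key plays the role of std::multimap). Note the swap: a Python dict is a hash table with average O(1) lookup, whereas std::map is an ordered tree with O(log n) lookup that also supports range queries. The class names are illustrative.

```python
from collections import defaultdict


class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.children = []  # kept in the order the user entered them


class IndexedTree:
    """Hierarchy in entry order plus a key -> [nodes] index that stores references only."""

    def __init__(self, root_key):
        self.root = Node(root_key)
        self.index = defaultdict(list)  # list values allow duplicate keys
        self.index[root_key].append(self.root)

    def add(self, parent, key):
        node = Node(key, parent)
        parent.children.append(node)
        self.index[key].append(node)  # keep the index in sync on every insert
        return node

    def find(self, key):
        """Return all nodes with this key (empty list if none), without walking the tree."""
        return self.index.get(key, [])


tree = IndexedTree("root")
fruits = tree.add(tree.root, "fruits")
tree.add(fruits, "apple")
print([n.parent.key for n in tree.find("apple")])  # ['fruits']
```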
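Edit: If your keys are large, you can store pointers to keys instead of the keys themselves. The comparator must be a type (not a plain function) and must compare the pointed-to keys rather than the addresses; the keys must also outlive their map entries and must not change while stored:
struct KeyPtrLess {
    bool operator()(const SearchKeyType* a, const SearchKeyType* b) const {
        return *a < *b;   // compare the keys, not the pointer values
    }
};
std::map<const SearchKeyType*, Vertex*, KeyPtrLess> M;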
To search for an element with value V:
auto it = M.find(&V);   // KeyPtrLess compares *(&V), i.e. by value
Vertex* v = (it != M.end()) ? it->second : nullptr;