If I want to get the name by vertex id, I can use this function: VAS(g, "name", id)
but if I want to go the opposite way, to get the id by the name, how can I do that?
igraph doesn't provide, on its own, a means to look up vertices by name, and for good reason: mapping from name to ID is a harder problem than mapping from ID to name, which is a simple array lookup. You could iterate through all the vertices and stop at the one that matches, but this is inefficient for large graphs (O(n) in the number of vertices). A faster way is to use some sort of associative array data structure, such as the dict in @Jasc's answer, with the names as keys and the IDs as values. (You'll need to keep this index in sync with the graph if you change it.) Neither C itself nor the standard C library provides such a data structure, but there are many implementations available, for instance the GHash structure found in glib.
I found the following on the igraph website or mailing list.
import igraph

g = igraph.Graph(0, directed=True)
g.add_vertices(2)
g.vs[0]["name"] = "Bob"
g.vs[1]["name"] = "Bill"
# build a dict for all vertices to lookup ids by name
name2id = dict((v, k) for k, v in enumerate(g.vs["name"]))
# access specific vertices like this:
id_bob = name2id["Bob"]
print(g.vs[id_bob]["name"])
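As an aside (an addition of mine, so check it against your python-igraph version): the VertexSeq.find method can look a vertex up by its "name" attribute directly, which saves maintaining the dict by hand:

# assumes a reasonably recent python-igraph
bob = g.vs.find(name="Bob")
print(bob.index)  # the vertex id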
Here is what I am trying to achieve:
In my Java code, a Map (possibly a HashMap) is created at some point, and it may not be sorted:
var map = Map.of("c","third entry","a","first entry","b","second entry");
var data = Map.of("map",map);
Later this map is passed to a StringTemplate loaded from a file with content similar to this:
delimiters "«", "»"
template(data) ::= <<
«data.map.keys:{key|• «data.map.(key)»}; separator="\n"»
>>
However, I want the generated list to be sorted by map key, but there seems to be no function like sort(list). Can I achieve what I want?
desired:
«sorted(data.map.keys):{key|• «data.map.(key)»}; separator="\n"»
I googled for this functionality for some time, and I really wonder whether nobody has wanted sorted lists before.
Of course, I could solve it by passing an extra (sorted) list of keys. But is there a straightforward approach?
best wishes :)
I have multiple indices in Elasticsearch (and the corresponding documents in Django created using django-elasticsearch-dsl). All of the indices have these settings:
settings = {'number_of_shards': 1,
            'number_of_replicas': 0}
Now, I am trying to perform a search across all 10 indices. In order to get consistent scoring between the results from different indices, I am using dfs_query_then_fetch:
from elasticsearch_dsl import Search

search = Search(index=['mov*'])
search = search.params(search_type='dfs_query_then_fetch')
objects = search.query("multi_match", query='Tom & Jerry', fields=['title', 'actors'])
I get bad results due to inconsistent scoring. A book called 'A story of Jerry and his friend Tom' from one index can be ranked higher than the cartoon 'Tom & Jerry' from another index. The reason is that dfs_query_then_fetch is not working. When I remove it or substitute with the simple query_then_fetch, I get absolutely the same results with the identical scoring.
I have tested it on URI requests as well, and I always get the same scores for both search types.
What can be the reason for it?
UPDATE: The results are actually not the same, but they differ only very slightly, e.g. a score of 50.1 with dfs and 50.0 without it, while the same model within one index has a score of 80.0.
If the number of shards is 1, then dfs_query_then_fetch and query_then_fetch will return the same result. A DFS query first queries all shards for their term statistics and then scores the results based on the combined numbers, but in this case there is only one shard.
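One quick sanity check (a sketch on my side, assuming the plain elasticsearch-py client is available alongside elasticsearch-dsl) is to look at how many shards the query actually spans, since DFS can only change scores when there is more than one:

from elasticsearch import Elasticsearch

es = Elasticsearch()
resp = es.search(index='mov*', body={'query': {'match_all': {}}})
print(resp['_shards']['total'])  # dfs only matters when this is > 1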
Regarding the scoring, you might want to have a look at your actors field too. Also, do let us know what analyzer and tokenizer you used, if they are custom ones.
I'm trying to come up with an elegant solution for representing place/transition Petri nets.
So far I save them as follows:
{:netname {:places      {:name tokens, ...}
           :transitions #{:t1, :t2, :t3, ...}
           :edges_in    #{[:from :to tokens], ...}
           :edges_out   #{[:from :to tokens], ...}}}
tokens is a number; everything else is a keyword carrying the corresponding name.
//edit - Some more clarification:
The :netname and :name are unique, because it has to be possible to merge two nets, where the places again have to have unique names. The numerical tokens are determined by the user of the Petri nets during creation of a place or edge.
I would be thankful for some pointers or links to a more elaborate / better data structure for my problem.
//edit 2 - I reworked my first take on the data structure because of the uniqueness of place names. :places now references a hash map. Also :edges_in and :edges_out are now hash sets, because every edge is unique in its origin, destination and token number.
//edit 3 - The use of the structure: it is read and written to in roughly equal measure, I would say. The way a Petri net is used, there is a back and forth between modifying the net and reading it, with maybe slightly more reading towards the end.
I also modified my structure above slightly, so :edges_in and :edges_out now save the triplets as vectors instead of lists. This simplifies saving the hash map to a file and reading it back, because load-string evaluates lists as expressions.
You could look at the ISO 15909 interchange format for high-level Petri nets (HLPNs), called PNML. This would at least provide you with a basis for a standard interface to your data structures.
Recently I was asked this question in an interview: given a list of URLs, find a URL that occurs exactly once. I gave an answer in O(n) time, but in two passes. He also asked me how to do the same if the URL list cannot fit into memory. Any help is very much appreciated.
If it all fits in memory, then the problem is simple: Create two sets (choose your favorite data structure), both initially empty. One will contain unique URLs and the other will contain URLs that occur multiple times. Scan the URL list once. For each URL, if it exists in the unique set, remove it from the unique set and put it in the multiple set; otherwise, if it does not exist in the multiple set, add it to the unique set.
If the set does not fit into memory, the problem is difficult. The requirement of O(n) isn't hard to meet, but the requirement of a "single pass" (which seems to exclude random access, among other things) is tough; I don't think it's possible without some constraints on the data. You can use the set approach with a size limit on the sets, but this would be easily defeated by unfortunate orderings of the data and would in any event only have a certain probability (<100%) of finding a unique element if one exists.
EDIT:
If you can design a set data structure that lives in mass storage (so it can be larger than would fit in memory) and can do find, insert, and delete in O(1) (amortized) time, then you can just use that structure with the first approach to solve the second problem. Perhaps all the interviewer was looking for was to dump the URLs into a database with a UNIQUE index for URLs and a count column.
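For instance, a sketch of that database idea with sqlite3 (my illustration; the table and column names are made up, and the upsert syntax assumes SQLite 3.24+):

import sqlite3

db = sqlite3.connect(':memory:')
db.execute('create table urls (url text primary key, cnt integer)')
for url in ['a.com', 'b.com', 'a.com']:
    # the UNIQUE (primary key) index turns the insert into an upsert
    db.execute('insert into urls values (?, 1) '
               'on conflict(url) do update set cnt = cnt + 1', (url,))
print(db.execute('select url from urls where cnt = 1').fetchall())  # [('b.com',)]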
One could try to use a trie structure for keeping the data. It's compressed, so it would take less memory, since common URL prefixes share nodes.
The loop would look like:
add string s to the trie;
check whether the added string ends at an already existing node:
internal node -> compress the path
leaf node -> delete the path
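Since the pseudocode above is terse, here is a minimal counting variant of the trie idea in Python (my own sketch, not necessarily the answerer's exact scheme): count how many URLs end at each node, then walk the trie for nodes with a count of 1.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # number of URLs ending exactly at this node

def add(root, url):
    node = root
    for ch in url:
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1

def uniques(node, prefix=""):
    if node.count == 1:
        yield prefix
    for ch, child in node.children.items():
        yield from uniques(child, prefix + ch)

root = TrieNode()
for url in ["a.com", "b.com", "a.com"]:
    add(root, url)
print(list(uniques(root)))  # ['b.com']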
For the "fits-in-memory" case, you could use two hash-tables as follows (pseudocode):
hash-table uniqueTable = <initialization>;
hash-table nonUniqueTable = <initialization>;
for-each url in url-list {
if (nonUniqueTable.contains(url)) {
continue;
}
else if (uniqueTable.contains(url)) {
nonUniqueTable.add(url);
uniqueTable.remove(url);
}
else {
uniqueTable.add(url)
}
}
if (uniqueTable.size() > 1)
return uniqueTable.first();
Python based
You have a list - not sure where it's coming from, but if you already have it in memory, then:
from itertools import groupby

L.sort()
for key, vals in groupby(L):
    if len(list(vals)) == 1:
        print(key)
Otherwise use on-disk storage, for example sqlite3:
import sqlite3
db = sqlite3.connect('somefile')
db.execute('create table whatever(key)')
Get your data into that, then execute "select key from whatever group by key having count(*) = 1".
This is actually a classic interview question, and the answer they were expecting was that you first sort the URLs and then make a binary search.
If it doesn't fit in memory, you can do the same thing with a file.
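A sketch of that sort-based idea (one caveat: once the list is sorted, duplicates sit next to each other, so a plain linear scan of neighbors is enough rather than a binary search; for the on-disk case the sort would be an external sort):

def first_unique(urls):
    urls = sorted(urls)  # duplicates become adjacent
    for i, u in enumerate(urls):
        prev_same = i > 0 and urls[i - 1] == u
        next_same = i < len(urls) - 1 and urls[i + 1] == u
        if not (prev_same or next_same):
            return u
    return None

print(first_unique(['a.com', 'b.com', 'a.com']))  # b.com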
I have data that is hierarchical, so I store it in a tree. I want to provide a search function for it. Do I have to create a binary tree for that? I don't want to have thousands of nodes twice. Isn't there a kind of tree that allows me to both store the data in the order given and also provide binary-tree-like efficient searching, with little overhead?
Any other data structure suggestion will also be appreciated.
Thanks.
EDIT:
Some details: the tree is a very simple "hand-made" tree that can be considered very, very basic. The thing is, there are thousands of names and other text entries that will be entered as data that I want to search, but I don't want to traverse the nodes in the traditional way; I need a fast search like binary search.
Also, importantly, the user must be able to see the structure he has entered and NOT the sorted one. So I can't keep it sorted to support the search. That is why I said I don't want to have thousands of nodes twice.
If you don't want to change your tree's hierarchy, use a map to store pointers to the vertices: std::map<SearchKeyType, Vertex*> M.
Every time you add a vertex to your tree, you need to add it to your map too. It's very easy: M[key] = &vertex;. To find an element, use M.find(key);, or M[key]; if you are sure that the key exists.
If your tree has duplicate keys, then you should use a multimap.
Edit: If your key's size is too big, then you can use a pointer to the key instead of the key itself; the map then needs a comparator type that compares the pointed-to keys:
struct CompareByPointedKey {
    // compare the keys the pointers point to, not the pointer values
    bool operator()(const SearchKeyType* arg1, const SearchKeyType* arg2) const {
        return (*arg1) < (*arg2);
    }
};
std::map<SearchKeyType*, Vertex*, CompareByPointedKey> M;
To search for the element with value V, you would write the following:
Vertex* v = M[&V]; // assuming that element V exists in M