I have tried to set up a graph-tool BFS search on an Ubuntu system. Due to a bug I have been forced to use graph_tool.search.bfs_search() instead of graph_tool.search.bfs_iterator().
So I set up a minimal class, as shown in the examples, that inherits from gt.BFSVisitor.
The goal of this is to track all of the edge source nodes and edge action values pairs reachable from a specific node in a graph and store them in a numpy array, such that the array will have the following dimensions: (num_nodes, num_actions) where num_actions is equal to the maximum out-degree in the graph.
The function does its job, but accessing the edge PropertyMap self.edge_action[edge] to retrieve the edge action is a huge bottleneck and significantly slows down my code. Since I turned to graph-tool for speed in the first place, I am a little bit stuck right now.
Am I missing something about the graph-tool library, or is there no way to speed this up? Otherwise I might as well go back to networkx and try to find the fastest way there.
I simply cannot think of a way to avoid this Python loop over edge actions so that I can use the C++ power of graph-tool.
Here is my simple class:
class SetIterator(gt.BFSVisitor):

    def __init__(self, action, safe_set):
        """
        action: gt.PropertyMap
            edge property representing edge action
        safe_set: np.array
            array used to store safe node, action pairs
        """
        self.edge_action = action
        self.ea = self.edge_action
        self.safe_set = safe_set

    def discover_vertex(self, u):
        """
        Invoked on first encounter of vertex.

        Parameters
        ----------
        u: gt.Vertex
        """
        self.safe_set[int(u), 0] = True

    def examine_edge(self, e):
        """
        Called when edge is checked.

        Parameters
        ----------
        e: gt.Edge
        """
        # TODO This one line is the bottleneck of the code
        self.safe_set[int(e.source()), self.ea[e]] = True
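One way to sidestep the per-edge Python callback, if all that is needed is the set of (source, action) pairs reachable from one vertex, is to let graph-tool do the traversal in C++ and vectorize the bookkeeping with NumPy afterwards. The following is only a hedged sketch, not a verified replacement: it assumes a reasonably recent graph-tool where Graph.get_edges() accepts a list of edge properties, that label_out_component marks the same reachable vertices a BFS would, and that the action values are small integers usable as column indices (the helper name and signature are invented here).

# Hedged sketch: vectorized alternative to the examine_edge callback.
import numpy as np
import graph_tool.all as gt

def reachable_safe_set(g, edge_action, source, num_actions):
    safe_set = np.zeros((g.num_vertices(), num_actions), dtype=bool)

    # Mark every vertex reachable from `source` (runs in C++).
    reachable = gt.label_out_component(g, g.vertex(source)).a.astype(bool)
    safe_set[reachable, 0] = True

    # Pull all edges plus the action property into one ndarray:
    # columns are [source, target, action].
    edges = g.get_edges([edge_action])
    src = edges[:, 0].astype(int)
    act = edges[:, 2].astype(int)

    # Keep only edges whose source vertex is reachable, then set the pairs.
    mask = reachable[src]
    safe_set[src[mask], act[mask]] = True
    return safe_set

This touches every edge of the graph once in NumPy instead of only the reachable ones in Python, which is usually still much faster when the per-edge Python cost dominates.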
I'm having problems with inserting data into Neptune using Gremlin.
I am trying to insert many nodes and edges, potentially hundreds of thousands of them, while checking for existence.
Currently, we are using inject to insert the nodes, and the problem is that it is slow.
After running the explain command, we figured out that the problem was the coalesce and the where steps - it takes more than 99.9% of the run duration.
I want to insert each node and edge only if it doesn’t exist, and that’s why I am using the coalesce and where steps.
For example, the query we use to insert nodes with inject:
properties_list = [{'uid': '1642'}, {'uid': '1322'}, …]
g.inject(properties_list).unfold().as_('node')
.sideEffect(__.V().where(P.eq('node')).by('uid').fold()
.coalesce(__.unfold(), __.addV(label).property(Cardinality.single, 'uid', '1')))
With 1000 nodes in the graph and properties_list with 100 elements, running the query above takes around 30 seconds, and it gets slower as the number of nodes in the graph increases.
Running a naive injection with the same environment as the query above, without coalesce and where, takes less than 1 second.
I'd like to hear your suggestions and to know the best practices for inserting many nodes and edges (with existence checks).
Thank you very much.
If you have a set of IDs that you want to check for existence, you can speed up the query significantly by also passing just a list of IDs to the query and calculating the intersection of the ones that exist up front. Then, having calculated the set that needs updates, you can apply them in one go. This will make a big difference. The reason you are running into problems is that the mid-traversal V has a lot of work to do. In general it would be better to use actual vertex IDs rather than properties (uid in your case). If that is not an option, the same technique will work for property-based IDs. The steps are:
1. Using inject or sideEffect, insert the IDs to be found as one list and the corresponding map containing the changes to be conditionally applied as a separate map.
2. Find the intersection of the ones that exist and those that do not.
3. Using the set of non-existing ones, apply the updates, using the values in the set to index into your map.
Here is a concrete example. I used the graph-notebook for this but you can do the same thing in code:
Given:
ids = "['1','2','9998','9999']"
and
data = "[['id':'1','value':'XYZ'],['id':'9998','value':'ABC'],['id':'9999','value':'DEF']]"
we can do something like this:
g.V().hasId(${ids}).id().fold().as('exist').
constant(${data}).
unfold().as('d').
where(without('exist')).by('id').by()
which correctly finds the ones that do not already exist:
{'id': 9998, 'value': 'ABC'}
{'id': 9999, 'value': 'DEF'}
You can use this pattern to construct your conditional inserts a lot more efficiently (I hope :-) ). So to add the new vertices you might do:
g.V().hasId(${ids}).id().fold().as('exist').
constant(${data}).
unfold().as('d').
where(without('exist')).by('id').by().
addV('test').
property(id,select('d').select('id')).
property('value',select('d').select('value'))
v[9998]
v[9999]
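If you are driving Neptune from Python with gremlinpython rather than from the graph-notebook, the same "find what exists first, then insert only the rest" idea could be sketched roughly as below. This is my own two-round-trip variant using the uid property from the question; the endpoint URL, the 'test' label, and the batching are placeholders, not part of the pattern above.

# Hedged sketch (gremlinpython): check existing uids first, then insert the rest.
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import P

conn = DriverRemoteConnection('wss://<your-neptune-endpoint>:8182/gremlin', 'g')
g = traversal().withRemote(conn)

properties_list = [{'uid': '1642'}, {'uid': '1322'}]
uids = [p['uid'] for p in properties_list]

# Round trip 1: which of these uids already exist?
existing = set(g.V().has('uid', P.within(*uids)).values('uid').toList())

# Round trip 2: add only the missing vertices.
for p in properties_list:
    if p['uid'] not in existing:
        g.addV('test').property('uid', p['uid']).iterate()

conn.close()

The first traversal replaces the per-element mid-traversal V().where() work that the explain output flagged, at the cost of one extra round trip.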
As a side note, we are adding two new steps to Gremlin - mergeV and mergeE that will allow this to be done much more easily and in a more declarative style. Those new steps should be part of the TinkerPop 3.6 release.
Hi everyone!
My problem is about generating optical flow. I have two raw images and optical flow data as ground truth; my algorithm generates optical flow from the raw images, and the Euclidean distance between the generated optical flow and the ground truth can be defined as a loss value, so backpropagation can be used to update the parameters.
I take it as a regression problem, and I have two ideas now:
I can set every parameter to requires_grad=True and compute a loss, then call loss.backward() to obtain the gradients, but I don't know how to add these parameters to an optimizer so that they get updated.
I can write my algorithm as a model. If I design a "custom" model, I can initialize several layers such as nn.Conv2d() and nn.Linear() in __init__() and update their parameters with something like torch.optim.Adam(model.parameters()), but if I define new layers by myself, how should I add this layer's parameters to the collection of parameters being updated?
This problem has confused me for several days. Are there any good methods to update user-defined parameters? I would be very grateful if you could give me some advice!
Tensor values have their gradients calculated if they:
1. have requires_grad == True, and
2. are used to compute some value (usually a loss) on which you call .backward().
The gradients will then be accumulated in their .grad attribute. You can use them manually to perform arbitrary computation (including optimization). The predefined optimizers accept an iterable of parameters, and model.parameters() does just that: it returns an iterable of parameters. If you have some custom "free-floating" parameters you can pass them as
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(my_params)
and you can also merge them with the other parameter iterables like below:
model_params = list(model.parameters())
my_params = [my_param_1, my_param_2]
optim = torch.optim.Adam(model_params + my_params)
In practice, however, you can usually structure your code to avoid that. There is the nn.Parameter class, which wraps tensors. All subclasses of nn.Module have their __setattr__ overridden so that whenever you assign an instance of nn.Parameter as one of their attributes, it becomes part of the Module's .parameters() iterable. In other words
class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.my_param_1 = nn.Parameter(torch.tensor(...))
        self.my_param_2 = nn.Parameter(torch.tensor(...))
will allow you to write
module = MyModule()
optim = torch.optim.Adam(module.parameters())
and have the optim update module.my_param_1 and module.my_param_2. This is the preferred way to go, since it helps keep your code more structured:
1. You won't have to manually include all your parameters when creating the optimizer.
2. You can call module.zero_grad() and zero out the gradient on all of its child nn.Parameters.
3. You can call methods such as module.cuda() or module.double(), which, again, work on all child nn.Parameters instead of requiring you to manually iterate through them.
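To make this concrete, here is a small self-contained sketch (the ScaleBias module, its parameter shapes, and the random training data are invented for illustration) showing custom nn.Parameters being picked up by module.parameters() and updated by the optimizer:

import torch
import torch.nn as nn

class ScaleBias(nn.Module):
    """Toy module: y = x * scale + bias, with both as free parameters."""
    def __init__(self, num_features):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        return x * self.scale + self.bias

module = ScaleBias(num_features=4)
optim = torch.optim.Adam(module.parameters(), lr=1e-2)

x = torch.randn(8, 4)
target = torch.randn(8, 4)

for _ in range(100):
    optim.zero_grad()              # clear accumulated gradients
    loss = ((module(x) - target) ** 2).mean()
    loss.backward()                # populates .grad on scale and bias
    optim.step()                   # updates both custom parameters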
I'm using the getClosest command to find a vertex.
ForceVertex1 = hatInstance.vertices.getClosest(coordinates=((x, y, z),))
This returns a dictionary object with key 0, whose value is a tuple of two elements (hatInstance.vertices[1] and the coordinates of the vertex). The specific output:
{0: (mdb.models['EXP-100'].rootAssembly.instances['hatInstance-100'].vertices[1], (62.5242172081597, 101.192447407436, 325.0))}
Whenever I try to create a set, the vertex isn't accepted
mainAssembly.Set(vertices=ForceVertex1[0][0],name='LoadSet1')
I also tried a different way:
tolerance = 1.0e-3
vertex = []
for vertex in hatInstance.vertices:
    x = vertex.pointOn[0][0]
    print x
    y = vertex.pointOn[0][1]
    print y
    z = vertex.pointOn[0][2]
    print z
    break
    if abs(x - xTarget) < tolerance and abs(y - yTarget) < tolerance and abs(z - zTarget) < tolerance:
        vertex.append(hatInstance.vertices[vertex.index:vertex.index + 1])
xTarget etc. being my target coordinates. Despite this, I still don't get a vertex object.
For those struggling with this, I solved it.
Don't use the getClosest command, as it returns a dictionary object, despite the manual recommending it. I couldn't convert this dictionary object, specifically a key and the value within it, to a standalone vertex object.
Instead use Instance.vertices.getByBoundingSphere(center=, radius=)
The center is basically a tuple of the coordinates and the radius is the tolerance. This returns an array of vertices.
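As a minimal sketch of that suggestion, reusing hatInstance, mainAssembly, and the target coordinates from the question (the radius here is just an example tolerance, not a required value):

# Hypothetical usage of getByBoundingSphere based on the suggestion above
verts = hatInstance.vertices.getByBoundingSphere(center=(x, y, z), radius=1.0e-3)
mainAssembly.Set(vertices=verts, name='LoadSet1')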
If you want the geometrical object you just have to access the dictionary.
One way to do it is:
ForceVertex1 = hatInstance.vertices.getClosest(coordinates=((x, y, z),))[0][0]
This will return just the vertex object, which you can assign to a set or whatever.
Edit: Found a solution to actually address the original question:
part=mdb.models[modelName].parts[partName]
v=part.vertices.getClosest(coordinates=(((x,y,z)),))
Note the formatting requirement for coordinates, ((( )),), three sets of parentheses with a comma. This will find the vertex closest to the specified point. In order to use this to create a set, I found you need to massage the Abaqus Python interface to return the vertex in a format that uses their "getSequenceFromMask" method. In order to create a set, the edges, faces, and/or vertices need to be of type "Sequence", which is internal to Abaqus. To do this, I then use the following code:
v2=part.vertices.findAt((((v[0][1])),))
part.Set(name='setName', vertices=v2)
Note, v[0][1] will give you the point at which the vertex lies. Note again the format of the specified point in the findAt method, (((point)),), with three sets of parentheses and a comma. This will return a vertex that uses the getSequenceFromMask method in Abaqus (you can check by typing v2 and then Enter in the Python box at the bottom of CAE; this works with Abaqus 2020). This is of type "Sequence" (you can check by typing type(v2)) and can be used to create a set.
If you do not format the point in findAt correctly (e.g., findAt(v[0][1]), without the parentheses and comma), it will return an identical vertex to the one you get by accessing the dictionary returned by getClosest (e.g., v[0][0]). This is of type 'Vertex' and cannot be used to create a set, even though the keyword asks for a vertex.
If you know the exact point where the vertex is, then you do not need the first step; you can simply use the findAt method with the correct formatting. However, the tolerance for findAt is very small (1e-6) and it will return an empty sequence if nothing is found within the tolerance. If you only have a ballpark idea of where the vertex is located, then you need to use the getClosest method first. This indeed gets the closest vertex to the specified point, which may or may not be the one you are interested in.
Original post:
None of these answers work for a similar problem I am having while trying to create a set of faces within some range near a point. If I use getClosest as follows
f=mdb.models['Model-1'].parts['Part-1'].faces.getClosest(coordinates=((0,0,0),), searchTolerance=1)
mdb.models['Model-1'].parts['Part-1'].Set(faces=f, name='faceSet')
I get an error "TypeError: Keyword error on faces".
If I access the dictionary via face=f[0], I get error "Feature Creation Failed". If I access the tuple within the dictionary via f[0][0], I get the error "TypeError: keyword error on faces" again.
The option to use .getByBoundingSphere doesn't work either, because the faces in my model are massive, and the faces have to be completely contained within the sphere for Abaqus to "get" them, basically requiring me to create a sphere that encompasses the entire model.
My solution was to create my own script as follows:
import numpy as np

model = mdb.models['Model-1']
part = model.parts['Part-1']
faceSave = []
faceSave2 = []
x = np.arange(-1, 1, 0.1)
y = np.arange(-1, 1, 0.1)
z = np.arange(-1, 1, 0.1)
for x1 in x:
    for y1 in y:
        for z1 in z:
            f = part.faces.findAt(((x1, y1, z1),))
            if len(f) > 0:
                if f[0] in faceSave2:
                    None
                else:
                    faceSave.append(f)
                    faceSave2.append(f[0])
part.Set(faces=faceSave, name='faceSet')
This works, but it's extraordinarily slow, in part because findAt throws a warning to the console whenever it doesn't find a face, and it usually doesn't find a face with this approach. The code above basically looks within a small cube for any faces and puts them in the list faceSave. faceSave2 is set up to ensure that duplicate faces aren't added to the list. Accessing the tuple (e.g., f[0] in the code above) gives the unique information about the face, whereas f is just a pointer to the findAt command. Strangely, you can use the pointer f to create a Set, but you cannot use the actual face object f[0] to create a set. The problem with this approach for general use is that the tolerance for findAt is super small, so you either have to be confident about where things are located in your model, or make the step size 1e-6 in np.arange() to ensure you don't miss a face that's in the cube. With a tiny step size, expect the code to take forever.
At any rate, I can use a tuple (or a list of tuples) obtained via "findAt" to create a Set in Abaqus. However, I cannot use the tuple obtained via "getClosest" to make a set, even though I see no difference between the two objects. It's unfortunate, because getClosest gives me the exact info I need effectively immediately without my jumbled mess of for-loops.
#anarchoNobody:
Thank you so much for your edited answer!
This workaround works great, also with faces. I spent a lot of hours trying to figure out why .getClosest does not provide a working result for creating a set, but with the workaround and the number of brackets it works.
If applied with several faces, the code has to be slightly modified:
faces=((mdb.models['Model-1'].rootAssembly.instances['TT-1'].faces.getClosest(
coordinates=(((10.0, 10.0, 10.0)),), searchTolerance=2)),
(mdb.models['Model-1'].rootAssembly.instances['TT-1'].faces.getClosest(
coordinates=((-10.0, 10.0, 10.0),), searchTolerance=2)),)
faces1=(mdb.models['Model-1'].rootAssembly.instances['Tube-1'].faces.findAt((((
faces[0][0][1])),)),
mdb.models['Model-1'].rootAssembly.instances['Tube-1'].faces.findAt((((
faces[1][0][1])),)),)
mdb.models['Model-1'].rootAssembly.Surface(name='TT-inner-surf', side1Faces=faces1)
I execute a third-party program (referred to here as program B for simplicity) through a script which is driven by a Python program (the main program). To give you a global overview of what I am trying to do, here is a very simplified list of the tasks executed by the main program:
1. Execute program B and wait for it to finish.
2. Once B has finished, read the outputs of B, which B has stored in an ASCII file.
3. Format the outputs into a set of one-dimensional arrays that are then stored in an HDF5 file, using the PyTables module.
4. Go back to step 1, using a new set of parameters for B, until an exit condition is True.
My problem is in step 3. PyTables seems to handle tables of known shapes very well. In my case, I only know the shape of B's outputs after B has run, and the shape of the outputs varies from one iteration to the next.
Below is the code that I wrote for handling fixed-shape outputs of B, using some solutions provided on Stack Overflow for similar (but not identical) issues. This solution is not satisfactory in my case, because it requires the shapes to be invariant.
So my question is: how would you adapt this code to the case where each row has a different shape? I saw some possibilities in another post (In PyTables, how to create nested array of variable length?), but I am not yet familiar with EArray and VLArray, and they do not seem to be very efficient methods.
def makemytable1D(filepointer, group, tablename, labels, shapes):
    # Declare the dictionary
    template = {}
    # make all columns
    for i in np.arange(len(labels)):
        template[labels[i]] = tables.Float64Col(shape=shapes[i], pos=i)
    table = filepointer.create_table(group, tablename, template)
    return table, template

def fillmytable1D(table, labels, data, Ndata):
    tablerow = table.row
    for i in np.arange(Ndata):
        tablerow[labels[i]] = data[i]
    tablerow.append()
    table.flush()

# ----------- Execution -----------
import numpy as np
import tables

labels = np.array(['Field1', 'Field2', 'Field3', 'Field4', 'Field5'])  # example of labels
data = np.array([[0, 1], [2, 2, 2, 2], [3, 3, 3], [4, 4], [5, 5]])  # example of data
shapes = []
for d in data:
    shapes.append(np.array(d).shape)
Ndata = len(data)

try:
    saveFile = tables.open_file('save.hdf', 'w')
    group = saveFile.create_group('/', 'group1', 'Model 1')
    tab, template = makemytable1D(saveFile, group, 'test', labels, shapes)
    for i in range(10):
        # The iteration. In my real life problem, data has a shape that varies
        # at each iteration. The current example would not work here.
        fillmytable1D(tab, labels, data, Ndata)
finally:
    saveFile.close()
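For the variable-shape case itself, one possible direction (a hedged sketch based on the VLArray mentioned above, not a verified drop-in replacement; the file and group names simply mirror the example) is to store each row as a variable-length array:

import numpy as np
import tables

data = [[0, 1], [2, 2, 2, 2], [3, 3, 3], [4, 4], [5, 5]]  # rows of varying length

saveFile = tables.open_file('save_vl.hdf', 'w')
try:
    group = saveFile.create_group('/', 'group1', 'Model 1')
    # One VLArray holds rows of arbitrary length, appended one by one.
    vlarray = saveFile.create_vlarray(group, 'test', tables.Float64Atom(),
                                      'variable-length rows')
    for row in data:
        vlarray.append(np.asarray(row, dtype=np.float64))
finally:
    saveFile.close()

Each append stores one ragged row, and reading vlarray[i] gives back a NumPy array of whatever length that row had.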
So I am currently working on a quick and dirty Python project built around a data structure that is a dictionary whose keys are GO IDs from the Open Biological Ontologies (OBO) format. Each key maps to another dictionary containing lists of parent terms and child terms, which lets me build lists of all children or all ancestors of a given node in the ontology (I am working with the GO .obo file, if that helps anyone).
My problem is that I have been looking for an algorithm that returns all nodes on the same level as a given node ID. The level has to be relative, because there can be more than one path to a node (it is a directed acyclic graph, and a node can have multiple parents). I essentially need to look up the parents of a node, collect the children of those parents in a common list, and then repeat this process for every node added, without repeating nodes or slowing the computation down significantly.
I think this can easily be done using a set to prevent duplicate entries, and by keeping track of which parents I have visited until all parents of siblings have been visited and no new parent can be added, but my suspicion is that this might be terribly inefficient. If anyone has experience with this kind of algorithm, any insights would be highly appreciated! I hope this is clear enough for a response.
Thanks!
OK, so this is what I have developed so far, but it seems to keep giving me wrong values for some strange reason. Is there a minor error anyone can see, perhaps where I am not terminating correctly?
# A helper function to find generations of a given node
def getGenerationals(self, goid):
    quit = False
    visitedParents = set()
    generation = set()
    tempGen = set()
    generation.add(goid)
    while not quit:
        quit = True
        generation |= tempGen
        tempGen = set()
        print "TEMP GEN:", tempGen
        for g in generation:
            parents = set(self._terms[g]['p'])
            for p in parents:
                if p not in visitedParents:
                    visitedParents.add(p)
                    print "Parent:", p
                    quit = False
                    tempGen |= set(self._terms[p]['c'])
        raw_input("Break")
    return generation

# Working function
def getGeneration(self, goid):
    generation = list(self.getGenerationals(goid))
    generation.remove(goid)
    return list(generation)
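For comparison, here is a hedged sketch of the parents-then-children expansion described above, written as a standalone function over the same assumed structure (terms[goid]['p'] for parents, terms[goid]['c'] for children). It illustrates the idea and is not a verified fix for the method above.

def get_generation(terms, goid):
    """Collect the relative 'generation' of goid: children of its parents,
    children of their parents, and so on, without revisiting nodes."""
    generation = set([goid])
    visited_parents = set()
    frontier = set([goid])
    while frontier:
        # Step up: parents of the current frontier not yet expanded.
        new_parents = set()
        for node in frontier:
            for parent in terms[node]['p']:
                if parent not in visited_parents:
                    visited_parents.add(parent)
                    new_parents.add(parent)
        # Step down: children of those parents become the next frontier.
        frontier = set()
        for parent in new_parents:
            for child in terms[parent]['c']:
                if child not in generation:
                    generation.add(child)
                    frontier.add(child)
    generation.discard(goid)
    return sorted(generation)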