Gremlin - how to show a field value instead of the label value - amazon-web-services

With Gremlin, I want to display a graph in which each vertex shows the value of a field (for example, the Name property) rather than its label. How can I do that?
When I run my query, the resulting graph shows the label name and not a specific field value.
How can I modify my query so each vertex shows a specific field value?
Thanks a lot.

That looks like you are using the graph-notebook Jupyter notebooks. You can use the -d and -de flags to tell the notebook how to label the diagram. For example:
%%gremlin -d name -de weight
will label nodes using a "name" property, and edges using a "weight" property.
Note that most of the magic commands, like %%gremlin, support asking for help. You just need to put at least one character in the cell body. For example:
%%gremlin --help
x
There is additional documentation available here and here
There are also a number of sample notebooks that come included with the graph-notebook open source package. Several of those contain examples of how to customize graph visualizations. You may find those useful.

Related

How to update all GCP projects' incorrect labels under a particular folder with correct labels

I have the following requirement for GCP:
Find all the projects under a particular folder that have labels attached (i.e., key/value pairs) -- I already have this list.
Update the incorrect label keys with the correct keys.
Update the incorrect label values with the correct values.
If a label is not present in a project, create it.
A PowerShell script is preferred, but there is no hard rule.

How do I enforce single cardinality for vertex properties imported via CSV into AWS Neptune?

The Neptune documentation says that property data imported via CSV supports only "set" cardinality, which means a newly arrived property value can never overwrite the old value of the same property on the same vertex.
For example, if the first CSV imports
~id,~label,age
Marko,person,29
then Marko has a birthday & a second CSV imports
~id,~label,age
Marko,person,30
then the 'Marko' vertex's 'age' property will contain both age values, which doesn't seem useful.
AWS says that collapsing set-cardinality properties to single cardinality (keeping only the last-arrived value) needs to be done as post-processing, via Gremlin traversals.
Does this mean that there should be a traversal that continuously scans vertices with multiple (set) property values and re-sets each property with single cardinality, keeping the last value? If so, what is the optimal Gremlin query to do that?
In pseudo-Gremlin i'd imagine something like:
g.V().property(single, properties(*), _.tail())
Is there a guarantee at all that Set-cardinality properties are always listed in order of arrival?
Or am i completely on the wrong track here.
Any help would be appreciated.
Update:
So the best thing I have been able to come up with so far is still far from a perfect solution, but it might still be useful for someone in my shoes.
In Plan A, if we happen to know the property names and the order of arrival does not matter at all (we just want single cardinality on these properties), the traversal for all vertices could be something like:
g.V().has(${propname}).where(property(single, ${propname}, properties(${propname}).value().order().tail() ) )
Plan B is to collect new property values under temporary property names on the same vertex (e.g. starting with _), then traverse the vertices having such temporary property names and set the original properties to their tailed values with single cardinality:
g.V().has(${temp_propname}).where(property(single, ${propname}, properties(${temp_propname}).value().order().tail() ) ).properties('temp_propname').drop()
Plan C, which would be the coolest but unfortunately does not work, is to keep collecting property values in a dedicated vertex, with epoch timestamps as property names and property values as their values:
g.V(${vertexid}).out('has_propnames').properties()
==>vp[1542827843->value1]
==>vp[1542827798->value2]
==>vp[1542887080->latestvalue]
and then sort the property names (keys), take the last one, and use its value to keep the main vertex property up to date with the latest value:
g.V().has(${propname}).where(out(${has_these_properties}).count().is(gt(0))).where(property(single, ${propname}, out(${has_these_properties}).properties().value( out(${has_these_properties}).properties().keys().order().tail() ) ) )
It looks like the parameter of the value() step must be a constant; it cannot use the outcome of another traversal as a parameter, so I could not get this working. Perhaps someone with more Gremlin experience knows a workaround.
AWS has recently introduced 'single' cardinality support in the CSV bulk loader:
https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html
So no Gremlin-level property value rearrangement should be needed any more.
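With that loader support, the cardinality can be declared directly in a CSV column header. A sketch of the Neptune Gremlin load format (check the linked docs for the exact syntax supported by your engine version):

```
~id,~label,age:Int(single)
Marko,person,30
```

A later load of the same row then overwrites the 'age' value instead of accumulating a set.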
It would probably be more performant to read in the file from which you are bulk loading and set that property using the vertex id, rather than scanning for a vertex with multiple values for that property.
So your gremlin update query would be as follows.
g.V(${id})
.property(single,${key},${value})
As for whether set cardinality guarantees order of arrival, I do not know. :(
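The read-the-load-file idea above can be sketched in Python. This is a minimal, hypothetical example (not Neptune tooling): it assumes a Neptune-format CSV with an ~id column and emits one update statement per property, to be submitted through whatever Gremlin client you use:

```python
import csv
import io

def single_cardinality_updates(csv_text):
    """Yield Gremlin statements that re-set each property with single
    cardinality, using the vertex ids from the bulk-load file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        vid = row.pop("~id")
        row.pop("~label", None)  # the label is not a property
        for key, value in row.items():
            # property(single, ...) replaces any existing set of values
            yield f"g.V('{vid}').property(single, '{key}', '{value}')"

updates = list(single_cardinality_updates("~id,~label,age\nMarko,person,30\n"))
print(updates)  # ["g.V('Marko').property(single, 'age', '30')"]
```

This avoids scanning the whole graph for multi-valued properties, since every vertex id comes straight from the load file.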

Split attribute labels with delimiter for processing

I opened a CSV file in Weka 3.8 and selected an attribute/column (picture below). The labels are delimited by a pipe character. There should be 23 distinct labels, but Weka displays 914, and with that many distinct values Weka cannot visualize the attribute. Action is one label, Adventure is another, and so on; there can be more than one label per row.
For processing (e.g. classification), how can I separate those values so Weka can read them?
This question is similar to this one, but that question asks about a date attribute (e.g. "dd-MM-yyyy HH:mm"), while this one asks about a character-separated value (e.g. "Action|Adventure|Drama").
Edit:
The data is taken from kaggle.
Ah, I had run into this problem too.
Firstly, ensure that the Genres attribute is recognised as a String type. If you are only using the GUI, go to Open File... and open the file (I presume it's a .dat file; if you've renamed it to .csv, tick the check box that says "Invoke options dialog").
In the Generic Object Editor window, enter the index of the Genres attribute (here it's last).
Doing that will cause the attribute to look like this in the GUI.
Now choose the filter called StringToWordVector (weka.filters.unsupervised.attribute.StringToWordVector). Under the Editor window, find the Tokenizer entry, click on its field, and under delimiters remove the defaults and add the pipe character. You may optionally edit the attribute prefix field as well.
Hit Apply and find the required genres added as numeric attributes, set to 0 where the genre was not present in the original string and 1 otherwise.
StringToWordVector is a pretty useful filter, and there's much more to it in the docs: http://weka.sourceforge.net/doc.dev/weka/filters/unsupervised/attribute/StringToWordVector.html.
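If it helps to see the transformation itself, here is a minimal Python sketch of what the pipe-tokenized StringToWordVector effectively produces: one 0/1 indicator column per distinct genre (the function and data names are illustrative, not part of Weka):

```python
def genres_to_indicators(rows):
    """Split pipe-delimited genre strings into 0/1 indicator columns,
    mirroring StringToWordVector with '|' as the only delimiter."""
    vocab = sorted({genre for row in rows for genre in row.split("|")})
    table = []
    for row in rows:
        present = set(row.split("|"))
        table.append({genre: int(genre in present) for genre in vocab})
    return vocab, table

vocab, table = genres_to_indicators(["Action|Adventure", "Drama"])
print(vocab)     # ['Action', 'Adventure', 'Drama']
print(table[0])  # {'Action': 1, 'Adventure': 1, 'Drama': 0}
```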

Strange "Train and test set are not compatible" error in Weka

I have read many solutions to this error, but my problem is definitely different from the others: I have a train dataset (ARFF) and a test dataset (ARFF), and both files have a string attribute "id". Classification works if I remove the "id" from both files at the same time (if I don't remove the id from the test set, I get an error). What confuses me is that my friend can make it work by removing only the "id" in the train set, so his output contains the "id".
(Since he didn't remove the "id" from the test set, the numbers of attributes are not the same, which contradicts what I read, namely that the number of attributes should be exactly the same.)
I really need output that contains the "id".
Maybe I did something wrong with the "remove"? I read somewhere that the test set's features may be a superset of the train set's. I also found a paragraph about how to remove the attribute: "Instead of using a nominal ID attribute, declare it as a STRING attribute. With this you don't have to declare each possible value like with NOMINAL attributes, and it therefore doesn't matter what strings are used in the test set that you're trying to use the trained model on. In order to be able to work with this STRING ID attribute you have to use the FilteredClassifier in conjunction with the Remove filter (package weka.filters.unsupervised.attribute) and your original base classifier. This setup will remove the ID attribute for the learning process (i.e., the base classifier), but you'll still be able to use it outside for tracking instances."
http://weka.8497.n7.nabble.com/use-saved-model-td22857.html
Anyone have an idea?
Any help will be appreciated.
My two ARFF files (left: train; right: test).
Left: my friend's output, with ids such as test_subject1005; right: my output.
Finally I got my solution: just click directly on "Supplied test set" and, in the prompt that appears, click "Yes". That's all! (It seems I had not seen this prompt before, so I never tried it.)

Remove Missing Values in Weka

I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the mode or mean of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them instead, to see how this affects the quality of the classifier. Is there a filter to do that?
See this answer below for a better, modern approach.
My approach is not the perfect one, because if you have more than 5 or 6 attributes it becomes quite cumbersome to apply, but I can suggest using MultiFilter for this purpose when only a few attributes have missing values.
If you have missing values in 2 attributes then you'll use RemoveWithValues 2 times in a MultiFilter.
Load your data in Weka Explorer
Select MultiFilter from the Filter area
Click on MultiFilter and Add RemoveWithValues
Then configure each RemoveWithValues filter with the attribute index and set matchMissingValues to True.
Save the filter settings and click Apply in Explorer.
Use the removeIf() method on weka.core.Instances with a method reference to weka.core.Instance's hasMissingValue method, which returns true if a given Instance has any missing values.
import weka.core.Instance;
import weka.core.Instances;

Instances dataset = source.getDataSet(); // for some DataSource
dataset.removeIf(Instance::hasMissingValue); // drops every instance with at least one missing value