Neo4j search unknown number of properties - list

I've got a graph like this:
(A)-[r1]-(B)-[r2]-(C)
The thing is that r1 and r2 can have different numbers of properties.
relation1:
index1: 10
index2: 2
relation2:
index1: 6
index2: 4
index3: 5
Is it possible to search among all properties without knowing their names? Or is there a better way to keep lists in Neo4j?

Property values can be lists, as long as all the elements are the same type. So you can have
match (A) -[r1]-> (B) -[r2]-> (C) set r1.vals = [10, 6], r2.vals=[6, 4, 5]
and later search with
match (A) -[r]-> (B) where 10 in r.vals return A, B
I don't know whether this works with indexing, so presumably tstorms' answer is better if you have a lot of these relationships.
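Outside of Cypher, the core operation the question asks for is just a membership test over a relationship's property values. A minimal Python sketch of the idea (the data structure and names are made up for illustration):

```python
# Hypothetical in-memory stand-in for the two relationships in the
# question; property names are not known to the search code below.
rels = {
    "relation1": {"index1": 10, "index2": 2},
    "relation2": {"index1": 6, "index2": 4, "index3": 5},
}

def rels_with_value(value):
    # Scan every property of every relationship without naming any key.
    return [name for name, props in rels.items()
            if value in props.values()]
```

This "scan all fields" behaviour is roughly what the Lucene wildcard query in the next answer does for you on the index side.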

There's no way to do this in "native" cypher, but you could use automatic relationship indexing that uses Lucene. I think you can do the following in Cypher:
START r=rel:rel_auto_index("*:'your_search_value'")
RETURN startNode(r), endNode(r), type(r);
Make sure automatic indexing is enabled in your Neo4j properties:
relationship_auto_indexing = true


Elasticsearch scoring on multiple indexes: dfs_query_then_fetch returns the same scores as query_then_fetch

I have multiple indices in Elasticsearch (and the corresponding documents in Django created using django-elasticsearch-dsl). All of the indices have these settings:
settings = {'number_of_shards': 1,
'number_of_replicas': 0}
Now, I am trying to perform a search across all 10 indices. In order to retrieve consistent scoring between the results from different indices, I am using dfs_query_then_fetch:
search = Search(index=['mov*'])
search = search.params(search_type='dfs_query_then_fetch')
objects = search.query("multi_match", query='Tom & Jerry', fields=['title', 'actors'])
I get bad results due to inconsistent scoring. A book called 'A story of Jerry and his friend Tom' from one index can be ranked higher than the cartoon 'Tom & Jerry' from another index. The reason is that dfs_query_then_fetch is not working. When I remove it or substitute with the simple query_then_fetch, I get absolutely the same results with the identical scoring.
I have tested it on URI requests as well, and I always get the same scores for both search types.
What can be the reason for it?
UPDATE: The results are actually not the same, but they are only really slightly different, e.g. a score of 50.1 with dfs and 50.0 without dfs, while the same model within one index has a score of 80.0.
If the number of shards is 1, then dfs_query_then_fetch and query_then_fetch will return the same results. The DFS phase queries all shards to collect global term statistics and then scores against those, but in this case there is only one shard, so the local statistics are already the global ones.
Regarding the scoring, you might want to have a look at your actors field too. Also, let us know which analyzer and tokenizer you used, if they are custom ones.
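To make the single-shard point concrete, here is a toy Python sketch using a simplified idf of ln(N/df); this is not Elasticsearch's actual scoring formula, only the shard-statistics idea:

```python
import math

def idf(doc_count, doc_freq):
    # Simplified inverse document frequency: ln(N / df).
    return math.log(doc_count / doc_freq)

# Two shards: local statistics can differ from the merged ones, so
# query_then_fetch and dfs_query_then_fetch may rank differently.
local_shard_a = idf(3, 1)    # shard A alone: 3 docs, 1 contains the term
merged = idf(3 + 7, 1 + 4)   # DFS merges counts from both shards

# One shard: its "local" statistics are already the global ones, so
# both search types produce identical scores.
single_shard = idf(10, 5)
```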

Report Builder- Nested If Statements with Multiple Values to Categorize

I am working in Report Builder and having issues creating a calculated field to categorize data from another column.
To simplify and explain my goal:
I’d like to create a calculated field with 4 distinct categories and I’m assuming the best way to do that is a nested if statement. Feel free to correct me if that is not the best function to use.
Category 1: Let’s just call it “A”
Category 2: “B”
Category 3: “C”
Category 4: “D”
Values from the other column:
Simplified Example-
Numbers 1-10 would be category A,
numbers 11-20 would be B,
numbers 21-30 would be C,
numbers 31-40 would be category D
However in my particular case the values aren’t nicely organized in those 10 consecutive ranges. For example, I have a 33 value that would be an A category, which makes it so I can’t use the greater than or less than operators.
Having explained my issue and goal- my question is how to write the syntax for an if statement when I have multiple discrete values that aren’t neatly organized in consecutive numerical order?
I hope this question makes sense.
I tried using just one argument to get it going and got stumped when it didn’t work:
Iif(field data = 1,2,3,4,5,6,7,8,9,10,33, “A”, “Other”)
It doesn’t work with the commas and I tried inserting the Or Operator between each value and that didn’t work either.
Thanks for any syntax tips you can provide.
There are a few ways you can do this.
Option 1: In your database design
The best way, in my opinion, is to do this in your database. Create a table with these values/category pairs and simply join to that whenever you need to include the categorised view of the data.
Option 2: In your report design
If you really have to do this in the report design, then using SWITCH() will probably be easier, certainly easier to read.
Given your second example, and expanding it a little you could do something like this...
=SWITCH(
(Fields!myData.Value >=1 AND Fields!myData.Value <=10) OR Fields!myData.Value = 33, "A",
(Fields!myData.Value >=11 AND Fields!myData.Value <=15) OR Fields!myData.Value = 34 OR Fields!myData.Value = 39, "B",
True, "Other"
)
SWITCH uses pairs of values: when the first value in a pair evaluates to true, the second value in that pair is returned.
The final True, "Other" acts like an else. If no previous criterion matched, the final pair always evaluates to true, so "Other" is returned.
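The same categorisation can be sketched as a lookup table, which is essentially what Option 1's join does; a Python illustration using the hypothetical value sets from the SWITCH example:

```python
# Hypothetical category/value pairs mirroring the SWITCH example above.
CATEGORY_VALUES = {
    "A": set(range(1, 11)) | {33},    # 1-10 plus the stray 33
    "B": set(range(11, 16)) | {34, 39},
}

def categorize(value):
    # First matching category wins; anything unmatched falls through.
    for category, values in CATEGORY_VALUES.items():
        if value in values:
            return category
    return "Other"
```

Maintaining the values in one table like this is also why the database-side join tends to age better than a growing expression in the report.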

Use case for "sets of tuple data" in Pyomo

When we specify the data for a set we have the ability to give it tuples of data. For example, we could write in our .dat file the following:
set A : 1 2 3 :=
1 + - -
2 - - +
3 - + + ;
This would specify that we would have 4 tuples in our set: (1,1), (2,3), (3,2), (3,3)
But I guess that I am struggling to understand exactly why we would want to do this? Furthermore, suppose we instantiated a Set object in our code as:
model.Aset = RangeSet(4, dimen=2)
Would this then specify that our tuples would have the indices 1, 2, 3, and 4?
I am thinking that specifying tuples in our set could potentially be useful when working with some data in which it's important to have a bit of a "spatial" understanding of the problem. But I would be curious to hear from the community what the potential applications of specifying set data this way might be.
The most common place this appears is when you're trying to model edges between nodes in a network. Networks aren't usually completely dense (have edges between every pair of nodes) so it's beneficial to represent just the edges that appear using a sparse set of tuples.
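A pure-Python sketch (not Pyomo syntax) of how the +/- table from the question encodes a sparse tuple set:

```python
# Each '+' entry of the table becomes a (row, column) tuple.
table = {
    1: ["+", "-", "-"],
    2: ["-", "-", "+"],
    3: ["-", "+", "+"],
}

edges = {(i, j)
         for i, row in table.items()
         for j, mark in enumerate(row, start=1)
         if mark == "+"}
```

Only 4 of the 9 possible pairs are stored, which is the whole point of a sparse set of tuples when modeling network edges.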

Data structure to multi-sort feature vector by attributes

I need to sort a vector of tuples
[
(a_11, ..., a_1n),
... ,
(a_m1, ..., a_mn)
]
based on a list of attributes and their comparison operators < or >.
For example: sort first by a_2 with the > operator and by a_57 with the < operator.
Question: I am looking for a data structure to do this efficiently under the assumption that sorting happens much more often than updates to the vector.
My current idea is to store the sorting order for each attribute by adding pointers similar to a linked list for each attribute:
For example, this vector:
0: (1, 7, 4)
1: (2, 5, 6)
2: (3, 4, 5)
Would get the data structure
0: (1 next:1 prev:-, 7 next:- prev:1, 4 next:2 prev:-)
1: (2 next:2 prev:0, 5 next:0 prev:2, 6 next:- prev:2)
2: (3 next:- prev:1, 4 next:1 prev:-, 5 next:1 prev:0)
Edit:
At any given time I need only one sorting order. After I get a user request for a different sorting order I need to recompute as quickly as possible.
The incremental idea is very good, but I need to estimate how much time this will take, and that is much easier if I have an idea of how it should be done.
Once I am finished I need random access to groups of 100 elements, i.e. the first 100, the second 100, or elements 5100-5199.
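As a baseline to estimate against, the straightforward approach is one stable sort pass per attribute, least-significant key first (a Python sketch; the per-attribute linked lists above would need to beat this to be worthwhile):

```python
# Rows from the example above.
rows = [(1, 7, 4), (2, 5, 6), (3, 4, 5)]

def multi_sort(rows, keys):
    # keys: list of (attribute_index, descending) pairs, primary key first.
    out = list(rows)
    for idx, descending in reversed(keys):
        # Python's sort is stable, so the more significant keys,
        # applied last, win ties left by the later keys.
        out.sort(key=lambda row: row[idx], reverse=descending)
    return out
```

This costs one O(m log m) pass per requested attribute; random access to groups of 100 afterwards is plain slicing of the sorted list.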
I would use boost::MultiIndex for this. – drescherjm

Splitting a list based on another list values in Mathematica

In Mathematica I have a list of point coordinates
size = 50;
n = 20; (* matches the length of clusterIndices below *)
points = Table[{RandomInteger[{0, size}], RandomInteger[{0, size}]}, {i, 1, n}];
and a list of cluster indices these points belong to
clusterIndices = {1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1};
what is the easiest way to split the points into two separate lists based on the clusterIndices values?
EDIT:
The solution I came up with:
pointIndices =
Map[#[[2]] &,
GatherBy[MapIndexed[{#1, #2[[1]]} &, clusterIndices], First],
{2}];
pointsByCluster = Map[Part[points, #] &, pointIndices];
Is there a better way to do this?
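For reference, the operation being asked for is a plain group-by-key; a Python sketch of the same thing, for comparison with the Mathematica approaches:

```python
from collections import defaultdict

def split_by_cluster(points, cluster_indices):
    # Collect each point under its cluster index, preserving point order.
    clusters = defaultdict(list)
    for point, idx in zip(points, cluster_indices):
        clusters[idx].append(point)
    # Return one list per cluster, ordered by cluster index.
    return [clusters[k] for k in sorted(clusters)]
```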
As @High Performance Mark and @Nicholas Wilson said, I'd start with combining the two lists together via Transpose or Thread. In this case,
In[1]:= Transpose[{clusterIndices, points}]==Thread[{clusterIndices, points}]
Out[1]= True
At one point, I looked at which was faster, and I think Thread is marginally faster. But, it only really matters when you are using very long lists.
@High Performance Mark makes a good point in suggesting Select. But it would only allow you to pull a single cluster out at a time. The code for selecting cluster 1 is as follows:
Select[Transpose[{clusterIndices, points}], #[[1]]==1& ][[All, All, 2]]
Since you seem to want to generate all clusters, I'd suggest doing the following:
GatherBy[Transpose[{clusterIndices, points}], #[[1]]& ][[All, All, 2]]
which has the advantage of being a one liner and the only tricky part was in selecting the correct Part of the resulting list. The trick in determining how many All terms are necessary is to note that
Transpose[{clusterIndices, points}][[All,2]]
is required to get the points back out of the transposed list. But, the "clustered" list has one additional level, hence the second All.
It should be noted that the second parameter in GatherBy is a function that accepts one parameter, and it can be interchanged with any function you wish to use. As such, it is very useful. However, if you'd like to transform your data as you're gathering it, I'd look at Reap and Sow.
Edit: Reap and Sow are somewhat underused, and fairly powerful. They're somewhat confusing to use, but I suspect GatherBy is implemented using them internally. For instance,
Reap[ Sow[#[[2]], #[[1]] ]& /@ Transpose[{clusterIndices, points}], _, #2& ]
does the same thing as my previous code without the hassle of stripping off the indices from the points. Essentially, Sow tags each point with its index, then Reap gathers all of the tags (_ for the 2nd parameter) and outputs only the points. Personally, I use this instead of GatherBy, and I've encoded it into a function which I load, as follows:
SelectEquivalents[x_List, f_:Identity, g_:Identity, h_:(#2&)] :=
Reap[Sow[g[#], {f[#]}]& /@ x, _, h][[2]];
Note: this code is a modified form of what was in the help files in 5.x. But, the 6.0 and 7.0 help files removed a lot of the useful examples, and this was one of them.
Here's a succinct way to do this using the new SplitBy function in version 7.0 that should be pretty fast:
SplitBy[Transpose[{points, clusterIndices}], Last][[All, All, 1]]
If you aren't using 7.0, you can implement this as:
Split[Transpose[{points, clusterIndices}], Last[#]==Last[#2]& ][[All, All, 1]]
Update
Sorry, I didn't see that you only wanted two groups, which I think of as clustering, not splitting. Here's some code for that:
FindClusters[Thread[Rule[clusterIndices, points]]]
How about this?
points[[
Flatten[Position[clusterIndices, #]]
]] & /@
Union[clusterIndices]
I don't know about 'better', but the more usual way in functional languages would not be to add indices to label each element (your MapIndexed) but instead to just run along each list:
Map[#1[[2]] &,
Sort[GatherBy[
Thread[ {#1, #2} &[clusterIndices, points]],
#1[[1]] &], #1[[1]][[1]] < #2[[1]][[1]] &], {2}]
Most people brought up in Lisp/ML/etc. will instantly recognize Thread as the way to implement the zip idea from those languages.
I added in the Sort because it looks like your implementation will run into trouble if clusterIndices = {2, ..., 2, 1, ...}. On the other hand, I would still need to add a line to fix the problem that if clusterIndices has a 3 but no 2, the output indices will be wrong. It is not clear from your fragment how you intend to retrieve things, though.
I reckon you will find list processing much easier if you refresh yourself with a hobby project like building a simple CAS in a language like Haskell where the syntax is so much more suited to functional list processing than Mathematica.
If I think of something simpler I will add to the post.
Map[#[[1]] &, GatherBy[Thread[{points, clusterIndices}], #[[2]] &], {2}]
My first step would be to execute
Transpose[{clusterIndices, points}]
and my next step would depend on what you want to do with that; Select comes to mind.