How to use the OR operator in Freebase?

Case:
So, I'm using the OR operator (ONE OF) to get people from either of two countries.
The query looks like:
[{
  "id": null,
  "type": "/people/person",
  "/people/person/nationality": {
    "name|=": [
      "Jordan",
      "Ottoman Empire"
    ]
  },
  "name": null,
  "limit": 30
}]
The query works fine, but it fails if you increase the limit to, say, 40. The error returned is "Unique query may have at most one result. Got 2", which means there exists a person with both nationalities, "Jordan" and "Ottoman Empire".
Question:
That makes sense for a "ONE OF" operator, but not for an "OR" operator. Is there any operator in Freebase that can query "ANY OF", i.e. a true "OR", to cover these cases?

You're getting the error because you used object notation ({}), which expects a single result, in a place where you're returning two results; those need array notation ([]).
Having said that, I think what you really need to do is hoist your |= operator up a level, to /people/person/nationality. Note also that you need array notation even when just asking for the nationality results of a single person, because the property is multi-valued (e.g. Sirhan Sirhan has both Jordan and Mandatory Palestine as his nationality).
Here's a query that will do what you want (although you should really use IDs for the countries rather than their English labels):
[{
  "id": null,
  "name": null,
  "nationality": [],
  "type": "/people/person",
  "nationality|=": [
    "Jordan",
    "Ottoman Empire"
  ]
}]

Related

Converting nested list into dictionary

Learning a new thing here - I've been trying to tackle a problem all day and haven't had much success. The idea is to loop through a nested list and return a dictionary. However, the first element of the list contains the column headers for the dictionary values. So here is the nested list, or table_data:
table_data = [
    ["first_name", "last_name", "city", "state"],
    ["Elisabeth", "Gardenar", "Toledo", "OH"],
    ["Jamaal", "Du", "Sylvania", "OH"],
    ["Kathlyn", "Lavoie", "Maumee", "OH"]
]
convert_table(table_data)
I want to convert the nested list into a dictionary as seen below. Basically, I'd like a function to take in the nested list then spit out the output as shown below.
[
    {"first_name": "Elisabeth", "last_name": "Gardenar", "city": "Toledo", "state": "OH"},
    {"first_name": "Jamaal", "last_name": "Du", "city": "Sylvania", "state": "OH"},
    {"first_name": "Kathlyn", "last_name": "Lavoie", "city": "Maumee", "state": "OH"}
]
Here is some of the code I've been fiddling with so far; I'm stuck on how to get the elements of the first row to repeat and become keys for the values in the rest of the dictionaries.
Thank you!
for i in range(len(table_data)):
    for j in table_data[0]:
        print(j)

for i in table_data:
    for j in i:
        print(j)
Being aware that Python dictionaries are inherently unordered (before Python 3.7), you can do it this way:
lst = []
for row in table_data[1:]:
    lst.append(dict(zip(table_data[0], row)))
If you need to preserve order, use an OrderedDict like so:
import collections as co

lst_ordered = []
for row in table_data[1:]:
    lst_ordered.append(co.OrderedDict(zip(table_data[0], row)))
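For completeness, here is the same zip-based approach wrapped up as the convert_table function the question asks for (a minimal sketch):

```python
def convert_table(table_data):
    """Return a list of dicts, using the first row as the keys."""
    header = table_data[0]
    return [dict(zip(header, row)) for row in table_data[1:]]

table_data = [
    ["first_name", "last_name", "city", "state"],
    ["Elisabeth", "Gardenar", "Toledo", "OH"],
]
print(convert_table(table_data))
# -> [{'first_name': 'Elisabeth', 'last_name': 'Gardenar', 'city': 'Toledo', 'state': 'OH'}]
```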
Bernie got this right. Now I'm going to study it. Thank you.
lst = []
for row in table_data[1:]:
    lst.append(dict(zip(table_data[0], row)))

How to split string in MongoDB?

The example data is as following:
{"BrandId":"a","Method":"PUT","Url":"/random/widgets/random/state"}
{"BrandId":"a","Method":"POST","Url":"/random/collection/random/state"}
{"BrandId":"b","Method":"PUT","Url":"/random/widgets/random/state"}
{"BrandId":"b","Method":"PUT","Url":"/random/widgets/random/state"}
I need to find all the rows with Method=PUT and a Url matching the pattern /random/widgets/random/state, where "random" is a random string of fixed length. The expected result is:
{"BrandId":"a","total":1}
{"BrandId":"b","total":2}
I tried to write some code like this:
db.accessLog.aggregate([
    {$group: {
        _id: '$BrandId',
        total: {
            $sum: {
                $cond: [{$and: [{$eq: ['$Method', 'POST']},
                                {Url: {$regex: /.*\/widgets.*\/state$/}}]}, 1, 0]
            }
        }
    }},
    {$group: {
        _id: '$_id',
        total: {$sum: '$total'}
    }}
])
but the regular expression does not work, so I suppose I need to try another way to do it, perhaps splitting the string. I also need to keep using $cond. Thanks!
You can use the following query to achieve what you want, assuming the data is in a collection named 'products':
db.products.aggregate([
    {$match: {'Method': 'PUT', 'Url': /.*widgets.*\/state$/}},
    {$group: {'_id': '$BrandId', 'total': {$sum: 1}}}
]);
1. $match: find all documents that have the 'PUT' method and a Url matching the specified pattern.
2. $group: group by BrandId and count 1 for each entry.
Greedy matching is the problem.
Assuming a non-zero number of 'random' characters (which sounds sensible), try a regex of:
/[^\/]+\/widgets\/[^\/]+\/state$/
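The difference is easy to sandbox with Python's re module (the sample Urls below are invented; the stricter pattern here also adds a leading ^/ anchor so an unanchored search cannot skip over extra path segments):

```python
import re

# Invented sample Urls in the question's shape.
urls = [
    "/abc123/widgets/xyz789/state",     # one random segment each side: should count
    "/abc123/collection/xyz789/state",  # wrong resource: should not count
    "/a/b/widgets/c/state",             # extra path segment: greedy ".*" still matches
]

# Loose pattern from the question: ".*" greedily crosses "/" boundaries.
loose = re.compile(r".*/widgets.*/state$")

# Stricter pattern: "[^/]+" confines each random part to a single path
# segment, and "^/" anchors the match at the start of the Url.
strict = re.compile(r"^/[^/]+/widgets/[^/]+/state$")

for u in urls:
    print(u, bool(loose.search(u)), bool(strict.match(u)))
```

Run against the three samples, the loose pattern accepts both the first and third Url, while the anchored, segment-limited pattern accepts only the first.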

Find match over Array of RegEx in MongoDB Collection

Say I have a collection with these fields:
{
"category" : "ONE",
"data": [
{
"regex": "/^[0-9]{2}$/",
"type" : "TYPE1"
},
{
"regex": "/^[a-z]{3}$/",
"type" : "TYPE2"
}
// etc
]
}
So if my input is "abc", I'd like to obtain the corresponding type (or the best match, although initially I'm assuming the RegExes are mutually exclusive). Is there any way to achieve this with decent performance, i.e. without iterating over each item of the RegEx array?
Please note the schema could be re-arranged if possible, as this project is still in the design phase. So alternatives would be welcomed.
Each category can have around 100 - 150 RegExes. I plan to have around 300 categories.
But I do know that types are mutually exclusive.
Real world example for one category:
type1=^34[0-9]{4}$,
type2=^54[0-9]{4}$,
type3=^39[0-9]{4}$,
type4=^1[5-9]{2}$,
type5=^2[4-9]{2,3}$
Describing the RegEx (Divide et Impera) would greatly help in limiting the number of Documents needed to be processed.
Some ideas in this direction:
RegEx accepting length (fixed, min, max)
POSIX style character classes ([:alpha:], [:digit:], [:alnum:], etc.)
Tree like Document structure (umm)
Implementing each of these would add to the complexity (code and/or manual input) for Insertion and also some overhead for describing the searchterm before the query.
Having mutually exclusive types in a category simplifies things, but what about between categories?
300 categories × 100-150 RegExps/category ⇒ 30k to 45k RegExps
... some would surely be exact duplicates, if not most of them.
In this approach I'll try to minimise the total number of Documents to be stored/queried in a reversed style vs. your initial proposed 'schema'.
Note: this demo uses only string lengths for narrowing; this may come naturally for manual input, as it reinforces a visual check of the RegEx.
Consider rewriting the regexes Collection with Documents as follows:
{
    "max_length": NumberLong(2),
    "min_length": NumberLong(2),
    "regex": "^[0-9]{2}$",
    "types": [
        "ONE/TYPE1",
        "NINE/TYPE6"
    ]
},
{
    "max_length": NumberLong(4),
    "min_length": NumberLong(3),
    "regex": "^2[4-9]{2,3}$",
    "types": [
        "ONE/TYPE5",
        "TWO/TYPE2",
        "SIX/TYPE8"
    ]
},
{
    "max_length": NumberLong(6),
    "min_length": NumberLong(6),
    "regex": "^39[0-9]{4}$",
    "types": [
        "ONE/TYPE3",
        "SIX/TYPE2"
    ]
},
{
    "max_length": NumberLong(3),
    "min_length": NumberLong(3),
    "regex": "^[a-z]{3}$",
    "types": [
        "ONE/TYPE2"
    ]
}
Each unique RegEx gets its own Document, listing the category/type pairs it belongs to (extensible to multiple types per category).
Demo Aggregation code:
function () {
    var match = null;
    var query = 'abc';
    db.regexes.aggregate(
        {$match: {
            max_length: {$gte: query.length},
            min_length: {$lte: query.length},
            types: /^ONE\//
        }},
        {$project: {
            regex: 1,
            types: 1,
            _id: 0
        }}
    ).result.some(function (re) {
        if (query.match(new RegExp(re.regex))) return match = re.types;
    });
    return match;
}
Return for 'abc' query:
[
"ONE/TYPE2"
]
this will run against only these two Documents:
{
    "regex": "^2[4-9]{2,3}$",
    "types": [
        "ONE/TYPE5",
        "TWO/TYPE2",
        "SIX/TYPE8"
    ]
},
{
    "regex": "^[a-z]{3}$",
    "types": [
        "ONE/TYPE2"
    ]
}
narrowed by the length 3 and having the category ONE.
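The same narrowing logic can be sketched client-side in Python (the collection is mocked as a plain list; document shapes follow the examples above):

```python
import re

# A hypothetical in-memory stand-in for the "regexes" collection.
regexes = [
    {"min_length": 2, "max_length": 2, "regex": r"^[0-9]{2}$",
     "types": ["ONE/TYPE1", "NINE/TYPE6"]},
    {"min_length": 3, "max_length": 4, "regex": r"^2[4-9]{2,3}$",
     "types": ["ONE/TYPE5", "TWO/TYPE2", "SIX/TYPE8"]},
    {"min_length": 6, "max_length": 6, "regex": r"^39[0-9]{4}$",
     "types": ["ONE/TYPE3", "SIX/TYPE2"]},
    {"min_length": 3, "max_length": 3, "regex": r"^[a-z]{3}$",
     "types": ["ONE/TYPE2"]},
]

def match_types(query, category):
    """Narrow by length and category first, then test the few surviving regexes."""
    prefix = category + "/"
    for doc in regexes:
        if not (doc["min_length"] <= len(query) <= doc["max_length"]):
            continue  # length pre-filter: skips most documents cheaply
        if not any(t.startswith(prefix) for t in doc["types"]):
            continue  # category pre-filter
        if re.match(doc["regex"], query):
            return [t for t in doc["types"] if t.startswith(prefix)]
    return None

print(match_types("abc", "ONE"))  # -> ['ONE/TYPE2']
```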
It could be narrowed even further by implementing POSIX descriptors (easy to test against the search term, but you would have to store two RegExps per document in the DB).
Breadth first search.
If your input starts with a letter, you can throw away type 1; if it also contains a number, you can throw away exclusive (numbers-only or letters-only) categories; and if it also contains a symbol, keep only the handful of types allowing all three. Then follow the advice above for the remaining categories. In essence, set up cases for the input types and use them to narrow a select number of 'regex types' down to the right one.
Or you can create a regex model based on the input and compare it to the list of regex models existing as a string to get the type. That way you just have to spend resources analyzing the input to build the regex for it.

How do I get unique values of a column in AWS Dynamo?

Say, in AWS Dynamo, I have a table like this:
ID (HKey)   Date (RKey)   BoxName
0           1/1/2014      Box-1
1           2/1/2014      Box-1
2           3/1/2014      Box-2
3           4/1/2014      Box-3
4           5/1/2014      Box-3
5           5/1/2014      Box-1
I want to, in a single query, get the first row for each unique Box. There could be hundreds of boxes I need the first entry for at once, making individual requests inefficient.
I can't seem to find anything in the API that would allow me to do this. Is it possible? How would I do this?
You might want to consider creating a Global Secondary Index (GSI) with BoxName as the hash key and Date as the range key. This will enable you to use the Query API on the secondary index, where you can ask "find all rows with BoxName = $box".
See the documentation for GSI.
Hope this helps,
Swami
There's no way to query just the first appearance of each box without creating an index for the boxes, as suggested above. However, if you don't mind reading the whole table and then picking the right lines, you can read the whole table into an array and make it unique with a simple piece of code. (Note that you may have to make several calls to Scan or Query until you get all the items.) Suppose the array looks something like this:
l = [
    {"ID": "0", "Date": "1/1/2014", "BoxName": "Box-1"},
    {"ID": "1", "Date": "2/1/2014", "BoxName": "Box-1"},
    {"ID": "2", "Date": "3/1/2014", "BoxName": "Box-2"},
    {"ID": "3", "Date": "4/1/2014", "BoxName": "Box-3"},
    {"ID": "4", "Date": "5/1/2014", "BoxName": "Box-3"},
    {"ID": "5", "Date": "5/1/2014", "BoxName": "Box-1"}
]
Then, a simple code like this in python will give you the list in the variable "out":
out = []
seen = []
for line in l:
    if line["BoxName"] not in seen:
        seen.append(line["BoxName"])
        out.append(line)
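One caveat with the dedup above: a Scan does not guarantee date order, so "first" may not mean earliest. A small sketch that sorts by date first (sample rows are taken from the question; the d/m/Y date format is an assumption based on its data):

```python
from datetime import datetime

# A few sample rows, as if read from a Scan (order scrambled on purpose).
rows = [
    {"ID": "5", "Date": "5/1/2014", "BoxName": "Box-1"},
    {"ID": "0", "Date": "1/1/2014", "BoxName": "Box-1"},
    {"ID": "2", "Date": "3/1/2014", "BoxName": "Box-2"},
]

# Sort by parsed date so the earliest row per box comes first.
rows.sort(key=lambda r: datetime.strptime(r["Date"], "%d/%m/%Y"))

first_per_box = {}
for r in rows:
    first_per_box.setdefault(r["BoxName"], r)  # keeps only the earliest row

print(sorted(first_per_box))  # -> ['Box-1', 'Box-2']
```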

How can I get the number of members in an event on Facebook?

I need to get the number of members who chose the option "GOING", "MAYBE" or "INVITED".
Is there a way to do this?
You can use
/v1.0/{event_id}/invited?summary=true&limit=0
// use "limit=0" if you're not interested in data, just the summary
which returns info, such as:
{
    "data": [
    ],
    "summary": {
        "maybe_count": 0,
        "declined_count": 0,
        "attending_count": 3,
        "count": 3
    }
}
Facebook doesn't expose a count method on these tables. The only way to do this is to make a query that returns all the members and count them with your script.
Depending on how you want to process the information you have several ways to attack it
graph.facebook.com/{event_id}/invited - returns all invited users; each has an rsvp_status value (attending, unsure, declined, not_replied) that does not quite match up to the endpoint names below
graph.facebook.com/{event_id}/attending - exactly what you would expect
graph.facebook.com/{event_id}/maybe - exactly what you would expect
graph.facebook.com/{event_id}/noreply - exactly what you would expect
graph.facebook.com/{event_id}/declined - exactly what you would expect