Distinct in CouchDB with input keys - mapreduce

Given documents like:
[
{"id":1, "category": "cat1", "type": "type1", "line": "line1"},
{"id":2, "category": "cat1", "type": "type1", "line": "line1"},
{"id":3, "category": "cat1", "type": "type2", "line": "line1"},
{"id":4, "category": "cat1", "type": "type1", "line": "line2"},
{"id":5, "category": "cat2", "type": "type1", "line": "line2"},
{"id":6, "category": "cat2", "type": "type1", "line": "line3"}
]
I want to be able to pass in the category and type keys and get back the distinct lines e.g. pass in keys of "cat1" and "type1" and get back ["line1", "line2"] or pass in "cat2" and "type1" and get back ["line2", "line3"]
Easy enough if I am not passing in keys:
map
function(doc) {
emit([doc.line]);
}
reduce
function(keys, values) {
return null;
}
I am using group: true, but I am stumped on how to handle this when passing in keys.
PS, using node and nano so query looks similar to:
db.view('catalog', 'lines', {key: ['cat1', 'type1'], group: true}, function (err, body) {
...
});

To pass in the category and type and get back the distinct lines, query the following map and reduce with the right parameters:
map
function(o) {
// Key on category, type, then line, so a key range can select one category/type pair.
emit([o.category, o.type, o.line], null);
}
reduce
_count
queries
For "cat1" and "type1":
/mydb/_design/mydesign/_view/myview?group=true&startkey=["cat1","type1"]&endkey=["cat1","type1",{}]
{"rows":[
{"key":["cat1","type1","line1"],"value":2},
{"key":["cat1","type1","line2"],"value":1}
]}
For "cat2" and "type1":
/mydb/_design/mydesign/_view/myview?group=true&startkey=["cat2","type1"]&endkey=["cat2","type1",{}]
{"rows":[
{"key":["cat2","type1","line2"],"value":1},
{"key":["cat2","type1","line3"],"value":1}
]}
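With group=true, each distinct key comes back as one row, so the distinct lines are simply the third element of each returned key. Here is a minimal sketch in Python of building the range parameters and pulling the lines out of a parsed response. The helper names range_params and distinct_lines are hypothetical, and the sample response is copied from the "cat1"/"type1" result above rather than fetched from a server.

```python
import json


def range_params(category, doc_type):
    """Build view parameters selecting every key that starts with [category, doc_type]."""
    return {
        "group": "true",
        "startkey": json.dumps([category, doc_type], separators=(",", ":")),
        # {} sorts after every string in CouchDB's collation order,
        # so this end key closes the range after the last line value.
        "endkey": json.dumps([category, doc_type, {}], separators=(",", ":")),
    }


def distinct_lines(response):
    """Return the third key element (the line) of each grouped row."""
    return [row["key"][2] for row in response["rows"]]


# Sample response copied from the "cat1"/"type1" query above.
sample = {"rows": [
    {"key": ["cat1", "type1", "line1"], "value": 2},
    {"key": ["cat1", "type1", "line2"], "value": 1},
]}

params = range_params("cat1", "type1")
lines = distinct_lines(sample)
print(params["startkey"], params["endkey"])
print(lines)
```

In the nano call from the question, the same idea should apply by passing startkey and endkey instead of key, since an exact key of ['cat1', 'type1'] cannot match the three-element keys emitted by this map.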

Related

How do I query an AWS OpenSearch index using a Vega visualization?

I have data in an index in JSON format, and I want to use a Vega visualization to display this (full Vega, not Vega-Lite). However, I've found that every example out there is for Vega-Lite, and all they're trying to do is stick their data into a time series graph. I'd like to do something different, and thus I find myself at a dead end.
A sample doc in my index:
{
"_index": "myindex",
"_type": "doc",
"_id": "abc123",
"_version": 1,
"_score": null,
"timestamp": "2022-05-23T07:43:21.123Z",
"_source": {
"fruit": [{
"amount": 15,
"type": {
"grower_id": 47,
"grower_country": "US",
"name": "apple"
}
},
{
"amount": 43,
"type": {
"grower_id": 47,
"grower_country": "CAN",
"name": "apple"
}
},
{
"amount": 7,
"type": {
"grower_id": 23,
"grower_country": "US",
"name": "orange"
}
},
{
"amount": 14,
"type": {
"grower_id": 23,
"grower_country": "CAN",
"name": "orange"
}
}
]
}
}
What I want to do is create 2 text marks on the visualization that will display the sum of the values as follows.
Symbol1 = sum of all apples (i.e. all apples grown in the US and CAN combined)
Symbol2 = sum of all oranges (i.e. all oranges grown in the US and CAN combined)
I tried the following data element with no success:
"data": [{
"name": "mydata",
"url": {
"index": "myindex",
"body": {
"query": "fruit.type.name:'apple'",
},
}
}
]
However, this query obviously isn't even correct. What I want to be able to do is return a table of values and then use those values in my marks to drive the mark behaviour or color. I'm comfortable with doing the latter in Vega, but getting the data queried is where I'm stuck.
I've read and watched so many tutorials which cover Vega-Lite, but I'm yet to find a single working example for Vega on AWS OpenSearch.
Can anyone please help?

AWS StepFunctions - Merge and flatten the task output combined with the original input

How do we use Parameters, ResultPath and ResultSelector to combine the results of a Task with the original input in the same JSON level?
I checked the documentation on AWS, but it seems that ResultSelector always creates a new dictionary, which places the result one level below the original input.
Example input
{
"status": "PENDING",
"uuid": "00000000-0000-0000-0000-000000000000",
"first_name": "John",
"last_name": "Doe",
"email": "john.doe@email.com",
"orders": [
{
"item_uuid": "11111111-1111-1111-1111-111111111111",
"quantities": 2,
"price": 2.38,
"created_at": 16049331038000
}
]
}
State Machine definition
"Review": {
"Type": "Task",
"Resource": "arn:aws:states:us-east-1:123456789012:activity:Review",
"ResultPath": null,
"Next": "Processing",
"Parameters": {
"task_name": "REVIEW_REQUIRED",
"uuid.$": "$.uuid"
}
},
Example output from Review Activity
{
"review_status": "APPROVED"
}
Question
How do I update the State Machine definition to combine the result of the Review Activity and the original input into something like the following?
{
"status": "PENDING",
"uuid": "00000000-0000-0000-0000-000000000000",
"first_name": "John",
"last_name": "Doe",
"email": "john.doe@email.com",
"orders": [
{
"item_uuid": "11111111-1111-1111-1111-111111111111",
"quantities": 2,
"price": 2.38,
"created_at": 16049331038000
}
],
"review_status": "APPROVED"
}
NOTE
I don't have access to the Activity code, just the definition file.
I recommend NOT listing every key in Parameters, as suggested in another answer, because you will drop all data that you do not include. It's not a long-term approach; you can more easily do it like this:
Step Input
{
"a": "a_value",
"b": "b_value",
"c": {
"c": "c_value"
}
}
In your state-machine.json
"Flatten And Keep All Other Keys": {
"Type": "Pass",
"InputPath": "$.c.c",
"ResultPath": "$.c",
"Next": "Some Other State"
}
Step Output
{
"a": "a_value",
"b": "b_value",
"c": "c_value"
}
While Step Functions does not allow you to do so directly, you can create a Pass state that flattens the input as a workaround.
Example Input:
{
"name": "John Doe",
"lambdaResult": {
"age": "35",
"location": "Eastern Europe"
}
}
Amazon States Language:
"Flatten": {
"Type": "Pass",
"Parameters": {
"name.$" : "$.name",
"age.$" : "$.lambdaResult.age",
"location.$": "$.lambdaResult.location"
},
"Next": "MyNextState"
}
Output:
{
"name": "John Doe",
"age": "35",
"location": "Eastern Europe"
}
It's tedious, but it gets the job done.
Thanks for your question.
It looks like you don't necessarily need to manipulate the output in any way, and are looking for a way to combine the state's output with its input before passing it on to the next state. The ResultPath field allows you to combine a task result with task input, or to select one of these. The path you provide to ResultPath controls what information passes to the output.
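To make the effect of ResultPath concrete, here is a small local illustration in Python. It is not an AWS API: apply_result_path is a hypothetical helper that mimics only the three simplest cases (null, "$", and a single top-level key such as "$.review"), and the key name "review" is an assumption chosen for the example.

```python
import copy


def apply_result_path(state_input, task_result, result_path):
    """Mimic ResultPath for three simple cases: None, "$", and "$.<key>"."""
    if result_path is None:
        # "ResultPath": null discards the result and passes the input through.
        return copy.deepcopy(state_input)
    if result_path == "$":
        # The default: the result replaces the input entirely.
        return copy.deepcopy(task_result)
    if not result_path.startswith("$.") or "." in result_path[2:]:
        raise ValueError("this sketch only handles a single top-level key")
    output = copy.deepcopy(state_input)
    # The result is inserted into the input under the named key.
    output[result_path[2:]] = copy.deepcopy(task_result)
    return output


state_input = {"status": "PENDING", "uuid": "00000000-0000-0000-0000-000000000000"}
task_result = {"review_status": "APPROVED"}

discarded = apply_result_path(state_input, task_result, None)
nested = apply_result_path(state_input, task_result, "$.review")
print(discarded)
print(nested)
```

With a path such as "$.review", the input keys are preserved but review_status ends up nested under review rather than at the top level, which is why a flattening step like the Pass states shown earlier is still needed to reach the exact output in the question.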

How to filter an Elasticsearch query with items in a list

I am running an Elasticsearch query, but now I want to filter it by the value of "result", which is already defined in the docs and ranges from 0 to 6. The values that I want to filter the search with are in a list called "decision_results", which is populated from checkboxes on the website I'm running.
I tried the following code but the result of the query showed on the page does not change at all:
query = {
"_source": ["title", "raw_text", "i_cite", "cite_me", "relevancia_0", "cdf", "cite_me_semestre", "cdf_grupo", "ramo"],
"query": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
},
"filter": {
"bool": {
"should": [
{ "term": {"result": in decision_results}}
]
}
}
},
"sort": [
{"relevancia_0": {"order": "desc"}},
{"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "Math.round(doc['cdf'].value*1e3)/1.0e3"
},
"order": "desc"}},
{"cite_me_semestre": {"order": "desc"}},
{"cite_me": {"order": "desc"}},
{"date": {"order": "desc"}},
"_score"
],
"highlight": {
"fragment_size": 250,
"number_of_fragments": 1,
"type": "plain",
"order": "score",
"fragmenter": "span",
"pre_tags": "<span style='background-color: #FFFF00'>",
"post_tags": "</span>",
"fields": {"raw_text": {}}
}
}
I expect only the documents with a "result" value that is in the list "decision_results" to be returned.
I think you should read a bit more about the bool query...
replicate this structure into your query:
GET _search
{
"query": {
"bool": {
"must": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
}
},
"filter": {
"term": {"result": in decision_results}
}
}
}
}
Here your main query block goes in the "must" block of the bool query, and the "term" clause of your filter goes in the "filter" block of the bool query. I'm not sure about the syntax of the above example since I haven't tested it, but it should be close to that.
Also, make sure your web site correctly handles the "term": {"result": in decision_results} part. Is in decision_results properly translated to a valid JSON query for your term clause? If that part is an issue, you could provide more information about the context around it so we can help with that.
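One way to make the in decision_results placeholder concrete is the terms query, which matches documents whose field equals any value in a list. The following is a minimal sketch in Python, assuming decision_results is a plain list of the selected values; build_query and the sample arguments are hypothetical, and the sort and highlight sections from the question are omitted for brevity.

```python
def build_query(termo, decision_results):
    """Build a bool query: full-text match in must, result values in filter."""
    return {
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "fields": ["raw_text", "i_cite", "title"],
                        "query": termo,
                    }
                },
                # terms (plural) takes the list directly and matches any listed value.
                "filter": {"terms": {"result": decision_results}},
            }
        }
    }


# Sample inputs for illustration only.
query = build_query("contrato", [0, 3, 6])
print(query["query"]["bool"]["filter"])
```

If the checkbox values arrive as strings, converting them to the field's type before building the query may be necessary, and you may want to leave the filter out entirely when no checkbox is selected.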

how to remove outer array in django rest json response in readonlymodelview

I am getting data in an array like
[
{
"category_id": "Glass_Door_Handle",
"category_name": "Glass Door Handle",
"product_name": [
{
"product_id": "SP-001",
"name": "RENUALT-SOLID-MD",
"image": "http://127.0.0.1:8000/media/1-1_aIzfcnG.jpg",
"size": [
"http://127.0.0.1:8000/api/sizemattcp/7/"
],
"timestamp": "2016-01-14T05:33:44.107117Z",
"updated": "2016-01-14T05:33:44.107142Z"
}
]
}
]
I want the data in this form:
{
"category_id": "Glass_Door_Handle",
"category_name": "Glass Door Handle",
"product_name": [
{
"product_id": "SP-001",
"name": "RENUALT-SOLID-MD"
}
]
}
I am using ReadOnlyModelViewSet.
It seems you are calling your API like:
/api/models/?filter=value
That returns a list of objects, which here contains only one element. To get a single object, just append its primary key to the URL:
/api/models/1234/
If you want to get models not by id but by some other field, use the ViewSet.lookup_field parameter to specify the name of that field.

couchdb - querying views with start_key and end_key

I have a couchdb record structure which looks like this
[
{
"app_version": 2,
"platform": "android",
"session": {
"timestamp": "2014-08-20T00:00:00.000Z",
"session_id": "TOnNIhCNQ31LlkpEPQ7XnN1D",
"ip": "202.150.213.66",
"location": "1.30324,103.5498"
}
},
{
"app_version": 2,
"platform": "android",
"session": {
"timestamp": "2014-08-21T00:00:00.000Z",
"session_id": "TOnNIhCNQ31LlkpEPQ7XnN1D",
"ip": "202.150.213.66",
"location": "1.30324,103.5498"
}
},
{
"app_version": 2,
"platform": "ios",
"session": {
"timestamp": "2014-08-21T00:00:00.000Z",
"session_id": "TOnNIhCNQ31LlkpEPQ7XnN1D",
"ip": "202.150.213.66",
"location": "1.30324,103.5498"
}
},
{
"app_version": 1,
"platform": "ios",
"session": {
"timestamp": "2014-08-21T00:00:00.000Z",
"session_id": "TOnNIhCNQ31LlkpEPQ7XnN1D",
"ip": "202.150.213.66",
"location": "1.30324,103.5498"
}
}
]
I need to query all the records that fall between two given dates for a given app_version, and I want the total for each platform.
So I wrote a map-reduce function like this:
"total": {
"map": "function(doc) {
var date = doc.session.timestamp.split('T')[0];
emit([date, doc.app_version, doc.platform], 1);
}",
"reduce": "_count"
}
This gives me the output properly by grouping the records into dates.
["2014-08-20", 2, "android"] 2
["2014-08-20", 2, "ios"] 1
["2014-08-21", 2, "android"] 1
["2014-08-21", 2, "ios"] 1
But the problem comes when I try to query them using start_key and end_key (to query by the date range).
I'm sending the GET request as follows:
http://localhost/dummy_db_new/_design/views/_view/total?
start_key=["2014-08-20",2,WHAT_TO_PUT_HERE]
&end_key=["2014-08-20",2,WHAT_TO_PUT_HERE]
&group=true
I need to know what to put in the above places so that the query matches any platform (a string).
I was able to find an answer.
The answer was to bracket the platform position with values that sort below and above every string, which effectively acts as a wildcard for the platform:
http://localhost/dummy_db_new/_design/views/_view/total?
start_key=["2014-08-20",2,0]
&end_key=["2014-08-20",2,{}]
&group=true
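{} is an empty JSON object. In CouchDB's view collation, numbers sort before strings and objects sort after them, so the range from 0 to {} covers every string value in the platform position.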
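The reason 0 and {} work as bounds is the order in which CouchDB collates JSON types in view keys: null, false, true, numbers, strings, arrays, then objects. Below is a minimal sketch in Python of that type ordering and of building the query string from the answer; type_rank and platform_range are hypothetical helpers written for illustration, and the sketch ranks types only, without reproducing how CouchDB compares two values of the same type.

```python
import json
from urllib.parse import urlencode


def type_rank(value):
    """Rank of a JSON value's type in CouchDB view collation (lower sorts first)."""
    if value is None:
        return 0
    if value is False:
        return 1
    if value is True:
        return 2
    if isinstance(value, (int, float)):
        return 3
    if isinstance(value, str):
        return 4
    if isinstance(value, list):
        return 5
    if isinstance(value, dict):
        return 6
    raise TypeError("not a JSON value")


def platform_range(date, app_version):
    """Start and end keys that bracket every string in the platform position."""
    return [date, app_version, 0], [date, app_version, {}]


start_key, end_key = platform_range("2014-08-20", 2)
query = urlencode({
    "start_key": json.dumps(start_key),
    "end_key": json.dumps(end_key),
    "group": "true",
})

# Every platform string ranks strictly between the two sentinel values.
for platform in ("android", "ios"):
    assert type_rank(0) < type_rank(platform) < type_rank({})
print(query)
```

Using null instead of 0 as the lower bound would work for the same reason, since null ranks below every other type.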