Writing a simple group by with map-reduce (Couchbase) - mapreduce

I'm new to the whole map-reduce concept, and i'm trying to perform a simple map-reduce function.
I'm currently working with Couchbase server as my NoSQL db.
I want to get a list of all my types:
key: 1, value: null
key: 2, value: null
key: 3, value: null
Here are my documents:
{
"type": "1",
"value": "1"
}
{
"type": "2",
"value": "2"
}
{
"type": "3",
"value": "3"
}
{
"type": "1",
"value": "4"
}
What I've been trying to do is:
Write a map function:
function (doc, meta) {
emit(doc.type, 0);
}
Using built-in reduce function:
_count
But i'm not getting the expected result.
How can I get all types ?
UPDATE
Please notice that the types are different documents, and I know that reduce works on a document and doesn't executes outside of it.

By default it will reduce all key groups. The feature you want is called group_level:
This is equivalent of reduce=true
~ $ curl 'http://localhost:8092/so/_design/dev_test/_view/test?group_level=0'
{"rows":[
{"key":null,"value":4}
]
}
But here is how you can get reduction by the first level of the key
~ $ curl 'http://localhost:8092/so/_design/dev_test/_view/test?group_level=1'
{"rows":[
{"key":"1","value":2},
{"key":"2","value":1},
{"key":"3","value":1}
]
}
There is also blog post about this: http://blog.couchbase.com/understanding-grouplevel-view-queries-compound-keys
There is appropriate option in couchbase admin console:

Related

How to add map to map array in AWS DynamoDB only when id is not existed?

Here is my DynamoDB structure.
{"books": [
{
"name": "Hello World 1",
"id": "1234"
},
{
"name": "Hello World 2",
"id": "5678"
}
]}
I want to set ConditionExpression to check whether id existed before adding new items to books array. Here is my ConditionExpression. I am using API gateway to access DynamoDB.
"ConditionExpression": "NOT contains(#lu.books.id,:id)",
"ExpressionAttributeValues": {":id": {
"S": "$input.path('$.id')"
}
}
Result when I test the API: no matter id existed or not, success to add items to array.
Any suggestion on how to do it? Thanks!
Unfortunately, you can't. However, there is a workaround.
Store the books in separate rows. For example
PK SK
BOOK_LU#<ID> BOOK_NAME#<book name>#BOOK_ID#<BOOK_ID>
Now you can use the 'if_not_exists' conditional expression
"ConditionExpression": "if_not_exists(id, :id)'",
"ExpressionAttributeValues": {":id": {
"S": "$input.path('$.id')"
}
}
The con is if you were previously fetching the list as part of another object you will have to change that.
The pro is that now you can easily work with the books + you won't hit the max row size limits if the books became too many.

Cannot find the EntityType - SessionEntityType name is composed of Session name and EntityType.display_name

Even though I have provided the correct information in the SessionEntityTypes, the I am getting the following errors. Tried from both REST & Python options, please let me know if there is anything which I am missing in the integrations.
Request
HTTP Method: POST
{
"name": "projects/{projectId}/locations/asia-northeast1/agent/environments/draft/users/-/sessions/c973fe-e44-9b5-34e-b404439b7/entityTypes/speciality_types",
"entities": [
{
"value": "APPLE_KEY",
"synonyms": [
"apple",
"green apple",
"crabapple"
]
},
{
"value": "ORANGE_KEY",
"synonyms": [
"orange"
]
}
],
"entityOverrideMode": "ENTITY_OVERRIDE_MODE_SUPPLEMENT"
}
Response
{
"error": {
"code": 400,
"message": "com.google.apps.framework.request.BadRequestException: Cannot find the EntityType of SessionEntityType 'projects/{projectId}/locations/asia-northeast1/agent/environments/draft/users/-/sessions/c973fe-e44-9b5-34e-b404439b7/entityTypes/speciality_types'. Please note that the SessionEntityType name is composed of Session name and EntityType.display_name.",
"status": "INVALID_ARGUMENT"
}
}
Google Try this API
I am going to paraphrase the issue here in order to ensure I’m not missing any details: you are attempting to create a sessionEntity using the “Try this API” tool, which is the Create (POST) version 2.
The issue is that the “name” you are passing in the request body does not have a valid format for API v2.
The format you are using for the name is:
projects/<ProjectID>/locations/<LocationID>/agent/environments/<EnvironmentID>/users/<UserID>/sessions/<SessionID>/entityTypes/<EntityTypeDisplayName>
Below I’ve listed the two valid name formats for v2 and as you can see the locations/<Location ID> is not needed:
projects/<Project ID>/agent/sessions/<Session ID>/entityTypes/<Entity Type Display Name>
and
projects/<Project ID>/agent/environments/<Environment ID>/users/<User ID>/sessions/<Session ID>/entityTypes/<Entity Type Display Name>
The below request body works as intended, I tested it in the same “Try this API” tool:
{
"name":"projects/{projectId}/agent/environments/draft/users/-/sessions/c973fe-e44-9b5-34e-b404439b7/entityTypes/speciality_types",
"entities":[
{
"value":"APPLE_KEY",
"synonyms":[
"apple",
"green apple",
"crabapple"
]
},
{
"value":"ORANGE_KEY",
"synonyms":[
"orange"
]
}
],
"entityOverrideMode":"ENTITY_OVERRIDE_MODE_SUPPLEMENT"
}

Is there a way to interpolate OutputPath's JsonPath using state's input in AWS step function?

Basically, i have the following input:
{
"name": "abc",
"choice": "choice1"
}
My dynamoDB table has the following structure:
Partition key - "name"
Complex json with choices:
{
"choices":
{
"choice1": ......,
"choice2": ......
}
}
I want to directly read from dynamodb, and get a subitem under the relevant choice:
{
"StartAt": "Read Next Message from DynamoDB",
"States": {
"Read Next Message from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "my_table",
"Key": {
"customerName": {"S.$": "$.name"}
}
},
"OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
"Next": "World"
},
"World": {
"Type": "Pass",
"End": true
}
}
}
basically i want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S", and take one of the output's keys from the input. is this possible?
I think what you're looking for is JsonPath interpolation, but that is not supported as per this thread on AWS forums.
As far as I know Step Functions allow only path reference through $, . and [] operators (Reference Path).
I don't know how much control you have on the DynamoDB table's data but I think your problem can be solved easily if your choice types are modeled in following way
{
"choices": [{
"choiceType": "choice1",
........
},
{
"choiceType": "choice2",
........
}]
}
Now you can use the map state to iterate over the choices array. Note that don't forget to pass the expected choiceType to each iteration.
First state of the map iterator can be a choice state which compares choiceType and moves to appropriate next state. So, basically your rest of the workflow is modeled as iterator of the map state in step 1.
Now, if you don't have the control over DynamoDB table, then you can process the query result in an AWS Lambda.

CouchDB Map syntax

I have documents in CouchDB (v. 2.1.1) as follows:
{
"xyz": "a",
"abc": "def"
},
{
"xyz": "a",
"ghi": "jkl"
},
{
"xyz": "a",
"mno": "pqr"
},
{
"xyz": "a",
"stu": "vwx"
},
{
"xyz": "a",
"bcd": 1000
}
If I run a simple map function, for example:
function (doc) {
if (doc.xyz ){
emit(doc.xyz, doc.abc);}}
I get:
{
"id": "4c3406a1d92942b4fb10d1314e0061a9",
"key": "a",
"value": "def"
},
{
"id": "4c3406a1d92942b4fb10d1314e006ccf",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00787f",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00871e",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00906a",
"key": "a",
"value": null
}
I want to try and eliminate the 'null' outputs.
I am looking at having a CouchDB database with many small documents containing small snippets of information rather than having larger documents containing much more information per document.
My question is, is my document design a good one and if so how do I get just what I am looking for rather than rows of 'nulls'. If my storage design is not ideal, what kind of design should I be looking at to simplify the output given my plan to have many small 'docs'.
EDIT:
Having looked at possible answers, I have decided that having numerous small documents as I described in my question is not giving me the kind of benefit I imangined they would.
I was unable to get a satisfactory solution to the map function to get readable answers.
However, I investigated the 'Mango' query system available in recent updates of CouchDB and I was able using these queries to get acceptable output from a database like my supplied one.
This is what I did:
curl -X POST http://admin:123#127.0.0.1:5984/ptn/_find -d '{"selector": {"$or": [{"abc": {"$gt": null}},{"ghi": {"$gt": null}}]},"fields": ["abc","ghi"]}' -H "Content-Type:application/json"
Un-minified:
{
"selector": {
"$or": [
{
"abc": {
"$gt": null
}
},
{
"ghi": {
"$gt": null
}
}
]
},
"fields": [
"abc",
"ghi"
]
}
The output:
{"docs":[
{"abc":"def"},
{"ghi":"jkl"}
]
.....
A concise answer.
Sorting can be done but sorted fields must be indexed. Indexing is in any case advised for larger data sets.
Reference:
http://docs.couchdb.org/en/2.1.1/api/database/find.html
As my question required a map function, this perhaps cannot be regarded as a valid answer but for me it is an answer. I have tried the 'Mango' query system a little on other databases and it seems to be more useful/powerful than I thought is was although it offers no means of totaling etc.

Fetching With Distinct/Unique Values

I have a Cloudant database with objects that use the following format:
{
"_id": "0ea1ac7d5ef28860abc7030444515c4c",
"_rev": "1-362058dda0b8680a818b38e9c68c5389",
"text": "text-data",
"time-data": "1452988105",
"time-text": "3:48 PM - 16 Jan 2016",
"link": "http://url/to/website"
}
I want to fetch objects where the text attribute is distinct. There will be objects with duplicate text and I want Cloudant to handle removing them from a query.
How do I go about creating a MapReduce view that will do this for me? I'm completely new to MapReduce and I'm having difficulty understanding the relationship between the map and reduce functions. I tried tinkering with the built-in COUNT function and writing my own view, but they've failed catastrophically, haha.
Anyways, would it be easier to just delete the duplicates? If so, how do I do that?
While I'm trying to study this and find ELI5s, would anyone help me out? Thanks in advance! I appreciate it.
I'm not sure a MapReduce view is what you are looking for. A MapReduce view will essentially allow you to get the text and the number of docs with that same text, but you really won't be able to get the rest of the fields in the doc (because MapReduce has no idea which doc to return when multiple docs match the text). Here is a sample MapReduce view:
{
"_id": "_design/textObjects",
"views": {
"by_text": {
"map": "function (doc) { if (doc.text) { emit(doc.text, 1); }}",
"reduce": "_count"
}
},
"language": "javascript"
}
What this is doing:
The Map part of the Map Reduce takes each doc and maps it into a doc that looks like this:
{"key":"text-data", "value":1}
So, if you had 7 docs, 2 where text="text-data" and 5 where text="other-text-data" the data would look like this:
{"key":"text-data", "value":1}
{"key":"text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
The reduce part of the MapReduce ("reduce": "_count") groups the docs above by the key and returns the count:
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
You can query this view on your Cloudant instance:
https://<yourcloudantinstance>/<databasename>
/_design/textObjects
/_view/by_text?group=true
This will result in something similar to the following:
{"rows":[
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
]}
If this is not what you are looking for, but rather you are just looking to keep the latest info for a specific text value then you can simply find an existing document that matches that text and update it with new values:
Add an index on text:
{
"index": {
"fields": [
"text"
]
},
"type": "json"
}
Whenever you add a new document find the document with that same exact text:
{
"selector": {
"text": "text-value"
},
"fields": [
"_id",
"text"
]
}
If it exists update it. If not then insert a new document.
Finally, if you want to keep multiple docs with the same text value, but just want to be able to query the latest you could do something like this:
Add a property called latest or similar to your docs.
Add an index on text and latest:
{
"index": {
"fields": [
"text",
"latest"
]
},
"type": "json"
}
Whenever you add a new document find the document with that same exact text where latest == true:
{
"selector": {
"text": "text-value",
"latest" : true
},
"fields": [
"_id",
"text",
"latest"
]
}
Set latest = false on the existing document (if one exists)
Insert the new document with latest = true
This query will find the latest doc for all text values:
{
"selector": {
"text": {"$gt":null}
"latest" : true
},
"fields": [
"_id",
"text",
"latest"
]
}