Timeout with CouchDB mapReduce when the database is huge

Details:
Apache CouchDB v3.1.1
About 5 GB of Twitter data has been dumped into partitions.
The map/reduce function I have written:
{
"_id": "_design/Info",
"_rev": "13-c943aaf3b77b970f4e787be600dd240e",
"views": {
"trial-view": {
"map": "function (doc) {\n emit(doc.account_name, 1);\n}",
"reduce": "_count"
}
},
"language": "javascript",
"options": {
"partitioned": true
}
}
When I try the following request in Postman:
http://<server_ip>:5984/mydb/_partition/partition1/_design/Info/_view/trial-view?key="BT"&group=true
I get the following error:
{
"error": "timeout",
"reason": "The request could not be processed in a reasonable amount of time."
}
How can I apply mapReduce to such a large dataset?

I am answering my own question after realizing my mistake. The answer is simple: the request just needed more time, because building the view index over a large database takes a long time. You can check the database's metadata to see the data being indexed.
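For reference, a sketch of two ways to cope while the index is still building; <server_ip> is the placeholder from the question, and _active_tasks may require admin credentials. CouchDB lists running index builds under _active_tasks, and a view can be queried with update=false to return whatever has been indexed so far instead of waiting (and timing out):

# Check index build progress (look for entries with "type": "indexer")
curl http://<server_ip>:5984/_active_tasks

# Query the view without waiting for the indexer to finish
curl 'http://<server_ip>:5984/mydb/_partition/partition1/_design/Info/_view/trial-view?key="BT"&group=true&update=false'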

Related

Adding pages to a multi-column Notion database sometimes works flawlessly and sometimes gives a validation error for the same input

Basically, I'm using Postman to send POST requests to
https://api.notion.com/v1/pages
It works about 70% of the time; the rest of the time it returns the following error for the same input.
{
"object": "error",
"status": 400,
"code": "validation_error",
"message": "body failed validation. Fix one: body.parent.type should be not present, instead was `\"database_id\"`. body.parent.page_id should be defined, instead was `undefined`."
}
Here's how my body starts
{
"parent": {
"type": "database_id",
"database_id": "a94c42320ef04b6a9c1a7e5e73455557"
},
"properties": {
"Title": {
..................
I'm not posting the entire body because the very same body works flawlessly at other times.
Please help me out. Is there a way to check the logs of the requests that reach my page?
First, I found out that "type": "database_id" is not necessary in parent.
I also found out that syntax errors in the payload return a 400 error:
body failed validation. Fix one: body.parent.type should be not present, instead was `\"database_id\"`. body.parent.page_id should be defined, instead was `undefined`.
In my case, I wrongly added a value in the same level as parent, properties. Like this:
{
"parent": {
"database_id": "<database_id>"
},
"properties": {
...
},
"wrong_value": {}
}
Since the errors are not very specific, check whether you made the same mistake as me, and please also double-check that the parent you are posting to is actually a database, not a page.
The issue was with having "type: database_id" inside "parent" in the request data.
{
"parent": {
"type": "database_id",(REMOVE THIS LINE)
"database_id": "a94c42320ef04b6a9c1a7e5e73455557"
},
"properties": {
"Title": {
..................
After removing "type" it worked fine. Notion needs to update their docs.
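For reference, a minimal sketch of a request body matching the fixes above. The database ID is the one from the question; "Title" is assumed to be the name of the database's title property, and the page content is illustrative:

{
  "parent": {
    "database_id": "a94c42320ef04b6a9c1a7e5e73455557"
  },
  "properties": {
    "Title": {
      "title": [
        { "text": { "content": "Example page" } }
      ]
    }
  }
}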

How to specify attributes to return from DynamoDB through AppSync

I have an AppSync pipeline resolver. The first function queries an ElasticSearch database for the DynamoDB keys. The second function queries DynamoDB using the provided keys. This was all working well until I ran into the 1 MB limit of AppSync. Since most of the data is in a few attributes/columns I don't need, I want to limit the results to just the attributes I need.
I tried adding AttributesToGet and ProjectionExpression to the request mapping template, but both gave errors like:
{
"data": {
"getItems": null
},
"errors": [
{
"path": [
"getItems"
],
"data": null,
"errorType": "MappingTemplate",
"errorInfo": null,
"locations": [
{
"line": 2,
"column": 3,
"sourceName": null
}
],
"message": "Unsupported element '$[tables][dev-table-name][projectionExpression]'."
}
]
}
My DynamoDB function request mapping template looks like (returns results as long as data is less than 1 MB):
## Collect the DynamoDB keys returned by the previous pipeline function
#set($ids = [])
#foreach($pResult in ${ctx.prev.result})
#set($map = {})
$util.qr($map.put("id", $util.dynamodb.toString($pResult.id)))
$util.qr($map.put("ouId", $util.dynamodb.toString($pResult.ouId)))
$util.qr($ids.add($map))
#end
{
"version" : "2018-05-29",
"operation" : "BatchGetItem",
"tables" : {
"dev-table-name": {
"keys": $util.toJson($ids),
"consistentRead": false
}
}
}
I contacted AWS support, who confirmed that ProjectionExpression is not currently supported and that it will be a while before they get to it.
Instead, I created a lambda to pull the data from DynamoDB.
To limit the results from DynamoDB, I used $ctx.info.selectionSetList in AppSync to get the list of requested columns, then used that list to specify which attributes to pull from DynamoDB. I needed multiple results in a stable order, so I used BatchGetItem and then merged the results with the original list of IDs using LINQ (which put the DynamoDB results back in the correct order, since BatchGetItem in C# does not preserve sort order the way the AppSync version does).
Because I was using C# with a number of libraries, the cold start time was a little long, so I used Lambda layers pre-JITted for Linux, which got the cold start down from ~1.8 seconds to ~1 second (with 1024 MB of RAM for the Lambda).
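A minimal sketch of that approach in Node.js rather than C#. The table and key names are taken from the question's mapping template; everything else is illustrative, and it assumes a pipeline function with a direct Lambda resolver, where the event carries both info.selectionSetList and prev.result:

// Sketch only: no unmarshalling, flat (non-nested) selections assumed, and no
// handling of DynamoDB reserved words (those would need ExpressionAttributeNames).
const { DynamoDBClient, BatchGetItemCommand } = require("@aws-sdk/client-dynamodb");
const client = new DynamoDBClient({});

exports.handler = async (event) => {
  const fields = event.info.selectionSetList; // e.g. ["id", "ouId", "name"]
  const keys = event.prev.result.map((r) => ({
    id: { S: r.id },
    ouId: { S: r.ouId },
  }));

  // Always project the key attributes so results can be re-matched to keys.
  const projection = [...new Set(["id", "ouId", ...fields])].join(", ");

  const resp = await client.send(new BatchGetItemCommand({
    RequestItems: {
      "dev-table-name": {
        Keys: keys,
        // Fetch only the attributes the GraphQL query asked for.
        ProjectionExpression: projection,
      },
    },
  }));

  // BatchGetItem does not preserve order, so re-align results with the key list.
  const items = resp.Responses["dev-table-name"];
  return keys.map((k) =>
    items.find((it) => it.id.S === k.id.S && it.ouId.S === k.ouId.S)
  );
};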
AppSync doesn't support projection but you can explicitly define what fields to return in the response template instead of returning the entire result set.
{
"id": "$ctx.result.get('id')",
"name": "$ctx.result.get('name')",
...
}

Range query for long type in aws elasticsearch

I am trying to query an Elasticsearch index in AWS to get all entries with a mass attribute greater than 1000; the datatype of the attribute is long.
I found the range query and tried it (see the example below), but it returns nothing. Other queries do return documents with mass greater than 1000, so they are definitely in the index.
This is the Range query I'm trying:
{
"method": "POST",
"index": "users",
"type": "user",
"path": "_search?filter_path=filter",
"body": {
"size": 20,
"from": 0,
"query": {
"bool": {
"must":[{
"range": {
"mass": {
"gte": 1000
}
}
}]
}
}
}
}
I'm not getting any error messages, just zero hits.
The problem causing your zero hits is the filter_path parameter you specify in
"path": "_search?filter_path=filter"
As stated in the official documentation, the filter_path parameter is one of the common options of the REST APIs, which means you can add it to any request.
With response filtering you can reduce the response returned by Elasticsearch. Since you defined
_search?filter_path=filter
you probably get zero hits because there is no filter element in the response that can be returned.
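A sketch of the corrected request in the same format as the question: either drop filter_path entirely, or point it at the parts of the response you actually want (hits.hits._source and hits.total are standard response paths):

{
  "method": "POST",
  "index": "users",
  "type": "user",
  "path": "_search?filter_path=hits.hits._source,hits.total",
  "body": {
    "size": 20,
    "from": 0,
    "query": {
      "bool": {
        "must": [{
          "range": {
            "mass": {
              "gte": 1000
            }
          }
        }]
      }
    }
  }
}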

loopback API GET no usable index for cloudant database

Using the Loopback API explorer's GET "try this" button, with and without a filter, I am getting a no_usable_index error:
{
"error": {
"statusCode": 400,
"name": "Error",
"message": "No index exists for this sort, try indexing by the sort fields.",
"error": "no_usable_index",
"reason": "No index exists for this sort, try indexing by the sort fields.",
"scope": "couch",
"request": {
"method": "post",
"headers": {
"content-type": "application/json",
"accept": "application/json"
},
"uri": "https://XXXXXX:XXXXXX#long-instance-id-number-bluemix.cloudant.com/aac_001_dev_db/_find",
"body": "{\"selector\":{\"loopback__model__name\":\"Center\"},\"use_index\":[\"lb-index-ddoc-Center\",\"lb-index-Center\"],\"sort\":[{\"id:string\":\"asc\"}]}"
},
"headers": {
"x-frame-options": "DENY",
"x-couch-request-id": "658ac2fdf8",
"date": "Tue, 06 Mar 2018 17:45:29 GMT",
"content-type": "application/json",
"cache-control": "must-revalidate",
"strict-transport-security": "max-age=31536000",
"x-content-type-options": "nosniff",
"x-cloudant-request-class": "query",
"x-cloudant-backend": "bm-cc-us-south-02",
"via": "1.1 lb1.bm-cc-us-south-02 (Glum/1.50.1)",
"statusCode": 400,
"uri": "https://XXXXXX:XXXXXX#long-instance-id-number-bluemix.cloudant.com/aac_001_dev_db/_find"
},
"errid": "non_200",
"description": "couch returned 400",
"stack": "Error: No index exists for this sort, try indexing by the sort fields.\n at Request._callback (/home/ubuntu/workspace/aac-001-api/node_modules/loopback-connector-cloudant/node_modules/cloudant/node_modules/cloudant-nano/lib/nano.js:248:15)\n at Request.self.callback (/home/ubuntu/workspace/aac-001-api/node_modules/loopback-connector-cloudant/node_modules/cloudant/node_modules/request/request.js:186:22)\n at emitTwo (events.js:106:13)\n at Request.emit (events.js:191:7)\n at Request.<anonymous> (/home/ubuntu/workspace/aac-001-api/node_modules/loopback-connector-cloudant/node_modules/cloudant/node_modules/request/request.js:1163:10)\n at emitOne (events.js:96:13)\n at Request.emit (events.js:188:7)\n at IncomingMessage.<anonymous> (/home/ubuntu/workspace/aac-001-api/node_modules/loopback-connector-cloudant/node_modules/cloudant/node_modules/request/request.js:1085:12)\n at IncomingMessage.g (events.js:292:16)\n at emitNone (events.js:91:20)\n at IncomingMessage.emit (events.js:185:7)\n at endReadableNT (_stream_readable.js:974:12)\n at _combinedTickCallback (internal/process/next_tick.js:80:11)\n at process._tickCallback (internal/process/next_tick.js:104:9)"
}
}
So I searched the lb3 documentation and found the cloudant-connector page (loopback.io/doc/en/lb3/Cloudant-connector.html#index), which describes the index feature as "To be updated". Hmm, so it is sitting in the feature backlog.
Then I found a link to the Model definition JSON file documentation, which describes manually adding an indexes property. I tried:
// common -> models -> center.json
"indexes": {
"name_index": {"name": 1}
}, ...
I also tried the suggestion: "You can specify indexes at the model property level too, for example:"
"properties": {
"name": {
"type": "string",
"required": true,
"index": true // added this (but without the comment :)
}, ...
Alas, none of these attempts worked. I am still getting the error.
I have watched several off-topic YouTube videos of Loopback with MongoDB, but it is curious that there is not much available showing exactly how to get the Loopback connector to work with Cloudant.
At this point I just want:
to GET the two test documents that have been POSTed using the API explorer.
to know if these cloudant-connector GET methods work at all with lb version 3.
I added a MongoDB datasource from the command line, edited the model-config.json datasource parameter to point to the MongoDB database, then performed a similar test: added two documents with the POST button, then clicked "try this" with the GET button. It returns the two documents that were posted, just like in the YouTube video tutorials.
Update with more clues
In the Cloudant dashboard query page there is a selector. If I make an errant change to the selector, Cloudant returns the error "no_usable_index". This means that the error message is not from Loopback, it is from Cloudant, but passed through Loopback.
Although the Loopback explorer has the same visual look and feel for both the mongo-connector and the cloudant-connector, the URL (REST) interfaces to the databases are clearly different. I assumed that the POST, GET sequence of button clicks in the explorer that works with MongoDB would work with Cloudant. It does not. Cloudant requires that a design document be available in the database to define valid queries.
I know for sure the database is accessible from the command line with:
$ curl $CLOUDANT_URL/$CLOUDANT_DATABASE
and with a design doc defined within the database
$ curl $CLOUDANT_URL/$CLOUDANT_DATABASE/_design/$DDOC_MEDICAL_CENTERS/_view/$VIEW_MEDICAL_CENTERS_TRUE
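One option worth trying (not from the thread, so treat it as a hedged suggestion): create a Cloudant Query JSON index covering the fields the connector sorts by, using the same curl style as above. The field names come from the failing request body; whether this satisfies the connector's id:string sort depends on the connector version, and the accepted fix below was upgrading the connector:

$ curl -X POST $CLOUDANT_URL/$CLOUDANT_DATABASE/_index \
    -H 'Content-Type: application/json' \
    -d '{"index": {"fields": ["loopback__model__name", "id"]}, "type": "json"}'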
After much troubleshooting of the old version, I found that the error went away when I changed
FROM loopback-connector-cloudant#1.2.5 (3 Aug 2017)
TO loopback-connector-cloudant#2.0.5 (23 Mar 2018)
by updating package.json and running npm update.
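A sketch of the corresponding package.json change (the caret range is illustrative; any spec that resolves to 2.0.5 or later should do), followed by npm update:

"dependencies": {
  "loopback-connector-cloudant": "^2.0.5"
}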

Utterances to test lambda function not working (but lambda function itself executes)

I have a lambda function that executes successfully for an intent called GetEvent and returns a specific string. I've created one utterance for this intent for testing purposes (a simple one that doesn't require any of the optional slots). But when I use the service simulator to test the lambda function with this utterance for GetEvent, I get a lambda response that says "The response is invalid". Here is what the interaction model looks like:
#Intent Schema
{
"intents": [
{
"intent": "GetVessel",
"slots": [
{
"name": "boat",
"type": "LIST_OF_VESSELS"
},
{
"name": "location",
"type": "LIST_OF_LOCATIONS"
},
{
"name": "date",
"type": "AMAZON.DATE"
},
{
"name": "event",
"type": "LIST_OF_EVENTS"
}
]
},
{
"intent": "GetLocation",
"slots": [
{
"name": "event",
"type": "LIST_OF_EVENTS"
},
{
"name": "date",
"type": "AMAZON.DATE"
},
{
"name": "boat",
"type": "LIST_OF_VESSELS"
},
{
"name": "location",
"type": "LIST_OF_LOCATIONS"
}
]
},
{
"intent": "GetEvent",
"slots": [
{
"name": "event",
"type": "LIST_OF_EVENTS"
},
{
"name": "location",
"type": "LIST_OF_LOCATIONS"
}
]
}
]
}
With the appropriate custom slot types defined, and:
#First test Utterances
GetVessel what are the properties of {boat}
GetLocation where did {event} occur
GetEvent get me my query
When I give Alexa the utterance get me my query, the lambda response should output the string, as it did in direct execution. I'm not sure why this isn't the case; this is my first project with the Alexa Skills Kit, so I am pretty new. Is there something I'm not understanding about how the lambda function, the intent schema and the utterances fit together?
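"The response is invalid" is the simulator's generic complaint when the returned JSON does not match the expected response format, so it is worth checking the lambda's output against a minimal valid body. This is a sketch with illustrative speech text, not the fix that emerged below:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Here is your event."
    },
    "shouldEndSession": true
  }
}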
UPDATE: Thanks to some help from AWSSupport, I've narrowed the issue down to the area in the JSON request where new session is flagged as true. For the utterance to work, this must be set to false (it works when I input the JSON request manually, and this is also the case during the lambda execution). Why is this? Does Alexa really care whether it is a new session during invocation? I cross-posted this to the Amazon Developer Forums a couple of days ago as well, but have yet to get a response.
This may or may not have changed -- the last time I used the service simulator (about two weeks ago at the time of writing) it had a pretty severe bug which would lead to requests being mapped to your first / wrong intent, regardless of actual simulated speech input.
So even if you typed in something random like wafaaefgae it simply tries to map that to the first intent you have defined, providing no slots to said intent which may lead to unexpected results.
Your issue could very well be related to this, triggering the same unexpected/buggy behavior because you aren't using any slots in your sample utterance.
Before spending more time debugging this, I'd recommend trying the intent on an actual Echo or via https://echosim.io/ -- interaction via actual speech works as expected, unlike the 'simulator'.