CouchDB query to get the doc with MAX timestamp - mapreduce

My CouchDB documents have the format below. Because prices change over time, there can be multiple documents with the same product_id & store_id:
{
  "_id": "6b645d3b173b4776db38eb9fe6014a4c",
  "_rev": "1-86a1d9f0af09beaa38b6fbc3095f06a8",
  "product_id": "6b645d3b173b4776db38eb9fe60148ab",
  "store_id": "0364e82c13b66325ee86f99f53049d39",
  "price": "12000",
  "currency": "AUD_$",
  "time": 1579000390326
}
I need to get the latest document (by time, the timestamp) for a given product_id & store_id.
With my current solution I have to run two queries, as below.
First, a view that returns the latest timestamp for the given product_id & store_id:
"max_time_by_product_store_id": {
"reduce": "function(keys, values) {var ids = []
values.forEach(function(time) {
if (!isNaN(time)){
ids.push(time);
}
});
return Math.max.apply(Math, ids)
}",
"map": "function (doc) {emit([doc.store_id, doc.product_id], doc.time);}"
}
Then, based on that timestamp, I query again to get the document using three parameters, store_id, product_id & time, as below:
"store_product_time": {
"map": "function (doc) {
emit([doc.store_id, doc.product_id, doc.time]);
}"
}
This works, but it requires two DB queries to get the document, and I am looking for a solution that fetches the document in a single query.
CouchDB selectors also have no way to get the document with the MAX value.

With CouchDB's /db/_find you can sort the result in descending order and limit it to one document, as follows:
{
  "selector": {
    "_id": {
      "$gt": null
    }
  },
  "sort": [
    {
      "time": "desc"
    }
  ],
  "limit": 1
}
CURL
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{"selector":{"_id":{"$gt":null}},"sort":[{"time": "desc"}],"limit": 1}'
Please note that an index must previously be created for the sort field time (see /db/_index).
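Such an index can be created with a single request; a minimal sketch (the index name is an arbitrary choice):

curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_index -d '{"index":{"fields":["time"]},"name":"time-index","type":"json"}'

To answer the original question (the latest document for a given product_id & store_id), the selector can be extended with those fields; a sketch, assuming an index on ["product_id", "store_id", "time"] has been created the same way (Mango requires every sort field to be covered by one index, all with the same direction):

{
  "selector": {
    "product_id": "6b645d3b173b4776db38eb9fe60148ab",
    "store_id": "0364e82c13b66325ee86f99f53049d39"
  },
  "sort": [
    { "product_id": "desc" },
    { "store_id": "desc" },
    { "time": "desc" }
  ],
  "limit": 1
}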

Related

How do I write a query for a json in logs insights?

I have a simple message in the form of JSON, like below, in one of my log groups. The query that I use is {$.level = "INFO"} but this doesn't bring up any result. What could be the problem? Can somebody help please?
{
  "level": "INFO",
  "location": "lambda_handler:31",
  "message": {
    "msg": "abc",
    "event": {
      "Records": [
        {
          ...
        }
      ]
    }
  }
}
Now CloudWatch Logs Insights allows you to filter based on JSON fields.
The syntax is as follows:
Filter based on field 'level'
filter level = 'INFO'
| display level, #message
Filter based on nested fields
filter message.msg != '123'
| display message.msg, #message
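Both conditions can be combined into a single query, together with sorting and a limit; a minimal sketch using the field names from the sample event above:

filter level = 'INFO' and message.msg = 'abc'
| fields @timestamp, level, message.msg
| sort @timestamp desc
| limit 20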
Documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html#CWL_AnalyzeLogData-discoverable-JSON-logs

How to specify attributes to return from DynamoDB through AppSync

I have an AppSync pipeline resolver. The first function queries an ElasticSearch database for the DynamoDB keys. The second function queries DynamoDB using the provided keys. This was all working well until I ran into the 1 MB limit of AppSync. Since most of the data is in a few attributes/columns I don't need, I want to limit the results to just the attributes I need.
I tried adding AttributesToGet and ProjectionExpression (from here) but both gave errors like:
{
  "data": {
    "getItems": null
  },
  "errors": [
    {
      "path": [
        "getItems"
      ],
      "data": null,
      "errorType": "MappingTemplate",
      "errorInfo": null,
      "locations": [
        {
          "line": 2,
          "column": 3,
          "sourceName": null
        }
      ],
      "message": "Unsupported element '$[tables][dev-table-name][projectionExpression]'."
    }
  ]
}
My DynamoDB function request mapping template looks like this (it returns results as long as the data is less than 1 MB):
#set($ids = [])
#foreach($pResult in ${ctx.prev.result})
  #set($map = {})
  $util.qr($map.put("id", $util.dynamodb.toString($pResult.id)))
  $util.qr($map.put("ouId", $util.dynamodb.toString($pResult.ouId)))
  $util.qr($ids.add($map))
#end
{
  "version" : "2018-05-29",
  "operation" : "BatchGetItem",
  "tables" : {
    "dev-table-name": {
      "keys": $util.toJson($ids),
      "consistentRead": false
    }
  }
}
I contacted the AWS people who confirmed that ProjectionExpression is not supported currently and that it will be a while before they will get to it.
Instead, I created a lambda to pull the data from DynamoDB.
To limit the results from DynamoDB, I used $ctx.info.selectionSetList in AppSync to get the list of requested columns, then used that list to specify the data to pull from DynamoDB. I needed to get multiple results while maintaining order, so I used BatchGetItem, then merged the results with the original list of IDs using LINQ (which put the DynamoDB results back in the correct order, since BatchGetItem in C# does not preserve sort order the way the AppSync version does).
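A sketch of what the Lambda request mapping template can forward (the payload shape here is an assumption, not the exact template used):

{
  "version": "2017-02-28",
  "operation": "Invoke",
  "payload": {
    "keys": $util.toJson($ctx.prev.result),
    "fields": $util.toJson($ctx.info.selectionSetList)
  }
}

The Lambda can then read payload.fields and build its own ProjectionExpression for the BatchGetItem call.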
Because I was using C# with a number of libraries, the cold start time was a little long, so I used Lambda Layers pre-JITed for Linux, which brought the cold start down from ~1.8 seconds to ~1 second (when using 1024 MB of RAM for the Lambda).
AppSync doesn't support projection, but you can explicitly define which fields to return in the response template instead of returning the entire result set.
{
  "id": "$ctx.result.get('id')",
  "name": "$ctx.result.get('name')",
  ...
}
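For a list of results, the same idea can be combined with $ctx.info.selectionSetList in a loop; a minimal sketch, assuming the data source returned its rows under $ctx.result.items and the selection set is flat (no nested fields):

#set($items = [])
#foreach($item in $ctx.result.items)
  #set($out = {})
  ## copy only the fields the client actually requested
  #foreach($field in $ctx.info.selectionSetList)
    $util.qr($out.put($field, $item.get($field)))
  #end
  $util.qr($items.add($out))
#end
$util.toJson($items)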

Range query for long type in aws elasticsearch

I am trying to query an Elasticsearch index in AWS to get all entries with a mass attribute greater than 1000; the datatype of the attribute is long.
I found the range query and have tried it (see the example below), but it returns nothing, even though other queries return documents with mass greater than 1000, so they're definitely in the index.
This is the Range query I'm trying:
{
  "method": "POST",
  "index": "users",
  "type": "user",
  "path": "_search?filter_path=filter",
  "body": {
    "size": 20,
    "from": 0,
    "query": {
      "bool": {
        "must": [
          {
            "range": {
              "mass": {
                "gte": 1000
              }
            }
          }
        ]
      }
    }
  }
}
I'm not getting any error messages, just zero hits.
The problem that's causing you to get zero hits is the filter_path parameter you specify in
"path": "_search?filter_path=filter"
As stated in the official documentation, the filter_path parameter is one of the common options of the REST APIs, which means you can always add it.
With response filtering you can reduce the response returned by Elasticsearch. Since you defined
_search?filter_path=filter
you probably get zero hits because there is no filter element in the response that could be returned.
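Dropping the parameter, or pointing it at the part of the response you actually need, fixes this; for example:
"path": "_search"
or, to trim the response down to the documents themselves:
"path": "_search?filter_path=hits.hits._source"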

AWS IoT rule - timestamp for Elasticsearch

I have a bunch of IoT devices (ESP32) which publish a JSON object to things/THING_NAME/log for general debugging (to be extended into other topics with values in the future).
Here is the IoT rule which kind of works.
{
  "sql": "SELECT *, parse_time(\"yyyy-mm-dd'T'hh:mm:ss\", timestamp()) AS timestamp, topic(2) AS deviceId FROM 'things/+/stdout'",
  "ruleDisabled": false,
  "awsIotSqlVersion": "2016-03-23",
  "actions": [
    {
      "elasticsearch": {
        "roleArn": "arn:aws:iam::xxx:role/iot-es-action-role",
        "endpoint": "https://xxxx.eu-west-1.es.amazonaws.com",
        "index": "devices",
        "type": "device",
        "id": "${newuuid()}"
      }
    }
  ]
}
I'm not sure how to set #timestamp inside Elasticsearch to allow time-based searches.
Maybe I'm going about this all wrong, but it almost works!
Elasticsearch can recognize date strings matching dynamic_date_formats.
The following format is automatically mapped as a date field in AWS Elasticsearch 7.1:
SELECT *, parse_time("yyyy/MM/dd HH:mm:ss", timestamp()) AS timestamp FROM 'events/job/#'
This approach does not require creating a preconfigured index, which is important for dynamically created indexes, e.g. with daily rotation for logs:
devices-${parse_time("yyyy.MM.dd", timestamp(), "UTC")}
According to the elastic.co documentation, the default value for dynamic_date_formats is:
[ "strict_date_optional_time","yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"]
#timestamp is just a convention as the # prefix is the default prefix for Logstash generated fields. Because you are not using Logstash as a middleman between IoT and Elasticsearch, you don't have a default mapping for #timestamp.
But basically, it is just a name, so call it what you want, the only thing that matters is that you declare it as a timestamp field in the mappings section of the Elasticsearch index.
If for some reason you still need it to be called #timestamp, you can either SELECT it with that prefix right away in the AS section (might be an issue with IoT's sql restrictions, not sure):
SELECT *, parse_time(\"yyyy-mm-dd'T'hh:mm:ss\", timestamp()) AS #timestamp, topic(2) AS deviceId FROM 'things/+/stdout'
Or you can use the copy_to functionality when declaring your mapping:
PUT devices
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "copy_to": "#timestamp"
      },
      "#timestamp": {
        "type": "date"
      }
    }
  }
}
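To verify how the fields were actually mapped, you can inspect the index mapping (standard Elasticsearch API; the endpoint is the one from the rule above):

curl -X GET 'https://xxxx.eu-west-1.es.amazonaws.com/devices/_mapping?pretty'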

Fetching With Distinct/Unique Values

I have a Cloudant database with objects that use the following format:
{
  "_id": "0ea1ac7d5ef28860abc7030444515c4c",
  "_rev": "1-362058dda0b8680a818b38e9c68c5389",
  "text": "text-data",
  "time-data": "1452988105",
  "time-text": "3:48 PM - 16 Jan 2016",
  "link": "http://url/to/website"
}
I want to fetch objects where the text attribute is distinct. There will be objects with duplicate text and I want Cloudant to handle removing them from a query.
How do I go about creating a MapReduce view that will do this for me? I'm completely new to MapReduce and I'm having difficulty understanding the relationship between the map and reduce functions. I tried tinkering with the built-in COUNT function and writing my own view, but they've failed catastrophically, haha.
Anyways, would it be easier to just delete the duplicates? If so, how do I do that?
While I'm trying to study this and find ELI5s, would anyone help me out? Thanks in advance! I appreciate it.
I'm not sure a MapReduce view is what you are looking for. A MapReduce view will essentially allow you to get the text and the number of docs with that same text, but you really won't be able to get the rest of the fields in the doc (because MapReduce has no idea which doc to return when multiple docs match the text). Here is a sample MapReduce view:
{
  "_id": "_design/textObjects",
  "views": {
    "by_text": {
      "map": "function (doc) { if (doc.text) { emit(doc.text, 1); } }",
      "reduce": "_count"
    }
  },
  "language": "javascript"
}
What this is doing:
The map part of the MapReduce takes each doc and maps it to a row that looks like this:
{"key":"text-data", "value":1}
So, if you had 7 docs, 2 where text="text-data" and 5 where text="other-text-data" the data would look like this:
{"key":"text-data", "value":1}
{"key":"text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
{"key":"other-text-data", "value":1}
The reduce part of the MapReduce ("reduce": "_count") then groups those rows by key and returns the count:
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
You can query this view on your Cloudant instance:
https://<yourcloudantinstance>/<databasename>/_design/textObjects/_view/by_text?group=true
This will result in something similar to the following:
{"rows":[
{"key":"text-data","value":2},
{"key":"other-text-data","value":5}
]}
If this is not what you are looking for, and instead you just want to keep the latest info for a specific text value, then you can simply find an existing document that matches that text and update it with new values:
Add an index on text:
{
  "index": {
    "fields": [
      "text"
    ]
  },
  "type": "json"
}
Whenever you add a new document, first find the document with that same exact text:
{
  "selector": {
    "text": "text-value"
  },
  "fields": [
    "_id",
    "text"
  ]
}
If it exists, update it; if not, insert a new document.
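A minimal cURL sketch of that find-then-update flow (placeholder URL and values; _rev is requested because CouchDB requires it to update a document):

# look for an existing doc with the same text
curl -X POST 'https://<yourcloudantinstance>/<databasename>/_find' \
  -H 'Content-Type: application/json' \
  -d '{"selector":{"text":"text-value"},"fields":["_id","_rev"]}'

# if a doc came back, overwrite it by supplying its _rev; otherwise create a new doc instead
curl -X PUT 'https://<yourcloudantinstance>/<databasename>/<_id>' \
  -H 'Content-Type: application/json' \
  -d '{"_rev":"<_rev>","text":"text-value","time-data":"...","link":"..."}'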
Finally, if you want to keep multiple docs with the same text value, but just want to be able to query the latest you could do something like this:
Add a property called latest or similar to your docs.
Add an index on text and latest:
{
  "index": {
    "fields": [
      "text",
      "latest"
    ]
  },
  "type": "json"
}
Whenever you add a new document, find the document with that same exact text where latest == true:
{
  "selector": {
    "text": "text-value",
    "latest": true
  },
  "fields": [
    "_id",
    "text",
    "latest"
  ]
}
Set latest = false on the existing document (if one exists)
Insert the new document with latest = true
This query will find the latest doc for all text values:
{
  "selector": {
    "text": { "$gt": null },
    "latest": true
  },
  "fields": [
    "_id",
    "text",
    "latest"
  ]
}