MongoDB db.collection.distinct() on AWS DocumentDB doesn't use index

I'm transitioning from MongoDB 3.2 to the new AWS DocumentDB service. On the existing instance, db.collection.distinct("FIELD_NAME") returns its results very quickly. I restored a database dump into AWS DocumentDB (MongoDB 3.6 compatible), and there the same simple query just hangs.
Here's my .explain() and the indexes on the working instance versus AWS documentdb:
Explain function on working instance:
> db.collection.explain().distinct("FIELD_NAME")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.collection",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [ ]
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"FIELD_NAME" : 1
},
"inputStage" : {
"stage" : "DISTINCT_SCAN",
"keyPattern" : {
"FIELD_NAME" : 1
},
"indexName" : "FIELD_INDEX_NAME",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : {
"FIELD_NAME" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : [ ]
},
Explain on AWS DocumentDB, where the query hangs:
rs0:PRIMARY> db.collection.explain().distinct("FIELD_NAME")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "db.collection",
"winningPlan" : {
"stage" : "AGGREGATE",
"inputStage" : {
"stage" : "HASH_AGGREGATE",
"inputStage" : {
"stage" : "COLLSCAN"
}
}
}
},
}
Index on both of these instances:
{
"v" : 1,
"key" : {
"FIELD_NAME" : 1
},
"name" : "FIELD_INDEX_NAME",
"ns" : "db.collection"
}
Also, this database has a couple million documents but there are only about 20 distinct values for that "FIELD_NAME". Any help would be appreciated.
I tried it with .hint("index_name") and that didn't work. I also tried clearing the plan cache, but I get "Feature not supported: planCacheClear".

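COLLSCAN and IXSCAN do not differ much in this case: both have to read every document or every index entry. The MongoDB plan is fast because DISTINCT_SCAN skips from one distinct key to the next instead of reading every index entry.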
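One workaround sometimes used when an engine has no distinct-scan stage is to emulate it: ask the index for the smallest value, then for the smallest value greater than that, and so on, so the number of index seeks tracks the number of distinct values (about 20 here) rather than the number of documents. The sketch below only illustrates the idea in plain Python, with a sorted list standing in for the index; distinct_by_seeking is a hypothetical helper. Whether the equivalent find/sort/limit(1) queries really use the index on DocumentDB is an assumption that should be checked with explain().

```python
from bisect import bisect_right


def distinct_by_seeking(sorted_keys):
    """Return (distinct values, number of seeks) for an ascending list of keys."""
    values = []
    seeks = 0
    pos = 0
    while pos < len(sorted_keys):
        seeks += 1
        current = sorted_keys[pos]
        values.append(current)
        # Jump straight past every entry equal to `current`,
        # the way an index seek for "first key > current" would.
        pos = bisect_right(sorted_keys, current, lo=pos)
    return values, seeks


# 3,000 index entries but only three distinct values.
keys = sorted(["a", "b", "c"] * 1000)
values, seeks = distinct_by_seeking(keys)
print(values, seeks)
```

With three distinct values among 3,000 keys, the loop performs three seeks instead of touching every entry.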

Related

TRC20 curl for getaccount/getbalance

What is the correct curl format to view the TRC20 balance of an account? I tried the command below, but the output showed no balance.
curl -X POST http://127.0.0.1:8090/wallet/triggersmartcontract -d '{
"contract_address":"TCFLL5dx5ZJdKnWuesXxi1VPwjLVmWZZy9",
"address":"TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU",
"function_selector":"balanceOf(address)",
"owner_address":"TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU",
"visible":true
}'
output:
{
"result" : {
"result" : true
},
"transaction" : {
"raw_data" : {
"ref_block_hash" : "0d9745f14e11d7fa",
"expiration" : 1605942390000,
"ref_block_bytes" : "9c45",
"contract" : [
{
"type" : "TriggerSmartContract",
"parameter" : {
"type_url" : "type.googleapis.com/protocol.TriggerSmartContract",
"value" : {
"contract_address" : "TCFLL5dx5ZJdKnWuesXxi1VPwjLVmWZZy9",
"owner_address" : "TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU",
"data" : "70a08231"
}
}
}
],
"timestamp" : 1605942331560
},
"txID" : "a3bdcb595a94f9805301fb74b33f2b536d3a6bb5050b7eb7b12808bb1e36fcd7",
"visible" : true,
"ret" : [
{}
],
"raw_data_hex" : "0a029c4522080d9745f14e11d7fa40f0d980cdde2e5a6d081f12690a31747970652e676f6f676c65617069732e636f6d2f70726f746f636f6c2e54726967676572536d617274436f6e747261637412340a1541cab799601a50938457902e1a31d3faa26ca1d76012154118fd0626daf3af02389aef3ed87db9c33f638ffa220470a0823170a891fdccde2e"
},
"constant_result" : [
"0000000000000000000000000000000000000000000000000000000000000000"
]
}
I use the default main_net_config with supportConstant = true. Do I have to enable anything else in my config file?
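Note that the "data" field in the output above is just the 4-byte selector 70a08231 with no argument after it, which suggests the account to query never reached balanceOf. As far as I know, triggersmartcontract expects the ABI-encoded arguments in a "parameter" field (a hex string) rather than an "address" field. Below is a minimal Python sketch of building that value, assuming the usual Base58Check TRON address layout (a 0x41 prefix byte plus 20 address bytes). The helper names are made up for illustration.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def tron_address_to_bytes(address: str) -> bytes:
    """Decode a Base58Check TRON address into its 21-byte payload (0x41 + 20 bytes)."""
    number = 0
    for char in address:
        number = number * 58 + ALPHABET.index(char)
    raw = number.to_bytes(25, "big")  # 21-byte payload followed by a 4-byte checksum
    payload, checksum = raw[:21], raw[21:]
    expected = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    if checksum != expected or payload[0] != 0x41:
        raise ValueError("not a valid TRON address: %r" % address)
    return payload


def balance_of_parameter(address: str) -> str:
    """ABI-encode one address argument: 20 address bytes left-padded to 32 bytes, as hex."""
    return tron_address_to_bytes(address)[1:].hex().rjust(64, "0")
```

The string returned by balance_of_parameter("TUT5SVvKmnxKpKdHi2tXMzPfffQNg7e3MU") would then go into the request body as "parameter", alongside contract_address, owner_address and function_selector.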

How to update an attribute in a nested array object in AWS DynamoDB

I want to update the choices attribute directly through AWS API Gateway.
{
"id" : "1",
"general" : {
"questions" : [
{ "choices" : ["1","2","3"] }
]
}
}
Here is my resolver mapping template
#set($inputRoot = $input.path('$'))
{
"TableName" : "models",
"Key" : {
"accountId" : {
"S": "$inputRoot.accountId"
},
"category" : {
"S" : "model"
}
},
"UpdateExpression" : "SET general.questions = :questions",
"ExpressionAttributeValues" : {
":questions" : {
"L" : [
#foreach($elem in $inputRoot.questions)
{
"M" : {
"choices" : {
"L" : [
#foreach($elem1 in $elem.choices)
{"S" : "$elem1"}
#if(foreach.hasNext),#end
#end
]
}
}
}
#if($foreach.hasNext),#end
#end
]
}
}
}
But I am getting an Internal server error (500) on execution.
Gateway response body: {"message": "Internal server error"}
Does DynamoDB support this update expression, or is there an error in the mapping template? If the template is wrong, what should the mapping template be for the object I am trying to update?
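DynamoDB itself accepts SET general.questions = :questions with a list of maps as the value, so the 500 more likely comes from the template. One thing that stands out is that the inner #if(foreach.hasNext) is missing the $ before foreach. It may also be worth checking that the Key attributes (accountId, category) match the table's actual key schema, since the sample item only shows id. The Python sketch below builds the request body the template is presumably meant to render, so it can be compared against the rendered output. Table and attribute names are taken from the template, and build_update_request is a hypothetical helper.

```python
import json


def build_update_request(account_id, questions):
    """Build a DynamoDB UpdateItem body that replaces general.questions."""
    return {
        "TableName": "models",
        "Key": {
            "accountId": {"S": account_id},
            "category": {"S": "model"},
        },
        "UpdateExpression": "SET general.questions = :questions",
        "ExpressionAttributeValues": {
            ":questions": {
                # Each question is a map ("M") holding a list ("L") of string choices.
                "L": [
                    {"M": {"choices": {"L": [{"S": str(c)} for c in q["choices"]]}}}
                    for q in questions
                ]
            }
        },
    }


request = build_update_request("1", [{"choices": ["1", "2", "3"]}])
print(json.dumps(request, indent=2))
```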

AppSync to DynamoDB update query mapping error

I have the following DynamoDB mapping template, to update an existing DynamoDB item:
{
"version" : "2017-02-28",
"operation" : "UpdateItem",
"key" : {
"id": $util.dynamodb.toDynamoDBJson($ctx.args.application.id),
"tenant": $util.dynamodb.toDynamoDBJson($ctx.identity.claims['http://domain/tenant'])
},
"update" : {
"expression" : "SET #sourceUrl = :sourceUrl, #sourceCredential = :sourceCredential, #instanceSize = :instanceSize, #users = :users",
"expressionNames" : {
"#sourceUrl" : "sourceUrl",
"#sourceCredential" : "sourceCredential",
"#instanceSize" : "instanceSize",
"#users" : "users"
},
"expressionValues" : {
":sourceUrl" : $util.dynamodb.toDynamoDbJson($ctx.args.application.sourceUrl),
":sourceCredential" : $util.dynamodb.toDynamoDbJson($ctx.args.application.sourceCredential),
":instanceSize" : $util.dynamodb.toDynamoDbJson($ctx.args.application.instanceSize),
":users" : $util.dynamodb.toDynamoDbJson($ctx.args.application.users)
}
},
"condition" : {
"expression" : "attribute_exists(#id) AND attribute_exists(#tenant)",
"expressionNames" : {
"#id" : "id",
"#tenant" : "tenant"
}
}
}
But I'm getting the following error:
message: "Unable to parse the JSON document: 'Unrecognized token '$util': was expecting ('true', 'false' or 'null')↵ at [Source: (String)"{↵ "version" : "2017-02-28",↵ "operation" : "UpdateItem",↵ "key" : {↵ "id": {"S":"abc-123"},↵ "tenant": {"S":"test"}↵ },↵ "update" : {↵ "expression" : "SET #sourceUrl = :sourceUrl, #sourceCredential = :sourceCredential, #instanceSize = :instanceSize, #users = :users",↵ "expressionNames" : {↵ "#sourceUrl" : "sourceUrl",↵ "#sourceCredential" : "sourceCredential",↵ "#instanceSize" : "instanceSize",↵ "#users" : "users"↵ }"[truncated 400 chars]; line: 17, column: 29]'"
I've tried removing parts, and it seems to be related to the expressionValues, but I can't see anything wrong with the syntax.
It looks like you misspelled the toDynamoDBJson method name in all four expressionValues entries.
Replace
$util.dynamodb.toDynamoDbJson($ctx.args.application.sourceUrl)
with
$util.dynamodb.toDynamoDBJson($ctx.args.application.sourceUrl)
Note the uppercase B in toDynamoDBJson.

Is it possible to create an array pipeline object in AWS datapipeline via Cloudformation?

When creating a data pipeline via API / CLI that creates an EmrCluster, I can specify multiple steps using an array structure:
{ "objects" : [
{ "id" : "myEmrCluster",
"terminateAfter" : "1 hours",
"schedule" : {"ref":"theSchedule"},
"step" : ["some.jar,-param1,val1", "someOther.jar,-foo,bar"] },
{ "id" : "theSchedule", "period":"1 days" }
] }
I can call put-pipeline-definition referencing the file above to create a number of steps for the EMR cluster.
Now, if I want to create the pipeline using CloudFormation, I can use the PipelineObjects property of an AWS::DataPipeline::Pipeline resource to configure the pipeline. However, pipeline object fields can only take a StringValue or a RefValue. How can I create an array pipeline object field?
Here's a corresponding cloudformation template:
"Resources" : {
"MyEMRCluster" : {
"Type" : "AWS::DataPipeline::Pipeline",
"Properties" : {
"Name" : "MyETLJobs",
"Activate" : "true",
"PipelineObjects" : [
{
"Id" : "myEmrCluster",
"Fields" : [
{ "Key" : "terminateAfter","StringValue":"1 hours" },
{ "Key" : "schedule","RefValue" : "theSchedule" },
{ "Key" : "step","StringValue" : "some.jar,-param1,val1" }
]
},
{
"Id" : "theSchedule",
"Fields" : [
{ "Key" : "period","StringValue":"1 days" }
]
}
]
}
}
}
With the above template, step is a StringValue, equivalent to:
"step" : "some.jar,-param1,val1"
and not an array like the desired config.
The documentation at http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-datapipeline-pipeline-pipelineobjects-fields.html shows that only StringValue and RefValue are valid keys. Is it possible to create an array of steps via CloudFormation?
Thanks in advance.
Ah, I'm not sure where I saw that steps could be configured as an array. The documentation makes no mention of that; instead, it specifies that to have multiple steps, multiple step entries should be used:
{
"Id" : "myEmrCluster",
"Fields" : [
{ "Key" : "terminateAfter","StringValue":"1 hours" },
{ "Key" : "schedule","RefValue" : "theSchedule" },
{ "Key" : "step","StringValue" : "some.jar,-param1,val1" },
{ "Key" : "step","StringValue" : "someOther.jar,-foo,bar" }
]
}
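The same expansion can be done mechanically when translating a CLI-style definition into CloudFormation. Below is a minimal Python sketch that turns list-valued fields into repeated Key entries and {"ref": ...} values into RefValue. to_cfn_pipeline_object is a hypothetical helper, not part of any AWS SDK.

```python
def to_cfn_pipeline_object(obj):
    """Convert one CLI-style pipeline object into a CloudFormation PipelineObjects entry."""
    fields = []
    for key, value in obj.items():
        if key == "id":
            continue
        # A list becomes several Fields entries that share the same Key.
        for item in value if isinstance(value, list) else [value]:
            if isinstance(item, dict) and "ref" in item:
                fields.append({"Key": key, "RefValue": item["ref"]})
            else:
                fields.append({"Key": key, "StringValue": str(item)})
    return {"Id": obj["id"], "Fields": fields}


cluster = {
    "id": "myEmrCluster",
    "terminateAfter": "1 hours",
    "schedule": {"ref": "theSchedule"},
    "step": ["some.jar,-param1,val1", "someOther.jar,-foo,bar"],
}
print(to_cfn_pipeline_object(cluster))
```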

Using multiple location fields in ElasticSearch + Django-Haystack

I'm using django-haystack and ElasticSearch to index Stores.
Until now, each store had one lat,long coordinate pair. We had to change this because one store can deliver products to several disjoint regions, so I've added up to ten locations (lat,long pairs) to each store.
With one location field everything worked fine and I got the right results. Now, with multiple location fields, I can't get any results, not even the previous ones, for the same user and store coordinates.
My index is as follows:
from haystack import indexes

class StoreIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True,
                             template_name='search/indexes/store/store_text.txt')
    location0 = indexes.LocationField()
    location1 = indexes.LocationField()
    location2 = indexes.LocationField()
    location3 = indexes.LocationField()
    location4 = indexes.LocationField()
    location5 = indexes.LocationField()
    location6 = indexes.LocationField()
    location7 = indexes.LocationField()
    location8 = indexes.LocationField()
    location9 = indexes.LocationField()

    def get_model(self):
        return Store

    def prepare_location0(self, obj):
        # If you're just storing the floats...
        return "%s,%s" % (obj.latitude, obj.longitude)

    # ..... up to prepare_location9
    def prepare_location9(self, obj):
        # If you're just storing the floats...
        return "%s,%s" % (obj.latitude_9, obj.longitude_9)
Is this the correct way to build my index?
From elasticsearch I get this mapping information:
curl -XGET http://localhost:9200/stores/_mapping?pretty=True
{
"stores" : {
"modelresult" : {
"properties" : {
"django_id" : {
"type" : "string"
},
"location0" : {
"type" : "geo_point",
"store" : "yes"
},
"location1" : {
"type" : "geo_point",
"store" : "yes"
},
"location2" : {
"type" : "geo_point",
"store" : "yes"
},
"location3" : {
"type" : "geo_point",
"store" : "yes"
},
"location4" : {
"type" : "geo_point",
"store" : "yes"
},
"location5" : {
"type" : "geo_point",
"store" : "yes"
},
"location6" : {
"type" : "geo_point",
"store" : "yes"
},
"location7" : {
"type" : "geo_point",
"store" : "yes"
},
"location8" : {
"type" : "geo_point",
"store" : "yes"
},
"location9" : {
"type" : "geo_point",
"store" : "yes"
},
"text" : {
"type" : "string",
"analyzer" : "snowball",
"store" : "yes",
"term_vector" : "with_positions_offsets"
}
}
}
}
}
Then, I try to query this way:
sqs0 = SearchQuerySet().dwithin('location0', usuario, max_dist).distance('location0',usuario).using('stores')
where:
usuario is a Point instance representing the user trying to find stores near his position and
max_dist is a D instance.
If I query directly using curl, I get no results either.
Here is the result of querying using curl with multiple location fields:
$ curl -XGET http://localhost:9200/stores/modelresult/_search?pretty=true -d '{ "query" : { "match_all": {} }, "filter" : {"geo_distance" : { "distance" : "6km", "location0" : { "lat" : -23.5, "lon" : -46.6 } } } } '
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
If I comment out the fields location1-9 in the StoreIndex class, everything works fine, but if I leave them in to get multiple location points, I get no results for the same query (user position). This happens both through Django and directly with curl. That is, with only one location (say location0), both queries return correct results; with more locations (location0-9), neither query returns any results.
Here are the results of querying directly using curl with only one location field:
$ curl -XGET http://localhost:9200/stores/modelresult/_search?pretty=true -d '{ "query" : { "match_all": {} }, "filter" : {"geo_distance" : { "distance" : "6km", "location0" : { "lat" : -23.5, "lon" : -46.6 } } } } '
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 9,
"max_score" : 1.0,
"hits" : [ {
"_index" : "stores",
"_type" : "modelresult",
"_id" : "store.store.110",
"_score" : 1.0, "_source" : {"django_ct": "store.store", "text": "RESULT OF THE SEARCH \n\n", "django_id": "110", "id": "store.store.110", "location0": "-23.4487554,-46.58912"}
},
lots of results here
]
}
}
Of course, I rebuild_index after any change in StoreIndex.
Any help on how to get multiple location fields working with elasticsearch and django?
PS.: I've cross posted this question on Django-Haystack and ElasticSearch Google Groups.
https://groups.google.com/d/topic/elasticsearch/85fg7vdCBBU/discussion
https://groups.google.com/d/topic/django-haystack/m2A3_SF8-ls/discussion
Thanks in advance
Mário
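One thing worth ruling out, as a guess rather than a confirmed diagnosis: if some stores have no coordinates for location1 to location9, then "%s,%s" % (None, None) produces the string "None,None", which is not a valid geo_point and may cause those documents to be rejected at index time. Below is a minimal sketch of a guard that the prepare_locationN methods could share. format_geo_point is a hypothetical helper, and the fields would probably need null=True for haystack to accept a missing value.

```python
from typing import Optional


def format_geo_point(lat: Optional[float], lon: Optional[float]) -> Optional[str]:
    """Return "lat,lon" for a complete coordinate pair, or None if either part is missing."""
    if lat is None or lon is None:
        return None
    return "%s,%s" % (lat, lon)


print(format_geo_point(-23.5, -46.6))
print(format_geo_point(None, None))
```

Inside the index, prepare_location9 would then return format_geo_point(obj.latitude_9, obj.longitude_9) instead of formatting the values unconditionally.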