AWS API Gateway and Elasticsearch GET query - amazon-web-services

I need to call the Elasticsearch engine directly from API Gateway using an HTTP connection, e.g.
https:////_doc/_search?pretty&filter_path=hits.hits._source
I have n orders in the Elasticsearch engine which I want to fetch using a GET query, but I only want the array of JSON documents I posted and don't want any other information in the response. How can I do that?
E.g. this is what I am getting:
{
  "hits" : {
    "hits" : [
      {
        "_index" : "gpss_orders",
        "_type" : "_doc",
        "_id" : "4867254",
        "_score" : 1.0,
        "_source" : {
          "orderId" : 4867254,
          "loadId" : 18214,
          "orderTypeId" : 1
        }
      }
    ]
  }
}
But I would want the response to be something like this:
[ {
"orderId" : 4867254,
"loadId" : 18214,
"orderTypeId" : 1
}]
Do I need to change anything in the API Gateway method response?
I changed the API Gateway method response template and got the expected output:
#set($esOutput = $input.path('$.hits.hits'))
#set($orders = [])
#foreach( $elem in $esOutput )
#set($order = $elem["_source"])
#set($response = $orders.add($order) )
#end
$orders
But now the problem I am facing is that, although the response from the Elasticsearch engine is proper JSON, the response after the method integration template update comes out like this, without proper JSON formatting:
[{orderId=4867254, loadId=18214, orderTypeId=1, orderTypeName=Fuel}]
Response from Elasticsearch:
"took" : 1,
"hits" : {
"hits" : [
{
"_id" : "4867254",
"_score" : 1.0,
"_source" : {
"orderId" : 4867254,
"loadId" : 18214,
"orderTypeId" : 1,

There isn't a way to shape the return object from Elasticsearch itself. Depending on how you access this data, you could have your own server-side code act as a proxy that makes the query and removes the extraneous information before returning it to the clients. A bonus is that you can use the proxy to decide what information to return depending on factors such as permissions, caching, rate-limiting, etc.
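For example, a minimal Python sketch of such a proxy (the requests library, the endpoint placeholder and the get_orders function name are assumptions, not part of the original setup; the index name is taken from the question). It queries Elasticsearch and returns only the _source documents:

import requests

ES_URL = "https://<your-es-endpoint>/gpss_orders/_search"  # placeholder endpoint

def get_orders():
    # Ask Elasticsearch only for the hits; filter_path trims the envelope server-side
    resp = requests.get(ES_URL, params={"filter_path": "hits.hits._source"})
    resp.raise_for_status()
    hits = resp.json().get("hits", {}).get("hits", [])
    # Return only the documents that were originally indexed
    return [hit["_source"] for hit in hits]

print(get_orders())  # e.g. [{"orderId": 4867254, "loadId": 18214, "orderTypeId": 1}]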

Related

Unable to execute Dataflow Pipeline : "Failed to create job with prefix beam_bq_job_LOAD_textiotobigquerydataflow"

I'm trying to run a Dataflow batch job using the template "Text file on Cloud Storage to BigQuery". The first three steps are working, but the last stage fails with the following error:
Error message from worker: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000, reached max retries: 3, last failed job: { "configuration" : { "jobType" : "LOAD", "labels" : { "beam_job_id" : "2022-11-10_02_06_07-15255037958352274885" }, "load" : { "createDisposition" : "CREATE_IF_NEEDED", "destinationTable" : { "datasetId" : "minerals_test_dataset", "projectId" : "jio-big-data-poc", "tableId" : "mytable01" }, "ignoreUnknownValues" : false, "sourceFormat" : "NEWLINE_DELIMITED_JSON", "useAvroLogicalTypes" : false, "writeDisposition" : "WRITE_APPEND" } }, "etag" : "LHqft9L/H4XBWTNZ7BSRXA==", "id" : "jio-big-data-poc:asia-south1.beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2", "jobReference" : { "jobId" : "beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2", "location" : "asia-south1", "projectId" : "jio-big-data-poc" }, "kind" : "bigquery#job", "selfLink" : "https://bigquery.googleapis.com/bigquery/v2/projects/jio-big-data-poc/jobs/beam_bq_job_LOAD_textiotobigquerydataflow0releaser1025091627592969dd_1a449a94623645758e91dcba53a86498_fc44bdad405c2c80860231502c18eb1e_00001_00000-2?location=asia-south1", "statistics" : { "creationTime" : "1668074949767", "endTime" : "1668074949869", "startTime" : "1668074949869" }, "status" : { "errorResult" : { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ], "state" : "DONE" }, "user_email" : "49449455496-compute#developer.gserviceaccount.com", "principal_subject" : "serviceAccount:49449455496-compute#developer.gserviceaccount.com" }. org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:200) org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:153) org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:378)
I tried running the same job with other datasets' CSV files, and the JavaScript UDF and JSON schema are according to the documentation, but the job is failing at the same stage. So, what can be the possible solution to this error?
The JSON schema you have given doesn't match the BigQuery schema of your table:
"Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" }, "errors" : [ { "message" : "Provided Schema does not match Table jio-big-data-poc:minerals_test_dataset.mytable01. Cannot add fields (field: marks)", "reason" : "invalid" } ]
There is a field called marks that seems not to exist in the BigQuery table.
If you update your BigQuery table schema to match the fields of your input JSON lines and elements exactly, that will solve the issue.
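For illustration, a minimal sketch with the google-cloud-bigquery Python client that adds the missing column to the table; the STRING type and NULLABLE mode for marks are guesses, so adjust them to your data (alternatively, drop marks from the input and the JSON schema file instead):

from google.cloud import bigquery

client = bigquery.Client(project="jio-big-data-poc")
table = client.get_table("jio-big-data-poc.minerals_test_dataset.mytable01")

# Existing fields plus the new "marks" column; schema updates may only add columns
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("marks", "STRING", mode="NULLABLE"))  # type is an assumption
table.schema = new_schema

# Push only the schema change back to BigQuery
client.update_table(table, ["schema"])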

MongoDB db.collection.distinct() on AWS DocumentDB doesn't use index

Transitioning to the new AWS DocumentDB service. Currently on Mongo 3.2. When I run db.collection.distinct("FIELD_NAME") it returns the results really quickly. I did a database dump to AWS DocumentDB (Mongo 3.6 compatible) and this simple query just gets stuck.
Here's my .explain() and the indexes on the working instance versus AWS documentdb:
Explain function on working instance:
> db.collection.explain().distinct("FIELD_NAME")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "db.collection",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "$and" : [ ]
    },
    "winningPlan" : {
      "stage" : "PROJECTION",
      "transformBy" : {
        "_id" : 0,
        "FIELD_NAME" : 1
      },
      "inputStage" : {
        "stage" : "DISTINCT_SCAN",
        "keyPattern" : {
          "FIELD_NAME" : 1
        },
        "indexName" : "FIELD_INDEX_NAME",
        "isMultiKey" : false,
        "isUnique" : false,
        "isSparse" : false,
        "isPartial" : false,
        "indexVersion" : 1,
        "direction" : "forward",
        "indexBounds" : {
          "FIELD_NAME" : [
            "[MinKey, MaxKey]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
Explain on AWS documentdb, not working:
rs0:PRIMARY> db.collection.explain().distinct("FIELD_NAME")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "db.collection",
    "winningPlan" : {
      "stage" : "AGGREGATE",
      "inputStage" : {
        "stage" : "HASH_AGGREGATE",
        "inputStage" : {
          "stage" : "COLLSCAN"
        }
      }
    }
  },
}
Index on both of these instances:
{
  "v" : 1,
  "key" : {
    "FIELD_NAME" : 1
  },
  "name" : "FIELD_INDEX_NAME",
  "ns" : "db.collection"
}
Also, this database has a couple million documents but there are only about 20 distinct values for that "FIELD_NAME". Any help would be appreciated.
I tried it with .hint("index_name") and that didn't work. I tried clearing plan cache but I get Feature not supported: planCacheClear
COLLSCAN and IXSCAN don't make much of a difference in this case; both need to scan all the documents or all the index entries.
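To make the comparison concrete, here is a minimal Python sketch (pymongo and the placeholder connection string are assumptions; the db, collection and field names are the placeholders from the question) of the original distinct next to a $group-based equivalent. On DocumentDB, both still have to walk every document or index entry:

from pymongo import MongoClient

client = MongoClient("mongodb://<your-documentdb-endpoint>:27017")  # placeholder endpoint
coll = client["db"]["collection"]

# The original query
distinct_values = coll.distinct("FIELD_NAME")

# Aggregation equivalent: group on the field and collect the group keys
grouped = [doc["_id"] for doc in coll.aggregate([{"$group": {"_id": "$FIELD_NAME"}}])]

print(sorted(distinct_values, key=str))
print(sorted(grouped, key=str))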

addAPI creates the API but returns an error

I am adding a new API to WSO2 as described in the WSO2 Publisher APIs documentation.
The request is the following:
http://testapiaddress:9763/publisher/site/blocks/item-add/ajax/add.jag?action=addAPI&name=PhoneVerification&context=/phoneverify&version=1.0&visibility=public&thumbUrl=&description=Verify a phone number&tags=phone,mobile,multimedia&endpointType=nonsecured&tiersCollection=Gold,Bronze&http_checked=http&https_checked=https&uriTemplate-0=/*&default_version_checked=default_version&bizOwner=xx&bizOwnerMail=xx#ee.com&techOwner=xx&techOwnerMail=ggg#ww.com"&endpoint_config={"production_endpoints":{"url":"http://myaccountapi.dev.payoneer.com","config":null},"endpoint_type":"address"}&swagger={"paths" : {"/CheckPhoneNumber?PhoneNumber={number}" : {"get" : {"parameters" : [{"description" : "phone number", "name" : "number", "allowMultiple" : false, "type" : "string", "required" : true, "in" : "path"}], "responses" : {"200" : {}}, "x-auth-type" : "Application%20%26%20Application%20User", "x-throttling-tier" : "Unlimited"}}, "/test" : {}, "/" : {}}, "swagger" : "2.0", "info" : {"title" : "WeatherAPI", "version" : "1.0.0"}}
I can see that the API was created in the Publisher portal, but I get an error as a response:
{"error" : false}
The response is very generic, but maybe someone has an idea why I get this error.
This is not an error. It says that there is no error.
If there is an error, the response will be like below:
{"error" : true, "description":"error description"}

Basic geosearch with ElasticSearch

I'm putting together a proof of concept on AWS using Dynamo and the Amazon Elasticsearch Service, and I'm having some trouble getting the geo searches to work.
I've checked the ES Dashboard and see the following....
I have an index [assets] and a mapping [asset_types]. Below is a sample of some of the mappings, with the relevant location field:
filename *string*
checksum *string*
added_date *date*
General [this is a map]
  location
    lat *string*
    lon *string*
make *string*
model *string*
I want the geo searches to be on the "General.location" field. I've tried a couple different queries so far without any luck, but I'm sure I'm missing something rather obvious.
One is from the official documentation here, modified to the below, which results in this error:
"reason": "failed to parse search source. unknown search element [bool]",
POST assets/_search
{
  "bool" : {
    "must" : {
      "match_all" : {}
    },
    "filter" : {
      "geo_distance" : {
        "distance" : "200km",
        "General.location" : {
          "lat" : 40,
          "lon" : -70
        }
      }
    }
  }
}
I've also tried a slightly different query, which raises "reason": "failed to find geo_point field [General.location]":
POST assets/_search
{
  "filter" : {
    "geo_distance" : {
      "distance" : "1km",
      "General.location" : {
        "lat" : 40,
        "lon" : -70
      }
    }
  },
  "query" : {
    "match_all" : {}
  }
}
Am I running a query incorrectly? Do I need to update the mapping in the index to specify the geo-index? I thought if I formatted fields properly that wasn't a requirement.
Thanks
The issue lies in your mapping, where your General.location field is not properly mapped. That's the reason you get the error failed to find geo_point field.
So instead of
General [this is a map]
  location
    lat *string*
    lon *string*
You need to have
General [this is a map]
  location *geo_point*
So you need to modify your mapping accordingly and reindex your data.
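As a rough sketch of that step (Python with the requests library; the endpoint placeholder, the new index name assets_v2 and the availability of the _reindex API in ES 5.x+ are assumptions, while the index and mapping type names come from the question): since an existing field's type can't be changed in place, you create a new index with the geo_point mapping and copy the data over:

import requests

ES = "https://<your-es-endpoint>"  # placeholder endpoint

# 1. Create a new index whose mapping declares General.location as geo_point
mapping = {
    "mappings": {
        "asset_types": {
            "properties": {
                "General": {
                    "properties": {
                        "location": {"type": "geo_point"}
                    }
                }
            }
        }
    }
}
requests.put(f"{ES}/assets_v2", json=mapping).raise_for_status()

# 2. Copy the documents from the old index into the new one
reindex = {"source": {"index": "assets"}, "dest": {"index": "assets_v2"}}
requests.post(f"{ES}/_reindex", json=reindex).raise_for_status()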
The second issue you have is that your first query needs to be enclosed in a query section:
POST assets/_search
{
  "query" : {
    "bool" : {
      "must" : {
        "match_all" : {}
      },
      "filter" : {
        "geo_distance" : {
          "distance" : "200km",
          "General.location" : {
            "lat" : 40,
            "lon" : -70
          }
        }
      }
    }
  }
}
Once you've fixed both issues, you'll be able to run your query successfully.
In addition to what Val said, I explicitly created a new mapping for the location field.
Note for you other novices out there: I needed to use a nested properties update in order to create "General.deviceLocation". After I did this, Val's updated query worked.
PUT assets/_mapping/assets_type
{
  "properties": {
    "General": {
      "properties": {
        "deviceLocation": {
          "type": "geo_point"
        }
      }
    }
  }
}

MongoDB Query For Fields That Vary - Wildcards?

I am looking for a way to get distinct "unit" values from a collection that has a structure similar to the following:
{
  "_id" : ObjectId("548b1aee6e444414f00d5cf1"),
  "KPI" : {
    "NPV" : {
      "value" : 100,
      "unit" : "kUSD"
    },
    "NPM" : {
      "value" : 100,
      "unit" : "kUSD"
    },
    "GPM" : {
      "value" : 50,
      "unit" : "CAD"
    }
  }
}
I looked into using wildcards and regex but from what I have come across this is not supported for field matching. I would like to do something like db.collection.distinct('KPI.*.unit') but cannot determine how and it seems like performance would be poor. Does anyone have a recommendation? Thanks.
It's not a good practice to make the keys a part of the content of the document - don't use keys as data. If you don't change your document structure, you'll need to know what the possible subfields of KPI are. If you don't know what those could be, you will need to examine the documents manually to find them. Then you can issue a distinct for each using dot notation, e.g. db.collection.distinct("KPI.NPM.unit").
If what you're looking for instead is the distinct values of unit across all values of the parent KPI subfield, then you could take the union of all of the results of the distincts. You can also do it easily with the aggregation framework in MongoDB 2.6. For simplicity, I'll assume there are just three distinct subfields of KPI, the ones in the document above.
db.collection.aggregate([
  { "$group" : {
      "_id" : 0,
      "NPVunits" : { "$addToSet" : "$KPI.NPV.unit" },
      "NPMunits" : { "$addToSet" : "$KPI.NPM.unit" },
      "GPMunits" : { "$addToSet" : "$KPI.GPM.unit" }
  } },
  { "$project" : { "distinct_units" : { "$setUnion" : ["$NPVunits", "$NPMunits", "$GPMunits"] } } }
])
You could also structure your data as dynamic attributes. The document above would be recast as something like
{
  "_id" : ObjectId("548b1aee6e444414f00d5cf1"),
  "KPI" : [
    { "type" : "NPV", "value" : 100, "unit" : "kUSD" },
    { "type" : "NPM", "value" : 100, "unit" : "kUSD" },
    { "type" : "GPM", "value" : 50, "unit" : "CAD" }
  ]
}
Querying for distinct units is easy now, whether you want it per type or over all types:
Per type (all types in one query)
db.collection.aggregate([
  { "$unwind" : "$KPI" },
  { "$group" : { "_id" : "$KPI.type", "units" : { "$addToSet" : "$KPI.unit" } } }
])
Over all types
db.collection.distinct("KPI.unit")