ElasticSearch AND query in python


I am trying to query Elasticsearch for logs in which one field has a certain value and another field has another value.
My logs look like this in Kibana:
{
"_index": "logstash-2016.08.01",
"_type": "logstash",
"_id": "6345634653456",
"_score": null,
"_source": {
"#timestamp": "2016-08-01T09:03:50.372Z",
"session_id": "value_1",
"host": "local",
"message": "some message here with error",
"exception": null,
"level": "ERROR"
},
"fields": {
"#timestamp": [
1470042230372
]
}
}
I would like to receive all logs that have the value "ERROR" in the level field and the value value_1 in the session_id field (both inside _source).
I can query for one of them, but not both together:
from elasticsearch import Elasticsearch
host = "localhost"
INDEX = "logstash-*"  # assumed: matches the daily logstash-YYYY.MM.DD indices
es = Elasticsearch([{'host': host, 'port': 9200}])
query = 'session_id:"{}"'.format("value_1")
result = es.search(index=INDEX, q=query)

Since you need to match exact values, I would recommend using filters, not queries.
The filter for your case would look something like this:
filter = {
"filter": {
"and": [
{
"term": {
"level": "ERROR"
}
},
{
"term": {
"session_id": "value_1"
}
}
]
}
}
You can then pass it to the search with es.search(index=INDEX, body=filter)
EDIT: the reason to use filters instead of queries: "In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No; no scores are calculated. Filter context is mostly used for filtering structured data."
Source: https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-filter-context.html
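The `and` filter above was deprecated in Elasticsearch 2.0 and removed in 5.0; in the filter-context page linked above, the same AND logic is expressed with the `filter` clause of a `bool` query. A minimal sketch that builds such a body in Python, assuming `level` and `session_id` are indexed as exact values (for example `keyword` fields, or the `.raw`/`.keyword` sub-fields that Logstash index templates have created in various versions). On an analyzed text field, a `term` query for "ERROR" will not match the lowercased token. The helper name `build_and_filter` is hypothetical.

```python
def build_and_filter(exact_values):
    """Build a search body that requires every field/value pair to match.

    Each pair becomes a `term` clause inside `bool.filter`, so all clauses must
    match (AND semantics) and no relevance scores are calculated.
    """
    clauses = [{"term": {field: value}} for field, value in exact_values.items()]
    return {"query": {"bool": {"filter": clauses}}}


body = build_and_filter({"level": "ERROR", "session_id": "value_1"})

# Passing it with elasticsearch-py (client versions that accept `body=`):
#   result = es.search(index=INDEX, body=body)

# The Lucene query-string syntax from the question can also express AND directly:
q = 'session_id:"{}" AND level:"{}"'.format("value_1", "ERROR")
#   result = es.search(index=INDEX, q=q)
```

The query-string variant is the smallest change to the original code, while the `bool`/`filter` body keeps the exact-match, no-scoring behaviour the answer recommends.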

Related

Searching through an array in an Elasticsearch field

I have a collection of Elasticsearch documents that look something like this:
{
"_score": 1,
"_id": "inv_s3l9ly4d16csnh1b",
"_source": {
"manufacturer_item_id": "OCN1-1204P-ARS4",
"description": "TLX Headlight",
"ext_invitem_id": "TDF30907",
"tags": [
{
"tag_text": "Test Tag"
}
],
"id": "inv_s3l9ly4d16csnh1b"
},
"_index": "parts"
}
I want to be able to search for documents by tag_text under tags, but I also want to search other fields. I put together a multi_match query that looks like this:
{
"query": {
"multi_match": {
"query": "Test Tag",
"type": "cross_fields",
"fields": [
"tags",
"description"
]
}
}
}
But I don't get any results. Can someone tell me what's wrong with my query?

Okay, it turns out I was doing something silly: I was searching the tags field itself instead of its tag_text sub-field. I got the expected results with this query:
{
"query": {
"multi_match": {
"query": "Test Tag",
"type": "cross_fields",
"fields": [
"tags.tag_text",
"description"
]
}
}
}
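The fix works because `tags` holds objects, and Elasticsearch indexes the fields of inner objects under their full dotted path, so the searchable field is `tags.tag_text` rather than `tags`. A minimal sketch that builds the corrected body in Python, assuming `tags` uses the default object mapping (a `nested` mapping would need a `nested` query instead); the helper name is hypothetical:

```python
def build_cross_fields_query(text, fields):
    """Build a multi_match query that treats several fields as one combined field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                # Inner object fields must be addressed by their full dotted path.
                "fields": list(fields),
            }
        }
    }


body = build_cross_fields_query("Test Tag", ["tags.tag_text", "description"])
```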

What's the best practice for unmarshalling data returned from a dynamo operation in aws step functions?

I have a state machine that runs a DynamoDB query (called using CallAwsService). The returned data looks like this:
{
Items: [
{
"string" : {
"B": blob,
"BOOL": boolean,
"BS": [ blob ],
"L": [
"AttributeValue"
],
"M": {
"string" : "AttributeValue"
},
"N": "string",
"NS": [ "string" ],
"NULL": boolean,
"S": "string",
"SS": [ "string" ]
}
}
]
}
I would like to unmarshall this data efficiently, without using a Lambda call.
The CDK code we're currently using for the query is below
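// Imports assume CDK v2 (aws-cdk-lib)
import { Duration, RemovalPolicy } from 'aws-cdk-lib'
import { Table } from 'aws-cdk-lib/aws-dynamodb'
import { LogGroup, RetentionDays } from 'aws-cdk-lib/aws-logs'
import { LogLevel, StateMachine, StateMachineType } from 'aws-cdk-lib/aws-stepfunctions'
import { CallAwsService } from 'aws-cdk-lib/aws-stepfunctions-tasks'
import { Construct } from 'constructs'
interface FindItemsStepFunctionProps {
table: Table
id: string
}
export const FindItemsStepFunction = (scope: Construct, props: FindItemsStepFunctionProps): StateMachine => {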
const { table, id } = props
const definition = new CallAwsService(scope, 'Query', {
service: 'dynamodb', // SDK service names are lowercase, as in arn:aws:states:::aws-sdk:dynamodb:query
action: 'query',
parameters: {
TableName: table.tableName,
IndexName: 'exampleIndexName',
KeyConditionExpression: 'id = :id',
ExpressionAttributeValues: {
':id': {
'S.$': '$.path.id',
},
},
},
iamResources: ['*'],
})
return new StateMachine(scope, id, {
logs: {
destination: new LogGroup(scope, `${id}LogGroup`, {
logGroupName: `${id}LogGroup`,
removalPolicy: RemovalPolicy.DESTROY,
retention: RetentionDays.ONE_WEEK,
}),
level: LogLevel.ALL,
},
definition,
stateMachineType: StateMachineType.EXPRESS,
stateMachineName: id,
timeout: Duration.minutes(5),
})
}

Can you unmarshall the data downstream? I'm not very familiar with Step Functions; do you have the ability to import utilities?
Unmarshalling DDB JSON is as simple as calling the unmarshall function from the DynamoDB utility package (@aws-sdk/util-dynamodb):
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/modules/_aws_sdk_util_dynamodb.html
You may need to do this downstream, as Step Functions appears to use the low-level client.
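For reference, the transformation `unmarshall` performs is small. In Python, boto3 ships `boto3.dynamodb.types.TypeDeserializer` for the same job; the dependency-free sketch below only illustrates the mapping from DynamoDB's AttributeValue format to plain values and is not the SDK's implementation. Following boto3's convention, numbers become `Decimal` to avoid float rounding and string/number/binary sets become Python sets; binary values are assumed to be base64-encoded strings, as in DynamoDB's JSON wire format.

```python
import base64
from decimal import Decimal


def unmarshall_value(attr):
    """Convert one DynamoDB AttributeValue (e.g. {"S": "x"}) to a Python value."""
    if len(attr) != 1:
        raise ValueError("expected exactly one type key, got %r" % (attr,))
    type_key, raw = next(iter(attr.items()))
    if type_key == "S":
        return raw
    if type_key == "N":
        return Decimal(raw)  # numbers travel as strings
    if type_key == "B":
        return base64.b64decode(raw)  # assumed base64 text
    if type_key == "BOOL":
        return bool(raw)
    if type_key == "NULL":
        return None
    if type_key == "SS":
        return set(raw)
    if type_key == "NS":
        return set(Decimal(n) for n in raw)
    if type_key == "BS":
        return set(base64.b64decode(b) for b in raw)
    if type_key == "L":
        return [unmarshall_value(v) for v in raw]
    if type_key == "M":
        return unmarshall_item(raw)
    raise ValueError("unknown AttributeValue type: %r" % type_key)


def unmarshall_item(item):
    """Convert a whole item (attribute name -> AttributeValue) to a plain dict."""
    return {name: unmarshall_value(value) for name, value in item.items()}


response = {
    "Items": [
        {
            "id": {"S": "abc"},
            "count": {"N": "3"},
            "active": {"BOOL": True},
            "meta": {"M": {"name": {"S": "widget"}}},
            "tags": {"L": [{"S": "a"}, {"S": "b"}]},
        }
    ]
}
items = [unmarshall_item(i) for i in response["Items"]]
```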

Step Functions still doesn't make it easy to call DynamoDB directly from a state machine step without a Lambda function. The main missing pieces are handling the different cases of a query returning zero, one, or more records, and unmarshalling DynamoDB's somewhat complicated record format. Sadly, the $utils library is still not supported in Step Functions.
You need to implement each of these as a dedicated step in the graph.
These are the steps we use as our DynamoDB query template.
The first step provides the parameters for the query. You can omit it and define the parameters directly in the query step:
"Set Query Parameters": {
"Type": "Pass",
"Next": "DynamoDB Query ...",
"Result": {
"tableName": "<TABLE_NAME>",
"key_value": "<QUERY_KEY>",
"attribute_value": "<ATTRIBUTE_VALUE>"
}
}
The next step is the actual query to DynamoDB. You can also use GetItem instead of Query if you have the record keys.
"DynamoDB Query ...": {
"Type": "Task",
"Parameters": {
"TableName.$": "$.tableName",
"IndexName": "<INDEX_NAME_IF_NEEDED>",
"KeyConditionExpression": "#n1 = :v1",
"FilterExpression": "#n2.#n3 = :v2",
"ExpressionAttributeNames": {
"#n1": "<KEY_NAME>",
"#n2": "<ATTRIBUTE_NAME>",
"#n3": "<NESTED_ATTRIBUTE_NAME>"
},
"ExpressionAttributeValues": {
":v1": {
"S.$": "$.key_value"
},
":v2": {
"S.$": "$.attribute_value"
}
},
"ScanIndexForward": false
},
"Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
"ResultPath": "$.ddb_record",
"ResultSelector": {
"result.$": "$.Items[0]"
},
"Next": "Check for DDB Object"
}
The above example seems a bit complicated, using both ExpressionAttributeNames and ExpressionAttributeValues. However, it makes it possible to query on nested attributes such as item.id.
In this example, we only take the first item response with $.Items[0]. However, you can take all the results if you need more than one.
The next step is to check if the query returned a record or not.
"Check for DDB Object": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.ddb_record.result",
"IsNull": false,
"Comment": "Found Context Object",
"Next": "Parse DDB Object"
}
],
"Default": "Do Nothing"
}
And lastly, to answer your original question, we can parse the query result, if there is one:
"Parse DDB Object": {
"Type": "Pass",
"Parameters": {
"string_object.$": "$.ddb_record.result.string_object.S",
"bool_object.$": "$.ddb_record.result.bool_object.BOOL",
"dict_object": {
"nested_dict_object.$": "$.ddb_record.result.item.M.name.S"
},
"dict_object_full.$": "States.StringToJson($.ddb_record.result.JSON_object.S)"
},
"ResultPath": "$.parsed_ddb_record",
"End": true
}
Please note that:
Simple strings are extracted directly with "string_object.$": "$.ddb_record.result.string_object.S"
Booleans work the same way with "bool_object.$": "$.ddb_record.result.bool_object.BOOL"; numbers can be read the same way, but they come back as strings under the N key
Nested objects are read by navigating the map (M) object, for example "nested_dict_object.$": "$.ddb_record.result.item.M.name.S"
A JSON object stored as a string can be converted with States.StringToJson
The parsed object is added as a new entry in the state's data using "ResultPath": "$.parsed_ddb_record"
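In Amazon States Language, only keys ending in `.$` are evaluated as JSONPath expressions or intrinsic functions such as `States.StringToJson`; any other value is passed through as a literal string. When the parsing step grows, it can help to generate it. A minimal Python sketch, assuming you assemble the definition as a dict before serializing it. The helper `parse_parameters` and the `ATTR_TYPE_KEYS` table are hypothetical, and wrapping `N` values in `States.StringToJson` is a commonly used trick to turn a numeric string such as "42" into a number, not an official DynamoDB feature.

```python
import json

# DynamoDB AttributeValue type keys; they are case-sensitive ("BOOL", not "Bool").
ATTR_TYPE_KEYS = {"S", "N", "BOOL", "M", "L", "SS", "NS", "B", "BS", "NULL"}


def parse_parameters(source_path, fields):
    """Build Pass-state Parameters that pull typed values out of one DynamoDB item.

    `fields` maps an output name to (attribute name, DynamoDB type key).
    """
    params = {}
    for out_name, (attr_name, type_key) in fields.items():
        if type_key not in ATTR_TYPE_KEYS:
            raise ValueError("unknown DynamoDB type key: %r" % type_key)
        path = "%s.%s.%s" % (source_path, attr_name, type_key)
        if type_key == "N":
            # Numbers arrive as strings under "N"; parse the string as JSON.
            params[out_name + ".$"] = "States.StringToJson(%s)" % path
        else:
            params[out_name + ".$"] = path
    return params


parse_state = {
    "Type": "Pass",
    "Parameters": parse_parameters(
        "$.ddb_record.result",
        {"string_object": ("string_object", "S"), "bool_object": ("bool_object", "BOOL")},
    ),
    "ResultPath": "$.parsed_ddb_record",
    "End": True,
}
print(json.dumps({"Parse DDB Object": parse_state}, indent=2))
```

Validating the type key up front catches typos such as `Bool`, which Step Functions would otherwise only report at run time when the path fails to resolve.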

Run query by regex on _id field in Elasticsearch

I'm trying to run a regex query in Elasticsearch on the _id field, but I'm getting this error:
Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]
I've tried regexp:
{
"query": {
"regexp": {
"_id": {
"value": "test-product-all-user_.*",
"flags" : "ALL",
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
and wildcard:
{
"query": {
"wildcard": {
"_id": {
"value": "test-product-all-user_.*",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
But both threw the same error.
Here is the complete error, just in case:
{ "error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]",
"index_uuid": "Cg0zrr6dRZeHJ8Jmvh5HMg",
"index": "explore_segments_v3"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "explore_segments_v3",
"node": "-ecTRBmnS2OgjHrrq6GCOw",
"reason": {
"type": "query_shard_exception",
"reason": "Can only use wildcard queries on keyword and text fields - not on [_id] which is of type [_id]",
"index_uuid": "Cg0zrr6dRZeHJ8Jmvh5HMg",
"index": "explore_segments_v3"
}
}
] }, "status": 400 }

_id is a special field in Elasticsearch. It is not indexed like ordinary text fields; its value is derived from the document's UID.
You can refer to this link for more information: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
According to the documentation, it supports only a limited set of queries (term, terms, match, query_string, simple_query_string). If you want more advanced text search such as wildcard or regexp, you need to index the ID into an ordinary keyword or text field on the document itself.
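A minimal sketch of that workaround, assuming a hypothetical `keyword` field named `doc_id` that your indexing code fills with the same value as `_id`. On a `keyword` field a `prefix` query is the simplest fit for "starts with". Note also that wildcard syntax only knows `*` and `?`, so the wildcard pattern `test-product-all-user_.*` from the question would require a literal dot; `.*` is regexp syntax.

```python
from fnmatch import fnmatchcase


def with_id_copy(doc_id, source):
    """Return a copy of `source` that also stores the document ID in `doc_id`.

    `doc_id` is a hypothetical field name; map it as `keyword` so that prefix,
    wildcard and regexp queries are allowed on it.
    """
    body = dict(source)
    body["doc_id"] = doc_id
    return body


# Typeless mapping (Elasticsearch 7+) for the hypothetical copy field.
mapping = {"mappings": {"properties": {"doc_id": {"type": "keyword"}}}}

# Either query can then replace the ones that targeted `_id`.
prefix_query = {"query": {"prefix": {"doc_id": {"value": "test-product-all-user_"}}}}
regexp_query = {"query": {"regexp": {"doc_id": {"value": "test-product-all-user_.*"}}}}

# fnmatchcase is only a rough local stand-in for wildcard matching (`*` and `?`),
# used to show why the `.*` pattern from the question is off.
sample_id = "test-product-all-user_42"
dot_star_matches = fnmatchcase(sample_id, "test-product-all-user_.*")  # needs a literal "."
star_matches = fnmatchcase(sample_id, "test-product-all-user_*")
```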

How to filter an Elasticsearch query with items in a list

I am running an Elasticsearch query, and now I want to filter it by the value of "result", a field already defined in the documents with values from 0 to 6. The values I want to filter by are in a list called "decision_results", which is populated from checkboxes on the website I'm running.
I tried the following code but the result of the query showed on the page does not change at all:
query = {
"_source": ["title", "raw_text", "i_cite", "cite_me", "relevancia_0", "cdf", "cite_me_semestre", "cdf_grupo", "ramo"],
"query": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
},
"filter": {
"bool": {
"should": [
{ "term": {"result": in decision_results}}
]
}
}
},
"sort": [
{"relevancia_0": {"order": "desc"}},
{"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "Math.round(doc['cdf'].value*1e3)/1.0e3"
},
"order": "desc"}},
{"cite_me_semestre": {"order": "desc"}},
{"cite_me": {"order": "desc"}},
{"date": {"order": "desc"}},
"_score"
],
"highlight": {
"fragment_size": 250,
"number_of_fragments": 1,
"type": "plain",
"order": "score",
"fragmenter": "span",
"pre_tags": "<span style='background-color: #FFFF00'>",
"post_tags": "</span>",
"fields": {"raw_text": {}}
}
}
I expect to get back only the documents whose "result" value is in the list "decision_results".

I think you should read a bit more about the bool query.
Replicate this structure in your query:
GET _search
{
"query": {
"bool": {
"must": {
"query_string":
{
"fields": ["raw_text", "i_cite", "title"],
"query": termo
}
},
"filter": {
"term": {"result": in decision_results}
}
}
}
}
Here your main query goes in the "must" block of the bool query, and the "term" clause goes in the "filter" block of the bool query. I'm not sure about the exact syntax of the example above, as I haven't tested it, but it should be close.
Also, make sure your website correctly handles the "term": {"result": in decision_results} part. Is in decision_results properly translated into a valid JSON query for your term clause? If that part is the issue, please provide more context around it so we can help with that.
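For the "any value in a list" part, Elasticsearch has a dedicated `terms` query that matches documents whose field equals any of the given values, so the Python list can be passed straight through instead of translating `in`. A minimal sketch that builds the body in Python, assuming `termo` is the user's search string and `decision_results` is a list of the checked values (for example `[0, 2, 5]`); the helper name is hypothetical, and the other keys from the original query (`_source`, `sort`, `highlight`) would be added alongside `query` unchanged.

```python
def build_search_body(termo, decision_results):
    """Full-text search on three fields, restricted to the selected `result` values."""
    bool_query = {
        "must": {
            "query_string": {
                "fields": ["raw_text", "i_cite", "title"],
                "query": termo,
            }
        }
    }
    if decision_results:
        # `terms` matches when `result` equals any value in the list; inside
        # `filter` it restricts the hits without affecting the score.
        bool_query["filter"] = {"terms": {"result": list(decision_results)}}
    return {"query": {"bool": bool_query}}
```

The filter is only added when at least one checkbox is selected, on the assumption that no selection should mean "no restriction".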

Regular Expressions and Elastic Search

I am trying to retrieve some company results using Elasticsearch. I want to get companies that start with "A", then "B", etc. I started with a typical "prefix" query, like so:
GET apple/company/_search
{
"query": {
"prefix": {
"name": "a"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
But this returns Acme as well as Lemur and Associates, so I need to distinguish between an A at the beginning of the whole name and an A at the beginning of any word.
It would seem that regular expressions should come to the rescue here, but Elasticsearch ignores whatever I try. In tests with other applications, ^[\S]a* should match anything that starts with A without a space in front of it. Elasticsearch returns 0 results with the following:
GET apple/company/_search
{
"query": {
"regexp": {
"name": "^[\S]a*"
}
},
"fields": [
"id",
"name",
"websiteUrl"
],
"size": 100
}
In fact, the Sense UI for Elasticsearch immediately flags a "Bad String Syntax Error", because even in a query Elasticsearch wants some characters escaped. Nonetheless, ^[\\S]a* doesn't work either.

Searching in Elasticsearch is about the query itself, but also about modelling your data so that it best suits the query you need to run. You cannot simply index anything and then struggle to come up with a query that does what you want.
The Elasticsearch way to handle your query is to use the following mapping for that field:
PUT /apple
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"keyword_lowercase": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
}
},
"mappings": {
"company": {
"properties": {
"name": {
"type": "string",
"fields": {
"analyzed_lowercase": {
"type": "string",
"analyzer": "keyword_lowercase"
}
}
}
}
}
}
}
And to use this query:
GET /apple/company/_search
{
"query": {
"prefix": {
"name.analyzed_lowercase": {
"value": "a"
}
}
}
}
or
GET /apple/company/_search
{
"query": {
"query_string": {
"query": "name.analyzed_lowercase:A*"
}
}
}
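To see why the extra sub-field changes the result, here is a simplified Python simulation, not Elasticsearch's actual analyzers. A word-splitting function that roughly approximates the default analyzer emits one token per word, so a prefix matches the start of any word; the `keyword` tokenizer plus `lowercase` filter emits the whole name as a single token, so a prefix only matches the start of the name. Prefix queries are not analyzed, which is why the query above uses a lowercase "a" against the lowercased token.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for a word-splitting analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_lowercase_tokens(text):
    """Mimic the custom keyword_lowercase analyzer: one lowercased token."""
    return [text.lower()]


def prefix_matches(tokens, prefix):
    """A prefix query matches if any indexed token starts with the prefix."""
    return any(token.startswith(prefix) for token in tokens)


names = ["Acme", "Lemur and Associates"]
on_name = [n for n in names if prefix_matches(standard_like_tokens(n), "a")]
on_keyword = [n for n in names if prefix_matches(keyword_lowercase_tokens(n), "a")]
```

Here `on_name` keeps both companies (because of the tokens "and" and "associates"), while `on_keyword` keeps only Acme.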

We Keep Coding
