How do I replace a field in logstash - replace

I'm doing an insert from Logstash into Elasticsearch. My problem is that I used a template in ES to lay out the data types, and I sometimes get values from Logstash that are null values (or dashes) when I've declared in ES that they should be doubles.
So sometimes ES gets a '-' instead of something like "2342", rejects it, and raises an error. Now, if I can replace the '-' with null, ES works fine.
How do I do this? I assume it can be done with the ruby filter. I need to be able to replace the '-' fields with null when appropriate.
EDIT:
I was asked for sample configs.
So, for example, say the below config is logstash, which will then send data to ES:
filter {
  if [type] == "transaction" {
    grok {
      match => ["message", "%{BASE16FLOAT:ts}\t%{IP:orig_ip}\t%{NOTSPACE:orig_port}"]
    }
  }
}
Now my ES template is saying:
"transaction" : {
"properties" :
{
"ts" : {
"format" : "dateOptionalTime",
"type" : "date"
},
"orig_ip" : {
"type" : "ip"
},
"orig_port" : {
"type" : "long"
},
}
}
So if I throw a data set like either of these, it passes:
{"ts" : "123456789.123234", "orig_ip" : "10.0.0.1", "orig_port" : "2342" }
{"ts" : "123456789.123234", "orig_ip" : "10.0.0.1", "orig_port" : null }
I get a success. But, the following [obviously] fails:
{"ts" : "123456789.123234", "orig_ip" : "10.0.0.1", "orig_port" : "-" }
How can I ensure that the "-" (with quotes) gets changed to a null?

If you amend your template by specifying "ignore_malformed": true on your orig_port long field, it should work.
"transaction" : {
"properties" :
{
"ts" : {
"format" : "dateOptionalTime",
"type" : "date"
},
"orig_ip" : {
"type" : "ip"
},
"orig_port" : {
"type" : "long"
"ignore_malformed": true <---- add this line
}
}
}
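Alternatively, if you'd rather fix the event on the Logstash side, as the question suggests, a minimal sketch (assuming the type and field names from the sample config above) is to drop the field whenever it only holds a dash, so Elasticsearch indexes the document without orig_port:
filter {
  if [type] == "transaction" and [orig_port] == "-" {
    mutate {
      # drop the placeholder value so ES never sees the malformed "-"
      remove_field => ["orig_port"]
    }
  }
}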

Related

Elasticsearch document fields type index automatically changes

I'm working on a project that uses django, elasticsearch and django-elasticsearch-dsl. I'm collecting a fairly large amount of data, saving it to postgres, and indexing it into elasticsearch via django-elasticsearch-dsl.
I'm bumping into a problem I don't understand, and I have no further hints about what is happening:
Relevant part of Django's models.py file:
class LinkDenorm(BaseModel):
    ...
    link = CharField(null=True, max_length=2710, db_index=True)
    link_expanded = TextField(null=True, db_index=True)
    title = TextField(null=True, db_index=True)
    text = TextField(null=True)
    ...
Relevant part of django-elasticsearch-dsl documents.py file:
@registry.register_document
class LinkDenorm(Document):
    link_expanded = fields.KeywordField(attr='link_expanded')

    class Index:
        name = 'denorms_v10'

    class Django:
        model = models.LinkDenorm
        fields = [
            ...
            'link',
            'title',
            'text',
            ...
        ]
After the data is successfully indexed, I verify that the index contains the correct fields:
curl -X GET -u <myuser>:<mypasswd> "http://<my-hostname>/denorms_v10/?pretty"
{
  "denorms_v10" : {
    "mappings" : {
      "properties" : {
        ...
        "link" : {
          "type" : "text"
        },
        "title" : {
          "type" : "text"
        },
        "text" : {
          "type" : "text"
        },
        "link_expanded" : {
          "type" : "keyword"
        },
        ...
      }
    }
  }
}
After a certain amount of time (sometimes weeks, sometimes days) the index fields change. Executing the same curl lookup as before gives me:
curl -X GET -u <myuser>:<mypasswd> "http://<my-hostname>/denorms_v10/?pretty"
{
  "denorms_v10" : {
    "mappings" : {
      "properties" : {
        ...
        "link" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "text" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "link_expanded" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        ...
      }
    }
  }
}
After the change happens, the queries fail, since the datatype is not correct. After investigating elasticsearch and django logs, there is nothing that would give a clue what happens with the index.
I'm a bit lost and running out of ideas. Any suggestions are most welcome. Thank you!
Miha, your index probably uses some kind of ILM without any index template.
Either you are querying an alias and the indices behind it are changing,
or a process on your side regularly deletes the index (depending on its size or the number of documents in it).
Then, when your app posts again, it recreates the index with Elasticsearch's default dynamic mapping.
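Pinning the mapping with an explicit index template keeps the default dynamic mapping from taking over when the index gets recreated. A minimal sketch, assuming Elasticsearch 7.8+ composable templates; the template name and index pattern are made up here, and the field types are copied from the original mapping above:
PUT _index_template/denorms_template
{
  "index_patterns": ["denorms_*"],
  "template": {
    "mappings": {
      "properties": {
        "link"          : { "type" : "text" },
        "title"         : { "type" : "text" },
        "text"          : { "type" : "text" },
        "link_expanded" : { "type" : "keyword" }
      }
    }
  }
}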

Is there an node like statement available in mongodb

I want to filter values in a child object of a JSON document. In the sample code below I want to apply a LIKE-style match to get the JSON values whose keys are like (t1, t2) from the sample document below.
Sample code:
db.getCollection('temp').find({},{"temp.text./.*t.*/.value":1})
Sample Json file:
{
  "_id" : 0,
  "temp" : {
    "text" : {
      "t1" : {
        "value" : "960"
      },
      "t2" : {
        "value" : "959"
      },
      "t3" : {
        "value" : "961"
      },
      "t4" : {
        "value" : "962"
      },
      "t5" : {
        "value" : "6.0"
      }
    }
  }
}
MongoDB doesn't have a way to filter field names directly other than projection, which is exact match only.
However, using aggregation you can use $objectToArray, which would convert the object {"t1" : {"value" : "960"}} to [{"k":"t1","v":{"value":"960"}}]. You can then filter based on the value of k, and use $arrayToObject to convert the entries left back into an object.
db.getCollection('temp').aggregate([
  { $addFields: {
      "temp.text": {
        $arrayToObject: {
          $filter: {
            input: { $objectToArray: "$temp.text" },
            cond: {
              $regexMatch: {
                input: "$$this.k",
                regex: /t/
              }
            }
          }
        }
      }
  }}
])
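If only keys like t1 and t2 should survive (one reading of the question's "like (t1, t2)"), the same pipeline works with a tighter, anchored regex. The collection name comes from the question's own sample code; the pattern is just an assumption about the intended match:
db.getCollection('temp').aggregate([
  { $addFields: {
      "temp.text": {
        $arrayToObject: {
          $filter: {
            input: { $objectToArray: "$temp.text" },
            // keep only the keys t1 and t2 (assumed intent)
            cond: { $regexMatch: { input: "$$this.k", regex: /^t[12]$/ } }
          }
        }
      }
  }}
])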

Regexp query seems to be ignored in elasticsearch

I have the following query:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "dog cat",
            "analyzer" : "standard",
            "default_operator" : "AND",
            "fields" : ["title", "content"]
          }
        },
        {
          "range" : {
            "dateCreate" : {
              "gte" : "2018-07-01T00:00:00+0200",
              "lte" : "2018-07-31T23:59:59+0200"
            }
          }
        },
        {
          "regexp" : {
            "articleIds" : {
              "value" : ".*?(2561|30|540).*?",
              "boost" : 1
            }
          }
        }
      ]
    }
  }
}
The fields title, content and articleIds are of type text; dateCreate is of type date. The articleIds field contains some IDs (comma-separated).
Ok, what happens now? I execute the query and get two results: both documents contain the words "dog" and "cat" in the title or in the content. So far that's correct.
But the second result has the number 3507 in the articleIds field, which doesn't match my query. It seems that the regexp is ignored because title and content already match. What is wrong here?
And here's the document that should not match my query but does:
{
  "_index" : "example",
  "_type" : "doc",
  "_id" : "3007780",
  "_score" : 21.223656,
  "_source" : {
    "dateCreate" : "2018-07-13T16:54:00+0200",
    "title" : "",
    "content" : "Its raining cats and dogs.",
    "articleIds" : "3507"
  }
}
And what I'm expecting is that this document should not be in the results because it contains 3507 which is not part of my query...

Elasticsearch regexp query finds no results

I have a problem building the correct query. I have an index with a field "ids" with the following mapping:
"ids" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
A sample content could look like this:
10,20,30
It's a list of ids. Now I want to make a query with multiple possible ids and I want to make a disjunction (OR) so I decided to use a regexp:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "Test"
          }
        },
        {
          "regexp" : {
            "ids" : {
              "value" : "10031|20|10038",
              "boost" : 1
            }
          }
        }
      ]
    }
  },
  "size" : 10,
  "from" : 0
}
The query is executed successfully but with no results. I expected to find 3 results.
If you want to get 10031 or 20 or 10038, you need to add parentheses.
Change "10031|20|10038" => "(10031|20|10038)"

MongoDB Query For Fields That Vary - Wildcards?

I am looking for a way to get distinct "unit" values from a collection that has a structure similar to the following:
{
  "_id" : ObjectId("548b1aee6e444414f00d5cf1"),
  "KPI" : {
    "NPV" : {
      "value" : 100,
      "unit" : "kUSD"
    },
    "NPM" : {
      "value" : 100,
      "unit" : "kUSD"
    },
    "GPM" : {
      "value" : 50,
      "unit" : "CAD"
    }
  }
}
I looked into using wildcards and regex but from what I have come across this is not supported for field matching. I would like to do something like db.collection.distinct('KPI.*.unit') but cannot determine how and it seems like performance would be poor. Does anyone have a recommendation? Thanks.
It's not a good practice to make the keys a part of the content of the document - don't use keys as data. If you don't change your document structure, you'll need to know what the possible subfields of KPI are. If you don't know what those could be, you will need to examine the documents manually to find them. Then you can issue a distinct for each using dot notation, e.g. db.collection.distinct("KPI.NPM.unit").
If what you're looking for instead are the distinct values of unit across all values of the parent KPI subfield, then you could take the union of all of the results of the distincts. You can also do it easily with the aggregation framework in MongoDB 2.6. For simplicity, I'll assume there are just three distinct subfields of KPI, the ones in the document above.
db.collection.aggregate([
  { "$group" : {
      "_id" : 0,
      "NPVunits" : { "$addToSet" : "$KPI.NPV.unit" },
      "NPMunits" : { "$addToSet" : "$KPI.NPM.unit" },
      "GPMunits" : { "$addToSet" : "$KPI.GPM.unit" }
  } },
  { "$project" : {
      "distinct_units" : { "$setUnion" : ["$NPVunits", "$NPMunits", "$GPMunits"] }
  } }
])
You could also structure your data as dynamic attributes. The document above would be recast as something like
{
  "_id" : ObjectId("548b1aee6e444414f00d5cf1"),
  "KPI" : [
    { "type" : "NPV", "value" : 100, "unit" : "kUSD" },
    { "type" : "NPM", "value" : 100, "unit" : "kUSD" },
    { "type" : "GPM", "value" : 50, "unit" : "CAD" }
  ]
}
Querying for distinct units is easy now, whether you want it per type or over all types:
Per type (all types in one query)
db.collection.aggregate([
  { "$unwind" : "$KPI" },
  { "$group" : { "_id" : "$KPI.type", "units" : { "$addToSet" : "$KPI.unit" } } }
])
Over all types
db.collection.distinct("KPI.unit")