I want to extract the addressId for a given houseNumber from a response containing a long array. The array response looks like this (snippet):
: : "footprint":null,
: : "type":null,
: : "addressId":"0011442239",
: : "streetName":"solitudestr.",
: : "streetNrFirstSuffix":null,
: : "streetNrFirst":null,
: : "streetNrLastSuffix":null,
: : "streetNrLast":null,
: : "houseNumber":"25",
: : "houseName":null,
: : "city":"stuttgart",
: : "postcode":"70499",
: : "stateOrProvince":null,
: : "countryName":null,
: : "poBoxNr":null,
: : "poBoxType":null,
: : "attention":null,
: : "geographicAreas":
: : [
: : ],
: : "firstName":null,
: : "lastName":null,
: : "title":null,
: : "region":"BW",
: : "additionalInfo":null,
: : "properties":
: : [
: : ],
: : "extAddressId":null,
: : "entrance":null,
: : "district":null,
: : "addressLine1":null,
: : "addressLine2":null,
: : "addressLine3":null,
: : "addressLine4":null,
: : "companyName":null,
: : "contactName":null,
: : "houseNrExt":null,
: : "derbyStack":false
: },
: {
: : "footprint":null,
: : "type":null,
: : "addressId":"0011442246",
: : "streetName":"solitudestr.",
: : "streetNrFirstSuffix":null,
: : "streetNrFirst":null,
: : "streetNrLastSuffix":null,
: : "streetNrLast":null,
: : "houseNumber":"26",
: : "houseName":null,
: : "city":"stuttgart",
: : "postcode":"70499",
: : "stateOrProvince":null,
: : "countryName":null,
: : "poBoxNr":null,
: : "poBoxType":null,
: : "attention":null,
: : "geographicAreas":
: : [
: : ],
: : "firstName":null,
: : "lastName":null,
: : "title":null,
: : "region":"BW",
: : "additionalInfo":null,
: : "properties":
: : [
: : ],
: : "extAddressId":null,
: : "entrance":null,
: : "district":null,
: : "addressLine1":null,
: : "addressLine2":null,
: : "addressLine3":null,
: : "addressLine4":null,
: : "companyName":null,
: : "contactName":null,
: : "houseNrExt":null,
: : "derbyStack":false
: },
I only show two house numbers in this response as an example, but the original response is bigger.
Q: How can I match the addressId for a specific houseNumber (I have these houseNumbers in my CSV dataset)? I could write a regex which extracts all addressIds, but then I'd have to pick the correct match number in JMeter. However, I cannot assume that the ordering will remain the same across the different environments we test the script against.
I would recommend reconsidering the use of regular expressions to deal with JSON data.
Starting from JMeter 3.0 you have a JSON Path PostProcessor. Using it, you can execute arbitrary JSONPath queries, so extracting the addressId for the given houseNumber is as simple as:
`$..[?(@.houseNumber == '25')].addressId`
You can use a JMeter Variable instead of the hard-coded 25 value, like:
`$..[?(@.houseNumber == '${houseNumber}')].addressId`
If for some reason you have to use JMeter < 3.0, you can still get JSON Path post-processing capabilities using the JSON Path Extractor available via JMeter Plugins.
See the Advanced Usage of the JSON Path Extractor in JMeter article, in particular the Conditional Select chapter, for more information.
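Outside of JMeter, the same filter logic is easy to sanity-check in plain Python before wiring the JSONPath query into a test plan. A minimal sketch, using a trimmed two-record sample shaped like the response above (the field subset is my own abbreviation):

import json

# Trimmed two-record sample shaped like the response in the question.
response = json.loads("""
[
  {"addressId": "0011442239", "streetName": "solitudestr.", "houseNumber": "25"},
  {"addressId": "0011442246", "streetName": "solitudestr.", "houseNumber": "26"}
]
""")

# The same predicate the JSONPath filter expresses: houseNumber == wanted value.
def address_ids_for(house_number, addresses):
    return [a["addressId"] for a in addresses if a.get("houseNumber") == house_number]

print(address_ids_for("25", response))  # ['0011442239']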
You may use a regex that captures the digits after addressId and before a specific houseNumber, with an unrolled tempered greedy token between them (for better efficiency) to make sure the regex engine does not overflow into another record.
"addressId":"(\d+)"(?:[^\n"]*(?:\n(?!: +: +\[)[^\n"]*|"(?!houseNumber")[^\n"]*)*"houseNumber":"25"|$)
See the regex demo (replace 25 with the necessary house number)
Details:
"addressId":" - literal string
(\d+) - Group 1 ($1$ template value) capturing 1+ digits
" - a quote
(?:[^\n"]*(?:\n(?!: +: +\[)[^\n"]*|"(?!houseNumber")[^\n"]*)*"houseNumber":"25"|$) - a non-capturing group with 2 alternatives, one being $ (end of string) or:
[^\n"]* - zero or more chars other than newline and "
(?: - then come 2 alternatives:
\n(?!: +: +\[)[^\n"]* - a newline not followed with : : [ like string and followed with 0+chars other than a newline and "
| - or
"(?!houseNumber")[^\n"]* - a " not followed with houseNumber and followed with 0+chars other than a newline and "
)* - than may repeat 0 or more times
"houseNumber":"25" - hourse number literal string.
Related
I have the following DynamoDB mapping template, to update an existing DynamoDB item:
{
"version" : "2017-02-28",
"operation" : "UpdateItem",
"key" : {
"id": $util.dynamodb.toDynamoDBJson($ctx.args.application.id),
"tenant": $util.dynamodb.toDynamoDBJson($ctx.identity.claims['http://domain/tenant'])
},
"update" : {
"expression" : "SET #sourceUrl = :sourceUrl, #sourceCredential = :sourceCredential, #instanceSize = :instanceSize, #users = :users",
"expressionNames" : {
"#sourceUrl" : "sourceUrl",
"#sourceCredential" : "sourceCredential",
"#instanceSize" : "instanceSize",
"#users" : "users"
},
"expressionValues" : {
":sourceUrl" : $util.dynamodb.toDynamoDbJson($ctx.args.application.sourceUrl),
":sourceCredential" : $util.dynamodb.toDynamoDbJson($ctx.args.application.sourceCredential),
":instanceSize" : $util.dynamodb.toDynamoDbJson($ctx.args.application.instanceSize),
":users" : $util.dynamodb.toDynamoDbJson($ctx.args.application.users)
}
},
"condition" : {
"expression" : "attribute_exists(#id) AND attribute_exists(#tenant)",
"expressionNames" : {
"#id" : "id",
"#tenant" : "tenant"
}
}
}
But I'm getting the following error:
message: "Unable to parse the JSON document: 'Unrecognized token '$util': was expecting ('true', 'false' or 'null')↵ at [Source: (String)"{↵ "version" : "2017-02-28",↵ "operation" : "UpdateItem",↵ "key" : {↵ "id": {"S":"abc-123"},↵ "tenant": {"S":"test"}↵ },↵ "update" : {↵ "expression" : "SET #sourceUrl = :sourceUrl, #sourceCredential = :sourceCredential, #instanceSize = :instanceSize, #users = :users",↵ "expressionNames" : {↵ "#sourceUrl" : "sourceUrl",↵ "#sourceCredential" : "sourceCredential",↵ "#instanceSize" : "instanceSize",↵ "#users" : "users"↵ }"[truncated 400 chars]; line: 17, column: 29]'"
I've tried removing parts, and it seems to be related to the expressionValues, but I can't see anything wrong with the syntax.
It looks like you misspelled the toDynamoDBJson method. Because $util.dynamodb.toDynamoDbJson does not exist, the template engine leaves the reference unresolved in the rendered document, and that literal $util token is exactly what the JSON parser then trips over.
Replace
$util.dynamodb.toDynamoDbJson($ctx.args.application.sourceUrl)
with
$util.dynamodb.toDynamoDBJson($ctx.args.application.sourceUrl)
Note the uppercase B in toDynamoDBJson.
I have one JSON log file and I am looking for a regex to split the events within it. I have written one regex but it is reading all events as one group.
Log file:
[ {
"name" : "CounterpartyNotional",
"type" : "RiskBreakdown",
"duration" : 20848,
"count" : 1,
"average" : 20848.0
}, {
"name" : "CounterpartyPreSettlement",
"type" : "RiskBreakdown",
"duration" : 15370,
"count" : 1,
"average" : 15370.0
} ]
[ {
"name" : "TraderCurrency",
"type" : "Formula",
"duration" : 344,
"count" : 1,
"average" : 344.0
} ]
PS: I will be using this regex for a Splunk tool.
Your regex does not read all events as one group. Above the regex on the linked page it says "2 matches", which means your regex has already split the log; you just need to iterate through the matches (i.e. the events) in the language that runs the regex matching.
For example, in Python 3 (if you don't mind, I've simplified the regex):
import re
log = """[ {
"name" : "CounterpartyNotional",
"type" : "RiskBreakdown",
"duration" : 20848,
"count" : 1,
"average" : 20848.0
}, {
"name" : "CounterpartyPreSettlement",
"type" : "RiskBreakdown",
"duration" : 15370,
"count" : 1,
"average" : 15370.0
} ]
[ {
"name" : "TraderCurrency",
"type" : "Formula",
"duration" : 344,
"count" : 1,
"average" : 344.0
} ]"""
# Each event is a {...} block; this simplified pattern grabs the blocks
# that mention "RiskBreakdown".
event = re.compile(r'{[^}]*?"RiskBreakdown"[^}]*}')
matches = event.findall(log)
print(matches)  # the two RiskBreakdown events
And yes, it's true that this is not valid JSON (two top-level arrays back to back), but it appears that way on the linked page too, so maybe it's a typo.
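If the goal is to pull out every event rather than only the "RiskBreakdown" ones, a more general pattern that matches any brace-delimited block also works here, since the events contain no nested braces. Continuing with the log string from the snippet above:

all_events = re.findall(r'\{[^{}]*\}', log)
print(len(all_events))  # 3 events across the two arrays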
I'm doing an insert from Logstash into ElasticSearch. My problem is that I used a template in ES to lay out the data types, and I am sometimes getting values from Logstash that are null values (or dashes) when I've declared in ES that they should be doubles.
So sometimes, ES is getting a '-' instead of something like "2342", and it is rejecting it and causing an error. Now, if I can replace the '-' with the word 'null', ES works fine.
How do I do this? I assume it works with the ruby filter. I need to be able to replace the '-' fields with null when appropriate.
EDIT:
I was asked for sample configs.
So, for example, say the below config is logstash, which will then send data to ES:
filter {
  if [type] == "transaction" {
    grok {
      match => ["message", "%{BASE16FLOAT:ts}\t%{IP:orig_ip}\t%{NOTSPACE:orig_port}" ]
    }
  }
}
Now my ES template is saying:
"transaction" : {
"properties" :
{
"ts" : {
"format" : "dateOptionalTime",
"type" : "date"
},
"orig_ip" : {
"type" : "ip"
},
"orig_port" : {
"type" : "long"
}
}
}
So if I throw a data set like either of these, it passes:
{"ts" : "123456789.123234", "orig_ip" : "10.0.0.1", "orig_port" : "2342" }
{"ts" : "123456789.123234", "orig_ip" : "10.0.0.1", "orig_port" : null }
I get a success. But, the following [obviously] fails:
{"ts" : "123456789.123234", "orig_ip" : "10.0.0.1", "orig_port" : "-" }
How can I ensure that the "-" (with quotes) gets changed to a null?
If you amend your template by specifying "ignore_malformed": true on your orig_port long field, it should work: the malformed "-" value is then simply not indexed for that field, while the rest of the document is still indexed normally.
"transaction" : {
"properties" :
{
"ts" : {
"format" : "dateOptionalTime",
"type" : "date"
},
"orig_ip" : {
"type" : "ip"
},
"orig_port" : {
"type" : "long"
"ignore_malformed": true <---- add this line
}
}
}
I need to query a MongoDB database for documents whose field x starts with [text. I tried the following:
db.collection.find({x:{$regex:/^[text/}})
which fails because [ is part of the regex syntax. So I've spent some time trying to find how to escape [ from my regex... without any success so far.
Any help appreciated, thanks!
Use a backslash \ in front of the square bracket, as below:
db.collection.find({"x":{"$regex":"\\[text"}})
db.collection.find({"x":{"$regex":"^\\[text"}})
Or, if your client adds another layer of string escaping (so the pattern needs to be escaped twice):
db.collection.find({"x":{"$regex":"\\\\[text"}})
db.collection.find({"x":{"$regex":"^\\\\[text"}})
These queries return the documents whose x starts with [text.
For example, with the following documents:
{ "_id" : ObjectId("55644128dd771680e5e5f094"), "x" : "[text" }
{ "_id" : ObjectId("556448d1dd771680e5e5f099"), "x" : "[text sd asd " }
{ "_id" : ObjectId("55644a06dd771680e5e5f09a"), "x" : "new text" }
using db.collection.find({"x":{"$regex":"\\[text"}}) returns the following results:
{ "_id" : ObjectId("55644128dd771680e5e5f094"), "x" : "[text" }
{ "_id" : ObjectId("556448d1dd771680e5e5f099"), "x" : "[text sd asd " }
I've got a query that's using a regex anchor and it seems to be slower when running an index scan rather than a collection scan.
A bit of background to the question:
I have an MSSQL database with approximately 2.8 million rows in a table. We were running the following query against the table to return approximately 2.6 million results in 23 seconds:
select * from table where column like 'IL%'
So out of curiosity I decided to see if MongoDB could perform this any faster, and on a new test server I created a MongoDB database with one collection (test1) filled with just under 3 million documents. Here's the basic structure of a document in the collection:
> db.test1.findOne()
{
"_id" : 2,
"Other_REV" : "NULL",
"Holidex_Code" : "W8BP0",
"Segment_Name" : "NULL",
"Source" : "Forecast",
"Date_" : ISODate("2009-11-12T11:14:00Z"),
"Rooms_Sold" : 3,
"FB_REV" : "NULL",
"Rate_Code" : "ILM87",
"Export_Date" : ISODate("2010-12-12T11:14:00Z"),
"Rooms_Rev" : 51
}
All of my records have Rate_Code prefixed with IL, and I ran the following query against the database, which took just over 3 seconds:
> db.test1.find({'Rate_Code':{$regex: /^IL/}}).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 2999999,
"nscannedObjects" : 2999999,
"nscanned" : 2999999,
"nscannedObjectsAllPlans" : 2999999,
"nscannedAllPlans" : 2999999,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3398,
"indexBounds" : {
},
"server" : "MONGODB:27017"
}
Out of curiosity I created an index to see if I could speed up the retrieval at all:
> db.test1.ensureIndex({'Rate_Code':1})
However, this actually appears to slow the query down to approximately 6 seconds on average:
> db.test1.find({'Rate_Code':{$regex: /^IL/}}).explain()
{
"cursor" : "BtreeCursor Rate_Code_1",
"isMultiKey" : false,
"n" : 2999999,
"nscannedObjects" : 2999999,
"nscanned" : 2999999,
"nscannedObjectsAllPlans" : 2999999,
"nscannedAllPlans" : 2999999,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 5895,
"indexBounds" : {
"Rate_Code" : [
[
"IL",
"IM"
]
]
},
"server" : "MONGODB:27017"
}
The OS has 2GB of memory and appears to hold both indexes quite comfortably in memory, with no disk usage recorded when the query is run:
> db.test1.stats()
{
"ns" : "purify.test1",
"count" : 2999999,
"size" : 623999808,
"avgObjSize" : 208.0000053333351,
"storageSize" : 790593536,
"numExtents" : 18,
"nindexes" : 2,
"lastExtentSize" : 207732736,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 153218240,
"indexSizes" : {
"_id_" : 83722240,
"Rate_Code_1" : 69496000
},
"ok" : 1
}
I think the slowdown is because MongoDB performs a full scan of the index followed by a full collection scan, as it can't be sure that all my matches are in the index, but I'm not entirely sure that's the case. Is there any way this could be improved for better performance?
Thanks for any help.