CloudWatch Metric Filter for checking JSON key exists - amazon-web-services

I'm trying to come up with a metric filter expression that filters CloudWatch Logs when a special JSON key attribute is present.
The use case is the following: the application does all kinds of logging (in JSON format), and whenever a log entry has a special JSON key (a nested JSON response from a third-party service), I would like to filter on it.
Example logs:
{"severity":"INFO","msg":"EVENT","event":{"key1":"value1"}}
{"severity":"INFO","msg":"FooService responded","response":{"response_code":800}}
Filter patterns that I've tried that don't work:
{ $.response }
{ $.response = *}
{ $.response = "*"}
{ $.response EXISTS }
{ $.response IS TRUE }
{ $.response NOT NULL }
{ $.response != NULL }
Expected filtering result:
{"severity":"INFO","msg":"FooService responded","response":{"response_code":800}}
{ $.response EXISTS } does the opposite of what I expect (it returns the 1st line rather than the 2nd), but I'm not sure how to negate it.
Reference material: Filter and pattern syntax # CloudWatch User Guide

I haven't found a good solution.
But I did find one at least.
If you search for a key being != a specific value, it seems to do a null check on it.
So if you say:
{$.response != "something_no_one_should_have_ever_saved_this_response_as"}
Then you get all entries where response exists in your JSON and is not equal to your string (hopefully all of the valid entries).
Definitely not a clean solution, but it seems to be pretty functional.

I don't have a solution to the task of finding records where a field exists. Indeed, the linked document in the question specifically calls this out as not supported.
However, if we simply reverse the logic, this becomes a more tractable problem. Looking at your data, you want all records where there's a response key, but that could also be stated as all records where there isn't an event key.
This means you could accomplish the task with {$.event NOT EXISTS}. Of course, this becomes more complicated the more types of log messages you get (I had to chain three different NOT EXISTS queries for my use case) but it does solve the problem.
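When several message types have to be excluded, the chained pattern can be generated instead of typed by hand. This is only a minimal sketch: the helper name notExistsPattern and the keys audit and heartbeat are made up for illustration, and it assumes that compound conditions are wrapped in parentheses and joined with && inside a single pair of braces. It is worth testing the resulting pattern against sample events before relying on it.

```javascript
// Hypothetical helper (not part of any AWS SDK): builds a CloudWatch Logs
// JSON filter pattern that matches events in which none of the given
// top-level keys are present.
function notExistsPattern(keys) {
  if (!Array.isArray(keys) || keys.length === 0) {
    throw new Error('at least one key is required');
  }
  // One "$.key NOT EXISTS" clause per key, parenthesised so they can be AND-ed.
  const clauses = keys.map((key) => `($.${key} NOT EXISTS)`);
  return clauses.length === 1
    ? `{ $.${keys[0]} NOT EXISTS }`
    : `{ ${clauses.join(' && ')} }`;
}

console.log(notExistsPattern(['event']));
// 'audit' and 'heartbeat' are made-up keys standing in for other message types.
console.log(notExistsPattern(['event', 'audit', 'heartbeat']));
```

With a single key this produces the same pattern as above; with more keys every clause has to hold for an event to match.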

Related

Cannot create index on non-empty table

I'm currently using AWS Lambda (NodeJS) with AWS QLDB.
The scenario is like this.
I created the first table and its indexes when I deployed the service. My problem is that when I later need to add a new table and its indexes, it can't create the index because there's an existing table.
My workaround, to be able to create a new table even if there's an existing table in my ledger, is to query the list of tables I have.
const getTables = async (transactionExecutor: TransactionExecutor) => {
  const statement = `SELECT name FROM information_schema.user_tables`;
  return await transactionExecutor.execute(statement);
};
Then I have this condition to check if the table is already existing
const tables = JSON.stringify(result.getResultList());
if (
  !JSON.parse(tables).some((object): boolean => object.name === process.env.TABLE_NAME)
) {
  console.log('TABLE A NOT EXISTING');
  await createTable(transactionExecutor, process.env.TABLE_NAME);
}
if (
  !JSON.parse(tables).some(
    (object): boolean => object.name === process.env.TABLE_NAME_1,
  )
) {
  console.log('TABLE B NOT EXISTING');
  await createTable(transactionExecutor, process.env.TABLE_NAME_1);
}
I don't know how to do it with indexes, I tried using SQL commands in QLDB but it's not working.
I hope you can help me.
Thank you
I'm not quite sure what your question is (the post title and body hint at different things), but I'm going to do my best to answer.
First, QLDB stores data in Ion, not JSON. So, please use the Ion APIs to parse data and not the JSON ones. The reason your code works at all is because Ion is a superset of JSON and the result set doesn't include types that are unknown to JSON. So, for example, if the result set was changed to include an Ion Timestamp, then your code would break.
Next, actually getting a list of tables has first class support in the driver. Simply use driver.getTableNames.
Third, I think you have a question "can I add an index to a non-empty table?". The answer is "no". This is planned functionality and I will update this answer when it is available. UPDATE: Now you can! https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-qldb-launches-index-improvements/
Finally, I think you're also asking if there is a way to list indexes on a table in the same way as you can list tables in a ledger. The answer to that is 'yes'. The documents returned in information_schema.user_tables look like this:
{
  tableId:"...",
  name:"THE_TABLE_NAME",
  indexes:[
    {
      expr:"[THE_FIELD_BEING_INDEXED]"
    }
  ],
  status:"ACTIVE"
}
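Given documents of that shape, checking whether an index already exists comes down to a lookup. The following is a rough sketch, not driver code: it assumes the rows of information_schema.user_tables have already been converted into plain JavaScript objects (ideally with the Ion APIs, as mentioned above), and the helper name hasIndex and the sample data are invented for illustration.

```javascript
// Assumption: `tableDocs` are the rows of information_schema.user_tables,
// already converted to plain JavaScript objects shaped like the document above.
function hasIndex(tableDocs, tableName, fieldName) {
  const table = tableDocs.find((doc) => doc.name === tableName);
  if (!table) {
    return false; // the table does not exist, so neither does the index
  }
  // Each index entry stores the indexed field as "[fieldName]".
  return (table.indexes || []).some((index) => index.expr === `[${fieldName}]`);
}

// Made-up sample data for illustration only.
const sampleTables = [
  { tableId: 'abc', name: 'Orders', indexes: [{ expr: '[orderId]' }], status: 'ACTIVE' },
  { tableId: 'def', name: 'Customers', indexes: [], status: 'ACTIVE' },
];

console.log(hasIndex(sampleTables, 'Orders', 'orderId'));  // true
console.log(hasIndex(sampleTables, 'Customers', 'email')); // false
```

A deployment script could call such a check before issuing a CREATE INDEX statement, in the same way the question checks for table names before creating tables.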

Logstash conditional check for nil/null value in a field

I have JSON data in which some fields are null (e.g. "location": null). I need to check whether this field is null and take some action.
I have tried if [location] == 'null' { do something }, but it fails; I have also tried if [location] == 'nill' { do something }.
I found some related links that suggest checking whether the field exists with if [location], but this can't be used in my case.
Please help me solve this. Thanks in advance.
You will need to use the ruby filter to check if the field has a null value.
The following filter checks if the field is null and then, if true, adds a tag to the event.
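ruby {
  # nil? is also true when the field is missing entirely;
  # event.tag appends to the existing tags instead of overwriting them
  code => "event.tag('null-value') if event.get('location').nil?"
}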
You can then use the tag normally in logstash to do what you want, for example
if "null-value" in [tags] { do something }

Use a view to get the same suffix like maindomain name

I have some documents. How can I use a view to get the documents whose email addresses share the same domain name, e.g. all the documents with #gmail.com or #yahoo.com? Can endkey get those results?
Here is the map function I wrote for a view, but I do not think it is a good idea:
function(doc) {
  for (var i in doc.emails) {
    if (doc.emails[i].emailAddress.toLowerCase().indexOf("#yahoo.ibm.com") != -1) {
      emit(doc.emails[i].emailAddress.toLowerCase(), doc);
    }
  }
}
To make things clear, the endkey parameter does not look for a suffix. startkey and endkey are the limits of the range of keys to return. For example, you could get the documents with ids 1 to 10 using startkey="1"&endkey="10".
In your case, you want to make a view that will group your documents by their domain name. I created a design document with a byDomain view. The mapping function looks like this :
function(doc){
  if(doc.email){ //I used the document's property email for my view.
    //Now, we will emit an array key. The first value will be the domain.
    //To get the domain, we split the string with the character '#' and we take what comes after.
    //Feel free to add more validations
    //The second key will be the document id. We don't emit any values. It's faster to simply add
    //the includes_docs query parameter.
    emit([doc.email.split('#')[1],doc._id]);
  }
}
Let's query all my documents to show you what I have
Request : http://localhost:5984/test/_all_docs?include_docs=true
Response:
{"total_rows":4,"offset":0,"rows":[
{"id":"7f34ec3b9332ab4e555bfca202000e5f","key":"7f34ec3b9332ab4e555bfca202000e5f","value":{"rev":"1-c84cf3bf33e1d853f99a4a5cb0a4af74"},"doc":{"_id":"7f34ec3b9332ab4e555bfca202000e5f","_rev":"1-c84cf3bf33e1d853f99a4a5cb0a4af74","email":"steve#gmail.com"}},
{"id":"7f34ec3b9332ab4e555bfca202001101","key":"7f34ec3b9332ab4e555bfca202001101","value":{"rev":"1-53a8a9f2a24d812fe3c98ad0fe020197"},"doc":{"_id":"7f34ec3b9332ab4e555bfca202001101","_rev":"1-53a8a9f2a24d812fe3c98ad0fe020197","email":"foo#example.com"}},
{"id":"7f34ec3b9332ab4e555bfca202001b02","key":"7f34ec3b9332ab4e555bfca202001b02","value":{"rev":"1-cccec02fe7172fb637ac430f0dd25fa2"},"doc":{"_id":"7f34ec3b9332ab4e555bfca202001b02","_rev":"1-cccec02fe7172fb637ac430f0dd25fa2","email":"bar#gmail.com"}},
{"id":"_design/emails","key":"_design/emails","value":{"rev":"4-76785063c7dbeec96c495db76a8faded"},"doc":{"_id":"_design/emails","_rev":"4-76785063c7dbeec96c495db76a8faded","views":{"byDomain":{"map":"\t\tfunction(doc){\n\t\t\tif(doc.email){ //I used the document's property email for my view.\n\t\t\t\t//Now, we will emit an array key. The first value will be the domain.\n\t\t\t\t//To get the domain, we split the string with the character '#' and we take what comes after.\n\t\t\t\t//Feel free to add more validations\n\t\t\t\t//The second key will be the document id. We don't emit any values. It's faster to simply add\n\t\t\t\t//the includes_docs query parameter.\n\t\t\t\temit([doc.email.split('#')[1],doc._id]); \n\t\t\t}\n\t\t}"}},"language":"javascript"}}
]}
As you can see, I have a few minimal documents with the property "email" set.
Let's query my view without any parameters
Request : http://localhost:5984/test/_design/emails/_view/byDomain
Response :
{"total_rows":3,"offset":0,"rows":[
{"id":"7f34ec3b9332ab4e555bfca202001101","key":["example.com","7f34ec3b9332ab4e555bfca202001101"],"value":null},
{"id":"7f34ec3b9332ab4e555bfca202000e5f","key":["gmail.com","7f34ec3b9332ab4e555bfca202000e5f"],"value":null},
{"id":"7f34ec3b9332ab4e555bfca202001b02","key":["gmail.com","7f34ec3b9332ab4e555bfca202001b02"],"value":null}
]}
Now let's query only the documents that have the gmail.com domain.
Request : http://localhost:5984/test/_design/emails/_view/byDomain?startkey=["gmail.com"]&endkey=["gmail.com","\ufff0"]
Result :
{"total_rows":3,"offset":1,"rows":[
{"id":"7f34ec3b9332ab4e555bfca202000e5f","key":["gmail.com","7f34ec3b9332ab4e555bfca202000e5f"],"value":null},
{"id":"7f34ec3b9332ab4e555bfca202001b02","key":["gmail.com","7f34ec3b9332ab4e555bfca202001b02"],"value":null}
]}
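The startkey and endkey values are JSON, so when the request is built in code it is safer to serialize and URL-encode them than to concatenate strings by hand. Here is a small sketch of that, assuming the view emits [domain, docId] array keys as byDomain does; the function name domainRangeQuery is made up for illustration.

```javascript
// Builds the query URL for "all rows whose key starts with [domain]".
// Assumption: the view emits [domain, docId] array keys, as byDomain does above.
function domainRangeQuery(viewUrl, domain) {
  const startkey = JSON.stringify([domain]);
  // "\ufff0" is a high code point used as an upper bound for the second key element.
  const endkey = JSON.stringify([domain, '\ufff0']);
  return `${viewUrl}?startkey=${encodeURIComponent(startkey)}` +
    `&endkey=${encodeURIComponent(endkey)}`;
}

const url = domainRangeQuery(
  'http://localhost:5984/test/_design/emails/_view/byDomain',
  'gmail.com'
);
console.log(url);
```

The same helper works for any other domain, for example domainRangeQuery(viewUrl, 'example.com').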
You can just use a simple map function for this:
function (doc) {
  var domain = doc.email.split('#').pop();
  // this logic is fairly hack-ish, you may want to be more sophisticated
  emit(domain);
}
Then you can simply pass key="gmail.com" (the key has to be valid JSON, so the quotes are required) to get the results you want from the view. I would also add include_docs=true instead of emitting the entire document as your value.
You can read more about views in the official CouchDB docs.

How to do wildcard search using structured prefix operator with AWS CloudSearch

I'm currently migrating to the 2013 CloudSearch API (from the 2011 API). Previously, I had been using a wildcard prefix with my searches, like this:
bq=(and 'first secon*')
My queries sometimes include facet options, which is why I use the boolean query syntax and not the simple version.
I've created a new cloudsearch instance using the 2013 engine and indexed it. The bq parameter is gone now, so I have to use the q parameter with the q.parser=structured parameter to get the same functionality. When I query with something like this:
q.parser=simple&q=first secon*
...I get back a load of results. But when I query with this:
q.parser=structured&q=(prefix 'first secon')
...I get no hits. I don't get an error, just no results found. Am I doing something wrong?
I've just realized that if I do a prefix search for the word firs with the 2013 API, the prefix search seems to be working. But if I have any more than a single term in the query e.g. first secon then the prefix search does not work. So how is this accomplished using the structured prefix operator?
You need to specify the prefix operator for each separate query term, eg:
q=(or (prefix 'firs') (prefix 'secon'))&q.parser=structured
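If the search terms come from user input, the structured query can be assembled with one prefix clause per term. The helper below is only a sketch: its name, buildPrefixQuery, is made up, and it assumes that backslashes and single quotes inside a term need to be backslash-escaped. Passing 'and' as the operator gives the behaviour of the old bq=(and ...) query from the question.

```javascript
// Hypothetical helper: turns free-text input into a structured-parser query
// with one prefix clause per term.
function buildPrefixQuery(userInput, operator = 'or') {
  const terms = userInput.trim().split(/\s+/).filter(Boolean);
  if (terms.length === 0) {
    throw new Error('empty query');
  }
  // Escape backslashes and single quotes so each term stays inside its string literal.
  const clauses = terms.map(
    (term) => `(prefix '${term.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}')`
  );
  // A single term needs no surrounding boolean operator.
  return clauses.length === 1 ? clauses[0] : `(${operator} ${clauses.join(' ')})`;
}

console.log(buildPrefixQuery('firs secon'));         // (or (prefix 'firs') (prefix 'secon'))
console.log(buildPrefixQuery('first secon', 'and')); // (and (prefix 'first') (prefix 'secon'))
```

The returned string is meant to be sent as the q parameter together with q.parser=structured.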
If someone is looking for JS code to solve this issue: what you need to do is split the user input on spaces and store the words in an array. Then join the words you want to query back together with pipes.
// userInput is the raw search string typed by the user
function prefixSearch(cloudsearch, userInput) {
  // Split on whitespace and drop empty chunks
  const words = userInput.trim().split(/\s+/).filter(Boolean);
  // Match each word either as a prefix or as an exact term
  const chunks = words.map(word => `${word}* | ${word}`);
  const params = {
    query: chunks.join(' | ')
  };
  return new Promise((resolve, reject) => {
    cloudsearch.search(params, function(err, data) {
      if (err) {
        reject(err);
      } else {
        resolve(data);
      }
    });
  });
}

How can I do a "where in" type query using ember-data

How can I perform a where-in type query using ember-data?
Say I have a list of tags - how can I use the store to query the API to get all relevant records where they have one of the tags present?
Something like this:
return this.store.find('tags', {
name: {
"in": ['tag1', 'tag2', 'tag3']
}
})
There isn't built-in support for something like that. And I don't think it's needed.
The result that you are after can be obtained in two steps.
return this.store.find('posts'); // I guess its a blog
and then in your controller you use a computed property
filteredPosts: Ember.computed('model', function() {
  var tags = ['tag1', 'tag2', 'tag3'];
  return this.get('model').filter(function(post) {
    // postHasOneOfTags is a placeholder for your own "post has one of tags" check
    return postHasOneOfTags(post, tags);
  });
})
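The "post has one of the tags" check itself can be written as a plain function. This is a minimal sketch that assumes each post exposes its tags as an array of strings, which the question does not actually state; the name hasAnyTag and the sample posts are made up.

```javascript
// Returns true when the post carries at least one of the wanted tags.
// Assumption: each post exposes its tags as an array of strings.
function hasAnyTag(postTags, wantedTags) {
  const wanted = new Set(wantedTags); // Set lookup avoids a nested loop
  return postTags.some((tag) => wanted.has(tag));
}

// Made-up sample posts for illustration only.
const posts = [
  { title: 'A', tags: ['tag1', 'other'] },
  { title: 'B', tags: ['misc'] },
];
const filtered = posts.filter((post) => hasAnyTag(post.tags, ['tag1', 'tag2', 'tag3']));
console.log(filtered.map((post) => post.title)); // [ 'A' ]
```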
Update: What if there are tens of thousands of tags?!
Another option is to send the list of tags as a single argument to the back end. You'll have to do a bit of data processing before sending the request and before querying.
return this.store.find('tags', {
  tags: ['tag1', 'tag2', 'tag3'].join(', ')
})
In your API you'll know that the tags argument needs to be converted into an array before querying the DB.
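On the API side, turning the argument back into an array can look roughly like this. It is a sketch only: the answer does not say what the back end is written in, so JavaScript and the helper name parseTagsParam are assumptions, and tag names that themselves contain commas would need a different encoding.

```javascript
// Hypothetical server-side helper: converts the comma-separated `tags`
// query argument back into an array of tag names.
function parseTagsParam(raw) {
  if (typeof raw !== 'string') {
    return []; // argument missing or not a string
  }
  return raw
    .split(',')
    .map((tag) => tag.trim())          // drop the spaces added by join(', ')
    .filter((tag) => tag.length > 0);  // ignore empty entries
}

console.log(parseTagsParam('tag1, tag2, tag3')); // [ 'tag1', 'tag2', 'tag3' ]
```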
So, this is better because you avoid the very expensive nested loop caused by the use of filter. (expensive !== bad, it has its benefits)
Having tens of thousands of tags is a concern in itself: if they are all going to be available in your Ember app, they'll have a big memory footprint, and something much more advanced may be needed in terms of app design.