Searching for URL paths containing "/" in Kibana/Elasticsearch - regex

I'm trying to write a regex in Kibana (v 7.9.1) and I want to get all paths that are like /rest/requirements/<ID_HERE>/ and nothing else at the end. I would expect that the following would work:
"/rest/requirements/[0-9]*/"
After several tests, I noticed that the following query doesn't work either: "/rest/requirements/"
While if I do .*requirements.*, for example, it works.
So there is something with "/" that I cannot understand. I tried the following as well without success:
.rest.requirements.*
//rest//requirements.*
\/rest\/requirements.*
\\rest\\requirements.*
Btw, I am using the filter as Query DSL as shown below.

Problem solved:
For some reason, the specific regex I mentioned in the question was not working with the JSON I was passing, although it was working for other types. No idea why yet.
The following, however, worked just fine:
{
"regexp": {
"path.keyword": ".*/requirements/[0-9]*/"
}
}
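One possible explanation, offered as an assumption rather than a confirmed diagnosis: Lucene regexp patterns are anchored, so the pattern has to match the entire indexed term, and an analyzed text field is split into tokens that no longer contain "/", while the keyword sub-field keeps the whole path. The Python sketch below only imitates the anchoring with re.fullmatch; it does not talk to Elasticsearch, and the sample paths and the helper name are made up for illustration.

```python
import re

# Hypothetical sample values for the path.keyword field.
paths = [
    "/rest/requirements/123/",
    "/api/rest/requirements/123/",
    "/rest/requirements/123/edit",
]


def anchored_match(pattern, value):
    """Imitate an anchored regexp query: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None


strict = r"/rest/requirements/[0-9]*/"
loose = r".*/requirements/[0-9]*/"

strict_hits = [p for p in paths if anchored_match(strict, p)]
loose_hits = [p for p in paths if anchored_match(loose, p)]

print(strict_hits)
print(loose_hits)
```

If the stored paths carry a prefix (as the second sample does), only the pattern with the leading .* can match them, which would be consistent with the query that worked.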

How to extract all matches with regex in VSCode snippets and return them in a specific format?

I am trying to create a snippet that extract certain elements on the path to the file. A path may look something like this:
/src/routes/server/[region]/[serverno]/user/[id]/profile
I want the snippet to give the following output:
region, serverno, id
I have tried multiple ways to do it, but it always requires me to specify which element I want, and I cannot make it so that it matches ALL elements.
"${TM_FILEPATH/.*(?<=\\[)(.*?)(?=])(.*?)(?<=\\[)(.*?)(?=]).*/$1, $3/g}"
would produce:
serverno, id
given the example above.
This is a poor attempt at solving this issue, however this only works for the last 2 elements wrapped in [ ].
Is there a way to do this with VSCode snippets or is it necessary to use something else?
Try these:
"${TM_FILEPATH/[^\\[]*\\[([^\\]]+)\\][^\\[]*/$1, /g}"
or
"${TM_FILEPATH/[^\\[]*\\[([^\\]]+)\\][^\\[]*\\[([^\\]]+)\\][^\\[]*\\[([^\\]]+)\\][^\\[]*/$1, $2, $3/}"

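For comparison with the snippet answers above, here is a minimal Python sketch of the same idea: capture every bracketed segment of the path, however many there are, and join them. It is not VSCode snippet syntax; it only shows what the regex itself does, using the example path from the question.

```python
import re

path = "/src/routes/server/[region]/[serverno]/user/[id]/profile"

# Collect every bracketed segment, regardless of how many the path contains.
names = re.findall(r"\[([^\]]+)\]", path)
joined = ", ".join(names)

# Python equivalent of the first snippet's global replace: each match is
# replaced by its captured name followed by ", ".
replaced = re.sub(r"[^\[]*\[([^\]]+)\][^\[]*", r"\1, ", path)

print(joined)
print(repr(replaced))
```

In this Python version the global replace leaves a trailing ", " after the last name, so the first snippet form may need that separator trimmed; the join-based version does not have this issue.
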
Aws CloudWatch filtering compact json with # in it

We use Serilog to output from our .NET Core app. We are using compact JSON to keep size down. In compact mode it seems to prefix the error key with a # sign:
"#l": "Warning"
I can’t seem to get a filter working; it either returns no results or reports an error. I’ve tried many things, but I’m sure this should work:
{ $.#l = "Warning" }
Can anyone suggest where I’m going wrong?
I don't think you can use # in the selector. From the docs:
Property selectors are alphanumeric strings that also support '-' and '_' characters.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html#extract-json-log-event-values
One way to get around this would be to match the line as if it's not part of json.
For example, if your log line looks like this:
"#l": "Warning"
you could filter it out with:
[key="#l", colon, value=Warning]
I had the same issue. Most likely you used Serilog.Formatting.Compact.CompactJsonFormatter, as I did.
Implementing your own ITextFormatter is a workaround, because prefixes like # or $ are hardcoded inside CompactJsonFormatter.
I used CompactJsonFormatter as a basis, replaced its uses of # and $ with s_, and it works.

Parsing Javascript with Python

In one of my scripts I use urllib2 and BeautifulSoup to parse an HTML page and read a <script> tag.
This is what I get :
<script>
var x_data = {
logged: logged,
lengthcarrousel: 2,
products : [
{
"serial" : "106541823"
...
</script>
My goal is to read the JSON in the x_data variable and I do not know how to do it properly.
I thought of:
Convert to string and remove the first chars to the { and same for last }
Use Regular Expression with something like '{.*}' and take the first group
Something else ?
I don't know if these are efficient and if there is some other ways to do it in a nice way.
Do you think a method is preferable to the other ? any method I may not be aware of ?
Thank you in advance for any advice.
EDIT :
Following advice I get the Regexp solution but I can't search in multiple lines despite using re.MULTILINE :
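import re
# Triple quotes are needed for a string literal that spans several lines.
string1 = '''<script>
var x_data = {
logged: logged,
lengthcarrousel: 2,
products : [
{
"serial" : "106541823"}
]
};
</script>'''
# re.MULTILINE only affects ^ and $; "." still does not match newlines.
p = re.compile(r'\{.*\};', re.MULTILINE)
m = p.search(string1)
if m:
    print(m.group(0))
else:
    print("Error !")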
I always got an "Error !".
EDIT2 :
Works well with re.DOTALL.
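A minimal sketch of why the flag matters: re.MULTILINE only changes where ^ and $ match, while re.DOTALL lets "." match newlines. The sample string is a shortened, hypothetical stand-in for the script content.

```python
import re

# Hypothetical, shortened stand-in for the <script> content in the question.
script = 'var x_data = {\n"serial" : "106541823"\n};'

pattern = r"\{.*\};"

# re.MULTILINE: "." still stops at a newline, so "{" and "};" on
# different lines can never be part of the same match.
multiline_match = re.search(pattern, script, re.MULTILINE)

# re.DOTALL: "." also matches newlines, so the match can span lines.
dotall_match = re.search(pattern, script, re.DOTALL)

print(multiline_match)
print(dotall_match.group(0))
```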
I think these methods are essentially the same in terms of elegance and performance (using {.*} may be slightly better because .* is greedy, i.e. there will be almost no backtracking, and because it seems to me more "forgiving" for different JS code formatting nuances). What you may be more interested in is this: https://docs.python.org/3.6/library/json.html.
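As a hedged illustration of combining the regex with the json module: the x_data object as a whole is not strict JSON (unquoted keys, a bare logged identifier), so json.loads would reject it, but the products array in the question's sample is. The sketch assumes that array is valid JSON and contains no nested "]"; both assumptions would need checking against the real page.

```python
import json
import re

script = """<script>
var x_data = {
logged: logged,
lengthcarrousel: 2,
products : [
{
"serial" : "106541823"}
]
};
</script>"""

# Assumption: the text after "products :" up to the first "]" is strict JSON.
match = re.search(r"products\s*:\s*(\[.*?\])", script, re.DOTALL)
products = json.loads(match.group(1)) if match else []

print(products)
```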
If it always looks exactly like this, then you can hack a solution like the one you proposed, based on it looking exactly like this.
Because programmers do everything in code, I suspect in practice it will not always look exactly like this, and then any hacky solution will be fragile and will fail at unexpected (read "impossibly inconvenient") moments. (Regex is known to be hacky when it comes to parsing code).
If you want to do this right, you will need to get a real JavaScript parser, apply it to the code fragment defined by the script tag content, to produce an AST, then search the AST for JavaScript nested structures that happen to look like JSON, and take the content of that tree, prettyprinted.
Even this will be fragile in the face of a programmer who assembles the JSON fragment using JavaScript assignment statements. You can handle this by computing data flow, and discovering sets of code that happen to assemble JSON code. This is rather a lot of work.
So you get to decide what the limits on your solution will be, and then accept the consequences when somebody you don't control does something random.

Grails Filter regexs

I am new to Grails and so far I have only been able to use simple filters. I want to use filters in an efficient manner.
(I am using Grails 2.4.3, with jdk_1.6)
I want to create a filter to allow accessing AppName/ and AppName/user/login and I could not get it right! I wanted to use regex but I am not getting it right!
I tried this:
loggedInOnly(uri:'/**',uriExclude :"*.css|*.js|*image*|/|/user/login"){
    before = {
        println "### ###### #### #"
    }
}
I also tried to reverse the regex parameter, but I am having no luck! I searched all of Google but I could not find a single thread explaining how filter regexes work!
I know I could create an xxxx(controller:'*', action:'*') filter and then use the controllerName and actionName parameters to check! But there has got to be a better way!
My question in a nutshell: How does regex work in filters?
First, take a closer look at the documentation. Notice that uri and uriExclude are Ant paths and not regular expressions. Keeping that in mind, if you look at how Ant paths function you will see they aren't capable of logical ORs.
So, with all of that in mind, it's back to enabling regex and using the find attribute instead.
loggedInOnly(regex: true, find: '(.*.css|.*.js|.*image.*|\\/|\\/user\\/login)', invert: true){
    before = {
        ...
    }
}
Notice I have used invert to have this filter apply to anything that doesn't match any of the patterns inside the find. Also, I wrote this off the top of my head, so you may have to spot check the regular expression in your application (I did check it using the Groovy web console to make sure I didn't really mess up the syntax).
Hope this helps.

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, Elasticsearch and Kibana to organize logs. Unfortunately, these logs are not written using any standard format like Apache's, so I had to come up with the format regex myself. I used this site to verify that it works: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead though, I have this ?bug? that pri is always shortened to DEBU. Same for ERROR which becomes ERRO, only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, still it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file because I don't personally own these log files and I am trying to keep it on a level that my boss won't get mad at me for posting sensitive information, but should it definitely be needed, I will post them later on after having asked him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date, next what we call the subject, which is always written in [ ], and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your spaces and colon should be. I'm not too sure why this works in Fluentular, but you should have matched the \: explicitly, as well as each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
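A likely reason for the truncated pri, shown here as a Python sketch rather than a verified td-agent diagnosis: [DEBUG] is a character class that matches a single letter, not the word, and the ... that follows must consume exactly three characters. In the sample line only ": " (two characters) separates DEBUG from the date, so the engine backtracks and hands the final G to the dots. Ruby's (?<name>...) groups are written as (?P<name>...) in Python; the variable names are illustrative.

```python
import re

line = ("DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] "
        "Start Activitiy 'barbecue' zu verabeiten.")

# Shape of the original format: character classes plus three wildcard dots.
broken = re.compile(
    r"(?P<pri>([INFO]|[DEBUG]|[ERROR])+)...(?P<date>\d{2}\.\d{2}\.\d{4})"
)

# Shape of the corrected format: literal alternatives, explicit colon and spaces.
fixed = re.compile(
    r"(?P<pri>INFO|ERROR|DEBUG):\s+(?P<date>\d{2}\.\d{2}\.\d{4})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s\[(?P<subject>.*)\]\s(?P<msg>.*)"
)

broken_pri = broken.match(line).group("pri")
fixed_fields = fixed.match(line).groupdict()

print(broken_pri)
print(fixed_fields)
```

On the sample line the first pattern yields DEBU for pri, while the second yields DEBUG together with the date, time, subject and msg fields.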
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.