I have a log for an app that executes every minute or so, and at the end it reports the number of records that failed, like this: TaskBlahBlah: [0] records failed
Up until now I would simply search through the whole document for the ] records failed string and visually identify the lines with greater than zero records. Is there a way to use regex to search specifically for any non-zero value, so I don't have to go through the list visually and potentially miss something?
I tried applying some regex in Notepad++, but it seems that either I did it wrong or Notepad++ uses a 'different' regex flavor or something.
Thank you.
EDIT: Just to list some of the things I tried:
[1-9][0-9]|[1-9]
\[[1-9][0-9]\] records failed|\[[1-9]\] records failed
For some reason it picks up things like [1] records failed but not [10] records failed
I guess this should get what you want:
/\[[1-9]\d*\] records failed/
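In Notepad++, use the pattern without the surrounding slashes, with Search Mode set to Regular expression. (Older Notepad++ releases had limited regex support, e.g. no alternation, which may be why your attempts behaved oddly.) If you want to sanity-check the pattern outside Notepad++ first, here is a quick sketch in Python:

import re

# The answer's pattern: a non-zero leading digit, any further digits,
# in brackets, followed by the literal text.
pattern = re.compile(r"\[[1-9]\d*\] records failed")

samples = [
    "TaskBlahBlah: [0] records failed",
    "TaskBlahBlah: [1] records failed",
    "TaskBlahBlah: [10] records failed",
]
for line in samples:
    if pattern.search(line):
        print(line)  # prints only the [1] and [10] lines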
I need to parse a large amount of data in a log file; ideally I could do this by splitting the file into a list where each element is an individual log entry.
Every time a log entry is made it is prefixed with a string following this pattern:
"4404: 21:42:07.433 - After this point there could be anything (including new line characters and such). However, as soon as the prefix repeats that indicates a new log entry."
4404 can be any number, but is always followed by a :.
21:42:07.433 is the time: 21 hours, 42 minutes, 7 seconds, 433 milliseconds.
I don't know much about regex, but is it possible to identify this pattern using it?
I figured something like this would work...
"*: [0-24]:[0:60]:[0:60].[0-1000] - *"
However, it just throws an exception and I fear I'm not on the right track at all.
List<string> split_content = Regex.Matches(file_content, @"*: [0-24]:[0:60]:[0:60].[0-1000] - *").Cast<Match>().Select(m => m.Value).ToList();
The following expression would split a string according to your pattern:
\d+: \d{2}:\d{2}:\d{2}\.\d{3}
Add a ^ at the beginning if your delimiting string always starts a line (and use the m (multiline) flag for the regex). Capturing the log chunks with a regex would be more elaborate; I'd suggest just splitting (with Regex.Split) if you have the log content in memory all at once.
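To make that concrete, here is a minimal sketch of the splitting approach using Python's re.split; the C# equivalent is Regex.Split with the same pattern and RegexOptions.Multiline. The sample log content is made up for illustration:

import re

# Hypothetical sample in the format described above.
log = ("4404: 21:42:07.433 - First event, which continues\n"
       "onto a second line\n"
       "4405: 21:42:08.001 - Second event")

# (?m) makes ^ anchor at every line start, so only real prefixes split.
entries = [e.strip(" -\n")
           for e in re.split(r"(?m)^\d+: \d{2}:\d{2}:\d{2}\.\d{3}", log)
           if e.strip()]
print(entries)
# ['First event, which continues\nonto a second line', 'Second event']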
Recently I've been trying to get data from our Bigtable tables using Python. I'm able to connect and authenticate with the API, and I can get some sample data, but when I add a simple rowkey regex filter I get an empty data set, even though I know there should be data there.
All the rowkeys have a format like this:
XY_1234567_Z
where X and Y are capital letters A-Z and Z is a number 0-9. The _1234567_ is the constant that I provide. So basically I need to get all rows whose rowkey contains _1234567_, for example.
This is the regex I use:
^.._1234567_.$
And this is an example of my current code:
...
tbl = instance.table(tableID)
regex = ("^.._" + str(rowID) + "_.$").encode()
fltr = RowKeyRegexFilter(regex)
row_data = tbl.read_rows(filter_=fltr)
print(row_data.rows)
row_data.rows always ends up being an empty dict. I've tried removing encode() and just sending a string, and I've also tried a different regex to be more specific like this "([A-Z][A-Z])_" + str(rowID) + "_([0-9])" which still didn't work. If I try to do row_data.consume_next(), it hangs for a while and eventually gives me a StopIteration error. I've also tested the regex with regex101 and that seems to be fine, so I'm not sure where the issue is.
Looks like you've already figured it out, but please see the documentation for the Python Data API [1, 2]. Calling row_data.consume_next() will fetch the next ReadRowsResponse in the stream and store it in row_data.rows. consume_next() will raise a StopIteration exception when there are no more results to consume. Alternatively, you can call row_data.consume_all() to consume all of the results from the stream (up to an optional limit).
[1] https://googleapis.dev/python/bigtable/latest/data-api.html?highlight=consume_next#stream-many-rows-from-a-table
[2] https://gcloud-python-bigtable.readthedocs.io/en/data-api-complete/row-data.html#gcloud_bigtable.row_data.PartialRowsData
It seems all I needed to do was call row_data.consume_next() to get the set of data I requested. Stupid mistake. The row_data object is initially an empty dict but is populated once consume_next() reads the next item in the stream. This brought up a new issue where consume_next() would hang if read_rows() didn't find any match, but that's for another question.
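For later readers, here is a minimal end-to-end sketch of the fix, assuming the legacy google-cloud-bigtable client whose PartialRowsData API the answer references; the project, instance, and table IDs are placeholders:

from google.cloud import bigtable
from google.cloud.bigtable.row_filters import RowKeyRegexFilter

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
tbl = instance.table("my-table")

# Same rowkey regex as in the question, passed as bytes.
fltr = RowKeyRegexFilter(b"^.._1234567_.$")
row_data = tbl.read_rows(filter_=fltr)

# rows stays an empty dict until the stream is consumed.
row_data.consume_all()
for key in row_data.rows:
    print(key)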
I have a big log file which contains IDs. If an ID is present in the log more than 5 times, it's a success. If it appears 5 times or fewer, I want to know which ID it is.
Ultimately I need a way in Notepad++ to get a list of all IDs ([0-9]{10}) that occur 5 times or fewer.
Is this somehow possible?
Edit: The format of the file is a standard log4j log, so it has a ton of other data. Example (the ID in this case is 12345678901234567):
[08-08-2015 02:08:00] [INFO ] Service [329]: Attempting to substitute message ID with 12345678901234567
[08-08-2015 02:08:00] [DEBUG] ParsedBlock [49]: 3296825 => 12345678901234567
[08-08-2015 02:08:00] [DEBUG] LifeCycle [149]: All messages have not yet been sent. Waiting another 2000 milliseconds. [Send: false]
[08-08-2015 02:08:00] [DEBUG] LifeCycle$5 [326]: Running 5, 2592
Since you're in Notepad++ in the first place, you can take advantage of its functionality beyond Search. Be sure to do all of this in a copy of the file, not the original, since it modifies the file. Since you haven't specified the format of the file, I'm assuming it is just the IDs, one per line.
The first step is to sort the IDs so all the duplicates appear contiguously: Edit -> Line Operations -> Sort Lines As Integers Ascending
Then do this Search/Replace (with Search Mode set to regex):
Search: (\d{17}\r\n)\1{5,}|(\d{17}\r\n)\2*
Replace: $2
You'll be left with only the IDs that occur 5 or fewer times.
Explanation:
The first half of the alternation, (\d{17}\r\n)\1{5,}, matches any ID that repeats 6 or more times; group #2 takes no part in that match, so the replacement deletes the whole run. The second half, (\d{17}\r\n)\2*, matches any other ID, capturing the first instance in group #2. The replacement then puts that single instance back with $2.
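If you'd rather check the logic outside Notepad++, the same alternation can be exercised with Python's re.sub. Note the \n line endings here versus the \r\n Notepad++ needs on Windows, and that an unmatched group in the replacement becomes an empty string (Python 3.5+), mirroring what Notepad++ does with $2:

import re

# Sorted input: one ID seven times (a success), another three times.
ids = "11111111111111111\n" * 7 + "22222222222222222\n" * 3

survivors = re.sub(r"(\d{17}\n)\1{5,}|(\d{17}\n)\2*", r"\2", ids)
print(survivors)  # only 22222222222222222 remains (3 occurrences <= 5)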
Need some help generating an appropriate Splunk query. I have been searching for this but could not come up with a solution.
Currently, I want to ignore all error alerts that are generated for logs containing only the ev31=error; term. If we use NOT ev31=error; in the search query, it also removes results with valid error terms, so the current query fails when a log contains both the error and ev31=error; terms, producing incorrect results.
Can anyone suggest an example query that ignores the ev31=error; term altogether but keeps logs with the error term?
Try including the string you want to ignore in quotes, so your search might look something like index=myIndex NOT "ev31=error"
I am using fluentd, Elasticsearch, and Kibana to organize logs. Unfortunately, these logs are not written in any standard format like Apache's, so I had to come up with the regex for the format myself. I used this site to verify that it works: http://fluentular.herokuapp.com/
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
The format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key      Value
pri      DEBUG
date     24.04.2014
time     16:00:00
subject  SingleActivityStrategy
msg      Start Activitiy 'barbecue' zu verabeiten.
Instead, though, I get this apparent bug where pri is always shortened to DEBU. The same goes for ERROR, which becomes ERRO; only INFO stays INFO. I am not very experienced with regular expressions, and I find it hard to believe that this is a bug, but it confuses me, and any help is greatly appreciated.
I'm not sure I can post the complete config file, because I don't personally own these log files and I am trying to keep this at a level where my boss won't get mad at me for posting sensitive information. Should it definitely be needed, I will post it later on, after asking him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR, or INFO; next the date; next what we call the subject, which is always written in [ ]; and finally just a message.
Here is a link to Fluentular with the format I am using and a test string that produces the right result in Fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
My td-agent format regex cuts off the last letter, although Fluentular says it shouldn't. My fault or a bug?
Here is how the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... where your colon and spaces should be. That is the likely source of the truncation: only two characters (the colon and a space) actually separate the priority from the date, so to satisfy the three dots the engine backtracks and takes the last letter from the pri group, turning DEBUG into DEBU. You should match the \: explicitly, along with each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
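If you want to sanity-check the pattern outside td-agent, here is a quick sketch in Python; note that Python spells named groups (?P<name>...) whereas fluentd's Ruby engine uses (?<name>...):

import re

# The answer's pattern, translated to Python's named-group syntax.
pattern = re.compile(
    r"(?P<pri>INFO|ERROR|DEBUG):\s+"
    r"(?P<date>\d{2}\.\d{2}\.\d{4})\s"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s"
    r"\[(?P<subject>.*)\]\s"
    r"(?P<msg>.*)"
)

line = ("DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] "
        "Start Activitiy 'barbecue' zu verabeiten.")
print(pattern.match(line).groupdict())
# {'pri': 'DEBUG', 'date': '24.04.2014', 'time': '16:00:00',
#  'subject': 'SingleActivityStrategy',
#  'msg': "Start Activitiy 'barbecue' zu verabeiten."}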
I would also look into comparing Logstash and Fluentd. I like Logstash far more because you create Grok filters to match the kind of data you want, and it makes formatting your fields much easier because it provides an abstraction layer; you will essentially get the same data, though.
I would also watch out when using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like RegExr, which gives immediate feedback and lets you set global and multiline matching as well.