Simple concatenated json line breaker in Splunk - line-breaks

I know this is probably simple, but for some reason I am unable to get a line breaker working in Splunk. I am fetching a data source from AWS S3, and multiple events in JSON format are concatenated, e.g.
{"key":"value", {"composite":"result"}}{"something":"else"}
So LINE_BREAKER should match on }{ with the left brace included.
I have SHOULD_LINEMERGE=false and then LINE_BREAKER=(\{.+\})\{ but I lose the closing bracket. The }{ don't have any characters between them (not even a newline). What is the best way to split these?

The LINE_BREAKER attribute requires a capture group, but discards the text that matches the capture group. The solution is to be more creative with the regex.
LINE_BREAKER=\}()\{
Empty capture groups are allowed.
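To see why the empty capture group keeps both braces, here is a rough Python model of the behaviour described above. It is only a sketch: break_events is a hypothetical helper, not a Splunk API, and it assumes the simplified rule that text matched by group 1 is discarded, text before it ends one event, and text after it starts the next.

```python
import re

def break_events(raw, line_breaker):
    """Simplified model of LINE_BREAKER (an assumption, not Splunk code):
    group 1 is discarded, the text before it ends an event,
    and the text after it starts the next event."""
    events, start = [], 0
    for m in re.finditer(line_breaker, raw):
        events.append(raw[start:m.start(1)])  # event ends where group 1 begins
        start = m.end(1)                      # next event starts where group 1 ends
    events.append(raw[start:])
    return events

raw = '{"key":"value", {"composite":"result"}}{"something":"else"}'
events = break_events(raw, r'\}()\{')
for event in events:
    print(event)
```

Because group 1 is empty, nothing is thrown away: the closing brace stays with the first event and the opening brace stays with the second.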
Your comments confuse matters. Are events separated by }{ or by {"key"? The value of LINE_BREAKER should be set to whatever separates events. Once you've established that then you can address the TRUNCATE setting.

Related

Excluding 3dots additional to other characters with regex in a string

I have such an http-url detector regex:
(?:http|https)(?::\/{2}[\w]+)(?:[\/|\.]?)(?:[^\s<"]*)
It works pretty well for the following url representation:
http://www.acer.com/clearfi/download/
What modification can I make to extract
http://schemas.microsoft.com/office/word/2003/wordml2450
from
Huanghhttp://schemas.microsoft.com/office/word/2003/wordml2450...)()()()()()
?
You can modify it to capture:
group of http stuff
followed by (group of) subdomain stuff
followed by as many as possible groups of:
one point or slash
followed by a group of characters (non-point, non-space, non-", non-<)
(?:http|https)(?::\/{2}[\w]+)([\/|\.][^\s<"\.]+)*
I made capturing groups to visualize the results
I've changed your expression here and there: (.*)(https?:\/{2}[\w]+[\/|\.]?[^\s<"]*)(\.{3}.*) and take only the second capturing group from it. See example here: https://regex101.com/r/0viPC5/2
This expression probably can be simplified further but I don't know your exact input and search criteria so let's stick with what you already wrote.
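As a quick check of the first suggestion, here is a minimal Python sketch. The helper name first_url is made up for illustration, and the pattern is written with the ":" after the scheme so that it actually matches "http://".

```python
import re

# Scheme, "://" plus the first host label, then repeated
# "dot-or-slash followed by non-dot, non-space, non-<, non-quote characters".
URL_RE = re.compile(r'(?:http|https)(?::\/{2}[\w]+)([\/|\.][^\s<"\.]+)*')

def first_url(text):
    """Return the first URL-like match in text, or None (hypothetical helper)."""
    m = URL_RE.search(text)
    return m.group(0) if m else None

text = 'Huanghhttp://schemas.microsoft.com/office/word/2003/wordml2450...)()()()()()'
print(first_url(text))
print(first_url('http://www.acer.com/clearfi/download/'))
```

The match stops at the trailing "..." because each repeated group needs at least one non-dot character after its leading dot or slash.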

Regex for capture fields from multiple format data logs

I have a regex for capturing certain fields from logs. It works fine for one type of log entry, but sometimes I get a slightly different format and it captures a different field.
Demo
In the given example, there are 2 different log entries. The regex works fine for the first entry.
In the second entry, there are 3 different sets
1. 314624K->314624K(419456K)
2. 9862316K->9542223K(12478080K)
3. 12261641K->11966292K(12478080K)
My regex skips the 1st one and captures the next two. I want to capture the first two occurrences.
My regex:
(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)
I think the problem is that I am using ".*?". I have tried replacing it with \s+, but that does not work either.

Capture group that captures an entire string minus a section that matches a pattern

I'm not sure if this is possible, but I figured I'd ask anyways. What I need to do is effectively create a search/replace, but without using the regex s/pattern1/pattern2/ syntax as it is not directly exposed to me.
Is it possible to create a capture group that would take an image path with the image size before the extension, and remove the image size?
For instance convert http://example.com/path/to/image/filename-200x200.jpg to http://example.com/path/to/image/filename.jpg using only a capture group and no search/replace bits.
I'm asking as the software I'm working in does not currently have a search/replace functionality.
It's somewhat possible. There's no built-in capability for a match to be something other than a continuous segment of the source text, but you can work around that.
One approach you might consider is the use of non-capturing groups and concatenation. In regex, groups beginning with ?: aren't captured as matches.
For example, given the regex (A)(?:B)(C) and the string "ABC", the result would be:
1. "A"
2. "C"
In your case, then, you could capture around the part you want to ignore, then concatenate the parts you want.
Given the string you provided, http://example.com/path/to/image/filename-200x200.jpg, the regex (.+)(?:-200x200)(.+) returns:
1. "http://example.com/path/to/image/filename"
2. ".jpg"
You could then add the first and second capture groups to produce your intended result.
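A minimal Python sketch of that capture-and-concatenate idea, assuming the host software can join two groups. The function name strip_size is hypothetical, and the size is hard-coded as -200x200, exactly as in the regex above.

```python
import re

def strip_size(url):
    """Join the text around the non-captured '-200x200' part (hypothetical helper)."""
    m = re.fullmatch(r'(.+)(?:-200x200)(.+)', url)
    if m is None:
        return url  # no size suffix found, leave the URL unchanged
    return m.group(1) + m.group(2)

url = 'http://example.com/path/to/image/filename-200x200.jpg'
print(strip_size(url))
```

Only groups 1 and 2 are used; the non-capturing group consumes the size without producing a group of its own.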

Insertion syntax for regex in Notepad++ or Perl

Shortform: searching:
"{,[0-9][0-9]," inserting Space+00... getting replaced string segment:
"{,SPACE00[0-9][0-9]," or other so-garbaged data for found [0-9][0-9] sequence ... so how do I search with a regex and insert in the middle???
Longform question:
I'm trying to do a series of simple character insertions -- digits actually -- in a series of mixed model CSV profiling data (five files each with different model parameters, several hundred lines each).
I'm visually challenged and want to insert padding characters to columnize the data, so I can focus on tweaking key values rather than keeping my place from data file to data file.
The CSV data lines have this format:
*Variable_symbolic-name*,{##,##,* ... ('Set of CSV Numerical Data lists' ...},\n*
an actual data line:
61,parameter17,{,70,6,1,-1,3, 00,0,0,0,0,},,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
to be morphed to:
61,parameter17,\t\t{, 0070,6,1,-1,3, 00,0,0,0,0,},,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Give or take a tab character to align all the { numeric field starts...
I've found searching: "{,[0-9][0-9]," failed but "\{,[0-9][0-9]," succeeds for the find part of the search and replace operation... but have hit a proverbial brick wall in how to do the actual replace (with an insert) of such a short length. (Obviously with so many parameters and files, I'm moving cautiously!)
However, this Perl Help tutorial leaves me in the dark as to how to keep the found ranges and insert padding before them (space, zero, zero to be specific if positive, '-00' if negative). In short, I need to know how to insert 2-3 places in the replace field in Notepad++... and retain the original data without prejudicing it!
Articles herein have cited replacing paragraphs and lines, adding newlines, etc. but this simple insertion alteration seems too simple for you all. But it's been several hours of frustration for me!
Thanks! // Frank
Resolved:
Good news: ({,)([0-9][0-9],) and \1 xx\2 works fine as does ({,)(#[0-9][0-9],) and replacing with \1 xx#\2 ... whether or not tabs are utilized. Obviously the key was ([0-9][0-9],) which included the discrimination of the comma... though I have no idea why that seemed to fail an hour ago with trials made using Sobrinho's help. Must have not tried the sequence. Thanks all!
Try to type this in the search box:
(.+)(\{,[0-9][0-9].*)
And in the replace:
\1\t\t\2
When you put things between parentheses, they are "stored" by Notepad++ and can be reused in the replace box.
The groups are numbered from one, in order of their opening parentheses, and are accessed as \1, \2, ...
You tagged it as Perl, so here is how you do it in Perl ...
I prefer to use lookahead assertions rather than backreferences
s/(?= {,[0-9][0-9], ) /\t\t/x
Alternatively, $& contains the matched string ($0 is something different)
s/ {,[0-9][0-9], /\t\t$&/x
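For comparison, the lookahead variant can be sketched in Python as well. This assumes Python 3.7 or later, where re.sub replaces zero-width matches; the sample line is the one from the question with the long run of trailing commas cut off.

```python
import re

# Sample line from the question, trailing commas shortened for readability.
line = '61,parameter17,{,70,6,1,-1,3, 00,0,0,0,0,},'

# The lookahead matches the empty position just before "{,NN,",
# so the tabs are inserted and nothing from the original line is consumed.
padded = re.sub(r'(?=\{,[0-9][0-9],)', '\t\t', line)
print(repr(padded))
```

Since the match is zero-width, there is nothing to put back in the replacement, which is the appeal of this approach over a backreference.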
You will need a backreference here, meaning something which, in the replace part, will be equal to what you have matched.
Usually, the whole matched part is stored in the $0 backreference. (You can get $1 with a capture group too, and up to $2 with two capture groups, etc)
Back to your question, you could try this:
Find:
(\{,)([0-9][0-9],)
Replace by:
\t\t$1 00$2
This replaces the text that matched \{,[0-9][0-9], with two tab characters, then the first captured part ({,), then a space and two zeros, and finally the second captured part (the two digits and the following comma).
regex101 demo
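The same find/replace pair can be tried outside Notepad++. The following is a small Python sketch in which \1 and \2 play the role of $1 and $2; the sample line is the one from the question with its trailing commas shortened.

```python
import re

# Sample line from the question, trailing commas shortened for readability.
line = '61,parameter17,{,70,6,1,-1,3, 00,0,0,0,0,},'

# Group 1 is "{," and group 2 is the two digits plus the comma.
# The replacement is: two tabs, group 1, a space and "00", then group 2.
padded = re.sub(r'(\{,)([0-9][0-9],)', r'\t\t\1 00\2', line)
print(repr(padded))
```

The result has the same shape as the "to be morphed to" line in the question: the original digits are kept and only the padding is added.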

Regex to replace email address domains?

I need a regex to obfuscate emails in a database dump file I have. I'd like to replace all domains with a set domain like @fake.com so I don't risk sending out emails to real people during development. The emails do have to be unique to match database constraints, so I only want to replace the domain and keep the usernames.
I currently have this regex for finding emails
\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
How do I convert this search regex into a regex I can use in a find and replace operation in either Sublime Text or SED or Vim?
EDIT:
Just a note, I just realized I could replace all strings found by @[A-Z0-9.-]+\.[A-Z]{2,4}\b in this case, but academically I am still interested in how you could treat each section of the email regex as a token and replace the username / domain independently.
SublimeText
SublimeText uses Boost syntax, which supports quite a large subset of features in Perl regex. But for this task, you don't need all those advanced constructs.
Below are 2 possible approaches:
If you can assume that @ doesn't appear in any other context (which is quite a fair assumption for normal text), then you can just search for the domain part @[A-Z0-9.-]+\.[A-Z]{2,4}\b and replace it.
Alternatively, use a capturing group (pattern) in the search and a backreference in the replacement string.
Find what
\b([A-Z0-9._%-]+)@[A-Z0-9.-]+\.[A-Z]{2,4}\b
([A-Z0-9._%-]+) is the first (and only) capturing group in the regex.
Replace with
$1@fake.com
$1 refers to the text captured by the first capturing group.
Note that for both methods above, you need to turn off case-sensitivity (indicated as the 2nd button on the lower left corner), unless you specifically want to remove only emails written in ALL CAPS.
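The capturing-group approach carries over to other regex engines. Below is a minimal Python sketch, assuming ordinary addresses with an @ between user name and domain; the function name obfuscate and the sample addresses are made up for illustration.

```python
import re

# Group 1 is the user name; the domain is matched but not captured.
EMAIL_RE = re.compile(r'\b([A-Z0-9._%-]+)@[A-Z0-9.-]+\.[A-Z]{2,4}\b', re.IGNORECASE)

def obfuscate(text, domain='fake.com'):
    """Keep each user name and swap its domain for a fixed one (hypothetical helper)."""
    return EMAIL_RE.sub(lambda m: m.group(1) + '@' + domain, text)

sample = 'alice@example.com, Bob.Smith@mail.example.org'
print(obfuscate(sample))
```

re.IGNORECASE does the job of the case-sensitivity toggle mentioned above, and keeping the user name intact preserves the uniqueness the database constraints need.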
You may use the following command for Vim:
:%s/\(\<[A-Za-z0-9._%-]\+@\)[A-Za-z0-9.-]\+\.[A-Za-z]\{2,4}\>/\1fake.com/g
Everything between \( and \) becomes a group, which the replacement refers to by its escaped number (\1 in this case). I've also modified the regexp to match lowercase letters and to use Vim-compatible syntax.
You can also turn off case sensitivity by putting \c anywhere in your regexp, like this:
:%s/\c\(\<[A-Z0-9._%-]\+@\)[A-Z0-9.-]\+\.[A-Z]\{2,4}\>/\1fake.com/g
Please also note that % at the beginning of the command asks Vim to do the replacement in the whole file, and g at the end makes it replace every match on a line.
One more approach is to use zero-width matching (\@<=):
:%s/\c\(\<[A-Z0-9._%-]\+@\)\@<=[A-Z0-9.-]\+\.[A-Z]\{2,4}\>/fake.com/g