Regular expression to remove syslog date in filebeat? - regex

I would like to parse some syslog lines that they look like
Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I would like to turn them into
TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
So I was wondering how the regular expression should look like that would allow me to do so, since the first part will change every day, because it is appended by the syslog.
EDIT: to avoid duplicated, I am trying to use REGEX with filebeat, where no all regex are supported as explained here

Regex101
(TTN-.*$)
Debuggex Demo
Explained
1st Capturing Group (TTN-.*$)
TTN- matches the characters TTN- literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)

The regular expression TTN-\S* is probably a way of doing what you're looking for, here it is in a java-script example.
var value = "Oct 20 16:34:59 artguard TTN-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
var matches = value.match(
new RegExp("TTN-\\S*", "gi")
);
document.writeln(matches);
It works in two main parts:
The TTN- matches TTN- (obviously)
The \S* matches any character that is not a white-space, this is done as many times as possible.
Currently it is always expecting atleas a '-' after the TTN but if you repace the '-' with a '-{01}' in the regex it will expect TNN maybe a dash followed by 0-n characters that are not a white-space. You could also replace \S* with \w* to get all the letters and digits or .* to get all characters apart from end of line /n character, TNN-\S*[^\s{2}] too end the match with two spaces. Hope this was helpful.

Related

What is the way to combine two regexes? [duplicate]

I want to design an expression for not allowing whitespace at the beginning and at the end of a string, but allowing in the middle of the string.
The regex I've tried is this:
\^[^\s][a-z\sA-Z\s0-9\s-()][^\s$]\
This should work:
^[^\s]+(\s+[^\s]+)*$
If you want to include character restrictions:
^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$
Explanation:
the starting ^ and ending $ denotes the string.
considering the first regex I gave, [^\s]+ means at least one not whitespace and \s+ means at least one white space. Note also that parentheses () groups together the second and third fragments and * at the end means zero or more of this group.
So, if you take a look, the expression is: begins with at least one non whitespace and ends with any number of groups of at least one whitespace followed by at least one non whitespace.
For example if the input is 'A' then it matches, because it matches with the begins with at least one non whitespace condition. The input 'AA' matches for the same reason. The input 'A A' matches also because the first A matches for the at least one not whitespace condition, then the ' A' matches for the any number of groups of at least one whitespace followed by at least one non whitespace.
' A' does not match because the begins with at least one non whitespace condition is not satisfied. 'A ' does not matches because the ends with any number of groups of at least one whitespace followed by at least one non whitespace condition is not satisfied.
If you want to restrict which characters to accept at the beginning and end, see the second regex. I have allowed a-z, A-Z, 0-9 and () at beginning and end. Only these are allowed.
Regex playground: http://www.regexr.com/
This RegEx will allow neither white-space at the beginning nor at the end of your string/word.
^[^\s].+[^\s]$
Any string that doesn't begin or end with a white-space will be matched.
Explanation:
^ denotes the beginning of the string.
\s denotes white-spaces and so [^\s] denotes NOT white-space. You could alternatively use \S to denote the same.
. denotes any character expect line break.
+ is a quantifier which denote - one or more times. That means, the character which + follows can be repeated on or more times.
You can use this as RegEx cheat sheet.
In cases when you have a specific pattern, say, ^[a-zA-Z0-9\s()-]+$, that you want to adjust so that spaces at the start and end were not allowed, you may use lookaheads anchored at the pattern start:
^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$
^^^^^^^^^^^^^^^^^^^^
Here,
(?!\s) - a negative lookahead that fails the match if (since it is after ^) immediately at the start of string there is a whitespace char
(?![\s\S]*\s$) - a negative lookahead that fails the match if, (since it is also executed after ^, the previous pattern is a lookaround that is not a consuming pattern) immediately at the start of string, there are any 0+ chars as many as possible ([\s\S]*, equal to [^]*) followed with a whitespace char at the end of string ($).
In JS, you may use the following equivalent regex declarations:
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = /^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = new RegExp("^(?!\\s)(?![^]*\\s$)[a-zA-Z0-9\\s()-]+$")
var regex = new RegExp(String.raw`^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$`)
If you know there are no linebreaks, [\s\S] and [^] may be replaced with .:
var regex = /^(?!\s)(?!.*\s$)[a-zA-Z0-9\s()-]+$/
See the regex demo.
JS demo:
var strs = ['a b c', ' a b b', 'a b c '];
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/;
for (var i=0; i<strs.length; i++){
console.log('"',strs[i], '"=>', regex.test(strs[i]))
}
if the string must be at least 1 character long, if newlines are allowed in the middle together with any other characters and the first+last character can really be anyhing except whitespace (including ##$!...), then you are looking for:
^\S$|^\S[\s\S]*\S$
explanation and unit tests: https://regex101.com/r/uT8zU0
This worked for me:
^[^\s].+[a-zA-Z]+[a-zA-Z]+$
Hope it helps.
How about:
^\S.+\S$
This will match any string that doesn't begin or end with any kind of space.
^[^\s].+[^\s]$
That's it!!!! it allows any string that contains any caracter (a part from \n) without whitespace at the beginning or end; in case you want \n in the middle there is an option s that you have to replace .+ by [.\n]+
pattern="^[^\s]+[-a-zA-Z\s]+([-a-zA-Z]+)*$"
This will help you accept only characters and wont allow spaces at the start nor whitespaces.
This is the regex for no white space at the begining nor at the end but only one between. Also works without a 3 character limit :
\^([^\s]*[A-Za-z0-9]\s{0,1})[^\s]*$\ - just remove {0,1} and add * in order to have limitless space between.
As a modification of #Aprillion's answer, I prefer:
^\S$|^\S[ \S]*\S$
It will not match a space at the beginning, end, or both.
It matches any number of spaces between a non-whitespace character at the beginning and end of a string.
It also matches only a single non-whitespace character (unlike many of the answers here).
It will not match any newline (\n), \r, \t, \f, nor \v in the string (unlike Aprillion's answer). I realize this isn't explicit to the question, but it's a useful distinction.
Letters and numbers divided only by one space. Also, no spaces allowed at beginning and end.
/^[a-z0-9]+( [a-z0-9]+)*$/gi
I found a reliable way to do this is just to specify what you do want to allow for the first character and check the other characters as normal e.g. in JavaScript:
RegExp("^[a-zA-Z][a-zA-Z- ]*$")
So that expression accepts only a single letter at the start, and then any number of letters, hyphens or spaces thereafter.
use /^[^\s].([A-Za-z]+\s)*[A-Za-z]+$/. this one. it only accept one space between words and no more space at beginning and end
If we do not have to make a specific class of valid character set (Going to accept any language character), and we just going to prevent spaces from Start & End, The must simple can be this pattern:
/^(?! ).*[^ ]$/
Try on HTML Input:
input:invalid {box-shadow:0 0 0 4px red}
/* Note: ^ and $ removed from pattern. Because HTML Input already use the pattern from First to End by itself. */
<input pattern="(?! ).*[^ ]">
Explaination
^ Start of
(?!...) (Negative lookahead) Not equal to ... > for next set
Just Space / \s (Space & Tabs & Next line chars)
(?! ) Do not accept any space in first of next set (.*)
. Any character (Execpt \n\r linebreaks)
* Zero or more (Length of the set)
[^ ] Set/Class of Any character expect space
$ End of
Try it live: https://regexr.com/6e1o4
^[^0-9 ]{1}([a-zA-Z]+\s{1})+[a-zA-Z]+$
-for No more than one whitespaces in between , No spaces in first and last.
^[^0-9 ]{1}([a-zA-Z ])+[a-zA-Z]+$
-for more than one whitespaces in between , No spaces in first and last.
Other answers introduce a limit on the length of the match. This can be avoided using Negative lookaheads and lookbehinds:
^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$
This starts by checking that the first character is not whitespace ^(?!\s). It then captures the characters you want a-zA-Z0-9\s non greedily (*?), and ends by checking that the character before $ (end of string/line) is not \s.
Check that lookaheads/lookbehinds are supported in your platform/browser.
Here you go,
\b^[^\s][a-zA-Z0-9]*\s+[a-zA-Z0-9]*\b
\b refers to word boundary
\s+ means allowing white-space one or more at the middle.
(^(\s)+|(\s)+$)
This expression will match the first and last spaces of the article..

How to select all whitespaces with certain interval

I'd let use regex to find all whitespaces after given number of interval.
for example, if interval is 3 for following string the result should be
string = "this is a test for regex"
string = "this(select)is a(select)test(select)for(select)regex"
the whitespace after is should not be selected since the interval is 3 and the length of is only 2
I did this ^(?=.{3,})\s$ but no luck. Thank you
Depending on your regex engine, you can use either of the following:
\K Reset Method
If your engine supports \K.
See regex in use here
.{3,}?\K\s+
This method matches any character 3 or more times (but as few as possible), then resets the pattern's match, then matches one or more whitespace characters.
Capture Group Method
See regex in use here
(.{3,}?)\s+
Replace with $1
This method captures any character 3 or more times (but as few as possible), then matches one or more whitespace characters. You would then replace the matches with first capture group's match.
The ? that follows a quantifier (in the cases above {3,}) causes it to match in a lazy manner, meaning that once it satisfies at least 3 matches and finds a whitespace character, it'll stop (this prevents it from matching the whole line up to the last space).
The \K token resets the pattern's match. This means that nothing preceding this toke will be captured in the overall match (resulting in only the whitespace characters being matched)
This would do it:
.{3,}?(\s)
.{3,}? - lazily match any 3 chars
(\s) - capture the whitespace which follows
Your desired spaces would be in $1.
https://regex101.com/r/7iWjQO/1

I need a regular expression to grab data from a string, up to a comma

I need a regular expression to grab data from a string, up to a comma. However, I need to make sure that if the string doesn't have a comma that I still grab the whole string.
Example: I need what is capitalized in the string below
"THIS IS THE FIRST PART, this is the second part"
"THIS IS THE ONLY PART"
Us the full match (typically $&) of /^[^,]*/ or use group match 1 of /^([^,]*)/
You can try this pattern:
^.+?(,|$)
Of if you really don't want to match the comma:
^.+?(?=,|$)
https://regex101.com/r/2g39yQ/1
Try
^(.*?)(?=,|$)
^ asserts position at start of a line
.*? matches any character (except for line terminators)
*? Quantifier — Matches between zero and unlimited times, as few
times as possible, expanding as needed (lazy)
Positive Lookahead (?=,|$)
Assert that the Regex below matches
1st Alternative ,
, matches the character , literally (case sensitive)
2nd Alternative $
$ asserts position at the end of a line
If only matching is needed (no capture), remove parentheses around .*?, i.e. ^.*?(?=,|$).
Here at regex101.

How to combine lines in regular expressions?

So i am new to regular expressions and i am learning them using a simple text editor only. I have the following file
84544484N
32343545M
32334546E
34456434M
I am trying to combine each pair of lines into one tab delimited line
The result should be :
84544484N 32343545M
32334546E 34456434M
I wrote the following :
Search: (.*?)\n(.*?)
Replace: \1\t\2
this did not work can someone please explain why and give me the correct solution. Thank you!!
The (.*?)\n(.*?) pattern will never work well because the (.*?) at the end of the pattern will always return an empty string (since *? is a lazy matching quantifier and if it can return zero characters (and it can) it will. Use greedy matching and adjust the pattern like:
(.+)\r?\n *(.*)
or - since SublimeText uses Boost regex - you can match any newline sequence with \R:
(.+)\R *(.*)
and replace with \1\t\2. Note I replaced *? with + in the first capturing group because you need to match non-empty lines.
Regex breakdown:
(.+) - one or more characters other than a newline (as many as possible) up to
\R - a newline sequence (\r\n, \r or just \n)
* - a literal space, zero or more occurrences
(.*) - Group 2: zero or more characters other than a newline (as many as possible)
/

Regular Expression that matches a word, or nothing

I'm really struggling to put a label on this, which is probably why I was unable to find what I need through a search.
I'm looking to match the following:
Auto Reply
Automatic Reply
AutomaticReply
The platform that I'm using doesn't allow for the specification of case-insensitive searches. I tried the following regular expression:
.*[aA]uto(?:matic)[ ]*[rR]eply.*
Thinking that (?:matic) would cause my expression to match Auto or Automatic. However, it is only matching Automatic.
What am I doing wrong?
What is the proper terminology here?
This is using Perl for the regular expression engine (I think that's PCRE but I'm not sure).
(?:...) is to regex patterns as (...) is to arithmetic: It simply overrides precedence.
ab|cd # Matches ab or cd
a(?:b|c)d # Matches abd or acd
A ? quantifier is what makes matching optional.
a? # Matches a or an empty string
abc?d # Matches abcd or abd
a(?:bc)?d # Matches abcd or ad
You want
(?:matic)?
Without the needless leading and trailing .*, we get the following:
/[aA]uto(?:matic)?[ ]*[rR]eply/
As #adamdc78 points out, that matches AutoReply. This can be avoided as using the following:
/[aA]uto(?:matic[ ]*|[ ]+)[rR]eply/
or
/[aA]uto(?:matic|[ ])[ ]*[rR]eply/
This should work:
/.*[aA]uto(?:matic)? *[rR]eply/
you were simply missing the ? after (?:matic)
[Aa]uto(?:matic ?| )[Rr]eply
This assumes that you do not want AutoReply to be a valid hit.
You're just missing the optional ("?") in the regex. If you're looking to match the entire line after the reply, then including the .* at the end is fine, but your question didn't specify what you were looking for.
You can use this regex with line start/end anchors:
^[aA]uto(?:matic)? *[rR]eply$
Explanation:
^ assert position at start of the string
[aA] match a single character present in the list below
aA a single character in the list aA literally (case sensitive)
uto matches the characters uto literally (case sensitive)
(?:matic)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed
[greedy]
matic matches the characters matic literally (case sensitive)
* matches the character literally
Quantifier: Between zero and unlimited times, as many times as possible, giving back
as needed [greedy]
[rR] match a single character present in the list below
rR a single character in the list rR literally (case sensitive)
eply matches the characters eply literally (case sensitive)
$ assert position at end of the string
Slightly different. Same result.
m/([aA]uto(matic)? ?[rR]eply)/
Tested Against:
Some other stuff....
Auto Reply
Automatic Reply
AutomaticReply
Now some similar stuff that shouldn't match (auto).