Using RegEx to grab a field in brackets - regex

My Splunk log file contains several square-bracketed values on each line. I am trying to find a particular field named UserDataGuid and then extract the data in the bracket that follows it. My only option seems to be regular expressions, in a dialect that looks similar to Perl's. This first attempt does not work; what am I doing wrong here?
| rex "\]\s(?<UserDataGuid>.*?)\s*$"
// This attempt looks more promising, but it grabs the last bracket :( and doesn't name the field, which I need in order to use it in a subsearch.
| rex "(?i)UserDataGuid\s*\[([^\}]*)\]"
The data looks like this:
[21] INFO UserDataGuid [fas08f0da-faf6-4308-aad6-hfld5643gs] [(null)] [(null)] [(null)]
and I want only the GUID:
fas08f0da-faf6-4308-aad6-hfld5643gs
I would also love for it to be a field I can reuse, the way fields are normally used in Splunk.

It looks like you want
(?<=UserDataGuid\s\[)([^\]]*)

I'd try the following regex:
(?<=UserDataGuid \[).*?(?=\])
This will capture fas08f0da-faf6-4308-aad6-hfld5643gs.

With
\]\s(?<UserDataGuid>.*?)\s*$
you are saying: match a ] (\]), followed by exactly one whitespace character (\s), followed by a group named UserDataGuid ((?<UserDataGuid> ... )) that contains any characters except newline, zero or more times, in lazy mode (.*?), followed by zero or more whitespace characters (\s*), followed by the end of the string ($).
I don't think that is what you want from (?<UserDataGuid> ... ):
you want to match the literal text UserDataGuid in some way, not merely give the name UserDataGuid to the group that matches "any characters except newline, zero or more times, in lazy mode" (.*?).
In
(?i)UserDataGuid\s*\[([^\}]*)\]
change the } to a ], and then your GUID is captured in group #1.
However, you don't need "UserDataGuid\s*\[" to be part of the match.
You could use:
(?<=UserDataGuid \[)([^\]]*)
and then you match only the GUID, and also find it in group #1.
You can remove the parentheses of group #1, because the group is identical to the full match:
(?<=UserDataGuid \[)[^\]]*
https://regex101.com/r/sI3kW4/1
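The two ideas above (a named group, as the question asks for, and the lookbehind from the answers) can be tried out in a minimal Python sketch. Python's re module spells named groups (?P<name>...) rather than the (?<name>...) form used in the question's rex attempts; the variable names are illustrative only. The same pattern written with (?<UserDataGuid>...) would presumably carry over to rex, though that is untested here.

```python
import re

# Sample line copied from the question.
line = "[21] INFO UserDataGuid [fas08f0da-faf6-4308-aad6-hfld5643gs] [(null)] [(null)] [(null)]"

# Named-group variant: the literal field name anchors the match and the
# bracket contents land in a group called UserDataGuid.
named = re.search(r"UserDataGuid\s*\[(?P<UserDataGuid>[^\]]*)\]", line, re.IGNORECASE)
guid_from_group = named.group("UserDataGuid") if named else None

# Lookbehind variant from the answers: the full match is the GUID itself.
lookbehind = re.search(r"(?<=UserDataGuid \[)[^\]]*", line)
guid_from_lookbehind = lookbehind.group(0) if lookbehind else None

print(guid_from_group)
print(guid_from_lookbehind)
```

The named-group form has the advantage that the extracted value already carries a field name, which is what the question needs for reuse.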

Related

Regex: Remove empty sections from INI (Text) files where the files could contain multiple such sections

I have the php.ini file below, which is the result of removing comments and blank lines through some simple regex find/replace steps:
[PHP]
engine = On
short_open_tag = Off
....
....
[CLI Server]
cli_server.color = On
[Date]
[filter]
[iconv]
[imap]
[intl]
[sqlite3]
[Pcre]
[Pdo]
[Pdo_mysql]
pdo_mysql.default_socket=
[Phar]
[mail function]
SMTP = localhost
smtp_port = 25
mail.add_x_header = Off
[ODBC]
....
....
[dba]
[opcache]
[curl]
curl.cainfo = "{PHP_Dir_Path}\extras\curl\cacert.pem"
[openssl]
[ffi]
As you can see, the file contains many empty sections (sections without any non-commented lines, i.e. lines that do not start with a semicolon). I can't come up with a regex find/replace pattern that removes all such empty sections in one go, so that the file becomes:
[PHP]
engine = On
short_open_tag = Off
....
....
[CLI Server]
cli_server.color = On
[Pdo_mysql]
pdo_mysql.default_socket=
[mail function]
SMTP = localhost
smtp_port = 25
mail.add_x_header = Off
[ODBC]
....
....
[curl]
curl.cainfo = "{PHP_Dir_Path}\extras\curl\cacert.pem"
Can anyone help me achieve what I need?
An idea to look ahead after lines starting with [ for another opening bracket (or end).
^\[.*+\s*+(?![^\[])
If using Notepad++ (NP++), uncheck the option: [ ] . matches newline
^ line start (NP++ default)
\[ matches an opening bracket
.*+ any amount of any characters besides newline (without giving back)
\s*+ any amount of whitespace (also possessive to reduce backtracking)
(?! negative lookahead to fail on the defined condition ) which is:
[^\[] a character that is not an opening bracket
In short, it matches a line starting with [ up to the end of the line, plus any amount of following whitespace,
if there is either no character ahead or the next character is another [ opening bracket.
Side note: its positive equivalent is ^\[.*+\s*+(?=\[|\z) where \z matches the end of the string.
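The lookahead idea above can be checked with a minimal Python sketch. It is an adaptation, not the answer's exact pattern: the possessive quantifiers (.*+ and \s*+) are replaced by plain greedy ones, because Python's re module only accepts possessive quantifiers from version 3.11 onward. The shortened sample file and the names ini_text, EMPTY_SECTION and cleaned are assumptions made for illustration.

```python
import re

# Shortened stand-in for the php.ini from the question.
ini_text = (
    "[PHP]\n"
    "engine = On\n"
    "[CLI Server]\n"
    "cli_server.color = On\n"
    "[Date]\n"
    "[filter]\n"
    "[Pdo_mysql]\n"
    "pdo_mysql.default_socket=\n"
    "[Phar]\n"
    "[curl]\n"
    'curl.cainfo = "cacert.pem"\n'
    "[openssl]\n"
    "[ffi]\n"
)

# A header line plus trailing whitespace, provided the next character is
# either another "[" or nothing at all (end of input).
EMPTY_SECTION = re.compile(r"^\[.*\s*(?![^\[])", re.MULTILINE)

cleaned = EMPTY_SECTION.sub("", ini_text)
print(cleaned)
```

Without the possessive quantifiers the engine may backtrack more on large files, and a header that itself contains a [ could be matched partially, so the possessive form is preferable where the engine supports it.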
You can try to match if there is a ] followed by a new line and then a [, with the following regex:
\]\n\[
EDIT:
As pointed out in your comment, that would match just the ][ characters, so you could try this instead:
(\[(\w)+\]\n)(?!\w)
This will match a title that is not followed by a word in the next line.
EDIT2:
My previous answer would not get the last section if it was empty, so I changed it to check the newline OR end of file.
(\[(\w)+\])(\n(?!\w)|$)
You need to tell your regex-engine to use the single-line aka "dotall" mode. Then you can easily pick out any bracketed strings that are only separated by a newline:
/\[[^\]]+\]\s\[[^\]]+\]/gs
The s flag enables "dotall" mode.
Update: I overlooked one obvious problem with my solution. It gets a bit more complicated now, using a lookahead: (?:\s(?=\[)). Extra care is also needed to capture the last empty section, which is done with the |$ part.
/\[[^\]]+\](?:\s(?=\[)|$)/gs

Regex for SQL Query

Hello everyone, I have the following problem:
I have a long list of SQL queries that I would like to adapt to a change I made. In the end it is a renaming problem, and I'm afraid I am trying to solve it in a more complicated way than necessary.
The query looks like this:
INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, fax, bem, anrede, salutation, email, name2, name3, association, project) VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', N'Name2', N'Name3', NULL, NULL);
The column list of the INSERT used to contain another column, which I removed (I did this simply in Notepad++ by searching for the term "example, " and replacing it with an empty string). Only the corresponding entry in VALUES cannot be removed this way, because its text varies from query to query. So far I have only worked with the text file in which I adjusted the list of queries.
So, as you can see, there is one more entry in VALUES than in the column list (the extra entry belonged to the column removed by my change).
It is the entry after the email address. I would like to remove it, including the comma (N'This is the text i want to delete',).
My idea was to form a group and say that the 14th comma-separated entry should be removed. However, even after some research I do not know how to do this.
I thought it could look like this (tried in https://regex101.com/)
VALUES\s?\((,) something here
Is this even the right approach, or is there another method? Regex was the only way I could think of to solve this, because of course the values differ from query to query.
And how can I then use the regex to adapt the queries (the queries are local on my computer and not yet included in the code)?
Short summary:
Change the query from
VALUES (... test5, test6, test7 ...)
To
VALUES (... test5, test7 ...)
As per my comment, you could use find/replace, where you search for:
(\bVALUES +\((?:[^,]+,){13})[^,]+,
And replace with $1
See the online demo
( - Open 1st capture group.
\bVALUES +\( - Match a word boundary, the literal text 'VALUES', followed by at least one space and a literal opening parenthesis.
(?: - Open non-capturing group.
[^,]+, - Match anything but a comma at least once followed by a comma.
){13} - Close non-capture group and repeat it 13 times.
) - Close 1st capture group.
[^,]+, - Match anything but a comma at least once followed by a comma.
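For queries that live in a local text file, the same find/replace can be scripted. Below is a minimal Python sketch applying the pattern above with re.sub; the query string is the one from the question, and the names query, DROP_14TH and fixed are illustrative. Note the assumption baked into the pattern: no value before the one being removed may itself contain a comma, otherwise the count of 13 is thrown off.

```python
import re

# The sample query from the question, split over two literals for readability.
query = (
    "INSERT member (member, prename, name, street, postalcode, town, tel1, tel2, "
    "fax, bem, anrede, salutation, email, name2, name3, association, project) "
    "VALUES (2005, N'John', N'Doe', N'Street 4711', N'1234', N'Town', "
    "N'1234-5678', N'1234-5678', N'1234-5678', N'Leader', NULL, "
    "N'Dear Mr. Doe', N'a#b.com', N'This is the text i want to delete', "
    "N'Name2', N'Name3', NULL, NULL);"
)

# Group 1 keeps "VALUES (" plus the first 13 comma-terminated values;
# the trailing [^,]+, is the 14th value, which is dropped.
DROP_14TH = re.compile(r"(\bVALUES +\((?:[^,]+,){13})[^,]+,")

fixed = DROP_14TH.sub(r"\1", query)
print(fixed)
```

To process a whole file, the same substitution could be applied line by line and the result written to a new file, keeping the original as a backup.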
You may use the following to remove / replace the value you need:
Find What: \bVALUES\s*\((\s*(?:N'[^']*'|\w+))(?:,(?1)){12}\K,(?1)
Replace With: (empty string, or whatever value you need)
See the regex demo
Details
\bVALUES - whole word VALUES
\s* - 0+ whitespaces
\( - a (
(\s*(?:N'[^']*'|\w+)) - Group 1: 0+ whitespaces and then either N' followed with any 0 or more chars other than ' and then a ', or 1+ word chars
(?:,(?1)){12} - twelve repetitions of , followed with the Group 1 pattern
\K - match reset operator that discards the text matched so far from the match memory buffer
, - a comma
(?1) - Group 1 pattern.

Regex which grabs everything between two characters at the end of a line

I'm looking to create a regex which grabs the text between two ":"s but only if it is the "last set", for example:
\--- org.codehaus.groovy.modules.http-builder:http-builder:0.7.1
should return:
http-builder
It should be noted that it's possible to get something like:
\--- org::codehaus::groovy::modules::http-builder:http-builder:0.7.1
because the input does not necessarily follow conventions (based on the problem at hand) but the required information is ALWAYS in the last two ":"s.
I've tried some of the following (minus the end of line):
1) (?<=\:).*(?=\:)
2) [^(.*:)].*[^(:.*)]
3) :.*: (this was the most successful, although I got the ":"s with the result but there are issues when there is more than one set of ":"s)
Further information:
I need to use Groovy for this
I can read it using a stream or a file (in case that matters)
Thanks for reading and any help!
:([^:]*):[^:]*$
That means:
Sequence must start with a :
Then start capturing (
Capture all characters that are not colons [^:]*
End capturing ) ...
... at the next colon :
Then there's another sequence of chars [^:]*
And after that sequence the line must end $ (no more sequence)
Or if you can use non-greedy matches, you can also use
:(.*?):[^:]*$
.* means capture as many characters as possible, while .*? means capture as few characters as possible. Not all regex implementations support that, though.
How about splitting on the : and grabbing the next-to-last segment?
['org.codehaus.groovy.modules.http-builder:http-builder:0.7.1',
/\--- org::codehaus::groovy::modules::http-builder:http-builder:0.7.1/].each { line ->
assert 'http-builder' == line.split(':')[-2]
}
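For comparison, here is a minimal Python sketch of both approaches from the answers above: the anchored regex :([^:]*):[^:]*$ and the split-and-take-next-to-last idea. The question asks for Groovy, so this is only meant to illustrate the logic; the helper name between_last_colons is hypothetical.

```python
import re
from typing import Optional

# Both sample lines from the question.
lines = [
    r"\--- org.codehaus.groovy.modules.http-builder:http-builder:0.7.1",
    r"\--- org::codehaus::groovy::modules::http-builder:http-builder:0.7.1",
]

# A colon, the captured colon-free text, a colon, then colon-free text to the end.
LAST_PAIR = re.compile(r":([^:]*):[^:]*$")


def between_last_colons(line: str) -> Optional[str]:
    """Return the text between the last two colons, or None if there are fewer than two."""
    match = LAST_PAIR.search(line)
    return match.group(1) if match else None


by_regex = [between_last_colons(line) for line in lines]
by_split = [line.split(":")[-2] for line in lines]

print(by_regex)
print(by_split)
```

The split variant assumes every line contains at least two colons; with fewer it either raises an IndexError or silently returns the wrong segment, whereas the regex helper returns None.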

Search with regular expression in Sublime Text 2

I want to create a rule to remove array( and ) from this text:
"price"=> array(129),
to get:
"price"=> 129,
I tried this expression without success:
(?<="price"=>\s*)array\((?=\d*)\)(?=,)
Then I decided to make the replacement in 2 steps. First, I removed array(:
(?<="price"=>\s\s\s\s\s)array\(
And got:
"price"=> 129),
So I had to remove only a closing parenthesis ). I tried without success:
(?<="price"=>\s*\d*)\)(?=,)
This works, but only for a known number of whitespaces and digits:
(?<="price"=>\s\s\s\s\s\d\d\d)\)(?=,)
Try this for the find:
("price"=>\s+)array\((\d+)\)
and this for the replace:
\1\2
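The capture-group approach above can be verified outside the editor with a minimal Python sketch. The five spaces in the sample string are an assumption based on the question's \s\s\s\s\s attempt, and the names source, PRICE_ARRAY and unwrapped are illustrative. Capture groups sidestep the problem with the question's attempts: Python's re module, for one, rejects look-behinds of variable width such as (?<="price"=>\s*).

```python
import re

# Sample text; the run of five spaces mirrors the question's fixed-width attempt.
source = '"price"=>     array(129),'

# Group 1 keeps the key and its whitespace, group 2 keeps the digits;
# the literal "array(" and ")" around the digits are simply not put back.
PRICE_ARRAY = re.compile(r'("price"=>\s+)array\((\d+)\)')

unwrapped = PRICE_ARRAY.sub(r"\1\2", source)
print(unwrapped)
```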
You can match the whole line with this:
\"price"[^a)]+(array\()\d+(\),)
It contains one group for "array(" and another for "),".
Try this:
(?:(?<=\"price\"=>\s*)array\((?=\d+\)))|(?<=\"price\"=>\s*array\(\d+)\)
The regex consists mainly of two parts (the pipe in the middle is an alternation symbol, which means that if the first part doesn't match, the engine tries the second part).
The first part checks whether array( is preceded by "price"=> ... and is followed by digits and a ), using the look-behind (?<= ... ) and look-ahead (?= ... ) constructs respectively.
(?:(?<=\"price\"=>\s*)array\((?=\d+\)))
Then we have a pipe (explained above)..
|
The second part checks whether ) is preceded by everything we've matched before ("price"=> array(129), also using the look-behind construct (?<= ... ):
(?<=\"price\"=>\s*array\(\d+)\)
Thus for the string "price"=> array(129), the result should be two matches: array( and ).
Please let me know if this works for you.

Regular expression help - comma delimited string

I don't write many regular expressions, so I'm going to need some help on this one.
I need a regular expression that can validate that a string is an alphanumeric comma delimited string.
Examples:
123, 4A67, GGG, 767 would be valid.
12333, 78787&*, GH778 would be invalid
fghkjhfdg8797< would be invalid
This is what I have so far, but isn't quite right: ^(?=.*[a-zA-Z0-9][,]).*$
Any suggestions?
Sounds like you need an expression like this:
^[0-9a-zA-Z]+(,[0-9a-zA-Z]+)*$
Posix allows for the more self-descriptive version:
^[[:alnum:]]+(,[[:alnum:]]+)*$
^[[:alnum:]]+([[:space:]]*,[[:space:]]*[[:alnum:]]+)*$ // allow whitespace
If you're willing to admit underscores, too, search for entire words (\w+):
^\w+(,\w+)*$
^\w+(\s*,\s*\w+)*$ // allow whitespaces around the comma
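A minimal Python sketch of the item(,item)* shape above, run against the examples from the question. It uses the explicit [0-9a-zA-Z] class with optional whitespace around the comma; the names ALNUM_LIST, is_alnum_list and results are illustrative. fullmatch is used in place of the ^ and $ anchors.

```python
import re

# One alphanumeric item, then zero or more ", item" repetitions.
ALNUM_LIST = re.compile(r"[0-9a-zA-Z]+(?:\s*,\s*[0-9a-zA-Z]+)*")


def is_alnum_list(text: str) -> bool:
    """True if text is a comma-delimited list of alphanumeric items."""
    return ALNUM_LIST.fullmatch(text) is not None


samples = [
    "123, 4A67, GGG, 767",    # valid example from the question
    "12333, 78787&*, GH778",  # invalid: contains & and *
    "fghkjhfdg8797<",         # invalid: contains <
    "123,",                   # invalid: trailing comma
    "123 123 abc",            # invalid: items not separated by commas
]
results = {sample: is_alnum_list(sample) for sample in samples}

for sample, ok in results.items():
    print(repr(sample), ok)
```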
Try this pattern: ^([a-zA-Z0-9]+,?\s*)+$
I tested it with your cases, as well as just a single number "123". I don't know if you will always have a comma or not.
The [a-zA-Z0-9]+ means match 1 or more of these symbols
The ,? means match 0 or 1 commas (basically, the comma is optional)
The \s* handles 0 or more whitespace characters after the comma
and finally the outer + says match 1 or more of the pattern.
This will also match 123 123 abc (no commas), which might be a problem.
This will also match 123, (ends with a comma), which might be a problem.
Try the following expression:
/^([a-z0-9\s]+,)*([a-z0-9\s]+){1}$/i
This will work for:
test
test, test
test123,Test 123,test
I would strongly suggest trimming the whitespaces at the beginning and end of each item in the comma-separated list.
You seem to be lacking repetition. How about:
^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$
I'm not sure how you'd express that in VB.Net, but in Python:
>>> import re
>>> x = ["123, 4A67, GGG, 767", "12333, 78787&*, GH778"]
>>> r = r'^(?:[a-zA-Z0-9 ]+,)*[a-zA-Z0-9 ]+$'
>>> for s in x:
...     print(re.match(r, s))
...
<re.Match object; span=(0, 19), match='123, 4A67, GGG, 767'>
None
>>>
You can use shortcuts instead of listing the [a-zA-Z0-9 ] part, but this is probably easier to understand.
Analyzing the highlights:
[a-zA-Z0-9 ]+ : match one or more (but not zero) characters from the listed ranges, or a space.
(?:[...]+,)* : In non-capturing parentheses, match one or more of the characters, plus a comma at the end. Match such sequences zero or more times. Allowing zero repetitions is what permits a string with no comma at all.
[...]+ : match at least one of these characters. This does not include a comma, which ensures that a trailing comma is not accepted. If a trailing comma is acceptable, then the expression is simpler: ^[a-zA-Z0-9 ,]+$
Yes, when you want to catch comma separated things where a comma at the end is not legal, and the things match to $LONGSTUFF, you have to repeat $LONGSTUFF:
$LONGSTUFF(,$LONGSTUFF)*
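The $LONGSTUFF(,$LONGSTUFF)* shape can also be produced by a small helper, so the item pattern is written only once. This is a minimal Python sketch; the helper name comma_separated and the two example item patterns are hypothetical and not part of any library.

```python
import re


def comma_separated(item_pattern: str) -> str:
    """Build 'item(,item)*' from a single item pattern."""
    # Wrap the item in a non-capturing group so alternations inside it stay contained.
    item = f"(?:{item_pattern})"
    return f"{item}(?:,{item})*"


# Two illustrative lists: plain alphanumeric items, and key=value items.
ALNUM_LIST = re.compile(comma_separated(r"[0-9A-Za-z]+"))
KEY_VALUE_LIST = re.compile(comma_separated(r"[a-z]+=[a-z]+"))


def is_valid(compiled: "re.Pattern[str]", text: str) -> bool:
    """True if the whole text matches the compiled list pattern."""
    return compiled.fullmatch(text) is not None


print(is_valid(ALNUM_LIST, "123,4A67,GGG"))
print(is_valid(ALNUM_LIST, "123,"))
print(is_valid(KEY_VALUE_LIST, "a=b,c=d"))
```

Because the helper returns a plain string, it can be nested: the result of one call can serve as the item pattern of another, as long as the inner and outer separators are told apart by quoting or brackets.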
If $LONGSTUFF is really long and contains comma repeated items itself etc., it might be a good idea to not build the regexp by hand and instead rely on a computer for doing that for you, even if it's just through string concatenation. For example, I just wanted to build a regular expression to validate the CPUID parameter of a XEN configuration file, of the ['1:a=b,c=d','2:e=f,g=h'] type. I... believe this mostly fits the bill: (whitespace notwithstanding!)
import re

xend_fudge_item_re = r"""
e[a-d]x= #register of the call return value to fudge
(
0x[0-9A-F]+ | #either hardcode the reply
[10xks]{32} #or edit the bitfield directly
)
"""
xend_string_item_re = r"""
(0x)?[0-9A-F]+: #leafnum (the contents of EAX before the call)
%s #one fudge
(,%s)* #repeated multiple times
""" % (xend_fudge_item_re, xend_fudge_item_re)
xend_syntax = re.compile(r"""
\[ #a list of
'%s' #string elements
(,'%s')* #repeated multiple times
\]
$ #and nothing else
""" % (xend_string_item_re, xend_string_item_re), re.VERBOSE | re.MULTILINE)
Try ^(?!,)((, *)?([a-zA-Z0-9]+)\b)*$
Step by step description:
Don't match a beginning comma (good for the upcoming "loop").
Match optional comma and spaces.
Match characters you like.
The word-boundary match makes sure that a comma is required whenever more items follow in the string.
Please use - ^((([a-zA-Z0-9\s]){1,45},)+([a-zA-Z0-9\s]){1,45})$
Here, I have set the maximum item length to 45, since the longest word in English has 45 characters; this can be changed as required.