Fetch Nth field from end of a string - regex

I want to write a regex to pull the nth field from the end of a string in Splunk. Please let me know how to proceed.

Code
(\S+)(?:\s+\S+){4}\s*$
Alternatively, you can also use the following:
\S+(?=(?:\s+\S+){4}\s*$)
Explanation
Both methods use the same logic. The first method captures the 5th field from the end of the string in group 1 and also matches the rest of the string. The second method matches only the 5th field from the end and uses a lookahead to check that what follows it fits the pattern. To get the nth field from the end, replace the 4 in the quantifier with n-1.
\S+ Match any non-whitespace character one or more times
(?:\s+\S+){4} Match the following exactly 4 times:
  \s+ Match any whitespace characters one or more times
  \S+ Match any non-whitespace characters one or more times
\s* Match any whitespace characters any number of times
$ Assert position at the end of the line
Further explanations:
(\S+) Captures any non-whitespace character one or more times into a capture group
(?= ... ) where ... represents some pattern (in our case (?:\s+\S+){4}\s*$). This is a positive lookahead that ensures what follows matches without consuming any characters.
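As a minimal sketch of the same idea outside Splunk, the Python snippet below builds the capturing pattern for an arbitrary n by setting the repeat count to n - 1. The helper name nth_field_from_end and the sample string are made up for illustration.

```python
import re

def nth_field_from_end(text, n):
    """Return the nth whitespace-separated field counted from the end (n >= 1)."""
    # n - 1 fields must follow the captured one, hence the {n-1} quantifier.
    pattern = r"(\S+)(?:\s+\S+){%d}\s*$" % (n - 1)
    match = re.search(pattern, text)
    return match.group(1) if match else None

sample = "alpha beta gamma delta epsilon zeta"  # hypothetical input
fifth_from_end = nth_field_from_end(sample, 5)  # same as the {4} pattern above
last_field = nth_field_from_end(sample, 1)
print(fifth_from_end, last_field)
```

When the string has fewer than n fields, the pattern cannot match and the helper returns None.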

Related

Matching the second to the last word

I have the following line:
19 crooks highway Ontario
I want to match anything that comes before the last space (whitespace). So, in this case, I want to get back
highway
I tried this:
([^\s]+)
You may use a lookahead to do it:
\S+(?=\s\S*$)
\S+ any non-whitespace characters, repeated 1+ times (equivalent to [^\s]+)
(?=...) lookahead, the string after the previous match must satisfy the condition inside
\s\S*$ a whitespace character followed by any number of non-whitespace characters, then the end of the string
Use this:
\s+(\S+)\s+\S*$
Use $ anchor to match the end of the line
\s+\S*$ matches one or more whitespace characters followed by zero or more non-whitespace characters at the end of the line
The desired result is captured in the capturing group
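For comparison, here is a small Python sketch that runs both suggested patterns against the line from the question. The variable names are only illustrative.

```python
import re

line = "19 crooks highway Ontario"

# Lookahead version: the whole match is the second-to-last word.
lookahead_match = re.search(r"\S+(?=\s\S*$)", line)

# Capture version: the second-to-last word ends up in group 1.
capture_match = re.search(r"\s+(\S+)\s+\S*$", line)

print(lookahead_match.group())   # highway
print(capture_match.group(1))    # highway
```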

Match all instances of a certain character inside every word preceded by a certain word and not delimited by a space

Given a string such as below:
word.hi. bla. word.
I want to construct a regex which will match all "."s preceded by "word" and any other non space character
So, in the above example I would want the first, second and last dots to be matched.
While matching the first and last dots would be easy with global flag (/(?:word.*)\K./gU), I'm not sure how to construct a regex that would also match the second dot.
Appreciate any pointers.
You can match word and then get all consecutive matches using the \G anchor, skipping over any characters other than a whitespace char or a dot before each dot.
(?:\bword|\G(?!\A))[^.\s]*\K\.
In parts
(?: Non capture group
\bword Match word preceded by a word boundary
| Or
\G(?!\A) Assert the position at the end of the previous match, not at the start
) Close non capture group
[^.\s]* Match 0+ occurrences of any char except . or a whitespace char
\K Clear the match buffer (forget what is matched until now)
\. Match a dot
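The \G and \K tokens are not available in Python's built-in re module, so the sketch below swaps in a different technique to get the same dots: it first finds each run of non-whitespace characters that starts with word, then collects the dot positions inside that run. The function name dots_after_word is hypothetical.

```python
import re

def dots_after_word(text, word="word"):
    """Return the indexes of dots in each non-whitespace run that starts with `word`."""
    positions = []
    # \bword\S* grabs the word plus everything up to the next whitespace.
    for run in re.finditer(r"\b" + re.escape(word) + r"\S*", text):
        for offset, char in enumerate(run.group()):
            if char == ".":
                positions.append(run.start() + offset)
    return positions

print(dots_after_word("word.hi. bla. word."))  # [4, 7, 18]
```

The dot after bla (index 12) is skipped because that run does not start with word.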

Excluding a dot from the beginning or from the end of the nickname

I have a custom pattern to check a Facebook user nickname:
(?:https:\/\/)?(?:http:\/\/)?(?:www\.)?(?:facebook)\.com\/(?:\w*#!\/)?([\w-\.]+)
How can I exclude a dot from the beginning or from the end of a nickname?
Example:
facebook.com/john.doe // correct
facebook.com/.john.doe // incorrect: starts with a dot (.)
facebook.com/john.doe. // incorrect: ends with a dot (.)
One option is to replace the last part with \/\w+(?:\.\w+)*$.
That will match 1+ word characters, followed by a group repeated 0+ times that matches a dot and 1+ word characters, and then assert the end of the string with $.
If there can be only 1 part with a dot following, then the * can be replaced by a ? to make it optional.
If it is not at the end of the string, you could use \/\w+(?:\.\w+)*(?!\S) using a negative lookahead to assert that what is on the right side is not a non whitespace character.
Note that (?:facebook) can be written without the grouping structure and the start could be written by just making the s optional (?:https?:\/\/)?. Depending on the delimiter you don't have to escape the forward slash.
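A rough Python sketch of how these suggestions could fit together is shown below. The combined pattern is an assumption, not the asker's final regex: it uses https? for the scheme, drops the optional #! part and the hyphen from the original character class, and captures the nickname in group 1. The name extract_nickname is made up for the example.

```python
import re

# Assumed combination of the suggestions above: optional scheme, optional www,
# and a nickname that neither starts nor ends with a dot.
NICKNAME_RE = re.compile(r"(?:https?://)?(?:www\.)?facebook\.com/(\w+(?:\.\w+)*)$")

def extract_nickname(url):
    match = NICKNAME_RE.search(url)
    return match.group(1) if match else None

print(extract_nickname("facebook.com/john.doe"))    # john.doe
print(extract_nickname("facebook.com/.john.doe"))   # None
print(extract_nickname("facebook.com/john.doe."))   # None
```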

How can i validate for special character at particular position in regexp

I have written the regexp #"^([a-zA-Z ]+[a-zA-Z0-9 ]*)$". It allows letters, digits, and spaces but no special characters, and the first character cannot be a digit. Now I have to allow the '-' character anywhere except as the first or last character. How can I modify it?
You can use this:
#"^([a-zA-Z ]+[a-zA-Z0-9\- ]*[a-zA-Z0-9 ]+)$"
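The first character class allows only letters and spaces (at least one)
The second character class allows letters, digits, spaces, and - (zero or more)
The last character class allows only letters, digits, and spaces (at least one, so the string cannot end with -)
Note that this pattern requires at least two characters in total.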
You could add a - to the second character class and add a negative lookahead (?!.*-$) to make sure the string does not end with -. The first character class already prevents - at the start.
^(?!.*-$)([a-zA-Z ]+[a-zA-Z0-9 -]*)$
Explanation
^ Assert position at the start of the line
(?!.*-$) Negative lookahead to assert that the string does not end with -
( Capturing group
[a-zA-Z ]+ Match character class one or more times
[a-zA-Z0-9 -]* Match character class with - zero or more times
) Close capturing group
$ Assert position at the end of the line
Note
Your regex is inside a capturing group. If you don't use that group, you can leave out the parentheses:
^(?!.*-$)[a-zA-Z ]+[a-zA-Z0-9 -]*$
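To see how the lookahead version behaves, here is a small Python check. The helper name is_valid and the sample inputs are illustrative assumptions.

```python
import re

LOOKAHEAD_RE = re.compile(r"^(?!.*-$)[a-zA-Z ]+[a-zA-Z0-9 -]*$")

def is_valid(value):
    """True when value starts with a letter or space and does not end with '-'."""
    return LOOKAHEAD_RE.search(value) is not None

# Hyphen in the middle passes; leading digit, leading '-', and trailing '-' fail.
for candidate in ["ab-c1", "a", "-abc", "abc-", "1abc"]:
    print(candidate, is_valid(candidate))
```

A single letter such as a is accepted here, because only the first character class is mandatory.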

Finding words in a string that start with number (Regex)

I need to find the words in a string that start with a number (i.e., a digit).
In the following string:
1st 2nd 3rd a56b 5th 6th ***7th
The words 1st 2nd 3rd 5th 6th should be returned.
I tried with the regex:
(\b[^ a-zA-Z ^ *]+(th|rd|st|nd))+
But this regex returns the words that do not start with letters, and it can't handle the cases where a word starts with special characters.
For the current string, you may use a pattern like
(?<!\S)\d+(?:th|rd|st|nd)\b
The pattern matches:
(?<!\S) - a location at the start of a string or after a whitespace
\d+ - 1 or more digits
(?:th|rd|st|nd) - one of the four alternatives
\b - a word boundary.
If you plan to match any 0+ non-whitespace chars after a digit that is preceded with a whitespace or is at the start of a string, use
(?<!\S)\d\S*
where \S* will match any 0+ non-whitespace chars.
NOTE: In case the lookbehind is not supported, replace (?<!\S) with (?:^|\s) and wrap the rest of the pattern in a capturing group so you can access the value later:
(?:^|\s)(\d\S*)
and the value will be in Group 1.
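The three patterns above can be tried in Python, whose re module supports this kind of fixed-width lookbehind. This is only a sketch using the string from the question, and the variable names are made up.

```python
import re

text = "1st 2nd 3rd a56b 5th 6th ***7th"

# Digits followed by an ordinal suffix, starting at a whitespace boundary.
ordinals = re.findall(r"(?<!\S)\d+(?:th|rd|st|nd)\b", text)

# Any word that starts with a digit.
digit_words = re.findall(r"(?<!\S)\d\S*", text)

# Lookbehind-free variant: findall returns the contents of Group 1.
no_lookbehind = re.findall(r"(?:^|\s)(\d\S*)", text)

print(ordinals)  # ['1st', '2nd', '3rd', '5th', '6th']
```

For this input all three calls return the same list. a56b and ***7th are skipped because their digits are not at the start of a word.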
To get the words that start with a digit and end with th/st/nd/rd, you can try this:
((?<!\S)(\d+)(th|rd|nd|st))
(?<!\S) asserts that the match starts at the beginning of the string or right after a whitespace character
\d+ matches 1 or more digits
th|rd|st|nd matches one among those 4.