Extracting a field from a path [duplicate] - regex

SPLUNK REGEX
We have some data that contains a hierarchy of folders that we want to extract from the source path. The raw data looks like this:
source= /usr/local/intranet/areas/ua1/output/MUN
We would like to create 2 Splunk inline regexes to extract the "intranet" and the "output" folders.
Can someone please help?
Thanks

I'm not sure if this is the kind of job you want to solve with regex, to be honest. You're probably better off string parsing it by splitting then selecting what you need from the array?
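A minimal sketch of the split-and-index idea, written in Python purely for illustration (the helper name `path_segments` and the positions 2 and -2 are assumptions based on the sample path, not anything Splunk-specific):

```python
# Hypothetical helper: pick folders by position instead of using a regex.
def path_segments(path: str) -> list[str]:
    # Splitting on "/" yields an empty string for the leading slash; drop it.
    return [part for part in path.split("/") if part]


segments = path_segments("/usr/local/intranet/areas/ua1/output/MUN")
third = segments[2]         # third folder counted from the root
second_last = segments[-2]  # folder just above the last one
print(third, second_last)
```

This only works if the folders you want always sit at the same depth, which is the same assumption the regex version makes.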
Regardless, you could likely just do something like this:
Input:
/usr/local/intranet/areas/ua1/output/MUN
Regex:
(?:\/.+?){2}\/([^\/]+)(?:\/.+?){2}\/([^\/]+)
Then group 1 will match intranet and group 2 will match output.
(?:\/.+?){2} - Match a forward slash followed by one or more characters (as few as possible), twice. So this matches /usr/local.
\/([^\/]+) - Match a forward slash, then capture everything up to the next slash in a group. This matches /intranet but stores intranet in group 1.
Then we just repeat this to get the next segment you want as well.
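To check the two-group pattern outside Splunk, here is a small sketch using Python's `re` module (an assumption for illustration only; Python does not need the slashes escaped, and the name `PATH_RE` is made up):

```python
import re

# Same pattern as above, minus the backslashes before "/".
PATH_RE = re.compile(r"(?:/.+?){2}/([^/]+)(?:/.+?){2}/([^/]+)")

match = PATH_RE.search("/usr/local/intranet/areas/ua1/output/MUN")
first, second = match.groups()
print(first, second)
```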
If you need 2 separate patterns, or want to match only the relevant part instead of putting it in a group, you can use these, assuming the global modifier is disabled:
(?<=\/)[^\/]+(?=(?:\/[^\/]+){4}$) - To match intranet.
(?<=\/)[^\/]+(?=\/[^\/]+$) - To match output.
These use a look-behind (?<=…) and a look-ahead (?=…) instead of matching the surrounding text, so that text is not included in the final result and no groups are needed. Otherwise, they follow mostly the same logic.
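The two look-around patterns can be tried the same way. This is a sketch in Python's `re` flavor (an assumption; the constant names are invented), where the whole match is the folder name and no group is needed:

```python
import re

path = "/usr/local/intranet/areas/ua1/output/MUN"

# A segment that is followed by exactly four more segments before the end.
INTRANET_RE = re.compile(r"(?<=/)[^/]+(?=(?:/[^/]+){4}$)")
# A segment that is followed by exactly one more segment before the end.
OUTPUT_RE = re.compile(r"(?<=/)[^/]+(?=/[^/]+$)")

intranet = INTRANET_RE.search(path).group(0)
output = OUTPUT_RE.search(path).group(0)
print(intranet, output)
```

Note that both patterns count segments from the end of the path, so they break if the depth below the wanted folder changes.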

Related

Can I use negative lookahead and other conditions together in regex group?

I'm trying to match some URLs against another table using regex and - because the original source wasn't put together properly, I'm using a regex to clean them within the SQL.
As an example, the URLs might be /this-is-my-test-string/ or /this-is-my-test-string and the reference table is always of the form /this-is-my-test-string so using this regex works well to capture the matching part.
(\/[^\/)]*)\/?
However I've now come across some others with the form /this-is-my-test-string- and /this-is-my-test-string-/ which aren't as straightforward - I can't just add - to the exclusion as it's present in the rest of the string. From reading around - regex is not something I use regularly - a lookahead would seem to be the answer, but I can't work out how to include this in the expression.
Any help would be gratefully received.
You can use $ to anchor the end of the string, and use a non-greedy quantifier *? on the non-slash character set so that -? can match a - at (or near) the end of the string:
(\/[^\/)]*?)-?\/?$

Find and replace a Regex pattern occurring more than once [duplicate]

I'm trying to find-and-replace instances where consecutive commas appear throughout a string; replacing them w/ something like ",N/A,". I was using a very simple /,,/g pattern, and that works on things like ",,abc" and ",,,,abc" (with even numbers of commas). However, it doesn't catch things like ",,,abc". That's because the first two commas are considered a match, and then the third comma is just considered part of a new ",abc" string. Is there a way to handle this w/ a RegEx pattern or options? Otherwise, I'm going to need to perform multiple searches.
FWIW - I'm working in JavaScript, but I'm guessing this is just a general RegEx question/answer.
The reason /,,/g only matches once with three commas is that a global match resumes after the last consumed character. You need a way to match the pattern ,, without consuming the second comma.
If your language supports it, use a positive lookahead. A positive lookahead lets a regex check some additional characters without consuming them as part of the match.
/,(?=,)/g
In English, this means:
,    # match a comma, then
(?=  # start a lookahead: what follows must be present, but is not consumed by the match
,    # a comma
)    # end of the lookahead
See more about this here: https://www.regular-expressions.info/lookaround.html
JavaScript supports positive lookahead. :)
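The difference between the two patterns is easy to see side by side. This sketch uses Python's `re.sub` instead of JavaScript (an assumption for illustration; the function name and the "N/A" filler are taken from the question or made up):

```python
import re


def fill_empty_fields(text: str, filler: str = "N/A") -> str:
    # Only the first comma of each pair is consumed; the lookahead leaves the
    # second comma in place so it can start the next match.
    return re.sub(r",(?=,)", lambda m: "," + filler, text)


naive = re.sub(r",,", ",N/A,", ",,,abc")  # consumes both commas per match
fixed = fill_empty_fields(",,,abc")
print(naive)
print(fixed)
```

With the lookahead, the replacement string is ",N/A" rather than ",N/A,", because the second comma is never removed from the input.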

Regex - How to exclude matches without look-behind? [duplicate]

I'm trying to scan all attributes from a database, searching for specific patterns and ignoring similar ones that I know should not match but I'm having some problems as in the below example:
Let's say I'm trying to find Customer Registration Numbers and one of my patterns is this:
.*CRN.*
Then I ignore everything that is not a CRN (like currency and country name) like this:
(CRN)(?!CY|AME)
So far everything is working fine, as lookahead is supported in JavaScript.
The next step is to exclude things like SCRN (screen), for example, but the look-behind in (?<!S)(CRN)(?!CY|AME) doesn't work.
Is there any alternative?
Example inputs:
CREDIT_CARD
DISCARD
CARDINALITY
CARDNO
My regex: (?!.*DISCARD.*|.*CARDINALITY.*).*CARD.*
CARDINALITY was removed but DISCARD is still being considered :(
The regex that you want is:
(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)
It is important that you are testing the negative lookahead against the entire word and thus we are trying to match (\b\w*CARD\w*\b) rather than just CARD. The problem with the following regex:
(?!(?:CARDINALITY|DISCARD))CARD
is that with the case of DISCARD, when the scan is at the character position where CARD begins, we are past DIS and you would need a negative lookbehind condition to eliminate DISCARD from consideration. But when we are trying to match the complete word as we are in the regex I propose, we are still at the start of the word when we are applying the negative lookahead conditions.
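As a rough check of that reasoning, the sketch below runs both the question's pattern and the whole-word pattern over the example inputs. It uses Python's `re` (an assumption for illustration; the variable names are invented):

```python
import re

names = ["CREDIT_CARD", "DISCARD", "CARDINALITY", "CARDNO"]

# Pattern from the question: unanchored, so the engine can retry one
# character into "DISCARD", where the lookahead no longer sees "DISCARD".
ASKER_RE = re.compile(r"(?!.*DISCARD.*|.*CARDINALITY.*).*CARD.*")
# Whole-word pattern: a match can only start at a word boundary, which is
# exactly where the negative lookahead is evaluated.
WORD_RE = re.compile(r"(?!\b(?:CARDINALITY|DISCARD)\b)(\b\w*CARD\w*\b)")

asker_kept = [n for n in names if ASKER_RE.search(n)]
word_kept = [n for n in names if WORD_RE.search(n)]
leaked = ASKER_RE.search("DISCARD").group(0)
print(asker_kept, word_kept, leaked)
```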

regex to find files containing one word but not another [duplicate]

I am trying to quickly find all .java files which contain one term but are missing another term. I'm using MyEclipse 10.7 and its 'Search | File Search' feature, which supports regular expressions.
Will regex work in this scenario? What would the correct regex be?
The only solution I could find to work is the following Regex:
^(?!.[\s\S]*MISSING_TERM).[\s\S]*INCLUDED_TERM.*$
It finds every file which includes INCLUDED_TERM but lacks MISSING_TERM, regardless of the line.
The key is the [\s\S], which matches any character including line breaks and so ensures the whole file is searched rather than each line separately.
If you want to find it on a single line, use it like this:
^(?!.*MISSING_TERM).*INCLUDED_TERM.*$
You can also use \ as an escape character, because you may need it for something like class\.variable.
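For the single-line variant, a small sketch in Python's `re` flavor (an assumption; the question is about the Eclipse search dialog, and the sample text is made up) shows which lines survive:

```python
import re

# re.MULTILINE makes ^ and $ match at every line, so each line is tested alone.
LINE_RE = re.compile(r"^(?!.*MISSING_TERM).*INCLUDED_TERM.*$", re.MULTILINE)

text = (
    "INCLUDED_TERM only\n"
    "INCLUDED_TERM and MISSING_TERM\n"
    "neither term here\n"
)
hits = LINE_RE.findall(text)
print(hits)
```

Only the first line is reported: the second contains both terms and the third contains neither.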
You could use something like:
(?<!.*bar)foo(?!.*bar)
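This matches "foo" only if "bar" appears neither before nor after it. Note that the unbounded look-behind (?<!.*bar) is only accepted by some regex engines (.NET, for example); many others reject it.
Notice: you must configure your search tool so that the dot also matches line breaks (e.g. Notepad++ has an option called ". matches newline"), because the dot usually represents any character except a line break.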
(?m)\A(?=.*REGEX_TO_FIND)(?!.*MISSING_REGEX.*).*\z
The regex can get kinda tricky but it breaks down into two pieces.
Finding the matching term/phrase/word. This part isn't too tricky, as this is what regex normally looks for.
Finding the term that must not be present. This is the tricky part, but it's possible.
I have an example HERE which shows how you want to find the word connectReadOnly in the text, and fail to find disconnect. Since the text contains connectReadOnly it starts looking for the next piece, not finding disconnect. Since disconnect is in the text it fails on the entire string (what you will need for your entire file to match). If you play around with the second piece, the negation part (?!.*disconnect.*), you can set that as whatever regex you need. In my example I don't want to find disconnect anywhere in my code :) You can easily replace that with your word to search on, or even a more complex regex to "not find".
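The key is to use multi-line mode, which is set with the leading (?m), together with the start/end-of-string anchors: ^ and $ match the start and end of a line, whereas \A and \z match the start and end of the whole string, which extends the match over the entire file. Be aware that whether (?m) lets the dot cross line breaks depends on the engine: it does in Ruby, while most other flavors use (?s) for that.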
EDIT: For the connectReadOnly and disconnect question use: (?m)\A(?=.*connectReadOnly)(?!.*disconnect.*).*\z. The updated example can be found here.
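The whole-file check can be sketched in Python as well. The flavor differences here are assumptions to keep in mind: Python spells "dot matches newline" as `re.DOTALL` and end-of-string as `\Z`, and the trailing `.*` inside the negative lookahead is dropped because it does not change the result. `wanted` is a hypothetical helper name.

```python
import re

# \A ... \Z pin the match to the whole string; DOTALL lets ".*" cross lines.
FILE_RE = re.compile(
    r"\A(?=.*connectReadOnly)(?!.*disconnect).*\Z",
    re.DOTALL,
)


def wanted(source: str) -> bool:
    """True if the text contains connectReadOnly and never disconnect."""
    return FILE_RE.search(source) is not None


good = "open()\nconnectReadOnly()\nclose()\n"
bad = "connectReadOnly()\ndisconnect()\n"
print(wanted(good), wanted(bad), wanted("nothing relevant"))
```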

Regex - How to search for singular or plural version of word [duplicate]

I'm trying to do what should be a simple Regular Expression, where all I want to do is match the singular portion of a word whether or not it has an s on the end. So if I have the following words
test
tests
EDIT: Further examples. I need this to be possible for many words, not just those two:
movie
movies
page
pages
time
times
For all of them I need to get the word without the s on the end but I can't find a regular expression that will always grab the first bit without the s on the end and work for both cases.
I've tried the following:
([a-zA-Z]+)([s\b]{0,}) - This returns the full word as the first match in both cases
([a-zA-Z]+?)([s\b]{0,}) - This returns 3 different matching groups for both words
([a-zA-Z]+)([s]?) - This returns the full word as the first match in both cases
([a-zA-Z]+)(s\b) - This works for tests but doesn't match test at all
([a-zA-Z]+)(s\b)? - This returns the full word as the first match in both cases
I've been using http://gskinner.com/RegExr/ for trying out the different regex's.
EDIT: This is for a Sublime Text snippet. For those who don't know, a snippet in Sublime Text is a shortcut, so that I can type, say, the name of my database and hit "run snippet" and it will turn it into something like:
$movies = $this->ci->db->get_where("movies", "");
if ($movies->num_rows()) {
    foreach ($movies->result() AS $movie) {
    }
}
All I need is to turn "movies" into "movie" and have it automatically inserted into the foreach loop.
This means I can't just do a find and replace on the text, and I only need to take 60 - 70 words into account (it's only running against my own tables, not every word in the English language).
Thanks!
- Tim
Ok I've found a solution:
([a-zA-Z]+?)(s\b|\b)
Works as desired, then you can simply use the first match as the unpluralized version of the word.
Thanks @Jahroy for helping me find it. I added this as an answer for future surfers who just want a solution, but please check out Jahroy's comment for more in-depth information.
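Outside Sublime Text, the same pattern can be checked against the words from the question. This is a sketch using Python's `re` (an assumption; `singular` is a made-up helper name):

```python
import re

# The lazy group grows one letter at a time until it is followed either by a
# final "s" or directly by the end of the word.
SINGULAR_RE = re.compile(r"([a-zA-Z]+?)(s\b|\b)")


def singular(word: str) -> str:
    m = SINGULAR_RE.match(word)
    return m.group(1) if m else word


words = ["test", "tests", "movie", "movies", "page", "pages", "time", "times"]
singulars = [singular(w) for w in words]
print(singulars)
```

It strips any final s, so a singular word that already ends in s would lose a letter. For a known list of 60 - 70 table names that may be acceptable.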
For simple plurals, use this:
test(?=s| |$)
For more complex plurals, you're in trouble using regex. For example, this regex
part(y|i)(?=es | )
will return "party" or "parti", but what you do with that, I'm not sure.
Here's how you can do it with vi or sed:
s/\([A-Za-z]*\)[sS]$/\1/
That replaces a run of letters ending in s or S with everything but the last letter.
NOTE:
The escape chars (backslashes before the parens) might be different in different contexts.
ALSO:
The \1 (which refers to the first captured group) may also vary depending on context.
ALSO:
This will only work if your word is the only word on the line.
If your table name is one of many words on the line, you could probably replace the $ (which stands for the end of the line) with a wildcard that represents whitespace or a word boundary (these differ based on context).
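One possible reading of that last suggestion, sketched with Python's `re.sub` instead of vi or sed (an assumption for illustration; `strip_plural_s` is a hypothetical name), swaps the $ for a word boundary:

```python
import re


def strip_plural_s(line: str) -> str:
    # \b replaces the end-of-line anchor, so the word may sit anywhere in the line.
    return re.sub(r"\b([A-Za-z]+)[sS]\b", r"\1", line)


alone = strip_plural_s("movies")
inline = strip_plural_s("table pages here")
print(alone, "|", inline)
```

The catch is that every word on the line that ends in s is shortened, not just the table name, so this only suits lines where the other words are known.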