I need help with a regular expression that matches the four file names below:
9369.PCTYYYYMMDD.txt
9370.PCTYYYYMMDD.txt
9369.s369YYMMDDd-0008-pct.txt
9370.s370YYMMDDd-0008-pct.txt
I have worked out this: ^93(69|70).(s369|s370|pct).*\.txt$
but it also matches the file names below, which it should not:
9369.s369YYMMDDd-0023-pct.txt
9370.s370YYMMDDd-0023-pct.txt
Please help me.
Thanks in advance.
Remove s370 from the second capturing group, and don't forget to turn on the case-insensitive modifier i.
^93(69|70)\.(s369|pct).*?\.txt$
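As a cross-check, here is a minimal Python sketch of a stricter pattern. It assumes the YYYYMMDD and YYMMDD placeholders in the question stand for digits, and it spells out the -0008- part explicitly, because a wildcard such as .* or .*? would also let -0023- through. The backreference \1 ties s369 to 9369 and s370 to 9370. The concrete file names are made up for illustration.

```python
import re

# Assumption: YYYYMMDD / YYMMDD in the question are placeholders for digits.
# \1 repeats the "69" or "70" captured after "93", so s369 pairs with 9369.
PATTERN = re.compile(r"^93(69|70)\.(?:PCT\d{8}|s3\1\d{6}d-0008-pct)\.txt$")

# Hypothetical file names built from the templates in the question.
wanted = [
    "9369.PCT20240131.txt",
    "9370.PCT20240131.txt",
    "9369.s369240131d-0008-pct.txt",
    "9370.s370240131d-0008-pct.txt",
]
unwanted = [
    "9369.s369240131d-0023-pct.txt",
    "9370.s370240131d-0023-pct.txt",
]

for name in wanted + unwanted:
    print(name, bool(PATTERN.match(name)))
```

If the date parts can contain something other than digits, the \d{8} and \d{6} pieces would need to be loosened accordingly.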
I am trying to match everything between multiple sets of brackets.
Example of data
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
I need to match everything within the inner brackets as below
42.30722,-83.181125,
42.30722,-83.18112667,
42.30722167,-83.18112667,
42.30721667,-83.181125,
+42.30721667,-83.181125
How do I do that? I tried \[([^\[\]]|)*\] but it gives me the values with the brackets. Can anybody please help me with this? Thanks in advance.
It seems one of the pairs is missing a closing bracket. If that is intentional, an expression similar to this one:
\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?
might be OK to start with.
Test
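import re

# Two captured numbers per pair; the closing bracket is optional
# because one pair in the sample data lacks it.
expression = r"\[([+-]?\d+\.\d+)\s*,\s*([+-]?\d+\.\d+)\s*\]?"
string = """
[[42.30722,-83.181125],[42.30722,-83.18112667],[42.30722167,-83.18112667,[42.30721667,-83.181125],[+42.30721667,-83.181125]]
"""
# findall returns one tuple of the two groups per match;
# the first print shows the same pairs as lists.
print([list(i) for i in re.findall(expression, string)])
print(re.findall(expression, string))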
Output
[['42.30722', '-83.181125'], ['42.30722', '-83.18112667'], ['42.30722167', '-83.18112667'], ['42.30721667', '-83.181125'], ['+42.30721667', '-83.181125']]
[('42.30722', '-83.181125'), ('42.30722', '-83.18112667'), ('42.30722167', '-83.18112667'), ('42.30721667', '-83.181125'), ('+42.30721667', '-83.181125')]
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.
A little late, but figured I would include it anyhow.
Your 3rd set is missing a ']'.
If that is in there, then in Alteryx you can just use Text to Columns, splitting to rows and ignoring delimiters in brackets.
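Outside Alteryx, the same idea can be sketched in Python. This assumes the missing ']' has been restored in the third pair (the data string below is the question's sample with that one fix). Putting the capture group inside the brackets means findall returns the contents without the brackets.

```python
import re

# Assumed input: the sample data with the missing ']' restored in the third pair.
data = (
    "[[42.30722,-83.181125],[42.30722,-83.18112667],"
    "[42.30722167,-83.18112667],[42.30721667,-83.181125],"
    "[+42.30721667,-83.181125]]"
)

# The group captures only what sits between an innermost '[' and ']'.
inner = re.findall(r"\[([^\[\]]*)\]", data)

for pair in inner:
    print(pair)
```

With the original, unbalanced data this pattern would skip the third pair, because the pair never reaches a closing bracket before the next '[' starts.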
I am trying to exclude delimiters within text qualifiers, and I am trying to use Regex for this. However, I am new to Regex and cannot fully accomplish what I need. I would be very grateful if someone could help me out.
In Alteryx, I load a delimited flat text file as 'non-delimited' and say that it does not have text qualifiers. Thus, the input will look something like this:
"aabb"|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|gghh
"aa|bb"|ccdd|"ee|ff"|gghh
"aa|bb"|"cc|dd"|"ee|ff"|"gg|hh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gg|hh"
aabb|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|"gg|hh"
aabb|cc|dd|eeff|gghh
aabb|"cc||dd"|eeff|gghh
aabb|"c|c|dd"|eeff|gghh
"aa||bb"|ccdd|eeff|gghh
"a|a|b|b"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"g|g|hh"
"aabb"|ccdd|eeff|"gg||hh"
I want to exclude all delimiters that are in between text qualifiers.
I have tried to use Regex to replace the delimiters within text qualifiers with nothing.
So far, I have tried the following Regex code for my target:
(")(.*?[^"])\|+(.*?)(")
And I have used the following for my replace:
$1$2$3$4
However, this does not fix lines 11, 13, 14 and 15.
I wish to obtain the following results:
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|"eeff"|gghh
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
aabb|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
aabb|cc|dd|eeff|gghh
aabb|"ccdd"|eeff|gghh
aabb|"ccdd"|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
"aabb"|ccdd|eeff|"gghh"
Thank you in advance for helping me out!
With kind regards,
Robin
I can't think of a correct REGEX syntax for this, short of writing out each pattern that could occur.
However, an easier way (maybe not as performant) would be to use Text to Columns with "Ignore delimiters in quotes" selected. If you need it back together in one cell afterwards, you can transpose, remove the delimiters, and then use a Summarize to concatenate each RecordID group.
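For comparison, outside Alteryx the replacement can be done in one pass by matching each quoted field as a whole and stripping the pipes only inside that match. The sketch below is Python, not Alteryx syntax, and it assumes quotes are always balanced and never escaped inside a field. The function name is made up for illustration.

```python
import re

def strip_pipes_in_quotes(line: str) -> str:
    """Remove every '|' that sits between a pair of double quotes."""
    # Each match is one complete quoted field; pipes are dropped only inside it.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace("|", ""), line)

samples = [
    '"aa|bb"|ccdd|eeff|gghh',
    'aabb|"c|c|dd"|eeff|gghh',
    '"a|a|b|b"|ccdd|eeff|gghh',
    '"aabb"|ccdd|eeff|"gg||hh"',
    'aabb|cc|dd|eeff|gghh',
]
for line in samples:
    print(strip_pipes_in_quotes(line))
```

Because the callback sees a whole quoted field at a time, any number of pipes inside one field is handled, which is where a single fixed replacement pattern falls short.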
I am trying to find a solution to my problem.
Input: prd-abcd-efgh-i-0dflnk55f5d45df
Output: prd-abcd-efgh
Splunk query I tried: index=aws-* (host=prd-abcd-efgh*) | rex field=host "^(?<host>[^.]+)"| dedup host | stats count by host,methodPath
I want to remove everything that comes after "-i-" using a simple regex. I tried the regex "^(?<host>[^.]+)" listed here:
https://answers.splunk.com/answers/77101/extracting-selected-hosts-with-regex-regex-hosts-with-exceptions.html
Please help me to solve it.
replace(host, "(?<=-i-).*", "")
Example here: https://regex101.com/r/blcCcQ/2
Here (?<=-i-) is a lookbehind: the match starts right after -i-, so the -i- itself stays in the result.
I have no knowledge of Splunk, but the normal way to do that would be to match the part you don't want and replace it with an empty string.
The regex for doing that could be:
-i-.*
Then replace the match with an empty string.
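Since I can't show this in Splunk, here is the same match-and-replace idea as a small Python sketch, using the host name from the question:

```python
import re

host = "prd-abcd-efgh-i-0dflnk55f5d45df"

# Match "-i-" and everything after it, then replace the match with nothing.
trimmed = re.sub(r"-i-.*", "", host)
print(trimmed)  # prd-abcd-efgh
```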
Something simple like this should work:
([a-z-]+)-i-.+
The first capture group will return only the part preceding -i-.
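A quick way to check the capture-group variant is a Python sketch like the one below. It assumes the part before -i- contains only lowercase letters and hyphens, as the character class requires; falling back to the unchanged host when nothing matches is my own choice, not part of the answer.

```python
import re

host = "prd-abcd-efgh-i-0dflnk55f5d45df"

# Group 1 holds the prefix; the greedy class backtracks until "-i-" can follow it.
match = re.match(r"([a-z-]+)-i-.+", host)
prefix = match.group(1) if match else host
print(prefix)  # prd-abcd-efgh
```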
Hello, I'm trying to find a regex that would catch the terms in a URL.
For example, given:
https://stackoverflow.com, it would catch "stackoverflow"
and given https://stackoverflow.com/questions/ask, it would catch "stackoverflow", "questions", "ask" and any potential terms in between the slash character after the domain name.
So far I have come up with the following regex, but it cannot capture a repeated group more than once:
https?:\/\/(?:www\.)?([\da-z-]*)(?:[\.a-z]*)(?:\/([\da-z]*)\/?)+
Do you have any way to resolve that issue? That would be great.
I tested Michal M's answer and it appears not to handle "www.", so I updated it:
/(?:\/(?:w{3}\.)?)\K([\w]+)/i
Edit: Since it's not important to match the "www.", I placed it inside a non-capturing group so it won't be captured. I also added the case-insensitive modifier so "WWW." would be okay too.
Try this one:
(?:(\/))\K(\w+)
Tested in Notepad++.
You may try using two separate regexes: one for the hostname part and another for the terms in the path part. Then combine them with alternation and do a global search:
https?:\/\/(?:\w+\.)*(\w+)\.\w+ # this would capture hostname "term"
|
\/(\w+) # this would capture path "terms"
(Note: requires /x modifier.)
Demo: https://regex101.com/r/nA8jT9/2
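Here is how that alternation might look in Python, as a rough sketch. Python's built-in re module does not support \K, so the \K-based answers won't carry over directly, while this one does. re.VERBOSE plays the role of the /x modifier, and finditer does the global search. The function name is just for illustration.

```python
import re

# Python's re module has no \K, so this follows the alternation idea instead.
TERM = re.compile(
    r"""
    https?://(?:\w+\.)*(\w+)\.\w+   # group 1: last host label before the TLD
    |
    /(\w+)                          # group 2: one path segment
    """,
    re.VERBOSE,
)

def url_terms(url: str) -> list[str]:
    # Exactly one of the two groups is set for each match.
    return [m.group(1) or m.group(2) for m in TERM.finditer(url)]

print(url_terms("https://stackoverflow.com/questions/ask"))
print(url_terms("https://www.stackoverflow.com"))
```

Note that \w does not match a hyphen, so a hyphenated host or path term would be cut at the hyphen; widen the character class if that matters.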
Thanks, I managed to rearrange it so that it works with the "www":
(?:\/(?:www\.)?)\K([\w\d]+)
I am trying to write a regex which will strip away the rest of the path after a particular folder name.
If Input is:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
Output should be:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
Some constraints:
ChangePack- will be followed by a change pack ID, which is a mix of digits and letters (a-z or A-Z) only, in any order. There is no limit on the length of the change pack ID.
ChangePack- is a constant. It will always be there.
The text before ChangePack can also change. For example, it can also be:
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
My regex-fu is bad. What I have come up with till now is:
^(.*?)\-6a7B6
I need to make this generic.
Any help will be much appreciated.
The regex below can do the trick.
^(.*?ChangePack-[\w]+)
Input:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
Output:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6
Check out the live regex demo here.
^(.*?ChangePack-[a-zA-Z0-9]+)
Try this. Instead of replacing, grab the match with $1 or \1. See the demo.
https://regex101.com/r/iY3eK8/17
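To show what grabbing group 1 looks like in practice, here is a small Python sketch using the two paths from the question. The function name and the fallback to the unchanged path are my own additions. It uses [a-zA-Z0-9] rather than \w because \w would also accept an underscore, which the stated constraints do not allow.

```python
import re

# Lazy prefix, then the constant "ChangePack-" and an alphanumeric ID.
CHANGE_PACK = re.compile(r"^(.*?ChangePack-[a-zA-Z0-9]+)")

def trim_to_change_pack(path: str) -> str:
    """Return the path up to and including the ChangePack folder."""
    match = CHANGE_PACK.match(path)
    # Assumption: paths without a ChangePack folder are returned unchanged.
    return match.group(1) if match else path

paths = [
    "/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs",
    "/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces",
]
for p in paths:
    print(trim_to_change_pack(p))
```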
Will you always have '/Repository/Framework/PITA/branches/' at the beginning? If so, this will do the trick:
/Repository/Framework/PITA/branches/\w+-\w*
Instead of regex, you could use the split and join functions. Example in Python:
path = "/a/b/c/d/e"
folders = path.split("/")
newpath = "/".join(folders[:3]) #trims off everything from the third folder over
print(newpath) #prints "/a/b"
If you really want regex, try something like ^.*\/folder\/ where folder is the name of the directory you want to match.