Imagine I have the following file names:
ZRD0004170011600001020190521.dat
ZRD0004170011600001020190521.pdf
ZRD0004170011600001020190521_TC.pdf
FLX0004170007100001020180630.dat
RES0004170007100001020180331.dat
RES0004170007100001020180930.dat
RES0004170007100001020181231.dat
RES0004170012200001020180930.dat
RES0004170012200001020181231.dat
ZNP0004170120190226.dat
ZNP0004170120190226.pdf
ZRD0004170012600001020190520.dat
ZRD0004170012600001020190520.pdf
ZRD0004170012600001020190520_TC.pdf
I want to detect the date pattern YYYYMMDD in these file names. It always appears immediately before "." or before "_TC".
Can someone help me here?
Thanks in advance!
I think you can use this regex:
[0-9]{8}(\.|_TC)
Is this what you want?
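A minimal Python sketch of the same idea, in case only the date itself is wanted rather than the date plus the trailing "." or "_TC". It swaps the capturing alternation for a lookahead so the suffix is checked but not consumed; the names `DATE_RE` and `extract_date` are made up for this illustration.

```python
import re

# Eight digits that are followed (but not consumed) by "." or "_TC".
DATE_RE = re.compile(r"(\d{8})(?=\.|_TC)")


def extract_date(filename):
    """Return the YYYYMMDD string, or None if the name has no such date."""
    match = DATE_RE.search(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("ZRD0004170011600001020190521_TC.pdf", "ZNP0004170120190226.dat"):
        print(name, "->", extract_date(name))
```

Because the lookahead requires "." or "_TC" right after the eight digits, only the last eight digits of the long numeric run can match.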
I am trying to exclude delimiters within text qualifiers. For this, I am trying to use Regex. However, I am new to Regex and am not able to fully accomplish what I need. I would be very grateful if someone could help me out.
In Alteryx, I load a delimited flat text file as 'non-delimited' and say that it does not have text qualifiers. Thus, the input will look something like this:
"aabb"|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|gghh
"aa|bb"|ccdd|"ee|ff"|gghh
"aa|bb"|"cc|dd"|"ee|ff"|"gg|hh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gg|hh"
aabb|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|"gg|hh"
aabb|cc|dd|eeff|gghh
aabb|"cc||dd"|eeff|gghh
aabb|"c|c|dd"|eeff|gghh
"aa||bb"|ccdd|eeff|gghh
"a|a|b|b"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"g|g|hh"
"aabb"|ccdd|eeff|"gg||hh"
I want to exclude all delimiters that are in between text qualifiers.
I have tried to use Regex to replace the delimiters within text qualifiers with nothing.
So far, I have tried the following Regex code for my target:
(")(.*?[^"])\|+(.*?)(")
And I have used the following for my replace:
$1$2$3$4
However, this will not fix lines 11, 13, 14 and 15.
I wish to obtain the following results:
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|"eeff"|gghh
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
aabb|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
aabb|cc|dd|eeff|gghh
aabb|"ccdd"|eeff|gghh
aabb|"ccdd"|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
"aabb"|ccdd|eeff|"gghh"
Thank you in advance for helping me out!
With kind regards,
Robin
I can't think of the correct syntax in REGEX unless you are putting in each pattern that could be found.
However, an easier way (maybe not as performant), would be to use a Text to Columns selecting Ignore delimiters in quotes. If you need it back together in one cell afterwards, you can transpose, then remove delimiters followed by a Summarize to concatenate each RecordID Group.
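Outside Alteryx, the same clean-up can be sketched in Python by matching each quoted field as a whole and removing the delimiters inside it with a replacement function. This is only an illustration, assuming the quotes are balanced and a quoted field never contains an escaped quote; `QUOTED_FIELD` and `strip_delimiters_in_quotes` are hypothetical names.

```python
import re

# One quoted field: an opening quote, any non-quote characters, a closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')


def strip_delimiters_in_quotes(line, delimiter="|"):
    """Remove every delimiter that sits inside a pair of text qualifiers."""
    return QUOTED_FIELD.sub(
        lambda m: m.group(0).replace(delimiter, ""),  # clean only the quoted part
        line,
    )


if __name__ == "__main__":
    for sample in ('"aa|bb"|ccdd|"ee|ff"|gghh', 'aabb|"c|c|dd"|eeff|gghh'):
        print(strip_delimiters_in_quotes(sample))
```

Handling each quoted field as one match avoids having to enumerate how many delimiters may appear inside it, which is what the single replace pattern struggles with.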
I am trying to write a regex which will strip away the rest of the path after a particular folder name.
If Input is:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
Output should be:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
Some constraints:
ChangePack- will be followed by a change pack id, which is a mix of digits and letters (a-z or A-Z only) in any order. There is no limit on the length of the change pack id.
ChangePack- is a constant. It will always be there.
The text before ChangePack can also vary. For example, it can also be:
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
My regex-fu is bad. What I have come up with till now is:
^(.*?)\-6a7B6
I need to make this generic.
Any help will be much appreciated.
The regex below does the trick.
^(.*?ChangePack-[\w]+)
Input:
/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces/IDemoReader.cs
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6/core/src/Pita.x86.Interfaces
Output:
/Repository/Framework/PITA/branches/ChangePack-6a7B6
/Repository/Demo1/Demo2/4.3//PITA/branches/ChangePack-6a7B6
^(.*?ChangePack-[a-zA-Z0-9]+)
Try this. Instead of replacing, grab the match with $1 or \1. See demo.
https://regex101.com/r/iY3eK8/17
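For reference, a small Python sketch of grabbing group 1 with the `[a-zA-Z0-9]+` variant, which follows the stated constraint (letters and digits only) more closely than `\w`, since `\w` also accepts an underscore. The function name `trim_to_change_pack` is made up for this example.

```python
import re

# Lazily consume everything up to "ChangePack-", then the alphanumeric id.
CHANGE_PACK_RE = re.compile(r"^(.*?ChangePack-[a-zA-Z0-9]+)")


def trim_to_change_pack(path):
    """Return the path up to and including the change pack folder, or None."""
    match = CHANGE_PACK_RE.match(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(trim_to_change_pack(
        "/Repository/Framework/PITA/branches/ChangePack-6a7B6/core/src/"
        "Pita.x86.Interfaces/IDemoReader.cs"
    ))
```

The id stops at the first character that is not a letter or digit, which in these paths is the next "/".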
Will you always have '/Repository/Framework/PITA/branches/' at the beginning? If so, this will do the trick:
/Repository/Framework/PITA/branches/\w+-\w*
Instead of regex you could use split and join functions. Example in Python:
path = "/a/b/c/d/e"
folders = path.split("/")
newpath = "/".join(folders[:3]) #trims off everything from the third folder over
print(newpath) #prints "/a/b"
If you really want regex, try something like ^.*\/folder\/ where folder is the name of the directory you want to match.
I need help with a regular expression that matches the four file names below:
9369.PCTYYYYMMDD.txt
9370.PCTYYYYMMDD.txt
9369.s369YYMMDDd-0008-pct.txt
9370.s370YYMMDDd-0008-pct.txt
I have worked out this: ^93(69|70).(s369|s370|pct).*\.txt$
but it also matches the file names below, which it should not:
9369.s369YYMMDDd-0023-pct.txt
9370.s370YYMMDDd-0023-pct.txt
Please help me.
Thanks in advance.
Remove s370 from the second capturing group and don't forget to turn on the case insensitive modifier i.
^93(69|70)\.(s369|pct).*?\.txt$
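Note that any pattern ending in `.*?\.txt$` still accepts the `-0023-` names, because `.*?` allows arbitrary characters before `.txt`. A stricter Python sketch pins the `-0008-` segment instead. It assumes YYYYMMDD stands for eight digits and YYMMDDd for six digits followed by a literal "d"; the question does not confirm either, and `FILE_RE` and `is_wanted` are hypothetical names.

```python
import re

# \1 repeats the 69/70 captured at the start, so 9369 pairs with s369
# and 9370 pairs with s370.
FILE_RE = re.compile(r"^93(69|70)\.(?:PCT\d{8}|s3\1\d{6}d-0008-pct)\.txt$")


def is_wanted(filename):
    """True for the four wanted name shapes, False for everything else."""
    return FILE_RE.match(filename) is not None


if __name__ == "__main__":
    for name in ("9369.PCT20190521.txt",
                 "9370.s370190521d-0008-pct.txt",
                 "9370.s370190521d-0023-pct.txt"):
        print(name, is_wanted(name))
```

The pattern is case-sensitive as written, matching the upper-case PCT and lower-case pct shown in the question.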
I have the following date string - "2013-02-20T17:24:33Z"
I want to write a regex to extract just the date part "2013-02-20". How do I do that? Any help will be appreciated.
Thanks,
Murtaza
You could use a capture group for this.
/(\d{4}-\d{2}-\d{1,2}).*/
Using $1, you can get your desired part.
The straightforward approach would be \d\d\d\d-\d\d-\d\d, but you can also use quantifiers to make it more readable: \d{4}-\d{2}-\d{2}.
Just search for the first T and use substring. I assume you always get a well-formatted date string.
If the date string is not guaranteed to be valid, you can use any date related library to parse and validate the input (validation includes the calendar logic, which regex fails to achieve), and reformat the output.
No sample code, since you didn't mention the language.
Using substring:
string date = "2013-02-20T17:24:33Z";
string h = date.Substring(0, 10);
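Since the question does not name a language, here is one possible Python sketch that combines two of the suggestions above: the regex finds the date-shaped part, and the standard `datetime` module checks the calendar logic that a regex cannot. The function name `extract_date` is made up for this example.

```python
import re
from datetime import datetime

# Four digits, two digits, two digits, separated by hyphens.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def extract_date(timestamp):
    """Return the YYYY-MM-DD part if present and a real calendar date, else None."""
    match = DATE_RE.search(timestamp)
    if match is None:
        return None
    try:
        # strptime raises ValueError for impossible dates such as 2013-02-30.
        datetime.strptime(match.group(0), "%Y-%m-%d")
    except ValueError:
        return None
    return match.group(0)


if __name__ == "__main__":
    print(extract_date("2013-02-20T17:24:33Z"))
```

If the input is always well-formed, the substring approach is simpler; the validation step only pays off when the strings come from an untrusted source.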
I am trying to extract dates from a text variable.
I have created a regex which extracts 'MOST' formats of date as follows:
$regexp = '#[0-9]{2,4}[-\/ ]{1}([A-Za-z]{3}|[0-9]{2})[-\/ ]{1}[0-9]{2,4}#';
preg_match_all($regexp, $output, $dates);
However, it does not extract dates of the format '08 Aug 2012', and I do not know why. As far as I can tell, it should.
For now I have inserted a separate regex which works:
$regexp = '#[0-9]{2}[ ]{1}[A-Za-z]{3}[ ]{1}[0-9]{4}#';
preg_match_all($regexp, $output, $dates);
which is essentially the same.
However, it seems pointless to have multiple regexes when I should need only one.
If anyone could tell me why the first regex isn't working for this format, it would be greatly appreciated.
Thanks
Well, your regexp is correct for the date format you presented. And as such it also works without problems: http://ideone.com/XxdKV
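The same check can be reproduced in Python as a rough cross-test. This sketch ports the first pattern almost verbatim; the only changes are a non-capturing group and dropping the redundant `{1}`, and `find_dates` is a made-up helper name.

```python
import re

# Port of the first PHP pattern: day-or-year, separator, month (name or number),
# separator, day-or-year.
DATE_RE = re.compile(r"[0-9]{2,4}[-/ ](?:[A-Za-z]{3}|[0-9]{2})[-/ ][0-9]{2,4}")


def find_dates(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


if __name__ == "__main__":
    print(find_dates("Due 08 Aug 2012 and 2012-08-09."))
```

If the pattern still fails on the real input, the cause may be in the text itself, for example a non-breaking space or an HTML entity where a plain space is expected, though that is a guess and not something the question confirms.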