Regex validation of filename failing - regex

I'm trying to validate a filename having letters "CAT" or "DOG" followed by 8 numerics, and ending in ".TXT".
Examples:
CAT20000101.TXT
DOG20031212.TXT
This would NOT match:
ATA12330000.TXT
CAT200T0101.TXT
DOG20031212.TX1
Here's the regex I am trying to make work:
(([A-Z]{3})([0-9]{8})([\.TXT]))\w+
Why is the last section (.TXT) failing against non-matching file extensions?
See example: http://regexr.com/3a7fo

Inside a character class there is no grouping, so [\.TXT] matches a single character (., T, or X) rather than the literal string .TXT.
You can use this regex:
^[A-Z]{3}[0-9]{8}\.TXT$
For only matching CAT and DOG use:
^(CAT|DOG)[0-9]{8}\.TXT$
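As a quick check, the anchored pattern can be run against the filenames from the question. This is a minimal Python sketch; the helper name is_valid_filename is made up for illustration and is not part of the original answer.

```python
import re

# Anchored pattern from the answer above: CAT or DOG, eight digits, then ".TXT".
FILENAME_RE = re.compile(r"^(CAT|DOG)[0-9]{8}\.TXT$")

def is_valid_filename(name: str) -> bool:
    """Return True if name is CAT or DOG, followed by 8 digits and .TXT."""
    return FILENAME_RE.match(name) is not None

# Sample names taken from the question: the first two should pass,
# the last three should be rejected.
for name in ["CAT20000101.TXT", "DOG20031212.TXT",
             "ATA12330000.TXT", "CAT200T0101.TXT", "DOG20031212.TX1"]:
    print(name, is_valid_filename(name))
```

The ^ and $ anchors are what stop a valid-looking prefix from being accepted inside a longer, invalid name.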

lose the unnecessary parentheses
[A-Z]{3}[0-9]{8}[\.TXT]\w+
lose the unnecessary/pattern-breaking character class [] around \.TXT
[A-Z]{3}[0-9]{8}\.TXT\w+
lose the \w+ at the end
[A-Z]{3}[0-9]{8}\.TXT
change [A-Z]{3} to (?:CAT|DOG).
(?:CAT|DOG)[0-9]{8}\.TXT
voilà.

It's failing because \.TXT is in square brackets, which makes it a character class that matches only one of the listed characters (., T, or X). Just use (\.TXT).
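To see the effect concretely, here is a small Python sketch that runs the question's original pattern against one of the names that should be rejected. The variable names are only for illustration.

```python
import re

# Pattern from the question: [\.TXT] is a character class, so it consumes
# exactly one character out of ".", "T" and "X".
original = re.compile(r"(([A-Z]{3})([0-9]{8})([\.TXT]))\w+")
# Literal extension instead of the class, with anchors added.
fixed = re.compile(r"^[A-Z]{3}[0-9]{8}\.TXT$")

bad_name = "DOG20031212.TX1"
m = original.search(bad_name)
print(repr(m.group(4)))        # the single character taken by [\.TXT]
print(repr(m.group(0)))        # the whole, unwanted match
print(fixed.search(bad_name))  # None: the fixed pattern rejects the name
```

The class takes only the dot, and the trailing \w+ then swallows TX1, which is why the bad extension slips through.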

Remove the square brackets around [\.TXT], leaving \.TXT.
Your example modified http://regexr.com/3a7fu

Regex: matching up to first occurrence before special characters (|,-,/...)

I have product IDs on a sheet, each in two parts separated by special characters.
I have several patterns and can't find a solution that works for all of them. I would like to keep only the text before the "-" or "|"; spaces can be anywhere.
aaa23-rerez3
dfds12|gdflk 132
ds123 fdsf-123 gad
sa 123,fdsg 123
I found this regex:
.*\w
It works for some patterns but not for the pipe | and -.
Many thanks for your help.
To match only the text before the | or -, you can use the anchor ^ to assert the start of the string, followed by a negated character class that matches any character except those listed in the class.
^[^|-]+
If the spaces can be anywhere and you also want to match those along with only word characters:
^\s*(?:\w+\s*)+
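Both patterns can be compared on the sample strings from the question. This is a minimal Python sketch; the variable names are chosen here for illustration only.

```python
import re

samples = ["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"]

# Everything from the start of the string up to the first | or -.
before_separator = re.compile(r"^[^|-]+")
# Leading run of word characters and whitespace only.
words_and_spaces = re.compile(r"^\s*(?:\w+\s*)+")

first = [before_separator.match(s).group(0) for s in samples]
second = [words_and_spaces.match(s).group(0) for s in samples]

for s, a, b in zip(samples, first, second):
    print(repr(s), "->", repr(a), "|", repr(b))
```

The two patterns only disagree on the last sample: the negated class keeps the comma and everything after it because a comma is not in the excluded set, while the word-and-space pattern stops at the comma.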
I hope the following regular expression works for you. I tested it and it worked for all your patterns.
^([^-\|\s]+)(?=[-\|\s].*$)
Allow spaces, but separate if special character found.
["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"].forEach(x => console.log(x, x.split(/[^\d\w\s]/g)))
Separates space also.
["aaa23-rerez3", "dfds12|gdflk 132", "ds123 fdsf-123 gad", "sa 123,fdsg 123"].forEach(x => console.log(x, x.split(/\W/g)))

How to find multiple dots with a space character before the first dot using regex

I used this regex to find strings with multiple dots, but it does not work if the string contains whitespace before the first dot:
^.[\S]+\.[\S]+\.(.*)$
I expect the regex to find this value:
adajda9a b0a09.haa.ajada
teast.php.tasd
madnadak.ajada.a.jjhjhh
adjahdja.dfajha.ada.adjahdaj..jajjjjjhjha....dahhhhhbbja...
madkaja.adhakjda.sjjj
sadada.asdaa.jadfajk jadajda ajdhajda ada- 0(i09d0a9 )_) aciai
aadhadka.adad.akdjajdka0sd009999a.o999
adajda9a b0a09.haa.ajada
If you just want to match strings that have at least two dots, then why not just use this:
^.*\..*\..*$
You could also write this using a lookahead:
^(?=.*\..*\.).*$
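A short Python sketch shows that the two forms accept the same strings. The string "only.one" is a made-up counterexample, not part of the question's data.

```python
import re

# At least two dots anywhere in the string, written two ways.
two_dots = re.compile(r"^.*\..*\..*$")
two_dots_lookahead = re.compile(r"^(?=.*\..*\.).*$")

lines = ["teast.php.tasd", "adajda9a b0a09.haa.ajada", "only.one"]
results = [(bool(two_dots.match(s)), bool(two_dots_lookahead.match(s))) for s in lines]

for s, r in zip(lines, results):
    print(repr(s), r)
```

Neither pattern cares about spaces, so both accept lines with or without a space before the first dot.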
I have created a regex that will match strings that have multiple dots in them and where there is only one space before the dots appear.
^[^.\s]* [^\s]*(?:\..*\..*)+$
Demo: https://regex101.com/r/UQksQK/4/
If you want to allow several spaces before the dots, use
^[^\.\s]* +.*(?:\..*\..*)+$
This will also match:
adajda9a b0a09.haa.ajada.123
If you want to forbid the space character between the dots, change the regex into:
^[^.\s]* +[^\s]*(?:\.[^\s]*\.[^\s]*)+$
It will not match strings like (where you have spaces between the dots):
adajda9a b0a09.ha a.ajada.123
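The differences between the three variants are easier to see side by side. The following Python sketch uses the strings quoted in this answer plus one line from the question; the names one_space, several_spaces and no_space_between_dots are labels invented here.

```python
import re

one_space = re.compile(r"^[^.\s]* [^\s]*(?:\..*\..*)+$")
several_spaces = re.compile(r"^[^\.\s]* +.*(?:\..*\..*)+$")
no_space_between_dots = re.compile(r"^[^.\s]* +[^\s]*(?:\.[^\s]*\.[^\s]*)+$")

s1 = "adajda9a b0a09.haa.ajada"
s2 = "adajda9a b0a09.haa.ajada.123"
s3 = "adajda9a b0a09.ha a.ajada.123"   # space between the dots
s4 = "teast.php.tasd"                  # no space before the first dot

for s in (s1, s2, s3, s4):
    print(repr(s),
          bool(one_space.match(s)),
          bool(several_spaces.match(s)),
          bool(no_space_between_dots.match(s)))
```

Only the last variant rejects s3, because [^\s]* cannot step over the space that sits between the dots.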
Per the comment, to match only lines that have a space before the first of multiple dots:
^[^\.]* .*\..*\..*$
Test:
$ cat test.regexp
teast.php.tasd
madnadak.ajada.a.jjhjhh
adjahdja.dfajha.ada.adjahdaj..jajjjjjhjha....dahhhhhbbja...
madkaja.adhakjda.sjjj
sadada.asdaa.jadfajk jadajda ajdhajda ada- 0(i09d0a9 )_) aciai
aadhadka.adad.akdjajdka0sd009999a.o999
adajda9a b0a09.haa.ajada
$ egrep "^[^\.]* .*\..*\..*$" test.regexp
adajda9a b0a09.haa.ajada
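The same check can be reproduced without egrep. This Python sketch applies the pattern to the lines of test.regexp shown above; it assumes Python's re module treats this particular pattern the same way egrep does, which holds for the constructs used here.

```python
import re

# Space somewhere before the first dot, then at least two dots.
pattern = re.compile(r"^[^\.]* .*\..*\..*$")

lines = [
    "teast.php.tasd",
    "madnadak.ajada.a.jjhjhh",
    "adjahdja.dfajha.ada.adjahdaj..jajjjjjhjha....dahhhhhbbja...",
    "madkaja.adhakjda.sjjj",
    "sadada.asdaa.jadfajk jadajda ajdhajda ada- 0(i09d0a9 )_) aciai",
    "aadhadka.adad.akdjajdka0sd009999a.o999",
    "adajda9a b0a09.haa.ajada",
]

matches = [line for line in lines if pattern.search(line)]
print(matches)
```

The fifth line contains spaces, but only after its first dot, so [^\.]* can never reach one of them and the line is skipped.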

How to match either a subset (preferred), or the whole line in a regex?

I have a string that looks something like this:
"Element 1 | Element 2| Element 3: element 4"
I want to substring the portion of the source string that follows the colon (to the end of the source string), but if there is no colon, then I want to grab the whole string.
What I've tried so far are variations around this:
:.*|.*
:?.*
etc.
However, while they'll match if either the colon is present or not, they don't prefer the substring when the colon is found.
I've been playing with this on http://regexpal.com.
Ultimately, this will be used in a CMDB tool for matching CIs - so a general solution would be ideal, rather than language- or engine-specific.
You can use the following:
(:.*|[^:]*)$
Explanation:
if there is no colon, then I want to grab the whole string
This condition can be expressed with a negated character class that excludes the colon.
You can use:
(?:^|:)[^:\n]*$
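Both suggestions can be tried on a string with and without a colon. The sketch below uses Python only as a convenient test bench; the helper tail and the third pattern, which adds a capture group to drop the colon and the whitespace after it, are additions made here and not part of either answer.

```python
import re

with_colon = "Element 1 | Element 2| Element 3: element 4"
without_colon = "Element 1 | Element 2| Element 3"

def tail(pattern: str, text: str, group: int = 0) -> str:
    """Return the requested group of the first match of pattern in text."""
    return re.search(pattern, text).group(group)

for pattern in (r"(:.*|[^:]*)$", r"(?:^|:)[^:\n]*$"):
    print(repr(tail(pattern, with_colon)), repr(tail(pattern, without_colon)))

# Hypothetical variant: capture only the text after the colon.
stripped = r"(?:^|:)\s*([^:\n]*)$"
print(repr(tail(stripped, with_colon, 1)), repr(tail(stripped, without_colon, 1)))
```

Note that both original patterns include the colon itself in the match, so a capture group or a post-processing step is needed if only the text after it is wanted.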

VIM - Replace based on a search regex

I've got a file with several (1000+) records like :
lbc3.*'
ssa2.*'
lie1.*'
sld0.*'
ssdasd.*'
I can find them all by :
/s[w|l].*[0-9].*$
What I want to do is replace the final part of each match with \.*'
I can't do :%s//s[w|l].*[0-9].*$/\\\\\.\*' because it'll replace all the string, and what i need is only replace the end of it from
.'
to
\.'
So the file output looks like:
lbc3\\.*'
ssa2\\.*'
lie1\\.*'
sld0\\.*'
ssdasd\\.*'
Thanks.
In general, the solution is to use a capture. Put \(...\) around the part of the regex that matches what you want to keep, and use \1 to include whatever matched that part of the regex in the replacement string:
s/\(s[w|l].*[0-9].*\)\.\*'$/\1\\.*'/
Since you're really just inserting a backslash between two strings that you aren't changing, you could use a second set of parens and \2 for the second one:
s/\(s[w|l].*[0-9].*\)\(\.\*'\)$/\1\\\2/
Alternatively, you could use \zs and \ze to delimit just the part of the string you want to replace:
s/s[w|l].*[0-9].*\zs\ze\.\*'$/\\/
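The capture-and-backreference idea is not specific to Vim. As a rough illustration, here is the two-group version sketched in Python rather than Vim script. The pattern ^(\w+)(\.\*')$ is a simplification chosen for this sketch, not the search pattern from the question.

```python
import re

records = ["lbc3.*'", "ssa2.*'", "sld0.*'"]

# Group 1 is the part to keep unchanged; group 2 is the trailing .*' that
# gets a single backslash inserted in front of it.
escaped = [re.sub(r"^(\w+)(\.\*')$", r"\1\\\2", rec) for rec in records]

for before, after in zip(records, escaped):
    print(before, "->", after)
```

In the replacement template, \1 and \2 put the captured text back and \\ stands for one literal backslash, which mirrors \1\\\2 in the Vim command.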

Regex find comma not inside quotes

I'm checking line by line in C#
Example data:
bob jones,123,55.6,,,"Hello , World",,0
jim neighbor,432,66.5,,,Andy "Blank,,1
john smith,555,77.4,,,Some value,,2
A regex that picks commas outside of quotes is the closest I've found, but it doesn't handle the second line.
Try the following regex:
(?!\B"[^"]*),(?![^"]*"\B)
Here is a demonstration:
It does not match the second line because the " you inserted does not have a closing quotation mark.
It will not match values like so: ,r"a string",10 because the letter on the edge of the " will create a word boundary, rather than a non-word boundary.
Alternative version
(".*?,.*?"|.*?(?:,|$))
This will match the content and the commas and is compatible with values that are full of punctuation marks
The regexes below parse each field in a line, not an entire line.
Apply the methodical and desperate regex technique: Divide and conquer
Case: field does not contain a quote
abc,
abc(end of line)
[^,"]*(,|$)
Case: field contains exactly two quotes
abc"abc,"abc,
abc"abc,"abc(end of line)
[^,"]*"[^"]*"[^,"]*(,|$)
Case: field contains exactly one quote
abc"abc(end of line)
abc"abc, (and that there's no quote before the end of this line)
[^,"]*"[^,"]*$
[^,"]*"[^,"]*,(?!.*")
Now that we have all the cases, we then '|' everything together and enjoy the resultant monstrosity.
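One way the combined pattern could be used is sketched below in Python (the question is about C#, so treat this purely as an illustration of the idea). It assumes the one-quote case allows any number of non-comma, non-quote characters after the quote, and it tries the most specific case first. The names FIELD and split_fields are invented for this sketch.

```python
import re

TWO_QUOTES = r'[^,"]*"[^"]*"[^,"]*(?:,|$)'    # field contains exactly two quotes
ONE_QUOTE = r'[^,"]*"[^,"]*(?:,(?!.*")|$)'    # one quote, no later quote on the line
NO_QUOTE = r'[^,"]*(?:,|$)'                   # field contains no quote

# "|" everything together, most specific case first.
FIELD = re.compile("|".join([TWO_QUOTES, ONE_QUOTE, NO_QUOTE]))

def split_fields(line):
    """Consume the line one field at a time and return the list of fields."""
    fields, pos = [], 0
    while True:
        m = FIELD.match(line, pos)
        if m is None:
            raise ValueError(f"unparseable field at offset {pos}")
        text = m.group(0)
        if not text.endswith(","):   # matched up to end of line: last field
            fields.append(text)
            return fields
        fields.append(text[:-1])     # drop the separating comma
        pos = m.end()

for row in ['bob jones,123,55.6,,,"Hello , World",,0',
            'jim neighbor,432,66.5,,,Andy "Blank,,1',
            'john smith,555,77.4,,,Some value,,2']:
    print(split_fields(row))
```

Fields that fit none of the three cases, for example one with three quotes, raise an error here instead of being silently mis-split.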
The best answer written by Vasili Syrakis does not work with negative numbers inside quotation marks such as:
bob jones,123,"-55.6",,,"Hello , World",,0
jim neighbor,432,66.5
Following regex works for this purpose:
,(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))
But I was not successful with this part of input:
,Andy "Blank,
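For illustration, this is how the pattern behaves when used as a split delimiter. The sketch is in Python rather than C#; the pattern uses only lookaheads, but whether it behaves identically in another engine is an assumption to verify. The variable names are made up.

```python
import re

line = 'bob jones,123,"-55.6",,,"Hello , World",,0'

# A comma counts as a separator unless an odd number of double quotes
# follows it, which would mean the comma sits inside a quoted field.
outside_quotes = re.compile(r',(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))')

fields = outside_quotes.split(line)
print(fields)
```

Because the test only counts the quotes that follow each comma, a single unbalanced quote such as the one in Andy "Blank changes the parity for every comma before it, which is consistent with the failure reported above.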
Try this pattern (it relies on the PCRE verbs (*SKIP) and (*FAIL)):
".*?"(*SKIP)(*FAIL)|,
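import re
line = 'bob jones,123,55.6,,,"Hello , World",,0'  # sample row from the question
# Delete commas followed by an odd number of quotes, i.e. commas inside a quoted field.
print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", line))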