Assistance with a regular expression - regex

I am not good with regular expressions, and I could use some help with a couple of expressions I am working on. I have a line of text, such as Text here then 999-99 and I'd like to isolate that number sequence at the end. It could be either 999-99 or 999-99-9. The following seems to work:
\d{3}-\d{2}(-\d{1})?
But I notice that it really just seems to be searching anywhere within the text, as I can add text after the number sequence and it still matches. This needs to be more strict, so that the line must end with this exact sequence, and nothing after it. I tried ending with $ instead of ?, but that never seems to create a match (it always returns false).
I could also use some help with character replacement. I am working on a program which deals with OCR scanning, and occasionally the string value that comes back contains undisplayable characters, represented by the ܀ symbol. Is there a regular expression which will replace the ܀ characters with a space?

Try this regular expression.
([\d-]+)$

This should work. Just end your regex with $, which represents the end of the line:
\d{3}-\d{2}(-\d{1})?$

Use the word-boundary metacharacter, \b:
\b\d{3}-\d{2}(-\d)?\b
You can also remove the {1} from the last \d since it's redundant.
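For reference, here is a minimal Python sketch of the end-of-line anchoring discussed above. Python and the helper names are assumptions, since the question does not name a language, and the printable-ASCII class used for the OCR cleanup is only one guess at what counts as "undisplayable".

```python
import re

# Anchored with $ so that nothing may follow the number sequence.
TRAILING_CODE = re.compile(r"\d{3}-\d{2}(?:-\d)?$")


def trailing_code(line):
    """Return the 999-99 or 999-99-9 sequence that ends the line, else None."""
    m = TRAILING_CODE.search(line)
    return m.group(0) if m else None


def blank_unprintable(text):
    """Replace every character outside printable ASCII with a space.

    Assumed policy: anything other than U+0020..U+007E is treated as
    undisplayable, which also covers tabs and newlines.
    """
    return re.sub(r"[^\x20-\x7E]", " ", text)


print(trailing_code("Text here then 999-99-9"))
print(trailing_code("Text here then 999-99 and more"))
```

Note that a version using only \b would still match a sequence in the middle of the line; it is the $ that forces the match to sit at the end.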

Related

regex to match strings not ending with a pattern?

I am trying to form a regular expression that will match strings that do NOT end with a DOT FOLLOWED BY A NUMBER.
eg.
abcd1
abcdf12
abcdf124
abcd1.0
abcd1.134
abcdf12.13
abcdf124.2
abcdf124.21
I want to match first three.
I tried modifying this post but it didn't work for me as the number may have variable length.
Can someone help?
You can use something like this:
^((?!\.[\d]+)[\w.])+$
It anchors at the start and end of a line. It basically says:
Anchor at the start of the line
DO NOT match the pattern .NUMBERS
Take every letter, digit, etc, unless we hit the pattern above
Anchor at the end of the line
So, this pattern matches this (no dot then number):
This.Is.Your.Pattern or This.Is.Your.Pattern2012
However it won't match this (dot before the number):
This.Is.Your.Pattern.2012
EDIT: In response to Wiseguy's comment, you can use this:
^((?!\.[\d]+$)[\w.])+$ - which adds an end-of-line anchor after the number inside the look-ahead. Therefore, a string is rejected only when it ends with a dot followed by nothing but digits... not that you specified that in your question.
If you can relax your restrictions a bit, you may try using this (extended) regular expression:
^[^.]*\.?[^0-9]*$
You may omit the anchoring metasymbols ^ and $ if you're using a function or tool that matches against the whole string.
Explanation: This regex allows any symbols except a dot until an (optional) dot is found, after which all non-numerical symbols are allowed. It won't work for numbers in an improper format, as in the strings abcd1...3 or abcd1.fdfd2. It also won't work correctly for some strings with multiple dots, like abcd.ab123cd.a (the problem description is a bit ambiguous).
Philosophical explanation: When using regular expressions, you often don't need to do exactly what your task seems to require, so even a simple regex will do the job. An abstract example: you have a file whose lines are either numbers or some complicated names (without digits), and, say, you want to filter out all the numbers; then simple filtering by [^0-9] - grep '^[0-9]' will do the job.
But if your task is more complex and requires validating the format and doing other fancy stuff with the data, why not use a simple script (say, in awk, python, perl or another language)? Or a short "hand-written" function, if you're implementing a stand-alone application. Regexes are cool, but they are often not the right tool to use.
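I would just use a simple negative look-behind anchored at the end (the backslashes are presumably doubled because the pattern is written as a string literal; note that not every regex engine accepts a variable-length look-behind):
.*(?<!\\.\\d+)$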
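As a quick check, the look-ahead pattern from the edited answer can be run against the sample strings from the question. This is a sketch in Python, which is an assumption since the question names no language; the helper name keep is hypothetical. Python's built-in re module does not accept a variable-width look-behind, so only the look-ahead variant is shown.

```python
import re

# Look-ahead variant from the edited answer: a position is refused only
# when a dot followed by nothing but digits runs to the end of the string.
NOT_DOT_NUMBER = re.compile(r"^((?!\.\d+$)[\w.])+$")

samples = [
    "abcd1", "abcdf12", "abcdf124",
    "abcd1.0", "abcd1.134", "abcdf12.13", "abcdf124.2", "abcdf124.21",
]


def keep(strings):
    """Return the strings that do not end with a dot followed by digits."""
    return [s for s in strings if NOT_DOT_NUMBER.match(s)]


print(keep(samples))
```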

Password validation regex

I am trying to get one regular expression that does the following:
makes sure there are no white-space characters
minimum length of 8
makes sure there is at least:
one non-alpha character
one upper case character
one lower case character
I found this regular expression:
((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!\s).{8,})
which takes care of points 2 and 3 above, but how do I add the first requirement to the above regex?
I know I can use two expressions, the one above and then
\s
but I'd like to have it all in one. I tried doing something like ?!\s but I couldn't get it to work. Any ideas?
^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$
should do. Be aware, though, that you're only validating ASCII letters. Is Ä not a letter for your requirements?
\S means "any character except whitespace", so by using this instead of the dot, and by anchoring the regex at the start and end of the string, we make sure that the string doesn't contain any whitespace.
I also removed the unnecessary parentheses around the entire expression.
Tim's answer works well, and is a good reminder that there are many ways to solve the same problem with regexes, but you were on the right track to finding a solution yourself. If you had changed (?!\s) to (?!.*\s) and added the ^ and $ anchors to the beginning and end respectively, it would work.
^((?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])(?!.*\s).{8,})$
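A small Python sketch of the first pattern above, to show which inputs pass. Python and the function name is_valid_password are assumptions, as the question does not say which language is in use. re.fullmatch is used so that a trailing newline cannot slip past the $ anchor.

```python
import re

# Three look-aheads check for a non-letter, a lowercase letter and an
# uppercase letter; \S{8,} then requires 8+ characters with no whitespace.
PASSWORD = re.compile(r"^(?=.*[^a-zA-Z])(?=.*[a-z])(?=.*[A-Z])\S{8,}$")


def is_valid_password(candidate):
    """Return True if the candidate satisfies all rules from the question."""
    return PASSWORD.fullmatch(candidate) is not None


for candidate in ["Passw0rd", "password1", "PASSWORD1", "Pass w0rd", "Pw0rd"]:
    print(candidate, is_valid_password(candidate))
```

As the answer notes, only ASCII letters count as upper or lower case here.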

Regular expression question

I have some text like this:
dagGeneralCodes$_ctl1$_ctl0
Some text
dagGeneralCodes$_ctl2$_ctl0
Some text
dagGeneralCodes$_ctl3$_ctl0
Some text
dagGeneralCodes$_ctl4$_ctl0
Some text
I want to create a regular expression that extracts the last occurrence of dagGeneralCodes$_ctl[number]$_ctl0 from the text above.
the result should be: dagGeneralCodes$_ctl4$_ctl0
Thanks in advance
Wael
This should do it:
.*(dagGeneralCodes\$_ctl\d+\$_ctl0)
The .* at the front is greedy so initially it will grab the entire input string. It will then backtrack until it finds the last occurrence of the text you want.
Alternatively you can just find all the matches and keep the last one, which is what I'd suggest.
Also, the specific advice will probably depend on what language you're doing this in. In Java, for example, you will need to use DOTALL mode so that . matches newlines, because ordinarily it doesn't. Other languages call this single-line mode (Ruby calls it multiline mode). JavaScript has a slightly different workaround for this, and so on.
You can use:
[\d\D]*(dagGeneralCodes\$_ctl\d+\$_ctl0)
I'm using [\d\D] instead of . to make it match new-line as well. The * is used in a greedy way so that it will consume all but the last occurrence of dagGeneralCodes$_ctl[number]$_ctl0.
I really like using this Regular Expression Cheatsheet; it's free, a single page, and printed, fits on my cube wall.
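Both approaches from the answers (a greedy prefix, or collecting every match and keeping the last) can be sketched in Python. The language and the function names are assumptions; the sample text is the one from the question.

```python
import re

TEXT = """dagGeneralCodes$_ctl1$_ctl0
Some text
dagGeneralCodes$_ctl2$_ctl0
Some text
dagGeneralCodes$_ctl3$_ctl0
Some text
dagGeneralCodes$_ctl4$_ctl0
Some text"""

CONTROL_ID = r"dagGeneralCodes\$_ctl\d+\$_ctl0"


def last_by_findall(text):
    """Collect every occurrence and keep the last one (None if there are none)."""
    matches = re.findall(CONTROL_ID, text)
    return matches[-1] if matches else None


def last_by_greedy_prefix(text):
    """Let a greedy .* swallow the input, then backtrack to the last occurrence.

    re.DOTALL is needed so that . also matches newlines.
    """
    m = re.search(r".*(" + CONTROL_ID + r")", text, re.DOTALL)
    return m.group(1) if m else None


print(last_by_findall(TEXT))
print(last_by_greedy_prefix(TEXT))
```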

Regular Expression to List accepted words

I need a regular expression to list accepted Version Numbers. ie. Say I wanted to accept "V1.00" and "V1.02". I've tried this "(V1.00)|(V1.01)" which almost works but then if I input "V1.002" (Which is likely due to the weird version numbers I am working with) I still get a match. I need to match the exact strings.
Can anyone help?
The reason you're getting a match on "V1.002" is because it is seeing the substring "V1.00", which is part of your regex. You need to specify that there is nothing more to match. So, you could do this:
^(V1\.00|V1\.01)$
A more compact way of getting the same result would be:
^(V1\.0[01])$
Do this:
^(V1\.00|V1\.01)$
(. needs to be escaped, ^ means the match must start at the beginning of the text, and $ means it must end at the end of the text)
I would use the '^' and '$' to mark the beginning and end of the string, like this:
^(V1\.00|V1\.01)$
That way the entire string must match the regex.
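To illustrate the difference the anchors make, here is a short Python sketch; the language and the name is_accepted_version are assumptions. re.fullmatch plays the role of ^ and $ by requiring the whole string to match.

```python
import re

# Compact form from the first answer; the dot is escaped so that it only
# matches a literal ".".
ACCEPTED = re.compile(r"V1\.0[01]")


def is_accepted_version(version):
    """Return True only when the whole string is V1.00 or V1.01."""
    return ACCEPTED.fullmatch(version) is not None


# The unanchored pattern from the question finds V1.00 inside V1.002.
print(re.search(r"(V1.00)|(V1.01)", "V1.002") is not None)
print(is_accepted_version("V1.002"))
```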

concatenate multiple regexes into one regex

For a text file, I want to match the lines that start with "BEAM" or "FILE PATH". I would have used
^BEAM.*$
^FILE PATH.*$
if I were to match them separately. But now I have to concatenate those two matching patterns into one pattern.
Any idea on how to do this?
A pipe/bar character generally represents "or" with regexps. You could try:
^(BEAM|FILE PATH).*$
The accepted answer is right but you may have redundancy in your Regular Expression.
^ means match the start of a line
(BEAM|FILE PATH) - means the string "BEAM" or the string "FILE PATH"
.* means anything at all
$ means match the end of the line
So in effect, all you are saying is match my strings at the beginning of the line since you don't care what's at the end. You could do this with:
^(BEAM|FILE PATH)
There are two cases where this reduction wouldn't be valid:
If you are doing something with the matched string, so you want to match the whole line to pass the data to something else.
You're using a Regular Expression function that wants to match a whole string rather than part of it. You can sometimes solve this by picking a different Regular Expression function or method. For example in Python use search instead of match.
If the above post doesn't work, try escaping the () and | in different ways until you find one that works. Some regex engines treat these characters differently (special vs. non-special characters), especially if you are running the match in a shell (shell will look for special characters too):
^\(BEAM|FILE PATH\).*$
%\(BEAM\|FILE PATH\).*$
etc.
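For a concrete case, the combined pattern can be tried in Python on a small made-up input. The sample text and the function name matching_lines are hypothetical, and Python itself is an assumption since the question does not name a tool. re.MULTILINE makes ^ and $ apply to each line rather than to the whole text.

```python
import re

# Alternation combines the two line-start patterns into one; the group is
# non-capturing so that findall returns whole lines.
LINE_PATTERN = re.compile(r"^(?:BEAM|FILE PATH).*$", re.MULTILINE)


def matching_lines(text):
    """Return every line that starts with BEAM or FILE PATH."""
    return LINE_PATTERN.findall(text)


sample = "BEAM 1\nOTHER LINE\nFILE PATH /tmp/x\n  BEAM indented"
print(matching_lines(sample))
```

The trailing .*$ is kept here because the whole line is wanted; to merely test whether a line starts with one of the two strings, the shorter ^(BEAM|FILE PATH) is enough.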