Concatenate multiple regexes into one regex

For a text file, I want to match lines that start with "BEAM" or "FILE PATH". I would have used
^BEAM.*$
^FILE PATH.*$
if I were to match them separately. But now I have to concatenate those two matching patterns into one pattern.
Any idea on how to do this?

A pipe/bar character generally represents "or" with regexps. You could try:
^(BEAM|FILE PATH).*$

The accepted answer is right but you may have redundancy in your Regular Expression.
^ means match the start of a line
(BEAM|FILE PATH) - means the string "BEAM" or the string "FILE PATH"
.* means anything at all
$ means match the end of the line
So in effect, all you are saying is match my strings at the beginning of the line since you don't care what's at the end. You could do this with:
^(BEAM|FILE PATH)
There are two cases where this reduction wouldn't be valid:
If you are doing something with the matched string, so you want to match the whole line to pass the data to something else.
You're using a Regular Expression function that wants to match a whole string rather than part of it. You can sometimes solve this by picking a different Regular Expression function or method. For example, in Python use re.search instead of re.fullmatch.
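As a rough illustration of the points above, here is a minimal Python sketch. The sample text and the variable names are invented for the example; only the two patterns come from the answers. It assumes Python's re module, where re.fullmatch stands in for "a function that wants to match a whole string".

```python
import re

# Hypothetical sample input; these lines are invented for illustration.
text = """BEAM 12 steel
NOTE ignore me
FILE PATH /tmp/model.dat
  BEAM indented, so not at line start"""

# Short form: only tests whether a line starts with one of the keywords.
starts = re.compile(r"^(BEAM|FILE PATH)")
# Long form: also consumes the rest of the line, so the match is the whole line.
whole = re.compile(r"^(BEAM|FILE PATH).*$", re.MULTILINE)

matching_lines = [line for line in text.splitlines() if starts.search(line)]
whole_line_matches = [m.group(0) for m in whole.finditer(text)]

# A whole-string function rejects the short form but accepts the long one.
short_full = re.fullmatch(r"^(BEAM|FILE PATH)", "BEAM 12 steel")
long_full = re.fullmatch(r"^(BEAM|FILE PATH).*$", "BEAM 12 steel")

print(matching_lines)
print(short_full is None, long_full is not None)
```

Both approaches select the same two lines here; the difference only matters when the matched text itself is passed on, or when the function insists on covering the whole string.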

If the above post doesn't work, try escaping the () and | in different ways until you find one that works. Some regex engines treat these characters differently (special vs. non-special characters), especially if you are running the match in a shell (shell will look for special characters too):
^\(BEAM|FILE PATH\).*$
^\(BEAM\|FILE PATH\).*$
etc.

Related

Extract specific string using regular expression

I want to extract a string only if it matches a specific pattern.
Example input:
13.10.0/
13.10.1/
13.10.2/
13.10.3/
13.10.4.2/
13.10.4.4/
13.10.4.5/
I'm using the regex [0-9]+.[0-9]+.[0-9] to extract only digit.digit.digit from a string when it matches,
but it gives me the wrong output:
13.10.0
13.10.1
13.10.2
13.10.3
13.10.4.2 (no need to match 13.10.4 in this string)
13.10.4.4 (no need to match 13.10.4 in this string)
13.10.4.5 (no need to match 13.10.4 in this string)
The output that I need:
13.10.0
13.10.1
13.10.2
13.10.3
It's hard to say without knowing how you're passing these strings in -- are they lines in a file? An array of strings in a programming language?
If you're searching a file using grep or a similar tool, it will give you all lines that match anywhere, even if only part of the line matches.
Normally, you'd deal with this using anchors to specify that the regex must start on the first character of the line and end on the last (e.g. ^[0-9]+\.[0-9]+\.[0-9]+$). ^ matches the start of the line, and $ matches the end. Note the escaped dots: an unescaped . matches any character, not just a literal dot.
In your case, you've got slashes at the end of all the lines, so the easiest fix is to match that final slash, with ^[0-9]+\.[0-9]+\.[0-9]+/.
You could also use lookahead or groups to match the slash without returning it -- but that depends a bit more on what tool you're running this regex in and how you're processing it.
If your strings are separated by whitespace (other than newlines), replacing ^ with (^|\s) (either the beginning of the string, or some whitespace character) may work -- but it will add a leading space to some of your results.
You may also need to set your regex tool to match multiple times in a line (e.g. the -o flag in grep). Again, it's hard to give useful advice about this without knowing what regular-expression tool you're using, or how you're processing the results.
I think you want:
^\d+\.\d+\.\d+$
This is exactly 3 groups of digit(s) separated by (literal) dots.
Some tools (like grep) match all lines that contain your regex, and may have additional characters before/after.
Use the $ character to match the end of the line after your regex. (Also note that . matches any character, not a literal dot.)
[0-9]+\.[0-9]+\.[0-9]$
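To make the anchoring and escaping advice concrete, here is a small Python sketch. It assumes the strings arrive as a list of lines, each ending with a slash as in the question; the variable names are illustrative, and the capturing group is one possible way to drop the slash from the result.

```python
import re

# Input lines from the question, each ending with a slash.
lines = [
    "13.10.0/", "13.10.1/", "13.10.2/", "13.10.3/",
    "13.10.4.2/", "13.10.4.4/", "13.10.4.5/",
]

# Escaped dots match literal dots only. The group captures the version
# without the trailing slash, and the anchors reject four-part versions.
version = re.compile(r"^([0-9]+\.[0-9]+\.[0-9]+)/$")

three_part = []
for line in lines:
    m = version.match(line)
    if m:
        three_part.append(m.group(1))

print(three_part)
```

Because the pattern must cover the line from start to slash, 13.10.4.2/ no longer yields a partial 13.10.4 match.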

Regular expression to check strings containing a set of words separated by a delimiter

As the title says, I'm trying to build up a regular expression that can recognize strings with this format:
word!!cat!!DOG!! ... Phone!!home!!
where !! is used as a delimiter. Each word must have a length between 1 and 5 characters. Empty words are not allowed, i.e. no strings like !!,!!!! etc.
A word can only contain alphabetical characters between a and z (case insensitive). After each word I expect to find the special delimiter !!.
I came up with the solution below but since I need to add other controls (e.g. words can contain spaces) I would like to know if I'm on the right way.
(([a-zA-Z]{1,5})([!]{2}))+
Also note that empty strings are not allowed, hence the use of +
Help and advice are very welcome since I just started learning how to build regular expressions. I ran some tests using http://regexr.com/ and it seems to be okay, but I want to be sure. Thank you!
Examples that shouldn't match:
a!!b!!aaaaaa!!
a123!!b!!c!!
aAaa!!bbb
aAaa!!bbb!
Splitting the string and using the values between the !!
It depends on what you want to do with the regular expression. If you want to match the values between the !!, here are two ways:
Matching with groups
([^!]+)!!
[^!]+ requires at least 1 character other than !
!! instead of [!]{2} because it is the same but much more readable
Matching with lookahead
If you only want to match the actual word (and not the two !), you can do this by using a positive lookahead:
[^!]+(?=!!)
(?=) is a positive lookahead. It requires everything inside, i.e. here !!, to be directly after the previous match. It however won't be in the resulting match.
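The two extraction styles can be compared side by side in Python. This is only a sketch: the sample string is a shortened version of the one in the question, and the variable names are made up.

```python
import re

s = "word!!cat!!DOG!!"

# Capturing group: findall returns only the group, i.e. the word.
with_group = re.findall(r"([^!]+)!!", s)

# Lookahead: the delimiter is checked but never consumed,
# so the whole match is already just the word.
with_lookahead = [m.group(0) for m in re.finditer(r"[^!]+(?=!!)", s)]

print(with_group)
print(with_lookahead)
```

Both lists contain the same words; the choice mostly depends on whether the surrounding code prefers groups or whole matches.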
Validating the string
If you however want to check the validity of the whole string, then you need something like this:
^([^!]+!!)+$
^ start of the string
$ end of the string
It requires the whole string to consist only of ([^!]+!!), repeated one or more times.
If [^!] does not fit your requirements, you can of course replace it with [a-zA-Z] or similar.
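A quick way to check the validation pattern is to run it against the examples from the question. The sketch below assumes Python and uses re.fullmatch in place of the ^ and $ anchors; the helper name is_valid is hypothetical. It compares the strict letter-only rule from the question with the looser [^!] rule from the answer.

```python
import re

# Strict rule from the question: 1-5 ASCII letters, then "!!", repeated.
strict = re.compile(r"([a-zA-Z]{1,5}!!)+")
# Looser rule from the answer: any run of non-"!" characters before each "!!".
loose = re.compile(r"([^!]+!!)+")

def is_valid(pattern, s):
    # fullmatch requires the pattern to cover the entire string,
    # which plays the role of the ^ and $ anchors.
    return pattern.fullmatch(s) is not None

should_match = ["word!!cat!!DOG!!", "Phone!!home!!"]
should_not_match = ["a!!b!!aaaaaa!!", "a123!!b!!c!!", "aAaa!!bbb", "aAaa!!bbb!", ""]

candidates = should_match + should_not_match
strict_ok = [s for s in candidates if is_valid(strict, s)]
loose_ok = [s for s in candidates if is_valid(loose, s)]

print(strict_ok)
print(loose_ok)
```

The loose pattern accepts a!!b!!aaaaaa!! and a123!!b!!c!!, which is why the character class and length limit need to be tightened when the word rules matter.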

Ant regex expression

Quite a simple one in theory but can't quite get it!
I want a regex in ant which matches anything as long as it has a slash on the end.
Below is what I expect to work
<regexp id="slash.end.pattern" pattern="*/"/>
However this throws back
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/
^
I have also tried escaping this to \*, but that matches a literal *.
Any help appreciated!
Your original regex pattern didn't work because * is a special character in regex that is only used to quantify other characters.
The pattern (.)*/$, which you mentioned in your comment, will match any string of characters that ends with a slash and contains no newlines; however, it uses a possibly unnecessary capturing group. .*/$ should work just as well.
If you need to match newline characters, the dot . won't be enough. You could try something like [\s\S]*/$
On that note, it should be mentioned that you might not want to use $ in this pattern. Suppose you have the following string:
abc/def/
Should this be evaluated as two matches, abc/ and def/? Or is it a single match containing the whole thing? Your current approach creates a single match. If instead you would like to search for strings of characters and then stop the match as soon as a / is found, you could use something like this: [\s\S]*?/.
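The difference between the greedy, anchored pattern and the lazy one can be seen with a short Python sketch. Ant uses java.util.regex, so Python's re module is only a stand-in here; for these particular patterns the two engines are expected to behave the same way, and Python likewise rejects a bare leading *.

```python
import re

s = "abc/def/"

# Greedy and anchored: one match covering the whole string.
greedy = re.findall(r"[\s\S]*/$", s)

# Lazy and unanchored: each match stops at the first slash it reaches.
lazy = re.findall(r"[\s\S]*?/", s)

# A quantifier with nothing before it is a syntax error in Python too.
try:
    re.compile("*/")
    bare_star_compiles = True
except re.error:
    bare_star_compiles = False

print(greedy)
print(lazy)
print(bare_star_compiles)
```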

How to extract file location using Regular Expressions(VB.NET)

I am facing a problem whereby I am given a string that contains a path to a file and the file's name and I only want to extract the path (without the file's name)
For example, I will receive something like
C:\Users\OopsD\Projects\test.acdbd
and from that string I want to extract only
C:\Users\OopsD\Projects
I was trying to create a RegEx to match a backslash followed by a word, followed by a dot followed by another word - this is to match the
\test.acdbd
part and replace it with empty string so that the final result is
C:\Users\OopsD\Projects
Can anyone, familiar with RegEx, help me on this one? Also, I will be using regular expressions quite a lot in the future. Is there a (free) program I can download to create regular expressions?
Are you really sure you need to be using Regex for such a simple task? How about this:
Dim file As New IO.FileInfo("C:\Users\OopsD\Projects\test.acdbd")
MsgBox(file.Directory.FullName)
Regarding the free program for Regex, I would definitely recommend http://www.gskinner.com/RegExr/ - I use it all the time. But you should always consider the alternatives before going the Regex way.
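The regex you are looking for is below. It matches the last backslash and everything after it (the file name), so replacing the match with an empty string leaves only the directory:
\\[^\\]+$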
where,
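^ (caret): Outside a character class, matches at the start of the string the regex pattern is applied to; it matches a position rather than a character, and most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well. Inside square brackets, as in the pattern above, a leading caret instead negates the class, so [^...] matches any single character that is not listed.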
$ (dollar):Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.
+ (plus):Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with less matches of the preceding item, up to the point where the preceding item is matched only once.
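For comparison, the same extraction can be sketched in Python. This is not the VB.NET code from the thread: the pattern \\[^\\]+$ is an assumption chosen for backslash-separated paths, and pathlib.PureWindowsPath is shown as a library alternative in the spirit of the FileInfo suggestion.

```python
import re
from pathlib import PureWindowsPath

path = r"C:\Users\OopsD\Projects\test.acdbd"

# Remove the last backslash and everything after it (the file name).
directory_via_regex = re.sub(r"\\[^\\]+$", "", path)

# Library alternative: PureWindowsPath parses Windows paths on any OS.
directory_via_pathlib = str(PureWindowsPath(path).parent)

print(directory_via_regex)
print(directory_via_pathlib)
```

Both lines print the directory without the file name; the library route avoids having to think about escaping at all.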
Many regex tools are available. Some of them are:
www.gskinner.com/RegExr/
www.txt2re.com
Rubular (it is not just for Ruby)

regex to match strings not ending with a pattern?

I am trying to form a regular expression that will match strings that do NOT end with a DOT FOLLOWED BY A NUMBER.
eg.
abcd1
abcdf12
abcdf124
abcd1.0
abcd1.134
abcdf12.13
abcdf124.2
abcdf124.21
I want to match first three.
I tried modifying this post but it didn't work for me as the number may have variable length.
Can someone help?
You can use something like this:
^((?!\.[\d]+)[\w.])+$
It anchors at the start and end of a line. It basically says:
Anchor at the start of the line
DO NOT match the pattern .NUMBERS
Take every letter, digit, etc, unless we hit the pattern above
Anchor at the end of the line
So, this pattern matches this (no dot then number):
This.Is.Your.Pattern or This.Is.Your.Pattern2012
However it won't match this (dot before the number):
This.Is.Your.Pattern.2012
EDIT: In response to Wiseguy's comment, you can use this:
^((?!\.[\d]+$)[\w.])+$ - which adds an end anchor after the number. The rejected pattern must therefore be a dot followed only by digits at the very end of the string (not that you specified that in your question).
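Running both variants against the strings from the question shows what the extra $ changes. This is a Python sketch; the sample v1.2beta is a made-up string added to show the difference, and the variable names are illustrative.

```python
import re

samples = [
    "abcd1", "abcdf12", "abcdf124",
    "abcd1.0", "abcd1.134", "abcdf12.13", "abcdf124.2", "abcdf124.21",
]

# First variant: rejects a dot followed by digits anywhere in the string.
anywhere = re.compile(r"^((?!\.[\d]+)[\w.])+$")
# Edited variant: rejects it only when the digits run to the end.
at_end = re.compile(r"^((?!\.[\d]+$)[\w.])+$")

kept_anywhere = [s for s in samples if anywhere.match(s)]
kept_at_end = [s for s in samples if at_end.match(s)]

# Hypothetical extra sample: dot and digit in the middle, letters at the end.
middle = "v1.2beta"
middle_anywhere = anywhere.match(middle) is not None
middle_at_end = at_end.match(middle) is not None

print(kept_at_end)
print(middle_anywhere, middle_at_end)
```

Both variants keep exactly the first three strings from the question; they only disagree on inputs like v1.2beta, where the number is not at the end.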
If you can relax your restrictions a bit, you may try using this (extended) regular expression:
^[^.]*\.?[^0-9]*$
You may omit the anchoring metasymbols ^ and $ if you're using a function or tool that matches against the whole string.
Explanation: This regex allows any symbols except a dot until an (optional) dot is found, after which all non-numerical symbols are allowed. It won't work for numbers in an improper format, like in the strings abcd1...3 or abcd1.fdfd2. It also won't work correctly for some strings with multiple dots, like abcd.ab123cd.a (the problem description is a bit ambiguous).
Philosophical explanation: When using regular expressions, you often don't need to do exactly what your task seems to require, so even a simple regex will do the job. An abstract example: you have a file whose lines are either numbers or some complicated names (without digits), and, say, you want to filter out all the numbers; then simply filtering on a non-digit first character, e.g. grep '^[^0-9]', will do the job.
But if your task is more complex and requires validation of format and doing other fancy stuff on data, why not use a simple script(say, in awk, python, perl or other language)? Or a short "hand-written" function, if you're implementing stand-alone application. Regexes are cool, but they are often not the right tool to use.
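As one possible reading of the "simple script" suggestion, here is a Python sketch that avoids regular expressions entirely. The function name ends_with_dot_number is hypothetical, and "number" is assumed to mean one or more ASCII digits.

```python
def ends_with_dot_number(s):
    # Split on the last dot; if there is no dot, `dot` is the empty string.
    head, dot, tail = s.rpartition(".")
    # True only when a dot exists and everything after it is ASCII digits.
    return dot == "." and tail.isascii() and tail.isdigit()

samples = [
    "abcd1", "abcdf12", "abcdf124",
    "abcd1.0", "abcd1.134", "abcdf12.13", "abcdf124.2", "abcdf124.21",
]

kept = [s for s in samples if not ends_with_dot_number(s)]
print(kept)
```

A plain function like this is easy to extend with further format checks, which is the trade-off the paragraph above is pointing at.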
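I would just use a simple negative look-behind anchored at the end (this needs an engine that supports variable-length look-behind, such as .NET; double the backslashes if you write it inside a string literal):
.*(?<!\.\d+)$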
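Python's re module does not accept variable-length look-behind, so the pattern above cannot be used there as written. A roughly equivalent formulation, offered here as an assumption rather than as the answerer's method, moves the check into a negative lookahead at the start of the string:

```python
import re

samples = [
    "abcd1", "abcdf12", "abcdf124",
    "abcd1.0", "abcd1.134", "abcdf12.13", "abcdf124.2", "abcdf124.21",
]

# Reject the string up front if it ends with a dot followed by digits,
# then match the whole line.
not_dot_number = re.compile(r"^(?!.*\.\d+$).*$")

kept = [s for s in samples if not_dot_number.match(s)]
print(kept)
```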