Extracting matches with Perl regex - regex

I am trying to extract the string RL2.OPT.TEST.01 from text like this: a lot of text...[RL2.OPT.TEST.01]<some more text. Anything can come before RL2., but the string I want always begins with RL2. and always ends at a ], a <, or a space.
I tried with the following regex: m/RL2\..*[\s<\]]+?$/g
but even though it finds a match in the string -- [RL2.OPT.TEST.01], it does not work for -- [RL2.OPT.TEST.01] some more text.
I need an array of all the resulting matches in the large string I have. I should also mention that this string has a lot of newline characters, but never in the middle of the strings I am trying to extract.
Any clue about what is wrong with my regex?

Use a negated character class and remove the end-of-line anchor $:
m/RL2\.[^\s<\]]*/g
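[^\s<\]]* is a negated character class: it matches any character other than whitespace, < or ], zero or more times. Your original pattern only matched when the closing delimiter sat at the very end of the string, because of the trailing $.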
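As a rough illustration, here is the same pattern collecting every match from a multi-line string. It is sketched in Python rather than Perl, and the sample text is made up for the example; only the first token comes from the question.

```python
import re

# Hypothetical sample text; only RL2.OPT.TEST.01 appears in the question.
text = (
    "a lot of text...[RL2.OPT.TEST.01]<some more text\n"
    "-- [RL2.OPT.TEST.02] some more text\n"
    "prefix RL2.OPT.TEST.03<tail\n"
)

# RL2\. matches the literal prefix; [^\s<\]]* then consumes characters
# up to (but not including) the first whitespace, '<' or ']'.
pattern = re.compile(r"RL2\.[^\s<\]]*")

# findall returns a list of all non-overlapping matches.
matches = pattern.findall(text)
print(matches)
```

In Perl, evaluating the match with /g in list context should give the equivalent array.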

Related

Removing last character from a line using regex

I just started learning regex and I'm trying to understand how it is possible to do the following:
If I have:
helmut_rankl:20Suzuki12
helmut1195:wasserfall1974
helmut1951:roller11
Get:
helmut_rankl:20Suzuki1
helmut1195:wasserfall197
helmut1951:roller1
I tried using .$, which should match the last character of a string, but it doesn't seem to match letters and numbers.
How do I get these results from the input?
You could match the whole line and use a lookahead to assert that one more character remains to the right. If the match itself must contain at least one character:
.+(?=.)
If you also want to match empty strings:
.*(?=.)
This will do what you want with regex's match function.
^(.*).$
Broken down:
^ matches the start of the string
( and ) denote a capturing group. The matches which fall within it are returned.
.* matches everything, as much as it can.
The final . matches any single character (i.e. the last character of the line)
$ matches the end of the line/input
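Both approaches can be tried side by side. This is a minimal sketch in Python (the question does not name a language), with a hypothetical helper called drop_last_char:

```python
import re

lines = [
    "helmut_rankl:20Suzuki12",
    "helmut1195:wasserfall1974",
    "helmut1951:roller11",
]

def drop_last_char(line: str) -> str:
    """Return the line without its final character, using ^(.*).$"""
    m = re.match(r"^(.*).$", line)
    # An empty line has no last character to drop, so return it unchanged.
    return m.group(1) if m else line

trimmed = [drop_last_char(line) for line in lines]

# Lookahead variant: .+(?=.) keeps everything that still has a character after it.
trimmed_lookahead = [re.match(r".+(?=.)", line).group(0) for line in lines]

print(trimmed)
```

The capture-group version returns the wanted text in group 1, while the lookahead version returns it as the whole match.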

Regular expression for alphabet, underscore, hyphen, apostrophe only

I want a regular expression that accepts only letters, hyphens, apostrophes and underscores.
I tried
/^[ A-Za-z-_']*$/
but it's not working. Please help.
Your regex is wrong. Try this:
/^[0-9A-Za-z_#'-]+$/
OR
/^[\w#'-]+$/
Hyphen needs to be at first or last position inside a character class to avoid escaping. Also if empty string isn't allowed then use + (1 or more) instead of * (0 or more)
Explanation:
^ assert position at start of the string
[\w#'-]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible
\w match any word character [a-zA-Z0-9_]
#'- a single character in the list #'- literally
$ assert position at end of the string
Move the hyphen to the end or the beginning of the character class, or escape it:
^[ A-Za-z_'-]*$
or
^[- A-Za-z_']*$
or
^[ A-Za-z\-_']*$
If you want all letters:
^[ \pL_'-]*$
or
When using a hyphen in a character class, be sure to place it at the end of the character class as a best practice.
The reason for this is because the hyphen is used to signify a range of characters in the character class, and when it is at the end of the class, it will not create any ranges.
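To see why placement matters, here is a small Python sketch. The second pattern is a hypothetical misplacement written for this example, not one taken from the question: with the hyphen between ' and _, the class becomes a range.

```python
import re

# Hyphen last in the class: it is a literal hyphen, not a range operator.
allowed = re.compile(r"[A-Za-z_'-]+")

# Hypothetical misplacement: '-_ is read as the range from ' (0x27) to _ (0x5F),
# which silently admits digits, uppercase letters and some punctuation.
accidental_range = re.compile(r"['-_]+")

def is_valid(s: str) -> bool:
    # fullmatch plays the role of the ^...$ anchors used above.
    return allowed.fullmatch(s) is not None

print(is_valid("O'Neil-Smith_x"))
print(is_valid("abc1"))
print(accidental_range.fullmatch("5") is not None)
```

The last line shows the digit 5 slipping through the accidental range, even though no digit was listed in the class.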
My best bet would be :
/[A-Za-z-\'_#0-9]+/g
You can use the following (in Java):
String acceptHyphenApostropheUnderscoreRegEx = "^(\\p{Alpha}*+((['_-]+)\\p{Alpha})?)*+$";
If you want to have spaces and # also (as some have given above) try:
String acceptHyphenApostropheUnderscoreRegEx = "^(\\p{Alpha}*+((\\s|['#_-]+)\\p{Alpha})?)*+$";

regex 1 character and space only

Hi, I am learning regex.
I was trying to write a regex for the following condition:
any single letter from the set C-MPSTV-XZ, and the letter must not be repeated.
The letter can have one blank space in front of it or after it, i.e. it can be " C" or "C ".
[C-MPSTV-XZ{1} ]{2}
In the expression above I expected {1} to allow one letter only and the space after it to allow one space only. I put {2} at the end to limit the match to 2 characters.
I was expecting regex_match to return false for the input "XX", but it returns true.
Appreciate your help.
\s?[C-MPSTV-XZ]\s?. If you are using std::regex_match,
you shouldn't need anything else, since regex_match requires
a match over the entire string.
Your posted regex will match two characters which are both not spaces, because you're asking for any two from inside the character class. You're also going to accept {, 1 and } as characters because quantifiers act as literal characters inside a character class.
The simple alternative is to just spell out the two conditions explicitly:
( [C-MPSTV-XZ]|[C-MPSTV-XZ] )
This assumes that your regex engine is treating whitespace within regexes as significant. If not, or if you don't like that, replace the spaces with a suitable escape sequence.
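The difference between the two patterns can be checked quickly. This sketch uses Python's re.fullmatch as a stand-in for std::regex_match (both require the whole input to match), and it assumes the letter set from the question and that exactly one space is required:

```python
import re

LETTER = "[C-MPSTV-XZ]"  # letter set taken from the question

# The question's attempt: '{', '1', '}' and the space are literal members of
# the class, so any two class members in a row are accepted.
attempt = re.compile(r"[C-MPSTV-XZ{1} ]{2}")

# One letter with exactly one space, either before or after it.
strict = re.compile(rf" {LETTER}|{LETTER} ")

def accepts(pattern: re.Pattern, s: str) -> bool:
    return pattern.fullmatch(s) is not None

for candidate in ["XX", " C", "C ", "{1", " C "]:
    print(repr(candidate), accepts(attempt, candidate), accepts(strict, candidate))
```

If the letter on its own should also be accepted, the alternation would need a third branch; the question leaves that open.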

Limiting RegEx to match only a string of 1-254 characters length

This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a lookahead at the start of the RegEx, but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure it's only up to 254. Otherwise, the lookahead will match even when there are more than 254.
(?=.{1,254}$)
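The effect of the $ inside the lookahead can be seen with a much simpler body. In this Python sketch, \S+ is only a stand-in for the full address pattern, chosen to keep the example short:

```python
import re

# \S+ is a simplified stand-in for the real pattern body.
unanchored = re.compile(r"^(?=.{1,254})\S+$")   # lookahead without $
anchored = re.compile(r"^(?=.{1,254}$)\S+$")    # lookahead with $

ok_length = "a" * 254
too_long = "a" * 255

# Without $, the lookahead is satisfied by the first 254 characters,
# so the longer string still matches.
print(unanchored.match(too_long) is not None)

# With $, the lookahead must reach the end of the string within 254 characters.
print(anchored.match(too_long) is not None)
print(anchored.match(ok_length) is not None)
```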
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that the dash must be first (or last) in the character class to be a literal dash, and the caret must not be first, since a leading caret negates the class.
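A quick way to convince yourself that the simplified class still treats the dash and the caret as literals is to test it directly. The sketch below uses Python's re module, which may differ in small ways from the engine in the question:

```python
import re

# Simplified class from above: dash first, caret not first, nothing escaped.
local_chars = re.compile(r"[-\w!#$%&'*+/=`{|}~?^]+")

def all_allowed(s: str) -> bool:
    return local_chars.fullmatch(s) is not None

print(all_allowed("a-b^c!d"))  # dash and caret are accepted as literals
print(all_allowed("a.b"))      # the period is not in the class
print(all_allowed("a b"))      # neither is the space
```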
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
This depends on what language you are working in. In Python, for example, you can use a regex to split the text into separate strings, and then use len() to discard any string longer than the limit you want.
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.

Regex matching beginning AND end strings

This seems like it should be trivial, but I'm not so good with regular expressions, and this doesn't seem to be easy to Google.
I need a regex that starts with the string 'dbo.' and ends with the string '_fn'
So far as I am concerned, I don't care what characters are in between these two strings, so long as the beginning and end are correct.
This is to match functions in a SQL server database.
For example:
dbo.functionName_fn - Match
dbo._fn_functionName - No Match
dbo.functionName_fn_blah - No Match
If you're searching for hits within a larger text, you don't want to use ^ and $ as some other responders have said; those match the beginning and end of the text. Try this instead:
\bdbo\.\w+_fn\b
\b is a word boundary: it matches a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. This regex will find what you're looking for in any of these strings:
dbo.functionName_fn
foo dbo.functionName_fn bar
(dbo.functionName_fn)
...but not in this one:
foodbo.functionName_fnbar
\w+ matches one or more "word characters" (letters, digits, or _). If you need something more inclusive, you can try \S+ (one or more non-whitespace characters) or .+? (one or more of any characters except linefeeds, non-greedily). The non-greedy +? prevents it from accidentally matching something like dbo.func1_fn dbo.func2_fn as if it were just one hit.
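The word-boundary pattern can be run against the examples from the question and from this answer. This is a small sketch in Python; the pattern itself is engine-neutral:

```python
import re

pattern = re.compile(r"\bdbo\.\w+_fn\b")

samples = [
    "dbo.functionName_fn",
    "foo dbo.functionName_fn bar",
    "(dbo.functionName_fn)",
    "foodbo.functionName_fnbar",
    "dbo._fn_functionName",
    "dbo.functionName_fn_blah",
    "dbo.func1_fn dbo.func2_fn",
]

# Map each sample to the list of hits found inside it.
hits = {s: pattern.findall(s) for s in samples}

for sample, found in hits.items():
    print(f"{sample!r}: {found}")
```

Because \w+ cannot cross a space, the last sample yields two separate hits rather than one long one.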
^dbo\..*_fn$
This should work for you.
Well, the simple regex is this:
/^dbo\..*_fn$/
It would be better, however, to use the string manipulation functionality of whatever programming language you're using to slice off the first four and the last three characters of the string and check whether they're what you want.
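For instance, in Python the prefix and suffix checks need no regex at all. The function name is_dbo_fn is made up for this sketch:

```python
def is_dbo_fn(name: str) -> bool:
    """True when name starts with 'dbo.' and ends with '_fn'."""
    return name.startswith("dbo.") and name.endswith("_fn")

for candidate in [
    "dbo.functionName_fn",
    "dbo._fn_functionName",
    "dbo.functionName_fn_blah",
]:
    print(candidate, is_dbo_fn(candidate))
```

This only works when each candidate name is already isolated; for finding hits inside a larger text, a regex is still the simpler tool.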
\bdbo\..*fn
I was looking through a ton of Java code for a specific library: car.csclh.server.isr.businesslogic.TypePlatform (although I only knew car and Platform at the time). Unfortunately, none of the other suggestions here worked for me, so I figured I'd post this.
Here's the regex I used to find it:
\bcar\..*Platform
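import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reads a substring and a line, then reports whether any word in the line starts or ends with it.
Scanner scanner = new Scanner(System.in);
String part = Pattern.quote(scanner.nextLine()); // treat the input as literal text, not as a regex
String line = scanner.nextLine();
// \bpart matches at the start of a word, part\b at the end of one.
Pattern pattern = Pattern.compile("\\b" + part + "|" + part + "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(line);
System.out.println(matcher.find() ? "YES" : "NO");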
If you need to determine if any of the words of this text start or end with the sequence, you can use this regex: \bsubstring|substring\b:
anythingsubstring
substringanything
anythingsubstringanything
The simplest thing that you can do is:
dbo.*_fn$
It matches dbo, followed by any characters, and requires the text to end with _fn.
If you know which character comes right after the n (for example, a space), you can replace $ with that character.