I need a regex command that can be used to only keep 0-9,a-z,A-Z, "-" and ":".
How can I do this?
(Also, I would like to know if there are any good Regex GUI editors)
Use a character class. The following will match any one of the characters you listed:
[0-9a-zA-Z\-:]
And here is a regex that will match strings that contain only those characters:
^[0-9a-zA-Z\-:]*$
If you don't want to allow empty strings, change the * to +.
It wasn't exactly clear whether this is what you were trying to do. If you are actually trying to remove all characters except the listed ones, you can negate the character class by adding ^ to the beginning of it, like so:
[^0-9a-zA-Z\-:]
This will match all characters except the ones listed, so you should be able to replace matches of the above regex with an empty string to remove the unwanted characters.
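As a minimal sketch of both uses in Python (the function names and the sample strings are made up for illustration; the patterns are the ones given above):

```python
import re

# Patterns from the answer: digits, ASCII letters, "-" and ":".
# Note: in Python, "$" also matches just before a trailing newline.
ALLOWED_ONLY = re.compile(r"^[0-9a-zA-Z\-:]*$")
NOT_ALLOWED = re.compile(r"[^0-9a-zA-Z\-:]")


def is_clean(text: str) -> bool:
    """Return True if text consists solely of the allowed characters."""
    return ALLOWED_ONLY.match(text) is not None


def strip_unwanted(text: str) -> str:
    """Replace every character outside the allowed set with nothing."""
    return NOT_ALLOWED.sub("", text)


print(strip_unwanted("12:30 PM - room #4!"))  # 12:30PM-room4
print(is_clean("abc-123:XYZ"))                # True
print(is_clean("a b"))                        # False
```

Other languages have an equivalent replace call; only the negated character class matters here.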
using: this tool to evaluate my expression
My test string: "Little" Timmy (tim) McGraw
my regex:
^[()"]|.["()]
It looks like I'm properly catching the characters I want, but my matches include whatever character comes just before the match. I'm not sure what, if anything, I'm doing wrong to be catching the preceding characters like that. The goal is to capture characters we don't want in the name field of one of our systems.
Brief
Your current regex ^[()"]|.["()] says the following:
^[()"]|.["()] Match either of the following
^[()"] Match the following
^ Assert position at the start of the line
[()"] Match any character present in the list ()"
.["()] Match the following
. Match any character (this is the issue you were having)
["()] Match any character present in the list "()
Code
You can actually shorten your regex to just [()"].
Ultimately, however, it would be much easier to create a negated set that determines which characters are valid rather than those that are invalid. This approach would get you something like [^\w ]. This means match anything not present in the set. So match any non-word and non-space characters (in your sample string this will match the symbols ()" since they are not in the set).
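To see the difference on the test string from the question, here is a small Python sketch (Python is an assumption; the question does not say which engine the tool uses, but these patterns behave the same way in most flavours):

```python
import re

name = '"Little" Timmy (tim) McGraw'

# Original pattern: the "." in the second alternative consumes the
# character in front of each quote or parenthesis.
original = re.findall(r'^[()"]|.["()]', name)

# Shortened pattern from the answer: only the unwanted characters.
shortened = re.findall(r'[()"]', name)

# Negated set: anything that is not a word character or a space.
negated = re.findall(r"[^\w ]", name)

print(original)   # ['"', 'e"', ' (', 'm)']
print(shortened)  # ['"', '"', '(', ')']
print(negated)    # ['"', '"', '(', ')']
```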
I am trying to extract the following string RL2.OPT.TEST.01 from this one a lot of text...[RL2.OPT.TEST.01]<some more text. Basically there can be anything before the RL2., but the string always begins with RL2. and always ends with either ], <, or a space.
I tried with the following regex: m/RL2\..*[\s<\]]+?$/g
but even if it finds a match for the string -- [RL2.OPT.TEST.01], it does not work for -- [RL2.OPT.TEST.01] some more text.
I need an array of all the resultings matches in the large string I have. I think I should also mention that this string has a lot of newline characters, but never in the middle of the strings I am trying to extract.
Any clue about what is wrong with my regex?
Use a negated character class and remove the end-of-line anchor $:
m/RL2\.[^\s<\]]*/g
[^\s<\]]* is a negated character class that matches any character other than whitespace, < or ], zero or more times.
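The question uses Perl syntax; the same pattern can be tried out in Python as a rough sketch. The sample text below is invented for illustration (only the first line comes from the question), and re.findall plays the role of collecting all matches into an array:

```python
import re

# Hypothetical multi-line input; the .02 and .03 entries are made up.
text = (
    "a lot of text...[RL2.OPT.TEST.01]<some more text\n"
    "-- [RL2.OPT.TEST.02] some more text\n"
    "prefix RL2.OPT.TEST.03 trailing"
)

# Stop at the first whitespace, "<" or "]" instead of anchoring at "$".
matches = re.findall(r"RL2\.[^\s<\]]*", text)

print(matches)  # ['RL2.OPT.TEST.01', 'RL2.OPT.TEST.02', 'RL2.OPT.TEST.03']
```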
Hi, I am learning regex.
I was trying to write a regex for the following condition:
Any single letter from the set C-MPSTV-XZ; the letter must not be repeated.
This letter can have one blank space in front or back, i.e. it can be " C" or "C ".
[C-MPSTV-XZ{1} ]{2}
I was trying the above expression: {1} was meant to allow only one character, and the space after it to allow only one space. At the end I put {2} to get exactly 2 characters.
I was expecting regex_match to be false for the input "XX", but it's not working.
Appreciate your help.
\s?[C-MPSTV-XZ]\s?. If you are using std::regex_match,
you shouldn't need anything else, since regex_match requires
a match over the entire string.
Your posted regex will match two characters which are both not spaces, because you're asking for any two from inside the character class. You're also going to accept {, 1 and } as characters because quantifiers act as literal characters inside a character class.
The simple alternative is to just spell out the two conditions explicitly:
( [C-MPSTV-XZ]|[C-MPSTV-XZ] )
This assumes that your regex engine is treating whitespace within regexes as significant. If not, or if you don't like that, replace the spaces with a suitable escape sequence.
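A quick way to check these claims is Python's re.fullmatch, used here as a stand-in for std::regex_match since both require the whole string to match. This is only a sketch; the variable names are illustrative and the letter set is the one from the question:

```python
import re

LETTER = "[C-MPSTV-XZ]"

# Exactly one allowed letter with exactly one space before or after it.
one_space = re.compile(f" {LETTER}|{LETTER} ")

# Optional whitespace on either side (also accepts "C" and " C ").
optional = re.compile(r"\s?[C-MPSTV-XZ]\s?")

# Pattern from the question: inside [...] the characters "{", "1", "}"
# and the space are ordinary members of the class.
posted = re.compile(r"[C-MPSTV-XZ{1} ]{2}")

print(bool(one_space.fullmatch(" C")))  # True
print(bool(one_space.fullmatch("C ")))  # True
print(bool(one_space.fullmatch("XX")))  # False
print(bool(optional.fullmatch("XX")))   # False
print(bool(posted.fullmatch("XX")))     # True
print(bool(posted.fullmatch("{1")))     # True
```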
This is my RegEx:
"^[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
I need to match only strings less than 255 characters.
I've tried adding a length-limiting lookahead at the start of the RegEx but it fails:
"^(?=.{1,254})[^\.]([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)([\.]{0,1})([\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]+)[^\.]#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,6}|[0-9]{1,3})(\]?)$"
You need the $ in the lookahead to make sure it's only up to 254. Otherwise, the lookahead will match even when there are more than 254.
(?=.{1,254}$)
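The effect of the $ can be checked in isolation. In this Python sketch, \w+ is a deliberately simplified stand-in for the rest of the pattern, so only the lookahead differs between the two versions:

```python
import re

# Without "$" the lookahead only proves that at least 1 character exists.
unanchored = re.compile(r"^(?=.{1,254})\w+$")
# With "$" the lookahead must reach the end within 254 characters.
anchored = re.compile(r"^(?=.{1,254}$)\w+$")

ok = "a" * 254
too_long = "a" * 255

print(bool(unanchored.match(too_long)))  # True  (length is not limited)
print(bool(anchored.match(too_long)))    # False
print(bool(anchored.match(ok)))          # True
```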
Also, keep in mind that you can greatly simplify your regex because many characters that would usually need to be escaped do not need to when in a character class (square brackets).
"[\w-\!\#\$\%\&\'\*\+\-\/\=\`\{\|\}\~\?\^]"
is the same as this:
"[-\w!#$%&'*+/=`{|}~?^]"
Note that an unescaped dash must be first (or last) in the character class to be a literal dash, and the caret must not be first.
With some other simplifications, here is the complete string:
"^(?=.{1,254}$)[-\w!#$%&'*+/=`{|}~?^]+(\.[-\w!#$%&'*+/=`{|}~?^]+)*#((\d{1,3}\.){3}\d{1,3}|([-\w]+\.)+[a-zA-Z]{2,6})$"
Notes:
I removed the stipulation that the first char shouldn't be a period ([^.]) because the next character class doesn't match a period anyway, so it's redundant.
I removed many extraneous parens
I replaced [0-9] with \d
I replaced {0,1} with the shorthand "?"
After the # sign, it seemed that you were trying to match an IP address or text domain name, so I separated them more so it couldn't be a combination
I'm not sure what the optional square bracket at the end was for, so I removed it: "(]?)"
I tried it in Regex Hero, and it works. See if it works for you.
This depends on what language you are working in. In Python, for example, you can use a regex to split a text into separate strings, and then use len() to discard any string that is too long (255 characters or more, per the question).
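A rough sketch of that idea follows. Splitting on whitespace is an assumption, since the answer does not say how the text should be split, and short_tokens and MAX_LEN are hypothetical names:

```python
import re

MAX_LEN = 254  # "less than 255 characters", per the question


def short_tokens(text: str) -> list[str]:
    """Split on whitespace and keep only tokens within the length limit."""
    tokens = re.split(r"\s+", text)
    # Drop empty strings produced by leading or trailing whitespace.
    return [t for t in tokens if t and len(t) <= MAX_LEN]


print(short_tokens("a " + "b" * 300 + " c"))  # ['a', 'c']
```

Each surviving token could then be checked against the validation pattern without a length lookahead.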
I think this post will help. It shows how to limit certain patterns but I am not sure how you would add it to the entire regex.
Can someone show me a regex to select #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70 from the string below? It's okay to assume #OnlinePopup.
~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70_11472026_1214836152225_6455280574472127786
NB: The following is .NET Regex syntax, modify for your flavour.
The following:
#[^_]+_[^_]+
will match:
Hash
One or more characters until an underscore
Underscore
One or more characters until an underscore
If the first bit is constant, and you want to be more specific you could use:
#OnlinePopup_[A-F0-9]+
This will match
#OnlinePopup_ (exactly)
One or more hex characters until a non Hex character
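Both patterns use only constructs that also exist in Python's re module, so they can be checked against the string from the question like this (a sketch; the variable names are illustrative):

```python
import re

line = (
    "~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID"
    "#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70"
    "_11472026_1214836152225_6455280574472127786"
)

# Generic: "#", text up to an underscore, "_", text up to the next underscore.
generic = re.search(r"#[^_]+_[^_]+", line)
# Specific: fixed prefix followed by upper-case hex digits.
specific = re.search(r"#OnlinePopup_[A-F0-9]+", line)

print(generic.group(0))   # #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70
print(specific.group(0))  # #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70
```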
Simply matching anything between the first '#' and the first or last '_' will not work for your example since the string that you want returned has an underscore in it. If all the text that you want to match has only one underscore in it, you could use this regex:
/(#[^_]+_[^_]+)/
This matches an octothorpe (#), followed by two strings that do not contain an underscore, separated by a single underscore.
Something a little simpler:
(\#OnlinePopup_.*?)_
Assuming your text starts with # and ends with _
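With this last pattern the wanted text is in the capture group, not in the overall match, because the trailing underscore is consumed as well. A short Python sketch (assuming the sample string from the question):

```python
import re

line = (
    "~DCTM~dctm://aicpcudev/37004e1f8000219e?DMS_OBJECT_SPEC=RELATION_ID"
    "#OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70"
    "_11472026_1214836152225_6455280574472127786"
)

# The lazy ".*?" stops at the first underscore after the fixed prefix.
m = re.search(r"(\#OnlinePopup_.*?)_", line)

print(m.group(1))  # #OnlinePopup_AFE53E2CACBF4D8196E6360D4DDB6B70
print(m.group(0))  # same text plus the trailing "_"
```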