How to check for certain characters using regex - regex

I am trying to check for the following characters in my string using regex, but based on tutorials online and some questions on SO I haven't been able to figure out a solution so far. Can anyone help? I would really appreciate it.
Here is my string:
0-9~!##$%^&*()_+`-={}[]\|:”;’,./<>?ÀàÂâÄäÆæÇçÉéÈèÊêËëÎîÏïÔôÖöŒœßÙùÛûÜüŸÿ
I also want to allow single and double quotes in my string. Is there a way to do that?

If you just want to match the presence of any of those characters in the string, you can use this.
**Updated to include ' and "
/["'\d~!##\$%\^&\*\(\)_\+`\-=\{\}\[\]\|:”;’,\.\/<>\?ÀàÂâÄäÆæÇçÉéÈèÊêËëÎîÏïÔôÖöŒœßÙùÛûÜüŸÿ]/g
This is just a basic character class - http://www.regular-expressions.info/charclass.html
I would suggest a whitelist approach instead of listing every character to exclude. For example, /[^\w\s"']/g will match anything that is not ", ', _, whitespace or alphanumeric.
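A rough sketch of both suggestions, in Python (my choice of language; the question does not name one). The first pattern is the character class from the answer, copied as-is, and the second is the negated class. The helper names are made up for this example. Keep in mind that Python's \w and \d are Unicode-aware for str patterns, so accented letters count as word characters there, while JavaScript's \w only covers [A-Za-z0-9_].

```python
import re

# Character class from the answer above, copied verbatim (the extra escapes are harmless).
LISTED = re.compile(r"""["'\d~!##\$%\^&\*\(\)_\+`\-=\{\}\[\]\|:”;’,\.\/<>\?ÀàÂâÄäÆæÇçÉéÈèÊêËëÎîÏïÔôÖöŒœßÙùÛûÜüŸÿ]""")

# Negated class: anything that is not a word character, whitespace, " or '.
UNLISTED = re.compile(r"""[^\w\s"']""")


def has_listed_char(text):
    """Hypothetical helper: True if any character from the list is present."""
    return LISTED.search(text) is not None


def unexpected_chars(text):
    """Hypothetical helper: every character the negated class rejects."""
    return UNLISTED.findall(text)


print(has_listed_char("hello"))      # False
print(has_listed_char("hello?"))     # True
print(unexpected_chars("50% off!"))  # ['%', '!']
print(unexpected_chars("O'Neil"))    # []
```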

Related

How to use Regex to find a string with only spaces delimiting it?

I'm trying to parse a string that is only whitespace delimited.
I am currently using /\b[somestring]\b/
But I don't want it to pick up words that have any punctuation next to them.
So if I parsed this string:
"Trying to,\n
test this out,\n
but Trying to do this has taken a while.\n
Trying to do stuff is fun,\n
but i am stuck"
with /\bTrying To\b/
I find three, but I only want two because I don't want to include
"Trying to,"
--Edit
Expected output
Trying To
Trying To
If I understood your question correctly, you can use a regex lookahead like this:
Trying to(?=\s)
The idea is to search for your string Trying to only where it is followed by a whitespace character.
Edit: if you also want to include those followed by a literal \n (a backslash and the letter n), you can use:
Trying to(?=\s|\\n)
Btw, if you want the match itself to include the whitespace or the literal \n, you don't need a lookahead; a simple non-capturing group will do:
Trying to(?:\s|\\n)
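To see the difference between the lookahead and the plain group, here is a small Python sketch (Python is my assumption; the question does not say which language is used). It also assumes the \n in the sample are real newlines, which \s already covers.

```python
import re

# Sample text from the question, assuming the \n are real line breaks.
text = (
    "Trying to,\n"
    "test this out,\n"
    "but Trying to do this has taken a while.\n"
    "Trying to do stuff is fun,\n"
    "but i am stuck"
)

# Lookahead: the whitespace is checked but not included in the match.
lookahead = re.findall(r"Trying to(?=\s|\\n)", text)

# Non-capturing group: the whitespace becomes part of the match.
consuming = re.findall(r"Trying to(?:\s|\\n)", text)

print(lookahead)  # ['Trying to', 'Trying to']
print(consuming)  # ['Trying to ', 'Trying to ']
```

Either way, "Trying to," on the first line is skipped because a comma follows it.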
Maybe I am misunderstanding what you want, but shouldn't this work? /(?:^| )Trying to /gm. If you don't want all matches, you can drop the g flag, but note that this answer requires the m flag so that ^ matches at the start of each line. Or, if for some reason you don't want to deal with regex flags, the following should work: /(?:^|\n| )Trying to /.

Regex to handle a dynamic set of delimiters

I'm writing a parser and need to handle escaped characters via regex, if possible.
Given a sample string with the escape character '\' and the delimiter '&':
TestSection1&TestSection2\\&TestSection3\&TestSection4
I would like to be able to split on a valid '&', that is to say not an & that is escaped. So the above example would come out something like this:
TestSection1
TestSection2\
TestSection3\&TestSection4
I've tried quite a few regexes that I muddled together, but no luck. Does anyone have any insight on how to accomplish this, or whether it's even possible?
Thanks
You can use this double lookbehind based regex:
(.+?)(?:(?<!(?<!\\)\\)&|$)
(?:(?<!(?<!\\)\\)&|$) matches either the end anchor or an & that is not preceded by a single (unescaped) \. The captured group (.+?) holds the section before it.
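A minimal sketch of how this could be used, assuming Python (the question does not name a language; the function name is made up). Python's re accepts the nested lookbehind here because both lookbehinds are fixed-width.

```python
import re

# Regex from the answer: capture lazily up to an unescaped & or the end.
SPLITTER = re.compile(r"(.+?)(?:(?<!(?<!\\)\\)&|$)")


def split_unescaped(text):
    """Hypothetical helper: return the sections between unescaped & delimiters."""
    return [m.group(1) for m in SPLITTER.finditer(text)]


# Raw string, so the backslashes are exactly those in the question's sample.
sample = r"TestSection1&TestSection2\\&TestSection3\&TestSection4"

for part in split_unescaped(sample):
    print(part)
# TestSection1
# TestSection2\\
# TestSection3\&TestSection4
```

Two things to be aware of: the escape sequences are left in place (the second section still ends in \\, not \), so unescaping would be a separate step, and the lookbehind only inspects the two characters before the &, so a run of three or more backslashes is not handled correctly.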

regex_match allow '

I am using regex_match to validate last names.
I have this so far: regex_match[/^[a-zA-Z -]{0,25}+$/], but I also want to allow ' for names like O'Neal. I tried regex_match[/^[a-zA-Z -\']{0,25}+$/], but it didn't work.
Any suggestions?
Thanks,
J
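-\' does not do what you want: inside the character class, the dash between the space and the \' is read as a range (from space to apostrophe) instead of a literal dash. You need to put the dash at the end of the character class: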
/^[a-zA-Z '-]{0,25}$/
The + is superfluous here (in some regex flavors, it activates "possessive matching", but it's definitely not needed here), as is the backslash.
Also, I suspect that the square brackets around the regex are not syntactically correct in whatever language you're using. (Which one is that, by the way?)
But the real problem is that you're trying to validate a name (with a regex, no less).
You don't need the \ to escape the ', but you should put the dash last so it doesn't create a range.
Nor do you need the + after the {0,25}; in flavors without possessive quantifiers the regex isn't valid with it.
This works fine for me: ^[a-zA-Z '-]{0,25}$
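Here is a quick way to check both points, sketched in Python (my assumption; the regex_match[...] syntax in the question comes from some other framework). FIXED is the corrected class from the answers, RANGE is the class from the question, and the helper name is made up.

```python
import re

# Corrected class: the dash is last, so it is a literal dash.
FIXED = re.compile(r"[a-zA-Z '-]{0,25}")

# Class from the question: " -\'" is parsed as the range space..apostrophe.
RANGE = re.compile(r"[a-zA-Z -\']{0,25}")


def is_valid_last_name(name):
    """Hypothetical helper: True if the whole name fits the corrected class."""
    return FIXED.fullmatch(name) is not None


print(is_valid_last_name("O'Neal"))       # True
print(is_valid_last_name("Smith-Jones"))  # True
print(is_valid_last_name("O_Neal"))       # False

# With the accidental range, the dash is rejected and '#' slips through.
print(RANGE.fullmatch("Smith-Jones") is not None)  # False
print(RANGE.fullmatch("Smith#Jones") is not None)  # True
```

Note that {0,25} also accepts the empty string, so a required field would need {1,25} or a separate check.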

RegExp extraction

Here's the input string:
loadMedia('mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml', '/videos/video-splash-image.gif)
With this RegExp: \'.+.xml\'
... we get this:
'mediacontainer1', 'http://www.something.com/videos/JohnsAwesomeVideo.flv', 'http://www.something.com/videos/JohnsAwesomeCaption.xml'
... but I want to extract only this:
http://www.something.com/videos/JohnsAwesomeCaption.xml
Any suggestions? I'm sure this problem has been asked before, but it's difficult to search for. I'll be happy to Accept a solution.
Thanks!
If you want to get everything within quotes that starts with http:
(?<=')http:[^']+(?=')
If you only want those ending with .xml
(?<=')http:[^']+\.xml(?=')
It doesn't select the quotation marks (as you asked)
It's fast!
Fair warning: it only works if the regex engine you're using can handle lookbehind
Knowing the language would be helpful. Basically, you are having a problem because the + quantifier is greedy, meaning it will match the largest part of the string that it can. You need to use a non-greedy quantifier, which will match as little as possible.
We need to know the language you're using to know what the syntax for the non-greedy quantifier should be.
Here is a perl recipe. Just as a sidenote, instead of .+, you probably want to match [^.]+.xml.
\'.+?.xml\'
should work if your language supports perl-like regexes.
This should work (tested in JavaScript, but I'm pretty sure it would work in most flavors):
'[^']+?\.xml'
It applies these rules:
starts with '
is followed by anything but '
ends in .xml'
you can demo it at http://RegExr.com?2tp6q
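It may help to compare the patterns side by side. The sketch below uses Python (an assumption, since the question does not name a language) and the input string from the question. One thing worth checking for yourself: a lazy quantifier changes where a match ends, not where it starts, so on this input the lazy pattern still begins at the first quote. Excluding quotes with [^'] is what narrows the match.

```python
import re

# Input string from the question, split across lines for readability.
call = (
    "loadMedia('mediacontainer1', "
    "'http://www.something.com/videos/JohnsAwesomeVideo.flv', "
    "'http://www.something.com/videos/JohnsAwesomeCaption.xml', "
    "'/videos/video-splash-image.gif)"
)

greedy = re.search(r"'.+.xml'", call).group()        # pattern from the question
lazy = re.search(r"'.+?.xml'", call).group()         # non-greedy variant
negated = re.search(r"'[^']+?\.xml'", call).group()  # cannot cross a quote

print(greedy == lazy)                        # True
print(lazy.startswith("'mediacontainer1'"))  # True
print(negated)  # 'http://www.something.com/videos/JohnsAwesomeCaption.xml'
```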
In .NET this regex works for me:
\'[\w:/.]+\.xml\'
breaking it down:
a ' character
followed by a word character or ':' or '/' or '.' any number of times (which matches the url bit)
followed by '.xml' (which differentiates the sought string from the other urls which it will match without this)
followed by another ' character
I tested it here
Edit
I missed that you don't want the quotes in the result. In that case, as has been pointed out, you need lookbehind and lookahead so that the quotes are part of the search but not part of the result. Again in .NET:
(?<=')[\w:/.]+\.xml(?=')
but I think the best solution is a combination of those offered already:
(?<=')[^']+\.xml(?=')
which seems the simplest to read, at least to me.
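For reference, a short Python sketch of that last pattern (Python is my assumption; the answer above was tested in .NET). The second pattern, with a capturing group, is not from the answers; it is an alternative I'm adding for engines that lack lookbehind.

```python
import re

# Input string from the question, split across lines for readability.
call = (
    "loadMedia('mediacontainer1', "
    "'http://www.something.com/videos/JohnsAwesomeVideo.flv', "
    "'http://www.something.com/videos/JohnsAwesomeCaption.xml', "
    "'/videos/video-splash-image.gif)"
)

# Lookbehind/lookahead: the quotes are required but not part of the match.
XML_URL = re.compile(r"(?<=')[^']+\.xml(?=')")

# Alternative (my addition): match the quotes, capture only what is between them.
XML_URL_GROUP = re.compile(r"'([^']+\.xml)'")

print(XML_URL.search(call).group())
# http://www.something.com/videos/JohnsAwesomeCaption.xml
print(XML_URL_GROUP.search(call).group(1))
# http://www.something.com/videos/JohnsAwesomeCaption.xml
```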

How to make a regular expression looking for a list of extensions separated by a space

I want to be able to take a string of text from the user that should be formatted like this:
.ext1 .ext2 .ext3 ...
Basically, I am looking for a dot, a string of alphanumeric characters of any length, a space, and rinse and repeat. I am a little confused about how to say "I need a period, a string of characters and a space". Also, the last extension could be followed by nothing, a space, or a series of spaces. And I guess the extensions in between could be separated by any number of spaces?
EDIT: I made it clearer what I was looking for.
Thanks!
Try this:
^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$
In a Java string literal you need to escape the backslashes:
"^(?:\\.[A-Za-z0-9]+ +)*\\.[A-Za-z0-9]+ *$"
(\.\w+)\s* Match this and get your results.
^((\.\w+)\s*)*$ Check this and if it's true, your String is exactly what you want.
For the last pattern thing, you can't (AFAIK) do both getting all extensions (separated) and checking that the last is followed by other things. Either you check your string, or you extract the extensions from it.
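The check-then-extract idea could look roughly like this in Python (my choice of language; the thread itself mentions Java). The validation pattern is the one from the first answer, the extraction pattern is a stricter variant of \.\w+ that excludes underscores, and the function name is made up.

```python
import re

# Whole-string check: extensions separated by spaces, optional trailing spaces.
VALID = re.compile(r"^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$")

# Extraction of the individual extensions once the string is known to be valid.
EXTENSION = re.compile(r"\.[A-Za-z0-9]+")


def parse_extensions(text):
    """Hypothetical helper: list of extensions, or None if the format is wrong."""
    if VALID.match(text) is None:
        return None
    return EXTENSION.findall(text)


print(parse_extensions(".ext1 .ext2  .ext3  "))  # ['.ext1', '.ext2', '.ext3']
print(parse_extensions("ext1 .ext2"))            # None
print(parse_extensions(".ext1.ext2"))            # None
```

Doing it in two steps keeps each pattern simple: one answers "is the input well-formed?", the other pulls the pieces out.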
I'd start with something like: ^\.[a-z0-9]+([\t\n\v ]+\.[a-z0-9]+)*$ (the dots are escaped so they match a literal period, not any character).