Regular expression to support strings containing special characters - regex

I have written a regular expression to accept strings like
Name
Name Surname
and not to accept a sequence of spaces, and empty input.
Here is my regular expression:
regexp = "[A-Za-z0-9][A-Za-z0-9\s]*[A-Za-z0-9]|[A-Za-z0-9]"
How to update this, to create a regular expression accepting strings like:
Name somestring. (for example "Name Jr.")
and Strings like this:
aaa-bbb (for example "Katty-Perry")
and Strings like this:
aaa'bbb (for example "Drai'Lyn")
and strings like
aaa bbb
and strings like
aaa
at the same time.

I would try:
[a-z0-9]+([\s'-][a-z0-9]+)*\.?
See demo
Note that x+ means "one or more x", which you wrote as xx*.
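As a minimal sketch, the suggested pattern can be tried in Python against the sample names from the question. The trailing dot is escaped here so it only matches a literal period, and IGNORECASE is an assumption so that [a-z] also covers capitals; is_valid_name is a hypothetical helper name.

```python
import re

# Suggested pattern, with the final dot escaped and case ignored.
NAME_RE = re.compile(r"[a-z0-9]+(?:[\s'-][a-z0-9]+)*\.?", re.IGNORECASE)


def is_valid_name(text: str) -> bool:
    """Return True when the whole string matches the name pattern."""
    return NAME_RE.fullmatch(text) is not None


samples = ["Name", "Name Surname", "Name Jr.", "Katty-Perry", "Drai'Lyn", "", "   "]
for sample in samples:
    print(repr(sample), is_valid_name(sample))
```

fullmatch is used so that the empty string and a run of spaces are rejected without adding ^ and $ anchors.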

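This should help:
[\w'.-]*
\w matches [A-Za-z0-9_], and * matches zero or more characters.
The class also includes characters such as ' and . and -; keep the hyphen last so it is not read as a range.
Add further characters to the class to match more.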

I would suggest this regex:
[\w .'-]*
I think the best way is to specify by yourself the specific characters you want to accept (. or ' or ...)

Try this
\b(?:[A-Za-z0-9]+(?:[-' _][a-zA-Z0-9]+)?\.?)\b

I have resolved this issue.
The answer that worked for me was:
[A-Za-z0-9][A-Za-z0-9\s-\']*[A-Za-z0-9.]|[A-Za-z0-9]

Related

Regex: how to match all character classes and not just one or more [duplicate]

Obviously, you can use the | (pipe?) to represent OR, but is there a way to represent AND as well?
Specifically, I'd like to match paragraphs of text that contain ALL of a certain set of phrases, but in no particular order.
Use a non-consuming regular expression.
The typical (i.e. Perl/Java) notation is:
(?=expr)
This means "match expr but after that continue matching at the original match-point."
You can do as many of these as you want, and this will be an "and." Example:
(?=match this expression)(?=match this too)(?=oh, and this)
You can even add capture groups inside the non-consuming expressions if you need to save some of the data therein.
You need to use lookahead as some of the other responders have said, but the lookahead has to account for other characters between its target word and the current match position. For example:
(?=.*word1)(?=.*word2)(?=.*word3)
The .* in the first lookahead lets it match however many characters it needs to before it gets to "word1". Then the match position is reset and the second lookahead seeks out "word2". Reset again, and the final part matches "word3"; since it's the last word you're checking for, it isn't necessary that it be in a lookahead, but it doesn't hurt.
In order to match a whole paragraph, you need to anchor the regex at both ends and add a final .* to consume the remaining characters. Using Perl-style notation, that would be:
/^(?=.*word1)(?=.*word2)(?=.*word3).*$/m
The 'm' modifier is for multiline mode; it lets the ^ and $ match at paragraph boundaries ("line boundaries" in regex-speak). It's essential in this case that you not use the 's' modifier, which lets the dot metacharacter match newlines as well as all other characters.
Finally, you want to make sure you're matching whole words and not just fragments of longer words, so you need to add word boundaries:
/^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$/m
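The same construction can be sketched in Python, building one lookahead per word. The function name contains_all_words is hypothetical, and re.MULTILINE plays the role of the 'm' modifier.

```python
import re


def contains_all_words(text: str, words: list[str]) -> bool:
    """Return True if some line of text contains every word, in any order."""
    # One lookahead per word; each starts over from the beginning of the line.
    lookaheads = "".join(rf"(?=.*\b{re.escape(word)}\b)" for word in words)
    pattern = re.compile(rf"^{lookaheads}.*$", re.MULTILINE)
    return pattern.search(text) is not None


print(contains_all_words("the quick brown fox", ["fox", "quick"]))  # True
print(contains_all_words("the quick brown fox", ["fox", "dog"]))    # False
print(contains_all_words("quick foxes", ["fox", "quick"]))          # False, \b rejects "foxes"
```

Because the dot does not match newlines here, words spread over different lines do not count as a match, which mirrors the advice about not using the 's' modifier.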
Look at this example:
We have 2 regexps A and B and we want to match both of them, so in pseudo-code it looks like this:
pattern = "/A AND B/"
It can be written without using the AND operator like this:
pattern = "/NOT (NOT A OR NOT B)/"
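In PCRE, NOT has to be written as a negative lookahead, because ^ outside a character class is an anchor rather than a negation:
"/^(?!(?!.*A)|(?!.*B))/"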
regexp_match(pattern,data)
The AND operator is implicit in the RegExp syntax.
The OR operator has instead to be specified with a pipe.
The following RegExp:
var re = /ab/;
means the letter a AND the letter b.
It also works with groups:
var re = /(co)(de)/;
it means the group co AND the group de.
Replacing the (implicit) AND with an OR would require the following lines:
var re = /a|b/;
var re = /(co)|(de)/;
You can do that with a regular expression, but you'll probably want to do something else. For example, use several regexps and combine them in an if clause.
You can enumerate all possible permutations with a standard regexp, like this (matches a, b and c in any order):
(abc)|(bca)|(acb)|(bac)|(cab)|(cba)
However, this makes a very long and probably inefficient regexp if you have more than a couple of terms.
If you are using some extended regexp version, like Perl's or Java's, they have better ways to do this. Other answers have suggested using positive lookahead operation.
Is it not possible in your case to do the AND on several matching results? in pseudocode
regexp_match(pattern1, data) && regexp_match(pattern2, data) && ...
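In Python, that pseudocode could look roughly like the sketch below; matches_all is a hypothetical helper, and re.search stands in for regexp_match.

```python
import re


def matches_all(patterns, data):
    """Return True only if every pattern is found somewhere in data."""
    return all(re.search(pattern, data) for pattern in patterns)


print(matches_all([r"\d+", r"[a-z]+"], "abc 123"))  # True
print(matches_all([r"\d+", r"[a-z]+"], "abc"))      # False
```

Each pattern is checked independently, so the order in which the matches appear in the data does not matter.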
Why not use awk?
With awk, combining regexes with AND and OR is simple:
awk '/WORD1/ && /WORD2/ && /WORD3/' myfile
The order is always implied in the structure of the regular expression. To accomplish what you want, you'll have to match the input string multiple times against different expressions.
What you want to do is not possible with a single regexp.
If you use Perl regular expressions, you can use positive lookahead:
For example
(?=[1-9][0-9]{2})[0-9]*[05]\b
would match numbers that are 100 or greater (at least three digits, no leading zero) and divisible by 5.
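A quick way to check this pattern is to run it over a few isolated numbers in Python. This is only a sketch: fullmatch is used on purpose so that each candidate is tested as a whole string, and is_big_multiple_of_five is a hypothetical name.

```python
import re

# Lookahead: three digits with no leading zero. Main part: ends in 0 or 5.
PATTERN = re.compile(r"(?=[1-9][0-9]{2})[0-9]*[05]\b")


def is_big_multiple_of_five(number: str) -> bool:
    """Return True if the whole string passes both conditions."""
    return PATTERN.fullmatch(number) is not None


for candidate in ["100", "105", "1000", "95", "1234", "099"]:
    print(candidate, is_big_multiple_of_five(candidate))
```

Note that 100 itself passes, since the lookahead only requires three digits starting with a non-zero digit.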
In addition to the accepted answer, I will provide some practical examples that may make things clearer. For example, let's say we have these three lines of text:
[12/Oct/2015:00:37:29 +0200] // only this + will get selected
[12/Oct/2015:00:37:x9 +0200]
[12/Oct/2015:00:37:29 +020x]
What we want to do here is to select the + sign, but only if it comes after two digits and a space, and before four digits. Those are the only constraints. We would use this regular expression to achieve it:
'~(?<=\d{2} )\+(?=\d{4})~g'
Note if you separate the expression it will give you different results.
Or perhaps you want to select some text between tags... but not the tags! Then you could use:
'~(?<=<p>).*?(?=<\/p>)~g'
for this text:
<p>Hello !</p> <p>I wont select tags! Only text with in</p>
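Both lookaround examples can be tried in Python as well. This is a sketch that drops the ~...~g delimiters, since Python takes the bare pattern and findall already returns every match; the variable names are illustrative.

```python
import re

log_lines = [
    "[12/Oct/2015:00:37:29 +0200]",
    "[12/Oct/2015:00:37:x9 +0200]",
    "[12/Oct/2015:00:37:29 +020x]",
]

# "+" preceded by two digits and a space, and followed by four digits.
plus_re = re.compile(r"(?<=\d{2} )\+(?=\d{4})")
matching = [line for line in log_lines if plus_re.search(line)]
print(matching)  # only the first line

# Text between <p> and </p>, without the tags themselves.
html = "<p>Hello !</p> <p>I wont select tags! Only text with in</p>"
inner = re.findall(r"(?<=<p>).*?(?=</p>)", html)
print(inner)
```

The lookbehind and lookahead only assert what surrounds the match, so the tags and the surrounding digits are never part of the returned text.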
You could pipe your output to another regex. Using grep, you could do this:
grep A | grep B
((yes).*(no))|((no).*(yes))
This will match a sentence containing both yes and no, regardless of the order in which they appear:
Do i like cookies? **Yes**, i do. But milk - **no**, definitely no.
**No**, you may not have my phone. **Yes**, you may go f yourself.
Both will match when case is ignored.
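A small Python sketch of the same alternation, with the capture groups dropped and re.IGNORECASE added as an assumption, since the pattern itself is lowercase:

```python
import re

# Either "yes ... no" or "no ... yes" somewhere in the text.
both_re = re.compile(r"yes.*no|no.*yes", re.IGNORECASE)

sentences = [
    "Do i like cookies? Yes, i do. But milk - no, definitely no.",
    "Yes, sure.",
]
results = [both_re.search(sentence) is not None for sentence in sentences]
print(results)  # [True, False]
```

This matches substrings, so a word such as "know" would count as containing "no"; adding \b around each word avoids that.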
Use AND outside the regular expression. In PHP, the lookahead operator did not seem to work for me, so instead I used this:
if( preg_match("/^.{3,}$/",$pass1) && !preg_match("/\s{1}/",$pass1))
return true;
else
return false;
The above regex will match if the password length is 3 characters or more and there are no spaces in the password.
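For comparison, here is a sketch of the same two-step check in Python, next to a single lookahead-based pattern that should accept the same inputs. Both function names are hypothetical.

```python
import re


def is_valid_two_checks(password: str) -> bool:
    """AND performed outside the regex: two separate tests."""
    long_enough = re.fullmatch(r".{3,}", password) is not None
    no_whitespace = re.search(r"\s", password) is None
    return long_enough and no_whitespace


def is_valid_lookahead(password: str) -> bool:
    """AND performed inside the regex with one positive and one negative lookahead."""
    return re.fullmatch(r"(?=.{3,})(?!.*\s).*", password) is not None


for candidate in ["abc", "ab", "a b c", "ab\tc", "secret123"]:
    print(repr(candidate), is_valid_two_checks(candidate), is_valid_lookahead(candidate))
```

The two-step version is usually easier to read and to extend with further rules.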
Here is a possible "form" for "and" operator:
Take the following regex for an example:
If we want to match words without the "e" character, we could do this:
/\b[^\We]+\b/g
\W means NOT a "word" character.
^\W means a "word" character.
[^\We] means a "word" character, but not an "e".
see it in action: word without e
"and" Operator for Regular Expressions
I think this pattern can be used as an "and" operator for regular expressions.
In general, if:
A = not a
B = not b
then:
[^AB] = not(A or B)
= not(A) and not(B)
= a and b
Difference Set
So, if we want to implement the concept of difference set in regular expressions, we could do this:
a - b = a and not(b)
= a and B
= [^Ab]
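The "word character but not e" class can be tried in Python like this; the sample sentence is made up for illustration.

```python
import re

sentence = "the quick brown fox jumps over a lazy dog"

# [^\We] = not (non-word or "e") = a word character that is not "e".
words_without_e = re.findall(r"\b[^\We]+\b", sentence)
print(words_without_e)
# ['quick', 'brown', 'fox', 'jumps', 'a', 'lazy', 'dog']
```

Only lowercase e is excluded here; a pattern like [^\WeE] would be needed to exclude the capital as well.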

How to remove dollar format with regex

I am trying to remove the dollar format from the string '$1,109,889.23'. I tried using a regular expression with:
"[^\\d]"
but then I get the commas.
Any help? Thanks in advance.
You don't need a regex for this.
Just use lsParseCurrency:
numericValue = lsParseCurrency('$1,109,889.23');
writeOutput(numericValue);
Example on trycf.com yields:
1109889.23
As per Leigh's comment below: if your locale doesn't use , and . as the thousands and decimal separators (respectively), make sure to specify the locale too, e.g.:
numericValue = lsParseCurrency('$1,109,889.23', 'en_us');
How about just doing a search and replace for , and $?
But if you're going to use a regex:
[^\d.]+
I am using ColdFusion. The [^\d.] works great as eldarerathis mentioned above.
<cfset amt = '$1,109,889.23'>
<cfset newAmt = ReReplace(amt, "[^\d.]", "","ALL") >
<cfoutput>#newAmt#</cfoutput>
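The same replacement can be sketched in Python. Converting the result to Decimal is an addition for illustration, it assumes , is the thousands separator and . the decimal separator, and strip_currency is a hypothetical name.

```python
import re
from decimal import Decimal


def strip_currency(amount: str) -> Decimal:
    """Drop everything except digits and the decimal point, then parse."""
    digits = re.sub(r"[^\d.]", "", amount)
    return Decimal(digits)


print(strip_currency("$1,109,889.23"))  # 1109889.23
```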
Since $ is a metacharacter in regex (it means "end of string" or "end of line", depending on the current settings), it needs to be escaped to \$. But why use a regex at all if it's just one fixed character?
[\d,.]+ would give you the number part. Here is your example on Rubular.
I'm not sure I understand exactly what you want, but it seems that in your example you would want to end up with "1,109,889.23". If this is the case, why don't you simply use [\d,.\x20]+ (note the period is not escaped, as it is in a character set, and note the \x20 to select spaces if they use that instead of commas) to select just the number? If you want to select everything that is NOT part of the number, as your example indicates, then you would just search for [^\d,.\x20]. This will work for any currency format, not just ones that use the dollar sign. It will also allow multiple types of punctuation, such as allowing spaces instead of commas to separate groups of digits. However, I agree with Tim Pietzcker that a regular expression might not be the right tool for the job.
In Java:
String newString = oldString.replace("$", ""); // replace() takes a literal; with replaceAll() the $ would have to be escaped as "\\$"
Would this RegEx work for you?
^\$

Please help about regular expression

What's wrong with this expression?
^[a-zA-Z]+(([\''\-][a-zA-Z])?[a-zA-Z]*)*$
I want to allow alpha characters with space,-, and ' characters
for example O'neal;Jackson-Peter, Mary Jane
The following is all you need:
^[a-zA-Z' -]+$
The important thing is that the "-" is the last character in the group, otherwise it'd be interpreted as a range (unless you escaped it with "\")
How you actually input that expression as a string in your target language is different depending on the language. For C#, I usually use "@" (verbatim) strings, like so:
var regex = new Regex(@"^[a-zA-Z' -]+$");
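In Python, a raw string plays a similar role to the C# verbatim string. The sketch below tries the character class on the names from the question; is_name is a hypothetical helper.

```python
import re

# Letters, apostrophe, space and hyphen; the hyphen is last so it stays literal.
NAME_CHARS_RE = re.compile(r"[a-zA-Z' -]+")


def is_name(text: str) -> bool:
    """Return True if the whole string consists only of the allowed characters."""
    return NAME_CHARS_RE.fullmatch(text) is not None


for sample in ["O'neal", "Jackson-Peter", "Mary Jane", "R2D2", ""]:
    print(repr(sample), is_name(sample))
```

The class only restricts which characters may appear, so a string such as "---" also passes; requiring a letter at both ends needs a stricter pattern.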
This will match any string made up of at least one character, which can be alpha characters, hyphen or the single quote mark:
^[a-zA-Z-\']+$
This will also include empty strings:
^[a-zA-Z-\']*$
If it needs to begin and end with alpha characters (as names do):
^[a-zA-Z][a-zA-Z-\']*[a-zA-Z]$
Something like this?
^[a-zA-Z '\-,]*$

Regular Expression to find sequences of lowercase letters joined with underscore

I can't seem to make my regular expression work.
I'd like to have some alpha text, no numbers, an underscore and then some more alpha text.
for example: blah_blah
I have a non-working example here
^[a-z][_][a-z]$
Thanks in advance people.
EDIT: I apologize, I'd like to enforce the use of all lower case.
^[a-z]+_[a-z]+$
Try this:
[A-Za-z]+_[A-Za-z]+
Lowercase :
[a-z]+_[a-z]+
You just need:
[a-z]+_[a-z]+
or if it needs to be an entire line:
^[a-z]+_[a-z]+$
Try:
^[a-z]+_[a-z]+$
Depending on which flavor of regex you're using, there are different possibilities:
^[A-Za-z]+_[A-Za-z]+$
^\a+_\a+$
^[[:alpha:]]+_[[:alpha:]]+$
The first form is the most widely supported; \a is Vim's shorthand and [[:alpha:]] is the POSIX character class.
Your example suggests you're looking for things exactly like "blah_foo" and don't want to extract it from strings like "Hey blah_foo you". If this is not the case, you should drop the "^" (match the beginning of the string) and "$" (match the end of the string)
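A short Python sketch of the difference between matching the whole string and extracting from a longer one, and of why the pattern from the question fails. The variable names are illustrative.

```python
import re

snake_re = re.compile(r"[a-z]+_[a-z]+")
original_re = re.compile(r"^[a-z][_][a-z]$")  # pattern from the question

# Whole-string match, equivalent to anchoring with ^ and $.
print(snake_re.fullmatch("blah_blah") is not None)  # True
print(snake_re.fullmatch("Blah_blah") is not None)  # False, uppercase B

# Unanchored search extracts the token from a longer string.
print(snake_re.search("Hey blah_foo you").group())  # blah_foo

# The original allows exactly one letter on each side of the underscore.
print(original_re.search("blah_blah") is not None)  # False
print(original_re.search("a_b") is not None)        # True
```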

RegEx for String.Format

Hiho everyone! :)
I have an application, in which the user can insert a string into a textbox, which will be used for a String.Format output later. So the user's input must have a certain format:
I would like to replace exactly one placeholder, so the string should be of a form like this: "Text{0}Text". So it has to contain at least one '{0}', but no other statement between curly braces, for example no {1}.
For the text before and after the '{0}', I would allow any characters.
So I think, I have to respect the following restrictions: { must be written as {{, } must be written as }}, " must be written as \" and \ must be written as \.
Can somebody tell me, how I can write such a RegEx? In particular, can I do something like 'any character WITHOUT' to exclude the four characters ( {, }, " and \ ) above instead of listing every allowed character?
Many thanks!!
Nikki:)
I hate to be the guy who doesn't answer the question, but really it's poor usability to ask your user to format input to work with String.Format. Provide them with two input requests, so they enter the part before the {0} and the part after the {0}. Then just concatenate the strings instead of using String.Format; using String.Format on user-supplied text is just a bad idea.
[^(){}\r\n]+\{0}[^(){}\r\n]+
will match any text except (, ), {, } and linebreaks, then match {0}, then the same as before. There needs to be at least one character before and after the {0}; if you don't want that, replace + with *.
You might also want to anchor the regex to beginning and end of your input string:
^[^(){}\r\n]+\{0}[^(){}\r\n]+$
(Similar to Tim's answer)
Something like:
^[^{}()]*(\{0})[^{}()]*$
Tested at http://www.regular-expressions.info/javascriptexample.html
It sounds like you're looking for the [^CHARS_GO_HERE] construct. The exact regex you'd need depends on your regex engine, but it would resemble [^({})].
Check out the "Negated Character Classes" section of the Character Class page at Regular-Expressions.info.
I think your question can be answered by the regexp:
^((\{\{|\}\}|\\"|\\\\|[^\{\}\"\\])*(\{0\}))+(\{\{|\}\}|\\"|\\\\|[^\{\}\"\\])*$
Explanation:
The expression is built up as follows:
^(allowed chars {0})+(allowed chars)*$
one or more sequences of allowed chars followed by a {0} with optional allowed chars at the end.
allowed chars is built from the 4 escape sequences you mentioned (I assumed the \ escape is \\ instead of \.) plus any character that is not one of the escaped characters:
(\{\{|\}\}|\\"|\\\\|[^\{\}\"\\])
combined they make up the regexp I started with.
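To try the structure described above, here is a sketch in Python that assembles the pattern from the "allowed chars" piece. The names ALLOWED, FORMAT_RE and is_valid_format are hypothetical, and non-capturing groups are used for brevity.

```python
import re

# One allowed unit: {{, }}, \", \\, or any character other than { } " \
ALLOWED = r'(?:\{\{|\}\}|\\"|\\\\|[^{}"\\])'

# (allowed chars {0})+ (allowed chars)*
FORMAT_RE = re.compile("(?:" + ALLOWED + r"*\{0\})+" + ALLOWED + "*")


def is_valid_format(text: str) -> bool:
    """Return True if text contains {0} at least once and nothing else unescaped."""
    return FORMAT_RE.fullmatch(text) is not None


for sample in ["Text{0}Text", "Hello {{name}} {0}", "Text{1}Text", "no placeholder", "{{0}}"]:
    print(repr(sample), is_valid_format(sample))
```

As in the answer, one or more {0} placeholders are accepted; replacing the + after the first group with nothing would restrict it to exactly one.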