Choosing just the alphanumeric words with regex - regex

I'm trying to find a regular expression that picks out just the alphanumeric words in a string, i.e. the words that combine letters and digits. If a word is all digits or all letters, I need to discard it.

Try this regular expression:
\b([a-z]+[0-9]+[a-z0-9]*|[0-9]+[a-z]+[a-z0-9]*)\b
Or more compact:
\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b
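This matches every word (note the word boundaries \b) that starts either with one or more letters followed by one or more digits, or with one or more digits followed by one or more letters, and then continues with zero or more further letters or digits. So the condition of at least one letter and at least one digit is always fulfilled. Since the classes only list a-z, use the case-insensitive flag if uppercase letters should count too.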
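To see the compact pattern in action, here is a minimal Python sketch. The function name mixed_words and the sample string are made up for this illustration, and the capturing group is swapped for a non-capturing one so that re.findall returns whole words rather than the group contents.

```python
import re

# Compact pattern from above; (?:...) so findall returns the whole match.
MIXED = re.compile(r"\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b", re.IGNORECASE)

def mixed_words(text):
    """Return the words that contain at least one letter and one digit."""
    return MIXED.findall(text)

print(mixed_words("foo bar F0O 8ar 123 a1b2"))  # ['F0O', '8ar', 'a1b2']
```

Pure-letter words (foo, bar) and pure-digit words (123) are skipped because neither alternative can get started on them.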

With lookaheads:
'/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i'
A quick test that also shows example usage:
$str = 'foo bar F0O 8ar';
$arr = array();
preg_match_all('/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i', $str, $arr);
print_r($arr);
Output:
F0O
8ar

This will return all individual alphanumeric words, which you can loop through. I don't think regex can do the whole job by itself.
\b[a-z0-9]+\b
Make sure you mark that as case-insensitive.
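A rough Python sketch of that two-step idea (match every letter/digit run, then filter in a loop) might look like this. The function name alnum_mixed is hypothetical, and the filter condition is my reading of "discard pure numbers and pure letters".

```python
import re

def alnum_mixed(text):
    """Grab every run of letters/digits, then keep only the mixed ones."""
    words = re.findall(r"\b[a-z0-9]+\b", text, flags=re.IGNORECASE)
    return [
        w for w in words
        # keep a word only if it has at least one letter AND one digit
        if re.search(r"[a-z]", w, re.IGNORECASE) and re.search(r"[0-9]", w)
    ]

print(alnum_mixed("foo bar F0O 8ar 42"))  # ['F0O', '8ar']
```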

\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)[[:alnum:]]*\b

'\b([a-zA-Z]+[0-9]+|[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+[a-zA-Z]*)\b'

Related

Regex for this particular pattern

I have three different things
xxx
xxx>xxx
xxx>xxx>xxx
Where xxx can be any combination of letters and numbers.
I need a regex that can match the first two but NOT the third.
To match ASCII letters and digits try the following:
^[a-zA-Z0-9]{3}(>[a-zA-Z0-9]{3})?$
If letters and digits outside of the ASCII character set are required then the following should suffice:
^[^\W_]{3}(>[^\W_]{3})?$
^\w+(?:>\w+)?$
matches an entire string.
\w+(?:>\w+)?\b(?!>)
matches strings like this in a larger substring.
If you want to exclude the underscore from matching, you can use [\p{L}\p{N}] instead (if your regex engine knows Unicode), or [^\W_] if it doesn't, as a substitute for \w.
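For what it's worth, here is a small Python sketch of the whole-string check using the [^\W_] class. The function name is invented for the example, and re.fullmatch is used in place of explicit ^ and $ anchors.

```python
import re

# [^\W_] means "a word character that is not an underscore".
ONE_OR_TWO = re.compile(r"[^\W_]+(?:>[^\W_]+)?")

def is_one_or_two_parts(s):
    """True for 'xxx' and 'xxx>xxx', False for three or more parts."""
    return ONE_OR_TWO.fullmatch(s) is not None

for sample in ("xxx", "abc>123", "a>b>c", "a_b"):
    print(sample, is_one_or_two_parts(sample))
```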

Looking for Regex

I want to validate Winforms text box with regex.
An example input string:
ZX1 OR N?V OR 2L? OR ?55 (any sequence of three symbols length strings with OR between them)
What is the regex that you would advise?
UPDATE:
I'm trying this one, but it seems it is not 100% correct:
string text = "ZX1 OR N?V OR 2L? OR ?55";
Regex r = new Regex("([0-9A-Z?]{3} OR )*[0-9A-Z?]{3}");
"^\\s*\\S{3}(?:\\s+OR\\s+\\S{3})*\\s*$"
should work in a variety of languages.
\\S
matches any non-space character, and
\\s
matches any space character, so the regex above matches any number of triplets of non-space characters separated by the string "OR" surrounded by space characters.
The ^ and $ serve to ensure that it matches the whole string so you can take those out if you want to find this pattern inside a larger string.
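Here is the same whole-string pattern tried out in Python, as a quick sketch. The function name is_valid_query is made up, and the backslashes are single because a raw string is used.

```python
import re

# Triplets of non-space characters separated by "OR" with whitespace around it.
TRIPLETS = re.compile(r"^\s*\S{3}(?:\s+OR\s+\S{3})*\s*$")

def is_valid_query(text):
    """True if text is one or more 3-character tokens joined by ' OR '."""
    return TRIPLETS.match(text) is not None

print(is_valid_query("ZX1 OR N?V OR 2L? OR ?55"))  # True
print(is_valid_query("ZX1 OR"))                    # False
```

Note that \S accepts any non-space character, so if only digits, capital letters and ? are allowed, a narrower class such as [0-9A-Z?] from the question would be the safer choice.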
What is the list of possible symbols you can have? Can you have at most one question mark?
This will match what you've given, but it will also match multiple question marks.
([A-Z?]{3} OR )*[A-Z?]{3}
try...
(([\w\S]{3}\s+)or\s+)+[\w\S]{3}

Regex: Check if string contains at least one digit

I have got a text string like this:
test1test
I want to check if it contains at least one digit using a regex.
What would this regex look like?
I'm surprised nobody has mentioned the simplest version:
\d
This will match any digit. If your regular expression engine is Unicode-aware, this means it will match anything that's defined as a digit in any language, not just the Arabic numerals 0-9.
There's no need to put it in [square brackets] to define it as a character class, as one of the other answers did; \d works fine by itself.
Since it's not anchored with ^ or $, it can match anywhere in the string, so if the string contains at least one digit, this will match.
And there's no need for the added complexity of +, since the goal is just to determine whether there's at least one digit. If there's at least one digit, this will match; and it will do so with a minimum of overhead.
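To illustrate the Unicode point, a short Python sketch (function names are invented here). In Python 3, \d on a str pattern matches any Unicode decimal digit, while [0-9] sticks to ASCII.

```python
import re

def has_digit(s):
    """Unanchored search: true if any Unicode decimal digit occurs in s."""
    return re.search(r"\d", s) is not None

def has_ascii_digit(s):
    """Same check, restricted to the ASCII digits 0-9."""
    return re.search(r"[0-9]", s) is not None

print(has_digit("test1test"))     # True
print(has_digit("test"))          # False
print(has_digit("\u0663"))        # True  (ARABIC-INDIC DIGIT THREE)
print(has_ascii_digit("\u0663"))  # False
```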
The regular expression you are looking for is simply this:
[0-9]
You do not mention what language you are using. If your regular expression evaluator forces REs to be anchored, you need this:
.*[0-9].*
Some RE engines (modern ones!) also allow you to write the first as \d (mnemonically: digit) and the second would then become .*\d.*.
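Python's re.fullmatch behaves like such an anchored evaluator (Java's String.matches does too), so it makes a handy way to see why the .* padding is needed. A minimal sketch, with a made-up function name:

```python
import re

def contains_digit_anchored(s):
    """Whole-string match, so the digit must be padded with .* on both sides."""
    # Note: . does not match a newline unless re.DOTALL is passed.
    return re.fullmatch(r".*[0-9].*", s) is not None

print(contains_digit_anchored("test1test"))             # True
print(re.fullmatch(r"[0-9]", "test1test") is not None)  # False: bare class fails
```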
In Java:
public boolean containsNumber(String string)
{
    return string.matches(".*\\d+.*");
}
you could use look-ahead assertion for this:
^(?=.*\d).+$
Another possible solution, in case you're looking for all the words in a given string that contain a number:
Pattern
\w*\d{1,}\w*
\w* - Matches 0 or more instances of [A-Za-z0-9_]
\d{1,} - Matches 1 or more instances of a number
\w* - Matches 0 or more instances of [A-Za-z0-9_]
The whole point in \w* is to allow not having a character in the beginning or at the end of a word. This enables capturing the first and last words in the text.
If the goal here is to only get the numbers without the words, you can omit both \w*.
Given the string
Hello world 1 4m very happy to be here, my name is W1lly W0nk4
Matches
1 4m W1lly W0nk4
Check this example in regexr - https://regexr.com/5ff7q
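The same example can be reproduced in Python, as a quick sketch (the function name is mine; \d+ is equivalent to \d{1,}):

```python
import re

def words_with_digit(text):
    """Return every run of word characters that contains at least one digit."""
    return re.findall(r"\w*\d+\w*", text)

text = "Hello world 1 4m very happy to be here, my name is W1lly W0nk4"
print(words_with_digit(text))  # ['1', '4m', 'W1lly', 'W0nk4']
```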
In SQL, for example with MySQL's REGEXP operator:
SELECT * FROM product WHERE name REGEXP '[0-9]'
This:
\d+
should work
Edit: no clue why I added the "+"; it works just as well without it.
\d
In perl:
if($testString =~ /\d/)
{
print "This string contains at least one digit"
}
where \d matches a digit.
If anybody lands here from Google, this is how to check whether a string contains at least one digit in C#:
With Regex
if (Regex.IsMatch(myString, @"\d"))
{
    // do something
}
Info: necessary to add using System.Text.RegularExpressions
With linq:
if (myString.Any(c => Char.IsDigit(c)))
{
// do something
}
Info: necessary to add using System.Linq

Check string for all lowercase letters in PowerShell

I want to be able to test if a PowerShell string is all lowercase letters.
I am not the world's best regex monkey, but I have been trying along these lines:
if ($mystring -match "[a-z]^[A-Z]") {
echo "its lower!"
}
But of course that doesn't work, and searching the Internet hasn't got me anywhere. Is there a way to do this (besides testing every character in a loop)?
PowerShell by default matches case-insensitively, so you need to use the -cmatch operator:
if ($mystring -cmatch "^[a-z]*$") { ... }
-cmatch is always case-sensitive, while -imatch is always case-insensitive.
Side note: Your regular expression was also a little weird. Basically you want the one I provided here which consists of
The anchor for the start of the string (^)
A character class of lower-case Latin letters ([a-z])
A quantifier (*), telling the engine to repeat the character class zero or more times, thereby matching as many characters as needed. You can use + instead to disallow an empty string.
The anchor for the end of the string ($). The two anchors make sure that the regular expression has to match every character in the string. If you'd just use [a-z]* then this would match any string that has a string of at least 0 lower-case letters somewhere in it. Which would be every string.
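The effect of the two anchors is easy to check outside PowerShell as well. Below is a small Python sketch (function names are hypothetical); Python's re is case-sensitive by default, so there is no counterpart to -cmatch here, and re.fullmatch plays the role of ^...$.

```python
import re

def all_lowercase(s):
    """Anchored: every character must be a lower-case Latin letter."""
    return re.fullmatch(r"[a-z]*", s) is not None

def unanchored(s):
    """Unanchored [a-z]*: succeeds on any string via an empty match."""
    return re.search(r"[a-z]*", s) is not None

print(all_lowercase("hello"))   # True
print(all_lowercase("Hello"))   # False
print(unanchored("HELLO 123"))  # True, which is why the anchors matter
```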
P.S.: Ahmad has a point, though, that if your string might consist of other things than letters too and you want to make sure that every letter in it is lower-case, instead of also requiring that the string consists solely of letters, then you have to invert the character class, sort of:
if ($mystring -cmatch "^[^A-Z]*$") { ... }
The ^ at the start of the character class inverts the class, matching every character not included. Thereby this regular expression would only fail if the string contains upper-case letters somewhere. The -cmatch operator is still needed.
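As a quick sanity check of the inverted class, here is the same idea sketched in Python (the function name is made up; re.fullmatch stands in for the ^ and $ anchors):

```python
import re

def no_uppercase(s):
    """True unless s contains an ASCII upper-case letter A-Z somewhere."""
    return re.fullmatch(r"[^A-Z]*", s) is not None

print(no_uppercase("hello world 123"))  # True
print(no_uppercase("hello World"))      # False
```

Keep in mind that [^A-Z] only excludes the ASCII capitals, so a string containing a non-ASCII capital such as É still passes this check.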
If your test is so simple, you can and probably should avoid the use of regular expressions:
$mystring -ceq $mystring.ToLower()
Try this pattern, which matches anything that is not an uppercase letter: "^[^A-Z]*$"
This would return false for any uppercase letters while allowing the string to contain other items as long as all letters are lowercase. For example, "hello world 123" would be valid.
If you strictly want letters without spaces, numbers etc., then Johannes's solution fits.

Regular expression to match phone number?

I want to match a phone number that can have letters and an optional hyphen:
This is valid: 333-WELL
This is also valid: 4URGENT
In other words, there can be at most one hyphen but if there is no hyphen, there can be at most seven 0-9 or A-Z characters.
I don't know how to do an "if statement" in a regex. Is that even possible?
I think this should do it:
/^[a-zA-Z0-9]{3}-?[a-zA-Z0-9]{4}$/
It matches 3 letters or numbers, followed by an optional hyphen, followed by 4 letters or numbers. This one works in Ruby. Depending on the regex engine you're using, you may need to alter it slightly.
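The pattern carries over to Python unchanged. A minimal sketch, with a made-up function name and a few test strings of my own:

```python
import re

PHONE = re.compile(r"^[a-zA-Z0-9]{3}-?[a-zA-Z0-9]{4}$")

def is_phone(s):
    """3 alphanumerics, an optional hyphen, then 4 alphanumerics."""
    return PHONE.fullmatch(s) is not None

for sample in ("333-WELL", "4URGENT", "33-3WELL", "333--WELL"):
    print(sample, is_phone(sample))
```

The first two samples are accepted; the last two are rejected because the hyphen is misplaced or doubled.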
You seek the alternation operator, indicated with pipe character: |
However, you may need either 7 alternatives (1 for each hyphen location + 1 for no hyphen), or you may require the hyphen between 3rd and 4th character and use 2 alternatives.
One use of alternation operator defines two alternatives, as in:
([0-9A-Za-z]{3}-[0-9A-Za-z]{4}|[0-9A-Za-z]{7})
Not sure if this counts, but I'd break it into two regexes:
#!/usr/bin/perl
use strict;
use warnings;
my $text = '333-URGE';
print "Format OK\n" if $text =~ m/^[\dA-Z]{1,6}-?[\dA-Z]{1,6}$/;
print "Length OK\n" if $text =~ m/^(?:[\dA-Z]{7}|[\dA-Z-]{8})$/;
This should avoid accepting multiple dashes, dashes in the wrong place, etc...
Supposing that you want to allow the hyphen to be anywhere, lookarounds will be of use to you. Something like this:
^([A-Z0-9]{7}|(?=^[^-]+-[^-]+$)[A-Z0-9-]{8})$
There are two main parts to this pattern: [A-Z0-9]{7} to match a hyphen-free string and (?=^[^-]+-[^-]+$)[A-Z0-9-]{8} to match a hyphenated string.
The (?=^[^-]+-[^-]+$) will match for any string with a SINGLE hyphen in it (and the hyphen isn't the first or last character), then the [A-Z0-9-]{8} part will count the characters and make sure they are all valid.
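Here is a short Python sketch that exercises both branches of that pattern. The function name and the sample inputs are mine, chosen to hit the hyphen-free case, a hyphen in various positions, and a double hyphen.

```python
import re

PHONE_ANY = re.compile(r"^([A-Z0-9]{7}|(?=^[^-]+-[^-]+$)[A-Z0-9-]{8})$")

def is_phone_any_hyphen(s):
    """7 characters without a hyphen, or 8 with exactly one interior hyphen."""
    return PHONE_ANY.fullmatch(s) is not None

for sample in ("4URGENT", "333-WELL", "3-33WELL", "-333WELL", "33-3-WEL"):
    print(sample, is_phone_any_hyphen(sample))
```

The first three samples pass. A leading hyphen fails the lookahead because [^-]+ needs at least one character before the hyphen, and the double-hyphen string fails because the lookahead allows only one.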
Thank you Heath Hunnicutt for his alternation operator answer as well as showing me an example.
Based on his advice, here's my answer:
[A-Z0-9]{7}|[A-Z0-9][A-Z0-9-]{7}
Note: I tested my regex here. (Just including this for reference)