Check string for all lowercase letters in PowerShell - regex

I want to be able to test if a PowerShell string is all lowercase letters.
I am not the world's best regex monkey, but I have been trying along these lines:
if ($mystring -match "[a-z]^[A-Z]") {
echo "its lower!"
}
But of course it doesn't work, and searching the Internet hasn't got me anywhere. Is there a way to do this (besides testing every character in a loop)?

PowerShell by default matches case-insensitively, so you need to use the -cmatch operator:
if ($mystring -cmatch "^[a-z]*$") { ... }
-cmatch is always case-sensitive, while -imatch is always case-insensitive.
Side note: your regular expression was also a little off. Outside a character class, ^ is the start-of-string anchor, so [a-z]^[A-Z] asks for a letter followed by the start of the string, which can never match. What you want is the pattern above, which consists of:
The anchor for the start of the string (^)
A character class of lower-case Latin letters ([a-z])
A quantifier that repeats the character class zero or more times, matching as many characters as needed (*). Use + instead if you want to reject the empty string.
The anchor for the end of the string ($). Together, the two anchors force the regular expression to match every character in the string. With just [a-z]* you would match any string that contains a run of zero or more lower-case letters somewhere in it, which is every string.
P.S.: Ahmad has a point, though. If your string may contain things other than letters and you only want to make sure that every letter in it is lower-case (rather than also requiring that the string consists solely of letters), invert the character class:
if ($mystring -cmatch "^[^A-Z]*$") { ... }
The ^ at the start of the character class inverts it, so the class matches every character that is not listed. This regular expression therefore fails only if the string contains an upper-case letter (A-Z) somewhere. You still need -cmatch, though.
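For comparison, here is a Python sketch of the same checks (Python is used only for illustration, and the function names are hypothetical). Python's re is case-sensitive by default, so it behaves like -cmatch; passing re.IGNORECASE gives the -match behaviour and shows why the case-insensitive operator cannot detect upper-case letters.

```python
import re

def only_lowercase_letters(s: str) -> bool:
    # Like -cmatch "^[a-z]*$": every character must be an ASCII lowercase letter.
    # re.fullmatch must match the whole string, so no ^ or $ is needed.
    return re.fullmatch(r"[a-z]*", s) is not None

def no_uppercase_letters(s: str) -> bool:
    # Like -cmatch "^[^A-Z]*$": anything is allowed except ASCII A-Z.
    return re.fullmatch(r"[^A-Z]*", s) is not None

def lowercase_check_ignoring_case(s: str) -> bool:
    # Like -match "^[a-z]*$": with IGNORECASE the class also accepts A-Z,
    # so this "check" passes for upper-case input too.
    return re.fullmatch(r"[a-z]*", s, re.IGNORECASE) is not None

for sample in ["hello", "hello world 123", "Hello", ""]:
    print(repr(sample), only_lowercase_letters(sample),
          no_uppercase_letters(sample), lowercase_check_ignoring_case(sample))
```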

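For a test this simple, you can (and probably should) avoid regular expressions altogether:
$mystring -ceq $mystring.ToLower()
This is true when lowercasing the string changes nothing, which means it contains no upper-case letters; like ^[^A-Z]*$, it therefore also accepts digits, spaces and punctuation.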
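The same non-regex idea in Python (a sketch for illustration; the function names are hypothetical) also shows one advantage over ^[^A-Z]*$: lowercasing is Unicode-aware, so a non-ASCII capital such as Ä is caught, while the ASCII class lets it through. Python's built-in str.islower() is a different test again, because it also requires at least one cased character.

```python
import re

def has_no_uppercase(s: str) -> bool:
    # Python counterpart of  $mystring -ceq $mystring.ToLower():
    # lowercasing leaves the string unchanged only if it has no upper-case letters.
    return s == s.lower()

def has_no_ascii_uppercase(s: str) -> bool:
    # The regex variant ^[^A-Z]*$ only knows about the ASCII range A-Z.
    return re.fullmatch(r"[^A-Z]*", s) is not None

for sample in ["hello world 123", "Hello", "Äpfel", "123"]:
    print(repr(sample), has_no_uppercase(sample),
          has_no_ascii_uppercase(sample), sample.islower())
```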

Try this pattern, which only allows characters that are not uppercase letters: "^[^A-Z]*$" (in PowerShell, use it with -cmatch).
It returns false if the string contains any uppercase letter, while still allowing other characters as long as every letter is lowercase. For example, "hello world 123" would be valid.
If you strictly want letters, without spaces, numbers, etc., then Johannes's solution fits.

Related

Matching a lower-case character on the position before an uppercase character (camelCase)?

I have a regex in a piece of TypeScript code that is used to match strings containing a space, a dash/underscore, or camelCase.
Because this pattern is also used to split the string later, in the camelCase case I need to match the lowercase character immediately before the uppercase character. I am trying to reduce a string to two "initials": if I input my alias, for example "saddexProductions" or "Saddex Productions", the output should be "SP". If nothing indicates that the string consists of two parts, for example "Saddexproductions", the output should be "Sa". If I instead matched the uppercase character in the middle of the string and split there, that character would be removed, and the result for "saddexProductions" would be "SR".
Here is what I have come up with so far:
const splitRegex: RegExp = /\s|(?<=.)([a-z](?<=[A-Z]{1}))|\-|\_/;
Specifically, it is this part that is relevant:
(?<=.)([a-z](?<=[A-Z]{1}))
All the scenarios I have described except this one give the desired result. There can be pretty much anything before and after the camelCase boundary, but it is always the single lowercase character before the uppercase character that needs to be matched, not the uppercase character itself.
How would I accomplish this? Thanks in advance.
You can use
const splitRegex: RegExp = /[-_\s]|([a-z](?=[A-Z]))/;
Details:
[-_\s] - a character class matching a -, _ or a whitespace
| - or
([a-z](?=[A-Z])) - a capturing group with ID 1 that matches a lowercase ASCII letter followed by an uppercase ASCII letter, without adding the latter to the overall match value (it sits inside a positive lookahead, which is a non-consuming construct).
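A Python sketch of the initials logic described in the question (Python is used only for illustration; initials and SPLIT_RE are hypothetical names). It deliberately uses a non-capturing form of the answer's pattern: both Python's re.split and JavaScript's String.prototype.split insert the text of capturing groups into the result, so with the capturing version the matched lowercase letter (or None/undefined when that alternative did not participate) would show up among the pieces.

```python
import re

# Non-capturing version of the answer's pattern: a '-', '_' or whitespace
# separator, OR a lowercase letter that is followed by an uppercase letter.
# The lookahead (?=[A-Z]) checks the uppercase letter without consuming it,
# so it stays at the start of the next piece.
SPLIT_RE = re.compile(r"[-_\s]|[a-z](?=[A-Z])")

def initials(name: str) -> str:
    """Hypothetical helper: reduce a name to two 'initials' as described above."""
    parts = [p for p in SPLIT_RE.split(name) if p]  # drop empty pieces
    if len(parts) >= 2:
        return parts[0][0].upper() + parts[1][0].upper()
    # No separator and no camelCase boundary: use the first two characters.
    word = parts[0] if parts else ""
    return word[:1].upper() + word[1:2]

for name in ["saddexProductions", "Saddex Productions", "saddex-productions", "Saddexproductions"]:
    print(name, "->", initials(name))
```

Consuming the lowercase letter shortens the first piece ("sadde"), which does not matter here because only its first character is used.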

Regex that only allows empty string, letters, numbers or spaces?

Need help coming up with a regex that only allows numbers, letters, empty string, or spaces.
^[_A-z0-9]*((-|\s)*[_A-z0-9])*$
This one is the closest I've found, but it allows underscores and hyphens.
Only letters, numbers, spaces, or an empty string?
Then one character class will do.
^[A-Za-z0-9 ]*$
^ : start of the string or line (depending on the flag)
[A-Za-z0-9 ]* : zero or more upper-case or lower-case letters, or digits, or spaces.
$ : end of the string or line (depending on the flag)
The A-z range contains more than just letters: in the ASCII table, the characters [ \ ] ^ _ and ` sit between Z and a.
And \s matches any whitespace, including tabs and line breaks, not just the space.
If you want those too, use \s instead of the space:
^[A-Za-z0-9\s]*$
Also, depending on the regex engine your language or tool uses, you may be able to use \p{L} to match any Unicode letter, since [A-Za-z] only covers the plain ASCII letters.
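To see the difference concretely, here is a small Python sketch (Python's re is used only for illustration; the helper name allowed is hypothetical). It compares the question's pattern with the single-class answer using re.fullmatch, which anchors at both ends like ^...$. One Python-specific detail: a bare $ also matches just before a trailing newline, which fullmatch does not allow.

```python
import re

# The question's pattern: A-z also spans [ \ ] ^ _ ` (ASCII 91-96).
QUESTION = re.compile(r"[_A-z0-9]*((-|\s)*[_A-z0-9])*")
# The answer's single class: ASCII letters, digits and the space character.
ANSWER = re.compile(r"[A-Za-z0-9 ]*")

def allowed(pattern: re.Pattern, s: str) -> bool:
    # fullmatch acts like ^...$ around the pattern, but unlike a bare $
    # in Python it does not let a trailing newline slip through.
    return pattern.fullmatch(s) is not None

for s in ["Hello World 42", "", "a_b", "a^b", "a-b"]:
    print(f"{s!r:16} question={allowed(QUESTION, s)} answer={allowed(ANSWER, s)}")
```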
Your regex is too complicated for what you need.
The first part is fine: you are allowing letters and numbers, so you can simply add the space character to the same class.
Then the * quantifier, which means zero or more, takes care of your empty-string case.
/^[a-z0-9 ]*$/gmi
Notice that I'm not using A-z like you were: that range covers every character from A (65 in ASCII) to z (122), so it also matches the characters between Z and a (91 to 96: [ \ ] ^ _ and `), which are neither numbers nor letters. Instead I use a-z, which allows lowercase letters, together with the i flag, which makes the regex ignore case.
Matching only certain characters is equivalent to not matching any other character, so you could use the regex r = /[^a-z\d ]/i to determine if the string contains any character other than the ones permitted. In Ruby that would be implemented as follows.
"aBc d01e e$9" !~ r #=> false
"aBc d01e ex9" !~ r #=> true
In this situation there may not be much to choose between this approach and matching /\A[a-z\d ]+\z/i, but in other situations a negative match can simplify the regex considerably.
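The same negative-match idea as a Python sketch (Python is used only for illustration, and only_permitted is a hypothetical name): search for a single forbidden character instead of matching the whole string. It uses [0-9] rather than \d because Python's \d on str patterns also matches non-ASCII digits. Note that the negated version accepts the empty string, while the positive /\A[a-z\d ]+\z/i form does not.

```python
import re

# Any character that is NOT an ASCII letter, digit or space.
FORBIDDEN = re.compile(r"[^a-z0-9 ]", re.IGNORECASE)

def only_permitted(s: str) -> bool:
    # Mirrors Ruby's  s !~ r : true when no forbidden character is found.
    return FORBIDDEN.search(s) is None

print(only_permitted("aBc d01e e$9"))
print(only_permitted("aBc d01e ex9"))
```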

Regex to match all numbers, letters and punctuation symbols?

I want a regex which can match all numbers, letters, and all punctuation symbols as well (full stop, comma, question mark, exclamation mark, colon, etc.).
The string must be at least one character long, but can be any length above that.
Is it possible?
Try \\p{Graph}+ or \\p{Print}+
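import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class PatternTest
{
    @Test
    public void shouldMatch()
    {
        assertTrue("asdf123ASFD!##$%^&*()".matches("\\p{Graph}+"));
    }

    @Test
    public void shouldMatchWithWhitespaces()
    {
        // \p{Graph} excludes whitespace, so \s is added to the class explicitly
        assertTrue("asdf 123 ASFD !##$%^&*()".matches("[\\p{Graph}\\s]+"));
    }
}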
You can get more info here (section "POSIX character classes (US-ASCII only)"):
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
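Python's re module has no \p{Graph}. As a sketch (Python used for illustration only), the range [!-~] covers the same characters, since in Java's default US-ASCII mode \p{Graph} is \p{Alnum} plus \p{Punct}, i.e. ASCII 0x21 to 0x7E. re.fullmatch plays the role of Java's String.matches, which must match the whole string.

```python
import re

# [!-~] is ASCII 0x21-0x7E: letters, digits and punctuation, which is what
# Java's \p{Graph} covers in its default US-ASCII mode.
GRAPH_RUN = re.compile(r"[!-~]+")
# Whitespace added, like the Java test's [\p{Graph}\s]+. Python's \s is
# Unicode-aware on str patterns, so it is somewhat broader than Java's \s.
GRAPH_OR_SPACE_RUN = re.compile(r"[!-~\s]+")

print(GRAPH_RUN.fullmatch("asdf123ASFD!##$%^&*()") is not None)
print(GRAPH_RUN.fullmatch("asdf 123 ASFD !##$%^&*()") is not None)
print(GRAPH_OR_SPACE_RUN.fullmatch("asdf 123 ASFD !##$%^&*()") is not None)
```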
Start by looking at character classes
http://www.regular-expressions.info/charclass.html
An example:
[A-Za-z_0-9]*
This matches any run of standard ASCII letters, digits and underscores.
You can add your desired punctuation to the set.
You can use \w to match any word characters, and depending on which regex implementation you use it may match unicode characters too.
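How far \w reaches depends on the engine and its flags. In Python, for example, \w on str patterns is Unicode-aware by default and re.ASCII restricts it to [a-zA-Z0-9_]. A small sketch (Python used for illustration; the pattern names are hypothetical), including a class with some punctuation added by hand:

```python
import re

UNICODE_WORD = re.compile(r"\w+")
ASCII_WORD = re.compile(r"\w+", re.ASCII)
# Punctuation added to the class by hand ('-' last so it is a literal).
WORD_OR_PUNCT = re.compile(r"[\w.,?!:;-]+", re.ASCII)

print(UNICODE_WORD.fullmatch("naïve_42") is not None)
print(ASCII_WORD.fullmatch("naïve_42") is not None)
print(WORD_OR_PUNCT.fullmatch("Hello,world!") is not None)
```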
Another approach is to decide what you DON'T want to match. If you want to match a string of characters that are not whitespace you could use
\S*
If I understood well, it should be easy. Please try:
([^\s]+)
This regex matches one or more consecutive characters that are not whitespace.
It is the easiest way to match (and reuse) any such run of characters. You may already know what parentheses mean in regular expressions: they create a capturing group, so the matched string can be reused later, for example through a backreference.
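A quick Python illustration (Python chosen only for the example) of the last two answers: \S+ and ([^\s]+) find the same runs of non-whitespace, and \S* differs from \S+ only in also accepting an empty string, which the question rules out.

```python
import re

text = "Hello, world! 42?"

# \S+ finds each run of non-whitespace characters.
print(re.findall(r"\S+", text))
# One capturing group: findall returns the group's text, here the same runs.
print(re.findall(r"([^\s]+)", text))

# \S* also accepts an empty string; \S+ requires at least one character.
print(re.fullmatch(r"\S*", "") is not None, re.fullmatch(r"\S+", "") is not None)
```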

Regular expression for letters, numbers and - _

I'm having trouble checking in PHP whether a value consists only of the following:
letters (upper or lowercase)
numbers (0-9)
underscore (_)
dash (-)
point (.)
no spaces or other characters!
a few examples:
OK: "screen123.css"
OK: "screen-new-file.css"
OK: "screen_new.js"
NOT OK: "screen new file.css"
I guess I need a regex for this, since I need to throw an error when a given string contains characters other than the ones mentioned above.
The pattern you want is something like (see it on rubular.com):
^[a-zA-Z0-9_.-]*$
Explanation:
^ is the beginning of the line anchor
$ is the end of the line anchor
[...] is a character class definition
* is "zero-or-more" repetition
Note that the literal dash - is the last character in the character class definition, otherwise it has a different meaning (i.e. range). The . also has a different meaning outside character class definitions, but inside, it's just a literal .
References
regular-expressions.info/Anchors, Character Classes and Repetition
In PHP
Here's a snippet to show how you can use this pattern:
<?php
$arr = array(
'screen123.css',
'screen-new-file.css',
'screen_new.js',
'screen new file.css'
);
foreach ($arr as $s) {
    if (preg_match('/^[\w.-]*$/', $s)) {
        print "$s is a match\n";
    } else {
        print "$s is NO match!!!\n";
    }
}
?>
The above prints (as seen on ideone.com):
screen123.css is a match
screen-new-file.css is a match
screen_new.js is a match
screen new file.css is NO match!!!
Note that the pattern is slightly different, using \w instead. This is the character class for "word character".
API references
preg_match
Note on specification
This seems to follow your specification, but note that it will also match things like ....., which may or may not be what you want. If you can be more specific about the pattern you want to match, the regex will get slightly more complicated.
The above regex also matches the empty string. If you need at least one character, then use + (one-or-more) instead of * (zero-or-more) for repetition.
In any case, you can further clarify your specification (always helps when asking regex question), but hopefully you can also learn how to write the pattern yourself given the above information.
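A quick Python check (Python used only for illustration) of the two caveats above: the class happily accepts "....." and, with *, the empty string.

```python
import re

# The answer's class with * (zero or more) and with + (one or more).
ZERO_OR_MORE = re.compile(r"[a-zA-Z0-9_.-]*")
ONE_OR_MORE = re.compile(r"[a-zA-Z0-9_.-]+")

samples = ["screen123.css", "screen-new-file.css", "screen_new.js",
           "screen new file.css", ".....", ""]
for s in samples:
    # fullmatch plays the role of the ^...$ anchors.
    print(f"{s!r:24} *: {ZERO_OR_MORE.fullmatch(s) is not None}  "
          f"+: {ONE_OR_MORE.fullmatch(s) is not None}")
```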
You can use
^[\w.-]+$
The + makes sure there is at least one character. You need the ^ and $ to anchor the match at the beginning and end; otherwise a string with a valid run somewhere in the middle, such as ####xyz%%%%, would still match.
\w already includes letters (upper and lower case), digits, and the underscore, so only . and - need to be added to the class.
P.S. Thanks for the note in the comments about keeping - from denoting a range.
This is the pattern you are looking for
/^[\w.-]*$/
What this means:
^ Start of string
[...] Match characters inside
\w Any word character: a-z, A-Z, 0-9 and _
.- Match . and - (the - is placed last so it cannot be read as a range; \w already covers _)
* Zero or more (unlimited) repetitions of the class
$ End of string
If you want to limit the number of characters:
/^[\w.-]{0,5}$/
{0,5} Means 0 to 5 characters
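A hyphen placed directly after \w, as in [\w-_.], is ambiguous because it could start a range. Perl treats it as a literal and warns; PCRE2, which recent PHP versions use for the preg_* functions, and Python's re reject such a class outright. A Python sketch (Python used only for illustration; compiles is a hypothetical helper) showing the rejection, the fixed class with the hyphen last, and the {0,5} length limit:

```python
import re

def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Hyphen directly after \w: Python reports "bad character range".
print(compiles(r"^[\w-_.]*$"))
# Hyphen moved to the end of the class: a plain literal '-'.
print(compiles(r"^[\w.-]*$"))

# Length-limited version; re.ASCII keeps \w to [a-zA-Z0-9_] as in the explanation.
LIMITED = re.compile(r"[\w.-]{0,5}", re.ASCII)
print(LIMITED.fullmatch("a.b-c") is not None, LIMITED.fullmatch("abcdef") is not None)
```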
To actually cover your pattern, i.e. valid file names according to your rules, I think you need a little more. Note that this is not meant to match every legal file name from a system perspective; that would be system dependent and more liberal in what it accepts. It is intended to match your acceptable patterns.
^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$
Explanation:
^ Match the start of a string. This (plus the end match) forces the string to conform to the exact expression, not merely contain a substring matching the expression.
([a-zA-Z0-9]+[_-])* Zero or more occurrences of one or more letters or numbers followed by an underscore or dash. This causes all names that contain a dash or underscore to have letters or numbers between them.
[a-zA-Z0-9]+ One or more letters or numbers. This covers all names that do not contain an underscore or a dash.
\. A literal period (dot). Forces the file name to have an extension and, by exclusion from the rest of the pattern, only allow the period to be used between the name and the extension. If you want more than one extension that could be handled as well using the same technique as for the dash/underscore, just at the end.
[a-zA-Z0-9]+ One or more letters or numbers. The extension must be at least one character long and must contain only letters and numbers. This is typical, but if you wanted to allow underscores, that could be addressed as well. You could also supply a length range {2,3} instead of the one-or-more + quantifier, if that were more appropriate.
$ Match the end of the string. See the starting character.
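A Python sketch (Python used only for illustration) that runs the stricter file-name pattern against the question's examples plus a few extra cases made up for the test; fullmatch stands in for the ^ and $ anchors.

```python
import re

# The stricter file-name pattern from the answer, without ^/$ because
# fullmatch already requires the whole string to match.
FILENAME = re.compile(r"([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+")

samples = ["screen123.css", "screen-new-file.css", "screen_new.js",
           "screen new file.css", ".....", "screen--new.css", "noextension"]
for s in samples:
    print(f"{s!r}: {'OK' if FILENAME.fullmatch(s) else 'NOT OK'}")
```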
Something like this should work
$code = "screen new file.css";
if (!preg_match("/^[-_a-zA-Z0-9.]+$/", $code))
{
echo "not valid";
}
This will echo "not valid"
[A-Za-z0-9_.-]*
This also matches the empty string; if you do not want that, replace the * with a +.

Choosing just the alphanumeric words with regex

I'm trying to find the regular expression that picks out just the alphanumeric words from a string, i.e. words that are a combination of letters and numbers. If a word is purely numbers or purely letters, I need to discard it.
Try this regular expression:
\b([a-z]+[0-9]+[a-z0-9]*|[0-9]+[a-z]+[a-z0-9]*)\b
Or more compact:
\b([a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b
This matches all words (note the word boundaries \b) that start either with one or more letters followed by one or more digits, or vice versa, optionally followed by more letters or digits. So the condition of at least one letter and at least one digit is always fulfilled. Apply it case-insensitively if uppercase letters should count too.
With lookaheads:
'/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i'
A quick test that also shows example usage:
$str = 'foo bar F0O 8ar';
$arr = array();
preg_match_all('/\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b/i', $str, $arr);
print_r($arr);
Output:
F0O
8ar
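A Python sketch (for illustration only) that runs both the alternation pattern and the lookahead pattern on the test string from the PHP example. The alternation group is written as non-capturing (?:...) because re.findall returns only the group's text when a pattern contains one capturing group.

```python
import re

# Alternation version: letters then digits, or digits then letters.
MIXED_ALT = re.compile(r"\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)[a-z0-9]*\b", re.IGNORECASE)
# Lookahead version: skip all-digit and all-letter words, then match the word.
MIXED_LOOKAHEAD = re.compile(r"\b(?![0-9]+\b)(?![a-z]+\b)[0-9a-z]+\b", re.IGNORECASE)

text = "foo bar F0O 8ar"
print(MIXED_ALT.findall(text))
print(MIXED_LOOKAHEAD.findall(text))
```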
This will return all individual alphanumeric words, which you can loop through. I don't think regex can do the whole job by itself.
\b[a-z0-9]+\b
Make sure you mark that as case-insensitive.
\b(?:[a-z]+[0-9]+|[0-9]+[a-z]+)[[:alnum:]]*\b
'\b([a-zA-Z]+[0-9]+|[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+[a-zA-Z]*)\b'