Regex differentiate between all vs few char Uppercase in String - regex

I have to pass a string into a program; depending on the string, it returns exactly one response value. I am having difficulty building patterns for two cases.
If a string ends with '?' and is not all uppercase, return 'x', no matter what the string contains.
If a string ends with '?' and is all uppercase, return 'y'.
If a string ends with '!', or is all uppercase (with no question mark at the end), return 'z'.
If a string is only whitespace, return 'a'.
Here are example strings for each case; the four cases are to be four separate patterns -
phrase1 = "Simple String with some UPPercase in Between ends with?"
phrase2 = "BIG STRING ALL CAPS ENDS WITH?"
phrase3_a = "ALLCAPSSTRING NOTHING AT THE END OF STRING"
phrase3_b = "Any String with ALL UPPERCASE (or not) but ends with!"
phrase4 = "\t\t\t\t"
I haven't built accurate patterns yet, and that is what I'm asking about here. After that I plan to use a single re.compile with all the patterns and then finditer to pick the group that is not None. In the code below I have removed the whitespace: if none of the other patterns match, matching a whitespace pattern [\s] will return None, which I can handle separately -
phrase=re.sub(r'[\s]','',phrase)
pattern_phrase1 = re.compile (r'[a-zA-Z0-9]\?$')
pattern_phrase2 = re.compile (r'[A-Z0-9]\?$')
pattern_phrase3 = re.compile (r'[A-Z]|[.!$]')

Solution 1 - using isx functions
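def hey(phrase):
    # Placeholder response values taken from the question; 'what' is a catch-all.
    responses = {'ques': 'x', 'ques_yell': 'y', 'yell': 'z', 'onlycall': 'a', 'what': 'c'}
    # Remove all whitespace so an all-whitespace input becomes the empty string.
    phrase = ''.join(phrase.split())
    if phrase == '':
        return responses['onlycall']
    if phrase.isupper():
        if phrase[-1] == '?':
            return responses['ques_yell']
        return responses['yell']
    if phrase[-1] == '?':
        return responses['ques']
    # The question also asks for 'z' when the string ends with '!'.
    if phrase[-1] == '!':
        return responses['yell']
    return responses['what']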
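Since the question asked for patterns, the same decision table can also be sketched as one compiled regex with named groups. This is only a sketch under assumptions: the function name classify, the group names, and the fallback value 'c' are made up for illustration, and "all uppercase" is approximated for ASCII letters as "at least one A-Z and no a-z", which is not identical to str.isupper() on non-ASCII text.

```python
import re

# One alternative per response; the name of the group that matched is the answer.
# The order matters: the stricter "all caps + ?" case is tried before the general "?" case.
CLASSIFIER = re.compile(
    r"""
    (?P<a>^$)                                  # nothing left once whitespace is removed
  | (?P<y>^(?=[^a-z]*[A-Z])[^a-z]*\?$)         # no lowercase, some uppercase, ends with ?
  | (?P<x>^.*\?$)                              # any other string ending with ?
  | (?P<z>^.*!$|^(?=[^a-z]*[A-Z])[^a-z]*$)     # ends with !, or is all uppercase
    """,
    re.VERBOSE,
)


def classify(phrase, default="c"):
    """Return 'a', 'x', 'y' or 'z' per the rules in the question, else `default`."""
    stripped = re.sub(r"\s", "", phrase)  # same whitespace removal as in the question
    match = CLASSIFIER.match(stripped)
    return match.lastgroup if match else default
```

Using match.lastgroup avoids looping over the groups to find the one that is not None, because only one top-level alternative can match.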

Related

Regex: Last Occurrence of a Repeating Character

So, I am looking for a regex that matches every maximal non-empty substring of consonants followed by a maximal non-empty substring of vowels in a string.
e.g. In the following strings, you can see all expected matches:
"zcdbadaerfe" = {"zcdba", "dae", "rfe"}
"foubsyudba" = {"fou", "bsyu", "dba"}
I am very close! This is the regex I have managed to come up with so far:
([^aeiou].*?[aeiou])+
It returns the expected matches, except that it only captures the first vowel of any run of repeated vowels, for example:
String: "cccaaabbee"
Expected Matches: {"cccaaa", "bbee"}
Actual Matches: {"ccca", "bbe"}
I want to figure out how I can include the last vowel character that comes before (a) a consonant or (b) the end of the string.
Thanks! :-)
Your pattern is slightly off. I suggest using this version:
[b-df-hj-np-tv-z]+[aeiou]+
This pattern says to match:
[b-df-hj-np-tv-z]+ a lowercase non vowel, one or more times
[aeiou]+ followed by a lowercase vowel, one or more times
Here is a working demo.
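As a rough check of the suggested pattern, here is a minimal Python sketch (the question does not name a language, so Python and the helper name consonant_vowel_runs are assumptions). Both quantifiers are greedy, so each match takes the whole run of consonants and then the whole run of vowels.

```python
import re

# One or more lowercase consonants followed by one or more lowercase vowels.
CONSONANTS_THEN_VOWELS = re.compile(r"[b-df-hj-np-tv-z]+[aeiou]+")


def consonant_vowel_runs(text):
    """Return every maximal consonant run that is followed by a maximal vowel run."""
    return CONSONANTS_THEN_VOWELS.findall(text)
```

For the three strings in the question this returns the expected match lists, in order of appearance.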
const rgx = /[^aeiou]+[aeiou]+(?=[^aeiou])|.*[aeiou](?=\b)/g;
Segment             Description
[^aeiou]+           one or more of anything BUT vowels
[aeiou]+            one or more vowels
(?=[^aeiou])        matches only if followed by anything BUT a vowel
|                   OR
.*[aeiou](?=\b)     zero or more of any character followed by a vowel that sits at a word boundary (here, the end of the string)
function lastVowel(str) {
  const rgx = /[^aeiou]+[aeiou]+(?=[^aeiou])|.*[aeiou](?=\b)/g;
  return [...str.matchAll(rgx)].flat();
}
const str1 = "cccaaabbee";
const str2 = "zcdbadaerfe";
const str3 = "foubsyudba";
console.log(lastVowel(str1));
console.log(lastVowel(str2));
console.log(lastVowel(str3));

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions.
Search for any pattern myvalue=123& and extract myvalue=123
If "myvalue" is present at the end of the line without "&", extract myvalue=123
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
The first scenario works for me with the following regex - myvalue=(.*?(?=[&,""]))
I am looking for how to modify this regex to cover my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as few chars as possible until it can assert either & , or " to the right, so the positive lookahead requires a single char to be present to the right and fails at the end of the string
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &
myvalue=[^&\s]*
Regex demo
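A quick way to try the negated character class outside regex101 is a short Python sketch like the one below (Python and the helper name extract_myvalue are assumptions; the pattern is the one suggested above). The class stops at the first & or whitespace character, or at the end of the string, so no lookahead is needed.

```python
import re

# "myvalue=" followed by any run of characters that are neither '&' nor whitespace.
MYVALUE = re.compile(r"myvalue=[^&\s]*")


def extract_myvalue(line):
    """Return the 'myvalue=...' substring, or None if the key is absent."""
    match = MYVALUE.search(line)
    return match.group(0) if match else None
```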
function regex(data) {
  var test = data.match(/=(.*)&/);
  if (test === null) {
    return data.split('=')[1]
  } else {
    return test[1]
  }
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & after the value, match returns null and the if branch runs, where we simply split the string on = and take the value. If & is present after the value, the regex extracts the text between = and &.
If you want to do it with a single regex, you can do it like this:
var test = data1.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

How to check if a string only contains letters, numbers, underscores and period. Flutter/Dart

I want to check if a string only contains:
Letters
Numbers
Underscores
Periods
In Flutter, I tried the following to check for letters only, but it returns true whenever the string contains a letter, even if other characters are present:
String mainString = "abc123";
print(mainString.contains(new RegExp(r'[a-z]')));
As I said, it returns true since it contains letters, but I want to know if it only contains letters.
Is there a way to do that?
The problem with your RegExp is that you allow it to match substrings, and you match only a single character. You can force it to require that the entire string be matched with ^ and $, and you can match against one or more of the expression with +:
print(RegExp(r'^[a-z]+$').hasMatch(mainString));
To match all the characters you mentioned:
print(RegExp(r'^[A-Za-z0-9_.]+$').hasMatch(mainString));
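The difference between "contains a match" and "is entirely a match" is not specific to Dart. As an illustration only, here is the same idea in Python, where re.search plays the role of contains and re.fullmatch plays the role of the ^...$ anchored pattern (the helper names are assumptions).

```python
import re

# Letters, digits, underscore and period; '+' requires at least one character.
ALLOWED = re.compile(r"[A-Za-z0-9_.]+")


def contains_a_letter(text):
    """Substring test: true as soon as one lowercase letter appears anywhere."""
    return re.search(r"[a-z]", text) is not None


def only_allowed_chars(text):
    """Whole-string test: true only if every character is in the allowed set."""
    return ALLOWED.fullmatch(text) is not None
```

The substring test accepts "abc123!" because it contains a letter, while the whole-string test rejects it because of the "!".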
The basic way of doing this is as follows:
Define a list of acceptable characters:
// for example
List<String> validChar = ["1", "2", "3", "t"];
Loop through all characters of your string and check each one for validity:
// given text
String x = "t5";
bool valid = true;
for (int i = 0; i < x.length; i++) {
  if (!validChar.contains(x[i])) {
    valid = false;
  }
}
print(valid);
Just change x and validChar to suit your needs.

Dart Regex does not match whole word for Arabic text

This pattern works fine in Java and javascript but does not seem to work in Dart. Any help is appreciated.
void main() {
String englishText = "The new nature will not find rest";
String englishFind = "Nature";
RegExp englishExp = new RegExp("\\b$englishFind\\b", unicode:true, caseSensitive:false);
bool englishResult = englishExp.hasMatch(englishText);//matches
print(englishResult); //true
String arabicText = "لن تجد الطبيعة الجديدة راحتها";
String arabicFind="الطبيعة";
RegExp arabicExp = new RegExp("\\b$arabicFind\\b", unicode:true);
bool arabicResult = arabicExp.hasMatch(arabicText);//does not match
print(arabicResult);//false
}
The \b word boundary still matches only in ASCII contexts, even when you set unicode:true, whose main point is to make sure "UTF-16 surrogate pairs in the original string will be treated as a single code point and will not match separately".
You may "decompose" the word boundary and add Arabic letter and digit ranges to the class:
String arabicText = "لن تجد الطبيعة الجديدة راحتها";
String arabicFind="الطبيعة";
RegExp arabicExp = new RegExp("(?:^|[^a-zA-Z0-9_\\u06F0-\\u06F9\\u0622\\u0627\\u0628\\u067E\\u062A-\\u062C\\u0686\\u062D-\\u0632\\u0698\\u0633-\\u063A\\u0641\\u0642\\u06A9\\u06AF\\u0644-\\u0648\\u06CC\\u202C\\u064B\\u064C\\u064E-\\u0652])$arabicFind(?![a-zA-Z0-9_\\u06F0-\\u06F9\\u0622\\u0627\\u0628\\u067E\\u062A-\\u062C\\u0686\\u062D-\\u0632\\u0698\\u0633-\\u063A\\u0641\\u0642\\u06A9\\u06AF\\u0644-\\u0648\\u06CC\\u202C\\u064B\\u064C\\u064E-\\u0652])", unicode:true);
bool arabicResult = arabicExp.hasMatch(arabicText); // now matches
print(arabicResult); // => true
The regex will match an $arabicFind word when it is
(?:^|[^a-zA-Z0-9_\u06F0-\u06F9\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC\u202C\u064B\u064C\u064E-\u0652]) - preceded with start of string (^) or (|) any char but ASCII letter, digit or _ and Farsi letters or digits
(?![a-zA-Z0-9_\u06F0-\u06F9\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC\u202C\u064B\u064C\u064E-\u0652]) - not followed with an ASCII letter, digit or _ and Farsi letters or digits.
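The "decomposed" boundary is just a pair of lookarounds: no word character immediately before the word and none immediately after it. The sketch below shows that shape in Python rather than Dart, purely as an illustration. It relies on the fact that in Python 3 \w is Unicode-aware for str patterns, so no explicit Arabic ranges are needed there; the helper name contains_whole_word is an assumption.

```python
import re


def contains_whole_word(text, word):
    """True if `word` occurs in `text` with no word character on either side."""
    # (?<!\w) : not preceded by a word character (or at the start of the string)
    # (?!\w)  : not followed by a word character (or at the end of the string)
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.search(pattern, text) is not None
```

re.escape keeps the search word literal, which matters once the word comes from user input, as $arabicFind does in the Dart code.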

How to validate a string to have only certain letters by perl and regex

I am looking for a perl regex which will validate a string containing only the letters ACGT. For example "AACGGGTTA" should be valid while "AAYYGGTTA" should be invalid, since the second string has "YY" which is not one of A,C,G,T letters. I have the following code, but it validates both the above strings
if($userinput =~/[A|C|G|T]/i)
{
$validEntry = 1;
print "Valid\n";
}
Thanks
Use a character class, and make sure you check the whole string by using the start of string token, \A, and end of string token, \z.
You should also use * or + to indicate how many characters you want to match -- * means "zero or more" and + means "one or more."
Thus, the regex below is saying "between the start and the end of the (case insensitive) string, there should be one or more of the following characters only: a, c, g, t"
if($userinput =~ /\A[acgt]+\z/i)
{
$validEntry = 1;
print "Valid\n";
}
Using the character-counting tr operator:
if( $userinput !~ tr/ACGT//c )
{
$validEntry = 1;
print "Valid\n";
}
tr/characterset// counts how many characters in the string are in characterset; with the /c flag, it counts how many are not in the characterset. Using !~ instead of =~ negates the result, so it will be true if there are no characters not in characterset or false if there are characters not in characterset.
Your character class [A|C|G|T] contains |. | does not stand for alternation in a character class, it only stands for itself. Therefore, the character class would include the | character, which is not what you want.
Your pattern is not anchored. The pattern /[ACGT]+/ would match any string that contains one or more of any of those characters. Instead, you need to anchor your pattern, so that only strings that contain just those characters from beginning to end are matched.
$ can also match just before a trailing newline. To avoid that, use \z to anchor at the end. \A anchors at the beginning (although it doesn't make a difference whether you use that or ^ in this case, using \A provides a nice symmetry).
So, your check should be written:
if ($userinput =~ /\A [ACGT]+ \z/ix)
{
$validEntry = 1;
print "Valid\n";
}
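The three points above (a literal | inside a character class, the missing anchors, and $ tolerating a trailing newline) are not unique to Perl. As a hedged illustration, this Python sketch reproduces each of them; the helper name is_valid_dna is an assumption, and re.fullmatch is used in place of Perl's \A ... \z.

```python
import re


def is_valid_dna(sequence):
    """True only if the whole string consists of A, C, G, T (any case), at least one."""
    return re.fullmatch(r"[ACGT]+", sequence, flags=re.IGNORECASE) is not None


# Inside a character class '|' is a literal, so this class also accepts a bare "|".
pipe_is_literal = re.search(r"[A|C|G|T]", "|") is not None

# Without anchors, one valid letter anywhere is enough to match.
unanchored_accepts_bad_input = re.search(r"[ACGT]+", "AAYYGGTTA") is not None

# '$' also matches just before a trailing newline, so this search succeeds.
dollar_accepts_newline = re.search(r"^[ACGT]+$", "ACGT\n") is not None
```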