Regex and chars method usage: how to improve this regex

@input = "rrgb"
def is_letters?
  @input.chars.all? {|letter| letter == /[a..zA..Z]/}
end
def right_letters?
  @input.chars.all? {|letter| letter =~ (/[rgbyrp]/)}
end
So #right_letters? will return true because it will return an array of trues: [true, true, true, true]. 0s are truthy, so it will return an array of trues?
#is_letters? will return an array of falses, right? I can't use == there if I want the line to mean "the letter is either a lowercase letter or an uppercase letter".
Is there a better way to code "this letter is one of these letters: r, g, b, y, r, p"?

This is really a question about Ruby, and all? doesn't work like you think it does: it doesn't build an array, it just returns false as soon as the block returns a falsey value, and true otherwise.
To your other question, yes. What you've missed is that a regex operates on the whole string; you don't have to go one character at a time. So:
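@input = "rrgb"
def is_letters?
  # true only if no character falls outside a-z (case-insensitive)
  !@input.match(/[^a-z]/i)
end
def right_letters?
  # true only if every character is one of r, g, b, y, p
  !@input.match(/[^rgbyp]/)
end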
Note also that, as shown above, the syntax you're trying to use for a character-class range (i.e. a..z) is wrong; in regex syntax it is a-z. Inside brackets, [a..zA..Z] simply lists the literal characters a, ., z, A and Z.
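For comparison, here is the same whole-string idea sketched in Python (an illustration only; the function names are hypothetical and just mirror the Ruby methods). A single search for any character outside the allowed class answers "are all characters allowed?", and the last two lines show that [a..z] is a class of three literal characters rather than a range.

```python
import re

def is_letters(s: str) -> bool:
    # True when no character falls outside A-Z/a-z (like Ruby's /[^a-z]/i)
    return re.search(r"[^a-zA-Z]", s) is None

def right_letters(s: str) -> bool:
    # True when every character is one of r, g, b, y, p
    return re.search(r"[^rgbyp]", s) is None

print(is_letters("rrgb"), right_letters("rrgb"))  # True True
print(right_letters("rrgx"))                      # False

# "[a..z]" contains only 'a', '.' and 'z', so "abc" does not match
print(re.fullmatch(r"[a..z]+", "abc"))            # None
```

As with the Ruby version, an empty string passes both checks, since it contains no disallowed character.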

Related

re pattern doesn't recognize comma value

I hope you can help me with this one.
I'm running an apply method on a pandas DataFrame to identify whether each value has the correct number format (some of them have a comma separating the thousands). The thing is that, as far as I can see, my regex pattern doesn't recognize the comma. Here's my code:
import re
import pandas as pd

def afloat(x):
    x = str(x)
    pattern = re.compile(r"\d+,\d\d\d")
    return pattern.match(x)

data = ["1,000", "999", "2,580"]
df = pd.DataFrame(data, columns=["data"])
df["status"] = df.apply(lambda x: afloat(df["data"]), axis=1)
What I get is the following, even though some values contain commas that, as far as I can tell, do match the pattern I'm defining:
data status
0 1,000 None
1 999 None
2 2,580 None
I just can't identify what I'm doing wrong. Thanks!
I tried this and it worked for me:
import re

def afloat(x):
    x = str(x).replace(".", ",")
    pattern = re.compile(r"\d+,\d\d\d")
    return pattern.match(x)

test = 100.123
print(afloat(test))
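If you look at what you pass to .apply, you will see where the trouble is: .apply(lambda x: afloat(df["data"]),axis=1) passes the whole df["data"] column on every call, not the current row's value. afloat then converts the entire column to a string, which starts with the index label (0 followed by spaces), so the pattern never finds a comma right after the leading digits and every row gets None.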
Instead, you should use .apply(lambda x: afloat(x["data"]),axis=1) where x denotes the current row.
Next, the pattern is used with re.match, so it is only anchored at the start of the string; there may be more text after the part that matches. To make sure the pattern matches the entire string, add $ at the end of the regex.
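A small sketch with plain re (no pandas) of why the end anchor matters; the extra sample strings are made up for illustration. re.match only anchors at the start, so a value with a matching prefix still matches; adding $ (or using re.fullmatch) forces the whole value to fit. One subtlety: $ also matches just before a trailing newline, which is one reason to prefer re.fullmatch.

```python
import re

values = ["1,000", "999", "2,580", "1,0000", "12,345abc"]

prefix_only = re.compile(r"\d+,\d\d\d")    # re.match anchors at the start only
whole_value = re.compile(r"\d+,\d\d\d$")   # $ also anchors at the end

for v in values:
    # columns: value, prefix match, match with $, fullmatch
    print(v, bool(prefix_only.match(v)), bool(whole_value.match(v)),
          bool(re.fullmatch(r"\d+,\d{3}", v)))
```

Here "1,0000" and "12,345abc" match the prefix-only pattern but fail the anchored checks.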
However, since all you want is to check whether each value matches a regex pattern and get a boolean column, consider using Series.str.match:
>>> df['data'].str.match(r'\d+,\d{3}$')
0 True
1 False
2 True
Name: data, dtype: bool
Here, \d+,\d{3}$ matches a string that starts with one or more digits, followed by a comma and then exactly three digits at the end of the string.

Using If and Else statements in Ruby

I'm very new to the world of programming and just starting to learn. I'm working through the Flatiron School prework and have been doing OK, but for some reason I can't understand "if" and "else" statements. The problem is similar to Chris Pine's 'deaf grandma' problem, but without saying "BYE!" three times.
~The method should take in a string argument containing a phrase and check to see if the phrase is written in all uppercase: if it isn't, then grandma can't hear you. She should then respond with (return) HUH?! SPEAK UP, SONNY!.
~However, if you shout at her (i.e. call the method with a string argument containing a phrase that is all uppercase), then she can hear you (or at least she thinks that she can) and should respond with (return) NO, NOT SINCE 1938!
So far I have:
def speak_to_grandma
  puts "Hi Nana, how are you?".upcase
  if false
    puts "HUH?! SPEAK UP, SONNY!"
  else
    puts "NO, NOT SINCE 1938!"
  end
end
but I am getting a "wrong number of arguments" error. How am I supposed to add an argument while using the if/else statements? This is probably a very easy and basic question, but I can't seem to get my head around it (probably overthinking).
Any help and clarity would be greatly appreciated.
input_phrase = "Hi Nana, how are you?"
def speak_to_grandma(phrase)
  # Check whether the string equals its all-uppercase version, i.e. it is already all uppercase
  if phrase == phrase.upcase
    # return this string if the condition is true
    "NO, NOT SINCE 1938!"
  else
    # return this string if the condition is false
    "HUH?! SPEAK UP, SONNY!"
  end
end
# call the method with input_phrase as the argument and print what it returns
puts speak_to_grandma(input_phrase)
"how am I supposed to add argument while using the if/else statements? This is probably a very easy and basic question but can't seem to get my head around this (overthinking probably)."
Your mistake was that the method did not accept any arguments; here it accepts the "phrase" variable as an argument and processes it:
def speak_to_grandma(phrase)
You had
if false
but did not check what exactly is false. A literal false never passes the test, so your method always fell through to the else branch. To rewrite my version using "false":
input_phrase = "Hi Nana, how are you?"
def speak_to_grandma(phrase)
  # Check if it is false that the string is all uppercase
  if (phrase == phrase.upcase) == false
    # return this string if the condition is true (the phrase is NOT all uppercase)
    "HUH?! SPEAK UP, SONNY!"
  else
    # return this string otherwise
    "NO, NOT SINCE 1938!"
  end
end
puts speak_to_grandma(input_phrase)
Here I am evaluating
if (phrase == phrase.upcase) == false
This basically means "if the expression phrase == phrase.upcase is false".

Regex: allow for the occurrence of a certain character up to one time

I want to search for a specific (DNA) string 'AGCTAGCT' and allow for the occurrence of one (and only one) mismatch (signified as 'N').
The following are matches (no or one N):
AGCTAGCT
NGCTAGCT
AGCNAGCT
The following are not matches (two or more Ns):
AGNTAGCN
AGNTANCN
Use a negative lookahead at the start to check whether the string contains two N's.
^(?!.*?N.*N)[AGCTN]{8}$
I assumed that your string contains only the letters A, G, C, T and N. Otherwise:
^(?!.*?N.*N)[AGCTN]+$
Or simply like this,
^(?!.*?N.*N).+$
In any language you could do something like this (shown in JavaScript):
var matches = str.match(/N/g);             // null when there is no N at all
var count = matches ? matches.length : 0;  // number of N's in the string
if (count == 1 || count == 0) {            // and compare it
  // str valid
}
If you only want a regex, you could use this regex
/^[^N]*N?[^N]*$/
You can test if the string matches the above regex or not.
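A quick way to check both regexes above against the examples from the question, sketched in Python with re.fullmatch (which anchors at both ends, so the ^ and $ are redundant but harmless). Note that both only limit the number of N's; neither checks that the remaining letters actually spell AGCTAGCT.

```python
import re

# Lookahead version: reject strings containing two N's, then allow 8 of A/G/C/T/N
lookahead = re.compile(r"^(?!.*?N.*N)[AGCTN]{8}$")
# Alternative: an optional single N surrounded by non-N characters
at_most_one_n = re.compile(r"^[^N]*N?[^N]*$")

matches = ["AGCTAGCT", "NGCTAGCT", "AGCNAGCT"]
non_matches = ["AGNTAGCN", "AGNTANCN"]

for s in matches + non_matches:
    print(s, bool(lookahead.fullmatch(s)), bool(at_most_one_n.fullmatch(s)))
```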
If you are using Python, you can do it without a regex:
dna = ["AGCTAGCT", "NGCTAGCT", "AGCNAGCT", "AGNTAGCN", "AGNTANCN"]
myList = []
for word in dna:
    if word.count('N') < 2:
        myList.append(word)
And if you want to generate all the DNA strings (I don't know how DNA takes letters), this can help:
import itertools
letters = ['A', 'G', 'C', 'T', 'N']
for letter in itertools.permutations(letters):
    print(''.join(letter))
This prints every ordering of the five letters, each used exactly once.
I think a regular expression is not the best choice for doing this. I say that because (at least to my knowledge) there is no easy way to express an arbitrary string to match with at most one mistake, other than explicitly considering all the possible mistakes.
That being said, it would be something like this:
AGCTAGCT|NGCTAGCT|ANCTAGCT|AGNTAGCT|AGCNAGCT|AGCTNGCT|AGCTANCT|AGCTAGNT|AGCTAGCN
Maybe it can be simplified a bit.
EDIT
Given that N is a mismatch, a regular expression to accept what you want should replace each N with the wrong alternatives.
AGCTAGCT|[GCT]GCTAGCT|A[ACT]CTAGCT|AG[AGT]TAGCT|AGC[AGC]AGCT|AGCT[GCT]GCT|AGCTA[ACT]CT|AGCTAG[AGT]T|AGCTAGC[AGC]
Simplifying...
(A(G(C(T(A(G(C(T|[AGC])|[AGT]T)|[ACT]CT)|[GCT]GCT)|[AGC]AGCT)|[AGT]TAGCT)|[ACT]CTAGCT)|[GCT]GCTAGCT)
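Writing out the alternation by hand is error-prone. Here is a Python sketch that builds it from the target string, assuming the alphabet is exactly A, C, G, T and the target contains no regex metacharacters (the helper name one_mismatch_pattern is hypothetical). For each position it substitutes a class of the three wrong bases, as in the EDIT above. Use it with re.fullmatch, since the alternation itself is unanchored.

```python
import re

def one_mismatch_pattern(target: str, alphabet: str = "ACGT") -> str:
    """Alternation matching target exactly, or with one base replaced by a wrong one.

    Assumes target contains no regex metacharacters (plain DNA letters).
    """
    alternatives = [target]
    for i, base in enumerate(target):
        wrong = "".join(b for b in alphabet if b != base)
        alternatives.append(target[:i] + "[" + wrong + "]" + target[i + 1:])
    return "|".join(alternatives)

pattern = one_mismatch_pattern("AGCTAGCT")
print(pattern)
regex = re.compile(pattern)
# exact, one mismatch at index 3, two mismatches
print(bool(regex.fullmatch("AGCTAGCT")), bool(regex.fullmatch("AGCAAGCT")),
      bool(regex.fullmatch("AGATAGCA")))  # True True False
```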
Demo replacing N with wrong choices https://regex101.com/r/bB0gX1/1.

Regex pattern for validating password

I am working on a small issue, but I don't know how to solve it cleanly. I have to validate a generated password with these constraints:
password length: [8, 24]
password contains
at least 1 lower case character
at least 1 upper case character
at least 1 digit
at least 1 special character (printable based on ASCII code)
I've tried regex patterns, but neither works correctly for both cases, valid and invalid.
The first RegEx pattern:
def pattern = /(=?.{8,24})((:?[a-z]+)(:?[0-9]+)(:?[A-Z]+)(:?\W+))/
correctly rejects all invalid passwords, but also rejects valid ones.
The second RegEx pattern:
def pattern = /(=?.{8,24})((:?[a-z]*)(:?[0-9]*)(:?[A-Z]*)(:?\W*))/
correctly accepts all valid passwords, but also accepts invalid ones.
I am new to Groovy, so I don't know how to create the correct RegEx pattern to solve this.
Could you please help me?
Regex is not a solution to everything, and trying to come up with a single regex for a given problem is often a waste of brain cycles. Just separate it out into multiple tests, for example (this is Perl-like code, but you should be able to translate it to the language you are using):
sub valid_pw
{
    my ($pw) = @_;
    return 0 if (length($pw) < 8 || length($pw) > 24);
    # don't use [a-z], it makes for nasty surprises in e.g. fi_FI
    return 0 if ($pw !~ /[[:lower:]]/);
    return 0 if ($pw !~ /[[:upper:]]/);
    return 0 if ($pw !~ /[[:digit:]]/);
    # [[:punct:]] = punctuation such as ! # _; [[:print:]] would also match letters
    return 0 if ($pw !~ /[[:punct:]]/);
    return 1;
}
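The separate-tests approach translated into a minimal Python sketch. The name valid_pw mirrors the code above; interpreting "special character" as anything in string.punctuation (printable ASCII punctuation) is an assumption. str.islower, str.isupper and str.isdigit are Unicode-aware, in the spirit of the [[:lower:]] comment.

```python
import string

def valid_pw(pw: str) -> bool:
    """Separate, readable checks instead of one big regex (hypothetical helper)."""
    if not 8 <= len(pw) <= 24:              # length limit from the requirements
        return False
    has_lower = any(c.islower() for c in pw)
    has_upper = any(c.isupper() for c in pw)
    has_digit = any(c.isdigit() for c in pw)
    # assumption: "special" = printable ASCII punctuation such as ! # _ ~
    has_special = any(c in string.punctuation for c in pw)
    return has_lower and has_upper and has_digit and has_special

print(valid_pw("Passw0rd!"))  # True
print(valid_pw("password"))   # False
```

Each rule is one line, so adding or relaxing a rule later does not mean re-deriving a combined pattern.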
Why do you need a single regex for this? (Also, why are you placing a maximum length on the password? That is another discussion.)
/^.{8,24}$/
/[a-z]/
/[A-Z]/
/\d/
/[^\d\w]/
I guess you could combine them using lookaheads (say /(?=.*[a-z])(?=.*[A-Z]).../), but if you do, it's probably a really good idea to comment it heavily.
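A sketch of the combined-lookahead version in Python, commented piece by piece. It assumes "special character" means printable ASCII punctuation (built from string.punctuation); unlike /[^\d\w]/, this counts _ as special. PASSWORD_RE and is_valid are hypothetical names. Each lookahead scans from the start without consuming anything, and fullmatch applies the 8 to 24 length limit to the whole string.

```python
import re
import string

# Assumption: "special character" means printable ASCII punctuation.
SPECIAL = "[" + re.escape(string.punctuation) + "]"

PASSWORD_RE = re.compile(
    r"(?=.*[a-z])"              # lookahead: at least one lowercase ASCII letter
    r"(?=.*[A-Z])"              # lookahead: at least one uppercase ASCII letter
    r"(?=.*[0-9])"              # lookahead: at least one digit
    r"(?=.*" + SPECIAL + r")"   # lookahead: at least one punctuation character
    r".{8,24}"                  # then consume 8 to 24 characters (no newlines)
)

def is_valid(pw: str) -> bool:
    # fullmatch anchors both ends, so the length limit covers the whole string
    return PASSWORD_RE.fullmatch(pw) is not None

print(is_valid("Passw0rd!"))  # True
print(is_valid("Passw0rd"))   # False (no special character)
```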

Regular Expression for username

I need help with a regular expression for condition (4) below:
(1) Begin with a-z
(2) End with a-z0-9
(3) Allow 3 special characters: . _ -
(4) The characters in (3) must be followed by alphanumeric characters, and cannot be followed by another character from (3).
I'm not sure how to do this. Any help is appreciated, especially with a sample and some explanation.
You can try this:
^(?=.{5,10}$)(?!.*[._-]{2})[a-z][a-z0-9._-]*[a-z0-9]$
This uses lookaheads to enforce that the username has between 5 and 10 characters (?=.{5,10}$), and that no two of the 3 special characters appear in a row (?!.*[._-]{2}), while overall they can appear any number of times (Konrad interprets it differently, in that the 3 special characters can appear up to 3 times).
Here's a test harness in Java:
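// the wrapper class name is arbitrary
public class UsernameTest {
    public static void main(String[] args) {
        String[] test = {
            "abc",
            "abcde",
            "acd_e",
            "_abcd",
            "abcd_",
            "a__bc",
            "a_.bc",
            "a_b.c-d",
            "a_b_c_d_e",
            "this-is-too-long",
        };
        for (String s : test) {
            // %B prints the boolean in upper case
            System.out.format("%s %B %n", s,
                s.matches("^(?=.{5,10}$)(?!.*[._-]{2})[a-z][a-z0-9._-]*[a-z0-9]$")
            );
        }
    }
}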
This prints:
abc FALSE
abcde TRUE
acd_e TRUE
_abcd FALSE
abcd_ FALSE
a__bc FALSE
a_.bc FALSE
a_b.c-d TRUE
a_b_c_d_e TRUE
this-is-too-long FALSE
See also
regular-expressions.info/Lookarounds, limiting repetitions, and anchors
So basically:
Start with [a-z].
Allow a first serving of [a-z0-9], any number of times. 1)
Allow, three times or fewer:
  at most one of [._-], followed by
  at least one of [a-z0-9].
End with [a-z0-9] (implied in the above).
Which yields:
^[a-z][a-z0-9]*([._-][a-z0-9]+){0,3}$
But beware that this may result in user names with only one character.
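A quick Python check of the pattern above (the sample names are made up, several borrowed from the Java harness earlier). It enforces the structure, including the {0,3} limit on special characters, but has no length limit of its own, and as warned it accepts a one-letter name.

```python
import re

# Letter, alphanumerics, then up to 3 groups of "one special + alphanumerics"
USERNAME = re.compile(r"^[a-z][a-z0-9]*([._-][a-z0-9]+){0,3}$")

for name in ["a", "abcde", "a_b.c-d", "a_b_c_d_e", "a__bc", "abcd_", "_abcd"]:
    print(name, USERNAME.match(name) is not None)
```

If a minimum length is needed, a separate len() check (or a lookahead such as (?=.{5,10}$)) can be added.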
1) (posted by @codeka)
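Try this:
^[a-zA-Z](([._-][a-zA-Z0-9])|[a-zA-Z0-9])*[._-]?[a-z0-9]$
1) ^[a-zA-Z]: beginning
2) (([._-][a-zA-Z0-9])|[a-zA-Z0-9])*: any number of either alphanumerics, or a special character followed by an alphanumeric
3) [._-]?[a-z0-9]$: the end character, optionally preceded by one special character (so names like a_b are accepted)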
Well, because I feel like being ... me. One regex does not need to rule them all (and for one of the Nine, see nqueens). However, in this case there are some nice answers already; I'm just pointing out a slightly different approach.
function valid(n) {
  return (n.length > 3
    && n.match(/^[a-z]/i)          // starts with a letter
    && n.match(/[a-z0-9]$/i)       // ends with a letter or digit
    && !n.match(/[._-]{2}/));      // no two special characters in a row
}
Now imagine that you only allow one ., _ or - in total (perhaps I misread the initial requirements; shrug). That's easy to add (and visualize):
&& n.replace(/[^._-]/g, "").length <= 1
And before anyone says "that's less efficient", go profile it in the intended usage. Also note that I didn't give up on regular expressions entirely; they are a wonderful thing.