Ruby, checking if an item exists in an array using Regular expression - regex

I am attempting to search through an array of strings (new_string) and check whether it includes any 'operators'.
Where am I going wrong?
def example
  operators = ["+", "-"]
  string = "+ hi"
  new_string = string.split(" ")
  if new_string.include? Regexp.union(operators)
    print "true"
  else
    print "false"
  end
end

You can use any? instead, which takes a pattern:
pattern = Regexp.union(['+', '-']) #=> /\+|\-/
['foo', '+', 'bar'].any?(pattern) #=> true
But since you already have a string, you can skip the splitting and use match?:
'foo + bar'.match?(pattern) #=> true
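To see why the original check fails: Array#include? compares each element with ==, and a String is never == to a Regexp, whereas any? with a pattern argument tests each element with ===. A minimal sketch reusing the question's data (the names include_result and any_result are only for illustration):

```ruby
operators = ["+", "-"]
pattern = Regexp.union(operators)   # escapes each operator for us
words = "+ hi".split(" ")           # ["+", "hi"]

# include? asks: is any element == pattern? A String never equals a Regexp.
include_result = words.include?(pattern)

# any?(pattern) asks: does pattern === element hold for any element?
any_result = words.any?(pattern)

puts "include?: #{include_result}"  # prints "include?: false"
puts "any?:     #{any_result}"      # prints "any?:     true"
```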

You wish to determine if a string (string) contains at least one character in a given array of characters (operators). The fact that those characters are '+' and '-' is not relevant; the same methods would be used for any array of characters. There are many ways to do that. @Stefan gives one. Here are a few more. None of them mutate (modify) string.
string = "There is a + in this string"
operators = ["+", "-"]
The following is used in some calculations.
op_str = operators.join
#=> "+-"
#1
r = /[#{ op_str }]/
#=> /[+-]/
string.match?(r)
#=> true
[+-] is a character class. It asserts that the string matches any character in the class.
#2
string.delete(op_str).size < string.size
#=> true
See String#delete.
#3
string.tr(op_str, '').size < string.size
#=> true
See String#tr.
#4
string.count(op_str) > 0
#=> true
See String#count.
#5
(string.chars & operators).any?
#=> true
See Array#&.
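One caveat, offered as a hedged aside: a character class and the selector strings taken by String#delete, String#tr and String#count all give special meaning to some characters, such as - between two characters or a leading ^. The joined string happens to be safe for "+-" but may not be for other arrays. A sketch with a hypothetical operators array where it matters, using Regexp.union to escape each element:

```ruby
operators = ["a", "-", "z"]      # hypothetical array; joins to "a-z"
op_str = operators.join

string = "m"                     # contains none of "a", "-", "z"

# "a-z" is read as the range a..z, so "m" is counted.
naive_count = string.count(op_str)

# Regexp.union escapes each element, so only the three literal characters match.
safe_match = string.match?(Regexp.union(operators))

puts naive_count   # prints 1
puts safe_match    # prints false
```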

Related

How to only replace the vowels of words that match the words in a given array with a "*"?

I need to create a Ruby method that accepts a string and an array. If any of the words in the string match the words in the given array, then all the vowels of the matched words in the string should be replaced with a "*". I have tried to do this using a regex and an "if" condition, but I don't know why it does not work. I'd really appreciate it if somebody could explain to me where I have gone wrong and how I can get this code right.
def censor(sentence, arr)
  if arr.include? sentence.downcase
    sentence.downcase.gsub(/[aeiou]/, "*")
  end
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#expected_output = "G*sh, it's s* h*t"
arr.include? sentence.downcase reads, "If one of the elements of arr equals sentence.downcase ...", which is not what you want.
baddies = ["gosh", "it's", "hot", "shoot", "so"]
sentence = "Gosh, it's so very hot"
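r = /\b(?:#{baddies.join('|')})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "G*sh, *t's s* very h*t"
In the regular expression, \b is a word boundary and (?:#{baddies.join('|')}) requires a match of one of the baddies. The non-capturing group makes both word boundaries apply to every alternative; without it, the leading \b would bind only to the first alternative and the trailing \b only to the last. The word boundaries are there to avoid, for example, "so" matching "solo" or "hot" matching "shots". One could alternatively write:
/\b(?:#{Regexp.union(baddies).source})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
See Regexp::union and Regexp#source. source is needed because interpolating Regexp.union(baddies) directly embeds it with its own flags, so it would be unaffected by the case-indifference modifier (i).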
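When building an alternation by interpolation, it is worth checking how \b binds: without a group it attaches only to the alternative next to it. A small sketch comparing the two forms (the sample text and the names ungrouped, grouped and mask_vowels are made up for illustration):

```ruby
baddies = ["gosh", "it's", "hot", "shoot", "so"]
alternation = Regexp.union(baddies).source   # "gosh|it's|hot|shoot|so"

# Leading \b binds to "gosh" only, trailing \b to "so" only.
ungrouped = /\b#{alternation}\b/i
# Both \b anchors apply to every alternative.
grouped = /\b(?:#{alternation})\b/i

# Replace the vowels of a single word with "*".
def mask_vowels(word)
  word.gsub(/[aeiou]/i, '*')
end

text = "Some shots"
ungrouped_result = text.gsub(ungrouped) { |w| mask_vowels(w) }
grouped_result   = text.gsub(grouped)   { |w| mask_vowels(w) }

puts ungrouped_result   # prints "Some sh*ts" ("hot" matched inside "shots")
puts grouped_result     # prints "Some shots"
```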
Another approach is to split the sentence into words, manipulate each word, then rejoin all the pieces to form a new sentence. One difficulty with this approach concerns the character "'", which serves double-duty as a single quote and an apostrophe. Consider
sentence = "She liked the song, 'don't box me in'"
baddies = ["don't"]
The approach I've given here yields the correct result:
r = /\b#{baddies.join('|')}\b/i
#=> /\bdon't\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "She liked the song, 'd*n't box me in'"
If we instead divide up the sentence into parts we might try the following:
sentence.split(/([\p{Punct}' ])/)
#=> ["She", " ", "liked", " ", "", " ", "the", " ", "song", ",", "",
# " ", "", "'", "don", "'", "t", " ", "box", " ", "me", " ", "in", "'"]
As seen, the regex split "don't" into "don", "'" and "t", not what we want. Clearly, distinguishing between single quotes and apostrophes is a non-trivial task. This is made difficult by the fact that words can begin or end with apostrophes ("'twas") and most nouns in the possessive form that end with "s" are followed by an apostrophe ("Chris' car").
Your code returns nil whenever the condition is false, because the if has no else branch.
One option is to split words by spaces and punctuation, manipulate, then rejoin:
def censor(sentence, arr)
  # Split the sentence into an array of words and punctuation.
  words = sentence.scan(/[\w'-]+|[.,!?]+/)
  res = []
  words.each do |word|
    word = word.gsub(/[aeiou]/, "*") if arr.include? word.downcase
    res << word
  end
  res.join(' ') # this also adds a space before punctuation
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#=> G*sh , it's s* h*t
Note that res.join(' ') also adds a space before punctuation. I'm not so good with regexps, but this should fix it:
res.join(' ').gsub(/ [.,!?]/) { |punct| punct.strip }
#=> G*sh, it's s* h*t
The call sentence.scan(/[\w'-]+|[.,!?]+/) returns ["Gosh", ",", "it's", "so", "hot"].
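A variation on the same idea that avoids the split-and-rejoin step: gsub with a block can rewrite each word token in place, so the original spacing and punctuation are left untouched. This is only a sketch; it reuses the token pattern [\w'-]+ from above and assumes arr holds lowercase words. The method name censor_in_place is hypothetical.

```ruby
def censor_in_place(sentence, arr)
  # Visit each word-like token; everything between tokens is kept as-is.
  sentence.gsub(/[\w'-]+/) do |word|
    arr.include?(word.downcase) ? word.gsub(/[aeiou]/i, "*") : word
  end
end

result = censor_in_place("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
puts result   # prints "G*sh, it's s* h*t"
```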

Why does the condition return True when using regular expressions to find special characters in a string?

I need to validate the variable names:
name = ["2w2", " variable", "variable0", "va[riable0", "var_1__Int", "a", "qq-q"]
Only the names "variable0", "var_1__Int" and "a" are valid.
I could identify most of the invalid variable names using a regex:
import re
if re.match("^\d|\W|.*-|[()[]{}]", name):
    print(False)
else:
    print(True)
However, I still get True for va[riable0. Why is that?
I check for all types of brackets.
.match() checks for a match only at the beginning of the string, while .search() checks for a match anywhere in the string.
You can also simplify your regex to this and call search() method:
^\d|\W
That checks whether the first character is a digit or whether a non-word character appears anywhere in the input.
Code:
>>> import re
>>> name = ["2w2", " variable", "variable0", "va[riable0", "var_1__Int", "a", "qq-q"]
>>> pattern = re.compile(r'^\d|\W')
>>> for s in name:
...     if pattern.search(s):
...         print(s + ' => False')
...     else:
...         print(s + ' => True')
...
2w2 => False
 variable => False
variable0 => True
va[riable0 => False
var_1__Int => True
a => True
qq-q => False
Your expression is:
"^\d|\W|.*-|[()[]{}]"
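re.match() always matches from the beginning of the string, so your ^ is unnecessary. It does not require the match to reach the end of the string, though, so you need a $ at the end to make sure the entire input string matches, and not just a prefix. Note also that [()[]{}] is not a single character class: the class ends at the first ], so it is parsed as the class [()[] followed by the literal text {}].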

groovy regex, how to match array items in a string

The string looks like this "[xx],[xx],[xx]"
where xx is a polygon like this: "(1.0,2.3),(2.0,3)..."
Basically, we are looking for a way to get the string between each pair of square brackets into an array.
E.g. String source = "[hello],[1,2],[(1,2),(2,4)]"
would result in an object a such that:
a[0] == 'hello'
a[1] == '1,2'
a[2] == '(1,2),(2,4)'
We have tried various strategies, including using groovy regex:
def p = "[12],[34]"
def points = p =~ /(\[([^\[]*)\])*/
println points[0][2] // yields 12
However, this yields the following two-dimensional array:
[[12], [12], 12]
[, null, null]
[[34], [34], 34]
So if we took the third item from every even row we would be OK, but this does not look very correct. We are not taking the ',' into account, and we are not sure why we are getting "[12]" twice when it should be zero times.
Any regex experts out there?
I think that this is what you're looking for:
def p = "[hello],[1,2],[(1,2),(2,4)]"
def points = p.findAll(/\[(.*?)\]/){match, group -> group }
println points[0]
println points[1]
println points[2]
This script prints:
hello
1,2
(1,2),(2,4)
The key is the use of .*? to make the expression non-greedy, so that it matches the minimum number of characters between [ and ]. This prevents the first [ from pairing with the last ], which would produce the single match hello],[1,2],[(1,2),(2,4). With findAll and a closure you then return only the captured group.

Why ++ becomes -+-+-+- : string.gsub "strange" behavior

Why does ++ become -+-+-+- ?
I'd like to clean a string of doubled operator signs. How should I proceed?
String = "++"
print (String ) -- -> ++
String = string.gsub( String, "++", "+")
print (String ) -- -> + ok
String = string.gsub( String, "--", "+")
print (String ) -- -> +++ ?
String = string.gsub( String, "+-", "-")
print (String ) -- -> -+-+-+- ??
String = string.gsub( String, "-+", "-")
print (String ) -- -> -+-+-+- ??? ;-)
The core problem is that gsub operates on patterns (Lua's minimal regular expressions) and your string contains unescaped magic characters. However, even knowing that I found myself surprised by your results.
It's easier to see what gsub is doing if we change the replacement string:
string.gsub('+', '--', '|') => |+|
string.gsub('+++', '--', '|') => |+|+|+|
- means "0 or more occurrences of the preceding atom". Unlike * and +, it's non-greedy, matching the fewest characters possible.
I just tested it and apparently "fewest characters possible" mostly means 0 characters. For instance, my intuition about this:
string.gsub('aaa','a-', '|')
Is that the expression a- would match each a, replace them with '|', resulting in '|||'. In fact, it matches on the 0-length gaps before and after each character, resulting in: '|a|a|a|'
In fact, it doesn't matter what atom we precede with -, it always matches on the smallest length, 0:
string.gsub('aaa','x-', '|') => |a|a|a|
string.gsub('aaa','a-', '|') => |a|a|a|
string.gsub('aaa','?-', '|') => |a|a|a|
string.gsub('aaa','--', '|') => |a|a|a|
You can see that last one is your case and explains your results. Your next result is the exact same thing:
string.gsub('+++','+-','|') => |+|+|+|
Your final result is more straightforward:
string.gsub('-+-+-+-','-+','|') => |+|+|+|
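In this case, you're matching "1 or more occurrences of the atom -", so you're just replacing the - characters, just as you'd expect. To match magic characters such as + and - literally, escape them with %, for example string.gsub(s, '%-%-', '+').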

How can I return true only if one of a set of strings matches?

I want to return true if the user enters only one of a set of possible matches. Similar to an XOR operator, but only one string out of the entire group may exist in the input. Here is my code:
if input.match?(/str|con|dex|wis|int|cha/)
The following inputs should return true:
+2 int
-3con
str
con
wisdom
dexterity
The following inputs should return false:
+1 int +2 cha
-4dex+3con-1cha
int cha
str dex
con wis cha
strength intelligence
strdex
I'd probably go with String#scan and a simple regex so that you can understand what you've done later:
if input.scan(/str|dex|con|int|wis|cha/).length == 1
  # Found exactly one
else
  # Didn't find it or found too many
end
That also makes it easier to distinguish between the various ways it can fail.
Presumably your strings will be relatively small so scanning the string for all the matches won't have any noticeable overhead.
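As a quick check, the scan approach can be run against the sample inputs from the question. This is a sketch; the method name exactly_one_stat? and the arrays accepted and rejected are hypothetical names introduced here.

```ruby
# True when exactly one of the abbreviations occurs in the input.
def exactly_one_stat?(input)
  input.scan(/str|dex|con|int|wis|cha/).length == 1
end

# Inputs the question says should return true.
accepted = ["+2 int", "-3con", "str", "con", "wisdom", "dexterity"]

# Inputs the question says should return false.
rejected = ["+1 int +2 cha", "-4dex+3con-1cha", "int cha", "str dex",
            "con wis cha", "strength intelligence", "strdex"]

accepted_ok = accepted.all?  { |s| exactly_one_stat?(s) }
rejected_ok = rejected.none? { |s| exactly_one_stat?(s) }

puts accepted_ok   # prints true
puts rejected_ok   # prints true
```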
The following are three ways to answer the question without creating an intermediate array. All employ the regular expression:
R = /str|con|dex|wis|int|cha/
and return the following:
one_match? "It wasn't a con, really" #=> true
one_match? "That sounds to me like a wild guess." #=> falsy (nil or false)
one_match? "Both int and dex are present." #=> falsy (nil or false)
one_match? "Three is an integer." #=> true
one_match? "Both int and indexes are present." #=> falsy (nil or false)
#1 Do first and last match begin at the same index?
def one_match?(s)
  (idx = s.index(R)) && idx == s.rindex(R)
end
See String#index and String#rindex.
#2 Use the form of String#index that takes an argument equal to the index at which the search is to begin.
def one_match?(s)
  s.index(R) && s.index(R, Regexp.last_match.end(0)).nil?
end
See Regexp::last_match and MatchData#end. Regexp.last_match can be replaced by $~.
#3 Use the form of String#gsub that takes one argument and no block to create an enumerator that generates matches.
def one_match?(s)
  s.gsub(/str|con|dex|wis|int|cha/).count { true } == 1
end
See Enumerable#count.
Alternatively,
s.gsub(/str|con|dex|wis|int|cha/).to_a.size == 1
though this has the disadvantage of creating a temporary array.
To match whole words only
In the penultimate example 'int' matches 'int' in 'integer', and in the last example 'dex' matches 'dex' in 'indexes'. To enforce full-word matches the regular expression can be changed to:
/\b(?:str|con|dex|wis|int|cha)\b/
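A short sketch of what the whole-word pattern changes, counting matches with scan (the method name one_whole_word_match? is hypothetical). Note the trade-off: inputs from the question such as "-3con" and "wisdom" no longer count, because the abbreviation there is not a word on its own.

```ruby
# True when exactly one abbreviation occurs as a complete word.
def one_whole_word_match?(s)
  s.scan(/\b(?:str|con|dex|wis|int|cha)\b/).length == 1
end

integer_case = one_whole_word_match?("Three is an integer.")
indexes_case = one_whole_word_match?("Both int and indexes are present.")
glued_case   = one_whole_word_match?("-3con")
long_form    = one_whole_word_match?("wisdom")

puts integer_case   # prints false ("integer" is no longer a match)
puts indexes_case   # prints true  (only the standalone "int" matches)
puts glued_case     # prints false (no word boundary between "3" and "con")
puts long_form      # prints false ("wis" is only a prefix of "wisdom")
```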
If you have to use a regex you may use
/\A(?!(?:.*(str|con|dex|wis|int|cha)){2}).*\g<1>/m
Details
\A - start of string
(?!(?:.*(str|con|dex|wis|int|cha)){2}) - no two occurrences of any 0+ chars followed with str, con, dex, wis, int, cha
.* - any 0+ chars as many as possible
\g<1> - Group 1 pattern (str, con, dex, wis, int or cha).