VB.NET - Regex.Replace error with [ character

I want to remove some characters from a textbox. It works, but when I try to replace the "[" character it gives an error. Why?
Return Regex.Replace(html, "[", "").Replace(",", " ").Replace("]", "").Replace(Chr(34), " ")
When I delete the "[", "").Replace( part, it works fine:
Return Regex.Replace(html, ",", " ").Replace("]", "").Replace(Chr(34), " ")

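The problem is that the [ character has a special meaning in regex (it starts a character class), so it must be escaped to be matched literally. To escape it, all you have to do is add a \ before the character. VB.NET string literals don't treat the backslash as an escape character, so "\[" reaches the regex engine unchanged. Only the first call needs this: the chained .Replace calls are String.Replace, which matches text literally.
Therefore this would be your corrected code:
Return Regex.Replace(html, "\[", "").Replace(",", " ").Replace("]", "").Replace(Chr(34), " ")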
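For comparison, here is a minimal Scala sketch of the same failure on the JVM: java.util.regex rejects a lone [ as an unclosed character class, much like the error in the question, while the escaped pattern matches a literal bracket. Unlike VB.NET, Scala string literals do process backslash escapes, so the regex \[ has to be written "\\[". The object and method names are hypothetical.

```scala
import java.util.regex.{Pattern, PatternSyntaxException}

// Hypothetical names, for illustration only.
object BracketEscape {
  // True if `pattern` is a valid java.util.regex pattern.
  def compiles(pattern: String): Boolean =
    try { Pattern.compile(pattern); true }
    catch { case _: PatternSyntaxException => false }

  // "\\[" in Scala source is the two-character regex \[ : a literal bracket.
  def stripOpenBrackets(s: String): String = s.replaceAll("\\[", "")

  def main(args: Array[String]): Unit = {
    println(compiles("["))              // false: unclosed character class
    println(compiles("\\["))            // true
    println(stripOpenBrackets("[a,b]")) // a,b]
  }
}
```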

Because [ is a reserved character in regex patterns. Whenever you search for literal text, escape it with Regex.Escape(), which finds all reserved characters and escapes them with a backslash.
Dim searchPattern = Regex.Escape("[")
Return Regex.Replace(html, searchPattern, ""). 'etc...
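On the JVM the counterpart of Regex.Escape is java.util.regex.Pattern.quote, which wraps the text in \Q...\E rather than backslash-escaping each character. The following Scala sketch (hypothetical names) shows why quoting matters for arbitrary input: an unquoted "." matches every character.

```scala
import java.util.regex.Pattern

// Hypothetical names, for illustration only.
object QuoteLiteral {
  // Pattern.quote makes every character of `literal` match literally,
  // whatever regex meaning it would otherwise have.
  def removeLiteral(text: String, literal: String): String =
    text.replaceAll(Pattern.quote(literal), "")

  def main(args: Array[String]): Unit = {
    println(removeLiteral("a.b.c", ".")) // abc
    println("a.b.c".replaceAll(".", "")) // empty: unquoted "." matches every character
  }
}
```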
But why use regex at all? None of these replacements needs pattern matching, so here's a simpler way, I think, using StringBuilder (from System.Text):
Dim sb = New StringBuilder(html) _
.Replace("[", "") _
.Replace(",", " ") _
.Replace("]", "") _
.Replace(Chr(34), " ")
Return sb.ToString()
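The same regex-free idea in Scala, as a sketch: String.replace(CharSequence, CharSequence) matches its arguments literally, so none of these characters needs escaping. The name clean is hypothetical, and "\"" stands in for VB's Chr(34).

```scala
// Hypothetical names, for illustration only.
object PlainReplace {
  // Each call is a literal substring replacement; no regex is involved.
  def clean(html: String): String =
    html
      .replace("[", "")
      .replace(",", " ")
      .replace("]", "")
      .replace("\"", " ") // Chr(34) in VB.NET is the double-quote character

  def main(args: Array[String]): Unit =
    println(clean("[1,2,3]")) // 1 2 3
}
```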

Related

Str.global_replace in OCaml putting carets where they shouldn't be

I am working to convert multiline strings into a list of tokens that might be easier for me to work with.
In accordance with the specific needs of my project, I'm padding any caret symbol that appears in my input with spaces, so that "^" gets turned into " ^ ". I'm using something like the following function to do so:
let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)
I then use something like the function below to turn this multiline string into a list of tokens (ignoring whitespace).
let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;
For some reason, bad_function adds carets in places where they shouldn't be. Take the following line of code:
(bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
The first line of the string turns into:
^ This is some \n ^
When I feed the output from bad_function into string_to_tokens I get the following list:
string_to_tokens (bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
"newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
"this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
"^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]
Why is this happening, and how can I fix these functions so they behave the way I want?
As explained in the documentation of the Str module:
^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.
So you have to escape the '^' character with the escape character "\".
However, note that (also from the doc)
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser.
This means you have to write the backslash twice, as "\\^", to do what you want without getting a warning.
This should do the job:
let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;
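The same pitfall can be reproduced on the JVM, as a rough analogue. In java.util.regex, ^ is a start-of-line anchor only in MULTILINE mode (the (?m) flag), which is the closest match to Str's line-based ^; escaping it as \^ matches a literal caret. A minimal Scala sketch with hypothetical names:

```scala
// Hypothetical names, for illustration only.
object CaretPadding {
  // "\\^" in Scala source is the regex \^, which matches a literal caret.
  def padCarets(s: String): String = s.replaceAll("\\^", " ^ ")

  // Unescaped ^ in MULTILINE mode is a zero-width start-of-line anchor,
  // so " ^ " is inserted at the start of each line, as happened with Str.
  def padLineStarts(s: String): String = s.replaceAll("(?m)^", " ^ ")

  def main(args: Array[String]): Unit = {
    println(padCarets("a^b\nc"))     // "a ^ b", then "c"
    println(padLineStarts("a^b\nc")) // " ^ a^b", then " ^ c"
  }
}
```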

How to only replace the vowels of words that match the words in a given array with a "*"?

I need to create a Ruby method that accepts a string and an array, and if any of the words in the string match the words in the given array, then all the vowels of the matched words in the string should be replaced with a "*". I have tried to do this using a regex and an if condition, but I don't know why it does not work. I'd really appreciate it if somebody could explain to me where I have gone wrong and how I can get this code right.
def censor(sentence, arr)
if arr.include? sentence.downcase
sentence.downcase.gsub(/[aeiou]/, "*")
end
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#expected_output = "G*sh, it's s* h*t"
arr.include? sentence.downcase reads, “If one of the elements of arr equals sentence.downcase ...”, which is not what you want.
baddies = ["gosh", "it's", "hot", "shoot", "so"]
sentence = "Gosh, it's so very hot"
r = /\b(?:#{baddies.join('|')})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "G*sh, *t's s* very h*t"
In the regular expression, \b is a word break and (?:#{baddies.join('|')}) requires a match of one of the baddies. The non-capturing group (?:...) makes both word breaks apply to every alternative; without it, \b would attach only to the first and last words. The word breaks are there to avoid, for example, "so" matching "solo" or "possible". One could alternatively write:
/\b(?:#{Regexp.union(baddies).source})\b/i
#=> /\b(?:gosh|it's|hot|shoot|so)\b/i
See Regexp::union and Regexp#source. source is needed because a Regexp interpolated into another regex keeps its own options, so Regexp.union(baddies) would otherwise be unaffected by the case-indifference modifier (i).
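A Scala sketch of the same technique, assuming ASCII words and the vowels aeiou. Pattern.quote escapes each word (the role Regexp.union plays in Ruby), and the (?:...) group makes both \b anchors apply to every alternative. The object and method names are hypothetical.

```scala
import java.util.regex.{Matcher, Pattern}
import scala.util.matching.Regex

// Hypothetical names, for illustration only.
object Censor {
  // Replaces the vowels of every listed word (case-insensitively) with '*'.
  def censor(sentence: String, words: Seq[String]): String = {
    // Quote each word so characters such as "." are matched literally.
    val alternation = words.map(w => Pattern.quote(w)).mkString("|")
    // (?i) = case-insensitive; (?:...) groups the alternatives so that
    // both \b word boundaries apply to every word.
    val pattern = new Regex(s"(?i)\\b(?:$alternation)\\b")
    pattern.replaceAllIn(sentence, (m: Regex.Match) =>
      Matcher.quoteReplacement(m.matched.replaceAll("(?i)[aeiou]", "*")))
  }

  def main(args: Array[String]): Unit =
    println(censor("Gosh, it's so hot", Seq("gosh", "hot", "shoot", "so")))
}
```

Because of the word boundaries, "so" and "hot" leave words such as "solo" and "hotel" untouched.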
Another approach is to split the sentence into words, manipulate each word, then rejoin all the pieces to form a new sentence. One difficulty with this approach concerns the character "'", which does double duty as a single quote and an apostrophe. Consider
sentence = "She liked the song, 'don't box me in'"
baddies = ["don't"]
The approach I've given here yields the correct result:
r = /\b(?:#{baddies.join('|')})\b/i
#=> /\b(?:don't)\b/i
sentence.gsub(r) { |w| w.gsub(/[aeiou]/i, '*') }
#=> "She liked the song, 'd*n't box me in'"
If we instead divide up the sentence into parts we might try the following:
sentence.split(/([\p{Punct}' ])/)
#=> ["She", " ", "liked", " ", "", " ", "the", " ", "song", ",", "",
# " ", "", "'", "don", "'", "t", " ", "box", " ", "me", " ", "in", "'"]
As seen, the regex split "don't" into "don", "'" and "t", which is not what we want. Clearly, distinguishing between single quotes and apostrophes is a non-trivial task. It is made harder by the fact that words can begin or end with apostrophes ("'twas"), and possessive nouns ending in "s" are often followed by just an apostrophe ("Chris' car").
Your code returns nil when the condition is false, because an if without an else evaluates to nil.
One option is to split words by spaces and punctuation, manipulate, then rejoin:
def censor(sentence, arr)
  words = sentence.scan(/[\w'-]+|[.,!?]+/) # this splits the sentence into an array of words and punctuation
  res = []
  words.each do |word|
    word = word.gsub(/[aeiou]/, "*") if arr.include? word.downcase
    res << word
  end
  res.join(' ') # this also adds spaces before punctuation
end
puts censor("Gosh, it's so hot", ["gosh", "hot", "shoot", "so"])
#=> G*sh , it's s* h*t
Note that res.join(' ') also adds spaces before punctuation. I'm not so good with regexps, but this could fix it:
res.join(' ').gsub(/ [.,!?]/) { |punct| punct.strip }
#=> G*sh, it's s* h*t
This part words = sentence.scan(/[\w'-]+|[.,!?]+/) returns ["Gosh", ",", "it's", "so", "hot"]

How can I remove all trailing backslashes from a string in Scala?

I want to remove all trailing backslashes ('\') from a string.
For example:
"ab" -> "ab"
"ab\\\\" -> "ab"
"\\\\ab\\" -> "\\\\ab"
"\\" -> ""
I am able to do this using the code below, but I am unable to handle the scenario where the string contains only backslash(es). Please let me know if this can be achieved through a different regex.
val str = """\\\\q\\"""
val regex = """^(.*[^\\])(\\+)$""".r
str match {
case regex(rest, slashes) => str.stripSuffix(slashes)
case _ => str
}
Converting my comment to an answer. This should work for removing all trailing backslashes:
str = str.replaceFirst("\\\\+$", "");
\\\\+ matches 1+ backslashes (single backslash is entered as \\\\ in Java/Scala).
While not a regex, I suggest a simpler solution: str.reverse.dropWhile(_ == '\\').reverse
Not using a regex, but you could use String.lastIndexWhere(p: (Char) ⇒ Boolean) to get the position of the last character that is not a '\' and take the substring up to and including that character:
str.substring(0, str.lastIndexWhere(_ != '\\') + 1)
If, for some reason, you're committed to a regex solution, it can be done.
val regex = """[^\\]?(\\*)$""".r.unanchored
str match {
case regex(slashes) => str.stripSuffix(slashes)
}
You can do the same with the slice method:
str.slice(0,str.lastIndexWhere(_ != '\\')+1)
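The three non-regex-match answers can be checked side by side against the examples from the question. A Scala sketch (the object and method names are hypothetical); all three return "" for an input made only of backslashes, the case the original regex missed.

```scala
// Hypothetical names, for illustration only.
object TrailingBackslashes {
  // Regex: one or more backslashes anchored at the end of the input.
  def viaRegex(s: String): String = s.replaceFirst("\\\\+$", "")

  // Reverse, drop the now-leading backslashes, reverse back.
  def viaDropWhile(s: String): String = s.reverse.dropWhile(_ == '\\').reverse

  // Keep everything up to the last non-backslash character;
  // lastIndexWhere returns -1 when there is none, which gives "".
  def viaLastIndexWhere(s: String): String =
    s.substring(0, s.lastIndexWhere(_ != '\\') + 1)

  def main(args: Array[String]): Unit = {
    val cases = Seq("ab" -> "ab", "ab\\\\" -> "ab", "\\\\ab\\" -> "\\\\ab", "\\" -> "")
    for ((in, expected) <- cases)
      println(Seq(viaRegex(in), viaDropWhile(in), viaLastIndexWhere(in)).forall(_ == expected))
  }
}
```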

Scala: How to replace all consecutive underscore with a single space?

I want to replace all consecutive underscores with a single space. Below is the code that I have written, but it is not replacing anything. What am I doing wrong?
import scala.util.matching.Regex
val regex: Regex = new Regex("/[\\W_]+/g")
val name: String = "cust_id"
val newName: String = regex.replaceAllIn(name, " ")
println(newName)
Output: "cust_id"
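You could use String.replaceAll without building a Regex object (replaceAll still treats its first argument as a regex, but "_" has no special meaning). Note that this replaces each underscore with its own space: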
val name: String = "cust_id"
val newName: String = name.replaceAll("_"," ")
println(newName)
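The slashes in your regular expression don't belong there: Scala has no /.../g literal syntax, so the slashes are matched as ordinary characters. There is no need for a "g" flag either, because replaceAllIn already replaces every match (extra arguments to the Regex constructor are capture-group names, not flags).
new Regex("[\\W_]+").replaceAllIn("cust_id", " ")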
// "cust id"
A string in Scala may be treated as a collection, hence we can map over it and in this case apply pattern matching to substitute characters, like this
"cust_id".map {
case '_' => " "
case c => c
}.mkString
Method mkString glues up the vector of characters back onto a string.
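If only underscores should be collapsed (the [\W_]+ pattern also replaces spaces, hyphens and other non-word characters), a pattern matching runs of underscores does it. A minimal Scala sketch; the object and method names are hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical names, for illustration only.
object Underscores {
  // "_+" matches one or more consecutive underscores; each run becomes one space.
  private val runs: Regex = "_+".r

  def collapse(name: String): String = runs.replaceAllIn(name, " ")

  def main(args: Array[String]): Unit =
    println(collapse("cust__id_no")) // cust id no
}
```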

Wrong return from Regex.IsMatch - Regular expression

I want to find a specific string, surrounded by whitespace, inside another string. For example, I want to receive the value true from:
Regex.IsMatch("I like ZaleK", "zalek",RegexOptions.IgnoreCase)
and value false from:
Regex.IsMatch("I likeZaleK", "zalek",RegexOptions.IgnoreCase)
Here is my code:
Regex.IsMatch(w_all_file, @"\b" + TB_string.Text.Trim() + @"\b", RegexOptions.IgnoreCase);
It does not work when the string I am looking for is followed by "-" in w_all_file.
For example, if w_all_file = "I like zalek_", the string "zalek" is not found, but if
w_all_file = "I like zalek-", the string "zalek" is found.
Any ideas why?
Thanks,
Zalek
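The \b anchor in regex doesn't treat an underscore as a word boundary, because the underscore is itself a word character. You might want to change it to something like this:
Regex.IsMatch(w_all_file, @"[\b_]" + TB_string.Text.Trim() + @"[\b_]", RegexOptions.IgnoreCase);
Note, however, that inside a character class \b means the backspace character, not a word boundary, so [\b_] actually requires a backspace or an underscore on each side of the word.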
Is this what you need?
string input = "type your name";
string pattern = "your";
Regex.IsMatch(input, " " + pattern + " ");
\b matches at a word boundary, which is defined as a position between a character that is included in \w and one that is not. For ASCII text, \w is the same as [a-zA-Z0-9_], so it includes the underscore.
So basically, \b will match after the "k" in zalek- but not in zalek_.
It sounds like you want the match to also fail on zalek-, which you can do by using lookaround. Just replace the \b at the beginning with (?<![\w-]), and replace the \b at the end with (?![\w-]):
Regex.IsMatch(w_all_file, @"(?<![\w-])" + TB_string.Text.Trim() + @"(?![\w-])", RegexOptions.IgnoreCase);
Note that if you add additional characters to the character class [\w-], you need to make sure that the "-" is the very last character, or that you escape it with a backslash (if you don't, it will be interpreted as a range of characters).
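The lookaround version carries over to java.util.regex, which also supports lookbehind. A Scala sketch (containsWord is a hypothetical name); it additionally passes the search word through Pattern.quote, the Java counterpart of .NET's Regex.Escape, so that characters typed by the user are not interpreted as regex syntax.

```scala
import java.util.regex.Pattern

// Hypothetical names, for illustration only.
object WholeWord {
  // True if `word` occurs in `text` with no word character or '-' on either side.
  def containsWord(text: String, word: String): Boolean = {
    val pattern = Pattern.compile(
      "(?<![\\w-])" + Pattern.quote(word) + "(?![\\w-])", // '-' last, so it is literal
      Pattern.CASE_INSENSITIVE)
    pattern.matcher(text).find()
  }

  def main(args: Array[String]): Unit =
    Seq("I like ZaleK", "I likeZaleK", "I like zalek_", "I like zalek-")
      .foreach(t => println(s"$t -> ${containsWord(t, "zalek")}"))
}
```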