Remove all non-letter characters except the single quote with regex

I have the following:
string = re.sub("[^A-Za-z]", ' ', string)
This replaces every character that is not a letter with a space. Now I would like to do almost the same thing, but keep the single quotes in my string. How do I need to change my regex?
Example: Queen's son is sleeping, but he will wake up.
Result: queen's son is sleeping but he will wake up

You can include the single quote, escaped, in your negated character class:
([^A-Za-z\'])
Including it in your example:
string = re.sub("[^A-Za-z\']", ' ', string)
Edit: You don't need to escape the single quote inside a double-quoted string, so:
string = re.sub("[^A-Za-z']", ' ', string)
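As a minimal sketch of the answer above, the snippet below keeps letters and apostrophes and turns everything else into spaces. The function name is hypothetical, and two steps are assumptions added to reproduce the example result from the question: runs of removed characters collapse into one space (the `+`), and the text is lowercased.

```python
import re

def keep_letters_and_apostrophes(text):
    # Replace each run of characters that is neither a letter nor an
    # apostrophe with a single space.
    cleaned = re.sub(r"[^A-Za-z']+", " ", text)
    # Assumption: lowercase and trim, to match the question's expected result.
    return cleaned.strip().lower()

print(keep_letters_and_apostrophes("Queen's son is sleeping, but he will wake up."))
# queen's son is sleeping but he will wake up
```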

Related

To extract only specific characters using Regex in python

I need to extract specific characters such as brackets (not the elements within them), *, #, etc., and replace them with ' '. So I compiled my pattern as below:
p = re.compile(r'\s([\[]).*|\s([\(]).*|\s([#]).*|\s([\{]).*|\s([\*]).*|\s([\<]).*|\s.*(\>)\s|\s.*(\])\s|\s.*(\))\s|\s.*(#)\s|\s.*(\*)\s|\s.*(\})\s')
string = "hello (you) "
for match in re.finditer(p, string):
    print(match.group())
This gives the output:
(you)
But what I am expecting is for the match to give an output list of the captured groups, like below:
["(",")"]
so that I can replace it with ' ' and have the desired output as
hello you
Input: Abnormal heart rate (with fever) should be monitored. Insert your <Name> here.
Output: Abnormal heart rate with fever should be monitored. Insert your Name here.
This answer assumes that you want to replace terms in parentheses or angle brackets with only the content inside them. That is:
(with fever) -> with fever
<Name> -> Name
We can try using re.sub here with a callback function:
inp = "Abnormal heart rate (with fever) should be monitored. Insert your <Name> here."
print(re.sub(r'\(.*?\)|<.*?>', lambda x: re.sub(r'[()<>]', '', x.group(0)), inp))
This prints:
Abnormal heart rate with fever should be monitored. Insert your Name here.
The logic here is that we selectively target the (...) and <...> terms using an alternation. Then we pass each match to a lambda callback, which strips the surrounding symbols and leaves just the content.
Just list all the characters you want to remove in a single character set, and use re.sub() to remove them.
print(re.sub(r'[\[\](){}<>#*]', '', string))
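The character-set approach can be sketched as follows. It shows both results the question asks about: the list of symbols that were found, and the string with them removed. The names `SYMBOLS`, `find_symbols`, and `strip_symbols` are hypothetical, and the set of symbols is taken from the question.

```python
import re

# Character class listing every symbol to strip; [ and ] are escaped inside it.
SYMBOLS = re.compile(r"[\[\](){}<>#*]")

def find_symbols(text):
    # Return each listed symbol found in the text, in order of appearance.
    return SYMBOLS.findall(text)

def strip_symbols(text):
    # Delete the symbols and keep everything else, including the enclosed text.
    return SYMBOLS.sub("", text)

print(find_symbols("hello (you) "))  # ['(', ')']
print(strip_symbols("Abnormal heart rate (with fever) should be monitored. Insert your <Name> here."))
# Abnormal heart rate with fever should be monitored. Insert your Name here.
```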
I think you can remove everything except A-Z, a-z, and the space character. If you also want to keep the digits 0-9, you can add them to the character class.
public class MyClass {
    public static void main(String args[]) {
        String string = "hello (you) hai";
        String result = string.replaceAll("[^A-Z a-z]", "");
        System.out.println(result);
    }
}
This will work, but note that it uses Java's replaceAll() rather than Python's re.sub().
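For comparison, here is a rough Python equivalent of the same idea, with the digits 0-9 added to the set of characters that are kept. The function name is hypothetical.

```python
import re

def keep_alphanumerics_and_spaces(text):
    # Delete every character that is not an ASCII letter, a digit, or a space.
    return re.sub(r"[^A-Za-z0-9 ]", "", text)

print(keep_alphanumerics_and_spaces("hello (you) hai"))  # hello you hai
print(keep_alphanumerics_and_spaces("room 42 (east)"))   # room 42 east
```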

Regex Query: Remove escape characters but retain punctuations from a string

I have a string that goes like - "\n\n\some text\t goes here. some\t\t other text goes here\b\n\n\n".
What I want - "some text goes here. some other text goes here."
Here is what I am doing: re.sub('[^A-Za-z0-9]+', ' ', s)
The problem is that this removes all the punctuation as well. How do I retain it?
Here's a solution that finds all the escape characters in your string and then removes them.
r = repr(s) # Convert escape sequences to literal backslashes
r = r[1:-1] # Remove the quote characters introduced by `repr`
escapes = set(re.findall(r'\\\w\d*', r)) # Get escape chars
answer = re.sub('|'.join(map(re.escape, escapes)), '', r) # Remove them
# \ome text goes here. some other text goes here
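A different technique, sketched here as an alternative rather than as the answer's method, is to skip `repr` and replace runs of whitespace and ASCII control characters directly. Punctuation is never touched because it is not in the character class. The function name is hypothetical, and the sample string is an assumption: it omits the stray backslash before "some" so that every escape in it is a valid Python escape.

```python
import re

def strip_control_chars(text):
    # Replace each run of whitespace or ASCII control characters
    # (tab, newline, backspace, ...) with a single space, then trim the ends.
    collapsed = re.sub(r"[\s\x00-\x1f\x7f]+", " ", text)
    return collapsed.strip()

s = "\n\nsome text\t goes here. some\t\t other text goes here\b\n\n\n"
print(strip_control_chars(s))
# some text goes here. some other text goes here
```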

python: Removing all kinds of quotation marks

I have the following string:
txt="Daniel's car é à muito esperto"
I am trying to remove all kinds of quotation marks.
I tried:
txt=re.sub(r"\u0022\u201C\u201D\u0027\u2019\u2018\u2019\u0060\u00B4\'\"", ' ', txt)
I expected:
"Daniel s car é à muito esperto"
but actually nothing is happening.
The regex does not work because it matches only one literal sequence: all of those characters, one after another, in exactly this order:
r"\u0022\u201C\u201D\u0027\u2019\u2018\u2019\u0060\u00B4\'\""
To fix that, one could use either alternation between the characters or a character set:
txt=re.sub(r"[\u0022\u201C\u201D\u0027\u2019\u2018\u2019\u0060\u00B4\'\"]", ' ', txt)
In Python 2 one might need to pass the re.UNICODE flag; in Python 3, str patterns are matched as Unicode by default. Untested.
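A minimal Python 3 sketch of the character-set fix, using the quotation marks listed in the question with the duplicates dropped. `QUOTES` and `replace_quotes` are hypothetical names. The pattern is a normal (non-raw) string, so Python itself expands the `\u` escapes before the regex is compiled.

```python
import re

# Straight quotes, backtick, acute accent, and curly single/double quotes.
QUOTES = re.compile("[\"'`\u00b4\u2018\u2019\u201c\u201d]")

def replace_quotes(text):
    # Replace every quotation mark in the class with a space.
    return QUOTES.sub(" ", text)

print(replace_quotes("Daniel's car é à muito esperto"))
# Daniel s car é à muito esperto
```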

How to determine if variable contains a specified string using RegEx

How can I write a condition that compares Recipient.AddressEntry with, for example, the string "I351" using RegEx?
Here is my If condition which works but is hardcoded to every known email address.
For Each recip In recips
    If recip.AddressEntry = "Dov John, I351" Then
        objMsg.To = "example#mail.domain"
        objMsg.CC = recip.Address
        objMsg.Subject = Msg.Subject
        objMsg.Body = Msg.Body
        objMsg.Send
    End If
Next
The reason I need this condition is that an email may include one of several colleagues from my team and one or more people from another team. The AddressEntry of my colleagues ends with I351, so I want to check whether the email contains one of my teammates.
For Each recip In recips
    If (recip.AddressEntry = "Dov John, I351" _
        Or recip.AddressEntry = "Vod Nohj, I351") Then
        objMsg.To = "example#mail.domain"
        objMsg.CC = recip.Address
        objMsg.Subject = Msg.Subject
        objMsg.Body = Msg.Body
        objMsg.Send
    End If
Next
You still didn't clarify exactly what the condition you want to use for matching is, so I'll do my best:
If you simply want to check if the string ends with "I351", you don't need regex, you can use something like the following:
If recip.AddressEntry Like "*I351" Then
    ' ...
End If
If you want to check if the string follows this format "LastName FirstName, I351", you can achieve that using Regex by using something like the following:
Dim regEx As New RegExp
regEx.Pattern = "^\w+\s\w+,\sI351$"
If regEx.Test(recip.AddressEntry) Then
    ' ...
End If
Explanation of the regex pattern:
' ^ Asserts position at the start of the string.
' \w Matches any word character.
' + Matches between one and unlimited times.
' \s Matches a whitespace character.
' \w+ Same as above.
' , Matches the character `,` literally.
' \s Matches a whitespace character.
' I351 Matches the string `I351` literally.
' $ Asserts position at the end of the string.
Hope that helps.
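To see which address entries the two checks accept, the same logic can be exercised in Python. This is only an illustrative port, not VBA: the function names are hypothetical, and the non-matching sample entries are made up for the demonstration.

```python
import re

# Same pattern as in the VBA answer: two words, a comma, a space, then I351.
TEAM_PATTERN = re.compile(r"^\w+\s\w+,\sI351$")

def ends_with_team_code(entry):
    # Counterpart of the VBA check: entry Like "*I351"
    return entry.endswith("I351")

def matches_name_format(entry):
    # Counterpart of regEx.Test(entry) with the pattern above.
    return TEAM_PATTERN.search(entry) is not None

for entry in ["Dov John, I351", "Vod Nohj, I351", "Dov John, I352", "Ann Mary Smith, I351"]:
    print(entry, ends_with_team_code(entry), matches_name_format(entry))
```

The last sample ends with I351 but has three name parts, so it passes the suffix check and fails the stricter format check.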

Matching multiple quoted strings in a single line with regex

I want to match quoted strings of the form 'a string' within a line. My issue comes with the fact that I may have multiple strings like this in a single line. Something like
result = functionCall('Hello', 5, 'World')
I can search for phrases bounded by quotes with ['].*['], and that picks up quoted strings just fine if there is a single one in a line. But with the above example it would find 'Hello', ', 5, ' and 'World', when I only actually want 'Hello' and 'World'. Obviously I need some way of knowing how many ' precede the currently found ', and of not trying to match when that number is odd.
Just to note, in my case strings are only defined using ', never ".
You should use [^']+ between the quotes, so that a match cannot run past the next quote:
var myString = "result = functionCall('Hello', 5, 'World')";
var parts = myString.match(/'[^']+'/g);
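The same pattern can be sketched in Python with `re.findall`. The function names are hypothetical. The second function adds a capture group, which is an assumption about what is usually wanted next: the string contents without the surrounding quotes.

```python
import re

def quoted_strings(line):
    # [^']+ cannot cross a quote, so each match ends at the nearest closing quote.
    return re.findall(r"'[^']+'", line)

def quoted_contents(line):
    # Same pattern with a capture group: findall returns only the captured text.
    return re.findall(r"'([^']+)'", line)

line = "result = functionCall('Hello', 5, 'World')"
print(quoted_strings(line))   # ["'Hello'", "'World'"]
print(quoted_contents(line))  # ['Hello', 'World']
```

Note that `+` requires at least one character between the quotes, so an empty string `''` is not matched; `*` would be needed to include it.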