Python: Regex matches in Windows but not in Linux 4.4.59+ - regex

I have this code, which tries to replace curly quotes with straight quotes:
quoteChars = [u'\u2018', u'\u2019']
pattern = u'({0})'.format('|'.join(quoteChars))
matched = re.search(pattern, myString) # match against whole string
if matched:
    self.log('SELF:: Search Query:: Replacing characters in string. Found one of these {0}'.format(pattern))
    myString = re.sub(pattern, "'", myString)
    self.log('SELF:: Amended Search Query [{0}]'.format(myString))
else:
    self.log('SELF:: Search Query:: String has none of these {0}'.format(pattern))
I set the variable myString to the following: (‘Pop‑Up’ Edition)
On Windows it correctly detects the curly apostrophes, but when I try it on a Mac which reports its OS as Linux 4.4.59+, the pattern does not match.
Do I have to write the regex pattern differently on Linux? And what are the rules for curly quotes, both single and double, opening and closing?

I'd use regex escapes (the re module understands \uXXXX inside patterns from Python 3.3 onward):
quoteChars = [r'\u2018', r'\u2019']
pattern = '|'.join(quoteChars)
Then
myString = re.sub(pattern, "'", myString)
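Since the question also asks about double quotes, here is a minimal sketch that covers all four curly quote characters with one character class. It assumes Python 3; the names QUOTE_MAP, CURLY_QUOTES and straighten_quotes are made up for illustration. The bytes check reflects one possible cause of the platform difference (the string arriving as UTF-8 bytes rather than text), which is only an assumption and is not confirmed by the question.

```python
import re

# Curly single and double quotes mapped to their straight equivalents.
QUOTE_MAP = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}

# A character class matching any one of the curly quotes above.
CURLY_QUOTES = re.compile("[" + "".join(QUOTE_MAP) + "]")


def straighten_quotes(text):
    """Replace curly quotes with straight ones (hypothetical helper)."""
    # Assumption: if the input arrives as bytes, it is UTF-8 encoded.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return CURLY_QUOTES.sub(lambda m: QUOTE_MAP[m.group(0)], text)


print(straighten_quotes("(\u2018Pop\u2011Up\u2019 Edition)"))
```

The pattern itself is the same on every platform; what can differ is whether the pattern and the subject string are both text (or both bytes) by the time they reach re.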

Related

searching a string substring using regex in python

Hello, I'm trying to check whether a string is made up only of characters from a given set, and return "yes" if it is.
For example: I have the string Deracu876, so the allowed characters are {D,d,e,E,r,R,A,a,c,C,u,U,8,7,6}, and these are the expected results:
deracu876 :yes
Deracu8762:no
Dderacu876 : yes
sNdAp725:no
Here is the code I wrote using regex, but it is not working:
import re
def match(text,pattern):
    # regex
    # searching pattern
    if re.search(pattern,text,re.IGNORECASE):
        return('Yes')
    else:
        return('No')
text=input()
pattern=""
for w in text :
    pattern=pattern+'|'+w
print(match("Deracu8762",pattern))
Your for loop is putting a | at the beginning of the pattern, e.g. if text is abc, the pattern is |a|b|c. This will match an empty string, which is a substring of every string.
You can simply wrap [] around the characters, e.g. [deracu876]. This matches any one of those characters.
You also need to make another pattern that rejects characters that aren't in text. You can do this by putting the characters in [^], e.g. [^deracu876].
import re
def match(text, substring):
    # Accept only if some allowed character occurs and no disallowed character occurs
    if re.search('[' + substring + ']', text, re.IGNORECASE) and not re.search('[^' + substring + ']', text, re.IGNORECASE):
        return "True"
    else:
        return "False"
text = input()
print(match("Deracu8762", text))
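The two searches above can also be folded into a single full-string match. This is only a sketch, assuming Python 3.7+ (where re.escape also escapes characters such as - and ] that are special inside a character class) and a non-empty set of allowed characters; only_allowed_chars is a made-up name.

```python
import re


def only_allowed_chars(text, allowed):
    """Return "yes" if every character of text occurs in allowed, ignoring case."""
    # Escape the allowed characters so that ], ^, - or \ cannot break the class.
    char_class = "[" + re.escape(allowed) + "]+"
    # fullmatch succeeds only if the whole of text consists of allowed characters.
    return "yes" if re.fullmatch(char_class, text, re.IGNORECASE) else "no"


for candidate in ["deracu876", "Deracu8762", "Dderacu876", "sNdAp725"]:
    print(candidate, ":", only_allowed_chars(candidate, "Deracu876"))
```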

Getting words Starting with symbol in dart

I'm trying to parse long strings containing hashtags in Dart. So far I have tried various regex combinations, but I cannot find the right one.
My code is
String mytestString = "#one #two, #three#FOur,#five";
RegExp regExp = new RegExp(r"/(^|\s)#\w+/g");
print(regExp.allMatches(mytestString).toString());
The desired output would be a list of hashtags
#one #two #three #FOur #five
Thank you in advance
You should not use a regex literal inside a string literal, or the slashes and flags will become part of the regex pattern. Also, omit the left-hand boundary pattern (the one that matches start of string or whitespace) if you need to match # followed by 1+ word chars in any context.
Use
String mytestString = "#one #two, #three#FOur,#five";
final regExp = new RegExp(r"#\w+");
Iterable<String> matches = regExp.allMatches(mytestString).map((m) => m[0]);
print(matches);
Output: (#one, #two, #three, #FOur, #five)
String mytestString = "#one #two, #three#FOur,#five";
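RegExp regExp = new RegExp(r"(#\w+)");
print(regExp.allMatches(mytestString).map((m) => m.group(1)).toList());
This should match all of the hashtags, placing them into capture groups for you to later use. Note that the pattern is written without the /.../g delimiters, since allMatches already returns every match.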
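For comparison, the same pattern behaves the same way outside Dart. A minimal Python sketch (extract_hashtags is a made-up name) using the bare #\w+ pattern, with no delimiters or flags inside the pattern string:

```python
import re


def extract_hashtags(text):
    """Return every '#' followed by one or more word characters, in order."""
    # No leading boundary, so '#three#FOur' yields two separate hashtags.
    return re.findall(r"#\w+", text)


print(extract_hashtags("#one #two, #three#FOur,#five"))
```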

VBScript RegEx - match between words

I'm having a hard time coming up with a RegEx that works in VBScript. I'm trying to match all text between 2 keywords:
(?<=key)(.*)(?=Id)
This throws a RegEx error in VBScript.
Blob I'm matching against:
\"key\":[\"food\",\"real\",\"versus\",\"giant\",\"giant gummy\",\"diy candy\",\"candy\",\"gummy worm\",\"pizza\",\"fries\",\"spooky diy science\",\"spooky\",\"trapped\"],\"Id\"
Ideally, I'd end up with a comma delimited list like this:
food,real,versus,giant,giant gummy,diy candy,candy,gummy worm,pizza,fries,spooky diy science,spooky,trapped
but, I'd settle for all text between 2 keywords working in VBScript.
Thanks in advance!
VBScript's regular expression engine doesn't support lookbehind assertions, so you'll want to do something like this instead:
s = "\""key\"":[\""food\"",\""real\"",\""trapped\""],\""Id\"""
'remove backslashes and double quotes from string
s1 = Replace(s, "\", "")
s1 = Replace(s1, Chr(34), "")
Set re = New RegExp
re.Pattern = "key:\[(.*?)\],Id"
For Each m In re.Execute(s1)
  list = m.Submatches(0)
Next
WScript.Echo list
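The same idea (capture the middle instead of asserting what comes before it) can be sketched in Python. This is an illustration only, not a VBScript replacement; extract_key_list is a made-up name, and the blob is assumed to have the escaped-JSON shape shown in the question.

```python
import re


def extract_key_list(blob):
    """Return the comma-delimited text between 'key' and 'Id', or None."""
    # Remove backslashes and double quotes, as the VBScript answer does.
    cleaned = blob.replace("\\", "").replace('"', "")
    # A capture group stands in for the unsupported lookbehind.
    match = re.search(r"key:\[(.*?)\],Id", cleaned)
    return match.group(1) if match else None


sample = r'\"key\":[\"food\",\"real\",\"giant gummy\"],\"Id\"'
print(extract_key_list(sample))
```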

How to mimic regular Expression negative lookbehind?

What I'm trying to accomplish
I'm trying to create a function to use string interpolation within VBA. The issue I'm having is that I'm not sure how to replace "\n" with a vbNewLine only when it is not preceded by the escape character "\".
What I have found and tried
As far as I can tell, the VBScript regex engine does not support negative lookbehind.
Below are two patterns that I have already tried:
Private Sub testingInjectFunction()
    Dim dict As New Scripting.Dictionary
    dict("test") = "Line"
    Debug.Print Inject("${test}1\n${test}2 & link: C:\\notes.txt", dict)
End Sub
Public Function Inject(ByVal source As String, dict As Scripting.Dictionary) As String
    Inject = source
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True
    ' PATTERN # 1 REPLACES ALL '\n'
    'regEx.Pattern = "\\n"
    ' PATTERN # 2 REPLACES EXTRA CHARACTER AS LONG AS IT IS NOT '\'
    regEx.Pattern = "[^\\]\\n"
    ' REGEX REPLACE
    Inject = regEx.Replace(Inject, vbNewLine)
    ' REPLACE ALL '${dICT.KEYS(index)}' WITH 'dICT.ITEMS(index)' VALUES
    Dim index As Integer
    For index = 0 To dict.Count - 1
        Inject = Replace(Inject, "${" & dict.Keys(index) & "}", dict.Items(index))
    Next index
End Function
Desired result
Line1
Line2 & link: C:\notes.txt
Result for Pattern # 1: (Replaces when not wanted)
Line1
Line2 & link: C:\
otes.txt
Result for Pattern # 2: (Replaces the 1 in 'Line1')
Line
Line2 & link: C:\\notes.txt
Summary question
I can easily write code that doesn't use Regular Expressions that can achieve my desired goal but want to see if there is a way with Regular Expressions in VBA.
How can I use Regular Expressions in VBA to Replace "\n" with a vbNewLine, as long as it does not have the escape character "\" before it?
Yes, you may use a regex here. Since the backslash is not used to escape itself in these strings, you may modify your solution like this:
regEx.Pattern = "(^|[^\\])\\n"
S = regEx.Replace(S, "$1" & vbNewLine)
It will match and capture any char but \ before \n and then will put it back with the $1 placeholder. As there is a chance that \n appears at the start of the string, ^ - the start of string anchor - is added as an alternative into the capturing group.
Pattern details
(^|[^\\]) - Capturing group 1: start of string (^) or (|) any char but a backslash ([^\\])
\\ - a backslash
n - the letter n.
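To see how the capture-group emulation compares with a real negative lookbehind, here is a minimal Python sketch (Python's re does support lookbehind; both function names are made up). One difference worth knowing: the emulation consumes the preceding character, so of two back-to-back \n sequences only the first is replaced in a single pass.

```python
import re


def replace_newline_capture(source):
    """Emulation: capture the preceding character (or start of string) and put it back."""
    return re.sub(r"(^|[^\\])\\n", lambda m: m.group(1) + "\n", source)


def replace_newline_lookbehind(source):
    """Reference: a native negative lookbehind, unavailable in VBScript."""
    return re.sub(r"(?<!\\)\\n", "\n", source)


sample = r"Line1\nLine2 & link: C:\\notes.txt"
print(replace_newline_capture(sample))
print(replace_newline_lookbehind(sample))
```

On the sample string both versions give the same text; they diverge only on inputs such as a\n\nb.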

Escaping dollars groovy

I'm having trouble escaping double dollars from a string to be used with regex functions pattern/matcher.
This is part of the String:
WHERE oid_2 = $$test$$ || oid_2 = $$test2$$
and this is the closest code I've tried to get near the solution:
List<String> strList = new ArrayList<String>();
Pattern pattern = Pattern.compile("\$\$.*?\$\$");
log.debug("PATTERN: "+pattern)
Matcher matcher = pattern.matcher(queryText);
while (matcher.find()) {
strList.add(matcher.group());
}
log.debug(strList)
This is the debug output I get:
- PATTERN: $$.*?$$
- []
So the pattern is actually right, but the placeholders are not found in the string.
As a test I've tried to replace "$$test$$" with "XXtestXX" and everything works perfectly. What am I missing? I've tried "/$" strings, "\\" but still have no solution.
Note that a $ in regex matches the end of the string. To use it as a literal $ symbol, you need to escape it with a literal backslash.
You used "\$\$.*?\$\$", which Groovy turns into the literal string $$.*?$$. As a regex, that means two end-of-string positions, any 0+ chars as few as possible, and then two more end-of-string positions, which makes little sense. You need one backslash to escape the $ that Groovy uses to inject variables into a double-quoted string literal, plus two backslashes to produce the literal backslash that the regex engine needs: "\\\$\\\$.*?\\\$\\\$".
However, when you work with regex, slashy strings are quite helpful since all you need to escape a special char is a single backslash.
Here is a sample code extracting all matches from the string you have in Groovy:
def regex = /\$\$.*?\$\$/;
def s = 'WHERE oid_2 = $$test$$ || oid_2 = $$test2$$'
def m = s =~ regex
(0..<m.count).each { print m[it] + '\n' }
Anyone who gets here might like to know another answer to this, if you want to use Groovy slashy strings:
myComparisonString ==~ /.*something costs [$]stuff.*/
I couldn't find another way of putting a $ in a slashy string, at least if the $ is to be followed by text. If, conversely, it is followed by a number (or presumably any non-letter), this will work:
myComparisonString ==~ /.*something costs \$100.*/
... the trouble being, of course, that the GString "compiler" (if that's its name) would recognise "$stuff" as an interpolated variable.
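The regex side of this (a bare $ is an end-of-string anchor, an escaped \$ is a literal dollar sign) is not specific to Groovy. A minimal Python sketch, where a raw string plays roughly the role of the slashy string and find_dollar_quoted is a made-up name:

```python
import re


def find_dollar_quoted(query_text):
    """Return every $$...$$ placeholder, matched as lazily as possible."""
    # Each \$ is a literal dollar sign; an unescaped $ would be an anchor.
    return re.findall(r"\$\$.*?\$\$", query_text)


print(find_dollar_quoted("WHERE oid_2 = $$test$$ || oid_2 = $$test2$$"))
```

Python has no string interpolation in plain or raw string literals, so only the regex-level escaping is needed here; Groovy double-quoted strings add the second layer described above.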