Finding single escaped characters - regex

I would like to replace a special character (here #) in a given text wherever it is not escaped. Here is what I've tried:
import re
_RE_SPECIAL_CHARS = re.compile(r"(?:[^#\\]|\\.)+#")
text = r"ok#\#.py"
search = re.search(_RE_SPECIAL_CHARS, text)
print(text)
if search:
    print(_RE_SPECIAL_CHARS.sub("<star>", text))
else:
    print('<< NOTHING FOUND ! >>')
This prints:
ok#\#.py
<star>\#.py
What I need instead is ok<star>\#.py.

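You can use a lookbehind and match just the special character. Python's re module only accepts fixed-width lookbehinds, so the two alternatives [^#\\] (one character) and \\. (two characters) have to go into separate lookbehinds:
re.compile(r"(?:(?<=[^#\\])|(?<=\\.))#")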
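A minimal sketch of the lookbehind approach, assuming the goal is to replace every # that is not preceded by an escaping backslash. The name UNESCAPED_HASH is illustrative only.

```python
import re

# A "#" counts as unescaped if the character before it is an ordinary character,
# or if the two characters before it form an escape pair (backslash + any char).
# Python's re needs fixed-width lookbehinds, hence two separate assertions.
UNESCAPED_HASH = re.compile(r"(?:(?<=[^#\\])|(?<=\\.))#")

text = r"ok#\#.py"
result = UNESCAPED_HASH.sub("<star>", text)
print(result)  # ok<star>\#.py

# An escaped backslash followed by "#" leaves that "#" unescaped.
print(UNESCAPED_HASH.sub("<star>", r"a\\#b"))  # a\\<star>b
```

Like the original pattern, this only looks back one or two characters, so a # at the very start of the string, or after a longer run of backslashes such as \\\#, is not classified correctly.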
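Or you can capture the part before # in group 1 and replace the match with \1<star>:
re.compile(r"((?:[^#\\]|\\.)+)#")
and
print(_RE_SPECIAL_CHARS.sub(r"\1<star>", text))
The replacement must be a raw string: in a normal string literal, "\1" is the control character chr(1), not a backreference.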
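A small sketch of the capture-group version, also showing what goes wrong when the replacement is written as a plain string instead of a raw one.

```python
import re

# Group 1 is the run of ordinary characters and escape pairs in front of an
# unescaped "#"; the "#" itself is consumed and replaced.
_RE_SPECIAL_CHARS = re.compile(r"((?:[^#\\]|\\.)+)#")

text = r"ok#\#.py"
result = _RE_SPECIAL_CHARS.sub(r"\1<star>", text)
print(result)  # ok<star>\#.py

# In a plain string "\1" is chr(1), so group 1 is lost and a control char is inserted.
broken = _RE_SPECIAL_CHARS.sub("\1<star>", text)
print(repr(broken))
```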

Related

Searching a string for substrings using regex in Python

Hello, I'm trying to check whether a string is made up only of the characters of another string, and return "yes" if it is.
For example, I have the string Deracu876, whose characters are {D,d,e,E,r,R,A,a,c,C,u,U,8,7,6}, so these are the expected results:
deracu876: yes
Deracu8762: no
Dderacu876: yes
sNdAp725: no
Here is the code I wrote using regex, but it is not working:
import re
def match(text,pattern):
    # regex
    # searching pattern
    if re.search(pattern,text,re.IGNORECASE):
        return('Yes')
    else:
        return('No')
text=input()
pattern=""
for w in text :
    pattern=pattern+'|'+w
print(match("Deracu8762",pattern))
Your for loop is putting a | at the beginning of the pattern, e.g. if text is abc, the pattern is |a|b|c. This will match an empty string, which is a substring of every string.
You can simply wrap [] around the characters, e.g. [deracu876]. This matches any one of those characters.
You also need to make another pattern that rejects characters that aren't in text. You can do this by putting the characters in [^], e.g. [^deracu876].
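import re
def match(text, substring):
    # Escape the allowed characters so ], ^, \ or - can't break the character class
    chars = re.escape(substring)
    # At least one allowed character, and no character outside the allowed set
    if re.search('[' + chars + ']', text, re.IGNORECASE) and not re.search('[^' + chars + ']', text, re.IGNORECASE):
        return "True"
    else:
        return "False"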
text = input()
print(match("Deracu8762",text))
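A compact alternative sketch, swapping in re.fullmatch instead of the two searches and returning the question's yes/no: a single character class with + accepts the candidate only if every one of its characters is allowed. The helper name only_uses_chars is hypothetical.

```python
import re

def only_uses_chars(candidate: str, allowed: str) -> str:
    """Return "yes" if every character of candidate occurs in allowed, ignoring case."""
    # re.escape keeps characters such as ], ^, \ or - from breaking the class.
    # Assumes allowed is non-empty ("[]+" would be an invalid pattern).
    pattern = "[" + re.escape(allowed) + "]+"
    return "yes" if re.fullmatch(pattern, candidate, re.IGNORECASE) else "no"

for candidate in ["deracu876", "Deracu8762", "Dderacu876", "sNdAp725"]:
    print(candidate, only_uses_chars(candidate, "Deracu876"))
```

Because the + requires at least one character, an empty candidate is rejected, just like the first search in the answer above.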

Regex to capture hyphenated words separated by new line character

I have a pattern such as word-\nword, i.e. words that are hyphenated and split by a newline character.
I would like the output to be word-word, but I get word-\nword with the code below.
import re
text_string = "word-\nword"
result=re.findall("[A-Za-z]+-\n[A-Za-z]+", text_string)
print(result)
I tried this, but it did not work; I get no result.
text_string = "word-\nword"
result=re.findall("[A-Za-z]+-(?=\n)[A-Za-z]+", text_string)
print(result)
How can I achieve this?
Thank you!
Edit:
Would it be efficient to do a replace and then run a simple regex:
text_string = "aaa bbb ccc-\nddd eee fff"
replaced_text = text_string.replace('-\n', '-')
result = re.findall(r"\w+-\w+", replaced_text)
print(result)
or use the method suggested by CertainPerformance:
text_string = "word-\nword"
result = re.sub(r"(?i)(\w+)-\n(\w+)", r'\1-\2', text_string)
print(result)
You should use re.sub instead of re.findall:
result = re.sub(r"(?<=-)\n+", "", text_string)
This matches any run of newlines that follows a - and replaces it with an empty string.
You can alternatively use
(?<=-)\n(?=\w)
which matches a newline only if it is preceded by a - and followed by a word character.
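A quick sketch comparing the two lookbehind patterns on the question's sample and on an illustrative string (made up here) where a hyphen is followed by a blank line.

```python
import re

text_string = "aaa bbb ccc-\nddd eee fff"

# Delete newlines that directly follow a hyphen.
joined = re.sub(r"(?<=-)\n+", "", text_string)
print(joined)  # aaa bbb ccc-ddd eee fff

# The two patterns differ when a hyphen is followed by a blank line.
sample = "x-\n\ny-\nz"
loose = re.sub(r"(?<=-)\n+", "", sample)        # removes the whole run of newlines
strict = re.sub(r"(?<=-)\n(?=\w)", "", sample)  # only a single newline directly before a word char
print(repr(loose))   # 'x-y-z'
print(repr(strict))  # 'x-\n\ny-z'
```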
If the string is composed of just that, a pure regex solution is to use re.sub, capture the first word and the second word in separate groups, then echo those two groups back joined by a dash (dropping the newline):
result = re.sub(r"(?i)([a-z]+)-\n([a-z]+)", r'\1-\2', text_string)
Otherwise, if there is other stuff in the string, iterate over each match and join the groups with a dash:
text_string = "wordone-\nwordtwo wordthree-\nwordfour"
result = re.findall(r"(?i)([a-z]+)-\n([a-z]+)", text_string)
for match in result:
    print('-'.join(match))
You can simply replace any occurrences of '-\n' with '-' instead:
result = text_string.replace('-\n', '-')
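On the question's edit (replace first, or match across the newline directly), a small sketch suggests both give the same words on the sample, but they diverge once the text already contains ordinary hyphenated words. The string mixed is a made-up example.

```python
import re

text_string = "wordone-\nwordtwo wordthree-\nwordfour"

# Approach 1: normalize "-\n" to "-" first, then find hyphenated words.
via_replace = re.findall(r"\w+-\w+", text_string.replace("-\n", "-"))

# Approach 2: match across the newline and rebuild each word from its two groups.
via_groups = ["-".join(pair) for pair in re.findall(r"(\w+)-\n(\w+)", text_string)]

print(via_replace)  # ['wordone-wordtwo', 'wordthree-wordfour']
print(via_groups)   # ['wordone-wordtwo', 'wordthree-wordfour']

# With an existing same-line hyphenated word, only approach 1 picks it up.
mixed = "foo-bar baz-\nqux"
mixed_replace = re.findall(r"\w+-\w+", mixed.replace("-\n", "-"))
mixed_groups = ["-".join(p) for p in re.findall(r"(\w+)-\n(\w+)", mixed)]
print(mixed_replace)  # ['foo-bar', 'baz-qux']
print(mixed_groups)   # ['baz-qux']
```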

Regexp to extract studyinstanceuid from dump

I need to capture numbers and dots between brackets on lines containing the string 0020,000d, for example:
I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID
Using this regexp, 0020,000d.*\[([\.0-9]+)\], I can match the needed value only if it doesn't have a space inside the brackets. How can I match the needed value while ignoring any other character?
Edit
If I use this regexp, 0020,000d.*\[([\.0-9(\s|^\s))]+)\], I can capture numbers, dots and/or spaces. Now, if the string contains a space, how can I capture in a group everything but the space?
To clarify, I want to extract the 1.2.410.200001.1104.20160720104648421 string.
Codifying my (apparently helpful) answer from the comments:
You just need to allow zero or more spaces after the numbers-and-dots sequence before the closing bracket:
0020,000d.*\[([.0-9]+) *\]
Also, please note that you don't need to escape a dot in a character class.
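A short Python sketch applying that pattern with re.search to the sample line; UID_RE is an illustrative name, and the second input is a made-up value without the trailing space.

```python
import re

# The answer's pattern: the UID (digits and dots), then optional spaces, then "]".
UID_RE = re.compile(r"0020,000d.*\[([.0-9]+) *\]")

line = "I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID"
m = UID_RE.search(line)
uid = m.group(1) if m else None
print(uid)  # 1.2.410.200001.1104.20160720104648421

# The same pattern also works when there is no trailing space.
print(UID_RE.search("(0020,000d) UI [1.2.3]").group(1))  # 1.2.3
```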
Try this (the lookahead requires optional whitespace followed by the closing bracket):
let regex = /[.\d]+(?=\s*\])/g
let str = 'I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ]'
let result = str.match(regex);
console.log(result);

Surrounding one group with special characters using substitute in Vim

Given string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change here is the ns( ... ) call that surrounds the quoted string following inputId.
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when I run it, it produces an error:
E488: Trailing characters
A simpler version of that regex works. The command
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
correctly inserts ns( after inputId = and creates the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, e.g. "select_something"), and return it as:
"select_something")).
You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+, and you'll also need to match underscores. I would recommend \w\+, unless characters outside [a-zA-Z0-9_] might be in the string, in which case I would use the non-greedy .\{-}.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the start-of-match atom, \zs:
:%s/inputId = \zs\".\{-}"/ns(&)/cgI
You can use a negated character class "[^"]*" to match a quoted string:
:%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g
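Vim can't be exercised here, so as a hedged illustration the same negated-class idea is translated to Python's re.sub (Vim's \(...\) groups become plain (...) groups). It is a sketch of the regex logic, not a Vim command.

```python
import re

source = '''some_function(inputId = "select_something"),
some_other_function(inputId = "some_other_label")'''

# Group 1 is "inputId = "; group 2 is the quoted string, matched with [^"]*
# so it cannot run past the closing quote.
result = re.sub(r'(inputId\s*=\s*)("[^"]*")', r'\1ns(\2)', source)
print(result)
```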

Incorrect use of regex wildcards

Is this not a correct use of wildcards? I'm attempting to match a String that contains a date. I don't want to include the date in the returned String, or the text that precedes the matched String.
object FindText extends App {
  val toFind = "find1"
  val line = "this is find1 the line 1 \n 21/03/2015"
  val find = (toFind + ".*\\d{2}/\\d{2}/\\d{4}").r
  println(find.findFirstIn(line))
}
The output should be "find1 the line 1 \n ",
but the String is not found.
Dot does not match newline characters by default. You can set the DOTALL flag, (?s), to change that. I have also added a positive lookahead, (?=...), since you did not want the date to be included in the match:
val find = (toFind+"""(?s).*(?=\d{2}/\d{2}/\d{4})""").r
(Note also that in Scala you do not need to escape backslashes in strings enclosed in triple quotes, which is pretty neat.)
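The added examples on this page use Python, so this is a Python sketch of the same idea (DOTALL plus a lookahead) using re.DOTALL, the counterpart of the (?s) inline flag in Java/Scala regexes. Variable names are illustrative.

```python
import re

to_find = "find1"
line = "this is find1 the line 1 \n 21/03/2015"
date_ahead = r"(?=\d{2}/\d{2}/\d{4})"  # lookahead: a date must follow, but is not consumed

# re.DOTALL lets "." cross the newline, so the match can reach the date.
with_dotall = re.search(re.escape(to_find) + r".*" + date_ahead, line, re.DOTALL)
print(repr(with_dotall.group(0)) if with_dotall else None)  # 'find1 the line 1 \n '

# Without DOTALL, ".*" stops at "\n" and the date is never reached.
without_dotall = re.search(re.escape(to_find) + r".*" + date_ahead, line)
print(without_dotall)  # None
```

Because .* is greedy, with several dates in the text the match would run up to the last one; .*? would stop before the first.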
The problem lies with the newline in the test string: .* does not match newlines by default. Replacing the .* with .*\\n?.* should fix it for a single line break. One could also use the DOTALL flag, (?s), in the regex, such as:
val find = ("(?s)"+toFind+".*\\d{2}/\\d{2}/\\d{4}").r