Finding single escaped characters

I would like to replace some special characters in a given text, but only where they are not escaped. Here is what I've tried:
import re

_RE_SPECIAL_CHARS = re.compile(r"(?:[^#\\]|\\.)+#")
text = r"ok#\#.py"
search = re.search(_RE_SPECIAL_CHARS, text)
print(text)
if search:
    print(_RE_SPECIAL_CHARS.sub("<star>", text))
else:
    print('<< NOTHING FOUND ! >>')
This prints:
ok#\#.py
<star>\#.py
What I need to have instead is ok<star>\#.py.

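You can use lookbehind and just match the special character. Python's re module requires every lookbehind to be fixed-width, so the one-character and two-character alternatives have to go in separate lookbehinds:
re.compile(r"(?:(?<=[^#\\])|(?<=\\.))#")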
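Or you can capture the part before # in group 1 and replace with \1<star>:
_RE_SPECIAL_CHARS = re.compile(r"((?:[^#\\]|\\.)+)#")
and
print(_RE_SPECIAL_CHARS.sub(r"\1<star>", text))
The replacement must be a raw string. In a normal string literal "\1" is the control character chr(1), not a backreference.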
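Both approaches from the answer can be checked side by side on the question's input. This is a minimal sketch. It assumes # is the only special character and that a backslash escapes whatever character follows it. The names LOOKBEHIND and CAPTURE are illustrative only.

```python
import re

TEXT = r"ok#\#.py"

# Approach 1: two separate fixed-width lookbehinds. Python's re rejects a single
# lookbehind whose alternatives have different widths (1 vs 2 characters here).
LOOKBEHIND = re.compile(r"(?:(?<=[^#\\])|(?<=\\.))#")

# Approach 2: capture everything before the # and put it back with \1.
CAPTURE = re.compile(r"((?:[^#\\]|\\.)+)#")

via_lookbehind = LOOKBEHIND.sub("<star>", TEXT)
via_capture = CAPTURE.sub(r"\1<star>", TEXT)  # raw string, so \1 is a backreference

print(via_lookbehind)  # ok<star>\#.py
print(via_capture)     # ok<star>\#.py
```

The lookbehind version only consumes the # itself. That makes it the simpler choice when the text before the # must stay untouched.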

Related

searching a string substring using regex in python

Hello, I'm trying to check whether a string is made up only of the characters of another string, and return "yes" if it is.
For example, I have the string Deracu876, whose characters are {D,d,e,E,r,R,A,a,c,C,u,U,8,7,6}, so here is the expected result:
deracu876: yes
Deracu8762: no
Dderacu876: yes
sNdAp725: no
Here is the code I wrote using regex, but it is not working:
import re

def match(text, pattern):
    # regex
    # searching pattern
    if re.search(pattern, text, re.IGNORECASE):
        return 'Yes'
    else:
        return 'No'

text = input()
pattern = ""
for w in text:
    pattern = pattern + '|' + w
print(match("Deracu8762", pattern))

Your for loop is putting a | at the beginning of the pattern, e.g. if text is abc, the pattern is |a|b|c. This will match an empty string, which is a substring of every string.
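The question's loop can be run on a short input to see this. Here "abc" and "xyz" are arbitrary sample strings chosen for illustration.

```python
import re

pattern = ""
for w in "abc":
    pattern = pattern + "|" + w

print(repr(pattern))  # '|a|b|c'

# The leading empty alternative matches the empty string at position 0,
# so re.search succeeds even on text that shares no characters with "abc".
m = re.search(pattern, "xyz")
print(m is not None, repr(m.group()))  # True ''
```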
You can simply wrap [] around the characters, e.g. [deracu876]. This matches any of those characters.
You also need to make another pattern that rejects characters that aren't in text. You can do this by putting the characters in [^], e.g. [^deracu876].
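import re

def match(text, substring):
    # The first search needs at least one allowed character; the second
    # looks for any character that is NOT allowed, which must not exist.
    if re.search('[' + substring + ']', text, re.IGNORECASE) and not re.search('[^' + substring + ']', text, re.IGNORECASE):
        return "Yes"
    else:
        return "No"

text = input()
print(match("Deracu8762", text))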
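As an alternative to the two searches, a single anchored character class may be enough. This is a sketch, not the answer's method. It uses re.fullmatch so that one character outside the allowed set makes the match fail. The helper name is_made_of is hypothetical, and the helper assumes the allowed string is non-empty.

```python
import re

def is_made_of(candidate: str, allowed: str) -> bool:
    """Hypothetical helper: True if every character of `candidate` appears
    in `allowed`, ignoring case. Assumes `allowed` is non-empty."""
    # re.escape keeps characters such as ] ^ - \ literal inside the class.
    char_class = "[" + re.escape(allowed) + "]"
    # fullmatch anchors the pattern to the whole string, so a single
    # character outside the class makes the match fail.
    return re.fullmatch(char_class + "+", candidate, re.IGNORECASE) is not None

for word in ["deracu876", "Deracu8762", "Dderacu876", "sNdAp725"]:
    print(word, "yes" if is_made_of(word, "Deracu876") else "no")
```

This prints yes, no, yes, no, matching the expected results in the question. Using re.escape also removes the risk that a character like ] in the input breaks the class.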

Regex to capture hyphenated words separated by new line character

I have a pattern such as word-\nword, i.e. words are hyphenated and separated by a newline character.
I would like the output as word-word. I get word-\nword with the below code.
text_string = "word-\nword"
result=re.findall("[A-Za-z]+-\n[A-Za-z]+", text_string)
print(result)
I tried this, but it did not work; I get no result:
text_string = "word-\nword"
result=re.findall("[A-Za-z]+-(?=\n)[A-Za-z]+", text_string)
print(result)
How can I achieve this?
Thank you!
Edit:
Would it be more efficient to do a replace and then run a simple regex:
text_string = "aaa bbb ccc-\nddd eee fff"
replaced_text = text_string.replace('-\n', '-')
result = re.findall(r"\w+-\w+", replaced_text)
print(result)
or to use the method suggested by CertainPerformance?
text_string = "word-\nword"
result = re.sub(r"(?i)(\w+)-\n(\w+)", r'\1-\2', text_string)
print(result)

You should use re.sub instead of re.findall:
result = re.sub(r"(?<=-)\n+", "", text_string)
This matches one or more newlines that follow a - and replaces them with an empty string.
You can alternatively use
(?<=-)\n(?=\w)
which matches a newline only if it is preceded by a - and followed by a word character.
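Both lookbehind patterns can be tried on the sample string from the question's edit. This is a minimal sketch. The extra "end-\n" input is an assumed edge case that shows where the two patterns differ.

```python
import re

text_string = "aaa bbb ccc-\nddd eee fff"

# Remove one or more newlines that directly follow a hyphen.
joined = re.sub(r"(?<=-)\n+", "", text_string)

# Stricter variant: a single newline, and only if a word character follows.
joined_strict = re.sub(r"(?<=-)\n(?=\w)", "", text_string)

print(joined)                          # aaa bbb ccc-ddd eee fff
print(re.findall(r"\w+-\w+", joined))  # ['ccc-ddd']

# A trailing "-\n" with nothing after it is left alone by the strict variant.
print(repr(re.sub(r"(?<=-)\n(?=\w)", "", "end-\n")))  # 'end-\n'
```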

If the string is composed of just that, then a pure regex solution is to use re.sub: capture the first word and the second word each in its own group, then echo those two groups back joined by a dash, dropping the newline:
result = re.sub(r"(?i)([a-z]+)-\n([a-z]+)", r'\1-\2', text_string)
Otherwise, if there is other stuff in the string, iterate over each match and join the groups with a dash:
text_string = "wordone-\nwordtwo wordthree-\nwordfour"
result = re.findall(r"(?i)([a-z]+)-\n([a-z]+)", text_string)
for match in result:
    print('-'.join(match))

You can simply replace any occurrences of '-\n' with '-' instead:
result = text_string.replace('-\n', '-')

Regexp to extract studyinstanceuid from dump

I need to capture numbers and dots between brackets on lines containing the string 0020,000d, for example:
I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID
Using this regexp 0020,000d.*\[([\.0-9]+)\] I can match the needed value only if it doesn't have a space inside the brackets. How can I match the needed value while ignoring any other character?
Edit
If I use this regexp 0020,000d.*\[([\.0-9(\s|^\s))]+)\] I can capture numbers, dots and/or spaces. Now, if the string contains a space, how can I capture in a group everything but the space?
To clarify, I want to extract the 1.2.410.200001.1104.20160720104648421 string.

Codifying my (apparently helpful) answer from the comments:
You just need to allow zero or more spaces after the numbers-and-dots sequence before the closing bracket:
0020,000d.*\[([.0-9]+) *\]
Also, please note that you don't need to escape a dot in a character class.
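The answer's pattern can be checked against the sample dump line in Python. This sketch assumes the dump is processed one line at a time, so .* never has to cross a newline.

```python
import re

line = "I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID"

# Digits and dots go into group 1; " *" absorbs any trailing spaces before "]".
pattern = r"0020,000d.*\[([.0-9]+) *\]"
m = re.search(pattern, line)
print(m.group(1) if m else None)  # 1.2.410.200001.1104.20160720104648421
```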

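Try this JavaScript. The lookahead requires optional whitespace and then the closing bracket after the run of digits and dots:
let regex = /[.\d]+(?=\s*\])/g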
let str = 'I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ]'
let result = str.match(regex);
console.log(result);

Surrounding one group with special characters using substitute in Vim

Given the string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change here is the ns( ... ) call that wraps the quoted string after inputId =.
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when deployed, it produces an error:
E488: Trailing characters
A simpler version of that regex works; the command
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
correctly inserts ns( after finding inputId = and produces the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, e.g. "select_something"), and return it as:
"select_something")).

You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+, and you'll also need to match underscores. I would recommend \w\+. If more than [a-zA-Z_] might be in the string, I would use .\{-} instead.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the start-of-match atom \zs:
:%s/inputId = \zs\".\{-}"/ns(&)/cgI

You can use a negated character class "[^"]*" to match a quoted string:
%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g
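For comparison outside Vim, here is the same negated-class substitution written with Python's re.sub. It is a transliteration for illustration, not a Vim command. Python writes groups as (...) where Vim's default magic mode needs \(...\), and it uses \1 in the replacement just as Vim does. The two-line input is taken from the question, minus the (...) placeholder.

```python
import re

source = '''some_function(inputId = "select_something"),
some_other_function(inputId = "some_other_label")'''

# Group 1: "inputId =" with surrounding spaces; group 2: the quoted string.
# [^"]* stops at the first closing quote, like the Vim pattern above.
result = re.sub(r'(inputId\s*=\s*)("[^"]*")', r'\1ns(\2)', source)
print(result)
```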

Incorrect use of regex wildcards

Is this not a correct use of wildcards? I'm attempting to match a String that contains a date. I don't want to include the date in the returned String, or the text that precedes the matched String.
object FindText extends App{
val toFind = "find1"
val line = "this is find1 the line 1 \n 21/03/2015"
val find = (toFind+".*\\d{2}/\\d{2}/\\d{4}").r
println(find.findFirstIn(line))
}
Output should be: "find1 the line 1 \n "
but the String is not found.

Dot does not match newline characters by default. You can set the DOTALL flag, (?s), to make it match them. I have also added a positive lookahead, the (?=...) part, since you did not want the date to be included in the match:
val find = (toFind+"""(?s).*(?=\d{2}/\d{2}/\d{4})""").r
(Note also that in Scala you do not need to escape backslashes in strings enclosed in triple quotes, because escape sequences are not processed there.)

The problem lies with the newline in the test string: . does not match newlines by default. Replacing .* with .*\\n?.* should fix it when there is a single line break. One could also use the DOTALL flag, (?s), in the regex, such as:
val find = ("(?s)"+toFind+".*\\d{2}/\\d{2}/\\d{4}").r
Note that this version still includes the date in the match.
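For readers doing the same in Python, here is a sketch of the first answer's DOTALL-plus-lookahead idea. One difference from Scala/Java: Python only accepts a global inline flag such as (?s) at the very start of the pattern (elsewhere it is an error since Python 3.11). This sketch therefore passes re.DOTALL as an argument instead. The variable names mirror the Scala code but are otherwise illustrative.

```python
import re

to_find = "find1"
line = "this is find1 the line 1 \n 21/03/2015"

# The lookahead stops the match just before the date without including it.
pattern = re.escape(to_find) + r".*(?=\d{2}/\d{2}/\d{4})"

# Without DOTALL, "." stops at "\n" and the date is never reached.
print(re.search(pattern, line))  # None

# With DOTALL, ".*" can cross the newline.
print(repr(re.search(pattern, line, re.DOTALL).group()))
```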
