I have hundreds lines of text like this:
gi|393925858|gb|AGTA02071966.1| 0000000739 . G A 121.20 PASS NS=74:AN=2:DP=8448 GT:DP:GQ:EC:SG 0/1:262:144:116:R
I wanted to ONLY replace the colon with semicolon in this portion "NS=74:AN=2:DP=8448" of the line. Here is how I matched and replaced it:
if re.match(r'.*NS=\d+(:)AN=\d(:)DP=\d+.*', line):
print line.replace(':', ";")
I thought I just replaced the matched pattern in all lines, but it replaced EVERY colon with semicolon in all lines, is there a way to specify just the matched ones, or my regular expression was incorrect? Thanks.
The way to do this is to use the full regex in the replacement, using capture groups (parentheses) to capture the digits you want to keep.
So your search term is this:
NS=(\d+):AN=(\d+):DP=(\d+)
And your replace term is this:
NS=\1;AN=\2;DP=\3
Note that in the replacement, the \1 will be filled in with what the first capture group (parens) captured from the original text.
Related
In a regular expression (notepad++), I want to search for:( )|(:)|(_)|(\.), and to insert \ before to, as above, a blank space, colon, under line and ".".
Search example: abcd:1234 jiod.8ufd_adfd
Result: abcd\:1234\ jiod\.8ufd\_adfd
Briefly, how can I refer to what was found in the replace expression?
Note that it is not \1, \2, \3 or \4 in the example, as I need to include what was found, there is no way to know which was found, is there?
You can use a single character class (instead of using the alternation with capturing groups) to match one of the listed
In the replacement use $& to refer to the matched text and prepend a backslash.
Match
[:\h._]
Replace with
\\$&
The character class matches either a colon, horizontal whitespace char, dot or underscore.
Regex demo
There's no such thing as insert, because if you think about it, inserting is just replacing the original with a new string that contains the old text as well.
Try this instead: search for ([ :_.]) (your original regex is pointlessly complicated) and replace with \\$1 (ie, slash followed by the original text).
I am using TextMate to replace expression [my_expression] consisting in characters between open and closed brackets by {my_expression}; so I tried to replace
\[[^]]*\]
by
{$1}
The regex matches the correct expression, but the replacement gives {$1}, so that the variable is not recognised. Can someone has an idea ?
You forgot to escape a character, [^]] should be [^\]].
You also need a capture group. $1 is back-referencing the 1st Capture Group, and you had no capture groups, so use the following Regex:
\[([^\]]*)\]
This adds () around [^\]]*, so the data inside the [] is captured. For more info, see this page on Capture Groups
However, this RegEx is shorter:
\[(.*?)\]
Also substituting with {$1}
Live Demo on Regex101
Use a capturing group (...):
\[([^\]]*)\]
The $1 is a backreference to the text enclosed with [...].
Here is the regex demo and also Numbered Backreferences.
Also, the TextMate docs:
1. Syntax elements
(...) group
20.4.1 Captures
To reference a capture, use $n where n is the capture register number. Using $0 means the entire match.
And also:
If you want to use [, -, ] as a normal character in a character class, you should escape these characters by \.
Using REGEX (in PowerShell) I would like to find a pattern in a text file that is over two lines and replace it with new text and preserve the whitespace. Example text:
ObjectType=Page
ObjectID=70000
My match string is
RunObjectType=Page;\s+RunObjectID=70000
The result I want is
ObjectType=Page
ObjectID=88888
The problem is my replacement string
RunObjectType=Page;`n+RunObjectID=88888
returns
ObjectType=Page
ObjectID=88888
And I need it to keep the original spacing. To complicate matters the amount of spacing may change.
Suggestions?
Leverage a capturing group and a backreference to that group in the replacement pattern:
$s -replace 'RunObjectType=Page;(\s+)RunObjectID=70000', 'RunObjectType=Page;$1RunObjectID=88888'
See the regex demo
With the (\s+), you capture all the whitespaces into the Group 1 buffer and then, using $1 backreference, the value is inserted into the result.
I have something in a text file that looks like '%r'%XXXX, where the XXXX represents some name at the end. Examples include '%r'%addR or '%r'%removeA. I can match these patterns using regex '%r'%\w+. What I would like to replace this with is '{!r}'.format(XXXX). Note that the name has to stay the same in the replace so I'd get '{!r}.format(addR) or '{!r}.format(removeA). Is there a way to replace parts of the matched string in this way while retaining the unknown variable name pulled out with \w+ in the regex search?
I'm specifically looking for a solution using the find and replace features in Notepad++.
You can use
'%r'%(\w+)
and replace with '{!r}.format\(\1\)
The '%r'%(\w+) pattern contains a pair of unescaped parentheses that create a capturing group. Inside the replacement pattern, we use a \1 backreference to restore that value.
NOTE: The ( and ) in the replacement must be escaped because otherwise they are treated as Boost conditional replacement pattern functional characters.
See more on capturing groups and backreferences.
Search on:
'%r'%(XXXX)
Replace with:
Whatever You like \1
\1 will match the first set of grouping parentheses.
I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g`
Replace all with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend you replacement with \= to get the replacement treated as a vim expression. From here you can extract just the parts between quotes and then manipulate the the matched part separately. This requires having two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
As for the other parts of your question. Step 1 is essentially trying to remove all carriage returns since the file format is actually the dos file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. Since all you need to do is match quotes and remove them
:%s/"//g
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
example: "this is , an, example"
\("\w*\) match start of " every letter following qoutes group \1 for back reference
\(,\) capture comma group \2 for back reference
(.*"\) match every other character upto the second qoute ->group 3 for backreference
:\1\3: only include groups without comma, discard group 2 from returned string which is \2
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g removes commas