I have a JSON file which has a bunch of jpg references that I'm trying to replace with png. I want to match on a pattern where there is a double digit and period before the jpg, capture 1, and use it in the replacement. The issue is I only ever get pattern not found.
"plith":"img/01.jpg"},{"block_ha....
where the substitution code looks like the following
:%s/(\d{2}.)+jpg/$1png/g
I tried this substitution command:
:%s/\v(\d{2}\.)jpg/\1png/g
And it replaced the line:
"plith":"img/01.jpg"},{"block_ha....
With:
"plith":"img/01.png"},{"block_ha....
If the 2 digits and the following dot can be repeated, you can apply the + quantifier to \d{2}\.:
:%s/\v(\d{2}\.)+jpg/\1png/g
In your original command:
:%s/(\d{2}.)+jpg/$1png/g
There seemed to be 3 problems:
You use unescaped parentheses to capture the digits, but in Vim's default (magic) mode they must be escaped as \( and \). The same applies to the { of {2} and to the + quantifier. If you don't want to escape them, you can switch to very magic mode by adding the atom \v to your pattern.
You don't escape the ., which means that it matches any character (except a newline) instead of a literal dot.
In the replacement part, you use $1 to refer to the first capturing group, but in Vim it should be \1.
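For comparison, here is a minimal Python sketch of the same substitution. It is only an illustration, not Vim syntax: Python's re module needs no escaping for capturing parentheses, but like Vim it uses \1 rather than $1 in the replacement. The sample string 'img/012jpg' is made up to show what the unescaped dot would let through.

```python
import re

line = '"plith":"img/01.jpg"},{"block_ha....'

# Group 1 captures two digits plus a literal dot; \1 re-inserts it.
strict = re.compile(r'(\d{2}\.)jpg')
result = strict.sub(r'\1png', line)
print(result)

# With an unescaped dot, any character is accepted before "jpg".
loose = re.compile(r'(\d{2}.)jpg')
loose_result = loose.sub(r'\1png', 'img/012jpg')    # rewritten by mistake
strict_result = strict.sub(r'\1png', 'img/012jpg')  # left untouched
```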
With a regular expression, I would like to get all characters between round brackets, but escaped \( and \) characters should also be included in the result.
Examples:
input: fo(ob)a)r
output: ob
input: foo(bar\(qwerty\))baz
output: bar\(qwerty\)
This is what I used for finding text between brackets:
(?<=\()([^\s\(\)]+)(?=\)), but I can't make exceptions for brackets preceded by \.
You could do something like this :
.*(?<!\\)\((.*?)(?<!\\)\)
Basically, it matches as many characters as possible until it sees an open parenthesis without a backslash (using a negative lookbehind), then groups the next matching characters until a closing parenthesis (still without a backslash).
Note that this regex may not work properly if you escape the backslashes.
Example : https://regex101.com/r/BqVKZp/1
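A small Python sketch of the lookbehind approach, under the assumption that only the first bracketed group is wanted (so the leading .* is dropped and re.search is used instead). The helper name first_group is hypothetical.

```python
import re

# "(" and ")" count as delimiters only when not preceded by a backslash.
UNESCAPED_PARENS = re.compile(r'(?<!\\)\((.*?)(?<!\\)\)')

def first_group(text):
    """Return the text between the first pair of unescaped parentheses."""
    match = UNESCAPED_PARENS.search(text)
    return match.group(1) if match else None

print(first_group('fo(ob)a)r'))
print(first_group(r'foo(bar\(qwerty\))baz'))

# Caveat from the answer: an escaped backslash right before ")" is
# misread as an escape of the parenthesis, so nothing is found here.
print(first_group(r'a(b\\)c'))
```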
This regex works for both your examples, without any lookaheads and lookbehinds:
\((.+[^\\])\)
The U (ungreedy) flag is needed, so that .+ matches as few characters as possible.
I have a text file with a list of elements separated by line-breaks, like this:
alpha
beta
gamma
...
I want to get it into this format:
(alpha),
(beta),
(gamma),
...
So I am using following regular expressions in Notepad++ for replacing those lines:
Find: ([^\n]+)
Replace: \($1\),
but the output now strangely has another line-break for each line into it:
(alpha
),
(beta
),
(gamma
),
...
I have no clue how this is happening. When I use only $1 or only \), as the replacement it works just fine, but every time I put a literal after the backreference it puts a line-break in between. I know that I can work around that with another regular expression afterwards, but could anybody explain to me why exactly this is happening?
Your file most likely uses Windows (CRLF) line endings, so [^\n]+ also matches the carriage return (\r) that precedes each line feed, and the closing parenthesis ends up after it. Instead of [^\n] (any char but an LF, line feed, \n), you should use ., which matches any char other than line break chars. Use the following regex to match a non-empty line:
^.+$
Replace with \($0\), where the $0 replacement backreference (also called a placeholder) stands for the whole match, and the parentheses are escaped because they are special metacharacters in Boost replacement patterns, where they define conditional replacements.
No need to use the m modifier here since ^ and $ anchors match start and end of the line respectively by default in Notepad++.
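The effect can be reproduced outside Notepad++. The following Python sketch assumes the file has CRLF line endings (the question does not say so explicitly). Note that Python's . also matches \r, so the fix here excludes \r explicitly instead of using a dot.

```python
import re

text = 'alpha\r\nbeta\r\ngamma\r\n'  # assumed CRLF line endings

# [^\n]+ stops only at LF, so the CR is part of the match and the
# closing parenthesis is written after it.
broken = re.sub(r'[^\n]+', r'(\g<0>),', text)

# Excluding CR as well keeps the whole line ending outside the match.
fixed = re.sub(r'[^\r\n]+', r'(\g<0>),', text)

print(repr(broken))
print(repr(fixed))
```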
I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g
Replace all spaces with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this, you need to prepend your replacement with \= so that it is treated as a Vim expression. From there you can extract just the parts between quotes and then manipulate the matched part separately. This requires two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
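The same "match the quoted region, then transform it in a callback" idea can be sketched in Python, where re.sub accepts a function as the replacement. The function name clean_line is hypothetical, and unlike the Vim steps it replaces spaces only inside the quoted field, which gives the same result for the sample line.

```python
import re

def clean_line(line):
    """Drop the trailing CR, then rewrite each quoted field."""
    line = line.rstrip('\r\n')

    def fix_quoted(match):
        inner = match.group(0)[1:-1]  # strip the surrounding quotes
        return inner.replace(',', '').replace(' ', '-')

    # "[^"]*" matches one quoted field, like ".\{-}" in the Vim command.
    return re.sub(r'"[^"]*"', fix_quoted, line)

raw = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200\r'
print(clean_line(raw))
```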
As for the other parts of your question: Step 1 is essentially trying to remove all carriage returns, since the file is actually in the DOS file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. All you need to do is match the quotes and remove them:
:%s/"//g
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
example: "this is , an, example"
\("\w*\) matches the opening quote and the word characters that follow it; this is group \1 for backreference
\(,\) captures the comma; this is group \2 for backreference
\(.*"\) matches every other character up to the closing quote; this is group \3 for backreference
:\1\3: keeps only the groups without the comma and discards group \2 from the returned string
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g removes the comma. Note that \w* does not match spaces, so this only removes a comma that directly follows the first word after the opening quote.
I have a large file with content inside every bracket. This is not at the beginning of the line.
1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)
I need to do the following
Replace the entire content, get rid of line numbers
1.Atmosphere (10000) to plain Atmosphere
Delete the line numbers as well
1.Atmosphere (10000) to plain Atmosphere
make it a hyperlink
1.Atmosphere (10000) to plain linky study
[I added/Edit] Extract the words into a new file, so that we get a simple list of keywords. Can you also please explain the numbers in the replacement (the \1\2), and the escaping of some characters?
Each set of key words is a new line
Atmospheric
Atmospheric composition
Air quality
Each set is on one line, separated by a comma and a space
Atmospheric, Atmospheric composition, Air quality
I tried find with a regex like \(*\). It finds the brackets, but I don't know how to replace this, where to put the replacement, or what variable holds the replacement value.
Here is my expression for Notepad++: ([0-9. ]*)(.*)(\s\()(.*)
You need to split your search into groups:
([0-9. ]*) numbers, spaces and dots combination in 0 or more times
(.*) everything till next expression
(\s\() space and opening parenthesis
(.*) everything else
In the Replace box, for practice, you can enter:
\1\2\3\4 this does nothing :) it just prints all four groups from above, unchanged
\2 this way you get only the second group
new_thing\2new_thing adds your text before and after the group
<a href=blah.com/\2.html>linky study</a> so now your text is added. Spaces between words can be problematic when creating a link, so another expression needs to be made to replace all spaces in the link with e.g. _
If you need to add a backslash as text (or another special character used by regex), it must be escaped, so you put \\ for a backslash or \$ for a dollar sign
Want to tune it more? <a href=blah.com/\2.html>\2</a> inserts the second group again, or use whichever group you want
Then there is the case where the keywords go on one line separated by commas, so simply add a comma after the extracted section:
change the replacement from \2 to \2,
Now you need to join the lines. The simplest way is Edit->Line Operations->Join Lines,
but if you want to be a real pro, switch to Extended mode (just above Regular expression mode in the Replace window), find \r\n and replace it with a space.
Removing line endings can differ in some cases, but that is another story. For now I assume that you are using Windows, since Notepad++ is a Windows tool and the line endings are in Windows style :)
The following regex should do the job: \d+\.\s*(.*?)\s*\(.*?\).
And the replacement: <a href=example.com\\\1.htm>\1</a>.
Explanation:
\d+ : Match a digit 1 or more times.
\. : Match a dot.
\s* : Match spaces 0 or more times.
(.*?) : Group and match everything until ( found.
\s* : Match spaces 0 or more times.
\(.*?\) : Match parenthesis and what's between it.
The replacement part is simple since \1 is referring to the matching group.
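As a rough check of the pattern outside Notepad++, here is a Python sketch that applies it to a few of the sample lines and builds both requested outputs (one keyword per line, and a single comma-separated line). The variable names are illustrative only.

```python
import re

# Leading number and dot, the captured words, then the bracketed part.
ENTRY = re.compile(r'\d+\.\s*(.*?)\s*\(.*?\)')

lines = [
    '2. Atmospheric composition (90100)',
    '3.Air quality (10110)',
    '4. Atmospheric chemistry and composition (889s120)',
]

# \1 keeps only the captured words from each line.
keywords = [ENTRY.sub(r'\1', line) for line in lines]

one_per_line = '\n'.join(keywords)
comma_separated = ', '.join(keywords)
print(comma_separated)
```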
Try replacing ^\d+\.(.*) \(\w+\)$ with <a href=blah.com\\\1.htm>linky study</a>.
The ^\d+\. removes the leading number and dot. The (.*) collects the words. Then there is a single space. The \(\w+\)$ matches the final number in brackets.
Update for the added Q4.
Regular expressions capture things written between round brackets ( and ). Brackets that are to be found in the text being searched must be escaped as \( and \). In the replacement expression, \1, \2, etc. are replaced by the text matched by the corresponding capture group. So a search expression such as Z(\d+)X([aeiou]+)Y might match Z29XeieiY, and the replacement expression P\2Q\1R would then insert PeieiQ29R. In the search at the top of this answer there is one capture: the (.*) captures or collects the words, and the \1 inserts the captured words into the replacement text.
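The worked example above can be checked directly. This is a short sketch in Python, whose re module uses the same \1 and \2 notation in replacement strings:

```python
import re

pattern = re.compile(r'Z(\d+)X([aeiou]+)Y')
match = pattern.search('Z29XeieiY')

# Group 1 holds the digits, group 2 the vowels.
digits, vowels = match.group(1), match.group(2)

# \2 and \1 are swapped in the output, exactly as in P\2Q\1R.
replaced = pattern.sub(r'P\2Q\1R', 'Z29XeieiY')
print(replaced)
```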
grep "http:\/\/.*\.jpg" index.html -o
Gives me text starting with http:// and ending with .jpg
So does: grep "http:\/\/.*\.\(jpg\)" index.html -o
What is the difference? And is there any condition where this might fail?
I got it to match either jpg,png or gif using this regex:
http:\/\/.*\.\(jpg\|png\|gif\)
Something to do with backreference or regex grouping that I read. Cannot understand this part \(\)
Grouping is used for two purposes in regular expressions.
One use is to delimit parts of the regexp when using alternatives. That's the case in your third regexp: it allows you to say that the extension can be any of jpg, png, or gif.
The other use is for backreferences. This allows you to refer to the text that matched an earlier part of the regexp later in the regexp. For instance, the following regexp matches any letter that appears twice in a row:
\([a-z]\)\1
The backreference \1 means "match whatever matched the first group in the regexp".
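To see the backreference at work, here is a minimal Python sketch. Python writes the group without backslashes, ([a-z])\1, where grep's basic syntax writes \([a-z]\)\1. The sample word is arbitrary.

```python
import re

# ([a-z]) captures one letter; \1 requires the same letter again.
DOUBLED = re.compile(r'([a-z])\1')

word = 'balloon'
doubles = [m.group(0) for m in DOUBLED.finditer(word)]
print(doubles)

# A word with no repeated adjacent letter yields no match.
no_match = DOUBLED.search('grep')
```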
( and ) are metacharacters. i.e. they don't match themselves, but mean something to grep.
From here:
Grouping is performed with backslashes followed by parentheses ‘(’, ‘)’.
So in the above, \( and \) define a group of alternatives to match, separated by \| (an escaped |), i.e. your filename extensions.
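The grouping-for-alternatives use, and one way the .* version can go wrong, can be sketched in Python. The HTML snippet and the example.com URLs are made up, and the tighter [^"\s]* class is a substitute for .* rather than part of the original command. Python writes the group as (?:...|...) without backslashes.

```python
import re

html = ('<img src="http://example.com/a.jpg"> '
        '<img src="http://example.com/b.png"> '
        '<a href="http://example.com/c.txt">')

# Greedy .* runs from the first http:// to the last image extension
# on the line, so two URLs are swallowed into one match.
greedy = re.findall(r'http://.*\.(?:jpg|png|gif)', html)

# Forbidding quotes and whitespace inside the URL keeps matches apart.
tight = re.findall(r'http://[^"\s]*\.(?:jpg|png|gif)', html)
print(tight)
```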