I have several HTML files (source code) that contain a lot of text, including links to many report files.
I need to replace every space ( ) in the filenames with an underscore (_). This replacement must not affect the rest of the text.
The links all follow the same folder structure, but the filenames are all very different apart from their extension (.pdf).
For example, I have:
Please find the presentations:
href="/Portals/12/Documents/GE-Project/Amalty-WS/Joanna_POT Pb paint health econ env_RUS.pdf">Presentation 1
I need:
Please find the presentations:
href="/Portals/12/Documents/GE-Project/Amalty-WS/Joanna_POT_Pb_paint_health_econ_env_RUS.pdf">Presentation 1
In Windows, using TEXTPAD, I have tried find/replace using regex (/Amalty-WS/[a-z,A-Z,0-9,_, ,]*\.pdf) but can't figure out how to replace only the spaces.
You can use
(?:\G(?!^)|/Amalty-WS/)[^"\s]*\K\s(?=[^"]*")
Replace with _.
See the regex demo.
Details:
(?:\G(?!^)|/Amalty-WS/) - either the end of the previous successful match (\G(?!^)) or /Amalty-WS/ string
[^"\s]* - zero or more chars other than " and whitespace
\K - match reset operator that discards text matched so far
\s - a whitespace
(?=[^"]*") - followed with zero or more chars other than " and then a ".
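For readers who want to script this outside the editor, here is a minimal Python sketch of the same replacement. Python's built-in re module supports neither \G nor \K, so this swaps in a different technique: it matches the whole file name after /Amalty-WS/ and rewrites the spaces inside a callback. The helper name underscore_pdf_names is hypothetical, and the sketch assumes every link ends in .pdf followed by a closing double quote. It replaces literal spaces only, whereas \s in the pattern above matches any whitespace.

```python
import re

# Group 1: the folder prefix. Group 2: the file name up to ".pdf".
# The lookahead requires the closing double quote of the href attribute.
LINK = re.compile(r'(/Amalty-WS/)([^"]*?\.pdf)(?=")')


def underscore_pdf_names(html):
    """Replace spaces with underscores inside matched PDF file names only."""
    return LINK.sub(lambda m: m.group(1) + m.group(2).replace(" ", "_"), html)


sample = (
    'Please find the presentations:\n'
    'href="/Portals/12/Documents/GE-Project/Amalty-WS/'
    'Joanna_POT Pb paint health econ env_RUS.pdf">Presentation 1'
)
result = underscore_pdf_names(sample)
print(result)
```

The surrounding text ("Please find the presentations:") keeps its spaces because only the captured file name is passed through replace().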
I am trying to match lines in a file that contain only a single /. My thought is that I can search for a string of any length that doesn't contain a /, then match exactly one /, then match another string of any length not containing a /, and finally end with a line break.
My attempt at this was [^/]*/[^/]*$. However, this doesn't seem to work.
I went ahead and tried matching just parts of this pattern, starting with strings of any length not containing a /, which I would think should be just [^/]*, but this isn't working either.
I am pretty familiar with regex but not as familiar with using it in Vim. So firstly, am I entering my regex incorrectly for Vim? And secondly, if my input for Vim is correct, then what is wrong with my regex?
You may search for all the lines matching your pattern using
:g/^[^\/]*\/[^\/]*$
Note that :g acts on every matching line, that the slashes need escaping here (because / is used as the pattern delimiter), and that the pattern matches:
^ - start of a line
[^\/]* - 0+ chars other than /
\/ - a /
[^\/]* - 0+ chars other than /
$ - end of a line.
Note that [^\/]* (negated bracket expression) won't match a line break sequence in Vim, unlike in text editors like Sublime Text 3 or Notepad++, thus, it will match exactly what you need.
Note that you may avoid escaping the slashes if you select another delimiter. See the Vim regex reference:
Frequently you need to do S&R in a text which contains UNIX file paths - text strings with slashes ("/") inside. Because S&R command uses slashes for pattern/replacement separation you have to escape every slash in your pattern, i.e. use "\/" for every "/" in your pattern... To avoid this so-called "backslashitis" you can use different separators in S&R.
So, you may also use :g~^[^/]*/[^/]*$~, or :g#^[^/]*/[^/]*$# as Amadan suggests.
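The same "exactly one slash" test can be sketched outside Vim. The following Python snippet is only an illustration of the pattern, not Vim syntax; the function name lines_with_single_slash and the sample text are made up for the example. It uses fullmatch, which plays the role of the ^ and $ anchors.

```python
import re

# No slashes, then exactly one slash, then no slashes again.
ONE_SLASH = re.compile(r'[^/]*/[^/]*')


def lines_with_single_slash(text):
    """Return the lines of text that contain exactly one '/'."""
    return [line for line in text.splitlines() if ONE_SLASH.fullmatch(line)]


sample = "usr/bin\nno slash here\n/a/b/c\n/\nhome/user/docs"
matches = lines_with_single_slash(sample)
print(matches)
```

Because the text is split into lines first, the negated class never has a chance to run across a line break, which mirrors the Vim behaviour described above.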
I was looking for a way to put quotes in Windows Services paths that don't have them because they have spaces. See https://regex101.com/r/N6cbk8/2 for the regex I found which highlights the strings I want to put in quotes, so for example:
D:\SerPatHL7Server\RunAsService.exe runasservice d:\SerPatHL7Server\SerPatHL7server.exe
needs to become
"D:\SerPatHL7Server\RunAsService.exe" runasservice d:\SerPatHL7Server\SerPatHL7server.exe
The other services also need quotes, like so:
E:\Program Files (x86)\Endobase\ebserver.exe
to become
"E:\Program Files (x86)\Endobase\ebserver.exe"
But PowerShell won't accept the \K in | Select-String -Pattern "(^.*?)\K\.exe" and throws the error: "The string (^.\*?)\K\.exe is not a valid regular expression: parsing "(^.*?)\K\.exe" - Unrecognized escape sequence \K."
I couldn't find an alternative with my limited knowledge of Regex expressions.
See the above link for a full list of examples. Is there a way to achieve my goal?
The \K is a match reset operator supported by the PCRE, Boost, Python PyPI regex and Onigmo libraries. You do not need this operator in .NET because it supports infinite-width lookbehind (and \K is essentially a workaround for the lack of such a lookbehind).
You just need to match any 0+ chars as few as possible up to the first .exe that is followed with whitespace or end of string and replace with " + match + ".
Use
-replace "^.*?\.exe(?!\S)", '"$&"'
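As a cross-check outside PowerShell, the same pattern can be tried in Python. This is a hedged sketch: the helper name quote_exe_path is hypothetical, and a callback stands in for the .NET $& placeholder, which Python's replacement syntax does not have. The two sample paths are the ones from the question.

```python
import re

# Shortest prefix ending in ".exe" that is followed by whitespace or end of string.
EXE_PATH = re.compile(r'^.*?\.exe(?!\S)')


def quote_exe_path(command):
    """Wrap the leading executable path of a service command in double quotes."""
    return EXE_PATH.sub(lambda m: '"' + m.group(0) + '"', command, count=1)


first = quote_exe_path(
    r'D:\SerPatHL7Server\RunAsService.exe runasservice '
    r'd:\SerPatHL7Server\SerPatHL7server.exe'
)
second = quote_exe_path(r'E:\Program Files (x86)\Endobase\ebserver.exe')
print(first)
print(second)
```

Using a callback rather than a replacement string also sidesteps any escaping concerns for the backslashes in Windows paths.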
Details
^ - start of the line
.*? - any 0+ chars other than newline as few as possible
\. - a literal .
exe - a literal exe substring
(?!\S) - there should be whitespace or end of string immediately to the right of the current location.
"$&" - the replacement pattern where $& stands for the whole match.
I have a document containing various kinds of information. What I want is to build a Notepad++ regex replace that finds the following lines in the document and replaces the blank spaces between the double quotes with an underscore (_).
Example:
The line is:
&LOG Part: "NAME TEST.zip"
The result should be:
&LOG Part: "NAME_TEST.zip"
The perfect solution would be that the regex finds the &LOG Part: "NAME TEST.zip" lines and replaces the blank space with an underline.
What I have tried for now is this expression to find the text between the " ":
\"[^"]*\"
It should do it, but I don't know which expression to use to replace the blank spaces with an underline.
Anyone could help with a solution?
Thanks!
The \"[^"]*\" will only match whole substrings from " up to another closest " without matching individual spaces you want to replace.
Since Notepad++ does not support infinite width lookbehind, the only possible solution is using the \G - based regex to set the boundaries and use multiple matching (this one will replace consecutive spaces with 1 _):
(?:"|(?!^)\G)\K([^ "]*) +(?=[^"]*")
Or (if each space should be replaced with an underscore):
(?:"|(?!^)\G)\K([^ "]*) (?=[^"]*")
And replace with $1_. If you need to restrict to replacing inside &LOG Part only, just add it to the beginning:
(?:&LOG Part:\s*"|(?!^)\G)\K([^ "]*) (?=[^"]*")
A human-readable explanation of the regex:
(?:"|(?!^)\G)\K - Find a ", or, with each subsequent successful match, the end of the previous successful match position, and omit all the text in the buffer (thanks to \K)
([^ "]*) - (Group 1, accessed with $1 from the replacement pattern) 0+ characters other than a space and "
(space)+ - one or more literal spaces (replace the space with \h to match all horizontal whitespace, or with \s to match any whitespace)
(?=[^"]*") - check if there is a double quote ahead of the current position
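If the file can be processed by a script instead of Notepad++, a simpler approach is available. The sketch below is in Python, whose re module has no \G or \K, so it uses a different technique: it matches the whole quoted string after &LOG Part: and replaces the spaces in a callback. The function name underscore_log_part and the second sample line are hypothetical.

```python
import re

# Group 1: the "&LOG Part:" label. Group 2: the quoted value, quotes included.
LOG_PART = re.compile(r'(&LOG Part:\s*)("[^"]*")')


def underscore_log_part(text):
    """Replace each space inside the quoted value of &LOG Part lines with '_'."""
    return LOG_PART.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_"), text
    )


sample = '&LOG Part: "NAME TEST.zip"\nOther line: "keep these spaces"'
converted = underscore_log_part(sample)
print(converted)
```

Quoted strings on lines without the &LOG Part: label are left alone, matching the restricted variant of the regex above.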
I have a large file with content inside every bracket. This is not at the beginning of the line.
1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)
I need to do the following
Replace the entire content, get rid of line numbers
1.Atmosphere (10000) to plain Atmosphere
Delete the line numbers as well
1.Atmosphere (10000) to plain Atmosphere
make it a hyperlink
1.Atmosphere (10000) to plain linky study
[I added/Edit] Extract the words into a new file, where we get a simple list of key words. Can you also please explain the numbers in replace the \1\2, and escape on some characters
Each set of key words is a new line
Atmospheric
Atmospheric composition
Air quality
Each set is on one line, separated by a comma and a space
Atmospheric, Atmospheric composition, Air quality
I tried a regex find like \(*\), which finds the brackets, but I don't know how to replace this, where to put the replacement, or what variable holds the replacement value.
Here is my expression for Notepad++: ([0-9(). ]*)(.*)(\s\()(.*)
You need to split your search into groups:
([0-9. ]*) numbers, spaces and dots combination in 0 or more times
(.*) everything till next expression
(\s\() space and opening parenthesis
(.*) everything else
In the Replace box, for practice, you can put:
\1\2\3\4 - this changes nothing :) it just prints all four groups above, in order
\2 - this way you get only the second group
new_thing\2new_thing - adds your text before and after the group
<a href=blah.com/\2.html>linky study</a> - now your text is added around the group. Spaces between words can be a problem when creating a link, so another expression is needed to replace all spaces in the link with, e.g., _
If you need to add a backslash as text (or another character that is special in regex), it must be escaped, so you put \\ for a backslash or \$ for a dollar sign
Want to tune it more? <a href=blah.com/\2.html>\2</a> adds the second group again, or use whichever group you want
On the screenshot you can see how I use it (I had found and replaced one line)
OK, and then we have the second case of item 4, with a comma at the end of each entry, so simply add a comma after the extracted section:
change the replacement from \2 to \2,
Now you need to join the lines. The simplest way is Edit->Line Operations->Join Lines,
but if you want to be a real pro, switch to Extended mode (just above Regular expression mode in the Replace window), find \r\n and replace it with a space.
Removing line endings can differ in some cases, but that is another story. For now I assume that you are using Windows, since Notepad++ is a Windows tool, and that the line endings are in Windows style :)
The following regex should do the job: \d+\.\s*(.*?)\s*\(.*?\).
And the replacement: <a href=example.com\\\1.htm>\1</a>.
Explanation:
\d+ : Match a digit 1 or more times.
\. : Match a dot.
\s* : Match spaces 0 or more times.
(.*?) : Group and match everything until ( found.
\s* : Match spaces 0 or more times.
\(.*?\) : Match parenthesis and what's between it.
The replacement part is simple since \1 is referring to the matching group.
Online demo.
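To illustrate the keyword-extraction part of the question (one keyword set per line, or all sets on one comma-separated line), here is a small Python sketch built on the same regex. The function name extract_keywords is hypothetical, and the anchors ^ and $ plus the trailing \s* are additions for line-by-line matching. The sample lines are taken from the question.

```python
import re

# Leading number and dot, the keywords (group 1), then the bracketed part.
ENTRY = re.compile(r'^\d+\.\s*(.*?)\s*\(.*?\)\s*$')


def extract_keywords(text):
    """Collect the keyword part of every line that looks like 'N. words (code)'."""
    keywords = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            keywords.append(m.group(1))
    return keywords


sample = """1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)"""

keywords = extract_keywords(sample)
print("\n".join(keywords))   # one keyword set per line
print(", ".join(keywords))   # all sets on one line, comma separated
```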
Try replacing ^\d+\.(.*) \(\w+\)$ with <a href=blah.com\\\1.htm>linky study</a>.
The ^\d+\. removes the leading number and dot. The (.*) collects the words. Then there is a single space. The \(\w+\)$ matches the final number in brackets.
Update for the added Q4.
Regular expressions capture things written between round brackets ( and ). Brackets that are to be found in the text being searched must be escaped as \( and \). In the replacement expression, \1, \2 and so on are replaced by the text of the corresponding capture. So a search expression such as Z(\d+)X([aeiou]+)Y might match Z29XeieiY, and then the replacement expression P\2Q\1R would insert PeieiQ29R. In the search at the top of this answer there is one capture: the (.*) captures or collects the words, and then the \1 inserts the captured words into the replacement text.
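The capture-group example can be checked quickly in Python, whose replacement syntax also uses \1 and \2 (an assumption here is that Notepad++ and Python agree on this simple case). The second call applies the search and replacement from the top of this answer to the sample line 1.Atmosphere (10000) from the question.

```python
import re

# Two captures: the digits (group 1) and the vowels (group 2), swapped in the output.
swapped = re.sub(r'Z(\d+)X([aeiou]+)Y', r'P\2Q\1R', 'Z29XeieiY')
print(swapped)

# In the replacement, \\ yields a literal backslash and \1 inserts the captured words.
link = re.sub(
    r'^\d+\.(.*) \(\w+\)$',
    r'<a href=blah.com\\\1.htm>linky study</a>',
    '1.Atmosphere (10000)',
)
print(link)
```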
I need to reformat a text file a bit in my Notepad++ and I have a text of this kind:
This is some example text. This is some example text. This is some example text.
- This is some example text.
-This is some example text.
- This is some example text.
- This is some example text.
So, as you can see in the text above, there are two types of "-" preceding the text: ones with a space after the "-" and ones without. I need to find only the ones without a space and add a space between the "-" and the text.
If I run the pattern below
-[A-Za-z0-9]
it finds the dash and the first letter right after it, which is not useful: when I replace, it changes this first letter, which is always different (depending on what is written). So I need to find this but select only the "-" and then replace it with "- ", unless there is a better way.
For demonstration purposes:
Find what: -([A-Za-z0-9])(.+)
Replace with: - \1\2
The parentheses denote a capture group. In the Replace with line, you use backslash and the number of group to add it.
That said, what you really want to match on is a negated character class, like -([^\s]) (match where a dash isn't immediately followed by whitespace).
Search for
-([^ ])
and replace with
- \1
[^ ] is a negated character class and matches everything but a space. This character is stored in \1 because of the brackets () around the pattern.
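A quick way to try this search and replace outside Notepad++ is the Python sketch below. The function name add_space_after_dash is hypothetical. Note that the pattern, as written, would also add a space after hyphens inside words and after a dash at the end of a line, so this assumes the text contains dashes only as list markers, as in the question's sample.

```python
import re


def add_space_after_dash(text):
    """Insert a space after every '-' that is not already followed by a space."""
    # Group 1 holds the character after the dash so it can be put back unchanged.
    return re.sub(r'-([^ ])', r'- \1', text)


sample = (
    "This is some example text.\n"
    "- This is some example text.\n"
    "-This is some example text."
)
fixed = add_space_after_dash(sample)
print(fixed)
```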