Regular Expression (Regex) - regex

I need to reformat a text file a bit in my Notepad++ and I have a text of this kind:
This is some example text. This is some example text. This is some example text.
- This is some example text.
-This is some example text.
- This is some example text.
- This is some example text.
So as you can see in the text above, there are two types of "-" preceding the text: those with a space after the "-" and those without. I need to find only the ones without a space and add one between the "-" and the text.
If I run the expression below
-[A-Za-z0-9]
it finds the dash and the first letter right after it, which is not useful: when I replace the text, it changes this first letter, which is always different (depending on what is written). So I need to find this, select only the "-", and replace it with "- ", unless there is a better way.

For demonstration purposes:
Find what: -([A-Za-z0-9])(.+)
Replace with: - \1\2
The parentheses denote a capture group. In the Replace with line, you use backslash and the number of group to add it.
That said, what you really want is a negated character class, like -([^\s]), which matches a dash that is immediately followed by a non-whitespace character.

Search for
-([^ ])
and replace with
- \1
[^ ] is a negated character class and matches any character except a space. This character is stored in \1 because of the parentheses () around the pattern.
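As a quick check outside the editor, the same search and replace can be sketched in Python. This is only an illustration: it assumes Python's re module, which accepts this particular pattern and replacement unchanged, and the function name add_space_after_dash is made up for the example.

```python
import re

def add_space_after_dash(text: str) -> str:
    # "-" followed by any character that is not a space; the captured
    # character is put back unchanged via \1.
    return re.sub(r"-([^ ])", r"- \1", text)

sample = "- This is some example text.\n-This is some example text."
print(add_space_after_dash(sample))
```

If hyphens inside words (such as well-known) appear in the text, this pattern would also add a space there; anchoring the pattern to the start of the line with ^ is one way to avoid that.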

Related

Replace full stop between letters with hyphen in sentence

Somehow some text pages I've been given seem to have a full stop where a hyphen should be.
For example, "Then it happened.first one foot.then the other.". As you can see there is a full-stop where a hyphen should be.
I figured out the regex to 'find' all the occurrences, ([a-z][.][a-z]), but can't figure out the 'replace'.
When I tried [ - ], [-], [a-z][-][a-z], [a-z][ - ][a-z] and [a-z][ ][-][ ][a-z] as the replacement, it removed the last letter of the preceding word and the first letter of the following one.
I'm using a text editor (TextPad).
How do I solve this?
Consider [.](?=[a-z]):
It does avoid abbreviations assuming they're only uppercase.
It does not handle "Then it happened.Winnie the Pooh looked up." (The word after the hyphen is a proper noun.)
It also does not handle "Then it happened.G.W. Bush looked up." (The word after the hyphen is an abbreviation.)
To handle those cases, consider (?<![A-Z])[.](?!\s|$):
It says that the letter prior to the . should not be uppercase.
It also says that there shouldn't be spaces or end-of-lines after it.
You can perform the replace with just " - ", since the regular expression only consumes the period. (The context is matched using lookarounds.)
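To see the lookaround version at work, here is a small sketch in Python rather than TextPad. It assumes Python's re module, which supports this lookbehind and lookahead; the names MISPLACED_STOP and stops_to_hyphens are invented for the example.

```python
import re

# Pattern from the answer above: a period that is not preceded by an
# uppercase letter and not followed by whitespace or the end of the text.
MISPLACED_STOP = re.compile(r"(?<![A-Z])[.](?!\s|$)")

def stops_to_hyphens(text: str) -> str:
    # Only the period itself is consumed, so " - " is the whole replacement.
    return MISPLACED_STOP.sub(" - ", text)

print(stops_to_hyphens("Then it happened.first one foot.then the other."))
print(stops_to_hyphens("Then it happened.G.W. Bush looked up."))
```

Because the letters around the period are only checked and never consumed, no capture groups are needed in the replacement.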
I don't have TextPad and cannot test my answer. From what I have read, TextPad should support capture groups via either \1 or $1. Try one of the following:
Search ([a-z])\.([a-z]) and replace with \1 - \2
or
Search ([a-z])\.([a-z]) and replace with $1 - $2
or
Search \([a-z]\)\.\([a-z]\) and replace with \1 - \2
or
Search \([a-z]\)\.\([a-z]\) and replace with $1 - $2
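You can try something like this in JavaScript:
var string = "Then it happened.first one foot.then the other.";
// Capture the letter on each side so that only a full stop between two letters is replaced
var result = string.replace(/([a-z])\.([a-z])/g, '$1 - $2');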

Find character, text around and extract it in Notepad++

I am trying to find a character, extend the match by a fixed number of characters on each side, and return the result.
Example of text:
Contrary to popular belief, (Lorem Ipsum) is not simply random text. It (has) roots in a piece of ...
Expected result:
r belief, (Lorem Ipsu
text. It (has) roots
How it should work:
1. find the position of "(" minus 10 characters
2. find the position of "(" plus 10 characters
3. extract the text from the start position in point 1 to the end position in point 2 (and store it in a new row)
Is it possible to do this in Notepad++ or similar software with the Find and Replace function?
I believe this can be done with regex, but I am not able to write it.
Thank you very much!
Do a regular expression find/replace like this:
Open Replace Dialog
Find What: (.{10}\(.{10})
Replace With: \r\n\1\r\n
Select "Regular expression" as the search mode
click Replace or Replace All
Depending on your line endings, you may need to change the \r\n to \n in the replacement.
Explanation:
the regular expression centers on a literal ( (it has to be escaped as \( due to the regex rules)
it captures the 10 characters before and after it with the two .{10} sections
all 21 characters are captured into \1 (by putting the whole regular expression in unescaped parentheses)
the replacement inserts \1 surrounded by line breaks (either \r\n or \n, adapt as needed)
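If only the extracted snippets are wanted, without the surrounding text, the same pattern can be used to collect matches instead of replacing them. The following is a sketch in Python, not a Notepad++ feature; the function name context_around_paren and the width parameter are assumptions made for illustration.

```python
import re

def context_around_paren(text: str, width: int = 10) -> list:
    # `width` characters, a literal "(", then `width` more characters.
    pattern = re.compile(r".{%d}\(.{%d}" % (width, width))
    return pattern.findall(text)

text = ("Contrary to popular belief, (Lorem Ipsum) is not simply random text. "
        "It (has) roots in a piece of ...")
for snippet in context_around_paren(text):
    print(snippet)
```

Note that a "(" with fewer than 10 characters before or after it on the same line is not matched by this pattern, and the second snippet keeps its leading space.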

Regex - replace blank spaces in line (Notepad++)

I have a document with several pieces of information. I want to build a Notepad++ regex replacement that finds the following lines in the document and replaces the blank spaces between the "" with an underscore (_).
Example:
The line is:
&LOG Part: "NAME TEST.zip"
The result should be:
&LOG Part: "NAME_TEST.zip"
Ideally, the regex would find the &LOG Part: "NAME TEST.zip" lines and replace the blank space with an underscore.
What I have tried for now is this expression to find the text between the " ":
\"[^"]*\"
It should do it, but I don't know which expression to use to replace the blank spaces with an underscore.
Could anyone help with a solution?
Thanks!
The pattern \"[^"]*\" only matches whole substrings from one " up to the next ", not the individual spaces you want to replace.
Since Notepad++ does not support infinite-width lookbehind, a workable solution is a \G-based regex that sets the boundaries and matches repeatedly (this one replaces each run of consecutive spaces with a single _):
(?:"|(?!^)\G)\K([^ "]*) +(?=[^"]*")
Or (if each space should be replaced with an underscore):
(?:"|(?!^)\G)\K([^ "]*) (?=[^"]*")
And replace with $1_. If you need to restrict to replacing inside &LOG Part only, just add it to the beginning:
(?:&LOG Part:\s*"|(?!^)\G)\K([^ "]*) (?=[^"]*")
A human-readable explanation of the regex:
(?:"|(?!^)\G)\K - Find a ", or, with each subsequent successful match, the end position of the previous successful match, and discard all the text matched so far (thanks to \K)
([^ "]*) - (Group 1, accessed with $1 from the replacement pattern) 0+ characters other than a space and "
 + (a space followed by +) - one or more literal spaces (replace the space with \h to match all horizontal whitespace, or \s to match any whitespace)
(?=[^"]*") - check if there is a double quote ahead of the current position
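Outside Notepad++, the same result can be reached with a different technique. Python's standard re module does not support \G or \K, so this sketch swaps in a replacement callback: it matches each whole quoted substring with the "[^"]*" pattern from the question and replaces the spaces inside it. The function name underscore_spaces_in_quotes is invented for the example.

```python
import re

def underscore_spaces_in_quotes(line: str) -> str:
    # For each "..." substring, replace the spaces inside it and leave
    # everything outside the quotes untouched.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace(" ", "_"), line)

print(underscore_spaces_in_quotes('&LOG Part: "NAME TEST.zip"'))
```

This variant replaces every space individually, like the second Notepad++ pattern above.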

Replace quotes inside quoted string with escaped quotes in notepad++?

I am using Notepad++ to find (".*)"(.*) and replace it with \1\"\2 but it doesn't seem to work. I don't know why.
Example:
Someone said "My name is "sean""
I want it to be:
Someone said "My name is \"sean\""
Edit: In my case the closing quote is always on the end of line so will (".*)"(.*"$) work?
Edit2: Also the first quote is preceded with a comma so I will use (,".*)"(.*"$) though it may not work in some cases but I think it will work with my file.
Now the problem is with the replacement: it doesn't add \", it just adds some space.
It should work... you just need to do a little fixing...
The Find what regex should be ("[^"]*)("\w*)(")([^"]*")
The Replace with expression should be \1\\\2\\\3\4
Make sure you select the Search Mode to be "Regular expression"
Explanation...
This is quite tricky - I've assumed that the quoted text WITHIN quotes is just a single word. If you assume something else it becomes very hard to pin down.
You need to find a
" followed by
[^"]* - any number of characters that are NOT a " and then
("\w*)(") - a quoted word, and then finally
([^"]*") - any additional number of non-quote characters + a final quote
This is important because regular expression matching is greedy by default, and a .* would continue to match all characters, including ", until the end of the string.
In the replacement string you need to have \\ to represent a single \
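The find and replace pair above can be tried on the example line with a short Python sketch. It assumes Python's re module, where the replacement template happens to use the same \1 and \\ notation; the function name escape_inner_quotes is made up for the example, and the single-word assumption from the answer still applies.

```python
import re

def escape_inner_quotes(line: str) -> str:
    # Groups: (1) opening quote and text, (2) inner opening quote and word,
    # (3) inner closing quote, (4) remaining text and the final quote.
    pattern = r'("[^"]*)("\w*)(")([^"]*")'
    # In the replacement template, \\ produces one literal backslash.
    return re.sub(pattern, r'\1\\\2\\\3\4', line)

print(escape_inner_quotes('Someone said "My name is "sean""'))
```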

How to find and replace contents of a bracket inside notepad++

I have a large file with content inside every bracket. This is not at the beginning of the line.
1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)
I need to do the following
Replace the entire content, get rid of line numbers
1.Atmosphere (10000) to plain Atmosphere
Delete the line numbers as well
1.Atmosphere (10000) to plain Atmosphere
make it a hyperlink
1.Atmosphere (10000) to plain linky study
[I added/Edit] Extract the words into a new file, where we get a simple list of key words. Can you also please explain the numbers in replace the \1\2, and escape on some characters
Each set of key words is a new line
Atmospheric
Atmospheric composition
Air quality
All sets are on one line, separated by a comma and a space
Atmospheric, Atmospheric composition, Air quality
I tried a regex find like this, \(*\), and it finds the brackets, but I don't know how to replace them, where to put the replacement, or which variable holds the replacement value.
Here is my expression for Notepad++: ([0-9(). ]*)(.*)(\s\()(.*)
You need to split your search into groups:
([0-9. ]*) numbers, spaces and dots combination in 0 or more times
(.*) everything till next expression
(\s\() space and opening parenthesis
(.*) everything else
In the Replace box, for practice, you can try:
\1\2\3\4 does nothing :) it just prints all four groups from above
\2 gives you only group 2
new_thing\2new_thing adds your text before and after the group
<a href=blah.com/\2.html>linky study</a> adds your link text. Spaces between words can be a problem when creating a link, so another expression is needed to replace the spaces in the link with, for example, _
If you need a backslash as text (or another character that is special in a regex), it must be escaped, so you write \\ for a backslash or \$ for a dollar sign
Want to tune it more? <a href=blah.com/\2.html>\2</a> uses group 2 again, or use whichever group you want
Then there is the case from the question where the sets are separated by commas, so simply add a comma after the extracted section:
change the replacement from \2 to \2,
Now you need to join the lines. The simplest way is Edit->Line Operations->Join Lines,
but if you want to be a real pro, switch to Extended mode (just above Regular expression mode in the Replace window), find \r\n and replace it with a space.
Removing line endings can differ in some cases, but that is another story. For now I assume you are using Windows, since Notepad++ is a Windows tool and the line endings are in Windows style :)
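For the key word list, the group-based approach above can also be sketched in Python. This is an illustration under assumptions: it uses the character class [0-9. ] as described in the breakdown, and the names LINE and keywords are invented for the example.

```python
import re

# Pattern from the answer above: (1) leading number, dots and spaces,
# (2) the key words, (3) a space and "(", (4) the rest of the line.
LINE = re.compile(r"([0-9. ]*)(.*)(\s\()(.*)")

def keywords(text: str) -> list:
    matches = (LINE.fullmatch(line) for line in text.splitlines())
    return [m.group(2) for m in matches if m]

sample = """1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)"""
print("\n".join(keywords(sample)))   # one set of key words per line
print(", ".join(keywords(sample)))   # all sets on one line
```

Keeping only group 2 corresponds to the \2 replacement, and joining with ", " corresponds to the \2, replacement followed by joining the lines.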
The following regex should do the job: \d+\.\s*(.*?)\s*\(.*?\).
And the replacement: <a href=example.com\\\1.htm>\1</a>.
Explanation:
\d+ : Match a digit 1 or more times.
\. : Match a dot.
\s* : Match spaces 0 or more times.
(.*?) : Group and match everything until ( found.
\s* : Match spaces 0 or more times.
\(.*?\) : Match parenthesis and what's between it.
The replacement part is simple since \1 is referring to the matching group.
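As a rough check, the pattern and replacement from this answer can be run in Python, whose re module uses the same \1 and \\ notation in replacement templates. The function name to_link is made up for the example; example.com and the backslash in the link come from the answer as written.

```python
import re

def to_link(line: str) -> str:
    pattern = r"\d+\.\s*(.*?)\s*\(.*?\)"
    # \\ in the template is a literal backslash; \1 is the captured text.
    return re.sub(pattern, r"<a href=example.com\\\1.htm>\1</a>", line)

print(to_link("1. Atmos-phere (7800)"))
print(to_link("3.Air quality (10110)"))
```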
Try replacing ^\d+\.(.*) \(\w+\)$ with <a href=blah.com\\\1.htm>linky study</a>.
The ^\d+\. removes the leading number and dot. The (.*) collects the words. Then there is a single space. The \(\w+\)$ matches the final number in brackets.
Update for the added Q4.
Regular expressions capture things written between round brackets ( and ). Brackets that are to be found in the text being searched must be escaped as \( and \). In the replacement expression, \1, \2, etc. are replaced by the text matched by the corresponding capture. So a search expression such as Z(\d+)X([aeiou]+)Y might match Z29XeieiY, and the replacement expression P\2Q\1R would then insert PeieiQ29R. In the search at the top of this answer there is one capture: the (.*) captures or collects the words, and the \1 inserts the captured words into the replacement text.
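The capture example can be reproduced in a couple of lines of Python, assuming its re module, which numbers groups the same way. The variable name result is only for illustration.

```python
import re

# Group 1 is the digits, group 2 is the run of vowels; the replacement
# swaps their order and adds the literal letters P, Q and R.
result = re.sub(r"Z(\d+)X([aeiou]+)Y", r"P\2Q\1R", "Z29XeieiY")
print(result)
```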