How to find and replace contents of a bracket inside notepad++ - regex

I have a large file with content inside every bracket. This is not at the beginning of the line.
1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)
I need to do the following
Replace the entire content, get rid of line numbers
1.Atmosphere (10000) to plain Atmosphere
Delete the line numbers as well
1.Atmosphere (10000) to plain Atmosphere
make it a hyperlink
1.Atmosphere (10000) to plain linky study
[I added/Edit] Extract the words into a new file, where we get a simple list of key words. Can you also please explain the numbers in replace the \1\2, and escape on some characters
Each set of key words is a new line
Atmospheric
Atmospheric composition
Air quality
Each set is on one line, separated by a comma and a space
Atmospheric, Atmospheric composition, Air quality
I tried Find with a regex like \(*\), which finds the brackets, but I don't know how to write the replacement, where to put it, or which variable holds the matched value.

Here is my expression for Notepad++: ([0-9(). ]*)(.*)(\s\()(.*)
You need to split your search into groups:
([0-9. ]*) any combination of digits, dots and spaces, zero or more times
(.*) everything till next expression
(\s\() space and opening parenthesis
(.*) everything else
In the Replace box, for practice, try the following:
\1\2\3\4 changes nothing :) it just prints all four groups from above, 1 to 4
\2 gives you only group 2
new_thing\2new_thing adds your text before and after the group
<a href=blah.com/\2.html>linky study</a> now your text is added around the group. Spaces between words can be a problem in a link, so another expression is needed to replace the spaces in the link with, e.g., _
If you need a backslash (or another character that is special in a regex) as literal text, it must be escaped: write \\ for a backslash or \$ for a dollar sign
For more tuning, <a href=blah.com/\2.html>\2</a> uses group 2 again, or use whichever group you want
On the screenshot you can see how I use it (I had found and replaced one line)
Then there is the case where every set goes on one line, separated by commas. Simply add a comma after the extracted section:
change the replacement from \2 to \2,
Now you need to join the lines; the simplest way is Edit->Line Operations->Join Lines
but if you want to be a real pro, switch to Extended mode (just above Regular expression mode in the Replace window), find \r\n and replace it with a space.
Line endings can differ in some cases, but that is another story. For now I assume you are on Windows, since Notepad++ is a Windows tool and the line endings are Windows style :)
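The group-and-replace steps above can be sketched outside Notepad++ as well. This is a minimal illustration in Python, assuming its re module behaves like the Notepad++ engine for this simple pattern; the variable names are made up for the example, and ", ".join stands in for the "append a comma, then join lines" steps.

```python
import re

# Sample lines taken from the question.
lines = [
    "1. Atmos-phere (7800)",
    "2. Atmospheric composition (90100)",
    "3.Air quality (10110)",
]

# Four groups: leading number/dot/space, the words, " (", everything else.
pattern = re.compile(r"([0-9. ]*)(.*)(\s\()(.*)")

# Keep only group 2 (the key words); Python also writes this backreference as \2.
keywords = [pattern.sub(r"\2", line) for line in lines]

# One line, separated by a comma and a space.
joined = ", ".join(keywords)
print(joined)
```

Printing "\n".join(keywords) instead would give the one-keyword-per-line variant.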

The following regex should do the job: \d+\.\s*(.*?)\s*\(.*?\).
And the replacement: <a href=example.com\\\1.htm>\1</a>.
Explanation:
\d+ : Match a digit 1 or more times.
\. : Match a dot.
\s* : Match spaces 0 or more times.
(.*?) : Group and match everything until ( found.
\s* : Match spaces 0 or more times.
\(.*?\) : Match parenthesis and what's between it.
The replacement part is simple since \1 is referring to the matching group.
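As a quick check of the explanation above, here is a small Python sketch of the same search and replacement. It assumes Python's re module as a stand-in for Notepad++; in a Python replacement template, \\ also produces one literal backslash and \1 inserts group 1.

```python
import re

# Number, dot, optional spaces, lazily captured words, optional spaces, "(...)".
pattern = re.compile(r"\d+\.\s*(.*?)\s*\(.*?\)")

# "\\" emits a single literal backslash; "\1" is the captured text.
replacement = r"<a href=example.com\\\1.htm>\1</a>"

link = pattern.sub(replacement, "3.Air quality (10110)")
print(link)
```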

Try replacing ^\d+\.(.*) \(\w+\)$ with <a href=blah.com\\\1.htm>linky study</a>.
The ^\d+\. matches the leading number and dot, so they are removed. The (.*) collects the words. Then there is a single space. The \(\w+\)$ matches the final number in brackets.
Update for the added Q4.
Regular expressions capture whatever is matched by the part of the pattern written between round brackets ( and ). Brackets that are to be found literally in the text being searched must be escaped as \( and \). In the replacement expression, \1, \2, etc. are replaced by the text of the corresponding capture group. So a search expression such as Z(\d+)X([aeiou]+)Y might match Z29XeieiY, and the replacement expression P\2Q\1R would then insert PeieiQ29R. In the search at the top of this answer there is one capture: the (.*) collects the words, and a \1 in the replacement inserts the captured words into the replacement text.
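The Z...X...Y example can be reproduced directly. The following is a small Python sketch, assuming Python's re module, which uses the same \1 and \2 backreference syntax in replacements:

```python
import re

# Group 1 captures the digits, group 2 captures the vowels.
search = r"Z(\d+)X([aeiou]+)Y"

# The replacement swaps the two captures and adds the literal letters P, Q, R.
result = re.sub(search, r"P\2Q\1R", "Z29XeieiY")
print(result)
```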

Related

How to keep parts of the text you replace in Notepad++

I would like to replace all new lines after lower cases with space.
Regex: "[a-z]\n" and only replace the \n with space.
How do I keep the lower case, when I replace it?
Use Capture groups.
For your example, it would look like this:
Replace ([a-z])\n with $1 followed by a space
$n represents the content of the nth capture group. Capture groups are created by putting parentheses ( ) around a part of the regex.
Windows uses \r\n for newlines, so here is a regex that supports both styles of line endings by making the \r optional:
([a-z])\r?\n
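To see the effect of the optional \r, here is a minimal Python sketch. It is an illustration only: Python writes the backreference as \1 rather than $1, and the sample text is invented for the example.

```python
import re

# Mixed line endings: one Windows (\r\n), two Unix (\n).
text = "first line\r\nsecond line\nThird.\nfourth"

# Keep the lower-case letter (group 1) and replace the line break with a space.
joined = re.sub(r"([a-z])\r?\n", r"\1 ", text)
print(repr(joined))
```

The break after "Third." is left alone because the character before it is not a lower-case letter.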

How to encase words with quotations?

I am currently trying to convert a list of 1000 words into this format:
'known', 'buss', 'hello',
and so on.
The list i have is currently in this format:
known
worry
claim
tenuous
porter
I am trying to use notepad++ to do this, if anybody could point me in the correct direction, that would be great!
Use this if you want a comma delimited list but no extra comma at the end.
Ctrl+H
Find what: (\S+)(\s+)?
Replace with: '$1'(?2,:)
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(\S+) # group 1, 1 or more non spaces
(\s+)? # group 2, 1 or more spaces, optional
Replacement:
'$1' # content of group 1 enclosed in quotes
(?2,:) # if group 2 exists, add a comma, else, do nothing
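The conditional (?2,:) is a feature of the Notepad++ replacement syntax. Python's re module has no equivalent in replacement templates, so the sketch below emulates it with a replacement function; the helper name quote is hypothetical and only there for illustration.

```python
import re

def quote(match):
    # Emulates (?2,:): add a comma only if group 2 (trailing whitespace) matched.
    comma = "," if match.group(2) else ""
    return "'" + match.group(1) + "'" + comma

words = "known\nworry\nclaim"
quoted = re.sub(r"(\S+)(\s+)?", quote, words)
print(quoted)
```

The last word has no whitespace after it, so group 2 does not take part in that match and no trailing comma is added.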
How about replacing (\S+) with '$1'? Make sure your Regular Expression button is selected in the Find and Replace tool inside Notepad++
Explanation
(\S+) is the regex for one or more non-whitespace characters. Wrapping it in parentheses puts it in a capture group, and capture groups can be accessed in numerical order with a dollar sign ($1).
'$1' will take that found text from the Find above and replace it with capture group #1 ($1) wrapped in single quotes '.
Sample
Input: known worry claim tenuous porter
Output: 'known' 'worry' 'claim' 'tenuous' 'porter'

Find character, text around and extract it in Notepad++

I need to find a character, extend the match by a constant number of characters on each side, and return the result.
Example of text:
Contrary to popular belief, (Lorem Ipsum) is not simply random text. It (has) roots in a piece of ...
Expected result:
r belief, (Lorem Ipsu
text. It (has) roots
How it should work:
find position of "(" - 10 characters
find position of "(" + 10 characters
extract text with start position of point 1. and end position of point 2. (and store it in a new row)
Please is it possible to do this in Notepad++ or similar software with function Find and Replace?
I believe this can be done with regex, but I am not able to write it.
Thank you very much!
Do a regular expression find/replace like this:
Open Replace Dialog
Find What: (.{10}\(.{10})
Replace With: \r\n\1\r\n
check regular expression
click Replace or Replace All
Depending on your line endings, you may need to change the \r\n to \n in the replacement.
Explanation:
the regular expression centers on a literal ( (it has to be escaped as \( due to the regex rules)
it captures the 10 characters before and after it with the two .{10} sections
all 21 characters are captured into \1 (by putting the whole regular expression in unescaped parentheses)
the replacement inserts \1 surrounded by line breaks (either \r\n or \n, adapt to what you need)
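If only the extracted snippets are wanted, without the rest of the text, the same pattern can be used to collect matches. This is a Python sketch under the assumption that re.findall is an acceptable substitute for the Find and Replace dialog:

```python
import re

text = ("Contrary to popular belief, (Lorem Ipsum) is not simply random text. "
        "It (has) roots in a piece of ...")

# 10 characters, a literal "(", then 10 more characters: 21 in total.
snippets = re.findall(r".{10}\(.{10}", text)

for snippet in snippets:
    print(repr(snippet))
```

Spaces count toward the 10 characters, so the second snippet starts with the space before "text". Matches cannot overlap, and a ( with fewer than 10 characters on either side is skipped.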

While replacing using regex, How to keep a part of matched string?

I have
12.hello.mp3
21.true.mp3
35.good.mp3
.
.
.
and so on, as file names listed in a text file.
I need to replace only the dots (.) that follow the numbers with a space (e.g. 12.hello.mp3 => 12 hello.mp3).
If I use the regex "[0-9].", it replaces the number as well.
Please help me.
Replace
^(\d+)\.(.*mp3)$
with
\1 \2
Also, in recent versions of notepad++, it will also accept the following, which is also accepted by other IDEs/editors (eg. JetBrains products like Intellij IDEA):
$1 $2
This assumes that the notepad++ regex matching engine supports groups. What the regex basically means is: match the digits in front of the first dot as group 1 and everything after it as group 2 (but only if it ends with mp3)
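The same replacement can be tried in Python as a rough check. Note that Python only accepts the \1 \2 form in replacements, not $1 $2; the list names below are made up for the example.

```python
import re

names = ["12.hello.mp3", "21.true.mp3", "35.good.mp3"]

# Group 1: leading digits. Group 2: everything after the first dot, ending in mp3.
renamed = [re.sub(r"^(\d+)\.(.*mp3)$", r"\1 \2", name) for name in names]
print(renamed)
```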
I tested this with VS Code. You must put the part you want to keep in parentheses (a regex group).
Practical example
start with sample data
1 a text
2 another text
3 yet more text
Use a regex to find the digit and the whitespace after it. The group here is the digit, since it is surrounded by parentheses
(\d)\s
Run the replace. It swaps the space for a dash but keeps the digit on each line
$1-
Outputs
1-a text
2-another text
3-yet more text
Using the basic pattern, well described in the accepted answer, here is an example that adds class="odd" and class="even" to alternating <tr> elements in Notepad++ or any other regex-compatible editor:
Find what: (<tr><td>)(.*?\r\n)(<tr><td>)(.*?\r\n)
Replace with: <tr class="odd"><td>\2<tr class="even"><td>\4
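A small Python sketch of this odd/even replacement follows. The four-row table is invented sample data, and it assumes \r\n line endings as in the pattern above:

```python
import re

rows = (
    "<tr><td>a</td></tr>\r\n"
    "<tr><td>b</td></tr>\r\n"
    "<tr><td>c</td></tr>\r\n"
    "<tr><td>d</td></tr>\r\n"
)

# Each match consumes two consecutive rows; groups 2 and 4 hold the row bodies.
pattern = r"(<tr><td>)(.*?\r\n)(<tr><td>)(.*?\r\n)"
styled = re.sub(pattern, r'<tr class="odd"><td>\2<tr class="even"><td>\4', rows)
print(styled)
```

Because rows are consumed in pairs, a final unpaired row in a table with an odd number of rows would be left unchanged.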

Regular Expression (Regex)

I need to reformat a text file a bit in my Notepad++ and I have a text of this kind:
This is some example text. This is some example text. This is some example text.
- This is some example text.
-This is some example text.
- This is some example text.
- This is some example text.
As you can see in the text above, there are two kinds of "-" preceding the text: some with a space after the "-" and some without. I need to find only the ones without a space and add one between the "-" and the text.
If I run the expression below
-[A-Za-z0-9]
it finds the dash and the first letter right after it. That is not useful, because the replacement then changes this first letter, which is different on every line. I need to select only the "-" and replace it with "- ", unless there is a better way.
For demonstration purposes:
Find what: -([A-Za-z0-9])(.+)
Replace with: - \1\2
The parentheses denote a capture group. In the Replace with line, you use a backslash and the number of the group to insert it.
That said, what you really want to match is a negated character class, like -([^\s]) (match where a dash isn't immediately followed by whitespace).
Search for
-([^ ])
and replace with
- \1
[^ ] is a negated character class and matches any single character except a space. This character is stored in \1 because of the parentheses () around the pattern.
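The search and replace above can be checked with a short Python sketch, using three of the sample lines from the question and assuming Python's re module in place of Notepad++:

```python
import re

text = (
    "- This is some example text.\n"
    "-This is some example text.\n"
    "- This is some example text."
)

# A dash followed by any non-space character; the character is kept via \1.
fixed = re.sub(r"-([^ ])", r"- \1", text)
print(fixed)
```

The pattern is not anchored to the start of a line, so a hyphen inside a word would also get a space. If that matters, prefixing the pattern with ^ (with multi-line matching) is one possible refinement.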