Find character, text around and extract it in Notepad++ - regex

I have a problem finding a character, extending the match by a constant number of characters on each side, and returning the result.
Example of text:
Contrary to popular belief, (Lorem Ipsum) is not simply random text. It (has) roots in a piece of ...
Expected result:
r belief, (Lorem Ipsu
text. It (has) roots
How it should work:
Find the position of "(" minus 10 characters.
Find the position of "(" plus 10 characters.
Extract the text that starts at the position from step 1 and ends at the position from step 2 (and store it in a new row).
Is it possible to do this in Notepad++ or similar software using Find and Replace?
I believe this can be done with regex, but I am not able to write it.
Thank you very much!

Do a regular expression find/replace like this:
Open the Replace dialog
Find What: (.{10}\(.{10})
Replace With: \r\n\1\r\n
Check "Regular expression"
Click Replace or Replace All
Depending on your line endings, you may need to change the \r\n to \n in the replacement.
Explanation:
The regular expression is centered on a literal ( (it has to be escaped as \( due to the regex rules).
It captures the 10 characters before and after it with the two .{10} sections.
All 21 characters are captured into \1 (by putting the whole regular expression in unescaped parentheses).
The replacement inserts \1 surrounded by line breaks (either \r\n or \n, adapt as needed).
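To check what the pattern captures before running Replace All, the same regex can be tried in Python's re module, which accepts the same .{10} and \( syntax and also writes the back-reference as \1. This is a minimal sketch; the variable names are illustrative, and \n stands in for whichever line ending you use.

```python
import re

text = ("Contrary to popular belief, (Lorem Ipsum) is not simply random text. "
        "It (has) roots in a piece of ...")

# Group 1: 10 characters, a literal "(", then 10 more characters (21 in total).
pattern = re.compile(r"(.{10}\(.{10})")

# With a single group, findall returns the text of group 1 for each match.
snippets = pattern.findall(text)
for s in snippets:
    print(repr(s))

# The replacement from the answer: each window is surrounded by line breaks,
# and the rest of the text is kept.
replaced = pattern.sub(r"\n\1\n", text)
print(replaced)
```

Note that the second snippet starts with a space, because the tenth character before that "(" is a space. A "(" with fewer than 10 characters on one side (for example near the start of the text) is not matched by .{10}; a variant such as .{0,10}\(.{0,10} would also catch those.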

Related

Multi-line regular expressions in Visual Studio Code

I cannot figure out a way to make a regular expression match stop not at the end of a line but at the end of the file in VS Code. Is it a tool limitation, or is there some kind of pattern that I am not aware of?
It seems the CR is not matched with [\s\S]. Add \r to this character class:
[\s\S\r]+
will match one or more characters of any kind, line breaks included.
Other alternatives that proved working are [^\r]+ and [\w\W]+.
If you want to make any character class match line breaks, be it a positive or negative character class, you need to add \r in it.
Examples:
Any text between the two closest a and b chars: a[^ab\r]*b
Any text between START and the closest STOP words:
START[\s\S\r]*?STOP
START[^\r]*?STOP
START[\w\W]*?STOP
Any text between the closest START and STOP words:
START(?:(?!START)[\s\S\r])*?STOP
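The difference between the lazy pattern and the tempered one (the (?:(?!START)...) construct) shows up when two START markers precede one STOP. The sketch below uses Python's re module; the \r workaround is specific to VS Code, and in Python [\s\S] already matches \r and \n, so the extra \r is left out here. The sample text is made up for illustration.

```python
import re

# Two START markers before a single STOP, spread over several lines.
text = "START a\nSTART b\nSTOP"

# Lazy: the match begins at the first START and runs to the nearest STOP.
lazy = re.search(r"START[\s\S]*?STOP", text).group()

# Tempered token: no START may appear inside the match, so the match
# begins at the START closest to STOP.
tempered = re.search(r"START(?:(?!START)[\s\S])*?STOP", text).group()

print(repr(lazy))      # 'START a\nSTART b\nSTOP'
print(repr(tempered))  # 'START b\nSTOP'
```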
To match a multi-line text block starting with aaa and ending with the first bbb (lazy quantifier):
aaa(.|\n)+?bbb
To find a multi-line text block starting with aaa and ending with the last bbb (greedy quantifier):
aaa(.|\n)+bbb
If you want to exclude certain characters from the "in between" text, you can do that too. This only finds blocks where the character "c" doesn't occur between "aaa" and "bbb":
aaa([^c]|\n)+?bbb
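The three patterns above can be compared on a small sample with Python's re module, which supports the same (.|\n) alternation; the sample strings are invented for illustration. In Python, passing re.DOTALL makes . match newlines as well, so aaa.+?bbb with that flag behaves like the first pattern.

```python
import re

text = "aaa\nx\nbbb\ny\nbbb"

# Lazy quantifier: stops at the first "bbb".
first = re.search(r"aaa(.|\n)+?bbb", text).group(0)

# Greedy quantifier: runs to the last "bbb".
last = re.search(r"aaa(.|\n)+bbb", text).group(0)

# Excluding "c": a block with a "c" between the markers does not match.
no_c = re.search(r"aaa([^c]|\n)+?bbb", "aaa1\nbbb")
with_c = re.search(r"aaa([^c]|\n)+?bbb", "aaac\nbbb")

print(repr(first))         # 'aaa\nx\nbbb'
print(repr(last))          # 'aaa\nx\nbbb\ny\nbbb'
print(repr(no_c.group(0))) # 'aaa1\nbbb'
print(with_c)              # None
```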

Regular expression to find and replace wrong quotation marks

I have a document which has been copy/pasted from MS Word. All the quotations are copied as ''something'' which basically is creating a mess in my LaTeX document, hence they have to be ``something''.
Is it possible to write a regular expression that finds all these ''something'' occurrences, where something can be anything (including symbols, numbers, etc.), and one that replaces them with the correct quotation marks? I am using Sublime Text, which supports regex directly in the editor.
The regex below matches every string enclosed in doubled single quotes and captures all of its characters except the first two single quotes. Replacing each match with two backticks followed by the contents of group 1 gives the desired result.
Regex:
''(.*?'')
Replacement string:
``$1
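The same regex and replacement can be checked in Python's re module, where the group in the replacement is written \1 instead of $1. The sample sentence is invented for illustration. By default . does not match a newline in Python, so a quotation spanning two lines would not be converted.

```python
import re

text = "He said ''hello'' and then ''goodbye!''."

# Group 1 captures the quoted text plus the closing '' (lazy, so each
# quotation stops at the nearest closing pair).
fixed = re.sub(r"''(.*?'')", r"``\1", text)

print(fixed)  # He said ``hello'' and then ``goodbye!''.
```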

How to find and replace contents of a bracket inside notepad++

I have a large file with content inside every bracket. This is not at the beginning of the line.
1. Atmos-phere (7800)
2. Atmospheric composition (90100)
3.Air quality (10110)
4. Atmospheric chemistry and composition (889s120)
5.Atmospheric particulates (10678130)
I need to do the following
Replace the entire content, get rid of line numbers
1.Atmosphere (10000) to plain Atmosphere
Delete the line numbers as well
1.Atmosphere (10000) to plain Atmosphere
make it a hyperlink
1.Atmosphere (10000) to plain linky study
[Edit] Extract the words into a new file, so that we get a simple list of keywords. Can you also please explain the numbers such as \1 and \2 in the replacement, and escaping some characters?
Each set of keywords on its own line:
Atmospheric
Atmospheric composition
Air quality
All sets on one line, separated by a comma and a space:
Atmospheric, Atmospheric composition, Air quality
I tried finding it with a regex like \(*\). It finds the brackets, but I don't know how to replace them, where to put the replacement, or which variable holds the replacement value.
Here is my expression for Notepad++: ([0-9(). ]*)(.*)(\s\()(.*)
You need to split your search into groups:
([0-9. ]*) numbers, spaces and dots combination in 0 or more times
(.*) everything till next expression
(\s\() space and opening parenthesis
(.*) everything else
In the Replace box, for practice, try placing:
\1\2\3\4: this does nothing :) it just prints all four groups from above
\2: this way you get only the second group
new_thing\2new_thing: adds your text before and after the group
<a href=blah.com/\2.html>linky study</a>: now your text is added. Spaces between words can be problematic when creating a link, so another expression is needed to replace all spaces in the link with, e.g., _
If you need to add a backslash as text (or another special character used by regex), it must be escaped: put \\ for a backslash or \$ for a dollar sign.
Want more tuning? <a href=blah.com/\2.html>\2</a> adds the second group again; or use whichever group you want.
Then there is the comma-separated variant from the question, so simply add a comma after the extracted section:
change the replacement from \2 to \2,
Now you need to join the lines. The simplest way is Edit -> Line Operations -> Join Lines.
But if you want to be a real pro, switch to Extended mode (just above Regular expression mode in the Replace window), find \r\n and replace it with a space.
Removing line endings can differ in some cases, but that is another story. For now I assume you are using Windows, since Notepad++ is a Windows tool and the line endings are Windows-style :)
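The whole sequence above (keep group 2, then put one keyword per line or join them with commas, then build links) can be sketched in Python with the same four-group pattern. This is an illustration, not the answer's Notepad++ workflow: str.join stands in for Join Lines, and str.replace is used instead of a second regex to turn spaces into underscores in the URL. blah.com is the placeholder host from the answer.

```python
import re

# Sample input from the question.
lines = [
    "1. Atmos-phere (7800)",
    "2. Atmospheric composition (90100)",
    "3.Air quality (10110)",
    "4. Atmospheric chemistry and composition (889s120)",
    "5.Atmospheric particulates (10678130)",
]

# Same groups as in the answer: number, words, " (", rest of the line.
pattern = re.compile(r"([0-9(). ]*)(.*)(\s\()(.*)")

# Replacing each line with \2 keeps only the keywords.
keywords = [pattern.sub(r"\2", line) for line in lines]

print("\n".join(keywords))  # one set of keywords per line
print(", ".join(keywords))  # all on one line, comma-separated

# Links: spaces become "_" in the URL only, the link text keeps them.
links = ["<a href=blah.com/{}.html>{}</a>".format(k.replace(" ", "_"), k)
         for k in keywords]
print(links[1])
```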
The following regex should do the job:
\d+\.\s*(.*?)\s*\(.*?\)
And the replacement:
<a href=example.com\\\1.htm>\1</a>
Explanation:
\d+ : Match a digit one or more times.
\. : Match a dot.
\s* : Match spaces 0 or more times.
(.*?) : Group and lazily match everything up to the optional spaces before (.
\s* : Match spaces 0 or more times.
\(.*?\) : Match the parentheses and what's between them.
The replacement part is simple, since \1 refers to the first capturing group.
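A quick check of this pattern and replacement in Python's re module, which reads \\ in the replacement as a literal backslash and \1 as the first group, the same way the answer uses them. The two sample lines come from the question.

```python
import re

# \d+ number, \. dot, \s* spaces, (.*?) the words (lazy),
# \s* spaces, \(.*?\) the bracketed part.
pattern = r"\d+\.\s*(.*?)\s*\(.*?\)"
replacement = r"<a href=example.com\\\1.htm>\1</a>"

link = re.sub(pattern, replacement, "1. Atmos-phere (7800)")
link2 = re.sub(pattern, replacement, "3.Air quality (10110)")

print(link)   # <a href=example.com\Atmos-phere.htm>Atmos-phere</a>
print(link2)  # <a href=example.com\Air quality.htm>Air quality</a>
```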
Try replacing ^\d+\.(.*) \(\w+\)$ with <a href=blah.com\\\1.htm>linky study</a>.
The ^\d+\. removes the leading number and dot. The (.*) collects the words. Then there is a single space. The \(\w+\)$ matches the final number in brackets.
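Tried on several lines at once in Python's re module, ^ and $ only match at every line boundary when re.MULTILINE is passed, so the sketch below sets that flag. The input uses the "1.Atmosphere (10000)" form from the question.

```python
import re

text = "1.Atmosphere (10000)\n3.Air quality (10110)"

# re.MULTILINE makes ^ and $ match at the start and end of each line.
result = re.sub(r"^\d+\.(.*) \(\w+\)$",
                r"<a href=blah.com\\\1.htm>linky study</a>",
                text, flags=re.MULTILINE)

print(result)
```

With input such as "1. Atmos-phere (7800)", (.*) also captures the space after the dot; adding \s* after \. would keep that space out of the link.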
Update for the added Q4.
Regular expressions capture things written between round brackets ( and ). Brackets that are to be found in the text being searched must be escaped as \( and \). In the replacement expression, \1, \2, etc. are replaced by the text of the corresponding capture. So a search expression such as Z(\d+)X([aeiou]+)Y might match Z29XeieiY, and then the replacement expression P\2Q\1R would insert PeieiQ29R. In the search at the top of this answer there is one capture: the (.*) captures, or collects, the words, and then \1 inserts the captured words into the replacement text.
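The Z29XeieiY example can be reproduced directly with Python's re.sub, which uses the same \1 and \2 notation for groups in the replacement:

```python
import re

# Group 1 captures the digits, group 2 the vowels; the replacement swaps them.
result = re.sub(r"Z(\d+)X([aeiou]+)Y", r"P\2Q\1R", "Z29XeieiY")
print(result)  # PeieiQ29R
```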

Regular Expression (Regex)

I need to reformat a text file a bit in my Notepad++ and I have a text of this kind:
This is some example text. This is some example text. This is some example text.
- This is some example text.
-This is some example text.
- This is some example text.
- This is some example text.
As you can see in the text above, there are two kinds of "-" preceding the text: ones with a space after the "-" and ones without. I need to find only the ones without a space and insert a space between the "-" and the text.
If I run the pattern below
-[A-Za-z0-9]
it finds the dash and the first letter right after it, which is not useful: when I replace the match, it also changes that first letter, which is always different (depending on what is written). So I need to match only the "-" and replace it with "- ", unless there is a better way.
For demonstration purposes:
Find what: -([A-Za-z0-9])(.+)
Replace with: - \1\2
The parentheses denote a capture group. In the Replace with line, you use a backslash followed by the group number to insert the captured text.
That said, what you really want to match is a negated group, like -([^\s]) (a dash that isn't immediately followed by whitespace).
Search for
-([^ ])
and replace with
- \1
[^ ] is a negated character class and matches everything but a space. This character is stored in \1 because of the brackets () around the pattern.
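Applied to the sample text from the question in Python's re module, the -([^ ]) pattern gives the expected result. As an alternative that is not from the answers above, a negative lookahead -(?! ) matches only the dash, so nothing has to be captured and put back. Both forms would also insert a space into hyphenated words such as "well-known", so check the text for those first.

```python
import re

text = ("This is some example text. This is some example text.\n"
        "- This is some example text.\n"
        "-This is some example text.\n"
        "- This is some example text.\n")

# [^ ] matches any character except a space; it is captured and put back.
fixed = re.sub(r"-([^ ])", r"- \1", text)

# Alternative: the lookahead checks the next character without consuming it.
fixed_lookahead = re.sub(r"-(?! )", "- ", text)

print(fixed)
```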

Eclipse, regular expression search and replace

In eclipse, is it possible to use the matched search string as part of the replace string when performing a regular expression search and replace?
Basically, I want to replace all occurrences of
variableName.someMethod()
with:
((TypeName)variableName.someMethod())
Where variableName can be any variable name at all.
In sed I could use something like:
s/[a-zA-Z]+\.someMethod\(\)/((TypeName)&)/g
That is, & represents the matched search string. Is there something similar in Eclipse?
Thanks!
Yes, ( ) captures a group. You can use it again with $i where i is the i'th capture group.
So:
search: (\w+\.someMethod\(\))
replace: ((TypeName)$1)
Hint: Ctrl + Space in the textboxes gives you all kinds of suggestions for regular expression writing.
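The same search and replace can be tried outside Eclipse with Python's re module, which writes group 1 as \1 instead of $1. Python's \g<0> refers to the entire match, much like sed's &, so the second call needs no capture group. The sample code lines are invented for illustration.

```python
import re

code = "x = foo.someMethod();\ny = bar_2.someMethod() + 1;"

# Group 1 wraps the whole match; Eclipse writes it as $1, Python as \1.
wrapped = re.sub(r"(\w+\.someMethod\(\))", r"((TypeName)\1)", code)

# \g<0> is the entire match, similar to sed's &.
wrapped_whole = re.sub(r"\w+\.someMethod\(\)", r"((TypeName)\g<0>)", code)

print(wrapped)
```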
Using ...
search = (^.*import )(.*)(\(.*\):)
replace = $1$2
...replaces ...
from checks import checklist(_list):
...with...
from checks import checklist
Blocks in regex are delineated by parentheses (which are not preceded by a "\")
(^.*import ) finds "from checks import " and loads it into $1 (Eclipse starts counting at 1)
(.*) finds the next "everything" and loads it into $2. $2 stops at a "(" because of the next part (see the next line below); since (.*) is greedy, it is the last "(" that still lets the rest match, which in this example is the only one.
(\(.*\):) says "at that "(" after block $2, stop block $2 and start $3". $3 gets loaded with the "('any text'):" or, in the example, the "(_list):"
Then in the replace, just put the $1$2 to replace all three blocks with just the first two.
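A check of this search and replace in Python's re module (where $1$2 becomes \1\2). The second input is a made-up line with two parenthesised parts; because (.*) is greedy, group 2 extends to the last "(" that still lets group 3 match.

```python
import re

pattern = r"(^.*import )(.*)(\(.*\):)"

simple = re.sub(pattern, r"\1\2", "from checks import checklist(_list):")
print(simple)  # from checks import checklist

# Greedy (.*): group 2 keeps "f(a)", group 3 gets "(b):".
greedy = re.sub(pattern, r"\1\2", "from m import f(a)(b):")
print(greedy)  # from m import f(a)
```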
NomeN has answered correctly, but that answer wouldn't be of much use to beginners like me, because we will have another problem to solve and won't know how to use regex there. So I am adding a bit of explanation. The answer is:
search: (\w+\.someMethod\(\))
replace: ((TypeName)$1)
Here:
In search:
The first ( and last ) define a group in regex.
\w matches a word character (alphanumeric or underscore).
+ means one or more (i.e. one or more alphanumeric or underscore characters).
. is a special character that matches any character (i.e. .+ means one or more of any character). Because it is special, to match a literal . we have to escape it, i.e. \.
someMethod is given as is, to be searched literally.
The two parentheses after someMethod are escaped, \( and \), because they are special characters used to define a group (groups are discussed in the next point).
In replace:
It is ((TypeName)$1), where $1 refers to the group, that is, all the characters enclosed within the first ( and last ) in the search field.
Also make sure you have checked the "Regular expressions" option in the Find/Replace dialog.
At least in STS (SpringSource Tool Suite), $0 also works: it refers to the entire match, which here is the same text as group 1, so the replace string can be:
replace: ((TypeName)$0)
For anyone who needs an explanation and an example of how to use a regexp in Eclipse, here is my example illustrating the problem.
I want to rename
/download.mp4^lecture_id=271
to
/271.mp4
And there can be multiple of these.
Here is how it should be done.
Then hit the Find/Replace button.
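One pattern that could perform this rename is sketched below in Python's re module. It is an illustrative assumption rather than a reproduction of the author's exact expression; in Eclipse's dialog the replacement would use $1 where Python uses \1. The . and ^ are escaped because both are regex special characters, and (\d+) captures the lecture id.

```python
import re

text = "/download.mp4^lecture_id=271\n/download.mp4^lecture_id=42"

# Hypothetical pattern: the literal prefix, then the id captured as group 1.
pattern = r"/download\.mp4\^lecture_id=(\d+)"

renamed = re.sub(pattern, r"/\1.mp4", text)
print(renamed)
```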