How to group lines of text using Notepad++ - regex

I find Notepad++ regex to be very different from regex in Microsoft Word. I was wondering how I can group several lines of text using Notepad++. I have a text file with 100+ URLs. They are written one URL address per line. I would like to group all of them by tens by removing the carriage returns from every first to 9th line, but retaining the carriage return on every 10th line and adding another carriage return thereafter. For example:
I want this:
http://website1.com
http://website2.com
http://website3.com
http://website4.com
http://website5.com
http://website6.com
http://website7.com
http://website8.com
http://website9.com
http://website10.com
http://website11.com
http://website12.com
http://website13.com
http://website14.com
http://website15.com
http://website16.com
http://website17.com
http://website18.com
http://website19.com
http://website20.com
http://website21.com
http://website22.com
http://website23.com
http://website24.com
http://website25.com
http://website26.com
http://website27.com
http://website28.com
http://website29.com
http://website30.com
to look like:
http://website1.comhttp://website2.comhttp://website3.comhttp://website4.comhttp://website5.comhttp://website6.comhttp://website7.comhttp://website8.comhttp://website9.comhttp://website10.com
http://website11.comhttp://website12.comhttp://website13.comhttp://website14.comhttp://website15.comhttp://website16.comhttp://website17.comhttp://website18.comhttp://website19.comhttp://website20.com
http://website21.comhttp://website22.comhttp://website23.comhttp://website24.comhttp://website25.comhttp://website26.comhttp://website27.comhttp://website28.comhttp://website29.comhttp://website30.com
Any help would be appreciated!

Ok, I have found a way:
It is possible, but only with 6 entries per row (Notepad++ does not parse a longer regex).
1) Open the file and remove all newline characters from it, so that the text becomes one long line.
2) Open the Replace dialog and enter the following in the "Find what" field:
(http://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.com)
and the following in the "Replace with" field:
\1\r\n
3) Put the cursor at the first position in the text and press "Replace All".
So the regex consists of http://[^\:]*\.com repeated 6 times, i.e. (http://[^\:]*\.com){6} written out in full. If you work on Unix and need Unix-style line endings, replace \1\r\n with \1\n.
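If the regex length limit gets in the way, the same grouping can also be done outside the editor. The following is only a minimal Python sketch and not part of the answer above; the helper name group_lines and the default group size of 10 are assumptions chosen to match the question.

```python
def group_lines(lines, size=10):
    """Join every `size` consecutive lines into one line (hypothetical helper)."""
    # Strip line endings and skip blank lines so stray empty rows are not counted.
    cleaned = [line.strip() for line in lines if line.strip()]
    # Walk the list in steps of `size` and concatenate each chunk.
    return ["".join(cleaned[i:i + size]) for i in range(0, len(cleaned), size)]


if __name__ == "__main__":
    # Sample data modelled on the question: 30 URLs, one per line.
    urls = [f"http://website{n}.com" for n in range(1, 31)]
    for row in group_lines(urls):
        print(row)
```

Because it slices a list instead of repeating a pattern, this version has no limit on the group size; a final group with fewer than `size` entries is kept as a shorter row.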

Related

Notepad++ and regex (multiline)

I have been facing a challenge. I have a text file with the following pattern:
SOME RANDOM TITLE IN CAPS (nnnn)
text text text
more text
...
SOME OTHER RANDOM TITLE IN CAPS (nnnn)
What I know for sure is that the lines I want to extract contain a year in brackets, e.g. (2015) ; (2008)
After the (nnnn) there is no text: sometimes a space and CR LF, sometimes just CR LF.
I would like to delete everything else and keep just the TITLE LINE with the brackets.
In the time I spent I could have done it by hand (there are 100 lines), but I like the challenge :)
I thought I could figure it out, but I am stuck.
I have tried something along this line:
^.*\(\d\d\d\d\)(?s)(.*)(^.*\(\d\d\d\d\))
But I don't get what I want. I can't seem to stop the (?s)(.*) going all the way to the end of the text instead of stopping at the next occurrence.
I suggest using the Search > Mark feature. Use a pattern like \(\d{4}\) and check the "Bookmark Line" option then click "Mark All". Then use Search > Bookmark > Remove Unmarked Lines. This will remove all lines except the ones that have matched your pattern.
Note: If it's possible to have parentheses with 4 digits within your other lines you could add $ to the end of the expression to ensure that the pattern only matches the end of the line. E.g. more text (1234) and other stuff would be matched by the pattern I gave above but if you use pattern \(\d{4}\)$ it will no longer match.
If you want to be even more specific with your pattern by looking for those lines with only uppercase letters and spaces followed by parentheses with 4 digits inside where the parentheses are at the end of the line, then you could use a pattern like this: [A-Z ]+\(\d{4}\)$
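For readers who want to check the patterns outside Notepad++, here is a rough Python sketch of the same "keep only matching lines" idea. It is an illustration under assumptions, not the Notepad++ feature itself: the function name keep_title_lines is made up, and trailing whitespace is stripped before matching so that the $ anchor tolerates the optional space mentioned in the question.

```python
import re

# Strictest pattern from the answer: capitals and spaces, then (dddd) at line end.
TITLE = re.compile(r"[A-Z ]+\(\d{4}\)$")


def keep_title_lines(text):
    """Return the lines that match TITLE (hypothetical helper)."""
    kept = []
    for line in text.splitlines():
        # rstrip() drops the optional space that may follow the closing bracket.
        stripped = line.rstrip()
        if TITLE.search(stripped):
            kept.append(stripped)
    return kept


if __name__ == "__main__":
    sample = (
        "SOME RANDOM TITLE IN CAPS (2008)\r\n"
        "text text text\r\n"
        "more text (1234) and other stuff\r\n"
        "SOME OTHER RANDOM TITLE IN CAPS (2010) \r\n"
    )
    print(keep_title_lines(sample))
```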
Sample input:
SOME RANDOM TITLE IN CAPS (2008)
text text text
more text
...
SOME OTHER RANDOM TITLE IN CAPS (2010)
Marking the lines as described above bookmarks both title lines.
Now use Search > Bookmark > Remove Unmarked Lines and you get this:
SOME RANDOM TITLE IN CAPS (2008)
SOME OTHER RANDOM TITLE IN CAPS (2010)
The following regex matches the 2 lines with brackets containing 4 digits:
.*?\(\d{4}\)\s*
It lazily matches any characters zero or more times, then an opening bracket followed by 4 digits and a closing bracket, and finally any trailing whitespace, including the line break.
If you want to remove all lines except the ones that end with (4 digits), you can try this:
^(?!.*\(\d{4}\)\h*$).*(?:\r?\n|\z)
Replace by: (nothing)
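The same negative-lookahead idea can be tried in Python, with two adaptations because Python's re module does not support \h or \z: [ \t] stands in for \h and \Z for \z. An optional \r is also allowed before $ so that CR LF files behave the same. This is a sketch under those assumptions, and the function name drop_other_lines is hypothetical.

```python
import re

# Adapted from the Notepad++ pattern above:
#   \h -> [ \t]   and   \z -> \Z   (Python's re lacks \h and \z)
NOT_A_TITLE = re.compile(
    r"^(?!.*\(\d{4}\)[ \t]*\r?$).*(?:\r?\n|\Z)",
    re.MULTILINE,
)


def drop_other_lines(text):
    """Delete every line that does not end with (dddd) (hypothetical helper)."""
    # Each match is a whole line, including its line break, that fails the test.
    return NOT_A_TITLE.sub("", text)


if __name__ == "__main__":
    sample = (
        "SOME RANDOM TITLE IN CAPS (2008)\n"
        "text text text\n"
        "more text\n"
        "...\n"
        "SOME OTHER RANDOM TITLE IN CAPS (2010)\n"
    )
    print(drop_other_lines(sample), end="")
```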

Regular expression for selecting trailing whitespace except first space after last character in line

I'm editing text in Atom.
Beginning with the regex, $\s , I haven't been able to figure out how to anchor my selection from the second blank space after the line.
I want to remove the thousands of line returns in a text file ( originally formatted as an .srt video transcript ) and replace them with a single, blank space so as to not join together any words.
For example, my file looks like this:
This content is
difficult to read
because the lines
break after too
few characters.
$\s will select all trailing whitespace, something that I don't want to do, because if I delete all the space selected by that regex then I will cause lots of words to join up into nonsense.
I want to start trimming the trailing whitespace of each line from the second blank space, not the first, so that the expected output would be:
"This content is difficult to read because the lines break after too few characters."
Instead of:
"This content isdifficult to readbecause the linesbreak after toofew characters."
I have solved this problem using MS Word's Find and Replace; substituting a single space ( by literally hitting the space bar once ) for all the hard returns ( enter ^p in the Find field ).
I don't know why the Atom regex engine wasn't recognising the answer provided in the comments from regex101.com? It solves my problem in the regex101 tester.
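For comparison, the transformation described in this question (every line break replaced by exactly one space) can be sketched in Python. This is not the regex101 answer mentioned above, which is not quoted here; the function name unwrap is made up for the illustration.

```python
import re


def unwrap(text):
    """Turn hard-wrapped text into a single line (hypothetical helper)."""
    # Collapse any whitespace run that contains a line break into one space,
    # so words on neighbouring lines never get glued together.
    joined = re.sub(r"\s*\n\s*", " ", text)
    # Remove the space left over from the final line break, if any.
    return joined.strip()


if __name__ == "__main__":
    wrapped = (
        "This content is\n"
        "difficult to read\n"
        "because the lines\n"
        "break after too\n"
        "few characters.\n"
    )
    print(unwrap(wrapped))
```

Replacing the break with a space, rather than deleting trailing whitespace, is what keeps "is" and "difficult" apart.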

Regex - replace folder details with filename

Completely new to Regex so I was hoping I could find an answer here.
I'm using Notepad++, and I have a big bulk of file details from a folder in a text document, like so:
01/01/2015 08:00 1,000,000 filename.exe
01/02/2015 08:30 1,450,000 aDifferentFilename.exe
And I want to do a find and replace so that the whole thing is replaced by:
filename.exe
aDifferentFilename.exe
I could delete them manually, but there's over a thousand lines!
I've used ^(.*)$ to find the lines one by one, but what would I put in the replace field to keep the filename, i.e. filename.exe?
Any help/explanation would be great!
In Notepad++'s find dialog, click on the tab for "replace" (probably obvious, but to be complete). Make sure the radio button for "Regular expression" is checked (again, probably obvious). In the "Find what:" text box enter:
^([^ ]+[ ]+){3}(.*)$
if every line in your file consistently has four fields (including the file name), each separated by spaces. Explanation: this finds three groups of one or more non-space characters followed by one or more spaces, followed by everything else on the line. "Everything else on the line" is assigned to group 2 (it is enclosed in the second set of parentheses in the expression). We will use this fact below to specify the "Replace with:" string. Matching the rest of the line is necessary to advance the search position past the text we want to keep; otherwise, after the replacement, the kept text could match the expression and would itself be replaced.
Alternatively, enter this:
^(.{34})(.*)$
if the consistent pattern in your file is that the file name always starts in the 35th column (both patterns could hold true, in which case you could use either). Explanation: this finds the first 34 characters at the start of each line, followed by everything else on the line. See the explanation above for why we want to "find everything else on the line." Note that it is not necessary to group ".{34}" in parentheses; I simply did this so that in both examples the "Replace with:" text would be group 2.
In the "Replace with:" text box enter \2
Explanation: This tells Notepad++ to replace what we matched with the group 2 subset of what we matched, in other words, "everything else on the line", which in this case is the file name.
Click "Replace All" to apply the replacement to every line.
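To see what the first pattern and the \2 replacement do, here is a small Python sketch of the same substitution. It is an illustration rather than Notepad++ behaviour: the function name strip_listing is hypothetical, and it assumes every line has at least four space-separated fields, as in the sample from the question.

```python
import re

# Same expression as in the answer; MULTILINE makes ^ and $ work per line.
LISTING = re.compile(r"^([^ ]+[ ]+){3}(.*)$", re.MULTILINE)


def strip_listing(text):
    """Keep only the file name on each line (hypothetical helper)."""
    # \2 is "everything else on the line", i.e. the file name.
    return LISTING.sub(r"\2", text)


if __name__ == "__main__":
    listing = (
        "01/01/2015 08:00 1,000,000 filename.exe\n"
        "01/02/2015 08:30 1,450,000 aDifferentFilename.exe\n"
    )
    print(strip_listing(listing), end="")
```

Because the pattern counts fields rather than columns, it does not matter how many spaces separate the date, time and size.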
Another option: if the text you want to keep always starts in column 35 (as required for the approach immediately above), you can select the column of text you want to delete by holding down Ctrl+Alt+Shift and then left-clicking with your mouse and dragging. Once the text is selected, press Delete.
You can try matching on either 3 sets of spaces, or assume the comma is always fixed. Here is something quick and dirty which matches the comma in a greedy fashion, and 5 characters after that.
^(.*,.....)

How to search a word using regex and concatenate it to other words also found by using regex on a per line basis?

I have a file in format:
has | have | had\tmeaning of have\n
apple\tmeaning of apple\n
write | wrote\tmeaning of write\n
I want to have it in the following format:
has\tmeaning of have\n
have\tmeaning of have\n
had\tmeaning of have\n
apple\tmeaning of apple\n
etc. Word(s) (has, have, had) can be single or multiple. Multiple words are separated by space, pipe character, space. The meaning follows a tab character and ends with a newline. I am not sure, but I want to assume that the meaning may contain a pipe or tab character (or better, any character except newline). Can it be done in Notepad++? If not, is there another easy alternative?
My input file uses actual newline and tab characters. Since I can't paste them in Stack Overflow, I have presented them as \n and \t (escape sequences) instead in the examples.
EDIT
It sounds like in your input, the tabs and new lines are not literally inserted. This should work:
Search: \s*([^ |]+) \|\s*(?=.*?\t(.*?)(?=(?:\R|$)))
Replace: \1\t\2\n
Original
In the Replace tab, make sure to check the "regex" box at the bottom left, then use this:
Search: \s*([^ |]+) \|\s*(?=.*?\\t(.*?)(?=(?:\\n|$)))
Replace: \1\t\2\n
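Since the question also asks for an alternative outside Notepad++, here is a short Python sketch. Note that it swaps the lookahead regex for plain string splitting, so it is a different technique from the answer above; the function name expand_entries is hypothetical, and the input is assumed to use real tab and newline characters.

```python
def expand_entries(text):
    """Emit one 'word<TAB>meaning' line per headword (hypothetical helper)."""
    out = []
    for line in text.splitlines():
        if not line:
            continue  # ignore empty lines
        # Split at the first tab only, so the meaning may itself contain
        # tabs or pipe characters.
        words, _, meaning = line.partition("\t")
        # Headwords are separated by space, pipe, space.
        for word in words.split(" | "):
            out.append(f"{word}\t{meaning}\n")
    return "".join(out)


if __name__ == "__main__":
    data = (
        "has | have | had\tmeaning of have\n"
        "apple\tmeaning of apple\n"
        "write | wrote\tmeaning of write\n"
    )
    print(expand_entries(data), end="")
```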

How to remove everything except first column from text file in Notepad++?

I have a huge text file ( 375K lines ). All I want is the first column of the text file. I am using notepad++. How can I remove everything except that first column?
Deleting using column select is impossible in such file. I think regex can help me or may be some plugin.
Edit
@Bolt: Column: Consider this as the first 12 characters, or [space] then numbers then [space]
To use column-mode select, you can use Alt+Shift+Arrow keys or Alt + left mouse click
Search for
^(............).*
and replace with \1
Turn on regular expression mode.
^ match the start of the row
(............) matches 12 characters (no matter what) and stores it in \1
.* matches everything else in the row that will be removed.
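The same search and replace can be reproduced in Python to check the result on a few rows before running it on the full file. This is a sketch, not Notepad++ itself; the function name first_column and the width parameter are assumptions, with 12 taken from the description of the column in the question.

```python
import re


def first_column(text, width=12):
    """Keep the first `width` characters of each line (hypothetical helper)."""
    # Equivalent to ^(............).* for width=12; MULTILINE applies ^ per line.
    pattern = re.compile(r"^(.{%d}).*" % width, re.MULTILINE)
    # \1 is the captured first column; the rest of the row is dropped.
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    rows = "ABCDEFGHIJKL rest of row\n123456789012 more data\n"
    print(first_column(rows), end="")
```

Lines shorter than the given width do not match the pattern and are left unchanged.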
Select the block consisting of your first column using ALT+SHIFT and cursor keys or ALT+Mouse.
Copy the Block (CTRL+C)
Select All (CTRL+A)
Paste the copied Block (CTRL+V)
Done