I'm working on Windows and trying to clean up a text I'd like to study. What's the right regex in Notepad++ to remove lines that are 10 characters long or shorter?
Search:
^.{0,10}((\r?\n)|$)
Replace with a blank (ie nothing)
Step 1) Replace every line containing 10 or fewer characters with an empty string.
^.{0,10}$
Step 2) Now you have lots of empty lines, so remove them:
Remove Empty Lines
"Edit" > "Line Operations" > "Remove Empty Lines"
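Outside Notepad++, the same filter can be scripted. This is only a minimal Python sketch of the idea; the function name `drop_short_lines` and the sample text are made up for illustration, and empty lines are dropped too because their length is 0:

```python
def drop_short_lines(text: str, max_len: int = 10) -> str:
    """Return text without the lines that are max_len characters long or shorter."""
    kept = [line for line in text.splitlines() if len(line) > max_len]
    return "\n".join(kept)


# Hypothetical sample: "exactly 10" is 10 characters long, so it is removed as well.
sample = "short\nexactly 10\nthis line is long enough\n\nanother long line here"
print(drop_short_lines(sample))
```

Unlike the two-step Notepad++ approach, this removes the short lines and the leftover empty lines in a single pass.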
I have a huge file around half a gig and the file has records shown below:
44 ,1577,23GRE ,GREASE THE ENGINE
44 ,1577,23GRE ,GREASE THE ENGINE
44 ,1577,24GRE ,GREASE THE WHEELS
I want to remove the whitespace before the commas and after content such as "GREASE THE ENGINE", and convert the file as shown below using vi:
"44","1577","23GRE","GREASE THE ENGINE"
"44","1577","23GRE","GREASE THE ENGINE"
"44","1577","24GRE","GREASE THE WHEELS"
I tried removing the whitespace with the command :1,$s/ //g. This removes all the whitespace and renders the file as shown below, which defeats the purpose. I want GREASE THE ENGINE to keep its spaces.
44,1577,23GRE,GREASETHEENGINE
44,1577,23GRE,GREASETHEENGINE
44,1577,24GRE,GREASETHEWHEELS
Appreciate any or all help.
Thanks
You can use these substitute commands in vi:
:1,$s/[[:blank:]]*$//
:1,$s/[[:blank:]]*,/,/g
:1,$s/[^,][^,]*/"&"/g
First we remove trailing whitespace, then replace any whitespace followed by a , with a single ,. Finally we match each field (a run of characters that are not commas) and wrap it in quotes; & in the replacement stands for the matched text.
[[:blank:]] will match a space or tab.
For your input it gives:
"44","1577","23GRE","GREASE THE ENGINE"
"44","1577","23GRE","GREASE THE ENGINE"
"44","1577","24GRE","GREASE THE WHEELS"
Another possibility (Vim syntax):
%s/\s*,/","/g
%s/^/"/
%s/\s*$/"/
Replace any sequence of whitespace characters (space, tab, etc.) before a comma, together with the comma itself, with ","
Add " at the beginning of every line
Remove any whitespace at the end of every line and add " there
If you had an empty line at the end of the file, you will end up with a line of "" which is easy to delete.
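If the editor turns out to be slow on a file of that size, the same cleanup can be scripted and streamed line by line. A minimal Python sketch, assuming no field contains a comma or a double quote of its own; `quote_fields`, `convert_file` and the paths passed to it are hypothetical names, not part of either answer above:

```python
def quote_fields(line: str) -> str:
    """Strip the padding around each comma-separated field and wrap it in double quotes."""
    fields = [field.strip() for field in line.split(",")]
    return ",".join(f'"{field}"' for field in fields)


def convert_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path one line at a time, so the whole file never sits in memory."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(quote_fields(line.rstrip("\r\n")) + "\n")


# One padded record in the shape described by the question.
print(quote_fields("44    ,1577,23GRE    ,GREASE THE ENGINE     "))
```

Note that `strip()` also removes leading whitespace in a field, which the substitute commands above leave alone.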
I have a text file that contains thousands of lines of text as below.
aaaa "test "
aa "test "(version 2)
bbbb "test "(version 4)
bbbbb "test1 "(with heads)
abs "test1 "
absc "test3"
I would like to be able to remove all the duplicates based on a search and keep only the first line (in my case all lines with the same value between the quotation marks)
EDIT : More details about how I detect that a line is a duplicate of another :
I check the value between the quotation marks. On the 3 first lines there is the value "test " between quotation marks so I want to keep the first line with this value and remove the other values. For lines 4 and 5 the value is "test1 " so I keep only line 4 and remove the other.
So after cleaning my text file would have this form
aaaa "test "
bbbbb "test1 "(with heads)
absc "test3"
I tried to use this regex search in Notepad++
(.\".*?")
But I don't know how to use it to find duplicates and remove the other lines with the same value. I already checked other users' cases but I couldn't find a solution.
I would solve it in several steps.
1. prepend line numbers
2. put the quoted text in front
3. sort; now lines with the same quoted text are sorted behind each other, and secondly in the original sequence due to the line numbers from step 1
4. remove "duplicates"
5. remove the inserted quoted text from step 2
6. sort by the line number from step 1
7. remove the line numbers from step 1
Now the detailed explanation:
1. Prepend line numbers: use Edit -> Column Editor in the first column two times
   first insert text (some delimiter that does not occur in the file, e.g. | or : )
   then insert numbers: start with 1, increment by 1, use leading zeros
   Now each line should start with a line number and a delimiter
2. Prepend the quoted text: use a regex replace
   Find what: ^([^"]*)("[^"]+")(.*)$
   Replace with: \2\1\2\3
   Now your lines should start with the quoted text.
3. Sort: by using Edit -> Line Operations -> Sort ...
4. Remove duplicates: with a regex replace
   Find what: ("[^"]+")(.*)(?:\R\1.*)+
   Replace with: \1\2
   Use Replace All, with ". matches newline" unchecked. \R matches any line break, including the Windows \r\n, and the repeated group removes a whole run of duplicates in one pass.
5. Remove the text from step 2: using a regex replace
   Find what: ^"[^"]+"
   Replace with: nothing, i.e. leave empty
6. Sort by the original line numbers: by using Edit -> Line Operations -> Sort ...
7. Remove the line numbers from step 1: using a regex replace
   Find what: ^(.*\|) (use \| or whatever you used in step 1 as delimiter)
   Replace with: nothing, i.e. leave empty
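For comparison, the whole procedure collapses to a few lines in a script, because a set can remember which quoted values were already seen. A minimal Python sketch, not the method described above: `keep_first_per_quoted_value` is a made-up name, and keeping lines that have no quoted value at all is my assumption, since the question does not say what should happen to them:

```python
import re

# First double-quoted value on a line, quotes included.
QUOTED = re.compile(r'"[^"]+"')


def keep_first_per_quoted_value(lines):
    """Keep only the first line for each distinct quoted value, in the original order."""
    seen = set()
    kept = []
    for line in lines:
        match = QUOTED.search(line)
        if match is None:
            kept.append(line)  # assumption: lines without a quoted value are kept
            continue
        key = match.group(0)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


sample = [
    'aaaa "test "',
    'aa "test "(version 2)',
    'bbbb "test "(version 4)',
    'bbbbb "test1 "(with heads)',
    'abs "test1 "',
    'absc "test3"',
]
print("\n".join(keep_first_per_quoted_value(sample)))
```

On the sample from the question this keeps the lines for "test ", "test1 " and "test3", in their original order, with no sorting or line numbering needed.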
I have a problem deleting some lines from a character matrix file created by SupRip (OCR Sup Extractor). I want to make a regex which is going to do this:
- If the character l or I is found, delete the current line and the next 40 lines, and repeat until those 2 characters are not found anymore in the file.
I want to know if this is possible with Notepad++.
Yes, it is possible to do this using regex.
Find: [^\n]*[lI][^\n]*\n(?:[^\n]*\n){1,40}
Replace:
^^^ empty string
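The same rule can be written as a small loop, which may be easier to adjust than the regex. A minimal Python sketch under my reading of the question: lines inside a deleted block are not checked again for l or I, just as the regex consumes them without looking. `drop_blocks` and its parameters are hypothetical names:

```python
def drop_blocks(lines, trigger_chars="lI", extra=40):
    """Drop every line containing a trigger character, plus the `extra` lines after it."""
    kept = []
    skip = 0  # how many upcoming lines still belong to a deleted block
    for line in lines:
        if skip > 0:
            skip -= 1
            continue
        if any(ch in line for ch in trigger_chars):
            skip = extra
            continue
        kept.append(line)
    return kept


# Small demonstration with extra=2 instead of 40.
print(drop_blocks(["a", "I x", "b", "c", "d"], extra=2))
```

Near the end of the file the loop simply drops whatever lines remain, so fewer than 40 following lines is not a problem.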
I have a document that has a range of numbers like this:
0300010000000394001001,27
0300010000000394001002,0
0300010000000394002001,182
0300010000000394002002,51
0300010000000394003001,156
0300010000000394003002,40
I need to find the new line character and replace with a number of spaces depending on the string length.
If it has 24 characters like this - 0300010000000394001002,0 then I need to replace the new line character at the end with 5 blank spaces.
If it has 25 characters like this - 0300010000000394002002,51 then I need to replace the new line character at the end with 4 blank spaces and so on.
In my text editor I can use find and replace. I search for the line length by ^(.|\s){24}$ for 24 characters - but this will obviously replace the whole line and I only need to replace the new line character at the end.
I want to specify a new line character AFTER ^(.|\s){24}$. Is this possible?
It sounds like you need two things.
Multi-line Mode (See "Using ^ and $ as Start of Line and...")
Backreferencing
Most editors that support regex support these naturally, but you'll have to let us know what editor you're using for us to be specific. Without knowing what editor you're using, all I can say is that you want to do some combination of the following:
regex          subst
-----          -----
^(.{24})\n     $1       <-- there are 5 spaces here
^(.{24})^M     \1       <-- there are 5 spaces here
^(.{24})\s       ^^^^^
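Since the number of spaces depends on the line length, a script can do every length in one go instead of one regex per length. A minimal Python sketch, assuming the goal is fixed-width records of 29 characters; that width is only inferred from the two examples in the question (24 characters + 5 spaces, 25 characters + 4 spaces), and `join_padded` is a made-up name:

```python
def join_padded(text: str, width: int = 29) -> str:
    """Replace each line break with however many spaces pad that line to `width` characters."""
    return "".join(line.ljust(width) for line in text.splitlines())


sample = "0300010000000394001002,0\n0300010000000394002002,51\n"
result = join_padded(sample)
print(repr(result))
```

Lines that are already `width` characters or longer get no padding at all, so they would run straight into the next record.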
I have a text file with a thousand lines of numbers like so:
402

115

90

...
As you can see there is a blank line in between each number that I want to remove so that I have
402
115
90
...
How can I do this?
Press Ctrl+H (Replace)
Select Extended under Search Mode
Put \r\n\r\n in Find what
Put \r\n in Replace with
Click on Replace All
As of NP++ V6.2.3 (not sure about older versions) simply:
Go to menu -> Edit -> Line Operations
Choose "Remove Empty Lines" or "Remove Empty Lines (Containing white spaces)" according to your needs.
By the way, in Notepad++ there's a built-in plugin that can handle this:
TextFX -> TextFX Edit -> Delete Blank Lines (first press CTRL+A to select all).
This will remove any number of blank lines
CTRL + H to replace
Select Extended search mode
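Replace all \r\n with (space)
Then switch to Regular expression mode and replace all \s+ with \n. Note that this also breaks lines at any space inside them, so it only suits data like the numbers above.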
You can record a macro that removes the first blank line, and positions the cursor correctly for the second line. Then you can repeat executing that macro.
This should get you sorted:
Highlight from the end of the first line, to the very beginning of the third line.
Use the Ctrl + H to bring up the 'Find and Replace' window.
The highlighted region will already be placed in the 'Find' textbox.
Replace with: \r\n
'Replace All' will then remove all the extra blank lines.
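If the file is not open in Notepad++ anyway, the blank lines can also be filtered out with a short script. A minimal Python sketch; `remove_blank_lines` and its `whitespace_only_too` flag are hypothetical names that mirror the two menu entries "Remove Empty Lines" and "Remove Empty Lines (Containing white spaces)":

```python
def remove_blank_lines(text: str, whitespace_only_too: bool = True) -> str:
    """Drop empty lines; optionally also drop lines that contain only whitespace."""
    lines = text.splitlines()
    if whitespace_only_too:
        kept = [line for line in lines if line.strip()]
    else:
        kept = [line for line in lines if line]
    return "\n".join(kept)


sample = "402\r\n\r\n115\r\n\r\n90\r\n"
print(remove_blank_lines(sample))
```

The result is joined with \n, so Windows line endings would have to be restored separately if they matter.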