How to remove everything except first column from text file in Notepad++? - regex

I have a huge text file (375K lines). All I want is the first column of the text file. I am using Notepad++. How can I remove everything except that first column?
Deleting using column select is impossible in such a file. I think regex can help me, or maybe some plugin.
Edit:
@Bolt: Column: consider this as the first 12 characters, or [space] then numbers then [space].

To use column-mode selection, you can use Alt+Shift+Arrow keys or Alt + left mouse click.

Search for
^(............).*
and replace with \1
Turn on regular expression mode.
^ matches the start of the row
(............) matches 12 characters (no matter what) and stores them in \1
.* matches everything else in the row, which will be removed.
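To try the same find/replace outside Notepad++, here is a minimal Python sketch. It assumes Python's re module treats this simple pattern the same way Notepad++ does (there are no lookarounds or editor-specific escapes involved), and the sample rows are made up.

```python
import re

# Hypothetical sample rows: a 12-character first column followed by other data.
text = (
    "ABCDEFGHIJKL 123 more data\n"
    "MNOPQRSTUVWX 456 other data\n"
)

# ^(.{12}).* is equivalent to ^(............).*: group 1 keeps the first
# 12 characters of each line and .* swallows the rest of that line.
# re.MULTILINE makes ^ match at the start of every line, and . does not
# match \n, so the line breaks survive the replacement.
first_column = re.sub(r"^(.{12}).*", r"\1", text, flags=re.MULTILINE)

print(first_column)
```

Lines shorter than 12 characters do not match at all, so they are left unchanged rather than truncated.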

Select the block consisting of your first column using ALT+SHIFT and cursor keys or ALT+Mouse.
Copy the Block (CTRL+C)
Select All (CTRL+A)
Paste the copied Block (CTRL+V)
Done

Related

Regex - replace folder details with filename

I'm completely new to regex, so I was hoping I could find an answer here.
I'm using Notepad++, and I have a big bulk of file details from a folder in a text document, like so:
01/01/2015 08:00 1,000,000 filename.exe
01/02/2015 08:30 1,450,000 aDifferentFilename.exe
And I want to do a find and replace so that the whole thing is replaced by:
filename.exe
aDifferentFilename.exe
I could delete them manually, but there's over a thousand lines!
I've used ^(.*)% to find the lines one by one, but what would I put in the replace field to keep the filename, i.e. filename.exe?
Any help/explanation would be great!
In Notepad++'s find dialog, click on the tab for "replace" (probably obvious, but to be complete). Make sure the radio button for "Regular expression" is checked (again, probably obvious). In the "Find what:" text box enter:
^([^ ]+[ ]+){3}(.*)$
if the pattern in your file is consistently four fields of information (including the file name), each separated by spaces. Explanation: this finds three groups of one or more non-space characters followed by one or more spaces, followed by everything else on the line. "Everything else on the line" is assigned to group 2 (it is enclosed in the second set of parentheses in the expression); we will use this fact below to specify the "Replace with:" string. Matching everything else on the line is necessary to advance the search position past the text we want to keep; otherwise, after the replacement, the kept text could itself match the expression and be replaced.
Alternatively, enter this:
^(.{34})(.*)$
if the consistent pattern in your file is that the file name always starts in the 35th column (both patterns could hold true, in which case you could use either). Explanation: this finds the first 34 characters at the start of each line, followed by everything else on the line. See the explanation above for why we want to "find everything else on the line." Note that it is not necessary to group ".{34}" in parentheses; I simply did this so that in both examples the "Replace with:" text would be group 2.
In the "Replace with:" text box enter \2
Explanation: This tells Notepad++ to replace what we matched with the group 2 subset of what we matched, in other words, "everything else on the line", which in this case is the file name.
Click "Replace"
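As a quick check of the first pattern, here is a minimal Python sketch run on the two sample lines from the question. It assumes Python's re behaves like Notepad++ for this pattern; Python writes the back-reference as \2 in the replacement, just like the answer.

```python
import re

# Sample listing from the question: date, time, size, file name.
listing = (
    "01/01/2015 08:00 1,000,000 filename.exe\n"
    "01/02/2015 08:30 1,450,000 aDifferentFilename.exe\n"
)

# ([^ ]+[ ]+){3} skips three space-separated fields; (.*) is group 2,
# the rest of the line. re.MULTILINE lets ^ and $ match at every line.
names = re.sub(r"^([^ ]+[ ]+){3}(.*)$", r"\2", listing, flags=re.MULTILINE)

print(names)
```

One caveat that applies in Notepad++ as well: [^ ] also matches line breaks, so a line with fewer than four fields could be merged with the next one. Using [^ \r\n]+ instead of [^ ]+ keeps each match on a single line.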
Another option: if the text you want to keep always starts in column 35 (as required for the approach immediately above), you can select the column of text you want to delete by holding down Ctrl+Alt+Shift and then left-clicking and dragging with your mouse. Once the text is selected, press Delete.
You can try matching either on three runs of spaces, or assume the position of the comma is always fixed. Here is something quick and dirty that matches greedily up to the last comma, plus the 5 characters after it:
^(.*,.....)

Changing order of csv-file entries with regex replacement in Notepad++

I am trying to change the order of the entries of a *.csv file with the built-in find/replace function of Notepad++. This is how the file looks now:
ABC;DEF;Here comes some long text with ,.- in it;true;false;
QWE;RTY;Here comes some long text with ,.- in it;true;false;
And this is how it should look after find/replace:
DEF;Here comes some long text with ,.- in it;ABC;true;false;;
RTY;Here comes some long text with ,.- in it;QWE;true;false;;
So column 1 should move to position 3, and columns 2 and 3 should each shift one position to the left.
What I tried so far:
I tried to capture the first three columns with a regular expression in the find field, put some brackets around them and reorder them with the $ sign in the replace field. But my regex matches nearly the whole line, not only the first three columns. What am I doing wrong? Here is my regex:
([A-Z]{3})\;([A-Z]{3})\;(.*[^\;])\;
The first two columns and the following ; are selected properly, so the problem must be in the third pair of round brackets, but I have no clue what it is. The third expression should match everything except ; and end with a ;.
The content of the replacement field should be $2;$3;$1; (I guess that's right).
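You are escaping the semicolons unnecessarily (\; is just a literal ;), but the main problem is the greedy .* in your third group: it runs on to the last ; it can find on the line. Use this expression ^(?s)([A-Z]{3};)([A-Z]{3};)([^\n\r;]*;) and replace it with this expression $2$3$1
I have included the line delimiters \r and \n in the negated character class too, so the third group cannot run onto the next line if a line has fewer columns. You should also use the start-of-line anchor ^ to be safe if you have more columns.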
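Here is a minimal Python sketch of the same reordering, run on the question's two sample rows. It assumes Python's re behaves like Notepad++ for this pattern. The (?s) flag is dropped because the pattern contains no . for it to affect, and Python's replacement string uses \2 where Notepad++ uses $2.

```python
import re

rows = (
    "ABC;DEF;Here comes some long text with ,.- in it;true;false;\n"
    "QWE;RTY;Here comes some long text with ,.- in it;true;false;\n"
)

# Group 1 = first column, group 2 = second, group 3 = third (each with its ';').
# [^\n\r;]* stops at the first ';' and never crosses a line break, unlike the
# greedy .* in the question, which ran on to the last ';' of the line.
pattern = re.compile(r"^([A-Z]{3};)([A-Z]{3};)([^\n\r;]*;)", re.MULTILINE)

# Back-references in Python's replacement string are written \2, \3, \1.
reordered = pattern.sub(r"\2\3\1", rows)
print(reordered)
```

The rest of each row (true;false;) is not part of the match and is kept as-is, so this produces a single trailing ; rather than the ;; shown in the question's target rows.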

Unwrap text in Sublime Text 2

I'd like to unwrap lines so that I can turn them from lines with hard line-breaks to no line breaks.
Specifically, this means that contiguous runs of lines with non-whitespace should be joined together. Essentially, any \n with no whitespace on either side should be replaced with a single space. Other linebreaks shouldn't get touched.
I feel like it ought to be a search-and-replace, with a search string something like (?!\n)\n(?!\n) replaced by a single space, but that doesn't work: it doesn't match anything.
Is there an ST2 built-in command for this?
For "any \n with no whitespace on either side":
(?<!\s)\n(?!\s)
For "other linebreaks shouldn't get touched":
(?<!(?:\s|\n))\n(?!\s)
Replace with a single space.
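To see which line breaks the first pattern picks up, here is a minimal Python sketch. Python's re is only a stand-in for Sublime Text's engine, although this lookbehind/lookahead syntax is standard. It uses a single space as the replacement, as the question asks, and the sample text is made up.

```python
import re

text = "first line\nsecond line\n\nnew paragraph\ncontinues here"

# A \n with a non-whitespace character on both sides sits inside a paragraph.
# Both \n characters of a blank line (\n\n) have whitespace on one side, so
# paragraph breaks are left alone.
unwrapped = re.sub(r"(?<!\s)\n(?!\s)", " ", text)
print(unwrapped)
```

A \n at the very end of the text also matches, because (?!\s) succeeds when nothing follows. That is why the sample has no trailing newline.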
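As @flow mentioned, there is a built-in command for this task. Just select the lines you want to join and press Ctrl + J.
Your approach should work too; you just missed one detail. (?!\n) placed before \n requires the next character not to be \n and then demands a \n, so it can never match. The first lookaround has to be a lookbehind: (?<!\n)\n(?!\n)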
The following solution works best for text copied from a console log with 80 columns. It only removes \n if the line touches the last column.
Find:
(.{80})\n
Replace:
$1
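Here is a minimal Python sketch of the 80-column rejoin on a made-up console capture. Python writes the back-reference as \1, where the answer uses $1.

```python
import re

# Hypothetical console capture: one long message hard-wrapped at 80 columns,
# followed by an ordinary short line.
wrapped = "x" * 80 + "\n" + "tail of the same message\n" + "short line\n"

# (.{80})\n finds 80 non-newline characters followed by a line break and
# writes back only the 80 characters, removing that break.
joined = re.sub(r"(.{80})\n", r"\1", wrapped)
print(joined)
```

The match does not have to start at the beginning of a line, so any line of 80 or more characters loses its line break, not just lines of exactly 80. For console output wrapped at 80 columns, this difference does not matter.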

How to group lines of text using Notepad++

I find Notepad++ regex to be very different from regex in Microsoft Word. I was wondering how I can group several lines of text using Notepad++. I have a text file with 100+ URLs, written one URL per line. I would like to group them in tens by removing the carriage returns from the 1st through 9th line of each group, but retaining the carriage return on every 10th line and adding another carriage return thereafter. For example:
I want this:
http://website1.com
http://website2.com
http://website3.com
http://website4.com
http://website5.com
http://website6.com
http://website7.com
http://website8.com
http://website9.com
http://website10.com
http://website11.com
http://website12.com
http://website13.com
http://website14.com
http://website15.com
http://website16.com
http://website17.com
http://website18.com
http://website19.com
http://website20.com
http://website21.com
http://website22.com
http://website23.com
http://website24.com
http://website25.com
http://website26.com
http://website27.com
http://website28.com
http://website29.com
http://website30.com
to look like:
http://website1.comhttp://website2.comhttp://website3.comhttp://website4.comhttp://website5.comhttp://website6.comhttp://website7.comhttp://website8.comhttp://website9.comhttp://website10.com
http://website11.comhttp://website12.comhttp://website13.comhttp://website14.comhttp://website15.comhttp://website16.comhttp://website17.comhttp://website18.comhttp://website19.comhttp://website20.com
http://website21.comhttp://website22.comhttp://website23.comhttp://website24.comhttp://website25.comhttp://website26.comhttp://website27.comhttp://website28.comhttp://website29.comhttp://website30.com
Any help would be appreciated!
OK, I have found a way.
It is possible, but only with 6 entries per row (a longer regex is not parsed by Notepad++).
1) Open the file and remove all newline characters from it, so the text becomes one long line.
2) Open the Replace dialog, select Regular expression mode and enter the following in the "Find what" field:
(http://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.comhttp://[^\:]*\.com)
and the following in the "Replace with" field:
\1\r\n
Put the cursor at the first position in the text and press "Replace All".
In other words, the regex is the pattern http://[^\:]*\.com repeated 6 times inside one group. If you work on Unix and need Unix-style newlines, replace \1\r\n with \1\n.
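Python's re does not have the length problem described above, so here is a minimal sketch of the same two steps with groups of ten, as the question asks. It assumes each URL contains exactly one ':' (the one in "http:"), which is what lets [^:]* stop before the next URL; a URL with a port number would break this. The quantifier is placed on a non-capturing group inside the capturing one so that \1 holds all ten URLs. Writing (http://[^\:]*\.com){6} and replacing with \1 would keep only the last URL of each run, because a repeated capturing group stores only its final repetition.

```python
import re

# Hypothetical input: the 30 URLs from the question, one per line.
urls = [f"http://website{i}.com" for i in range(1, 31)]
text = "\n".join(urls) + "\n"

# Step 1: remove every newline so the text becomes one long line.
flat = text.replace("\r", "").replace("\n", "")

# Step 2: match ten consecutive "http://...com" entries and append a newline.
# [^:]* cannot run past the ':' of the next "http:", so each repetition
# ends at the ".com" right before the next URL.
grouped = re.sub(r"((?:http://[^:]*\.com){10})", r"\1\n", flat)

for line in grouped.splitlines():
    print(line)
```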

Removing empty lines in Notepad++

How can I remove empty lines in Notepad++? I tried a find and replace with the empty lines in the Find field and nothing in the Replace field, but it did not work; it probably needs regex.
As of version 6.5.2, there is a built-in way to do this:
Edit -> Line Operations -> Remove Empty Lines or Remove Empty Lines (Containing Blank characters)
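You need something like a regular expression.
You have to be in Extended mode.
If you want all the lines to end up on a single line, use \r\n. If you want to simply remove empty lines, use \n\r, as @Link originally suggested: in a Windows file, \n\r only occurs where one line break is immediately followed by another, i.e. at an empty line.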
Replace either expression with nothing.
There is a plugin that adds a menu entitled TextFX. This menu, which houses a dizzying array of quick text editing options, gives you the ability to make quick coding changes. In this menu, you can find selections such as Drop Quotes, Delete Blank Lines, and Unwrap and Rewrap Text.
Do the following:
TextFX > TextFX Edit > Delete Blank Lines
TextFX > TextFX Edit > Delete Surplus Blank Lines
In Notepad++:
Press Ctrl-H.
Select Regular expression.
Enter ^[ \t]*$\r?\n into Find what and leave Replace with empty. This matches every line that contains only spaces or tabs (or nothing at all), together with its line break; the \r? makes it work with both Windows CRLF and Unix LF line endings.
Click the Find Next button to see for yourself that it matches only empty or blank lines.
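Here is a minimal Python sketch of the same pattern. One assumption to be aware of: Python's $ only treats \n as a line end (not \r), so unlike Notepad++ it would not match before the \r of a CRLF line. The sketch therefore uses LF line endings, and the sample text is made up.

```python
import re

text = "alpha\n\n   \n\t\nbeta\n"

# re.MULTILINE makes ^ and $ work per line. Each match is one line holding
# only spaces/tabs (or nothing), plus its line break, so removing the match
# deletes the whole line.
cleaned = re.sub(r"^[ \t]*$\r?\n", "", text, flags=re.MULTILINE)
print(repr(cleaned))
```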
Press ctrl + h (Shortcut for replace).
In the Find what zone, type ^\R (for exactly empty lines) or ^\h*\R (for lines that are empty or contain only blanks).
Leave the Replace with zone empty.
Check the Wrap around option.
Select the Regular expression search mode.
Click on the Replace All button.
You can use the following settings:
Find what: ^\r\n
Replace with: keep this empty
Search Mode: Regular expression
Wrap around: selected
Note: for Unix (LF) files, search for ^\n instead.
This worked for me:
Press ctrl + h (Shortcut for replace)
Write one of the following regex in find what box.
[\n\r]+$ or ^[\n\r]+
Leave Replace with box blank
In Search Mode, select Regex
Click on Replace All
Done!
In Notepad++, press Ctrl+H. Under Search Mode, click the "Extended (\n, \r, \t ...)" radio button, then type \r\n (short for CR LF) in the "Find what" box and leave the "Replace with" box empty.
Finally, click Replace All.
Well, I'm not sure about the regex or your situation...
How about Ctrl+A, then select the TextFX menu -> TextFX Edit -> Delete Blank Lines? Voilà, all blank lines are gone.
A side note: this will work if the line is blank, i.e. it does not contain spaces.
1) Press Ctrl + H (or Search 🠆 Replace...) to open the Replace window.
2) Under 'Search Mode', select 'Regular expression'.
3) In 'Find What' type ^(\s*)(.*)(\s*)$ and in 'Replace With' type \2
^ - Matches the start of a line
(\s*) - Matches whitespace characters
(.*) - Matches any characters
(\s*) - Matches whitespace characters
$ - Matches the end of a line
\2 - Refers to the content matched by the 2nd group
See https://www.rexegg.com/regex-quickstart.html for more on regex.
You can search for the following regex: ^(?:[\t ]*(?:\r?\n|\r))+ and replace it with nothing.
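Here is a minimal Python sketch of this pattern on made-up CRLF and LF text. It assumes Python's re handles this pattern like Notepad++ does. Because the pattern spells out the line breaks explicitly instead of relying on $, it works for both CRLF and LF text in Python. One caveat: Python's ^ only recognises \n as a line start, so files that use a lone \r as the line break would need a different anchor there.

```python
import re

crlf = "alpha\r\n\r\n  \t\r\nbeta\r\n"
lf = "alpha\n\n  \t\nbeta\n"

# One match covers a whole run of consecutive lines that hold only tabs or
# spaces, each taken together with its line break.
pattern = re.compile(r"^(?:[\t ]*(?:\r?\n|\r))+", re.MULTILINE)

print(repr(pattern.sub("", crlf)))
print(repr(pattern.sub("", lf)))
```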
Press Ctrl+H (Extended or Regular expression search mode).
Find: \r\r
Replace with: \r
This obviously does not work if the blank lines contain tabs or spaces. Many web pages (e.g. http://www.guardian.co.uk/) contain such white lines as a result of a faulty HTML editor.
Remove trailing white space using a regular expression, as follows:
replace the pattern [\t ]+$
with nothing,
where [\t ] matches either a tab or a space, '+' matches one or more occurrences, and '$' marks the end of the line.
Then use notepad++/textFX to remove single or extra empty lines.
Be sure that these blank lines are not significant in the given context.
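Here is a minimal Python sketch of the two steps on made-up LF text. The TextFX "Delete Blank Lines" step is replaced here by a second regex, ^\n. That substitution is mine, not part of the answer, and is just a stand-in that behaves the same on this sample. LF line endings are used because Python's $ does not match before \r.

```python
import re

text = "keep me   \n\t\n  \nand me\t\n"

# Step 1: [\t ]+$ removes trailing tabs/spaces on every line
# (re.MULTILINE makes $ match before each \n). Blank lines become empty.
trimmed = re.sub(r"[\t ]+$", "", text, flags=re.MULTILINE)

# Step 2 (stand-in for TextFX > Delete Blank Lines): drop the now-empty lines.
no_blank = re.sub(r"^\n", "", trimmed, flags=re.MULTILINE)

print(repr(trimmed))
print(repr(no_blank))
```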
1) Edit >> Blank Operations >> Trim Leading and Trailing Spaces (to remove blank tabs and spaces in empty lines)
2) Press Ctrl + H to get the Replace window and replace the pattern ^\r\n with nothing (select Regular expression)
Note: step 1 will also remove any code indentation done with tabs and spaces.
Sometimes \n\r and similar patterns do not work; this trick helps you figure out what your regular expression actually needs to be.
Advantage of this trick: if you want to replace in multiple files at once, you need this method; the approaches above will not work there.
CTRL+A, Select the TextFX menu -> TextFX Edit -> Delete Blank Lines as suggested above works.
But if the lines contain some spaces, move the cursor to such a line and press Ctrl + H. The "Find what:" field will show the blank space; leave the "Replace with" field blank.
Now all the spaces are removed and now try CTRL+A, Select the TextFX menu -> TextFX Edit -> Delete Blank Lines
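\n\r assumes a specific type of line break. To target any blank line, you could also use:
^$
This matches any line with nothing between its start and its end, so it is more of a catch-all. Note that ^$ is a zero-length match that does not include the line break, so replacing it with an empty string finds the empty lines but does not delete them; to delete them, the line break has to be part of the match (for example ^\R).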
I did not see this combined pattern among the answers, so: search for ^\s+$ and replace it with nothing.
^\s+$ means:
^ start of line
\s+ at least one whitespace character (spaces, tabs, line breaks)
$ end of line
This pattern is tested in Notepad++ v8.1.1
It replaces all spaces/tabs/blank lines before and after each row of text.
It shouldn't mess with anything in the middle of the text.
Find: ^(\s|\t)+|(\s|\t)+$
Replace: leave this blank
Before (\s marks a space, \t a tab, \r\n a Windows line break):
_____________________________________
\tWORD\r\n
\r\n
\tWORD\s\tWORD\s\t\r\n
\r\n
\r\n
WORD\s\s\tWORD\t\sWORD\s\r\n
\t\r\n
\s\s\s\r\n
WORD\s\sWORD\s\s\t\r\n
____________________________________
After:
_____________________________________
WORD\r\n
WORD\s\tWORD\r\n
WORD\s\s\tWORD\t\sWORD\r\n
WORD\s\sWORD
_____________________________________
A few of the above expressions and extended expressions did not work for me, but the regular expression "$\n$" did.
An easy alternative for removing white space from empty lines:
TextFX>TextFX Edit> Trim Trailing Spaces
This will remove all trailing spaces, including trailing spaces in blank lines.
Make sure no trailing spaces are significant.
This worked for me:
Search: ^\r
Replace: (empty)