Find pipe character and replace with newline - regex

I currently have a list of a few thousand elements, each separated by the "|" character. Is there a way in Sublime Text to place each item on its own line?
I have attempted to use the Regex find and replace with the parameters of
Find: |
Where: doc.txt
Replace: \n
For some reason that placed every character on a new line.
For example:
listItem1|newItem2|newItem3|newItem4|newItem5|newItem6
That placed each letter on a new line, but I intended it to find the pipe character and replace it with a line break, like this:
listItem1
newItem2
newItem3
newItem4
newItem5
newItem6
Is there a simple way to accomplish this without using a plugin? I've seen some examples using plugins, but I would think there would be a way.

Select one | and press Alt + F3 on Windows and Linux, or Cmd + Ctrl + G on Mac, to select all instances in the file, then hit Enter to replace every selected pipe with a line break.

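To use \n in the replace field you need to activate regex mode.
In a regex, an unescaped | is the alternation operator. On its own it matches the empty string at every position, which is why every character ended up on its own line. You need to escape the pipe to search for it literally.
So activate regex find and replace and use these values: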
FIND = \|
REPLACE = \n
That will achieve what you want to do.
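The difference between the escaped and unescaped pattern can be reproduced outside the editor. This is a minimal Python sketch, assuming Python's `re` module treats the pipe the same way the editor's regex engine does; the function name `split_on_pipe` is made up for illustration.

```python
import re


def split_on_pipe(text: str) -> str:
    """Replace every literal pipe character with a newline."""
    # The backslash turns | into a literal character instead of the
    # alternation operator.
    return re.sub(r"\|", "\n", text)


# Unescaped, "|" means "empty pattern OR empty pattern". That matches the
# empty string at every position, so a newline is inserted around each
# character and the pipe itself is left in place.
broken = re.sub("|", "\n", "a|b")

print(split_on_pipe("listItem1|newItem2|newItem3"))
```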

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, which makes it even worse.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open to suggestions on how to do this with another program or command line utility, on either Linux or Windows.
Match: \|[^|]+\|[^|]+(\|[^|]+$)
Replace: $1
This anchors to the end of the line and removes the two columns before the last one, i.e. columns [-3] and [-2]. (I assume columns can't be empty; replace + with * if they can.)
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
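A script along those lines might look like the following Python sketch. It is not the answerer's code: it swaps the regex for a right-anchored `str.rsplit`, and the function name `drop_two_before_last` is hypothetical. It assumes the last three columns never contain a pipe, which holds for the sample data.

```python
from typing import Iterable, Iterator


def drop_two_before_last(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line without the two columns that precede the last one."""
    for line in lines:
        body = line.rstrip("\r\n")
        # Split from the right, at most three times, so that extra pipes
        # inside the user column stay untouched in the first element.
        parts = body.rsplit("|", 3)
        if len(parts) < 4:
            # Fewer than four columns: pass the line through unchanged.
            yield body
            continue
        head, _dropped_a, _dropped_b, last = parts
        yield f"{head}|{last}"


sample = [
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1",
    "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389",
]
for row in drop_two_before_last(sample):
    print(row)
```

Passing an open file object as `lines` processes the input one line at a time, so a file with millions of lines does not have to fit in memory.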
I've captured three groups and given them names. With a replace utility such as sed or Vim, you can replace the remove group with nothing. Alternatively, you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
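As an illustration of the concatenation approach, here is a small Python sketch. Python spells named groups `(?P<name>...)`, so the pattern is adapted accordingly; the function name `drop_remove_group` is hypothetical.

```python
import re

# Same pattern as above, with Python's (?P<name>...) syntax for named groups.
PATTERN = re.compile(
    r"^(?P<keep_before>(?:[^|]+\|){3})"
    r"(?P<remove>(?:[^|]+\|){2})"
    r"(?P<keep_after>.*)$"
)


def drop_remove_group(line: str) -> str:
    """Join keep_before and keep_after, leaving out the remove group."""
    match = PATTERN.match(line)
    if match is None:
        # Lines that do not fit the pattern are returned unchanged.
        return line
    return match.group("keep_before") + match.group("keep_after")


print(drop_remove_group(
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1"
))
```

Because this pattern counts columns from the left, it only gives the intended result when the user column contains no pipe. For the third sample row it would drop the wrong fields.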
In Notepad++, press Ctrl + H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular expression
Click Replace All and you're done.
Regex explanation:
\|\d+ : matches the first | followed by digits
\|\d+ : matches the second | followed by digits
(\|[0-9a-z]+) : matches and captures the | and the string that follow the second number
$ : forces the match to end at the end of the line
Replacement:
$1 : replaces the whole match with the contents of the first capture group, i.e. whatever matched between the parentheses in (\|[0-9a-z]+)

How to search a word using regex and concatenate it to other words also found by using regex on a per line basis?

I have a file in format:
has | have | had\tmeaning of have\n
apple\tmeaning of apple\n
write | wrote\tmeaning of write\n
I want to have it in the following format:
has\tmeaning of have\n
have\tmeaning of have\n
had\tmeaning of have\n
apple\tmeaning of apple\n
etc. Word(s) (has, have, had) can be single or multiple. Multiple words are separated by space, pipe character, space. The meaning follows a tab character and ends with a newline. I am not sure, but I want to assume that the meaning may contain a pipe or tab character (or, better, any character except newline). Can this be done in Notepad++? If not, is there another easy alternative?
My input file uses actual newline and tab characters. Since I can't paste them in Stack Overflow, I have presented them as \n and \t (escape sequences) in the examples.
EDIT
It sounds like your input contains actual tab and newline characters rather than the literal text \t and \n. In that case, this should work:
Search: \s*([^ |]+) \|\s*(?=.*?\t(.*?)(?=(?:\R|$)))
Replace: \1\t\2\n
Original
This version is for input that contains the literal text \t and \n. In the Replace tab, make sure to check the "Regular expression" option at the bottom left, then use this:
Search: \s*([^ |]+) \|\s*(?=.*?\\t(.*?)(?=(?:\\n|$)))
Replace: \1\t\2\n
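For the "other easy alternative" part of the question, a short script can do the same expansion without a lookahead. This is a minimal Python sketch, assuming real tab characters in the input and " | " as the word separator; `expand_variants` is a made-up name.

```python
from typing import List


def expand_variants(line: str) -> List[str]:
    """Turn 'has | have | had<TAB>meaning' into one line per word."""
    # Split at the first tab only, so the meaning itself may contain
    # further tabs or pipe characters.
    words, sep, meaning = line.partition("\t")
    if not sep:
        # No tab on this line: leave it unchanged.
        return [line]
    return [f"{word}\t{meaning}" for word in words.split(" | ")]


for out in expand_variants("has | have | had\tmeaning of have"):
    print(out)
```

Splitting on the first tab rather than on every pipe is what keeps a meaning such as "a | b" intact.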

How do I remove all non-ASCII characters with regex and Notepad++?

I searched a lot, but nowhere is it written how to remove non-ASCII characters in Notepad++.
I need to know what command to write in find and replace (a picture would be great).
What if I want to make a whitelist and bookmark all the ASCII words/lines, so that non-ASCII lines are left unmarked?
What if the file is quite large, so I can't select all the ASCII lines and just want to select the lines containing non-ASCII characters?
This expression will search for non-ASCII values:
[^\x00-\x7F]+
Select 'Search Mode = Regular expression', and click Find Next.
Source: Regex any ASCII character
In Notepad++, if you go to menu Search → Find characters in range → Non-ASCII Characters (128-255) you can then step through the document to each non-ASCII character.
Be sure to check "Wrap around" if you want to loop through the document for all non-ASCII characters.
In addition to the answer by ProGM: if you see characters in boxes like NUL or ACK and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:
[\x00-\x1F]+
In order to remove all non-ASCII AND ASCII control characters, you should remove all characters matching this regex (note that it also matches tabs and line breaks):
[^\x20-\x7E]+
To remove all non-ASCII characters, search for [^\x00-\x7F]+ and replace it with nothing.
To highlight characters, I recommend using the Mark function in the search window: it highlights non-ASCII characters and puts a bookmark on each line containing one of them.
If you want to highlight and put a bookmark on the ASCII characters instead, you can use the regex [\x00-\x7F] to do so.
Cheers
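Outside Notepad++, the same two operations (marking lines and removing characters) can be sketched in Python. This assumes the text has already been decoded into a `str`; the function names are hypothetical.

```python
import re
from typing import List

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text: str) -> List[int]:
    """Return the 1-based numbers of lines that contain a non-ASCII character."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


def strip_non_ascii(text: str) -> str:
    """Remove every run of non-ASCII characters."""
    return re.sub(r"[^\x00-\x7F]+", "", text)


print(lines_with_non_ascii("plain\ncaf\u00e9\nok"))
print(strip_non_ascii("caf\u00e9"))
```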
To keep new lines:
First pick a placeholder character for the line break that does not occur in the file. I used #.
Select the Extended search mode.
Find \n and replace with #.
Hit Replace All.
Next:
Select the Regular expression search mode.
Find: [^\x20-\x7E]+
Leave "Replace with" empty.
Hit Replace All.
Now select the Extended search mode again and replace # with \n.
Now you have a clean ASCII file ;)
Another good trick is to go into UTF8 mode in your editor so that you can actually see these funny characters and delete them yourself.
Another way...
Install the TextFX plugin if you don't have it already
Go to the TextFX menu option -> zap all non printable characters to #. It will replace all invalid chars with 3 # symbols
Go to Find/Replace and look for ###. Replace it with a space.
This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.
Click View / Show Symbol / Show All Characters to show the [SOH] characters in the file
Select the [SOH] symbol in the file
Press Ctrl + H to bring up the Replace dialog
Leave 'Find what:' as is
Change 'Replace with:' to the character of your choosing (comma, semicolon, other...)
Click 'Replace All'
Done and done!
In addition to Steffen Winkler's answer:
[\x00-\x08\x0B-\x0C\x0E-\x1F]+
This skips \t, \n and \r (tab, line feed, carriage return), so they are left in place.
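The effect of that character class can be checked with a short Python sketch. It assumes Python's `re` module, which accepts the same `\xNN` escapes inside a character class; `strip_control_chars` is a made-up name.

```python
import re

# Control characters 0-31, except tab (\x09), line feed (\x0A)
# and carriage return (\x0D).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B-\x0C\x0E-\x1F]+")


def strip_control_chars(text: str) -> str:
    """Remove control characters but keep tabs and line breaks."""
    return CONTROL_CHARS.sub("", text)


cleaned = strip_control_chars("a\x00b\x01\tc\r\nd\x1f")
print(repr(cleaned))
```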

Unwrap text in Sublime Text 2

I'd like to unwrap lines so that I can turn them from lines with hard line-breaks to no line breaks.
Specifically, this means that contiguous runs of lines with non-whitespace should be joined together. Essentially, any \n with no whitespace on either side should be replaced with a single space. Other linebreaks shouldn't get touched.
I feel like it ought to be a search-and-replace with a search string something like (?!\n)\n(?!\n), replaced with a single space, but that doesn't work: it doesn't match anything.
Is there an ST2 built-in command for this?
For "any \n with no whitespace on either side":
(?<!\s)\n(?!\s)
For "other linebreaks shouldn't get touched":
(?<!(?:\s|\n))\n(?!\s)
Replace with a single space.
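As @flow mentioned, there is a built-in for that task. Just select the lines you want to join and press Ctrl + J.
Your approach should work too; you only missed one character. The first group has to be a lookbehind, because (?!\n) followed by \n demands that the next character both is and is not a newline. It should be (?<!\n)\n(?!\n)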
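Both patterns can be compared in a few lines of Python. This is a minimal sketch, assuming Python's `re` module handles these lookarounds the same way as Sublime Text's engine; `unwrap` is a hypothetical name.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines, leaving blank-line paragraph breaks alone."""
    # Match a newline that is neither preceded nor followed by another
    # newline, and replace it with a single space.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


# The pattern from the question never matches: the lookahead (?!\n) requires
# that the next character is not a newline, and the next token requires
# that it is one.
never = re.findall(r"(?!\n)\n(?!\n)", "a\nb\n\nc")

print(unwrap("one\ntwo\n\nthree\nfour"))
```

Note that a single newline at the very end of the text also matches and becomes a space, so it may be worth stripping the input first.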
The following solution works best for text copied from a console log with 80 columns. It only removes \n if the line touches the last column.
Find:
(.{80})\n
Replace:
$1
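The same column-based replacement can be sketched in Python with the width as a parameter. The name `unwrap_console` and the `width` argument are illustrative additions, not part of the answer.

```python
import re


def unwrap_console(text: str, width: int = 80) -> str:
    """Remove a newline that directly follows `width` non-newline characters."""
    # "." does not match a newline by default, so the group can only be
    # filled by characters from a single line.
    pattern = re.compile(r"(.{%d})\n" % width)
    return pattern.sub(r"\1", text)


wrapped = "a" * 80 + "\n" + "bb\nshort line\n"
print(unwrap_console(wrapped))
```

Like the editor version, this also joins a line that is longer than the width, which should not occur in a log that was hard-wrapped at that column.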

Sublime text find and replace with special characters

I suck at regular expressions and want to do a simple find and replace of '}' with '} \n'. Notepad++ can recognise what I'm after, as it has the three options of normal find and replace, special characters, and full regex; I only ever used option two. How can I enable special characters in my search using Sublime Text 2?
Cheers
Joe,
In Sublime Text you can use find and replace. Drag the replace panel large enough and use Cmd + Enter to go to a new line.
Hit Replace All.
And that's it, you're sorted.
If you want to replace } with } \n in Notepad++, you can use the following settings:
Find: }
Replace with: } \n
Search mode: Extended (\n, \r, \t, \0, \x...)
Press Replace All