Notepad++ regex to find single character bounded by | - regex

I'm having trouble coming up with the regex I need to do this find/replace in Notepad++. I'm fine with needing a couple of separate searches to complete the process.
Basically I need to add a | at the beginning and end of every line from a CSV, plus replace all the , with |. Then, on any value with only 1 character, I need to put a space on each side of the character ("A" becomes " A ")
Source:
col1,col2,col3,col4,col5,col6
name,desc,something,else,here,too
another,,three,,,
single,characters,here,a,b,c
last,line,here,,almost,
Results:
|col1|col2|col3|col4|col5|col6|
|name|desc|something|else|here|too|
|another||three||||
|single|characters|here| a | b | c |
|last|line|here||almost||
Adding the | to the beginning and the end of the line is simple enough, and replacing , with | is obviously straightforward. But I can't come up with the regex to find |x| where x is limited to a single character. I'm sure it is simple, but I'm new to regex.

Regex:
(?:(^)|(?!^)\G)(?:([^\r\n,]{2,})|([^\r\n,]))?(?:(,$)|(,)|($))
Replacement string:
(?{1}|)(?{2}\2)(?{3} \3 )(?{4}||)(?{5}|)(?{6}|)
Ugly, dirty and long but works.
Regex Explanation:
(?: # Start of non-capturing group (a)
(^) # Assert beginning of line (CP #1)
| # Or
(?!^) # Not at the beginning of a line
\G # Match at previous matched position
) # End of non-capturing group (a)
(?: # Start of non-capturing group (b)
([^\r\n,]{2,}) # Match a run of 2 or more characters (any except \r, \n or `,`) (CP #2)
| # Or
([^\r\n,]) # Match one-char string (CP #3)
)? # Optional - End of non-capturing group (b)
(?: # Start of non-capturing group (c)
(,$) # Match `,$` (CP #4)
| # Or
(,) # Match single comma (CP #5)
| # Or
($) # Assert end of line (CP #6)
) # End of non-capturing group (c)

Three Step Solution:
Pattern: ^.+$ Replacement: |$0|
Pattern: , Replacement: |
Pattern: (?<=\|)([^|\r\n])(?=\|) Replacement: " $0 " (a space on each side of $0)

The first replace adds | at the beginning and at the end, and replaces commas:
Search: ^|$|,
Replace: |
The second replace adds space around single character matches:
Search: (?<=[|])([^|])(?=[|])
Replace: $1
Add spaces to the left and to the right of $1.
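Outside Notepad++, the same two steps can be sketched in Python. This is only an illustration under a few assumptions: the function name csv_line_to_pipes is made up for the example, the first step uses plain string operations instead of the ^|$|, search, and the input is assumed to be simple CSV with no quoted fields.

```python
import re

def csv_line_to_pipes(line: str) -> str:
    # Step 1: commas become pipes, and the line gets a leading and a trailing pipe.
    piped = "|" + line.replace(",", "|") + "|"
    # Step 2: pad any single character that sits between two pipes.
    # The lookarounds do not consume the pipes, so adjacent
    # single-character fields (a|b|c) are all matched.
    return re.sub(r"(?<=\|)([^|])(?=\|)", r" \1 ", piped)

source = """col1,col2,col3,col4,col5,col6
another,,three,,,
single,characters,here,a,b,c
last,line,here,,almost,"""

for row in source.splitlines():
    print(csv_line_to_pipes(row))
```

The lookbehind and lookahead are what make consecutive one-character fields work: a pattern that consumed the surrounding pipes would skip every other field.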

Related

How to do text wrapping without adding newline if the residue is short?

Description
Say I have a lot of strings, some of them are very long:
Aim for the moon. If you miss, you may hit a star. – Clement Stone
Nothing about us without us
I want a text wrapper that follows this algorithm:
1. Starting from the beginning of the string, find the blank character ( ) nearest to position 25.
2. If the residue (the text left after that blank) is shorter than 5 characters, do nothing. Otherwise, replace that blank character with \n.
3. Find the blank character nearest to the end of the next 25 characters.
4. Return to step 2 until the end of the line.
So the text would become:
Aim for the moon. If you\nmiss, you may hit a star.\n– Clement Stone
Nothing about us without us
Attempt 1
Consulting Wrapping Text With Regular Expressions
Matching pattern: (.{1,25})( +|$\n?)
Replacing pattern: $1\n
But this will produce Nothing about us without\nus, which is not preferable.
Attempt 2
Using a Lookahead Construct in a If-Then-Else Conditionals:
Matching pattern: (.{1,25})(?(?=(.{1,5}$).*))( +|$\n?)
Replacing pattern: $1$2\n
It still produces Nothing about us without\nus, which is not preferable.
Created this based on @sln's answer to a different word wrap problem.
All I have added is this alternative point to add a line break:
"Expand by up to 5 characters until before a linebreak or EOS"
and changed the number of characters allowed from 50 to 25
[^\r\n]{1,5}(?=\r?\n|$)
Compressed
(?:((?>.{1,25}(?:[^\r\n]{1,5}(?=\r?\n|$)|(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,25})(?:\r?\n)?|(?:\r?\n|$))
Replacement
$1 followed by a linebreak
$1\r\n
Preview
https://regex101.com/r/pRqdhi/1
Detailed Regular Expression
(?:
# -- Words/Characters
( # (1 start)
(?> # Atomic Group - Match words with valid breaks
.{1,25} # 1-N characters
# Followed by one of 4 prioritized, non-linebreak whitespace
(?: # break types:
[^\r\n]{1,5}(?=\r?\n|$) # Expand by up to 5 characters until before a linebreak or EOS
|
(?<= [^\S\r\n] ) # 1. - Behind a non-linebreak whitespace
[^\S\r\n]? # ( optionally accept an extra non-linebreak whitespace )
| (?= \r? \n ) # 2. - Ahead a linebreak
| $ # 3. - EOS
| [^\S\r\n] # 4. - Accept an extra non-linebreak whitespace
)
) # End atomic group
|
.{1,25} # No valid word breaks, just break on the N'th character
) # (1 end)
(?: \r? \n )? # Optional linebreak after Words/Characters
|
# -- Or, Linebreak
(?: \r? \n | $ ) # Stand alone linebreak or at EOS
)
If your input is run line-by-line, and there is no newline character in the middle of a line, then you can try this:
Pattern: (.{1,25}.{1,5}$|.{1,25}(?= ))
Substitution: $1\n
Then apply this:
Pattern: \n
Substitution: \n
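For comparison, here is a non-regex sketch of the algorithm described in the question, written in Python. It rests on one reading of the rules, which is an assumption: a line breaks at the last blank at or before position 25, and a remainder of up to 25 + 5 characters is left on one line. The names wrap_with_tolerance, width and slack are invented for the example.

```python
def wrap_with_tolerance(text: str, width: int = 25, slack: int = 5) -> str:
    lines = []
    rest = text
    # Once the remainder fits within width + slack, keep it on one line.
    while len(rest) > width + slack:
        # Last blank at index <= width, so the line before it has at most `width` characters.
        cut = rest.rfind(" ", 0, width + 1)
        if cut <= 0:
            # No usable blank: hard break after `width` characters.
            lines.append(rest[:width])
            rest = rest[width:]
        else:
            lines.append(rest[:cut])
            rest = rest[cut + 1:]  # drop the blank that was replaced by the break
    lines.append(rest)
    return "\n".join(lines)

print(wrap_with_tolerance("Nothing about us without us"))
print(wrap_with_tolerance("Aim for the moon. If you miss, you may hit a star."))
```

Dropping the blank at each break avoids the leading space that a pure "match up to a lookahead blank" substitution leaves at the start of every wrapped line.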

Regex match text after last '-'

I am really stuck with the following regex problem:
I want to remove the last piece of a string, but only if the '-' occurs more than once in the string.
Example:
BOL-83846-M/L -> Should match -M/L and remove it
B0L-026O1 -> Should not match
D&F-176954 -> Should not match
BOL-04134-58/60 -> Should match -58/60 and remove it
BOL-5068-4 - 6 jaar -> Should match -4 - 6 jaar and remove it (maybe in multiple search/replace steps)
It would be no problem if the regex needs two (or more) steps to remove it.
Now I have
[^-]*$
But in sublime it matches B0L-026O1 and D&F-176954
Need your help please
You can match the first - in a capture group, and then match the second - till the end of the string to remove it.
In the replacement use capture group 1.
^([^-\n]*-[^-\n]*)-.*$
^ Start of string
( Capture group 1
[^-\n]*-[^-\n]* Match the first - between chars other than - (or a newline if you don't want to cross lines)
) Close group 1
-.*$ Match the second - and the rest of the line
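As a quick check outside the editor, the same pattern and replacement can be tried in Python. This is a small sketch; the function name strip_after_second_hyphen is made up for the example, and each string is assumed to be processed on its own rather than as part of a multi-line buffer.

```python
import re

SECOND_HYPHEN_ONWARD = re.compile(r"^([^-\n]*-[^-\n]*)-.*$")

def strip_after_second_hyphen(code: str) -> str:
    # Keep group 1 (everything before the second hyphen).
    # Strings with fewer than two hyphens do not match and come back unchanged.
    return SECOND_HYPHEN_ONWARD.sub(r"\1", code)

for code in ["BOL-83846-M/L", "B0L-026O1", "D&F-176954",
             "BOL-04134-58/60", "BOL-5068-4 - 6 jaar"]:
    print(code, "->", strip_after_second_hyphen(code))
```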
You can match the following regular expression.
^[^-\r\n]*(?:$|-[^-\r\n]*(?=-|$))
If the string contains two or more hyphens this returns the beginning of the string up to, but not including, the second hyphen; else it returns the entire string.
The regular expression can be broken down as follows.
^ # match the beginning of the string
[^-\r\n]* # match zero or more characters other than hyphens,
# carriage returns and linefeeds
(?: # begin a non-capture group
$ # match the end of the string
| # or
- # match a hyphen
[^-\r\n]* # match zero or more characters other than hyphens,
# carriage returns and linefeeds
(?= # begin a positive lookahead
- # match a hyphen
| # or
$ # match the end of the string
) # end positive lookahead
) # end non-capture group
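Because this expression matches the part to keep rather than the part to remove, it is used differently from a search/replace. A minimal Python sketch, assuming one string is handled at a time (the name keep_up_to_second_hyphen is invented for the example):

```python
import re

UP_TO_SECOND_HYPHEN = re.compile(r"^[^-\r\n]*(?:$|-[^-\r\n]*(?=-|$))")

def keep_up_to_second_hyphen(code: str) -> str:
    # The match itself is the text to keep: everything before the second
    # hyphen, or the whole string when there are fewer than two hyphens.
    match = UP_TO_SECOND_HYPHEN.match(code)
    return match.group(0) if match else code

print(keep_up_to_second_hyphen("BOL-04134-58/60"))
print(keep_up_to_second_hyphen("D&F-176954"))
```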

I'm trying to delete everything but the email but I get the opposite

I tried to search in other questions here but none seem to work for me
I'm using Notepad++ and I'm trying to remove everything but the emails from an email list where everything is on the same line.
Example:
| RONNAN FERREIRA | RENANRFCRON#GMAIL.COM 17933 | RONNE YAN CANAVARRO DE ASSIS |
This regex seems to select every email perfectly:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
When I put $1 or \1 in "Replace with", it just deletes all the emails; I want it to do the opposite.
Ctrl+H
Find what: \G.*?(\S+#\S+)(?:(?=.*#)|.*$)
Replace with: $1 <-- there is a space after $1, you may add any character you want as delimiter
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
\G # restart from last match position
.*? # 0 or more any character, not greedy
(\S+#\S+) # group 1, an email
(?: # non capture group
(?=.*#) # positive lookahead, check if we have an # after
| # OR
.*$ # 0 or more any character until end of line
) # end group
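Python's re module has no \G anchor, so the replace above does not carry over directly. A rough equivalent, offered only as a sketch, is to collect every match of the email part (\S+#\S+, group 1 above) and join the results. The function name extract_emails and the delimiter parameter are made up for the example, and the # separator follows the sample data.

```python
import re

def extract_emails(line: str, delimiter: str = " ") -> str:
    # \S+#\S+ is the "email" part of the pattern above:
    # non-blank characters on both sides of the # separator.
    return delimiter.join(re.findall(r"\S+#\S+", line))

line = "| RONNAN FERREIRA | RENANRFCRON#GMAIL.COM 17933 | RONNE YAN CANAVARRO DE ASSIS |"
print(extract_emails(line))
```

Collecting matches sidesteps the "delete everything that is not an email" formulation, which is the awkward part of doing this with a single replace.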
Another option might be to either capture an email format in a capturing group, or match all parts divided by 1 or more horizontal whitespace chars that do not contain an #.
To match only the spaces after an email except for the last one, you could make use of a conditional which has the form (?(?=regex)then|else)
In the replacement use $1
([^\h#]+#[^\h#]+(?(?=[^#\r\n]*#)\h+))|\h*(?<!\S)[^\h#]+(?:\h+[^\h#]+)*(?!\S)\h*
In parts:
( Capture group 1
[^\h#]+#[^\h#]+ Match an e-mail like format
(?(?=[^#\r\n]*#)\h+) Conditional, match 1+ horizontal whitespace chars if there is another email
) Close group
| Or
\h* Match 0+ horizontal whitespace chars
(?<!\S) Assert what is on the left is not a non whitespace char
[^\h#]+(?:\h+[^\h#]+)* Repeat matching the parts that don't have an #
(?!\S) Assert what is on the right is not a non whitespace char
\h* Match 0+ horizontal whitespace chars
A test for the example content with multiple email addresses
| RONNAN FERREIRA | RENANRFCRON#GMAIL.COM 17933 | RONNE YAN CANAVARRO DE ASSIS | RENANRFCRON#GMAIL.COM 17933 |

Notepad++ replace with spaces

Using a regex in Notepad++ I am trying to replace 53 characters on a line with spaces:
Find: (^RS.{192})(.{53})(.{265})
Replace: \1(\x20){53}\3
It's replacing group \2 with " {53}" but what I want is 53 spaces.
How do you do this?
Replacement terms are not regex expressions, except they may use back references.
Just code 53 literal spaces:
Replace: \1                                                     \3 (that is, 53 space characters between \1 and \3)
A bit tedious, but it works.
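In a scripting language the replacement can be computed instead of typed out. A short Python sketch using the same three groups (the name blank_field is made up for the example; lines that do not fit the RS + 192 + 53 + 265 layout are returned unchanged):

```python
import re

FIELD = re.compile(r"(^RS.{192})(.{53})(.{265})")

def blank_field(line: str) -> str:
    # A function replacement lets the spaces be generated in code:
    # keep groups 1 and 3, and swap group 2 for as many spaces as it is long.
    return FIELD.sub(
        lambda m: m.group(1) + " " * len(m.group(2)) + m.group(3),
        line,
    )

record = "RS" + "a" * 192 + "b" * 53 + "c" * 265
print(len(blank_field(record)) == len(record))
```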
space is \s
which means you need to use \s{53}
Assuming there is ALWAYS RS and 192 characters before and 265 after
Ctrl+H
Find what: (?:^RS.{192}|\G)\K.(?=.{265,}$)
Replace with: # a space
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # start non capture group
^ # beginning of line
RS # literally RS
.{192} # 192 any character
| # OR
\G # restart from last match position
) # end group
\K # forget all we've seen until this position
. # 1 any character
(?= # positive lookahead, zero-length assertion to make sure we have after:
.{265,} # at least 265 any characters
$ # end of line
) # end lookahead
Replacement:
% # the character to insert
Given a shorter line to illustrate:
RSabcdefghijklmnopqrstuvwxyz
Result for given example:
RSabcdefghij qrstuvwxyz

Find an item in the text with exceptions[Regular Expression]

Please help me create a regular expression that matches the "|" character everywhere except inside parentheses.
example|example (example(example))|example|example|example(example|example|example(example|example))|example
The match should select the 5 "|" characters that are outside the parentheses. Note that the contents within the parentheses should remain unchanged, including the "|" characters within them.
Considering you want to match pipes that are outside any set of parentheses, with nested sets, here's the pattern to achieve what you want:
Regex:
(?x) # Allow comments in regex (ignore whitespace)
(?: # Repeat *
[^(|)]*+ # Match every char except ( ) or |
( # 1. Group 1
\( # Opening paren
(?: # chars inside:
[^()]++ # a. everything inside parens except nested parens
| # or
(?1) # b. nested parens (recurse group 1)
)*+ # (repeat a. or b. any number of times)
\) # Until closing paren.
)?+ # (end of group 1)
)*+ #
\K # Keep text out of match
\| # Match a pipe
One-liner:
(?:[^(|)]*+(\((?:[^()]++|(?1))*+\))?+)*+\K\|
This pattern uses some advanced features:
Possessive quantifiers
Recursion
Resetting the match start
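Python's built-in re module supports neither recursion nor \K, so this pattern cannot be used there as written. The same idea (only count a pipe when no parenthesis is open) can be sketched with a plain depth counter instead. This is a different technique from the regex above, and the name top_level_pipes is invented for the example.

```python
def top_level_pipes(text: str) -> list[int]:
    """Return the indexes of every "|" that is not inside parentheses."""
    positions = []
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth = max(depth - 1, 0)  # tolerate a stray closing paren
        elif char == "|" and depth == 0:
            positions.append(index)
    return positions

sample = ("example|example (example(example))|example|example|"
          "example(example|example|example(example|example))|example")
print(len(top_level_pipes(sample)))  # 5
```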