I want an expression that accepts a number containing one dash OR a number containing one space. The space or dash is optional.
I tried this:
/^([0-9]+(-[0-9]+)?)|([0-9]+(\s[0-9]+)?)$/
Strings that should be accepted:
11-222
444 99
You can put the OR in the middle of your expression: ^([0-9]+)(\s|-)([0-9]+)$ works with your examples in Notepad++.
Let's explain your regex.
^ # beginning of line
( # start group 1
[0-9]+ # 1 or more digits
( # start group 2
- # a hyphen
[0-9]+ # 1 or more digits
)? # end group 2, optional
) # end group 1
| # OR
( # start group 3
[0-9]+ # 1 or more digits
( # start group 4
\s # a space
[0-9]+ # 1 or more digits
)? # end group 4, optional
) # end group 3
$ # end of line
The OR applies to everything on each side of it: either group 1 anchored at the beginning of the line, or group 3 anchored at the end of the line. But you want both group 1 and group 3 anchored at the beginning and at the end.
Add a group around groups 1 and 3:
^(([0-9]+(-[0-9]+)?)|([0-9]+(\s[0-9]+)?))$
You can use non-capturing groups (more efficient) instead of capturing groups:
^(?:(?:[0-9]+(?:-[0-9]+)?)|(?:[0-9]+(?:\s[0-9]+)?))$
Combine the hyphen and the space in a character class and remove the superfluous groups:
^[0-9]+(?:[-\s][0-9]+)?$
If your regex flavour supports it, change the [0-9] into \d. Finally your regex becomes:
^\d+(?:[-\s]\d+)?$
Much simpler, no?
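To see the anchoring problem in practice, here is a small sketch in Python (the thread itself uses Notepad++, so the choice of language and the extra sample strings are assumptions made for illustration). It runs the pattern from the question and the simplified pattern against a few inputs.

```python
import re

# Pattern from the question: the top-level | splits the two anchors apart,
# so "^..." and "...$" are tried as independent alternatives.
broken = re.compile(r"^([0-9]+(-[0-9]+)?)|([0-9]+(\s[0-9]+)?)$")

# Simplified pattern from the end of the answer: both anchors always apply.
fixed = re.compile(r"^\d+(?:[-\s]\d+)?$")

# The first two samples come from the question; the rest are made up.
samples = ["11-222", "444 99", "123", "11-222abc", "abc 99", "11-22 33"]

# Map each sample to (matched by broken pattern, matched by fixed pattern).
results = {
    s: (bool(broken.search(s)), bool(fixed.search(s)))
    for s in samples
}

for s, (b, f) in results.items():
    print(f"{s!r}: broken={b} fixed={f}")
```

The broken pattern accepts strings such as "11-222abc" and "abc 99" because only one anchor is enforced per alternative, while the fixed pattern rejects them.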
I have a snippet of text from EDI X12. I am trying to find lines where a BBQ segment is followed by another BBQ segment. I want to replace all BBQ segments in the second line with BBB.
Original text:
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
Needs to become:
HI*BBR<0Y6D0Z1<D8<20190816~
HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~
HI*BBB<0JBL0ZZ<D8<20190809*BBB<0J9N0ZZ<D8<20190816*BBB<0KBS0ZZ<D8<20190816~
HI*BI<71<RD8<20190716-20190722~
This targets what I am looking for in capturing group 3, but how do I replace BBQ with BBB within that group?
(^HI\*BBQ.+?~\r\n)(^HI\*)(BBQ.+?~\r\n)
Thanks for any ideas!
Ctrl+H
Find what: (?:^HI\*BBQ\b.+?~\RHI\*BB|\G(?!^).*?\bBB)\KQ\b
Replace with: B
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
^ # beginning of line
HI\*BBQ\b # literally HI*BBQ, followed by a word boundary
.+? # 1 or more any character but newline, not greedy
~ # a tilde
\R # any kind of linebreak
HI\*BB # literally
| # OR
\G # restart from last match position
(?!^) # not at the beginning of line
.*?\bBB # 0 or more any character but newline, not greedy, followed by a word boundary and BB
) # end group
\K # forget all we have seen until this position
Q\b # the letter Q, followed by a word boundary
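Outside Notepad++, the same goal can be reached without \G and \K, which Python's standard re module does not support. The sketch below is only one possible approach, not a translation of the regex above: it walks the lines and rewrites a line when both it and the previous line start with HI*BBQ. The function name is hypothetical, and it assumes the comparison is made against the original text, so in a longer run of BBQ lines every line after the first is rewritten.

```python
import re

def retag_following_bbq(text: str) -> str:
    """Replace BBQ with BBB on every HI*BBQ line that directly follows
    another HI*BBQ line (judged on the original, unmodified text)."""
    out = []
    prev_is_bbq = False
    for line in text.splitlines(keepends=True):
        is_bbq = line.startswith("HI*BBQ")
        if is_bbq and prev_is_bbq:
            # \b keeps the match to the whole qualifier BBQ.
            line = re.sub(r"\bBBQ\b", "BBB", line)
        out.append(line)
        prev_is_bbq = is_bbq
    return "".join(out)

# Sample data copied from the question.
original = (
    "HI*BBR<0Y6D0Z1<D8<20190816~\n"
    "HI*BBQ<05BC0ZZ<D8<20190806*BBQ<05BB0ZZ<D8<20190729*BBQ<06UM07Z<D8<20190729~\n"
    "HI*BBQ<0JBL0ZZ<D8<20190809*BBQ<0J9N0ZZ<D8<20190816*BBQ<0KBS0ZZ<D8<20190816~\n"
    "HI*BI<71<RD8<20190716-20190722~\n"
)

converted = retag_following_bbq(original)
print(converted)
```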
My file has 4000k lines and I need to reformat it, so I am trying Notepad++ (or awk). The structure of every line is:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein Tabulator[Human immunodeficiency virus 1]TabulatorTLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGGIGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN
The text between the 4th vertical bar | and the first [ has variable length. I am only looking for tips or where to focus so I can do it myself. I tried printing with awk, but because one part is variable in length, I got different results. I can't select by columns either.
I would like to obtain a file with this structure:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,pol protein
and another file with this structure:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324TabulatorTLWQRPFVTIKVGGQLKEALLDTGADDTVLEEIELPGRWKPKMIGGIGGFIKVRQYDQIXVEICGHKAIGTVLVGPTPVNVIGRNLMTQIGCTLN
TAB characters are shown as the word Tabulator.
Here is a way to do it for the first file.
Ctrl+H
Find what: (^[^|]+(?:\|[^|]+){4})\|(.+?)\h+\[.+$
Replace with: $1,$1,$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # group 1
^ # beginning of line
[^|]+ # 1 or more non pipe
(?: # start non capture group
\| # a pipe
[^|]+ # 1 or more non pipe
){4} # end group, must appear 4 times
) # end group 1
\| # a pipe
(.+?) # group 2, 1 or more any character but newline, not greedy
\h+ # 1 or more horizontal spaces (space or tabulation)
\[ # 1 opening square bracket
.+ # 1 or more any character but newline
$ # end of line
Replacement:
$1 # content of group 1
, # a comma
$1 # content of group 1
, # a comma
$2 # content of group 2
Result for given example:
acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,acc|GENBANK|ABJ91977.1|GENBANK|DQ876324,pol protein
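For anyone who prefers a script over Notepad++, the same substitution can be sketched in Python. Two things differ from the answer and are assumptions of this sketch: Python's re module has no \h, so [ \t] is used instead, and the sample line has a shortened sequence with a real TAB in place of the word Tabulator.

```python
import re

# Same structure as the Notepad++ pattern, with \h replaced by [ \t].
FIRST_FILE = re.compile(r"^([^|]+(?:\|[^|]+){4})\|(.+?)[ \t]+\[.+$")

def to_first_format(line: str) -> str:
    """Return 'ids,ids,description' for one input line."""
    # \1 is the first five pipe-separated fields, \2 the description.
    return FIRST_FILE.sub(r"\1,\1,\2", line)

# Shortened sample line based on the question.
sample = (
    "acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein"
    "\t[Human immunodeficiency virus 1]\tTLWQRPFVTIKVGGQ"
)

first_result = to_first_format(sample)
print(first_result)
```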
For the second file:
Ctrl+H
Find what: (^[^|]+(?:\|[^|]+){4})\|.+?\h+\[.+?\](.+)$
Replace with: $1$2
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
( # group 1
^ # beginning of line
[^|]+ # 1 or more non pipe
(?: # start non capture group
\| # a pipe
[^|]+ # 1 or more non pipe
){4} # end group, must appear 4 times
) # end group 1
\| # a pipe
.+? # 1 or more any character but newline, not greedy
\h+ # 1 or more horizontal spaces (space or tabulation)
\[ # 1 opening square bracket
.+? # 1 or more any character but newline, not greedy
\] # a closing square bracket
(.+) # group 2, 1 or more any character but newline
$ # end of line
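Since the file is large, a script that streams lines may be more comfortable than an editor. The following Python sketch applies the second pattern line by line through a generator. The function name and the shortened sample line are hypothetical, and [ \t] stands in for \h, which Python's re module does not support.

```python
import re
from typing import Iterable, Iterator

# Second pattern from the answer, with \h replaced by [ \t].
SECOND_FILE = re.compile(r"^([^|]+(?:\|[^|]+){4})\|.+?[ \t]+\[.+?\](.+)$")

def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line reduced to 'ids<TAB>sequence'.

    Lines that do not match the pattern are passed through unchanged.
    """
    for line in lines:
        # \1 is the first five fields, \2 everything after the closing ].
        yield SECOND_FILE.sub(r"\1\2", line)

# Shortened sample line based on the question, with real TAB characters.
sample_lines = [
    "acc|GENBANK|ABJ91977.1|GENBANK|DQ876324|pol protein"
    "\t[Human immunodeficiency virus 1]\tTLWQRPFVTIKVGGQ\n",
]

second_result = list(convert_lines(sample_lines))
print(second_result[0], end="")
```

Because convert_lines accepts any iterable of strings, an open file object could be passed in directly so the whole file never has to sit in memory.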
I have the following line:
Data 5 in:out:40 Files
I want to match everything up to the 3rd whitespace, so in this case I want to get back:
Data 5 in:out:40
How about:
^(\S+\s+\S+\s+\S+)
Let's break this down:
^ # start from string beginning
( # start of capture group
\S+ # 1 or more non-whitespace characters
\s+ # 1 or more whitespace characters
\S+ # 1 or more non-whitespace characters
\s+ # 1 or more whitespace characters
\S+ # 1 or more non-whitespace characters
) # end of capture group
You can test the regex in a debugger.
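As a quick check outside a debugger, the pattern can be tried in Python. This is a minimal sketch that uses the line from the question.

```python
import re

line = "Data 5 in:out:40 Files"

# Three runs of non-whitespace separated by whitespace, anchored at the start.
match = re.match(r"^(\S+\s+\S+\s+\S+)", line)

# group(1) holds everything up to (not including) the third whitespace run.
first_three = match.group(1) if match else None
print(first_three)
```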
I'm having trouble coming up with the regex I need to do this find/replace in Notepad++. I'm fine with needing a couple of separate searches to complete the process.
Basically I need to add a | at the beginning and end of every line from a CSV, plus replace all the , with |. Then, on any value with only 1 character, I need to put two spaces around the character on each side ("A" becomes " A ")
Source:
col1,col2,col3,col4,col5,col6
name,desc,something,else,here,too
another,,three,,,
single,characters,here,a,b,c
last,line,here,,almost,
Results:
|col1|col2|col3|col4|col5|col6|
|name|desc|something|else|here|too|
|another||three||||
|single|characters|here| a | b | c |
|last|line|here||almost||
Adding the | to the beginning and the end of the line is simple enough, and replacing , with | is obviously straightforward. But I can't come up with the regex to find |x| where x is limited to a single character. I'm sure it is simple, but I'm new to regex.
Regex:
(?:(^)|(?!^)\G)(?:([^\r\n,]{2,})|([^\r\n,]))?(?:(,$)|(,)|($))
Replacement string:
(?{1}|)(?{2}\2)(?{3} \3 )(?{4}||)(?{5}|)(?{6}|)
Ugly, dirty and long but works.
Regex Explanation:
(?: # Start of non-capturing group (a)
(^) # Assert beginning of line (CP #1)
| # Or
(?!^) # Not at the beginning of a line
\G # Match at previous matched position
) # End of non-capturing group (a)
(?: # Start of non-capturing group (b)
([^\r\n,]{2,}) # Match a value of 2 or more characters (any except \r, \n or `,`) (CP #2)
| # Or
([^\r\n,]) # Match one-char string (CP #3)
)? # Optional - End of non-capturing group (b)
(?: # Start of non-capturing group (c)
(,$) # Match `,$` (CP #4)
| # Or
(,) # Match single comma (CP #5)
| # Or
($) # Assert end of line (CP #6)
) # End of non-capturing group (c)
Three Step Solution:
Pattern: ^.+$ Replacement: |$0|
Pattern: , Replacement: |
Pattern: (?<=\|)([^|\r\n])(?=\|) Replacement: $0 with spaces added on both sides (for example " $0 ")
The first replace adds | at the beginning and at the end, and replaces commas:
Search: ^|$|,
Replace: |
The second replace adds space around single character matches:
Search: (?<=[|])([^|])(?=[|])
Replace: $1
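Add spaces to the left and to the right of $1. If the file uses Unix (\n) line endings, use [^|\r\n] instead of [^|] so that the line break between a closing | and the next opening | is not matched.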
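The three-step approach translates directly to a script. Here is a sketch in Python, assuming \n line endings and one space on each side of a single-character value (as in the expected results in the question); the function name is hypothetical.

```python
import re

def csv_to_piped(text: str) -> str:
    """Wrap each non-empty line in |, turn commas into |,
    and pad single-character values with a space on each side."""
    # Step 1: add | at the beginning and end of every non-empty line.
    text = re.sub(r"^.+$", r"|\g<0>|", text, flags=re.MULTILINE)
    # Step 2: replace every comma with |.
    text = text.replace(",", "|")
    # Step 3: pad values that are exactly one character long.
    # Excluding \r and \n keeps line breaks from counting as values.
    return re.sub(r"(?<=\|)([^|\r\n])(?=\|)", r" \1 ", text)

# Source data copied from the question.
source = (
    "col1,col2,col3,col4,col5,col6\n"
    "name,desc,something,else,here,too\n"
    "another,,three,,,\n"
    "single,characters,here,a,b,c\n"
    "last,line,here,,almost,\n"
)

piped = csv_to_piped(source)
print(piped)
```

The lookbehind and lookahead do not consume the | characters, which is why adjacent single-character values such as a, b and c are all padded in one pass.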