Pattern regex substitution in Notepad++ [closed] - regex

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
How do I achieve this kind of regex substitution in Notepad++ and in a Linux/Unix Korn shell (plain BSD Linux)?
z1.9z.01.01 Yabdadba do
da.8p.25.7p Foobar
tg.7j.75.2q Whatever
90.6q.88.zx Jane Doe
Note the char. I am not sure what you want to call it.
Substitution #1
o/p should be
Yabdadba do
Foobar
Whatever
Jane Doe
Substitution #2
o/p should be
9z Yabdadba do
8p Foobar
7j Whatever
6q Jane Doe
Substitution #3
o/p should be
z1.9z.01.01
da.8p.25.7p
tg.7j.75.2q
90.6q.88.zx
I tried using ^.* and $ with the regex option, but it won't do anything.

Assuming the parts are fixed and of the form XX.XX.XX.XX:
For Substitution # 1
Find (?m)^[^.\s]{2}(?:\.[^.\s]{2}){3}[^\S\r\n]+(?=\S.*)
Replace nothing
(?m) # Multi-line mode
^ # BOL
[^.\s]{2} # Four parts separated by dots
(?: \. [^.\s]{2} ){3}
[^\S\r\n]+ # Whitespace following
(?= \S .* ) # Must be some text here
For Substitution # 2
Find (?m)^[^.\s]{2}\.([^.\s]{2})(?:\.[^.\s]{2}){2}(?=[^\S\r\n]+\S.*)
Replace ' $1 '
(?m) # Multi-line mode
^ # BOL
[^.\s]{2} # Four parts separated by dots
\.
( [^.\s]{2} ) # (1)
(?: \. [^.\s]{2} ){2}
(?= # Whitespace following
[^\S\r\n]+
\S .* # Must be some text here
)
For Substitution # 3
Find (?m)^([^.\s]{2}(?:\.[^.\s]{2}){3})[^\S\r\n]+\S.*
Replace $1
(?m) # Multi-line mode
^ # BOL
( # (1 start), Four parts separated by dot's
[^.\s]{2}
(?: \. [^.\s]{2} ){3}
) # (1 end)
[^\S\r\n]+ # Whitespace following
\S .* # Must be some text here
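To try the three patterns outside Notepad++, here is a minimal Python sketch using the standard re module (a stand-in for illustration, not part of the answer). It assumes a single space separates the dotted code from the name, since the question's separator character is unclear, and writes the $1 backreference as Python's \1. For Substitution #2 it replaces with a bare \1, so the whitespace left after the match supplies the space.

```python
import re
from typing import Tuple

# Sample lines from the question. Assumption: one space separates the dotted
# code from the name (the question's separator character is unclear).
SAMPLE = """z1.9z.01.01 Yabdadba do
da.8p.25.7p Foobar
tg.7j.75.2q Whatever
90.6q.88.zx Jane Doe"""

# Substitution #1: drop the dotted code and the whitespace after it.
SUB1 = r"(?m)^[^.\s]{2}(?:\.[^.\s]{2}){3}[^\S\r\n]+(?=\S.*)"
# Substitution #2: keep only the second dotted part (group 1); the whitespace
# and name after it stay in place because they sit in a lookahead.
SUB2 = r"(?m)^[^.\s]{2}\.([^.\s]{2})(?:\.[^.\s]{2}){2}(?=[^\S\r\n]+\S.*)"
# Substitution #3: keep only the dotted code (group 1), dropping the rest.
SUB3 = r"(?m)^([^.\s]{2}(?:\.[^.\s]{2}){3})[^\S\r\n]+\S.*"


def substitute(text: str) -> Tuple[str, str, str]:
    # Python spells the $1 backreference as \1 in the replacement string.
    return (
        re.sub(SUB1, "", text),
        re.sub(SUB2, r"\1", text),
        re.sub(SUB3, r"\1", text),
    )


for result in substitute(SAMPLE):
    print(result, end="\n\n")
```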

^([a-z0-9]+?[.]([a-z0-9]+?)[.][a-z0-9]+?[.][a-z0-9]+?)[ ]+(.+)$
Capture group 1 contains the dotted strings
Capture group 2 contains the second term of the dotted strings
Capture group 3 contains the names on the right side.
You can try it in an online regex tester.
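A quick way to check which group holds what is Python's re module; the sketch below is an illustration under that assumption. It closes group 1 after the fourth dotted part, so group 1 really spans the whole dotted string, and returns the three groups for one line. The helper name parts is hypothetical.

```python
import re
from typing import Optional, Tuple

# The answer's pattern with group 1 closed after the fourth dotted part:
#   group 1 = whole dotted string, group 2 = its second term, group 3 = name.
LINE = re.compile(
    r"^([a-z0-9]+?[.]([a-z0-9]+?)[.][a-z0-9]+?[.][a-z0-9]+?)[ ]+(.+)$"
)


def parts(line: str) -> Optional[Tuple[str, str, str]]:
    # Hypothetical helper: return the three groups, or None if the line
    # does not have the dotted-code-then-name shape.
    m = LINE.match(line)
    return m.groups() if m else None


print(parts("z1.9z.01.01 Yabdadba do"))
```

For the three substitutions you would then output group 3, group 2 followed by group 3, and group 1 respectively.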

Since you mentioned Unix shell:
cut -f2 yourfile or awk -F'\t' '{print $2}' yourfile
awk -F"[\t.]" '{print $2, $5}' yourfile
cut -f1 yourfile or awk '{print $1}' yourfile
cut selects fields from files (tab-separated by default), so your first and last questions ask it to select the second and first fields. awk is more versatile but can be used for the same task; -F'\t' makes it split on tabs only, so a name such as "Yabdadba do" stays in one field.
Your second question asks for printing the second and fifth fields (fields separated by either a tab or ".").
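The field logic of cut and awk can be mirrored in Python for a quick check. This sketch assumes the file is tab-separated, which is what cut -f relies on by default; the helper names fields and code_and_name are hypothetical. Note the 0-based indexes: awk's $2 and $5 become parts[1] and parts[4].

```python
import re
from typing import List

# Assumption: the file is tab-separated, as cut -f expects by default.
LINE = "z1.9z.01.01\tYabdadba do"


def fields(line: str, sep: str = "\t") -> List[str]:
    # Like cut -f with a one-character delimiter: split on that character only.
    return line.split(sep)


def code_and_name(line: str) -> str:
    # Like awk -F"[\t.]" '{print $2, $5}': split on tab or dot, then join
    # awk fields 2 and 5 (0-based indexes 1 and 4) with a space.
    parts = re.split(r"[\t.]", line)
    return f"{parts[1]} {parts[4]}"


print(fields(LINE)[1])      # cut -f2
print(code_and_name(LINE))  # awk -F"[\t.]" '{print $2, $5}'
print(fields(LINE)[0])      # cut -f1
print(LINE.split()[1])      # default whitespace split: only "Yabdadba"
```

The last line shows why plain awk '{print $2}' is risky here: splitting on any whitespace cuts the name in half.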

For notepad++ :
Substitution # 1
find = ^.*?\s+(.*?)$
replace = \1
Substitution # 2
find = ^(\w{2})\.(\w{2})\.(\w{2})\.(\w{2})\s+(.*?)$
replace = \2 \5
Substitution # 3
find = ^([a-z0-9.]+).*?$
replace = \1

Related

How to parse the following date using grep command in bash

Given a date in a JSON file as "ts":"2021-04-23T13:11:57Z" or "2021-05-05T07:22:54+05:00", I want to extract the string using grep.
I need help forming the regex for the last part, i.e. the time zone.
My current command goes like this:
grep -Po '"ts":"\K([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-2][0-9]:[0-5][0-9]:[0-5][0-9]+Z
This works fine for the first format. How do I modify it so that it works for both formats?
With your shown samples, using GNU grep's PCRE option (-P), you could try the following regex to match both timestamp formats.
grep -oP '(?:"ts":)?"\d{4}-\d{2}-\d{2}T(?:[01][0-9]|2[0-3]):(?:[0-4][0-9]|5[0-9])[+:](?:[0-4][0-9]|5[0-9])(?:Z"|\+(?:[0-4][0-9]|5[0-9]):(?:[0-4][0-9]|5[0-9])")' Input_file
Explanation: a detailed explanation of the above.
(?:"ts":)? ##In a non-capturing group matching "ts": keeping it optional here.
"\d{4}-\d{2}-\d{2}T ##Matching " followed by 4 digits-2digits-2digits T here.
(?: ##Starting 1st non-capturing group here.
[01][0-9]|2[0-3] ##Matching 00 to 19 and 20 to 23 here to cover 24 hours.
): ##Closing 1st non-capturing group followed by colon here.
(?: ##Starting 2nd non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 for mins here.
) ##Closing 2nd non-capturing group here.
[+:] ##Matching either + or : here.
(?: ##Starting 3rd non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 for seconds here.
) ##Closing 3rd non-capturing group here.
(?: ##Starting 4th non-capturing group here.
Z"|\+ ##Matching Z" OR +(literal character) here.
(?: ##Starting non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 here.
) ##Closing non-capturing group here.
: ##Matching colon here.
(?: ##Starting non-capturing group here.
[0-4][0-9]|5[0-9] ##Matching 00 to 59 here.
)" ##Closing non-capturing group here, followed by "
) ##Closing 4th non-capturing group here.
You can use the following to parse either time string from the line. You will need to isolate the line beginning with "ts": first. For example, the following grep expression will extract the string:
grep -Po '[0-9+TZ:-]{2,}'
This simply extracts runs of at least two ({2,}) characters from the set [0-9+TZ:-].
Example Use
$ echo '"ts":"2021-04-23T13:11:57Z"' | grep -Po '[0-9+TZ:-]{2,}'
2021-04-23T13:11:57Z
and
$ echo '"ts":"2021-05-05T07:22:54+05:00"' | grep -Po '[0-9+TZ:-]{2,}'
2021-05-05T07:22:54+05:00
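For comparison, here is the same character-class extraction with Python's re module, an illustrative stand-in for grep -Po. The function name extract_stamps is hypothetical.

```python
import re
from typing import List

# Runs of two or more characters taken from: digits, '+', 'T', 'Z', ':', '-'.
# '-' is last in the class so it is a literal, as in the grep command.
STAMP_CHARS = re.compile(r"[0-9+TZ:-]{2,}")


def extract_stamps(line: str) -> List[str]:
    # Like grep -o: return every non-overlapping run found in the line.
    return STAMP_CHARS.findall(line)


print(extract_stamps('"ts":"2021-04-23T13:11:57Z"'))
print(extract_stamps('"ts":"2021-05-05T07:22:54+05:00"'))
```

The class is loose on purpose, which is why the line has to be isolated first: extract_stamps('"id":12345') returns [':12345'].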
The normal caveats apply: you are better served by a JSON-aware utility like jq. That said, you can separate the values with grep, but you must take care in isolating the line.
You can use sed to isolate the line using the normal /match/s/find/replace/ form with a capture group and backreference. For example you can use:
sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
Which matches the line beginning with ^[[:blank:]]*"ts" before extraction and the -n suppresses the normal printing of pattern-space so that only the wanted text is output, e.g.
Example Use
$ echo '"ts":"2021-04-23T13:11:57Z"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-04-23T13:11:57Z
and
$ echo '"ts":"2021-05-05T07:22:54+05:00"' | sed -En '/^[[:blank:]]*"ts"/s/^.*"([0-9+TZ:-]+)"$/\1/p'
2021-05-05T07:22:54+05:00
For such a specific string, another option with a bit broader match could be
grep -Po '(?:"ts":)?"\K\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d)(?=")' file
Explanation
(?:"ts":)? Optionally match "ts":
"\K Match " and clear the match buffer (forget what is matched so far)
\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d Match a date time like pattern with a T char in between
(?: Non-capturing group
Z Match a Z char
| Or
[+-]\d\d:\d\d Match + or - and 2 digits : 2 digits
) Close non-capturing group
(?=") Positive lookahead, assert " directly to the right
Output
2021-04-23T13:11:57Z
2021-05-05T07:22:54+05:00
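Python's re module has no \K, so a sketch of the same pattern there captures the timestamp in a group and lets findall return just that group. re.ASCII restricts \d to the digits 0-9; the function name timestamps is hypothetical.

```python
import re
from typing import List

# Same shape as the grep -P pattern, but Python's re has no \K, so the
# timestamp itself is captured in group 1 instead.
TS = re.compile(
    r'(?:"ts":)?"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d(?:Z|[+-]\d\d:\d\d))(?=")',
    re.ASCII,
)


def timestamps(text: str) -> List[str]:
    # With exactly one group, findall returns that group's text per match.
    return TS.findall(text)


sample = '"ts":"2021-04-23T13:11:57Z"\n"ts":"2021-05-05T07:22:54+05:00"'
print(timestamps(sample))
```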
Or using -E for extended regular expressions (which will include the outer double quotes)
grep -Eo '("ts":)?"[0-9]{4}-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9](Z|[+-][0-9][0-9]:[0-9][0-9])"' ./file

Using regex to extract text inside two characters

I'm trying to extract some text from a set of strings. The strings come in three forms:
X | A | Y
A | Y
A
where A is the text I want to extract. I've tried using (?:\|)(.*?)(?:\|), which only works for the first case. I have been trying to combine several options I've seen in other questions, but with no luck so far: if I match one case, the other cases aren't matched.
If I understand you correctly, try:
(?:.*?\|([^\|]+)\|.*?)|(^[^\|]+)
The result will be in either capturing group 1 or group 2
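To see which group ends up holding A, here is a small Python sketch of this pattern (Python's re is an assumption here; the question doesn't name an engine). It uses search, takes whichever group participated, and strips the surrounding spaces. The helper name extract is hypothetical.

```python
import re
from typing import Optional

# Alternative 1: text between two pipes (group 1).
# Alternative 2: text from the start up to the first pipe (group 2).
PATTERN = re.compile(r"(?:.*?\|([^\|]+)\|.*?)|(^[^\|]+)")


def extract(s: str) -> Optional[str]:
    # Hypothetical helper: take whichever group matched, minus spaces.
    m = PATTERN.search(s)
    if m is None:
        return None
    value = m.group(1) if m.group(1) is not None else m.group(2)
    return value.strip()


for s in ["X | A | Y", "A | Y", "A"]:
    print(extract(s))
```

For "A | Y" the first alternative fails (it needs two pipes), so the engine falls back to the second one at the start of the string.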
I think this will work (?:^|(?<=\|))\s*(A)\s*(?:(?=\|)|$)
It finds the A substring in capture group 1
This is definitely a case where you need assertions.
I don't think it will work without them.
Explained:
(?:
^ # BOS
| # or,
(?<= \| ) # | behind
)
\s* # optional wsp trim
( A ) # (1), What you're looking for
\s* # optional wsp trim
(?:
(?= \| ) # | ahead
| # or,
$ # EOS
)

Notepad++ regex to find single character bounded by |

I'm having trouble coming up with the regex I need to do this find/replace in Notepad++. I'm fine with needing a couple of separate searches to complete the process.
Basically, I need to add a | at the beginning and end of every line of a CSV, and replace every , with |. Then, for any value that is only 1 character long, I need to put a space on each side of the character ("A" becomes " A ").
Source:
col1,col2,col3,col4,col5,col6
name,desc,something,else,here,too
another,,three,,,
single,characters,here,a,b,c
last,line,here,,almost,
Results:
|col1|col2|col3|col4|col5|col6|
|name|desc|something|else|here|too|
|another||three||||
|single|characters|here| a | b | c |
|last|line|here||almost||
Adding the | to the beginning and the end of the line is simple enough, and replacing , with | is obviously straightforward. But I can't come up with the regex to find |x| where x is limited to a single character. I'm sure it is simple, but I'm new to regex.
Regex:
(?:(^)|(?!^)\G)(?:([^\r\n,]{2,})|([^\r\n,]))?(?:(,$)|(,)|($))
Replacement string:
(?{1}|)(?{2}\2)(?{3} \3 )(?{4}||)(?{5}|)(?{6}|)
Ugly, dirty and long but works.
Regex Explanation:
(?: # Start of non-capturing group (a)
(^) # Assert beginning of line (CP #1)
| # Or
(?!^) # Not at the beginning of a line
\G # Match at previous matched position
) # End of non-capturing group (a)
(?: # Start of non-capturing group (b)
([^\r\n,]{2,}) # Match a value of two or more characters (any except \r, \n or `,`) (CP #2)
| # Or
([^\r\n,]) # Match one-char string (CP #3)
)? # Optional - End of non-capturing group (b)
(?: # Start of non-capturing group (c)
(,$) # Match `,$` (CP #4)
| # Or
(,) # Match single comma (CP #5)
| # Or
($) # Assert end of line (CP #6)
) # End of non-capturing group (c)
Three Step Solution:
Pattern: ^.+$ Replacement: |$0|
Pattern: , Replacement: |
Pattern: (?<=\|)([^|\r\n])(?=\|) Replacement: a space, then $0, then a space
The first replace adds | at the beginning and at the end, and replaces commas:
Search: ^|$|,
Replace: |
The second replace adds space around single character matches:
Search: (?<=[|])([^|])(?=[|])
Replace: $1
Add spaces to the left and to the right of $1.
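The three-step version can be sketched in Python as an illustrative stand-in for Notepad++. It processes the text line by line so that ^ and $ refer to a single line, and writes step 3's replacement with the spaces made explicit as " \1 ". The function name csv_to_pipes is hypothetical.

```python
import re

# The CSV sample from the question.
SOURCE = """col1,col2,col3,col4,col5,col6
name,desc,something,else,here,too
another,,three,,,
single,characters,here,a,b,c
last,line,here,,almost,"""


def csv_to_pipes(text: str) -> str:
    out = []
    for line in text.splitlines():
        # Step 1: ^.+$ -> |$0| wraps each non-empty line in pipes.
        line = re.sub(r"^.+$", r"|\g<0>|", line)
        # Step 2: every comma becomes a pipe (a plain literal replace).
        line = line.replace(",", "|")
        # Step 3: pad a single character sitting between two pipes.
        line = re.sub(r"(?<=\|)([^|\r\n])(?=\|)", r" \1 ", line)
        out.append(line)
    return "\n".join(out)


print(csv_to_pipes(SOURCE))
```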

RegEx replace all after and all before [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have a file with about 2000 lines whose columns are separated by commas (,).
I need to replace every dot (.) that comes after the 10th comma with a comma. However, dots before the 10th comma on each line must not be replaced.
How can I make it replace all dots after the 10th comma with commas?
Find what:
(^(?:[^,\n]*,){10}[^.\n]*|(?!^)\G[^.\n]*)\.
Replace with:
$1,
Place the cursor at the beginning of the line, then click Replace All.
Explanation
( # Capturing group 1: everything that stays the same
^(?:[^,\n]*,){10}[^.\n]* # From the beginning of the line, skip 10 columns
# (with 10 commas), then skip to the nearest dot
| # OR
(?!^)\G[^.\n]* # Continue from where the last dot matches
# and skip to the nearest dot
)
\. # Literal dot, to be replaced (an unescaped . would match any character)
I would use this regex:
(?:^(?:[^\R,]*,){10}|(?!^)\G)[^\R.]*\K\.
And replace with ,.
Are you sure it's notepad++ v4.6? That version is pretty old and unfortunately, its regex capabilities won't support the above. The above works on v6.1.
(?: # Beginning of non-capture group
^ # Match only at the start of a line
(?: # Beginning of non-capture group
[^\R,]* # Match non-newlines and non-comma characters
, # Match commas
){10} # Close of non-capture group and repeat 10 times
| # OR
(?!^)\G # A \G anchor that is not at the start to match from previous matches
) # Close of non-capture group
[^\R.]* # Match non-newlines and non-dot characters
\K # Reset the matching
\. # Match a dot
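Python's re supports neither \G nor \K, so the patterns above cannot be copied into a quick script as-is. A different approach works there: match each line's first ten columns, then replace dots only in the rest of the line. This is a swapped-in technique, not either answer's regex, and the name dots_after_tenth_comma is hypothetical.

```python
import re

# First ten comma-terminated columns (group 1) and the rest of the line
# (group 2). [^,\n]* cannot cross a comma, so {10} stops at the 10th comma.
TENTH_COMMA = re.compile(r"^((?:[^,\n]*,){10})(.*)$", re.MULTILINE)


def dots_after_tenth_comma(text: str) -> str:
    # Hypothetical helper: keep group 1 as-is, swap dots for commas in
    # group 2. Lines with fewer than ten commas do not match and are kept.
    return TENTH_COMMA.sub(
        lambda m: m.group(1) + m.group(2).replace(".", ","), text
    )


print(dots_after_tenth_comma("a.1,b.2,c,d,e,f,g,h,i,j,k.3.4,l.5"))
```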

regex to match postgresql bytea

In PostgreSQL, there is a BLOB datatype called bytea. It's just an array of bytes.
bytea literals are output in the following way:
'\\037\\213\\010\\010\\005`Us\\000\\0001.fp3\'\\223\\222%'
See PostgreSQL docs for full definition of the format.
I'm trying to construct a Perl regular expression which will match any such string.
It should also match standard ANSI SQL string literals, like 'Joe', 'Joe''s Mom', 'Fish Called ''Wendy'''
It should also match the backslash-escaped variant: 'Joe\'s Mom'.
The first approach (shown below) works only for some bytea representations.
s{ ' # Opening apostrophe
(?: # Start group
[^\\\'] # Anything but a backslash or an apostrophe
| # or
\\ . # Backslash and anything
| # or
\'\' # Double apostrophe
)* # End of group
' # Closing apostrophe
}{LITERAL_REPLACED}xgo;
For other, longer ones with many escaped apostrophes, Perl gives this warning:
Complex regular subexpression recursion limit (32766) exceeded at ./sqa.pl line 33, <> line 1.
So I am looking for a better (but still regex-based) solution, it probably requires some regex alchemy (avoiding backreferences and all).
OK, here's the best solution I could put together, thanks to Leon and hobbs.
Note: This is not the solution I was looking for! It still makes Perl fail with warning "recursion limit (32766) exceeded", for some long strings. (try to stuff 400k random bytes into a bytea field, then export with pg_dump --inserts).
However, it matches most bytea strings (as they appear in SQL code and in server logs), and ANSI SQL string literals. For example:
'\014cS\0059\036a4JEd\021o\005t\0015K7'
'\\037\\213\\010\\010\\005`Us\\000\\0001.fp3\'\\223\\222%'
' Joe''s Mom friend\'s dog is called \'Fluffy'''
And here's the regex:
m{
' # opening apostrophe
(?> # start non-backtracking group
[^\\']+ # anything but a backslash or an apostrophe, one or more times
| # or
(?: # group of
\\ \\? [0-7]{3} # one or two backslashes and three octal digits
)+ # one or more times
| # or
'' # double apostrophe
| # or
\\ [\\'] # backslash-escaped apostrophe or backslash
)* # end of group
' # closing apostrophe
}x;
If you don't care about correctness, at least for now, couldn't you just try to match against regular quoted string literals? Probably something like
m{
(?> # start of a quote group
' # opening apostrophe
(?> # start non-backtracking group
[^\\']+ # anything but a backslash or an apostrophe, one or more times
| # or
\\ . # backslash-escaped something
)* # end of group
' # closing apostrophe
)+ # end of a quote group, many of these
}x;
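Here is a Python sketch of this repeated-quote idea, offered as an illustration rather than a drop-in replacement for the Perl code. Python's re only gained atomic groups in 3.11, so instead of (?>...) the inner class matches one character at a time; that removes the nested [^\\']+ inside (...)* which can otherwise backtrack catastrophically on a failed match. Doubled apostrophes are handled by repeating the whole quoted piece, so 'Joe''s Mom' is read as 'Joe' followed by 's Mom'. The helper name is_sql_literal is hypothetical.

```python
import re

# One quoted piece: an apostrophe, then any mix of single characters that are
# neither a backslash nor an apostrophe, or a backslash plus any character,
# then a closing apostrophe. Repeating the piece covers doubled apostrophes.
# Matching one character per step (instead of [^\\']+ inside (...)*) keeps a
# failed match from backtracking exponentially without atomic groups.
SQL_LITERAL = re.compile(r"(?:'(?:[^\\']|\\.)*')+", re.DOTALL)


def is_sql_literal(s: str) -> bool:
    # Hypothetical helper: True only if the whole string is such a literal.
    return SQL_LITERAL.fullmatch(s) is not None


print(is_sql_literal(r"'\\037\\213\\010\\010\\005`Us\\000\\0001.fp3\'\\223\\222%'"))
print(is_sql_literal("'Fish Called ''Wendy'''"))
```

Like the Perl version, this only checks the shape of the literal: any backslash escape is accepted, and nothing is unquoted.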
First of all, it seems like you're trying to do two very different things in one regexp:
Matching it for correctness.
Unquoting it.
To match it, you could try something like this:
m{ ^ # Start of string
' # Opening apostrophe
(?> # Start non-backtracking group
[^\\\'] # Anything but a backslash or an apostrophe
| # or
\\ (?: # Backslash, then start group
\d{3} # 3 digits
|
. # one other character
) # end group
| # or
'' # Double apostrophe
)* # End of group
' # Closing apostrophe
$ # End of string
}xms;