gVim reg exp: How to search with saved pattern - regex

I have some patterns like
a,10
a,12
a,13
b,20
b,22
c,30
d,33
I want to convert to
a,10,12,13
b,20,22,0
c,30,0,0
d,33,0,0
using gVim regexp.
Is it possible to search with saved patterns in gVim regular expression? Like
%s/\(.*\),\(.*\)\n\1..../\1,\2/gc
Or is there any other method to achieve this?

It's convoluted, but the following would work:
:%s/\v\d+$\zs\n\w+
:%s/\d\zs$/,0,0,0
:%s/\v^\w+(,\d+){3}\zs.*$
:%s/\v\d+$\zs\n\w+
Searches for lines ending with a digit,
followed by a newline,
with a word at the start of the next line,
and removes the newline and the word
:%s/\d\zs$/,0,0,0
Appends ,0,0,0 to each line ending with a digit
:%s/\v^\w+(,\d+){3}\zs.*$
Removes everything up to the end of the line after the third comma/digit pair
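If a script is an option, the same regrouping can also be sketched outside Vim. The Python below is only an illustration and not part of the answer above: the name merge_rows and the width parameter are made up for this example, and it assumes every line has the form key,value.

```python
def merge_rows(lines, width=3):
    """Collect values per key (in first-seen order) and pad with "0"."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(",", 1)
        groups.setdefault(key, []).append(value)

    merged = []
    for key, values in groups.items():
        # Pad short groups with "0"; anything beyond `width` values is cut off,
        # which mirrors what the third :s command does.
        padded = (values + ["0"] * width)[:width]
        merged.append(",".join([key] + padded))
    return merged


if __name__ == "__main__":
    sample = ["a,10", "a,12", "a,13", "b,20", "b,22", "c,30", "d,33"]
    print("\n".join(merge_rows(sample)))
```

Unlike the substitutions, this groups by the key itself, so lines with the same key do not need to be adjacent.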

Related

Regex, extracting both word and number at the end of a string

I'm a beginner at regex and ran into a problem I couldn't find a solution for. Say I have the string ab123cd456. I'm trying to find a regular expression that extracts the text up to the last number (if any) and the number itself, so the result of the extraction would be ["ab123cd", "456"].
Extracting the trailing number is easily done with \d+$,
but I can't come up with an expression that extracts ab123cd. I've tried .*(?=\d+$), which extracts ab123cd45. That seems weird to me because + is greedy.
Please note that I want a single expression for the task.
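The .* in your attempt is greedy too and is tried first, so it takes as much as it can and leaves the lookahead only the one digit it needs. You need a non-greedy match for the first part: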
import re
lines = ["abc12cd1234"]
for line in lines:
    # The lazy \w+? stops as early as possible, leaving all trailing digits to (\d+)$
    mre = re.match(r'(\w+?)(\d+)$', line)
    if mre:
        print(mre.groups())
Will print:
('abc12cd', '1234')
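The question also says the trailing number is optional ("if any"). As a rough sketch of one way to cover that case, the variant below swaps \d+ for \d* so that a string without trailing digits still matches. The function name split_trailing_number is made up for this example, and the greedy attempt from the question is included for comparison.

```python
import re


def split_trailing_number(text):
    # The lazy .*? grows one character at a time, so (\d*)$ ends up holding
    # the longest run of digits at the end of the string (possibly empty).
    # re.DOTALL lets . cross newlines so the pattern always matches.
    match = re.match(r'(.*?)(\d*)$', text, flags=re.DOTALL)
    return match.groups()


# The greedy attempt from the question: .* keeps all but one digit.
greedy = re.match(r'.*(?=\d+$)', 'ab123cd456').group(0)

if __name__ == "__main__":
    print(split_trailing_number('ab123cd456'))
    print(split_trailing_number('abcd'))
    print(greedy)
```

With no trailing digits, the second element comes back as an empty string rather than None.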

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a regex to remove the 5th and 6th columns? The numbers in those columns vary in length.
To make it even worse, the User column can also contain a |.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open to suggestions on how to do this with another program or command-line utility, on either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Replace $1
Basically, anchor to the end of the line and remove the two columns just before the last one (I assume columns can't be empty. Replace + with * if they can).
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
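Such a script could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested tool: the function names, the file paths passed to rewrite_file, and the UTF-8 encoding are all made up for this example. It counts columns from the right, so a | inside the user name does not matter.

```python
def drop_two_before_last(line, sep="|"):
    """Remove the two columns that sit just before the last column."""
    line = line.rstrip("\n")
    # Split from the right: [everything before, column, column, last column]
    parts = line.rsplit(sep, 3)
    if len(parts) < 4:
        return line  # too few columns: leave the line unchanged
    head, _dropped_a, _dropped_b, last = parts
    return head + sep + last


def rewrite_file(src_path, dst_path):
    # Stream line by line so a file with millions of rows is never
    # held in memory as a whole. UTF-8 is an assumption.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(drop_two_before_last(line) + "\n")
```

Calling rewrite_file("input.csv", "output.csv") (hypothetical file names) would write the cleaned rows to a new file and leave the original untouched.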
I've captured three groups and given them names. If you use a replace utility like sed or Vim, you can replace the remove group with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
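As one example of such an environment difference, Python's re module spells named groups (?P<name>...) rather than (?<name>...). The sketch below applies the same pattern with that change and concatenates the two kept groups; the function name drop_removed_group is made up for this example.

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling for named groups.
PATTERN = re.compile(
    r'^(?P<keep_before>(?:[^|]+\|){3})'
    r'(?P<remove>(?:[^|]+\|){2})'
    r'(?P<keep_after>.*)$'
)


def drop_removed_group(line):
    match = PATTERN.match(line)
    if match is None:
        return line  # fewer than five |-terminated columns: leave unchanged
    return match.group('keep_before') + match.group('keep_after')
```

Because this pattern counts three columns from the left, it assumes the user field contains no |. On the third sample row it would drop f and 98906504456 instead of the two numeric columns, so anchoring at the end of the line is the safer choice for that data.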
In Notepad++, press Ctrl+H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex explanation:
\|\d+ : matches the first field, a | followed by a number
\|\d+ : matches the second field, a | followed by a number
(\|[0-9a-z]+) : matches and captures the field after the second number
$ : anchors the match to the end of the line
Replacement:
$1 : replaces the whole match with the contents of the capture group, i.e. whatever (\|[0-9a-z]+) matched

Search regex for Notepad++

I am looking to create a regex for searching Notepad++
I have a notepad page with thousands of random codes such as:
415615610230
151156125611
161651651516
511111115165
I need to search the entire document for multiple codes in one search.
I know the regex would look like (415615610230|151156125611|161651651516),
but what I need is a way to build a regex like the one above by pasting in all my search criteria.
Out of, say, 100,000 numbers, I might need to search for 20 codes.
Let's just say I want to search for
5155584865
5155584866
5155584867
5155584868
5155584869
5155584870
5155584871
5155584872
5155584873
5155584874
5155584875
5155584876
5155584877
5155584878
5155584879
5155584880
5155584881
5155584882
5155584883
5155584884
The regex should look like:
(5155584865|5155584866|5155584867|5155584868|5155584869|5155584870|5155584871|5155584872|5155584873|5155584874|5155584875|5155584876|5155584877|5155584878|5155584879|5155584880|5155584881|5155584882|5155584883|5155584884)
Is there a way to build the regex above by just pasting in
5155584865
5155584866
5155584867
5155584868
5155584869
5155584870
5155584871
5155584872
5155584873
5155584874
5155584875
5155584876
5155584877
5155584878
5155584879
5155584880
5155584881
5155584882
5155584883
5155584884
Or can anyone recommend an easier way to search the entire notepad document?
If you just want to search for the template above (e.g. numbers starting with 51555848), then you can use
/51555848.([^\s]+)/g
This matches everything that starts with 51555848 and runs up to the next whitespace.
For arbitrary codes, copy your whitespace-separated numbers into a new document in Notepad++ and replace all whitespace (\s) with the pipe symbol (| or \| if your search mode is regex).
You also do not need the round brackets around your search string.
EDIT:
Instructions for converting a list of numbers (line separated) into a regex
select everything (Ctrl+A)
join the lines (Ctrl+J)
replace (Ctrl+H) with
search pattern: \s+
replace pattern: \|
search mode: Regex
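The same conversion can be sketched in a few lines of Python, in case building the pattern by script is easier than doing it in the editor. The function names below are made up for this example. The \b word boundaries are an addition not present in the steps above, included so that a code is not matched inside a longer number.

```python
import re


def build_alternation(codes_text):
    """Turn whitespace- or line-separated codes into one alternation regex."""
    codes = codes_text.split()
    if not codes:
        raise ValueError("no codes given")
    # re.escape is a no-op for plain digits but keeps other input safe.
    body = "|".join(re.escape(code) for code in codes)
    return r"\b(?:" + body + r")\b"


def find_codes(codes_text, document_text):
    """Return every occurrence of any of the codes, in document order."""
    return re.findall(build_alternation(codes_text), document_text)
```

The part between (?: and ) is the plain pipe-separated list that the steps above produce.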

powershell complicated regular expression

I've been stuck trying to write a regular expression that matches the following condition. Basically, I have a text file that contains several text lines (composed of words and digits). For example:
Some_text Number 45 Some_text ptrn: anchor Some_text Number 22 Some_text
What I need is to return “45” (or whatever digits follow the word “Number”), but only if “ptrn: anchor” was found in the line. In other words, if the pattern “ptrn: anchor” is found in a line, the script should look back along the line until it reaches the first word “Number” and then output the digits next to it.
I'm not very good at regular expressions and would very much appreciate any help.
This should do:
"Number\s*(\d+).*ptrn: anchor"
Note that if there are multiple numbers before ptrn: anchor in a single line, the first one will be returned.
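To make that note concrete, here is a small sketch, written in Python rather than PowerShell purely for illustration. FIRST_NUMBER is the pattern from the answer. NEAREST_NUMBER is a variant that is not in the original answer: it assumes that "look back until the first Number" means the Number closest to the anchor, and gets there by putting a greedy .* in front. The names are made up for this example.

```python
import re

# Pattern from the answer: captures the first "Number N" before the anchor.
FIRST_NUMBER = re.compile(r'Number\s*(\d+).*ptrn: anchor')

# Variant (an assumption about the intent): the leading greedy .* pushes the
# match as far right as possible, so the Number nearest the anchor is captured.
NEAREST_NUMBER = re.compile(r'.*Number\s*(\d+).*?ptrn: anchor')


def number_before_anchor(line, pattern=FIRST_NUMBER):
    """Return the captured digits as a string, or None if nothing matches."""
    match = pattern.search(line)
    return match.group(1) if match else None
```

On the sample line from the question both patterns return 45; they only differ when several Number entries precede the anchor.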

RegEx in Notepad++ to find a wild character and replace the whole word

I have a test file with number values as below:
32405494
32405495
32405496
32407498
Using Notepad++, I want to search for the first 4 digits with a regular expression and replace the whole number with G3E_STYLERULE_SEQ.NEXTVAL.
I am able to find these values using 3240*. My question is: how do I replace the whole number with G3E_STYLERULE_SEQ.NEXTVAL?
When I click the Replace All button, I get the following output:
G3E_STYLERULE_SEQ.NEXTVAL5494
G3E_STYLERULE_SEQ.NEXTVAL5495
G3E_STYLERULE_SEQ.NEXTVAL5496
G3E_STYLERULE_SEQ.NEXTVAL7498
However, I am expecting the following:
G3E_STYLERULE_SEQ.NEXTVAL
G3E_STYLERULE_SEQ.NEXTVAL
G3E_STYLERULE_SEQ.NEXTVAL
G3E_STYLERULE_SEQ.NEXTVAL
Any ideas to achieve this? Is it even possible through Notepad++? Are there any other text editors which I can use to achieve this?
Use something like this:
3240.*
. is the wildcard character in regex and * means that the previous character is to be repeated 0 or more times (your current regex actually matches 324 and then 0 which appears 0 or more times).
3240.* will therefore match 3240 and any other following characters.
You might also want to add a line anchor:
^3240.*
So that you don't replace numbers having 3240 in the middle too.
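The difference between the two patterns can be reproduced outside Notepad++. The sketch below uses Python's re module only as a stand-in for the editor's regex engine, with the sample values from the question. re.MULTILINE is needed in Python so that ^ matches at the start of every line.

```python
import re

REPLACEMENT = "G3E_STYLERULE_SEQ.NEXTVAL"
text = "32405494\n32405495\n32405496\n32407498"

# 3240* means "324 followed by zero or more 0s", so only the prefix is replaced.
partial = re.sub(r"3240*", REPLACEMENT, text)

# ^3240.* consumes the rest of each line, so the whole number is replaced.
whole = re.sub(r"^3240.*", REPLACEMENT, text, flags=re.MULTILINE)

if __name__ == "__main__":
    print(partial)
    print(whole)
```

The first result keeps the trailing digits, as in the question's unwanted output; the second leaves only the replacement text on each line.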
In Notepad++, you can use this regex:
^3240\d+
It matches the four digits you're searching for at the beginning of the line, followed by one or more digits.
Try this -
Search this - ^3240\d*$
Replace with- G3E_STYLERULE_SEQ.NEXTVAL