How to find the 3rd occurrence of a pattern on a line - regex

Today I had to align a table at only the first run of multiple spaces on each line, e.g.:
<ScrollWheelDown> move window three lines down
<S-ScrollWheelDown> move window one page down
<ScrollWheelUp> move window three lines up
<S-ScrollWheelUp> move window one page up
I use the Tabular plugin to align tables, but I could not find a way to match only the first occurrence of multiple spaces and align only there.
I don't know how to do it in plain Vim either:
What would the regex be if I only want to find the 3rd occurrence of a pattern on a line?
Is the regex the same when using Tabular?

The regex would be:
/\(.\{-}\zsPATTERN\)\{3}
So if, for example, you want to change the 3rd 'foo' to 'bar' on the following line:
lorem ifoopsum foo lor foor ipsum foo dolor foo
       ^1      ^2      ^3         ^4        ^5
run:
s/\(.\{-}\zsfoo\)\{3}/bar/
to get:
lorem ifoopsum foo lor barr ipsum foo dolor foo
       ^1      ^2      ^3=bar     ^4        ^5
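For readers who land here from outside Vim, the same idea can be sketched in Python. This is only an illustration under assumptions: Python's re module has no \zs, so the sketch swaps in a capture group that keeps the text before the n-th match and puts it back. The helper name replace_nth is hypothetical, and the pattern is assumed to contain no capturing groups of its own.

```python
import re

def replace_nth(line, pattern, repl, n):
    """Replace only the n-th match of `pattern` in `line` (hypothetical helper)."""
    # Group 1 lazily collects the first n-1 matches plus the text leading up
    # to the n-th one; only the n-th match itself is swapped for `repl`.
    # Assumes `pattern` has no capturing groups of its own.
    regex = re.compile(r"^((?:.*?(?:%s)){%d}.*?)(?:%s)" % (pattern, n - 1, pattern))
    return regex.sub(lambda m: m.group(1) + repl, line, count=1)

line = "lorem ifoopsum foo lor foor ipsum foo dolor foo"
result = replace_nth(line, "foo", "bar", 3)
print(result)
```

If the line has fewer than n matches, the regex does not match and the line comes back unchanged.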

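I don't know if it fits your needs, but you can search this way:
Place your cursor at the beginning of the line
Type 3/pattern and press Return
This places the cursor on the 3rd match after the cursor position (highlighting all occurrences if 'hlsearch' is set), which is the 3rd occurrence on that line when the line contains at least three matches.
You can also record a macro:
qa+3nq
then use @a to go to the 3rd occurrence counted from the start of the next line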

For Google users (like me) who search just for "regex nth occurrence": this returns the position of the last character of the third 'foo' (change {3} to your n and foo to your text):
length(regexp_replace('lorem ifoopsum foo lor foor1 ipsum foo dolor foo', '((?:.*?foo){3}).*$', '\1'))
Here (?:.*?foo) matches anything followed by 'foo', and (?:.*?foo){3} repeats that 3 times. The outer group captures the string from the start up to and including the 3rd repetition, .*$ matches the rest of the string, the whole string is replaced by the captured part, and its length is the position of the last character of the 3rd 'foo'.
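A rough Python equivalent of the same trick, as a sketch only. The function name end_of_nth is made up for illustration. Instead of replacing and measuring the length, it reads the end offset of the match directly, which gives the same number.

```python
import re

def end_of_nth(text, pattern, n):
    """Return the 1-based position of the last character of the n-th match, or None."""
    # re.match anchors at the start of the string; the lazy .*? makes each
    # repetition stop at the earliest possible occurrence of the pattern.
    m = re.match(r"(?:.*?(?:%s)){%d}" % (pattern, n), text)
    return m.end() if m else None

s = "lorem ifoopsum foo lor foor1 ipsum foo dolor foo"
pos = end_of_nth(s, "foo", 3)
print(pos)  # 26
```

The 0-based exclusive end offset returned by m.end() equals the 1-based position of the last matched character, so no adjustment is needed.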

Try this:
:Tabularize /^.\{-}\S\s\{2,}
Yes, Tabularize uses Vim's regex, so the example in Eelvex's answer should work.
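If you ever need the same alignment outside Vim, here is a small Python sketch of the idea: split each line once, at the first run of two or more spaces that follows a non-space character, then pad the first column. The helper name align_first_gap and the sample lines are assumptions for illustration, not part of Tabular.

```python
import re

def align_first_gap(lines):
    """Align lines on their first run of 2+ spaces (hypothetical helper)."""
    # Split each line once, at the first gap of two or more spaces that
    # follows a non-space character.
    parts = [re.split(r"(?<=\S) {2,}", line, maxsplit=1) for line in lines]
    width = max((len(p[0]) for p in parts if len(p) == 2), default=0)
    # Pad the first column; lines without such a gap are left unchanged.
    return [
        p[0].ljust(width) + "  " + p[1] if len(p) == 2 else p[0]
        for p in parts
    ]

table = [
    "<ScrollWheelDown>  move window three lines down",
    "<S-ScrollWheelDown>     move window  one page down",
]
aligned = align_first_gap(table)
print("\n".join(aligned))
```

Later gaps on the same line (such as the double space in the second sample line) are left alone, which is the "first occurrence only" behaviour asked about.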

Related

Regex across double line break

I have the following text, and I need to extract parts of it:
[Firstname LastName 21/06/2018 - 17:27]
Lorem Ipsum
[Foo Bar 25/01/2017 - 12:10]
Lorem Ipsum - First line
Lorem ipsum Second line
Lorem ipsum third line
Some other random text
I need to extract parts of this text, which I have almost managed to do using the following regex:
\[(?<name>\w+? \w+?) (?<date>\d{2}\/\d{2}\/\d{4}) - (?<time>\d{2}:\d{2})\]\n*(?<note>.+)
Everything works correctly except for the group labelled <note>: it only picks up the first line of the note. If there is a line break in the note, anything after the line break is not picked up.
How can I get it to match all text in the note section, until the regex finds a double line break?
Instead of looking for . (which does not include newlines by default) you can look for [^[], or every character before the next square bracket, followed by two line breaks:
\[(?<name>\w+? \w+?) (?<date>\d{2}\/\d{2}\/\d{4}) - (?<time>\d{2}:\d{2})\]\n*(?<note>[^[]+\n\n)
https://regex101.com/r/12S3ZQ/3
I have modified your original regex to give you the expected output.
\[(?<name>\w+? \w+?) (?<date>\d{2}\/\d{2}\/\d{4}) - (?<time>\d{2}:\d{2})\]\n*(?<note>.+\n?\n?)+
It should match everything until the double line break, notice the only change is at the end.
Instead of...
(?<note>.+)
It is now...
(?<note>.+\n?\n?)+
Edit: Changed the regex so it will include lines separated by ONE line break, but not two.
You may use
\[(?<name>\w+? \w+?) (?<date>\d{2}\/\d{2}\/\d{4}) - (?<time>\d{2}:\d{2})\]\s*(?<note>[\s\S]+?)(?=\n{2}|$)
The (?<note>[\s\S]+?)(?=\n{2}|$) will match 1+ chars, as few as possible, up to the first 2 newline chars or end of string.
If your regex engine supports the \R construct to match any line break sequence, you can use (?=\R{2}|$).
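To see the lazy-match-plus-lookahead version in action, here is a sketch in Python. Two assumptions: Python spells named groups (?P<name>...) rather than (?<name>...), and the sample text below has blank lines between entries, which is what "double line break" in the question implies.

```python
import re

ENTRY = re.compile(
    r"\[(?P<name>\w+? \w+?) (?P<date>\d{2}/\d{2}/\d{4}) - (?P<time>\d{2}:\d{2})\]\s*"
    # Lazily take the note up to the first blank line or the end of the string.
    r"(?P<note>[\s\S]+?)(?=\n{2}|$)"
)

# Sample input; the blank lines between entries are an assumption.
text = (
    "[Firstname LastName 21/06/2018 - 17:27]\n"
    "Lorem Ipsum\n"
    "\n"
    "[Foo Bar 25/01/2017 - 12:10]\n"
    "Lorem Ipsum - First line\n"
    "Lorem ipsum Second line\n"
    "Lorem ipsum third line\n"
    "\n"
    "Some other random text\n"
)

entries = [m.groupdict() for m in ENTRY.finditer(text)]
for entry in entries:
    print(entry["name"], "->", repr(entry["note"]))
```

The trailing "Some other random text" is skipped because it is not preceded by a bracketed header.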

Regular expression to get only the first word from each line

I have a text file
#sp_id int,
#sp_name varchar(120),
#sp_gender varchar(10),
#sp_date_of_birth varchar(10),
#sp_address varchar(120),
#sp_is_active int,
#sp_role int
Here, I want to get only the first word from each line. How can I do this? The words may be separated by spaces, tabs, etc.
Here is what I suggest:
Find what: ^([^ \t]+).*
Replace with: $1
Explanation: ^ matches the start of line, ([^ \t]+) matches 1 or more (due to +) characters other than space and tab (due to [^ \t]), and then any number of characters up to the end of the line with .*.
In case you might have leading whitespace, you might want to use
^\s*([^ \t]+).*
I did something similar with this:
import re
with open('handles.txt', 'r') as handles:
    handlelist = [line.rstrip('\n') for line in handles]
# first run of word characters on each line (assumes no blank lines)
newlist = [re.findall(r"\w+", line)[0] for line in handlelist]
This reads all the lines of the document into a list,
then uses a regex to extract the first word of each line (ignoring whitespace).
My file (handles.txt) contained info like this:
JoIyke - personal twitter link;
newMan - another twitter handle;
yourlink - yet another one.
The code will return this list:
['JoIyke', 'newMan', 'yourlink']
Find What: ^(\S+).*$
Replace by : \1
You can simply use this to get the first word. Here we capture the first word in a group and replace the whole line with the captured group.
Find the first word of each line with /^\w+/gm.
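As a sketch of the multiline-anchor approach in Python (the sample text is a shortened version of the file from the question, with a tab and some leading spaces added as assumptions to exercise the pattern):

```python
import re

# Shortened sample; the tab and the indented line are assumptions.
text = """#sp_id int,
#sp_name\tvarchar(120),
  #sp_gender varchar(10),
#sp_role int"""

# With re.MULTILINE, ^ anchors at the start of every line. [ \t]* skips
# leading blanks without crossing into the next line, and (\S+) captures
# the first run of non-whitespace characters.
first_words = re.findall(r"^[ \t]*(\S+)", text, flags=re.MULTILINE)
print(first_words)
```

Note that \w does not match '#', so a pattern like ^\w+ would find nothing on these particular lines; \S+ is the safer choice when "word" means "anything up to the first whitespace".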

How to delete a pattern when it is not found between two symbols in Perl?

I have a document like this:
Once upon a time, there lived a cat.
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other cats from many AAAAAA cities ZZZZZZ.
The cat knew brown cats and AAAAAA green catsZZZZZZ and red cats.
The AAAAAA and ZZZZZZ are similar to { and }, but are used to avoid problems with other scripts that might interpret { and } as other meanings.
I need to delete all appearances of "cat" when it is not found between an AAAAAA and ZZZZZZ.
Once upon a time, there lived a .
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other s from many AAAAAA cities ZZZZZZ.
The knew brown s and AAAAAA green catsZZZZZZ and red s.
All AAAAAA's have a matching ZZZZZZ.
The AAAAAA's and matching ZZZZZZ's are not split across lines.
The AAAAAA's and matching ZZZZZZ's are never nested.
The pattern, "cat" in the example above, is not treated as a word. This could be anything.
I have tried several things, e.g.:
perl -pe 's/[^AAAAAAA](.*)(cat)(.*)[^BBBBBBB]//g' <<< "AAAAAAA cat 1 BBBBBBB cat 2"
How can I delete any pattern when it is not found between some matching set of symbols?
You have several possible ways:
You can use the \K feature to remove the part you don't want from match result:
s/AAAAAA.*?ZZZZZZ\K|cat//gs
(\K removes everything to its left from the match result, but those characters are still consumed by the regex engine. Consequently, when the first branch of the alternation succeeds, you replace the empty string (immediately after ZZZZZZ) with an empty string.)
You can use a capturing group to reinsert the substring you want to preserve, unchanged, into the replacement string via the reference $1:
s/(AAAAAA.*?ZZZZZZ)|cat/$1/gs
You can use backtracking control verbs to skip and not retry the substring matched:
s/AAAAAA.*?ZZZZZZ(*SKIP)(*FAIL)|cat//gs
((*SKIP) forces the regex engine to not retry the substring found on the left if the pattern fails later. (*FAIL) forces the pattern to fail.)
Note: if AAAAAA and ZZZZZZ must be always on the same line, you can remove the /s modifier and process the data line by line.
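For comparison, the capturing-group variant carries over to Python almost unchanged, since it needs neither \K nor backtracking verbs (which, as far as I know, the standard re module does not support). A minimal sketch, with a hypothetical helper name delete_outside:

```python
import re

def delete_outside(text, pattern, opener="AAAAAA", closer="ZZZZZZ"):
    """Delete `pattern` except inside opener...closer spans (hypothetical helper)."""
    protected = re.escape(opener) + r".*?" + re.escape(closer)
    # Assumes `pattern` has no capturing groups of its own.
    regex = re.compile(r"(%s)|(?:%s)" % (protected, pattern), flags=re.DOTALL)
    # Group 1 is set when a protected span matched: put it back unchanged.
    # Otherwise the bare pattern matched: replace it with nothing.
    return regex.sub(lambda m: m.group(1) or "", text)

line = "The AAAAAAcatZZZZZZ knew many other cats from many AAAAAA cities ZZZZZZ."
cleaned = delete_outside(line, "cat")
print(cleaned)
```

The protected branch comes first in the alternation, so at any position where a span starts it wins over the bare pattern.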

Regular Expression: Extract the lines

I try to extract the name1 (first-row), name2 (second-row), name3 (third-row) and the street-name (last-row) with regex:
Company Inc.
JohnDoe
Foobar
Industrieterrein 13
The very last row is the street name and this part is already working (the text is stored in the variable "S2").
REGEXREPLACE(S2, "(.*\n)+(?!(.*\n))", "")
This expression returns the very last line. I am also able to extract the first row:
REGEXREPLACE(S2, "(\n.*)", "")
My problem is that I do not know how to extract the second and third rows.
Also, how do I test whether the text contains one, two, three or more rows?
Update:
The regex is used in the context of Scribe (an ETL tool). The problem is that I cannot execute source code; I only have the following functions:
REGEXMATCH(input, pattern)
REGEXREPLACE(input, pattern, replacement)
If the regex language provides support for lookaheads you may count rows backwards and thus get (assuming . does not match newline)
(.*)$ # matching the last line
(.*)(?=(\n.*){1}$) # matching the second last line (excl. newline)
(.*)(?=(\n.*){2}$) # matching the third last line (excl. newline)
Just use this regex:
(.+)+
Explanation:
. is a wildcard: it matches any single character except \n.
+ matches the previous element one or more times.
As for a regular expression that will match each of four rows, how about this:
(.*?)\n(.*?)\n(.*?)\n(.*)
The parentheses capture each row, and the \n matches a newline. Note: you may have to use \r\n instead of just \n depending on the line endings of your data; try both.
You can try the following:
((.*?)\n){3}
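The patterns above can be tried out quickly in Python before pasting them into the ETL tool. This is a sketch only: line_from_end and has_exactly_rows are hypothetical helpers, and Scribe's regex flavour may behave differently from Python's re.

```python
import re

def line_from_end(text, k):
    """Return the line that has exactly k lines after it, or None."""
    # The lookahead (?:\n.*){k}$ requires exactly k more lines before the end.
    m = re.search(r"(.*)(?=(?:\n.*){%d}$)" % k, text)
    return m.group(1) if m else None

def has_exactly_rows(text, n):
    """True if the text consists of exactly n rows (n >= 1)."""
    return re.fullmatch(r".*(?:\n.*){%d}" % (n - 1), text) is not None

s2 = "Company Inc.\nJohnDoe\nFoobar\nIndustrieterrein 13"
print(line_from_end(s2, 0))  # last row
print(line_from_end(s2, 1))  # second-to-last row
print(line_from_end(s2, 2))  # third-to-last row
print(has_exactly_rows(s2, 4))
```

Counting from the end works here because . does not match a newline, so each \n.* in the lookahead accounts for exactly one row.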

Substitute the n-th occurrence of a word in vim

I saw other questions dealing with finding the n-th occurrence of a word/pattern, but I couldn't find how you would actually substitute the n-th occurrence of a pattern in vim. There's the obvious way of hard coding all the occurrences like
:s/.*\(word\).*\(word\).*\(word\).*/.*\1.*\2.*newWord.*/g
Is there a better way of doing this?
For information,
s/\%(\(pattern\).\{-}\)\{41}\zs\1/2/
also works to replace the 42nd occurrence. However, I prefer the solution given by John Kugelman, which is simpler, even if it does not limit itself to the current line.
You can do this a little more simply by using multiple searches. The empty pattern in the :s/pattern/repl/ command means replace the most recent search result.
:/word//word//word/ s//newWord/
or
:/word//word/ s/word/newWord/
You could then repeat this multiple times with @:, or even 10@: to repeat the command 10 more times.
Alternatively, if I were doing this interactively I would do something like:
3/word
:s//newWord/r
That would find the third occurrence of word starting at the cursor and then perform a substitution.
Replace each Nth occurrence of PATTERN in a line with REPLACE.
:%s/\(\zsPATTERN.\{-}\)\{N}/REPLACE/
To replace the nth occurrence of PATTERN in a line in Vim: in addition to the above answer, I want to explain how the pattern matching actually works.
I will discuss the \(.\{-}\zsPATTERN\)\{N} solution.
The example I will use replaces the second run of one or more whitespace characters in a sentence (string).
Breaking the pattern down:
According to the Vim documentation (:help /\zs),
\zs - Matches at any position and sets the start of the match there, so the text matched before it is not part of the final match.
.\{-} matches 0 or more of any character, as few as possible (*).
Here . matches any character and \{...} gives the number of repetitions,
e.g. ab\{2,3}c matches where b occurs either 2 or 3 times.
By contrast, .* matches 0 or more characters, as many as possible; being greedy, it would skip ahead towards later occurrences, so it is not a drop-in replacement here.
According to the Vim docs on non-greedy matching, \{-} is the same as * but uses the shortest-match-first algorithm.
\{N} -> matches exactly N of the preceding atom, e.g.
/\<\d\{4}\> searches for exactly 4 digits, same as /\<\d\d\d\d\>
(Ignore the \< and \> here; they match word boundaries, so \<fred\> finds fred but not alfred.)
\( \) groups the whole pattern so that \{N} applies to all of it.
PATTERN here is the pattern you are matching -> \s\{1,} (\s is a whitespace character and \{1,} means 1 or more, as explained above)
"abc substring def"
:%s/\(.\{-}\zs\s\{1,}\)\{2}/,/
OUTPUT -> "abc substring,def"
# explanation: the first space is between abc and substring, and the second
# occurrence of the pattern is between substring and def, hence that
# is replaced by the "," given in the substitute command above.
This answers your actual question, but not your intent.
You asked about replacing the nth occurrence of a word (but seemed to mean "within a line"). Here's an answer for the question as asked, in case someone finds it like I did =)
For weird tasks (like needing to replace every 12th occurrence of "dog" with "parrot"), I like to use recursive recordings.
First, clear register q by recording an empty macro:
qqq
Now start a new recording into q:
qq
Next, manually do the thing you want to do (using the example above, replace the 12th occurrence of "dog" with "parrot"):
/dog
nnnnnnnnnnn
Delete "dog" and enter insert mode:
diwi
Type parrot, then press <esc> to leave insert mode:
parrot<esc>
Now play the currently empty recording with @q:
@q
which does nothing yet.
Finally, stop recording:
q
Now the recording in q calls itself at the end. Because it calls the recording by name, the register is no longer empty when that call runs. So, run the recording:
@q
It will replay the recording, then at the end, as the last step, replay itself again. It will repeat this until the end of the file.
TL;DR:
qqq
qq
/dog
nnnnnnnnnnndiwiparrot<esc>
@q
q
@q
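If the text can be processed outside Vim, the "every 12th occurrence" task can also be sketched in Python with a counting callback. This is a different technique from the recursive recording above, shown only for comparison; replace_every_nth is a hypothetical helper name.

```python
import re
from itertools import count

def replace_every_nth(text, pattern, repl, n):
    """Replace every n-th match of `pattern` in `text` (hypothetical helper)."""
    counter = count(1)

    def swap(match):
        # Number the matches 1, 2, 3, ... and replace only multiples of n.
        return repl if next(counter) % n == 0 else match.group(0)

    return re.sub(pattern, swap, text)

print(replace_every_nth("dog dog dog dog dog", "dog", "parrot", 2))
```

For the example in this answer you would call it with n=12; the callback sees every match in order, so no manual counting is needed.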
Well, if you do /gc then you can count the number of times it asks you for confirmation, and go ahead with the replacement when you get to the nth :D