For example, :%y+ yanks the whole buffer to the external clipboard (the + register). Now I want to send the results of a regex such as ^\S\+ to the + register. How?
Trial 1. [Fail] :g#^\S\+#y+
Trial 2. [Fail] :^\S\+y+
Have a look at http://vim.wikia.com/wiki/Copy_the_search_results_into_clipboard, in particular the section 'Copying lines containing search hits'.
This worked for me (clear register a, yank pattern results into it, copy over into clipboard):
qaq
:g/pattern/y A
:let @+ = @a
When I tried :g/pattern/y+, it only kept a single line, because each :yank overwrites the + register instead of appending to it.
That section also gives a function that copies all matching lines, but it is still a two-step process (search with /pattern, then call the function). You could probably write your own mapping or command to do both in one go.
Perhaps you can find some inspiration in the following:
:saveas %.tmp
to start working on a copy
:v//d
:exec '%s/^.\{-\}\(' . @/ . '\).*$/\1/g'
So, for example, a search for /word\d in the following text:
Tincidunt. Proin sagittis2. Curabitur auctor metus non mauris. Nunc condimentum
nisl non augue. Donec leo urna, dignissim vitae, porttitor ut, iaculis sit
amet, sem.
Class aptent taciti sociosqu3 ad litora torquent per conubia nostra, per
inceptos himenaeos. Suspendisse potenti. Quisque augue metus, hendrerit sit
amet, commodo vel, scelerisque ut, ante. Praesent euismod euismod risus. Mauris
ut metus sit amet mi cursus commodo. Morbi congue mauris ac sapien. Donec
justo. Sed congue nunc vel mauris0. Pellentesque vehicula orci id libero. In hac
habitasse platea dictumst. Nulla sollicitudin, purus1 id elementum dictum, dolor
augue hendrerit ante, vel semper metus enim et dolor. Pellentesque molestie9.
will highlight the matching words. Running
:v//d|exec '%s/^.\{-\}\(' . @/ . '\).*$/\1/g'
results in the following buffer:
sagittis2
sociosqu3
mauris0
purus1
molestie9
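Outside Vim, the same two steps (drop the lines without a match, then reduce each remaining line to its first match) look roughly like this in Python. This is a sketch, not a port: it assumes the search pattern is \w+\d, a hypothetical stand-in that reproduces the matches above, and Python's \w+ is only a rough equivalent of Vim's \w\+.

```python
import re


def first_matches(text, pattern):
    """Return the first match of pattern on each line, skipping lines with none.

    Mirrors :v//d (delete non-matching lines) followed by a substitute
    that keeps only the matched text on each remaining line.
    """
    results = []
    for line in text.splitlines():
        m = re.search(pattern, line)
        if m:  # lines without a match are dropped, like :v//d
            results.append(m.group(0))
    return results


sample = """Tincidunt. Proin sagittis2. Curabitur auctor metus non mauris. Nunc condimentum
nisl non augue. Donec leo urna, dignissim vitae, porttitor ut, iaculis sit
amet, sem.
Class aptent taciti sociosqu3 ad litora torquent per conubia nostra, per"""

print(first_matches(sample, r"\w+\d"))  # ['sagittis2', 'sociosqu3']
```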
Try the following. First, define a function that saves your regexp
matches into a list (see :help List).
:fun Add(list, item)
: call add(a:list, a:item)
: return a:item
:endfun
Note that it also returns the added item. This isn't necessary,
but what you return here becomes the replacement text in the
:substitute that follows below.
Next, create an empty list.
:let list=[]
Then run your regexp.
:%s/^\S\+/\=Add(list, submatch(0))/
Note that the replacement begins with \= (see :help
sub-replace-special), so it is evaluated as an expression. The result is
that each match is replaced with itself, but at the same time it is captured in list.
The submatch() function returns matched parts of the :substitute
command. submatch(0) returns the entire match.
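Python's re.sub supports the same capture-while-substituting trick, because it accepts a function as the replacement. A minimal sketch, with collect_matches as a hypothetical helper name: the callback records each match and returns it unchanged, just as Add() does above.

```python
import re


def collect_matches(text, pattern):
    """Run a substitution whose replacement callback records every match.

    The callback appends the match to a list and returns it unchanged,
    so the text comes out exactly as it went in.
    """
    found = []

    def add(m):
        found.append(m.group(0))  # m.group(0) plays the role of submatch(0)
        return m.group(0)         # returning the match leaves the text as-is

    # MULTILINE makes ^ match at the start of every line, like :%s/^.../
    new_text = re.sub(pattern, add, text, flags=re.MULTILINE)
    return new_text, found


text = "alpha one\n  indented line\nbeta two"
unchanged, words = collect_matches(text, r"^\S+")
print(words)             # ['alpha', 'beta']
print("\n".join(words))  # one match per line
```

Joining the collected list with newlines gives the text you would then place on the clipboard, one match per line.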
Note that, as a side effect, the buffer is marked as changed.
If you saved the buffer before running the substitution, you
can fix this by simply pressing u to undo, or by appending '| u' to your
command. (If you are going to undo the operation, it doesn't matter
what the Add function returns.)
You can view your matches with
:echo list
or you can append them to your buffer with
:$put=list
I hope this helps. It's straightforward once you know how, but there's
quite a lot to digest there.
Oh, I nearly forgot! You can put your matches on the clipboard with
:let @+ = join(list, '^@')
Then your matches will be in the clipboard, each on a separate line.
^@ represents a NUL character and is entered with <C-V><C-J>, meaning
press Control-V followed by Control-J. (Why you have to use ^@ here is
a whole topic in itself.)
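To clear register x first:
qxq
then append the match from every line to register X (the uppercase name appends), separated by newlines. The ^\S range skips lines that have no match, which would otherwise add blank lines:
:g/^\S/call setreg('X', matchstr(getline('.'), '^\S\+') . "\n")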
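A rough Python picture of what the setreg()/matchstr() command builds up, line by line. The helper build_register and its skip_empty flag are hypothetical; the point is that, like matchstr(), a non-matching line yields an empty string, which becomes a blank line in the register unless it is skipped.

```python
import re


def build_register(text, skip_empty=True):
    """Append the leading non-blank run of each line, newline-terminated."""
    register = ""  # starts empty, like clearing register x with qxq
    for line in text.splitlines():
        m = re.match(r"\S+", line)      # anchored at line start, like '^\S\+'
        word = m.group(0) if m else ""  # matchstr() also gives "" on no match
        if skip_empty and not word:
            continue                    # avoid a blank entry for this line
        register += word + "\n"
    return register


print(repr(build_register("foo bar\n  baz\nqux")))  # 'foo\nqux\n'
```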
So I have this text that I am trying to parse with Regex:
Name: Test Data 1
Description: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec feugiat nulla id nisi venenatis blandit.
Donec blandit egestas orci, at tristique dui vehicula in. Maecenas fringilla fringilla enim, in pulvinar ex gravida
in. Nam cursus facilisis ante, sed tristique nisl sagittis sed. In auctor felis id neque suscipit ullamcorper. Nunc
faucibus elit sed metus vestibulum, ullamcorper pulvinar nisi auctor. Praesent sodales orci mauris, eget dapibus
mauris sodales in. Ut iaculis, ante vitae ullamcorper semper, metus tortor auctor purus, eu convallis nulla lacus
in tellus. Phasellus feugiat tempus neque, in fringilla nisi scelerisque sed. Donec elementum diam nec mattis dignissim.
I am trying to parse it to load it into a database.
With this expression, I am trying to get a match on the "Name" and "Description" parameters but also trying to get a match on the parameter value as well (which can sometimes be multi-line).
(.*):\s(.*)
I have been searching for a while now, but I cannot make it match the whole paragraph and stop when it hits a blank line.
I would like the result to be as follows:
1st Match
Group 1: Name
Group 2: Test Data 1
2nd Match
Group 1: Description
Group 2: Description value with multi-line
https://regex101.com/r/mG2ms9/3
Thanks
You can use the following:
(.*?):\s([\s\S]*?)(?=\n(?:\n|\w)|$)
Here it is on regex101.
[\s\S] matches any character, even a new line (whereas '.' does not, by default).
Then we're matching as few characters as possible (*?) up until the point where the next line is either blank (\n), starts with a word character (\w), or is the end of the string ($).
We can get away with the \w option since all of the new lines in the description parameter are followed by a space. If this isn't always the case, you could replace \w with something like .*: to check instead if the next line contains ':' and stop if so.
Note that I disabled multi-line mode; it's not suitable here.
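A quick way to check the idea outside regex101 is Python's re.findall, which returns one (key, value) tuple per match. In this sketch the end-of-string alternative is written outside the newline, as \n(?:\n|\w)|$, so the last value is also found when the text has no trailing newline. The sample text is a shortened, made-up version of the question's data whose continuation lines start with a space.

```python
import re

# A value ends where a newline is followed by a blank line or a word
# character, or at the end of the string. Multi-line mode stays off,
# so $ only matches at the very end (or before a final newline).
PATTERN = re.compile(r"(.*?):\s([\s\S]*?)(?=\n(?:\n|\w)|$)")

text = (
    "Name: Test Data 1\n"
    "Description: Lorem ipsum dolor sit amet.\n"
    " Donec blandit egestas orci.\n"
    " in tellus."
)

for key, value in PATTERN.findall(text):
    print(key, "=>", repr(value))
```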
I’m having a hard time writing a regex in Google Sheets that checks a cell and returns everything, including newlines (\n) and carriage returns (\r), before a certain pattern (\*+).
A little more background: I'm using REGEXEXTRACT(A:A,"...") format inside a bigger ArrayFormula so that it automatically updates when a new row is added. This one’s working properly. It’s only the regex part I’m having trouble with.
So, for the purpose of this question, let's say I'm only worried about extracting the data from the A1 cell before a certain pattern and return that value in cell B1. Which brings us to this code in cell B1:
REGEXEXTRACT(A1,"...")
For example, this is what my A1 cell looks like:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus accumsan risus id ex dapibus sodales.
Curabitur dui lacus, tincidunt vel ligula quis, volutpat mattis eros.
In quis metus at ex auctor lobortis. Aliquam sed nisi purus. Sed cursus odio erat, ut tristique sapien interdum interdum. Morbi vel sollicitudin ante, non pellentesque libero.
***********
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Aenean egestas urna facilisis massa posuere, quis accumsan erat ornare.
Curabitur at dapibus nibh. Nam nec vestibulum ligula. Phasellus bibendum mi urna, ac hendrerit libero interdum non. Suspendisse semper non elit aliquam auctor.
Morbi vel sem tortor. Donec a sapien quis erat condimentum consequat in ut sem. Quisque in tellus sed est lobortis ultricies sed vitae enim.
I want to return this value in B1:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus accumsan risus id ex dapibus sodales.
Curabitur dui lacus, tincidunt vel ligula quis, volutpat mattis eros.
In quis metus at ex auctor lobortis. Aliquam sed nisi purus. Sed cursus odio erat, ut tristique sapien interdum interdum. Morbi vel sollicitudin ante, non pellentesque libero.
That is, basically everything before the ******* pattern. In Python, I can pass re.DOTALL so that .* also matches newlines, but I can't get this to work in Google Sheets.
To make a dot match line breaks, you need to add (?s) to the pattern; then . matches any char. To match up to the leftmost occurrence, use a lazy quantifier, *?. To actually extract the substring you need, wrap the part of the pattern you are interested in with capturing parentheses.
So, to match up to the first ******* substring, you may use
(?s)^(.*?)\*\*\*\*\*\*\*
or (?s)^(.*?)\*{7}. See the regex demo (note that Go's regex engine is also RE2, so you can test your patterns with the Go flavor at regex101.com).
(?s) - a DOTALL modifier
^ - start of string
(.*?) - Group 1: any 0+ chars as few as possible
\*\*\*\*\*\*\* - 7 literal asterisk symbols.
Note that you cannot rely on a negated character class (which does match line breaks) if your substring may contain * chars: ^([^*]*)\*\*\*\*\*\*\* won't work in those cases.
If you just want to match any chars up to the first * in the string, your regex will simplify greatly to
^([^*]+)
It matches
^ - start of string
([^*]+) - Capturing group 1: one or more chars other than *.
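Python's re module accepts the same inline (?s) flag, so the patterns above can be compared there. The cell text in this sketch is made up, with a lone * before the separator, which is exactly the case where the negated character class fails.

```python
import re

cell = "Price: 5*3 units\nSecond line\n***********\nAfter the separator"

# (?s) lets . match newlines; the lazy .*? stops at the first run of 7 asterisks.
lazy_match = re.search(r"(?s)^(.*?)\*{7}", cell)
print(repr(lazy_match.group(1)))  # 'Price: 5*3 units\nSecond line\n'

# [^*]* crosses newlines too, but it cannot get past the lone * in "5*3",
# so the seven-asterisk part never lines up and the search fails.
negated = re.search(r"^([^*]*)\*{7}", cell)
print(negated)  # None

# Good enough when you only want the text before the first * of any kind.
first_star = re.search(r"^([^*]+)", cell).group(1)
print(repr(first_star))  # 'Price: 5'
```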
The re.DOTALL flag in Python corresponds to the (?s) single-line mode flag in RE2.
Python:
(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
re2:
Flags: s let . match \n (default false)
So,
=REGEXEXTRACT(A1,"(?s)(.*?)\*")
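This returns only the first match, much like re.search() in Python (with a capturing group in the pattern, you get that group's text).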
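In Python, the flag correspondence looks like this: passing re.DOTALL and writing (?s) at the start of the pattern behave the same. The sketch also contrasts re.search, which stops at the first match, with re.findall, which collects all of them; the sample string is made up.

```python
import re

cell = "First part\nstill first * second part * third"

# Flag passed as an argument vs. inline (?s): both let . match the newline.
with_flag = re.search(r"(.*?)\*", cell, re.DOTALL).group(1)
inline = re.search(r"(?s)(.*?)\*", cell).group(1)
print(with_flag == inline)  # True
print(repr(inline))         # 'First part\nstill first '

# Without the flag, . cannot cross the newline, so the match starts on line 2.
print(repr(re.search(r"(.*?)\*", cell).group(1)))  # 'still first '

# re.search stops at the first match; re.findall keeps going.
print(re.findall(r"(?s)(.*?)\*", cell))  # ['First part\nstill first ', ' second part ']
```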
Not a regex, but this might suit someone who wants the same result and is less particular about the method:
=ArrayFormula(LEFT(A1:A,Find("***********",A1:A)-3))
If you really only want to match everything before the first *:
=REGEXEXTRACT(A1;"[^*]*")
If you want to allow a single star in the text and only stop at multiple (2 or more) stars (possibly divided by newlines) at the beginning of a line, you could try:
=REGEXEXTRACT(A1;"(?s)^(.*)\n(\*\n?){2,}")
But you would have to strip the stars. E.g.
=REGEXREPLACE(REGEXEXTRACT(A1;"(?s)^(.*)\n(\*\n?){2,}"); "\n(\*\n?){2,}"; "")
Lookaheads do not work in Google Sheets, because its RE2 engine does not support lookaround.
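For comparison, Python's re module exposes each capturing group on the match object, so the text before the stars can be read from group 1 without a second replace step. A sketch on made-up text with a single star inside a line and a star separator split over two lines:

```python
import re

text = "Keep this * single star\nand this line\n**\n*\nDrop everything here"

# The greedy (.*) backs off to the last newline that is followed by two or
# more "*" (each optionally followed by a newline); group 1 is what precedes it.
m = re.search(r"(?s)^(.*)\n(\*\n?){2,}", text)
print(repr(m.group(1)))  # 'Keep this * single star\nand this line'
```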
Using regexr, I wrote the expression /[\.!?] [A-Z]/g to match sentences using 3 assumptions:
Sentences end with punctuation: [.!?] (I'm not sure how to match double punctuation marks or combinations...)
One or more spaces always follow the punctuation mark.
The next sentence begins with a CAPITAL letter. (True 99% of the time, except for lowercase nouns such as iDevices)
Using sed, I'd like to take these matches, and substitute the space(s) with a \n character. I can do an after match $' and a before match $`, but how can I replace within a match?
If there is a better way of splitting texts into one sentence per line, I'm open to alternatives.
No bashisms: for Linux, OS X, and BSD
Input:
Vivamus fermentum semper porta. Nunc diam velit, adipiscing ut
tristique vitae, sagittis vel odio. Maecenas convallis ullamcorper
ultricies. Curabitur ornare, ligula semper consectetur sagittis, nisi
diam iaculis velit, id fringilla sem nunc vel mi.
Output:
Vivamus fermentum semper porta.
Nunc diam velit, adipiscing ut tristique vitae, sagittis vel odio.
Maecenas convallis ullamcorper ultricies.
Curabitur ornare, ligula semper consectetur sagittis, nisi diam iaculis velit, id fringilla sem nunc vel mi.
You can use this replacement:
sed 's/\([.!?][.!?]*\) *\([A-Z]\)/\1\n\2/g;' file
\(...\) delimits a capture group, and \1 is a reference to the captured content.
The OS X version of sed doesn't interpret \n as a newline; you must instead use the sequence \1'$'\n\\2 as the replacement string.
A more POSIX way is to write:
sed 's/\([.!?][.!?]*\) *\([A-Z]\)/\1\
\2/g;' file
with an escaped newline, as suggested by @cliffordheath.
Note that the dot doesn't need to be escaped inside a character class.
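The same substitution can be tried with Python's re.sub, where \n in the replacement template is always a newline. Because sed works line by line and the sample input is wrapped, this sketch first joins the input lines with spaces; that unwrapping step is an assumption based on the desired output, not something the sed command does.

```python
import re

wrapped = """Vivamus fermentum semper porta. Nunc diam velit, adipiscing ut
tristique vitae, sagittis vel odio. Maecenas convallis ullamcorper
ultricies. Curabitur ornare, ligula semper consectetur sagittis, nisi
diam iaculis velit, id fringilla sem nunc vel mi."""

# Unwrap first, so sentences that span input lines can be split correctly.
text = " ".join(wrapped.splitlines())

# Same pattern as the sed command: one or more of . ! ?, optional spaces,
# then a capital letter; \1 and \2 put the punctuation and letter back.
sentences = re.sub(r"([.!?][.!?]*) *([A-Z])", r"\1\n\2", text)
print(sentences)  # one sentence per line
```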
You need to use capture groups with \( and \) to re-insert the punctuation and initial letter. This example allows the following sentence to start with any alphanumeric character (but requires at least one space, to avoid messing up decimal numbers):
$ sed -e 's/\([.!?]\) \{1,\}\([[:alnum:]]\)/\1\
\2/g'
foo. bat! baz? foo, bar.
foo.
bat!
baz?
foo, bar.
I hope this helps.
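A Python version of this second pattern, on made-up input that ends with a decimal number, shows why requiring at least one space matters. Python has no [[:alnum:]] class, so [A-Za-z0-9] stands in for it here (ASCII only).

```python
import re

text = "foo. bat! baz? foo, bar. Pi is 3.14 today."

# " +" demands at least one space after the punctuation, so the "." inside
# "3.14" (followed directly by a digit) is left alone.
result = re.sub(r"([.!?]) +([A-Za-z0-9])", r"\1\n\2", text)
print(result)
```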
I have been monkeying with the following regex expression:
(\b\*)\w+(\*\b)
What I wanted to do was extract
^vitae^
from
Nam vestibulum hendrerit justo. Quisque ^vitae^ libero magna. Curabitur pretium eros ut augue ullamcorper feugiat. Aenean blandit libero vitae nunc sodales pharetra.
But what I seem to get is that the regex finds the text in question and returns all of the text in the sentence, as opposed to just
^vitae^
Any help would be greatly appreciated
Thanks!
To match any text between ^ characters:
@"\^([^^]*)\^"
// matches ^, then anything that isn't ^, and finally ^
It also matches across line breaks, if there are any, because [^^] matches newlines.
What about this expression:
@"\^\w+\^"
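Both patterns can be tried in Python with raw strings. The sketch below also shows the difference between them: [^^]* can run across spaces and line breaks, while \w+ only matches a single word.

```python
import re

text = ("Nam vestibulum hendrerit justo. Quisque ^vitae^ libero magna. "
        "Curabitur pretium eros ut augue ullamcorper feugiat.")

# group(0) is the whole match including the carets; group(1) is the inside.
m = re.search(r"\^([^^]*)\^", text)
print(m.group(0), m.group(1))  # ^vitae^ vitae

# The negated class spans line breaks ...
print(repr(re.search(r"\^([^^]*)\^", "a ^two\nlines^ b").group(1)))  # 'two\nlines'

# ... while \w+ stops at the first non-word character, such as a space.
print(re.search(r"\^\w+\^", "a ^two words^ b"))  # None
```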
I'm trying to match until the first occurrence of ] is found, but I can't seem to make it work. Could someone help me figure this out?
The string I'm matching against:
[plugin:tabs][tab title="test"]Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam sit amet nisl nisl. Ut interdum libero vitae quam ultricies et lacinia elit aliquet. Praesent tincidunt, sem tempus feugiat feugiat, turpis tellus scelerisque erat, sit amet feugiat neque arcu ac lectus. Sed at mi et elit interdum scelerisque vitae eu felis.[/tab][/plugin]
What it should match:
[plugin:tabs]
What it keeps matching:
[plugin:tabs][tab title="test"]
The regex:
(\[plugin:(?<identifier>[^\s]+)(?<parameters>.*?)\])
EDIT:
What it should also match:
[plugin:tabs test="test"]
You just need to add a ? after [^\s]+ to make it a lazy match (it will match as few characters as possible):
(\[plugin:(?<identifier>[^\s]+?)(?<parameters>.*?)\])
Although the (?<parameters>.*?) part is then unnecessary.
So your final Regex would look like this:
(\[plugin:(?<identifier>[^\s]+?)\])
Edit: See @stema's answer.
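A quick Python check of the lazy pattern (Python's re module spells a named group (?P<name>...) rather than (?<name>...)). It finds [plugin:tabs] in the original string, but it cannot match the [plugin:tabs test="test"] form from the edit, because [^\s]+? cannot step over the space and ] has to follow the identifier directly.

```python
import re

lazy_pattern = re.compile(r"(\[plugin:(?P<identifier>[^\s]+?)\])")

s1 = '[plugin:tabs][tab title="test"]Lorem ipsum[/tab][/plugin]'
s2 = '[plugin:tabs test="test"]'

m = lazy_pattern.search(s1)
print(m.group(0), m.group("identifier"))  # [plugin:tabs] tabs

# The identifier may not contain whitespace and "]" must come right after it,
# so the form with attributes has no match at all.
print(lazy_pattern.search(s2))  # None
```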
Try this here
(\[plugin:(?<identifier>[^\]\s]+)(?<parameters>.*?)\])
See it here on Regexr
In addition to whitespace characters, this also excludes the ] character from the first named group.
If you don't need the first capturing group you can make it a non-capturing group by adding ?: right after the opening bracket.
(?:\[plugin:(?<identifier>[^\]\s]+)(?<parameters>.*?)\])
To keep the space in between from being captured by the second group, just match optional whitespace between the two groups:
(?:\[plugin:(?<identifier>[^\]\s]+)\s*(?<parameters>.*?)\])
See it here on Regexr
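The same check for the last pattern above, again translated to Python's (?P<name>...) syntax, shows that it handles both the original string and the form with attributes.

```python
import re

# [^\]\s]+ stops the identifier at "]" or whitespace; \s* swallows the gap
# so the parameters group starts at the first non-space character.
pattern = re.compile(r"(?:\[plugin:(?P<identifier>[^\]\s]+)\s*(?P<parameters>.*?)\])")

s1 = '[plugin:tabs][tab title="test"]Lorem ipsum[/tab][/plugin]'
s2 = '[plugin:tabs test="test"]'

m1 = pattern.search(s1)
m2 = pattern.search(s2)
print(repr(m1.group(0)), m1.group("identifier"), repr(m1.group("parameters")))
print(repr(m2.group(0)), m2.group("identifier"), repr(m2.group("parameters")))
```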
With any language that supports lookbehinds, this will be your easiest solution:
/(^(?<!])*)/