Regex Extract in Google Docs for capturing the end of variable strings

In Google Docs, I have a series of strings like "Something.Here.Search.Term.Chicago", where the last component after "Term." can be anything.
How do I use regex extract to capture only what comes after "Term."?
Note that the length of the string before "Term." varies, so I can't use Left or Right with a fixed position.

You can use a positive look-behind as well, to avoid having to capture with groups:
/(?<=Term\.).*/
Though depending on the regex flavor you are implementing this with, look-behinds may not be supported (older JavaScript engines, for example, lack them).
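As a rough illustration of the look-behind approach in a flavor that does support it, here is a minimal Python sketch. The variable names are made up for the example, and the sample string is the one from the question.

```python
import re

# Sample string from the question; the part after "Term." is what we want.
s = "Something.Here.Search.Term.Chicago"

# (?<=Term\.) asserts that "Term." sits immediately before the match
# without including it in the matched text.
match = re.search(r"(?<=Term\.).*", s)
tail = match.group(0) if match else None
print(tail)
```

The look-behind here has a fixed width, which is what Python's re module requires.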

If you don't want to mess about with capturing groups and you know the component you want is the substring between the last . and the end of the string, you could use
[^.]+$
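A small Python sketch of this idea, assuming the wanted component never contains a dot itself. The helper name last_component is hypothetical.

```python
import re

def last_component(s):
    """Return the text after the last '.', or None if nothing follows it."""
    # [^.]+ matches one or more non-dot characters; $ pins them to the end.
    m = re.search(r"[^.]+$", s)
    return m.group(0) if m else None

print(last_component("Something.Here.Search.Term.Chicago"))
```

Note that this ignores "Term." entirely, so it only helps when the last dot-separated piece is always the one you are after.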

Here's what worked for me using your sample data:
=REGEXREPLACE(A1; ".*Term\.(.*)" ; "$1")

I don't know Google Docs, but normally in regular expressions, you would do
"Something\.Here\.Search\.Term\.(.*)"
The () means capture and remember the text matched by the pattern within. In this case .* means everything. You can usually access the captured text as $1, etc. in JavaScript.
See Examples of Regular Expressions
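For comparison, the capturing-group approach might look like this in Python. This is only a sketch: Python writes the backreference as \1 rather than $1, and the variable names are invented for the example.

```python
import re

s = "Something.Here.Search.Term.Chicago"

# Group 1 captures everything after the literal "Term.".
m = re.search(r"Term\.(.*)", s)
captured = m.group(1) if m else None

# The same result via substitution: match the whole string and
# replace it with the contents of group 1.
replaced = re.sub(r".*Term\.(.*)", r"\1", s)

print(captured, replaced)
```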

What about using a "look-ahead" expression (?=),
then something repeated followed by a word boundary?
Something like this:
(?=Term\\.).*\W

Related

Regular expression to split optional groups

Full string syntax is: "db:server:port"
Server and port are optional, i.e. can have partial strings, such as:
db
or
db:server
Trying to use:
(.*):?(.*)?:?(.*)?
selects the whole string
Please advise.

Give this one a shot:
([^:]*?):?([^:]*?):?([^:]*?)$
Not sure what language you're using, so it may not work.
Example: http://regex101.com/r/eQ6bF0
Note on the example it's set for a global/multiline match - beware that this will match across newlines if you don't use the correct modifier.

You didn't specify a language that I can see, so there may be different specific answers, but the basic problem is that .* will match a ":" character. That means the first term will suck the entire string in. I would use ([^:]*) instead of (.*).

You can try this:
([^:]+)(?::([^:]+)(?::([^:]+))?)?
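To see how the nested optional groups behave, here is a short Python sketch using that pattern. The function name split_conn is hypothetical, and fullmatch is used on the assumption that the whole string should be a valid "db:server:port" value.

```python
import re

# db is required; ":server" and ":port" are nested optional groups.
PATTERN = re.compile(r"([^:]+)(?::([^:]+)(?::([^:]+))?)?")

def split_conn(s):
    """Return (db, server, port), with None for parts that are absent."""
    m = PATTERN.fullmatch(s)
    return m.groups() if m else None

for sample in ("db", "db:server", "db:server:port"):
    print(sample, "->", split_conn(sample))
```

Because each part is [^:]+ rather than .*, no group can swallow a colon, which is the problem the original (.*) attempt runs into.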

I think this is what you're looking for:
(db|:server|:port)
will match any and all of these:
db:server:port
db
db:server
Working example:
http://regex101.com/r/rK1lI5

Regular expressions middle of string

How can I get part of a SIP URI?
For example, I have the URI sip:username#sip.somedomain.com and I need just the username. I use the expression [^sip:](.*)[$#]+, but the result is username#. How can I exclude the # from the match?

This should do the job:
(?<=^sip:)(.*)(?=[$#])

Use a lookahead instead of actually matching #:
^sip:(.*?)(?=#|\$)
Either you are using a very strange regex flavor, or your starting character class is a mistake. [^sip:] matches a single character that isn't any of s,i,p or :. I am also not certain what the $ character is for, since that isn't a part of SIP syntax.

If lookaheads are not available in your regex flavour (for instance POSIX regexes lack them), you can still match parts of the string in your regex you don't eventually want to return, if you use capture groups and only grab the contents of some of them.
For example
^sip:(.*?)[$#]+
Then only return the contents of the first capture group.
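A minimal Python sketch of the capture-group approach, using the URI from the question as written there (with # as the separator). The variable names are illustrative only.

```python
import re

uri = "sip:username#sip.somedomain.com"

# The separator is matched by [$#]+ but sits outside the parentheses,
# so it is not part of group 1.
m = re.match(r"^sip:(.*?)[$#]+", uri)
user = m.group(1) if m else None
print(user)
```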

(vim) regex: masking text with help of pattern

Am I correct in understanding that the definition
:range s[ubstitute]/pattern/string/cgiI
means that only literal strings can be used in the string part, that is, patterns are not allowed there? What I would like to do is replace, say, any N symbols at position M with N copies of X, so I would have liked to use something like this:
:%s/^\(.\{10}\).\{28}/\1X\{28}/g
Which does not work because \{28} is interpreted literally.
Is writing the 28 XXXXX...X in the replace part the only possibility?

You can use expressions in the replacement part via \=. You have to access the match via submatch(), and join it together with the static string, which you can generate via repeat():
:%s/^\(.\{10}\).\{28}/\=submatch(1) . repeat('X',28)/g

The only regex constructs allowed in the replacement part are numbered groups: \1 \2 \3 etc. The repeating construct {28} is not valid there, though it's a clever idea. You'll have to use 28 X's.

Another alternative is using an expression in the replacement part:
:%s/^\(.\{10}\).\{28}/\=submatch(1).repeat("X",28)/g
The first matched group is obtained with submatch(1). For more information see :h sub-replace-expression.
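Outside Vim, the same "computed replacement" idea can be sketched in Python, where re.sub accepts a function in place of a replacement string. The helper name mask and its parameters are hypothetical; the defaults mirror the 10 and 28 from the question.

```python
import re

def mask(line, keep=10, width=28, fill="X"):
    """Keep the first `keep` characters and overwrite the next `width`."""
    pattern = r"^(.{%d}).{%d}" % (keep, width)
    # The replacement is built per match, much like \=submatch(1) . repeat(...)
    return re.sub(pattern, lambda m: m.group(1) + fill * width, line)

print(mask("0123456789" + "s" * 28 + " tail"))
```

Lines shorter than keep + width characters do not match the pattern and are returned unchanged.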

Capture followed by Digits: Replace Syntax? (Dreamweaver)

When you address a regex capture, things can get tricky when digits follow the capture. In PCRE, I can write
${1}000
to substitute the capture of Group 1 followed by three zeroes.
Does anyone know the equivalent syntax in Dreamweaver replace operations, if any?
If we had a series of "A"s instead of zeroes, we could use:
$1AAAA
But these:
$10000
${1}0000
do not work.
I believe the regex flavor is ECMAScript. Just cannot find the information.
This may not be addressed in the syntax. If so, that would be good to know.
Thank you!
Edit: I should add that this is not matter of life and death as I have a number of grep tools at my fingertips. I would just like to know.

Dreamweaver's regular expression find and replace is supposed to be based on JavaScript's implementation of RegExp. You should be able to just use $1000 in the replacement text. However, like you've found, the replacement groups ($ + group number) are not properly recognized when the replacement text has digits immediately after the grouping token.
FWIW: I've logged a bug on this at http://adobe.ly/DWwish
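For what it's worth, other flavors hit the same ambiguity and offer an explicit group syntax for it. This says nothing about Dreamweaver itself; it is only a Python sketch of the general problem, where \g<1> plays the role of PCRE's ${1}. The sample text is invented.

```python
import re

text = "price: 7"

# r"\1000" would not be read as "group 1 followed by 000", so the
# group reference is delimited explicitly with \g<1>.
result = re.sub(r"(\d)", r"\g<1>000", text)
print(result)
```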

How can I "inverse match" with regex?

I'm processing a file, line-by-line, and I'd like to do an inverse match. For instance, I want to match lines where there is a string of six letters, but only if these six letters are not 'Andrea'. How should I do that?
I'm using RegexBuddy, but still having trouble.

(?!Andrea).{6}
Assuming your regexp engine supports negative lookaheads...
...or maybe you'd prefer to use [A-Za-z]{6} in place of .{6}
Note that lookaheads and lookbehinds are generally not the right way to "inverse" a regular expression match. Regexps aren't really set up for doing negative matching; they leave that to whatever language you are using them with.

For Python/Java,
^(.(?!(some text)))*$
http://www.lisnichenko.com/articles/javapython-inverse-regex.html

In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:
^(?:(?!Andrea).)*$
This is called a tempered greedy token. The downside is that it doesn't perform well.
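Python's re module also accepts this pattern, so a line filter could be sketched as below. The sample lines are invented for illustration.

```python
import re

# Each character is consumed only if "Andrea" does not start at that position.
NO_ANDREA = re.compile(r"^(?:(?!Andrea).)*$")

lines = ["Hello Andrea", "Hello Bob", "andrea lower", ""]
kept = [ln for ln in lines if NO_ANDREA.match(ln)]
print(kept)
```

The match is case-sensitive as written, so "andrea lower" is kept; add re.IGNORECASE if that is not what you want.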

The capabilities and syntax of the regex implementation matter.
You could use look-ahead. Using Python as an example,
import re
not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)
To break that down:
(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then
\w means a "word character" - alphanumeric characters. This is equivalent to the class [a-zA-Z0-9_]
\w{6} means exactly six word characters.
re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...
Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for six characters. Or first check for at least six word characters, and then check that it does not match Andrea.
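The program-logic variant might look roughly like this in Python. It is a sketch under the assumption that "six letters" means six consecutive ASCII letters and that any line mentioning Andrea should be skipped; the function name wanted is hypothetical.

```python
import re

SIX_LETTERS = re.compile(r"[A-Za-z]{6}")

def wanted(line):
    """True if the line has a run of six letters and does not mention Andrea."""
    # Step 1: positive check with a simple regex.
    # Step 2: the "inverse" part is done in plain code, not in the regex.
    return bool(SIX_LETTERS.search(line)) and "Andrea" not in line

for line in ("Robert was here", "Andrea was here", "abc de"):
    print(line, "->", wanted(line))
```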

Negative lookahead assertion
(?!Andrea)
This is not exactly an inverted match, but it's the best you can directly do with regex. Not all platforms support them though.

If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.
On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)
On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.

I just came up with this method which may be hardware intensive but it is working:
You can replace all characters which match the regex by an empty string.
This is a oneliner:
notMatched = re.sub(regex, "", string)
I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.
This will only return you the string result, not any match objects!

(?! is useful in practice. Although strictly speaking, looking ahead is not a regular expression as defined mathematically.
You can write an inverted regular expression manually.
Here is a program to calculate the result automatically.
Its result is machine generated, which is usually much more complex than hand writing one. But the result works.

If you have the possibility to do two regex matches for the inverse and join them together you can use two capturing groups to first capture everything before your regex
^((?!yourRegex).)*
and then capture everything behind your regex
(?<=yourRegex).*
This works for most regexes. One problem I discovered was when I had a quantifier like {2,4} at the end. Then you gotta get creative.

In Perl you can do:
process($line) if $line !~ /Andrea/;

We Keep Coding
