RegEx to match two alternatives but nothing else

I need to capture either
\d+\.\d+
or
\d+
but nothing else.
For instance, "0.02", "1" and "0.50" should match positively. I noticed that I cannot simply use something like
[\d+\.\d+|\d+]

(\d+\.\d+|\d+)
should do the trick.
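A minimal sketch in Python's `re`, used here only for illustration because the question names no language. Unanchored, the pattern would also find "1" inside "abc1". To get "nothing else", the whole string has to match, and `re.fullmatch` enforces that. The extra test strings beyond the question's three are my own.

```python
import re

# Longer alternative first; fullmatch only succeeds if the pattern
# consumes the entire string, so surrounding text is rejected.
NUMBER = re.compile(r"(\d+\.\d+|\d+)")


def is_number(s):
    """Return True if s is digits, optionally followed by '.' and more digits."""
    return NUMBER.fullmatch(s) is not None


for s in ["0.02", "1", "0.50", "1.", ".5", "1.2.3", "abc1"]:
    print(s, is_number(s))
```

With `re.match` or `re.search` you would add `^` and `$` to get the same effect.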

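You can do either of the following. Put the longer alternative first: alternatives are tried left to right, so an unanchored (\d+|\d+\.\d+) matches only the "0" of "0.02".
(\d+\.\d+|\d+)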
or
(\d+(\.\d+)?)
but that creates a second capturing group. The more sophisticated version is:
(\d+(?:\.\d+)?)
That's called a non-capturing group.
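Here is the same comparison in Python's `re`, again only as an illustration. The difference between the two groupings shows up in `groups()`. An unanchored search also shows why the order of alternatives matters.

```python
import re

s = "0.50"
capturing = re.fullmatch(r"(\d+(\.\d+)?)", s)
non_capturing = re.fullmatch(r"(\d+(?:\.\d+)?)", s)

print(capturing.groups())      # ('0.50', '.50')  -> extra group for the fraction
print(non_capturing.groups())  # ('0.50',)        -> only the outer group

# Alternatives are tried left to right, so an unanchored search
# stops at the first alternative that succeeds.
print(re.search(r"(\d+|\d+\.\d+)", "0.02").group(1))  # 0
print(re.search(r"(\d+\.\d+|\d+)", "0.02").group(1))  # 0.02
```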
By the way, Regular Expression Info is a superb site for regular expression tutorials and information.

Or \d+(\.\d+)? if you find that easier to read :)

Related

Match then exclude without lookbehinds

In Rust with the regex crate, I've been trying to wrap my head around a regular expression that captures and extracts things between square brackets [] while excluding the brackets from the capture. Given:
// template[tags(foo,bar,baz)]
# template[replace_all(foo:bar)]
I'd like:
tags(foo,bar,baz)
replace_all(foo:bar)
I can easily get the [] capture group, but I'm not understanding how to capture while excluding characters after the match. I've been manually replacing these, but it seems gross to me. I would love to be able to do it all in one expression.
Update: I am aware that I can get these in multiple capture groups, but I'm really curious whether there's a way to capture only the single one, hence "exclude".
Looking over the docs, I'm just not picking up a way this can be done. There are a lot of great examples using lookaheads and lookbehinds, but those don't appear to be a part of the Rust regex crate. Am I missing something obvious here? Thanks for the help.
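No answer is included here, but the capture-group route the asker mentions needs no lookarounds. Match the brackets, then read only group 1. The sketch below uses Python's `re` rather than Rust, and the pattern `\[([^\]]*)\]` is my own choice, which assumes brackets are not nested. In the Rust regex crate the same idea would read group 1 from each item yielded by `captures_iter`.

```python
import re

# The brackets are part of the match but outside the group,
# so group 1 holds only the text between them.
BRACKETED = re.compile(r"\[([^\]]*)\]")


def bracket_contents(line):
    # With exactly one group in the pattern, findall returns group 1
    # for every match instead of the whole match.
    return BRACKETED.findall(line)


print(bracket_contents("// template[tags(foo,bar,baz)]"))    # ['tags(foo,bar,baz)']
print(bracket_contents("# template[replace_all(foo:bar)]"))  # ['replace_all(foo:bar)']
```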

Is it possible to say in Regex "if the next word does not match this expression"?

I'm trying to detect occurrences of words italicized with *asterisks* around them. However, I want to ensure they're not within a link. So it should find "text" in "here is some *text*" but not within http://google.com/hereissome*text*intheurl.
My first instinct was to use lookaheads, but it doesn't seem to work if I use a URL regex such as John Gruber's:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
and put it in a lookahead at the beginning of the pattern, followed by the rest of the pattern:
(?=URLPATTERN)\*[a-zA-Z\s]\*
So how would I do this?
You can use an alternation: on the left-hand side, match everything you want to discard (here, URLs); on the right-hand side, use a capturing group to match the text you want.
https?:\/\/\S*|(\*\S+\*)
You can then use capture group 1 for your emphasized text.
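Here is a minimal Python sketch of this discard-alternation technique, using the answer's pattern. The left branch consumes every URL, so the asterisks inside a URL never reach the right branch. Only matches where group 1 took part are kept. The helper name `emphasized` is illustrative.

```python
import re

# Left branch: URLs, matched only so they are consumed and skipped.
# Right branch: the emphasized text we want, captured in group 1.
PATTERN = re.compile(r"https?:\/\/\S*|(\*\S+\*)")


def emphasized(text):
    # group(1) is None whenever the URL branch produced the match
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


sample = "here is some *text* but not http://google.com/hereissome*text*intheurl"
print(emphasized(sample))  # ['*text*']
```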
The following regexp:
^(?!http://google.com/hereissome.*text.*intheurl).*
matches any string that does not begin with http://google.com/hereissome*text*intheurl. This is called a negative lookahead. Some regex libraries may not support it; Python's does.
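Here is a quick Python check of that pattern. The lookahead is anchored at `^` and spells out one URL, so it rejects only strings that begin with that URL shape. It is not a general "not inside a link" test.

```python
import re

# ^ anchors the negative lookahead to the start of the string;
# if the URL pattern matches there, the whole match fails.
NOT_THAT_URL = re.compile(r"^(?!http://google.com/hereissome.*text.*intheurl).*")

print(NOT_THAT_URL.match("here is some *text*"))                          # a match object
print(NOT_THAT_URL.match("http://google.com/hereissome*text*intheurl"))  # None
```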
Here is a link to Mastering Lookahead and Lookbehind.

Overcomplicating regular expression

I have the following regular expression ^(?:\/foo\/)([A-Za-z0-9-]{0,})|^(?:\/foo) that needs to match /foo, /foo/ and /foo/abc-123 but not /foobar. This works (I've tested it), but I'm sure there is a simpler way, using something like a lookbehind or lookahead.
How can I simplify it, or do I need to? Maybe it's just me being overly paranoid about its ugliness. Dropping the non-capturing groups to get ^\/foo\/([A-Za-z0-9-]{0,})|^\/foo still doesn't look right.
Note that the goal is to capture abc-123 if present, but not to capture the / or the empty string.
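You can use this simpler regex for the same purpose:
^\/foo(?:\/([A-Za-z0-9-]*))?$
Note that with * the input /foo/ still captures an empty string. If that must be avoided, make the segment itself optional: ^\/foo(?:\/([A-Za-z0-9-]+)?)?$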
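Here is a quick check with Python's `re` against the four inputs from the question. `ANSWER` is the answer's pattern. `VARIANT` is my adjustment, which uses `+` inside an optional group so that /foo/ leaves group 1 unset instead of capturing an empty string. The helper `capture` is illustrative.

```python
import re

ANSWER = re.compile(r"^\/foo(?:\/([A-Za-z0-9-]*))?$")
# Hypothetical variant: the segment must be non-empty, and is itself optional.
VARIANT = re.compile(r"^\/foo(?:\/([A-Za-z0-9-]+)?)?$")


def capture(pattern, path):
    """Return (matched, group 1) for path."""
    m = pattern.match(path)
    return (False, None) if m is None else (True, m.group(1))


for path in ["/foo", "/foo/", "/foo/abc-123", "/foobar"]:
    print(path, capture(ANSWER, path), capture(VARIANT, path))
```

Both patterns accept the same inputs. They differ only for /foo/, where the answer's pattern yields `''` and the variant yields `None`.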

Regex Extract in Google Docs for capturing the end of variable strings

In Google Docs, I have a series of strings like "Something.Here.Search.Term.Chicago", where the last component after "Term." can be anything.
How do I use REGEXEXTRACT to capture only what comes after "Term."?
Note that the length of the string before "Term." varies, so I can't use LEFT or RIGHT with a fixed position.
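You can also use a positive lookbehind, to avoid having to capture with groups:
/(?<=Term\.).*/
Depending on the language you implement this in, though, lookbehinds may not be supported (JavaScript engines predating ES2018, for instance). Google Sheets' REGEXEXTRACT uses RE2 syntax, which has no lookarounds, so there a capture group or [^.]+$ is needed instead.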
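Here is a quick check in Python's `re`, which supports fixed-width lookbehind. The surrounding slashes in the answer are regex-literal delimiters and are dropped here.

```python
import re

s = "Something.Here.Search.Term.Chicago"

# (?<=Term\.) requires "Term." immediately before the match
# but does not include it in the matched text.
after_term = re.search(r"(?<=Term\.).*", s).group()
print(after_term)  # Chicago
```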
If you don't want to mess about with capturing groups and you know the component you want is the substring between the last . and the end of the string, you could use
[^.]+$
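The same idea in Python, for instance. This pattern relies only on the last component containing no dot, not on the word "Term".

```python
import re

s = "Something.Here.Search.Term.Chicago"

# [^.]+ is a run of non-dot characters; $ pins it to the end of the string.
last_component = re.search(r"[^.]+$", s).group()
print(last_component)  # Chicago
```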
Here's what worked for me using your sample data:
=REGEXREPLACE(A1; ".*Term.(.*)" ; "$1")
I don't know Google Docs, but normally in regular expressions, you would do
"Something\.Here\.Search\.Term\.(.*)"
The parentheses capture and remember what the enclosed pattern matches; here, .* means everything up to the end of the line. You can usually refer to the captured text as $1, $2, etc., for example in JavaScript replacement strings.
See Examples of Regular Expressions
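Here is the capture-group idea sketched with Python's `re.sub`, which writes `$1` as `\1` in the replacement. The first pattern uses `.*Term\.` instead of the full literal prefix. That generalization is mine, and it lets the part before "Term." vary, as in the REGEXREPLACE formula.

```python
import re

s = "Something.Here.Search.Term.Chicago"

# The whole string matches; group 1 holds what follows "Term."
# and the replacement \1 keeps only that group.
tail = re.sub(r".*Term\.(.*)", r"\1", s)
print(tail)  # Chicago

# The same capture read directly from a match object.
print(re.search(r"Something\.Here\.Search\.Term\.(.*)", s).group(1))  # Chicago
```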
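What about using a lookahead expression (?=...), then something repeated followed by a word boundary?
Something like this:
(?=Term\.).*\W
Caveat: a lookahead consumes nothing, so the match still starts with "Term.", and \W matches a non-word character, not a word boundary (that is \b). To exclude the prefix itself, a lookbehind such as (?<=Term\.) is needed.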
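Here is a Python check of the lookahead pattern next to the lookbehind version, on the question's sample string. The lookahead match begins at "Term.", and `.*\W` then backtracks to the last non-word character, which is the dot.

```python
import re

s = "Something.Here.Search.Term.Chicago"

# Lookahead: asserts "Term." follows, but the match still starts at "T".
lookahead = re.search(r"(?=Term\.).*\W", s).group()
# Lookbehind: asserts "Term." precedes, so it is excluded from the match.
lookbehind = re.search(r"(?<=Term\.).*", s).group()

print(lookahead)   # Term.
print(lookbehind)  # Chicago
```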

Capture followed by Digits: Replace Syntax? (Dreamweaver)

When you address a regex capture, things can get tricky when digits follow the capture. In PCRE, I can write
${1}000
to substitute the capture of Group 1 followed by three zeroes.
Does anyone know the equivalent syntax in Dreamweaver replace operations, if any?
If we had a series of "A"s instead of zeroes, we could use:
$1AAAA
But these:
$10000
${1}0000
do not work.
I believe the regex flavor is ECMAScript, but I just cannot find documentation confirming it.
This may not be addressed in the syntax. If so, that would be good to know.
Thank you!
Edit: I should add that this is not a matter of life and death, as I have a number of grep tools at my fingertips. I would just like to know.
Dreamweaver's regular expression find and replace is supposed to be based on JavaScript's implementation of RegExp. You should be able to just use $1000 in the replacement text. However, like you've found, the replacement groups ($ + group number) are not properly recognized when the replacement text has digits immediately after the grouping token.
FWIW: I've logged a bug on this at http://adobe.ly/DWwish
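For comparison only (this is Python, not Dreamweaver's engine): Python's `re.sub` has an explicit delimiter for this case, `\g<1>`, which plays the role of PCRE's `${1}`. A backslash followed directly by more digits would not be read as group 1 followed by zeroes. The helper name `append_zeroes` is illustrative.

```python
import re


def append_zeroes(text):
    # \g<1> closes the group reference, so the following 000 is literal text.
    return re.sub(r"(\d+)", r"\g<1>000", text)


print(append_zeroes("price: 7"))  # price: 7000
```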