Regular expression to split optional groups - regex

Full string syntax is: "db:server:port"
The server and port are optional, so partial strings are also valid, such as:
db
or
db:server
I am trying to use:
(.*):?(.*)?:?(.*)?
but the first group captures the whole string.
Please advise.

Give this one a shot:
([^:]*?):?([^:]*?):?([^:]*?)$
Not sure what language you're using, so it may not work.
Example: http://regex101.com/r/eQ6bF0
Note that the example is set for a global/multiline match. Beware that the pattern will match across newlines if you don't use the correct modifier.

You didn't specify a language that I can see, so there may be different specific answers, but the basic problem is that .* will match a ":" character. That means the first term will suck the entire string in. I would use ([^:]*) instead of (.*).

You can try this:
([^:]+)(?::([^:]+)(?::([^:]+))?)?
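A minimal Python sketch of how the nested optional groups above behave on the three inputs from the question. The helper name `split_fields` is hypothetical and only serves the illustration. It also shows why the original `(.*)` attempt puts everything in the first group.

```python
import re

# Pattern from the answer above: a required first field, then optional
# ":server" and ":port" parts nested inside non-capturing groups.
PATTERN = re.compile(r"([^:]+)(?::([^:]+)(?::([^:]+))?)?")

def split_fields(text):
    """Hypothetical helper: return (db, server, port); missing parts are None."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a db[:server[:port]] string: {text!r}")
    return match.groups()

for sample in ("db", "db:server", "db:server:port"):
    print(sample, "->", split_fields(sample))

# The original attempt: the greedy (.*) also consumes the ":" separators,
# so the first group ends up holding the whole string.
greedy_first = re.match(r"(.*):?(.*)?:?(.*)?", "db:server:port").group(1)
print(greedy_first)
```

Using `fullmatch` here is a design choice for the sketch: it rejects inputs with extra fields instead of silently matching a prefix.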

I think this is what you're looking for:
(db|:server|:port)
will match any and all of these:
db:server:port
db
db:server
Working example:
http://regex101.com/r/rK1lI5

Related

Regex Extract in Google Docs for capturing the end of variable strings

In Google Docs, I have a series of strings like "Something.Here.Search.Term.Chicago", where the last component after "Term." can be anything.
How do I use regex extract to only capture what comes after "Term."?
Note that the length of the string varies before Term so I can't use Left or Right and position since it's always different.
You can use a positive look-behind as well, to avoid having to capture with groups:
/(?<=Term\.).*/
Though depending on the language you are implementing this with, it may not support look-behinds (for example, JavaScript engines older than ES2018).
If you don't want to mess about with capturing groups and you know the component you want is the substring between the last . and the end of the string, you could use
[^.]+$
Here's what worked for me using your sample data:
=REGEXREPLACE(A1; ".*Term.(.*)" ; "$1")
I don't know Google Docs, but normally in regular expressions, you would do
"Something\.Here\.Search\.Term\.(.*)"
The () means capture and remember the pattern within. In this case .* means everything. You can usually access the captured text as $1, etc. in JavaScript.
See Examples of Regular Expressions
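As a rough illustration of the capture-group and look-behind approaches from the answers above, here is a small Python sketch. It is not Google Docs syntax, and the helper name `after_term` is made up for the example.

```python
import re

def after_term(text):
    """Hypothetical helper: return whatever follows 'Term.', or None."""
    match = re.search(r"Term\.(.*)", text)
    return match.group(1) if match else None

print(after_term("Something.Here.Search.Term.Chicago"))

# Same result with a positive look-behind, so no group is needed;
# the whole match (group 0) is already the wanted text.
lookbehind = re.search(r"(?<=Term\.).*", "Something.Here.Search.Term.Chicago")
print(lookbehind.group(0))
```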
What about using a "look-ahead" expression (?=),
then something repeated followed by a word boundary?
Something like this:
(?=Term\\.).*\W

(vim) regex: masking text with help of pattern

Am I correct to understand that the definition
:range s[ubstitute]/pattern/string/cgiI
suggests that only literal strings can be used in the string part, that is, patterns are not allowed? What I would like to do is replace, say, any N symbols at position M with N copies of X, so I would have liked to use something like this:
:%s/^\(.\{10}\).\{28}/\1X\{28}/g
Which does not work because \{28} is interpreted literally.
Is writing the 28 XXXXX...X in the replace part the only possibility?
You can use expressions in the replacement part via \=. You have to access the match via submatch(), and join it together with the static string, which you can generate via repeat():
:%s/^\(.\{10}\).\{28}/\=submatch(1) . repeat('X',28)/g
The only regex constructs allowed in the replacement part are numbered groups: \1 \2 \3 etc. The repeating construct {28} is not valid there, though it's a clever idea. You'll have to use 28 X's.
Another alternative is using an expression in the replacement part:
:%s/^\(.\{10}\).\{28}/\=submatch(1).repeat("X",28)/g
The first matched group is obtained with submatch(1). For more information see :h sub-replace-expression.
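For comparison only, the same idea (keep a captured prefix, then generate the mask with a repeat expression) can be sketched outside Vim in Python. The function name `mask` and its parameters are assumptions made for this example, not part of any answer above.

```python
import re

def mask(line, keep=10, hide=28, fill="X"):
    """Hypothetical helper: keep the first `keep` characters and replace
    the next `hide` characters with copies of `fill`."""
    pattern = re.compile(r"^(.{%d}).{%d}" % (keep, hide))
    # The replacement is computed per match, much like \= with
    # submatch(1) . repeat('X', 28) in the Vim command.
    return pattern.sub(lambda m: m.group(1) + fill * hide, line)

sample = "0123456789" + "a" * 28 + "tail"
print(mask(sample))
```

Lines shorter than `keep + hide` characters do not match and are returned unchanged.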

Simple regex for matching up to an optional character?

I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
First expression:
^([^#]*)
Second expression:
#(.*)$
[a-zA-Z0-9]*[\#]
If your string contains any other special characters you need to add them into the first square bracket escaped.
I don't use C#, but I will assume that it uses PCRE... if so,
"([^#]*)#.*"
with a call to 'match'. A call to 'search' does not need the trailing ".*"
The parens define the 'keep group'; the [^#] means any character that is not a '#'
You probably tried something like
"(.*)#.*"
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers
return the entire matched sequence as group(0), the first paren-matched group as group(1),
and so forth.
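A short Python sketch of the greedy behaviour described above, using a made-up input with two '#' signs. Python's `re.match` plays the role of the 'match' call and `group(...)` works as described.

```python
import re

text = "a#b#c"  # illustrative input with more than one '#'

# Greedy ".*" grabs as much as possible, so group(1) runs up to the LAST '#'.
greedy = re.match(r"(.*)#.*", text)
print(greedy.group(0))  # the entire matched sequence
print(greedy.group(1))  # the first paren-matched group

# "[^#]*" cannot cross a '#', so group(1) stops at the FIRST '#'.
negated = re.match(r"([^#]*)#.*", text)
print(negated.group(1))
```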
PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always have it in your programming toolkit.
Use look ahead and look behind:
To get all characters up to, but not including the pound (#): .*?(?=\#)
To get all characters following, but not including the pound (#): (?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*)
Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If you want to allow for a missing # section, use ([^\#]*)(?:\#(.*))?. It uses a non-capturing group to test the second half, and if it finds it, returns everything after the pound.
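A minimal Python sketch of the optional-# pattern above, applied to the example string from the question. The helper name `split_fragment` is hypothetical, and the DOTALL flag is an assumption added so that `.` also covers newlines.

```python
import re

# Same idea as above: everything before '#', then an optional "#rest" part
# wrapped in a non-capturing group.
PATTERN = re.compile(r"([^#]*)(?:#(.*))?", re.DOTALL)

def split_fragment(text):
    """Hypothetical helper: return (before, after); after is None without '#'."""
    match = PATTERN.fullmatch(text)
    return match.group(1), match.group(2)

print(split_fragment("topics/install.xml#id_install"))
print(split_fragment("topics/install.xml"))
```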
Honestly though, for your situation, it is probably easier to use the Split method provided in String.
More on lookahead and lookbehind
first:
/[^\#]*(?=\#)/ (edit: this is faster than /.*?(?=\#)/)
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
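string[] split = exampleString.Split('#');
string firstString = split[0];
// The '#' part is optional, so guard against a missing second element.
string secondString = split.Length > 1 ? split[1] : string.Empty;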

Regex match everything after question mark?

I have a feed in Yahoo Pipes and want to match everything after a question mark.
So far I've figured out how to match the question mark using..
\?
Now just to match everything that is after/follows the question mark.
\?(.*)
You want the content of the first capture group.
Try this:
\?(.*)
The parentheses are a capturing group that you can use to extract the part of the string you are interested in.
If the string can contain new lines you may have to use the "dot all" modifier to allow the dot to match the new line character. Whether or not you have to do this, and how to do this, depends on the language you are using. It appears that you forgot to mention the programming language you are using in your question.
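To make the capture group and the "dot all" remark concrete, here is a small Python sketch. Python is only an assumption here, since the question does not name a language, and the helper name `after_question_mark` is made up.

```python
import re

def after_question_mark(text, dot_all=False):
    """Hypothetical helper: return the text after the first '?', or None."""
    flags = re.DOTALL if dot_all else 0
    match = re.search(r"\?(.*)", text, flags)
    return match.group(1) if match else None

print(after_question_mark("derpderp?mystring blahbeh"))

# Without DOTALL the dot stops at the newline; with it, the capture continues.
multiline = "a?b\nc"
print(repr(after_question_mark(multiline)))
print(repr(after_question_mark(multiline, dot_all=True)))
```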
Another alternative that you can use if your language supports fixed width lookbehind assertions is:
(?<=\?).*
With the positive lookbehind technique:
(?<=\?).*
(We're searching for a text preceded by a question mark here)
Input: derpderp?mystring blahbeh
Output: mystring blahbeh
Example
Basically, (?<=...) is a lookbehind group: it requires the escaped question mark to appear immediately before the match, without including it in the match.
They perform really well, but not all implementations support them.
\?(.*)$
If you want to match all chars after "?" you can use a group to match any char, and you'd better use the "$" sign to indicate the end of line.
\?(.*\n)+
With this you can get everything, even a newline.
Check out this site: http://rubular.com/ Basically the site allows you to enter some example text (what you would be looking for on your site) and then as you build the regular expression it will highlight what is being matched in real time.
str.replace(/^.+?\"|^.|\".+/, '');
This is not always suitable: it does not let you choose what else to remove between the quotes, and you cannot use it more than twice on one string. All it does is select whatever is not between the quotes and replace it with nothing.
Even for me it is a bit confusing, but I'll try to explain it. ^.+?\" lazily matches everything from the start of the string up to and including the first ". The | means "or" and separates the alternatives. ^. matches a single character at the start of the string, and \".+ matches a " and everything that comes after it.

Match last word after /

So, I have some internal URLs, for example "/img/pic/Image1.jpg", "/pic/Image1.jpg", or just "Image1.jpg", and I need to match "Image1.jpg". In other words, I want to match the last character sequence after a /, or, if there is no /, the whole character sequence. Thank you in advance!
.*/(.*) won't work if there are no /s.
([^/]*)$ should work whether there are or aren't.
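A quick Python check of the two patterns above against the three URLs from the question. This is a sketch for illustration, and the helper name `last_segment` is hypothetical.

```python
import re

def last_segment(path):
    """Hypothetical helper: return the text after the last '/', if any."""
    # "[^/]*" cannot cross a slash and "$" pins the match to the end,
    # so this works with or without slashes in the input.
    return re.search(r"([^/]*)$", path).group(1)

for url in ("/img/pic/Image1.jpg", "/pic/Image1.jpg", "Image1.jpg"):
    print(url, "->", last_segment(url))

# ".*/(.*)" requires at least one slash, so it finds nothing here.
no_slash = re.match(r".*/(.*)", "Image1.jpg")
print(no_slash)
```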
Actually you don't need regexp for this.
s="this/is/a/test"
s.substr(s.lastIndexOf("/")+1)
=> test
and it also works fine for strings without any / because then lastIndexOf returns -1, so the substring starts at index 0.
s="hest"
s.substr(s.lastIndexOf("/")+1)
=> hest
.*/([^/]*)
The capturing group matches the last sequence after /.
The following expression would do the trick:
/([\w\d._-]*)$
Or even easier (but I think this has also been posted below before me)
([^/]+)$
A simple regex that I have tested:
\w+(.)\w+$
Here is a good site you can test it on: http://rubular.com/
In Ruby you would write
([^\/]*)$
Regexps in Ruby are quite universal and you can test them live here: http://rubular.com/
By the way, maybe there is another solution that does not involve regexps? E.g. File.basename(path) (Ruby again)
Edit: profjim has posted it earlier.
I noticed you said in your comments you're using javascript. You don't actually need a regex for this and I always think it's nice to have an alternative to using regex.
var str = "/pic/Image1.jpg";
str.split("/").pop();
// example:
alert("/pic/Image1.jpg".split("/").pop()); // alerts "Image1.jpg"
alert("Image2.jpg".split("/").pop()); // alerts "Image2.jpg"
Something like .*/(.*)$ (details depend on whether we're talking about Perl, or some other dialect of regular expressions)
First .* matches everything (including slashes). Then there's one slash, then there's .* that matches everything from that slash to the end (that is $).
The * operates greedily from left to right, which means that when you have multiple slashes, the first .* will match all but the last one.
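The greedy behaviour described above can be made visible by capturing the first .* as well. This is a Python sketch with an extra group added purely for illustration. The details may differ slightly in Perl or other dialects.

```python
import re

# Capture both sides of the slash to show how far the first ".*" reaches.
match = re.match(r"(.*)/(.*)$", "/img/pic/Image1.jpg")

# The first group swallows everything up to the LAST slash,
# leaving only the final component for the second group.
print(match.group(1))
print(match.group(2))
```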