I'm trying to create a regex to match part of a URL
The possible URLs might be
www.mysite.com?userid=123xy
www.mysite.com?userid=123x&username=joe
www.mysite.com?tag=xyz&userid=1ww45
www.mysite.com?tag=xyz&userid=1g3x5&username=joe
I'm trying to match the userid=123456
So far I have
Dim r As New Regex("[&?]userID.*[?&]")
Debug.WriteLine(r.Match(strUrl))
But this is only matching lines 2 and 4.
Can anyone help?
(?<=[?&]userid=)[^&#\s]*
Output:
123xy
123x
1ww45
1g3x5
A few points:
This works both if you are matching one URL at a time and if you have a whitespace-separated set.
This captures the userid value only. It uses a positive look-behind assertion, which is not part of the match, since you only care about the value.
The fragment part, if present, will be ignored (e.g. if the URL looked like this: www.mysite.com?tag=xyz&userid=1ww45#top)
If the case of userid doesn't matter, use RegexOptions.IgnoreCase.
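As a rough illustration of the points above, here is a minimal Python sketch of the same look-behind pattern. The helper name extract_userid is hypothetical, and the case-insensitive flag is an optional assumption that mirrors the RegexOptions.IgnoreCase suggestion.

```python
import re

# Look-behind pattern from the answer above; IGNORECASE is optional.
USERID_RE = re.compile(r"(?<=[?&]userid=)[^&#\s]*", re.IGNORECASE)


def extract_userid(url):
    """Return the userid value found in url, or None if there is none."""
    m = USERID_RE.search(url)
    return m.group(0) if m else None


urls = [
    "www.mysite.com?userid=123xy",
    "www.mysite.com?userid=123x&username=joe",
    "www.mysite.com?tag=xyz&userid=1ww45#top",
    "www.mysite.com?tag=xyz&userid=1g3x5&username=joe",
]
for url in urls:
    print(extract_userid(url))
```

The look-behind keeps "userid=" out of the match, so group 0 is already the bare value and no capture group is needed.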
I got it:
[&?]userID=[^\s&#]+
PHP solution:
"/[\\?&]userid=([^&]*)/"
Tests:
$tests = [
[
"regex" => "/[\\?&]userid=([^&]*)/",
"expected" => "123xy",
"inputs" => [
"www.mysite.com?userid=123xy",
"www.mysite.com?userid=123xy&username=joe",
"www.mysite.com?tag=xyz&userid=123xy",
"www.mysite.com?tag=xyz&userid=123xy&username=joe"
]
]
];
foreach ($tests as $test) {
$regex = $test['regex'];
$expected = $test['expected'];
foreach ($test['inputs'] as $input) {
if (!preg_match($regex, $input, $match)) {
throw new Exception("Regex '{$regex}' doesn't match for input '{$input}' or error has occurred.");
}
$matched = $match[1];
if ($matched !== $expected) {
throw new Exception("Found '{$matched}' instead of '{$expected}'.");
}
echo "Matched '{$matched}' in '{$input}'." . PHP_EOL;
}
}
Results:
Matched '123xy' in 'www.mysite.com?userid=123xy'.
Matched '123xy' in 'www.mysite.com?userid=123xy&username=joe'.
Matched '123xy' in 'www.mysite.com?tag=xyz&userid=123xy'.
Matched '123xy' in 'www.mysite.com?tag=xyz&userid=123xy&username=joe'.
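You can use the regex: .*?(userid=\d+).*
.*? is a non-greedy way to say: everything that comes before (userid=\d+). Note that \d+ matches digits only, so for alphanumeric ids such as 123xy you would need something like [^&#]+ instead.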
Python example:
import re
a = 'www.mysite.com?userid=12345'
b = 'www.mysite.com?userid=12345&username=joe'
mat = re.match(r'.*?(userid=\d+).*', a)
print(mat.group(1))  # prints userid=12345
mat = re.match(r'.*?(userid=\d+).*', b)
print(mat.group(1))  # prints userid=12345
In the following declarative syntax pipeline:
pipeline {
agent any
stages {
stage( "1" ) {
steps {
script {
orig = "/path/to/file"
two_lev_down = (orig =~ /^(?:\/[^\/]*){2}(.*)/)[0][1]
echo "${two_lev_down}"
depth = 2
two_lev_down = (orig =~ /^(?:\/[^\/]*){depth}(.*)/)[0][1]
echo "${two_lev_down}"
}
}
}
}
}
...the regex is meant to match everything after the third instance of "/".
The first, i.e. (orig =~ /^(?:\/[^\/]*){2}(.*)/)[0][1] works.
But the second, (orig =~ /^(?:\/[^\/]*){depth}(.*)/)[0][1] does not. It generates this error:
java.util.regex.PatternSyntaxException: Illegal repetition near index 10
^(?:/[^/]*){depth}(.*)
I assume the problem is the use of the variable depth instead of a hardcoded integer, since that's the only difference between the working code and error-generating code.
How can I use a Groovy variable in a regex pattern find-count? Or what is the Groovy-language idiomatic way to write a regex that returns everything after the nth occurrence of a pattern?
You are missing the $ in front of your variable. It should be:
orig = "/path/to/file"
depth = 2
two_lev_down = (orig =~ /^(?:\/[^\/]*){$depth}(.*)/)[0][1]
assert '/file' == two_lev_down
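For comparison only, here is a hedged sketch of the same idea in Python rather than Groovy: the repetition count is interpolated into the pattern string before the regex is compiled. The function name after_nth_component is hypothetical.

```python
import re


def after_nth_component(path, depth):
    """Return what follows the first `depth` '/'-prefixed components of path."""
    # Doubled braces are literal in an f-string, so for depth=2 the
    # quantifier in the resulting pattern is {2}.
    pattern = rf"^(?:/[^/]*){{{depth}}}(.*)"
    m = re.match(pattern, path)
    return m.group(1) if m else None


print(after_nth_component("/path/to/file", 2))  # /file
```

In both languages the regex engine only ever sees the final string, so the variable has to be substituted before the pattern is compiled.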
Why?
In Groovy, string interpolation (via GString) works in the following string literals:
ordinary double-quoted strings: "Hello $world, my name is ${name.toUpperCase()}"
slashy strings, usually used as regex literals: /.{$depth}/
multi-line (triple) double-quoted strings:
def email = """
Dear ${user}.
Thank you for blablah.
"""
I have the code below, which checks whether a given location matches an entry in the excluded array. It works for every array element except one (abc/def/libraries/linux_3.2.60-1+deb7u3.dsc). When I use this element as my location, it prints "location is not excluded", even though it is in the array.
How can I make my code treat this element as excluded as well?
use strict;
use warnings;
my @excluded = (
"xyz/efg/headers/",
"abc/def/libraries/jni-mr.h",
"abc/def/libraries/linux_3.2.60-1+deb7u3.dsc",
);
my $location = "abc/def/libraries/linux_3.2.60-1+deb7u3.dsc";
my $badpath = 0;
foreach (@excluded) {
# -- Check if location is contained in excluded array
if ($location =~ /^$_/) {
$badpath = 1;
print "location is excluded : $location \n";
}
}
if (! $badpath) {
print "location is not excluded : $location \n";
}
Desired Output:
location is excluded : abc/def/libraries/linux_3.2.60-1+deb7u3.dsc
Current Output:
location is not excluded : abc/def/libraries/linux_3.2.60-1+deb7u3.dsc
Use quotemeta($text) or \Q$text\E (inside double quotes or a regex literal) to create a pattern that matches the value of $text. In other words, use
if ($location =~ /^\Q$_\E/)
instead of:
if ($location =~ /^$_/)
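The same pitfall can be reproduced outside Perl. The following is a small Python sketch, assuming re.escape as the counterpart of quotemeta; the function name is_excluded is made up for illustration.

```python
import re

excluded = [
    "xyz/efg/headers/",
    "abc/def/libraries/jni-mr.h",
    "abc/def/libraries/linux_3.2.60-1+deb7u3.dsc",
]
location = "abc/def/libraries/linux_3.2.60-1+deb7u3.dsc"

# Unescaped, "1+" means "one or more 1s", so the literal "+" in the
# location is never matched and the match fails.
unescaped = re.match(excluded[2], location)


def is_excluded(location, excluded):
    """True if location starts with any entry of excluded, taken literally."""
    return any(re.match(re.escape(prefix), location) for prefix in excluded)


print(unescaped)                        # None
print(is_excluded(location, excluded))  # True
```

If the entries are only ever literal prefixes, a plain location.startswith(prefix) check would avoid regexes altogether.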
It looks like you intend to define your exclusions by regex, but you have not escaped regex metachars properly in those regexes. For your failing case, the metachar causing it to fail is the plus (+), which is a one-or-more multiplier in most regex flavors (including Perl), but you need to match it literally.
Also, I'd recommend moving the ^ anchor from the loop to each individual regex, which would make the code more flexible, in that you could choose not to anchor some of the exclusion regexes if you wanted.
Also, you should use the qr() construct, which allows you to precompile regexes, saving on CPU.
Also, this requirement is a good candidate for using grep().
use strict;
use warnings;
my @excluded = (
qr(^xyz/efg/headers/),
qr(^abc/def/libraries/jni-mr\.h),
qr(^abc/def/libraries/linux_3\.2\.60-1\+deb7u3\.dsc),
);
my $location = 'abc/def/libraries/linux_3.2.60-1+deb7u3.dsc';
# -- Check if location is contained in excluded array
my $badpath = scalar(grep($location =~ $_, @excluded)) >= 1 ? 1 : 0;
if ($badpath) {
print "location is excluded : $location \n";
} else {
print "location is not excluded : $location \n";
}
There is a string in the following format:
It can start with any number of strings enclosed by double braces, possibly with white space between them (whitespace may or may not occur).
It may also contain strings enclosed by double-braces in the middle.
I am looking for a regular expression that can separate the start from the rest.
For example, given the following string:
{{a}}{{b}} {{c}} def{{g}}hij
The two parts are:
{{a}}{{b}} {{c}}
def{{g}}hij
I tried this:
/^({{.*}})(.*)$/
But, it captured also the g in the middle:
{{a}}{{b}} {{c}} def{{g}}
hij
I tried this:
/^({{.*?}})(.*)$/
But, it captured only the first a:
{{a}}
{{b}} {{c}} def{{g}}hij
This keeps matching {{, any non { or } character 1 or more times, }}, possible whitespace zero or more times and stores it in the first group. Rest of the string will be in the 2nd group. If there are no parts surrounded by {{ and }} the first group will be empty. This was in JavaScript.
var str = "{{a}}{{b}} {{c}} def{{g}}hij";
str.match(/^\s*((?:\{\{[^{}]+\}\}\s*)*)(.*)/)
// [whole match, group 1, group 2]
// ["{{a}}{{b}} {{c}} def{{g}}hij", "{{a}}{{b}} {{c}} ", "def{{g}}hij"]
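A hedged Python rendering of the same pattern, in case it helps to see the two groups returned as a tuple. The name split_prefix is hypothetical.

```python
import re

# Group 1: leading run of {{...}} blocks with optional whitespace between them.
# Group 2: the rest of the string.
PREFIX_RE = re.compile(r"^\s*((?:\{\{[^{}]+\}\}\s*)*)(.*)")


def split_prefix(s):
    """Split s into (leading {{...}} blocks, remainder)."""
    m = PREFIX_RE.match(s)
    return m.group(1), m.group(2)


print(split_prefix("{{a}}{{b}} {{c}} def{{g}}hij"))
```

Because [^{}]+ cannot cross a brace, each {{...}} block is matched on its own, which avoids both the greedy and the lazy failure modes from the question.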
How about using preg_split:
$str = '{{a}}{{b}} {{c}} def{{g}}hij';
$list = preg_split('/(\s[^{].+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($list);
output:
Array
(
[0] => {{a}}{{b}} {{c}}
[1] => def{{g}}hij
)
I think I got it:
var string = "{{a}}{{b}} {{c}} def{{g}}hij";
console.log(string.match(/((\{\{\w+\}\})\s*)+/g));
// Output: [ '{{a}}{{b}} {{c}} ', '{{g}}' ]
Explanation:
( starts a group.
( another;
\{\{\w+\}\} looks for {{...}} containing one or more word characters (A-Z, a-z, 0-9, _)
) closes second group.
\s* Counts whitespace if it's there.
)+ closes the first group and looks for one or more occurrences of it.
When it gets any not-{{something}} type data, it stops.
P.S. Complex regexes cost CPU time.
You can use this:
(java)
String[] result = yourstr.split("\\s+(?!\\{)");
(php)
$result = preg_split('/\s+(?!{)/', '{{a}}{{b}} {{c}} def{{g}}hij');
print_r($result);
I don't know exactly why you want to split, but if the string always contains def, and you want to split the string into two parts at that point, you can try something like:
string text = "{{a}}{{b}} {{c}} def{{g}}hij";
Regex r = new Regex("def");
string[] split = new string[2];
int index = r.Match(text).Index;
split[0] = text.Substring(0, index); // everything before "def"
split[1] = text.Substring(index);    // "def" and everything after it
// Output: [ '{{a}}{{b}} {{c}} ', 'def{{g}}hij' ]
I want to get an array of all the capitalized words contained in a string, but only if the string begins with "set".
For example:
- string "setUserId", result array("User", "Id")
- string "getUserId", result false
Without the "set" restriction, the regex looks like /([A-Z][a-z]+)/
$str ='setUserId';
$rep_str = preg_replace('/^set/','',$str);
if($str != $rep_str) {
$array = preg_split('/(?<=[a-z])(?=[A-Z])/',$rep_str);
var_dump($array);
}
Your regex will also work:
$str = 'setUserId';
if(preg_match('/^set/',$str) && preg_match_all('/([A-Z][a-z]*)/',$str,$match)) {
var_dump($match[1]);
}
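The two-step approach (check the prefix, then collect the capitalized words) can be sketched in Python as follows. The function name capital_words is hypothetical, and returning False for non-matching input simply mirrors the wording of the question.

```python
import re


def capital_words(name):
    """Return the capitalized words after a leading 'set', else False."""
    if not name.startswith("set"):
        return False
    # Same character classes as the second PHP variant: [A-Z][a-z]*
    return re.findall(r"[A-Z][a-z]*", name[3:])


print(capital_words("setUserId"))  # ['User', 'Id']
print(capital_words("getUserId"))  # False
```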
I don't know regular expressions at all. Can anybody help me with one very simple regular expression:
extracting 'word:word' from a sentence, e.g. "Java Tutorial Format:Pdf With Location:Tokyo Javascript"?
A small modification:
the first 'word' comes from a list, but the second can be anything, e.g. word1 in [ABC, FGR, HTY].
The situation demands one more
modification.
The matching form can be "word11:word12 word13 ..." up to the next "word21: ...".
Things are getting more complex by the second... I have to learn regex :(
Thanks in advance.
You can use the regex:
\w+:\w+
Explanation:
\w - a single character that is a letter (uppercase or lowercase), a digit or an underscore.
\w+ - one or more of the above characters, i.e. basically a word
so \w+:\w+
would match a pair of words separated by a colon.
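A minimal Python sketch of this pattern applied to the sentence from the question. The helper name find_pairs is made up for illustration.

```python
import re


def find_pairs(text):
    """Return every 'word:word' occurrence in text."""
    return re.findall(r"\w+:\w+", text)


text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"
print(find_pairs(text))  # ['Format:Pdf', 'Location:Tokyo']
```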
Try \b(\S+?):(\S+?)\b. Group 1 will capture "Format" and group 2, "Pdf".
A working example:
<html>
<head>
<script type="text/javascript">
function test() {
var re = /\b(\S+?):(\S+?)\b/g; // without 'g' matches only the first
var text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript";
var match = null;
while ( (match = re.exec(text)) != null) {
alert(match[1] + " -- " + match[2]);
}
}
</script>
</head>
<body onload="test();">
</body>
</html>
A good reference for regexes is https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp
Use this snippet :
$str=" this is pavun:kumar hello world bk:systesm" ;
if ( preg_match_all ( '/(\w+\:\w+)/',$str ,$val ) )
{
print_r ( $val ) ;
}
else
{
print "Not matched \n";
}
Continuing Jaú's function with your additional requirement:
function test() {
var words = ['Format', 'Location', 'Size'],
text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript",
match = null;
var re = new RegExp( '(' + words.join('|') + '):(\\w+)', 'g');
while ( (match = re.exec(text)) != null) {
alert(match[1] + " = " + match[2]);
}
}
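The same list-driven pattern can be sketched in Python. Two details here are assumptions that go beyond the JavaScript above: the keys are passed through re.escape in case they contain metacharacters, and a leading \b keeps a key from matching in the middle of a longer word. The name find_known_pairs is hypothetical.

```python
import re


def find_known_pairs(text, keys):
    """Return (key, value) tuples for every key:value whose key is in keys."""
    alternation = "|".join(re.escape(k) for k in keys)
    pattern = re.compile(rf"\b({alternation}):(\w+)")
    return pattern.findall(text)


words = ["Format", "Location", "Size"]
text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript"
print(find_known_pairs(text, words))  # [('Format', 'Pdf'), ('Location', 'Tokyo')]
```

Language:Javascript is skipped because Language is not in the list.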
I am currently solving this problem in my Node.js app and found that the following seems suitable for colon-separated pairs:
([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))
It also matches quoted values, like a:"b" c:'d e' f:g
Example code in ES6:
const regex = /([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))/g;
const str = `category:"live casino" gsp:S1aik-UBnl aa:"b" c:'d e' f:g`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Example code in PHP:
$re = '/([\w]+:)("(([^"])*)"|\'(([^\'])*)\'|(([^\s])*))/';
$str = 'category:"live casino" gsp:S1aik-UBnl aa:"b" c:\'d e\' f:g';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
You can check/test your regex expressions using this online tool: https://regex101.com
Here's the non-regex way: in your favourite language, split on whitespace, go through the elements, check each for ":", and print it if found. E.g. in Python:
>>> s = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"
>>> for i in s.split():
...     if ":" in i:
...         print(i)
...
Format:Pdf
Location:Tokyo
You can do further checks to make sure it's really "someword:someword" by splitting again on ":" and checking that there are 2 elements in the resulting list, e.g.
>>> for i in s.split():
...     if ":" in i:
...         a = i.split(":")
...         if len(a) == 2:
...             print(i)
...
Format:Pdf
Location:Tokyo
([^:]+):(.+)
Meaning: (everything except : one or more times), :, (any character one or more times)
You'll find good manuals on the net... Maybe it's time for you to learn...
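To make the meaning of ([^:]+):(.+) concrete, here is a small Python sketch. The helper name split_pair is hypothetical. It also shows that, applied to a whole sentence, the pattern splits at the first colon only instead of picking out each pair.

```python
import re

PAIR_RE = re.compile(r"([^:]+):(.+)")


def split_pair(s):
    """Return (before first colon, after first colon), or None if no match."""
    m = PAIR_RE.search(s)
    return m.groups() if m else None


print(split_pair("Location:Tokyo"))
# ('Location', 'Tokyo')
print(split_pair("Java Tutorial Format:Pdf With Location:Tokyo Javascript"))
# ('Java Tutorial Format', 'Pdf With Location:Tokyo Javascript')
```

So this pattern fits a string that holds a single key:value pair; for several pairs inside a sentence, a word-based pattern such as \w+:\w+ is the closer match.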