Regular expression that matches string equals to one in a group - regex

E.g. I want to match strings that have the same word at the end as at the beginning, so that the following strings match:
aaa dsfj gjroo gnfsdj riier aaa
sdf foiqjf skdfjqei adf sdf sdjfei sdf
rew123 jefqeoi03945 jq984rjfa;p94 ajefoj384 rew123

This one could do the job:
/^(\w+\b).*\b\1$/
explanation:
/ : regex delimiter
^ : start of string
( : start capture group 1
\w+ : one or more word character
\b : word boundary
) : end of group 1
.* : any number of any char
\b : word boundary
\1 : group 1
$ : end of string
/ : regex delimiter
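As a quick check, the pattern above can be run against the sample strings from the question. This is only a sketch in JavaScript; the helper name sameEdgeWords is made up for the illustration.

```javascript
// Hypothetical helper: true when the string starts and ends with the same word.
function sameEdgeWords(str) {
  // Group 1 captures the first word; \1 requires the same text right before the end.
  return /^(\w+\b).*\b\1$/.test(str);
}

console.log(sameEdgeWords("aaa dsfj gjroo gnfsdj riier aaa"));        // true
console.log(sameEdgeWords("sdf foiqjf skdfjqei adf sdf sdjfei sdf")); // true
console.log(sameEdgeWords("aaa bbb ccc"));                            // false
console.log(sameEdgeWords("aaa"));                                    // false
```

The last call shows that a string consisting of a single word is rejected, because the backreference needs a second occurrence of the word.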

M42's answer is fine except for degenerate cases: it will not match a string that consists of a single word. To accept those within one regexp, use:
/^(?:(\w+\b).*\b\1|\w+)$/
Also, matching only the necessary part may be significantly faster on very large strings. Here are my solutions in JavaScript:
RegExp:
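function areEdgeWordsTheSame(str) {
    var m = str.match(/^(\w+)\b/); // first word
    if (!m) return false; // the string does not start with a word
    // \b keeps a longer last word such as "xaaa" from matching "aaa"
    return (new RegExp('\\b' + m[1] + '$')).test(str);
}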
String:
function areEdgeWordsTheSame(str) {
    var idx = str.indexOf(' ');
    if (idx < 0) return true; // single word
    // compare the first word with everything after the last space
    return str.slice(0, idx) === str.slice(str.lastIndexOf(' ') + 1);
}

I don't think a regular expression is the right choice here. Why not split the line into an array and compare the first and the last item?
In C#:
string[] words = line.Split(' ');
return words.Length >= 2 && words[0] == words[words.Length - 1];
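The same split-and-compare idea, sketched in JavaScript. The function name is hypothetical, and unlike the C# snippet this version trims the line and splits on any run of whitespace, which is an assumption about the input rather than something the question specifies.

```javascript
// Hypothetical helper: compare the first and last whitespace-separated words.
function edgeWordsMatchBySplit(line) {
  const words = line.trim().split(/\s+/);
  // Like the C# version, require at least two words.
  return words.length >= 2 && words[0] === words[words.length - 1];
}

console.log(edgeWordsMatchBySplit("aaa dsfj gjroo gnfsdj riier aaa")); // true
console.log(edgeWordsMatchBySplit("aaa bbb baaa"));                    // false
console.log(edgeWordsMatchBySplit("aaa"));                             // false
```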

Related

regex to extract substring for special cases

I have a scenario where I want to extract a substring based on the following conditions:
Search for any pattern like myvalue=123& and extract myvalue=123.
If "myvalue" is present at the end of the line without "&", extract myvalue=123.
For example:
The string is abcdmyvalue=123&xyz => then it should return myvalue=123
The string is abcdmyvalue=123 => then it should return myvalue=123
For the first scenario it is working for me with the following regex: myvalue=(.*?(?=[&,""]))
I am looking for how to modify this regex to include my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in advance!
Some notes about the pattern that you tried
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as few chars as possible until it can assert either & , or " to the right, so the positive lookahead expects a single char to be present to the right, which is why it fails when the value is at the end of the line
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &:
myvalue=[^&\s]*
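A minimal JavaScript sketch of the negated character class pattern applied to both strings from the question. The function name extractMyValue is an assumption made for the example.

```javascript
// Hypothetical helper: return the "myvalue=..." part, or null if it is absent.
function extractMyValue(input) {
  // [^&\s]* stops at the first & or whitespace char, or at the end of the string.
  const m = input.match(/myvalue=[^&\s]*/);
  return m ? m[0] : null;
}

console.log(extractMyValue("abcdmyvalue=123&xyz")); // myvalue=123
console.log(extractMyValue("abcdmyvalue=123"));     // myvalue=123
console.log(extractMyValue("abcd"));                // null
```

Because the character class needs no lookahead, the end of the string is handled without a separate alternative.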
function regex(data) {
var test = data.match(/=(.*)&/);
if (test === null) {
return data.split('=')[1]
} else {
return test[1]
}
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
Here is working code. If there is no & after the value, match returns null and the if block runs, where we simply split the string on = and take the value. If & is present, the regex extracts the value between = and &.
If you want to stay with a single regex, you can do it like this:
var test = data.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

Pattern match for (length)%code with before length

I have a pattern like x%c, where x is a single digit integer and c is an alphanumeric code of length x. % is just a token separator of length and code
For instance 2%74 is valid since 74 is of 2 digits. Similarly, 1%8 and 4%3232 are also valid.
I have tried regex of form ^([0-9])(%)([A-Z0-9]){\1}, where I am trying to put a limit on length by the value of group 1. It does not work apparently since the group is treated as a string, not a number.
If I change the above regex to ^([0-9])(%)([A-Z0-9]){2}, it works for 2%74, but it is of no use since the length has to be controlled by the first group, not by a fixed digit.
If it is not possible with regex, is there a better approach in Java?
One way could be using 2 capture groups, and convert the first group to an int and count the characters for the second group.
\b(\d+)%(\d+)\b
\b Word boundary
(\d+) Capture group 1, match 1+ digits
% Match literally
(\d+) Capture group 2, match 1+ digits
\b Word boundary
For example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LengthPrefixDemo {
    public static void main(String[] args) {
        String regex = "\\b(\\d+)%(\\d+)\\b";
        Pattern pattern = Pattern.compile(regex);
        String[] strings = { "2%74", "1%8", "4%3232", "5%123456", "6%0" };
        for (String s : strings) {
            Matcher matcher = pattern.matcher(s);
            if (matcher.find()) {
                // Group 1 is the declared length, group 2 is the code
                if (Integer.parseInt(matcher.group(1)) == matcher.group(2).length()) {
                    System.out.println("Match for " + s);
                } else {
                    System.out.println("No match for " + s);
                }
            }
        }
    }
}
Output
Match for 2%74
Match for 1%8
Match for 4%3232
No match for 5%123456
No match for 6%0
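The same two-step idea (capture both parts, then compare the number with the length) can be sketched in JavaScript. This variant is stricter than the Java code above and is only an illustration: it assumes the whole input must be one token, a single-digit length, and a code made of the characters A-Z and 0-9 as in the question's own attempt. The function name is hypothetical.

```javascript
// Hypothetical helper: validate "<digit>%<code>" where the digit equals the code length.
function isValidLengthPrefixed(input) {
  const m = /^([0-9])%([A-Z0-9]+)$/.exec(input);
  if (m === null) return false; // not in <digit>%<code> form at all
  // The regex cannot count, so the length check happens in code.
  return Number(m[1]) === m[2].length;
}

for (const s of ["2%74", "1%8", "4%3232", "5%123456", "6%0", "3%AB1"]) {
  console.log(s, isValidLengthPrefixed(s));
}
```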

Regex Express Return All Chars before a '/' but if there are 2 '/' Return all before that

I have been trying to get a regex that returns the following in these situations.
XX -> XX
XXX -> XXX
XX/XX -> XX
XX/XX/XX -> XX/XX
XXX/XXX/XX -> XXX/XXX
I tried the following regexes, however they do not work.
^[^/]+ => https://regex101.com/r/xvCbNB/1
=========
([A-Z])\w+ => https://regex101.com/r/xvCbNB/2
They are close but are not there.
Any Help would be appreciated.
You want to get all text from the start till the last occurrence of a specific character or till the end of string if the character is missing.
Use
^(?:.*(?=\/)|.+)
Details
^ - start of string
(?:.*(?=\/)|.+) - a non-capturing group that matches either of the two alternatives; if the first one matches, the second won't be tried:
.*(?=\/) - any 0+ chars other than line break chars, as many as possible, up to but excluding the last /
| - or
.+ - any 1+ chars other than line break chars, as many as possible.
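To see the two alternatives at work, here is a small JavaScript sketch that runs the pattern over the inputs from the question. The helper name beforeLastSlash is invented for the example.

```javascript
// Hypothetical helper: text up to the last "/", or the whole string if there is none.
function beforeLastSlash(s) {
  const m = s.match(/^(?:.*(?=\/)|.+)/);
  return m ? m[0] : "";
}

for (const s of ["XX", "XXX", "XX/XX", "XX/XX/XX", "XXX/XXX/XX"]) {
  console.log(s + " -> " + beforeLastSlash(s));
}
// XX -> XX
// XXX -> XXX
// XX/XX -> XX
// XX/XX/XX -> XX/XX
// XXX/XXX/XX -> XXX/XXX
```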
It will be easier to use a replace here to match / followed by non-slash characters before end of line:
Search regex:
/[^/]*$
Replacement String:
""
If you're looking for a regex match then use this regex:
^(.*?)(?:/[^/]*)?$
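Both variants from this answer can be tried side by side. The following is a rough JavaScript sketch; the function names are hypothetical, and the slash is escaped only because it sits inside a JavaScript regex literal.

```javascript
// Hypothetical helper: the replace approach, dropping the last "/..." segment.
function stripLastSegment(s) {
  return s.replace(/\/[^/]*$/, "");
}

// Hypothetical helper: the match approach, reading capture group 1.
function matchBeforeLastSegment(s) {
  const m = s.match(/^(.*?)(?:\/[^/]*)?$/);
  return m ? m[1] : null;
}

for (const s of ["XX", "XX/XX", "XX/XX/XX", "XXX/XXX/XX"]) {
  console.log(s, "->", stripLastSegment(s), "|", matchBeforeLastSegment(s));
}
```

For these inputs both functions return the same text; the replace version leaves strings without a slash untouched.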
Any special reason it has to be a regular expression? How about just splitting the string at the slashes, removing the last item and rejoining:
function removeItemAfterLastSlash(string) {
    const list = string.split(/\//);
    if (list.length == 1) {
        return string; // no slash at all
    }
    list.pop();
    return list.join("/");
}
Or look for the last slash and cut the string there:
function removeItemAfterLastSlash(string) {
    const index = string.lastIndexOf("/");
    if (index === -1) {
        return string; // no slash at all
    }
    return string.slice(0, index); // strings have slice, not splice
}

Split with a multicharacter regex pattern and keep delimiters

I have the following string and regex for splitting it:
val str = "this is #[loc] sparta"
val regex = "((?<=( #\\[\\w{3,100}\\] ))|(?=( #\\[\\w{3,100}\\] )))"
print(str.split(Regex(regex)))
//print - [this is, #[loc] , sparta]
It works fine. But during development I did not realize that the #[***] block can contain not only word characters (\w) but also "-" and digits (a UUID), so my actual blocks look like this:
val str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62]"
and in this case the regex doesn't work.
How do I change this part, "\w{3,100}", for the new requirements?
I tried changing it to match anything, "\.{3,100}", but that does not work.
To fix your issue, you may replace your regex with
val regex = """((?<=( #\[[^\]\[]{3,100}] ))|(?=( #\[[^\]\[]{3,100}] )))"""
The \w can be replaced with [^\]\[] that matches any char but [ and ].
Note the use of a raw string literal, """...""", that allows the use of a single backslash as a regex escape.
Alternatively, you may use the following method to split and keep delimiters:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true): MutableList<String> {
    val res = mutableListOf<String>() // Result list
    var start = 0 // Start position of the next substring
    rx.findAll(s).forEach { // Looking for matches
        val substr_before = s.substring(start, it.range.first) // Substring before match start
        if (substr_before.isNotEmpty() || keep_empty) {
            res.add(substr_before) // Adding substring before match start
        }
        res.add(it.value) // Adding match
        start = it.range.last + 1 // Updating start pos of next substring before match
    }
    if (start != s.length) res.add(s.substring(start)) // Adding text after last match if any
    return res
}
Then, just use it like
val str = "this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta"
val regex = """#\[[^\]\[]+]""".toRegex()
print(splitKeepDelims(str, regex))
// => [this is , #[loc_75acca83-a39b-4df1-8c3c-b690df00db62], sparta]
The \[[^\]\[]+] pattern matches
\[ - a [ char
[^\]\[]+ - 1+ chars other than [ and ]
] - a ] char.
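For comparison only, here is how the same "split and keep the delimiters" result could be sketched in JavaScript, where a capturing group in the split pattern makes split return the delimiters too. This is not the Kotlin method above but a swapped-in technique, and the function name is made up for the example.

```javascript
// Hypothetical helper: split on #[...] blocks and keep each block as its own item.
function splitKeepBlocks(s) {
  // The capturing group makes String.prototype.split include the matched delimiters.
  // The filter drops empty pieces, e.g. when a block sits at the very end.
  return s.split(/(#\[[^\]\[]+\])/).filter((part) => part.length > 0);
}

const parts = splitKeepBlocks("this is #[loc_75acca83-a39b-4df1-8c3c-b690df00db62] sparta");
console.log(parts);
// three items: "this is ", "#[loc_75acca83-a39b-4df1-8c3c-b690df00db62]", " sparta"
```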

c# regex split or replace. here's my code i did

I am trying to replace a certain group with "" by using regex.
I have been searching and doing my best, but it's over my head.
What I want to do is:
string text = "(12je)apple(/)(jj92)banana(/)cat";
string resultIwant = {apple, banana, cat};
In the first pair of parentheses there must be 4 characters, which may include digits,
and '(/)' comes as the closing tag.
Here's my code. (I was using matches function)
string text = @"(12dj)apple(/)(88j1)banana(/)cat";
string pattern = @"\(.{4}\)(?<value>.+?)\(/\)";
Regex rex = new Regex(pattern);
MatchCollection mc = rex.Matches(text);
if(mc.Count > 0)
{
foreach(Match str in mc)
{
print(str.Groups["value"].Value.ToString());
}
}
However, the result was
apple
banana
So I think I should use replace or something else instead of Matches.
The regex below captures the word characters that come right after a ):
(?<=\))(\w+)
Your C# code would be:
{
    string str = "(12je)apple(/)(jj92)banana(/)cat";
    Regex rgx = new Regex(@"(?<=\))(\w+)");
    foreach (Match m in rgx.Matches(str))
        Console.WriteLine(m.Groups[1].Value);
}
Explanation:
(?<=\)) Positive lookbehind is used here. It sets the matching position just after the ) symbol.
() capturing groups.
\w+ Then it captures all the following word characters. It won't capture the following ( symbol because it isn't a word character.
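The lookbehind pattern can also be checked quickly in JavaScript, assuming an engine with lookbehind support (for example a current Node.js). The function name is hypothetical.

```javascript
// Hypothetical helper: collect every run of word characters that directly follows ")".
function valuesAfterParens(text) {
  // With the g flag, match returns all matches, or null when there are none.
  return text.match(/(?<=\))\w+/g) || [];
}

console.log(valuesAfterParens("(12je)apple(/)(jj92)banana(/)cat"));
// three items: "apple", "banana", "cat"
```

After each "(/)" the next character is either "(" or a word character, so only the three values are picked up, including the trailing "cat" that the original Matches-based pattern missed.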