Kotlin Regex named groups support

Does Kotlin support named regex groups?
A named regex group looks like this: (?<name>...)

According to this discussion, this will be supported in Kotlin 1.1.
https://youtrack.jetbrains.com/issue/KT-12753
Kotlin 1.1 EAP is already available to try.
"""(\w+?)(?<num>\d+)""".toRegex().matchEntire("area51")!!.groups["num"]!!.value
You'll have to use kotlin-stdlib-jre8 (the artifact was later renamed kotlin-stdlib-jdk8).
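The same lookup can also be written without `!!`, so that a non-matching input yields `null` instead of throwing. This is a minimal sketch. It assumes a JVM target with kotlin-stdlib-jdk8 on the classpath, which provides the `groups[name]` operator. Recent kotlin-stdlib versions include it as well. `extractNum` is a hypothetical helper name.

```kotlin
// Assumes a JVM target where MatchGroupCollection.get(name: String) is available
// (kotlin-stdlib-jdk8, or a recent kotlin-stdlib).
val numberRegex = Regex("""(\w+?)(?<num>\d+)""")

// Hypothetical helper: returns the text of the "num" group,
// or null when the whole input does not match the pattern.
fun extractNum(input: String): String? =
    numberRegex.matchEntire(input)?.groups?.get("num")?.value
```

Under these assumptions, `extractNum("area51")` gives `"51"` and `extractNum("area")` gives `null`.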

As of Kotlin 1.0, the Regex class doesn't provide a way to access matched named groups in MatchGroupCollection. The standard library can only use the regex API available in JDK 6, which doesn't support named groups either.
If you target JDK 8, you can use the java.util.regex.Pattern and java.util.regex.Matcher classes. The latter provides a group(String name) method that returns the result of a named-capturing group match.

As of Kotlin 1.4, you need to cast the result of groups to MatchNamedGroupCollection:
val groups = """(\w+?)(?<num>\d+)""".toRegex().matchEntire("area51")!!.groups as? MatchNamedGroupCollection
if (groups != null) {
    println(groups.get("num")?.value)
}
And as @Vadzim correctly noticed, you must use kotlin-stdlib-jdk8 instead of kotlin-stdlib:
dependencies {
    implementation "org.jetbrains.kotlin:kotlin-stdlib-jdk8"
}
Here is a good explanation of it.

The above answers did not work for me. What did work was using java.util.regex directly:
import java.util.regex.Pattern
val pattern = Pattern.compile("""(\w+?)(?<num>\d+)""")
val matcher = pattern.matcher("area51")
while (matcher.find()) {
    val result = matcher.group("num") // value of the named group "num", here "51"
    println(result)
}
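For comparison, the `while (matcher.find())` loop maps onto Kotlin's `Regex.findAll`, which returns a lazy sequence of matches. This is a sketch. It assumes a JVM target where `groups["num"]` resolves, and `allNums` is a hypothetical name.

```kotlin
// Kotlin counterpart of the Pattern/Matcher find() loop:
// findAll yields one MatchResult per non-overlapping match.
fun allNums(input: String): List<String> =
    Regex("""(\w+?)(?<num>\d+)""")
        .findAll(input)
        .mapNotNull { it.groups["num"]?.value } // named-group lookup per match
        .toList()
```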

fun regex(regex: Regex, input: String, group: String): String {
    return regex
        .matchEntire(input)!!
        .groups[group]!!
        .value
}
@Test
fun regex() {
    // given
    val expected = "s3://asdf/qwer"
    val pattern = "[\\s\\S]*Location\\s+(?<s3>[\\w/:_-]+)[\\s\\S]*"
    val input = """
        ...
        ...
        Location s3://asdf/qwer
        Serde Library org.apache.hadoop.hive.ql.io.orc.OrcSerde
        InputFormat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
        OutputFormat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
    """.trimIndent()
    val group = "s3"
    // when
    val actual = CommonUtil.regex(pattern.toRegex(), input, group)
    // then
    assertEquals(expected, actual)
}

Related

Extract JSON from String using flutter dart

I want to extract JSON from the input string below.
I have tried the regex below in Java and it works fine:
private static final Pattern shortcode_media = Pattern.compile("\"shortcode_media\":(\\{.+\\})");
I want the equivalent regex for Dart.
Input String
<script type="text/javascript">window.__initialDataLoaded(window._sharedData);</script><script type="text/javascript">window.__additionalDataLoaded('/p/B9fphP5gBeG/',{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});</script><script type="text/javascript">
<script type="text/javascript">window.__initialDataLoaded(window._newData);</script><script type="text/javascript">window._newData('/p/B9fphP5gBeG/',{"graphql":{"post":{"__typename":"id","id":"2260708142683789190","new_code":"B9fphP5gBeG"}}});</script><script type="text/javascript">
(function(){
function normalizeError(err) {
var errorInfo = err.error || {};
var getConfigProp = function(propName, defaultValueIfNotTruthy) {
var propValue = window._sharedData && window._sharedData[propName];
return propValue ? propValue : defaultValueIfNotTruthy;
};
return {}
}
)
Expected JSON
{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}}
Note: there are multiple JSON strings in the input string; I need the JSON of the shortcode_media tag.
Please use:
void main() {
  String json = '''
{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}
''';
  RegExp regExp = new RegExp(
    "\"shortcode_media\":(\\{.+\\})",
    caseSensitive: false,
    multiLine: false,
  );
  print(regExp.stringMatch(json).toString());
}
Output:
"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}
The corresponding Dart RegExp would be:
static final RegExp shortcodeMedia = RegExp(r'"shortcode_media":(\{.+\})");
It does not work, though. JSON is not a regular language, so you can't parse it using regular expressions.
The value of "shortcode_media" in your example JSON ends with several } characters. The RegExp will stop the match at the third of those, even though the second } is the one matching the leading {. If your JSON text contains any further values after the shortcode_media entry, those might be included as well.
Stopping at the first } would also be too short.
If someone reorders the JSON source code to the equivalent
"shortcode_media":{"dimensions":{"height":1326,"width":1080},"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG"}
(that is, putting the "dimensions" entry first), then you would only capture until the end of the dimensions block.
I would recommend either using a proper JSON parser, or at least improving the RegExp to handle a single nested JSON object, since you seem to already know that it will happen.
Such a RegExp could be:
RegExp(r'"shortcode_media":(\{(?:[^{}]*(?:\{.*?\})?)*?\})')
This RegExp will capture the correct number of braces for the example code, but still won't work if there are more nested JSON objects. Only a real parser can handle the general case correctly.
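A middle ground between a regex and a full parser is to scan forward from the key and count braces, ignoring any braces that appear inside string literals. This is a sketch in Kotlin rather than Dart, and `extractJsonObject` is a hypothetical name. It assumes the key is followed directly by `:` and an object. The extracted text can then be given to a real JSON parser (in Dart, `jsonDecode` from `dart:convert`).

```kotlin
// Hypothetical helper: finds "key": and returns the balanced {...} object after it.
// String literals are tracked so braces inside JSON strings are not counted.
fun extractJsonObject(text: String, key: String): String? {
    val marker = "\"$key\":"
    val keyPos = text.indexOf(marker)
    if (keyPos < 0) return null
    var i = keyPos + marker.length
    while (i < text.length && text[i].isWhitespace()) i++
    if (i >= text.length || text[i] != '{') return null
    val start = i
    var depth = 0
    var inString = false
    while (i < text.length) {
        val c = text[i]
        if (inString) {
            when (c) {
                '\\' -> i++               // skip the escaped character
                '"' -> inString = false
            }
        } else {
            when (c) {
                '"' -> inString = true
                '{' -> depth++
                '}' -> {
                    depth--
                    if (depth == 0) return text.substring(start, i + 1)
                }
            }
        }
        i++
    }
    return null // unbalanced input
}
```

Unlike the regexes above, this handles any nesting depth and the reordered "dimensions" case. It is still not a validating parser.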

One-liner to extract domain from email address

How can I optionally extract the domain from local-part@domain? My attempt is
Try(email.split("@")(1)).toOption
but it seems there should be a way without depending on exception handling. Ideally, I am after a one-liner.
Not a one-liner, and it only works on 2.13, but this seems very clear to me.
def extractDomain(email: String): Option[String] = email match {
  case s"${_}@${domain}" => Some(domain)
  case _ => None
}
(Note: if there is more than one @ sign, this will just split on the first one.)
email.dropWhile(_ != '@').drop(1)
email.split("@").lastOption
These are equivalent ONLY if what's passed is an email address.
If the string passed doesn't include @, then lastOption will still return a Some() of the entire string, whereas your solution will return None.
So if you can trust your input, this answer provides a cleaner approach.
You can use Some(email.split("@")(1)). This splits the String and then wraps the result in Some, which is an instance of Option.
Let me cheat a little: I will prepare a separate file, Email.scala, with an extractor:
object Email {
  def unapply(mail: String): Option[(String, String)] = {
    mail match {
      case s"$user@$domain" => Some((user, domain))
      case _ => None
    }
  }
}
and then it can be used with pattern matching:
val Email(_, domain) = "test@domain.com"
Not a one-liner, but I always match on array extractors when I use String.split (pre-2.13). I think it's short enough and reads much better than getting parts by index.
email.split("@", 2) match {
  case Array(_, domainPart @ _*) => domainPart.headOption
}
limit = 2 makes sure that domainPart has at most 1 element.
Note that you don't need a catch-all case here, since split always returns at least one element in the array (although it makes sense to cover it with tests to protect against future changes).
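For comparison with the main Kotlin question, here is a sketch of the same exception-free one-liners in Kotlin. `extractDomain` and `extractDomainViaSplit` are hypothetical names. Both take the text after the first `@`, like the Scala versions above.

```kotlin
// Returns the part after the first '@', or null when there is no '@'.
// substringAfter never throws, so no exception handling is needed.
fun extractDomain(email: String): String? =
    email.takeIf { '@' in it }?.substringAfter('@')

// Same idea with a split limited to two parts, mirroring split("@", 2) above.
// Kotlin's split(String) treats the delimiter literally, not as a regex.
fun extractDomainViaSplit(email: String): String? =
    email.split("@", limit = 2).getOrNull(1)
```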

Angular2 pipe regex url detection

I would like to have a pipe that detects any URL in a string and creates a link from it. So far I have written this, but it doesn't seem to work:
@Pipe({name: 'matchUrl'})
export class MatchUrlPipe implements PipeTransform {
    transform(value: string, arg?: any): any {
        var exp = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/g;
        return value.replace(exp, "<a href='$1'>$1</a>");
    }
}
How can I fix it?
It seems there are two problems with your implementation:
Your regex's first capturing group ($1) matches the 'www' part of the URL. To make it work, change the regex like this (note the extra pair of parentheses at the start and end of the regex):
var exp = /(https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*))/g;
Pipes can't render HTML normally. You need a trick to do that, as mentioned in other questions like this one. For example, you can assign your piped value to the outerHTML attribute of a span (the span itself will not be rendered).
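Another way to fix the regex side is to reference the whole match instead of a numbered group, which avoids counting groups at all. This is a Kotlin sketch of that idea only. It does not address Angular's HTML rendering. `linkify` is a hypothetical helper, and the pattern is the question's with the character-class backslash escapes dropped, which Java regex does not need.

```kotlin
// URL pattern adapted from the question.
val urlRegex = Regex("""https?://(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_+.~#?&/=]*)""")

// Hypothetical helper: wraps every URL in an anchor using the whole match
// (MatchResult.value), so group numbering no longer matters.
fun linkify(text: String): String =
    urlRegex.replace(text) { m -> "<a href='${m.value}'>${m.value}</a>" }
```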

Grails: Use regex with filter

I have two URL addresses, each with its own filter:
all(uri: '/api/first/**')
{
before =
}
all(uri: '/api/second/**')
{
before =
}
I want to write just one filter for both,
so I tried to write a filter with a regex:
all(uri: '\\/api\\/(first|second)\\/.*', regex: true)
{
before =
}
But it doesn't work.
I have tried many variations ('**', '.*', invert: true),
but didn't succeed.
Does anyone know where the mistake is and what the right way to write the filter is?
Thanks...
As per the documentation, uri is an Ant path and does not support regular expressions. You need to use find instead:
all(regex: true, find: '/api/(first|second)/.*')
{
before = {
...
}
}

Regex to parse querystring values to named groups

I have HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse it and get matches with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!
In my case I had to parse a URL because HttpUtility.ParseQueryString is not available in WP7, so I created an extension method like this:
public static class UriExtensions
{
    private static readonly Regex queryStringRegex;
    static UriExtensions()
    {
        queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
    }
    public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
    {
        if (uri == null)
            throw new ArgumentNullException("uri");
        var matches = queryStringRegex.Matches(uri.OriginalString);
        for (int i = 0; i < matches.Count; i++)
        {
            var match = matches[i];
            yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
        }
    }
}
Then it's a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"), UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning an IEnumerable instead of a dictionary directly because I assume there might be duplicate parameter names. If there are, building a dictionary will throw an exception.
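The extension method's regex carries over directly to Kotlin on the JVM, because Java regex supports `(?<name>...)`. This is a sketch of that port. `parseQueryString` is a hypothetical name, and it assumes `groups[name]` is available (kotlin-stdlib-jdk8 or a recent kotlin-stdlib). Like the C# version, it returns pairs rather than a map so that duplicate names survive.

```kotlin
// Same pattern as the C# UriExtensions.ParseQueryString above.
val queryParamRegex = Regex("""[?&](?<name>[^&=]+)=(?<value>[^&=]+)""")

// Returns (name, value) pairs in order; both groups always participate in a match.
fun parseQueryString(url: String): List<Pair<String, String>> =
    queryParamRegex.findAll(url).map { m ->
        m.groups["name"]!!.value to m.groups["value"]!!.value
    }.toList()
```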
Why use a regex to split it out?
You could first extract the query string, split it on &, and then build a map by splitting each part on =.
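The split-based approach needs no regex at all. This is a minimal Kotlin sketch with a hypothetical `parseQuery` name. It assumes the input is a URL or path with at most one `?`, and a parameter without `=` maps to an empty string. Because it builds a map, a repeated name keeps its last value.

```kotlin
// Extract the query string, split on '&', then split each part on the first '='.
fun parseQuery(url: String): Map<String, String> =
    url.substringAfter('?', "")          // "" when there is no query string
        .split('&')
        .filter { it.isNotEmpty() }
        .associate { param ->
            val parts = param.split('=', limit = 2)
            parts[0] to parts.getOrElse(1) { "" }
        }
```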
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&userId=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}
Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out:
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get the pairs with something like Groups["key"].Captures[i] and Groups["value"].Captures[i].
Perhaps something like this (I am rusty on regex, and wasn't good at it in the first place anyway; untested):
/href="[^?]*(?:[?&](?:userId=(?<user>\d+)|section=(?<section>\d+)))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
    @"(?=[^""<>]*?userId=(?<user>\w+))" +
    @"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need one more lookahead just like the two existing ones.
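The lookahead technique also works with Java's regex engine, so here is a Kotlin sketch of it. `linkRegex` and `userAndSection` are hypothetical names. The pattern assumes the parameters are named `userId` and `section` and that the link targets `file.aspx`. Each lookahead rescans the same href independently, which is why parameter order does not matter.

```kotlin
// Both lookaheads start right after "file.aspx?" and search the rest of the href.
val linkRegex = Regex(
    """<a href="file\.aspx\?""" +
        """(?=[^"<>]*?\buserId=(?<user>\w+))""" +
        """(?=[^"<>]*?\bsection=(?<section>\w+))"""
)

// Returns (user, section) for every matching link; a match exists only if both lookaheads succeed.
fun userAndSection(html: String): List<Pair<String, String>> =
    linkRegex.findAll(html).map { m ->
        m.groups["user"]!!.value to m.groups["section"]!!.value
    }.toList()
```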
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
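Duplicate group names are a .NET feature. Java's `java.util.regex`, which Kotlin uses on the JVM, rejects a pattern that defines the same group name twice. A sketch of the alternation approach under that constraint gives each branch its own names and merges them afterwards. `paramRegex`, `userAndSectionAlt` and the `user1`/`user2` names are all hypothetical. The sketch relies on the JVM stdlib returning `null` for a named group that did not take part in the match.

```kotlin
// Each branch uses distinct names because Java forbids duplicate group names.
val paramRegex = Regex(
    """userId=(?<user1>\d+)&section=(?<section1>\d+)|section=(?<section2>\d+)&userId=(?<user2>\d+)"""
)

// Picks whichever branch matched; returns null when neither ordering is found.
fun userAndSectionAlt(text: String): Pair<String, String>? {
    val g = paramRegex.find(text)?.groups ?: return null
    val user = (g["user1"] ?: g["user2"])?.value ?: return null
    val section = (g["section1"] ?: g["section2"])?.value ?: return null
    return user to section
}
```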
A simple Python implementation that overcomes the ordering problem:
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]