regular expression matching string in scala - regex

I have a string like this
result: String = /home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserAccountDetailsMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserAccountDetailsMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserAccountMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserAccountMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserOverridenFunctionMetaDataMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserOverridenFunctionMetaDataMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserOverridenPermissionMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserOverridenPermissionMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserRoleMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/UserRoleMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/VendorAddressMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/VendorAddressMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/reactore/infra/VendorContactMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/reactore/infra/VendorContactMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/reactore/infra/VendorMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/VendorMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/WeekMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/WeekMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/WorkflowMetadataMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/WorkflowMetadataMetaData.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/WorkflowNotificationMetaData$.class
/home/administrator/com.supai.common-api-1.8.5-DEV-SNAPSHOT/com/a/infra/WorkflowNotificationMetaData.class
/home/a/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/a/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
/home/common/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/raghav/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/sysadmin/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/tmp/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
regex: scala.util.matching.Regex = (\\/([u|s|r])\\/([s|h|a|r|e]))
x: scala.util.matching.Regex.MatchIterator = empty iterator
Out of this, how can I get only the part /usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar? It can appear anywhere in the string. I tried a regular expression in Scala, but I don't know how to handle the forward slashes. Could somebody please explain how to do this in Scala?

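What are your search criteria? Your pattern seems to be wrong.
In your regexp I see [u|s|r]. Inside square brackets this is a character class: it matches a single character that is u, s, r or a literal |, not the word usr. The whole pattern therefore looks for a slash, one such character, another slash and one more character (for example /u/s), which never occurs in your input, hence the empty iterator. Forward slashes need no escaping in a Scala regex.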
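To make the difference concrete, here is a minimal sketch (the object and value names are made up for illustration) that contrasts the character class with a plain literal pattern containing unescaped forward slashes:

```scala
object CharClassVsLiteral {
  // A character class matches exactly ONE character: u, s, r or a literal '|'.
  val charClass = "[u|s|r]".r

  // A literal pattern; forward slashes are ordinary characters in a regex.
  val literal = "/usr/share".r

  // Every single character of the sample that belongs to the class.
  val classHits: List[String] = charClass.findAllIn("/usr|").toList

  // The literal pattern is found in the middle of a longer path.
  val literalHit: Option[String] =
    literal.findFirstIn("/home/tmp/usr/share/common-api")
}
```

Here classHits contains four one-character matches, while literalHit holds the whole "/usr/share" substring.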
how can I get only this part
/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar and
this part can be anywhere in the string
If you are looking for a path, see the below example:
scala> val input = """/home/common/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
| /home/raghav/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
| /home/sysadmin/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
| /home/tmp/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
| /home/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
| /home/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
| /usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar"""
input: String =
/home/common/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/raghav/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/sysadmin/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/tmp/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
/home/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
scala> val myRegExp = "/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar".r
myRegExp: scala.util.matching.Regex = /usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
scala> val myRegExp2 = "helloWorld.jar".r
myRegExp2: scala.util.matching.Regex = helloWorld.jar
scala> (myRegExp findAllIn input) foreach( println)
/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
scala> (myRegExp2 findAllIn input) foreach( println)
scala>
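Note that the literal pattern above also matches inside the longer /home/usr/share/... path, which is why two lines are printed, and that its unescaped dots match any character. If only a line consisting of exactly that path is wanted, one possible variation (a sketch, not part of the original answer; the names ExactPathMatch, target and exactLine are assumptions) is to quote the text with Regex.quote and anchor it per line with the (?m) flag:

```scala
import scala.util.matching.Regex

object ExactPathMatch {
  val input: String =
    """/home/usr/share/common-api/lib/com.supai.common-api-1.3-SNAPSHOT.jar
      |/home/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar
      |/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar""".stripMargin

  val target = "/usr/share/common-api/lib/com.supai.common-api-1.8.5-DEV-SNAPSHOT.jar"

  // (?m) lets ^ and $ match at line boundaries;
  // Regex.quote turns the path into a literal, so '.' no longer means "any character".
  val exactLine: Regex = ("(?m)^" + Regex.quote(target) + "$").r

  // Only the last input line is exactly the target path.
  val matches: List[String] = exactLine.findAllIn(input).toList
}
```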

Related

Regex pattern to detect payload.* pattern

I want to validate all strings that have 'payload.*' i.e the string must start with 'payload' followed by a period (.) and followed by minimum 1 character.
Example:
Input1 :- payload.Hello Output1 :-> Valid
Input2 :- Hipayload.Hello Output2 :-> InValid
Input3 :- payload.H Output3 :-> Valid
Input4 :- payload. Output4 :-> InValid
You can use x.matches(y) (where x and y are both Strings) to match against a Regex pattern in Scala (or use "raw" Strings):
scala> val regex: String = "^payload\\..+"
regex: String = ^payload\..+
scala> val regexAltRaw: String = raw"^payload\..+"
regexAltRaw : String = ^payload\..+
scala> val regexAltTripleQuotes: String = """^payload\..+"""
regexAltTripleQuotes : String = ^payload\..+
scala> "".matches(regex)
res0: Boolean = false
scala> "payload.Hello".matches(regex)
res1: Boolean = true
scala> "Hipayload.Hello".matches(regex)
res2: Boolean = false
scala> "payload.H".matches(regex)
res3: Boolean = true
scala> "payload.".matches(regex)
res4: Boolean = false
To explain the pattern:
^payload - starts with "payload"
\\. - a literal "." (in an ordinary, non-"raw" Scala string literal the backslash itself must be escaped, so you write two backslashes where a plain regex has one)
.+ - any character, one or more times
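As a small sketch of the points above (names are illustrative only): the three ways of writing the literal produce the same pattern text, and because String.matches always tests the whole input, the leading ^ is not strictly required.

```scala
object PayloadLiterals {
  val escaped      = "^payload\\..+"      // ordinary literal, doubled backslash
  val rawForm      = raw"^payload\..+"    // raw interpolator, single backslash
  val tripleQuoted = """^payload\..+"""   // triple quotes, single backslash

  // All three spellings yield the identical pattern string.
  val sameText: Boolean = escaped == rawForm && rawForm == tripleQuoted

  // matches() requires the entire string to match, so ^ can be dropped.
  val noCaret = """payload\..+"""
  val results: List[Boolean] =
    List("payload.Hello", "Hipayload.Hello", "payload.H", "payload.")
      .map(_.matches(noCaret))
}
```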
To reject input such as payload.. you can use the following regex, which allows no further dots after the first one (so payload.a.b is rejected as well):
^payload\.[^.]+$
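A quick comparison of the two patterns, as a sketch with made-up names, shows where they disagree:

```scala
object PayloadStrict {
  val loose  = """^payload\..+"""      // anything after the first dot
  val strict = """^payload\.[^.]+$"""  // no further dots allowed

  val inputs = List("payload.Hello", "payload.", "payload..", "payload.a.b")

  // For each input: does the loose / strict pattern accept it?
  val looseResults: List[Boolean]  = inputs.map(_.matches(loose))
  val strictResults: List[Boolean] = inputs.map(_.matches(strict))
}
```

The loose pattern accepts payload.. and payload.a.b; the strict one rejects both. Which behaviour is wanted depends on whether dots are legal in the part after payload.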
Input Array:
val ar = Array("payload.Hello","Hipayload.Hello","payload.H","payload.")
Regex String:
val p = """^payload\..{1,}"""
In Scala REPL:
scala> val ar = Array("payload.Hello","Hipayload.Hello","payload.H","payload.")
ar: Array[String] = Array(payload.Hello, Hipayload.Hello, payload.H, payload.)
scala> val p = """^payload\..{1,}"""
p: String = ^payload\..{1,}
Test:
scala> ar.map(x=>if(x.matches(p))"Valid" else "InValid")
res3: Array[String] = Array(Valid, InValid, Valid, InValid)

Scala: Regex Pattern Matching

I have the following input strings
"/horses/c132?XXX=abc-049#companyorg"
"/Goats/b-01?XXX=abc-721#"
"/CATS/001?XXX=abc-451#CompanyOrg"
I'd like to obtain the following as output
"horses", "c132", "abc-049#companyorg"
"Goats", "b-01", "abc-721#"
"CATS", "001", "abc-451#CompanyOrg"
I tried the following
StandardTokenParsers
import scala.util.parsing.combinator.syntactical._
val p = new StandardTokenParsers {
lexical.reserved ++= List("/", "?", "XXX=")
def p = "/" ~ opt(ident) ~ "/" ~ opt(ident) ~ "?" ~ "XXX=" ~ opt(ident)
}
p: scala.util.parsing.combinator.syntactical.StandardTokenParsers{def p: this.Parser[this.~[this.~[this.~[String,Option[String]],String],Option[String]]]} = $anon$1@6ca97ddf
scala> p.p(new p.lexical.Scanner("/horses/c132?XXX=abc-049#companyorg"))
warning: there was one feature warning; re-run with -feature for details
res3: p.ParseResult[p.~[p.~[p.~[String,Option[String]],String],Option[String]]] =
[1.1] failure: ``/'' expected but ErrorToken(illegal character) found
/horses/c132?XXX=abc-049#companyorg
^
RegEx
import scala.util.matching.Regex
val p1 = "(/)(.*)(/)(.*)(?)(XXX)(=)(.*)".r
p1: scala.util.matching.Regex = (/)(.*)(/)(.*)(?)(XXX)(=)(.*)
scala> val p1(_,animal,_,id,_,_,_,company) = "/horses/c132?XXX=abc-049#companyorg"
scala.MatchError: /horses/c132?XXX=abc-049#companyorg (of class java.lang.String)
... 32 elided
Can someone please help? Thanks!
Your pattern looks like /(desired-group1)/(desired-group2)?XXX=(desired-group3).
So, regex would be
scala> val extractionPattern = """(/)(.*)(/)(.*)(\?XXX=)(.*)""".r
extractionPattern: scala.util.matching.Regex = (/)(.*)(/)(.*)(\?XXX=)(.*)
Note the escaped ? character; unescaped, ? is a regex metacharacter rather than a literal question mark.
How it is going to work is,
Full match `/horses/c132?XXX=abc-049#companyorg`
Group 1. `/`
Group 2. `horses`
Group 3. `/`
Group 4. `c132`
Group 5. `?XXX=`
Group 6. `abc-049#companyorg`
Now, apply the regex which gives you the group of all matches
scala> extractionPattern.findAllIn("""/horses/c132?XXX=abc-049#companyorg""")
.matchData.flatMap{m => m.subgroups}.toList
res15: List[String] = List(/, horses, /, c132, ?XXX=, abc-049#companyorg)
Since you only care about the 2nd, 4th and 6th groups, collect only those (indices 1, 3 and 5 of subgroups).
So the solution would look like,
scala> extractionPattern.findAllIn("""/horses/c132?XXX=abc-049#companyorg""")
.matchData.map(_.subgroups)
.flatMap(matches => Seq(matches(1), matches(3), matches(5))).toList
res16: List[String] = List(horses, c132, abc-049#companyorg)
When your input does not match the regex, you get an empty result
scala> extractionPattern.findAllIn("""/horses/c132""")
.matchData.map(_.subgroups)
.flatMap(matches => Seq(matches(1), matches(3), matches(5))).toList
res17: List[String] = List()
Working regex here - https://regex101.com/r/HuGRls/1/
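An alternative that avoids indexing into subgroups is to capture only the three wanted parts and let Scala's regex extractor bind them by name. This is a sketch under the assumption that the animal name contains no slash and the id contains no question mark; AnimalUrl and parse are hypothetical names.

```scala
object AnimalPath {
  // Only the wanted pieces are captured; the separators stay outside the groups.
  val AnimalUrl = """/([^/]+)/([^?]+)\?XXX=(.*)""".r

  // A regex used as an extractor must match the whole input string.
  def parse(s: String): Option[(String, String, String)] = s match {
    case AnimalUrl(animal, id, company) => Some((animal, id, company))
    case _                              => None
  }
}
```

Returning an Option makes the non-matching case explicit instead of throwing a MatchError.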

How to pull string value in url using scala regex?

I have the URLs below in my application, and I want to extract one of the values from them.
For example:
rapidView value: 416
Input URL: http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&
Output should be: 416
I've written the code in scala using import java.util.regex.{Matcher, Pattern}
val p: Pattern = Pattern.compile("[?&]rapidView=(\\d+)[?&]")
val m:Matcher = p.matcher(url)
if(m.find())
println(m.group(1))
I am getting the expected output, but I want to migrate this code to the scala.util.matching library.
What is a simple way to implement this?
The code above works with the Java utilities.
In Scala, you may use an unanchored regex within a match block to get just the captured part:
val s = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
val pattern ="""[?&]rapidView=(\d+)""".r.unanchored
val res = s match {
case pattern(rapidView) => rapidView
case _ => ""
}
println(res)
// => 416
See the Scala demo
Details:
"""[?&]rapidView=(\d+)""".r.unanchored - the triple quoted string literal allows using single backslashes with regex escapes, and the .unanchored property makes the regex match partially, not the entire string
pattern(rapidView) gets the 1 or more digits part (captured with (\d+)) if a pattern finds a partial match
case _ => "" will return an empty string upon no match.
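The effect of .unanchored can be seen by running the same pattern both ways. This is a minimal sketch; the extract helper is a made-up name.

```scala
import scala.util.matching.Regex

object AnchoredVsUnanchored {
  val url = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"

  val anchored: Regex   = """[?&]rapidView=(\d+)""".r
  val unanchored: Regex = anchored.unanchored

  // Uses the given regex as an extractor; falls back to "" when it does not match.
  def extract(r: Regex): String = url match {
    case r(id) => id
    case _     => ""
  }

  val fromAnchored: String   = extract(anchored)   // whole URL must match: no match
  val fromUnanchored: String = extract(unanchored) // partial match is enough
}
```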
You can do this quite easily with Scala:
scala> val url = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
url: String = http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&
scala> url.split("rapidView=").tail.head.split("&").head
res0: String = 416
You can also extend it by parameterizing the search word:
scala> def searchParam(sp: String) = sp + "="
searchParam: (sp: String)String
scala> val sw = "rapidView"
sw: String = rapidView
And just search with the parameter name
scala> url.split(searchParam(sw)).tail.head.split("&").head
res1: String = 416
scala> val sw2 = "projectKey"
sw2: String = projectKey
scala> url.split(searchParam(sw2)).tail.head.split("&").head
res2: String = DSCI
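One caveat with the split approach is that .tail.head throws when the parameter is absent. A possible regex-based variant that returns an Option instead is sketched below; queryParam is a hypothetical helper, and it assumes values end at the next & or #.

```scala
import scala.util.matching.Regex

object QueryParam {
  // Looks up a query parameter by name; None when it is not present.
  def queryParam(url: String, name: String): Option[String] = {
    // Regex.quote keeps any special characters in the name literal.
    val pattern = ("[?&]" + Regex.quote(name) + "=([^&#]*)").r
    pattern.findFirstMatchIn(url).map(_.group(1))
  }

  val url = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
}
```

For example, QueryParam.queryParam(QueryParam.url, "projectKey") yields Some("DSCI"), and an unknown name yields None.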

strange behaviour with filter?

I want to extract MIME-like headers (starting with [Cc]ontent- ) from a multiline string:
scala> val regex = "[Cc]ontent-".r
regex: scala.util.matching.Regex = [Cc]ontent-
scala> headerAndBody
res2: String =
"Content-Type:application/smil
Content-ID:0.smil
content-transfer-encoding:binary
<smil><head>
"
This fails
scala> headerAndBody.lines.filter(x => regex.pattern.matcher(x).matches).toList
res4: List[String] = List()
but the "related" cases work as expected:
scala> headerAndBody.lines.filter(x => regex.pattern.matcher("Content-").matches).toList
res5: List[String] = List(Content-Type:application/smil, Content-ID:0.smil, content-transfer-encoding:binary, <smil><head>)
and:
scala> headerAndBody.lines.filter(x => x.startsWith("Content-")).toList
res8: List[String] = List(Content-Type:application/smil, Content-ID:0.smil)
what am I doing wrong in
x => regex.pattern.matcher(x).matches
since it returns an empty List??
The reason for the failure with the first line is that you use the java.util.regex.Matcher.matches() method that requires a full string match.
To fix that, use the Matcher.find() method that searches for the match anywhere inside the input string and use the "^[Cc]ontent-" regex (note that the ^ symbol will force the match to appear at the start of the string).
Note that this line of code does not work as you expect:
headerAndBody.lines.filter(x => regex.pattern.matcher("Content-").matches).toList
You run the regex check against the constant string "Content-" instead of the line x, so it is always true (that is why you get all the lines in the result).
See this IDEONE demo:
val headerAndBody = "Content-Type:application/smil\nContent-ID:0.smil\ncontent-transfer-encoding:binary\n<smil><head>"
val regex = "^[Cc]ontent-".r
val s1 = headerAndBody.lines.filter(x => regex.pattern.matcher(x).find()).toList
println(s1)
val s2 = headerAndBody.lines.filter(x => regex.pattern.matcher("Content-").matches).toList
print (s2)
Results (the first is the fix, and the second shows that your second line of code does not filter anything):
List(Content-Type:application/smil, Content-ID:0.smil, content-transfer-encoding:binary)
List(Content-Type:application/smil, Content-ID:0.smil, content-transfer-encoding:binary, <smil><head>)
Your regexp must match the whole line, not just its leading substring.
val regex = "[Cc]ontent-.*".r
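Both suggestions can be put side by side. The sketch below uses Regex.findPrefixOf, which succeeds only when the match starts at the beginning of the string, and String.matches with a trailing .* for the whole-line approach. The object and value names are illustrative, and the input is split on "\n" purely to keep the example self-contained.

```scala
object HeaderFilter {
  val headerAndBody: String =
    "Content-Type:application/smil\nContent-ID:0.smil\ncontent-transfer-encoding:binary\n<smil><head>"

  val prefix = "[Cc]ontent-".r

  val lines: List[String] = headerAndBody.split("\n").toList

  // Prefix test: the regex only has to match at the start of each line.
  val viaPrefix: List[String] =
    lines.filter(line => prefix.findPrefixOf(line).isDefined)

  // Whole-line test: matches() needs the pattern to cover the entire line.
  val viaMatches: List[String] =
    lines.filter(_.matches("[Cc]ontent-.*"))
}
```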

Scala regex pattern match groups different from that using findAllIn

I find that the groups extracted by Pattern-matching on regex's in Scala are different from those extracted using findAllIn function.
1) Here is an example of extraction using pattern match -
scala> val fullRegex = """(.+?)=(.+?)""".r
fullRegex: scala.util.matching.Regex = (.+?)=(.+?)
scala> val x = """a='b'"""
x: String = a='b'
scala> x match { case fullRegex(l,r) => println( l ); println(r) }
a
'b'
2) And here is an example of extraction using the findAllIn function -
scala> fullRegex.findAllIn(x).toArray
res4: Array[String] = Array(a=')
I was expecting the returned Array using findAllIn to be Array(a, 'b'). Why is it not so?
This is because nothing tells the second lazy group how far it should extend. With findAllIn, the group after = consumes just one character and stops, because it is lazy and nothing follows it in the pattern.
See here.
https://regex101.com/r/dU7oN5/10
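Change it to (.+?)=(.+) so that the second group is greedy and findAllIn returns the whole of a='b'. Note that findAllIn yields whole matches, not capture groups, so on its own it still will not return Array(a, 'b').
In particular, pattern matching goes through unapplySeq, which uses Matcher.matches, while findAllIn uses Matcher.find. matches must match the entire input, so the lazy .+? is forced to expand; find stops at the shortest match.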
scala> import java.util.regex._
import java.util.regex._
scala> val p = Pattern compile ".+?"
p: java.util.regex.Pattern = .+?
scala> val m = p matcher "hello"
m: java.util.regex.Matcher = java.util.regex.Matcher[pattern=.+? region=0,5 lastmatch=]
scala> m.matches
res0: Boolean = true
scala> m.group
res1: String = hello
scala> m.reset
res2: java.util.regex.Matcher = java.util.regex.Matcher[pattern=.+? region=0,5 lastmatch=]
scala> m.find
res3: Boolean = true
scala> m.group
res4: String = h
scala>
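To get the capture groups rather than the whole match from a find-style search, one option is findFirstMatchIn together with subgroups. The following sketch (names chosen for illustration) puts the three behaviours next to each other:

```scala
object GroupsVsMatches {
  val fullRegex = """(.+?)=(.+?)""".r
  val greedy    = """(.+?)=(.+)""".r
  val x         = "a='b'"

  // find semantics, whole matches only: the lazy tail stops after one character.
  val wholeMatches: List[String] = fullRegex.findAllIn(x).toList

  // find semantics, capture groups of the first match.
  val lazyGroups: List[String] =
    fullRegex.findFirstMatchIn(x).map(_.subgroups).getOrElse(Nil)
  val greedyGroups: List[String] =
    greedy.findFirstMatchIn(x).map(_.subgroups).getOrElse(Nil)

  // Extractor (matches semantics): the pattern must cover the entire input.
  val viaExtractor: Option[(String, String)] = x match {
    case fullRegex(l, r) => Some((l, r))
    case _               => None
  }
}
```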