How to use "\w+" to find words in a string? - regex

I need to write a function that takes a string as input and returns a List[String]. As a requirement for this task, I have to use the regular expression "\w+" and ".findAllIn". Given a line of random text with a few actual words dotted around inside it, I need to collect all of these 'proper' words into the list to be returned. I have tried the following:
def foo(stringIn: String) : List[String] = {
val regEx = """\w+""".r
val match = regEx.findAllIn(s).toList
match
}
But it just returns the string that I pass into the function.

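match is a reserved keyword in Scala, so it cannot be used as a variable name; rename it or return the expression directly. Also, the parameter is named stringIn, but the code passes s to findAllIn.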
def foo(stringIn: String) : List[String] = {
val regEx = """\w+""".r
regEx.findAllIn(stringIn).toList
}
scala> foo("hey. how are you?")
res17: List[String] = List(hey, how, are, you)
\\w is the pattern for a word character. In the current regex context it is equal to [a-zA-Z_0-9], so it matches lowercase and uppercase letters, digits and the underscore.
\\w+ matches one or more occurrences of the above.
scala> foo("hey")
res18: List[String] = List(hey)
In the above case, the whole input is a single run of word characters, so the only match is the original string.
scala> foo("hey-hey")
res20: List[String] = List(hey, hey)
- is not part of \\w, so it ends the first match; findAllIn then returns the word on either side of it.
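As a small sketch of the behaviour described above, the following runs the same \w+ pattern over noisy input. The helper name words and the sample string are made up for illustration; only the regex and findAllIn come from the question.

```scala
// Hypothetical helper: collects every run of word characters in the input.
def words(text: String): List[String] = {
  val wordPattern = """\w+""".r
  wordPattern.findAllIn(text).toList
}

// Punctuation and spaces separate matches; digits and underscores
// count as word characters, so they stay inside a match.
val noisyInput = "##hello,, wor_ld 42!! -- bye"
val foundWords = words(noisyInput)
println(foundWords)

// Input with no word characters at all yields an empty list.
println(words("!!!"))
```

Note that "42" and "wor_ld" are returned too, so if only alphabetic words are wanted, a narrower character class than \w would be needed.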

Related

How do i use the regex in scala to check the first 3 chars of filename

What is the Scala code to check whether the first 3 characters of a file name are letters?
I want a Boolean to be returned: if the first 3 chars of the file name are letters, then true needs to be returned, otherwise false.
val fileName = "ABC1234.dat"
val regex = "[A-Z]*".r
val result = fileName.substring(0,3) match {
case regex(fileName) => true
case _ => false
}
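You could use findFirstIn with the pattern ^[A-Za-z]{3}, which matches three characters in a-zA-Z at the start of the string, or use ^\\p{L}{3} to match any letter from any language, and then check for nonEmpty on the Option. The ^ anchor matters: without it, findFirstIn would also accept three letters anywhere in the name.
val fileName = "ABC1234.dat"
val regex = "^[A-Za-z]{3}".r
regex.findFirstIn(fileName).nonEmpty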
Output
res0: Boolean = true
If you want to use substring with matches as in the comment, note that matches takes the regex as a string and the regex has to match the whole input.
fileName.substring(0,3).matches("(?i)[a-z]{3}")
Note that substring will throw a StringIndexOutOfBoundsException if the string is shorter than the specified indices, whereas findFirstIn returns None in that case, so the nonEmpty check yields false.
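One way to keep the substring-style check without the exception is to use take, which returns at most the requested number of characters. This is only a sketch; the helper name startsWithThreeLetters is hypothetical and the check is limited to ASCII letters.

```scala
// Hypothetical helper: true when the first three characters are ASCII letters.
// take(3) never throws; on a shorter string it returns the whole string,
// which then fails the {3} quantifier.
def startsWithThreeLetters(fileName: String): Boolean =
  fileName.take(3).matches("[A-Za-z]{3}")

println(startsWithThreeLetters("ABC1234.dat"))
println(startsWithThreeLetters("12ABC.dat"))
println(startsWithThreeLetters("ab"))
```

Because matches requires the whole (three-character) prefix to match, no ^ anchor is needed here.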

How to convert string that contains only characters and numbers in scala?

I have a String of characters, numbers, symbols and slashes. I want to remove everything except the letters and numbers.
my String is like val mystring="abd#1098\jaka.kdcs"
I want only abd1098jakakdcs
You can use the isLetterOrDigit method on Char to filter the required chars from the string.
scala> val str = "abd#1098\\jaka.kdcs"
str: String = abd#1098\jaka.kdcs
scala> str.filter(_.isLetterOrDigit)
res3: String = abd1098jakakdcs
Alternatively, you can use a regular expression to check that a string contains only letters and digits.
Example: scala> "34Az".matches("[a-zA-Z0-9]{4}")
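If a regex-based removal is preferred over filter, a negated character class with replaceAll is one possible sketch. The names rawText and cleanedText are made up here; the sample string is the one from the question.

```scala
// Sample input from the question; the backslash is escaped in the literal.
val rawText = "abd#1098\\jaka.kdcs"

// Remove every character that is NOT an ASCII letter or digit.
val cleanedText = rawText.replaceAll("[^A-Za-z0-9]", "")
println(cleanedText)
```

Unlike isLetterOrDigit, this character class is ASCII-only, so accented or non-Latin letters would be removed as well.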

Scala regex get parameters in path

regex noob here.
example path:
home://Joseph/age=20/race=human/height=170/etc
Using regex, how do I grab everything after the "=" between the /Joseph/ path and /etc? I'm trying to create a list like
[20, human, 170]
So far I have
val pattern = ("""(?<=Joseph/)[^/]*""").r
val matches = pattern.findAllIn(path)
The pattern lets me just get "age=20" but I thought findAllIn would let me find all of the "parameter=" matches. And after that, I'm not sure how I would use regex to just obtain the "20" in "age=20", etc.
Code
(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)
Usage
object Main extends App {
val path = "home://Joseph/age=20/race=human/height=170/etc"
val pattern = ("""(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)""").r
pattern.findAllIn(path).matchData foreach {
m => println(m.group(1))
}
}
Results
Input
home://Joseph/age=20/race=human/height=170/etc
Output
20
human
170
Explanation
(?:(?<=/Joseph/)|\G(?!\A)/) Match either of the following
  (?<=/Joseph/) Positive lookbehind ensuring what precedes matches /Joseph/ literally
  \G(?!\A)/ Assert position at the end of the previous match (the negative lookahead (?!\A) rules out the start of the string), then match / literally
[^=]+ Match one or more of any character except =
= Match this literally
([^=/]+) Capture one or more of any character except = and / into capture group 1
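Your pattern only looks at the text directly after Joseph/, which is why only age=20 matched. Maybe just look after each = instead? Note that this simpler pattern picks up the value after every = in the string, not only those between /Joseph/ and /etc.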
val s = "home://Joseph/age=20/race=human/height=170/etc"
// s: String = home://Joseph/age=20/race=human/height=170/etc
val pattern = "(?<==)[^/]*".r
// pattern: scala.util.matching.Regex = (?<==)[^/]*
pattern.findAllIn(s).toList
// res3: List[String] = List(20, human, 170)
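If the parameter names are useful as well, capture groups can collect both sides of each = in one pass. The following is a sketch rather than either answerer's method; the names paramPath, keyValue, params and paramValues are hypothetical, and like the simpler pattern it does not restrict matches to the part after /Joseph/.

```scala
val paramPath = "home://Joseph/age=20/race=human/height=170/etc"

// Group 1 is the name before '=', group 2 is the value after it.
// Neither side may contain '/' or '='.
val keyValue = """([^/=]+)=([^/=]+)""".r

// Build a name -> value map from every match.
val params: Map[String, String] =
  keyValue.findAllMatchIn(paramPath).map(m => m.group(1) -> m.group(2)).toMap

// Or keep only the values, in the order they appear.
val paramValues: List[String] =
  keyValue.findAllMatchIn(paramPath).map(_.group(2)).toList

println(params)
println(paramValues)
```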

Cannot retrieve a group from Scala Regex match

I am struggling with regexps in Scala (2.11.5). I have the following string to parse (example):
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
I want to extract the third numeric value in the string above (it needs to be the third after a slash because there can be other groups following). To do that, I have the following regex pattern:
val pattern = """\/\d+,\d+,(\d+)""".r
I have been trying to retrieve the group for the third sequence of digits, but nothing seems to work for me.
val matchList = pattern.findAllMatchIn(string).foreach(println)
val matchListb = pattern.findAllIn(string).foreach(println)
I also tried pattern matching.
string match {
case pattern(a) => println(a)
case _ => "What's going on?"
}
and got the same results. Either the whole regexp match is returned or nothing.
Is there an easy way to retrieve a group from a regexp pattern in Scala?
You can use the group method of scala.util.matching.Regex.Match to get the result.
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """\/\d+,\d+,(\d+)""".r
val result = pattern.findAllMatchIn(string) // returns iterator of Match
.toArray
.headOption // returns None if match fails
.map(_.group(1)) // select first regex group
// or simply
val result = pattern.findFirstMatchIn(string).map(_.group(1))
// result = Some(14058913)
// result will be None if the string does not match the pattern.
// if you have more than one group, for instance:
// val pattern = """\/(\d+),\d+,(\d+)""".r
// result will be Some(56)
Pattern matching is usually the easiest way to do it, but it requires a match on the full string, so you'll have to prefix and suffix your regex pattern with .*:
val string = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"
val pattern = """.*\/\d+,\d+,(\d+).*""".r
val pattern(x) = string
// x: String = 14058913
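As an alternative to wrapping the pattern in .*, Regex has an unanchored method that lets the extractor match anywhere in the input. The sketch below assumes a Scala version where unanchored is available; the names articleUrl, thirdNumber and extractedId are made up for illustration.

```scala
val articleUrl = "http://sth.com/sth/56,57597,14058913,Article_title,,5.html"

// unanchored drops the requirement that the whole string must match
// when the regex is used as an extractor in a pattern match.
val thirdNumber = """/\d+,\d+,(\d+)""".r.unanchored

val extractedId: Option[String] = articleUrl match {
  case thirdNumber(id) => Some(id)   // id is bound to capture group 1
  case _               => None       // no match anywhere in the string
}

println(extractedId)
```

Returning an Option here avoids the MatchError that val pattern(x) = string would throw on input that does not match.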

Scala regex "starts with lowercase alphabets" not working

val AlphabetPattern = "^([a-z]+)".r
def stringMatch(s: String) = s match {
case AlphabetPattern() => println("found")
case _ => println("not found")
}
If I try,
stringMatch("hello")
I get "not found", but I expected to get "found".
My understanding of the regex,
[a-z] = in the range of 'a' to 'z'
+ = one or more of the previous pattern
^ = starts with
So regex AlphabetPattern is "all strings that start with one or more alphabets in the range a-z"
Surely I am missing something, want to know what.
Replace case AlphabetPattern() with case AlphabetPattern(_) and it works. The extractor pattern takes a variable to which it binds the result. Here we discard it but you could use x or whatever.
edit: Further to Randall's comment below, if you check the docs for Regex you'll see that it has an unapplySeq rather than an unapply method, which means it takes multiple variables. If you have the wrong number, it won't match, rather like
list match { case List(a,b,c) => a + b + c }
won't match if list doesn't have exactly 3 elements.
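The arity rule can be checked with a short sketch. All names below (oneGroup, noGroup and the helper functions) are hypothetical. It also shows that an extractor pattern must match the whole string, so for a "starts with" check something like findPrefixOf may be closer to the intent.

```scala
import scala.util.matching.Regex

val oneGroup: Regex = "^([a-z]+)".r   // one capture group
val noGroup: Regex  = "^[a-z]+".r     // no capture groups

// One binder for one group: the arity agrees, so this can match.
def matchesWithOneBinder(s: String): Boolean = s match {
  case oneGroup(_) => true
  case _           => false
}

// Zero binders for one group: the arity disagrees, so this never matches.
def matchesWithNoBinder(s: String): Boolean = s match {
  case oneGroup() => true
  case _          => false
}

// Zero binders for zero groups: the arity agrees again.
def matchesGrouplessPattern(s: String): Boolean = s match {
  case noGroup() => true
  case _         => false
}

// findPrefixOf only requires a match at the start of the string.
def startsWithLowercase(s: String): Boolean =
  oneGroup.findPrefixOf(s).nonEmpty

println(matchesWithOneBinder("hello"))
println(matchesWithNoBinder("hello"))
println(matchesWithOneBinder("hello world"))
println(startsWithLowercase("hello world"))
```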
The match fails because of how the pattern is written: AlphabetPattern contains one capture group, but case AlphabetPattern() binds no variables, so that case does not match and s falls through to _. Alternatively, use one of the find methods in scala.util.matching.Regex to look for a match with the given Regex.
For example, use findFirstIn to find the first match of AlphabetPattern in a string.
scala> AlphabetPattern.findFirstIn("hello")
res0: Option[String] = Some(hello)
The stringMatch method using findFirstIn and a case statement:
scala> def stringMatch(s: String) = AlphabetPattern findFirstIn s match {
| case Some(s) => println("Found: " + s)
| case None => println("Not found")
| }
stringMatch: (s:String)Unit
scala> stringMatch("hello")
Found: hello