How can you get a submap of a map by matching the values against a string pattern? For example, I have this map:
def map = [val1:'ATOPKLPP835', val2:'ATOPKLPP847', val3:'ATOPKLPP739', val4:'YYHSTYSTX439', val5:'UUSTETSFEE34']
The first three values share the same first eight characters. I would like to get a submap containing only the entries whose value starts with the string "ATOPKLPP". How can I do this?
Have a look at this:
def map = [val1:'ATOPKLPP835', val2: 'ATOPKLPP847', val3:'ATOPKLPP739', val4:'YYHSTYSTX439', val5:'UUSTETSFEE34']
def found = map.findAll { it.value.startsWith('ATOPKLPP')}
assert found == [val1:'ATOPKLPP835', val2:'ATOPKLPP847', val3:'ATOPKLPP739']
You can define whatever criterion you need in the closure passed to findAll.
I am starting to learn Scala and want to use regular expressions to match a character from a string so I can populate a mutable map of characters and their value (String values, numbers etc) and then print the result.
I have looked at several answers on SO and gone over the Scala Docs but can't seem to get this right. I have a short Lexer class that currently looks like this:
class Lexer {
  private val tokens: mutable.Map[String, Any] = collection.mutable.Map()

  private def checkCharacter(char: Character): Unit = {
    val Operator = "[-+*/^%=()]".r
    val Digit = "[\\d]".r
    val Other = "[^\\d][^-+*/^%=()]".r
    char.toString match {
      case Operator(c) => tokens(c) = "Operator"
      case Digit(c) => tokens(c) = Integer.parseInt(c)
      case Other(c) => tokens(c) = "Other" // Temp value, write function for this
    }
  }

  def lex(input: String): Unit = {
    val inputArray = input.toArray
    for (s <- inputArray)
      checkCharacter(s)
    for ((key, value) <- tokens)
      println(key + ": " + value)
  }
}
I'm pretty confused by the sort of strange method syntax, Operator(c), that I have seen being used to handle the value to match and am also unsure if this is the correct way to use regex in Scala. I think what I want this code to do is clear, I'd really appreciate some help understanding this. If more info is needed I will supply what I can
The official documentation has lots of examples: https://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html. What might be confusing is the type of the regular expression and its use in pattern matching...
You can construct a regex from any string by using .r:
scala> val regex = "(something)".r
regex: scala.util.matching.Regex = (something)
The resulting Regex object has several useful methods for finding matches, such as findAllIn.
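As a brief illustration of findAllIn, here is a minimal sketch. The FindAllSketch object, its method names and the word pattern are made up for this example and are not part of the original answer:

```scala
import scala.util.matching.Regex

object FindAllSketch {
  // Hypothetical pattern: one or more ASCII letters or digits.
  private val Word: Regex = "[A-Za-z0-9]+".r

  // findAllIn returns an iterator over the matched substrings;
  // toList materializes it so the matches can be inspected.
  def words(text: String): List[String] = Word.findAllIn(text).toList

  // findFirstIn returns only the first match, wrapped in an Option.
  def firstWord(text: String): Option[String] = Word.findFirstIn(text)
}
```

For instance, FindAllSketch.words("Cricket Inter-house") yields List(Cricket, Inter, house), since the space and the hyphen are not part of the character class.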
In Scala it's idiomatic to use pattern matching for safe extraction of values, so the Regex class also has an unapplySeq method to support pattern matching. This makes it an extractor object. You can call it directly (not common):
scala> regex.unapplySeq("something")
res1: Option[List[String]] = Some(List(something))
or you can let the Scala compiler call it for you when you do pattern matching:
scala> "something" match {
| case regex(x) => x
| case _ => ???
| }
res2: String = something
You might ask why unapply/unapplySeq has exactly this return type. The documentation explains it very well:
The return type of an unapply should be chosen as follows:
If it is just a test, return a Boolean. For instance case even().
If it returns a single sub-value of type T, return an Option[T].
If you want to return several sub-values T1,...,Tn, group them in an optional tuple Option[(T1,...,Tn)].
Sometimes, the number of values to extract isn’t fixed and we would like to return an arbitrary number of values, depending on the input. For this use case, you can define extractors with an unapplySeq method which returns an Option[Seq[T]]. Common examples of these patterns include deconstructing a List using case List(x, y, z) => and decomposing a String using a regular expression Regex, such as case r(name, remainingFields @ _*) =>
In short, your regex might contain any number of groups, so the extractor has to return a list/seq, and that list is wrapped in an Option to comply with the extractor contract.
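To make the "several groups" case concrete, here is a small sketch with three capturing groups. The GroupSketch object and the date pattern are assumptions chosen for illustration only:

```scala
import scala.util.matching.Regex

object GroupSketch {
  // Three capturing groups, so the extractor yields three values.
  private val Date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r

  // Pattern matching calls unapplySeq behind the scenes and binds one
  // variable per capturing group. The whole string has to match.
  def describe(s: String): String = s match {
    case Date(year, month, day) => s"year=$year month=$month day=$day"
    case _                      => "no match"
  }

  // The same extraction done by calling unapplySeq directly.
  def groups(s: String): Option[List[String]] = Date.unapplySeq(s)
}
```

Here GroupSketch.groups("2017-03-09") returns Some(List(2017, 03, 09)), and an input that does not match the whole pattern returns None.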
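The way you are using regex is essentially correct, with one caveat: for case Operator(c) to bind c, the pattern needs a capturing group, which is why the regexes below are wrapped in parentheses. I would also map your function over the input array to avoid creating mutable maps. Perhaps something like this: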
class Lexer {
  private def getCharacterType(char: Character): Any = {
    val Operator = "([-+*/^%=()])".r
    val Digit = "([\\d])".r
    //val Other = "[^\\d][^-+*/^%=()]".r
    char.toString match {
      case Operator(c) => "Operator"
      case Digit(c) => Integer.parseInt(c)
      case _ => "Other" // Temp value, write function for this
    }
  }

  def lex(input: String): Unit = {
    val inputArray = input.toArray
    val tokens = inputArray.map(x => x -> getCharacterType(x))
    for ((key, value) <- tokens)
      println(key + ": " + value)
  }
}
scala> val l = new Lexer()
l: Lexer = Lexer@60f662bd
scala> l.lex("a-1")
a: Other
-: Operator
1: 1
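Returning Any means the caller can no longer tell an operator from a digit without casting. One possible refinement, sketched here under the assumption that a small sealed hierarchy is acceptable (the names Token, Operator, Digit, Other and TypedLexer are hypothetical and not from the question), is to return a dedicated type and leave printing to the caller:

```scala
// Hypothetical token types, introduced only for this sketch.
sealed trait Token
final case class Operator(symbol: Char) extends Token
final case class Digit(value: Int) extends Token
final case class Other(char: Char) extends Token

object TypedLexer {
  // Same patterns as above, each with one capturing group.
  private val OperatorPattern = "([-+*/^%=()])".r
  private val DigitPattern = "(\\d)".r

  // Classify a single character using the regex extractors.
  def classify(char: Char): Token = char.toString match {
    case OperatorPattern(_) => Operator(char)
    case DigitPattern(d)    => Digit(d.toInt)
    case _                  => Other(char)
  }

  // Pair every input character with its token, preserving order.
  def lex(input: String): List[(Char, Token)] =
    input.toList.map(c => c -> classify(c))
}
```

With this, TypedLexer.lex("a-1") gives List((a,Other(a)), (-,Operator(-)), (1,Digit(1))), and the compiler can check that a match on Token covers every case.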
I have a list in Groovy which contains the names in the below format:
My_name_is_Jack
My_name_is_Rock
My_name_is_Sunn
How can I trim the list and get only the last part of it; i.e. Names - Jack, Rock and Sunn. (Please note that the names are only 4 characters long)
Here are two possible approaches.
You can use substring with lastIndexOf,
or the replace method to replace My_name_is_ with an empty string.
Script (using the first approach):
def list = ['My_name_is_Jack', 'My_name_is_Rock', 'My_name_is_Sunn']
//Closure to get the name
def getName = { s -> s.substring(s.lastIndexOf('_')+1, s.size()) }
println list.collect{getName it}
If you want to use replace, use the closure below instead.
def getName = { s -> s.replace('My_name_is_','') }
You can quickly try it in an online demo.
Or
def list = ['My_name_is_Jack', 'My_name_is_Rock', 'My_name_is_Sunn']
println list*.split('_')*.getAt(-1)
You can either remove the common prefix:
def names = [ "My_name_is_Jack", "My_name_is_Rock", "My_name_is_Sunn", ]
assert ['Jack', 'Rock', 'Sunn'] == names*.replaceFirst('My_name_is_','')
or, since you are actually interested in the last four characters, you can also take those:
assert ['Jack', 'Rock', 'Sunn'] == names*.getAt(-4..-1)
How do I replace a comma, quote and right parenthesis at the same time, i.e. ,') with ), in Groovy?
I tried replaceAll with a double escape:
value = "('cat','rat',',')";
//Replace ,') with )
value = value.replaceAll('\\,')',')');
I also tried these, with no luck:
How can I replace a string in parentheses using a regex?
How to escape comma and double quote at same time for CSV file?
Your question is a bit confusing, but to replace ,') you don't need escapes at all. Simply use
def value = "('cat','rat',',')";
println value.replace(",')", ")"); // ('cat','rat',')
However, I think you actually want the result ('cat','rat'), right?
If so, you can use the following code, using Pattern:
import java.util.regex.Pattern
def value = "('cat','rat',',')";
def pattern = Pattern.compile(",'\\)");
def matcher = pattern.matcher(value);
while (matcher.find()) {
    value = matcher.replaceAll(")");
    matcher = pattern.matcher(value);
}
println value; // ('cat','rat')
Explanation:
The second occurrence of ,') is not in the original string; it only appears as a result of the first replacement. That is why a single pass is not enough: we create a new matcher in the loop and let it search the updated string again until nothing is left to replace.
I am trying to filter out rows of a text file whose second column value begins with words from a list.
I have the list such as:
val mylist = ["Inter", "Intra"]
If I have a row like:
Cricket Inter-house
Inter is in the list, so that row should get filtered out by the RDD.filter operation. I am using the following regex:
`[A-Za-z0-9]+`
I tried using """[A-Za-z0-9]+""".r to extract the substring, but the result is a non-empty iterator.
My question is how to access the above result in the filter operation?
You need to construct a regular expression like ".* Inter.*".r, since """[A-Za-z0-9]+""" matches any word. Here is a working example; I hope it helps:
val mylist = List("Inter", "Intra")
val textRdd = sc.parallelize(List("Cricket Inter-house", "Cricket Int-house",
"AAA BBB", "Cricket Intra-house"))
// map over my list to dynamically construct regular expressions and check if it is within
// the text and use reduce to make sure none of the pattern exists in the text, you have to
// call collect() to see the result or take(5) if you just want to see the first five results.
(textRdd.filter(text => mylist.map(word => !(".* " + word + ".*").r
.pattern.matcher(text).matches).reduce(_&&_)).collect())
// res1: Array[String] = Array(Cricket Int-house, AAA BBB)
filter keeps only the elements for which the function passed to it returns true, so the predicate has to return false for the rows you want to drop. A regex isn't strictly needed for this. Instead, let's develop a function that takes a row and a candidate string and returns true if the second column in that row starts with the candidate:
val filterFunction: (String, String) => Boolean =
(row, candidate) => row.split(" ").tail.head.startsWith(candidate)
We can convince ourselves that this works pretty easily using a worksheet:
// Test data
val mylist = List("Inter", "Intra")
val file = List("Cricket Inter-house", "Boom Shakalaka")
filterFunction("Cricket Inter-house", "Inter") // true
filterFunction("Cricket Inter-house", "Intra") // false
filterFunction("Boom Shakalaka", "Inter") // false
filterFunction("Boom Shakalaka", "Intra") // false
Now all that remains is to use this function in the filter. Essentially, for every row, we want to run the function against every item in our candidate list. That means taking the candidate list and 'folding left' to check every item on it against the function. If any candidate reports true, then we know that row should be filtered out of the final result:
val result = file.filter((row: String) => {
  !mylist.foldLeft(false)((x: Boolean, candidate: String) => {
    x || filterFunction(row, candidate)
  })
})
// result: List[String] = List(Boom Shakalaka)
The above can be a little dense to unpack. We are passing to the filter method a function that takes in a row and produces a boolean value. We want that value to be true if and only if the row does not match our criteria. We've already embedded our criteria in the filterFunction: we just need to run it against every combination of item in mylist.
To do this we use foldLeft, which takes a starting value (in this case false) and iteratively moves through the list, updating that starting value and returning the final result.
To 'update' that value we write a function that logically-ORs the starting value with the result of running our filter function against the row and the current item in mylist.
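The fold with logical OR described above behaves like the standard exists method on collections, so the same filter can be written more compactly. The following is a sketch on plain Lists rather than an RDD; FilterSketch and its member names are hypothetical, and the guard for rows without a second column is an addition that the original function does not have:

```scala
object FilterSketch {
  // True if the row has a second column and it starts with the candidate.
  // drop(1).headOption avoids an exception on single-column rows.
  val startsWithCandidate: (String, String) => Boolean =
    (row, candidate) =>
      row.split(" ").drop(1).headOption.exists(_.startsWith(candidate))

  // Keep only rows whose second column matches none of the candidates.
  // exists stops at the first candidate that matches, like a chain of ||.
  def dropMatching(rows: List[String], candidates: List[String]): List[String] =
    rows.filterNot(row => candidates.exists(c => startsWithCandidate(row, c)))
}
```

Calling FilterSketch.dropMatching(List("Cricket Inter-house", "Boom Shakalaka"), List("Inter", "Intra")) returns List(Boom Shakalaka), the same result as the foldLeft version.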
I'm still learning the basics and I have a question.
I have a function
def reverse(s: String): String = {
  s.reverse
}
Now I have a List[String] and I want to reverse each String element.
I've tried foreach, but it seems to return Unit, not String.
So, I want a List[String] with the same elements, but the strings reversed.
List(abcd, efgh) becomes List(dcba, hgfe).
What I have now:
def reverse(ls : List[String]):List[String] = {
  List(ls.foreach (reverse))
}
Use the map method:
List("abcd", "efgh").map(s => reverse(s))
Or simply:
List("abcd", "efgh").map(reverse)
Unlike foreach, which exists for side effects (like printing things out), map returns a result.
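Put together as a function with the signature from the question, this could look like the sketch below. ReverseSketch, reverseEach and printEach are names chosen for this example:

```scala
object ReverseSketch {
  def reverse(s: String): String = s.reverse

  // map applies the function to every element and collects the results
  // into a new list of the same length.
  def reverseEach(ls: List[String]): List[String] = ls.map(reverse)

  // foreach only runs the function for its side effects and returns Unit,
  // which is why it cannot be used to build the new list.
  def printEach(ls: List[String]): Unit =
    ls.foreach(s => println(reverse(s)))
}
```

ReverseSketch.reverseEach(List("abcd", "efgh")) evaluates to List(dcba, hgfe).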
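Note that calling reverse directly on the list reverses the order of its elements, not the strings themselves (it yields List(efgh, abcd)). To reverse each string, map over the list:
List("abcd", "efgh").map(_.reverse)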