How to convert a string so that it contains only letters and numbers in Scala? - regex

I have a String of letters, numbers, symbols and slashes. I want to remove everything except letters and numbers.
My String is like val mystring = "abd#1098\\jaka.kdcs" (the backslash has to be escaped in a Scala string literal),
and I want only abd1098jakakdcs.

You can use the isLetterOrDigit method on Char and filter the required chars out of the string:
scala> val str = "abd#1098\\jaka.kdcs"
str: String = abd#1098\jaka.kdcs
scala> str.filter(_.isLetterOrDigit)
res3: String = abd1098jakakdcs
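For comparison, the same cleanup can also be done with a regex that deletes every character outside a character class. This is a sketch that assumes ASCII letters and digits are what should survive; the object name AlnumFilter is only for illustration. The two approaches differ on non-ASCII input: isLetterOrDigit treats accented letters such as é as letters, while [^A-Za-z0-9] removes them.

```scala
object AlnumFilter {
  // Regex alternative: delete every char that is NOT an ASCII letter or digit.
  private val NonAlnum = "[^A-Za-z0-9]".r

  def asciiOnly(s: String): String = NonAlnum.replaceAllIn(s, "")

  // The Char-based approach from the answer. isLetterOrDigit is Unicode-aware,
  // so letters such as 'é' are kept here but dropped by asciiOnly.
  def unicodeAware(s: String): String = s.filter(_.isLetterOrDigit)
}
```

Both return abd1098jakakdcs for the question's input; pick the regex version only if non-ASCII letters should be stripped too.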

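As a first step, you can use a regular expression to check that the string contains only letters and numbers.
Example: scala> "34Az".matches("[a-zA-Z0-9]{4}")
Note that {4} accepts exactly four characters only; use + to accept any non-empty length.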
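A minimal sketch of that check for strings of any length, assuming "only letters and numbers" means ASCII letters and digits (AlnumCheck and its method names are hypothetical). String.matches has to match the whole string, and so does a Scala Regex used as an extractor in a match, so neither form needs ^ and $ anchors:

```scala
object AlnumCheck {
  // String.matches must match the ENTIRE string, so no ^...$ anchors are needed.
  def isAlphanumeric(s: String): Boolean = s.matches("[a-zA-Z0-9]+")

  // Equivalent pattern-match form: the Regex extractor is anchored as well.
  // The pattern has no capture groups, hence the empty AlnumOnly().
  private val AlnumOnly = "[a-zA-Z0-9]+".r
  def isAlphanumericMatch(s: String): Boolean = s match {
    case AlnumOnly() => true
    case _           => false
  }
}
```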

Related

How do I use a regex in Scala to check the first 3 chars of a filename?

What is the Scala code to check whether the first 3 characters of a fileName String are letters?
I want a Boolean to be returned: if the first 3 chars of the fileName are letters, it should return true, otherwise false.
val fileName = "ABC1234.dat"
val regex = "[A-Z]*".r
val result = fileName.substring(0,3) match {
  case regex(fileName) => true
  case _ => false
}
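You could use findFirstIn with the pattern ^[A-Za-z]{3}, which matches a char from a-zA-Z three times at the start of the string (or use ^\\p{L}{3} to match any letter from any language), and check nonEmpty on the resulting Option. The ^ anchor matters because findFirstIn searches the whole string; without it, "12ABC.dat" would also give true.
val fileName = "ABC1234.dat"
val regex = "^[A-Za-z]{3}".r
regex.findFirstIn(fileName).nonEmpty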
Output
res0: Boolean = true
If you want to use substring with matches as in the comment, note that matches takes the regex as a String and has to match the whole string:
fileName.substring(0,3).matches("(?i)[a-z]{3}")
Note that substring throws a StringIndexOutOfBoundsException if the string is shorter than 3 characters, whereas findFirstIn simply returns None, so that check returns false.
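The options above can be compared side by side in a small sketch (FileNameCheck and its method names are hypothetical). Besides the anchored findFirstIn, it uses Regex.findPrefixOf, which only tries to match at the start of the input, and a regex-free take(3) check that cannot throw on short names:

```scala
object FileNameCheck {
  // Anchored with ^ so only the start of the name is inspected.
  private val ThreeLettersAtStart = "^[A-Za-z]{3}".r
  // findPrefixOf only matches at index 0, so this pattern needs no anchor.
  private val ThreeLetters = "[A-Za-z]{3}".r

  def viaFindFirstIn(fileName: String): Boolean =
    ThreeLettersAtStart.findFirstIn(fileName).nonEmpty

  def viaFindPrefixOf(fileName: String): Boolean =
    ThreeLetters.findPrefixOf(fileName).nonEmpty

  // Regex-free variant: take(3) never throws on short strings, so the length
  // check is what rejects names like "AB". isLetter also accepts non-ASCII letters.
  def viaChars(fileName: String): Boolean = {
    val prefix = fileName.take(3)
    prefix.length == 3 && prefix.forall(_.isLetter)
  }
}
```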

How to use "\w+" to find words in a string?

I need to write a function that takes a string as input and returns a List[String]. As a requirement for this task, I have to use the regular expression "\w+" in this function. So when given a line of random text with a few actual words dotted around inside it, I need to add all of these 'proper' words to the list to be returned. I must also use ".findAllIn". I have tried the following:
def foo(stringIn: String) : List[String] = {
  val regEx = """\w+""".r
  val match = regEx.findAllIn(s).toList
  match
}
But it just returns the string that I pass into the function.
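match is a reserved keyword in Scala, so it cannot be used as a variable name. You also need to pass stringIn (s is not defined) to findAllIn, and you can return the result directly:
def foo(stringIn: String) : List[String] = {
  val regEx = """\w+""".r
  regEx.findAllIn(stringIn).toList
}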
scala> foo("hey. how are you?")
res17: List[String] = List(hey, how, are, you)
\w is the pattern for a word character; in the current regex context it is equivalent to [a-zA-Z_0-9], so it matches lowercase and uppercase letters, digits and the underscore.
\w+ matches one or more consecutive occurrences of such characters.
scala> foo("hey")
res18: List[String] = List(hey)
In the case above, the whole input is a single run of word characters, so the only match is the original string.
scala> foo("hey-hey")
res20: List[String] = List(hey, hey)
- is not part of \w, so it ends the first match and the next match starts after it.
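To make the behaviour of \w+ concrete, here is the answer's function again next to a hypothetical letters-only variant. The variant assumes that 'proper' words should exclude digits and underscores, which \w+ keeps; it is only there for contrast and does not satisfy the task's "\w+" requirement:

```scala
object Words {
  // The answer's function: every maximal run of \w chars becomes one element.
  private val WordRe = """\w+""".r
  def foo(stringIn: String): List[String] = WordRe.findAllIn(stringIn).toList

  // Hypothetical variant: letters only, so digits and '_' act as separators.
  private val LettersRe = "[A-Za-z]+".r
  def lettersOnly(stringIn: String): List[String] =
    LettersRe.findAllIn(stringIn).toList
}
```

For "snake_case 42 words!", foo keeps snake_case and 42 as words, while lettersOnly splits at the underscore and drops the number.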

Find index locations by regex pattern and replace them with a list of indexes in Scala

I have strings in this format:
object[i].base.base_x[i], and I get lists like List(0,1).
I want to use regular expressions in Scala to find the matches of [i] in the given string and replace the first occurrence with 0 and the second with 1, getting something like object[0].base.base_x[1].
I have the following code:
val stringWithoutIndex = "object[i].base.base_x[i]" // basically this string is generated dynamically
val indexReplacePattern = raw"\[i\]".r
val indexValues = List(0,1) // list generated dynamically
if(indexValues.nonEmpty){
  indexValues.map(row => {
    indexReplacePattern.replaceFirstIn(stringWithoutIndex , "[" + row + "]")
  })
}
else stringWithoutIndex
Since String is immutable, I cannot update stringWithoutIndex, which results in an output like List("object[0].base.base_x[i]", "object[1].base.base_x[i]").
I tried looking into StringBuilder but I am not sure how to update it. Also, is there a better way to do this? Suggestions other than regex are also welcome.
You could loop through the integers in indexValues using foldLeft, passing the string stringWithoutIndex as the start value.
Then use replaceFirst to replace the first match with the current value of indexValues.
If you want to use a regex, you might use a positive lookbehind (?<=\[) and a positive lookahead (?=]) to assert that the i is between an opening and a closing square bracket.
(?<=\[)i(?=])
For example:
val strRegex = """(?<=\[)i(?=])"""
val res = indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
s.replaceFirst(strRegex, row.toString)
}
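How about this? Note that it only works for this particular string, because replace('i', '0') replaces every i character, not only the ones inside brackets: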
scala> val str = "object[i].base.base_x[i]"
str: String = object[i].base.base_x[i]
scala> str.replace('i', '0').replace("base_x[0]", "base_x[1]")
res0: String = object[0].base.base_x[1]
This sounds like a job for foldLeft. There is no need for the if (indexValues.nonEmpty) check, because foldLeft over an empty list simply returns the start value, stringWithoutIndex.
indexValues.foldLeft(stringWithoutIndex) { (s, row) =>
indexReplacePattern.replaceFirstIn(s, "[" + row + "]")
}
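Here is the foldLeft answer in a self-contained form, with the string and the list passed in as parameters (IndexFill and fill are hypothetical names). Each step replaces only the first remaining [i], so the list values are assigned left to right; any extra placeholders stay as [i], and an empty list returns the input unchanged:

```scala
object IndexFill {
  // Matches the literal text "[i]"; raw"..." keeps the backslashes as written.
  private val indexReplacePattern = raw"\[i\]".r

  // Each fold step rewrites the first "[i]" still left in the accumulated string.
  def fill(template: String, indexValues: List[Int]): String =
    indexValues.foldLeft(template) { (s, row) =>
      indexReplacePattern.replaceFirstIn(s, s"[$row]")
    }
}
```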

Regular expression to match n times in which n is not fixed

The pattern I want to match is a sequence of length n where n is right before the sequence.
For example, when the input is "1aaaaa", I want to match the single character "a", as the leading number specifies that only 1 character should be matched.
Similarly, when the input is "2aaaaa", I want to match the first two characters "aa", but not the rest, as the number 2 specifies that two characters should be matched.
I understand a{1} and a{2} will match "a" one or two times. But how to match a{n} in which n is not fixed?
Is it possible to do this type of match using regular expressions?
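This works for several count-prefixed groups in one string, as long as each count is a single digit:
import re
a = "1aaa2bbbbb1cccccccc4dddddddddddd"
for b in re.findall(r'\d[a-z]+', a):
    n = int(b[0])  # the count is the single digit in front of the letters
    print(b[1:1 + n])  # take the n characters right after the digit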
Output:
a
bb
c
dddd
Though I have done it in Java, it should help you get going in your program.
Here you take the first character of the input string as a substring and use it in your regex as the number of repetitions to match.
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DynamicRegex {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter a string: ");
        String str = scan.nextLine();
        String testStr = str.substring(0, 1); // Get the first character (the count) using substring.
        String pattern = "a{" + testStr + "}"; // Use it in the regex as the number of repetitions to match.
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group());
        }
    }
}
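In Java's regex engine (which Scala's Regex uses), a quantifier cannot refer to a number matched earlier in the same pattern, so both answers read the count first and then match. The sketch below does the same in Scala, building the quantifier at runtime like the Java answer, but it assumes the count may have several digits and that any character (not only a) may follow. CountPrefixed and matchN are hypothetical names:

```scala
object CountPrefixed {
  // Reads the leading count; (\d+) also accepts multi-digit counts like "10".
  private val LeadingCount = """^(\d+)""".r

  // Returns the n characters right after the count, or None when there is no
  // leading count or fewer than n characters follow it.
  def matchN(input: String): Option[String] =
    LeadingCount.findFirstMatchIn(input).flatMap { m =>
      val n = m.group(1).toInt
      // Build the pattern at runtime: s"^.{$n}" becomes e.g. "^.{2}".
      s"^.{$n}".r.findFirstIn(input.substring(m.end))
    }
}
```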

How to detect if a string has letters at the beginning using Scala functions or regex?

I have a numeric string that may or may not start with 2 or more letters, and it may also contain some chars that are neither letters nor digits.
If it starts with two or more letters, delete the first 2, and remove every char other than letters and digits from the string.
I want to detect this using either Scala functions or a regex, and clean the string.
Examples:
"ABC12345" (after function) => "C12345"
"AB12345" (after function) => "12345"
"A12345" (after function) => "A12345"
"ABC1 23 +.4 5" (after function) => "C12345"
A regex matching the characters you want to remove:
^[A-Z]{2}|[^A-Z0-9]
It matches either exactly two uppercase letters at the beginning of the string or any single character other than [A-Z0-9].
Usage in Scala:
scala> val regex = """^[A-Z]{2}|[^A-Z0-9]""".r
regex: scala.util.matching.Regex = ^[A-Z]{2}|[^A-Z0-9]
scala> val ss = List("ABC12345", "A12345", "ABC1 23 +.4 5")
ss: List[String] = List(ABC12345, A12345, ABC1 23 +.4 5)
scala> ss.map(s => regex.replaceAllIn(s, ""))
res0: List[String] = List(C12345, A12345, C12345)
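Since the question also asks about plain Scala functions, here is a regex-free sketch that applies the same rules as the regex above, with the regex version alongside for comparison. It assumes, like the regex, that only uppercase A-Z letters count (lowercase letters are removed too); Cleaner, viaRegex and viaFunctions are hypothetical names:

```scala
object Cleaner {
  // The regex from the answer: two leading A-Z letters, or any char outside A-Z0-9.
  private val regex = """^[A-Z]{2}|[^A-Z0-9]""".r
  def viaRegex(s: String): String = regex.replaceAllIn(s, "")

  private def isUpperAscii(c: Char): Boolean = c >= 'A' && c <= 'Z'
  private def isKept(c: Char): Boolean = isUpperAscii(c) || (c >= '0' && c <= '9')

  // Same rules without a regex: drop the first two chars only when both are
  // A-Z, then keep only A-Z and 0-9.
  def viaFunctions(s: String): String = {
    val rest =
      if (s.length >= 2 && s.take(2).forall(isUpperAscii)) s.drop(2) else s
    rest.filter(isKept)
  }
}
```

Both versions give C12345, 12345, A12345 and C12345 for the four examples in the question.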