matching exact string in Ocaml using regex - regex

How to find a exact match using regular expression in Ocaml? For example, I have a code like this:
let contains s1 s2 =
let re = Str.regexp_string s2
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
where s2 is "_X_1" and s1 feeds strings like "A_1_X_1", "A_1_X_2", ....and so on to the function 'contains'. The aim is to find the exact match when s1 is "A_1_X_1". But the current code finds match even when s1 is "A_1_X_10", "A_1_X_11", "A_1_X_100" etc.
I tried with "[_x_1]", "[_X_1]$" as s2 instead of "_X_1" but does not seem to work. Can somebody suggest what can be wrong?

You can use the $ metacharacter to match the end of the line (which, assuming the string doens't contain multiple lines, is the end of the string). But you can't put that through Str.regexp_string; that just escapes the metacharacters. You should first quote the actual substring part, and then append the $, and then make a regexp from that:
let endswith s1 s2 =
let re = Str.regexp (Str.quote s2 ^ "$")
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false

Str.match_end is what you need:
let ends_with patt str =
let open Str in
let re = regexp_string patt in
try
let len = String.length str in
ignore (search_backward re str len);
match_end () == len
with Not_found -> false
With this definition, the function works as you require:
# ends_with "_X_1" "A_1_X_10";;
- : bool = false
# ends_with "_X_1" "A_1_X_1";;
- : bool = true
# ends_with "_X_1" "_X_1";;
- : bool = true
# ends_with "_X_1" "";;
- : bool = false

A regex will match anywhere in the input, so the behaviour you see is normal.
You need to anchor your regex: ^_X_1$.
Also, [_x_1] will not help: [...] is a character class, here you ask the regex engine to match a character which is x, 1 or _.

Related

How do i use the regex in scala to check the first 3 chars of filename

What is the scala code to check the first 3 characters of a fileName is String
I want a boolean to be returned , If the first 3 chars of a fileName are letters , then true needs to be returned , otherwise false
val fileName = "ABC1234.dat"
val regex = "[A-Z]*".r
val result = fileName.substring(0,3) match {
case regex(fileName) => true
case _ => false
}
You could use findFirstIn matching 3 times a char a-zA-Z [A-Za-z]{3} or use \\p{L}{3} to match any letter from any language and check for nonEmpty on the Option
val fileName = "ABC1234.dat"
val regex = "[A-Za-z]{3}".r
regex.findFirstIn(fileName).nonEmpty
Output
res0: Boolean = true
If you want to use substring with matches as in the comment, matches takes a string as the regex and has to match the whole pattern.
fileName.substring(0,3).matches("(?i)[a-z]{3}")
Note that substring will give an StringIndexOutOfBoundsException if the string is shorter than the specified indices, and using findFirstIn with the Option would return false.

Kotlin .split() with multiple regex

Input: """aaaabb\\\\\cc"""
Pattern: ["""aaa""", """\\""", """\"""]
Output: [aaa, abb, \\, \\, \, cc]
How can I split Input to Output using patterns in Pattern in Kotlin?
I found that Regex("(?<=cha)|(?=cha)") helps patterns to remain after spliting, so I tried to use looping, but some of the patterns like '\' and '[' require escape backslash, so I'm not able to use loop for spliting.
EDIT:
val temp = mutableListOf<String>()
for (e in Input.split(Regex("(?<=\\)|(?=\\)"))) temp.add(e)
This is what I've been doing, but this does not work for multiple regex, and this add extra "" at the end of temp if Input ends with "\"
You may use the function I wrote for some previous question that splits by a pattern keeping all matched and non-matched substrings:
private fun splitKeepDelims(s: String, rx: Regex, keep_empty: Boolean = true) : MutableList<String> {
var res = mutableListOf<String>() // Declare the mutable list var
var start = 0 // Define var for substring start pos
rx.findAll(s).forEach { // Looking for matches
val substr_before = s.substring(start, it.range.first()) // // Substring before match start
if (substr_before.length > 0 || keep_empty) {
res.add(substr_before) // Adding substring before match start
}
res.add(it.value) // Adding match
start = it.range.last()+1 // Updating start pos of next substring before match
}
if ( start != s.length ) res.add(s.substring(start)) // Adding text after last match if any
return res
}
You just need a dynamic pattern from yoyur Pattern list items by joining them with a |, an alternation operator while remembering to escape all the items:
val Pattern = listOf("aaa", """\\""", "\\") // Define the list of literal patterns
val rx = Pattern.map{Regex.escape(it)}.joinToString("|").toRegex() // Build a pattern, \Qaaa\E|\Q\\\E|\Q\\E
val text = """aaaabb\\\\\cc"""
println(splitKeepDelims(text, rx, false))
// => [aaa, abb, \\, \\, \, cc]
See the Kotlin demo
Note that between \Q and \E, all chars in the pattern are considered literal chars, not special regex metacharacters.

Scala regex get parameters in path

regex noob here.
example path:
home://Joseph/age=20/race=human/height=170/etc
Using regex, how do I grab everything after the "=" between the /Joseph/ path and /etc? I'm trying to create a list like
[20, human, 170]
So far I have
val pattern = ("""(?<=Joseph/)[^/]*""").r
val matches = pattern.findAllIn(path)
The pattern lets me just get "age=20" but I thought findAllIn would let me find all of the "parameter=" matches. And after that, I'm not sure how I would use regex to just obtain the "20" in "age=20", etc.
Code
See regex in use here
(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)
Usage
See code in use here
object Main extends App {
val path = "home://Joseph/age=20/race=human/height=170/etc"
val pattern = ("""(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)""").r
pattern.findAllIn(path).matchData foreach {
m => println(m.group(1))
}
}
Results
Input
home://Joseph/age=20/race=human/height=170/etc
Output
20
human
170
Explanation
(?:(?<=/Joseph/)|\G(?!\A)/) Match the following
(?<=/Joseph/) Positive lookbehind ensuring what precedes matches /Joseph/ literally
\G(?!\A)/ Assert position at the end of the previous match and match / literally
[^=]+ Match one or more of any character except =
= Match this literally
([^=/]+) Capture one or more of any character except = and / into capture group 1
Your pattern looks for the pattern directly after Joseph/, which is why only age=20 matched, maybe just look after =?
val s = "home://Joseph/age=20/race=human/height=170/etc"
// s: String = home://Joseph/age=20/race=human/height=170/etc
val pattern = "(?<==)[^/]*".r
// pattern: scala.util.matching.Regex = (?<==)[^/]*
pattern.findAllIn(s).toList
// res3: List[String] = List(20, human, 170)

How to pull string value in url using scala regex?

I have below urls in my applications, I want to take one of the value in urls.
For example:
rapidvie value 416
Input URL: http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&
Output should be: 416
I've written the code in scala using import java.util.regex.{Matcher, Pattern}
val p: Pattern = Pattern.compile("[?&]rapidView=(\\d+)[?&]")**strong text**
val m:Matcher = p.matcher(url)
if(m.find())
println(m.group(1))
I am getting output, but i want to migrate this scala using scala.util.matching library.
How to implement this in simply?
This code is working with java utils.
In Scala, you may use an unanchored regex within a match block to get just the captured part:
val s = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
val pattern ="""[?&]rapidView=(\d+)""".r.unanchored
val res = s match {
case pattern(rapidView) => rapidView
case _ => ""
}
println(res)
// => 416
See the Scala demo
Details:
"""[?&]rapidView=(\d+)""".r.unanchored - the triple quoted string literal allows using single backslashes with regex escapes, and the .unanchored property makes the regex match partially, not the entire string
pattern(rapidView) gets the 1 or more digits part (captured with (\d+)) if a pattern finds a partial match
case _ => "" will return an empty string upon no match.
You can do this quite easily with Scala:
scala> val url = "http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&"
url: String = http://localhost:8080/bladdey/shop/?rapidView=416&projectKey=DSCI&view=detail&
scala> url.split("rapidView=").tail.head.split("&").head
res0: String = 416
You can also extend it by parameterize the search word:
scala> def searchParam(sp: String) = sp + "="
searchParam: (sp: String)String
scala> val sw = "rapidView"
sw: String = rapidView
And just search with the parameter name
scala> url.split(searchParam(sw)).tail.head.split("&").head
res1: String = 416
scala> val sw2 = "projectKey"
sw2: String = projectKey
scala> url.split(searchParam(sw2)).tail.head.split("&").head
res2: String = DSCI

Scala: concatenating a string in a regex pattern string causing issue

If I am doing this it is working fine:
val string = "somestring;userid=someidT;otherstuffs"
var pattern = """[;?&]userid=([^;&]+)?(;|&|$)""".r
val result = pattern.findFirstMatchIn(string).get;
But I am getting an error when I am doing this
val string = "somestring;userid=someidT;otherstuffs"
val id_name = "userid"
var pattern = """[;?&]""" + id_name + """=([^;&]+)?(;|&|$)""".r
val result = pattern.findFirstMatchIn(string).get;
This is the error:
error: value findFirstMatchIn is not a member of String
You may use an interpolated string literal and use a bit simpler regex:
val string = "somestring;userid=someidT;otherstuffs"
val id_name = "userid"
var pattern = s"[;?&]${id_name}=([^;&]*)".r
val result = pattern.findFirstMatchIn(string).get.group(1)
println(result)
// => someidT
See the Scala demo.
The [;?&]$id_name=([^;&]*) pattern finds ;, ? or & and then userId (since ${id_name} is interpolated) and then = is matched and then any 0+ chars other than ; and & are captured into Group 1 that is returned.
NOTE: if you want to use a $ as an end of string anchor in the interpolated string literal use $$.
Also, remember to Regex.quote("pattern") if the variable may contain special regex operators like (, ), [, etc. See Scala: regex, escape string.
Add parenthesis around the string so that regex is made after the string has been constructed instead of the other way around:
var pattern = ("[;?&]" + id_name + "=([^;&]+)?(;|&|$)").r
// pattern: scala.util.matching.Regex = [;?&]userid=([^;&]+)?(;|&|$)
val result = pattern.findFirstMatchIn(string).get;
// result: scala.util.matching.Regex.Match = ;userid=someidT;