How to validate if a string only contains number chars in OCaml

I'm using Str.regexp and want to know how to check whether a string of undetermined length contains only digit characters.
This is what I'm doing:
Str.string_match (Str.regexp "[0-9]+") "1212df3124" 0;;
The problem is that it evaluates to true, but it should return false because the string contains the substring 'df'. (This is not the same as C# regexp, it's OCaml.)

The Str.string_match function checks whether the pattern matches starting at the index you supply. As long as there's at least one digit at the beginning of the string, your pattern will match. If the string starts with something other than a digit, your pattern will fail to match:
# Str.string_match (Str.regexp "[0-9]+") "df3124" 0;;
- : bool = false
To check against the whole string, you need to "anchor" the pattern to the end with $. I.e., you need to make sure the match goes to the end of the string.
# Str.string_match (Str.regexp "[0-9]+") "1212df3124" 0;;
- : bool = true
# Str.string_match (Str.regexp "[0-9]+$") "1212df3124" 0;;
- : bool = false
# Str.string_match (Str.regexp "[0-9]+$") "3141592" 0;;
- : bool = true
# Str.string_match (Str.regexp "[0-9]+$") "" 0;;
- : bool = false
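One caveat with the anchored pattern: according to the Str documentation, $ matches at the end of a line, so it also succeeds just before a '\n' inside the string. A minimal sketch that avoids this by comparing Str.match_end () with the string length; is_all_digits is a hypothetical helper name, and the snippet assumes the str library is linked (for example via #load "str.cma" in the toplevel).

```ocaml
(* Hypothetical helper: true only if s consists of one or more ASCII digits. *)
let digits_re = Str.regexp "[0-9]+"

let is_all_digits (s : string) : bool =
  (* string_match anchors the match at index 0; comparing match_end with
     the length checks that the match consumed the whole string. *)
  Str.string_match digits_re s 0 && Str.match_end () = String.length s

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (is_all_digits s))
    [ "3141592"; "1212df3124"; ""; "12\nab" ]
```

Only the first of the four sample strings is accepted; the last one is the case where a bare $ anchor would be too permissive.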

Another solution is to use int_of_string and see whether it raises an exception:
let check_str s =
try int_of_string s |> ignore; true
with Failure _ -> false
If you are going to convert your string to an integer anyway, you can use that.
Beware: it accepts everything that OCaml's parser considers to be an integer:
check_str "10";; (* true *)
check_str "0b10";; (* true, 0b10 = 2 *)
check_str "0o10";; (* true, 0o10 = 8 *)
check_str "0x10";; (* true, 0x10 = 16 *)
So if you want to allow only decimal representation you can do:
let check_str s =
try (int_of_string s |> string_of_int) = s
with Failure _ -> false
as string_of_int returns the string representation of an integer, in decimal.
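The round-trip check has a few edge cases worth knowing before relying on it. A small sketch that exercises them; is_canonical_decimal is a hypothetical name for the same idea, written with int_of_string_opt (available since OCaml 4.05) instead of catching Failure.

```ocaml
(* Hypothetical name; same round-trip idea as check_str, without exceptions. *)
let is_canonical_decimal (s : string) : bool =
  match int_of_string_opt s with
  | Some n -> string_of_int n = s
  | None -> false

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (is_canonical_decimal s))
    [ "42";      (* accepted *)
      "-5";      (* accepted: a minus sign survives the round trip *)
      "007";     (* rejected: leading zeros are not reproduced *)
      "1_000";   (* rejected: int_of_string ignores underscores *)
      "0x10";    (* rejected: parsed as 16, printed back as 16 *)
      "99999999999999999999999999" (* rejected: does not fit in an int *) ]
```

So this is a test for "canonical decimal int" rather than "digits only": negative numbers pass, while digit strings with leading zeros or too many digits fail.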

Related

UDF (Regular expression) to match a string variants with some exclusions

I need a regular expression that matches the string Mod* followed by one specific character, e.g. "A", such as:
Mod A, Mod_A, Module xx A, Modules (A & B), and so on.
But with the following conditions:
(1) If the cell contains any of (Modif* or Moder* or Modr*) and also Mod* plus my specific character, the result is True.
(2) If the cell contains any of (Modif* or Moder* or Modr*) but not Mod* plus my specific character, the result is False.
Please see this example and the expected result:
Item Description              | Expected Result of RegexMatch
new modified of module A 1    | TRUE
new modification of mod A     | TRUE
new moderate of mod_A         | TRUE
to modules (A & B)            | TRUE
new modified and moderate A 1 | FALSE
new modification of A         | FALSE
new moderate of modify        | FALSE
to modules (D & E)            | FALSE
Public Function RegexMatch(str) As Boolean
Dim tbx2 As String: tbx2 = "A" 'ActiveSheet.TextBox2.Value
Static re As New RegExp
re.Pattern = "\b[M]od(?!erate).*\b[" & tbx2 & "]\b"
re.IgnoreCase = True
RegexMatch = re.Test(str)
End Function
Thanks in advance for your help.
Not sure if I understand your requirements correctly: you want rows that contain a word starting with "mod", but words starting with "Modif", "Moder" or "Modr" don't count. Additionally, a module character (e.g. "A") needs to be present.
I usually get dizzy when I see longer regex terms, so I try to write a few lines of code instead. The following function replaces special characters like "(" or "_" with blanks, splits the string into words and checks the content word by word. Easy to understand, easy to adapt:
Function CheckModul(s As String, modulChar As String) As Boolean
Dim words() As String
words = Split(replaceSpecialChars(s), " ")
Dim i As Long, hasModul As Boolean, hasModulChar As Boolean
For i = 0 To UBound(words)
Dim word As String
word = UCase(words(i))
If word Like "MOD*" _
And Not word Like "MODIF*" _
And Not word Like "MODER*" _
And Not word Like "MODR*" Then
hasModul = True
End If
If word = modulChar Then
hasModulChar = True
End If
Next
CheckModul = hasModul And hasModulChar
End Function
Function replaceSpecialChars(ByVal s As String) As String
Dim i As Long
replaceSpecialChars = s
For i = 1 To Len(replaceSpecialChars)
If Mid(replaceSpecialChars, i, 1) Like "[!0-9A-Za-z]" Then Mid(replaceSpecialChars, i) = " "
Next
End Function
Tested as a UDF with your data.

How do I use a regex in Scala to check the first 3 chars of a filename

What is the Scala code to check whether the first 3 characters of a fileName are letters?
I want a Boolean to be returned: if the first 3 chars of the fileName are letters, it should be true, otherwise false.
val fileName = "ABC1234.dat"
val regex = "[A-Z]*".r
val result = fileName.substring(0,3) match {
case regex(fileName) => true
case _ => false
}
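You could use findFirstIn with a pattern that matches three letters at the start of the string, ^[A-Za-z]{3} (or ^\\p{L}{3} to match letters from any language), and check for nonEmpty on the resulting Option. Without the ^ anchor, findFirstIn would also accept three letters anywhere in the name.
val fileName = "ABC1234.dat"
val regex = "^[A-Za-z]{3}".r
regex.findFirstIn(fileName).nonEmpty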
Output
res0: Boolean = true
If you want to use substring with matches as in the comment, note that matches takes the regex as a string and the regex has to match the whole string.
fileName.substring(0,3).matches("(?i)[a-z]{3}")
Note that substring throws a StringIndexOutOfBoundsException if the string is shorter than the specified indices, whereas findFirstIn with the Option would simply return false.

Regex not working for empty string - Swift

My function should handle every regex and return true or false. It worked well... until now.
func test(_ input: String) -> Bool {
let pattern = ".{7}" //allow exactly 7 numbers
let regex = try! NSRegularExpression(pattern: pattern, options: [NSRegularExpression.Options.caseInsensitive])
let leftover = regex.stringByReplacingMatches(in: input, options: [], range: NSMakeRange(0, input.characters.count), withTemplate: "")
if leftover.isEmpty {
return true
}
return false
}
print(test("123456")) //false
print(test("1234567")) //true
print(test("12345678")) //false
print(test("")) //true - I expect false
I understand why test("") returns true. But how can I fix my regex so that it returns false?
Sometimes I use the regex .* and my function should handle that one too, so I can't add a check like this:
if input.isEmpty {
return false
}
If input is the empty string then leftover will be the empty string
as well, and therefore your function returns true. Another case where
your approach fails is
print(test("12345671234567")) // true (expected: false)
An alternative is to use the range(of:) method of String with the .regularExpression option. Then check if the matched range is the entire string.
In order to match 7 digits (and not 7 arbitrary characters), the
pattern should be \d{7}.
func test(_ input: String) -> Bool {
let pattern = "\\d{7}"
return input.range(of: pattern, options: [.regularExpression, .caseInsensitive])
== input.startIndex..<input.endIndex
}
A solution is to require that the regex match the entire string, which you can do by adding ^ and $ to the pattern to anchor it at the start and the end of the string.
let pattern = "^.{7}$" //allow exactly 7 numbers
let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
let numberOfOccurences = regex.numberOfMatches(in: input, options: [], range: NSMakeRange(0, input.utf16.count))
return (numberOfOccurences != 0)
In theory, we should check that numberOfOccurences is exactly 1 before returning true, but with both anchors in place there can be at most one match.

OCaml regexp "any" matching, where "]" is one of the characters

I'd like to match a string containing any of the characters "a" through "z", or "[" or "]", but nothing else. The regexp should match
"b"
"]abc["
"ab[c"
but not these
"2"
"(abc)"
I tried this:
let content_check(s:string):bool =
Str.string_match (Str.regexp "^[a-z[\]]*$") s 0;;
content_check "]abc[";;
and got warned that the "escape" before the "]" was illegal, although I'm pretty certain that the equivalent in, say, sed or awk would work fine.
Anyhow, I tried un-escaping the bracket, but
let content_check(s:string):bool =
Str.string_match (Str.regexp "^[a-z[]]*$") s 0;;
doesn't work at all, since it should match any of a-z or "[", then the first "]" closes the "any" selection, after which there must be any number of "]"s. So it should match
[abc]]]]
but not
]]]abc[
In practice, that's not what happens at all; I get the following:
# let content_check(s:string):bool =
Str.string_match (Str.regexp "^[a-zA-Z[]]*$") s 0;;
content_check "]abc[";;
content_check "[abc]]]";;
content_check "]abc[";;
val content_check : string -> bool = <fun>
# - : bool = false
# - : bool = false
# - : bool = false
Can anyone explain/suggest an alternative?
@Tim Pietzker's suggestion sounded really good, but appears not to work:
# #load "str.cma" ;;
let content_check(s:string):bool =
Str.string_match (Str.regexp "^[a-z[\\]]*$") s 0;;
content_check "]abc[";;
# val content_check : string -> bool = <fun>
# - : bool = false
#
nor does it work when I double-escape the "[" in the pattern, just in case. :(
Indeed, here's a MWE:
#load "str.cma" ;;
let content_check(s:string):bool =
Str.string_match (Str.regexp "[\\]]") s 0;;
content_check "]";; (* should be true *)
This is not going to really answer your question, but it will solve your problem. With the re library:
let re_set = Re.(rep (* "rep" is the star *) @@ alt [
  rg 'a' 'z' ; (* the range from a to z *)
  set "[]" ; (* the set composed of [ and ] *)
])
(* version that matches the whole text *)
let re = Re.(compile @@
  seq [ start ; re_set ; stop ])
let content_check s =
Printf.printf "%s : %b\n" s (Re.execp re s)
let () =
List.iter content_check [
"]abc[" ;
"[abc]]]" ;
"]abc[" ;
"]abc[" ;
"abc##"
]
As you noticed, str from the stdlib is awkward, to put it mildly. re is a very good alternative, and it comes with several regexp syntaxes as well as combinators (which I tend to use, because I find them easier than regexp syntax).
I'm an idiot. (But perhaps the designers of Str weren't so clever either.)
From the "Str" documentation: "To include a ] character in a set, make it the first character in the set."
With this, it's not so clear how to search for "anything except a ]", since you'd have to place the "^" in front of it. Sigh.
:(
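Applying the documentation rule quoted above, here is a sketch of the original check with "]" placed first in the set. The negated form [^]] is an assumption based on how Str appears to parse sets (a "]" directly after the "^" is taken literally), so it is worth confirming in your own toplevel. The snippet assumes the str library is linked.

```ocaml
(* "]" comes first in the set, so it is a literal member; "[" needs no escape. *)
let content_re = Str.regexp "^[]a-z[]*$"

let content_check (s : string) : bool =
  Str.string_match content_re s 0

(* Assumed form for "anything except ]": the "]" directly follows the "^". *)
let no_close_bracket_re = Str.regexp "^[^]]*$"

let has_no_close_bracket (s : string) : bool =
  Str.string_match no_close_bracket_re s 0

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (content_check s))
    [ "b"; "]abc["; "ab[c"; "2"; "(abc)" ]
```

Note that because of the star, the empty string is accepted by both checks.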

matching exact string in Ocaml using regex

How can I find an exact match using a regular expression in OCaml? For example, I have code like this:
let contains s1 s2 =
let re = Str.regexp_string s2
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
where s2 is "_X_1" and s1 feeds strings like "A_1_X_1", "A_1_X_2", ....and so on to the function 'contains'. The aim is to find the exact match when s1 is "A_1_X_1". But the current code finds match even when s1 is "A_1_X_10", "A_1_X_11", "A_1_X_100" etc.
I tried with "[_x_1]", "[_X_1]$" as s2 instead of "_X_1" but does not seem to work. Can somebody suggest what can be wrong?
You can use the $ metacharacter to match the end of the line (which, assuming the string doesn't contain multiple lines, is the end of the string). But you can't pass that through Str.regexp_string, because it escapes the metacharacters. Instead, quote the literal substring first, then append the $, and build the regexp from the result:
let endswith s1 s2 =
let re = Str.regexp (Str.quote s2 ^ "$")
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false
Str.match_end is what you need:
let ends_with patt str =
let open Str in
let re = regexp_string patt in
try
let len = String.length str in
ignore (search_backward re str len);
match_end () = len
with Not_found -> false
With this definition, the function works as you require:
# ends_with "_X_1" "A_1_X_10";;
- : bool = false
# ends_with "_X_1" "A_1_X_1";;
- : bool = true
# ends_with "_X_1" "_X_1";;
- : bool = true
# ends_with "_X_1" "";;
- : bool = false
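Since the pattern here is a literal string, the same test can also be written without Str. A minimal sketch using only String.length and String.sub; has_suffix is a hypothetical name (OCaml 4.13 and later also ship String.ends_with for this).

```ocaml
(* Hypothetical helper: true if s ends with the literal string suffix. *)
let has_suffix ~(suffix : string) (s : string) : bool =
  let ls = String.length s and lx = String.length suffix in
  (* The length guard keeps String.sub from raising Invalid_argument. *)
  ls >= lx && String.sub s (ls - lx) lx = suffix

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (has_suffix ~suffix:"_X_1" s))
    [ "A_1_X_1"; "A_1_X_10"; "_X_1"; "" ]
```

This sidesteps regex quoting entirely, at the cost of only handling fixed suffixes.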
A regex will match anywhere in the input, so the behaviour you see is normal.
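You need to anchor your regex. Since your strings have a prefix before _X_1, anchor only the end: _X_1$ (the fully anchored ^_X_1$ would match only when the whole string is _X_1).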
Also, [_x_1] will not help: [...] is a character class, here you ask the regex engine to match a character which is x, 1 or _.
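To make the character-class point concrete, here is a small comparison of the bracketed pattern with the quoted literal. found is a hypothetical helper, and the snippet assumes the str library is linked.

```ocaml
(* Hypothetical helper: does the pattern match anywhere in s? *)
let found (pattern : string) (s : string) : bool =
  match Str.search_forward (Str.regexp pattern) s 0 with
  | _ -> true
  | exception Not_found -> false

let () =
  List.iter
    (fun s ->
      Printf.printf "%-10s class:%b  literal:%b\n" s
        (* a set: ONE final character that is '_', 'X' or '1' *)
        (found "[_X_1]$" s)
        (* the literal four-character suffix *)
        (found (Str.quote "_X_1" ^ "$") s))
    [ "A_1_X_1"; "A_1_X_11"; "A_1_X_2" ]
```

The class version accepts "A_1_X_11" because it only looks at the final character, while the quoted literal accepts only "A_1_X_1".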