How to get the integer part of a string in OCaml?

My string has the format of A3, A5, A38... and I'm only interested in the integer part of it.
I want something like:
let getIntofString s = match s with
"A1" -> 1
| ..
How do I do it?

Take a look at int_of_string from the initially opened module (Stdlib) and at sub from the standard library's String module. That should get you started.

The Scanf module allows for some simple pattern matching where no elaborate parsing is needed.
Example:
let identity x = x
let parseAInt s =
  Scanf.sscanf s "A%u" identity
Here, Scanf.sscanf takes as its first argument the input string, as its second argument the pattern to be matched (%u denoting an unsigned integer), and as its third argument a function that converts the parsed results into the type needed. As we don't need such a conversion in this case, the identity function suffices here.
Note that you may have to handle exceptions (Scanf.Scan_failure for a pattern mismatch, End_of_file for running out of characters to read, or Failure for being unable to convert a string to a number) if you cannot guarantee that the input to this function actually matches the pattern (for example, because it was supplied by a user).
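Here is a minimal sketch of that exception handling for callers who would rather get an option than an exception. The name parseAIntOpt is hypothetical. The trailing %! in the format matches only at the end of input, so input with trailing junk such as "A18x" is rejected rather than silently truncated.

```ocaml
(* Hypothetical helper: returns None instead of raising on bad input. *)
let parseAIntOpt (s : string) : int option =
  try
    (* "%!" succeeds only at end of input, so trailing characters fail. *)
    Some (Scanf.sscanf s "A%u%!" (fun n -> n))
  with
  | Scanf.Scan_failure _ (* input does not match the pattern *)
  | End_of_file (* input ran out too early *)
  | Failure _ (* number does not fit in an int *) ->
    None
```

For well-formed input it behaves like parseAInt, so parseAIntOpt "A38" gives Some 38.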

You can use String.sub, String.length and int_of_string.
let getIntofString s =
  let l = String.length s in
  (* Drop the leading letter and keep the remaining l - 1 characters. *)
  let si = String.sub s 1 (l - 1) in
  int_of_string si
;;
Test
# getIntofString "A18";;
- : int = 18
# getIntofString "A";;
Exception: Failure "int_of_string".
# getIntofString "";;
Exception: Invalid_argument "String.sub / Bytes.sub".
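If you would rather get an option than the exceptions shown in the test, here is a sketch along the same lines. It assumes OCaml 4.13 or later for String.for_all; int_of_string_opt itself has been available since 4.05. The names getIntofStringOpt and is_digit are hypothetical. Like the answer, it assumes a one-character prefix and does not check which letter that prefix is. The digit check is needed because int_of_string also accepts forms such as "-3", "0x1F" and "1_000".

```ocaml
let is_digit c = c >= '0' && c <= '9'

(* Hypothetical variant of getIntofString that returns None instead of raising. *)
let getIntofStringOpt (s : string) : int option =
  let l = String.length s in
  (* "" or a lone letter: there are no digits to read. *)
  if l < 2 then None
  else
    let digits = String.sub s 1 (l - 1) in
    (* Reject "-3", "0x1F", "1_000", etc., which int_of_string would accept;
       int_of_string_opt still returns None if the number overflows an int. *)
    if String.for_all is_digit digits then int_of_string_opt digits
    else None
```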

Related

Why is my regex failing on certain strings that otherwise succeed?

I have code written in F# that iterates over an array of strings, using a regex to extract part of each string. The problem is that the regex appears to match some strings and fail on others at random, even on exact duplicates from the same list where it previously succeeded. What am I missing? Is this some sort of regex issue that I am not aware of?
Regex Pattern:
(?i)/(.*?/v\d/.*?((?=\?)|(?=\d)|(?=\n)))
F# code:
[<Literal>]
let ApiPattern = @"(?i)/(.*?/v\d/.*?((?=\?)|(?=\d)|(?=\n)))"
let parseOutEndpoints (inputs : (int * string) array) =
    let regEx = new Regex(ApiPattern, RegexOptions.Compiled)
    inputs |> Array.map (fun (id, path) -> [|id.ToString(); path|]) |> Array.collect (fun x -> x)
    |> writeRawPathsToFile
    File.ReadAllLines(RawPathsFile)
    |> Array.map(fun (x) ->
        let m = regEx.Match(x)
        if m.Success
        then
            let endpoint = Domain.Endpoint(m.Value)
            endpoint
        else
            let line = $"{x}"
            File.AppendAllLines(FailedRegexMatches, [line], Encoding.UTF8)
            Domain.NoEndpoint
    )
Sample string array Data:
All of these should return a match, but don't. In comparison to this original list, a significantly reduced list of successful matches will be returned.
/enterprise-review/v9/choose?rr=Straight&pr=1%2E35239
/review-id-service/v1/business-id
/orderout/v1/vendor/shipping
/vendor-service/v1/Product/PartnerId/35310108
/Inspect/v1/Recommendation/Products/LaneId/0002,519188,13148,16939,7348,195982
/bin-inventory/v1/vendor?el=1%2E35239
/u-future/v1/fone?fhid=3028
/decline-summary/v1/details/card/65821974
/provide-service/v8/proDetails
/monetary-points/v1/sum/wins/681197
/listen-service/v1/audio-Details
/comment/v1/data
/comment/v1/data
/listen-service/v1/audio-Details
/comment/v1/data
/comment/v1/data
/listen-service/v1/audio-Details
/comment/v1/data
/comment/v1/data
This pattern should resolve your issue:
/(.*?/v\d/.*?((?=[\?\d\s])|$))
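The likely reason for the problem: in the original pattern, a match can only end right before ?, a digit, or \n. Paths that instead end in \r (the Windows carriage return), other whitespace, or simply the end of the string ($ in the regex), with no ? or digit after /vN/, fail to match.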
Here's your regex and input in Regex Storm, a .NET regex tester.
I'd have made this a comment, but Regex Storm's share URLs contain the full regex and input, so the link is too long for a comment (and SO doesn't allow URL shorteners in comments).
So my question is: does this look right to you? Are all the highlighted matches what you're expecting to match? If so, since Regex Storm's engine is .NET-based, I don't think there is a problem with the regex part of your code.

(Ocaml) Using 'match' to extract list of chars from a list of chars

I have just started to learn OCaml and I find it difficult to extract small lists of chars from a bigger list of chars.
Let's say I have:
let list_of_chars = ['#' ; 'a' ; 'b' ; 'c'; ... ; '!' ; '3' ; '4' ; '5' ];;
I know that in the list above, '#' is followed by a '!' somewhere further along.
I want to extract the lists ['a' ;'b' ;'c' ; ...] and ['3' ; '4' ; '5'] and do something with them,
so I do the following thing:
let variable = match list_of_chars with
  | '#'::l1@['!']@l2 -> (*[code to do something with l1 and l2]*)
  | _ -> raise Exception ;;
This code doesn't work for me, it's throwing errors. Is there a simple way of doing this?
(specifically for using match)
As another answer points out, you can’t use pattern matching for this, because pattern matching only lets you use constructors and @ is not a constructor.
Here is how you might solve your problem:
let split ~equal ~on list =
  (* Accumulate elements until one equals [on]; return what came before and after it. *)
  let rec go acc = function
    | [] -> None
    | x :: xs -> if equal x on then Some (List.rev acc, xs) else go (x :: acc) xs
  in
  go [] list

let variable = match list_of_chars with
  | '#' :: rest ->
    (match split rest ~on:'!' ~equal:Char.equal with
     | None -> failwith "no '!' after '#'"
     | Some (left, right) ->
       ... (* your code here *))
  | _ -> failwith "list does not start with '#'"
I’m now going to hypothesise that you are trying to do some kind of parsing or lexing. I recommend that you do not do it with a list of chars. Indeed, I think there is almost never a reason to have a list of chars in OCaml: a string is better for a string (a char list has an overhead of 23x in memory usage), and while one might use chars as a kind of mnemonic enum in C, OCaml has actual enums (aka variant types or sum types), so those should usually be used instead. I guess you might end up with a char list if you are doing something with a trie.
If you are interested in parsing or lexing, you may want to look into:
ocamllex and ocamlyacc
Sedlex
Angstrom or another parser combinator library like it
One of the regular expression libraries (e.g. Re, Re2, Pcre; note that Re and Re2 are mostly unrelated)
Using strings and functions like lsplit2 (from Base/Core's String module)
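Following the advice to work on strings rather than char lists, here is a sketch of the '#' ... '!' split that uses only the standard library. It relies on String.index_from_opt (OCaml 4.05+) instead of lsplit2. The name split_hash_bang is hypothetical. The sketch assumes the input starts with '#' and that the first '!' after it is the separator.

```ocaml
(* Hypothetical helper: "#abc!345" -> Some ("abc", "345"). *)
let split_hash_bang (s : string) : (string * string) option =
  let len = String.length s in
  if len = 0 || s.[0] <> '#' then None
  else
    (* Search for the first '!' after the leading '#'. *)
    match String.index_from_opt s 1 '!' with
    | None -> None
    | Some i ->
      (* Text between '#' and '!', then everything after '!'. *)
      Some (String.sub s 1 (i - 1), String.sub s (i + 1) (len - i - 1))
```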
@ is an operator, not a valid pattern. Patterns need to be static and can't match a varying number of elements in the middle of a list. But since you know the position of ! it doesn't need to be dynamic. You can accomplish it just using ::, for example when '!' is always the fifth element:
let variable = match list_of_chars with
  | '#' :: a :: b :: c :: '!' :: l2 -> let l1 = [a; b; c] in ...
  | _ -> failwith "unexpected input" ;;

Scala regex find/replace with additional formatting

I'm trying to replace parts of a string that contains what should be dates, but which are possibly in an impermissible format. Specifically, all of the dates are in the form "mm/dd/YYYY" and they need to be in the form "YYYY-mm-dd". One caveat is that the original dates may not exactly be in the mm/dd/YYYY format; some are like "5/6/2015". For example, if
val x = "where date >= '05/06/2017'"
then
x.replaceAll("'([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})'", "'$3-$1-$2'")
performs the desired replacement (returns "2017-05-06"), but for
val y = "where date >= '5/6/2017'"
this does not return the desired replacement (returns "2017-5-6" -- for me, an invalid representation). With the Joda Time wrapper nscala-time, I've tried capturing the dates and then reformatting them:
import com.github.nscala_time.time.Imports._
import org.joda.time.DateTime
val f = DateTimeFormat.forPattern("yyyy-MM-dd")
y.replaceAll("'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})'",
"'"+f.print(DateTimeFormat.forPattern("MM/dd/yyyy").parseDateTime("$1"))+"'")
But this fails with a java.lang.IllegalArgumentException: Invalid format: "$1". I've also tried using the f interpolator and padding with 0s, but it doesn't seem to like that either.
Are you not able to do additional processing on the captured groups ($1, etc.) inside the replaceAll? If not, how else can I achieve the desired result?
Backreferences like $1 only work inside the replacement string itself. In your code, "$1" is passed to parseDateTime, which runs before replaceAll is even called, so it is just the literal text "$1" rather than a backreference.
You can use a "callback" with replaceAllIn to get the match object and access its groups to further manipulate them:
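val pattern = "'([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})'".r
// f is the yyyy-MM-dd formatter from the question; y is a val, so bind the result to a new name
val result = pattern.replaceAllIn(y, m =>
  "'" + f.print(DateTimeFormat.forPattern("MM/dd/yyyy").parseDateTime(m.group(1))) + "'")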
Regex.replaceAllIn is overloaded and can take a Match => String.

how to get a list of char from user input in ocaml

I cannot find a way to get a list from values given by the user. I already did
# read_line() |> Str.split (Str.regexp " +") |> List.map int_of_string;;
but don't know how to do the same for chars.
Just remember that you can index a string like an array of chars:
# let s = "ab";;
# s.[0];;
- : char = 'a'
So, to turn each element of your list into a char (its first character):
# read_line() |> Str.split (Str.regexp " +") |> List.map (fun x -> x.[0])
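If you would rather not link the Str library, here is a similar sketch that uses only the standard library's String.split_on_char (OCaml 4.04+). Unlike Str.regexp " +", splitting on a single space produces empty strings for consecutive, leading or trailing spaces, so those are filtered out before taking the first character of each token. The name first_chars is hypothetical.

```ocaml
(* Hypothetical helper: first character of each space-separated token. *)
let first_chars (line : string) : char list =
  line
  |> String.split_on_char ' '
  (* Consecutive, leading or trailing spaces yield "" tokens; drop them. *)
  |> List.filter (fun tok -> tok <> "")
  |> List.map (fun tok -> tok.[0])

(* Usage with user input: let chars = first_chars (read_line ()) *)
```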
From the context of your previous questions, you seem to actually need a way to parse text that contains a mix of integers and characters that describe the transitions for a state machine. I'd suggest that you may be best served with the Scanf module for this. If you need something much more complex, though, this may require a handwritten scanner (you can use Str.string_match for that or – if you're willing to dive that deep – use the ocamllex scanner generator).
A simple example of reading a line from a channel with Scanf would be:
let read_transition input =
  try
    let line = input_line input in
    Scanf.sscanf line "%d %c %d"
      (fun x ch y -> Some (x, ch, y))
  with End_of_file -> None
Note that we're reading the line with input_line rather than using Scanf.scanf on the input channel directly. The reason is that Scanf.scanf may need up to a character worth of lookahead and thus, if you mix it with other ways of reading the channel, characters may get skipped. By using input_line and then Scanf.sscanf (rather than Scanf.scanf) we avoid that corner case.
Note also that depending on the syntax of the input, you may have to adjust the scanf pattern accordingly.
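To read every transition rather than just one, you can call read_transition in a loop until it returns None. This is a sketch: read_all_transitions is a hypothetical name, and a malformed line may still raise Scanf.Scan_failure, which read_transition does not catch.

```ocaml
let read_transition input =
  try
    let line = input_line input in
    Scanf.sscanf line "%d %c %d"
      (fun x ch y -> Some (x, ch, y))
  with End_of_file -> None

(* Hypothetical helper: collect transitions until read_transition returns None. *)
let read_all_transitions input =
  let rec loop acc =
    match read_transition input with
    | None -> List.rev acc (* restore input order *)
    | Some t -> loop (t :: acc)
  in
  loop []
```

For example, read_all_transitions stdin reads transitions from standard input until end of file.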

Haskell Regular Expressions and Reading String as Integer

Let's say I want to consider input of the form
[int_1, int_2, ..., int_n]
[int_1, int_2, ..., int_m]
...
where the input is read in from a text file. My goal is to find the length of the longest of these lists. Currently I have a regular expression that recognizes this pattern:
let input = "[1,2,3] [1,2,3,4,5]"
let p = input =~ "(\\[([0-9],)*[0-9]\\])" :: [[String]]
Output:
[["[1,2,3]","[1,2,3]","2,"],["[1,2,3,4,5]","[1,2,3,4,5]","4,"]]
So what I'm after is the max of the third index + 1. However, where I'm stuck is trying to consider this index as an int. For instance I can refer to the element just fine:
(p !! 0) !! 2
> "2,"
But I can't convert this to an int, I've tried
read( (p !! 0) !! 2)
However, this does not work despite the fact that
:t (p !! 0) !! 2
> (p !! 0) !! 2 :: String
Appears to be a string. Any advice as to why I can't read this as an int would be greatly appreciated.
Thanks again.
I'm not entirely sure that your approach is one I'd recommend, but I'm struggling to wrap my head around the goal, so I'll just answer the question.
The problem is that read "2," can't just produce an Int, because there's a leftover comma. You can use reads to get around this. reads produces a list of possible parses and the strings left over, so:
Prelude> (reads "2,") :: [(Int,String)]
[(2,",")]
In this case it's unambiguous, so you get one parse from which you can then pull out the int, although regard for your future self-respect suggests being defensive and not assuming that there will always be a valid parse (the Safe module is good for that sort of thing).
Alternatively, you could modify your regex to not include the comma in the matched group.