Convert a string into array of words - ocaml

How do I split a string into a list/array of whitespace-separated words?
let x = "this is my sentence";;
And store them in a list/array like this:
["this", "is", "my", "sentence"]

Use Str.split_delim together with Str.regexp, both from the Str library that ships with the standard OCaml distribution.
Str.split_delim (Str.regexp " ") "this is my sentence";;
- : string list = ["this"; "is"; "my"; "sentence"]
I highly recommend getting utop; it's really good for quickly exploring libraries (I typed Str, saw it was there, then typed Str. and looked for the appropriate function).
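If you would rather not link the Str library, the same split can be sketched with the standard library alone. This is a minimal sketch, assuming OCaml 4.04 or later (for String.split_on_char); split_words is a hypothetical helper name, not something from the answer above. It also drops the empty strings that consecutive spaces would otherwise produce.

```ocaml
(* split_words is a hypothetical helper: it splits on spaces, tabs and
   newlines using only the standard library (OCaml >= 4.04). *)
let split_words (s : string) : string list =
  s
  (* Normalise tabs and newlines to plain spaces first. *)
  |> String.map (fun c -> if c = '\t' || c = '\n' || c = '\r' then ' ' else c)
  |> String.split_on_char ' '
  (* Consecutive separators yield empty strings; drop them. *)
  |> List.filter (fun w -> w <> "")

let () =
  split_words "this is  my\tsentence" |> List.iter print_endline
```

The filter step is what distinguishes this from a plain split on a single space, which would return an empty word for every doubled separator.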

The full process goes like this:
First, run opam install re.
If you are using utop, you can do something like this:
#require "re.pcre"
let () =
Re_pcre.split ~rex:(Re_pcre.regexp " +") "Hello world more"
|> List.iter print_endline
Then just run it with utop code.ml.
If you want to compile to native code, you'd have:
let () =
Re_pcre.split ~rex:(Re_pcre.regexp " +") "Hello world more"
|> List.iter print_endline
Notice that the #require line is gone.
Then at the command line you'd run: ocamlfind ocamlopt -package re.pcre code.ml -linkpkg -o Test
The OCaml website has plenty of tutorials and help. I also have a blog post designed to get you up to speed quickly: http://hyegar.com/2015/10/20/so-youre-learning-ocaml/

Related

OCaml special characters formatting?

I have the following ocaml code:
let rec c_write =
"printf(\" %d \");\n"
On calling this function in the interpreter, I expect to get the output
printf("%d"); followed by a new line, but instead I get
printf(\" %d \");\n
How can I get my expected output when I'm calling the function without using any other I/O functions?
The expression let rec c_write = "printf(\" %d \");\n" is not a function. It is a value of type string which is bound to a variable named c_write. So you're not using any I/O functions in your code.
When entered in the interactive toplevel, this value is printed by the interpreter's evaluation loop for the user's convenience, in the same way that a Python interpreter prints the value that you've just entered.
The representation chosen by the OCaml toplevel has, in general, nothing to do with the representation that is used to store a value in a file or to print it. Moreover, OCaml has no canonical representation for values.
If you want to write a function that prints a C printf statement, this is what it looks like in OCaml:
let print_printf () =
print_endline {|printf("%d");|}
In the example above, I've used {||} to denote a string literal instead of the more common "", since in this kind of literal there is no need to escape special characters: they are interpreted literally (i.e., they don't have any special meaning).
You can achieve the same result using the regular "" quotes:
let print_printf () =
print_endline "printf(\"%d\");"
Here is an example of the toplevel interaction using these definitions:
# let print_printf () =
print_endline {|printf("%d");|};;
val print_printf : unit -> unit = <fun>
# print_printf ();;
printf("%d");
- : unit = ()
# let print_printf () =
print_endline "printf(\"%d\");";;
val print_printf : unit -> unit = <fun>
# print_printf ();;
printf("%d");
- : unit = ()
If you put this code in a file, compile it, execute it, and redirect the output into a C file, the result will be a well-formed C statement (modulo the absence of an enclosing function body).
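The difference between a string's contents and the way the toplevel displays it can be checked directly. The sketch below is only an illustration (the names quoted and escaped are made up for it): it verifies that the two literal syntaxes denote the same string, then prints it once with %s (the raw characters) and once with %S (OCaml literal syntax, with quotes and escapes, much like what the toplevel shows).

```ocaml
(* Two spellings of the same string; the names are illustrative only. *)
let quoted = {|printf("%d");|}
let escaped = "printf(\"%d\");"

let () =
  (* Both literals denote exactly the same sequence of characters. *)
  assert (quoted = escaped);
  (* %s writes the characters themselves. *)
  Printf.printf "%s\n" escaped;
  (* %S writes the string in OCaml literal syntax, with quotes and escapes. *)
  Printf.printf "%S\n" escaped
```

The backslashes seen in the toplevel belong to the display format, not to the string itself.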
Since you are apparently using the toplevel printer for output, and you need a very specific format, you need to install a custom printer.
The following would work:
# #install_printer Format.pp_print_string;;
# " This \" is not escaped " ;;
- : string = This " is not escaped
However, it seems very likely that this is not really the problem that you are trying to solve.

Using Regex to count the number of twitter mentions in Spark (Scala)

I am new to Spark. I want to output the top 2 twitter mentions using this test.txt file:
"I love to dance #Kelsey, especially with you #Kelsey!"
"Can't believe you went to #harvard. Come on man #harvard"
"I love #harvard"
Essentially, multiple mentions in a single tweet count only once. So the output would be like:
(2, #harvard)
(1, #Kelsey)
So far, my code looks like the following:
val tweets = sc.textFile("testFile")
val myReg = """(?<=#)([\\w]+)""".r
val mentions = tweets.filter(x => (myReg.pattern.matcher(x).matches))
However, this does not work because x is still a whole line, so it will not match. Is there any way I can test the words in the line instead of the line itself? Also, how do I check whether a mention is repeated within the tweet?
I adjusted your regex a little, and you might need to translate it back to Spark syntax, but this way you find all mentions and group them. The .toSet is important to remove duplicates within a tweet; calling .toLowerCase would also make sense there.
val tweets = List("I love to dance #Kelsey, especially with you #Kelsey!",
"Can't believe you went to #harvard. Come on man #harvard",
"I love #harvard")
val myReg = """(#\w+)""".r
val mentions = tweets.flatMap(x => myReg.findAllIn(x).toSet).groupBy(identity).mapValues(_.length)
println(mentions)
This works for me; the regex follows Twitter's mention rules more exactly:
val myReg = "(^|[^#\\w])#(\\w{1,15})\\b".r
val mentions = tweets.flatMap(x => myReg.findAllIn(x).matchData.map(_.group(0).trim -> 1)).reduceByKey(_ + _)

How can I unit test Alex code?

I'm writing a lexer in Alex with the monad wrapper. It's not behaving as I expect, and I would like to write some unit tests for it. I can write unit tests for lexing a single token by doing:
runAlex "foo" alexMonadScan `shouldBe` Right TokenFoo
but I don't know how to test that the string "foo bar" gets lexed to [TokenFoo, TokenBar].
Given that Token is my token type, I'd need a function like runAlex that has the type String -> Alex [Token] -> Either String [Token], but I don't know how to transform alexMonadScan so that it has the type Alex [Token] rather than Alex Token.
I tried
runAlex "foo bar" (liftM (:[]) alexMonadScan) `shouldBe` [TokenFoo, TokenBar]
which seems to have the right type, but it returns Right [TokenEOF], apparently dropping the tokens it saw along the way.
How can I achieve this?
There is a function alexScanTokens :: String -> [token] which you can use.
It's defined in the file templates/wrappers.hs
Here's a monadic version I found here:
alexScanTokens :: String -> Either String [Keyword]
alexScanTokens inp = runAlex inp gather
  where
    gather = do
      t <- alexMonadScan
      case trace (show t) t of
        EOF -> return [EOF]
        _   -> (t:) `liftM` gather

Run bash script from ocaml

I think what I mean is in the title. I have tried to find out whether it is possible to run a bash script from OCaml, as you can from Java or PHP, but I could not find anything.
I know that we can use OCaml as a scripting language, but that is not what I want.
Ocaml as a scripting language
In other words
From the OCaml documentation for the Sys module:
val command : string -> int
So if you want to call a script from OCaml, do this:
let status = Sys.command "./myexecutable.sh" in
Printf.printf "status = %d\n" status
Feel free to do what you want with the exit code.
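As one way of acting on the exit code, the status can be turned into a result value. This is a minimal sketch, assuming a POSIX shell is available (Sys.command hands the string to the system shell); run is a hypothetical helper name, and "exit 3" merely stands in for a script that fails with status 3.

```ocaml
(* run is a hypothetical helper: it maps the shell's exit status to a result. *)
let run (cmd : string) : (unit, int) result =
  match Sys.command cmd with
  | 0 -> Ok ()
  | code -> Error code

let () =
  (* "exit 3" stands in for a script that fails with status 3. *)
  match run "exit 3" with
  | Ok () -> print_endline "script succeeded"
  | Error code -> Printf.printf "script failed with status %d\n" code
```

Treating 0 as success follows the usual shell convention; a script that signals success differently would need a different mapping.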
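Just in case you are interested in collecting the output of the bash script, you can do this:
let () =
  (* Requires the unix library. pstree is just an example; pick any command. *)
  let ic = Unix.open_process_in "pstree" in
  let all_input = ref [] in
  try
    while true do
      (* Read from ic, the channel opened above. *)
      all_input := input_line ic :: !all_input
    done
  with End_of_file ->
    (* Just an example, you can do something else here. *)
    (* Close the channel and wait for the child process to finish. *)
    ignore (Unix.close_process_in ic);
    (* Lines were accumulated in reverse, so restore the original order. *)
    List.iter print_endline (List.rev !all_input)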
I think what you are looking for is the Sys module (http://caml.inria.fr/pub/docs/manual-ocaml/libref/Sys.html).
Sys.command might be one way to do what you want.
If this is not enough, then you may want to take a look at what the Unix module has to offer (http://caml.inria.fr/pub/docs/manual-ocaml/libref/Unix.html).

Regex in swift. A template for a specific numeric format

I am new to Swift; I have been working with it for only a few weeks, and now I am trying to parse something like a price list from an incoming string. It has the following format:
2.99 X 3.00 = 10 A
Some text here
1.22 X 1.5 10 A
The hardest part is that sometimes A or some digit is missing, but X is always in place.
I would like to find out how to use regex in Swift (or something similar, if regex does not exist) to write a template for parsing the following value:
d.dd X d.d SomeValueIfExists
I would very much appreciate any useful information, topics to read, or any other resources for learning more about Swift.
PS. I have access to the dev. forums but I've never used them before.
I did an example recently, maybe a little harder than necessary, to demonstrate regex use in Swift:
let str1: NSString = "I run 12 miles"
let str2 = "I run 12 miles"
let match = str1.rangeOfString("\\d+", options: .RegularExpressionSearch)
let finalStr = str1.substringWithRange(match).toInt()
let n: Double = 2.2*Double(finalStr!)
let newStr = str2.stringByReplacingOccurrencesOfString("\\d+", withString: "\(n)", options: NSStringCompareOptions.RegularExpressionSearch, range: nil)
println(newStr) //I run 26.4 miles
Two of these calls use "RegularExpressionSearch". If you put this in a playground you can see what each line does. Note the double \ escapes: one for the normal regex use and another because \ is a special character in Swift string literals.
Also a good article:
http://benscheirman.com/2014/06/regex-in-swift/