How can I use this regex in Haskell? - regex

I'm trying to make a simple Haskell program that will take any line that looks like someFilenameHere0035.xml and returns 0035. My sample input file, input.txt, would look like this:
someFilenameHere0035.xml
anotherFilenameHere4465.xml
And running: cat input.txt | runhaskell getID.hs should return:
0035
4465
I'm having so much difficulty figuring this out. Here's what I have so far:
import Text.Regex.PCRE
getID :: String -> [String]
getID str = str =~ "([0-9]+)\\.xml" :: [String]
main :: IO ()
main = interact $ unlines . getID
But I get an error message I don't understand at all:
• No instance for (RegexContext Regex String [String])
arising from a use of ‘=~’
• In the expression: str =~ "([0-9]+)\\.xml" :: [String]
In an equation for ‘getID’:
getID str = str =~ "([0-9]+)\\.xml" :: [String] (haskell-stack-ghc)
I feel like I'm really close, but I don't know where to go from here. What am I doing wrong?

First off, you only want the number part, so we can drop the \\.xml from the pattern (this works as long as the ID is the first run of digits on the line).
The regex-pcre library defines an instance for RegexContext Regex String String but not RegexContext Regex String [String] hence the error.
So if we change the type signature to String -> String then that error is taken care of.
unlines expects [String] so to test what we had at this point I wrote a quick function that wraps its argument in a list (there's probably a nicer way to do that but that's not the point of the question):
toList :: a -> [a]
toList a = [a]
Running your command with main = interact $ unlines . toList . getID output 0035, so we're almost there.
getID is passed a String containing the whole file contents, with the filenames conveniently separated by the \n character. So we can use splitOn "\n" from the Data.List.Split module (in the split package) to get our list of .xml filenames.
Then we simply need to map getID over that list (toList is no longer needed).
This gives us:
import Text.Regex.PCRE
import Data.List.Split
getID :: String -> String
getID str = str =~ "([0-9]+)"
main :: IO ()
main = interact $ unlines . map getID . splitOn "\n"
This gives me the desired output when I run your command.
Hopefully this helps :)
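One possible variation on the above, offered as a sketch rather than the definitive fix: keeping the \\.xml suffix in the pattern and asking for a [[String]] result gives access to the capture group, so digits elsewhere in a filename are ignored. It also uses lines from the Prelude in place of splitOn "\n", which avoids an empty final entry when the input ends with a newline. The name getIDs is made up for illustration, and the code assumes the regex-pcre package is installed.

```haskell
import Text.Regex.PCRE ((=~))

-- With a [[String]] result, each match comes back as
-- [wholeMatch, group1, ...]; we keep only the first capture group.
-- getIDs is a hypothetical helper name, not part of any library.
getIDs :: String -> [String]
getIDs str = [g | (_ : g : _) <- (str =~ "([0-9]+)\\.xml" :: [[String]])]

main :: IO ()
main = interact $ unlines . concatMap getIDs . lines
```

Lines that contain no match simply contribute nothing to the output, since concatMap drops the empty lists.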

Related

Longest prefix of an OCaml `string list` ending in a specific `string` value

I am trying to work out whether there is a particularly neat or efficient way of truncating a list after the final occurrence of a specific element. For my purposes, it is a monomorphized string list and the string I am looking for the final (highest index) occurrence of is known at compile-time, since I am only using it in one case.
The motivation for this is to find the nearest ancestor in a Unix directory system of the CWD whose name in its parent is a particular folder name. I.E., if I wanted to find the nearest ancestor called bin and I was running the executable from a CWD of /home/anon/bin/projects/sample/src/bin/foo/, then I would want to get back /home/anon/bin/projects/sample/src/bin
The current implementation I am using is the following:
let reverse_prune : tgt:string -> string -> string =
let rec drop_until x ys =
match ys with
| [] -> []
| y :: _ when x = y -> ys
| _ :: yt -> drop_until x yt
in
fun ~tgt path ->
String.split_on_char '/' path
|> List.rev |> drop_until tgt |> List.rev |> String.concat "/"
It isn't a particularly common or expensive code-path so there isn't actually a real need to optimize, but since I am still trying to learn practical OCaml techniques, I wanted to know if there was a cleaner way of doing this.
I also realize that it may technically be possible to avoid the string-splitting altogether and just operate on the raw CWD string. I am, of course, open to such suggestions as well, but I am specifically curious if there is something that would replace the List.rev |> drop_until tgt |> List.rev snippet, rather than solve the overall problem in a different way.
I don't think this has anything to do with OCaml actually since I'd say the easiest way to do this is by using a regular expression:
let reverse_prune tgt path =
let re =
Str.regexp (Format.sprintf {|^[/a-zA-Z_-]*/%s\([/a-zA-Z_-]*\)$|} tgt)
in
Str.replace_first re {|\1|} path
let () =
reverse_prune "bin" "/home/anon/bin/projects/sample/src/bin/foo/"
|> Format.printf "%s@."
Is there a reason you want to reimplement regular expression searching in a string? If no, just use a solution like mine, I'd say.
If you want the part that comes before just change the group:
let reverse_prune tgt path =
let re =
Str.regexp (Format.sprintf {|^\([/a-zA-Z_-]*/\)%s[/a-zA-Z_-]*$|} tgt)
in
Str.replace_first re {|\1|} path

Split a [Char] by each character in Haskell

I'm new to Haskell and I would like some help with how to split a sentence into separate characters.
Example: "Test sentence" == ["T","e","s","t"," ","s","e","n","t","e","n","c","e"]
I checked everywhere but can't find a solution that doesn't require importing separate modules and so on.
Many thanks in advance!
In Haskell a String is a type synonym for [Char].
If you really want to turn a [Char] into a list of one-character Strings (you probably don't):
charStrs :: String -> [String]
charStrs = fmap pure
charStrs "hello world" -- ["h","e","l","l","o"," ","w","o","r","l","d"]
edit updated to pure
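For lists, pure c is just [c], so the same idea can be spelled with map and an operator section. This is a minimal sketch that needs nothing beyond the Prelude; the name charStrs' is only for illustration.

```haskell
-- Wrap every character in a singleton list, i.e. a one-character String.
-- (: []) is the section \c -> c : [], which equals [c].
charStrs' :: String -> [String]
charStrs' = map (: [])

main :: IO ()
main = print (charStrs' "Test sentence")
-- prints ["T","e","s","t"," ","s","e","n","t","e","n","c","e"]
```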

OCaml Str.full_split returns the original string instead of the expected substring

I am trying to write a program that will read diff files and return the filenames, just the filenames. So I wrote the following code
open Printf
open Str
let syname: string = "diff --git a/drivers/usc/filex.c b/drivers/usc/filex"
let fileb =
let pat_filename = Str.regexp "a\/(.+)b" in
let s = Str.full_split pat_filename syname in
s
let print_split_res (elem: Str.split_result) =
match elem with
| Text t -> print_string t
| Delim d -> print_string d
let rec print_list (l: Str.split_result list) =
match l with
| [] -> ()
| hd :: tl -> print_split_res hd ; print_string "\n" ; print_list tl
;;
() = print_list fileb
Upon running this I get the original string diff --git a/drivers/usc/filex.c b/drivers/usc/filex back as the output.
Whereas if I use the same regex pattern with the python standard library I get the desired result
import re
p=re.compile('a\/(.+)b')
p.findall("diff --git a/drivers/usc/filex.c b/drivers/usc/filex")
Output: ['drivers/usc/filex.c ']
What am I doing wrong?
Not to be snide, but the way to understand OCaml regular expressions is to read the documentation, not compare to things in another language :-) Sadly, there is no real standard for regular expressions across languages.
The main problem appears to be that parentheses in OCaml regular expressions match themselves. To get grouping behavior they need to be escaped with '\\'. In other words, your pattern is looking for actual parentheses in the filename. Your code works for me if you change your regular expression to this:
Str.regexp "a/\\(.+\\)b"
Note that the backslashes must themselves be escaped so that Str.regexp sees them.
You also have the problem that your pattern doesn't match the slash after b. So the resulting text will start with a slash.
As a side comment, I also removed the backslash before /, which is technically not allowed in an OCaml string.

How to make a haskell function that returns all matches of a specific regex?

I find that I can do something like the below with string literals
import Text.Regex.TDFA
import Text.Regex.TDFA ()
let x = ("foo" =~ ("o" :: String)) :: [[String]]
But I cannot at all figure out a way to do something like
getMatches input = (input =~ "o") :: [[String]]
it gives me something like
Non type-variable argument in the constraint: RegexContext Text.Regex.TDFA.Text.Regex source1 [[String]]
I have been googling for a long while and can't find anything that gives me the exact type signature I want.
Does anyone know if there is any way to do this?
If you want a simple way to make a Haskell function that returns all matches of a specific text pattern, then maybe “try this instead” of regex.
With https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:splitCap
import Text.Megaparsec (chunk)
import Replace.Megaparsec (splitCap)
import Data.Either (rights)
rights $ splitCap (chunk "o") "foo"
["o","o"]
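If staying with regex-tdfa is preferred, the error in the question most likely comes from getMatches having no type signature, which leaves the input type polymorphic. A sketch under that assumption: giving the function an explicit String argument type lets the RegexContext instance be resolved. getMatchTexts is a made-up name for a variant that returns a flat list of matches, and the code assumes the regex-tdfa package is installed.

```haskell
import Text.Regex.TDFA ((=~), getAllTextMatches)

-- The explicit signature pins the input to String, so GHC no longer
-- has to infer a constraint over an unknown source type.
getMatches :: String -> [[String]]
getMatches input = input =~ "o"

-- Hypothetical variant: whole matches only, without submatch lists.
getMatchTexts :: String -> [String]
getMatchTexts input = getAllTextMatches (input =~ "o")
```

In GHCi, getMatches "foo" evaluates to [["o"],["o"]] and getMatchTexts "foo" to ["o","o"].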

how to combine map and replicate on an array of integers

For a school exercise I need to generate a series of symbols from a given array of numbers. Given [3,3,2,1], the output should be "+===+===+==+=+".
My approach would be to use map and replicate "=" on the array then intercalate "+" and finally concat the array to a single string.
My solution is something like this (while standing knee deep in errors)
printLine arr = map (replicate "=") arr >>> intercalate '*' >>> concat
what is the correct syntax? or shouldn't i use map at all?
you are on the right track, you just mixed up the functions a bit:
replicate will take a number n and repeat the second argument n times into a list (so you just got the order wrong - you could use flip or an aux. function like I did below)
you have to watch out if you want Char or String ('=' VS "=" for example) - read the type-definitions (try :t intercalate or Hoogle) carefully and remember: String ~ [Char]!
intercalate actually does the concatenation so you don't need concat at all
Here is an almost working version:
import Data.List (intercalate)
eqSigns :: Int -> String
eqSigns n = replicate n '='
mixIn :: [Int] -> String
mixIn = intercalate "+" . map eqSigns
try it and see if you get the missing parts in there ;)
here is the version with flip instead:
mixIn :: [Int] -> String
mixIn = intercalate "+" . map (flip replicate '=')
PS: are you coming from some ML/F# background?
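For reference, mixIn [3,3,2,1] gives "===+===+==+=", so the outer "+" signs are the missing parts. One way to add them, as a sketch only (there are other equally valid ways), is to start with a "+" and append a "+" after every run of '='. It reuses the question's name printLine and needs only the Prelude.

```haskell
-- Start with "+", then append each run of '=' followed by a closing "+".
printLine :: [Int] -> String
printLine ns = "+" ++ concatMap (\n -> replicate n '=' ++ "+") ns

main :: IO ()
main = putStrLn (printLine [3, 3, 2, 1])
-- prints +===+===+==+=+
```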