Elixir regex replacing square brackets and comma

As mentioned above, the goal is to remove square brackets and commas.
My current solution is the following:
Given:
"[40.45694301152436, -3.6907402812214514]"
|> String.replace("[", "")
|> String.replace(",", "")
|> String.replace("]", "")
|> String.split(" ")
|> Enum.map(fn x -> String.to_float(x) end)
Output:
[40.45694301152436, -3.6907402812214514]
I know this can be made much more compact, but I've been looking at examples all day and none of them did the job above.

Instead of a string, you can pass a regex to String.replace. In Elixir you can build a regex with ~r sigil.
"[40.45694301152436, -3.6907402812214514]"
|> String.replace(~r'[\[\],]', "")
|> String.split()
|> Enum.map(&String.to_float/1)
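If you'd rather avoid a regex entirely, String.split/3 also accepts a list of separator strings, and its trim: true option drops the empty strings produced where separators sit next to each other or at the ends of the string. A minimal sketch (the module and function names are only illustrative):

```elixir
defmodule SplitParse do
  # Hypothetical helper: split on every bracket, comma and space at once.
  # `trim: true` removes the empty strings produced by the leading "[",
  # the trailing "]" and the adjacent ", " separators.
  def parse_floats(string) do
    string
    |> String.split(["[", "]", ",", " "], trim: true)
    |> Enum.map(&String.to_float/1)
  end
end

SplitParse.parse_floats("[40.45694301152436, -3.6907402812214514]")
#=> [40.45694301152436, -3.6907402812214514]
```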

While @fhdhsni has given you a great answer, if your concern is readability I'd suggest abstracting the whole thing into a separate function, like so:
defmodule T do
  def parsefloats(stringtobeparsed) do
    stringtobeparsed
    |> String.replace("[", "")
    |> String.replace(",", "")
    |> String.replace("]", "")
    |> String.split(" ")
    |> Enum.map(fn x -> String.to_float(x) end)
  end
end
Then you call it like so:
[x,y] = T.parsefloats("[40.45694301152436, -3.6907402812214514]")
# [40.45694301152436, -3.6907402812214514]
iex(3)> x
# 40.45694301152436
iex(4)> y
# -3.6907402812214514
It's not any more compact, but I think it's more readable.
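One caveat that applies to every approach here: String.to_float/1 raises an ArgumentError for input without a decimal point, such as "40", and for anything malformed. If the coordinates come from outside your program, a sketch built on Float.parse/1 and `with` can return {:ok, {lat, lon}} or :error instead of crashing. The module name, function name and return shape are assumptions made for illustration:

```elixir
defmodule Coords do
  # Hypothetical module: parses "[lat, lon]" into {:ok, {lat, lon}} or :error.
  def parse(string) do
    with [inner] <- Regex.run(~r/^\s*\[(.*)\]\s*$/, string, capture: :all_but_first),
         [lat_str, lon_str] <- String.split(inner, ","),
         {:ok, lat} <- to_number(lat_str),
         {:ok, lon} <- to_number(lon_str) do
      {:ok, {lat, lon}}
    else
      # No brackets (nil), wrong number of parts, or an unparsable number.
      _ -> :error
    end
  end

  # Float.parse/1 accepts "40" as well as "40.5" and returns {float, rest};
  # we require the whole trimmed piece to be consumed.
  defp to_number(piece) do
    case Float.parse(String.trim(piece)) do
      {value, ""} -> {:ok, value}
      _ -> :error
    end
  end
end

Coords.parse("[40.45694301152436, -3.6907402812214514]")
#=> {:ok, {40.45694301152436, -3.6907402812214514}}
Coords.parse("[40, -3]")
#=> {:ok, {40.0, -3.0}}
Coords.parse("[40.4, oops]")
#=> :error
```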

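Here's another option, using String.slice (written with an explicit //1 step, which needs Elixir 1.12 or later; on older versions use plain 1..-2):
"[40.45694301152436, -3.6907402812214514]"
|> String.slice(1..-2//1)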
|> String.split(~r/,\s+/)
|> Enum.map(&String.to_float/1)
Cheers!

Apart from the string-replace solutions, you can also take a look at Code.eval_string. It parses and evaluates the string as Elixir code, so you get back the list you're looking for (only use this on trusted input, since whatever code the string contains will be executed):
{list, _} = Code.eval_string "[40.45694301152436, -3.6907402812214514]"
# {[40.45694301152436, -3.6907402812214514], []}

I think the comment on your question about using a JSON parser is best, followed by fhdhsni's simple answer. But here's a method that extracts the numbers, rather than replacing the brackets:
str = "[40.45694301152436, -3.6907402812214514]"
regex = ~r/([\d\.-]+), ([\d\.+-]+)/
Regex.run(regex, str, capture: :all_but_first) |> Enum.map(&String.to_float/1)
Output:
[40.45694301152436, -3.6907402812214514]
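Regex.run/3 returns only the first match, so the pattern above needs one capture group per number. If the list can hold any number of values, Regex.scan/2 collects every match instead. A minimal sketch, assuming every number is written with a decimal point (so String.to_float/1 accepts it); the module name is illustrative:

```elixir
defmodule ScanFloats do
  # Hypothetical helper: extract every "-?digits.digits" number in order.
  def parse(string) do
    ~r/-?\d+\.\d+/
    |> Regex.scan(string)
    # Regex.scan/2 returns one list per match, e.g. [["40.4"], ["-3.6"]].
    |> List.flatten()
    |> Enum.map(&String.to_float/1)
  end
end

ScanFloats.parse("[1.5, -2.25, 3.0]")
#=> [1.5, -2.25, 3.0]
```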

Related

Longest prefix of an OCaml `string list` ending in a specific `string` value

I am trying to work out whether there is a particularly neat or efficient way of truncating a list after the final occurrence of a specific element. For my purposes, it is a monomorphized string list, and the string whose final (highest-index) occurrence I am looking for is known at compile time, since I am only using it in one case.
The motivation for this is to find the nearest ancestor of the CWD in a Unix directory tree whose name in its parent is a particular folder name. For example, if I wanted to find the nearest ancestor called bin and I was running the executable from a CWD of /home/anon/bin/projects/sample/src/bin/foo/, then I would want to get back /home/anon/bin/projects/sample/src/bin
The current implementation I am using is the following:
let reverse_prune : tgt:string -> string -> string =
  let rec drop_until x ys =
    match ys with
    | [] -> []
    | y :: _ when x = y -> ys
    | _ :: yt -> drop_until x yt
  in
  fun ~tgt path ->
    String.split_on_char '/' path
    |> List.rev |> drop_until tgt |> List.rev |> String.concat "/"
It isn't a particularly common or expensive code-path so there isn't actually a real need to optimize, but since I am still trying to learn practical OCaml techniques, I wanted to know if there was a cleaner way of doing this.
I also realize that it may technically be possible to avoid the string-splitting altogether and just operate on the raw CWD string without splitting it. I am, of course, welcome to such suggestions as well, but I am specifically curious if there is something that would replace the List.rev |> drop_until tgt |> List.rev snippet, rather than solve the overall problem in a different way.

I don't think this really has anything to do with OCaml, since I'd say the easiest way to do this is with a regular expression (here using the Str library):
let reverse_prune tgt path =
  let re =
    Str.regexp (Format.sprintf {|^[/a-zA-Z_-]*/%s\([/a-zA-Z_-]*\)$|} tgt)
  in
  Str.replace_first re {|\1|} path

let () =
  reverse_prune "bin" "/home/anon/bin/projects/sample/src/bin/foo/"
  |> Format.printf "%s@."
Is there a reason you want to reimplement regular expression searching in a string? If not, I'd say just use a solution like mine.
If you want the prefix up to and including the target (the /home/anon/bin/projects/sample/src/bin result from the question), put the target inside the group:
let reverse_prune tgt path =
  let re =
    Str.regexp (Format.sprintf {|^\([/a-zA-Z_-]*/%s\)[/a-zA-Z_-]*$|} tgt)
  in
  Str.replace_first re {|\1|} path

Elixir: How to count urls in a string

Suppose I have a string:
content = "Please visit https://www.google.com...\nOr visit http://my.website.io\nhttp://myfriends.website.com\nOr https://www.myneigborsite.com, http://visit.me.com"
There are 5 urls in the string.
How do I count the URLs?
I have tried Regex.scan/2 |> Enum.count/1 and String.split/2 |> Enum.count/1 with a regex, but I always get the wrong output.
I have also tried every http/https regex I found on the internet, but I still can't get the correct output.
Here's one that I've tried.
iex> content
...> |> String.split(~r/^(https?):\/\/[^\s$.?#].[^\s]*$/)
...> |> Enum.count()
...> |> Kernel.-(1)
-1
Another attempt with the same regex:
iex> Regex.scan(~r/^(https?):\/\/[^\s$.?#].[^\s]*$/, content) |> Enum.count()
0
but when I check if the regex matches some of the urls
iex> Regex.match?(~r/^(https?):\/\/[^\s$.?#].[^\s]*$/, "https://www.google.com")
true
iex> Regex.match?(~r/^(https?):\/\/[^\s$.?#].[^\s]*$/, "http://my.website.io")
true
It does match.
I can't figure out what's the problem. Please help me.

You only need to count URLs, which means you don’t need an overcomplicated regular expression.
~r|https?://[\w.-]+|
|> Regex.scan(content)
|> Enum.count()
#⇒ 5
Your attempts failed because you anchored the expressions with ^ and $. Without the m modifier, ^ matches only at the start of the whole string and $ only at its end, so a URL that is not the entire string is never matched.
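If you need the URLs themselves rather than just the count, note that the [\w.-]+ class above also swallows trailing punctuation: its first match here is "https://www.google.com..." including the dots. A slightly stricter sketch, assuming a host is dot-separated labels of word characters and hyphens (paths, ports and query strings are deliberately not handled), requires a label after every dot. The module and function names are hypothetical:

```elixir
defmodule UrlScan do
  # Scheme, then host labels separated by single dots. Because a label must
  # follow each dot, trailing "..." or "," is left out of the match.
  def urls(content) do
    ~r{https?://[\w-]+(?:\.[\w-]+)+}
    |> Regex.scan(content)
    # One single-element list per match (the group is non-capturing).
    |> List.flatten()
  end

  def count(content), do: content |> urls() |> length()
end
```

With the content string from the question, UrlScan.urls/1 returns "https://www.google.com" without the trailing dots and "https://www.myneigborsite.com" without the comma, and UrlScan.count/1 still gives 5.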

Programmatically build an F# regular expression with the FsVerbalExpressions library

I've been using the library FsVerbalExpressions to write some functions. I'm having a hard time trying to build a regEx programmatically.
For example, if I have a string "Int. Bus. Mach", I can remove periods and whitespace and end up with the array
let splitString = [|"Int"; "Bus"; "Mach"|]
What I'd like to do is build a regular expression from splitString so that its result is:
let hardCoded =
    VerbEx()
    |> startOfLine
    |> then' "Int"
    |> anything
    |> whiteSpace
    |> then' "Bus"
    |> anything
    |> whiteSpace
    |> then' "Mach"

hardCoded;;
val it : VerbEx =
  ^(Int)(.*)\s(Bus)(.*)\s(Mach) {MatchTimeout = -00:00:00.0010000;
                                 Regex = ^(Int)(.*)\s(Bus)(.*)\s(Mach);
                                 RegexOptions = None;
                                 RightToLeft = false;}
My problem is that I don't know how to build this programmatically so that, if the original string is "This is a much bigger string", the entire regEx is built from code rather than hard coded. I can create individual regular expressions with
let test =
    splitString
    |> Array.map (fun thing -> VerbEx() |> then' thing)
    |> Array.toList
but this is a list of VerbEx() rather than a single VerbEx() above.
Does anyone know how I could build a regEx with FsVerbalExpressions programmatically?
Thanks in advance for your help!

Think about it like this: you need to start with some initial value, VerbEx() |> startOfLine |> then' applied to the first word, and then, for every remaining word, apply a repeating pattern with the general shape anything |> whiteSpace |> then' word.
You can also think about it in inductive terms: you're producing a series of values, where each value is expressed as previousValue |> anything |> whiteSpace |> then' word; that is, each next value in the series is the previous value with some change applied to it. The very last element of the series is your final answer.
Such an operation (producing a series of values where each one is a modification of the previous one) is traditionally called a fold. And sure enough, F# has standard library functions for performing it:
let applyChange previousValue word =
    previousValue |> anything |> whiteSpace |> then' word

// The first word goes straight after startOfLine, as in hardCoded;
// Array.head throws if splitString is empty.
let initialValue = VerbEx() |> startOfLine |> then' (Array.head splitString)
let finalAnswer = Array.tail splitString |> Array.fold applyChange initialValue
Or you can roll that all together:
let finalAnswer =
    Array.tail splitString
    |> Array.fold
        (fun previousValue word -> previousValue |> anything |> whiteSpace |> then' word)
        (VerbEx() |> startOfLine |> then' (Array.head splitString))

Removing commas from inside quoted parts of a string in Elm 0.16

Right now I am trying to remove any commas that are contained within quotation marks and replace them with spaces in this string:
(,(,data,"quoted,data",123,4.5,),(,data,(,!##,(,4.5,),"(,more","data,)",),),)
I am currently using this function, which uses a JavaScript-style regex:
removeNeedlessCommmas sExpression =
    sExpression
        |> (\_ -> replaceSpacesWithCommas sExpression)
        |> Regex.replace Regex.All (Regex.regex ",") (\_ -> ",(?!(?:[^"]*"[^"]*")*[^"]*$)g")
This regex is displayed as working correctly in sites such as regex101.com.
However, I have tried many ways of escaping the regex so that it works in Elm 0.16, but the rest of my code is always highlighted as if the rest of the file were enclosed in a string. This is the error that I am getting with my current code:
(line 1, column 64): unexpected "_" expecting space, "&" or escape code
39│ printToBrowser "((data \"quoted data\" 123 4.5) (data (!##(4.5) \"(more\" \"data)\")))"
Maybe <http://elm-lang.org/docs/syntax> can help you figure it out.
I will post the main function that the error is referring to so that it makes more sense:
main : Html.Html
main =
    printToBrowser "((data \"quoted data\" 123 4.5) (data (!## (4.5) \"(more\" \"data)\")))"
Any assistance would be greatly appreciated. Thanks in advance.

I think you need 3 things:
1. Add a closing ) to the last anonymous function in removeNeedlessCommmas (this could have just been a copy-paste error).
2. Escape all the inner " in your regex, and drop the trailing g (on regex101 that is the global flag; in Elm, Regex.All plays that role): ",(?!(?:[^\"]*\"[^\"]*\")*[^\"]*$)"
3. Use the regex for matching, and replace with a space: Regex.replace Regex.All (Regex.regex ",(?!(?:[^\"]*\"[^\"]*\")*[^\"]*$)") (\_ -> " ")

If you'd consider a cowardly workaround alternative to a death-defying super-regex, I can offer this:
removeNeedlessCommas sExpr =
    replace All (regex "\"[^\"]*?\"")
        (\{match} -> String.map (\c -> if c == ',' then ' ' else c) match)
        sExpr
It lets regex find the quoted strings but does the comma substitution to those strings in a separate step. If preferred, that could be done by regex as well.
Here's my test harness, which ran fine in http://elm-lang.org/try :
import Html exposing (..)
import Regex exposing (..)
import String
str = """(,(,data,"quoted,data",123,4.5,),(,data,(,!##,(,4.5,),"(,more","data,)",),),)"""
main =
    div []
        [ text str
        , br [] []
        , text (removeNeedlessCommas str)
        ]
Output:
(,(,data,"quoted,data",123,4.5,),(,data,(,!##,(,4.5,),"(,more","data,)",),),)
(,(,data,"quoted data",123,4.5,),(,data,(,!##,(,4.5,),"( more","data )",),),)
Just for good measure, here's an algorithmic solution that does completely without regex:
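removeNeedlessCommas str =
    -- Walk the string once, tracking whether we're inside quotes.
    -- String.foldl with String.cons builds the result reversed, hence String.reverse.
    String.reverse
        <| snd
        <| String.foldl
            (\c (inQ, acc) ->
                case c of
                    '"' -> (not inQ, String.cons c acc)
                    ',' -> (inQ, String.cons (if inQ then ' ' else c) acc)
                    _ -> (inQ, String.cons c acc))
            (False, "")
            str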

F# Mapping Regular Expression Matches with Active Patterns

I found this useful article on using Active Patterns with Regular Expressions:
http://www.markhneedham.com/blog/2009/05/10/f-regular-expressionsactive-patterns/
The original code snippet used in the article was this:
open System.Text.RegularExpressions

let (|Match|_|) pattern input =
    let m = Regex.Match(input, pattern) in
    if m.Success then Some (List.tl [ for g in m.Groups -> g.Value ]) else None

let ContainsUrl value =
    match value with
    | Match "(http:\/\/\S+)" result -> Some(result.Head)
    | _ -> None
Which would let you know if at least one url was found and what that url was (if I understood the snippet correctly)
Then in the comment section Joel suggested this modification:
Alternative, since a given group may or may not be a successful match:
List.tail [ for g in m.Groups -> if g.Success then Some g.Value else None ]
Or maybe you give labels to your groups and you want to access them by name:
(re.GetGroupNames()
 |> Seq.map (fun n -> (n, m.Groups.[n]))
 |> Seq.filter (fun (n, g) -> g.Success)
 |> Seq.map (fun (n, g) -> (n, g.Value))
 |> Map.ofSeq)
After trying to combine all of this I came up with the following code:
let testString = "http://www.bob.com http://www.b.com http://www.bob.com http://www.bill.com"

let (|Match|_|) pattern input =
    let re = new Regex(pattern)
    let m = re.Match(input) in
    if m.Success then
        Some ((re.GetGroupNames()
               |> Seq.map (fun n -> (n, m.Groups.[n]))
               |> Seq.filter (fun (n, g) -> g.Success)
               |> Seq.map (fun (n, g) -> (n, g.Value))
               |> Map.ofSeq))
    else None

let GroupMatches stringToSearch =
    match stringToSearch with
    | Match "(http:\/\/\S+)" result -> printfn "%A" result
    | _ -> ()

GroupMatches testString;;
When I run my code in an interactive session this is what is output:
map [("0", "http://www.bob.com"); ("1", "http://www.bob.com")]
The result I am trying to achieve would look something like this:
map [("http://www.bob.com", 2); ("http://www.b.com", 1); ("http://www.bill.com", 1);]
Basically a mapping of each unique match found followed by the count of the number of times that specific matching string was found in the text.
If you think I'm going down the wrong path here please feel free to suggest a completely different approach. I'm somewhat new to both Active Patterns and Regular Expressions so I have no idea where to even begin in trying to fix this.
I also came up with this which is basically what I would do in C# translated to F#.
open System.Collections.Generic
open System.Text.RegularExpressions

let testString = "http://www.bob.com http://www.b.com http://www.bob.com http://www.bill.com"
let matches =
    let matchDictionary = new Dictionary<string,int>()
    for mtch in (Regex.Matches(testString, "(http:\/\/\S+)")) do
        for m in mtch.Captures do
            if(matchDictionary.ContainsKey(m.Value)) then
                matchDictionary.Item(m.Value) <- matchDictionary.Item(m.Value) + 1
            else
                matchDictionary.Add(m.Value, 1)
    matchDictionary
Which returns this when run:
val matches : Dictionary = dict [("http://www.bob.com", 2); ("http://www.b.com", 1); ("http://www.bill.com", 1)]
This is basically the result I am looking for, but I'm trying to learn the functional way to do this, and I think that should include active patterns. Feel free to try to "functionalize" this if it makes more sense than my first attempt.
Thanks in advance,
Bob

Interesting stuff, I think everything you are exploring here is valid. (Partial) active patterns for regular expression matching work very well indeed. Especially when you have a string which you want to match against multiple alternative cases. The only thing I'd suggest with the more complex regex active patterns is that you give them more descriptive names, possibly building up a collection of different regex active patterns with differing purposes.
As for your C# to F# example, you can have a functional solution just fine without active patterns, e.g.
open System.Text.RegularExpressions

let testString = "http://www.bob.com http://www.b.com http://www.bob.com http://www.bill.com"

let matches input =
    Regex.Matches(input, "(http:\/\/\S+)")
    |> Seq.cast<Match>
    |> Seq.groupBy (fun m -> m.Value)
    |> Seq.map (fun (value, groups) -> value, (groups |> Seq.length))

//FSI output:
> matches testString;;
val it : seq<string * int> =
  seq
    [("http://www.bob.com", 2); ("http://www.b.com", 1);
     ("http://www.bill.com", 1)]
Update
This particular example works fine without active patterns because 1) you are only testing one pattern and 2) you are processing the matches dynamically.
For a real-world example of active patterns, let's consider a case where 1) we are testing multiple regexes and 2) we are testing for one regex match with multiple groups. For these scenarios I use the following two active patterns, which are a bit more general than the first Match active pattern you showed: I do not discard the first group in the match, and I return a list of the Group objects, not just their values. One uses the compiled regex option for static regex patterns; the other uses the interpreted regex option for dynamic regex patterns. Because the .NET regex API is so feature-rich, what you return from your active pattern is really up to what you find useful. But returning a list of something is good, because then you can pattern match on that list.
let (|InterpretedMatch|_|) pattern input =
    if input = null then None
    else
        let m = Regex.Match(input, pattern)
        if m.Success then Some [for x in m.Groups -> x]
        else None

///Match the pattern using a cached compiled Regex
let (|CompiledMatch|_|) pattern input =
    if input = null then None
    else
        let m = Regex.Match(input, pattern, RegexOptions.Compiled)
        if m.Success then Some [for x in m.Groups -> x]
        else None
Notice also how these active patterns consider null a non-match, instead of throwing an exception.
OK, so let's say we want to parse names. We have the following requirements:
- Must have a first and last name
- May have a middle name
- First, optional middle, and last name are separated by a single space, in that order
- Each part of the name consists of one or more letters or digits
- Input may be malformed
First we'll define the following record:
type Name = {First:string; Middle:option<string>; Last:string}
Then we can use our regex active pattern quite effectively in a function for parsing a name:
let parseName name =
    match name with
    | CompiledMatch @"^(\w+) (\w+) (\w+)$" [_; first; middle; last] ->
        Some({First=first.Value; Middle=Some(middle.Value); Last=last.Value})
    | CompiledMatch @"^(\w+) (\w+)$" [_; first; last] ->
        Some({First=first.Value; Middle=None; Last=last.Value})
    | _ ->
        None
Notice one of the key advantages we gain here, as with pattern matching in general: we can simultaneously test that an input matches the regex pattern and, if it does, decompose the returned list of groups.
