I am trying to append a string to an existing string.
I came across this thread here which explains it.
Just for reference I am pasting the content here from that page
let (^$) c s = s ^ Char.escaped c (* append *)
let ($^) c s = Char.escaped c ^ s (* prepend *)
Now I want to know what (^$) means in
let (^$) c s = s ^ Char.escaped c (* append *)
This page here states that
operator ^ is for string concatenation
what is (^$) ?
@icktoofay is correct, this code:
let (^$) c s = s ^ Char.escaped c
is defining a new operator ^$.
You can use an operator as an ordinary (prefix) function name by enclosing it in parentheses. And, indeed, this is what you do when you define an operator.
$ ocaml
OCaml version 4.02.1
# (+) 44 22;;
- : int = 66
# let (++++) x y = x * 100 + y;;
val ( ++++ ) : int -> int -> int = <fun>
# 3 ++++ 5;;
- : int = 305
Infix operators in OCaml start with one of the operator-like characters =<>@^|&+-*/$% and may continue with any number of further operator-like characters !$%&*+-./:<=>?@^|~. So you can have an infix operator $^ or $^??@+ and so on.
See Section 6.1 of the OCaml manual.
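As a small sketch of the two points above (an operator in parentheses is an ordinary function, and new infix operators are built from operator characters), the following may help. The operator +| and the names total and product are made up for illustration.

```ocaml
(* An operator in parentheses is an ordinary function, so it can be
   passed to a higher-order function such as List.fold_left. *)
let total = List.fold_left ( + ) 0 [1; 2; 3; 4]

(* Multiplication needs the spaces in ( * ); without them the lexer
   would read the start of a comment. *)
let product = List.fold_left ( * ) 1 [1; 2; 3; 4]

(* A hypothetical infix operator: first character +, then one more
   operator character. It gets the precedence and associativity of +. *)
let ( +| ) x y = x * 100 + y

let () =
  Printf.printf "%d %d %d\n" total product (3 +| 5)
```

Because +| starts with +, an expression such as 1 +| 2 +| 3 groups to the left, like ordinary addition.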
It is to append the given character to the string with escaping:
'x' ^$ "hello" ----> "hellox"
'\n' ^$ "hello" ----> "hello\\n"
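To check the two results above, here is a minimal sketch that defines both operators from the question and evaluates them. The binding names appended, escaped and prepended are only for illustration.

```ocaml
(* The two operators from the question. *)
let ( ^$ ) c s = s ^ Char.escaped c   (* append the escaped character *)
let ( $^ ) c s = Char.escaped c ^ s   (* prepend the escaped character *)

let appended = 'x' ^$ "hello"
let escaped = '\n' ^$ "hello"
let prepended = 'x' $^ "hello"

let () =
  (* The newline is escaped, so the result ends in a backslash and an n
     rather than an actual line break. *)
  Printf.printf "%s %s (%d chars) %s\n"
    appended escaped (String.length escaped) prepended
```

Note that Char.escaped turns the newline into two characters, which is why the second result is written "hello\\n" as an OCaml literal.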
I am trying to write the following code in OCaml:
let a = 0
let b = 1
if a > b then
{
print_endline "a";
print_endline "a";
}
print_endline "b"
And then I encountered the following error:
File "[21]", line 4, characters 0-2:
4 | if a > b then
^^
Error: Syntax error
I have tried using the begin and end keywords.
If you're writing a program (rather than mucking about in a REPL), then there are only certain constructs which can exist at the top level of your program.
One of those is a binding. So the following is fine:
let a = 0
let b = 1
But a conditional expression (if/else) is not permitted. We can get around this by binding that expression to a pattern. Since print_endline will just return (), we can write:
let () =
...
Your use of { and } is incorrect in this situation, but you can group multiple expressions with ; and ( and ). Remember that ; is not a "statement terminator" but rather a separator.
let () =
  if a > b then (
    print_endline "a";
    print_endline "a"
  );
  print_endline "b"
Note that an if without a matching else is only allowed when its then branch has type unit. That is the case here.
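Since the question mentions begin and end, here is a hedged sketch showing that ( ... ) and begin ... end group a sequence the same way. The functions trace_parens and trace_begin_end are hypothetical; they write to a Buffer instead of printing so the result can be inspected.

```ocaml
(* Grouping with parentheses. There is no else branch, so the
   grouped sequence must have type unit. *)
let trace_parens a b =
  let buf = Buffer.create 16 in
  if a > b then (
    Buffer.add_string buf "a\n";
    Buffer.add_string buf "a\n"
  );
  Buffer.add_string buf "b\n";
  Buffer.contents buf

(* The same function with begin ... end, which is interchangeable
   with parentheses here. *)
let trace_begin_end a b =
  let buf = Buffer.create 16 in
  if a > b then begin
    Buffer.add_string buf "a\n";
    Buffer.add_string buf "a\n"
  end;
  Buffer.add_string buf "b\n";
  Buffer.contents buf

let () = print_string (trace_parens 0 1)
```

Without the grouping, only the first statement after then would belong to the conditional, and the second would always run.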
I've tried to write a grammar for the language. Here is my grammar:
S -> aS | bS | λ
I also wanted to generate the word "bbababb" which does not have two consecutive a's.
I started with,
bS => bbS => bbaS => bbabS => bbabaS => bbababS => bbababbS => bbababbλ => bbababb.
And finally I tried the following regular expression,
(a+b*)a*(a+b*)
I really appreciate your help.
Let's try to write some rules that describe all strings that don't have two consecutive a's:
the empty string is in the language
if x is a string in the language ending in a, you can add b to the end to get another string in the language
if x is a string in the language ending in b, you can add an a or a b to it to get another string in the language
This lets us write down a grammar:
S -> e | aB | bS
B -> e | bS
That grammar should work for us. Consider your string bbababb:
S -> bS -> bbS -> bbaB -> bbabS
-> bbabaB -> bbababS -> bbababbS
-> bbababb
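One way to convince yourself that the grammar does what it should is to turn it into a recogniser, with one function per nonterminal. This is only a sketch; the names parse_s, parse_b and in_language are made up here.

```ocaml
(* One function per nonterminal; i is the index of the next unread
   character of w.
     S -> e | aB | bS
     B -> e | bS          *)
let rec parse_s w i =
  if i = String.length w then true        (* S -> e *)
  else
    match w.[i] with
    | 'a' -> parse_b w (i + 1)            (* S -> aB *)
    | 'b' -> parse_s w (i + 1)            (* S -> bS *)
    | _ -> false
and parse_b w i =
  if i = String.length w then true        (* B -> e *)
  else
    match w.[i] with
    | 'b' -> parse_s w (i + 1)            (* B -> bS *)
    | _ -> false                          (* no rule lets B start with a *)

let in_language w = parse_s w 0

let () =
  List.iter
    (fun w -> Printf.printf "%S: %b\n" w (in_language w))
    ["bbababb"; "baab"; ""; "a"]
```

The nonterminal B plays the role of "the previous character was a", which is why it has no rule beginning with a.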
To turn a regular grammar such as this into a regular expression, we can write equations and solve for S:
S = e + aB + bS
B = e + bS
Replace for B:
S = e + a(e + bS) + bS
= e + a + abS + bS
= e + a + (ab + b)S
Now we can eliminate recursion to solve for S:
S = (ab + b)*(e + a)
This gives us a regular expression: (ab + b)*(e + a)
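The last step uses the fact that an equation of the form S = AS + B has the solution S = A*B. If you would rather test the result than trust the algebra, a brute-force comparison is easy to write. The sketch below uses a hand-written matcher for (ab + b)*(e + a) rather than a regex library, and all function names are made up for illustration.

```ocaml
(* Direct definition of the language: no two adjacent a characters. *)
let no_double_a w =
  let n = String.length w in
  let rec go i =
    i + 1 >= n || (not (w.[i] = 'a' && w.[i + 1] = 'a') && go (i + 1))
  in
  go 0

(* Hand-written matcher for (ab + b)*(e + a). The alternatives never
   overlap, so no backtracking is needed. *)
let matches_regex w =
  let n = String.length w in
  let rec go i =
    if i = n then true                        (* (e + a) takes e *)
    else if w.[i] = 'b' then go (i + 1)       (* one b *)
    else if w.[i] = 'a' then
      i + 1 = n                               (* (e + a) takes the final a *)
      || (w.[i + 1] = 'b' && go (i + 2))      (* one ab *)
    else false
  in
  go 0

(* All strings over the alphabet a, b of length len, read off the
   bits of the numbers 0 .. 2^len - 1. *)
let words len =
  List.init (1 lsl len) (fun bits ->
      String.init len (fun k ->
          if (bits lsr k) land 1 = 1 then 'a' else 'b'))

(* True if the matcher and the direct definition agree on every
   string of length at most max_len. *)
let agree_up_to max_len =
  List.for_all
    (fun len ->
      List.for_all (fun w -> matches_regex w = no_double_a w) (words len))
    (List.init (max_len + 1) (fun len -> len))

let () = Printf.printf "agree up to length 10: %b\n" (agree_up_to 10)
```

This is a finite check, not a proof, but it catches most mistakes in a derivation like the one above.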
a must always be followed by b, except the last char, so you can express it as "b or ab, with an optional trailing a":
\b(b|ab)+a?\b
See live demo.
\b (word boundaries) might be able to be removed depending on your usage and regex engine.
I need to create a regular expression (for a program in Haskell) that will match strings made of "X" and ".", with exactly four "X" and exactly one ".". It must not match strings with any other combination of X's and dots.
I have thought about something like
[X\.]{5}
But it catches also "XXXXX" or ".....", so it isn't what I need.
That's called permutation parsing, and while "pure" regular expressions can't parse permutations it's possible if your regex engine supports lookahead. (See this answer for an example.)
However I find the regex in the linked answer difficult to understand. It's cleaner in my opinion to use a library designed for permutation parsing, such as megaparsec.
You use the Text.Megaparsec.Perm module by building a PermParser in a quasi-Applicative style using the <||> operator, then converting it into a regular MonadParsec action using makePermParser.
So here's a parser which recognises any combination of four Xs and one .:
import Control.Applicative
import Data.Ord
import Data.List
import Text.Megaparsec
import Text.Megaparsec.Perm
fourXoneDot :: Parsec Dec String String
fourXoneDot = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
  where mkFive a b c d e = [a, b, c, d, e]
        x = char 'X'
        dot = char '.'
I'm applying the mkFive function, which just stuffs its arguments into a five-element list, to four instances of the x parser and one dot, combined with <||>.
ghci> parse fourXoneDot "" "XXXX."
Right "XXXX."
ghci> parse fourXoneDot "" "XX.XX"
Right "XXXX."
ghci> parse fourXoneDot "" "XX.X"
Left {- ... -}
This parser always returns "XXXX." because that's the order I combined the parsers in: I'm mapping mkFive over the five parsers and it doesn't reorder its arguments. If you want the permutation parser to return its input string exactly, the trick is to track the current position within the component parsers, and then sort the output.
fourXoneDotSorted :: Parsec Dec String String
fourXoneDotSorted = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
  where mkFive a b c d e = map snd $ sortBy (comparing fst) [a, b, c, d, e]
        x = withPos (char 'X')
        dot = withPos (char '.')
        withPos = liftA2 (,) getPosition
ghci> parse fourXoneDotSorted "" "XX.XX"
Right "XX.XX"
As the megaparsec docs note, the implementation of the Text.Megaparsec.Perm module is based on Parsing Permutation Phrases; the idea is described in detail in the paper and the accompanying slides.
The other answers look quite complicated to me, given that there are only five strings in this language. Here's a perfectly fine and very readable regex for this:
\.XXXX|X\.XXX|XX\.XX|XXX\.X|XXXX\.
Are you attached to regex, or did you just end up at regex because this was a question you didn't want to try answering with applicative parsers?
Here's the simplest possible attoparsec implementation I can think of:
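-- Assumes Text input; swap in Data.Attoparsec.ByteString.Char8 for ByteString.
import Data.Attoparsec.Text (Parser, count, inClass, satisfy)
import Data.List (sort)

parseDotXs :: Parser ()
parseDotXs = do
  -- Read exactly five characters, each of which is a dot or an X.
  dotXs <- count 5 (satisfy (inClass ".X"))
  -- '.' sorts before 'X', so after sorting the dots come first.
  let (dots, xS) = span (== '.') . sort $ dotXs
  if length dots == 1 && length xS == 4
    then return ()
    else fail "Mismatch between dots and Xs"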
You may need to adjust slightly depending on your input type.
There are tons of fancy ways to do stuff in applicative parsing land, but there is no rule saying you can't just do things the rock-stupid simple way.
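The same counting idea needs no parser library at all. Here is a minimal sketch in OCaml rather than Haskell, cross-checked against the five strings of the language. The names count, four_x_one_dot and all_words are made up for illustration.

```ocaml
(* Number of occurrences of the character c in s. *)
let count c s =
  let n = ref 0 in
  String.iter (fun ch -> if ch = c then incr n) s;
  !n

(* Exactly four X and one dot, in any order. The length check rules
   out any other character. *)
let four_x_one_dot s =
  String.length s = 5 && count 'X' s = 4 && count '.' s = 1

(* The five strings of the language, built by putting the dot at each
   position in turn. *)
let all_words =
  List.init 5 (fun i ->
      String.init 5 (fun k -> if k = i then '.' else 'X'))

let () =
  List.iter
    (fun w -> Printf.printf "%s: %b\n" w (four_x_one_dot w))
    (all_words @ ["XXXXX"; "....."; "XX.X"])
```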
Try the following regex:
(?<=^| )(?=[^. ]*\.)(?=(?:[^X ]*X){4}).{5}(?=$| )
Demo here
If you have one word per string, you can simplify the regex to this one:
^(?=[^. \n]*\.)(?=(?:[^X \n]*X){4}).{5}$
Demo here
I'm trying to write a lexer for a variation on C using OCaml. For the lexer I need to match the strings "^" and "||" (as the exponentiation and or symbols respectively). Both of these are special characters in regex, and when I try to escape them using the backslash, nothing changes and the code runs as if "\^" was still beginning of line and "\|\|" was still "or or". What can I do to fix this?
Backslash characters in string literals have to be doubled to make it past the OCaml string parser:
# let r = Str.regexp "\\^" in
Str.search_forward r "FOO^BAR" 0;;
- : int = 3
If you are using OCaml 4.02 or later, you can also use quoted strings ({| ... |}), which do not handle a backslash character specially. This may result in more readable code because backslash characters do not have to be doubled:
# let r = Str.regexp {|\^|} in
Str.search_forward r "FOO^BAR" 0;;
- : int = 3
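To see that the two spellings really denote the same string, you can compare them directly without involving Str. A small sketch, with binding names chosen only for illustration:

```ocaml
(* Both literals denote the same two-character string: a backslash
   followed by a caret. *)
let escaped_literal = "\\^"
let quoted_literal = {|\^|}

(* The same holds for the alternation syntax used by Str. *)
let escaped_alternation = "BAZ\\|BAR"
let quoted_alternation = {|BAZ\|BAR|}

let () =
  Printf.printf "%b %d %b\n"
    (escaped_literal = quoted_literal)
    (String.length quoted_literal)
    (escaped_alternation = quoted_alternation)
```

A single backslash in an ordinary literal, as in the question, never reaches the regex engine as a backslash, which is why escaping appeared to have no effect.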
Or you may consider using Str.regexp_string (or Str.quote), which creates a regular expression that will match all characters in its argument literally:
# let r = Str.regexp_string "^" in
Str.search_forward r "FOO^BAR" 0;;
- : int = 3
The Str module does not take | as a special regex character, so you do not have to worry about quoting when you want to use it literally:
# let r = Str.regexp "||" in
Str.search_forward r "FOO||BAR" 0;;
- : int = 3
| has to be quoted only when you want to use it as the "or" construct:
# let r = Str.regexp "BAZ\\|BAR" in
Str.search_forward r "FOOBAR" 0;;
- : int = 3
You might want to refer to Str.regexp for the full syntax of regular expressions.
infix 3 .. errors out. Which characters are allowed or not allowed for defining custom infixes? Where might I find a list online?
thanks
You may infix any non-qualified identifier.
The following is from the SML '90 Definition:
The following are the reserved words used in the Core. They may not (except =) be used as identifiers.
abstype and andalso as case do datatype else
end exception fn fun handle if in infix
infixr let local nonfix of op open orelse
raise rec then type val with withtype while
( ) [ ] { } , : ; ... _ | = => -> #
....
An identifier is either alphanumeric: any sequence of letters,
digits or primes (') and underbars (_) starting with a letter or
prime, or symbolic: any non-empty sequence of the following
symbols:
! % & $ # + - / : < = > ? @ \ ~ ` ^ | *
In either case, however, reserved words are excluded. This means that
for example # and | are not identifiers, but ## and |=| are
identifiers. The only exception to this rule is that the symbol =,
which is a reserved word, is also allowed as an identifier to stand
for the equality predicate.
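Applied to the question: .. is rejected because . is not one of the symbol characters. The rule can be sketched as a small checker, written here in OCaml rather than SML. The character set and the list of reserved symbolic words below are my reading of the Definition and should be treated as assumptions, and the function names are made up.

```ocaml
(* Symbol characters, as I understand the Definition of Standard ML
   (an assumption; check against your copy). *)
let symbol_chars = "!%&$#+-/:<=>?@\\~`^|*"

(* Reserved words made only of symbol characters. The equals sign is
   reserved too but is still allowed as an identifier, so it is left
   out of this list. *)
let reserved_symbolic = [":"; "|"; "=>"; "->"; "#"]

(* True if every character of s is a symbol character. *)
let all_symbol_chars s =
  let ok = ref true in
  String.iter
    (fun c -> if not (String.contains symbol_chars c) then ok := false)
    s;
  !ok

(* A symbolic identifier is a non-empty run of symbol characters that
   is not a reserved word. *)
let is_symbolic_identifier s =
  s <> "" && all_symbol_chars s && not (List.mem s reserved_symbolic)

let () =
  List.iter
    (fun s -> Printf.printf "%S: %b\n" s (is_symbolic_identifier s))
    [".."; "##"; "|=|"; "#"; "|"; "="]
```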