What is the difference between these three fscanf calls in OCaml?

I wrote a short bit of code to simply skip num_lines lines in an input file (printing the lines out for debugging purposes). Here are two things I tried that didn't work:
for i = 0 to num_lines do
print_endline (fscanf infile "%s" (fun p -> p));
done;;
for i = 0 to num_lines do
print_endline (fscanf infile "%S\n" (fun p -> p));
done;;
But this one did work:
for i = 0 to num_lines do
print_endline (fscanf infile "%s\n" (fun p -> p));
done;;
I've been trying to comprehend the documentation on fscanf, but it doesn't seem to be sinking in. Could someone explain to me exactly why the last snippet worked, but the first two didn't?

"%s" -- Matches everything to next white-space ("\n" here) but never matches "\n"
"%S\n" -- Matches thing that looks like Ocaml strings, then a "\n"
"%s\n" -- Matches everything to next white-space ("\n" here) then "\n". This will act different if there is no trailing "\n" in file or if there is a space before the "\n", etc.
"%s " -- Matches anything up to white-space, and then all trailing white-space including "\n" (or possibly even no white-space). This works because " " means "any white space, possible none", in the format string.

Related

Str.global_replace in OCaml putting carets where they shouldn't be

I am working to convert multiline strings into a list of tokens that might be easier for me to work with.
In accordance with the specific needs of my project, I'm padding any caret symbol that appears in my input with spaces, so that "^" gets turned into " ^ ". I'm using something like the following function to do so:
let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)
I then use something like the below function to then turn this multiline string into a list of tokens (ignoring whitespace).
let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;
For some reason, bad_function adds carets to places where they shouldn't be. Take the following line of code:
(bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
The first line of the string turns into:
^ This is some \n ^
When I feed the output from bad_function into string_to_tokens I get the following list:
string_to_tokens (bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
"newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
"this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
"^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]
Why is this happening, and how can I fix it so that these functions behave the way I want them to?
As explained in the documentation of the Str module:
^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.
So you have to quote the '^' character using the escape character "\".
However, note that (also from the doc)
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser.
This means you have to double the '\' to do what you want without getting a warning.
This should do the job:
let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;
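If a regular expression is not actually needed, the same padding can be done with the standard library alone, which sidesteps the special meaning of "^" entirely. This is an alternative technique, not the Str-based fix above, and only a rough sketch: pad_carets and tokens are hypothetical names, and tokens is assumed to split on the same whitespace characters as the question's string_to_tokens.

```ocaml
(* Surround every '^' with spaces by splitting on it and re-joining. *)
let pad_carets (s : string) : string =
  String.concat " ^ " (String.split_on_char '^' s)

(* Split on runs of whitespace: map every whitespace character to a space,
   split on spaces, and drop the empty pieces. *)
let tokens (s : string) : string list =
  s
  |> String.map (function '\n' | '\r' | '\t' | '\x0c' -> ' ' | c -> c)
  |> String.split_on_char ' '
  |> List.filter (fun t -> t <> "")

let () =
  tokens (pad_carets "I get ^s showing\n\tup where")
  |> List.iter print_endline
```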

Why can I not print this input a second time in OCaml?

I am very new to OCaml and am attempting to learn it and write a program at the same time. I am writing a palindrome program. I am attempting to get a string from the user such as d e v e d, Hello World!, or loud; all of these are valid user input. I need to read these strings, display them, then reverse them and check whether each is a palindrome or not. I wrote the following code:
print_string "Enter a string: ";
let str = read_line () in
Printf.printf "%s\n" str;;
Printf.printf "%s\n" str;;
This works fine and prints, for example, Enter a string: d e v e d or Enter a string: Hello World! The issue comes when I try to add another Printf.printf "%s\n" str;; to the code. It gives me this error:
File "main.ml", line 5, characters 21-24:
Error: Unbound value str
Line 5 is the line of the second Printf.printf statement. I have tried this with no ; after the print statements, with one, and with two, and I get the same error each time. Does anyone with more OCaml knowledge know why I get this error?
Because of your use of in, your code parses as:
(let str = read_line () in Printf.printf "%s\n" str);;
and then a completely separate:
Printf.printf "%s\n" str;;
So str is local to the first printf.
You want:
let str = read_line ();;
Printf.printf "%s\n" str;;
Printf.printf "%s\n" str;;
which is three separate top-level phrases. The first one defines a global variable str, which the two printf calls that follow can both see.
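Another option is to keep the let ... in and put every use of str inside its body, chaining the prints with a single ; so that they all belong to the same expression. A small sketch of that idea follows; echo_twice is a hypothetical name, and the reader function is passed in as an argument only so that the example can run without keyboard input.

```ocaml
(* [str] is bound once; everything after [in], sequenced with [;],
   is part of the same expression and can therefore see it. *)
let echo_twice (read : unit -> string) : string =
  let str = read () in
  Printf.printf "%s\n" str;
  Printf.printf "%s\n" str;
  str

let () =
  (* A fixed string stands in for the user's input here. *)
  ignore (echo_twice (fun () -> "d e v e d"))
```

With real input, the call would be echo_twice read_line.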

Format: add trailing spaces to character output to left-justify

How do you format a string to have constant width and be left-justified? There is the Aw formatter, where w denotes desired width of character output, but it prepends the spaces if w > len(characters), instead of appending them.
When I try
44 format(A15)
print 44, 'Hi Stack Overflow'
I get
> Hi Stack Overflow<
instead of
>Hi Stack Overflow <
Is there any simple Fortran formatting solution that solves this?
As noted in the question, the problem is that when a character expression shorter than the output field width is written, the padding spaces appear before the character expression. What we want is for the padding spaces to come after our desired string.
There isn't a simple formatting solution, in the sense of a natural edit descriptor. However, what we can do is output an expression with sufficient trailing spaces (which count towards the length).
For example:
print '(A50)', 'Hello'//REPEAT(' ',50)
or
character(50) :: hello='Hello'
print '(A50)', hello
or even
print '(A50)', [character(50) :: 'hello']
That is, in each case the output item is a character of length (at least) 50. Each will be padded on the right with blanks.
If you choose, you could even make a function which returns the extended (left-justified) expression:
print '(A50)', right_pad('Hello')
where the function is left as an exercise for the reader.
To complete @francescalus's excellent answer for future reference, the proposed solution also works with allocatables in place of string literals:
character(len=:), allocatable :: a_str
a_str = "foobar"
write (*,"(A,I4)") a_str, 42
write (*,"(A,I4)") [character(len=20) :: a_str], 42
will output
foobar  42
foobar                42
It's a bit ugly, but you can concatenate a blank string:
character*15 :: blank=' '
print 44, 'Hi Stack Overflow'//blank
program test ! Write left justified constant width character columns
! declare some strings.
character(len=32) :: str1,str2,str3,str4,str5,str6
! define the string values.
str1 = " Nina "; str2 = " Alba " ; str3 = " blue "
str4 = " Jamil "; str5 = " Arnost " ; str6 = " green "
write(*,'(a)') "123456789012345678901234567890"
! format to 3 columns 10 character wide each.
! and adjust the strings to the left.
write(*,'(3(a10))') adjustl(str1), adjustl(str2), adjustl(str3)
write(*,'(3(a10))') adjustl(str4), adjustl(str5), adjustl(str6)
end program test
$ ./a.out
123456789012345678901234567890
Nina      Alba      blue
Jamil     Arnost    green
adjustl() moves leading spaces to the end of the string.
I suggest that you do not limit the number of output characters.
Changing the format to the following will work:
44 format(A)
print 44, 'Hi Stack Overflow'

Split string with specified delimiter in lua

I'm trying to create a split() function in Lua with a delimiter of choice, where the default is space.
The default works fine. The problem starts when I pass a delimiter to the function. For some reason it doesn't return the last substring.
The function:
function split(str,sep)
if sep == nil then
words = {}
for word in str:gmatch("%w+") do table.insert(words, word) end
return words
end
return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesnt return last value
end
I try to run this:
local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
print(i,j)
end
and I get:
1 a
2 b
3 c
4 d
5 e
6 f
Can't figure out where the bug is...
When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, when you know the string cannot end with the delimiter:
str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end
Alternatively, you can use a pattern with an optional delimiter:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end
Actually, we don't need the optional delimiter since we're capturing non-delimiters:
str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
Here's my go-to split() function:
-- split("a,b,c", ",") => {"a", "b", "c"}
function split(s, sep)
local fields = {}
local sep = sep or " "
local pattern = string.format("([^%s]+)", sep)
string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)
return fields
end
"[^"..sep.."]*"..sep This is what causes the problem. You are matching a string of characters which are not the separator followed by the separator. However, the last substring you want to match (g) is not followed by the separator character.
The quickest way to fix this is to make sure the last substring is also followed by a separator, for example by appending sep to the string before matching against it. This way g, which is otherwise followed only by the end of the string, is still matched.
I'd say your approach is overly complicated in general. First of all, you can just match individual substrings that do not contain the separator; secondly, you can do this in a for loop using the gmatch function:
local result = {}
for field in your_string:gmatch(("[^%s]+"):format(your_separator)) do
table.insert(result, field)
end
return result
EDIT: The above code, simplified a bit:
local pattern = "[^%" .. your_separator .. "]+"
for field in string.gmatch(your_string, pattern) do
-- ...and so on (The rest should be easy enough to understand)
EDIT2: Keep in mind that you should also escape your separator. A separator like % will cause problems unless you escape it as %%:
function escape(str)
-- the outer parentheses drop gsub's second return value (the match count)
return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end

Is trailing white space forbidden in s-expressions?

When I try sexplib, it tells me
Sexp.of_string " a";; is correct.
Sexp.of_string "a ";; is wrong.
Is trailing white space forbidden in sexp?
Why?
According to an informal grammar specification, whitespace should be ignored on both ends of an atom:
{2 Syntax Specification of S-expressions}
{9 Lexical conventions of S-expression}
Whitespace, which consists of space, newline, carriage return,
horizontal tab and form feed, is ignored unless within an
OCaml-string, where it is treated according to OCaml-conventions. The
semicolon introduces comments. Comments are ignored, and range up to
the next newline character. The left parenthesis opens a new list,
the right parenthesis closes it again. Lists can be empty. The
double quote denotes the beginning and end of a string following the
lexical conventions of OCaml (see OCaml-manual for details). All
characters other than double quotes, left- and right parentheses, and
whitespace are considered part of a contiguous string.
Indeed, you can read an atom with a trailing whitespace from a file without any errors.
The error is thrown from the function Pre_sexp.of_string_bigstring in the case where a parser returns successfully but something is left in the buffer. So the main question is why something was left in the buffer. It seems that several parsers exist, and that files and strings are parsed with different parsers.
I've examined the parse_atom rule defined at pre_sexp.ml:699 (all locations are for this commit) and discovered that when the trailing whitespace is hit, bump_found_atom is called. Then, if something is on the stack, the position indicator is incremented and parsing continues. Otherwise, parsing is finished, but the position is not incremented. This can be fixed with a simple patch:
diff --git a/lib/pre_sexp.ml b/lib/pre_sexp.ml
index 86603f3..9690c0f 100644
--- a/lib/pre_sexp.ml
+++ b/lib/pre_sexp.ml
@@ -502,7 +502,7 @@ let mk_cont_parser cont_parse = (); fun _state str ~max_pos ~pos ->
let pbuf_str = Buffer.contents pbuf in \
let atom = MK_ATOM in \
match GET_PSTACK with \
- | [] -> Done (atom, mk_parse_pos state pos) \
+ | [] -> Done (atom, mk_parse_pos state (pos + 1)) \
| rev_sexp_lst :: sexp_stack -> \
Buffer.clear pbuf; \
let pstack = (atom :: rev_sexp_lst) :: sexp_stack in \
After this patch, the following code produces the expected 'a', 'a', 'a' output:
let s1 = Sexp.of_string " a" in
let s2 = Sexp.of_string "a " in
let s3 = Sexp.of_string " a " in
printf "'%s', '%s', '%s'\n"
(Sexp.to_string s1)
(Sexp.to_string s2)
(Sexp.to_string s3);