I want to use Yesod and Haskell to call Pandoc, to translate one format into another.
import System.IO
import System.Process

pandocConverted :: String -> IO String
pandocConverted input = do
  (Just hIn, Just hOut, _, _) <-
    createProcess (proc "pandoc" []) { std_in = CreatePipe, std_out = CreatePipe }
  hPutStr hIn input
  hClose hIn  -- close stdin so pandoc sees EOF
  converted <- hGetContents hOut
  return converted
This works well. But how do I translate into a different format?
e.g. how do I call pandoc like this?
pandoc -s README -o example4.tex
or this?
pandoc -s -S -t docbook README -o example9.db
You can use the -f and -t flags to specify source and target formats when using pandoc in a pipe.
createProcess (proc "pandoc" ["-f", "markdown", "-t", "latex"])
However, since pandoc is first and foremost a Haskell library (to which the pandoc executable is merely a command-line interface), it would be more idiomatic to invoke the library directly from your Yesod program instead of using createProcess.
import Text.Pandoc
import Text.Pandoc.Error (handleError)
import Control.Arrow

pandocConverted' :: String -> String
pandocConverted' = readMarkdown def
               >>> fmap (writeLaTeX def { writerReferenceLinks = True })
               >>> handleError
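(Note that this is the pandoc 1.x API, where readMarkdown returns Either PandocError Pandoc; in pandoc 2.x the readers and writers run in PandocMonad, e.g. via runPure, so the equivalent code looks different.)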
I have searched the Sys, Gc, and Unix modules, but did not find a way to get the system page size in OCaml. How can I get the system page size?
I have OCaml 4.06 and macOS 10.12.6 (Sierra)
If you just want the answer for macOS, there is a pagesize command that you can run with Unix.open_process_in:
$ rlwrap ocaml
OCaml version 4.06.0
# #load "unix.cma";;
# Unix.open_process_in "pagesize" |> input_line |> int_of_string;;
- : int = 4096
Update
There is a POSIX command line program getconf that is quite portable, I believe. It works on macOS and all the versions of Linux I tried. You can use that instead:
$ rlwrap ocaml
OCaml version 4.06.0
# #load "unix.cma";;
# Unix.open_process_in "getconf PAGE_SIZE" |> input_line |> int_of_string;;
- : int = 4096
You can call sysconf(_SC_PAGESIZE) from OCaml to get that information. You can do that either using a .c file or using ctypes (although you'll need the numeric value of _SC_PAGESIZE, so it might not be the best solution):
% utop -require ctypes.foreign
# open Foreign;;
# open Ctypes;;
# let sysconf = foreign "sysconf" (int @-> returning long);;
val sysconf : int -> Signed.long = <fun>
# sysconf 30;;
- : Signed.long = <long 4096>
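For completeness, the .c-file route might look like this; a minimal sketch (the file and function names are mine):

/* pagesize_stubs.c */
#include <unistd.h>
#include <caml/mlvalues.h>

/* Wrap sysconf(_SC_PAGESIZE) so the constant's numeric value
   never has to appear on the OCaml side. */
CAMLprim value caml_page_size(value unit)
{
    return Val_long(sysconf(_SC_PAGESIZE));
}

(* pagesize.ml *)
external page_size : unit -> int = "caml_page_size"
let () = print_int (page_size ())

Both files can be built together with, e.g., ocamlopt pagesize_stubs.c pagesize.ml -o pagesize.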
I'm using Python 2.7 and I'm trying to accomplish shell-like behavior using argparse.
My issue, in general, is that I cannot seem to find a way, in Python 2.7, to make argparse's subparsers optional.
It's kind of hard to explain my issue, so I'll describe what I require from my program.
The program has 2 modes of work:
1. Starting the program with a given command (each command has its own additional arguments) runs a specific task.
2. Starting the program without a command starts a shell-like program that can take a line of arguments and process it as if the program had been called with that line as its arguments.
So, if for example my program supports 'cmd1' and 'cmd2' commands, I could use it like so:
python program.py cmd1 additional_args1
python program.py cmd2 additional_args2
or with shell mode:
python program.py
cmd1 additional_args1
cmd2 additional_args2
quit
In addition, I also want my program to be able to take optional global arguments that affect all commands.
For that I'm using argparse like so (This is a pure example):
parser = argparse.ArgumentParser(description="{} - Version {}".format(PROGRAM_NAME, PROGRAM_VERSION))
parser.add_argument("-i", "--info", help="Display more information")
subparsers = parser.add_subparsers()
parserCmd1 = subparsers.add_parser("cmd1", help="First Command")
parserCmd1.set_defaults(func=cmd1)
parserCmd2 = subparsers.add_parser("cmd2", help="Second Command")
parserCmd2.add_argument("-o", "--output", help="Redirect Output")
parserCmd2.set_defaults(func=cmd2)
So I can call cmd1 (with no additional args) or cmd2 (with or without the -o flag). For both I can add the -i flag to display even more information about the called command.
My issue is that I cannot activate shell mode, because I have to provide cmd1 or cmd2 as an argument (subparsers are mandatory in Python 2.7's argparse).
Restrictions:
I cannot use Python 3 (I know it can be easily done there)
Because of the global optional arguments, I cannot simply check for an empty argument list and skip argument parsing.
I don't want to add a new command to call shell, it must be when providing no command at all
So how can I achieve This kind of behavior with argparse and python 2.7?
Another idea is to use two-stage parsing: one parser handles the 'globals', returning the strings it can't handle; the extras are then conditionally handled with subparsers.
import argparse

def cmd1(args):
    print('cmd1', args)

def cmd2(args):
    print('cmd2', args)

parser1 = argparse.ArgumentParser()
parser1.add_argument("-i", "--info", help="Display more information")

parser2 = argparse.ArgumentParser()
subparsers = parser2.add_subparsers(dest='cmd')
parserCmd1 = subparsers.add_parser("cmd1", help="First Command")
parserCmd1.set_defaults(func=cmd1)
parserCmd2 = subparsers.add_parser("cmd2", help="Second Command")
parserCmd2.add_argument("-o", "--output", help="Redirect Output")
parserCmd2.set_defaults(func=cmd2)

args, extras = parser1.parse_known_args()
if len(extras) > 0 and extras[0] in ['cmd1', 'cmd2']:
    args = parser2.parse_args(extras, namespace=args)
    args.func(args)
else:
    print('doing system with', args, extras)
sample runs:
0901:~/mypy$ python stack46667843.py -i info
('doing system with', Namespace(info='info'), [])
0901:~/mypy$ python stack46667843.py -i info extras for sys
('doing system with', Namespace(info='info'), ['extras', 'for', 'sys'])
0901:~/mypy$ python stack46667843.py -i info cmd1
('cmd1', Namespace(cmd='cmd1', func=<function cmd1 at 0xb74b025c>, info='info'))
0901:~/mypy$ python stack46667843.py -i info cmd2 -o out
('cmd2', Namespace(cmd='cmd2', func=<function cmd2 at 0xb719ebc4>, info='info', output='out'))
0901:~/mypy$
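To get the shell mode itself on top of this, a simple read-parse-dispatch loop over parser2 would do. A sketch (the prompt, the quit handling, and the SystemExit catch are my additions, not part of the answer above):

import shlex

def shell_loop():
    while True:
        try:
            line = raw_input('> ')    # Python 2.7 input primitive
        except EOFError:
            break
        if line.strip() == 'quit':
            break
        try:
            cmd_args = parser2.parse_args(shlex.split(line))
        except SystemExit:            # argparse calls sys.exit() on bad input
            continue
        cmd_args.func(cmd_args)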
A bug/issue (with links) on the topic of 'optional' subparsers.
https://bugs.python.org/issue29298
Notice that this has a recent pull request.
With your script and the addition of
args = parser.parse_args()
print(args)
results are
1008:~/mypy$ python3 stack46667843.py
Namespace(info=None)
1009:~/mypy$ python2 stack46667843.py
usage: stack46667843.py [-h] [-i INFO] {cmd1,cmd2} ...
stack46667843.py: error: too few arguments
1009:~/mypy$ python2 stack46667843.py cmd1
Namespace(func=<function cmd1 at 0xb748825c>, info=None)
1011:~/mypy$ python3 stack46667843.py cmd1
Namespace(func=<function cmd1 at 0xb7134dac>, info=None)
I thought the 'optional subparsers' behavior affected both the Py2 and Py3 versions, but apparently it doesn't. I'll have to look at the code to verify why.
In both versions, subparsers.required is False. If I set it to True
subparsers.required=True
(and add a dest to the subparsers definition), the PY3 error message is
1031:~/mypy$ python3 stack46667843.py
usage: stack46667843.py [-h] [-i INFO] {cmd1,cmd2} ...
stack46667843.py: error: the following arguments are required: cmd
So there's a difference in how the two versions test for required arguments. Py3 pays attention to the required attribute; Py2 (apparently) uses the older method of checking whether the positionals list is empty or not.
Checking for required arguments occurs near the end of parser._parse_known_args.
Python 2.7 includes

# if we didn't use all the Positional objects, there were too few
# arg strings supplied.
if positionals:
    self.error(_('too few arguments'))

before the iteration that checks action.required. That's what's catching the missing cmd and saying "too few arguments".
So a kludge is to edit your argparse.py and remove that block so it matches the corresponding section of the Py3 version.
All my scripts use Unicode literals throughout, with
from __future__ import unicode_literals
but this creates a problem when functions may be called with bytestrings, and I'm wondering what the best approach is for handling this and producing clear, helpful errors.
I gather that one common approach, which I've adopted, is to simply make this clear when it occurs, with something like
def my_func(somearg):
    """The 'somearg' argument must be Unicode."""
    if not isinstance(somearg, unicode):
        raise TypeError("Parameter 'somearg' should be a Unicode string")
    # ...
for all arguments that need to be Unicode (and might be bytestrings). However, even if I do this, I run into problems in my argparse command-line script when supplied parameters correspond to such arguments, and I wonder what the best approach is here. It seems that I can simply determine the encoding of such arguments and decode them with it, for example:
if __name__ == '__main__':
    parser = argparse.ArgumentParser(...)
    parser.add_argument('somearg', ...)
    # ...
    args = parser.parse_args()
    some_arg = args.somearg
    if not isinstance(some_arg, unicode):
        some_arg = some_arg.decode(sys.getfilesystemencoding())
    # ...
    my_func(some_arg, ...)
Is this combination of approaches a common design pattern for Unicode modules that may receive bytestring inputs? Specifically,
can I reliably decode command line arguments in this way, and
will sys.getfilesystemencoding() give me the correct encoding for command line arguments; or
does argparse provide some builtin facility for accomplishing this that I've missed?
I don't think getfilesystemencoding will necessarily get the right encoding for the shell; it depends on the shell (and can be customised by the shell, independently of the filesystem). The filesystem encoding is only concerned with how non-ASCII filenames are stored.
Instead, you should probably be looking at sys.stdin.encoding, which will give you the encoding for standard input.
Additionally, you might consider using the type keyword argument when you add an argument:
import sys
import argparse as ap

def foo(str_, encoding=sys.stdin.encoding):
    return str_.decode(encoding)

parser = ap.ArgumentParser()
parser.add_argument('my_int', type=int)
parser.add_argument('my_arg', type=foo)
args = parser.parse_args()
print repr(args)
Demo:
$ python spam.py abc hello
usage: spam.py [-h] my_int my_arg
spam.py: error: argument my_int: invalid int value: 'abc'
$ python spam.py 123 hello
Namespace(my_arg=u'hello', my_int=123)
$ python spam.py 123 ollǝɥ
Namespace(my_arg=u'oll\u01dd\u0265', my_int=123)
If you have to work with non-ASCII data a lot, I would highly recommend upgrading to Python 3. Everything is a lot easier there; for example, parsed arguments will already be unicode in Python 3.
Since there is conflicting information about the command-line argument encoding around, I decided to test it by changing my shell encoding to latin-1 whilst leaving the filesystem encoding as utf-8. For my tests I use the c-cedilla character, which has a different encoding in these two:
>>> u'Ç'.encode('ISO8859-1')
'\xc7'
>>> u'Ç'.encode('utf-8')
'\xc3\x87'
Now I create an example script:
#!/usr/bin/python2.7
import argparse as ap
import sys

print 'sys.stdin.encoding is        ', sys.stdin.encoding
print 'sys.getfilesystemencoding() is', sys.getfilesystemencoding()

def encoded(s):
    print 'encoded', repr(s)
    return s

def decoded_filesystemencoding(s):
    try:
        s = s.decode(sys.getfilesystemencoding())
    except UnicodeDecodeError:
        s = 'failed!'
    return s

def decoded_stdinputencoding(s):
    try:
        s = s.decode(sys.stdin.encoding)
    except UnicodeDecodeError:
        s = 'failed!'
    return s

parser = ap.ArgumentParser()
parser.add_argument('first', type=encoded)
parser.add_argument('second', type=decoded_filesystemencoding)
parser.add_argument('third', type=decoded_stdinputencoding)
args = parser.parse_args()
print repr(args)
Then I change my shell encoding to ISO/IEC 8859-1 and call the script:
wim-macbook:tmp wim$ ./spam.py Ç Ç Ç
sys.stdin.encoding is ISO8859-1
sys.getfilesystemencoding() is utf-8
encoded '\xc7'
Namespace(first='\xc7', second='failed!', third=u'\xc7')
As you can see, the command line arguments were encoded in latin-1, so the second command line argument (using sys.getfilesystemencoding) fails to decode, while the third (using sys.stdin.encoding) decodes correctly.
sys.getfilesystemencoding() is the correct (but see the examples below) encoding for OS data such as filenames, environment variables, and command-line arguments.
You can see the logic behind the choice: sys.argv[0] may be the path to the script (the filename), so it is natural to assume that it uses the same encoding as other filenames, and that the other items in the argv list use the same character encoding as sys.argv[0]. os.environ['PATH'] contains paths, so it is also natural that environment variables use the same encoding:
$ echo 'import sys; print(sys.argv)' >print_argv.py
$ python print_argv.py
['print_argv.py']
Note: sys.argv[0] is the script filename, whatever other command-line arguments you might have.
"best way" depends on your specific use-case e.g., on Windows, you should probably use Unicode API directly (CommandLineToArgvW()). On POSIX, if all you need is to pass some argv items to OS functions back (such as os.listdir()) then you could leave them as bytes -- command-line argument can be arbitrary byte sequence, see PEP 0383 -- Non-decodable Bytes in System Character Interfaces:
import os, sys

os.execl(sys.executable, sys.executable, '-c', 'import sys; print(sys.argv)',
         bytes(bytearray(range(1, 0x100))))
As you can see, POSIX allows passing any bytes (except zero).
Obviously, you can also misconfigure your environment:
$ LANG=C PYTHONIOENCODING=latin-1 python -c'import sys;
> print(sys.argv, sys.stdin.encoding, sys.getfilesystemencoding())' €
(['-c', '\xe2\x82\xac'], 'latin-1', 'ANSI_X3.4-1968') # Linux output
The output shows that € is encoded using utf-8 even though both the locale and PYTHONIOENCODING are configured differently.
The examples demonstrate that sys.argv may be encoded using a character encoding that does not correspond to any of the standard encodings, or may even contain arbitrary binary data (anything except the zero byte) on POSIX (no character encoding at all). On Windows, I guess, you could paste a Unicode string that can't be encoded using the ANSI or OEM Windows encodings, but you might get the correct value using the Unicode API anyway (Python 2 probably drops data here).
Python 3 uses Unicode sys.argv, so it shouldn't lose data on Windows (the Unicode API is used), and it lets us demonstrate that sys.getfilesystemencoding(), not sys.stdin.encoding, is used to decode sys.argv on Linux (where sys.getfilesystemencoding() is derived from the locale):
$ LANG=C.UTF-8 PYTHONIOENCODING=latin-1 python3 -c'import sys; print(*map(ascii, sys.argv))' µ
'-c' '\xb5'
$ LANG=C PYTHONIOENCODING=latin-1 python3 -c'import sys; print(*map(ascii, sys.argv))' µ
'-c' '\udcc2\udcb5'
$ LANG=en_US.ISO-8859-15 PYTHONIOENCODING=latin-1 python3 -c'import sys; print(*map(ascii, sys.argv))' µ
'-c' '\xc2\xb5'
The output shows that LANG (which sets the locale here, and hence sys.getfilesystemencoding() on Linux) is what is used to decode the command-line arguments:
$ python3
>>> print(ascii(b'\xc2\xb5'.decode('utf-8')))
'\xb5'
>>> print(ascii(b'\xc2\xb5'.decode('ascii', 'surrogateescape')))
'\udcc2\udcb5'
>>> print(ascii(b'\xc2\xb5'.decode('iso-8859-15')))
'\xc2\xb5'
I want to create a Haskell interpreter that I can use from C++ on Linux.
I have a file FFIInterpreter.hs which implements the interpreter in Haskell and exports the functions to C++ via the FFI.
module FFIInterpreter where

import Language.Haskell.Interpreter
import Data.IORef
import Foreign.StablePtr
import Foreign.C.Types
import Foreign.C.String
import Control.Monad
import Foreign.Marshal.Alloc

type Session = Interpreter ()
type Context = StablePtr (IORef Session)

foreign export ccall createContext :: CString -> IO Context
createContext :: CString -> IO Context
createContext name = join (liftM doCreateContext (peekCString name))
  where
    doCreateContext :: ModuleName -> IO Context
    doCreateContext name = do
      let session = newModule name
      _ <- runInterpreter session
      liftIO $ newStablePtr =<< newIORef session

    newModule :: ModuleName -> Session
    newModule name = loadModules [name] >> setTopLevelModules [name]

foreign export ccall freeContext :: Context -> IO ()
freeContext :: Context -> IO ()
freeContext = freeStablePtr

foreign export ccall runExpr :: Context -> CString -> IO CString
runExpr :: Context -> CString -> IO CString
runExpr env input =
    join (liftM newCString (join ((liftM liftM) doRunExpr env (peekCString input))))
  where
    doRunExpr :: Context -> String -> IO String
    doRunExpr env input = do
      env_value <- deRefStablePtr env
      tcs_value <- readIORef env_value
      result <- runInterpreter (tcs_value >> eval input)
      return $ either show id result

foreign export ccall freeString :: CString -> IO ()
freeString :: CString -> IO ()
freeString = Foreign.Marshal.Alloc.free
When I compile the whole project with ghc, everything works fine. I use the following command:
ghc -no-hs-main FFIInterpreter.hs main.cpp -lstdc++
But the haskell module is only a small piece of the C++ project and I don't want the whole project to depend on ghc.
So I want to build a dynamic library with ghc and then link it to the project using g++.
$ ghc -shared -fPIC FFIInterpreter.hs module_init.c -lstdc++
[1 of 1] Compiling FFIInterpreter ( FFIInterpreter.hs, FFIInterpreter.o )
Linking a.out ...
/usr/bin/ld: /usr/lib/haskell-packages/ghc/lib/hint-0.3.3.2/ghc-7.0.3/libHShint-0.3.3.2.a(Interpreter.o): relocation R_X86_64_32S against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/lib/haskell-packages/ghc/lib/hint-0.3.3.2/ghc-7.0.3/libHShint-0.3.3.2.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
So I added the -dynamic flag, but that also doesn't work:
$ ghc -dynamic -shared -fPIC FFIInterpreter.hs librarymain.cpp -lstdc++
FFIInterpreter.hs:3:8:
Could not find module `Language.Haskell.Interpreter':
Perhaps you haven't installed the "dyn" libraries for package `hint-0.3.3.2'?
Use -v to see a list of the files searched for.
I searched my system for Interpreter.dyn_hi but didn't find it. Is there a way to get it?
I also tried to install hint manually, but that didn't produce the Interpreter.dyn_hi file either.
You have to install the library (and all it depends on) with the --enable-shared flag (using cabal-install) to get the .dyn_hi and .dyn_o files. You may consider setting that option in your ~/.cabal/config file.
Perhaps the easiest way is to uncomment the shared: XXX line in ~/.cabal/config, set the option to True, and run
cabal install --reinstall world
For safety, run that with the --dry-run option first to detect problems early. If the --dry-run output looks reasonable, go ahead and reinstall - it will take a while, though.
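For illustration, the relevant part of ~/.cabal/config would then look roughly like this (the exact wording of the commented-out line varies between cabal-install versions):

-- in ~/.cabal/config
shared: True

followed by:

$ cabal install --dry-run --reinstall world
$ cabal install --reinstall world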
I'm not a Perl user, but from this question I deduced that it's exceedingly easy to retrieve the standard output of a program executed through a Perl script using something akin to:
$version = `java -version`;
How would I go about getting the same end result in Python? Does the above line retrieve standard error (equivalent to C++ std::cerr) and standard log (std::clog) output as well? If not, how can I retrieve those output streams as well?
For Python 2.5: sadly, no. You need to use subprocess:
import subprocess
proc = subprocess.Popen(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
Docs are at http://docs.python.org/library/subprocess.html
In Python 2.7+
from subprocess import check_output as qx
output = qx(['java', '-version'])
The answer to the Capturing system command output as a string question has an implementation for Python < 2.7.
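Note that java -version actually writes its output to standard error, so to capture it with check_output you would merge the streams (check_output also raises CalledProcessError if the command exits non-zero):

from subprocess import check_output, STDOUT

output = check_output(['java', '-version'], stderr=STDOUT)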
As others have mentioned you want to use the Python subprocess module for this.
If you really want something that's more succinct you can create a function like:
#!/usr/bin/env python
import subprocess, shlex
def captcmd(cmd):
    proc = subprocess.Popen(shlex.split(cmd),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            shell=False)
    out, err = proc.communicate()
    ret = proc.returncode
    return (ret, out, err)
... then you can call that as:
import sys

ok, o, e = captcmd('ls -al /foo /bar ...')
print o
if ok:  # a non-zero return code means there was an error
    print >> sys.stderr, "There was an error (%d):\n" % ok
    print >> sys.stderr, e
... or whatever.
Note: I'm using shlex.split() as a vastly safer alternative to shell=True.
Naturally, you could write this to suit your own tastes. For every call you have to either provide three names into which to unpack the result tuple, or pull the desired output from the result using normal indexing (captcmd(...)[1] for the output, for example). You could also write a variation of this function that combines stdout and stderr and discards the result code; those "features" would make it more like the Perl backtick expressions. (Do that and take out the shlex.split() call, and you have something that's as crude and unsafe as what Perl does, in fact.)
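For instance, such a backtick-like variation might look like the following sketch (the function name is mine; it keeps shlex.split(), unlike the truly Perl-equivalent version):

import shlex
import subprocess

def backtick(cmd):
    """Return combined stdout+stderr as one string, Perl-backtick
    style, discarding the exit code."""
    proc = subprocess.Popen(shlex.split(cmd),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return out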