GNU Parallel to run Python script on huge file

I have a file that contains one XML element per line, and each line needs to be converted to JSON. I have written a Python script that does the conversion, but it runs serially. My two options are Hadoop and GNU Parallel. I have already tried Hadoop and want to see how GNU Parallel could help; it should be simpler.
My Python code is as follows:
import sys
import json
import xmltodict
with open('/path/sample.xml') as fd:
    for line in fd:
        o = xmltodict.parse(line)
        t = json.dumps(o)
        with open('sample.json', 'a') as out:
            out.write(t + "\n")
So can I use GNU Parallel directly on the huge file, or do I need to split it first?
Or is this right:
cat sample.xml | parallel python xmltojson.py >sample.json
Thanks

You need to change your Python code to a UNIX filter, i.e. a program that reads from standard input (stdin) and writes to standard output (stdout). Untested:
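import fileinput
import json
import xmltodict
# fileinput.input() reads stdin when no file names are given on the command line
for line in fileinput.input():
    o = xmltodict.parse(line)
    t = json.dumps(o)
    # print already appends a newline, so no extra "\n" is added here
    print(t)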
Then you use --pipepart in GNU Parallel:
parallel --pipepart -a sample.xml --block -1 python my_script.py
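The filter idea above can be sketched in a self-contained, testable form. This is only a sketch under assumptions: convert_line and run_filter are hypothetical names, and the standard-library xml.etree.ElementTree stands in for xmltodict so that the example runs without third-party packages (its output shape is simpler than what xmltodict produces). It is written for Python 3.

```python
import io
import json
import xml.etree.ElementTree as ET


def convert_line(line):
    """Convert one single-element XML line to a JSON string (simplified)."""
    elem = ET.fromstring(line)
    # Assumption: a stand-in for xmltodict.parse that keeps only the tag,
    # the attributes, and the text of the top-level element.
    return json.dumps({elem.tag: {"attrs": dict(elem.attrib), "text": elem.text}})


def run_filter(instream, outstream):
    """Read XML lines from instream and write one JSON line per input line."""
    for line in instream:
        line = line.strip()
        if line:  # skip blank lines
            outstream.write(convert_line(line) + "\n")


# Demonstration with in-memory streams instead of real stdin/stdout.
demo_in = io.StringIO('<a id="1">x</a>\n\n<b>y</b>\n')
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

To use this as a real filter, the script would call run_filter(sys.stdin, sys.stdout), so that each job started by GNU Parallel converts only the chunk it receives.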

Related

Run system command with no output

I'm running a wget in Python via os.system.
Is there any way to hide the output? I tried appending
> /dev/null
and tried running the command with a $ in front of it.

Use the subprocess module instead.
With subprocess.call (a helper function that wraps some of the more advanced subprocess features), you can redirect stdout and stderr to file objects. If you open /dev/null (the os module provides os.devnull, a platform-independent path to the null device) and hand it to subprocess.call, all output is suppressed:
import os
import subprocess
# the with block closes the null device once the command has finished
with open(os.devnull, 'w') as devnull:
    subprocess.call([...], stdout=devnull, stderr=devnull)
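A runnable version of the same idea might look like the sketch below. The command is a hypothetical stand-in for the real one (a child Python process that writes to both stdout and stderr), chosen only so the example runs anywhere.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the real command (e.g. wget): a child Python
# process that writes one line to stdout and one line to stderr.
cmd = [
    sys.executable,
    "-c",
    "import sys; print('noise'); sys.stderr.write('more noise\\n')",
]

# Both streams of the child are sent to the null device, so nothing is shown.
with open(os.devnull, "w") as devnull:
    returncode = subprocess.call(cmd, stdout=devnull, stderr=devnull)

# The exit status is still available even though the output was discarded.
print(returncode)
```

On Python 3.3 and later, subprocess.DEVNULL can be passed directly as stdout and stderr, which avoids opening os.devnull by hand.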

How to run executable with input file using Python?

I am running an executable from cmd:
*.exe input.inp
I want to run it from Python and tried the following:
os.system('"*.exe"')
But I don't know how to specify the input file as well. Any suggestions?

import os
from subprocess import Popen, PIPE
p = Popen('fortranExecutable', stdin=PIPE) #NOTE: no shell=True here
p.communicate(os.linesep.join(["input 1", "input 2"]))
For more please refer to:
Using Python to run executable and fill in user input

I had to launch a cmd window and specify the input file location from within a Python script. This page was really helpful in getting it done.
I used Popen(['cmd', '/K', 'command']) from the page above and replaced '/K' with '/C' so that the cmd window runs the command and then closes.

import os
os.system(r'pathToExe.exe inputFileOrWhateverOtherCommand')
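If the program takes the input file as a command-line argument, as in the question, another option is to pass the path as an extra list item to subprocess. The sketch below assumes a hypothetical stand-in for the executable (a child Python process that echoes the file named by its first argument) and a temporary input file, so that it runs on any machine.

```python
import os
import subprocess
import sys
import tempfile

# Create a throwaway input file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".inp", delete=False) as tmp:
    tmp.write("input 1\ninput 2\n")
    input_path = tmp.name

# Hypothetical stand-in for "program.exe input.inp": a child Python process
# that opens the file named by its first argument and echoes its contents.
cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(open(sys.argv[1]).read())",
    input_path,
]

# Each list item is passed as one argument; no shell quoting is needed.
output = subprocess.check_output(cmd, universal_newlines=True)
os.remove(input_path)
print(output, end="")
```

With a real program, the list would simply be the path to the executable followed by the path to the input file.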

Spyder SymPy Won't Print Symbolic Math

I set up Anaconda 2.0.0 (Win 64).
It has SymPy 0.7.5.
I configured Spyder (2.3.0rc, which came with Anaconda) to use symbolic math:
Tools > Preferences > IPython console > Advanced Settings > Symbolic Mathematics
I created a new project and a new file:
# -*- coding: utf-8 -*-
from sympy import *
init_printing(use_unicode=False, wrap_line=False, no_global=True)
x = Symbol('x')
integrate(x, x)
print("Completed.")
When I run this (in the Python or IPython console), it does not print the integral; it only prints Completed.
What is weird is that if I then re-type the following in the same console that just did the run:
integrate(x, x)
It does print the integral.
So running from a file never prints any symbolic math, but typing in the console manually does?
Can anyone help with this issue? Maybe it is some sort of configuration problem?
Thank you!

Running a script is not the same as executing code in IPython. When you run the code in a cell or prompt in IPython, it captures the output of the last command and displays it to you. When you run a script, the script is just run, and the only thing that is displayed is what is printed to the screen.
I don't think there is a way to send the IPython display object (which would be needed to get pretty LaTeX output) from a script, but I may be misunderstanding how Spyder executes the code in IPython, or missing some hooks that it has. You can try
from IPython.display import display
display(integrate(x, x))
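The difference between a prompt and a script can be reproduced with plain Python, without IPython or SymPy. The sketch below is an illustration under assumptions: run_and_capture is a hypothetical helper, and the standard interpreter's compile modes stand in for what a console and a script runner do. It requires Python 3.

```python
import contextlib
import io


def run_and_capture(source, mode):
    """Execute source compiled in the given mode and return what it printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(compile(source, "<demo>", mode), {})
    return buf.getvalue()


# "single" is how an interactive prompt compiles a line: the value of a bare
# expression is handed to sys.displayhook, which prints it.
interactive_output = run_and_capture("1 + 1", "single")

# "exec" is how a script file is compiled: the value is silently discarded.
script_output = run_and_capture("1 + 1", "exec")

print(repr(interactive_output))
print(repr(script_output))
```

The same expression produces output only in the interactive mode, which is why a bare integrate(x, x) shows a result at the prompt but not when the file is run.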

This happens because integrate doesn't print anything; it just returns the result. You have to pass that result to the print function to see it. Try the following code:
# -*- coding: utf-8 -*-
from sympy import *
init_printing(use_unicode=False, wrap_line=False, no_global=True)
x = Symbol('x')
print(integrate(x, x))
print("Completed.")
In the Python console (or IPython console), the value of an expression is printed automatically.
Update: Use pprint for nicely formatted output.

How to run a .py file in Python cmd?

I'm a newbie to Python, so I just installed Python 2.7 on my Windows 8 machine and added C:\Python27 and C:\Python27\Scripts to the path.
Now I want to execute a small .py file, so at the shell (Python cmd) I type:
python "c:\python27\gtos.py"
File "<stdin>", line 1
python "c:\python27\gtos.py"
^
SyntaxError: invalid syntax
Any help is appreciated...
Thanks

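The SyntaxError appears because the command was typed at the Python prompt (>>>), which accepts only Python code; python "c:\python27\gtos.py" is a command for the Windows command prompt (cmd). To run the file from inside the interpreter, you have three options.
Treat it like a module:
import file
This is secure, fast, and maintainable, and it reuses code the way Python intends. Most Python libraries are built this way, with their functions spread over many files. Highly recommended. Note that the import statement must not include the .py extension.
Use the execfile function (Python 2 only):
execfile('file.py')
But this is likely to go wrong often and is rather hacky.
Spawn a separate Python process:
import subprocess
import sys
subprocess.check_call([sys.executable, 'file.py'])
Use when desperate.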
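Two of these options can be tried end to end with the sketch below. It makes some assumptions: the temporary script is a hypothetical stand-in for the real file, the code targets Python 3, and since execfile no longer exists there, the usual replacement of exec on the compiled file contents is shown instead.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the script to run (e.g. gtos.py).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("message = 'hello from script'\nprint(message)\n")
    script_path = tmp.name

# Option: run the file in a separate Python process and collect its output.
child_output = subprocess.check_output(
    [sys.executable, script_path], universal_newlines=True
)

# Option: Python 3 replacement for execfile, running the file in this process
# with its own namespace dictionary.
namespace = {"__name__": "__main__"}
with open(script_path) as source:
    exec(compile(source.read(), script_path, "exec"), namespace)

os.remove(script_path)
print(child_output.strip())
```

The separate process is isolated from the caller, while the in-process variant leaves the script's variables available in the namespace dictionary afterwards.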

Saving to particular directory from Python 2.7

I am running Python code that should write some output to a particular folder (which is different from the location where I execute the script).
Therefore I was planning to change the path of Python to this particular folder using the os module:
os.chdir("myLocation.../Folder")
However, the script still writes to the folder where I executed the script, and when I invoke the command
os.curdir
it returns ".".
I am a little bit lost here and would appreciate any hint.

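os.chdir should do the right thing. Note that os.curdir is only the constant string '.', so it always returns "."; call os.getcwd() to see the actual working directory. Here is some code tested in the Python REPL, assuming a ./test directory exists in the working directory.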
>>> import os
>>> os.chdir('test')
>>> f = open('testfile', 'w')
>>> print>>f, 'hello world'
>>> f.close()
test/testfile is now present with the right contents.
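The behaviour can also be checked with a small script. This is a sketch under assumptions: a temporary directory is a hypothetical stand-in for the real output folder, and the code is written for Python 3.

```python
import os
import shutil
import tempfile

original_dir = os.getcwd()
target_dir = tempfile.mkdtemp()  # hypothetical stand-in for the output folder

os.chdir(target_dir)
try:
    # A relative path is now resolved against target_dir.
    with open("testfile", "w") as f:
        f.write("hello world\n")
    curdir_constant = os.curdir  # always the string "."
    working_dir = os.getcwd()    # the real current working directory
finally:
    os.chdir(original_dir)       # restore the previous working directory

file_exists = os.path.exists(os.path.join(target_dir, "testfile"))
shutil.rmtree(target_dir)        # clean up the temporary folder

print(curdir_constant)
print(file_exists)
```

If changing the working directory is not required, building the full path with os.path.join and passing it to open achieves the same result without the side effect.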
