Based on one of the solutions here, I'm using the following code to strip out the EXIF data from an image:
def remove_exif_from_image(image_path):
img = Image.open(image_path)
data = list(img.getdata())
clean_img = Image.new(img.mode, img.size)
clean_img.putdata(data)
clean_img.save(image_path)
I found this function works just fine on my local machine, however, when I try to run this on my tiny DigitalOcean VPS it causes my gunicorn process to crash.
I'm guessing this is due to img.getdata() returning something huge.
How might I strip out the EXIF by reading/writing in chunks as opposed to reading the entire image into memory?
Since the primary constraint seems to be "something that runs on my tiny VPS", consider installing and using exiftool for this task, and make a system call to it:
exiftool -all= -overwrite_original tmp.jpg
This doesn't exactly answer your question about using python streams, but may solve your problem.
Related
I am working with a command line tool called 'ideviceinfo' (see https://github.com/libimobiledevice) to help me to quickly get back serial, IMEI and battery health information from the iOS device I work with daily. It executes much quicker than Apple's own 'cfgutil' tools.
Up to know I have been able to develop a more complicated script than the one shown below in PyCharm (my main IDE) to assign specific values etc to individual variables and then to use something like to pyclip and pyautogui to help automatically paste these into the fields of the database app we work with. I have also been able to use the simplified version of the script both in Mac OS X terminal and in the python shell without any hiccups.
I am looking to use AppleScript to help make running the script as easy as possible.
When I try to use Applescript's "do shell script 'python script.py'" I just get back a string of lenght zero when I call 'ideviceinfo'. The exact same thing happens when I try to build an Automator app with a 'Run Shell Script' component for "python script.py".
I have tried my best to isolate the problem down. When other more basic commands such as 'date' are called within the script they return valid strings.
#!/usr/bin/python
import os
ideviceinfoOutput = os.popen('ideviceinfo').read()
print ideviceinfoOutput
print len (ideviceinfoOutput)
boringExample = os.popen('date').read()
print boringExample
print len (boringExample)
I am running Mac OS X 10.11 and am on Python 2.7
Thanks.
I think I've managed to fix it on my own. I just need to be far more explicit about where the 'ideviceinfo' binary (I hope that's the correct term) was stored on the computer.
Changed one line of code to
ideviceinfoOutput = os.popen('/usr/local/bin/ideviceinfo').read()
and all seems to be OK again.
I'm trying to send some files (a zip and a Word doc) to a directory on a server using ftplib. I have the broad strokes sorted out:
session = ftplib.FTP(ftp.server, 'user','pass')
filewpt = open(file, mode)
readfile = open(file, mode)
session.cwd(new/work/directory)
session.storbinary('STOR filename.zip', filewpt)
session.storbinary('STOR readme.doc', readfile)
print "filename.zip and readme.doc were sent to the folder on ftp"
readfile.close()
filewpt.close()
session.quit()
This may provide someone else what they are after but not me. I have been using FileZilla as a check to make sure the files were transferred. When I see they have made it to the server, I see that they are both way smaller or even zero K for the readme.doc file. Now I'm guessing this has something to do with the fact that I stored the file in 'binary transfer mode' <--- whatever that means.
This is where my problems lie. I have no idea at all (yet) what is meant by binary transfer mode. Is it simply that I have to use retrbinary to return the files to their original state?
Could someone please explain to me like I'm a two year old what has happened to my files? If there's any more info required, please let me know.
This is a fantastic resource. Solved most of my problems. Still trying to work out the intricacies of FTPs, but I guess I will save that for another day. The link below builds a function to effortlessly upload files to an FTP without the partial upload problem that I've seen experienced by more than one Stack Exchanger.
http://effbot.org/librarybook/ftplib.htm
I'm working on a software which uses FFMPEG C++ libs to make an acquisition from an UDP streaming.
FFMPEG (1.2) is implemented and running but I get some errors (acquisition crashes and restarts). The log displays the following message:
*Circular buffer overrun. To avoid, increase fifo_size URL option. To survive in such case, use overrun_nonfatal option*
I searched online for documentation about how to use this option, but I only got informations about how to use when running directly ffmpeg executable.
Would someone know how to set the correct option in my C++ code to:
- increase fifo_size
- use overrun_nonfatal option
Thanks
The same option works from command line or C++ libraries, you need to modify your UDP URL as follows:
If you original URL looks like this:
udp://#239.1.1.7:5107
Add the fifo_size and overrun parameters like this:
"udp://#239.1.1.7:5107?overrun_nonfatal=1&fifo_size=50000000"
Remember to escape the URL with quotes.
overrun_nonfatal=1 prevents ffmpeg from exiting, it can recover in most circumstances.
fifo_size=50000000 uses a 50MB udp input buffer (default 5MB)
The only documentation is in the source code:
http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/udp.c;h=5b5c7cb7dfc1aed3f71ea0c3e980be54757d3c62;hb=dd0a9b78db0eeea72183bd3f5bc5fe51a5d3f537
I don't have enough reputation to comment the other answer, but if I did I would say that studying the source linked in the answer:
fifo_size is measured as multiples of 188 Byte (packets) according to the line:
s->circular_buffer_size = strtol(buf, NULL, 10)*188;
so whilst Grant is roughly correct that "default 5MB", because of the line:
s->circular_buffer_size = 7*188*4096;
If you want a circular buffer of 50MB you should really set the fifo_size parameter to something closer to 50*1024*1024/188 otherwise 50000000 will give 50000000*188 bytes which is closer to 8965MB!
Background
I have a C++ extension which runs a 3D watershed pass on a buffer. It's got a nice Cython wrapper to initialise a massive buffer of signed chars to represent the voxels. I initialise some native data structures in python (in a compiled cython file) and then call one C++ function to initialise the buffer, and another to actually run the algorithm (I could have written these in Cython too, but I'd like it to work as a C++ library as well without a python.h dependancy.)
Weirdness
I'm in the process of debugging my code, trying different image sizes to gauge RAM usage and speed, etc, and I've noticed something very strange about the results - they change depending on whether I use python test.py (specifically /usr/bin/python on Mac OS X 10.7.5/Lion, which is python 2.7) or python and running import test, and calling a function on it (and indeed, on my laptop (OS X 10.6.latest, with macports python 2.7) the results are also deterministically different - each platform/situation is different, but each one is always the same as itself.). In all cases, the same function is called, loads some input data from a file, and runs the C++ module.
A note on 64-bit python - I am not using distutils to compile this code, but something akin to my answer here (i.e. with an explicit -arch x86_64 call). This shouldn't mean anything, and all my processes in Activity Monitor are called Intel (64-bit).
As you may know, the point of watershed is to find objects in the pixel soup - in 2D it's often used on photos. Here, I'm using it to find lumps in 3D in much the same way - I start with some lumps ("grains") in the image and I want to find the inverse lumps ("cells") in the space between them.
The way the results change is that I literally find a different number of lumps. For exactly the same input data:
python test.py:
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 1242 cells from 1434 original grains!
...
however,
python, import test, test.run():
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 927 cells from 1434 original grains!
...
This is the same in the interactive python shell and bpython, which I originally thought was to blame.
Note the "average value" number is exactly the same - this indicates that the same fraction of voxels have initially been marked as in the problem space - i.e. that my input file was initialised in (very very probably) exactly the same way both times in voxel-space.
Also note that no part of the algorithm is non-deterministic; there are no random numbers or approximations; subject to floating point error (which should be the same each time) we should be performing exactly the same computations on exactly the same numbers both times. Watershed runs using a big buffer of integers (here signed chars) and the results are counting clusters of those integers, all of which is implemented in one big C++ call.
I have tested the __file__ attribute of the relevant module objects (which are themselves attributes of the imported test), and they're pointing to the same installed watershed.so in my system's site-packages.
Questions
I don't even know where to begin debugging this - how is it possible to call the same function with the same input data and get different results? - what about interactive python might cause this (e.g. by changing the way the data is initialised)? - Which parts of the (rather large) codebase are relevant to these questions?
In my experience it's much more useful to post ALL the code in a stackoverflow question, and not assume you know where the problem is. However, that is thousands of lines of code here, and I have literally no idea where to start! I'm happy to post small snippets on request.
I'm also happy to hear debugging strategies - interpreter state that I can check, details about the way python might affect an imported C++ binary, and so on.
Here's the structure of the code:
project/
clibs/
custom_types/
adjacency.cpp (and hpp) << graph adjacency (2nd pass; irrelevant = irr)
*array.c (and h) << dynamic array of void*s
*bit_vector.c (and h) << int* as bitfield with helper functions
polyhedron.cpp (and hpp) << for voxel initialisation; convex hull result
smallest_ints.cpp (and hpp) << for voxel entity affiliation tracking (irr)
custom_types.cpp (and hpp) << wraps all files in custom_types/
delaunay.cpp (and hpp) << marshals calls to stripack.f90
*stripack.f90 (and h) << for computing the convex hulls of grains
tensors/
*D3Vector.cpp (and hpp) << 3D double vector impl with operators
watershed.cpp (and hpp) << main algorithm entry points (ini, +two passes)
pywat/
__init__.py
watershed.pyx << cython class, python entry points.
geometric_graph.py << python code for post processing (irr)
setup.py << compile and install directives
test/
test.py << entry point for testing installed lib
(files marked * have been used extensively in other projects and are very well tested, those suffixed irr contain code only run after the problem has been caused.)
Details
as requested, the main stanza in test/test.py:
testfile = 'valid_filename'
if __name__ == "__main__":
# handles segfaults...
import faulthandler
faulthandler.enable()
run(testfile)
and my interactive invocation looks like:
import test
test.run(test.testfile)
Clues
when I run this at the straight interpreter:
import faulthandler
faulthandler.enable()
import test
test.run(test.testfile)
I get the results from the file invocation (i.e. 1242 cells), although when I run it in bpython, it just crashes.
This is clearly the source of the problem - hats off to Ignacio Vazquez-Abrams for asking the right question straight away.
UPDATE:
I've opened a bug on the faulthandler github and I'm working towards a solution. If I find something that people can learn from I'll post it as an answer.
After debugging this application extensively (printf()ing out all the data at multiple points during the run, piping outputs to log files, diffing the log files) I found what seemed to cause the strange behaviour.
I was using uninitialised memory in a couple of places, and (for some bizarre reason) this gave me repeatable behaviour differences between the two cases I describe above - one without faulthandler and one with.
Incidentally, this is also why the bug disappeared from one machine but continued to manifest itself on another, part way through debugging (which really should have given me a clue!)
My mistake here was to assume things about the problem based on a spurious correlation - in theory the garbage ram should have been differently random each time I accessed it (ahh, theory.) In this case I would have been quicker finding the problem with a printout of the main calculation function and a rubber duck.
So, as usual, the answer is the bug is not in the library, it is somewhere in your code - in this case, it was my fault for malloc()ing a chunk of RAM, falsely assuming that other parts of my code were going to initialise it (which they only did sometimes.)
I want to preview TeX formulas in my User Interface. After a long time searching, it seems to me that there is no other possibility than
write the formula into a .tex file
call tex with system() and write a dvi file
call e.g. dvipng with system() and write a png file
load this file into the GUI
clean up(erase all these files).
I think that the performance of this way doing it is not a problem, since there are only formulas to render and not whole documents. But setting up the environment automatically for the TeX system seems to be a bigger problem.
So, is there a possibility to include TeX as an API in my program?
Thanks a lot!
Couldn't you encapsulate those steps in a single shell script (i.e. which takes the formula and png filename as arguments)? The script could then also handle setting up the environment for TeX. Your program just calls the script with the system() call.
There's a C API for TeX called MimeTeX but the resulting image is... not a nice as it could be.
If you're OK with Java, there's JLatexMath
And if you'd like a WPF version, one is under development at WPFMath
I'm not sure, but think MathType's Component will overkill.
Also have a look at sideshare and see flash video to get more information about sitmo, mathMagig, Edoboard and their API tools.
good lucks.
For Edoboard and Tutorsbox.com we do the following:
Keep a blacklist of LaTeX commands to avoid:
TEX_BLACKLIST = ["\\def", "\\let", "\\futurelet",
"\\newcommand", "\\renewcommand", "\\else", "\\fi", "\\write",
"\\input", "\\include", "\\chardef", "\\catcode", "\\makeatletter",
"\\noexpand", "\\toksdef", "\\every", "\\errhelp", "\\errorstopmode",
"\\scrollmode", "\\nonstopmode", "\\batchmode", "\\read", "\\csname",
"\\newhelp", "\\relax", "\\afterground", "\\afterassignment",
"\\expandafter", "\\noexpand", "\\special", "\\command", "\\loop",
"\\repeat", "\\toks", "\\output", "\\line", "\\mathcode", "\\name",
"\\item", "\\section", "\\mbox", "\\DeclareRobustCommand", "\\[", "\\]"];
We then do system call "latex and textopng".
That as an API REST plus some caching and here you go :)
As an upgrade we will soon convert those LaTeX images as SVG.
LyX is a TeX based Document Processor. As the application is open source you can inspect the C++ code to see how they deal with the problem you described.