python 2.7 IOError: [Errno 24] Too many open files

At work I had the bad luck of having to fix a badly written URL validator script in Python, done by someone else. The code is really messy, and while trying to fix one of its bugs I found some behavior that I don't understand.
The script has to process a file with around ten thousand URLs in it, checking each URL to see if it's valid, not only in its structure but also whether it exists (using pycurl for this). In one part of the code, this is done:
for li in lineas:
    liNew = "http://" + li
    parsedUrl = urlparse.urlparse(liNew)
In this case the bug was the addition of "http://" at the beginning of the line, as that was already being done earlier in the script. So I changed the code to this:
for li in lineas:
    liNew = li
    parsedUrl = urlparse.urlparse(liNew)
Now, with the same input file the script fails with the error:
IOError: [Errno 24] Too many open files:/path/to/file/being/written/to.txt
With liNew = "http://" + li, the file descriptor count never goes over the default limit of 1024, but changing that line to liNew = li makes it go over 8000. Why?

before: broken URLs, so nothing gets downloaded (no files are opened)
after: correct URLs, so the downloads are saved to files (there are 10K URLs)
It probably doesn't make sense to download more than a few hundred URLs concurrently (bandwidth, disk). Make sure that all files (sockets, disk files) are properly disposed of after each download (i.e., that the close() method is called in time).
The default limit (1024) is low, but don't increase it unless you understand what the code does.
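For illustration, here is a minimal sketch of a download loop that disposes of its resources promptly; the output file naming is made up, and it assumes the plain pycurl API rather than whatever wrapper the script uses:

import pycurl

def download(url, path):
    c = pycurl.Curl()
    try:
        # The output file is closed by the with-block, the curl handle in
        # finally; neither descriptor outlives the download it belongs to.
        with open(path, "wb") as f:
            c.setopt(pycurl.URL, url)
            c.setopt(pycurl.WRITEDATA, f)
            c.perform()
    finally:
        c.close()

for i, li in enumerate(lineas):
    download(li.strip(), "out_%d.txt" % i)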

Related

Qt - pdftocairo pdf conversion process not working when application on auto start

I am running my Qt (4.8, QWS server, QWidget app) application on an ARM/embedded Linux platform. In my application, I have a module/widget to view PDF files.
Since this is a slower processor, it was better to convert the PDF file to image files using pdftocairo. The module also has a feature to import any PDF file from a flash drive and convert it to images using pdftocairo. The entire module works as expected when I start the application manually from the command line. Here is the code that imports the PDF file into the device in the form of images:
QString CacheName = PDFList->currentItem()->text(); //name of PDF file without ".pdf"
QString PDFString = "pdftocairo -jpeg -r 200 \"/media/usb/" + CacheName + ".pdf\" \"/opt/.pdf/" + CacheName + "\"";
qDebug() << PDFString;
QProcess PDFCacheprocess;
PDFCacheprocess.startDetached(PDFString); //or PDFCacheprocess.start(PDFString)
The ultimate goal of the project is to have the application auto-start when the device boots up. However, when the application is started automatically, the import feature doesn't seem to do anything. I am stumped at not being able to identify the problem, because I don't get any debug output (which I do have when executing the app normally).
I normally execute the application manually with
/opt/[path]/[application name] -qws
When auto-starting, I redirect the application's output into a file, log.txt, by adding &>/opt/log.txt. The output seems to be the same as when I am running with the manual command. This is the content of the file during the import process (no error being reported):
"pdftocairo -jpeg -r 200 "/media/usb/manual.pdf" "/opt/.pdf/manual"
Strangely enough, every other command (other than pdftocairo) works. I tried replacing this command with QString PDFString = "/opt/./importPDF.sh". The script was executed for any other command (like reboot), but again it would fail if it contained the pdftocairo command.
I also tried to add a slot connected to QProcess::finished(int) to show the QProcess output:
connect(&PDFCacheprocess, SIGNAL(finished(int)), this, SLOT(pdfImportStatus(int)));
void UserManual::pdfImportStatus(int)
{
    qDebug() << PDFCacheprocess.errorString() << '\t' << PDFCacheprocess.exitCode();
}
For the manual execution (when the import works), I would get:
"pdftocairo -jpeg -r 200 "/media/usb/manual.pdf" "/opt/.pdf/manual""
"Unknown error" 0
For the auto-start, log.txt only shows this (it seems like the slot isn't being triggered?):
"pdftocairo -jpeg -r 200 "/media/usb/manual.pdf" "/opt/.pdf/manual""
Any help is appreciated. Thanks in advance :)
Apparently the problem was that the command was not being found (only when auto-starting, for some reason). When using a QProcess, it turns out it is always good to give the full path, even if the file/command is on the environment's $PATH, as it was in my case.
I had to replace the QString with:
QString PDFString = "/usr/local/bin/pdftocairo -jpeg -r 200 \"/media/usb/" + CacheName + ".pdf\" \"/opt/.pdf/" + CacheName + "\"";

Script failing to open and append multiple files simultaneously

So, trying to finish a very simple script that has given me an unbelievably hard time. It's supposed to iterate through specified directories, open all the text files in them, and append the same specified string to each.
The issue is that it's not doing anything to the files at all. Using print to test my logic, I replaced lines 10 and 11 (the write and close calls) with print f, and got the following output:
<open file '/Users/russellculver/documents/testfolder/.DS_Store', mode 'a+' at
So I think it is storing the correct files in the f variable for the write function; however, I am not familiar with how Macs handle .DS_Store or the exact role it plays in temporary location tracking.
Here is the actual script:
import os

x = raw_input("Enter the directory path here: ")

def rootdir(x):
    for dirpaths, dirnames, files in os.walk(x):
        for filename in files:
            try:
                with open(os.path.join(dirpaths, filename), 'a+') as f:
                    f.write('new string content')
                    f.close()
            except:
                print "Directory empty or unable to open file."
            return x

rootdir(x)
And the exact output in Terminal after execution:
Enter the directory path here: /Users/russellculver/documents/testfolder
Exit status: 0
logout
[Process completed]
Yet nothing written to the .txt files in the provided directory.
The way the indentation is in the question, you return from the function right after writing the first file; neither of the for-loops ever finishes. This is relatively easy to surmise from the fact that you only get one file printed.
Since you're not doing anything with the result of the rootdir function, I would just remove the return statement entirely.
An aside: there is no need to call f.close() when you open a file with the with statement: it will be closed automatically (even upon an exception). That is in fact what the with statement was introduced for (see the PEP on context managers if necessary).
To be complete, here's the function the way I would have (roughly) written it:
def rootdir(x):
    for dirpaths, dirnames, files in os.walk(x):
        for filename in files:
            path = os.path.join(dirpaths, filename)
            try:
                with open(path, 'a+') as f:
                    f.write('new string content')
            except (IOError, OSError) as exc:
                print "Directory empty or unable to open file:", path
(Note that I'm catching only the relevant I/O errors; any other exception, though unlikely, will not be caught, as it is probably unrelated to a non-existing or unwritable file.)
The return was indented wrong, ending the iteration after a single loop. It wasn't even necessary, so it was removed entirely.

Why are my files smaller after I FTP them using this Python program?

I'm trying to send some files (a zip and a Word doc) to a directory on a server using ftplib. I have the broad strokes sorted out:
import ftplib

session = ftplib.FTP('ftp.server', 'user', 'pass')
filewpt = open(file, mode)
readfile = open(file, mode)
session.cwd('new/work/directory')
session.storbinary('STOR filename.zip', filewpt)
session.storbinary('STOR readme.doc', readfile)
print "filename.zip and readme.doc were sent to the folder on ftp"
readfile.close()
filewpt.close()
session.quit()
This may provide someone else what they are after, but not me. I have been using FileZilla as a check to make sure the files were transferred. When I see they have made it to the server, I see that they are both way smaller, or even zero K for the readme.doc file. Now I'm guessing this has something to do with the fact that I stored the files in 'binary transfer mode' <--- whatever that means.
This is where my problems lie. I have no idea at all (yet) what is meant by binary transfer mode. Is it simply that I have to use retrbinary to return the files to their original state?
Could someone please explain to me like I'm a two year old what has happened to my files? If there's any more info required, please let me know.
This is a fantastic resource and solved most of my problems. I'm still trying to work out the intricacies of FTP, but I guess I will save that for another day. The link below shows how to build a function that effortlessly uploads files to an FTP server without the partial-upload problem that more than one Stack Exchanger has run into:
http://effbot.org/librarybook/ftplib.htm
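For completeness, the short version: storbinary sends exactly the bytes Python reads from the file object, so the local file has to be opened in binary mode ('rb'); in text mode, newline translation can shrink or corrupt binary files on some platforms. A minimal sketch, with server, credentials, directory, and filenames as placeholders:

import ftplib

session = ftplib.FTP('ftp.example.com', 'user', 'pass')
session.cwd('new/work/directory')
for name in ('filename.zip', 'readme.doc'):
    f = open(name, 'rb')  # binary mode: the bytes are sent unchanged
    session.storbinary('STOR ' + name, f)
    f.close()
session.quit()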

Script to list all dwg files and convert them to geodatabase

This script is an improvement on the previously posted one, but it is still giving me an error of "Failed to execute (CADToGeodatabase)". It is able to iterate through the directories and subdirectories, list the dwg files, and create the geodatabase, but it is not able to populate it with the feature datasets and feature classes due to the error. Please help!
import os, os.path, arcpy
from arcpy import env

env.workspace = "J:/2010"

# Set workspace and variables
gdb = r"C:\data\2010.gdb"
arcpy.env.workspace = gdb

# Create a FileGDB for the fds
arcpy.CreateFileGDB_management("C:/data", "2010.gdb")

reference_scale = "1500"

for root, dirs, files in os.walk("J:/2010/"):
    for file in files:
        if file.endswith('.dwg'):
            print "current file is: " + file
            outDS = arcpy.ValidateTableName(os.path.splitext("d" + os.path.basename(file))[0])
            arcpy.CADToGeodatabase_conversion(file, gdb, outDS, reference_scale)
The line saying def recursive_file_gen(r"J:\2010"): looks strange to me. I don't think you can put a literal string there, and I am surprised that this runs at all. Maybe you meant something like def recursive_file_gen(directory=r"J:\2010"): or simply def recursive_file_gen():.
Also, I think the line saying yield os.path.join(root, file) needs to be indented more to be inside the inner for loop.
I don't see specifically what is causing the script to work in only one subdirectory; I would need more details about what is happening.
EDIT: I didn't notice that the recursive_file_gen function is not being used at all. I don't know what is causing the problem; I think someone more familiar with arcpy would be more helpful to you.
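That said, one thing worth checking: os.walk yields bare file names, and the script points arcpy's workspace at the new geodatabase before the loop, so CADToGeodatabase may be handed a .dwg name it cannot resolve. Here is a rough sketch that creates the geodatabase first and passes full paths instead (untested, so treat it as a starting point; the paths are taken from the question):

import os
import arcpy

gdb = r"C:\data\2010.gdb"
if not arcpy.Exists(gdb):
    arcpy.CreateFileGDB_management(r"C:\data", "2010.gdb")

reference_scale = "1500"
for root, dirs, files in os.walk(r"J:\2010"):
    for name in files:
        if name.endswith(".dwg"):
            dwg = os.path.join(root, name)  # full path, not a bare file name
            out_ds = arcpy.ValidateTableName("d" + os.path.splitext(name)[0], gdb)
            arcpy.CADToGeodatabase_conversion(dwg, gdb, out_ds, reference_scale)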

Libtorrent - Given a magnet link, how do you generate a torrent file?

I have read through the manual and I cannot find the answer. Given a magnet link, I would like to generate a torrent file so that it can be loaded on the next startup, to avoid redownloading the metadata. I have tried the fast resume feature, but I still have to fetch metadata when I do it, and that can take quite a bit of time. Examples that I have seen are for creating torrent files for a new torrent, whereas I would like to create one matching a magnet URI.
Solution found here:
http://code.google.com/p/libtorrent/issues/detail?id=165#c5
See creating torrent:
http://www.rasterbar.com/products/libtorrent/make_torrent.html
Modify first lines:
file_storage fs;
// recursively adds files in directories
add_files(fs, "./my_torrent");
create_torrent t(fs);
To this:
torrent_info ti = handle.get_torrent_info();
create_torrent t(ti);
"handle" is from here:
torrent_handle add_magnet_uri(session& ses, std::string const& uri, add_torrent_params p);
Also, before creating the torrent you have to make sure that the metadata has been downloaded; do this by calling handle.has_metadata().
UPDATE
It seems the libtorrent Python API is missing some of the important C++ API required to create torrents from magnets: the example above won't work in Python because the create_torrent Python class does not accept torrent_info as a parameter (C++ has it available).
So I tried it another way, but hit a brick wall that makes it impossible; here is the code:
if handle.has_metadata():
    torinfo = handle.get_torrent_info()
    fs = libtorrent.file_storage()
    for file in torinfo.files():
        fs.add_file(file)
    torfile = libtorrent.create_torrent(fs)
    torfile.set_comment(torinfo.comment())
    torfile.set_creator(torinfo.creator())
    for i in xrange(0, torinfo.num_pieces()):
        hash = torinfo.hash_for_piece(i)
        torfile.set_hash(i, hash)
    for url_seed in torinfo.url_seeds():
        torfile.add_url_seed(url_seed)
    for http_seed in torinfo.http_seeds():
        torfile.add_http_seed(http_seed)
    for node in torinfo.nodes():
        torfile.add_node(node)
    for tracker in torinfo.trackers():
        torfile.add_tracker(tracker)
    torfile.set_priv(torinfo.priv())
    f = open(magnet_torrent, "wb")
    f.write(libtorrent.bencode(torfile.generate()))
    f.close()
There is an error thrown on this line:
torfile.set_hash(i, hash)
It expects hash to be const char*, but torrent_info.hash_for_piece(int) returns a big_number class, which has no API to convert it back to const char*.
When I find some time I will report this missing API to the libtorrent developers, as it is currently impossible to create a .torrent file from a magnet URI using the Python bindings.
torrent_info.orig_files() is also missing from the Python bindings; I'm not sure whether torrent_info.files() is sufficient.
UPDATE 2
I've created an issue on this, see it here:
http://code.google.com/p/libtorrent/issues/detail?id=294
Star it so they fix it fast.
UPDATE 3
It is fixed now; there is a 0.16.0 release, and binaries for Windows are also available.
Just wanted to provide a quick update using the modern libtorrent Python package: libtorrent now has a parse_magnet_uri function which you can use to generate a torrent handle:
import libtorrent, os, time

def magnet_to_torrent(magnet_uri, dst):
    """
    Args:
        magnet_uri (str): magnet link to convert to torrent file
        dst (str): path to the destination folder where the torrent will be saved
    """
    # Parse magnet URI parameters
    params = libtorrent.parse_magnet_uri(magnet_uri)

    # Download torrent info
    session = libtorrent.session()
    handle = session.add_torrent(params)
    print "Downloading metadata..."
    while not handle.has_metadata():
        time.sleep(0.1)

    # Create torrent and save to file
    torrent_info = handle.get_torrent_info()
    torrent_file = libtorrent.create_torrent(torrent_info)
    torrent_path = os.path.join(dst, torrent_info.name() + ".torrent")
    with open(torrent_path, "wb") as f:
        f.write(libtorrent.bencode(torrent_file.generate()))
    print "Torrent saved to %s" % torrent_path
If saving the resume data didn't work for you, you can generate a new torrent file using the information from the existing connection.
fs = libtorrent.file_storage()
libtorrent.add_files(fs, "somefiles")
t = libtorrent.create_torrent(fs)
t.add_tracker("http://10.0.0.1:312/announce")
t.set_creator("My Torrent")
t.set_comment("Some comments")
t.set_priv(True)
libtorrent.set_piece_hashes(t, "C:\\", lambda x: 0)
f = open("mytorrent.torrent", "wb")
f.write(libtorrent.bencode(t.generate()))
f.close()
I doubt that it'll make the resume faster than the function built specifically for this purpose.
Take a look at this code: http://code.google.com/p/libtorrent/issues/attachmentText?id=165&aid=-5595452662388837431&name=java_client.cpp&token=km_XkD5NBdXitTaBwtCir8bN-1U%3A1327784186190
It uses add_magnet_uri, which I think is what you need.