Scrapy and WebKit keep freezing - python-2.7

I am using Scrapy to crawl some websites with JavaScript. The scraper works fine, but it keeps freezing at random places and I have to kill the script with Ctrl+Z. It only freezes when I use this downloader middleware:
from scrapy.http import Request, FormRequest, HtmlResponse
import gtk
import webkit
import jswebkit

class WebkitDownloader( object ):
    def process_request( self, request, spider ):
        if( type(request) is not FormRequest ):
            webview = webkit.WebView()
            webview.connect( 'load-finished', lambda v,f: gtk.main_quit() )
            webview.load_uri( request.url )
            gtk.main()
            js = jswebkit.JSContext( webview.get_main_frame().get_global_context() )
            renderedBody = str( js.EvaluateScript( 'document.documentElement.innerHTML' ) )
            return HtmlResponse( request.url, body=renderedBody )
I tried debugging the script by printing every single step of the middleware, and it runs fine. It freezes when I see this in the terminal:
Message: console message: http://www.myurl.com/js/Shop.min.js?twsv=20140521122711 #642: JQMIGRATE: Logging is active
I tried different middlewares and download handlers, but it always freezes. I also checked my connection, but I can't see any patterns or signs of interruption. I am using the latest version of Scrapy.
2014-04-25 - Update
When I do get it to run for a couple of minutes, I get this error:
Creating pipes for GWakeup: Too many open files
GLib-ERROR **: Creating pipes for GWakeup: Too many open files
2014-05-27 - Update
I ran "print" through every line of code in my middleware. It turns out it freezes the code and terminal randomly when it comes to the gtk.main() part. When I use the lsof command during a "freeze" i can see a bunch of "scrapy" "files" open. After a freeze my code wont run again until i reset my computer. I guess there is some kind of bug that causes a leakage.

Related

Google App Engine deferred.defer task not getting executed

I have a Google App Engine Standard Environment application that has been working fine for a year or more, that, quite suddenly, refuses to enqueue new deferred tasks using deferred.defer.
Here's the Python 2.7 code that is making the deferred call:
# Find any inventory items that reference the product, and change them too.
# because this could take some time, we'll do it as a deferred task, and only
# if needed.
if upd:
    updater = deferredtasks.InvUpdate()
    deferred.defer(updater.run, product_key)
My app.yaml file has the necessary bits to support deferred.defer:
- url: /_ah/queue/deferred
  script: google.appengine.ext.deferred.deferred.application
  login: admin

builtins:
- deferred: on
And my deferred task has logging in it so I should see it running when it does:
#-------------------------------------------------------------------------------
# DEFERRED routine that updates the inventory items for a particular product. Should be called
# when ANY changes are made to the product, because it should trigger a re-download of the
# inventory record for that product to the iPad.
#-------------------------------------------------------------------------------
class InvUpdate(object):
    def __init__(self):
        self.to_put = []
        self.product_key = None
        self.updcount = 0

    def run(self, product_key, batch_size=100):
        updproduct = product_key.get()
        if not updproduct:
            logging.error("DEFERRED ERROR: Product key passed in does not exist")
            return
        logging.info(u"DEFERRED BEGIN: beginning inventory update for: {}".format(updproduct.name))
        self.product_key = product_key
        self._continue(None, batch_size)
...
When I run this in the development environment on my development box, everything works fine. Once I deploy it to the App Engine server, the inventory updates never get done (i.e. the deferred task is not executed), and there are no errors (and no other logging from the deferred task in fact) in the log files on the server. I know that with the sudden move to get everybody on Python 3 as quickly as possible, the deferred.defer library has been marked as not recommended because it only works with the 2.7 Python environment, and I planned on moving to task queues for this, but I wasn't expecting deferred.defer to suddenly stop working in the existing python environment.
Any insight would be greatly appreciated!
I'm pretty sure you can't pass the method of an instance to the App Engine task queue, because that instance will not exist when your task runs, since the task will be running in a different process. I actually don't understand how your task ever worked when running remotely in the first place (and running locally is not an accurate representation of how things will run remotely).
Try changing your code to this:
if upd:
    deferred.defer(deferredtasks.InvUpdate.run_cls, product_key)
and then InvUpdate is the same but has a new function run_cls:
class InvUpdate(object):
    @classmethod
    def run_cls(cls, product_key):
        cls().run(product_key)
And I'm still in the process of migrating to Cloud Tasks, and my deferred tasks still work.
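Another option along the same lines (only a sketch, not part of the original answer) is to defer a plain module-level function; deferred.defer pickles its arguments, and a module-level callable is always importable by the worker that runs the task:

# In deferredtasks.py -- hypothetical wrapper function for illustration.
def run_inventory_update(product_key, batch_size=100):
    InvUpdate().run(product_key, batch_size)

# At the call site:
if upd:
    deferred.defer(deferredtasks.run_inventory_update, product_key)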

arcpy in Django does not run

I have a script for ArcGIS using the arcpy module, and I want to combine it with Django. The script runs in the console successfully; however, when running under Django, I find the arcpy functions do not run. So I made a simple test for it and got the same result.
test.py
import arcpy
import os

def test_arcpy():
    tempFolder = arcpy.env.scratchFolder
    tempGDBPath = os.path.join(tempFolder, 'test.gdb')
    arcpy.env.overwriteOutput = True
    if not arcpy.Exists(tempGDBPath):
        arcpy.AddMessage('create..')
        arcpy.CreateFileGDB_management(tempFolder, 'test.gdb')
    return arcpy.Exists(tempGDBPath)
views.py
from django.http import HttpResponse
from . import test

def t(request):
    msg = str(test.test_arcpy())
    return HttpResponse(msg)
If I run test.py in the console, it returns True. But if I run it through Django, it always returns False. If I cannot solve this, I cannot write more complex arcpy scripts in Django. Can you help me?
I found a similar question, Flask app with ArcGIS, Arcpy does not run, but there is no solution in that question.
I think I can use the subprocess module to run the arcpy script in a console and then return the console output back to the Django function.
But I really want to know whether arcpy can run with Django or not.
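For reference, the subprocess route I have in mind would look roughly like this (the interpreter path and script name are assumptions for illustration):

import subprocess

from django.http import HttpResponse

ARCGIS_PYTHON = r'C:\Python27\ArcGIS10.1\python.exe'  # assumed install path

def t_subprocess(request):
    # Run the arcpy script in its own console process and return its output.
    output = subprocess.check_output([ARCGIS_PYTHON, 'test_arcpy_script.py'])
    return HttpResponse(output)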
I met the same problem as you, but in Flask.
I checked the arcpy source and found what looks like a bug in the arcgisscripting.pyd file when arcpy.Exists() is triggered from Flask, and I had no further clues. After searching for similar cases on the ESRI online blog, I am fairly sure it is a bug in arcpy. So my advice is:
try a higher version of arcpy (my current version is 10.1)
try to run your code in a console
try a message queue, e.g. from multiprocessing import Queue, Process (see the sketch after this list)
try not to use the offending function; in my case I avoid arcpy.Exists()
try to use 64-bit arcpy on ArcGIS Server.
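A rough sketch of that multiprocessing idea (untested with Django, so treat it as an assumption): do the arcpy call in a child process and pass the result back through a Queue.

from multiprocessing import Process, Queue

def _arcpy_worker(queue, gdb_path):
    import arcpy  # imported inside the child so it gets a fresh interpreter state
    queue.put(arcpy.Exists(gdb_path))

def exists_in_child_process(gdb_path):
    queue = Queue()
    worker = Process(target=_arcpy_worker, args=(queue, gdb_path))
    worker.start()
    worker.join()
    return queue.get()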

Taking a full screenshot of only my desktop icons using Python/PIL

I'm currently able to take a screenshot of my desktop using this code
from PIL import Image, ImageGrab
x = ImageGrab.grab()
x.show()
but the problem is, this captures the Python script/IDLE dialog box which is running the script as well.
I want to take a clean screenshot of my desktop via a script, capturing all the desktop icons, the background and the taskbar etc.
Is there a way to automate it so the script can automatically do this?
Thanks.
One of the solutions would be to use a Python web server (tornado or whatever).
Take a look at the following code (Python 3 version):
pip install tornado
from PIL import ImageGrab
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        x = ImageGrab.grab()
        x.show()
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
Minimise the IDLE window when the script is running.
Open a browser from your phone or another computer and navigate to:
http://[your IP address]:8888
I cannot test this script because ImageGrab.grab() only works on Mac and Windows but unfortunately not on Linux.
I tried out this code:
from PIL import ImageGrab
import time
import cv2
time.sleep(5)
x = ImageGrab.grab()
x.save('Screenshot.jpg')
x.show()
I delayed the execution of ImageGrab.grab() by 5 seconds; in the meantime I minimized my workspace and switched to my desktop. After 5 seconds it captured a screenshot of my desktop and saved it.
I am still figuring out a way to automatically capture the Desktop.
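One way to automate that switch on Windows (assuming the pyautogui package is available; this is only a sketch, not something I have verified in the setup above) is to send the Win+D shortcut before grabbing the screen:

from PIL import ImageGrab
import pyautogui
import time

pyautogui.hotkey('win', 'd')   # show the desktop (Windows shortcut)
time.sleep(1)                  # give the desktop a moment to repaint
shot = ImageGrab.grab()
shot.save('Desktop.jpg')
pyautogui.hotkey('win', 'd')   # bring the windows back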

Getting a 500 internal error when uploading a file using CGI and Python

Hi, I have a project that involves uploading documents. I am using HTML for the front end and Python for the back end. I've managed to link my HTML and Python files, but I'm having a problem with the server. At first I thought it was a random thing, but I'm pretty sure it's because of what I added to the Python code. I have:
import cgi
import sys
import os

htmlform = cgi.FieldStorage()
file_data = htmlform['myfile']
if not fileitem.file:
    return
(name, ext) = os.path.splitext(fileitem.filename)
#if ext == ".jpg" or ext == ".png" or ext == ".gif":
#ioFlag = "wb"
#else:
#ioFlag = "w"
I was able to log into my page, go to the HTML form, submit the form, and get to a basic success HTML page I had below the input above. Now, I'm pretty new to Python and didn't realise that the bodies of the if statements should be indented, and I got a 500 internal error when I uncommented the if statement. I did it once and then went through commenting out my code, completely confused as to why I was getting errors, but after a while it just started working again. My guess is the incorrect if statement somehow got it stuck. I expect it'll be working again after about an hour, but ideally I'd like to know whether I can stop the stuck process on the server, if possible. I was following this guide: http://www.alwaysgetbetter.com/blog/2009/01/02/python-file-upload/
Fixed it! The problem seems to be the indentation. If you're ever unsure about this stuff, look at the error logs. I'm using an Apache server and I don't have access to the error logs, so I used:
sudo cat /etc/log/apache2/error.log
It gave me the answer, and this should hopefully help you even if your question is unrelated.
EDIT: And for completeness' sake, file_data should be fileitem.
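For completeness, a corrected sketch of the snippet from the question might look like the following (the upload directory is an assumption; note the consistent fileitem name and the indented if bodies):

#!/usr/bin/env python
import cgi
import os

form = cgi.FieldStorage()
fileitem = form['myfile']

print "Content-Type: text/html\n"
if fileitem.file:
    name, ext = os.path.splitext(fileitem.filename)
    if ext == ".jpg" or ext == ".png" or ext == ".gif":
        ioFlag = "wb"
    else:
        ioFlag = "w"
    target = os.path.join("/tmp/uploads", os.path.basename(fileitem.filename))
    out = open(target, ioFlag)
    out.write(fileitem.file.read())
    out.close()
    print "Upload succeeded"
else:
    print "No file was uploaded"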

How do I embed an IPython Interpreter into an application running in an IPython Qt Console

There are a few topics on this, but none with a satisfactory answer.
I have a python application running in an IPython qt console
http://ipython.org/ipython-doc/dev/interactive/qtconsole.html
When I encounter an error, I'd like to be able to interact with the code at that point.
try:
    raise Exception()
except Exception as e:
    try:  # use exception trick to pick up the current frame
        raise None
    except:
        frame = sys.exc_info()[2].tb_frame.f_back
    namespace = frame.f_globals.copy()
    namespace.update(frame.f_locals)
    import IPython
    IPython.embed_kernel(local_ns=namespace)
I would think this would work, but I get an error:
RuntimeError: threads can only be started once
I just use this:
from IPython import embed; embed()
works better than anything else for me :)
Update:
In celebration of this answer receiving 50 upvotes, here are the updates I've made to this snippet in the intervening six years since it was posted.
First, I now like to import and execute in a single statement, as I use black for all my python code these days and it reformats the original snippet in a way that doesn't make sense in this specific and unusual context. So:
__import__("IPython").embed()
Given that I often use this inside a loop or a thread, it can be helpful to include a snippet that allows terminating the parent process (partly for convenience, and partly to remind myself of the best way to do it). os._exit is the best choice here, so my snippet includes this (same logic w/r/t using a single statement):
q = __import__("functools").partial(__import__("os")._exit, 0)
Then I can simply use q() if/when I want to exit the master process.
My full snippet (with # FIXME in case I ever forget to remove it!) looks like this:
q = __import__("functools").partial(__import__("os")._exit, 0) # FIXME
__import__("IPython").embed() # FIXME
You can follow this recipe to embed an IPython session into your program:
try:
    get_ipython
except NameError:
    banner = exit_msg = ''
else:
    banner = '*** Nested interpreter ***'
    exit_msg = '*** Back in main IPython ***'

# First import the embed function
from IPython.frontend.terminal.embed import InteractiveShellEmbed
# Now create the IPython shell instance. Put ipshell() anywhere in your code
# where you want it to open.
ipshell = InteractiveShellEmbed(banner1=banner, exit_msg=exit_msg)
Then use ipshell() whenever you want to be dropped into an IPython shell. This will allow you to embed (and even nest) IPython interpreters in your code and inspect objects or the state of the program.
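A minimal usage sketch (the surrounding function and the do_something call are made up for illustration):

def process(item):
    result = do_something(item)  # placeholder for your own logic
    ipshell('Inspecting the local state before returning')  # drops into IPython here
    return result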