I am building a web app using Ruby on Rails that needs to run C++ executables in the background. I compared the three most frequently used gems for this (Delayed::Job, Resque, Sidekiq) and found that Resque is the most suitable for me.
In the controller I have a create method like this:
def create
  @model = Model.create(model_params)
  # Resque processes the file from @model in the background
  Resque.enqueue(JobWorker, @model.file.url)
  redirect_to model_path(@model.id)
end
In the worker class I have:
class JobWorker
  @queue = :file

  def self.perform(file_to_process)
    # call the time-consuming C++ program here; it generates 2 image files
    # (passing the filename as a separate argument avoids interpolation/quoting bugs)
    system('my_file_processing_program', file_to_process)
  end
end
Now my question is: how should I detect that the job has finished? I want to send the generated image files to the client once the C++ application has produced them, so the user can view/download them. The problem is that redirect_to model_path(@model.id) returns right after Resque.enqueue(JobWorker, @model.file.url) in the controller's create action, long before the images exist.
I tried using resque-status, but that requires polling in the controller to check the status, like this:
while status = Resque::Plugins::Status::Hash.get(job_id) and !status.completed? && !status.failed?
  sleep 1
  puts status.inspect
end
Any suggestions? Thank you in advance.
If you want to go asynchronous, you can use a system like Faye (http://faye.jcoglan.com/ruby.html) to push a message to the frontend when the process is done. Write the code to publish a message after your system call finishes execution.
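For example, here is a minimal sketch of publishing from the worker over HTTP once the system call returns; the Faye endpoint, the channel name, and the payload are assumptions for illustration, not from the original post:

require 'net/http'
require 'json'

class JobWorker
  @queue = :file

  def self.perform(file_to_process)
    system('my_file_processing_program', file_to_process)
    # assumed endpoint/channel; the Faye server accepts HTTP POSTs on its mount point
    message = { :channel => '/jobs/finished', :data => { :file => file_to_process } }
    Net::HTTP.post_form(URI.parse('http://localhost:9292/faye'), :message => message.to_json)
  end
end

The page the user is redirected to subscribes to the same channel with the Faye JavaScript client and, when the message arrives, shows or fetches the generated images.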
Another simple option, though it might not be feasible for you, is to send an email at the end of the process, letting the client know that the processing is complete and giving them a link to view the result.
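A rough sketch of that with ActionMailer; the mailer name, recipient, and attachment paths are illustrative assumptions:

class ResultMailer < ActionMailer::Base
  default from: 'noreply@example.com'

  def images_ready(user_email, image_paths)
    image_paths.each { |path| attachments[File.basename(path)] = File.binread(path) }
    mail(to: user_email, subject: 'Your file has been processed')
  end
end

# called from JobWorker.perform once the C++ program has finished:
# ResultMailer.images_ready(user_email, image_paths).deliver  # deliver_now on Rails 4.2+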
I have a Google App Engine Standard Environment application that has been working fine for a year or more, that, quite suddenly, refuses to enqueue new deferred tasks using deferred.defer.
Here's the Python 2.7 code that is making the deferred call:
# Find any inventory items that reference the product, and change them too.
# because this could take some time, we'll do it as a deferred task, and only
# if needed.
if upd:
    updater = deferredtasks.InvUpdate()
    deferred.defer(updater.run, product_key)
My app.yaml file has the necessary bits to support deferred.defer:
- url: /_ah/queue/deferred
  script: google.appengine.ext.deferred.deferred.application
  login: admin

builtins:
- deferred: on
And my deferred task has logging in it so I should see it running when it does:
#-------------------------------------------------------------------------------
# DEFERRED routine that updates the inventory items for a particular product. Should be called
# when ANY changes are made to the product, because it should trigger a re-download of the
# inventory record for that product to the iPad.
#-------------------------------------------------------------------------------
class InvUpdate(object):
    def __init__(self):
        self.to_put = []
        self.product_key = None
        self.updcount = 0

    def run(self, product_key, batch_size=100):
        updproduct = product_key.get()
        if not updproduct:
            logging.error("DEFERRED ERROR: Product key passed in does not exist")
            return
        logging.info(u"DEFERRED BEGIN: beginning inventory update for: {}".format(updproduct.name))
        self.product_key = product_key
        self._continue(None, batch_size)
...
When I run this in the development environment on my development box, everything works fine. Once I deploy it to the App Engine server, the inventory updates never get done (i.e. the deferred task is not executed), and there are no errors, and in fact no other logging from the deferred task, in the log files on the server. I know that with the sudden push to get everybody onto Python 3 as quickly as possible, the deferred.defer library has been marked as not recommended because it only works in the Python 2.7 environment, and I planned on moving to task queues for this, but I wasn't expecting deferred.defer to suddenly stop working in the existing Python environment.
Any insight would be greatly appreciated!
I'm pretty sure you can't pass an instance method to the App Engine task queue, because that instance will no longer exist when your task runs, since the task runs in a different process. I actually don't understand how your task ever worked when running remotely in the first place (and running locally is not an accurate representation of how things run remotely).
Try changing your code to this:
if upd:
    deferred.defer(deferredtasks.InvUpdate.run_cls, product_key)
and then InvUpdate is the same but has a new function run_cls:
class InvUpdate(object):
    @classmethod
    def run_cls(cls, product_key):
        cls().run(product_key)
And I'm still in the process of migrating to Cloud Tasks, and my deferred tasks still work.
I'm setting up a basic dynamic web page to display EC2 instance data, and I need to check for and pass arrays of that data to be displayed with D3. I'm using multiprocessing to run the collection in the background.
Running Python 3.7 and the newest version of Flask.
app.py Code
@app.route('/experiment')
def experiment():
    type = request.args.get('type')
    resource = request.args.get('resource')
    action = request.args.get('action')
    if 'test' not in session:
        thread = multiprocessing.Process(target=exp.transmitTest)
        session['test'] = 'started'
        thread.start()
    print(f"Looking for Data at {hex(id(exp.getData()))} found {exp.getData()}")
    return render_template('experiment.html', data=exp.getData(), type=type, resource=resource, action=action)
Backend Code
def transmitTest(self):
    for i in range(5):
        self.data.append(random.randint(0, 100))
        time.sleep(4)
        print(f"Data: {self.data} at {hex(id(self.data))}")

def getData(self):
    return self.data
My JS scheduler hits '/experiment' every 5 seconds. The print statements show that the array I'm writing to and the one returned by the getter are at the same memory address, but one is empty and the other has the data. Can anyone help me understand this?
So I figured it out. When a method is run in a separate process, Python copies the object into that process, so the child works on its own copy of the data while the copy in the Flask process stays empty (the identical hex(id(...)) values are misleading because the two ids live in different address spaces). I needed to add a backend queue through Redis Queue (https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-xxii-background-jobs) so that I could make asynchronous calls to my backend without disrupting Flask's routing.
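A minimal sketch of the Redis Queue approach; the module layout, function name, and polling pattern are illustrative assumptions (RQ needs a plain importable function rather than a method on a Flask-side object):

# tasks.py -- the job function must live at module level so the rq worker can import it
import random
import time

def transmit_test():
    data = []
    for _ in range(5):
        data.append(random.randint(0, 100))
        time.sleep(4)
    return data

# in the Flask view, enqueue the job instead of starting a multiprocessing.Process
from redis import Redis
from rq import Queue
from tasks import transmit_test

queue = Queue(connection=Redis())
job = queue.enqueue(transmit_test)    # returns immediately with a Job handle
# the endpoint the JS scheduler polls can later do:
#   job.refresh()
#   job.result        # the finished list once the worker is done

Run a worker process separately with the rq worker command so jobs execute outside the Flask process.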
I have a class-based view which triggers the composition and downloading of a report for a user.
Normally in the get method of the class I just compile the report, add response['Content-Disposition'] = 'attachment; filename="somefilename.pdf"', and return the response to the user.
The problem is that some reports are large, and while they are being compiled the request times out.
I know that the right way of dealing with this is to delegate it to a background process (like Celery). But the problem is that this means that instead of creating a temporary file which ceases to exist the moment the user downloads the report, I have to store these reports somewhere and write a cron job which regularly cleans the reports directory.
Is there any more elegant way in Django to deal with this issue?
One solution less fancy than using Celery is Django's StreamingHttpResponse:
https://docs.djangoproject.com/en/2.0/ref/request-response/#django.http.StreamingHttpResponse
With this, you use a generator function, which is a Python function that uses yield to return its results as an iterator. This allows you to return the data as you generate it, rather than all at once after you're finished. You can yield after each line or section of the report, thus keeping a flow of data back to the browser.
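For instance, here is a minimal sketch of streaming a CSV report row by row; the Echo pseudo-buffer and the placeholder rows are illustrative, not part of the original answer:

import csv
from django.http import StreamingHttpResponse

class Echo:
    # a pseudo-buffer: csv.writer wants an object with write(), and we just hand each row back
    def write(self, value):
        return value

def stream_report(request):
    writer = csv.writer(Echo())
    rows = (writer.writerow([i, i * i]) for i in range(100000))  # placeholder data
    response = StreamingHttpResponse(rows, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="report.csv"'
    return response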
But this only works if you are building up the finished file bit by bit, for example a CSV file. If you're returning something that has to be formatted all at once, for example if you're using something like wkhtmltopdf to generate a PDF file after you're done, then it's not as easy.
But there's still a solution:
What you can do in that case is use StreamingHttpResponse along with a generator function to generate your report into a temporary file instead of streaming it back to the browser. But as you are doing this, yield HTML snippets back to the browser which let the user know the progress, e.g.:
def get(self, request, **kwargs):
    # first you need a tempfile name.. do that however you like
    tempfile = "kfjkdsjfksjfks"

    # then you need to create a view which will open that file and serve it,
    # but I won't show that here. For security reasons it has to serve only
    # out of one directory that is dedicated to this.
    fetchurl = reverse('reportgetter_url') + '?file=' + tempfile

    def reportgen():
        yield 'Starting report generation..<br>'
        # do some stuff to generate your report into the tempfile
        yield 'Doing this..<br>'
        # do this
        yield 'Doing that..<br>'
        # do that
        yield 'Finished.<br>'
        # when the browser receives this script, it'll go to fetchurl where
        # you will send them the finished report.
        yield '<script>document.location="%s";</script>' % fetchurl

    return http.StreamingHttpResponse(reportgen())
That's not a complete example obviously, but should give you the idea.
When your user fetches this view, they will see the progress of the report as it comes along. At the end, you're sending the JavaScript which redirects the browser to the other view you will have to write, which returns the response containing the finished file. If that view sets the response's Content-Disposition as an attachment before returning it, e.g.:
response['Content-Disposition'] = 'attachment; filename="%s"' % filename
...then the browser will stay on the current page showing your progress, and simply pop up a file save dialog for the user.
For cleanup, you'll need a cron job regardless, because if people don't wait around they'll never pick up the report; sometimes things don't work out. So you could just clean up files older than, say, 1 hour. For a lot of systems this is acceptable.
But if you want to clean up right away, what you can do, if you are on unix/linux, is to use an old unix filesystem trick: files which are deleted while they are open are not really gone until they are closed. So, open your tempfile, then delete it, then return your response. As soon as the response has finished sending, the space used by the file will be freed.
PS: I should add that if you take this second approach, you could use one view to do both jobs. Just:
if 'file' in request.GET:
    # file= was in the url.. they are trying to fetch an already generated report
    with open(thepathname, 'rb') as f:   # thepathname should be derived safely from request.GET['file']
        os.unlink(thepathname)
        # the file has been 'deleted', but f is still a valid open file
        response = HttpResponse()        # ...construct the response as needed
        response['Content-Disposition'] = 'attachment; filename="thereport"'
        response.write(f.read())
        return response
else:
    # generate the report
    # as above
    ...
This is not really a Django question but a general architecture question.
You can always increase your server timeouts, but it would still, IMO, give a bad user experience if the user has to sit watching the browser just spin.
Doing this in a background task is the only way to do it right. I don't know how large the reports are, but using email can be a good solution. The background task simply generates the report, sends it via email and deletes it.
If the files are too large to send via email, then you will have to store them. Maybe send an email with a link and a message indicating the link will not work after X days/hours. Once you have a background worker, creating a daily or hourly clean-up task is very easy.
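A rough sketch of that flow with Celery; the task name, the compile_report helper, and the mail details are assumptions for illustration:

import os

from celery import shared_task
from django.core.mail import EmailMessage

@shared_task
def build_and_mail_report(user_email, report_id):
    path = compile_report(report_id)   # your existing report-building code, writing to a temp file
    msg = EmailMessage(
        subject='Your report is ready',
        body='The report you requested is attached.',
        to=[user_email],
    )
    msg.attach_file(path)
    msg.send()
    os.remove(path)                    # nothing left on disk to clean up later

The view then just calls build_and_mail_report.delay(request.user.email, report_id) and returns immediately.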
Hope it helps
For uninteresting reasons, I have to use jRuby on a particular project where we also want to use Amazon Simple Workflow (SWF). I don't have a choice in the jRuby department, so please don't say "use MRI".
The first problem I ran into is that jRuby doesn't support forking and SWF activity workers love to fork. After hacking through the SWF ruby libraries, I was able to figure out how to attach a logger and also figure out how to prevent forking, which was tremendously helpful:
AWS::Flow::ActivityWorker.new(
  swf.client, domain, "my_tasklist", MyActivities
) do |options|
  options.logger = Logger.new("logs/swf_logger.log")
  options.use_forking = false
end
This prevented forking, but now I'm hitting more exceptions deep in the SWF source code having to do with Fibers and the context not existing:
Error in the poller, exception:
AWS::Flow::Core::NoContextException: AWS::Flow::Core::NoContextException stacktrace:
"aws-flow-2.4.0/lib/aws/flow/implementation.rb:38:in 'task'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:292:in 'respond_activity_task_failed'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:204:in 'respond_activity_task_failed_with_retry'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:335:in 'process_single_task'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:388:in 'poll_and_process_single_task'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:447:in 'run_once'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:419:in 'start'",
"org/jruby/RubyKernel.java:1501:in `loop'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:417:in 'start'",
"/Users/trcull/dev/etl/flow/etl_runner.rb:28:in 'start_workers'"
This is the SWF code at that line:
# @param [Future] future
#   Unused; defaults to **nil**.
#
# @param block
#   The block of code to be executed when the task is run.
#
# @raise [NoContextException]
#   If the current fiber does not respond to `Fiber.__context__`.
#
# @return [Future]
#   The tasks result, which is a {Future}.
#
def task(future = nil, &block)
  fiber = ::Fiber.current
  raise NoContextException unless fiber.respond_to? :__context__
  context = fiber.__context__
  t = Task.new(nil, &block)
  task_context = TaskContext.new(:parent => context.get_closest_containing_scope, :task => t)
  context << t
  t.result
end
I fear this is another flavor of the same forking problem and also fear that I'm facing a long road of slogging through SWF source code and working around problems until I finally hit a wall I can't work around.
So, my question is, has anyone actually gotten jRuby and SWF to work together? If so, is there a list of steps and workarounds somewhere I can be pointed to? Googling for "SWF and jRuby" hasn't turned up anything so far and I'm already 1 1/2 days into this task.
I think the issue might be that aws-flow-ruby doesn't support Ruby 2.0. I found this PDF dated Jan 22, 2015.
1.2.1 Tested Ruby Runtimes
The AWS Flow Framework for Ruby has been tested with the official Ruby 1.9 runtime, also known as YARV. Other versions of the Ruby runtime may work, but are unsupported.
I have a partial answer to my own question. The answer to "Can SWF be made to work on jRuby" is "Yes...ish."
I was, indeed, able to get a workflow working end-to-end (and even make calls to a database via JDBC, the original reason I had to do this). So, that's the "yes" part of the answer. Yes, SWF can be made to work on jRuby.
Here's the "ish" part of the answer.
The stack trace I posted above is the result of SWF trying to raise an ActivityTaskFailedException due to a problem in some of my activity code. That part is my fault. What's not my fault is that the superclass of ActivityTaskFailedException has this code in it:
def initialize(reason = "Something went wrong in Flow",
               details = "But this indicates that it got corrupted getting out")
  super(reason)
  @reason = reason
  @details = details
  details = details.message if details.is_a? Exception
  self.set_backtrace(details)
end
When your activity throws an exception, the "details" variable you see above is filled with a String. MRI is perfectly happy to take a String as an argument to set_backtrace(), but jRuby is not, and jRuby throws an exception saying that "details" must be an Array of Strings. This exception blows through all the nice error catching logic of the SWF library and into this code that's trying to do incompatible things with the Fiber library. That code then throws a follow-on exception and kills the activity worker thread entirely.
So, you can run SWF on jRuby as long as your activity and workflow code never, ever throws exceptions because otherwise those exceptions will kill your worker threads (which is not the intended behavior of SWF workers). What they are designed to do instead is communicate the exception back to SWF in a nice, trackable, recoverable fashion. But, the SWF code that does the communicating back to SWF has, itself, code that's incompatible with jRuby.
To get past this problem, I monkey-patched AWS::Flow::FlowException like so:
def initialize(reason = "Something went wrong in Flow",
               details = "But this indicates that it got corrupted getting out")
  super(reason)
  @reason = reason
  @details = details
  details = details.message if details.is_a? Exception
  details = [details] if details.is_a? String
  self.set_backtrace(details)
end
Hope that helps someone in the same situation as me.
I'm using JFlow; it lets you start SWF flow activity workers with JRuby.
I have a client that periodically uploads files to the server using FileSender in Twisted. If the last upload task hasn't finished when the current upload starts, I get a runtime exception that says:
Cannot register producer , because producer was never registered.
My code is as follows:
def uploadFile(self, filename):
    try:
        self.sender = FileSender()
        d = self.sender.beginFileTransfer(uploadfile, self.transport, self.__monitor)
        d.addCallback(self.uploadCompleted, filename)
    except RuntimeError as e:
        ...
My question is: how do I avoid this error? And if it cannot be avoided, how do I recover from it? Right now, if it happens once, uploadFile never sends a file again.
Now I strictly skip starting a new upload until the previous upload task has finished, and it works!
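A minimal sketch of that guard; the _upload_in_progress flag and the file handling are assumptions, not from the original code:

def uploadFile(self, filename):
    # skip this round if the previous transfer is still running
    if getattr(self, '_upload_in_progress', False):
        return

    self._upload_in_progress = True
    self.sender = FileSender()
    uploadfile = open(filename, 'rb')
    d = self.sender.beginFileTransfer(uploadfile, self.transport, self.__monitor)

    def _done(result):
        uploadfile.close()
        self._upload_in_progress = False
        return result

    d.addBoth(_done)                        # clear the flag on success or failure
    d.addCallback(self.uploadCompleted, filename)
    return d

An alternative is to queue the filenames and chain the next beginFileTransfer call off the previous Deferred, so uploads are serialized instead of skipped.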