Image processing on a web server - C++

I want to run image processing algorithms on a server that can interact easily with web apps. The algorithms are compute-heavy and won't be available in existing libraries. Currently I am using Ruby on Rails on Heroku for my website.
What would be the best architecture to achieve this: take images from the website, run the image processing algorithm on them, and display the results back on the website?
Most of my image processing code is in C/C++.
Can I call C/C++ code from Ruby on Rails directly? Is this possible on Heroku?
Or should I design a system where the C/C++ code exposes an API that can be called by the Ruby on Rails server?

Heroku typically uses small virtual machine instances, so depending on just how heavy your processing is, it may not be the best choice of architecture. However, if you do use it, I would do this:
Use a background job gem to do your processing, and have it run in a separate process (a worker dyno rather than a web dyno, in Heroku terminology). Delayed Job is a tried and tested solution for background jobs, with a wealth of online information about integrating it with Heroku, but there are newer ones like Sidekiq, which uses threads and could in principle let everything run inside the web dyno. Even so, I would keep all background processing away from the web dynos, so Delayed Job (or similar) would be fine.
As for integrating C/C++, I haven't needed to do this yet. However, I know it is possible to create gems that wrap C or C++ code and compile it natively. As long as you're using standard (MRI) Ruby rather than JRuby, I don't think Heroku should have a problem with them. There are other ways of accomplishing this; look at SO questions specifically about this topic, such as:
How can I call C++ functions from within ruby
It seems that you need to create an extension, then create a gem to contain it. These links may or may not help.
http://www.rubyinside.com/how-to-create-a-ruby-extension-in-c-in-under-5-minutes-100.html
http://guides.rubygems.org/gems-with-extensions/
I recommend making a gem, as I think it may otherwise be difficult to get libraries or executables onto a Heroku instance. You can store the gem in your vendor directory if you don't want to make it public.
Overall, I would have the images uploaded to S3 or wherever you're storing them (with the AWS JS API this can be done directly from the browser, without using the webserver as a stepping stone; have a look for gems to help).
Then the webserver can request a background task to process the image.
If you're not storing them, things become a little more interesting... You'll need a database if you're using background tasks, so you could pass the image data over to the worker as a blob in the database perhaps.
I really wouldn't do all the processing just in the webserver dyno, unless you're really only hitting this thing very occasionally. With multiple users you'd hit a bottleneck very quickly.
The background process can set a flag on the image's table row so the webserver can let the user know when processing is complete. (The upload-complete page can poll for this via AJAX.)
Of course, there are many other ways of accomplishing this, depending on a number of factors.
Apologies that the answer is vague, but the question is quite open-ended.
Good luck.

Related

What is the best way to transfer large files through a Django app hosted on Heroku?

Heroku gives me an H12 error when transferring the file to an API from my Django application (I understand it's a long-running process and I guess there is some memory/worker trade-off involved). I am on a single hobby dyno right now.
The function only runs smoothly for files up to around 50 MB. The file itself comes from a different source (fetched with the requests Python package).
The idea is to build a file transfer utility as a Django app on Heroku. The file does not get stored on my app's side; it is just fetched from point A and sent to point B.
I went through multiple discussions along with the standard Heroku documentation; however, I am struggling with some concepts:
Will this problem really be solved by background tasks? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
The standard docs recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
Some recommend adding an extra worker besides the web dyno we have by default. How does this relate to my problem?
Some say to add multiple workers; I am not sure how this solves it. Let's say it starts working for large files using background tasks today; what if the number of concurrent users increases? How will this impact my solution, and how should I plan mitigation around the risks?
If someone here has a strong understanding of the architecture, I would like to hear your experiences and thoughts. Also, let me know if there is an alternative to Heroku that would make this easier for me.
Have you looked at using Celery to run this as a background task?
This is a very standard way of dealing with requests that take a long time to complete.
Will this problem really be solved by background tasks? (If yes, I am looking for an explanation of the process rather than just the direct way to do it, so that I can optimize my flow.)
Yes, it can be solved by background tasks. If you are using something like Celery, which has direct support for Django, you will be running another instance of your Django application, but with a different startup command for Celery. That worker process keeps polling for new tasks to execute, reads the task name and arguments from the Redis queue (or RabbitMQ, whichever you use as the broker), executes the task, and updates its status back to Redis (or whichever broker/result backend you use).
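As a rough, hedged sketch of what that wiring looks like (the project name "myproject" and the task name transfer_file are invented for illustration, not taken from the question):

# myproject/celery.py -- minimal Celery wiring for a hypothetical Django project
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")

app = Celery("myproject", broker="redis://localhost:6379/0")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# someapp/tasks.py -- a task the web process hands off to the worker
from celery import shared_task

@shared_task
def transfer_file(source_url, target_url):
    # the long-running transfer happens here, outside the request/response cycle
    ...

A view would then call transfer_file.delay(source_url, target_url) and return immediately, while a separately started worker process (celery -A myproject worker) picks the task up from the broker and runs it.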
You can also use Flower along with Celery so that you have a dashboard showing how many tasks are being executed, their statuses, and so on.
The standard docs recommend background tasks using the RQ package for Python. I am using PostgreSQL at the moment. Will I need to install and manage a Redis database as well for this? Is this even related to the database?
To use background tasks with Celery, you will need to set up some sort of message broker like Redis or RabbitMQ.
Some recommend adding an extra worker besides the web dyno we have by default. How does this relate to my problem?
I don't think that would help for your use case.
Some say to add multiple workers; I am not sure how this solves it. Let's say it starts working for large files using background tasks today; what if the number of concurrent users increases? How will this impact my solution, and how should I plan mitigation around the risks?
When you use Celery, you will have to start a few workers for that Celery instance; these workers are the ones that execute your background tasks. The Celery documentation will help you calculate the worker count based on your instance's CPU, memory, and so on.
If someone here has a strong understanding of the architecture, I would like to hear your experiences and thoughts. Also, let me know if there is an alternative to Heroku that would make this easier for me.
I have worked on a few projects where we used Celery background tasks to upload large files. It has worked well for our use cases.
Here is my final take on this after a full evaluation, trials, and the earlier recommendations made here; thanks @arun.
Heroku needs a web dyno to serve the website, and that dyno gets 512 MB of memory; operations you perform that stay below this limit should be fine.
Beyond that, let's say you have a scenario like the one mentioned above, where a large file comes from a source API and goes to a target API via the Django app. You will have to do the following:
First, you will have to run the file transfer function as a background process, since it will take more than the 30 seconds within which Heroku expects a response; otherwise, an H12 error is waiting for you. The solution is to implement background tasks for Django; Celery worked in my case. Here, Celery is your same Django app functionality running as a background handler, and it needs its own dyno (the worker), which can be scaled as needed in the future.
To make your Django WSGI process (the frontend app) talk to Celery (the background app), you need a message broker in between, which can be Heroku Redis, RabbitMQ, etc.
Second, the problem doesn't get solved just by having a new worker dedicated to the Celery app; the memory limits still apply, as it is also a dyno with its own memory.
To overcome this, your Python requests call should stream the download instead of reading the complete file into a single memory buffer. Iterate over the stream in chunks and send the chunks on to the target endpoint.
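As a hedged sketch of that idea using the requests package (the URLs and the helper name relay are placeholders):

import requests

CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk; see the notes on chunk size below

def relay(source_url, target_url):
    # stream=True means the response body is not read into memory up front
    with requests.get(source_url, stream=True) as src:
        src.raise_for_status()
        # passing an iterator as the body makes requests send a chunked upload,
        # so only one chunk is held in memory at a time
        resp = requests.post(target_url,
                             data=src.iter_content(chunk_size=CHUNK_SIZE))
        resp.raise_for_status()
        return resp.status_code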
Even the chunk size plays an important role here. I will not put an exact number, since it depends on various factors:
It should not be too small, or the transfer will take more time.
It should not be too big for either the source or target endpoint server to handle.

Django communicating with another Python application?

Is it possible to have Django running on the server, with one Django application communicating with another Python process (say, one I developed), fetching a response from it or even just making it perform a particular action?
It can be synchronous or asynchronous; I have some idea of the asynchronous case, where a package like hendrix, crossbar.io, or even Celery could be used. But I don't understand what this kind of inter-communication is called, or how I should plan the architecture for it.
I have the following two situations going around in my head that I'm seeking a plan for:
1.
Say I have Django and an e-mail sender built with Python's smtplib package. A user making a request to a view would make Django execute the Python module I developed for sending an e-mail to a particular user (via an SMTP server from Google/Gmail). It could be synchronous or asynchronous.
OR
2.
I have Django (some application) and I want it to communicate with some server I maintain, say to make that server execute some code or just fetch a file (if it is an FTP server). Is this an appropriate situation for the term 'microservices'? Or is there another term or approach here?
Your first solution would be called an installable Python module, just like any package you install with pip. You can keep it as a separate module if you need your code to be reusable across multiple (or just future) projects.
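For the first scenario, such a module can be very small. Here is a hedged sketch using the standard library's smtplib (the function name and the Gmail settings are illustrative only):

# mailer.py -- a reusable module a Django view (or anything else) can import
import smtplib
from email.message import EmailMessage

def send_email(sender, password, recipient, subject, body,
               host="smtp.gmail.com", port=587):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    with smtplib.SMTP(host, port) as server:
        server.starttls()               # Gmail expects TLS on port 587
        server.login(sender, password)
        server.send_message(msg)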
Your second solution would be a microservice. This requires setting your small module up as a service that exposes, say, a REST API you can communicate with to make it do whatever you intend it to do.
If your question is "what is the right approach?", then I would tell you it depends on your use case. If this is just some reusable code that you don't want to repeat over and over throughout your project, then make it into a separate module. If, on the other hand, it is a service that you expect other services to use and rely on, then make it into a microservice; you can use a microframework such as Flask for easier and faster setup of your service. Otherwise, if it's just some code that you will use once and that serves a single piece of functionality in your application, then just write it and keep it there.
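For the microservice route, a minimal Flask sketch might look like the following (the endpoint and port are made up; treat this as an illustration rather than a recipe):

# service.py -- a tiny standalone service exposing one action over HTTP
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/run-task", methods=["POST"])
def run_task():
    payload = request.get_json(force=True)
    # do the real work here: fetch a file, trigger a job, etc.
    return jsonify({"status": "ok", "received": payload})

if __name__ == "__main__":
    app.run(port=5001)

Your Django code would then talk to it with an ordinary HTTP call, e.g. requests.post("http://localhost:5001/run-task", json={"action": "fetch"}).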
There are no rules or standards on which approach should be taken. I personally judge things depending on the use-case.
Hope this helps!

Saving Interactive Bokeh Chart

I have created an interactive Bokeh chart with various widgets which allow manipulation of the data. I now want to understand the standard way of sharing such a plot, or how to save it for sharing.
The plot is created with the curdoc method and then output to the Bokeh server using session.show().
# imports assume an older Bokeh release that still ships HBox and push_session;
# plot p and the widget inputs are created earlier in the script
from bokeh.client import push_session
from bokeh.models import HBox
from bokeh.plotting import curdoc

# create the current visualization using plot p and the widget inputs
curdoc().add_root(HBox(inputs, p, width=1100))
# run the session
session = push_session(curdoc())
session.show()               # open the document in a browser
session.loop_until_closed()  # run forever
Does the app trigger actual Python code?
If not, you might consider reworking it as a non-server standalone document (using CustomJS callbacks, for instance). That would just generate a self-contained static HTML file that you could publish or send anywhere, and have it be immediately accessible.
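As a hedged sketch of that standalone approach (assuming a reasonably recent Bokeh where js_on_change is available; the data and file name are invented):

# standalone.py -- interaction runs entirely in the browser, no Bokeh server needed
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure, output_file, show

xs = list(range(11))
source = ColumnDataSource(data=dict(x=xs, y=xs))

p = figure(width=400, height=300)
p.line('x', 'y', source=source)

slider = Slider(start=1, end=5, value=1, step=1, title="Multiplier")
slider.js_on_change('value', CustomJS(args=dict(source=source, slider=slider), code="""
    const data = source.data;
    data['y'] = data['x'].map(x => x * slider.value);
    source.change.emit();
"""))

output_file("interactive.html")   # a self-contained HTML file you can share
show(column(slider, p))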
If your app does rely on executing actual Python code to do the work, then it needs to actually be running somewhere for users to interact with it. First off, I would suggest you make a real app that runs on the server, like the ones in the demo app gallery (see also Use Case Scenarios in the User's Guide). A real server app, i.e. one you run like bokeh serve myapp.py, is definitely preferred over using bokeh.client, especially for "publishing" scenarios (it will also be simpler, less code, and more performant). Then, distributing the app could mean a few things:
You give them the script and they run bokeh serve app.py locally themselves
You "deploy" the app by leaving it running on a server with a URL that is accessible to users who you want to be able to see it
Depending on how much compute the app does, and how many users you expect at a given time, the second option could be as simple as running bokeh serve app.py somewhere. But if there is heavy compute or you expect a lot of traffic, you may need more sophisticated "scale out" deployments behind a load balancer. More information is in Deployment Scenarios in the User's Guide, and of course we are happy to help with more extended discussions on the public mailing list. Finally, I should mention that in the near future, automated scalable publishing of Bokeh applications will be available as a feature on https://anaconda.org/
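For comparison, a hedged sketch of the server-app shape (the file name and update logic are placeholders), which you would run with bokeh serve myapp.py:

# myapp.py -- real Python callbacks run on the server for every widget change
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

xs = list(range(11))
source = ColumnDataSource(data=dict(x=xs, y=xs))

p = figure(width=400, height=300)
p.line('x', 'y', source=source)

slider = Slider(start=1, end=5, value=1, step=1, title="Multiplier")

def update(attr, old, new):
    # arbitrary Python (NumPy, scikit-learn, ...) can run here
    source.data = dict(x=xs, y=[x * new for x in xs])

slider.on_change('value', update)
curdoc().add_root(column(slider, p))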

How do those who are not using a backend framework (such as Rails/Symfony/Django) go about developing and deploying an Ember application's assets?

More specifically, when using a backend application framework I am generally afforded some level of asset management that lets me work with multiple uncompressed, unminified files in development, while in production those files are automatically minified, compressed, and concatenated into a single file.
I am looking to create an Ember application that is a single-page app interfacing with a separate RESTful services layer. I simply do not need the weight of a framework behind the Ember app and am hoping to serve it as static HTML+CSS+JS, so I am looking for any guidance on how to easily manage development and deployment of a client-side-only app without adding much overhead.
Right now my biggest issue is with including JS (and, to a lesser extent, CSS) files. My HTML is static and my Ember app is made up of many files, so I have many script tags to include them all. This is clearly not appropriate for production, so I imagine some kind of build tool will be needed to assemble my JavaScript files and overwrite the script tags in the HTML file. Are there tools out there right now that will do this? Is there another approach that I may be overlooking?
This is my first fully client-side application so it's very possible that I just need to make a paradigm shift, having done server-side applications for so long.
Agreed, this can be tricky without a backend framework. Hand-maintained script tags are certainly not the way to go, and you will need some kind of build tool for production deployment.
Ember App Kit is a solution a few of us have been working on. It's still in its early stages, but I've used it for a couple of projects so far and it's been much better than trying to roll my own with Grunt. I would expect it to become the default starting point for Ember apps in the near future; to try it now, just download it as a zip, then read the Getting Started Guide.
There are many other solid solutions out there, consider checking out:
ember-tools
brunch-with-ember-reloaded
brunch-with-hapmsters
charcoal
I use a combination of RequireJS and Grunt, using these lovely functions and this one, which can compile your ember-handlebars templates into functions. (The grunt-contrib plugins include the ability to watch for changes in your files and perform various build steps, which may differ between development and production.) You can have separate Grunt tasks that run for production or development. Of course, for all of this you are going to need Node!

How do I run one version of a web app while developing the next version?

I just finished a Django app that I want to get some outside user feedback on. I'd like to launch one version and then fork a private version so I can incorporate feedback and add more features. I'm planning to do lots of small iterations of this process. I'm new to web development; how do websites typically do this? Is it simply a matter of copying my Django project folder to another directory, launching the server there, and continuing my dev work in the original directory? Or would I want to use a version control system instead? My intuition is that it's the latter, but if so, it seems like a huge topic with many uses (e.g. collaboration, which doesn't apply here) and I don't really know where to start.
1) Separate URLs: www.yoursite.com vs test.yoursite.com. You can also do www.yoursite.com and www.yoursite.com/development, etc. You could also create a /beta or /staging.
2) Keep separate databases: one for production and one for development. Write a script that will copy your live database into a dev database, and keep one database for each type of site you create (you may want a beta or staging database for your testers). Do your own work in the dev database. If you change the database structure, save the changes as a .sql file that can be loaded and run on the live site's database when you push those changes live.
3) Merge features into your different sites with version control. I am currently playing with a Subversion setup for web apps that has a stable branch (trunk), one branch for staging, and one for development. Development tags and branches get merged into staging, and then staging tags/branches get merged into stable. Version control will let you manage your source code in any way you want; you will have to find a methodology that works for you and use it.
4) Consider build automation. It will publish your site for you automatically. Take a look at http://ant.apache.org/; it can automate much of the work of checking out your code and uploading it to each specific site as needed.
5) Toy of the month: there is a utility called cURL that you may find valuable. It does a lot from the command line. It might be enough on its own if you don't want to use all (or any) of Ant.
Good luck!
You would typically use version control and have two domains: your-site.com and test.your-site.com. Then your-site.com would always update to trunk, which is the current, shipping version. You would do your development in a branch of trunk, and test.your-site.com would update to that. Then you periodically merge changes from your development branch back into trunk.
Jas Panesar has the best answer if you are asking this from a development standpoint, certainly; that is, if you're just asking how to easily keep your new development separate from the site that is already running. However, if your question was actually asking how to run both versions simultaneously, then here's my two cents.
Your setup has a lot to do with this, but I always recommend running process-based web servers in the first place. That is, not using threaded servers (less relevant to this question) and not embedding the app in the web server (that is, not using mod_python, which is the relevant part here). So, you have one or more processes getting HTTP requests from your web server (Apache, Nginx, Lighttpd, etc.). Now, when you want to try something out live, without affecting your normal running site, you can bring up a process serving requests that never gets the regular requests proxied to it like the others do. That is, normal users don't see it.
You can set up a subdomain that points to this one, and you can install middleware that redirects "special" users to the beta version. This allows you to roll out new features to some users, but not others.
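A hedged sketch of that middleware idea in Django (written in the modern middleware style; the cookie name and the beta hostname are placeholders):

# betamiddleware.py -- route flagged users to the beta deployment
from django.http import HttpResponseRedirect

class BetaRedirectMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # the marker could equally be a session flag or a group-membership check
        if request.COOKIES.get("use_beta") == "1":
            return HttpResponseRedirect("https://beta.your-site.com" + request.path)
        return self.get_response(request)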
Now, the biggest issues come with database changes. Schema migration is a big deal and something most of us never pay attention to. I think that running side-by-side is great, because it forces you to do schema migrations correctly. That is, you can't just shut everything down and run lengthy schema changes before bringing it back up. You'd never see any remotely important site doing that.
The key is those small steps. You always need two versions of your code able to access the same database, so changes you make for the new code must not break the old code. This breaks down into a few steps you can always take:
You can add a column with a default value, or that is optional. The new code can use it, and the old code can ignore it.
You can update the live version with code that knows to use a new column, at which point you can make it required.
You can make the new version ignore a column, and when it becomes the main version, you can delete that column.
With these small steps you can migrate between any two schemas: iteratively add a new column that replaces an old one, roll out the new code, and remove the old column, all without interrupting service. A sketch of the first step follows below.
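For the first of those steps, here is a hedged, Django-flavoured sketch (the app, model, and field names are invented, and it assumes a Django version with built-in migrations):

# photos/migrations/0002_add_processed_at.py -- add an optional column so that
# old and new code can keep sharing the same database
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [("photos", "0001_initial")]

    operations = [
        migrations.AddField(
            model_name="photo",
            name="processed_at",
            field=models.DateTimeField(null=True, blank=True),
        ),
    ]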
That said, it's your first web app? You can probably afford to break it; you probably have few users :-) But it is fantastic that you're even asking this question. Many "professionals" fail to ever ask it, and fewer still answer it.
What I do is export a copy of my SVN repository and put the files on the live production server, then keep a virtual machine with a development working copy and commit the changes to the repo when I'm done.