Should I make a separate django project for crawling/scraping part?

Should I make a separate django project for crawling/scraping part? - django

I recently made an online-post-curating service with django. I now upload posts manually everyday but finally realized web-crawler/scraper is highly needed to save my time. In this case, is it better to make a separate django project(for the crawler only)? Or just starting a new app in the existing project is enough?
[Advantage/Ground of making separate project for crawler (I guess)]
Totally different dependencies needed, so new project
Mainly asynchronous tasks unlike the existing service
The function is very universal (not really related to post-curating service)
Server spec. for crawler must be different, so be prepared in advance with separate project.
[Opposite side - making an app is inough]
For asynchronous tasks, just adding some library like Celery in your project is enough
Running multiple projects demands lots and lots of additional effort
Two projects using one database(saving and reading posts) at the same time can be a threat in some day
Sorry for the tacky question and thank you in advance!

Related

Django multiple Project using same database tables

I've been reading through related articles regarding using same database for multiple django projects however, I have not been able to come up with a fix yet.
I've tried relative and absolute pathing when importing but it gives "attempted relative import beyond top-level package" error when I try to access parent directory.
Project 1 gets the users to write to database by filling in a form and Project 2 retrieves data written in by users from database.
I'm using Postgresql for database. I've tried writing exactly same models.py for both projects but it seems like in database they appear as separate relations/tables. E.g. for table named school in both models.py, it would look like project1_school and project2_school in Postgres database.
Is there a way to write to and read from same tables of same database?
Thank you so much in advance.

I think that you might be confuse with the difference between Projects and Applications.
Projects vs. apps
What’s the difference between a project and an app? An app is a Web application that does something – e.g., a Weblog system, a database of public records or a small poll app. A project is a collection of configuration and apps for a particular website. A project can contain multiple apps. An app can be in multiple projects.
Writing your first Django app, part 1
So in you particular case, I would say that your actual projects, both of them, could be applications of one project. The main reason, why I think this is a better approach, is that both are gonna use the same data, one application writes while the other retrieve it. One could even argue that they actually could be the same application. But this may depend on many factors of your business.
BTW, is really hard for me to imagine a situation where it would be a good idea to have two projects using the same database. Even if both projects need to share data, I would not think in using on database. I would try to solve it at an application level. But I you need for some reason to share information at database level, there are tools to connect both databases.

How do those who are not using a backend framework (such as Rails/Symfony/Django) go about developing and deploying an Ember application's assets?

More specifically, when using a backend application framework I generally am afforded some level of asset management which allows me to work with multiple files in development which are uncompressed and unminified and then in production mode those files become automatically minified, compressed, and concatenated into a single file.
I am looking to create an Ember application that is a single page app that interfaces with a separate RESTful services layer. I simply do not need the weight of a framework behind the Ember app and am hoping to serve it as static html+css+js, so I am looking for any guidance on how to easily manage development and deployment of a client-side only app without adding much overhead.
Right now my biggest issue is with including JS (and to a lesser extent, CSS) files. My HTML is static and I have an Ember app comprised of many files, so I have many script tags to include them all. This is clearly not appropriate for production so I imagine some kind of build tool will be needed to assemble my Javascript files and overwrite the script tags in the HTML file. Are there tools out there right now that will do this? Is there another approach that I may be overlooking?
This is my first fully client-side application so it's very possible that I just need to make a paradigm shift, having done server-side applications for so long.

Agreed this can be tricky without a backend framework. For sure script tags are not the way to go and you will need some kind of build tool for production deployment.
Ember App Kit is a solution a few of us have been working on. It's still early stages but i've used it for a couple of projects so far and it's been much better than trying to roll-my-own with grunt. I would expect it to become the default starting point for ember apps in near future, to try it now just download it as a zip then read the Getting Started Guide
There are many other solid solutions out there, consider checking out:
ember-tools
brunch-with-ember-reloaded
brunch-with-hapmsters
charcoal

I use a combination of requirejs and Grunt, using these lovely functions and this one, which can compile your ember-handlebars templates into functions. (The git-contrib includes the ability to watch for changes in your files and perform various build steps which may differ if you are in development or production. You can have separate grunt functions which run various tasks for production or development. Of course for all of this you are going to need node!

Best practices for DVCS and reusable Django apps

I'm getting set up with proper distribution version control (yes, overdue) on a large Django environment with lot's of reusable apps and lot's of projects.
What's the right way to do this?
Clone each app you need within each project, to allow you to make changes to the app without worrying about breaking anything.
Have one copy of each version controlled application to avoid having multiple copies of the code, each in its own repository.
Or is there a better way?
Thanks.
Edit for clarity: These are in house apps that are reused from project to project.

In my opinion the best practice is to keep all your apps as one library/package. You can have versions/snapshots (e.g. tags in hg) and branches and you should definitely create and configure setup.py file.

If the app are reusables, you must create a egg in pypi. These have releases. For each project, you could use one or the other releases.
See for example this package.
To deploy the projects both in local as in the server, you can use buildout (very recomended)

Is there an ideal way to move from Staging to Production for Coldfusion code?

I am trying to work out a good way to run a staging server and a production server for hosting multiple Coldfusion sites. Each site is essentially a fork of a repo, with site specific changes made to each. I am looking for a good way to have this staging server move code (upon QA approval) to the production server.
One fanciful idea involved compiling the sites each into EAR files to be run on the production server, but I cannot seem to wrap my head around Coldfusion archives, plus I cannot see any good way of automating this, especially the deployment part.
What I have done successfully before is use subversion as a go between for a site, where once a site is QA'd the code is committed and then the production server's working directory would have an SVN update run, which would then trigger a code copy from the working directory to the actual live code. This worked fine, but has many moving parts, and still required some form of server access to each server to run the commits and updates. Plus this worked for an individual site, I think it may be a nightmare to setup and maintain this architecture for multiple sites.
Ideally I would want a group of developers to have FTP access with the ability to log into some control panel to mark a site for QA, and then have a QA person check the site and mark it as stable/production worthy, and then have someone see that a site is pending and click a button to deploy the updated site. (Any of those roles could be filled by the same person mind you)
Sorry if that last part wasn't so much the question, just a framework to understand my current thought process.

Agree with #Nathan Strutz that Ant is a good tool for this purpose. Some more thoughts.
You want a repeatable build process that minimizes opportunities for deltas. With that in mind:
SVN export a build.
Tag the build in SVN.
Turn that export into a .zip, something with an installer, etc... idea being one unit to validate with a set of repeatable deployment steps.
Send the build to QA.
If QA approves deploy that build into production
Move whole code bases over as a build, rather than just changed files. This way you know what's put into place in production is the same thing that was validated. Refactor code so that configuration data is not overwritten by a new build.
As for actual production deployment, I have not come across a tool to solve the multiple servers, different code bases challenge. So I think you're best served rolling your own.
As an aside, in your situation I would think through an approach that allows for a standardized codebase, with a mechanism (i.e. an API) that allows for the customization you're describing. Otherwise managing each site as a "custom" project is very painful.
Update
Learning Ant: Ant in Action [book].
On Source Control: for the situation you describe, I would maintain a core code base and overlays per site. Export core, then site specific over it. This ensures any core updates that site specific changes don't override make it in.
Call this combination a "build". Do builds with Ant. Maintain an Ant script - or perhaps more flexibly an ant configuration file - per core & site combination. Track version number of core and site as part of a given build.
If your software is stuffed inside an installer (Nullsoft Install Shield for instance) that should be part of the build. Otherwise you should generate a .zip file (.ear is a possibility as well, but haven't seen anyone actually do this with CF). Point being one file that encompasses the whole build.
This build file is what QA should validate. So validation includes deployment, configuration and functionality testing. See my answer for deployment on how this can flow.
Deployment:
If you want to automate deployment QA should be involved as well to validate it. Meaning QA would deploy / install builds using the same process on their servers before doing a staing to production deployment.
To do this I would create something that tracks what server receives what build file and whatever credentials and connection information is necessary to make that happen. Most likely via FTP. Once transferred, the tool would then extract the build file / run the installer. This last piece is an area I would have to research as to how it's possible to let one server run commands such as extraction or installation remotely.

You should look into Ant as a migration tool. It allows you to package your build process with a simple XML file that you can run from the command line or from within Eclipse. Creating an automated build process is great because it documents the process as well as executes it the same way, every time.
Ant can handle zipping and unzipping, copying around, making backups if needed, working with your subversion repository, transferring via FTP, compressing javascript and even calling a web address if you need to do something like flush the application memory or server cache once it's installed. You may be surprised with the things you can do with Ant.
To get started, I would recommend the Ant manual as your main resource, but look into existing Ant builds as a good starting point to get you going. I have one on RIAForge for example that does some interesting stuff and calls a groovy script to do some more processing on my files during the build. If you search riaforge for build.xml files, you will come up with a great variety of them, many of which are directly for ColdFusion projects.

How do I run one version of a web app while developing the next version?

I just finished a Django app that I want to get some outside user feedback on. I'd like to launch one version and then fork a private version so I can incorporate feedback and add more features. I'm planning to do lots of small iterations of this process. I'm new to web development; how do websites typically do this? Is it simply a matter of copying my Django project folder to another directory, launching the server there, and continuing my dev work in the original directory? Or would I want to use a version control system instead? My intuition is that it's the latter, but if so, it seems like a huge topic with many uses (e.g. collaboration, which doesn't apply here) and I don't really know where to start.

1) Seperate URLs www.yoursite.com vs test.yoursite.com. you can also do www.yoursite.com and www.yoursite.com/development, etc.. You could also create a /beta or /staging..
2) Keep seperate databases, one for production, and one for development. Write a script that will copy your live database into a dev database. Keep one database for each type of site you create. (You may want to create a beta or staging database for your tester).. Do your own work in the dev database. If you change the database structure, save the changes as a .sql file that can be loaded and run on the live site database when you turn those changes live.
3) Merge features into your different sites with version control. I am currently playing with a subversion setup for web apps that has my stable (trunk), one for staging, and one for development. Development tags + branches get merged into staging, and then staging tags/branches get merged into stable. Version control will let you manage your source code in any way you want. You will have to find a methodology that works for you and use it.
4) Consider build automation. It will publish your site for you automatically. Take a look at http://ant.apache.org/. It can drive a lot of automatically checking out your code and uploading it to each specific site as you might need.
5) Toy of the month: There is a utility called cUrl that you may find valuable. It does a lot from the command line. This might be okay for you to do in case you don't want to use all or any of Ant.
Good luck!

You would typically use version control, and have two domains: your-site.com and test.your-site.com. Then your-site.com would always update to trunk which is the current latest, shipping version. You would do your development in a branch of trunk and test.your-site.com would update to that. Then you periodically merge changes from your development branch to trunk.

Jas Panesar has the best answer if you are asking this from a development standpoint, certainly. That is, if you're just asking how to easily keep your new developments separate from the site that is already running. However, if your question was actually asking how to run both versions simultaniously, then here's my two cents.
Your setup has a lot to do with this, but I always recommend running process-based web servers in the first place. That is, not to use threaded servers (less relevant to this question) and not embedding in the web server (that is, not using mod_python, which is the relevant part here). So, you have one or more processes getting HTTP requests from your web server (Apache, Nginx, Lighttpd, etc.). Now, when you want to try something out live, without affecting your normal running site, you can bring up a process serving requests that never gets the regular requests proxied to it like the others do. That is, normal users don't see it.
You can setup a subdomain that points to this one, and you can install middleware that redirects "special" user to the beta version. This allows you to unroll new features to some users, but not others.
Now, the biggest issues come with database changes. Schema migration is a big deal and something most of us never pay attention to. I think that running side-by-side is great, because it forces you to do schema migrations correctly. That is, you can't just shut everything down and run lengthy schema changes before bringing it back up. You'd never see any remotely important site doing that.
The key is those small steps. You need to always have two versions of your code able to access the same database, so changes you make for the new code need to not break the old code. This breaks down into a few steps you can always make:
You can add a column with a default value, or that is optional. The new code can use it, and the old code can ignore it.
You can update the live version with code that knows to use a new column, at which point you can make it required.
You can make the new version ignore a column, and when it becomes the main version, you can delete that column.
You can make these small steps to migrate between any schemas. You can iteratively add a new column that replaces an old one, roll out the new code, and remove the old column, all without interrupting service.
That said, its your first web app? You can probably break it. You probably have few users :-) But, it is fantastic you're even asking this question. Many "professionals" fair to ever ask it, and even then fewer answer it.

What I do is have an export a copy of my SVN repository and put the files on the live production server, and then keep a virtual machine with a development working copy, and submit the changes to the repo when Im done.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js