Virus scan for files being uploaded to Sitecore - sitecore

Are there any best practices on virus scanning all files being uploaded to the Sitecore media library (and ultimately stored in Sitecore's DB)?
I searched all over the web but there is too much noise caused by the word virus since many people seem to have performance issues on server that have anti-virus software installed.

I don't know if it is an established best practice, but I would probably add a processor for the uiUpload pipeline that used an API or command line process for a commercial antivirus product. Other than the fact that it is in a pipeline processor, it shouldn't really be much different from how you would do it in any other ASP.NET application. Performance will definitely be a concern, but you could create a dialog with a psuedo progress bar to give some feedback to the user.

Take a look at this post by Mike Reynolds. It may help you out:
http://sitecorejunkie.com/2013/11/09/perform-a-virus-scan-on-files-uploaded-into-sitecore/

I am not aware of any published best practices, but if you are able to add a step in the upload process, you might want to take a look at Metascan, which provides API level integration to multiple antivirus engines. Using this, you could build a workflow for those uploaded files to scan them prior to them hitting your Sitecore media library by establishing rules based on the results of the antivirus engines used in your Metascan deployment. There's also a hosted version at metascan-online(dot)com
Disclaimer /// I am an employee of OPSWAT, who produces Metascan, but it appears to be a potential solution to your issue

In one of our recent Projects, we were faced with a requirement to scan incoming files for virus. The problem in the project was that the files after begin uploaded, were made public available on the website.
The way we solved the problem was to implementing https://www.virustotal.com/. Its a free online virus scanner that has a public API. You can send files via SSL.
We implemented the solution by adding newly uploaded files to a Sitecore workflow. The workflow would handle the scanning of files, and move the files to the final stage of the workflow, if the files wasn't infected. If a file was infected, the file would be deleted.
A Scheduler is running every 5 minutes to check for new incoming files with the workflow.
This also means that the files aren't available straight away, as the scheduler has to check the file, but you should be able to implement the functionality directly when the user has uploaded the file, by adding your custom code to the upload pipeline.

Related

What is the best way to set up your development environment for Sitecore

The general guidance appears to be to install Sitecore into one folder, e.g. D:\Websites\MyWebSite and then create your Visual Studio project in a separate folder, e.g. C:\Projects\MyWebProject. You would then publish your custom code into the Sitecore folder from Visual Studio (This video explains what I’m describing https://www.youtube.com/watch?v=i3Mwcphtz4w around 13 mins in).
I have the following questions:-
Do people only store their Visual Studio project in source countrol and not the Sitecore code?
The publish option from VS into the Sitecore folder only has options for adding files or deleting anything not in the VS project. How would files removed from the VS project ever get deleted without doing it manually?
We use web-deploy to publish sites to staging and live environments. In this scenario would you publish from your VS project or would you set up a way to publish the Sitecore folder (if so how)?
Is this actually a good set up to have or do you do something different?
I did a lot of research on this when we started Sitecore development a couple of years ago. I remember reading a post from Sean Kearney that made a lot of sense to me: http://seankearney.com/post/Visual-Studio-Projects-and-Sitecore
We ended up using this approach for both large and small scale projects and it has been great. You will also want to look at a couple of other tools:
Team Development for Sitecore (TDS) from Hedgehog Development (http://www.hhogdev.com/products/team-development-for-sitecore/overview.aspx)
CopySauce from Igloo (http://www.igloo.com.au/blog/copysauce-igloos-sitecore-development-utility/)
SitecoreRocks for Visual Studio
So to answer your questions:
All of your code and some of the Sitecore items are stored in source control. The approach you want to take is to only store new Sitecore items (layouts, sublayouts, templates, etc) that you create along with any items you may need to customize. You do not need to store all of the sitecore source, content or modules...just what you would need to reapply to get a fresh environment up-to-date. You can manage this manually but a tool like TDS makes this MUCH easier.
We use TDS to manage the publish/deploy to each of our environments. TDS has configurable settings for handling items that have been deleted, including the ability to move it to the Sitecore recycle bin or simply remove it. You have to be careful with this but it does work.
We use a separate build environment to assemble and run deployments using TDS and Jenkins. Basically, all of the code is retrieved from the source control system to the Sitecore server and built using MSBuild and TDS. In most cases we use a webdeploy directly to the Sitecore webroot, but for production we build TDS packages and then run them on each Content Delivery Server
We have used this setup for 7 sitecore projects so far and I am very happy with how it has worked out. We have questioned whether TDS is worth the license fee but the answer always comes back as a yes. The alternative is not very appealing for our development staff and time savings far out-weigh the costs.
Everything is stored in Source Control!... just not always in the same area as they reside on the web server. Storing the Sitecore folder in source control is a good idea as there are changes that you will have as you install modules, but you do NOT add the Sitecore folder as part of your solution/project and should really be there to pull from if need be and not something that is even tracked/monitored.
Once Sitecore is installed, create a new project that resides in the website folder and only add things like the properties folder, layouts, xml and other folders that you want. I don't even include the app_config in my project. Oh and to be clear, it's probably best to just keep the Sitecore folder as a sort of reference folder in your source control but not as part of your website trunk. We have it on the ignore list for website folder in source control. However, that being said, keep in mind that you will NEED to have it in your website folder.
Technically speaking, the recommended approach is to install Sitecore on to the server itself as a stand alone empty instance.. like using the installer with the client mode (not full) so that you get the framework for an empty site in place. Then you can create the deployment package/packages/whatever and it will all be your own code. You should really never have to mess with changing/removing the base Sitecore file system manually.
See above. Generally speaking, unless you have a reason to do so: install Sitecore as an empty instance... then manage your code/files via deployment and just leave the Sitecore folder files alone. You will have very little reason to ever touch them or the Sitecore folder itself outside of an upgrade.
Adding Sitecore itself to source control should be avoided, since you won't be deploying Sitecore as part of your implementation. For modifications to Sitecore itself, you would need a way of handling those inside your implementation, but the config patch system and other mechanisms provide the means for this.
Redundant files in the web site folder will only be a real problem in your development environment. When publishing to a demo environment or to a live environment, you will only publish the material that you actually want. And the deployment-based setup opens up the possibility of always starting from a clean Sitecore installation - as long as you include your Sitecore modifications as part of your implementation (which is not covered in the video). So there is little risk of this being a problem in real life, and the development method in the video makes eliminating this risk entirely possible.
The Sitecore installation should be handled outside of the deployment of your implementation.
It's a good setup, because the method in the video is the method Sitecore recommends for development, and it is also the method Sitecore teaches to developers in development courses. The most obvious advantages of this method are
Clean separation between your web site implementation and the Sitecore installation. There is no risk of accidentally mangling the Sitecore installation, and there is no risk of forgetting unmanaged manual modifications to Sitecore that are needed to run your site. This separation is hard to accomplish if you're not using the method in the video.
By using publishing to deploy your implementation, you know that your implementation is deployable on top of a clean Sitecore installation - and works. This means when deploying to a production or demo server in the future, things will work the same and there will be no surprises. This is very hard to be confident about if you're not using the method in the video.
To test your implementation on a different version of Sitecore, you can just deploy to a clean installation of a different version. This is very hard to test if you're not using the method in the video.
There is sample source code for the video on GitHub, along with instructions on how to set up the development environment, including the publishing parts. This sample source directly and indirectly answers some of your questions.

About Sitecore Backup

I am trying to backup a whole Sitecore website.
I know that the package designer can do part of the job, but not entirely.
Having a backup is always a good way when the site is broken accidently.
Is there a way or a tool to backup the whole Sitecore website?
I am new to the Sitecore, so any advise is welcome.
Thank you!
We've got a SQL job running to back-up the databases nightly.
Apart from that, when I deploy code and it's a small bit I usually end up backing up only the parts I'm going to replace. If it's a big code deploy I just back up the whole website (code-wise anyway) before deploying the code package.
Apart from that we also run scheduled backups of the code (although I don't know the intervals), and of course we've got source control if everything else fails.
If you've got an automated deployment tool you could also automate the above of course.
Before a major deploy of content or code, I typically backup the master database and zip everything in the website directory minus the App_Data and temp directories. That way if the deploy goes wrong, I can restore the code and database fairly quickly and be back to the previous state.
I have no knowledge of a tool that can do this for you, but there are a few ways you can handle this in an easy way:
1) you can create a database backup of the master database, but this only contains content and no files like media files that are saved on disk or your complete and build solution. It is always a good idea to schedule your database backup every night and save the backups for at least a week or more.
2) When you use the package designer, you can create dynamic pacakges that can contain all your content, media files and solution files on disk. This is an easy way to deploy the site onto a new Sitecore installation all at once, but it requires a manual backup every time.
3) Another way you can use is to serialize your entire content-tree to an xml-format on disk from the Developer tab. Once serialized, you can revert them back into the content tree.
I'd suggest thinking of this in two parts, the first part is backing up the application which is a simple as making sure your application is in some SCM system.
For that you can use Team Development for Sitecore. One of it's features allows you to connect a Visual Studio project to your Sitecore instance.
You can select Sitecore items that you want to be stored in your solution and it will serialize them and place them into your solution.
You can then check them into your SCM system and sleep easier.
The thing to note is deciding which item to place in source control, generally you can think of Sitecore items has developer owned and Content Editor owned. The items you will place in your solution are the items that are developer owned; templates, sublayouts, layouts, and content items that you need for the site to function are good examples.
This way if something goes bad a base restoration is quick and easy.
The second part is the backup of the content in Sitecore that has been added since your deployment. For that like Trayek said above use a SQL job to do the back-ups at whatever interval your are comfortable with.
If you're bored I have a post on using TDS (Team Development for Sitecore) you can check out at Working with Sitecore, Part Nine: TDS
Expanding bit more on what Trayek said, my suggestion would be to have a Continuous Integration (CI) and have automated deploy using Team City.
A good answer is also given here on Stack Overflow.
Basically in your case Teamcity would automatically
1. take back up of the current website (i.e. code) and deploy the new code on top of it.
2. Scripts can also be written to take a differential backup of the SQL databases, if need be.
Hope this helps.
Take a look at Sitecore Instance Manager module. Works really well for packaging entire Sitecore instance.

Business Process "Observer" application

My client is requesting to be notified any time one of their business processes fails for any reason. I had the idea of writing a seperate application that will run as an "observer" and check for various parts of the process.
An example would be that a daily file was generated and uploaded to an FTP location. The "Observer" might have the following "tests" :
Connect to the FTP
Go to folder where file should exist
Find file with naming convention
Verify create date of file
Failure of any step will send an alert email and also log to a report (both in case database is down OR email is down).
My question is.... Are there any products out there that do something close to this? I'd rather buy if there is something robust out there. If not, this almost seems like a unit test platform... Anything out there for testing I could potentially repurpose?
As an FYI, we are a Microsoft/Windows based shop.
Thx in advance!
You could even use a Continuous Integration framework for this. They normally monitor source code repositories and build&test things, but could be used for this as well.
For instance, Hudson, Jenkins and CrouseControl.NET are a few open source ones that are good and can easily be set up for something like this. Only change the monitoring of a repository to either filesystem over FTP and write a small script which checks what you need. Everything else comes for free by the framework, i.e. email, web interface for monitoring and running things.
Just an idea.

Is there an ideal way to move from Staging to Production for Coldfusion code?

I am trying to work out a good way to run a staging server and a production server for hosting multiple Coldfusion sites. Each site is essentially a fork of a repo, with site specific changes made to each. I am looking for a good way to have this staging server move code (upon QA approval) to the production server.
One fanciful idea involved compiling the sites each into EAR files to be run on the production server, but I cannot seem to wrap my head around Coldfusion archives, plus I cannot see any good way of automating this, especially the deployment part.
What I have done successfully before is use subversion as a go between for a site, where once a site is QA'd the code is committed and then the production server's working directory would have an SVN update run, which would then trigger a code copy from the working directory to the actual live code. This worked fine, but has many moving parts, and still required some form of server access to each server to run the commits and updates. Plus this worked for an individual site, I think it may be a nightmare to setup and maintain this architecture for multiple sites.
Ideally I would want a group of developers to have FTP access with the ability to log into some control panel to mark a site for QA, and then have a QA person check the site and mark it as stable/production worthy, and then have someone see that a site is pending and click a button to deploy the updated site. (Any of those roles could be filled by the same person mind you)
Sorry if that last part wasn't so much the question, just a framework to understand my current thought process.
Agree with #Nathan Strutz that Ant is a good tool for this purpose. Some more thoughts.
You want a repeatable build process that minimizes opportunities for deltas. With that in mind:
SVN export a build.
Tag the build in SVN.
Turn that export into a .zip, something with an installer, etc... idea being one unit to validate with a set of repeatable deployment steps.
Send the build to QA.
If QA approves deploy that build into production
Move whole code bases over as a build, rather than just changed files. This way you know what's put into place in production is the same thing that was validated. Refactor code so that configuration data is not overwritten by a new build.
As for actual production deployment, I have not come across a tool to solve the multiple servers, different code bases challenge. So I think you're best served rolling your own.
As an aside, in your situation I would think through an approach that allows for a standardized codebase, with a mechanism (i.e. an API) that allows for the customization you're describing. Otherwise managing each site as a "custom" project is very painful.
Update
Learning Ant: Ant in Action [book].
On Source Control: for the situation you describe, I would maintain a core code base and overlays per site. Export core, then site specific over it. This ensures any core updates that site specific changes don't override make it in.
Call this combination a "build". Do builds with Ant. Maintain an Ant script - or perhaps more flexibly an ant configuration file - per core & site combination. Track version number of core and site as part of a given build.
If your software is stuffed inside an installer (Nullsoft Install Shield for instance) that should be part of the build. Otherwise you should generate a .zip file (.ear is a possibility as well, but haven't seen anyone actually do this with CF). Point being one file that encompasses the whole build.
This build file is what QA should validate. So validation includes deployment, configuration and functionality testing. See my answer for deployment on how this can flow.
Deployment:
If you want to automate deployment QA should be involved as well to validate it. Meaning QA would deploy / install builds using the same process on their servers before doing a staing to production deployment.
To do this I would create something that tracks what server receives what build file and whatever credentials and connection information is necessary to make that happen. Most likely via FTP. Once transferred, the tool would then extract the build file / run the installer. This last piece is an area I would have to research as to how it's possible to let one server run commands such as extraction or installation remotely.
You should look into Ant as a migration tool. It allows you to package your build process with a simple XML file that you can run from the command line or from within Eclipse. Creating an automated build process is great because it documents the process as well as executes it the same way, every time.
Ant can handle zipping and unzipping, copying around, making backups if needed, working with your subversion repository, transferring via FTP, compressing javascript and even calling a web address if you need to do something like flush the application memory or server cache once it's installed. You may be surprised with the things you can do with Ant.
To get started, I would recommend the Ant manual as your main resource, but look into existing Ant builds as a good starting point to get you going. I have one on RIAForge for example that does some interesting stuff and calls a groovy script to do some more processing on my files during the build. If you search riaforge for build.xml files, you will come up with a great variety of them, many of which are directly for ColdFusion projects.

Are there any decent compiler web services?

Here's the idea: you commit your code to a repository and call a web service (or enter the request through a web app) to have it compiled. The results are then pushed up to a FTP server, S3 bucket, etc. Is there anything like this out there on the public internet?
TFS has a build queuing feature, but I'm thinking more along the lines of a internet (not intranet) web service. And if it can pull from known source control interfaces (Subversion, CVS, etc.) then the caller needs to pass very little besides selecting a compiler and specific compilation options.
My reasoning is more along the lines of removing a lot of software installation and configuration hassles, especially when working between different languages/platforms/frameworks/projects
See Is there build farm for checking open source apps against different OS'es? for related information.