How to configure a Sitecore processing server?

I just installed Sitecore Experience Platform and configured it according to the Sitecore scaling recommendations for processing servers.
But I want to know the following things:
1. How can I use the Sitecore processing server?
2. How can I check whether the processing server is working correctly?
3. How is collection DB data processed and sent to the reporting server?

The processing server is a piece of the whole analytics (xDB) part of the Sitecore solution. More info can be found here.
Snippet:
"The processing and aggregation component extracts information from
captured, raw analytics data and transforms it into a form suitable
for use in reporting applications. It also performs specific tasks on
the collection database that involve mass updates.
You implement processing and aggregation on a Sitecore application
server connected to both the collection and reporting databases. A
processing server can run independently on a dedicated server, or on
the same server together with other Sitecore components. By
implementing multiple processing or aggregation servers, it is
possible to achieve higher performance on high-traffic solutions."
In short: the processing server aggregates the data in MongoDB and processes it into the reporting database. It can be put on a separate server in order to spare resources on your other servers. I'm not quite sure what it all does behind the scenes or how to check exactly and only that part of the process, but you can check the reporting tools in the Sitecore backend, such as Experience Analytics. If those are working, you are probably fine. Also check the logs on the processing server; they will give you an indication of what it is doing and whether any errors occur.

Related

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly to my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean that I just have a function in my code that queries my DynamoDB database and returns the result.
The way I do it works, but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly scalable service, and standing up another server in front of it means scaling that server too, in line with the RCU/WCU configured for your tables, which you can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about:
Using the AWS DynamoDB SDK for iOS to write the client application that runs on the mobile device.
Using the AWS Token Vending Machine to authenticate your mobile users and grant them temporary credentials for operations on DynamoDB tables.
Controlling access (i.e. which operations are allowed on which tables) using IAM policies (this flow is sketched below).
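A minimal sketch of that flow, using Python/boto3 purely for illustration (a real client would use the AWS SDK for iOS in Swift or Objective-C); the table name, key, and credential values are made up:

    import boto3
    from boto3.dynamodb.conditions import Key

    # Temporary credentials handed out by the token vending machine /
    # Cognito; the device never holds long-lived AWS keys.
    session = boto3.session.Session(
        aws_access_key_id="ASIA...",      # placeholder
        aws_secret_access_key="...",      # placeholder
        aws_session_token="...",          # placeholder
        region_name="us-east-1",
    )

    table = session.resource("dynamodb").Table("Orders")  # made-up table

    # The IAM policy attached to these credentials should allow only
    # Query/GetItem on this table, ideally scoped to the user's own rows.
    response = table.query(KeyConditionExpression=Key("userId").eq("user-123"))
    for item in response["Items"]:
        print(item)

The point is that the device talks to DynamoDB directly with short-lived, narrowly scoped credentials instead of going through a server you would then have to scale yourself.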
HTH.
From what you say, I can guess that you are talking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called Shared Database. It is essentially about multiple clients using a common database to share data. The main drawback of that pattern (in your case) is that you are making assumptions about what the database schema looks like. It can potentially bring you some headaches supporting the schema in the future if your business logic changes.
The more advanced approach would be to send events on every change in your data instead of writing changes to the database directly from the client apps. This way you can add additional processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation procedure that transforms both types of events into the format that fits the database schema. It's basically a question of whether to work with diffs vs. snapshots.
You should be aware of the added complexity of working with events; it can be overkill if your app is simple and changes to the schema are unlikely.
Also consider that you can do the data preprocessing using DynamoDB Streams, which gives you some of the advantages of events while keeping the implementation simple.
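For example, a hypothetical AWS Lambda function attached to a DynamoDB Stream could perform the translation step described above; the destination table and the flattening logic are assumptions for illustration:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    target = dynamodb.Table("ProcessedData")  # made-up destination table

    def handler(event, context):
        # Standard DynamoDB Streams event shape: one entry per change.
        for record in event["Records"]:
            if record["eventName"] not in ("INSERT", "MODIFY"):
                continue
            new_image = record["dynamodb"]["NewImage"]
            # Flatten DynamoDB's typed format, e.g. {"S": "abc"} -> "abc",
            # translating the event before it reaches the final table.
            item = {k: list(v.values())[0] for k, v in new_image.items()}
            target.put_item(Item=item)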

Should I implement revisioning using database triggers or using django-reversion?

We're looking into implementing audit logs in our application and we're not sure how to do it correctly.
I know that django-reversion works and works well but there's a cost of using it.
The web server will have to make two round trips to the database when saving a record, even if the save is in the same transaction, because (at least in Postgres) the changes are written to the database and committing the transaction makes the changes visible.
So this will block the web server until the revision is saved to the database if we're not using async I/O, which is currently the case. Even if we were using async I/O, generating the revision's data takes CPU time, which again blocks the web server from handling other requests.
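For reference, a minimal sketch of the save path being described, using the standard django-reversion API (register, create_revision); the Invoice model and helper function are made up:

    import reversion
    from django.db import models, transaction

    @reversion.register()               # only registered models are versioned
    class Invoice(models.Model):
        number = models.CharField(max_length=20)

    def save_invoice(invoice, user):
        # Both writes share one transaction, but they are still two round
        # trips: the row itself, then the serialized Version record created
        # when the create_revision() block exits.
        with transaction.atomic(), reversion.create_revision():
            invoice.save()
            reversion.set_user(user)
            reversion.set_comment("Edited via web form")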
We could use database triggers instead, but our DBA claims that offloading this sort of work to the database will use resources that are meant for handling more transactions.
Is using database triggers for this sort of work a bad idea?
We can scale both the web servers using a load balancer and the database using read/write replicas.
Are there any tradeoffs we're missing here?
What would help us decide?
You need to think about the pattern of database usage in your website.
It may be unique to you; however, most web apps read from the database much more often than they write to it. In fact, it's fairly common to see optimisations, done to help scale a web app, that trade off more complicated 'save' operations for faster reads. An example is denormalisation, where some data from related records is copied to the parent record on each save so as to avoid repeatedly doing complicated aggregate/join queries.
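A minimal Django sketch of that denormalisation idea, with made-up models:

    from django.db import models

    class Post(models.Model):
        title = models.CharField(max_length=200)
        comment_count = models.IntegerField(default=0)  # denormalised field

    class Comment(models.Model):
        post = models.ForeignKey(Post, on_delete=models.CASCADE)
        body = models.TextField()

        def save(self, *args, **kwargs):
            super().save(*args, **kwargs)
            # Extra work on save buys a cheap read everywhere else: list
            # pages read comment_count instead of running a COUNT(*) join.
            self.post.comment_count = self.post.comment_set.count()
            self.post.save(update_fields=["comment_count"])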
This is just an example, but unless you know your specific situation is different I'd say don't worry about doing a bit of extra work on save.
One caveat would be to consider excluding some models from the revisioning system. For example if you are using Django db-backed sessions, the session records are saved on every request. You'd want to avoid doing unnecessary work there.
As for doing it via triggers vs Django app... I think the main considerations here are not to do with performance:
The Django app solution is more 'obvious' and 'maintainable': the app will be in your pip requirements file and in Django's INSTALLED_APPS, so it's obvious to other developers that it's there and working, and it doesn't need someone to remember to run custom SQL on the db server when you move to a new server.
With a db trigger solution you can be certain it will run whenever a record is changed by any means, whereas with a Django app, anyone changing records via a psql console will bypass it. Even in the Django ORM, certain bulk operations (e.g. queryset.update()) bypass the model save method and save signals. Sometimes this is desirable, however. A sketch of the trigger approach follows this list.
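Here is that sketch: a rough outline of the trigger approach, assuming Postgres and made-up table names, wrapped in a Django migration so the SQL lives in version control and runs on deploy instead of being applied by hand:

    from django.db import migrations

    AUDIT_SQL = """
    CREATE TABLE IF NOT EXISTS invoice_audit (
        id         bigserial PRIMARY KEY,
        operation  text        NOT NULL,
        row_data   jsonb       NOT NULL,
        changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION audit_invoice() RETURNS trigger AS $$
    BEGIN
        INSERT INTO invoice_audit (operation, row_data)
        VALUES (TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER invoice_audit_trg
    AFTER INSERT OR UPDATE ON myapp_invoice
    FOR EACH ROW EXECUTE PROCEDURE audit_invoice();
    """

    class Migration(migrations.Migration):
        dependencies = [("myapp", "0001_initial")]
        operations = [migrations.RunSQL(AUDIT_SQL)]

Because it runs inside the database, this catches psql edits and ORM bulk operations alike.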
Another thing I'd point out is that your production webserver will be multi-process/multi-threaded... so although a lengthy db write will block the webserver, it only blocks the current process. Your webserver will have other processes which are able to serve other requests concurrently, so it won't block the whole webserver.
So again, unless you have a pattern of usage where you anticipate a high frequency of concurrent writes to the db, I'd say probably don't worry about it.

Sitecore performance enhancements

We need our Sitecore web application to process 60-80 web requests per second. We are using Sitecore 7.0. We have tried a 1 web server + 1 database server deployment, but it only processes 20-25 requests per second; the web server queues up all the other requests in memory, and as we increase the load, the memory fills up. (We have applied all the recommended Sitecore performance enhancements.) We need 4x the performance to reach the goal :).
Will it be possible to achieve this goal by upgrading the existing server, or do we have to add more web servers to the production environment?
Note: We are using Lucene indexing as well.
Here are some things you can consider without changing the overall architecture of your deployment:
CDN to offload media and static asset requests
This leaves your content delivery server available to handle important content queries and display logic.
Example: www.cloudflare.com
Configure and use Sitecore's built-in caching
This is from the guide:
Investigation and configuration of the Sitecore Caches is broken down into multiple tasks. This way each task is more focused and simplified. The focus is on configuration and tuning of the Sitecore Database Caches (prefetch, data, and item caches).
For configuration of the output rendering caching properties, the customer should be made aware of both the Sitecore Cache Configuration Reference and the Sitecore Presentation Component Reference as to how to properly enable these caches and the properties used to expire them.
Check out the Sitecore Tuning Guide
Find Slow Queries or Controls
It sounds like your application follows Sitecore best practices, but I leave this note in for anyone that might find this answer. Use Sitecore's built-in Debug mode to identify the slowest running controls and sublayouts. Additionally, if you have Analytics set up there is a "Slow Pages" report that might give you some information on where your application is slowing down.
Those things being said, if you're prepared to provision additional servers and set up a load-balanced environment then read on.
Separate Content Delivery and Content Management
To me the first logical step before load-balancing content delivery servers is to separate the content management from the equation. This is pretty easy and the Scaling Guide walks you through getting the HistoryEngine set up to keep those Lucene indexes up to date.
Set up Load Balancer with 2 or more Content Delivery servers
Once you've done the first step, this can be as easy as cloning your content delivery server and adding it to your load balancer "pool". There are a couple of things to consider here, such as: Does your web application allow users to log in? If so, you'll need to worry about sticky sessions or machine keys. Does your web application use file media instead of blob media? I haven't had to deal with this, but I understand that's another consideration.
Scale your SQL solution
I've seen applications with up to four load balanced content delivery servers and the SQL Server did not have a problem - I think this will be unique to each case depending on a lot of factors: horsepower and tuning of SQL Server, content model of your application, complexity of your queries, caching configuration on content delivery servers, etc. Again, the Scaling Guide covers SQL Mirroring and Failover, so that is going to be your first stop on getting that going.
Finally, I would say contact Sitecore. These guys have probably seen more of what's gone right and what's gone wrong with installations and could get you on the right path. Good luck!
This answer is written from a Sitecore developer's perspective:
Bottom line: You need to figure out exactly where your performance bottleneck is. That is going to take some digging, but will be very worthwhile. You should definitely be able to serve 60-80 requests/s without any trouble... but of course that makes a lot of assumptions about the nature of your site and the requests.
For my site, I found Sitecore's caching implementation to be sub-par... I created some very simple and aggressive application-specific caches in my app and this made all the difference in the world. For instance, we have 900+ "Partner" items where our sites' advertisements live... and simply putting all these objects in an array in the Application object sped up page requests significantly. Finding an object in a Hashtable indexed by its Item.Name or ID is going to be a lot faster than Sitecore.Context.Database.GetItem("/itempath") or a SelectItems() call (at least, that's my experience). If your architecture and data set will allow this strategy, we've had good experience with it.
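The real code here would be C# against the Sitecore API, but the pattern itself is language-agnostic; here is a sketch in Python of the same idea (fetch_all and the item fields are stand-ins for the expensive Sitecore calls):

    class PartnerCache:
        def __init__(self, fetch_all):
            # fetch_all() stands in for the one expensive call, e.g. a
            # single SelectItems() over all 900+ Partner items at startup.
            self._by_id = {}
            self._by_name = {}
            for item in fetch_all():
                self._by_id[item["id"]] = item
                self._by_name[item["name"]] = item

        def get_by_id(self, item_id):
            # O(1) dictionary lookup instead of a per-request item query.
            return self._by_id.get(item_id)

        def get_by_name(self, name):
            return self._by_name.get(name)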
Another thing to watch out for is XSLT renderings. Personally, I avoid them completely in favor of ASP.NET UserControls. The XSLT rendering is just slow. As much as 10x slower than a native UserControl rendering the same HTML. So if you have a few of these... replace with some custom code and you'll see a world of difference.

Are web services processed sequentially or in parallel?

I am just getting started with web services in Lotus Notes. What I would like to be able to do is create a web service that generates a sequential number. The code to generate the number is based on existing code we have used for some time within our databases (just straight LotusScript, no web services). Basically, there is a document that stores the next number; the next number is returned and updated for the next call, save conflicts are detected, and the number is tried again if there was an issue saving it.
I thought I might use a web service to generate the number. So are web services processed sequentially or in parallel? Because if they are sequential, then I won't need to deal with two people trying to save the number at the same time.
Web services are a way for two systems to communicate with each other where they would not otherwise have a common language.
For example, a LotusScript agent connecting to a .NET server.
When creating a web service provider (server) on Domino you can code it in LotusScript or Java. The server then provides a WSDL file for the consumer (client) to write the code required to talk to that web service.
This tutorial should explain it better for you:
http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Creating_your_first_Web_Service_provider_and_consumer_in_LotusScript_and_Java.
Now as for Domino: web services run in the order they are requested from the server. However, there is no control to say "Don't start until web service X has finished."
You could also code this into an application, but you run a serious risk of deadlocks and memory/performance issues for other users unless you account for that.
The Domino server can also be set not to run web services/agents in parallel. But again, you risk the same issues.
If it is a unique ID then you could go by the UNID of the document you create from the web service. Or you can use @Unique via an Evaluate, but both only return text.
http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/topic/com.ibm.designer.domino.main.doc/H_UNIQUE.html
From the Lotus Designer Documentation:
To enable concurrent Web services on a server, you must enable concurrent Web Agents on that server. Open the Server document you want to edit. Click the Internet Protocols - Domino Web Engine tab. Enable Run Web Agents concurrently.
The maximum number of concurrent Web service calls is determined by the "Max concurrent agents" setting. From the Lotus Administration Documentation:
Max concurrent agents: Specifies the number of agents allowed to run concurrently. Valid values are 1 through 10. Default values are 1 for daytime and 2 for nighttime. Enabling a higher number of concurrent agents can relieve a heavily loaded Agent Manager, but also reduces the resources available to run other server tasks.
Lotus Notes Domino Version 8.5.x
Yes, web services will run in parallel. But since you wrote that your code deals with save conflicts, you should NOT have a problem.
It is the same as standard Notes calls by two users: the first gets the document, then the second gets the document and saves first (the speedy one), and then the first will get a save conflict.
In conclusion: yes, it's parallel, BUT it's not a problem.
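A language-agnostic sketch of that save-conflict/retry pattern (the real code would be LotusScript against a Notes document; the versioned store below is a toy stand-in):

    import random
    import time

    class ConflictError(Exception):
        pass

    class CounterStore:
        """Toy versioned store standing in for the Notes document."""
        def __init__(self):
            self.value, self.version = 0, 0

        def read(self):
            return self.value, self.version

        def write(self, value, expected_version):
            if expected_version != self.version:  # someone else saved first
                raise ConflictError()
            self.value, self.version = value, self.version + 1

    def next_number(store, retries=10):
        for attempt in range(retries):
            current, version = store.read()
            try:
                store.write(current + 1, version)  # fails on a save conflict
                return current + 1
            except ConflictError:
                time.sleep(random.uniform(0, 0.01 * (attempt + 1)))  # back off
        raise RuntimeError("could not allocate a number")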
I would have thought that they would run sequentially by default, since asynchronous web agents are off unless you switch them on. So although it's a good design pattern to allocate numbers 'safely' and sequentially, if you only allocate a number via the web service and you haven't changed the asynchronous setting, then you'll be fine.
Let me also add:
Employ document locking to ensure number uniqueness in a sequential document numbering solution.
There is a simple solution that avoids synchronicity considerations.
You should generate a temporary number using @Unique, then use a scheduled agent to assign sequential numbers in order of document creation, selecting only unprocessed documents using a properly constituted view. If you're not concerned about the order in which documents were created and only concerned that all numbers are unique, a view is not necessary, and you can just trigger the agent on unprocessed documents.
The temporary number can be used for reference temporarily until a proper sequential number is assigned.
When the scheduled agent runs, it should send the authors a confirmation with the correct reference number.
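A rough sketch of that batch-numbering idea (the real implementation would be a scheduled LotusScript agent; the field names are invented):

    def assign_sequence_numbers(unprocessed_docs, next_number):
        # unprocessed_docs: documents that only have a temporary @Unique
        # reference so far, e.g. selected by a view; next_number: one past
        # the last number assigned by the previous run.
        for doc in sorted(unprocessed_docs, key=lambda d: d["created"]):
            doc["seq_no"] = next_number
            doc["processed"] = True
            next_number += 1
        return next_number  # persist for the next scheduled run

Since only the single scheduled job ever writes the numbers, there is no write contention at request time.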
Or, you could export to DXL and get the sequence= attribute of the <noteinfo> tag. This only works if you're accessing a single instance of the database, though. And the DXL export/XML import is a huge amount of overhead.
Unfortunately, I can't see a way to easily get the sequence number of the note from LotusScript NotesDocument. If you have an active support contract, you could open a Problem Management Report for a software enhancement request ("APAR", in IBM's parlance, though I do not know what its acronym expands to).
Good luck!

Bare minimum for a Sitecore content delivery set-up

We currently have a single installation multi-site setup, hosted in Europe, and are looking to move content delivery for a single site to China. This is partly for SEO purposes and partly to improve content delivery performance there. Content management performance isn't an issue.
Given that we'll be having to transfer data between two separate hosting companies we'd like to limit both how much gets sent, and if possible not send any data we wouldn't be happy to publish.
We have Sitecore analytics enabled, so this might be a complicating factor.
I've read the scaling guide, which suggests we'll need a minimum of both web and core databases in the new CD environment. They do suggest that if there is no extranet security configured it is possible to do without the core database in a pure CD environment.
Does anyone have any experience with this? What are the benefits/pitfalls? What is the bare minimum installation we can get away with?
Edit: Sitecore.NET 6.4.1 (rev. 111003)
Like divamatrix said, knowing the version number is essential.
But even though the older versions can run without the Core, I would stick to an installation that includes the Core so you will have less trouble upgrading in the future.
What you need on the Content Delivery side is:
Web database
Core database
Analytics database
Then on the Content Management side you need your usual:
Master database
Web database
Core database
Analytics database
Then set up SQL replication between the Core databases.
Analytics can be configured to run reports using data from CD and store them on CM.
You also need to set up Web Deployment for file replication between the instances.
Besides all this you need some extra configuration as is explained in the Scaling Guide.
If you are not using Sitecore 6.4 or higher, I would recommend upgrading first. Once you get this set up properly, it will work like a charm!
To answer your question: older versions of Sitecore worked without the Core database. You didn't say which version of Sitecore you're using, but if it's anything current, the answer is that you need a web database and a core database. Also, having analytics enabled is definitely a consideration you need to look at. You should probably look at setting up an analytics database local to your CD hosting, as this database can see a lot of traffic depending on the traffic of your site. You can either have publishing set up to publish to a local web database and then replicate, or you can just let publishing handle the transfer of data between your CM and CD environments.