Sitecore web database indexing lags behind

We are using Sitecore 8 with SolrCloud, and one of the things we have noticed is that any time several hundred articles are published, the web database indexing lags behind. There never seems to be an issue with the master database indexes.
For the web database indexes, the update strategy being used is onPublishEndAsync, and for the master database indexes it is syncMaster.
Are there any configuration changes that can be made to speed up the web indexing? There aren't really that many items that Sitecore needs to index, so it shouldn't get this backed up.
We have taken a few steps to rectify the web database situation:
1. Set all CD indexing to manual.
2. Turned off any unnecessary indexes.
Any help is appreciated.
Thanks

Related

Django with Heroku-PostgreSQL database horizontal scaling

The problem background
I am working on a Django project that uses PostgreSQL and is hosted on Heroku (using heroku-postgres). Over time, the amount of data has become very large, and that slows down the application.
I tried replication in order to read from multiple databases; that helped reduce queueing and connection times, but the big table is still the problem.
I can split the data based on groups of users; different groups do not need to interact with each other.
I have read up on sharding, but since we use heroku-postgres it's hard to customize the sharding.
So I have come up with the two different ideas below.
1. The app cluster with multi-db (not allowed to embed images yet)
Please see the design here.
We can use middleware and database routers to achieve this (a sketch of such a router is included after the question below), but I'm not sure how friendly this is with Django.
2. The gateway app with multiple sub-apps (not allowed to embed images yet)
Please see the design here.
This requires less effort than the previous design, and it would also be possible to set up region-based sub-apps in the future.
My question is: which of the two is more django-friendly and better for scalability in the long run?
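To make idea 1 more concrete, here is a rough sketch of how the database-router approach could be wired up. The shard aliases, the group-to-shard mapping, and the group_name attribute on the models are placeholders for illustration, not code from the actual project:

# settings.py (sketch, assuming two shard databases plus the default)
DATABASES = {
    "default": {"ENGINE": "django.db.backends.postgresql", "NAME": "app_main"},
    "shard_a": {"ENGINE": "django.db.backends.postgresql", "NAME": "app_shard_a"},
    "shard_b": {"ENGINE": "django.db.backends.postgresql", "NAME": "app_shard_b"},
}
DATABASE_ROUTERS = ["myproject.routers.GroupShardRouter"]

# myproject/routers.py (sketch)
SHARD_BY_GROUP = {"group_a": "shard_a", "group_b": "shard_b"}  # hypothetical mapping

class GroupShardRouter:
    """Send reads and writes for group-scoped instances to that group's database."""

    def _shard_for(self, hints):
        instance = hints.get("instance")
        group = getattr(instance, "group_name", None)  # hypothetical model field
        return SHARD_BY_GROUP.get(group)  # None means "no opinion" -> falls back to default

    def db_for_read(self, model, **hints):
        return self._shard_for(hints)

    def db_for_write(self, model, **hints):
        return self._shard_for(hints)

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return True  # run migrations on every database in this sketch

Per-request selection (for example, from middleware that looks at the logged-in user's group) can also be done explicitly with SomeModel.objects.using(shard_alias).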

Sitecore Database Cleanup Fails

Sitecore 6.6
I'm speaking with Sitecore Support about this as well, but thought I'd reach out to the community too.
We have a custom agent that syncs media on the file system with the media library. It's a new agent, and we made the mistake of not monitoring the database size. It should be importing about 8 GB of data, but the database ballooned to 713 GB in a pretty short amount of time. It turns out the "Blobs" table in both the "master" and "web" databases is holding pretty much all of this space.
I attempted to use the "Clean Up Databases" tool from the Control Panel. I only selected one of the databases. This ran for 6 hours before it bombed due to consuming all the available locks on the SQL Server:
Exception: System.Data.SqlClient.SqlException
Message: The instance of the SQL Server Database Engine cannot obtain a LOCK
resource at this time. Rerun your statement when there are fewer active users.
Ask the database administrator to check the lock and memory configuration for
this instance, or to check for long-running transactions.
It then rolled everything back. Note: I increased the SQL and DataProvider timeouts to infinity.
Anyone else deal with something like this? It would be good if I could 'clean up' the databases in smaller chunks to avoid overwhelming the SQL Server.
Thanks!
Thanks for the responses, guys.
I also spoke with support and they were able to provide a SQL script that will clean the Blobs table:
-- Collect the blob IDs that are still referenced from the Fields table
DECLARE @UsableBlobs table(
ID uniqueidentifier
);
INSERT INTO @UsableBlobs
select convert(uniqueidentifier,[Value]) as EmpID from [Fields]
where [Value] != ''
and (FieldId='{40E50ED9-BA07-4702-992E-A912738D32DC}' or FieldId='{DBBE7D99-1388-4357-BB34-AD71EDF18ED3}')
-- Delete unreferenced blobs in batches of 1,000 rows
delete top (1000) from [Blobs]
where [BlobId] not in (select * from @UsableBlobs)
The only change I made to the script was to add the "top (1000)" so that it deleted in smaller chunks. I eventually upped that number to 200,000 and it would run for about an hour at a time.
Regarding cause, we're not quite sure yet. We believe our custom agent was running too frequently, causing the inserts to stack on top of each other.
Also note that there was a Sitecore update that apparently addressed a problem with the Blobs table getting out of control. The update was 6.6, Update 3.
I faced a problem like this previously, and we contacted Sitecore Support.
They gave us a Sitecore Support DLL and suggested a Web.config change for the DataProvider -- from the main type="Sitecore.Data.$(database).$(database)DataProvider, Sitecore.Kernel" to the new one.
The reason I am posting on this question of yours is that most of the time taken for us was in cleaning up blobs, and they gave us this DLL to speed up the blob cleanup, so I think it might help you too.
Hence, I would suggest contacting Sitecore Support about this case; I am sure they can give you the best solution.
Hope this helps you!
Regards,
Varun Shringarpure
If you have a staging environment, I would recommend taking a copy of the database and trying to shrink it. Part of the database size might also be related to the transaction log.
If you have a DBA, please get him or her involved.

Sitecore performance enhancements

We need our Sitecore web application to process 60-80 web requests per second. We are using Sitecore 7.0. We have tried a 1 web server + 1 database server deployment, but it only processes 20-25 requests per second; the web server queues up all the other requests in memory, and as we increase the load, memory fills up. (We applied all the recommended Sitecore performance enhancements.) We need 4x the performance to reach the goal :).
Will it be possible to achieve this goal by upgrading the existing server, or do we have to add more web servers to the production environment?
Note: We are using Lucene indexing as well.
Here are some things you can consider without changing the overall architecture of your deployment:
CDN to offload media and static asset requests
This leaves your content delivery server available to handle important content queries and display logic.
Example: www.cloudflare.com
Configure and use Sitecore's built-in caching
This is from the guide:
Investigation and configuration of the Sitecore Caches is broken down into multiple tasks. This way each task is more focused and simplified. The focus is on configuration and tuning of the Sitecore Database Caches (prefetch, data, and item caches). For configuration of the output rendering caching properties, the customer should be made aware of both the Sitecore Cache Configuration Reference and the Sitecore Presentation Component Reference as to how to properly enable these caches and the properties used to expire them.
Check out the Sitecore Tuning Guide
Find Slow Queries or Controls
It sounds like your application follows Sitecore best practices, but I leave this note in for anyone that might find this answer. Use Sitecore's built-in Debug mode to identify the slowest running controls and sublayouts. Additionally, if you have Analytics set up there is a "Slow Pages" report that might give you some information on where your application is slowing down.
Those things being said, if you're prepared to provision additional servers and set up a load-balanced environment then read on.
Separate Content Delivery and Content Management
To me the first logical step before load-balancing content delivery servers is to separate the content management from the equation. This is pretty easy and the Scaling Guide walks you through getting the HistoryEngine set up to keep those Lucene indexes up to date.
Set up Load Balancer with 2 or more Content Delivery servers
Once you've done the first step, this can be as easy as cloning your content delivery server and adding it to your load balancer "pool". There are a couple of things to consider here: Does your web application allow users to log in? If so, you'll need to worry about sticky sessions or machine keys. Does your web application use file media instead of blob media? I haven't had to deal with this, but I understand that's another consideration.
Scale your SQL solution
I've seen applications with up to four load balanced content delivery servers and the SQL Server did not have a problem - I think this will be unique to each case depending on a lot of factors: horsepower and tuning of SQL Server, content model of your application, complexity of your queries, caching configuration on content delivery servers, etc. Again, the Scaling Guide covers SQL Mirroring and Failover, so that is going to be your first stop on getting that going.
Finally, I would say contact Sitecore. These guys have probably seen more of what's gone right and what's gone wrong with installations and could get you on the right path. Good luck!
This answer written from a Sitecore developer perspective:
Bottom line: You need to figure out exactly where your performance bottleneck is. That is going to take some digging, but will be very worthwhile. You should definitely be able to serve 60-80 requests/s without any trouble... but of course that makes a lot of assumptions about the nature of your site and the requests.
For my site, I found Sitecore's caching implementation to be sub-par... I created some very simple and aggressive application-specific caches in my app and this made all the difference in the world. For instance, we have 900+ "Partner" items where our sites' advertisements live... and simply putting all these objects in an array in the Application object sped up page requests significantly. Finding an object in a Hashtable indexed by its Item.Name or ID is going to be a lot faster than Sitecore.Context.Database.GetItem("/itempath") or a SelectItems() call (at least, that's my experience). If your architecture and data set will allow this strategy, we've had good experience with it.
Another thing to watch out for is XSLT renderings. Personally, I avoid them completely in favor of ASP.NET UserControls. XSLT renderings are just slow, as much as 10x slower than a native UserControl rendering the same HTML. So if you have a few of these, replace them with some custom code and you'll see a world of difference.

Bare minimum for a Sitecore content delivery set-up

We currently have a single installation multi-site setup, hosted in Europe, and are looking to move content delivery for a single site to China. This is partly for SEO purposes and partly to improve content delivery performance there. Content management performance isn't an issue.
Given that we'll have to transfer data between two separate hosting companies, we'd like to limit how much gets sent and, if possible, not send any data we wouldn't be happy to publish.
We have Sitecore analytics enabled, so this might be a complicating factor.
I've read the scaling guide, which suggests we'll need a minimum of both web and core databases in the new CD environment. They do suggest that if there is no extranet security configured it is possible to do without the core database in a pure CD environment.
Does anyone have any experience with this? What are the benefits/pitfalls? What is the bare minimum installation we can get away with?
Edit: Sitecore.NET 6.4.1 (rev. 111003)
Like divamatrix said, knowing the version number is essential.
But even though the older versions can run without the Core, I would stick to an installation that includes the Core so you will have less trouble upgrading in the future.
What you need on the Content Delivery side is:
Web database
Core database
Analytics database
Then on the Content Management side you need your usual:
Master database
Web database
Core database
Analytics database
Then set up SQL replication between the Core databases.
Analytics can be configured to run reports using data from CD and store them on CM.
You also need to set up Web Deployment for file replication between the instances.
Besides all this, you need some extra configuration, as explained in the Scaling Guide.
If you are not using Sitecore 6.4 or higher, I would recommend upgrading first. Once you have this set up properly, it will work like a charm!
To answer your question, older versions of Sitecore worked without the Core database. You didn't say which version of Sitecore you're using, but if it's anything current, the answer is going to be that you need a web database and a core database. Also, having analytics enabled is definitely a consideration you need to look at. You should probably look at setting up your analytics database local to your CD hosting, as this database can see a lot of traffic depending on the traffic of your site. You can have publishing set up to either publish to a local web database and then replicate, or you can just let publishing handle the transfer of data between your CM and CD environments.

Django redundancy and replication over two VPS accounts

I'm slowly getting into the position where one of my Django sites needs some robustness behind it. I'm currently running on a single VPS with a SQLite database and memcached. It's about as un-scaled as things can get.
If I bought another VPS account, what would I want to do?
Move to MySQL/PostgreSQL with replication? What's easiest? Does replication protect me from one server exploding? Are there concurrency downsides?
How do I load-balance between the two servers?
I'd put memcached on the new server too. If I put both IPs into the configuration, would that keep a copy of data on both servers? (I'm thinking of what happens to session data - currently stored in memcached)
I'm currently using Cherokee as the httpd - I'm sure this has its own set of issues. If you've any tips, let me know.
Am I going at this the wrong way? Is there an easier way to have faster, more robust django sites?
First step: switch from SQLite to a real production database (I like Postgres). This should happen long before you even think about a second VPS. SQLite essentially does not support concurrency at all. Personally, I wouldn't even consider deploying a live site on SQLite in the first place.
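For reference, a minimal sketch of the settings change this implies; the database name, user, and password are placeholders, and the HOST entry is the same knob you would later point at a dedicated database server:

# settings.py (sketch; credentials and addresses are placeholders)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mysite",          # placeholder database name
        "USER": "mysite_user",     # placeholder role
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",       # later: the address of a separate DB server
        "PORT": "5432",
    }
}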
If your site is running on SQLite and is functioning, my guess is you are still quite a long way from actually outgrowing your single VPS (unless it's already heavily loaded otherwise).
If/when you do need to add a second server, how you configure things depends on where you're actually seeing a bottleneck. Chances are it'll be the database, in which case a good step might be simply moving the database onto its own server (presuming you can guarantee low latency between the two VPSes) and loading the database server with as much RAM as you can afford. In general disk performance suffers most in a VPS, so another step to consider might be putting the DB onto raw metal.
I'd probably look at those steps before I'd think about DB replication or multiple web-tier servers, but it really depends on profiling your actual case (and how you value performance vs reliability).
Watching the Django Deployment Workshop by Jacob Kaplan-Moss should give you a good overview.
MySQL supports master-slave and master-master setups; I don't use PostgreSQL.
You can use nginx as your load balancer; HAProxy is an option too (SO uses it).
Memcached distributes the objects over the servers; if one server crashes, the data on it is lost (see the settings sketch below).
I don't know Cherokee, but nginx is great.
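On the memcached question above: listing both IPs does not keep a copy on each server; Django's memcached backend hashes each key to exactly one of the listed servers. A minimal sketch of the settings (Django 1.3+ style, with placeholder addresses):

# settings.py (sketch; addresses are placeholders)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        # Keys are hashed across these servers: each object lives on one of them,
        # so losing a server loses that server's share of the cache.
        "LOCATION": [
            "10.0.0.1:11211",  # existing VPS (placeholder)
            "10.0.0.2:11211",  # new VPS (placeholder)
        ],
    }
}

# If session data lives in the cache, the same non-replicated behaviour applies:
SESSION_ENGINE = "django.contrib.sessions.backends.cache"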