Sitecore EventQueue Table growing out of control

We are having an issue with the EventQueue table growing very fast at times, up to 3k records a second, and never clearing records (30 million as of right now). Our environment has the following set up:
Sitecore 7.2
4 CD servers and 1 CM server
All four CD servers are load balanced.
CD1 and CD2 are pointed to the DB1 server; CD3 and CD4 are pointed to the DB2 server
There are 2 publishing targets (one for each DB)
Merge replication is set up for the Core db across all servers (CM, CDs)
EventQueue is enabled
I have a few questions so I will break them down into separate line items.
When a publish is issued for all CD servers, is the updated content sent directly from the CM db to the CD dbs (all of the correct tables), or is it sent to the EventQueue table in the CD db, with the CD server running a job/task that looks at the table and updates as needed?
Depending on the answer to the first question: if there are 2 CD servers pointing to the same DB, how do they know whether they should process the EventQueue table (won't they each process the table and duplicate effort)?
Why isn't the EventQueue table cleared? How is it cleared, and when is it cleared?

On CM publish, the publish request is sent to the EventQueue table on the CD db where it is processed as per the instance's publishing schedule.
The InstanceName column in the EventQueue table stores the unique name of each Sitecore instance (by default this is Machine Name + IIS Instance Name, but can be set in web.config). This enables events to be picked up by an individual CD instance in a load balanced environment.
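To see which instances and event types account for the bulk of the growth, a quick grouping query against the EventQueue table can help. This is just a sketch assuming the default InstanceName, EventType, and Created columns:
-- Which instances/event types dominate the EventQueue, with latest activity
-- (assumes the default InstanceName / EventType / Created columns).
SELECT InstanceName, EventType, COUNT(*) AS Events, MAX(Created) AS LastEvent
FROM EventQueue
GROUP BY InstanceName, EventType
ORDER BY Events DESC;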
The EventQueue table is cleared by a Sitecore task defined in the <scheduling> element in the web.config, although I've seen this misbehave in the past. By default, it is set as follows:
<agent type="Sitecore.Tasks.CleanupEventQueue, Sitecore.Kernel" method="Run" interval="04:00:00">
<DaysToKeep>1</DaysToKeep>
</agent>
I've previously run into high loads on the EventQueue and PublishQueue tables and would recommend trying the following (some of which were suggested by Sitecore support):
Reduce the interval of the CleanupEventQueue agent (above)
Reduce the DaysToKeep setting on the CleanupEventQueue (also CleanupPublishQueue wouldn't hurt)
Create a scheduled SQL job to run the clean up script outlined in the CMS Tuning Guide (Page 10: http://sdn.sitecore.net/upload/sitecore7/70/cms_tuning_guide_sc70-usletter.pdf)
Finally, from Sitecore support:
Sitecore recommends that the number of rows (entries) in the History, PublishQueue, and EventQueue tables be kept below 1,000.
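To check how far a given database is from that recommendation, a simple count across the three tables is enough. This is only a sketch to run against each content database, not the script from the tuning guide:
-- Row counts for the three tables Sitecore support asks you to watch.
SELECT 'EventQueue' AS TableName, COUNT(*) AS TotalRows FROM EventQueue
UNION ALL
SELECT 'PublishQueue', COUNT(*) FROM PublishQueue
UNION ALL
SELECT 'History', COUNT(*) FROM History;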

Related

Why does WSO2AM execute count query against mb_metadata table over and over again?

We have enabled advanced throttling for WSO2AM 2.6.0. Once this was enabled and the execution plans were appropriately created, we noticed that over 35M SELECT COUNT queries per hour are executing against the MB_METADATA table.
Also, the MB_METADATA and MB_CONTENT tables are constantly growing, and the row count never goes down.
I have disabled all statistics as well as tracing. We have 4 WSO2 servers, each one running independently with the gateway, key manager, and traffic manager on the same box. The DB is Oracle.
We are seeing this query run 35 million times per hour:
SELECT COUNT(MESSAGE_ID) AS count
FROM MB_METADATA
WHERE QUEUE_ID=:1
AND MESSAGE_ID BETWEEN :2 AND :3
AND DLC_QUEUE_ID=-1
I would expect the table sizes to be manageable and this query not to run at such a high rate.
Any suggestions on what might be going on? Maybe there is a configuration setting that I need to disable?
Sharing the MB database is not correct. Each traffic manager node should have its own MB database, and it can be the default H2 one.
Quoted from docs:
Do not share the WSO2_MB_STORE_DB database among the nodes in an Active-Active set-up
or Traffic Manager HA scenario, because each node should have its own local WSO2_MB_STORE_DB
database to act as separate Traffic Managers.
The latter mentioned DBs can be either H2 DBs or any RDBMS such as MySQL.
If the database gets corrupted then you need to replace the database with a fresh database
that is available in the product distribution.
Ref: https://docs.wso2.com/display/AM260/Installing+and+Configuring+the+Databases
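Before splitting the MB databases per node, it may also help to confirm which queues are accumulating the rows. This is a sketch using only the columns already visible in the query above:
-- Rows per queue in MB_METADATA, largest first.
SELECT QUEUE_ID, COUNT(MESSAGE_ID) AS MESSAGE_COUNT
FROM MB_METADATA
GROUP BY QUEUE_ID
ORDER BY MESSAGE_COUNT DESC;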

How AWS DMS works internally

In AWS DMS, how does the migration happen internally? Is it like exporting all the data from the source table and importing it into the destination table, or is it like migrating table records one by one to the destination table? I am new to AWS DMS and don't have much idea of how things work there.
AWS publishes how DMS works in their documentation and blog posts. This is the list I wish I had when I started with DMS:
For a high level understanding see: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.html
A task can consist of three major phases:
The full load of existing data
The application of cached changes
Ongoing replication
During a full load migration, where existing data from the source is moved to the target, AWS DMS loads data from tables on the source data store to tables on the target data store. While the full load is in progress, any changes made to the tables being loaded are cached on the replication server; these are the cached changes.
...
When the full load for a given table is complete, AWS DMS immediately begins to apply the cached changes for that table. When all tables have been loaded, AWS DMS begins to collect changes as transactions for the ongoing replication phase. After AWS DMS applies all cached changes, tables are transactionally consistent. At this point, AWS DMS moves to the ongoing replication phase, applying changes as transactions.
From: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Components.html
Look at the headings:
Replication Tasks
Ongoing replication, or change data capture (CDC)
To gain a detailed understanding of how DMS works internally, read through the following blogs from AWS:
Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 1)
Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 2)
Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong? (Part 3)
Finally, work through the blogs particular to your source and target databases at https://aws.amazon.com/blogs/database/category/migration/aws-database-migration-service-migration/
When I first used DMS I had the same question, so I simply enabled CloudWatch logs and created a migration task from Oracle to Aurora PostgreSQL.
First, the DMS task runs on the replication instance (RI), which connects to the source and target databases.
The RI then connects to the source database and, based on the selection rules, identifies table and column details (it has a lot of special access on the source and target DBs).
After that it starts reading the source table(s) in parallel, building SELECT col1, col2, col3... FROM style queries to fetch the data from the source.
It then writes files to a temp location on the RI, one file per table, with roughly 10,000 rows per commit.
While all this is happening, another process creates a connection to the target DB and checks whether the tables already exist; if they do, it checks which option was selected (Do Nothing, Truncate Table, etc.) and acts accordingly.
At this point we have the source table data in files on the RI, and connections and tables created on the target DB. The RI then reads the records from its temp location and generates INSERT queries.
Once the last commit is successful, it deletes the temp file from the RI.
Once the source table and target table row counts match, it closes the connections (in the case of a one-time load).
In the case of ongoing changes, it keeps the connection alive and reads the redo logs (or other logs) on the source DB, then follows the same process described above for CDC.
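To illustrate the two kinds of queries described above, here is roughly what they look like; the table and column names are made up for the example and are not anything DMS actually emits:
-- 1. Full-load style fetch built by the replication instance:
SELECT col1, col2, col3 FROM source_schema.orders;
-- 2. Row-count check, run separately against source and target after the load:
SELECT COUNT(*) AS total_rows FROM orders;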
Here's a doc that provides some more information on how DMS Ongoing Replication works internally: https://aws.amazon.com/blogs/database/introducing-ongoing-replication-from-amazon-rds-for-sql-server-using-aws-database-migration-service/
The short of it is:
(following some initial steps) AWS DMS does not use any replication artifacts. When all the required information is available in the transaction log or transaction log backup, AWS DMS uses the fn_dblog() and fn_dump_dblog() functions to read changes directly from the transaction logs or transaction log backups using the log sequence number (LSN).
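For a feel of what that looks like, this is roughly how changes can be read straight from the active transaction log on SQL Server. It is only a sketch; DMS's actual internal queries are not public:
-- fn_dblog(NULL, NULL) returns the active portion of the log; the two
-- arguments are optional start/end LSNs.
SELECT [Current LSN], Operation, AllocUnitName, [Transaction ID]
FROM fn_dblog(NULL, NULL)
WHERE Operation IN ('LOP_INSERT_ROWS', 'LOP_MODIFY_ROW', 'LOP_DELETE_ROWS');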
In addition to the above answers: DMS uses Attunity underneath, and there are public documents on how the latter works in detail.

Configuring WSO2 STATS_DB

I have configured API Manager 2.0.0 & API Manager Analytics Pack to use MySQL databases.
For each server, there exists a WSO2AM_STATS_DB. I have given these different names on my MySQL server. I have also pointed my datasources in master-datasources.xml (for APIM) and stats-datasources.xml (for Analytics) to the relevant databases.
I couldn't find any relevant schema (dbscripts) for these databases in their respective packs.
On running, the Analytics database is populated but the APIM database isn't and throws an exception. The Analytics database not only gets the schema but also the invocation details of my API.
I am unable to get the stats on my dashboard though.
Previously, I (unwittingly) configured the h2-repository stats database to be the same for both servers (due to the folder structure) and was able to get all the statistics on my dashboard in the publisher.
Other configurations I have tried:
On the MySQL server, I pointed both to the same database (the Analytics one with the schema), but with no results on my dashboard (even after waiting a while).
Both datasources (WSO2AM_STATS_DB) in the two servers should point to the same database. There are no database scripts for this; the tables are created automatically.
By default, in both servers the stats DB path looks like this (note the ../ part):
<url>jdbc:h2:../tmpStatDB/WSO2AM_STATS_DB;DB_CLOSE_ON_EXIT=FALSE;LOCK_TIMEOUT=60000;AUTO_SERVER=TRUE</url>
So if you extract both servers to the same directory, as mentioned in this doc, both datasources will point to the same database (inside tmpStatDB), like this:
/parent_dir
|__wso2am-2.0.0/
|__wso2am-analytics-2.0.0/
|__tmpStatDB/
So what happens here is: wso2am-analytics writes the stats data to the shared database, then APIM reads it and shows the data on its dashboards.

Sitecore Publishing Problems and determining item state

Can anyone explain to me what state the data should be in for a healthy Sitecore instance in each database?
For example:
We currently have an issue with publishing in a 2 server setup.
Our staging server hosts the SQL instance and the authoring / staging instance of sitecore.
We then have a second server to host just the production website for our corp site.
When I look in the master database the PublishQueue table is full of entries and the same table in the web database is empty.
Is this correct?
No amount of hitting publish buttons is changing that at the moment.
How do I determine what state an item is in, in both the staging and production environments, without having to write an application on top of the Sitecore API (which I really don't have time for)?
It is normal behavior for the Publish Queue of the web database to be blank. The reason is that changes are made in the master database, which adds an entry to the Publish Queue.
After publishing, the item is not removed from the Publish Queue table. It is the job of the CleanupPublishQueue task to clean up the Publish Queue table.
In general, tables WILL be different between the two databases as they are used for different purposes. Your master database is generally connected to by authors and the publishing logic, while the web database is generally used as a holding place for the latest published version of content that should be visible.
In terms of debugging publishing, from the Sitecore desktop, you can swap between 'master' and 'web' databases in the lower right corner and use the Content Editor to examine any individual item. This is useful for spot checking individual items have been published successfully.
If an item is missing from 'web', or the wrong version is in 'web', you should examine the following:
Publishing Restrictions on the item: Is there a restriction applied to the item or version that prevents it from publishing at this time?
Workflow state: Is the item/version in the final approved workflow state? You can use the workbox to do a quick check for items needing approval.
Connection strings: Are your staging system's connection strings set up to connect to the correct 'web' database used by the production delivery server?
The [PublishQueue] database table is where all saves and other mutations are recorded. This table is used by an incremental publish: Sitecore gets all the items from the PublishQueue table that were modified more recently than the last incremental publish date. The PublishQueue table is not used by a full publish.
So it is okay that this table contains a lot of records on the master. The web database has the same database schema (not the same data; web contains only one version of an item, optimized for performance). The PublishQueue table on web being empty is normal.
To know the state of an item, compare the master version with the web version; there can be more than one web database, and the master database does not know the state/version of the web databases.
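If you prefer to spot-check from SQL rather than the Sitecore desktop, a couple of simple queries against the master database's PublishQueue show the incremental-publish backlog. This is a sketch, assuming the default ItemID / Action / Date columns:
-- Size of the incremental publish backlog, and its most recent entries.
SELECT COUNT(*) AS QueuedEntries FROM PublishQueue;
SELECT TOP 20 ItemID, [Action], [Date]
FROM PublishQueue
ORDER BY [Date] DESC;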

Django App on Heroku

I've been struggling with an issue where I believe my account has been shut down due to having too large a table? Correct me if I'm wrong.
=== HEROKU_POSTGRESQL_OL (DATABASE_URL)
Plan: Dev
Status: available
Connections: 0
PG Version: 9.1.8
Created: 2013-01-06 18:23 UTC
Data Size: 11.8 MB
Tables: 15
Rows: 24814/10000 (Write access revoked)
Fork/Follow: Unsupported
I tried running
heroku pg:psql HEROKU_POSTGRESQL_OL
to look at the tables, but how do I determine which table has too many rows and is flooding my database inside psql?
Once I do determine which table this is, can I just go to heroku run manage.py shell and call Model_with_too_many_rows.objects.all().delete(), and my account will no longer be shut down? Are there other steps that must be taken to have the smaller DB register with Heroku so that my write access will be returned?
Sorry, if these questions are trivial, but my understanding of SQL is limited.
EDIT: I also believe that there was a time when my database was flooded with entries, but I have since deleted them. Is there any command I can run to resize the database to acknowledge that the number of rows has been reduced? Or does Heroku do this automatically?
There may be a smarter way to check row count by table, but I use the pg-extras plugin and run pg:index_usage.
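If you just want approximate per-table row counts from inside psql, the planner statistics are usually enough to spot the offending table (run ANALYZE first if the numbers look stale):
-- Approximate live row counts per user table, largest first.
SELECT relname AS table_name, n_live_tup AS approx_rows
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;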
You will regain write access to your database within ~5 minutes of getting back down below the 10k row limit – Heroku will check this and update the limit automatically.