Sync Framework - Conflict Resolution Triggers Change, Resulting in Unnecessary Downloads

I'm using Sync Framework v2.1 configured in a hub <--> spoke fashion.
Hub: SQL Server 2012 using SqlSyncProvider.
Spokes: LocalDb 2012 using SqlSyncProvider. Each spoke's database begins as a restored backup from the server, after which PostRestoreFixup is executed against it. In investigating this, I've also tried starting with an empty spoke database whose schema and data are created through provisioning and an initial, download-only sync.
Assume two spokes (A & B) and a central hub (let's call it H). They each have one table with one record and they're all in sync.
Step 1: Spoke A changes the record and syncs, leaving A & H with identical records.
Step 2: Spoke B changes the same record and syncs, resulting in a conflict with the change made in step #1. B's record is overwritten with H's, and H's record remains as-is. This is the expected/desired result. However, the SyncOperationStatistics returned by the orchestrator suggest changes were made at H. I've tried both SyncDirectionOrder directions, with these results:
- DownloadAndUpload (H's local_update_peer_timestamp and last_change_datetime are updated) -->
  * Download changes total: 1
  * Download changes applied: 1
  * Download changes failed: 0
  * Upload changes total: 1
  * Upload changes applied: 1
  * Upload changes failed: 0
- UploadAndDownload (H's local_update_peer_timestamp is updated) -->
  * Upload changes total: 1
  * Upload changes applied: 1
  * Upload changes failed: 0
  * Download changes total: 1
  * Download changes applied: 1
  * Download changes failed: 0
Step 3: Spoke A syncs again and, indeed, the record is downloaded from H even though H's record hasn't changed. Why?
The problem arising from this is that, for example, if Spoke A makes another change to the record between steps #2 and #3, that change will (falsely) be flagged as a conflict and will be overwritten at step #3.
Here's the pared-down code demonstrating the issue or, rather, my question. Note that I've implemented the providers' ApplyChangeFailed handlers so that the server wins, regardless of the SyncDirectionOrder:
private const string ScopeName = "TestScope";
private const string TestTable = "TestTable";

public static SyncOperationStatistics Synchronize(SyncEndpoint local, SyncEndpoint remote, EventHandler<DbSyncProgressEventArgs> eventHandler)
{
    using (var localConn = new SqlConnection(local.ConnectionString))
    using (var remoteConn = new SqlConnection(remote.ConnectionString))
    {
        // provision the remote server if necessary
        //
        var serverProvision = new SqlSyncScopeProvisioning(remoteConn);
        if (!serverProvision.ScopeExists(ScopeName))
        {
            var serverScopeDesc = new DbSyncScopeDescription(ScopeName);
            var serverTableDesc = SqlSyncDescriptionBuilder.GetDescriptionForTable(TestTable, remoteConn);
            serverScopeDesc.Tables.Add(serverTableDesc);
            serverProvision.PopulateFromScopeDescription(serverScopeDesc);
            serverProvision.Apply();
        }

        // provision locally (localDb), if necessary, bringing down the server's scope
        //
        var clientProvision = new SqlSyncScopeProvisioning(localConn);
        if (!clientProvision.ScopeExists(ScopeName))
        {
            var scopeDesc = SqlSyncDescriptionBuilder.GetDescriptionForScope(ScopeName, remoteConn);
            clientProvision.PopulateFromScopeDescription(scopeDesc);
            clientProvision.Apply();
        }

        // create\initialize the sync providers and go for it...
        //
        using (var localProvider = new SqlSyncProvider(ScopeName, localConn))
        using (var remoteProvider = new SqlSyncProvider(ScopeName, remoteConn))
        {
            localProvider.SyncProviderPosition = SyncProviderPosition.Local;
            localProvider.SyncProgress += eventHandler;
            localProvider.ApplyChangeFailed += LocalProviderOnApplyChangeFailed;

            remoteProvider.SyncProviderPosition = SyncProviderPosition.Remote;
            remoteProvider.SyncProgress += eventHandler;
            remoteProvider.ApplyChangeFailed += RemoteProviderOnApplyChangeFailed;

            var syncOrchestrator = new SyncOrchestrator
            {
                LocalProvider = localProvider,
                RemoteProvider = remoteProvider,
                Direction = SyncDirectionOrder.UploadAndDownload // also an issue with DownloadAndUpload
            };

            return syncOrchestrator.Synchronize();
        }
    }
}

private static void RemoteProviderOnApplyChangeFailed(object sender, DbApplyChangeFailedEventArgs e)
{
    // ignore conflicts at the server
    //
    e.Action = ApplyAction.Continue;
}

private static void LocalProviderOnApplyChangeFailed(object sender, DbApplyChangeFailedEventArgs e)
{
    // server wins, force write at each client
    //
    e.Action = ApplyAction.RetryWithForceWrite;
}
To reiterate, using this code with the configuration described at the outset, conflicting rows are, as expected, overwritten on the spoke containing the conflict, and the server's version of that row remains as-is (unchanged). However, I'm seeing that each conflict results in an update to the server's xxx_tracking table, specifically the local_update_peer_timestamp and last_change_datetime columns. This, I'm guessing, is what triggers a download to every other spoke even though the server's data hasn't really changed. That seems unnecessary and, to me, counter-intuitive.
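A quick way to confirm that only the hub's sync metadata changes (while the data row stays put) is to dump the tracking row before and after the conflicting sync. The following is a minimal sketch of mine, not from the original post; it assumes the default tracking-table name produced by provisioning (TestTable_tracking) and the two columns already mentioned above, so adjust the names if your provisioning differs:

private static void DumpHubTrackingMetadata(string hubConnectionString)
{
    // default convention: one tracking table per synced table, named <table>_tracking
    const string sql =
        "SELECT local_update_peer_timestamp, last_change_datetime FROM [TestTable_tracking]";

    using (var conn = new SqlConnection(hubConnectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                Console.WriteLine("local_update_peer_timestamp={0}, last_change_datetime={1}",
                    reader[0], reader[1]);
            }
        }
    }
}

Calling this immediately before and after Spoke B's sync should show the hub's metadata moving forward even though the data row itself is untouched, which is exactly what makes Spoke A pull the row down again.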

Related

Strange behavior trying to perform data migration in Dynamo DB

We're trying to make a simple data migration in one of our tables in DDB.
Basically we're adding a new field and we need to backfill all the Documents in one of our tables.
This table has around 700K documents.
The process we follow is quite simple:
1. Manually trigger a lambda that scans the table and updates each document, continuing until it gets close to the 15-minute limit, at which point it
2. Puts the LastEvaluatedKey into SQS to trigger a new lambda execution that uses that key to continue scanning.
3. The process goes on, spawning lambdas sequentially as needed, until there are no more documents.
The problem we found is as follows...
Once the migration is done, we noticed that the number of documents updated is way lower than the total number of documents in that table. It's a random value, not always the same, ranging from tens of thousands to hundreds of thousands (the worst case we've seen was a 300K difference).
This is obviously a problem, because if we scan the documents again, it's clear some documents were not migrated. At first we thought this was because of clients updating/inserting new documents, but the throughput on that table is nowhere near large enough to justify such a big difference, so it's not that new documents are being added while we run the migration.
We tried a second approach: scan first, because when we only scan, the number of scanned documents == the count of documents in the table. So we dumped the IDs of the documents into another table, then scanned that table and updated those items. Funny thing: the same problem happens with this new table of just IDs, there are far fewer entries than the count of the table we want to update, so we're back to square one.
We thought about using parallel scans, but I don't see how that would help, plus I don't want to compromise read capacity on the table while running the migration.
Can anybody with experience in data migrations in DDB shed some light here? We're not able to figure out what we're doing wrong.
UPDATE: Sharing the function that is triggered and actually scans and updates
@Override
public Map<String, AttributeValue> migrateDocuments(String lastEvaluatedKey, String typeKey) {
    LOG.info("Migrate Documents started {} ", lastEvaluatedKey);
    int noOfDocumentsMigrated = 0;
    Map<String, AttributeValue> docLastEvaluatedKey = null;
    DynamoDBMapperConfig documentConfig = new DynamoDBMapperConfig.TableNameOverride("KnowledgeDocumentMigration").config();
    if (lastEvaluatedKey != null) {
        docLastEvaluatedKey = new HashMap<String, AttributeValue>();
        docLastEvaluatedKey.put("base_id", new AttributeValue().withS(lastEvaluatedKey));
        docLastEvaluatedKey.put("type_key", new AttributeValue().withS(typeKey));
    }
    Instant endTime = Instant.now().plusSeconds(840);
    LOG.info("Migrate Documents endTime:{}", endTime);
    try {
        do {
            ScanResultPage<Document> docScanList = documentDao.scanDocuments(docLastEvaluatedKey, documentConfig);
            docLastEvaluatedKey = docScanList.getLastEvaluatedKey();
            LOG.info("Migrate Docs- docScanList Size: {}", docScanList.getScannedCount());
            LOG.info("lastEvaluatedKey:{}", docLastEvaluatedKey);
            final int chunkSize = 25;
            final AtomicInteger counter = new AtomicInteger();
            final Collection<List<Document>> docChunkList = docScanList.getResults().stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize)).values();
            List<List<Document>> docListSplit = docChunkList.stream().collect(Collectors.toList());
            docListSplit.forEach(docList -> {
                TransactionWriteRequest documentTx = new TransactionWriteRequest();
                for (Document document : docList) {
                    LOG.info("Migrate Documents- docList Size: {}", docList.size());
                    LOG.info("Migrate Documents- Doc Id: {}", document.getId());
                    if (!StringUtils.isNullOrEmpty(document.getType()) && document.getType().equalsIgnoreCase("Faq")) {
                        if (docIdsList.contains(document.getId())) {
                            LOG.info("this doc already migrated:{}", document);
                        } else {
                            docIdsList.add(document.getId());
                        }
                        if ((!StringUtils.isNullOrEmpty(document.getFaq().getQuestion()))) {
                            LOG.info("doc FAQ {}", document.getFaq().getQuestion());
                            document.setTitle(document.getFaq().getQuestion());
                            document.setTitleSearch(document.getFaq().getQuestion().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    } else if (StringUtils.isNullOrEmpty(document.getType())) {
                        if (!StringUtils.isNullOrEmpty(document.getTitle())) {
                            if (!StringUtils.isNullOrEmpty(document.getQuestion())) {
                                document.setTitle(document.getQuestion());
                                document.setQuestion(null);
                            }
                            LOG.info("title {}", document.getTitle());
                            document.setTitleSearch(document.getTitle().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    }
                }
                if (documentTx.getTransactionWriteOperations() != null
                        && !documentTx.getTransactionWriteOperations().isEmpty() && docList.size() > 0) {
                    LOG.info("DocumentTx size {}", documentTx.getTransactionWriteOperations().size());
                    documentDao.executeTransaction(documentTx, null);
                }
            });
            noOfDocumentsMigrated = noOfDocumentsMigrated + docScanList.getScannedCount();
        } while (docLastEvaluatedKey != null && (endTime.compareTo(Instant.now()) > 0));
        LOG.info("Migrate Documents execution finished at:{}", Instant.now());
        if (docLastEvaluatedKey != null && docLastEvaluatedKey.get("base_id") != null)
            sqsAdapter.get().sendMessage(docLastEvaluatedKey.get("base_id").toString(), docLastEvaluatedKey.get("type_key").toString(),
                    MIGRATE, MIGRATE_DOCUMENT_QUEUE_NAME);
        LOG.info("No Of Documents Migrated:{}", noOfDocumentsMigrated);
    } catch (Exception e) {
        LOG.error("Exception", e);
    }
    return docLastEvaluatedKey;
}
Note: I would've added this speculation as a comment, but my reputation does not allow it.
I think the issue you're seeing here could be caused by the Scans not being ordered. As long as your Scan is executed within a single lambda, I'd expect everything to be handled fine. However, as soon as you hit the lambda's runtime limit and start a new one, your Scan essentially gets a new "ScanID", which might return items in a different order. Based on that different order, you end up skipping a certain set of entries.
I haven't tried to replicate this behavior, and sadly there is no clear indication in the AWS documentation whether a Scan request can be resumed from a new session/application.
I think @Charles' suggestion might help you in this case, as you can simply run the entire migration in one process.

Google Dataflow template job not scaling when writing records to Google datastore

I have a small Dataflow job triggered from a Cloud Function using a Dataflow template. The job basically reads from a table in BigQuery, converts the resulting TableRow to a key-value pair, and writes the key-value pair to Datastore.
This is what my code looks like:
PCollection<TableRow> bigqueryResult = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQuery()).usingStandardSql()
                .withoutValidation());

bigqueryResult.apply("WriteFromBigqueryToDatastore", ParDo.of(new DoFn<TableRow, String>() {
    @ProcessElement
    public void processElement(ProcessContext pc) {
        TableRow row = pc.element();
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
        KeyFactory keyFactoryCounts = datastore.newKeyFactory().setNamespace("MyNamespace")
                .setKind("MyKind");
        Key key = keyFactoryCounts.newKey("Key");
        Builder builder = Entity.newBuilder(key);
        builder.set("Key", BooleanValue.newBuilder("Value").setExcludeFromIndexes(true).build());
        Entity entity = builder.build();
        datastore.put(entity);
    }
}));
This pipeline runs fine when the number of records I try to process is anywhere in the range of 1 to 100. However, when I put more load on the pipeline, i.e. ~10000 records, the pipeline does not scale (even though autoscaling is set to THROUGHPUT based and maximumWorkers is specified as high as 50 with an n1-standard-1 machine type). The job keeps processing 3 or 4 elements per second with one or two workers. This is impacting the performance of my system.
Any advice on how to scale up the performance is very welcome.
Thanks in advance.
Found a solution by using DatastoreIO instead of the datastore client.
Following is the snippet I used:
PCollection<TableRow> row = p.apply("BigQueryRead",
        BigQueryIO.readTableRows().withTemplateCompatibility()
                .fromQuery(options.getQueryForSegmentedUsers()).usingStandardSql()
                .withoutValidation());

PCollection<com.google.datastore.v1.Entity> userEntity = row.apply("ConvertTablerowToEntity", ParDo.of(new DoFn<TableRow, com.google.datastore.v1.Entity>() {
    @SuppressWarnings("deprecation")
    @ProcessElement
    public void processElement(ProcessContext pc) {
        final String namespace = "MyNamespace";
        final String kind = "MyKind";

        com.google.datastore.v1.Key.Builder keyBuilder = DatastoreHelper.makeKey(kind, "root");
        if (namespace != null) {
            keyBuilder.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        final com.google.datastore.v1.Key ancestorKey = keyBuilder.build();

        TableRow row = pc.element();
        String entityProperty = "sample";
        String key = "key";

        com.google.datastore.v1.Entity.Builder entityBuilder = com.google.datastore.v1.Entity.newBuilder();
        com.google.datastore.v1.Key.Builder keyBuilder1 = DatastoreHelper.makeKey(ancestorKey, kind, key);
        if (namespace != null) {
            keyBuilder1.getPartitionIdBuilder().setNamespaceId(namespace);
        }
        entityBuilder.setKey(keyBuilder1.build());
        entityBuilder.getMutableProperties().put(entityProperty, DatastoreHelper.makeValue("sampleValue").build());

        pc.output(entityBuilder.build());
    }
}));
userEntity.apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId(options.getProject()));
This solution was able to scale from 3 elements per second with 1 worker to ~1500 elements per second with 20 workers.
At least with Python's ndb client library it's possible to write up to 500 entities at a time in a single .put_multi() datastore call - a whole lot faster than calling .put() for one entity at a time (the calls block on the underlying RPCs).
I'm not a java user, but a similar technique appears to be available for it as well. From Using batch operations:
You can use the batch operations if you want to operate on multiple entities in a single Cloud Datastore call.
Here is an example of a batch call:
Entity employee1 = new Entity("Employee");
Entity employee2 = new Entity("Employee");
Entity employee3 = new Entity("Employee");
// ...
List<Entity> employees = Arrays.asList(employee1, employee2, employee3);
datastore.put(employees);

Meteor regex find() far slower than in MongoDB console

I've been researching A LOT for the past 2 weeks and can't pinpoint the exact reason why my Meteor app returns results so slowly.
Currently I have only a single collection in my Mongo database, with around 200,000 documents. To search, I am using Meteor subscriptions on the basis of a given keyword. Here is my query:
db.collection.find({$or: [
  {title: {$regex: ".*java.*", $options: "i"}},
  {company: {$regex: ".*java.*", $options: "i"}}
]})
When I run the above query in the mongo shell, the results are returned instantly. But when I use it in the Meteor client, the results take almost 40 seconds to return from the server. Here is my Meteor client code:
Template.testing.onCreated(function () {
  var instance = this;

  // initialize the reactive variables
  instance.loaded = new ReactiveVar(0);
  instance.limit = new ReactiveVar(20);

  instance.autorun(function () {
    // get the limit
    var limit = instance.limit.get();
    var keyword = Router.current().params.query.k;
    var searchByLocation = Router.current().params.query.l;
    var startDate = Session.get("startDate");
    var endDate = Session.get("endDate");

    // subscribe to the posts publication
    var subscription = instance.subscribe('sub_testing', limit, keyword, searchByLocation, startDate, endDate);

    // if subscription is ready, set limit to newLimit
    $('#searchbutton').val('Searching');
    if (subscription.ready()) {
      $('#searchbutton').val('Search');
      instance.loaded.set(limit);
    } else {
      console.log("> Subscription is not ready yet. \n\n");
    }
  });

  instance.testing = function() {
    return Collection.find({}, {sort: {id: -1}, limit: instance.loaded.get()});
  };
});
And here is my meteor server code:
Meteor.publish('sub_testing', function(limit, keyword, searchByLocation, startDate, endDate) {
  Meteor._sleepForMs(200);
  var pat = ".*" + keyword + ".*";
  var pat2 = ".*" + searchByLocation + ".*";
  return Jobstesting.find({
    $or: [{title: {$regex: pat, $options: "i"}}, {company: {$regex: pat, $options: "i"}}, {description: {$regex: pat, $options: "i"}}, {location: {$regex: pat2, $options: "i"}}, {country: {$regex: pat2, $options: "i"}}],
    $and: [{date_posted: {$gte: endDate, $lt: startDate}}]
  }, {sort: {date_posted: -1}, limit: limit, skip: limit});
});
One point I'd also like to mention here is that I use "Load More" pagination, and by default the limit parameter gets 20 records. On each "Load More" click, I increment the limit parameter by 20, so on the first click it is 20, on the second click 40, and so on...
Any help on where I'm going wrong would be appreciated.
But when I use it in Meteor client, the results take almost 40 seconds to return from server.
You may be misunderstanding how Meteor is accessing your data.
Queries run on the client are processed on the client.
Meteor.publish - Makes data available on the server
Meteor.subscribe - Downloads that data from the server to the client.
Collection.find - Looks through the data on the client.
If you think the Meteor side is slow, you should time it server side (print time before/after) and file a bug.
If you're implementing a pager, you might try a Meteor method instead, or a pager package.

Sitecore Clear Cache Programmatically

I am trying to publish programmatically in Sitecore. Publishing works fine, but doing so programmatically doesn't clear the Sitecore cache. What is the best way to clear the cache programmatically?
I am trying to use the web service that comes with the staging module, but I am getting a bad request exception (Exception: The remote server returned an unexpected response: (400) Bad Request.). I tried to increase the service's receiveTimeout and sendTimeout in the client-side config file, but that didn't fix the problem. Any pointers would be greatly appreciated.
I am using the following code:
CacheClearService.StagingWebServiceSoapClient client = new CacheClearService.StagingWebServiceSoapClient();
CacheClearService.StagingCredentials credentials = new CacheClearService.StagingCredentials();
credentials.Username = @"sitecore\adminuser";
credentials.Password = "***********";
credentials.isEncrypted = false;
bool s = client.ClearCache(true, dt, credentials); // dt is a DateTime declared elsewhere in the calling code (omitted here)
I am using the following code to publish:
Database master = Sitecore.Configuration.Factory.GetDatabase("master");
Database web = Sitecore.Configuration.Factory.GetDatabase("web");

string userName = @"default\adminuser";
Sitecore.Security.Accounts.User user = Sitecore.Security.Accounts.User.FromName(userName, true);
user.RuntimeSettings.IsAdministrator = true;

using (new Sitecore.Security.Accounts.UserSwitcher(user))
{
    Sitecore.Publishing.PublishOptions options = new Sitecore.Publishing.PublishOptions(master, web,
        Sitecore.Publishing.PublishMode.Full, Sitecore.Data.Managers.LanguageManager.DefaultLanguage, DateTime.Now);
    options.RootItem = master.Items["/sitecore/content/"];
    options.Deep = true;
    options.CompareRevisions = true;
    options.RepublishAll = true;
    options.FromDate = DateTime.Now.AddMonths(-1);

    Sitecore.Publishing.Publisher publisher = new Sitecore.Publishing.Publisher(options);
    publisher.Publish();
}
In Sitecore 6, the CacheManager class has a static method that will clear all caches. The ClearAll() method is obsolete.
Sitecore.Caching.CacheManager.ClearAllCaches();
Just a quick note: as of Sitecore 6.3, that is not needed anymore. Caches are cleared automatically after a change happens on a remote server.
Also, if you are on previous releases, instead of clearing all caches, you can do partial cache clearing.
There is a free shared source component called Stager that does that.
http://trac.sitecore.net/SitecoreStager
If you need a custom solution, you can simply extract the source code from there.
I got this from Sitecore support. It clears all caches:
Sitecore.Context.Database = this.WebContext.Database;
Sitecore.Context.Database.Engines.TemplateEngine.Reset();
Sitecore.Context.ClientData.RemoveAll();
Sitecore.Caching.CacheManager.ClearAllCaches();
Sitecore.Context.Database = this.ShellContext.Database;
Sitecore.Context.Database.Engines.TemplateEngine.Reset();
Sitecore.Caching.CacheManager.ClearAllCaches();
Sitecore.Context.ClientData.RemoveAll();
The out-of-the-box solution provided by Sitecore to clean caches (ALL of them) is the following page: http://sitecore_instance_here/sitecore/admin/cache.aspx. Its code-behind looks like the following snippet:
foreach (var cache in Sitecore.Caching.CacheManager.GetAllCaches())
    cache.Clear();
Via the SDN:
HtmlCache cache = CacheManager.GetHtmlCache(Context.Site);
if (cache != null) {
    cache.Clear();
}
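Along the same lines, if you only want to clear the HTML caches after a programmatic publish (rather than everything), here is a hedged sketch of mine iterating over the sites you care about; the site names are placeholders, not something from the original answers:

// Sketch: clear the HTML cache for a set of sites after publishing.
// "website" and "shell" are example site names; substitute your own.
foreach (var siteName in new[] { "website", "shell" })
{
    Sitecore.Sites.SiteContext site = Sitecore.Configuration.Factory.GetSite(siteName);
    if (site != null)
    {
        Sitecore.Caching.HtmlCache htmlCache = Sitecore.Caching.CacheManager.GetHtmlCache(site);
        if (htmlCache != null)
        {
            htmlCache.Clear();
        }
    }
}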

Why does WebSharingAppDemo-CEProviderEndToEnd sample still need a client db connection after scope creation to perform sync

I'm researching a way to build an n-tiered sync solution. From the WebSharingAppDemo-CEProviderEndToEnd sample it seems almost feasible; however, for some reason the app will only sync if the client has a live SQL db connection. Can someone explain what I'm missing and how to sync without exposing SQL to the internet?
The problem I'm experiencing is that when I provide a relational sync provider that has an open SQL connection from the client, it works fine, but when I provide a relational sync provider that has a closed but configured connection string, as in the example, I get an error from the WCF service stating that the server did not receive the batch file. So what am I doing wrong?
SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder();
builder.DataSource = hostName;
builder.IntegratedSecurity = true;
builder.InitialCatalog = "mydbname";
builder.ConnectTimeout = 1;
provider.Connection = new SqlConnection(builder.ToString());
// provider.Connection.Open();   // **** un-commenting this causes the code to work ****

// create a new scope description and add the appropriate tables to this scope
DbSyncScopeDescription scopeDesc = new DbSyncScopeDescription(SyncUtils.ScopeName);

// class to be used to provision the scope defined above
SqlSyncScopeProvisioning serverConfig = new SqlSyncScopeProvisioning();
....
The error I get occurs in this part of the WCF code:
public SyncSessionStatistics ApplyChanges(ConflictResolutionPolicy resolutionPolicy, ChangeBatch sourceChanges, object changeData)
{
    Log("ProcessChangeBatch: {0}", this.peerProvider.Connection.ConnectionString);
    DbSyncContext dataRetriever = changeData as DbSyncContext;

    if (dataRetriever != null && dataRetriever.IsDataBatched)
    {
        string remotePeerId = dataRetriever.MadeWithKnowledge.ReplicaId.ToString();

        // Data is batched. The client should have uploaded this file to us prior to calling ApplyChanges.
        // So look for it.
        // The Id would be the DbSyncContext.BatchFileName, which is just the batch file name without the complete path.
        string localBatchFileName = null;
        if (!this.batchIdToFileMapper.TryGetValue(dataRetriever.BatchFileName, out localBatchFileName))
        {
            // Service has not received this file. Throw exception
            throw new FaultException<WebSyncFaultException>(new WebSyncFaultException("No batch file uploaded for id " + dataRetriever.BatchFileName, null));
        }
        dataRetriever.BatchFileName = localBatchFileName;
    }
Any ideas?
For the "batch file not available" issue, remove the IsOneWay=true setting from IRelationalSyncContract.UploadBatchFile. When the batch file size is big, ApplyChanges can be called before the previous UploadBatchFile call has fully completed.
// Replace
[OperationContract(IsOneWay = true)]
// with
[OperationContract]
void UploadBatchFile(string batchFileid, byte[] batchFile, string remotePeer1);
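For context on why batch files come into play at all: batching on the relational providers is controlled by two properties on RelationalSyncProvider (which SqlSyncProvider derives from). A small sketch of mine follows; the values and path are arbitrary placeholders, not taken from the sample:

// enable and tune change batching on a SqlSyncProvider / RelationalSyncProvider
provider.MemoryDataCacheSize = 1024;                   // in KB; change sets larger than this are spooled to batch files
provider.BatchingDirectory = @"C:\temp\syncbatches";   // directory where the spooled batch files are written

With batching enabled, each batch file has to reach the service before ApplyChanges runs against it, which is why the one-way UploadBatchFile call can race ahead of the apply step described above.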
I suppose it's simply a bare-bones example. It demonstrates "some" of the technique but assumes you have to arrange it in the proper order yourself.
http://msdn.microsoft.com/en-us/library/cc807255.aspx