Cloning to create separate "same-langugage" sites for different countries - sitecore

I would like to use the cloning feature of Sitecore to create sites with similar content to different countries. Mostly, it would be useful if changes to master site (Russian) would be automatically reflected on the sub-sites.
The first problem I have is that I seem to get language versions of all the items (German, English), not just Russian, when I create the clone
The second problem is that the clone does not have items in the targets I want, e.g. Belorussian, so do I have to create it manually?
This seems like the kind of situation where cloning would be useful, but I am wondering if Sitecore architecture prevents me from actually using it?

We have the same architecture you described for almost every of our sites. We have a "Master"-site with all overall content. This site has no <site>-configuration. For every country site we create a clone of this content tree and add a corresponding <site>-configuration. Of course we add there some country-specific content:
Master
- Home
- Sitemap
- Error
- ...
Germany [Clone]
- Home [Clone]
- Sitemap [Clone]
- Error [Clone]
- Our Office
- ...
USA [Clone]
- Home [Clone]
- Sitemap [Clone]
- Error [Clone]
- Differences
- ...
This works fine, but has two drawbacks (as you also mentioned):
The items in the master have to have a version in each language available in any of the country sites (so yes, the author may have to create the language version in the master-site and clone the item afterwards)
The clones then have a version in each language, also if the language is not used in the country site
For the second point we've added a new parameter to the <site>-configurion called "availableLanguages". If the user requested i.e. an item in the "Russian"-language on "USA" (which is not available), we show a 404 error to the user. We make this possible by using the Sitecore Error Manager module, which also covers this parameter.
As said before we use this in almost every website we have and it works very well. It's also easy to understand for the authors.

Related

How to change Sitecore Template field Item Id without data loss?

I recently noticed there is a difference in Item Id for a Sitecore template field between 2 environments (Source and Target). Due to this, any data changes to the field value for the dataitem using the template is not reflecting to target Sitecore database.
Hence, we manually copy the value from source to target and which takes lot of time to sync the 2 environments. Any idea how to change the template field Item Id in Sitecore without data loss in target instance?
Thanks
The template fields have most likely been created manually on the different servers, as #AdrianIorgu has suggested. I am going to suggest that you don't worry about merging fields and tools.
What you really care about is the content on the PRODUCTION instance of your site (assuming that this is Target). In any other environment, content should be regarded throwaway.
With that in mind, create a package of the template from your PRODUCTION instance and the install that in the other environments, deleting the duplicate field from the Source instance. The GUIDs of the field should now match across all environments. Check this into your source control (using TDS or Unicorn or whatever). You can then correctly update any standard values and that will be reflect through the server when you deploy again.
If your other environments (dev/qa/pre-prod) result in data loss for that field then don't worry about it, restore a backup from PROD.
Most likely that happened because the field or the template was added manually on the second environment, without migrating the items using packages, serialization or a third-party tool like TDS or Unicorn.
As #SitecoreClimber mentioned above, you can use Razl to sync the two environments and see the differences, but I don't think you will be able to change the field's GUID, to have the two environments consistent, without any data loss. Depending on the volume of your data, fixing this can be tricky.
What I would do:
make sure the target instance has the right template by installing a package with the correct template from source (with a MERGE-MERGE operation), which will end up having a duplicate field name
write a SQL query to get a list of all the items that have value for that field and update the value to the new field
Warning: this SQL query below is just a sample to get you started, make sure you extend and test this properly before running on a CD instance
use YOUR_DATABASE
begin tran
Declare #oldFieldId nvarchar(100), #newFieldId nvarchar(100), #previousValue nvarchar(100), #newValue nvarchar(100)
set #oldFieldID = '75577384-3C97-45DA-A847-81B00500E250' //old field ID
set #newFieldID = 'A2F96461-DE33-4CC6-B758-D5183676509B' //new field ID
/* versionedFields */
Select itemId, fieldid, value
from [dbo].[versionedFields] f with (nolock)
where f.FieldId like #oldFieldID
For this kind of stuff I sugest you to use Sitecore Razl.
It's a tool for comparing and merging sitecore databases.
Razl allows developers to have a complete side by side comparison between two Sitecore databases; highlighting features that are missing or not up to date. Razl also gives developers the ability to simply move the item from one database to another.
Whether it's finding that one missing template, moving your entire database or just one item, Razl allows you to do it seamlessly and worry free.
It's not a free tool, you can check here how you can buy it:
https://www.razl.net/purchase.aspx

Sitecore updates a lot of items on smart publish

I'm using Sitecore 8.0 Update 5.
Each time I'm doing a "Smart" publish with languages other than English I can see that thousands of items being updated.
Job started: Publish to 'web'
Items created: 0
Items deleted: 0
Items updated: 56207
Items skipped: 13057
Job ended: Publish to 'web' (units processed: 69258)
I've enabled tracing and in the logs I can see that Sitecore updateds shared fields of those items
##Publish Item: Name=sitecore, Uri=sitecore://master/{11111111-1111-1111-1111-111111111111}?lang=zh&ver=1, Operation=Updated, ChildAction=Allow, Explanation=Shared fields were published.
##Publish Item: Name=templates, Uri=sitecore://master/{3C1715FE-6A13-4FCF-845F-DE308BA9741D}?lang=zh&ver=1, Operation=Updated, ChildAction=Allow, Explanation=Shared fields were published.
##Publish Item: Name=List Manager, Uri=sitecore://master/{D2833213-CB77-431A-9108-55E62E4E47FD}?lang=zh&ver=1, Operation=Updated, ChildAction=Allow, Explanation=Shared fields were published.
And the list goes on like that for pretty much every item in the tree.
Armed with dotPeek I was able to find a method in publishing pipeline that is responsible for determining publish action:
private void HandleSourceVersionNotFound(Item sourceItem, PublishItemContext context)
{
Assert.ArgumentNotNull((object) sourceItem, "sourceItem");
Item targetItem = context.PublishHelper.GetTargetItem(sourceItem.ID);
if (targetItem != null)
{
Item[] versions = targetItem.Versions.GetVersions(true);
if (versions.Length > 0 && versions.Any(v => v.Language != sourceItem.Language)) || Settings.Publishing.PublishEmptyItems)
context.Action = PublishAction.PublishSharedFields;
else
context.Action = PublishAction.DeleteTargetItem;
}
else if (Settings.Publishing.PublishEmptyItems)
context.Action = PublishAction.PublishSharedFields;
else
context.AbortPipeline(PublishOperation.Skipped, PublishChildAction.Skip, "No publishable source version exists (and there is no target item).");
}
Here we can see that it checks item versions and if there are versions on language other than English it sets action to PublishAction.PublishSharedFields.
Settings.Publishing.PublishEmptyItems is set to false in my case, so this should not trigger shared fields publish.
I thought that the only "system" items with non-English version in my solution were languages, but when I looked on one of the item from the logs I discovered a really interesting thing:
Sitecore default languages
These appears to be Sitecore "default" languages.
This behavior causes a performance problem with publishing when I enable Language Fallback module. (https://marketplace.sitecore.net/en/modules/language_fallback.aspx)
My questions are:
Is this an expected behavior for Sitecore to push shared fields each time when you publish?
Is this an expected behavior for Sitecore to push shared fields on system items when they have versions only on these default languages?
How can I disable those default languages and remove versions on these languages? (Powershell?)
What are the implications of removing these default languages?
Is there anything that I'm doing wrong that can cause this kind of behavior?
UPD. On a different environment where it goes over 100k items threshold and triggers full index rebuild, which is pretty expensive operation. (With or without language fallback)
Thanks in advance!
How can I disable those default languages and remove versions on these languages?
If the language is registered under /sitecore/system/Languages, Sitecore should remove all item version, for the language, if you remove the language.
If the language is not registered (or Sitecore, for some reason, fails to remove it when you remove it) under /sitecore/system/Languages, do a database cleanup (Control Panel > Database > Clean Up Databases). Sitecore will remove any version of items that are in a language that is not registered.
What are the implications of removing these default languages?
Unless you plan on using those languages in the future, no real implication.
Is this an expected behavior for Sitecore to push shared fields each time when you publish?
=> No but the code you've found isn't used for every item. That action occurs when the Source Version isn't found ea the item exists in web but not in master. That's why it's either publish shared fields (in case other versions do exist) or delete the item completely from web (because it no longer exists in master).
Is this an expected behavior for Sitecore to push shared fields on system items when they have versions only on these default languages?
=> I'm unable to answer this question as is without deeper investigation for which I do not have time at this point I'm sorry.
How can I disable those default languages and remove versions on these languages? (Powershell?)
=> These languages aren't registered, they're showing up because these items do have a version in those languages. So if you remove the versions of all system items in those languages they'll no longer show up. You can create a version in any language even a non-existing one through code.
What are the implications of removing these default languages?
=> Any content editors that are using that language might suddenly see english (or a leftover language) instead of their preferred set Sitecore Editing language. Beyond that it should not matter.
Is there anything that I'm doing wrong that can cause this kind of behavior?
=> Not that I can see from what you've presented thus far.

Using non-trivial REST API and Ember

We're new to Ember, and our intended (ember-cli) app first works by opening a project (which we can think of a JSON object), and then acting on various sections of that project with various functions. We have this "pick your project first" approach neatly encapsulated in a Django REST api structure, e.g.
/projects/ lists all projects
/projects/1/ gives information about project 1
/projects/1/sectionA/ list all elements in sectionA of project 1
/projects/1/sectionA/2/ gives information about element 2 of sectionA in project 1
/projects/1/sectionA/2/sectionB/... and so on.
We made relatively good progress with the first two points in Ember using ember-data and this.store('project').find(...) etc. However, we've come unstuck trying to add further to our url (e.g. points 3., 4., and 5.). I believe our issues come from routing and handling multiple models (e.g. project and sectionA).
The question: what is the best way to structure the routes in Ember.js to match a non-trivial REST API, and use ember-data similarly?
Comments:
the "Ember way", and stuff working out of the box is preferred. Custom adapters and .getJSON might work, but we're not sure if we'll then lose out on what Ember offers.
we want the choice of project to affect the main app template. E.g. if a project does not have "sectionA", then a link to "sectionA" is not displayed in the main app. And, if the project does have "sectionA", we need the link to be to e.g. "/project/1/sectionA", i.e. dependant on the project open.
This seems similar to handling users (i.e. first I must "pick a user" and then continue), where the problem is solved outside of the URL (and is similar to using sessions as we have done in the past). However, we specifically want the project ID to be inside the URL, to remain stateless.
Bonus questions (if relevant):
how would we structure the models? Do we need to use hasMany/belongsTo and, if so, is this equivalent to just loading the whole project JSON in the first place?
can ember-data handle such complex requests? I.e. "give me item 2 from sectionA of project 1"? Can it do this "in one go", or do there have to be nested queries (i.e. "first give me project 1" and then from this "give me sectionA" and then from this "give me item 1")?
Finally, apologies if this is documented well somewhere. We've spent nearly a week trying to figure this out and have tried our best to find resources -- it's possible we just don't know what we're looking for.
I think this one will be a good thing to read: discuss.emberjs.com/t/… - you've got Tom Dale and Stefan Penner involved in the thread
My suggestion would be to change it to query params:
/projects?id=1&selectionA=a&selectionB=b
then, you won't have such problems. And yes, you can still use all the hasMany and belongsTo fields.
If there's anything unclear, I'll provide you with a longer answer (if I'm able to).
Check out ember-api-actions and ember-data-actions also ember-data-url-templates
Here's a few more resources from a blog I found. ember-data-working-with-custom-api-endpoints and ember-data-working-with-nested-api-resources

Google App Engine - Add & Reflect pattern, working around eventual consistency

I'm building a web application on app engine.
In my case, that's built on django-nonrel, but the key point is that it's using Google's datastore.
I love the fact that I don't need to deal with replication, sharding, backups and such, but one thing that is always getting in my way is the eventual consistency, which seems to get in the way of implementing a common web app pattern which I'm calling "Add & Reflect".
Let's say I have a project management app. The Project is its central model.
Now there's a web page page where I see a list of all projects, can add a project, and then I'll reflect back the list of all projects, which should include the project I just added (assuming no errors).
So the pattern goes like this:
Get and display list of existing projects
User adds new project (using a form on that page)
New project is created
As a response, get and display list of existing projects (now includes the new project)
Now the thing is, that due to eventual consistency, there is no guarantee whatsoever that I will get that new project when I get a list of all projects right after adding a new project.
Now that would be fine if this momentary inconsistency happened when another request (e.g. another user: user B) requested the list of projects one second after the project was added by the first user (user A), but it's really a problem when user A performs an operation, and does not see the results of his action, therefore does not get feedback.
I have gotten used to doing something like this to work around this problem:
def create_project(request):
response_context = {}
new_project = Project(name=request.POST['name'])
project.save()
response_context['projects'] = Project.get_serialized_projects()
# on GAE, eventual consistency means we are not guaranteed to see the
# new projects while querying for all projects, therefore we might need
# to add it manually...
if project.serialize() not in response_context['projects']:
response_context['projects'].append(project.serialize())
return render('projects.html', response_context)
The problem is that this happens in many places in my code, so I'm thinking maybe I'm missing something there, since this pattern is such a basic web app pattern.
Any suggestions for other ways to handle this?
Yes its a common issue. No theres no magic fix.
From client-side once you know the commit succeeded you can save the item locally (globals or storage) and then when querying from datastore merge your saved data. Put an expiration on it so its temporary. Its not trivial to make it work in all cases (say added an item then removed/renamed it so also update cache etc).
From server-side its common to cache recent saves in memcache and also merge with your queries.

Running multiple sites on the same python process

In our company we make news portals for a pretty big number of local newspapers (currently 13, going to 30 next month and more in the future), each with 2k to 100k page views/day. Since we are evolving from a situation where each site was heavily customized to one where each difference is a matter of configuration or custom template, our software is already pretty much the same for all sites. Right now our deployment strategy is one gunicorn instance for each site (with 1-17 workers each, depending on the site traffic), on a 16-core server and 12GB RAM. The problem with this setup is that each worker (regular pre-forked gunicorn) takes 110MB, whether its being used or not. Now with the new sites we would need to add more RAM to serve not that much many requests, so basically it doesn't scale. Also, since we are moving from this model where each site is independent, each site has its own database and I quite like it that way, especially since we are using relational databases (mysql, but migrating to pgsql), so its much easier to shard this way.
I'm doing some research and experimenting with running all sites on one gunicorn instance, so I could use the servers fully and add more servers behind a load balancer when it came to it. The problem is that django assumes in a lot of places that only one site is running per process, so for what I've thought of so far I'd have to implement:
A middleware that takes the HTTP_HOST from the request and places an identifier on a threadlocal variable.
A template loader that uses that variable to load custom templates accordingly.
Monkey patch django.db.model.Model, probably adding a metaclass (not even sure that's possible, but I think I would need it because of the custom managers we sometimes need to use) that would overwrite the managers for one that would first call db_manager(identifier) on the original manager and then call the intended method. I would also need to overwrite the save and delete methods to always include the using=identifier parameter.
I guess I would need to stop using inclusion_tag decorators, not a big problem, but I need to think of other cases like this.
Heavy and ugly patching of urlresolvers if I need custom or extra urls for each site. I don't need them now, but probably will at some point.
And this is just is what I came up with without even implementing it and seeing where it breaks, I'm sure I'd need many more changes for it to work. So I really don't want to do it, especially with the extra maintenance effort I'll need, but I don't see any alternatives and would love to learn that someone already solved this in a better way. Of course I could also stop using django altogether (I already have many reasons to do so) but that would mean a major rewrite and having two maintain two incompatible branches of the software until the new one reached feature parity with the django version, so to me it seems even worse than all the ugly hacks.
I've recently developed an e-commerce system with similar requirements -- many instances running from the same project sharing almost everything. The previous version of the system was a bunch of independent installations (~30) so it was pretty unmaintainable. I'm sure the requirements still differ from yours (for example, all instances shared the same models in my case), but it still might be useful to share my experience.
You are right that Django doesn't help with scenarios like this out of the box, but it's actually surprisingly easy to work it around. Here is a brief description of what I did.
I could see a synergy between what I wanted to achieve and django.contrib.sites. Also because many third-party Django apps out there know how to work with it and use it, for example, to generate absolute URLs to the current site. The major problem with sites is that it wants you to specify the current site id in settings.SITE_ID, which a very naive approach to the multi host problem. What one naturally wants, and what you also mention, is to determine the current site from the Host request header. To fix this problem, I borrowed the hook idea from django-multisite: https://github.com/shestera/django-multisite/blob/master/multisite/threadlocals.py#L19
Next I created an app encapsulating all the functionality related to the multi host aspect of my project. In my case the app was called stores and among other things it featured two important classes: stores.middleware.StoreMiddleware and stores.models.Store.
The model class is a subclass of django.contrib.sites.models.Site. The good thing about subclassing Site is that you can pass a Store to any function where a Site is expected. So you are effectively still just using the old, well documented and tested sites framework. To the Store class I added all the fields needed to configure all the different stores. So it's got fields like urlconf, theme, robots_txt and whatnot.
The middleware class' function was to match the Host header with the corresponding Store instance in the database. Once the matching Store was retrieved, It would patch the SITE_ID in a way similar to https://github.com/shestera/django-multisite/blob/master/multisite/middleware.py. Also, it looked at the store's urlconf and if it was not None, it would set request.urlconf to apply its special URL requirements. After that, the current Store instance was stored in request.store. This has proven to be incredibly useful, because I was able to do things like this in my views:
def homepage(request):
featured = Product.objects.filter(featured=True, store=request.store)
...
request.store became a natural additional dimension of the request object throughout the project for me.
Another thing that was defined on the Store class was a function get_absolute_url whose implementation looked roughly like this:
def get_absolute_url(self, to='/'):
"""
Return an absolute url to this `Store` or to `to` on this store.
The URL includes http:// and the domain name of the store.
`to` can be an object with `get_absolute_url()` or an absolute path as string.
"""
if isinstance(to, basestring):
path = to
elif hasattr(to, 'get_absolute_url'):
path = to.get_absolute_url()
else:
raise ValueError(
'Invalid argument (need a string or an object with get_absolute_url): %s' % to
)
url = 'http://%s%s%s' % (
self.domain,
# This setting allowed for a sane development environment
# where I just set it to ".dev:8000" and configured `dnsmasq`.
# The same value was also removed from the `Host` value in the middleware
# before looking up the `Store` in database.
settings.DOMAIN_SUFFIX,
path
)
return url
So I could easily generate URLs to objects on other than the current store, e.g.:
# Redirect to `product` on `store`.
redirect(store.get_absolute_url(product))
This was basically all I needed to be able to implement a system allowing users to create a new e-shop living on its own domain via the Django admin.