Sitecore logging to Azure Table storage

Firstly, could someone please advise on the best practice for Sitecore logging when hosted in Azure?
Ideally we would like to log to Table storage. I tried using https://www.nuget.org/packages/log4net.Appender.Azure/.
However, the data isn't stored in Azure Table storage until we invoke the buffer.flush() method, as described in the article below:
http://zacg.github.io/blog/2014/02/05/azure-log4net-appender/
Has anyone experienced this when logging to Table storage in Sitecore? Any recommendations would be much appreciated.
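The behaviour described above, where entries stay in memory until a flush, is typical of buffering log appenders. As a rough illustration in Python rather than log4net (so none of these names come from the Azure appender package), the standard library's MemoryHandler holds records until its capacity is reached or flush() is called. ListHandler and build_buffered_logger are hypothetical names, and the list only stands in for the remote table.

```python
import logging
import logging.handlers


class ListHandler(logging.Handler):
    """Stand-in for a remote sink such as a storage table (assumption)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # A real sink would write to remote storage here.
        self.records.append(record.getMessage())


def build_buffered_logger(capacity=100):
    """Return (logger, buffering handler, sink) wired together."""
    sink = ListHandler()
    # Records are held in memory until `capacity` is reached,
    # a CRITICAL record arrives, or flush() is called explicitly.
    buffered = logging.handlers.MemoryHandler(
        capacity=capacity, flushLevel=logging.CRITICAL, target=sink
    )
    logger = logging.getLogger("buffer_demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(buffered)
    return logger, buffered, sink
```

With a large capacity nothing reaches the sink until buffered.flush() is called; a smaller capacity trades more frequent writes for less data lost if the process stops unexpectedly.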

Good question. We have just added a specific object type that is optimized for logging - so our recommendation is to use AppendBlob for logging. See here for more information: http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx.
Unfortunately many people do try to use Table Storage for logging and if you don't design your keys carefully you can end up with hot partitions and scalability problems. Take a look at the logging anti-pattern in this guide: https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/.
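To make the hot-partition point concrete: if the PartitionKey is just the current date or hour, every writer appends to the same partition. One commonly used mitigation is to prefix the key with a small hash bucket so that writes spread across several partitions. The sketch below only builds the key strings and does not call any Azure SDK; log_entity_keys, the bucket count, and the key layout are illustrative assumptions, not a scheme taken from the design guide.

```python
import hashlib
from datetime import datetime, timezone


def log_entity_keys(source, timestamp, sequence, buckets=16):
    """Return a (PartitionKey, RowKey) pair for one log entry.

    `source` is a hypothetical identifier such as a machine name.
    Hashing it into a bucket prefix spreads writers over `buckets`
    partitions per hour instead of a single hot one.
    """
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    partition_key = f"{bucket:02d}-{timestamp:%Y%m%d%H}"
    # The sequence number keeps RowKeys unique within one timestamp.
    row_key = f"{timestamp:%Y%m%dT%H%M%S.%f}-{sequence:06d}"
    return partition_key, row_key
```

The trade-off is that reading all entries for one hour then means querying every bucket, which is part of why an append blob may be the simpler choice for plain log output.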

Related

Identify AWS service for fast retrieval of data

I have a generic question; I am looking for a solution to a problem.
Currently we generate our reports directly from the Oracle database. For performance reasons, we want to migrate the data from Oracle to an AWS service that could perform better. We will pass data from that AWS service to our reporting software.
Could you please advise which service would be ideal for this?
Thanks,
Vishwajeet
To answer well, additional info is needed:
How much data is needed to generate a report?
Are there any transformed/computed values needed?
What is good performance? 1 second? 30 seconds?
What is the current query time on Oracle and what kind of query? Joins, aggregations etc.

Backup of Datastore/Firestore without gcloud import/export

Hello Google Cloud Platform users!
I am interested in a solution for a regular (let's say daily) backup of Datastore/Firestore databases. Typical use: for some reason (bad "manual" operation, bug, whatever), a series of entities have been wrongly modified or destroyed, or the database is corrupted; in that case, the database version from the previous day will be restored.
I know this has been discussed in previous posts, but mostly through gcloud datastore|firestore import|export through files hosted on Google Cloud Storage. The problem is that for large databases (typically for professional applications with thousands and thousands of entities), this approach can take a huge amount of time and resources, even if launched as a batch job during the night (and it can only get worse as the database grows).
A solution that I have thought about would be to copy to another Datastore/Firestore dataset at each upsert, but that seems like overkill, since the Datastore/Firestore services already guarantee replication anyway. But most of all: it does not address the issue of unwanted writing or deletion of entities if this second database is 100% synced with the original one...
Are there best practices to back up Datastore/Firestore entities for this use case?
Any (brilliant) idea is welcome!
Thanks.
You can have a look at this project: https://github.com/Zenika/alpine-firestore-backup
I'm a contributor to it; don't hesitate to ask if you have questions or want new features.
At the moment that functionality is not available for Datastore/Firestore; there is a feature request to implement it:
https://issuetracker.google.com/133662510

Selecting the right cloud storage option on GCP

I am an entry-level developer at a startup. I am trying to deploy a text classifier on GCP. For storing inputs (training data) and outputs, I am struggling to find the right storage option.
My data isn't large in terms of columns but is fairly large in terms of instances. It could even be just key-value pairs. My use case is to retrieve each entity from just one particular column in the DB, apply some classification to it, store the result in the corresponding column, and update the DB. Our platform requires a DB that can handle a lot of small queries at once without much delay. Also, the data is completely non-relational.
I looked into GCP's article on choosing a storage option but couldn't narrow down my options to any specific answer. Would love to get some advice on this.
You should take a look at Google's "Choosing a Storage Option" guide: https://cloud.google.com/storage-options/
Your data is structured, your main goal is not analytics, your data isn't relational, you don't mostly need mobile SDKs, so you should probably use Cloud Datastore. That's a great choice for durable key-value data.
In brief, these are the storage options available. The list may grow or shrink in the future.
Depending on your needs, you can select the storage option that is best suited.
SOURCE: Linux Academy

How to synchronize the local DynamoDB and the Amazon DynamoDB web service

Hello, and thanks for viewing my question!
I am running Amazon DynamoDB locally and all databases are saved locally. With the local DynamoDB, I have to write a lot of code to display anything, but I find the web service interface much better: there I can perform operations and see the tables directly and clearly.
So may I ask how I can connect them, so that I can practice coding and check the status easily?
Looking forward to your reply and thank you so much!
Sincerely
You cannot connect them as they are completely separate databases. However, you can put a simple user interface on top of your local DynamoDB database.
I use the SQLite Browser: http://sqlitebrowser.org/. Once you have it installed, open the .db file located in the folder where you are running DynamoDBLocal.jar. You should be able to see all your tables and the data within them. You won't be able to see DynamoDB specific things like your provisioned capacity, but I think this will give you enough of what you're looking for.
Does this help?
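If a GUI is not at hand, the same .db file can be inspected from a few lines of Python, since the answer above relies on it being an ordinary SQLite file. This is a minimal sketch: list_tables is a hypothetical helper, the file path in the usage note is a placeholder, and it only lists table names without assuming anything about how DynamoDB Local lays out its data internally.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite database, sorted."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Usage would look like list_tables(sqlite3.connect("path/to/your-file.db")), with the path replaced by the .db file found next to DynamoDBLocal.jar.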

Data Warehouse and Django

This is more of an architectural question than a technological one per se.
I am currently building a business website/social network that needs to store large volumes of data and use that data to draw analytics (consumer behavior).
I am using Django and a PostgreSQL database.
Now my question is: I want to expand this architecture to include a data warehouse. The ideal would be: the operational DB would be the current Django PostgreSQL database, and the data warehouse would be something additional, preferably in a multidimensional model.
We are still in a very early phase, we are going to test with 50 users, so something primitive such as a one-column table for starters would be enough.
I would like to know if somebody has experience with this situation and could recommend a framework for creating a data warehouse, all while maintaining the operational DB with the Django models for ease of use (if possible).
Thank you in advance!
Here are some cool Open Source tools I used recently:
Kettle - great ETL tool, you can use this to extract the data from your operational database into your warehouse. Supports any database with a JDBC driver and makes it very easy to build e.g. a star schema.
Saiku - nice Web 2.0 frontend built on Pentaho Mondrian (MDX implementation). This allows your users to easily build complex aggregation queries (think Pivot table in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
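For readers unfamiliar with the star schema mentioned above, here is a minimal sketch of the idea using Python's built-in sqlite3 module instead of Kettle or PostgreSQL. All table and column names (dim_user, dim_date, fact_page_view) are made up for illustration: one fact table holds the measurements and points at small dimension tables that describe them.

```python
import sqlite3


def build_star_schema(connection):
    """Create one fact table surrounded by two dimension tables."""
    connection.executescript("""
        CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE fact_page_view (
            user_id INTEGER REFERENCES dim_user(user_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            views INTEGER NOT NULL
        );
    """)


def views_by_country(connection):
    """Aggregate the fact table along the user dimension."""
    rows = connection.execute("""
        SELECT u.country, SUM(f.views)
        FROM fact_page_view AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        GROUP BY u.country
        ORDER BY u.country
    """).fetchall()
    return dict(rows)
```

An ETL job would fill these tables from the operational database on a schedule, so analytical queries like views_by_country never touch the tables Django is serving from.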
My answer does not necessarily apply to data warehousing. In your case I see the possibility to implement a NoSQL database solution alongside an OLTP relational storage, which in this case is PostgreSQL.
Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL offers a number of advantages that will probably apply to your scenario: for instance, the flexibility of having records with different sets of fields, and key-based access.
Since you're still in the "trial" stage, you might find it easier to decide on a NoSQL database solution depending on your hosting provider. For instance, AWS has SimpleDB, Google App Engine provides its own Datastore, etc. However, there are plenty of other NoSQL solutions you can go for that have nice Python bindings.