I am exploring WSO2 Stratos and have watched some of the webinar recordings. I would like to create an application and expose it as SaaS. One of the WebEx recordings covers this in detail, but it does not explain multi-tenancy on the data storage side. Is there a tutorial available for that? I would like to use a shared schema for data storage. What kind of database can I use for this (e.g. MySQL, MongoDB, Cassandra)? Is it possible to use a framework like Athena? I am just trying to do a proof of concept, and then I need to decide whether this platform really fits the application I am thinking of building.
You can create databases through the WSO2 Storage Server in StratosLive, which can be accessed via storage.stratoslive.wso2.com. You need to create a database and attach a user to it. Then you can access that database from your webapp (you will get a JDBC URL) just as you normally would. You can also create Cassandra keyspaces in the Storage Server, but we don't have MongoDB support at the moment. There is no documentation on this yet.
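As a minimal sketch, connecting from your webapp looks like any other JDBC access; the URL, database name, and credentials below are placeholders for the values you get after creating the database and attaching a user, and it assumes the MySQL JDBC driver is on your classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class StorageServerSmokeTest {
        public static void main(String[] args) throws Exception {
            // Placeholders: use the JDBC URL and user shown in the Storage Server UI.
            String jdbcUrl = "jdbc:mysql://storage.stratoslive.wso2.com:3306/mydb_example";
            String user = "mydb_user";
            String password = "changeit";

            try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println("Connection OK, test query returned " + rs.getInt(1));
                }
            }
        }
    }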
Yes, you're right: the multi-tenant data architecture is up to you to decide. This white paper from Microsoft explains multi-tenant data architecture nicely. The white paper, however, is written assuming you're using an RDBMS. I haven't played around with Athena, so it's difficult to say how it will map to what Stratos provides. The data architecture may be different when you're using a NoSQL DB, and different DBs have different ways of filtering a set of data by a given tenant (or an ID). Going by the white paper, it would probably map to:
Different DBs -> Different keyspaces
Different tables -> Different column families
Shared schema -> Shared column family
It's better to define your application's characteristics beforehand and then choose an appropriate DB.
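To make the shared-column-family option concrete, here is a minimal sketch using the DataStax Java driver (3.x assumed); the keyspace, table, and tenant values are made up for illustration, the point being that the tenant discriminator is part of the partition key so every query is scoped to one tenant:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class SharedColumnFamilySketch {
        public static void main(String[] args) {
            // Contact point is a placeholder; point it at your Cassandra cluster.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                session.execute("CREATE KEYSPACE IF NOT EXISTS saas_app "
                        + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");

                // One shared table for all tenants; tenant_id is the partition key.
                session.execute("CREATE TABLE IF NOT EXISTS saas_app.customers ("
                        + "tenant_id text, customer_id text, name text, "
                        + "PRIMARY KEY (tenant_id, customer_id))");

                session.execute("INSERT INTO saas_app.customers (tenant_id, customer_id, name) "
                        + "VALUES ('tenantA', 'c1', 'Alice')");

                // Reads are always filtered by the tenant discriminator.
                ResultSet rows = session.execute(
                        "SELECT name FROM saas_app.customers WHERE tenant_id = 'tenantA'");
                rows.forEach(row -> System.out.println(row.getString("name")));
            }
        }
    }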
I am new to the Hadoop environment, sorry if the question is obvious...
I need to develop a web service to record and read large volumes of data. Because of this requirement I thought of using a Hadoop cluster and HBase as my database.
I have designed my HBase schema to satisfy my requirements; so far, so good.
The thing is that since it is a service I am developing, I would like the users of the service not to know the internal representation of the data.
I do not want the users to have to invoke a Put to a certain table, for example, to the Clients table, but instead invoke a high-abstraction method, for example, createClient().
How do I add this abstraction layer on top of HBase while keeping the reliability, distribution, and capacity to serve many simultaneous users that HBase itself offers?
Thanks a lot
Consider HBase Stargate to enable a REST server. If you want to obscure the table name in the URI, perhaps proxy Stargate with a web server.
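If you would rather keep the abstraction in application code than behind a Stargate proxy, a thin service layer over the HBase Java client is one way to hide the schema; this is only a sketch, and the table name, column family, and qualifiers are assumptions:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ClientService {
        private final Connection connection;

        // Pass in a Configuration built with HBaseConfiguration.create().
        public ClientService(Configuration hbaseConf) throws IOException {
            this.connection = ConnectionFactory.createConnection(hbaseConf);
        }

        // High-level method exposed by the web service; callers never see the
        // table name, row-key layout, or column families.
        public void createClient(String clientId, String name, String email) throws IOException {
            try (Table table = connection.getTable(TableName.valueOf("Clients"))) {
                Put put = new Put(Bytes.toBytes(clientId));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes(email));
                table.put(put);
            }
        }
    }

The Connection object is thread-safe and meant to be shared, so a layer like this keeps HBase's concurrency characteristics while exposing only domain methods such as createClient().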
I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly with my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean I just have a function in my code that queries my DynamoDB database and returns the result.
How I do it works but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly scalable service; standing up another server in front of it means you also have to scale that extra layer in line with the RCU/WCU configured for your tables, which you can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about
Using the AWS DynamoDB SDK for iOS to write your client application that runs on the mobile device.
Using the AWS Token Vending Machine to authenticate your mobile users and grant them temporary credentials for operations on DynamoDB tables.
Controlling access (i.e. which operations are allowed on which tables) using IAM policies.
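For illustration only, here is a minimal sketch of that pattern using the AWS SDK for Java (the iOS SDK follows the same flow); the region, table name, key, and the temporary credentials are placeholders, with the credentials standing in for what the token vending machine would hand to the device:

    import com.amazonaws.auth.AWSStaticCredentialsProvider;
    import com.amazonaws.auth.BasicSessionCredentials;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.document.Item;
    import com.amazonaws.services.dynamodbv2.document.Table;

    public class DirectDynamoAccess {
        public static void main(String[] args) {
            // Temporary credentials obtained from the token vending service (placeholders).
            BasicSessionCredentials temporary = new BasicSessionCredentials(
                    "ACCESS_KEY", "SECRET_KEY", "SESSION_TOKEN");

            AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
                    .withCredentials(new AWSStaticCredentialsProvider(temporary))
                    .withRegion(Regions.US_EAST_1)
                    .build();

            // Hypothetical table and key; IAM policies decide whether this call is allowed.
            Table users = new DynamoDB(client).getTable("Users");
            Item item = users.getItem("userId", "some-user-id");
            System.out.println(item);
        }
    }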
HTH.
From what you say, I can guess that you are talking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about using a common database that multiple clients share. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like. That can bring you headaches supporting the schema in the future if your business logic changes.
The more advanced approach would be to send an event on every change in your data instead of writing changes to the database directly from the client apps. This way you can add processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation procedure that transforms both types of events into the format that fits the database schema. It is basically a question of whether to work with diffs or snapshots.
You should be aware of the added complexity of working with events; it can be overkill if your app is simple and changes to the schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some of the advantages of events while keeping the implementation simple.
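As a sketch of that last point, a Lambda function attached to the table's stream can run such preprocessing; the class name, the event check, and what you do with the new image are all made up for illustration (this assumes the aws-lambda-java-events library):

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent;
    import com.amazonaws.services.lambda.runtime.events.DynamodbEvent.DynamodbStreamRecord;

    public class StreamPreprocessor implements RequestHandler<DynamodbEvent, Void> {
        @Override
        public Void handleRequest(DynamodbEvent event, Context context) {
            for (DynamodbStreamRecord record : event.getRecords()) {
                // eventName is INSERT, MODIFY or REMOVE.
                if ("INSERT".equals(record.getEventName())
                        && record.getDynamodb().getNewImage() != null) {
                    // Placeholder: translate or enrich the new item here before
                    // forwarding it to downstream storage or analytics.
                    context.getLogger().log("New item: " + record.getDynamodb().getNewImage());
                }
            }
            return null;
        }
    }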
First, I am new to Xcode but have started using Swift.
I am confused about the purpose of CloudKit (Documents) when developing a multi-user app, for which I would normally use web services or a web-based application. It would be nice to use a mobile app and store all the data in the cloud.
I have two questions:
Speed aside, can I build an app that keeps most or all of its data in iCloud and still get into the App Store? That would mean the app only runs when a network connection is available. (I ask because some developers have complained that Apple rejected them with the reason: "Specifically, your application requires iCloud support for the users to access this application, which can create poor user experience.")
If the app has to keep all of its data in sync, I can't imagine how complicated it would be to handle all the data going in and out. So the question: is there any way to replicate iCloud data, the way some SQL databases such as MySQL can, without much programming? Then I could just focus on Core Data with replication features.
Concerning availability, CloudKit is not very different from using web services. If there is no internet connection, you will not be able to fetch data. How would you handle that if you were using web services? You should handle it the same way with CloudKit. Just make sure there is at least some functionality when the data is not available.
Your second question is a bit broad. Yes, you could use Core Data syncing to iCloud. You could also just cache the fetched data using one of the many caching libraries that are available (search CocoaPods).
This is more of an architectural question than a technological one per se.
I am currently building a business website/social network that needs to store large volumes of data and use that data to draw analytics (consumer behavior).
I am using Django and a PostgreSQL database.
Now my question is: I want to expand this architecture to include a data warehouse. The ideal would be: the operational DB would be the current Django PostgreSQL database, and the data warehouse would be something additional, preferably in a multidimensional model.
We are still in a very early phase, we are going to test with 50 users, so something primitive such as a one-column table for starters would be enough.
I would like to know if somebody has experience with this situation and could recommend a framework for creating a data warehouse, all while maintaining the operational DB with the Django models for ease of use (if possible).
Thank you in advance!
Here are some cool Open Source tools I used recently:
Kettle - great ETL tool, you can use this to extract the data from your operational database into your warehouse. Supports any database with a JDBC driver and makes it very easy to build e.g. a star schema.
Saiku - nice Web 2.0 frontend built on Pentaho Mondrian (MDX implementation). This allows your users to easily build complex aggregation queries (think Pivot table in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
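If you want to see the shape of what a Kettle transformation does before committing to it, here is a hand-rolled sketch in plain JDBC that copies rows from a hypothetical operational table into a hypothetical fact table of a star schema; all table names, columns, URLs, and credentials are made up, and it assumes the PostgreSQL JDBC driver is available:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class NightlyEtlSketch {
        public static void main(String[] args) throws Exception {
            // Operational (Django) database and warehouse database; placeholders.
            try (Connection oltp = DriverManager.getConnection(
                         "jdbc:postgresql://localhost:5432/app_db", "app", "secret");
                 Connection warehouse = DriverManager.getConnection(
                         "jdbc:postgresql://localhost:5432/dwh", "etl", "secret")) {

                try (Statement extract = oltp.createStatement();
                     ResultSet rs = extract.executeQuery(
                             "SELECT user_id, action, created_at::date AS day FROM user_actions");
                     PreparedStatement load = warehouse.prepareStatement(
                             "INSERT INTO fact_user_actions (user_id, action, day) VALUES (?, ?, ?)")) {
                    while (rs.next()) {
                        load.setLong(1, rs.getLong("user_id"));
                        load.setString(2, rs.getString("action"));
                        load.setDate(3, rs.getDate("day"));
                        load.addBatch();
                    }
                    // Kettle would handle batching, dimension lookups and error rows for you.
                    load.executeBatch();
                }
            }
        }
    }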
My answer does not necessarily apply to data warehousing. In your case I see the possibility of implementing a NoSQL database solution alongside an OLTP relational store, which in this case is PostgreSQL.
Why consider NoSQL? In addition to the obvious scalability benefits, NoSQL offers a number of advantages that will probably apply to your scenario, for instance the flexibility of having records with different sets of fields, and key-based access.
Since you're still in a "trial" stage, you might find it easiest to decide on a NoSQL database solution based on your hosting provider. For instance, AWS has SimpleDB, Google App Engine provides its own Datastore, etc. However, there are plenty of other NoSQL solutions you can go with that have nice Python bindings.
I'm planning to adopt WSO2 GREG as a business service registry. We currently store services in a spreadsheet saved as a delimited text file. Services are still abstract concepts (operations are not).
What is the best approach (painless, ideally without programming) to bulk-load about 660 business services and 12,000 operations?
The most painless way is probably to use the registry client. WSO2 provides a Java-based client you can use to easily access the registry. It won't be completely painless, but with a couple of lines of code you could easily add this information.
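As a rough sketch of what those couple of lines might look like: the snippet below assumes the Governance API's ServiceManager is available on the classpath and that the spreadsheet has been exported as semicolon-delimited lines of namespace, name, and version. The file format, the attribute name, and the way you obtain the remote registry instance are all assumptions to adapt to your setup (see the WSO2 registry client samples for the connection part):

    import javax.xml.namespace.QName;

    import org.wso2.carbon.governance.api.services.ServiceManager;
    import org.wso2.carbon.governance.api.services.dataobjects.Service;
    import org.wso2.carbon.registry.core.Registry;

    public class BulkServiceLoader {

        public static void main(String[] args) throws Exception {
            Registry governanceRegistry = connectToRegistry(); // see WSO2 client samples
            ServiceManager serviceManager = new ServiceManager(governanceRegistry);

            // Assumed export format: one "namespace;name;version" line per service.
            for (String line : java.nio.file.Files.readAllLines(
                    java.nio.file.Paths.get("services.csv"))) {
                String[] fields = line.split(";");
                Service service = serviceManager.newService(new QName(fields[0], fields[1]));
                service.addAttribute("overview_version", fields[2]); // attribute name may differ per RXT
                serviceManager.addService(service);
            }
        }

        private static Registry connectToRegistry() {
            // Placeholder: construct a remote registry instance against your GREG
            // server as shown in the WSO2 registry client documentation.
            throw new UnsupportedOperationException("configure the registry connection");
        }
    }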
Another option would be to plug directly into the underlying JCR repository or database, but then you're entering painful territory, I think.