ETL and analytics ready backend architecture - web-services

Suppose I have a retail website like Amazon. Right now, when I run an ETL tool, my mobile application becomes inconsistent.
Could you suggest a data warehouse/ETL-ready architecture that fits analytics workloads while still serving the main application?
If there is something relevant in microservice architecture, please suggest it.
Please provide references as well, if any.
Thank you

Related

Question about the high-level architecture required to process and visualize fitness app data (from Apple Health, for example) using Google Cloud services?

I'm working on a project where I am tasked with using Google Cloud services to process and visualize fitness data. For example, I have exported some Apple Health data from my watch, and it is in .xml format. At a high level, I envision this .xml file starting off in object storage and being converted to .csv by a Cloud Function (triggered by the creation of the .xml object in storage), then stored again in object storage (in a different bucket). These .csv files would then be processed by a Dataflow pipeline, which will reformat the data to match the template schema I would like the data to be organized with. This pipeline will output the resulting .csv to BigQuery, which will then be designated as a data source for Data Studio. I will then configure Data Studio to produce some simple reports that compare the health data to recommended values. I would also like this report to be accessible as a .pdf in object storage, if possible. Am I on the right track, or am I missing some key services to accomplish this?
Also, I'm new to posting on StackOverflow, so if this question is against the rules or not welcome, please let me know.
Any feedback is greatly appreciated, as I have not been able to bounce these ideas off of other experienced cloud architects/developers.
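The XML-to-CSV step described above can be sketched in plain Python, independent of any cloud trigger. This is a minimal sketch, assuming the export stores each measurement as a <Record> element with type, unit, value, startDate and endDate attributes; check your own export before relying on that. The function name health_xml_to_csv and the sample data are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column set: verify these attribute names against your own export.
FIELDS = ["type", "unit", "value", "startDate", "endDate"]


def health_xml_to_csv(xml_text: str) -> str:
    """Flatten every <Record> element into one CSV row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in root.iter("Record"):
        # Missing attributes become empty cells instead of raising.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return out.getvalue()


# Hypothetical sample data shaped like the assumed export format.
SAMPLE = """<HealthData>
  <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
          value="62" startDate="2021-01-01 08:00:00 +0000"
          endDate="2021-01-01 08:00:00 +0000"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          value="120" startDate="2021-01-01 08:05:00 +0000"
          endDate="2021-01-01 08:10:00 +0000"/>
</HealthData>"""

if __name__ == "__main__":
    print(health_xml_to_csv(SAMPLE))
```

In the pipeline described above, a function like this would sit inside the storage-triggered step, reading the uploaded object and writing the result to the second bucket. For large exports, a streaming parser such as ET.iterparse would likely be a better fit than loading the whole file.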
This question is currently off-topic under Stack Overflow's rules, as it does not contain a specific problem to resolve. See points 4-5.
As high-level advice, I do not see why it should not be possible with the services you mentioned, but you would need to implement it, try it on your side, and evaluate the features of each service in your workflow.
As for solution or architecture advice, that is generally a paid service, and you will most likely find little help here unless you have a specific problem to solve with said services. You might find some help elsewhere on the internet as well, e.g. Cloud Solutions, Built it on GCP, etc.
You might also find this interesting to review, as it mimics your solution. Hope this helps.

How do I aggregate datafeeds from affiliate networks and online merchants using custom technology?

I want to build a software solution for datafeed aggregation. I don't want to use datafeed aggregation services like Rakuten PopShops or Datafeedr. I am looking for guidance on how to build such a solution from a software architecture point of view.
Which architectural design patterns should I use?
Which technologies should I use?
How should I handle the non-normalized data formats and APIs of affiliate networks and merchants?
Do you know of any books on this topic?
Since I am coming from the Java Enterprise world, these technologies seem like usable components of the solution to me:
Apache Camel,
ElasticSearch,
NoSQL database (MongoDB),
Akka,
ZeroMQ, RabbitMQ,
Spring Framework,
Typesafe Reactive Platform and a lot of other tools.
Unfortunately, there is no single answer to your question. I would say stop thinking about solutions/patterns at this point. Try to figure out exactly what you want to do from a requirements perspective: what kind of data you want to aggregate, where to get the data from, data scrubbing rules, legal issues, etc. Once you have that nailed down, take the easiest path to implement it, using technologies you have used in the past or are comfortable with. Then add other technologies once you find that your existing solution will not work.
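For the non-normalized formats raised in the question, one simple starting point is a per-source adapter that maps each feed onto a single internal schema. The following is a minimal sketch only: the Product fields, the two network names, and the raw field names (id, name, price, SKU, ProductTitle, PriceInCents) are all hypothetical and do not describe any real affiliate API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Product:
    """Common schema every feed is mapped onto (fields are illustrative)."""
    source: str
    sku: str
    title: str
    price_cents: int


def from_network_a(raw: dict) -> Product:
    # Hypothetical feed A: price arrives as a decimal string such as "19.99".
    cents = int(Decimal(raw["price"]) * 100)
    return Product("network_a", raw["id"], raw["name"].strip(), cents)


def from_network_b(raw: dict) -> Product:
    # Hypothetical feed B: price already arrives in cents.
    return Product("network_b", raw["SKU"], raw["ProductTitle"].strip(),
                   int(raw["PriceInCents"]))


# One adapter per source; supporting a new feed means adding one function.
ADAPTERS: Dict[str, Callable[[dict], Product]] = {
    "network_a": from_network_a,
    "network_b": from_network_b,
}


def normalize(source: str, rows: List[dict]) -> List[Product]:
    """Convert raw rows from one source into the common schema."""
    adapter = ADAPTERS[source]
    return [adapter(row) for row in rows]
```

Keeping the adapters as plain functions behind a lookup table follows the "easiest path" advice above: storage, messaging, and search choices can be made later without touching the normalization step.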

What is a better mBaaS that supports offline sync and caching?

I am evaluating several mBaaS solutions for my hybrid mobile app under development. I looked at Kinvey, Kii, Buddy, and the Telerik Backend platform. I have also come across some open source solutions like OpenMobster and DreamFactory. I am looking to store data in SQLite in the mobile app and then sync it back to an online data store. Kinvey supports this, but its pricing model (per user) is not suitable for my scenario. I can see that OpenMobster does this, but I need to understand how. Can I host it on an Azure VM or something similar? Also, please suggest any other solution, commercial or open source, capable of offline sync and caching with push notifications and data storage.
DreamFactory could be a good fit for your scenario. It is open source and comes with a full 30 days of free support. After that, it's only about $25/month for a developer account, and this isn't even a requirement to use the product; it's specifically a support package.
To address your question in a little more depth: I don't believe DreamFactory supports offline syncing at the moment, though they plan to very soon. As for SQLite, DreamFactory's DSP product has a built-in SQLite driver to connect to that DB. However, it hasn't been tested enough for them to say it is a fully supported RDBMS. One of the beautiful things about DreamFactory is that you can host the DSP (DreamFactory Service Platform) on Azure and Amazon EC2 instances (cloud solutions), host it locally on your own server, or even use its own free hosted edition!
I would definitely take a little time to look into DF. It doesn't seem to me like you have much to lose, especially considering it's a free open-source product!
Feel free to ask me any questions you may have about DreamFactory!
-Mark
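Independent of any particular mBaaS, the offline pattern the question describes (write to local SQLite first, push to the online store later) can be sketched as follows. This is a minimal sketch under assumptions: the notes table, the synced flag, and the upload callback are hypothetical stand-ins, and conflict handling is left out entirely.

```python
import sqlite3
from typing import Callable


def open_local_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the on-device database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def save_offline(conn: sqlite3.Connection, body: str) -> int:
    """Write locally first; the row stays flagged as unsynced."""
    cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def sync_pending(conn: sqlite3.Connection,
                 upload: Callable[[int, str], bool]) -> int:
    """Push unsynced rows through `upload`; mark only the ones it accepts."""
    pending = conn.execute(
        "SELECT id, body FROM notes WHERE synced = 0 ORDER BY id"
    ).fetchall()
    pushed = 0
    for row_id, body in pending:
        # `upload` is a hypothetical callback standing in for the backend call.
        if upload(row_id, body):
            conn.execute("UPDATE notes SET synced = 1 WHERE id = ?", (row_id,))
            pushed += 1
    conn.commit()
    return pushed
```

Rows whose upload fails remain flagged as unsynced, so calling sync_pending again when connectivity returns retries them.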

Help emulating Heroku, GAE, etc : Building a web service privately (PaaS)

I'm not the only one with this question, but haven't found a lot of information in my research so far, so help me out.
We are a small IT crowd in an organization. We're looking to build a small, private service that would emulate a Heroku/GAE workflow. The basics of this: deploy an app as a git repository, and have it scale in a 'cloud' environment. Basically, a platform as a service (PaaS).
Pretend we are amateur PMs, programmers, and sysadmins tasked with this. What would you recommend? We know generally what is needed: some sort of routing, database, caching, authentication, etc. What other tools do we need?
We would prefer tools along a Ruby/Python/Haskell/Erlang dimension, on a Linux/BSD stack, with Postgres databases (CouchDB or Cassandra in the future). We are not touching anything in the MS/.NET area, and nothing on the JVM (we've looked at Steamcannon, but no; Scala and Clojure tools are not entirely out of the question). We have a basic grasp of bootstrapping a cloud (e.g. Eucalyptus) to build on. We have an understanding of the basics of server admin, and the physical infrastructure limitations aren't a factor right now.
We're not looking into why gaerokuyardspace is the best choice, a list of such services, why we should ditch our plans for one of these services, or an argument against this plan. For this situation the decision has been made that the cost to build privately is more attractive than the cost of deploying elsewhere. We already know why and how for these services. We're looking to emulate and build upon these for private needs.
A short list of tools to be expanded:
Beehive
Steamcannon
Gitosis/Gitolite
?
Basically, I'd like to generate a list of tools for building a Heroku/GAE-like service on a small, private, definitely experimental/toy level.
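The "deploy an app as a git repository" step is commonly built on a server-side post-receive hook, which git feeds one line per updated ref on standard input in the form "old-sha new-sha refname". Below is a minimal sketch of the decision logic such a hook might use. The function names, the choice of refs/heads/master as the deploy branch, and the paths in the usage note are assumptions for illustration.

```python
from typing import Iterable, List, Optional

# Assumption: only pushes to this branch trigger a deploy.
DEPLOY_REF = "refs/heads/master"


def pushed_revision(hook_stdin: Iterable[str],
                    deploy_ref: str = DEPLOY_REF) -> Optional[str]:
    """Return the new commit id if the push updated deploy_ref, else None."""
    for line in hook_stdin:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore anything that is not "old new ref"
        _old, new, ref = parts
        # An all-zero new id means the branch was deleted, so skip it.
        if ref == deploy_ref and set(new) != {"0"}:
            return new
    return None


def checkout_command(git_dir: str, work_tree: str, revision: str) -> List[str]:
    """Build the git command that materialises `revision` into work_tree."""
    return ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}",
            "checkout", "-f", revision]
```

A real hook would pass sys.stdin to pushed_revision and, if a revision comes back, run the result of checkout_command (for example with hypothetical paths /srv/app.git and /srv/app) via subprocess.run before restarting the app. Routing, scaling, and process supervision would still need separate tools.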
I don't know that it will meet all of your stated needs today, but you should take a look at Cloud Foundry from VMware. You can check the FAQ for the commercial project or look into the open source version that you can host and manage yourself.
Some combination of Cloud Foundry (above), gitolite, and fabric will probably do well for you. Any such solution will take some time to get right.
(Disclaimer: I'm a lead developer on the AppScale project)
AppScale is pretty much right up your alley, especially if you're looking to run Google App Engine apps in your own private cloud. It's open source, so grab it and extend it if there are other types of apps you want to support (and definitely commit it back to us if you do).

Sample Flow Charts

I need to create a flowchart showing developer computers, the development server, development DB, QA server, QA DB, staging server, staging DB, production server, and production DB, as part of creating a process for developers to follow through the development-to-staging-to-production cycle.
Could you please direct me to the right URL or resource?
Thanks in advance
If you're looking for inspiration, figure A in this post looks similar to what you're trying to do, albeit simplified slightly: http://blog.sysbliss.com/uncategorized/release-management-with-atlassian-bamboo-and-jira.html
I have used Microsoft Visio in the past for my flowcharts and it meets the basic needs.
Most of the standard components (servers, etc.) are all there, and you can usually find and download free stencils from the net for more specific needs.
A process flow like the one you are describing should be easily manageable using the standard stencils alone.
There seem to be a lot of online sites that provide this kind of service for free lately.
You can check out this link. I have not used any of these before, so I cannot vouch for them, though I did try out flowchart.com and it seemed pretty OK.
You are looking for a tool to make network diagrams.
These are some candidates I found looking for Network Diagram at Google:
SmartDraw: a friend recommended it to me some time ago
Gliffy looks promising
Making a flowchart from source code is complicated, but I found a code-to-flowchart converter that can create a flowchart from source code automatically. I got this software from http://flowchart-creator.com. It is free to download and free to try.