Amazon DynamoDB vs relational database [closed] - amazon-web-services

I want to store a huge amount of trading data (say, a million records per day) in some kind of database. Each record is small and has a static structure: id (integer), timestamp (integer), price (float), size (float). The id field is the primary key here (in relational-database terms), and I want to select records from a specific time range, ordered by time. This is straightforward in a relational database.
Is a NoSQL database (DynamoDB in particular) suitable for these requirements, or should I use a traditional relational database solution?
I don't have any experience with NoSQL databases.

The straightforward answer to this question is yes, this fits DynamoDB's use case well. But there's a better answer: try it out and see!
I have been seeing a lot of this kind of question regarding AWS, namely "will this work?" as opposed to "how do I do this?" And the best way to answer that is to try it out and see. Unlike traditional IT, you don't have to do a lot of planning or invest a lot of capital up front to try it out. Spend a buck or two (literally that little) to run a small test program using DynamoDB and another using MySQL (or another RDBMS) and see how they work for you.
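One design point worth trying out early: in DynamoDB, efficient time-range queries come from a composite key (a partition key plus a timestamp sort key), not from using the id alone as the primary key. Below is a minimal sketch using the AWS SDK for C++, assuming a hypothetical table named Trades with a string partition key SymbolDay (e.g. "AAPL#2016-01-15") and a numeric sort key Ts in epoch seconds; those names and the per-symbol-per-day partitioning are illustrative choices, not the only way to model it.

    // Sketch only: query one partition of a hypothetical "Trades" table for a
    // time range, using the AWS SDK for C++. Table/key names are assumptions.
    #include <aws/core/Aws.h>
    #include <aws/dynamodb/DynamoDBClient.h>
    #include <aws/dynamodb/model/QueryRequest.h>
    #include <aws/dynamodb/model/AttributeValue.h>
    #include <iostream>

    int main() {
        Aws::SDKOptions options;
        Aws::InitAPI(options);
        {
            Aws::DynamoDB::DynamoDBClient client;

            Aws::DynamoDB::Model::QueryRequest request;
            request.SetTableName("Trades");
            // Results within a partition come back ordered by the sort key (time).
            request.SetKeyConditionExpression("SymbolDay = :pk AND Ts BETWEEN :from AND :to");

            Aws::Map<Aws::String, Aws::DynamoDB::Model::AttributeValue> values;
            values[":pk"]   = Aws::DynamoDB::Model::AttributeValue().SetS("AAPL#2016-01-15");
            values[":from"] = Aws::DynamoDB::Model::AttributeValue().SetN("1452859200");
            values[":to"]   = Aws::DynamoDB::Model::AttributeValue().SetN("1452862800");
            request.SetExpressionAttributeValues(values);

            auto outcome = client.Query(request);
            if (outcome.IsSuccess()) {
                for (const auto& item : outcome.GetResult().GetItems()) {
                    std::cout << item.at("Ts").GetN() << " "
                              << item.at("price").GetN() << " "
                              << item.at("size").GetN() << "\n";
                }
            } else {
                std::cerr << outcome.GetError().GetMessage() << "\n";
            }
        }
        Aws::ShutdownAPI(options);
        return 0;
    }

Running something like this against a small test table, next to the equivalent SELECT on MySQL, will tell you more in an afternoon than any amount of up-front debate.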

DynamoDB would work; however, given that each record is small and has a static structure, in my opinion a relational database would be equally well suited for this task, perhaps even better (which is very subjective).
Don't forget to calculate the costs of both solutions; you can easily install MySQL (free) or SQL Server (not free once you get past a certain point) on an EC2 instance, and you will know exactly what your monthly costs will be.
DynamoDB is priced very differently, so you really need to quantify your reads/writes and storage requirements in order to know what you are in for. Best to figure these things out ahead of time unless money is not a concern.
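The question's own numbers already pin down the averages; only the record size has to be assumed. A rough back-of-the-envelope sketch (assuming roughly 50 bytes per stored item including overhead, which is an assumption, not a measurement):

    // Sizing sketch based on the question's "1 million records per day".
    // The 50-byte record size is an assumed figure for {id, timestamp, price, size}
    // plus per-item overhead.
    #include <cstdio>

    int main() {
        const double records_per_day = 1000000.0;
        const double seconds_per_day = 86400.0;
        const double bytes_per_record = 50.0;   // assumed

        double avg_writes_per_sec = records_per_day / seconds_per_day;       // ~11.6
        double gb_per_year = records_per_day * 365 * bytes_per_record / 1e9; // ~18 GB
        std::printf("avg writes/sec: %.1f, storage/year: %.1f GB\n",
                    avg_writes_per_sec, gb_per_year);
        return 0;
    }

Keep in mind that DynamoDB's provisioned capacity has to be sized for peak throughput, not the daily average, so bursty market data can cost noticeably more than an average like this suggests.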

Related

C++ - Saving & Deleting [closed]

I'm creating a sample banking program for fun and I realised I needed some help.
I have it set up so that you can create two different types of bank accounts: one personal and one business. I want to be able to save the data entered into these accounts and also be able to delete data if the account gets "cancelled", etc.
My idea was to have a separate file created for each account holder, so each person or company would have a file containing their data (how much money, name, etc.). Is this a realistic approach, or would it just clutter things up and take a lot of space? Is there an easier (more efficient or faster) way of doing this?
Note: I don't want code for this question; I want an explanation of how best to approach this problem.
Thanks in advance! :)
You can do it with files.
However, as the program becomes more complex you'll eventually need more structure (like data shared between multiple accounts), atomicity (no intermediate state visible), transactions (being able to roll back some actions), more throughput, backups, reporting, aggregation, multi-system distribution, check-pointing, migrations, and so on. You can implement all of that on top of your file structure, but it's going to be hard.
Luckily there's already a simple solution: it's called a database. You can set up your own instance relatively easily, and it provides out of the box what you already need, plus a bunch of features you don't think you need right now but will likely want at some point in the future.
So check out a SQL database (like MySQL, PostgreSQL, SQLite, or the more advanced offerings from Microsoft or Oracle) or one of the NoSQL solutions offered by cloud providers (Bigtable, for example). At this point any of them is likely to satisfy your need to store, modify, and delete data.
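To make the "database instead of one file per account holder" idea concrete, here is a minimal sketch using SQLite's C API from C++ (SQLite keeps everything in a single local file, so there's no server to configure). The schema, an accounts table with holder, is_business, and balance columns, is invented for illustration; it's not the only way to model the two account types.

    // Sketch only: create an illustrative "accounts" table, add a row, and
    // "cancel" an account with a single DELETE instead of file bookkeeping.
    #include <sqlite3.h>
    #include <cstdio>

    int main() {
        sqlite3* db = nullptr;
        if (sqlite3_open("bank.db", &db) != SQLITE_OK) {
            std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }

        const char* sql =
            "CREATE TABLE IF NOT EXISTS accounts ("
            "  id INTEGER PRIMARY KEY,"
            "  holder TEXT NOT NULL,"
            "  is_business INTEGER NOT NULL DEFAULT 0,"
            "  balance REAL NOT NULL DEFAULT 0);"
            "INSERT INTO accounts (holder, is_business, balance) "
            "  VALUES ('Alice', 0, 100.0);";

        char* err = nullptr;
        if (sqlite3_exec(db, sql, nullptr, nullptr, &err) != SQLITE_OK) {
            std::fprintf(stderr, "exec failed: %s\n", err);
            sqlite3_free(err);
        }

        // Deleting a cancelled account:
        sqlite3_exec(db, "DELETE FROM accounts WHERE id = 1;", nullptr, nullptr, nullptr);

        sqlite3_close(db);
        return 0;
    }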

Approach to data-analysis [closed]

I'm looking to write a reporting tool. The data resides in a ~6 GB PostgreSQL database. The application is an online store/catalog application that has items and orders. The stakeholders are requesting a feature that will let them search for an item and see a count of all orders for it over the last 2 years.
Some rows contain quantities and units of measure, which would require multiplying quantity by unit of measure for each row.
It's also possible that other reporting functions will be necessary in the future.
I have not delved much into the data analysis aspect of programming. I enjoy Clojure, so I would be thrilled to find a solution that uses Clojure, but only if Clojure offers competitive tools for my needs.
Here are some options I'm considering:
merely SQL
Clojure
core.reducers
a Clojure Hadoop library
Hadoop
Can anyone shed some insight into these kinds of problems for me? Are there articles that you would recommend?
Hadoop is likely overkill for this project. Simply using clojure-jdbc or Korma to read the data from the database and filter/reduce it in Clojure is likely to be fine; at work we routinely work with sequences of that size, though this depends on the expected response time. You may need to do some preprocessing and caching if instantaneous responses are expected.

Advice on C++ database program [closed]

I want to build a simple program that searches through a local database and returns a list of results. Essentially the program will be a searchable archive. This will be my first project of this type and I'm looking for some advice.
Fast reading of the database is important. Writing can be slow.
There will be at most a few thousand records, and most records will probably contain less than 3 KB of text.
Records should be flexible when it comes to their fields. Some records will have the field "abc", others will not. The number of fields per record may vary.
Should I simply write my data structures in C++, and then serialize them? Does it make sense to use an existing (lightweight) database framework? If so, can you recommend any that are free and easy to use and preferably written in modern C++?
My target platform is Windows and I'm using the Visual Studio compiler.
Edit: the consensus is to use SQLite. Any recommendations as to which of the many SQLite C++ wrappers to use?
As commented by @Joachim, I would suggest SQLite. I have used it in C++ projects and it's straightforward to use. You basically put two files in your project, sqlite3.c and sqlite3.h, and then start coding to the interface (read the last paragraph of http://www.sqlite.org/selfcontained.html). You don't have to worry about configuring a database server. It's fast and lightweight. Read about transactions in SQLite if you need to speed some operations up.
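A minimal sketch of that approach, just to show the shape of the code. The records table, its title and body columns, and the LIKE-based search are placeholder assumptions about the archive's schema, not anything prescribed by SQLite.

    // Sketch only: open a local SQLite file and run a parameterised search
    // with a prepared statement (fast reads, no server process).
    #include <sqlite3.h>
    #include <cstdio>
    #include <string>

    int main() {
        sqlite3* db = nullptr;
        if (sqlite3_open("archive.db", &db) != SQLITE_OK) {
            std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }

        sqlite3_stmt* stmt = nullptr;
        const char* sql = "SELECT title, body FROM records WHERE body LIKE ?";
        if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK) {
            std::string pattern = "%search term%";
            sqlite3_bind_text(stmt, 1, pattern.c_str(), -1, SQLITE_TRANSIENT);

            while (sqlite3_step(stmt) == SQLITE_ROW) {
                std::printf("%s\n",
                    reinterpret_cast<const char*>(sqlite3_column_text(stmt, 0)));
            }
        }
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }

Because some records have fields that others lack, you can either add nullable columns, keep a separate key/value table per record, or store the variable part as a JSON/text blob; any of these works fine on a few thousand rows.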

Looking for C++ datawarehousing for time series data [closed]

I need a C++ library that can store and retrieve time series on demand to stream to client front-ends. I will be storing each component in structure-of-arrays format. I am currently using MySQL for correctness, but the DB access is starting to get ridiculously slow, and I am trying to migrate away from it. Intuitively I could build such a library myself, but it is not my business goal and it would take quite a bit of implementation to get working. I am looking for an existing solution that can meet the following requirements:
O(1) lookup scheme
Excellent compression, each component is separated, so there should be plenty of redundancy that can be removed
Scalable to terabytes
(optional: Audit tracking)
Most important: transactional support. There is going to be BIG data, and I can't risk a bad run corrupting an entire dataset, which would create an unnecessary burden of backups and downtime during restores.
Also check out TempoDB: http://tempo-db.com. I'm a co-founder, and we built the service to solve this problem. We don't have a C++ client yet, but we could work with you to develop one.
Take a look at OpenTSDB; it was developed at StumbleUpon by Benoit Sigoure:
http://opentsdb.net/
TeaFiles provide simple and efficient time series storage in flat files, enriched with item metadata and a description. They might be a building block of the system you aim for. Free open-source libraries currently exist for C++ (github.com/discretelogics/TeaFiles), C#, and Python.
I am a founder of discretelogics, and we created this file format to overcome the limitations of flat-file time series storage while preserving its unrivaled speed.
Take a look at HDF5. It has a fast lookup scheme and C, C++, and Python interfaces. It supports compression, scales to very large files, and maintains metadata. It doesn't do auditing, and you'll need a wrapper to handle multi-user access.
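To give a feel for the API, here is a minimal sketch of writing one column of a structure-of-arrays series (just prices) through HDF5's C API, which is perfectly usable from C++. The file and dataset names are arbitrary, and a real setup would also store a timestamp dataset and enable chunking plus compression (e.g. via H5Pset_deflate on the dataset creation property list).

    // Sketch only: write a 1-D array of doubles to an HDF5 dataset.
    #include <hdf5.h>
    #include <vector>

    int main() {
        std::vector<double> prices = {101.5, 101.6, 101.4, 101.8};

        hid_t file = H5Fcreate("ticks.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t dims[1] = {static_cast<hsize_t>(prices.size())};
        hid_t space = H5Screate_simple(1, dims, nullptr);

        hid_t dset = H5Dcreate2(file, "/price", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
                 prices.data());

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

Keeping each component (price, size, timestamp) in its own dataset matches the structure-of-arrays layout you describe and lets the compressor exploit the redundancy within each column.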

Guidelines for GUI design for a risk analysis app [closed]

In my free time, I'm working on a risk analysis application. I have already finished the mathematical and simulation engines, but I'm stuck with the design of the user interface. I want my application to be as easy-to-use as possible for Excel users, but I don't want to make it an Excel add-in, because Excel takes ages to load add-ins. So I'm going to use the old and venerable MFC.
I want to make these things easy in my application:
Modeling tasks:
Defining probability and uncertainty distributions
Defining mathematical relations between the variables
Separating uncertainty from variability (second-order risk modeling)
Validating the risk model
What-if (sensitivity) analysis
Data manipulation/display tasks:
Importing/exporting data from/to Excel and databases
Displaying nice graphs to the user
Do you know any guidelines I could take into consideration in the design of the user interface? The only examples I know, LINGO and Rockwell Arena, are actually examples of what NOT to do. Perhaps I will need to include a simple scripting language in the system but, in that case, it will be an option for advanced users, not for everybody.
1) For risk-specific functionality (at least in the financial world), one important guideline is to allow easy viewing of summary-level risk as well as easy drill-down to the details (e.g. from enterprise-wide down to security level).
2) Plus, don't forget standard GUI design guidelines: there's always Nielsen, and there's Joel Spolsky's design book and his series of articles on Joel on Software.
At a high level:
make sure your controls are intuitive (they do what the user expects them to),
minimize the amount of work (eye and hand movements) the user needs to do to accomplish the most frequent tasks,
allow easy linking (i.e. no dead ends: if you are displaying a list of securities, make it easy to jump from a security's name to the detail screen for that security),
and always, always usability-test.