Designing an expense tracker application with a document database like MongoDB

I'm getting started on the design of an application that tracks expenses.
I'm using MongoDB, mainly to get familiar with document-oriented databases.
If I start with a doc design that has one doc per day, and that doc has info like where each dollar was spent, and the amount, am I necessarily starting off in the wrong direction?
I eventually want to slice and dice all of the data like how much was spent at Target between two dates, how much was spent in restaurants for a month, stuff like that.
My question is if I start by having a design that is day oriented, will I get into any trouble right away?

I think that would be just fine. You can make the _id anything you want, but consider making it milliseconds since the Epoch. That might make range queries easier to work with. You can also embed the string version of the date in each document so you don't always have to parse the _id field.
I don't think you'll get into trouble with that design, but prepare to learn a lot when it comes to writing queries in Mongo. Try to stay within their recommendations for writing queries or things can get very slow.
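To make that concrete, here is a rough sketch of what day-oriented documents and the "how much at Target between two dates" question could look like. The field names (`expenses`, `merchant`, `category`, `amount_cents`) and the helper functions are made up for illustration, not anything MongoDB prescribes. The first function answers the question in plain Python over a list of documents; the second builds what I'd expect the matching aggregation pipeline to look like, as plain data you could hand to a driver.

```python
from datetime import date, datetime, timezone


def day_id(d: date) -> int:
    """Milliseconds since the Epoch for midnight UTC on the given day."""
    midnight = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(midnight.timestamp() * 1000)


# Hypothetical documents: one per day, purchases embedded as a list.
# Amounts are stored as integer cents to avoid floating-point drift.
days = [
    {"_id": day_id(date(2024, 3, 1)), "date": "2024-03-01", "expenses": [
        {"merchant": "Target", "category": "household", "amount_cents": 4250},
        {"merchant": "Luigi's", "category": "restaurant", "amount_cents": 1800},
    ]},
    {"_id": day_id(date(2024, 3, 2)), "date": "2024-03-02", "expenses": [
        {"merchant": "Target", "category": "household", "amount_cents": 1299},
    ]},
    {"_id": day_id(date(2024, 4, 5)), "date": "2024-04-05", "expenses": [
        {"merchant": "Target", "category": "household", "amount_cents": 999},
    ]},
]


def total_spent(docs, start: date, end: date, **criteria) -> int:
    """Sum cents for expenses on days in [start, end] matching all criteria."""
    lo, hi = day_id(start), day_id(end)
    total = 0
    for doc in docs:
        if lo <= doc["_id"] <= hi:
            for expense in doc["expenses"]:
                if all(expense.get(k) == v for k, v in criteria.items()):
                    total += expense["amount_cents"]
    return total


def spend_pipeline(start: date, end: date, **criteria) -> list:
    """The same question expressed as an aggregation pipeline (plain data)."""
    return [
        # Range query on _id picks the days of interest.
        {"$match": {"_id": {"$gte": day_id(start), "$lte": day_id(end)}}},
        # One output document per embedded expense.
        {"$unwind": "$expenses"},
        {"$match": {f"expenses.{k}": v for k, v in criteria.items()}},
        {"$group": {"_id": None, "total": {"$sum": "$expenses.amount_cents"}}},
    ]


march_target = total_spent(days, date(2024, 3, 1), date(2024, 3, 31), merchant="Target")
march_restaurants = total_spent(days, date(2024, 3, 1), date(2024, 3, 31), category="restaurant")
```

The point of the sketch is that a day-oriented layout does not block the slice-and-dice queries: the embedded list just has to be unwound before filtering on merchant or category.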

Related

How to structure a daily habit tracking app

Background
I'm a personal trainer and trying to build an app that tracks whether or not my clients are working on their daily habits (and bugs them about it at a chosen time every day). I'd love to hear if any of you all have ideas on how to structure this as I'm new to both Django and coding in general.
Models
I currently have two models: Habits and Checks.
Habits represent "What are you working on improving?" and have a ForeignKey to a user.
Checks represent "Did you complete your habit today?" and have a ForeignKey to a habit.
Current status
There is a nice solution where you create all the Checks for a Habit based on its end date, but I'm trying to structure this with an indefinite end date because, as a coach, I can then show hard data when someone isn't making progress. Though I am still willing to accept that maybe this app would work better if habits had deadlines.
I wrote a custom manage.py script that Heroku runs automatically at the same time every day, but that doesn't scale with users' individual time zones. I run it manually on my local computer.
I originally tried getting it to work with Celery but that did not go well on my Windows machine.
Should I push the script out a day or week in advance and hide the days that are in the future?
Should I avoid the script and just create a year's worth of rows and hope they don't want to track it for more than a year?
Is there a better option?
Help requested
The two issues I'm having at this point:
How can I have a Check created for each day? Is there a better way than what I've done already?
How can I make the timezone for each day relative to the user?
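One possible direction, sketched in plain Python rather than Django so it can be run on its own: store a time zone per user, work out what "today" is in that zone, and create only the Checks that are missing up to that date. That could run from a frequent scheduled job or lazily when the user opens the app, so nothing has to be created a year ahead. The function names below are hypothetical, and in a real app the `tz` argument would presumably be something like `ZoneInfo("America/Chicago")` built from a field on the user; fixed offsets are used here only to keep the example self-contained.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import List, Optional


def local_today(tz: tzinfo, now_utc: datetime) -> date:
    """The calendar date it currently is for a user in time zone `tz`."""
    return now_utc.astimezone(tz).date()


def missing_check_dates(last_check: Optional[date], habit_start: date,
                        today: date) -> List[date]:
    """Dates that still need a Check row, oldest first.

    Starts the day after the most recent Check (or at the habit's start
    date if there is none) and runs through the user's local today.
    """
    first = habit_start if last_check is None else last_check + timedelta(days=1)
    return [first + timedelta(days=i) for i in range((today - first).days + 1)]


# 03:00 UTC on 2 March is still 1 March for a user at UTC-8,
# but already 2 March for a user at UTC+9.
now = datetime(2024, 3, 2, 3, 0, tzinfo=timezone.utc)
west = local_today(timezone(timedelta(hours=-8)), now)
east = local_today(timezone(timedelta(hours=9)), now)
to_create = missing_check_dates(date(2024, 2, 27), date(2024, 2, 1), west)
```

Each date in `to_create` would become one Check row. Because the list is derived from the last existing Check, running the job more than once in a day, or skipping a day, should not create duplicates or gaps.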

How to make a BusinessHourField for a business directory Django project?

I am working on a business directory in django and am trying to figure out the best method for representing a store's business hours, with an open and close time for each day of the week.
So far, the options I have considered and been unsatisfied with are:
- Use one CommaSeparatedIntegerField to hold versions of the times in one parseable unit, but I'd rather deal with time objects if I can, with their various attributes and such.
- Explicitly create a TimeField for each day's open and close, but this just sounds like a terrible idea that would take time and be too complicated.
- Build a ScheduleField that would handle it, but then I am unclear on the best way to relate it to a pre-existing model field or make it work with the database.
I know any of you can beat any of those. Care to share?
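For comparison, a fourth shape worth considering is one row per store per weekday, each with an opening and a closing time (in Django terms, presumably a small related model with a ForeignKey to the store and two TimeFields). The sketch below shows only the data shape and the lookup in plain Python; the class and function names are hypothetical, and it deliberately ignores hours that run past midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable


@dataclass(frozen=True)
class OpeningHours:
    weekday: int   # 0 = Monday ... 6 = Sunday, matching datetime.weekday()
    opens: time
    closes: time


def is_open(hours: Iterable[OpeningHours], when: datetime) -> bool:
    """True if `when` falls inside any opening period for its weekday."""
    return any(
        h.weekday == when.weekday() and h.opens <= when.time() < h.closes
        for h in hours
    )


# A store open Monday 09:00-17:00 and Saturday 10:00-14:00.
store_hours = [
    OpeningHours(weekday=0, opens=time(9, 0), closes=time(17, 0)),
    OpeningHours(weekday=5, opens=time(10, 0), closes=time(14, 0)),
]

monday_morning = is_open(store_hours, datetime(2024, 3, 4, 10, 0))
sunday_noon = is_open(store_hours, datetime(2024, 3, 3, 12, 0))
```

A day with no row is simply closed, and a split day (closed over lunch, say) is two rows for the same weekday, which is harder to express with fourteen fixed fields.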

How to make Django query using extra() portable across database backends?

I have a little project that involved article archive browsing by year of publication.
I used the trick given in this other question to build a list of article publication years and article counts for those years. It works pretty well on my test server with SQLite. Since the production server will rely on PostgreSQL I am looking for a way to achieve the same thing in PostgreSQL and ended up toying with the EXTRACT keyword. I use something like "import settings" to detect the current database backend and execute the right query.
My point is that all of this looks more and more like a dirty and crappy hack that solves the issue in a very inelegant, untestable and poorly maintainable way. As a beginner web programmer, I ask my experienced elders:
How would you deal with that correctly?
As an option to the raw sql:
You can calculate the count per year with the ORM (e.g. How to use Django ORM to get a list by year of all articles with an article count).
Then you store that value somewhere (in a model or in cache ...) in order not to be overwhelmed by the slowness of the ORM calculation.
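A minimal sketch of that idea, assuming the publication dates have already been fetched as plain `date` objects (in Django, presumably a simple query on a hypothetical `pub_date` field). The counting happens in Python, so it does not depend on SQLite or PostgreSQL date functions; the function name is made up for illustration.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def counts_by_year(pub_dates: Iterable[date]) -> List[Tuple[int, int]]:
    """Return (year, article_count) pairs, most recent year first."""
    counts = Counter(d.year for d in pub_dates)
    return sorted(counts.items(), reverse=True)


# Stand-in for dates pulled from the article table.
dates = [date(2009, 1, 15), date(2010, 5, 5), date(2009, 7, 7)]
archive = counts_by_year(dates)
```

The resulting list is small, so it is a natural thing to keep in a cache or a summary table and recompute only when an article is published.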

How can I start with data mining for small grocery shop

My company got a project to build a simple website for a grocery shop, with a catalogue only and no shopping cart. A few days ago I read something about data mining from here
I found that it is possible to do some predictive modelling like this:
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer.
I told them about this example and they would be happy if I could do something like that.
Now I don't know how or where to start. I know MySQL and can write complex queries as well, but I don't know how I can get the kind of result in the beer and diapers example.
I have 3-4 months left. Can anyone guide me on how to start?
I also don't know what kind of customer shopping data I can get from the shop; it may be Excel files.
But I want to start.
Judging by your question, you don't seem to know much, if anything, about data mining. That being said, you can get something usable running in 4 months, especially in a very restricted domain like a web shop, where all you are after is probably buying patterns for a start.
Please understand that you cannot expect some out-of-the-box solution that can be posted here in 10 lines of code, so I suggest you start by reading a decent book on the subject. I'd recommend:
Programming Collective Intelligence: Building Smart Web 2.0 Applications
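To give a feel for what "buying patterns" means in the beer and diapers example, here is a toy sketch of the simplest form of it: counting how often pairs of items appear in the same basket, then reporting support (share of all baskets containing both) and confidence (share of baskets with the first item that also contain the second). This is not a full association-rule miner, the baskets are invented, and the function name is hypothetical. It assumes the shop's sales data can be turned into one list of items per transaction.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def pair_rules(baskets: List[Iterable[str]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map (a, b) -> (support, confidence) for the rule 'a implies b'."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        rules[(a, b)] = (support, together / item_counts[a])
        rules[(b, a)] = (support, together / item_counts[b])
    return rules


# Invented transactions, one list of items per purchase.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["milk", "bread"],
]
rules = pair_rules(baskets)
support, confidence = rules[("diapers", "beer")]
```

Here two of the five baskets contain both diapers and beer, and two of the three diaper baskets also contain beer. With real data you would filter on a minimum support so that rare coincidences don't show up as patterns.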

What is data mining from a developer's perspective?

I can find the technical explanation of what data mining is in a book or on Wikipedia, but I'm wondering what sort of development does it exactly involve? Is it more about using tools or more about writing tools? Is it really any much different from other domains when it comes to R&D?
Data Mining is the process of discovering interesting patterns in large amounts of data. It is not querying data, which is just what user Treb describes (sorry Treb).
To understand DM from a developer's perspective, you should read the book Programming Collective Intelligence by Toby Segaran.
In my experience (I'm a former data miner :-)), it's a mixture of using tools and writing tools. A lot of the time, the tools you need to analyse the particular data set don't exist, so you have to write them yourself first. It can be very interesting but you often need quite a different approach to the sort of programming I do now (embedded wireless), for example.
You really ought to change the accepted answer on this question so it doesn't mislead those who come across it.
Saying that querying a database IS data mining because "[h]ow would you discover any pattern in your data without querying first?" is like saying opening your car door is driving because "how else would you be able to drive somewhere without opening the car door first."
You can read your data out of a text file if you want. My first data mining assignment used data sets from the UCI repository and those are almost all text files.
If you want to learn about data mining start by looking up clustering and classification. Learn about decision trees and rule based classification. Then look at k-nearest-neighbor and k-means. After that if you really want to see what data mining is all about look at Chameleon, DBScan, and Support Vector Machines. Don't necessarily learn the minutiae of the last three (they're pretty complex and math heavy) but understanding the abstract idea of what happens will tell you all you need to know in order to use the many tools and libraries that are available for each strategy.
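As a taste of how small the core idea of one of these can be, here is a bare-bones k-nearest-neighbor classifier in plain Python: to label a new point, find the k closest labelled points and take a majority vote. It is only a sketch for intuition, with invented data and a hypothetical function name; real libraries add scaling, faster neighbor search and tie handling.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(train: List[Tuple[Sequence[float], str]],
                 point: Sequence[float], k: int = 3) -> str:
    """Label `point` by majority vote among its k nearest training rows."""
    # Sort training rows by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Two invented clusters of 2-D points.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
near_a = knn_classify(train, (2, 2))
near_b = knn_classify(train, (7, 7))
```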
These are only the algorithms that popped into my head just now. There are so many others that I don't recall or don't even know yet.
Data mining is about searching large quantities of data for hidden patterns. Web 2.0 example: News Corp uses its site myspace.com as a large data mine to determine what movies and products to promote. They write software to identify trends in the data that its users post to the site. News Corp does this to gather information useful for advertising campaigns and market predictions. It's different from other domains of R&D in that, from a data giver's perspective, it's passive. Rather than going out on the street and asking people in person what movies they are likely to see this summer and other such questions, the data mining tools sort out these things by analyzing data given by users voluntarily.
Wikipedia actually does have a pretty good article on it:
- http://en.wikipedia.org/wiki/Data_mining
Data mining, as I say, is finding patterns or trends in given data. From a developer's perspective, it might show up in applications like anti-money-laundering, where you search the data for a given pattern. Another use is in projection software, where you project a future result or outcome against a heuristic by recognizing the current trend in the data.
I think it's more about using off-the-shelf tools than developing your own. An academic example of that kind of tool might be WEKA. Of course, you still have to know which algorithms to use, how to preprocess the data (this part is very important), etc.
As for R&D, I don't have much of an idea, but it should be like almost everything: maths, statistics, more maths...
On the development level, data mining is just another database application, but with a huge amount of data.
The mining itself is done by running specific queries on the database. It's in the creation of the queries where the important work is done. They of course depend on the data model, and on the hypotheses, what sort of trends the customer expects to find.
Therefore, the fine tuning of the queries usually can't be done in development, but only once the system is live and you have live data. Then the user can test his hypotheses and adapt the queries to show him the trends he is looking for.
So from a dev point of view, data mining is about:
- Managing large sets of data in your client (one query may return 100,000 rows of data)
- Providing the user (who may know nothing about SQL or relational databases in general) with an effective way to modify his queries and view the results.