Django webapp: tracking financial account information

I need some coding advice as I am worried that I am creating, well, bloated code that is inefficient.
I have a webapp that keeps track of a company's financial data. I have a table called Accounts with records corresponding to typical financial accounts such as revenue, cash, accounts payable, accounts receivable, and so on. These records are simply name holders that other tables point to via foreign keys.
I also have a table called Account_Transaction which records all the transactions of money in and out of all the accounts in Accounts. Essentially, the Account_Transaction table does all the heavy lifting while pointing to the various accounts being altered.
For example, when a sale is made, two records are created in the Account_Transaction table. One record to increase the cash balance and a second record to increase the revenue balance.
Trans Record 1:
Acct: Cash
Amt: 50.00
Date: Nov 1, 2011
Trans Record 2:
Acct: Revenue
Amt: 50.00
Date: Nov 1, 2011
So now I have two records, but they each point to a different account. Now if I want to view my cash balance, I have to look at each Account_Transaction record and check if the record deals with Cash. If so, add/subtract the amount of that record and move to the next.
During a typical business day, there may be upwards of 200-300 transactions like the one above, so the Account_Transaction table will grow quickly. After a few months, it could hold tens of thousands of records. Granted, this isn't much for a database; however, every time the user wants to know the current balance of, say, accounts receivable, I have to traverse the entire Account_Transaction table to sum all records that deal with the account named "Accounts Receivable".
I'm not sure I have designed this optimally. I had considered creating a separate table for each account (one for "Cash", another for "Accounts Receivable", another for "Revenue", etc.), but that approach meant 15-20 tables with exactly the same columns, differing only in name. This seemed like poor design, so I went with the Account_Transaction idea.
Does this seem like an appropriate way to handle this kind of data? Is there a better way to do this that I should really be adopting?
Thanks!

Why do you need to iterate through all the records to figure out the status of the Accounts Receivable account? Am I missing something, or can't you just use .filter() in the Django ORM to select only the records you need?
As your records grow, you could add date filtering to your reports. In most cases, your accountant will only want numbers for this quarter, month, etc., not the entire history.
Add an index to that column to speed up selection, and then check out Django's aggregation to sum the values in your database.
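In the Django ORM this would look roughly like `Account_Transaction.objects.filter(acct__name="Accounts Receivable").aggregate(total=Sum("amt"))` with `from django.db.models import Sum`, where the field names `acct` and `amt` are guesses based on the record layout in the question (Django already indexes ForeignKey columns by default). To keep the sketch runnable without a Django project, it uses the standard-library `sqlite3` module instead and shows the kind of SQL involved: a filtered, indexed `SUM` rather than a Python loop over every row. The tables, columns and the `make_db`/`record_sale`/`balance` helpers are hypothetical.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory schema mirroring the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE account_transaction (
            id INTEGER PRIMARY KEY,
            account_id INTEGER NOT NULL REFERENCES account(id),
            amount_cents INTEGER NOT NULL,  -- money as integer cents
            date TEXT NOT NULL              -- ISO 8601, so text order == date order
        );
        -- Filter column first, then date, so date-range filters can use it too.
        CREATE INDEX ix_txn_account_date ON account_transaction (account_id, date);
    """)
    conn.executemany("INSERT INTO account (name) VALUES (?)",
                     [("Cash",), ("Revenue",), ("Accounts Receivable",)])
    conn.commit()
    return conn


def record_sale(conn, amount_cents, date):
    """Double entry from the question: one row for Cash, one for Revenue."""
    with conn:  # both rows are committed together, or neither is
        for name in ("Cash", "Revenue"):
            conn.execute(
                "INSERT INTO account_transaction (account_id, amount_cents, date) "
                "SELECT id, ?, ? FROM account WHERE name = ?",
                (amount_cents, date, name))


def balance(conn, account_name, start=None, end=None):
    """Let the database filter and sum instead of looping in Python."""
    sql = ("SELECT COALESCE(SUM(t.amount_cents), 0) "
           "FROM account_transaction AS t JOIN account AS a ON a.id = t.account_id "
           "WHERE a.name = ?")
    params = [account_name]
    if start is not None:
        sql += " AND t.date >= ?"
        params.append(start)
    if end is not None:
        sql += " AND t.date <= ?"
        params.append(end)
    return conn.execute(sql, params).fetchone()[0]
```

The optional `start`/`end` arguments cover the date-filtered reports mentioned above, for example `balance(conn, "Cash", start="2011-11-01", end="2011-11-30")`.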
Finally, you could do some conservative caching to speed up "quick view" style reports where you just want a total very quickly, but be careful not to serve stale totals: resetting the cache on any change to the records would be a must.
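A minimal sketch of the "reset on any change" rule, using a plain dictionary as the cache. In a Django project you would more likely use `django.core.cache` and call `cache.delete(...)` from a `post_save`/`post_delete` signal handler; the `BalanceCache` and `Ledger` classes here are hypothetical and only demonstrate the invalidation logic.

```python
class BalanceCache:
    """Memoizes per-account totals; callers must invalidate on every write."""

    def __init__(self, compute):
        self._compute = compute  # e.g. the SUM query for one account
        self._totals = {}
        self.misses = 0          # how often the expensive path ran

    def get(self, account):
        if account not in self._totals:
            self.misses += 1
            self._totals[account] = self._compute(account)
        return self._totals[account]

    def invalidate(self, account):
        self._totals.pop(account, None)


class Ledger:
    """Hypothetical write path: every new transaction evicts its account's total."""

    def __init__(self):
        self.rows = []  # (account, amount_cents)
        self.cache = BalanceCache(self._sum)

    def _sum(self, account):
        return sum(amount for acct, amount in self.rows if acct == account)

    def add(self, account, amount_cents):
        self.rows.append((account, amount_cents))
        self.cache.invalidate(account)  # reset on any change to the records
```

The important property is that every write goes through code that evicts the affected account; a write that bypasses it (for example a raw SQL update) would leave a stale total behind.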

Why don't you store the current balance of each account in the Accounts table? The Account_Transaction table could then be used only to view transaction history.
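A sketch of the stored-balance idea, again with `sqlite3` standing in for the real database and hypothetical table and column names. The history row and the balance change are written in one transaction, and the balance is changed with a relative `balance_cents = balance_cents + ?` update rather than a read-then-write. In Django the rough equivalent would be `transaction.atomic()` together with `Account.objects.filter(pk=...).update(balance=F("balance") + amount)`, assuming a `balance` field on the model.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            balance_cents INTEGER NOT NULL DEFAULT 0  -- denormalized running total
        );
        CREATE TABLE account_transaction (
            id INTEGER PRIMARY KEY,
            account_id INTEGER NOT NULL REFERENCES account(id),
            amount_cents INTEGER NOT NULL,
            date TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO account (name) VALUES (?)", [("Cash",), ("Revenue",)])
    conn.commit()
    return conn


def post(conn, account_name, amount_cents, date):
    """Append a history row and adjust the stored balance atomically."""
    with conn:  # commit both statements, or roll both back
        row = conn.execute("SELECT id FROM account WHERE name = ?",
                           (account_name,)).fetchone()
        if row is None:
            raise KeyError(account_name)
        conn.execute("INSERT INTO account_transaction (account_id, amount_cents, date) "
                     "VALUES (?, ?, ?)", (row[0], amount_cents, date))
        # Relative update, so concurrent writers don't overwrite each other.
        conn.execute("UPDATE account SET balance_cents = balance_cents + ? WHERE id = ?",
                     (amount_cents, row[0]))


def current_balance(conn, account_name):
    """O(1) read: no scan of the transaction history."""
    return conn.execute("SELECT balance_cents FROM account WHERE name = ?",
                        (account_name,)).fetchone()[0]
```

Because the stored balance duplicates information held in the history, it is worth occasionally checking that it still equals the `SUM` over that account's transactions.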

Related

How to build events aggregation service for high load system with DynamoDB

I'm working on an Ad-tech system which serves millions of users.
Users (non-anonymous ones) can see different ads created by the marketing team.
Our marketing team wants to be able to set frequency caps on those ads (in addition to the other targeting rules they already have).
For example:
"We should not show this ad to a user if they have already seen/clicked it more than X times in the last Y days"
Ads can also be grouped into campaigns, so rules like this are also possible:
"We should not show this ad to a user if they have viewed ads in this campaign more than X times in the last Y days".
Our marketing team may also want to know how many people viewed/clicked a specific ad in the last Y days.
We have roughly 200K RPM and our responses should be very fast.
The smallest unit of time for our queries is one day and it will not change.
A few questions and thoughts:
Is DynamoDB a good fit?
I thought about creating a table for each event type (Click/View/Close...)
What is the best way to configure the primary key?
I thought about setting the partition key to the user ID and the sort key to a combination of the ad ID and the current day {dd/mm/yyyy}
I thought about using the "ADD" update operation to increment the counter when a user clicks/views/... an ad on a specific date. Are these operations expensive? Do I have an alternative?
What is the best way to also be able to query per ad and per campaign (for example, "all user views for all ads in a campaign" or "all ad views in the last 40 days")?
What other considerations should I keep in mind?
Thanks a lot
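A sketch of the key layout described in the question, under a few assumptions: the attribute names `user_id`, `sk` and `views` are hypothetical, and the day is stored as ISO `yyyy-mm-dd` instead of `dd/mm/yyyy`, because ISO dates sort lexicographically in date order, which lets one `BETWEEN` on the sort key cover the last Y days. The helpers only build the keyword arguments that boto3's `Table.update_item` and `Table.query` accept; an in-memory class stands in for the table so the frequency-cap check runs without AWS.

```python
from collections import defaultdict
from datetime import date, timedelta


def sort_key(ad_id, day):
    # "#" after the ad id keeps ad "42" from matching ad "421" in range queries.
    return f"AD#{ad_id}#{day.isoformat()}"


def increment_params(user_id, ad_id, day, counter="views"):
    """kwargs for boto3's Table.update_item: atomic ADD on a per-day counter."""
    return {
        "Key": {"user_id": user_id, "sk": sort_key(ad_id, day)},
        "UpdateExpression": "ADD #c :one",
        "ExpressionAttributeNames": {"#c": counter},
        "ExpressionAttributeValues": {":one": 1},
    }


def window_query_params(user_id, ad_id, today, days):
    """kwargs for boto3's Table.query: this ad's counters for the last `days` days."""
    first = today - timedelta(days=days - 1)
    return {
        "KeyConditionExpression": "user_id = :u AND sk BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":u": user_id,
            ":lo": sort_key(ad_id, first),
            ":hi": sort_key(ad_id, today),
        },
    }


class InMemoryTable:
    """Stand-in for the DynamoDB table (only understands the ADD above)."""

    def __init__(self):
        self.items = defaultdict(lambda: defaultdict(int))  # (user, sk) -> {attr: n}

    def update_item(self, Key, UpdateExpression, ExpressionAttributeNames,
                    ExpressionAttributeValues):
        attr = ExpressionAttributeNames["#c"]
        self.items[(Key["user_id"], Key["sk"])][attr] += ExpressionAttributeValues[":one"]

    def sum_window(self, params, counter="views"):
        v = params["ExpressionAttributeValues"]
        return sum(item[counter] for (user, sk), item in self.items.items()
                   if user == v[":u"] and v[":lo"] <= sk <= v[":hi"])


def over_cap(table, user_id, ad_id, today, days, x):
    """True if the user saw the ad more than x times in the last `days` days."""
    return table.sum_window(window_query_params(user_id, ad_id, today, days)) > x
```

With one item per user, ad and day, a Y-day cap check reads at most Y small items from a single partition.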

Are these access patterns possible with a NoSQL database like DynamoDB?

I am relatively new to NoSQL database structures.
As I see it, the following access patterns involve relations and analytical queries, in which case a SQL data model would be a better approach than a NoSQL one.
I am wondering whether the following access patterns are even possible with DynamoDB, or whether one has to first load the data from DynamoDB into, e.g., a Lambda function to process it.
Get all customers this month whose spending increased by more than 25% compared to last month
Get a customer's annual spending (only monthly spending is stored in DynamoDB)
Get customers with spending greater than 0 in a specific timeframe
Get all orders in a specific timeframe where the customer who placed the order has the following attributes (female, 25 years old, 170 cm tall)
Get all active customers, active software suppliers, and active raw-material suppliers for a given timeframe
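If the data does have to be pulled out first (the Lambda option mentioned above), the first two patterns reduce to simple aggregation over the monthly records. A sketch, assuming each record is a `(customer_id, "YYYY-MM", amount)` tuple (a hypothetical shape) and that customers with no spending in the previous month are left out of the growth check:

```python
from collections import defaultdict


def monthly_totals(rows):
    """rows: iterable of (customer_id, "YYYY-MM", amount) -> {(customer, month): total}"""
    totals = defaultdict(int)
    for customer, month, amount in rows:
        totals[(customer, month)] += amount
    return totals


def grew_more_than(rows, prev_month, this_month, threshold=0.25):
    """Customers whose spending in this_month exceeds prev_month by > threshold."""
    totals = monthly_totals(rows)
    result = []
    for customer in sorted({c for c, _ in totals}):
        before = totals.get((customer, prev_month), 0)
        now = totals.get((customer, this_month), 0)
        # Assumption: customers with no spending last month are left out.
        if before > 0 and now > before * (1 + threshold):
            result.append(customer)
    return result


def annual_spending(rows, customer_id, year):
    """Sum the stored monthly figures for one customer and one year."""
    prefix = f"{year}-"
    return sum(amount for customer, month, amount in rows
               if customer == customer_id and month.startswith(prefix))
```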

Strong Consistency when you need to query multiple entities (thousands)

In an application with many 'shops', every registered admin user has a 'shop' entity. Each shop sells items, and each item belongs to a certain 'category'. A shop has multiple clients (hundreds in some cases), and each client has an account to follow up on their purchases and past orders. Each shop generates invoices for its clients, and clients pay the invoices.
Admin User -- > Shop
Shop ---> clients
|-> items Categories
|-> items
|-> invoices
|-> payments received
An admin page shows a report of invoices within the year (January to December); this page is a client requirement. The shop manually generates a new invoice when a purchase is made and records a payment when it is paid. Note: all of this happens in the physical shop; there are no online client purchases.
Since a single shop generates a few hundred invoices and a few hundred payments per month, showing a full year easily means thousands of entities on a single page.
To optimize loading the page and generating the yearly sales report (total sales, revenue, payments, etc.), we thought we'd structure the data so that each item category per year is also an entity. This means that whenever a purchase is made for an item in a category, we need to add the item's purchase price to that month's total in the category's entity for that year.
itemCategory Model:
from google.appengine.ext import ndb

class itemCategory(ndb.Model):
    shopID = ndb.KeyProperty()
    year = ndb.IntegerProperty()
    monthly_sales = ndb.FloatProperty(repeated=True)  # 12 monthly totals
This way we can load the entire sales table by reading just the list of itemCategory entities for this shop for this year, instead of reading all individual purchases throughout the year. This would save many Datastore reads and decrease page load time, at the expense of an extra read, sum and write to this summary-like entity.
Category Jan Feb Mar ... Dec
--------------------------------------
Men's shoes 1000 1300 850 ... 1400
Kids shoes 600 850 650 ... 900
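A sketch of the summary-entity update, with a dictionary standing in for Datastore and a `threading.Lock` playing the role of a transaction. Two details are meant to carry over to NDB: the summary gets a deterministic id (shop, category, year), so it can be fetched by key rather than by query, and the read, sum and write happen as one unit. In NDB that would roughly mean a function decorated with `@ndb.transactional` that gets the entity by its key and puts it back; the helper names here are hypothetical.

```python
import threading


def summary_key(shop_id, category, year):
    # Deterministic id, so the summary can be read by key instead of by query.
    return f"{shop_id}:{category}:{year}"


class SummaryStore:
    """In-memory stand-in for itemCategory entities (12 monthly totals each)."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()  # plays the role of a datastore transaction

    def record_purchase(self, shop_id, category, year, month, price_cents):
        key = summary_key(shop_id, category, year)
        with self._lock:  # read, sum and write as one unit
            monthly = self._entities.setdefault(key, [0] * 12)
            monthly[month - 1] += price_cents

    def row(self, shop_id, category, year):
        """One row of the yearly report: Jan..Dec totals (a copy)."""
        return list(self._entities.get(summary_key(shop_id, category, year), [0] * 12))
```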
The challenge at this point is that strong consistency is essential, both for individual purchases and for the itemCategory entities. If the shop adds several purchases in quick succession, then with eventual consistency the itemCategory entity might not yet reflect the last purchase, resulting in wrong sales values. The same applies to individual purchases: if one needs to be edited right after it was added, a query for the entity without its ID might return no results. So ancestor queries seem essential here, perhaps with the shop as the parent entity. Yet this will lead to contention later on (at least until Datastore is migrated to Firestore), with all those entities (thousands in this case!) sharing a single parent!
The same goes for invoices: generating a new invoice requires knowing the latest invoice number so that numbers are always sequential without gaps. Querying for invoices with eventual consistency may result in duplicate invoice numbers.
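One common way to get gap-free numbers is a per-shop counter that is incremented in the same transaction that writes the invoice, so a failed invoice also rolls back its number. The sketch uses `sqlite3` as a stand-in for the datastore, with hypothetical table names; in Datastore the counter would be an entity fetched by key inside a transaction.

```python
import sqlite3


def make_db():
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE invoice_counter (
            shop_id TEXT PRIMARY KEY,
            last_number INTEGER NOT NULL
        );
        CREATE TABLE invoice (
            shop_id TEXT NOT NULL,
            number INTEGER NOT NULL,
            total_cents INTEGER NOT NULL,
            PRIMARY KEY (shop_id, number)
        );
    """)
    return conn


def create_invoice(conn, shop_id, total_cents):
    """Allocate the next number and write the invoice in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the counter
    try:
        conn.execute("INSERT OR IGNORE INTO invoice_counter VALUES (?, 0)", (shop_id,))
        conn.execute("UPDATE invoice_counter SET last_number = last_number + 1 "
                     "WHERE shop_id = ?", (shop_id,))
        (number,) = conn.execute("SELECT last_number FROM invoice_counter "
                                 "WHERE shop_id = ?", (shop_id,)).fetchone()
        conn.execute("INSERT INTO invoice VALUES (?, ?, ?)",
                     (shop_id, number, total_cents))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # a failed invoice does not consume a number
        raise
    return number
```

Since the write lock is held from `BEGIN IMMEDIATE` until `COMMIT`, two concurrent requests cannot obtain the same number.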
What is the best way to structure the data for strong consistency at this point? Unfortunately, the project has been around for a few years and was started on Google Datastore rather than Cloud SQL (which seems more appropriate for this kind of project). Hopefully all these issues go away after the migration to Firestore, which has strong consistency for all reads.
Consider exporting the data and importing it into a Cloud Firestore in Datastore mode project. No more eventual consistency issues.
There are several ways to achieve strong consistency.
Look up by key. Reading an entity by its key is always strongly consistent.
Another approach would be to use NDB asynchronous operations; see the NDB documentation on them.
A really naive approach would be to add a delay, but it must be long enough for the entity to be updated.
The final approach would be to export the data into Cloud Firestore, where reads are always strongly consistent.
Hope this answers your question!!!

How to solve "hot" hash key issue (space skewed data) in DynamoDB?

For example, I am using DynamoDB to store product purchase records. The hash key is product ID and the range key is purchase time.
Some popular products can have a lot of purchase records (space skewed) so that read/write requests can get throttled for "hot" partitions while other partitions are not using full throughput.
How can I solve this problem and still be able to get the latest purchase records? Thanks!
You can use a caching solution to achieve this.
When designing the table, you can follow the guidelines for caching popular items:
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.CachePopularItems
My solution is to use ElastiCache (Redis): keep a list of the latest purchases per product and trim it to the last 100 purchases, for example:
LPUSH product:100 2016-08-13:purchaseId
LTRIM product:100 0 99
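LPUSH adds the new purchase to the head of the list, and LTRIM keeps only the first 100 entries (indices 0 to 99), i.e. the 100 most recent purchases.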
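The list semantics can be checked without a Redis server: the sketch below mimics `LPUSH` followed by `LTRIM key 0 99` with a bounded `collections.deque`. With the redis-py client the write would roughly be `pipe = r.pipeline(); pipe.lpush(key, entry); pipe.ltrim(key, 0, 99); pipe.execute()` and the read `r.lrange(key, 0, n - 1)`; the `RecentPurchases` class is hypothetical.

```python
from collections import deque


class RecentPurchases:
    """Mimics LPUSH + LTRIM 0 limit-1: newest first, at most `limit` per product."""

    def __init__(self, limit=100):
        self.limit = limit
        self._lists = {}

    def push(self, product_id, purchase):
        key = f"product:{product_id}"
        lst = self._lists.setdefault(key, deque(maxlen=self.limit))
        # LPUSH: insert at the head. A full deque drops from the tail,
        # which is exactly what LTRIM key 0 limit-1 does afterwards.
        lst.appendleft(purchase)

    def latest(self, product_id, n=10):
        """Like LRANGE key 0 n-1: the n most recent purchases."""
        lst = self._lists.get(f"product:{product_id}", deque())
        return list(lst)[:n]
```

Reads of the latest purchases then hit the cache, so the hot product's partition in DynamoDB only has to absorb the writes.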
I hope this helps.

Django: copy related data and keep it unchanged over time

Using ForeignKey relationships, I want to be able to copy data and store it in another model. For example, think of how you would handle past Invoices and billed Services.
The Invoice would have one or more Services associated with it, each with a price. The price of a Service can and will change over time, but the Service price recorded with the Invoice should remain as it was when the Invoice was created.
My first thought was to create a pdf from the resulting data and store it. But this would make the data somewhat inaccessible.
Is there a way to copy the data and keep it accessible?
This is a pretty broad problem with multiple solutions. I don't think the one you're aiming for is the correct one.
One rule for saving invoices is that invoices never change. You should never update an invoice. So not only should your 'copies' of invoices remain the same, but the original should too.
Also, you should have an InvoiceItem (or InvoiceRow) model for the items on your invoice. Don't bind Products to an Invoice directly.
Here are 2 solutions I've used:
Solution 1
You can denormalize the data on your invoice items. Instead of using foreign keys, copy all relevant product data, including the price, into the invoice item.
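A sketch of Solution 1 using plain dataclasses rather than Django models so it runs on its own; the class and field names are hypothetical. The invoice item copies the product's name and price at billing time and holds no reference back to the product. In Django this would roughly correspond to `CharField`/`DecimalField` columns on `InvoiceItem` instead of a `ForeignKey` to `Product`.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Product:              # mutable catalogue entry
    sku: str
    name: str
    price: Decimal


@dataclass(frozen=True)
class InvoiceItem:          # immutable snapshot; no reference back to Product
    sku: str
    name: str
    unit_price: Decimal
    quantity: int

    @property
    def line_total(self) -> Decimal:
        return self.unit_price * self.quantity


def bill(product: Product, quantity: int) -> InvoiceItem:
    # Copy the fields at billing time; later price changes don't leak in.
    return InvoiceItem(product.sku, product.name, product.price, quantity)
```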
Solution 2
Give your products revision numbers. Every time a product is updated (a name or price change, for example), a new product row is created in the database. You can then link the InvoiceItem to a Product with a ForeignKey, and it will remain historically accurate.
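A sketch of Solution 2, again with hypothetical dataclasses standing in for models: saving a product appends a new revision instead of updating in place, and an invoice item points at one specific revision. In Django the link would be a `ForeignKey` to the revision model, perhaps with `on_delete=models.PROTECT` so that a referenced revision cannot be deleted.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ProductRevision:
    product_id: int   # stable identity shared by all revisions
    revision: int
    name: str
    price: Decimal


class Catalogue:
    def __init__(self):
        self._revisions = {}  # product_id -> list of revisions, oldest first

    def save(self, product_id, name, price):
        """Never update in place: every change appends a new revision."""
        revs = self._revisions.setdefault(product_id, [])
        rev = ProductRevision(product_id, len(revs) + 1, name, price)
        revs.append(rev)
        return rev

    def current(self, product_id):
        return self._revisions[product_id][-1]


@dataclass(frozen=True)
class InvoiceItem:
    product: ProductRevision  # the "foreign key" targets one specific revision
    quantity: int
```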
I'm sure there are guides and best practices for building invoicing backends; the language or framework is not important. Invoicing is really important, so do a lot of research before starting to build something. That's just my two pennies.