Strong Consistency when you need to query multiple entities (thousands) - python-2.7

In an application that has many 'shops', every registered admin user has a 'shop' entity. Each shop sells items, and each item belongs to a certain 'category'. A shop has multiple clients (hundreds in some cases), and each client has an account to follow up on their purchases and past orders. Each shop generates invoices for its clients, and the clients pay the invoices.
Admin User --> Shop
Shop --> clients
     |-> item categories
     |-> items
     |-> invoices
     |-> payments received
An admin page shows a report of the invoices within the year (from January to December); this page is a client requirement. The shop manually generates a new invoice when a purchase is made and records a payment when it is paid. Note: this all happens in the physical shop; there are no online client purchases.
As a single shop generates hundreds of invoices and hundreds of payments per month, showing a full year easily reaches thousands of entities on a single page.
To optimize loading the page and generating the yearly sales report (total sales, revenue, payments, etc.), we thought we'd structure the data so that each item category per year is also an entity. This means that whenever a purchase is made for an item in a category, we need to add the item's purchase price to the itemCategory entity for that year, in the slot for that month.
itemCategory model:
    from google.appengine.ext import ndb

    class itemCategory(ndb.Model):
        shopID = ndb.KeyProperty()
        year = ndb.IntegerProperty()
        monthly_sales = ndb.FloatProperty(repeated=True)  # one total per month, 12 entries
This way we can load the entire sales table by reading just the list of itemCategory entities for this shop for this year, instead of reading all individual purchases through the year. This would save lots of Datastore reads and decrease page load time, at the expense of an extra read, sum and write to this summary-like entity on every purchase.
Category      Jan    Feb    Mar    ...  Dec
-------------------------------------------
Men's shoes   1000   1300   850    ...  1400
Kids shoes    600    850    650    ...  900
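The roll-up described above can be sketched in plain Python. This is only an illustration of the bookkeeping, not Datastore code: the `summaries` dict stands in for the itemCategory entities, and the function names and the `(shop_id, category, year)` key are assumptions made for the example.

```python
from collections import defaultdict

# In-memory stand-in for the summary entities (an assumption for this sketch).
# Key: (shop_id, category, year) -> list of 12 monthly totals.
summaries = defaultdict(lambda: [0.0] * 12)


def record_purchase(shop_id, category, year, month, price):
    """Add one purchase to the monthly roll-up (month is 1..12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    summaries[(shop_id, category, year)][month - 1] += price


def yearly_report(shop_id, year):
    """Return {category: [12 monthly totals]} for one shop and one year."""
    return {
        cat: list(months)
        for (sid, cat, yr), months in summaries.items()
        if sid == shop_id and yr == year
    }


record_purchase("shop-1", "Men's shoes", 2019, 1, 600.0)
record_purchase("shop-1", "Men's shoes", 2019, 1, 400.0)
record_purchase("shop-1", "Kids shoes", 2019, 2, 850.0)
report = yearly_report("shop-1", 2019)
```

The report then needs one record per category instead of one per purchase, which is the saving the summary entity is meant to buy.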
The challenge at this point is that strong consistency is essential, both for individual purchases and for the itemCategory entities. If the shop adds multiple purchases in quick succession, then with eventual consistency itemCategory might not yet have been updated with the last purchase sum, resulting in wrong sales values. The same applies to an individual purchase: if it has to be edited right after it was added, a query for the entity without its ID might return no results. So it seems that ancestor queries are essential here, with perhaps the shop as the parent entity. Yet this will result in a contention issue later on (at least until Datastore is migrated to Firestore), with all those entities (thousands in this case!) having one single parent!
The same goes for invoices: generating a new invoice means knowing the latest invoice number, so that numbers are always in sequence without gaps. Querying invoices with eventual consistency may result in duplicate invoice numbers.
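One common way around querying for the latest invoice number is to keep a single counter record per shop and increment it atomically. The sketch below only imitates that idea in memory: the `InvoiceCounter` class is hypothetical, and the lock stands in for whatever transaction the datastore provides around a counter entity fetched by key.

```python
import threading


class InvoiceCounter(object):
    """In-memory stand-in for one counter record per shop (hypothetical).

    In a real datastore the read-increment-write below would run inside a
    transaction on a counter entity fetched by key; the lock here only
    imitates that atomicity.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._last = {}  # shop_id -> last invoice number issued

    def next_number(self, shop_id):
        with self._lock:
            number = self._last.get(shop_id, 0) + 1
            self._last[shop_id] = number
            return number


counter = InvoiceCounter()
issued = []


def issue_many(shop_id, n):
    for _ in range(n):
        issued.append(counter.next_number(shop_id))


# Four concurrent "cashiers" each issue 50 invoices for the same shop.
threads = [
    threading.Thread(target=issue_many, args=("shop-1", 50)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

To also avoid gaps, the invoice itself would have to be written in the same transaction as the counter update; otherwise a failed invoice write leaves a number unused.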
What is the optimal way to structure the data at this point for strong consistency? Unfortunately the project has been around for a few years and was started on Google Datastore rather than Cloud SQL (which seems more appropriate for this kind of project). Hopefully all these issues go away after the migration to Firestore, which has strong consistency for all reads.

Consider exporting the data and then importing it into a Cloud Firestore in Datastore mode project. No more eventual consistency issues.

There are several ways you can achieve strong consistency.
Query using the key. Whenever you read an object via its key, the read is strongly consistent.
Another approach would be to use NDB asynchronous operations. See the NDB documentation on asynchronous operations.
A really naive approach would be to add a delay, but the delay has to be long enough for the object to have been updated.
And the final approach could be to export the data into Cloud Firestore. There, reads are always strongly consistent.
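Reading by key only helps if the key can be derived without running a query first. A minimal sketch of that idea, assuming the summary records are named after the shop, category and year: the `summary_key_name` helper, the `|` separator and the `store` dict are all hypothetical stand-ins, not NDB calls.

```python
def summary_key_name(shop_id, category, year):
    """Build a predictable key name for one summary record."""
    return "{}|{}|{}".format(shop_id, category, year)


# Stand-in for the datastore: key name -> entity (assumption for this sketch).
store = {}


def get_or_create_summary(shop_id, category, year):
    """Fetch the summary by its derived key, creating it on first use."""
    name = summary_key_name(shop_id, category, year)
    if name not in store:
        store[name] = {"monthly_sales": [0.0] * 12}
    return store[name]
```

With a derived key name, both the write on purchase and the read for the report are key lookups rather than queries, so they do not depend on index propagation.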
Hope this answers your question!!!

Related

How to build events aggregation service for high load system with DynamoDB

I'm working on an Ad-tech system which serves millions of users.
Basically, users (non-anonymous users) can see different ads that are created by the marketing team.
Our marketing team wants to be able to set frequency caps on those ads (among other targeting rules they already have).
For example:
"We should not show this ad to a user if they have already seen/clicked this ad more than X times in the last Y days"
Ads can also be grouped into campaigns, so rules like this are also possible:
"We should not show this ad to a user if they have viewed ads in this campaign more than X times in the last Y days".
Our marketing team might also want to know how many people viewed/clicked a specific ad in the last Y days.
We have roughly 200K RPM and our responses should be very fast.
The smallest unit of time for our queries is one day and it will not change.
A few questions and thoughts:
Is DynamoDB a good fit?
I thought about creating a table for each event type (Click/View/Close...).
What is the best way to configure the primary key?
I thought about setting the partition key to the user id and the sort key to a combination of the ad id and the current day {dd/mm/yyyy}.
I thought about using the "ADD" operation to increase the counter when a user clicks/views/... an ad on a specific date. Are these expensive operations? Do I have an alternative?
What is the best way to also be able to query per ad and per campaign (for example: "all user views for all ads in a campaign" or "all ad views in the last 40 days")?
What other considerations should I keep in mind?
Thanks a lot
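The key layout and the "ADD" counter described in the question can be sketched as plain request parameters, without calling AWS. The table name, the attribute names (`user_id`, `ad_day`, `hits`) and the `#` separator are assumptions for the example. One deliberate change from the question: the day is written as YYYY-MM-DD rather than dd/mm/yyyy, because ISO dates sort lexicographically in date order, which a "last Y days" range condition on the sort key relies on.

```python
from datetime import date, timedelta


def sort_key(ad_id, day):
    """Sort key "<ad_id>#<YYYY-MM-DD>" (hypothetical layout)."""
    return "{}#{}".format(ad_id, day.isoformat())


def counter_update_args(table, user_id, ad_id, day, amount=1):
    """Build keyword arguments for a DynamoDB UpdateItem request."""
    return {
        "TableName": table,
        "Key": {
            "user_id": {"S": user_id},
            "ad_day": {"S": sort_key(ad_id, day)},
        },
        # ADD creates the numeric attribute if missing, then increments it.
        "UpdateExpression": "ADD #hits :inc",
        "ExpressionAttributeNames": {"#hits": "hits"},
        "ExpressionAttributeValues": {":inc": {"N": str(amount)}},
    }


def last_days_bounds(ad_id, end_day, days):
    """Lower and upper sort-key bounds covering the last `days` days."""
    start_day = end_day - timedelta(days=days - 1)
    return sort_key(ad_id, start_day), sort_key(ad_id, end_day)


args = counter_update_args("ad_views", "user-42", "ad-7", date(2020, 3, 9))
bounds = last_days_bounds("ad-7", date(2020, 3, 9), 10)
```

The two bounds would be used in a BETWEEN condition on the sort key when querying one user's counters for one ad. Queries per ad or per campaign across all users need a different access path, since this layout is partitioned by user.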

Django create a view to generate a receipt

I want to create a small app that creates a kind of receipt record in a db table from two other tables, very much like a receipt from a grocery store, where a cashier makes a sale and the ticket contains multiple items, calculates a subtotal and total, and stores the values in the database. I currently have 3 tables: the Ticket table, where I would like to insert the results of all calculations and the ticket info; the services table, which acts like an inventory of the available services and holds the name and price of each service; and my responsible table, which has a list of "cashiers" (the people who make the sale) and their commission percentages. I have the views to create, edit and delete cashiers and services.
What I don't have is a way to create the ticket. I am completely lost. Can you point me to the correct path on what to look for? I am learning to program, so I don't have a lot of knowledge about this, or whether it's even possible. I don't need the system to print; I just want to have all records stored, so that later on I can expand on it and create reports of sold items, who sold them, and how much commission each seller has earned.
You need to create relationships from the Ticket model (table) to the two other models (tables). Luckily you don't have to create the relations in the database tables yourself. Use Django's ForeignKey model fields to accomplish this. Here is the documentation link:
Django Models
You may need to read it several times to get the concepts thoroughly.
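To make the relationships concrete, here is a framework-free sketch of the three tables and the ticket calculation. It uses plain dataclasses so it runs without a Django project; in Django each class would be a model and the references marked below would be ForeignKey fields. All class and field names are assumptions, since the question does not show its models.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class Service:
    name: str
    price: Decimal


@dataclass
class Responsible:
    name: str
    commission_pct: Decimal  # e.g. Decimal("10") means 10 percent


@dataclass
class TicketLine:
    service: Service  # plays the role of a foreign key to Service
    quantity: int = 1


@dataclass
class Ticket:
    responsible: Responsible  # plays the role of a foreign key to Responsible
    lines: List[TicketLine] = field(default_factory=list)

    def total(self):
        """Sum of price * quantity over all lines."""
        return sum(
            (line.service.price * line.quantity for line in self.lines),
            Decimal("0"),
        )

    def commission(self):
        """Seller's share of the ticket total."""
        return self.total() * self.responsible.commission_pct / Decimal("100")


ticket = Ticket(
    Responsible("Ana", Decimal("10")),
    [
        TicketLine(Service("Haircut", Decimal("20.00"))),
        TicketLine(Service("Shave", Decimal("7.50")), quantity=2),
    ],
)
```

The separate line class is what lets one ticket hold several services, which a single foreign key from Ticket to Service could not express.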

Django Copy Related Data and keep them unchanged over time

Using ForeignKey relationships, I want to be able to copy data and store it in another model. For example, think of how you would handle past Invoices and billed Services.
The Invoice would have one or more Services associated with it, along with the prices for those Services. The price of a Service can and will change over time, but the Service price recorded with the Invoice should remain as it was when the Invoice was created.
My first thought was to create a pdf from the resulting data and store it. But this would make the data somewhat inaccessible.
Is there somehow a way to copy the data and keep them accessible?
This is a pretty broad problem with multiple solutions. I don't think that what you're aiming for is the correct one.
One rule for saving invoices is that invoices never change. You should never update an invoice. So not only should your 'copies' of invoices remain the same, but the original should too.
Also, you should have an InvoiceItem (or InvoiceRow) model for the items on your invoice. Don't bind Products to an Invoice directly.
Here are 2 solutions I've used:
Solution 1
You can denormalize the data on your invoice items. So, don't use foreign keys, but copy all data about the product, so that the product info (including the price) is saved within the invoice item itself.
Solution 2
Give your products revision numbers. Every time a product is updated (a name or price change, for example), a new product row is created in the database. Now you can link the InvoiceItem to a Product with a foreign key, and it will be historically accurate.
I'm sure there are some guides/best practices for creating invoice backends. Language or framework is not important. Invoicing is really important, so do a lot of research before starting to build something. That's just my two pennies.
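Both solutions can be sketched without a framework. The dataclasses below are hypothetical stand-ins for the models: `InvoiceItem` copies the product data at invoicing time (Solution 1), while `ProductRevision` is never modified and a change produces a new revision (Solution 2).

```python
from dataclasses import dataclass, replace
from decimal import Decimal


@dataclass
class Product:
    name: str
    price: Decimal


@dataclass(frozen=True)
class InvoiceItem:
    """Solution 1: product data copied at invoicing time, no live link."""
    product_name: str
    unit_price: Decimal
    quantity: int

    @classmethod
    def from_product(cls, product, quantity):
        return cls(product.name, product.price, quantity)

    def line_total(self):
        return self.unit_price * self.quantity


@dataclass(frozen=True)
class ProductRevision:
    """Solution 2: every change creates a new immutable revision."""
    product_id: int
    revision: int
    name: str
    price: Decimal

    def revised(self, **changes):
        return replace(self, revision=self.revision + 1, **changes)


hosting = Product("Hosting", Decimal("10.00"))
item = InvoiceItem.from_product(hosting, 3)
hosting.price = Decimal("12.00")  # a later price change

rev1 = ProductRevision(1, 1, "Hosting", Decimal("10.00"))
rev2 = rev1.revised(price=Decimal("12.00"))
```

In both cases the invoiced price stays at 10.00 after the change. Solution 1 keeps the invoice self-contained, while Solution 2 keeps a link back to the product's history.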

django models - best practice for annual subscription management

I'm looking for advice on best practice for annual subscription management where the fees may change each year.
I have a Membership and MemshipYear models as well as the User models. Each membership category (membership.category) has an annual fee which may be different. Members will be able to download pdf invoices for membership fees at any time once logged in.
The pdf is generated at the time of the request, and the data is taken from the membership model. Therefore, if the membership fee changes after one year, the invoice would be calculated with the new figure and not with the fee for that year.
One thought I had was to use price banding, i.e. A-F, and have a price band for each category for each year.
I reckon there's a better way. Anyone?
Logically, the membership itself should have a price, since that inherently belongs to the user. On creation, you would set this value from the category. Then, whenever you need to get the current price the user is paying, you pull it from their membership, instead of the variable price on the category.
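A minimal sketch of that suggestion, assuming one record per member per year: the fee is copied from the category when the record is created, so a later change to the category fee does not affect earlier years. The `category_fees` table, the `enroll` helper and the field names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

# Current fee per category; this is the value that changes from year to year.
category_fees = {"student": Decimal("20.00"), "full": Decimal("50.00")}


@dataclass(frozen=True)
class MembershipYear:
    member: str
    category: str
    year: int
    fee: Decimal  # copied from the category when the record is created


def enroll(member, category, year):
    """Create the yearly record with the fee in force at that moment."""
    return MembershipYear(member, category, year, category_fees[category])


m2019 = enroll("alice", "full", 2019)
category_fees["full"] = Decimal("55.00")  # fee increase for the next year
m2020 = enroll("alice", "full", 2020)
```

An invoice generated on request would then read `fee` from the yearly record rather than from the category.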

Django webapp - tracking financial account information

I need some coding advice as I am worried that I am creating, well, bloated code that is inefficient.
I have a webapp that keeps track of a company's financial data. I have a table called Accounts with a collection of records corresponding to the typical financial accounts such as revenue, cash, accounts payable, accounts receivable, and so on. These records are simply name holders to be pointed at as foreign keys.
I also have a table called Account_Transaction which records all the transactions of money in and out of all the accounts in Accounts. Essentially, the Account_Transaction table does all the heavy lifting while pointing to the various accounts being altered.
For example, when a sale is made, two records are created in the Account_Transaction table. One record to increase the cash balance and a second record to increase the revenue balance.
Trans Record 1:
    Acct: Cash
    Amt: 50.00
    Date: Nov 1, 2011
Trans Record 2:
    Acct: Revenue
    Amt: 50.00
    Date: Nov 1, 2011
So now I have two records, but they each point to a different account. Now if I want to view my cash balance, I have to look at each Account_Transaction record and check if the record deals with Cash. If so, add/subtract the amount of that record and move to the next.
During a typical business day, there may be upwards of 200-300 transactions like the one above. As such, the Account_Transaction table will grow pretty quickly. After a few months, the table could have a few thousand records. Granted this isn't much for a database, however, every time the user wants to know the current balance of, say, accounts receivable, I have to traverse the entire Account_Transaction table to sum up all records that deal with the account name "Accounts Receivable".
I'm not sure I have designed this in the most optimal manner. I had considered creating a distinct table for each account (one for "Cash", another for "Accounts Receivable" another for "Revenue" etc...), but with that approach I was creating 15-20 tables with the exact same parameters, other than their name. This seemed like poor design so I went with this Account_Transaction idea.
Does this seem like an appropriate way to handle this kind of data? Is there a better way to do this that I should really be adopting?
Thanks!
Why do you need to iterate through all the records to figure out the status of the Accounts Receivable account? Unless I'm missing something, you can just use a .filter within the Django ORM to selectively pick the records you need.
As your records grow, you could add some date filtering to your reports. In most cases, your accountant will only want numbers for this quarter, month, etc., not the entire historic data.
Add an index to that column to optimize selection, and then check out Django's aggregation to Sum up values in your database.
Finally, you could do some conservative caching to speed things up for "quick view" style reports where you just want a total number very quickly, but you need to be careful not to serve stale totals, so resetting that cache on any change to the records would be a must.
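The filter, index and Sum steps above come down to one aggregate query run by the database. The sketch below shows that query with the standard-library sqlite3 module so it runs on its own; in Django the same thing would be written roughly as a `.filter(...)` followed by `.aggregate(Sum(...))`. The table layout, the column names and the choice to store amounts as integer cents are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_transaction ("
    " id INTEGER PRIMARY KEY,"
    " account TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL,"
    " tx_date TEXT NOT NULL)"
)
# Index on the columns used for filtering, as the answer suggests.
conn.execute(
    "CREATE INDEX idx_tx_account_date"
    " ON account_transaction (account, tx_date)"
)
conn.executemany(
    "INSERT INTO account_transaction (account, amount_cents, tx_date)"
    " VALUES (?, ?, ?)",
    [
        ("Cash", 5000, "2011-11-01"),
        ("Revenue", 5000, "2011-11-01"),
        ("Cash", -1200, "2011-11-02"),
        ("Cash", 3000, "2011-12-05"),
    ],
)
conn.commit()


def balance(account, start=None, end=None):
    """Sum one account's transactions, optionally within a date range."""
    sql = (
        "SELECT COALESCE(SUM(amount_cents), 0)"
        " FROM account_transaction WHERE account = ?"
    )
    params = [account]
    if start is not None:
        sql += " AND tx_date >= ?"
        params.append(start)
    if end is not None:
        sql += " AND tx_date <= ?"
        params.append(end)
    return conn.execute(sql, params).fetchone()[0]


cash_total = balance("Cash")
cash_november = balance("Cash", "2011-11-01", "2011-11-30")
```

The application never loops over rows itself; the database returns a single number per call.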
Why don't you keep track of the exact available amount in the Account table? The Account_Transaction table would then only be used to view the transaction history.
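If the balance is stored on the account, the history row and the balance update have to be written together, or the two will drift apart. A minimal sketch of that using sqlite3 from the standard library (in Django the same grouping would typically be done inside `transaction.atomic()`); the table and column names and the `post` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE account_transaction ("
    " id INTEGER PRIMARY KEY,"
    " account TEXT NOT NULL,"
    " amount_cents INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO account (name) VALUES (?)", [("Cash",), ("Revenue",)]
)
conn.commit()


def post(account, amount_cents):
    """Record a transaction and update the running balance atomically."""
    # "with conn" commits both statements together, or rolls both back
    # if an exception is raised inside the block.
    with conn:
        conn.execute(
            "INSERT INTO account_transaction (account, amount_cents)"
            " VALUES (?, ?)",
            (account, amount_cents),
        )
        cur = conn.execute(
            "UPDATE account SET balance_cents = balance_cents + ?"
            " WHERE name = ?",
            (amount_cents, account),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown account: {}".format(account))


def current_balance(account):
    row = conn.execute(
        "SELECT balance_cents FROM account WHERE name = ?", (account,)
    ).fetchone()
    return row[0]


post("Cash", 5000)
post("Revenue", 5000)
post("Cash", -1200)
```

Reading a balance is then a single-row lookup, at the cost of one extra write per transaction.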