Is there a maximum value for how high pk values for a model can get? For example, for something like an activity feed model, the pks can get really large.
Ex: "Kyle liked this post", "Alex commented on this post", etc. As you can see, an activity feed object is created for every action. Is there some sort of maximum threshold that will be reached? I've done some research but haven't found a concise answer. If there is some limit, how can one overcome it?
I currently use PostgreSQL as my database.
Django's auto-incrementing IDs are AutoFields, a subclass of IntegerField. They generate the primary keys as PostgreSQL integer columns, which hold signed 32-bit integers with a maximum value of 2147483647, a little over 2 billion.
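To get past that limit, you can declare the primary key as a BigAutoField, which maps to a PostgreSQL bigint (signed 64-bit, maximum 9223372036854775807). A minimal sketch, with a hypothetical activity feed model:

```python
from django.db import models

class Activity(models.Model):
    # BigAutoField maps to PostgreSQL bigint: signed 64-bit, so ids can
    # grow to 9223372036854775807 (about 9.2 quintillion) before overflow.
    id = models.BigAutoField(primary_key=True)
    verb = models.CharField(max_length=100)  # e.g. "liked", "commented on"
    created = models.DateTimeField(auto_now_add=True)
```

Since Django 3.2 you can also set DEFAULT_AUTO_FIELD = "django.db.models.BigAutoField" in settings.py to make 64-bit keys the default for every model.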
Note that PostgreSQL has no unsigned integer column types: a bigint primary key is a signed 64-bit integer, which gives you up to 2^63 - 1 (about 9.2 quintillion) different values. Even the default signed 32-bit integer key leaves us with 2147483647 possibilities. Unless you are Twitter or Facebook, you should never be bothered by this kind of limit.
Related
What is the best model field to store a frequency, from 1 Hz to 10 GHz?
IMHO it could be a PositiveBigIntegerField, but I'm not completely convinced...
Thank you
If you only need integer hertz, then go with PositiveBigIntegerField; nothing wrong with that.
If you (think you might) need non-integer values, you could store e.g. millihertz, or just go for a DecimalField and store hertz.
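A minimal sketch of both options (the model and field names are made up for illustration). 10 GHz is 10^10 Hz, which fits comfortably in a 64-bit field, and 14 total digits with 3 decimal places gives a DecimalField millihertz resolution over the whole range:

```python
from django.db import models

class Measurement(models.Model):
    # Option 1: integer hertz only. 10 GHz = 10_000_000_000 fits easily
    # in a PositiveBigIntegerField (maximum about 9.2 * 10**18).
    frequency_hz = models.PositiveBigIntegerField()

    # Option 2: fractional hertz. 11 digits before the decimal point
    # cover 10 GHz; 3 decimal places store down to millihertz.
    frequency_hz_exact = models.DecimalField(max_digits=14, decimal_places=3)
```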
I am moving from an RDBMS and I am not sure how best to design for the scenario below.
I have a table with around 200,000 questions, with question id as the partition key.
Users view questions, and I do not wish to show a viewed question to the user again. So which of these is the better option?
1. Have a table with question id as partition key and a set of user IDs as an attribute
2. Have a table with user id as partition key and a set of question IDs they have viewed as an attribute
3. Have a table with question id as partition key and user id as sort key; once a user has viewed a question, add a row to this table
Options 1 and 2 might have a problem with the 400 KB size limit per item. The third seems the better option, though I would end up with 100 million items, as there will be one row per user per question viewed. But I assume this is not a problem for DynamoDB?
Another problem is how to get 10 random questions not viewed by the user. Do I generate 10 random numbers between 1 and 200,000 (the number of questions) and then check that each is not in the table mentioned in option 3 above?
I definitely would not go with option 1 or 2, for the reason you mentioned: you would already be limiting your scalability with the 400 KB item limit. With 128-bit (16-byte) UUIDs, a single item could hold only about 25,000 users per question before hitting it.
Option 3 is the way to go with DynamoDB, but you need to consider which attribute is the partition key and which is the range key: you could also flip it and use user_id as the partition key and question_id as the range key. That decision depends on how your data is going to be accessed. DynamoDB divides the total table throughput across your partition keys, so each of your n partition keys effectively gets 1/nth of it. If a subset of partition keys is accessed more than the others, you won't be utilizing your table throughput efficiently: the keys that actually use less than 1/nth of the throughput are still provisioned for 1/nth of it. The general idea is that you want each of your partition keys utilized roughly equally. I think you have it correct as written, assuming that each question is served randomly and is no more popular than another, while some users might be more active than others.
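As a sketch, here is what option 3 could look like with boto3 (the table and attribute names are my own invention): question_id as the partition key, user_id as the sort key, and one write per view:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical "have-read" table for option 3: one item per
# (question, user) pair, written when the user views the question.
table = dynamodb.create_table(
    TableName="QuestionViews",
    KeySchema=[
        {"AttributeName": "question_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "user_id", "KeyType": "RANGE"},     # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "question_id", "AttributeType": "N"},
        {"AttributeName": "user_id", "AttributeType": "S"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
table.wait_until_exists()

# Record that a user has viewed a question.
table.put_item(Item={"question_id": 42, "user_id": "kyle"})
```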
The other part of your question is a little more difficult to answer. You could do it your way, with a table containing a question-user pair for each question a user has read, or you could have a table containing the pairs for the questions they haven't read. The tradeoff here is between initial write cost and subsequent read cost, and the answer depends on the number of questions you have compared to the rate at which users consume them.
When you have a large number of questions compared to the rate at which users progress through them, the chance of randomly selecting an already-seen one is small, so you're going to want to store have-read question-user pairs. With this setup you don't pay a lot to initialize a user (you don't have to write a question-user pair for each question) and you won't have a lot of miss-read costs (i.e. where you select a question-user pair and it turns out they already read it; this still consumes read capacity).
If you have a small number of questions compared to the rate at which users consume them, then you're going to want to store the haven't-read question-user pairs. You pay something to initialize each user (writing one question-user pair per question), but then you don't have any accidental miss-reads. If you stored them as have-read pairs when there is a small number of questions, you would encounter a lot of miss-reads as the percentage of read questions approaches 100% (to the point where you would have been better off setting them up as haven't-read pairs).
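For the random-selection part of your question, here is a minimal sketch against the have-read table above (names are hypothetical, and UnprocessedKeys handling is omitted for brevity):

```python
import random
import boto3

dynamodb = boto3.resource("dynamodb")
NUM_QUESTIONS = 200_000  # total question count, per the question above

def pick_unread(user_id, count=10):
    """Draw random question ids and drop the ones this user has read."""
    unread = set()
    while len(unread) < count:
        candidates = random.sample(range(1, NUM_QUESTIONS + 1), count)
        resp = dynamodb.batch_get_item(RequestItems={
            "QuestionViews": {
                "Keys": [{"question_id": q, "user_id": user_id}
                         for q in candidates],
            }
        })
        # Any pair that comes back is a miss-read: the user saw it already.
        seen = {int(item["question_id"])
                for item in resp["Responses"].get("QuestionViews", [])}
        unread.update(q for q in candidates if q not in seen)
    return list(unread)[:count]
```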
I hope this helps with your design considerations. Drop a comment if you need clarification!
I am designing the primary key for storing products. I have looked around for some insight into how to design the ID, as using auto-increment is too boring. Does anyone know what the code 'KB46279860I' on the banknote below means?
[picture of a 100 USD banknote]
I think that code is not just an auto-increment but uses some algorithm, like a check digit, etc.
Could anyone give me some hints? Thanks!!
If you're not planning on showing the user your ID, then auto-increment could save you processing time, as it is handled by your database directly.
If you are planning on showing the user an ID other than the one in the database, you could consider using Hashids, a GUID, or generating your own unique random value with a check digit. You can use the Luhn or Damm algorithm for the check digit.
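For instance, a minimal sketch of computing a Luhn check digit (the algorithm is standard; the helper function is my own):

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit to append to a string of decimal digits."""
    total = 0
    # Walk the digits right to left; double every second digit starting
    # with the rightmost, folding two-digit results back into one digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10

print(luhn_check_digit("7992739871"))  # 3, so the full ID is 79927398713
```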
I want to store rows that have 65536 columns in an SQLite database, and I am doing that using C++ and Qt.
My question is: since the default maximum number of columns seems to be 2000, how do I configure this parameter from C++ and Qt?
Thank you.
The SQLite website has some explanation of this:
2. Maximum Number Of Columns
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper
bound (...)
and
The default setting for SQLITE_MAX_COLUMN is 2000. You can change it
at compile time to values as large as 32767. On the other hand, many
experienced database designers will argue that a well-normalized
database will never need more than 100 columns in a table.
So even if you increased it, you could only achieve half of what you want (32767 < 65536). Apart from that I can only refer to Styne666's comment on your post.
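Your code is C++/Qt, but as a language-neutral sketch of the normalization route the quote hints at, here is the idea with Python's built-in sqlite3 module (the file and table names are made up): store one (row, column, value) triple per cell instead of 65536 physical columns.

```python
import sqlite3

# Normalization workaround: instead of one table with 65536 columns,
# store one (row_id, col_index, value) record per cell.
conn = sqlite3.connect("wide_data.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS cells (
        row_id    INTEGER NOT NULL,
        col_index INTEGER NOT NULL,  -- 0 .. 65535
        value     REAL,
        PRIMARY KEY (row_id, col_index)
    )
""")

# Write one logical "row" of 65536 values as 65536 cell records.
conn.executemany("INSERT OR REPLACE INTO cells VALUES (?, ?, ?)",
                 [(1, i, float(i)) for i in range(65536)])
conn.commit()

# Read the logical row back, ordered by column index.
values = [v for (v,) in conn.execute(
    "SELECT value FROM cells WHERE row_id = ? ORDER BY col_index", (1,))]
```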
I'm saving data to a PostgreSQL backend through Django. Many of the fields in my models are DecimalFields set to arbitrarily high max_digits and decimal_places, corresponding to numeric columns in the database backend. The data in each column have a precision (or number of decimal places) that is not known a priori, and each datum in a given column need not have the same precision.
For example, arguments to a model may look like:
{'dist': Decimal("94.3"), 'dist_e': Decimal("1.2")}
{'dist': Decimal("117"), 'dist_e': Decimal("4")}
where the keys are database column names.
Upon output, I need to preserve and redisplay those data with the precision with which they were read in. In other words, after the database is queried, the displayed data need to look exactly like the data that were read in, with no additional or missing trailing 0's in the decimals. When queried, however, either in a Django shell or in the admin interface, all of the DecimalField data come back with many trailing 0's.
I have seen similar questions answered for money values, where the precision (2 decimal places) is both known and the same for all data in a given column. However, how might one best preserve the exact precision represented by Decimal values in Django and numeric values in PostgreSQL when the precision is not the same and not known beforehand?
EDIT:
Possibly an additional useful piece of information: when viewing the table to which the data are saved in a Django dbshell, the many trailing 0's are also present. The Python Decimal value is apparently converted to the maximum-precision value specified in the models.py file upon being saved to the PostgreSQL backend.
If you need perfect parity forwards and backwards, you'll need to use a CharField. Any number-based database field is going to munge your data in some way or another. Now, I know you mentioned not being able to know the digit length of the data points, and a CharField requires some length. You can either set it arbitrarily high (1000, 2000, etc.) or, I suppose, use a TextField instead.
However, with either approach, you're going to be wasting a lot of database resources in most scenarios. I would suggest modifying your approach so that extra zeros at the end don't matter (for display purposes you could always chop them off), or so that the precision is no longer arbitrary.
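If you go the chop-them-off route, Decimal.normalize() does most of the work, though note its quirk with integers that end in zeros:

```python
from decimal import Decimal

print(Decimal("94.3000").normalize())  # 94.3
print(Decimal("117.000").normalize())  # 117
# Caveat: trailing zeros *before* the decimal point are folded into the
# exponent, so plain integers may need special-casing for display.
print(Decimal("11800").normalize())    # 1.18E+4
```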
Since I asked this question a while ago and the answer remains the same, I'll share what I found, should it be helpful to anyone in a similar position. Django doesn't have the ability to take advantage of the PostgreSQL numeric column type with arbitrary precision. In order to preserve the display precision of data I upload to my database, and in order to be able to perform mathematical calculations on values obtained from database queries without first recasting strings into Python Decimal types, I opted to add an extra precision column for every numerical column in the database.
The precision value is an integer indicating how many digits after the decimal point are required. The datum 4.350 is assigned a value of 3 in its corresponding precision column. Normally displayed integers (e.g. 2531) have a precision entry of 0. However, large integers reported in scientific notation are assigned a negative integer to preserve their display precision. The value 4.320E+33, for example, gets the precision entry -3. The database recognizes that all objects with negative precision values should be re-displayed in scientific notation.
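A minimal sketch of that redisplay convention (the helper function is hypothetical, not taken from my actual code):

```python
from decimal import Decimal

def redisplay(value: Decimal, precision: int) -> str:
    """Rebuild the original display string from a value and its precision column."""
    if precision >= 0:
        # Non-negative precision: fixed-point with that many decimal places.
        return f"{value:.{precision}f}"
    # Negative precision marks scientific notation; its magnitude is the
    # number of digits after the decimal point in the mantissa.
    return f"{value:.{-precision}E}"

print(redisplay(Decimal("4.35"), 3))       # 4.350
print(redisplay(Decimal("2531"), 0))       # 2531
print(redisplay(Decimal("4.32E+33"), -3))  # 4.320E+33
```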
This solution adds some complexity to the structure and code surrounding the database, but it has proven effective. It also allows me to accurately preserve precision through calculations like converting to/from log and linear values.