Drawbacks of using an integer as a bitfield? - django

I have a bunch of boolean options for things like "acceptable payment types", which can include cash, credit card, cheque, paypal, etc. Rather than having a half dozen booleans in my DB, I can just use an integer and assign each payment method a bit, like so:
PAYMENT_METHODS = (
    (1<<0, 'Cash'),
    (1<<1, 'Credit Card'),
    (1<<2, 'Cheque'),
    (1<<3, 'Other'),
)
and then query the specific bit in Python to retrieve the flag. I know this means the database can't index by specific flags, but are there any other drawbacks?
Why I'm doing this: I have about 15 booleans already, split into 3 different logical "sets". That's already a lot of fields, and using 3 many-to-many tables to save a bunch of data that will rarely change seems inefficient. Using integers allows me to add up to 32 flags to each field without having to modify the DB at all.

The main drawback that I can think of is maintainability. Someone writing a query against the database has to understand the bit convention rather than being able to go after a more human-readable set of columns. Also, if one of the "accepted payment types" is removed, the data itself has to be migrated rather than just dropping a column in the table.

This isn't the worst, but there might be a better way.
Define a table called PaymentTypes:
id, paymentId, key (string), value (boolean)
Now you just populate this table with whatever you want. No long column of booleans, and you can dynamically add new types. The drawback is that the default for every boolean is NULL or false.
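A rough sketch of that key/value layout, using Python's built-in sqlite3 purely for illustration. The `accepts` helper and the sample rows are hypothetical, and treating paymentId as the id of the owning record is an assumption. Note how a missing row reads as false, which is the drawback mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the answer; "key" and "value" are quoted to be safe.
conn.execute(
    """
    CREATE TABLE PaymentTypes (
        id INTEGER PRIMARY KEY,
        paymentId INTEGER NOT NULL,
        "key" TEXT NOT NULL,
        "value" INTEGER NOT NULL,
        UNIQUE (paymentId, "key")
    )
    """
)
conn.executemany(
    'INSERT INTO PaymentTypes (paymentId, "key", "value") VALUES (?, ?, ?)',
    [(1, "cash", 1), (1, "cheque", 1), (1, "paypal", 0)],
)


def accepts(conn, payment_id, key):
    """Return True only if a row exists for this key and its value is true."""
    row = conn.execute(
        'SELECT "value" FROM PaymentTypes WHERE paymentId = ? AND "key" = ?',
        (payment_id, key),
    ).fetchone()
    return bool(row and row[0])
```

Adding a new payment type is then just another INSERT, with no schema change.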

Not sure what database you're using, but MySQL has a set type.

If you could limit your use case to one or more sets of values that can only have one bit true at a time, perhaps you could use enums in your database. You would get the best of both worlds: maintainable, as btreat notes, and still smaller (and simpler) than several booleans.
Since that's not possible, I'd agree with your initial assessment and go with a bitfield. I would use or create a bitfield wrapper, however, so that in your code you don't deal with flipping and shifting bits directly (that becomes difficult to maintain and debug, as btreat says) but instead deal with it like a list or dictionary and convert to/from a bitfield when needed.
Some commentary on enums/bitfields in Django
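A minimal sketch of such a wrapper, assuming the bit layout from the question. The names `PaymentMethod`, `to_bitfield` and `from_bitfield` are made up for illustration; the standard library's `enum.IntFlag` handles the bit arithmetic:

```python
from enum import IntFlag


class PaymentMethod(IntFlag):
    # Same bit layout as PAYMENT_METHODS in the question.
    CASH = 1 << 0
    CREDIT_CARD = 1 << 1
    CHEQUE = 1 << 2
    OTHER = 1 << 3


def to_bitfield(methods):
    """Combine an iterable of PaymentMethod members into one integer."""
    value = 0
    for method in methods:
        value |= int(method)
    return value


def from_bitfield(value):
    """Return the set of PaymentMethod members whose bit is set in value."""
    return {m for m in PaymentMethod if value & m}


stored = to_bitfield([PaymentMethod.CASH, PaymentMethod.CHEQUE])  # integer for the DB column
accepted = from_bitfield(stored)                                  # back to a set of members
```

The rest of the code can then test membership with `PaymentMethod.CASH in accepted` and never touch shifts or masks directly.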

I think the previous posters were both correct. The cleanest way to do it in a "relational" database would be to define a new relation table that stores payment types. In practice though, this is usually more hassle than it's worth.
Using enums in your code and using something similar in the DB (check constraints in Oracle, AFAIK) should help keep it maintainable, and obvious to the poor soul whose job it will be to add a new type, many many years after you've left.

Related

Django: what's the simplest way to test if object is in set

Just a non-critical question that has bothered me after trying to find answers in the doc to no avail.
class Book(models.Model):
    authors = models.ManyToManyField(Author)

homer = Author.objects.get(pk=1)
iliad = Book.objects.get(pk=2)
iliad.authors.filter(pk=homer.pk).exists()
# __in expects an iterable, so the single author is wrapped in a list
Book.objects.filter(name='Iliad', authors__in=[homer]).exists()
I believe the last two asserts will test if Homer is the author of Iliad. But I kind of dislike the (pk=homer.pk) portion and am wondering if there's any construct that will allow me to test if an object (assuming we already have it from a "get") exists in a queryset?
(homer in iliad.authors.all())
While the above expression may also work, and is arguably more pythonic, it may retrieve unnecessarily many authors from the DB.
In this particular case homer in iliad.authors.all() would be only slightly slower than the exists version. Most books have one or two authors, so getting them all from the DB should not be a problem.
I think there is no faster way to do this in Django than using filter the way you did, sorry.
Django's queryset code implements some optimization when using the in operator.
To get better insight, I'd try timing the use of in on the queryset vs. your other examples, for big and small datasets.

Best R data structure to return table value counts

The following function returns a data.frame with two columns:
fetch_count_by_day=function(con){
    q="SELECT t,count(*) AS count FROM data GROUP BY t"
    dbGetQuery(con,q) #Returns a data frame
}
t is a DATE column, so output looks like:
t count(*)
1 2011-09-22 1438
...
All I'm really interested in is if any records for a given date already exist; but I will also use the count as a sanity check.
In C++ I'd return a std::map<std::string,int> or std::unordered_map<std::string,int> (*).
In PHP I'd use an associative array with the date as the key.
What is the best data structure in R? Is it a 2-column data.frame? My first thought was to turn the t column into rownames:
...
d=dbGetQuery(con,q)
rownames(d)=d[,1]
d$t=NULL
But data.frame rownames are not unique, so conceptually it does not quite fit. I'm also not sure if it makes using it any quicker.
(Any and all definitions of "best": quickest, least memory, code clarity, least surprise for experienced R developers, etc. Maybe there is one solution for all; if not then I'd like to understand the trade-offs and when to choose each alternative.)
*: (for C++) If benchmarking showed this was a bottleneck, I might convert the datestamp to a YYYYMMDD integer and use std::unordered_map<int,int>; knowing the data only covers a few years I might even use a block of memory with one int per day between min(t) and max(t) (wrapping all that in a class).
Contingency tables are actually arrays (or matrices) and can very easily be created. The dimnames hold the values and the array/matrix at its "core" holds the count data. The "table" and "tapply" functions are natural creators. You access the counts with "[" and use dimnames( ) followed by "[" to get the row and column names. I would say it is wiser to use the "Date" class for dates than to store them in "character" vectors.

best practice for implementing a Search-Function via prepared statements

I'm trying to implement a Search-Function using C++ and libpqxx.
But I've got the following problem:
The user is able to specify 4 different search patterns (each of them optional):
from date
till date
document type
document id
So if I want to use prepared statements I would need 2^4 = 16 different prepared statements. Well, it's possible, but I want to avoid this.
Here is an example of what a prepared statement in libpqxx looks like:
_connection->prepare("ExampleStmnt", "SELECT * FROM foo WHERE title=$1 AND id=$2 AND date=$3")
    ("text", pqxx::prepare::treat_string)
    ("smallint", pqxx::prepare::treat_direct)
    ("timestamp", pqxx::prepare::treat_direct);
Therefore I also have no idea how I would piece such a prepared statement together.
Is there any other 'nice' way that I didn't think of?
The best you can do is to have four different ->prepare clauses, depending on how many search criteria are actually used, concatenate the criteria into your string, and then branch to one of the four prepare code blocks. (That will probably spook your style checker into thinking you are creating an injection vulnerability, but of course you aren't, as long as you insert only elements of the closed set of column names.)
Note that this isn't a very nice solution, but even Stephane Faroult (in The Art of SQL) says it's the best one possible, so who am I to argue?
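To illustrate the "concatenate only from a closed set" idea, here is a rough sketch in Python with the built-in sqlite3 module rather than C++/libpqxx. The table layout, the column names and the `build_query` helper are assumptions made up for the example. Only fixed condition strings are concatenated; user values always travel as bound parameters:

```python
import sqlite3

# Closed set of filters: search key -> SQL condition with one placeholder.
CONDITIONS = {
    "from_date": "date >= ?",
    "till_date": "date <= ?",
    "doc_type": "type = ?",
    "doc_id": "id = ?",
}


def build_query(criteria):
    """Return (sql, params) using only the criteria that are not None."""
    clauses, params = [], []
    for key, condition in CONDITIONS.items():
        value = criteria.get(key)
        if value is not None:
            clauses.append(condition)
            params.append(value)
    sql = "SELECT id FROM foo"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(1, "invoice", "2011-01-10"), (2, "memo", "2011-02-15"), (3, "invoice", "2011-03-20")],
)
sql, params = build_query({"from_date": "2011-02-01", "doc_type": "invoice"})
matches = [row[0] for row in conn.execute(sql, params)]
```

The same shape should carry over to libpqxx: build the statement text from the fixed conditions, then prepare and execute it with however many parameters ended up in the list.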

Have you used boost::tribool in real work?

tribool strikes me as one of the oddest corners of Boost. I see how it has some conveniences compared to using an enum, but an enum can also be easily expanded to represent more than 3 states.
In what real world ways have you put tribool to use?
While I haven't used C++, and hence boost, I have used three-state variables quite extensively in a network application where I need to store state as true/false/pending.
An extra state in any value type can be extremely valuable. It avoids the use of "magic numbers" or extra flags to determine if the value of a variable is "maybe" or "unknown".
Instead of true or false, the state of a tribool is true, false, or indeterminate.
Let's say you have a database that contains a list of customers and their dateOfBirth. So you write a function along the lines of:
tribool IsCustomerAdult(customerName);
The function returns:
`true` if the customer is 18 or older;
`false` if the customer is less than 18;
`indeterminate` if the customer is not in the database
(or the dateOfBirth value is not present).
Very useful.
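For readers without Boost at hand, the same contract can be sketched in Python, with None standing in for indeterminate. The `birth_dates` mapping is a hypothetical stand-in for the database, and passing `today` explicitly is just a convenience for the example:

```python
from datetime import date


def is_customer_adult(birth_dates, name, today):
    """Return True/False when the age is known, None when it is indeterminate."""
    dob = birth_dates.get(name)
    if dob is None:
        # Customer missing, or dateOfBirth not recorded.
        return None
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return age >= 18


birth_dates = {"alice": date(1990, 5, 1), "bob": date(2010, 5, 1), "carol": None}
today = date(2024, 1, 1)
```

Callers then have to distinguish `is None` from `False` explicitly, which is the discipline tribool enforces in C++.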
I think the extra benefit is not only the 3rd value, but also that you can easily use the 3-valued logic!
For example:
(true && indeterminate) == indeterminate
(true || indeterminate) == true
SQL implements such logic.
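A small Python sketch of that three-valued logic, again using None for indeterminate. The function names `and3` and `or3` are made up; this is not Boost's implementation, just the same truth tables:

```python
def and3(a, b):
    """Three-valued AND: a definite False wins, otherwise unknown stays unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: a definite True wins, otherwise unknown stays unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

With these, `and3(True, None)` gives None and `or3(True, None)` gives True, matching the two lines above.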
I've seen numerous examples of two booleans being used to represent three possible states, explicitly or otherwise, with the fourth state being silently assumed to be impossible. In at least two cases, I've changed such constructions to use tribool since we started using boost.
I am a big fan of the Boost library and started using it at a company that I have since left. After getting exposure to the library and using it extensively throughout our project, I stumbled on tribool and was considering using it for some "Fuzzy Logic" algorithms that needed improvement.
I left before I had a chance to get into it. Beyond the "Fuzzy Logic" example, though, other modules in the system had components with this sort of in-between state, so thinking about it now, I would probably have ended up using tribool in a decent amount of code if I were still with the company.
-bn
I think it is very useful for language modelling, such as in OCR applications and speech synthesis, because human languages are ambiguous and have a lot of intermediate states.
I am looking forward to improving current technologies using tribool.

Casting in Informix

In Informix, how can I cast a char(8) type into a money type, so that I can compare it to another money type?
Using "tblAid.amt::money as aid_amt" did not work.
Using "(tblAid.amt * 1) AS aid_amt" did not work.
Try this:
select (disb_amt::NUMERIC) disb_amt from tmp_kygrants;
You may be able to compare the amounts as numeric.
First question - why on earth are you not storing a numeric value in a numeric column? This would make the rest of your question moot. It would also mean that your system will perform better. When you need to store data values, use the obvious type; do not use a string type unless the data is a string.
As already noted, you can use the non-standard Informix cast notation:
SELECT some_column::MONEY FROM WhereEver;
You can also be more careful about the cast type - using MONEY(8,2) for example. You can also use the standard notation:
SELECT CAST(some_column AS MONEY(8,2)) FROM WhereEver;
This assumes you are using IDS 9.x or later -- older products do not support casts at all. In general, Informix is pretty good about doing conversions automatically (for example, converting numbers to strings). However, strings are compared lexicographically and not numerically, so a CAST is probably wiser in this context -- but avoiding the need for a cast by using the correct type in the first place is wiser still.
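The difference between lexicographic and numeric comparison is easy to see outside the database. This is a Python analogy, not Informix behaviour, and the amounts are made up:

```python
from decimal import Decimal

# Compared as text, "9.00" sorts after "10.00" because '9' > '1'.
as_text = "9.00" > "10.00"

# Compared as numbers, 9.00 is of course smaller than 10.00.
as_number = Decimal("9.00") > Decimal("10.00")
```

Here `as_text` is True while `as_number` is False, which is why amounts stored as strings can compare in the wrong order without a cast.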
'tis been a while since I played around with Informix and I don't have a running instance handy at the moment. However, there are two things that can cause a problem here:
1) Since it is a char(8), it can contain values that cannot be cast to numeric without a bit of 'cleanup'. E.g. "abc". Or "1,234,567.00".
2) Trailing spaces. (char as opposed to varchar).
What Informix error do you get on your explicit cast (::money)?
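If the cleanup ends up being done in application code rather than in SQL, it might look roughly like this Python sketch. The `parse_amount` helper is hypothetical, and treating commas as thousands separators is an assumption about the data:

```python
from decimal import Decimal, InvalidOperation


def parse_amount(raw):
    """Return raw as a Decimal, or None if it cannot be read as a finite number."""
    # Drop the padding a char column adds, and any thousands separators.
    cleaned = raw.strip().replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Decimal also accepts "NaN" and "Infinity"; reject those as amounts.
    return value if value.is_finite() else None
```

Rows for which this returns None are the ones that would make a plain cast fail.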