my first question, be gentle :-)
I have a Rails application using pieces of text from a database (the table basically contains a short key k and the corresponding text; sort of like i18n, but for reasons out of scope here I don't want to use that right now). I made a small helper function for the views to get the matching text by key, along the lines of "Text.find_by k: x". A hell of a database load, but it allows the texts to be changed dynamically by a CMS.
But since it turned out that the texts rarely change, I wanted to preload the whole table into a hash instead. Since I'm not sure where to put such initialization stuff, and also because I thought lazy loading might be cool, here is what I did (simplified):
module MainHelper
  ...
  @@items = nil

  def getText(x)
    initItems if !@@items
    @@items[x]
  end

  private

  def initItems
    @@items = {}
    Text.all.each {|t| @@items[t.k] = t.text} # load from model
  end
end
Which seems to work great. But since I am quite a newbie here, I wonder if anybody thinks there is a better solution, or a solution that is more "the rails way"? Any comments highly appreciated!
What you've done is pretty cool. I'd be more likely to make a method items that memoizes the items, as opposed to an explicit initItems method. That would be the more conventional solution.
Use pluck to get only the fields in the table that you need... it'll make the SQL call more efficient.
Because pluck in this case returns an array of two-element arrays, it's easy to use the to_h method to convert it into a hash.
(also, convention is to use snake_case for method names)
module MainHelper
  ...
  @@items = nil

  def get_text(x)
    items[x]
  end

  private

  def items
    @@items ||= Text.pluck(:k, :text).to_h # load from model
  end
end
I am new to programming and currently learning Python. I am stuck while trying out dictionaries in Python.
I have a normal dict with a key and the value as an empty string:
dict = {'Project_path':'',
        'Target_files':'',
        'default_ws':''}
I am updating the value of keys like this:
dict['project_path'] = 'Pass'
But I want to create a dict where I don't have to use long keys; instead I can map these keys to numbers, so that if I want to update the dict I can do it like this:
dict = {1:{'Project_path':''},
2:{'Target_files':''},
3:{'default_ws':''}}
dict['1'] = 'pass'
It should update the value for 'project_path' as 'pass'
And after updating the values, I have to write this to a file. Is there a way to do it?
Welcome to Programming in Python! A few things here: First, using dict as a variable name is not a good idea, as it shadows the built-in dict type (it's not a reserved keyword, but shadowing it makes the real dict unusable). You can use something like my_dict instead.
Second, when you make the keys numbers and then call dict['1'] = 'pass' this looks for the string '1' in the keys, when you have declared the integer 1 as the key. Let me explain:
'1' is a string
1 is an integer.
In Python these are two very different things and cannot be mixed around. That's why the value doesn't update. If you were to say dict[1] = 'pass', that should work for you and it should update the values. Note, though, that this will overwrite the whole dictionary that is the current value and replace it with 'pass'; if instead you just want to change a key inside the inner dictionary, use
dict[1]['Project_path'] = 'pass'.
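To make the difference concrete, here is a small runnable sketch (key names taken from the question, my_dict used instead of dict as suggested above):

```python
my_dict = {1: {'Project_path': ''},
           2: {'Target_files': ''},
           3: {'default_ws': ''}}

# assigning to my_dict[1] replaces the whole inner dict with a plain string
my_dict[1] = 'pass'
print(my_dict[1])  # -> pass

# indexing twice updates only the key inside the inner dict
my_dict[1] = {'Project_path': ''}
my_dict[1]['Project_path'] = 'pass'
print(my_dict[1])  # -> {'Project_path': 'pass'}
```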
This, however, seems redundant to me. It's kind of unclear exactly why you want to map integers in there (perhaps to loop through it and change values), but if you just want integers associated with it, you could just create an array of dictionaries like this:
my_array = [{"Project_path": ''}, {"Target_files": ''}, {"default_ws": ''}]
Then you could access them by their index, starting at 0: my_array[0] is {"Project_path": ''}. Again, it's unclear why you want to map integers to these dicts, so use whichever shape actually fits your use case.
Finally, writing to a file. There are a lot of webpages out there, because there are many options when it comes to reading and writing files (like this one), but here's the short version.
with open('your_filepath.txt', 'w') as file:  # note this will overwrite the current file, see the page I linked
    file.write(str(my_array[0]))  # write() takes a string, so convert whatever you want to write first
It's that simple, but like I said, take a look at that page to see what all your options are and choose the best one for you. Hopefully that is helpful and can get you started.
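If you also need to read the data back later, the json module from the standard library is one common choice; a minimal sketch, assuming a file name of data.json:

```python
import json

my_dict = {'Project_path': 'pass', 'Target_files': '', 'default_ws': ''}

# json.dump serializes the dict to the file; json.load reads it back as a dict
with open('data.json', 'w') as f:
    json.dump(my_dict, f)

with open('data.json') as f:
    restored = json.load(f)

print(restored['Project_path'])  # -> pass
```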
This question already has answers here:
Django: __in query lookup doesn't maintain the order in queryset
(6 answers)
Closed 8 years ago.
I've searched online and could only find one blog that seemed like a hackish attempt to keep the order of a query list. I was hoping to query using the ORM with a list of strings, but doing it that way does not keep the order of the list.
From what I understand bulk_query only works if you have the id's of the items you want to query.
Can anybody recommend an ideal way of querying by a list of strings and making sure the objects are kept in their proper order?
So in a perfect world I would be able to query a set of objects by doing something like this...
Entry.objects.filter(id__in=['list', 'of', 'strings'])
However, they do not keep order, so string could be before list etc...
The only workaround I see (and I may just be tired, or this may be perfectly acceptable, I'm not sure) is doing this...
for i in listOfStrings:
    object = Object.objects.get(title=str(i))
    myIterableCorrectOrderedList.append(object)
Thank you,
The problem with your solution is that it does a separate database query for each item.
This answer gives the right solution if you're using ids: use in_bulk to create a map between ids and items, and then reorder them as you wish.
If you're not using ids, you can just create the mapping yourself:
values = ['list', 'of', 'strings']
# one database query
entries = Entry.objects.filter(field__in=values)
# one trip through the list to create the mapping
entry_map = {entry.field: entry for entry in entries}
# one more trip through the list to build the ordered entries
ordered_entries = [entry_map[value] for value in values]
(You could save yourself a line by using index, as in this example, but since index is O(n) the performance will not be good for long lists.)
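The same map-then-reorder idiom works with any objects, not just querysets; a self-contained sketch (Entry here is a stand-in for the real model):

```python
from collections import namedtuple

# stand-in for the Django model; 'field' is the attribute we filtered on
Entry = namedtuple('Entry', ['field'])

values = ['list', 'of', 'strings']
# simulate the unordered result of Entry.objects.filter(field__in=values)
entries = [Entry('strings'), Entry('list'), Entry('of')]

# one pass to build the mapping, one pass to restore the desired order
entry_map = {entry.field: entry for entry in entries}
ordered_entries = [entry_map[value] for value in values]

print([e.field for e in ordered_entries])  # -> ['list', 'of', 'strings']
```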
Remember that ultimately this is all done to a database; these operations get translated down to SQL somewhere.
Your Django query loosely translated into SQL would be something like:
SELECT * FROM entry_table e WHERE e.title IN ("list", "of", "strings");
So, in a way, your question is equivalent to asking how to ORDER BY the order something was specified in a WHERE clause. (Needless to say, I hope, this is a confusing request to write in SQL -- NOT the way it was designed to be used.)
You can do this in a couple of ways, as documented in some other answers on StackOverflow [1] [2]. However, as you can see, both rely on adding (temporary) information to the database in order to sort the selection.
Really, this should suggest the correct answer: the information you are sorting on should be in your database. Or, back in high-level Django-land, it should be in your models. Consider revising your models to save a timestamp or an ordering when the user adds favorites, if that's what you want to preserve.
Otherwise, you're stuck with one of the solutions that either grabs the unordered data from the db then "fixes" it in Python, or constructing your own SQL query and implementing your own ugly hack from one of the solutions I linked (don't do this).
tl;dr The "right" answer is to keep the sort order in the database; the "quick fix" is to massage the unsorted data from the database to your liking in Python.
EDIT: Apparently MySQL has a feature (ORDER BY FIELD(column, ...)) that will let you do this, if that happens to be your backend.
I'm very new to Doctrine, so this might seem a rather obvious question to those more experienced.
I'm writing a data import tool that has to check every row being imported contains valid data. For example, the Row has a reference to a product code, I need to check that there is a pre-existing Product object with that code. If not, flag that row as invalid.
Now I can easily do something like this for each row.
$productCode = $this->csv->getProductNumber();
$product = $doctrine->getRepository('MyBundle:Product')->findOneBy(array('code' => $productCode ));
But that seems hideously inefficient. So I thought about returning the entire Product Collection and then iterating within that.
$query = $this->getEntityManager()->createQuery('SELECT p FROM MyBundle\Entity\Product p');
$products = $query->getResult();
All well and good, but then I've got to write messy loops to search through it.
Two questions:
1). I was wondering if I'm missing some utility methods such as you have in Magento Collections, where you can search within the Collection results without incurring additional database hits. For example, in Magento this will iterate the collection and filter on the code property.
$collection->getItemByColumnValue("code","FZTY444");
2). At the moment I'm using the query below which returns an "rectangular array". More efficient, but could be better.
$query = $this->getEntityManager()->createQuery('SELECT p.code FROM MyBundle\Entity\Product p');
$products = $query->getResult();
Is there a way of returning a single-dimensional array without having to reiterate the result set and transform it into a flat array, so I can use in_array() on the results?
If I understand your question correctly you want to filter an array of entities returned by getResult(). I had a similar question and I think I've figured out two ways to do it.
Method 1: Arrays
Use the array_filter function on your $products array. Yes, this amounts to a "loop" in the background, but I think this is a generally acceptable way of filtering arrays rather than writing the loop yourself. You need to provide a callback (an anonymous function, preferred as of PHP 5.3). Here is an example:
$codes = array_filter($products, function($i) {
    return $i->getCode() == '1234';
});
Basically, in your function, return true if you want the result returned into $codes and false otherwise (not sure if the false is necessary, or if a void return value is sufficient).
Method 2: Doctrine's ArrayCollection
In your custom repository, or wherever you are calling getResult(), you can instead return an ArrayCollection. This is found in the Doctrine namespace Doctrine\Common\Collections\. More documentation on the interface behind this class can be found here. So in this case you would have
$query = $this->getEntityManager()->createQuery('SELECT p FROM MyBundle\Entity\Product p');
$products = new ArrayCollection($query->getResult());
You can then use the filter() method on the array collection. Use it in a very similar way to the array_filter. Except it doesn't need a first argument because you call it like this: $products->filter(function($i) { ... });
The ArrayCollection class is an iterator, so you can use it in foreach loops to your heart's content, and it shouldn't really be different from an array of your products. Unless your code explicitly uses $products[$x], it should be plug 'n' play*.
*Note: I haven't actually tested this code or concept, but based on everything I've read it seems legit. I'll update my answer if it turns out I'm wrong.
You can use another hydration mode. $query->getResult() usually returns a result in object hydration. Take a look at $query->getScalarResult(), which should be more suitable for your needs.
More info on the Doctrine 2 website.
I was forced to use a models.CharField to store some additional flags in one of my models. So I'm abusing each letter of the field as a flag. As an example, 'MM5' would mean "man, married, age above 50" and 'FS2' "female, single, age above 20".
I'm using methods to query/access these flags. Of course I cannot use these methods with the queryset API. I'm using list comprehensions calling the methods to filter an initial queryset and transform them into a plain list, which is good enough for most template feeding:
people = People.objects.filter(name__startswith='J')
people_i_want = [p for p in people if p.myflags_ismale() and p.myflags_isolderthan(30)]
So, is there any ok'ish way to retransform these lists back into a queryset? Or to chop/filter a queryset based on the output of my methods without transforming it to a normal list in the first place?
It would probably be needlessly complicated as well as bad practice to try and "re-transform" your list back into a QuerySet, the best thing to do is to use cleverer QuerySet filtering.
You should use the queryset filter by regex syntax to get the functionality you need. Documentation here.
For instance the equivalent to ismale() using regex would be something like...
People.objects.filter(myflags__regex=r'.M.') # <-- For matching something like 'MM5'
# Please note I haven't tested this regex expression, but the principle is sound
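The pattern itself can be sanity-checked with plain re, outside Django (flag strings taken from the question):

```python
import re

# position 1 (zero-based) of the flag string holds marital status,
# so '.M.' matches "married" regardless of the surrounding flags
pattern = re.compile(r'.M.')

print(bool(pattern.match('MM5')))  # married man over 50 -> True
print(bool(pattern.match('FS2')))  # single woman over 20 -> False
```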
Also, while I'm admittedly not a database guru, I'm fairly certain using this sort of "flags" in a charfield is a rather inefficient way of doing things.
If you must convert a list back into a queryset, then the idiom I use is:
People.objects.filter(pk__in=[x.pk for x in list_of_objects])
Obviously, this hits the database again. But if you really need it.
I have the following model structure:
class Container(models.Model):
    pass

class Generic(models.Model):
    name = models.CharField(unique=True, max_length=255)  # CharField requires max_length; 255 is a placeholder
    cont = models.ManyToManyField(Container, null=True)
    # It is possible to have a Generic object not associated with any container,
    # that's why null=True

class Specific1(Generic):
    ...

class Specific2(Generic):
    ...

...

class SpecificN(Generic):
    ...
Say, I need to retrieve all Specific-type models, that have a relationship with a particular Container.
The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.
When done in a brute-force manner, -
c = Container.objects.get(name='somename')  # this gets me the container
items = c.generic_set.all()
# this gets me all Generic objects that are related to the container
# Now what? I need to get to the actual Specific objects, so I need to somehow
# get the type of the underlying Specific object and get it
for item in items:
    spec = getattr(item, item.get_my_specific_type())
This results in a ton of db hits (one for each Generic record that relates to the Container), so this is obviously not the way to do it. Now, it could, perhaps, be done by getting the SpecificX objects directly:
s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type
That way the db will be hit once for each Specific type (acceptable, I guess).
I know, that .select_related() doesn't work with m2m relationships, so it is not of much help here.
To reiterate, the end result has to be a collection of SpecificX objects (not Generic).
I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).
It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.
If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.
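The grouping step behind such a prefetch can be sketched in plain Python (names are illustrative, no Django required): collect the primary keys per subtype, so each Specific table needs only one pk__in query instead of one query per row:

```python
from collections import defaultdict

# (subtype_name, primary_key) pairs, as they might come back from the Generic query
items = [('Specific1', 1), ('Specific2', 2), ('Specific1', 3)]

# group primary keys by subtype; each list then feeds a single bulk lookup
by_type = defaultdict(list)
for type_name, pk in items:
    by_type[type_name].append(pk)

print(dict(by_type))  # -> {'Specific1': [1, 3], 'Specific2': [2]}
```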
Not a complete answer, but you can avoid a great number of hits by doing this:
items = list(items)
for item in items:
    spec = getattr(item, item.get_my_specific_type())
instead of this:
for item in items:
    spec = getattr(item, item.get_my_specific_type())
Indeed, by forcing a cast to a Python list, you force the Django ORM to load all elements of your queryset. It then does this in one query.
I accidentally stumbled upon the following post, which pretty much answers your question:
http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html