How to make this Django attribute name search better?

lcount = Open_Layers.objects.all()
form = SearchForm()
if request.method == 'POST':
    form = SearchForm(request.POST)
    if form.is_valid():
        data = form.cleaned_data
        val = form.cleaned_data['LayerName']
        a = Open_Layers()
        data = []
        for e in lcount:
            if e.Layer_name == val:
                data = val
        return render_to_response('searchresult.html', {'data': data})
    else:
        form = SearchForm()
else:
    return render_to_response('mapsearch.html', {'form': form})
This just returns a result if a particular name matches exactly. How do I change it so that a search for "Park" returns Park1, Park2, Parking, Parkin, i.e. all occurrences containing "park"?

You can improve your search logic by using a list to accumulate the results and the re module to match a larger set of words.
However, this is still pretty limited, error prone and hard to maintain, and even harder to make evolve. Plus you'll never get results as nice as if you were using a search engine.
So instead of trying to manually reinvent the wheel, the car and the highway, you should spend some time setting up Haystack. This is now the de facto standard for search in Django.
Use Whoosh as a backend at first; it's going to be easier. If your search gets slow, replace it with Solr.
EDIT:
Simple clean alternative:
Open_Layers.objects.filter(Layer_name__icontains=val)
This will perform a SQL LIKE, adding the % wildcards for you (note the lookup targets Layer_name, the field used in your question).
This is going to kill your database if used too often, but I guess this is probably not going to be an issue with your current project.
BTW, you probably want to rename Open_Layers to OpenLayers, as that is the Python PEP 8 naming convention.
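Plugged into the view, the whole manual loop collapses to that one filter call. A minimal sketch (my addition, reusing the names from the question):
if form.is_valid():
    val = form.cleaned_data['LayerName']
    # Case-insensitive substring match: 'park' finds Park1, Park2, Parking, Parkin.
    data = Open_Layers.objects.filter(Layer_name__icontains=val)
    return render_to_response('searchresult.html', {'data': data})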

Instead of
if e.Layer_name == val:
    data = val
use
if val in e.Layer_name:
    data.append(e.Layer_name)
(and you don't need the line data = form.cleaned_data)

I realise this is an old post, but anyway:
There's a fuzzy string comparison facility already in the Python standard library.
import difflib
Mainly have a look at:
difflib.SequenceMatcher(None, a='string1', b='string2', autojunk=True).ratio()
more info here:
http://docs.python.org/library/difflib.html#sequencematcher-objects
What it does is return a ratio of how close the two strings are, between 0 and 1. So instead of testing whether they're equal, you choose your similarity ratio.
Things to watch out for: you may want to convert both strings to lower case first, e.g.
string1.lower()
Also note you may want to implement your favourite method of splitting the string, i.e. .split() or something using re, so that a search for 'David' against 'David Brent' ranks higher.
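To make this concrete, here is a minimal sketch (my addition, not from the original answer) that ranks candidate names against a query with difflib; the 0.6 cutoff is an arbitrary assumption you would tune:
import difflib

def fuzzy_search(query, names, cutoff=0.6):
    # Compare lower-cased strings so 'park' matches 'Parking'.
    query = query.lower()
    scored = []
    for name in names:
        ratio = difflib.SequenceMatcher(None, query, name.lower()).ratio()
        if ratio >= cutoff:
            scored.append((ratio, name))
    # Highest similarity first.
    return [name for ratio, name in sorted(scored, reverse=True)]

print(fuzzy_search('park', ['Park1', 'Park2', 'Parking', 'Lot', 'Parkin']))
# ['Park1', 'Park2', 'Parkin', 'Parking']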

Python Data Scraping (using XPath) - Returning empty lists and stripping characters

I am attempting to scrape information from the website:
http://www.forexfactory.com/#tradesPositions
Now, I used to have a script up and running, which this forum helped me get going, but I think something has changed on the website and the script I had no longer works.
What do I need?
I would like to scrape the number of 'short' and 'long' positions for AUDUSD, EURUSD, GBPUSD, USDJPY, USDCAD, NZDUSD and USDCHF.
NOT the percentages, the actual number of traders.
What have I done?
This is for EURUSD
import lxml.html
from selenium import webdriver

driver = webdriver.Chrome(r"C:\Users\MY NAME\Downloads\Chrome Driver\chromedriver.exe")
url = 'http://www.forexfactory.com/#tradesPositions'
driver.get(url)
tree = lxml.html.fromstring(driver.page_source)
results_short = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[1]/table/tbody/tr/td[2]/div[1]/ul[1]/li[2]/span/text()')
results_long = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[1]/table/tbody/tr/td[2]/div[1]/ul[1]/li[1]/span/text()')
print "Forex Factory"
print "Traders Short EURUSD:", results_short
print "Traders Long EURUSD:", results_long
driver.quit()
This returns
Forex Factory
Traders Short EURUSD: ['337 Traders ', ' ']
Traders Long EURUSD: [' 259 Traders']
I would like to strip everything away from the result except for the numbers. I've tried .strip() and .replace(), but neither works on a list, which will come as no surprise to you guys, I'm sure!
Empty List
When I apply the same technique to AUDUSD I get an empty list.
import lxml.html
from selenium import webdriver

driver = webdriver.Chrome(r"C:\Users\Andrew G\Downloads\Chrome Driver\chromedriver.exe")
url = 'http://www.forexfactory.com/#tradesPositions'
driver.get(url)
tree = lxml.html.fromstring(driver.page_source)
results_short = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[2]/span/text()')
results_long = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositionsCopy1"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[1]/span/text()')
s2 = results_short
l2 = results_long
print "Traders Short AUDUSD:", s2
print "Traders Long AUDUSD:", l2
This returns
Traders Short AUDUSD: []
Traders Long AUDUSD: []
What gives? Is the XPath not working? I just used Chrome's 'inspect element' feature, navigated to the desired number, and copied the path. Same method as for EURUSD.
Ideally, it would be nice to set up a list of div numbers that can be inserted into the tree.xpath calls instead of repeating the lines of code for all the different currencies, to make it neater. So, in the XPath where it has:
/div[number]/
It would be nice to have a list, i.e. [1, 2, 3, 4, 5, 6], that can be inserted there, because the rest of the XPath is the same for all the currencies. Anyway, that's an optional bonus; the priority is to get a return for all currencies listed.
THANKS
You can remove all the whitespace in your results with the strip method, as you mentioned; here is my sample code:
# Strip whitespace and drop entries that are empty afterwards.
# (A list comprehension avoids deleting from a list while indexing it.)
results_short = [s.strip() for s in results_short if s.strip()]
results_long = [s.strip() for s in results_long if s.strip()]
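If you only want the numbers rather than strings like '337 Traders', a small regex pass works; this is my addition, assuming each non-empty entry contains exactly one run of digits:
import re

def extract_counts(entries):
    # Pull the first run of digits out of each entry: '337 Traders ' -> 337.
    counts = []
    for entry in entries:
        match = re.search(r'\d+', entry)
        if match:
            counts.append(int(match.group()))
    return counts

print extract_counts(['337 Traders ', ' '])  # [337]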
As for why you cannot get the result for AUD: the values are not loaded into the page until you have clicked the "expand" button. But I have found you can get the result from the following page: http://www.forexfactory.com/trades.php
So you can change the value of url to:
url = 'http://www.forexfactory.com/trades.php'
For this page, since the CSS id has changed, you need to update your XPaths to:
results_short = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositions"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[2]/span/text()')
results_long = tree.xpath('//*[@id="flexBox_flex_trades/positions_tradesPositions"]/div[6]/table/tbody/tr/td[2]/div[1]/ul[1]/li[1]/span/text()')
Then apply the strip logic as mentioned above, and you should be able to get the correct results.
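For the "optional bonus" in the question, here is a hedged sketch that loops over the div indices instead of repeating the code per currency, reusing the extract_counts helper sketched above. It assumes the currencies appear as div[1] through div[7] in this order on trades.php, which you should verify:
currencies = ['AUDUSD', 'EURUSD', 'GBPUSD', 'USDJPY', 'USDCAD', 'NZDUSD', 'USDCHF']
base = ('//*[@id="flexBox_flex_trades/positions_tradesPositions"]'
        '/div[%d]/table/tbody/tr/td[2]/div[1]/ul[1]/li[%d]/span/text()')
for n, currency in enumerate(currencies, start=1):
    # li[2] holds the short side and li[1] the long side, per the XPaths above.
    shorts = extract_counts(tree.xpath(base % (n, 2)))
    longs = extract_counts(tree.xpath(base % (n, 1)))
    print currency, "short:", shorts, "long:", longs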

Custom Django count filtering

A lot of websites will display:
"1.8K pages" instead of "1,830 pages"
or
"43.2M pages" instead of "43,200,123 pages"
Is there a way to do this in Django?
For example, the following code will generate the quantified amount of objects in the queryset (i.e. 3,123):
Books.objects.all().count()
Is there a way to add a custom count filter to return "3.1K pages" instead of "3,123 pages"?
Thank you in advance!
First off, I wouldn't do anything that alters the way the ORM portion of Django works. There are two places this could be done; if you are only planning on using it in one place, do it on the frontend. With that said, there are many ways to achieve this result. Just to spout off a few ideas: you could write a property on your model that calls count() and converts the result to something a little more human readable on the backend, or, if you want to do it on the frontend, find a JavaScript lib that does the conversion.
I will edit this later from my computer and add an example of the property.
Edit: To answer your comment, the easier one to implement depends on your skills in Python vs in JavaScript. I prefer Python, so I would probably do it somewhere on the model.
Edit 2: I have written an example to show you how I would do a classmethod on a base model, or on the model that you need these numbers on. I found a Python package called humanize, took its function that converts numbers to readable strings, and modified it a bit to allow for thousands and to drop some of the super-large number conversion.
def readable_number(value, short=False):
    # Modified from the package `humanize` on PyPI.
    powers = [10 ** x for x in (3, 6, 9, 12, 15, 18)]
    human_powers = ('thousand', 'million', 'billion', 'trillion', 'quadrillion')
    human_powers_short = ('K', 'M', 'B', 'T', 'QD')
    try:
        value = int(value)
    except (TypeError, ValueError):
        return value
    if value < powers[0]:
        return str(value)
    for ordinal, power in enumerate(powers[1:], 1):
        if value < power:
            chopped = value / float(powers[ordinal - 1])
            chopped = format(chopped, '.1f')
            if not short:
                return '{} {}'.format(chopped, human_powers[ordinal - 1])
            return '{}{}'.format(chopped, human_powers_short[ordinal - 1])
    return str(value)  # fall through for values >= 10 ** 18 (my addition)

class MyModel(models.Model):
    @classmethod
    def readable_count(cls, short=True):
        count = cls.objects.all().count()
        return readable_number(count, short=short)

print(readable_number(62220, True))  # Returns '62.2K'
print(readable_number(6555500))      # Returns '6.6 million'
I would stick that readable_number in some sort of utils and just import it in your models file. Once you have that, you can just stick that string wherever you would like on your frontend.
You would use MyModel.readable_count() to get that value. If you want it under MyModel.objects.readable_count() you will need to make a custom object manager for your model, but that is a bit more advanced.
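For completeness, a minimal sketch of that custom manager route; this is my addition, the names are illustrative, and get_queryset is the Django 1.6+ spelling (older versions use get_query_set):
from django.db import models

class ReadableCountManager(models.Manager):
    def readable_count(self, short=True):
        # Reuse the readable_number helper defined above.
        return readable_number(self.get_queryset().count(), short=short)

class Books(models.Model):
    title = models.CharField(max_length=200)
    objects = ReadableCountManager()

# Books.objects.readable_count() -> e.g. '3.1K'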

Python - null object pattern with generators

It is apparently Pythonic to return values that can be treated as 'False' versions of the successful return type, such that if MyIterableObject: do_things() is a simple way to deal with the output whether or not it is actually there.
With generators, bool(MyGenerator) is always True even if it would have a len of 0 or something equally empty. So while I could write something like the following:
result = list(get_generator(*my_variables))
if result:
    do_stuff(result)
It seems like it defeats the benefit of having a generator in the first place.
Perhaps I'm just missing a language feature or something, but what is the pythonic language construct for explicitly indicating that work is not to be done with empty generators?
To be clear, I'd like to be able to give the user some insight as to how much work the script actually did (if any) - contextual snippet as follows:
# Python 2.7
templates = files_from_folder(path_to_folder)
result = list(get_same_sections(templates)) # returns generator
if not result:
    msg("No data to sync.")
    sys.exit()
for data in result:
    for i, tpl in zip(data, templates):
        tpl['sections'][i]['uuid'] = data[-1]
msg("{} sections found to sync up.".format(len(result)))
It works, but I think that ultimately it's a waste to change the generator into a list just to see if there's any work to do, so I assume there's a better way, yes?
EDIT: I get the sense that generators just aren't supposed to be used in this way, but I will add an example to show my reasoning.
There's a semi-popular 'helper function' in Python that you see now and again when you need to traverse a structure like a nested dict or what-have-you. Usually called getnode or getn, whenever I see it, it reads something like this:
def get_node(seq, path):
    for p in path:
        if p in seq:
            seq = seq[p]
        else:
            return ()
    return seq
So in this way, you can make it easier to deal with the results of a complicated path to data in a nested structure without always checking for None or try/except when you're not actually dealing with 'something exceptional'.
mydata = get_node(my_container, ('path', 2, 'some', 'data'))
if mydata:  # could also be "for x in mydata", etc
    do_work(mydata)
else:
    something_else()
It's looking less like this kind of syntax would (or could) exist with generators, without writing a class that handles generators in this way as has been suggested.
A generator has no length; the only way to find out whether it yields anything is to exhaust it:
items = list(myGenerator)
if items:
    # do something
Unless you write a class with a __nonzero__ method that internally looks at your items list:
class MyGenerator(object):
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        for i in self.items:
            yield i

    def __nonzero__(self):
        return bool(self.items)

>>> bool(MyGenerator([]))
False
>>> bool(MyGenerator([1]))
True
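An alternative that avoids materialising the whole list (my addition, not from the original answer): peek at the first item with next() and stitch it back on with itertools.chain, so the stream is only consumed once:
import itertools

def peek(gen):
    # Return (first_item, restored_iterator), or (None, None) if the
    # generator is empty. Only the first item is consumed eagerly.
    try:
        first = next(gen)
    except StopIteration:
        return None, None
    return first, itertools.chain([first], gen)

first, rest = peek(x for x in [])
if rest is None:
    print("No data to sync.")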

Iterating over a large unicode list taking a long time?

I'm working with the program Autodesk Maya.
I've made a naming convention script that will name each item in the scene according to a certain convention. However, I have it list every item in the scene, check whether the chosen name matches any current name in the scene, then rename it and check through the scene once more for duplicates.
However, when I run the code, it can take from 30 seconds to a minute or more to run through it all. At first I had no idea what was making my code slow, as it worked fine in a relatively small scene. But when I put print statements in the scene-checking code, I saw that it was taking a long time to check through all the items in the scene for duplicates.
The ls() command provides a unicode list of all the items in the scene. These lists can be relatively large, a thousand entries or more if the scene has even a moderate amount of items; a normal scene would be several times larger than the testing scene I have at the moment (which has about 794 items in this list).
Is this supposed to take this long? Is the method I'm using to compare things inefficient? I'm not sure what to do here; the code is taking an excessive amount of time. I'm also wondering if it could be anything else in the code, but this seems like the likely culprit.
Here is some code below.
import collections
from string import Template

from maya.cmds import ls  # assumption: ls() comes from maya.cmds, per the question

class Name(object):
    """A naming convention class that runs passed arguments through user
    dictionary, and returns formatted string of users input naming convention.
    """

    def __init__(self, user_conv):
        self.user_conv = user_conv
        # an example of a user convention is '${prefix}_${name}_${side}_${objtype}'

    @staticmethod
    def abbrev_lib(word):
        # a dictionary of abbreviated words is here, takes in a string
        # and returns an abbreviated string, if not found return given string
        return word  # placeholder; the lookup is elided in the original post

    @staticmethod
    def check_scene(name):
        """Checks entire scene for same name. If a duplicate exists,
        returns '' instead of the name.

        Keyword Arguments:
        name -- (string) name of object to be checked
        """
        scene = ls()
        match = [x for x in scene if isinstance(x, collections.Iterable)
                 and (name in x)]
        if not match:
            return name
        else:
            return ''

    def convert(self, prefix, name, side, objtype):
        """Converts given information about object into user specified convention.

        Keyword Arguments:
        prefix -- what is prefixed before the name
        name -- name of the object or node
        side -- what side the object is on, example 'left' or 'right'
        obj_type -- the type of the object, example 'joint' or 'multiplyDivide'
        """
        prefix = self.abbrev_lib(prefix)
        name = self.abbrev_lib(name)
        side = ''.join([self.abbrev_lib(x) for x in side])
        objtype = self.abbrev_lib(objtype)
        i = 2
        checked = ''
        subs = {'prefix': prefix, 'name': name, 'side': side, 'objtype': objtype}
        while checked == '':
            newname = Template(self.user_conv.lower())
            newname = newname.safe_substitute(**subs)
            newname = newname.strip('_')
            newname = newname.replace('__', '_')
            checked = self.check_scene(newname)
            if checked == '' and i < 100:
                subs['objtype'] = '%s%s' % (objtype, i)
                i += 1
            else:
                break
        return checked
Are you running this many times? You are potentially trawling a list of several hundred or a few thousand items for each iteration inside while checked == '', which would be a likely culprit. FWIW prints are also very slow in Maya, especially if you're printing a long list, so doing that many times will definitely be slow no matter what.
I'd try a couple of things to speed this up:
Limit your searches to one node type at a time - why trawl through hundreds of random nodes if you only care about multiplyDivide right now?
Use a set or a dictionary to search, rather than a list - sets and dictionaries are hash-based and much faster for membership tests.
If you're worried about maintaining a naming convention, definitely design it to be resistant to Maya's default behavior, which is to append numeric suffixes to keep names unique. Any naming convention which doesn't support this will be a pain in the butt for all time, because you can't prevent Maya from doing this in the ordinary course of business. On the other hand, if you use that for differentiating instances, you don't need to do any uniquification at all - just use rename() on the object and capture the result. The weakness there is that Maya won't rename for global uniqueness, only local - so if you want to make node names unique for things that are not siblings, you have to do it yourself.
Here's some cheapie code for finding unique node names:
def get_unique_scene_names(*nodeTypes):
    if not nodeTypes:
        nodeTypes = ('transform',)
    results = {}
    for longname in cmds.ls(type=nodeTypes, l=True):
        shortname = longname.rpartition("|")[-1]
        if shortname not in results:
            results[shortname] = set()
        results[shortname].add(longname)
    return results

def add_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    if shortname in node_dict:
        node_dict[shortname].add(item)
    else:
        node_dict[shortname] = set([item])

def remove_unique_name(item, node_dict):
    shortname = item.rpartition("|")[-1]
    existing = node_dict.get(shortname, [])
    if item in existing:
        existing.remove(item)

def apply_convention(node, new_name, node_dict):
    if new_name not in node_dict:
        renamed_item = cmds.ls(cmds.rename(node, new_name), l=True)[0]
        remove_unique_name(node, node_dict)
        add_unique_name(renamed_item, node_dict)
        return renamed_item
    else:
        for n in range(99999):
            possible_name = new_name + str(n + 1)
            if possible_name not in node_dict:
                renamed_item = cmds.ls(cmds.rename(node, possible_name), l=True)[0]
                add_unique_name(renamed_item, node_dict)
                return renamed_item
        raise RuntimeError, "Too many duplicate names"
To use it on a particular node type, you just supply the right would-be name when calling apply_convention(). The snippet below would rename all the joints in the scene (naively!) to 'jnt_X' while keeping the suffixes unique. You'd do something smarter than that, like your original code does; this just makes sure that leaf names are unique:
joint_names = get_unique_scene_names('joint')
existing = cmds.ls(type='joint', l=True)
existing.sort()
existing.reverse()  # do this to make sure it works from leaves backwards!
for item in existing:
    apply_convention(item, 'jnt_', joint_names)

# check the uniqueness constraint by looking at how many items share a short name:
for d in joint_names:
    print d, len(joint_names[d])
But, like I said, plan for those damn numeric suffixes; Maya makes them all the time without asking for permission, so you can't fight 'em :(
Instead of running ls for each and every name, you should run it once and store the result in a set (an unordered collection with fast membership tests). Then check against that when you run check_scene:
def check_scene(self, name):
    """Checks entire scene for same name. If a duplicate exists,
    returns '' instead of the name.

    Keyword Arguments:
    name -- (string) name of object to be checked
    """
    if not hasattr(self, 'scene'):
        self.scene = set(ls())
    if name not in self.scene:
        return name
    else:
        return ''
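One caveat with the cached set, as my addition: it goes stale as you rename things during the session, so you would want to record each accepted name, along these lines (hypothetical helper):
def register_name(self, name):
    # Keep the cached set in sync after a rename so later
    # check_scene calls see names created this session.
    if hasattr(self, 'scene'):
        self.scene.add(name)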

Django schedule: Difference between event.get_occurrence() and event.get_occurrences()?

Based on django-schedule. I can't find the guy who made it.
Pardon me if I'm missing something, but I've been trying to get an events occurrences, preferably for a given day.
When I use event.get_occurrence(date), it always returns nothing. But when I use event.get_occurrences(before_date, after_date), suddenly the occurrences on the previously attempted date show up.
Why won't this work with just one datetime object?
This difference is probably in the actual design of these two methods. Frankly, get_occurrence is rather flawed in general. A method like this should always return something, even if it's just None, but there are scenarios where it doesn't return at all: namely, if your event doesn't have an rrule and the date you passed to get_occurrence isn't the same as your event's start, then no value is returned.
There's not really anything that can be done about that. It's just flawed code.
Based on the above comment, especially for the case when the event doesn't return an occurrence, the following snippet can force the retrieval of an occurrence when you are sure that it exists:
from dateutil.relativedelta import relativedelta

def custom_get_occurrence(event, start_date):
    occurrence = event.get_occurrence(start_date)
    if occurrence is None:
        # Search a three-month window and pick the occurrence whose start matches.
        occurrences = event.get_occurrences(start_date,
                                            start_date + relativedelta(months=3))
        result = filter(lambda x: x.start == start_date, occurrences)
        occurrence = result[0]
    return occurrence
The above code resolves the issue that can occur when the default get_occurrence doesn't return a result.
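A hedged usage sketch (my addition; my_event and the datetime are illustrative):
from datetime import datetime

occurrence = custom_get_occurrence(my_event, datetime(2013, 5, 1, 9, 0))
if occurrence is not None:
    print occurrence.start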