django queryset counts substrings in charField - django

One field in my model is a charField with the format substring1-substring2-substring3-substring4 and it can have this range of values:
"1-1-2-1"
"1-1-2-2"
"1-1-2-3"
"1-1-2-4"
"2-2-2-6"
"2-2-2-7"
"2-2-2-9"
"3-1-1-10"
"10-1-1-11"
"11-1-1-12"
"11-1-1-13"
For example I need to count the single number of occurrences for substring1.
In this case there are 5 unique occurrences (1,2,3,10,11).
"1-X-X-X"
"2-X-X-X"
"3-X-X-X"
"10-X-X-X"
"11-X-X-XX"
Sincerely I don't know where I can start from. I read the doc https://docs.djangoproject.com/en/1.5/ref/models/querysets/ but I didn't find a specific clue.
Thanks in advance.

results = MyModel.objects.all()
pos_id = 0
values_for_pos_id = [res.field_to_check.split('-')[pos_id] for res in results]
values_for_pos_id = set(values_for_pos_id)
How does this work:
first you fetch all your objects (results)
pos_id is your substring index (you have 4 substring, so it's in range 0 to 3)
you split each field_to_check (aka: where you store the substring combinations) on - (your separator) and fetch the correct substring for that object
you convert the list to a set (to have all the unique values)
Then a simple len(values_for_pos_id) will do the trick for you
NB: If you don't have pos_id or can't set it anywhere, you just need to loop like this:
for pos_id in range(4):
values_for_pos_id = set([res.field_to_check.split('-')[pos_id] for res in results])
# process your set results now
print len(values_for_pos_id)

Try something like this...
# Assumes your model name is NumberStrings and attribute numbers stores the string.
search_string = "1-1-2-1"
matched_number_strings = NumberStrings.objects.filter(numbers__contains=search_string)
num_of_occurrences = len(matches_found)
matched_ids = [match.id for match in matched_number_strings]

You could loop through these items (I guess they're strings), and add the value of each substring_n to a Set_n.
Since set values are unique, you would have a set, called Set_1, for example, that contains 1,2,3,10,11.
Make sense?

Related

Python3: Checking if a key word within a dictionary matches any part of a string

I'm having trouble converting my working code from lists to dictionaries. The basics of the code checks a file name for any keywords within the list.
But I'm having a tough time understanding dictionaries to convert it. I am trying to pull the name of each key and compare it to the file name like I did with lists and tuples. Here is a mock version of what i was doing.
fname = "../crazyfdsfd/fds/ss/rabbit.txt"
hollow = "SFV"
blank = "2008"
empty = "bender"
# things is list
things = ["sheep", "goat", "rabbit"]
# other is tuple
other = ("sheep", "goat", "rabbit")
#stuff is dictionary
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}
try:
print(type(things), "things")
for i in things:
if i in fname:
hollow = str(i)
print(hollow)
if hollow == things[2]:
print("PERFECT")
except:
print("c-c-c-combo breaker")
print("\n \n")
try:
print(type(other), "other")
for i in other:
if i in fname:
blank = str(i)
print(blank)
if blank == other[2]:
print("Yes. You. Can.")
except:
print("THANKS OBAMA")
print("\n \n")
try:
print(type(stuff), "stuff")
for i in stuff: # problem loop
if i in fname:
empty = str(i)
print(empty)
if empty == stuff[2]: # problem line
print("Shut up and take my money!")
except:
print("CURSE YOU ZOIDBERG!")
I am able to get a full run though the first two examples, but I cannot get the dictionary to run without its exception. The loop is not converting empty into stuff[2]'s value. Leaving money regrettably in fry's pocket. Let me know if my example isn't clear enough for what I am asking. The dictionary is just short cutting counting lists and adding files to other variables.
A dictionary is an unordered collection that maps keys to values. If you define stuff to be:
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}
You can refer to its elements with:
stuff['sheep'], stuff['goat'], stuff['rabbit']
stuff[2] will result in a KeyError, because the key 2 is not found in your dictionary. You can't compare a string with the last or 3rd value of a dictionary, because the dictionary is not stored in an ordered sequence (the internal ordering is based on hashing). Use a list or tuple for an ordered sequence - if you need to compare to the last item.
If you want to traverse a dictionary, you can use this as a template:
for k, v in stuff.items():
if k == 'rabbit':
# do something - k will be 'rabbit' and v will be 6
If you want to check to check the keys in a dictionary to see if they match part of a string:
for k in stuff.keys():
if k in fname:
print('found', k)
Some other notes:
The KeyError would be much easier to notice... if you took out your try/except blocks. Hiding python errors from end-users can be useful. Hiding that information from YOU is a bad idea - especially when you're debugging an initial pass at code.
You can compare to the last item in a list or tuple with:
if hollow == things[-1]:
if that is what you're trying to do.
In your last loop: empty == str(i) needs to be empty = str(i).

compare two dictionary, one with list of float value per key, the other one a value per key (python)

I have a query sequence that I blasted online using NCBIWWW.qblast. In my xml blast file result I obtained for a query sequence a list of hit (i.e: gi|). Each hit or gi| have multiple hsp. I made a dictionary my_dict1 where I placed gi| as key and I appended the bit score as value. So multiple values for each key.
my_dict1 = {
gi|1002819492|: [437.702, 384.47, 380.86, 380.86, 362.83],
gi|675820360| : [2617.97, 2614.37, 122.112],
gi|953764029| : [414.258, 318.66, 122.112, 86.158],
gi|675820410| : [450.653, 388.08, 386.27] }
Then I looked for max value in each key using:
for key, value in my_dict1.items():
max_value = max(value)
And made a second dictionary my_dict2:
my_dict2 = {
gi|1002819492|: 437.702,
gi|675820360| : 2617.97,
gi|953764029| : 414.258,
gi|675820410| : 450.653 }
I want to compare both dictionary. So I can extract the hsp with the highest score bits. I am also including other parameters like query coverage and identity percentage (Not shown here). The finality is to get the best gi| with the highest bit scores, coverage and identity percentage.
I tried many things to compare both dictionary like this :
First code :
matches[]
if my_dict1.keys() not in my_dict2.keys():
matches[hit_id] = bit_score
else:
matches = matches[hit_id], bit_score
Second code:
if hit_id not in matches.keys():
matches[hit_id]= bit_score
else:
matches = matches[hit_id], bit_score
Third code:
intersection = set(set(my_dict1.items()) & set(my_dict2.items()))
Howerver I always end up with 2 types of errors:
1 ) TypeError: list indices must be integers, not unicode
2 ) ... float not iterable...
Please I need some help and guidance. Thank you very much in advance for your time. Best regards.
It's not clear what you're trying to do. What is hit_id? What is bit_score? It looks like your second dict is always going to have the same keys as your first if you're creating it by pulling the max value for each key of the first dict.
You say you're trying to compare them, but don't really state what you're actually trying to do. Find those with values under a certain max? Find those with the highest max?
Your first code doesn't work because I'm assuming you're trying to use a dict key value as an index to matches, which you define as a list. That's probably where your first error is coming from, though you haven't given the lines where the error is actually occurring.
See in-code comments below:
# First off, this needs to be a dict.
matches{}
# This will never happen if you've created these dicts as you stated.
if my_dict1.keys() not in my_dict2.keys():
matches[hit_id] = bit_score # Not clear what bit_score is?
else:
# Also not sure what you're trying to do here. This will assign a tuple
# to matches with whatever the value of matches[hit_id] is and bit_score.
matches = matches[hit_id], bit_score
Regardless, we really need more information and the full code to figure out your actual goal and what's going wrong.

Loop through dict key equals to list than put in new list

I have a rather large dictionary right now that is set up like this:
largedict = {'journalname':{code: 2065},'journalname1':{code: 3055}}
and so on and so on. And another dictionary:
codes = {3055: 'medicine',3786: 'sciences'}
And I want to loop through largedict, compare it's code value to the keys in codes. Then either add all journalname key/value pairs that match the code to a different dictionary, or delete all that don't from largedict.
new_dic = {journal_name : journal_body for journal_name, journal_body in largedict.items() if journal_body["code"] in codes}

How to get 3 unique values using random.randint() in python?

I am trying to populate a list in Python3 with 3 random items being read from a file using REGEX, however i keep getting duplicate items in the list.
Here is an example.
import re
import random as rn
data = '/root/Desktop/Selenium[FILTERED].log'
with open(data, 'r') as inFile:
index = inFile.read()
URLS = re.findall(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', index)
list_0 = []
for i in range(3):
list_0.append(URLS[rn.randint(1, 30)])
inFile.close()
for i in range(len(list_0)):
print(list_0[i])
What would be the cleanest way to prevent duplicate items being appended to the list?
(EDIT)
This is the code that i think has done the job quite well.
def random_sample(data):
r_e = ['https://www\.\w{1,10}\.com/view\?i=\w{1,20}', '..']
with open(data, 'r') as inFile:
urls = re.findall(r'%s' % r_e[0], inFile.read())
x = list(set(urls))
inFile.close()
return x
data = '/root/Desktop/[TEMP].log'
sample = random_sample(data)
for i in range(3):
print(sample[i])
Unordered collection with no duplicate entries.
Use the builtin random.sample.
random.sample(population, k)
Return a k length list of unique elements chosen from the population sequence or set.
Used for random sampling without replacement.
Addendum
After seeing your edit, it looks like you've made things much harder than they have to be. I've wired a list of URLS in the following, but the source doesn't matter. Selecting the (guaranteed unique) subset is essentially a one-liner with random.sample:
import random
# the following two lines are easily replaced
URLS = ['url1', 'url2', 'url3', 'url4', 'url5', 'url6', 'url7', 'url8']
SUBSET_SIZE = 3
# the following one-liner yields the randomized subset as a list
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
print(urlList) # produces, e.g., => ['url7', 'url3', 'url4']
Note that by using len(URLS) and SUBSET_SIZE, the one-liner that does the work is not hardwired to the size of the set nor the desired subset size.
Addendum 2
If the original list of inputs contains duplicate values, the following slight modification will fix things for you:
URLS = list(set(URLS)) # this converts to a set for uniqueness, then back for indexing
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
Or even better, because it doesn't need two conversions:
URLS = set(URLS)
urlList = [u for u in random.sample(URLS, SUBSET_SIZE)]
seen = set(list_0)
randValue = URLS[rn.randint(1, 30)]
# [...]
if randValue not in seen:
seen.add(randValue)
list_0.append(randValue)
Now you just need to check list_0 size is equal to 3 to stop the loop.

How can I count the number of attributes of an element in Selenium Python?

Having the following HTML code:
<span class="warning" id ="warning">WARNING:</span>
For an object accessible by XPAth:
.//*[#id='unlink']/table/tbody/tr[1]/td/span
How can one count its attributes (class, id) by means of Selenium WebDriver + Python 2.7, without actually knowing their names?
I'm expecting something like count = 2.
Got it! This should work for div, span, img, p and many other basic elements.
element = driver.find_element_by_xpath(xpath) #Locate the element.
outerHTML = element.get_attribute("outerHTML") #Get its HTML
innerHTML = element.get_attribute("innerHTML") #See where its inner content starts
if len(innerHTML) > 0: # Let's make this work for input as well
innerHTML = innerHTML.strip() # Strip whitespace around inner content
toTrim = outerHTML.index(innerHTML) # Get the index of the first part, before the inner content
# In case of moste elements, this is what we care about
rightString = outerHTML[:toTrim]
else:
# We seem to have something like <input class="bla" name="blabla"> which is good
rightString = outerHTML
# Ie: <span class="something" id="somethingelse">
strippedString = rightString.strip() # Remove whitespace, if any
rightTrimmedString = strippedString.rstrip('<>') #
leftTrimmedString = rightTrimmedString.lstrip('</>') # Remove the <, >, /, chars.
rawAttributeArray = leftTrimmedString.split(' ') # Create an array of:
# [span, id = "something", class="somethingelse"]
curatedAttributeArray = [] # This is where we put the good values
iterations = len(rawAttributeArray)
for x in range(iterations):
if "=" in rawAttributeArray[x]: #We want the attribute="..." pairs
curatedAttributeArray.append(rawAttributeArray[x]) # and add them to a list
numberOfAttributes = len(curatedAttributeArray) #Let's see what we got
print numberOfAttributes # There we go
I hope this helps.
Thanks,
R.
P.S. This could be further enhanced, like stripping whitespace together with <, > or /.
It's not going to be easy.
Every element has a series of implicit attributes as well as the ones explicitly defined (for example selected, disabled, etc). As a result the only way I can think to do it would be to get a reference to the parent and then use a JavaScript executor to get the innerHTML:
document.getElementById('{ID of element}').innerHTML
You would then have to parse what is returned by innerHTML to extract out individual elements and then once you have isolated the element that you are interested in you would again have to parse that element to extract out a list of attributes.