How to get 3 unique values using random.randint() in python? - regex

I am trying to populate a list in Python 3 with 3 random items read from a file using a regex, but I keep getting duplicate items in the list.
Here is an example.
import re
import random as rn
data = '/root/Desktop/Selenium[FILTERED].log'
with open(data, 'r') as inFile:
    index = inFile.read()
    URLS = re.findall(r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', index)
    list_0 = []
    for i in range(3):
        list_0.append(URLS[rn.randint(1, 30)])
inFile.close()
for i in range(len(list_0)):
    print(list_0[i])
What would be the cleanest way to prevent duplicate items being appended to the list?
(EDIT)
This is the code that I think has done the job quite well.
def random_sample(data):
    # raw strings keep the backslashes in the pattern intact
    r_e = [r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}', '..']
    with open(data, 'r') as inFile:
        urls = re.findall(r_e[0], inFile.read())
        x = list(set(urls))
    # no explicit close() needed: the with block closes the file
    return x
data = '/root/Desktop/[TEMP].log'
sample = random_sample(data)
for i in range(3):
    print(sample[i])
(A set is an unordered collection with no duplicate entries.)

Use the builtin random.sample.
random.sample(population, k)
Return a k length list of unique elements chosen from the population sequence or set.
Used for random sampling without replacement.
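As a rough sketch of how this fits the question, the regex matches can be deduplicated and then sampled. The log text below is a made-up stand-in for the asker's file, and the names log_text, unique_urls and picked are illustrative only:

```python
import random
import re

# Hypothetical log text standing in for the real file contents.
log_text = "\n".join(
    "GET https://www.example.com/view?i=item%d" % (n % 5) for n in range(20)
)

pattern = r'https://www\.\w{1,10}\.com/view\?i=\w{1,20}'
urls = re.findall(pattern, log_text)

# Drop duplicate matches first; dict.fromkeys keeps first-seen order.
unique_urls = list(dict.fromkeys(urls))

# random.sample never picks the same position twice, so the result is
# duplicate-free once the population itself is duplicate-free.
k = min(3, len(unique_urls))
picked = random.sample(unique_urls, k)
print(picked)
```

The min() guard is there because random.sample raises ValueError when asked for more items than the population holds.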
Addendum
After seeing your edit, it looks like you've made things much harder than they have to be. I've hardwired a list of URLs in the following, but the source doesn't matter. Selecting the (guaranteed unique) subset is essentially a one-liner with random.sample:
import random
# the following two lines are easily replaced
URLS = ['url1', 'url2', 'url3', 'url4', 'url5', 'url6', 'url7', 'url8']
SUBSET_SIZE = 3
# the following one-liner yields the randomized subset as a list
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
print(urlList) # produces, e.g., => ['url7', 'url3', 'url4']
Note that by using len(URLS) and SUBSET_SIZE, the one-liner that does the work is not hardwired to the size of the set nor the desired subset size.
Addendum 2
If the original list of inputs contains duplicate values, the following slight modification will fix things for you:
URLS = list(set(URLS)) # this converts to a set for uniqueness, then back for indexing
urlList = [URLS[i] for i in random.sample(range(len(URLS)), SUBSET_SIZE)]
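Or more simply, because random.sample can draw from the deduplicated list directly. (Keep the conversion back to a list: passing a set to random.sample was deprecated in Python 3.9 and raises TypeError from Python 3.11 on.)
URLS = list(set(URLS))
urlList = random.sample(URLS, SUBSET_SIZE)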

seen = set(list_0)
randValue = URLS[rn.randint(1, 30)]
# [...]
if randValue not in seen:
    seen.add(randValue)
    list_0.append(randValue)
Now you just need to repeat this in a loop and stop once list_0 contains 3 items.
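Put together, the loop might look like the sketch below. The URLS list is a hypothetical stand-in for the regex matches, and random.randrange(len(URLS)) replaces the hardcoded rn.randint(1, 30) as an assumption about the intent, so that index 0 is reachable and the index cannot run past the end of the list:

```python
import random

# Hypothetical stand-in for the regex matches; note the repeated entries.
URLS = ['url1', 'url2', 'url2', 'url3', 'url1', 'url4']
WANTED = 3

seen = set()
list_0 = []
# Stop at WANTED items, or earlier if fewer distinct values exist,
# so the loop cannot spin forever.
target = min(WANTED, len(set(URLS)))
while len(list_0) < target:
    # randrange(len(URLS)) covers every valid index, including 0.
    rand_value = URLS[random.randrange(len(URLS))]
    if rand_value not in seen:
        seen.add(rand_value)
        list_0.append(rand_value)
print(list_0)
```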

Related

Python3: Checking if a key word within a dictionary matches any part of a string

I'm having trouble converting my working code from lists to dictionaries. The basics of the code checks a file name for any keywords within the list.
But I'm having a tough time understanding dictionaries well enough to convert it. I am trying to pull the name of each key and compare it to the file name like I did with lists and tuples. Here is a mock version of what I was doing.
fname = "../crazyfdsfd/fds/ss/rabbit.txt"
hollow = "SFV"
blank = "2008"
empty = "bender"
# things is list
things = ["sheep", "goat", "rabbit"]
# other is tuple
other = ("sheep", "goat", "rabbit")
#stuff is dictionary
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}
try:
    print(type(things), "things")
    for i in things:
        if i in fname:
            hollow = str(i)
            print(hollow)
            if hollow == things[2]:
                print("PERFECT")
except:
    print("c-c-c-combo breaker")
print("\n \n")
try:
    print(type(other), "other")
    for i in other:
        if i in fname:
            blank = str(i)
            print(blank)
            if blank == other[2]:
                print("Yes. You. Can.")
except:
    print("THANKS OBAMA")
print("\n \n")
try:
    print(type(stuff), "stuff")
    for i in stuff: # problem loop
        if i in fname:
            empty = str(i)
            print(empty)
            if empty == stuff[2]: # problem line
                print("Shut up and take my money!")
except:
    print("CURSE YOU ZOIDBERG!")
I am able to get a full run through the first two examples, but I cannot get the dictionary version to run without hitting its exception. The loop is not matching empty against stuff[2]'s value, leaving the money regrettably in Fry's pocket. Let me know if my example isn't clear enough for what I am asking. The dictionary is just a shortcut for counting lists and adding files to other variables.
A dictionary is a collection that maps keys to values; items are looked up by key, not by position. If you define stuff to be:
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}
You can refer to its elements with:
stuff['sheep'], stuff['goat'], stuff['rabbit']
stuff[2] will result in a KeyError, because the key 2 is not found in your dictionary. You can't get the last or 3rd value of a dictionary by indexing, because indexing a dictionary always looks up a key. (Since Python 3.7 dictionaries do preserve insertion order when iterated, but they still cannot be indexed by position.) Use a list or tuple for an ordered sequence if you need to compare to the last item.
If you want to traverse a dictionary, you can use this as a template:
for k, v in stuff.items():
    if k == 'rabbit':
        # do something - k will be 'rabbit' and v will be 6
        pass
If you want to check the keys in a dictionary to see if they match part of a string:
for k in stuff.keys():
    if k in fname:
        print('found', k)
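Applied to the question's data, the key check could be sketched as follows. Taking the base name of the path first is an added assumption (so that directory names cannot produce false hits), and the names base and matches are illustrative:

```python
import os

fname = "../crazyfdsfd/fds/ss/rabbit.txt"
stuff = {"sheep": 2, "goat": 5, "rabbit": 6}

# Match against the bare file name so directory names cannot cause false hits.
base = os.path.basename(fname)

# Collect every key that occurs in the file name, together with its value.
matches = {k: v for k, v in stuff.items() if k in base}
print(matches)  # {'rabbit': 6}

# Compare against a key by name, not by position.
if "rabbit" in matches:
    print("Shut up and take my money!")
```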
Some other notes:
The KeyError would be much easier to notice if you took out your try/except blocks. Hiding Python errors from end users can be useful. Hiding that information from YOU is a bad idea, especially when you're debugging an initial pass at code.
You can compare to the last item in a list or tuple with:
if hollow == things[-1]:
if that is what you're trying to do.
In your last loop: empty == str(i) needs to be empty = str(i).

print sum of duplicate numbers and product of non duplicate numbers from the list

I am new to Python. I am trying to print the sum of all duplicated numbers and the product of all non-duplicated numbers from a Python list. For example, given
list = [2,2,4,4,5,7,8,9,9], what I want is sum = 2+2+4+4+9+9 and product = 5*7*8.
There are pythonic one liners that can do this but here is an explicit way you might find easier to understand.
num_list = [2,2,4,4,5,7,8,9,9]
sum_dup = 0
product = 1
for n in num_list:
    if num_list.count(n) == 1:
        product *= n
    else:
        sum_dup += n
Also side note, don't call your list the name "list", it interferes with the builtin name of the list type.
count is useful for this. sum is built in, but there is no built-in "product", so using reduce is the easiest way to do this.
from functools import reduce
import operator
lst = [2, 2, 4, 4, 5, 7, 8, 9, 9]
the_sum = sum(x for x in lst if lst.count(x) > 1)
# the initial value 1 keeps reduce from failing when nothing is unique
the_product = reduce(operator.mul, [x for x in lst if lst.count(x) == 1], 1)
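Both answers above call count once per element, which rescans the whole list each time. For longer lists, one possible alternative (a sketch, not something the answers proposed) is to tally the values once with collections.Counter; math.prod assumes Python 3.8 or later:

```python
from collections import Counter
from math import prod  # available from Python 3.8

num_list = [2, 2, 4, 4, 5, 7, 8, 9, 9]
counts = Counter(num_list)  # one pass over the list

# Sum every occurrence of a value that appears more than once.
sum_dup = sum(n for n in num_list if counts[n] > 1)
# Multiply the values that appear exactly once.
product_unique = prod(n for n in num_list if counts[n] == 1)
print(sum_dup, product_unique)  # 30 280
```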
Use a for loop to read each number from the list. Before the loop, create two variables and initialize them: a sum starting at 0 and a product starting at 1. Inside the loop, use an if statement to check whether the current number appears more than once in the list. If it does, add it to the sum (sameNumSum += number); otherwise multiply it into the product. I have only given you the algorithm; you can turn it into code yourself. Hope that helps.

String from CSV to list - Python

I don't get it. I have a CSV data with the following content:
wurst;ball;hoden;sack
1;2;3;4
4;3;2;1
I want to iterate over the CSV data and put the headers in one list and the content in another list. Here's my code so far:
data = [ i.strip() for i in open('test.csv', 'r').readlines() ]
for i_c, i in enumerate(data):
    if i_c == 0:
        heads = i
    else:
        content = i
heads.split(";")
content.split(";")
print heads
That always returns the following string, not a valid list.
wurst;ball;hoden;sack
Why does split not work on this string?
Greetings and merry Christmas,
Jan
The split method returns a new list; it does not modify the string in place (strings are immutable). Try:
heads = heads.split(";")
content = content.split(";")
I've noticed also that your data seems to all be integers. You might consider instead the following for content:
content = [int(i) for i in content.split(";")]
The reason is that split returns a list of strings, and it seems like you might need to deal with them as numbers in your code later on. Of course, disregard if you are expecting non-numeric data to show up at some point.
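Another option, not mentioned in the answer, is the standard library's csv module with delimiter=';'. The sketch below reads the sample data from an in-memory string instead of test.csv so that it is self-contained, and it keeps every content row rather than only the last one, which is an assumption about what the asker wants:

```python
import csv
import io

# In-memory stand-in for test.csv, using the sample data from the question.
raw = "wurst;ball;hoden;sack\n1;2;3;4\n4;3;2;1\n"

reader = csv.reader(io.StringIO(raw), delimiter=';')
heads = next(reader)  # first row: the column headers
# Remaining rows, converted to integers as suggested above.
content = [[int(cell) for cell in row] for row in reader]
print(heads)
print(content)
```

With a real file, io.StringIO(raw) would be replaced by a file object opened with open('test.csv', newline='').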

django queryset counts substrings in charField

One field in my model is a charField with the format substring1-substring2-substring3-substring4 and it can have this range of values:
"1-1-2-1"
"1-1-2-2"
"1-1-2-3"
"1-1-2-4"
"2-2-2-6"
"2-2-2-7"
"2-2-2-9"
"3-1-1-10"
"10-1-1-11"
"11-1-1-12"
"11-1-1-13"
For example, I need to count the number of distinct values of substring1.
In this case there are 5 unique values (1, 2, 3, 10, 11):
"1-X-X-X"
"2-X-X-X"
"3-X-X-X"
"10-X-X-X"
"11-X-X-X"
Honestly, I don't know where to start. I read the docs at https://docs.djangoproject.com/en/1.5/ref/models/querysets/ but I didn't find a specific clue.
Thanks in advance.
results = MyModel.objects.all()
pos_id = 0
values_for_pos_id = [res.field_to_check.split('-')[pos_id] for res in results]
values_for_pos_id = set(values_for_pos_id)
How this works:
first you fetch all your objects (results)
pos_id is the substring index (you have 4 substrings, so it ranges from 0 to 3)
you split each field_to_check (i.e. the field where you store the substring combinations) on - (your separator) and take the substring at pos_id for that object
you convert the list to a set (to keep only the unique values)
Then a simple len(values_for_pos_id) gives you the count.
NB: If you don't have pos_id or can't set it anywhere, you just need to loop like this:
for pos_id in range(4):
    values_for_pos_id = set([res.field_to_check.split('-')[pos_id] for res in results])
    # process your set results now
    print(len(values_for_pos_id))
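The split-and-set idea can be tried without a database. The sketch below runs it on the sample strings from the question; the helper name unique_at is hypothetical. In Django the strings could presumably be fetched with something like MyModel.objects.values_list('field_to_check', flat=True), where the model and field names are the answer's placeholders:

```python
# Sample values copied from the question.
values = [
    "1-1-2-1", "1-1-2-2", "1-1-2-3", "1-1-2-4",
    "2-2-2-6", "2-2-2-7", "2-2-2-9",
    "3-1-1-10", "10-1-1-11", "11-1-1-12", "11-1-1-13",
]

def unique_at(strings, pos_id):
    """Return the set of distinct substrings found at index pos_id."""
    return {s.split('-')[pos_id] for s in strings}

first = unique_at(values, 0)
print(len(first), sorted(first, key=int))  # 5 ['1', '2', '3', '10', '11']
```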
Try something like this...
# Assumes your model name is NumberStrings and attribute numbers stores the string.
search_string = "1-1-2-1"
matched_number_strings = NumberStrings.objects.filter(numbers__contains=search_string)
num_of_occurrences = len(matched_number_strings)
matched_ids = [match.id for match in matched_number_strings]
You could loop through these items (I guess they're strings), and add the value of each substring_n to a Set_n.
Since set values are unique, you would have a set, called Set_1, for example, that contains 1,2,3,10,11.
Make sense?

How can I count the number of attributes of an element in Selenium Python?

Having the following HTML code:
<span class="warning" id ="warning">WARNING:</span>
For an object accessible by XPath:
.//*[@id='unlink']/table/tbody/tr[1]/td/span
How can one count its attributes (class, id) by means of Selenium WebDriver + Python 2.7, without actually knowing their names?
I'm expecting something like count = 2.
Got it! This should work for div, span, img, p and many other basic elements.
element = driver.find_element_by_xpath(xpath) # Locate the element.
outerHTML = element.get_attribute("outerHTML") # Get its HTML
innerHTML = element.get_attribute("innerHTML") # See where its inner content starts
if len(innerHTML) > 0: # Let's make this work for input as well
    innerHTML = innerHTML.strip() # Strip whitespace around inner content
    toTrim = outerHTML.index(innerHTML) # Get the index of the first part, before the inner content
    # In the case of most elements, this is what we care about
    rightString = outerHTML[:toTrim]
else:
    # We seem to have something like <input class="bla" name="blabla"> which is good
    rightString = outerHTML
# I.e.: <span class="something" id="somethingelse">
strippedString = rightString.strip() # Remove whitespace, if any
rightTrimmedString = strippedString.rstrip('<>') # Remove trailing < and > chars
leftTrimmedString = rightTrimmedString.lstrip('</>') # Remove leading <, > and / chars
rawAttributeArray = leftTrimmedString.split(' ') # Create an array of:
# [span, id = "something", class="somethingelse"]
curatedAttributeArray = [] # This is where we put the good values
iterations = len(rawAttributeArray)
for x in range(iterations):
    if "=" in rawAttributeArray[x]: # We want the attribute="..." pairs
        curatedAttributeArray.append(rawAttributeArray[x]) # and add them to a list
numberOfAttributes = len(curatedAttributeArray) # Let's see what we got
print(numberOfAttributes) # There we go
I hope this helps.
Thanks,
R.
P.S. This could be further enhanced, like stripping whitespace together with <, > or /.
It's not going to be easy.
Every element has a series of implicit attributes as well as the ones explicitly defined (for example selected, disabled, etc). As a result the only way I can think to do it would be to get a reference to the parent and then use a JavaScript executor to get the innerHTML:
document.getElementById('{ID of element}').innerHTML
You would then have to parse what is returned by innerHTML to extract out individual elements and then once you have isolated the element that you are interested in you would again have to parse that element to extract out a list of attributes.
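The parsing step does not have to be done by hand. As a sketch under the assumption that the element's markup has already been obtained as a string (for example from get_attribute("outerHTML"), as in the other answer), the standard library's html.parser can list the attributes written on the opening tag. The class name FirstTagAttributes is hypothetical, the example uses Python 3 syntax, and the HTML string is a tidied copy of the question's snippet:

```python
from html.parser import HTMLParser

class FirstTagAttributes(HTMLParser):
    """Record the attributes of the first start tag encountered."""

    def __init__(self):
        super().__init__()
        self.first_attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag.
        if self.first_attrs is None:
            self.first_attrs = attrs

# Stand-in for the string a live session would return as outerHTML.
outer_html = '<span class="warning" id="warning">WARNING:</span>'

parser = FirstTagAttributes()
parser.feed(outer_html)
parser.close()

attribute_names = [name for name, _ in parser.first_attrs]
print(len(attribute_names), attribute_names)  # 2 ['class', 'id']
```

This counts only the attributes written in the markup, not the implicit ones mentioned above.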