Sum values of a Dictionary with "similar" keys Python - python-2.7

I have the following dictionary:
CostofA = {'Cost1,(1, 2)': 850.93,
'Cost1,(1, 2, 3)': 851.08,
'Cost1,(1, 3)': 851.00,
'Cost1,(1,)': 850.86,
'Cost2,(1, 2)': 812.56,
'Cost2,(1, 2, 3)': 812.65,
'Cost2,(2, 3)': 812.12,
'Cost2,(2,)': 812.04,
'Cost3,(1, 2, 3)': 717.93,
'Cost3,(1, 3)': 717.88,
'Cost3,(2, 3)': 717.32,
'Cost3,(3,)': 717.27}
From this dictionary, I want to create the following dictionary by adding up the elements that have similar keys. For example, I want to sum the values of 'Cost1,(1, 2, 3)', 'Cost2,(1, 2, 3)', and 'Cost3,(1, 2, 3)' as they have the same numbers inside the parentheses (1, 2, 3) and create 'Cost(1, 2, 3)': 2381.66. Similarly, 'Cost1,(1, 3)' and 'Cost3,(1, 3)' have the same numbers inside the parentheses, so, I want to sum 851.00 and 717.88 and write it to my new dictionary as: 'Cost(1, 3)': 1568.88. For 'Cost1,(1,)', 'Cost2,(2,)', and 'Cost3,(3,)', I do not want to do anything but to add them to the new dictionary. If I can get rid of the comma right after the 1 in the parentheses, it would be perfect. So, what I mean is: 'Cost1,(1,)': 850.86 becomes 'Cost(1)': 850.86.
CostofA = {'Cost(1)': 850.86,
'Cost(2)': 812.04,
'Cost(3)': 717.27,
'Cost(1, 2)': 1663.49,
'Cost(1, 3)': 1568.88,
'Cost(2, 3)': 1529.44,
'Cost(1, 2, 3)': 2381.66}
I know I can get the keys of the dictionary with
CostofA.keys()
and I know I could write a for loop with an if condition to build the dictionary above. However, I cannot think of a way to get at the numbers in the parentheses inside that if statement. Any suggestions?

Generate the items from the dictionary.
Construct a list comprehension of tuple items by removing the Cost prefix,
e.g., Cost1,(1, 2) becomes (1, 2), and Cost2,(1, 2) also becomes (1, 2).
Sort the list so that items with the same key are adjacent.
Group with itertools.groupby, sum each group, and store the result in a dict.
from itertools import groupby
# Strip the "CostN," prefix, turn "(1,)" into "(1)", and sort so equal keys are adjacent
data = sorted([(i[0].split(",", 1)[1].replace(",)", ")"), i[1]) for i in CostofA.items()])
new_dict = {}
for key, group in groupby(data, lambda x: x[0]):
    new_dict["Cost" + key] = sum([thing[1] for thing in group])

This is one solution:
import re
str_pat = re.compile(r'\((.*)\)')
Cost = {}
for key, value in CostofA.items():
    match = str_pat.findall(key)[0]
    if match.endswith(','):
        match = match[:-1]
    temp_key = 'Cost(' + match + ')'
    if temp_key in Cost:
        Cost[temp_key] += value
    else:
        Cost[temp_key] = value
CostofA = Cost
This creates a new dictionary Cost whose keys are built from the numbers enclosed in parentheses in the original dictionary CostofA. It uses a precompiled regex to match those numbers, then checks with endswith(',') whether the matched text ends with a comma, as in (1,). If it does, the comma is removed.
It then explicitly concatenates the matched text with the parentheses and the Cost prefix to create the new key. If the new key already exists, the program increases its value by the value from the old dictionary. If it does not, the program creates a new entry with that value. At the end, the program overwrites the old dictionary.
re.compile returns a compiled regex object. As the documentation says:
Compile a regular expression pattern into a regular expression object,
which can be used for matching using its match() and search() methods,
described below.
The compiled object stores a fixed pattern for reuse, which is considered more efficient than passing the pattern string to the re functions each time, especially when the program matches that same pattern repeatedly. The documentation continues:
but using re.compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be used
several times in a single program.
Here it is used more for clarity, as it defines the pattern once up front rather than each time in the loop, but if your original dictionary were larger it could also provide some performance improvement.
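The check-then-add branch in the loop above can also be folded into collections.defaultdict. This is only a sketch under the same assumption about the key format ('CostN,(...)'); normalize_key and totals are hypothetical names, not part of the original answer.

```python
import re
from collections import defaultdict

CostofA = {'Cost1,(1, 2)': 850.93, 'Cost1,(1, 2, 3)': 851.08,
           'Cost1,(1, 3)': 851.00, 'Cost1,(1,)': 850.86,
           'Cost2,(1, 2)': 812.56, 'Cost2,(1, 2, 3)': 812.65,
           'Cost2,(2, 3)': 812.12, 'Cost2,(2,)': 812.04,
           'Cost3,(1, 2, 3)': 717.93, 'Cost3,(1, 3)': 717.88,
           'Cost3,(2, 3)': 717.32, 'Cost3,(3,)': 717.27}

pattern = re.compile(r'\((.*)\)')

def normalize_key(key):
    """Hypothetical helper: 'Cost1,(1,)' -> 'Cost(1)', 'Cost2,(1, 2)' -> 'Cost(1, 2)'."""
    inner = pattern.search(key).group(1)
    if inner.endswith(','):
        inner = inner[:-1]  # drop the trailing comma of a one-element tuple
    return 'Cost(' + inner + ')'

# defaultdict(float) starts every missing key at 0.0, so no "in" check is needed
totals = defaultdict(float)
for key, value in CostofA.items():
    totals[normalize_key(key)] += value
totals = dict(totals)
```

Because floating-point sums can carry tiny representation errors, compare or print the results with round(value, 2).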

Related

identify letter/number combinations using regex and storing in dictionary

import pandas as pd
df = pd.DataFrame({'Date':['This 1-A16-19 person is BL-17-1111 and other',
'dont Z-1-12 do here but NOT 12-24-1981',
'numbers: 1A-256-29Q88 ok'],
'IDs': ['A11','B22','C33'],
})
Using the dataframe above I want to do the following: 1) use regex to identify all letter + number combinations, e.g. 1-A16-19, and 2) store them in a dictionary.
Ideally I would like the following output (note that 12-24-1981 intentionally was not picked up by the regex since it doesn't have a letter in it e.g. 1A-24-1981)
{1: 1-A16-19, 2:BL-17-1111, 3: Z-1-12, 4: 1A-256-29Q88}
Can anybody help me do this?
This regex might do the trick.
(?=.*[a-zA-Z])(\S+-\S+-\S+)
It matches everything between two spaces that has two - in it. Also there won't be a match if there is no letter present.
regex101 example
As you can see for the given input you provided only 1-A16-19, BL-17-1111, Z-1-12 & 1A-256-29Q88 are getting returned.
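One caveat with that pattern: the lookahead (?=.*[a-zA-Z]) scans the rest of the line, so a digits-only token followed by any text containing a letter would still match. A minimal sketch that restricts the lookahead to the token itself by using \S* instead of .* is shown here. It uses only the re module on the three sample strings; the names texts, pattern and result are illustrative.

```python
import re

texts = ['This 1-A16-19 person is BL-17-1111 and other',
         'dont Z-1-12 do here but NOT 12-24-1981',
         'numbers: 1A-256-29Q88 ok']

# (?=\S*[a-zA-Z]) requires a letter before the next whitespace,
# i.e. inside the token that is about to be matched
pattern = re.compile(r'(?=\S*[a-zA-Z])\S+-\S+-\S+')

matches = []
for text in texts:
    matches.extend(pattern.findall(text))

# Number the matches starting at 1
result = dict(enumerate(matches, 1))
print(result)
```

With a pandas column, the same pattern string could be passed to str.findall or str.extractall (the latter needs a capture group around the token part).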
You could try the following (note that this pattern has no letter check, so 12-24-1981 is picked up as well):
# extract your strings based on your condition above and pass them to a list.
vals = df['Date'].str.extractall(r'(\S+-\S+-\S+)')[0].tolist()
# make a list with the index range of your matches.
nums = []
for x, y in enumerate(vals):
    nums.append(x)
# pass both lists into a dictionary.
my_dict = dict(zip(nums, vals))
print(my_dict)
{0: '1-A16-19',
1: 'BL-17-1111',
2: 'Z-1-12',
3: '12-24-1981',
4: '1A-256-29Q88'}
If you want the index to start at one, you can pass the start value to the enumerate function.
nums = []
for x, y in enumerate(vals, 1):
    nums.append(x)
print(nums)
[1, 2, 3, 4, 5]
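Since enumerate already yields (index, value) pairs, the separate nums list and zip step can be skipped. A short sketch, with vals written out by hand here instead of being extracted from the dataframe:

```python
# vals stands in for the list produced by str.extractall(...) above
vals = ['1-A16-19', 'BL-17-1111', 'Z-1-12', '12-24-1981', '1A-256-29Q88']

# dict() accepts an iterable of (key, value) pairs; start=1 numbers from one
my_dict = dict(enumerate(vals, 1))
print(my_dict)
```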

Adding dictionary values with the missing values in the keys

I have the following three dictionaries:
Mydict = {'(1)': 850.86,
'(1, 2)': 1663.5,
'(1, 2, 3)': 2381.67,
'(1, 3)': 1568.89,
'(2)': 812.04,
'(2, 3)': 1529.45,
'(3)': 717.28}
A = {1: 4480.0, 2: 3696.0, 3: 4192.5}
B = {1: 1904.62, 2: 1709.27, 3: 1410.73}
Based on the keys in Mydict, I want to add min(A[k], B[k]) for every number k that is missing from the key. For example, for the first key '(1)' in Mydict, I want to add min(A[2], B[2]) + min(A[3], B[3]) to its value and update that entry in the dictionary. Similarly, for the key '(1, 2)', I want to add min(A[3], B[3]), as only 3 is missing there. For '(1, 2, 3)', I don't need to add anything, as it involves all three numbers, namely 1, 2, and 3. Thus, my new Mydict will be as follows:
Mynewdict = {'(1)': 3970.86,
'(1, 2)': 3074.23,
'(1, 2, 3)': 2381.67,
'(1, 3)': 3278.16,
'(2)': 4127.39,
'(2, 3)': 3434.07,
'(3)': 4331.17}
In this example, all the values of B are less than the values of A. However, that may not always be the case, which is why I want to add the minimum of the two. Thanks for your answers.
The list of numbers sounds like a good idea, but to me it seems it will still require similar amount of (similar) operations as the following solution:
import re
str_pat = re.compile(r'\((.*)\)')
Mynewdict = dict(Mydict)  # shallow copy, so Mydict itself is left unchanged
for key in Mynewdict.keys():
    match = (str_pat.findall(key)[0]).split(',')
    # keys of A that do not appear in the compound key
    to_add = list(set(A.keys()).difference(set(map(int, match))))
    if to_add:
        for kAB in to_add:
            Mynewdict[key] += min(A[kAB], B[kAB])
For each key in Mynewdict, this program finds the text between the parentheses and splits it on , into a list called match. This list is then compared to the keys of A.
The comparison goes through sets: the program constructs a set from each and stores the set difference (converted back to a list) in to_add. to_add is hence a list of the keys of A that are not present in the compound key in Mynewdict. This assumes that every key in A is also present in B. map is used to convert the strings in match to integers, for comparison with the keys of A, which are integers. set() accepts any iterable, so the same call also works in Python 3.X, where map returns an iterator.
The last part of the program adds the minimum of A and B for each missing key to the existing value in Mynewdict. Since Mynewdict starts out as a copy of Mydict, all the final keys and initial values already exist in it, so the program does not need to check for key presence or explicitly add the initial values.
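The same set-difference idea can be sketched without a regex by stripping the parentheses and splitting on commas. This is an illustrative variant, not the answer's code; missing_numbers and all_numbers are hypothetical names, and it assumes every key looks like '(1, 2)' and that A and B share the same keys.

```python
Mydict = {'(1)': 850.86, '(1, 2)': 1663.5, '(1, 2, 3)': 2381.67,
          '(1, 3)': 1568.89, '(2)': 812.04, '(2, 3)': 1529.45,
          '(3)': 717.28}
A = {1: 4480.0, 2: 3696.0, 3: 4192.5}
B = {1: 1904.62, 2: 1709.27, 3: 1410.73}

def missing_numbers(key, all_numbers):
    """Return the numbers in all_numbers that do not occur in a key like '(1, 2)'."""
    present = {int(part) for part in key.strip('()').split(',')}
    return all_numbers - present

all_numbers = set(A)

# Build a new dict so Mydict stays untouched
Mynewdict = {
    key: value + sum(min(A[n], B[n]) for n in missing_numbers(key, all_numbers))
    for key, value in Mydict.items()
}

for key in sorted(Mynewdict):
    print(key, round(Mynewdict[key], 2))
```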
To look for keys in Mynewdict that correspond to specific values it seems you actually have to loop through the dictionary:
find_val = round(min(Mynewdict.values()), 2)
for key, value in Mynewdict.items():
    if find_val == round(value, 2):
        print key, value
What's important here is the rounding. It is necessary because the values in Mynewdict are floats that often carry more digits than the value you are looking for. In other words, without round(value, 2) the if condition can evaluate to False in cases where the two values should be considered equal.
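If the goal is specifically the key with the smallest value, one alternative that sidesteps the float comparison is to let min compare the entries directly through its key argument. A small sketch, using the expected result dictionary from the question as sample data:

```python
Mynewdict = {'(1)': 3970.86, '(1, 2)': 3074.23, '(1, 2, 3)': 2381.67,
             '(1, 3)': 3278.16, '(2)': 4127.39, '(2, 3)': 3434.07,
             '(3)': 4331.17}

# min iterates over the keys and ranks them by their associated value
smallest_key = min(Mynewdict, key=Mynewdict.get)
print(smallest_key, Mynewdict[smallest_key])
```

If several keys share the minimum, min returns only the first one it encounters, so the loop above is still the way to collect all of them.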

Python pairs have multiple copies of a word in list

So I have the following code:
def stripNonAlphaNum(text):
    import re
    return re.compile(r'\W+', re.UNICODE).split(text)
def readText(fileStub):
    words = open(fileStub, 'r').read()
    words = words.lower() # Make it lowercase
    wordlist = sorted(stripNonAlphaNum(words))
    wordfreq = []
    for w in wordlist: # Increase count of one upon every iteration of the word.
        wordfreq.append(wordlist.count(w))
    return list(zip(wordlist, wordfreq))
It reads a file in, and then makes pairs of the word and frequency in which they occur. The issue I'm facing is that when I print the result, I don't get the proper pair counts.
If I have some input given, I might get output like this:
('and', 27), ('and', 27), ('and', 27), ('and', 27), ('and', 27), ('and', 27), ('and', 27),.. (27 times)
Which is NOT what I want it to do.
Rather I would like it to give 1 output of the word and just one number like so:
('and', 27), ('able', 5), ('bat', 6).. etc
So how do I fix this?
You should consider using a dictionary.
Dictionaries work like hash maps, thus allow associative indexing; in this way duplicates are not an issue.
...
wordfreq = {}
for w in wordlist:
    wordfreq[w] = wordlist.count(w)
return wordfreq
If you really need to return a list, just do return list(wordfreq.items())
The only problem with this approach is that you will unnecessarily compute the wordlist.count() method more than once for each word.
To avoid this issue, write for w in set(wordlist):
Edit for additional question: if you are OK with returning a list, just do return sorted(wordfreq.items(), key=lambda t: t[1]). If you omit the key part, the result will be ordered by the word first, then by the value.
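The standard library's collections.Counter does this counting in a single pass, which avoids calling wordlist.count() at all. A minimal sketch, using an inline sample string instead of a file; word_frequencies is a hypothetical name:

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count each lowercase word once; empty strings left by the split are dropped."""
    words = [w for w in re.split(r'\W+', text.lower()) if w]
    return Counter(words)

freq = word_frequencies("The cat and the hat and the bat")

# (word, count) pairs ordered alphabetically by word
pairs = sorted(freq.items())
print(pairs)
```

Counter is a dict subclass, so freq['the'] and freq.items() work as with the plain dictionary above; freq.most_common() returns the pairs ordered from most to least frequent.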

How to compare variables in a list (number chars in order from 1 to n without changing the position of the chars in a list)

In short, I want to number chars in order from 1 to n without changing the position of the chars in a list.
Suppose I have a list called key = ['c', 'a', 't']. How would I go about assigning a number to each letter depending on where it is situated in the alphabet with respect to the other letters,
starting at 1 and going up to len(key), such that our key becomes [2, 1, 3]?
I'm really stumped. I have a way to convert the letters to numbers, but I am unsure how to compare them so that the above happens. Any help, tips, ideas or explanations would be appreciated.
This is what I have so far...
import string
key = list(input("enter key: ").upper())
num = []
for i in key:
    num.append(string.ascii_uppercase.index(i)+1)
This solution assumes that duplicate entries should be assigned the same number, so that
# ['c','a','t'] -> [2, 1, 3]
# ['c','a','t','c','a','t'] -> [2, 1, 3, 2, 1, 3]
You can write a simple function like this:
def get_alphabet_pos(lst):
    uniques = sorted(set(lst)) # set() to filter uniques, then order by value
    numbers = {letter: i+1 for i, letter in enumerate(uniques)} # build a lookup dict
    return [numbers[key] for key in lst]
get_alphabet_pos('cat') # [2, 1, 3]
So here's what happens in the function:
In line 1 of the function we convert your list to a set to remove any duplicate values. From the docs # https://docs.python.org/3/tutorial/datastructures.html#sets:
A set is an unordered collection with no duplicate elements.
Still in line 1, we sort the set and convert it back into a list. Thanks to @StefanPochmann for pointing out that sorted() takes care of the list conversion.
In line 2, we use enumerate() so we can iterate over the indices and values of our list of unique values: https://docs.python.org/3/library/functions.html#enumerate
The rest of line 2 is a simple dict comprehension to build a dictionary of letter -> number mappings. We use the dictionary in line 3 to look up the number for each letter in our input list.
You might have to modify this slightly depending on how you want to handle duplicates :)
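For instance, if repeated letters should each get their own number instead of sharing one, a possible variation is to sort the positions rather than the unique letters. This is only a sketch of that alternative; number_by_position is a hypothetical name.

```python
def number_by_position(letters):
    """Rank every element from 1 to n; equal letters are numbered left to right."""
    # Indices ordered by the letter they point to; sorted() is stable,
    # so equal letters keep their original relative order
    order = sorted(range(len(letters)), key=lambda i: letters[i])
    ranks = [0] * len(letters)
    for rank, index in enumerate(order, 1):
        ranks[index] = rank
    return ranks

print(number_by_position(['c', 'a', 't']))       # [2, 1, 3]
print(number_by_position(['c', 'a', 't', 'c']))  # [2, 1, 4, 3]
```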

Sort nested dictionary in ascending order and grab outer key?

I have a dictionary that looks like:
dictionary = {'article1.txt': {'harry': 3, 'hermione': 2, 'ron': 1},
'article2.txt': {'dumbledore': 1, 'hermione': 3},
'article3.txt': {'harry': 5}}
And I'm interested in picking the article with the highest number of occurrences of hermione. I already have code that selects the outer keys (article1.txt, article2.txt) and the inner key hermione.
Now I want code that sorts the articles into a list in ascending order of the number of occurrences of the word hermione. In this case, I want the list ['article1.txt', 'article2.txt']. I tried it with the following code:
#these keys are generated from another part of the program
keys1 = ['article1.txt', 'article2.txt']
keys2 = ['hermione', 'hermione']
place = 0
for i in range(len(keys1)-1):
    for j in range(len(keys2)-1):
        if articles[keys1[i]][keys2[j]] > articles[keys1[i+1]][keys2[j+1]]:
            ordered_articles.append(keys1[i])
            place += 1
        else:
            ordered_articles.append(place, keys1[i])
But obviously (I'm realizing now) it doesn't make sense to iterate through the keys to check if dictionary[key] > dictionary[next_key]. This is because we would never be able to compare things not in sequence, like dictionary[key[1]] > dictionary[key[3]].
Help would be much appreciated!
It seems that what you're trying to do is sort the articles by the number of occurrences of 'hermione' in them. Python has a built-in function, sorted, that does exactly that. You can use it to sort the dictionary keys by the hermione count each of them points to.
Here's some code you can use as an example:
# filters out articles without hermione from the dictionary
# value here is the inner dict (for example: {'harry': 5})
dictionary = {key: value for key, value in dictionary.items() if 'hermione' in value}
# this function just returns the amount of hermiones in an article
# it will be used for sorting
def hermione_count(key):
    return dictionary[key]['hermione']
# dictionary.keys() gives the keys of the dictionary (the articles)
# key=... here means we use hermione_count as the function to sort the list
article_list = sorted(dictionary.keys(), key=hermione_count)
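The same approach can be wrapped in a function that takes the word as a parameter. This is a sketch under the assumption that the data has the nested shape shown in the question; articles_by_count is a hypothetical name.

```python
dictionary = {'article1.txt': {'harry': 3, 'hermione': 2, 'ron': 1},
              'article2.txt': {'dumbledore': 1, 'hermione': 3},
              'article3.txt': {'harry': 5}}

def articles_by_count(articles, word):
    """Return the articles containing word, in ascending order of its count."""
    containing = [name for name, counts in articles.items() if word in counts]
    return sorted(containing, key=lambda name: articles[name][word])

ordered = articles_by_count(dictionary, 'hermione')
print(ordered)      # ['article1.txt', 'article2.txt']
print(ordered[-1])  # the article with the most occurrences
```

Unlike the filtering step above, this version leaves the original dictionary unchanged.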