Appending values to existing keys in a dictionary - list

I am working on a website crawler.
The task of this crawler is to look for products and their respective brands.
The written crawler gives me two lists as output.
This works fine so far.
The problem I am facing is that I want to put these two lists into a dictionary.
The brands should be the keys and the products the values.
That way I can look up a brand (key) on this website and get its products (values) as output.
e.g.:
brands = ["a", "b", "c", "a", "a", "b"]
products = [ 1, 2, 3, 4, 5, 6]
offer = {}
for i in range(len(brands)):
    offer[brands[i]] = products[i]
desired output:
offer = {"a": [1, 4, 5], "b": [2, 6], "c": [3]}
actual output:
offer = {"a": 5, "b": 6, "c": 3}
I suspect the for loop is the problem: because I assign with the equals sign, each value is overwritten instead of appended.
Thanks for your help.

You have identified the mistake correctly: assignment overwrites the previous value.
What you need to do is store all the results for each brand in a list.
brands = ["a", "b", "c", "a", "a", "b"]
products = [ 1, 2, 3, 4, 5, 6]
offer = {}
for i in range(len(brands)):
    if brands[i] not in offer:
        offer[brands[i]] = []
    offer[brands[i]].append(products[i])
You can avoid the if condition inside the loop by using defaultdict.
defaultdict fits your use case well and needs only a few changes to your code:
from collections import defaultdict
brands = ["a", "b", "c", "a", "a", "b"]
products = [ 1, 2, 3, 4, 5, 6]
offer = defaultdict(list)
for brand, product in zip(brands, products):
    offer[brand].append(product)
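As a quick check, the defaultdict loop can be wrapped in a small function and compared against the desired output from the question. This is only a minimal sketch; the function name group_products is hypothetical and not part of the original answer.

```python
from collections import defaultdict


def group_products(brands, products):
    """Group products by brand (hypothetical helper name)."""
    offer = defaultdict(list)
    for brand, product in zip(brands, products):
        offer[brand].append(product)
    # Convert to a plain dict so that looking up an unknown brand raises
    # KeyError instead of silently creating an empty list.
    return dict(offer)


brands = ["a", "b", "c", "a", "a", "b"]
products = [1, 2, 3, 4, 5, 6]
offer = group_products(brands, products)
print(offer["a"])  # [1, 4, 5]
```

Returning a plain dict is a design choice: a defaultdict would add an empty entry for every brand that is merely looked up.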

Related

Making a dictionary? from 2 lists / columns

I have a large database with several columns, and I need data from 2 of them.
The end result is to have 2 drop-down menus, where the first one selects a name and the second one offers the "numbers" values that have been merged under that name. I just need the data available so I can input it into another program.
So I need a list or dictionary that contains the unique values of the "names" list, with the numbers from the "numbers" list appended to them.
# Just a list of random names and numbers for testing
names = [
"Cindi Brookins",
"Cumberband Hamberdund",
"Roger Ramsden",
"Cumberband Hamberdund",
"Lorean Dibble",
"Lorean Dibble",
"Coleen Snider",
"Rey Bains",
"Maxine Rader",
"Cindi Brookins",
"Catharine Vena",
"Lanny Mckennon",
"Berta Urban",
"Rey Bains",
"Roger Ramsden",
"Lanny Mckennon",
"Catharine Vena",
"Berta Urban",
"Maxine Rader",
"Coleen Snider"
]
numbers = [
6,
5,
7,
10,
3,
9,
1,
1,
2,
7,
4,
2,
8,
3,
8,
10,
4,
9,
6,
5
]
So in the above example "Berta Urban" would appear once, but still have the numbers 8 and 9 assigned, "Rey Bains" would have 1 and 3.
I have tried with
mergedlist = dict(zip(names, numbers))
But that only assigns the last of the numbers to the name.
I am not sure if I can make a dictionary with unique "names" that holds multiple "numbers".
You only get the last number associated with each name because dictionary keys are unique (otherwise they wouldn't be much use). So if you do
mergedlist["Berta Urban"] = 8
and after that
mergedlist["Berta Urban"] = 9
the result will be
{'Berta Urban': 9}
Just as if you did:
berta_urban = 8
berta_urban = 9
In that case you would expect the value of berta_urban to be 9 and not [8,9].
So, as you can see, you need an append not an assignment to your dict entry.
from collections import defaultdict
mergedlist = defaultdict(list)
for name, number in zip(names, numbers):
    mergedlist[name].append(number)
This gives:
{'Coleen Snider': [1, 5],
'Cindi Brookins': [6, 7],
'Cumberband Hamberdund': [5, 10],
'Roger Ramsden': [7, 8],
'Lorean Dibble': [3, 9],
'Rey Bains': [1, 3],
'Maxine Rader': [2, 6],
'Catharine Vena': [4, 4],
'Lanny Mckennon': [2, 10],
'Berta Urban': [8, 9]
}
which I think is what you want. Note that duplicates are kept, as in 'Catharine Vena': [4, 4], and that every name maps to a list of numbers, even when the list holds only one number.
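If the duplicates are unwanted, one possible variant skips a number that is already stored for a name. This is a sketch under that assumption; merge_unique and the shortened sample lists are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def merge_unique(names, numbers):
    """Collect numbers per name, keeping first-seen order without repeats."""
    merged = defaultdict(list)
    for name, number in zip(names, numbers):
        if number not in merged[name]:
            merged[name].append(number)
    return dict(merged)


# Shortened sample data taken from the lists in the question.
sample_names = ["Catharine Vena", "Berta Urban", "Catharine Vena", "Berta Urban"]
sample_numbers = [4, 8, 4, 9]
print(merge_unique(sample_names, sample_numbers))
# {'Catharine Vena': [4], 'Berta Urban': [8, 9]}
```

The membership test on a list is linear, which is fine for a handful of numbers per name; for long lists a set per name would be the more usual choice.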
You cannot have multiple keys of the same name in a dict, but your dict keys can be unique while holding a list of matching numbers. Something like:
mergedlist = {}
for i, v in enumerate(names):
    mergedlist[v] = mergedlist.get(v, []) + [numbers[i]]
print(mergedlist["Berta Urban"]) # prints [8, 9]
This is not terribly efficient, though, because each iteration builds a new list. Depending on the database you are using, chances are that the database can return the results in the form you prefer faster than you can post-process and reconstruct the data.
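To illustrate letting the database do the grouping, here is a rough sketch using SQLite from the standard library. The question does not say which database is in use, so the in-memory database, the table name entries, and its columns are all assumptions made for the example.

```python
import sqlite3

# Hypothetical table and column names; the real database is not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (name TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO entries (name, number) VALUES (?, ?)",
    [("Berta Urban", 8), ("Rey Bains", 1), ("Berta Urban", 9), ("Rey Bains", 3)],
)

# GROUP_CONCAT joins the numbers of each name into one comma-separated string.
rows = conn.execute(
    "SELECT name, GROUP_CONCAT(number) FROM entries GROUP BY name"
).fetchall()

# SQLite does not guarantee the order of the concatenated values, so sort them.
mergedlist = {
    name: sorted(int(n) for n in joined.split(",")) for name, joined in rows
}
print(mergedlist["Berta Urban"])  # [8, 9]
conn.close()
```

Other databases offer similar aggregate functions under different names, so the query would need adapting.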

Python Dask - vertical concatenation of 2 DataFrames

I am trying to vertically concatenate two Dask DataFrames.
I have the following Dask DataFrame:
import pandas as pd
import dask.dataframe as dd
d = [
    ['A','B','C','D','E','F'],
    [1, 4, 8, 1, 3, 5],
    [6, 6, 2, 2, 0, 0],
    [9, 4, 5, 0, 6, 35],
    [0, 1, 7, 10, 9, 4],
    [0, 7, 2, 6, 1, 2]
]
df = pd.DataFrame(d[1:], columns=d[0])
ddf = dd.from_pandas(df, npartitions=5)
Here is the data as a Pandas DataFrame
   A  B  C   D  E   F
0  1  4  8   1  3   5
1  6  6  2   2  0   0
2  9  4  5   0  6  35
3  0  1  7  10  9   4
4  0  7  2   6  1   2
Here is the Dask DataFrame
Dask DataFrame Structure:
                   A      B      C      D      E      F
npartitions=4
0              int64  int64  int64  int64  int64  int64
1                ...    ...    ...    ...    ...    ...
2                ...    ...    ...    ...    ...    ...
3                ...    ...    ...    ...    ...    ...
4                ...    ...    ...    ...    ...    ...
Dask Name: from_pandas, 4 tasks
I am trying to concatenate 2 Dask DataFrames vertically:
ddf_i = ddf + 11.5
dd.concat([ddf,ddf_i],axis=0)
but I get this error:
Traceback (most recent call last):
  ...
  File "...", line 572, in concat
    raise ValueError('All inputs have known divisions which cannot '
ValueError: All inputs have known divisions which cannot be concatenated
in order. Specify interleave_partitions=True to ignore order
However, if I try:
dd.concat([ddf,ddf_i],axis=0,interleave_partitions=True)
then it appears to work. Is there a problem with setting this to True (in terms of performance or speed)? Or is there another way to vertically concatenate 2 Dask DataFrames?
If you inspect the divisions of the dataframe with ddf.divisions, you will find, assuming one partition, that it holds the edges of the index: (0, 4). This is useful to Dask: when you perform an operation on the data, it knows which partitions cannot contain the required index values and can skip them. This is also why some Dask operations are much faster when the index suits the job.
When you concatenate, the second dataframe has the same index as the first. Concatenation would work without interleaving if the index values of the two dataframes covered different, non-overlapping ranges.
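The ordering condition described above can be modelled in a few lines of plain Python. This is a simplified sketch of the idea, not Dask's actual implementation; the helper can_concat_in_order is hypothetical, and the divisions tuples are written out by hand rather than read from real dataframes.

```python
def can_concat_in_order(*all_divisions):
    """Return True if each input's index range ends before the next begins."""
    return all(
        left[-1] < right[0]
        for left, right in zip(all_divisions, all_divisions[1:])
    )


# ddf and ddf_i from the question share the same index, so their
# divisions are identical and therefore overlap.
print(can_concat_in_order((0, 1, 2, 3, 4), (0, 1, 2, 3, 4)))  # False

# Hypothetical second dataframe whose index starts after the first one ends.
print(can_concat_in_order((0, 1, 2, 3, 4), (5, 7, 9)))  # True
```

When the check fails, the inputs overlap in index space, which is the situation in which the error message suggests interleave_partitions=True.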
mdurant's answer is correct; this answer elaborates with MCVE code snippets using Dask v2021.08.1. The examples make it easier to understand divisions and interleaving.
Vertically concatenating DataFrames
Create two DataFrames, concatenate them, and view the results.
import pandas as pd
import dask.dataframe as dd
df = pd.DataFrame(
    {"nums": [1, 2, 3, 4, 5, 6], "letters": ["a", "b", "c", "d", "e", "f"]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf3 = dd.concat([ddf1, ddf2])
print(ddf3.compute())
   nums letters
0     1       a
1     2       b
2     3       c
3     4       d
4     5       e
5     6       f
0    88      xx
1    99      yy
Divisions metadata when vertically concatenating
Create two DataFrames, concatenate them, and illustrate that sometimes this operation will cause divisions metadata to be lost.
def print_partitions(ddf):
    for i in range(ddf.npartitions):
        print(ddf.partitions[i].compute())
df = pd.DataFrame(
    {"nums": [1, 2, 3, 4, 5, 6], "letters": ["a", "b", "c", "d", "e", "f"]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
ddf1.divisions # (0, 3, 5)
df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf2.divisions # (0, 1)
ddf3 = dd.concat([ddf1, ddf2])
ddf3.divisions # (None, None, None, None)
Set interleave_partitions=True to avoid losing the divisions metadata.
ddf3_interleave = dd.concat([ddf1, ddf2], interleave_partitions=True)
ddf3_interleave.divisions # (0, 1, 3, 5)
When interleaving isn't necessary
Create two DataFrames without overlapping divisions, concatenate them, and confirm that the divisions metadata is not lost:
df = pd.DataFrame(
    {"nums": [1, 2, 3, 4], "letters": ["a", "b", "c", "d"], "some_index": [4, 5, 6, 7]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
ddf1 = ddf1.set_index("some_index")
df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"], "some_index": [10, 20]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf2 = ddf2.set_index("some_index")
ddf3 = dd.concat([ddf1, ddf2])
ddf3.divisions # (4, 6, 10, 20)
I wrote a blog post to explain this in more detail. Let me know if you'd like the link.

Python 2.7 current row index on 2d array iteration

When iterating on a 2d array, how can I get the current row index? For example:
x = [[ 1.  2.  3.  4.]
     [ 5.  6.  7.  8.]
     [ 9.  0.  3.  6.]]
Something like:
for rows in x:
    print x current index (for example, when iterating on [ 5. 6. 7. 8.], return 1)
enumerate is a built-in Python function. Its usefulness cannot be summarized in a single line, yet many newcomers and even some experienced programmers are unaware of it. It lets you loop over an iterable and get an automatic counter at the same time. Here is an example:
for counter, value in enumerate(some_list):
    print(counter, value)
And there is more! enumerate also accepts an optional argument which makes it even more useful.
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, value in enumerate(my_list, 1):
    print(c, value)
# Output:
# 1 apple
# 2 banana
# 3 grapes
# 4 pear
The optional argument tells enumerate where to start the index. You can also build a list of (index, item) tuples by passing the enumerate object to list(). Here is an example:
my_list = ['apple', 'banana', 'grapes', 'pear']
counter_list = list(enumerate(my_list, 1))
print(counter_list)
# Output: [(1, 'apple'), (2, 'banana'), (3, 'grapes'), (4, 'pear')]
Use enumerate:
In [42]: x = [[ 1, 2, 3, 4],
    ...:      [ 5, 6, 7, 8],
    ...:      [ 9, 0, 3, 6]]
In [43]: for index, rows in enumerate(x):
    ...:     print('current index {}'.format(index))
    ...:     print('current row {}'.format(rows))
    ...:
current index 0
current row [1, 2, 3, 4]
current index 1
current row [5, 6, 7, 8]
current index 2
current row [9, 0, 3, 6]
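The same idea extends to columns by nesting a second enumerate inside the first. The sketch below is an illustration only; the helper find_positions is hypothetical and simply shows both indices being tracked at once.

```python
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 0, 3, 6]]


def find_positions(grid, target):
    """Return (row_index, column_index) pairs where target occurs."""
    positions = []
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value == target:
                positions.append((row_index, col_index))
    return positions


print(find_positions(x, 3))  # [(0, 2), (2, 2)]
```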

Django model aggregate matches count in ManyToMany

There is a model with a ManyToManyField:
from django.db import models
class Number(models.Model):
    current_number = models.IntegerField()
class MyModel(models.Model):
    numbers_set = models.ManyToManyField(Number)
For example, suppose we have this dataset:
my_model_1.numbers_set = [1, 2, 3, 4]
my_model_2.numbers_set = [2, 3, 4, 5]
my_model_3.numbers_set = [3, 4, 5, 6]
my_model_4.numbers_set = [4, 5, 6, 7]
my_model_5.numbers_set = [4, 5, 6, 7]
I'm looking for a way to group MyModel objects by the number of values they share,
e.g. MyModel objects that have at least 3 numbers in common in their numbers_set:
[
    [my_model_1, my_model_2],
    [my_model_2, my_model_3],
    [my_model_3, my_model_4, my_model_5],
]
If you are using PostgreSQL 9.4 or later and Django 1.9 or later, it is better to use JSONField() rather than a ManyToManyField. For indexing, use a jsonb index in PostgreSQL, which makes queries that fetch this data efficient.
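Independently of how the numbers are stored, the comparison the question asks for can be sketched in plain Python on sets. This is not a Django query: the dictionary numbers_by_model stands in for data already loaded from the database, and the helper pairs_sharing is hypothetical. It lists pairs of models rather than the merged groups shown in the question.

```python
from itertools import combinations

# Plain-Python stand-in for the dataset in the question: model name -> numbers.
numbers_by_model = {
    "my_model_1": {1, 2, 3, 4},
    "my_model_2": {2, 3, 4, 5},
    "my_model_3": {3, 4, 5, 6},
    "my_model_4": {4, 5, 6, 7},
    "my_model_5": {4, 5, 6, 7},
}


def pairs_sharing(numbers_by_model, minimum=3):
    """Return pairs of model names with at least `minimum` numbers in common."""
    return [
        (a, b)
        for a, b in combinations(numbers_by_model, 2)
        if len(numbers_by_model[a] & numbers_by_model[b]) >= minimum
    ]


for pair in pairs_sharing(numbers_by_model):
    print(pair)
```

Comparing every pair grows quadratically with the number of models, so for large tables doing the work in the database is likely to be preferable.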

slice a dictionary on elements contained within item arrays

Say I have a dict of country -> [cities] (potentially an ordered dict):
{'UK': ['Bristol', 'Manchester', 'London', 'Glasgow'],
 'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
 'Germany': ['Munich', 'Berlin', 'Cologne']
}
The number of keys (countries) is variable, and so is the number of cities in each list. The result set comes from a search on city name, so a search on "San%", for example, could return 50k results (in a worldwide search).
The data is used to populate a select2 widget, and I'd like to use its paging functionality.
Is there a smart way to slice this such that [3:8] would yield:
{'UK': ['Glasgow'],
'France': ['Paris', 'Calais', 'Nice', 'Cannes'],
'Germany': ['Munich']
}
(apologies for the way this question was posed earlier -- I wasn't sure that the real usage would clarify the issue...)
If I understand your problem correctly, as discussed in the comments, this should do it:
from pprint import pprint
def slice_dict(d, a, b):
    big_list = []
    ret_dict = {}
    # Make one big list of all numbers, tagging each number with the key
    # of the dict they came from.
    for k, v in d.iteritems():
        for n in v:
            big_list.append({k: n})
    # Slice it
    sliced = big_list[a:b]
    # Put everything back in order
    for k, v in d.iteritems():
        for subd in sliced:
            for subk, subv in subd.iteritems():
                if k == subk:
                    if k in ret_dict:
                        ret_dict[k].append(subv)
                    else:
                        ret_dict[k] = [subv]
    return ret_dict
d = {
    'a': [1, 2, 3, 4],
    'b': [5, 6, 7, 8, 9],
    'c': [10, 11, 12, 13, 14]
}
x = slice_dict(d, 3, 11)
pprint(x)
$ python slice.py
{'a': [4], 'b': [5, 6], 'c': [10, 11, 12, 13, 14]}
The output is a little different from your example output because a plain dict does not preserve order in Python 2, so the dict was not ordered when it was passed to the function. It was iterated as a-c-b, which is why b is cut off at 6 and c is not cut off at all.
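For comparison, here is a rough sketch of the same flatten, slice, and regroup idea, assuming Python 3.7 or later, where a plain dict keeps insertion order. The name slice_dict_ordered is hypothetical; it uses itertools.islice so the full flattened list is never built.

```python
from itertools import islice


def slice_dict_ordered(d, start, stop):
    """Slice the flattened (key, item) pairs of d and regroup them by key."""
    # Lazily yield every item together with the key it belongs to.
    flat = ((key, item) for key, items in d.items() for item in items)
    result = {}
    for key, item in islice(flat, start, stop):
        result.setdefault(key, []).append(item)
    return result


d = {
    'a': [1, 2, 3, 4],
    'b': [5, 6, 7, 8, 9],
    'c': [10, 11, 12, 13, 14],
}
print(slice_dict_ordered(d, 3, 11))
# {'a': [4], 'b': [5, 6, 7, 8, 9], 'c': [10, 11]}
```

Because the keys are visited in insertion order here, b is kept whole and c is the list that gets cut, unlike the Python 2 run above.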