Is there an easier way of looking up the values assigned to the items in my inventory and summing them? I came up with the code below, which seems to work, but it looks lengthy.
score = {
    "handle": 4,
    "blade": 3,
    "bottle": 10,
    "full blade": 10
}
inventory = ["handle", "blade", "bottle", "full blade"]
list = []
def ScoreCompute():
    for x in inventory:
        list.append(score.get(x))
ScoreCompute()
print sum(list)
x = sum(list)
If you want to sum all the values in your dictionary score, you can do this with
sum(score.values())
or, in Python 2,
sum(score.itervalues())
An additional note: In your example, inventory is equal to score.keys() (though the order is not necessarily the same).
Edit:
As of Python 3.x, inventory would equal something like list(score.keys()).
Thanks to @dwanderson for the hint.
>>> sum(score.values())
Python dictionaries have lots of convenience methods/accessors, specifically for these types of things. First, if you want every item (key, value, key-value pair) in a dictionary, you probably don't need to use the .get(...) syntax.
If you want just the keys, you could do:
for key in score:
(Note: for key in score.keys(): works as well, but in Python 3 it's unnecessary, and in Python 2 it can give a performance hit; for key in score.iterkeys(): works without creating a temporary list, but there's probably no reason to call score.iterkeys() over just score, at least none that I can think of off-hand.)
If you want just the values, Python provides a built-in, efficient method for that too; there's no need to construct an entire new list of the values just to sum them and then throw the list away. This gets into the topic of generators, but that seems a bit of a digression here. Suffice it to say,
for value in score.itervalues():
would let you examine each of the values in turn. You don't know which key goes with which value, but in your particular case you don't care about the key, only the value, so that's fine.
If you want to look at both the key and the value at the same time, then all you need is
for key, value in score.iteritems():
Now, in your particular case, you just need each value once, and Python's sum accepts any iterable, so there's no need for an explicit for-loop. You can just write sum(score.itervalues()) and get what you need.
It doesn't look like a particular concern here, but just to note: even though you don't have an explicit for-loop, it's still performing one "under the hood", so the longer the list, the longer it will take to sum. Again, though, this shouldn't matter for such a straightforward and small example; it would be something to keep in mind if you were taking the sum of a list with millions and millions of values (or if you were calling sum(score.itervalues()) over and over again).
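Putting those pieces together, here's a minimal runnable sketch using the score dict from the question (Python 2 syntax to match the original; in Python 3, values() and items() already return lazy views, so you'd drop the iter prefix):

# A quick tour of the iteration patterns above (Python 2):
score = {"handle": 4, "blade": 3, "bottle": 10, "full blade": 10}

for key in score:                     # keys only
    pass
for value in score.itervalues():      # values only
    pass
for key, value in score.iteritems():  # keys and values together
    pass

print(sum(score.itervalues()))        # 27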
A little bit shorter if inventory has fewer entries than score:
>>> sum(score[key] for key in inventory)
27
Related
I have around 400,000 "items".
Each "item" consists of 16 double values.
At runtime I need to compare items with each other. Therefore I am multiplying their double values. This is quite time-consuming.
I have made some tests, and I found out that there are only 40,000 possible return values, no matter which items I compare with each other.
I would like to store these values in a look-up table so that I can easily retrieve them without doing any real calculation at runtime.
My question would be how to efficiently store the data in a look-up table.
The problem is that if I create a look-up table, it gets amazingly huge, for example like this:
item-id  item-id  compare return value
1        1        499483.49834
1        2        -0.0928
1        3        499483.49834
(...)
It would sum up to around 120 million combinations.
That just looks too big for a real-world application.
But I am not sure how to avoid that.
Can anybody please share some cool ideas?
Thank you very much!
Assuming I understand you correctly, you have two inputs with 400K possibilities each, so 400K * 400K = 160B entries. Assuming you have them indexed sequentially, and that you stored your 40K possible results in a way that allowed 2 octets each, you're looking at a table size of roughly 300GB; pretty sure that's beyond current everyday computing.
So, you might instead research whether there is any correlation between the 400K "items", and if so, whether you can assign some kind of function to that correlation that gives you a clue (read: hash function) as to which of the 40K results might/could/should result. Clearly your hash function and lookup need to be shorter than just doing the multiplication in the first place.
Or maybe you can reduce the comparison time with some kind of intelligent reduction, like knowing the result under certain scenarios. Or perhaps some of your math can be optimized using integer math or boolean comparisons. Just a few thoughts...
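To make the "find a function that narrows things down" idea concrete, here is one possible reading sketched in Python: if many items are interchangeable for this comparison, map each item to an equivalence-class id once, and store results per pair of classes instead of per pair of items. Everything here (compare_items, the tuple signature) is a hypothetical stand-in, not the asker's actual math:

# Hypothetical sketch: collapse interchangeable items into classes,
# then precompute one result per unordered pair of classes.
def build_tables(items, compare_items):
    signatures = {}
    # Items with identical 16-double signatures are assumed to
    # compare identically (a stated assumption, not a given).
    class_of = [signatures.setdefault(tuple(it), len(signatures))
                for it in items]
    reps = {}
    for it, c in zip(items, class_of):
        reps.setdefault(c, it)             # one representative per class
    results = {}
    for a, ra in reps.items():
        for b, rb in reps.items():
            if a <= b:                     # assumes compare is symmetric
                results[(a, b)] = compare_items(ra, rb)
    return class_of, results

def lookup(class_of, results, i, j):
    a, b = sorted((class_of[i], class_of[j]))
    return results[(a, b)]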
To speed things up, you should probably compute all of the possible answers, and store the inputs to each answer.
Then, I would recommend making some sort of lookup table that uses the answer as the key (since the answers will all be unique), and then storing all of the possible inputs that produce that result.
To help visualize:
Say you had the table 'Table'. Inside Table you have keys, and associated with those keys are values. What you do is make the keys have the type of whatever format your answers are in (the keys will be all of your answers). Now, give each of your 400k inputs a unique identifier. You then store the pair of identifiers for a multiplication as one value associated with that particular key. When you compute that same answer again, you just add it as another set of inputs that can produce that key.
Example:
std::map<AnswerType, std::vector<Input>> table;
Define Input like:
struct Input { IDType one; IDType two; };
where one Input might hold the IDs 12384 and 128, meaning that the objects identified by 12384 and 128, when multiplied, give that answer.
So, in your lookup, you'll have something that looks like:
AnswerType lookup(IDType first, IDType second)
{
    for (const auto& entry : table)   // entry.first: answer, entry.second: inputs
    {
        if (contains(entry.second, first, second))
            return entry.first;
    }
    throw std::out_of_range("no matching input pair");  // needs <stdexcept>
}

// Defined elsewhere
bool contains(const std::vector<Input>& inputs, IDType first, IDType second)
{
    for (const Input& i : inputs)
    {
        if ((i.one == first && i.two == second) ||
            (i.two == first && i.one == second))
            return true;
    }
    return false;
}
This is a rough cut as-is (AnswerType, IDType, and the table are left undefined), but it might be a place to start.
While the outer loop is probably going to be limited to a linear search, you can make the contains check run as a binary search by sorting how the inputs are stored.
In all, you're looking at a run-once precomputation that will take O(n^2) time, and a lookup that will run in O(n log n). I'm not entirely sure how the memory will look after all of that, though. Of course, I don't know much about the math behind it, so you might be able to speed up the linear search if you can somehow sort the keys as well.
input
1 - - GET hm_brdr.gif
2 - - GET s102382.gif
3 - - GET bg_stars.gif
3 - - GET phrase.gif

after the map-reduce grouping:
("1", {"- - GET hm_brdr.gif"})
("2", {"- - GET s102382.gif"})
("3", {"- - GET bg_stars.gif", "- - GET phrase.gif"})
I want to make the first-column values 1, 2, 3, ... anonymous using random integers. But the mapping must be consistent: 1 shouldn't become x in one line and t in another. So my solution is to replace the keys with random integers (rand(1)=x, rand(2)=y, ...) in the reduce step, then ungroup the values with their new keys and write them to files again, as shown below.
output file
x - - GET hm_brdr.gif
y - - GET s102382.gif
z - - GET bg_stars.gif
z - - GET phrase.gif
My question is: is there a better way of doing this in terms of running time?
If you want to assign a random integer to a key then you'll have to do that in a reducer, where all key/value pairs for that key are gathered in one place. As @jason pointed out, you don't want to assign a random number, since there's no guarantee that a particular random number won't be chosen for two different keys.
What you can do instead is increment a counter held as an instance variable on the reducer to get the next available number to associate with a key. If you have a small amount of data then a single reducer can be used and the numbers will be unique. If you're forced to use multiple reducers then you'll need a slightly more complicated technique. Use
Context.getTaskAttemptID().getTaskID().getId()
to get a unique reducer number with which to calculate an overall unique number for each key.
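The counter idea, sketched in Python rather than Hadoop Java for brevity; this is a hypothetical stand-in for the reducer logic (in a real multi-reducer job you'd offset the counter by the task id as described above):

from itertools import count

# Each distinct key gets the next sequential id the first time it is
# seen, so the mapping is consistent across lines and collision-free
# (unlike random numbers).
def anonymize(grouped):
    counter = count()
    aliases = {}
    for key, values in grouped:
        if key not in aliases:
            aliases[key] = next(counter)
        for v in values:
            yield aliases[key], v

grouped = [("1", ["- - GET hm_brdr.gif"]),
           ("2", ["- - GET s102382.gif"]),
           ("3", ["- - GET bg_stars.gif", "- - GET phrase.gif"])]
for alias, line in anonymize(grouped):
    print("%s %s" % (alias, line))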
There is no way this is a bottleneck to your MapReduce job. More precisely, the runtime of your job is dominated by other concerns (network and disk I/O, etc.). A quick little key function? Meh.
But that's not even the biggest issue with your proposal. The biggest issue with your proposal is that it's doomed to fail. What is a key fact about keys? They serve as unique identifiers for records. Do random number generators guarantee uniqueness? No.
In fact, pretend for just a minute that your random key space has 365 possible values. It turns out that if you generate a mere 23 random keys, you are more likely than not to have a key collision; welcome to the birthday paradox. And all of a sudden you've lost the whole point of the keys in the first place, because you've started smashing together records by giving two records that shouldn't share a key the same key!
And you might be thinking, well, my key space isn't as small as 365 possible keys, it's more like 2^32 possible keys, so I'm, like, totally in the clear. No. After approximately 77,000 keys you're more likely than not to have a collision.
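For a quick sanity check of those numbers, the usual birthday-bound approximation says a collision becomes more likely than not after about sqrt(2 * ln 2 * N) draws from a space of N keys:

import math

# Number of random draws from N keys after which a collision is more
# likely than not (the standard birthday-paradox approximation).
def birthday_bound(n_keys):
    return math.sqrt(2 * math.log(2) * n_keys)

print(birthday_bound(365))      # ~22.5 -> the classic 23-person result
print(birthday_bound(2 ** 32))  # ~77163 -> the ~77,000 figure above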
Your idea is just completely untenable because it's the wrong tool for the job. You need unique identifiers. Random doesn't guarantee uniqueness. Get a different tool.
In your case, you need a function that is injective on your input key space (that is, it guarantees that f(x) != f(y) if x != y). You haven't given me enough details to propose anything concrete, but that's what you're looking for.
And seriously, there is no way that performance of this function will be an issue. Your job's runtime really will be completely dominated by other concerns.
Edit:
To respond to your comment:
Here I am actually trying to make the IP numbers anonymous in the log files, so if you think there is a better way I'll be happy to know.
First off, we have a serious XY problem here. You should have asked about that underlying problem (or searched for existing answers to it) in the first place. Anonymizing IP addresses, or anything for that matter, is hard. You haven't even told us the criteria for a "solution" (e.g., who are the attackers?). I recommend taking a look at this answer on the IT Security Stack Exchange site.
All,
I have the following task.
I have a finite number of strings (categories). In each category there will be a set of team/value pairs. The number of teams is finite and based on the user selection.
Neither size is more than 25.
Now, a value will change based on user input, and when it changes the teams should be re-sorted based on the value.
I was hoping that STL has some kind of auto sorted vector or list container, but the only thing I could find is std::map<>.
So what I think I need is:
struct Foo
{
    std::string team;
    double value;
    bool operator<(const Foo& rhs) const;
};
std::map<std::string,std::vector<Foo>> myContainer;
and just call std::sort() whenever a value changes.
Or is there a more efficient way to do it?
[EDIT]
I guess I need to clarify what I mean.
Think about it this way.
You have a table. The rows of this table are teams. The columns of this table are categories. The cells of this table are divided in half; the top half is the category value for a given team. This value increases with every player added.
Now, when a player is added to a team, the player's scoring categories are added to the team's, and the data in the columns is re-sorted. So, for category "A" it may be team1, team2; and for category "B" it may be team2, team1.
Then, based on the position of each team, a score is assigned for each team/category.
That score is what I need to display.
I hope this clarifies what I am trying to achieve and makes it clearer what I'm looking for.
[/EDIT]
It really depends on how often you are going to modify the data in the map and how often you're just going to be searching for the std::string and grabbing the vector.
If your access pattern is: add a map entry, fill all entries in its vector, move on to the next, fill all entries in its vector, and so on, and then only afterwards randomly access the map for the vectors, then no, a map is probably not the best container. You'd be better off using a vector containing a standard pair of the string and the vector, then sorting it once everything has been added.
In fact, organising it as above is probably the most efficient way of setting it up (I admit this is not always possible, however). Furthermore, it would be highly advisable to use some sort of hash value in place of the std::string, as a hash compare is many times faster than a string compare. You also have the string stored in Foo anyway.
A map will work, however; it really depends on exactly what you are trying to do.
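For illustration only, here is the plan from the question transcribed into a small Python sketch; the names are hypothetical, and with at most 25 teams per category a full re-sort on every update is cheap regardless of container choice:

# Hypothetical sketch: one list of [value, team] rows per category,
# re-sorted whenever a value changes.
standings = {}   # category -> list of [value, team]

def set_value(category, team, value):
    rows = standings.setdefault(category, [])
    for row in rows:
        if row[1] == team:
            row[0] = value
            break
    else:
        rows.append([value, team])
    rows.sort(reverse=True)              # best value first

set_value("A", "team1", 3.5)
set_value("A", "team2", 7.0)
print(standings["A"])                    # [[7.0, 'team2'], [3.5, 'team1']]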
I'm creating an item-crafting system for a game and need to be able to take any random selection of items a player could make and transform the selected items into a hash_tag, which can then be compared against the hash_tags of all possible item mixes, searching for a correct match. This should be the simplest and fastest means to get the result I'm looking for, but of all the ways of doing this sort of thing (and I have experience with just about all of them), hash tags are the one thing I've never even slightly touched. I've no idea where to even begin, and could use a lot of help with this.
Basically, what it needs to do is allow the player to select anything he or she has, combine the selected things into a hash_tag, and check the hash tag board for that number. Whether that number results in a "valid combination" or a "this is not a valid combination" doesn't matter, so long as all possible mixes are available on the hash tag board.
On the side there'll obviously be some code for picking things and removing them if there's a valid match and adding in the new item instead, but that's not what I need help with.
(Although anyone with suggestions on this I'll be glad to hear them!)
From what I have gathered so far, you have an ordered list of inputs (the items being crafted) and are looking for a function that returns a hash (probably for easy comparison and storage) and also has the property of being reversible.
Such a thing cannot exist for the general case: as long as your hash has fewer bits than your input data, hashing will produce collisions, and with those collisions the backward transformation is impossible.
A good start would be just to choose a unique identifier for each item and use a list of those identifiers (ordered by size if order is irrelevant to the crafting) as the hash. Comparison will still be reasonably fast.
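A minimal sketch of that suggestion in Python, assuming order is irrelevant to the crafting; the item names and the recipe here are made up:

# The "hash tag" is just the sorted tuple of item identifiers: unique
# per combination, order-independent, and usable directly as a dict key.
recipes = {
    ("blade", "handle"): "full blade",   # made-up recipe
}

def craft(selected_items):
    tag = tuple(sorted(selected_items))
    return recipes.get(tag)              # None -> not a valid combination

print(craft(["handle", "blade"]))        # full blade
print(craft(["bottle", "handle"]))       # None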
I have a vector which contains a list of hash maps in Clojure, and I have an add-watch on this vector to see any changes made. Is there an easy way to do a diff on the changes made to the hash maps, so that maybe I could get a list of just the changed entries?
Note: This follows on from some earlier posts of mine where I tried to persist changes to a database for a data structure stored in a ref. I have realised that the easiest way to save state is simply to watch the ref for changes and then store those changes. My ideal solution would be if the add-watch were passed a changelist as well :)
You probably need to define "diff" a little more precisely. For example, does an insertion in the middle of the vector count as a single change or as a change of that element and all subsequent ones? Also are your vectors guaranteed to be the same length?
Having said that, the simple approach would be something like:
First, check the length of the two vectors. If one is longer, then consider the extra elements as changes.
Then compare all the other elements with the corresponding elements in the other vector using not= (this works with hash maps and will be very fast in the common case that the elements haven't changed). Something to get you started: (map not= vector-1 vector-2)
You can then use the answer from stackoverflow.com/questions/3387155/difference-between-two-maps that pmf mentioned if you want to find out exactly how the two maps differ.
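The same element-wise approach sketched in Python for consistency with the other examples here (a Clojure version would build on the (map not= ...) snippet above):

# Pair up old and new vectors, report changed positions, and treat
# extra elements in the longer vector as changes.
def diff_vectors(old, new):
    changes = [(i, a, b)
               for i, (a, b) in enumerate(zip(old, new))
               if a != b]                       # analogous to not=
    shorter, longer = sorted((old, new), key=len)
    for i in range(len(shorter), len(longer)):
        changes.append((i, None, longer[i]))
    return changes

old = [{"a": 1}, {"b": 2}]
new = [{"a": 1}, {"b": 3}, {"c": 4}]
print(diff_vectors(old, new))
# [(1, {'b': 2}, {'b': 3}), (2, None, {'c': 4})]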