Update a list of objects by comparing it with another list

I have two lists of the same object type: one contains sample data and the other real data. A few fields in the real data are wrong, and I need to update those fields in the real-data list using the values from the sample data.
Both lists hold the same type of object, and both use the same unique key.
List<pojo> real = [(code:60,active:Y,account:check),(code:61,active:Y,account:check),(code:62,active:Y,account:check)];
List<pojo> sample = [(code:60,active:Y,account:saving),(code:61,active:Y,account:check),(code:62,active:Y,account:saving)]
I have around 60 objects in each list. In the example above, I need to update real where code is 60 and 62, changing account from check to saving.
I am using Java 1.8 and Groovy.
thanks

Is this what you need?
class Pojo {
    def code
    def active
    def account
    String toString() {
        account
    }
}
List<Pojo> real = [new Pojo(code: 60, active: 'Y', account: 'check'), new Pojo(code: 61, active: 'Y', account: 'check'), new Pojo(code: 62, active: 'Y', account: 'check')]
List<Pojo> sample = [new Pojo(code: 60, active: 'Y', account: 'saving'), new Pojo(code: 61, active: 'Y', account: 'check'), new Pojo(code: 62, active: 'Y', account: 'saving')]
real.each { r ->
    def acc = sample.find { it.code == r.code }?.account
    if (acc != null) {
        r.account = acc
    }
}
println real // prints [saving, check, saving]
The code above uses each to iterate over every Pojo in real and looks up the corresponding object (the one with the same code) in the sample list. If a match is found, the account value of the object in the real list is overwritten; otherwise it is left as it is.

Here is a script that updates the real data after comparing it with the sample data, as the OP requested.
Note that the input as posted is not valid Groovy, so I made it valid by turning each element of the list into a map, i.e.,
changed from (code:60,active:'Y',account:'check')
to [code:60,active:'Y',account:'check']
def realData = [[code:60,active:'Y',account:'check'],[code:61,active:'Y',account:'check'],[code:62,active:'Y',account:'check']]
def sampleData = [[code:60,active:'Y',account:'saving'],[code:61,active:'Y',account:'check'],[code:62,active:'Y',account:'saving']]
realData.collect{rd -> sampleData.find{ it.code == rd.code && (it.account == rd.account ?: (rd.account = it.account))}}
println realData
Output:
[[code:60, active:Y, account:saving], [code:61, active:Y, account:check], [code:62, active:Y, account:saving]]

Related

How do I access the original data from objects in Python? I've used groupby to group the data

I am trying to group data based on the following data fields, but I am not able to access the original data in the fields.
Printing filtered_data_objects gives something like "object at 0x10dd1abf0>", so I need to access the original, human-readable values in the objects.
data_objects = ['*', '*', '*', ......]  # This is the list of data items
filtered_data_objects = groupby(
    data_objects, lambda data: (data.x, data.y, data.z) and data.p
)
print(filtered_data_objects)
# This gives <itertools.groupby object at 0x1066ceb30>; I need to access the original content in the data objects.
for data_object, _ in filtered_data_objects:
    x = data_object[0]  # this is not working; I've tried this to access the original data
    y = data_object[1]
    z = data_object[2]
    p = data_object[3]
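groupby yields (key, group) pairs, and each group is a lazy iterator rather than a list, so you need to wrap the group (the second item of each pair) in list(), like
list(group)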
You can refer to this example
from itertools import groupby
data_objects = [{"a": 1}, {"a": 1}, {"a": 2}]
for dobject, x in groupby(data_objects, lambda data: data["a"]):
    print(dobject, list(x))
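A fuller sketch for the question's case, assuming the data objects expose attributes x, y, z and p (the Record namedtuple below is a hypothetical stand-in for them). Two details matter: groupby only merges consecutive items with equal keys, so the input is sorted with the same key first; and the question's key, (data.x, data.y, data.z) and data.p, evaluates to just data.p, because a non-empty tuple is always truthy. If grouping by the three fields was intended, the key should be the tuple itself, as here.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical record type standing in for the asker's data objects.
Record = namedtuple("Record", ["x", "y", "z", "p"])

data_objects = [
    Record(1, "a", "u", 10),
    Record(2, "b", "v", 20),
    Record(1, "a", "u", 30),
]

# attrgetter with several names returns a tuple: (obj.x, obj.y, obj.z)
key = attrgetter("x", "y", "z")

grouped = {}
# groupby only merges *adjacent* equal keys, so sort by the same key first.
for group_key, group in groupby(sorted(data_objects, key=key), key=key):
    # Each group is a one-shot iterator; list() materialises the original objects.
    grouped[group_key] = list(group)

for group_key, members in grouped.items():
    print(group_key, [m.p for m in members])
```

Because each stored group is now a real list of the original objects, their fields can be read directly, e.g. members[0].x.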

Determining if an int exists in a list, without using the "in" operator

I need to get user input to generate a list of 8 numbers, but when the user inputs a number that is already in the list, I need to print an error, without using the in operator to determine whether it's in the list. Here's what I have so far.
def main():
    myList = range(9)
    a= True
    for i in myList:
        while a == True:
            usrNum = int(input("Please enter a number: "))
            if usrNum != myList[i]:
                myList.append(usrNum)
                print(i)
main()
Error for the above code:
Scripts/untitled4.py", line 18, in main
myList.append(usrNum)
AttributeError: 'range' object has no attribute 'append'
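The issue is the way you generate myList. In Python 3, range(9) returns a range object rather than a list, which is why it has no append method. Writing myList = [range(9)] doesn't help either: in Python 3 that gives a one-element list containing the range object, and in Python 2 it gives a nested list:
[[0, 1, 2, 3, 4, 5, 6, 7, 8]]
Convert the range to a real list instead:
myList = list(range(9))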
Also, you need to replace myList.append[usrNum] with myList.append(usrNum), or you'll get a:
TypeError: 'builtin_function_or_method' object has no attribute '__getitem__'
You could also use wim's suggestion instead of the != operator (note the not: you only want to append numbers that are not already present):
if not myList.__contains__(usrNum):
    myList.append(usrNum)
There are two ways you can go about this:
1. Loop through the list to check each element.
The in operator is effectively doing:
    for each value in the list:
        if the value is what you're looking for:
            return True
    if you reach the end of the list:
        return False
If you can add that check to your code, you'll have your problem solved.
2. Use an alternate way of tracking which elements have been added.
Options include a dict, or bits of an int.
For example, create checks = {}. When you add a value to the list, set checks[usrNum] = True. Then checks.get(usrNum, False) will return a boolean indicating whether the number already exists. You can simplify that with a collections.defaultdict, but I suspect that may be more advanced than you're ready for.
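A minimal sketch of the second method; add_if_new and add_if_new_bits are hypothetical helper names. The first follows the dict idea described above; the second uses the bits of an int, which assumes the numbers are small non-negative integers (shifting by a negative count raises ValueError).

```python
def add_if_new(values, seen, number):
    """Append number to values unless `seen` already records it.

    `seen` is a plain dict used as a lookup table, so no `in` test is needed.
    Returns True if the number was added, False if it was a duplicate.
    """
    if seen.get(number, False):
        return False
    seen[number] = True
    values.append(number)
    return True


def add_if_new_bits(values, mask, number):
    """Same idea, tracking seen numbers as bits of an int (non-negative ints only).

    Returns the updated mask and whether the number was added.
    """
    bit = 1 << number
    if mask & bit:
        return mask, False
    values.append(number)
    return mask | bit, True


numbers, seen = [], {}
for n in [4, 7, 4, 2]:
    if not add_if_new(numbers, seen, n):
        print("{} is already in the list".format(n))
print(numbers)
```

The dict version works for any hashable value; the bitmask version trades that generality for a single integer of state.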
The first is probably the result your instructor is after, so I'll give you a simple version to work with and massage to fit your needs.
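myList = []
while True:
    usrNum = int(input())
    found = False
    for v in myList:
        if usrNum == v:
            found = True
            break  # no need to keep looking once a match is found
    if not found:
        myList.append(usrNum)
    else:
        # number was already in the list, panic!
        print("{} is already in the list".format(usrNum))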
Most instructors will be more impressed, and hence award better grades, if you can figure out how to do something like method 2, however.
You could do something like this and modify it as needed (I'm not sure when, or whether, you want to break when the user enters a number that is already in the list, etc.).
This prompts for user input until they enter an item that already exists in the list, then it prints a message to the user, and stops execution.
def main():
    mylist = list(range(9))  # list() so that append works in Python 3
    while True:
        usrNum = int(input("Please enter a number: "))
        if existsinlist(mylist, usrNum):
            print("{} is already in the list {}".format(usrNum, mylist))
            break
        else:
            mylist.append(usrNum)

def existsinlist(lst, itm):
    # explicit loop doing the job of the in operator
    for i in lst:
        if itm == i:
            return True
    return False
Perhaps the point of this homework assignment is to help you understand how much simpler an operator like in is to read (and write, and compile) than the explicit loop that I used in the existsinlist function.
Not sure if a list comprehension would be allowed in this case, but you could also have done something like this, without relying on the existsinlist helper function:
def main():
    mylist = list(range(9))  # list() so that append works in Python 3
    while True:
        usrNum = int(input("Please enter a number: "))
        # non-empty (truthy) only if some element equals usrNum
        if [i for i in mylist if i == usrNum]:
            print("{} is already in the list {}".format(usrNum, mylist))
            break
        else:
            mylist.append(usrNum)
In this case, the result of the list-comprehension can be evaluated for truthiness:
An empty list like [] results if no matching value exists, and this will be considered False
A non-empty list will result if at least one matching value exists, and this will be considered True
Yet another option, which short-circuits (it stops at the first match) and may be preferable:
if any(usrNum == i for i in mylist):
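Putting the pieces together for the original task (a list of 8 numbers, with an error message for repeats rather than stopping), a sketch might look like this. collect_unique is a hypothetical name, and its entries parameter replaces the input() calls so the logic can be exercised without a keyboard.

```python
def collect_unique(entries, target=8):
    """Collect `target` distinct ints from `entries`, reporting duplicates.

    `entries` is any iterable of strings standing in for repeated input() calls.
    Membership is checked with any() instead of the `in` operator.
    """
    numbers = []
    rejected = []
    for raw in entries:
        value = int(raw)
        # any() stops at the first equal element, much like `in` would
        if any(value == n for n in numbers):
            print("Error: {} is already in the list".format(value))
            rejected.append(value)
            continue
        numbers.append(value)
        if len(numbers) == target:
            break
    return numbers, rejected


print(collect_unique(["3", "5", "3", "1"], target=3))
```

For interactive use, iter(lambda: input("Please enter a number: "), None) produces one entry per prompt indefinitely (input() never returns None), so the loop keeps asking until eight distinct numbers have been entered.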

I don't understand why I'm getting an index error, when trying to extract exif data

The code and error with sample data from an image:
image = Image.open(newest)
exif = image._getexif()
gps = {}
datebool = False
gpsbool = False
date = 'None'
time = 'None'
gpstext = 'None'
dmslat = 'None'
dmslon = 'None'
if exif is not None:
    for tag, entry in exif.items(): #Import date and time from Exif
        datebool = True
        if TAGS.get(tag, tag) == 'DateTimeOriginal':
            date = entry[0:10]
            time = entry[11:19]
    for tag, entry in exif.items(): #Check if the GPSInfo field exists
        if TAGS.get(tag,tag) == 'GPSInfo':
            gpsbool = True
            for e in entry:
                decoded = GPSTAGS.get(e,e)
                print (decoded)
                print(type(entry))
                gps[decoded] = entry[e]
The results
4984
<type 'tuple'>
Traceback (most recent call last):
  File "C:\Users\~~~~~\Desktop\project_7-8-2015\8_bands\Program_camera.py", line 109, in <module>
    gps[decoded] = entry[e]
IndexError: tuple index out of range
Since e is pulled from entry, how can indexing that particular e from entry generate an indexing error? Am I actually pulling the correct data for the gps?
for e in entry doesn't index the values in entry, it iterates over them. For example:
entry = (3, 5, 7)
for e in entry:
    print(e)
will output:
3
5
7
So the line should probably look like:
gps[decoded] = e
though I'm not sure what the GPSTAGS line would become. If you really need the items in entry enumerated, then you should look into (to your great surprise, I'm sure) the enumerate() function.
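A small standard-library sketch of the difference: iterating over a tuple yields values, which is why entry[e] goes out of range; enumerate() supplies the index alongside each value; and iterating over a dict yields keys, which is the only case where the gps[decoded] = entry[e] pattern from the question behaves as intended. The dict below is a hypothetical stand-in, not real GPS data.

```python
entry = (3, 5, 7)

# Iterating over a tuple yields its values, so entry[e] looks up entry[3],
# entry[5] and entry[7], all past the end of a 3-item tuple.
failures = []
for e in entry:
    try:
        entry[e]
    except IndexError:
        failures.append(e)
print(failures)

# enumerate() yields (index, value) pairs, so both are available.
pairs = list(enumerate(entry))
print(pairs)

# Iterating over a dict yields its keys, which is when entry[e] works.
tag_dict = {1: "a", 2: "b"}  # hypothetical stand-in, not real GPS data
copied = {e: tag_dict[e] for e in tag_dict}
print(copied)
```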

Same code different results from Py2.7 to Py3.4. Where is the mistake?

I am refactoring a few lines of code found in Harrington, P. (2012). Machine Learning in Action, Chapters 11 and 12. The code is supposed to build an FP-tree from a test dataset, and it goes as follows.
from __future__ import division, print_function

class treeNode:
    '''
    Basic data structure for an FP-tree (Frequent-Pattern).
    '''
    def __init__(self, nameValue, numOccur, parentNode):
        self.name = nameValue
        self.count = numOccur
        self.nodeLink = None
        self.parent = parentNode
        self.children = {}

    def inc(self, numOccur):
        '''
        Increments the count variable by a given amount.
        '''
        self.count += numOccur

    def disp(self, ind=1):
        '''
        Displays the tree in text.
        '''
        print('{}{}:{}'.format('-'*(ind-1),self.name,self.count))
        for child in list(self.children.values()):
            child.disp(ind+1)

def createTree(dataSet, minSup=1):
    '''
    Takes the dataset and the minimum support
    and builds the FP-tree.
    '''
    headerTable = {} #stores the counts
    #loop over the dataset and count the frequency of each term.
    for trans in dataSet:
        for item in trans:
            headerTable[item] = headerTable.get(item, 0) + dataSet[trans]
    #scan the header table and delete items occurring less than minSup
    for k in list(headerTable.keys()):
        if headerTable[k] < minSup:
            del(headerTable[k])
    freqItemSet = set(headerTable.keys())
    #if no item is frequent, quit
    if len(freqItemSet) == 0:
        return None, None
    #expand the header table
    #so it can hold a count and pointer to the first item of each type.
    for k in list(headerTable.keys()):
        headerTable[k] = [headerTable[k], None]
    #create the base node, which contains the 'Null Set'
    retTree = treeNode('Null Set', 1, None)
    #iterate over the dataset again
    #this time using only items that are frequent
    for tranSet, count in list(dataSet.items()):
        localD = {}
        for item in tranSet:
            if item in freqItemSet:
                localD[item] = headerTable[item][0]
        if len(localD) > 0:
            #sort the items and then call updateTree()
            orderedItems = [v[0] for v in sorted(list(localD.items()),
                                                 key=lambda p: p[1], reverse=True)]
            updateTree(orderedItems, retTree, headerTable, count)
    return retTree, headerTable

def updateTree(items, inTree, headerTable, count):
    if items[0] in inTree.children:
        inTree.children[items[0]].inc(count)
    else:
        #Populate tree with ordered freq itemset
        inTree.children[items[0]] = treeNode(items[0], count, inTree)
        if headerTable[items[0]][1] == None:
            headerTable[items[0]][1] = inTree.children[items[0]]
        else:
            updateHeader(headerTable[items[0]][1],inTree.children[items[0]])
    #Recursively call updateTree on the remaining items
    if len(items) > 1:
        updateTree(items[1::], inTree.children[items[0]], headerTable, count)

def updateHeader(nodeToTest, targetNode):
    while (nodeToTest.nodeLink != None):
        nodeToTest = nodeToTest.nodeLink
    nodeToTest.nodeLink = targetNode

def loadSimpDat():
    simpDat = [['r', 'z', 'h', 'j', 'p'],
               ['z', 'y', 'x', 'w', 'v', 'u', 't', 's'],
               ['z'],
               ['r', 'x', 'n', 'o', 's'],
               ['y', 'r', 'x', 'z', 'q', 't', 'p'],
               ['y', 'z', 'x', 'e', 'q', 's', 't', 'm']]
    return simpDat

def createInitSet(dataSet):
    retDict = {}
    for trans in dataSet:
        retDict[frozenset(trans)] = 1
    return retDict

simpDat = loadSimpDat()
initSet = createInitSet(simpDat)
myFPtree, myHeaderTab = createTree(initSet, 3)
myFPtree.disp()
This code runs without errors in both Python 2.7.9 and 3.4.3. However, the output I get is different. Moreover, the output with Py2.7 is consistent, while running the same code over and over again with Py3.4 gives different results.
The correct result is the one obtained using Py2.7 but I cannot figure out why it doesn't work on 3.4.
Why?
What is wrong with this code when interpreted with Python3?
The output describes a well-defined tree. The order of the branches can change, but the underlying tree should be the same. This is always the case with Python 2, where the output looks like this:
-x:1
--s:1
---r:1
-z:5
--x:3
---y:3
----s:2
-----t:2
----r:1
-----t:1
--r:1
It should represent this tree.
        Null
       /    \
      x      z
      |     / \
      s    x   r
      |    |
      r    y
          / \
         s   r
         |   |
         t   t
This is an example of the wrong result I get using Python3.
-z:5
--r:1
--x:3
---t:3
----y:2
-----s:2
----r:1
-----y:1
-x:1
--r:1
---s:1
P.S. I have tried to use OrderedDict instead of {}, it doesn't change anything...
What seems to happen is that you rely on the iteration order of dicts (i.e. the order of d.keys(), d.items(), etc.) and, since your transactions are frozensets, of sets as well. Both Python 2 and Python 3 guarantee that the iteration order stays consistent within a single run (as long as the container is not modified), but not from one run to the next.
Therefore it's correct behaviour that the order of the output differs from run to run. That you get the same result every time in Python 2 is not something the language guarantees: string hash randomization is disabled by default in Python 2.7 (it can be enabled with the -R flag) but enabled by default since Python 3.3, so string keys hash differently in each Python 3 process.
You can make Python 3 behave deterministically by setting the PYTHONHASHSEED environment variable to a fixed value, but you probably shouldn't rely on dict iteration order being deterministic.
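The run-to-run variation shows up through ties: with minSup=3 the frequent items in loadSimpDat() are z (5), x (4) and r, s, t, y (3 each), and the sort in createTree uses only the count, so tied items come out in whatever order iterating the frozenset and localD happened to produce. One way to make the result reproducible, sketched below as a hypothetical helper order_items, is to break ties by item name; in createTree it would take the place of the orderedItems expression. The resulting tree is the same on every run, but it need not match the Python 2.7 output exactly, since that output reflects a different, equally arbitrary tie order.

```python
def order_items(local_counts):
    """Return item names sorted by descending count, ties broken by name.

    With only the count as the key, equal-count items keep whatever order
    dict/set iteration produced, which depends on the hash seed. Adding the
    name as a secondary key makes the order deterministic.
    """
    ordered = sorted(local_counts.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ordered]


# Support counts of the frequent items in loadSimpDat() with minSup=3.
counts = {"s": 3, "z": 5, "y": 3, "x": 4, "t": 3, "r": 3}
print(order_items(counts))
```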

Adding two lists inside a dictionary

I've been trying to add up the numbers of two lists inside a dictionary. The thing is, I need to check whether the value in the selected row and column is already in the dictionary; if so, I want to add the two-element list to the value (another two-element list) already in the dictionary. I'm using an Excel spreadsheet and xlrd to read it. I'm pretty new to this.
For example, the code checks the account (a number) in the specified row and column. Let's say the value is 10; if it's not in the dictionary, it adds the two values corresponding to this account, say [100, 0], as the value for this key. This is working as intended.
Now, the hard part is when the account number is already in the dictionary. Let's say it's the second entry for account number 10, and it's [50, 20]. I want the value associated with the key 10 to be [150, 20].
I've tried the zip method, but it seems to return random results: sometimes it adds up, sometimes it doesn't.
import xlrd
book = xlrd.open_workbook("Entry.xls")
print ("The number of worksheets is", book.nsheets)
print ("Worksheet name(s):", book.sheet_names())
sh = book.sheet_by_index(0)
print (sh.name,"Number of rows", sh.nrows,"Number of cols", sh.ncols)
liste_compte = {}
for rx in range(4, 10):
    if (sh.cell_value(rowx=rx, colx=4)) not in liste_compte:
        liste_compte[((sh.cell_value(rowx=rx, colx=4)))] = [sh.cell_value(rowx=rx, colx=6), sh.cell_value(rowx=rx, colx=7)]
    elif (sh.cell_value(rowx=rx, colx=4)) in liste_compte:
        three = [x + y for x, y in zip(liste_compte[sh.cell_value(rowx=rx, colx=4)],[sh.cell_value(rowx=rx, colx=6), sh.cell_value(rowx=rx, colx=7)])]
        liste_compte[(sh.cell_value(rowx=rx, colx=4))] = three
print (liste_compte)
I'm not going to directly untangle your code, but just help you with a general example that does what you want:
def update_balance(existing_balance, new_balance):
    # add each column of the new balance into the stored balance, in place
    for column in range(len(existing_balance)):
        existing_balance[column] += new_balance[column]

def update_account(accounts, account_number, new_balance):
    if account_number in accounts:
        update_balance(existing_balance=accounts[account_number], new_balance=new_balance)
    else:
        accounts[account_number] = new_balance
And finally, you'd do something like this (assuming each row of your xls looks like [account_number, balance 1, balance 2]):
accounts = dict()
for row in xls:
    update_account(accounts=accounts,
                   account_number=row[0],
                   new_balance=row[1:3])  # columns 1 and 2; row[1:2] would only take one
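A self-contained sketch of the same idea applied to the question's column layout, with accumulate being a hypothetical helper. It takes plain lists of row values so it runs without a spreadsheet; with xlrd, the rows could be supplied as (sh.row_values(rx) for rx in range(4, 10)). The sample rows are made up, except that the amounts for account 10 are the [100, 0] and [50, 20] from the question.

```python
def accumulate(rows, key_col=4, first_col=6, second_col=7):
    """Sum two amount columns per account number.

    `rows` is any iterable of row-value lists; the default column indexes
    match the question (4 for the account, 6 and 7 for the amounts).
    """
    totals = {}
    for row in rows:
        account = row[key_col]
        amounts = [row[first_col], row[second_col]]
        if account in totals:
            # element-wise sum of the stored pair and the new pair
            totals[account] = [old + new for old, new in zip(totals[account], amounts)]
        else:
            totals[account] = amounts
    return totals


# Hypothetical rows shaped like spreadsheet rows (only columns 4, 6 and 7 matter).
rows = [
    ["", "", "", "", 10, "", 100, 0],
    ["", "", "", "", 20, "", 5, 5],
    ["", "", "", "", 10, "", 50, 20],
]
print(accumulate(rows))
```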