Python keeps printing first index of for loop - python-2.7

I have data in two directories and i'm using for loop to read the files from both the folders.
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
for filen1 in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
print(filen,filen1)
Now what's happening is like when tried to print the filenames then it kept printing the names forever. So, its basically taking the first iteration from first loop then print it with all the iteration of the second loop.I don't understand why is it happening.
But what i want to do is, i want to print the first iteration of first loop with the first iteration of second for loop
As the file names are same in both the folders.So when i do the print, then desired result should look like something like this:
(txt_1.txt,txt_1.txt)
(txt_2.txt,txt_2.txt)
(txt_3.txt,txt_3.txt)
(txt_4.txt,txt_4.txt)
Where i'm making the mistake??

If I understand your question correctly, you seem to want to print pairs of files from path_to_files and path_to_files1. Since you are nesting a for loop, for every iteration of the nested for loop, filen is not going to change.
I think you might want something more like this:
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
filelistn = [x for x in os.listdir(path_to_files) if '.ares' in x]
filelist1 = [x for x in os.listdir(path_to_files1) if '.ares' in x]
for filen, filen1 in zip(filelistn, filelist1):
df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
print(filen,filen1)
For a sample input of:
filelistn = ['a.ar', 'b.ar']
filelist1 = ['c.ar', 'd.ar']
I get the following output:
('a.ar', 'c.ar')
('b.ar', 'd.ar')

Related

Select a batch from a matrix in a loop

Assume I have the following matrix:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
I want to draw a batch of 2 random rows at each time loop, and send it to a function. For instance, a batch in iteration i can be batch = [[4,5,6], [70,80,90]]
I do the following:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
def caclulate_batch(batch):
pass
for i in range(X.shape[0]/2):
batch = np.array([])
for _ in range(2):
r = random.randint(0, 5)
batch = np.append(batch, X[r])
caclulate_batch(batch)
There are two problems here: (1) It returns appended array (2) The random number can be repeated which can choose the same row many times. How can modify the code to fit my requirement.
r = np.random.randint(0, len(x), 2) should get you the indices. That lets you use fancy indexing to get the subset: batch = x[r, :].
If you want to accumulate arrays along a new dimension, as your loop does, use np.stack or np.block instead of np.append.
(1) You can use numpy.stack instead of append. EDIT: But this function would be called when you have all your batch in a list like:
list = ([1,2], [3,4])
numpy.stack(list)
# gives [[1,2],
# [3,4]]
(2) You can shuffle X array, loop through the results and extract two by two. Look at numpy.random.shuffle
It would look like that:
S = np.random.shuffle(X)
for i in range(S.shape[0]/2):
batch = S[i*2:i*2+1]
caclulate_batch(batch)

Python: referring to each duplicate item in a list by unique index

I am trying to extract particular lines from txt output file. The lines I am interested in are few lines above and few below the key_string that I am using to search through the results. The key string is the same for each results.
fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')
lines = fi.readlines()
filtered_list=[]
for item in lines:
if item.startswith("key string"):
filtered_list.append(lines[lines.index(item)-2])
filtered_list.append(lines[lines.index(item)+6])
filtered_list.append(lines[lines.index(item)+10])
filtered_list.append(lines[lines.index(item)+11])
fo.writelines(filtered_list)
fi.close()
fo.close()
The output file contains the right lines for the first record, but multiplied for every record available. How can I update the indexing so it can read every individual record? I've tried to find the solution but as a novice programmer I was struggling to use enumerate() function or collections package.
First of all, it would probably help if you said what exactly goes wrong with your code (a stack trace, it doesn't work at all, etc). Anyway, here's some thoughts. You can try to divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.
First, let's find the indexes of all the relevant lines.
key = "key string"
relevant = []
for i, item in enumerate(lines):
if item.startswith(key):
relevant.append(item)
enumerate is actually quite simple. It takes a list, and returns a sequence of (index, item) pairs. So, enumerate(['a', 'b', 'c']) returns [(0, 'a'), (1, 'b'), (2, 'c')].
What I had written above can be achieved with a list comprehension:
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
So, we have the indexes of the relevant lines. Now, let's collected them. You are interested in the line 2 lines before it and 6 and 10 and 11 lines after it. If your first lines contains the key, then you have a problem – you don't really want lines[-1] – that's the last item! Also, you need to handle the situation in which your offset would take you past the end of the list: otherwise Python will raise an IndexError.
out = []
for r in relevant:
for offset in -2, 6, 10, 11:
index = r + offset
if 0 < index < len(lines):
out.append(lines[index])
You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
The whole program would look like this:
key = "key string"
with open('Inputfile.txt') as fi:
lines = fi.readlines()
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
for offset in -2, 6, 10, 11:
index = r + offset
if 0 < index < len(lines):
out.append(lines[index])
with open('Outputfile.txt', 'a') as fi:
fi.writelines(out)
To get rid of duplicates you can cast list to set; example:
x=['a','b','a']
y=set(x)
print(y)
will result in:
['a','b']

Sequence of dictionaries in python

I am trying to create a sequence of similar dictionaries to further store them in a tuple. I tried two approaches, using and not using a for loop
Without for loop
dic0 = {'modo': lambda x: x[0]}
dic1 = {'modo': lambda x: x[1]}
lst = []
lst.append(dic0)
lst.append(dic1)
tup = tuple(lst)
dic0 = tup[0]
dic1 = tup[1]
f0 = dic0['modo']
f1 = dic1['modo']
x = np.array([0,1])
print (f0(x) , f1(x)) # 0 , 1
With a for loop
lst = []
for j in range(0,2):
dic = {}
dic = {'modo': lambda x: x[j]}
lst.insert(j,dic)
tup = tuple(lst)
dic0 = tup[0]
dic1 = tup[1]
f0 = dic0['modo']
f1 = dic1['modo']
x = np.array([0,1])
print (f0(x) , f1(x)) # 1 , 1
I really don't understand why I am getting different results. It seems that the last dictionary I insert overwrite the previous ones, but I don't know why (the append method does not work neither).
Any help would be really welcomed
This is happening due to how scoping works in this case. Try putting j = 0 above the final print statement and you'll see what happens.
Also, you might try
from operator import itemgetter
lst = [{'modo': itemgetter(j)} for j in range(2)]
You have accidentally created what is know as a closure. The lambda functions in your second (loop-based) example include a reference to a variable j. That variable is actually the loop variable used to iterate your loop. So the lambda call actually produces code with a reference to "some variable named 'j' that I didn't define, but it's around here somewhere."
This is called "closing over" or "enclosing" the variable j, because even when the loop is finished, there will be this lambda function you wrote that references the variable j. And so it will never get garbage-collected until you release the references to the lambda function(s).
You get the same value (1, 1) printed because j stops iterating over the range(0,2) with j=1, and nothing changes that. So when your lambda functions ask for x[j], they're asking for the present value of j, then getting the present value of x[j]. In both functions, the present value of j is 1.
You could work around this by creating a make_lambda function that takes an index number as a parameter. Or you could do what #DavisYoshida suggested, and use someone else's code to create the appropriate closure for you.

Why this doble loop doesn't work as i expected?

I have two txt files, with 50000 and 25000 data to compare which data are in both files, but only the first line is compared and added to the list res1, (prints were just to get the idea of how it is working) when i run the code it prints the tuple (as expected), but then only prints the values in lineCue and avoid the second loop, the list result is only the first value taked by lineCue, and not all the values repeated in both files. when
i tried by another way the list content have 24808 repetitions... :(
contratos = 'C:\\CONTRATOS.txt'
cuentas = 'C:\\CUENTAS0.txt'
res1 = [[], []] # res1[0] -> ID, res1[1] -> NO ID
res2 = [] # res2 -> REPE
with open(cuentas, 'rb') as cue:
with open(contratos, 'rb') as con:
for lineCue in cue.xreadlines():
print(lineCue)
for lineCon in con.xreadlines():
print(lineCue, lineCon)
if lineCue == lineCon:
res1[0].append(lineCon)
print(res1[0])
output:
['O199924\r\n']
files:
https://dl.dropboxusercontent.com/u/33113171/CONTRATOS.txt
https://dl.dropboxusercontent.com/u/33113171/CUENTAS0.txt
In the first iteration of the outer loop you read the whole file con. You need to read it from start each time. To do so, use con.seek(0) to go to the beginning of this file before entering the inner loop.

Getting duplicates in dict

I have a dictionary, say d1 that looks like this:
d = {'file1': 4098, 'file2': 4139, 'file3': 4098, 'file4': 1353, 'file5': 4139}
Now, I've figured out how to get it to tell me if there are any dublicates or not. But what I'd like to get it to do is tell me if there are any, and what 2 (or more) values (and corresponding keys) are dublicates.
The output for the above would tell me that file1 and file3 are identical and that file2 and file5 are identical
I've been trying to wrap my head around it for a few hours, and haven't found the right solution yet.
try this to get the duplicates:
[item for item in d.items() if [val for val in d.values()].count(item[1]) > 1]
that outputs:
[('file3', 4098), ('file2', 4139), ('file1', 4098), ('file5', 4139)]
next sort the list by the second item in the tuple:
list = sorted(list, key=operator.itemgetter(1))
finally use itertools.groupby() to group by the second item:
list = [list(group) for key, group in itertools.groupby(list, operator.itemgetter(1))]
final output:
[[('file3', 4098), ('file1', 4098)], [('file2', 4139), ('file5', 4139)]]