I have two txt files, with 50000 and 25000 lines, and I want to find which lines appear in both files. But only the first line is compared and added to the list res1 (the prints are just there to show how it is working). When I run the code it prints the tuple (as expected), but then it only prints the values of lineCue and skips the second loop. The resulting list contains only the first value taken by lineCue, not all the values repeated in both files. When I tried another way, the list contained 24808 repetitions... :(
contratos = 'C:\\CONTRATOS.txt'
cuentas = 'C:\\CUENTAS0.txt'
res1 = [[], []] # res1[0] -> ID, res1[1] -> NO ID
res2 = [] # res2 -> REPE
with open(cuentas, 'rb') as cue:
    with open(contratos, 'rb') as con:
        for lineCue in cue.xreadlines():
            print(lineCue)
            for lineCon in con.xreadlines():
                print(lineCue, lineCon)
                if lineCue == lineCon:
                    res1[0].append(lineCon)
print(res1[0])
output:
['O199924\r\n']
files:
https://dl.dropboxusercontent.com/u/33113171/CONTRATOS.txt
https://dl.dropboxusercontent.com/u/33113171/CUENTAS0.txt
The first iteration of the outer loop reads the whole of con, so on every later iteration the inner loop starts at end-of-file and has nothing left to read. You need to read it from the start each time: call con.seek(0) to go back to the beginning of that file before entering the inner loop.
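As a rough sketch of the con.seek(0) fix, written for Python 3 (so plain file iteration replaces xreadlines). The function names and parameters are made up for illustration. The second function is an alternative that is not in the answer above: it loads one file into a set, which avoids the nested loop entirely and should be much faster for files of this size.

```python
def common_lines_with_seek(cuentas_path, contratos_path):
    """Nested-loop version: rewind the inner file on each outer iteration."""
    matches = []
    with open(cuentas_path, "rb") as cue, open(contratos_path, "rb") as con:
        for line_cue in cue:
            con.seek(0)  # start the inner file from the top again
            for line_con in con:
                if line_cue == line_con:
                    matches.append(line_con)
    return matches


def common_lines_with_set(cuentas_path, contratos_path):
    """Set-based alternative: one pass per file.

    Gives the same list as the nested loop when the second file
    contains no duplicate lines.
    """
    with open(contratos_path, "rb") as con:
        contratos = set(con)
    with open(cuentas_path, "rb") as cue:
        return [line for line in cue if line in contratos]
```

Both functions compare raw bytes, so a line ending in '\r\n' will not match the same text ending in '\n'.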
Related
I have data in two directories and I'm using for loops to read the files from both folders.
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
for filen in [x for x in os.listdir(path_to_files) if '.ares' in x]:
    df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
    for filen1 in [x for x in os.listdir(path_to_files1) if '.ares' in x]:
        df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
        print(filen,filen1)
What happens is that when I try to print the filenames, it keeps printing names seemingly forever: it takes the first item from the first loop and prints it with every item of the second loop. I don't understand why this is happening.
What I want is to print the first item of the first loop with the first item of the second loop, and so on.
The file names are the same in both folders, so the desired output should look something like this:
(txt_1.txt,txt_1.txt)
(txt_2.txt,txt_2.txt)
(txt_3.txt,txt_3.txt)
(txt_4.txt,txt_4.txt)
Where am I making the mistake?
If I understand your question correctly, you want to print pairs of files from path_to_files and path_to_files1. Because you are nesting the for loops, the inner loop runs to completion for every single value of filen, so filen does not change while the inner loop walks through all of path_to_files1.
I think you might want something more like this:
path_to_files = '/home/Desktop/computed_2d/'
path_to_files1 = '/home/Desktop/computed_1d/'
filelistn = [x for x in os.listdir(path_to_files) if '.ares' in x]
filelist1 = [x for x in os.listdir(path_to_files1) if '.ares' in x]
for filen, filen1 in zip(filelistn, filelist1):
    df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
    df1 = pd.read_table(path_to_files1+filen1, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')
    print(filen,filen1)
For a sample input of:
filelistn = ['a.ar', 'b.ar']
filelist1 = ['c.ar', 'd.ar']
I get the following output:
('a.ar', 'c.ar')
('b.ar', 'd.ar')
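One caveat with zip: os.listdir returns names in arbitrary order, so the two lists are not guaranteed to line up. A minimal sketch of pairing by file name instead, assuming the names really are identical in both folders. paired_files and its suffix parameter are hypothetical, and it matches on the file ending rather than on '.ares' appearing anywhere in the name:

```python
import os


def paired_files(dir_a, dir_b, suffix=".ares"):
    """Yield (path_a, path_b) for every file name present in both directories."""
    names_a = {n for n in os.listdir(dir_a) if n.endswith(suffix)}
    names_b = {n for n in os.listdir(dir_b) if n.endswith(suffix)}
    # Only names found in both folders are paired; sorting makes the order stable.
    for name in sorted(names_a & names_b):
        yield os.path.join(dir_a, name), os.path.join(dir_b, name)
```

Each yielded pair can then be passed to pd.read_table exactly as in the loop above.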
I have two loops, each over two variables.
cutoff1 and cutoff2 contain measured data, and BxTime and ByTime contain time data from 0 to 300 s (used to set up the scale in matplotlib). I have used these loops:
Loop = zip(BxTime,cutoff1)
for tup in Loop:
    print (tup[0], tup[1])
Loop2 = zip(ByTime,cutoff2)
for tup in Loop2:
    print (tup[0], tup[1])
In my PyCharm environment it prints a vertical list of measurements and the times at which they occurred, first from Loop, then from Loop2. My question is a bit complex, because I need to:
Save these loops to a file that lists my data vertically: first column cutoff1, second column BxTime, third column cutoff2, fourth column ByTime.
Second, after or before step 1, I need to remove specific measurements. Any ideas?
UPDATE:
def writeData(fileName, tups):
    '''takes a filename, creates the file blank (and deletes existing one)
    and a list of 2-tuples of numbers. writes any tuple to the file where
    the second value is > 100'''
    with open(fileName,"w") as f:
        for (BxTime,cutoff1) in tups:
            if cutoff1 <= 100:
                continue # goes to the next tuple
            f.write(str(BxTime) + '\t' + str(cutoff1) + "\n" )
Loop = zip(BxTime,cutoff1)
# for tup in Loop:
# print (tup[0], tup[1])
Loop2 = zip(ByTime,cutoff2)
# for tup in Loop2:
# print (tup[0], tup[1])
writeData('filename1.csv', Loop)
writeData('filename2.csv', Loop2)
I have used that code, but:
There are still measurements equal to 100.0 in the output.
Before the file is saved I have to wait until the whole loop has been printed. How can I avoid that?
Is there another way to save it, as a text file instead of csv, that I can later open in Excel?
def writeData(fileName, tups):
    '''takes a filename, creates the file blank (and deletes existing one)
    and a list of 2-tuples of numbers. writes any tuple to the file where
    the second value is >= 100'''
    with open(fileName,"w") as f:
        for (tim,dat) in tups:
            if dat < 100:
                continue # goes to the next tuple
            f.write(str(tim) + '\t' + str(dat) + "\n" ) # text mode writes the platform line ending
Use it like this:
writeData("kk.csv", [(1,101),(2,99),(3,200),(4,99.999)])
Output:
1 101
3 200
Should work with your zipped Loop and Loop2 of (time, data).
On Windows you should not need to change the line end from '\n' to '\r\n': a file opened in text mode ("w") translates '\n' to the platform line ending automatically.
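For the four-column layout asked for in the question, a possible sketch using the standard csv module with a tab delimiter, which Excel can open as a text file. write_columns, its threshold parameter and the header row are hypothetical. It also assumes both series have the same length and that a row should be dropped when either measurement is below the threshold, which may not be the filtering you want:

```python
import csv


def write_columns(file_name, bx_time, cutoff1, by_time, cutoff2, threshold=100):
    """Write four tab-separated columns, skipping rows below the threshold."""
    # newline="" lets the csv module control line endings itself.
    with open(file_name, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["cutoff1", "BxTime", "cutoff2", "ByTime"])
        for tx, c1, ty, c2 in zip(bx_time, cutoff1, by_time, cutoff2):
            if c1 < threshold or c2 < threshold:
                continue  # drop this row
            writer.writerow([c1, tx, c2, ty])
```

Nothing is printed along the way, so the file is written as fast as the data can be iterated.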
Assume I have the following matrix:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
I want to draw a batch of 2 random rows on each loop iteration and send it to a function. For instance, the batch in iteration i could be batch = [[4,5,6], [70,80,90]]
I do the following:
X = np.array([[1,2,3], [4,5,6], [7,8,9], [70,80,90], [45,43,68], [112,87,245]])
def caclulate_batch(batch):
    pass
for i in range(X.shape[0]/2):
    batch = np.array([])
    for _ in range(2):
        r = random.randint(0, 5)
        batch = np.append(batch, X[r])
    caclulate_batch(batch)
There are two problems here: (1) it returns a single flat, appended array instead of two rows; (2) the random number can repeat, so the same row can be chosen more than once. How can I modify the code to fit my requirement?
r = np.random.randint(0, len(X), 2) should get you the indices (note that randint may still return the same index twice). That lets you use fancy indexing to get the subset: batch = X[r, :].
If you want to accumulate arrays along a new dimension, as your loop intends, use np.stack or np.vstack instead of np.append.
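To get a batch without repeated rows, one option is to sample the indices without replacement. A minimal sketch, assuming NumPy 1.17 or later for np.random.default_rng; the seed value is arbitrary and only there to make the run repeatable:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
              [70, 80, 90], [45, 43, 68], [112, 87, 245]])

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Two distinct row indices, then fancy indexing to pull the rows out.
idx = rng.choice(len(X), size=2, replace=False)
batch = X[idx]  # shape (2, 3): two different rows of X
```

This prevents repeats within one batch only; the same row can still turn up again in a later batch.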
(1) You can use numpy.stack instead of append. EDIT: note that stack is called once, after you have collected all the rows of the batch in a list, like:
rows = ([1,2], [3,4])
numpy.stack(rows)
# gives [[1,2],
# [3,4]]
(2) You can shuffle the X array, loop through the result and extract the rows two by two. Look at numpy.random.shuffle, which shuffles the rows in place and returns None.
It would look like this:
np.random.shuffle(X)  # shuffles X in place
for i in range(X.shape[0] // 2):
    batch = X[i*2:i*2+2]
    caclulate_batch(batch)
I can't seem to find the error in my Python 2.7.13 code. When I try to run it, the following shows up:
"IndexError: list index out of range"
for d in dopant[1:]:
    for s in xrange(1,3,2):
        for k in xrange(0,1):
            # creates folder
            try:
                os.makedirs("path")
            except OSError:
                if not os.path.isdir("path"):
                    raise
            # enters that folder
            os.chdir("path")
            file2 = open("atomicXYZ","a+")
            stdin=subprocess.PIPE, stdout = file2).stdin
            subprocess.Popen(['cat', '/path/file'], stdout = cmd1)
            file2.seek(0)
            # The following reads atomicXYZ and converts its contents to tuples
            result = []
            with file2 as fp:
                for i in fp.readlines():
                    tmp = i.split()
                    try:
                        result.append((float(tmp[0]), float(tmp[1]), float(tmp[2])))
                    except:pass
            # As a check, I access the last line of the tuple
            x,y,z = result[len(result)-1]
            os.chdir("..")
That is when the error shows up. This is surprising because atomicXYZ is NOT empty, as you can see here:
atomicXYZ
0.309595018 0.070879924 0.041045030
0.600985479 0.103996517 0.130482163
0.982347083 -0.008801119 -0.088718291
0.266923601 0.125720284 -0.038070136
0.520845390 0.163282973 0.061118496
0.812787033 0.194089924 0.131124240
0.398054509 0.270533816 -0.097226923
0.673016094 0.332428625 0.006571612
0.968473946 0.356972107 0.087712083
0.896549601 0.449435057 0.027658530
0.602586223 0.391867525 -0.070503370
0.732266134 0.576057624 -0.111890811
1.018201372 0.643127004 -0.009288985
0.914029765 0.703744085 -0.066356115
And what is even stranger is that when I split this code into two scripts, one for writing the atomic coordinates and one for reading them, it works.
What is it that I'm doing wrong?
I am trying to extract particular lines from a txt output file. The lines I am interested in are a few lines above and a few lines below the key string that I use to search through the results. The key string is the same for each result.
fi = open('Inputfile.txt')
fo = open('Outputfile.txt', 'a')
lines = fi.readlines()
filtered_list=[]
for item in lines:
    if item.startswith("key string"):
        filtered_list.append(lines[lines.index(item)-2])
        filtered_list.append(lines[lines.index(item)+6])
        filtered_list.append(lines[lines.index(item)+10])
        filtered_list.append(lines[lines.index(item)+11])
fo.writelines(filtered_list)
fi.close()
fo.close()
The output file contains the right lines for the first record, but multiplied for every record available. How can I update the indexing so it can read every individual record? I've tried to find the solution but as a novice programmer I was struggling to use enumerate() function or collections package.
First of all, it would probably help if you said what exactly goes wrong with your code (a stack trace, it doesn't work at all, etc). Anyway, here's some thoughts. You can try to divide your problem into subproblems to make it easier to work with. In this case, let's separate finding the relevant lines from collecting them.
First, let's find the indexes of all the relevant lines.
key = "key string"
relevant = []
for i, item in enumerate(lines):
    if item.startswith(key):
        relevant.append(i)  # store the index, not the line itself
enumerate is actually quite simple. It takes a list (or any iterable) and yields (index, item) pairs. So, list(enumerate(['a', 'b', 'c'])) returns [(0, 'a'), (1, 'b'), (2, 'c')].
What I had written above can be achieved with a list comprehension:
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
So, we have the indexes of the relevant lines. Now, let's collect them. You are interested in the line 2 lines before the match and the lines 6, 10 and 11 lines after it. If one of your first two lines contains the key, then you have a problem: the offset produces a negative index, and you don't really want lines[-1] (that's the last item!). Also, you need to handle the situation in which your offset would take you past the end of the list: otherwise Python will raise an IndexError.
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):  # index 0 is a valid line
            out.append(lines[index])
You could also catch the IndexError, but that won't save us much typing, as we have to handle negative indexes anyway.
The whole program would look like this:
key = "key string"
with open('Inputfile.txt') as fi:
    lines = fi.readlines()
relevant = [i for (i, item) in enumerate(lines) if item.startswith(key)]
out = []
for r in relevant:
    for offset in -2, 6, 10, 11:
        index = r + offset
        if 0 <= index < len(lines):  # skip offsets that fall outside the file
            out.append(lines[index])
with open('Outputfile.txt', 'a') as fo:
    fo.writelines(out)
To get rid of duplicates you can convert the list to a set; example:
x=['a','b','a']
y=set(x)
print(y)
will result in (a set is unordered, so the elements may print in a different order):
{'a', 'b'}
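If the original line order matters when you write the file back out, a set is not ideal because it does not keep order. A small sketch of an order-preserving alternative, assuming Python 3.7 or later, where dict keeps insertion order; the sample lines are made up:

```python
lines = ["b\n", "a\n", "b\n", "c\n", "a\n"]

# dict.fromkeys keeps the first occurrence of each key, in original order.
unique_lines = list(dict.fromkeys(lines))
print(unique_lines)  # ['b\n', 'a\n', 'c\n']
```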