I want to loop through files for as long as there are files left (I edit and remove files inside the loop), but glob.glob seems to be evaluated only once at the start of the loop, not on every iteration. How can I achieve this?
Thanks,
Ron
edit 1
I tried:
for MyBuff in glob.glob(DATA_FOLDER + "*.bin"):
    tarcmd = TARBIN + " -cjf " + DATA_FOLDER + str(int(time.time()*100)) + ".tar.bz2 $(find " + DATA_FOLDER + " -name \"*.bin\" | head -500)"
    # ...tarcmd operations and removal of the 500 inserted bin files
Now, the problem is that I tar up to 500 files into the tarball, but what if there are more than 500? I want to loop until no bin files are left...
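One way to sketch this (Python 3, using the standard-library tarfile module instead of shelling out to tar and find; the archive_bins name and the counter-based archive naming are my own, not from the question):

```python
import glob
import os
import tarfile
import time

def archive_bins(folder, batch_size=500):
    """Repeatedly scan `folder` for .bin files and pack them into
    .tar.bz2 archives, `batch_size` at a time, until none remain."""
    batch_num = 0
    while True:
        # glob runs again on every pass of the while loop, so files
        # added or removed since the last iteration are picked up
        batch = sorted(glob.glob(os.path.join(folder, '*.bin')))[:batch_size]
        if not batch:
            break
        batch_num += 1
        # Counter suffix avoids name collisions when two batches
        # finish within the same 10 ms tick
        name = os.path.join(folder, '%d_%d.tar.bz2'
                            % (int(time.time() * 100), batch_num))
        with tarfile.open(name, 'w:bz2') as tar:
            for path in batch:
                tar.add(path, arcname=os.path.basename(path))
        # Only remove the source files once the archive is closed
        for path in batch:
            os.remove(path)
```

The key point is the while loop with a fresh glob.glob each pass, rather than a for loop over a glob result computed once.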
Assuming they are text files, here is one way, though I'm not sure it's the best:
import os, re
filepath = '/path/to/file/directory/'
result = [f for f in os.listdir(filepath) if re.search(r'[A-Za-z0-9]+\.txt$', f)]
for f in result:
    print f
I am trying to find some missing files, but those files are in a pair.
As an example, we have files like:
file1_LEFT
file1_RIGHT
file2_LEFT
file2_RIGHT
file3_LEFT
file4_RIGHT
...
The idea is the name is the same but they come in a LEFT/RIGHT pair. Normally we have thousands of files, but somewhere in there we'll find some files without a pair, like file99_LEFT present but the RIGHT missing (or vice versa for sides).
I'm trying to make a script in Python 2.7 (yes, I'm using an old Python for personal reasons... unfortunately), but I have no clue how this can be done.
Ideas tried:
- verify them 2 by 2 and check if we have RIGHT in the current file and LEFT in the previous one; if so, print ok, else print the file that has no match. But after the first mismatch is printed, all the others fail, because from that point on the structure shifts: LEFT and RIGHT files are no longer next to each other, their pairing is offset.
- create separate lists for LEFT and RIGHT and compare them, but again only the first missing file is found and it doesn't work for the rest.
Code I've used until now:
import os
import fnmatch, re

path = raw_input('Enter files path:')
for path, dirname, filenames in os.walk(path):
    for fis in filenames:
        print fis
    print len(filenames)
    for i in range(1, len(filenames), 2):
        print filenames[i]
        if "RIGHT" in filenames[i] and "LEFT" in filenames[i-1]:
            print "Ok"
        else:
            print "file >" + filenames[i] + "< has no pair"
            f = open(r"D:\rec.txt", "a")
            f.writelines(filenames[i] + "\n")
            f.close()
Thanks for your time!
We can use glob to list the files in a given path, filtered by a search pattern.
If we consider one set of all LEFT filenames, and another set of all RIGHT filenames, can we say you are looking for the elements not in the intersection of these two sets?
That is called the "symmetric difference" of those two sets.
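As a quick illustration of the ^ (symmetric difference) operator on sets, in Python 3 syntax with made-up names:

```python
left = {'file1', 'file2', 'file3'}
right = {'file2', 'file3', 'file4'}
# Elements that appear in exactly one of the two sets
print(sorted(left ^ right))  # ['file1', 'file4']
```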
import glob
# Get a list of all _LEFT filenames (excluding the _LEFT part of the name)
# Eg: ['file1', 'file2' ... ].
# Ditto for the _RIGHT filenames
# Note: glob.glob() will look in the current directory where this script is running.
left_list = [x.replace('_LEFT', '') for x in glob.glob('*_LEFT')]
right_list = [x.replace('_RIGHT', '') for x in glob.glob('*_RIGHT')]
# Print the symmetric difference between the two lists
symmetric_difference = list(set(left_list) ^ set(right_list))
print symmetric_difference
# If you'd like to save the names of missing pairs to file
with open('rec.txt', 'w') as f:
    for pairname in symmetric_difference:
        print >> f, pairname
# If you'd like to print which file (LEFT or RIGHT) is missing a pair
for filename in symmetric_difference:
    if filename in left_list:
        print "file >" + filename + "_LEFT< has no pair"
    if filename in right_list:
        print "file >" + filename + "_RIGHT< has no pair"
I'm using the map function with pandas to read in files with multiprocessing, like this:
files = glob.glob(r'C:\Desktop\Folder\*.xlsx')

def read_excel(filename):
    return pd.read_excel(filename)

file_list = [filename for filename in files]
pool = Pool(processes=4)
pool.map(read_excel, file_list)
But the problem is that, whereas before I was using a for loop and could keep a counter (count += 1) for each iteration and print count / len(files) to get a sense of how far along the process is, I can't do that here. I realize with multiprocessing it could get a little funky, but there should be some way to implement this.
I am trying to get this function to read an input file and output its lines into a new file. PyCharm keeps saying 'item' is not being used, or that it was already used in the first for loop. I don't see why 'item' is a problem. It also won't create the new file.
input_list = 'persist_output_input_file_test.txt'

def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    input_file.close()
    for item in input_list:
        write_new_file = open('output_word.txt', 'wb')
        for item in lines:
            print>>input_list, item
        write_new_file.close()
You have a few things going wrong in your program.
input_list seems to be a string denoting the name of a file. Currently you are iterating over the characters in the string with for item in input_list.
You shadow the already created variable item in your second for loop. I recommend you change that.
In Python, depending on which version you use, the correct syntax for printing to the screen is print text (Python 2) or print(text) (Python 3), unlike C++'s std::cout << text << endl;. In Python, << and >> are actually bitwise operators that shift bits to the left or to the right.
There are a few issues in your implementation. Refer the following code for what you intend to do:
def persist_output(input_list):
    input_file = open(input_list, 'rb')
    lines = input_file.readlines()
    write_new_file = open('output_word.txt', 'wb')
    input_file.close()
    for item in lines:
        print item
        write_new_file.write(item)
    write_new_file.close()
The issues with your earlier implementation are as follows:
In the first loop you are iterating over the input file name. If you intend input_list to be a list of input files to be read, then you will also have to open each of them. Right now the loop iterates through the characters of the file name.
You are opening the output file inside a loop, so only the last write operation will be successful. You have to move the file-opening operation outside the loop (see the code snippet above) or change the mode to append. The latter can be done as follows:
write_new_file = open('output_word.txt', 'a')
There is a syntax error in the way you are using the print command.
f = open('yourfilename', 'r').read()
f1 = f.split('\n')
p = open('outputfilename', 'w')
for i in range(len(f1)):
    p.write(str(f1[i]) + '\n')
p.close()
Hope this helps.
I was (unsuccessfully) trying to figure out how to create a list of compound letters using loops. I am a beginner programmer, have been learning python for a few months. Fortunately, I later found a solution to this problem - Genearte a list of strings compound of letters from other list in Python - see the first answer.
So I took that code and added a little to it for my needs. I randomized the list, turned the list into a comma separated file. This is the code:
from string import ascii_lowercase as al
from itertools import product
import random
list = ["".join(p) for i in xrange(1,6) for p in product(al, repeat = i)]
random.shuffle(list)
joined = ",".join(list)
f = open("double_letter_generator_output.txt", 'w')
print >> f, joined
f.close()
What I need to do now is split that massive file "double_letter_generator_output.txt" into smaller files. Each file needs to consist of 200 'words'. So it will need to split into many files. The files of course do not exist yet and will need to be created by the program also. How can I do that?
Here's how I would do it, but I'm not sure why you're splitting this into smaller files. I would normally do it all at once, but I'm assuming the file is too big to be stored in working memory, so I'm traversing one character at a time.
Let bigfile.txt contain
1,2,3,4,5,6,7,8,9,10,11,12,13,14
MAX_NUM_ELEMS = 2  # you'll want this to be 200

nameCounter = 1
numElemsCounter = 0

with open('bigfile.txt', 'r') as bigfile:
    outputFile = open('output' + str(nameCounter) + '.txt', 'a')
    for letter in bigfile.read():
        if letter == ',':
            numElemsCounter += 1
            if numElemsCounter == MAX_NUM_ELEMS:
                # Chunk boundary: drop this comma and roll to a new file
                numElemsCounter = 0
                outputFile.close()
                nameCounter += 1
                outputFile = open('output' + str(nameCounter) + '.txt', 'a')
            else:
                outputFile.write(letter)
        else:
            outputFile.write(letter)
    outputFile.close()
now output1.txt is 1,2, output2.txt is 3,4, output3.txt is 5,6, etc.
$ cat output7.txt
13,14
This is a little sloppy, you should write a nice function to do it and format it the way you like!
FYI, if you want to write to a bunch of different files, there's no reason to write to one big file first. Write to the little files right off the bat.
This way, the last file might have fewer than MAX_NUM_ELEMS elements.
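Writing the little files right off the bat could look like this (a Python 3 sketch; the words_part naming is made up, and I use a tiny alphabet and CHUNK = 4 so the example runs quickly, where the original wants all lowercase letters, lengths 1 to 5, and 200 words per file):

```python
import random
from itertools import product

# Generate and shuffle the word list, as in the question (scaled down)
words = ["".join(p) for i in range(1, 3) for p in product('abc', repeat=i)]
random.shuffle(words)

CHUNK = 4  # the original wants 200 words per file
for n, start in enumerate(range(0, len(words), CHUNK), 1):
    # Slice the list directly into per-file chunks; no big
    # intermediate file is needed
    with open('words_part%d.txt' % n, 'w') as f:
        f.write(",".join(words[start:start + CHUNK]))
```

For Python 2 you would swap range back to xrange in the list comprehension.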
I have a file containing a list of events spaced with some time. Here is an example:
0, Hello World
0.5, Say Hi
2, Say Bye
I would like to be able to replay this sequence of events. The first column is the delta between two consecutive events (the first starts immediately, the second happens 0.5s later, the third 2s later, ...).
How can I do that on Windows? Is there anything that can ensure I am very accurate on the timing? The idea is to be as close as what you would have listening to some music: you don't want your audio event to happen close to the right time, but just on time.
This can be done easily by using the sleep function from the time module. The exact code should work like this:
import time
# Change data.txt to the name of your file
data_file = open("data.txt", "r")
# Get rid of blank lines (often the last line of the file)
vals = [i for i in data_file.read().split('\n') if i]
data_file.close()
for i in vals:
    i = i.split(',')
    i[1] = i[1][1:]
    time.sleep(float(i[0]))
    print i[1]
This is an imperfect algorithm, but it should give you an idea of how this can be done. We read the file, split it to a newline delimited list, then go through each comma delimited couplet sleeping for the number of seconds specified, and printing the specified string.
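One refinement worth noting: sleeping for each delta lets the small overhead of splitting and printing accumulate across events. A sketch (Python 3 syntax, events hard-coded from the example file) that sleeps toward absolute target times instead:

```python
import time

# (delta, message) pairs as parsed from the example file
events = [(0.0, 'Hello World'), (0.5, 'Say Hi'), (2.0, 'Say Bye')]

start = time.monotonic()
target = 0.0
for delta, message in events:
    target += delta
    # Sleep until the absolute target time rather than sleeping for the
    # raw delta, so per-event processing overhead cannot accumulate
    remaining = start + target - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)
    print(message)
```

Note that on Windows the default timer granularity is on the order of 15 ms, so time.sleep alone may not give truly audio-grade precision; this sketch only prevents drift, not jitter.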
You're looking for time.sleep(...) in Python.
If you load that file as a list and then walk through the values:
import time

with open("datafile.txt", "r") as infile:
    lines = infile.read().split('\n')

for line in lines:
    wait, response = line.split(',')
    time.sleep(float(wait))
    print response