I am looking for a more elegant solution to an issue I am facing when trying to recurse to multiple levels to list out dirs and files.
os.walk(folder) does sub and sub-sub levels, but I need to go to at least 5 deep.
I have come up with the following to traverse multiple directories, however, is there a better, or more elegant way that I am missing?
import os

rootPath = '/path/to/my/folder/test'
print '###### using os.walk ######'
for root, dirs, files in os.walk(rootPath):
    print 'directory - ' + " ".join(dirs)
    for d in dirs:
        for f in files:
            if not f.startswith('.'):
                print 'directory - ' + d + ' file - ' + f
print '\n\n\n###### using isdir ######'
for f in os.listdir(rootPath):
    print '-' + f
    if os.path.isdir(os.path.join(rootPath,f)):
        for fo in os.listdir(os.path.join(rootPath,f)):
            print '--' + fo
            if os.path.isdir(os.path.join(rootPath,f,fo)):
                for fol in os.listdir(os.path.join(rootPath,f,fo)):
                    print '---' + fol
                    if os.path.isdir(os.path.join(rootPath,f,fo,fol)):
                        for fold in os.listdir(os.path.join(rootPath,f,fo,fol)):
                            print '----' + fold
                            if os.path.isdir(os.path.join(rootPath,f,fo,fol,fold)):
                                for folde in os.listdir(os.path.join(rootPath,f,fo,fol,fold)):
                                    print '-----' + folde
                                    if os.path.isdir(os.path.join(rootPath,f,fo,fol,fold,folde)):
                                        for folder in os.listdir(os.path.join(rootPath,f,fo,fol,fold,folde)):
                                            print '------' + folder
Output:
###### using os.walk ######
directory - first
directory - second
directory - third
directory - fourth
directory - fourth file - in_third.txt
directory - fifth
directory - fifth file - in_fourth.txt
directory -
###### using isdir ######
-.DS_Store
-first
---.DS_Store
---second
-----.DS_Store
-----third
------.DS_Store
------fourth
-------.DS_Store
-------in_fourth.txt
-------fifth
---------.DS_Store
---------in_fifth.txt
------in_third.txt
It seems as though os.walk isn't going into the 'fifth' folder to see in_fifth.txt, whereas the isdir() solution does.
Thanks
So part of this was my misunderstanding of how os.walk works. I believed you were required to iterate through every directory with for d in dirs; however, os.walk already visits every directory on its own, so for f in files is enough.
I got round this by iterating over files only, then stripping the rootPath I initially provided from root, as I only wanted the part of each path that comes after that prefix.
rootPath = '/path/to/my/folder/test'
for root, dirs, files in os.walk(rootPath):
    for f in files:
        if not f.startswith('.'):
            print 'file - ' + os.path.join(root.replace(rootPath, ''), f)
Output:
file - /test2/first/foobar/files1.txt
file - /test2/first/foo.txt
file - /extra/test/bar.bin
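For anyone landing here later, the same idea can be sketched as a small function. This is only an illustration, written in Python 3 syntax (the thread itself uses Python 2); the function name list_files is made up for the example, and os.path.relpath is swapped in for the root.replace(rootPath, '') trick, so the results have no leading slash.

```python
import os


def list_files(root_path):
    """Return the non-hidden files under root_path, as paths relative to it."""
    found = []
    for root, dirs, files in os.walk(root_path):
        # os.walk already descends into every subdirectory, at any depth,
        # so there is no need to loop over dirs here.
        for name in files:
            if not name.startswith('.'):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, root_path))
    return sorted(found)


if __name__ == '__main__':
    # Same placeholder path as in the question.
    for path in list_files('/path/to/my/folder/test'):
        print('file - ' + path)
```

Since each iteration of os.walk yields one directory together with its own files, the depth of the tree does not matter.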
I have 1000 subdirectories (error1 - error1000), each with three different CSV files (rand.csv, run_error.csv, swe_error.csv). Each CSV has an index row. I need to merge the CSV files that have the same filename, so I end up with e.g. rand_merge.csv with an index row and 1000 rows of data.
I followed Merge multiple csv files with same name in 10 different subdirectory, which gets me
KeyError: 'filename'
I can't figure out how to fix it, so any help is appreciated.
Thx
Update: Here's the exact code, which came from linked post above:
import pandas as pd
import glob

CONCAT_DIR = "./error/files_concat/"
# Use glob module to return all csv files under root directory. Create DF from this.
files = pd.DataFrame([file for file in glob.glob("error/*/*")], columns=["fullpath"])
# Split the full path into directory and filename
files_split = files['fullpath'].str.rsplit("\\", 1, expand=True).rename(columns={0: 'path', 1:'filename'})
# Join these into one DataFrame
files = files.join(files_split)
# Iterate over unique filenames; read CSVs, concat DFs, save file
for f in files['filename'].unique():
    paths = files[files['filename'] == f]['fullpath'] # Get list of fullpaths from unique filenames
    dfs = [pd.read_csv(path, header=None) for path in paths] # Get list of dataframes from CSV file paths
    concat_df = pd.concat(dfs) # Concat dataframes into one
    concat_df.to_csv(CONCAT_DIR + f) # Save dataframe
I found my mistake. I needed a "/" after rsplit, not "\"
files_split = files['fullpath'].str.rsplit("/", 1, expand=True).rename(columns={0: 'path', 1:'filename'})
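One way to avoid hard-coding the separator at all is to let os.path.basename do the split. The sketch below uses only the standard library rather than pandas, and it makes one assumption that is not confirmed in the thread: the first line of every CSV is the same header (index) row, which is kept once. The function names group_by_filename and merge_group are invented for the example.

```python
import glob
import os
from collections import defaultdict


def group_by_filename(pattern):
    """Map each file name to the sorted list of paths that share it."""
    groups = defaultdict(list)
    for path in glob.glob(pattern):
        # os.path.basename splits on the separator of the current platform,
        # so neither "/" nor "\\" has to be hard-coded.
        groups[os.path.basename(path)].append(path)
    return {name: sorted(paths) for name, paths in groups.items()}


def merge_group(paths, out_path):
    """Concatenate files, keeping the first line (assumed header) only once."""
    with open(out_path, 'w') as out:
        for index, path in enumerate(paths):
            with open(path) as src:
                lines = src.read().splitlines()
            if index > 0:
                lines = lines[1:]  # drop the repeated header row
            out.writelines(line + '\n' for line in lines)
```

With the layout from the question, something like group_by_filename("error/*/*.csv") would return one entry per file name, and each list of paths could then be passed to merge_group.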
I have an interesting issue I am trying to solve and I have taken a good stab at it but need a little help. I have a squishy file that contains some lua code. I am trying to read this file and build a file path out of it. However, depending on where this file was generated from, it may contain some information or it might miss some. Here is an example of the squishy file I need to parse.
Module "foo1"
Module "foo2"
Module "common.command" "common/command.lua"
Module "common.common" "common/common.lua"
Module "common.diagnostics" "common/diagnostics.lua"
Here is the code I have written to read the file and search for the lines containing Module. You will see that there are three different sections or columns to this file. If you look at line 3 you will have "Module" for column1, "common.command" for column2 and "common/command.lua" for column3.
Taking Column3 as an example... if there is data that exists in the 3rd column then I just need to strip the quotes off and grab the data in Column3. In this case it would be common/command.lua. If there is no data in Column3 then I need to get the data out of Column2, replace each period (.) with os.path.sep, and then tack a .lua extension on the file. For example, from common.common alone I would need to build common/common.lua.
squishyContent = []
if os.path.isfile(root + os.path.sep + "squishy"):
    self.Log("Parsing Squishy")
    # the with block closes the file, so no explicit close() is needed
    with open(root + os.path.sep + "squishy") as squishyFile:
        lines = squishyFile.readlines()
    for line in lines:
        if line.startswith("Module "):
            path = line.replace('Module "', '').replace('"', '').replace("\n", '').replace(".", "/") + ".lua"
Just need some examples/help in getting through this.
This might sound silly, but the easiest approach is to convert everything you told us about your task to code.
for line in lines:
    # if the line doesn't start with "Module ", ignore it
    if not line.startswith('Module '):
        continue
    # As you said, there are 3 columns. They're separated by a blank, so what we're gonna do is split the text into columns.
    # split() without arguments also drops the trailing newline, so [1:-1] below really removes the quotes.
    line = line.split()
    # if there are more than 2 columns, use the 3rd column's text (and remove the quotes "")
    if len(line) > 2:
        line = line[2][1:-1]
    # otherwise, ...
    else:
        line = line[1] # use the 2nd column's text
        line = line[1:-1] # remove the quotes ""
        line = line.replace('.', os.path.sep) # replace . with /
        line += '.lua' # and add .lua
    print line # prove it works.
With a simple problem like this, it's easy to make the program do exactly what you yourself would do if you did the task manually.
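As a variation on the same steps, the quote handling can be left to shlex.split from the standard library, which splits on whitespace and removes the surrounding quotes in one go. This is just a sketch in Python 3 syntax; the function name module_path is made up for the example, and it assumes the Module lines are quoted the way the sample file shows.

```python
import os
import shlex


def module_path(line):
    """Return the file path for a 'Module' line, or None for any other line."""
    if not line.startswith('Module '):
        return None
    # shlex.split breaks the line on whitespace and strips the quotes.
    parts = shlex.split(line)
    if len(parts) > 2:
        # A third column exists: it already holds the path.
        return parts[2]
    if len(parts) < 2:
        return None
    # Only the module name is given: turn dots into separators and add .lua
    return parts[1].replace('.', os.path.sep) + '.lua'
```

Called on the first sample line, Module "foo1", this gives foo1.lua; called on the third, it gives common/command.lua unchanged.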
I have the following question in Python 2.7:
I have 20 different txt-files, each with exactly one column of numbers. Now - as an output - I would like to have one file with all those columns together. How can I concatenate one-column files in Python ? I was thinking about using the fileinput module, but I fear, I have to open all my different txt files at once ?
My idea:
filenames = ['input1.txt','input2.txt',...,'input20.txt']
import fileinput
with open('/path/output.txt', 'w') as outfile:
    for line in fileinput.input(filenames):
        outfile.write(line)
Any suggestions on that ?
Thanks for any help !
A very simple (naive?) solution is
filenames = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
columns = []
for filename in filenames:
    lines = []
    for line in open(filename):
        lines.append(line.strip('\n'))
    columns.append(lines)
rows = zip(*columns)
with open('output.txt', 'w') as outfile:
    for row in rows:
        outfile.write("\t".join(row))
        outfile.write("\n")
But on *nix (including OS X terminal and Cygwin), it's easier to
$ paste a.txt b.txt c.txt d.txt
from the command line.
My suggestion: a little functional approach. Using list comprehension to zip the file being read, to the accumulated columns, and then join them to be a string again, one column (file) at a time:
filenames = ['input1.txt','input2.txt','input20.txt']
outputfile = 'output.txt'
# maybe you need to separate each column:
separator = " "
output_list = []
for f in filenames:
    with open(f, 'r') as inputfile:
        # strip the newlines so the columns can be joined side by side
        input_list = [line.rstrip('\n') for line in inputfile]
    if len(output_list) == 0:
        output_list = input_list
    else:
        output_list = [separator.join(pair) for pair in zip(output_list, input_list)]
with open(outputfile, 'w') as output:
    output.writelines(line + '\n' for line in output_list)
It will keep in memory the accumulator for the result (output_list), and one file at a time (the one being read, which is also the only file open for reading), but may be a little slower, and, of course, it is not fail-proof.
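If having all the input files open at the same time is acceptable (20 handles is normally well within what an OS allows, though that is an assumption about your setup), the columns can also be streamed row by row so that only one line per file is in memory. The sketch below is Python 3, uses contextlib.ExitStack to close every file reliably, and the function name paste_columns is invented for the example.

```python
from contextlib import ExitStack


def paste_columns(filenames, output_name, separator='\t'):
    """Write the lines of each input file side by side, one row per line."""
    with ExitStack() as stack:
        # Open every input file; ExitStack closes them all on exit.
        inputs = [stack.enter_context(open(name)) for name in filenames]
        with open(output_name, 'w') as outfile:
            # zip reads one line from each file per step and stops
            # at the end of the shortest file.
            for row in zip(*inputs):
                cells = [cell.rstrip('\n') for cell in row]
                outfile.write(separator.join(cells) + '\n')
```

The trade-off compared with the accumulator approach is more open handles in exchange for a much smaller memory footprint.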
I'm trying to create a WiFi Log Scanner. Currently we go through logs manually using CTRL+F and our keywords. I just want to automate that process. i.e. bang in a .txt file and receive an output.
I've got the bones of the code, can work on making it pretty later, but I'm running into a small issue. I want the scanner to search the file (done), count instances of that string (done) and output the number of occurrences (done) followed by the full line where that string occurred last, including line number (line number is not essential, just makes it easier to do a guesstimate of which is the more recent issue if there are multiple).
Currently I'm getting an output of every line with the string in it. I know why this is happening, I just can't think of a way to specify just output the last line.
Here is my code:
import os
from Tkinter import Tk
from tkFileDialog import askopenfilename

def file_len(filename):
    #Count Number of Lines in File and Output Result
    with open(filename) as f:
        for i, l in enumerate(f):
            pass
    print('There are ' + str(i+1) + ' lines in ' + os.path.basename(filename))

def file_scan(filename):
    #All Issues to Scan will go here
    print ("DHCP was found " + str(filename.count('No lease, failing')) + " time(s).")
    for line in filename:
        if 'No lease, failing' in line:
            print line.strip()
    DNS= (filename.count('Host name lookup failure:res_nquery failed') + filename.count('HTTP query failed'))/2
    print ("DNS Failure was found " + str(DNS) + " time(s).")
    for line in filename:
        if 'Host name lookup failure:res_nquery failed' or 'HTTP query failed' in line:
            print line.strip()
    print ("PSK= was found " + str(testr.count('psk=')) + " time(s).")
    for line in ln:
        if 'psk=' in line:
            print 'The length(s) of the PSK used is ' + str(line.count('*'))

Tk().withdraw()
filename=askopenfilename()
abspath = os.path.abspath(filename) #So that doesn't matter if File in Python Dir
dname = os.path.dirname(abspath) #So that doesn't matter if File in Python Dir
os.chdir(dname) #So that doesn't matter if File in Python Dir
print ('Report for ' + os.path.basename(filename))
file_len(filename)
file_scan(filename)
That's, pretty much, going to be my working code (just have to add a few more issue searches), I have a version that searches a string instead of a text file here. This outputs the following:
Total Number of Lines: 38
DHCP was found 2 time(s).
dhcp
dhcp
PSK= was found 2 time(s).
The length(s) of the PSK used is 14
The length(s) of the PSK used is 8
I only have general stuff there, modified for it being a string rather than txt file, but the string I'm scanning from will be what's in the txt files.
Don't worry too much about PSK, I want all examples of that listed, I'll see If I can tidy them up into one line at a later stage.
As a side note, a lot of this is jumbled together from doing previous searches, so I have a good idea that there are probably neater ways of doing this. This is not my current concern, but if you do have a suggestion on this side of things, please provide an explanation/link to explanation as to why your way is better. I'm fairly new to python, so I'm mainly dealing with stuff I currently understand. :)
Thanks in advance for any help, if you need any further info, please let me know.
Joe
To search for a string and count its occurrences, I solved it in the following way
'''---------------------Function--------------------'''
#Counting the "string" occurrence in a file
def count_string_occurrence():
    string = "test"
    f = open("result_file.txt")
    contents = f.read()
    f.close()
    print "Number of '" + string + "' in file", contents.count(string)
    #we are searching for the "test" string in file "result_file.txt"
I can't comment on questions yet, but I think I could answer more specifically with some more information. Which line do you want only one of?
For example, you can do something like:
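search_str = 'find me'
count = 0
last_line = None  # stays None if the string never occurs
# file is assumed to be an already opened file object
for line in file:
    if search_str in line:
        last_line = line
        count += 1
print '{0} occurrences of this line:\n{1}'.format(count, last_line)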
I notice that in file_scan you are iterating twice through file. You can surely condense it into one iteration :).
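To illustrate the single-iteration idea, here is a rough Python 3 sketch that handles several keywords in one pass and remembers the last matching line and its line number for each. The function name scan_log and the shape of the returned tuples are my own choices, not anything from the question. Note that it counts matching lines, not individual occurrences within a line.

```python
def scan_log(lines, keywords):
    """Return {keyword: (count, last_line_number, last_line)} in one pass.

    count is the number of lines containing the keyword; the other two
    values are None when the keyword never appears.
    """
    results = {key: (0, None, None) for key in keywords}
    for number, line in enumerate(lines, start=1):
        for key in keywords:
            if key in line:
                count = results[key][0] + 1
                # Overwrite on every match so the last one wins.
                results[key] = (count, number, line.strip())
    return results
```

An open file object can be passed as lines, for example inside with open(filename) as f, together with keywords such as 'No lease, failing' and 'HTTP query failed' from the question.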
I need to replace a single text phrase across my entire website domain with another one. What is the best way to do a mass search/ replace?
If you can do it file-by-file, then you could use a simple Perl one-liner:
perl -pi -e 's/search/replace/gi' filename.txt
If you are on a UNIX system with a shell, you can combine this with find to search and replace text in files in subdirectories:
find /dir/to/files -iname 'foo.*' -exec perl ... {} \;
where ... is the above perl command.
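If Perl and find are not available, the same walk-and-replace idea can be sketched in Python 3. Treat this as an illustration only: the function name replace_in_tree and the default suffixes are assumptions, the files are assumed to be UTF-8 text, and unlike the /gi flags in the Perl command the match here is a literal, case-sensitive one. Back up the site before running anything like it.

```python
import os


def replace_in_tree(top, search, replace, suffixes=('.html', '.txt')):
    """Replace search with replace in matching files under top.

    Returns the sorted list of files that were rewritten.
    """
    changed = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(root, name)
            # newline='' keeps the original line endings untouched.
            with open(path, encoding='utf-8', newline='') as f:
                text = f.read()
            if search in text:
                with open(path, 'w', encoding='utf-8', newline='') as f:
                    f.write(text.replace(search, replace))
                changed.append(path)
    return sorted(changed)
```

Only files that actually contain the phrase are rewritten, so modification times of the other files stay as they were.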
I use KEDIT to do this every day. I have a script I wrote called TooMany.kex which allows me to edit a list of files across computers and networks. The only other way I know how to do it is using a shell script - you already have that.
* TooMany.kex - perform commands on files using a directory
* (this is designed to issue edit commands to too many files for the ring buffer)
* Multiple commands are separated by a semicolon ";"
*
* eg. TooMany c\one\two\**;c\two\three\**;file
* commands:
* 1. kedit "dirfileid.1()" (nodefext noprof'
* 2. c\one\two\**
* 3. c\two\three\**
* 4. file
*
parse arg CmdStr
if ftype.1() \= 'DIR' then do
    'alert /The File Type Must Be "DIR"./ title /TooMany/'
    exit
end
'nomsg less /</|/>/'
if rc = 0 then do
    if nbscope.1() = 0 then do
        'alert /No files found/ title /TooMany/'
        exit
    end
end
'top'
* give user something to look at while macro is running
'extract /nbfile/fileid'
* the number of files can change depending on the setting SCOPE/DISPLAY or ALL
size = nbscope.1()
if scope.1() = "ALL" then size = size.1()
nfiles = size
'msg Processing' size 'files.'
'refresh'
* save the directory file name
dir_fileid = fileid.1()
do nfiles - 1
* if less than 3K ISA free, leave early so user has some to work with
    if memory.3() < 3 then do
        'alert $TooMany aborting. ISA nearly full. You Forgot To File.$ title $TooMany$'
        'qquit'
        exit
    end
    'down'
    'refresh'
    'kedit "'dirfileid.1()'" (nodefext noprof'
    if rc \= 0 then do
        'alert $TooMany aborting. KEDIT rc='rc'$ title $TooMany$'
        exit
    end
    Call ExecuteCommands
* edit file # 1 in the ring
    'kedit "'fileid.1'" (noprof'
*'refresh'
end
* quit out of dir.dir and edit the last file
'next'
fid = dirfileid.1()
** 'qquit'
'kedit "'fid'" (nodefext noprof'
Call ExecuteCommands
'msg TooMany:' nfiles 'file(s) processed'
exit

ExecuteCommands:
* special skip files - don't edit the directory file
    if dir_fileid = fileid.1() then return
* Execute commands separated by ";"
    istart = 1
    do forever
        if pos(";",CmdStr,istart) = 0 then do
            command substr(CmdStr,istart,length(CmdStr))
            return
        end
        else do
            iend = pos(";",CmdStr,istart)
            command substr(CmdStr,istart,iend - istart)
            istart = iend + 1
            if istart > length(CmdStr) then return
        end
    end
return