I have more than 100 samples. Each sample consists of two files: a *.column file containing the column names and a *.datatypes file containing the datatype descriptions.
What I need is a separate output file for each sample.
Each output file should contain the column names along with their datatypes.
Currently I am getting the data from all 100 files merged and saved into one file.
Eg: file_1:
column names datatypes
id int
name string
Eg: file_2:
column names datatypes
id int
name string
I got the output for all files' column names and datatypes merged into one single file.
What I need is for each sample's files to be merged separately.
for name in os.listdir("C:\Python27"):
    if name.endswith(".column"):
        for file in name:
            file = os.path.join(name)
            joined = file+ ".joined"
            with open(joined,"w") as fout:
                filenames = glob.glob('*.column')
                for filename in filenames:
                    with open(filename) as f1:
                        file_names = glob.glob('*.datatypes')
                        for filename in file_names:
                            with open(filename) as f2:
                                for line1,line2 in zip(f1,f2):
                                    x = ("{0} {1} \n".format(line1.rstrip(),line2.rstrip()))
                                    y = x.strip()
                                    fout.write(y.strip() + ',\n')
Please assist me.
Hopefully the code below works. It assumes that each *.column file has a corresponding *.datatypes file with the same base name; if not, the code will raise a "file not found" error.
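import os

folder = r"C:\Python27"  # directory holding the *.column and *.datatypes files
for colname in os.listdir(folder):
    if colname.endswith(".column"):
        print('Processing:' + colname)
        file = os.path.splitext(colname)[0]
        joined = file + ".joined"
        # join with folder so the script works from any current directory
        with open(os.path.join(folder, joined), "w") as fout:
            with open(os.path.join(folder, colname)) as f1:
                datname = file + '.datatypes'
                with open(os.path.join(folder, datname)) as f2:
                    # pair line N of the .column file with line N of the .datatypes file
                    for line1, line2 in zip(f1, f2):
                        x = ("{0} {1}".format(line1.rstrip(),line2.rstrip()))
                        y = x.strip()
                        fout.write(y.strip() + ',\n')
        print('Finished writing to :' + joined)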
I test-ran this with a few sample input files, as below:
file1.column
date_sev
pos
file1.datatypes
timestamp
date
file2.column
id
name
file2.datatypes
int
string
file3.column
id
name
file3.datatypes
int
string
When I run the script, I get the following output in the console:
Processing:file1.column
Finished writing to :file1.joined
Processing:file2.column
Finished writing to :file2.joined
Processing:file3.column
Finished writing to :file3.joined
And the output files I get are:
file1.joined
date_sev timestamp,
pos date,
file2.joined
id int,
name string,
file3.joined
id int,
name string,
Also, if you want to improve the format of the output files (comma-separated, with no trailing comma), I would make the following changes:
From
x = ("{0} {1}".format(line1.rstrip(),line2.rstrip()))
To
x = ("{0},{1}".format(line1.rstrip(),line2.rstrip()))
From
fout.write(y.strip() + ',\n')
To
fout.write(y.strip() + '\n')
In my solution above I left the output formatting as it was in your initial version.
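If you would rather not depend on the current working directory, or crash on a missing .datatypes file, here is a hedged variation: it uses glob to find the *.column files, pairs each one with the .datatypes file of the same base name, and skips (rather than fails on) unmatched files. pair_lines and join_samples are hypothetical helper names, and the optional separator follows the "better output" suggestion above.

```python
import glob
import os


def pair_lines(column_lines, datatype_lines, sep=" "):
    """Pair line N of a .column file with line N of its .datatypes file."""
    return [
        col.rstrip() + sep + dtype.rstrip()
        for col, dtype in zip(column_lines, datatype_lines)
    ]


def join_samples(folder, sep=" "):
    """Write one <base>.joined file per <base>.column / <base>.datatypes pair.

    Returns the paths written. A .column file with no matching .datatypes
    file is reported and skipped instead of raising an error.
    """
    written = []
    for col_path in sorted(glob.glob(os.path.join(folder, "*.column"))):
        base = os.path.splitext(col_path)[0]
        dtype_path = base + ".datatypes"
        if not os.path.exists(dtype_path):
            print("Skipping (no .datatypes file): " + col_path)
            continue
        with open(col_path) as f1, open(dtype_path) as f2:
            rows = pair_lines(f1, f2, sep)
        joined_path = base + ".joined"
        with open(joined_path, "w") as fout:
            for row in rows:
                fout.write(row + "\n")
        written.append(joined_path)
    return written
```

For example, join_samples(r"C:\Python27", sep=",") would write lines such as "id,int" into file2.joined next to the input files.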
Related
I need to parse through a directory of multiple Excel files to find matches to a set of 500+ strings (which I currently have in a set).
If there is a match to one of the strings in an Excel file, I need to pull that row out into a new file.
Please let me know if you can assist! Thank you in advance for the help!
The directory is called: All_Data
The set is from a list of strings in a file (MRN_file_path)
My code:
MRN = set()
with open(MRN_file_path) as MRN_file:
    for line in MRN_file:
        if line.strip():
            MRN.add(line.strip())
for root, dires, files in os.walk('path/All_Data'):
    for name in files:
        if name.endswith('.xlsx'):
            filepath = os.path.join(root, name)
            with open(search_results_path, "w") as search_results:
                if MRN in filepath:
                    search_results.write(line)
Your code doesn't actually read the .xlsx files. As far as I know, there isn't anything in the Python standard library for reading .xlsx files. However, you can check out openpyxl and see if that helps. Here's a solution which reads all the .xlsx files in the specified directory and writes every row that contains one of the MRN strings into a single tab-delimited .txt file.
import os
from openpyxl import load_workbook

MRN = set()
with open(MRN_file_path) as MRN_file:
    for line in MRN_file:
        if line.strip():
            MRN.add(line.strip())

with open(search_results_path, "w") as outfile:
    for root, dires, files in os.walk(path):
        for name in files:
            if name.endswith('.xlsx'):
                filepath = os.path.join(root, name)
                # load in the .xlsx workbook
                wb = load_workbook(filename=filepath, read_only=True)
                # assuming we select the worksheet which is active
                ws = wb.active
                # iterate through each row in the worksheet
                for row in ws.rows:
                    # iterate over each cell
                    for cell in row:
                        if cell.value in MRN:
                            # collect all the cell values in the matching row;
                            # None becomes "" and everything else is converted to str
                            # so the values can be joined into a tab-delimited line
                            arr = [str(c.value) if c.value is not None else "" for c in row]
                            outfile.write("\t".join(arr) + "\n")
                            # write each matching row only once
                            break
                # read-only workbooks keep the file open until closed
                wb.close()
If tab-delimited output isn't what you're looking for, you can adjust the outfile.write(...) line to whatever fits your needs.
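One thing that may bite here: if the MRNs are stored as numbers in the spreadsheet, openpyxl returns them as int or float (for example 12345.0), and they will never equal the strings read from MRN_file_path. Below is a hedged sketch that normalizes cell values before matching. normalize, matching_rows and search_workbooks are hypothetical helper names, and iter_rows(values_only=True) assumes a reasonably recent openpyxl release.

```python
import os


def normalize(value):
    """Convert a cell value to the string form used in the MRN set.

    Assumption: numeric MRNs come back from openpyxl as int or float
    (e.g. 12345.0), so whole-number floats are turned into '12345'.
    """
    if value is None:
        return ""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()


def matching_rows(rows, wanted):
    """Yield each row (as a list of strings) that contains a wanted MRN."""
    for values in rows:
        cells = [normalize(v) for v in values]
        if any(cell in wanted for cell in cells):
            yield cells


def search_workbooks(root_dir, wanted, out_path):
    """Write matching rows from the active sheet of every .xlsx under root_dir."""
    from openpyxl import load_workbook  # imported here so the helpers work without it
    with open(out_path, "w") as out:
        for root, _dirs, files in os.walk(root_dir):
            for name in files:
                if name.endswith(".xlsx"):
                    wb = load_workbook(os.path.join(root, name), read_only=True)
                    rows = wb.active.iter_rows(values_only=True)
                    for cells in matching_rows(rows, wanted):
                        out.write("\t".join(cells) + "\n")
                    wb.close()
```

Usage would be search_workbooks('path/All_Data', MRN, search_results_path), reusing the MRN set built above. Because matching_rows works on plain sequences, it can be tried on a few hand-made rows before running it over the whole directory.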
I have recently produced several thousand shapefile outputs and accompanying .dbf files from an atmospheric model (HYSPLIT) on a Unix system. The converter txt2dbf is used to convert shapefile attribute tables (text files) to .dbf.
Unfortunately, something has gone wrong (probably a separator/field length error) because there are 2 problems with the output .dbf files, as follows:
Some fields of the dbf contain data that should not be there. This data has "spilled over" from neighbouring fields.
An additional field has been added that should not be there (it actually comes from a section of the first record of the text file, "1000 201").
This is an example of the first record in the output dbf (retrieved using the Unix dbview utility):
Trajnum : 1001 2
Yyyymmdd : 0111231 2
Time : 300
Level : 0.
1000 201:
Here's what I expected:
Trajnum : 1000
Yyyymmdd : 20111231
Time : 2300
Level : 0.
Separately, I'm looking at how to prevent this from happening again, but ideally I'd like to be able to repair the existing .dbf files. Unfortunately the text files are removed for each model run, so "fixing" the .dbf files is the only option.
My approaches to the above problems are:
Extract the information from the fields that do exist to a new variable using dbf.add_fields and dbf.write (python package dbf), then delete the old incorrect fields using dbf.delete_fields.
Delete the unwanted additional field.
This is what I've tried:
with dbf.Table(db) as db:
    db.add_fields("TRAJNUMc C(4)") #create new fields
    db.add_fields("YYYYMMDDc C(8)")
    db.add_fields("TIMEc C(4)")
    for record in db: #extract data from fields
        dbf.write(TRAJNUMc=int(str(record.Trajnum)[:4]))
        dbf.write(YYYYMMDDc=int(str(record.Trajnum)[-1:] + str(record.Yyyymmdd)[:7]))
        dbf.write(TIMEc=record.Yyyymmdd[-1:] + record.Time[:])
    db.delete_fields('Trajnum') # delete the incorrect fields
    db.delete_fields('Yyyymmdd')
    db.delete_fields('Time')
    db.delete_fields('1000 201') #delete the unwanted field
    db.pack()
But this produces the following error:
dbf.ver_2.BadDataError: record data is not the correct length (should be 31, not 30)
Given the apparent problem that there has been with the txt2dbf conversion, I'm not surprised to find an error in the record data length. However, does this mean that the file is completely corrupted and that I can't extract the information that I need (frustrating because I can see that it exists)?
EDIT:
Rather than attempting to edit the 'bad' .dbf files, a better approach seems to be to (1) extract the required data from the bad files to a text file and then (2) write it to a new dbf. (See Ethan Furman's comments/answer below.)
EDIT:
An example of a faulty .dbf file that I need to fix/recover data from can be found here:
https://www.dropbox.com/s/9y92f7m88a8g5y4/p0001120110.dbf?dl=0
An example .txt file from which the faulty dbf files were created can be found here:
https://www.dropbox.com/s/d0f2c0zehsyy8ab/attTEST.txt?dl=0
To fix the data and recreate the original text file, this snippet should help:
import dbf

table = dbf.Table('/path/to/scramble/table.dbf')
with table:
    fixed_data = []
    for record in table:
        # convert to str/bytes while skipping the delete flag
        # (_data is an internal attribute; .tostring() is the Python 2 array API)
        data = record._data[1:].tostring()
        trajnum = data[:4]
        ymd = data[4:12]
        time = data[12:16]
        level = data[16:].strip()
        # append (not extend) so each record stays one group of four fields
        fixed_data.append([trajnum, ymd, time, level])

with open('repaired_data.txt', 'w') as new_file:
    for line in fixed_data:
        new_file.write(','.join(line) + '\n')
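If the layout turns out to differ between files, or you want proper CSV quoting, it can help to keep the fixed offsets in one place. This is a hedged Python 3 sketch: split_record and write_repaired are hypothetical helpers, the widths (4, 8, 4, then the rest) are taken from the expected record in the question, and raw_records is assumed to be an iterable of already-decoded record strings, such as the data values from the loop above with the delete flag removed.

```python
import csv

# Assumed layout, taken from the "expected" record in the question:
# trajnum (4 chars), yyyymmdd (8), time (4), and the level in whatever is left.
LAYOUT = [("trajnum", 4), ("yyyymmdd", 8), ("time", 4), ("level", None)]


def split_record(data, layout=LAYOUT):
    """Slice one raw record string into stripped fields by fixed widths.

    A width of None means "everything that is left".
    """
    fields = []
    pos = 0
    for _name, width in layout:
        end = None if width is None else pos + width
        fields.append(data[pos:end].strip())
        if end is None:
            break
        pos = end
    return fields


def write_repaired(raw_records, out_path, layout=LAYOUT):
    """Write one comma-separated row per raw record (no header row)."""
    # newline="" is what the Python 3 csv module expects for output files
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for data in raw_records:
            writer.writerow(split_record(data, layout))
```

The output has no header row, so it can be fed straight into the text-to-dbf loop in the next part of this answer.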
Assuming all your data files look like your sample (the big IF being that the data has no embedded commas), this rough code should help translate your text files into dbfs:
import dbf

# read the lines, skipping blank ones (e.g. the trailing newline at end of file)
with open('some_text_file.txt') as src:
    raw_data = [line for line in src.read().splitlines() if line.strip()]

final_table = dbf.Table(
    'dest_table.dbf',
    'trajnum C(4); yyyymmdd C(8); time C(4); level C(9)',
    )
with final_table:
    for line in raw_data:
        fields = line.split(',')
        final_table.append(tuple(fields))
# table has been populated and closed
Of course, you could get fancier and use actual date and numeric fields if you want to:
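# dbf field spec becomes (fields are separated by ';', and numeric fields need
# an explicit N(size,decimals); the sizes below are guesses, adjust to your data)
'trajnum N(4,0); yyyymmdd D; time C(4); level N(9,3)'
# appending data loop becomes
for line in raw_data:
    trajnum, ymd, time, level = line.split(',')
    trajnum = int(trajnum)
    ymd = dbf.Date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))
    level = float(level)  # levels look like "0.", which int() would reject
    final_table.append((trajnum, ymd, time, level))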
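With thousands of files, a few malformed lines are likely, and one bad line would stop the loop above. A hedged sketch that parses and validates each line before anything is appended: parse_line and parse_lines are hypothetical names, and it uses the standard datetime.date rather than dbf.Date, on the assumption (worth checking against your dbf version) that D fields accept datetime.date values.

```python
import datetime


def parse_line(line):
    """Turn 'trajnum,yyyymmdd,time,level' into typed values.

    Assumptions: trajnum is an integer, yyyymmdd is a calendar date,
    time stays a 4-character string, and level may be written like "0."
    (so it is parsed with float(), not int()).
    """
    trajnum, ymd, time, level = [part.strip() for part in line.split(",")]
    date = datetime.datetime.strptime(ymd, "%Y%m%d").date()
    return int(trajnum), date, time, float(level)


def parse_lines(lines):
    """Parse every non-blank line; collect (line_number, line) for failures."""
    good, bad = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            good.append(parse_line(line))
        except ValueError:
            # wrong field count, bad date, or non-numeric trajnum/level
            bad.append((number, line))
    return good, bad
```

You would then append each tuple in good to the table and inspect bad by hand.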
I have files like these:
1.stream0106.wav
2.stream0205.wav
3.steram0304.wav
I need to rename "01" in a file name to "_C" and "06" to "_LFE1", and so on. The new names are in a CSV file like the one below.
Can you please guide me on this?
I'm not sure whether you want the "01" to be replaced or appended to; the CSV titles make it confusing.
I would first make the CSV file start in column A and row 1 to make reading it in easier for you.
If you are appending names, this should work:
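import os
import csv

# Assuming files are just in current directory
wav_files = [f for f in os.listdir('.') if f.endswith('.wav')]
with open('your_file.csv') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the "Old Name,New Name" header row
    mappings = [(row[0].strip(), row[1].strip()) for row in reader if row]
for f in wav_files:
    stem = f[:-4]  # file name without the ".wav" extension
    for digit, name in mappings:
        if stem.endswith(digit):
            # replace only the trailing digits, not every occurrence in the name
            new_name = stem[:-len(digit)] + name + '.wav'
            os.rename(f, new_name)
            break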
EDIT
Old Name,New Name
00,_0
01,_C
02,_L
03,_R
04,_Ls
05,_Rs
06,_LFE1
07,_Cs
This layout can be achieved by simply entering the mappings in Excel starting at column A, row 1.
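Renaming files is hard to undo, so it may be worth computing the new names first and only then touching the disk. A hedged sketch of that dry-run idea: planned_renames and apply_renames are hypothetical names, mappings is assumed to be the list of (old digits, new suffix) pairs read from the CSV above, and, like the answer's code, only the digits right before ".wav" are replaced.

```python
import os


def planned_renames(filenames, mappings):
    """Return (old_name, new_name) pairs without renaming anything (a dry run).

    mappings is a list of (digits, suffix) pairs such as ("06", "_LFE1").
    Only digits at the very end of the name (just before ".wav") are replaced.
    """
    plan = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".wav":
            continue
        for digits, suffix in mappings:
            if digits and stem.endswith(digits):
                plan.append((name, stem[:-len(digits)] + suffix + ext))
                break
    return plan


def apply_renames(plan, folder="."):
    """Carry out a plan produced by planned_renames inside folder."""
    for old, new in plan:
        os.rename(os.path.join(folder, old), os.path.join(folder, new))
```

Print the plan, check it, and only then call apply_renames(plan).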
I have 251 CSV files in a folder. They are named "returned UDTs 1-12-13.csv", "returned UDTs 1-13-13.csv", and so on. The dates are not consecutive, however: holidays and weekends may be missing, so the next file may be "returned UDTs 1-17-13.csv". Each file has one column of data. I need to extract each column and append it into one column in a new output CSV file, and I want to write a Python script to do so. In a dummy folder with 3 dummy CSV files (csv1.csv, csv2.csv, and csv3.csv), I created the following script, which works:
import csv, os, sys
out_csv = r"C:\OutCSV\csvtest.csv"
path = r"C:\CSV_test"
fout=open(out_csv,"a")
# first file:
for line in open(path + "\csv1.csv"):
    fout.write(line)
# now the rest:
for num in range(2,4):
    f = open(path + "\csv"+str(num)+".csv")
    f.next() # skip the header
    for line in f:
        fout.write(line)
    f.close() # dont know if needed
fout.close()
The issue is the dates in the file names: I can't simply loop over consecutive numbers as in the dummy script. How should I deal with them? Any help would be appreciated.
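One way around the gaps is not to generate the names at all: let glob find whatever files exist and sort them by the date parsed from each name. Plain alphabetical sorting would put "1-17-13" before "1-2-13", which is why the sort key parses the date. A hedged sketch: file_date and combine are hypothetical names, the regular expression assumes names end in "month-day-two-digit-year.csv", and every year is assumed to be in the 2000s.

```python
import datetime
import glob
import os
import re

# Assumption: names end in "<month>-<day>-<two-digit year>.csv",
# e.g. "returned UDTs 1-12-13.csv", and every year is in the 2000s.
DATE_RE = re.compile(r"(\d{1,2})-(\d{1,2})-(\d{2})\.csv$")


def file_date(path):
    """Return the date encoded in a file name such as 'returned UDTs 1-17-13.csv'."""
    match = DATE_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError("no date found in file name: " + path)
    month, day, year = (int(group) for group in match.groups())
    return datetime.date(2000 + year, month, day)


def combine(in_dir, out_csv, has_header=True):
    """Append the single column of every dated CSV into out_csv, oldest first.

    Gaps between dates (weekends, holidays) do not matter because the files
    are found with glob and ordered by the parsed date, not by counting.
    """
    pattern = os.path.join(in_dir, "returned UDTs *.csv")
    paths = sorted(glob.glob(pattern), key=file_date)
    with open(out_csv, "w") as fout:
        for index, path in enumerate(paths):
            with open(path) as fin:
                lines = fin.readlines()
            if has_header and index > 0:
                lines = lines[1:]  # keep the header only from the first file
            for line in lines:
                # make sure the last line of each file ends with a newline
                fout.write(line if line.endswith("\n") else line + "\n")
    return paths
```

For example, combine(r"C:\CSV_test", r"C:\OutCSV\csvtest.csv") returns the input paths in the order they were appended, which makes it easy to check that no day was skipped or duplicated. Keep the output file outside the input folder so it is not picked up by the pattern on a second run.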
I have the following question in Python 2.7:
I have 20 different txt files, each with exactly one column of numbers. As output, I would like one file with all those columns side by side. How can I concatenate one-column files in Python? I was thinking about using the fileinput module, but I fear I would have to open all my different txt files at once?
My idea:
filenames = ['input1.txt','input2.txt',...,'input20.txt']
import fileinput
with open('/path/output.txt', 'w') as outfile:
    for line in fileinput.input(filenames):
        outfile.write(line)
Any suggestions on that ?
Thanks for any help !
A very simple (naive?) solution is:
filenames = ['a.txt', 'b.txt', 'c.txt', 'd.txt']
columns = []
for filename in filenames:
    lines = []
    for line in open(filename):
        lines.append(line.strip('\n'))
    columns.append(lines)
rows = zip(*columns)
with open('output.txt', 'w') as outfile:
    for row in rows:
        outfile.write("\t".join(row))
        outfile.write("\n")
But on *nix (including the OS X terminal and Cygwin), it's easier to run
$ paste a.txt b.txt c.txt d.txt
from the command line.
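Two caveats with the zip-based version: zip stops at the shortest file, so extra lines in longer files are silently dropped, whereas paste keeps going and leaves the missing fields empty. Also, opening all 20 files at once, which the question worries about, is not a problem in itself. Here is a hedged Python 3 sketch of paste-like behaviour using itertools.zip_longest and contextlib.ExitStack (on Python 2.7 the equivalent of zip_longest is itertools.izip_longest, and ExitStack does not exist); paste_rows and paste_files are hypothetical names.

```python
import contextlib
import itertools


def paste_rows(columns, sep="\t", fill=""):
    """Yield one joined row per line; shorter columns are padded with fill."""
    for row in itertools.zip_longest(*columns, fillvalue=fill):
        yield sep.join(cell.rstrip("\n") for cell in row)


def paste_files(filenames, out_path, sep="\t"):
    """Like `paste file1 file2 ...`: reads all files in step, one line at a time."""
    with contextlib.ExitStack() as stack:
        # every input file is open at once, but only one line of each is in memory
        files = [stack.enter_context(open(name)) for name in filenames]
        with open(out_path, "w") as out:
            for line in paste_rows(files, sep):
                out.write(line + "\n")
```

paste_files(['a.txt', 'b.txt', 'c.txt', 'd.txt'], 'output.txt') should then produce the same tab-separated layout as the naive version, without dropping lines from longer files.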
My suggestion is a slightly more functional approach: use a list comprehension to zip the lines of the file being read with the accumulated columns, then join them back into strings, one column (file) at a time:
filenames = ['input1.txt','input2.txt','input20.txt']
outputfile = 'output.txt'
#maybe you need to separate each column:
separator = " "
separator_list = []
output_list = []
for f in filenames:
    with open(f,'r') as inputfile:
        # strip newlines so they do not end up in the middle of a row
        input_list = [line.rstrip('\n') for line in inputfile]
        if len(output_list) == 0:
            output_list = input_list
            separator_list = [separator for x in range(0, len(output_list))]
        else:
            output_list = [''.join(y) for y in zip(output_list, separator_list, input_list)]
with open(outputfile,'w') as output:
    output.writelines(line + '\n' for line in output_list)
It keeps in memory only the accumulated result (output_list) and one input file at a time (the one being read, which is also the only file open for reading). It may be a little slower, and of course it is not fail-proof: for example, zip silently stops at the shortest file.