ARFF to CSV multiple files conversions - weka

Has anyone successfully converted many ARFF files to CSV files from the Windows command line?
I tried weka.core.converters.CSVSaver, but it works for a single file only.
Can it be done for multiple files?

I found a way to do this conversion with R, as shown in the following script:
#### Set the working directory to the folder that contains all ARFF files
library(foreign)
temp = list.files(pattern = "\\.arff$")
for (i in seq_along(temp))
{
  mydata = read.arff(temp[i])
  # paste0 avoids the space that paste() would insert before ".csv"
  x = paste0(temp[i], ".csv")
  write.csv(mydata, x, row.names = FALSE)
}

On a Windows command line, type powershell.
Change to the directory where your *.arff files reside.
Enter this command:
dir *.arff | Split-Path -Leaf | ForEach-Object {Invoke-Expression "java -cp 'C:\Program Files\Weka-3-6\weka.jar;.' weka.core.converters.CSVSaver -i $_ -o $_.csv"}
This assumes that your file names contain no blanks, that all ARFF files reside in a single directory, and that you want to convert them all. It creates a new CSV file from each ARFF file: myfile.arff is converted to myfile.arff.csv.

I wrote a simple Python script, arff2csv.py, which is on GitHub.
Here is the code:
"""trans multi-label *.arff file to *.csv file."""
import re
def trans_arff2csv(file_in, file_out):
    """trans *.arff file to *.csv file."""
    columns = []
    data = []
    with open(file_in, 'r') as f:
        data_flag = 0
        for line in f:
            # ARFF header lines start with '@' (@attribute, @data)
            if line[:2].lower() == '@a':
                # find indices
                indices = [i for i, x in enumerate(line) if x == ' ']
                columns.append(re.sub(r'^[\'\"]|[\'\"]$|\\+', '', line[indices[0] + 1:indices[-1]]))
            elif line[:2].lower() == '@d':
                data_flag = 1
            elif data_flag == 1:
                data.append(line)
    content = ','.join(columns) + '\n' + ''.join(data)
    # save to file
    with open(file_out, 'w') as f:
        f.write(content)
if __name__ == '__main__':
    # setting arff file path
    file_attr_in = r'D:\Downloads\birds\birds-test.arff'
    # setting output csv file path
    file_csv_out = r"D:\Downloads\birds\birds-test.csv"
    # trans
    trans_arff2csv(file_attr_in, file_csv_out)
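The original question was about many files at once, so a single-file converter still needs a loop over the directory. The following is a minimal Python 3 sketch of that idea, not the script author's code: `arff_to_csv` and `convert_all` are hypothetical names, and it assumes dense ARFF files (no sparse `{...}` rows) whose attribute names contain no spaces.

```python
from pathlib import Path


def arff_to_csv(arff_path, csv_path):
    """Write the attribute names and data rows of a dense ARFF file as CSV."""
    columns, rows, in_data = [], [], False
    with open(arff_path, "r") as src:
        for line in src:
            stripped = line.strip()
            if not stripped or stripped.startswith("%"):
                continue  # skip blank lines and ARFF comments
            lowered = stripped.lower()
            if in_data:
                rows.append(stripped)
            elif lowered.startswith("@attribute"):
                # The second token is the attribute name (assumes no spaces in names).
                columns.append(stripped.split()[1].strip("'\""))
            elif lowered.startswith("@data"):
                in_data = True
    with open(csv_path, "w") as dst:
        dst.write(",".join(columns) + "\n")
        dst.write("\n".join(rows) + ("\n" if rows else ""))


def convert_all(folder):
    """Convert every *.arff file in folder and return the CSV paths written."""
    written = []
    for arff_path in sorted(Path(folder).glob("*.arff")):
        csv_path = arff_path.with_suffix(".csv")
        arff_to_csv(arff_path, csv_path)
        written.append(csv_path)
    return written
```

Calling `convert_all` on a folder writes one CSV next to each ARFF file, replacing the `.arff` extension with `.csv` rather than appending to it.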

Related

How do I write Python code to run file A and use its output as input to file B?

A and B are independent files that each run perfectly. They are coded so that their results are stored in corresponding text files. I need to write new code that runs file A, takes its output, passes it as the input to file B, and runs B to get the desired output.
# File1
def do_something(in_file):
    with open(in_file, "r") as t:
        return t.name
# File2
def do_something_else(in_file):
    with open(in_file, 'r') as q:
        for a in q.readlines():
            print(a)
# File to Run Other Files
import file1, file2
outfile = file1.do_something("hello.txt")
file2.do_something_else(outfile)
The value returned from file1 is the file's name as it was passed to open() (a full path only if you passed one in), so in your code you can either return the path from the file object, as I have, or pass along the file name that you used to create the output file.
Bear in mind that all these files should be in the same directory.
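The same hand-off can be sketched in one self-contained Python 3 file. This is an illustration under assumptions, not the asker's code: `step_a`, `step_b`, and `run_pipeline` are hypothetical stand-ins for file A, file B, and the new driver script, and the file name `a_output.txt` is made up.

```python
import os
import tempfile


def step_a(out_path):
    """Stand-in for file A: write results to out_path and return that path."""
    with open(out_path, "w") as out:
        out.write("result from A\n")
    return out_path


def step_b(in_path):
    """Stand-in for file B: read A's output and return the processed lines."""
    with open(in_path, "r") as src:
        return [line.strip().upper() for line in src]


def run_pipeline(work_dir):
    """Run A, then hand the file it produced to B."""
    a_output = step_a(os.path.join(work_dir, "a_output.txt"))
    return step_b(a_output)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(tmp))
```

Returning the path from A, rather than relying on a hard-coded name in both files, keeps the two steps independent of where the intermediate file lives.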

Zipping a file in Python and moving it

In my script, when I zip up a folder such as C:/Users/User/Desktop/Folder, it shows up in the zip file as ZipFile.zip/C:/Users/user/Desktop/Folder instead of just ZipFile.zip/Folder, and I can't figure out how to fix it. [Zipping code is lines 21-26]
I'm also trying to move the created zip file to the specified backup device [line 27].
My code is:
import os
import sys
import shutil
import zipfile
import traceback
print ('Welcome to USB Backup Utility')
print ('Created by: TheCryptek')
print ('\nWhat directory would you like to back up?')
print ('Example: C:/users/user/Desktop/Folder')
backUp = raw_input('> ') # Files the user specified to back up
print ('\nWhere would you like to back these files up at?')
print ('Example USB Letter: E:/')
backDevice = raw_input('> ') # Device the user specified to save the back up on.
print ('\nName of the zip file you prefer?')
print ('Example: Backup.zip')
backZip = raw_input('> ') # The name of the zip file specified by the user
print ('\nBackup started...')
if not os.path.exists(backDevice + '/BackUp'): # If the BackUp folder doesn't exist on the device then
    os.mkdir(backDevice + 'BackUp') # Make the backup folder on usb device
backZip = zipfile.ZipFile(backZip, 'w') # Not sure what to say for lines 21 - 26
for dirname, subdirs, files in os.walk(backUp):
    backZip.write(dirname)
    for filename in files:
        backZip.write(os.path.join(dirname, filename))
backZip.close()
shutil.move(backZip, backDevice + '/BackUp') # Move the zip files created in working directory to the specified back up device -[ Something is wrong with this can't figure out what ]-
print('Backup finished.')
shutil.move() needs proper source and destination paths.
In your program, the name backZip is first the file name string and is then rebound to the ZipFile object, so shutil.move() receives the object when it should receive the path of the file. Using a different name for the ZipFile object fixes this:
import os
import sys
import shutil
import zipfile
import traceback
print ('Welcome to USB Backup Utility')
print ('Created by: TheCryptek')
print ('\nWhat directory would you like to back up?')
print ('Example: C:/users/user/Desktop/Folder')
backUp = raw_input('> ') # Files the user specified to back up
print ('\nWhere would you like to back these files up at?')
print ('Example USB Letter: E:/')
backDevice = raw_input('> ') # Device the user specified to save the back up on.
print ('\nName of the zip file you prefer?')
print ('Example: Backup.zip')
backZip = raw_input('> ') # The name of the zip file specified by the user
print ('\nBackup started...')
if not os.path.exists(backDevice + '/BackUp'): # If the BackUp folder doesn't exist on the device then
    os.mkdir(backDevice + 'BackUp') # Make the backup folder on usb device
bkZip = zipfile.ZipFile(backZip, 'w') # Not sure what to say for lines 21 - 26
for dirname, subdirs, files in os.walk(backUp):
    bkZip.write(dirname)
    for filename in files:
        bkZip.write(os.path.join(dirname, filename))
bkZip.close()
#print backZip,backDevice
dest = backDevice + '/BackUp'
#print dest
shutil.move(backZip, dest) # Move the zip files created in working directory to the specified back up device -[ Something is wrong with this can't figure out what ]-
print('Backup finished.')
You probably have to use an absolute path for it.
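The first half of the question (the archive containing the whole C:/Users/... path) comes from ZipFile.write() storing each file under the path it was given unless an arcname is supplied. Here is a hedged Python 3 sketch of one way to handle both parts; `zip_folder` and `backup` are hypothetical helper names, and the sketch assumes the zip file itself is not created inside the folder being archived.

```python
import os
import shutil
import zipfile


def zip_folder(folder, zip_path):
    """Zip folder so entries start at the folder's own name, not the drive root."""
    folder = os.path.abspath(folder)
    parent = os.path.dirname(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirname, _subdirs, files in os.walk(folder):
            for filename in files:
                full_path = os.path.join(dirname, filename)
                # arcname is the path stored inside the archive.
                archive.write(full_path, arcname=os.path.relpath(full_path, parent))
    return zip_path


def backup(folder, zip_name, dest_dir):
    """Create the archive, then move it into dest_dir and return its new path."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = zip_folder(folder, zip_name)
    return shutil.move(zip_path, os.path.join(dest_dir, os.path.basename(zip_name)))
```

Because only files are written, empty directories are not stored in the archive; the path passed to shutil.move() is always a string, never the ZipFile object.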

Python finds a string in multiple files recursively and returns the file path

I'm learning Python and would like to search for a keyword in multiple files recursively.
I have an example function which should find the *.doc extension in a directory.
Then, the function should open each file with that file extension and read it.
If a keyword is found while reading the file, the function should identify the file path and print it.
Otherwise, if the keyword is not found, Python should continue with the next file.
To do that, I have defined a function which takes two arguments:
def find_word(extension, word):
    # define the path for os.walk
    for dname, dirs, files in os.walk('/rootFolder'):
        #search for file name in files:
        for fname in files:
            #define the path of each file
            fpath = os.path.join(dname, fname)
            #open each file and read it
            with open(fpath) as f:
                data=f.read()
                # if data contains the word
                if word in data:
                    #print the file path of that file
                    print (fpath)
                else:
                    continue
Could you give me a hand to fix this code?
Thanks,
import os
def find_word(extension, word):
    for root, dirs, files in os.walk('/DOC'):
        # filter files for given extension:
        files = [fi for fi in files if fi.endswith(".{ext}".format(ext=extension))]
        for filename in files:
            path = os.path.join(root, filename)
            # open each file and read it
            with open(path) as f:
                # split() will create a list of words and set() will
                # reduce it to the unique words
                words = set(f.read().split())
                if word in words:
                    print(path)
.doc files are binary Word documents, not plain text, so they will not open properly with a simple text editor or with Python's open() method. In this case you need other tools; for the newer .docx format, for example, there are Python modules such as python-docx.
Update
For doc files (previous to Word 2007) you can also use other tools such as catdoc or antiword. Try the following.
import subprocess
def doc_to_text(filename):
    return subprocess.Popen(
        'catdoc -w "%s"' % filename,
        shell=True,
        stdout=subprocess.PIPE
    ).stdout.read()
print doc_to_text('fixtures/doc.doc')
If you are trying to read a .doc file in your code, then this won't work. You will have to change the part where you read the file.
Here are some links for reading a .doc file in python.
extracting text from MS word files in python
Reading/Writing MS Word files in Python
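For plain-text files, the recursive search itself can be sketched as follows in Python 3. This is an illustration rather than any poster's final code: the extra `root_dir` parameter is an assumption added so the function is not tied to one hard-coded folder, matching is by substring rather than whole word, and binary formats such as .doc still need one of the converters mentioned above.

```python
import os


def find_word(root_dir, extension, word):
    """Return paths of files under root_dir with the extension that contain word."""
    matches = []
    suffix = "." + extension.lstrip(".")
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            try:
                # errors="ignore" drops bytes that cannot be decoded as text.
                with open(path, "r", errors="ignore") as handle:
                    if word in handle.read():
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it and keep walking
    return matches
```

Returning a list instead of printing makes the result easy to reuse, for example `for path in find_word(".", "txt", "keyword"): print(path)`.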

How do I confirm with python that required files are in a particular folder and are accessible or not?

I have 5 files in a folder App:
App|
|--A.txt
|--B.txt
|--C.txt
|--D.txt
|--E.txt
|--Run.py
|--Other Folders or Files
Now I want to know whether the files (A.txt, B.txt, C.txt, D.txt, E.txt) are present, and if they are, I want to call a function Cleaner and supply the names of these files to it. I have written this code, but nothing is happening. The function is not getting called.
import glob
import csv
import itertools
files = glob.glob("*.txt")
i = 0
def sublist(a, b):
    seq = iter(b)
    try:
        for x in a:
            while next(seq) != x: pass
        else:
            return True
    except StopIteration:
        pass
    return False
required_files = ['Alternate_ADR6_LFB1.txt', 'Company_Code.txt', 'Left_LIFNR.txt', 'LFA1.txt', 'LFB1.TXT', 'LFBK.TXT']
if sublist(required_files,files):
    for files in required_files:
        try:
            f = open(files , 'r')
            f.close()
        except IOError as e:
            print 'Error opening or accessing files'
    i = 1
else:
    print 'Required files are not in correct folder'
if i == 1:
    for files in required_files:
        Cleansing(files)
def Cleansing(filename):
    with open('filename', 'rb') as f_input:
        ...
        ...
        break
    with open('filename', 'rb') as f_input, open('filename_Cleaned.csv', 'wb') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow('something')
Update
I think I am now able to call the function and to check for the valid files, but it's not very Pythonic. I am also not able to open or create a file named after the input file plus _cleaned, i.e. filename_cleaned.csv.
You want to check if a list of files (required_files) are in a folder.
You successfully get the complete list of text files in the folder with files = glob.glob("*.txt")
So the first question is: Checking for sublist in list
As the order is not important, we can use sets:
if set(required_files) <= set(files):
    # do stuff
else:
    # print warning
Next question: how to open the files and create output files with names like "filename_Cleaned.csv".
A very important thing you have to understand: "filename" is not the same thing as filename. The first is a string, it will always be the same thing, it will not be replaced by real filenames. When writing open('filename', 'rb') you're trying to open a file called "filename".
filename however can be a variable name and take different values.
def Cleansing(filename):
    with open(filename, 'rb') as f_input, open(filename+'_Cleaned.csv', 'wb') as f_output:
        #read stuff in f_input
        #write stuff in f_output
        pass
# call the function only after it has been defined
for filename in required_files:
    Cleansing(filename)
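Putting the two points together (set-based presence check, then deriving the output name from the variable), a small Python 3 sketch could look like this. `missing_files`, `readable`, and `cleaned_name` are hypothetical helper names, and dropping the original extension before adding `_Cleaned.csv` is a design choice of this sketch, not something the answer prescribes.

```python
import os


def missing_files(folder, required):
    """Return the required file names that are absent from folder."""
    present = set(os.listdir(folder))
    return sorted(set(required) - present)


def readable(folder, required):
    """Return True if every required file exists and can be opened for reading."""
    return all(os.access(os.path.join(folder, name), os.R_OK) for name in required)


def cleaned_name(filename):
    """Map 'A.txt' to 'A_Cleaned.csv' by swapping the extension."""
    stem, _ext = os.path.splitext(filename)
    return stem + "_Cleaned.csv"
```

Names are compared as exact strings here, so 'LFB1.TXT' and 'LFB1.txt' count as different files; the required list has to match the spelling on disk.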

How do I recursively find a specific subfolder name in a directory using Python / Regular expressions?

I'm new to Python and programming. I'm using a MacBook Air with OS X Yosemite version 10.10.2.
I'm looking for a way to recursively find multiple subfolders with the same name (e.g. "Table") without using a direct path (assuming I don't know which folders they might be in), and to read the files within them using regular expressions or Python. Thank you in advance!
import sys
import os, re, sys
import codecs
import glob
files = glob.glob( '*/Table/.*Records.sql' )
with open( 'AllTables1.2.sql', 'w' ) as result:
    for file_ in files:
        print file_
        if len(file_) == 0: continue
        for line in open( file_, 'r' ):
            line = re.sub(r'[\[\]]', '', line)
            line = re.sub(r'--.*', '', line)
            line = re.sub('NVARCHAR', 'VARCHAR', line)
            line = re.sub(r'IDENTITY', 'AUTO_INCREMENT', line)
            line = re.sub(r'DEFAULT\D* +','', line)
            line = re.sub(r'\W(1, 1\W)', ' ', line)
            result.write( line.decode("utf-8-sig"))
result.close()
You can use os.walk, which ships with Python, for this purpose. As the name suggests, os.walk will 'walk' through your directory tree recursively and, for each directory, return the root path, a list of its subdirectories, and a list of the files found in it.
http://www.tutorialspoint.com/python/os_walk.htm
You will find an example in the link that I gave above.
For your case, you can run os.walk, set up a regex (or a simple comparison) to match folders with a given name, and then pick the files with matching names from the file list.
For instance:
import os
for root, dirs, filelist in os.walk('./'):
    # root is the path of the directory being visited; compare its last component
    if os.path.basename(root) == 'results': # insert logic to find the folder you want
        for file in filelist:
            if file == 'xx': # logic to match file name
                fullpath = os.path.join(root, file) # Get the full path to the file
The above example will find the wanted file names in every folder with that particular name.
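Applied to the question's case (folders named "Table" at unknown depth, files selected by a regular expression), the walk might be sketched as below in Python 3. The function name `find_in_named_folders` and the pattern used in the usage line are assumptions for illustration only.

```python
import os
import re


def find_in_named_folders(top, folder_name, file_pattern):
    """Yield paths of files matching file_pattern directly inside any folder_name under top."""
    regex = re.compile(file_pattern)
    for root, _dirs, files in os.walk(top):
        # Only look at directories whose own name is the one we want.
        if os.path.basename(root) != folder_name:
            continue
        for name in sorted(files):
            if regex.search(name):
                yield os.path.join(root, name)
```

For example, `list(find_in_named_folders(".", "Table", r"Records\.sql$"))` collects the matching files from every "Table" folder below the current directory. Files in deeper subfolders of a "Table" folder are not included, because only the immediate directory name is compared.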