I have a csv file with the following entries:
"Last,First,HW1,HW2,HW3,HW4,Test 1,Test 2"
I want to remove the numbers from the HWs and the Tests. So the result would be:
"Last,First,HW,HW,HW,HW,Test ,Test"
Is there some way to do this?
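Try this (Python 3, since it uses print(..., file=...)):
def main(ifilename, ofilename):
    # Read the input file line by line and write a digit-free copy
    with open(ifilename, 'r') as i, open(ofilename, 'w') as o:
        for line in i:
            # Remove each digit 0-9 from the line
            for digit in range(10):
                line = line.replace(str(digit), '')
            # The line already ends with '\n', so suppress print's own newline
            print(line, file=o, end='')

if __name__ == '__main__':
    from sys import argv
    main(argv[1], argv[2])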
Use the script from the command line like this:
python my_script.py input.csv output.csv
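The loop in this answer strips every digit from every line of the file, including any data rows below the header. If the file also holds scores that must be kept, one option is to touch only the first line. A minimal sketch, assuming the header is the first line of the file; `strip_header_digits` is a hypothetical name:

```python
import re


def strip_header_digits(text):
    """Remove digits from the first (header) line only; leave data rows intact."""
    header, sep, rest = text.partition("\n")
    # \d+ matches a whole run of digits, so "HW12" would also become "HW".
    return re.sub(r"\d+", "", header) + sep + rest


if __name__ == "__main__":
    sample = "Last,First,HW1,HW2,HW3,HW4,Test 1,Test 2\nDoe,Jane,90,85,77,100,88,92\n"
    print(strip_header_digits(sample), end="")
```

A single `re.sub` call replaces every run of digits at once instead of looping over the ten digits. Note that "Test 1" becomes "Test " with a trailing space, exactly as in the expected output.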
I am testing the Python code below to export the content of one .txt file to another, but in the destination file the contents are copied as some different language (maybe Chinese) with improper
# -*- coding: utf-8 -*-
from sys import argv
from os.path import exists
script,from_file,to_file = argv
print "Does the output file exist ? %r" %exists(to_file)
print "If 'YES' hit Enter to proceed or terminate by CTRL+C "
raw_input('?')
infile=open(from_file)
file1=infile.read()
outfile=open(to_file,'w')
file2=outfile.write(file1)
infile.close()
outfile.close()
Try this code to copy one file to another, using the from_file and to_file names read from argv:
with open(from_file) as f:
    with open(to_file, "w") as f1:
        for line in f:
            f1.write(line)
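If the goal is an exact copy, opening both files in binary mode sidesteps any decoding, encoding, or newline translation. A minimal sketch that runs on Python 2.7 and 3; the function name `copy_file_exact` is hypothetical, while `shutil.copyfileobj` is the standard-library helper that streams one file object into another:

```python
import shutil
import sys


def copy_file_exact(src, dst):
    """Copy src to dst byte for byte (binary mode: no decoding, no newline translation)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        copy_file_exact(sys.argv[1], sys.argv[2])
    else:
        print("usage: python copy_exact.py from_file to_file")
```

After a byte-for-byte copy the destination holds exactly the source bytes, so if it still looks garbled, the difference lies in how the file is being displayed rather than in the copy itself.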
I am a new Python user. I learned how to change the value of a parameter in a single file. The script:
#####test.py##########
from sys import argv
script, filename, sigma = argv
# Read the whole file, then close it before reopening it for writing
with open(filename, 'r') as file_data:
    txt = file_data.read()
txt = txt.replace('3.7', sigma)
with open(filename, 'w') as file_data:
    file_data.write(txt)
It is run from the command line on test.txt as
test.py test.txt 2
and, as a result, 3.7 is replaced by 2 in test.txt.
Now, if I want to do the same for all the .txt files in the directory, e.g.
test.py *.txt 2
what modifications would you suggest?
Your suggestions are highly appreciated.
Hafiz.
Bash (or whatever your shell is) expands *.txt (to test0.txt test1.txt ..., i.e. whatever the .txt files in your current directory are called) before passing it to your Python script. Your script will therefore receive many arguments, not just the 2 you expect. Print sys.argv to inspect them.
You could solve that in bash itself with something like
for name in *.txt; do test.py "$name" 2; done
Otherwise you would need to handle sys.argv differently in Python and allow for more than 2 arguments.
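One way to allow more than 2 arguments is to treat the last argument as sigma and every earlier argument as a file name or pattern. Passing each through `glob.glob` also covers shells that do not expand wildcards (Windows `cmd.exe` passes `*.txt` through literally). A minimal Python 3 sketch; `replace_in_files` is a hypothetical helper name, and the `3.7` target is taken from the question:

```python
import glob
import sys


def replace_in_files(patterns, old, new):
    """Replace `old` with `new` in every file matched by `patterns`.

    Each pattern goes through glob.glob, so this works whether or not the
    shell already expanded the wildcard. Returns the list of edited files.
    """
    edited = []
    for pattern in patterns:
        # An already-expanded name such as test0.txt simply matches itself.
        for path in glob.glob(pattern):
            with open(path, "r") as f:
                txt = f.read()
            with open(path, "w") as f:
                f.write(txt.replace(old, new))
            edited.append(path)
    return edited


if __name__ == "__main__":
    # Usage: python test.py *.txt 2   (last argument is sigma, the rest are files)
    if len(sys.argv) < 3:
        print("usage: python test.py FILE_OR_PATTERN [...] SIGMA")
    else:
        *file_args, sigma = sys.argv[1:]
        for name in replace_in_files(file_args, "3.7", sigma):
            print("updated", name)
```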
Importing glob solved that issue, but I have some follow-up questions.
Query 1:
I'm rewriting my code as:
#####test.py##########
from sys import argv
script,filename,sigma = argv
file_data = open(filename,'r')
txt = file_data.read()
txt=txt.replace('3.7'|'3',sigma) #raises TypeError: unsupported operand type(s) for |
file_data = open(filename,'w')
file_data.write(txt)
file_data.close()
I want to replace either 3.7 or 3 with sigma. What would the corrected code be?
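The `|` operator is not defined for strings, so `'3.7'|'3'` fails before `replace` is even called. Alternatives belong inside a regular expression instead. A minimal sketch; `replace_values` and `NUMBER_PATTERN` are hypothetical names, and the lookarounds assume you only want standalone values (so `13` or `3.75` are left alone, and a `3` immediately followed by a sentence-ending period is not matched either):

```python
import re

# Match 3.7 or a bare 3. The longer alternative comes first so "3.7" is not
# turned into "<sigma>.7". The lookarounds reject matches inside larger
# numbers such as 13, 30 or 3.75.
NUMBER_PATTERN = re.compile(r"(?<![\d.])(?:3\.7|3)(?![\d.])")


def replace_values(txt, sigma):
    # A function as the replacement inserts sigma literally, even if it
    # contains backslashes that re.sub would otherwise interpret.
    return NUMBER_PATTERN.sub(lambda m: sigma, txt)


if __name__ == "__main__":
    print(replace_values("a=3.7 b=3 c=13 d=3.75", "2"))  # a=2 b=2 c=13 d=3.75
```

In the script, `txt = replace_values(txt, sigma)` would take the place of the failing `replace` line.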
Query 2:
I'm rewriting it in the following manner:
#####test.py##########
from sys import argv
script,filename,sigma = argv
file_data = open(filename,'r')
txt = file_data.read()
txt=txt.replace('x="2"','x=sigma')
file_data = open(filename,'w')
file_data.write(txt)
file_data.close()
When I run
py test.py test.txt 3
I get x=sigma, but I want x=3.
What modification is needed?
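Inside quotes, `'x=sigma'` is just literal text, so those exact characters are written to the file. The replacement has to be built from the variable's value. A minimal sketch assuming Python 3.6+ for the f-string (on older versions, `'x=' + sigma` does the same); `set_x` is a hypothetical name:

```python
import sys


def set_x(txt, sigma):
    # Build the replacement from the value of sigma, not the word "sigma".
    return txt.replace('x="2"', f"x={sigma}")


if __name__ == "__main__":
    # Usage: py test.py test.txt 3
    if len(sys.argv) == 3:
        filename, sigma = sys.argv[1], sys.argv[2]
        with open(filename) as f:
            txt = f.read()
        with open(filename, "w") as f:
            f.write(set_x(txt, sigma))
    else:
        print(set_x('a=1 x="2" b=4', "3"))  # a=1 x=3 b=4
```

If the quotes should stay around the new value, use `f'x="{sigma}"'` as the replacement instead.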
Regards,
Hafiz
from mrjob.job import MRJob
class KittyJob(MRJob):
    OUTPUT_PROTOCOL = JSONValueProtocol
    def mapper_cmd(self):
        return "grep kitty"
    def reducer(self, key, values):
        yield None, sum(1 for _ in values)
if __name__ == '__main__':
    KittyJob().run()
Source : https://mrjob.readthedocs.org/en/latest/guides/writing-mrjobs.html#protocols
How does this code count the number of lines containing kitty?
Also, where is OUTPUT_PROTOCOL defined?
Well, the short answer is that this example doesn't count lines containing 'kitty'.
Here is some code using filters that does count lines containing (case-insensitive) kitty:
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep

class KittyJob(MRJob):
    OUTPUT_PROTOCOL = JSONValueProtocol

    def mapper(self, _, line):
        # Only lines that pass the grep pre-filter reach this mapper
        yield 'kitty', 1

    def sum_kitties(self, key, values):
        # One 1 per matching line, so the sum is the line count
        yield None, sum(values)

    def steps(self):
        return [
            MRStep(mapper_pre_filter='grep -i "kitty"',
                   mapper=self.mapper,
                   reducer=self.sum_kitties)]

if __name__ == '__main__':
    KittyJob().run()
If I run it with the local runner (as noted under "Shell Commands as Steps") over the text of the English Wikipedia page for 'Kitty', I get the expected count of all lines containing 'kitty':
$ python grep_kitty.py -q -r local kitty.txt
20
$ grep -ci kitty kitty.txt
20
It looks like the example you cite from the mrjob docs is just wrong.
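On the second question: `OUTPUT_PROTOCOL` is a class attribute that `MRJob` subclasses override, and `JSONValueProtocol` (imported from `mrjob.protocol`) writes only the JSON-encoded value and discards the key, which is why the output is a bare `20`. To see the data flow without Hadoop or mrjob installed, the following plain-Python simulation mirrors the stages of the job above (pre-filter, mapper, reducer, output protocol). It is only an illustration, not how mrjob runs internally, and all function names are hypothetical:

```python
import json
from itertools import groupby


def pre_filter(lines):
    # Stand-in for mapper_pre_filter='grep -i "kitty"': keep matching lines only.
    return (line for line in lines if "kitty" in line.lower())


def mapper(lines):
    # Same as KittyJob.mapper: one ('kitty', 1) pair per surviving line.
    for _line in lines:
        yield "kitty", 1


def reducer(pairs):
    # Hadoop sorts and groups pairs by key before the reducer sees them.
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield None, sum(v for _, v in group)


def write_json_value(pairs):
    # Like JSONValueProtocol: emit only the JSON-encoded value, drop the key.
    return [json.dumps(value) for _, value in pairs]


def count_kitty_lines(lines):
    return write_json_value(reducer(mapper(pre_filter(lines))))


if __name__ == "__main__":
    text = ["A kitty is a young cat.", "Dogs bark.", "KITTY litter", "nothing here"]
    print(count_kitty_lines(text))  # ['2']
```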
This is just part of a long Python script. There is a file called aqfile with many parameters, and I would like to extract the values next to "OWNER" and "NS".
Note:
OWNER = text
NS = numbers
I could extract the value next to OWNER, because it is plain text:
for line in aqfile.readlines():
    if string.find(line,"OWNER")>0:
        print line
        m=re.search('<(.*)>',line)
        owner=incorp(m.group(1))
        break
But when I modify the script to extract the numbers
for line in aqfile.readlines():
    if string.find(line,"NS")>0:
        print line
        m=re.search('<(.*)>',line)
        ns=incorp(m.group(1))
        break
it doesn't work anymore.
Can anyone help me?
This is the whole script:
#Make a CSV file of dataset names, pulse program and, if available, (part of) the title
#Note: the whole file tree is read into memory!!! Do not start too high in the tree!!!
import os
import os.path
import fnmatch
import re
import string
max=20000
outfiledesc=0
def incorp(c):
    #Replace " with """ and CR/LF with blanks
    c=c.replace('"','"""')
    c=c.replace("\r"," ")
    c=c.replace("\n"," ")
    return "\"%s\"" % (c)
def process(arg,root,files):
    global max
    global outfiledesc
    #Get name,expno,procno from the root
    if "proc" in files:
        procno = incorp(os.path.basename(root))
        oneup = os.path.dirname(root)
        oneup = os.path.dirname(oneup)
        aqdir=oneup
        expno = incorp(os.path.basename(oneup))
        oneup = os.path.dirname(oneup)
        dsname = incorp(os.path.basename(oneup))
        #Read the title file, if any
        if (os.path.isfile(root + "/title")):
            f=open(root+"/title","r")
            title=incorp(f.read(max))
            f.close()
        else:
            title=""
        #Grab the pulse program name from the acqus parameter
        aqfile=open(aqdir+"/acqus")
        for line in aqfile.readlines():
            if string.find(line,"PULPROG")>0:
                print line
                m=re.search('<(.*)>',line)
                pulprog=incorp(m.group(1))
                break
        aqfile.close()
        towrite= "%s;%s;%s;%s;%s\n" % (dsname,expno,procno,pulprog,title)
        outfiledesc.write(towrite)
#Main program
dialogline1="Starting point of the search"
dialogline2="Maximum length of the title"
dialogline3="output CSV file"
def1="/opt/topspin3.2/data/nmrafd/nmr"
def2="20000"
def3="/home/nmrafd/filelist.csv"
result = INPUT_DIALOG("CSV file creator","Create a CSV list",[dialogline1,dialogline2,dialogline3],[def1,def2,def3])
start=result[0]
tlength=int(result[1])
outfile=result[2]
#Search for procs files. They should be in any dataset.
outfiledesc = open(outfile,"w")
print start
os.path.walk(start,process,"")
outfiledesc.close()
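A likely cause, assuming the file is a Bruker-style `acqus` with lines such as `##$NS= 16` (an assumption; check the actual file): `string.find(line, "NS")` also matches any earlier line that merely contains the letters NS (for example `##$CNST=`), and numeric values carry no angle brackets, so `re.search('<(.*)>', line)` returns `None` and `m.group(1)` raises an `AttributeError`. A minimal sketch that matches the parameter name exactly; `get_param` is a hypothetical helper and the sample lines are made up for illustration. The demo uses Python 3 `print`, but the helper itself also runs on Python 2:

```python
import re


def get_param(lines, name):
    """Return the value of `##$NAME=` (or `##NAME=`) from JCAMP-style lines.

    The name is matched exactly, so NS does not match ##$CNST= or ##$NSP=.
    Angle brackets around string values are stripped; numeric values, which
    have no brackets, are returned as-is. Returns None if not found.
    """
    pattern = re.compile(r"^##\$?" + re.escape(name) + r"=\s*(.*?)\s*$")
    for line in lines:
        m = pattern.match(line)
        if m:
            value = m.group(1)
            if value.startswith("<") and value.endswith(">"):
                value = value[1:-1]
            return value
    return None


if __name__ == "__main__":
    sample = ["##$CNST= (0..63)", "##$NS= 16", "##OWNER= nmrafd", "##$PULPROG= <zg30>"]
    print(get_param(sample, "NS"), get_param(sample, "OWNER"), get_param(sample, "PULPROG"))
```

In the script, `ns = incorp(get_param(aqfile.readlines(), "NS"))` would replace the search loop, provided the parameter exists.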
I have a .txt file with multiple lines (different, yet similar to each other), and I want to append "\*.tmp" to the end of each path line.
I'm trying to do this with a regex in Python 2.7.
Here is what I have for the python script:
import sys
import os
import re
import shutil
#Sets the buildpath variable to equal replace one "\" with two "\\" for python code to input/output correctly
buildpath = sys.argv[1]
buildpath = buildpath.replace('\\', '\\\\')
#Opens a tmp file with read/write/append permissions
tf = open('tmp', 'a+')
#Opens the selenium script for scheduling job executions
with open('dumplist.txt') as f:
    #Sets line as a variable for every line in the selenium script
    for line in f.readlines():
        #Sets build as a variable that will replace the \\build\path string in the line of the selenium script
        build = re.sub (r'\\\\''.*',''+buildpath+'',line)
        #Overwrites the build path string from the handler to the tmp file with all lines included from the selenium script
        tf.write(build)
#Saves both "tmp" file and "selenium.html" file by closing them
tf.close()
f.close()
#Copies what was re-written in the tmp file, and writes it over the selenium script
shutil.copy('tmp', 'dumplist.txt')
#Deletes the tmp file
os.remove('tmp')
#exits the script
exit()
Current File Before Replacing the Line:
\\server\dir1\dir2\dir3
DUMP3f2b.tmp
1 File(s) 1,034,010,207 bytes
\\server\dir1_1\dir2_1\dir3_1
DUMP3354.tmp
1 File(s) 939,451,120 bytes
\\server\dir1_2\dir2_2\dir3_2
Current file after replacing string:
\*.tmp
DUMP3f2b.tmp
1 File(s) 1,034,010,207 bytes
\*.tmp
DUMP3354.tmp
1 File(s) 939,451,120 bytes
\*.tmp
Desired file after replacing string:
\\server\dir1\dir2\dir3\*.tmp
DUMP3f2b.tmp
1 File(s) 1,034,010,207 bytes
\\server\dir1_1\dir2_1\dir3_1\*.tmp
DUMP3354.tmp
1 File(s) 939,451,120 bytes
\\server\dir1_2\dir2_2\dir3_2\*.tmp
If anyone could help me in solving this that would be great. Thanks :)
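You should use capturing groups. Because the path lines in the file already start with two backslashes, keep the captured text as-is and add only the suffix:
>>> import re
>>> s = r"\\server\dir1\dir2\dir3"
>>> print re.sub(r'(\\.*)', r'\1\\*.tmp', s)
\\server\dir1\dir2\dir3\*.tmp
Then modify the line build = re.sub (r'\\\\''.*',''+buildpath+'',line) like this, assuming buildpath holds the suffix \*.tmp passed on the command line (your script already doubles its backslashes for re.sub):
build = re.sub(r'(\\.*)', r'\1%s' % buildpath, line)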
Also, you don't need to call readlines(); just iterate over f directly:
for line in f:
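Since the dump list is small, the temporary file and `shutil.copy` step can be dropped: read the whole file, substitute, and write it back. A minimal sketch that runs on Python 2.7 and 3, assuming the lines to change are exactly those starting with two backslashes; `UNC_LINE`, `add_tmp_suffix` and `rewrite_file` are hypothetical names:

```python
import re
import sys

# A line that starts with two backslashes (a UNC path such as \\server\dir1).
# [^\r\n]* stops before any line ending, so the suffix lands before it.
UNC_LINE = re.compile(r"^(\\\\[^\r\n]*)", re.MULTILINE)


def add_tmp_suffix(text):
    # In the replacement, \1 is the matched path and \\ is one literal backslash.
    return UNC_LINE.sub(r"\1\\*.tmp", text)


def rewrite_file(path):
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(add_tmp_suffix(text))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        rewrite_file(sys.argv[1])
    else:
        print("usage: python add_tmp.py dumplist.txt")
```

Lines such as `DUMP3f2b.tmp` or `1 File(s) ...` do not start with `\\`, so they pass through unchanged. Running the script twice would append the suffix twice, so run it once per fresh dump list.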