How to remove numbers from every list entry - list

I have a csv file with the following entries:
"Last,First,HW1,HW2,HW3,HW4,Test 1,Test 2"
I want to remove the numbers from the HWs and the Tests. So the result would be:
"Last,First,HW,HW,HW,HW,Test ,Test"
Is there some way to do this?

Try this:
def main(ifilename, ofilename):
    with open(ifilename, 'r') as i, open(ofilename, 'w') as o:
        for line in i:
            for digit in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9):
                line = line.replace(str(digit), '')
            print(line, file=o, end='')

if __name__ == '__main__':
    from sys import argv
    main(argv[1], argv[2])
Use the script from the command line like this:
python my_script.py input.csv output.csv
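If the goal is only to strip digits, the ten replace calls can probably be collapsed into one regular-expression substitution. A minimal sketch, assuming the same input and output file arguments as above; the helper name strip_digits is made up for illustration:

```python
import re

def strip_digits(text):
    """Return text with every ASCII digit (0-9) removed."""
    return re.sub(r'[0-9]', '', text)

def main(ifilename, ofilename):
    # Copy the input to the output line by line, dropping digits as we go.
    with open(ifilename, 'r') as src, open(ofilename, 'w') as dst:
        for line in src:
            dst.write(strip_digits(line))
```

Note that only the digits are removed, so "Test 1" becomes "Test " with its trailing space intact.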

Related

Python code for import not working properly

I am testing the Python code below, which copies the contents of one txt file to another, but in the destination the contents come out in what looks like a different language (maybe Chinese) with improper
# -*- coding: utf-8 -*-
from sys import argv
from os.path import exists
script,from_file,to_file = argv
print "Does the output file exist ? %r" %exists(to_file)
print "If 'YES' hit Enter to proceed or terminate by CTRL+C "
raw_input('?')
infile=open(from_file)
file1=infile.read()
outfile=open(to_file,'w')
file2=outfile.write(file1)
infile.close()
outfile.close()
Try this code to copy one file to another (from_file and to_file are the variables from your script):
with open(from_file) as f:
    with open(to_file, "w") as f1:
        for line in f:
            f1.write(line)
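If the garbled characters come from an encoding mismatch, which is only a guess from the symptoms described, copying the raw bytes avoids decoding the text at all. A sketch using the standard library's shutil.copyfile; the wrapper name copy_bytes is hypothetical:

```python
import shutil

def copy_bytes(from_file, to_file):
    # copyfile copies the file contents byte for byte,
    # so no text decoding or re-encoding takes place.
    shutil.copyfile(from_file, to_file)
```

If the destination still looks wrong after a byte-for-byte copy, the viewer is most likely opening it with a different encoding than the source file uses.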

How to change the values of a parameter in multiple files using python

I am new to Python. I have learned a way to change the value of a parameter in a single file. The script:
#####test.py##########
from sys import argv
script,filename,sigma = argv
file_data = open(filename,'r')
txt = file_data.read()
txt=txt.replace('3.7',sigma)
file_data = open(filename,'w')
file_data.write(txt)
file_data.close()
It is run from the command line on test.txt as
test.py test.txt 2.
As a result, 3.7 is replaced by 2 in test.txt.
Now, if I want to do the same for all the .txt files in the directory, e.g.
test.py *.txt 2
what are the suggested modifications?
Your suggestions are highly appreciated.
Hafiz.
bash (or whatever your shell is) will expand the *.txt (to test0.txt test1.txt ... or whatever the *.txt files in your current directory are called) before passing it to your Python script. Your script will therefore receive many arguments, not just the 2 you expect. Print sys.argv to inspect them.
You could solve that in bash itself with something like
for name in *.txt; do test.py "${name}" 2; done
Otherwise you would need to treat sys.argv differently in Python and allow for more than 2 arguments.
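A minimal sketch of the second option, assuming the new value is always the last argument and everything before it is a filename. The function names replace_in_file and main are illustrative, not part of the original script:

```python
def replace_in_file(filename, old, new):
    # Read the whole file, substitute, and write it back in place.
    with open(filename, 'r') as f:
        txt = f.read()
    with open(filename, 'w') as f:
        f.write(txt.replace(old, new))

def main(args):
    # args is expected to be sys.argv[1:]. Every argument but the last is
    # a filename (the shell has already expanded *.txt); the last is sigma.
    *filenames, sigma = args
    for filename in filenames:
        replace_in_file(filename, '3.7', sigma)
```

It could be wired up by calling main(sys.argv[1:]) at the bottom of the script and running python test.py *.txt 2.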
Importing glob solved that issue. But I've got some queries.
Query 1:
I'm rewriting my code as:
#####test.py##########
from sys import argv
script,filename,sigma = argv
file_data = open(filename,'r')
txt = file_data.read()
txt=txt.replace('3.7'|'3',sigma) #gives syntax error
file_data = open(filename,'w')
file_data.write(txt)
file_data.close()
I want to replace 3.7 or 3 by sigma. What will be the corrected code?
Query 2:
I'm rewriting it in the following manner:
#####test.py##########
from sys import argv
script,filename,sigma = argv
file_data = open(filename,'r')
txt = file_data.read()
txt=txt.replace('x="2"','x=sigma')
file_data = open(filename,'w')
file_data.write(txt)
file_data.close()
With
py test.py test.txt 3.
I get x=sigma, but I want to get x=3
What'd be the modification?
Regards,
Hafiz
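Both queries can probably be handled along the following lines. This is a sketch only; the function names replace_values and set_x are hypothetical. str.replace takes a single search string, so an "either/or" match needs a regular expression, and a variable has to be concatenated into the replacement rather than named inside the quotes:

```python
import re

def replace_values(txt, sigma):
    # Query 1: a regex alternation matches either '3.7' or '3'.
    # '3\.7' is listed first so that '3.7' is replaced as a whole
    # instead of leaving a stray '.7' behind.
    # A function is used as the replacement so sigma is inserted literally.
    return re.sub(r'3\.7|3', lambda m: sigma, txt)

def set_x(txt, sigma):
    # Query 2: build the replacement from the variable's value;
    # 'x=sigma' inside quotes is just the literal word "sigma".
    return txt.replace('x="2"', 'x=' + sigma)
```

Be aware that the bare 3 in the pattern also matches the digit inside longer numbers such as 13 or 30, so a stricter pattern may be needed for real data.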

Explanation of this MRJob example

from mrjob.job import job

class KittyJob(MRJob):
    OUTPUT_PROTOCOL = JSONValueProtocol

    def mapper_cmd(self):
        return "grep kitty"

    def reducer(self, key, values):
        yield None, sum(1 for _ in values)

if __name__ == '__main__':
    KittyJob().run()
Source : https://mrjob.readthedocs.org/en/latest/guides/writing-mrjobs.html#protocols
How does this code do its task of counting the number of lines containing kitty?
Also where is OUTPUT_PROTOCOL defined?
Well, the short answer is that this example doesn't count lines containing 'kitty'.
Here is some code using filters that does count lines containing (case-insensitive) kitty:
from mrjob.job import MRJob
from mrjob.protocol import JSONValueProtocol
from mrjob.step import MRStep

class KittyJob(MRJob):
    OUTPUT_PROTOCOL = JSONValueProtocol

    def mapper(self, _, line):
        yield 'kitty', 1

    def sum_kitties(self, key, values):
        yield None, sum(values)

    def steps(self):
        return [
            MRStep(mapper_pre_filter='grep -i "kitty"',
                   mapper=self.mapper,
                   reducer=self.sum_kitties)]

if __name__ == '__main__':
    KittyJob().run()
If I run it using the local runner, as noted in Shell Commands as Steps, over the text of the English Wikipedia page for 'Kitty', then I get a count of all lines containing 'kitty' as expected:
$ python grep_kitty.py -q -r local kitty.txt
20
$ grep -ci kitty kitty.txt
20
It looks like the example you cite from the mrjob docs is just wrong.
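To see what the filter, mapper and reducer steps in the working job compute, here is a plain-Python sketch of the same data flow. It does not use mrjob at all, and the function name count_kitty_lines is made up for illustration:

```python
def count_kitty_lines(lines):
    # Pre-filter step: keep only lines containing 'kitty' in any letter
    # case, roughly what `grep -i "kitty"` does.
    matching = (line for line in lines if 'kitty' in line.lower())
    # Mapper step: emit the pair ('kitty', 1) for every surviving line.
    pairs = (('kitty', 1) for _ in matching)
    # Reducer step: all pairs share one key, so sum their values.
    return sum(value for _, value in pairs)
```

Each line contributes 1 no matter how many times the word occurs in it, which is why the result agrees with grep -ci.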

how to lookup the numbers next to character using python

This is just part of a long Python script. There is a file called aqfile that has many parameters. I would like to extract what is next to "OWNER" and "NS".
Note:
OWNER = text
NS = numbers
I could extract what is next to OWNER, because it is just text:
for line in aqfile.readlines():
    if string.find(line,"OWNER")>0:
        print line
        m=re.search('<(.*)>',line)
        owner=incorp(m.group(1))
        break
But when I try to modify the script to extract the numbers:
for line in aqfile.readlines():
    if string.find(line,"NS")>0:
        print line
        m=re.search('<(.*)>',line)
        ns=incorp(m.group(1))
        break
it doesn't work any more.
Can anyone help me?
This is the whole script:
#Make a CSV file of dataset names, pulseprog and, if available, (part of) the title
#Note: the whole file tree is read into memory!!! Do not start too high in the tree!!!
import os
import os.path
import fnmatch
import re
import string
max=20000
outfiledesc=0
def incorp(c):
    #Replace " with """, CR/LF with blanks
    c=c.replace('"','"""')
    c=c.replace("\r"," ")
    c=c.replace("\n"," ")
    return "\"%s\"" % (c)
def process(arg,root,files):
    global max
    global outfiledesc
    #Get name,expno,procno from the root
    if "proc" in files:
        procno = incorp(os.path.basename(root))
        oneup = os.path.dirname(root)
        oneup = os.path.dirname(oneup)
        aqdir=oneup
        expno = incorp(os.path.basename(oneup))
        oneup = os.path.dirname(oneup)
        dsname = incorp(os.path.basename(oneup))
        #Read the titlefile, if any
        if (os.path.isfile(root + "/title")):
            f=open(root+"/title","r")
            title=incorp(f.read(max))
            f.close()
        else:
            title=""
        #Grab the pulse program name from the acqus parameter
        aqfile=open(aqdir+"/acqus")
        for line in aqfile.readlines():
            if string.find(line,"PULPROG")>0:
                print line
                m=re.search('<(.*)>',line)
                pulprog=incorp(m.group(1))
                break
        towrite= "%s;%s;%s;%s;%s\n" % (dsname,expno,procno,pulprog,title)
        outfiledesc.write(towrite)
#Main program
dialogline1="Starting point of the search"
dialogline2="Maximum length of the title"
dialogline3="output CSV file"
def1="/opt/topspin3.2/data/nmrafd/nmr"
def2="20000"
def3="/home/nmrafd/filelist.csv"
result = INPUT_DIALOG("CSV file creator","Create a CSV list",[dialogline1,dialogline2,dialogline3],[def1,def2,def3])
start=result[0]
tlength=int(result[1])
outfile=result[2]
#Search for procs files. They should be in any dataset.
outfiledesc = open(outfile,"w")
print start
os.path.walk(start,process,"")
outfiledesc.close()
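One possible cause of the NS failure, offered as a guess because the file contents are not shown: if the number is stored bare rather than between angle brackets, re.search('<(.*)>', line) returns None and m.group(1) raises an error. The substring test for "NS" would also fire on any other parameter whose name merely contains those two letters. The sketch below assumes a line of the form "##$NS= 16"; that format, the pattern and the function name find_ns are all assumptions to adapt to the real file. It is written for Python 3, unlike the script above.

```python
import re

# Assumed line format (a guess, not taken from the question): "##$NS= 16"
NS_PATTERN = re.compile(r'^##\$NS=\s*(-?\d+)')

def find_ns(lines):
    """Return the NS value as an int, or None if no line matches."""
    for line in lines:
        # match() anchors at the start of the line, so parameters that
        # only contain the letters "NS" somewhere are skipped.
        m = NS_PATTERN.match(line)
        if m:
            return int(m.group(1))
    return None
```

Checking for None before using the match object also keeps the loop from crashing on lines that contain the keyword but not the expected value.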

Use regex to add string to different multilines with similarities

I have a .txt file with multiple lines (different yet similar for each one) to which I want to add a "*.tmp" at the end.
I'm trying to use python2.7 regex to do this.
Here is what I have for the python script:
import sys
import os
import re
import shutil
#Sets the buildpath variable to equal replace one "\" with two "\\" for python code to input/output correctly
buildpath = sys.argv[1]
buildpath = buildpath.replace('\\', '\\\\')
#Opens a tmp file with read/write/append permissions
tf = open('tmp', 'a+')
#Opens the selenium script for scheduling job executions
with open('dumplist.txt') as f:
    #Sets line as a variable for every line in the selenium script
    for line in f.readlines():
        #Sets build as a variable that will replace the \\build\path string in the line of the selenium script
        build = re.sub (r'\\\\''.*',''+buildpath+'',line)
        #Overwrites the build path string from the handler to the tmp file with all lines included from the selenium script
        tf.write(build)
#Saves both "tmp" file and "selenium.html" file by closing them
tf.close()
f.close()
#Copies what was re-written in the tmp file, and writes it over the selenium script
shutil.copy('tmp', 'dumplist.txt')
#Deletes the tmp file
os.remove('tmp')
#exits the script
exit()
Current File Before Replacing the Line:
\\server\dir1\dir2\dir3
DUMP3f2b.tmp
1 File(s) 1,034,010,207 bytes
\\server\dir1_1\dir2_1\dir3_1
DUMP3354.tmp
1 File(s) 939,451,120 bytes
\\server\dir1_2\dir2_2\dir3_2
Current file after replacing string:
\*.tmp
DUMP3f2b.tmp
1 File(s) 1,034,010,207 bytes
\*.tmp
DUMP3354.tmp
1 File(s) 939,451,120 bytes
\*.tmp
Desired file after replacing string:
\\server\dir1\dir2\dir3\*.tmp
DUMP3f2b.tmp
1 File(s) 1,034,010,207 bytes
\\server\dir1_1\dir2_1\dir3_1\*.tmp
DUMP3354.tmp
1 File(s) 939,451,120 bytes
\\server\dir1_2\dir2_2\dir3_2\*.tmp
If anyone could help me in solving this that would be great. Thanks :)
You should use capturing groups:
>>> import re
>>> s = "\\server\dir1\dir2\dir3"
>>> print re.sub(r'(\\.*)', r'\\\1\*.tmp', s)
\\server\dir1\dir2\dir3\*.tmp
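Then modify the build = re.sub (r'\\\\''.*',''+buildpath+'',line) line as follows. The leading \\ in the replacement above only compensates for the string literal s, which holds a single leading backslash; lines read from the file already start with two, so it is dropped here:
build = re.sub (r'(\\.*)', r'\1%s' % buildpath, line)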
Also, you shouldn't call readlines(), just iterate over f:
for line in f:
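Because both the pattern and the replacement string give backslashes a special meaning, plain string operations may be easier to get right here. A sketch, assuming every line to be changed begins with two backslashes as in the sample file; the function name add_suffix and its default suffix are illustrative:

```python
def add_suffix(line, suffix='\\*.tmp'):
    # Only UNC-style path lines (starting with two backslashes) are changed.
    if line.startswith('\\\\'):
        # Drop the line ending, append the suffix, then end the line again.
        return line.rstrip('\r\n') + suffix + '\n'
    # Every other line (file names, byte counts) passes through untouched.
    return line
```

Inside the loop this would be used as tf.write(add_suffix(line)), with no escaping of buildpath required.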