unpacking rar with R system() - regex

OK this task seems to be really easy to do. However I spent a couple of hours without any results.
The user has:
- 7z
- Windows
- R
The user should enter:
- the path to 7z (z7path)
- a filename
The system should unpack the rar into the project's root.
I tried:
cmd = "C:\\Program Files (x86)\\7-Zip\\7z e D:/20140601.rar"
system(shQuote(cmd))
And..nothing happens.
Please don't advise to set up PATH, it doesn't help, and this should work without it.

OK, I finally got it:
- Use shell() instead of system()
- Use shQuote() to quote the path to 7z
- Use the right switches
z7path = shQuote('C:\\Program Files (x86)\\7-Zip\\7z')
file = paste(getwd(), '/101-01.rar', sep = '')
cmd = paste(z7path, ' e ', file, ' -y -o', getwd(), '/', sep='')
shell(cmd)

I had to modify the code from the second answer, and finally it works.
You can replace "-ir!*.* -o" with "-y -o" if you want all files.
z7path = shQuote('C:\\Program Files\\7-Zip\\7z')
file = paste('"', 'D:/20140601.rar', '"',sep = '')
cmd = paste(z7path, ' e ', file, ' -ir!*.* -o', '"', getwd(), '"', sep='')
system(cmd)


The glob.glob function to extract data from files

I am trying to run the script below. The intention of the script is to open different fasta files one after the other, and extract the geneID. The script works well if I don't use the glob.glob function. I get this message TypeError: coercing to Unicode: need string or buffer, list found
files='/home/pathtofiles/files'
#print files
#sys.exit()
for file in files:
    fastas=sorted(glob.glob(files + '/*.fasta'))
    #print fastas[0]
    output_handle=(open(fastas, 'r+'))
    genes_files=list(SeqIO.parse(output_handle, 'fasta'))
    geneID=genes_files[0].id
    print geneID
I am running out of ideas on how to direct the script to open one file after another and give me the required information.
I see what you are trying to do, but let me first explain why your current approach is not working.
You have a path to a directory with fasta files and you want to loop over the files in that directory. But observe what happens if we do:
>>> files='/home/pathtofiles/files'
>>> for file in files:
...     print file
...
/
h
o
m
e
/
p
a
t
h
t
o
f
i
l
e
s
/
f
i
l
e
s
Not the list of filenames you expected! files is a string and when you apply a for loop on a string you simply iterate over the characters in that string.
Also, as doctorlove correctly observed, in your code fastas is a list and open expects a path to a file as first argument. That's why you get the TypeError: ... need string, ... list found.
As an aside (and this is more of a problem on Windows than on Linux or Mac), it is good practice to always use raw string literals (prefix the string with an r) when working with pathnames, to prevent the unwanted expansion of backslash escape sequences like \n and \t to newline and tab.
>>> path = 'C:\Users\norah\temp'
>>> print path
C:\Users
orah emp
>>> path = r'C:\Users\norah\temp'
>>> print path
C:\Users\norah\temp
Another good practice is to use os.path.join() when combining pathnames and filenames. This prevents subtle bugs where your script works on your machine but gives an error on the machine of a colleague who has a different operating system.
I would also recommend using the with statement when opening files. This assures that the filehandle gets properly closed when you're done with it.
As a final remark, file is a built-in function in Python and it is bad practice to use a variable with the same name as a built-in function because that can cause bugs or confusion later on.
Combining all of the above, I would rewrite your code like this:
import os
import glob
from Bio import SeqIO

path = r'/home/pathtofiles/files'
pattern = os.path.join(path, '*.fasta')
for fasta_path in sorted(glob.glob(pattern)):
    print fasta_path
    with open(fasta_path, 'r+') as output_handle:
        genes_records = SeqIO.parse(output_handle, 'fasta')
        for gene_record in genes_records:
            print gene_record.id
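If you only need the ID of the first record in each file, the same loop can be sketched without Biopython. This is a minimal sketch, not a replacement for SeqIO: it assumes every record header starts with '>' and that the ID is the first whitespace-delimited token on that line. The function names first_fasta_id and first_ids_in_directory are made up for this example.

```python
import glob
import os


def first_fasta_id(fasta_path):
    """Return the ID of the first record in a FASTA file, or None if there is none.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith('>'):
                fields = line[1:].split()
                return fields[0] if fields else None
    return None


def first_ids_in_directory(directory):
    """Return (path, first record ID) pairs for every *.fasta file, sorted by path."""
    pattern = os.path.join(directory, '*.fasta')
    return [(path, first_fasta_id(path)) for path in sorted(glob.glob(pattern))]
```

Calling first_ids_in_directory(r'/home/pathtofiles/files') would then give one (path, ID) pair per file, which you can loop over and print.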
This is the way I solved the problem, and this script works.
import os,sys
import glob
from Bio import SeqIO

def extracting_information_gene_id():
    #to extract geneID information and add the reference gene to each different file
    files=sorted(glob.glob('/home/path_to_files/files/*.fasta'))
    #print file
    #sys.exit()
    for file in files:
        #print file
        output_handle=open(file, 'r+')
        ref_genes=list(SeqIO.parse(output_handle, 'fasta'))
        geneID=ref_genes[0].id
        #print geneID
        #sys.exit()
        #to extract the geneID as a reference record from the genes_files
        query_genes=(SeqIO.index('/home/path_to_file/file.fa', 'fasta'))
        #print query_genes[geneID].format('fasta') #check point
        #sys.exit()
        ref_gene=query_genes[geneID].format('fasta')
        #print ref_gene #check point
        #sys.exit()
        output_handle.write(str(ref_gene))
        output_handle.close()
        query_genes.close()

extracting_information_gene_id()
print 'Reference gene sequence have been added'

Calling rsync with pexpect: glob string not working

I'm attempting to rsync some files with pexpect. It appears the glob string argument I'm providing to identify all the source files is not working.
The gist of it is something like this...
import pexpect
import sys
glob_str = (
    "[0-9]" * 4 + "-" +
    "[0-9]" * 2 + "-" +
    "[0-9]" * 2 + "-" +
    "[A-B]" + "*"
)
SRC = "../data/{}".format(glob_str)
DES = "user#host:" + "/path/to/dest/"
args = [
    "-avP",
    SRC,
    DES,
]
print "rsync" + " ".join(args)
# Execute the transfer
child = pexpect.spawn("rsync", args)
child.logfile_read = sys.stdout # log what the child sends back
child.expect("Password:")
child.sendline("#######")
child.expect(pexpect.EOF)
Fails with this...
building file list ...
rsync: link_stat "/Users/U6020643/git/ue-sme-query-logs/code/../data/[0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9][0-9]\-[A-B]*" failed: No such file or directory (2)
0 files to consider
...
The same command run in the shell works just fine
rsync -avP ../data/[0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9][0-9]\-[A-B].csv username#host:/path/to/dest/
The pexpect documentation mentions this
Remember that Pexpect does NOT interpret shell meta characters such as redirect, pipe, or wild cards (>, |, or *). This is a common mistake. If you want to run a command and pipe it through another command then you must also start a shell.
But doing so...
...
args = [
    "rsync",
    "-avP",
    SRC,
    DES,
]
...
child = pexpect.spawn("/bin/bash", args) # have to use a shell for glob expansion to work
...
Runs into a permissions issue
/usr/bin/rsync: /usr/bin/rsync: cannot execute binary file
To run rsync with bash you have to use bash -c "cmd...":
args = ["-c", "rsync -avP {} {}".format(SRC, DES)]
child = pexpect.spawn('/bin/bash', args=args)
And I think you can also try rsync --include=PATTERN.
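Another option, if you would rather not start a shell at all, is to expand the pattern in Python with the glob module and hand rsync the resulting file list. This is only a sketch: build_rsync_args is a name invented for this example, and it assumes the pattern uses nothing beyond what Python's glob understands (*, ? and [...] ranges, which covers the pattern in the question).

```python
import glob


def build_rsync_args(src_pattern, dest, options=("-avP",)):
    """Expand src_pattern in Python and return the argument list for rsync.

    Raises ValueError when nothing matches, so rsync is never started
    with an unexpanded pattern.
    """
    sources = sorted(glob.glob(src_pattern))
    if not sources:
        raise ValueError("no files match %r" % src_pattern)
    return list(options) + sources + [dest]
```

The result can then be passed straight to pexpect, for example pexpect.spawn("rsync", build_rsync_args(SRC, DES)), and the rest of the script (expecting the password prompt and so on) stays the same.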

Vim source code compile options with python2.7 and python3.x

I want to use vim to write Python2/3 code, and I want to know how I can compile and run from the editor? Does anyone have any good suggestions, thanks?
Pymode can run code using <leader>r.
If you write both Python 2 and Python 3 code in Vim, you may want to compile one Vim with +python2 and another with +python3, then use the first for Python 2 code and the second for Python 3 code. Pymode and other Python plugins need +python2/3, and the problem is that Vim can't be compiled with both.
I use a simple script to run Python programs. All it requires is to have Python installed on the machine. It runs the program and shows its output in an overlay box in Vim. The catch is that it won't work if your program is interactive: all it does is display the output after the program has finished.
What I do is:
command! -complete=shellcmd -nargs=+ Shell call s:RunShellCommand(<q-args>)
function! s:RunShellCommand(cmdline)
  let isfirst = 1
  let words = []
  for word in split(a:cmdline)
    if isfirst
      let isfirst = 0  " don't change first word (shell command)
    else
      if word[0] =~ '\v[%#<]'
        let word = expand(word)
      endif
      let word = shellescape(word, 1)
    endif
    call add(words, word)
  endfor
  let expanded_cmdline = join(words)
  botright new
  setlocal buftype=nofile bufhidden=wipe nobuflisted noswapfile nowrap
  "call setline(1, 'You entered: ' . a:cmdline)
  call setline(1, 'CMD: ' . expanded_cmdline)
  call append(line('$'), substitute(getline(2), '.', '=', 'g'))
  silent execute '$read !'. expanded_cmdline
  1
endfunction
This RunShellCommand runs a command and displays it in a popup within vim. Paste it in your vimrc.
For python I use this
nnoremap <silent> <leader>r :Shell python %:p<cr>
in <vimdir>/ftplugin/python.vim
With this all I have to do is use ,r (my <leader> is ,) and it runs the current open python file and shows its output.

How to run a line in Powershell in Python 2.7?

I have a line of Powershell script that runs just fine when I enter it in Powershell's command line. In my Python application which I run from Powershell, I am trying to send this line of script to Powershell.
powershell -command ' & {. ./uploadImageToBigcommerce.ps1; Process-Image '765377' '.jpg' 'C:\Images' 'W:\product_images\import'}'
I know that the script works because I've been able to implement it on its own from the Powershell command line. However, I haven't been able to get Python to send this line to the shell without getting a "non-zero exit status 1."
import subprocess
product = "765377"
scriptPath = "./uploadImageToBigcommerce.ps1"
def process_image(sku, fileType, searchDirectory, destinationPath, scriptPath):
    psArgs = "Process-Image '"+sku+"' '"+fileType+"' '"+searchDirectory+"' '"+destinationPath+"'"
    subprocess.check_call([create_PSscript_call(scriptPath, psArgs)], shell=True)

def create_PSscript_call(scriptPath, args):
    line = "powershell -command ' & {. "+scriptPath+"; "+args+"}'"
    print(line)
    return line

process_image(product, ".jpg", "C:\Images", "C:\webDAV", scriptPath)
Does anyone have any ideas to help? I've tried:
subprocess.check_call()
subprocess.call()
subprocess.Popen()
And maybe it is just a syntax issue, but I haven't been able to find enough documentation to confirm that.
Using single quotes inside a single-quoted string breaks the string. Use double quotes outside and single quotes inside, or vice versa, to avoid that. This statement:
powershell -command '& {. ./uploadImageToBigcommerce.ps1; Process-Image '765377' '.jpg' 'C:\Images' 'W:\product_images\import'}'
should rather look like this:
powershell -command "& {. ./uploadImageToBigcommerce.ps1; Process-Image '765377' '.jpg' 'C:\Images' 'W:\product_images\import'}"
Also, I'd use subprocess.call (and a quoting function), like this:
import subprocess

product = '765377'
scriptPath = './uploadImageToBigcommerce.ps1'

def qq(s):
    # wrap a value in single quotes for PowerShell
    return "'%s'" % s

def process_image(sku, fileType, searchDirectory, destinationPath, scriptPath):
    # dot-source the script, then call Process-Image with all four arguments
    psCode = '. ' + scriptPath + '; Process-Image ' + qq(sku) + ' ' + \
             qq(fileType) + ' ' + qq(searchDirectory) + ' ' + qq(destinationPath)
    subprocess.call(['powershell', '-Command', '& {'+psCode+'}'], shell=True)

# raw strings keep the backslashes in the Windows paths literal
process_image(product, '.jpg', r'C:\Images', r'C:\webDAV', scriptPath)
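If the values you pass can themselves contain a single quote, plain wrapping is not enough. Inside a PowerShell single-quoted string, a literal single quote is written by doubling it. Below is a minimal sketch of a helper that builds the argument list with that rule applied. The names ps_quote and build_powershell_call are invented for this example, and it assumes the script path itself contains no spaces or quotes.

```python
def ps_quote(value):
    """Single-quote a value for PowerShell, doubling any embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_powershell_call(script_path, function_name, *function_args):
    """Return an argument list that dot-sources script_path and calls function_name.

    Assumption: script_path needs no quoting of its own.
    """
    ps_code = ". %s; %s %s" % (
        script_path,
        function_name,
        " ".join(ps_quote(arg) for arg in function_args),
    )
    return ["powershell", "-Command", "& {" + ps_code + "}"]
```

Because the result is a list, it can be given to subprocess.check_call() directly, so Python handles the outer quoting of the -Command argument and you only have to think about the PowerShell side.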

How to replace string in multiple files in the folder?

I'm trying to read two files and replace the content of one file with the content of the other, in files located in a folder that also has subdirectories.
But it tells me that subprocess is not defined.
I'm new to Python and shell scripting. Can anybody help me with this, please?
import os
import sys
import os.path
f = open ( "file1.txt",'r')
g = open ( "file2.txt",'r')
text1=f.readlines()
text2=g.readlines()
i = 0;
for line in text1:
    l = line.replace("\r\n", "")
    t = text2[i].replace("\r\n", "")
    args = "find . -name *.tml"
    Path = subprocess.Popen( args , shell=True )
    os.system(" sed -r -i 's/" + l + "/" + t + "/g' " + Path)
    i = i + 1;
To specifically address your actual error, you need to import the subprocess module as you are making use of it (oddly) in your code:
import subprocess
After that, you will find more problems. I will try to keep my suggestions as simple as possible. Code first, then I will break it down. Keep in mind that there are more robust ways to accomplish this task, but I am doing my best to keep your experience level in mind and to match your current approach as closely as possible.
import subprocess
import sys

# 1
results = subprocess.Popen("find . -name '*.tml'",
                           shell=True, stdout=subprocess.PIPE)
if results.wait() != 0:
    print "error trying to find tml files"
    sys.exit(1)

# 2
tml_files = []
for tml in results.stdout:
    tml_files.append(tml.strip())
if not tml_files:
    print "no tml files found"
    sys.exit(0)
tml_string = " ".join(tml_files)

# 3
with open("file1.txt") as f, open("file2.txt") as g:
    while True:
        # 4
        f_line = f.readline()
        if not f_line:
            break
        g_line = g.readline()
        if not g_line:
            break
        f_line = f_line.strip()
        g_line = g_line.strip()
        if not f_line or not g_line:
            continue
        # 5
        cmd = "sed -i -e 's/%s/%s/g' %s" % \
            (f_line.strip(), g_line.strip(), tml_string)
        ret = subprocess.Popen(cmd, shell=True).wait()
        if ret != 0:
            print "error doing string replacement"
            sys.exit(1)
You do not need to read in your entire files at once. If they are large this could be a lot of memory. You can consume a line at a time, and you can also make use of what are called "context managers" when you open the files. This will ensure they close properly no matter what happens. The numbered comments in the code correspond to these notes:
1. We start with a subprocess command that is run only once to find all your .tml files. Your version ran the same command multiple times. If the search path is the same, then we only need it once. We check the exit code of the command and quit if it failed.
2. We loop over stdout of the subprocess command and add the stripped lines to a list. This is more robust than your replace("\r\n", ""), because it removes any leading and trailing whitespace. A "list comprehension" would be better suited here (down the line). If we didn't find any tml files, then we have no work to do, so we exit. Otherwise, we join them together into a space-separated string suitable for our command later.
3. These are the "context managers". The files are opened in a way that guarantees they will be closed properly no matter what. Each file stays open for the length of the with block. We loop forever and break when appropriate.
4. We pull a line, one at a time, from each file. If either line is empty, we have reached the end of that file and cannot do any more work, so we break out. We then strip the newlines, and if either string is empty (a blank line) we still can't do any work, so we just continue to the next available line.
5. A modified version of your sed command. On each loop we construct the command string from the source and replacement strings and tack on the tml file string. Bear in mind this is a very naive approach to the replacement: it expects your strings to contain only safe characters that do not break the s///g sed format. We run the command with another subprocess call. The wait() simply waits for the return code, and we check it for an error. This approach replaces your os.system() version.
Hope this helps. Eventually you can improve this to do more checking and safe operations.
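As one possible next step, here is a sketch of the same job done entirely in Python, without find or sed. It is not a drop-in equivalent: it does plain literal replacement with str.replace() instead of sed's regular expressions, which also sidesteps the problem of unsafe characters breaking the s///g format. The function names read_pairs and replace_in_tree are made up for this example, and the files are assumed to be small enough to read into memory.

```python
import fnmatch
import os


def read_pairs(search_file, replace_file):
    """Pair up line N of search_file with line N of replace_file, skipping blank lines."""
    pairs = []
    with open(search_file) as f, open(replace_file) as g:
        # zip stops at the end of the shorter file
        for old, new in zip(f, g):
            old, new = old.strip(), new.strip()
            if old and new:
                pairs.append((old, new))
    return pairs


def replace_in_tree(root, pattern, pairs):
    """Apply every (old, new) pair to each file under root whose name matches pattern.

    Returns the sorted list of paths that were actually rewritten.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            with open(path) as handle:
                original = handle.read()
            updated = original
            for old, new in pairs:
                updated = updated.replace(old, new)
            if updated != original:
                with open(path, 'w') as handle:
                    handle.write(updated)
                changed.append(path)
    return sorted(changed)
```

For the question's layout, that would be called as replace_in_tree('.', '*.tml', read_pairs('file1.txt', 'file2.txt')).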