Running locally blastn against nt db thru python script - python-2.7

I have a fasta file with sequence that I want to blast locally to 'nt' database dowloaded on my computer from ncbi website
I dowloaded blast 2.6.0.
In order to access blast from anywhere, I did:
gedit ~/.bashrc
export PATH=/usr/local/ncbi-blast-2.6.0+/bin:$PATH
then I did:
source ~/.bashrc
Then I downloaded 'nt' database (155.6GB) and stored it in /usr/local/blastdb
I want to run in python script this command:
from Bio.Blast.Applications import NcbiblastnCommandline
cline = NcbiblastnCommandline(query="/home/proprietaire/Desktop/JADE/stage_scripts/seq_error_fasta.fasta", db="/usr/local/blastdb/nt", evalue=0.001, out="blast_result_local.xml", outfmt=5)
But it is not working for a reason. Please help me figure out what I'm doing wrong. Thank you for your help.
EDIT:
'seq_error_fasta.fasta' : is my fasta file with 64 sequences that I want to blast to 'nt' database.
My 'seq_error_fasta.fasta' contains sequence loaded with error like S, J, X so I want to blast them to 'nt' db in order to get the closest better sequence
I found out that I need to format the nt database dowloaded from ncbi so I did this:
makeblastdb -dbtype nucl -in nt
Then I added this after my cline variable in my python script:
stdout, stderr = cline()
The script is running but unfortunately I'm getting this error now:
Bus error (core dumped)
I think it's a ram memory problem so I thought that I need to shorten 'nt' db by taking only the bacteria sequence. I looked on NCBI for a whole bacteria only database but there is multiple database of different species like more then a thousand.
I also tried blast online using this script:
f = open('output_blast.xml','w')
for rec in SeqIO.parse(open("seq_error_fasta.fasta"), 'fasta):
result_handle = NCBIWWW.qblast("blastn", "nt", rec.format("fasta"), format_type="XML", alignments=1, perc_ident=95, expect= 0.001)
f.write(result_handle.read())
f.close()
but this only doing one query sequence and returning all hits, althought I specified 1 hit and 95% of identity.
This is driving me crazy lollll Please help

Download NCBI nr database:
$ mkdir db
$ cd db
$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr*
Run blast:
import io
import shlex
import subprocess
BLAST_OUTFMT6 = """\
'6 qacc sacc pident length mismatch gapopen qstart qend sstart send evalue bitscore qseq sseq'\
"""
BLAST_OUTFMT6_COLUMN_NAMES = [
'query_id', 'subject_id', 'pc_identity', 'alignment_length', 'mismatches', 'gap_opens',
'q_start', 'q_end', 's_start', 's_end', 'evalue', 'bitscore', 'qseq', 'sseq',
]
def blastp(sequence, db, evalue=0.001, max_target_seqs=100000):
system_command = (
'blastp -db {db} -outfmt {outfmt} -evalue {evalue} -max_target_seqs {max_target_seqs}'
.format(db=db, outfmt=BLAST_OUTFMT6, evalue=evalue, max_target_seqs=max_target_seqs)
)
cp = subprocess.Popen(
shlex.split(system_command),
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
universal_newlines=True)
result, error_message = cp.communicate(sequence)
if error_message.strip():
print("Error: {}".format(error_message))
return result
if __name__ == '__main__':
result = blastp('AAAAAAAAAAAAAA', db='/path/to/db/nr')
print(result)

Related

Django-extension runscript No (valid) module for script

I'm trying to create a script that will populate my model families with informations extracted from a text file.
This is my first post in StackOverflow, please be gentle, sorry if the question is not well expressed or not correctly formatted.
Django V 1.9 and running on Python 3.5
Django-extensions installed
This is my model: it's in an app called browse
from django.db import models
from django_extensions.db.models import TimeStampedModel
class families(TimeStampedModel):
rfam_acc = models.CharField(max_length=7)
rfam_id = models.CharField(max_length=40)
description = models.CharField(max_length=75)
author = models.CharField(max_length=50)
comment = models.CharField(max_length=500)
rfam_URL = models.URLField()
Here I have my script familiespopulate.py. Positioned in the PROJECT_ROOT/scripts directory.
import csv
from browse.models import families
file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
def run(file_path):
listoflists = list(csv.reader(open(file_path, 'rb'), delimiter='\t'))
for row in listoflists:
families.objects.create(
rfam_acc=row[0],
rfam_id=row[1],
description=row[3],
author=row[4],
comment=row[9],
)
When from Terminal i run:
python manage.py runscript familiespopulate
it returns:
No (valid) module for script 'familiespopulate' found
Try running with a higher verbosity level like: -v2 or -v3
The problem must be in importing the model families, I'm new to django, and I cannot find any solution here on StackOverflow or anywhere else online.
This is why I ask for your help!
Do you know how the model should be imported?
Or... Am I doing something else wrong.
Important piece of information is that the script runs if I modify it to PRINT out the parameters, instead of creating an object in families.
For your information and curiosity I will also post here an extract of the textfile that I'm using.
RF00001 5S_rRNA 1302 5S ribosomal RNA Griffiths-Jones SR, Mifsud W, Gardner PP Szymanski et al, 5S ribosomal database, PMID:11752286 38.00 38.00 37.90 5S ribosomal RNA (5S rRNA) is a component of the large ribosomal subunit in both prokaryotes and eukaryotes. In eukaryotes, it is synthesised by RNA polymerase III (the other eukaryotic rRNAs are cleaved from a 45S precursor synthesised by RNA polymerase I). In Xenopus oocytes, it has been shown that fingers 4-7 of the nine-zinc finger transcription factor TFIIIA can bind to the central region of 5S RNA. Thus, in addition to positively regulating 5S rRNA transcription, TFIIIA also stabilises 5S rRNA until it is required for transcription. NULL cmbuild -F CM SEED cmcalibrate --mpi CM cmsearch --cpu 4 --verbose --nohmmonly -T 24.99 -Z 549862.597050 CM SEQDB 712 183439 0 0 Gene; rRNA; Published; PMID:11283358 7946 0 0.59496 -5.32219 1600000 213632 305 119 1 -3.78120 0.71822 2013-10-03 20:41:44 2016-04-21 23:07:03
This is the first line and the result of the extraction from the listoflists is :
RF00002
5_8S_rRNA
5.8S ribosomal RNA
Griffiths-Jones SR, Mifsud W
5.8S ribosomal RNA (5.8S rRNA) is a component of the large subunit of the eukaryotic ribosome. It is transcribed by RNA polymerase I as part of the 45S precursor that also contains 18S and 28S rRNA. Functionally, it is thought that 5.8S rRNA may be involved in ribosome translocation [2]. It is also known to form covalent linkage to the p53 tumour suppressor protein [3]. 5.8S rRNA is also found in archaea.
Try adding empty file __init__.py (double underscore) into your /scipts folder and run with:
python manage.py runscript scipts.familiespopulate
Apart from adding init.py you are not supposed to pass any parameters in the run method.
def run():
<your code goes here>
Thanks for the useful comments.
I modified my code in this way:
import csv
from browse.models import families
def run():
file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
listoflists = list(csv.reader(open(file_path, 'r'),delimiter='\t'))
print(listoflists)
for row in listoflists:
families.objects.create(
rfam_acc=row[0],
rfam_id=row[1],
description=row[3],
author=row[4],
comment=row[9],
)
This is all. Now it worked smoothly.
I want to confirm to everyone that my file: familiespopulate.py was in the folder script with the file init.py
The problem seemed to be resolved when I put
file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
Inside the run function, removing the parameter file_path from run(file_path).
Another modify to my code was the argument r inside open(file_path, 'r'), before it was open(file_path, 'rb') that should corrispond to read binary.
I was also getting exactly the same error, I tried all of the solution above but unfortunately did not worked for me. Then I realized my mistake, and I found it.
Inside the script file (which is inside the script/ folder) I used different name for the function, which should be named as 'run'. So, make sure you checked it as well, if you get this error.
Here you can read more about "runscript"

Hadoop commands from python script?

I have multiple hadoop commands to be run and these are going to be invoked from a python script. Currently, I tried the following way.
import os
import xml.etree.ElementTree as etree
import subprocess
filename = "sample.xml"
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)
tree = etree.parse(__fullpath__)
root = tree.getroot()
hivetable = root.find("hivetable").text
dburl = root.find("dburl").text
username = root.find("username").text
password = root.find("password").text
tablename = root.find("tablename").text
mappers = root.find("mappers").text
targetdir = root.find("targetdir").text
print hivetable
print dburl
print username
print password
print tablename
print mappers
print targetdir
p = subprocess.call(['hadoop','fs','-rmr',targetdir],stdout = subprocess.PIPE, stderr = subprocess.PIPE)
But, the code is not working.It is neither throwing an error not deleting the directory.
I suggest you slightly change your approach, or this is how I'm doing it. I make use of python library import commands which then depends how you will use it (https://docs.python.org/2/library/commands.html).
Here is a lil demo:
import commands as com
print com.getoutput('hadoop fs -ls /')
This gives you output like (depending on what you have in the HDFS dir )
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh: line 25: /Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home: Is a directory
Found 2 items
drwxr-xr-x - someone supergroup 0 2017-03-29 13:48 /hdfs_dir_1
drwxr-xr-x - someone supergroup 0 2017-03-24 13:42 /hdfs_dir_2
Note: the lib commands doesn't work with python 3 (to my knowledge), I'm using python 2.7.
Note: Be aware of the limitation of commands
If you will use subprocess which is the equivalent to commands for python 3 then you might consider to find a proper way to deal with your 'pipelines'. I find this discussion useful in that sense: (subprocess popen to run commands (HDFS/hadoop))
I hope this suggestion helps you!
Best

python + wx & uno to fill libreoffice using ubuntu 14.04

I collected user data using a wx python gui and than I used uno to fill this data into an openoffice document under ubuntu 10.xx
user + my-script ( +empty document ) --> prefilled document
After upgrading to ubuntu 14.04 uno doesn't work with python 2.7 anymore and now we have libreoffice instead of openoffice in ubuntu. when I try to run my python2.7 code, it says:
ImportError: No module named uno
How could I bring it back to work?
what I tried:
installed https://pypi.python.org/pypi/unotools v0.3.3
sudo apt-get install libreoffice-script-provider-python
converted the code to python3 and got uno importable, but wx is not importable in python3 :-/
ImportError: No module named 'wx'
googled and read python3 only works with wx phoenix
so tried to install: http://wxpython.org/Phoenix/snapshot-builds/
but wasn't able to get it to run with python3
is there a way to get the uno bridge to work with py2.7 under ubuntu 14.04?
Or how to get wx to run with py3?
what else could I try?
Create a python macro in LibreOffice that will do the work of inserting the data into LibreOffice and then in your python 2.7 code envoke the macro.
As the macro is running from with LibreOffice it will use python3.
Here is an example of how to envoke a LibreOffice macro from the command line:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
##
# a python script to run a libreoffice python macro externally
# NOTE: for this to run start libreoffice in the following manner
# soffice "--accept=socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;" --writer --norestore
# OR
# nohup soffice "--accept=socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;" --writer --norestore &
#
import uno
from com.sun.star.connection import NoConnectException
from com.sun.star.uno import RuntimeException
from com.sun.star.uno import Exception
from com.sun.star.lang import IllegalArgumentException
def uno_directmacro(*args):
localContext = uno.getComponentContext()
localsmgr = localContext.ServiceManager
resolver = localsmgr.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext )
try:
ctx = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
except NoConnectException as e:
print ("LibreOffice is not running or not listening on the port given - ("+e.Message+")")
return
msp = ctx.getValueByName("/singletons/com.sun.star.script.provider.theMasterScriptProviderFactory")
sp = msp.createScriptProvider("")
scriptx = sp.getScript('vnd.sun.star.script:directmacro.py$directmacro?language=Python&location=user')
try:
scriptx.invoke((), (), ())
except IllegalArgumentException as e:
print ("The command given is invalid ( "+ e.Message+ ")")
return
except RuntimeException as e:
print("An unknown error occurred: " + e.Message)
return
except Exception as e:
print ("Script error ( "+ e.Message+ ")")
print(e)
return
return(None)
uno_directmacro()
And this is the corresponding macro code within LibreOffice called "directmacro.py" and stored in the User area for libreOffice macros (which would normally be $HOME/.config/libreoffice/4/user/Scripts/python :
#!/usr/bin/python
from com.sun.star.awt.MessageBoxButtons import BUTTONS_OK, BUTTONS_OK_CANCEL, BUTTONS_YES_NO, BUTTONS_YES_NO_CANCEL, BUTTONS_RETRY_CANCEL, BUTTONS_ABORT_IGNORE_RETRY
from com.sun.star.awt.MessageBoxButtons import DEFAULT_BUTTON_OK, DEFAULT_BUTTON_CANCEL, DEFAULT_BUTTON_RETRY, DEFAULT_BUTTON_YES, DEFAULT_BUTTON_NO, DEFAULT_BUTTON_IGNORE
from com.sun.star.awt.MessageBoxType import MESSAGEBOX, INFOBOX, WARNINGBOX, ERRORBOX, QUERYBOX
def directmacro(*args):
import socket, time
class FontSlant():
from com.sun.star.awt.FontSlant import (NONE, ITALIC,)
#get the doc from the scripting context which is made available to all scripts
desktop = XSCRIPTCONTEXT.getDesktop()
model = desktop.getCurrentComponent()
text = model.Text
tRange = text.End
cursor = desktop.getCurrentComponent().getCurrentController().getViewCursor()
doc = XSCRIPTCONTEXT.getDocument()
parentwindow = doc.CurrentController.Frame.ContainerWindow
# your cannot insert simple text and text into a table with the same method
# so we have to know if we are in a table or not.
# oTable and oCurCell will be null if we are not in a table
oTable = cursor.TextTable
oCurCell = cursor.Cell
insert_text = "This is text inserted into a LibreOffice Document\ndirectly from a macro called externally"
Text_Italic = FontSlant.ITALIC
Text_None = FontSlant.NONE
cursor.CharPosture=Text_Italic
if oCurCell == None: # Are we inserting into a table or not?
text.insertString(cursor, insert_text, 0)
else:
cell = oTable.getCellByName(oCurCell.CellName)
cell.insertString(cursor, insert_text, False)
cursor.CharPosture=Text_None
return None
You will of course need to adapt the code to either accept data as arguments, read it from a file or whatever.
Ideally I would say use python 3, because python 2 is becoming outdated. The switch requires quite a bit of new coding changes, but better sooner than later. So I tried:
sudo pip3 install -U --pre \
-f http://wxpython.org/Phoenix/snapshot-builds/ \
wxPython_Phoenix
However this gave me errors, and I didn't want to spend the next couple of days working through them. Probably the pre-release versions are not ready for prime time yet.
So instead, what I recommend is to switch to AOO for now. See https://stackoverflow.com/a/27980255/5100564 for instructions. AOO does not have all the latest features that LO has, but it is a good solid Office product.
Apparently it is also possible to rebuild LibreOffice with python 2 using this script: https://gist.github.com/hbrunn/6f4a007a6ff7f75c0f8b

Django: No such file or directory

I have a process that scans a tape library and looks for media that has expired, so they can be removed and reused before sending the tapes to an offsite vault. (We have some 7 day policies that never make it offsite.) This process takes around 20 minutes to run, so I didn't want it to run on-demand when loading/refreshing the page. Rather, I set up a django-cron job (I know I could have done this in Linux cron, but wanted the project to be as self-contained as possible) to run the scan, and creates a file in /tmp. I've verified that this works -- the file exists in /tmp from this morning's execution. The problem I'm having is that now I want to display a list of those expired (scratch) media on my web page, but the script is saying that it can't find the file. When the file was created, I use the absolute filename "/tmp/scratch.2015-11-13.out" (for example), but here's the error I get in the browser:
IOError at /
[Errno 2] No such file or directory: '/tmp/corpscratch.2015-11-13.out'
My assumption is that this is a "web root" issue, but I just can't figure it out. I tried copying the file to the /static/ and /media/ directories configured in django, and even in the django root directory, and the project root directory, but nothing seems to work. When it says it cant' find /tmp/file, where is it really looking?
def sample():
""" Just testing """
today = datetime.date.today() #format 2015-11-31
inputfile = "/tmp/corpscratch.%s.out" % str(today)
with open(inputfile) as fh: # This is the line reporting the error
lines = [line.strip('\n') for line in fh]
print(lines)
The print statement was used for testing in the shell (which works, I might add), but the browser gives an error.
And the file does exist:
$ ls /tmp/corpscratch.2015-11-13.out
/tmp/corpscratch.2015-11-13.out
Thanks.
Edit: was mistaken, doesn't work in python shell either. Was thinking of a previous issue.
Use this instead:
today = datetime.datetime.today().date()
inputfile = "/tmp/corpscratch.%s.out" % str(today)
Or:
today = datetime.datetime.today().strftime('%Y-%m-%d')
inputfile = "/tmp/corpscratch.%s.out" % today # No need to use str()
See the difference:
>>> str(datetime.datetime.today().date())
'2015-11-13'
>>> str(datetime.datetime.today())
'2015-11-13 15:56:19.578569'
I ended up finding this elsewhere:
today = datetime.date.today() #format 2015-11-31
inputfilename = "tmp/corpscratch.%s.out" % str(today)
inputfile = os.path.join(settings.PROJECT_ROOT, inputfilename)
With settings.py containing the following:
PROJECT_ROOT = os.path.abspath(os.path.dirname(__file__))
Completely resolved my issues.

Program for Backup - Python

Im trying to execute the following code in Python 2.7 on Windows7. The purpose of the code is to take back up from the specified folder to a specified folder as per the naming pattern given.
However, Im not able to get it work. The output has always been 'Backup Failed'.
Please advise on how I get resolve this to get the code working.
Thanks.
Code :
backup_ver1.py
import os
import time
import sys
sys.path.append('C:\Python27\GnuWin32\bin')
source = 'C:\New'
target_dir = 'E:\Backup'
target = target_dir + os.sep + time.strftime('%Y%m%d%H%M%S') + '.zip'
zip_command = "zip -qr {0} {1}".format(target,''.join(source))
print('This is a program for backing up files')
print(zip_command)
if os.system(zip_command)==0:
print('Successful backup to', target)
else:
print('Backup FAILED')
See if escaping the \'s helps :-
source = 'C:\\New'
target_dir = 'E:\\Backup'