Django-extension runscript No (valid) module for script - django

I'm trying to create a script that will populate my model families with information extracted from a text file.
This is my first post in StackOverflow, please be gentle, sorry if the question is not well expressed or not correctly formatted.
I'm using Django 1.9 on Python 3.5, with django-extensions installed.
This is my model: it's in an app called browse
from django.db import models
from django_extensions.db.models import TimeStampedModel
class families(TimeStampedModel):
    rfam_acc = models.CharField(max_length=7)
    rfam_id = models.CharField(max_length=40)
    description = models.CharField(max_length=75)
    author = models.CharField(max_length=50)
    comment = models.CharField(max_length=500)
    rfam_URL = models.URLField()
Here is my script, familiespopulate.py, located in the PROJECT_ROOT/scripts directory.
import csv
from browse.models import families
file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
def run(file_path):
    listoflists = list(csv.reader(open(file_path, 'rb'), delimiter='\t'))
    for row in listoflists:
        families.objects.create(
            rfam_acc=row[0],
            rfam_id=row[1],
            description=row[3],
            author=row[4],
            comment=row[9],
        )
When I run this from the terminal:
python manage.py runscript familiespopulate
it returns:
No (valid) module for script 'familiespopulate' found
Try running with a higher verbosity level like: -v2 or -v3
The problem must be in importing the model families, I'm new to django, and I cannot find any solution here on StackOverflow or anywhere else online.
This is why I ask for your help!
Do you know how the model should be imported?
Or am I doing something else wrong?
An important piece of information: the script runs if I modify it to print the parameters instead of creating an object in families.
For your information and curiosity I will also post here an extract of the textfile that I'm using.
RF00001 5S_rRNA 1302 5S ribosomal RNA Griffiths-Jones SR, Mifsud W, Gardner PP Szymanski et al, 5S ribosomal database, PMID:11752286 38.00 38.00 37.90 5S ribosomal RNA (5S rRNA) is a component of the large ribosomal subunit in both prokaryotes and eukaryotes. In eukaryotes, it is synthesised by RNA polymerase III (the other eukaryotic rRNAs are cleaved from a 45S precursor synthesised by RNA polymerase I). In Xenopus oocytes, it has been shown that fingers 4-7 of the nine-zinc finger transcription factor TFIIIA can bind to the central region of 5S RNA. Thus, in addition to positively regulating 5S rRNA transcription, TFIIIA also stabilises 5S rRNA until it is required for transcription. NULL cmbuild -F CM SEED cmcalibrate --mpi CM cmsearch --cpu 4 --verbose --nohmmonly -T 24.99 -Z 549862.597050 CM SEQDB 712 183439 0 0 Gene; rRNA; Published; PMID:11283358 7946 0 0.59496 -5.32219 1600000 213632 305 119 1 -3.78120 0.71822 2013-10-03 20:41:44 2016-04-21 23:07:03
That is the first line of the file. The result of extracting from listoflists looks like this:
RF00002
5_8S_rRNA
5.8S ribosomal RNA
Griffiths-Jones SR, Mifsud W
5.8S ribosomal RNA (5.8S rRNA) is a component of the large subunit of the eukaryotic ribosome. It is transcribed by RNA polymerase I as part of the 45S precursor that also contains 18S and 28S rRNA. Functionally, it is thought that 5.8S rRNA may be involved in ribosome translocation [2]. It is also known to form covalent linkage to the p53 tumour suppressor protein [3]. 5.8S rRNA is also found in archaea.

Try adding an empty file named __init__.py (double underscores) to your scripts folder and run with:
python manage.py runscript scripts.familiespopulate

Apart from adding __init__.py, note that the run function must not take any parameters.
def run():
    <your code goes here>

Thanks for the useful comments.
I modified my code in this way:
import csv
from browse.models import families
def run():
    file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
    listoflists = list(csv.reader(open(file_path, 'r'),delimiter='\t'))
    print(listoflists)
    for row in listoflists:
        families.objects.create(
            rfam_acc=row[0],
            rfam_id=row[1],
            description=row[3],
            author=row[4],
            comment=row[9],
        )
That is all. Now it works smoothly.
I want to confirm to everyone that my file familiespopulate.py was in the scripts folder together with the file __init__.py.
The problem seemed to be resolved when I put
file_path = "/Users/work/Desktop/StructuRNA/website/scripts/RFAMfamily12.1.txt"
inside the run function and removed the parameter file_path from run(file_path).
The other change to my code was the mode argument: it is now open(file_path, 'r'), where before it was open(file_path, 'rb'), which corresponds to reading in binary mode.
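For readers wondering why the mode matters: in Python 3, csv.reader expects text (str) lines, so a file opened with 'rb' does not work with it. Below is a minimal sketch, not the poster's actual script, that reads tab-separated rows in text mode and maps the same column positions to field names. The helper name read_family_rows and the sample line are made up for illustration, and an in-memory io.StringIO stands in for the real file so the example runs without Django or the data file.

```python
import csv
import io

def read_family_rows(handle):
    """Yield one dict per tab-separated line (hypothetical helper).

    Column positions mirror the script above: 0, 1, 3, 4 and 9.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 10:
            continue  # skip short or blank lines instead of raising IndexError
        yield {
            "rfam_acc": row[0],
            "rfam_id": row[1],
            "description": row[3],
            "author": row[4],
            "comment": row[9],
        }

# Stand-in for open(file_path, "r", newline=""); this sample line is invented.
sample = io.StringIO(
    "RF00001\t5S_rRNA\t1302\t5S ribosomal RNA\tSomeone A\t5\t6\t7\t8\tA comment\n"
)
rows = list(read_family_rows(sample))
print(rows[0]["rfam_acc"], rows[0]["description"])
```

Inside run(), each dict could then be passed to families.objects.create(**fields), assuming the model fields keep the names shown in the question.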

I was getting exactly the same error. I tried all of the solutions above, but unfortunately none of them worked for me. Then I found my mistake.
Inside the script file (which is inside the scripts/ folder) I had used a different name for the function, which must be named 'run'. So make sure you check that as well if you get this error.
Here you can read more about "runscript"

Related

How to get specific fields data from string without field labels fetched from image by using tesseract-ocr

I have written a program that asks the user to upload a picture of their DMV license and reads the license details from the uploaded picture using tesseract OCR. I've got the tesseract part working, and it does well to some extent. I now have a raw string that I need to parse to fetch the user details. The problem is that some fields on the DMV license have no labels, such as name and address, and I need to fetch these details. I can't come up with an idea (maybe I can use regex, but I don't know how to get it to work). If someone has already done this, I'd love to take a look. Any suggestion would be welcome.
Image:
CODE
Here is the code to read the uploaded file and get text from image using tesseract.
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract
# Replace the path with the path to the tesseract installation directory on the server.
# A raw string keeps the backslashes literal.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
@csrf_exempt
def upload_dmv(request):
    if request.method == "POST":
        dmv = request.FILES['dmv']
        extracted_data = pytesseract.image_to_string(Image.open(dmv))
        print(extracted_data)
    return HttpResponse(b'OK')
Output
W YORK STALE
DRIVER Ee C{ENeSae
876 071652
BOGADO
PETER,GIOVANNI.
9520 93RD ST FL 2
OZONE PARK, NY 114116
SexM_ Height6'-02" Eyes BRO:
00806/06/1992
Expires 06/06/2018
ENONE
RB
Issued 03/09/2017
Usa
~e-h.
Crecutive Deputy Comminsioner of Motor
Class E
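One possible direction, offered only as a rough sketch: labelled fields such as Expires and Issued can be matched by their label, while unlabelled fields can sometimes be recognised by their shape or by their position relative to a recognisable line. The function name parse_license, the field names, and the patterns below are all assumptions based on this single sample output, so they would need tuning against more scans.

```python
import re

# A few lines copied from the OCR output above.
OCR_TEXT = """\
BOGADO
PETER,GIOVANNI.
9520 93RD ST FL 2
OZONE PARK, NY 114116
Expires 06/06/2018
Issued 03/09/2017
"""

DATE = r"(\d{2}/\d{2}/\d{4})"

def parse_license(text):
    """Return the fields that could be located; missing ones stay None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = {"expires": None, "issued": None,
              "city_state_zip": None, "street": None}
    for i, line in enumerate(lines):
        m = re.match(r"Expires\s+" + DATE, line)
        if m:
            fields["expires"] = m.group(1)
        m = re.match(r"Issued\s+" + DATE, line)
        if m:
            fields["issued"] = m.group(1)
        # Unlabelled "CITY, ST 12345" line, recognised by its shape.
        if re.match(r"^[A-Z .'-]+,\s*[A-Z]{2}\s+\d{5,}", line):
            fields["city_state_zip"] = line
            if i > 0:
                # Assumption: the street address sits on the line just above.
                fields["street"] = lines[i - 1]
    return fields

fields = parse_license(OCR_TEXT)
print(fields)
```

The positional guess for the street line is the fragile part; OCR noise can insert or drop lines, so validating each candidate against its own pattern would be safer.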

Running locally blastn against nt db thru python script

I have a FASTA file with sequences that I want to BLAST locally against the 'nt' database, which I downloaded to my computer from the NCBI website.
I downloaded BLAST 2.6.0.
In order to access blast from anywhere, I did:
gedit ~/.bashrc
export PATH=/usr/local/ncbi-blast-2.6.0+/bin:$PATH
then I did:
source ~/.bashrc
Then I downloaded 'nt' database (155.6GB) and stored it in /usr/local/blastdb
I want to run in python script this command:
from Bio.Blast.Applications import NcbiblastnCommandline
cline = NcbiblastnCommandline(query="/home/proprietaire/Desktop/JADE/stage_scripts/seq_error_fasta.fasta", db="/usr/local/blastdb/nt", evalue=0.001, out="blast_result_local.xml", outfmt=5)
But it is not working, and I don't know why. Please help me figure out what I'm doing wrong. Thank you for your help.
EDIT:
'seq_error_fasta.fasta' : is my fasta file with 64 sequences that I want to blast to 'nt' database.
My 'seq_error_fasta.fasta' contains sequences with erroneous characters such as S, J, and X, so I want to BLAST them against the 'nt' database to find the closest correct sequences.
I found out that I need to format the nt database downloaded from NCBI, so I did this:
makeblastdb -dbtype nucl -in nt
Then I added this after my cline variable in my python script:
stdout, stderr = cline()
The script is running but unfortunately I'm getting this error now:
Bus error (core dumped)
I think it's a RAM problem, so I thought I should shrink the 'nt' database by keeping only the bacterial sequences. I looked on NCBI for a single bacteria-only database, but there are many separate databases for different species, more than a thousand of them.
I also tried blast online using this script:
from Bio import SeqIO
from Bio.Blast import NCBIWWW
f = open('output_blast.xml','w')
for rec in SeqIO.parse(open("seq_error_fasta.fasta"), 'fasta'):
    result_handle = NCBIWWW.qblast("blastn", "nt", rec.format("fasta"), format_type="XML", alignments=1, perc_ident=95, expect= 0.001)
    f.write(result_handle.read())
f.close()
but this only runs one query sequence and returns all hits, although I specified 1 hit and 95% identity.
This is driving me crazy lollll Please help
Download NCBI nr database:
$ mkdir db
$ cd db
$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr*
Run blast:
import io
import shlex
import subprocess
BLAST_OUTFMT6 = """\
'6 qacc sacc pident length mismatch gapopen qstart qend sstart send evalue bitscore qseq sseq'\
"""
BLAST_OUTFMT6_COLUMN_NAMES = [
    'query_id', 'subject_id', 'pc_identity', 'alignment_length', 'mismatches', 'gap_opens',
    'q_start', 'q_end', 's_start', 's_end', 'evalue', 'bitscore', 'qseq', 'sseq',
]
def blastp(sequence, db, evalue=0.001, max_target_seqs=100000):
    system_command = (
        'blastp -db {db} -outfmt {outfmt} -evalue {evalue} -max_target_seqs {max_target_seqs}'
        .format(db=db, outfmt=BLAST_OUTFMT6, evalue=evalue, max_target_seqs=max_target_seqs)
    )
    cp = subprocess.Popen(
        shlex.split(system_command),
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)
    result, error_message = cp.communicate(sequence)
    if error_message.strip():
        print("Error: {}".format(error_message))
    return result
if __name__ == '__main__':
    result = blastp('AAAAAAAAAAAAAA', db='/path/to/db/nr')
    print(result)
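The string returned by blastp above is tab-separated text with one hit per line, in the column order given by BLAST_OUTFMT6_COLUMN_NAMES. A minimal sketch of turning it into records follows; the function name parse_blast_tabular and the example line are made up for illustration, and only two numeric columns are converted to keep it short.

```python
import csv
import io

BLAST_OUTFMT6_COLUMN_NAMES = [
    'query_id', 'subject_id', 'pc_identity', 'alignment_length', 'mismatches', 'gap_opens',
    'q_start', 'q_end', 's_start', 's_end', 'evalue', 'bitscore', 'qseq', 'sseq',
]

def parse_blast_tabular(result):
    """Split tab-separated BLAST output into a list of dicts (hypothetical helper)."""
    reader = csv.DictReader(
        io.StringIO(result),
        fieldnames=BLAST_OUTFMT6_COLUMN_NAMES,
        delimiter="\t",
    )
    hits = []
    for record in reader:
        # Convert the columns most often used for filtering.
        record["pc_identity"] = float(record["pc_identity"])
        record["evalue"] = float(record["evalue"])
        hits.append(record)
    return hits

# Invented example line in the 14-column layout, not real BLAST output.
example = "query1\tsubj1\t98.5\t14\t0\t0\t1\t14\t5\t18\t1e-05\t28.2\tAAAA\tAAAA\n"
hits = parse_blast_tabular(example)
good_hits = [h for h in hits if h["pc_identity"] >= 95]
print(len(good_hits))
```

Filtering on pc_identity after parsing is one way to apply an identity threshold locally, regardless of which options the search itself honoured.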

Hadoop commands from python script?

I have multiple hadoop commands to be run and these are going to be invoked from a python script. Currently, I tried the following way.
import os
import xml.etree.ElementTree as etree
import subprocess
filename = "sample.xml"
__currentlocation__ = os.getcwd()
__fullpath__ = os.path.join(__currentlocation__,filename)
tree = etree.parse(__fullpath__)
root = tree.getroot()
hivetable = root.find("hivetable").text
dburl = root.find("dburl").text
username = root.find("username").text
password = root.find("password").text
tablename = root.find("tablename").text
mappers = root.find("mappers").text
targetdir = root.find("targetdir").text
print hivetable
print dburl
print username
print password
print tablename
print mappers
print targetdir
p = subprocess.call(['hadoop','fs','-rmr',targetdir],stdout = subprocess.PIPE, stderr = subprocess.PIPE)
But the code is not working. It neither throws an error nor deletes the directory.
I suggest you change your approach slightly; this is how I do it. I use the Python commands module, and how you use it is then up to you (https://docs.python.org/2/library/commands.html).
Here is a little demo:
import commands as com
print com.getoutput('hadoop fs -ls /')
This gives you output like (depending on what you have in the HDFS dir )
/usr/local/Cellar/hadoop/2.7.3/libexec/etc/hadoop/hadoop-env.sh: line 25: /Library/Java/JavaVirtualMachines/jdk1.8.0_112.jdk/Contents/Home: Is a directory
Found 2 items
drwxr-xr-x - someone supergroup 0 2017-03-29 13:48 /hdfs_dir_1
drwxr-xr-x - someone supergroup 0 2017-03-24 13:42 /hdfs_dir_2
Note: the commands module does not exist in Python 3 (it was removed); I'm using Python 2.7.
Note: Be aware of the limitations of commands.
If you use subprocess, which replaces commands in Python 3, you will need to find a proper way to handle your pipelines. I found this discussion useful in that respect: (subprocess popen to run commands (HDFS/hadoop))
I hope this suggestion helps you!
Best
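For the subprocess route, here is a minimal Python 3 sketch that captures the exit code, stdout and stderr together. In the question, subprocess.call returns only the exit code and the piped output is never read, which is a likely reason no error message appeared. The helper name run_command is made up, and the Python interpreter is used as a stand-in command so the example runs without Hadoop installed.

```python
import subprocess
import sys

def run_command(args):
    """Run a command and return (returncode, stdout, stderr) as text."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.returncode, completed.stdout, completed.stderr

# Stand-in command; with Hadoop available this could instead be
# ['hadoop', 'fs', '-rmr', targetdir] as in the question.
code, out, err = run_command([sys.executable, "-c", "print('hello')"])
if code != 0:
    print("command failed:", err)
else:
    print(out.strip())
```

Checking the return code and printing stderr on failure should at least show what hadoop itself reports.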

Manually building a deep copy of a ConfigParser in Python 2.7

Just starting in on my Python learning curve, and hitting a snag in porting some code up to Python 2.7. It appears that in Python 2.7 it is no longer possible to perform a deepcopy() on instances of ConfigParser. It also appears that the Python team isn't terribly interested in restoring such a capability:
http://bugs.python.org/issue16058
Can someone propose an elegant solution for manually constructing a deepcopy/duplicate of an instance of ConfigParser?
Many thanks, -Pete
This is just an example implementation of Jan Vlcinsky's answer, written in Python 3 (I don't have enough reputation to post this as a comment on Jan's answer). Many thanks to Jan for the push in the right direction.
To make a full (deep) copy of base_config into new_config, just do the following:
import io
import configparser
config_string = io.StringIO()
base_config.write(config_string)
# We must reset the buffer ready for reading.
config_string.seek(0)
new_config = configparser.ConfigParser()
new_config.read_file(config_string)
Based on @Toenex's answer, modified for Python 2.7:
import StringIO
import ConfigParser
# Create a deep copy of the configuration object
config_string = StringIO.StringIO()
base_config.write(config_string)
# We must reset the buffer to make it ready for reading.
config_string.seek(0)
new_config = ConfigParser.ConfigParser()
new_config.readfp(config_string)
The previous solution doesn't work in all python3 use cases. Specifically if the original parser is using Extended Interpolation the copy may fail to work correctly. Fortunately, the easy solution is to use the pickle module:
import configparser
import pickle
def deep_copy(config: configparser.ConfigParser) -> configparser.ConfigParser:
    """deep copy config"""
    rep = pickle.dumps(config)
    new_config = pickle.loads(rep)
    return new_config
If you need a new, independent copy of a ConfigParser, one option is:
start from the original ConfigParser instance
serialize its contents into a temporary file or a StringIO buffer
use that temporary file or StringIO buffer to create a new ConfigParser.
And you are done.
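The three steps above can be sketched as a small Python 3 helper. The name clone_config and its parser_kwargs parameter are made up for illustration; the keyword arguments are simply forwarded to the new parser, so the caller can pass the same interpolation setting the original was built with.

```python
import configparser
import io

def clone_config(src, **parser_kwargs):
    """Copy a ConfigParser through an in-memory buffer (hypothetical helper)."""
    buffer = io.StringIO()
    src.write(buffer)   # serialize the original
    buffer.seek(0)      # rewind so the buffer can be read back
    clone = configparser.ConfigParser(**parser_kwargs)
    clone.read_file(buffer)
    return clone

original = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation())
original.read_string("[paths]\nhome = /Users\ndocs = ${home}/docs\n")

clone = clone_config(
    original, interpolation=configparser.ExtendedInterpolation())
clone["paths"]["home"] = "/home"  # change the copy only

print(original["paths"]["docs"])
print(clone["paths"]["docs"])
```

Changing the copy leaves the original untouched, and because the copy uses the same interpolation class, the ${home} reference resolves in both.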
If you are using Python 3 (3.2+) you can use the Mapping Protocol Access to copy (actually deep copy) the sections and options of a source configuration to another ConfigParser object.
You can use read_dict() to copy the state of a configuration parser.
Here is a demo:
import configparser
# the configuration to deep copy:
src_cfg = configparser.ConfigParser()
src_cfg.add_section("Section A")
src_cfg["Section A"]["key1"] = "value1"
src_cfg["Section A"]["key2"] = "value2"
# the destination configuration
dst_cfg = configparser.ConfigParser()
dst_cfg.read_dict(src_cfg)
dst_cfg.add_section("Section B")
dst_cfg["Section B"]["key3"] = "value3"
To display the resulting configuration, you can try:
import io
output = io.StringIO()
dst_cfg.write(output)
print(output.getvalue())
You get:
[Section A]
key1 = value1
key2 = value2
[Section B]
key3 = value3
After reading this page, I am more familiar with config.ini files.
My notes are as follows:
import io
import configparser
def copy_config_demo():
    with io.StringIO() as memory_file:
        memory_file.write(str(test_config_data.__doc__))  # original_config.write(memory_file)
        memory_file.seek(0)
        new_config = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
        new_config.read_file(memory_file)
        # below is just for test
        for section_name, list_item in [(section_name, new_config.items(section_name)) for section_name in new_config.sections()]:
            print('\n[' + section_name + ']')
            for key, value in list_item:
                print(f'{key}: {value}')
def test_config_data():
    """
    [Common]
    home_dir: /Users
    library_dir: /Library
    system_dir: /System
    macports_dir: /opt/local
    [Frameworks]
    Python: >=3.2
    path: ${Common:system_dir}/Library/Frameworks/
    [Arthur]
    name: Carson
    my_dir: ${Common:home_dir}/twosheds
    my_pictures: ${my_dir}/Pictures
    python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python}
    """
output:
[Common]
home_dir: /Users
library_dir: /Library
system_dir: /System
macports_dir: /opt/local
[Frameworks]
python: >=3.2
path: /System/Library/Frameworks/
[Arthur]
name: Carson
my_dir: /Users/twosheds
my_pictures: /Users/twosheds/Pictures
python_dir: /System/Library/Frameworks//Python/Versions/>=3.2
I hope this is helpful to you.

How do you shift all pages of a PDF document right by one inch?

I want to shift all the pages of an existing pdf document right one inch so they can be three hole punched without hitting the content. The pdf documents will be already generated so changing the way they are generated is not possible.
It appears iText can do this from a previous question.
What is an equivalent library (or way do this) for C++ or Python?
If it is platform dependent I need one that would work on Linux.
Update: Figured I would post a little script I wrote to do this in case anyone else finds this page and needs it.
Working code thanks to Scott Anderson's suggestion:
rightshift.py
#!/usr/bin/python2
import sys
import os
from pyPdf import PdfFileReader, PdfFileWriter
#not sure what default user space units are.
# just guessed until current document i was looking at worked
uToShift = 50;
if (len(sys.argv) < 3):
    print "Usage rightshift [in_file] [out_file]"
    sys.exit()
if not os.path.exists(sys.argv[1]):
    print "%s does not exist." % sys.argv[1]
    sys.exit()
pdfInput = PdfFileReader(file( sys.argv[1], "rb"))
pdfOutput = PdfFileWriter()
pages=pdfInput.getNumPages()
for i in range(0,pages):
    p = pdfInput.getPage(i)
    for box in (p.mediaBox, p.cropBox, p.bleedBox, p.trimBox, p.artBox):
        box.lowerLeft = (box.getLowerLeft_x() - uToShift, box.getLowerLeft_y())
        box.upperRight = (box.getUpperRight_x() - uToShift, box.getUpperRight_y())
    pdfOutput.addPage( p )
outputStream = file(sys.argv[2], "wb")
pdfOutput.write(outputStream)
outputStream.close()
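On the units question in the script's comments: the default PDF user space unit is 1/72 inch, so a one-inch shift corresponds to 72 units rather than the guessed 50. The arithmetic the loop applies to each box can be sketched on its own, without any PDF library. The names shift_box and POINTS_PER_INCH are made up for illustration, and a US Letter page of 612 x 792 units is assumed as the example.

```python
POINTS_PER_INCH = 72  # default PDF user space: 1 unit = 1/72 inch

def shift_box(lower_left, upper_right, inches):
    """Move a page box left by the given distance (hypothetical helper).

    Moving the box to the left makes the content appear shifted
    to the right, which is what the script above does.
    """
    offset = inches * POINTS_PER_INCH
    (llx, lly), (urx, ury) = lower_left, upper_right
    return (llx - offset, lly), (urx - offset, ury)

# Example: US Letter page, shifted by one inch.
new_ll, new_ur = shift_box((0, 0), (612, 792), 1)
print(new_ll, new_ur)
```

This suggests setting uToShift = 72 in the script for a one-inch shift, assuming the document does not override the default unit.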
You can try the pypdf library. In 2022 PyPDF2 was merged back into pypdf.
There are two ways to perform this task in Linux.
1: use ghostscript through gsview
look in your /root or /home for the hidden file .gsview.ini
go to section:
[pdfwrite Options]
Options=
Xoffset=0
Yoffset=0
change the value for the X axis to a convenient value (values are in postscript points, 1 inch = 72 postscript points)
so:
[pdfwrite Options]
Options=
Xoffset=72
Yoffset=0
close .gsview.ini
open your pdf file with gsview
file / convert / pdfwrite
select first odd pages and print to a new file (you can name this as odd.pdf)
now repeat same steps for even pages
open your pdf file with gsview
[pdfwrite Options]
Options=
Xoffset=-72
Yoffset=0
file / convert / pdfwrite
select first even pages and print to a new file (you can name this as even.pdf)
now you need to mix these two pdf with odd and even pages
you can use:
Pdf Transformer
http://sourceforge.net/projects/pdf-transformer/
java -jar ./pdf-transformer-0.4.0.jar <INPUT_FILE_NAME1> <INPUT_FILE_NAME2> <OUTPUT_FILE_NAME> merge -j
2: use podofobox + pdftk
first step: with pdftk, split the whole pdf document into two pdf files, one with only the odd pages and one with only the even pages
pdftk file.pdf cat 1-endodd output odd.pdf && pdftk file.pdf cat 1-endeven output even.pdf
now use podofobox, which is included in the podofo utils
http://podofo.sourceforge.net/about.html
podofobox file.pdf odd.pdf crop -3600 0 width height for odd pages and
podofobox file.pdf even.pdf crop 3600 0 width height for even pages
width and height are in postscript points x 100 and can be found with pdfinfo
e.g. if your pdf file has pagesize 482x680, then you enter
./podofobox file.pdf odd.pdf crop -3600 0 48200 68000
./podofobox file.pdf even.pdf crop 3600 0 48200 68000
then you can merge the odd and even pages into a single file with the tool already mentioned
Pdf Transformer
http://sourceforge.net/projects/pdf-transformer/
With pdfjam, the command to translate all pages 1 inch to the right is
pdfjam --offset '1in 0in' doc.pdf
The transformed document is saved to doc-pdfjam.pdf. For further options, type pdfjam --help. Currently pdfjam requires a Unix-like command prompt (Linux, Mac, or Cygwin). In Ubuntu, it can be installed with
sudo apt install pdfjam
Not a full answer, but you can use LaTeX with pdfpages:
http://www.ctan.org/tex-archive/macros/latex/contrib/pdfpages/
Multiple commandline linux tools also use this approach, for instance pdfjam uses this:
http://www2.warwick.ac.uk/fac/sci/statistics/staff/academic-research/firth/software/pdfjam
Maybe pdfjam already provides what you need.
Here is a modified version for python3.x.
First install pypdf2 via pip install pypdf2
import sys
import os
from PyPDF2 import PdfFileReader, PdfFileWriter
uToShift = 40; # amount to shift contents by. +ve shifts right
if (len(sys.argv) < 3):
    print ("Usage rightshift [in_file] [out_file]")
    sys.exit()
if not os.path.exists(sys.argv[1]):
    print ("%s does not exist." % sys.argv[1])
    sys.exit()
path=os.path.dirname(os.path.realpath(__file__))
with open(("%s\\%s" % (path, sys.argv[1])), "rb") as pdfin:
    with open(("%s\\%s" % (path, sys.argv[2])), "wb") as pdfout:
        pdfInput = PdfFileReader(pdfin)
        pdfOutput = PdfFileWriter()
        pages=pdfInput.getNumPages()
        for i in range(0,pages):
            p = pdfInput.getPage(i)
            for box in (p.mediaBox, p.cropBox, p.bleedBox, p.trimBox, p.artBox):
                box.lowerLeft = (box.getLowerLeft_x() - uToShift, box.getLowerLeft_y())
                box.upperRight = (box.getUpperRight_x() - uToShift, box.getUpperRight_y())
            pdfOutput.addPage( p )
        pdfOutput.write(pdfout)