Joining many callisto images - python-2.7

c1 = CallistoSpectrogram.read('BIR_20110922_101500_01.fit')
c2 = CallistoSpectrogram.read('BIR_20110922_103000_01.fit')
d = CallistoSpectrogram.join_many([c1, c2])
If I try to join approximately 40 files like this, it throws the following error:
ValueError: Too large gap.
Is there a limit on the number of files?

This error comes from the sunpy package you are using; your question is really about that package rather than Python itself, so it is worth tagging accordingly.
But we can see what's going on by looking at the source, e.g. here. It shows that the ValueError is raised when two adjacent spectra are separated by more than the maxgap parameter, which defaults to zero.
So one fix might be simply to pass in maxgap=None:
d = CallistoSpectrogram.join_many([c1, c2], maxgap=None)
That assumes you don't mind the gaps, of course.
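For joining roughly 40 files at once, a minimal sketch along these lines should work (the glob pattern is an assumption; adjust it to your file naming, and keep whatever import you already use for CallistoSpectrogram):

import glob

# Read every matching file and join them, tolerating gaps between spectra.
files = sorted(glob.glob('BIR_20110922_*_01.fit'))
spectra = [CallistoSpectrogram.read(f) for f in files]
joined = CallistoSpectrogram.join_many(spectra, maxgap=None)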

Converting segments of large .cif files to smaller .pdb files

I'm trying to carve out some binding sites with ligands from .cif files of ribosome crystal structures, and have encountered an annoying problem involving a type error.
TypeError: %c requires int or char
Using the code below,
from Bio.PDB import *
from Bio import PDB

class save_res(Select):
    def accept_residue(self, residue):
        if residue in keep_res_list:
            print(residue)
            return 1
        else:
            return 0

keep_res_list = []
parser = MMCIFParser()
structure = parser.get_structure("1vvj.cif", "./1vvj.cif")
structure = structure[0]
atom_list = Selection.unfold_entities(structure, "A")  # A for atoms
ns = NeighborSearch(atom_list)

for residue in structure.get_residues():
    if residue.get_resname() == "PAR":
        for atom in residue:
            center = atom.get_coord()
            neighbors = ns.search(center, 5.0)
            neighbor_residue_list = Selection.unfold_entities(neighbors, "R")
            for res in neighbor_residue_list:
                if res not in keep_res_list:
                    keep_res_list.append(res)

io = PDBIO()
io.set_structure(structure)
io.save("1vvj_bs.pdb", save_res())
gives me the error:
File "/scratch/software/anaconda3/envs/my-devel-3.6/lib/python3.6/site-packages/Bio/PDB/PDBIO.py", line 112, in _get_atom_line
return _ATOM_FORMAT_STRING % args
TypeError: %c requires int or char
This code works fine when I change the PDB ID to 1fyb, which contains the same ligand ID.
I'm thinking the problem stems from the large number of chains and their IDs in the original file. Am I completely wrong in this assumption, or does anyone know how to fix this? I've been trying to find a way to rename the chain IDs, but haven't found a viable method of doing so.
Your help is appreciated.
The chain name format in _ATOM_FORMAT_STRING is %c, while in this case you have a chain named QA.
Chain names in PDB files were traditionally single characters.
But there are only so many letters and digits. For ribosomes it's necessary to use longer names. The PDB format has room for a second letter -- the empty column to the left of the one-character chain name. Many programs support it, but not all, and it is not part of the official specification.
So you can either use PDB files with 2-character chains (if the rest of your workflow supports it) or rename chains in the output (your output is only a tiny part of the original structure).
Here is how to do it in gemmi:
import gemmi

structure = gemmi.read_structure('1vvj.cif')
model = structure[0]
ns = gemmi.NeighborSearch(model, structure.cell, 5.0).populate()
for chain in model:
    for residue in chain:
        if residue.name == 'PAR':
            for atom in residue:
                for nb in ns.find_neighbors(atom):
                    nb.to_cra(model).residue.flag = 'y'
sel = gemmi.Selection().set_residue_flags('y')
new_structure = sel.copy_structure_selection(structure)
#new_structure.remove_empty_chains()
#new_structure.shorten_chain_names()
new_structure.write_minimal_pdb('1vvj-par.pdb')
The two commented-out lines rename the chains.
One difference compared with your code is that NeighborSearch in gemmi is symmetry-aware: it also finds nearby atoms from symmetry mates. In BioPython you search only the asymmetric unit (ASU).
Both are different from the biological assembly --
PDB-101 covers this nicely.
If you'd like to search the ASU only, replace structure.cell with gemmi.UnitCell() above, i.e. don't pass the unit cell information.
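For example, the line from the snippet above would then become:

# Search only within the asymmetric unit by passing an empty unit cell.
ns = gemmi.NeighborSearch(model, gemmi.UnitCell(), 5.0).populate()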
(You can ask such questions on bioinformatics.SE -- it should get an answer sooner there.)

glmmadmb no longer supported in lsmeans. Alternative packages?

My data is zero-inflated, so I'm running a zero-inflated model using glmmadmb:
Model3z <- glmmadmb(Count3 ~ Light3 + (1|Site3), zeroInflation = T, family= "poisson", data = dframe3)
However, when I try to do pairwise comparisons of the different light types in this model with pwcs3 <- lsmeans(Model3z, "Light"), I get the error message:
Error in ref_grid(object, ...) :
Can't handle an object of class “glmmadmb”
Use help("models", package = "emmeans") for information on supported models.
When I check the emmeans package website, it says that glmmadmb is no longer supported.
I've switched to pscl and the zeroinfl function but am unsure how to restructure my code to fit the pscl format. Typing in P <- zeroinfl(Count3 ~ Light3 + (1|Site3), family = poisson, data = dframe3) gives the error message:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In Ops.factor(1, Site3) : ‘|’ not meaningful for factors
Is there another way of using glmmadmb with lsmeans? If not, does anyone know what zero-inflated model code in pscl is supposed to look like? Thanks.

Trying to import dictionary to work with from a url; 'unicode' object not callable

I'm new to coding and have searched as best I can to find out how to solve this before asking.
I'm trying to pull information from the poloniex.com REST API, which is in JSON format, I believe. I can import the data and work with it a little bit, but when I try to call and use the elements in the contained dictionaries, I get "'unicode' object not callable". How can I use this information? The end goal with this data is to pull the "BTC" (volume) value for each coin pair, test whether it is < 100, and if not, append it to a new list.
The data is presented like this, or you can see for yourself at https://poloniex.com/public?command=return24hVolume:
{"BTC_LTC":{"BTC":"2.23248854","LTC":"87.10381314"},"BTC_NXT":{"BTC":"0.981616","NXT":"14145"}, ... "totalBTC":"81.89657704","totalLTC":"78.52083806"}
The code I've been trying to get working currently looks like this (I've tried to iterate over the information I want a million different ways, so I don't know what example to give for that part, but this is how I am importing the data):
returnvolume = urllib2.urlopen(urllib2.Request('https://poloniex.com/public?command=return24hVolume'))
coinvolume = json.loads(returnvolume.read())
coinvolume = dict(coinvolume)
No matter how I try to use the data I've pulled, I get an error stating:
"unicode' object not callable."
I'd really appreciate a little help. I'm concerned I may be approaching this the wrong way, since I haven't been able to get anything to work, or maybe I'm just missing something rudimentary; I'm not sure.
Thank you very much for your time!
Thanks to another user, downshift, I have discovered the answer!
d = {}
for k, v in coinvolume.items():
    try:
        if float(v['BTC']) > 100:
            d[k] = v
    except KeyError:
        d[k] = v
    except TypeError:
        if v > 100:
            d[k] = k
This creates a new dict, d, and adds every coin pair with a 'BTC' volume > 100 to it.
Thanks again downshift, and I hope this helps others as well!

Compare two dictionaries, one with a list of float values per key, the other with one value per key (python)

I have a query sequence that I blasted online using NCBIWWW.qblast. In my XML blast result file I obtained, for the query sequence, a list of hits (i.e. gi|). Each hit or gi| has multiple hsps. I made a dictionary my_dict1 where I placed the gi| as the key and appended the bit scores as the value, so there are multiple values for each key.
my_dict1 = {
    'gi|1002819492|': [437.702, 384.47, 380.86, 380.86, 362.83],
    'gi|675820360|': [2617.97, 2614.37, 122.112],
    'gi|953764029|': [414.258, 318.66, 122.112, 86.158],
    'gi|675820410|': [450.653, 388.08, 386.27]}
Then I looked for the max value for each key using:
for key, value in my_dict1.items():
    max_value = max(value)
And made a second dictionary my_dict2:
my_dict2 = {
    'gi|1002819492|': 437.702,
    'gi|675820360|': 2617.97,
    'gi|953764029|': 414.258,
    'gi|675820410|': 450.653}
I want to compare both dictionaries so I can extract the hsp with the highest bit score. I am also including other parameters like query coverage and identity percentage (not shown here). The goal is to get the best gi| with the highest bit score, coverage and identity percentage.
I tried many things to compare both dictionaries, like this:
First code:
matches[]
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Second code:
if hit_id not in matches.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Third code:
intersection = set(set(my_dict1.items()) & set(my_dict2.items()))
However, I always end up with 2 types of errors:
1) TypeError: list indices must be integers, not unicode
2) ... float not iterable ...
Please, I need some help and guidance. Thank you very much in advance for your time. Best regards.
It's not clear what you're trying to do. What is hit_id? What is bit_score? It looks like your second dict is always going to have the same keys as your first if you're creating it by pulling the max value for each key of the first dict.
You say you're trying to compare them, but don't really state what you're actually trying to do. Find those with values under a certain max? Find those with the highest max?
Your first code doesn't work because I'm assuming you're trying to use a dict key value as an index to matches, which you define as a list. That's probably where your first error is coming from, though you haven't given the lines where the error is actually occurring.
See in-code comments below:
# First off, this needs to be a dict.
matches{}
# This will never happen if you've created these dicts as you stated.
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score  # Not clear what bit_score is?
else:
    # Also not sure what you're trying to do here. This will assign a tuple
    # to matches with whatever the value of matches[hit_id] is and bit_score.
    matches = matches[hit_id], bit_score
Regardless, we really need more information and the full code to figure out your actual goal and what's going wrong.
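That said, assuming hit_id and bit_score are meant to come from iterating over my_dict1, a dict-based version of your first snippet might look roughly like this:

matches = {}
for hit_id, scores in my_dict1.items():
    bit_score = max(scores)
    # Keep only the highest bit score seen for each hit.
    if hit_id not in matches or bit_score > matches[hit_id]:
        matches[hit_id] = bit_score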

What causes "UserWarning: Discarded range with reserved name" - openpyxl

I have a simple Excel sheet with names of cities in column A, and I want to extract them and put them in a list:
def getCityfromEXCEL():
    wb = load_workbook(filename='test.xlsx', read_only=True)
    ws = wb['Sheet1']
    cityList = []
    for i in range(2, ws.get_highest_row()+1):
        acell = "A"+str(i)
        cityString = ws[acell].value
        city = ftfy.fix_text_encoding(cityString)
        cityList.append(city)

getCityfromEXCEL()
That worked perfectly with a small file (70 rows). Now I'm processing a big file (8300 rows) and it gives me this error:
/Library/Python/2.7/site-packages/openpyxl/workbook/names/named_range.py:121: UserWarning: Discarded range with reserved name
warnings.warn("Discarded range with reserved name")
but it does not abort. It just does not seem to continue any further. Can someone tell me what might cause the error? Is it something in the .xlsx? Any special hints on what I can look for?
It's supposed to be a friendly warning letting you know that some of the defined names are being lost when reading the file. Warnings in Python are not exceptions but informational notices.
Support for defined names is essentially limited to references to cell ranges in openpyxl at the moment, but they can refer to lots of other things like printing settings. However, if the objects/values they refer to are not preserved by openpyxl and the file is saved and later opened by Excel, it might complain about the missing objects.
If you want to ignore it:
import warnings
warnings.simplefilter("ignore")
wb = load_workbook(path)
warnings.simplefilter("default")
In my case this warning shows up when filtering is enabled on one of my worksheets. I wanted to suppress the warning so that it didn't bother my users, so I just put this line in my code before the openpyxl.load_workbook call:
warnings.simplefilter("ignore")