How to convert Ensembl .gff3 to 12-column .bed

I am trying to use the geneBody_coverage.py script from RSeQC, which requires a tab-separated 12-column .bed file as a reference. To produce one, I used the gff2bed script to convert a .gff3 file from Ensembl to .bed format. When I run it, I only get errors saying the file is not in 12-column format. A colleague told me he also tried gff2bed on Ensembl files and the format was incorrect for him as well. Are there any solutions to this?
I have tried the same thing with a different .gff3 Ensembl file with the same result. I've also tried gtf2bed with the same result.

I figured something out. I used the gff3ToGenePred tool followed by genePredToBed, both from the UCSC utilities. This outputs a proper 12-column .bed.
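The two-step workaround above can be sketched as a small wrapper around the UCSC command-line tools. This is a sketch, not a definitive recipe: it assumes gff3ToGenePred and genePredToBed are on your PATH, and all file names (including the intermediate genePred file) are placeholders.

```python
import subprocess

def build_commands(gff3_path, bed_path, genepred_path="genes.genePred"):
    # Step 1: gff3ToGenePred converts the Ensembl GFF3 to genePred format.
    # Step 2: genePredToBed converts the genePred file to 12-column BED.
    return [
        ["gff3ToGenePred", gff3_path, genepred_path],
        ["genePredToBed", genepred_path, bed_path],
    ]

def gff3_to_bed12(gff3_path, bed_path):
    for cmd in build_commands(gff3_path, bed_path):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a tool fails
```

Usage would look like `gff3_to_bed12("Homo_sapiens.GRCh38.gff3", "genes.bed12")`, after which the output should pass RSeQC's 12-column check.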

Related

why is django-dbbackup in .psql.bin format? Can I decode it?

I just installed django-dbbackup. Everything is working as per the docs (linked).
One thing slightly puzzles me. Why does it dump into a binary format which I don't know how to read? (.psql.bin). Is there a Postgres command to de-bin it?
I found by Googling, that it's possible to get a text dump by adding to settings.py
DBBACKUP_CONNECTOR_MAPPING = {
    'django.db.backends.postgresql': 'dbbackup.db.postgresql.PgDumpConnector',
}
The text dump is about 4x bigger, but after gzip'ping the file it's about 0.7x the size of the binary dump, and after bzip2, about 0.5x.
However, this setting appears to be undocumented, and I don't like relying on undocumented behaviour for backups! (same reason I want to be able to look at the file :-)
Why does it dump into a binary format which I don't know how to read? (.psql.bin).
You'll get a .psql.bin when using PgDumpBinaryConnector, which is the default for Postgres databases.
Is there a Postgres command to de-bin it?
The magic difference between PgDumpConnector and PgDumpBinaryConnector is that the latter passes --format=custom to pg_dump, which is documented as (emphasis mine):
Output a custom-format archive suitable for input into pg_restore. Together with the directory output format, this is the most flexible output format in that it allows manual selection and reordering of archived items during restore. This format is also compressed by default.
IOW, I don't think there's an off-the-shelf de-binning command for it other than pg_restore-ing and pg_dump-ing back out as regular SQL, because you're not supposed to read it if you're not PostgreSQL.
Of course, the format is de-facto documented in the source for pg_dump...
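That said, the round trip is simpler than it sounds: pg_restore with -f emits plain SQL from a custom-format archive without needing a live database. A minimal sketch, assuming pg_restore is on PATH; the file names are placeholders:

```python
import subprocess

def build_debin_command(dump_path, sql_path):
    # pg_restore reads the custom-format archive and, with -f,
    # writes the reconstructed plain-SQL script to a file
    # instead of restoring into a database.
    return ["pg_restore", "-f", sql_path, dump_path]

def debin(dump_path, sql_path):
    subprocess.run(build_debin_command(dump_path, sql_path), check=True)
```

So `debin("backup.psql.bin", "backup.sql")` would give you a readable SQL file to inspect.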

text files are displayed differently in different clients

I have a C++ program that creates a .txt file and writes output to it. When I open it with different clients, in this case CLion, Atom, and Gedit, the file is displayed differently in each. I tried googling for a solution or an explanation, but no luck so far.
(Screenshots showing the same file in Atom, CLion, and Gedit.)
What is actually the problem? The way I am writing the output? The client? The encoding?
Extra: What would be the best format to output the displayed data? .txt, .log, others?
Thanks!

How to use .rec format for training in MXNet C++ implementation?

The C++ examples for MXNet contain model-training examples using MNISTIter and the MNIST data set (.idx3-ubyte or .idx1-ubyte). However, the same code actually recommends using the im2rec tool to produce the data, and that produces a different format, .rec. It looks like the .rec format contains both images and labels in the same file, because im2rec takes a prepared .lst file containing both (index, label, and image file name on each line).
I wrote code like
auto val_iter = MXDataIter("ImageRecordIter");
setDataIter(&val_iter, "Train",
            vector<string>{"output_train.rec", "output_validate.rec"},
            batch_size);
with all files present, but it fails with a segmentation fault because four files are still required in the vector. But why? Shouldn't the labels be inside the file now?
Digging more into the code, I found that setDataIter actually sets the parameters. The parameters for ImageRecordIter can be found here. I tried setting parameters like path_imgrec and path.imgrec, then calling .CreateDataIter(), but none of this helped: segmentation fault on the first attempt to use the iterator.
I was not able to find a single example anywhere on the Internet of how to train an MXNet neural network in C++ using the .rec file format for the training and validation sets. Is it possible? The only workaround I found is to try the original MNIST tools that produce files covered by the MNIST examples.
Eventually I used Mnisten to produce a matching data set, so that my input format is now the same as the MXNet examples use. Mnisten is a good tool to work with; just don't forget that it normalizes grayscale pixels into the 0..1 range (no longer 0..255).
It is a command-line tool, but with all the C++ code available (and there is not really a lot of it), the converter can also be integrated into an existing project to handle various specifics. I have never been affiliated with this project.
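That range mismatch is easy to trip over when mixing data sources, so it's worth pinning down. The normalization Mnisten applies amounts to the following (the helper name is mine, not Mnisten's):

```python
def normalize(pixels):
    # Map 0..255 grayscale values into the 0..1 range,
    # matching what Mnisten emits.
    return [p / 255.0 for p in pixels]

print(normalize([0, 51, 255]))  # [0.0, 0.2, 1.0]
```

If the network was trained on normalized data, any image fed in at inference time has to be scaled the same way, or the results will be garbage.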

Cannot export Pandas dataframe to specified file path in Python for csv and excel both

I have written a program that exports my Pandas DataFrame to a csv as well as an excel file. However, the problem I am facing is that, randomly, the export to both file formats fails, giving me an error stating "No such File Path or Directory".
My code is as follows:
frame3.to_csv('C:/Users/Downloads/ABC_test.csv',index=False)
writer = pd.ExcelWriter('C:/Users/Downloads/ABCD.xlsx', engine='openpyxl')
frame3.to_excel(writer, sheet_name='Sheet1')
writer.save()
The major issue is that this code sometimes works and sometimes does not! Going by what others have posted here, I tried joining the output directory explicitly:
pth1 = os.path.join(r'C:/Users/Downloads/FinalProgram/', output_filename)
frame3.to_csv(pth1)
Sadly, this has no effect on this stubborn error. Would appreciate any help / insights possible on the matter.
Forgot to update - I figured out a way around this particular problem:
Simply set the working directory of the program to the output directory (as shown by the command below) before calling the to_csv function.
os.chdir('F:/Codelah/')
On a side note, this was an issue I primarily faced on Windows OS - Ubuntu worked like a charm and did not require this workaround!
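An alternative to changing the working directory is to make sure the target directory exists before writing, since to_csv does not create missing folders for you. A sketch of the pattern (the directory and file names here are placeholders, and the csv module stands in for the real DataFrame export):

```python
import csv
import os

out_dir = "output/reports"           # placeholder output directory
os.makedirs(out_dir, exist_ok=True)  # create it (and any parents) if missing
path = os.path.join(out_dir, "ABC_test.csv")

with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["col1", "col2"])
```

With pandas the same idea applies: call os.makedirs(out_dir, exist_ok=True) first, then frame3.to_csv(path, index=False), and the "No such file or directory" error cannot come from a missing folder.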
Hope this helps

Concerning Converting Text file to Binary File c++

I've been at it all night just trying to get a simple program to read in a text file and then copy it/write it back into a binary format.
My code looped through the text doc, read the data, put it in a buffer, and wrote it back out. Heck, I even hard-coded the data I wanted written out in binary.
I used fstream, ofstream, example: fp1.open("student.dat",ios::binary);
and was reading up on several different sites such as:
http://www.functionx.com/cpp/articles/serialization.htm
http://www.cppforschool.com/tutorial/files2.html
and I had working code, but when I opened the .bin file in Notepad++ I saw that my text data still looked like text and wasn't really 'converted' to any hexadecimal format, or anything really. The numbers were, and I double-checked they were accurate by, y'know, a little website where you type in a number and it spits out the hex.
I was so fed up over why my text wasn't converting that I destroyed all my code and tried to start over (hence the lack of examples).
So, my question, finally, is: why wasn't the text changing in any way? Is this normal for a binary file written this way? I've even used pre-made code examples and it all came out the same. Am I just expecting it to all look like 1s and 0s when really it was working all along?
My main project is to convert an .OBJ file to binary data, but really how should I be looking at this? How should this binary file look?
Any advice would be greatly appreciated!!!
Thank you!
I was just using chars and strings and wasn't seeing a difference. Once I started using other data types, it became apparent that there was a change. I understand now that .OBJ and .txt are text-based file formats; I just expected a bigger change. I used cplusplus.com and reviewed what I needed to know. Thank you for trying to help, I guess!
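The effect described above can be seen concretely (illustrated here in Python rather than C++, but the bytes tell the same story): a string has no separate "binary form", because its bytes already are the text, while a fixed-width integer serializes to its raw machine representation.

```python
import struct

n = 1234
text_bytes = str(n).encode("ascii")  # b'1234': one byte per digit character
raw_bytes = struct.pack("<i", n)     # b'\xd2\x04\x00\x00': little-endian 32-bit int

print(text_bytes)  # b'1234'
print(raw_bytes)   # b'\xd2\x04\x00\x00'

# Characters round-trip unchanged: writing "abc" with ios::binary in C++
# produces exactly the bytes 'a', 'b', 'c', which is why it still reads
# as text in Notepad++.
print("abc".encode("ascii"))  # b'abc'
```

So opening the file in an editor and seeing readable text for the char data, but "garbage" for the ints, is exactly the expected behaviour.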