BIOPERL. Bio::Graphics with GFF file - bioperl

I need to obtain something like this:
However, I don't know how to continue. Right now I have this:
In other words, I don't know how to add tags and the corresponding transcripts, CDS, etc.
My code right now is the following:
#!/usr/bin/perl
#use strict;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
my $panel = Bio::Graphics::Panel->new(
-length => 20000,
-width => 800
);
my $full_length = Bio::SeqFeature::Generic->new(
-start => 1,
-end => 20000,
);
$panel->add_track($full_length,
-key => "hola",
-glyph => 'arrow',
-tick => 2,
-fgcolor => 'black',
-double => 1,
);
my $track = $panel->add_track(
-glyph => 'generic',
-label => 1
);
my $track = $panel->add_track(
-glyph => 'generic',
-label => 1
);
$seq = "";
$seqlength = length($seq);
$count = 0;
while (<>) {
chomp;
next if /^\#/;
my @gff_data = split /\t+/;
next if ($gff_data[2] ne "gene");
my $feature = Bio::SeqFeature::Generic->new(
-display_name => $gff_data[8],
-score => $gff_data[5],
-start => $gff_data[3],
-end => $gff_data[4],
);
$track->add_feature($feature);
}
print $panel->png;
I've also read the CPAN documentation, but found no clue. There is a lot of information for NCBI files but nothing for GFF.
My data:
313-9640000-9660000:19634:fwd maker gene 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10
313-9640000-9660000:19634:fwd maker mRNA 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10-mRNA-1;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10
313-9640000-9660000:19634:fwd maker exon 1978 2207 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker exon 3081 3457 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker exon 3535 3700 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
Any help will be very welcome.

use Bio::Graphics;
use Bio::Tools::GFF;
use Bio::SeqFeature::Generic;
$gfffile = shift;
my $gff = Bio::Tools::GFF->new(-file => $gfffile, -gff_version => 3);
while($feature = $gff->next_feature()) {
$tag = $feature->primary_tag;
push @{$hash{$tag}}, $feature;
}
$gff->close();
my $panel = Bio::Graphics::Panel->new(
-length => 20000,
-width => 800,
-key_style => 'between',
);
my $full_length = Bio::SeqFeature::Generic->new(
-start => 1,
-end => 20000,
);
$panel->add_track($full_length,
-key => "hola",
-glyph => 'arrow',
-tick => 2,
-fgcolor => 'black',
-double => 1,
);
my @colors = qw(cyan orange blue);
my $idx = 0;
for my $tag (sort keys %hash) {
my $features = $hash{$tag};
$panel->add_track($features,
-glyph => 'generic',
-bgcolor => $colors[$idx++ % @colors],
-fgcolor => 'black',
-font2color => 'red',
-key => "${tag}s",
-bump => +1,
-height => 8,
-label => 1,
-description => 1,
);
}
print $panel->png;

What you are showing in the first screen capture is likely from GBrowse, and the track labels (I think this is what you mean by 'tags') are defined in the configuration file. The feature label can be turned on/off by setting the 'label' and 'label_feat' attributes when you create the object. You will have to manually edit the file if you don't like those long strings set as the ID by MAKER.
You can change the appearance of each feature by choosing a different glyph. For example, you chose the 'generic' glyph, so what you get is a pretty generic looking box which just shows you the location of the feature. For nice looking transcripts, take a look at the 'processed_transcript' and 'decorated_transcript' glyphs.
There are also some problems with your code, such as the duplication of certain sections, but that may come from copying and pasting the code.
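The long MAKER IDs mentioned above are percent-encoded in column 9 of the GFF3 file (for example `%3A` stands for `:`). As a rough illustration of shortening or cleaning them before plotting, here is a minimal Python sketch that groups records by feature type, much like the `%hash` keyed on `primary_tag` in the Perl code, and decodes each attribute value once. The function names `parse_attributes` and `group_by_type` are hypothetical, and the sketch assumes tab-separated columns as in standard GFF3.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Split 'key=value;key=value' and percent-decode each value once."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def group_by_type(lines):
    """Return {feature_type: [(start, end, attributes), ...]}."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 record
        grouped[cols[2]].append(
            (int(cols[3]), int(cols[4]), parse_attributes(cols[8]))
        )
    return grouped


# Two records adapted from the question's data (tabs assumed between columns).
sample = [
    "313-9640000-9660000:19634:fwd\tmaker\tgene\t1978\t7195\t.\t+\t.\t"
    "ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10",
    "313-9640000-9660000:19634:fwd\tmaker\texon\t1978\t2207\t0.48\t+\t.\t"
    "Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1",
]
features = group_by_type(sample)
print(features["gene"][0][2]["ID"])
```

Note that the `Name` values in the question's data appear to be encoded twice (`%253A`), so they would need a second `unquote` call to become readable.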

I had the same problem as you. It is not very easy to explain, but I finally solved it and my figure is pretty similar to yours.
In my case I wanted to connect some encode data in the 5' UTR of several genes, and these encode regions should be connected with dashed lines.
So I create a Bio::SeqFeature::Generic object, and the information to plot inside it is my encode regions, so these regions need to be added as sub-locations:
my $splitlocation = Bio::Location::Split->new();
foreach $encode_regions ....{
$splitlocation->add_sub_Location(Bio::Location::Simple->new(-start=>$start,-end=>$end,-strand=>$strand,-splittype=>"join"));
}
$exones = new Bio::SeqFeature::Generic(
-primary_tag => 'Encode Regions',
-location=>$splitlocation
);


How to remove double quotes from keys in RDD and split JSON into two lines?

I need to modify the data to give input to CEP system, my current data looks like below
val rdd = {"var":"system-ready","value":0.0,"objectID":"2018","partnumber":2,"t":"2017-08-25 11:27:39.000"}
I need output like
t = "2017-08-25 11:27:39.000"
Check = { var = "system-ready",value = 0.0, objectID = "2018", partnumber = 2 }
I have to write RDD map operations to achieve this; if anybody can suggest a better option, it is welcome. colCount is the number of columns.
rdd.map(x => x.split("\":").mkString("\" ="))
.map((f => (f.dropRight(1).split(",").last.toString, f.drop(1).split(",").toSeq.take(colCount-1).toString)))
.map(f => (f._1, f._2.replace("WrappedArray(", "Check = {")))
.map(f => (f._1.drop(0).replace("\"t\"", "t"), f._2.dropRight(1).replace("(", "{")))
.map(f => f.toString().split(",C").mkString("\nC").replace(")", "}").drop(0).replace("(", "")) // replacing , with \n, droping (
.map(f => f.replace("\" =\"", "=\"").replace("\", \"", "\",").replace("\" =", "=").replace(", \"", ",").replace("{\"", "{"))
Scala's JSON parser seems to be a good choice for this problem:
import scala.util.parsing.json.JSON
rdd.map( x => {
// values can be strings or numbers, so the value type is Any
JSON.parseFull(x).get.asInstanceOf[Map[String, Any]]
})
This will result in an RDD[Map[String, Any]]. You can then access the t field from the JSON, for example, using:
.map(dict => "t = "+dict("t"))
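To show the whole transformation rather than only the `t` field, here is a minimal Python sketch of the same parse-then-format idea. It assumes each RDD element is one JSON string shaped like the sample in the question; the function name `to_cep_lines` is hypothetical.

```python
import json


def to_cep_lines(record_json):
    """Turn one JSON record into the two-line 't = ...' / 'Check = {...}' form."""
    record = json.loads(record_json)
    timestamp = record.pop("t")
    # json.dumps keeps strings quoted and numbers bare, matching the target format.
    fields = ", ".join(
        f"{key} = {json.dumps(value)}" for key, value in record.items()
    )
    return f't = "{timestamp}"\nCheck = {{ {fields} }}'


row = ('{"var":"system-ready","value":0.0,"objectID":"2018",'
       '"partnumber":2,"t":"2017-08-25 11:27:39.000"}')
print(to_cep_lines(row))
```

With PySpark this could be applied as `rdd.map(to_cep_lines)`; the same structure (parse once, then format from the map) carries over to the Scala version.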

Cakephp sum() with multiplication

Is it possible to do a CakePHP find with an additional condition, or to multiply the SUM() field by a conditional variable?
I'm not sure how best to phrase the question, so I'm giving more detail below. I was able to get similar data for a single array with a trick, but I can't do it for this case. Please help me or give me an idea of how to do this. Thanks.
Using CakePHP 2.6.
I want to find data from the first table here:
$incentiveweekly=$this->Incentive->find('all',
array('conditions' =>
array(
"AND"=>array(
"Incentive.fromdate >=" => $from,
"Incentive.todate <=" => $to
)
)
)
);
Depending on the number of rows returned, I have to query another table.
Here is the result of the above find:
Array
(
[Incentive] => Array
(
[id] => 2
[target1] => 3000
[price1] => 1.5
[target2] => 6000
[price2] => 2.5
[target3] => 8000
[price3] => 3.5
[formonth] =>
[type] => 1
[fromdate] => 2016-11-13
[todate] => 2016-11-21
[updatedby] => 1
[created] => 2016-11-15 23:57:21
[modified] => 2016-11-15 23:57:21
)
)
Array
(
[Incentive] => Array
(
[id] => 3
[target1] => 3000
[price1] => 1.5
[target2] => 6000
[price2] => 2.5
[target3] => 8000
[price3] => 3.5
[formonth] =>
[type] => 1
[fromdate] => 2016-11-24
[todate] => 2016-11-28
[updatedby] => 1
[created] => 2016-11-15 23:57:21
[modified] => 2016-11-15 23:57:21
)
)
Now I want to query the second table once for each record in that result.
$byweek=array(); // Storing Customer data by Target dates in array()
foreach ($incentiveweekly as $weekly){
print_r($weekly);
$target3=$weekly['Incentive']['target3'];
$target2=$weekly['Incentive']['target2'];
$target1=$weekly['Incentive']['target1'];
$price3=$weekly['Incentive']['price3'];
$price2=$weekly['Incentive']['price2'];
$price1=$weekly['Incentive']['price1'];
$byweek[]=$customers=$this->Customer->find('all',array(
'fields'=>array(
'SUM(amount) AS amount',
'created_by'
),
'group' => 'Customer.created_by',
'conditions' => array(
"AND" =>array(
"Customer.created >=" => $weekly['Incentive']['fromdate'],
"Customer.created <=" => $weekly['Incentive']['todate']
)
),
'recursive'=>-1 )
);
//print_r($byweek);
}
I'm getting result like ...
Array
(
[0] => Array
(
[0] => Array
(
[Customer] => Array
(
[created_by] => 3
)
)
)
)
Array
(
[0] => Array
(
[0] => Array
(
[Customer] => Array
(
[created_by] => 3
)
)
)
[1] => Array
(
[0] => Array
(
[Customer] => Array
(
[created_by] => 1
)
)
[1] => Array
(
[Customer] => Array
(
[created_by] => 2
)
)
)
)
But I want the amount to be multiplied according to the if/else condition, where I'm using the ternary operator:
$value[0]['amount']>=$valuem['Incentive']['target3']?"Target Third":($value[0]['amount']>=$valuem['Incentive']['target2']?"Target Second":($value[0]['amount']>=$valuem['Incentive']['target1']?"Target First":"None"))
The main purpose is to calculate an incentive amount in which the total sales amount is matched against the given target amounts. If the total reaches target1, the incentive is total amount * price1, and likewise for target2 and target3; that is where I'm using the ternary operator. The target price should be multiplied (conditionally) with the data from the same find.
I searched on Google and Stack Overflow but couldn't find a solution. However, I'm thankful to Stack Overflow and all of you, since I've been able to find lots of tricks and solutions here.
I haven't fully understood the question, but you can try the solution below.
You need to use a CASE WHEN clause in MySQL:
$incentive = 'CASE WHEN amount>5000 THEN "FIRST Target" WHEN amount>3000 THEN "Second Target" ELSE 0 END';
and inside fields in the above query:
'incentive' => $incentive
Thanks to Oldskool for this solution: Virtualfields with If Condition.
Finally, I got my solution with the code below. I'm getting the result as expected. :)
$this->Customer->virtualFields=array(
'incentive' => "if(SUM(Customer.amount)>=$target3,(SUM(Customer.amount))*$price3,if(SUM(Customer.amount)>=$target2,(SUM(Customer.amount))*$price2,if(SUM(Customer.amount)>=$target1,(SUM(Customer.amount))*$price1,0)))",
);
Now I can use the virtual field to get the incentive value.
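For reference, the nested `if(...)` in the virtual field implements a simple tier rule: the highest target reached decides the price multiplier. A minimal Python sketch of that rule, using the target and price values from the Incentive rows shown in the question (the function name `incentive` is hypothetical), could look like this:

```python
def incentive(total, tiers):
    """Return total * price for the highest target reached, else 0.

    tiers is a list of (target, price) pairs in any order.
    """
    for target, price in sorted(tiers, reverse=True):
        if total >= target:
            return total * price
    return 0


# (target1, price1), (target2, price2), (target3, price3) from the question
tiers = [(3000, 1.5), (6000, 2.5), (8000, 3.5)]
print(incentive(6500, tiers))
```

Checking the targets from highest to lowest mirrors the order of the nested SQL conditions, so a total that passes several targets is only paid at the highest one.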

Pass data variable to function when Unit Testing

I am new to unit testing in PHP and I'm having a little trouble. Whether that is because I'm using the Cake framework or because I'm used to the Java way, the point is I'm having issues.
I'm writing tests for a Model function that gets called on the submit of a form. The function receives two parameters, which I think I'm passing through correctly, and a data object that is not received as a parameter. My question is: how do I populate that "data" object? I keep getting an "undefined index" error when I run the tests.
I've tried both mocking the data and using fixtures, but in all honesty, I don't get this stuff. Below is my model function, followed by my test code.
public function isUniqueIfVerified($check, $unverified){
$found = false;
if ($this->data['Client']['client_type_id'] == 5) {
$found = $this->find ( 'first', array (
'conditions' => array (
$check,
$this->alias . '.' . $this->primaryKey . ' !=' => $this->id,
'client_type_id <>' => 5
),
'fields' => array (
'Client.id'
)
) );
} else {
$found = $this->find ( 'first', array (
'conditions' => array (
$check,
$this->alias . '.' . $this->primaryKey . ' !=' => $this->id
),
'fields' => array (
'Client.id'
)
) );
}
if ($found) {
return false;
} else {
return true;
}
}
This is like the 52nd version of my test function, so feel free to just do whatever you want with it. I was thinking that mocking the data would be easier and faster, since I only really need the 'client_type_id' for the condition inside my Model function, but I couldn't get that 'data' object to work, so I switched to fixtures... with no success.
public function testIsUniqueIfVerified01() {
$this->Client = $this->getMock ( 'Client', array (
'find'
) );
$this->Client->set(array(
'client_type_id' => 1,
'identity_no' => 1234567890123
));
//$this->Client->log($this->Client->data);
$check = array (
'identity_no' => '1234567890123'
);
$unverified = null;
$this->Client = $this->getMockforModel("Client",array('find'));
$this->Client->expects($this->once())
->method("find")
->with('first', array (
'conditions' => array (
"identity_no" => "1234567890123",
"Client.id" => "7711883306236",
'client_type_id <>' => 5
),
'fields' => array (
'Client.id'
)
))
->will($this->returnValue(false));
$this->assertTrue($this->Client->isUniqueIfVerified($check, $unverified));
unset ( $this->Client );
}
Again, I'm very green when it comes to Cake, and more specifically PHP Unit Testing, so feel free to explain where I went wrong.
Thanks!
You'll need to make a slight adjustment to your model function (which I'll show below) but then you should be able to do something like this to pass through data in the data object:
$this->Client->data = array(
'Client' => array(
'client_type_id' => 5,
'identity_no' => 1234567890123
));
This is instead of the "set" you used, as below:
$this->Client->set(array( ...
Also, you mocked the Client model, then "set" a few things, but then just before you do the test, you mock it again. This means you're throwing away all the things you set on the mock you created right at the top. You can do something like the following, which should solve your problem:
public function testIsUniqueIfVerified01() {
$this->Client = $this->getMock ( 'Client', array (
'find'
) );
$this->Client->data = array(
'Client' => array(
'client_type_id' => 5,
'identity_no' => 1234567890123
));
$check = array (
'identity_no' => '1234567890123'
);
$unverified = null;
$this->Client->expects($this->once())
->method("find")
->with($this->identicalTo('first'), $this->identicalTo(array(
'conditions' => array (
$check,
"Client.id !=" => 1,
'client_type_id <>' => 5
),
'fields' => array (
'Client.id'
)
)))
->will($this->returnValue(false));
$this->assertTrue($this->Client->isUniqueIfVerified($check, $unverified));
unset ( $this->Client );
}
This should at least give you an idea of what to do. Hope it helps!

Doctrine setParameters in wrong order

It seems that Doctrine sets parameters in the wrong order. I have the following parameter array:
$params = array(
1 => array(1, 2, 3, 4, 5, 6),
2 => array(150, 12, 130),
3 => 'CALLED',
4 => array('ND', 'PF', 'OS'),
5 => '2015-07-02 00:00:00',
6 => '2015-07-05 00:00:00'
);
And I have the following query:
$query = $this->getEntityManager()->createQuery('
SELECT c FROM Customers\Entity\Customer c
WHERE c.customer_categories_id IN(?1)
AND c.countries_id IN(?2)
AND c.state = ?3
AND c.potential_diamonds IN(?4)
AND c.last_call NOT BETWEEN ?5 AND ?6
');
$query->setParameters($params);
$result = $query->getResult();
And this is the final query:
SELECT c0_.id AS id_0, c0_.company AS company_1, c0_.vat_number AS vat_number_2, c0_.first_name AS first_name_3, c0_.last_name AS last_name_4, c0_.phone AS phone_5, c0_.phone2 AS phone2_6, c0_.mobile AS mobile_7, c0_.email AS email_8, c0_.email2 AS email2_9, c0_.fax AS fax_10, c0_.address AS address_11, c0_.address2 AS address2_12, c0_.postal_code AS postal_code_13, c0_.town AS town_14, c0_.province AS province_15, c0_.countries_id AS countries_id_16, c0_.customer_titles_id AS customer_titles_id_17, c0_.customer_categories_id AS customer_categories_id_18, c0_.present_list AS present_list_19, c0_.vip_list AS vip_list_20, c0_.previous_sold AS previous_sold_21, c0_.previous_bought AS previous_bought_22, c0_.opening_hours AS opening_hours_23, c0_.main_office AS main_office_24, c0_.newsletter AS newsletter_25, c0_.website AS website_26, c0_.`state` AS state_27, c0_.potential_diamonds AS potential_diamonds_28, c0_.remarks AS remarks_29, c0_.date_created AS date_created_30, c0_.date_changed AS date_changed_31, c0_.last_call AS last_call_32, c0_.customer_languages_id AS customer_languages_id_33, c0_.display_state AS display_state_34, c0_.longitude AS longitude_35, c0_.latitude AS latitude_36 FROM customers c0_ WHERE c0_.customer_categories_id IN ('ND', '2015-07-05 00:00:00', 'CALLED', 130, 130, 'CALLED') AND c0_.countries_id IN ('ND', '2015-07-05 00:00:00', '2015-07-02 00:00:00') AND c0_.`state` = 'PF' AND c0_.potential_diamonds IN ('2015-07-02 00:00:00', 'OS', 'PF') AND c0_.last_call NOT BETWEEN 'OS' AND 12
If we look at the WHERE part of the query, we see that the parameters are completely flipped. This is very strange. Does anyone have an explanation for this, or have the same problem? There is not much to find on Google.
Thanks in advance

Easily parsable output from rrdtool

I'm working with a large bunch of RRD files, which I have to query quite a lot, mostly by reading all the data and passing it on.
Currently, I use rrdtool fetch <filename> CF --start XXX --end YYY, but as it only returns data for one CF at a time, I first have to do a separate query to find the CF's (= run and parse rrdtool info <filename>) and then run rrdtool fetch for each found CF. The output is trivial to parse, though.
Alternately, there is rrdtool xport DEF:XX=<filename>:RRA:CF ... XPORT:XX:XX ... with multiple "sets" of the latter commands for each thing I want. On the upside, this can give me all the data in one go, but I still need to have a fairly good idea about what data I want beforehand. Also, it only spits out XML (always a hassle to parse).
I have a feeling I'm missing something very obvious, as it simply can't be such a big hassle to get a list of timestamp → numbers out of a file... Any clues?
While there are patches around for adding JSON-support, there is currently no way around:
Parsing at least two different output formats (rrdtool info's ASCII and then either XML from rrdtool xport or tabular data from rrdtool fetch).
Dumping the entire contents of the file to XML via rrdtool dump and then re-implementing quite a bit of librrd's internals.
I've written a parser that turns the output of rrdtool info /tmp/pb_1_amp.rrd into a nested array. So from:
filename = "/tmp/pb_1_amp.rrd"
rrd_version = "0003"
step = 1800
last_update = 1372685403
header_size = 1208
ds[amp].index = 0
ds[amp].type = "GAUGE"
ds[amp].minimal_heartbeat = 3200
ds[amp].min = 0.0000000000e+00
ds[amp].max = 1.0000000000e+02
ds[amp].last_ds = "5.6"
ds[amp].value = 1.6800000000e+01
ds[amp].unknown_sec = 0
rra[0].cf = "AVERAGE"
rra[0].rows = 576
rra[0].cur_row = 385
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = NaN
rra[0].cdp_prep[0].unknown_datapoints = 0
rra[1].cf = "AVERAGE"
rra[1].rows = 672
rra[1].cur_row = 159
rra[1].pdp_per_row = 6
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 1.6999833333e+01
rra[1].cdp_prep[0].unknown_datapoints = 0
rra[2].cf = "AVERAGE"
rra[2].rows = 732
rra[2].cur_row = 639
rra[2].pdp_per_row = 24
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = 1.6999833333e+01
rra[2].cdp_prep[0].unknown_datapoints = 0
rra[3].cf = "AVERAGE"
rra[3].rows = 1460
rra[3].cur_row = 593
rra[3].pdp_per_row = 144
rra[3].xff = 5.0000000000e-01
rra[3].cdp_prep[0].value = 6.6083527778e+02
rra[3].cdp_prep[0].unknown_datapoints = 0
to:
Array
(
[filename] => /tmp/pb_1_amp.rrd
[rrd_version] => 0003
[step] => 1800
[last_update] => 1372685403
[header_size] => 1208
[ds] => Array
(
[amp] => Array
(
[index] => 0
[type] => GAUGE
[minimal_heartbeat] => 3200
[min] => 0.0000000000e+00
[max] => 1.0000000000e+02
[last_ds] => 5.6
[value] => 1.6800000000e+01
[unknown_sec] => 0
)
)
[rra] => Array
(
[0] => Array
(
[cf] => AVERAGE
[rows] => 576
[cur_row] => 385
[pdp_per_row] => 1
[xff] => 5.0000000000e-01
[cdp_prep] => Array
(
[0] => Array
(
[value] => NaN
[unknown_datapoints] => 0
)
)
)
[1] => Array
(
[cf] => AVERAGE
[rows] => 672
[cur_row] => 159
[pdp_per_row] => 6
[xff] => 5.0000000000e-01
[cdp_prep] => Array
(
[0] => Array
(
[value] => 1.6999833333e+01
[unknown_datapoints] => 0
)
)
)
[2] => Array
(
[cf] => AVERAGE
[rows] => 732
[cur_row] => 639
[pdp_per_row] => 24
[xff] => 5.0000000000e-01
[cdp_prep] => Array
(
[0] => Array
(
[value] => 1.6999833333e+01
[unknown_datapoints] => 0
)
)
)
[3] => Array
(
[cf] => AVERAGE
[rows] => 1460
[cur_row] => 593
[pdp_per_row] => 144
[xff] => 5.0000000000e-01
[cdp_prep] => Array
(
[0] => Array
(
[value] => 6.6083527778e+02
[unknown_datapoints] => 0
)
)
)
)
)
It's in PHP but it should be easy to port to any other language. Here's the code:
$store = array();
foreach ($lines as $line) {
list($raw_key, $raw_val) = explode(' = ', $line);
$keys = preg_split('/[\.\[\]]/', $raw_key, -1, PREG_SPLIT_NO_EMPTY);
$key_count = count($keys);
$pointer = &$store;
foreach ($keys as $key_num => $key) {
if (!array_key_exists($key, $pointer)) {
$pointer[$key] = array();
}
$pointer = &$pointer[$key];
if ($key_num+1 === $key_count) {
$pointer = trim($raw_val, '"');
}
}
}
It assumes the rrdtool info output is split by newline (\n) and found in $lines. Hope this helps.
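As an illustration of how the parser might be ported, here is a minimal Python sketch of the same idea: split each key on `.`, `[` and `]`, walk or create the nested dictionaries, and store the value with surrounding quotes stripped. The function name `parse_rrd_info` is hypothetical. One difference to be aware of: PHP turns numeric array keys into integers, while this sketch keeps every key as a string.

```python
import re


def parse_rrd_info(lines):
    """Build a nested dict from 'key = value' lines of rrdtool info output."""
    store = {}
    for line in lines:
        if " = " not in line:
            continue  # skip blank or unexpected lines
        raw_key, raw_val = line.rstrip("\n").split(" = ", 1)
        # 'rra[0].cdp_prep[0].value' -> ['rra', '0', 'cdp_prep', '0', 'value']
        keys = [k for k in re.split(r"[.\[\]]", raw_key) if k]
        node = store
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = raw_val.strip('"')
    return store


sample = [
    'filename = "/tmp/pb_1_amp.rrd"',
    'step = 1800',
    'ds[amp].type = "GAUGE"',
    'rra[0].cf = "AVERAGE"',
    'rra[0].cdp_prep[0].value = NaN',
]
info = parse_rrd_info(sample)
print(info["rra"]["0"]["cdp_prep"]["0"]["value"])
```

All values stay as strings, as in the PHP version; converting them to numbers is left to the caller.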
If you want the 'table of contents' use rrdtool info, if you want the whole content, use rrdtool dump.
BUT ... why would you want that?
cheers
tobi