Filtering multidimensional views in xtensor - c++

I am trying to filter a 2D xtensor view with a simple condition. I found the xt::filter function, but when I use it, it only returns the first column of the filtered view. I need the 2D filtered view. What is the best way to do it?
I could check the condition line by line, collect the indexes myself, and then use xt::view to show only the needed lines, but I am hoping for a more sophisticated method using the xtensor toolset.
My current filter, which returns only one dimension, looks like this:
auto unfiltered = xt::view(...);
auto filtered = xt::filter(unfiltered, xt::view(unfiltered, xt::all(), 0) > tresh);
EDIT:
It is possible I was not completely clear. I need a 2D view that keeps only those lines whose first element is greater than the threshold.

xt::view(unfiltered, xt::all(), 0)
is creating a view that only contains the first column of unfiltered. The following should do what you expect:
auto unfiltered = xt::view(...);
auto filtered = xt::filter(unfiltered, unfiltered > tresh);
EDIT: sorry for the misunderstanding, here is an update following the OP's remark:
The condition is not broadcast to the shape of the expression to filter, a workaround for now is:
auto unfiltered = xt::view(...);
auto filtered = xt::filter(unfiltered,
                           xt::broadcast(xt::view(unfiltered, xt::all(), 0, xt::newaxis()),
                                         unfiltered.shape()) > tresh);
I'll open an issue for this.
Also note that filter returns a 1D expression (because the elements satisfying a condition may be scattered in the original expression), so you need to reshape it to get a 2D expression.
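The flatten-then-reshape behaviour described above can be illustrated with a minimal pure-Python sketch. This is not xtensor code: the sample values and the names `n_cols`, `mask`, and `flat` are assumptions made for illustration only. It shows why a row condition broadcast to the full shape yields a flat result whose length is a multiple of the column count, so it can be regrouped into rows.

```python
# Pure-Python illustration (not xtensor); the sample data is hypothetical.
unfiltered = [
    [5, 1, 2],
    [0, 9, 9],
    [7, 3, 4],
]
tresh = 3
n_cols = len(unfiltered[0])

# Broadcast the first-column condition to the full 2D shape,
# mirroring what the xt::broadcast workaround does.
mask = [[row[0] > tresh] * n_cols for row in unfiltered]

# Element-wise filtering yields a flat (1D) sequence.
flat = [
    value
    for row, mask_row in zip(unfiltered, mask)
    for value, keep in zip(row, mask_row)
    if keep
]

# Whole rows were kept, so the flat result regroups into rows of n_cols.
filtered = [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]
print(filtered)
```

Because the mask is constant along each row, the number of kept elements is always the number of kept rows times `n_cols`, which is what makes the final regrouping safe.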

Related

find index based on first element in a nested list

I have a list that contains sublists. The sequence of the sublist is fixed, as are the number of elements.
schedule = [['date1', 'action1', beginvalue1, endvalue1],
['date2', 'action2', beginvalue2, endvalue2],
...
]
Say I have a date and I want to find what I have to do on that date, meaning I need the contents of the entire sublist, given only the date.
I did the following (which works): I created an intermediate list with the first value of every sublist. Using the index of the date in that list, I was able to retrieve the entire sublist, as follows:
dt = 'date150' # To just have a value to make underlying code more clear
ls_intermediate = [item[0] for item in schedule]
index = ls_intermediate.index(dt)
print(schedule[index])
It works but it just does not seem the Python way to do this. How can I improve this piece of code?
To be complete: there are no double 'date' entries in the list. Every date is unique and appears only once.
Learning Python, and having quite a journey in front of me...
thank you!
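Two common alternatives to the intermediate list are sketched below, assuming (as stated in the question) that every date is unique. The sample `schedule` values and the names `entry` and `by_date` are placeholders chosen for illustration, not taken from the original data.

```python
# Hypothetical sample data in the same layout as the question.
schedule = [
    ["date1", "action1", 0, 10],
    ["date2", "action2", 10, 20],
    ["date150", "action150", 20, 30],
]
dt = "date150"

# Option 1: stop at the first sublist whose first element matches.
# The second argument to next() is returned when nothing matches.
entry = next((row for row in schedule if row[0] == dt), None)
print(entry)

# Option 2: build a dict keyed by date once, then look dates up directly.
# This relies on the dates being unique.
by_date = {row[0]: row for row in schedule}
print(by_date[dt])
```

The generator version suits a one-off lookup; the dict pays off when many dates are looked up against the same schedule.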

How to populate a value when comparing two columns, VLOOKUP or IF?

I'm trying to create "Sale Rep" summaries by "Shop", where I can simply filter a column by the rep's name, then populate the total sales for each shop next to the relevant filter result.
I'm using this to filter all the Stores by Scott:
=(filter(D25:D47,A25:A47 = "Scott"))
Next, I want each Store/Account in F to populate G with the corresponding value from E. So, G25 should be populated with the value of E25 ($724), G26 with E26 ($822), and G27 with E38 ($511.50).
I don't know how to write the formula correctly, but something like this is what I'm trying to do: =IF(F25=D25:D38),E25. I know that's not right, and it won't work in a fill down. But I'm basically trying to look up the matching value in D and copy the corresponding value of E into G. So, Misty Mountain Medicince in F27 will be matched to the value of E38, which is populated in G27.
The filter is what's throwing me off, because it's not a simple fill down. And I don't know how to match filtered results from one column to a matched value in another.
Hope the screenshot helps. [Screenshot of table]
Change the cell that contains "Field Rep: Scott" (F24 in the formula below) to just "Scott", and you might apply:
=query(A25:E38,"select D,E where A='"&F24&"'")
// Enter the following into G25 and copy down column G
=(filter(E25:E47, D25:D47 = F25))
or
// Enter the following into G25; it will expand to match the content in F up to row 47
=ArrayFormula(IF(F25:F47 <> 0, VLOOKUP(F25:F47, D25:E47, 2, FALSE),))
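For readers who think more easily in code, the logic behind the formulas above (filter rows by rep, then look up the sales value for each store) can be sketched in Python. The rows below are hypothetical placeholders, not the values from the spreadsheet, and the names `rows`, `summary`, and `sales_by_store` are assumptions for illustration.

```python
# Hypothetical rows: (rep in column A, store in column D, sales in column E).
rows = [
    ("Scott", "Store A", 724.00),
    ("Dana", "Store B", 300.00),
    ("Scott", "Store C", 822.00),
]
rep = "Scott"

# Like the QUERY formula: select D, E where A matches the rep.
summary = [(store, sales) for row_rep, store, sales in rows if row_rep == rep]
print(summary)

# Like VLOOKUP with an exact match: map each store to its sales value.
sales_by_store = {store: sales for _, store, sales in rows}
print(sales_by_store["Store C"])
```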

compare two dictionary, one with list of float value per key, the other one a value per key (python)

I have a query sequence that I blasted online using NCBIWWW.qblast. In my XML BLAST result file I obtained a list of hits (i.e. gi|) for the query sequence. Each hit, or gi|, has multiple hsps. I made a dictionary my_dict1 with the gi| as key and appended each bit score as a value, so there are multiple values for each key.
my_dict1 = {
    "gi|1002819492|": [437.702, 384.47, 380.86, 380.86, 362.83],
    "gi|675820360|": [2617.97, 2614.37, 122.112],
    "gi|953764029|": [414.258, 318.66, 122.112, 86.158],
    "gi|675820410|": [450.653, 388.08, 386.27],
}
Then I looked for max value in each key using:
for key, value in my_dict1.items():
    max_value = max(value)
And made a second dictionary my_dict2:
my_dict2 = {
    "gi|1002819492|": 437.702,
    "gi|675820360|": 2617.97,
    "gi|953764029|": 414.258,
    "gi|675820410|": 450.653,
}
I want to compare both dictionaries so I can extract the hsp with the highest bit score. I am also including other parameters like query coverage and identity percentage (not shown here). The end goal is to get the best gi| with the highest bit score, coverage and identity percentage.
I tried many things to compare both dictionaries, like this:
First code :
matches = []
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Second code:
if hit_id not in matches.keys():
    matches[hit_id] = bit_score
else:
    matches = matches[hit_id], bit_score
Third code:
intersection = set(set(my_dict1.items()) & set(my_dict2.items()))
However, I always end up with two types of errors:
1 ) TypeError: list indices must be integers, not unicode
2 ) ... float not iterable...
Please I need some help and guidance. Thank you very much in advance for your time. Best regards.
It's not clear what you're trying to do. What is hit_id? What is bit_score? It looks like your second dict is always going to have the same keys as your first if you're creating it by pulling the max value for each key of the first dict.
You say you're trying to compare them, but don't really state what you're actually trying to do. Find those with values under a certain max? Find those with the highest max?
Your first code doesn't work because I'm assuming you're trying to use a dict key value as an index to matches, which you define as a list. That's probably where your first error is coming from, though you haven't given the lines where the error is actually occurring.
See in-code comments below:
# First off, this needs to be a dict.
matches = {}
# This will never happen if you've created these dicts as you stated.
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score  # Not clear what bit_score is?
else:
    # Also not sure what you're trying to do here. This will assign a tuple
    # to matches with whatever the value of matches[hit_id] is and bit_score.
    matches = matches[hit_id], bit_score
Regardless, we really need more information and the full code to figure out your actual goal and what's going wrong.
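In the meantime, here is a minimal sketch under one possible reading of the goal: for each hit, find which hsp has the highest bit score, then pick the hit whose best hsp scores highest overall. That interpretation is an assumption, as are the names `best_hsp` and `best_hit`; coverage and identity are left out because they are not shown in the question.

```python
my_dict1 = {
    "gi|1002819492|": [437.702, 384.47, 380.86, 380.86, 362.83],
    "gi|675820360|": [2617.97, 2614.37, 122.112],
    "gi|953764029|": [414.258, 318.66, 122.112, 86.158],
    "gi|675820410|": [450.653, 388.08, 386.27],
}

# For each hit, record (index of the best hsp, its bit score).
best_hsp = {}
for hit_id, bit_scores in my_dict1.items():
    best_index = max(range(len(bit_scores)), key=bit_scores.__getitem__)
    best_hsp[hit_id] = (best_index, bit_scores[best_index])

# The hit whose best hsp has the highest bit score overall.
best_hit = max(best_hsp, key=lambda hit_id: best_hsp[hit_id][1])
print(best_hit, best_hsp[best_hit])
```

Keeping the index alongside the score makes a separate second dictionary unnecessary, since the index identifies which hsp to pull the other parameters from.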

how to apply cell style when using `append` in openpyxl?

I am using openpyxl to create an Excel worksheet. I want to apply styles when I insert the data. The trouble is that the append method takes a list of data and automatically inserts them to cells. I cannot seem to specify a font to apply to this operation.
I can go back and apply a style to individual cells after-the-fact, but this requires overhead to find out how many data points were in the list, and which row I am currently appending to. Is there an easier way?
This illustrative code shows what I would like to do:
def create_xlsx(self, header):
    self.ft_base = Font(name='Calibri', size=10)
    self.ft_bold = self.ft_base.copy(bold=True)
    if header:
        self.ws.append(header, font=self.ft_bold)  # cannot apply style during append
ws.append() is designed for appending rows of data easily. It does, however, also allow you to include placeless cells within a row so that you can apply formatting while adding data. This is primarily of interest when using write_only=True but will work for normal workbooks.
Your code would look something like:
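from openpyxl import Workbook
from openpyxl.cell import Cell
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active
data = [1, 3, 4, 9, 10]

def styled_cells(data):
    for c in data:
        if c == 1:
            # Wrap the value in a cell so that a font can be attached to it.
            c = Cell(ws, column="A", row=1, value=c)
            c.font = Font(bold=True)
        yield c
ws.append(styled_cells(data))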
openpyxl will correct the coordinates of such cells.

PHPExcel Protect a single column

I have issues with cell protection.
I would like to protect just one column, B for example.
So I tried:
$sheet->getProtection()->setSheet(true);
$highestRow = $sheet->getHighestRow();
$sheet->getStyle('A1:J2000')->getProtection()->setLocked( PHPExcel_Style_Protection::PROTECTION_UNPROTECTED );
for($i=1;$i<=$highestRow;$i++)
{
    $sheet->getStyleByColumnAndRow(1,$i)->getProtection()->setLocked(PHPExcel_Style_Protection::PROTECTION_PROTECTED);
}
But it's really slow, and it's not a good solution, because if I need to open my sheet again,
$highestRow = $sheet->getHighestRow(); will return "J".
Another solution would be to get the last non-empty column. Do you know how to do that? getHighestRow(Column) also counts the cells that are unprotected or empty.
The loop is slow because you're applying the style to each individual cell, rather than to a range of cells as you do in your
$sheet->getStyle('A1:J2000')->getProtection()->setLocked( PHPExcel_Style_Protection::PROTECTION_UNPROTECTED );
line: one call to set the style for a range of 1000 cells is more than 1000 times faster than applying it to each of 1000 cells individually.
$sheet->getHighestDataRow();
will return the highest row in the worksheet that contains actual data values
$sheet->getHighestDataColumn();
is the column equivalent
First you can protect the complete sheet. After that you can unprotect the other cells. This code will protect the first column and first row:
$objPHPExcel->getActiveSheet()->getProtection()->setSheet(true);
$objPHPExcel->getActiveSheet()->getStyle('B2:Z400')->getProtection()->setLocked(PHPExcel_Style_Protection::PROTECTION_UNPROTECTED);