Conditional formatting in Python - python-2.7

Is there a way to do conditional formatting in python the way it is in excel?
Example.
Lets say I have the number 5 as average.
If I have a number which is 20% higher than the avaerage I want it to be displayed in the color green
Same for when I have a number which is 20% lower. Than I want the number to be red.
Been trying to figure it out without succes

It's a little bit difficult to understand the context that you want to create the conditional formatting in. If you are referring to formatting data in a pandas dataframe, I think that's impossible. On the other hand, if you're writing your output from python to excel using xlsxwriter then conditional formatting is pretty easy. Though I don't know how to mark 20% above average, the examples here show you how to write conditionally formatted cells that are a certain number of standard deviations above average or that are in the top x%. I hope this helps.

Related

Conditional Formatting Depending Upon Multiple Numbers

I have a column of values that are a number out of 10. So, it could be 2/10, 3/10, 4/10 and so on, all the way up to 10/10. To be clear, these are not dates, but simply showing how many questions the student answered correctly out of 10.
I'm trying to use conditional formatting to highlight them a certain color depending upon the score they got. For 9/10 and 10/10, I'm wanting to use a certain color, but it doesn't seem to be working with REGEXMATCH or with OR. Also wanting to highlight all scores that are 6/10 or lower. I know that I could make this work by applying conditional formatting for each and every score with text contains but the problem I'm finding is that it thinks it's a date.
Is there a way to match multiple scores out of 10 using REGEXMATCH?
Link to Sheet
select column and change formatting to Plain text
now you can use formula like:
=REGEXMATCH(A1; "^9|10\/")

OpenOffice Calc SUM of TRUNC number cells, with rows that include text cells

RE: Apache OpenOffice 4.1.7, AOO417m1(Build:9800) - Rev. 46059c9192, 2019-09-03 12:04.
I need to sum non-integer entries across a range of cells, but without including the decimal values (complicated by some cells being text). I started with ROUNDDOWN, then TRUNC, then FLOOR. I'm driving myself nuts trying to find a clean code (or even an arbitrarily extensible ugly code) for what would be the following:
=SUMIF(ISTEXT(R7:CL7);0;TRUNC(R7:CL7))
The above doesn't work, of course, since TRUNC() doesn't apply to ranges, but it conveys what I'm trying to do in a nutshell -- some of the cells contain text, which SUM() ignores (luckily), but they flummox TRUNC, so I needed to handle the text problem.
I started with ISNUMBER, just to get the ball rolling; ISTEXT has fewer characters, but it's not worth fixing that right now.
FLOOR was equally disappointing for ranges:
=SUM(FLOOR(R7:T7;1))
I tried variations of =SUM(IF(... and searches for ROUNDDOWN range (and variations on that) and such pseudocode as "IFTEXT" and "SUMTRUNC" (and variations on that). I found info on ROUNDDOWN(SUM(... and so forth, but not "SUM(ROUNDDOWN(..." or any equivalent.
In my delirium, I got silly and even tried:
=SUMIF(ISTEXT(S7:U7);0;AND(TRUNC(S7);TRUNC(T7);TRUNC(U7)))
To be clear: {2.9→2 + 2.9→2 + 2.9→2 = 6} ≠ {2.9+2.9+2.9 = 8.7→8}. I'm looking for a 6, not an 8 (I'd joke about sixes and sevens, but I'm way past pumpkin o'clock and 2.428571 takes up too much space).
My current test-kludge is:
=SUM(IF(ISNUMBER(R7);ROUNDDOWN(R7);0);IF(ISNUMBER(S7);ROUNDDOWN(S7);0);IF(ISNUMBER(T7);ROUNDDOWN(T7);0); ... ;IF(ISNUMBER(AX7);ROUNDDOWN(AX7);0))
It ends at AX7 only because of the char count. I hope to SUM the whole row in a single sweep, but that ain't gonna cut it. I could do it in large chunks in multiple cells, and then add those cells up, but oy gevalt.
Since it's already ugly anyway, I could use the following to save a few characters, but this would only mean being able to extend the range maybe 6 further cells (not much point in that):
=IF(ISTEXT(R7);0;TRUNC(R7))+IF(ISTEXT(S7);0;TRUNC(S7))+IF(ISTEXT(S7);0;TRUNC(S7))
I'm seriously considering simply going down a bunch of rows (to below my data cells) and entering the following, then copying the cell and pasting it to a complementary range, and telling the SUM cells to just sum up their respectively shadowed rows (instead of the data rows that they sit in):
=IF(ISTEXT(R7);0;TRUNC(R7))
Sorry for the rambling; I need sleep. This started as a need, then multiple failed attempts became a grudge match of principle and obstinacy, and now I'm just plugging away at it out of blind habit developed over the past 2-3 days (hopefully I won't forget what the purpose was).
In summary...: ++?????++ Out of Cheese Error +++DIVIDE BY CUCUMBER.
I'm comfortable enough with macros, though it's been ~7 years (and that was in Excel). Thanks in advance, even if the answer is that I'm stuck with one of these! 🙂
EDIT: I don't see a way to attach a .csv here (though I could open the .csv with Notepad, and copy-and-paste the contents if that would help anyone), so here's a set of pics:

Google Sheet Not Multiplying in IF Formula

I am trying to calculate a price based on rates. If the number is $20,000 or below, there is a flat rate of $700. If the number is between 20,001.01 and $50,000, the rate is 3.5% of the number. The rates continue to lower as the numbers go up. I can get Google Sheets to populate the box with $700 if it is below $20,000 but I can't seem to make it do the multiplication for me. The cell just shows C4*.035
I want it to multiply the number shown in the C4 square by the percentage listed.
Here is the code as it currently sits:
=if(AND(C4<=20000),"700",IF(AND(C4>=20000.01,C4<=50000),"C4*.035", IF(AND(C4>=50000.01,C4<=100000),"C4*.0325", IF(AND(C4>=100000.01),"C4*.03"))))
Note, I know nothing about coding so I apologize if it is sloppy or doesn't make sense. I tried to copy and format based on an example that was semi similar to mine.
try:
=IF(C4<=20000, 700,
IF(AND(C4>=20000.01, C4<=50000), C4*0.035,
IF(AND(C4>=50000.01, C4<=100000), C4*0.0325,
IF(AND(C4>=100000.01), C4*0.03))))
As BigBen noticed in his comment - there's a mistake in your formula. You should not use " " around the formula if you don't want it to be read as a string.
Actually more clean solution is using IFS formula for this task.
=ifs(C4<=20000,700,
C4<=50000,C4*0.035,
C4<=100000,C4*0.0325,
C4>100000,C4*0.03)

TF-IDF vectorizer doesn't work better than countvectorizer (sci-kit learn

I am working on a multilabel text classification problem with 10 labels.
The dataset is small, +- 7000 items and +-7500 labels in total. I am using python sci-kit learn and something strange came up in the results. As a baseline I started out with using the countvectorizer and was actually planning on using the tfidf vectorizer which I thought would work better. But it doesn't.. with the countvectorizer I get a performance of a 0,1 higher f1score. (0,76 vs 0,65)
I cannot wrap my head around why this could be the case?
There are 10 categories and one is called miscellaneous. Especially this one gets a much lower performance with tfidf.
Does anyone know when tfidf could perform worse than count?
The question is, why not ? Both are different solutions.
What is your dataset, how many words, how are they labelled, how do you extract your features ?
countvectorizer simply count the words, if it does a good job, so be it.
There is no reason why idf would give more information for a classification task. It performs well for search and ranking, but classification needs to gather similarity, not singularities.
IDF is meant to spot the singularity between one sample vs the rest of the corpus, what you are looking for is the singularity between one sample vs the other clusters. IDF smoothens the intra-cluster TF similarity.

Excel VBA help - run a series of regex find and replaces

I have a worksheet that has become very complex. On it, there is a sheet in which a user will paste data about once every other day. The data will always be in the same format, and is provided to us in an exact way only. Once pasted in, I need a way for a very average user of excel to be able to press a button (or key combo, or whatever) and excel will run a series of about 8-10 regex find and replaces. All of these will be on column A of the data. Once those are all run, a simple formula would be run on every cell C2 and below in column C. Those columns should be reduced by 80% - =C2*.8
This should all be done with minimal user input if possible.
Would anybody much more versed in regex or excel know a better direction for me to look for a proper start? What resources would be recommended to best accomplish this?
If you're multiplying by some factor, then regexp substitution will be overkill. Excel is very good at multiplying an array of numbers by 0.8.
Search for "Excel paste factor" and you'll get an easy explanation, such as this one.
I might record a macro for your less-experienced users and hope that the previous user pasted the numbers in with absolute perfection.