How to index match a condition set in a cell - if-statement

I am trying to avoid a long nested IF formula by index-matching against a table instead; however, what I need to match is the actual condition and a string.
Lookup table:
+---+------------------+-------------------+-------+
| | A | B | C |
+---+------------------+-------------------+-------+
| 1 | Current to Prior | Portfolio Comment | Error |
| 2 | =0 | "" | 1 |
| 3 | <>0 | "" | -1 |
| 4 | >0 | OK – Losses | 0 |
| 5 | <0 | OK – Losses | 1 |
| 6 | <0 | OK – New Sales | 0 |
| 7 | >0 | OK – New Sales | 1 |
+---+------------------+-------------------+-------+
Column A: Lookup Condition
Column B: Lookup string
Column C: Return value
Data example with correct hard coded output (column C):
+---+------------------+-------------------+-------+
| | A | B | C |
+---+------------------+-------------------+-------+
| 1 | Current to Prior | Portfolio comment | Error |
| 2 | 0 | | 1 |
| 3 | -100 | OK – Losses | 1 |
| 4 | 50 | | -1 |
| 5 | 200 | OK – Losses | 0 |
| 6 | 0 | | 1 |
| 7 | -400 | OK – New Sales | 0 |
| 8 | 0 | | 1 |
+---+------------------+-------------------+-------+
Column A: Data value
Column B: Data string
Column C: Output formula
I need a formula that matches the data value with the lookup condition, the data string with the lookup string and outputs the return value.

I know you weren't necessarily asking for a VBA solution, but I (and many others) prefer using UDFs because, in my opinion, they make formulas cleaner and easier to read, and you can do without the helper cells.
We start off your UDF by creating a Select Case Statement. We could choose to use either the Numerical Value or String for the cases. I decided to go with the string.
Within each case, you compare the numerical value passed in the lngCondition parameter, and that comparison determines the value the function returns.
Since your table has no rule for a non-empty comment with lngCondition = 0, I made it return the worksheet error code #VALUE!, just as you'd expect from any other Excel formula. This is the reason for the UDF having a Variant return type.
Public Function ReturnErrorCode(lngCondition As Long, strComment As String) As Variant
    ' Note: the comment strings below use a plain hyphen. The table in the
    ' question shows an en dash, so adjust them to match the cell text exactly.
    Select Case strComment
        Case ""
            If lngCondition = 0 Then
                ReturnErrorCode = 1
            Else
                ReturnErrorCode = -1
            End If
        Case "OK - Losses"
            If lngCondition > 0 Then
                ReturnErrorCode = 0
            ElseIf lngCondition < 0 Then
                ReturnErrorCode = 1
            Else
                ' Your conditions don't specify that 'OK - Losses'
                ' can have a 0 value
                ReturnErrorCode = CVErr(xlErrValue)
            End If
        Case "OK - New Sales"
            If lngCondition < 0 Then
                ReturnErrorCode = 0
            ElseIf lngCondition > 0 Then
                ReturnErrorCode = 1
            Else
                ' Your conditions don't specify that 'OK - New Sales'
                ' can have a 0 value
                ReturnErrorCode = CVErr(xlErrValue)
            End If
        Case Else
            ' Unknown comment text
            ReturnErrorCode = CVErr(xlErrValue)
    End Select
End Function
You would then use this formula in the worksheet as such:
=ReturnErrorCode(A2, B2)
Great! But I have no knowledge of VBA and don't know how to add a UDF.
First, you need to open the VBA Editor. You can do this by simultaneously pressing Alt + F11.
Next, you need to create a standard code module. In the VBE, click Insert then select Module (NOT Class module!).
Then copy the code above, and paste it into the new code module you just created.
Since you have now added VBA code to your workbook, you need to save it as a macro-enabled workbook (.xlsm) the next time you save.
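For comparison, the table-driven idea from the question (match a condition and a string, return a value) can be sketched outside Excel. This is Python, not VBA, and only an illustration: the names OPS, RULES and return_error_code are made up for this sketch, the comment strings are assumed to use a plain hyphen, and every condition is assumed to compare against 0, as in the lookup table.

```python
import operator

# Map the condition symbols used in the lookup table to comparison functions.
OPS = {"=": operator.eq, "<>": operator.ne, ">": operator.gt, "<": operator.lt}

# One tuple per lookup-table row: (condition, comment, return value).
# Assumption: every condition compares the data value against 0.
RULES = [
    ("=", "", 1),
    ("<>", "", -1),
    (">", "OK - Losses", 0),
    ("<", "OK - Losses", 1),
    ("<", "OK - New Sales", 0),
    (">", "OK - New Sales", 1),
]

def return_error_code(value, comment):
    """Return the value of the first rule whose comment and condition both match."""
    for op, rule_comment, result in RULES:
        if comment == rule_comment and OPS[op](value, 0):
            return result
    return None  # no rule matched (the UDF returns #VALUE! in this case)

# Rows taken from the data example in the question.
sample = [(0, ""), (-100, "OK - Losses"), (50, ""), (200, "OK - Losses"), (-400, "OK - New Sales")]
codes = [return_error_code(v, c) for v, c in sample]
print(codes)
```

Keeping the rules in a list means a new condition is one more row rather than one more branch, which is what the lookup table in the question was aiming for.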

Pandas group_by string column which values contained in a separate list

I have a hierarchy-based event stream in which each parent node (shown here as level0, level1, ...) has multiple children (level00, level01, level02), and each child has sub-children (level000, level001, level002). "level" is just a placeholder; each hierarchy level has its own unique name. The only rule is that a parent node's hierarchy string is always included in the child's hierarchy string. Assume that this event stream has 300k or more entries.
| index | hierarchystr |
| ----- | --------------------- |
| 0 | level0level00level000|
| 1 | level0level01 |
| 2 | level0level02level021|
| 3 | level0level02level021|
| 4 | level0level02level020|
| 5 | level0level02level021|
| 6 | level1level02level021|
| 7 | level1level02level021|
| 8 | level1level02level021|
| 9 | level2level02level021|
Now I want to do an inclusive group-by against a separate list: a row should be counted for a list entry if that entry is contained in the row's hierarchystr value. Expected output (beware: hstrs is in a different order every time!):
#hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]
|index| 0 | Count |
|-----|---------------------|-------|
|0 |level0 | 6 |
|1 |level1 | 3 |
|2 |level0level01 | 1 |
|3 |level0level02 | 4 |
|4 |level0level02level021| 3 |
I tried the following solutions, but all are slow as hell:
from collections import Counter
import pandas as pd

#V1
for hstr in hstrs:
    s = df[df.hierarchystr.str.contains(hstr)]
    s2 = s.count()
    s3 = s2.values[0]
    if s3 > 200:
        beforeset.append(hstr)
#V2
for hstr in hstrs:
    s = df.hierarchystr.str.extract('(' + hstr + ')', expand=True)
    s2 = s.count()
    s3 = s2.values[0]
    if s3 > 200:
        beforeset.append(hstr)
#V3 - fastest, but also slow and not satisfying
containing = [item for hierarchystr in df.hierarchystr for item in hstrs if item in hierarchystr]
containing = Counter(containing)
df1 = pd.DataFrame([containing]).T
nodeNamesWithOver200 = df1[df1 > 200].dropna().index.values
I also tried matching all entries at once with a combined pattern and extract, but then the size of each group changes from run to run, because the list hstrs is in a different order on every run.
df.hierarchystr.str.extractall('(' + '|'.join(hstrs) + ')')
Is there a regex or method that does this task in one step, so that it runs in reasonable time on huge data frames and does not depend on the order of the hstrs list?
You can try:
count = [df['hierarchystr'].str.startswith(hstr).sum() for hstr in hstrs]
out = pd.DataFrame({'hstr': hstrs, 'count': count})
print(out)
# Output
hstr count
0 level0 6
1 level1 3
2 level0level01 1
3 level0level02 4
4 level0level02level021 3
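If many rows share the same hierarchy string, one possible variation is to collapse duplicates first and test each distinct string only once. The sketch below assumes that prefix matching (startswith) is what is wanted, as in the answer above; count_prefixes is a hypothetical helper name, and whether it is faster on real data has not been measured here.

```python
import pandas as pd

def count_prefixes(series, prefixes):
    """Count rows whose string starts with each prefix, testing each distinct string once."""
    uniq = series.value_counts()  # distinct string -> number of rows
    counts = [int(uniq[uniq.index.str.startswith(p)].sum()) for p in prefixes]
    return pd.DataFrame({"hstr": list(prefixes), "count": counts})

# Sample data from the question.
df = pd.DataFrame({"hierarchystr": [
    "level0level00level000", "level0level01", "level0level02level021",
    "level0level02level021", "level0level02level020", "level0level02level021",
    "level1level02level021", "level1level02level021", "level1level02level021",
    "level2level02level021",
]})
hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]

out = count_prefixes(df["hierarchystr"], hstrs)
print(out)
```

Each prefix is counted independently, so the result does not depend on the order of hstrs.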

PowerQuery - Fill missing data according to specific pattern

I am trying to clean data received from an Excel file and transform it using PowerQuery (in PowerBI) into a useable format.
Below is a sample table, and what I am trying to do:
| Country | Type of location |
|--------- |------------------ |
| A | 1 |
| | 2 |
| | 3 |
| B | 1 |
| | 2 |
| | 3 |
| C | 1 |
| | 2 |
| | 3 |
As you can see, I have a list of location types for each country (always constant, always the same number per country, i.e. each country has 3 rows for 3 location types).
What I am trying to do is to see if there is a way to fill the empty cells in the "Country" column, with the appropriate Country name, which would give something like this:
| Country | Type of location |
|--------- |------------------ |
| A | 1 |
| A | 2 |
| A | 3 |
| B | 1 |
| B | 2 |
| B | 3 |
| C | 1 |
| C | 2 |
| C | 3 |
For now I thought about using a series of if/else if conditions, but as there are 100+ countries this doesn't seem like the right solution.
Is there any way to do this more efficiently?
As Murray mentions, the Table.FillDown function works great and is built into the GUI: in the query editor, go to the Transform tab and choose Fill > Down.
Note that it only fills down to replace nulls, so if you have empty strings instead of nulls in those rows, you'll need to do a replacement first. The Replace Values button for that is just above the Fill button in the GUI; in its dialog box, leave "Value To Find" empty and enter null in "Replace With",
or else just use the M code that this generates instead of the GUI:
= Table.ReplaceValue(#"Previous Step","",null,Replacer.ReplaceValue,{"Country"})
Yes, like you can do in Excel, you can fill down.
From the docs - Table.FillDown
I believe you will need to sort the data correctly first.
Table.FillDown(
    Table.FromRecords({
        [Place = 1, Name = "Bob"],
        [Place = null, Name = "John"],
        [Place = 2, Name = "Brad"],
        [Place = 3, Name = "Mark"],
        [Place = null, Name = "Tom"],
        [Place = null, Name = "Adam"]
    }),
    {"Place"}
)
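To check the "replace blanks with null, then fill down" behaviour outside Power Query, here is a rough pandas analogue. It is Python, not M, and only meant to illustrate the two steps; the column names are taken from the question, and the blank cells are assumed to be empty strings.

```python
import pandas as pd

# Sample table from the question, with blank cells as empty strings.
df = pd.DataFrame({
    "Country": ["A", "", "", "B", "", "", "C", "", ""],
    "Type of location": [1, 2, 3, 1, 2, 3, 1, 2, 3],
})

# Step 1: empty strings are not nulls, so turn them into missing values first.
country = df["Country"].mask(df["Country"] == "")

# Step 2: carry the last seen country name down into the missing cells.
df["Country"] = country.ffill()

print(df)
```

Without step 1 the forward fill would leave the empty strings untouched, which mirrors the caveat about nulls above.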

How to detect specific subwords in text

I have a column as a string with no spaces:
clear
input str100 var
"ihaveanewspaper"
"watchingthenewsonthetv"
"watchthenewsandreadthenewspaper"
end
I am using the following command:
gen news = regexm(var, "(news)")
This outputs 1 1 1 because it finds that the 3 rows in the column var contain the word news.
I'm trying to alter the regular expression "(news)" to create two columns: one for news and one for newspaper. regexm(var, "(newspaper)") makes sure that the row contains newspaper, but I need a command that makes sure the characters after news are not "paper", as I'm trying to quantify the two.
EDIT:
Is there a way to count the third entry as 1, since it has an occurrence of news that is not part of newspaper?
You can quantify as follows without a regular expression:
clear
input str100 var
"ihaveanewspaper"
"watchingthenewsonthetv"
"watchthenewsandreadthenewspaper"
"fdgdnews"
"fgogodigjhoigjnewspaper"
"fgeogeionnewsfgdgfpaper"
"45pap9358newsfjfgni"
end
generate news = strmatch(var, "*news*") & !strmatch(var, "*newspaper*")
list, separator(0)
+----------------------------------------+
| var news |
|----------------------------------------|
1. | ihaveanewspaper 0 |
2. | watchingthenewsonthetv 1 |
3. | watchthenewsandreadthenewspaper 0 |
4. | fdgdnews 1 |
5. | fgogodigjhoigjnewspaper 0 |
6. | fgeogeionnewsfgdgfpaper 1 |
7. | 45pap9358newsfjfgni 1 |
+----------------------------------------+
count if news
4
count if !news
3
EDIT:
One way to do this is to eliminate all instances of the word newspaper and repeat the process:
generate var2 = subinstr(var, "newspaper", "", .)
replace news = 1 if strmatch(var2, "*news*")
list, separator(0)
+------------------------------------------------------------------+
| var news var2 |
|------------------------------------------------------------------|
1. | ihaveanewspaper 0 ihavea |
2. | watchingthenewsonthetv 1 watchingthenewsonthetv |
3. | watchthenewsandreadthenewspaper 1 watchthenewsandreadthe |
4. | fdgdnews 1 fdgdnews |
5. | fgogodigjhoigjnewspaper 0 fgogodigjhoigj |
6. | fgeogeionnewsfgdgfpaper 1 fgeogeionnewsfgdgfpaper |
7. | 45pap9358newsfjfgni 1 45pap9358newsfjfgni |
+------------------------------------------------------------------+
count if news
5
count if !news
2
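The same rule ("news that is not followed by paper") can also be written as a single regular expression with a negative lookahead. The sketch below is Python rather than Stata, purely to show the pattern; whether a given Stata regex function accepts lookahead is not confirmed here, and has_standalone_news is a hypothetical name.

```python
import re

# "news" not immediately followed by "paper" (negative lookahead).
NEWS_NOT_PAPER = re.compile(r"news(?!paper)")

def has_standalone_news(text):
    """Return 1 if text contains a 'news' that is not the start of 'newspaper', else 0."""
    return int(NEWS_NOT_PAPER.search(text) is not None)

# Same sample strings as in the Stata example above.
values = [
    "ihaveanewspaper",
    "watchingthenewsonthetv",
    "watchthenewsandreadthenewspaper",
    "fdgdnews",
    "fgogodigjhoigjnewspaper",
    "fgeogeionnewsfgdgfpaper",
    "45pap9358newsfjfgni",
]
flags = [has_standalone_news(v) for v in values]
print(flags)
```

On these strings the flags agree with the news column of the second listing, including the third entry, which contains both forms.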

Postgres: Window Function row_number() wrong output?

I have a confusing problem here. I'm working with some arrays and trying to get the 10 smallest values across all of them, along with the array each value belongs to and its position inside that array.
My relation is arrays(id int, array float[]);
It holds several stored arrays:
1, '{v1,v2,v3,v4,v5...}'
2, '{v1,v2,v3,v4,v5...}'...etc
My first query is:
WITH T1 AS(SELECT id, unnest(array) value from arrays order by value LIMIT 10)
SELECT T1.id as id, cell(array,value) as offset, value from T1;
In this case cell() is a UDF I developed to return the position of an arbitrary value within a given array.
The second query (using window functions) is:
WITH T1 AS(SELECT id, unnest(array) value from arrays)
SELECT id, row_number() over (partition by sid) as offset, value from T1 order by value LIMIT 10;
Although both return the same values (which is correct), the offsets are not the same and seem to be somehow upside-down.
These are some example outputs from the bigger arrays I'm working with, where you can see the problem I'm having.
Query 1 output:
id | offset | value
-----+--------+-----------
1 | 17569 | 0.0156216
1 | 20801 | 0.0164499
1 | 20802 | 0.0171007
1 | 17570 | 0.0171008
1 | 17568 | 0.0180476
1 | 20800 | 0.0182249
1 | 20803 | 0.0194675
1 | 1411 | 0.02142
1 | 1412 | 0.02142
1 | 1413 | 0.0215976
Query 2 output:
id | offset | value
-----+--------+-----------
1 | 6591 | 0.0156216
1 | 9823 | 0.0164499
1 | 9824 | 0.0171007
1 | 6592 | 0.0171008
1 | 6590 | 0.0180476
1 | 9822 | 0.0182249
1 | 9825 | 0.0194675
1 | 26144 | 0.02142
1 | 26140 | 0.02142
1 | 26149 | 0.0215976
I would appreciate any help please. Thank you!
You haven't specified an order in the window function in Query 2, which means that Postgres will probably be sorting internally by sid before your outer ORDER BY value is applied.
WITH t1 AS (
SELECT id, UNNEST( array ) AS value
FROM arrays
)
SELECT id, row_number() OVER ( PARTITION BY sid ORDER BY value ) as offset, value
FROM t1
ORDER BY value
LIMIT 10;
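To make the intended result concrete, here is a small Python sketch of "the n smallest values across all arrays, each with its array id and its position inside that array". It is not SQL and not a drop-in fix; smallest_with_positions and the sample data are made up for illustration. The point it shows is that the position is recorded while each array is expanded, before anything is sorted by value.

```python
import heapq

def smallest_with_positions(arrays, n=10):
    """arrays maps an id to a list of floats; returns (id, position, value) tuples."""
    triples = (
        (value, arr_id, pos)
        for arr_id, values in arrays.items()
        # Positions are 1-based, like PostgreSQL array subscripts by default.
        for pos, value in enumerate(values, start=1)
    )
    # nsmallest orders by value first, then by id and position for ties.
    return [(arr_id, pos, value) for value, arr_id, pos in heapq.nsmallest(n, triples)]

# Hypothetical sample data.
arrays = {1: [0.5, 0.1, 0.3], 2: [0.2, 0.4]}
top3 = smallest_with_positions(arrays, n=3)
print(top3)
```

A row number assigned after sorting by value would instead count rows in value order, which is a different quantity from the position within the array.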

Show count of columns distinct values

Hello my fellow colleagues from StackOverflow!
I will be brief, and cut to the point:
I have a table in MS Access that contains 2 columns of interest: County and TGTE (Type Of Geothermal Energy). Column TGTE is of type VARCHAR and can have one of two values; to make it easier, let's say it is either L or H.
I need to create an SQL query that shows the result described below.
Here is part of the table:
County | TGTE | ... |
First | L |
First | L |
First | H |
Second | H |
Third | L |
__________________
I need a resulting query that shows the count of distinct TGTE in every County like this:
County | TGTE = L | TGTE = H |
First | 2 | 1 |
Second | 0 | 1 |
Third | 1 | 0 |
__________________________________
How can I create a query that displays the desired result described above?
NOTE:
I have browsed through archive, and found similar things, but nothing to help me.
To be honest, I do not know how to formulate the question properly, so I guess that is why Google couldn't be of much help...
I have tried with this:
SELECT County, COUNT(TGTE) as [Something]
FROM MyTable
WHERE TGTE = "L"
GROUP BY County;
but this is the result I get:
County | TGTE = L |
First | 2 |
Second | 0 |
Third | 1 |
__________________________________
If I change L to H, in the query above, I get this:
County | TGTE = H |
First | 1 |
Second | 1 |
Third | 0 |
__________________________________
I work on Windows XP, in C++, using ADO to access an MS Access 2007 database.
If there is anything else that I can do to help, ask and I will gladly do it.
EDIT #1:
After trying Declan's solution this is what I get:
Values in main table:
| County | TGTE |
| Стари Град | H |
| Сурчин | L |
| Стари Град | H |
| Савски Венац | H |
| Раковица | H |
Output :
| County | TGTE = L | TGTE = H |
| Раковица | 1 | 1 |
| Савски Венац | 1 | 0 |
| Сурчин | 1 | 0 |
| Стари Град | 1 | 0 |
It should output this:
| County | TGTE = L | TGTE = H |
| Раковица | 1 | 0 |
| Савски Венац | 1 | 0 |
| Сурчин | 0 | 1 |
| Стари Град | 2 | 0 |
EDIT #2:
On Declan's request, here is the original query I use:
wchar_t *query = L"select Општина, \
sum( iif( Тип_геотермалне_енергије = 'Хидрогеотермална енергија', 1, 0 ) ) as [HGTE], \
sum( iif( Тип_геотермалне_енергије = 'Литогеотермална енергија', 1, 0 ) ) as [LGTE] \
from Објекат \
group by Општина; ";
Translated to our example, it looks like this:
wchar_t *query = L"select County, \
sum( iif( TGTE = 'H', 1, 0 ) ) as [HGTE], \
sum( iif( TGTE = 'L', 1, 0 ) ) as [LGTE] \
from MyTable \
group by County; ";
EDIT #3:
After I copy the above query into Access and run it, everything works fine, so I believe that the problem lies in the usage of ADO.
EDIT #4:
After browsing the Internet, I am sure that the problem is ADO.
How can I use IIF() in ADO so my query can work?
If it can't be done, then how can I modify my query to do what I have described above?
You need to use the iif function within the two additional columns. Here is some pseudo code to get you started.
SELECT County
,sum(iif(TGTE = "L",1,0)) as [L_Count]
,sum(iif(TGTE = "H",1,0)) as [H_Count]
FROM MyTable
GROUP BY
COUNTY;
I have reworked Declan's query as shown below, and it works:
SELECT County
,sum( switch( TGTE = 'L', 1, TGTE = 'H', 0 ) ) as [L_Count]
,sum( switch( TGTE = 'H', 1, TGTE = 'L', 0 ) ) as [H_Count]
FROM MyTable
GROUP BY
County;
Everything works fine when I run it through ADO and MS Access 2007.
I do not understand why IIF() isn't working in ADO; maybe it is not supported or something...
Thank you anyway for your solution, Declan. You have +1 from me.
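For readers who want to see what the conditional sums compute, here is a minimal Python sketch of the same "one count column per TGTE value, grouped by County" idea. It is an illustration only, not ADO or Access code; count_by_type is a hypothetical name, and the rows are the sample rows from the question.

```python
from collections import defaultdict

def count_by_type(rows, types=("L", "H")):
    """Count rows per county for each type; counties start at 0 for every type."""
    counts = defaultdict(lambda: {t: 0 for t in types})
    for county, tgte in rows:
        if tgte in counts[county]:  # same role as iif(TGTE = 'L', 1, 0) in the query
            counts[county][tgte] += 1
    return dict(counts)

# Sample rows from the question.
rows = [("First", "L"), ("First", "L"), ("First", "H"), ("Second", "H"), ("Third", "L")]
result = count_by_type(rows)
for county, c in result.items():
    print(county, c["L"], c["H"])
```

Each county gets a zero-filled entry as soon as it is seen, which is why Second still shows 0 for L instead of being dropped, unlike the WHERE-based attempt.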