What is the difference between IIF and DECODE in Informatica?

What is the difference between the IIF and DECODE functions in Informatica PowerCenter?

DECODE can be used in a SELECT statement, whereas IIF cannot.

As far as I know, DECODE stops searching as soon as it finds the first match, whereas IIF continues evaluating to the end.
You can also use DECODE in a SELECT clause.

First of all DECODE gives you much cleaner code than nested IIFs. In addition, it's more efficient in those cases.

Decode
Finds the column value and generates the result according to the expression
Syntax: DECODE (Column_name or ‘Value’, Search1, Result1, Search2, Result2, ….., Default)
Argument              Mandatory/Optional  Description
Column_name or Value  Mandatory           Value that is passed to the function
Search                Mandatory           Value to search for
Result                Mandatory           Result returned for the matching search value
Default               Optional            Default value returned when no search value matches
Example1: DECODE (ID, 1, ‘US’, 3, ‘Australia’, ‘None’)
Input Data:
ID  Value
1   US
2   UK
3   Australia
4   Africa
Output Data:
ID  Value
1   US
2   None
3   Australia
4   None
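The first-match behaviour of the example above can be mimicked in Python as a rough sketch. The decode and iif helpers here are hypothetical stand-ins written for illustration, not Informatica functions, and they only model equality matching on plain values. The sketch shows that one flat DECODE call and a chain of nested IIFs produce the same output for the sample IDs.

```python
def decode(value, *args):
    """First-match lookup: args are search/result pairs plus an optional default."""
    has_default = len(args) % 2 == 1
    default = args[-1] if has_default else None
    pairs = args[:-1] if has_default else args
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if value == search:
            return result  # stop at the first matching search value
    return default


def iif(condition, when_true, when_false=None):
    """Two-way choice; more than two branches require nesting."""
    return when_true if condition else when_false


ids = [1, 2, 3, 4]

# One flat call per row, mirroring DECODE (ID, 1, 'US', 3, 'Australia', 'None')
decoded = [decode(i, 1, "US", 3, "Australia", "None") for i in ids]

# The same mapping written as nested two-way choices
nested = [iif(i == 1, "US", iif(i == 3, "Australia", "None")) for i in ids]

print(decoded)
print(nested)
```

The flat argument list is what makes the DECODE form easier to read once there are more than two or three branches.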

How to convert text field with formatted currency to numeric field type in Postgres?

I have a table that has a text field which has formatted strings that represent money.
For example, it will have values like this, but also have "bad" invalid data as well
$5.55
$100050.44
over 10,000
$550
my money
570.00
I want to convert this to a numeric field, keeping the values that can be parsed as numbers and converting any that can't to null.
I was originally using this function, which converted clean numbers (numbers without any formatting). The issue was that it would not convert a value such as $5.55 and set it to null instead.
CREATE OR REPLACE FUNCTION public.cast_text_to_numeric(
    v_input text)
    RETURNS numeric
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE
AS $BODY$
declare
    v_output numeric default null;
begin
    begin
        v_output := v_input::numeric;
    exception when others then
        return null;
    end;
    return v_output;
end;
$BODY$;
I then created a simple update statement that removes all non-digit characters but keeps the period.
update public.numbertesting set field_1=regexp_replace(field_1,'[^\w.]','','g')
and if I run this statement, it correctly converts the text data to numeric and maintains the number:
alter table public.numbertesting
    alter column field_1 type numeric
    using field_1::numeric
But I need to use the function in order to properly discard any bad data and set those values to null.
Even after I run the cleanup so that the text value is, say, 5.55,
my "cast_text_to_numeric" function STILL sets it to null. I don't understand why the function returns null when the statement above converts the same value to a proper number.
How can I fix my cast_text_to_numeric function to properly convert values such as 5.55?
I'm OK with discarding (setting to NULL) any values that don't end up as digits and a period. The regular expression strips out all other characters, and if there happen to be two numbers in the text field, the script combines them into one (spaces are removed); I'm fine with that.
In the example of data above, after conversion, the end result in numeric field would be:
5.55
100050.44
null
550
null
570.00
FYI, I am on Postgres 11 right now
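As a rough illustration of the intended clean-then-convert logic, here is a minimal Python sketch rather than a Postgres fix. The clean_to_numeric helper is hypothetical. It applies the same pattern as the update statement and then accepts only strings made of digits with an optional decimal part. Note that \w keeps letters and underscores as well as digits, so "over 10,000" becomes "over10000", which is why it ends up as null rather than as a number.

```python
import re
from decimal import Decimal


def clean_to_numeric(text):
    """Strip formatting, then convert; return None when no clean number remains."""
    if text is None:
        return None
    # Same pattern as the UPDATE statement: drop everything except
    # word characters (letters, digits, underscore) and the period.
    cleaned = re.sub(r"[^\w.]", "", text)
    # Accept only digits with an optional fractional part.
    if re.fullmatch(r"[0-9]+(\.[0-9]+)?", cleaned):
        return Decimal(cleaned)
    return None


samples = ["$5.55", "$100050.44", "over 10,000", "$550", "my money", "570.00"]
converted = [clean_to_numeric(s) for s in samples]
print(converted)
```

On the sample values this yields 5.55, 100050.44, null, 550, null and 570.00, matching the expected result listed above.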

TACL How to use multiple Arguments

I was wondering whether any TACL experts out there can help me answer what is probably a very basic question.
How do you pass multiple arguments into your routine?
This is what I have so far:
[#CASE [#ARGUMENT / VALUE job_id/number /minimum [min_job], maximum [max_job]/
otherwise]
|1|#output Job Number = [job_id]
|otherwise|
#output Bad number - Must be a number between [min_job] & [max_job]
#return
]
I have been told you need a second #ARGUMENT statement, but I have had no luck getting it to work, and the PDF guides don't help much.
Any ideas/answers would be great
Thanks.
The #CASE statement isn't required if your arguments are positional and of one type (i.e. you know what you are getting and in what order). In that case you can just use a sequence of #ARGUMENT statements to get the arguments.
In your example #ARGUMENT accepts either a number in a range or anything else - the OTHERWISE bit. The #CASE statement then tells you which of those two you got, 1 or 2.
#ARGUMENT can do data validation for you (you may recognize the output from some of the TACL routines that come with the operating system).
So you can write something like this:
SINK [#ARGUMENT / VALUE job_id/number /minimum [min_job], maximum [max_job]/]
The SINK just tosses away the expansion of the #ARGUMENT, you don't need it since you only accept a number and fail otherwise.
I figured out a way, but I don't know if it is the best way to do it.
It seems that an #ARGUMENT statement always needs to be inside a #CASE statement, so all I did was mirror the above and alter it for text rather than an integer.
If you know of any other/better ways let me know :)
I find it best to use CASE when you have multiple types of argument input to process. Here is a mock-up of how I would handle multiple argument types in the context you shared, using the CASE expression:
?TACL ROUTINE
#FRAME
#PUSH JOB_ID MIN_JOB MAX_JOB
#SETMANY MIN_JOB MAX_JOB , 1 3
[#DEF VALID_KEYWORDS TEXT |BODY| THISJOB THATJOB SOMEOTHERJOB]
[#CASE
[#ARGUMENT/VALUE JOB_ID/
NUMBER/MINIMUM [MIN_JOB],MAXIMUM [MAX_JOB]/
KEYWORD/WORDLIST [VALID_KEYWORDS]/
STRING
OTHERWISE
]
| 1 |
#OUTPUT VALID JOB NUMBER = [JOB_ID]
| 2 |
#OUTPUT VALID KEYWORD = [JOB_ID]
| 3 |
#OUTPUT VALID STRING = [JOB_ID]
| OTHERWISE |
#OUTPUT NOT A NUMBER, KEYWORD, OR A STRING
#OUTPUT MUST BE ONE OF:
#OUTPUT A NUMBER IN THE RANGE OF: [MIN_JOB] TO [MAX_JOB]
#OUTPUT A KEYWORD IN THIS LIST: [VALID_KEYWORDS]
#OUTPUT OR A STRING OF CHARACTERS
#RETURN
]
#OUTPUT
#OUTPUT NOW WE ARE USING ARGUMENT [JOB_ID] !!!
TIME
#UNFRAME

Randomly set one-third of na's in a column to one value and the rest to another value

I'm trying to impute missing values in a dataframe df. I have a column A with 300 NaNs. I want to randomly set two-thirds of them to value1 and the rest to value2.
Please help.
EDIT: I'm actually trying to do this in dask, which does not support item assignment. This is what I have currently. Initially, I thought I'd try converting all NAs to value1:
da.where(df.A.isnull() == True, 'value1', df.A)
I got the following error:
ValueError: need more than 0 values to unpack
As the comment suggests, you can solve this with Series.where.
The following will work, but I cannot promise how efficient it is. (I suspect it may be better to produce a whole column of replacements at once with numpy.random.choice.)
import random

df['A'] = df['A'].where(
    ~df['A'].isnull(),
    lambda s: s.map(lambda x: random.choice(['value1', 'value1', x])))
Explanation: if the value is not null (NaN), keep the original. Where it is null, replace it with the corresponding value of the series produced by the first lambda. That lambda maps each value (per chunk) to a random choice that keeps the original value about one-third of the time and picks 'value1' otherwise. To fill the remaining third with 'value2' instead, put 'value2' in place of x in the list.
Note that, depending on your data, this likely has changed the data type of the column.
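A per-element random choice only gives roughly a two-thirds split. If an exact split is wanted, one option is to shuffle the positions of the missing entries and assign by rank. The sketch below uses a plain Python list so that it stays self-contained; fill_missing_by_ratio and is_missing are hypothetical helper names, and carrying the idea over to a pandas or dask column is left as an assumption.

```python
import math
import random


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_by_ratio(values, first, second, first_share=2 / 3, seed=None):
    """Fill a fixed share of the missing entries with `first`, the rest with `second`."""
    rng = random.Random(seed)
    missing_positions = [i for i, v in enumerate(values) if is_missing(v)]
    rng.shuffle(missing_positions)  # random order decides who gets which value
    n_first = round(len(missing_positions) * first_share)
    filled = list(values)  # leave the input untouched
    for rank, pos in enumerate(missing_positions):
        filled[pos] = first if rank < n_first else second
    return filled


column = ["a", None, float("nan"), "b", None, None, None, None]  # 6 missing
result = fill_missing_by_ratio(column, "value1", "value2", seed=0)
print(result)
```

With six missing entries, four receive value1 and two receive value2; which positions get which value depends on the seed.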

Make =IF Function Output Numbers For "Scoring": Google Sheets

I am exploring methods of giving scores to different datapoints within a dataset. These points come from a mix of numbers and text string attributes looking for certain characteristics, e.g. if Col. A contains more than X number of "|", then give it a 1. If not, it gets a 0 for that category. I also have some that give the point when the value is >X.
I have been trying to do this with =IF, for example, =IF([sheet] = [Text], "1","0").
I can get it to give me 1 or 0, but I am unable to get a point total with sum.
I have tried changing the formatting of the text to both "number", "plain text", and have left it as automatic, but I can't get it to sum. Thoughts? Is there maybe a better way to do this?
FWIW - I'm trying to score based on about 12 factors.
Best,
Alex
The issue here might be that you're having the cell evaluate to either the string "0" or the string "1" rather than the number 0 or the number 1. That would explain why you're seeing the right things but the math isn't coming out right - the cell contents look like numbers, but they're really text, which the summation would then ignore.
One option would be to drop the quotation marks and write something like this:
=IF(condition, 1, 0)
This has the condition evaluate to 1 if it's true and 0 if it's false.
Alternatively, you could write something like this:
=(condition) * 1
This will take the boolean TRUE or FALSE returned by condition and convert it to either the numeric value 1 (true) or the numeric value 0 (false).
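The same text-versus-number distinction can be sketched outside the spreadsheet. The Python example below is only an analogy and is not Sheets syntax: score_row and its thresholds are hypothetical, and it scores each check as the number 1 or 0 so that the results can be totalled directly.

```python
def score_row(text, amount, min_pipes=2, min_amount=100):
    """Return one point per satisfied check (thresholds are illustrative)."""
    checks = [
        text.count("|") > min_pipes,  # more than X "|" characters
        amount > min_amount,          # value greater than X
    ]
    # int(True) == 1 and int(False) == 0, like =(condition) * 1
    return sum(int(passed) for passed in checks)


rows = [("a|b|c|d", 150), ("a|b", 50), ("x|y|z|w", 10)]
scores = [score_row(text, amount) for text, amount in rows]
total = sum(scores)

# Scores stored as text must be converted before they can be added up.
text_scores = ["1", "0", "1"]
total_from_text = sum(int(s) for s in text_scores)

print(scores, total, total_from_text)
```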

cloudsearch query to boost exact match on range

In a cloudsearch structured query.
I have a couple of fields I am searching on.
On field one, the user selects "2"
On field two the user selects "1"
I want to run this as a range query, so that the results returned are within -1 to +1 of each selection,
e.g. on field one the range would be 1,3 and on field two it would be 0,2.
What I want to do is sort the results so that those matching both field one and field two exactly are at the top, with the rest under them,
e.g. where field one = 2 and field two = 1 would be at the top and the rest are in no specific order.
Note: I do need to end up sorting the results by distance, so that all the exact matches are in distance order, followed by all the rest ordered by distance.
I am sure I can do this with 2 queries, just trying to make it work in one query if at all possible to lighten the load.
Say your fields are 'a' and 'b', and the specified values are a=2 and b=1 (as in your example, except I've named the fields 'a' and 'b' instead of 'one' and 'two'). Here are the various terms of your query.
Range Query
This is the query for the range a±1 and b±1 where a=2 and b=1:
q=(and (range field=a[1,3]) (range field=b[0,2]))
Rank Expression
For your rank expression, compute a distance-based score using absolute value so that scores 'a' and 'b' can't cancel each other out (like a=3,b=0 would, for example):
expr.rank1=abs(a-2)+abs(b-1)
Sort by Rank
That defines a ranking expression named rank1, which we now want to sort by, starting with the lowest values ('0' means a=2,b=1):
sort=rank1 asc
Return the Rank
For debugging purposes, you may want to return the ranking score:
return=rank1
Put all those terms together and you've got your query.
Further Potentially-Useful Things
If you want to get fancy and penalize things in a non-linear way, you can use exp. For example, if you want to differentiate between 'a' and 'b' both being off by 1 vs 'a' being an exact match and 'b' being off by 2 (eg a=3,b=2 will rank ahead of a=2,b=3 even though the previous ranker would give them both a score of 2):
expr.rank1=exp(abs(a-2))+exp(abs(b-1))
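To check the arithmetic behind the two rankers, here is a small Python sketch that evaluates both formulas locally. It assumes the targets from the example (a=2, b=1); linear_rank and exp_rank are hypothetical names, and nothing here calls CloudSearch.

```python
import math


def linear_rank(a, b, target_a=2, target_b=1):
    """Mirror of abs(a-2)+abs(b-1): total distance from the targets."""
    return abs(a - target_a) + abs(b - target_b)


def exp_rank(a, b, target_a=2, target_b=1):
    """Mirror of exp(abs(a-2))+exp(abs(b-1)): large misses cost much more."""
    return math.exp(abs(a - target_a)) + math.exp(abs(b - target_b))


# Without abs(), opposite errors cancel: (3-2) + (0-1) == 0
signed_sum = (3 - 2) + (0 - 1)

# Both candidates score 2 linearly, but the exponential version separates them.
both_off_by_one = (3, 2)
one_off_by_two = (2, 3)
print(linear_rank(*both_off_by_one), linear_rank(*one_off_by_two))
print(exp_rank(*both_off_by_one), exp_rank(*one_off_by_two))
```

Because the sort is ascending, the candidate with the smaller exponential score, a=3,b=2, comes first.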
And you can use boolean logic and the ternary operator to detect and prefer certain results that meet certain criteria, eg to give a big boost when 'a' and 'b' are on-target, a smaller boost when 'a' or 'b' is on target, etc (since we're sorting in low-to-high, a boost in rank is actually achieved by adding less to the result):
((a==2&&b==1)?0:100)+((a==2||b==1)?0:1000)+abs(a-2)+abs(b-1)
See http://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-expressions.html
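The tiered ternary expression can be checked the same way. This sketch assumes the targets from the running example (a=2, b=1) and uses a hypothetical tiered_rank function to reproduce the penalties in plain Python. Sorting ascending puts exact matches first, then documents matching one field, then the rest.

```python
def tiered_rank(a, b, target_a=2, target_b=1):
    """Lower is better: 0 for an exact match, +100 unless both fields match,
    +1000 unless at least one matches, plus the plain distance."""
    both_penalty = 0 if (a == target_a and b == target_b) else 100
    either_penalty = 0 if (a == target_a or b == target_b) else 1000
    distance = abs(a - target_a) + abs(b - target_b)
    return both_penalty + either_penalty + distance


candidates = [(3, 2), (2, 0), (2, 1), (1, 0), (3, 1)]
ranked = sorted(candidates, key=lambda ab: tiered_rank(*ab))
print([(ab, tiered_rank(*ab)) for ab in ranked])
```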