Can you assist with wrong Informatica results?

Hello, I am new to Informatica and was trying to write a DECODE (or IIF) where only one value goes into the output port. A value that is not equal is going into the output port, and I am positive the value is not equal, so my DECODE doesn't work, and my attempt at a nested IIF doesn't work either. The DECODE is as follows:
decode (true
testopen = testdoor1, 'open', 'close',
testopen = testdoor2, 'open', 'close',
testopen = testdoor3, 'open', 'close')
I know testdoor3 is the only one that equals testopen, and testdoor3 is populated.
I tested it with one comparison on testdoor2 (only one expression in the DECODE), and it still populates testopen as if it matched testdoor2; testdoor2 and testdoor1 contain nulls.
I then tried an IIF but got stuck after the first one:
iif (testopen = testdoor1, 'equal','noequal')
but I couldn't figure out how to do it with three IIF statements. Your help is appreciated.

Here is the syntax of DECODE from the Informatica documentation:
DECODE( value, first_search, first_result [, second_search, second_result]...[,default] )
I would suggest adding what your input values are and what the expected output is.
Looking at your code, it seems you are not supplying the correct input in the intended positions/parameters. Let's say the ports have the values below:
testopen='Yes'
testdoor1='Yes'
testdoor2='No'
testdoor3='Yes'
Now let's replace your DECODE with the actual values from the ports and compare it with the syntax. Notice the second_search position, where you are not comparing ports at all but just passing the value 'close'. This is why your code is behaving strangely.
Note: NOTICE the missing comma after 'true' in your original post.
decode(true,
'Yes' = 'Yes' (first_search), 'open' (first_result), 'close' (second_search),
'Yes' = 'No' (second_result), 'open' (third_search), 'close' (third_result),
'Yes' = 'Yes' (fourth_search), 'open' (fourth_result), 'close' (default))
Try something similar to the code below. This again is from the Informatica documentation.
DECODE( TRUE,
Var1 = 22, 'Variable 1 was 22!',
Var2 = 49, 'Variable 2 was 49!',
Var1 < 23, 'Variable 1 was less than 23.',
Var2 > 30, 'Variable 2 was more than 30.',
'Variables were out of desired ranges.')

DECODE takes a value, then pairs of search and result values, and, optionally, a default.
You seem to be using triplets instead of pairs of parameters. What you have actually written is:
decode (true
testopen = testdoor1, 'open',
'close', testopen = testdoor2,
'open', 'close',
testopen = testdoor3,
'open', 'close')
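A corrected version of the expression (a sketch, assuming the intent is for the output port to hold 'open' when any of the three doors matches testopen and 'close' otherwise) pairs each search with a single result and puts the default last:
DECODE( TRUE,
    testopen = testdoor1, 'open',
    testopen = testdoor2, 'open',
    testopen = testdoor3, 'open',
    'close' )
The same logic as three nested IIF statements, since the original post asked how to chain them, would be:
IIF( testopen = testdoor1, 'open',
    IIF( testopen = testdoor2, 'open',
        IIF( testopen = testdoor3, 'open', 'close' )))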


Set missing column values to a default using AWS Glue Jobs

I'm trying to extract a dataset from DynamoDB to S3 using Glue. In the process I want to select a handful of columns, then set a default value for any and all rows/columns that have missing values.
My current attempt uses the "Map" function, but it doesn't seem to be calling my method.
Here is what I have:
def SetDefaults(rec):
    print("checking record")
    for col in rec:
        if not rec[col]:
            rec[col] = "missing"
    return rec
## Read raw(source) data from target DynamoDB
raw_data_dyf = glueContext.create_dynamic_frame_from_options("dynamodb", {"dynamodb.input.tableName" : my_dynamodb_table, "dynamodb.throughput.read.percent" : "0.50" } )
## Get the necessary columns
selected_data_dyf = ApplyMapping.apply(frame = raw_data_dyf, mappings = mappingList)
## get rid of null values
mapped_dyF = Map.apply(frame=selected_data_dyf, f=SetDefaults)
## write it all out as a csv
datasink = glueContext.write_dynamic_frame.from_options(frame=mapped_dyF , connection_type="s3", connection_options={ "path": my_train_data }, format="csv", format_options = {"writeHeader": False , "quoteChar": "-1" })
My ApplyMapping.apply call is doing the right thing, where mappingList is defined by a bunch of:
mappingList.append(('gsaid', 'bigint', 'gsaid', 'bigint'))
mappingList.append(('objectid', 'bigint', 'objectid', 'bigint'))
mappingList.append(('objecttype', 'bigint', 'objecttype', 'bigint'))
I have no errors, everything runs to completion. My data is all in s3, but there are many empty values still, rather than the "missing" entry I would like.
The "checking record" print statement never prints out. What am I missing here?
Alternative solution:
Convert DynamicFrame to Spark DataFrame
Use the DataFrame's fillna() method to fill the null values
Convert the DataFrame back to DynamicFrame
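A minimal sketch of that alternative, reusing the glueContext and selected_data_dyf frame from the question (note that fillna with a string value only fills string-typed columns):
from awsglue.dynamicframe import DynamicFrame

# Convert the mapped DynamicFrame to a Spark DataFrame
df = selected_data_dyf.toDF()

# Replace nulls with the literal "missing" (string columns only, since a string is passed)
df = df.fillna("missing")

# Convert back to a DynamicFrame so it can be written out with write_dynamic_frame
mapped_dyF = DynamicFrame.fromDF(df, glueContext, "mapped_dyF")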

Pandas: SettingWithCopyWarning, trying to understand how to write the code better, not just whether to ignore the warning

I am trying to change all date values in a spreadsheet's Date column where the year is earlier than 1900, to today's date, so I have a slice.
EDIT: previous lines of code:
df=pd.read_excel(filename)#,usecols=['NAME','DATE','EMAIL']
#regex to remove weird characters
df['DATE'] = df['DATE'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
df['DATE'] = pd.to_datetime(df['DATE'])
sample row in dataframe: name, date, email
[u'Public, Jane Q.\xa0' u'01/01/2016\xa0' u'jqpublic#email.com\xa0']
This line of code works.
df["DATE"][df["DATE"].dt.year < 1900] = dt.datetime.today()
Then, all date values are formatted:
df["DATE"] = df["DATE"].map(lambda x: x.strftime("%m/%d/%y"))
But I get an error:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have read the documentation and other posts where using .loc is suggested.
The following is the recommended solution:
df.loc[row_indexer,col_indexer] = value
but df["DATE"].loc[df["DATE"].dt.year < 1900] = dt.datetime.today() gives me the same error, except that the line number is actually the line number after the last line in the script.
I just don't understand what the documentation is trying to tell me as it relates to my example.
I started messing around with pulling out the slice and assigning to a separate dataframe, but then I'm going to have to bring them together again.
You produce a view when you take df["DATE"], then apply the selector [df["DATE"].dt.year < 1900], and try to assign to the result.
df["DATE"][df["DATE"].dt.year < 1900] is the chained indexing that pandas is complaining about.
Fix it with loc like this:
df.loc[df.DATE.dt.year < 1900, "DATE"] = pd.datetime.today()
My thought would be that you could do
df.loc[df.DATE.dt.year < 1900, "DATE"] = dt.datetime.today()
df.loc[:, "DATE"] = df.DATE.map(lambda x: x.strftime("%m/%d/%y"))
Not at a computer so I can't test but I think that should do it.
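Putting the pieces together, a minimal sketch under the question's assumptions (the DATE column has already been cleaned and converted with pd.to_datetime, and datetime is imported as dt, as dt.datetime.today() suggests):
# Single .loc call with row selector and column label, so pandas
# writes into df itself rather than a temporary copy
df.loc[df['DATE'].dt.year < 1900, 'DATE'] = dt.datetime.today()

# Format afterwards; the column is still datetime at this point
df['DATE'] = df['DATE'].dt.strftime('%m/%d/%y')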

Value Error while importing data into postgres table using psycopg2

I have a tuple as below
data = ({'weather station name': 'Substation', 'wind': '6 km/h', 'barometer': '1010.3hPa', 'humidity': '42%', 'temperature': '34.5 C', 'place_id': '001D0A00B36E', 'date': '2016-05-10 09:48:58'})
I am trying to push the values from the above tuple to the postgres table using the code below:
try:
    con = psycopg2.connect("dbname='WeatherForecast' user='postgres' host='localhost' password='****'")
    cur = con.cursor()
    cur.executemany("""INSERT INTO weather_data(temperature,humidity,wind,barometer,updated_on,place_id) VALUES (%(temperature)f, %(humidity)f, %(wind)f, %(barometer)f, %(date)s, %(place_id)d)""", final_weather_data)
    ver = cur.fetchone()
    print(ver)
except psycopg2.DatabaseError as e:
    print('Error {}'.format(e))
    sys.exit(1)
finally:
    if con:
        con.close()
The data type of each field in the DB is as below:
id serial NOT NULL,
temperature double precision NOT NULL,
humidity double precision NOT NULL,
wind double precision NOT NULL,
barometer double precision NOT NULL,
updated_on timestamp with time zone NOT NULL,
place_id integer NOT NULL,
When I run the code to push the data into the Postgres table using psycopg2, it raises "ValueError: unsupported format character 'f'".
I suspect the issue is in the formatting. I am using Python 3.4.
Have a look at the documentation:
The variables placeholder must always be a %s, even if a different placeholder (such as a %d for integers or %f for floats) may look more appropriate:
>>> cur.execute("INSERT INTO numbers VALUES (%d)", (42,)) # WRONG
>>> cur.execute("INSERT INTO numbers VALUES (%s)", (42,)) # correct
Your SQL query, however, contains all sorts of placeholders:
"""INSERT INTO weather_data(temperature,humidity,wind,barometer,updated_on,place_id)
VALUES (%(temperature)f, %(humidity)f, %(wind)f, %(barometer)f, %(date)s, %(place_id)d)"""
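A corrected version of the call (a sketch, keeping the column names and dictionary keys from the question) uses %(name)s for every placeholder and lets psycopg2 adapt the Python types:
cur.executemany(
    """INSERT INTO weather_data(temperature,humidity,wind,barometer,updated_on,place_id)
    VALUES (%(temperature)s, %(humidity)s, %(wind)s, %(barometer)s, %(date)s, %(place_id)s)""",
    final_weather_data)
Note that even with the placeholders fixed, values such as '34.5 C' or '42%' are still strings with units, so they would also need to be converted to plain numbers before they can go into the double precision columns.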

TypeError when inserting time into xlsxwriter

I'm importing two points of data from MySQLdb. The second point is a time which cursor.fetchall() returns as a timedelta. I had no luck trying to insert that info into xlsxwriter, always getting a "TypeError: Unknown or unsupported datetime type" error.
Ok... round 2
Now I'm trying to convert the timedelta into a datetime.datetime object:
for x in tempList:
    timeString = str(x[1])
    ctTime.append(datetime.datetime.strptime(timeString, "%H:%M:%S"))
Now in xlsxwriter, I setup formatting:
ctChart.set_x_axis({'name': 'Time', 'name_font': {'size': 14, 'bold': True}, 'num_font': {'italic': True},'date_axis': True})
And then I create a time format:
timeFormat = workbook.add_format({'num_format': 'hh:mm:ss'})
Then I attempt to insert data:
ctWorksheet.write_datetime('A1',ctTime,timeFormat)
But no matter what I do, no matter how I format the data, I always get the following error:
TypeError: Unknown or unsupported datetime type
Is there something stupidly obvious I'm missing?
******* EDIT 1 *******
jmcnamara - In response to your comment here are more details:
I've tried using a list of time deltas such as datetime.timedelta(0, 27453) which when printed is 7:37:33 using the following code:
timeFormat = workbook.add_format({'num_format': 'hh:mm:ss'})
ctWorksheet.write_datetime('A1',ctTime,timeFormat)
I still get the error: TypeError: Unknown or unsupported datetime type
Even iterating through the list and attempting to insert the results fails:
timeFormat = workbook.add_format({'num_format': 'hh:mm:ss'})
i = 0
for t in ctTime:
    ctWorksheet.write_datetime(i, 0, t, timeFormat)
    i += 1
I finally got it working with my most recent code. The chart still isn't graphing correctly using the inserted times, but at least they are inserting correctly.
Since I was pulling the timedeltas from SQL, I had to change their format first. Raw timedeltas from SQL just weren't working:
for x in templist:
    timeString = datetime.datetime.strptime(str(x[1]), "%H:%M:%S")
    ctTime.append(timeString)
With those datetime.strptime formatted times I was able to then successfully insert into the worksheet.
timeFormat = workbook.add_format({'num_format': 'hh:mm:ss'})
i = 0
for t in ctTime:
    ctWorksheet.write_datetime(i, 0, t, timeFormat)
    i += 1
The GitHub master version of XlsxWriter supports datetime.timedelta.
Try it out and let me know if it works. It will probably be uploaded to PyPI in the next week.
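For reference, once a version of XlsxWriter with timedelta support is installed, a minimal standalone sketch (the file name and cell position are just examples) would be:
import datetime
import xlsxwriter

workbook = xlsxwriter.Workbook('times.xlsx')
worksheet = workbook.add_worksheet()
time_format = workbook.add_format({'num_format': 'hh:mm:ss'})

# write_datetime() accepts a datetime.timedelta directly in versions with that support
delta = datetime.timedelta(0, 27453)  # 7:37:33, as in the question
worksheet.write_datetime(0, 0, delta, time_format)

workbook.close()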

multiple if's for calculated field in tableau

Please pardon the absolute newbie question, but I'm very new to Tableau.
What I'd like to do is create a message based on which filter flags are active. So, in pseudo code, I'd do something like this:
message = ''
if filter1 == 1:
    message += 'filter 1 is active'
if filter2 == 1:
    message += ' filter 2 is active'
return message
The problem is, I'm not even sure how to do multiple if statements; I keep getting a syntax error. Any help will be greatly appreciated.
Here is an example of how I accomplished something similar:
IF [ZAVUFA1_FED_COLL_CHOICE_1] = 'xxxxx' THEN 1
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_2] = 'xxxxx' THEN 2
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_3] = 'xxxxx' THEN 3
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_4] = 'xxxxx' THEN 4
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_5] = 'xxxxxx' THEN 5
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_6] = 'xxxxx' THEN 6
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_7] = 'xxxxxx' THEN 7
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_8] = 'xxxxxx' THEN 8
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_9] = 'xxxxx' THEN 9
ELSEIF [ZAVUFA1_FED_COLL_CHOICE_10] = 'xxxxxx' THEN 10
ELSEIF ISNULL([ZAVUFA1_FED_COLL_CHOICE_1]) THEN 99
END
As much as I love stackoverflow, Tableau also has a great user forum on their site.
You would create a calculated field called message with this code:
IF filter1 = 1 THEN 'filter 1 is active' END
+ IF filter2 = 1 THEN ' filter 2 is active' END
What I ended up doing is creating a calculated field for each if statement. I then created yet another calculated field that concatenates the output of each of the first set of calculated fields. It seems like a bit of a hack, so if anyone knows of a more elegant way of doing this (building a calculated field out of a series of calculated fields seems awfully kludgy), I'd be glad to pass on the points for answering.
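One way to keep this in a single calculated field (a sketch, assuming [filter1] and [filter2] are numeric fields as in the pseudo code) is to have each piece return an empty string instead of NULL, since an IF without an ELSE returns NULL and concatenating NULL with a string yields NULL:
IIF([filter1] = 1, 'filter 1 is active', '')
+ IIF([filter2] = 1, ' filter 2 is active', '')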