I want to convert the string 20160101000000 into datetime format using an expression. I have used the date function below:
TO_DATE(PERIOD_END_DATE),'MM/DD/YYYY HH24:MI:SS')
But my table file is not loading, although my session and workflow succeed. My source and target are both flat files.
I want to change the string 20160101000000 into MM/DD/YYYY HH24:MI:SS for loading data into my target table.
You need to give the exact format of the incoming string so that the TO_DATE function can understand it and convert it into a date.
TO_DATE(PERIOD_END_DATE,'YYYYMMDDHH24MISS')
So here your date looks like YYYYMMDDHH24MISS (20160101000000).
There is often confusion about the TO_DATE function. It converts a string into a date, and its format argument describes the pattern of the incoming string, not the output. If you want to convert a date field to a specified date format, you must use TO_CHAR.
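The same parse-then-format split exists in most languages. As a rough illustration outside Informatica, here is a minimal Python sketch (the helper name reformat_timestamp is hypothetical): strptime plays the role of TO_DATE by describing the incoming pattern, and strftime plays the role of TO_CHAR by describing the outgoing one.

```python
from datetime import datetime


def reformat_timestamp(raw: str) -> str:
    # Step 1 (like TO_DATE): describe the pattern of the incoming string.
    parsed = datetime.strptime(raw, "%Y%m%d%H%M%S")
    # Step 2 (like TO_CHAR): describe the pattern of the outgoing string.
    return parsed.strftime("%m/%d/%Y %H:%M:%S")


print(reformat_timestamp("20160101000000"))  # 01/01/2016 00:00:00
```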
I have a big CSV text file uploaded weekly to an S3 path partitioned by upload date (maybe not important). The schema of these files is the same, the formatting is the same, and the naming conventions are the same. Each file contains ~100 columns and ~1M rows of mixed text/numeric types. The raw data looks like this:
id,date,string,int_values,double_values
"6F87U",2021-03-21,"Text",0,1.1483
"8DU87",2021-03-22,"More text, oh yes",1,2.525
"79LO2",2021-03-23,"Moar, give me moar, text",2,3.485489
When I run a Crawler with everything default, querying with Athena like so:
select * from tb_csv_data
...the results in Athena are thus:
| id | date | string | int_values | double_values |
|---|---|---|---|---|
| "6F87U" | 2021-03-21 | "Text" | 0 | 1.1483 |
| "8DU87" | 2021-03-22 | "More text | oh yes" | 1 |
| "79LO2" | 2021-03-23 | "Moar | give me moar | text |
The problem at this level seems to be with proper detection (read: ignoring) of commas as delimiters within quotation marks. So I have a CSV classifier with the following characteristics that I have attached to the Crawler, I run the Crawler again with the classifier attached, and the resulting table properties are thus:
Input format org.apache.hadoop.mapred.TextInputFormat
Output format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Serde serialization lib org.apache.hadoop.hive.serde2.OpenCSVSerde
Serde parameters
quoteChar "
separatorChar ,
Table properties
sizeKey 4356512114
objectCount 3
UPDATED_BY_CRAWLER crawler-name
CrawlerSchemaSerializerVersion 1.0
recordCount 3145398
averageRecordSize 1384
CrawlerSchemaDeserializerVersion 1.0
compressionType none
columnsOrdered true
areColumnsQuoted true
delimiter ,
typeOfData file
The resulting table with the same simple Athena query as above seems to be correct:
| id | date | string | int_values | double_values |
|---|---|---|---|---|
| 6F87U | 2021-03-21 | Text, yes | 0 | 1.1483 |
| 8DU87 | 2021-03-22 | More text, oh yes | 1 | 2.525 |
| 79LO2 | 2021-03-23 | Moar, give me moar, text | 2 | 3.485489 |
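The difference between the two results can be reproduced outside Athena. This is only a Python sketch with hypothetical helper names, not what the Crawler does internally: a naive split on commas breaks the quoted field apart, while a quote-aware CSV reader keeps it whole.

```python
import csv
import io


def split_naive(line: str) -> list:
    # Treats every comma as a delimiter, even inside quotation marks.
    return line.split(",")


def split_quoted(line: str) -> list:
    # Honours quoteChar " and separatorChar , like the classifier settings above.
    return next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))


line = '"8DU87",2021-03-22,"More text, oh yes",1,2.525'
print(len(split_naive(line)))  # 6
print(split_quoted(line))
```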
The expected automatic inference of data types is supposed to be this (let's simplify and presume the date is correct as a string):
| Column name | Data type |
|---|---|
| id | string |
| date | string |
| string | string |
| int_values | bigint (or long) |
| double_values | double |
...but instead they're all strings!
| Column name | Data type |
|---|---|
| id | string |
| date | string |
| string | string |
| int_values | string |
| double_values | string |
I need this data to be accurately queryable from Athena as it is, where it is, so what can I do without further processing of the raw data? I suppose I could manually adjust the table properties in the Console but is that really correct when I need the entire pipeline to be automated? I also want to avoid having to cast types in queries 80+ times for each field as most of these columns are numeric. What can I do?
Thank you!
The limitation arises from the SerDe that you are using. Refer to the note section in this doc, which has the explanation below:
When you use Athena with OpenCSVSerDe, the SerDe converts all column types to STRING. Next, the parser in Athena parses the values from STRING into actual types based on what it finds. For example, it parses the values into BOOLEAN, BIGINT, INT, and DOUBLE data types when it can discern them. If the values are in TIMESTAMP in the UNIX format, Athena parses them as TIMESTAMP. If the values are in TIMESTAMP in Hive format, Athena parses them as INT. DATE type values are also parsed as INT.
For a date type to be detected, it has to be in UNIX numeric format, such as 1562112000, according to the doc.
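To illustrate why "everything is a string" is the natural starting point for a CSV reader, here is a small Python sketch. It is not how Athena or Glue infer types; the functions infer_type and infer_schema are hypothetical and only show that typing is a separate step applied on top of string values.

```python
import csv
import io


def infer_type(values):
    # Try the narrowest numeric type first; fall back to string.
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    # csv.DictReader yields every field as a str, quoted or not.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}


sample = (
    "id,date,string,int_values,double_values\n"
    '"6F87U",2021-03-21,"Text",0,1.1483\n'
    '"8DU87",2021-03-22,"More text, oh yes",1,2.525\n'
)
print(infer_schema(sample))
```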
I have a date in a particular timezone and I want to convert it to the GMT timezone, after which it needs to be inserted into a DB using ESQL in MQ. Please help me resolve this issue.
If you want to convert a date from one format to another, you can do the following:
DECLARE inDate CHARACTER;   -- input string, must match patternIN
DECLARE outDate CHARACTER;  -- output string, rendered with patternOUT
DECLARE tempDate DATE;
DECLARE patternIN CHARACTER 'yyyy-MM-dd';
DECLARE patternOUT CHARACTER 'yyMMdd';
-- Convert the input string to a DATE (the string should match patternIN)
SET tempDate = CAST(inDate AS DATE FORMAT patternIN);
-- Convert the DATE object to a string in the desired format
SET outDate = CAST(tempDate AS CHARACTER FORMAT patternOUT);
Of course, you need to be able to define your date pattern. You might need to handle the DATE separately from the TIME, but the CAST works exactly the same way for both. A quick example of a specific cast:
CAST(CURRENT_DATE AS CHARACTER FORMAT 'yyyy-MM-dd') || 'T' || CAST(CURRENT_TIME AS CHARACTER FORMAT 'HH:mm:ss')
This will generate a date in the XML format, e.g. 2019-08-28T16:46:32
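For readers more familiar with other languages, the same two-step idea (parse with the input pattern, then render with the output pattern) can be sketched in Python. This is only an analogy to the ESQL above; the function names are hypothetical and the strftime patterns are Python's, not ESQL's.

```python
from datetime import date, datetime, time

PATTERN_IN = "%Y-%m-%d"   # counterpart of 'yyyy-MM-dd'
PATTERN_OUT = "%y%m%d"    # counterpart of 'yyMMdd'


def convert_date(text: str) -> str:
    # Parse the input string into a date object (must match PATTERN_IN).
    temp = datetime.strptime(text, PATTERN_IN).date()
    # Render the date object in the desired output format.
    return temp.strftime(PATTERN_OUT)


def xml_timestamp(d: date, t: time) -> str:
    # Date and time are formatted separately and joined with 'T'.
    return d.strftime("%Y-%m-%d") + "T" + t.strftime("%H:%M:%S")


print(convert_date("2019-08-28"))                             # 190828
print(xml_timestamp(date(2019, 8, 28), time(16, 46, 32)))     # 2019-08-28T16:46:32
```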
I have a date string that fails to import because it is in a different format from the one expected by the machine's locale (i.e. US dates on a UK machine).
How do I tell DAX to convert this string into a date using a specified format or locale, different from the machine's default?
For example, I would like to import
3/27/2008 11:07:31 AM
as
27/3/2008 11:07:31 AM
You have two options.
First option, use the basic Formatting tab functionality in Power BI.
Select the column and use the below settings in the Formatting tab:
Second option (recommended), use PowerQuery to import the text column in datetime data type.
The following expression splits the text on the "/" character and swaps the first two parts, turning the mm/dd/yyyy string into dd/mm/yyyy text that can then be changed to the datetime data type.
Table.AddColumn(#"Changed Type", "DateTime",
    each Text.Split([#"#(001A)Date Import"], "/"){1} & "/"
       & Text.Split([#"#(001A)Date Import"], "/"){0} & "/"
       & Text.Split([#"#(001A)Date Import"], "/"){2})
In this case I've added an additional column in order to import the data in the required datetime type, though you can apply the changes to the same column.
Date Import is the actual text column; DateTime is the column I've added to import Date Import as the datetime type.
If you get stuck check the official documentation about PowerQuery.
Let me know if this helps.
I think the most practical solution is in the Query Editor, but complex formulas are not required.
I would right-click the column and choose Change Type / Using Locale. Then I would specify Data Type = Date and Locale = English (United States).
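The idea behind both answers is to state the source format explicitly instead of relying on the machine's locale. As an illustration only (Python rather than Power Query or DAX, with a hypothetical helper name), parsing with an explicit US pattern gives a real datetime that can then be displayed in any order:

```python
from datetime import datetime

# Explicit US-style pattern: month/day/year with a 12-hour clock.
US_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_us_datetime(text: str) -> datetime:
    # The pattern, not the machine's locale, decides which part is the month.
    return datetime.strptime(text, US_FORMAT)


parsed = parse_us_datetime("3/27/2008 11:07:31 AM")
# Display day-first, as a UK reader would expect.
print(parsed.strftime("%d/%m/%Y %H:%M:%S"))  # 27/03/2008 11:07:31
```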
What is the difference between dateformat() and createODBCDate() in ColdFusion? Are these two functions the same or not? When do I need to use DateFormat() and when do I need to use createODBCDate()?
dateFormat() accepts a date and a format 'mask' and returns a string of the date, in the format passed.
For example, consider the following code:
mydate = dateFormat( now(), 'yyyy-mm-dd' );
Assuming the date is July 15, 2014 (which it was when I wrote this) the value of the variable named 'mydate' would be '2014-07-15' (without the quotes). So, you need to pass a date to the function.
createODBCDate() creates an actual date from the values passed. It does not format the date; it merely creates a date 'object'.
dateFormat() is typically used to display a date in a user friendly manner. Try running this writeDump( now() ) to see what the default display looks like.
createODBCDate() is typically used when you need to pass a date to a SQL query. However, if you use cfqueryparam with a cf_sql_type that accepts a date, ColdFusion will handle converting the value (assuming it is a valid date) to a date that the database accepts and you do not need to use createODBCdate()
In 10+ years of doing ColdFusion, I have never used createODBCDate()
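The same distinction (format a date for display versus hand a date value to the database through a bound parameter) exists outside ColdFusion. Here is a minimal Python sketch using sqlite3, where the ? placeholder plays a role loosely similar to cfqueryparam. The table, column and function names are made up for illustration.

```python
import sqlite3
from datetime import date

# Tell sqlite3 how to store date objects (ISO text), so callers can bind them directly.
sqlite3.register_adapter(date, lambda d: d.isoformat())


def find_orders_on(conn, day):
    # The date is passed as a bound parameter, not pasted into the SQL string.
    cursor = conn.execute("SELECT id FROM orders WHERE placed_on = ?", (day,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, placed_on TEXT)")
conn.execute("INSERT INTO orders VALUES (1, '2014-07-15'), (2, '2014-07-16')")

day = date(2014, 7, 15)
display = day.strftime("%Y-%m-%d")   # formatting, for showing to a user
rows = find_orders_on(conn, day)     # binding, for querying
print(display, rows)
```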
We have a staging table that's used to load raw data from our suppliers.
One column is used to capture a time-stamp, but its data type is varchar(265). The data is dirty: about 40% of the time there is garbage data; otherwise it holds time-stamp data like this:
2011/11/15 20:58:48.041
I have to create a report that filters some dates/timestamps out of that column, but when I try to cast it, I get an error:
db2 => select cast(loadedon as timestamp) from automation
1
--------------------------
SQL0180N The syntax of the string representation of a datetime value is incorrect. SQLSTATE=22007
What do I need to do in order to parse/cast the timestamp string?
The string format for a DB2 timestamp is either:
'2002-10-20-12.00.00.000000'
or
'2002-10-20 12:00:00'
You have to get your date string in either of these formats.
Also, DB2 runs on a 24-hour clock, even though the output sometimes uses a 12-hour clock (AM/PM).
So '2002-10-20 14:49:50' is 2:49:50 PM,
and '2002-10-20 00:00:00' is midnight; the output would be 12:00:00 AM.
It seems you have a lot of garbage data, so first of all you should check whether the data is a valid timestamp in the format you expect ('2011/11/15 20:58:48.041'). A simple way to do this is to replace all digits with '0' and check the resulting format:
TRANSLATE(timestamp_column,'0','0123456789','0') = '0000/00/00 00:00:00.000'
If the format is the expected one, you should convert it to a DB2 timestamp. DB2 for iSeries has had a built-in function, TIMESTAMP_FORMAT, since V6R1. In your case it will look like this:
TIMESTAMP_FORMAT('2011/11/15 20:58:48.041','YYYY/MM/DD HH24:MI:SS.NNNNNN')
So the combined solution query should look something like this:
SELECT
CASE
WHEN TRANSLATE(timestamp_column,'0','0123456789','0') = '0000/00/00 00:00:00.000'
THEN TIMESTAMP_FORMAT(timestamp_column,'YYYY/MM/DD HH24:MI:SS.NNNNNN')
ELSE NULL
END
FROM
your_table_with_bad_data
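The "mask the digits, then parse" check is not specific to DB2. A minimal Python sketch of the same idea follows; the function name parse_loaded_on is hypothetical, and the extra ValueError guard covers strings that have the right shape but impossible values (for example month 13), which the mask comparison alone would let through.

```python
from datetime import datetime
from typing import Optional

EXPECTED_MASK = "0000/00/00 00:00:00.000"
# Maps every digit to '0', like TRANSLATE(col, '0', '0123456789', '0').
DIGITS_TO_ZERO = str.maketrans("0123456789", "0000000000")


def parse_loaded_on(raw: str) -> Optional[datetime]:
    # Shape check: after masking the digits, the string must equal the mask.
    if raw.translate(DIGITS_TO_ZERO) != EXPECTED_MASK:
        return None
    try:
        return datetime.strptime(raw, "%Y/%m/%d %H:%M:%S.%f")
    except ValueError:
        # Right shape, but not a real date or time.
        return None


print(parse_loaded_on("2011/11/15 20:58:48.041"))
print(parse_loaded_on("garbage"))  # None
```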
EDIT
I just saw your comment that the provider agreed to clean the data. You could use the solution above to speed up the process and clean the data yourself:
ALTER TABLE your_table_with_bad_data ADD COLUMN clean_timestamp TIMESTAMP DEFAULT NULL;
UPDATE your_table_with_bad_data
SET clean_timestamp =
CASE
WHEN TRANSLATE(timestamp_column,'0','0123456789','0') = '0000/00/00 00:00:00.000'
THEN TIMESTAMP_FORMAT(timestamp_column,'YYYY/MM/DD HH24:MI:SS.NNNNNN')
ELSE NULL
END;