I am encountering the problem stated in the title in my AWS Glue jobs (the errors are also included below).
Different jobs return slightly different error codes, such as:
An error occurred while calling o176.pyWriteDynamicFrame. YEAR
and another returns
An error occurred while calling o116.pyWriteDynamicFrame. YEAR
and many more
An error occurred while calling o128.pyWriteDynamicFrame. YEAR
I am using Glue 3.0, with the AWS Data Catalog as the source, a SQL Query transform, and Redshift as the target.
IAM Role: AWSGlueETLRole
Type: Spark
Language: Python 3
Several jobs that have existed since 2022 now show this error; there were no errors before 4 February 2023. The jobs ran fine until then, and the queries themselves also run without errors in Metabase or DBeaver (against MySQL).
I have no clue what happened or how to fix it, as I have only just started using AWS Glue. Can someone explain what is happening, how I can check the error logs, and how to fix this?
I have already tried casting the date columns as DATE to see whether dates were the cause, since the error mentions YEAR, but that did not help.
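In script terms, the cast I tried is roughly equivalent to the sketch below (every database, table, connection, and column name here is a placeholder, not my real setup):

```python
# Rough sketch of the cast-then-write-to-Redshift step; all names are placeholders.
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext
from pyspark.sql.functions import col

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the source table from the AWS Data Catalog.
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_catalog_db", table_name="my_source_table"
)

# Cast the suspect column(s) to DATE before writing.
df = dyf.toDF().withColumn("order_date", col("order_date").cast("date"))
dyf_casted = DynamicFrame.fromDF(df, glueContext, "dyf_casted")

# Write to Redshift through the catalog connection.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=dyf_casted,
    catalog_connection="my_redshift_connection",
    connection_options={"dbtable": "public.my_target_table", "database": "my_redshift_db"},
    redshift_tmp_dir="s3://my-temp-bucket/glue-temp/",
)
```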
Beyond that, I haven't changed anything, as I'm afraid it will break the code or the ETL.
Related
I am receiving an error when updating my report. The report has two sources, one in SQL Server and one in MariaDB. I have no problem assigning these two sources; however, when I try to automate the report and update it manually, it gives me the following error:
(error screenshot omitted)
I have tried checking and cleaning the file sources, but that doesn't work.
I am trying to set up an AWS Glue Crawler using a JDBC connection in order to populate my AWS Glue Data Catalog databases.
I already have a connection that passes the test, but when I submit my crawler creation, I get this error: "Expected string length >= 1, but found 0 for params.Targets.JdbcTargets[0].customJdbcDriverClassName", as you can see in the first screenshot.
The only clue I have for now is that there is no class name attached to my connection; however, I cannot edit that field when editing the connection.
Does this ring a bell for anyone?
Thanks a lot.
I've also had this issue, and I even tried using the AWS CLI to create/update my connection in order to input the required parameter manually.
It turns out this is an AWS UI issue caused by a recent update. According to this post, you can create it using the Legacy console for now (on the sidebar, there is a Legacy section where you can find the Legacy pages). I just tried it on my end and it worked =)
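For anyone who wants the programmatic route mentioned above, a rough boto3 sketch (all connection details below are placeholders; JDBC_DRIVER_CLASS_NAME is the property that corresponds to the missing class name):

```python
import boto3

glue = boto3.client("glue")

# Placeholder name, URL, credentials, and driver class; adjust for your source.
glue.create_connection(
    ConnectionInput={
        "Name": "my-jdbc-connection",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:mysql://my-host:3306/my_db",
            "USERNAME": "my_user",
            "PASSWORD": "my_password",
            "JDBC_DRIVER_CLASS_NAME": "com.mysql.cj.jdbc.Driver",
        },
    }
)
```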
I'm having trouble with a job I've set up on Dataflow.
Here is the context: I created a dataset on BigQuery using the following path:
bi-training-gcp:sales.sales_data
In the properties I can see that the data location is "US".
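For reference, the dataset location can also be confirmed programmatically with the BigQuery Python client (a small sketch using the project and dataset above):

```python
# Print the dataset's location ("US" in this case).
from google.cloud import bigquery

client = bigquery.Client(project="bi-training-gcp")
dataset = client.get_dataset("bi-training-gcp.sales")
print(dataset.location)
```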
Now I want to run a job on Dataflow, so I enter the following command into the Google Cloud Shell:
gcloud dataflow sql query 'SELECT country, DATE_TRUNC(ORDERDATE, MONTH), sum(sales) FROM bi-training-gcp.sales.sales_data group by 1,2' --job-name=dataflow-sql-sales-monthly --region=us-east1 --bigquery-dataset=sales --bigquery-table=monthly_sales
The query is accepted by the console and returns a sort of acknowledgement message.
After that I go to the Dataflow dashboard. I can see a new job queued, but after 5 minutes or so the job fails and I get the following error messages:
Error
2021-09-29T18:06:00.795Z Invalid/unsupported arguments for SQL job launch: Invalid table specification in Data Catalog: Could not resolve table in Data Catalog: bi-training-gcp.sales.sales_data
Error
2021-09-29T18:10:31.592036462Z Error occurred in the launcher container: Template launch failed. See console logs.
My guess is that it cannot find my table, maybe because I specified the wrong location/region. Since my table's location is "US", I thought it would be on a US server (which is why I specified us-east1 as the region), but I tried all US regions with no success...
Does anybody know how I can solve this?
Thank you
This error occurs if the Dataflow service account doesn't have access to the Data Catalog API. To resolve this issue, enable the Data Catalog API in the Google Cloud project that you're using to write and run queries. Alternatively, assign the roles/datacatalog.viewer role to the Dataflow service account.
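If you prefer to script the API-enablement part of that fix, a rough sketch using the Service Usage API through google-api-python-client might look like the following (it assumes Application Default Credentials and the project from the question; enabling the API from the console works just as well):

```python
# Enable the Data Catalog API for the project that runs the Dataflow SQL job.
from googleapiclient import discovery

PROJECT = "bi-training-gcp"  # project ID (or project number)

serviceusage = discovery.build("serviceusage", "v1")
operation = serviceusage.services().enable(
    name=f"projects/{PROJECT}/services/datacatalog.googleapis.com"
).execute()
print(operation)
```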
I'm trying to migrate and synchronize a PostgreSQL database using AWS DMS, and I'm getting the following error:
Last Error Task error notification received from subtask 0, thread 0
[reptask/replicationtask.c:2673] [1020101] When working with Configured Slotname, user must
specify LSN; Error executing source loop; Stream component failed at subtask 0, component
st_0_D27UO7SI6SIKOSZ4V6RH4PPTZQ ; Stream component 'st_0_D27UO7SI6SIKOSZ4V6RH4PPTZQ'
terminated [reptask/replicationtask.c:2680] [1020101] Stop Reason FATAL_ERROR Error Level FATAL
I already created a replication slot and configured its name in the source endpoint.
DMS Engine version: 3.1.4
Does anyone know anything that could help me?
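For reference, the slot name on the source endpoint was configured roughly like this (the endpoint ARN and slot name below are placeholders):

```python
import boto3

dms = boto3.client("dms")

# "slotName" is the PostgreSQL extra connection attribute that points DMS
# at an existing logical replication slot.
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:EXAMPLE",
    ExtraConnectionAttributes="slotName=my_dms_slot",
)
```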
Luan -
I experienced the same issue - I was trying to replicate data from Postgres to an S3 bucket. I would check two things: your version of Postgres and the DMS version being used.
I downgraded my RDS PostgreSQL version to 9.6 and my DMS version to 2.4.5 to get replication working.
You can find more details here -
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html
I wanted to try the newer versions of DMS (3.1.4 and 3.3.0 beta) for their Parquet support, but I got the same errors you mentioned above.
Hope this helps.
It appears AWS expects you to use the pglogical extension rather than test_decoding. You have to:
add pglogical to shared_preload_libraries in the parameter group options (a boto3 sketch of this step follows below)
reboot
CREATE EXTENSION pglogical;
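If you manage the RDS parameter group with boto3 rather than the console, the first step might look roughly like this (the parameter group name is a placeholder; shared_preload_libraries is a static parameter, so it only takes effect after the reboot):

```python
import boto3

rds = boto3.client("rds")

rds.modify_db_parameter_group(
    DBParameterGroupName="my-postgres-params",
    Parameters=[
        {
            "ParameterName": "shared_preload_libraries",
            "ParameterValue": "pglogical",
            "ApplyMethod": "pending-reboot",
        }
    ],
)
```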
On DMS 3.4.2 and PostgreSQL 12.3, without the slotName= setting, DMS created the slot by itself. Also make sure you exclude the pglogical schema from the migration task, as it has unsupported data types.
P.S. When DMS hits resource limits, it fails silently. After resolving the LSN errors, I continued to get failures of the form "Last Error Task 'psql2es' was suspended due to 6 successive unexpected failures Stop Reason FATAL_ERROR Error Level FATAL" without any errors in the logs. I resolved this by going to Advanced task settings > Full load tuning settings and tuning the parameters downward.
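If you'd rather script that tuning than click through the console, a rough boto3 sketch might look like this (the task ARN and values are placeholders; the task normally has to be stopped before its settings can be modified):

```python
import json

import boto3

dms = boto3.client("dms")

# Lower full-load parallelism and commit rate via the task settings JSON
# (the same settings as "Full load tuning settings" in the console).
dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE",
    ReplicationTaskSettings=json.dumps(
        {"FullLoadSettings": {"MaxFullLoadSubTasks": 2, "CommitRate": 1000}}
    ),
)
```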
I am creating a job in AWS Glue, but it shows the error below in the last step.
{"service":"AWSGlue","statusCode":400,"errorCode":"InvalidInputException","requestId":"ad9ee511-adb8-11e9-9bbf-9d08424a9846","errorMessage":"No enum constant com.amazonaws.services.glue.FileFormat.UNKNOWN","type":"AwsServiceError"}
The column mapping is shown correctly, as in the screenshot below.
What I don't understand is which input field has invalid data. Glue doesn't give me more detailed information about this error. How can I debug this issue?
I faced the same issue because I had accidentally tried to write the data to my DynamoDB table (as my target). When I changed the target to S3, it worked.
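In script form, the working S3 target would look roughly like this (database, table, bucket, and format are placeholders):

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the mapped source table from the Data Catalog (placeholder names).
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="my_db", table_name="my_table"
)

# Write to S3, which is a supported file-format target, instead of DynamoDB.
glueContext.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-output-bucket/my-prefix/"},
    format="parquet",
)
```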
You can learn more about it in the AWS documentation.
Here is the issue raised with Amazon related to this.