Error logs and execution information for Kettle (Pentaho Data Integration) transformations

I need to analyze errors in Kettle (Pentaho Data Integration).
How do I get information about a job that has just been run in Kettle (Data Integration), and the result of that execution?
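As a starting point for that analysis, here is a rough sketch of pulling both the execution result and the error log from a Kettle job log table. It assumes you have enabled job logging to a database table under the job settings (Log tab); the table name JOB_LOG, the MySQL connection details, and the column names below are assumptions based on the usual defaults, so check them against your own logging configuration:

import pymysql  # assumption: the logging connection points at a MySQL database

conn = pymysql.connect(host="logdb-host", user="kettle", password="secret", database="kettle_logs")
with conn.cursor() as cur:
    cur.execute(
        """
        SELECT ID_JOB, JOBNAME, STATUS, ERRORS, STARTDATE, ENDDATE, LOG_FIELD
        FROM JOB_LOG              -- the job log table configured in the job settings
        WHERE ERRORS > 0          -- keep only executions that reported errors
        ORDER BY ID_JOB DESC
        """
    )
    for id_job, jobname, status, errors, start, end, log_text in cur.fetchall():
        # STATUS tells you how the run ended; LOG_FIELD holds the full text log
        # when the "log record" option is enabled for the table.
        print(id_job, jobname, status, errors, start, end)

The transformation log table works the same way, so the same query pattern applies there.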

Related

SageMaker ProfilerReport stops with InternalServerError but the training job is successful

When running a simple training job on Amazon SageMaker, ProfilerReport (which I did not configure) is also enabled by default, and a processing job appears in parallel with the training job.
The training job runs successfully, but occasionally (so I don't know how to reproduce the error) the profiler report fails with a generic error:
InternalServerError: An internal error occurred. Try again.
Looking at the CloudWatch logs, the last few are all like this:
Put the output notebook in /opt/ml/processing/output/rule/profiler-output/profiler-report.ipynb
Put the html in /opt/ml/processing/output/rule/profiler-output/profiler-report.html
Current timestamp 1666357140000000 last timestamp 1666357080000000: waiting for new profiler data.
Current timestamp 1666357140000000 most recent timestamp 1666357080000000: waiting for new profiler data.
Current timestamp 1666357140000000 most recent timestamp 1666357080000000: waiting for new profiler data.
......
with this "waiting for new profiler data" line repeating until the end.
The job in question lasted 2 days, but the profiler report failed after 20 hours. Looking at the instance metrics, there is no error in terms of resources used.
The only thing I can think of is that I configured early stopping (progressively saving only the best model), so in the last phase of training nothing new is saved.
Could the explanation then be that, because nothing is being saved, the profiler report times out? Shouldn't the ProfilerReport also show a lot of other information about the training job from the debugger, such as GPU utilization and more?
This is a simplified example of the training job code:
from sagemaker.pytorch import PyTorch

tft_train_estimator = PyTorch(
    base_job_name="my-training-job-name",
    entry_point="training.py",
    framework_version="1.12.0",
    py_version="py38",
    role=role,
    instance_count=1,
    instance_type=train_instance_type,
    code_location=code_location,
    output_path=output_model_path,
)
In each case, the trained model works correctly.
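For what it's worth, the profiler that produces the ProfilerReport processing job can be controlled on the estimator itself. This is only a sketch based on the standard SageMaker Python SDK v2 options (disable_profiler and ProfilerConfig), not something taken from the original setup:

from sagemaker.debugger import ProfilerConfig
from sagemaker.pytorch import PyTorch

tft_train_estimator = PyTorch(
    base_job_name="my-training-job-name",
    entry_point="training.py",
    framework_version="1.12.0",
    py_version="py38",
    role=role,
    instance_count=1,
    instance_type=train_instance_type,
    code_location=code_location,
    output_path=output_model_path,
    # Turn off the profiler (and the default ProfilerReport processing job) entirely...
    disable_profiler=True,
    # ...or keep the report and only change how often system metrics are collected:
    # profiler_config=ProfilerConfig(system_monitor_interval_millis=500),
)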

GCP BigQuery - Verify successful execution of stored procedure

I have a BigQuery routine that inserts records into a BQ Table.
I am looking to have an Eventarc trigger that invokes Cloud Run and performs some action on successful execution of the BigQuery routine.
From Cloud Logging, I can see two events that would seem to confirm the successful execution of the BQ Routine.
protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob"
protoPayload.metadata.tableDataChange.insertedRowsCount
However, this does not give me the Job ID.
So I am looking at the event:
protoPayload.methodName="jobservice.jobcompleted"
Would it be correct to assume that, if protoPayload.serviceData.jobCompletedEvent.job.jobStatus.error is empty, then the stored procedure execution was successful?
Thanks!
I decided to go with protoPayload.methodName="jobservice.jobcompleted" in this case.
It gives the job ID at protoPayload.requestMetadata.resourceName, the status at protoPayload.serviceData.jobCompletedEvent.job.jobStatus.state, and any errors at protoPayload.serviceData.jobCompletedEvent.job.jobStatus.error.
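On the Cloud Run side, a minimal sketch of a handler that reads those fields, assuming the Eventarc audit-log trigger delivers the log entry as the JSON body of the request (the field paths are the ones above; verify them against a real event before relying on this):

import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_job_completed():
    # Eventarc posts the audit log entry as the CloudEvent body.
    entry = request.get_json(silent=True) or {}
    proto = entry.get("protoPayload", {})

    job_id = proto.get("requestMetadata", {}).get("resourceName")
    status = (proto.get("serviceData", {})
                   .get("jobCompletedEvent", {})
                   .get("job", {})
                   .get("jobStatus", {}))

    if status.get("state") == "DONE" and not status.get("error"):
        # The stored procedure ran successfully: do the follow-up action here.
        print(f"BigQuery job {job_id} completed without errors")
    else:
        print(f"BigQuery job {job_id} did not complete cleanly: {status}")
    return ("", 204)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))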

Where can one find logs for Vertex AI Batch Prediction jobs?

I couldn't find relevant information in the documentation. I have tried all the options and links on the batch prediction pages.
They can be found, but unfortunately not via any links in the Vertex AI console.
Soon after the batch prediction job fails, go to Logging -> Logs Explorer and create a query like this, replacing YOUR_PROJECT with the name of your GCP project:
logName:"projects/YOUR_PROJECT/logs/ml.googleapis.com"
First look for the same error reported by the Batch Prediction page in the Vertex AI console: "Job failed. See logs for full details."
The log line above the "Job Failed" error will likely report the real reason your batch prediction job failed.
I have found that just going to Cloud Logging after the batch prediction job fails and clicking Run query shows the error details.
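To pull the same entries programmatically instead of through the console, a rough sketch with the google-cloud-logging client (same filter as above; adjust the project name and add a timestamp restriction if the job is old):

from google.cloud import logging

client = logging.Client(project="YOUR_PROJECT")

# Newest entries first, so the real failure reason appears just before
# the generic "Job failed. See logs for full details." line.
entries = client.list_entries(
    filter_='logName:"projects/YOUR_PROJECT/logs/ml.googleapis.com"',
    order_by=logging.DESCENDING,
)
for entry in entries:
    print(entry.timestamp, entry.payload)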

BigQuery data transfer Job failed with error INTERNAL (Error: 80038528)

I have created a table in BQ and set up a Cloud Storage data transfer job (ref - https://cloud.google.com/bigquery-transfer/docs/cloud-storage-transfer), but the job is throwing the following error.
Job bqts_6091b5c4-0000-2b62-bad0-089e08e4f7e1 (table abc) failed with error INTERNAL: An internal error occurred and the request could not be completed. Error: 80038528; JobID: 742067653276:bqts_6091b5c4-0000-2b62-bad0-089e08e4f7e1
The reason for the job failure with (Error: 80038528) is that not enough slots are available. With respect to resource allocation, there is no fixed resource availability guarantee when using the on-demand model for running queries in BigQuery. The only way to make sure a certain number of slots is always available is to move to a flat-rate model [1].
If slots are not the issue, then you can follow what @Sakshi Gatyan mentioned; that is the right way to get an exact answer for a BigQuery internal error.
[1] https://cloud.google.com/bigquery/docs/slots
Since it's an internalError, there can be multiple reasons for the job failure depending on your environment; check the error table.
The job needs to be inspected internally by the Google support team, so I would recommend filing a case with support.
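Before opening a support case it can still be worth pulling the job's error details yourself. A sketch with the BigQuery Python client, assuming the bqts_... ID from the transfer run is visible as a regular job in your project and that you know the dataset's location (here "US" is only a placeholder):

from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT")

# The location must match the dataset's region, otherwise the lookup fails.
job = client.get_job("bqts_6091b5c4-0000-2b62-bad0-089e08e4f7e1", location="US")

print(job.state)         # e.g. "DONE"
print(job.error_result)  # the top-level error, if any
for err in job.errors or []:
    print(err)           # individual error records, when BigQuery exposes them

For a genuine INTERNAL error these records may not say much more than the message you already have, in which case the support case is the right next step.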

Incrementally load data from DynamoDB to S3 using Amazon Data Pipeline

My scenario: based on the 'DAT' column (which contains a date) in DynamoDB, I need to incrementally load the data to S3 using the Amazon Data Pipeline console.
To do this I used a Hive Copy activity and set filterSql to DAT > unix_timestamp(\"2015-01-01 01:00:00.301\", \"yyyy-MM-dd'T'HH:mm:ss\"). When I use the filterSql I get the error message:
Failed to complete HiveActivity: Hive did not produce an error file. Cause: EMR job '#TableBackupActivity_2015-03-14T07:17:02_Attempt=3' with jobFlowId 'i-3NTVWJANCCOH7E' is failed with status 'FAILED' and reason 'Waiting after step failed'. Step '#TableBackupActivity_2015-03-14T07:17:02_Attempt=3' is in status 'FAILED' with reason 'null'
If I run it without the filterSql statement, the data moves from DynamoDB to S3 without any error. Could someone please help me with this error?
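One thing worth ruling out, as an aside: the format pattern passed to unix_timestamp contains a literal 'T' and no fractional seconds, while the value "2015-01-01 01:00:00.301" uses a space and milliseconds, so the parse may not behave as intended. The same mismatch expressed in Python's strptime terms:

from datetime import datetime

value = "2015-01-01 01:00:00.301"

# Equivalent of "yyyy-MM-dd'T'HH:mm:ss": expects a 'T' separator and no milliseconds.
try:
    datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
except ValueError as exc:
    print("mismatch:", exc)

# A pattern that actually matches the literal being compared against:
print(datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f"))

In Hive the matching pattern would be "yyyy-MM-dd HH:mm:ss.SSS"; when the pattern does not match, unix_timestamp tends to return NULL rather than raise an error, so a mismatched pattern can make the filter misbehave without an obvious message. This is only a guess at one possible cause, not a confirmed fix for the EMR step failure.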