JBeret - Retryable exception class working?

Is there a way to see in the log that the retry is happening? I need to know that this is working in our test environment before implementing it in production.
There are rare instances when we get the following error, because a portion of the key is a timestamp and data comes into the table from various sources. We need the writer to retry when we get a DB2 SQL Error: SQLCODE=-803, SQLSTATE=23505:
<chunk>
...
<retryable-exception-classes>
<include class="com.ibm.db2.jcc.am.SqlIntegrityConstraintViolationException"></include>
</retryable-exception-classes>
</chunk>

JBeret does not log these events, but you can implement the listeners defined by the batch spec to act on them yourself, for example RetryReadListener, RetryWriteListener, or RetryProcessListener; a sketch follows below.
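For example, a write-retry listener along these lines will put an entry in the log each time a retry happens (a minimal sketch assuming the standard JSR-352 javax.batch API; the class name, logger, and message text are illustrative, not part of JBeret itself):

import java.util.List;
import java.util.logging.Logger;

import javax.batch.api.chunk.listener.RetryWriteListener;
import javax.inject.Named;

@Named
public class LoggingRetryWriteListener implements RetryWriteListener {

    private static final Logger LOG = Logger.getLogger(LoggingRetryWriteListener.class.getName());

    @Override
    public void onRetryWriteException(List<Object> items, Exception ex) throws Exception {
        // Invoked when the writer throws a retryable exception, just before the chunk is retried.
        LOG.warning("Retrying write of " + items.size() + " item(s) after: " + ex);
    }
}

Register it on the step, e.g. with <listeners><listener ref="loggingRetryWriteListener"/></listeners>, and you should see a warning in the server log whenever the retryable exception is hit.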

Related

Stage-level data is not returned for running BigQuery jobs through the Java BigQuery libraries

I am using the com.google.cloud.bigquery library to fetch job-level details. We have the following code snippets:
Job job = getBigQuery(projectId, location).getJob(
    JobId.newBuilder().setJob("myJobId").setLocation(location).setProject(projectId).build());

private BigQuery getBigQuery(String projectId, String location) throws IOException {
    // Path to your credentials file.
    String credentialsPath = "my private key credentials file";
    BigQuery bigQuery = BigQueryOptions.newBuilder()
            .setProjectId(projectId)
            .setLocation(location)
            .setCredentials(GoogleCredentials.fromStream(new FileInputStream(credentialsPath)))
            .build()
            .getService();
    return bigQuery;
}
My Dependency
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<version>2.10.0</version>
</dependency>
For completed jobs I have no issue, but for some jobs that are still in a running state (for example, with a duration of more than one minute), we get incomplete query plan data, which ultimately leads to a NullPointerException.
In the screenshot of the job, the jobStatistics section shows a warning that it will throw java.lang.NullPointerException.
The main issue is that when we check the queryPlan field in our processing, it is not null and reports a non-zero size, but as soon as I try to process it in a loop, iterator, or stream, it throws the NullPointerException.
When I fetch the data for the same running job using the API directly, it returns complete details.
So the question is: why does BigQuery return different results for the Java library and the API, and why is the data incomplete on the Java library side (I have also tried updating the dependency version)? How can I prevent my code from running into the NullPointerException?
The library is ultimately using the same API, but somehow in its internal processing the query plan data is not populated properly while the job is in the running state.
I was able to test the behaviour of the code as well as the API. While the query is running, most of the API response fields under queryPlan are 0 and therefore not complete. Only when the query has completed its execution does the queryPlan field show the complete information.
Also, as per this client library documentation, the queryPlan is available only once the query has completed its execution. So the NullPointerException is the expected behaviour when the query is still running (I tested this as well).
To prevent the NullPointerException, you might have to access the queryPlan only when the state of the query is DONE.
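A minimal sketch of that check, reusing the getBigQuery helper from the question (the status and statistics accessors come from the google-cloud-bigquery client; the handling of the not-done case is illustrative):

Job job = getBigQuery(projectId, location).getJob(
    JobId.newBuilder().setJob("myJobId").setLocation(location).setProject(projectId).build());

if (job != null && job.getStatus() != null
        && job.getStatus().getState() == JobStatus.State.DONE) {
    // The job has finished, so the query plan should be fully populated.
    JobStatistics.QueryStatistics stats = job.getStatistics();
    List<QueryStage> queryPlan = stats.getQueryPlan();
    if (queryPlan != null) {
        queryPlan.forEach(stage -> System.out.println(stage.getName()));
    }
} else {
    // Still PENDING or RUNNING: skip the plan for now, or poll again later.
}

Alternatively, job.waitFor() blocks until the job completes, after which the queryPlan data should be complete.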

Data Refinery Job failed with SCAPIException CDICO2060E

I'm building my first project in Watson Studio and a Data Refinery Job fails with the following error:
ERROR: Failed to execute the flow. Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor driver): com.ibm.connect.api.SCAPIException: CDICO2060E: The metadata for the select statement could not be retrieved Sql syntax error: THE DATA TYPE, LENGTH, OR VALUE OF ARGUMENT 1 OF RID IS INVALID. SQLCODE=-171
The SQL it's executing contains this: FROM \"SCHEMA\".\"VIEW_NAME_A\" WHERE MOD(COALESCE(RID(\"SCHEMA\".\"VIEW_NAME_A\"), 0), 3) = 0
The job was built from a DB2 for z/OS connection --> Connected Data object --> Data Refinery flow; once the flow looked good, it was saved and a job was created from it, which then failed on execution. SCHEMA.VIEW_NAME_A is a view built from a complex query joining two or more tables.
I have another Data Refinery flow for a simpler view, whose job (created the same way) runs successfully. The query for that view involves only one table.
I don't quite understand why Watson Studio built the query for the job run with this WHERE clause, and I can't find anything about it.
Does anyone have an idea how to fix or work around this issue?
Watson Studio extracts the source data using multiple queries that partition the data, and that WHERE clause came from its partitioning algorithm. Apparently its partitioning strategy for z/OS does not work properly when the source is a complex view. I apologize for the inconvenience and cannot think of a suitable workaround. We will fix the issue as soon as possible.

Should I be concerned about datastoreRpcErrors?

When I run Dataflow jobs that write to Google Cloud Datastore, sometimes the metrics show that I had one or two datastoreRpcErrors:
Since these Datastore writes usually contain a batch of keys, I am wondering whether, in the case of an RpcError, some retry will happen automatically. If not, what would be a good way to handle these cases?
tl;dr: By default, datastoreRpcErrors will be retried 5 times automatically.
I dug into the code of datastoreio in the Beam Python SDK. It looks like the final entity mutations are flushed in batches via DatastoreWriteFn().
# Flush the current batch of mutations to Cloud Datastore.
_, latency_ms = helper.write_mutations(
    self._datastore, self._project, self._mutations,
    self._throttler, self._update_rpc_stats,
    throttle_delay=_Mutate._WRITE_BATCH_TARGET_LATENCY_MS/1000)
The RPCError is caught by this block of code in write_mutations in the helper; there is a @retry.with_exponential_backoff decorator on the commit method, the default number of retries is set to 5, and retry_on_rpc_error defines the concrete RPCError and SocketError reasons that trigger a retry.
for mutation in mutations:
  commit_request.mutations.add().CopyFrom(mutation)

@retry.with_exponential_backoff(num_retries=5,
                                retry_filter=retry_on_rpc_error)
def commit(request):
  # Client-side throttling.
  while throttler.throttle_request(time.time()*1000):
    ...
  try:
    response = datastore.commit(request)
    ...
  except (RPCError, SocketError):
    if rpc_stats_callback:
      rpc_stats_callback(errors=1)
    raise
  ...
I think you should first determine which kind of error occurred in order to see what your options are.
In the official Datastore documentation there is a list of all the possible errors and their error codes. Fortunately, they come with recommended actions for each.
My advice is to implement their recommendations and look for alternatives if they are not effective for you.

C++ Poco ODBC Transactions - AutoCommit mode

I am currently attempting to use transactions in my C++ app, but I have a problem with ODBC's auto-commit mode.
I am using the POCO libraries to create a connection to a PostgreSQL database on the same machine. Currently I can send data to this database as single statements, but I cannot get my head around how to use Poco's transaction support to send this data more quickly.
As I have several thousand records to insert, continuing to use single insert statements is extremely slow and impractical, so I am trying to use Poco transactions to speed this up a fair bit.
The error I am encountering is theoretically a simple one; Poco is throwing the following error:
'Invalid access: Session is in auto commit mode.'
I understand that, as a result of this, I should somehow set "auto commit" to false, since otherwise it only allows me to commit data to the database line by line rather than as a single transaction.
The problem is how to set this.
Currently I have a session created from Session.h that looks a lot like this:
session = new Poco::Data::Session(
"ODBC",
connection_data.str()
);
where connection_data is a simple stringstream with the login information, password, database, server and "Driver={PostgreSQL ANSI};" to tell ODBC to use PostgreSQL's driver.
I have tried just setting an "autocommit" property to false through the session's setFeature or setProperty methods; this, of course, was to no avail (it was more of a last-ditch attempt at that point).
session->setFeature("AUTOCOMMIT", false);
Looking around, I saw a possible alternative: creating an ODBC SessionImpl directly from ODBC/session/SessionImpl.h instead of using the generic method above, and then creating a new Session object from it.
The benefit of this is that ODBC's SessionImpl has references to autocommit mode in its header, which suggests it would be able to handle this:
void autoCommit(const std::string&, bool val);
/// Sets autocommit property for the session.
However, having not used SessionImpl before, I cannot guarantee that this will work, or that I can get it to work with the limited documentation available.
I am using C++03 (not C++11) with Visual Studio 2015,
Poco 1.7.5
Boost (Where needed)
Would anyone know the correct way of setting this feature (above), or an alternative method of achieving this?
Edit: Looking at the source of Poco, at:
https://github.com/pocoproject/poco/blob/develop/Data/ODBC/src/SessionImpl.cpp#L153
The property seems to be named autoCommit, and looking at
https://github.com/pocoproject/poco/blob/develop/Data/include/Poco/Data/AbstractSessionImpl.h#L120
the case of the property names seems to matter. So, does it help if you use session->setFeature("autoCommit", false);?
Can't you just call session->begin(); and session->end(); on the corresponding Session object?
What is returned by session->canTransact()?
According to the docs, begin() will start a new transaction; the docs do not mention any property that needs to be set before or after.
See: https://pocoproject.org/docs/Poco.Data.Session.html
I also faced a similar issue.
First of all, before begin() you need:
m_ses.setFeature("autoCommit", false);
m_ses.begin();
The second issue is that the "autoCommit" feature stays false for all other sessions, so don't forget to call the following for the next session:
session.setFeature("autoCommit", true);

How to increase the deploy timeout limit on AWS OpsWorks?

I would like to increase the deploy timeout in a stack layer that hosts many apps (AWS OpsWorks).
Currently I get the following error:
[2014-05-05T22:27:51+00:00] ERROR: Running exception handlers
[2014-05-05T22:27:51+00:00] ERROR: Exception handlers complete
[2014-05-05T22:27:51+00:00] FATAL: Stacktrace dumped to /var/lib/aws/opsworks/cache/chef-stacktrace.out
[2014-05-05T22:27:51+00:00] ERROR: deploy[/srv/www/lakers_test] (opsworks_delayed_job::deploy line 65) had an error: Mixlib::ShellOut::CommandTimeout: Command timed out after 600s:
Thanks in advance.
First of all, as mentioned in this ticket reporting a similar issue, the OpsWorks guys recommend trying to speed up the call first (there's always room for optimization).
If that doesn't work, we can go down the rabbit hole: this gets called, which in turn calls Mixlib::ShellOut.new, which happens to have a timeout option that you can pass in the initializer!
Now you can use an OpsWorks custom cookbook to override the original method and pass the corresponding timeout option. OpsWorks merges the contents of its base cookbooks with the contents of your custom cookbook, so you only need to add and edit a single file in your custom cookbook: opsworks_commons/libraries/shellout.rb:
module OpsWorks
  module ShellOut
    extend self

    # This would be your new default timeout.
    DEFAULT_OPTIONS = { timeout: 900 }

    def shellout(command, options = {})
      cmd = Mixlib::ShellOut.new(command, DEFAULT_OPTIONS.merge(options))
      cmd.run_command
      cmd.error!
      [cmd.stderr, cmd.stdout].join("\n")
    end
  end
end
Notice that the only additions are DEFAULT_OPTIONS and merging those options into the Mixlib::ShellOut.new call.
An improvement to this method would be to change the timeout option via a Chef attribute, which you could in turn update via your custom JSON in the OpsWorks interface. This means passing the timeout attribute in the initial OpsWorks::ShellOut.shellout call, not in the method definition. But that depends on how the shellout method actually gets called...