EclipseLink batch update doesn't work if entity has an optimistic lock - jpa-2.0

Official documentation says:
Batch writing can improve database performance by sending groups of
INSERT, UPDATE, and DELETE statements to the database in a single
transaction, rather than individually
(emphasis is mine).
But if an entity has an optimistic locking @Version field, then all UPDATEs are executed independently.
To prove this, here is a snippet from the source of DatabaseAccessor (line 566):
if (/* omitted */ (!dbCall.hasOptimisticLock() || getPlatform().canBatchWriteWithOptimisticLocking(dbCall)) /* omitted */) {
    // this will handle executing batched statements, or switching mechanisms if required
    getActiveBatchWritingMechanism().appendCall(session, dbCall);
    //bug 4241441: passing 1 back to avoid optimistic lock exceptions since there
    // is no way to know if it succeeded on the DB at this point.
    return Integer.valueOf(1);
}
So, basically, the snippet above means that if an entity has an optimistic lock, and the platform does not report that it can batch write with optimistic locking, then the update bypasses batch writing.
Is there a workaround for that? I still want to use JPA.
UPDATE:
It turned out that I needed to add this property to persistence.xml in order to enable batch update with optimistic locking:
<property name="eclipselink.target-database" value="org.eclipse.persistence.platform.database.oracle.Oracle11Platform"/>
Note that Oracle10Platform or higher can be used as the value. Lower versions don't support this feature.
Also, to enable batch writing at all, you have to add at least this property to your persistence.xml:
<property name="eclipselink.jdbc.batch-writing" value="JDBC" />
You can also optionally configure the batch size:
<property name="eclipselink.jdbc.batch-writing.size" value="1000" />

Did you check the canBatchWriteWithOptimisticLocking() method for the platform class you are using? That check exists so that batching is used only when your driver can return row counts for the individual calls within the batch. EclipseLink needs those row counts in order to throw an optimistic lock exception when required.
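To illustrate why per-statement row counts matter, here is a minimal sketch in plain JDBC terms. It is not EclipseLink's code: BatchRowCounts and firstStaleIndex are hypothetical names, and the only real API used is the int[] contract of java.sql.Statement.executeBatch() together with the Statement.SUCCESS_NO_INFO constant. A versioned UPDATE that matches no row means the version was stale; if the driver reports no count at all, a stale update cannot be told apart from a successful one.

```java
import java.sql.Statement;

// Hypothetical helper (not part of EclipseLink): inspects the int[] that
// java.sql.Statement.executeBatch() returns for a batch of versioned UPDATEs.
final class BatchRowCounts {

    /**
     * Returns the index of the first statement that updated zero rows
     * (a stale version), or -1 if every statement updated at least one row.
     * Throws if the driver gave no usable count for a statement
     * (for example Statement.SUCCESS_NO_INFO), because optimistic lock
     * failures are then undetectable.
     */
    static int firstStaleIndex(int[] updateCounts) {
        for (int i = 0; i < updateCounts.length; i++) {
            int count = updateCounts[i];
            if (count < 0) {
                // SUCCESS_NO_INFO (-2) and EXECUTE_FAILED (-3) are both negative.
                throw new IllegalStateException(
                        "Driver did not report a row count for statement " + i);
            }
            if (count == 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstStaleIndex(new int[] {1, 1, 1}));
        System.out.println(firstStaleIndex(new int[] {1, 0, 1}));
    }
}
```

A driver that returns SUCCESS_NO_INFO for batched statements makes the check above impossible, which is presumably the situation the platform method guards against.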


Making sure I do not overwrite a file on Cloud Storage by accident

(Node.js API)
I am trying to do the following:
1. Generate a file path like /uploads/${uuid.v4()}.extension
2. Write the file.
This is the code:
const path = `/uploads/${uuidv4()}.${extname(fileName)}`;
const file = bucket.file(path);
await new Promise((resolve, reject) =>
  data
    .pipe(file.createWriteStream({ contentType }))
    .once('error', reject)
    .once('finish', resolve),
);
It works fine. But it bothers me to no end that there is a minuscule probability that the same UUID will be generated twice. It is not a practical concern.
How can I upload data to Cloud Storage but get an error if there's a clash? I can check if the file exists beforehand but there is still a race condition technically...
The chance of a collision is not just minuscule: it's astronomically low for UUIDs of significant size. Solving the problem of such a collision is not likely to be worth the effort.
That said, if you still want to, you won't be able to do it with Cloud Storage APIs alone, since there is no transactional, atomic API to interact with. If you want a "hard" guarantee that there is no collision, you will need to interact with an entirely different Cloud service that does allow you to effectively "lock" some unique string (e.g. a file path) as a flag for all other processes to check so that they don't collide. Since you are working in Google Cloud, you might want to consider using a database (like any SQL database, or Firestore) with atomic transactional operations to "reserve" the path so that only one process can use it (assuming they all correctly observe this reservation and cooperate as such).
Isn't this exactly what preconditions are for?
Copied from the docs: https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-nodejs
const options = {
  destination: destFileName,
  // Optional:
  // Set a generation-match precondition to avoid potential race conditions
  // and data corruptions. The request to upload is aborted if the object's
  // generation number does not match your precondition. For a destination
  // object that does not yet exist, set the ifGenerationMatch precondition to 0
  // If the destination object already exists in your bucket, set instead a
  // generation-match precondition using its generation number.
  preconditionOpts: {ifGenerationMatch: generationMatchPrecondition},
};
await storage.bucket(bucketName).upload(filePath, options);
console.log(`${filePath} uploaded to ${bucketName}`);

Jberet - Retryable exception class working?

Is there a way to see in the log that the retry is happening? I need to know if this is working in our test environment before implementing it into production.
There are rare instances when we get the following error, because a portion of the key is a timestamp and data comes into the table from various sources. We need the writer to retry when we get: DB2 SQL Error: SQLCODE=-803, SQLSTATE=23505
<chunk>
    ...
    <retryable-exception-classes>
        <include class="com.ibm.db2.jcc.am.SqlIntegrityConstraintViolationException"></include>
    </retryable-exception-classes>
</chunk>
JBeret does not log these events, but you can implement the listeners defined by the batch spec to act on them yourself, for example RetryReadListener, RetryWriteListener, or RetryProcessListener.

Should I have concern about datastoreRpcErrors?

When I run Dataflow jobs that write to Google Cloud Datastore, sometimes the metrics show that I had one or two datastoreRpcErrors.
Since these Datastore writes usually contain a batch of keys, I am wondering whether a retry happens automatically when an RpcError occurs. If not, what would be a good way to handle these cases?
tl;dr: By default, a commit that fails with a datastoreRpcError is retried automatically, up to 5 times.
I dug into the code of datastoreio in the Beam Python SDK. It looks like the final entity mutations are flushed in batches via DatastoreWriteFn().
# Flush the current batch of mutations to Cloud Datastore.
_, latency_ms = helper.write_mutations(
    self._datastore, self._project, self._mutations,
    self._throttler, self._update_rpc_stats,
    throttle_delay=_Mutate._WRITE_BATCH_TARGET_LATENCY_MS/1000)
The RPCError is caught by the following block of code in write_mutations in the helper. The commit method carries the decorator @retry.with_exponential_backoff, with the number of retries set to 5, and retry_on_rpc_error defines the concrete RPCError and SocketError reasons that trigger a retry.
for mutation in mutations:
  commit_request.mutations.add().CopyFrom(mutation)

@retry.with_exponential_backoff(num_retries=5,
                                retry_filter=retry_on_rpc_error)
def commit(request):
  # Client-side throttling.
  while throttler.throttle_request(time.time()*1000):
    ...
  try:
    response = datastore.commit(request)
    ...
  except (RPCError, SocketError):
    if rpc_stats_callback:
      rpc_stats_callback(errors=1)
    raise
...
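For readers unfamiliar with the pattern, the retry behaviour described above can be sketched generically. This is not Beam's implementation: the class Backoff, the method callWithRetries, and the doubling delay are assumptions chosen for illustration. Only the overall shape mirrors the decorator's parameters, namely a fixed number of retries plus a filter that decides which errors are worth retrying.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical illustration of retry with exponential backoff.
// Not the Beam SDK's code; names and the doubling factor are assumptions.
final class Backoff {

    /**
     * Calls the task, retrying up to numRetries times when retryFilter accepts
     * the exception. The wait starts at initialDelayMillis and doubles after
     * each failed attempt. The last exception is rethrown once retries run out
     * or the filter rejects it.
     */
    static <T> T callWithRetries(Callable<T> task,
                                 int numRetries,
                                 long initialDelayMillis,
                                 Predicate<Exception> retryFilter) throws Exception {
        long delay = initialDelayMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= numRetries || !retryFilter.test(e)) {
                    throw e; // out of retries, or not a retryable error
                }
                Thread.sleep(delay);
                delay *= 2; // exponential growth of the wait
            }
        }
    }
}
```

With numRetries set to 5, a task that keeps failing with a retryable error is attempted six times in total before the error propagates, while an error rejected by the filter propagates immediately.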
I think you should first determine which kind of error occurred in order to see what your options are.
The official Datastore documentation has a list of all the possible errors and their error codes. Fortunately, each comes with a recommended action.
My advice is that you implement their recommendations and look for alternatives if they are not effective for you.

OpenSplice DDS: Publish, until some timeout

I'm learning more about DDS every day, so my question may sound weird. I hope it makes sense.
One of the requirements of some dds wrapper I'm writing, is that it times out after some timeout period if it fails to write. My question: How can I do that?
The tutorial on PrismTech's website explains how to use a WaitSet to block a read operation, but what about write?
Here's some code including the question:
dds::domain::DomainParticipant dp(0);
dds::topic::Topic<MyType> topic(dp, "MyTopic");
dds::pub::Publisher pub(dp);
dds::pub::DataWriter<MyType> dw(pub, topic);
MyType t;
dw.write(t); //how can I make this block for 5 seconds (tops), and then throw an error on failure?
I noticed there exists a function in the API DataWriter::wait_for_acknowledgements(int timeout), but this seems to be bound to the DataWriter object, not to the specific call of writing. Can I bind it with the call above?
This is configured in QoS, cf RELIABILITY, field "max_blocking_time". How you set this value will depend on the vendor's implementation. Generally you get the current QoS, update the field, write the QoS back. Keep in mind that certain QoS policies must be set before something else happens. Reliability is "Before Enable" (at least in the implementation I'm most familiar with), which means you need to create the data-writer disabled, update the QoS, then enable the writer.
If QoS can be set outside the application (via XML for example), then you can set the policy easily. Otherwise, you need to do it in code.
From the spec:
The value of the max_blocking_time indicates the maximum time the operation DataWriter::write is allowed to block if the DataWriter does not have space to store the value written. The default max_blocking_time=100ms.

JPA with JTA how to persist many entities in one transaction

I have a list of objects. They are JPA "Location" entities.
List<Location> locations;
I have a stateless EJB which loops thru the list and persists each one.
public void createLocations() {
    List<Location> locations = getListOfJPAManagedLocationEntities(); // I'm leaving out the details of this because it has nothing to do with the issue
    for (Location location : locations) {
        em.persist(location);
    }
}
The code works fine. I do not have any problems.
However, the issue is: I want this to be an all-or-none transaction. Currently, each time thru the for loop, the persist() method will insert a new row into the database. Suppose I have 100 location objects and the 54th object has something wrong with it and an exception is thrown. There will be 53 records inserted into the database. What I want is: all of them must succeed before any of them succeed.
I'm using the latest & greatest version of Java EE6, EJB 3.x., and JPA 2. My persistence.xml uses JTA.
<persistence-unit name="myPersistenceUnit" transaction-type="JTA">
And I like having JTA.
I do not want to stop using JTA.
90% of the time JTA does exactly what I want it to do. But in this case, it doesn't seem to.
My understanding of JTA must be inaccurate because I always thought the beginning and end of the EJB method marked the boundaries of the JTA transaction (assume only one method is in-play as I've shown above). By my logic, the transaction would not end until the for-loop is done and the method returns, and then at that point the records are persisted.
I'm using the JTDS driver for SqlServer 2008. Perhaps the database doesn't want to insert a record without immediately committing it. The entity id is defined like this:
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
I've checked the spec., and it is not proper to call the various "UserTransaction" or "getTransaction()" methods in a JTA environment.
So what can I do?
Thanks.
If you use JTA and container-managed transactions, the default behavior for a session EJB method call is to run in a transaction (it is like annotating the method with @TransactionAttribute(TransactionAttributeType.REQUIRED)). That means that your code already runs in a transaction and will do what you expect: if an exception occurs at row 54, all previously inserted rows will be rolled back. You can go ahead and test it by throwing an exception yourself at some point in the loop. Note that a checked exception declared by your method does not trigger a rollback by default, but you can specify what the container should do when that exception occurs: annotate the exception class with @ApplicationException(rollback=true).
If there is a duplicate entry while looping, the loop will continue without problems. When execution reaches em.flush() after the loop, it will throw an exception and roll back the transaction.
I'm using JBoss. Set your datasource in your standalone.xml or domain.xml to have
<datasource jta="true" ...>
Seems obvious, but I obviously set it wrong a long time ago and forgot about it.