Hibernate running on a separate JVM fails to read newly inserted data - web-services

I am implementing a WebService that uses Hibernate to write/read data to a database (MySQL). The big issue I had was that when I successfully inserted data (e.g., into the USER table) from one JVM (for example, a JUnit test or directly from a DB UI suite), my WebService's Hibernate, running on a separate JVM, could not find this new data, even though both point to the same DB server. Only after I destroyed the WebService's Hibernate SessionFactory and recreated it could the WebService's Hibernate layer read the newly inserted data. In contrast, the same JUnit test or a direct query from the DB UI suite could find the inserted data immediately.
Any assistance is appreciated.

This issue was resolved today with the following:
I changed our Hibernate config file (hibernate.cfg.xml) to set the transaction isolation level to at least "2" (READ COMMITTED). This immediately resolved the issue above. To understand more about this isolation level setting, see the links below (a minimal sketch of the setting follows them):
Hibernate reading function shows old data
Transaction isolation levels relation with locks on table
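For reference, here is a minimal sketch of the same setting applied programmatically (this assumes a Configuration-based bootstrap; the equivalent hibernate.cfg.xml entry is the hibernate.connection.isolation property, and "2" is java.sql.Connection.TRANSACTION_READ_COMMITTED):
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateBootstrap {
    public static SessionFactory buildSessionFactory() {
        // Reads hibernate.cfg.xml, then forces connections to READ COMMITTED.
        Configuration cfg = new Configuration().configure();
        // 2 = java.sql.Connection.TRANSACTION_READ_COMMITTED
        cfg.setProperty("hibernate.connection.isolation", "2");
        return cfg.buildSessionFactory();
    }
}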
I ensured I was not using second-level caching by setting the CacheMode to IGNORE for each of my Session objects:
Session session = getSessionFactory().openSession();
session.setCacheMode(CacheMode.IGNORE);
For reference only: some folks disable second-level caching in their apps with the following in hibernate.cfg.xml (but I didn't need to):
<property name="cache.provider_class">org.hibernate.cache.internal.NoCacheProvider</property>
<property name="hibernate.cache.use_second_level_cache">false</property>
<property name="hibernate.cache.use_query_cache">false</property>

Related

Apache NiFi state is not persisted to ZooKeeper

According to the official NiFi documentation, the state allows NiFi processors to "resume from the place where it left off after NiFi is restarted. Additionally, it allows for a Processor to store some piece of information so that the Processor can access that information from all of the different nodes in the cluster".
If my understanding is correct, when we configure a ZooKeeper provider, the state should not be persisted locally; instead, the data should be sent to ZooKeeper.
I've explored the ZooKeeper znodes and could not find any data related to the state; all I can find is information about the Coordinator and Primary nodes. However, the local state directory is still being filled.
The configuration is very simple: I have 3 external ZooKeeper nodes and 3 NiFi instances.
Here is an excerpt of the nifi.properties file:
nifi.cluster.is.node=true
nifi.zookeeper.connect.string=zk-node1:2181,zk-node2:2181,zk-node3:2181
nifi.state.management.embedded.zookeeper.start=false
nifi.state.management.provider.cluster=zk-provider
And here is an excerpt of the state-management.xml file:
<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">zk-node1:2181,zk-node2:2181,zk-node3:2181</property>
<property name="Root Node">/nifi</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
When I ls ZooKeeper, I can see only 2 znodes: "components", which is empty, and "leaders", which contains some data about the NiFi Coordinator and Primary nodes.
Also, when I explore the transaction logs, even after using some load-balanced connections, I cannot find anything related to the NiFi state.
Could somebody explain what data goes to ZooKeeper, and why the local state directory is still filled even when the zk provider is configured?
Thanks.
It depends on the processor; in some cases it would never make sense to store cluster-wide state because it could never be picked up by another node. For example, when ListFile tracks a local directory, another node cannot access the same directory, so storing this state in ZooKeeper is not helpful.
There is always a local state provider backed by a write-ahead log in the state directory, and it is up to the processor to say whether state should be stored with cluster or local scope.
The documentation for each processor should say how the state is stored. For example, from ListFile:
@Stateful(scopes = {Scope.LOCAL, Scope.CLUSTER}, description = "After performing a listing of files, the timestamp of the newest file is stored. "
+ "This allows the Processor to list only files that have been added or modified after "
+ "this date the next time that the Processor is run. Whether the state is stored with a Local or Cluster scope depends on the value of the "
+ "<Input Directory Location> property.")
If Input Directory Location is "remote" then it will use cluster state, otherwise local state.
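As an illustration only (not ListFile's actual code), here is a minimal sketch of how a processor picks the scope it stores state under, using the standard StateManager API; the directoryIsRemote flag and the "listing.timestamp" key are made up for the example:
import java.io.IOException;
import java.util.Collections;
import org.apache.nifi.components.state.Scope;
import org.apache.nifi.components.state.StateManager;
import org.apache.nifi.processor.ProcessContext;

public class StateScopeSketch {
    // Stores the newest-file timestamp under CLUSTER scope (the configured
    // cluster provider, e.g. ZooKeeper) when the directory is remote, otherwise
    // under LOCAL scope (the node's write-ahead log in the state directory).
    void storeListingTimestamp(ProcessContext context, boolean directoryIsRemote, long timestamp) throws IOException {
        StateManager stateManager = context.getStateManager();
        Scope scope = directoryIsRemote ? Scope.CLUSTER : Scope.LOCAL;
        stateManager.setState(Collections.singletonMap("listing.timestamp", String.valueOf(timestamp)), scope);
    }
}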

Getting error, "Entity doesn't exist in AsyncLocal" when trying to call CreateBatchWrite<T> method of DynamoDBContext object

I have created a DynamoDB table on my dev machine and I'm trying to insert a couple of rows from my .NET Core application using the CreateBatchWrite<T> method of the DynamoDBContext object. I'm able to query the table from the DynamoDB JavaScript Shell window at the localhost:8000/shell URL, and it returns a row count of 0. But when I try to call the CreateBatchWrite<T> method I get the error, "Entity doesn't exist in AsyncLocal".
Explanation
When using X-Ray, this happens when there is an attempt to create a subsegment without a parent segment. Depending on your setup, running a query may try to create a subsegment, which fails because there is no parent segment.
This is common when running a Lambda function locally, as the Mock Lambda Test Tool will not create a Segment for you like the actual Lambda environment does on AWS. This can happen in other scenarios too.
More details here: https://github.com/aws/aws-xray-sdk-dotnet/issues/125
Solution
The easiest way to solve this is to disable X-Ray locally (as you probably don't want to generate traces locally):
In appsettings.Development.json add this:
"XRay": {
"DisableXRayTracing": "true",
"UseRuntimeErrors": "false",
"CollectSqlQueries": "false"
}
The important bit is setting DisableXRayTracing to true.
Make sure your appsettings.Development.json is set to Copy Always in the properties window. You can do this by including this in your .csproj:
<ItemGroup>
<None Update="appsettings.Development.json">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
<None Update="appsettings.json">
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
</None>
</ItemGroup>
If you really want to trace things locally, then make sure you create a parent segment only when running locally (on AWS this would cause problems, as you would have two parent segments: one created manually by you, another one created by AWS).
Add this line before any DynamoDB API methods are executed:
AWSXRayRecorder.Instance.ContextMissingStrategy = ContextMissingStrategy.LOG_ERROR;
You can find more info in GitHub discussion https://github.com/aws/aws-xray-sdk-dotnet/issues/69#issuecomment-482688754
Also, you will need to import these two namespaces:
using Amazon.XRay.Recorder.Core;
using Amazon.XRay.Recorder.Core.Strategies;
If you are tracing requests made with the AWS SDK, the X-Ray SDK attempts to generate a subsegment automatically to represent those requests, such as CreateBatchWrite. However, a subsegment can only be created as the child of an existing segment, so if you have not created a segment beforehand, that "Entity doesn't exist" error will occur.
See these docs for how to create custom segments. Alternatively, if you are developing a web app, the X-Ray SDK can automatically create segments for requests made to your service by adding the configuration described in these docs.

Django ORM returning stale data, possible race condition

Consider a Django application with a single RESTful API that creates objects (using Django REST Framework). As part of this API, I do some validation to make sure the creation calls are idempotent, such that if you call the creation API twice, the first will succeed, and the second will fail with a custom error code.
I have a scenario for testing this API which intermittently fails in the following way:
First API call, success, returns 201 -> object has supposedly been created
Immediately after response, second API call is made
Validation logic calls MyModel.objects.get(some_field=some_value) to check if this is a duplicate call or not
No such object is found, despite being created in step 1, thus a duplicate object is created
When inspecting the admin/querying the model, both objects can be seen.
Some more data:
There is no explicit caching on this model, or any other caching involved in this process.
I am unable to reproduce this locally
On my deployment setup there is about a 5% failure rate for this possible race condition.
Both local and deployment use PostgreSQL.
The deployment environment does have general caching enabled, but enabling the cache locally still produces no repro.
What might be causing this race condition? Does Django ORM have any failure modes where I might be getting stale data? Is there any way I can defensively protect the validation from getting stale data?
Have a look at transaction.atomic:
https://docs.djangoproject.com/en/1.8/topics/db/transactions/#django.db.transaction.atomic
This sometimes solves such issues.
My current proposed solution is based on the feedback from #CraigRinger, which seems to be correct. Basically, to get a consistent response from Postgres I need to actually attempt an INSERT, and not just query for the data, because there are race conditions in play.
A partial reference to this can be found in https://code.djangoproject.com/ticket/20429#comment:22
Bottom line, the solution is to add a DB-enforced unique=True constraint on the relevant field on the model (some_field in this case), attempt the object creation, catch the IntegrityError, and from there on I can implement the custom error handling and propagate the right result to the API layer.

How to change client schema during provisioning?

I'm rushing (never a good thing) to get Sync Framework up and running for an "offline support" deadline on my project. We have a SQL Express 2008 instance on our server and will deploy SQLCE to the clients. Clients will only sync with the server, no peer-to-peer.
So far I have the following working:
Server schema setup
Scope created and tested
Server provisioned
Client provisioned w/ table creation
I've been very impressed with the relative simplicity of all of this. Then I realized the following:
Schema created through client provisioning to SQLCE does not set up default values for uniqueidentifier types.
FK constraints are not created on client
Here is the code that is being used to create the client schema (pulled from an example I found somewhere online):
static void Provision()
{
SqlConnection serverConn = new SqlConnection(
"Data Source=xxxxx, xxxx; Database=xxxxxx; " +
"Integrated Security=False; Password=xxxxxx; User ID=xxxxx;");
// create a connection to the SyncCompactDB database
SqlCeConnection clientConn = new SqlCeConnection(
@"Data Source='C:\SyncSQLServerAndSQLCompact\xxxxx.sdf'");
// get the description of the scope from the SyncDB server database
DbSyncScopeDescription scopeDesc = SqlSyncDescriptionBuilder.GetDescriptionForScope(
ScopeNames.Main, serverConn);
// create CE provisioning object based on the scope
SqlCeSyncScopeProvisioning clientProvision = new SqlCeSyncScopeProvisioning(clientConn, scopeDesc);
clientProvision.SetCreateTableDefault(DbSyncCreationOption.CreateOrUseExisting);
// starts the provisioning process
clientProvision.Apply();
}
When Sync Framework creates the schema on the client I need to make the additional changes listed earlier (default values, constraints, etc.).
This is where I'm getting confused (and frustrated):
I came across a code example that shows a SqlCeClientSyncProvider that has a CreatingSchema event. This code example actually shows setting the RowGuid property on a column which is EXACTLY what I need to do. However, what is a SqlCeClientSyncProvider?! This whole time (4 days now) I've been working with SqlCeSyncProvider in my sync code. So there is a SqlCeSyncProvider and a SqlCeClientSyncProvider?
The documentation on MSDN is not very good at explaining what either of these is.
I'm further confused about whether I should make schema changes at provisioning time or at sync time.
How would you all suggest that I make schema changes to the client CE schema during provisioning?
SqlCeSyncProvider and SqlCeClientSyncProvider are different.
The latter is what is commonly referred to as the offline provider and this is the provider used by the Local Database Cache project item in Visual Studio. This provider works with the DbServerSyncProvider and SyncAgent and is used in hub-spoke topologies.
The one you're using is referred to as a collaboration provider or peer-to-peer provider (which also works in a hub-spoke scenario). SqlCeSyncProvider works with SqlSyncProvider and SyncOrchestrator and has no corresponding Visual Studio tooling support.
Both providers require provisioning the participating databases.
The two types of providers provision the sync objects required to track and apply changes differently. The SchemaCreated event applies to the offline provider only. It gets fired the first time a sync is initiated and the framework detects that the client database has not been provisioned (it creates the user tables and the corresponding sync framework objects).
The scope provisioning used by the other provider doesn't apply constraints other than the PK, so you will have to do a post-provisioning step to apply the defaults and constraints yourself outside of the framework.
While researching solutions without using SyncAgent I found that the following would also work (in addition to my commented solution above):
Provision the client and let the framework create the client [user] schema. Now you have your tables.
Deprovision - this removes the restrictions on editing the tables/columns
Make your changes (in my case setting up Is RowGuid on PK columns and adding FK constraints) - this actually required me to drop and add a column, as you can't change the "Is RowGuid" property on existing columns
Provision again using DbSyncCreationOption.CreateOrUseExisting

How to log SQL values sent to my DB using EclipseLink?

I use EclipseLink as my JPA 2 persistence layer, and I would like to see the values sent to the DB in my logs.
I already see the SQL queries (using <property name="eclipselink.logging.level" value="ALL" /> in my persistence.xml), but, for example in an SQL INSERT, I do not see the values, only the ? placeholders.
So, how do I see what values are sent?
You'll need to use a JDBC proxy driver like p6spy or log4jdbc to get the SQL statements logged with their values instead of the placeholders. This approach works well if you are using EclipseLink with a connection pool whose URL is derived from persistence.xml (where you can specify a JDBC URL recognized by the proxy driver instead of the actual one), but it may not be so useful in a Java EE environment (at least for log4jdbc), unless you can get the JNDI data sources to use the proxy driver.
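For illustration, here is a minimal sketch of pointing a persistence unit at p6spy programmatically (assumptions: p6spy is on the classpath, the database is MySQL, and the persistence unit name and connection details are placeholders); the same driver/URL pair can equally go into persistence.xml:
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class P6SpyBootstrap {
    public static EntityManagerFactory create() {
        Map<String, String> props = new HashMap<>();
        // Route JDBC calls through the p6spy proxy driver so statements are
        // logged with their bound values instead of ? placeholders.
        props.put("javax.persistence.jdbc.driver", "com.p6spy.engine.spy.P6SpyDriver");
        props.put("javax.persistence.jdbc.url", "jdbc:p6spy:mysql://localhost:3306/mydb");
        props.put("javax.persistence.jdbc.user", "dbuser");
        props.put("javax.persistence.jdbc.password", "dbpassword");
        return Persistence.createEntityManagerFactory("myPersistenceUnit", props);
    }
}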