Scalability implications of converting stateless session beans to POJOs


Imagine a heavily-used service object that's implemented as an EJB 2.1 SLSB, and that also happens to be thread-safe in itself by virtue of having no state whatsoever. All its public methods are transactional (via CMT), most simply requiring a transaction, but some requiring a new transaction.
If I convert this SLSB to a genuine singleton POJO (e.g. using a DI framework), how will that affect the scalability of the application? When the service was a SLSB, the EJB container would manage a pool of instances from which each client would get its own copy, so I'm wondering whether turning it into a singleton POJO will introduce some kind of contention for that single instance.
FWIW, none of this service's methods are synchronized.
Clarification: my motivation for converting the SLSB to a POJO is simplicity of both the object's lifecycle (true singleton versus container-managed) and of the code itself (one interface and one annotated POJO, versus three interfaces, one bean class, and a bunch of XML in ejb-jar.xml).
Also, FWIW, the service in question is one component of a collocated web app running on JBoss 3.x.

If the POJO is truly stateless, or has no conversational state (i.e. its state is immutable), then this will not worsen performance, and may even improve it slightly, since you really are using just one instance from your DI framework rather than a pool from the container. (Even the pool suffers from contention under high load.)
There is no synchronization needed for an object that is thread-safe by design, such as one with none or just immutable state. There will be no contention - threads can freely execute methods on the POJO without synchronization.
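A minimal sketch of why no contention arises, assuming a hypothetical PriceService with no fields at all. Many threads call the one shared instance through an ExecutorService; since each call works only on its arguments and local variables, no lock is taken and every call gets the right answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateless service: it has no fields, so each call works only on
// its arguments and local variables, which live on the calling thread's stack.
final class PriceService {
    long totalWithTax(long netCents, int taxPercent) {
        long tax = netCents * taxPercent / 100;
        return netCents + tax;
    }
}

final class SingletonDemo {
    // One shared instance, like the single bean a DI container hands out for singleton scope.
    static final PriceService SERVICE = new PriceService();

    // Calls the shared instance from many threads at once; no lock is taken anywhere.
    static List<Long> callConcurrently(int calls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final long net = i * 100L;
                Callable<Long> task = () -> SERVICE.totalWithTax(net, 20);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // results are collected in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike a pooled bean, callers never wait to borrow an instance; they only compete for CPU time.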
By using just the POJO you also get to see what is really going on in your app, and can be sure there is no hidden "container magic" going on behind the scenes.

Your POJO seems perfect.
So no, there will be no contention; your scalability will be perfect.
You have no additional cost.
You even have less, because you have one instance instead of several.
Your scalability is better, because you will never hit the limit of your pool (you don't have one).

Related

Getting lock on Camel Processor

I would like to know the approaches to get synchronization on Camel Processor.
The only related thing I found in the docs:
Note that there is no concurrency or locking issue when using ActiveMQ, JMS or SEDA by design; they are designed for highly concurrent use. However there are possible concurrency issues in the Processor of the messages i.e. what the processor does with the message?
So I want to get a lock on org.apache.camel.Processor.process(Exchange), i.e. I would like other threads to wait for the process method to finish while it is busy. Is that possible?
UPDATE: I tried using synchronized (a lock) inside the process method, and that works on the JVM side. But my Processor is part of a transacted route, and that is a problem: all changes to the persistence layer become visible only after exiting the Processor (or maybe even the route). So I thought there might be some Camel-like solution for this problem.

The business logic you implement inside a Camel processor must be thread-safe, as multiple threads reuse the same instance while routing messages in Camel.
If you want prototype scope (e.g. a new instance of the processor for each message), you can use the bean component and set cache=false; if you use Spring, also declare the bean as prototype:
<bean id="myBean" class="com.foo.MyBean" scope="prototype"/>
And then call this bean in a route
.to("bean:myBean?cache=false")
That said, people very often use singleton instances.
If you want any kind of locking you can define the method as synchronized and let the JVM lock it for you.
public synchronized void process(Exchange exchange) throws Exception {
...
}
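A sketch of what the synchronized approach does, using a hypothetical AuditingProcessor that keeps a counter. To compile without Camel it exposes process(String) rather than implementing org.apache.camel.Processor; with Camel, the same keyword would go on process(Exchange). Many threads share one instance, and the lock makes them take turns, so no update is lost.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a singleton processor with mutable state.
// It does not implement org.apache.camel.Processor, so it compiles without Camel.
final class AuditingProcessor {
    private long processed; // mutable state shared by every routing thread

    // synchronized: only one thread at a time runs this on this instance;
    // other callers block until the lock is released.
    public synchronized void process(String message) {
        processed++;
    }

    public synchronized long processedCount() {
        return processed;
    }
}

final class ProcessorDemo {
    static long run(int messages, int threads) {
        AuditingProcessor processor = new AuditingProcessor(); // one shared instance
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < messages; i++) {
            final String msg = "msg-" + i;
            pool.execute(() -> processor.process(msg));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processor.processedCount();
    }
}
```

Because the lock is released as soon as process returns, it cannot wait for a surrounding transaction to commit, which is the limitation described in the question's update.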

Asynchronous EJB scheduling

I'm wondering how asynchronous EJB methods are scheduled onto the underlying platform (an SMP/NUMA platform, for example).
Can anyone describe the scheduling middleware (I'm not familiar with EJB).

The EJB spec doesn't say exactly how this should be implemented, giving implementations a free hand to choose how to do it.
That said, the implementations I've seen simply use a thread pool. It functions pretty much like an executor service does in Java SE. A call to an @Asynchronous method results in a task being put in a queue, which is serviced by said thread pool.
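To make the executor-service analogy concrete, here is a Java SE sketch (not how any particular container is implemented) of a hypothetical ReportBean whose "asynchronous" method submits a task to a fixed thread pool and immediately returns a Future. The pool size of 4 is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Java SE analogy only: hypothetical bean whose async method enqueues work on a pool.
final class ReportBean {
    // Fixed pool whose threads service an internal, unbounded task queue.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Analogue of an @Asynchronous method: enqueue the work and return at once.
    Future<Integer> countWordsAsync(String text) {
        return pool.submit(() -> {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        });
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The caller can do other work and call get() on the Future later. Which core or NUMA node runs each pool thread is still left to the JVM and the operating system.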
SMP/NUMA properties are not directly influenced by EJB, but depend on how the underlying operating system handles threads within a single process.

Concurrent Calls to Oracle WebLogic 10.3 Web Service Problems

I have a Web Service (in Java) on Oracle WebLogic 10.3 that does all kinds of database queries. Recently I started stress tests. It passed the repetition tests (invoking the WS several thousand times serially), but problems began to arise when concurrency testing began. Making as few as 2 concurrent calls results in errors. When doing proper tests, the results looked like the WS wasn't able to handle concurrent calls at all, which obviously should not be the case. Errors included null pointer exceptions, closed connections or prepared statements, etc. I am a bit stumped by this, especially since I was unable to find any configuration options that could affect this, but then again my knowledge of WLS is quite limited.
Thanks for any suggestions in advance.

The answer you marked as correct is totally wrong.
The web service methods should not be made synchronized just so they are thread safe.
WebLogic's web service implementations are multithreaded.
It's like servlets:
"Servlets are multithreaded. Servlet-based applications have to recognize and handle this appropriately. If large sections of code are synchronized, an application effectively becomes single threaded, and throughput decreases dramatically."
http://www.ibm.com/developerworks/websphere/library/bestpractices/avoiding_or_minimizing_synchronization_in_servlets.html
Depending on what you do, you might want to synchronize parts of the code inside the WS.
Does it make sense to synchronize web-service method?

Just so there is a clear answer.
When there are several concurrent calls to a given Web Service (in this case SOAP/JAX-WS was used) on WLS, the same object is used (no pooling or queues are used), therefore the implementation must be thread safe.
EDIT:
To clarify:
Assume there is a class attribute (an instance field) in the WebService implementation class generated by JDeveloper. If you modify this attribute in your web method (and then use it), it will cause synchronization problems when the method is called (i.e. the WS is called) concurrently. When I first started creating web services I thought the whole WebService object was created anew for each WS call, but this does not seem to be the case.
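A sketch of the thread-safe shape this implies, assuming a hypothetical OrderEndpoint (JAX-WS annotations such as @WebService are left out so it compiles on plain Java 8+). Per-call data stays in local variables; the only field is a java.time.format.DateTimeFormatter, which is immutable and safe to share, unlike java.text.SimpleDateFormat.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical web service implementation class; the container may route
// every concurrent call through this one instance.
final class OrderEndpoint {
    // Safe to share: DateTimeFormatter is immutable and thread-safe.
    // A java.text.SimpleDateFormat field here would NOT be safe.
    private static final DateTimeFormatter ISO = DateTimeFormatter.ISO_LOCAL_DATE;

    // Pattern to avoid: a mutable field such as "private String lastOrderId;"
    // written by one call and read by another.

    public String describeOrder(String orderId, int year, int month, int day) {
        // Per-call state lives in local variables, invisible to other calls.
        LocalDate shipDate = LocalDate.of(year, month, day);
        StringBuilder sb = new StringBuilder();
        sb.append(orderId).append(" ships on ").append(ISO.format(shipDate));
        return sb.toString();
    }
}
```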

Concurrency within Java EE environment

Goal
My goal is to better understand how concurrency works within a Java EE environment and how I can make better use of it.
General questions
Let's take a typical servlet container (Tomcat) as an example. It uses one thread to process each request. The thread pool is configured so that it can have at most 80 threads. Let's also take a simple webapp that does some processing and DB communication during each request.
At peak time I can see 80 threads running in parallel (plus several other infrastructure threads). Let's also assume I am running it on an 'm1.large' EC2 instance.
I don't think all these threads can really run in parallel on this hardware, so the scheduler has to decide how best to split CPU time between them. So the questions are: how big is the scheduler overhead in this case? How can I find the right balance between the number of threads and processing speed?
Actors comparison
Having 80+ threads on a 4-core CPU doesn't sound healthy to me, especially if most of them are blocked on some kind of I/O (DB, filesystem, socket); they just consume precious resources. What if we detach requests from threads, keep only a reasonable number of threads (8, for instance), and just send processing tasks to them? Of course, in this case I/O should also be non-blocking, so that I receive events when data I need is available and send an event when I have results.
As far as I understand, this is what the actor model is all about. Actors are not bound to threads (at least in Akka and Scala). So I have a reasonable thread pool and a bunch of actors with mailboxes that contain processing tasks.
Now the question is: how does the actor model compare to the traditional thread-per-request model in terms of performance, scheduler overhead and resource (RAM, CPU) consumption?
Custom threads
I have some requests (only a few) that take too much time to process. I optimized the code and all algorithms and added caches, but it still takes too much time. But I see that the algorithm can be parallelized. It fits naturally in the actor model: I just split my big task into several tasks, and then aggregate the results somehow (if needed). But in the thread-per-request model I need to spawn my own threads (or create my own small thread pool). As far as I know, that's not a recommended practice within a Java EE environment. And, from my point of view, it doesn't fit naturally in the thread-per-request model. The question arises: how big should my thread pool be? Even if I make it reasonable in terms of hardware, I still have the bunch of threads managed by the servlet container. Thread management becomes decentralized and goes wild.
So my question is: what is the best way to deal with these situations in the thread-per-request model?

Having 80+ threads on a 4-core CPU doesn't sound healthy to me, especially if most of them are blocked on some kind of I/O (DB, filesystem, socket); they just consume precious resources.
Wrong. Exactly in this scenario the processors can handle many more threads than the number of individual cores, since most of the threads at any point in time are blocked waiting for I/O. Fair enough, context switching takes time, but that overhead is usually irrelevant compared to file/network/DB latency.
The rule of thumb that the number of threads should be equal to, or a little more than, the number of processor cores applies only to computation-intensive tasks, where the cores are kept busy most of the time.
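One widely cited way to make this quantitative is the sizing heuristic from Java Concurrency in Practice (Goetz et al.): threads = cores * target CPU utilization * (1 + wait time / compute time). A small sketch follows; the class name PoolSizing and the timings used below are illustrative assumptions, not measurements.

```java
// Sizing heuristic from Java Concurrency in Practice:
// threads = cores * targetUtilization * (1 + waitTime / computeTime)
final class PoolSizing {
    static int threadsFor(int cores, double targetUtilization,
                          double waitMillis, double computeMillis) {
        if (cores < 1 || targetUtilization <= 0 || targetUtilization > 1
                || computeMillis <= 0 || waitMillis < 0) {
            throw new IllegalArgumentException("invalid sizing input");
        }
        double threads = cores * targetUtilization * (1 + waitMillis / computeMillis);
        return Math.max(1, (int) Math.round(threads));
    }
}
```

With 4 cores, a request that spends 95 ms waiting on the database for every 5 ms of CPU gives 4 * 1.0 * (1 + 19) = 80 threads, the same order as the Tomcat pool in the question; with no waiting, the result falls back to the core count.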
I have some requests (only a few) that take too much time to process. I optimized the code and all algorithms and added caches, but it still takes too much time. But I see that the algorithm can be parallelized. It fits naturally in the actor model: I just split my big task into several tasks, and then aggregate the results somehow (if needed). But in the thread-per-request model I need to spawn my own threads (or create my own small thread pool). As far as I know, that's not a recommended practice within a Java EE environment.
Never heard about that (but I don't claim to be the ultimate Java EE expert). IMHO there is nothing wrong with executing tasks associated with a single request in parallel using e.g. a ThreadPoolExecutor. Note that these threads are not request-handling threads, so they don't directly interfere with the thread pool used by the EJB container, except that they compete for the same resources, of course, so they may slow down or completely stop other request-processing threads in a careless setup.
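A minimal sketch of that idea, assuming a hypothetical ParallelRequest.sumOfSquares as the slow part of one request: the work is split into chunks, submitted to a small dedicated ThreadPoolExecutor with invokeAll, and the partial results are summed on the request thread. The pool size (4), queue capacity (100) and rejection policy are assumptions you would tune by measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ParallelRequest {
    // Small, bounded pool used only for splitting heavy requests (sizes are assumptions).
    // Daemon threads never keep the JVM alive; CallerRunsPolicy makes the request
    // thread run a chunk itself when the queue is full, instead of failing.
    private static final ThreadPoolExecutor WORKERS = new ThreadPoolExecutor(
            4, 4, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(100),
            runnable -> {
                Thread t = new Thread(runnable, "request-splitter");
                t.setDaemon(true);
                return t;
            },
            new ThreadPoolExecutor.CallerRunsPolicy());

    // Hypothetical heavy computation: sum of squares, split into roughly equal chunks.
    static long sumOfSquares(long[] data, int chunks) {
        if (chunks < 1) throw new IllegalArgumentException("chunks must be >= 1");
        int size = Math.max(1, (data.length + chunks - 1) / chunks);
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += size) {
            final int from = start;
            final int to = Math.min(start + size, data.length);
            tasks.add(() -> {
                long partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i] * data[i];
                }
                return partial;
            });
        }
        try {
            long total = 0;
            for (Future<Long> f : WORKERS.invokeAll(tasks)) { // waits for every chunk
                total += f.get();                              // aggregate partial results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Keeping the pool bounded and separate limits how many extra threads heavy requests can add on top of the container's own pool.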
what is the best way to deal with these situations in thread-per-request model?
In the end, you can't escape measuring concurrent performance and fine-tuning the size of your thread pool and other parameters for your own specific environment.

The whole point of Java EE is to put common architectural concerns like security, state, and concurrency into the framework and let you provide the bits of business logic or data mappings along with the wiring to connect them. As such, Java EE intentionally hides the nasty bits of concurrency (locking to read/write mutable state) in the framework.
This approach lets a much broader range of developers successfully write correct applications. A necessary side effect though is that these abstractions create overhead and remove control. That's both good (in making it simple and encoding policies as policies not code) and bad (if you know what you're doing and can make choices impossible in the framework).
It is not inherently bad to have 80 threads on a production box. Most will be blocked or waiting on I/O which is fine. There is a (tunable) pool of threads doing the actual computation and Java EE will give you external hooks to tune those knobs.
Actors are a different model. They also let you write islands of code (the actor body) that (can) avoid locking to modify state. You can write your actors to be stateless (capturing the state in the recursive function call parameters) or hide your state completely in an actor instance so the state is all confined (for react-style actors you probably still need to explicitly lock around data access to ensure visibility on the next thread that runs your actor).
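A minimal Java sketch of the "state confined in an actor" idea. It is not the Akka or Scala actors API: a hypothetical AccountActor funnels every message through a single-threaded executor acting as its mailbox, so balance is only ever touched by one thread and needs no lock. The executor hand-off also supplies the visibility guarantee mentioned above, since actions before submitting a task happen-before the task runs.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Actor-like sketch (hypothetical, not a real actor library).
final class AccountActor {
    // The mailbox: one thread processes messages strictly in arrival order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long balance; // only read or written by tasks running in the mailbox

    // "tell": enqueue a message and return immediately.
    void deposit(long amount) {
        mailbox.execute(() -> balance += amount);
    }

    // "ask": enqueue a query and block until the mailbox answers.
    long balance() {
        Future<Long> reply = mailbox.submit(() -> balance);
        try {
            return reply.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void stop() {
        mailbox.shutdown();
    }
}
```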
I can't say that one or the other is better. I think there is adequate proof that both models can be used to write safe, high-throughput systems. To make either perform well, you need to think hard about your problem and build apps that isolate parts of state and the computations on each kind of state. For code where you understand your data well and have a high potential for parallelism I think models outside Java EE make a lot of sense.
Generally, the rule of thumb in sizing compute-bound thread pools is that they should be approximately equal to N (the number of cores) + 2. Many frameworks size to that automatically. You can use Runtime.getRuntime().availableProcessors() to get N. If your problem decomposes in a divide-and-conquer style algorithm and the number of data items is large, I would strongly suggest checking out fork/join, which can be used now as a separate library and will be part of Java 7.
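For the divide-and-conquer case, a minimal fork/join sketch using java.util.concurrent.ForkJoinPool and RecursiveTask (which did ship in Java 7): a hypothetical SumTask splits an array in half until pieces fall below a threshold, forks one half and computes the other. The threshold of 10,000 is an arbitrary assumption.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical divide-and-conquer task: sums data[from, to).
final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially (tuning assumption)

    private final long[] data;
    private final int from;
    private final int to;

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // schedule the left half; an idle worker may steal it
        long rightSum = right.compute();  // work on the right half in this thread
        return rightSum + left.join();    // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        // Parallelism = available processors (also the default of new ForkJoinPool()).
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Work stealing keeps all workers busy without having to size a separate pool per request.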
As far as how to manage this, you're not supposed to spawn threads as such inside Java EE (they want to control that) but you might investigate sending a request to your data-crunching thread pool via a message queue and handling that request via a return message. That can fit in the Java EE model (a bit clumsily of course).
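The request/reply shape can be sketched in plain Java SE, with java.util.concurrent.BlockingQueue standing in for the message queues (in Java EE you would use JMS destinations instead). Everything here is hypothetical: CrunchService, CrunchRequest and the crunch computation. The worker threads represent whatever consumes the queue outside the request-handling path; inside a container you would not start them yourself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical request message carrying its own reply channel.
final class CrunchRequest {
    final long n;
    final BlockingQueue<Long> replyTo = new ArrayBlockingQueue<>(1);

    CrunchRequest(long n) {
        this.n = n;
    }
}

final class CrunchService {
    private final BlockingQueue<CrunchRequest> requests = new LinkedBlockingQueue<>();

    // Starts the consumers of the request queue.
    void start(int workers) {
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(this::serve, "crunch-worker-" + i);
            t.setDaemon(true); // do not keep the JVM alive
            t.start();
        }
    }

    private void serve() {
        try {
            while (true) {
                CrunchRequest req = requests.take(); // wait for the next request message
                req.replyTo.put(crunch(req.n));      // send the reply message
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();      // exit quietly when interrupted
        }
    }

    // Hypothetical data-crunching step: sum of 1..n computed the slow way.
    private static long crunch(long n) {
        long s = 0;
        for (long i = 1; i <= n; i++) {
            s += i;
        }
        return s;
    }

    // Caller side: send a request message, then block on its reply.
    long call(long n) {
        CrunchRequest req = new CrunchRequest(n);
        try {
            requests.put(req);
            return req.replyTo.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```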
I have a writeup of actors, fork/join, and some other concurrency models here that you might find interesting: http://tech.puredanger.com/2011/01/14/comparing-concurrent-frameworks/

Is it safe to access EJB home object from multiple threads?

I have read this thread: J2EE/EJB + service locator: is it safe to cache EJB Home lookup result?
I use the same approach, i.e. I obtain the EJB home object for my entity bean and cache it in a servlet.
My question is: is it safe to share this object between multiple threads?
From the EJB 2.1 spec, I found only that concurrent calls to entity beans [via local / remote interface] are serialized internally by the container.
However, the spec doesn't expand on concurrent calls to home objects.
Does anybody have an idea? The reference to the exact place in a spec / doc would be very welcome as well.

EJBHome and EJBObject are equally thread safe. The container takes all responsibility for the thread safety of those implementations.
Very often an app server will create one instance of a bean's EJBHome or EJBLocalHome and tie it directly into JNDI for all the application to share. I bet if you looked up your EJBLocalHome twice from inside a servlet and did an == compare on the two, there'd be good odds that it was the exact same instance.
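If you cache the home object in a servlet, the cache itself also has to be safe for concurrent requests. A minimal sketch, assuming a hypothetical HomeCache whose lookup function stands in for the JNDI lookup (e.g. InitialContext.lookup(name)): ConcurrentHashMap.computeIfAbsent performs the lookup at most once per name, so later callers get the same instance, in line with the == observation above. The map only guards the cache; whether calls on the cached object are safe is up to the container.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical service-locator cache.
final class HomeCache<H> {
    private final ConcurrentHashMap<String, H> cache = new ConcurrentHashMap<>();
    private final Function<String, H> lookup; // stand-in for a JNDI lookup (assumption)

    HomeCache(Function<String, H> lookup) {
        this.lookup = lookup;
    }

    // The lookup runs at most once per JNDI name; later calls return the cached instance.
    H get(String jndiName) {
        return cache.computeIfAbsent(jndiName, lookup);
    }
}
```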

Besides technical safety, there's the matter of mental safety.
Taking that into account, every usage of EJB 2.1's home objects should be considered unsafe. You'll be much better off looking into the much saner EJB 3 approach than wasting any time on EJB 2.x.

I don't think EJBHome is thread safe, because:
First, to get the EJBHome object we rely on synchronized objects such as Properties and Hashtable.
Second, if we implement the Business Delegate design pattern to cache the EJBHome object, we use a synchronized Map to store it, so only one thread at a time can access the EJBHome.
