Is it OK to combine multiple functions into a single operation in WSDL? - web-services

Is there a "rule" for this? What I'm wondering is whether there is a best practice that tells how to combine functions into an operation. For example, a SetRecord operation: if an id is specified for some kind of record, the operation updates the record; otherwise it creates the record. In this case the return message would tell whether an insert or an update was made, but would this be bad design (and if it is, why)?
Another example: there's a containment hierarchy of records, and sometimes all levels of the hierarchy need to be created, sometimes 2 levels and sometimes only 1. A (bad) example would be the hierarchy car-seat-arm rest. Sometimes only a car or a single seat is created. Sometimes a car with 4 seats (each having 2 arm rests) is created. How is this supposed to map to WSDL operations and types? If you have an opinion, I would like to know why. I must say that I'm a bit lost here.
Thanks and BR - Matti

Although there's no technical problem with doing that, it violates some principles of good programming practice.
Your methods, and also your classes, should do only one thing and no more than one. The Single Responsibility Principle says exactly that:
The Single Responsibility Principle (SRP) says that a class should
have one, and only one, reason to change. To say this a different way,
the methods of a class should change for the same reasons, they should
not be affected by different forces that change at different rates.
It may also violate some other principles, like:
Separation of concerns
Cohesion
It can also lead to a number of code smells, like:
Long Method
Conditional Complexity
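To make the SRP argument concrete, here is a minimal sketch with a hypothetical in-memory RecordStore (the operation names InsertRecord and UpdateRecord are illustrative, not taken from any real WSDL). Each operation does exactly one thing, and an unknown id on update is reported as an error instead of silently turning into an insert:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

struct Record {
    std::string name;
};

class RecordStore {
public:
    // One responsibility: create a record and return its new id.
    int InsertRecord(const Record& r) {
        int id = nextId_++;
        records_[id] = r;
        return id;
    }

    // One responsibility: change an existing record. An unknown id is an
    // error here, not a hidden "create" branch.
    void UpdateRecord(int id, const Record& r) {
        auto it = records_.find(id);
        if (it == records_.end()) {
            throw std::out_of_range("no record with this id");
        }
        it->second = r;
    }

    std::optional<Record> Find(int id) const {
        auto it = records_.find(id);
        if (it == records_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<int, Record> records_;
    int nextId_ = 1;
};
```

The caller's intent is visible in the operation name, so nobody has to inspect the response to learn whether an insert or an update happened.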
Check this good text.

I did some research, and I think the answer above presents quite a narrow view of WSDL interface design. It is a bad idea to combine Insert and Update from my question's example into Set in a way that the operation performed is deduced from the data (checking whether an id or similar is filled in in the request message). In that kind of case it's bad because the interface does not really state what will happen. Having 2 separate operations is much clearer and does not consume any more resources.
However, combining operations can be the correct way to do things. Think about my hierarchical data example: it would require 13 requests (1 car + 4 seats + 8 arm rests) to create a car with 4 seats, each with both arm rests. Every boundary crossing should be expected to be costly, so this one could be combined into a single operation.
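A rough sketch of that round-trip arithmetic, assuming hypothetical Car/Seat/ArmRest message types: a fine-grained interface needs one request per node of the tree, while a coarse-grained operation that accepts the nested structure needs one request no matter how many levels are filled in.

```cpp
#include <cassert>
#include <vector>

// Hypothetical message types for the car -> seat -> arm rest hierarchy.
struct ArmRest {};

struct Seat {
    std::vector<ArmRest> armRests;
};

struct Car {
    std::vector<Seat> seats;
};

// Requests needed if every node is created by its own operation call.
int FineGrainedRequests(const Car& car) {
    int requests = 1;  // CreateCar
    for (const Seat& seat : car.seats) {
        requests += 1;                                       // CreateSeat
        requests += static_cast<int>(seat.armRests.size());  // one CreateArmRest each
    }
    return requests;
}

// A single coarse-grained operation sends the whole tree in one message,
// whether it holds one level, two levels or three.
int CoarseGrainedRequests(const Car&) { return 1; }
```

For a car with 4 seats and 2 arm rests per seat, FineGrainedRequests returns 1 + 4 × (1 + 2) = 13, matching the count above.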
Read for example:
Is this the Crudy anti pattern?
and
http://msdn.microsoft.com/en-us/library/ms954638.aspx
and you will find that the answer above was definitely an oversimplification, and that not all programming principles can be automatically applied in web service interface design.
A good example from the SO answer above: creating the order header first and then the order items with separate requests is bad because, e.g., it can be slow and unreliable. They could be combined into
PlaceOrder(invoiceHeader, List<InvoiceLines>)
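A minimal sketch of such a coarse-grained operation, assuming hypothetical InvoiceHeader/InvoiceLine types as C++ stand-ins for the WSDL message parts. Because the header and all lines arrive in one request, the service can validate the whole order before storing anything, so the caller never ends up with a header whose lines were lost:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct InvoiceHeader {
    std::string customer;
};

struct InvoiceLine {
    std::string product;
    int quantity;
};

struct PlaceOrderResult {
    bool accepted;
    std::string reason;
};

// All-or-nothing: every check runs before anything would be persisted.
PlaceOrderResult PlaceOrder(const InvoiceHeader& header,
                            const std::vector<InvoiceLine>& lines) {
    if (header.customer.empty()) return {false, "missing customer"};
    if (lines.empty()) return {false, "order has no lines"};
    for (const InvoiceLine& line : lines) {
        if (line.quantity <= 0) return {false, "invalid quantity for " + line.product};
    }
    // A real service would store header and lines in one transaction here.
    return {true, ""};
}
```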
So the answer is: it depends on what you are combining. Too low-level, CRUD-style operations are not the way to go, but things that don't need to be combined shouldn't be combined either. Moreover, the key here is defining a clear interface with clear message structures that tell straight away what will be done, instead of reducing the question to multiple vs. single operations.
-Matti

Related

Unable to fix score trap issue in Optaplanner for a variation of the Task Scheduling Problem

I am working on a variation of the Task scheduling problem. Here are the rules:
Each Task has a start time to be chosen by the optimizer
Each Task requires multiple types of resources (Crew Members) who work in parallel to complete the task, i.e., the task can start only when all required types of crew members are available.
There are multiple crew members of a certain type, and the optimizer has to choose the crew member of each type for a task. E.g., Task A requires an Electrician and a Plumber. There are many electricians and plumbers to choose from.
Here is my domain.
I have created a planning entity called TaskAssignment with 2 planning variables CrewMember and Starttime.
So, if a Task requires 3 types of crew members, then 3 TaskAssignment entities would be associated with it.
I placed a hard constraint to force the Starttime planning variable to be same for all the TaskAssignments corresponding to a particular task.
This works perfectly when I do not add any soft constraints (For example to reduce the total cost of using the resources). But when I add the soft constraint, there seems to be a violation of 1 hard constraint.
My guess is that this is due to a score trap, because the start times are not changing as a set.
Note: I have tried to avoid using the PlanningList variable. Can anyone suggest a way to solve this issue?
Your issue appears to be that your scoring function expects all employees on a given task to move simultaneously, but the solver actually moves them one by one. This is a problem in your domain model - it allows this situation to happen, because you only ever assign one employee at a time.
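To see why this is a score trap, here is a toy model (not OptaPlanner code; the scoring is a hypothetical simplification of the constraint in the question). The start times of one task's assignments are moved one variable at a time, as a default change move would do:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start times of the TaskAssignments belonging to ONE task. The hard
// constraint is scored as -1 for every assignment whose start time differs
// from the first one.
int HardScore(const std::vector<int>& startTimes) {
    int score = 0;
    for (int t : startTimes) {
        if (t != startTimes.front()) score -= 1;
    }
    return score;
}

// A single-variable change move: change the start time of exactly one assignment.
std::vector<int> ChangeOne(std::vector<int> startTimes, std::size_t index, int newStart) {
    startTimes[index] = newStart;
    return startTimes;
}
```

Going from {8, 8, 8} to {9, 9, 9} one assignment at a time passes through {8, 8, 9} (hard score -1) and {8, 9, 9} (-2) before reaching 0 again, so a search that avoids worsening the hard score, for example because a soft-score gain is not worth a hard penalty, tends to stay stuck on the old start time.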
There are two ways of fixing this problem:
Fix your model, so that this is not allowed. For example, if you know that each task requires two people, let there be two variables on the entity, one for each employee. If there is a certain maximum number of people per task, have a variable for each, and make them nullable, so that unassigned slots are not an issue. If you don't have a fixed number of employees per task and cannot get to a reasonable maximum, then this approach will likely not work for you. In that case...
Write coarse-grained custom moves which always move all the employees together.
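The effect of the second option can be sketched with a hypothetical TaskAssignment record. In OptaPlanner itself this logic would live in a custom move (for example, one produced by a MoveListFactory or MoveIteratorFactory); the C++ below only illustrates what such a move changes, not the framework's API:

```cpp
#include <cassert>
#include <vector>

// Stand-in for the question's planning entity: one crew member assigned to
// one task, with the start time stored per assignment.
struct TaskAssignment {
    int taskId;
    int crewMemberId;
    int startTime;
};

// Coarse-grained move: set the start time of ALL assignments of a task in one
// step, so the solver never evaluates a half-moved task.
void MoveTaskStart(std::vector<TaskAssignment>& assignments, int taskId, int newStart) {
    for (TaskAssignment& a : assignments) {
        if (a.taskId == taskId) a.startTime = newStart;
    }
}

// The hard constraint from the question: all assignments of a task share a start time.
bool StartTimesConsistent(const std::vector<TaskAssignment>& assignments, int taskId) {
    const TaskAssignment* first = nullptr;
    for (const TaskAssignment& a : assignments) {
        if (a.taskId != taskId) continue;
        if (first == nullptr) {
            first = &a;
        } else if (a.startTime != first->startTime) {
            return false;
        }
    }
    return true;
}
```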

memoization vs. state-free code

In the development of a stateless Clojure library I have encountered a problem: many functions have to be called repeatedly with the same arguments. Since everything until now is side-effect-free, this will always lead to the same results. I'm considering ways to make this more performant.
My library works like this: Every time a function is called it needs to be passed a state-hash-map, the function returns a replacement with a manipulated state object. So this keeps everything immutable and every sort of state is kept outside of the library.
(require '[mylib.core :as l])
(def state1 (l/init-state))
(def state2 (l/proceed state1))
(def state3 (l/proceed state2))
If proceed should not perform the same operations repeatedly, I have several options to solve this:
Option 1: "doing it by hand"
Store the necessary state in the state-hash-map, and update only where it is necessary. That means having a sophisticated mechanism that knows which parts have to be recalculated and which do not. This is always possible, but in my case it would not be trivial. If I implemented this, I'd produce much more code, which in the end is more error-prone. So is it necessary?
Option 2: memoize
So there is the temptation to use the memoize function at the critical points in the lib: the points at which I'd expect repeated function calls with the same args. This is sort of another philosophy of programming: modelling each step as if it were the first time it has to run, and moving the fact that it is called several times into a separate implementation. (This reminds me of the idea of react/om/reagent's render function.)
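The same idea as Clojure's memoize, sketched in C++ under the assumption that the wrapped function is pure and its argument can be used as a std::map key. The cache lives inside the returned function object, so it is exactly as stateful, and as unbounded, as Option 3 worries about:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>

template <typename Arg, typename Result>
std::function<Result(const Arg&)> Memoize(std::function<Result(const Arg&)> f) {
    // Shared so that copies of the returned std::function use one cache.
    auto cache = std::make_shared<std::map<Arg, Result>>();
    return [f = std::move(f), cache](const Arg& arg) -> Result {
        auto it = cache->find(arg);
        if (it != cache->end()) return it->second;  // hit: no recomputation
        Result result = f(arg);
        cache->emplace(arg, result);  // never evicted: grows with every new argument
        return result;
    };
}
```

It only pays off when the same argument really comes back; if every call gets a fresh state, each call is a cache miss and the memory is wasted.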
Option 3: core.memoize
But memoization is stateful, of course. And this becomes a problem, for example, when the lib runs in a web server: the server would just keep filling memory with cached results. In my case, however, it would make sense to only cache calculated results for each user session. So it would be perfect to attach the cache to the previously described state-hash-map, which will be passed back by the lib.
And it looks like core.memoize provides some tools for this job. Unfortunately it's not that well documented; I can't really find useful information related to the described situation.
My question is: Do I estimate the possible options more or less correctly? Or are there other options that I have not considered? If not, it looks like core.memoize is the way to go. In that case, I'd appreciate it if someone could give me a short pattern to use here.
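The "attach the cache to the state" idea can be sketched in C++ with a hypothetical State struct standing in for the state-hash-map and a hypothetical ExpensiveSum standing in for the costly pure computation. The cache travels inside the state value, so it is discarded together with the session's state instead of piling up in a global table:

```cpp
#include <cassert>
#include <map>

struct State {
    int step = 0;
    long long last = 0;              // result of the latest proceed step
    std::map<int, long long> cache;  // input -> previously computed result
};

// Stand-in for an expensive pure computation: the sum of 0..n.
long long ExpensiveSum(int n) {
    long long total = 0;
    for (int i = 0; i <= n; ++i) total += i;
    return total;
}

// Returns a new state and leaves the old one untouched, like the library's proceed.
State Proceed(const State& s, int input) {
    State next = s;
    next.step = s.step + 1;
    auto it = next.cache.find(input);
    if (it == next.cache.end()) {
        // Computed at most once per input within this chain of states.
        it = next.cache.emplace(input, ExpensiveSum(input)).first;
    }
    next.last = it->second;
    return next;
}
```

Copying a std::map on every step is the price of value semantics in C++; Clojure's persistent maps share structure, so assoc'ing a cache into the state map is much cheaper there.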
If state1, state2 and state3 are different in your example, memoization will gain you nothing: proceed would be called with different arguments each time.
As a general design principle, do not impose caching strategies on the consumer. Design so that the consumers of your library can use whatever memoization technique they like, or no memoization at all.
Also, you don't mention whether init-state is side-effect free and whether it always returns the same state1. If so, why not just keep all (or some) states as static literals? If they don't take much space, you can write a macro that calculates them at compile time. Say, hard-code the first 20 states, then call proceed.
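The "hard-code the first states" suggestion, sketched with C++ constexpr in place of a Clojure macro. InitState and Proceed here are hypothetical pure stand-ins (the state is just an integer); the point is that the compiler evaluates the first 20 states, and Proceed only runs at run time beyond them:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr long long InitState() { return 1; }
constexpr long long Proceed(long long state) { return state * 3 + 1; }

template <std::size_t N>
constexpr std::array<long long, N> PrecomputeStates() {
    static_assert(N > 0, "need at least the initial state");
    std::array<long long, N> states{};
    states[0] = InitState();
    for (std::size_t i = 1; i < N; ++i) states[i] = Proceed(states[i - 1]);
    return states;
}

// The first 20 states, baked into the binary at compile time.
constexpr auto kFirstStates = PrecomputeStates<20>();

// Table lookup for early states, run-time Proceed calls after that.
long long StateAt(std::size_t n) {
    if (n < kFirstStates.size()) return kFirstStates[n];
    long long state = kFirstStates.back();
    for (std::size_t i = kFirstStates.size() - 1; i < n; ++i) state = Proceed(state);
    return state;
}
```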

Defending classes with 'magic numbers'

A few months ago I read a book on security practices, and it suggested the following method for protecting our classes from being overwritten, e.g. by overflows:
First, define a magic number and a fixed-size array (it can be a simple integer too).
Use that array containing the magic number, and place one copy at the top and one at the bottom of the class.
Write a function that compares these numbers: if they are equal to each other and to the static variable, the class is OK and it returns true; otherwise the class is corrupt and it returns false.
Call this function at the start of every other class method, so that the validity of the class is checked on every function call.
It is important to place this array at the start and at the end of the class.
At least this is how I remember it. I'm coding a file encryptor for learning purposes, and I'm trying to make this code exception safe.
So, in which scenarios is it useful, and when should I use this method, or is this something totally useless to count on? Does it depend on the compiler or OS?
PS: I forgot the name of the book mentioned in this post, so I cannot check it again; if any of you know which one it was, please tell me.
What you're describing sounds like a canary, but within your program, as opposed to one inserted by the compiler. Stack canaries are usually on by default when using gcc or g++ (plus a few other buffer overflow countermeasures).
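A minimal sketch of the scheme from the question, as a C++ class with an arbitrary, hypothetical magic value at the top and at the bottom of the object. Non-static members with the same access control are laid out in declaration order, so a linear overflow of data_ tends to hit tail_:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

class GuardedBuffer {
public:
    // Arbitrary magic value chosen for illustration.
    static constexpr std::uint32_t kMagic = 0xDEADBEEF;

    // Both guards must still hold the magic value.
    bool IsValid() const { return head_ == kMagic && tail_ == kMagic; }

    // Every member function checks validity first, as the book suggests.
    bool Write(const char* text) {
        if (!IsValid()) return false;
        std::strncpy(data_, text, sizeof(data_) - 1);  // bounded copy
        data_[sizeof(data_) - 1] = '\0';
        return IsValid();
    }

    const char* Data() const { return IsValid() ? data_ : nullptr; }

private:
    std::uint32_t head_ = kMagic;  // guard at the top of the object
    char data_[16] = {};
    std::uint32_t tail_ = kMagic;  // guard at the bottom of the object
};
```

This only detects corruption after it has happened; the overflowing write itself is undefined behaviour, which is why the bounded copy in Write matters more than the guards.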
If you're doing mutable operations on your class and you want to make sure you don't have side effects, I don't know if having a magic number is very useful. Why rely on a homebrew validity check when there are methods out there that are more likely to be successful?
Checksums: I think it'd be more useful for you to hash the unencrypted text and add that to the end of the encrypted file. When decrypting, remove the hash and compare the hash(decrypted text) with what it should be.
I think most, if not all, widely used encryptors/decryptors store some sort of checksum in order to verify that the data has not changed.
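A sketch of the checksum idea, leaving out the encryption itself. FNV-1a is used only as a simple, well-known stand-in: it detects accidental corruption, but it is not a cryptographic hash, so a real encryptor should use a MAC (such as HMAC) or an authenticated encryption mode to detect deliberate tampering.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a hash (non-cryptographic).
std::uint64_t Fnv1a64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Before encrypting: append the 8-byte hash of the plaintext (little-endian).
std::string AppendChecksum(const std::string& plaintext) {
    std::string out = plaintext;
    std::uint64_t h = Fnv1a64(plaintext);
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<char>((h >> (8 * i)) & 0xFF));
    return out;
}

// After decrypting: split off the stored hash and compare it with a fresh one.
bool VerifyChecksum(const std::string& decrypted, std::string& plaintextOut) {
    if (decrypted.size() < 8) return false;
    plaintextOut = decrypted.substr(0, decrypted.size() - 8);
    std::uint64_t stored = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned char byte = static_cast<unsigned char>(decrypted[decrypted.size() - 8 + i]);
        stored |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return stored == Fnv1a64(plaintextOut);
}
```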
This type of a canary will partially protect you against a very specific type of overflow attack. You can make it a little more robust by randomizing the canary value every time you run the program.
If you're worried about buffer overflow attacks (and you should be if you are ever parsing user input), then go ahead and do this. It probably doesn't cost too much in speed to check your canaries every time. There will always be other ways to attack your program, and there might even be careful buffer overflow attacks that get around your canary, but it's a cheap measure to take so it might be worth adding to your classes.
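Randomizing the canary on every run could look like the sketch below (hypothetical names). The value is drawn once per process, so an attacker cannot simply hard-code it into an overflow payload. Note that the C++ standard does not guarantee std::random_device is non-deterministic on every platform.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Chosen once per process run; thread-safe initialization of a function-local static.
std::uint64_t CanaryValue() {
    static const std::uint64_t value = [] {
        std::random_device rd;
        return (static_cast<std::uint64_t>(rd()) << 32) | rd();
    }();
    return value;
}

struct Guarded {
    std::uint64_t head = CanaryValue();
    int payload = 0;
    std::uint64_t tail = CanaryValue();

    bool IsValid() const { return head == CanaryValue() && tail == CanaryValue(); }
};
```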

Does it make sense to verify if values are different in a setter

I remember I saw somewhere (probably in Github) an example like this in a setter:
void MyClass::setValue(int newValue)
{
    if (value != newValue) {
        value = newValue;
    }
}
For me it doesn't make a lot of sense, but I wonder if it gives any performance improvement.
It makes no sense for scalar types, but it may make sense for some user-defined types (since the type can be really "big", or its assignment operator can do some "hard" work).
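A sketch of the user-defined case, with a hypothetical Document class whose value is a vector of strings. The comparison reads both vectors, while the assignment may allocate memory for every string, so skipping the assignment only pays off when the new value is often equal to the old one:

```cpp
#include <cassert>
#include <string>
#include <vector>

class Document {
public:
    void setLines(const std::vector<std::string>& newLines) {
        if (lines_ != newLines) {  // reads both vectors
            lines_ = newLines;     // may allocate for every string
            ++assignments_;
        }
    }

    int assignments() const { return assignments_; }

private:
    std::vector<std::string> lines_;
    int assignments_ = 0;  // counts real copies, for illustration only
};
```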
The deeper the instruction pipeline (and it only gets deeper and deeper on Intel platform at least), the higher the cost of a branch misprediction.
When a branch mispredicts, some instructions from the mispredicted
path still move through the pipeline. All work performed on these
instructions is wasted since they would not have been executed had the
branch been correctly predicted
So yes, adding an if in the code can actually hurt performance. The write would be L1 cached, possibly for a long time. If the write has to be visible to other threads, then the operation would have to be interlocked to start with.
The only way you can really tell is by actually testing the different alternatives (benchmarking and/or profiling the code). Different compilers, different processors and different calling code will make a big difference.
In general, and for "simple" data types (int, double, char, pointers, etc.), it won't make sense. It will just make the code longer and more complex for the processor [at least if the compiler does what you ask of it. It may realize "this doesn't make any sense, let's remove this check", but I wouldn't rely on that; compilers are often smarter than you, but making life more difficult for the compiler almost never leads to better code].
Edit: Additionally, it only makes GOOD sense to compare things that can be easily compared. If the data is expensive to compare in the case where the values are equal (for example, long strings take a lot of reads from both strings if they are equal, or if they begin the same and differ only in the last few characters), there is very little saving. The same applies to a class with a bunch of members that are often almost all the same, but where one or two fields differ, and so on. On the other hand, if you have a "customer data" class that has an integer customer ID that must be unique, then comparing just the customer ID will be "cheap", but copying the customer name, address, phone number(s), and other data on the customer will be expensive. [Of course, in this case, why is it not a (smart) pointer or reference?] End Edit.
If the data is "shared" between different processors (multiple threads accessing the same data), then it may help a little bit [in particular if this value is often read, and often written with the same value as before]. This is because "kicking out" the old value from the other processor's caches is expensive, and you only want to do that if you ACTUALLY change something.
And of course, it only makes ANY sense to worry about performance when you are working on code that you know is absolutely on the bleeding edge of the performance hot-path. Anywhere else, making the code as easily readable and as clear and concise as possible is always the best choice - this will also, typically, make the compiler more able to determine what is actually going on and ensure best optimization results.
This pattern is common in Qt, where the API is highly based on signals & slots. This pattern helps to avoid infinite looping in the case of cyclic connections.
In your case, where signals aren't present, this check only costs performance, as pointed out by @remus-rusanu and @mats-petersson.
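The cyclic-connection case can be sketched without Qt, using plain callbacks as a stand-in for signals and slots (this is not Qt's API). Two values are connected both ways, and the equality check in setValue is what stops the endless ping-pong:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

class Value {
public:
    void onChanged(std::function<void(int)> slot) { slots_.push_back(std::move(slot)); }

    void setValue(int newValue) {
        if (value_ == newValue) return;  // breaks the a -> b -> a -> ... cycle
        value_ = newValue;
        ++notifications_;
        for (auto& slot : slots_) slot(value_);  // plays the role of "emit valueChanged"
    }

    int value() const { return value_; }
    int notifications() const { return notifications_; }

private:
    int value_ = 0;
    int notifications_ = 0;
    std::vector<std::function<void(int)>> slots_;
};
```

Without the early return, a.setValue(7) would call b.setValue(7), which would call a.setValue(7) again, and so on until the stack overflows.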

Optimization and testability at the same time - how to break up code into smaller units

I am trying to break up a long "main" program in order to be able to modify it, and also perhaps to unit-test it. It uses some huge data, so I hesitate:
What is best: to have function calls, with possibly extremely large (memory-wise) data being passed,
(a) by value, or
(b) by reference
(by extremely large, I mean maps and vectors of vectors of some structures and small classes... even images... that can be really large)
(c) Or to have private data that all the functions can access? That may also mean that main_processing() or something could have a vector of all of them, while some functions will only take a single item... with the advantage of functions being testable.
My question, though, has to do with optimization: while I am trying to break this monster into baby monsters, I also do not want to run out of memory.
It is not very clear to me how many copies of data I am going to have, if I create local variables.
Could someone please explain ?
Edit: this is not a generic "how to break down a very large program into classes". This program is part of a large solution, that is already broken down into small entities.
The executable I am looking at, while fairly large, is a single entity, with non-divisible data. So the data will either all be created as member variables in a single class, which I have already created, or it will (all of it) be passed around as arguments to functions.
Which is better ?
If you want unit testing, you cannot "have private data that all the functions can access" because then, all of that data would be a part of each test case.
So, you must think about each function, and define exactly on which part of the data it works. As for function parameters and return values, it's very simple: use pass-by-value for small objects, and pass-by-reference for large objects.
You can use a guesstimate for the threshold that separates small and large. I use the rule "8 bytes is small, anything more is large", but what is good for my system may not be equally good for yours.
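A small sketch that makes the copy count visible, using a hypothetical Big type that counts its own copies: passing by value copies the whole vector on every call, while passing by const reference copies nothing. The In alias is one hypothetical way to encode the "8 bytes" rule of thumb in a type (C++17):

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct Big {
    static int copies;
    std::vector<int> pixels;

    Big() = default;
    Big(const Big& other) : pixels(other.pixels) { ++copies; }
    Big& operator=(const Big& other) {
        pixels = other.pixels;
        ++copies;
        return *this;
    }
};
int Big::copies = 0;

// By value: the caller's object (all its pixels) is copied on every call.
long long SumByValue(Big image) {
    long long sum = 0;
    for (int p : image.pixels) sum += p;
    return sum;
}

// By const reference: no copy, and the function still cannot modify the data.
long long SumByRef(const Big& image) {
    long long sum = 0;
    for (int p : image.pixels) sum += p;
    return sum;
}

// Small trivially copyable types by value, everything else by const reference.
template <typename T>
using In = std::conditional_t<(sizeof(T) <= 8 && std::is_trivially_copyable_v<T>), T, const T&>;
```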
This seems more like a general question about OOP. Split up your data into logically grouped concepts (classes), and place the code that works with those data elements with the data (member functions), then tie it all together with composition, inheritance, etc.
Your question is too broad to give more specific advice.
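One way to apply this to the question's data, sketched with a hypothetical Image type: the large data has a single owner, and each function receives only the part it needs by const reference, so it can be unit-tested with a tiny input and still run on the real data without copies.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Image {
    std::size_t width = 0;
    std::vector<unsigned char> pixels;
};

// Works on one item only, so it is easy to test in isolation.
unsigned char Brightest(const Image& image) {
    unsigned char best = 0;
    for (unsigned char p : image.pixels) {
        if (p > best) best = p;
    }
    return best;
}

// Owns the big data once; processing hands out references, never copies.
class Processor {
public:
    explicit Processor(std::vector<Image> images) : images_(std::move(images)) {}

    std::vector<unsigned char> BrightestPerImage() const {
        std::vector<unsigned char> result;
        result.reserve(images_.size());
        for (const Image& image : images_) result.push_back(Brightest(image));
        return result;
    }

private:
    std::vector<Image> images_;
};
```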