utPLSQL to test stored procedures - unit-testing

I'm new to utPLSQL. We have an existing application that contains lots of stored procedures. Most of the procedures insert or update table values at the end. Is there any way in utPLSQL to test these table values? I can see a lot of examples for functions, but hardly any for stored procedures.
Thank you

Test the data
In your unit test you can test more than just function results. After executing your stored procedure, you can just query your table and see if it inserted what you expected it would insert.
Depending on the stored proc it might be hard to find exactly which data it inserted, but in many cases you will be able to, for instance because you can search for specific values, or use a sequence to get the inserted ID.
To compare the data with your expectation, you can select the data into variables and compare those against expected values (in a cursor loop if you need to compare multiple rows). It may be easier, though, to compare two cursors: one with the expected data (which you can construct using SELECT ... FROM DUAL) and one with the actual data.
The documentation, and especially the chapter Advanced data comparison, contains various examples of how to compare cursor data. I'm not going to paste them here, because I don't know which one applies to your case, and both utPLSQL and its documentation are very much alive, so it's best to check out the latest version when you need it.
Refactor your proc into a package
Nevertheless, you may find that it's hard to test big, complicated stored procs by the data they output. I've found that the easiest way to refactor this is to create a package. In the package you can expose a procedure just like the one you have now, but it can call other procedures and functions in the package, which you can also expose. That way it's easier to test those individual parts, and maybe you can test a big part of the logic without needing to write data, making the tests easier to write and faster to execute.
It's not completely elegant, since you're exposing parts just for the purpose of testing that you otherwise wouldn't expose. Nevertheless, I've found it's typically really easy to refactor a stored proc into a package, especially if you already used sub-procedures in the stored proc, and this way you can quickly, and without much risk, get to a structure that is easy to test.
It doesn't have to be a package; you could split it up into separate smaller procedures as well, but I like packages because they keep all the logic of the stored proc together, and they allow you to call the proc in roughly the same way as before. A package is little more than a grouped set of stored procedures, functions and types. If your application requires it, you can even keep the original stored proc, but let it call its counterpart in the package; that way you get your refactoring without needing to change any of the clients.
Refactor parts of your proc into an object type
If you go a step further, you can create object types. There are various advantages to that, but they work quite differently from packages, so if you're not familiar with them, this might be a big step.
First of all, objects can keep state, and you can have multiple of them if you need to. Packages can hold state as well, but only one per session or per call to the database. Object types allow you to create as many instances as you need, and each keeps its own state.
With object types, you have instances of objects that you can pass around. That means you can inject a bit of logic into a stored procedure by passing it an object of a certain type. Moreover, you can make subtypes of an object type, so if your procedure doesn't write data to the table itself, but instead calls a method of some type X that does the actual saving, you can run the test using a subtype Y of type X that doesn't actually save the data, but just helps you verify whether the method was called with the right parameters. You're then getting into the area of mocking, which is a very useful tool for making tests more efficient.
Again, a client may not be ready to pass objects like this, so I tend to create two (or more) package procedures. One is the official entry point for the application. It won't do much, except create an object of type X and pass it to the second procedure, which contains the actual logic (optionally split up further). This way, my application can call a simple stored proc, while my tests can call the second procedure and pass it an instance of subtype Y if needed.

Related

Use value from Input Port in Parameter of block - Simulink

I have a Simulink model that I plan on converting to C code and using elsewhere. I have defined input ports in order to set variables in the Simulink model.
I am trying to find a way to use the input variables as part of a State-Space block, but I have tried everything and am not sure how else to go about it.
As mentioned, this will be converted to C/C++ code, so there is no option to use MATLAB in any way.
Say I use matrix A in the State-Space block parameters. Matrix A is defined like so: A = [Input1 0; Input2 0; 0 Input3]
I want to be able to change the values of the inputs through the code by setting the values of Input1, Input2, Input3, etc.
There is a very clear distinction in Simulink between Parameters and Signals. A parameter is something entered into a dialog, while a signal is something fed into or coming out of a block.
The matrices in the State-Space block are defined as parameters, and hence you will never be able to feed your signals into them.
You have two options.
Don't use the State-Space block. Rather develop the state-space model yourself using more fundamental blocks (i.e. integrators, sums and product blocks). This is feasible for small models, but not really recommended.
Note that the parameters of a block are typically tunable. When you generate code, one of the files will be model_name_data.c, and this will contain a parameter structure allowing you to change the parameters.
Note that in either case, purely from a model design perspective, it'll be up to you to ensure that the changes to the model make sense (for instance, that they don't make any loops go unstable).
You cannot tune a parameter after generating code if it has been inlined with a constant value; this is typically done because it results in the fastest code. To have full control over the behaviour, you have to use tunable parameters. There is a table with the different code versions; depending on what you want, you can choose the right type of parameter.
Another, lazier way to achieve this in many cases is to use base workspace variables; it's very simple to do and works fine in most cases.

C++ - Managing References in Disk Based Vector

I am developing a set of vector classes that are all derived from an abstract vector. I am doing this so that in our software that makes use of these vectors, we can quickly switch between them without any code breaking (or at least minimize failures, but my goal is full compatibility). All of the vector classes share the same interface.
I am working on a disk-based vector that mostly matches the STL vector interface. I am doing this because we need to handle large out-of-memory files that contain various formats of data. The DiskVector handles data reads/writes to disk by using template specialization/polymorphism of serialization and deserialization classes. The data serialization and deserialization have been tested, and they work (so far). My problem occurs when dealing with references to the data.
For example,
Given a DiskVector dv, a call to dv[10] would get a pointer to a spot on disk, then seek there and read out the char stream. This stream gets passed to a deserializer which converts the byte stream into the appropriate data type. Once I have the value, I return it.
This is where I run into a problem. The STL returns it as a reference, so in order to match that style, I need to return a reference. What I do is store the value in an unordered_map with the given index (in this example, 10). Then I return a reference to the value in the unordered_map.
If this continues without cleanup, then the purpose of the DiskVector is lost because all the data just gets loaded into memory, which is bad due to data size. So I clean up this map by deleting the indexes later on when other calls are made. Unfortunately, if a user decided to store this reference for a long time, and then it gets deleted in the DiskVector, we have a problem.
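To make the problem concrete, here is a minimal sketch of the mechanism (the class layout, helper names and placeholder bodies are simplified stand-ins, not the real code):

#include <cstddef>
#include <unordered_map>
#include <vector>

// operator[] deserializes the element from disk, caches it, and returns a
// reference into the cache -- which a later cleanup pass can invalidate.
template <typename T>
class DiskVector /* : public AbstractVector<T> */ {
public:
    T& operator[](std::size_t index)
    {
        std::vector<char> bytes = read_from_disk(index);   // seek + read the raw byte stream
        cache_[index] = deserialize(bytes);                // decode it into a T
        return cache_[index];                              // reference is only valid while
                                                           // the cache entry exists
    }

private:
    // Placeholders standing in for the real disk I/O and deserializer.
    std::vector<char> read_from_disk(std::size_t /*index*/) { return {}; }
    T deserialize(const std::vector<char>& /*bytes*/) { return T{}; }

    std::unordered_map<std::size_t, T> cache_;
};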
So my questions
Is there a way to see if any other references to a certain instance are in use?
Is there a better way to solve this while still maintaining the polymorphic style for reasons described at the beginning?
Is it possible to construct a special class that would behave as a reference, but handle the disk IO dynamically so I could just return that instead?
Any other ideas?
So a better solution to what I was trying to do is to use SQLite as the backend for the database: use BLOBs as the column types for the key and value columns. This is the approach I am taking now. That said, in order to get it to work well, I need to use what cdhowie posted in the comments to my question.
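For illustration, a minimal sketch of that approach with the SQLite C API (the dv_elements table, its schema and the helper names are just an example, not the actual implementation):

#include <sqlite3.h>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Store one serialized element under its index.
void put_element(sqlite3* db, std::int64_t index, const std::vector<char>& bytes)
{
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO dv_elements(idx, value) VALUES(?, ?)",
        -1, &stmt, nullptr);
    sqlite3_bind_int64(stmt, 1, index);
    sqlite3_bind_blob(stmt, 2, bytes.data(), static_cast<int>(bytes.size()), SQLITE_TRANSIENT);
    if (sqlite3_step(stmt) != SQLITE_DONE)
        throw std::runtime_error(sqlite3_errmsg(db));
    sqlite3_finalize(stmt);
}

// Fetch the serialized element back; returns an empty buffer if the index is absent.
std::vector<char> get_element(sqlite3* db, std::int64_t index)
{
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "SELECT value FROM dv_elements WHERE idx = ?", -1, &stmt, nullptr);
    sqlite3_bind_int64(stmt, 1, index);
    std::vector<char> bytes;
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        const void* p = sqlite3_column_blob(stmt, 0);
        int n = sqlite3_column_bytes(stmt, 0);
        if (p && n > 0)
            bytes.assign(static_cast<const char*>(p), static_cast<const char*>(p) + n);
    }
    sqlite3_finalize(stmt);
    return bytes;
}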

Optimization and testability at the same time - how to break up code into smaller units

I am trying to break up a long "main" program in order to be able to modify it, and also perhaps to unit-test it. It uses some huge data, so I hesitate:
What is best: to have function calls, with possibly extremely large (memory-wise) data being passed,
(a) by value, or
(b) by reference
(by extremely large, I mean maps and vectors of vectors of some structures and small classes... even images... that can be really large)
(c) Or to have private data that all the functions can access? That may also mean that main_processing() or something could have a vector of all of them, while some functions will only have an item... with the advantage of the functions being testable.
My question, though, has to do with optimization: while I am trying to break this monster into baby monsters, I also do not want to run out of memory.
It is not very clear to me how many copies of the data I am going to have if I create local variables.
Could someone please explain?
Edit: this is not a generic "how to break down a very large program into classes". This program is part of a large solution, that is already broken down into small entities.
The executable I am looking at, while fairly large, is a single entity with non-divisible data. So the data will either all be created as member variables in a single class, which I have already created, or all of it will be passed around as arguments between functions.
Which is better ?
If you want unit testing, you cannot "have private data that all the functions can access" because then, all of that data would be a part of each test case.
So, you must think about each function, and define exactly on which part of the data it works. As for function parameters and return values, it's very simple: use pass-by-value for small objects, and pass-by-reference for large objects.
You can use a guesstimate for the threshold that separates small and large. I use the rule "8 is small, anything more is large", but what is good for my system may not be equally good for yours.
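As a small sketch of that rule (the Image type here is just a stand-in for the "really large" data from the question):

#include <cstddef>
#include <vector>

struct Pixel { unsigned char r, g, b; };
using Image = std::vector<std::vector<Pixel>>;   // stand-in for the huge data

// Small object: pass by value, a copy is cheap.
double scale(double value, double factor) { return value * factor; }

// Large object, read only: pass by const reference, no copy is made.
std::size_t count_pixels(const Image& img)
{
    std::size_t n = 0;
    for (const auto& row : img) n += row.size();
    return n;
}

// Large object, modified in place: pass by (non-const) reference.
void clear_image(Image& img)
{
    for (auto& row : img) row.assign(row.size(), Pixel{0, 0, 0});
}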
This seems more like a general question about OOP. Split up your data into logically grouped concepts (classes), and place the code that works with those data elements with the data (member functions), then tie it all together with composition, inheritance, etc.
Your question is too broad to give more specific advice.

Building static (but complicated) lookup table using templates

I am currently in the process of optimizing a numerical analysis code. Within the code, there is a 200x150 element lookup table (currently a static std::vector<std::vector<double>>) that is constructed at the beginning of every run. The construction of the lookup table is actually quite complex: the values in the table are computed using an iterative secant method on a complicated set of equations. Currently, the construction of the lookup table is 20% of a simulation's run time (run times are on the order of 25 seconds, of which lookup table construction takes 5 seconds). While 5 seconds might not seem like a lot, when running our MC simulations, where we run 50k+ simulations, it suddenly becomes a big chunk of time.
Along with some other ideas, one thing that has been floated- can we construct this lookup table using templates at compile time? The table itself never changes. Hard-coding a large array isn't a maintainable solution (the equations that go into generating the table are constantly being tweaked), but it seems that if the table can be generated at compile time, it would give us the best of both worlds (easily maintainable, no overhead during runtime).
So, I propose the following (much simplified) scenario. Let's say you wanted to generate a static array (use whatever container suits you best: a 2D C array, vector of vectors, etc.) at compile time. You have a function defined:
double f(int row, int col);
where the return value is the entry in the table, row is the lookup table row, and col is the lookup table column. Is it possible to generate this static array at compile time using templates, and how?
Usually the best solution is code generation. There you have all the freedom and you can be sure that the output is actually a double[][].
Save the table on disk the first time the program is run, and only regenerate it if it is missing, otherwise load it from the cache.
Include a version string in the file so it is regenerated when the code changes.
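A minimal sketch of that caching scheme (the file name, version tag and the placeholder body of f() are illustrative assumptions, not from the question):

#include <array>
#include <fstream>
#include <string>

constexpr int ROWS = 200;
constexpr int COLS = 150;
using Table = std::array<std::array<double, COLS>, ROWS>;

// Bump this tag whenever the generating equations are tweaked,
// so a stale cache is regenerated automatically.
const std::string kTableVersion = "secant-v7";

// Placeholder standing in for the expensive iterative secant computation.
double f(int row, int col) { return row * 0.001 + col * 0.01; }

bool load_cached(const std::string& path, Table& t)
{
    std::ifstream in(path, std::ios::binary);
    std::string version;
    if (!in || !std::getline(in, version) || version != kTableVersion)
        return false;                                   // missing or stale cache
    in.read(reinterpret_cast<char*>(t.data()), sizeof(Table));
    return static_cast<bool>(in);
}

void save_cache(const std::string& path, const Table& t)
{
    std::ofstream out(path, std::ios::binary);
    out << kTableVersion << '\n';
    out.write(reinterpret_cast<const char*>(t.data()), sizeof(Table));
}

const Table& get_table()
{
    static Table table;
    static bool ready = false;
    if (!ready) {
        if (!load_cached("lookup.cache", table)) {      // regenerate only when needed
            for (int r = 0; r < ROWS; ++r)
                for (int c = 0; c < COLS; ++c)
                    table[r][c] = f(r, c);
            save_cache("lookup.cache", table);
        }
        ready = true;
    }
    return table;
}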
A couple of things here.
What you want to do is almost certainly at least partially possible.
Floating point values are invalid template arguments (they just are, don't ask why). Although you can represent rational numbers in templates using an N1/N2 representation, the math you can do on them does not cover everything you can do with real values; root(n), for instance, is unavailable (root(2) is irrational). Unless you want a bajillion instantiations of static double variables, you'll want your value accessor to be a function. (Maybe you can come up with a new template floating point representation that splits the exponent and mantissa, though, and then you're as well off as with the double type... have fun :P)
Metaprogramming code is hard to format in a legible way. Furthermore, by its very nature, it's rather tough to read. Even an expert is going to have a tough time analyzing a piece of TMP code they didn't write even when it's rather simple.
If an intern or anyone under senior level even THINKS about just looking at TMP code their head explodes. Although, sometimes senior devs blow up louder because they're freaking out at new stuff (making your boss feel incompetent can have serious repercussions even though it shouldn't).
All of that said...templates are a Turing-complete language. You can do "anything" with them...and by anything we mean anything that doesn't require some sort of external ability like system access (because you just can't make the compiler spawn new threads for example). You can build your table. The question you'll need to then answer is whether you actually want to.
Why not have separate programs? One that generates the table and stores it in a file, and one that loads the file and runs the simulation on it. That way, when you need to tweak the equations that generate the table, you only need to recompile that program.
If your table was a bunch of ints, then yes, you could. Maybe. But what you certainly couldn't do is generate doubles at compile-time.
More importantly, I think that a plain double[][] would be better than a vector of vectors here: you're paying for a LOT of dynamic allocation for a statically sized table.

How to handle changing data structures on program version update?

I do embedded software, but this isn't really an embedded question, I guess. I don't (can't for technical reasons) use a database like MySQL, just C or C++ structs.
Is there a generic philosophy of how to handle changes in the layout of these structs from version to version of the program?
Let's take an address book. From program version x to x+1, what if:
a field is deleted (seems simple enough) or added (ok if all can use some new default)?
a string gets longer or shorter? An int goes from 8 to 16 bits of signed / unsigned?
maybe I combine surname/forename, or split name into two fields?
These are just some simple examples; I am not looking for answers to those, but rather for a generic solution.
Obviously I need some hard coded logic to take care of each change.
What if someone doesn't upgrade from version x to x+1, but waits for x+2? Should I try to combine the changes, or just apply x -> x+1 followed by x+1 -> x+2?
What if version x+1 is buggy and we need to roll-back to a previous version of the s/w, but have already "upgraded" the data structures?
I am leaning towards TLV (http://en.wikipedia.org/wiki/Type-length-value) but can see a lot of potential headaches.
This is nothing new, so I just wondered how others do it....
I do have some code where a longer string is pieced together from two shorter segments if necessary. Yuck. Here's my experience after 12 years of keeping some data compatible:
Define your goals - there are two:
new versions should be able to read what old versions write
old versions should be able to read what new versions write (harder)
Add version support to release 0 - at the very least, write a version header. Together with keeping (potentially a lot of) old reader code around, that can solve the first case primitively. If you don't want to implement case 2, start rejecting newer data right now!
If you only need case 1, and the expected changes over time are rather minor, you are set. Either way, these two things done before the first release can save you many headaches later.
Convert during serialization - at run time, keep the data in memory only in the "new" format. Do the necessary conversions and tests at the persistence boundary (convert to the newest format when reading, implement backward compatibility when writing). This isolates version problems in one place, helping to avoid hard-to-track-down bugs.
Keep a set of test data from all versions around.
Store a subset of available types - limit the actually serialized data to a few data types, such as int, string, double. In most cases, the extra storage size is made up for by the reduced code size needed to support changes in these types. (That's not always a tradeoff you can make on an embedded system, though.)
For example, don't store integers shorter than the native width (you might need to make an exception when you have to store long arrays of integers).
Add a breaker - store some key that allows you to intentionally make old code display an error message saying that this new data is incompatible. You can use a string that is part of the error message; that way your old version can display an error message it doesn't know about yet: "you can import this data using the ConvertX tool from our web site" is not great in a localized application, but still better than "Ungültiges Format".
Don't serialize structs directly - that's the logical/physical separation. We work with a mix of the two approaches below, both having their pros and cons. Neither can be implemented without some runtime overhead, which can pretty much limit your choices in an embedded environment. At any rate, don't use fixed array/string lengths during persistence; that alone should solve half of your troubles.
(A) A proper serialization mechanism - we use a binary serializer that allows you to start a "chunk" when storing, which has its own length header. When reading, extra data is skipped and missing data is default-initialized (which simplifies implementing "read old data" a lot in the serialization code). Chunks can be nested. That's all you need on the physical side, but it needs some sugar-coating for common tasks. A sketch of the idea follows after (B).
(B) Use a different in-memory representation - the in-memory representation could basically be a map<id, record>, where id would likely be an integer, and a record could be:
empty (not stored)
a primitive type (string, integer, double - the fewer you use, the easier it gets)
an array of primitive types
an array of records
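To illustrate (A), here is a minimal sketch of the length-header idea (my illustration, not our actual serializer):

#include <cstdint>
#include <istream>
#include <ostream>
#include <vector>

// Every chunk is written as [uint32 length][payload]. A reader parses the
// fields it knows, skips any trailing bytes it doesn't understand (newer
// writer), and default-initializes fields the payload is too short to
// contain (older writer).

void write_chunk(std::ostream& os, const std::vector<char>& payload)
{
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof(len));   // length header
    os.write(payload.data(), payload.size());
}

std::vector<char> read_chunk(std::istream& is)
{
    std::uint32_t len = 0;
    if (!is.read(reinterpret_cast<char*>(&len), sizeof(len)))
        return {};                                // end of stream: no more chunks
    std::vector<char> payload(len);
    is.read(payload.data(), len);
    return payload;
}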
I initially wrote that map-based representation (B) so the guys wouldn't come to me with every format compatibility question, and while the implementation has many shortcomings (I wish I'd recognized the problem back then with the clarity of today...), it could solve most of these problems.
Querying a non-existing value will by default return a default/zero-initialized value. If you keep that in mind when accessing the data and when adding new data, it helps a lot: imagine version 1 calculated "foo length" automatically, whereas in version 2 the user can override that setting. A value of zero in the "calculation type" or "length" should then mean "calculate automatically", and you are set.
The following are "change" scenarios you can expect:
a flag (yes/no) is extended to an enum ("yes/no/auto")
a setting splits up into two settings (e.g. "add border" could be split into "add border on even days" / "add border on odd days".)
a setting is added, overriding (or worse, extending) an existing setting.
For implementing case 2, you also need to consider:
no value may ever be removed or replaced by another one (but in the new format, it could say "not supported", and a new item is added)
an enum may contain unknown values, and other valid ranges may change
Phew, that was a lot. But it's not as complicated as it seems.
There's a huge concept that the relational database people use.
It's called breaking the architecture into "Logical" and "Physical" layers.
Your structs are both a logical and a physical layer mashed together into a hard-to-change thing.
You want your program to depend on a logical layer. You want your logical layer to -- in turn -- map to physical storage. That allows you to make changes without breaking things.
You don't need to reinvent SQL to accomplish this.
If your data lives entirely in memory, then think about this. Divorce the physical file representation from the in-memory representation. Write the data in some "generic", flexible, easy-to-parse format (like JSON or YAML). This allows you to read in a generic format and build your highly version-specific in-memory structures.
If your data is synchronized onto a filesystem, you have more work to do. Again, look at the RDBMS design idea.
Don't code a simple brainless struct. Create a "record" which maps field names to field values - a linked list of name-value pairs. This is easily extensible to add new fields or change the data type of a value.
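A small sketch of such a record, using std::map and std::variant rather than a hand-rolled linked list (the field names are examples only):

#include <map>
#include <string>
#include <variant>

// One "record": field names mapped to field values instead of a fixed struct.
// Adding a field, or changing the type stored under a name, doesn't break the layout.
using FieldValue = std::variant<std::monostate, long long, double, std::string>;
using Record     = std::map<std::string, FieldValue>;

// A reader that doesn't know about a field never asks for it; a missing field
// simply comes back as an empty string here.
std::string get_string(const Record& r, const std::string& name)
{
    auto it = r.find(name);
    if (it == r.end()) return {};
    if (const auto* p = std::get_if<std::string>(&it->second)) return *p;
    return {};
}

Record make_contact()
{
    Record r;
    r["forename"] = std::string("Ada");
    r["surname"]  = std::string("Lovelace");
    r["phone"]    = std::string("+1 555 0100");
    return r;
}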
Some simple guidelines if you're talking about a structure used in a C API:
Have a structure size field at the start of the struct - this way, code using the struct can always ensure it's dealing only with valid data (for example, many of the structures the Windows API uses start with a cbSize field, so those APIs can handle calls made by code compiled against older SDKs, or even newer SDKs that have added fields); see the sketch after these guidelines.
Never remove a field. If you don't need to use it anymore, that's one thing, but to keep things sane for code that deals with an older version of the structure, don't remove the field.
It may be wise to include a version number field, but often the size field can be used for that purpose.
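Here is a sketch of the size-field pattern (the WidgetConfig structs are hypothetical; the point is that the size written by the caller tells the callee which fields actually exist):

#include <cstdint>
#include <cstring>

// Version 1 of a hypothetical API struct.
struct WidgetConfigV1 {
    std::uint32_t cbSize;         // caller sets this to sizeof(the struct it compiled against)
    std::uint32_t color;
};

// Version 2 only appends fields; existing fields keep their offsets.
struct WidgetConfigV2 {
    std::uint32_t cbSize;
    std::uint32_t color;
    std::uint32_t border_width;   // new in v2
};

void configure_widget(const void* cfg_raw)
{
    std::uint32_t cbSize = 0;
    std::memcpy(&cbSize, cfg_raw, sizeof(cbSize));   // the size field is always first

    WidgetConfigV2 cfg = {};                         // defaults for fields an old caller didn't pass
    std::memcpy(&cfg, cfg_raw,
                cbSize < sizeof(cfg) ? cbSize : sizeof(cfg));

    // cfg.color is valid for every caller; cfg.border_width stays 0 (the default)
    // when the caller was compiled against the v1 struct.
}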
Here's an example - I have a bootloader that looks for a structure at a fixed offset in a program image for information about that image that may have been flashed into the device.
The loader has been revised, and it supports additional items in the struct for some enhancements. However, an older program image might be flashed, and that older image uses the old struct format. Since the rules above were followed from the start, the newer loader is fully able to deal with that. That's the easy part.
And if the struct is revised further and a new image uses the new struct format on a device with an older loader, that loader will be able to deal with it, too - it just won't do anything with the enhancements. But since no fields have been (or will be) removed, the older loader will be able to do whatever it was designed to do and do it with the newer image that has a configuration structure with newer information.
If you're talking about an actual database that has metadata about the fields, etc., then these guidelines don't really apply.
What you're looking for is forward-compatible data structures. There are several ways to do this. Here is the low-level approach.
struct address_book
{
    unsigned int length;  // total length of this struct in bytes
    char items[0];        // zero-length array (compiler extension): the items start right here
};

where 'items' is a variable-length array of a structure that describes its own size and type:

struct item
{
    unsigned int size;    // how long data[] is
    unsigned int id;      // first name, phone number, picture, ...
    unsigned int type;    // string, integer, jpeg, ...
    char data[0];         // the field's payload follows this header
};
In your code, you iterate through these items (address_book->length will tell you when you've hit the end) with some intelligent casting. If you hit an item whose ID you don't know or whose type you don't know how to handle, you just skip it by jumping over that data (from item->size) and continue on to the next one. That way, if someone invents a new data field in the next version or deletes one, your code is able to handle it. Your code should be able to handle conversions that make sense (if employee ID went from integer to string, it should probably handle it as a string), but you'll find that those cases are pretty rare and can often be handled with common code.
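A sketch of that iteration over the structs above (the field IDs in the switch are made up):

void read_address_book(const struct address_book *book)
{
    const char *p   = book->items;
    const char *end = (const char *)book + book->length;

    while (p + sizeof(struct item) <= end) {
        const struct item *it = (const struct item *)p;

        switch (it->id) {
        case 1: /* first name: interpret it->data according to it->type */ break;
        case 2: /* phone number */                                         break;
        default: /* unknown field from another version: just skip it */    break;
        }
        p = it->data + it->size;   /* jump over this item's payload to the next header */
    }
}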
I have handled this in the past, in systems with very limited resources, by doing the translation on the PC as a part of the s/w upgrade process. Can you extract the old values, translate to the new values and then update the in-place db?
For a simplified embedded DB I usually don't reference any structs directly, but put a very lightweight API around the parameters. This allows you to change the physical structure below the API without impacting the higher-level application.
Lately I'm using bencoded data. It's the format that BitTorrent uses. It's simple, you can easily inspect it visually, so it's easier to debug than binary data, and it's tightly packed. I borrowed some code from the high-quality C++ libtorrent. For your problem it's as simple as checking that the fields exist when you read them back. And for a gzip-compressed file, writing is as simple as:
ogzstream os(meta_path_new.c_str(), ios_base::out | ios_base::trunc);
Bencode map(Bencode::TYPE_MAP);
map.insert_key("url", url.get());
map.insert_key("http", http_code);
os << map;
os.close();
To read it back:
igzstream is(metaf, ios_base::in | ios_base::binary);
is.exceptions(ios::eofbit | ios::failbit | ios::badbit);
try {
    torrent::Bencode b;
    is >> b;
    if (b.has_key("url"))
        d->url = b["url"].as_string();
} catch (...) {
    // a missing or malformed field simply leaves the default value in place
}
I have used Sun's XDR format in the past, but I prefer this now. It's also much easier to read from other languages such as Perl, Python, etc.
Embed a version number in the struct or, do as Win32 does and use a size parameter.
If the passed struct is not the latest version, then fix up the struct.
About 10 years ago I wrote a similar system to the above for a computer game's save-game system. I actually stored the class data in a separate class description file, and if I spotted a version number mismatch I could run through the class description file, locate the class and then upgrade the binary class data based on the description. This obviously required default values to be filled in for new class member entries. It worked really well, and it could also be used to auto-generate the .h and .cpp files.
I agree with S.Lott in that the best solution is to separate the physical and logical layers of what you are trying to do. You are essentially combining your interface and your implementation into one object/struct, and in doing so you are missing out on some of the power of abstraction.
However if you must use a single struct for this, there are a few things you can do to help make things easier.
1) Some sort of version number field is practically required. If your structure is changing, you will need an easy way to look at it and know how to interpret it. Along these same lines, it is sometimes useful to have the total length of the struct stored in a structure field somewhere.
2) If you want to retain backwards compatibility, you will want to remember that code will internally reference structure fields as offsets from the structure's base address (from the "front" of the structure). If you want to avoid breaking old code, make sure to add all new fields to the back of the structure and leave all existing fields intact (even if you don't use them). That way, old code will be able to access the structure (but will be oblivious to the extra data at the end) and new code will have access to all of the data.
3) Since your structure may be changing sizes, don't rely on sizeof(struct myStruct) to always return accurate results. If you follow #2 above, then you can see that you must assume that a structure may grow larger in the future. Calls to sizeof() are calculated once (at compile time). Using a "structure length" field allows you to make sure that when you (for example) memcpy the struct you are copying the entire structure, including any extra fields at the end that you aren't aware of.
4) Never delete or shrink fields; if you don't need them, leave them blank. Don't change the size of an existing field; if you need more space, create a new field as a "long version" of the old field. This can lead to data duplication problems, so make sure to give your structure a lot of thought and try to plan fields so that they will be large enough to accommodate growth.
5) Don't store strings in the struct unless you know that it is safe to limit them to some fixed length. Instead, store only a pointer or array index and create a string storage object to hold the variable-length string data. This also helps protect against a string buffer overflow overwriting the rest of your structure's data.
Several embedded projects I have worked on have used this method to modify structures without breaking backwards/forwards compatibility. It works, but it is far from the most efficient method. Before long, you end up wasting space with obsolete/abandoned structure fields, duplicate data, data that is stored piecemeal (first word here, second word over there), etc etc. If you are forced to work within an existing framework then this might work for you. However, abstracting away your physical data representation using an interface will be much more powerful/flexible and less frustrating (if you have the design freedom to use such a technique).
You may want to take a look at how the Boost Serialization library deals with this issue.