I am trying to use the NDbUnit. I have created seperate XSD for each table instead of one large XSD for complete database.
My tests run fine when I use only single XSD and singe xml read. However for a perticular test, I need to have data in two or three different (but related) tables. If I try to read more than one xsd and xml, then it throws exception.
Here is my code
[ClassInitialize()]
public static void MyClassInitialize(TestContext testContext)
{
IDbConnection connection = DbConnection.GetCurrentDbConnection();
_mySqlDatabase = new NDbUnit.Core.SqlClient.SqlDbUnitTest(connection);
_mySqlDatabase.ReadXmlSchema(#"Data\CompanyMaster.xsd");
_mySqlDatabase.ReadXml(#"Data\CompanyMaster.xml");
_mySqlDatabase.ReadXmlSchema(#"Data\License.xsd");
_mySqlDatabase.ReadXml(#"Data\License.xml");
_mySqlDatabase.ReadXmlSchema(#"Data\LicenseDetails.xsd");
_mySqlDatabase.ReadXml(#"Data\LicenseDetails.xml");
_mySqlDatabase.ReadXmlSchema(#"RelatedLicense.xsd");
_mySqlDatabase.ReadXml(#"Data\RelatedLicense.xml");
}
Here is the exception I get at the point where i try to read License.XSD as shown above
Class Initialization method
ESMS.UnitTest.CompanyManagerTest.MyClassInitialize
threw exception.
System.ArgumentException:
System.ArgumentException: Item has
already been added. Key in dictionary:
'EnableTableAdapterManager' Key being
added: 'EnableTableAdapterManager'.
I am not sure if this is the correct way of reading multiple XML,XSD with NDbUnit. I googled and Overflowed (i.e. searched stack overflow), but could not get any sensible direction. Could someone explain what is going wrong and how to correct?
This isn't how NDbUnit is intended to be used. There is no support for reading multiple XSD or XML files into a single test-scope. NDbUnit uses the information in the single XSD to analyze relationships (FKs, etc.) between your tables in order to be able to properly manipulate the tables during its CRUD operations and so the requirement is that the single XSD describe the entire scope of the tables that you want NDbUnit to manipulate during a test-run.
It might be possible to load multiple XML files (containing your test data) but this is not a tested/supported scenario. I'd be interested in understanding what usage scenario you have that would preclude having just one XML file with your needed test data.
But its definitely the case that only a single XSD file (containing the schema of one or more tables and their relationships, etc.) can be loaded at a time.
Hope this clears this up a bit.
Sbohlen showed me the way.
It's true that as thing stands as of now, loading of multiple XSD's is not supported.
However fortunately loading of multiple XMLs against single XSD is possible.
So what I did was created a single XSD and pulled in all related tables onto it. Then used the AppendXml sytanx available along side of ReadXml. This way I could load the required test data into multiple tables and my tests started getting passed.
This link would share more lights about the AppendXml http://code.google.com/p/ndbunit/issues/detail?id=27
Related
for a project, I am trying to create a web-app that, among other things, allows training of machine learning agents using python libraries such as Dedupe or TensorFlow. In cases such as Dedupe, I need to provide an interface for active learning, which I currently realize through jquery based ajax calls to a view that takes and sends the necessary training data.
The problem is that I need this agent object to stay alive throughout multiple view calls and be accessible by each individual call. I have tried realizing this via the built-in cache system using Memcached, but the serialization does not seem to keep all the info intact, and while I am technically able to restore the object from the cache, this appears to break the training algorithm.
Essentially, I want to keep the object alive within the application itself (rather than an external memory store) and be able to access it from another view, but I am at a bit of a loss of how to realize this.
If someone knows the proper technique to achieve this, I would be very grateful.
Thanks in advance!
To follow up with this question, I have since realized that the behavior shown seemed to have been an effect of trying to use the result of a method call from the object loaded from cache directly in the return properties of a view. Specifically, my code looked as follows:
#model is the object loaded from cache
#this returns the wrong object (same object as on an earlier call)
return JsonResponse({"pairs": model.uncertain_pairs()})
and was changed to the following
#model is the object loaded from cache
#this returns the correct object (calls and returns the model.uncertain_pairs() method properly)
uncertain = model.uncertain_pairs()
return JsonResponse({"pairs": uncertain})
I am unsure if this specifically happens due to an implementation from Dedupe or Django side or due to Python, but this has undoubtedly fixed the issue.
To return back to the question, Django does seem to be able to properly (de-)serialize objects and their properties in cache, as long as the cache is set up properly (see Apparent bug storing large keys in django memcached which I also had to deal with)
A repository pattern is there to abstract away the actual data source and I do see a lot of benefits in that, but a repository should not use IQueryable to prevent leaking DB information and it should always return domain objects, not DTO's or POCO's, and it is this last thing I have trouble with getting my head around.
If a repository pattern always has to return a domain object, doesn't that mean it fetches way too much data most of the times? Lets say it returns an employee domain object with forty properties and in the service and view layers consuming that object only five of those properties are actually used.
It means the database has fetched a lot of unnecessary data a pumped that across the network. Doing that with one object is hardly noticeable, but if millions of records are pushed across that way and a lot of of the data is thrown away every time, is that not considered bad behavior?
Yes, when adding or editing or deleting the object, you will use the entire object, but reading the entire object and pushing it to another layer which uses only a fraction of it is not utilizing the underline database and network in the most optimal way. What am I missing here?
There's nothing preventing you from having a separate read model (which could a separately stored projection of the domain or a query-time projection) and separating out the command and query concerns - CQRS.
If you then put something like GraphQL in front of your read side then the consumer can decide exactly what data they want from the full model down to individual field/property level.
Your commands still interact with the full domain model as before (except where it's a performance no-brainer to use set based operations).
How do I create a model dynamically upon uploading a csv file? I have done the part where it can read the csv file.
This doc explains very well how to dynamically create models at runtime in django. It also links to an example of doing so.
However, as you will see after looking at the document, it is quite complex and cumbersome to do this. I would not recommend doing this and believe it is quite likely you can determine a model ahead of time that is flexible enough to handle the CSV. This would be much better practice since dynamically changing the schema of your database as your application is running is a recipe for a ton of bugs in your code.
I understand that you want to create new schema's on the fly based on fields in the those in a CSV. While thats a valid use case and could be the absolute right call. I doubt it though - it lends itself to a data model for a single tenet SaaS application that could have goofy performance and migration issues.
I'd try using Mongo/ some other NoSQL solutions as others have mentioned. But a simpler approach may be a modified Star Schema implemented in SQL. In this case you create a dimensions tables that stores each header, then create an instance of each data element that has a foreign key to dimension and records the value of that dimension.
If you read the csv the psuedo code would look something like this:
for row in DictReader(file):
for k in row.keys():
try:
dim = Dimension.objects.get(name=k)
except:
dim = Dimension(name=k)
dim.save()
DimensionRecord(dimension=dim, value=row[k]
Obviously you could better handle reading the headers and error trapping if dimensions already exist, but this would be an example of how you could dynamically load variable headered CSV's into a SQL db.
Is is possible (without writing custom SQL) to have Eclipselink trust me as to whether to perform an update or insert on a merge, rather than perform a select, then an update or insert? If so, how?
In my mind I'd like to use a transient flag and a custom if statement to determine whether the item is already in the database or not, and instruct eclipselink to perform the query required. I understand Hibernate provides this as update() and save()
A few notable points:
I have a large amount of objects that are being batch-merged, as such
persist() is not suitable for me (also the objects do not exist in the
library except when they are passed in for the merge anyway)
Because there are so many objects being merged, it is unlikely that there will be any cache hits, so eclipselink has been unable to tell if its sent it in before via the cache
Because amount of objects going in (to a non-local database, in this case) the SELECTs are a problem, especially given I can tell which will be required before the operation occurs
I'd really rather not switch to Hibernate
Thanks. Perhaps I am missing something obvious!
What you are looking for is called existence checking in EclipseLink, and can be configured using the #ExistenceChecking annotation as described here:
http://eclipse.org/eclipselink/documentation/2.4/jpa/extensions/a_existencechecking.htm
Try specifying #ExistenceChecking(ExistenceType.CHECK_CACHE) as while it states that check_cache is the default,this is for Native EclipseLink projects. JPA projects use a Check_Database as the default to conform to the JPA specification requiring that merge calls merge into data from the database if necessary. Using the check_cache will prevent EclipseLink from querying at all, so you can query yourself based on your own criteria. Existing objects will be required to be in the cache though, otherwise there is nothing to merge into, and EclipseLink will have to perform an insert.
Another option is to use a customizer to define the DoesExistQuery used for each class. This could allow you to override the checkEarlyReturn method to perform as needed to determine existence.
The above options still use the JPA merge and so still require getting the existing data to merge into - so it will still require selects for existing objects not in the cache. If all you are after is an update all type statement that will update the object or insert the object as is, without tracking only what has changed, you might try looking at native EclipseLink functionality, such as the UnitOfWork api. Using something like ((EntityManagerImpl)em.getDelegate()).getUnitOfWork().updateObject(entity) or use the UOW execute your own UpdateObjectQuery would avoid the selects for existing objects, at the loss of only sending changes.
I got fixes for skipping select call before insert/update in eclipse link jpa
https://www.eclipse.org/eclipselink/documentation/2.4/jpa/extensions/a_existencechecking.htm
Here is a description of the scenario and I would appreciate also any comments on the approach used
The core of my application is a set of web services backed by a P2P database. One service accepts a simple XML-based record (I have designed a generic schema for it). The service processes this data (mainly creating keys based on certain criteria) and pass the original data along with the created keys to a listening SocketServer in one of the listening P2P nodes. This key,data pair is routed to the proper node, which stores the data (associated with the key as an ID) in an XML database.
A second service accepts a query document that is structured based on the same schema, but with optional values that would be used for searching and matching from the previously stored ones. So the second service would pass this query (with the proper keys) to the P2P part, get back the results and pass them back to the service client.
E.g. if the original record submitted to the first service was < attr1 >value1 < /attr1 > < attr2 > value2 < / attr2 > (attribute list along with some other metadata mandated by the schema) then the second service should retrieve that record if the query received was < attr2 >value2 < / attr2 >
(I could later think about using more complex XPath or XQuery queries as the underlying XML database allows instead of exact matches for values here but that is not important at this stage. there is also a third service I am working on but it depends on getting the first two in proper shape first)
So my questions are:
1) What data type should I use as the parameters of the web services? How to utilize my schema for this usage? I was considering various XML binding frameworks (especially JAXB and SDO) for this but didn't know how to proceed.
2) How can I enhance the two services (call them store and search) to use dynamically created templates based on the original generic schema? The service would still accept documents of the main schema type but has the inner attribute list based on a template say template1 only requires whose values are ints while template2 require (float) and (string). The current JSP-based prototype manually creates this template but as an XML document that is assembled by hand (<>tags dispersed in text) and there is no type checking what so ever so I thought I could do better!
3) Is it possible to generate a quick web app prototype for simple access to this system (again by using the schema (&templates) to edit the appropriate XML message structures? What I am looking for is for the (human) user to choose a template and then just "fill in the blanks" and submit, no need for any fancy look and feel.
4) Can I or how can I also use this XML message type for communicating across sockets?
5) Does it matter if I deploy the services as stateless EJBs or not? Do I need them to be EJBs or servlets would be more than enough?
I currently have a rudimentary implementation (from previous developers) that were meant for a subset of my current requirements (I am improving on the services and adding new derived ones) but there was no schema nor validation and the data is passed all along as basic strings, thus providing weak typing and difficult to update manual parsing. The reason I want to update this to a stronger bound typing is to introduce changes in the data schema that would be passed along the whole system easily. Basically I want the system to be as less coupled to the data format/schema used as possible; the current prototype is too coupled to the data that I am finding it extremely difficult to change the data without breaking the system.
My initial investigation led me to consider JAXB but it supports only static typing (cannot create a schema/types dynamically at runtime that I want to persist for later usage). So I came across SDO which has both dynamic and static typing. The problem is just that there is not enough community and/or examples of using this approach so it seems risky (the examples of Apache Tuscany and Eclipselink implementations are very scarce and I could not find complete examples that are not 5+ years old (like this http://www.ibm.com/developerworks/java/library/j-sdo/) and also tackles the XML use case of SDO (most seem to focus on the relational usage of SDO).
This is my first time asking for programming help (here and elsewhere) so please bear with me. I searched a lot on the net but I could not find anything useful but pieces here and there that did not add up.
Any comment or hint is really appreciated.
trfndr
EDIT
I forgot one thing: how would the search service get back the results? Since it is opening a client socket connection, there is no way to get back any results synchronously. The current implementation tackles this by having the service client opening a listening socket on a random port and putting this contact info in the query document. After the search web service sends the query to the p2p part it finishes. The p2p sends the results as a WS call to another service which sends them back to the service client socket. I don't like this approach much, is there any more elegant solution?
I lead the EclipseLink JAXB & SDO implementations and represent Oracle on those specifications so hopefully I can help you out. This question is very similar to talk I'm giving at JavaOne in September.
1) What data type should I use as the
parameters of the web services? How to
utilize my schema for this usage? I
was considering various XML binding
frameworks (especially JAXB and SDO)
for this but didn't know how to
proceed.
This depend's on what web service framework you are using. JAXB is much easier to use with JAX-WS, and while JAXB is still easier to use with JAX-RS SDO, is a possible alternative.
2) How can I enhance the two services
(call them store and search) to use
dynamically created templates based on
the original generic schema? The
service would still accept documents
of the main schema type but has the
inner attribute list based on a
template say template1 only requires
whose values are ints while template2
require (float) and (string). The
current JSP-based prototype manually
creates this template but as an XML
document that is assembled by hand
(<>tags dispersed in text) and there
is no type checking what so ever so I
thought I could do better!
I'm not 100% what you mean here, but the following may be helpful:
Using #XmlAnyElement to Build a Generic Message
3) Is it possible to generate a quick
web app prototype for simple access to
this system (again by using the schema
(&templates) to edit the appropriate
XML message structures? What I am
looking for is for the (human) user to
choose a template and then just "fill
in the blanks" and submit, no need for
any fancy look and feel.
JAX-RS is a nice framework for creating quick prototypes. Below is an example I created:
Part 1 - The Database
Part 2 - Mapping the Database to Objects
Part 3 - Mapping the Objects to XML
Part 4 - The RESTful Service
Part 5 - The Client
4) Can I or how can I also use this
XML message type for communicating
across sockets?
I prefer frameworks like JAX-RS that communicate over the HTTP protocol.
5) Does it matter if I deploy the
services as stateless EJBs or not? Do
I need them to be EJBs or servlets
would be more than enough?
My preference is to use an EJB session bean for the service. If you are interacting with a database then you can leverage the Java Transaction API (JTA) to manage your database transactions.
Part 4 - The RESTful Service
SDO
EclipseLink is the SDO 2.1.1 (JSR-235) reference implementation. We have some examples posted below. If you are looking how to do something specific, I will try to post a relevant example.
http://wiki.eclipse.org/EclipseLink/Examples/SDO
JAXB
JAXB is static. It is also more popular than SDO. Recognizing this in EclipseLink we have implemented a dynamic JAXB feature. It gives you the dynamic aspect of SDO with a JAXB slant.
http://wiki.eclipse.org/EclipseLink/Examples/MOXy/Dynamic
EDIT #1
Since you are dealing with JAX-WS and your model is almost entirely dynamic, I think you should skip the JAXB binding altogether. In the following link see the section "Switching Off Data Binding"
http://java.sun.com/developer/technicalArticles/xml/jaxrpcpatterns3/
This will give us the body of the message as a javax.xml.transform.Source object. We will need to process the XML based on the dynamic templates. SDO would be a good choice here. You can constantly add new types to the HelperContext using XML schemas.
helperContext.getXSDHelper().define(schema1, null);
helperContext.getXSDHelper().define(schema2, null);
You wil be able to unmarshal the Source from the web service as follows:
XMLDocument doc = helperContext.getXMLHelper().load(source, null, null);
DataObject rootDataObject = doc.getRootObject();
String someValue = rootDataObject.getString("attr3/childAttr/anotherChildAttr");
You will also be able to use the XMLHelper to marshal your objects to XML when calling another service.