Using great expectations with streamed data

Using great expectations with streamed data - great-expectations

I am using great expectations to test streaming data (I collect a sample into a batch and test the batch). The issue is I cannot use the docs because this will results in 100 of 1000s of html pages being generated. What I would like to do is use my api to generate the page requested from the json result when the specific test results are clicked on (via the index page). Is great expectations able to generate only 1 html which can be disposed of when it is closed?

If you are using a ValidationOperator / Checkpoint, then using the UpdateDataDocsAction action supports only building the resources that were validated in that run, and is the recommended approach.
If you are interacting directly with the DataContext API, then the build_data_docs method on DataContext supports a resource identifier option that you can use to request only a single asset is built. I think to get the behavior you're looking for (a truly ephemeral build of just that page), you'd want to pair that with a site configuration for a site in a temporary location, e.g. /tmp.
The docs on the build_data_docs method are here:
https://docs.greatexpectations.io/en/latest/autoapi/great_expectations/data_context/data_context/index.html#great_expectations.data_context.data_context.BaseDataContext.build_data_docs
Note that the resource_identifiers parameter requires, e.g. a ValidationResultIdentifier object, such as:
context.build_data_docs("local_site", resource_identifiers=[ValidationResultIdentifier(
run_id="20201203T182816.362147Z",
expectation_suite_identifier=ExpectationSuiteIdentifier("foo"),
batch_identifier="b739515cf1c461d67b4e56d27f3bfd02",
)])

Related

what's the difference between a collection and a store in REST?

I'm trying to wrap my head around the difference between a "collection" and a "store" in REST. From what I've read so far,
a collection is:
"a server-managed directory of resources"
and a store is a:
"client-managed resource repository"
I found this post: How "store" REST archetype isn't creating a new resource and a new URI?
but it didn't really help me clarify the difference. I mean, I understand one is controlled by the server and the other by the client... but can someone give me a concrete example of what a store might be in a real world application?
I *think it's something like this:
GET http://myrestapplication.com/widgets/{widget_id} -- retrieves a widget from db
POST http://myrestapplication.com/widgets/{widget_id} -- creates a new widget from db
PUT http://myrestapplication.com/widgets/{widget_id},[list of updated parms & their vals] -- update widget
PUT http://myrestapplication.com/users/johndoe/mywishlist/{widget_id} -- updates john doe's profile to add a widget that already exists in the database... but links to it as his favorite one or one that he wants to buy
Is this correct?
if so, could the last PUT also be expressed as a POST somehow?
EDIT 1
I found an online link to the book i'm reading... where it makes the distinction between the two:
https://books.google.ca/books?id=4lZcsRwXo6MC&pg=PA16&lpg=PA16&dq=A+store+is+a+client-managed+resource+repository.+A+store+resource+lets+an+API+client:+put+resources+in,+get+them+back+out,+and+decide+when+to+delete+them&source=bl&ots=F4CkbFkweL&sig=H6eKZMPR_jQdeBZkBL1h6hVkK_E&hl=en&sa=X&ei=BB-vVJX6HYWvyQTByYHIAg&ved=0CB0Q6AEwAA#v=onepage&q=A%20store%20is%20a%20client-managed%20resource%20repository.%20A%20store%20resource%20lets%20an%20API%20client%3A%20put%20resources%20in%2C%20get%20them%20back%20out%2C%20and%20decide%20when%20to%20delete%20them&f=false

REST uses http verbs to manipulate resources. Full-Stop. That's it. To build some classes of browser-based application developers sometimes use local storage (a store), but that has absolutely nothing to do with REST (in fact, it's the opposite). Collections are a special consideration in REST-based API design because the REST principles place considerable constraints on how they are represented in the results of your queries -- special consideration also because there are no standards on how these things should be represented and access if you're using anything other than html as a resource type.
Edit:
REST suggests that when we ask for a resource we receive that resource and only that resource and that things referenced by that resource are returned as links, not as data. This mimics the http standard by which we return the requested page and links to other pages rather than embedding linked pages. So, our resources should return links to related resources, not the resources themselves.
So, what about collections?
Let's use as an example a college management system that has Course objects each of which contains a huge lists of Students.
When I GET the course I don't want to have the collection of students returned as an embedded list, because that could be huge and because my user might not be interested. Instead, I want to know that the course has a students collection and I want to be able to query that collection separately (when I need to) and I want to be able to page it on demand. For this to work, the course needs to link to the students collection URL (maybe with an appropriate type so that my code knows how to handle the link). Then, I want to use the given collection's url to request a paged list of resources. In this example, the collection's url could be something like: course/1/students, with the convention that I can add paging info to the search string to constrain the results with something like course/1/students?page=1&count=10. Embedding the students collection into the course resource would be a violation of REST. I would not be returning a course, I'd be returning course-and-students.

CFInclude vs Custom Tag vs CFC for Presentation and Security

I'm just starting out with ColdFusion OOP and I am wanting to make a DIV which shows different links to users depending on what page they are on and what login rights (role) they have. Basically a 'context' menu.
Should I put this toolbar/navigation DIV in a .cfm or .cfc file?
To reiterate; The cfm or cfc file needs to know what page the user is on and will also check what role they have. Depending on these two pieces of information it will display a set of links to the user. The role information comes from the database and stored in a SESSION variable, and to find out what page they are on I guess it could use #GetFileFromPath(GetBaseTemplatePath())#.
My first thought was to have a normal .cfm file, put all the presentation and logic in that file (the HTML and lots of <cfif> statements) to ensure the correct information is displayed in the DIV, and then use <cfinclude> to display it on the page. Then I started thinking maybe I should make a Custom Tag and ask the calling page to pass in the user's credentials and the #GetFileFromPath(GetBaseTemplatePath())# as arguments and then have that Custom Tag return all the presentational data.
Finally I guess a CFC could do the above as well, but I'd be breaking the 'rule' of having presentational and logic data in a CFC.
Any suggestions on the best practice to achieve what I'm trying to do? It will eventually serve thousands of customers so I need to make sure my solution is easy to scale.

Anything that outputs HTML to the screen should be in a .cfm file.
That being said, depending on your need, you could have methods in a CFC that generate HTML, but the method simply returns the HTML as a string.
In programming, there are very few absolutes, but here is one: You should NEVER directly output anything inside of a function or method by using output="true". Instead, whatever content is generated, it should be returned from the method.
If you will have a need to use this display element more than once, a custom tag might be the best way to go rather than an include.

I see security as being a combination of what menu items I can see and what pages can be ran.
The main security function is inside of the main session object
On the menus
I call a function called
if (session.objState.checkSecurity(Section, Item) == 1)
then ...
For page security
function setupRequest() {
...
if (session.objState.checkSecurity(getSection(), getItem()) == 0) {
location("#request.self#?message=LoginExpired", "no");
return;
}
...
}
The particulars of what checkSecurity can do varies from application to application, but it is tied into how FW/1 works. The following security variations exist:
session.objState.checkSecurity(getSection())
session.objState.checkSecurity(getSection(), getItem())
session.objState.checkSecurity(getSection(), getItem(), Identifier)
None of the presentation files know anything about security.

Rules by which I live:) :
No CF business logic in CFM files. Just use some service which will serve template and provide needed data.
navService = com.foobar.services.Navigation(form, url);
and later output #navService.GetNavConent()#
No direct output from CFC files, functions should always return content. For example, make one function which makes one link based on some logic, second which wraps that and returns to cfm template.
Also one more hint, avoid using application and session scopes in your services.
This makes refactoring, testing and debugging too difficult.
For session you can make session.currentUser , CurrentUser.cfc which provides all things you need. e.g. session.currentUser.isAuthorized("backend/administration") and if true, show link to backend/administration.
Same for application, if you need locale, applicaiton wide setting or some singleton, make application.applicationSettings, ApplicationSettings.cfc and use that to retrieve all info you need in cfc's.
These rules will make your application to be easier to test and debug, and really easy to migrate tomorrow on some javascript based UI like Angular or backbone.js since all th edata you need is already in CFC and theoretically you just need to put remote in CFC or make some remote facade in the middle and you're done.

Joomla 2.5 ― using administrator components controllers in frontent part of component

how can I use the controllers created at
/administrator/components/com_mycom/controllers/*
in
/components/com_mycom/mycom.php
In detail:
I have a »log« controller with an »add« method, and I would like to use this from the frontend. I one is not logged in in the backend the task is not executed and a 500 error rises. So just would like to include the backend controllerpath in the frontend, so that JController::getInstance( 'Mycom' ) still works.
Greetings…
EDIT:
After a long time of searching I could find a more or less undocumented Parameter of the:
JController::getInstance() method, namely the second one: $config = array(). Going through the source code I found out that there is one key of the »config-array« that is of interest, which is: »base_path«.
The call of:
JController:getInstance( 'Mycom, array('base_path' =>JPATH_ADMINISTRATOR.DS.'components'.DS.'com_mycom')' );
always delivers the backend controller and one can use them safely in the frontend, BUT one must take care that then also the views are taken from the backend side of the component. In my case, I just use it to make ajax-calls so it does not matter, but one needs to be careful with using this method when planning to create »frontend views« with »backend controller«.
Greetings…

I had recently a similar problem where I wanted to use the whole CRUD system form back-end also in front-end.
This is the method that worked for me (and I am not saying that this is recommended or best practice):
I've just modeled the folders / file structure from backend. PHP files contained something like:
require_once JPATH_ADMINISTRATOR . '/components/com_mycom/controllers/log.php';

(Java) How can I pass Schema-validated XML documents as parameters between distributed components (e.g. web services or sockets)?

Here is a description of the scenario and I would appreciate also any comments on the approach used
The core of my application is a set of web services backed by a P2P database. One service accepts a simple XML-based record (I have designed a generic schema for it). The service processes this data (mainly creating keys based on certain criteria) and pass the original data along with the created keys to a listening SocketServer in one of the listening P2P nodes. This key,data pair is routed to the proper node, which stores the data (associated with the key as an ID) in an XML database.
A second service accepts a query document that is structured based on the same schema, but with optional values that would be used for searching and matching from the previously stored ones. So the second service would pass this query (with the proper keys) to the P2P part, get back the results and pass them back to the service client.
E.g. if the original record submitted to the first service was < attr1 >value1 < /attr1 > < attr2 > value2 < / attr2 > (attribute list along with some other metadata mandated by the schema) then the second service should retrieve that record if the query received was < attr2 >value2 < / attr2 >
(I could later think about using more complex XPath or XQuery queries as the underlying XML database allows instead of exact matches for values here but that is not important at this stage. there is also a third service I am working on but it depends on getting the first two in proper shape first)
So my questions are:
1) What data type should I use as the parameters of the web services? How to utilize my schema for this usage? I was considering various XML binding frameworks (especially JAXB and SDO) for this but didn't know how to proceed.
2) How can I enhance the two services (call them store and search) to use dynamically created templates based on the original generic schema? The service would still accept documents of the main schema type but has the inner attribute list based on a template say template1 only requires whose values are ints while template2 require (float) and (string). The current JSP-based prototype manually creates this template but as an XML document that is assembled by hand (<>tags dispersed in text) and there is no type checking what so ever so I thought I could do better!
3) Is it possible to generate a quick web app prototype for simple access to this system (again by using the schema (&templates) to edit the appropriate XML message structures? What I am looking for is for the (human) user to choose a template and then just "fill in the blanks" and submit, no need for any fancy look and feel.
4) Can I or how can I also use this XML message type for communicating across sockets?
5) Does it matter if I deploy the services as stateless EJBs or not? Do I need them to be EJBs or servlets would be more than enough?
I currently have a rudimentary implementation (from previous developers) that were meant for a subset of my current requirements (I am improving on the services and adding new derived ones) but there was no schema nor validation and the data is passed all along as basic strings, thus providing weak typing and difficult to update manual parsing. The reason I want to update this to a stronger bound typing is to introduce changes in the data schema that would be passed along the whole system easily. Basically I want the system to be as less coupled to the data format/schema used as possible; the current prototype is too coupled to the data that I am finding it extremely difficult to change the data without breaking the system.
My initial investigation led me to consider JAXB but it supports only static typing (cannot create a schema/types dynamically at runtime that I want to persist for later usage). So I came across SDO which has both dynamic and static typing. The problem is just that there is not enough community and/or examples of using this approach so it seems risky (the examples of Apache Tuscany and Eclipselink implementations are very scarce and I could not find complete examples that are not 5+ years old (like this http://www.ibm.com/developerworks/java/library/j-sdo/) and also tackles the XML use case of SDO (most seem to focus on the relational usage of SDO).
This is my first time asking for programming help (here and elsewhere) so please bear with me. I searched a lot on the net but I could not find anything useful but pieces here and there that did not add up.
Any comment or hint is really appreciated.
trfndr
EDIT
I forgot one thing: how would the search service get back the results? Since it is opening a client socket connection, there is no way to get back any results synchronously. The current implementation tackles this by having the service client opening a listening socket on a random port and putting this contact info in the query document. After the search web service sends the query to the p2p part it finishes. The p2p sends the results as a WS call to another service which sends them back to the service client socket. I don't like this approach much, is there any more elegant solution?

I lead the EclipseLink JAXB & SDO implementations and represent Oracle on those specifications so hopefully I can help you out. This question is very similar to talk I'm giving at JavaOne in September.
1) What data type should I use as the
parameters of the web services? How to
utilize my schema for this usage? I
was considering various XML binding
frameworks (especially JAXB and SDO)
for this but didn't know how to
proceed.
This depend's on what web service framework you are using. JAXB is much easier to use with JAX-WS, and while JAXB is still easier to use with JAX-RS SDO, is a possible alternative.
2) How can I enhance the two services
(call them store and search) to use
dynamically created templates based on
the original generic schema? The
service would still accept documents
of the main schema type but has the
inner attribute list based on a
template say template1 only requires
whose values are ints while template2
require (float) and (string). The
current JSP-based prototype manually
creates this template but as an XML
document that is assembled by hand
(<>tags dispersed in text) and there
is no type checking what so ever so I
thought I could do better!
I'm not 100% what you mean here, but the following may be helpful:
Using #XmlAnyElement to Build a Generic Message
3) Is it possible to generate a quick
web app prototype for simple access to
this system (again by using the schema
(&templates) to edit the appropriate
XML message structures? What I am
looking for is for the (human) user to
choose a template and then just "fill
in the blanks" and submit, no need for
any fancy look and feel.
JAX-RS is a nice framework for creating quick prototypes. Below is an example I created:
Part 1 - The Database
Part 2 - Mapping the Database to Objects
Part 3 - Mapping the Objects to XML
Part 4 - The RESTful Service
Part 5 - The Client
4) Can I or how can I also use this
XML message type for communicating
across sockets?
I prefer frameworks like JAX-RS that communicate over the HTTP protocol.
5) Does it matter if I deploy the
services as stateless EJBs or not? Do
I need them to be EJBs or servlets
would be more than enough?
My preference is to use an EJB session bean for the service. If you are interacting with a database then you can leverage the Java Transaction API (JTA) to manage your database transactions.
Part 4 - The RESTful Service
SDO
EclipseLink is the SDO 2.1.1 (JSR-235) reference implementation. We have some examples posted below. If you are looking how to do something specific, I will try to post a relevant example.
http://wiki.eclipse.org/EclipseLink/Examples/SDO
JAXB
JAXB is static. It is also more popular than SDO. Recognizing this in EclipseLink we have implemented a dynamic JAXB feature. It gives you the dynamic aspect of SDO with a JAXB slant.
http://wiki.eclipse.org/EclipseLink/Examples/MOXy/Dynamic
EDIT #1
Since you are dealing with JAX-WS and your model is almost entirely dynamic, I think you should skip the JAXB binding altogether. In the following link see the section "Switching Off Data Binding"
http://java.sun.com/developer/technicalArticles/xml/jaxrpcpatterns3/
This will give us the body of the message as a javax.xml.transform.Source object. We will need to process the XML based on the dynamic templates. SDO would be a good choice here. You can constantly add new types to the HelperContext using XML schemas.
helperContext.getXSDHelper().define(schema1, null);
helperContext.getXSDHelper().define(schema2, null);
You wil be able to unmarshal the Source from the web service as follows:
XMLDocument doc = helperContext.getXMLHelper().load(source, null, null);
DataObject rootDataObject = doc.getRootObject();
String someValue = rootDataObject.getString("attr3/childAttr/anotherChildAttr");
You will also be able to use the XMLHelper to marshal your objects to XML when calling another service.

How to solve two REST problems: the interface document; loss of privacy in descriptive URLs

Coming from a lot of frustrating times with WSDL/Soap, I very much like the REST paradigm, but am trying to solve two basic problems in our application, before moving over to REST. The first problem relates to the lack of an interface document. I think I finally see how to handle this situation: One can query his way down from a top-level "/resources" resource using various requests of GET, HEAD, and OPTIONS to find the one needed resource in the correct hypermedia format. Is this the idea? If so, the client need only be provided with a top-level resource URI: http://www.mywebservicesite.com/mywebservice/resources. He will then have to do some searching and possible keep track of what he is discovering, so that he can use the URIs again efficiently in future to do GETs, POSTs, PUTs, and DELETEs. Are there any thoughts on what should happen here?
The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber. We do have an implementation of opaque URLs we use in the context of a session, and I'm wondering how opaque URLs might be applied to REST. The general problem is how to keep domain-specific details out of URLs, and still benefit from what REST has to offer.

The other problem is that we cannot use descriptive URLs like /resources/../customer/Madonna/phonenumber.
I think you've misunderstood the point of opaque URIs. The notion of opaque URIs is with respect to clients: A client shall not decipher a URI to guess anything of semantic meaning from it. So a service may well have URIs like /resources/.../customer/Madonna/phonenumber, and that's quite a good idea. The URIs should be treated as opaque by clients: not infer from the URI that it represents Madonna's phone number, and that Madonna is a customer of some sort. That knowledge can only be obtained by looking inside the URI itself, or perhaps by remembering where the URI was discovered.
Edit:
A consequence of this is that navigation should happen by links, not by deconstructing the URI. So if you see /resouces/customer/Madonna/phonenumber (and it actually represents Customer Madonna's phone number) you should have links in that resource to point to the Madonna resource: e.g.
{
"phone_number" : "01-234-56",
"customer_URI": "/resources/customer/Madonna"
}
That's the only way to navigate from a phone number resource to a customer resource. An important aspect is that the server implementation might or might not have domain specific information in the URI, The Madonna record might just as well live somewhere else: /resources/customers/byid/81496237. This is why clients should treat URIs as opaque.
Edit 2:
Another question you have (in the comments) is then how a client, with the required no knowledge of the server's URIs is supposed to be able to find anything. Clients have the following possibilities to find resources:
Provide a search interface. This could be done by providing an OpenSearch description document, which tells clients how to search for items. An OpenSearch template can include several variables, and several endpoints, depending on what you're looking for. So if you have a "customer ID" that's unique, you could have the following template: /customers/byid/{proprietary:customerid}", the customerid element needs to be documented somewhere, inside the proprietary namespace. A client can then know how to use such a template.
Provide a custom form. This implies making a custom media type in which you explicitly define how (based on an instance of the document) a URI to a customer can be forged. <customers template="/customers/byid/{id}"/>. The documentation (for the media type) would have to state that the template attribute must be interpreted as a relative URI after the string substitution "{id}" to an actual customer ID.
Provide links to all resources. Some resources aren't innumerable, so you can simply make a link to each and every one of them, optionally including identifying information along with the links. This could also be done in a custom media type: <customer id="12345" href="/customer/byid/12345"/>.
It should be noted that #1 and #2 are two ways of saying the same thing: Clients are allowed to create URIs if they
haven't got the URI structure a priori
a media type exists for which the documentation states that URIs should be created
This is much the same way as a web browser has no idea of any URI structure on the web, except for the rules laid out in the definition of HTML forms, to add a ? and then all the query parameters separated by &.
In theory, if you have a customer with id 12345, then you could actually dispense with the href, since you could plug the customer id 12345 into #1 or #2. It's more common to actually provide real links between resources, rather than always relying on lookup or search techniques.

I haven't really used web RPC systems (WSDL/Soap), but i think the 'interface document' is there mostly to allow client libraries to create the service API, right? if so, REST shouldn't need it, because the verbs are already defined and don't really need to be documented again.
AFAIUI, the REST way is to document the structure of each resource (usually encoded in XML or JSON). In that document, you'll also have to document the relationship between those resources. In my case, a resource is often a container of other resources (sometimes more than one type), therefore the structure doc specifies what field holds a list of URLs pointing to the contained resources. Ideally, only one unique resource will need a single, fixed (documented) URL. everithing else follows from there.
The URL 'style' is meaningless to the client, since it shouldn't 'construct' an URL. Every URL it needs should be already constructed on a resource field. That let's you change the URL structure without changing the client (that has saved tons of time to me). Your URLs can be as opaque or as descriptive as you like. (personally, i don't like text keys or slugs; my keys are all BIGINTs or UUIDs)

I am currently building a REST "agent" that addresses the first part of your question. The agent offers a temporary bookmarking service. The client code that is interacting with the agent can request that an URL be bookmarked using some identifier. If the client code needs to retrieve that representation again, it simply asks the agent for the url that corresponds to the saved bookmark and then navigates to that bookmark. Currently those bookmarks are not persisted so they only last for the lifetime of the client application, but I have found it a useful mechanism for accessing commonly used resources. E.g. The root representation provides a login link. I bookmark that link and if the client ever receives a 401 then I can redirect to the "login" bookmark.
To address an issue you mentioned in a comment, the agent also has the ability to store retrieved representations in a dictionary. If it becomes necessary to aggregate and manipulate multiple representations at the same time then I can simply request that the agent store the current representation in a dictionary associated to a key and then continue navigating to the next resource. Once the client has accumulated all the necessary representation it can do what it needs to do.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js