I am looking for a tool to generate documentation from a WSDL file. I have found wsdl-viewer.xsl, which seems promising. See https://code.google.com/p/wsdl-viewer/
However, I am seeing a limitation where references to complex data types are not mapped out (or explained). For example, say we have a createSnapshot() operation that creates a snapshot and returns an object representing a snapshot. Running xsltproc(1) with wsdl-viewer.xsl, the rendered documentation has an output section that describes the output as
Output: createSnapshotOut
parameter type createSnapshotResponse
snapshots type Snapshot
I'd like to be able to click on "Snapshot" and see the schema definition of Snapshot.
Is this possible? Perhaps I am not using xsltproc(1) correctly, or perhaps it is not able to find the XSD files. Here are some of the relevant files I have:
SnapshotMgmntProvider.wsdl
SnapshotMgmntProviderDefinitions.xsd
SnapshotMgmntProviderTypes.xsd
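For what it's worth, I'm invoking it roughly like this (the --path option is just my guess at pointing xsltproc at the directory containing the XSD files):
xsltproc --path . -o SnapshotMgmntProvider.html wsdl-viewer.xsl SnapshotMgmntProvider.wsdl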
I have two separate normalized text files that I want to train my BlazingText model on.
I am struggling to get this to work and the documentation is not helping.
Basically, I need to figure out how to supply multiple files or S3 prefixes in the "inputs" parameter to the sagemaker.estimator.Estimator.fit() method.
I first tried:
s3_train_data1 = 's3://{}/{}'.format(bucket, prefix1)
s3_train_data2 = 's3://{}/{}'.format(bucket, prefix2)
train_data1 = sagemaker.session.s3_input(s3_train_data1, distribution='FullyReplicated', content_type='text/plain', s3_data_type='S3Prefix')
train_data2 = sagemaker.session.s3_input(s3_train_data2, distribution='FullyReplicated', content_type='text/plain', s3_data_type='S3Prefix')
bt_model.fit(inputs={'train1': train_data1, 'train2': train_data2}, logs=True)
This doesn't work because SageMaker expects the key in the inputs parameter to be specifically "train".
So then I tried:
bt_model.fit(inputs={'train': train_data1, 'train': train_data2}, logs=True)
This trains the model only on the second dataset and ignores the first one completely.
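In hindsight that behavior is just Python, not SageMaker: duplicate keys in a dict literal collapse to the last one, so the fit() call only ever sees the second channel:
d = {'train': 'data1', 'train': 'data2'}
print(d)  # {'train': 'data2'} - the first entry is silently discarded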
Now, finally, I tried using a manifest file, following the documentation here: https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
(see manifest file format under "S3Uri" section)
The documentation says the manifest file is JSON that looks like this example:
[
{"prefix": "s3://customer_bucket/some/prefix/"},
"relative/path/to/custdata-1",
"relative/path/custdata-2"
]
Well, I don't think this is valid JSON in the first place, but what do I know; I still gave it a try.
When I try this:
s3_train_data_manifest = 'https://s3.us-east-2.amazonaws.com/bucketpath/myfilename.manifest'
train_data_merged = sagemaker.session.s3_input(s3_train_data_manifest, distribution='FullyReplicated', content_type='text/plain', s3_data_type='ManifestFile')
data_channel_merged = {'train': train_data_merged}
bt_model.fit(inputs=data_channel_merged, logs=True)
I get an error saying:
ValueError: Error training blazingtext-2018-10-17-XX-XX-XX-XXX: Failed Reason: ClientError: Data download failed:Unable to parse manifest at s3://mybucketpath/myfilename.manifest - invalid format
I tried replacing the square brackets in my manifest file with curly braces, but it still fails. I feel the JSON format must be missing something that the documentation fails to describe correctly.
You can certainly match multiple files with the same prefix, so your first attempt could have worked as long as you organized the files in your S3 bucket to suit. For example, the prefix s3://mybucket/foo/ will match the files s3://mybucket/foo/bar/data1.txt and s3://mybucket/foo/baz/data2.txt.
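With that layout, a single channel along the lines of your first attempt is enough. A minimal sketch, reusing the same SDK calls as in your question (bucket and prefix names are placeholders, and bt_model is your estimator):
import sagemaker

# One prefix that covers every training file to be matched.
s3_train = 's3://mybucket/foo/'
train_data = sagemaker.session.s3_input(s3_train, distribution='FullyReplicated', content_type='text/plain', s3_data_type='S3Prefix')
bt_model.fit(inputs={'train': train_data}, logs=True)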
However, if there is a third file in your bucket called s3://mybucket/foo/qux/data3.txt that you don't want matched (while still matching the first two), there is no way to achieve that with a single prefix. In such cases a manifest works. So, in the above example, the manifest would simply be:
[
{"prefix": "s3://mybucket/foo/"},
"bar/data1.txt",
"baz/data2.txt"
]
(And yes, this is valid JSON: it is an array whose first element is an object with an attribute called prefix, and all subsequent elements are strings.)
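If in doubt, you can confirm the manifest parses as JSON with a quick local check (a minimal sketch; it assumes the file is saved locally as myfilename.manifest):
import json

# Parse the manifest to confirm it is well-formed JSON.
with open('myfilename.manifest') as f:
    manifest = json.load(f)
print(manifest[0]['prefix'])  # the prefix object's value
print(manifest[1:])           # the relative paths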
Please double-check your manifest (you didn't actually post it, so I can't do that for you) and make sure it conforms to the above syntax.
If you're still stuck, please open a thread on the AWS SageMaker forums (https://forums.aws.amazon.com/forum.jspa?forumID=285); after you do that, we can set up a PM to try to get to the bottom of this. (Never post your AWS account ID in a public forum like Stack Overflow, or even in the AWS forums.)
The pre-processing page in the Cloud ML how-to guide (https://cloud.google.com/ml/docs/how-tos/preprocessing-data) says that you should see the SDK reference documentation for details about each type of feature and the transforms available for it.
Can anyone point me to this documentation, or to a list of feature types and their methods? I'm trying to set up a discrete target but keep getting "data type int64 expected type: float" errors whenever I set my target to .discrete() rather than .continuous().
You need to download the SDK reference documentation:
1. Navigate on the command line to the directory where you want to install the docs. If you used ~/google-cloud-ml to download the samples as recommended in the setup guide, that's a good place.
2. Copy the documentation archive to your chosen directory using gsutil:
gsutil cp gs://cloud-ml/sdk/cloudml-docs.latest.tar.gz .
3. Unpack the archive:
tar -xf cloudml-docs.latest.tar.gz
This creates a docs directory inside the directory that you chose. The documentation is essentially a local website: open docs/index.html in your browser to view it from its root. You can find the transform references in there.
(This information is now in the setup guide as well; it's the final step under LOCAL: MAC/LINUX.)
On the type-related errors, let's assume for a bit that your feature set is specified somewhat along the following lines:
feature_set = {
'target': features.target('category').discrete()
}
When a discrete target is specified as above, the data type of the target feature ends up as int64 for one of the following reasons:
1. No vocab for the target data column (i.e. 'category') was generated during the analysis of your data, i.e. the metadata (in the generated metadata.yaml) has an empty list for the target data column's vocab.
2. A vocab for 'category' was indeed generated, and the data type of the very first item (or key) of this vocab was an int.
Under these circumstances, if a float is encountered, the transformation to the target feature's data type will fail. Casting the entire data column ('category' in this case) to float instead should help with this.
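How you perform the cast depends on your pipeline. As one minimal sketch, using pandas rather than the Cloud ML SDK itself (file and column names here are hypothetical), you could rewrite the column before running the analysis step:
import pandas as pd

# Hypothetical file names; force the target column to float so the
# generated metadata sees a consistent data type.
df = pd.read_csv('train.csv')
df['category'] = df['category'].astype(float)
df.to_csv('train_float.csv', index=False)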
I'm trying out type providers in F#. I've had some success using the WsdlService provider in the following fashion:
type ec2 = WsdlService<"http://s3.amazonaws.com/ec2-downloads/ec2.wsdl">
but when I download that wsdl, rename it to .wsdlschema and supply it as a local schema according to the method specified in this example:
type ec2 = WsdlService< ServiceUri="N/A", ForceUpdate = false,
LocalSchemaFile = """C:\ec2.wsdlschema""">
Visual Studio emits an error message:
The type provider
'Microsoft.FSharp.Data.TypeProviders.DesignTime.DataProviders'
reported an error: Error: No valid input files specified. Specify
either metadata documents or assembly files
This message is wrong, since the file quite plainly is valid, as the previous example proves.
I've considered permissions issues, and I've repeated the same example from my user folder, making sure to grant full control to all users in both cases, as well as running VS as administrator.
Why does the F# compiler think the file isn't valid?
Edit #1: I have confirmed that the same approach doesn't work for http://gis1.usgs.gov/arcgis/services/gap/GAP_Land_Cover_NVC_Class_Landuse/MapServer?wsdl either (a USGS vegetation-related API), whereas referencing the WSDL online works fine.
Hmmm, it appears that the type provider is rather stubborn and inflexible in that it requires a true "wsdlschema" doc when using the LocalSchemaFile option. A wsdlschema document can contain multiple .wsdl and .xsd files, wrapped in some XML to keep them separate. I'm guessing this is some kind of standard thing in the Microsoft toolchain, but perhaps others (e.g. Amazon) don't expose stuff like this.
The first thing the TP attempts to do is unpack the wsdlschema file into its separate parts, and sadly it does the wrong thing if there is in fact no unpacking to be done. Then, when it tries to point svcutil.exe at the unpacked schema files to do the codegen, that step dies with the error message you are seeing.
Workaround: Add the expected bits of XML into your file, and it will work.
<?xml version="1.0" encoding="utf-8"?>
<ServiceMetadataFiles>
<ServiceMetadataFile name="ec2.wsdl">
[body of your WSDL goes here]
</ServiceMetadataFile>
</ServiceMetadataFiles>
As I experiment more and more with making my own Open Data Tables for YQL, I find what might be some gaps in the documentation. As I'm a hands-on learner who likes to understand everything I use, I probe these gaps to try to learn how everything works.
I've noticed that in the XML format for Open Data Tables, there is a <urls> "array" which usually contains just a single <url> element, though sometimes there is no <url> at all. Here's the beginning of a typical ODT XML file:
<table xmlns="http://query.yahooapis.com/v1/schema/table.xsd" https="true">
<meta>
<author>Paul Donnelly</author>
<documentationURL>http://developer.netflix.com/docs/REST_API_Reference#0_52696</documentationURL>
</meta>
<bindings>
<select itemPath="" produces="XML">
<urls>
<url env="all">http://api.netflix.com/catalog/titles/</url>
</urls>
But I can't seem to find in the documentation whether it can ever contain more than one. I can't find any examples that do, but when I try adding more than one, everything works and no errors are thrown; I also can't find any way to access the <url> elements beyond the first one.
Is there any use for the url/urls fields being an XML array? Is there any way to make use of more than one url here? Or is it just a quirk of the format that has no real reason?
Is there any use for the url/urls fields being an XML array?
Is there any way to make use of more than one url here?
The <url> elements can have an env attribute. This env attribute can contain all, prod, int, dev, stable, nightly, perf, qaperf, gamma or beta.
When the table is executed, the current environment (the YQL environment, not the more familiar environment file) is checked, and the first matching <url> (if any) is used. If no matching env is found (and there is no all, which is pretty self-descriptive), an error is issued, for example "Table not defined in this environment prod".
Note that for public-facing YQL, the environment is prod; only prod and all make sense to be used in your Open Data Tables.
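For illustration, a binding that picks a different endpoint per environment might look like the following (the env="int" endpoint is made up, and per the note above would only ever be matched inside Yahoo!):
<urls>
  <url env="int">http://internal.example.com/catalog/titles/</url>
  <url env="prod">http://api.netflix.com/catalog/titles/</url>
</urls>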
Or is it just a quirk of the format that has no real reason?
Not at all.
I assume that this information is "missing" from the online documentation purely because it is only useful internally within Yahoo!, but equally it could just be another place where the docs are somewhat out-of-date.
Finally, none of the 1,100 or so Community Open Data Tables specifies more than one <url>, and only a handful (55) make use of the env attribute (all of them using the value all).
I am using the GUI version of WEKA and I am classifying using Naive Bayes. Can anyone please let me know how to find out which instances are misclassified?
1. Go to the Classify tab in the Weka Explorer.
2. Click "More options...".
3. Check "Output predictions".
4. Click OK.
Hope that helps.
I faced this very same problem earlier, and I can tackle it just fine now.
What I do is the following:
1. Make one String attribute that assigns each instance a unique ID. I assigned the names of the documents to my instances.
2. Generate the WEKA-supported .arff file.
3. Whenever you run a classifier on this .arff data, you will notice that you have to exclude the instance ID attribute. If you don't, Weka will pop up an error saying that the classifier cannot process String attributes. Instead of excluding it, run the StringToNominal filter on the instance ID.
4. Now, as #Rushdi said, click "More options..." on the Classify tab.
5. Check "Output predictions" in the "Classifier evaluation options" pop-up.
6. Enter the attribute number of the instance ID in the "Output additional attributes" box.
7. Run the classifier on the whole data, excluding the instance ID attribute. (Most classifiers have an option for this, e.g. "startSet" in "Ranker", which I use along with the SMO classifier.)
If you've done everything properly so far, you will see all the instances listed along with their actual and predicted values, as well as the instance ID, which tells you exactly which documents were incorrectly classified.
Hope this helps someone.
Good Luck!
In your output there should be an "Incorrectly Classified Instances" line with a count and a percentage; that is what you're looking for.
This works for me:
Decompile the official weka.jar.
Search the library for the classifier you want to test, to learn how it works and determine which instances are misclassified.