Mark Great Expectation validation as failed or passed based on a percentage of failure - great-expectations

I am using Great Expectations in my ETL data pipeline for a POC. I have a validation which is failing (as expected), and I have the following data in my validation JSON:
"unexpected_count": 205,
"unexpected_percent": 10.25,
"unexpected_percent_nonmissing": 10.25,
"unexpected_percent_total": 10.25
Please note that the unexpected_percent_total is 10.25%. Is there a way to configure the validation so that it is reported as successful when the failure percentage is that low? For example, mark the validation as failed only if unexpected_percent_total is more than 50%; otherwise mark it as passed.
Please let me know if anyone has configured such a scenario using Great Expectations.

Yes. Use the "mostly" keyword argument.
import pandas as pd
import great_expectations as ge
d = {'fruit': ['apple', 'apple', 'apple', 'orange', 'banana']}
df = pd.DataFrame(data=d)
ge_df = ge.from_pandas(df)
ge_df.expect_column_values_to_be_in_set('fruit', ['apple', 'banana'], mostly=0.5)
This expectation returns a success even though "orange" is not in the set:
{
  "result": {
    "element_count": 5,
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_count": 1,
    "unexpected_percent": 20.0,
    "unexpected_percent_total": 20.0,
    "unexpected_percent_nonmissing": 20.0,
    "partial_unexpected_list": [
      "orange"
    ]
  },
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  },
  "meta": {},
  "success": true
}
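If your expectations live in an expectation suite JSON (as they typically do in an ETL pipeline) rather than being called inline, the same threshold goes into the expectation's kwargs. A rough sketch, with the exact layout depending on your Great Expectations version:

{
  "expectation_type": "expect_column_values_to_be_in_set",
  "kwargs": {
    "column": "fruit",
    "value_set": ["apple", "banana"],
    "mostly": 0.5
  }
}

With mostly=0.5, the expectation fails only when more than 50% of the non-missing values fall outside the set, which matches the threshold asked about in the question.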

I'm not getting the expected response from client.describe_image_scan_findings() using Boto3

I'm trying to use Boto3 to get the number of vulnerabilities from the images in my repositories. I have a list of repository names and image IDs that get passed into this function. Based on the documentation, I'm expecting a response like this when I access ['imageScanFindings']:
'imageScanFindings': {
    'imageScanCompletedAt': datetime(2015, 1, 1),
    'vulnerabilitySourceUpdatedAt': datetime(2015, 1, 1),
    'findingSeverityCounts': {
        'string': 123
    },
    'findings': [
        {
            'name': 'string',
            'description': 'string',
            'uri': 'string',
            'severity': 'INFORMATIONAL'|'LOW'|'MEDIUM'|'HIGH'|'CRITICAL'|'UNDEFINED',
            'attributes': [
                {
                    'key': 'string',
                    'value': 'string'
                },
            ]
        },
    ],
What I really need is the 'findingSeverityCounts' number; however, it's not showing up in my response. Here's my code and the response I get:
main.py
import boto3

repo_names = ['cftest/repo1', 'your-repo-name', 'cftest/repo2']
image_ids = ['1.1.1', 'latest', '2.2.2']

def get_vuln_count(repo_names, image_ids):
    container_inventory = []
    client = boto3.client('ecr')
    for n, i in zip(repo_names, image_ids):
        response = client.describe_image_scan_findings(
            repositoryName=n,
            imageId={'imageTag': i}
        )
        findings = response['imageScanFindings']
        print(findings)
Output
{'findings': []}
The only thing that shows up is findings; I was expecting findingSeverityCounts and the other fields in the response, but nothing else appears.
THEORY
I have 3 repositories, each with an image that I uploaded. One of my theories is that I'm not getting the other fields, such as findingSeverityCounts, because my images don't have vulnerabilities. I have Inspector set up to scan on push, but the images have no vulnerabilities, so nothing shows up in the Inspector dashboard. Could that be causing the issue? If so, how could I generate a vulnerability in one of my images to test this out?
My theory was correct: when there are no vulnerabilities, the response omits certain values entirely, including the 'findingSeverityCounts' value that I needed.
I created a Docker image based on Python 2.7 to generate vulnerabilities in my scan so I could test the script properly. My workaround was the if statement below: if there are vulnerabilities it returns them; if there aren't any, 'findingSeverityCounts' is omitted from the response, so it returns 0 instead of raising a KeyError.
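For reference, a Dockerfile as simple as the following sketch is enough for this purpose; the base tag is just an illustrative example of something old enough to carry known CVEs:

# Dockerfile: intentionally outdated base image so the ECR scan reports findings
FROM python:2.7
CMD ["python", "--version"]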
Example Solution:
response = client.describe_image_scan_findings(
    repositoryName=n,
    imageId={'imageTag': i}
)
if 'findingSeverityCounts' in response['imageScanFindings']:
    print(response['imageScanFindings']['findingSeverityCounts'])
else:
    print(0)
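Alternatively, a sketch of the same idea using dict.get with a default, which avoids the explicit membership check and makes it easy to total the counts:

severity_counts = response['imageScanFindings'].get('findingSeverityCounts', {})
print(sum(severity_counts.values()))  # 0 when the image has no findings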

"error": "Prediction failed: unknown error."

I am requesting an online prediction for a trained model (created with the linear learner algorithm) and getting "error": "Prediction failed: unknown error."
This is my first ML model on Google AI Platform. Model training was successful, and the training, validation, and test data all look good in the output folder. But when I try to test the model by passing the input JSON, I get this error. I have looked at other similar posts but couldn't find a solution for getting a successful prediction.
metadata.json in the artifact folder looks like this:
{
  "feature_columns": {
    "col_0": {
      "mapping": {
        "0": 0,
        "1": 1,
        "10": 10,
        "2": 2,
        "3": 3,
        "4": 4,
        "5": 5,
        "6": 6,
        "7": 7,
        "8": 8,
        "9": 9
      },
      "mode": "0",
      "num_category": 11,
      "treatment": "identity",
      "type": "categorical"
    },
    "col_1": {
      "mapping": {
        "0": 0,
        "1": 1,
        "10": 10,
        "2": 2,
        "3": 3,
        "4": 4,
        "5": 5,
        "6": 6,
        "7": 7,
        "8": 8,
        "9": 9
      },
      "mode": "4",
      "num_category": 11,
      "treatment": "identity",
      "type": "categorical"
    }
  },
  "target_algorithm": "TensorFlow",
  "target_column": {
    "type": "regression"
  }
}
The input JSON that I am passing to test the prediction is
{ "instances": [5, 5] }
The model is expected to sum the two input features and return a result of 10.
Can you please advise where the mistake is?
If you are using gcloud to send a file, do:
{"col_0": "5", "col_1": "5" }
If you are sending a bunch of instances through some other client, do:
{
  "instances": [
    {"col_0": "5", "col_1": "5"},
    {"col_0": "3", "col_1": "2"}
  ]
}
Lak's answer is good and does the job.
Although my input data was different, I got the same error despite successful local predictions. Here are two additional things that helped me.
Run !gcloud ai-platform predict --help to learn more about how your input should be formatted and which flag to use when making the call.
Inspect the model with !saved_model_cli show --dir ${YOUR_LOCAL_MODEL_PATH} --all to check the names of the inputs. Verify that they are in fact (in your case) inputs[col_0] and inputs[col_1].
Using the "TEST & USE" interface you mentioned above (and in this SO answer) allows quicker experimentation.

Combining `InputPath` and `Parameters` in AWS States Language

The AWS States Language specification describes the role of the InputPath and Parameters fields but does not give an example of the filters being used together.
My understanding is that, if specified, the JSON path given by the InputPath field is applied to the raw input, producing the effective input. Then, if specified, the Parameters field is applied, further transforming the effective input.
Extending the example given in the spec, given the following Task state definition:
"X": {
"Type": "Task",
"Resource": "arn:aws:swf:us-east-1:123456789012:task:X",
"Next": "Y",
"InputPath": "$.sub",
"Parameters": {
"flagged": true,
"parts": {
"first.$": "$.vals[0]",
"last3.$": "$.vals[3:]"
}
}
}
then, given the following input:
{
  "flagged": 7,
  "sub": {
    "vals": [0, 10, 20, 30, 40, 50]
  }
}
the effective input to the code identified in the Resource field would be:
{
  "flagged": true,
  "parts": {
    "first": 0,
    "last3": [30, 40, 50]
  }
}
Is my interpretation correct?
It is totally correct. Parameters is a Payload Template used to reshape input data into the format a task expects, while ResultSelector does the same thing for output data.
The value of "Parameters" MUST be a Payload Template which is a JSON object, whose input is the result of applying the InputPath to the raw input. If the "Parameters" field is provided, its payload, after the extraction and embedding, becomes the effective input.
Also, specs can sometimes be a bit hard to read, so a visual graph of the data flow can be helpful.
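If a visual isn't at hand, one way to see the effective input for yourself is to run the same InputPath and Parameters through a Pass state (which supports both fields) and execute it; the Pass state's output is the processed input. A minimal sketch, with the state machine structure being illustrative:

{
  "StartAt": "X",
  "States": {
    "X": {
      "Type": "Pass",
      "InputPath": "$.sub",
      "Parameters": {
        "flagged": true,
        "parts": {
          "first.$": "$.vals[0]",
          "last3.$": "$.vals[3:]"
        }
      },
      "End": true
    }
  }
}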

How to use Google Place Add in Python

I'm using the Google Places API Web Service in Python, and I'm trying to add places as in the tutorial here.
My code is below:
from googleplaces import GooglePlaces, types, lang, GooglePlacesError, GooglePlacesAttributeError

API_KEY = "[Google Place API KEY]"
google_places = GooglePlaces(API_KEY)

try:
    added_place = google_places.add_place(
        name='Mom and Pop local store',
        lat_lng={'lat': 51.501984, 'lng': -0.141792},
        accuracy=100,
        types=types.TYPE_HOME_GOODS_STORE,
        language=lang.ENGLISH_GREAT_BRITAIN)
except GooglePlacesError as error_detail:
    print error_detail
But I kept getting an error.
I tried changing the input to JSON format or Python dictionary format, but then it gave the error "google_places.add_place() only accepts 1 parameter, 2 given"...
Is there a right way to use the Google Places API Add Place method in Python?
I finally found the solution, and it's simple. I wasn't familiar with making POST requests in Python, but it turns out to be easy.
The code below is all that's needed to add a place through the Google Places API with Python:
import requests

# YOUR_API_KEY is your API key as a string
post_url = "https://maps.googleapis.com/maps/api/place/add/json?key=" + YOUR_API_KEY
r = requests.post(post_url, json={
    "location": {
        "lat": -33.8669710,
        "lng": 151.1958750
    },
    "accuracy": 50,
    "name": "Google Shoes!",
    "phone_number": "(02) 9374 4000",
    "address": "48 Pirrama Road, Pyrmont, NSW 2009, Australia",
    "types": ["shoe_store"],
    "website": "http://www.google.com.au/",
    "language": "en-AU"
})
To check the results:
print r.status_code
print r.json()
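If the request is accepted, the response body should include a status field and, on success, a place_id for the newly added place. A sketch of such a check; the field handling here is illustrative rather than exhaustive:

result = r.json()
if result.get("status") == "OK":
    print(result["place_id"])  # ID of the newly added place
else:
    print(result.get("error_message", result.get("status")))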

Can I define a custom validation with options for Loopback?

Is there a prescribed way to create a custom validator in LoopBack? As an example, assume that I want to create something like:
Validatable.validatesRange('aProperty', {min: 0, max: 1000})
Please note that I am aware of:
Validatable.validates(propertyName, validFn, options)
The problem I have with validates() is that validFn does not have access to the options, so I'm forced to hard-code this logic and create a custom method for every property that needs this type of validation. This is undesirable.
Similarly, I am familiar with:
Model.observe('before save', hookFn)
Unfortunately, I see no way to even declare options for the hookFn(). I don't have this specific need (at least, not yet). It was just an avenue I explored as a possible alternative to solve my problem.
Any advice is appreciated. Thanks in advance!
There is a mention of how to do this over at https://docs.strongloop.com/display/public/LB/Validating+model+data
You can also call validate() or validateAsync() with custom validation functions.
That leads you to this page https://apidocs.strongloop.com/loopback-datasource-juggler/#validatable-validate
Which provides an example.
I tried it out on my own ...
Question.validate('points', customValidator, {message: 'Negative Points'});

function customValidator(err) {
    if (this.points < 0) err();
}
And since that function name isn't really used anywhere else and (in this case) the function is short, I also tried it with an anonymous function:
Question.validate('points',
    function (err) { if (this.points < 0) err(); },
    {message: 'Question has a negative value'});
When points are less than zero, it throws the validation error shown below.
{
  "error": {
    "name": "ValidationError",
    "status": 422,
    "message": "The `Question` instance is not valid. Details: `points` Negative Points (value: -100).",
    "statusCode": 422,
    "details": {
      "context": "Question",
      "codes": {
        "points": [
          "custom"
        ]
      },
      "messages": {
        "points": [
          "Negative Points"
        ]
      }
    }
  }
}
What you are looking for is validatesLengthOf(). For example:
Validatable.validatesLengthOf('aProperty', {min: 0, max: 1000});
Here are the documentation links:
All the methods of the Validatable class, and
Model-wise validation.
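For the validatesRange(...) shape from the question, where the validator needs access to its options, one workaround (not a built-in LoopBack API, just a sketch using a closure over the options) is:

// Hypothetical helper: wraps Model.validate() so the custom validator
// can see the options it was registered with.
function validatesRange(Model, propertyName, options) {
  Model.validate(propertyName, function (err) {
    var value = this[propertyName];
    if (value < options.min || value > options.max) err();
  }, {message: propertyName + ' must be between ' + options.min + ' and ' + options.max});
}

// Usage:
validatesRange(Question, 'points', {min: 0, max: 1000});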