DJANGO OPENCENSUS url field in request data too long

I am having a similar issue to this question since adding App Insights to my application. It may be related to this other question as well, but neither of them is directly related to App Insights and neither has a solution.
This is the error from the django-tasks.log:
Data drop 400: 100: Field 'url' on type 'RequestData' is too long. Expected: 2048 characters, Actual: 3701 {'iKey': <uuid>, 'tags': {'ai.cloud.role': 'manage.py', 'ai.cloud.roleInstance': <instance>, 'ai.device.id': <device>, 'ai.device.locale': 'en_US', 'ai.device.osVersion': '#1 SMP Tue Aug 25 17:23:54 UTC 2020', 'ai.device.type': 'Other', 'ai.internal.sdkVersion': 'py3.6.12:oc0.7.11:ext1.0.4', 'ai.operation.id': 'fcbe18bf6ca9036aa4546af171f3e877', 'ai.operation.name': 'GET /<my_url>/'}, 'time': '2020-12-15T17:58:36.498868Z', 'name': 'Microsoft.ApplicationInsights.Request', 'data': {'baseData': {'id': '116a0658b513bdb9', 'duration': '0.00:00:00.096', 'responseCode': '200', 'success': True, 'properties': {'request.name': 'GET /<my_url>/', 'request.url': 'https://<my host>/<my_url>/?<my very long query string>', 'django.user.id': '90', 'django.user.name': '100044505'}, 'ver': 2, 'name': 'GET /<my_url>/', 'url': 'https://<my host>/<my_url>/?<my very long query string>', 'source': None, 'measurements': None}, 'baseType': 'RequestData'}, 'ver': 1, 'sampleRate': None, 'seq': None, 'flags': None}.
We also see this repeating in the logs.
Queue is full. Dropping telemetry.
Queue is full. Dropping telemetry.
Queue is full. Dropping telemetry.
Queue is full. Dropping telemetry.
Queue is full. Dropping telemetry.
Queue is full. Dropping telemetry.
I could rewrite the app to use shorter queries, but that seems like the wrong answer. Is there a way to configure Django to support long URLs?

The buffer size cannot be changed, but you can limit the size of the URL with a filter. To customize the trace exporter, it has to be instantiated separately.
def shorten_url(envelope):
    # Telemetry processor: truncate the request URL before the envelope is sent.
    # 25 characters is just for demonstration; App Insights allows up to 2048.
    if 25 < len(envelope.data.baseData.url):
        envelope.data.baseData["url"] = envelope.data.baseData.url[:25] + "..."
    return True
from opencensus.ext.azure.trace_exporter import AzureExporter

exporter = AzureExporter(service_name='mysite')
exporter.add_telemetry_processor(shorten_url)

OPENCENSUS = {
    'TRACE': {
        'SAMPLER': 'opencensus.trace.samplers.ProbabilitySampler(rate=1)',
        'EXPORTER': exporter
        # Assumes environment variable 'APPINSIGHTS_INSTRUMENTATIONKEY' is set
    }
}
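The processor above only runs once requests are actually traced, which requires the OpenCensus Django middleware to be enabled; a minimal sketch of the corresponding settings.py entry, assuming opencensus-ext-django is installed:
# settings.py (sketch, assuming opencensus-ext-django is installed)
MIDDLEWARE = [
    'opencensus.ext.django.middleware.OpencensusMiddleware',
    # ...your other middleware...
]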
Complete working example:
https://github.com/Gamecock/Django-appinsights-example

Related

I'm not getting the expected response from client.describe_image_scan_findings() using Boto3

I'm trying to use Boto3 to get the number of vulnerabilities from the images in my repositories. I have a list of repository names and image IDs that are getting passed into this function. Based on the documentation, I'm expecting a response like this when I filter for ['imageScanFindings']:
'imageScanFindings': {
    'imageScanCompletedAt': datetime(2015, 1, 1),
    'vulnerabilitySourceUpdatedAt': datetime(2015, 1, 1),
    'findingSeverityCounts': {
        'string': 123
    },
    'findings': [
        {
            'name': 'string',
            'description': 'string',
            'uri': 'string',
            'severity': 'INFORMATIONAL'|'LOW'|'MEDIUM'|'HIGH'|'CRITICAL'|'UNDEFINED',
            'attributes': [
                {
                    'key': 'string',
                    'value': 'string'
                },
            ]
        },
    ],
What I really need is the 'findingSeverityCounts' number; however, it's not showing up in my response. Here's my code and the response I get:
main.py
import boto3

repo_names = ['cftest/repo1', 'your-repo-name', 'cftest/repo2']
image_ids = ['1.1.1', 'latest', '2.2.2']

def get_vuln_count(repo_names, image_ids):
    container_inventory = []
    client = boto3.client('ecr')
    for n, i in zip(repo_names, image_ids):
        response = client.describe_image_scan_findings(
            repositoryName=n,
            imageId={'imageTag': i}
        )
        findings = response['imageScanFindings']
        print(findings)
Output
{'findings': []}
The only key that shows up is findings; I was expecting findingSeverityCounts and the other keys in the response, but nothing else is showing up.
THEORY
I have 3 repositories and an image in each repository that I uploaded. One of my theories is that I'm not getting the other fields, such as findingSeverityCounts, because my images don't have vulnerabilities. I have Inspector set up to scan on push, but the images have no vulnerabilities, so nothing shows up in the Inspector dashboard. Could that be causing the issue? If so, how would I be able to generate a vulnerability in one of my images to test this out?
My theory was correct: when there are no vulnerabilities, the response completely omits certain values, including the 'findingSeverityCounts' value that I needed.
I created a Docker image using Python 2.7 to generate vulnerabilities in my scan so I could test my script properly. My workaround was to implement this if statement: if there are vulnerabilities it returns them; if there aren't any, 'findingSeverityCounts' is omitted from the response, so I return 0 instead of getting a KeyError.
Example Solution:
response = client.describe_image_scan_findings(
    repositoryName=n,
    imageId={'imageTag': i}
)
if 'findingSeverityCounts' in response['imageScanFindings']:
    print(response['imageScanFindings']['findingSeverityCounts'])
else:
    print(0)
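An equivalent, slightly more compact sketch uses dict.get with a default; the empty-dict fallback is my assumption about a sensible default, not part of the original answer:
# Sketch: default to an empty dict when 'findingSeverityCounts' is omitted
severity_counts = response['imageScanFindings'].get('findingSeverityCounts', {})
print(severity_counts if severity_counts else 0)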

AWS cost and usage for all the instances

I would like to get the usage cost report of each instance in my AWS account for a period of time.
I'm able to get linked_account_id and service in the output, but I need instance_id as well. Please help.
import argparse
import boto3
import datetime

cd = boto3.client('ce', 'ap-south-1')

results = []
token = None
while True:
    if token:
        kwargs = {'NextPageToken': token}
    else:
        kwargs = {}
    data = cd.get_cost_and_usage(
        TimePeriod={'Start': '2019-01-01', 'End': '2019-06-30'},
        Granularity='MONTHLY',
        Metrics=['BlendedCost', 'UnblendedCost'],
        GroupBy=[
            {'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'},
            {'Type': 'DIMENSION', 'Key': 'SERVICE'}
        ], **kwargs)
    results += data['ResultsByTime']
    token = data.get('NextPageToken')
    if not token:
        break

print('\t'.join(['Start_date', 'End_date', 'LinkedAccount', 'Service', 'blended_cost', 'unblended_cost', 'Unit', 'Estimated']))
for result_by_time in results:
    for group in result_by_time['Groups']:
        blended_cost = group['Metrics']['BlendedCost']['Amount']
        unblended_cost = group['Metrics']['UnblendedCost']['Amount']
        unit = group['Metrics']['UnblendedCost']['Unit']
        print(result_by_time['TimePeriod']['Start'], '\t',
              result_by_time['TimePeriod']['End'], '\t',
              '\t'.join(group['Keys']), '\t',
              blended_cost, '\t',
              unblended_cost, '\t',
              unit, '\t',
              result_by_time['Estimated'])
As far as I know, Cost Explorer can't report usage per instance. There is a feature, Cost and Usage Reports, which delivers a detailed billing report as dump files; in those files you can see the instance ID.
It can also be connected to AWS Athena. Once you have done this, you can query the files directly in Athena.
Here is my Presto example.
select
    lineitem_resourceid,
    sum(lineitem_unblendedcost) as unblended_cost,
    sum(lineitem_blendedcost) as blended_cost
from
    <table>
where
    lineitem_productcode = 'AmazonEC2' and
    product_operation like 'RunInstances%'
group by
    lineitem_resourceid
The result is:
lineitem_resourceid    unblended_cost    blended_cost
i-*****************    279.424           279.424
i-*****************    139.948           139.948
i-********             68.198            68.198
i-*****************    3.848             3.848
i-*****************    0.013             0.013
where the resource ID contains the instance ID. The cost amount is summed over all usage in the month. For other types of product_operation, it will contain different resource IDs.
You can add an individual tag to all instances (e.g. Id) and then group by that tag:
GroupBy=[
    {
        'Type': 'TAG',
        'Key': 'Id'
    },
],
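Plugged into the asker's call, a minimal sketch of that grouping (the Id tag name is an assumption, and tag group keys come back in the form 'Id$value'):
import boto3

ce = boto3.client('ce', 'ap-south-1')

data = ce.get_cost_and_usage(
    TimePeriod={'Start': '2019-01-01', 'End': '2019-06-30'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[
        {'Type': 'DIMENSION', 'Key': 'LINKED_ACCOUNT'},
        {'Type': 'TAG', 'Key': 'Id'},  # assumed tag applied to every instance
    ],
)
for result in data['ResultsByTime']:
    for group in result['Groups']:
        # Keys holds the linked account and the tag, e.g. ['123456789012', 'Id$i-0abc...']
        print('\t'.join(group['Keys']), group['Metrics']['UnblendedCost']['Amount'])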

Boto3 API for CloudWatch - get_metric_statistics returns empty array

I am trying to call the CloudWatch API using boto3, and it seems to be going through well, but the data returned is an empty array [], even with a 200 response. What am I missing?
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

cloudwatch.get_metric_statistics(
    Namespace='AWS/ELB',
    MetricName='Latency',
    Dimensions=[
        {
            'Name': 'LoadBalancerName',
            'Value': '********'
        }
    ],
    StartTime=datetime.utcnow() - timedelta(seconds=600),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=['Average', 'Maximum']
)
{u'Datapoints': [], 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': 'f631c9d6-b6d4-11e8-9b60-89ddf4935382', 'HTTPHeaders': {'x-amzn-requestid': 'f631c9d6-b6d4-11e8-9b60-89ddf4935382', 'date': 'Wed, 12 Sep 2018 21:44:00 GMT', 'content-length': '330', 'content-type': 'text/xml'}}, u'Label': 'Latency'}
I tried other APIs on boto3 to verify the connection, and I do get a valid response.
An empty array is an acceptable return value when no data is available:
for the time range
for the unit
for the period
for the statistic
Are you able to see some data in the CloudWatch Console if you request the same set of statistics/period/time range for that metric?
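One way to narrow it down, as a sketch: ask CloudWatch which metrics actually exist for that namespace and dimension, to confirm the LoadBalancerName value matches a real metric (the placeholder value is the asker's redaction):
# Sketch: confirm the metric/dimension combination exists at all
metrics = cloudwatch.list_metrics(
    Namespace='AWS/ELB',
    MetricName='Latency',
    Dimensions=[{'Name': 'LoadBalancerName', 'Value': '********'}]
)
print(metrics['Metrics'])  # an empty list means no such metric/dimension pair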
It turns out that, for some reason, the AWS SDK was adding a month to my startDateTime and EndDateTime parameters. I used
AWS.config.logger = console;
in my code (this was the JavaScript SDK), which helped me see the logs and confirmed that it was looking a month ahead. I updated my code to pass a month earlier, and it seems to run fine now.

RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')

Creating an index:
def create_index(index_name):
    es = create_elastic_search_object()
    entry_mapping = {
        'entry-type': {
            'properties': {
                'text': {'type': 'string'},
                'coordinates': {'type': 'geo_point'},
                'username': {'type': 'string'}
            }
        }
    }
    es.indices.create(index_name, body={'mappings': entry_mapping})
Inserting into the index:
coordinates = str(tweet[0][0]) + "," + str(tweet[0][1])
es.index(index=index_name, doc_type=keyword, id=start_id + ind,
         body={'text': tweet[1], 'coordinates': coordinates, 'username': tweet[2]})
Error:
*** RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')
Debugging:
(Pdb) body={'text': tweet[1],'coordinates': coordinates,'username': tweet[2]}
(Pdb) print body
{'username': 'csd', 'text': 'RT #funder: Court Doc:Trump evicted a disabled US Veteran because he had a therapy dog\n\n#votevets #trumprussia #resist #theresistance #russ...', 'coordinates': '-117.1304909,32.7211149'}
All formats seem correct to me; what am I missing?
Libraries used:
from elasticsearch import Elasticsearch
https://elasticsearch-py.readthedocs.io/en/master/
When I had the same problem, it was because I had a NULL value in my dict, so check your data or try casting your values to str before indexing.
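As a minimal sketch of that check, reusing the body dict from the question (the sanitize helper is hypothetical, not part of the original answer):
def sanitize(doc):
    # Replace None with an empty string and cast everything else to str,
    # so the mapper never sees a value it cannot parse
    return {k: '' if v is None else str(v) for k, v in doc.items()}

es.index(index=index_name, doc_type=keyword, id=start_id + ind,
         body=sanitize({'text': tweet[1], 'coordinates': coordinates, 'username': tweet[2]}))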

Execute multiple queries in a single call to Freebase

I want to get the results of multiple queries in a single call to Freebase, as described in this chapter: http://mql.freebaseapps.com/ch04.html. I am using Python for querying. I want to query like this:
{ # Start the outer envelope
    "q1": { # Query envelope for query named q1
        "query": {First MQL query here} # Query property of query envelope
    }, # End of first query envelope
    "q2": { # Start query envelope for query q2
        "query": [{Second MQL query here}] # Query property of q2
    } # End of second query envelope
}
and get answers like:
{
    "q1": {
        "result": {First MQL result here},
        "code": "/api/status/ok"
    },
    "q2": {
        "result": [{Second MQL result here}],
        "code": "/api/status/ok"
    },
    "status": "200 OK",
    "code": "/api/status/ok",
    "transaction_id": [opaque string value]
}
as specified at that link. I also came across some related questions on SO:
Freebase python
Multiple Queries in MQL on Freebase
But they seem to be using the old API, "api.freebase.com"; the updated API is "www.googleapis.com/freebase".
I tried the following code, but it's not working.
import json
import urllib

api_key = "freebase_api_key"
service_url = 'https://www.googleapis.com/freebase/v1/mqlread'

query1 = [{'id': None, 'name': None, 'type': '/astronomy/planet'}]
query2 = [{'id': None, 'name': None, 'type': '/film/film'}]

envelope = {
    'q1': query1,
    'q2': query2
}

encoded = json.dumps(envelope)
params = urllib.urlencode({'query': encoded})
url = service_url + '?' + params
print url
response = json.loads(urllib.urlopen(url).read())
print response
I am getting this error:
{u'error': {u'code': 400, u'message': u'Type /type/object does not have property q1', u'errors': [{u'domain': u'global', u'message': u'Type /type/object does not have property q1', u'reason': u'invalid'}]}}
How can I embed multiple queries into a single MQL call?
I'd suggest looking at the Batch capability of the Python client library for the Google APIs.
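For illustration only, a sketch of that batching pattern with the google-api-python-client; Freebase has since been retired, and the 'freebase' discovery name and mqlread method here are assumptions based on the old v1 discovery document:
import json

from apiclient.discovery import build
from apiclient.http import BatchHttpRequest

service = build('freebase', 'v1', developerKey=api_key)  # assumed discovery name

def on_result(request_id, response, exception):
    # Called once per sub-request; request_id is 'q1' or 'q2'
    if exception is None:
        print request_id, response
    else:
        print request_id, exception

batch = BatchHttpRequest(callback=on_result)
batch.add(service.mqlread(query=json.dumps(query1)), request_id='q1')  # assumed method
batch.add(service.mqlread(query=json.dumps(query2)), request_id='q2')
batch.execute()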