How to use Runner_v2 for apache beam dataflow job? - google-cloud-platform

My python code for dataflow job looks like below:
import apache_beam as beam
from apache_beam.io.external.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
topic1="topic1"
conf={'bootstrap.servers':'gcp_instance_public_ip:9092'}
pipeline = beam.Pipeline(options=PipelineOptions())
(pipeline
| ReadFromKafka(consumer_config=conf,topics=['topic1'])
)
pipeline.run()
As i am using kafkaIO in python code, someone suggested me to use DataflowRunner_V2( I think V1 doesn't support python).
As per dataflow documentation, i am using this parameter to use runner v2:--experiments=use_runner_v2 (I have not made any change on code level for switching from V1 to V2.)
I am getting below error:
http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/metal-voyaasfger-23424/locations/us-central1/jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Wed, 08 Jul 2020 07:23:21 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '544', '-content-encoding': 'gzip'}>, content <{
"error": {
"code": 400,
"message": "(5fd1bf4d41e8b7e): The workflow could not be created. Causes: (5fd1bf4d41e8018): The workflow could not be created due to misconfiguration. If you are trying any experimental feature, make sure your project and the specified region support that feature. Contact Google Cloud Support for further help. Experiments enabled for project: [enable_streaming_engine, enable_windmill_service, shuffle_mode=service], experiments requested for job: [use_runner_v2]",
"status": "INVALID_ARGUMENT"
}
}
I have already added service account using export GOOGLE_APPLICATION_CREDENTIALS=(project owner permission) command.
Can someone help where is my mistake. Am i mistaking using Runner_V2?
I will really thnkful if someone shortly tell whats difference in using Runner_v1 and Runner_V2.
Thanks ... :)

I was able to reproduce your issue. The error message was complaining that use_runner_v2 isn't enabled because Runner v2 isn't enabled for batch jobs.
Experiments enabled for project: [enable_streaming_engine, enable_windmill_service, shuffle_mode=service], experiments requested for job: [use_runner_v2]",
Please try running your job with the --streaming flag added.

Related

Implementing Freshsales API in Python [duplicate]

This question already has answers here:
Python request with authentication (access_token)
(8 answers)
Closed 4 years ago.
I am trying to integrate Freshsales functionality within my Django server in order to create leads, schedule appointments, etc. Freshsale's API Documentation in Python lacks detail, however. Here is a link to their API functionality using curl commands: https://www.freshsales.io/api/.
Their python code is as follows:
from .freshsales_exception import FreshsalesException
import requests
import json
def _request(path, payload):
try:
data = json.dumps(payload)
headers = { 'content-type': 'application/json', 'accept': 'application/json' }
resp = requests.post(path, data=data, headers=headers)
if resp.status_code != 200:
raise FreshsalesException("Freshsales responded with the status code of %s" % str(resp.status_code))
except requests.exceptions.RequestException as e:
raise FreshsalesException(e.message)
In the case of the curl command, for example, to create an appointment, is:
curl -H "Authorization: Token token=sfg999666t673t7t82" -H "Content-Type: application/json" -d '{"appointment":{"title":"Sample Appointment","description":"This is just a sample Appointment.","from_date":"Mon Jun 20 2016 10:30:00 GMT+0530 (IST)","end_date":"Mon Jun 20 2016 11:30:00 GMT+0530 (IST)","time_zone":"Chennai","location":"Chennai, TN, India","targetable_id":"115765","targetable_type":"Lead", "appointment_attendees_attributes":[{ "attendee_type":"FdMultitenant::User","attendee_id":"223"},{"attendee_type":"FdMultitenant::User","attendee_id":"222"},{"attendee_type":"Lead","attendee_id":"115773"}] }}' -X POST
I understand that I need to use the requests library to make a post request. However, I do not understand how I need to format the request. The furthest extent I understand to list all appointments, for example, is for my request to be the following:
my_request = "https://mydomain.freshsales.io/api/appointments/token=myexampletoken"
response = requests.post(myrequest)
I am unsure of how to create the payload to be accepted by the API to create an appointment. How might I use the requests library to accomplish this? I have searched for how to execute curl commands in Python, and the only answers I ever saw were to use the requests library. Any help is greatly appreciated!
You're missing the Authorization header. You just need to translate curl headers -H to python code. This should work, according to your syntax.
headers = {
'content-type': 'application/json',
'accept': 'application/json'
'Authorization': 'Token token=sfg999666t673t7t82'
}

AWS lambda function logs are not getting displayed in cloudwatch

I have below setup which I am trying to run.
I have a python app which is running locally on my linux host.
I am using boto3 to connect to AWS with my user secret key and secret key Id.
My user had full access to EC2, Cloudwatch, S3 and config
My application invokes a lamdbda function called mylambda.
The execution role for mylambda also has all the required permissions.
Now if i call my lambda function from aws console it works fine. I can see the logs of execution in cloudwatch. But if I do it from my linux box from my custom application, I dont see any execution logs, I am not getting error either.
is there anything I am missing ?
Any help is really appreciated.
I dont see it getting invoked. But surprisingly I am getting response as below.
gaurav#random:~/lambda_s3$ python main.py
{u'Payload': <botocore.response.StreamingBody object at 0x7f74cb7f5550>, u'ExecutedVersion': '$LATEST', 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': '7417534c-6263-11e8-xxx-afab1667510a', 'HTTPHeaders': {'x-amzn-requestid': '7417534c-xxx-11e8-8a24-afab1667510a', 'content-length': '4', 'x-amz-executed-version': '$LATEST', 'x-amzn-trace-id': 'root=1-5b0bdc78-7559e68acd668476bxxxx754;sampled=0', 'x-amzn-remapped-content-length': '0', 'connection': 'keep-alive', 'date': 'Mon, 28 May 2018 10:39:52 GMT', 'content-type': 'application/json'}}, u'StatusCode': 200}
{u'CreationDate': datetime.datetime(2018, 5, 27, 9, 50, 9, tzinfo=tzutc()), u'Name': 'bucketname'}
gaurav#random:~/lambda_s3$
My sample app is as below
#!/usr/bin/python
import boto3
import json
import base64
d= {'key': 10, 'key2' : 20}
client = boto3.client('lambda')
response = client.invoke(
FunctionName='mylambda',
InvocationType='RequestResponse',
#LogType='None',
ClientContext=base64.b64encode(b'{"custom":{"foo":"bar", \
"fuzzy":"wuzzy"}}').decode('utf-8'),
Payload=json.dumps(d)
)
print response
Make sure that you're actually invoking the Lambda correctly. Lambda error handling can be a bit tricky. Using boto3 the invoke method doesn't necessarily throw even if the invocation fails. You have to check the statusCode property in the response.
You mentioned that your user has full access to EC2, Cloudwatch, S3, and config. For your use case, you need to add lambda:InvokeFunction to your user's permissions.

Dataflow pipeline "lost contact with the service"

I'm running into trouble with an Apache Beam pipline on Google Cloud Dataflow.
The pipeline is simple: Reading json from GCS, extracting text from some nested fields, writing back to GCS.
It works fine when testing with a smaller subset of input files but when I run it on the full data set, I get the following error (after running fine through around 260M items).
Somehow the "worker eventually lost contact with the service"
(8662a188e74dae87): Workflow failed. Causes: (95e9c3f710c71bc2): S04:ReadFromTextWithFilename/Read+FlatMap(extract_text_from_raw)+RemoveLineBreaks+FormatText+WriteText/Write/WriteImpl/WriteBundles/Do+WriteText/Write/WriteImpl/Pair+WriteText/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteText/Write/WriteImpl/GroupByKey/Reify+WriteText/Write/WriteImpl/GroupByKey/Write failed., (da6389e4b594e34b): A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:
extract-tags-150110997000-07261602-0a01-harness-jzcn,
extract-tags-150110997000-07261602-0a01-harness-828c,
extract-tags-150110997000-07261602-0a01-harness-3w45,
extract-tags-150110997000-07261602-0a01-harness-zn6v
The Stacktrace shows a Failed to update work status/Progress reporting thread got error error:
Exception in worker loop: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 776, in run deferred_exception_details=deferred_exception_details) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 629, in do_work exception_details=exception_details) File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 168, in wrapper return fun(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 490, in report_completion_status exception_details=exception_details) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 298, in report_status work_executor=self._work_executor) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 333, in report_status self._client.projects_locations_jobs_workItems.ReportStatus(request)) File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 467, in ReportStatus config, request, global_params=global_params) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _RunMethod return self.ProcessHttpResponse(method_config, http_response, request) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in ProcessHttpResponse self.__ProcessHttpResponse(method_config, http_response, request)) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 600, in __ProcessHttpResponse http_response.request_url, method_config, request) HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/qollaboration-live/locations/us-central1/jobs/2017-07-26_16_02_36-1885237888618334364/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 26 Jul 2017 23:54:12 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(7f8a0ec09d20c3a3): Failed to publish the result of the work update. Causes: (7f8a0ec09d20cd48): Failed to update work status. Causes: (afa1cd74b2e65619): Failed to update work status., (afa1cd74b2e65caa): Work \"6306998912537661254\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >
And Finally:
HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/[projectid-redacted]/locations/us-central1/jobs/2017-07-26_18_28_43-10867107563808864085/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '358', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Thu, 27 Jul 2017 02:00:10 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(5845363977e915c1): Failed to publish the result of the work update. Causes: (5845363977e913a8): Failed to update work status. Causes: (44379dfdb8c2b47): Failed to update work status., (44379dfdb8c2e88): Work \"9100669328839864782\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >
at __ProcessHttpResponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:600)
at ProcessHttpResponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:729)
at _RunMethod (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:723)
at ReportStatus (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py:467)
at report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py:333)
at report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:298)
at report_completion_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:490)
at wrapper (/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py:168)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:629)
at run (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:776)
This looks like an error to the data flow internals to me. Can anyone confirm? Are there any workarounds?
The HttpError typically appears after the workflow has failed and is part of the failure/teardown process.
It looks like there were others error reported in your pipeline, such as the following. Note that if the same elements fail 4 times it will be marked failing.
Try looking the Stack Traces section in the UI to identify the other errors and their stack traces. Since this only occurs on the larger dataset, consider the possibility of their being malformed elements that only exist in the larger dataset.

Facebook graph API : can post on "me/feed" but not on "page_id/feed" (error : 1455002)

I guess the answer to this one is straightforward but I cannot find it. Any help would be very much appreciated.
I. Use case
The application (back-end in python / django) should write on a facebook page.
II. Symptoms
When running the code below on "me/feed", the post is correctly inserted
When running the code below on "PAGE_ID/feed", there is an exception (see below in section IV.)
The scope of the authorisation is publish_stream, manage_pages
Also, the user_token is from a user in the test domain
III. Code
## Getting the user_access_token is dealt with before
h = Http()
data = dict(message="Hello", access_token=user_access_token['access_token'])
resp, content = h.request("https://graph.facebook.com/PAGE_ID/feed", "POST", urlencode(data))
IV. Exception generated (using /PAGE_ID/feed)
resp : Response: {'status': '400', 'content-length': '119', 'expires': 'Sat, 01 Jan 2000 00:00:00 GMT', 'www-authenticate':
'OAuth "Facebook Platform" "invalid_request" "(#1) An unknown error occurred"', 'x-fb-rev': '976458',
'connection': 'keep-alive', 'pragma': 'no-cache', 'cache-control': 'no-store', 'date': 'Tue, 22 Oct 2013 21:45:20
GMT', 'access-control-allow-origin': '*', 'content-type': 'text/javascript; charset=UTF-8', 'x-fb-debug':
'HFItWh64ob+3hErv+rgYdFzHlRBVHP7Pg0Eg4hvqYlY='}
content str: {"error":{"message":"(#1) An unknown error occurred","type":"OAuthException","code":1,"error_data":
{"kError":1455002}}}

'Cannot parse input stream' error when updating defects in Rally via pyral

I am using the Python Toolkit for Rally REST API to update defects on our Rally server. I have confirmed that I am able to make contact with the server and authenticate fine by getting a list of current defects. I am running into issues with updating them. I am using Python 2.7.3 with pyral 0.9.1 and requests 0.13.3.
Also, I am passing 'verify=False' to the Rally() call and have made appropriate chages to the
restapi module to compensate for this.
Here is my test code:
import sys
from pyral import Rally, rallySettings
server = "rallydev.server1.com"
user = "user#mycompany.com"
password = "trial"
workspace = "trialWorkspace"
project = "Testing Project"
defectID = "DE192"
rally = Rally(server, user, password, workspace=workspace,
project=project, verify=False)
defect_data = { "FormattedID" : defectID,
"State" : "Closed"
}
try:
defect = rally.update('Defect', defect_data)
except Exception, details:
sys.stderr.write('ERROR: %s \n' % details)
sys.exit(1)
print "Defect %s updated" % defect.FormattedID
When I run the script:
[temp]$ ./updefect.py
ERROR: Unable to update the Defect
If I change the code in the RallyRESTResponse function to print out the value of self.errors when found (line 164 of rallyresp.py), I get this output:
[temp]$ ./updefect.py
[u"Cannot parse input stream due to I/O error as JSON document: Parse error: expected '{' but saw '\uffff' [ chars read = >>>\uffff<<< ]"]
ERROR: Unable to update the Defect
I did find another question that sounds like it might possibly be related to mine here:
App SDK: Erorr parsing input stream when running query
Can you provide any assistance?
Pairing Michael's observation regarding the GZIP encoding with that of another astute Rally customer working a Support case on the issue - it appears that some versions of the requests module will default to GZIP compression if the content-type is not specifically defined.
The fix is to set content-type to application/json in the REST Headers section of pyral's config.py:
RALLY_REST_HEADERS = \
{
'X-RallyIntegrationName' : 'Python toolkit for Rally REST API',
'X-RallyIntegrationVendor' : 'Rally Software Development',
'X-RallyIntegrationVersion' : '%s.%s.%s' % __version__,
'X-RallyIntegrationLibrary' : 'pyral-%s.%s.%s' % __version__,
'X-RallyIntegrationPlatform' : 'Python %s' % platform.python_version(),
'X-RallyIntegrationOS' : platform.platform(),
'User-Agent' : 'Pyral Rally WebServices Agent',
'Content-Type' : 'application/json',
}
What you are seeing is probably not related to the Python 2.7.3 / requests 0.13.3 versions being used. The error message you saw has also been reported using the Javascript based App SDK and .NET Toolkit for Rally (2 separate reports here on SO) and at least one other person using Python 2.6.6 and requests 0.9.2. It appears that the error verbiage is being generated on the Rally WSAPI back-end. Current assessment by fellow Rally'ers is that it is an encoding related issue. The question is where the encoding issue originates.
I have yet to be able to repro this issue, having tried with several versions of Python (2.6.x and 2.7.x), several versions of requests and on Linux, MacOS and Win7.
As you seem to be pretty comfortable with diving in to the code and running in debug mode, one avenue to try is to capture the defective POST URL and POST data and attempting the update via a browser based REST client like 'Simple REST Client' or Poster and observing if you get the same error message in the WSAPI response.
I'm seeing similar behavior with pyral while trying to add an attachment to a defect.
With debugging and logging on I see this request on stdout:
2012-07-20T15:11:24.855212 PUT https://rally1.rallydev.com/slm/webservice/1.30/attachmentcontent/create.js?workspace=workspace/123456789
Then the json in the logfile:
2012-07-20 15:11:24.854 PUT attachmentcontent/create.js?workspace=workspace/123456789
{"AttachmentContent": {"Content": "iVBORw0KGgoAAAANSUhEUgAABBQAAAJrCAIAAADf2VflAAAXOWlDQ...
Then this in the logfile (after a bit of fighting with restapi.py to get around the unicode error):
2012-07-20 15:11:25.260 404 Cannot parse input stream due to I/O error as JSON document: Parse error: expected '{' but saw '?' [ chars read = >>>?<<< ]
The notable thing there is the 404 error code. Also, the "Cannot parse input stream..." error message is not coming from pyral, it's coming from Rally's server. So pyral is sending Rally something Rally can't understand.
I also logged the response headers, which may be a clue:
{'rallyrequestid': 'qs-app-03ml3akfhdpjk7c430otjv50ak.qs-app-0387404259', 'content-encoding': 'gzip', 'transfer-encoding': 'chunked', 'expires': 'Fri, 20 Jul 2012 19:18:35 GMT', 'vary': 'Accept-Encoding', 'cache-control': 'no-cache,no-store,max-age=0,must-revalidate', 'date': 'Fri, 20 Jul 2012 19:18:36 GMT', 'p3p': 'CP="NON DSP COR CURa PSAa PSDa OUR NOR BUS PUR COM NAV STA"', 'content-type': 'text/javascript; charset=utf-8'}
Note there the 'content-encoding': 'gzip'. I suspect the requests module (I'm using 0.13.3 in Macos Python 2.6) is gzip encoding its PUT request but the Rally API server is not properly decoding that.