Take the following two timings on a trivial SQL statement:
timeit.timeit("""
import MySQLdb;
import settings;
conn = MySQLdb.connect(host=settings.DATABASES['default']['HOST'], port=3306, user=settings.DATABASES['default']['USER'], passwd=settings.DATABASES['default']['PASSWORD'], db=settings.DATABASES['default']['NAME'], charset='utf8');
cursor=conn.cursor();
cursor.execute('select 1');
cursor.fetchone()
""", number=100
)
# 2.5417470932006836
And, the same thing but assuming we already have a cursor that is ready to execute a statement:
timeit.timeit("""
cursor.execute('select 1');
cursor.fetchone()""",
setup="""
import MySQLdb;
import settings;
conn = MySQLdb.connect(host=settings.DATABASES['default']['HOST'], port=3306, user=settings.DATABASES['default']['USER'], passwd=settings.DATABASES['default']['PASSWORD'], db=settings.DATABASES['default']['NAME'], charset='utf8');
cursor=conn.cursor()
""", number=100
)
# 0.1153109073638916
And so we see that the second approach is about 20x faster on initialization time when we don't have to create a new connection/cursor each time.
But how would it be possible to do something like this in a serverless environment? For example, if I were using Google Cloud Functions or Cloud Run, would it be possible to:
Authenticate the user in order to set up a cursor to the database; and
Open a websocket where they can then send the query each time? (For an open websocket, do we need to check authentication on the user each time?)
Or, is there a possible approach to deal with the above overhead in a serverless environment?
AS #John Hanley mentioned, on Cloud Run, your code can either run continuously as a service or as a job. Both services and jobs run in the same environment and can use the same integrations with other services on Google Cloud.
Cloud Run services-Used to run code that responds to web requests, or
events.
Cloud Run jobs. Used to run code that performs work (a job) and quits
when the work is done.
A Cloud Run instance that has at least one open WebSocket connection is considered active and is therefore billed Websocket connection.
WebSockets applications are supported on Cloud Run with no additional configuration required. However, WebSockets streams are HTTP requests still subject to the request timeout,configured for your Cloud Run service, so you need to make sure this setting works well for your use of WebSockets such as implementing reconnects in your clients.
I would also recommend you check links for a socket instance and sql connect functions.
Related
I have a telegram bot, it is written in python (uses the aiogram library), it works on a webhook. I need to process payments for a paid subscription to a bot (I use yoomoney as a payment).
It’s clear how you can do this on Flask: through its request method, catch http notifications that are sent from yoomoney (you can specify a url for notifications in yoomoney, where payment statuses like "payment.succeeded" should come)
In short, Flask is able to check the status of a payment. The bottom line is that the bot is written in aiogram and the bot is launched by the command:
if __name__ == '__main__': try: start_webhook( dispatcher=dp, webhook_path=WEBHOOK_PATH, on_startup=on_startup, on_shutdown=on_shutdown, skip_updates=True, host=WEBAPP_HOST, port=WEBAPP_PORT ) except (KeyboardInterrupt, SystemExit): logger.error("Bot stopped!")
And if you just write in this code the launch of the application on flask in order to listen for answers from yoomoney, then EITHER the commands (of the bot itself) from aiogram will be executed OR the launch of flask, depending on what comes first in the code.
In fact, it is impossible to use flask and aiogram at the same time without multithreading. Is it possible somehow without flask in aiogram to track what comes to my server from another server (yoomoney)? Or how to use the aiogram + flask bundle more competently?
I tried to run flask in multi-threaded mode and the aiogram bot itself, but then an error occurs that the same port cannot be attached to different processes (which is logical).
It turns out it is necessary to change ports or to execute processes on different servers?
I have successfully run Firebase emulator:
E:\firebase>firebase emulators:start
i emulators: Starting emulators: functions, firestore
! Your requested "node" version "8" doesn't match your global version "10"
+ functions: Emulator started at http://localhost:5001
! No Firestore rules file specified in firebase.json, using default rules.
i firestore: Serving ALL traffic (including WebChannel) on http://localhost:808
0
! firestore: Support for WebChannel on a separate port (8081) is DEPRECATED and
will go away soon. Please use port above instead.
i firestore: Emulator logging to firestore-debug.log
+ firestore: Emulator started at http://localhost:8080
i firestore: For testing set FIRESTORE_EMULATOR_HOST=localhost:8080
i functions: Watching "E:\firebase\func
tions" for Cloud Functions...
! functions: Your GOOGLE_APPLICATION_CREDENTIALS environment variable points to
E:\firebase\key.json. Non-emulated serv
ices will access production using these credentials. Be careful!
+ functions[notifyNewMessage]: firestore function initialized.
+ All emulators started, it is now safe to connect.
The functions file with notifyNewMessage function is below:
const functions = require('firebase-functions')
const admin = require('firebase-admin')
admin.initializeApp()
exports.notifyNewMessage = functions.firestore
.document('test/{test}')
.onCreate((docSnapshot, context) => {
console.log(docSnapshot.data())
}
When I create a new document manually in my Firebase console, my CLI in Windows does not log anything. How can I fix this so that it logs what the functions says in my CLI?
I was just being an idiot and had not re-compiled by code.
The local emulator doesn't respond to changes in the Firestore database that's hosted in Google's cloud and visible in the console. What it does respond to are changes in the locally emulated Firestore database also running on your machine. If you want your Firestore function to trigger in the local emulator, you will have to instead make a change to the emulated Firestore, as described in the documentation. You might want to go through the provided quickstart to get some experience with this.
If you don't want to use the Firestore emulator and just want to trigger it directly for testing, you can use the Firebase CLI local shell.
I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'm able to use Cloud Run for this, as it scales really well to the demand, if a thousand messages gets published, I want 100s of Cloud Run instances to spin up. These Run instances get started by a push subscription. The problem is that PubSub has a 600 second timeout for the acknowledgement. This means in order to have Cloud Run process these messages they have to finish within 600 seconds. If they do not, PubSub times it out, and sends it again, causing the task to be restarted until the first task finally does acknowledge it (this causes the same task to be ran many times). Cloud Run acknowledges the messages by returning a 2** HTTP status code. The documentation states
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run hand over the resources? Or is there a better solution I'm unaware of?
Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question.
I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I stick with that.
This project is written in Python.
So preferably I would like to write my Google Cloud Run tasks like this:
#app.route('/', methods=['POST'])
def index():
"""Endpoint for Google Cloud PubSub messages"""
pubsub_message = request.get_json()
logger.info(f'Received PubSub pubsub_message {pubsub_message}')
if message_incorrect(pubsub_message):
return "Invalid request", 400 #use normal NACK handling
# acknowledge message here without returning
# ...
# Do actual processing of the task here
# ...
So how can or should I solve this, so that the the resource-intensive tasks get properly scaled on demand ( so a push PubSub subscription ). And the tasks only get executed once.
Answers:
In short what has been answered. Cloud Run and Functions are just not suited for this problem. There is no way to have them do tasks that take longer than 9 or 15 minutes respectively. The only solution is to switch over to another Google Service and use a pull style subscription and lose out on auto-scaling of GC Run/Functions
Cloud Run on GKE can handle long process, more CPU and memory than available on managed platform. However, you have a GKE cluster always running and you loose the "pay-as-you-use" benefit.
If you want to use this solution, don't link directly PubSub push subscription to your Cloud Run on GKE. Use Cloud Task with HTTP job for this. The timeout is longer than PubSub (up to 24h instead of 10 min) and the retry policies are customizables.
Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.
Give the limits of pubsub acks, you'll probably have to find a way for a client to be able to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.
One of the vm instances on google cloud compute was shutdown, with an event log in stackdriver without ip or actor (user or service or system) which initiated the event. The instance has onHostMaintenance set to migrate and automaticRestart set to true. This particular instance has migrated on maintenance without error before. The stackdriver event log looks like
{
actor: {
user: ""
},
event_subtype: "compute.instances.stop",
event_timestamp_us: "1531781734907624",
event_type: "GCE_API_CALL",
ip_address: "",
}
The user and ip_address fields are NOT redacted. They have empty values on actual log.
is this common? how does one identify the cause for shutdown in these peculiar cases ?
According to your event type, I dive in the doc of Activity logs,
Compute Engine API calls - GCE_API_CALL events are API calls that change the state of a resource.
It seems like someone may use API calls to shutdown your VM. Looking your settings and logs on the VM instance, it can't be a hostError or Maintenance event. As far as you turn on the onHostMaintenance set to migrate and automaticRestart set to true. Your VM will always be migrated to another hardware.
Out of curiosity, did you restart your VM? did it shutdown again? How often does it happen?
Seems like app engine had an outage during that time, incident .
I registered my task app in Spring Cloud Data Flow, created a definition for it and the status shows 'unknown'. I created the stream and trying to launch the task through task-sink and I get an error:
java.lang.IllegalStateException: failed to resolve MavenResource:
How to launch a task from the task-sink? Am I missing something? Any help is appreciated. Another question I have is how do I access the payload sent via TaskLaunchRequest in my task?
S1 http | step1: transformer-rabbit | log
S2 :S1.step1 > filter --expression=payload.contains('CUSTADDRMODRQ_V15') | task-processor | task-sink
task-sink is launching the task provided by the uri in the TaskLaunchRequest. It is looking for the resource as shown in the log
OUT Using manager EnhancedLocalRepositoryManager with priority 10.0 for /home/vcap/.m2/repository
OUT Using transporter HttpTransporter with priority 5.0 for https://repo.spring.io/libs-snapshot and finally failing.
The task is deployed in our repository and as mentioned I registered and created the definition for it as well.
This one is in cf environment and I am using SCDF server 1.0.0.M4.
In the application.properties for the task-sink i am providing maven.remote.repositories.snapshots.url=**
task create fis-ifx-event-task --definition "fis-event-task"
My goal is launching the task from the stream.
Thanks for the information. I am in fact using the BUILD-SNAPSHOT as I am unable to enable taks in 1.0.0M4 version. Here is the one I am using spring-cloud-dataflow-server-cloudfoundry-1.0.0.BUILD-20160808.144306-116. I am able to register and create task definitions. The status of the task definition is showing as 'unknown' even when I am using the sample task module provided by your team. But when I initiate the flow of the stream and when task-sink tries to launch the task, it is unable to find the maven resource. When I create the task definition, does the task module gets deployed? I don't see any app in Pivotal Apps Manager. As mentioned earlier, I provided maven.remote.repositories.snapshot.url in the application.properties file for the task-sink application. Another thing I observed is when I launch the task manually from dataflow shell it gives an error CF-UnprocessableEntity(10008): The request is semantically invalid: Unknown field(s): 'staging_disk_in_mb', 'staging_memory_in_mb' and also a message saying 'Source is empty'. Presently the task is supposed to print the timestamp and is not dependent on any input.
TaskProcessor code:
#EnableBinding(Processor.class)
#EnableConfigurationProperties(TaskProcessorProperties.class)
public class TaskProcessor {
#Autowired
private TaskProcessorProperties processorProperties;
public TaskProcessor() {
}
#Transformer(inputChannel = Processor.INPUT, outputChannel = Processor.OUTPUT)
#ELI(level = "info", eventType = ELIEventType.INBOUND)
public Object setupRequest(String message) {
Map<String, String> properties = new HashMap<String, String>();
properties.put("payload", message);
TaskLaunchRequest request = new TaskLaunchRequest(processorProperties.getUri(), null, properties, null);
return new GenericMessage<>(request);
}
}
TaskSink code:
#SpringBootApplication
#EnableTaskLauncher
#EnableBinding(Sink.class)
#EnableConfigurationProperties(TaskSinkProperties.class)
public class FisIfxEventTaskSinkApplication {
public static void main(String[] args) {
SpringApplication.run(FisIfxEventTaskSinkApplication.class, args);
}
}
I provided the stream I am using earlier in the post. Sink is receiving the TaskLaunchRequest with uri and payload as you can see here and unable to launch the task.
OUT registering [40, java.io.File] with serializer org.springframework.integration.codec.kryo.FileSerializer
2016-08-10T16:08:55.02-0600 [APP/0]
OUT Launching Task for the following resource TaskLaunchRequest{uri='maven://com.xxx:fis.ifx.event-task:jar:1.0-SNAPSHOT', commandlineArguments=[], environmentProperties={payload={"statusCode":0,"fisT
opic":"CustomerDataUpdated","payloadId":"CUSTADDRMODR``Q_V15","customerIds":[1597304]}}, deploymentProperties={}}
Before I begin, you have a number of questions here. In the future, it's better to break them up into multiple questions so that they are easier to find by other users and easier to answer. That being said:
A little context on the current state of things
In order to understand how things will work, it's important to understand the current state of things. The current releases of the software involved are:
Pivotal Cloud Foundry (PCF) - 1.7.12. This version is required for any task support.
Spring Cloud Task (SCT) - 1.0.2.RELEASE
Spring Cloud Data Flow CF (SCDF) - 1.0.0.BUILD-SNAPSHOT (current as of the date of this post).
Currently PCF 1.7.12+ has all the capabilities to run tasks. You can create v3 applications (the type of application used to launch a task), run it as a task, etc. However, the tooling around that functionality is not currently complete. There is no support for v3 applications in Apps Manager or the CLI. There is a plugin for the CLI that is more of a dev tool that can be used to help with some functions (it will show you logs, etc), but it is not fully functional and requires a specific version of the CLI to work [1]. This is one of the reasons that the task functionality within PCF is still considered experimental.
Spring Cloud Task is currently GA and supports all the functionality needed to effectively run tasks on CF. However, it's important to note that SCT doesn't handle orchestration so the actual launching of tasks on CF is the responsibility of either the user, or Spring Cloud Data Flow (the easier route).
Spring Cloud Data Flow's Cloud Foundry server implementation currently has functionality to launch tasks on PCF in the latest snapshots. We have validated this against 1.7.12 as well as the development branch of 1.8.
The task workflow within SCDF
Tasks are fundamentally different from stream applications within the context of SCDF. When you create a stream definition, you are given the option to deploy it. What this does is it actually downloads the Spring Boot über jars and deploys them to PCF as long running processes. If they go down, PCF, will relaunch them as expected, etc.
Tasks on the other hand, are not deployed. They are launched. The difference is that while you create a task definition, there is nothing deployed until you click launch. And when the task completes, the software is shut down and cleaned up. So while a stream definition may have states, it's really a one to one relationship between the definition and the deployed software. Where with a task, you can launch a task definition as many times as you want.
Your issues
Reading through your post, I see a few things that you are struggling with. Let me see if I can help:
Task Definitions within SCDF and launching them via a stream - When launching a task from a stream, the task registry within SCDF is not used. The sink expects the URL for the resource to be within the TaskLauchRequest.
Apps Manager and tasks - As mentioned above, there is no support for v3 applications in Apps Manager yet so you won't be able to see your tasks there.
Viewing the logs - In order to debug what's going wrong with launching your task on CF, you're going to want to view the logs. To do so, use the v3 CLI plugin mentioned above to view them. It's important to note that you can only tail live logs with the plugin, not view logs that have previously been rendered. Because of that, when testing, you'll want to tail the logs as soon as the app is created, before it's launched.
Error in SCDF Shell - The error you received from the SCDF shell (CF-UnprocessableEntity(10008):...) leads me to wonder if you have both the correct version of PCF (1.7.12+) and the correct version of the following other libraries:
spring-cloud-deployer-cloudfoundry - The latest snapshots
cf-java-client - 2.0.0.M10+
reactor-core - 3.0.0.RC1+
I hope this helps!
[1] https://github.com/cloudfoundry/v3-cli-plugin
Task support is not available in 1.0.0.M4 release of SCDF's CF-server. In this release, the task commands/REST-APIs should be disabled - see here. And for that reason, you wouldn't see any docs related to Tasks in the 1.0.0.M4 reference guide.
That said, the Task support is available/enabled in the BUILD-SNAPSHOT release. If you're locally building the CF-server and upon pushing it to CF, you could take advantage the task commands in the shell to create and launch task definitions.