I want to set up a large scale scraping service. At the moment I use the scrapyrt api (https://scrapyrt.readthedocs.io/en/latest/api.html#) but I would like to run my spiders asynchronously so my crawls can scale up. I heard that scrapyd was the tool of choice for that, however, I would still like to be able to use scrapyrt for debugging purpose. Is it possible to have scrapyd and scrapyrt running next to each other in the same Docker container and have them callable with their own API.
Also when deploying scrapyd on ecs, I would like to have my scrapyd project deployed on each ecs instance I am using instead of having to deploy them manually with scrapyd-client everyt time a new instance is created. Is it something that is easily doable with scrapyd.
Cheers
Related
I've made a pretty simple web application which has a REST API backend service written in Python/Django and a FE service written in JS/React. Both parts are containerized and can be launched locally via docker-compose. They are situated in separate github repositories, and every time a new tag is pushed to the corresponding repo, an image is built and pushed to the corresponding ecr repo via github actions. Until that point everything works smoothly, but the problem is that, I don't know how to properly automate the deployment process to the test and production environments. The goal is to have those environments as similar as possible.
My current solution for test env is to simply upload docker-compose file to the ec2 instance via github actions, then manually run the docker-compose command, which pulls images from ecr.
Even though it's a simple solution, it's not very scalable or automated and requires some work to be done in order to update an application. The desired flow is to have a manual GitHub action in each repository, which would deploy either BE or FE to the test or prod environment without any need to ssh into the server and do any other manipulations with docker.
I was looking at ECS, but it seems that it's a solution for bigger apps, which need several or more instances to run. I want my app to be used by many users, but I'm not sure, when it will happen. So maybe i should stick to something less complicated than ECS. Are there any simpler solutions, which i am missing? Like Elastic beanstack or something from any other provider?
I will be happy to receive a feedback on anything I wrote in this post, thanks!
As we move our workloads to AWS I am looking for an ETL tool which is widely used and has the appropriate connectors - Apache Camel appears to fit the bill. However, I am struggling to find information on how Camel can be deployed in AWS - the obvious one is on an EC2 instance, but we would like to avoid the setup and maintenance required by Virtual Machines. I don't see anyone offering it as a managed service, so the option I'd like to explore is running it as a container in ECS, as we will have numerous other containers running.
Containers don't seem to be an installation option on the Apache Camel website - perhaps it is just too limiting for a tool whose purpose is to connect to everything else?
Is it acceptable and practical to run Camel in a container, and where could I find more information about it?
Apache Camel appears to fit the bill.
Indeed the Apache Camel is a great integration framework. And that's the point. It is a framework, not a product. So there are multiple ways to run the Camel flows. As a web app, as a standalone app, as a part if our own code. Camel itself is pretty agnostic to the way you run the flows and that's why you don't have very specific way enforced in the web site.
If you want an out of box product, which can generate containerized deployments with Apache Camel, you could have a look at Apache ServiceMix, Apache Karaf or it's supported RedHat Fuse.
Is it acceptable and practical to run Camel in a container, and where could I find more information about it?
It is perfectly fine.
Question: Can you (are you able) to create a docker container with your (any other) application?. Based on the question this skill is lacking and I really suggest to learn it.
You may check folowing post https://medium.com/#wkrzywiec/how-to-put-your-java-application-into-docker-container-5e0a02acdd6b
FROM java:8-jdk-alpine
COPY ./target/myapp.jar /usr/app/myapp.jar
WORKDIR /usr/app
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "myapp.jar"]
Let's assume you can run your ETL tasks as a standalone application, then just run it in the container as any other standalone application.
it we would like to avoid the setup and maintenance required by Virtual Machines
Question: how do you distribute your camel tasks? I mean - what is result of your build? A war file? A standalone app?
To build a web app you could see https://www.baeldung.com/spring-apache-camel-tutorial
The most convenient way to deploy a war file in AWS is AWS Beanstalk service.
If you build a standalone application (or use servicemix) and you can build a container, then indeed ECS or Fargate seem as natural options.
This is my first time deploying an application. I have some idea about it but I am not sure if it is correct. How do I go about deploying a play application on google cloud?
1) I have created a package using dist command. I have the zip file now on my local pc. https://www.playframework.com/documentation/2.5.x/Deploying
2) Do I first need to create a compute resource on gcp? What configuration shall I use for the vm? My app is still in test phase so there are no external users at the moment
3) I suppose play uses netty web server. So do I need to install netty on the compute resource? I have looked online a bit but can't find a resource on how to deploy an application on netty.
deploy an application on netty
Netty is not a web server/application server, but an IO framework which can be used to build web servers or any high-performant IO applications.
If you really want to use netty, you need to write an HTTP server yourself, or just use an HTTP framework built on netty.
If you want to build an application using netty, have a look at the examples on https://github.com/netty/netty/tree/4.1/example/src/main/java/io/netty/example/
Deploying a container to the Cloud using Google Cloud Platform and Kubernetes Engine
Kubernetes is a way of orchestrating containers in the Cloud, enabling you to do things like auto-scale, fast deploys and manage running versions of containers. You simply create a container and upload it to a container repository. In this example I used Google’s Container Registry, it’s really simple to use and works brilliantly with their Kubernetes implementation.
follow this tutorial might help you with this
https://medium.com/beyond/deploying-a-container-to-the-cloud-using-google-cloud-platform-and-kubernetes-engine-10d8ee3aba86
The use-case is this: we need to pull in data from a third-party service and update the database with fresh records, every week. The different ways I have been able to explore have been
either creating a custom django-admin command
or running a background task using Celery (and probably ELK for logging)
I just want to know which way is more feasible and simpler? And if there's another way that I can explore. What I want is monitoring the task for the first few runs then just relying on the logs.
I'm migrating a Django application to Redhat Openshift Online. The application is subject to spikes in demand, so I want to use the Openshift Autoscaling functionality.
To test this, I use Apache JMeter to put a lot of load on the server, to see whether the new gears launch I expect. But I'm encountering bugs with the server scaling up, like my deployment scripts not working as expected, or migrations not occurring correctly on the database. Is there a more convenient way to test the auto-scaling than sending a bunch of requests at the server until haproxy launches a new gear?
You can scale applications up or down using both the web console and the rhc command line tool. You can read more about how to do it here: https://developers.openshift.com/managing-your-applications/scaling.html#managing-application-scaling
Can you provide more details about what scripts/migrations are not working correctly on the newly created gears? You can also feel free to send questions/issues to https://developers.openshift.com/contact.html