Running Splash server and Scrapy spiders on the same EC2 instance - amazon-web-services

I'm deploying a web scraping application made up of Scrapy spiders that scrape content from websites and also take screenshots of web pages using the Splash JavaScript rendering service. I want to deploy the whole application to a single EC2 instance, but for it to work I must run a Splash server from a Docker image at the same time as my spiders. How can I run multiple processes on an EC2 instance? Any advice on best practices would be most appreciated.

Total noob question. I found that the best way to run a Splash server and Scrapy spiders on an EC2 instance, once it is configured, is a bash script scheduled with a cron job. Here is the bash script I came up with:
#!/bin/bash
# Change to proper directory to run Scrapy spiders.
cd /home/ec2-user/project_spider/project_spider
# Activate my virtual environment.
source /home/ec2-user/venv/python36/bin/activate
# Create a shell variable to store date at runtime
LOGDATE=$(date +%Y%m%dT%H%M%S);
# Spin up splash instance from docker image.
sudo docker run -d -p 8050:8050 -p 5023:5023 scrapinghub/splash --max-timeout 3600
# Scrape first site and store dated log file in logs directory.
scrapy crawl anhui --logfile /home/ec2-user/project_spider/project_spider/logs/anhui_spider/anhui_spider_$LOGDATE.log
...
# Spin down splash instance via docker image.
sudo docker rm $(sudo docker stop $(sudo docker ps -a -q --filter ancestor=scrapinghub/splash --format="{{.ID}}"))
# Exit virtual environment.
deactivate
# Send an email to confirm that the cron job finished.
# Note that sending email from EC2 is difficult and you cannot use 'MAILTO'
# in your crontab without setting up something like postfix or sendmail.
# Using Mailgun is an easy way around that.
curl -s --user 'api:<YOURAPIHERE>' \
https://api.mailgun.net/v3/<YOURDOMAINHERE>/messages \
-F from='<YOURDOMAINADDRESS>' \
-F to=<RECIPIENT> \
-F subject='Cronjob Run Successfully' \
-F text='Cronjob completed.'
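One gap in the script above is that docker run -d returns as soon as the container is created, so the first spider may start before Splash is accepting requests. A minimal sketch of a readiness wait follows. The helper name wait_until, the retry counts, and the use of Splash's _ping endpoint on port 8050 are assumptions of mine, not part of the original script.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the probe command, discarding its output.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example (assumes Splash is published on localhost:8050, as in the script):
# wait_until 30 2 curl -sf http://localhost:8050/_ping || { echo "Splash did not start" >&2; exit 1; }
```

Placed between the docker run line and the first scrapy crawl, a call like the commented example would stop the job early instead of letting every spider fail against a Splash server that never came up.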

Related

AWS Glue 3.0 container not working for Jupyter notebook local development

I am working with AWS Glue and trying to test and debug locally. I followed the instructions at https://aws.amazon.com/blogs/big-data/developing-aws-glue-etl-jobs-locally-using-a-container/ to develop a Glue job locally. That post uses the Glue 1.0 image for testing, and it works as it should. However, when I try the same steps with the Glue 3.0 image, I can't open the Jupyter notebook on :8888 as the post describes, even though every step seems correct.
Here is my command to start a Jupyter notebook in the Glue 3.0 container:
docker run -itd -p 8888:8888 -p 4040:4040 -v ~/.aws:/root/.aws:ro --name glue3_jupyter amazon/aws-glue-libs:glue_libs_3.0.0_image_01 /home/jupyter/jupyter_start.sh
Nothing shows up at http://localhost:8888.
I still have no idea why. I understand there are differences between Glue versions; I just want to develop and test on the latest one. Has anybody run into the same issue?
Thanks.
It seems that the Glue 3.0 image has some issues with SSL. A workaround for working locally is to disable SSL (you also have to change the script paths, as the documentation has not been updated).
$ docker run -it -p 8888:8888 -p 4040:4040 -e DISABLE_SSL="true" \
-e AWS_ACCESS_KEY_ID=$(aws --profile default configure get aws_access_key_id) \
-e AWS_SECRET_ACCESS_KEY=$(aws --profile default configure get aws_secret_access_key) \
-e AWS_DEFAULT_REGION=$(aws --profile default configure get region) \
--name glue_jupyter amazon/aws-glue-libs:glue_libs_3.0.0_image_01 \
/home/glue_user/jupyter/jupyter_start.sh
After a few seconds you should have a working Jupyter notebook instance running at http://127.0.0.1:8888

hyperledger fabric - stuck at creating cli

I am building a Hyperledger Fabric network, following this tutorial: http://hyperledger-fabric.readthedocs.io/en/latest/build_network.html
I have changed the following files to add two more peers to organisation 1 only:
configtx.yaml
crypto-config.yaml
docker-compose-cli.yaml
docker-compose-couch.yaml
docker-compose-e2e.yaml
docker-compose-e2e-template.yaml
docker-compose-base.yaml
When I run the ./byfn.sh -m up command, it gets stuck at the step that creates the cli container (a screenshot was attached to the original post). It doesn't show any error.
I'm trying to add two more peers to the first organisation. Is this the correct way? Am I doing something wrong?
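This also happened to me. What worked for me:
1. Stop and remove the containers: sudo docker stop $(sudo docker ps -a -q) && sudo docker rm $(sudo docker ps -a -q). Be careful: as written this acts on every container on the host, so if you run other containers, remove only the ones related to the Fabric nodes.
2. Reboot your computer and restart the Docker service with either
sudo service docker start or sudo systemctl start docker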
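Because the commands above act on every container on the host, it can help to narrow the list first. The following is a minimal sketch that assumes the containers follow the naming used by the first-network sample (names such as peer0.org1.example.com, orderer.example.com, cli, and dev-peer... chaincode containers). The function name fabric_names and the pattern are assumptions, so adjust them to your own compose files.

```shell
# Hypothetical filter: read container names on stdin and print only those
# that look like Fabric containers. The pattern is an assumption based on
# the default first-network naming, not something Docker or Fabric defines.
fabric_names() {
    grep -E '^(cli|couchdb[0-9]*|orderer\..*|peer[0-9]+\..*|dev-peer.*)$'
}

# Example against a live Docker daemon (not run here):
# sudo docker ps -a --format '{{.Names}}' | fabric_names | xargs -r sudo docker rm -f
```

Printing the filtered names first, before piping them into docker rm, is a cheap way to confirm that nothing unrelated is about to be deleted.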

How to customize the docker run command on Elastic Beanstalk?

Here's the thing: I need to tell Docker not to containerize the container's networking, because it needs to connect to a MongoDB instance that is inside a VPN (an enterprise private DB).
There is a Docker option that lets me do exactly that: --net=host. Reference here.
So, for example, when running the container on my local machine, I will do something like:
docker run --rm -it --net=host [image-name]:[version] bash -il
And that command will do the trick. Thanks to that, I can connect to the "private" MongoDB.
So, my question is: is there a way to customize the docker run command of a Single Docker Environment on Elastic Beanstalk so I can add --net=host?
I have tried using container_commands in the config.yml file to add that instruction, but I don't think that does what I need. Here is a snippet:
container_commands:
  00-test_command:
    command: bundle exec thin --net=host
  01-networking-fix:
    command: "docker run --rm -it --net=host [image-name]:[version] bash -il"
I ended up fixing it with two container commands
container_commands:
  00_fix_networking:
    command: sed -i 's/docker run -d/docker run --net=host -d/' /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh
  01_fix_docker_ip:
    command: sed -i 's/server $EB_CONFIG_NGINX_UPSTREAM_IP/server localhost/' /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh
Update:
I also had to fix the Upstart script. Unfortunately, I didn't write down what I did because I didn't end up needing to alter the docker run command. You would do a files directive for (I think) /etc/init/docker. AWS edits the Nginx configuration in the same manner as in 01flip.sh in that file as well.
Explanation:
In the 64bit Amazon Linux 2015.03 v2.0.2 running Docker 1.7.1 platform version, the file you need to edit is /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh. This file is now far more complex than Samar's version so I didn't want to put the actual contents in there. However, the change is basically the same. There's the line that starts with
docker run -d
I fixed it with a container command:
container_commands:
  00_fix_networking:
    command: sed -i 's/docker run -d/docker run --net=host -d/' /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh
This successfully adds the --net=host argument but now there's another problem. The system ends up with an invalid Nginx directive. Using --net=host means that when you run docker inspect <container id> there is no IP address in the NetworkSettings. AWS uses this to create the server directive for Nginx and ends up generating server :<some port you chose> (before adding --net=host it would look like server <ip>:<port>). I needed to patch that file, too. It's generated in /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh.
  01_fix_docker_ip:
    command: sed -i 's/server $EB_CONFIG_NGINX_UPSTREAM_IP/server localhost/' /opt/elasticbeanstalk/hooks/appdeploy/enact/01flip.sh
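Before relying on a sed patch like the ones above, it can be worth rehearsing it on a scratch copy. The sketch below applies the same substitution to a temporary file and shows that running it twice is harmless, because the pattern docker run -d no longer matches once --net=host has been inserted. The function name patch_run_script and the sample line are illustrative assumptions; the real 00run.sh is more complex.

```shell
# Hypothetical wrapper around the substitution used in the container command.
# Uses GNU sed's in-place flag, as found on Amazon Linux.
patch_run_script() {
    sed -i 's/docker run -d/docker run --net=host -d/' "$1"
}

# Rehearse on a scratch file instead of the real hook script.
scratch=$(mktemp)
echo 'docker run -d "$IMAGE"' > "$scratch"
patch_run_script "$scratch"
patch_run_script "$scratch"   # second run finds nothing left to replace
cat "$scratch"
```

This matters on Elastic Beanstalk because container commands can run again on later deployments to the same instance, so a patch that stacked up duplicate flags would eventually break the hook.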
While Elastic Beanstalk is generally well suited to applications that work with a standard set of configurations, it's difficult to customize and to keep customizations working as AWS updates the EB stacks. Having said that, I've done something like the below, which is a bit hacky but works fine.
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/04run.sh":
    mode: "000755"
    owner: root
    group: root
    encoding: plain
    content: |
      #script content of original 04run.sh along with modification on docker run cmd
      # eg. I injected multi-ports here
      docker run -d \
        "${EB_CONFIG_DOCKER_ENV_ARGS[@]}" \
        "${EB_CONFIG_DOCKER_VOLUME_MOUNTS[@]}" \
        "${EB_CONFIG_DOCKER_ENTRYPOINT_ARGS[@]}" \
        "${PORT_ARGS[@]}" \
        $EB_CONFIG_DOCKER_IMAGE_STAGING \
        "${EB_CONFIG_DOCKER_COMMAND_ARGS[@]}" 2>&1 | tee /tmp/docker_run.log | tee $EB_CONFIG_DOCKER_STAGING_APP_FILE
This is not very neat, and I have to make sure that it does not break with updates to Elastic Beanstalk. The above is for the Docker 1.5 stack, but you can do something similar with the version you're running.
Note that the latest version of the AWS stack (with Docker 1.7.1) has a slightly different pre-deploy setup. You'll need to update the file at the location: /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh
commands:
  00001_add_privileged:
    cwd: /tmp
    command: 'sed -i "s/docker run -d/docker run --privileged -d/" /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh'
or, for example, if you want to pass args to your Docker image:
commands:
  00001_modify_docker_run:
    cwd: /tmp
    command: 'sed -i "s/\$EB_CONFIG_DOCKER_IMAGE_STAGING/\$EB_CONFIG_DOCKER_IMAGE_STAGING -gzip -enable-url-source/" /opt/elasticbeanstalk/hooks/appdeploy/enact/00run.sh'

AWS EB, Play Framework and Docker: Application Already running

I am running a Play 2.2.3 web application on AWS Elastic Beanstalk, using sbt's ability to generate Docker images. Uploading the image from the EB administration interface usually works, but sometimes it gets into a state where I consistently get the following error:
Docker container quit unexpectedly on Thu Nov 27 10:05:37 UTC 2014:
Play server process ID is 1 This application is already running (Or
delete /opt/docker/RUNNING_PID file).
The deployment then fails. The only way I can get out of this is to terminate the environment and set it up again. How can I prevent the environment from getting into this state?
Sounds like you may be running into the infamous PID 1 issue. Docker uses a new PID namespace for each container, which means the first process gets PID 1. PID 1 is a special ID that should be used only by processes designed for it. Could you try using Supervisord instead of having Play Framework run as the primary process and see if that resolves your issue? Hopefully, Supervisord handles Amazon's termination commands better than Play does.
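The error message itself points at /opt/docker/RUNNING_PID. One workaround, separate from the Supervisord suggestion, is to delete a leftover PID file before the server starts. This is a minimal sketch that assumes the image starts through a small wrapper script; the function name clear_stale_pid and the binary path in the comment are hypothetical.

```shell
# Hypothetical entrypoint helper: remove a PID file left by a previous run.
# When this runs at container start, no Play process exists yet, so any
# RUNNING_PID file found at that point is stale.
clear_stale_pid() {
    pid_file=$1
    if [ -f "$pid_file" ]; then
        echo "Removing stale PID file: $pid_file" >&2
        rm -f "$pid_file"
    fi
}

# In the container's start script (paths are assumptions):
# clear_stale_pid /opt/docker/RUNNING_PID
# exec /opt/docker/bin/<your app name>
```

Using exec for the final line replaces the wrapper shell with the application process, so signals sent to the container reach the application directly.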
@dkm I was having the same issue with my dockerized Play app. I package my apps as standalone for production using the $ sbt clean dist commands. This produces a .zip file that you can deploy to some folder in your Docker container, such as /var/www/xxxx.
Get a bash shell into your container: $ docker run -it <your image name> /bin/bash
Example: docker run -it centos/myapp /bin/bash
Once the app is there, you'll have to create an executable bash script. I called mine startapp, and its contents should be something like this:
Create the script file in the docker container:
$ touch startapp && chmod +x startapp
$ vi startapp
Add the execute command & any required configurations:
#!/bin/bash
/var/www/<your app name>/bin/<your app name> -Dhttp.port=80 -Dconfig.file=/var/www/pointflow/conf/<your app conf. file>
Save the startapp script. Then, from a new terminal, commit your changes to your container's image so they will be available from here on out:
Get the running container's current ID:
$ docker ps
Commit/Save the changes
$ docker commit <your running containerID> <your image's name>
Example: docker commit 1bce234 centos/myappsname
Now for the grand finale: you can docker stop the running container or exit out of its bash shell. Next, start the Play app using the following docker command:
$ docker run -d -p 80:80 <your image's name> /bin/sh startapp
Example: docker run -d -p 80:80 centos/myapp /bin/sh startapp
Run docker ps to see if your app is running. You should see something similar to this:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
19eae9bc8371 centos/myapp:latest "/bin/sh startapp" 13 seconds ago Up 11 seconds 0.0.0.0:80->80/tcp suspicious_heisenberg
Open a browser and visit your new dockerized app
Hope this helps...

Amazon Elastic Beanstalk - Change Timezone

I'm running an EC2 instance through AWS Elastic Beanstalk. Unfortunately it has the incorrect timezone: it's 2 hours earlier than it should be, because the timezone is set to UTC. What I need is GMT+1.
Is there a way to set up the .ebextensions configuration to force the EC2 instance to use the right timezone?
Yes, you can.
Just create a file /.ebextensions/00-set-timezone.config with the following content:
commands:
  set_time_zone:
    command: ln -f -s /usr/share/zoneinfo/Australia/Sydney /etc/localtime
This assumes you are using the default Amazon Linux AMI image. If you use some other Linux distribution, just change the command to whatever that distribution requires to set the timezone.
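A mistyped zone name makes ln -f -s create a dangling link without reporting any error. A small guard is sketched here, under the assumption that zone files live under /usr/share/zoneinfo as on Amazon Linux. The function name zone_exists and its optional second argument are illustrative.

```shell
# Hypothetical check: succeed only if the named zone file exists.
# $1 = zone name (e.g. Australia/Sydney)
# $2 = zoneinfo root directory (optional, defaults to /usr/share/zoneinfo)
zone_exists() {
    zone=$1
    root=${2:-/usr/share/zoneinfo}
    [ -n "$zone" ] && [ -f "$root/$zone" ]
}

# Example:
# zone_exists Australia/Sydney && ln -f -s /usr/share/zoneinfo/Australia/Sydney /etc/localtime
```

The same check could be chained in front of the ln command inside the .config file, so a typo leaves the existing /etc/localtime untouched.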
This is a response from AWS Business Support, and it works!
---- Original message ----
How can I change the timezone of an environment, or rather of the instances of the environment, in Elastic Beanstalk to UTC/GMT -3 hours (Buenos Aires, Argentina)?
I'm currently using Amazon Linux 2016.03. Thanks in advance for your help.
Regards.
---------- Response ----------
Hello, Thank you for contacting AWS Support regarding modifying your Elastic Beanstalk instances' time zone to use UTC/GMT -3 hours (Buenos Aires, Argentina). Please see below for the steps to perform this modification.
The below example shows how to modify timezone for Elastic Beanstalk environment using .ebextensions for Amazon Linux OS:
Create .ebextensions folder in the root of your application
Create a .config file for example 00-set-timezone.config file and add the below content in yaml formatting.
container_commands:
  01changePHP:
    command: sed -i '/PHP_DATE_TIMEZONE/ s/UTC/America\/Argentina\/Buenos_Aires/' /etc/php.d/environment.ini
  01achangePHP:
    command: sed -i '/aws.php_date_timezone/ s/UTC/America\/Argentina\/Buenos_Aires/' /etc/php.d/environment.ini
  02change_AWS_PHP:
    command: sed -i '/PHP_DATE_TIMEZONE/ s/UTC/America\/Argentina\/Buenos_Aires/' /etc/httpd/conf.d/aws_env.conf
  03php_ini_set:
    command: sed -i '/date.timezone/ s/UTC/America\/Argentina\/Buenos_Aires/' /etc/php.ini
commands:
  01remove_local:
    command: "rm -rf /etc/localtime"
  02link_Buenos_Aires:
    command: "ln -s /usr/share/zoneinfo/America/Argentina/Buenos_Aires /etc/localtime"
  03restart_http:
    command: sudo service httpd restart
Deploy application to Elastic Beanstalk including the .ebextensions and the timezone will change as per the above.
I hope that helps
Regards!
If you are running Windows in your EB environment:
Create a folder named .ebextensions in the root of your project.
Inside that folder, create a file named timezone.config.
In that file, add the following:
commands:
  set_time_zone:
    command: tzutil /s "Central Standard Time"
Set the time zone as needed.
I'm using a custom .ini file in the php.d folder along with the regular recommendations from http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html#change_time_zone:
The sed command inserts (rewrites) only the first line of /etc/sysconfig/clock, since the second line (UTC=true) should be left alone, per the above AWS documentation.
# .ebextensions/02-timezone.config
files:
  /etc/php.d/webapp.ini:
    mode: "000644"
    owner: root
    group: root
    content: |
      date.timezone="Europe/Amsterdam"
commands:
  01_set_ams_timezone:
    command:
      - sed -i '1 s/UTC/Europe\/Amsterdam/g' /etc/sysconfig/clock
      - ln -sf /usr/share/zoneinfo/Europe/Amsterdam /etc/localtime
Changing the time zone of EC2 with Elastic Beanstalk is simple:
Create a .ebextensions folder in the root of your application.
Add a file whose name ends with .config (for example, timezone.config).
Inside the file, add:
container_commands:
  time_zone:
    command: ln -f -s /usr/share/zoneinfo/America/Argentina/Buenos_Aires /etc/localtime
Then you are done.
Note that container_commands is different from commands. The documentation states:
commands run before the application and web server are set up and
the application version file is extracted.
That is why your time zone command doesn't work: the server hasn't started yet.
container_commands run after the application and web server have been
set up and the application version file has been extracted, but before
the application version is deployed.
If you are running a Java/Tomcat container, just put the JVM option in the configuration:
-Duser.timezone=America/Sao_Paulo
Possible values: timezones
Moving to AWS Linux 2 was challenging. It took me a while to work out how to do this easily in .ebextensions.
I wrote up the simple solution in another Stack Overflow question, but for anyone needing instant gratification, add the following commands to the file .ebextensions/xxyyzz.config:
container_commands:
  01_set_bne:
    command: "sudo timedatectl set-timezone Australia/Brisbane"
  02_restart_crond:
    command: "sudo systemctl restart crond.service"
These workarounds only fix the timezone for applications. When you have system services such as cron running, they look at /etc/sysconfig/clock, and that is always UTC. If you tail the cron logs or the aws-sqsd logs, you will notice that the timestamps are still 2 hours behind (in my case). A change to the clock setting would need a reboot in order to take effect, which is not an option if you have autoscaling in place or want to use ebextensions to change the system clock's config.
Amazon is aware of this issue and I don't think they have resolved it yet.
If your EB application is using the Java/Tomcat container, you can add the JVM timezone Option to the Procfile configuration. Example:
web: java -Duser.timezone=Europe/Berlin -jar application.jar
Make sure to add all configuration options before the -jar option, otherwise they are ignored.
For PHP, I added the following to a config file in .ebextensions:
container_commands:
  00_changePHP:
    command: sed -i '/;date.timezone =/c\date.timezone = \"Australia/Sydney\"' /etc/php.ini
  01_changePHP:
    command: sed -i '/date.timezone = UTC/c\date.timezone = \"Australia/Sydney\"' /etc/php.d/aws.ini
  02_set_tz_AEST:
    command: "sudo timedatectl set-timezone Australia/Sydney"
  03_restart_crond:
    command: "sudo systemctl restart crond.service"
commands:
  01remove_local:
    command: "rm -rf /etc/localtime"
  02change_clock:
    command: sed -i 's/\"UTC\"/\"Australia\/Sydney\"/g' /etc/sysconfig/clock
  03link_Australia_Sydney:
    command: "ln -f -s /usr/share/zoneinfo/Australia/Sydney /etc/localtime"
    cwd: /etc
Connect to the instance (Amazon Linux AMI) via PuTTY or SSH and execute the commands below:
sudo rm /etc/localtime
sudo ln -sf /usr/share/zoneinfo/Europe/Istanbul /etc/localtime
sudo reboot
In short, the procedure above does the following:
remove localtime,
update the timezone,
reboot.
Please note that I've changed my timezone to Turkey's local time; you can find your timezone by listing the zoneinfo directory with the command below:
ls /usr/share/zoneinfo
or just check the timezone abbreviations on Wikipedia:
http://en.wikipedia.org/wiki/Category:Tz_database
You can also check out the related Amazon AWS documentation;
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
Note: I'm not sure whether this is best practice (probably not), but I've applied the procedure above and it's working for me.