Nextflow sarek pipeline on AWS batch - amazon-web-services

How to run nextflow sarek pipeline using aws batch in cloud9 environment?
I tried and I am getting "
Essential container in task exited

In order to be able to run jobs using AWS Batch, Nextflow requires access to the AWS CLI (i.e. aws) from within each of the containers that the pipeline has specified. To do this, you will need to create a custom AMI and use Conda (or another package manager) to install the AWS CLI tool. Ensure that your AMI also has Docker installed, see: docker installation.
The reason is that when the AWS CLI tool executes using Conda it will
use the version of python supplied by Conda. If you don’t use Conda
and install the AWS CLI using something like pip the aws command will
attempt to run using the version of python found in the running
container which won’t be able to find the necessary dependencies.
In your IAM settings, create an ecsInstanceRole and attach the AmazonS3FullAccess and AmazonEC2ContainerServiceforEC2Role policies. Then, when configuring a Compute Environment for AWS Batch, you will need to specify this instance role in step 1. Make sure to also supply the custom AMI ID (created above) when configuring the instance (under additional configuration) in step 2. You can then create a Job Queue and attach the compute environment to it. Finally, create an S3 bucket to write the results to.
Then get started with Cloud9 by creating and opening up an environment. The first job is to install Nextflow and move it to somewhere in your $PATH:
$ curl -s https://get.nextflow.io | bash
$ mkdir ~/bin && mv nextflow ~/bin
Then, with the following in ~/.nextflow/config for example:
plugins {
id 'nf-amazon'
}
process {
executor = 'awsbatch'
queue = 'test-queue'
errorStrategy = 'retry'
maxRetries = 3
}
aws {
batch {
cliPath = '/home/ec2-user/miniconda/bin/aws'
}
region = 'us-east-1'
}
Test the pipeline:
$ nextflow run nf-core/sarek \
-ansi-log false \
-revision 3.1.1 \
-profile test \
-work-dir s3://mybucket/work \
--outdir s3://mybucket/results
Results:
N E X T F L O W ~ version 22.10.3
Pulling nf-core/sarek ...
downloaded from https://github.com/nf-core/sarek.git
Launching `https://github.com/nf-core/sarek` [chaotic_cray] DSL2 - revision: 96749f7421 [3.1.1]
------------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~'
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
____
.´ _ `.
/ |\`-_ \ __ __ ___
| | \ `-| |__` /\ |__) |__ |__/
\ | \ / .__| /¯¯\ | \ |___ | \
`|____\´
nf-core/sarek v3.1.1
------------------------------------------------------
Core Nextflow options
revision : 3.1.1
runName : chaotic_cray
launchDir : /home/ec2-user
workDir : /mybucket/work
projectDir : /home/ec2-user/.nextflow/assets/nf-core/sarek
userName : ec2-user
profile : test
configFiles : /home/ec2-user/.nextflow/config, /home/ec2-user/.nextflow/assets/nf-core/sarek/nextflow.config
Input/output options
input : /home/ec2-user/.nextflow/assets/nf-core/sarek/tests/csv/3.0/fastq_single.csv
outdir : s3://mybucket/results
Main options
split_fastq : 0
intervals : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.interval_list
tools : strelka
Reference genome options
genome : null
dbsnp : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/dbsnp_146.hg38.vcf.gz
fasta : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/genome.fasta
germline_resource : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/gnomAD.r2.1.1.vcf.gz
known_indels : https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/genome/vcf/mills_and_1000G.indels.vcf.gz
snpeff_db : WBcel235.105
snpeff_genome : WBcel235
snpeff_version : 5.1
vep_genome : WBcel235
vep_species : caenorhabditis_elegans
vep_cache_version : 106
vep_version : 106.1
igenomes_base : s3://ngi-igenomes/igenomes
igenomes_ignore : true
Institutional config options
config_profile_name : Test profile
config_profile_description: Minimal test dataset to check pipeline function
Max job request options
max_cpus : 2
max_memory : 6.5GB
max_time : 8.h
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/sarek for your analysis please cite:
* The pipeline
https://doi.org/10.12688/f1000research.16665.2
https://doi.org/10.5281/zenodo.4468605
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/sarek/blob/master/CITATIONS.md
------------------------------------------------------
WARN: There's no process matching config selector: .*:FREEC_SOMATIC -- Did you mean: FREEC_SOMATIC?
WARN: There's no process matching config selector: .*:FILTERVARIANTTRANCHES -- Did you mean: FILTERVARIANTTRANCHES?
WARN: There's no process matching config selector: NFCORE_SAREK:SAREK:CRAM_QC_NO_MD:SAMTOOLS_STATS -- Did you mean: NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS?
[0a/34e54c] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:GATK4_INTERVALLISTTOBED (genome)
[68/90b2eb] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_DBSNP (dbsnp_146.hg38.vcf)
[58/00228d] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:SAMTOOLS_FAIDX (genome.fasta)
[87/c64131] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY (genome.fasta)
[91/5140a7] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:BWAMEM1_INDEX (genome.fasta)
[a2/823190] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED (genome.interval_list)
[c2/b42dd9] Submitted process > NFCORE_SAREK:SAREK:PREPARE_GENOME:TABIX_KNOWN_INDELS (mills_and_1000G.indels.vcf)
Staging foreign file: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz
[87/cb0449] Submitted process > NFCORE_SAREK:SAREK:FASTQC (test-test_L1)
[f4/86267b] Submitted process > NFCORE_SAREK:SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_SPLIT (chr22_1-40001)
[eb/dea090] Submitted process > NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP:BWAMEM1_MEM (test)
[4c/f5096d] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:GATK4_MARKDUPLICATES (test)
[b4/ebcc15] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:INDEX_MARKDUPLICATES (test)
[c0/8de864] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS (test)
[be/d73b9d] Submitted process > NFCORE_SAREK:SAREK:BAM_MARKDUPLICATES:CRAM_QC_MOSDEPTH_SAMTOOLS:MOSDEPTH (test)
[68/acdf3e] Submitted process > NFCORE_SAREK:SAREK:BAM_BASERECALIBRATOR:GATK4_BASERECALIBRATOR (test)
[79/cff52c] Submitted process > NFCORE_SAREK:SAREK:BAM_APPLYBQSR:GATK4_APPLYBQSR (test)
[5b/cde6db] Submitted process > NFCORE_SAREK:SAREK:BAM_APPLYBQSR:CRAM_MERGE_INDEX_SAMTOOLS:INDEX_CRAM (test)
[20/d44d7e] Submitted process > NFCORE_SAREK:SAREK:CRAM_QC_RECAL:SAMTOOLS_STATS (test)
[99/f6362e] Submitted process > NFCORE_SAREK:SAREK:CRAM_QC_RECAL:MOSDEPTH (test)
[0f/892e88] Submitted process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SINGLE_STRELKA:STRELKA_SINGLE (test)
[69/ca112a] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:BCFTOOLS_STATS (test)
[82/2d90d6] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_COUNT (test)
[cd/5be221] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_TSTV_QUAL (test)
[b8/142b75] Submitted process > NFCORE_SAREK:SAREK:VCF_QC_BCFTOOLS_VCFTOOLS:VCFTOOLS_SUMMARY (test)
[25/397520] Submitted process > NFCORE_SAREK:SAREK:CUSTOM_DUMPSOFTWAREVERSIONS (1)
[f6/a9cc92] Submitted process > NFCORE_SAREK:SAREK:MULTIQC
Waiting for file transfers to complete (1 files)
-[nf-core/sarek] Pipeline completed successfully-

Related

AWS SSM RunCommand - Issue with RunRemoteScript Document to run PowerShell script with parameters

In AWS SSM, I use RunRemoteScript document to run a PowerShell script to install some software on SSM managed instances. The script is hosted in a public accessible S3 bucket.
The RunCommand works fine with the script not taking any parameters. Software was successfully deployed to managed instances. But my script has a unique CID embedded in the code. For security reasons, I need to take it out and set it as a parameter for the PS script. Ever since then, the RunCommand just keeps failing.
My script looks like below (with parameter CID):
param (
[Parameter(Position = 0, Mandatory = 1)]
[string]$CID
)
Start-Transcript -Path "$([System.Environment]::GetEnvironmentVariable('TEMP','Machine'))\app_install.log" -Append
function Install-App {
<#
Installs App
#>
[CmdletBinding()]
[OutputType([PSCustomObject])]
param (
[Parameter(Position = 0, Mandatory = 1)]
[string]$msiURL,
[Parameter(Position = 2, Mandatory = 1)]
[string]$InstallCheck,
[Parameter(Position = 3, Mandatory = 1)]
[string]$CustomerID
)
if ( -not(Test-Path $installCheck)) {
# Do stuff
...
}
else {
Write-Host ("$installCheck - Already Installed")
Return "Already Installed, Skipped $(($msiURL -split '([^\\/]+$)')[1])"
}
}
Install-App -msiURL "https://s3.amazonaws.com/app.foo.com/Windows/app.exe" -InstallCheck "C:\Program Files\App\app.exe" -CustomerID $CID
Stop-Transcript
By following AWS SSM documentation below, I run the command below to kick off the RunCommand.
https://docs.aws.amazon.com/systems-manager/latest/userguide/integration-remote-scripts.html
aws ssm send-command --document-name "AWS-RunRemoteScript" --targets "Key=instanceids,Values=mi-abc12345"
--parameters '{"sourceType":["S3"],"sourceInfo":["{\"path\": "https://s3.amazonaws.com/app.foo.com/Windows/app_install.ps1\"}"],"commandLine":["app_install.ps1 abcd123456"]}'
The RunCommand keeps failing with error below:
----------ERROR-------
app_install.ps1 : The term 'app_install.ps1' is not recognized
as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is
correct and try again.
At C:\ProgramData\Amazon\SSM\InstanceData\mi-abcd1234\document\orchest
ration\a6811111d-c411-411-a222-bad123456\runPowerShellScript\_script.ps1:4
char:2
+ app_install.ps1 abcd123456
+ ~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (app_install.ps1:String)
[], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
failed to run commands: exit status 255
I suspect this is to do with the way how RunCommand handles the argument for the PowerShell script. But I cannnot find any examples other than the official document, which I followed. Anyone can point out what the issue is here?
BTW, I already tried putting the ps1 after ".\" without luck.
I found out the cause of the issue. The IAM role attached to the instance did not have sufficient rights to access the S3 bucket holds the script. As a result SSM wasn't able to download the script to the instance, hence the error "...ps1 is not recognized".
So it's not related to the code actually.

Serverspec test fail when i run its pipeline from other pipeline

I'm running 3 pipelines in jenkins (CI, CD, CDP) when I run the CI pipe the final stage is a trigger for activate the pipe CD (Continuous Deployment), this receive a parameter APP_VERSION from CI (Continuous Integration) PIPE and deploy an instance with packer and run SERVERSPEC TEST, but serverspec test failed.
but the demo-app is installed via salstack
The strange is when I run the CD and pass the parameter APP_VERSION manually this WORK !!
this is the final stage for pipeline CI
stage "Trigger downstream"
echo 'parametro'
def versionApp = sh returnStdout: true, script:"echo \$(git rev-parse --short HEAD) "
build job: "demo-pipeCD", parameters: [[$class: "StringParameterValue", name: "APP_VERSION", value: "${versionApp}"]], wait: false
}
I have passed to serverspec the sbin PATH and not work.
EDIT: I add the code the test.
enter code here
require 'spec_helper'
versionFile = open('/tmp/APP_VERSION')
appVersion = versionFile.read.chomp
describe package("demo-app-#{appVersion}") do
it { should be_installed }
end
Also, i add the job pipeline
#!groovy
node {
step([$class: 'WsCleanup'])
stage "Checkout Git repo"
checkout scm
stage "Checkout additional repos"
dir("pipeCD") {
git "https://git-codecommit.us-east-
1.amazonaws.com/v1/repos/pipeCD"
}
stage "Run Packer"
sh "echo $APP_VERSION"
sh "\$(export PATH=/usr/bin:/root/bin:/usr/local/bin:/sbin)"
sh "/opt/packer validate -var=\"appVersion=$APP_VERSION\" -var-
file=packer/demo-app_vars.json packer/demo-app.json"
sh "/opt/packer build -machine-readable -
var=\"appVersion=$APP_VERSION\" -var-file=packer/demo-app_vars.json
packer/demo-app.json | tee packer/packer.log"
REPEAT .. the parameter APP_VERSION in job pipe is rigth and the demo-app app is installed before the execute the test.

Terraform - output ec2 instance ids to calling shell script

I am using 'terraform apply' in a shell script to create multiple EC2 instances. I need to output the list of generated IPs to a script variable & use the list in another sub-script. I have defined output variables for the ips in a terraform config file - 'instance_ips'
output "instance_ips" {
value = [
"${aws_instance.gocd_master.private_ip}",
"${aws_instance.gocd_agent.*.private_ip}"
]
}
However, the terraform apply command is printing entire EC2 generation output apart from the output variables.
terraform init \
-backend-config="region=$AWS_DEFAULT_REGION" \
-backend-config="bucket=$TERRAFORM_STATE_BUCKET_NAME" \
-backend-config="role_arn=$PROVISIONING_ROLE" \
-reconfigure \
"$TERRAFORM_DIR"
OUTPUT = $( terraform apply <input variables e.g -
var="aws_region=$AWS_DEFAULT_REGION">
-auto-approve \
-input=false \
"$TERRAFORM_DIR"
)
terraform output instance_ips
So the 'OUTPUT' script variable content is
Terraform command: apply Initialising the backend... Successfully
configured the backend "s3"! Terraform will automatically use this
backend unless the backend configuration changes. Initialising provider
plugins... Terraform has been successfully initialised!
.
.
.
aws_route53_record.gocd_agent_dns_entry[2]: Creation complete after 52s
(ID:<zone ............................)
aws_route53_record.gocd_master_dns_entry: Creation complete after 52s
(ID:<zone ............................)
aws_route53_record.gocd_agent_dns_entry[1]: Creation complete after 53s
(ID:<zone ............................)
Apply complete! Resources: 9 added, 0 changed, 0 destroyed. Outputs:
instance_ips = [ 10.39.209.155, 10.39.208.44, 10.39.208.251,
10.39.209.227 ]
instead of just the EC2 ips.
Firing the 'terraform output instance_ips' is throwing a 'Initialisation Required' error which I understand means 'terraform init' is required.
Is there any way to suppress ec2 generation & just print output variables. if not, how to retrieve the IPs using 'terraform output' command w/o needing to do a terraform init ?
If I understood the context correctly, you can actually create a file in that directory & that file can be used by your sub-shell script. You can do it by using a null_resource OR "local_file".
Here is how we can use it in a modularized structure -
Using null_resource -
resource "null_resource" "instance_ips" {
triggers {
ip_file = "${sha1(file("${path.module}/instance_ips.txt"))}"
}
provisioner "local-exec" {
command = "echo ${module.ec2.instance_ips} >> instance_ips.txt"
}
}
Using local_file -
resource "local_file" "instance_ips" {
content = "${module.ec2.instance_ips}"
filename = "${path.module}/instance_ips.txt"
}

Environment Variables in newest AWS EC2 instance

I am trying to get ENVIRONMENT Variables into the EC2 instance (trying to run a django app on Amazon Linux AMI 2018.03.0 (HVM), SSD Volume Type ami-0ff8a91507f77f867 ). How do you get them in the newest version of amazon's linux, or get the logging so it can be traced.
user-data text (modified from here):
#!/bin/bash
#trying to get a file made
touch /tmp/testfile.txt
cat 'This and that' > /tmp/testfile.txt
#trying to log
echo 'Woot!' > /home/ec2-user/user-script-output.txt
#Trying to get the output logged to see what is going wrong
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
#trying to log
echo "XXXXXXXXXX STARTING USER DATA SCRIPT XXXXXXXXXXXXXX"
#trying to store the ENVIRONMENT VARIABLES
PARAMETER_PATH='/'
REGION='us-east-1'
# Functions
AWS="/usr/local/bin/aws"
get_parameter_store_tags() {
echo $($AWS ssm get-parameters-by-path --with-decryption --path ${PARAMETER_PATH} --region ${REGION})
}
params_to_env () {
params=$1
# If .Ta1gs does not exist we assume ssm Parameteres object.
SELECTOR="Name"
for key in $(echo $params | /usr/bin/jq -r ".[][].${SELECTOR}"); do
value=$(echo $params | /usr/bin/jq -r ".[][] | select(.${SELECTOR}==\"$key\") | .Value")
key=$(echo "${key##*/}" | /usr/bin/tr ':' '_' | /usr/bin/tr '-' '_' | /usr/bin/tr '[:lower:]' '[:upper:]')
export $key="$value"
echo "$key=$value"
done
}
# Get TAGS
if [ -z "$PARAMETER_PATH" ]
then
echo "Please provide a parameter store path. -p option"
exit 1
fi
TAGS=$(get_parameter_store_tags ${PARAMETER_PATH} ${REGION})
echo "Tags fetched via ssm from ${PARAMETER_PATH} ${REGION}"
echo "Adding new variables..."
params_to_env "$TAGS"
Notes -
What i think i know but am unsure
the user-data script is only loaded when it is created, not when I stop and then start mentioned here (although it also says [i think outdated] that the output is logged to /var/log/cloud-init-output.log )
I may not be starting the instance correctly
I don't know where to store the bash script so that it can be executed
What I have verified
the user-data text is on the instance by ssh-ing in and curl http://169.254.169.254/latest/user-data shows the current text (#!/bin/bash …)
What Ive tried
editing rc.local directly to export AWS_ACCESS_KEY_ID='JEFEJEFEJEFEJEFE' … and the like
putting them in the AWS Parameter Store (and can see them via the correct call, I just can't trace getting them into the EC2 instance without logs or confirming if the user-data is getting run)
putting ENV variables in Tags and importing them as mentioned here:
tried outputting the logs to other files as suggested here (Not seeing any log files in the ssh instance or on the system log)
viewing the System Log on the aws webpage to see any errors/logs via selecting the instance -> 'Actions' -> 'Instance Settings' -> 'Get System Log' (not seeing any commands run or log statements [only 1 unrelated word of user])

How to set up Loggly on Elastic Beanstalk?

I'd like to set up Loggly to run on AWS Elastic Beanstalk, but can't find any information on how to do this. Is there any guide anywhere, or some general guidance on how to start?
This is how I do it, for papertrailapp.com (which I prefer instead of loggly). In your /ebextensions folder (see more info) you create logs.config, where specify:
container_commands:
01-set-correct-hostname:
command: hostname www.example.com
02-forward-rsyslog-to-papertrail:
# https://papertrailapp.com/systems/setup
command: echo "*.* #logs.papertrailapp.com:55555" >> /etc/rsyslog.conf
03-enable-remote-logging:
command: echo -e "\$ModLoad imudp\n\$UDPServerRun 514\n\$ModLoad imtcp\n\$InputTCPServerRun 514\n\$EscapeControlCharactersOnReceive off" >> /etc/rsyslog.conf
04-restart-syslog:
command: service rsyslog restart
55555 should be replaced with the UDP port number provided by papertrailapp.com. Every time after new instance bootstrap this config will be applied. Then, in your log4j.properties:
log4j.rootLogger=WARN, SYSLOG
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.facility=local1
log4j.appender.SYSLOG.header=true
log4j.appender.SYSLOG.syslogHost=localhost
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=[%p] %t %c: %m%n
I'm not sure whether it's an optimal solution. Read more about this mechanism in jcabi-beanstalk-maven-plugin
You can also use the installation script from loggly itself.
The setup below follows the instructions for the legacy setup on https://www.loggly.com/docs/configure-syslog-script/ with minor changes (no confirmation prompts, sudo command replaced since no tty is available)
(edit: updated link, seems to be an outdated solution now in loggly docs)
Place the following script in .ebextensions/loggly.config
Replace TOKEN and ACCOUNT with your own.
#
# Install loggly.com on AWS Elastic Beanstalk
# Tested with node.js environment
# Save this file as .ebextensions/loggly.config
# Deploy per normal scripts or aws.push. To help debug the push, ssh & tail /var/log/cfn-init.log
# See Also /var/log/eb-tools.log
#
commands:
01_loggly_dl:
command: wget -q -O /tmp/loggly.py https://www.loggly.com/install/configure-syslog.py
02_loggly_config:
command: su --session-command="python /tmp/loggly.py setup --auth TOKEN --account ACCOUNT --yes"
Here is a link to loggly support site for using syslogd with loggly:
http://wiki.loggly.com/loggingconfiguration
or using the loggly api with your own app:
http://wiki.loggly.com/apidocumention
Here is an elasticbeanstalk config for Loggly that I've just started using thanks to pointers from this thread and the logging SaaS vendors setup instructions. [Loggly Config Mgmt, Papertrail rsyslog ]
Save the file as loggly.config in the .ebextensions directory and make sure to check the YAML formatting conventions (no tabs, etc). Substitute your Loggly TCP port number, username, password and domain name into the angle brackets as required.
Note that for AWS ruby versions of elasticbeanstalk, there may be differences in the EC2 /etc/rsyslog setup. For example, if /etc/rsyslog.d already exists, and there is already an "$IncludeConfig /etc/rsyslog.d/*.conf" directive, then command "01-forward-rsyslog-to-loggly:" can be removed.
Deploy per normal scripts or aws.push. To help debug the push, ssh & tail /var/log/cfn-init.log
files:
"/etc/rsyslog.d/90-loggly.conf" :
mode: "000664"
owner: root
group: root
content: |
# ### begin forwarding rule ###
# The statement between the begin ... end define a SINGLE forwarding
# rule. They belong together, do NOT split them. If you create multiple
# forwarding rules, duplicate the whole block!
# Remote Logging (we use TCP for reliable delivery)
#
# An on-disk queue is created for this action. If the remote host is
# down, messages are spooled to disk and sent when it is up again.
$WorkDirectory /var/lib/rsyslog # where to place spool files
$ActionQueueFileName fwdRule1 # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on # save messages to disk on shutdown
$ActionQueueType LinkedList # run asynchronously
$ActionResumeRetryCount -1 # infinite retries if host is down
*.* ##logs.loggly.com:<yourportnum> # !!!Loggly supplied port number for each app!!!
# ### end of the forwarding rule ###
encoding: plain
"/tmp/loggly.py" :
mode: "000755"
owner: root
group: root
content: |
import json
import sys
import urllib2
'''
Auto-authenticate Syslog TCP inputs.
Usage: python inputs.py -u user -p pass -s subdomain
'''
state = None
params = {}
for i in range(len(sys.argv)):
arg = sys.argv[i]
if state:
params[state] = arg
state = None
if arg == '--username' or arg == '-u':
state = 'username'
if arg == '--password' or arg == '-p':
state = 'password'
if arg == '--subdomain' or arg == '-s':
state = 'subdomain'
url = 'https://%s.loggly.com/api/inputs' % params['subdomain']
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, params['username'], params['password'])
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
opener.open(url)
urllib2.install_opener(opener)
inputs = json.loads(urllib2.urlopen(url).read())
for input in inputs:
if input['service']['name'] == 'syslogtcp':
url = 'https://%s.loggly.com/api/inputs/%d/adddevice' % \
(params['subdomain'], input['id'])
response = urllib2.urlopen(url, {}).read()
print response
encoding: plain
commands:
01-forward-rsyslog-to-loggly:
# http://loggly.com/support/sending-data/logging-from/syslog/rsyslog/cd
command: test "$(grep -s '90-loggly.conf' /etc/rsyslog.conf)" == "" && echo -e "\n# Include the loggly.conf file\n\$IncludeConfig /etc/rsyslog.d/90-loggly.conf" >> /etc/rsyslog.conf
02-restart-syslog:
command: service rsyslog restart
03-inform_loggly:
command: "python /tmp/loggly.py -u <Yourloginname> -p <Yourpassword> -s <Yourdomainname>"
Typically, /etc/rsyslog.config will have a "$IncludeConfig /etc/rsyslog.d/*.conf" at the end - so you can simply introduce your own configuration file using the "files:" portion of your .ebextensions file. This works whether you are deploying to fresh servers or not.
For a ruby production.log, you might have something like this in a .ebextensions/01loggly.config file. Note this picks up your beanstalk environment name too as a loggly tag.
# For docs on eb configs, see http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
# This set of commands sets up loggly forwarding
files:
"/etc/rsyslog.d/myapp-loggly.conf" :
mode: "000664"
owner: root
group: root
content: |
$template LogglyFormat,"<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% [yourlogglyid#41058 tag=`{ "Ref" : "AWSEBEnvironmentName" }`] %msg%\n"
*.* ##logs-01.loggly.com:514;LogglyFormat
# One time config
$ModLoad imfile
$InputFilePollInterval 10
$PrivDropToGroup adm
$WorkDirectory /var/spool/rsyslog
# Add a tag for file events
# For production.log
$InputFileName /var/app/support/logs/production.log
$InputFileTag production-log
$InputFileStateFile stat-production-log #this must be unique for each file being polled
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
# Send to Loggly then discard
if $programname == 'myapp-production-log' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'myapp-production-log' then ~
encoding: plain
commands:
00-make-work-directory:
command: mkdir -p /var/spool/rsyslog
01-restart-syslog:
command: service rsyslog restart
For Tomcat, you might do something like this in a .ebextesions/01logglyg.config file:
# For docs on eb configs, see http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
# This set of commands sets up loggly forwarding
files:
"/etc/rsyslog.d/mytomcatapp-loggly.conf" :
mode: "000664"
owner: root
group: root
content: |
$template LogglyFormat,"<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% [yourlogglygidhere#41058 tag=`{ "Ref" : "AWSEBEnvironmentName" }`] %msg%\n"
*.* ##logs-01.loggly.com:514;LogglyFormat
# One time config
$ModLoad imfile
$InputFilePollInterval 10
$PrivDropToGroup adm
$WorkDirectory /var/spool/rsyslog
# catalina.log
$InputFileName /var/log/tomcat7/catalina.log
$InputFileTag catalina-log
$InputFileStateFile stat-catalina-log
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
if $programname == 'catalina-log' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'catalina-log' then ~
# catalina.out
$InputFileName /var/log/tomcat7/catalina.out
$InputFileTag catalina-out
$InputFileStateFile stat-catalina-out
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
if $programname == 'catalina-out' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'catalina-out' then ~
# host-manager.log
$InputFileName /var/log/tomcat7/host-manager.log
$InputFileTag host-manager
$InputFileStateFile stat-host-manager
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
if $programname == 'host-manager' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'host-manager' then ~
# initd.log
$InputFileName /var/log/tomcat7/initd.log
$InputFileTag initd
$InputFileStateFile stat-initd
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
if $programname == 'initd' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'initd' then ~
# localhost.log
$InputFileName /var/log/tomcat7/localhost.log
$InputFileTag localhost-log
$InputFileStateFile stat-localhost-log
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
if $programname == 'localhost-log' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'localhost-log' then ~
# manager.log
$InputFileName /var/log/tomcat7/manager.log
$InputFileTag manager
$InputFileStateFile stat-manager
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor
if $programname == 'manager' then ##logs-01.loggly.com:514;LogglyFormat
if $programname == 'manager' then ~
encoding: plain
commands:
00-make-work-directory:
command: mkdir -p /var/spool/rsyslog
01-restart-syslog:
command: service rsyslog restart
This config is working for me - though I haven't yet determined how to get multi-line entries coming into a single entry in Loggly yet.
I know this is question is fairly old but I found that the answers really didnt answer the question or just plain didnt work correctly when implemented. I found that this works (file .ebextenstions/02loggly.config):
container_commands:
01-transform-rsyslog.conf:
command: sed "s/NODE_ENV/$NODE_ENV/g" scripts/22-loggly.conf.temp > scripts/22-loggly.conf
02-setup-rsyslog.conf:
command: cp scripts/22-loggly.conf /etc/rsyslog.d/22-loggly.conf
03-restart:
command: /sbin/service rsyslog restart
the "01-transform-rsyslog.conf" step is optional; I use that to set a tag by NODE_ENV in the file. "22-loggly.conf.temp" is a modified version of the "22-loggly.conf" file that gets created at "/etc/rsyslog.d/" when you run the linux source setup script (https://www.loggly.com/install/configure-syslog.py). I just installed it on a ec2 instance and copied the file.
Note I had to prepend '/sbin' to my service command because it was failing for me without it. Also, this restarts syslog on every deploy, which should be fine.
Now you just have to make sure your app logs to syslog. For Java it is going to be log4j or similar. For Node.js (which is what I'm using), rconsole works (https://github.com/tblobaum/rconsole).
None of the things I tried seemed to work, and the loggly documentation is very confusing!
I hope that this will help someone, this is how I got it to work.
Paste the following in .ebextensions/loggly.config
files:
"/etc/rsyslog.conf" :
mode: "000644"
owner: root
group: root
content: |
$ModLoad imfile
$InputFilePollInterval 10
$PrivDropToGroup adm
# Input for FILE.LOG
$InputFileName /var/app/current/PATH_TO_YOUR_LOG_FILE
$InputFileTag social_php:
$InputFileStateFile stat-social_php #this must be unique for each file being polled
$InputFileSeverity info
$InputRunFileMonitor
#Add a tag for events from this file
$template LogglyFormatsocial_php,"<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% [TOKEN#41058 tag=\"php_log\"] %msg%\n"
if $programname == 'social_php' then ##logs.loggly.com:37146;LogglyFormatsocial_php
if $programname == 'social_php' then ~
*.* ##logs.loggly.com:37146
commands:
01-restart-syslog:
command: service rsyslog restart
Replace all instances of social_php with the tag that makes sense for your application.
Replace /var/app/current/PATH_TO_YOUR_LOG_FILE with your log file location
Follow my loggly configuration in elasticbeanstalk. For Linux + log4j
on .ebextensions file configuration
container_commands:
01_configure_sudo_access:
command: sed -i -- 's/ requiretty/ \!requiretty/g' /etc/sudoers
02_loggy_configure:
command: sudo python .ebextensions/scripts/loggly_config.py
03_restore_sudo_access:
command: sed -i -- 's/ \!requiretty/ requiretty/g' /etc/sudoers
Loggly script in python for default AMI:
import os
rsyslog_path = '/etc/rsyslog.conf'
loggly_file_path = '/etc/rsyslog.d/22-loggly.conf'
class LogglyConfig:
def __init__(self):
self.__linux_log()
self.__config_loggly_for_log4j()
def __linux_log(self):
#not installed on this machine
if not os.path.exists(loggly_file_path):
os.system('rm -f configure-linux.sh')
os.system('wget https://www.loggly.com/install/configure-linux.sh')
os.system('sudo bash configure-linux.sh -a DOMAIN -t TOKEN -u USER -p PASSWORD -s')
def __config_loggly_for_log4j(self):
f = open(rsyslog_path,'r')
file_text = f.read()
f.close()
file_text = file_text.replace('#$ModLoad imudp', '$ModLoad imudp')
file_text = file_text.replace('#$UDPServerRun 514', '$UDPServerRun 514')
f = open(rsyslog_path,'w')
f.write(file_text)
f.close()
os.system('service rsyslog restart')
LogglyConfig()
In log4j.properties on your java project
log4j.rootLogger=INFO, SYSLOG
log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
log4j.appender.SYSLOG.SyslogHost=localhost
log4j.appender.SYSLOG.Facility=Local3
log4j.appender.SYSLOG.Header=true
log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.SYSLOG.layout.ConversionPattern=java %d{ISO8601} %p %t %c{1}.%M - %m%n