I want to send my alert to two different distribution lists in Alertmanager for Prometheus. The only way to distinguish my alerts is by their job name.
my alert names are like below:
sample1:
Labels
alertname = SyslogErrors
instance = 22.32.23.32:2324
job = my-job-sample-service-dev
message = Exception raised during message subscription. Trying again in 60 seconds
monitor = server1
severity = critical
Annotations
description = Errors have been found for my-job-sample-service-dev application in /data/logs/messages/my-job-sample-service-dev syslog file
Source
sample2:
Labels
alertname = SyslogErrors
instance = 22.32.23.32:2324
job = my-job-sample-service-pre-dev
message = Exception raised during message subscription. Trying again in 60 seconds
monitor = server1
severity = critical
Annotations
description = Errors have been found for my-job-sample-service-pre-dev application in /data/logs/messages/my-job-sample-service-pre-dev syslog file
Source
here is my sample alertmanager config file:
global:
smtp_smarthost: 'mail.server.com:25'
smtp_from: 'dev#server.com'
smtp_require_tls: false
templates:
- '/etc/alertmanager/template/*.tmpl'
route:
receiver: mail-receiver-dev
group_by: ['alertname']
group_wait: 3s
group_interval: 5s
repeat_interval: 1h
# All alerts that do not match the following child routes
# will remain at the root node and be dispatched to 'default-receiver'.
routes:
- receiver: 'mail-pre-dev'
group_wait: 10s
match_re:
- job = .*pre-dev.*
- receiver: 'mail-dev'
group_wait: 10s
match_re:
- job = .*dev.*
receivers:
- name: 'mail-dev'
email_configs:
- to: 'dev-group#server.com'
send_resolved: true
- name: 'mail-pre-dev'
email_configs:
- to: 'pre-dev-group#server.com'
send_resolved: true
I am using the below link as a reference:
reference
Testing config file link
testscript for using above link: {service="foo-service",severity="critical",job="my-job-sample-service-dev"}
So the question is, how to send an alert to a different channel by using regex for the job title? At the moment when I test all the alert goes to pre-dev.
Change the following:
match_re:
- job = .*pre-dev.*
To:
matchers:
- job =~ ".*pre-dev.*"
Note:
"match_re" is deprecated and must be replaced by "matchers", but if you want to use it, the correct syntax is:
match_re:
- job: ".*pre-dev.*"
Environment:
ignite server:
centos6.5 with kernel 2.6.32-431.el6.x86_64
ignite version 1.9
hadoop version 2.6.2
3 server nodes with each having '-Xms16g -Xmx16g -server -XX:+AggressiveOpts -XX:MaxMetaspaceSize=256m' set when started
I run a map reduce test job with ignite map reduce. The job is simply getting the average number for each people. The data is like:
Jack 0.35
Tom 0.78
Lily 0.92
Jack 0.28
Tom 0.18
...
At first, I generated a data set of 100M lines. It's about 2.53GB. The job finished correctly in about 30s. Then I generated a data set of 1 Billion lines, about 25.3GB. The job always failed with exceptions. I tried several times but the same result.
The ignite server node threw exception below:
[15:06:56,804][ERROR][sys-#2740%null%][GridTcpRestProtocol] Failed to process client request [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]], msg=GridClientTaskRequest [taskName=o.a.i.i.processors.hadoop.proto.HadoopProtocolJobStatusTask, arg=HadoopProtocolTaskArguments []]]
class org.apache.ignite.IgniteCheckedException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]
at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7239)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:170)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:119)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:264)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1$1.apply(GridTcpRestNioListener.java:261)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:228)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:261)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:229)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:158)
at org.apache.ignite.internal.processors.rest.GridRestProcessor$2$1.apply(GridRestProcessor.java:155)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.util.future.GridFutureChainListener.applyCallback(GridFutureChainListener.java:78)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:70)
at org.apache.ignite.internal.util.future.GridFutureChainListener.apply(GridFutureChainListener.java:30)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:332)
at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:294)
at org.apache.ignite.internal.processors.rest.handlers.task.GridTaskCommandHandler$2.apply(GridTaskCommandHandler.java:257)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:271)
at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListeners(GridFutureAdapter.java:259)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:389)
at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:355)
at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1579)
at org.apache.ignite.internal.processors.task.GridTaskWorker.finishTask(GridTaskWorker.java:1547)
at org.apache.ignite.internal.processors.task.GridTaskWorker.reduce(GridTaskWorker.java:1157)
at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:942)
at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:996)
at org.apache.ignite.internal.processors.task.GridTaskProcessor$JobMessageListener.onMessage(GridTaskProcessor.java:1221)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1222)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:850)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$2100(GridIoManager.java:108)
at org.apache.ignite.internal.managers.communication.GridIoManager$7.run(GridIoManager.java:790)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to send message (connection was closed): GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=549 lim=549 cap=8192], super=AbstractNioClientWorker [selector=sun.nio.ch.EPollSelectorImpl#1cba0431, idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-3, gridName=null, finished=false, isCancelled=false, hashCode=906881587, interrupted=false, runner=grid-nio-worker-tcp-rest-3-#50%null%]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/172.31.68.204:11211, rmtAddr=/172.31.68.202:39473, createTime=1493967985751, closeTime=1493968009502, bytesSent=2715, bytesRcvd=2641, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1493968016794, lastSndTime=1493967998303, lastRcvTime=1493968009502, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]
at org.apache.ignite.internal.util.nio.GridNioServer.send0(GridNioServer.java:554)
at org.apache.ignite.internal.util.nio.GridNioServer.send(GridNioServer.java:494)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionWrite(GridNioServer.java:3036)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionWrite(GridNioCodecFilter.java:94)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionWrite(GridNioFilterAdapter.java:118)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionWrite(GridNioFilterChain.java:264)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionWrite(GridNioFilterChain.java:189)
at org.apache.ignite.internal.util.nio.GridNioSessionImpl.send(GridNioSessionImpl.java:108)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener$1.apply(GridTcpRestNioListener.java:258)
... 40 more
The job client threw exception below:
java.io.IOException: Failed to get job status: job_1fbf9083-9a44-4be9-9199-695a97652dc2_0002
at org.apache.ignite.internal.processors.hadoop.impl.proto.HadoopClientProtocol.getJobStatus(HadoopClientProtocol.java:197)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:326)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:323)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:611)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1357)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1318)
at com.tscloud.sdk.test.ignite.MRTest.run(MRTest.java:81)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.tscloud.sdk.test.ignite.MRTest.main(MRTest.java:53)
Caused by: class org.apache.ignite.internal.client.impl.connection.GridClientConnectionResetException: Failed to perform request (connection failed): /172.31.68.204:11211
at org.apache.ignite.internal.client.impl.connection.GridClientConnection.getCloseReasonAsException(GridClientConnection.java:491)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:339)
at org.apache.ignite.internal.client.impl.connection.GridClientNioTcpConnection.close(GridClientNioTcpConnection.java:299)
at org.apache.ignite.internal.client.impl.connection.GridClientConnectionManagerAdapter$NioListener.onDisconnected(GridClientConnectionManagerAdapter.java:630)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionClosed(GridNioFilterChain.java:253)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93)
at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionClosed(GridNioCodecFilter.java:70)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:93)
at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3005)
at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:147)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2306)
at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:929)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2026)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:1863)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1568)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:745)
The job configuration is below:
Configuration configuration = new Configuration();
configuration.set(MRConfig.FRAMEWORK_NAME, IgniteHadoopClientProtocolProvider.FRAMEWORK_NAME);
configuration.set(MRConfig.MASTER_ADDRESS, "172.31.68.202:11211");
configuration.set("fs.igfs.impl", "org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem");
configuration.set("fs.default.name", "igfs://igfs#172.31.68.202/");
I checked nodes status after the job failed using ignitevisorcmd.sh. All server nodes were OK, but there were sometime one of the node server was down. I did not know why it behaved like this.
Any help is appreciated.
Edit(2017-05-16):
I changed the hadoop core-site.xml and add hadoop.tmp.dir property as below
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/hadoop-2.6.2/tmp</value>
</property>
Then I reformatted hdfs and uploaded the 25.3GB data file. I run the test successfully. It turns out my hdfs has something wrong. Reformatting namenode solves the problem.
Before above steps, I tried checking the jvm heap usage by VisualVM.
One of the server node visualvm monitor snapshot
If your node was down temporarily, then it is reasonable that a job would fail-over to another node or fail altogether if there are no other nodes. I would check that your network was not reset or down. Also, you should check for the presence of any software firewalls between your nodes (it is best to disable operating system firewalls).
I have two log files with multi-line log statements. Both of them have same datetime format at the begining of each log statement. The configuration looks like this:
state_file = /var/lib/awslogs/agent-state
[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
The cloudwatch logs agent is sending log1.0 logs correctly to my log group on cloudwatch, however, its not sending log files for log2-console.log.
awslogs.log says:
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.
Though server time is correct. Also weird thing is Line numbers mentioned in start_position and end_position does not exist in actual log file being pushed.
Anyone else experiencing this issue?
I was able to fix this.
The state of awslogs was broken. The state is stored in a sqlite database in /var/awslogs/state/agent-state. You can access it via
sudo sqlite3 /var/awslogs/state/agent-state
sudo is needed to have write access.
List all streams with
select * from stream_state;
Look up your log stream and note the source_id which is part of a json data structure in the v column.
Then, list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table
select * from push_state where k="7675f84405fcb8fe5b6bb14eaa0c4bfd";
The resulting record has a json data structure in the v column which contains a batch_timestamp. And this batch_timestamp seams to be wrong. It was in the past and any newer (more than 2 hours) log entries were not processed anymore.
The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp and update with something like
update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';
Restart the service with
sudo /etc/init.d/awslogs restart
I hope it works for you!
We had the same issue and the following steps fixed the issue.
If log groups are not updating with latest events:
Run These steps:
Stopped the awslogs service
Deleted file /var/awslogs/state/agent-state
Updated /var/awslogs/etc/awslogs.conf configuration from hostaname to
instance ID Ex:
log_stream_name = {hostname} to log_stream_name = {instance_id}
Started awslogs service.
I was able to resolve this issue on Amazon Linux by:
sudo yum reinstall awslogs
sudo service awslogs restart
This method retained my config files in /var/awslogs/, though you may wish to back them up before a reinstall.
Note: In my troubleshooting, I had also deleted my Log Group via the AWS Console. The restart fully reloaded all historical logs, but at the present timestamp, which is of less value. I'm unsure if deleting the Log Group was this was necessary for this method to work. You might want to look at setting the initial_position config to end_of_file before you restart.
I found the reason. The time zone in my docker container is inconsistent with the time zone of my host computer. After setting the two time zones to be consistent, the problem is solved
I have a Java client, which obtains an autogenerated port. After starting the actor system, I want to access the port.
Config clientConfig = ConfigFactory.parseString("akka.remote.netty.tcp.port = 0")
.withFallback(ConfigFactory.parseString("akka.remote.netty.tcp.hostname = " + serverHostName))
.withFallback(ConfigFactory.load("common"));
actorSystem = ActorSystem.create("clientActorSystem", clientConfig);
// how to access the generated port here..!?
The port must already be set since the log output after ActorSystem.create(...) is like that:
[INFO] [03/31/2016 14:11:32.042] [main] [akka.remote.Remoting] Starting remoting
[INFO] [03/31/2016 14:11:32.233] [main] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://actorSystem#localhost:58735]
[INFO] [03/31/2016 14:11:32.234] [main] [akka.remote.Remoting] Remoting now listens on addresses: [akka.tcp://actorSystem#localhost:58735]
If I try to get it via the configuration with actorSystem.settings().config().getValue("akka.remote.netty.tcp.port"), I still get 0 as defined before.
Has anyone an idea how this port (58735 in the example) can be accessed?
Using scala you can get Option of port on which Actor system is currently running:
val port = system.provider.getDefaultAddress.port
Hope you will be able to get the same code in Java.
The accepted answer probably worked for older versions of Akka but as of now (version 2.5.x) you will be getting something like:
Error:(22, 18) method provider in trait ActorRefFactory cannot be accessed in akka.actor.ActorSystem
The solution would be to use akka extensions. Here is how I use it:
Example. scala
package example
import akka.actor._
class AddressExtension(system: ExtendedActorSystem) extends Extension {
val address: Address = system.provider.getDefaultAddress
}
object AddressExtension extends ExtensionId[AddressExtension] {
def createExtension(system: ExtendedActorSystem): AddressExtension = new AddressExtension(system)
def hostOf(system: ActorSystem): String = AddressExtension(system).address.host.getOrElse("")
def portOf(system: ActorSystem): Int = AddressExtension(system).address.port.getOrElse(0)
}
object Main extends App {
val system = ActorSystem("Main")
println(AddressExtension.portOf(system))
}
I have an Orion instance with Cygnus at filab; subcription and notify run fine but I can not persist data to cosmos.lab.fi-ware.org.
Cygnus returns this error:
[ERROR - es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionSink.process(OrionSink.java:139)] Persistence error (The talky/talkykar/room6_room directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
This is my agent_a.conf file:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = hdfs-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = es.tid.fiware.fiwareconnectors.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = talky
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = talkykar
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts de
# Timestamp interceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# Destination extractor interceptor, do not change
cygnusagent.sources.http-source.interceptors.de.type = es.tid.fiware.fiwareconnectors.cygnus.interceptors.DestinationExtractor$Builder
# Matching table for the destination extractor interceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.de.matching_table = /usr/cygnus/conf/matching_table.conf
# ============================================
# OrionHDFSSink configuration
# channel name from where to read notification events
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
# sink class, must not be changed
cygnusagent.sinks.hdfs-sink.type = es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink
# Comma-separated list of FQDN/IP address regarding the Cosmos Namenode endpoints
# If you are using Kerberos authentication, then the usage of FQDNs instead of IP addresses is mandatory
cygnusagent.sinks.hdfs-sink.cosmos_host = http://cosmos.lab.fi-ware.org
# port of the Cosmos service listening for persistence operations; 14000 for httpfs, 50070 for webhdfs and free choice for inifinty
cygnusagent.sinks.hdfs-sink.cosmos_port = 14000
# default username allowed to write in HDFS
cygnusagent.sinks.hdfs-sink.cosmos_default_username = myuser
# default password for the default username
cygnusagent.sinks.hdfs-sink.cosmos_default_password = mypass
# HDFS backend type (webhdfs, httpfs or infinity)
cygnusagent.sinks.hdfs-sink.hdfs_api = httpfs
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.hdfs-sink.attr_persistence = row
# Hive FQDN/IP address of the Hive server
cygnusagent.sinks.hdfs-sink.hive_host = http://cosmos.lab.fi-ware.org
# Hive port for Hive external table provisioning
cygnusagent.sinks.hdfs-sink.hive_port = 10000
# Kerberos-based authentication enabling
cygnusagent.sinks.hdfs-sink.krb5_auth = false
# Kerberos username
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_user = krb5_username
# Kerberos password
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_password = xxxxxxxxxxxxx
# Kerberos login file
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_login_conf_file = /usr/cygnus/conf/krb5_login.conf
# Kerberos configuration file
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_conf_file = /usr/cygnus/conf/krb5.conf
#=============================================
And this is the Cygnus log:
2015-05-04 09:05:10,434 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink.persist(OrionHDFSSink.java:315)] [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (talky/talkykar/room6_room/room6_room.txt), Data ({"recvTimeTs":"1430723069","recvTime":"2015-05-04T09:04:29.819","entityId":"Room6","entityType":"Room","attrName":"temperature","attrType":"float","attrValue":"26.5","attrMd":[]})
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - es.tid.fiware.fiwareconnectors.cygnus.backends.hdfs.HDFSBackendImpl.doHDFSRequest(HDFSBackendImpl.java:255)] HDFS request: PUT http://http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/mped.mlg/talky/talkykar/room6_room?op=mkdirs&user.name=mped.mlg HTTP/1.1
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:186)] Connection request: [route: {}->http://http][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 500]
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:220)] Connection leased: [id: 21][route: {}->http://http][total kept alive: 0; route allocated: 1 of 100; total allocated: 1 of 500]
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.DefaultClientConnection.close(DefaultClientConnection.java:169)] Connection org.apache.http.impl.conn.DefaultClientConnection#5700187d closed
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.DefaultClientConnection.shutdown(DefaultClientConnection.java:154)] Connection org.apache.http.impl.conn.DefaultClientConnection#5700187d shut down
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.releaseConnection(PoolingClientConnectionManager.java:272)] Connection [id: 21][route: {}->http://http] can be kept alive for 9223372036854775807 MILLISECONDS
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.DefaultClientConnection.close(DefaultClientConnection.java:169)] Connection org.apache.http.impl.conn.DefaultClientConnection#5700187d closed
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.releaseConnection(PoolingClientConnectionManager.java:278)] Connection released: [id: 21][route: {}->http://http][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 500]
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - es.tid.fiware.fiwareconnectors.cygnus.backends.hdfs.HDFSBackendImpl.doHDFSRequest(HDFSBackendImpl.java:191)] The used HDFS endpoint is not active, trying another one (host=http://cosmos.lab.fi-ware.org)
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionSink.process(OrionSink.java:139)] Persistence error (The talky/talkykar/room6_room directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
Thanks.
If you take a look to this log:
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - es.tid.fiware.fiwareconnectors.cygnus.backends.hdfs.HDFSBackendImpl.doHDFSRequest(HDFSBackendImpl.java:255)] HDFS request: PUT http://http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/mped.mlg/talky/talkykar/room6_room?op=mkdirs&user.name=mped.mlg HTTP/1.1
You will se your are trying to create a HDFS directory by using a http://http://cosmos.lab... URL (please, notice the double http://http://).
This is becasuse you have configured:
cygnusagent.sinks.hdfs-sink.hive_host = http://cosmos.lab.fi-ware.org
Instead of:
cygnusagent.sinks.hdfs-sink.hive_host = cosmos.lab.fi-ware.org
Such a parameter asks for a host, not a URL.
Being said that, in future releases we will allow for both encodings.