Flume - sink as MapR-FS. Files not written to HDFS

My source type is spooldir and my sink type is hdfs. There is no error, but the files are not copied.
By the way, I am fully aware of the NFS mount feature for copying data; I am learning Flume and want to try this feature. Once this is working, I would like to try writing data with log4j, using an Avro source and an HDFS sink.
Any help is greatly appreciated.
Regards,
Mani
# Name the components of this agents
maprfs-agent.sources = spool-collect
maprfs-agent.sinks = maprfs-write
maprfs-agent.channels = memory-channel
# Describe/ Configure the sources
maprfs-agent.sources.spool-collect.type = spooldir
maprfs-agent.sources.spool-collect.spoolDir = /home/appdata/mani
maprfs-agent.sources.spool-collect.fileHeader = true
maprfs-agent.sources.spool-collect.bufferMaxLineLength = 500
maprfs-agent.sources.spool-collect.bufferMaxLines = 10000
maprfs-agent.sources.spool-collect.batchSize = 100000
# Describe/ Configure sink
maprfs-agent.sinks.maprfs-write.type = hdfs
maprfs-agent.sinks.maprfs-write.hdfs.fileType = DataStream
maprfs-agent.sinks.maprfs-write.hdfs.path = maprfs:///sample.node.com/user/hive/test
maprfs-agent.sinks.maprfs-write.writeFormat = Text
maprfs-agent.sinks.maprfs-write.hdfs.proxyUser = root
maprfs-agent.sinks.maprfs-write.hdfs.kerberosPrincipal = mapr
maprfs-agent.sinks.maprfs-write.hdfs.kerberosKeytab = /opt/mapr/conf/flume.keytab
maprfs-agent.sinks.maprfs-write.hdfs.filePrefix = %{file}
maprfs-agent.sinks.maprfs-write.hdfs.fileSuffix = .csv
maprfs-agent.sinks.maprfs-write.hdfs.rollInterval = 0
maprfs-agent.sinks.maprfs-write.hdfs.rollCount = 0
maprfs-agent.sinks.maprfs-write.hdfs.rollSize = 0
maprfs-agent.sinks.maprfs-write.hdfs.batchSize = 100
maprfs-agent.sinks.maprfs-write.hdfs.idleTimeout = 0
maprfs-agent.sinks.maprfs-write.hdfs.maxOpenFiles = 5
# Configure channel buffer
maprfs-agent.channels.memory-channel.type = memory
maprfs-agent.channels.memory-channel.capacity = 1000
# Bind the source and the sink to the channel
maprfs-agent.sources.spool-collect.channels = memory-channel
maprfs-agent.sinks.maprfs-write.channel = memory-channel
No errors are reported, but no files show up when I check with the command below; the Flume agent log follows.
hadoop mfs -ls /user/hive/test
15/05/26 13:55:45 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
15/05/26 13:55:45 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:mapr-spool.conf
15/05/26 13:55:45 INFO conf.FlumeConfiguration: Added sinks: maprfs-write Agent: maprfs-agent
15/05/26 13:55:45 INFO conf.FlumeConfiguration: Processing:maprfs-write
15/05/26 13:55:45 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [maprfs-agent]
15/05/26 13:55:45 INFO node.AbstractConfigurationProvider: Creating channels
15/05/26 13:55:45 INFO channel.DefaultChannelFactory: Creating instance of channel memory-channel type memory
15/05/26 13:55:45 INFO node.AbstractConfigurationProvider: Created channel memory-channel
15/05/26 13:55:45 INFO source.DefaultSourceFactory: Creating instance of source spool-collect, type spooldir
15/05/26 13:55:45 INFO sink.DefaultSinkFactory: Creating instance of sink: maprfs-write, type: hdfs
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Auth method: PROXY
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: User name: root
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Using keytab: false
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Superuser auth: SIMPLE
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Superuser name: root
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Superuser using keytab: false
15/05/26 13:55:47 INFO hdfs.HDFSEventSink: Logged in as user root
15/05/26 13:55:47 INFO node.AbstractConfigurationProvider: Channel memory-channel connected to [spool-collect, maprfs-write]
15/05/26 13:55:47 INFO node.Application: Starting new configuration:{ sourceRunners:{spool-collect=EventDrivenSourceRunner: { source:Spool Directory source spool-collect: { spoolDir: /home/appdata/mani } }} sinkRunners:{maprfs-write=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#7fc7efa0 counterGroup:{ name:null counters:{} } }} channels:{memory-channel=org.apache.flume.channel.MemoryChannel{name: memory-channel}} }
15/05/26 13:55:47 INFO node.Application: Starting Channel memory-channel
15/05/26 13:55:47 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: memory-channel: Successfully registered new MBean.
15/05/26 13:55:47 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory-channel started
15/05/26 13:55:47 INFO node.Application: Starting Sink maprfs-write
15/05/26 13:55:47 INFO node.Application: Starting Source spool-collect
15/05/26 13:55:47 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /home/appdata/mani
15/05/26 13:55:47 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: maprfs-write: Successfully registered new MBean.
15/05/26 13:55:47 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: maprfs-write started
15/05/26 13:55:47 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: spool-collect: Successfully registered new MBean.
15/05/26 13:55:47 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: spool-collect started
15/05/26 13:55:47 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/appdata/mani/cron-s3.log to /home/appdata/mani/cron-s3.log.COMPLETED
15/05/26 13:55:47 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
15/05/26 13:55:48 INFO hdfs.BucketWriter: Creating maprfs:///sample.node.com/user/hive/test/.1432644947885.csv.tmp
15/05/26 13:57:08 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/appdata/mani/network-usage.log to /home/appdata/mani/network-usage.log.COMPLETED
15/05/26 13:57:08 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/appdata/mani/processor-usage-2014-10-17.log to /home/appdata/mani/processor-usage-2014-10-17.log.COMPLETED
15/05/26 13:57:25 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/appdata/mani/total-processor-usage.log to /home/appdata/mani/total-processor-usage.log.COMPLETED
15/05/26 13:57:25 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
15/05/26 13:57:26 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
15/05/26 13:57:26 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
15/05/26 13:57:27 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
15/05/26 13:57:27 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
15/05/26 13:57:28 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
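A hedged note on the configuration above, not a confirmed diagnosis: the log shows Flume creating maprfs:///sample.node.com/user/hive/test/.1432644947885.csv.tmp, and with hdfs.rollInterval, hdfs.rollCount and hdfs.rollSize all set to 0 and hdfs.idleTimeout set to 0, the HDFS sink never rolls or closes a file, so events can sit in that open .tmp file until the agent shuts down. Also, writeFormat is set without the hdfs. prefix. A minimal sink sketch with finite roll settings, keeping the agent and sink names used above, could look like this:
# sketch only: finite roll settings so the sink closes and renames its .tmp files
maprfs-agent.sinks.maprfs-write.hdfs.writeFormat = Text
# roll every 60 seconds, every 10000 events, or every 128 MB (rollSize is in bytes)
maprfs-agent.sinks.maprfs-write.hdfs.rollInterval = 60
maprfs-agent.sinks.maprfs-write.hdfs.rollCount = 10000
maprfs-agent.sinks.maprfs-write.hdfs.rollSize = 134217728
# close files that have been idle for 30 seconds
maprfs-agent.sinks.maprfs-write.hdfs.idleTimeout = 30
With the original settings, any data already written should still be visible as an open .tmp file in a plain directory listing, for example hadoop fs -ls /user/hive/test.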

Related

fetch aws credentials from assumed role with web identity

I am trying to fetch credentials to be used for spark-submit. I already have a role assumed with a web identity provider, which the Airflow task does for me. But in order to export these credentials to Spark, I need to fetch them and set them in the Spark context. How can I do it?
[2022-08-23, 11:07:42 UTC] {{subprocess.py:89}} INFO - + aws configure list
[2022-08-23, 11:07:43 UTC] {{subprocess.py:89}} INFO - Name Value Type Location
[2022-08-23, 11:07:43 UTC] {{subprocess.py:89}} INFO - ---- ----- ---- --------
[2022-08-23, 11:07:43 UTC] {{subprocess.py:89}} INFO - profile <not set> None None
[2022-08-23, 11:07:43 UTC] {{subprocess.py:89}} INFO - access_key ****************WWSO assume-role-with-web-identity
[2022-08-23, 11:07:43 UTC] {{subprocess.py:89}} INFO - secret_key ****************wZz0 assume-role-with-web-identity
As you can see above, the access keys are not stored in environment variables. However, a web identity access token is present and authentication to AWS happens through it.
Once you have the access key and secret key, you can set them with:
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", AWS_ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", AWS_SECRET_KEY)
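To actually fetch those values when only the assumed-role-with-web-identity credentials exist (as in the aws configure list output above), one option is to let boto3 resolve the default credential chain and hand the temporary credentials to Hadoop's s3a connector. This is a sketch under that assumption, not a verified recipe for this exact Airflow setup; note that temporary credentials carry a session token, which the fs.s3n.* keys above cannot pass.
import boto3

# Resolve credentials through the default chain, the same chain the AWS CLI used above
creds = boto3.Session().get_credentials().get_frozen_credentials()

# sc is the SparkContext from the snippet above
hadoop_conf = sc._jsc.hadoopConfiguration()
# Temporary (session) credentials need the s3a connector and its temporary-credentials provider
hadoop_conf.set("fs.s3a.aws.credentials.provider",
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
hadoop_conf.set("fs.s3a.access.key", creds.access_key)
hadoop_conf.set("fs.s3a.secret.key", creds.secret_key)
hadoop_conf.set("fs.s3a.session.token", creds.token)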

Emitting logs to Open Telemetry Collector

I am new to OpenTelemetry and I am trying its C++ API with a toy example that emits logs. I am using the OtlpHttpLogExporter to emit to the OpenTelemetry Collector via HTTP, and I configured the Collector to use the "logging" exporter to print the collected data to the console. When I start the collector and then run the code, it completes successfully but nothing is printed. Am I missing something? Can you suggest how I can go about debugging this issue?
I tried switching to OStreamLogExporter to print directly to the console instead of sending to the OpenTelemetry Collector, and the logs print fine. I also tried emitting traces and spans to the collector, and that works as well. The issue seems to be specific to sending logs to the collector.
#include "opentelemetry/sdk/logs/simple_log_processor.h"
#include "opentelemetry/sdk/logs/logger_provider.h"
#include "opentelemetry/logs/provider.h"
#include "opentelemetry/exporters/otlp/otlp_http_log_exporter.h"
#include "opentelemetry/sdk/version/version.h"
namespace logs_sdk = opentelemetry::sdk::logs;
namespace otlp = opentelemetry::exporter::otlp;
opentelemetry::exporter::otlp::OtlpHttpLogExporterOptions logger_opts;
int main()
{
auto exporter = std::unique_ptr<logs_sdk::LogExporter>(new otlp::OtlpHttpLogExporter(logger_opts));
auto processor = std::unique_ptr<logs_sdk::LogProcessor>(
new logs_sdk::SimpleLogProcessor(std::move(exporter)));
auto provider =
std::shared_ptr<logs_sdk::LoggerProvider>(new logs_sdk::LoggerProvider(std::move(processor)));
// Get Logger
auto logger = provider->GetLogger("firstlog", "", OPENTELEMETRY_SDK_VERSION);
logger->Debug("I am the first log message.");
}
And the configuration of the collector:
extensions:
  health_check:
  pprof:
    endpoint: 0.0.0.0:1777
  zpages:
    endpoint: 0.0.0.0:55679

receivers:
  otlp:
    protocols:
      grpc:
      http:
  opencensus:
  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
  zipkin:

processors:
  batch:

exporters:
  file:
    path: ./myoutput.json
  logging:
    logLevel: debug
  prometheus:
    endpoint: "prometheus:8889"
    namespace: "default"
  jaeger:
    endpoint: "localhost:14250"
    tls:
      insecure: true

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
  extensions: [health_check, pprof, zpages]
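One way to narrow this down, offered as a debugging sketch rather than a known fix: take the C++ SDK out of the loop and POST a minimal OTLP/JSON log record straight to the collector's HTTP receiver. This assumes the default OTLP/HTTP port 4318 and a collector version that accepts the resourceLogs/scopeLogs JSON shape; if the logging exporter prints this record, the collector's logs pipeline works and the problem is on the SDK/exporter side.
# assumption: collector's OTLP HTTP receiver is on the default port 4318
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[{"resource":{},"scopeLogs":[{"scope":{},"logRecords":[{"body":{"stringValue":"hello from curl"}}]}]}]}'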

fabric-ca-client error for 4 org system

I am getting this error while registering a new user:
2017/10/11 07:53:11 [DEBUG] Received request
POST /api/v1/enroll
Authorization: Basic YWRtaW46YWRtaW5wdw==
{"caName":"","certificate_request":"-----BEGIN CERTIFICATE REQUEST-----\r\nMIHMMHICAQAwEDEOMAwGA1UEAwwFYWRtaW4wWTATBgcqhkjOPQIBBggqhkjOPQMB\r\nBwNCAASUWo/5gS9H/PSvsiNK2iGsWw0nv7tsVnGG+ZY3cWFJ3ANz6cNmd+lRLZS3\r\nBhHYD/FZhhqwBucMHFE1sB9SqqEnoAAwDAYIKoZIzj0EAwIFAANIADBFAiEAiHjk\r\ncyM3gzqYbLAFVz8kHahVXtAjEOb82q7jiP35Tm4CIAHQsotf2301RCBVQ6i5hb9i\r\nByHhofDyhEFbch7gJVVF\r\n-----END CERTIFICATE REQUEST-----\r\n"}
2017/10/11 07:53:11 [DEBUG] Directing traffic to default CA
2017/10/11 07:53:11 [DEBUG] DB: Getting identity admin
2017/10/11 07:53:11 [DEBUG] Failed to get identity 'admin': sql: no rows in result set
I have my own fabric-ca-server-config.yaml file
identities:
  - name: admin
    pass: adminpw
    type: client
    affiliation: ""
    maxenrollments: -1
    attrs:
      hf.Registrar.Roles: "client,user,peer,validator,auditor"
      hf.Registrar.DelegateRoles: "client,user,validator,auditor"
      hf.Revoker: true
      hf.IntermediateCA: true
      hf.GenCRL: true

affiliations:
  org1:
    - department1
    - department2
  org2:
    - department1
    - department2
  org3:
    - department1
    - department2
  org4:
    - department1
    - department2
I browsed the .db file and didn't find any data in any of the tables.
I deleted the fabric-ca-server.db file and the keys in the keystore, restarted ca_Peers, and it worked.

Run Spark in Amazon EMR

I am a newbie in Spark and trying to run Spark on Amazon EMR. Here's my code (I copied it from an example and made a few small modifications):
package test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

import com.google.common.base.Optional;

public class SparkTest {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("Spark Count"));
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "xxxxxxxxxxxxxxx");
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "yyyyyyyyyyyyyyyyyyyyyyy");

        JavaRDD<String> customerInputFile = sc.textFile("s3n://aws-logs-494322476419-ap-southeast-1/test/customers_data.txt");
        JavaPairRDD<String, String> customerPairs = customerInputFile.mapToPair(new PairFunction<String, String, String>() {
            public Tuple2<String, String> call(String s) {
                String[] customerSplit = s.split(",");
                return new Tuple2<String, String>(customerSplit[0], customerSplit[1]);
            }
        }).distinct();

        JavaRDD<String> transactionInputFile = sc.textFile("s3n://aws-logs-494322476419-ap-southeast-1/test/transactions_data.txt");
        JavaPairRDD<String, String> transactionPairs = transactionInputFile.mapToPair(new PairFunction<String, String, String>() {
            public Tuple2<String, String> call(String s) {
                String[] transactionSplit = s.split(",");
                return new Tuple2<String, String>(transactionSplit[2], transactionSplit[3] + "," + transactionSplit[1]);
            }
        });

        // Default join operation (inner join)
        JavaPairRDD<String, Tuple2<String, String>> joinsOutput = customerPairs.join(transactionPairs);
        System.out.println("Joins function Output: " + joinsOutput.collect());

        // Left outer join operation
        JavaPairRDD<String, Iterable<Tuple2<String, Optional<String>>>> leftJoinOutput = customerPairs.leftOuterJoin(transactionPairs).groupByKey().sortByKey();
        System.out.println("LeftOuterJoins function Output: " + leftJoinOutput.collect());

        // Right outer join operation
        JavaPairRDD<String, Iterable<Tuple2<Optional<String>, String>>> rightJoinOutput = customerPairs.rightOuterJoin(transactionPairs).groupByKey().sortByKey();
        System.out.println("RightOuterJoins function Output: " + rightJoinOutput.collect());

        sc.close();
    }
}
But after I build a jar, set up a cluster, and run it, it always reports the following error and fails:
2015-07-24 12:22:41,550 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at ip-10-0-0-61.ap-southeast-1.compute.internal/10.0.0.61:8032
2015-07-24 12:22:42,619 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Requesting a new application from cluster with 2 NodeManagers
2015-07-24 12:22:42,694 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
2015-07-24 12:22:42,698 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Will allocate AM container, with 896 MB memory including 384 MB overhead
2015-07-24 12:22:42,700 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Setting up container launch context for our AM
2015-07-24 12:22:42,707 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Preparing resources for our AM container
2015-07-24 12:22:45,445 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/usr/lib/spark/lib/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar -> hdfs://10.0.0.61:8020/user/hadoop/.sparkStaging/application_1437740323036_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
2015-07-24 12:22:47,701 INFO [main] metrics.MetricsSaver (MetricsSaver.java:showConfigRecord(643)) - MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1437740335527
2015-07-24 12:22:47,713 INFO [main] metrics.MetricsSaver (MetricsSaver.java:<init>(284)) - Created MetricsSaver j-1NM41B4W6K3IP:i-525f449f:SparkSubmit:06588 period:60 /mnt/var/em/raw/i-525f449f_20150724_SparkSubmit_06588_raw.bin
2015-07-24 12:22:49,047 INFO [DataStreamer for file /user/hadoop/.sparkStaging/application_1437740323036_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar block BP-1554902524-10.0.0.61-1437740270491:blk_1073741830_1015] metrics.MetricsSaver (MetricsSaver.java:compactRawValues(464)) - 1 aggregated HDFSWriteDelay 183 raw values into 1 aggregated values, total 1
2015-07-24 12:23:03,845 INFO [main] fs.EmrFileSystem (EmrFileSystem.java:initialize(107)) - Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2015-07-24 12:23:06,316 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[E987B96CAE12A2B2], ServiceEndpoint=[https://aws-logs-494322476419-ap-southeast-1.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=0, ClientExecuteTime=[2266.609], HttpRequestTime=[1805.926], HttpClientReceiveResponseTime=[17.096], RequestSigningTime=[187.361], ResponseProcessingTime=[0.66], HttpClientSendRequestTime=[1.065],
2015-07-24 12:23:06,329 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource s3://aws-logs-494322476419-ap-southeast-1/test/spark-test.jar -> hdfs://10.0.0.61:8020/user/hadoop/.sparkStaging/application_1437740323036_0001/spark-test.jar
2015-07-24 12:23:06,568 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[C40A7775223B6772], ServiceEndpoint=[https://aws-logs-494322476419-ap-southeast-1.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[237.557], HttpRequestTime=[20.943], HttpClientReceiveResponseTime=[13.247], RequestSigningTime=[29.321], ResponseProcessingTime=[186.674], HttpClientSendRequestTime=[1.998],
2015-07-24 12:23:07,265 INFO [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1159)) - Opening 's3://aws-logs-494322476419-ap-southeast-1/test/spark-test.jar' for reading
2015-07-24 12:23:07,312 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[FB5C0051C241A9AC], ServiceEndpoint=[https://aws-logs-494322476419-ap-southeast-1.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[42.753], HttpRequestTime=[31.778], HttpClientReceiveResponseTime=[20.426], RequestSigningTime=[1.266], ResponseProcessingTime=[7.357], HttpClientSendRequestTime=[1.065],
2015-07-24 12:23:07,330 INFO [main] metrics.MetricsSaver (MetricsSaver.java:<init>(915)) - Thread 1 created MetricsLockFreeSaver 1
2015-07-24 12:23:07,875 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource file:/tmp/spark-91e17f5e-45f2-466a-b4cf-585174b9fa98/__hadoop_conf__3852777564911495008.zip -> hdfs://10.0.0.61:8020/user/hadoop/.sparkStaging/application_1437740323036_0001/__hadoop_conf__3852777564911495008.zip
2015-07-24 12:23:07,965 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Uploading resource s3://aws-logs-494322476419-ap-southeast-1/test/spark-assembly-1.4.1-hadoop2.6.0.jar -> hdfs://10.0.0.61:8020/user/hadoop/.sparkStaging/application_1437740323036_0001/spark-assembly-1.4.1-hadoop2.6.0.jar
2015-07-24 12:23:07,993 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[200], ServiceName=[Amazon S3], AWSRequestID=[25260792F013C91A], ServiceEndpoint=[https://aws-logs-494322476419-ap-southeast-1.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[23.713], HttpRequestTime=[15.297], HttpClientReceiveResponseTime=[12.147], RequestSigningTime=[6.568], ResponseProcessingTime=[0.312], HttpClientSendRequestTime=[1.033],
2015-07-24 12:23:08,003 INFO [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:open(1159)) - Opening 's3://aws-logs-494322476419-ap-southeast-1/test/spark-assembly-1.4.1-hadoop2.6.0.jar' for reading
2015-07-24 12:23:08,064 INFO [main] amazonaws.latency (AWSRequestMetricsFullSupport.java:log(203)) - StatusCode=[206], ServiceName=[Amazon S3], AWSRequestID=[DDF86EA9B896052A], ServiceEndpoint=[https://aws-logs-494322476419-ap-southeast-1.s3.amazonaws.com], HttpClientPoolLeasedCount=0, RequestCount=1, HttpClientPoolPendingCount=0, HttpClientPoolAvailableCount=1, ClientExecuteTime=[60.109], HttpRequestTime=[55.175], HttpClientReceiveResponseTime=[43.324], RequestSigningTime=[1.067], ResponseProcessingTime=[3.409], HttpClientSendRequestTime=[1.16],
2015-07-24 12:23:09,002 INFO [main] metrics.MetricsSaver (MetricsSaver.java:commitPendingKey(1043)) - 1 MetricsLockFreeSaver 2 comitted 556 matured S3ReadDelay values
2015-07-24 12:23:24,296 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Setting up the launch environment for our AM container
2015-07-24 12:23:24,724 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: hadoop
2015-07-24 12:23:24,727 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: hadoop
2015-07-24 12:23:24,731 INFO [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
2015-07-24 12:23:24,912 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Submitting application 1 to ResourceManager
2015-07-24 12:23:25,818 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(252)) - Submitted application application_1437740323036_0001
2015-07-24 12:23:26,872 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:26,893 INFO [main] yarn.Client (Logging.scala:logInfo(59)) -
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1437740605459
final status: UNDEFINED
tracking URL: http://ip-10-0-0-61.ap-southeast-1.compute.internal:20888/proxy/application_1437740323036_0001/
user: hadoop
2015-07-24 12:23:27,902 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:28,906 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:29,909 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:30,913 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:31,917 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:32,920 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:33,924 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:34,931 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:35,936 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:36,939 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:37,944 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:38,948 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:39,951 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:40,965 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:41,969 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:42,973 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:43,978 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:44,981 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:45,991 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:46,994 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: ACCEPTED)
2015-07-24 12:23:47,999 INFO [main] yarn.Client (Logging.scala:logInfo(59)) - Application report for application_1437740323036_0001 (state: FAILED)
2015-07-24 12:23:48,002 INFO [main] yarn.Client (Logging.scala:logInfo(59)) -
client token: N/A
diagnostics: Application application_1437740323036_0001 failed 2 times due to AM Container for appattempt_1437740323036_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://ip-10-0-0-61.ap-southeast-1.compute.internal:20888/proxy/application_1437740323036_0001/Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://10.0.0.61:8020/user/hadoop/.sparkStaging/application_1437740323036_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
java.io.FileNotFoundException: File does not exist: hdfs://10.0.0.61:8020/user/hadoop/.sparkStaging/application_1437740323036_0001/spark-assembly-1.4.1-hadoop2.6.0-amzn-0.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1437740605459
final status: FAILED
tracking URL: http://ip-10-0-0-61.ap-southeast-1.compute.internal:8088/cluster/app/application_1437740323036_0001
user: hadoop
2015-07-24 12:23:48,038 INFO [Thread-0] util.Utils (Logging.scala:logInfo(59)) - Shutdown hook called
2015-07-24 12:23:48,040 INFO [Thread-0] util.Utils (Logging.scala:logInfo(59)) - Deleting directory /tmp/spark-91e17f5e-45f2-466a-b4cf-585174b9fa98
Can anyone find out what the problem is?
Thank you very much.

Cannot create sink whose type is HDFS in flume-ng

I have Flume NG writing logs to HDFS.
I made one agent on a single node, but it is not running.
Here is my configuration.
# example2.conf: A single-node Flume configuration
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = avro
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 41414
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 100
# Describe sink1
agent1.sinks.sink1.type = HDFS
agent1.sinks.sink1.hdfs.path = hdfs://dbkorando.kaist.ac.kr:9000/flume
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
and I run this command:
flume-ng agent -n agent1 -c conf -C /home/hyahn/hadoop-0.20.2/hadoop-0.20.2-core.jar -f conf/example2.conf -Dflume.root.logger=INFO,console
The result is:
Info: Including Hadoop libraries found via (/home/hyahn/hadoop-0.20.2/bin/hadoop) for HDFS access
+ exec /usr/java/jdk1.7.0_02/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/etc/flume-ng/conf:/usr/lib/flume-ng/lib/*:/home/hyahn/hadoop-0.20.2/hadoop-0.20.2-core.jar' -Djava.library.path=:/home/hyahn/hadoop-0.20.2/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -n agent1 -f conf/example2.conf
2012-11-27 15:33:17,250 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 1
2012-11-27 15:33:17,253 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent1
2012-11-27 15:33:17,257 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-11-27 15:33:17,257 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:203)] Node manager starting
2012-11-27 15:33:17,258 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 9
2012-11-27 15:33:17,258 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:195)] Reloading configuration file:conf/example2.conf
2012-11-27 15:33:17,266 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:988)] Processing:sink1
2012-11-27 15:33:17,266 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:988)] Processing:sink1
2012-11-27 15:33:17,267 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:988)] Processing:sink1
2012-11-27 15:33:17,268 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:902)] Added sinks: sink1 Agent: agent1
2012-11-27 15:33:17,290 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:122)] Post-validation flume configuration contains configuration for agents: [agent1]
2012-11-27 15:33:17,290 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadChannels(PropertiesFileConfigurationProvider.java:249)] Creating channels
2012-11-27 15:33:17,354 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.(MonitoredCounterGroup.java:68)] Monitoried counter group for type: CHANNEL, name: channel1, registered successfully.
2012-11-27 15:33:17,355 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadChannels(PropertiesFileConfigurationProvider.java:273)] created channel channel1
2012-11-27 15:33:17,368 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.(MonitoredCounterGroup.java:68)] Monitoried counter group for type: SOURCE, name: source1, registered successfully.
2012-11-27 15:33:17,378 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:70)] Creating instance of sink: sink1, type: HDFS
As shown above, flume-ng stops at the sink creation step.
What is the problem?
You need to open another window and send an Avro event to port 41414:
bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /home/hadoop1/aaa.txt -Dflume.root.logger=DEBUG,console
Here I have a file named aaa.txt in the /home/hadoop1/ directory.
Flume will read this file and send it to HDFS.