SageMaker multimodel and RandomCutForest


I am trying to invoke a multi-model endpoint with a RandomCutForest model, but I receive the error 'Error loading model'. I can invoke the endpoint with the models from the examples.
Am I missing something, e.g. limitations on which models I can use?
For multi-model inspiration I am using the following:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb
https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/
I am trying to deploy the 'model.tar.gz' produced by the RCF example below to the multi-model endpoint:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/random_cut_forest/random_cut_forest.ipynb
model_name = 'model'
full_model_name = '{}.tar.gz'.format(model_name)
features = data
body = ','.join(map(str, features)) + '\n'
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    TargetModel=full_model_name,
    Body=body)
print(response)
CloudWatch log error:
> 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Error loading model: Unable
> to load model: invalid load key, '{'. [17:28:59]
> /workspace/src/learner.cc:334: Check failed: fi->Read(&mparam_,
> sizeof(mparam_)) == sizeof(mparam_) (25 vs. 136) : BoostLearner: wrong
> model format 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Stack trace: 2020-04-27
> 17:28:59,005 [INFO ] W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (0)
> /miniconda3/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x24)
> [0x7f37ce1cacb4] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9 com.amazonaws.ml.mms.wlm.WorkerThread
> - Backend response time: 0 2020-04-27 17:28:59,005 [INFO ] W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (1)
> /miniconda3/xgboost/libxgboost.so(xgboost::LearnerImpl::Load(dmlc::Stream*)+0x4b5)
> [0x7f37ce266985] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (2)
> /miniconda3/xgboost/libxgboost.so(XGBoosterLoadModel+0x37)
> [0x7f37ce1bf417] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (3)
> /miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)
> [0x7f37ee993ec0] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (4)
> /miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d)
> [0x7f37ee99387d] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (5)
> /miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce)
> [0x7f37eeba91de] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (6)
> /miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12c14)
> [0x7f37eeba9c14] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (7)
> /miniconda3/bin/python(_PyObject_FastCallKeywords+0x48b)
> [0x563d71b4218b] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [bt] (8)
> /miniconda3/bin/python(_PyEval_EvalFrameDefault+0x52cf)
> [0x563d71b91e8f] 2020-04-27 17:28:59,005 [INFO ]
> W-9003-b39b888fb4a3fa6cf83bb34a9-stdout
> com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 2020-04-27 17:28:59,005
> [WARN ] W-9003-b39b888fb4a3fa6cf83bb34a9
> com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker thread
> exception. java.lang.IllegalArgumentException: reasonPhrase contains
> one of the following prohibited characters: \r\n: Unable to load
> model: Unable to load model: invalid load key, '{'. [17:28:59]
> /workspace/src/learner.cc:334: Check failed: fi->Read(&mparam_,
> sizeof(mparam_)) == sizeof(mparam_) (25 vs. 136) : BoostLearner: wrong
> model format Stack trace: [bt] (0)
> /miniconda3/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x24)
> [0x7f37ce1cacb4] [bt] (1)
> /miniconda3/xgboost/libxgboost.so(xgboost::LearnerImpl::Load(dmlc::Stream*)+0x4b5)
> [0x7f37ce266985] [bt] (2)
> /miniconda3/xgboost/libxgboost.so(XGBoosterLoadModel+0x37)
> [0x7f37ce1bf417] [bt] (3)
> /miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)
> [0x7f37ee993ec0] [bt] (4)
> /miniconda3/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d)
> [0x7f37ee99387d] [bt] (5)
> /miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce)
> [0x7f37eeba91de] [bt] (6)
> /miniconda3/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x12c14)
> [0x7f37eeba9c14] [bt] (7)
> /miniconda3/bin/python(_PyObject_FastCallKeywords+0x48b)
> [0x563d71b4218b] [bt] (8)
> /miniconda3/bin/python(_PyEval_EvalFrameDefault+0x52cf)
> [0x563d71b91e8f]

SageMaker Random Cut Forest is part of the built-in algorithm library, and built-in algorithms currently cannot be deployed to a multi-model endpoint (MME). XGBoost is an exception, since it has an open-source container: https://github.com/aws/sagemaker-xgboost-container. This is consistent with the log above, where libxgboost fails with 'wrong model format': the XGBoost container is being asked to load an RCF artifact.
If you really need to deploy RCF to a multi-model endpoint, one option is to find a reasonably similar open-source implementation (for example, rrcf looks reasonably serious: it is based on the same paper by Guha et al. and has 170+ GitHub stars) and build a custom MME Docker container. The documentation is here and there is an excellent tutorial here.
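If you want to see the format mismatch for yourself, you can look at what the archive actually contains. The 'invalid load key' text in the log is the message Python's pickle module gives when a file does not start like a pickle, so peeking at the first bytes of each file can be informative. This is a minimal sketch using only the standard library; `peek_model_archive` is a hypothetical helper name, and the path is whatever you downloaded from S3.

```python
import tarfile


def peek_model_archive(path, num_bytes=16):
    """Return {member_name: first bytes} for each regular file in a .tar.gz."""
    heads = {}
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            with archive.extractfile(member) as handle:
                heads[member.name] = handle.read(num_bytes)
    return heads


# Example call (the file name is an assumption):
# for name, head in peek_model_archive("model.tar.gz").items():
#     print(name, head)
```

If the first bytes look like text rather than a binary XGBoost model or a pickle, the XGBoost container will not be able to load the file.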

Related

How to define Log Configuration for AWS Batch in Step function definition?

I am trying to deploy AWS Step Functions state machines, each of which runs an AWS Batch job. Everything worked, but now I need to store all the logs for these state machines in a specific CloudWatch log group.
Based on the AWS documentation for Batch, I tried this snippet in my Step Functions definition in the CloudFormation template:
"ContainerOverrides": {
    "LogConfiguration": {    #also tried logConfiguration
        "LogDriver": "awslogs",    #also tried logDriver
        "Options": {    #also tried options
            "awslogs-group": "${PipelineLogGroup}",
            "awslogs-stream-prefix": "canonical-"
        }
    }
}
Under the same "ContainerOverrides" key, "Environment" is defined and works correctly. For "LogConfiguration", I receive the build error 'SCHEMA_VALIDATION_FAILED: The field "LogConfiguration" is not supported by Step Functions' (the same happens for logConfiguration).
Is it not possible to define the log configuration of an AWS Batch job through the Step Functions definition?

"LogConfiguration" is not part of "ContainerOverrides" in the state machine definition. Instead, it has to be configured on the Batch job definition:
> PipelineJobDefinition:
>   Type: AWS::Batch::JobDefinition
>   Properties:
>     Type: Container
>     ContainerProperties:
>       Memory: 32768
>       LogConfiguration:
>         LogDriver: 'awslogs'
>         Options: {
>           "awslogs-group": !Ref 'LogGroupCreatedinCF',
>           "awslogs-stream-prefix": "batch"
>         }
>       JobRoleArn: !ImportValue pipeline-DataBatchJobRoleArn
>       Vcpus: 16
>       Image: "...."
>     JobDefinitionName: PipelineJobDefinition
>     RetryStrategy:
>       Attempts: 1
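
To catch this kind of mistake before deploying, a small check can flag keys that do not belong under ContainerOverrides. This is only a sketch: the allowed-key set below reflects the Batch SubmitJob container override fields as I understand them, written in the PascalCase that Step Functions uses, so verify it against the current API reference. `unsupported_override_keys` is a hypothetical helper, not part of any AWS SDK, and the job and queue names are made up.

```python
# Assumed set of ContainerOverrides fields; check the Batch SubmitJob docs.
ALLOWED_CONTAINER_OVERRIDES = {
    "Command",
    "Environment",
    "InstanceType",
    "Memory",
    "Vcpus",
    "ResourceRequirements",
}


def unsupported_override_keys(parameters):
    """Return ContainerOverrides keys that are not in the assumed allowed set."""
    overrides = parameters.get("ContainerOverrides", {})
    return sorted(set(overrides) - ALLOWED_CONTAINER_OVERRIDES)


# Parameters of a hypothetical Batch task state, as a Python dict.
state_parameters = {
    "JobName": "example-job",
    "JobDefinition": "PipelineJobDefinition",
    "JobQueue": "example-queue",
    "ContainerOverrides": {
        "Environment": [{"Name": "STAGE", "Value": "dev"}],
        "LogConfiguration": {"LogDriver": "awslogs"},
    },
}

print(unsupported_override_keys(state_parameters))  # → ['LogConfiguration']
```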

How to deploy image classifier with resnet50 model on AWS endpoint to predict without worker dying?

I created an image classifier built on ResNet50 to identify dog breeds, using SageMaker Studio. Tuning and training are done and I deployed the model, but when I try to predict with it, it fails. I believe this is related to the PID of the worker, because that is the first warning I see.
The CloudWatch log output below says the worker PID is not available yet, and soon after that the worker dies.
timestamp,message,logStreamName
1648240674535,"2022-03-25 20:37:54,107 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...",AllTraffic/i-055c5d00e53e84b93
1648240674535,"2022-03-25 20:37:54,188 [INFO ] main org.pytorch.serve.ModelServer - ",AllTraffic/i-055c5d00e53e84b93
1648240674535,Torchserve version: 0.4.0,AllTraffic/i-055c5d00e53e84b93
1648240674535,TS Home: /opt/conda/lib/python3.6/site-packages,AllTraffic/i-055c5d00e53e84b93
1648240674535,Current directory: /,AllTraffic/i-055c5d00e53e84b93
1648240674535,Temp directory: /home/model-server/tmp,AllTraffic/i-055c5d00e53e84b93
1648240674535,Number of GPUs: 0,AllTraffic/i-055c5d00e53e84b93
1648240674535,Number of CPUs: 1,AllTraffic/i-055c5d00e53e84b93
1648240674535,Max heap size: 6838 M,AllTraffic/i-055c5d00e53e84b93
1648240674535,Python executable: /opt/conda/bin/python3.6,AllTraffic/i-055c5d00e53e84b93
1648240674535,Config file: /etc/sagemaker-ts.properties,AllTraffic/i-055c5d00e53e84b93
1648240674535,Inference address: http://0.0.0.0:8080,AllTraffic/i-055c5d00e53e84b93
1648240674535,Management address: http://0.0.0.0:8080,AllTraffic/i-055c5d00e53e84b93
1648240674535,Metrics address: http://127.0.0.1:8082,AllTraffic/i-055c5d00e53e84b93
1648240674535,Model Store: /.sagemaker/ts/models,AllTraffic/i-055c5d00e53e84b93
1648240674535,Initial Models: model.mar,AllTraffic/i-055c5d00e53e84b93
1648240674535,Log dir: /logs,AllTraffic/i-055c5d00e53e84b93
1648240674535,Metrics dir: /logs,AllTraffic/i-055c5d00e53e84b93
1648240674535,Netty threads: 0,AllTraffic/i-055c5d00e53e84b93
1648240674535,Netty client threads: 0,AllTraffic/i-055c5d00e53e84b93
1648240674535,Default workers per model: 1,AllTraffic/i-055c5d00e53e84b93
1648240674535,Blacklist Regex: N/A,AllTraffic/i-055c5d00e53e84b93
1648240674535,Maximum Response Size: 6553500,AllTraffic/i-055c5d00e53e84b93
1648240674536,Maximum Request Size: 6553500,AllTraffic/i-055c5d00e53e84b93
1648240674536,Prefer direct buffer: false,AllTraffic/i-055c5d00e53e84b93
1648240674536,Allowed Urls: [file://.*|http(s)?://.*],AllTraffic/i-055c5d00e53e84b93
1648240674536,Custom python dependency for model allowed: false,AllTraffic/i-055c5d00e53e84b93
1648240674536,Metrics report format: prometheus,AllTraffic/i-055c5d00e53e84b93
1648240674536,Enable metrics API: true,AllTraffic/i-055c5d00e53e84b93
1648240674536,Workflow Store: /.sagemaker/ts/models,AllTraffic/i-055c5d00e53e84b93
1648240674536,"2022-03-25 20:37:54,195 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...",AllTraffic/i-055c5d00e53e84b93
1648240675536,"2022-03-25 20:37:54,217 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: model.mar",AllTraffic/i-055c5d00e53e84b93
1648240675536,"2022-03-25 20:37:55,505 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model model loaded.",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,515 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,569 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,569 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.",AllTraffic/i-055c5d00e53e84b93
1648240675786,"2022-03-25 20:37:55,569 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082",AllTraffic/i-055c5d00e53e84b93
1648240675786,Model server started.,AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,727 [WARN ] pool-2-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,812 [INFO ] pool-2-thread-1 TS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,813 [INFO ] pool-2-thread-1 TS_METRICS - DiskAvailable.Gigabytes:38.02598190307617|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,813 [INFO ] pool-2-thread-1 TS_METRICS - DiskUsage.Gigabytes:12.715518951416016|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,814 [INFO ] pool-2-thread-1 TS_METRICS - DiskUtilization.Percent:25.1|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,815 [INFO ] pool-2-thread-1 TS_METRICS - MemoryAvailable.Megabytes:29583.98046875|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,815 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUsed.Megabytes:1355.765625|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,816 [INFO ] pool-2-thread-1 TS_METRICS - MemoryUtilization.Percent:5.7|#Level:Host|#hostname:container-0.local,timestamp:1648240675",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - [PID]48",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - Torch worker started.",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,994 [INFO ] W-9000-model_1-stdout MODEL_LOG - Python runtime: 3.6.13",AllTraffic/i-055c5d00e53e84b93
1648240676036,"2022-03-25 20:37:55,999 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,006 [INFO ] W-9000-model_1-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - Backend worker process died.",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - Traceback (most recent call last):",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 182, in <module>",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - worker.run_server()",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 154, in run_server",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,111 [INFO ] W-9000-model_1-stdout MODEL_LOG - self.handle_connection(cl_socket)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 116, in handle_connection",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - service, result, code = self.load_model(msg)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 89, in load_model",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - service = model_loader.load(model_name, model_dir, handler, gpu, batch_size, envelope)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_loader.py"", line 110, in load",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,112 [INFO ] W-9000-model_1-stdout MODEL_LOG - initialize_fn(service.context)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/home/model-server/tmp/models/23b30361031647d08792d32672910688/handler_service.py"", line 51, in initialize",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout MODEL_LOG - super().initialize(context)",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py"", line 66, in initialize",AllTraffic/i-055c5d00e53e84b93
1648240676286,"2022-03-25 20:37:56,113 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240676536,"2022-03-25 20:37:56,114 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.",AllTraffic/i-055c5d00e53e84b93
1648240676536,"2022-03-25 20:37:56,416 [INFO ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240676536,"2022-03-25 20:37:56,461 [INFO ] W-9000-model_1 ACCESS_LOG - /169.254.178.2:39848 ""GET /ping HTTP/1.1"" 200 9",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:56,461 [INFO ] W-9000-model_1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:container-0.local,timestamp:null",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,567 [INFO ] W-9000-model_1-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1-stdout MODEL_LOG - [PID]86",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1-stdout MODEL_LOG - Torch worker started.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1-stdout MODEL_LOG - Python runtime: 3.6.13",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,568 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,569 [INFO ] W-9000-model_1-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] W-9000-model_1-stdout MODEL_LOG - Backend worker process died.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] W-9000-model_1-stdout MODEL_LOG - Traceback (most recent call last):",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] epollEventLoopGroup-5-2 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,642 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 182, in <module>",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - worker.run_server()",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 154, in run_server",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - self.handle_connection(cl_socket)",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 116, in handle_connection",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - service, result, code = self.load_model(msg)",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [WARN ] W-9000-model_1 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout MODEL_LOG - File ""/opt/conda/lib/python3.6/site-packages/ts/model_service_worker.py"", line 89, in load_model",AllTraffic/i-055c5d00e53e84b93
1648240677787,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stdout",AllTraffic/i-055c5d00e53e84b93
1648240678037,"2022-03-25 20:37:57,643 [INFO ] W-9000-model_1 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.",AllTraffic/i-055c5d00e53e84b93
1648240679288,"2022-03-25 20:37:57,991 [INFO ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-model_1-stderr",AllTraffic/i-055c5d00e53e84b93
1648240679288,"2022-03-25 20:37:59,096 [INFO ] W-9000-model_1-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000",AllTraffic/i-055c5d00e53e84b93
1648240679288,"2022-03-25 20:37:59,097 [INFO ] W-9000-model_1-stdout MODEL_LOG - [PID]114",AllTraffic/i-055c5d00e53e84b93
Model tuning and training came out all right, so I'm not sure why it won't predict. Someone mentioned that it might be due to the entry point script, but I don't know what would cause prediction to fail after deployment when the model predicts fine during training.
Entry point script:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
import json
import copy
import argparse
import os
import logging
import sys
from tqdm import tqdm
from PIL import ImageFile
import smdebug.pytorch as smd
ImageFile.LOAD_TRUNCATED_IMAGES = True
logger=logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
def test(model, test_loader, criterion, hook):
    model.eval()
    running_loss=0
    running_corrects=0
    hook.set_mode(smd.modes.EVAL)
    for inputs, labels in test_loader:
        outputs=model(inputs)
        loss=criterion(outputs, labels)
        _, preds = torch.max(outputs, 1)
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)
    ##total_loss = running_loss // len(test_loader)
    ##total_acc = running_corrects.double() // len(test_loader)
    ##logger.info(f"Testing Loss: {total_loss}")
    ##logger.info(f"Testing Accuracy: {total_acc}")
    logger.info("New test acc")
    logger.info(f'Test set: Accuracy: {running_corrects}/{len(test_loader.dataset)} = {100*(running_corrects/len(test_loader.dataset))}%)')
def train(model, train_loader, validation_loader, criterion, optimizer, hook):
    epochs=50
    best_loss=1e6
    image_dataset={'train':train_loader, 'valid':validation_loader}
    loss_counter=0
    hook.set_mode(smd.modes.TRAIN)
    for epoch in range(epochs):
        logger.info(f"Epoch: {epoch}")
        for phase in ['train', 'valid']:
            if phase=='train':
                model.train()
                logger.info("Model Trained")
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in image_dataset[phase]:
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                if phase=='train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    logger.info("Model Optimized")
                _, preds = torch.max(outputs, 1)
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            epoch_loss = running_loss // len(image_dataset[phase])
            epoch_acc = running_corrects // len(image_dataset[phase])
            if phase=='valid':
                logger.info("Model Validating")
                if epoch_loss<best_loss:
                    best_loss=epoch_loss
                else:
                    loss_counter+=1
                    logger.info(loss_counter)
            '''logger.info('{} loss: {:.4f}, acc: {:.4f}, best loss: {:.4f}'.format(phase,
                                                                                 epoch_loss,
                                                                                 epoch_acc,
                                                                                 best_loss))'''
            if phase=="train":
                logger.info("New epoch acc for Train:")
                logger.info(f"Epoch {epoch}: Loss {loss_counter/len(train_loader.dataset)}, Accuracy {100*(running_corrects/len(train_loader.dataset))}%")
            if phase=="valid":
                logger.info("New epoch acc for Valid:")
                logger.info(f"Epoch {epoch}: Loss {loss_counter/len(train_loader.dataset)}, Accuracy {100*(running_corrects/len(train_loader.dataset))}%")
        ##if loss_counter==1:
        ##    break
        ##if epoch==0:
        ##    break
    return model
def net():
    model = models.resnet50(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False
    model.fc = nn.Sequential(
        nn.Linear(2048, 128),
        nn.ReLU(inplace=True),
        nn.Linear(128, 133))
    return model
def create_data_loaders(data, batch_size):
    train_data_path = os.path.join(data, 'train')
    test_data_path = os.path.join(data, 'test')
    validation_data_path=os.path.join(data, 'valid')
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    test_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    train_data = torchvision.datasets.ImageFolder(root=train_data_path, transform=train_transform)
    train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_data = torchvision.datasets.ImageFolder(root=test_data_path, transform=test_transform)
    test_data_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, shuffle=True)
    validation_data = torchvision.datasets.ImageFolder(root=validation_data_path, transform=test_transform)
    validation_data_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size, shuffle=True)
    return train_data_loader, test_data_loader, validation_data_loader
def main(args):
    logger.info(f'Hyperparameters are LR: {args.lr}, Batch Size: {args.batch_size}')
    logger.info(f'Data Paths: {args.data}')
    train_loader, test_loader, validation_loader=create_data_loaders(args.data, args.batch_size)
    model=net()
    hook = smd.Hook.create_from_json_file()
    hook.register_hook(model)
    criterion = nn.CrossEntropyLoss(ignore_index=133)
    optimizer = optim.Adam(model.fc.parameters(), lr=args.lr)
    logger.info("Starting Model Training")
    model=train(model, train_loader, validation_loader, criterion, optimizer, hook)
    logger.info("Testing Model")
    test(model, test_loader, criterion, hook)
    logger.info("Saving Model")
    torch.save(model.cpu().state_dict(), os.path.join(args.model_dir, "model.pth"))
if __name__=='__main__':
    parser=argparse.ArgumentParser()
    '''
    TODO: Specify any training args that you might need
    '''
    parser.add_argument(
        "--batch-size",
        type=int,
        default=64,
        metavar="N",
        help="input batch size for training (default: 64)",
    )
    parser.add_argument(
        "--test-batch-size",
        type=int,
        default=1000,
        metavar="N",
        help="input batch size for testing (default: 1000)",
    )
    parser.add_argument(
        "--epochs",
        type=int,
        default=5,
        metavar="N",
        help="number of epochs to train (default: 10)",
    )
    parser.add_argument(
        "--lr", type=float, default=0.01, metavar="LR", help="learning rate (default: 0.01)"
    )
    parser.add_argument(
        "--momentum", type=float, default=0.5, metavar="M", help="SGD momentum (default: 0.5)"
    )
    # Container environment
    parser.add_argument("--hosts", type=list, default=json.loads(os.environ["SM_HOSTS"]))
    parser.add_argument("--current-host", type=str, default=os.environ["SM_CURRENT_HOST"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--data", type=str, default=os.environ["SM_CHANNEL_TRAINING"])
    parser.add_argument("--num-gpus", type=int, default=os.environ["SM_NUM_GPUS"])
    args=parser.parse_args()
    main(args)
To test the model on the endpoint I sent over an image using the following code:
from sagemaker.serializers import IdentitySerializer
import base64
predictor.serializer = IdentitySerializer("image/png")
with open("Akita_00282.jpg", "rb") as f:
    payload = f.read()
response = predictor.predict(payload)

The model serving workers are dying either because they cannot load your model or because they cannot deserialize the payload you are sending them.
Note that you have to provide a model_fn implementation. Please read the docs here or the blog here to learn how to adapt inference scripts for SageMaker deployment. If you do not want to override the input_fn, predict_fn, and/or output_fn handlers, you can find their default implementations, for example, here.

Velero - not able to restore PVC

When we try to restore the EBS volumes from the snapshot, the PVC comes back with status Lost. We are using AWS KMS CMK keys with a policy that has kms* permissions. The backup operation went fine. The restore operation is able to restore all k8s resources except the PVC.
k get pvc -n nginx-example
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-logs   Lost     pvc-bda55207-a1e5-11ea-b7e6-02b82f6b7f4e   0                         gp2-encrypt    4m22s
k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                      STORAGECLASS   REASON   AGE
pvc-bda55207-a1e5-11ea-b7e6-02b82f6b7f4e   1Gi        RWO            Retain           Released   nginx-example/nginx-logs   gp2-encrypt             33m
We noticed that the UIDs of the PV and the PVC do not match after the PVC is restored.
The service account used by the velero pod has the policy below:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteSnapshot",
                "kms:Decrypt",
                "ec2:CreateTags",
                "kms:GenerateDataKeyWithoutPlaintext",
                "s3:ListBucket",
                "kms:GenerateDataKeyPairWithoutPlaintext",
                "ec2:DescribeSnapshots",
                "kms:GenerateDataKeyPair",
                "kms:ReEncryptFrom",
                "ec2:CreateVolume",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload",
                "ec2:DescribeVolumes",
                "ec2:CreateSnapshot",
                "kms:GenerateDataKey",
                "kms:ReEncryptTo",
                "s3:DeleteObject"
            ],
            "Resource": "*"
        }
    ]
}
We are using the YAML below to define the StorageClass and the PVC:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-encrypt
parameters:
  type: gp2
  encrypted: "true"
  fsType: ext4
  kmsKeyId: arn:aws:kms:us-east-XXXXXX
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-logs
  namespace: nginx-example
  labels:
    app: nginx
spec:
  storageClassName: gp2-encrypt
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: [50Mi]
Below are the logs from the velero pod:
> time="2020-05-29T19:59:04Z" level=info msg="Starting restore of backup
> cluster-addons/nginx-backup-5" logSource="pkg/restore/restore.go:394"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T19:59:04Z" level=info msg="Restoring cluster level
> resource 'persistentvolumes'" logSource="pkg/restore/restore.go:779"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T19:59:04Z" level=info msg="Getting client for /v1,
> Kind=PersistentVolume" logSource="pkg/restore/restore.go:821"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Restoring resource
> 'persistentvolumeclaims' into namespace 'nginx-example'"
> logSource="pkg/restore/restore.go:777"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Getting client for /v1,
> Kind=PersistentVolumeClaim" logSource="pkg/restore/restore.go:821"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Executing item action for
> persistentvolumeclaims" logSource="pkg/restore/restore.go:1030"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Executing
> AddPVFromPVCAction" cmd=/velero
> logSource="pkg/restore/add_pv_from_pvc_action.go:44" pluginName=velero
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Adding PV
> pvc-bda55207-a1e5-11ea-b7e6-02b82f6b7f4e as an additional item to
> restore" cmd=/velero
> logSource="pkg/restore/add_pv_from_pvc_action.go:66" pluginName=velero
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Skipping
> persistentvolumes/pvc-bda55207-a1e5-11ea-b7e6-02b82f6b7f4e because
> it's already been restored." logSource="pkg/restore/restore.go:910"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Executing item action for
> persistentvolumeclaims" logSource="pkg/restore/restore.go:1030"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Executing
> ChangeStorageClassAction" cmd=/velero
> logSource="pkg/restore/change_storageclass_action.go:63"
> pluginName=velero restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Attempting to restore
> PersistentVolumeClaim: nginx-logs"
> logSource="pkg/restore/restore.go:1136"
> restore=cluster-addons/nginx-backup-5-20200529155858
> time="2020-05-29T20:09:04Z" level=info msg="Done executing
> ChangeStorageClassAction" cmd=/velero
> logSource="pkg/restore/change_storageclass_action.go:74"
> pluginName=velero restore=cluster-addons/nginx-backup-5-20200529155858
CloudTrail does not have much information. Could you please let us know whether any additional settings are needed here?

Fiware: can not start cygnus as service

I installed Cygnus using RPMs on the FIWARE image CentOS-7-x64 and I can't start it as a service. Here are my logs:
[centos@cygnus-mongo conf]$ sudo service cygnus start
Starting cygnus (via systemctl): Job for cygnus.service failed. See 'systemctl status cygnus.service' and 'journalctl -xn' for details.
[FAILED]
[centos@cygnus-mongo conf]$ sudo journalctl -xn
-- Logs begin at mer. 2015-10-07 07:48:29 UTC, end at mer. 2015-10-07 10:02:35 UTC. --
oct. 07 10:02:20 cygnus-mongo.novalocal su[5700]: pam_unix(su:session): session closed for user cygnus
oct. 07 10:02:22 cygnus-mongo.novalocal cygnus[5695]: cat: /var/run/cygnus/cygnus_mongo.pid: No such file or directory
oct. 07 10:02:22 cygnus-mongo.novalocal cygnus[5695]: [FAILED]
oct. 07 10:02:22 cygnus-mongo.novalocal cygnus[5695]: rm: cannot remove ‘/var/run/cygnus/cygnus_mongo.pid’: No such file or directory
oct. 07 10:02:22 cygnus-mongo.novalocal systemd[1]: cygnus.service: control process exited, code=exited status=1
oct. 07 10:02:22 cygnus-mongo.novalocal systemd[1]: Failed to start SYSV: cygnus.
-- Subject: Unit cygnus.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit cygnus.service has failed.
--
-- The result is failed.
oct. 07 10:02:22 cygnus-mongo.novalocal systemd[1]: Unit cygnus.service entered failed state.
oct. 07 10:02:34 cygnus-mongo.novalocal dhclient[1064]: DHCPREQUEST on eth0 to 192.168.111.71 port 67 (xid=0x761299ef)
oct. 07 10:02:34 cygnus-mongo.novalocal dhclient[1064]: DHCPACK from 192.168.111.71 (xid=0x761299ef)
oct. 07 10:02:35 cygnus-mongo.novalocal sudo[5774]: centos : TTY=pts/0 ; PWD=/usr/cygnus/conf ; USER=root ; COMMAND=/bin/journalctl -xn
Actually, the directory /var/run/cygnus was not created. Is it supposed to be created automatically?
Here are my configuration files:
agent_mongo.conf
cygnusagent.sources = http-source
cygnusagent.sinks = mongo-sink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# TimestampInterceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# GroupingInterceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Grouping rules for the GroupingInterceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /usr/cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# FQDN/IP:port where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_hosts = 127.0.0.1:27017
# a valid user in the MongoDB server (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
cygnusagent.sinks.mongo-sink.db_prefix = kura_
# prefix for the MongoDB collections
cygnusagent.sinks.mongo-sink.collection_prefix = kura_
# true if collection names are based on a hash, false for human readable collections
cygnusagent.sinks.mongo-sink.should_hash = false
#=============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
cygnus_instance_mongo.conf:
# Who to run cygnus as. Note that you may need to use root if you want
# to run cygnus in a privileged port (<1024)
CYGNUS_USER=cygnus
# Where is the config folder
CONFIG_FOLDER=/usr/cygnus/conf
# Which is the config file
CONFIG_FILE=/usr/cygnus/conf/agent_mongo.conf
# Name of the agent. The name of the agent is not trivial, since it is the base for the Flume parameters
# naming conventions, e.g. it appears in .sources.http-source.channels=...
AGENT_NAME=cygnusagent
# Name of the logfile located at /var/log/cygnus. It is important to put the extension '.log' in order to the log rotation works properly
LOGFILE_NAME=cygnus.log
# Administration port. Must be unique per instance
ADMIN_PORT=8081
# Polling interval (seconds) for the configuration reloading
POLLING_INTERVAL=30
Edit: adding the logs after launching Cygnus as a standalone application:
[centos@cygnus-mongo iot]$ ./cygnus.sh
+ exec /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/usr/cygnus/conf:/usr/cygnus/lib/*:/usr/cygnus/plugins.d/cygnus/lib/*:/usr/cygnus/plugins.d/cygnus/libext/*' -Djava.library.path= com.telefonica.iot.cygnus.nodes.CygnusApplication -f /usr/cygnus/conf/agent_mongo.conf -n cygnusagent
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/cygnus/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/cygnus/plugins.d/cygnus/lib/cygnus-0.8.2-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2015-10-08 15:50:32,629 (main) [INFO - com.telefonica.iot.cygnus.nodes.CygnusApplication.main(CygnusApplication.java:235)] Starting a Jetty server listening on port 8081 (Management Interface)
2015-10-08 15:50:32,655 (main) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-10-08 15:50:32,656 (main) [INFO - com.telefonica.iot.cygnus.nodes.CygnusApplication.main(CygnusApplication.java:238)] Starting Cygnus application
2015-10-08 15:50:32,656 (Thread-1) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] jetty-6.1.26
2015-10-08 15:50:32,684 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2015-10-08 15:50:32,694 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:/usr/cygnus/conf/agent_mongo.conf
2015-10-08 15:50:32,714 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:101)] Configuration property ignored: cygnusagent.sinks.mongo-sink.mongo_username =
2015-10-08 15:50:32,714 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:mongo-sink
2015-10-08 15:50:32,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:mongo-sink
2015-10-08 15:50:32,715 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:101)] Configuration property ignored: cygnusagent.sinks.mongo-sink.mongo_password =
2015-10-08 15:50:32,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: mongo-sink Agent: cygnusagent
2015-10-08 15:50:32,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:mongo-sink
2015-10-08 15:50:32,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:mongo-sink
2015-10-08 15:50:32,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:mongo-sink
2015-10-08 15:50:32,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:mongo-sink
2015-10-08 15:50:32,731 (Thread-1) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Started SocketConnector@0.0.0.0:8081
2015-10-08 15:50:32,744 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [cygnusagent]
2015-10-08 15:50:32,745 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels
2015-10-08 15:50:32,758 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel mongo-channel type memory
2015-10-08 15:50:32,765 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel mongo-channel
2015-10-08 15:50:32,766 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source http-source, type org.apache.flume.source.http.HTTPSource
2015-10-08 15:50:32,782 (conf-file-poller-0) [INFO - com.telefonica.iot.cygnus.handlers.OrionRestHandler.<init>(OrionRestHandler.java:75)] Cygnus version (0.8.2.UNKNOWN)
2015-10-08 15:50:32,808 (conf-file-poller-0) [INFO - com.telefonica.iot.cygnus.handlers.OrionRestHandler.configure(OrionRestHandler.java:141)] Startup completed
2015-10-08 15:50:32,836 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: mongo-sink, type: com.telefonica.iot.cygnus.sinks.OrionMongoSink
2015-10-08 15:50:32,856 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel mongo-channel connected to [http-source, mongo-sink]
2015-10-08 15:50:32,872 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{http-source=EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:IDLE} }} sinkRunners:{mongo-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@7caba647 counterGroup:{ name:null counters:{} } }} channels:{mongo-channel=org.apache.flume.channel.MemoryChannel{name: mongo-channel}} }
2015-10-08 15:50:32,872 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel mongo-channel
2015-10-08 15:50:32,968 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitoried counter group for type: CHANNEL, name: mongo-channel, registered successfully.
2015-10-08 15:50:32,968 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: CHANNEL, name: mongo-channel started
2015-10-08 15:50:32,969 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink mongo-sink
2015-10-08 15:50:32,970 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source http-source
2015-10-08 15:50:32,972 (lifecycleSupervisor-1-4) [INFO - com.telefonica.iot.cygnus.interceptors.GroupingInterceptor.initialize(GroupingInterceptor.java:92)] Grouping rules read:
2015-10-08 15:50:32,974 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.stopAllComponents(Application.java:101)] Shutting down configuration: { sourceRunners:{http-source=EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:IDLE} }} sinkRunners:{mongo-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@7caba647 counterGroup:{ name:null counters:{} } }} channels:{mongo-channel=org.apache.flume.channel.MemoryChannel{name: mongo-channel}} }
2015-10-08 15:50:32,975 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.stopAllComponents(Application.java:105)] Stopping Source http-source
2015-10-08 15:50:32,978 (lifecycleSupervisor-1-1) [INFO - com.telefonica.iot.cygnus.sinks.OrionMongoBaseSink.start(OrionMongoBaseSink.java:139)] [mongo-sink] Startup completed
2015-10-08 15:50:32,984 (lifecycleSupervisor-1-4) [ERROR - com.telefonica.iot.cygnus.interceptors.GroupingInterceptor.parseGroupingRules(GroupingInterceptor.java:165)] Error while parsing the Json-based grouping rules file. Details=null
2015-10-08 15:50:32,984 (lifecycleSupervisor-1-4) [WARN - com.telefonica.iot.cygnus.interceptors.GroupingInterceptor.initialize(GroupingInterceptor.java:98)] Grouping rules syntax has errors
2015-10-08 15:50:33,030 (lifecycleSupervisor-1-4) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] jetty-6.1.26
2015-10-08 15:50:33,081 (lifecycleSupervisor-1-4) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Started SocketConnector@0.0.0.0:5050
2015-10-08 15:50:33,082 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitoried counter group for type: SOURCE, name: http-source, registered successfully.
2015-10-08 15:50:33,082 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: http-source started
2015-10-08 15:50:33,083 (conf-file-poller-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:171)] Stopping component: EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:START} }
2015-10-08 15:50:33,083 (conf-file-poller-0) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Stopped SocketConnector@0.0.0.0:5050
2015-10-08 15:50:33,185 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:139)] Component type: SOURCE, name: http-source stopped
2015-10-08 15:50:33,185 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:145)] Shutdown Metric for type: SOURCE, name: http-source. source.start.time == 1444319433082
2015-10-08 15:50:33,185 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:151)] Shutdown Metric for type: SOURCE, name: http-source. source.stop.time == 1444319433185
2015-10-08 15:50:33,186 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.append-batch.accepted == 0
2015-10-08 15:50:33,186 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.append-batch.received == 0
2015-10-08 15:50:33,186 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.append.accepted == 0
2015-10-08 15:50:33,186 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.append.received == 0
2015-10-08 15:50:33,187 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.events.accepted == 0
2015-10-08 15:50:33,187 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.events.received == 0
2015-10-08 15:50:33,187 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: SOURCE, name: http-source. src.open-connection.count == 0
2015-10-08 15:50:33,187 (conf-file-poller-0) [INFO - org.apache.flume.source.http.HTTPSource.stop(HTTPSource.java:172)] Http source http-source stopped. Metrics: SOURCE:http-source{src.events.accepted=0, src.events.received=0, src.append.accepted=0, src.append-batch.accepted=0, src.open-connection.count=0, src.append-batch.received=0, src.append.received=0}
2015-10-08 15:50:33,187 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.stopAllComponents(Application.java:115)] Stopping Sink mongo-sink
2015-10-08 15:50:33,188 (conf-file-poller-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:171)] Stopping component: SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@7caba647 counterGroup:{ name:null counters:{} } }
2015-10-08 15:50:33,189 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.stopAllComponents(Application.java:125)] Stopping Channel mongo-channel
2015-10-08 15:50:33,190 (conf-file-poller-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.unsupervise(LifecycleSupervisor.java:171)] Stopping component: org.apache.flume.channel.MemoryChannel{name: mongo-channel}
2015-10-08 15:50:33,190 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:139)] Component type: CHANNEL, name: mongo-channel stopped
2015-10-08 15:50:33,190 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:145)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.start.time == 1444319432968
2015-10-08 15:50:33,190 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:151)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.stop.time == 1444319433190
2015-10-08 15:50:33,190 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.capacity == 1000
2015-10-08 15:50:33,190 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.current.size == 0
2015-10-08 15:50:33,191 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.event.put.attempt == 0
2015-10-08 15:50:33,191 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.event.put.success == 0
2015-10-08 15:50:33,191 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.event.take.attempt == 1
2015-10-08 15:50:33,191 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.stop(MonitoredCounterGroup.java:167)] Shutdown Metric for type: CHANNEL, name: mongo-channel. channel.event.take.success == 0
2015-10-08 15:50:33,191 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{http-source=EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:START} }} sinkRunners:{mongo-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@7caba647 counterGroup:{ name:null counters:{runner.backoffs.consecutive=0} } }} channels:{mongo-channel=org.apache.flume.channel.MemoryChannel{name: mongo-channel}} }
2015-10-08 15:50:33,191 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel mongo-channel
2015-10-08 15:50:33,192 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: CHANNEL, name: mongo-channel started
2015-10-08 15:50:33,192 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink mongo-sink
2015-10-08 15:50:33,193 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source http-source
2015-10-08 15:50:33,193 (lifecycleSupervisor-1-1) [INFO - com.telefonica.iot.cygnus.sinks.OrionMongoBaseSink.start(OrionMongoBaseSink.java:139)] [mongo-sink] Startup completed
2015-10-08 15:50:33,194 (lifecycleSupervisor-1-6) [INFO - com.telefonica.iot.cygnus.interceptors.GroupingInterceptor.initialize(GroupingInterceptor.java:92)] Grouping rules read:
2015-10-08 15:50:33,194 (lifecycleSupervisor-1-6) [ERROR - com.telefonica.iot.cygnus.interceptors.GroupingInterceptor.parseGroupingRules(GroupingInterceptor.java:165)] Error while parsing the Json-based grouping rules file. Details=null
2015-10-08 15:50:33,194 (lifecycleSupervisor-1-6) [WARN - com.telefonica.iot.cygnus.interceptors.GroupingInterceptor.initialize(GroupingInterceptor.java:98)] Grouping rules syntax has errors
2015-10-08 15:50:33,195 (lifecycleSupervisor-1-6) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] jetty-6.1.26
2015-10-08 15:50:33,197 (lifecycleSupervisor-1-6) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Started SocketConnector@0.0.0.0:5050
2015-10-08 15:50:33,197 (lifecycleSupervisor-1-6) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: http-source started
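The standalone log above reports "Error while parsing the Json-based grouping rules file", and the "Grouping rules read:" line is followed by nothing, which may mean the file is empty or is not valid JSON. A minimal sketch for checking that from outside Cygnus is shown here; the helper name `check_json_file` is hypothetical, and the check only verifies JSON syntax, not whether the rules have the structure Cygnus expects.

```python
import json


def check_json_file(path: str):
    """Return (ok, message) describing whether the file parses as JSON."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except OSError as exc:
        # Missing file, permission problem, and so on.
        return False, f"cannot read file: {exc}"
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError; an empty file lands here too.
        return False, f"invalid JSON: {exc}"
    return True, "file parses as JSON"


if __name__ == "__main__":
    # Path taken from the grouping_rules_conf_file setting in agent_mongo.conf above.
    print(check_json_file("/usr/cygnus/conf/grouping_rules.conf"))
```

Running it as the same user that runs Cygnus would also reveal a read-permission problem, since that case is reported separately from a syntax error.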

Cygnus is supposed to create /var/run/cygnus/ when it starts. You can check the path specification here, and the directory creation and PID assignment here.
I'm wondering what the permissions of your /var/run are. Maybe they are too restrictive for the cygnus user.
Anyway, are you able to run Cygnus as a standalone application (not as a service) with no errors? I mean, executing this command:
$ /usr/cygnus/bin/cygnus-flume-ng agent --conf /usr/cygnus/conf -f /usr/cygnus/conf/agent_mongo.conf -n cygnusagent -Dflume.root.logger=INFO,console
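To look at the permissions question raised above, a small script can report whether a directory exists and whether the user running the script could create files in it. This is only a sketch: the helper name `describe_dir` is hypothetical, and the result reflects whichever user executes it, so it would need to be run as the cygnus user (for example through `sudo -u cygnus`) to be meaningful.

```python
import os
import stat


def describe_dir(path: str) -> dict:
    """Report whether `path` is a directory and whether the current user can write to it."""
    info = {"path": path, "exists": os.path.isdir(path), "writable": False, "mode": None}
    if info["exists"]:
        # Creating a file inside a directory needs both write and search (execute) permission.
        info["writable"] = os.access(path, os.W_OK | os.X_OK)
        # Human-readable mode string such as 'drwxr-xr-x'.
        info["mode"] = stat.filemode(os.stat(path).st_mode)
    return info


if __name__ == "__main__":
    # /var/run is where the PID directory is expected to be created.
    for candidate in ("/var/run", "/var/run/cygnus"):
        print(describe_dir(candidate))
```

If /var/run exists but is not writable for that user, the PID directory cannot be created by the service at startup, which would match the "No such file or directory" messages in the journal.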

Cannot create sink whose type is HDFS in flume-ng

I have a flume-ng setup that writes logs to HDFS.
I created one agent on a single node, but it is not running.
Here is my configuration:
# example2.conf: A single-node Flume configuration
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = avro
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 41414
# Use a channel which buffers events in memory
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 100
# Describe sink1
agent1.sinks.sink1.type = HDFS
agent1.sinks.sink1.hdfs.path = hdfs://dbkorando.kaist.ac.kr:9000/flume
# Bind the source and sink the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
I then run this command:
flume-ng agent -n agent1 -c conf -C /home/hyahn/hadoop-0.20.2/hadoop-0.20.2-core.jar -f conf/example2.conf -Dflume.root.logger=INFO,console
The result is:
Info: Including Hadoop libraries found via (/home/hyahn/hadoop-0.20.2/bin/hadoop) for HDFS access
+ exec /usr/java/jdk1.7.0_02/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/etc/flume-ng/conf:/usr/lib/flume-ng/lib/*:/home/hyahn/hadoop-0.20.2/hadoop-0.20.2-core.jar' -Djava.library.path=:/home/hyahn/hadoop-0.20.2/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -n agent1 -f conf/example2.conf
2012-11-27 15:33:17,250 (main) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 1
2012-11-27 15:33:17,253 (main) [INFO - org.apache.flume.node.FlumeNode.start(FlumeNode.java:54)] Flume node starting - agent1
2012-11-27 15:33:17,257 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider.start(AbstractFileConfigurationProvider.java:67)] Configuration provider starting
2012-11-27 15:33:17,257 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.nodemanager.DefaultLogicalNodeManager.start(DefaultLogicalNodeManager.java:203)] Node manager starting
2012-11-27 15:33:17,258 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.lifecycle.LifecycleSupervisor.start(LifecycleSupervisor.java:67)] Starting lifecycle supervisor 9
2012-11-27 15:33:17,258 (conf-file-poller-0) [INFO - org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:195)] Reloading configuration file:conf/example2.conf
2012-11-27 15:33:17,266 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:988)] Processing:sink1
2012-11-27 15:33:17,266 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:988)] Processing:sink1
2012-11-27 15:33:17,267 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:988)] Processing:sink1
2012-11-27 15:33:17,268 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:902)] Added sinks: sink1 Agent: agent1
2012-11-27 15:33:17,290 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:122)] Post-validation flume configuration contains configuration for agents: [agent1]
2012-11-27 15:33:17,290 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadChannels(PropertiesFileConfigurationProvider.java:249)] Creating channels
2012-11-27 15:33:17,354 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.(MonitoredCounterGroup.java:68)] Monitoried counter group for type: CHANNEL, name: channel1, registered successfully.
2012-11-27 15:33:17,355 (conf-file-poller-0) [INFO - org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadChannels(PropertiesFileConfigurationProvider.java:273)] created channel channel1
2012-11-27 15:33:17,368 (conf-file-poller-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.(MonitoredCounterGroup.java:68)] Monitoried counter group for type: SOURCE, name: source1, registered successfully.
2012-11-27 15:33:17,378 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:70)] Creating instance of sink: sink1, type: HDFS
As shown above, flume-ng stops at the step where it creates the sink.
What is the problem?

You need to open another terminal window and send events with the avro client to port 41414, like this:
bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /home/hadoop1/aaa.txt -Dflume.root.logger=DEBUG,console
Here I have a file named aaa.txt in the /home/hadoop1/ directory.
Flume will read this file and send its contents to HDFS.
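Before sending events, it may also help to confirm that the agent configuration wires the source and the sink to a declared channel. The sketch below is a hypothetical helper, not part of Flume: it parses the properties text from the question and reports sources or sinks that reference a channel missing from `agent1.channels`. It says nothing about whether the HDFS sink itself can start.

```python
EXAMPLE_CONF = """
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.type = avro
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 41414
agent1.channels.channel1.type = memory
agent1.sinks.sink1.type = HDFS
agent1.sinks.sink1.hdfs.path = hdfs://dbkorando.kaist.ac.kr:9000/flume
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
"""


def parse_properties(text: str) -> dict:
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props: dict, agent: str) -> list:
    """Return a list of problems found in the source/sink to channel wiring."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        for ch in used:
            if ch not in channels:
                problems.append(f"source {source} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems


print(check_wiring(parse_properties(EXAMPLE_CONF), "agent1") or "wiring looks consistent")
```

For the configuration in the question this check finds nothing wrong, so the wiring itself does not appear to be the issue.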
