Condor master node and workers only see the master node

I am trying to set up an HTCondor batch system, but condor_status only shows the master, on both the master and the worker nodes. They both show this:
Name OpSys Arch State Activity LoadAv Mem
[master ip] LINUX X86_64 Unclaimed Idle 0.000 973
Total Owner Claimed Unclaimed Matched Preempting Backfill Drain
X86_64/LINUX 1 0 0 1 0 0 0 0
Total 1 0 0 1 0 0 0 0
Running condor_restart on the master node works fine, but on the worker nodes it yields this error:
ERROR
SECMAN:2010:Received "DENIED" from server for user unauthenticated#unmapped using no authentication method, which may imply host-based security. Our address was '[ip address of master]', and server's address was '[ip address of worker]'. Check your ALLOW settings and IP protocols.
Here are the config files:
of the master node:
CONDOR_HOST = [private ip of master]
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
# to avoid user authentication
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
HOSTALLOW_ADMINISTRATOR = *
of the worker node:
CONDOR_HOST = [private ip of master]
DAEMON_LIST = MASTER, STARTD
# to avoid user authentication
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
HOSTALLOW_ADMINISTRATOR = *
In the same security group I am allowing:
All TCP TCP 0 - 65535
All ICMP-IPv4 All
SSH on port 22
This is what it looks like (security group ending in '6').

Apparently the issue was caused by running condor_reconfig -full. I reinstalled without doing that, used systemctl restart condor instead, and it worked. If someone can shed some light on why that was the case, please do :)
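A plausible explanation (an assumption on my part, not confirmed in the thread): condor_reconfig only asks the already-running daemons to re-read their configuration, while changes such as a new DAEMON_LIST on a freshly configured worker only take effect after a full daemon restart, e.g.:
systemctl restart condor    # restart the condor_master and every daemon it manages
condor_restart              # same effect issued through HTCondor (needs ADMINISTRATOR permission)
Also note that recent HTCondor releases spell the host-based security knobs ALLOW_READ, ALLOW_WRITE and ALLOW_ADMINISTRATOR; the HOSTALLOW_* names used above are the older aliases.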


SCSIControllerDriverKit: Process gets stuck on UserCreateTargetForID

Context:
We are working on migration of the driver, which is currently represented as a kernel extension, to the DriverKit framework.
The driver works with Thunderbolt RAID storage devices.
When connected to the host through the Thunderbolt interface, the device presents itself to the OS as a PCI device. The main function of our driver (.kext) is to create a "virtual" SCSI device in the OS for each virtual RAID array, so that the OS can work with these SCSI drives as ordinary disk storage.
We are using SCSIControllerDriverKit (https://developer.apple.com/documentation/scsicontrollerdriverkit) to migrate this functionality to the dext version of the driver.
Current issue:
When a device is connected, the dext driver cannot create a SCSI drive in the OS.
Technically, our dext tries to create a SCSI drive using the UserCreateTargetForID() method.
At this step the OS sends the first SCSI command, "Test Unit Ready", to the device to check whether it is a SCSI device.
We process this command in an additional thread, separate from the main process of the dext (as recommended in the DriverKit documentation).
We can see in the logs that the device receives this command and responds, but when our dext sends this response back to the OS, the process gets stuck waiting. How can we understand why this happens and fix it?
More details:
We are migrating the functionality of an already existing “.kext” driver. We checked the kext driver's logs for this step:
15:06:17.902539+0700 Target device try to create for idx:0
15:06:17.902704+0700 Send command 0 for target 0 len 0
15:06:18.161777+0700 Complete command: 0 for target: 0 Len: 0 status: 0 flags: 0
15:06:18.161884+0700 Send command 18 for target 0 len 6
15:06:18.161956+0700 Complete command: 18 for target: 0 Len: 6 status: 0 flags: 0
15:06:18.162010+0700 Send command 18 for target 0 len 44
15:06:18.172972+0700 Complete command: 18 for target: 0 Len: 44 status: 0 flags: 0
15:06:18.275501+0700 Send command 18 for target 0 len 36
15:06:18.275584+0700 Complete command: 18 for target: 0 Len: 36 status: 0 flags: 0
15:06:18.276257+0700 Target device created for idx:0
We can see the success message “Target device created for idx:0”.
In the dext logs of the same step, we do not see “Send command 18 for target 0 len 6” as we do in the kext logs, and there is no log of the successful result “Target device created for idx:0”.
I'll add a thread name to each line of the dext log (CustomThread, DefaultQueue, SendCommandCustomThread, InterruptQueue):
15:54:10.903466+0700 Try to create target for 0 UUID 432421434863538456 - CustomThread
15:54:10.903633+0700 UserDoesHBAPerformAutoSense - DefaultQueue
15:54:10.903763+0700 UserInitializeTargetForID - DefaultQueue
15:54:10.903876+0700 UserDoesHBASupportMultiPathing DefaultQueue
15:54:10.904200+0700 UserProcessParallelTask start - DefaultQueue
15:54:10.904298+0700 Sent command : 0 len 0 for target 0 - SendCommandCustomThread
15:54:11.163003+0700 Disable interrupts - InterruptQueue
15:54:11.163077+0700 Complete cmd : 0 for target: 0 len: 0 status: 0 flags: 0 - InterruptQueue
15:54:11.163085+0700 Enable interrupts - InterruptQueue
Code for completing the task:
SCSIUserParallelResponse osRsp = {0};
osRsp.fControllerTaskIdentifier = osTask->taskId;
osRsp.fTargetID = osTask->targetId;
osRsp.fServiceResponse = kSCSIServiceResponse_TASK_COMPLETE;
osRsp.fCompletionStatus = (SCSITaskStatus) response->status;
// Transfer length computation.
osRsp.fBytesTransferred = transferLength; // === 0 for this case.
ParallelTaskCompletion(osTask->action, osRsp);
osTask->action->release();
I'd appreciate any help.
This is effectively a deadlock, which you seem to have already worked out. It's not 100% clear from your question, but as I initially had the same problem, I assume you're calling UserCreateTargetForID from the driver's default queue. This won't work; you must call it from a non-default queue, because SCSIControllerDriverKit assumes that your default queue is idle and ready to handle requests from the kernel while you are calling this function. The header docs are very ambiguous on this, though they do mention it:
The dext class should call this method to create a new target for the
targetID. The framework ensures that the new target is created before the call returns.
Note that this call to the framework runs on the Auxiliary queue.
SCSIControllerDriverKit expects your driver to use 3 different dispatch queues (default, auxiliary, and interrupt), although I think it can be done with 2 as well. I recommend you (re-)watch the relevant part of the WWDC2020 session video about exactly how Apple wants you to use the 3 dispatch queues. The framework does not seem to be very flexible on this point.
Good luck with the rest of the driver port, I found this DriverKit framework even more fussy than the other ones.
Thanks to pmdj for pointing me in the right direction. In my case the answer was simply to add initialization of the version field of the response:
osRsp.version = kScsiUserParallelTaskResponseCurrentVersion1;
It looks obvious, but there is no information in the docs or in the WWDC2020 video about initializing the version field.
My project is a hardware RAID user-space driver, and it has now passed its I/O stress test. Your problem is likely in SCSI commands that transfer data: your driver has to hand the data back to the system to complete the SCSI INQUIRY command. I think you also used UserGetDataBuffer, which is some distance from the corresponding IOKit functionality.
kern_return_t IMPL ( XXXXUserSpaceDriver, UserProcessParallelTask )
{
    /*
    **********************************************************************
    ** UserGetDataBuffer
    **********************************************************************
    */
    if(parallelTask.fCommandDescriptorBlock[0] == SCSI_CMD_INQUIRY)
    {
        IOBufferMemoryDescriptor *data_buffer_memory_descriptor = nullptr;
        /*
        ******************************************************************************************************************************************
        ** virtual kern_return_t UserGetDataBuffer(SCSIDeviceIdentifier fTargetID, uint64_t fControllerTaskIdentifier, IOBufferMemoryDescriptor **buffer);
        ******************************************************************************************************************************************
        */
        if((UserGetDataBuffer(parallelTask.fTargetID, parallelTask.fControllerTaskIdentifier, &data_buffer_memory_descriptor) == kIOReturnSuccess) && (data_buffer_memory_descriptor != NULL))
        {
            IOAddressSegment data_buffer_virtual_address_segment = {0};
            if(data_buffer_memory_descriptor->GetAddressRange(&data_buffer_virtual_address_segment) == kIOReturnSuccess)
            {
                IOAddressSegment data_buffer_physical_address_segment = {0};
                IODMACommandSpecification dmaSpecification;
                IODMACommand *data_buffer_iodmacommand = {0};
                bzero(&dmaSpecification, sizeof(dmaSpecification));
                dmaSpecification.options = kIODMACommandSpecificationNoOptions;
                dmaSpecification.maxAddressBits = 64;
                if(IODMACommand::Create(ivars->pciDevice, kIODMACommandCreateNoOptions, &dmaSpecification, &data_buffer_iodmacommand) == kIOReturnSuccess)
                {
                    uint64_t dmaFlags = kIOMemoryDirectionInOut;
                    uint32_t dmaSegmentCount = 1;
                    pCCB->data_buffer_iodmacommand = data_buffer_iodmacommand;
                    if(data_buffer_iodmacommand->PrepareForDMA(kIODMACommandPrepareForDMANoOptions, data_buffer_memory_descriptor, 0/*offset*/, parallelTask.fRequestedTransferCount/*length*/, &dmaFlags, &dmaSegmentCount, &data_buffer_physical_address_segment) == kIOReturnSuccess)
                    {
                        parallelTask.fBufferIOVMAddr = (uint64_t)data_buffer_physical_address_segment.address; /* data_buffer_physical_address: overwrite original fBufferIOVMAddr */
                        pCCB->OSDataBuffer = reinterpret_cast <uint8_t *> (data_buffer_virtual_address_segment.address); /* data_buffer_virtual_address */
                    }
                }
            }
        }
    }
}
response.fBytesTransferred = dataxferlen;
response.version = kScsiUserParallelTaskResponseCurrentVersion1;
response.fTargetID = TARGETLUN2SCSITARGET(TargetID, 0);
response.fControllerTaskIdentifier = pCCB->fControllerTaskIdentifier;
response.fCompletionStatus = taskStatus;
response.fServiceResponse = serviceResponse;
response.fSenseLength = taskStatus;
IOUserSCSIParallelInterfaceController::ParallelTaskCompletion(pCCB->completion, response);
pCCB->completion->release();
pCCB->completion = NULL;
pCCB->ccb_flags.start = 0;/*reset startdone for outstanding ccb check*/
if(pCCB->data_buffer_iodmacommand != NULL)
{
    pCCB->data_buffer_iodmacommand->CompleteDMA(kIODMACommandCompleteDMANoOptions);
    OSSafeReleaseNULL(pCCB->data_buffer_iodmacommand); // pCCB->data_buffer_iodmacommand->free(); pCCB->data_buffer_iodmacommand = NULL;
    pCCB->OSDataBuffer = NULL;
}

nslookup showing a lot of information

I am taking a CS course and we're looking into the nslookup command. When my instructor runs it, he gets only the non-authoritative results. When I run it, I get a ton of debug output, with the information I'm looking for (based on the -type= option I pass) hidden amongst it. Here's my output. Is this normal?
I ran nslookup -type=NS starwars.com
main parsing starwars.com
addlookup()
make_empty_lookup()
make_empty_lookup() = 0x7f9118d9e000->references = 1
looking up starwars.com
lock_lookup dighost.c:4184
success
start_lookup()
setup_lookup(0x7f9118d9e000)
resetting lookup counter.
cloning server list
clone_server_list()
make_server(75.75.75.75)
make_server(75.75.76.76)
idn_textname: starwars.com
using root origin
recursive query
add_question()
starting to render the message
done rendering
create query 0x7f9117a2d000 linked to lookup 0x7f9118d9e000
dighost.c:2083:lookup_attach(0x7f9118d9e000) = 2
dighost.c:2587:new_query(0x7f9117a2d000) = 1
create query 0x7f9117a2d1c0 linked to lookup 0x7f9118d9e000
dighost.c:2083:lookup_attach(0x7f9118d9e000) = 3
dighost.c:2587:new_query(0x7f9117a2d1c0) = 1
do_lookup()
start_udp(0x7f9117a2d000)
dighost.c:2936:query_attach(0x7f9117a2d000) = 2
working on lookup 0x7f9118d9e000, query 0x7f9117a2d000
dighost.c:2981:query_attach(0x7f9117a2d000) = 3
unlock_lookup dighost.c:4186
dighost.c:2898:query_attach(0x7f9117a2d000) = 4
recving with lookup=0x7f9118d9e000, query=0x7f9117a2d000, handle=(nil)
recvcount=1
have local timeout of 5000
dighost.c:2847:query_attach(0x7f9117a2d000) = 5
sending a request
sendcount=1
dighost.c:1676:query_detach(0x7f9117a2d000) = 4
dighost.c:2918:query_detach(0x7f9117a2d000) = 3
send_done(0x7f9117a8d000, success, 0x7f9117a2d000)
sendcount=0
lock_lookup dighost.c:2615
success
dighost.c:2629:lookup_attach(0x7f9118d9e000) = 4
dighost.c:2648:query_detach(0x7f9117a2d000) = 2
dighost.c:2649:lookup_detach(0x7f9118d9e000) = 3
check_if_done()
list empty
unlock_lookup dighost.c:2652
recv_done(0x7f9117a8d000, success, 0x7f91187fa010, 0x7f9117a2d000)
lock_lookup dighost.c:3577
success
recvcount=0
dighost.c:3589:lookup_attach(0x7f9118d9e000) = 4
before parse starts
after parse
printmessage()
Server: 75.75.75.75
Address: 75.75.75.75#53
Non-authoritative answer:
printsection()
starwars.com nameserver = a28-65.akam.net.
starwars.com nameserver = a9-66.akam.net.
starwars.com nameserver = a13-67.akam.net.
starwars.com nameserver = a12-66.akam.net.
starwars.com nameserver = a18-64.akam.net.
starwars.com nameserver = a1-127.akam.net.
Authoritative answers can be found from:
printsection()
printsection()
a9-66.akam.net internet address = 184.85.248.66
a9-66.akam.net has AAAA address 2a02:26f0:117::42
a13-67.akam.net internet address = 2.22.230.67
a13-67.akam.net has AAAA address 2600:1480:800::43
a12-66.akam.net internet address = 184.26.160.66
a18-64.akam.net internet address = 95.101.36.64
a1-127.akam.net internet address = 193.108.91.127
a1-127.akam.net has AAAA address 2600:1401:2::7f
a28-65.akam.net internet address = 95.100.173.65
still pending.
dighost.c:4079:query_detach(0x7f9117a2d000) = 1
dighost.c:4081:_cancel_lookup()
dighost.c:2669:query_detach(0x7f9117a2d000) = 0
dighost.c:2669:destroy_query(0x7f9117a2d000) = 0
dighost.c:1634:lookup_detach(0x7f9118d9e000) = 3
dighost.c:2669:query_detach(0x7f9117a2d1c0) = 0
dighost.c:2669:destroy_query(0x7f9117a2d1c0) = 0
dighost.c:1634:lookup_detach(0x7f9118d9e000) = 2
check_if_done()
list empty
dighost.c:4087:lookup_detach(0x7f9118d9e000) = 1
clear_current_lookup()
dighost.c:1759:lookup_detach(0x7f9118d9e000) = 0
destroy_lookup
freeing server 0x7f9117a12000 belonging to 0x7f9118d9e000
freeing server 0x7f9117a12a00 belonging to 0x7f9118d9e000
start_lookup()
check_if_done()
list empty
shutting down
dighost_shutdown()
unlock_lookup dighost.c:4091
done, and starting to shut down
cancel_all()
lock_lookup dighost.c:4200
success
unlock_lookup dighost.c:4231
destroy_libs()
freeing task
lock_lookup dighost.c:4251
success
flush_server_list()
destroy DST lib
unlock_lookup dighost.c:4279
Removing log context
Destroy memory
I'm just checking whether this is the normal output, because on my instructor's screen he only gets the authoritative and non-authoritative sections.
Looks like it relates to this: https://bugs.kali.org/view.php?id=7522
Try adding -nod2 when you run the command.
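For example, combining it with the record type from the question (my example; -nod2 should switch off the extra debug tracing shown above):
nslookup -nod2 -type=NS starwars.com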

Flask-sqlalchemy / uwsgi: DB connection problem when more than one process is used

I have a Flask app running on Heroku with uwsgi server in which each user connects to his own database. I have implemented the solution reported here for a very similar situation. In particular, I have implemented the connection registry as follows:
from flask import current_app
from sqlalchemy import create_engine
from sqlalchemy.exc import ArgumentError
from sqlalchemy.orm import scoped_session, sessionmaker

class DBSessionRegistry():
    _registry = {}

    def get(self, URI, **kwargs):
        if URI not in self._registry:
            current_app.logger.info(f'INFO - CREATING A NEW CONNECTION')
            try:
                engine = create_engine(URI,
                                       echo=False,
                                       pool_size=5,
                                       max_overflow=5)
                session_factory = sessionmaker(bind=engine)
                Session = scoped_session(session_factory)
                a_session = Session()
                self._registry[URI] = a_session
            except ArgumentError:
                raise Exception('Error')
        current_app.logger.info(f'SESSION ID: {id(self._registry[URI])}')
        current_app.logger.info(f'REGISTRY ID: {id(self._registry)}')
        current_app.logger.info(f'REGISTRY SIZE: {len(self._registry.keys())}')
        current_app.logger.info(f'APP ID: {id(current_app)}')
        return self._registry[URI]
In my create_app() I assign a registry to the app:
app.DBregistry = DBSessionRegistry()
and whenever I need to talk to the DB I call:
current_app.DBregistry.get(URI)
where the URI is dependent on the user. This works nicely if I use uwsgi with one single process. With more processes,
[uwsgi]
processes = 4
threads = 1
sometimes it gets stuck on some requests, returning a 503 error code. I have found that the problem appears when the requests are handled by different uwsgi processes. This is an excerpt of the log, which I have annotated to illustrate the issue:
# ... EVERYTHING OK UP TO HERE.
# ALL PREVIOUS REQUESTS HANDLED BY PROCESS pid = 12
INFO in utils: SESSION ID: 139860361716304
INFO in utils: REGISTRY ID: 139860484608480
INFO in utils: REGISTRY SIZE: 1
INFO in utils: APP ID: 139860526857584
# NOTE THE pid IN THE NEXT LINE...
[pid: 12|app: 0|req: 1/1] POST /manager/_save_task =>
generated 154 bytes in 3457 msecs (HTTP/1.1 200) 4 headers in 601
bytes (1 switches on core 0)
# PREVIOUS REQUEST WAS MANAGED BY PROCESS pid = 12
# THE NEXT REQUEST IS FROM THE SAME USER AND TO THE SAME URL.
# SO THERE IS NO NEED FOR CREATING A NEW CONNECTION, BUT INSTEAD...
INFO - CREATING A NEW CONNECTION
# TO THIS POINT, I DON'T UNDERSTAND WHY IT CREATED A NEW CONNECTION.
# THE SESSION ID CHANGES, AS IT IS A NEW SESSION
INFO in utils: SESSION ID: 139860363793168 # <<--- CHANGED
INFO in utils: REGISTRY ID: 139860484608480
INFO in utils: REGISTRY SIZE: 1
# THE APP AND THE REGISTRY ARE UNIQUE
INFO in utils: APP ID: 139860526857584
# uwsgi GIVES UP...
*** HARAKIRI ON WORKER 4 (pid: 11, try: 1) ***
# THE FAILED REQUEST WAS MANAGED BY PROCESS pid = 11
# I ASSUME THIS IS WHY IT CREATED A NEW CONNECTION
HARAKIRI: -- syscall> 7 0x7fff4290c6d8 0x1 0xffffffff 0x4000 0x0 0x0
0x7fff4290c6b8 0x7f33d6e3cbc4
HARAKIRI: -- wchan> poll_schedule_timeout
HARAKIRI !!! worker 4 status !!!
HARAKIRI [core 0] - POST /manager/_save_task since 1587660997
HARAKIRI !!! end of worker 4 status !!!
heroku[router]: at=error code=H13 desc="Connection closed without
response" method=POST path="/manager/_save_task"
DAMN ! worker 4 (pid: 11) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 4 (new pid: 14)
# FROM HERE ON, NOTHINGS WORKS ANYMORE
This behavior is consistent over several attempts: when the pid changes, the request fails. Even with pool_size=1 in the create_engine call the issue persists. No issue arises if uwsgi is used with a single process.
I am pretty sure it is my fault, there is something I don't know or I don't understand about how uwsgi and/or sqlalchemy work. Could you please help me?
Thanks
What is happening is that you are trying to share memory between processes.
There are some explanations in these posts:
Is it possible to share memory between uwsgi processes running a Flask app?
https://stackoverflow.com/a/45383617/11542053
You can use an extra layer to store your sessions outside of the app.
For that, you can use uWSGI's SharedArea (https://uwsgi-docs.readthedocs.io/en/latest/SharedArea.html), which is very low level, or you can use other approaches such as uWSGI's caching framework (https://uwsgi-docs.readthedocs.io/en/latest/Caching.html).
Hope it helps.
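A minimal diagnostic sketch of what the answer describes (my own addition; it assumes the current_app.DBregistry attribute from the question): logging the worker's PID next to each lookup shows that every uWSGI worker holds its own copy of the registry, so a request served by a different worker will not find the connection created by the first one.
import os
from flask import current_app

def get_session(URI):
    # Each uWSGI worker is a separate OS process, so current_app.DBregistry
    # (and the _registry dict inside it) exists once per worker, not once
    # per application.
    current_app.logger.info(f'WORKER PID: {os.getpid()}')
    return current_app.DBregistry.get(URI)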

PAM Authentication failure for root during pexpect python

The observation below is not always reproducible, but after accessing the SUT several times over ssh as the root user with the correct password, the python code runs into trouble with:
Apr 25 05:51:56 SUT sshd[31570]: pam_tally2(sshd:auth): user root (0) tally 83, deny 10
Apr 25 05:52:16 SUT sshd[31598]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.10.10.13 user=root
Apr 25 05:52:21 SUT sshd[31568]: error: PAM: Authentication failure for root from 10.10.10.13
Apr 25 05:52:21 SUT sshd[31568]: Connection closed by 10.10.10.13 [preauth]
This is the python code:
import pexpect

COMMAND_PROMPT = '.*:~ #'
SSH_NEWKEY = '(?i)are you sure you want to continue connecting'

def scp(source, dest, password):
    cmd = 'scp ' + source + ' ' + dest
    try:
        child = pexpect.spawn('/bin/bash', ['-c', cmd], timeout=None)
        res = child.expect([pexpect.TIMEOUT, SSH_NEWKEY, COMMAND_PROMPT, '(?i)Password'])
        if res == 0:
            print('TIMEOUT Occurred.')
        if res == 1:
            child.sendline('yes')
            child.expect('(?i)Password')
            child.sendline(password)
            child.expect([pexpect.EOF], timeout=60)
        if res == 2:
            pass
        if res == 3:
            child.sendline(password)
            child.expect([pexpect.EOF], timeout=60)
    except:
        print('File not copied!!!')
        print(str(child))  # dump the pexpect child state for debugging
When the ssh is unsuccessful, this is the pexpect printout:
version: 2.3 ($Revision: 399 $)
command: /usr/bin/ssh
args: ['/usr/bin/ssh', 'root@100.100.100.100']
searcher: searcher_re:
0: re.compile(".*:~ #")
buffer (last 100 chars): :
Account locked due to 757 failed logins
Password:
before (last 100 chars): :
Account locked due to 757 failed logins
Password:
after: <class 'pexpect.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 2284
child_fd: 5
closed: False
timeout: 30
delimiter: <class 'pexpect.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
Any clue what it might be? Is there maybe something missing or wrongly configured for PAM authentication on my SUT? The problem is that once the SUT starts producing these PAM failures, the python code will always have the problem, and only a reboot of the SUT seems to help :(
Manually accessing the SUT via ssh root@... always works, even when pexpect can't! The account does not seem to be locked according to:
SUT:~ # passwd -S root
root P 04/24/2017 -1 -1 -1 -1
I have looked into some other questions, but no real solution is mentioned there that works with my python code.
Thanks in advance.
My workaround, for testing purposes, is to modify the pam_tally configuration files. It seems that the SUT treats the repeated access as a threat and locks even the root account!
I removed the entry even_deny_root root_unlock_time=5 from the pam_tally configuration files:
/etc/pam.d/common-account:account required pam_tally2.so deny=10 onerr=fail unlock_time=600 even_deny_root root_unlock_time=5 file=/home/test/faillog
/etc/pam.d/common-auth:auth required pam_tally2.so deny=10 onerr=fail unlock_time=600 even_deny_root root_unlock_time=5 file=/home/test/faillog
These changes take effect immediately; no service restart is needed.
Note: after a reboot those entries will most likely be back!
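If the root account does get locked during a test run, the failure counter can usually be inspected and cleared with pam_tally2 itself (a suggestion on my part, assuming pam_tally2 is the module in use, as in the configuration above):
pam_tally2 --user root          # show the current failure count for root
pam_tally2 --user root --reset  # clear the counter without rebooting the SUT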

ERROR 503: Service unavailable when persisting to HDFS

I have an Orion instance with Cygnus at FIWARE Lab; subscription and notification run fine, but I cannot persist data to cosmos.lab.fi-ware.org.
Cygnus returns this error:
[ERROR - es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionSink.process(OrionSink.java:139)] Persistence error (The talky/talkykar/room6_room directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
This is my agent_a.conf file:
cygnusagent.sources = http-source
cygnusagent.sinks = hdfs-sink
cygnusagent.channels = hdfs-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = hdfs-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = es.tid.fiware.fiwareconnectors.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = talky
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = talkykar
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts de
# Timestamp interceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# Destination extractor interceptor, do not change
cygnusagent.sources.http-source.interceptors.de.type = es.tid.fiware.fiwareconnectors.cygnus.interceptors.DestinationExtractor$Builder
# Matching table for the destination extractor interceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.de.matching_table = /usr/cygnus/conf/matching_table.conf
# ============================================
# OrionHDFSSink configuration
# channel name from where to read notification events
cygnusagent.sinks.hdfs-sink.channel = hdfs-channel
# sink class, must not be changed
cygnusagent.sinks.hdfs-sink.type = es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink
# Comma-separated list of FQDN/IP address regarding the Cosmos Namenode endpoints
# If you are using Kerberos authentication, then the usage of FQDNs instead of IP addresses is mandatory
cygnusagent.sinks.hdfs-sink.cosmos_host = http://cosmos.lab.fi-ware.org
# port of the Cosmos service listening for persistence operations; 14000 for httpfs, 50070 for webhdfs and free choice for inifinty
cygnusagent.sinks.hdfs-sink.cosmos_port = 14000
# default username allowed to write in HDFS
cygnusagent.sinks.hdfs-sink.cosmos_default_username = myuser
# default password for the default username
cygnusagent.sinks.hdfs-sink.cosmos_default_password = mypass
# HDFS backend type (webhdfs, httpfs or infinity)
cygnusagent.sinks.hdfs-sink.hdfs_api = httpfs
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.hdfs-sink.attr_persistence = row
# Hive FQDN/IP address of the Hive server
cygnusagent.sinks.hdfs-sink.hive_host = http://cosmos.lab.fi-ware.org
# Hive port for Hive external table provisioning
cygnusagent.sinks.hdfs-sink.hive_port = 10000
# Kerberos-based authentication enabling
cygnusagent.sinks.hdfs-sink.krb5_auth = false
# Kerberos username
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_user = krb5_username
# Kerberos password
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_password = xxxxxxxxxxxxx
# Kerberos login file
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_login_conf_file = /usr/cygnus/conf/krb5_login.conf
# Kerberos configuration file
cygnusagent.sinks.hdfs-sink.krb5_auth.krb5_conf_file = /usr/cygnus/conf/krb5.conf
#=============================================
And this is the Cygnus log:
2015-05-04 09:05:10,434 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink.persist(OrionHDFSSink.java:315)] [hdfs-sink] Persisting data at OrionHDFSSink. HDFS file (talky/talkykar/room6_room/room6_room.txt), Data ({"recvTimeTs":"1430723069","recvTime":"2015-05-04T09:04:29.819","entityId":"Room6","entityType":"Room","attrName":"temperature","attrType":"float","attrValue":"26.5","attrMd":[]})
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - es.tid.fiware.fiwareconnectors.cygnus.backends.hdfs.HDFSBackendImpl.doHDFSRequest(HDFSBackendImpl.java:255)] HDFS request: PUT http://http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/mped.mlg/talky/talkykar/room6_room?op=mkdirs&user.name=mped.mlg HTTP/1.1
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:186)] Connection request: [route: {}->http://http][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 500]
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:220)] Connection leased: [id: 21][route: {}->http://http][total kept alive: 0; route allocated: 1 of 100; total allocated: 1 of 500]
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.DefaultClientConnection.close(DefaultClientConnection.java:169)] Connection org.apache.http.impl.conn.DefaultClientConnection#5700187d closed
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.DefaultClientConnection.shutdown(DefaultClientConnection.java:154)] Connection org.apache.http.impl.conn.DefaultClientConnection#5700187d shut down
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.releaseConnection(PoolingClientConnectionManager.java:272)] Connection [id: 21][route: {}->http://http] can be kept alive for 9223372036854775807 MILLISECONDS
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.DefaultClientConnection.close(DefaultClientConnection.java:169)] Connection org.apache.http.impl.conn.DefaultClientConnection#5700187d closed
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.http.impl.conn.PoolingClientConnectionManager.releaseConnection(PoolingClientConnectionManager.java:278)] Connection released: [id: 21][route: {}->http://http][total kept alive: 0; route allocated: 0 of 100; total allocated: 0 of 500]
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - es.tid.fiware.fiwareconnectors.cygnus.backends.hdfs.HDFSBackendImpl.doHDFSRequest(HDFSBackendImpl.java:191)] The used HDFS endpoint is not active, trying another one (host=http://cosmos.lab.fi-ware.org)
2015-05-04 09:05:10,436 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionSink.process(OrionSink.java:139)] Persistence error (The talky/talkykar/room6_room directory could not be created in HDFS. HttpFS response: 503 Service unavailable)
Thanks.
If you take a look at this log line:
2015-05-04 09:05:10,435 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - es.tid.fiware.fiwareconnectors.cygnus.backends.hdfs.HDFSBackendImpl.doHDFSRequest(HDFSBackendImpl.java:255)] HDFS request: PUT http://http://cosmos.lab.fi-ware.org:14000/webhdfs/v1/user/mped.mlg/talky/talkykar/room6_room?op=mkdirs&user.name=mped.mlg HTTP/1.1
you will see that you are trying to create an HDFS directory by using a http://http://cosmos.lab... URL (please notice the doubled http://http://).
This is because you have configured:
cygnusagent.sinks.hdfs-sink.cosmos_host = http://cosmos.lab.fi-ware.org
instead of:
cygnusagent.sinks.hdfs-sink.cosmos_host = cosmos.lab.fi-ware.org
This parameter asks for a host, not a URL (the same applies to hive_host, which is configured in the same way).
That being said, future releases will allow both forms.
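For reference, a corrected version of the two host parameters from the agent_a.conf above would look like this (my sketch of the fix described in the answer; everything else stays unchanged):
cygnusagent.sinks.hdfs-sink.cosmos_host = cosmos.lab.fi-ware.org
cygnusagent.sinks.hdfs-sink.hive_host = cosmos.lab.fi-ware.org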