I'm encoutering a situation that using reload-on-rss/reload-on-as = 256 to avoid memor leak, it works for almost all scenes.
however, by setting up these params, there's a extremely small probability that all processes are running out of memory which been set. so they'll all respawn at the same time, and I found out rest requests responded in 502.
So I wanna ask if there is a way to keep at least one worker running to process requests even its memory usage's exceeded (it should not reload untill another worker started)? I've tried searching for the settings to make it but cannot found anything. does anyone can help? thank you so much! (and I'm not sure whether I make myself clear. sorry about that)
here is my uwsgi.ini:
http = :28888
touch-reload = true
reload-on-as = 256
reload-on-rss = 256
procname-prefix-spaced=service_name
module = service_name.wsgi:application
chdir = ./
pidfile = uwsgi.pid
socket = uwsgi.sock
master = true
vacuum = true
thunder-lock = true
enable-threads = true
harakiri = 600
processes = 20
threads = 10
py-autoreload = 1
chmod-socket = 664
post-buffering = 10240
socket-timeout = 3600
http-timeout = 3600
uwsgi_read_timeout = 3600
listen = 10000
Context:
We are working on migration of the driver, which is currently represented as a kernel extension, to the DriverKit framework.
The driver works with Thunderbolt RAID storage devices.
When connected through the Thunderbolt interface to the host, the device itself presents in the OS as a PCI device. The main function of our driver (.kext) is to create a "virtual" SCSI device in the OS for each virtual RAID array. So that the OS can work with these SCSI drives as usual disk storage.
We use https://developer.apple.com/documentation/scsicontrollerdriverkit to migrate this functionality in the dext version of the driver.
Current issue:
When a device is connected - the dext driver cannot create a SCSI drive in the OS.
Technically our dext tries to create a SCSI drive using the UserCreateTargetForID() method.
On this step the OS sends the first SCSI command "Test Unit Ready" to the device to check if it is a SCSI device.
We process this command in an additional thread separated from the main process of the dext (as it is recommended in the DriverKit documentation).
We can see in the logs that the device receives this command and responses but when our dext sends this response to the OS the process is stuck in the waiting mode. How can we understand why it happens and fix it?
More details:
We are migrating functionality of an already existing “.kext” driver. We checked logs of the kext driver of this step:
15:06:17.902539+0700 Target device try to create for idx:0
15:06:17.902704+0700 Send command 0 for target 0 len 0
15:06:18.161777+0700 Complete command: 0 for target: 0 Len: 0 status: 0 flags: 0
15:06:18.161884+0700 Send command 18 for target 0 len 6
15:06:18.161956+0700 Complete command: 18 for target: 0 Len: 6 status: 0 flags: 0
15:06:18.162010+0700 Send command 18 for target 0 len 44
15:06:18.172972+0700 Complete command: 18 for target: 0 Len: 44 status: 0 flags: 0
15:06:18.275501+0700 Send command 18 for target 0 len 36
15:06:18.275584+0700 Complete command: 18 for target: 0 Len: 36 status: 0 flags: 0
15:06:18.276257+0700 Target device created for idx:0
We can see a successful message “Target device created for idx:0”
In the the dext logs of the same step:
We do not see the “Send command 18 for target 0 len 6” as we have in the kext logs
no log of the successful result “Target device created for idx:0”
I'll add a thread name to each line of the dext log (CustomThread,DefaultQueue,SendCommandCustomThread,InterruptQueue):
15:54:10.903466+0700 Try to create target for 0 UUID 432421434863538456 - CustomThread
15:54:10.903633+0700 UserDoesHBAPerformAutoSense - DefaultQueue
15:54:10.903763+0700 UserInitializeTargetForID - DefaultQueue
15:54:10.903876+0700 UserDoesHBASupportMultiPathing DefaultQueue
15:54:10.904200+0700 UserProcessParallelTask start - DefaultQueue
15:54:10.904298+0700 Sent command : 0 len 0 for target 0 - SendCommandCustomThread
15:54:11.163003+0700 Disable interrupts - InterruptQueue
15:54:11.163077+0700 Complete cmd : 0 for target: 0 len: 0 status: 0 flags: 0 - InterruptQueue
15:54:11.163085+0700 Enable interrupts - InterruptQueue
Code for complete task
SCSIUserParallelResponse osRsp = {0};
osRsp.fControllerTaskIdentifier = osTask->taskId;
osRsp.fTargetID = osTask->targetId;
osRsp.fServiceResponse = kSCSIServiceResponse_TASK_COMPLETE;
osRsp.fCompletionStatus = (SCSITaskStatus) response->status;
// Transfer length computation.
osRsp.fBytesTransferred = transferLength; // === 0 for this case.
ParallelTaskCompletion(osTask->action, osRsp);
osTask->action->release();
Will appreciate any help
This is effectively a deadlock, which you seem to have already worked out. It's not 100% clear from your your question, but as I initially had the same problem, I assume you're calling UserCreateTargetForID from the driver's default queue. This won't work, you must call it from a non-default queue because SCSIControllerDriverKit assumes that your default queue is idle and ready to handle requests from the kernel while you are calling this function. The header docs are very ambiguous on this, though they do mention it:
The dext class should call this method to create a new target for the
targetID. The framework ensures that the new target is created before the call returns.
Note that this call to the framework runs on the Auxiliary queue.
SCSIControllerDriverKit expects your driver to use 3 different dispatch queues (default, auxiliary, and interrupt), although I think it can be done with 2 as well. I recommend you (re-)watch the relevant part of the WWDC2020 session video about how Apple wants you to use the 3 dispatch queues, exactly. The framework does not seem to be very flexible on this point.
Good luck with the rest of the driver port, I found this DriverKit framework even more fussy than the other ones.
Thanks to pmdj for direction of think. For my case answer is just add initialization for version field for response.
osRsp.version = kScsiUserParallelTaskResponseCurrentVersion1;
It looks obvious. But there are no any information in docs or WWDC2020 video about initialization version field.
My project is hardware raid 'user space driver' . My driver has now completed the io stress test. Your problem should be in the SCSI command with data transfer. And you want to send data to the system by your software driver to complete the SCSI ' inquiry ' command. I think you also used 'UserGetDataBuffer'. It seems to be some distance from iokit's function.
kern_return_t IMPL ( XXXXUserSpaceDriver, UserProcessParallelTask )
{
/*
**********************************************************************
** UserGetDataBuffer
**********************************************************************
*/
if(parallelTask.fCommandDescriptorBlock[0] == SCSI_CMD_INQUIRY)
{
IOBufferMemoryDescriptor *data_buffer_memory_descriptor = nullptr;
/*
******************************************************************************************************************************************
** virtual kern_return_t UserGetDataBuffer(SCSIDeviceIdentifier fTargetID, uint64_t fControllerTaskIdentifier, IOBufferMemoryDescriptor **buffer);
******************************************************************************************************************************************
*/
if((UserGetDataBuffer(parallelTask.fTargetID, parallelTask.fControllerTaskIdentifier, &data_buffer_memory_descriptor) == kIOReturnSuccess) && (data_buffer_memory_descriptor != NULL))
{
IOAddressSegment data_buffer_virtual_address_segment = {0};
if(data_buffer_memory_descriptor->GetAddressRange(&data_buffer_virtual_address_segment) == kIOReturnSuccess)
{
IOAddressSegment data_buffer_physical_address_segment = {0};
IODMACommandSpecification dmaSpecification;
IODMACommand *data_buffer_iodmacommand = {0};
bzero(&dmaSpecification, sizeof(dmaSpecification));
dmaSpecification.options = kIODMACommandSpecificationNoOptions;
dmaSpecification.maxAddressBits = 64;
if(IODMACommand::Create(ivars->pciDevice, kIODMACommandCreateNoOptions, &dmaSpecification, &data_buffer_iodmacommand) == kIOReturnSuccess)
{
uint64_t dmaFlags = kIOMemoryDirectionInOut;
uint32_t dmaSegmentCount = 1;
pCCB->data_buffer_iodmacommand = data_buffer_iodmacommand;
if(data_buffer_iodmacommand->PrepareForDMA(kIODMACommandPrepareForDMANoOptions, data_buffer_memory_descriptor, 0/*offset*/, parallelTask.fRequestedTransferCount/*length*/, &dmaFlags, &dmaSegmentCount, &data_buffer_physical_address_segment) == kIOReturnSuccess)
{
parallelTask.fBufferIOVMAddr = (uint64_t)data_buffer_physical_address_segment.address; /* data_buffer_physical_address: overwrite original fBufferIOVMAddr */
pCCB->OSDataBuffer = reinterpret_cast <uint8_t *> (data_buffer_virtual_address_segment.address);/* data_buffer_virtual_address */
}
}
}
}
}
}
response.fBytesTransferred = dataxferlen;
response.version = kScsiUserParallelTaskResponseCurrentVersion1;
response.fTargetID = TARGETLUN2SCSITARGET(TargetID, 0);
response.fControllerTaskIdentifier = pCCB->fControllerTaskIdentifier;
response.fCompletionStatus = taskStatus;
response.fServiceResponse = serviceResponse;
response.fSenseLength = taskStatus;
IOUserSCSIParallelInterfaceController::ParallelTaskCompletion(pCCB->completion, response);
pCCB->completion->release();
pCCB->completion = NULL;
pCCB->ccb_flags.start = 0;/*reset startdone for outstanding ccb check*/
if(pCCB->data_buffer_iodmacommand != NULL)
{
pCCB->data_buffer_iodmacommand->CompleteDMA(kIODMACommandCompleteDMANoOptions);
OSSafeReleaseNULL(pCCB->data_buffer_iodmacommand); // pCCB->data_buffer_iodmacommand->free(); pCCB->data_buffer_iodmacommand = NULL;
pCCB->OSDataBuffer = NULL;
}
I have created simple DAG using Bash Operator and BigQuery operator, there are approx 40 BigQuery tasks, I want to execute all 40 BigQuery task to execute parallel, only 10-15 tasks are executing parallel, I have tried with different Configuration parameters.
I also also need to execute 15-20 same kind of DAG, I need to execute all DAGs to execute parallel, all tasks from all DAGs.
My DAG is like
start >> bigquery task_1 >> end
start >> bigquery task_2 >> end
... ... ... ... ... ... ...
... ... ... ... ... ... ...
start >> bigquery task_n.. >> end
start and end is bash operator
My composer configuration as below:
Image version: composer-1.13.3-airflow-1.10.12
Python version: 3
Worker nodes"
Node count: 3
Disk size (GB): 50
Machine type: n1-standard-2
I have tried with default Airflow configuration and well as below custom configuration and don't see much improvement.
custom configuration 1
parallelism = 300
dag_concurrency = 60
max_active_runs_per_dag = 15
enable_xcom_pickling = False
sql_alchemy_pool_recycle = 570
store_serialized_dags = False
min_serialized_dag_update_interval = 30
dag_concurrencymax_active_runs_per_dag = 60
custom configuration 2
parallelism = 230
dag_concurrency = 46
max_active_runs_per_dag = 15
enable_xcom_pickling = False
sql_alchemy_pool_recycle = 570
store_serialized_dags = False
min_serialized_dag_update_interval = 30
dag_concurrencymax_active_runs_per_dag = 46
Any suggestion will be highly appreciated.
I am trying to use Flume-ng to grab 128MB of log information and put it into a file in HDFS. But HDFS rolling options not working. Flume-ng send log file per seconds. How can I fix flume.conf file?
agent01.sources = avroGenSrc
agent01.channels = memoryChannel hdfsChannel
agent01.sinks = fileSink hadoopSink
# For each one of the sources, the type is defined
agent01.sources.avroGenSrc.type = avro
agent01.sources.avroGenSrc.bind = dev-hadoop03.ncl
agent01.sources.avroGenSrc.port = 3333
# The channel can be defined as follows.
agent01.sources.avroGenSrc.channels = memoryChannel hdfsChannel
# Each sink's type must be defined
agent01.sinks.fileSink.type = file_roll
agent01.sinks.fileSink.sink.directory = /home1/irteam/flume/data
agent01.sinks.fileSink.sink.rollInterval = 3600
agent01.sinks.fileSink.sink.batchSize = 100
#Specify the channel the sink should use
agent01.sinks.fileSink.channel = memoryChannel
agent01.sinks.hadoopSink.type = hdfs
agent01.sinks.hadoopSink.hdfs.useLocalTimeStamp = true
agent01.sinks.hadoopSink.hdfs.path = hdfs://dev-hadoop04.ncl:9000/user/hive/warehouse/raw_logs/year=%Y/month=%m/day=%d
agent01.sinks.hadoopSink.hdfs.filePrefix = AccessLog.%Y-%m-%d.%Hh
agent01.sinks.hadoopSink.hdfs.fileType = DataStream
agent01.sinks.hadoopSink.hdfs.writeFormat = Text
agent01.sinks.hadoopSink.hdfs.rollInterval = 0
agent01.sinks.hadoopSink.hdfs.rollSize = 134217728
agent01.sinks.hadoopSink.hdfs.rollCount = 0
#Specify the channel the sink should use
agent01.sinks.hadoopSink.channel = hdfsChannel
# Each channel's type is defined.
agent01.channels.memoryChannel.type = memory
agent01.channels.hdfsChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent01.channels.memoryChannel.capacity = 100000
agent01.channels.memoryChannel.transactionCapacity = 10000
agent01.channels.hdfsChannel.capacity = 100000
agent01.channels.hdfsChannel.transactionCapacity = 10000
I found this solution. dfs.replication mismatch cause this problem.
In my hadoop conf (hadoop-2.7.2/etc/hadoop/hdfs-site.xml)
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
I have 2 data nodes so I change it to
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
and I add config in flume.conf
agent01.sinks.hadoopSink.hdfs.minBlockReplicas = 2
thanks for
https://qnalist.com/questions/5015704/hit-max-consecutive-under-replication-rotations-error
and
Flume HDFS sink keeps rolling small files
I'm requesting a file with a size around 14MB from a slow server with urllib2.urlopen, and it spend more than 60 seconds to get the data, and I'm getting the error:
Deadline exceeded while waiting for HTTP response from URL:
http://bigfile.zip?type=CSV
Here my code:
class CronChargeBT(webapp2.RequestHandler):
def get(self):
taskqueue.add(queue_name = 'optimized-queue', url='/cronChargeBTB')
class CronChargeBTB(webapp2.RequestHandler):
def post(self):
url = "http://bigfile.zip?type=CSV"
url_request = urllib2.Request(url)
url_request.add_header('Accept-encoding', 'gzip')
urlfetch.set_default_fetch_deadline(300)
response = urllib2.urlopen(url_request, timeout=300)
buf = StringIO(response.read())
f = gzip.GzipFile(fileobj=buf)
...work with the data insiste the file...
I create a cron task who calls CronChargeBT. Here the cron.yaml:
- description: cargar BlueTomato
url: /cronChargeBT
target: charge
schedule: every wed,sun 01:00
and it create a new task and insert into a queue, here the queue configuration:
- name: optimized-queue
rate: 40/s
bucket_size: 60
max_concurrent_requests: 10
retry_parameters:
task_retry_limit: 1
min_backoff_seconds: 10
max_backoff_seconds: 200
Of coursethat the timeout=300 isn't working because the 60seconds limit in GAE but I think yhat I can avoid it using a task... anyone knows how I can get the data in the file avoiding this timeout.
Thanks a lot!!!
Cron jobs are limited to 10 minutes deadline, not 60 seconds. If your download fails, perhaps just retry? Does the download work if you download it from your computer? There's nothing you can do on GAE if the server you are downloading from is too slow or unstable.
Edit: According to https://cloud.google.com/appengine/docs/java/outbound-requests#request_timeouts, there is a maximum deadline of 60 seconds for cron job requests. Therefore, you can't get around it.