Running python Django script from console: Segmentation fault - django

Ububntu 11.10, Python 2.7.2+, Django 1.3.1
When I'm starting huge script "parser.py" with command # python parser.py (in the root directory of my django app), it returns "Segmentation fault" after several minutes of working.
parser.py:
from django.core.management import setup_environ
import settings
setup_environ(settings)
# here comes lots of code for parsing some pages
I'm using urllib and xmlrpclib in this script.
I'm usng vps from this template http://wiki.openvz.org/Download/template/precreated (ubuntu-11.10-x86_64.tar.gz (signature))
maybe problem is here?
root#spravker:/var/log# cat /proc/user_beancounters
Version: 2.5
uid resource held maxheld barrier limit failcnt
2311649: kmemsize 9042870 19404189 52497139 57746852 0
lockedpages 0 1034 2563 2563 0
privvmpages 125787 208103 262144 262144 0
shmpages 1938 3874 27750 27750 0
dummy 0 0 0 0 0
numproc 40 200 200 200 2173
strace -o logfile.txt python parser.py
munmap(0x2b6ed1d4b000, 4096) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=20, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4
connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "\343X\1\0\0\1\0\0\0\0\0\0\nblogsearch\6google\3c"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
poll([{fd=4, events=POLLIN}], 1, 5000) = -1 EINTR (Interrupted system call)
—- SIGSEGV (Segmentation fault) # 0 (0) —-
+++ killed by SIGSEGV +++

Related

Method PeerConnection::SetLocalDescription in NativeAPI crashes

I'm trying to set up a data channel between a server written in C++ and a Python client. The server crashes with SIGSEGV error when it tries to set a local session description created in method "CreateAnswer"
The server and client exchange SDP information via WebSocket and should open the data channel without video and audio streams. Both programs are working under docker-compose in different services. So no audio or video devices are provided. I use WebRTC Native API from m76 branch.
Crashing handler:
static void OnAnswerCreated(WebRTCManagerImpl* impl_, webrtc::SessionDescriptionInterface* desc) {
LOG4CPLUS_INFO_FMT(impl_->logger_, "Answer created session_id %s", desc->session_id().c_str());
std::string offer_string;
desc->ToString(&offer_string);
LOG4CPLUS_DEBUG_FMT(impl_->logger_, "Offer string: %s", offer_string.c_str());
impl_->peer_connection_->SetLocalDescription(&impl_->set_session_description_observer_, desc);
impl_->signaling_->SendSessionDescription(*desc);
};
I create my connection with this factory:
webrtc::PeerConnectionFactoryDependencies CreatePeerConnectionFactoryDependencies() {
webrtc::PeerConnectionFactoryDependencies dependencies;
dependencies.network_thread = nullptr;
dependencies.worker_thread = nullptr;
dependencies.signaling_thread = nullptr;
dependencies.call_factory = webrtc::CreateCallFactory();
dependencies.task_queue_factory = webrtc::CreateDefaultTaskQueueFactory();
dependencies.event_log_factory = absl::make_unique<webrtc::RtcEventLogFactory>(dependencies.task_queue_factory.get());
cricket::MediaEngineDependencies mediaDependencies;
mediaDependencies.task_queue_factory = dependencies.task_queue_factory.get();
mediaDependencies.adm = rtc::scoped_refptr<webrtc::FakeAudioDeviceModule>(new webrtc::FakeAudioDeviceModule);
mediaDependencies.audio_encoder_factory = webrtc::CreateBuiltinAudioEncoderFactory();
mediaDependencies.audio_decoder_factory = webrtc::CreateBuiltinAudioDecoderFactory();
mediaDependencies.audio_processing = webrtc::AudioProcessingBuilder().Create();
mediaDependencies.video_encoder_factory = webrtc::CreateBuiltinVideoEncoderFactory();
mediaDependencies.video_decoder_factory = webrtc::CreateBuiltinVideoDecoderFactory();
dependencies.media_engine = cricket::CreateMediaEngine(std::move(mediaDependencies));
return dependencies;
}
webrtc::PeerConnectionFactoryDependencies deps = CreatePeerConnectionFactoryDependencies();
deps.signaling_thread = signaling_thread_.get();
// deps.network_thread = network_thread.get();
// deps.worker_thread = worker_thread.get();
peer_connection_factory_ = webrtc::CreateModularPeerConnectionFactory(std::move(deps));
The call stack:
<unknown> 0x0000000001e798f7
webrtc::PeerConnection::ValidateSessionDescription(webrtc::SessionDescriptionInterface const*, cricket::ContentSource) 0x00000000005e74dc
webrtc::PeerConnection::SetLocalDescription(webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*) 0x00000000005bb677
void webrtc::ReturnType<void>::Invoke<webrtc::PeerConnectionInterface, void (webrtc::PeerConnectionInterface::*)(webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*), webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*>(webrtc::PeerConnectionInterface*, void (webrtc::PeerConnectionInterface::*)(webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*), webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*) 0x000000000059b814
webrtc::MethodCall2<webrtc::PeerConnectionInterface, void, webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*>::OnMessage(rtc::Message*) 0x0000000000598f5f
webrtc::internal::SynchronousMethodCall::Invoke(rtc::Location const&, rtc::Thread*) 0x00000000007198fc
webrtc::MethodCall2<webrtc::PeerConnectionInterface, void, webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*>::Marshal(rtc::Location const&, rtc::Thread*) 0x0000000000593706
webrtc::PeerConnectionProxyWithInternal<webrtc::PeerConnectionInterface>::SetLocalDescription(webrtc::SetSessionDescriptionObserver*, webrtc::SessionDescriptionInterface*) 0x000000000058c982
preprocessor::p2p::WebRTCManager::WebRTCManagerImpl::OnAnswerCreated webrtc_manager.cpp:226
std::__invoke_impl<void, void (*&)(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, webrtc::SessionDescriptionInterface*), preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*&, webrtc::SessionDescriptionInterface*> invoke.h:60
std::__invoke<void (*&)(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, webrtc::SessionDescriptionInterface*), preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*&, webrtc::SessionDescriptionInterface*> invoke.h:95
std::_Bind<void (*(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, std::_Placeholder<1>))(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, webrtc::SessionDescriptionInterface*)>::__call<void, webrtc::SessionDescriptionInterface*&&, 0ul, 1ul>(std::tuple<webrtc::SessionDescriptionInterface*&&>&&, std::_Index_tuple<0ul, 1ul>) functional:467
std::_Bind<void (*(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, std::_Placeholder<1>))(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, webrtc::SessionDescriptionInterface*)>::operator()<webrtc::SessionDescriptionInterface*, void>(webrtc::SessionDescriptionInterface*&&) functional:549
std::_Function_handler<void (webrtc::SessionDescriptionInterface*), std::_Bind<void (*(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, std::_Placeholder<1>))(preprocessor::p2p::WebRTCManager::WebRTCManagerImpl*, webrtc::SessionDescriptionInterface*)> >::_M_invoke(std::_Any_data const&, webrtc::SessionDescriptionInterface*&&) std_function.h:316
std::function<void (webrtc::SessionDescriptionInterface*)>::operator()(webrtc::SessionDescriptionInterface*) const std_function.h:706
preprocessor::p2p::CreateSessionDescriptionObserver::OnSuccess webrtc_manager.cpp:79
webrtc::WebRtcSessionDescriptionFactory::OnMessage(rtc::Message*) 0x0000000000b90785
rtc::MessageQueue::Dispatch(rtc::Message*) 0x00000000005712f8
rtc::Thread::ProcessMessages(int) 0x0000000000553398
rtc::Thread::Run() 0x0000000000552993
rtc::Thread::PreRun(void*) 0x0000000000552950
start_thread 0x00007ffff76536db
clone 0x00007ffff608a88f
WebRTC logs:
(audio_processing_impl.cc:435): Capture analyzer activated: 0
Capture post processor activated: 0
Render pre processor activated: 0
(webrtc_voice_engine.cc:196): WebRtcVoiceEngine::WebRtcVoiceEngine
(webrtc_video_engine.cc:479): WebRtcVideoEngine::WebRtcVideoEngine()
(webrtc_voice_engine.cc:219): WebRtcVoiceEngine::Init
(webrtc_voice_engine.cc:227): Supported send codecs in order of preference:
(webrtc_voice_engine.cc:230): opus/48000/2 { minptime=10 useinbandfec=1 } (111)
(webrtc_voice_engine.cc:230): ISAC/16000/1 (103)
(webrtc_voice_engine.cc:230): ISAC/32000/1 (104)
(webrtc_voice_engine.cc:230): G722/8000/1 (9)
(webrtc_voice_engine.cc:230): ILBC/8000/1 (102)
(webrtc_voice_engine.cc:230): PCMU/8000/1 (0)
(webrtc_voice_engine.cc:230): PCMA/8000/1 (8)
(webrtc_voice_engine.cc:230): CN/32000/1 (106)
(webrtc_voice_engine.cc:230): CN/16000/1 (105)
(webrtc_voice_engine.cc:230): CN/8000/1 (13)
(webrtc_voice_engine.cc:230): telephone-event/48000/1 (110)
(webrtc_voice_engine.cc:230): telephone-event/32000/1 (112)
(webrtc_voice_engine.cc:230): telephone-event/16000/1 (113)
(webrtc_voice_engine.cc:230): telephone-event/8000/1 (126)
(webrtc_voice_engine.cc:233): Supported recv codecs in order of preference:
(webrtc_voice_engine.cc:236): opus/48000/2 { minptime=10 useinbandfec=1 } (111)
(webrtc_voice_engine.cc:236): ISAC/16000/1 (103)
(webrtc_voice_engine.cc:236): ISAC/32000/1 (104)
(webrtc_voice_engine.cc:236): G722/8000/1 (9)
(webrtc_voice_engine.cc:236): ILBC/8000/1 (102)
(webrtc_voice_engine.cc:236): PCMU/8000/1 (0)
(webrtc_voice_engine.cc:236): PCMA/8000/1 (8)
(webrtc_voice_engine.cc:236): CN/32000/1 (106)
(webrtc_voice_engine.cc:236): CN/16000/1 (105)
(webrtc_voice_engine.cc:236): CN/8000/1 (13)
(webrtc_voice_engine.cc:236): telephone-event/48000/1 (110)
(webrtc_voice_engine.cc:236): telephone-event/32000/1 (112)
(webrtc_voice_engine.cc:236): telephone-event/16000/1 (113)
(webrtc_voice_engine.cc:236): telephone-event/8000/1 (126)
(apm_helpers.cc:32): Setting AGC mode to 0
(audio_processing_impl.cc:699): Highpass filter activated: 0
(audio_processing_impl.cc:717): Gain Controller 2 activated: 0
(audio_processing_impl.cc:719): Pre-amplifier activated: 0
(webrtc_voice_engine.cc:309): WebRtcVoiceEngine::ApplyOptions: AudioOptions {aec: 1, agc: 1, ns: 1, hf: 1, swap: 0, audio_jitter_buffer_max_packets: 200, audio_jitter_buffer_fast_accelerate: 0, audio_jitter_buffer_min_delay_ms: 0, audio_jitter_buffer_enable_rtx_handling: 0, typing: 1, experimental_agc: 0, extended_filter_aec: 0, delay_agnostic_aec: 0, experimental_ns: 0, residual_echo_detector: 1, }
(render_delay_buffer.cc:341): Applying total delay of 5 blocks.
(matched_filter.cc:450): Filter 0: start: 0 ms, end: 128 ms.
(matched_filter.cc:450): Filter 1: start: 96 ms, end: 224 ms.
(matched_filter.cc:450): Filter 2: start: 192 ms, end: 320 ms.
(matched_filter.cc:450): Filter 3: start: 288 ms, end: 416 ms.
(matched_filter.cc:450): Filter 4: start: 384 ms, end: 512 ms.
(audio_processing_impl.cc:699): Highpass filter activated: 0
(audio_processing_impl.cc:717): Gain Controller 2 activated: 0
(audio_processing_impl.cc:719): Pre-amplifier activated: 0
(apm_helpers.cc:48): Echo control set to 1 with mode 0
(audio_processing_impl.cc:699): Highpass filter activated: 0
(audio_processing_impl.cc:717): Gain Controller 2 activated: 0
(audio_processing_impl.cc:719): Pre-amplifier activated: 0
(audio_processing_impl.cc:699): Highpass filter activated: 0
(audio_processing_impl.cc:717): Gain Controller 2 activated: 0
(audio_processing_impl.cc:719): Pre-amplifier activated: 0
(apm_helpers.cc:62): NS set to 1
(webrtc_voice_engine.cc:447): Stereo swapping enabled? 0
(webrtc_voice_engine.cc:452): NetEq capacity is 200
(webrtc_voice_engine.cc:458): NetEq fast mode? 0
(webrtc_voice_engine.cc:464): NetEq minimum delay is 0
(webrtc_voice_engine.cc:470): NetEq handle reordered packets? 0
(webrtc_voice_engine.cc:481): Delay agnostic aec is enabled? 0
(webrtc_voice_engine.cc:491): Extended filter aec is enabled? 0
(webrtc_voice_engine.cc:501): Experimental ns is enabled? 0
(webrtc_voice_engine.cc:511): Setting AGC to 1
(webrtc_voice_engine.cc:533): Typing detection is enabled? 1
(audio_processing_impl.cc:699): Highpass filter activated: 1
(audio_processing_impl.cc:717): Gain Controller 2 activated: 0
(audio_processing_impl.cc:719): Pre-amplifier activated: 0
(webrtc_sdp.cc:3255): Ignored line: a=sctpmap:5000 webrtc-datachannel 65535
(rtc_event_log_impl.cc:63): Creating legacy encoder for RTC event log.
(peer_connection_factory.cc:361): Using default network controller factory
(bitrate_prober.cc:69): Bandwidth probing enabled, set to inactive
(paced_sender.cc:421): ProcessThreadAttached 0xec072e20
(cpu_info.cc:53): Available number of cores: 8
(aimd_rate_control.cc:105): Using aimd rate control with back off factor 0.85
(remote_bitrate_estimator_single_stream.cc:71): RemoteBitrateEstimatorSingleStream: Instantiating.
(remote_estimator_proxy.cc:44): Maximum interval between transport feedback RTCP messages (ms): 250
(openssl_identity.cc:44): Making key pair
(peer_connection.cc:5531): Local and Remote descriptions must be applied to get the SSL Role of the SCTP transport.
(openssl_identity.cc:92): Returning key pair
(openssl_certificate.cc:58): Making certificate for WebRTC
(openssl_certificate.cc:108): Returning certificate
(p2p_transport_channel.cc:519): Set backup connection ping interval to 25000 milliseconds.
(p2p_transport_channel.cc:528): Set ICE receiving timeout to 2500 milliseconds
(p2p_transport_channel.cc:535): Set ping most likely connection to 0
(p2p_transport_channel.cc:542): Set stable_writable_connection_ping_interval to 2500
(p2p_transport_channel.cc:555): Set presume writable when fully relayed to 0
(p2p_transport_channel.cc:564): Set regather_on_failed_networks_interval to 300000
(p2p_transport_channel.cc:583): Set receiving_switching_delay to 1000
(jsep_transport_controller.cc:1214): Creating DtlsSrtpTransport.
(dtls_srtp_transport.cc:61): Setting RTCP Transport on 0 transport 0
(dtls_srtp_transport.cc:66): Setting RTP Transport on 0 transport dc004830
(p2p_transport_channel.cc:465): Received remote ICE parameters: ufrag=YAvY, renomination disabled
(peer_connection.cc:4185): Session: 7301418690559709073 Old state: kStable New state: kHaveRemoteOffer
(peer_connection.cc:5531): Local and Remote descriptions must be applied to get the SSL Role of the SCTP transport.
(peer_connection.cc:5559): Local and Remote descriptions must be applied to get the SSL Role of the session.
(paced_sender.cc:293): Elapsed time (12680 ms) longer than expected, limiting to 2000 ms
Signal: SIGSEGV (Segmentation fault)
I guess the problem is not in callback but in the connection initialization. But what am I doing wrong?
I've found the error in my code:
peer_connection_->SetRemoteDescription(&set_session_description_observer_, desc.get());
I passed the raw pointer then release the smart one with the memory.

uwsgi cannot process request longer than one minute

I have a Django application that is served within docker container via uwsgi. I have prepared a custom view just to reproduce the issue I'm mentioning. It looks exactly like below:
def get(self, request):
logger = logging.getLogger('ReleaseReport')
logger.critical('Entering and sleeping')
time.sleep(180)
logger.critical('Awaking')
return Response({'response': 'anything'})
The only thing it does (intentionally) is to log message, sleep for 3 minutes, and log another message afterwards.
Here is how the log file works after I try to visit the view from Firefox / Chrome / PyCharm's rest API client:
spawned uWSGI worker 9 (pid: 14, cores: 1)
spawned uWSGI worker 10 (pid: 15, cores: 1)
spawned uWSGI http 1 (pid: 16)
CRITICAL 2018-08-31 12:10:37,658 views Entering and sleeping
CRITICAL 2018-08-31 12:11:37,742 views Entering and sleeping
CRITICAL 2018-08-31 12:11:38,687 views Awaking
[pid: 10|app: 0|req: 1/1] 10.187.133.2 () {36 vars in 593 bytes} [Fri Aug 31 12:10:37 2018] GET /api/version/ => generated 5156 bytes in 61229 msecs (HTTP/1.1 200) 4 headers in 137 bytes (1 switches on core 0)
CRITICAL 2018-08-31 12:12:37,752 views Entering and sleeping
CRITICAL 2018-08-31 12:12:38,784 views Awaking
[pid: 15|app: 0|req: 1/2] 10.187.133.2 () {36 vars in 593 bytes} [Fri Aug 31 12:11:37 2018] GET /api/version/ => generated 5156 bytes in 61182 msecs (HTTP/1.1 200) 4 headers in 137 bytes (1 switches on core 0)
CRITICAL 2018-08-31 12:13:38,020 views Entering and sleeping
CRITICAL 2018-08-31 12:13:38,776 views Awaking
[pid: 10|app: 0|req: 2/3] 10.187.133.2 () {36 vars in 593 bytes} [Fri Aug 31 12:12:37 2018] GET /api/version/ => generated 5156 bytes in 61034 msecs (HTTP/1.1 200) 4 headers in 137 bytes (1 switches on core 0)
After one minute, the view seems to be executed again, and after another minute the third time is executed. Moreover, the log says it is HTTP 200, but the client doesn't receive the data and just says it cannot load it after few more minutes (depends on client). However the first HTTP 200 in log occurs much earlier before client gives up.
Any clues what may be causing that issue? Here is my uwsgi.ini:
[uwsgi]
http = 0.0.0.0:8000
chdir = /app
module = web_server.wsgi:application
pythonpath = /app
static-map = /static=/app/static
master = true
processes = 10
vacuum = true
The Dockerfile command is as following:
/usr/local/bin/uwsgi --ini /app/uwsgi.ini
In my real application, it causes that client thinks the request failed, but since it was actually executed and finished 3 times, there are 3 records in a database. Changing worker processes number to 1 doesn't change much. Instead of waitin one minute to spawn view again, it is spawned after previous one finishes.
What's wrong with my configuration?
EDIT:
I have changed my view a bit, it now accepts sleep time parameter and looks like this:
def get(self, request, minutes=None):
minutes = int(minutes)
original_minutes = minutes
logger = logging.getLogger(__name__)
while minutes > 0:
logger.critical(f'Sleeping, {minutes} more minutes...')
time.sleep(60)
minutes -= 1
logger.critical(f'Slept for {original_minutes} minutes...')
return Response({'slept_for': original_minutes})
Now, curling:
> curl http://test-host/api/test/0
{"slept_for":0}
> curl http://test-host/api/test/1
{"slept_for":1}
> curl http://test-host/api/test/2
curl: (52) Empty reply from server
In log:
CRITICAL 2018-08-31 14:23:36,200 views Slept for 0 minutes...
[pid: 10|app: 0|req: 1/14] 10.160.43.172 () {28 vars in 324 bytes} [Fri Aug 31 14:23:35 2018] GET /api/test/0 => generated 15 bytes in 265 msecs (HTTP/1.1 200) 4 headers in 129 bytes (1 switches on core 0)
CRITICAL 2018-08-31 14:23:42,878 views Slept for 0 minutes...
[pid: 10|app: 0|req: 2/15] 10.160.43.172 () {28 vars in 324 bytes} [Fri Aug 31 14:23:42 2018] GET /api/test/0 => generated 15 bytes in 1 msecs (HTTP/1.1 200) 4 headers in 129 bytes (1 switches on core 0)
CRITICAL 2018-08-31 14:23:46,370 views Sleeping, 1 more minutes...
CRITICAL 2018-08-31 14:24:46,380 views Slept for 1 minutes...
[pid: 10|app: 0|req: 3/16] 10.160.43.172 () {28 vars in 324 bytes} [Fri Aug 31 14:23:46 2018] GET /api/test/1 => generated 15 bytes in 60011 msecs (HTTP/1.1 200) 4 headers in 129 bytes (1 switches on core 0)
CRITICAL 2018-08-31 14:27:06,903 views Sleeping, 2 more minutes...
CRITICAL 2018-08-31 14:28:06,963 views Sleeping, 1 more minutes...
CRITICAL 2018-08-31 14:29:06,995 views Slept for 2 minutes...
[pid: 9|app: 0|req: 1/17] 10.160.43.172 () {28 vars in 324 bytes} [Fri Aug 31 14:27:06 2018] GET /api/test/2 => generated 15 bytes in 120225 msecs (HTTP/1.1 200) 4 headers in 129 bytes (1 switches on core 0)
If I use the same command to test server running with manage.py runserver, it does answer every time - no matter if I sleep 2 minutes or 10. So it is not the client fault.
I've changed harakiri to 3600, no change.
EDIT2 (my Dockerfile):
FROM python:3.7.0-alpine
ADD . /app
RUN set -ex \
&& apk add mysql-dev \
pcre-dev \
&& apk add --no-cache --virtual .build-deps \
gcc \
make \
libc-dev \
musl-dev \
linux-headers \
libffi-dev \
&& pip install --no-cache-dir -r /app/requirements.txt \
&& runDeps="$( \
scanelf --needed --nobanner --recursive /venv \
| awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
| sort -u \
| xargs -r apk info --installed \
| sort -u \
)" \
&& apk add --virtual .python-rundeps $runDeps \
&& apk del .build-deps
WORKDIR /app
RUN mkdir -p static
RUN python manage.py collectstatic --clear --noinput
EXPOSE 8000
CMD ["/usr/local/bin/uwsgi", "--ini", "/app/uwsgi.ini"]
It actually was the Dockerfile issue. Previously I had uWSGI in my requirements.txt, so it was installed by pip install.
When I've removed it and added uwsgi-python3 to apk add, now everything is fine.
No idea why it matters (everything else was working fine), but it solved my issue.

Setting up pyspark on windows 10

I tried to install spark on my windows 10 machine. I have anacondo2 with python 2.7. I managed to open the ipython notebook instance. I am able to run the following lines:
airlines=sc.textFile("airlines.csv")
print (airlines)
But I get an error when I run: airlines.first()
Here's the error I get:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-6-85a5d6f5110f> in <module>()
----> 1 airlines.first()
C:\spark\python\pyspark\rdd.py in first(self)
1326 ValueError: RDD is empty
1327 """
-> 1328 rs = self.take(1)
1329 if rs:
1330 return rs[0]
C:\spark\python\pyspark\rdd.py in take(self, num)
1308
1309 p = range(partsScanned, min(partsScanned + numPartsToTry, totalParts))
-> 1310 res = self.context.runJob(self, takeUpToNumLeft, p)
1311
1312 items += res
C:\spark\python\pyspark\context.py in runJob(self, rdd, partitionFunc, partitions, allowLocal)
932 mappedRDD = rdd.mapPartitions(partitionFunc)
933 port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
--> 934 return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
935
936 def show_profiles(self):
C:\spark\python\pyspark\rdd.py in _load_from_socket(port, serializer)
137 break
138 if not sock:
--> 139 raise Exception("could not open socket")
140 try:
141 rf = sock.makefile("rb", 65536)
Exception: could not open socket
I get a different error when I execute: airlines.collect()
Here's the error:
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-5-3745b2fa985a> in <module>()
1 # Using the collect operation, you can view the full dataset
----> 2 airlines.collect()
C:\spark\python\pyspark\rdd.py in collect(self)
775 with SCCallSiteSync(self.context) as css:
776 port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
--> 777 return list(_load_from_socket(port, self._jrdd_deserializer))
778
779 def reduce(self, f):
C:\spark\python\pyspark\rdd.py in _load_from_socket(port, serializer)
140 try:
141 rf = sock.makefile("rb", 65536)
--> 142 for item in serializer.load_stream(rf):
143 yield item
144 finally:
C:\spark\python\pyspark\serializers.py in load_stream(self, stream)
515 try:
516 while True:
--> 517 yield self.loads(stream)
518 except struct.error:
519 return
C:\spark\python\pyspark\serializers.py in loads(self, stream)
504
505 def loads(self, stream):
--> 506 length = read_int(stream)
507 if length == SpecialLengths.END_OF_DATA_SECTION:
508 raise EOFError
C:\spark\python\pyspark\serializers.py in read_int(stream)
541
542 def read_int(stream):
--> 543 length = stream.read(4)
544 if not length:
545 raise EOFError
C:\Users\AS\Anaconda2\lib\socket.pyc in read(self, size)
382 # fragmentation issues on many platforms.
383 try:
--> 384 data = self._sock.recv(left)
385 except error, e:
386 if e.args[0] == EINTR:
error: [Errno 10054] An existing connection was forcibly closed by the remote host
Please help.
INSTALL PYSPARK on Windows 10
JUPYTER-NOTEBOOK With ANACONDA NAVIGATOR
STEP 1
Download Packages
1) spark-2.2.0-bin-hadoop2.7.tgz Download
2) java jdk 8 version Download
3) Anaconda v 5.2 Download
4) scala-2.12.6.msi Download
5) hadoop v2.7.1Download
STEP 2
MAKE SPARK FOLDER IN C:/ DRIVE AND PUT EVERYTHING INSIDE IT
It will look like this
NOTE : DURING INSTALLATION OF SCALA GIVE PATH OF SCALA INSIDE SPARK FOLDER
STEP 3
NOW SET NEW WINDOWS ENVIRONMENT VARIABLES
HADOOP_HOME=C:\spark\hadoop
JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151
SCALA_HOME=C:\spark\scala\bin
SPARK_HOME=C:\spark\spark\bin
PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe
PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS=notebook
NOW SELECT PATH OF SPARK : EDIT AND ADD NEW
Add "C:\spark\spark\bin” to variable “Path” Windows
STEP 4
Make folder where you want to store Jupyter-Notebook outputs and files
After that open Anaconda command prompt and cd Folder name
then enter Pyspark
thats it your browser will pop up with Juypter localhost
STEP 5
Check pyspark is working or not !
Type simple code and run it
from pyspark.sql import Row
a = Row(name = 'Vinay' , age=22 , height=165)
print("a: ",a)

Github activity feed urls dont include domain

I want to display a user's activity feed in an rails application. I am using feedjira.
2.2.4 :006 > xml_feed = Feedjira::Feed.fetch_raw "https://github.com/prasadsurase.atom"
2.2.4 :006 > github_feed = Feedjira::Feed.parse xml_feed
2.2.4 :006 > github_feed.entries.first.content
=> "<!-- pull_request -->\n<svg aria-label=\"Pull request\" class=\"octicon octicon-git-pull-request dashboard-event-icon\" height=\"32\" role=\"img\" version=\"1.1\" viewBox=\"0 0 12 16\" width=\"24\"><path d=\"M11 11.28V5c-.03-.78-.34-1.47-.94-2.06C9.46 2.35 8.78 2.03 8 2H7V0L4 3l3 3V4h1c.27.02.48.11.69.31.21.2.3.42.31.69v6.28A1.993 1.993 0 0 0 10 15a1.993 1.993 0 0 0 1-3.72zm-1 2.92c-.66 0-1.2-.55-1.2-1.2 0-.65.55-1.2 1.2-1.2.65 0 1.2.55 1.2 1.2 0 .65-.55 1.2-1.2 1.2zM4 3c0-1.11-.89-2-2-2a1.993 1.993 0 0 0-1 3.72v6.56A1.993 1.993 0 0 0 2 15a1.993 1.993 0 0 0 1-3.72V4.72c.59-.34 1-.98 1-1.72zm-.8 10c0 .66-.55 1.2-1.2 1.2-.65 0-1.2-.55-1.2-1.2 0-.65.55-1.2 1.2-1.2.65 0 1.2.55 1.2 1.2zM2 4.2C1.34 4.2.8 3.65.8 3c0-.65.55-1.2 1.2-1.2.65 0 1.2.55 1.2 1.2 0 .65-.55 1.2-1.2 1.2z\"></path></svg>\n\n<div class=\"time\">\n <relative-time datetime=\"2016-09-23T07:30:24Z\">Sep 23, 2016</relative-time>\n</div>\n\n<div class=\"title\">\n prasadsurase opened pull request joshsoftware/code-curiosity#136\n</div>\n\n<div class=\"details\">\n <img alt=\"#prasadsurase\" class=\"gravatar\" height=\"30\" src=\"https://avatars3.githubusercontent.com/u/562052?v=3&s=60\" width=\"30\" />\n <div class=\"message\">\n <blockquote>Fixed controller method scopes. Removed unwanted routes.</blockquote>\n <div class=\"pull-info\">\n <svg aria-hidden=\"true\" class=\"octicon octicon-git-commit\" height=\"16\" version=\"1.1\" viewBox=\"0 0 14 16\" width=\"14\"><path d=\"M10.86 7c-.45-1.72-2-3-3.86-3-1.86 0-3.41 1.28-3.86 3H0v2h3.14c.45 1.72 2 3 3.86 3 1.86 0 3.41-1.28 3.86-3H14V7h-3.14zM7 10.2c-1.22 0-2.2-.98-2.2-2.2 0-1.22.98-2.2 2.2-2.2 1.22 0 2.2.98 2.2 2.2 0 1.22-.98 2.2-2.2 2.2z\"></path></svg>\n <em>1</em> commit with\n <em>34</em> additions and\n <em>31</em> deletions\n </div>\n </div>\n</div>\n"
when rendering the content for every entry from github_feed.entries in haml as raw, the links do not contain the domain https://github.com/ but contain only the path. This causes problem as such that the rendered links on the UI contain the app domain and not github. How do we fix this?
Since github atom feeds provides the content with relative url's you might want to replace them with absolute references.
This could be done with
gsub( %r{<a href=\"/},'<a href="https://github.com/')
And thus replace your:
2.2.4 :006 > github_feed.entries.first.content
with
2.2.4 :006 > github_feed.entries.first.content.gsub( %r{<a href=\"/},'<a href="https://github.com/')
=> "<!-- pull_request_review_comment -->\n<svg aria-label=\"Review pull request comment\" class=\"octicon octicon-comment-discussion dashboard-event-icon\" height=\"32\" role=\"img\" version=\"1.1\" viewBox=\"0 0 16 16\" width=\"32\"><path d=\"M15 1H6c-.55 0-1 .45-1 1v2H1c-.55 0-1 .45-1 1v6c0 .55.45 1 1 1h1v3l3-3h4c.55 0 1-.45 1-1V9h1l3 3V9h1c.55 0 1-.45 1-1V2c0-.55-.45-1-1-1zM9 11H4.5L3 12.5V11H1V5h4v3c0 .55.45 1 1 1h3v2zm6-3h-2v1.5L11.5 8H6V2h9v6z\"></path></svg>\n\n<div class=\"time\">\n <relative-time datetime=\"2016-09-28T03:21:31Z\">Sep 28, 2016</relative-time>\n</div>\n\n<div class=\"title\">\n prasadsurase commented on pull request joshsoftware/code-curiosity#137\n</div>\n\n<div class=\"details\">\n <img alt=\"#prasadsurase\" class=\"gravatar\" height=\"30\" src=\"https://avatars3.githubusercontent.com/u/562052?v=3&s=60\" width=\"30\" />\n <div class=\"message markdown-body\">\n <blockquote>\n <p>#BandanaPandey Same here.</p>\n </blockquote>\n </div>\n</div>\n"

Tuning gunicorn (with Django): Optimize for more concurrent connections and faster connections

I use Django 1.5.3 with gunicorn 18.0 and lighttpd. I serve my static and dynamic content like that using lighttpd:
$HTTP["host"] == "www.mydomain.com" {
$HTTP["url"] !~ "^/media/|^/static/|^/apple-touch-icon(.*)$|^/favicon(.*)$|^/robots\.txt$" {
proxy.balance = "hash"
proxy.server = ( "" => ("myserver" =>
( "host" => "127.0.0.1", "port" => 8013 )
))
}
$HTTP["url"] =~ "^/media|^/static|^/apple-touch-icon(.*)$|^/favicon(.*)$|^/robots\.txt$" {
alias.url = (
"/media/admin/" => "/var/www/virtualenvs/mydomain/lib/python2.7/site-packages/django/contrib/admin/static/admin/",
"/media" => "/var/www/mydomain/mydomain/media",
"/static" => "/var/www/mydomain/mydomain/static"
)
}
url.rewrite-once = (
"^/apple-touch-icon(.*)$" => "/media/img/apple-touch-icon$1",
"^/favicon(.*)$" => "/media/img/favicon$1",
"^/robots\.txt$" => "/media/robots.txt"
)
}
I already tried to run gunicorn (via supervisord) in many different ways, but I cant get it better optimized than it can handle about 1100 concurrent connections. In my project I need about 10000-15000 connections
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 9 -k gevent --preload --settings=myproject.settings
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 10 -k eventlet --worker_connections=1000 --settings=myproject.settings --max-requests=10000
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 20 -k gevent --settings=myproject.settings --max-requests=1000
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 40 --settings=myproject.settings
On the same server there live about 10 other projects, but CPU and RAM is fine, so this shouldnt be a problem, right?
I ran a load test and these are the results:
At about 1100 connections my lighttpd errorlog says something like that, thats where the load test shows the drop of connections:
2013-10-31 14:06:51: (mod_proxy.c.853) write failed: Connection timed out 110
2013-10-31 14:06:51: (mod_proxy.c.939) proxy-server disabled: 127.0.0.1 8013 83
2013-10-31 14:06:51: (mod_proxy.c.1316) no proxy-handler found for: /
... after about one minute
2013-10-31 14:07:02: (mod_proxy.c.1361) proxy - re-enabled: 127.0.0.1 8013
These things also appear ever now and then:
2013-10-31 14:06:55: (network_linux_sendfile.c.94) writev failed: Connection timed out 600
2013-10-31 14:06:55: (mod_proxy.c.853) write failed: Connection timed out 110
...
2013-10-31 14:06:57: (mod_proxy.c.828) establishing connection failed: Connection timed out
2013-10-31 14:06:57: (mod_proxy.c.939) proxy-server disabled: 127.0.0.1 8013 45
So how can I tune gunicorn/lighttpd to serve more connections faster? What can I optimize? Do you know any other/better setup?
Thanks alot in advance for your help!
Update: Some more server info
root#django ~ # top
top - 15:28:38 up 100 days, 9:56, 1 user, load average: 0.11, 0.37, 0.76
Tasks: 352 total, 1 running, 351 sleeping, 0 stopped, 0 zombie
Cpu(s): 33.0%us, 1.6%sy, 0.0%ni, 64.2%id, 0.4%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 32926156k total, 17815984k used, 15110172k free, 342096k buffers
Swap: 23067560k total, 0k used, 23067560k free, 4868036k cached
root#django ~ # iostat
Linux 2.6.32-5-amd64 (django.myserver.com) 10/31/2013 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
33.00 0.00 2.36 0.40 0.00 64.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 137.76 980.27 2109.21 119567783 257268738
sdb 24.23 983.53 2112.25 119965731 257639874
sdc 24.25 985.79 2110.14 120241256 257382998
md0 0.00 0.00 0.00 400 0
md1 0.00 0.00 0.00 284 6
md2 1051.93 38.93 4203.96 4748629 512773952
root#django ~ # netstat -an |grep :80 |wc -l
7129
Kernel Settings:
echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240