Download Remote HDFS Files to My Local Mac - hdfs

I need to download files from an HDFS file system to my local Mac:
import os
import pyhdfs
os.environ["http_proxy"] = "http://host:port"
os.environ["https_proxy"] = "http://host:port"
os.environ["no_proxy"] = "host_x,host_y,host_z,machine_name1,machine_name2,machine_name3"
client = pyhdfs.HdfsClient(hosts='host_x,host_y,host_z', user_name='username')
The calls client.get_home_directory(), client.listdir('/some_remote_hadoop_directory/') and client.exists("/some_remote_hadoop_directory/file") all returned the expected output; client.exists returned True, which was a good sign. However, client.copy_to_local("/some_remote_hadoop_directory/file", "/my_local_mac_directory/file") raised the following errors:
gaierror: [Errno 8] nodename nor servname provided, or not known
NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1058b47c0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
MaxRetryError: HTTPConnectionPool(host='host', port=port): Max retries exceeded with url: some_hadoop_url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1058b47c0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
ConnectionError: HTTPConnectionPool(host='host', port=port): Max retries exceeded with url: some_hadoop_url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1058b47c0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
I then searched online and tried the most widely shared solution, modifying /etc/hosts (from https://stackoverflow.com/a/43549848/6693221), but it did not work for me. It seemed that my Mac could "talk" to the remote machine but failed to fetch data from it. I also tried enabling System Preferences -> Sharing -> File Sharing on my Mac and giving everyone read & write access, but that still failed. I know this is meant to be easy, but I have no clue what is missing. I am doing this as a workaround for my previous question: How to read remote HDFS parquets from my local PySpark?
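The gaierror above ([Errno 8] nodename nor servname provided, or not known) is a hostname lookup failure. Metadata calls such as listdir and exists are answered by the NameNode, while a WebHDFS file read is redirected to a DataNode, so one plausible cause (an assumption, not confirmed in this post) is that the Mac cannot resolve a DataNode hostname. A minimal sketch for checking which hostnames fail to resolve locally; the hostnames in the list are hypothetical placeholders, and unresolvable_hosts is an illustrative helper, not part of pyhdfs:

```python
import socket


def unresolvable_hosts(hostnames):
    """Return the hostnames that this machine cannot resolve to an address."""
    failed = []
    for name in hostnames:
        try:
            # Same lookup that urllib3 performs before opening a connection.
            socket.getaddrinfo(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed


# Hypothetical placeholders: substitute the real NameNode and DataNode hostnames.
cluster_hosts = ["localhost", "datanode-1.invalid"]
print(unresolvable_hosts(cluster_hosts))
```

Any name this reports would need a working DNS entry or an /etc/hosts line on the Mac, and it would also have to be reachable without going through the proxy.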

Related

gcloud crashed (TransportError)

For the last two weeks, gcloud has been crashing with the error below when I am on my Wi-Fi (Tikona). When I connect through a mobile hotspot (Airtel) instead, it works. How can I resolve this error?
gcloud crashed (TransportError): HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa4cf786610>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

GCP : Listing available projects failed: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443):

When I try to list my GCP projects in the terminal on a Mac with an M1 chip, gcloud projects --list gives the error below. Please help.
Listing available projects failed: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fe1280833c8>: Failed to establish a new connection: [Errno 60] Operation timed out'))

Solr in Flask web app - Failed to connect to server - Failed to establish a new connection: [Errno 111] Connection refused

I have indexed files successfully using Solr and queried the core successfully using pysolr. The code works fine in the PyCharm terminal. Now I am trying to turn the code into a web app using Flask on pythonanywhere.com. However, when I put the following line in the web app code, I get an error:
solr = Solr('http://localhost:8983/solr/techproducts')
The error that I am getting is:
pysolr.SolrError: Failed to connect to server at http://localhost:8983/solr/techproducts/select/?q=Sumedh&wt=json: HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded with url: /solr/techproducts/select/?q=Sumedh&wt=json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa57c3c28d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
I would be really grateful if someone could help me solve this error.
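[Errno 111] Connection refused means nothing accepted the TCP connection at that address. On a hosted platform, localhost refers to the hosting machine, not to the development PC, so a likely explanation (an assumption here) is that no Solr server is listening on port 8983 where the web app runs. A minimal sketch of a reachability check; port_is_open is an illustrative helper, not part of pysolr:

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Run this on the machine that hosts the web app, not on the development PC.
print(port_is_open("localhost", 8983))
```

If this prints False on the server, the Solr URL has to point at a host where Solr is actually running and reachable.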

Plotly installation fails to establish new connection

I usually use a proxy at my workplace such that I have:
C:\myPath>echo %https_proxy%
http://myUser:myPassword@myProxy:myPort
Now I am at home (which means no proxy!) and I am trying to install a package ("plotly"):
C:\myPath>conda install -c plotly plotly=1.3.2
Fetching package metadata ...
CondaHTTPError: HTTP None None
for url <None>
An HTTP error occurred when trying to retrieve this URL.
ProxyError(MaxRetryError("HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: /plotly/win-64/repodata.json (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0000000004D786A0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',)))",),)
In order to remove the proxy settings without removing the environment variables I tried:
set http_proxy = ""
set https_proxy= ""
but the situation does not change.
I have also tried:
C:\myPath>pip install plotly
Collecting plotly
Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003C6C400>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))': /simple/plotly/
The questions are:
- is the problem my proxy settings?
- if it is, how can I change them?
I solved it in a very simple way. The mistake was in the way I removed the proxy. I set:
C:\myPath>set https_proxy=
and the command
C:\myPath>conda install -c plotly plotly=1.3.2
worked!
WARNING: the set http_proxy= and set https_proxy= commands must not contain any blank spaces.
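To see why the spaces matter: in cmd.exe, set treats everything before the = as the variable name and everything after it as the value, spaces and quotes included. As far as I know, set http_proxy = "" therefore creates a separate variable named "http_proxy " and leaves the real one untouched, while set https_proxy= "" stores a non-empty value made of a space and two quote characters. A minimal sketch that makes such stray entries visible; find_proxy_vars and the simulated dictionary are illustrative assumptions, not part of conda or pip:

```python
import os

PROXY_NAMES = {"http_proxy", "https_proxy"}


def find_proxy_vars(env):
    """Return (name, value) pairs whose name, ignoring case and
    surrounding spaces, is a proxy variable."""
    return [(name, value) for name, value in env.items()
            if name.strip().lower() in PROXY_NAMES]


# Simulated environment after the two failed attempts in cmd.exe.
simulated = {"http_proxy ": ' ""', "https_proxy": ' ""'}
for name, value in find_proxy_vars(simulated):
    # repr() exposes trailing spaces in names and quotes in values.
    print(repr(name), repr(value))

# The same check against the environment of the current process.
for name, value in find_proxy_vars(os.environ):
    print(repr(name), repr(value))
```

Running the second loop from the same console session shows exactly which proxy values conda and pip would inherit.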

error: [Errno 10060] A connection attempt failed when using solrpy in python

When I try to run the following Python code, which uses solrpy:
import solr
s = solr.SolrConnection('http://example.org:8080/solr')
s.add(id=1, title='Lucene in Action', author=['hello','lucky'])
s.commit()
response = s.query('title:lucene')
for hit in response.results:
    print(hit['title'])
I get the following error:
error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
You should point to a valid Solr instance. Are you sure there is a Solr server running at example.org?