Unable to connect to Huggingface from EC2 instance

I am running Python code on an EC2 instance, where I load a Huggingface model using the from_pretrained() method. I get the error
OSError: Couldn't reach server at 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json' to download pretrained model configuration file.
while trying to initialize the reader. To get around this, I downloaded the file manually and provided the local JSON path. That worked fine, but then I saw issues loading the tokenizer too:
OSError: Couldn't reach server at '{}' to download vocabulary files.
I think my EC2 network settings are incorrect, which is why I cannot connect to the external Huggingface repository.
I tried relaxing the inbound rules for EC2 to IPv4 | All traffic | All | All | 0.0.0.0/0 (IP version | Type | Protocol | Port range | Source), but even that doesn't help. The outbound rules are already IPv4 | All traffic | All | All | 0.0.0.0/0.
I also tried creating an IAM role with policy AmazonS3ReadOnlyAccess and attached it to the EC2 instance but still getting the same error.
Could someone point out what needs to be done to solve this? Thanks.
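For reference, the local-files workaround described above looks roughly like this (a sketch: the directory path is a placeholder, and it assumes the config, vocab, and model weight files were downloaded by hand into that directory):

from transformers import BertModel, BertTokenizer

# Placeholder directory holding the hand-downloaded config.json, vocab.txt and
# pytorch_model.bin, since the instance cannot reach the Huggingface servers
local_dir = '/home/ec2-user/bert-base-uncased'
model = BertModel.from_pretrained(local_dir)
tokenizer = BertTokenizer.from_pretrained(local_dir)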

Here is how I fixed this issue.
I installed pyopenssl like this:
!pip install pyopenssl
Then I restarted the terminal and re-ran the code, and that fixed the issue for me. Thanks.
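To verify it really is a TLS/connectivity problem rather than the library, a quick check from the same instance (a sketch, not part of the original answer):

import requests

# If this raises SSLError or ConnectionError, the issue is the instance's
# network/TLS stack, not transformers itself
resp = requests.get("https://s3.amazonaws.com", timeout=10)
print(resp.status_code)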

Your network might be using a proxy; this might help. from_pretrained() accepts a requests-style proxies mapping, so load the model and tokenizer through the proxy and hand them to the pipeline (the checkpoint name here is just the stock question-answering model):
proxies = {"http": "http://foo.bar:3128", "https": "http://foo.bar:4012"}
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad", proxies=proxies)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad", proxies=proxies)
qt_ans = pipeline("question-answering", model=model, tokenizer=tokenizer)
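Alternatively (an assumption on my part, not from the original answer), the standard proxy environment variables also work, since transformers downloads over requests/urllib3, which honor them:

import os

# Placeholders for your proxy host and port
os.environ["HTTP_PROXY"] = "http://foo.bar:3128"
os.environ["HTTPS_PROXY"] = "http://foo.bar:3128"

from transformers import pipeline
qt_ans = pipeline("question-answering")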


Jupyter notebook permission denied, unable to save

I am getting "permission denied" for a Jupyter notebook running on Ubuntu on an AWS EC2 instance. Below is the image for the error.
I have tried using another browser, reinstalling it, and clearing the cache. None of it works.
If your notebook is configured with an auth token (generated by the notebook by default) and you've cleared your cache and cookies, you will get a screen similar to the one below.
Here you can use the token shown in the terminal while the notebook is running; the log will look similar to this:
To access the notebook, open this file in a browser:
file:///home/ubuntu/.local/share/jupyter/runtime/nbserver-19740-open.html
Or copy and paste one of these URLs:
http://ip-123-1-1-123:8888/?token=abcdefghijl
or http://127.0.0.1:8888/?token=abcdefghijl
And if you want to run the notebook without authentication, you can run the command below:
jupyter notebook --ip='0.0.0.0' --NotebookApp.token='' --NotebookApp.password=''
Also try creating a new notebook instead of accessing Test.ipynb again, so you can figure out whether the issue is with that specific file or with the whole server.
The main reason was that I didn't have permission to overwrite the file. First of all, I created the file in /home/ubuntu, which I don't have write permission for, so I created a folder and stored the file inside it. Other than that, I also made quite a lot of modifications, including adding inbound rules and fixing permissions; I think some of that helped as well. Here are some of the pages I found very useful in tackling this issue:
PermissionError: [Errno 13] Permission denied: Cannot open Jupyter on Browser despite running correctly on AWS EC2 instance
https://stackoverflow.com/questions/53097180/permissionerror-errno-13-permission-denied-when-accessing-to-aws-ec2

Filebeat and AWS Elasticsearch - Not Working

I have good experience working with Elasticsearch; I have worked with version 2.4 and am now trying to learn the new Elasticsearch.
I am trying to implement Filebeat to send my Apache and system logs to my Elasticsearch endpoint. To save time, I launched a t2.medium single-node instance on the AWS Elasticsearch Service under the public domain, and I attached an access policy that allows everyone to access the cluster.
The AWS Elasticsearch instance is up, running, and healthy.
I launched an Ubuntu (18.04) server, downloaded the Filebeat tar, and made the following configuration in filebeat.yml:
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"
I enabled the required modules:
filebeat modules enable system apache
Then, as per the Filebeat documentation, I changed the ownership of the Filebeat file and started Filebeat with the following commands:
sudo chown root filebeat.yml
sudo ./filebeat -e
When I started Filebeat, I faced the following permission and ownership issues:
Error loading config from file '/home/ubuntu/beats/filebeat-7.2.0-linux-x86_64/modules.d/system.yml', error invalid config: config file ("/home/ubuntu/beats/filebeat-7.2.0-linux-x86_64/modules.d/system.yml") must be owned by the user identifier (uid=0) or root
To resolve this I changed the ownership of the files that were throwing errors.
When I restarted the Filebeat service, I started facing the following issue:
Connection marked as failed because the onConnect callback failed: cannot retrieve the elasticsearch license: unauthorized access, could not connect to the xpack endpoint, verify your credentials
Going through this link, I found that to work with AWS Elasticsearch I need the OSS versions of Beats.
So I downloaded the OSS version of Filebeat from this link and followed the same procedure as above, but still no luck. Now I am facing the following errors:
Error 1:
Attempting to reconnect to backoff(elasticsearch(https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443)) with 12 reconnect attempt(s)
Error 2:
Failed to connect to backoff(elasticsearch(https://my-public-test-domain.ap-southeast-1.es.amazonaws.com:443)): Connection marked as failed because the onConnect callback failed: 1 error: Error loading pipeline for fileset system/auth: This module requires an Elasticsearch plugin that provides the geoip processor. Please visit the Elasticsearch documentation for instructions on how to install this plugin. Response body: {"error":{"root_cause":[{"type":"parse_exception","reason":"No processor type exists with name [geoip]","header":{"processor_type":"geoip"}}],"type":"parse_exception","reason":"No processor type exists with name [geoip]","header":{"processor_type":"geoip"}},"status":400}
From the second error I understand that the geoip plugin is not available, which is why I'm facing this error.
What else needs to be done to get this working?
Has anyone been able to successfully connect Beats to AWS Elasticsearch?
What other steps could I take to mitigate the above issue?
Environment Details:
AWS Elasticsearch Version : 6.7
File Beat : 7.2.0
First, you need to use the OSS version of Filebeat with AWS ES: https://www.elastic.co/downloads/beats/filebeat-oss
Second, AWS Elasticsearch does not provide the GeoIP module, so you will need to edit the pipelines for any of the default modules you want to use and make sure GeoIP is removed/commented out.
For example, in /usr/share/filebeat/module/system/auth/ingest/pipeline.json (that's the path when installed from the deb package; your path will differ, of course), comment out:
{
  "geoip": {
    "field": "source.ip",
    "target_field": "source.geo",
    "ignore_failure": true
  }
},
Repeat the same for apache module.
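If you have several modules to clean up, that edit can be scripted. A minimal Python sketch, assuming the stock pipeline layout with a top-level "processors" array (the path is the deb-package example from above):

import json

def strip_geoip(path):
    # Load the module's ingest pipeline definition
    with open(path) as f:
        pipeline = json.load(f)
    # Drop every geoip processor; AWS Elasticsearch lacks the ingest-geoip plugin
    pipeline["processors"] = [p for p in pipeline["processors"] if "geoip" not in p]
    with open(path, "w") as f:
        json.dump(pipeline, f, indent=2)

strip_geoip("/usr/share/filebeat/module/system/auth/ingest/pipeline.json")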
I've spent hours trying to make the Filebeat IIS module work with AWS Elasticsearch. I kept getting the ingest-geoip error; the below fixed the issue.
For Windows IIS logs going to AWS Elasticsearch, remove geoip from the Filebeat module configuration in these files:
C:\Program Files (x86)\filebeat\module\iis\access\ingest\default.json
C:\Program Files (x86)\filebeat\module\iis\access\manifest.yml
C:\Program Files (x86)\filebeat\module\iis\error\ingest\default.json
C:\Program Files (x86)\filebeat\module\iis\error\manifest.yml
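The strip_geoip sketch from the previous answer can be looped over the ingest JSON files (a sketch; the manifest.yml files above are YAML and would need a separate hand edit):

# Reuses the strip_geoip helper sketched in the previous answer
for path in [
    r"C:\Program Files (x86)\filebeat\module\iis\access\ingest\default.json",
    r"C:\Program Files (x86)\filebeat\module\iis\error\ingest\default.json",
]:
    strip_geoip(path)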

Problems using Halyard to configure GCP as cloud provider in a new Spinnaker installation when my GCP Network is not called 'Default'

I'm using these guidelines:
https://www.spinnaker.io/setup/install/providers/gce/
When I'm running this line:
hal config provider google account add my-gce-account --project $PROJECT --json-path $SERVICE_ACCOUNT_DEST
I get an error:
Problems in default.provider.google: ! ERROR Network default not found via any configured google
When I run the entire process in a project that has a network named 'default', it works fine.
I was not able to find how to tell Halyard that my network has another name.
Could someone help me?
Thanks
@Thiago, you might have to go and edit .hal/config, change network: default to network: <your-custom-network-name>, and follow with hal deploy apply. We ran into the same issue; this is a dirty hack that worked for us.
I had a similar problem; for my setup it turned out that the bakery was running in the wrong zone. Expanding on the previous response:
$ hal config provider google bakery edit --network <your-custom-network-name>
$ hal config provider google bakery edit --zone <your-zone>
$ sudo hal deploy apply
The full spec for options can be found here:
https://www.spinnaker.io/reference/halyard/commands/#hal-config-provider-google-bakery-edit
Thanks to Sweeti Bharti's response, which led me to this one :)

deploying play framework on Amazon EC2 using `sbt dist`

I am trying to deploy my Play Framework 2.6.x application to an Amazon EC2 instance.
I have made a zip successfully using the sbt dist command. I then copied the zip file to the /opt/{project-name}/ folder and unzipped it there. I then tried running the application as a daemon with the following parameters (and a few more):
-Dhttp.port=80
-Dplay.http.secret.key={my-secret}
-Dconfig.file=/path/to/conf/prod.conf
When I tried accessing the application using my EC2 public IPv4 DNS name,
ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com/login
I got the "host not allowed" error for that host. I then added the following to my conf/application.conf file:
play.filters.enabled += play.filters.hosts.AllowedHostsFilter
play.filters.hosts {
  allowed = ["ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com", "localhost:9000", ".compute-1.amazonaws.com"]
}
After doing all this, when I try to run the application I still get the following host not allowed error:
BAD REQUEST
For request 'GET /login' [Host not allowed: ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com]
Any help would be much appreciated. Thanks a ton in advance!

Adding JDBC jar driver to classpath for AWS Elastic Beanstalk job

I have an Elastic Beanstalk application that I'm trying to configure to connect to a FileMaker Pro database over JDBC. The code I'm using is:
import jaydebeapi as jdp
jdbc_driver_location = '/tmp/fmjdbc.jar'
conn = jdp.connect(jdbc_driver_class,  # note: the import alias is jdp, not jdb
                   jdbc_connection_type + '://' + db_url + '/' + db_name,
                   [user_name, password], jdbc_driver_location)
When I attempt this, I get the following error:
java.sql.SQLException: No suitable driver found for jdbc:filemaker://10.120.120.108/carecord-<class 'jpype._jexception.java.sql.SQLExceptionPyRaisable'>
To try to solve the problem, I added the JDBC jar to both the /tmp folder of the EC2 instance and the project directory. If I SSH into the EC2 instance and issue the command:
JAVA_HOME=/tmp/fmjdbc.jar
The program will run without issue the next time it's invoked. After a few hours it gives the original error again and needs the above command re-issued to work. To fix this, I tried adding the following to .ebextensions, to copy the .jar from the project directory into /tmp and issue the above command on the server from the start:
commands:
  command01:
    command: sudo cp /opt/python/current/app/fmjdbc.jar /tmp/fmjdbc.jar
  command02:
    command: JAVA_HOME=/tmp/fmjdbc.jar
But the project still gives the error. Any thoughts on how I can add this driver to the classpath such that the job will run consistently?
To help folks who hit this issue in the future: the answer I found was at the end of this thread.
I appended the following:
import jpype  # needed for the calls below

if jpype.isJVMStarted() and not jpype.isThreadAttachedToJVM():
    jpype.attachThreadToJVM()
    jpype.java.lang.Thread.currentThread().setContextClassLoader(jpype.java.lang.ClassLoader.getSystemClassLoader())
Just above the
jdbc_driver_location = '/tmp/fmjdbc.jar'
section of my original code above. This allows the application to keep finding the necessary driver on each run.
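Putting the original code and this fix together, a minimal sketch (the FileMaker driver class name and the credentials are assumptions, not confirmed by the poster; the URL comes from the error message above):

import jaydebeapi
import jpype

# Re-attach this thread to the JVM and restore the system class loader, so the
# driver jar stays visible across requests (the fix described above)
if jpype.isJVMStarted() and not jpype.isThreadAttachedToJVM():
    jpype.attachThreadToJVM()
    jpype.java.lang.Thread.currentThread().setContextClassLoader(
        jpype.java.lang.ClassLoader.getSystemClassLoader())

jdbc_driver_location = '/tmp/fmjdbc.jar'
conn = jaydebeapi.connect(
    'com.filemaker.jdbc.Driver',                 # assumed FileMaker driver class
    'jdbc:filemaker://10.120.120.108/carecord',  # URL from the error message
    ['user_name', 'password'],                   # placeholder credentials
    jdbc_driver_location,
)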
JAVA_HOME is supposed to point to the location where Java is installed on the server. You don't use JAVA_HOME to add libraries to the classpath. You shouldn't have to set any environment variables for your code to work.
The root of your problem is that you are copying the file to /tmp/fmjdbc.jar but you are setting jdbc_driver_location to /tmp/jdbc.jar. Notice how those file names are different. To fix your code, change it to this:
jdbc_driver_location = '/tmp/fmjdbc.jar'