Does youtube-dl still work (newest version: youtube-dl 2020.2.16)?

Used command:
youtube-dl --max-filesize 30m -f m4a -o "/home/dc3014b3c6a1a23bba88b2a1fbcc1447.m4a" "https://www.youtube.com/watch?v=_Xa0ydtx8PM"
youtube-dl doesn't work at all for me. The errors look like these:
ERROR: Unable to download webpage: <urlopen error EOF occurred in violation of protocol (_ssl.c:618)> (caused by URLError(SSLEOFError(8, u'EOF occurred in violation of protocol (_ssl.c:618)'),))
OR
ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
But when I use curl to fetch the URL content, it works fine:
curl -v https://www.youtube.com/watch?v=_Xa0ydtx8PM
How can I resolve it?

Judging from the error message, you should first make sure you are using the latest version of youtube-dl and update it if needed. I am assuming you are using a *nix system. Depending on how you first installed it, there are several options for updating; here are a few:
For manual installations:
You can simply run youtube-dl -U or, on Linux, sudo youtube-dl -U.
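If you installed youtube-dl with pip instead, upgrading through pip is the usual route (a sketch, assuming a pip-based install):
sudo pip install --upgrade youtube-dl
# or, without root:
pip install --user --upgrade youtube-dl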
If you are already running an up-to-date version, you may want to consider the format-selection examples below as the best way to download videos. Note that recent versions of youtube-dl automatically pick the best available format for you, so you do not need to specify one, although you still can if you want to be explicit.
# Download best mp4 format available or any other best if no mp4 available
$ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
# Download best format available but no better than 480p
$ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
# Download best video only format but no bigger than 50 MB
$ youtube-dl -f 'best[filesize<50M]'
# Download best format available via direct link over HTTP/HTTPS protocol
$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
# Download the best video format and the best audio format without merging them
$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
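Applied to the command from the question, an explicit format filter would look roughly like this (same URL and output path as in the question; just a sketch):
$ youtube-dl --max-filesize 30m -f 'bestaudio[ext=m4a]/bestaudio' -o "/home/dc3014b3c6a1a23bba88b2a1fbcc1447.m4a" "https://www.youtube.com/watch?v=_Xa0ydtx8PM"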
Here is a link for reference and further instructions/support.
I hope this helps. If not, let me know; glad to help further.

Just wanted to share a youtube-dl alternative (it is actually a fork) that, as of September 2022, works fine and supports almost all youtube-dl features: yt-dlp.
I switched to this project and am now using the same scripts I had previously.
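For reference, switching is usually just a matter of installing yt-dlp and keeping the same flags (a sketch, assuming a pip-based install; the command mirrors the one from the question):
python3 -m pip install -U yt-dlp
yt-dlp --max-filesize 30m -f m4a -o "/home/dc3014b3c6a1a23bba88b2a1fbcc1447.m4a" "https://www.youtube.com/watch?v=_Xa0ydtx8PM"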

Related

Wget - Downloading lots of files recursively is taking a long time

Currently I am trying to download a large dataset (200k+ large images). It's all stored on Google Cloud. The authors provide a wget command to download it:
wget -r -N -c -np --user username --ask-password https://alpha.physionet.org/files/mimic-cxr/2.0.0/
It downloads fine, but it's been 2 days and it's still going, and I don't know how long it's going to take. AFAIK it's downloading each file individually. Is there a way for me to download in parallel?
EDIT: I don't have sudo access to the machine doing the downloading. I just have user access.
wget is a great tool, but it is not designed to download 200K files efficiently.
You can either wait for it to finish or switch to a tool that downloads in parallel, provided you have an Internet connection fast enough to benefit; that might roughly halve the time compared to wget.
Since the source is an HTTPS web server, there really is not much you can do to speed this up besides downloading two to four files in parallel. Depending on your Internet speed and your distance to the source server, you might not see any improvement from parallel downloads at all.
Note: you do not say what you are downloading onto. If the destination is a Compute Engine VM and you picked a tiny one (f1-micro), you may be resource-limited. For any high-speed data transfer, pick at least an n1 instance size.
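If you can extract the individual file URLs (for example from the directory index), one workaround that needs no sudo is to run a few wget processes in parallel with xargs. This is only a sketch: urls.txt is a hypothetical list of file URLs, and passing --password on the command line exposes it to other local users.
xargs -P 4 -n 1 wget -c -x -N --user username --password 'YOURPASSWORD' < urls.txt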
If you don't know the URLs, then use the good old HTTrack website copier to download files in parallel:
httrack -v -w https://user:password@example.com/
The default is 8 parallel connections, but you can use the -cN option to increase it (e.g. -c16).
If the files are large, you can use aria2c; it will download a single file over multiple connections:
aria2c -x 16 url
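aria2c can also work from a URL list and fetch several files concurrently, which fits the many-files case better. A sketch, where urls.txt is a hypothetical list and the credentials are the ones from the question:
aria2c -i urls.txt -j 8 --http-user=username --http-passwd='YOURPASSWORD'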
You could check whether the files are stored in GCS; if so, you can just use
gsutil -m cp -r <src> <destination>
This will download the files in multithreaded mode.
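For example (a sketch only: the bucket path below is a placeholder, and the real MIMIC-CXR bucket may be named differently):
gsutil -m cp -r gs://some-bucket/mimic-cxr/2.0.0/ /data/mimic-cxr/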
Take a look at the updated official MIMIC-CXR downloads page: https://mimic-cxr.mit.edu/about/download/downloads
There you'll find information on how to download the data via wget (locally) and gsutil (from Google Cloud Storage).

ERROR: manifest for hyperledger/fabric-couchdb:latest not found

I was following the tutorial listed here to create my first blockchain network, but when I run ./startfabric.sh it gives me this error:
# don't rewrite paths for Windows Git Bash users
export MSYS_NO_PATHCONV=1
docker-compose -f docker-compose.yml down
Removing network net_basic
docker-compose -f docker-compose.yml up -d ca.example.com orderer.example.com peer0.org1.example.com couchdb
Creating network "net_basic" with the default driver
Pulling couchdb (hyperledger/fabric-couchdb:latest)...
ERROR: manifest for hyperledger/fabric-couchdb:latest not found
Any help on how I could fix this?
I think you have missed the "Download Platform-specific Binaries" step, which takes care of downloading all the relevant images and binaries so that you won't have to build them yourself.
Pull the required version of fabric-couchdb manually:
docker pull "hyperledger/fabric-couchdb:<version>"
Then tag it as latest:
docker image tag hyperledger/fabric-couchdb:<version> hyperledger/fabric-couchdb:latest
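For instance, assuming the 0.4.15 tag is the right one for your Fabric release (check Docker Hub for the tag that matches your setup; this is only an illustration):
docker pull hyperledger/fabric-couchdb:0.4.15
docker image tag hyperledger/fabric-couchdb:0.4.15 hyperledger/fabric-couchdb:latest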
NOTE:
This solution is a workaround for this specific case. Why the latest tag for fabric-couchdb is missing from Docker Hub remains to be investigated.

How to get kaggle competition data via command line on virtual machine?

I am looking for the easiest way to download the Kaggle competition data (train and test) onto a virtual machine using bash, so that I can train there without uploading the data to git.
Fast-forward three years and you can use Kaggle's official API via its CLI, for example:
kaggle competitions download favorita-grocery-sales-forecasting
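If the API is not set up on the VM yet, the usual sequence looks roughly like this (a sketch: the kaggle.json token comes from your Kaggle account page, and the competition name is the one from the example above):
pip install --user kaggle
mkdir -p ~/.kaggle                      # put your kaggle.json API token here
chmod 600 ~/.kaggle/kaggle.json
kaggle competitions download favorita-grocery-sales-forecasting
unzip '*.zip' -d data/                  # competition files arrive zipped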
First you need to export the cookie information for the Kaggle site into a text file. There is a Chrome extension which will help you do this.
Copy the cookie information and save it as cookies.txt.
Now transfer the file to the EC2 instance using the command
scp -i /path/my-key-pair.pem /path/cookies.txt user-name@ec2-xxx-xx-xxx-x.compute-1.amazonaws.com:~
Accept the competition rules and copy the URLs of the datasets you want to download from kaggle.com. For example, the URL to download the sample_submission.csv file of the Intel & MobileODT Cervical Cancer Screening competition is: https://kaggle.com/c/intel-mobileodt-cervical-cancer-screening/download/sample_submission.csv.zip
Now, from the terminal use the following command to download the dataset into the instance.
wget -x --load-cookies cookies.txt https://kaggle.com/c/intel-mobileodt-cervical-cancer-screening/download/sample_submission.csv.zip
Install the CurlWget Chrome extension.
Start downloading your Kaggle dataset in the browser; CurlWget will give you the full wget command. Paste that command into the terminal (with sudo if needed).
Job done.
Install the cookies.txt extension in Chrome and enable it.
Log in to Kaggle.
Go to the competition page you want the data from.
Click the cookies.txt extension icon at the top right; it will download the current page's cookies into a cookies.txt file.
Transfer the file to the remote machine using scp or another method.
Copy the data link shown on the Kaggle page (right-click and copy the link address).
Run wget -x --load-cookies cookies.txt <datalink>

Process stops when one URL in file causes error

I use youtube-dl -a filename to download the videos. However, when one URL in the list fails, the whole process exits. Is there a way to skip the failing URL and proceed with the remaining URLs?
The man page of youtube-dl says:
-i, --ignore-errors Continue on download errors, for example to skip unavailable
videos in a playlist
Thus:
youtube-dl -i -a filename
Edit: I strongly advise you to run
youtube-dl -U
prior to any download, as the world of online video is fast-changing and updates often fix download errors. Moreover, some errors are due to content restrictions and can be solved by passing a login and password to the tool:
youtube-dl -u USERNAME -p PASSWORD
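Putting those pieces together for the batch-file case from the question (a sketch; the credentials are only needed for restricted content):
youtube-dl -U
youtube-dl -i -u USERNAME -p PASSWORD -a filename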

Setting Content-Type for static website hosted on AWS S3

I'm hosting a static website on S3. To push my site to Amazon I use the s3cmd command line tool. All works fine except setting the Content-Type to text/html;charset=utf-8.
I know I can set the charset in the meta tag in the HTML file, but I would like to avoid it.
Here is the exact command I'm using:
s3cmd --add-header='Content-Encoding':'gzip' \
      --add-header='Content-Type':'text/html;charset=utf-8' \
      put index.html.gz s3://www.example.com/index.html
Here is the error I get:
ERROR: S3 error: 403 (SignatureDoesNotMatch): The request signature we calculated does not match the signature you provided. Check your key and signing method.
If I remove the ;charset=utf-8 part from the above command it works, but the Content-Type gets set to text/html not text/html;charset=utf-8.
Two-step process to solve your problem.
(1) Upgrade your installation of s3cmd. Version 1.0.x cannot set the charset. Install from master on GitHub; master includes fixes for the bugs that caused the content-type format not to be recognized and the "called before definition" problem in earlier versions.
To install s3cmd from master on OSX do the following:
git clone https://github.com/s3tools/s3cmd.git
cd s3cmd/
sudo python setup.py install   # sudo optional, depending on your setup
Make sure the Python install location is on your PATH by adding the following to your .profile, .bashrc, or .zshrc (again, depending on your system).
export PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:$PATH"
But if you use Homebrew, this might cause conflicts, so just symlink to the executable instead:
ln -s /Library/Frameworks/Python.framework/Versions/2.7/bin/s3cmd /usr/local/bin/s3cmd
Close terminal and reopen.
s3cmd --version
will still output
s3cmd version 1.5.0-alpha3, but it's the patched version.
(2) Once upgraded, use:
s3cmd --acl-public --no-preserve --add-header="Content-Encoding:gzip" --add-header="Cache-Control:public, max-age=86400" --mime-type="text/html; charset=utf-8" put index.html s3://www.example.com/index.html
If the upload succeeds and sets the Content-Type to "text/html; charset=utf-8" but you see this warning in the process:
WARNING: Module python-magic is not available...
you can either ignore it or install python-magic. I prefer to live without python-magic: I find that if you don't specifically set the mime-type, python-magic often guesses wrong. If you do install python-magic, be sure to set --mime-type="application/javascript" in s3cmd, or python-magic will guess "application/x-gzip" if you gzip your js locally (see the example after the install commands below).
Install python-magic:
sudo pip install python-magic
pip broke with a recent OS X upgrade, so you may need to update pip first:
sudo easy_install -U pip
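For the gzipped-JavaScript case mentioned above, the upload would look roughly like this (a sketch; app.js.gz and the bucket path are placeholders):
s3cmd --acl-public --add-header="Content-Encoding:gzip" --mime-type="application/javascript" put app.js.gz s3://www.example.com/app.js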
That will do it. All of this works with s3cmd sync too, not just put. I suggest wrapping s3cmd sync in a thor-type task so you don't forget to set the mime-type on any particular file (if you are using python-magic on gzipped files).
There is a gist with an example thor task for deploying a static Middleman site to S3; it lets you rename files locally and use s3cmd sync rather than renaming them one by one with s3cmd put.
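In place of a thor task, a small shell wrapper gives the same protection against forgetting the headers. This is only a sketch: the bucket name and build directory are placeholders, and it assumes the directory holds only the gzipped HTML files (sync other asset types separately with their own --mime-type).
#!/bin/sh
# deploy.sh: sync a locally gzipped static site to S3 with explicit headers,
# so the mime-type is never left for python-magic to guess.
BUCKET="s3://www.example.com"
BUILD_DIR="build"

s3cmd sync --acl-public --no-preserve \
  --add-header="Content-Encoding:gzip" \
  --add-header="Cache-Control:public, max-age=86400" \
  --mime-type="text/html; charset=utf-8" \
  "$BUILD_DIR/" "$BUCKET/"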