how to use proxy on youtube-dl? - youtube-dl

I want to use a proxy, so I run this:
youtube-dl --proxy socks5://127.0.0.1:1080
This is the error I get:
Usage: youtube-dl [OPTIONS] URL [URL...]
youtube-dl: error: You must provide at least one URL.
What is the problem here?

The option --proxy ... just applies to that invocation of youtube-dl. To download a video using a proxy, add the video URL to the command line, like this:
youtube-dl --proxy socks5://127.0.0.1:1080 https://youtu.be/BaW_jenozKc
If you want to use a proxy for all further invocations, create a configuration file with the contents
--proxy socks5://127.0.0.1:1080
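If you drive youtube-dl from Python rather than the shell, the same proxy setting can be passed through its embedded API. A minimal sketch, reusing the URL from the example above:
import youtube_dl  # the Python module shipped with the youtube-dl package

ydl_opts = {"proxy": "socks5://127.0.0.1:1080"}  # same value as the --proxy flag
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(["https://youtu.be/BaW_jenozKc"])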

How to download a video and playlist from YouTube using youtube-dl
Step 1. Download the exe file from https://github.com/ytdl-org/youtube-dl
I used the following command to download on Windows 8.1:
E:>youtube-dl.exe --proxy https://10.20.30.10:8080 https://www.youtube.com/playlist?list=xx
Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
To enable SOCKS proxy, specify a proper
scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds
--source-address IP Client-side IP address to bind to
-4, --force-ipv4 Make all connections via IPv4
-6, --force-ipv6 Make all connections via IPv6

Related

Bypassing Cloud Run 32mb error via HTTP2 end to end solution

I have an API query that runs during a POST request on one of my views to populate my dashboard page. I know the response size is ~35 MB (greater than the 32 MB limit set by Cloud Run). I was wondering how I could bypass this.
My configuration uses a Hypercorn server serving my Django web app as an ASGI app. I have 2 minimum instances, 1 GB RAM, and 2 CPUs per instance. I have run this Docker container locally and can't reduce the amount of data required, and I also do not want to store the data due to costs. This seems to be the cheapest route. Any pointers or ideas would be helpful. I understand that I can bypass this via an HTTP/2 end-to-end solution, but I am unable to do so currently. I haven't created any additional Hypercorn configurations. Any help appreciated!
The Cloud Run HTTP response limit is 32 MB and cannot be increased.
One suggestion is to compress the response data. Django has compression libraries for Python, or just use zlib/gzip:
import gzip
data = b"Lots of content to compress"
cdata = gzip.compress(data)
# return compressed data in response
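A rough sketch of returning that compressed payload from a Django view (the view and helper names here are hypothetical); the client then has to honour the Content-Encoding header:
import gzip

from django.http import HttpResponse

def dashboard(request):
    payload = build_dashboard_json()  # hypothetical helper returning the ~35 MB JSON as bytes
    compressed = gzip.compress(payload)  # shrink the body below the 32 MB response limit
    response = HttpResponse(compressed, content_type="application/json")
    response["Content-Encoding"] = "gzip"  # tell the client the body is gzip-compressed
    return response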
Cloud Run supports HTTP/1.1 server-side streaming, which has no response size limit. All you need to do is use chunked transfer encoding.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
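A minimal sketch of that idea with Django's StreamingHttpResponse, assuming the dashboard payload can be produced in pieces (the generator name is hypothetical); when no Content-Length is set, the server sends the body with chunked transfer encoding:
from django.http import StreamingHttpResponse

def dashboard_stream(request):
    def body():
        # Yield the ~35 MB payload piece by piece instead of building one large response body.
        for piece in generate_dashboard_rows():  # hypothetical generator of byte chunks
            yield piece
    return StreamingHttpResponse(body(), content_type="application/json")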

How could I start envoy from a dumped config. Which is generated by /config_dump

When debugging Envoy, I tried to run it from a dumped config file, but I couldn't figure it out.
I dumped the config using the Envoy admin API '/config_dump':
curl -X POST http://127.0.0.1:15000/config_dump -o envoy.config
But I can't start it up; there are errors:
envoy --config-path envoy.config
...
[2019-12-22 12:40:50.313][194][critical][main] [external/envoy/source/server/server.cc:98] error initializing configuration 'envoy.config': Protobuf message (type envoy.config.bootstrap.v2.Bootstrap reason INVALID_ARGUMENT:configs: Cannot find field.) has unknown fields
[2019-12-22 12:40:50.313][194][info][main] [external/envoy/source/server/server.cc:607] exiting Protobuf message (type envoy.config.bootstrap.v2.Bootstrap reason INVALID_ARGUMENT:configs: Cannot find field.) has unknown fields
The dumped config is actually not intended to be used to start up the server. You start a server with a Bootstrap Config, but if you look closely at the output of the /config_dump endpoint, it actually contains 5 or more separate config dumps. My local Envoy (1.12.2) actually shows config dumps for:
Bootstrap Config
Clusters
Listeners
ScopedRoutes
Routes
Secrets
You can read more about the output structure in the config dump docs, but in summary it's a totally different structure.
If you do take the output of /config_dump and strip it down to just the bootstrap config field, you can indeed start the server with it.
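For example, here is a small Python sketch that pulls the bootstrap section out of the dump. The field names follow the documented /config_dump structure, but the exact @type value varies by Envoy version:
import json

# Read the dump produced by the admin endpoint (envoy.config from the question).
with open("envoy.config") as f:
    dump = json.load(f)

# /config_dump returns a "configs" list with one entry per subsystem; keep only the bootstrap one.
bootstrap = next(
    entry["bootstrap"]
    for entry in dump["configs"]
    if entry.get("@type", "").endswith("BootstrapConfigDump")
)

# Envoy accepts JSON config files, so the result can be passed to --config-path.
with open("bootstrap.json", "w") as f:
    json.dump(bootstrap, f, indent=2)
Running envoy --config-path bootstrap.json with the extracted file should then get past the "unknown fields" error.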

npm:youtube-dl and Lambda HTTP Error 429: Too Many Requests

I am running an npm package, youtube-dl, through a Lambda function, as I want to create an online converter.
I have suddenly started to run into the following error message:
{
"errorMessage": "Command failed: /var/task/node_modules/youtube-dl/bin/youtube-dl --dump-json --format=best[ext=mp4] https://www.youtube.com/watch?v=MfTbHITdhEI\nERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.\n",
"errorType": "Error",
"stackTrace": ["ERROR: Unable to download webpage: HTTP Error 429: Too Many Requests (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.", "", "ChildProcess.exithandler (child_process.js:275:12)", "emitTwo (events.js:126:13)", "ChildProcess.emit (events.js:214:7)", "maybeClose (internal/child_process.js:925:16)", "Process.ChildProcess._handle.onexit (internal/child_process.js:209:5)"]
}
Edit: I have run this a few times when I was testing the other day, but today I only ran it once.
I think that the IP address used by my Lambda function has now been blacklisted. I'm unsure how to proceed as I am a junior and very new to all this.
Is there a way to resolve this? Can I get a new IP address? Is this going to be super costly?
youtube-dl lacks a delay (requests-per-time limit) option
(see the suggestion at the bottom of my post).
NEVER download more than one video with youtube-dl.
You can look up the youtube-dl authors' contact details (e-mail etc.) and write to them directly, and also open an issue about it on the GitHub page. The more requests they get, the sooner they may fix it.
Currently they have plenty of identical requests about this issue on GitHub, but they tend to lock discussions and close tickets about this problem.
This is some sort of misbehaviour, I believe.
I also found that the developer suggests using a proxy instead of introducing a delay option in the code, which is extremely funny.
OK, regarding using a proxy: this does not actually solve the problem, since it is a lack of program design, and whether you use a proxy or not, the YouTube limits are still there.
Please note:
This causes not only the error in the subject but also blocking of your IP by YouTube.
Once you hit this situation, YouTube will block your IP as suspicious again and again, even with a small number of requests. This causes tremendous problems, since the IP is marked as suspicious.
Without an option to limit requests per unit of time (with a safe value by default), I consider youtube-dl dangerous software that is bound to cause problems, and I stopped using it until this option is introduced.
RECOMMENDATIONS:
Use Ctrl+S (suspend) and Ctrl+Q (resume) while youtube-dl is collecting the digest for many videos (when you have already downloaded many videos of a channel but new ones are still there). I suspend it for a few minutes after every 10.
And use --limit-rate 150K (or as low as is sane); this may help you avoid hitting the limit, since the whole transmission is shaped.
Ok, so I found this response: https://stackoverflow.com/a/45339683/9793169
I am wondering if it's possible that, because our volume is low, we just always end up using the same container and hence the same IP address?
Yes, that is exactly the reason. A container is only spawned if no containers are already available. After a few minutes of no further demand, excess/unneeded containers are destroyed.
If so is there any way to prevent this?
No, this behavior is by design.
SOLUTION:
I logged out for 20 minutes, went back to the function, and ran it again. It worked.
Not my solution; it took me a while to understand what he meant (reading is an art). It worked for me.
(see: https://askubuntu.com/questions/1220266/youtube-dl-do-not-working-http-error-429-too-many-requests-how-can-i-solve-this)
You have to use the option --cookies in combination with a current/correct cookie file.
Here are the steps I followed:
1. If you use Firefox, install the cookies.txt add-on and enable it
2. Clear your browser cache and browser cookies (privacy reasons)
3. Go to google.com and log in with your Google account
4. Go to youtube.com
5. Click on the cookies.txt add-on, export the cookies, and save them as cookies.txt (in the same directory from which you are going to run youtube-dl)
6. This worked for me: youtube-dl --cookies cookies.txt https://www.youtube.com/watch?v=....
Hope it helps.
Use the --force-ipv4 option in the command:
youtube-dl --force-ipv4 ...
What you should do is handle that error by retrying the requests that are throttled.
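A rough sketch of that retry idea in Python, assuming youtube-dl is invoked as a subprocess; the backoff values are illustrative:
import subprocess
import time

def download_with_retry(url, attempts=5):
    delay = 5  # seconds; illustrative initial backoff
    for _ in range(attempts):
        result = subprocess.run(
            ["youtube-dl", "--format=best[ext=mp4]", url],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return result.stdout
        if "429" in result.stderr:
            time.sleep(delay)  # back off before retrying a throttled request
            delay *= 2         # exponential backoff
            continue
        raise RuntimeError(result.stderr)
    raise RuntimeError("still throttled after {} attempts".format(attempts))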

How to schedule task to call gRPC method?

I have a .NET server running in Google Kubernetes Engine. It is configured to use gRPC through Google Cloud Endpoints. Now I need to schedule a task to call my gRPC method once per day.
The first thing I tried was to use Google Cloud Scheduler to call HTTP methods directly. For that I have:
Set up HTTP-to-gRPC transcoding on my server to call my gRPC method through HTTP.
Created and enabled an SSL certificate as described here.
Created a service account in the IAM & admin console with the Service Account Token Creator and Service Account User permissions.
Created a Cloud Scheduler job with my URL and an Auth header as an OIDC token using the service account created above.
Deployed a Google Cloud Endpoints configuration with the following parameters (among others):
authentication:
  providers:
  - id: google_service_account
    issuer: MY_SERVICE_ACCOUNT_EMAIL
    jwks_uri: https://www.googleapis.com/robot/v1/metadata/x509/MY_SERVICE_ACCOUNT_EMAIL
  rules:
  - selector: "*"
    requirements:
    - provider_id: google_service_account
After that, when I run the scheduler job, it returns the result "Failed". In the logs it writes an ERROR with status UNKNOWN.
The second thing I tried was to use Google Cloud Scheduler to publish a message to a Pub/Sub topic with my server as the subscriber.
Also unsuccessful, because I can't verify ownership of the Google Cloud Endpoints domain. I asked a related question here: How to verify ownership of Google Cloud Endpoints service URL?
Now the question: what is the best way to schedule a task that calls a gRPC method, given the following environment:
.Net server running on GKE
gRPC
Automated periodic invocation of that task (I can call it manually, but that defeats the purpose)
So you were able to make an HTTP call manually, but not automatically via Google Cloud Scheduler, is that correct?
If so, check whether the request reaches the Cloud Endpoints proxy in the Cloud Console Endpoints logging; it may give you some hints.
Distributed scheduler
For more details, refer to the source code: Distributed scheduler
This application can be run on different hosts and offers functionality to
schedule execution of an arbitrary command at a particular time or periodically.
There are two ways to communicate with the application: gRPC and REST. Remote
interfaces are specified in the dsched.proto file.
The corresponding REST API can also be found there in the form of API
annotations. We also provide generated Swagger files.
To specify task execution timing, we use the notation adopted by cron.
Scheduled tasks are stored in a file and loaded automatically during startup.
Building
Install gRPC
Install gRPC gateway
To parse crontab statements and schedule task execution, we use the
gopkg.in/robfig/cron.v2 library, so it should also be installed: go get -u gopkg.in/robfig/cron.v2. Documentation can be found here.
Get the dsched package: go get -u gitlab.com/andreynech/dsched
Now it is possible to run the standard go build command in the dscheduler and
gateway directories to generate binaries for the scheduler and the REST/JSON API
gateway. It might also be helpful to examine our
CI configuration file to see how we set up the build environment.
Running
All the scheduling functionality is implemented by the dscheduler executable, so
it can be run on system startup or on demand. As described by dscheduler --help,
there are two command-line parameters:
-i string - File name to store task list (default "/var/run/dscheduler.db")
-p string - Endpoint to listen (default ":50051")
If there is a need to offer a REST/JSON API, the gateway application located in the
gateway directory should be run. It could reside on the same host as
dscheduler, but typically it would be another host which is accessible over
HTTP from outside and at the same time can talk to dscheduler running on the
internal network. This setup was also the reason to split the scheduler and
gateway into two executables. gateway is a mostly generated application and
supports several command-line parameters described by running gateway --help.
An important parameter is -sched_endpoint string, which is the endpoint of the Scheduler
service (default "localhost:50051"). It specifies the host name and port
where dscheduler is listening for requests.
Scheduling tasks (testing)
There are three ways to control the scheduler server:
Using the Go client implemented in the cli/ directory
Using the Python client implemented in the py_cli directory
Using the REST/JSON API gateway and curl
The Go and Python clients have a similar set of command-line parameters.
$ ./cli --help
Usage of cli:
-a string
The command to execute at time specified by -c parameter
-c string
Statement in crontab format describes when to execute the command
-e string
Host:port to connect (default "localhost:50051")
-l List scheduled tasks
-p Purge all scheduled tasks
-r int
Remove the task with specified id from schedule
-s Schedule task. -c and -a arguments are required in this case
They use the gRPC protocol to talk to the scheduler server. Here are several
example invocations:
$ ./cli -l  (list currently scheduled tasks)
$ ./cli -s -c "#every 0h00m10s" -a "df"  (schedule the df command for execution every 10 seconds)
$ ./cli -s -c "0 30 * * * *" -a "ls -l"  (schedule the ls -l command to run every 30 minutes)
$ ./cli -r 3  (remove the task with ID 3)
$ ./cli -p  (remove all scheduled tasks)
It is also possible to use curl to invoke dscheduler functionality over the
REST/JSON API gateway. Assuming that the dscheduler and gateway applications
are running, here are some invocations to list, add, and remove scheduling
entries from the same host (localhost):
curl 'http://localhost:8080/v1/scheduler/list'  (list currently scheduled tasks)
curl -d '{"id":0, "cron":"#every 0h00m10s", "action":"ls"}' -X POST 'http://localhost:8080/v1/scheduler/add'  (schedule the ls command for execution every 10 seconds)
curl -d '{"id":0, "cron":"0 30 * * * *", "action":"ls -l"}' -X POST 'http://localhost:8080/v1/scheduler/add'  (schedule ls -l to run every 30 minutes)
curl -d '{"id":2}' -X POST 'http://localhost:8080/v1/scheduler/remove'  (remove the task with ID 2)
curl -X POST 'http://localhost:8080/v1/scheduler/removeall'  (remove all scheduled tasks)
All changes are automatically saved to the file.
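The gateway can of course be called from any HTTP client, not just curl; for instance, a short Python sketch using the requests library against the add endpoint shown above (assuming the gateway listens on localhost:8080):
import requests

# Schedule "df" every 10 seconds via the REST/JSON gateway (mirrors the curl example above).
task = {"id": 0, "cron": "#every 0h00m10s", "action": "df"}
resp = requests.post("http://localhost:8080/v1/scheduler/add", json=task)
print(resp.status_code, resp.text)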
Thoughts on scheduler service discovery
In large deployment scenarios (like hundreds of hosts) it might be a
challenging problem to find out all the IP addresses and ports where the scheduler
service is started. It would be pretty easy to add support for Zeroconf
(Bonjour/Avahi) technology to simplify service discovery. As an alternative, it
might be possible to implement something similar to the CORBA Naming Service,
where running services register themselves and the location of the naming service is
well known. We decided to collect feedback before settling on a particular
service discovery implementation. So your input is very welcome!

Is there any option to give the management or service port as a system argument in WSO2 other than portOffset?

I wanted to pass the management console port (specified in catalina-server.xml) and the service HTTP port (specified in axis2.xml) as system properties (using -DmgmntPort=9292 -DservPort=8282) while starting the WSO2 server. I tried -DhttpsPort but it is not working. Please help.
I don't think there is an option that allows such usage. I looked into the startup script and found that the port is always 9443 by default, but you can configure an offset.
That means if you have an offset of 10, then the actual port number will be 9453 = 9443 + 10.
An example of such a command is below. Let's assume your distribution is located in /var/lib/wso2esb-4.9.0.
Rename WSO2_HOME/repository/conf/carbon.xml to carbon.original.xml,
then add to the startup script a handler for an input variable holding the offset. Let's call it offset.
The command
sed "s/<Offset>0<\/Offset>/<Offset>$offset<\/Offset>/" /var/lib/wso2esb-4.9.0/repository/conf/carbon.original.xml > /var/lib/wso2esb-4.9.0/repository/conf/carbon.xml
will create a new carbon.xml in the proper directory, and it will be used to configure the ports.
Use -DportOffset=[offset value] when you start the server.
Ex:
./wso2server.sh -DportOffset=3