How can I download multiple files using URL links to Google Drive files - google-people-api

I have created an API for sending letters. The letters are PDF files saved on a Google Drive account, and the API loops over the records: if there are 1000 records, it fetches 1000 files from Google Drive, one for each record, all saved in a specific folder. I'm not using any API for downloading the files from Google Drive; I'm fetching them with links of the form https://drive.google.com/uc?id=FILE_ID&format=pdf or https://drive.google.com/uc?id=FILE_ID&authuser=0&export=download
// Pull the file ID out of the Drive share URL and build a direct-download URL
$query_str = explode('/', $fileURL);
$fileID = $query_str[5];
$parsedURL = "https://drive.google.com/uc?id=" . $fileID . "&export=download";
// Stream the file straight into a local PDF
file_put_contents('ClientFiels/' . $fileName . '.pdf', fopen($parsedURL, 'r'));
FILE_ID changes every time a new round of the loop starts. Problem: the API works fine for 60 to 70 records, and after that I get this error: 403 for URL: https://doc-00-5o-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/576qj60mg7jp9jdrko5ck1dtabf1ktm3/1647339525000/09925282465490487763/*/1V1AYJRwLcZxSoAFJPRzuOEMFWiKiptb4?e=download If I run the program again after 5 to 10 minutes, it works for roughly the same number of records again. When I check my API metrics in the Google console, it shows an error on the drive.files.copy method. I have tried downloading the files in different ways but got the same error. I'm not sure whether there is a quota limit or whether I need to change any settings. The Google Drive files and folders are public.
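The symptoms described here (failures after roughly 60-70 requests that clear up after a few minutes) look like rate limiting rather than a permissions problem, so spacing the requests out and backing off when a 403 comes back may help. Below is a minimal sketch of that idea, written in Go for illustration rather than the PHP used above; the uc?id= URL format comes from the question, while the delays, retry counts, and file names are assumptions.

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// downloadWithBackoff fetches one Drive file via the uc?id= URL used in the
// question, sleeping and retrying whenever the server answers 403.
func downloadWithBackoff(fileID, destPath string) error {
	url := "https://drive.google.com/uc?id=" + fileID + "&export=download"
	for attempt, wait := 0, 5*time.Second; attempt < 5; attempt, wait = attempt+1, wait*2 {
		resp, err := http.Get(url)
		if err != nil {
			return err
		}
		if resp.StatusCode == http.StatusForbidden { // treated here as "slow down"
			resp.Body.Close()
			time.Sleep(wait)
			continue
		}
		defer resp.Body.Close()
		out, err := os.Create(destPath) // assumes the target directory already exists
		if err != nil {
			return err
		}
		defer out.Close()
		_, err = io.Copy(out, resp.Body)
		return err
	}
	return fmt.Errorf("gave up on file %s after repeated 403 responses", fileID)
}

func main() {
	// Hypothetical IDs; in the real API they come from the saved records.
	ids := []string{"FILE_ID_1", "FILE_ID_2"}
	for i, id := range ids {
		if err := downloadWithBackoff(id, fmt.Sprintf("ClientFiels/letter-%d.pdf", i)); err != nil {
			fmt.Println("download failed:", err)
		}
		time.Sleep(500 * time.Millisecond) // spread the requests out instead of firing them back to back
	}
}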

Related

Spring Boot server in Elastic Beanstalk creates files that I can't see

I have a Spring Boot server deployed to an Elastic Beanstalk environment in AWS. The basic functionality is this:
1. Upload a file to the server
2. The server processes the file by doing some data manipulation.
3. The file that is created is then sent to a user via email.
The strange thing is that the functionality mentioned above is working: the output file is sent to my email inbox successfully. However, the file cannot be seen when I SSH into the instance. The entire directory that gets created for the data manipulation is just not there. I have looked everywhere.
To test this, I even created a simple function in my Spring Boot controller like this:
@GetMapping("/")
public ResponseEntity<String> dummyMethod() {
    // TODO : remove line below after testing
    new File(directoryToCreate).mkdirs();
    return new ResponseEntity<>("Successful health check. Status: 200 - OK", HttpStatus.OK);
}
If I use Postman to hit this endpoint, the directory CANNOT be seen via the terminal that I am SSHed into. The program is working, so I know the code is correct in that sense, but it's like the files and directories are invisible to me.
Furthermore, if I run the server locally (on Windows or Linux) and hit this endpoint, the directory is successfully created.
Update:
I found where the app lives in the environment, at /var/app. But my folders and files are still not there; only the source code files etc. are present. The files that my server is supposed to be creating are still missing. I can even print out the absolute path of a file after creating it, but that file still doesn't exist. Here is an example:
Files.copy(source, dest);
logger.info("Successfully copied file to: {}", dest.getAbsolutePath());
will print...
Successfully copied file to: /tmp/TESTING/Test-Results/GVA_output_2021-12-13 12.32.58/results_map_GVA.csv
That path DOES NOT exist on my server, yet the server code CAN still email the file to me after it has been processed. But if I SSH into the instance and go to that path, nothing is there.
If I use the command: find . -name "GVA*" (to search for the file I am looking for) then it prints this:
./var/lib/docker/overlay2/fbf04e23e39d61896a1c935748a63f2d3836487d9b166bae490764c30b8870ae/diff/tmp/TESTING/Test-Results/GVA_output_2021-12-09 18.15.59
./var/lib/docker/overlay2/fbf04e23e39d61896a1c935748a63f2d3836487d9b166bae490764c30b8870ae/diff/tmp/TESTING/Test-Results/GVA_output_2021-12-13 12.26.34
./var/lib/docker/overlay2/fbf04e23e39d61896a1c935748a63f2d3836487d9b166bae490764c30b8870ae/diff/tmp/TESTING/Test-Results/GVA_output_2021-12-13 12.32.58
./var/lib/docker/overlay2/fbf04e23e39d61896a1c935748a63f2d3836487d9b166bae490764c30b8870ae/merged/tmp/TESTING/Test-Results/GVA_output_2021-12-09 18.15.59
./var/lib/docker/overlay2/fbf04e23e39d61896a1c935748a63f2d3836487d9b166bae490764c30b8870ae/merged/tmp/TESTING/Test-Results/GVA_output_2021-12-13 12.26.34
./var/lib/docker/overlay2/fbf04e23e39d61896a1c935748a63f2d3836487d9b166bae490764c30b8870ae/merged/tmp/TESTING/Test-Results/GVA_output_2021-12-13 12.32.58
But this looks like something keeping track of differences between versions of files, since I see diff and merged in the paths. I just want to find where the file is actually residing.
If you need to store an uploaded file somewhere from a Spring Boot app, look at using an Amazon S3 bucket instead of writing the file to a folder on the server. For example, assume you are working with a photo app and the photos can be uploaded via the Spring Boot app. Instead of placing them in a directory on the server, use the Amazon S3 Java API to store each file in an Amazon S3 bucket.
Here is an example of a Spring Boot app that handles uploaded files by placing them in a bucket:
Creating a dynamic web application that analyzes photos using the AWS SDK for Java
This example app also shows you how to use the SES API to send data (a report, in this example) to a user via email.

Unable to import more than 1000 files from Google Cloud Storage to Cloud Data Prep

I have been trying to run a Cloud Data Prep flow which takes files from Google Cloud Storage.
The files on Google Cloud Storage get updated daily, and there are more than 1000 files in the bucket right now. However, I am not able to fetch more than 1000 files from the bucket.
Is there any way to get all the data from Cloud Storage? If not, is there an alternative way to achieve this?
You can load a large number of files using the + button next to a folder in the file browser. This will load all the files in that folder (or, more precisely, under that prefix) when running a job on Dataflow.
There is, however, a limit when browsing or using the parameterization feature: some users might have millions of files, and searching among all of them is not possible, as GCS only allows filtering by prefix.
See the limitations on that page for more details:
https://cloud.google.com/dataprep/docs/html/Import-Data-Page_57344837

How to allow Google Cloud Storage to save duplicate files and not overwrite

I am using a Google Cloud Storage bucket to save file uploads from my Django web application. However, if a file with the same name is uploaded, it overwrites the existing file, and I do not want this to happen; I want to allow duplicate files to be saved at the same location. Before moving to Google Cloud Storage, when I used my computer's hard disk to save files, Django would automatically adjust the filename in the database as well as on disk.
I upload files under the name given by the user and concatenate a timestamp (including seconds and milliseconds) onto it, but clients still see the file name exactly as they entered it, because I strip that part of the string when it is displayed in the view.
Example:
image1-16-03-2022-12-20-32-user-u123.pdf
image1-27-01-2022-8-22-32-usuario-anotheruser.pdf
Both users would see the name image1.
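A minimal sketch of that naming scheme, written in Go just for illustration (the question's setup is Django); the timestamp format and the user-id suffix mirror the example names above, and it assumes the original file name contains no hyphens.

package main

import (
	"fmt"
	"strings"
	"time"
)

// storedName appends an upload timestamp and the user id to the original
// name, so two uploads of "image1.pdf" never collide in the bucket.
func storedName(original, userID string) string {
	base, ext := original, ""
	if i := strings.LastIndex(original, "."); i >= 0 {
		base, ext = original[:i], original[i:]
	}
	stamp := time.Now().Format("02-01-2006-15-04-05")
	return fmt.Sprintf("%s-%s-user-%s%s", base, stamp, userID, ext)
}

// displayName strips everything after the first hyphen added by storedName,
// so clients still see the name they uploaded (assumes no hyphen in it).
func displayName(stored string) string {
	if i := strings.Index(stored, "-"); i >= 0 {
		return stored[:i]
	}
	return stored
}

func main() {
	name := storedName("image1.pdf", "u123") // e.g. image1-16-03-2022-12-20-32-user-u123.pdf
	fmt.Println(name, "is shown to the client as", displayName(name))
}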

Doubts about the Amazon S3 monthly calculator

I'm using Amazon S3 to store videos and some audio files (average size of 25 MB each), and users of my web and Android app (so far) can access them with no problem. But I want to know how much I'll pay once I exceed the S3 free tier, so I checked the S3 monthly calculator.
I saw that there are 5 fields:
Storage: I put 3 GB because right now there are 130 files (videos and audio)
PUT/COPY/POST/LIST Requests: I put 15 because I'll manually upload around 10-15 files each month
GET/SELECT and Other Requests: I put 10,000 because a projection tells me the users will watch/listen to those files around 10,000 times monthly
Data Returned by S3 Select: I put 250 GB (10,000 x 25 MB)
Data Scanned by S3 Select: I don't know what to put, because I don't need Amazon to scan or analyze those files.
Am I using that calculator in a proper way?
What do I need to put in "Data Scanned by S3 Select"?
Can I put only zero?
For audio and video, you can definitely specify 0 for S3 Select -- both data scanned and data returned.
S3 Select is an optional feature that only works with certain types of text files -- like CSV and JSON -- where you make specific requests for S3 to scan through the files and return matching values, rather than you downloading the entire file and filtering it yourself.
This would not be used with audio or video files.
Also, don't overlook "Data transfer out." In addition to the "get" requests, you're billed for bandwidth when files are downloaded, so this needs to show the total size of all the downloads: with your numbers, 10,000 downloads x 25 MB is roughly 250 GB per month. This line item is data downloaded from S3 via the Internet.

s3zipper is limited to downloading only 1000 files from an S3 bucket

I am using s3zipper along with PHP to stream S3 files as a zip. However, there is one issue: we have more than 1000 files to download (approximately 2K to 10K, varying), and when we send a request to s3zipper for, say, 1500 files, we get only 1000 files in the zip.
As per the AWS docs, there is a 1,000-key limitation, i.e.
the S3 API version 2 implementation of the GET operation returns some or all (up to 1,000) of the objects in a bucket.
So if we want to get more than that, we have to use the marker parameter. But in s3zipper.go the call aws_bucket.GetReader(file.S3Path) reads a file and adds it to the zip, and I am not sure how I can use a marker in this case.
I am curious how we can get around this limitation. I am a newbie to the Go language; any help in this regard will be highly appreciated.
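The 1,000-object cap applies to a single list call on the bucket, not to aws_bucket.GetReader itself, so the listing step is where the marker (or continuation token) comes in: you keep requesting the next page until the response is no longer truncated. Here is a minimal sketch using the official aws-sdk-go, whose ListObjectsV2Pages helper handles that pagination for you; the bucket name, prefix, and region are placeholders, and s3zipper itself may be built on a different S3 client.

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	svc := s3.New(sess)

	// Collect every key under the (placeholder) prefix, not just the first 1,000.
	var keys []string
	input := &s3.ListObjectsV2Input{
		Bucket: aws.String("my-bucket"), // placeholder bucket name
		Prefix: aws.String("files/"),    // placeholder prefix
	}
	err := svc.ListObjectsV2Pages(input, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, obj := range page.Contents {
			keys = append(keys, *obj.Key)
		}
		return true // keep fetching pages until the listing is no longer truncated
	})
	if err != nil {
		fmt.Println("listing failed:", err)
		return
	}
	fmt.Println("found", len(keys), "objects")
	// Each key could then be fed to the existing GetReader / zip logic, in batches if needed.
}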