Find files accessible outside of my organization - regex

With Drive.Files.List I can, using the 'q' parameter, get all files a user can read/write or owns. I would like to be able to use a regular expression in the query value, for example setting q to "not '.+#my-org.com' in writers".
Is such a query already supported?
Is there another way (other than invoking Drive.Permissions.List for each and every file in my Drive) to get this information?

It seems the only account-level Drive API is part of the Reports API (the activities list). This API (and the admin console's Audit > Drive section) is only supported with the unlimited license. I still haven't found a proper API to get the Drive state (list all file metadata in the account, permissions, etc.); it seems the state can only be inferred by analyzing the relevant activity events, assuming the activity is not evicted after a predefined retention period.
My conclusion, at the moment, is that there is no "root" directory at the account level. "root" exists only with respect to the logged-in user.
I would be more than happy to be proved wrong.
Uri
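For reference, since regular expressions are not supported in the q parameter, the closest per-user workaround I know of is to list files together with the permissions field and filter client-side. A minimal sketch in Python (google-api-python-client assumed; creds is an OAuth2 credential obtained beforehand, my-org.com is a placeholder domain, and the permissions field is only populated for files whose permissions the caller may read):

from googleapiclient.discovery import build

# Assumes `creds` already holds valid OAuth2 credentials with a Drive read scope.
service = build("drive", "v3", credentials=creds)

DOMAIN = "my-org.com"  # placeholder organization domain

page_token = None
while True:
    resp = service.files().list(
        q="'me' in owners",  # only files the current user owns
        fields="nextPageToken, files(id, name, permissions)",
        pageToken=page_token,
    ).execute()
    for f in resp.get("files", []):
        # The permissions list may be absent if the caller cannot read it.
        for perm in f.get("permissions", []):
            email = perm.get("emailAddress", "")
            if email and not email.endswith("@" + DOMAIN):
                print(f"{f['name']} ({f['id']}) is shared with {email}")
    page_token = resp.get("nextPageToken")
    if not page_token:
        break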

Related

GCP: Is it possible to have access to a resource without project access?

This is my first experience with Google Cloud Platform and I'm confused.
I've been given access to a resource:
xxx#gmail.com has granted you the following roles for resource resource_name(projects/project_name/datasets/ClientsExport/tables/resource_name) BigQuery Data Editor
But if I open BigQuery Data Editor, I don't see project_name or resource_name. Searching by resource_name also returns no results.
Is this the only access I have in the project (I didn't receive any other access grants or emails)?
Could you please help me with this? Should I request some additional access so that resource_name becomes available? Or is there another way to find the resource?
Thank you in advance!
According to the message, you have access to BigQuery data inside a table. You can query it from your project; you are authorized to access it (and to write to it as well, because you are an editor).
However, this table isn't in your project, it's in another project; that's why you don't see it directly in the BigQuery console. In addition, you don't have the right to read the metadata (roles/bigquery.metadataViewer) on the dataset of the other project. As a result, you also can't view the table schema in the console, but the bq CLI allows you to view it.
I had some discussions with the Google BigQuery team about that (because I ran into the same issue in my company), and updates should happen by the end of the year (or early in 2022) to fix this "view" issue in the console.
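To make that concrete, a minimal Python sketch (google-cloud-bigquery assumed; the project, dataset and table names are taken from the grant message, and your own billing project is a placeholder) that queries the shared table and prints its schema even though it is not visible in the console:

from google.cloud import bigquery

# Run jobs in a project you have access to; the shared table lives elsewhere.
client = bigquery.Client(project="your_own_project")

table_id = "project_name.ClientsExport.resource_name"  # from the grant message

# The table can be queried directly even though it is not listed in the console.
query = f"SELECT * FROM `{table_id}` LIMIT 10"
for row in client.query(query).result():
    print(dict(row.items()))

# The schema can also be fetched programmatically (Data Editor includes tables.get).
table = client.get_table(table_id)
for field in table.schema:
    print(field.name, field.field_type)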
It looks like you have IAM permission to access a specific resource in BigQuery but cannot access it from the GUI.
Some reasons you may not see access on your GUI:
You have permission to interact with BigQuery but don't have access to any of the data.
You aren't a member of the organization which provided the resources, and it has higher-level permissions (at the org level) which prevent sharing of resources outside of the org.
Your access is restricted to the command line/app level. (If your account is a service account then this is likely the case.)

Storing S3 Urls vs calling listObjects

I have an app that has an attachments feature for users. They can upload documents to S3 and then revisit and preview and/or download said attachments.
I was planning on storing the S3 URLs in the DB and then pre-signing them when the user needs them. A caveat I'm finding is that this can lead to edge cases between S3 and the DB.
I.e. if a file gets removed from S3 but its URL does not get removed from the DB (or vice versa). This can lead to data inconsistency and may mislead users.
I was thinking of just getting the URLs over the network by using listObjects in the S3 client SDK. I don't really need to store the URLs, and this guarantees the user gets what's actually in S3.
The only con here is that it makes an API request (as opposed to a DB hit).
Any insights?
Thanks!
Using a database to store an index of files is a good idea, especially once the volume of objects increases. The ListObjects() API only returns 1000 objects per call. This might be okay if every user has their own path (so you can use ListObjects(Prefix='user1/')), but that's not ideal if you want to allow document sharing between users.
Using a database will definitely be faster to obtain a listing, and it has the advantage that you can filter on attributes and metadata.
The two systems will only get "out of sync" if objects are created/deleted outside of your app, or if there is an error in the app. If this concerns you, then use Amazon S3 Inventory, to provide a regular listing of objects in the bucket and write some code to compare it against the database entries. This will highlight if anything is going wrong.
While Amazon S3 is an excellent NoSQL database (Key = filename, Value = contents), it isn't good for searching/listing a large quantity of objects.
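For illustration, a short boto3 sketch of the listObjects approach, including the pagination mentioned above and pre-signing a download URL (the bucket name and per-user prefix are placeholders):

import boto3

s3 = boto3.client("s3")

BUCKET = "my-app-attachments"  # placeholder bucket name
PREFIX = "user1/"              # one prefix per user, as suggested above

# list_objects_v2 returns at most 1000 keys per call, so use the paginator.
paginator = s3.get_paginator("list_objects_v2")
keys = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        keys.append(obj["Key"])

# Pre-sign a download URL for the first attachment (valid for one hour).
if keys:
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": keys[0]},
        ExpiresIn=3600,
    )
    print(url)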

How to restrict Google Cloud Storage uploads

I have a mobile application that uses Google Cloud Storage. The application allows each registered user to upload a specific number of files.
My question is, is there a way to do some kind of check before the storage upload? Or do I need to implement a separate reservation API of sorts that OKs an upload step?
Any alternative suggestions are welcome too, of course.
warning: Not an authoritative answer. Happy to accept removal or update requests.
I am not aware of any GCS or Firebase Cloud Storage mechanisms that will inherently limit the number of files (objects) that a given user can create. If it were me, this is how I would approach the puzzle.
I would create a database (eg. Firestore / Datastore) that has a key for each user and a value which is the number of files they have uploaded. When a user wants to upload a new file, it would first make a REST call to a Cloud Function that I would write. This Cloud Function would implicitly know the identity of the calling user. It would look up the record in the database and determine if we are allowed to upload a new file. If no, then return an error and end of story. If yes, then increment the value in the database. Next I would create a GCS "signed URL" that can be used to permit an upload. It would be that signed URL that the Cloud Function would return. The app that now wishes to upload can use that signed URL to perform the actual upload.
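A rough sketch of that flow, assuming Python, Firestore and a V4 signed URL (the function name, bucket, quota and identity check are all placeholders; a real deployment would verify an ID token and needs credentials that are allowed to sign URLs):

import datetime

import functions_framework
from google.cloud import firestore, storage

MAX_FILES = 10             # placeholder per-user quota
BUCKET = "my-app-uploads"  # placeholder bucket name

db = firestore.Client()
gcs = storage.Client()

@functions_framework.http
def request_upload(request):
    # Placeholder identity check; a real function would verify an ID token.
    user_id = request.headers.get("X-User-Id", "")
    if not user_id:
        return ("Unauthenticated", 401)

    doc_ref = db.collection("upload_counts").document(user_id)

    @firestore.transactional
    def reserve(tx):
        snap = doc_ref.get(transaction=tx)
        count = (snap.to_dict() or {}).get("count", 0) if snap.exists else 0
        if count >= MAX_FILES:
            return None
        tx.set(doc_ref, {"count": count + 1})
        return count + 1

    if reserve(db.transaction()) is None:
        return ("Upload quota exceeded", 403)

    # Signed URL generation requires credentials able to sign (e.g. a service
    # account key or the IAM signBlob permission).
    blob = gcs.bucket(BUCKET).blob(f"{user_id}/{request.args.get('name', 'file')}")
    url = blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=15),
        method="PUT",
    )
    return {"upload_url": url}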
I would also add metadata to each file uploaded to identify the logical uploader (user) of the file. That can be then used for reconciliation if needed. We could examine all the files in the bucket and re-build the database of how many files each user had uploaded.
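The reconciliation part could be as simple as this sketch (google-cloud-storage assumed; the bucket name and the uploader metadata key are placeholders):

from collections import Counter

from google.cloud import storage

gcs = storage.Client()
counts = Counter()
# Rebuild per-user counts from object metadata set at upload time.
for blob in gcs.list_blobs("my-app-uploads"):          # placeholder bucket
    uploader = (blob.metadata or {}).get("uploader")   # placeholder metadata key
    if uploader:
        counts[uploader] += 1
print(dict(counts))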
A possible alternative to this story is for the Cloud Function to not return a signed-url but instead receive the data to be uploaded in the same request. If the check on number of files passes, then the Cloud Function could be a proxy to a GCS write to create the file directly. This alternative needs to be carefully examined as a function of the sizes of the files to be uploaded. If the files are large this may be a very poor solution. We want to be in and out of Cloud Functions as quickly as possible and holding a Cloud Function "around" to service data pass through isn't great. We may want to look at Cloud Run in that case as it supports concurrency in the instance without increasing the cost per call.

Should I store failed login attempts in AWS Cognito or Dynamo DB?

I have a requirement to build a basic "3 failed login attempts and your account gets locked" functionality. The project uses AWS Cognito for authentication, and the Cognito PreAuth and PostAuth triggers, which run a Lambda function, look like they will help here.
So the basic flow is to increment a counter in the PreAuth Lambda, check it and block the login there, or reset the counter in the PostAuth Lambda (so successful logins don't end up locking the user out). Essentially it boils down to:
PreAuth Lambda
if failed-login-count > LIMIT:
    block login
else:
    increment failed-login-count
PostAuth Lambda
reset failed-login-count to zero
At the moment I am using a dedicated DynamoDB table to store the failed-login-count for a given user. This seems to work fine for now.
Then I figured it'd be neater to use a custom attribute in Cognito (using CognitoIdentityServiceProvider.adminUpdateUserAttributes) so I could throw away the DynamoDB table.
However reading https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-dg.pdf the section titled "Configuring User Pool Attributes" states:
Attributes are pieces of information that help you identify individual users, such as name, email, and phone number. Not all information about your users should be stored in attributes. For example, user data that changes frequently, such as usage statistics or game scores, should be kept in a separate data store, such as Amazon Cognito Sync or Amazon DynamoDB.
Given that the counter will change on every single login attempt, the docs would seem to indicate I shouldn't do this...
But can anyone tell me why? Or if there would be some negative consequence of doing so?
As far as I can see, Cognito billing is purely based on storage (i.e. number of users), and not operations, whereas Dynamo charges for read/write/storage.
Could it simply be AWS not wanting people to abuse Cognito as a storage mechanism? Or am I being daft?
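For context, the custom-attribute variant I'm describing would be roughly this (the boto3 equivalent of the adminUpdateUserAttributes call; the attribute name is hypothetical and would have to be defined on the user pool first):

import boto3

cognito = boto3.client("cognito-idp")

def set_failed_logins(user_pool_id, username, count):
    # custom:failed_logins is a hypothetical attribute defined on the user pool.
    cognito.admin_update_user_attributes(
        UserPoolId=user_pool_id,
        Username=username,
        UserAttributes=[{"Name": "custom:failed_logins", "Value": str(count)}],
    )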
We are dealing with a similar problem, and the main reason we decided to store extra attributes in a DB is that Cognito has quotas on all actions and AdminUpdateUserAttributes is limited to 25 requests per second.
More information here:
https://docs.aws.amazon.com/cognito/latest/developerguide/limits.html
So if you have a pool with 100k users or more, it can create a bottleneck if you want to update a Cognito user record with every login, etc.
Cognito UserAttributes are meant to store information about the users. This information can then be read from the client using the AWS Cognito SDK, or just by decoding the idToken on the client-side. Every custom attribute you add will be visible on the client-side.
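To illustrate that visibility, a tiny sketch that decodes the idToken payload without verifying the signature (the attribute name is hypothetical):

import base64
import json

def decode_payload(id_token):
    # The middle JWT segment is the claims payload; restore base64 padding.
    payload_b64 = id_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# decode_payload(token).get("custom:failed_logins")  # visible to the client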
Other downsides of custom attributes are that:
You only have 25 values to set
They cannot be removed or changed once added to the user pool.
I have personally used custom attributes and the interface to manipulate them is not excellent. But that is just a personal thought.
If you want to store this information and not depend on DynamoDB, you can use Amazon Cognito Sync. Besides the service, it offers a client with great features that you can incorporate into your app.
AWS DynamoDB appears to be your best option; it is commonly used for such use cases. Some of the benefits of using it:
You can store a separate record for each login attempt with as much info as you want, such as IP address, location, user-agent, etc. You can also add a datetime attribute that the PreAuth Lambda can use to query by time range, for example failed attempts within the last 30 minutes.
You don't need to manage cleanup of the table, because you can set a TTL on DynamoDB records so that each record is deleted automatically after the specified time.
You can also archive items in S3
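Putting that together, a minimal sketch of the PreAuth/PostAuth Lambdas backed by DynamoDB with a TTL attribute (Python/boto3 assumed; the table name, limit and window are placeholders):

import time

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("FailedLoginAttempts")  # placeholder table, key = username

LIMIT = 3
WINDOW_SECONDS = 30 * 60  # counter lifetime, enforced via the TTL attribute

def pre_auth_handler(event, context):
    username = event["userName"]
    item = table.get_item(Key={"username": username}).get("Item", {})
    count = int(item.get("count", 0))

    if count >= LIMIT:
        # Raising an exception from the PreAuth trigger rejects the sign-in.
        raise Exception("Account temporarily locked after too many failed attempts")

    # Record this attempt; DynamoDB expires the item once the TTL passes.
    table.put_item(Item={
        "username": username,
        "count": count + 1,
        "ttl": int(time.time()) + WINDOW_SECONDS,
    })
    return event

def post_auth_handler(event, context):
    # Successful login: reset the counter.
    table.delete_item(Key={"username": event["userName"]})
    return event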

How to deal with deep-level granularity with XACML in an enterprise application

I am using WSO2 IS for authorization with XACML. I am able to achieve authorization for static resources, but I am not sure about the design when it comes to finer granularity.
Example: if I have a method like getCarDetails(Object user) that should return only those cars assigned to this particular user, how do I handle this with XACML?
WSO2 provides support for PIPs, where we can use custom classes that fetch data from a database. But I am not sure whether we should make a copy of the original database on the PDP side or point the PIP at the original database so it works with live data.
Cars are dynamic in the application, e.g. currently 10 cars are assigned to user Alice; suddenly a supervisor adds 20 more cars to her list, which will be in the application-level database. How will these other 20 cars be automatically covered by the policy at the PDP level unless it also has this latest information?
I may be misunderstanding something, but across the whole application we can have many complex scenarios like this, where data for one user sometimes comes from more than 4 or 5 tables. How should that be handled?
Your question is a great one, and the answer will highlight the key benefits of XACML and externalized authorization as a whole.
In XACML, you define generic, global rules about what is allowed and what isn't, using what I would call high-level attributes, e.g. attributes of the vehicle (in your case) or of the user (role, department, ...).
For instance, a simple rule could be (using the ALFA syntax):
policy viewCars{
    target clause actionId=="view" and resourceType=="car"
    apply firstApplicable
    rule allowSameRegion{
        permit
        condition user.region==car.region
    }
}
Both the user's region and the car's region are maintained inside the application's database. The values are read using a PIP or Policy Information Point (details here).
In your example, you talk about direct assignment, i.e. a user has been directly assigned to a vehicle. In that case, the rule would become:
policy viewCars{
    target clause actionId=="view" and resourceType=="car"
    apply firstApplicable
    rule allowAssignedVehicle{
        permit
        condition user.employeeId==car.assignedUser
    }
}
This means that the assigned user information must be kept somewhere, in the application database, a CSV file, a web service, or another source of information. It means that from a management perspective, an administrator would add / remove vehicles from a user's assigned list (or perhaps the other way around: add / remove assigned users from a vehicle's assigned user list).
The XACML rule itself will not change. If the supervisor adds 20 more cars to the employee's list (maintained in the application-level database), then the PDP will be able to use that information via the PIP and access will be granted or denied accordingly.
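As a purely conceptual illustration (not WSO2- or Axiomatics-specific), this is all a PIP lookup amounts to: resolving an attribute the policy needs from the live application database at evaluation time, so no copy has to be maintained on the PDP side (the table and column names are placeholders):

import sqlite3  # stand-in for the application's real database

def resolve_assigned_user(car_id):
    # Called by the PIP when the PDP needs car.assignedUser for a decision.
    conn = sqlite3.connect("application.db")  # placeholder DB
    row = conn.execute(
        "SELECT assigned_user FROM cars WHERE id = ?", (car_id,)
    ).fetchone()
    conn.close()
    return row[0] if row else None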
The key benefit of XACML is that you could add a second rule that would state a supervisor can see the cars he/she is assigned to (the normal rule) as well as the cars assigned to his/her subordinates (a new proxy-delegate rule).
This diagram, taken from the Axiomatics blog, summarizes the XACML flow:
HTH, let me know if you have further questions. You can download ALFA here and you can watch tutorials here.