Google cloud buckets - is there a way to fetch by prefix - list

Google Cloud Storage Buckets has a "list" function to get a paginated listing of the object names inside a bucket. Here are the docs:
https://developers.google.com/storage/docs/json_api/v1/buckets/list
If I want to discover whether a certain object name exists, the only (apparent) way to do so is to fetch ALL object names, one page at a time, and look through them myself. This is not scalable.
We have 10,000+ objects stored. So if I want to find gs://mybucket/my/simulated/dir/* or if I want to find gs://mybucket/my/sim*/subdir/*.txt, the only way to do so is to retrieve 600,000 bytes of listing information and filter through it with code.
The question: Does anyone know a way, short of keeping track of the object names myself somehow, to get JUST the listings I care about?

It turns out I'm crazy: I was looking at the /buckets/ documentation, when I should have been looking at the /objects/ documentation, whose list call supports filtering by prefix.
https://developers.google.com/storage/docs/json_api/v1/objects/list
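For anyone landing here: the objects list call accepts a prefix (and optionally a delimiter) parameter, so only the matching "subtree" of names has to be fetched. A minimal sketch, assuming the google-cloud-storage Python client and the bucket/paths from the question:

# Sketch only: assumes `pip install google-cloud-storage` and default credentials;
# the bucket name and paths are the ones from the question.
import fnmatch
from google.cloud import storage

client = storage.Client()

# objects.list takes a prefix, so only the "simulated directory"
# gs://mybucket/my/simulated/dir/ is returned -- no need to page through
# every object in the bucket.
for blob in client.list_blobs("mybucket", prefix="my/simulated/dir/"):
    print(blob.name, blob.size)

# The wildcard case (my/sim*/subdir/*.txt) still needs some client-side
# filtering, but the prefix narrows the listing to the relevant subtree.
# Note: fnmatch's "*" also crosses "/" boundaries, so this is approximate.
matches = [
    blob.name
    for blob in client.list_blobs("mybucket", prefix="my/sim")
    if fnmatch.fnmatch(blob.name, "my/sim*/subdir/*.txt")
]
print(matches)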


IPFS and Editing Permissions

I just uploaded a folder of 5 images to IPFS (using the Mac Desktop IPFS Client App, so it was a very simple drag and drop operation.)
So, given that I'm the one who created and published this folder, does that mean that I'm the only one who's allowed to make further modifications to it - like adding or deleting more images from it? Or can anyone out there on IPFS do that as well?
If they can, is there a way to prevent that from happening?
=======================================
UPDATED QUESTION:
My specific use-case has to do with updating the metadata of ERC721 Tokens - after they’ve already been minted.
Imagine, for example, a game where certain objects - like, say, a magical sword - gain special powers after a certain amount of usage or after the completion of certain missions by their owner. So we'd want to update this sword's attributes by editing its metadata and re-committing this updated metadata file to the blockchain.
If our game has 100 swords, for example, and we initially uploaded to IPFS a folder containing all 100 JSON files (one for each sword), then I'm pretty sure IPFS still lets you access the specific files within the hashed folder by their specific human-readable names (and not only by their hash.)
So if our sword happens to be sword #76, and our naming convention for our JSON files was of this format: "sword000.json", then sword #76's JSON metadata file would have a path such as:
http://ipfs.infura.io/QmY2xxxxxxxxxxxxxxxxxxxxxx/sword076.json
If we then edited the "sword076.json" file and drag-n-dropped it back into our master JSON folder, it would obviously cause that folder's hash/CID value to change. BUT, as long as we're able to update our Solidity contract's "tokenURI" method to look for and serve our ".json" files from this newly updated hash/CID folder name, we could still refer to the individual files within it by their regular English names. Which means we'd be good to go.
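(A tiny sketch of that naming convention, reusing the placeholder gateway URL and CID from the example above; the three-digit zero-padding is my assumption based on "sword000.json":)

# Hypothetical helper: build a token's metadata URL from the current
# folder CID and the token id, following the "sword000.json" convention.
IPFS_GATEWAY = "http://ipfs.infura.io"          # gateway from the example above
FOLDER_CID = "QmY2xxxxxxxxxxxxxxxxxxxxxx"       # placeholder CID from the question

def sword_metadata_url(token_id: int, folder_cid: str = FOLDER_CID) -> str:
    # Zero-pad to three digits so sword #76 becomes "sword076.json".
    return f"{IPFS_GATEWAY}/{folder_cid}/sword{token_id:03d}.json"

print(sword_metadata_url(76))
# http://ipfs.infura.io/QmY2xxxxxxxxxxxxxxxxxxxxxx/sword076.json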
Whether or not this is a good scheme to employ is something we can definitely discuss, but I FIRST want to go back to my original question/concern, which is that I want to make sure that WE are the ONLY ones that can update the contents of our folder - and that no one else has permission to do that.
Does that make sense?
IPFS is immutable, meaning when you add your directory along with the files, the directory gets a unique CID based on the contents of the directory. So in a sense, nobody can modify it, not even you, because it's immutable. I believe this confusion can be resolved with more background on how IPFS works.
When you add things to IPFS each file is hashed, and given a CID. The same is true for directories, but their CID can more easily be understood as a sum of the contents of the directory. So if any files in the directory are updated, added, or deleted, the directory gets a new CID.
Understanding this, if someone else added the exact same content in the exact same way, they'd end up with the exact same CID! So if two people added the same content, and a third person requested that file (or directory), both nodes would be able to serve the data, as we know it's exactly the same. The same is true if you simply shared your CID and another node pinned it: both nodes would have the same data, so if anyone requested it, both nodes would be able to serve it.
So your local copy cannot be edited by anyone - and, if you're relying on the IPFS CID as the address of your data, in a sense not even by you! This is why IPFS is typically referred to as "immutable": any data you request via an IPFS CID will always be the same. If you change any of the data, you'll get a new CID.
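To illustrate the content-addressing idea, here's a toy sketch that uses a plain SHA-256 digest as a stand-in for a CID (real IPFS CIDs involve chunking, DAG construction and multihash encoding, but the key property is the same):

# Illustration only: same bytes -> same identifier, changed bytes -> new identifier.
import hashlib

def fake_cid(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b'{"name": "sword076", "power": 10}'
edited   = b'{"name": "sword076", "power": 99}'

print(fake_cid(original))   # anyone adding these exact bytes gets this same value
print(fake_cid(original) == fake_cid(b'{"name": "sword076", "power": 10}'))  # True
print(fake_cid(edited))     # any change produces a completely different identifier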
More info can be found here: Content Addressing & Immutability
If you read all this and thought "well what if I want mutable data?", I'd recommend looking into IPNS and possibly ipfs-sync if you're looking for a tool to automatically update IPNS for you.

What's the difference between a collection and a store in REST?

I'm trying to wrap my head around the difference between a "collection" and a "store" in REST. From what I've read so far,
a collection is:
"a server-managed directory of resources"
and a store is a:
"client-managed resource repository"
I found this post: How "store" REST archetype isn't creating a new resource and a new URI?
but it didn't really help me clarify the difference. I mean, I understand one is controlled by the server and the other by the client... but can someone give me a concrete example of what a store might be in a real world application?
I *think* it's something like this:
GET http://myrestapplication.com/widgets/{widget_id} -- retrieves a widget from db
POST http://myrestapplication.com/widgets/{widget_id} -- creates a new widget in the db
PUT http://myrestapplication.com/widgets/{widget_id},[list of updated params & their vals] -- update widget
PUT http://myrestapplication.com/users/johndoe/mywishlist/{widget_id} -- updates john doe's profile to add a widget that already exists in the database... but links to it as his favorite one or one that he wants to buy
Is this correct?
If so, could the last PUT also be expressed as a POST somehow?
EDIT 1
I found an online link to the book I'm reading, where it makes the distinction between the two:
https://books.google.ca/books?id=4lZcsRwXo6MC&pg=PA16&lpg=PA16&dq=A+store+is+a+client-managed+resource+repository.+A+store+resource+lets+an+API+client:+put+resources+in,+get+them+back+out,+and+decide+when+to+delete+them&source=bl&ots=F4CkbFkweL&sig=H6eKZMPR_jQdeBZkBL1h6hVkK_E&hl=en&sa=X&ei=BB-vVJX6HYWvyQTByYHIAg&ved=0CB0Q6AEwAA#v=onepage&q=A%20store%20is%20a%20client-managed%20resource%20repository.%20A%20store%20resource%20lets%20an%20API%20client%3A%20put%20resources%20in%2C%20get%20them%20back%20out%2C%20and%20decide%20when%20to%20delete%20them&f=false
REST uses HTTP verbs to manipulate resources. Full stop. That's it. To build some classes of browser-based application, developers sometimes use local storage (a store), but that has absolutely nothing to do with REST (in fact, it's the opposite). Collections are a special consideration in REST-based API design because the REST principles place considerable constraints on how they are represented in the results of your queries -- a special consideration also because there are no standards for how these things should be represented and accessed if you're using anything other than HTML as a resource type.
Edit:
REST suggests that when we ask for a resource, we receive that resource and only that resource, and that things referenced by that resource are returned as links, not as data. This mimics the HTTP standard, by which we return the requested page and links to other pages rather than embedding linked pages. So, our resources should return links to related resources, not the resources themselves.
So, what about collections?
Let's use as an example a college management system that has Course objects, each of which contains a huge list of Students.
When I GET the course I don't want to have the collection of students returned as an embedded list, because that could be huge and because my user might not be interested. Instead, I want to know that the course has a students collection and I want to be able to query that collection separately (when I need to) and I want to be able to page it on demand. For this to work, the course needs to link to the students collection URL (maybe with an appropriate type so that my code knows how to handle the link). Then, I want to use the given collection's url to request a paged list of resources. In this example, the collection's url could be something like: course/1/students, with the convention that I can add paging info to the search string to constrain the results with something like course/1/students?page=1&count=10. Embedding the students collection into the course resource would be a violation of REST. I would not be returning a course, I'd be returning course-and-students.
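A rough sketch of that shape, assuming a Python/Flask service and made-up course/student data (the route names and paging parameters mirror the course/1/students?page=1&count=10 convention above):

# Sketch only: hypothetical Flask service where a course resource links to
# its students collection instead of embedding it.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Made-up in-memory data standing in for the college management system.
COURSES = {1: {"id": 1, "title": "Intro to REST"}}
STUDENTS = {1: [f"student-{i}" for i in range(1, 101)]}  # 100 students in course 1

@app.route("/course/<int:course_id>")
def get_course(course_id):
    course = dict(COURSES[course_id])
    # Link to the collection rather than embedding the (potentially huge) list.
    course["links"] = [{"rel": "students", "href": f"/course/{course_id}/students"}]
    return jsonify(course)

@app.route("/course/<int:course_id>/students")
def get_students(course_id):
    page = int(request.args.get("page", 1))
    count = int(request.args.get("count", 10))
    start = (page - 1) * count
    items = STUDENTS[course_id][start:start + count]
    return jsonify({"page": page, "count": count, "items": items})

if __name__ == "__main__":
    app.run()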

What tools or techniques are available to analyze disk usage?

We would like to analyze disk consumption within Documentum for possible business process improvement. Can disk usage be determined by folder, by object type, by document filename extension, over time, etc.?
I know that I can roll my own DQL, for example:

select sum(r_full_content_size)/1024/1024/1024 as total_gb
from dm_sysobject (all)
where cabinet('/My Cabinet', descend);

but I'm wondering if there are other approaches? Tools analogous to the UNIX du(1) command, etc.

Actually, I think there are no other approaches for your task; it can only be solved with DQL queries.
Let me describe why I think so: Documentum can store content files not only on the filesystem but also, for example, as BLOBs in a database or as records in Centera, and so on. So Documentum has to provide a unified way of getting the size of a content file regardless of the storage method, and that way is to store the content size in the full_content_size attribute of the dmr_content object.
Also, each dm_sysobject has an attribute r_full_content_size, which holds the size, in bytes, of the first content object (dmr_content object) associated with the sysobject.
So your DQL is correct only if each dm_sysobject in your system has only one associated content object.

RESTful search. Return actual resources or URIs?

Pretty new to all this REST stuff.
I'm designing my API, and am not sure what I'm supposed to return from a search query. I was assuming I would just return all objects that match the query in their entirety, but after reading up a bit about HATEOAS I am thinking I should be returning a list of URIs instead?
I can see that this could help with caching of items, but I'm worried that there will be a lot of overhead generated by the subsequent multiple HTTP requests required to get the actual object info.
Am I misunderstanding? Is it acceptable to return object instances instead of URIs?
I would return a list of resources with links to more details on those resources.
From the RESTful Web Services Cookbook (2010) by Subbu Allamaraju:
Design the response of a query as a representation of a collection resource. Set the appropriate expiration caching headers. If the query does not match any resources, return an empty collection.
IMHO it is important to always remember that "pure REST" and "real world REST" are two quite different beasts.
How are you returning the list of URIs from your query in the first place? If you return e.g. application/json, this certainly does not tell the client how it is supposed to interpret the content; therefore, the interaction is already being driven by out-of-band information (the client magically already knows where to look for the data it needs) in conflict with HATEOAS.
So, to answer your question: I find it quite acceptable to return object instances instead of URIs -- but be careful because in the general case this means you are generating all this data without knowing if the client is even going to use it. That's why you will see a hybrid approach quite often: the object instances are not full objects (i.e. a portion of the information the server has is not returned), but they do contain a unique identifier that allows the client to fetch the full representation of selected objects if it chooses to do so.
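A sketch of that hybrid shape, with made-up field names and URLs (partial representations plus a link the client can follow for the full resource):

# Hypothetical search response: small, commonly needed fields only, plus a
# link to the full representation so the client fetches details on demand.
search_response = {
    "query": "color=red",
    "total": 2,
    "items": [
        {
            "id": "widget-17",
            "name": "Red widget",            # partial representation
            "href": "/widgets/widget-17",    # link to the full resource
        },
        {
            "id": "widget-42",
            "name": "Dark red widget",
            "href": "/widgets/widget-42",
        },
    ],
}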

Can you build a truly RESTful service that takes many parameters?

After reading an article on REST ("Restful Grails"), I have gotten the impression that it is not possible to truly conform to a REST style in a service that demands a lot of parameters. Is this so? All the examples I have seen so far seem to imply that true REST style services are "parameterless". Using parameters would be RPC-ish and not truly RESTful.
To be more specific, say we have a service that returns graph data for stock prices, and this service needs to know the start date, end date, the currency, stock name, and whatever else might be applicable. In any case, at least 4-5 parameters are needed to retrieve the information.
I would imagine the URL to be something like this: /stocks/YAHOO?startDate="2008-09-01"&endDate=...
("YAHOO" is here a made-up stock name).
Would this really be REST, or is this more RPC-like - what the author of the aforementioned article calls "GETful" (i.e. just low-ceremony RPC)?
You can see the query string as a filter on the resource you are GETting. Here, your resource is the stock prices of YAHOO. Doing a GET on that resource gives you all the available data, or the most recent data; the query string filters down to the prices you want. Content negotiation allows you to change the representation, e.g. a PNG graph, a CSV file, and so on. To add a price, simply POST a representation (e.g. CSV) to the same resource.
The "RESTfulness" is not really in the URL itself, since URIs are opaque to the client, but in the way you interact with the resources identified by those URIs.
Feel free to use as many parameters as you need to identify the resource you wish to access. REST doesn't care.
Why would you think it is not possible?
Google uses REST for their Charts API, and it takes a lot of params:
http://chart.apis.google.com/chart?cht=bvg&chs=350x300&chd=t:20,35,10&chxr=1,0,40&chds=0,40&chco=FF0000|FFA000|00FF00&chbh=65,0,35&chxt=x,y,x&chxl=0:|High|Medium|Low|2:||Task+Priority||&chxs=2,000000,12&chtt=Tasks+on+my+To+Do+list&chts=000000,20&chg=0,25,5,5