I'm trying to get all entities from a namespace in order to delete them in a later step.
I'm using App Engine together with Datastore and the ndb library on Python 2.7.
I have a simple query to get all entities:
def get_entities(namespace_id):
    return [entity for entity in ndb.Query(namespace=namespace_id).fetch()]
I also modified it to skip the dunder (double-underscore) Datastore Statistics kinds/entities from the legacy bundled services:
def get_entities(namespace_id):
    return [entity for entity in ndb.Query(namespace=namespace_id).fetch() if not entity.key.id_or_name.startswith('__')]
Running locally with the Datastore emulator, this works just fine.
But I get this error when deployed in the cloud:
KindError: No model class found for kind '__Stat_Ns_Kind_IsRootEntity__'. Did you forget to import it?
I found the post Internal Kinds Returned When Retrieving All Entities Belonging to a Particular Namespace, but no clear answer.
Any other way to get all the entities for a specific namespace would also be welcome!
Per the documentation you referenced, it's the kind name that begins and ends with two underscores:
Each statistic is accessible as an entity whose kind name begins and ends with two underscores
However, your code is checking for entity key names that start with underscores. You should be checking the kind instead.
Modify your code to:
return [key for key in ndb.Query(namespace=namespace_id).fetch(keys_only=True) if not key.kind().startswith('__')]
Note: I switched your query to fetch only keys, since keys are all you need in order to delete the records.
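If the end goal is to delete everything in the namespace, here is a minimal sketch of how the pieces could fit together (the helper names are mine; ndb.delete_multi is ndb's batch-delete call):

from google.appengine.ext import ndb

def get_entity_keys(namespace_id):
    # keys_only skips loading entity bodies, so no model classes are needed
    # (avoiding the KindError) and the __Stat_*__ kinds can be filtered out.
    return [key for key in ndb.Query(namespace=namespace_id).fetch(keys_only=True)
            if not key.kind().startswith('__')]

def delete_namespace_entities(namespace_id):
    # delete_multi deletes all of the given keys in batched RPCs.
    ndb.delete_multi(get_entity_keys(namespace_id))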
I'm putting together JSON schemas and I'd like to use $ref to DRY my schemas. I'll have many schemas that will each use common subschemas. I want to unit test my schemas before publishing them by writing unit tests that assert that, given certain input, the input is deemed valid or invalid, using a JSON schema library that I trust to be correct (so that I'm just testing my schemas, not the library).
Where I get confused is that in order to load my schemas before I've published them (which I want to do while running tests locally and during CI/CD), I need to use relative local paths like this:
"pet": { "$ref": "file://./schemas/components/pet.schema.json" }
That's because that pet schema hasn't been published to a URL yet. It hasn't been verified by automated tests that it's correct yet. This works well enough for running tests, and it also worked well for packaging inside a Docker image so that the schemas could be loaded from disk as the app starts up.
But then, if I were to give one of the top-level schemas (one that leverages $ref) to someone after publishing it to an absolute URL, it wouldn't load in their program, because the path I used worked only for my unit testing.
I found that I had to publish my schemas using absolute URLs in order for them to be used in consuming programs. I ended up publishing the schemas https://mattwelke.github.io/go-jsonschema-ref-docker-example/schemas/person.1-0-0.schema.json and https://mattwelke.github.io/go-jsonschema-ref-docker-example/schemas/components/pet.1-0-0.schema.json that way. I tested that they worked fine in a consuming program by writing this program:
package main

import (
	"fmt"

	"github.com/xeipuuv/gojsonschema"
)

func main() {
	schemaLoader := gojsonschema.NewReferenceLoader("https://mattwelke.github.io/go-jsonschema-ref-docker-example/schemas/person.1-0-0.schema.json")

	jsonStr := `
{
	"name": "Matt",
	"pet": {
		"name": "Shady"
	}
}
`
	documentLoader := gojsonschema.NewStringLoader(jsonStr)

	result, err := gojsonschema.Validate(schemaLoader, documentLoader)
	if err != nil {
		panic(fmt.Errorf("could not validate: %w", err))
	}

	if result.Valid() {
		fmt.Printf("The document is valid.\n")
	} else {
		fmt.Printf("The document is not valid. See errors:\n")
		for _, desc := range result.Errors() {
			fmt.Printf("- %s\n", desc)
		}
	}
}
Which resulted in the following expected output:
The document is valid.
So I'm confused about this "chicken and egg" situation.
I was able to publish schemas that could be used, as long as I didn't unit test them before publishing them.
And I was able to unit test schemas as long as:
I didn't want to publish them in the form that was verified by the unit testing to be correct.
I was okay with my application loading them via HTTPS as it started up instead of loading them from disk. I'm worried about this because I don't want a web server to be a point of failure for my app starting up.
I would appreciate some insight into how one might accomplish both goals.
Where I get confused is that in order to load my schemas before I've published them (which I want to do while running tests locally and during CI/CD), I need to use relative local paths
Your initial assumption is false. URIs used in the $id keyword can be arbitrary identifiers -- they do not need to be resolvable via the network or disk at the stated location. In fact, it is an error for a JSON Schema implementation to assume it can find schema documents at the stated location: implementations MUST support loading documents locally and associating them with the stated identifier:
The "$id" keyword identifies a schema resource with its canonical URI.
Note that this URI is an identifier and not necessarily a network locator. In the case of a network-addressable URL, a schema need not be downloadable from its canonical URI.
source
A schema need not be downloadable from the address if it is a network-addressable URL, and implementations SHOULD NOT assume they should perform a network operation when they encounter a network-addressable URI.
source
Therefore, you can give your schema document any identifier you like, such as the URI you anticipate using when you eventually publish your schema for public consumption, and perform local testing using that identifier.
Any implementation that does not support doing this is in violation of the specification, and this should be reported to its maintainers as a bug.
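For instance, here is a minimal offline test sketch using Python's jsonschema library (assuming version 4.18+ with its referencing package; the example.com URIs are placeholders for wherever you eventually publish). The subschema is registered in memory under its eventual public $id, so the $ref resolves without any network access. gojsonschema supports the same pattern via its SchemaLoader.AddSchema.

from jsonschema import Draft202012Validator
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT202012

# The subschema as it exists on disk before publishing (inlined for brevity).
pet_schema = {"type": "object", "properties": {"name": {"type": "string"}}}

# Register it under the URI it will eventually be published at. The registry
# resolves the $ref in memory; no network request is ever made.
registry = Registry().with_resource(
    "https://example.com/schemas/components/pet.schema.json",
    Resource.from_contents(pet_schema, default_specification=DRAFT202012),
)

person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "pet": {"$ref": "https://example.com/schemas/components/pet.schema.json"},
    },
}

# Validation succeeds entirely offline.
Draft202012Validator(person_schema, registry=registry).validate(
    {"name": "Matt", "pet": {"name": "Shady"}}
)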
We need to create a document which references a document in another collection. We know the id of the document being referenced, and that's all we need to know.
Our first approach is:
$referencedDocument=$repository->find($referencedId);
$newDocument->setUser($referencedDocument);
Now the question is whether we can do it without the first line (and without hitting the database). In the DB (we use Mongo), the reference is just an integer field, and we know the target id, so find()ing the $referencedDocument seems redundant.
We tried to create a new User with just an id set, but that gets us an error during persisting.
Thanks!
In one of my projects I used something like this:
$categoryReference = $this->getEntityManager()->getReference(ProjectCategory::class, $category['id']);
Though, if you use Mongo, you probably need to use getDocumentManager() instead.
See the Doctrine docs (MongoDB ODM 1.0) for details.
I have been implementing a new project, for which I have decided to use the repository pattern and Entity Framework.
I have successfully implemented basic CRUD methods and have now moved on to my deep loads.
From all the examples and documentation I can find to do this I need to call something like this:
public Foo DeepLoadFoo()
{
    return (from foobah in Context.Items.Include("bah").Include("foo").Include("foofoo")
            select foobah).Single();
}
This doesn't work for me. Maybe I am trying to be too lazy, but what I would like to achieve would be something along the lines of this:
public Foo DeepLoadFoo(Foo entity, Type[] childTypes)
{
    return (from foobah in Context.Items.Include(childTypes) select foobah).Single();
}
Is anything like this possible, or am I stuck with include.include.include.include?
Thanks
This blog post mentions that the Entity Framework ObjectContext has all the metadata about entities and their properties. So maybe you can use that metadata to walk the properties of your entity, and their child properties, etc.
In other words, I believe you should be able to use the metadata to automatically compose Include calls on your query.
Suppose you have the canonical Customer domain object. You have three different screens on which Customer is displayed: External Admin, Internal Admin, and Update Account.
Suppose further that each screen displays only a subset of all of the data contained in the Customer object.
The problem is: when the UI passes data back from each screen (e.g. through a DTO), it contains only that subset of a full Customer domain object. So when you send that DTO to the Customer Factory to re-create the Customer object, you have only part of the Customer.
Then you send this Customer to your Customer Repository to save it, and a bunch of data will get wiped out because it isn't there. Tragedy ensues.
So the question is: how would you deal with this problem?
Some of my ideas:
include an argument to the Repository indicating which part of the Customer to update, and ignore the others
when you load the Customer, keep it in static memory, or in the session, or wherever, and then when you receive one of the DTOs from the UI, update only the parts relevant to the DTO
IMO, both of these are kludges. Are there any other better ideas?
#chadmyers: Here is the problem.
Entity has properties A, B, C, and D.
DTO #1 contains properties for B and C.
DTO #2 contains properties for C and D.
The UI asks for DTO #1; you load the entity from the repository, convert it into DTO #1, filling in only B and C, and give it to the UI.
Now the UI updates B and sends the DTO back. You recreate the entity, and it has only B and C filled in, because that is all the DTO contains.
Now you want to save the entity, which has only B and C filled in, with A and D null/blank. The repository has no way of knowing whether it should update A and D in persistence as blanks or ignore them.
I would use the factory to load a complete customer object from the repository upon receipt of the DTO. After that, you can update only those fields that were specified in the DTO.
That also allows you to apply some optimistic concurrency to your customer by checking a last-updated timestamp, for example.
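A minimal sketch of that flow (all names here are hypothetical; B and C are the fields carried by DTO #1 from the example above):

class StaleUpdateError(Exception):
    """Raised when the entity changed after the DTO was handed out."""

def apply_dto1(repository, dto):
    # Load the complete Customer; A and D keep their persisted values.
    customer = repository.find(dto.customer_id)

    # Optimistic concurrency: compare the timestamp the DTO was built from
    # against the one currently persisted.
    if dto.last_updated != customer.last_updated:
        raise StaleUpdateError(dto.customer_id)

    # Update only the fields DTO #1 actually carries (B and C).
    customer.b = dto.b
    customer.c = dto.c

    repository.save(customer)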
Is this a web app? Load the customer object from the repo, update it from the DTO, save it back. That doesn't seem like a kludge to me. :)
UPDATE: As per your updates (the A, B, C, D example)
So what I was thinking is that when you load the entity, it has A, B, C, and D filled in. If DTO#1 only updates B & C, that's OK. A and D are unaffected (which is the desired situation).
What the repository does with the B & C updates is up to it. If you're using Hibernate/NHibernate, for example, it will just figure it out and issue an update.
Just because DTO #1 only has B & C doesn't mean you have to also null out A & D. Just leave them alone.
I missed the point of this question at first because it is predicated on a few things that I don't think make sense from a design perspective.
Hydrating an entity from the repository and then converting it to a DTO is a waste of effort. I assume that your DAL passes a DTO to your repository, which then converts it to a full entity object. So converting it back to a DTO seems wasteful.
Having multiple DTOs makes sense if you have a search results page that shows a high volume of records and only displays part of your entity data. In that case it's efficient to pass that page just the data it needs. It does not make sense to pass a DTO that contains partial data to a CRUD page. Just give it a full DTO or even a full entity object. If it doesn't use all of the data, fine, no harm done.
So the main problem is that I don't think you should pass data to these pages using partial DTOs. If you used a full DTO, I would do the following 3 steps whenever the save action is performed:
Pull the full DTO from repository or db
Update the DTO with any changes made through the form
Save the full DTO back to the repository or db
This method requires an extra db hit but that's really not a significant issue on a CRUD form.
If we have an understanding that a Repository handles (almost exclusively) a very rich domain Entity, then your numerous DTOs could simply map back.
i.e.
dtoUser.MapFrom<In,Out>(Entity)
or
dtoAdmin.MapFrom<In,Out>(Entity)
You would do the reverse to get the DTO information back into the Entity, and so on. So your repository only saves rich Entities, NOT numerous DTOs:
entity.Foo = dtoUser.Foo
or
entity.Bar = dtoAdmin.Bar
entityRepository.Save(entity) <-- do not pass a DTO.
The whole point of DTOs is to keep things simple for the presentation layer or, say, for WCF data transfer; it has nothing to do with the Repository or the Entity, for that matter.
Furthermore, you should never construct an Entity from DTOs... the only two ways to ever acquire an Entity are through a Factory (new) or a Repository (existing), respectively.
You mention storing the Entity somewhere; why would you do this? That is the job of your repository. It will decide where to get the Entity (db, cache, etc.); there's no need to store it somewhere else.
Hope that helps assign responsibility in your domain. It is always a challenge, and there are gray areas here and there, but in general these are the typical uses of Repository, DTO, etc.
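To make that split concrete, a rough sketch in Python (the class and method names are mine, echoing the MapFrom pseudocode above, not any particular framework's API):

class AdminDto(object):
    """Carries only the subset of Customer fields the admin screen shows."""

    def map_from(self, entity):
        # Entity -> DTO: copy out just the fields this screen displays.
        self.customer_id = entity.customer_id
        self.bar = entity.bar
        return self

    def map_to(self, entity):
        # DTO -> Entity: copy back only the fields this DTO owns.
        entity.bar = self.bar
        return entity

def save_admin_changes(repository, dto):
    # Existing entities are acquired through the Repository, never built
    # from the DTO itself.
    entity = repository.find(dto.customer_id)
    # The repository is handed the rich entity, not the DTO.
    repository.save(dto.map_to(entity))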