Presenting missing values as null or not at all in JSON

I am building a web service API, using JSON as the data language. Designing the structure of the data returned from the service, I am having some trouble deciding how to deal with missing values.
Consider this example: I have a product in my web store for which the price is not yet known, maybe because the product has not yet been released. Do I include price: null (as shown below) or do I simply omit the price property on this item?
{
  "name": "OSX 10.6.10",
  "brand": "Apple",
  "price": null
}
My main concern is making the API as easy to consume as possible. The explicit null value makes it clear that a price can be expected on a product, but on the other hand it seems like wasted bytes. There could be a whole bunch of properties that are completely irrelevant to this particular product, while relevant for other products – should I show these as explicitly null as well?
{
  "name": "OSX 10.6.10",
  "price": 29.95,
  "color": null,
  "size": null
}
Are there any "best practices" on web service design, favoring explicit or implicit null values? Any de-facto standard? Or does it depend entirely on the use case?

FWIW, my personal opinion:
Do I include price: null (as shown below) or do I simply omit the price property on this item?
I would set the values of "standard" fields to null. Although JSON is often used with JavaScript, where missing properties can be handled much like properties set to null, that is not necessarily the case in other languages (e.g. Java). Having to test first whether a field is present at all seems inconvenient; keeping the fields present and setting their values to null is more consistent.
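For illustration, here is a small JavaScript sketch (reusing the product example above) of how a consumer sees the two variants:

const withNull = { name: 'OSX 10.6.10', price: null };
const withoutPrice = { name: 'OSX 10.6.10' };

// A loose null check treats both cases the same, which is why JavaScript clients rarely notice the difference
console.log(withNull.price == null);      // true (null)
console.log(withoutPrice.price == null);  // true (undefined)

// Distinguishing them requires an explicit presence test
console.log('price' in withNull);         // true
console.log('price' in withoutPrice);     // false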
There could be a whole bunch of properties that are completely irrelevant to this particular product, while relevant for other products – should I show these as explicitly null as well?
I would only include those fields that are relevant to a product (e.g. not pages for a CD). It's the client's task to deal with these "optional" fields properly. If you have no value for a certain field that is relevant to a product, set it to null too.
As already said, the most important thing is to be consistent and to clearly specify which fields can be expected. You can reduce the data size using gzip compression.

I don't know which is "best practice", but I usually don't send fields that I don't need.
When I read the response, I check whether the value exists:
if (res.size) {
  // response has a size
}

Related

AWS Personalize items attributes

I'm trying to implement personalization and having problems with the Items schema.
Imagine I'm Amazon: I have products, their brands, and their categories. What kind of Items schema should I use to include this information?
Should I include the brand name as a string categorical field? Should I instead include the brand ID as a string or as a number? Or should I include both?
I have the same questions about categories.
Metadata Fields: Metadata includes string or non-string fields that aren't required or don't use a reserved keyword. Metadata schemas have the following restrictions: Users and Items schemas require at least one metadata field; Users and Interactions datasets can contain up to five metadata fields, while an Items dataset can contain up to 50 metadata fields. If you add your own metadata field of type string, it must include the categorical attribute. Otherwise, Amazon Personalize won't use the field when training a model.
https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html
There are simply two ways to include your metadata in Items/Users datasets:
If it can be represented as a number, then provide the actual value if it makes sense.
If it can be represented as a string, then provide the string value and make sure that categorical is set to true.
But let's look at the question "Why do they need me to categorize my string metadata?" The answer is pretty simple.
Let's start with an example.
Suppose your Items are Amazon.com products and you would like to provide a ratings metadata field. Then:
You could take all of the ratings, including the full review text sent by customers, and simply put them in the metadata field.
You could take just the star rating, calculate the average, and put that in the metadata field.
The second option probably makes more sense in general. Having random, long product reviews as metadata changes pretty much nothing. Personalize doesn't understand whether the review itself is good or bad, or whether the author also recommends another product, so it doesn't really add anything to the recommendations.
However, if you reduce your dataset and calculate the average rating, as in the second option, it makes a lot more sense. Maybe some of your customers like low-quality products? Maybe they want to buy them because they are famous YouTubers who make videos about them? Based on their previous interactions and much more, Personalize will be able to perform slightly better, because now it knows that this product has a rating of 5/5 or 3/5.
I wanted to show you that in some cases providing Items metadata as free-form strings makes no sense. That's why your string metadata must be categorical: it should be a finite set of values, so it adds some knowledge for Personalize about a given Item and why some people might want to interact with it.
Going back to your question:
Should I include the brand name as a string categorical field? Should I instead include the brand ID as a string or as a number? Or should I include both?
I would simply go with the brand ID as a string. You could also go with the brand name, but a brand can be renamed while it is still the same brand, so the ID is more stable. Also, two different brands could have the same name because they operate in different markets, so using the ID solves that as well.
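For illustration, a minimal Items schema along those lines might look roughly like this (BRAND_ID and AVG_RATING are example field names, not required ones; only ITEM_ID is mandatory for an Items dataset):

{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "ITEM_ID", "type": "string" },
    { "name": "BRAND_ID", "type": "string", "categorical": true },
    { "name": "AVG_RATING", "type": "float" }
  ],
  "version": "1.0"
}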
The "categorical": true switch in your schema just tells Personalize:
Hey, do you see that string field? It's a categorised, finite set of values. If you train a model for me, please include this one during training; it's important!
And as the documentation says, if you provide a string metadata field that is not marked as categorical, then Personalize will "think":
Hmm... this field is a string, it has pretty random values, and it's not marked as categorical. It's probably just a leftover from an Items export job. Let's ignore it.

"Cache data may be lost" warning when merging non-normalized data in Apollo Client 3

I'm upgrading my application with Apollo Client from v2 to v3 and I can't find the correct solution to the following problem.
I have a schema with a product and, inside this product, a price. This price is not a simple number, as it contains the duty-free value, the all-taxes-included value, and the VAT.
type Product {
  id: ID
  price: Price
}
type Price {
  dutyFree: Float
  allTaxesIncluded: Float
  VAT: Float
}
In Apollo Client 2, whenever there was no explicit id or _id property, the InMemoryCache created a fallback fake identifier to normalize data, based on the path to the object.
In Apollo Client 3, this fallback fake identifier is no longer generated. Instead, you have two options to handle non-normalized data. The first is to use the new TypePolicy option and indicate explicitly that the data you receive should not be normalized. In that case, the data will be linked to the parent normalized object.
The docs:
Objects that are not normalized are instead embedded within their parent object in the cache. You can't access these objects directly, but you can access them via their parent.
new InMemoryCache({
  typePolicies: {
    Price: {
      keyFields: false
    }
  }
})
All happy, I thought my problem was solved. Well, wrong... I can create a product in my app and add a price, but whenever I change an existing price, I get the following warning:
Cache data may be lost when replacing the price field of a Product object.
This is because, when I fetch my Product after an update, the InMemoryCache does not know how to merge the price field, since no id is defined, which is the whole point of non-normalized data.
I know there is a second option to explicitly define a merge function for my Product.price field, but this example is a simpler version of reality. I have a large number of fields across multiple objects that are typed Price, and manually defining a merge function for each and every one of them (even by externalizing the common logic into a function) is something I find quite inefficient and error-prone.
So my question is: what did I misunderstand about the keyFields: false option, and what can I do to solve this problem without having to resort to defining a merge function for 50+ fields in my app?
Thanks for the help :)
I'm not sure you've misunderstood keyFields: false. My understanding is that when the Product is updated in the cache, InMemoryCache must handle any differences in the Price objects embedded in the price field of the old Product and the new Product. If there isn't a TypePolicy to define how that should be done, the cache logs a warning.
Starting in Apollo Client 3.3, merge functions can be defined for types in addition to fields. Here's an example from their docs:
const cache = new InMemoryCache({
  typePolicies: {
    Book: {
      fields: {
        // No longer necessary!
        // author: {
        //   merge: true,
        // },
      },
    },
    Author: {
      merge: true,
    },
  },
});
Since you don't want to define a merge function on a field-by-field basis, you might try defining the merge function for the Price type instead.
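Applied to your schema, that could look roughly like the following (a sketch only, combining the keyFields: false you already use with a type-level merge, which requires Apollo Client 3.3 or later):

const cache = new InMemoryCache({
  typePolicies: {
    Price: {
      keyFields: false, // keep Price objects embedded in their parent
      merge: true,      // merge incoming Price data into the existing embedded object
    },
  },
});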

apollo-server - Conditionally exclude fields from selection set

I have a situation where I would like to conditionally exclude a field from a query selection before I hit that query's resolver.
The use case is that my underlying API only exposes certain 'fields' based on the user's locale, and calls made to this API will throw errors if fields are requested that are not available for that locale.
I have tried an approach with directives:
type Person {
  id: Int!
  name: String!
  medicareNumber: String @locale(locales: ["AU"])
}
type Query {
  person(id: Int!): Person
}
Using SchemaDirectiveVisitor.visitFieldDefinition, I override field.resolve for the medicareNumber field to return null when the user's locale doesn't match any of the locales defined on the directive.
However, when a client with a non-"AU" locale executes the following query:
query {
  person(id: 111) {
    name
    medicareNumber
  }
}
the field resolver for medicareNumber is never called, and the query resolver makes a request to the underlying API, appending the fields in the selection set (including the invalid medicareNumber) as query parameters. The API call returns an error object at this point.
I believe this makes sense as it seems that the directive resolver is on the FieldDefinition and would only be called when the person resolver returns a valid result.
Is there a way to achieve this sort of functionality, with or without directives?
In general, I would caution against this kind of schema design. As a client, if I include a field in the selection set, I expect to see that field in the response -- removing the field from the selection set server-side goes against the spec and can cause unnecessary confusion (especially on a larger team or with a public API).
If you are examining the requested fields in order to determine the parameters to pass to your API call, then forcing a certain field to resolve to null won't do anything -- that field will still be included in the selection set. In fact, there's really no way to create a schema directive that will impact the selection set of a request.
The best approach here would be to 1) ensure any potentially-null fields are nullable in the schema and 2) explicitly filter the selection set wherever your selection-set-to-parameters logic is.
EDIT:
Schema directives won't show up as part of the schema object returned in the info, so they can't be used as flags. My suggestion would be to maintain a separate in-memory map. For example:
const fieldsByLocale = {
  US: {
    Person: ['name'],
  },
  AU: {
    Person: ['name', 'medicareNumber'],
  },
}
then you could just access the appropriate list to filter with fieldsByLocale[context.locale][info.returnType]. This filtering logic is specific to your data source (in this case, the external API), so this is a bit cleaner than "polluting" the schema with information that pertains to the storage layer. If the APIs change, or you switch to a different source for this information altogether (like a database), you can update the resolvers without touching your type definitions. In fact, this way, the filtering logic can easily live inside a domain/service layer instead of your resolvers.
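For example, a rough sketch of that filtering inside the person resolver might look like this (personApi is a hypothetical client for the underlying REST API; fragments in the selection set are ignored for brevity):

const resolvers = {
  Query: {
    person: async (parent, { id }, context, info) => {
      // Top-level field names the client asked for
      const requested = info.fieldNodes[0].selectionSet.selections
        .map((selection) => selection.name.value);
      // Keep only the fields allowed for the user's locale (fieldsByLocale from above)
      const allowed = fieldsByLocale[context.locale].Person;
      const fields = requested.filter((name) => allowed.includes(name));
      // personApi stands in for however you call the underlying API
      return personApi.getPerson(id, fields);
    },
  },
};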

Django custom creation manager logic for temporal database

I am trying to develop a Django application that has built-in logic around temporal states for objects. The desire is to have a single object representing a resource, while allowing attributes of that resource to change over time. For example, a desired use case is to query the owner of a resource at any given time (last year, yesterday, tomorrow, next year, ...).
Here is what I am working with...
class Resource(models.Model):
    id = models.AutoField(primary_key=True)

class ResourceState(models.Model):
    id = models.AutoField(primary_key=True)
    # Link the resource this state is applied to
    resource = models.ForeignKey(Resource, related_name='states', on_delete=models.CASCADE)
    # Track when this state is ACTIVE on a resource
    start_dt = models.DateTimeField()
    end_dt = models.DateTimeField()
    # Temporal fields, can change between ResourceStates
    owner = models.CharField(max_length=100)
    description = models.TextField(max_length=500)
I feel like I am going to have to create a custom interface to interact with this state. Some example use cases (interface is completely up in the air)...
# Get all of the states that were ever active on resource 1 (this is already possible)
Resource.objects.get(id=1).states.all()
# Get the owner of resource 1 from the state that was active yesterday; this is non-standard behavior
Resource.objects.get(id=1).states.at(YESTERDAY).owner
# Create a new state for resource 1, active between tomorrow and infinity (None == infinity)
# This is obviously non-standard if I want to enforce one-state-per-timepoint
Resource.objects.get(id=1).states.create(
    start_dt=TOMORROW,
    end_dt=None,
    owner="New Owner",
    description="New Description"
)
I feel the largest amount of custom logic will be required to do creates. I want to enforce that only one ResourceState can be active on a Resource for any given timepoint. This means that to create some ResourceState objects, I will need to adjust/remove others.
>>> resource = Resource.objects.get(id=1)
>>> resource.states.all()
[ResourceState(start_dt=None, end_dt=None, owner='owner1')]
>>> resource.states.create(start_dt=YESTERDAY, end_dt=TOMORROW, owner='owner2')
>>> resource.states.all()
[
    ResourceState(start_dt=None, end_dt=YESTERDAY, owner='owner1'),
    ResourceState(start_dt=YESTERDAY, end_dt=TOMORROW, owner='owner2'),
    ResourceState(start_dt=TOMORROW, end_dt=None, owner='owner1')
]
I know I will have to do most of the legwork around defining the logic, but is there an intuitive place where I should put it? Does Django provide an easy place for me to create these methods? If so, where is the best place to apply them? On the Resource object? Using a custom Manager to deal with interacting with the related ResourceState objects?
Re-reading the above, it is a bit confusing, but this isn't a simple topic either! Please let me know if anyone has any ideas for how to do something like the above!
Thanks a ton!
Too long for a comment, and purely some thoughts rather than a full answer, but having dealt with many date-effective records in financial systems (not in Django), some things come to mind:
My gut would be to start by putting it on the save method of the resource model. You are probably right in needing a custom manager as well.
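If you do go the custom manager / QuerySet route, a rough sketch (untested, and note that start_dt/end_dt are made nullable here so that None can mean "open-ended", matching the question's examples) might look like:

from django.db import models

class ResourceStateQuerySet(models.QuerySet):
    def at(self, when):
        # States whose [start_dt, end_dt) interval covers `when`; NULL bounds are open-ended
        return self.filter(
            models.Q(start_dt__lte=when) | models.Q(start_dt__isnull=True),
            models.Q(end_dt__gt=when) | models.Q(end_dt__isnull=True),
        )

class ResourceState(models.Model):
    resource = models.ForeignKey('Resource', related_name='states', on_delete=models.CASCADE)
    start_dt = models.DateTimeField(null=True, blank=True)
    end_dt = models.DateTimeField(null=True, blank=True)
    owner = models.CharField(max_length=100)
    description = models.TextField(max_length=500)

    # Exposes .at() on ResourceState.objects and on the related manager resource.states
    objects = ResourceStateQuerySet.as_manager()

# Usage (YESTERDAY being some datetime):
#   Resource.objects.get(id=1).states.at(YESTERDAY).first().owner

The overlap-adjustment logic on create would then live in a create()/save() override on that manager or model, as suggested above.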
I'd probably also flirt with the idea of an is_current boolean field on the state model, though some care would be needed with future-dated state records. If there is only one active state at a time, I'd also examine whether you need an end date at all. That said, having both start and end definitely makes the raw SQL queries (if ever needed) easier: date() between state.start and state.end gives the current record, and you can substitute any date to get that date's effective record.
Also, give some consideration to the open-ended end date where you don't know when the state ends; your queries will have to handle the NULLs properly. You may also need to consider an open-ended start date (say, for a load of historical data where the original start date is unknown). I'd suggest staying away from using some very early date as a fill-in (and likewise a far-future date for unknown end dates). If you end up with lots of transactions, your query optimizer may thank you; then again, I may be old and this may not matter anymore.
If you like to read about this stuff, I'd recommend a look at section 1.8 and chapter 6 of The Art of SQL (https://www.amazon.ca/Art-SQL-Stephane-Faroult/dp/0596008945/):
"But before settling for one solution, we must acknowledge that
valuation tables come in all shapes and sizes. For instance, those of
telecom companies, which handle tremendous amounts of data, have a
relatively short price list that doesn't change very often. By
contrast, an investment bank stores new prices for all the securities,
derivatives, and any type of financial product it may be dealing with
almost continuously. A good solution in one case will not necessarily
be a good solution in another.
Handling data that both accumulates and changes requires very careful
design and tactics that vary according to the rate of change."

Listing SSRS subscriptions with user-defined descriptions

We have a growing number of non-data-driven SSRS 2008 subscriptions and the default list in SSRS does not provide any way to indicate what each subscription is all about. Yes, there is a description but it is auto-generated and not very helpful. We need it to say "Company ABC Quarterly to Managers", for instance.
I looked at using the ReportingService2010 web service and managed not only to read each subscription's description but also to modify it. However, as soon as someone edits the subscription in SSRS, which will sometimes be necessary, the description reverts to the auto-generated one.
Although I have never worked with data-driven subscriptions, I wonder whether these will provide the functionality that I need. It just seems like a lot of work to set up, given that I don't really need the subscriptions to be data-driven.
Am I missing something simple here? Is this simple functionality something that comes with upgrading to a newer SSRS version?
Thanks!
Am I missing something simple here? Is this simple functionality something that comes with upgrading to a newer SSRS version?
It does not appear that standard subscriptions in SSRS 2012 allow descriptions to be entered either (I couldn't find a screenshot of the UI though).
Although I have never worked with data-driven subscriptions, I wonder whether these will provide the functionality that I need. It just seems like a lot of work to set up, given that I don't really need the subscriptions to be data-driven.
I would argue against the need for data-driven subscriptions if standard subscriptions are fulfilling the business need. Would this be worth your time and effort? If it is, the obvious advantage is that subscription descriptions are editable in the UI, and do not appear to be overwritten when modifying the subscription.
If you're still interested in using standard subscriptions...
One way to "hack" SSRS such that the subscriptions will not overwrite the description (which isn't editable in the UI anyway) would be to modify the stored procedure ReportServer.dbo.UpdateSubscription.
DISCLAIMER: Use the following advice at your own risk. This involves modifying a standard sproc that Reporting Services relies on.
You can alter the UPDATE statement such that the description value is only modified if the subscription is not a data-driven subscription (we do not want to break data-driven subscription descriptions, which are editable in the UI). In the stored procedure, you can distinguish data-driven subscriptions from standard subscriptions by looking at the value of @DataSettings. If it IS NULL, then it is a plain old subscription. If it IS NOT NULL, then we're looking at a data-driven subscription.
The following line in the Update statement:
[Description] = @Description,
Can be changed to:
[Description] = CASE WHEN @DataSettings IS NULL THEN [Description]
                     ELSE @Description
                END,
This would keep all standard subscription descriptions the same when they are modified in SSRS, but allow data-driven subscription descriptions to be modified.
Per Mat's Mug, I am noting the solution that I eventually used:
Since all of our subscriptions involve emailing the report, we could get away with re-purposing the email subject field as the description. This way it was possible to produce a listing of all subscriptions and show an arbitrary description for each one.
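For reference, that listing can be produced with a query against the report server catalog. A rough sketch follows (table and column names are from the standard SSRS 2008 ReportServer database, and the exact XML layout of ExtensionSettings should be verified on your own instance):

-- List each e-mail subscription with its subject line, re-purposed as a description
SELECT c.Name AS ReportName,
       s.Description,
       CAST(CAST(s.ExtensionSettings AS nvarchar(max)) AS xml).value(
           '(//ParameterValue[Name="Subject"]/Value)[1]', 'nvarchar(512)') AS EmailSubject
FROM ReportServer.dbo.Subscriptions AS s
JOIN ReportServer.dbo.[Catalog] AS c
  ON c.ItemID = s.Report_OID;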
Here's an expansion on the hack from dev_etter. I was able to use the stored procedure's @Parameters input along with a new report parameter (pReportParameterName) to create a dynamic, user-controllable description that's unique to the report.
Add a new hidden text parameter to the report (pReportParameterName).
Add Code Block 1 below to the UpdateSubscription stored procedure, above the UPDATE statement.
Change the 'Description = ' line of the UPDATE statement to something like Code Block 2 below.
With that set up, create or edit a Timed Subscription and enter the desired text into the Description parameter. After saving you should see the entered description in the Description column of the Subscriptions screen.
/*** Code Block 1 ***/
DECLARE @Param AS XML = CAST(@Parameters AS XML)
DECLARE @MY_Description AS varchar(512)

SELECT @MY_Description = e.f.value('(.)[1]', 'varchar(100)')
FROM (SELECT 1 AS id, @Param AS xCol) tx
CROSS APPLY tx.xCol.nodes('./ParameterValues/ParameterValue') AS a(b)
CROSS APPLY a.b.nodes('./Name') AS c(d)
CROSS APPLY a.b.nodes('./Value') AS e(f)
WHERE c.d.value('(.)[1]', 'varchar(100)') LIKE 'pReportParameterName'
/*** Code Block 2 ***/
[Description] = --@Description
    CASE
        WHEN @DataSettings IS NULL
         AND @MY_Description IS NOT NULL
        THEN @MY_Description
        ELSE @Description
    END,
Cheers,
Sj