To add an attribute or feature to a product via the webservice, I need its id.
To get the id for "red", I must first make a call to
get product_attribute_values?filter[id_category]=9 // 9 is colours
Then from that list I have to
get product_attribute_values/2
get product_attribute_values/4
get product_attribute_values/17
...
to get the name of each attribute_value (green, blue, red) until I find the value I want.
Is there no way to get an attribute id by name?
I could do it easily with a join or two in MySQL, but I am trying to build a clean import module that uses only the webservice.
Attributes like dimensions and weight have to be stored as attributes or features (even though the product schema itself contains dimensions and weight), and the dimension and weight value lists are rather large.
A work-around could be to scan the attributes/features/options just once and build a translation table in my import software.
Then, when a product needs an attribute value, look it up in the cached table, perhaps verify that the text->id translation is still correct, and if not, rescan the attributes.
But can it be avoided?
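A minimal sketch of that caching workaround in Python, reusing the call from above; the base URL, auth style, and XML layout are assumptions on my part, not verified against the webservice:

import requests
import xml.etree.ElementTree as ET

BASE = "https://shop.example.com/api"  # placeholder shop URL
KEY = "YOUR_WS_KEY"                    # webservice key, sent as basic-auth user

def build_value_table(category_id):
    # Fetch all attribute values of one group in a single call and
    # build a name -> id translation table.
    r = requests.get(
        f"{BASE}/product_attribute_values",
        params={"filter[id_category]": category_id, "display": "full"},
        auth=(KEY, ""),
    )
    r.raise_for_status()
    root = ET.fromstring(r.content)
    table = {}
    for value in root.iter("product_attribute_value"):
        vid = value.findtext("id")
        # assumed layout: <name><language id="1">red</language></name>
        name = value.findtext("name/language") or value.findtext("name")
        if vid and name:
            table[name.strip()] = vid
    return table

cache = build_value_table(9)  # 9 is colours

def value_id(name, category_id=9):
    # A cache hit is one dict lookup; on a miss, rescan once in case
    # the shop's attributes changed since the last scan.
    global cache
    if name not in cache:
        cache = build_value_table(category_id)
    return cache.get(name)

print(value_id("red"))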
I am having difficulty understanding the Gremlin data load format (for use with Amazon Neptune).
Say I have a CSV with the following columns:
date_order_created
customer_no
order_no
zip_code
item_id
item_short_description
The requirements for the Gremlin load format are that the data is in an edge file and a vertex file.
The edge file must have the following columns: id, label, from and to.
The vertex file must have: id and label columns.
I have been referring to this page for guidance: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html
It states that in the edge file, the from column must equate to "the vertex ID of the from vertex."
And that (in the edge file) the to column must equate to "the vertex ID of the to vertex."
My questions:
Which columns need to be renamed to id, label, from and to? Or, should I add new columns?
Do I only need one vertex file or multiple?
You can have one or more of each CSV file (nodes, edges), but it is recommended to use fewer large files rather than many small ones; that allows the bulk loader to split each file up and load it in parallel.
As to the column headers, let's say you had a node (vertex) file of the form:
~id,~label,name,breed,age:Int
dog-1,Dog,Toby,Retriever,11
dog-2,Dog,Scamp,Spaniel,12
The edge file (for dogs that are friends) might look like this:
~id,~label,~from,~to
e-1,FRIENDS_WITH,dog-1,dog-2
In Amazon Neptune, any user-provided string can be used as a node or edge ID, so long as it is unique. So in your example, if customer_no is guaranteed to be unique, rather than storing it as a property called customer_no you could instead make it the ~id. This can help later with efficient lookups. You can think of the ID as being a bit like a primary key in a relational database.
So in summary: you always need to provide the required fields ~id and ~label. Once the data is loaded, they are accessed using dedicated Gremlin steps such as hasId and hasLabel. Columns with names from your domain, such as order_no, will become properties on the node or edge they are defined with, and are accessed using Gremlin steps such as has('order_no', 'ABC-123').
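As a quick sketch of what those lookups might look like from Python with gremlinpython once the data above is loaded (the endpoint is a placeholder):

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("wss://your-neptune:8182/gremlin", "g")
g = traversal().with_remote(conn)

# Required fields are queried with dedicated steps...
toby = g.V().hasLabel("Dog").hasId("dog-1").toList()

# ...while domain columns become properties, queried with has()
spaniels = g.V().has("Dog", "breed", "Spaniel").valueMap(True).toList()

conn.close()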
To follow on from Kelvin's response and provide some further detail around data modeling...
Before getting to the point of loading the data into a graph database, you need to determine what the graph data model will look like. This is done by first deriving a "naive" approach of how you think the entities in the data are connected and then validating this approach by asking the relevant questions (which will turn into queries) that you want to ask of the data.
By way of example, I notice that your dataset has information related to customers, orders, and items, plus some relevant attributes for each. Knowing nothing about your use case, I might derive a "naive" model that looks like: (Customer)-[has_ordered]->(Order)-[contains]->(Item).
What you have with your original dataset appears similar to what you might see in a relational database as a join table. This is a table that contains multiple foreign keys (the ids/no's fields) and maybe some related properties for those relationships. In a graph, relationships are materialized through the use of edges. So in this case, you are expanding this join table into the original set of entities and the relationships between them.
To validate that we have the correct model, we then want to look at the model and see if we can answer the relevant questions that we would want to ask of this data. For example, if we wanted to know all items purchased by a customer, we could trace our finger from a customer vertex to the item vertex. Being able to see how to get from point A to point B ensures that we will be able to easily write graph queries for these questions later on.
After you derive this model, you can then determine how best to transform the original source data into the CSV bulk load format. So in this case, you would take each row in your original dataset and convert that to:
For your vertices:
~id, ~label, zip_code, date_order_created, item_short_description
customer001, Customer, 90210, ,
order001, Order, , 2023-01-10,
item001, Item, , , "A small, non-descript black box"
Note that I'm reusing the no's/ids for the customer, item, and order as the ID for their related vertices. This is always good practice, as you can then easily look up a customer, order, or item by that ID. Also note that the CSV becomes a sparse 2-dimensional array of related entities and their properties. I'm only providing the properties related to each type of vertex; by leaving the others blank, they will not be created.
For your edges, you then need to materialize the relationships between the entities, based on the fact that they appear in the same row of your source "join table". These relationships did not previously have a unique identifier, so we need to create one (it can be arbitrary or derived from other parts of the data; it just needs to be unique). When possible, I like to combine the vertex IDs of the two related vertices with the label of the relationship. The ~from and ~to fields hold the vertex the relationship comes from and the vertex it points to, respectively:
~id, ~label, ~from, ~to
customer001-has_ordered-order001, has_ordered, customer001, order001
order001-contains-item001, contains, order001, item001
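As a rough sketch of that transformation in Python, assuming a source file named orders.csv with exactly the columns listed in the question (the file names and id prefixes here are my own inventions):

import csv

with open("orders.csv", newline="") as src, \
     open("vertices.csv", "w", newline="") as vf, \
     open("edges.csv", "w", newline="") as ef:
    vertices = csv.writer(vf)
    edges = csv.writer(ef)
    vertices.writerow(["~id", "~label", "zip_code",
                       "date_order_created", "item_short_description"])
    edges.writerow(["~id", "~label", "~from", "~to"])
    seen = set()  # avoid writing duplicate vertices/edges

    for row in csv.DictReader(src):
        cust, order, item = row["customer_no"], row["order_no"], row["item_id"]
        # Sparse vertex rows: only the properties for that vertex type.
        for vid, label, props in [
            (cust, "Customer", [row["zip_code"], "", ""]),
            (order, "Order", ["", row["date_order_created"], ""]),
            (item, "Item", ["", "", row["item_short_description"]]),
        ]:
            if vid not in seen:
                seen.add(vid)
                vertices.writerow([vid, label] + props)
        # Edge ids combine the two vertex ids and the label, as above.
        for eid, label, frm, to in [
            (f"{cust}-has_ordered-{order}", "has_ordered", cust, order),
            (f"{order}-contains-{item}", "contains", order, item),
        ]:
            if eid not in seen:
                seen.add(eid)
                edges.writerow([eid, label, frm, to])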
I hope that adds some further color and reasoning around how to get from your source data and into the format that Kelvin shows above.
I am working with Django and I am a bit lost on how to extract information from models (tables).
I have a table containing different information from various sensors. What I would like to know is whether it is possible, using the Django models, to obtain for each sensor (each sensor has an identifier) the last row of data (using the timestamp column).
In SQL it would be something like this (the query is probably not correct, but I think you can see what I am trying to do):
SELECT sensorID,timestamp,sensorField1,sensorField2
FROM sensorTable
GROUP BY sensorID
ORDER BY max(timestamp);
I have seen that a group_by() function exists and also latest(), but I don't get anything coherent, and I'm also not sure I'm choosing the best approach.
Can anyone help me get started with this topic? I imagine it is very easy, but it is a new world for me and it is difficult to start.
Greetings!
When you use a PostgreSQL database, you can make use of the queryset's .distinct(..) method [Django-doc], to which you pass the fields that determine what rows count as distinct.
So you can obtain the latest sensors in Django with:
SensorModel.objects.order_by('sensor', '-timestamp').distinct('sensor')
We thus order by sensor (which is required for a .distinct(..)) and then, in case of a tie (two rows for the same sensor), by timestamp in descending order; hence we pick the latest SensorModel object for each sensor.
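For completeness, on databases other than PostgreSQL (where .distinct(*fields) is not available), a sketch of the same query using Subquery/OuterRef, assuming the fields are named sensor and timestamp:

from django.db.models import OuterRef, Subquery

# For each row, find the newest primary key for that row's sensor...
newest = (SensorModel.objects
          .filter(sensor=OuterRef("sensor"))
          .order_by("-timestamp"))

# ...and keep only the rows whose pk is that newest pk.
latest_per_sensor = SensorModel.objects.filter(
    pk=Subquery(newest.values("pk")[:1])
)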
The Neptune documentation says they support "Set" property cardinality only for property data imported via CSV, which means there is no way for a newly arrived property value to overwrite an old value of the same property on the same vertex.
For example, if the first CSV imports
~id,~label,age
Marko,person,29
then Marko has a birthday and a second CSV imports
~id,~label,age
Marko,person,30
the 'Marko' vertex's 'age' property will contain both age values, which doesn't seem useful.
AWS says that collapsing such Set properties into single-cardinality properties (keeping only the last-arrived value) needs to be done as post-processing, via Gremlin traversals.
Does this mean there should be a traversal that continuously scans vertices with multiple (Set) property values and re-sets each property with single cardinality to the latest possible value? If so, what is the optimal Gremlin query to do that?
In pseudo-Gremlin, I'd imagine something like:
g.V().property(single, properties(*), _.tail())
Is there a guarantee at all that Set-cardinality properties are always listed in order of arrival?
Or am I completely on the wrong track here?
Any help would be appreciated.
Update:
So the best thing I have been able to come up with so far is still far from a perfect solution, but it might be useful for someone in my shoes.
In Plan A, if we happen to know the property names and the order of arrival does not matter at all (we just want single cardinality on these properties), the traversal over all vertices could be something like:
g.V().has(${propname}).where(property(single, ${propname}, properties(${propname}).value().order().tail() ) )
Plan B is to collect new property values under temporary property names on the same vertex (e.g. starting with _), then traverse the vertices that have such temporary properties and set the original properties to their tailed values with single cardinality:
g.V().has(${temp_propname}).where(property(single, ${propname}, properties(${temp_propname}).value().order().tail() ) ).properties(${temp_propname}).drop()
Plan C, which would be the coolest but unfortunately does not work, is to keep collecting property values on a dedicated vertex, with epoch timestamps as property names and property values as their values:
g.V(${vertexid}).out('has_propnames').properties()
==>vp[1542827843->value1]
==>vp[1542827798->value2]
==>vp[1542887080->latestvalue]
and then sort the property names (keys), take the last one, and use its value to keep THE main vertex property up to date with the latest value:
g.V().has(${propname}).where(out(${has_these_properties}).count().is(gt(0))).where(property(single, ${propname}, out(${has_these_properties}).properties().value( out(${has_these_properties}).properties().keys().order().tail() ) ) )
It looks like the parameter of the value() step must be a constant; it cannot take the outcome of another traversal as a parameter, so I could not get this working. Perhaps someone with more Gremlin experience knows a workaround.
AWS has recently introduced 'single' cardinality support in the CSV bulk loader:
https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-tutorial-format-gremlin.html
So no Gremlin-level property value rearrangement should be needed any more.
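If I read that page correctly, the cardinality is declared in the property column header, so a vertex file like the one in my question would become something like this (syntax per the linked docs; I have not tested it):

~id,~label,age:Int(single)
Marko,person,30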
It would probably be more performant to read in the file from which you are bulk loading and set that property using the vertex id, rather than scanning for a vertex with multiple values for that property.
So your Gremlin update query would be as follows:
g.V(${id})
.property(single,${key},${value})
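A sketch of driving that from Python with gremlinpython, assuming the bulk-load file is vertices.csv with simplified ~id and age columns (the column names and endpoint are placeholders):

import csv
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.traversal import Cardinality
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("wss://your-neptune:8182/gremlin", "g")
g = traversal().with_remote(conn)

with open("vertices.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Re-set the property with single cardinality, keyed by vertex id,
        # so the last loaded value wins.
        (g.V(row["~id"])
          .property(Cardinality.single, "age", int(row["age"]))
          .iterate())

conn.close()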
As for whether Set cardinality guarantees order of arrival, I do not know. :(
I am trying to create a store of products with WooCommerce and would like, if possible, to simplify the process of adding new products by having a default set of attributes.
For example, all of my products share certain attributes, including volts, watts, colour temperature, etc. I have created these attributes and can select them when I add a new product, but it would be really useful, for consistency and speed, if every time I added a product of a certain category the Add Product page automatically included a given set of attributes, rather than me having to add each attribute one by one for every product.
Does anyone know if this is possible?
In the picture I have highlighted one part: I have an attribute called "deviceModel" that contains more than one value. I want to query my domain for the itemName()s whose deviceModel attribute contains more than one value.
Thanks,
Senthil Raja
There is no direct way to get what you are asking. You need to handle it by writing your own piece of code: running a SELECT query will give you each item's attribute-value pairs, and you then need to traverse each itemName() and count the values of the attribute you are interested in.
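A rough sketch of that client-side counting in Python, assuming the old boto 2 SimpleDB bindings (where a multi-valued attribute comes back as a list) and a domain named mydomain:

import boto

sdb = boto.connect_sdb()             # credentials come from the environment
domain = sdb.get_domain("mydomain")  # assumed domain name

multi_valued = []
for item in domain.select('select deviceModel from `mydomain`'):
    value = item.get("deviceModel")
    # boto returns a plain string for one value, a list for several
    if isinstance(value, list) and len(value) > 1:
        multi_valued.append(item.name)

print(multi_valued)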
I think what you are referring to is called multi-valued attributes. When you put a value into an attribute without replacing the existing value, the values accumulate, giving you an array of values stored under that attribute name.
How you create them will depend on the SDK/language you are using for your REST calls; look for the Replace=true/false flag when you set the attribute's value.
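For instance, with the boto 2 bindings that flag appears as the replace argument (a sketch; the item and domain names are placeholders):

import boto

sdb = boto.connect_sdb()
domain = sdb.get_domain("mydomain")

# replace=False appends a second value -> multi-valued attribute
domain.put_attributes("item1", {"deviceModel": "ModelA"}, replace=False)
domain.put_attributes("item1", {"deviceModel": "ModelB"}, replace=False)

# replace=True overwrites whatever was stored before
domain.put_attributes("item1", {"deviceModel": "ModelC"}, replace=True)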
Here is the documentation page on retrieving them: http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/ (look under Using Amazon SimpleDB -> Using Select to Create Amazon SimpleDB Queries -> Queries on Attributes with Multiple Values)