How to output all the processor at same time in Nifi so that I can get same UUID() value for an Attribute - compare

I am generating a Json Array from two separate MergeContent Processor, both have a common Attribute call pointId(i.e UUID()) but I am not able to get the same PointId from MergeContent.
I am passing the same PointId to both MergeContent, but I am not able to get common pointId from both merge processor.

Did you set Correlation Attribute Name to pointId in MergeContent? If so, how many bins are you using and how many unique IDs could be in the system at once?

Related

Is this a reasonable way to design this DynamoDB table? Alternatives?

Our team has started to use AWS and one of our projects will require storing approval statuses of various recommendations in a table.
There are various things that identify a single recommendation, let's say they're : State, ApplicationDate, LocationID, and Phase. And then a bunch of attributes corresponding to the recommendation (title, volume, etc. etc.)
The use case will often require grabbing all entries for a given State and ApplicationDate (and then we will look at all the LocationId and Phase items that correspond to it) for review from a UI. Items are added to the table one at a time for a given Station, ApplicationDate, LocationId, Phase and updated frequently.
A dev with a little more AWS experience mentioned we should probably use State+ApplicationDate as the partition key, and LocationId+Phase as the sort key. These two pieces combined would make the primary key. I generally understand this, but how does that work if we start getting multiple recommendations for the same primary key? I figure we either are ok with just overwriting what was previously there, OR we have to add some other attribute so we can write a recommendation for the State+ApplicationDate/LocationId+Phase multiple times and get all previous values if we need to... but that would require adding something to the primary key right? Would that be like adding some kind of unique value to the sort key? Or for example, if we need to do status and want to record different values at different statuses, would we just need to add status to the sort key?
Does this sound like a reasonable approach or should I be exploring a different NAWS offering for storing this data?
Use a time-based id property, such as a ULID or KSID. This will provide randomness to avoid overwriting data, but also provide a time-based sorting of your data when used as part of a sort key
Because the id value is random, you will want to add it to your sort key for the table or index where you perform your list operations, and reserve the pk for known values that can be specified exactly.
It sounds like the 'State' is a value that can change. You can't update an item's key attributes on the table, so it is more common to use these attributes in a key for a GSI if they are needed to list data.
Given the above, an alternative design is to use the LocationId as the pk, the random id value as the sk, and a GSI with the GSI with 'State' as the pk and the random id as the sk. Or, if you want to list the items by State -> Phase -> date, the GSI sk could be a concatenation of the Phase and id property. The above pattern gives you another list mechanism using the LocationId + timestamp of the recommendation create time.

find prestashop attribute by name with webservice

To add an attribute or feature to a product by webservice, I need the id
To get the id for "red", I must first make a call to
get product_attribute_values?filter[id_category]=9 // 9 is colours
Then from that list I have to
get product_attribute_values/2
get product_attribute_values/4
get product_attribute_values/17
...
to get the names of each attribute_value (green,blue,red) until I find the value I want.
Is there no way to get attribute id by name?
I could do it easily in a join or two in mysql, but I am trying to do a clean import module to use only the webservice.
Because attributes like dimensions and weight are to be stored as an attribute or feature, even if the product schema contains dimensions and weight, the dimension and weight lists are rather large.
A work-around could be to just once scan the attribute/feature/options, and build a translate table in my import software.
Then when a product needs an attribute value, look it up in the cached table, perhaps verify if the tranlation text->id is still correct, and if not, rescan the attributes.
But can it be avoided?

what does the attribute selection in preprocess tab do in weka?

I cant seem to find out what attribute selection filter does in pre process tab? someone could please tell me in simple language as im new to weka
when i apply it to my dataset it seems to remove a couple of attributes but im unsure why
A real data set may contain many attributes. Applying any data mining process on this data set (e.g. finding clusters, generating a classification model ...) may take very long time.
Instead of that, we can select some attributes(dimensions) which is called the most discriminative attributes. These attributes can almost describe the data set with lower number of attributes and this will speed up any process done on the data.
Attribute selection tab contains many different methods for selecting these attributes. One of them is CFS Feature Set Evaluation This filter gives you the attributes that have higher correlation with the class label which makes them discriminative attributes.

Assigning unique IDs to String using MapReduce

I want to run a MapReduce Job where I want to scan multiple columns from a given file and assign a unique ID(Index No.) to each distinct value for each column. The main challenge is to share the same ID for same value that is encountered on different node or different instances of Reducer.
Currently, I am using zookeeper for sharing the Unique IDs, but that is having its performance impact. I have even kept the information in local cache's at reducer level to avoid multiple trips to zookeeper for same value. I wanted to explore if there is any other better mechanism to do the same.
I can suggest two possible solutions for your problem
Create unique ID based on your value. This might be a hash function with low collision rate.
Use faster storage than ZooKeeper. You can try simple key value storage like Redis to store value to id mapping.

How can i query to get the multiple values in SimpleDB (AWS)

jpg
In that Picture i have colored one part. i have attribute called "deviceModel". It contains more than one value.. i want to take using query from my domain which ItemName() contains deviceModel attribute values more than one value.
Thanks,
Senthil Raja
There is no direct approach to get what you are asking.. You need to manipulate by writing your own piece of code. By running SELECT query you will get the item Attribute-value pair. So here you need to traverse each each itemName() and count values of your desire attribute.
I think what you are refering to is called MultiValued Attributes. When you put a value in the attribute - if you don't replace the existing attribute value the values will multiply, giving you an array of items connected to the value of that attribute name.
How you create them will depend on the sdk/language you are using for your REST calls, however look for the Replace=true/false when you set the attribute's value.
Here is the documentation page on retrieving them: http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/ (look under Using Amazon SimpleDB -> Using Select to Create Amazon SimpleDB Queries -> Queries on Attributes with Multiple Values)