I want to remove all attributes from a given flow file except the ones I explicitly define to keep.
Given the following sample flow file attributes:
name: aaa
Place: bbb
Host: ccc
JsonAttribute: {
"A": "a",
"B": "b"
}
data: ddd
And I want to keep Host and JsonAttribute only.
Thus, the resulting flow file attributes should be:
Host: ccc
JsonAttribute: {
"A": "a",
"B": "b"
}
How can I achieve this using the standard processors NiFi provides?
renaming the attributes
Is it possible to rename attributes using the same procedure? E.g. I would like to keep the attributes as above but rename JsonAttribute to customName.
The UpdateAttribute processor not only allows you to add/set flow file attributes, it also lets you delete existing attributes from a flow file.
To do so, pass a regular expression matching all attributes you want to remove to the Delete Attributes Expression property.
In your case you can use negative lookaheads to match every attribute other than the ones you want to keep:
^((?!Host)(?!JsonAttribute).)*$
So no additional processor is needed.
keeping Host attribute, moving JsonAttribute to a different attribute name
You should be able to also achieve the second behaviour using only a single UpdateAttribute processor.
Simply add a property, e.g. customName, to the processor and reference the old attribute using the NiFi Expression Language:
${JsonAttribute}
In that case you may also simplify your deletion regex to also delete the JsonAttribute flow file attribute:
^((?!Host).)*$
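NiFi evaluates the Delete Attributes Expression as a Java regex, but the negative-lookahead trick behaves the same in most engines. Here is a quick sketch in Python (for illustration only; the attribute names are the ones from the question):

```python
import re

# Negative lookahead: match any attribute name that does not contain "Host".
# Names that match this pattern are the ones NiFi would delete.
keep_host_only = re.compile(r"^((?!Host).)*$")

attributes = ["name", "Place", "Host", "JsonAttribute", "data"]
deleted = [a for a in attributes if keep_host_only.match(a)]
# Host survives; everything else (including JsonAttribute) is deleted.
```

Note that the lookahead rejects any name *containing* the kept substring, so an attribute named e.g. `HostBackup` would also survive; anchor the alternatives (e.g. `^(?!(Host|JsonAttribute)$).*$`) if you need exact-name matching.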
The AttributesToJSON processor can also filter out the attributes you don't need. You can set the Attributes Regular Expression property on this processor.
The InvokeHTTP processor has an Attributes to Send property where you can define which attributes are used in the request, so you don't need to filter out attributes in a separate processor.
I have a situation where I would like to conditionally exclude a field from a query selection before I hit that query's resolver.
The use case being that my underlying API only exposes certain 'fields' based on the user's locale, and calls made to this API will throw errors if fields are requested that are not available for that locale.
I have tried an approach with directives,
type Person {
id: Int!
name: String!
medicare: String @locale(locales: ["AU"])
}
type Query {
person(id: Int!): Person
}
And using the SchemaDirectiveVisitor.visitFieldDefinition, I override field.resolve for the medicare field to return null when the user locale doesn't match any of the locales defined on the directive.
However, when a client with a non "AU" locale executes the following
query {
person(id: 111) {
name
medicareNumber
}
}
the field resolver for medicare is never called and the query resolver makes a request to the underlying API, appending the fields in the selection set (including the invalid medicareNumber) as query parameters. The API call returns an error object at this point.
I believe this makes sense as it seems that the directive resolver is on the FieldDefinition and would only be called when the person resolver returns a valid result.
Is there a way to achieve this sort of functionality, with or without directives?
In general, I would caution against this kind of schema design. As a client, if I include a field in the selection set, I expect to see that field in the response -- removing the field from the selection set server-side goes against the spec and can cause unnecessary confusion (especially on a larger team or with a public API).
If you are examining the requested fields in order to determine the parameters to pass to your API call, then forcing a certain field to resolve to null won't do anything -- that field will still be included in the selection set. In fact, there's really no way to create a schema directive that will impact the selection set of a request.
The best approach here would be to 1) ensure any potentially-null fields are nullable in the schema and 2) explicitly filter the selection set wherever your selection-set-to-parameters logic is.
EDIT:
Schema directives won't show up in the schema object available through info, so they can't be used as flags. My suggestion would be to maintain a separate in-memory map. For example:
const fieldsByLocale = {
US: {
Person: ['name', 'medicareNumber'],
},
AU: {
Person: ['name'],
},
}
then you could just access the appropriate list to filter with fieldsByLocale[context.locale][info.returnType]. This filtering logic is specific to your data source (in this case, the external API), so this is a bit cleaner than "polluting" the schema with information that pertains to the storage layer. If the APIs change, or you switch to a different source for this information altogether (like a database), you can update the resolvers without touching your type definitions. In fact, this way, the filtering logic can easily live inside a domain/service layer instead of your resolvers.
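The resolver-side filtering step can be sketched in a language-agnostic way; the following Python snippet (names are illustrative, mirroring the map above) shows the idea of intersecting the client's requested fields with the per-locale allow-list before building the upstream API call:

```python
# Hypothetical sketch: given the fields a client requested and a
# per-locale allow-list, compute the parameters actually forwarded
# to the upstream API.
fields_by_locale = {
    "US": {"Person": ["name", "medicareNumber"]},
    "AU": {"Person": ["name"]},
}

def filter_requested_fields(requested, locale, return_type):
    allowed = fields_by_locale.get(locale, {}).get(return_type, [])
    return [f for f in requested if f in allowed]

params = filter_requested_fields(["name", "medicareNumber"], "AU", "Person")
# Only "name" is forwarded; "medicareNumber" is dropped for AU.
```

In an actual graphql-js resolver the same logic would run against the selection set extracted from the `info` argument, with `context.locale` supplying the locale.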
I am trying to understand Apache NiFi inside and out, keeping files in HDFS, and I have various scenarios to work on. Please let me know the feasibility of each, with explanations. I am adding my current understanding to each scenario.
Can we check whether a null value is present within a single column? I have checked different processors and found a notNull property, but I think this works on file names, not on columns present within a file.
Can we drop a column present in HDFS using NiFi transformations?
Can we change column values, i.e. replace one text with another? I have checked the ReplaceText processor for this.
Can we delete a row from the file system?
Please suggest the possibilities and how to achieve the goal.
Try this:
1. Can we check whether a null value is present within a single column?
Yes. Using the ReplaceText processor you can check for and replace the value, or use RouteOnAttribute if you want to route based on a null-value condition.
2. Can we drop a column present in HDFS using NiFi transformations?
Yes. Using the same ReplaceText processor you can emit only the desired fields with a delimiter. For example, I needed a current-date field and a few mandatory fields, comma separated, so I provided the replacement value as
"${'userID'}","${'appID'}","${sitename}","${now():format("yyyy-MM-dd")}"
3. Can we change column values, i.e. replace one text with another?
Use the same ReplaceText processor.
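ReplaceText applies a regex to the flow file content and substitutes a replacement (using `$1`-style group references). A rough Python sketch of the null-column check, replacing an empty second field in a comma-delimited line with a default value (the column layout and default are made up for illustration):

```python
import re

line = "u123,,example.com"  # second column is empty ("null")

# Replace an empty second field with a default value; the capture group
# keeps the first field intact. In ReplaceText the replacement would be
# written as $1,N/A, instead of \1,N/A,.
fixed = re.sub(r"^([^,]*),,", r"\1,N/A,", line)
```

The same pattern-plus-replacement pair would go into ReplaceText's Search Value and Replacement Value properties.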
I want a JAPE transducer which, when executed, will return an annotation list containing only the annotations meaningful to my requirement. E.g. I do not want SpaceToken, Sentence, Token, Lookup, etc. implicitly included in my annotations. As this JAPE will be last in the application sequence and does not require anything to match on the LHS (as far as I understand, correct me if I'm wrong), is there any way to have only RHS code?
Phase: filteAnnot
Input: token
Options: control = appelt
Rule: filteAnnot
Priority: 50
-->
:label{
[My Logical Stuff of removing annotations]
}
First of all, you probably don't need to delete the annotations. Usually when you're embedding GATE you'll run a pipeline and then delete the document anyway.
If you need to clear the default Annotation Set you can run an "Annotation Set Transfer PR" to move your valuable annotations to a different AS and then "Document Reset PR" to clear the default AS. Or if you don't have that many annotation types, just use "Document Reset PR" and add the types to its "annotationTypes" parameter.
You can also write a groovy script PR to remove annotations:
inputAS.findAll{
it.type != "MyAnnotation"
}.each{ ann ->
outputAS.remove(ann); // probably removeAll would be simpler
}
I would like to set up Elasticsearch on a table "content" which also has a translation table "content_translation" for localization purposes (Globalize gem). We have 10 languages.
I want to implement Elasticsearch for my data model. In SQL I would search like:
SELECT id, content_translation.content
FROM content
LEFT JOIN content_translation on content.id = content_translation.content_id
WHERE content_translation.content LIKE '%???????%'
I wonder what the best strategy is to do a "left join"-like search with Elasticsearch?
Should I create just "content" index with all translation data in it?
{"id":21, "translations":{"en":{"content":"Lorem..."}, "de":{"content":"Lorem..."} ..}
Should I create "content_translation" index and just filter results for specific locale?
{"content_id":21, "locale":"en", "content": "Lorem ..."}
Are there some good practices how to do it?
Should I take care of maintaining the index myself, or should I use something like the Tire gem, which takes care of indexing by itself?
I would recommend the second alternative (one document per language), assuming that you wouldn't need to show content from multiple languages.
i.e.
{"content_id":21, "locale":"en", "content": "Lorem ..."}
I recommend a gem like Tire, and exploiting its DSL to your advantage.
You could have your content model to look like:
class Content < ActiveRecord::Base
include Tire::Model::Search
include Tire::Model::Callbacks
...
end
Then you could have a search method that does something like
Content.search do
query do
...
end
filter :terms, :locale => I18n.locale.to_s
end
Your application would need to maintain the locale at all times in order to serve the respective localized content. You could just use I18n's locale to look up the data. Pass it in as a filter and you get the separation you want. As a bonus, you get fallbacks for free if you have enabled them in i18n for Rails.
However, if you have a use case where you need to show multi-lingual content side by side, then this fails and you could look at a single document holding all language content.
I have done a search for all nodes that have an attribute containing (substring) a String. These nodes can be found at different levels of the tree, sometimes 5 or 6 levels deep. I'd like to know what parent/ancestor node they correspond to at a specified level, 2 levels deep. The result for the search only should be much greater than the results for the corresponding parents.
EDIT to include code:
/xs:schema/xs:element/descendant::node()/@*[starts-with(., 'my-search-string-here')]
EDIT to clarify my intent:
When I execute the Xpath above sometimes the results are
/xs:schema/xs:element/xs:complexType/xs:attribute or
/xs:schema/xs:element/xs:complexType/xs:sequence/xs:element or
/xs:schema/xs:element/xs:complexType/xs:complexContent/xs:extension/xs:sequence/xs:element
These results indicate a place in the Schema where I have added application specific code. However, I need to remove this code now. I'm building an "adapter" schema that will redefine the original Schema (untouched) and import my schema. The String I am searching for is my prefix. What I need is the @name of the /xs:schema/node() in which the prefix is found, so I can create a new schema defining these elements. They will be imported into the adapter and redefine another schema (that I'm not supposed to modify).
To reiterate, I need to search all the attributes (descendants of /xs:schema/xs:element) for a prefix, and then get the corresponding /xs:schema/xs:element/@name for each of the matches to the search.
/
xs:schema/
xs:element
[descendant::*/@*[starts-with(., 'my-search-string-here')]]/
@name
This should do it:
/xs:schema/xs:element[starts-with(descendant::node()/@*, 'my-search-string-here')]
You want to think of it as
select the xs:elements which contain a node with a matching attribute
rather than
select the matching attributes of descendant nodes of xs:elements, then work back up
As Eric mentioned, I needed to change my thought process to select the xs:elements which contain a node with a matching attribute, rather than select the matching attributes of descendant nodes of xs:elements and then work back up. This is critical. However, the code sample he posted to select the attributes does not work, so we need another solution.
Here is the code that works to select an element that contains an attribute containing (substring) a string:
/xs:schema/child::node()[descendant::node()/@*[starts-with(., 'my-prefix-here')]]
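The same selection logic ("top-level xs:elements whose descendant attributes start with a prefix, then take their @name") can be sketched imperatively in Python with the standard library's ElementTree, which does not support starts-with() in XPath predicates. The schema content below is invented purely for illustration:

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
schema = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '  <xs:element name="Order">'
    '    <xs:complexType><xs:sequence>'
    '      <xs:element name="my-prefix-Item"/>'
    '    </xs:sequence></xs:complexType>'
    '  </xs:element>'
    '  <xs:element name="Plain"/>'
    '</xs:schema>'
)

def elements_with_prefixed_attr(root, prefix):
    hits = []
    for el in root.findall(f"{XS}element"):
        # el.iter() walks el and all descendants (descendant-or-self),
        # checking every attribute value for the prefix.
        if any(v.startswith(prefix)
               for d in el.iter() for v in d.attrib.values()):
            hits.append(el.get("name"))
    return hits

names = elements_with_prefixed_attr(schema, "my-prefix-")
```

Here `names` would contain only "Order", the top-level element whose subtree carries a prefixed attribute, matching what the XPath above returns.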