Select certain fields/children of flat XML with XPath / XSLT

I have an XML file structured like a single database table. Is there a way I can get "listpos" rows restricted to certain fields, i.e. only "type" and "objName"?
<listpos lfdNr="0001" reihe="20140626143443">
<type>Akt</type>
<objName>2#25.6.2014#40801#de</objName>
<laborOrt>au</laborOrt>
...
</listpos>
<listpos lfdNr="0002" reihe="20140626181936">
<type>Akt</type>
<objName>2#25.6.2014#40802#de</objName>
<laborOrt>au</laborOrt>
...
</listpos>
...
So the output should be
<listpos>
<type>Akt</type>
<objName>2#25.6.2014#40801#de</objName>
</listpos>
<listpos>
<type>Akt</type>
<objName>2#25.6.2014#40802#de</objName>
</listpos>
...
Writing this down, it occurs to me that this might be a job for XSLT?
The whole thing I am doing is trying to make django-xml work together with django_tables2 ...

Yes, by using XPath you can get that data pretty easily. You navigate to the appropriate elements using XPath expressions.
For the data you're trying to extract (assuming you want all "type" and "objName" elements), those would be //listpos/type and //listpos/objName.
Those expressions select the right nodes, from which you can then extract the content.
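If you want the transformation itself, a minimal XSLT sketch could look like the following. It assumes the listpos fragments share a single root element (called list here as a placeholder), since a well-formed XML file needs one:
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- For every listpos anywhere in the document, copy only type and objName -->
  <xsl:template match="/">
    <list>
      <xsl:for-each select="//listpos">
        <listpos>
          <xsl:copy-of select="type"/>
          <xsl:copy-of select="objName"/>
        </listpos>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>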

Related

Elasticsearch + translated content in Rails 4

I would like to set up Elasticsearch on a table "content" which also has a translation table "content_translation" for localization purposes (we use the Globalize gem). We have 10 languages.
I want to implement Elasticsearch for my data model. In SQL I would search like:
SELECT id, content_translation.content
FROM content
LEFT JOIN content_translation on content.id = content_translation.content_id
WHERE content_translation.content LIKE '%???????%'
I wonder what the best strategy is to do a "left join"-like search with Elasticsearch?
Should I create just a "content" index with all translation data in it?
{"id":21, "translations":{"en":{"content":"Lorem..."}, "de":{"content":"Lorem..."}, ...}}
Or should I create a "content_translation" index and just filter results for a specific locale?
{"content_id":21, "locale":"en", "content": "Lorem ..."}
Are there good practices for how to do this?
Should I take care of maintaining the index myself, or should I use something like the Tire gem, which takes care of indexing by itself?
I would recommend the second alternative (one document per language), assuming that you don't need to show content from multiple languages at once, i.e.:
{"content_id":21, "locale":"en", "content": "Lorem ..."}
I recommend a gem like Tire; exploit its DSL to your advantage. You could have your content model look like:
class Content < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks
  ...
end
Then you could have a search method that does something like:
Content.search do
  query do
    ...
  end
  # the terms filter expects an array of values
  filter :terms, :locale => [I18n.locale.to_s]
end
Your application would need to keep track of the locale at all times to serve the respective localized content. You could just use I18n's locale to look up the data: pass it in as the filter, and you get the separation you want. As a bonus, you get fallbacks for free if you have enabled them in Rails i18n.
However, if you have a use case where you need to show multi-lingual content side by side, then this fails and you could look at a single document holding all language content.
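If you go with one document per language, one hypothetical way to wire it up with Tire is to index the translation model itself and shape the document via to_indexed_json; the model and column names below are assumptions based on the question:
class ContentTranslation < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  belongs_to :content

  # One Elasticsearch document per translation row, i.e.
  # {"content_id":21, "locale":"en", "content":"Lorem ..."}
  def to_indexed_json
    { content_id: content_id, locale: locale, content: content }.to_json
  end
end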

How to determine if a given word or phrase from a list is within an anchor tag?

We have a ColdFusion-based site that involves a large number of 'document authors' who have little or no knowledge of HTML. The 'documents' they create are composed of HTML stored in a table in the database, and the authors use a CKEditor interface. The content they create is output into a specific area of the page. The documents frequently contain tons of technical terms that readers may not be familiar with, for which we would like tooltips to show up automatically.
The other programmer and I want some code to insert 'tooltip' markup into the page based on a list of words in a table on our SQL server. The 'dictionary' table in our database has a unique ID, the word/phrase we will look for, and a corresponding definition that would be displayed in the tooltip.
For instance, one of the words/phrases we will be looking for is 'Scrum Master'. If it occurs in the document area, we need to insert code around the words to create a tooltip. To do that, we need to check certain conditions. Are the words within an anchor tag? If yes, is there already a title value for the tag (the title is used to hold the text displayed in the tooltip)? If a title exists, don't do anything. If the words are not in an anchor tag, we would put anchor tags around the words, along with a title containing the definition.
The tooltip code we use is via jQuery (http://jqueryui.com/tooltip/). It is quick and simple to use. We just need to figure out how to use it dynamically based on our dictionary table.
Do you have any suggestions of how to go about this?
I was hoping that jsoup might have a function I could use, but that doesn't seem to be the right technology for what I want to do. I could be wrong, though, and I am happy to be corrected!
We have a large number of these documents and so manually inserting and maintaining the tooltip code is just not an option.
Update your content with something like:
strOut = ReplaceList(strIn, ValueList(qryTT.find), ValueList(qryTT.replace));
Since words are delimited by spaces, the values in qryTT.find need to include the surrounding spaces. The replace column will need to include some of the original content. You will also have to be careful with words followed by a comma or a period.
I would cache the results because I would expect it to be memory intensive.
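As a rough per-row alternative to the ReplaceList call above (a sketch only; it assumes hypothetical find and definition columns, and it does not yet skip matches that are already inside an anchor tag):
<!--- Wrap each dictionary term in an anchor whose title holds the definition;
      the jQuery UI tooltip reads the title attribute. --->
<cfloop query="qryTT">
    <cfset strIn = Replace(strIn,
        " " & qryTT.find & " ",
        " <a href='##' title='" & qryTT.definition & "'>" & qryTT.find & "</a> ",
        "ALL")>
</cfloop>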

Using RegEx in SSIS

I currently have a package pulling data from an Excel file, but when pulling the data out I get rows I do not want. So I need to pull out every row whose 'ID' field has any sort of letter in it.
I need to be able to run a RegEx-like pattern such as "%[a-zA-Z]%" to find that data, but the current limitations of the Conditional Split won't let me do that. Any ideas on how this can be done?
At the core of the logic, you would use a Script Transformation, as that's the only place you can access regular expressions.
You could simply add a second column to your data flow, IDCleaned, which would contain only cleaned values or NULL. You could then use the Conditional Split to filter good rows vs. bad. (Related: System.Text.RegularExpressions.Regex.Replace error in C# for SSIS.)
If you don't want to add another column, you can set your current ID column to ReadWrite for the Script and update it in place. Perhaps adding a boolean column would make the Conditional Split logic easier at this point.
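A minimal sketch of the Script Transformation body, assuming a string input column named ID and an added boolean output column named HasLetter (both names are placeholders):
using System.Text.RegularExpressions;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Flag rows whose ID contains any letter so a downstream
    // Conditional Split can route them away from the clean rows.
    Row.HasLetter = !Row.ID_IsNull && Regex.IsMatch(Row.ID, "[a-zA-Z]");
}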

RegEx for a JSON string

I'm storing a person object as JSON in my SQLite database. The table will have a few thousand person records. What I need is to query persons based on the "name" attribute.
After some investigation I figured I could use SQLite's GLOB operator to perform a regex-like search on the JSON elements.
My Sample JSON is something like this.
{"name":"john","age":"22","father-name":"jackson"}
Now I want a regex matcher that gets me all records where the substring provided matches part of the name attribute in the JSON, and it should be case-insensitive too.
For example, an "ohn" search should fetch john's record.
While you can store JSON and search against it using regexes (which are rather limited in SQLite), that does not mean you should.
Instead, you should seriously consider splitting your JSON into fields and storing them in a normal SQLite table. Doing so will not only let you perform searches without painfully parsing the data every single time; the searches will also be much faster (if you add the necessary indexes).
If you do want to go down the regex route, the following will extract the record:
/\{"name":"\w*ohn\w*[^\}]+\}/i
This will match each of these:
{"name":"john","age":"22","father-name":"jackson"}
{"name":"john","age":"22","father-name":"johnson"}
{"name":"johnny","age":"22","father-name":"smith"}
but not:
{"name":"fred","age":"22","father-name":"hall"},
{"name":"mike","age":"22","father-name":"johnson"}
{"name":"bob","age":"22","father-name":"todd"}

What is a good design pattern to implement a dynamic data importer tool?

We are planning to build a dynamic data import tool: basically taking information in on one end in a specified format (Access, Excel, CSV) and uploading it into a web service.
The situation is that we do not know the export field names, so the application will need to be able to read the WSDL definition and map to the valid entries on the other end.
In the import section we can define most of the fields, but usually there are a few that are custom, which I see no problem with.
I just wonder if there is a design pattern that would fit this type of application or help with its development.
I am not sure where the complexity is in your application, so I will just give an example of how I have used patterns for importing data in different formats. I created a factory which takes the file format as an argument and returns a parser for that particular format. Then I use the Builder pattern: the parser is given a builder, which it calls while parsing the file to construct the desired data objects in the application. A fuller sketch follows the pseudocode below.
// In this example file format describes a house (complex data object)
AbstractReader reader = factory.createReader("name of file format");
AbstractBuilder builder = new HouseBuilder(list_of_houses);
reader.import(text_stream, builder);
// now the list_of_houses should contain an extra house
// as defined in the text_stream
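Fleshing that pseudocode out, a compact, hypothetical Java sketch of the factory-plus-builder combination (all names are illustrative):
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Builder: the parser calls back into this while parsing.
interface Builder {
    void addHouse(String name, int rooms);
}

// Concrete builder that collects houses into a list.
class HouseListBuilder implements Builder {
    final List<String> houses = new ArrayList<>();
    public void addHouse(String name, int rooms) {
        houses.add(name + " (" + rooms + " rooms)");
    }
}

// Reader: one implementation per file format.
interface FormatReader {
    void importData(Reader input, Builder builder) throws Exception;
}

// CSV implementation: parses "name,rooms" lines and feeds the builder.
class CsvReader implements FormatReader {
    public void importData(Reader input, Builder builder) throws Exception {
        BufferedReader br = new BufferedReader(input);
        for (String line = br.readLine(); line != null; line = br.readLine()) {
            String[] parts = line.split(",");
            builder.addHouse(parts[0].trim(), Integer.parseInt(parts[1].trim()));
        }
    }
}

// Factory that picks a reader for a given format name.
class ReaderFactory {
    FormatReader createReader(String format) {
        if ("csv".equalsIgnoreCase(format)) return new CsvReader();
        throw new IllegalArgumentException("unknown format: " + format);
    }
}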
I would say the Adapter pattern, as you are "adapting" the data from a file to an object, much like SqlDataAdapter does from a SQL table to a DataTable.
You could have a different adapter for each file type/format; for example, SqlDataAdapter and MySqlDataAdapter handle the same commands against different data sources to achieve the same output DataTable.
See the Adapter pattern.
Probably Bridge could fit, since you have to deal with different file formats. And Façade to simplify the usage. Handle my reply with care, though; I'm just learning design patterns :)
You will probably also need the Abstract Factory and Command patterns.
If the data doesn't match the input format, you will probably need to transform it somehow. That's where the Command pattern comes in. Because the formats are dynamic, you will need to base the commands you generate on the input; that's where Abstract Factory is useful.
Our situation is that we need to import parametric shapes from competitors' files. The layout of their screens and data fields is similar, but different enough that a conversion process is needed. In addition we have over a half dozen competitors, and maintenance would be a nightmare if done through code only. Since most of them use tables to store the parameters for their shapes, we wrote a general-purpose collection of objects to convert X into Y.
In my CAD/CAM application the file import is a Command; the conversion magic, however, is done by a RuleSet via the following steps:
1. Import the data into a table. The field names are pulled in as well, depending on the format.
2. Pass the table to a RuleSet (I will explain the structure of a RuleSet in a minute).
3. The RuleSet transforms the data into a new set of objects (or tables), which we retrieve.
4. Pass the result to the rest of the software.
A RuleSet is composed of a set of Rules. A Rule can contain another Rule. A Rule has a CONDITION that it tests and a MAP TABLE.
The MAP TABLE maps an incoming field to a field (or property) in the result. There can be one mapping or a multitude. A mapping doesn't have to just poke the input value into an output field; we have a syntax for calculations and string concatenation as well.
This syntax is also used in the CONDITION and can incorporate multiple fields, like ([INFIELD1] & "-" & [INFIELD2])="A-B" or [DIM1] + [DIM2] > 10. Anything between the brackets is substituted with the incoming field's value.
Rules can contain other Rules. The way this works is that for a sub-Rule's mapping to apply, both its condition and those of its parent (or parents) have to be true. If a sub-Rule has a mapping that conflicts with a parent's mapping, the sub-Rule's mapping applies.
If two Rules on the same level have conditions that are true and have conflicting mappings, then the Rule with the higher index (or lower on the list, if you are looking at a tree view) has its mapping applied.
Nested Rules are equivalent to ANDs, while Rules on the same level are equivalent to ORs.
The result is a mapping table that is applied to the incoming data to transform it into the needed output.
This structure is amenable to being displayed in a UI: a tree view showing the rule hierarchy and a side panel showing the mapping table and conditions of the selected rule. Just as importantly, you can create wizards that automate common rule structures.
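For illustration, a bare-bones sketch of the structures described above, with hypothetical names and with conditions and mappings reduced to strings:
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A Rule tests a CONDITION and carries a MAP TABLE; child Rules refine
// their parent (nested = AND), siblings are alternatives (same level = OR).
class Rule {
    String condition;                 // e.g. "([INFIELD1] & \"-\" & [INFIELD2]) = \"A-B\""
    Map<String, String> mapTable =    // incoming field -> output field/expression
        new LinkedHashMap<>();
    List<Rule> children = new ArrayList<>();
}

// On conflicting mappings among true siblings, the higher-indexed Rule wins.
class RuleSet {
    List<Rule> rules = new ArrayList<>();
}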