Use Yahoo Pipes to convert RSS html tags to standard tag items - regex

I want to move from the bookmarking service Delicious to Diigo, but the way Diigo organises tags in its RSS feed is preventing the move.
I want to use a Yahoo Pipe to turn Diigo RSS tags into the same format as Delicious RSS tags.
Diigo tags are stored as an HTML list at the bottom of the 'Description' item, like this:
Some text describing the link.
<p class="diigo-tags"><strong>Tags:</strong>
<a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/firsttag'>firsttag</a>
<a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/2ndtag'>2ndtag</a>
<a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/anothertag'>anothertag</a>
etc... </p>
I need to extract each of these and store them in their own item. Delicious stores each tag in a nested field category by number, like this:
category
0
domain http://delicious.com/username/
content firsttag
1
domain http://delicious.com/username/
content 2ndtag
So, the Yahoo Pipe needs to strip the HTML list and separate each tag into its own category field.
I'm not sure where to start, except maybe with this regular expression to strip the HTML:
(?si)<a[^<>]*?[^<>]*>(.*?)</a>
Any advice appreciated.

You can extract the tags from the Diigo stream by performing the following replacements using the Regex operator:
replace <a[^<>]*?[^<>]*>(.*?)</a> with $1, using options g and s (this keeps only the tag text inside each <a>...</a>)
replace <.+> with nothing, using options g and m (delete all HTML tags)
replace [\s]+ with a single space, using options g and s
As a result, the description field now contains the list of tags separated by spaces. I'm not sure what you need next. If you tell me, I can try to help.
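For anyone who wants to try the idea outside Pipes, here is a minimal Python sketch. It is not a translation of the pipe: instead of the three replacements, it pulls the anchor text straight out of the diigo-tags paragraph and then builds Delicious-style category entries. The names extract_tags, to_categories and DELICIOUS_DOMAIN are made up for illustration, and the sample description is adapted from the question.

```python
import re

# Hypothetical constant: the domain Delicious puts on each category entry.
DELICIOUS_DOMAIN = "http://delicious.com/username/"

# Sample description, adapted from the question.
DESCRIPTION = """Some text describing the link.
<p class="diigo-tags"><strong>Tags:</strong>
<a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/firsttag'>firsttag</a>
<a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/2ndtag'>2ndtag</a>
<a rel="nofollow" target="_blank" href='https://www.diigo.com/user/username/anothertag'>anothertag</a>
</p>"""

# Matches the diigo-tags paragraph and, inside it, the text of each link.
TAG_PARAGRAPH = re.compile(r'<p class="diigo-tags">(.*?)</p>', re.DOTALL)
ANCHOR = re.compile(r"<a[^<>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def extract_tags(description):
    """Return the tag names found in the diigo-tags paragraph, in order."""
    paragraph = TAG_PARAGRAPH.search(description)
    if paragraph is None:
        return []
    return [text.strip() for text in ANCHOR.findall(paragraph.group(1))]


def to_categories(tags, domain=DELICIOUS_DOMAIN):
    """Build one Delicious-style category entry per tag."""
    return [{"domain": domain, "content": tag} for tag in tags]


if __name__ == "__main__":
    for index, category in enumerate(to_categories(extract_tags(DESCRIPTION))):
        print(index, category["domain"], category["content"])
```

Restricting the search to the diigo-tags paragraph means links that appear in the description text itself are not mistaken for tags.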
Here's the pipe:
https://pipes.yahoo.com/pipes/pipe.info?_id=1656d9fcab9d9ed6016bdae7486ee71f
UPDATE
I see, the tricky part is adding multiple category nodes to an RSS feed. Unfortunately, I don't think that's possible. I updated the pipe, so that now you have item.category.1, .2, .3, and so on, but when you look at the RSS output of the pipe, it doesn't show any categories. (I think this might be related to the fact that the Create RSS operator doesn't have a category field either.)
In the JSON output there are multiple categories correctly.
I also tested that if there is only one category field, it shows up correctly in the RSS output. If there is more than one, none of them show up.
And I'm afraid this is as far as I can get you.

Bad management of BR tag by M2Doc

I have come across a weird behavior of M2Doc.
My template is the following:
1.1 {M:REQ.REQIFNAME}
{m:req.ReqIFText.fromHTMLBodyString().replaceLink(req).reduceAllImages(380,380)}
If the ReqIfName field contains the following
<div id="LHDD__006"><br/>Beginning of the ReqIfText<br/>abc<br/>def<div/>
Then the whole content of the field is displayed in the title, meaning that it actually merges with the preceding content.
If the ReqIfName field contains the following
<div id="LHDD__006">Beginning of the ReqIfText<br/>abc<br/>def<div/>
Then the result is what one would expect.
Here is a screenshot of the result:
[Image: Word document resulting from the generation]

Capybara: find and save a text value from a page

I am in the process of writing a Capybara automation suite. One thing I am trying to do is extract a value between the td tags from the HTML source:
<td class="table-column-data">CB/AE9999XX/A001</td>
I.e. find and extract the value CB/AE9999XX/A001, then save it into a variable to use later on.
I hope you can help.
Thanks
saved_text = find("td.table-column-data").text
This will get the text from the element. The selector passed needs to match a unique element, which will depend on the surrounding HTML.
You can extract the value and save it in a variable like this:
extracted_value = find('.table-column-data').text
This will fetch the text "CB/AE9999XX/A001" and store it in the variable "extracted_value".
Apart from this, you can also extract the text using jQuery, provided the page loads jQuery and you are using a JavaScript-capable driver:
extracted_value = page.evaluate_script("$('.table-column-data').text()")
Hope this helps :)

Regular expression: handling URL exclusions

I am trying to write a regexp to use with Crazyegg that will allow me to only gather data from my product pages.
My site structure is:
category page: www.sitename.com/categoryname/sub-categoryname
product page: www.sitename.com/productname/
My regex so far is:
^https?://([A-Za-z0-9.-]*\.)?sitename\.com/[A-Za-z0-9.-]*(/|/\?|)$
This allows everything that isn't at the sub-category level (2nd-level folder?).
The issue is that this also allows top-level categories, so I need to exclude these by name, for example:
^https?://([A-Za-z0-9.-]*\.)?sitename\.com/(?!\babout\b|\bcheckout\b)(/|/\?|)$
Could you please help me get the exclusion correct? I've also tried using [^\babout\b|\bcheckout\b]
If you only want product pages, the regex to capture product pages is:
".*productname"
To capture category pages: ".*categoryname/sub-categoryname"
I hope this helps. If you have more questions, ask me!
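If excluding top-level pages by name is what is needed, one possible approach is to put a negative lookahead right after the slash that follows the domain, and keep the path-segment pattern after it. A lookahead consumes no characters, so without the segment pattern nothing is left to match the product name. The Python sketch below assumes the target engine supports lookaheads (not verified for Crazyegg); PRODUCT_PAGE and is_product_page are made-up names, and about and checkout are the example exclusions from the question.

```python
import re

# Same structure as the question's first pattern, plus a negative lookahead
# that rejects a first path segment that is exactly "about" or "checkout".
PRODUCT_PAGE = re.compile(
    r"^https?://([A-Za-z0-9.-]*\.)?sitename\.com/"
    r"(?!(?:about|checkout)/?(?:\?|$))"   # excluded names, matched as whole segments
    r"[A-Za-z0-9.-]+(/|/\?|)$"            # a single path segment, as in the question
)


def is_product_page(url):
    """Return True when the URL is a single-segment page that is not excluded."""
    return PRODUCT_PAGE.match(url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.sitename.com/productname/",
        "http://www.sitename.com/about/",
        "http://www.sitename.com/about-us/",
        "http://www.sitename.com/categoryname/sub-categoryname",
    ):
        print(url, is_product_page(url))
```

Anchoring the excluded names with /?(?:\?|$) keeps a page such as about-us from being rejected just because it starts with an excluded word.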

SharePoint 2013 list item filter

My requirement is to submit records with attachments into a list/library, where each record can be tagged using a category field. A single item can have multiple categories.
I.e. item A can be tagged with category X, or with categories X and Y (multiple categories are possible).
Users should also be able to filter these records in the list/library by the tagged category.
I.e. if item A is tagged with categories X and Y, it should show up under both categories when we filter.
What approach should I use in SharePoint 2013?
You can use taxonomy (managed metadata) to achieve this behaviour. Create term sets with terms in them and use those terms to tag your list items.
You didn't provide details about how you are searching the data: through your own code or the OOTB SharePoint search?
Thanks

MediaWiki Template Templates?

Not sure I have the correct terminology here. I'll explain what I want to do and you guys can tell me if it's possible.
I'm using MediaWiki for a customer list. I have a category for customers containing, say, 20 customers. Each customer page is made up of several "headings", plus an infobox. What I want to know is how to include those headings as a template, "Customer Landing Page". Each customer page (Customer A, Customer B, etc.) has the same headings but not the same content. So on each customer page I want to include a "template" that provides the same headings with no content under them, so that whenever I change this template, the change appears on every customer page, and all I have to edit on the customer page itself is the content.
You'll have to make one big template for the entire customer page, in which you put all the info. I'll make an example template for a page with two headers, "Customer Landing Page" and "More info". The headers are fixed, and the contents below them vary between customer pages.
First, you make the template by creating the page Template:Customer
In here you put:
=Customer Landing Page=
{{{landingpagetext}}}
=More info=
{{{moreinfotext}}}
The triple curly braces indicate the variables you will later define in each customer page. For customer A:
{{customer
| landingpagetext = This is the landingpage for customer A
| moreinfotext = This customer is a vegetarian
}}
Customer B:
{{customer
| landingpagetext = This is the landingpage for customer B
| moreinfotext = This customer likes Tom & Jerry
}}
The double curly braces indicate the start of a template, and the first word is the name of the template used. After each pipe ( | ) you can assign a variable. I only used newlines to make it easier to read. You don't have to do that, but it makes the page easier to maintain.
If you don't use the variable names (like {{customer|Landing page text|More info text}} ) you can access the variables by the order in which they are passed, using {{{1}}} and {{{2}}} in the template.
If the customer pages are really big you might want to split the template up and use one per section.
Another option (but more complex) is the use of Navboxes. This requires a lot more setup but may be closer to what you are looking for.
You could look at using MultiBoilerplate. I use this to set default text in pages. I would call this a template, but MediaWiki uses that term for something else. If you just want to load the same default text when you start a new page and then fill it in with your own text, then I think this is what you need.