Yahoo Pipes and Website Name - regex

How do I fetch Page Name with Yahoo Pipes?
I'm making a news / blog aggregator, and need to know the name of the site where the info is coming from (bbc, cnn, fox, etc).
Do I need to do this with REGEX?
Anyone that can help?

You can fetch the page using the XPath Fetch Page or Fetch Feed modules in the Sources menu. Maybe with others too.
After that you can extract the page name itself using the various operators, possibly Regex, or others, depending on the source page you are using and the output you want to get.
In general your question is too broad and difficult to answer. To get you started, I created an example pipe that extracts the title of your question from this post, which is basically the "page name" of the current page.
http://pipes.yahoo.com/pipes/pipe.info?_id=668acf3f807c30d7b75f12459edd3252
I used the XPath Fetch Page with parameters:
URL = this page
Extract using XPath = //div[#id="question-header"]
I got that div path by inspecting the source code of this page, where I saw that div#question-header is the container of a question. I could have selected a deeper inner container or a higher level container. It all depends on the amount of other information you need. The more information you want to you from the page, the higher level container you select.
Next, I used the Create RSS operator to create a proper RSS feed, with parameters:
Title = h1.a
Link = h1.a.href
I chose these elements because in the container I extracted with xpath, the page name is inside h1 a. In Yahoo Pipes you use a dot as the path separator.

I found this sample pipe http://pipes.yahoo.com/pipes/pipe.info?_id=69b5dce1c59501a0c64a660c1cfdb856. The page title included the name of the site too. I am not sure if this what you are looking for.

Related

How to link to html page at specific ID bookmark?

Using Diagrams.net (draw.io), I would like to link specific elements to web pages. This is easily accomplished currently by creating a link for the element (say a rectangle).
However, I would like to navigate directly to a specific id bookmark in the HTML page. I cannot seem to get that to work.
For example, if I try to use this syntax (which works in the browser location bar):
https://en.wikipedia.org/wiki/Canada#Geography
I will be taken to the main page:
https://en.wikipedia.org/wiki/Canada
However, the goal is to go to the "Geography" section of this page.
I have also tried the json syntax without any success:
data:action/json,{"actions":[{"open":"https://en.wikipedia.org/wiki/Canada#Geography"}]}
I have also played with different action syntax such as:
data:action/json,{"actions":[{"open":"https://en.wikipedia.org/wiki/Canada"},{"scroll":{"tags":["Geography"]}}]}
Note: I'm using the diagrams.net desktop version 14.1.8.
Thank you for taking the time to read this question.
Paul
On Windows this only seems to work if the browser isn't already open. There is not much we can do to fix this as we're passing the link to the OS.

MailChimp: How to use conditional logic with RSS feeds

I have searched MailChimp's documentation as well as other sites but cannot seem to figure out how to use both conditional merge tag blocks with |FEED| merge tags.
Basically I am wanting to combine the two in order to include posts from multiple blogs in my e-mail campaign; in particular, I would like to use conditional logic so that any RSS feeds evaluated as "empty" (meaning no new items) receive alternative content that says something along the lines of "no updates available."
I have tried to come up with a few ways of doing this - none have been successful, but here's the type of thing I had in mind:
*|FEEDBLOCK:http://www.mailchimp.com/blog/feed/|*
*|FEED:TITLE|*
*|IF:FEED:POSTS[$content=full] != |*
*|FEED:POSTS[$count=3,$content=titles]|*
*|ELSE:|*
no updates available for this feed
*|END:IF|*
*|END:FEEDBLOCK|*
any help would be appreciated.
Well, this question is from ages ago, but I had a similar problem and figured out a (really hacky) work-around...
Assumptions:
1. You're using a (very) custom RSS feed
2. You're overriding the default RSS tags with custom content
In my use case, I'm using the <category> RSS tag, which MailChimp reads in via the *FEEDITEM:CATEGORY* merge tag. I'm using this as a subheading for my RSS feed, instead.
If that subheading is filled out in the admin (that is, whatever admin system you're using to spit out the RSS feed), I want to include it in the feed -- but I also need to add in more html for the email template. The solution is including the required html in the RSS feed. (Like I said -- hacky.)
Shockingly, this works. Mailchimp dutifully pulls in all the html/css.
The RSS feed (vastily simplied here) looks something like this:
<channel>
<item>
<category><![CDATA[ <table><td><tr><div class="example">Sub Headline</div></tr></td> ]]></category>
</item>
<channel>
If that field is not set in my custom admin, then no <category> tags at all are outputted, and MailChimp simply ignores that merge tag.
So basically, any email HTML code that you want to display only if the merge tag is valid, should show up in the feed itself.
Definitely not ideal, but it works.
YMMV...
MailChimp is still very limited with it's conditional tags, which are limited to subscriber data. RSS feed conditions would be a welcome addition.
http://blog.mailchimp.com/conditional-dynamic-content-in-mailchimp/

MVC - Strip unwanted text from rss feed

Ive got the following code in my RSS consumer (Vandelay Industries RemoteRSS) in my Orchard CMS implementation:
#using System.Xml.Linq
#{
var feed = Model.Feed as XElement;
}
<ul>
#foreach(var item in feed
.Element("channel")
.Elements("item")
.Take((int)Model.ItemsToDisplay))
{
<li>#T(item.Element("description").Value)</li>
}
</ul>
The rss feed Im using is from Pinterest, and this bundles the image, link, and a short description all inside the 'description' elements of the feed.
<description><a href="/pin/215609900882251703/"><img src="http://media-cache-ec2.pinterest.com/upload/88664686384961121_UIyVRN8A_b.jpg"></a>How to install Orchard CMS on IIS Server</description>
My issue is that I don't want the text bits, and I also need to prefix the 'href=' links with 'http://www.pinterest.com'.
I've managed to edit the original code with my newbie skills to the above,, which essentially displays the images as links which are only relative and thus pointing locally to my server. These images are also then followed by the short description.
So to summarise, I need a way to prefix all links with 'http://pinterest.com' and then to remove the fee text after the image/links.
Any pointers will be greatly appreciated, Thanks.
You should probably parse the description, with something like http://htmlagilitypack.codeplex.com/, and then tweak it to add the prefix. Or you can learn regular expression and do without a library. Could be a little trickier and error-prone however.

Regex with iframe in Yahoo! Pipes

I'm building a Yahoo! Pipe to pull an RSS feed from Reddit which links to some content in the description. I'm using a regex to match the href attribute of the anchor link in an item.description field. The regex I'm using is:
^.+?href="([^"]+)">\[link\].+?$
As a test, I set the replace to simply:
$1
and I see that the entire description field has been replaced with the URL. So far, so good.
I then put the following in the replace field. The idea being to iframe the content that's linked to:
Content: <iframe src="$1">no iframe support</iframe> End
What I get out however is:
Content: no iframe support End
I've confirmed that this is also coming through in the pipe's output and not just in the Yahoo! Pipes debug console.
I've so far tried replacing my angle brackets with < and > entities. I've tried wrapping the entire thing in a <![CDATA[ ... ]]> block and still, I get nothing. If I break my iframe tag by removing an angle bracket, the broken content comes through fine, but if I have a well-formed iframe element, it vanishes, leaving the "no iframe support" text. Am I doing something wrong here, or is Yahoo! actively preventing me from using iframe tags in my generated pipe? A cursory search on Google isn't turning up anything related to this.
The pipe in question is here:
http://pipes.yahoo.com/pipes/pipe.info?_id=2ba41448cadd2347d86f377efd3d199f
This Pipes FAQ Question "Why does Pipes Strip <object> and <embed> tags... ?" shows that a certain amount of sanitization is performed, by placing content (at least certain content) into an iframe for the safety of RSS consumers - though it does not state it specifically, this probably also removes other iframes in order to avoid nesting and other work-arounds.
Yahoo is big enough I would doubt they have a week sanitizer, but an extremely long shot is that you might be able to fool it by nesting the iframe in a bunch of other tags (again I doubt this will work). Also depending upon which step does the sanitization, perhaps adding part of the tag in one step, then adding another part somewhere else might work (yet again, doubt overwhelms me)
Not sure what else to suggest, other than getting something else to consume and transform your RSS a little bit more (by fixing otherwise broken tags??) - but that's what you're using pipes for to begin with, isn't it? Idunno...
Good luck!
Pipes has an fanatical devotion to the RSS spec and the spec says the description field is plain text only. HTML etc is supposed to go in the content:encoded field, not that I've had much luck getting pipes to do that.

Multiple pages html output from a .rst document in Django

I'm writing a Django app to serve some documentation written in RestructuredText.
I have many documents written in *.rst, each of them is quite long with many section, subsection and so on.
Display the whole document in a single page is not a problem using Django filters, but I'd rather have just the topic index on a first page, whit links to an URL where I can display a single section / subsection (which will need some 'previous | up | home | next' link I guess...). In a way similar to a 'multiple HTML page output' as in a docbook / XML to HTML conversion.
Can anyone point me to some direction to build a document tree of a *.rst document an parse a single section of it, or suggest a clever way to obtain a similar result?
Choice 1. Include URL links to the other parts of the document.
You write an index.rst, part1.rst, part2.rst, etc. And your index.rst has links to the other parts. This requires almost no work, except careful planning to make sure that your RST HTML links are correct.
There's no "parse". You just break your document into sections. Manually.
[This seems so obvious, I'm afraid to mention it.]
Choice 2. Use Sphinx. It manages table-of-contents and inter-document connections very nicely.
However, the Sphinx extensions to RST aren't handled directly by Django, so you'd need to save the Sphinx output and then display that in Django. We use the JSON HTML Builder (http://sphinx.pocoo.org/builders.html?highlight=json#sphinx.builders.html.JSONHTMLBuilder) output from Sphinx. Then we render these documents through a template.