Regex to replace spam links in WordPress - regex

I am dealing with old hacked WordPress sites where spam links have been injected on images.
I have access to the database and would like to remove links that look like this:
<a style="text-decoration:none" href="/ansaid-retail-cost">.</a>
The text inside the href attribute varies (it might be for Cialis or any other product), but the rest doesn't. I want to remove the entire link, so that the result is a single space.
I don't know regex, so I would appreciate the help. I've tried online generators but they don't seem to be working.
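One way to approach this, sketched in Python purely for illustration: match the fixed parts of the tag literally and allow any characters other than a double quote inside the href value. The function name `strip_spam_links` and the sample markup are hypothetical, and the same pattern would need adapting to whatever tool actually touches the database. Take a backup before trying anything like this on real data.

```python
import re

# Literal parts of the injected tag, with only the href value allowed to vary.
# [^"]* matches any run of characters that are not a double quote.
SPAM_LINK = re.compile(r'<a style="text-decoration:none" href="[^"]*">\.</a>')


def strip_spam_links(html: str) -> str:
    """Replace every injected spam link with a single space."""
    return SPAM_LINK.sub(" ", html)


if __name__ == "__main__":
    # Hypothetical post content: one spam link next to a legitimate link.
    sample = (
        '<img src="a.jpg">'
        '<a style="text-decoration:none" href="/ansaid-retail-cost">.</a>'
        '<a href="/about">About</a>'
    )
    print(strip_spam_links(sample))
```

Because the style attribute and the "." link text are matched literally, ordinary links such as the `/about` one in the sample are left untouched.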

Related

Regular expression to get specific pages out of a list of landing pages in Google Analytics

In Google Analytics, I need to select landing pages for each hotel my client operates. Hotel pages are identified by the string /hotels-in-XYZ/.
I need to exclude all other pages.
I need to exclude subpages like /hotels-in-XYZ/offer-page/ too.
Sample list of hotels:
/XXX-one/login/
/hotels-in-ranthambhore/
/hotels-in-jaipur-resort/
/hotels-in-morocco-marrakech/
/about-us/
/hotels-in-mumbai/
/hotels-in-bengaluru/
/hotels-in-agra-resort/special-offers/extended-stay-offer/
/hotels-in-shimla/amp/
/hotels-in-udaipur-resort/amp/
I'm not that familiar with regex and I've been googling to find a solution. The closest I have is .*?\/hotels(.*)\/.* but it does not exclude pages like /hotels-in-shimla/amp/
Your help would be appreciated. Let me know if I need to post any additional information to explain the question better.
Does ^\/hotels-in-[\w\-]+\/$ work for you?
I tested this at https://regex101.com/r/9c2IRC/1/
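As a quick sanity check outside Google Analytics, the suggested pattern can be run against the sample list from the question. This is a sketch in Python, whose regex engine is not identical to the one Analytics uses, so treat it as a sanity check rather than proof. The slashes are left unescaped because Python does not require escaping them.

```python
import re

# The whole path must be /hotels-in-<name>/ and nothing more.
# [\w\-]+ cannot match a slash, so deeper subpages are rejected.
HOTEL_PAGE = re.compile(r"^/hotels-in-[\w\-]+/$")

pages = [
    "/XXX-one/login/",
    "/hotels-in-ranthambhore/",
    "/hotels-in-jaipur-resort/",
    "/hotels-in-morocco-marrakech/",
    "/about-us/",
    "/hotels-in-mumbai/",
    "/hotels-in-bengaluru/",
    "/hotels-in-agra-resort/special-offers/extended-stay-offer/",
    "/hotels-in-shimla/amp/",
    "/hotels-in-udaipur-resort/amp/",
]

hotel_pages = [p for p in pages if HOTEL_PAGE.search(p)]
print(hotel_pages)
```

On this list only the five top-level hotel pages are kept; the /amp/ and special-offers subpages are filtered out along with the non-hotel pages.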

Regex specific question and search function on my website dealing with broken links

I've been trying to figure out my regex pattern but it doesn't seem to be working for me.
Here's what I'm trying to do:
I have broken links on my website if someone accidentally gets to a page like so:
https://example.com/catalogsearch/result/?q=
or
https://example.com/catalogsearch/result/
So I'm redirecting them back to my homepage. The problem is that the search now sends everything back to the homepage. So I'm assuming that if there is something after the equals sign, the search needs to continue, as with:
https://example.com/catalogsearch/result/?q=person
but currently I can't figure this out.
Here is the regex that I've been messing with for quite some time now. It still seems to be wrong, or something else is wrong with my search.
"^/catalogsearch/result((/)|(/\\?)|(/\\?[a-z])|(/\\?[a-z]=))?$"
Please forgive me, I'm horrible with regex.
After a lot of discussion, the conclusion is that routes.yaml treats the URL path as a valid route but does not consider the query string. Hence, of the two examples in the post, the one without a query string can be handled with
"/catalogsearch/result": { to: "https://example.com/", prefix: false }
For the other one, change the nginx config to redirect to the homepage or, if that's not possible, check with Magento support on how to incorporate the query string in the routes.yaml file.
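Wherever the redirect ends up living, the condition it has to express is "the path is the search result page and the q parameter is missing or empty". The Python sketch below only illustrates that condition; it is not routes.yaml or nginx syntax, and the function name `should_redirect_home` is made up for this example.

```python
from urllib.parse import parse_qs, urlsplit


def should_redirect_home(url: str) -> bool:
    """True only for the search result page requested without a search term."""
    parts = urlsplit(url)
    # Accept the path with or without a trailing slash.
    if parts.path.rstrip("/") != "/catalogsearch/result":
        return False
    # parse_qs drops blank values by default, so "?q=" yields no "q" entry.
    query = parse_qs(parts.query)
    return not any(value.strip() for value in query.get("q", []))


if __name__ == "__main__":
    for url in (
        "https://example.com/catalogsearch/result/?q=",
        "https://example.com/catalogsearch/result/",
        "https://example.com/catalogsearch/result/?q=person",
    ):
        print(url, should_redirect_home(url))
```

Splitting the URL into path and query first avoids having to encode both parts in a single regex, which is where the original pattern got tangled.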

Analytics Goal Funnel Regex doesn't recognize "example.html?p=2"

I have my goal funnel set up and this is the regex for one of the stages: ^/shop/(.*)
This will match pages such as /shop/collections/art.html but when I look at the goal funnel, it says people are dropping out by going to pages like /shop/collections/art.html?p=2. Notice the ?p=2 is the only difference here.
I tried to do it as ^/shop/((.|\?)*) but I'm not sure that's fixing it.
How do I fix this?
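For what it's worth, as plain regular expressions both patterns already match a path that carries a query string, because `.` matches the `?` character too. The Python sketch below shows that. It does not show what string Analytics actually feeds to the funnel step, so the drop-off may have a cause other than the pattern itself. That is an open assumption here, not something the question confirms.

```python
import re

original = re.compile(r"^/shop/(.*)")        # pattern from the funnel stage
attempt = re.compile(r"^/shop/((.|\?)*)")    # the attempted variant

urls = [
    "/shop/collections/art.html",
    "/shop/collections/art.html?p=2",
]

# Map each URL to (matches original, matches attempt).
results = {u: (bool(original.search(u)), bool(attempt.search(u))) for u in urls}

for url, (orig_ok, attempt_ok) in results.items():
    print(url, orig_ok, attempt_ok)
```

Since both patterns accept both URLs, the `(.|\?)` alternation adds nothing over the plain `(.*)`.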

How can I use a regex to validate slideshare slideshow URLs?

I am using www.slideshare.net to allow my users to display embedded slideshows on their profiles.
I'm using slideshare's api to get the slideshow's id, given the slideshow link that users have to get by clicking 'share' on the slideshow and copying and pasting the url:
What I need is to thoroughly validate the latter url.
Just to further explain my process, when I have the slideshow's id, I compute the embedded code like so :
"<iframe src='https://www.slideshare.net/slideshow/embed_code/" + json.slideshow_id + "' frameborder='0' allowfullscreen webkitallowfullscreen mozillaallowfullscreen></iframe>"
where json is the object returned by slideshare's api.
A basic regex to answer my question would be:
^http\://www\.slideshare\.net/[a-zA-Z0-9\-]+/[a-zA-Z0-9\-]+$
But it feels a little weak to me :
I don't want my users to just copy/paste the url in the navigator address bar
I'm not sure this regex works for all slideshare's slideshows as I'm not a slideshare specialist (does that even exist?)
Ideally I would like to exclude all other regular urls from www.slideshare.net that don't point to a slideshow.
EDIT 7/12/2014: rewrite
You can use something like this:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?
More examples from this website
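The generic URL pattern above accepts any host, so a stricter check may be closer to what the question asks for. The Python sketch below parses the URL and then checks the host and the path shape. The `/<user>/<slideshow>` shape is taken from the asker's own regex and is not confirmed against slideshare's documentation. Accepting https as well as http is an additional assumption, and the name `looks_like_slideshow_url` is made up for this example.

```python
import re
from urllib.parse import urlsplit

# Same character class as in the asker's regex, applied to one path segment.
SEGMENT = re.compile(r"[a-zA-Z0-9\-]+")


def looks_like_slideshow_url(url: str) -> bool:
    """Check the host and a two-segment path; query strings are ignored."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.netloc != "www.slideshare.net":
        return False
    segments = [s for s in parts.path.split("/") if s]
    return len(segments) == 2 and all(SEGMENT.fullmatch(s) for s in segments)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the check.
    print(looks_like_slideshow_url("http://www.slideshare.net/some-user/some-deck"))
    print(looks_like_slideshow_url("http://www.slideshare.net/some-user"))
```

A shape check like this cannot tell a slideshow from another two-segment page on the same site. Asking the API for the slideshow id remains the real test.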

Regex with iframe in Yahoo! Pipes

I'm building a Yahoo! Pipe to pull an RSS feed from Reddit which links to some content in the description. I'm using a regex to match the href attribute of the anchor link in an item.description field. The regex I'm using is:
^.+?href="([^"]+)">\[link\].+?$
As a test, I set the replace to simply:
$1
and I see that the entire description field has been replaced with the URL. So far, so good.
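The behaviour up to this point can be reproduced outside Pipes. The Python sketch below applies the same pattern to a made-up, single-line description (the real Reddit markup may differ) and uses `\1` where Pipes uses `$1`.

```python
import re

# Same pattern as in the pipe; the capture group is the href that precedes "[link]".
PATTERN = re.compile(r'^.+?href="([^"]+)">\[link\].+?$')

# Hypothetical item.description content on a single line.
description = (
    'submitted by <a href="http://www.reddit.com/user/someone">someone</a> '
    '<a href="http://example.com/article">[link]</a> '
    '<a href="http://www.reddit.com/r/x/comments/1">[comments]</a>'
)

# Replace the whole description with the captured URL.
result = PATTERN.sub(r"\1", description)
print(result)
```

The lazy `.+?` first tries the earliest href, fails because that anchor's text is not "[link]", and moves on until it finds the anchor that is, so only the linked URL is left.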
I then put the following in the replace field. The idea being to iframe the content that's linked to:
Content: <iframe src="$1">no iframe support</iframe> End
What I get out however is:
Content: no iframe support End
I've confirmed that this is also coming through in the pipe's output and not just in the Yahoo! Pipes debug console.
I've so far tried replacing my angle brackets with &lt; and &gt; entities. I've tried wrapping the entire thing in a <![CDATA[ ... ]]> block and still, I get nothing. If I break my iframe tag by removing an angle bracket, the broken content comes through fine, but if I have a well-formed iframe element, it vanishes, leaving the "no iframe support" text. Am I doing something wrong here, or is Yahoo! actively preventing me from using iframe tags in my generated pipe? A cursory search on Google isn't turning up anything related to this.
The pipe in question is here:
http://pipes.yahoo.com/pipes/pipe.info?_id=2ba41448cadd2347d86f377efd3d199f
This Pipes FAQ Question "Why does Pipes Strip <object> and <embed> tags... ?" shows that a certain amount of sanitization is performed, by placing content (at least certain content) into an iframe for the safety of RSS consumers - though it does not state it specifically, this probably also removes other iframes in order to avoid nesting and other work-arounds.
Yahoo is big enough that I doubt they have a weak sanitizer, but as an extremely long shot you might be able to fool it by nesting the iframe in a bunch of other tags (again, I doubt this will work). Also, depending on which step does the sanitization, adding part of the tag in one step and another part somewhere else might work (yet again, doubt overwhelms me).
Not sure what else to suggest, other than getting something else to consume and transform your RSS a little bit more (by fixing otherwise broken tags??) - but that's what you're using pipes for to begin with, isn't it? Idunno...
Good luck!
Pipes has a fanatical devotion to the RSS spec, and the spec says the description field is plain text only. HTML etc. is supposed to go in the content:encoded field, not that I've had much luck getting Pipes to do that.