I am working on a WordPress website which got hacked by a "Pharmahack". I have done multiple scans and checked multiple files; they all seem fine.
My task now is to remove all of the spam links and text. At the moment I'm doing it manually from a DB dump, but there are 300 instances of the offending content.
It is all contained within <div style="position:absolute; left:-3841px; top:-3137px;">; the left and top values are variable.
Does anyone know a regex that could remove all the content within this div?
Thanks for any help.
Assuming the left and top offsets will always be negative, and that the offending <div/> snippets don't contain any nested <div/> snippets:
s/<div style="position:absolute; left:-\d*px; top:-\d*px;">.*?<\/div>//g
Here's the regex in play: https://regex101.com/r/eV1kN7/1
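If you'd rather script the cleanup than hand-edit the dump, here is a minimal sketch in Python (the dump filename is my assumption, and it keeps the same no-nested-divs caveat):
import re

# Matches the injected div with any negative left/top offsets.
# re.DOTALL lets .*? span newlines inside the spam block.
spam_div = re.compile(
    r'<div style="position:absolute; left:-\d+px; top:-\d+px;">.*?</div>',
    re.DOTALL,
)

with open("dump.sql") as f:
    dump = f.read()

with open("dump.cleaned.sql", "w") as f:
    f.write(spam_div.sub("", dump))
Note that a SQL dump may store the markup with escaped quotes (\"), in which case the pattern needs adjusting.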
I know there are a lot of topics about removing characters from Google Sheets cells. I've tried to find a way to solve my issue with the information on the web / Stack Overflow, but I can't find it...
I need to create a column with text in multiple rows. The original file still has styling tags (<p>, <strong>, <i>, etc.) in it. I need to remove these styling tags; actually, every <...> tag should be removed from the cell. I tried to do this with SUBSTITUTE, but then I can only remove e.g. <p> and I'm still left with the other styling tags in the sheet.
I think this could be done with REGEXREPLACE but I can't get it working. I hope that someone can help me understand how I can get this working. Thank you!
Use:
=ARRAYFORMULA(REGEXREPLACE(D2:D, "<.*?>", ))
In some cases that won't be enough, so:
=ARRAYFORMULA(REGEXREPLACE(D2:D, "<\/\w+>|<\w+.*?>", ))
And in some cases even that won't be enough, so:
=ARRAYFORMULA(REGEXREPLACE(D2:D, "</?\S+[^<>]*>", ))
I finished and published on GitHub my first website using the Distill package in RStudio:
https://crlnp.github.io/3-objets.html
On this particular page I include several images in my R Markdown document, and they cause problems with the table of contents: the headings are duplicated and the TOC doesn't follow the headings adequately. I have tried every toc_float: option available and it does not solve the problem. I also tried changing the heading levels (all #, ...). The problem appears whether the image is inserted traditionally or in a code block. If I take out the images in my file, the TOC works perfectly. I have not been able to find any information on this issue. Thanks in advance for any help!
Looking at your .Rmd file in the website's GitHub repo: if you include the JS file (optional) and move the wrapping div container from its current position to above your first header, the TOC behaviour will change:
<script src="hideOutput.js"></script>
<div class="fold o">
## 1. Les objets
.
.
.
</div>
Is this helpful in any way?
Thanks to everyone in advance.
I encountered a problem when using Scrapy on Python 2.7.
The webpage I tried to crawl is a discussion board for Chinese stock market.
When I tried to get the first number, "42177", just under the banner of this page (the number you see on that webpage may not be the number you see in the picture shown here, because it represents the number of times this article has been read and is updated in real time...), I always got empty content. I am aware that this might be a dynamic content issue, but I don't have a clue how to crawl it properly.
The code I used is:
item["read"] = info.xpath("div[#id='zwmbti']/div[#id='zwmbtilr']/span[#class='tc1']/text()").extract()
I think the XPath is set correctly, and I have checked the return value of this response; it indeed told me that there is nothing under this node. Result shown here: 'read': [u'<div id="zwmbtilr"></div>']
If it has something, there should be something between <div id="zwmbtilr"> and </div>.
Really appreciated if you guys share any thoughts on this!
I just opened your link in Firefox with NoScript enabled. There is nothing inside the <div id='zwmbtilr'></div>. If I enable JavaScript, I can see the content you want. So, as you already knew, it is a dynamic content issue.
Your first option is to try to identify the request generated by the JavaScript. If you can do that, you can send the same request from Scrapy. If you can't, the next option is usually to use some package with JavaScript/browser emulation, something like ScrapyJS or Scrapy + Selenium.
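A minimal sketch of the Selenium route (the driver choice, the wait time, and the placeholder URL are my assumptions; the XPath mirrors the one from the question):
from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(10)  # give the page's JavaScript time to fill the counter
driver.get("http://example.com/discussion-page")  # replace with the board URL

# Same node the Scrapy XPath targeted, present once the JS has run.
read_count = driver.find_element_by_xpath(
    "//div[@id='zwmbtilr']/span[@class='tc1']").text
print(read_count)
driver.quit()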
I'm building a Yahoo! Pipe to pull an RSS feed from Reddit which links to some content in the description. I'm using a regex to match the href attribute of the anchor link in an item.description field. The regex I'm using is:
^.+?href="([^"]+)">\[link\].+?$
As a test, I set the replace to simply:
$1
and I see that the entire description field has been replaced with the URL. So far, so good.
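For example, with a made-up description like:
<a href="http://example.com/abc">[link]</a> submitted by someone
the whole field matches and $1 leaves just http://example.com/abc.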
I then put the following in the replace field. The idea being to iframe the content that's linked to:
Content: <iframe src="$1">no iframe support</iframe> End
What I get out however is:
Content: no iframe support End
I've confirmed that this is also coming through in the pipe's output and not just in the Yahoo! Pipes debug console.
So far I've tried replacing my angle brackets with &lt; and &gt; entities. I've tried wrapping the entire thing in a <![CDATA[ ... ]]> block and still I get nothing. If I break my iframe tag by removing an angle bracket, the broken content comes through fine, but if I have a well-formed iframe element, it vanishes, leaving the "no iframe support" text. Am I doing something wrong here, or is Yahoo! actively preventing me from using iframe tags in my generated pipe? A cursory search on Google isn't turning up anything related to this.
The pipe in question is here:
http://pipes.yahoo.com/pipes/pipe.info?_id=2ba41448cadd2347d86f377efd3d199f
The Pipes FAQ question "Why does Pipes Strip <object> and <embed> tags... ?" shows that a certain amount of sanitization is performed, placing content (at least certain content) into an iframe for the safety of RSS consumers. Though it does not state it specifically, this probably also removes other iframes in order to avoid nesting and other work-arounds.
Yahoo is big enough that I would doubt they have a weak sanitizer, but an extremely long shot is that you might be able to fool it by nesting the iframe in a bunch of other tags (again, I doubt this will work). Also, depending upon which step does the sanitization, perhaps adding part of the tag in one step, then adding another part somewhere else, might work (yet again, doubt overwhelms me).
Not sure what else to suggest, other than getting something else to consume and transform your RSS a little bit more (by fixing otherwise broken tags??) - but that's what you're using Pipes for to begin with, isn't it? I dunno...
Good luck!
Pipes has a fanatical devotion to the RSS spec, and the spec says the description field is plain text only. HTML etc. is supposed to go in the content:encoded field - not that I've had much luck getting Pipes to do that.
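For reference, a sketch of the distinction in an RSS item (the values are made up; content:encoded also needs the xmlns:content="http://purl.org/rss/1.0/modules/content/" namespace declared on the feed):
<item>
  <title>Example post</title>
  <!-- description: plain text only, per a strict reading of the spec -->
  <description>Content: see the linked page.</description>
  <!-- HTML belongs here, CDATA-wrapped or entity-escaped -->
  <content:encoded><![CDATA[Content: <iframe src="http://example.com/">no iframe support</iframe> End]]></content:encoded>
</item>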
I'm being forced/paid to work on a legacy ColdFusion project (I'm usually a C# programmer), and one peculiarity of CF is that it has its own tags that are supposed to blend with HTML (a bad, bad decision, IMO, since it just confuses the hell out of me even with the "starts with cf" rule).
Besides this, it has the # character to indicate the start of CF "territory", much like <% in ASP.NET or $ in Spark or so many equivalents. But this only gets parsed if inside a CF tag.
My question is: is there a problem with opening one tag at the beginning of the file and closing it at the end, versus opening one only where I'm going to use the # character?
To illustrate here's some code:
<cfoutput>
Some text #SomeVar# Some text.<br />
Some Images some other things #AnotherVar#
</cfoutput>
Against:
Some text <cfoutput>#SomeVar#</cfoutput> Some text.<br/>
Some Images some other things <cfoutput>#AnotherVar#</cfoutput>
Granted, this might seem trivial for small content, but I'm talking about a whole page.
Depending on the page contents, either is fine. There may be a performance impact (minor) by putting all of your page inside the CFOUTPUT tag, because the CFML engine needs to parse and scan the contents of the tag for executable code. Outside of the CFOUTPUT tag, the CFML engine can ignore the page as static content.
If you have CSS and HTML code that uses pound signs (for example named anchors or Hex color codes), you need to escape all pound signs (by adding a second one like "##") when within a CFOUTPUT. Because of this, I generally only put the CFOUTPUT around code I specifically want the CF engine to run.
That said, the CFML engine pays a bit of a performance penalty for constantly opening and closing the CFOUTPUT. If you're looping over some content, put the CFOUTPUT around the entire loop, rather than opening and closing it in each iteration of the loop.
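A quick sketch of that point (the loop bounds are arbitrary):
<!--- Preferred: one CFOUTPUT wrapping the whole loop --->
<cfoutput>
<cfloop from="1" to="10" index="i">
  <li>Item #i#</li>
</cfloop>
</cfoutput>

<!--- Slower: CFOUTPUT opened and closed on every iteration --->
<cfloop from="1" to="10" index="i">
  <li>Item <cfoutput>#i#</cfoutput></li>
</cfloop>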
Also, if you're having trouble knowing what code is CFML and what isn't, you might want to get a better IDE/editor for CFML like CFEclipse. It color codes the tags and lets you see the difference between CFML and HTML tags immediately. It's open source.
One problem you might find is that cfoutput is often used to display queries, and query cfoutputs cannot be nested inside other cfoutput tags. So this will cause an 'Invalid tag nesting configuration' error:
<cfoutput>
<cfoutput query="qFriends">
<li>#qFriends.fname# #qFriends.lname#</li>
</cfoutput>
</cfoutput>
It should not be a big issue, but be careful using hex-valued colors: you'll need to escape those with an extra #. If it were me, I would try to break down those huge chunks of content into smaller pieces. Let HTML, JS, Flash and CSS do their jobs and use CF for the server side.
If you want to put cfoutput at the beginning and end of the page, you have to use a doubled sign (##) for color values.
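For example (the color value is arbitrary):
<cfoutput>
  <!--- A literal # inside CFOUTPUT must be doubled --->
  <span style="color: ##FF0000;">#SomeVar#</span>
</cfoutput>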