Remove all HTML tags from a cell - regex

I'm trying to remove all the HTML tags and comments within the following cell in Google Sheets:
<div class="prod-desc" itemprop="description">
<div class="row">
<div class="col-md-8">
<p>This is a 100 count box of the ACC-DX01A Proximity Card to be used with any of our DX line of Access Control Readers. It is the size of a credit card so it can easily fit into your wallet. Use these like a proximity card and carry them on your key ring for easy access. </p>
<p> Please note: To add a DX Card or FOB to the DX Access Control System, you must use the Auto/Add Function. If you need assistance, FREE US based tech support is just a phone call away. </p>
</div>
<!-- Description Side Bar START ************************************ -->
<div class="col-md-4"> <img src="/images_templ/Accesss-Control_product-image.jpg"> <span class="boxtitle ">Full Line of Access Control</span> <span style="font-size: 18px; font-family: inherit; font-weight: 400">Access Control Proximity Card Readers and Electronic Door Locks and more!</span> </div>
<!-- Description Side Bar END ************************************ -->
</div>
</div>
So ideally the input should come out as:
This is a 100 count box of the ACC-DX01A Proximity Card to be used with any of our DX line of Access Control Readers. It is the size of a credit card so it can easily fit into your wallet. Use these like a proximity card and carry them on your key ring for easy access.
Please note: To add a DX Card or FOB to the DX Access Control System, you must use the Auto/Add Function. If you need assistance, FREE US based tech support is just a phone call away.
Full Line of Access Control Access Control Proximity Card Readers and Electronic Door Locks and more!
I've searched around and found several answers, but none of them seem to work for me. Maybe it's because of the new lines and carriage returns? I don't know. What I want to do is remove all the HTML and keep all the newlines and carriage returns in the text. Here are some posts that I was following:
Remove HTML In Google Sheets Cells
https://superuser.com/questions/564727/html-tags-in-google-spreadsheet

Try it like this:
=ARRAYFORMULA(TEXTJOIN(CHAR(10), 1,
TRIM(SPLIT(REGEXREPLACE(A1, "</?\S+[^<>]*>", ), CHAR(10)))))
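For comparison, here is a minimal Python sketch of the same three steps outside of Sheets: remove the tags, split on newlines, then trim each line and rejoin the non-empty ones. The function name `strip_html` and the sample input are made up for illustration. It deliberately uses the simpler pattern `<[^<>]+>` rather than the formula's pattern, because the greedy `\S+` can also swallow text when a tag pair wraps a single word with no whitespace (for example `<span>World</span>`).

```python
import re

# Matches anything between "<" and the next ">", which covers tags and
# comments as long as they contain no nested angle brackets (an assumption).
TAG_PATTERN = re.compile(r"<[^<>]+>")


def strip_html(cell_text: str) -> str:
    """Remove tags and comments, keeping one trimmed line per non-empty source line."""
    without_tags = TAG_PATTERN.sub("", cell_text)
    trimmed = (line.strip() for line in without_tags.split("\n"))
    # Dropping empty lines mirrors TEXTJOIN's ignore_empty argument.
    return "\n".join(line for line in trimmed if line)


sample = (
    '<div class="a">\n'
    "<p>Hello there. </p>\n"
    "<!-- side bar -->\n"
    '<span>World</span> <span class="x">again</span>\n'
    "</div>"
)
print(strip_html(sample))
```

Like any regex approach, this is only suitable for simple, well-formed markup such as the cell in the question.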

Yes. Besides the answer that @player0 gave, you can also use 'Find and replace' (Ctrl+H), then paste whatever you want to change or remove and replace it with nothing. It works for more than one cell too.
It's more laborious, but you can target the entire workbook or specific ranges if needed.

Related

Displaying columns with space between them

How can I display columns with space between them? By default they all sit right next to each other, and I can't find a way to change that.
Here is an example from YouTube:
It depends on how much spacing you want. When you use a frontend framework you sometimes have to trade development speed against customization. The sample you took from YouTube has less spacing than the framework's default column gutter, so I don't think your statement that "by default they are all following each other" is quite accurate: the columns do have spacing. If you want to remove it, just use row collapse as the class of the columns container.
Now, if you want more spacing than the default, you still have options:
You can leave an empty column between elements by adding a one-column offset to each one.
You can change the column gutter size in the framework settings (if you're using the CLI or a customized build, not the prebuilt one).
You can also write some CSS to increase the spacing for a specific column container (I wouldn't recommend doing so globally because you could break the framework).
One solution is to use a framework such as Bootstrap or Foundation. In Foundation, every 'column' has padding on the inside, so you're able to display a grid such as this.
Read here for more info: http://foundation.zurb.com/sites/docs/grid.html
This post is tagged with Zurb Foundation, so I will solve it using their classes with what is a little bit of a workaround. For example, if you want three columns with the ability to keep adding items and have it automatically wrap you could have the following simple example with the block grid:
<div class="grid-x small-up-3">
<div class="cell">
Placeholder Text
</div>
<div class="cell">
Placeholder Text
</div>
<div class="cell">
Placeholder Text
</div>
<div class="cell">
Placeholder Text
</div>
<div class="cell">
Placeholder Text
</div>
<div class="cell">
Placeholder Text
</div>
</div>
and the following CSS
.small-up-3 > .cell {
width: calc(33.33333% - 4px);
margin-left: 2px;
margin-right: 2px;
}
.small-up-3 > .cell:nth-of-type(n+4) {
margin-top: 2px;
}
The calc is needed to subtract your margin from the width of each cell to prevent the wrapping you are seeing when just adding normal margin.
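To see why the subtraction matters, here is a small Python sketch of the arithmetic. The 900px container width and the helper name `row_width` are assumptions chosen for illustration; the percentage and the 2px margins come from the CSS above.

```python
def row_width(container_px: float, cells: int, margin_px: float,
              subtract_margins: bool, percent: float = 33.33333) -> float:
    """Total horizontal space taken by `cells` cells, including left/right margins."""
    share = container_px * percent / 100
    # With calc(), both margins are taken out of each cell's own width.
    cell_width = share - 2 * margin_px if subtract_margins else share
    return cells * (cell_width + 2 * margin_px)


container = 900  # hypothetical container width in pixels
print(row_width(container, 3, 2, subtract_margins=True) <= container)   # three cells fit
print(row_width(container, 3, 2, subtract_margins=False) <= container)  # third cell wraps
```

Without the subtraction, each cell takes its full third plus 4px of margin, so three of them exceed the container and the last one drops to the next row.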

Regex to match only the first occurrence of an html element

Yes yes, I know, "don't parse HTML with Regex". I'm doing this in notepad++ and it's a one-time thing so please bear with me for a moment.
I'm trying to simplify some HTML code by using some more advanced techniques. Notably, I have "inserts" or "callouts" or whatever you call them, in my documentation, indicating "note", "warning" and "technical" short phrases to grab the attention of the reader on important information:
<div class="note">
<p><strong>Notes</strong>: This icon shows you something that complements
the information around it. Understanding notes is not critical but
may be helpful when using the product.</p>
</div>
<div class="warning">
<p><strong>Warnings</strong>: This icon shows information that may
be critical when using the product.
It is important to pay attention to these warnings.</p>
</div>
<div class="technical">
<p><strong>Technical</strong>: This icon shows technical information
that may require some technical knowledge to understand. </p>
</div>
I want to simplify this HTML into the following:
<div class="box note"><strong>Notes</strong>: This icon shows you something that complements
the information around it. Understanding notes is not critical but
may be helpful when using the product.</div>
<div class="box warning"><strong>Warnings</strong>: This icon shows information that may
be critical when using the product.
It is important to pay attention to these warnings.</div>
<div class="box technical"><strong>Technical</strong>: This icon shows technical information
that may require some technical knowledge to understand.</div>
I almost have the regex necessary to do a nice global search & replace in my project from notepad++, but it's not picking up "only" the first div, it's picking up all of them - if my cursor is at the beginning of my file, the "select" when I click Find is from the first <div class="something"> up until the last </div>, essentially.
Here's my expression: <div class="(.*[^"])">[^<]*<p>(.*?)<\/p>[^<]*<\/div> (notepad++ "automatically" adds the / / around it, kinda).
What am I doing wrong, here?
You have a greedy dot-quantifier while matching the class attribute — that's the evil guy who's causing your problems.
Make it non-greedy: <div class="(.*?[^"])"> or change it to a character class: <div class="([^"]*)">.
Compare: greedy class vs. non-greedy class.
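The difference can be reproduced outside Notepad++ with a short Python sketch. It assumes the ". matches newline" option was enabled in Notepad++ (emulated here with `re.DOTALL`), since the paragraphs span several lines; the shortened sample text is made up.

```python
import re

html = (
    '<div class="note">\n<p><strong>Notes</strong>: first.</p>\n</div>\n'
    '<div class="warning">\n<p><strong>Warnings</strong>: second.</p>\n</div>'
)

# Original expression: the greedy .* in the class attribute runs to the last '">'.
greedy = re.compile(r'<div class="(.*[^"])">[^<]*<p>(.*?)</p>[^<]*</div>', re.DOTALL)
# Fixed expression: the character class cannot run past the closing quote.
fixed = re.compile(r'<div class="([^"]*)">[^<]*<p>(.*?)</p>[^<]*</div>', re.DOTALL)

print(len(greedy.findall(html)))  # one match spanning both blocks
print(len(fixed.findall(html)))   # one match per block

# The replacement string mirrors the target markup from the question.
simplified = fixed.sub(r'<div class="box \1">\2</div>', html)
print(simplified)
```

In Notepad++ the equivalent replacement string would use the same two capture groups.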

Aligning text within a 'hover over' jquery

I am designing a site with images that, when hovered over, fade and show some text.
I used the thread below to do this and all went well. However, the text I am adding spans the full width and height of the image it sits over. I've tried to add padding to the text through my CSS, but it doesn't work.
DIV with text over an image on hover
Here is my amended code:
CSS
p1{font-size:1.3em;text-align:left;color:#ffffff;font-family: 'geosanslightregular';margin:100px 20px 0px 20px;padding:0;}
div.containerdiv{position:relative}
div.texts{position:absolute; top:0; left:0; width:100%; display:none; z-index:10}
div.texts:hover{display:block}
html
<div class="grid_8">
<a href="cncpt.html">
<div class="containerdiv">
<img src="images/cncpt.jpg" alt="background">
<div class="texts">
<p1>LAUNCH OF E-COMMERCE MENSWEAR STORE, STOCKING EVERYONE FROM BALMAIN AND GIVENCHY TO ADIDAS X OPENING CEREMONY, YMC, NIKE AND BEYOND. BREAK HOSTED THE LAUNCH EVENT AND INTRODUCED 200+ KEY MEDIA, BRAND AND INDUSTRY CONTACTS TO THE STORE. WE CONTINUE TO OPERATE THE PRESS OFFICE FOR CNCPT AND HAVE PICKED UP FANS EVERYWHERE FROM GQ DAILY AND METRO, TO KEY ONLINE INFLUENCERS.</p1>
</div>
</div>
</a>
</div>
<!-- end .grid_8 -->
Still no joy! It's showing the image fine, but no text is showing over it, or anywhere on the page for that matter!
Any ideas on how to solve this would be greatly appreciated.
Thanks,
John
A simple answer using CSS is to use the :hover pseudo class on an anchor tag.
Set your image container to position:relative in CSS.
Create a div containing your text, formatted using HTML and CSS, inside the image container, and give it position:absolute in CSS. An absolutely positioned element is placed relative to its nearest positioned ancestor (here, the container set to position:relative). If no ancestor is positioned, it takes its position from the body tag. It is important to set a width on the element too.
THE HTML
<div class="container">
<a><img src="img.jpg" alt="background">
<div class="text">I will show on hover</div>
</a>
</div>
CSS
div.container{position:relative;}
div.text{ position:absolute; top:0; left:0; width:100%; display:none; z-index:10;}
a:hover div.text{display:block;}
This will position the text over the container you set to position relative aligning to the top left corner. The z-index stacks elements one above the other. The higher the z-index the higher the element is in the stack.
W3Schools has some excellent definitions and examples for all of the code above if it is new to you.
The effect you are after can be achieved with html and css alone. I would advise you focus on:
design your site on paper
layout your page with html and CSS
add your rollover effects and jQuery animations
before adding the jQuery animation
CSS3 transitions are not compatible with all browsers. There are workarounds in CSS, though a jQuery fallback is often used.

jSoup - How to get elements with background style (inline CSS)?

I'm building an app in Railo, which uses the jSoup .jar library. It all works really well in my CFML language.
Anyhow, I can grab every element with a "style" attribute doing:
<cfset variables.mySelection = variables.myDocument.select("*[style]") />
But this returns an array which contains elements that sometimes do not have a "background" or "background-image" style on them. As an example, the HTML might look like this:
<p style="color: red;">I should not be selected</p>
<p style="background: green">I **should** be selected</p>
<p style="text-align: left;">I should not be selected</p>
<p style="background-image: url('/path/to/image.jpg');">I **should** be selected</p>
So I can get these elements above, but I don't want the 1st and 3rd in my array, as they don't have a background style...do you know how I can only grab and work with these?
Please note, I'm not after a computed style, or anything that complicated. I'm just wondering if I can filter based on the properties of an inline CSS style. Perhaps some regex after the fact? I'm open to ideas!
I tried messing with :contains(background) as a key word, but I wasn't sure if that was the correct path?
Many thanks for your help.
Michael.
Try with:
variables.myDocument.select("*[style*='background']")
As *= is the standard selector to match a substring in the attribute content.
Elements els = doc.select("div[style*=dashed]");
Or
Elements elements = doc1.select("span[style*=font-weight:bold]");
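To illustrate what the `*=` substring match does, here is a rough Python sketch using only the standard library's `html.parser`. This is not jsoup and not how jsoup is implemented; the class name `StyleSubstringFilter` is hypothetical, and it only collects the tag name and style text of each matching element.

```python
from html.parser import HTMLParser


class StyleSubstringFilter(HTMLParser):
    """Collects (tag, style) pairs whose inline style contains a given substring."""

    def __init__(self, needle: str):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        style = dict(attrs).get("style")
        if style is not None and self.needle in style:
            self.matches.append((tag, style))


markup = """
<p style="color: red;">I should not be selected</p>
<p style="background: green">I should be selected</p>
<p style="text-align: left;">I should not be selected</p>
<p style="background-image: url('/path/to/image.jpg');">I should be selected</p>
"""

finder = StyleSubstringFilter("background")
finder.feed(markup)
print(finder.matches)
```

A plain substring test also matches values such as `background-color`, which is the same behaviour you get from `[style*=background]`.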

What are the appropriate formats for the properties of http://schema.org/GeoShape?

It would be nice if the GeoShape page included examples or the individual properties were broken out instead of just being Text.
I'm specifically interested in the circle property. I want to define a circle of 20 mile (~ 32km) radius from Nottingham City Centre (52.953, -1.149).
<!DOCTYPE html>
<html>
<head>
<title>Nottingham City Neighbourhood</title>
</head>
<body>
<div itemscope itemtype="http://schema.org/Place">
<div itemprop="geo" itemscope itemtype="http://schema.org/GeoShape">
<meta itemprop="circle" content="52.953 -1.149 32186.88"/>
</div>
</div>
</body>
</html>
The rich snippet tool does pick out the data, but I don't trust that I've used the right format. Especially since the parsed longitude is positive.
> The following structured data is viewable only in the XML results view
> in Custom Search. More information.
>
> geoshape (source = MICRODATA) circle = 52.953 -1.149 32186.88
>
>
> The following structured data can be used to filter search results in
> Custom Search. More information.
>
> more:pagemap:geoshape more:pagemap:geoshape-circle
> more:pagemap:geoshape-circle:1.149
> more:pagemap:geoshape-circle:32186.88
> more:pagemap:geoshape-circle:52.953
> more:pagemap:geoshape-circle:52.953_
As for the others, I think both box and polygon would be in the format "$lat1,$long1 $lat2,$long2 $lat3,$long3 $lat1,$long1" for a square.
Anybody have a definitive answer or reason?
I've done some archaeology, following a similar trail to others.
Details: http://lists.w3.org/Archives/Public/public-vocabs/2012Jun/0116.html
The compounding confusion seems to be (as Yves Martin points out) missing whitespace in the original rNews examples.
We'll get this situation improved and I'll report back here.
Validation
The example you give (in the first version of your question) does not pass validation at http://validator.nu/. You cannot use a property directly in the same node that declares the entity type. The rich snippet tool is probably not strict enough. To confirm, this alternate tool also refuses to generate a JSON expression from your block because of the lack of a top level element.
So an additional node is required for the geo property, here is a proper way to express it (doctype and title are for validation tool only):
<!DOCTYPE html>
<title>Nottingham City Neighbourhood</title>
<div class="hidden" itemscope itemtype="http://schema.org/GeoShape">
<div itemprop="geo">
<meta itemprop="circle" content="52.953 -1.149 32186.88"/>
</div>
</div>
Recommendation
According to this Google FAQ, only a few entities are really supported, and based on the Organization and Event examples in microdata format, the optional geo property only offers the longitude and latitude elements from http://schema.org/GeoCoordinates. So that simple point definition is a safer choice than circle. By the way, this example is valid and properly extracted:
<div itemscope itemtype="http://data-vocabulary.org/Organization">
<span itemprop="name">Nottingham City Neighbourhood</span>
<div itemprop="geo">
<meta itemprop="circle" content="52.953 -1.149 32186.88"/>
</div>
</div>
If you use sindice.com, there is no hit for http://schema.org/GeoShape whereas http://schema.org/GeoCoordinates is extensively used. Not so easy to find real world usage of circle.
Circle property value
For the circle property content itself, much of the documentation refers to WGS84, but that only concerns points. This documentation confirms the content text structure for the circle element.
This example for rNews obviously lacks a space before the 500 radius and is not properly rendered, the page source contains <td class="rnews_td codestyle">38.920952 -94.645443500</td> instead of <td class="rnews_td codestyle">38.920952 -94.645443 500</td>
You should look at schema generators or parsers. Maybe one of them has implemented a fine grain editor for GeoShape properties instead of a raw text field, so that you can confirm property content structure. I have looked at Any23 but still the same issue: GeoCoordinates is implemented but not GeoShape.
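As a small sketch of the "latitude longitude radius" structure discussed above, the following Python helper builds the circle value from the question. The space-separated layout with a radius in metres follows this answer's reading of the rNews and GeoRSS material, not a confirmed schema.org definition; the function name `circle_value` is made up.

```python
METERS_PER_MILE = 1609.344  # international mile


def circle_value(lat: float, lon: float, radius_miles: float) -> str:
    """Format a circle as 'lat lon radius', with the radius converted to metres."""
    radius_m = radius_miles * METERS_PER_MILE
    # Single spaces only: the rNews example above broke because one was missing.
    return f"{lat} {lon} {radius_m:.2f}"


# Nottingham city centre with a 20 mile radius, as in the question.
print(circle_value(52.953, -1.149, 20))
```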
Box and polygon property value
No comma is expected between longitude and latitude values for point, box, polygon or line (only use space) according to both rNews and GeoRSS.
As a conclusion, you should avoid GeoShape if your aim is to provide a location to search engines... At the moment, only GeoCoordinates seems to be a reasonable choice.
Going from the discussion on W3.org, an example value of a GeoShape Box would be:
38.920952 -94.645443 38.951797 -94.680439
Those values result in the area mapped here.
As stated in the schema, they just need to be the unique values of the corners of the box (e.g. "lat1 lon1 lat2 lon2", matching the example above):
A polygon is the area enclosed by a point-to-point path for which the starting and ending points are the same. A polygon is expressed as a series of four or more space delimited points where the first and final points are identical.
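The polygon rule quoted above can be sketched in Python as follows. It assumes each point is written as "lat lon" with spaces only, as argued earlier in this thread; the helper name `polygon_value` and the sample coordinates are made up.

```python
def polygon_value(points):
    """Join (lat, lon) pairs with spaces, closing the path if it is still open."""
    ring = list(points)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])  # first and final points must be identical
    if len(ring) < 4:
        raise ValueError("a polygon needs four or more points including the closing one")
    return " ".join(f"{lat} {lon}" for lat, lon in ring)


# Three distinct corners become four points once the path is closed.
print(polygon_value([(0, 0), (0, 1), (1, 1)]))
```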