Regex to find a particular pattern in Excel - regex

This is the text in cell H1
<a class="stop-propagation" href="javascript:void(0);" data-link="/propertyDetails/poiOnMap.html?lat=19.2412011&longt=73.1290596&projectOrProp=Project&city=Thane&includeJs=y&type=poiMap2017&address=Thane, Maharashtra" id="map_link_27696295" onclick="stopPage=true; showPhotoMap('/propertyDetails/poiOnMap.html?lat=19.2412011&longt=73.1290596&projectOrProp=Project&city=Thane&includeJs=y&type=poiMap2017&address=Thane, Maharashtra');" style="outline: 1px solid blue;"><span class="icoMap"></span>Map</a>
From the cell above, I'm trying to extract the first occurrence of lat and longt.
This is what I have tried:
=IF(LEFT(H1,2)="lat=",SUBSTITUTE(H1,"lat=",""),IF(RIGHT(H1,2)="lat=",SUBSTITUTE(H1,"lat=",""),H1))
But it doesn't give me the proper output.
This is what I expect:
lat=19.2412011
longt=73.1290596
Any help would be much appreciated.
Thanks

For the lat=19.2412011,
=TRIM(LEFT(SUBSTITUTE(REPLACE(H1,1,FIND("?",H1),TEXT(,)),"&",REPT(" ",LEN(H1))), LEN(H1)))
For the longt=73.1290596,
=TRIM(MID(SUBSTITUTE(REPLACE(H1,1,FIND("?",H1),TEXT(,)),"&",REPT(" ",LEN(H1))), LEN(H1), LEN(H1)))
For the two together in a single cell with a line feed,
=TRIM(LEFT(SUBSTITUTE(REPLACE(H1,1,FIND("?",H1),TEXT(,)),"&",REPT(" ",LEN(H1))),LEN(H1)))&CHAR(10)&TRIM(MID(SUBSTITUTE(REPLACE(H1,1,FIND("?",H1),TEXT(,)),"&",REPT(" ",LEN(H1))),LEN(H1),LEN(H1)))
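The padding trick in these formulas can be hard to see inside a single line, so here is a minimal Python sketch of the same idea. The function name and the shortened sample cell are assumptions made for illustration; only the two query fields come from the question.

```python
def split_query_fields(cell: str) -> tuple[str, str]:
    """Mimic the formulas: pad every '&' out to the cell's length, then slice."""
    width = len(cell)                                 # LEN(H1)
    tail = cell[cell.index("?") + 1:]                 # REPLACE(H1,1,FIND("?",H1),"")
    padded = tail.replace("&", " " * width)           # SUBSTITUTE(...,"&",REPT(" ",LEN(H1)))
    first = padded[:width].strip()                    # roughly TRIM(LEFT(...,LEN(H1)))
    second = padded[width - 1:2 * width - 1].strip()  # roughly TRIM(MID(...,LEN(H1),LEN(H1)))
    return first, second


# Shortened stand-in for the text in H1 (hypothetical, not the full cell).
CELL = ('<a data-link="/propertyDetails/poiOnMap.html?lat=19.2412011'
        '&longt=73.1290596&projectOrProp=Project&city=Thane">Map</a>')

print(split_query_fields(CELL))
```

Because each `&` becomes a run of spaces as long as the whole cell, a window of that width can only ever overlap one field, and trimming removes the surrounding padding.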

Related

Regex (re2 googlesheets) multiple values in multiline cell

I'm stuck on how to read and pretty up these values from a multiline cell via ARRAYFORMULA.
I'm using regex because the preceding line can vary.
Just formulas please, no custom code.
The first column looks like a set of these:
```
[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1
```
To be useful elsewhere, each [tag] set ([effect\d], [feature\d], etc.) needs at minimum to be in its own column. For example, the 'effects' column would look like:
ATTR_A:1000,ATTR_B:8
and so on.
Desired output can also be seen in the included spreadsheet
**Here is the example spreadsheet:**
https://docs.google.com/spreadsheets/d/1arMaaT56S_STTvRr2OxCINTyF-VvZ95Pm3mljju8Cxw/edit?usp=sharing
**Current REGEXREPLACE**
Kinda works: it finds each 'type' and 'value' fine, but I can't figure out how to extract just those from the rest. I tried capturing (and non-capturing) groups before and after, but that didn't work.
=ARRAYFORMULA(REGEXREPLACE($A3:$A,"[\n.][effect\d][\n.](.)\n(.)","1:$1 2:$2"))
**Current SUBSTITUTE + REGEXEXTRACT + REGEXREPLACE**
A different approach entirely. It also kinda works, but it is longer and leaves me having to parse the values out of that string, which is where I got stuck again. The idea was to use this to simplify the input, then REGEXREPLACE like above. I'm stuck on removing the content around the final matches, though, and if I can do that then the approach above is fine too.
// First ran a substitute
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE($A3:$A,char(10),";"),";;",char(10)))
// Then a variation of this (gave up on a single-line 'effect\d', so broke it up to try and get it working)
=ARRAYFORMULA(IF(A3:A<>"",IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect0]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect1]);(.)$")&";;")&""&IFERROR(REGEXEXTRACT(A3:A,"(?m)^(?:[effect2]);(.)$")&";;"),""))
// Then use regexreplace like above
=ARRAYFORMULA(REGEXREPLACE($B3:$B,"value = (.);type = (.);;","1:$1 2:$2"))
**--EDIT--**
Also, as my updated 'Desired Output' sheet shows (see timestamped comment below), bonus kudos if you can also extract just the values of matching 'type's into those extra columns (see spreadsheet).
All good if you can't though; I just realized I would need that too for lookups.
**--END OF EDIT--**
I've tried dozens of things, discarding each in turn. I had a quick look in the version history to grab two promising attempts and shared them in separate sheets.
One of these also used SUBSTITUTE to simplify the input column; I'm happy with a solution using either the RAW or the SUBSTITUTE results.
**Potentially Useful links:**
https://github.com/google/re2/wiki/Syntax
**Just some more words:**
I have also looked at dozens of Stack Overflow and Google support pages, and tried both REGEXEXTRACT and REGEXREPLACE; both are promising but missing that final tweak, and I have already tried dozens of tweaks on both.
Any help would be great, and will hopefully help others in future, since examples with spreadsheets are useful and every new REGEX seems to be a new adventure ;)
P.S. if we can think of better title for OP, please say in comment or your answer :)
paste in B3:
=ARRAYFORMULA(SUBSTITUTE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(
IF(C3:E<>"", C2:E2&":"&C3:E, )),,999^99))), " ", ", "))
paste in C3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&C2)))
paste in D3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&D2)))
paste in E3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT($A3:$A, "(\d+)\ntype = "&E2)))
paste in F3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[feature\d+\]\nname = (.*)")))
paste in G3:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A3:A, "\[components\]\n\d+ = (.*)")))
paste in H3:
=ARRAYFORMULA(IFNA(REGEXREPLACE(INDEX(SPLIT(REGEXEXTRACT(
REGEXREPLACE(A3:A, "\n", ", "), "\[resources\], (.*)"), "["),,1), ", , $", )))
spreadsheet demo
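Not a spreadsheet formula, but for checking the patterns outside Sheets, here is a small Python sketch of the same two ideas: `(\d+)\ntype = NAME` to pull one value per type, and a single pass over every [effectN] block to build the combined column. The function names are hypothetical, the sample cell is the one from the question, and the trailing `\b` is an addition that is not in the formulas above.

```python
import re

CELL = """[config]
name = the_name
texture = blah.dds
cost = 1000
[effect0]
value = 1000
type = ATTR_A
[effect1]
value = 8
type = ATTR_B
[feature0]
name = feature_blah
[components]
0 = comp_one,1
[resources]
res_one = 1
res_five = 1
res_four = 1"""


def value_for_type(cell, attr):
    """Value on the line directly above 'type = <attr>', or None if absent."""
    # \b is an extra guard so ATTR_A does not also match a longer name.
    match = re.search(r"(\d+)\ntype = " + re.escape(attr) + r"\b", cell)
    return match.group(1) if match else None


def effects_summary(cell):
    """Join every [effectN] block as TYPE:VALUE, comma separated."""
    pairs = re.findall(r"\[effect\d+\]\nvalue = (\d+)\ntype = (\w+)", cell)
    return ",".join(f"{attr}:{value}" for value, attr in pairs)


print(value_for_type(CELL, "ATTR_B"))
print(effects_summary(CELL))
```

The patterns avoid lookarounds and backreferences, so they should also be usable with the RE2 syntax that Google Sheets expects.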
This was a fun exercise. :-)
Caveat first: I have added some "input data". Examples:
[feature1]
name = feature_active_spoiler2
[components]
0 = spoiler,1
1 = spoilerA, 2
So the output has "extra" output.
See the tab ADW's Solution.

Extract text with bold content from css selector

I am trying to extract text from forum posts, but the bold element is ignored.
How can I extract the raw text, like "Some text to extract bold content?"? Currently I am getting only "Some text to extract ?".
<blockquote class="messageText SelectQuoteContainer ugc baseHtml">
Some text to extract <b>bold content</b>?
</blockquote>
def parse_page(self, response):
    for quote in response.css('article'):
        yield {
            'text': quote.css('blockquote::text').extract()
        }
You need a space in your css selector:
'blockquote ::text'
           ^
That is because you want the text of every descendant node under blockquote; without the space, the selector matches only the text nodes directly inside the blockquote node.
Use the * selector to select the text of all inner elements inside an element.
''.join([ a.strip() for a in quote.css('blockquote *::text').extract() ])
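The difference between an element's own text nodes and the text of all its descendants can be seen without Scrapy. The sketch below uses the standard library's ElementTree as a stand-in (an assumption; it is not what Scrapy uses internally), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<blockquote class="messageText SelectQuoteContainer ugc baseHtml">
Some text to extract <b>bold content</b>?
</blockquote>"""


def own_text(element):
    """Text nodes that are direct children of the element (like 'blockquote::text')."""
    parts = [element.text or ""]
    # Text following each child element is stored on that child's .tail.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts)


def all_text(element):
    """Text of the element and every descendant (like 'blockquote ::text')."""
    return "".join(element.itertext())


quote = ET.fromstring(SNIPPET)
print(repr(own_text(quote).strip()))
print(repr(all_text(quote).strip()))
```

The first call drops "bold content" because that text belongs to the nested b element, which matches the behaviour described in the question.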

web browser innertext data how received in textbox?

I have posted my HTML below. In which I want to get the name value from within my textbox area. I've tried several processes and I'm still not getting any valid solution. Please check my HTML and code snippet, and show me a possible solution.
The name prefix always stays the same when I refresh the page. The name itself changes, but it always starts with the literal "mr." as its first 3 characters (4 if you count the trailing space), i.e. the regex ([mM]r.\ ). Below is my table example.
<table>
<tr><td><b>Your Name is </b> mr. kamrul</td></tr>
<tr><td><b>your age </b> 12</td></tr>
<tr><td><b>Email:</b>kennethdasma30@gmail.com</td></tr>
<tr><td><b>job title</b> sales man</td></tr>
</table>
As shown below I am trying this process using listbox but I am not receiving anything.
HtmlElementCollection bColl =
    webBrowser1.Document.GetElementsByTagName("table");
foreach (HtmlElement bEl in bColl)
{
    if (bEl.GetAttribute("table") != null)
    {
        listBox1.Items.Add(bEl.GetAttribute("table"));
    }
}
If anyone can give me an idea of how to get everything shown in the browser window as ("mr. " + text) into my list box, I would appreciate it. Also, if you can explain the answer in detail and with good comments, I would appreciate that too, as I'd like to understand the answer in greater depth.
Here is one simple way using Regex, assuming that the format of your html page doesn't change.
Regex re = new Regex(@"(?<=<tr><td><b>Your\sName\sis\s?</b>\s?)[mM]r\.\s.+?(?=</td></tr>)", RegexOptions.Singleline);
foreach (Match match in re.Matches(webBrowser1.DocumentText))
{
    listBox1.Items.Add(match.Value);
}
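To try the pattern outside the WebBrowser control, here is a rough Python equivalent run against the table from the question. Python's `re` module does not accept the variable-length lookbehind used above, so this sketch swaps it for a capturing group; the function name is hypothetical.

```python
import re

HTML = """<table>
<tr><td><b>Your Name is </b> mr. kamrul</td></tr>
<tr><td><b>your age </b> 12</td></tr>
<tr><td><b>job title</b> sales man</td></tr>
</table>"""

# Same anchors as the C# pattern, but the wanted text is captured in a group
# instead of being isolated with lookbehind/lookahead.
NAME_PATTERN = re.compile(
    r"<tr><td><b>Your\sName\sis\s?</b>\s?([mM]r\.\s.+?)</td></tr>",
    re.DOTALL,  # counterpart of RegexOptions.Singleline
)


def extract_names(html):
    """Return every 'mr. <name>' value that follows the 'Your Name is' label."""
    return NAME_PATTERN.findall(html)


print(extract_names(HTML))
```

As with the C# version, this only holds while the page keeps exactly this markup; an HTML parser would be more robust if the layout can change.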

Qt : QXmlQuery and XPaths

I'm here to ask you some help with QXmlQuery and Xpath.
I'm trying to use this combination to extract some data from several HTML documents.
These documents are downloaded and then cleaned with the HTML Tidy Library.
The problem arises when I try my XPath. Here is some example code:
[...]
<ul class="bullet" id="idTab2">
<li><span>Hauteur :</span> 1127 mm</li>
<li><span>Largeur :</span> 640 mm</li>
<li><span>Profondeur :</span> 685 mm</li>
<li><span>Poids :</span> 159.6 kg</li>
[...]
The clean code is stored in a QString "code":
QStringList fields, values;
QXmlQuery query;
query.setFocus(code);
query.setQuery("//*[@id=\"idTab2\"]/*/*/string()");
query.evaluateTo(&fields);
My goal is to get all the fields (Hauteur, Largeur, Profondeur, Poids, etc.) and their value (1127 mm, 640 mm, 685 mm, 159.6 kg, etc.).
Question 1
As you can see, I use the XPath //*[@id="idTab2"]/*/*/string() to recover the fields, because //ul[@id="idTab2"]/li/span/string() doesn't work. When I specify a tag name, it returns nothing; it only works with *. Why? I've checked the code returned by the tidy function and the XPath is not altered, so I don't see any problem. Is this normal? Or maybe there is something I don't know...
Question 2
In the previous XHTML code, the li tags wrap a span tag and some text. I don't know how to get only the text and not the content of the span tag. I tried :
//*[@id="idTab2"]/*/string() gives: Hauteur : 1127 mm Largeur : 640 mm Profondeur : 685 mm
//*[@id="idTab2"]/*[2]/string() gives: nothing
So, if I'm not wrong, the text in the li tag is not considered a child node, but it should be. See the accepted answer to "Select just text directly in node, not in child nodes".
Thanks for reading, I hope someone can help me.
To get the elements (not the text representation) inside the different <li>s, you can test the text content:
//*[@id=\"idTab2\"]/li[starts-with(span, "Hauteur")]
The same goes for the other items:
//*[@id=\"idTab2\"]/li[starts-with(span, "Largeur")]
//*[@id=\"idTab2\"]/li[starts-with(span, "Profondeur")]
//*[@id=\"idTab2\"]/li[starts-with(span, "Poids")]
To get the string representation of these <li>s, you can wrap the whole expression in string(), like this:
string(//*[@id=\"idTab2\"]/li[starts-with(span, "Poids")])
which gives "Poids : 159.6 kg"
To extract only the text node in the <li>, without the <span>, you can use the following expressions. They select the text nodes that are direct children of the <li> (the <span> is not a text node) and remove the leading and trailing whitespace characters with normalize-space():
normalize-space(//*[@id=\"idTab2\"]/li[starts-with(span, "Hauteur")]/text())
normalize-space(//*[@id=\"idTab2\"]/li[starts-with(span, "Largeur")]/text())
normalize-space(//*[@id=\"idTab2\"]/li[starts-with(span, "Profondeur")]/text())
normalize-space(//*[@id=\"idTab2\"]/li[starts-with(span, "Poids")]/text())
The last one gives "159.6 kg"
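For comparison, the same split between the <span> label and the text node that follows it can be sketched in Python with the standard library's ElementTree instead of QXmlQuery (a swapped-in tool, not the Qt API). The closing </ul> is added here as an assumption, since the original snippet is elided at both ends, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MARKUP = """<ul class="bullet" id="idTab2">
<li><span>Hauteur :</span> 1127 mm</li>
<li><span>Largeur :</span> 640 mm</li>
<li><span>Profondeur :</span> 685 mm</li>
<li><span>Poids :</span> 159.6 kg</li>
</ul>"""


def fields_and_values(markup):
    """Map each <span> label to the text node that follows it inside the <li>."""
    root = ET.fromstring(markup)
    result = {}
    for item in root.iter("li"):
        span = item.find("span")
        label = (span.text or "").rstrip(" :")
        # span.tail is the text after </span>; split/join collapses whitespace,
        # similar in spirit to normalize-space().
        value = " ".join((span.tail or "").split())
        result[label] = value
    return result


print(fields_and_values(MARKUP))
```

This sidesteps the string() and text() distinction entirely: the label lives in the span's own text and the value lives in the text that trails it.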

Infragistics Webdatagrid: Resize column to largest header/data cell contents

I am searching for a solution that changes the column width based on largest data content or header.
Solutions like
tbody.igg_Item > tr > td
{
    white-space: nowrap !important;
}
do not work, because the grid sets the column width based on the column header contents rather than the data, so the data cell contents are not displayed at full length.
E.g. if the content is "my test data content", I can only see "my test dat" because my header is not long enough.
My markup is:
<ig:WebDataGrid ID="WebDataGrid1" runat="server" AutoGenerateColumns="False" Height="350px"
    Width="100%">
    <ClientEvents MouseDown="GridMouseDown" />
    <Behaviors>
        <ig:Activation>
        </ig:Activation>
        <ig:ColumnResizing>
        </ig:ColumnResizing>
        <ig:Selection CellSelectType="Single">
        </ig:Selection>
    </Behaviors>
</ig:WebDataGrid>
I am adding the columns in code behind (I have not seen any DataColumn property that controls width)
The grid will automatically size the columns to the data portion of the grid if there is no width set on the column and the grid itself doesn't have a width. Note that you will need to put the grid in a container if you want a horizontal scroll bar and if using paging the pager will scroll with the columns.
I have a more detailed answer to this question here on StackOverflow. I also have this posted in the Infragistics forums here with a sample. I also have a modification of that example that allows wrapping of text in the header if there are multiple words with sample posted here.