Extracting the 'end' of a string, conditions or regular expression? - regex

I have a data table - can be extracted to text or spreadsheet - The column has random text with areas in square metres that I want to copy to a new column in hectares. (So parse text and divide by 10,000).
e.g.
Deposited Plan 172499, 53,310 m2
Deposited Plan 166167, 853 m2
This plan has no area stated
Section 21 Block I Wellington District, 403,573 m2
Output column should have:
5.3310
0.0853
40.3573
Is there a way I can automate this in LibreOffice Calc, or with a regular expression editor like TextCrawler? Or perhaps using an AutoIt script?

Try with this
/([0-9]+,*[0-9]+\sm2)$/

Related

Dynamic Google Sheets Column + Row formula

I have a good sheet that I want to grab the header which a date time stamp which will match against another sheet find the entries with that date and suburb and type and give me an average cost.
My formula is =AVERAGEIFS(Sheet1!C:C,Sheet1!A:A, B11:B, Sheet1!F:F, C10) which gives me the average but i've hard coded the header date:
example:
What I want to do is dynamically add the data from the row above with the date time instead of of manually adding it in the formula something like this:
=AVERAGEIFS(Sheet1!C:C,Sheet1!A:A, B11:B, Sheet1!F:F, =CHAR(COLUMN()+64) & 10)
Which would automatically grab the column + row 10 e.g C10, D10, E10.
If i put =CHAR(COLUMN()+64) & 10 in its own cell it works but when I add it to averageifs condition it gives me a parsing error.
Expecting C10, D10, E10 from =CHAR(COLUMN()+64) & 10 which should allow me to dynamically filter data on the date int he header above it.
try:
=AVERAGEIFS(Sheet1!C:C, Sheet1!A:A, B11:B, Sheet1!F:F, INDIRECT(CHAR(COLUMN()+64)&10))

Replace blanks with a string with DAX coding in Power BI

I have a data in Excel and I have uploaded in power bi, created a visualisation using a chart which looks like -
blank, CP, Jj10 are basically my y axis and dashes are my bars of horizontal chart. I have tried to show how my chart looks like because I don't get any other option
(Blank)-------------998
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
There is a column named performance (sheet_name=Empl_data) and what I want is to replace the blanks with Non-GT in power bi with creating a new column.
What my output should look like -
(Non-GT)-------------998
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
I have tried this -
Non-GT = IF(ISBLANK('Empl_data'[performance]),"Non-GT",'Empl_data'[performance])
What i get is
Non-GT----------------964
(Blank)-------------34
CP-----------56
Jj10--------44
0BN--------------77
Hi-po---2
Naas-------21
I just want to replace blanks with Non-TSG completely but still it shows blank. Please help me out to solve the problem and please let me know if I have made clear what my prblm is.
My data -
Empl_id
Empl_name
performance
99807
Somman paul
0076
Richards.M
8870
Maheen Josef.T
11209
Dojar Farah
6651
Macklegn Sagoe
Hi-po
551
Cada Farez
Jj10
12
Qwezy Goha
Hi-po
6567
Beheriop Produse
CP
2227
John semmers
0BN
656
Majeeio .f
80100
Drejju Yan
Is it actually Blank where it is showing nothing? First confirm there are spaces or really null in the data, then apply conditions as bellow-
Non-GT =
IF(
'Empl_data'[performance] = BLANK() || 'Empl_data'[performance] = "",
"Non-GT",
'Empl_data'[performance]
)
Q: Where you are transforming data with that condition? Power Query Editor? Or creating Measure or Calculated column?

Get a string after a specific word, using a program that has limited regex features?

Looking for help on building a regex that captures a 1-line string after a specific word.
The challenge I'm running into is that the program where I need to build this regex uses a single line format, in other words dot matches new line. So the formula I created isn't working. See more details below. Any advice or tips?
More specific regex task:
I'm trying to grab the line that comes after the word Details from entries like below. The goal is pull out 100% Silk, or 100% Velvet. This is the material of the product that always comes after Details.
Raw data:
<p>Loose fitted blouse green/yellow lily print.
V-neck opening with a closure string.
Small tie string on left side of top.</p>
<h3>Details</h3> <p>100% Silk.</p>
<p>Made in Portugal.</p> <h3>Fit</h3>
<p>Model is 5‰Ûª10,‰Û size 2 wearing size 34.</p> <p>Size 34 measurements</p>
OR
<p>The velvet version of this dress. High waist fit with hook and zipper closure.
Seams run along edges of pants to create a box-like.</p>
<h3>Details</h3> <p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5‰Ûª10‰Û, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"åÊ</p>
<p>These pants run small. We recommend sizing up.</p>
Here is the current formula I created that's not working:
Replace (.)(\bDetails\s+(.)) with $3
The output gives the below:
<p>100% Silk.</p>
<p>Made in Portugal.</p>
<h3>Fit</h3>
<p>Model is 5‰Ûª10,‰Û size 2 wearing size 34.</p>
<p>Size 34 measurements</p>
OR
<p>100% Velvet.</p>
<p>Made in the United States.</p>
<h3>Fit</h3> <p>Model is 5‰Ûª10‰Û, size 2 and wearing size M pants.</p> <p>Size M measurements Length: 37.5"åÊ</p>
<p>These pants run small. We recommend sizing up.</p>
`
How do I capture just the desired string? Let me know if you have any tips! Thank you!
Difficult to provide a working solution in your situation as you mention your program has "limited regex features" but don't explain what limitations.
Here is a Regex you can try to work with to capture the target string
^(?:<h3>Details<\/h3>)(.*)$
I would personally use BeautifulSoup for something like this, but here are two solutions you could use:
Match the line after "Details", then pull out the data.
matches = re.findall('(?<=Details<).*$', text)
matches = [i.strip('<>') for i in matches]
matches = [i.split('<')[0] for i in [j.split('>')[-1] for j in matches]]
Replace "Details<...>data" with "Detailsdata", then find the data.
text = re.sub('Details<.*?<.*>', '', text)
matches = re.findall('(?<=Details).*?(?=<)', text)

source file fixed width , need only Header and Footer to the target(oracle)

I have this scenario with source as fix width flat file, and I need to read to target only the Header and Footer not the details records.
I need to trim the first column (PA22109 ) and get only PA and next 2 columns to rows as two different dates.
For Footer get only the PT(PT000000000700000030620E00000055612I00000010277I) and the rest into a column of the target.
How can I achieve this logic, inputs are appreciated.
source file :
PA22109 00153252015110905408179 2015110820151108PO ---header
DE0E9D TESTGROUPEXCH TESTINSEXCH TESTLOCEXCH ID014 LNAME014 FNAME014 14 MAIN ST ANYWHERE NJ011110000 195001012Z 01000000014 LNAME014 PATFIRST014 14 MAIN ST ANYWHERE NJ011110000 1955010110106000220 TESTGROUPEXCH 8179 TESTBENEXCH TESTCNTE53 0000000000 0000002643005 011234567890 011234567890 1234 TEST PHARMACY TEST PHARMACY LANE PHARMACYTOWN NJ09876 5555555555 11Y5 019876543210 019876543210 NJPRESCLAST PRESCFIRST 5555555551 DRLAST DRFIRST 110110000009770990300406048410 2015092720150927154401000000000000120150929 0000100000000000000000000000000
PT000000000700000030620E00000055612I00000010277I --Footer
As this a fixed file you can perform following to meet your requirement.
In your Informatica mapping, Read row in a single column.
In Expression, Mark each record for filter out if It does not start with PA OR PT (Assumption your Detail records do not start with PA or PT). Filter detail record out using Filter transformation.
Now you have only Header and Footer Records.
Now you can apply respective condition in expression for PA and PT Records.

VBA Excel : Read file, and stock images in a column

I am new to VBA (I mean, REALLY new) and I would like you to give me some tips.
I have an Excel file with 2 columns: SKU and media_gallery
I also have images stocked in a folder (lets name it /imageFolder)
I need to parse the imageFolder and look for ALL images sarting by SKU.jpg , and put them into the media_gallery column separated by a semicolon ( ; )
Example: My SKU is "1001", I need to parse the image folder for all images starting by 1001 (all image have this pattern: 1001-2.jpg , 1001-3.jpg etc...)
I can do that in Java or C# but I want to give a chance to VBA. :)
How can I do that?
EDIT: I only need file names yes! And I should of said that I have 20 000 images in my folder, and 8000 SKUs , so I don't know how we can handle looping on 20 000 images names.
EDIT2: If SKU contains a dash ( - ), I don't need to treat it, so I can pass to the next SKU. And each SKU has a maximum of 5 images (....;SKU-5.jpg)
Thanks all.
How to insert images given you have one image name per cell in a column: How to get images to appear in Excel given image url
Take the above and introduce an inner loop for the file name:
if instr(url_column.Cells(i).Value, "-") = 0 then
dim cur_file_name as string
cur_file_name = dir("imageFolder\" & url_column.Cells(i).Value & "*.jpg")
do until len(cur_file_name) = 0
image_column.Cells(i).Value = image_column.Cells(i).Value & cur_file_name & ";"
cur_file_name = Dir
loop
end if