Notepad++ truncate LatLng data - regex

I have a KML file that has multiple coordinates. I want to trim the coordinates to see if it will reduce the size.
CHANGES
The LatLng values differ and are not in a consistent format. Below is a sample of the LatLng values in my file. My apologies for not using a more accurate sample; I didn't realize that it would affect the regex.
20.0649556884364,42.546758117893,0
-6.665609089909049,61.4394550582227,0
142.843146200241,54.2804088338613,0
This goes on for a while, as it is a multigeometry polygon. I would like to reduce each LatLng to four decimal places, e.g. 20.1234,20.1234
How do I remove the last 9 digits leaving only 4 after the period?

Find what:
(\.\d{4})\d+
Replace with:
$1
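For anyone who wants to run the same truncation outside Notepad++, the find/replace pair above can be sketched in Python. This is only a minimal sketch: it assumes the coordinates are available as plain text lines, the function name truncate_decimals is made up for illustration, and Python writes the backreference as \1 where Notepad++ uses $1.

```python
import re

# Same pattern as the "Find what" field: group 1 keeps the dot and the
# first four decimals, and the trailing \d+ matches the digits to drop.
EXTRA_DECIMALS = re.compile(r"(\.\d{4})\d+")


def truncate_decimals(line):
    """Cut every decimal number in `line` down to four decimal places."""
    return EXTRA_DECIMALS.sub(r"\1", line)


if __name__ == "__main__":
    samples = [
        "20.0649556884364,42.546758117893,0",
        "-6.665609089909049,61.4394550582227,0",
        "142.843146200241,54.2804088338613,0",
    ]
    for sample in samples:
        print(truncate_decimals(sample))
```

Note that this truncates rather than rounds, exactly as the Notepad++ replacement does.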

Related

Regular Expression - Extract Words and number

So I'm using REGEXEXTRACT in Google Sheets to pull a value out of a large amount of data. I have two problems and don't know what I did wrong. Feel free to point out my mistakes and help me with a solution.
Requirement: I need to extract the part number, whose format is ABCD#### or ABCD-####, i.e. uppercase letters followed by numbers, with or without a "-", for example KTA1763 or SPD-4124.
I use this formula: =Regexextract(A1,"([A-Z]+-?[0-9]*)"). FYI, the value I'm extracting could appear at the beginning, in the middle, or at the end.
1. First problem, I have the value as below:
REACH TECH 223/224 list document for KTD2026BEWE-TR
=> Extract result : REACH
[What I need: KTD2026]
2. Second problem, I have the value as:
information for Part number KTA1550EDS-TR
=> Extract result: P
[What I need: KTA1550]
Please let me know which part of the formula I should fix to get the expected result, or how I should alter my formula. Big thanks.
Go for:
=INDEX(IFNA(REGEXEXTRACT(A1:A, "[A-Z]+\d+")))
Try this in one cell.
=ArrayFormula(IF(A2:A="",,REGEXEXTRACT(REGEXEXTRACT(A2:A, ".+"&REGEXEXTRACT(A2:A, "[0-9]+")), ".+\s(.+)")))
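The original formula returns REACH and P because [0-9]* allows zero digits, so the first run of capital letters already satisfies the pattern. Requiring at least one digit skips those words. The Python sketch below shows the difference on the two sample values; the pattern [A-Z]+-?[0-9]+ is my own variation on the first answer that also keeps the optional hyphen from the question, and the helper name part_number is hypothetical.

```python
import re

# Pattern from the question: zero digits are allowed, so a bare word matches.
ORIGINAL = re.compile(r"[A-Z]+-?[0-9]*")
# Assumed variation: at least one digit is required after the letters.
STRICT = re.compile(r"[A-Z]+-?[0-9]+")


def part_number(text, pattern=STRICT):
    """Return the first match of `pattern` in `text`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    rows = [
        "REACH TECH 223/224 list document for KTD2026BEWE-TR",
        "information for Part number KTA1550EDS-TR",
        "SPD-4124",
    ]
    for row in rows:
        print(part_number(row, ORIGINAL), "->", part_number(row))
```

In Sheets the same idea would be to replace [0-9]* with [0-9]+ in the original formula.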

How can I extract specific patterns from a string?

I currently have a dataset filled with the following pattern:
My goal is to get each value into a different cell.
I have tried the following formula, but it has not yielded the results I am looking for.
=SPLIT(D8,"[Stock]",FALSE,FALSE)
I would appreciate any guidance on how I can get to the ideal output, using Google Sheets.
Thank you in advance!
I will assume here from your post that your original data runs D8:D.
If you want to retain [Stock] in each entry, try the following in the Row-8 cell of a column that is otherwise empty from Row 8 downward:
=ArrayFormula(IF(D8:D="",,TRIM(SPLIT(REGEXREPLACE(D8:D&"~","(\[Stock\]).","$1~"),"~",1,1))))
If you don't want to retain [Stock] in each entry, use this version:
=ArrayFormula(IF(D8:D="",,TRIM(SPLIT(REGEXREPLACE(D8:D&"~","\[Stock\].","~"),"~",1,1))))
These formulas do not rely on any particular punctuation as a marker: the . after \[Stock\] matches whatever single character follows. They also ensure that you don't wind up with blank (and therefore unusable) cells interspersed from trailing SPLITs.
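The sentinel trick used in these formulas can be sketched in Python for readers who want to test it outside Sheets. The sample cell text below is hypothetical (the original data pattern is not shown in the question), the function name split_keep_marker is made up, and the sketch assumes the sentinel character "~" never occurs in the data.

```python
import re


def split_keep_marker(cell, marker="[Stock]", sentinel="~"):
    """Split `cell` after each marker, keeping the marker in every piece."""
    # Append the sentinel so the last marker also has a character after it,
    # then swap the single character following each marker for the sentinel.
    pattern = "(" + re.escape(marker) + ")."
    marked = re.sub(
        pattern,
        lambda m: m.group(1) + sentinel,
        cell + sentinel,
        flags=re.DOTALL,
    )
    # Drop empty pieces, mirroring SPLIT(..., 1, 1) plus TRIM.
    return [piece.strip() for piece in marked.split(sentinel) if piece.strip()]


if __name__ == "__main__":
    # Hypothetical cell content, for illustration only.
    print(split_keep_marker("Apples [Stock], Pears [Stock], Plums [Stock]"))
```

Appending the sentinel before the replacement is what prevents a stray empty piece at the end, which is the same reason the formulas concatenate "~" to D8:D.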
If the comma is used only in the separator:
=ARRAYFORMULA(SPLIT(D8:D,", ",FALSE))
If the comma also appears inside each string ([Stock] will be removed):
=ARRAYFORMULA(SPLIT(D8:D," [Stock], ",FALSE))
If the comma also appears inside each string ([Stock] will be kept):
=ArrayFormula(SPLIT(REGEXREPLACE(M9:M11,"(\[Stock\]), ","$1♦"),"♦"))
Use:
=INDEX(TRIM(IFNA(SPLIT(D8:D; ","))))

Excel- Extract Number from Cell

I have multiple cells that I am attempting to extract a number from, and need help finding a regex alternative.
The cells range in the following formats:
asdfs. Seat#29 asfddsa
asdfsa. Seat#5d
asdfasN/A . Seat#22 as789fsd
Seat#111 words33
The closest that I came to a solution is:
=IFERROR(TRIM(MID([#DisplayName],FIND("#",[#DisplayName])+1,3)),"")
As you can see, this extracts most of the numbers, but for some it leaves a character at the end.
The only commonality is the # preceding the seat number. I am trying to extract only the seat number, no other numbers.
I cannot use VBA, this must be done using formulas. I have figured this out once before but stupidly pasted over the formulas with a values only paste.
This can be done utilizing a flash fill, but I was hoping for a more stable formula.
If you want just the numbers then use:
=--MID(A1,FIND("#",A1)+1,AGGREGATE(15,6,ROW(1:5)/(ISERROR(--MID(REPLACE(A1,1,FIND("#",A1),""),ROW(1:5),1))),1)-1)
If you want the letter also then:
=MID(A1,FIND("#",A1)+1,FIND(" ",REPLACE(A1,1,FIND("#",A1),""))-1)
If you do not need the letter following the seat number, you can use
.*#(\d+)
Edit for clarity: Excel does not have regex functions built in. You will either have to use a UDF (I can help with that if you'd like) or use a non-regex solution.
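To illustrate what that pattern captures, here is a minimal Python sketch run against the sample cells from the question. It is not an Excel solution, and the function name seat_number is made up; the leading .* is dropped because search already scans the whole string.

```python
import re

# The capture group holds the digits that directly follow the "#".
SEAT = re.compile(r"#(\d+)")


def seat_number(text):
    """Return the seat number after '#' as an int, or None if there is none."""
    match = SEAT.search(text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    cells = [
        "asdfs. Seat#29 asfddsa",
        "asdfsa. Seat#5d",
        "asdfasN/A . Seat#22 as789fsd",
        "Seat#111 words33",
    ]
    for cell in cells:
        print(cell, "->", seat_number(cell))
```

Because the match is tied to the # sign, other numbers in the cell (such as 789 or 33) are ignored.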
Here is a solution without VBA to extract all numbers inside the strings.
https://drive.google.com/open?id=1Fk6VFznD3i8s6scADy_vXCEj-1zQpBPW
Sheet #3

Check if cell contains numbers in Google Spreadsheet using RegExMatch

I want to check whether a specific cell contains only numbers.
I know I should use RegExMatch but I get an error.
This is what I wrote : =if(RegExMatch(H2,[0-9]),"a","b")
I want it to say : write 'a' if H2 contains only numbers, 'b' otherwise.
Thank you
Try this:
=IF(ISNUMBER(H2),"A","B")
or
=if(isna(REGEXEXTRACT(text(H2,"#"),"\d+")),"b","a")
Another reason your match isn't working is that the regex functions only operate on text, and H2 holds a number. The ISNUMBER function is a bit more consistent, but if you really need to use regex, you can see in the second formula that I make sure the source value is converted to a string before matching.
Your formula is nearly right; you simply forgot the double quotes around the regular_expression argument of the REGEXMATCH function.
This is the right formula: =if(RegExMatch(B20,"[0-9]"),"a","b")
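One caveat worth checking: the unanchored pattern [0-9] matches any cell that contains at least one digit, while the question asks for cells that contain only digits. Anchoring the pattern, e.g. "^[0-9]+$", should express the stricter test. The Python sketch below shows the difference; the function names are made up, and str() stands in for the text conversion that Sheets would need for numeric cells.

```python
import re


def contains_digit(value):
    """True if the text form of `value` has at least one digit anywhere."""
    return re.search(r"[0-9]", str(value)) is not None


def only_digits(value):
    """True if the text form of `value` consists of digits and nothing else."""
    return re.fullmatch(r"[0-9]+", str(value)) is not None


if __name__ == "__main__":
    for cell in ["123", "abc123", "abc", 456]:
        print(repr(cell), contains_digit(cell), only_digits(cell))
```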
=REGEXREPLACE("text","regex","replacement")
It returns the entire content, with the parts matched by the regular expression replaced. For example,
=REGEXREPLACE(A2,"[0-9]","a")
will fill a cell with the same text as A2, but with each digit 0-9 becoming an a.
=REGEXREPLACE(A2,"[^0-9]","b")
does the opposite and turns every non-digit into a b. Inside a character class, negation is written with a leading ^, not with !.

Automatically finding numbering patterns in filenames

Intro
I work in a facility where we have microscopes. These guys can be asked to generate 4D movies of a sample: they take e.g. 10 pictures at different Z positions, then wait a certain amount of time (the next time point) and take 10 slices again.
They can be asked to save a file for each slice, and they use an explicit naming pattern, something like 2009-11-03-experiment1-Z07-T42.tif. The file names are numbered to reflect the Z position and the time point.
Question
Once you have all these file names, you can use a regex pattern to extract the Z and T value, if you know the backbone pattern of the file name. This I know how to do.
The question I have is: do you know a way to automatically generate a regex pattern from the file name list? For instance, there is an awesome tool on the net that does a similar thing: txt2re.
What algorithm would you use to parse all the file name list and generate a most likely regex pattern?
There is a Perl module called String::Diff which has the ability to generate a regular expression for two different strings. The example it gives is
use String::Diff;
my $diff = String::Diff::diff_regexp('this is Perl', 'this is Ruby');
print "$diff\n";
outputs:
this\ is\ (?:Perl|Ruby)
Maybe you could feed pairs of filenames into this kind of thing to get an initial regex. However, this wouldn't give you capturing of numbers etc. so it wouldn't be completely automatic. After getting the diff you would have to hand-edit or do some kind of substitution to get a working final regex.
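One possible form of that substitution step, sketched in Python rather than Perl: treat every run of digits as a variable field and everything else as literal text. This is not what String::Diff does; it is a simpler stand-in that assumes the only varying parts of the filenames are digit runs. The function names generalise and common_pattern are made up for illustration.

```python
import re


def generalise(filename):
    """Turn one filename into a regex with a capture group per digit run."""
    # re.split with a capturing group alternates literal text (even indices)
    # and digit runs (odd indices).
    parts = re.split(r"(\d+)", filename)
    return "".join(
        r"(\d+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )


def common_pattern(filenames):
    """Return the shared regex if every filename generalises identically."""
    patterns = {generalise(name) for name in filenames}
    return patterns.pop() if len(patterns) == 1 else None


if __name__ == "__main__":
    names = [
        "2009-11-03-experiment1-Z07-T42.tif",
        "2009-11-03-experiment1-Z08-T42.tif",
    ]
    pattern = common_pattern(names)
    print(pattern)
    print(re.fullmatch(pattern, names[0]).groups())
```

If common_pattern returns None, the filenames do not share a single backbone and would need to be grouped or inspected by hand first.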
First of all, you are trying to do this the hard way. I suspect that it is not impossible, but you would have to apply some artificial intelligence techniques, and it would be far more complicated than it is worth. Either neural networks or a genetic algorithm system could be trained to recognize the Z numbers and T numbers, assuming that the format of Z[0-9]+ and T[0-9]+ is always used somewhere in the filename.
What I would do with this problem is to write a Python script to process all of the filenames. In this script, I would match twice against the filename, one time looking for Z[0-9]+ and one time looking for T[0-9]+. Each time I would count the matches for Z-numbers and T-numbers.
I would keep four other counters with running totals, two for Z-numbers and two for T-numbers. Each pair would represent the count of filenames with 1 match, and the ones with multiple matches. And I would count the total number of filenames processed.
At the end, I would report as follows:
nnnnnnnnnn filenames processed
Z-numbers matched only once in nnnnnnnnnn filenames.
Z-numbers matched multiple times in nnnnnn filenames.
T-numbers matched only once in nnnnnnnnnn filenames.
T-numbers matched multiple times in nnnnnn filenames.
If you are lucky, there will be no multiple matches at all, and you can use the regexes above to extract your numbers. However, if there is a significant number of multiple matches, you can run the script again with some print statements to show you example filenames that provoke a multiple match. This would tell you whether or not a simple adjustment to the regex might work.
For instance, if you have 23,768 multiple matches on T-numbers, then make the script print every 500th filename with multiple matches, which would give you 47 samples to examine.
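Probably something like [ /.=-]T[0-9]+[ /.=-] would be enough to get the multiple matches down to zero, while still giving a one-time match for every filename. Or at worst, [0-9][ /.=-]T[0-9]+[ /.=-]. The hyphen goes last in each character class so that it is read literally; written as [ -/.=], the " -/" part would be a range running from space to slash.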
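The counting procedure described above could look roughly like this in Python. It is a minimal sketch, not a finished tool: the function name tally_matches and the counter keys are made up, the second filename in the demo is a hypothetical example of a multiple match, and reading the real file list (e.g. from a directory) is left out.

```python
import re
from collections import Counter

Z_PATTERN = re.compile(r"Z[0-9]+")
T_PATTERN = re.compile(r"T[0-9]+")


def tally_matches(filenames):
    """Count filenames with exactly one vs. several Z and T matches."""
    counts = Counter()
    for name in filenames:
        counts["total"] += 1
        for label, pattern in (("Z", Z_PATTERN), ("T", T_PATTERN)):
            hits = len(pattern.findall(name))
            if hits == 1:
                counts[label + "_once"] += 1
            elif hits > 1:
                counts[label + "_multiple"] += 1
    return counts


if __name__ == "__main__":
    names = [
        "2009-11-03-experiment1-Z07-T42.tif",
        "2009-11-03-T1000-run-Z07-T43.tif",  # hypothetical multiple match
    ]
    counts = tally_matches(names)
    print(counts["total"], "filenames processed")
    print("Z-numbers matched only once in", counts["Z_once"], "filenames.")
    print("Z-numbers matched multiple times in", counts["Z_multiple"], "filenames.")
    print("T-numbers matched only once in", counts["T_once"], "filenames.")
    print("T-numbers matched multiple times in", counts["T_multiple"], "filenames.")
```

Swapping in a stricter pattern for Z_PATTERN or T_PATTERN and rerunning shows quickly whether the multiple matches go away.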
For Python, see this question about TemplateMaker.