Simplify points in KML using regex

I am trying to cut down the file size of a KML file I have.
The coordinates for the polygons are this precise:
-113.52106535153605,53.912817815321503,0.
I am not very good with regex, but I think it would be possible to write one that selects the eight characters before each comma. I'd run a search and replace so the result would be:
-113.521065,53.9128178,0.
Any regex experts out there think this is possible?

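Try this:
\d{8}(?=,)
and replace the matches with an empty string. The lookahead (?=,) requires a comma after the eight digits without making it part of the match, so the comma is kept.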

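Here is something that might work. It replaces eight characters and the comma with a comma: s/(.{8}),/,/g;
echo "-113.52106535153605,53.912817815321503,0." | sed 's/.\{8\},/,/g'
So you can run sed on the file you have like this:
sed 's/.\{8\},/,/g' file.kml > newfile.kml
The g flag is needed so that every comma on a line is handled, not just the first one. Note that . matches any character, not only digits, so any other comma in the file also loses the eight characters before it.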
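A variant restricted to digits leaves non-numeric text alone. Since plain sed has no lookahead, this sketch consumes the comma and writes it back; -E (extended regex syntax) is accepted by both GNU and BSD sed. The input is the sample line from the question:

```shell
# [0-9]{8}, only matches eight digits followed by a comma, so commas that do
# not follow eight digits are left untouched. The comma is re-inserted by the
# replacement, and g processes every match on the line.
printf '%s\n' '-113.52106535153605,53.912817815321503,0.' |
  sed -E 's/[0-9]{8},/,/g'
# prints: -113.521065,53.9128178,0.
```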

I just had to do the same thing. This uses Perl instead of sed: it looks for a run of eight uninterrupted digits and then removes any further digits in that run, so no number keeps more than eight consecutive digits. It worked great.
perl -pe 's/(?<=\d{8})\d*//g' originalfile.kml > shortenedfile.kml

Related

Select a single character in an alphanumeric string in bash

I have an issue with string manipulation in bash. I have a list of names, each composed of two parts, letters and numbers, for example:
abcdef01234
I want to extract the last character before the numeric part starts, in this case
f
I think a regular expression could help with this, but I just can't figure it out. AWK/sed solutions are welcome too. I hope someone can help.
Thank you.
In bash it can be done with parameter expansion with substring removal and string indexes, e.g.,
a=abcdef01234 # your string
tmp=${a%%[0-9]*} # remove the longest suffix that starts with a digit
echo "${tmp: -1}" # output the last remaining character
Output: f
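The same parameter expansions can be applied to a whole list of names. A minimal sketch; the function name last_letter_before_digits and the sample names are hypothetical, and a real list would be read with done < yourfile instead of the here-string:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last character before the first digit.
last_letter_before_digits() {
  local name=$1
  local prefix=${name%%[0-9]*}   # drop the longest suffix that starts with a digit
  printf '%s\n' "${prefix: -1}"  # last character left; the space before -1 is required
}

# Made-up sample names, one per line.
names='abcdef01234
xyz9
q42'

while IFS= read -r name; do
  last_letter_before_digits "$name"
done <<< "$names"
# prints f, z and q, one per line
```

The space in ${prefix: -1} matters: without it, ${prefix:-1} is the "use a default value" expansion and would print the whole prefix.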
You can use a regexp like [a-zA-Z]+([a-zA-Z])[0-9]+. If you know how to use sed, it is pretty easy.
Check https://regex101.com/r/XCkKM5/1
The match will be the letter you want.
^\w+([a-zA-Z])\d+$
As a sed command (on OS X), this is:
echo "abcdef12345" | sed -E "s#^[a-zA-Z]+([a-zA-Z])[0-9]+\$#\1#"
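You could also try the following:
echo "abcdef01234" | awk '{match($0,/[a-zA-Z]+/); print substr($0,RSTART+RLENGTH-1,1)}'
RSTART+RLENGTH-1 is the position of the last letter of the match, which stays correct even if the match does not start at the first column.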
"I have a list of names": I assume the names are in a file called file. Using GNU grep's PCRE mode (-P) and a positive lookahead:
$ grep -oP "[a-z](?=[^a-z])" file
f
It prints out the first (lowercase) letter followed by a non-(lowercase)-letter.

AWK/SED Replace both ^(beginning) and $(end) of a string in a single command

I've been looking around but couldn't find a way to do it with either AWK or SED.
I was wondering if there's a way to replace a string's start and end in a single command.
More specifically, there's a file with a lot of words in it, and I would like to add something before the word and after the word.
Thanks,
Roy
Since you said: "more specifically, there's a file with a lot of words in it, and I would like to add something before the word and after the word."
All you need is $&, which holds the match itself, so you can simply write whatever you want just before and after it. That's it.
For example, say you have this file:
this is line 1.
this is line 2.
this is line 3.
Tested with Perl:
perl -lne 'print "beginning->", $&, "<-end" if /.+/g' file
The output is:
beginning->this is line 1.<-end
beginning->this is line 2.<-end
beginning->this is line 3.<-end
If you want to match only one word, the same idea still works, for example:
perl -lne 'use English; print "$PREMATCH", "[$MATCH]","$POSTMATCH" if /line/g' file
Here I matched line and wrapped it as [ then $& then ]. $PREMATCH, $MATCH and $POSTMATCH come from the English module and are aliases for $`, $& and $'.
The output:
this is [line] 1.
this is [line] 2.
this is [line] 3.
NOTE
As you can see, all you need are the prematch, the match and the postmatch. Since you gave no specific example, I tested with Perl; you can use it if you are interested in Perl, or apply the same idea in sed or awk.
If you want to wrap a particular word with markers, you can use & in the replacement string, which stands for the whole match.
For example, to put square brackets around every occurrence of the word bird (\b, the word boundary, is a GNU sed extension):
$ echo "hello bird, are you really a bird?" | sed "s/\bbird\b/[&]/g"
hello [bird], are you really a [bird]?
To replace a string's start and end in a single command, let's say we have a test file with the line:
tag hello, world tag
To enclose each tag word with angle brackets < ... > we can apply:
An awk approach with the gsub() function:
awk '{ gsub(/\<tag\>/, "<&>"); print}' test_file
The word-boundary operators \< and \> are GNU awk extensions and may not be supported by other awk implementations.
A sed approach:
sed 's/\btag\b/<&>/g' test_file
The output (for both approaches):
<tag> hello, world <tag>
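To answer the title question literally (replacing both ^ and $ in one command), a single sed invocation can hold two substitutions. A minimal sketch using the sample lines from the Perl answer, with the same idea in awk and a per-word variant:

```shell
# Wrap each whole line: one sed invocation with two expressions, one anchored
# at ^ (start of line) and one at $ (end of line).
printf 'this is line 1.\nthis is line 2.\n' | sed 's/^/beginning->/; s/$/<-end/'

# The same with awk: print the prefix, the whole record ($0) and the suffix.
printf 'this is line 1.\nthis is line 2.\n' | awk '{ print "beginning->" $0 "<-end" }'

# Wrap every whitespace-separated word instead; & stands for the matched word.
printf 'this is line 1.\n' | sed -E 's/[^[:space:]]+/[&]/g'
# prints: [this] [is] [line] [1.]
```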

How to replace arbitrary combinations of (special) characters and numbers using sed and regular expressions

I have a CSV file with nearly arbitrarily filled columns like this:
"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03,123456789,"bla::38594f-47849-h945f",""
and now I want to replace the comma between the two numbers with a point:
"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03.123456789,"bla::38594f-47849-h945f",""
I tried a lot but nothing helped. :-(
sed s/[0-9],[0-9]/./g data.csv
runs, but it also deletes the digit on each side of the comma. So I tried things like
sed s/\(\.[0-9]\),\([0-9]\.\)/\1.\2/g data.csv
but that changed nothing.
Try with s/\([0-9]\),\([0-9]\)/\1.\2/g:
$ echo '"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03,123456789,"bla::38594f-47849-h945f",""' | sed 's/\([0-9]\),\([0-9]\)/\1.\2/g'
"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03.123456789,"bla::38594f-47849-h945f",""
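You don't really need the additional dot \. in the capturing groups. In fact, that is why your second attempt changed nothing: the pattern requires a literal dot next to the digit on each side of the comma, and the input has none there. Also quote the sed script; unquoted, the shell removes the backslashes before sed sees them.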
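The pattern above changes every digit,digit comma on the line. If other fields could also contain such commas, a narrower sketch can anchor on the time stamp. This assumes the comma to replace always follows a two-digit hh:mm:ss time, which holds for the sample line but is not stated in the question:

```shell
# Sample line from the question.
line='"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03,123456789,"bla::38594f-47849-h945f",""'

# Only the comma that directly follows an hh:mm:ss time is replaced, so other
# digit,digit commas elsewhere in the line stay as they are.
printf '%s\n' "$line" | sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9])/\1.\2/'
```

For the real file, run sed -E 's/([0-9]{2}:[0-9]{2}:[0-9]{2}),([0-9])/\1.\2/' data.csv.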

How can I use sed to regex string and number in bash script

I want to separate the text from the numbers in a file to get a specific number in a bash script. For example, from this line:
Branches executed:75.38% of 1190
I only want to get the number 75.38. I have tried the code below:
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it was incorrect and it failed.
How should it work? Thanks in advance for your help.
You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38
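Give this a try:
value=$(sed -n "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/p" afile)
It is assumed that the line appears only once in afile. The -n option together with the p flag makes sed print only the line it changed, instead of every line of the file.
The value is stored in the value variable.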
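There are several things here that we could improve. One is that in sed's default (basic) syntax you need to escape the parentheses: \(...\). With -r, as in your attempt, plain parentheses are fine, but your pattern .*_ also requires an underscore, and the input has none.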
Another one is that it would be good to have a full specification of the input strings as well as a good script that can help us to play with this.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it's easier to play with:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g')
echo "$new_value"
Update 2: as john pointed out, it only matches numbers that contain a decimal dot. We can fix that with an optional group: \(\.[0-9]\+\)\?.
An explanation for the optional group:
\(...\) is a group.
\(...\)\? is a group that appears zero or one times (mind the escaped question mark; in basic syntax, \? and \+ are GNU sed extensions).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting it all together:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g')
echo "$new_value"
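As an alternative that needs no external command, bash's own [[ =~ ]] operator can capture the number into the BASH_REMATCH array. A minimal sketch, assuming the number you want is the first one followed by a % sign:

```shell
#!/usr/bin/env bash
# Sample line from the question.
value='Branches executed:75.38% of 1190'

# Keep the pattern in a variable so bash does not misparse it inside [[ ]].
# It captures one or more digits plus an optional ".digits" part before '%'.
re='([0-9]+(\.[0-9]+)?)%'

if [[ $value =~ $re ]]; then
  new_value=${BASH_REMATCH[1]}   # first capture group: the whole number
  echo "$new_value"
fi
```

The variable re must be left unquoted on the right of =~; quoting it would make bash match it as a literal string.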

Regex command line change format of each line

I have a file that contains lines in a format similar to this...
/data/file.geojson?10,20,30,40
/data/file.geojson?bbox=-5.20751953125,49.05227025601607,3.0322265625,56.46249048388979
/data/file.geojson?bbox=-21.46728515625,45.99696161820381,19.2919921875,58.88194208135912
/data/file.geojson?bbox=-2.8482055664062496,54.38935426009769,-0.300750732421875,55.158473983815306
/data/file.geojson?bbox=-21.46728515625,45.99696161820381,19.2919921875,58.88194208135912
/data/file.geojson?bbox=-21.46728515625,45.99696161820381,19.2919921875,58.88194208135912
I've tried a combination of grep, sed, gawk, and pipes (|) to pattern-match and then change the format to be more like this...
[10,40],[30,40],[30,20],[10,20],
[-5.20751953125,56.46249048388979],[3.0322265625,56.46249048388979].....
Hopefully you get the idea from the first line so I don't have to type out all the examples manually!
I've got the hang of the regex to match the co-ordinates. In fact, the input file is the result of extracting from Apache access logs. It might be easier to read and understand answers if they just match positive integers; I will then be able to slot in a more complicated pattern to match the right range.
To arrange the results the way you wish, it is important to be able to access the last four values on each line.
No pattern matching is required if you use awk. You can split the input strings by a set of delimiters and reassemble the resulting fields. 40 can be accessed as $(NF), 30 as $(NF-1) and so on.
awk -F'[?,=]' '
{printf "[%s,%s],[%s,%s],[%s,%s],[%s,%s]\n",
$(NF-3),$(NF),$(NF-1),$(NF),
$(NF-1),$(NF-2),$(NF-3),$(NF-2)
}' file
I'm using ?, , (comma) or = as the field delimiters. This makes it simple to access the columns of interest.
Output:
[10,40],[30,40],[30,20],[10,20]
[-5.20751953125,56.46249048388979],[3.0322265625,56.46249048388979],[3.0322265625,49.05227025601607],[-5.20751953125,49.05227025601607]
[-21.46728515625,58.88194208135912],[19.2919921875,58.88194208135912],[19.2919921875,45.99696161820381],[-21.46728515625,45.99696161820381]
[-2.8482055664062496,55.158473983815306],[-0.300750732421875,55.158473983815306],[-0.300750732421875,54.38935426009769],[-2.8482055664062496,54.38935426009769]
[-21.46728515625,58.88194208135912],[19.2919921875,58.88194208135912],[19.2919921875,45.99696161820381],[-21.46728515625,45.99696161820381]
[-21.46728515625,58.88194208135912],[19.2919921875,58.88194208135912],[19.2919921875,45.99696161820381],[-21.46728515625,45.99696161820381]
By the way, sed can also be used here:
sed -r 's/.*[?=]([^,]+),([^,]+),([^,]+),(.*)/[\1,\4],[\3,\4],[\3,\2],[\1,\2]/' file
The command captures each of the four numbers at the end in a separate capturing group and reassembles them in the replacement part.
Not all versions of sed support the + quantifier. The most compatible version would look like this :)
sed 's/.*[?=]\([^,]\{1,\}\),\([^,]\{1,\}\),\([^,]\{1,\}\),\(.*\)/[\1,\4],[\3,\4],[\3,\2],[\1,\2]/' file
sed strips everything before the first number, then awk splits on commas and prints the fields in a different order. This assumes the data is in a file called td.txt:
sed 's/^[^0-9-]*//' td.txt|awk -F, '{print "["$1","$4"],["$3","$4"],["$3","$2"],["$1","$2"],"}'
This might work for you (GNU sed):
sed -r 's/^.*\?[^-0-9]*([^,]*),([^,]*),([^,]*),([^,]*)/[\1,\4],[\3,\4],[\3,\2],[\1,\2]/' file
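Or in basic regex syntax, with more toothpicks (backslashes). Here the question mark before the numbers stays unescaped, because in GNU sed's basic syntax \? is the zero-or-one quantifier rather than a literal ?:
sed 's/^.*?[^-0-9]*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/[\1,\4],[\3,\4],[\3,\2],[\1,\2]/' file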
You can use the following to match:
(\/data\/file\.geojson\?(?:bbox=)?)([0-9.-]+),([0-9.-]+),([0-9.-]+),([0-9.-]+)
And replace it with the following (group 1 is only the prefix, so it is dropped):
[$2,$5],[$4,$5],[$4,$3],[$2,$3]