I am trying to reduce the size of a GeoJSON file so that visitors to my website can load the maps on the page quickly.
You can find more information about the GeoJSON format here: http://geojson.org/
I read a blog post suggesting that I reduce the number of digits after the decimal point in a GeoJSON file using Notepad++.
I can find answers for removing all decimal places from a number, but I want to preserve the first 5 decimal places and remove the rest.
EG: -103.3751447563353
After replacing: -103.37514
Edit:
I tried the answers, but Notepad++ says "can't find the text". I have made sure the regular expression checkbox is checked, but still no luck.
This will save more than 10 characters for each latitude or longitude coordinate.
Please share your answers
(?<=\d\.\d{5})\d+
(?<=\d\.\d{5}) Positive lookbehind ensuring what precedes is a digit, dot, and then 5 digits
\d+ Matches one or more digits (this is what will be replaced)
Replace with nothing
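As a rough sketch, the same lookbehind can be tried outside Notepad++ with Python's re module. The helper name trim_decimals and the sample coordinates are only illustrative. Note that this truncates the extra digits; it does not round.

```python
import re

# Fixed-width lookbehind: a digit, a dot, then exactly five digits
# must come right before the digits that get removed.
EXTRA_DECIMALS = re.compile(r"(?<=\d\.\d{5})\d+")


def trim_decimals(text: str) -> str:
    """Drop every digit after the fifth decimal place (truncate, not round)."""
    return EXTRA_DECIMALS.sub("", text)


print(trim_decimals("-103.3751447563353"))                # -103.37514
print(trim_decimals("[-103.3751447563353, 20.6596988]"))  # [-103.37514, 20.65969]
```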
Another alternative, using \K:
\d+\.\d{5}\K\d+
\d+ Match one or more digits
\. Match the dot character literally
\d{5} Match any digit exactly 5 times
\K Resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
\d+ Matches one or more digits (this is what will be replaced)
Replace with nothing
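You could use the following regex: (\d+\.\d{5})\d*
\d+ matches one or more digits.
\. matches the literal character .
\d{5} matches exactly 5 digits (the ones to keep).
\d* matches the remaining digits, if any.
Then replace with $1, the captured group, which keeps everything up to the fifth decimal place.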
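The capture-group approach can be sketched in Python as well, assuming a made-up GeoJSON fragment. Python's re.sub writes the backreference as \1 rather than $1. Numbers with fewer than five decimal places are left alone because \d{5} cannot match them.

```python
import re

# Hypothetical GeoJSON fragment used only for illustration.
coords = '{"type": "Point", "coordinates": [-103.3751447563353, 20.65, 7]}'

# Keep the integer part, the dot and five decimals (group 1); drop the rest.
trimmed = re.sub(r"(\d+\.\d{5})\d*", r"\1", coords)

print(trimmed)
# {"type": "Point", "coordinates": [-103.37514, 20.65, 7]}
```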
I have a URL formatted as follows: https://www.mywebsite.com/subdomain/123456789.htm. I know that the page number always has exactly 9 or 10 digits. I would like to extract this number using a regex.
The regex I use to perform this operation is:
^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$
The problem is that when the number is 10 digits long, I get a match, which is good, but only the last 9 digits are captured. For example, https://www.mywebsite.com/subdomain/1234567890.htm captures 234567890 only.
I could easily create two regexes (one with 9 digits and one with 10) and take the longest number if both match, but is there a more elegant way to solve this problem with a single regex?
EDIT
Following remarks which have been made below, there is actually a mistake in my original Regex : the first character group matches the first digit of the 10, and leaves only the 9 others for the capturing group. I've added a screenshot below. Adding a forward slash to the Regex before the capturing group solved the issue, thanks!
As per @TheFourthBird, you are missing a match on the forward slash. A slightly different approach to yours would be a non-capturing group:
^https://www.mywebsite.com/(?:[^/]+/)+(\d{9,10}).htm$
The character class [A-Za-z0-9_.-~/]+ is greedy and first matches all the characters that follow, up to the end of the line.
The engine then backtracks one character at a time until ([0-9]{9,10}). can match. That first succeeds once 9 digits are available, so only 9 digits end up in the capturing group.
Note that you should either escape the hyphen \- or place it at the start or end of the character class, or else it could possibly be read as a range (here .-~ is a range that also covers digits and /).
One option is to use a word boundary \b before matching the digits
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+\b([0-9]{9,10})\.htm$
Another way could be matching the / right before the digits.
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$
If there can also be chars a-zA-Z or an underscore before the digits and a lookbehind is supported, you could also assert that there is not a digit before the group, using (?<!\d)
^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$
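A small Python sketch can reproduce the behaviour described above. The variable names are illustrative, and the URL is the example from the question. It compares the original pattern with two of the suggested fixes.

```python
import re

url = "https://www.mywebsite.com/subdomain/1234567890.htm"

# Original pattern from the question: ".-~" is read as a range that
# includes digits and "/", so the class can swallow the first digit.
original = re.compile(
    r"^https://www.mywebsite.com/[A-Za-z0-9_.-~/]+([0-9]{9,10}).htm$"
)
# Fix 1: require a "/" right before the digits.
with_slash = re.compile(
    r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+/([0-9]{9,10})\.htm$"
)
# Fix 2: assert that no digit comes right before the captured digits.
no_digit_before = re.compile(
    r"^https://www\.mywebsite\.com/[A-Za-z0-9_.~/-]+(?<!\d)([0-9]{9,10})\.htm$"
)

print(original.match(url).group(1))         # 234567890
print(with_slash.match(url).group(1))       # 1234567890
print(no_digit_before.match(url).group(1))  # 1234567890
```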
One more approach. This captures the run of digits immediately before .htm
(\d+)(?=\.htm)
Newbie question, but how can I check for instances where there are multiple numbers on the same line? For instance, the content reads: contact 408-555-5454 or reach out to 408-555-4545. Right now the best I can do is ^4, but that only catches multiple numbers if the multiline flag is turned on, and only at the start of a line. Any ideas?
You could try the regex below
/4\d{2}(-| )?\d{3}(-| )?\d{4}/g
This of course assumes that you're looking for numbers that start with 4. You can experiment with different variations of the regex to suit your needs.
Here's a key to the regex elements included:
1. 4 = matches the literal number 4
2. \d{2} = matches 2 digits (0-9).
3. (-| )? = matches either a hyphen or single space but makes it not required, i.e. you can have a space or hyphen or not.
4. \d{3} = matches 3 digits (0-9)
5. Same as #3 above
6. \d{4} = matches 4 digits (0-9)
The g flag will ensure that you're searching through the whole text and not stopping after the first match.
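As a hedged illustration in Python, re.findall plays the role of the g flag and returns every match on the line. The sketch below swaps (-| )? for the equivalent character class [- ]?, because findall would otherwise return the captured separators instead of the whole number.

```python
import re

# Same idea as the pattern above, with [- ]? in place of (-| )?
# so that findall returns full matches rather than group contents.
PHONE = re.compile(r"4\d{2}[- ]?\d{3}[- ]?\d{4}")

text = "contact 408-555-5454 or reach out to 408-555-4545"
numbers = PHONE.findall(text)

print(numbers)  # ['408-555-5454', '408-555-4545']
```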
If you like the answer please Accept it :)
I am looking for help here. I want to write a regex that finds a number of exactly 7 digits in a string, no more and no less.
For instance in this string:
1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder
It should return only 1234567
In this one:
12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder
It should return none.
Can you help me with this one.
thanks in advance.
The proper regex should include \d{7} (7 digits) and 2 "border criteria",
for both start and end of the match, to block matching of a fragment
from longer sequence of digits.
My first thought was that neither before nor after the match there can be any digit.
But as I see from your example, these border criteria should be extended.
The set of "forbidden" chars (either before or after the match) should
include also - and letters.
E.g. 2744870 in your example data contains just 7 digits (no more, no less),
but you still don't want it to be matched, apparently because they are surrounded with - chars.
To keep the regex short, I propose:
(?<![\w-])\d{7}(?![\w-])
Details:
(?<![\w-]) - Negative lookbehind for word char or -.
\d{7} - 7 digits.
(?![\w-]) - Negative lookahead for word char or -.
If you decide to extend the set of "forbidden" chars in both border criteria,
just add them to [...] fragments in lookbehind / lookahead (but - char
should remain at the end, otherwise it must be quoted with \).
A regex like (\d{7})[^\d] (proposed in another answer) is wrong,
as it matches the last 7 digits of any longer sequence of digits
(it has no "front border criterion").
It also matches 2744870 in both examples (surrounded with - chars), which is not
to be matched.
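The difference between the two patterns can be checked with a short Python sketch. The names EXACT7 and LOOSE are only illustrative, and the two strings are the examples from the question.

```python
import re

# Seven digits with no word character or "-" on either side.
EXACT7 = re.compile(r"(?<![\w-])\d{7}(?![\w-])")
# The other proposal: seven digits followed by a non-digit.
LOOSE = re.compile(r"(\d{7})[^\d]")

good = "1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder"
bad = "12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder"

print(EXACT7.findall(good))  # ['1234567']
print(EXACT7.findall(bad))   # []
print(LOOSE.findall(bad))    # ['2345678', '2744870']
```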
This one should do for your examples:
(\d{7})[^\d]
The first matching group contains the seven digits.
Alternatively, as suggested in the comments, you can use a negative lookahead to match only the seven digits at the start of the string, without needing a capturing group:
^\d{7}(?!\d)
I've managed to put together a syntax (.tmLanguage) file for use in Sublime Text 2. I'd quite like to highlight numerals. I tried:
<string>0|1|2|3|4|5|6|7|8|9</string>
which works, but only for single digits, so I thought the regex would be
<string>[0-9]</string>
But that doesn't work. Can someone please help me with the correct syntax in Sublime?
If you change your code to:
<string>\d+</string>
It should find all integers.
\d matches any digit (0-9)
+ is a quantifier meaning "one or more of the previous element"
In your case, at least one digit, but as many as possible. Might I suggest:
<string>\d+(\.\d+)?</string>
as that will find decimal numbers as well.
\d matches any digit (0-9)
+ is a quantifier meaning "one or more of the previous element"
( Starts a group
\. An escaped period, to match a literal period character
\d+ One or more digits
) End of the group
? Makes the entire group optional.
That should capture both integers and decimal numbers.
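Outside Sublime, the pattern can be sanity-checked with Python's re module. This is only a sketch: Sublime Text uses a different regex engine, but this particular pattern uses syntax both share. The helper name find_numbers is made up.

```python
import re

# One or more digits, optionally followed by a dot and more digits.
NUMBER = re.compile(r"\d+(\.\d+)?")


def find_numbers(text: str) -> list[str]:
    """Return every integer or decimal number found in text."""
    # finditer + group(0) yields whole matches; findall would return
    # only the contents of the optional group.
    return [m.group(0) for m in NUMBER.finditer(text)]


print(find_numbers("width 3 height 3.14 depth 42."))  # ['3', '3.14', '42']
```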
Is it possible to create a regular expression that matches a comparison such as less than or greater than? For example, match all dollar values less than $500.
One way I would use this would be on online stores that list many products on a single page but do not provide a way to sort by price. I found a search page by regex extension for Chrome and am trying to figure out if there is a way I can use a regex to match any strings on the page beginning with a dollar sign followed by any number less than a number that I specify.
This should work for you \$[1-4]?\d?\d\b.
Explanation:
r"""
\$ # Match the character “$” literally
[1-4] # Match a single character in the range between “1” and “4”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
\b # Assert position at a word boundary
"""
This could do what you need: ^(\$[1-4]?\d?\d)$. This will match any whole-dollar value between $0 and $499.
As mentioned above, if you would like to match even decimal values you could use something like so: ^(\$[1-4]?\d?\d(\.\d{2})?)$. That being said, numeric validation should ideally be done using actual mathematical operations, and not regular expressions.
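To illustrate the point about using arithmetic instead, here is a minimal Python sketch: a simple regex only locates the prices, and the comparison is done numerically. The function name prices_below and the sample text are hypothetical, and the sketch assumes prices have no thousands separators (a price like $1,200 would be misread).

```python
import re
from decimal import Decimal

# Locate "$" followed by digits and optional cents; group 1 is the amount.
PRICE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def prices_below(text: str, limit: str) -> list[str]:
    """Return the prices in text that are strictly below limit."""
    threshold = Decimal(limit)
    return [
        m.group(0)
        for m in PRICE.finditer(text)
        if Decimal(m.group(1)) < threshold
    ]


sample = "Mug $12.50, Laptop $899.99, Chair $499, Desk $500"
print(prices_below(sample, "500"))  # ['$12.50', '$499']
```

Changing the threshold is then a matter of passing a different limit rather than rewriting the pattern.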
Edit: this is overly complicated, but it will also match any value strictly less than 500
\$[1-4]\d{2}(\.\d{2})?$|\$\d{1,2}(\.\d{2})?$
if you need to match $500 as well, add another |\$500(\.00)?$
This matches:
\$ the dollar symbol
[1-4] followed by a digit between 1 and 4
\d{2} followed by exactly 2 digits
(\.\d{2})? optionally --> ()? followed by a dot --> \. and exactly 2 digits
$ followed by end of line (may be replaced with \b for word boundaries)
| or
\$\d{1,2} the dollar symbol followed by one or two digits
(\.\d{2})?$ again optionally followed by cents, followed by end of line
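A quick way to check the alternation is to run it against a few sample prices. The Python sketch below assumes each price is tested as its own string, and the helper name is_under_500 is made up.

```python
import re

# Three-digit amounts starting with 1-4, or one/two-digit amounts,
# each optionally followed by cents and anchored at the end.
UNDER_500 = re.compile(r"\$[1-4]\d{2}(\.\d{2})?$|\$\d{1,2}(\.\d{2})?$")


def is_under_500(price: str) -> bool:
    """True if price looks like a dollar amount strictly below $500."""
    return UNDER_500.search(price) is not None


for price in ["$499.99", "$75", "$0.99", "$500", "$1234"]:
    print(price, is_under_500(price))
```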