Sublime colour syntax highlighting using regex

I've managed to put together a syntax (.tmLanguage) file for use in Sublime Text 2. I'd quite like to highlight numerals. I tried:
<string>0|1|2|3|4|5|6|7|8|9</string>
which works, but only for single digits, so I thought the regex would be
<string>[0-9]</string>
But that doesn't work. Can someone please help me with the correct syntax in Sublime?

If you change your code to:
<string>\d+</string>
It should find all integers.
\d matches any single digit (0-9)
+ is a quantifier meaning "one or more of the preceding element"
In your case, at least one digit, but as many as possible. Might I suggest:
<string>\d+(\.\d+)?</string>
as that will find decimal numbers as well.
\d Matches any single digit (0-9)
+ A quantifier meaning "one or more of the preceding element"
( Starts a group
\. An escaped period, which matches a literal period character
\d+ One or more digits
) Ends the group
? Makes the entire group optional
That should capture both integers and decimal numbers.
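To see how the pattern behaves outside Sublime, here is a minimal Python sketch. It assumes Python's re module treats this particular pattern the same way Sublime's regex engine does, which should hold for something this simple; the helper name find_numbers is only for illustration.

```python
import re

# Same pattern as in the answer; in the .tmLanguage file it goes inside <string>...</string>.
NUMBER = re.compile(r"\d+(\.\d+)?")

def find_numbers(text):
    """Return every integer or decimal number found in text."""
    return [m.group(0) for m in NUMBER.finditer(text)]

print(find_numbers("x = 42 + 3.14 - 7."))
```

Note that the trailing "7." is reported as just "7", because the optional group needs at least one digit after the period.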

Related

RegEx to reduce the decimal places to 5 digits in notepad++

I am trying to reduce the size of a geoJSON file so my website viewers can view the maps in page very quickly.
You can find more information about geoJSON format here http://geojson.org/
I read a blog suggesting to reduce the number of digits after decimal places in a GeoJSON file using notepad ++.
I can find answers for removing all decimal places in a number. But my question is I want to preserve the first 5 decimal places in a number and remove the others.
EG: -103.3751447563353
After replacing: -103.37514
Edit:
I tried the answers but my notepad++ says "can't find the text". I have ensured regular expression checkbox is checked but still no luck
This will save more than 10 characters for each latitude or longitude co-ordinates.
Please share your answers
See regex in use here
(?<=\d\.\d{5})\d+
(?<=\d\.\d{5}) Positive lookbehind ensuring what precedes is a digit, dot, and then 5 digits
\d+ Matches one or more digits (this is what will be replaced)
Replace with nothing
Another alternative. See regex in use here
\d+\.\d{5}\K\d+
\d+ Match one or more digits
\. Match the dot character literally
\d{5} Match any digit exactly 5 times
\K Resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
\d+ Matches one or more digits (this is what will be replaced)
Replace with nothing
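You could use the following regex: (\d+\.\d{5})\d*
\d+ matches one or more digits.
\. matches a literal dot.
\d{5} matches exactly 5 digits; together with the two parts above it forms capture group 1.
\d* matches any remaining digits, which are dropped.
Then use $1 as the replacement to keep only the captured part.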
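If you would rather do this in a script than in Notepad++, the capture-group and lookbehind variants can be sketched in Python as below. This is only an illustration: the function names are made up, and the \K variant is left out because Python's standard re module does not support \K.

```python
import re

# Capture-group variant: keep group 1, drop the trailing digits.
TRIM_GROUP = re.compile(r"(\d+\.\d{5})\d*")
# Lookbehind variant: match only the digits after the fifth decimal place.
TRIM_LOOKBEHIND = re.compile(r"(?<=\d\.\d{5})\d+")

def trim_with_group(text):
    """Truncate decimals to 5 places using the capture group."""
    return TRIM_GROUP.sub(r"\1", text)

def trim_with_lookbehind(text):
    """Truncate decimals to 5 places by deleting the extra digits."""
    return TRIM_LOOKBEHIND.sub("", text)

sample = "[-103.3751447563353, 20.5, 41.12345]"
print(trim_with_group(sample))
print(trim_with_lookbehind(sample))
```

Both variants truncate rather than round, and they leave numbers with five or fewer decimal places untouched.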

Decoding a regex... I know what its function is but I want to understand exactly what is happening

I have a regular expression that I'm going to be using to verify that an inputted number is in standard U.S. telephone format (i.e (###) ###-####). I am new to regex and still having some trouble figuring out the exact function of each character. If someone would go through this piece by piece/verify that I am understanding I would really appreciate it. Also if the regex is wrong I would obviously like to know that.
\D*?(\d\D*?){10}
What I think is happening:
\D*?( indicates an escape sequence for the parenthesis metacharacter... not sure why the \D*? is necessary
\d indicating digits
\D*? indicating there is a non-digit character (-) followed by the closing parenthesis.
{10} for the 10 digits
I feel very unsure explaining this, like my understanding is very vague in terms of why the regex is in the order that it is etc. Thanks in advance for help/explanations.
EDIT
It seems like this is not the best regex for what I want. Another possibility was [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}, but I was told this would fail. I suppose I'll have to do a little more work with regular expressions to figure this out.
\D matches any non-digit character.
* means that the previous character is repeated 0 or more times.
*? is the lazy version: the previous character is repeated 0 or more times, but as few times as possible while still letting the rest of the regex match. It can be a bit confusing at first, but in your regex the next element is \d, so \D*? matches the smallest number of non-digit characters needed to reach the next digit.
( ... ) is a capture group, and is also used to group things. For instance, {10} means that the previous character or group is repeated exactly 10 times.
Now, \D*?(\d\D*?){10} will match exactly 10 digits, optionally preceded by non-digit characters, with non-digit characters in between the digits if they are present.
[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}
This regex is a bit better since it doesn't just accept anything (like the first regex does) and will match the format (###) ###-#### (notice the space is a character in regex!).
The new things introduced here are the square brackets. These represent character classes. [0-9] means any character between 0 and 9 inclusive, which means it will match 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Adding {3} after it makes the character class match 3 times in a row, and since this character class contains only digits, it will match exactly 3 digits.
A character class can be used to escape certain characters, such as ( or ) (note I mentioned earlier they are for capturing groups, or grouping) and thus, [(] and [)] are literal ( and ) instead of being used for capturing/grouping.
You can also use backslashes (\) to escape characters. Thus:
\([0-9]{3}\) [0-9]{3}-[0-9]{4}
Will also work. I would also recommend the use of line anchors ^ and $ if you're only trying to see if a phone number matches the above format. This ensures that the string has only the phone number, and nothing else. ^ matches the beginning of a line and $ matches the end of a line. Thus, the regex will become:
^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$
However, I don't know all the combinations of the different formats of phone numbers in the US, so this regex might need some tweaking if you have different phone number formats.
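As a quick check of the anchored version, here is a small Python sketch. The function name is_us_phone is hypothetical, and it only covers the single (###) ###-#### format discussed here.

```python
import re

# The anchored pattern from above: the whole string must be "(###) ###-####".
US_PHONE = re.compile(r"^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$")

def is_us_phone(text):
    """Return True only if text is exactly in the (###) ###-#### format."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate in Python.
    return US_PHONE.fullmatch(text) is not None

print(is_us_phone("(555) 123-4567"))
print(is_us_phone("555-123-4567"))
```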
\D is "not a digit"; \d is "digit". With that in mind:
This matches zero or more non-digits, then it matches a digit and any number of non-digit characters 10 times. This won't actually verify that the number is formatted properly, just that it contains 10 digits. I suspect that the regex isn't what you want in the first place.
For example, the following will match your regex:
this is some bad text 1 and some more 2 and more 34567890
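You can confirm this with a few lines of Python (a sketch only; the variable names are arbitrary):

```python
import re

# The original loose pattern: ten digits with anything non-digit in between.
LOOSE = re.compile(r"\D*?(\d\D*?){10}")

bad_text = "this is some bad text 1 and some more 2 and more 34567890"
bad_match = LOOSE.match(bad_text)

# The regex accepts the junk text just as happily as a real phone number.
print(bad_match is not None)
print(LOOSE.match("(555) 123-4567") is not None)
# It only fails when there are fewer than 10 digits.
print(LOOSE.match("12345") is not None)
```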
\D matches a character that is not a digit
* repeats the previous item 0 or more times
? after * makes the repetition lazy, so it matches as few characters as possible
\d matches a digit
So your group matches a digit followed by any non-digits, and {10} repeats that 10 times.

What does +(?!\d) in regex mean?

I have also seen it as +$.
I am using
$(this).text( $(this).text().replace(/(\d)(?=(\d{3})+(?!\d))/g, "$1,") );
To convert 10000 into 10,000 etc.
I think I understand everything else:
(\d) - find number
(?=\d{3}) - if followed by 3 numbers
'+' - don't stop after first find
(?!\d) - starting from the last number?
/g - for the whole string
,"$1," - replace number with self and comma
I think you're slightly misreading it:
(?=\d{3}) - if followed by 3 numbers
Note that the regexp is actually:
(?=(\d{3})+
i.e. you've missed an open paren. The entire of the following:
(\d{3})+(?!\d)
is within the (?= ... ), which is a zero-width lookahead assertion—a nice way of saying that the stuff within should follow what we've matched so far, but we don't actually consume it.
The (?!\d) says that a \d (i.e. number) should not follow, so in total:
(\d) find and capture a number.
(?=(\d{3})+(?!\d)) assert that one or more groups of three digits should follow, but they should not have yet another digit following them all.
We replace with "$1,", i.e. the first number captured and a comma.
As a result, we place commas after digits which have multiples of three digits following, which is a nice way to say we add commas as thousand separators!
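The same regex can be tried outside the browser. Below is a minimal Python sketch, with the replacement written as \1, instead of "$1,"; the function name add_commas is just for illustration.

```python
import re

# A digit that is followed by one or more complete groups of three digits
# and then no further digit.
THOUSANDS = re.compile(r"(\d)(?=(\d{3})+(?!\d))")

def add_commas(text):
    """Insert a comma after every digit matched by THOUSANDS."""
    return THOUSANDS.sub(r"\1,", text)

print(add_commas("10000"))
print(add_commas("1234567"))
print(add_commas("999"))
```

One caveat: the regex knows nothing about decimal points, so a long run of digits after a dot gets commas too (0.12345 becomes 0.12,345).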
?! denotes a negative lookahead. It is used to match something that is not followed by something else; in your case, a digit.

Regex to find integers and decimals in string

I have a string like:
$str1 = "12 ounces";
$str2 = "1.5 ounces chopped";
I'd like to get the amount from the string whether it is a decimal or not (12 or 1.5), and then grab the measurement that immediately follows it (ounces).
I was able to use a pretty rudimentary regex to grab the measurement, but getting the decimal/integer has been giving me problems.
Thanks for your help!
If you just want to grab the data, you can just use a loose regex:
([\d.]+)\s+(\S+)
([\d.]+): [\d.]+ will match a sequence of strictly digits and . (it means 4.5.6 or .... will match, but those cases are not common, and this is just for grabbing data), and the parentheses signify that we will capture the matched text. The . here is inside character class [], so no need for escaping.
Followed by arbitrary spaces \s+ and maximum sequence (due to greedy quantifier) of non-space character \S+ (non-space really is non-space: it will match almost everything in Unicode, except for space, tab, new line, carriage return characters).
You can get the number in the first capturing group, and the unit in the 2nd capturing group.
You can be a bit stricter on the number:
(\d+(?:\.\d*)?|\.\d+)\s+(\S+)
The only change is (\d+(?:\.\d*)?|\.\d+), so I will only explain this part. This is a bit stricter, but whether stricter is better depends on the input domain and your requirements. It will match an integer such as 34 or a number with a decimal part such as 3.40000, and it allows the .5 and 34. cases to pass. It will reject a number with more than one ., or one that consists only of a .. The | acts as OR, separating 2 different patterns: \.\d+ and \d+(?:\.\d*)?.
\d+(?:\.\d*)?: This will match and (implicitly) assert at least one digit in integer part, followed by optional . (which needs to be escaped with \ since . means any character) and fractional part (which can be 0 or more digits). The optionality is indicated by ? at the end. () can be used for grouping and capturing - but if capturing is not needed, then (?:) can be used to disable capturing (save memory).
\.\d+: This will match for the case such as .78. It matches . followed by at least one (signified by +) digit.
This is not a good solution if you want to make sure you get something meaningful out of the input string. You need to define all expected units before you can write a regex that only captures valid data.
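For illustration, here is how the stricter regex behaves on the two sample strings. The sketch is in Python rather than PHP, and the helper name parse_amount is made up.

```python
import re

# Group 1 is the amount, group 2 is the word that follows it.
AMOUNT_UNIT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)\s+(\S+)")

def parse_amount(text):
    """Return (amount, unit) as strings, or None if nothing matches."""
    m = AMOUNT_UNIT.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2)

print(parse_amount("12 ounces"))
print(parse_amount("1.5 ounces chopped"))
print(parse_amount(".5 cups"))
```

The amount comes back as a string, so convert it with float() if you need to do arithmetic on it.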
Use this regular expression: \b\d+([\.,]\d+)?
To get integers and decimals that either use a comma or a dot plus the next word, use the following regex:
/\d+([\.,]\d+)?\s\S+/

Regex allow digits and a single dot

What would be the regex to allow digits and a dot? As I understand it, \D only allows digits, but it doesn't allow a dot. I need it to allow digits and one dot, i.e. a float value. It needs to be valid when doing a keyup function in jQuery, but all I need is the regex that only allows what I need it to allow.
This will be used with the native JavaScript replace function to remove non-digits and other symbols (except a dot).
Cheers.
If you want to allow 1 and 1.2:
(?<=^| )\d+(\.\d+)?(?=$| )
If you want to allow 1, 1.2 and .1:
(?<=^| )\d+(\.\d+)?(?=$| )|(?<=^| )\.\d+(?=$| )
If you want to only allow 1.2 (only floats):
(?<=^| )\d+\.\d+(?=$| )
\d allows digits (while \D allows anything but digits).
(?<=^| ) checks that the number is preceded by either a space or the beginning of the string. (?=$| ) makes sure the string is followed by a space or the end of the string. This makes sure the number isn't part of another number or in the middle of words or anything.
Edit: added more options, improved the regexes by adding lookaheads and lookbehinds to make sure the numbers are standalone (i.e. aren't in the middle of words or other numbers).
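If you want to try the standalone-number idea in Python, be aware that Python's re module rejects (?<=^| ) because the alternatives inside a lookbehind must have the same width. The sketch below swaps in (?<!\S) and (?!\S) instead. That is my substitution, not the pattern above, and it treats any whitespace (not just a space) as a boundary.

```python
import re

# "Not preceded by a non-space" and "not followed by a non-space" stand in
# for the (?<=^| ) and (?=$| ) assertions.
STANDALONE = re.compile(r"(?<!\S)\d+(?:\.\d+)?(?!\S)")

def standalone_numbers(text):
    """Return numbers that are delimited by whitespace or the string edges."""
    return STANDALONE.findall(text)

print(standalone_numbers("a1 1 1.2 3.4.5 x 7"))
```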
\d*\.\d*
Explanation:
\d* - any number of digits
\. - a dot
\d* - more digits.
This will match 123.456, .123 and 123., but not 123. Note that it also matches a lone . because both digit runs may be empty.
If you want the dot to be optional, in most languages (don't know about jquery) you can use
\d*\.?\d*
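A small Python sketch makes the difference between the two patterns visible, including the edge cases. It uses fullmatch so the whole string has to fit the pattern; the helper name check is arbitrary.

```python
import re

REQUIRED_DOT = re.compile(r"\d*\.\d*")
OPTIONAL_DOT = re.compile(r"\d*\.?\d*")

def check(pattern, text):
    """Return True if the whole of text fits the pattern."""
    return pattern.fullmatch(text) is not None

for sample in ["123.456", ".123", "123.", "123", ".", ""]:
    print(repr(sample), check(REQUIRED_DOT, sample), check(OPTIONAL_DOT, sample))
```

Because every part of the optional-dot version can be empty, it accepts the empty string as well, so you may want a separate check for that.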
Try this
boxValue = boxValue.replace(/[^0-9\.]/g,"");
This regular expression will allow only digits and dots in the value of the text box.
My try is combined solution.
string = string.replace(',', '.').replace(/[^\d\.]/g, "").replace(/\./, "x").replace(/\./g, "").replace(/x/, ".");
string = Math.round( parseFloat(string) * 100) / 100;
The first line is adapted from this solution: regex replacing multiple periods in floating number. It replaces the first comma "," with a dot ".", strips everything except digits and dots, replaces the first dot with x, removes all remaining dots, and turns x back into a dot.
The second line rounds the number to two decimal places.
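The cleaning step can also be written without the temporary x marker. Here is a rough Python sketch of the same idea; the function name sanitize_decimal is made up, and the rounding step is deliberately left out because Python's round() does not behave exactly like Math.round.

```python
import re

def sanitize_decimal(text):
    """Keep only digits and the first dot, treating the first comma as a dot."""
    text = text.replace(",", ".", 1)        # only the first comma, like JS replace(',', '.')
    text = re.sub(r"[^0-9.]", "", text)     # drop everything except digits and dots
    head, dot, tail = text.partition(".")   # split at the first dot
    return head + dot + tail.replace(".", "")  # remove any later dots

print(sanitize_decimal("1,234.56abc"))
print(sanitize_decimal("12a.5"))
```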
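Try the following expression, where value is a placeholder for the string you want to check. It allows at most two digits before the dot and, optionally, a dot followed by one or two digits:
/^\d{0,2}(\.\d{1,2})?$/.test(value)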