Regex to match last sentence of a line - regex

Got some text:
[23/07 | DEV | FARO | QC Billable | #2032] Unable to Load label
[30/07 | QC | ROLAWN ] Selling products as a bundle
[11/08 | EST | QC BILLABLE | #2015 ISUOG ] On Demand website looping
[05/08 | EST | ROLAWN | Problems with 'find a stockist'
[29/07 | DEV | QUBA] Blog comments loading to error
[24/07 | FROG | EST| QC BILLABLE #2033] Carousel banner not working correctly
I'm trying to match the last sentence at the end of each line so the matches are as follows:
Unable to Load label
Selling products as a bundle
On Demand website looping
Problems with 'find a stockist'
Blog comments loading to error
Carousel banner not working correctly
Unfortunately, I can't depend on the structure of the line to conform, but the information I'm trying to extract should always be the last sentence. I've tried quite a few different things, but I'm struggling here.

If there is also some kind on no-word character before last sentence, try with:
[\w\s']+$
DEMO

Edit: The answer above by m.cekiera [\w\s']+$ is better.
](.+)$
Here's a pretty naive solution: https://regex101.com/r/yT8jJ7/1.
If you give more details about the actual structure it could be refined.

Related

How to use regexp_replace in spark.sql() to extract hashtags from string

I need to write a regexg_replace query in spark.sql() and I'm not sure how to handle it. For readability purposes, I have to utilize SQL for it. I am trying to pull out the hashtags from the table. I know how to do this using the python method but most of my team are SQL users.
My dataframe example looks like so:
Insta_post
Today, Senate Dems vote to #SaveTheInternet. Proud to support similar #NetNeutrality legislation here in the House…
RT #NALCABPolicy: Meeting with #RepDarrenSoto . Thanks for taking the time to meet with #LatinoLeader ED Marucci Guzman. #NALCABPolicy2018.…
RT #Tharryry: I am delighted that #RepDarrenSoto will be voting for the CRA to overrule the FCC and save our #NetNeutrality rules. Find out…
My code:
I create a tempview:
post_df.createOrReplaceTempView("post_tempview")
post_df = spark.sql("""
select
regexp_replace(Insta_post, '.*?(.|'')(#)(\w+)', '$1') as a
from post_tempview
where Insta_post like '%#%'
""")
My end result:
+--------------------------------------------------------------------------------------------------------------------------------------------+
|a |
+--------------------------------------------------------------------------------------------------------------------------------------------+
|Today, Senate Dems vote to #SaveTheInternet. Proud to support similar #NetNeutrality legislation here in the House… |
|RT #NALCABPolicy: Meeting with #RepDarrenSoto . Thanks for taking the time to meet with #LatinoLeader ED Marucci Guzman. #NALCABPolicy2018.…|
|RT #Tharryry: I am delighted that #RepDarrenSoto will be voting for the CRA to overrule the FCC and save our #NetNeutrality rules. Find out…|
+--------------------------------------------------------------------------------------------------------------------------------------------+
desired result:
+---------------------------------+
|a |
+---------------------------------+
| #SaveTheInternet, #NetNeutrality|
| #NALCABPolicy2018 |
| #NetNeutrality |
+---------------------------------+
I haven't really used regexp_replace too much so this is new to me. Any help would be appreciated as well as an explanation of how to structure the subsets!
For Spark 3.1+, you can use regexp_extract_all function to extract multiple matches:
post_df = spark.sql("""
select regexp_extract_all(Insta_post, '(#\\\\w+)', 1) as a
from post_tempview
where Insta_post like '%#%'
""")
post_df.show(truncate=False)
#+----------------------------------+
#|a |
#+----------------------------------+
#|[#SaveTheInternet, #NetNeutrality]|
#|[#NALCABPolicy2018] |
#|[#NetNeutrality] |
#+----------------------------------+
For Spark <3.1, you can use regexp_replace to remove all that doesn't match the hashtag pattern :
post_df = spark.sql("""
select trim(trailing ',' from regexp_replace(Insta_post, '.*?(#\\\\w+)|.*', '$1,')) as a
from post_tempview
where Insta_post like '%#%'
""")
post_df.show(truncate=False)
#+-------------------------------+
#|a |
#+-------------------------------+
#|#SaveTheInternet,#NetNeutrality|
#|#NALCABPolicy2018 |
#|#NetNeutrality |
#+-------------------------------+
Note the use trim to remove the unnecessary trailing commas created by the first replace $,.
Do you really need a view? Because the following code might do it:
df = df.filter(F.col('Insta_post').like('%#%'))
col_trimmed = F.trim((F.regexp_replace('Insta_post', '.*?(#\w+)|.+', '$1 ')))
df = df.select(F.regexp_replace(col_trimmed,'\s',', ').alias('a'))
df.show(truncate=False)
# +--------------------------------+
# |a |
# +--------------------------------+
# |#SaveTheInternet, #NetNeutrality|
# |#NALCABPolicy2018 |
# |#NetNeutrality |
# +--------------------------------+
I ended up using two of regexp_replace, so potentially there could be a better alternative, just couldn't think of one.

How do I find change point in a timeseries in PoweBi

I have a group of people who started receiving a specific type of social benefit called benefitA, I am interested in knowing what(if any) social benefits the people in the group might have received immediately before they started receiving BenefitA.
My optimal result would be a table with the number people who was receiving respectively BenefitB, BenefitC and not receiving any benefit “BenefitNon” immediately before they started receiving BenefitA.
My data is organized as a relation database with a Facttabel containing an ID for each person in my data and several dimension tables connected to the facttabel. The important ones here at DimDreamYdelse(showing type of benefit received), DimDreamTid(showing week and year). Here is an example of the raw data.
Data Example
I'm not sure how to approach this in PowerBi as I am fairly new to this program. Any advice is most welcome.
I have tried to solve the problem in SQL but as I need this as part of a running report i need to do it in PowerBi. This bit of code might however give some context to what I want to do.
USE FLISDATA_Beskaeftigelse;
SELECT dbo.FactDream.DimDreamTid , dbo.FactDream.DimDreamBenefit , dbo.DimDreamTid.Aar, dbo.DimDreamTid.UgeIAar, dbo.DimDreamBenefit.Benefit,
FROM dbo.FactDream INNER JOIN
dbo.DimDreamTid ON dbo.FactDream.DimDreamTid = dbo.DimDreamTid.DimDreamTidID INNER JOIN
dbo.DimDreamYdelse ON dbo.FactDream.DimDreamBenefit = dbo.DimDreamYdelse.DimDreamBenefitID
WHERE (dbo.DimDreamYdelse.Ydelse LIKE 'Benefit%') AND (dbo.DimDreamTid.Aar = '2019')
ORDER BY dbo.DimDreamTid.Aar, dbo.DimDreamTid.UgeIAar
I suggest to use PowerQuery to transform your table into more suitable form for your analysis. Things would be much easier if each row of the table represents the "change" of benefit plan like this.
| Person ID | Benefit From | Benefit To | Date |
|-----------|--------------|------------|------------|
| 15 | BenefitNon | BenefitA | 2019-07-01 |
| 15 | BenefitA | BenefitNon | 2019-12-01 |
| 17 | BenefitC | BenefitA | 2019-06-01 |
| 17 | BenefitA | BenefitB | 2019-08-01 |
| 17 | BenefitB | BenefitA | 2019-09-01 |
| ...
Then you can simply count the numbers by COUNTROWS(BenefitChanges) filtering/slicing with both Benefit From and Benefit To.

Regex to extract two values from single string in Splunk

I've log statements appearing in Splunk as below.
info Request method=POST, time=100, id=12345
info Response statuscode=200, time=300, id=12345
I'm trying to write a Splunk query that would extract the time parameter from the lines starting with info Request and info Response and basically find the time difference. Is there a way I can do this in a query? I'm able to extract values separately from each statement but not the two values together.
I'm hoping for something like below, but I guess the piping won't work:
... | search log="info Request*" | rex field=log "time=(?<time1>[^\,]+)" | search log="info Response*" | rex field=log "time=(?<time2>[^\,]+)" | table time1, time2
Any help is highly appreciated.
General process:
Extract type into a field
Calculate response and request times
Group by id
Calculate the diff
You may want to use something other than stats(latest) but won't matter if there's only one request/response per id.
| rex field=_raw "info (?<type>\w+).*"
| eval requestTime = if(type="Request",time,NULL)
| eval responseTime = if(type="Response",time,NULL)
| stats latest(requestTime) as requestTime latest(responseTime) as responseTime by id
| eval diff = responseTime - requestTime

cucumber Repeat steps

I am learing cucumber and trying to write a feature file.
Following is my feature file.
Feature: Doctors handover Notes Module
Scenario: Search for patients on the bases of filter criteria
Given I am on website login page
When I put username, password and select database:
| Field | Value |
| username | test |
| password | pass |
| database | test|
Then I login to eoasis
Then I click on doctors hand over notes link
And I am on doctors handover notes page
Then I select sites, wards, onCallTeam, grades,potential Discharge, outstanding task,High priority:
| siteList | wardsList | onCallTeamList | gradesList | potentialDischargeCB | outstandingTasksCB | highPriorityCB |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | null | null | null | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | null | null | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | null | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | true | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | true | true | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | true | true | true |
Then I click on search button
Then I should see search results
I want to repeat last three steps like I select the search criteria then click on search button and then check search result. So how should I break this feature file. if I use scenario outline then there would be two different scenarios One for login and one for search criteria. Is that fine? Will the session will maintain in that case? Whats the best way to write such feature file.
Or is this a right way to write?
I don't think we can have multiple example sets in a Scenario Outline.
Most of the scenario steps in the example is too procedural to have its own step.
The first three steps could be reduced to something like.
Given I am logged into eoasis as a <user>
Code in the step definition, which could make calls to a separate login method that could take care of updating entering the username, password and selecting database.
Another rule is to avoid statements like "When I click the doctor's handover link". The keyword to avoid here being click. Today its a click, tomorrow it could be drop down or a button. So the focus should be on the functional expectation of the user, which is viewing the handover notes. So we modify this to
When I view the doctor's handover notes link
To summarize, this is how I would write this test.
Scenario Outline: Search for patients on the basis of filter criteria
Given I am logged into eoasis as a <user>
When I view the doctor's handover notes link
And I select sites, wards, onCallTeam, grades, potential Discharge, outstanding task, High priority
And perform a search
Then I should see the search results
Examples:
|sites |wards |onCallTeam |grades |potential Discharge |outstanding task |High priority|
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | null | null | null | null | null |
This really is the wrong way to write features. This feature is very declarative, its all about HOW you do something. What a feature should do is explain WHY you are doing something.
Another bad thing this feature does is mix up the details of two different operations, signing in, and searching for patients. Write a feature for each one e.g.
Feature: Signing in
As a doctor
I want my patients data to only be available if I sign in
So I ensure their confidentiality
Scenario: Sign in
Given I am a doctor
When I sign in
Then I should be signed in
Feature: Search for patients
Explain why searching for patients gives value to the doctor
...
You should focus on the name of the feature and the bit at the top that explains why this has value first. If you do that well then the scenarios are much easier to write (look how simple my sign in scenario is).
The art of writing features is doing this bit well, so that you end up with simple scenarios.

Regular Expression patterns for Tracking numbers

Does anybody know good place or patterns for checking which company tracking number is the given tracking number for a package. Idea is After scanning a barcode for a package check tracking number with patterns and show which company it was shipped by.
Just thought I would post an update on this as I am working on this to match via jquery and automatically select the appropriate shipping carrier. I compiled a list of the matching regex for my project and I have tested a lot of tracking numbers across UPS FedEX and USPS.
If you come across something which doesn't match, please let me know here via comments and I will try to come up for that as well.
UPS:
/\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b/
FedEX: (3 Different Ones)
/(\b96\d{20}\b)|(\b\d{15}\b)|(\b\d{12}\b)/
/\b((98\d\d\d\d\d?\d\d\d\d|98\d\d) ?\d\d\d\d ?\d\d\d\d( ?\d\d\d)?)\b/
/^[0-9]{15}$/
USPS: (4 Different Ones)
/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)/
/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/
/^91[0-9]+$/
/^[A-Za-z]{2}[0-9]+US$/
Please note that I did not come up with these myself. I simply searched around and compiled the list from different sources, including some which may have been mentioned here.
Thanks
Edit: Fixed missing end delimiter.
I needed something more robust for my use case. I kept running across examples that were incomplete, incorrect, or overly verbose without any improvement in correctness. Hopefully this helps someone else! It covers all of the different formats in the other answers, plus a few more, and doesn't overlap between FedEx and USPS unlike some of the other answers.
Tracking Number Regular Expressions:
USPS/S10:
https://postalpro.usps.com/mnt/glusterfs/2020-02/Pub%20199%20Intelligent%20Mail%20Package%20Barcode%20(IMpb)%20Implementation%20Guide%202020_02_11%20TT%20v6.pdf
\b([A-Z]{2}\d{9}[A-Z]{2}|(420\d{9}(9[2345])?)?\d{20}|(420\d{5})?(9[12345])?(\d{24}|\d{20})|82\d{8})\b
UPS:
\b1Z[A-Z0-9]{16}\b
FedEx:
\b([0-9]{12}|100\d{31}|\d{15}|\d{18}|96\d{20}|96\d{32})\b
Caveats/notes:
FedEx SmartPost is [intentionally] categorized as USPS; it can be tracked with either
USPS includes S10 format tracking numbers used for international post
Tracking numbers have module check bits; these regex's don't check them
This was found by reading spec sheets, reading other answers, looking at open source code, etc. It matched ~6,000 tracking numbers I ran it against with 100% accuracy, but I can't be sure it will be correct in all cases.
These assume you've removed all whitespace before applying the regex
Example Tracking Numbers
Mostly pulled from:
https://tools.usps.com/go/TrackConfirmAction
https://github.com/jkeen/tracking_number_data
| Tracking Number | Kind | Tracking Carrier |
|------------------------------------|-------------------------------------|------------------|
| 03071790000523483741 | USPS 20 | USPS |
| 71123456789123456787 | USPS 20 | USPS |
| 4201002334249200190132607600833457 | USPS 34v2 | USPS |
| 4201028200009261290113185417468510 | USPS 34v2 | USPS |
| 420221539101026837331000039521 | USPS 91 | USPS |
| 71969010756003077385 | USPS 91 | USPS |
| 9505511069605048600624 | USPS 91 | USPS |
| 9101123456789000000013 | USPS 91 | USPS |
| 92748931507708513018050063 | USPS 91 | USPS |
| 9400111201080805483016 | USPS 91 | USPS |
| 9361289878700317633795 | USPS 91 | USPS |
| 9405803699300124287899 | USPS 91 | USPS |
| EK115095696SA | S10 | USPS |
| 1Z5R89390357567127 | UPS | UPS |
| 1Z879E930346834440 | UPS | UPS |
| 1Z410E7W0392751591 | UPS | UPS |
| 1Z8V92A70367203024 | UPS | UPS |
| 1ZXX3150YW44070023 | UPS | UPS |
| 986578788855 | FedEx Express (12) | FedEx |
| 477179081230 | FedEx Express (12) | FedEx |
| 799531274483 | FedEx Express (12) | FedEx |
| 790535312317 | FedEx Express (12) | FedEx |
| 974367662710 | FedEx Express (12) | FedEx |
| 1001921334250001000300779017972697 | FedEx Express (34) | FedEx |
| 1001921380360001000300639585804382 | FedEx Express (34) | FedEx |
| 1001901781990001000300617767839437 | FedEx Express (34) | FedEx |
| 1002297871540001000300790695517286 | FedEx Express (34) | FedEx |
| 61299998820821171811 | FedEx SmartPost | USPS |
| 9261292700768711948021 | FedEx SmartPost | USPS |
| 041441760228964 | FedEx Ground | FedEx |
| 568283610012000 | FedEx Ground | FedEx |
| 568283610012734 | FedEx Ground | FedEx |
| 000123450000000027 | FedEx Ground (SSCC-18) | FedEx |
| 9611020987654312345672 | FedEx Ground 96 (22) | FedEx |
| 9622001900000000000000776632517510 | FedEx Ground GSN | FedEx |
| 9622001560000000000000794808390594 | FedEx Ground GSN | FedEx |
| 9622001560001234567100794808390594 | FedEx Ground GSN | FedEx |
| 9632001560123456789900794808390594 | FedEx Ground GSN | FedEx |
| 9400100000000000000000 | USPS Tracking | USPS |
| 9205500000000000000000 | Priority Mail | USPS |
| 9407300000000000000000 | Certified Mail | USPS |
| 9303300000000000000000 | Collect On Delivery Hold For Pickup | USPS |
| 8200000000 | Global Express Guaranteed | USPS |
| EC000000000US | Priority Mail Express International | USPS |
| 9270100000000000000000 | Priority Mail Express | USPS |
| EA000000000US | Priority Mail Express | USPS |
| CP000000000US | Priority Mail International | USPS |
| 9208800000000000000000 | Registered Mail | USPS |
| 9202100000000000000000 | Signature Confirmation | USPS |
I need to verify JUST United States Postal Service (USPS) tracking numbers. WikiAnswers says that my number formats are as follows:
USPS only offers tracking with Express
mail, with usually begins with an "E",
another letter, followed by 9 digits,
and two more letters. USPS does have
"Label numbers" for other services
that are between 16 and 22 digits
long.
http://wiki.answers.com/Q/How_many_numbers_in_a_USPS_tracking_number
I'm adding in that the Label numbers start with a "9" as all the ones I have from personal shipments for the past 2 years start with a 9.
So, assuming that WikiAnswers is correct, here is my regex that matches both:
/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/
It's pretty simple. Here is the break down:
^E - Begins w/ E (For express number)
\D{1} - followed by another letter
\d{9} - followed by 9 numbers
\D{2} - followed by 2 more letters
$ - End of string
| - OR
^9 - Basic Track & Ship Number
\d{15,21} - followed by 15 to 21 numbers
$ - End of string
Using www.gummydev.com's regex tester this patter matches both of my test strings:
EXPRESS MAIL : EK225651436US
LABEL NUMBER: 9410803699300003725216
**Note: If you're using ColdFusion (I am), remove the first and last "/" from the pattern
I pressed Royal Mail for a regex for the Recorded Delivery & Special Delivery tracking references but didn't get very far. Even a full set of rules so I could roll my own was beyond them.
Basically, even after they had taken about a week and came back with various combinations of letters denoting service type, I was able to provide examples from our experience that showed there were additional combinations that were obviously valid but that they had not documented.
The references follow the apparently standard international format that I think Jefe's /^[A-Za-z]{2}[0-9]+GB$/ regex would describe:
XX123456789GB
Even though this seems to be a standard format, i.e. most international mail has the same format where the last two letters denote the country of origin, I've not been able to find out any more about this 'standard' or where it originates from (any clarification welcome!).
Particular to Royal Mail seems to be the use of the first two letters to denote service level. I have managed to compile a list of prefixes that denote Special Delivery, but am not convinced that it is 100% complete:
AD AE AF AJ AK AR AZ BP CX DS EP HC HP KC KG
KH KI KJ KQ KU KV KW KY KZ PW SA SC SG SH SI
SJ SL SP SQ SU SW SY SZ TX WA WH XQ WZ
Without one of these prefixes the service is Recorded Delivery which gives delivery confirmation but no tracking.
It seems generally that inclusion of an S, X or Z denotes a higher service level and I don't think I've ever seen a normal Recorded Delivery item with any of those letters in the prefix.
However, as you can see there are many prefixes that would need to be tested if service level were to be checked using regex, and given the fact that Royal Mail seem incapable of providing a comprehensive rule set then trying to test for service level may be futile.
Here are some sample numbers from the main US carriers:
USPS:
70160910000108310009 (certified)
23153630000057728970 (signature confirmation)
RE360192014US (registered mail)
EL595811950US (priority express)
9374889692090270407075 (regular)
FEDEX:
810132562702 (all seem to follow same pattern regardless)
795223646324
785037759224
UPS:
K2479825491 (UPS ground)
J4603636537 (UPS next day express)
1Z87585E4391018698 (regular)
Patterns I am using (php code). Yep I gave up and started testing against all the patterns at my disposal. Had to write the second UPS one.
public function getCarrier($trackingNumber){
$matchUPS1 = '/\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b/';
$matchUPS2 = '/^[kKJj]{1}[0-9]{10}$/';
$matchUSPS0 = '/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)/';
$matchUSPS1 = '/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)|(\b\d{26}\b)| ^E\D{1}\d{9}\D{2}$|^9\d{15,21}$| ^91[0-9]+$| ^[A-Za-z]{2}[0-9]+US$/i';
$matchUSPS2 = '/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/';
$matchUSPS3 = '/^91[0-9]+$/';
$matchUSPS4 = '/^[A-Za-z]{2}[0-9]+US$/';
$matchUSPS5 = '/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)|(\b\d{26}\b)| ^E\D{1}\d{9}\D{2}$|^9\d{15,21}$| ^91[0-9]+$| ^[A-Za-z]{2}[0-9]+US$/i';
$matchFedex1 = '/(\b96\d{20}\b)|(\b\d{15}\b)|(\b\d{12}\b)/';
$matchFedex2 = '/\b((98\d\d\d\d\d?\d\d\d\d|98\d\d) ?\d\d\d\d ?\d\d\d\d( ?\d\d\d)?)\b/';
$matchFedex3 = '/^[0-9]{15}$/';
if(preg_match($matchUPS1, $trackingNumber) ||
preg_match($matchUPS2, $trackingNumber))
{
echo('UPS');
$carrier = 'UPS';
return $carrier;
} else if(preg_match($matchUSPS0, $trackingNumber) ||
preg_match($matchUSPS1, $trackingNumber) ||
preg_match($matchUSPS2, $trackingNumber) ||
preg_match($matchUSPS3, $trackingNumber) ||
preg_match($matchUSPS4, $trackingNumber) ||
preg_match($matchUSPS5, $trackingNumber)) {
$carrier = 'USPS';
return $carrier;
} else if (preg_match($matchFedex1, $trackingNumber) ||
preg_match($matchFedex2, $trackingNumber) ||
preg_match($matchFedex3, $trackingNumber)) {
$carrier = 'FedEx';
return $carrier;
} else if (0){
$carrier = 'DHL';
return $carrier;
}
return;
}
Been researching this for a while, and made these based mostly on the answers here.
These should cover everything, without being too lenient.
UPS:
/^(1Z\s?[0-9A-Z]{3}\s?[0-9A-Z]{3}\s?[0-9A-Z]{2}\s?[0-9A-Z]{4}\s?[0-9A-Z]{3}\s?[0-9A-Z]$|[\dT]\d{3}\s?\d{4}s?\d{3})$/i
USPS:
/^(EA|EC|CP|RA)\d{9}(\D{2})?$|^(7\d|03|23|91)\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}(\s\d{2})?$|^82\s?\d{3}\s?\d{3}\s?\d{2}$/i
FEDEX:
/^(((96|98)\d{5}\s?\d{4}$|^(96|98)\d{2})\s?\d{4}\s?\d{4}(\s?\d{3})?)$/i
I'm working in an Angular2+ app and just put together a component to handle common US tracking numbers. It tests them using standard JavaScript RegExp's that I put together from this resource HERE & HERE and sets the href on an anchor tag with the tracking link URL if it's good. You don't have to be using Angular or TypeScript to easily adapt this to your application. I tested it out with different dummy numbers and seem to work dynamically so far. Please note, you can also switch out the null in the last else statement with the in-line commented url and it will send you to a Google search.
Any feedback (or if your tracking numbers don't work) please let me know I will update the answer. Thanks!
USAGE IN HTML:
<app-tracking-number [trackNum]="myTrackingNumberInput"></app-tracking-number>
COMPONENT .TS
import { Component, OnInit, Input } from '#angular/core';
#Component({
selector: 'app-tracking-number',
templateUrl: './tracking-number.component.html',
styleUrls: ['./tracking-number.component.scss']
})
export class TrackingNumberComponent implements OnInit {
#Input() trackNum:string;
trackNumHref:string = null;
// Carrier tracking numbers patterns from https://www.iship.com/trackit/info.aspx?info=24 AND https://www.canadapost.ca/web/en/kb/details.page?article=how_to_track_a_packa&cattype=kb&cat=receiving&subcat=tracking
isUPS:RegExp = new RegExp('^1Z[A-H,J-N,P,R-Z,0-9]{16}$'); // UPS tracking numbers usually begin with "1Z", contain 18 characters, and do not contain the letters "O", "I", or "Q".
isFedEx:RegExp = new RegExp('^[0-9]{12}$|^[0-9]{15}$'); // FedEx Express tracking numbers are normally 12 digits long and do not contain letters AND FedEx Ground tracking numbers are normally 15 digits long and do not contain letters.
isUSPS:RegExp = new RegExp('^[0-9]{20,22}$|^[A-Z]{2}[0-9,A-Z]{9}US$'); // USPS Tracking numbers are normally 20-22 digits long and do not contain letters AND USPS Express Mail tracking numbers are normally 13 characters long, begin with two letters, and end with "US".
isDHL:RegExp = new RegExp('^[0-9]{10,11}$'); // DHL tracking numbers are normally 10 or 11 digits long and do not contain letters.
isCAPost:RegExp = new RegExp('^[0-9]{16}$|^[A-Z]{2}[0-9]{9}[A-Z]{2}$'); // 16 numeric digits (0000 0000 0000 0000) AND 13 numeric and alphabetic characters (AA 000 000 000 AA).
constructor() { }
ngOnInit() {
this.setHref();
}
setHref() {
if(!this.trackNum) this.trackNumHref = null;
else if(this.isUPS.test(this.trackNum)) this.trackNumHref = `https://wwwapps.ups.com/WebTracking/processInputRequest?AgreeToTermsAndConditions=yes&loc=en_US&tracknum=${this.trackNum}&Requester=trkinppg`;
else if(this.isFedEx.test(this.trackNum)) this.trackNumHref = `https://www.fedex.com/apps/fedextrack/index.html?tracknumber=${this.trackNum}`;
else if(this.isUSPS.test(this.trackNum)) this.trackNumHref = `https://tools.usps.com/go/TrackConfirmAction?tLabels=${this.trackNum}`;
else if(this.isDHL.test(this.trackNum)) this.trackNumHref = `http://www.dhl.com/en/express/tracking.html?AWB=${this.trackNum}&brand=DHL`;
else if(this.isCAPost.test(this.trackNum)) this.trackNumHref =`https://www.canadapost.ca/trackweb/en#/search?searchFor=${this.trackNum}`;
else this.trackNumHref = null; // Google search as fallback... `https://www.google.com/search?q=${this.trackNum}`;
}
}
COMPONENT .HTML
<a *ngIf="trackNumHref" [href]="trackNumHref" target="_blank">{{trackNum}}</a>
<span *ngIf="!trackNumHref">{{trackNum}}</span>
Here is a great resource which captures just about all possibilities and is as tight as I have found:
https://andrewkurochkin.com/blog/code-for-recognizing-delivery-company-by-track
string[] upsPattern = new string[]
{
"^(1Z)[0-9A-Z]{16}$",
"^(T)+[0-9A-Z]{10}$",
"^[0-9]{9}$",
"^[0-9]{26}$"
};
string[] uspsPattern = new string[]
{
"^(94|93|92|94|95)[0-9]{20}$",
"^(94|93|92|94|95)[0-9]{22}$",
"^(70|14|23|03)[0-9]{14}$",
"^(M0|82)[0-9]{8}$",
"^([A-Z]{2})[0-9]{9}([A-Z]{2})$"
};
string[] fedexPattern = new string[]
{
"^[0-9]{20}$",
"^[0-9]{15}$",
"^[0-9]{12}$",
"^[0-9]{22}$"
};
You can try these (not guaranteed):
UPS:
\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b
UPS:
\b(1Z ?\d\d\d ?\d\w\w ?\d\d ?\d\d\d\d ?\d\d\d ?\d|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b
USPost:
\b(\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d|\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d)\b
But please test before you use them. I recommend RegexBuddy.
I use these in an eBay application I wrote:
USPS Domestic:
/^91[0-9]+$/
USPS International:
/^[A-Za-z]{2}[0-9]+US$/
FedEx:
/^[0-9]{15}$/
However, this might be eBay/Paypal specific, as all USPS Domestic labels start with "91". All USPS International labels start with two characters and end with "US". As far as I know, FedEx just uses 15 random digits.
(Please note that these regular expressions assume all spaces are removed. It would be fairly easy to allow for spaces though)
Check out this github project that lists a lot of PHP tracking regexes. https://github.com/darkain/php-tracking-urls
Here are the ones I am now using in my Java app. These are determined by my experience of sucking tracking numbers out of shipping confirmation emails from a whole pile of drop ship services. I just made a new USPS one from scratch since none of the ones I found worked for some of my numbers based on example numbers on the USPS site. These only work for US tracking codes because we only sell in the US.
private final Pattern UPS_TRACKING_NUMBER =
Pattern.compile("[^A-Za-z0-9](1Z[A-Za-z0-9]{6,})",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
private final Pattern FEDEX_TRACKING_NUMBER =
Pattern.compile("\\b((96|98)\\d{18,20}|\\d{15}|\\d{12})\\b",
Pattern.MULTILINE);
private final Pattern USPS_TRACKING_NUMBER =
Pattern.compile("\\b(9[2-4]\\d{20}(?:(?:EA|RA)\\d{9}US)?|(?:03|23|14|70)\\d{18})\\b",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
I believe FedEx is 12 digits:
^[0-9]{12}$
I also came across tracking numbers from FedEx with 22 digits recently, so watch out!
I haven't found any good reference for the FedEx's general format yet.
FedEx Example #: 9612019059803563050071
Late to the party however, the below will work with 26 char USPS numbers as well.
/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)|(\b\d{26}\b)|^E\D{1}\d{9}\D{2}$|^9\d{15,21}$|^91[0-9]+$|^[A-Za-z]{2}[0-9]+US$/i
I know there are already lots of answers and that this was asked a long time ago, but I don't see a single one that addresses all the possible USPS tracking numbers with a single expression.
Here is what I came up with:
((\d{4})(\s?\d{4}){4}\s?\d{2})|((\d{2})(\s?\d{3}){2}\s?\d{2})|((\D{2})(\s?\d{3}){3}\s?\D{2})
See it working here: http://regexr.com/3e61u
//UPS - UNITED PARCEL SERVICE
final String UPS = "\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|T\d{3} ?\d{4} ?\d{3})\b";
//USPS - UNITED STATES POSTAL SERVICE - FORMAT 1
final String USPS_FORMAT1 = "\b((420 ?\d{5} ?)?(91|92|93|94|01|03|04|70|23|13)\d{2} ?\d{4} ?\d{4} ?\d{4} ?\d{4}( ?\d{2,6})?)\b";
//USPS - UNITED STATES POSTAL SERVICE - FORMAT 2
final String USPS_FORMAT2 = "\b((M|P[A-Z]?|D[C-Z]|LK|E[A-C]|V[A-Z]|R[A-Z]|CP|CJ|LC|LJ) ?\d{3} ?\d{3} ?\d{3} ?[A-Z]?[A-Z]?)\b";
//USPS - UNITED STATES POSTAL SERVICE - FORMAT 3
final String USPS_FORMAT3 = "\b(82 ?\d{3} ?\d{3} ?\d{2})\b";
//FEDEX - FEDERAL EXPRESS
final String FED_EX = "\b(((96\d\d|6\d)\d{3} ?\d{4}|96\d{2}|\d{4}) ?\d{4} ?\d{4}( ?\d{3})?)\b";
//ONTRAC
final String ONTRAC = "\b(C\d{14})\b";
//DHL
final String DHL = "\b(\d{4}[- ]?\d{4}[- ]?\d{2}|\d{3}[- ]?\d{8}|[A-Z{3}\d{7})\b";
Sample tracking number
UPS
//"1Z 999 AA1 01 2345 6784"
Fed-ex
// "449044304137821"
USPS
//"9400 1000 0000 0000 0000 00"
final Pattern pattern = Pattern.compile(DHL, Pattern.CASE_INSENSITIVE |
Pattern.UNICODE_CASE);
final Matcher matcher = pattern.matcher("1Z 999 AA1 01 2345 6784");
if (matcher.find()) {
System.out.println(true + "");
}
It's working in java and android.
https://regex101.com/
You can change your regex into another language regex by this link and generate code also.
Here's an up-to-date regex for UPS. It works with standard and Mail Innovation type tracking numbers:
\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d|\d\d\d ?\d\d\d ?\d\d\d|\d{22,34})\b
I solved this by using an external API : https://shippingcarrierdetector.com/
If your project allows external API's it might be a much quicker and easier solution than trying to build the logic yourself.