Regular Expression patterns for Tracking numbers

Does anybody know a good source of patterns for checking which company a given tracking number belongs to? The idea is that after scanning a package's barcode, I check the tracking number against the patterns and show which company shipped it.

Just thought I would post an update on this, as I am working on matching these via jQuery to automatically select the appropriate shipping carrier. I compiled a list of matching regexes for my project and have tested a lot of tracking numbers across UPS, FedEx, and USPS.
If you come across something that doesn't match, please let me know in the comments and I will try to come up with a pattern for it as well.
UPS:
/\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b/
FedEX: (3 Different Ones)
/(\b96\d{20}\b)|(\b\d{15}\b)|(\b\d{12}\b)/
/\b((98\d\d\d\d\d?\d\d\d\d|98\d\d) ?\d\d\d\d ?\d\d\d\d( ?\d\d\d)?)\b/
/^[0-9]{15}$/
USPS: (4 Different Ones)
/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)/
/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/
/^91[0-9]+$/
/^[A-Za-z]{2}[0-9]+US$/
Please note that I did not come up with these myself. I simply searched around and compiled the list from different sources, including some which may have been mentioned here.
Thanks
Edit: Fixed missing end delimiter.

I needed something more robust for my use case. I kept running across examples that were incomplete, incorrect, or overly verbose without any improvement in correctness. Hopefully this helps someone else! It covers all of the different formats in the other answers, plus a few more, and doesn't overlap between FedEx and USPS unlike some of the other answers.
Tracking Number Regular Expressions:
USPS/S10:
https://postalpro.usps.com/mnt/glusterfs/2020-02/Pub%20199%20Intelligent%20Mail%20Package%20Barcode%20(IMpb)%20Implementation%20Guide%202020_02_11%20TT%20v6.pdf
\b([A-Z]{2}\d{9}[A-Z]{2}|(420\d{9}(9[2345])?)?\d{20}|(420\d{5})?(9[12345])?(\d{24}|\d{20})|82\d{8})\b
UPS:
\b1Z[A-Z0-9]{16}\b
FedEx:
\b([0-9]{12}|100\d{31}|\d{15}|\d{18}|96\d{20}|96\d{32})\b
Caveats/notes:
FedEx SmartPost is [intentionally] categorized as USPS; it can be tracked with either
USPS includes S10 format tracking numbers used for international post
Tracking numbers have modulo check digits; these regexes don't check them
This was found by reading spec sheets, reading other answers, looking at open source code, etc. It matched ~6,000 tracking numbers I ran it against with 100% accuracy, but I can't be sure it will be correct in all cases.
These assume you've removed all whitespace before applying the regex
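As a rough illustration of the check-digit caveat above, here is a minimal Python sketch that validates the check digit of an S10-format number (two letters, eight serial digits, one check digit, two letters). The weights and the mod-11 rule come from the UPU S10 standard; the function names are hypothetical, and the other carrier formats use different check schemes that this does not cover.

```python
import re

# Weights defined by the UPU S10 standard for the 8-digit serial number.
S10_WEIGHTS = (8, 6, 4, 2, 3, 5, 9, 7)
# Two letters, 8 serial digits, 1 check digit, two-letter country code.
S10_PATTERN = re.compile(r"[A-Z]{2}(\d{8})(\d)[A-Z]{2}")


def s10_check_digit(serial: str) -> int:
    """Compute the S10 check digit for an 8-digit serial number."""
    total = sum(int(digit) * weight for digit, weight in zip(serial, S10_WEIGHTS))
    check = 11 - (total % 11)
    if check == 10:
        return 0
    if check == 11:
        return 5
    return check


def is_valid_s10(tracking_number: str) -> bool:
    """Return True if the number has the S10 shape and a correct check digit."""
    match = S10_PATTERN.fullmatch(tracking_number)
    if match is None:
        return False
    serial, check = match.groups()
    return s10_check_digit(serial) == int(check)
```

Running a check like this after the regex match would reject most mistyped numbers that still happen to have the right shape.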
Example Tracking Numbers
Mostly pulled from:
https://tools.usps.com/go/TrackConfirmAction
https://github.com/jkeen/tracking_number_data
| Tracking Number | Kind | Tracking Carrier |
|------------------------------------|-------------------------------------|------------------|
| 03071790000523483741 | USPS 20 | USPS |
| 71123456789123456787 | USPS 20 | USPS |
| 4201002334249200190132607600833457 | USPS 34v2 | USPS |
| 4201028200009261290113185417468510 | USPS 34v2 | USPS |
| 420221539101026837331000039521 | USPS 91 | USPS |
| 71969010756003077385 | USPS 91 | USPS |
| 9505511069605048600624 | USPS 91 | USPS |
| 9101123456789000000013 | USPS 91 | USPS |
| 92748931507708513018050063 | USPS 91 | USPS |
| 9400111201080805483016 | USPS 91 | USPS |
| 9361289878700317633795 | USPS 91 | USPS |
| 9405803699300124287899 | USPS 91 | USPS |
| EK115095696SA | S10 | USPS |
| 1Z5R89390357567127 | UPS | UPS |
| 1Z879E930346834440 | UPS | UPS |
| 1Z410E7W0392751591 | UPS | UPS |
| 1Z8V92A70367203024 | UPS | UPS |
| 1ZXX3150YW44070023 | UPS | UPS |
| 986578788855 | FedEx Express (12) | FedEx |
| 477179081230 | FedEx Express (12) | FedEx |
| 799531274483 | FedEx Express (12) | FedEx |
| 790535312317 | FedEx Express (12) | FedEx |
| 974367662710 | FedEx Express (12) | FedEx |
| 1001921334250001000300779017972697 | FedEx Express (34) | FedEx |
| 1001921380360001000300639585804382 | FedEx Express (34) | FedEx |
| 1001901781990001000300617767839437 | FedEx Express (34) | FedEx |
| 1002297871540001000300790695517286 | FedEx Express (34) | FedEx |
| 61299998820821171811 | FedEx SmartPost | USPS |
| 9261292700768711948021 | FedEx SmartPost | USPS |
| 041441760228964 | FedEx Ground | FedEx |
| 568283610012000 | FedEx Ground | FedEx |
| 568283610012734 | FedEx Ground | FedEx |
| 000123450000000027 | FedEx Ground (SSCC-18) | FedEx |
| 9611020987654312345672 | FedEx Ground 96 (22) | FedEx |
| 9622001900000000000000776632517510 | FedEx Ground GSN | FedEx |
| 9622001560000000000000794808390594 | FedEx Ground GSN | FedEx |
| 9622001560001234567100794808390594 | FedEx Ground GSN | FedEx |
| 9632001560123456789900794808390594 | FedEx Ground GSN | FedEx |
| 9400100000000000000000 | USPS Tracking | USPS |
| 9205500000000000000000 | Priority Mail | USPS |
| 9407300000000000000000 | Certified Mail | USPS |
| 9303300000000000000000 | Collect On Delivery Hold For Pickup | USPS |
| 8200000000 | Global Express Guaranteed | USPS |
| EC000000000US | Priority Mail Express International | USPS |
| 9270100000000000000000 | Priority Mail Express | USPS |
| EA000000000US | Priority Mail Express | USPS |
| CP000000000US | Priority Mail International | USPS |
| 9208800000000000000000 | Registered Mail | USPS |
| 9202100000000000000000 | Signature Confirmation | USPS |
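To show how the three expressions above might be applied together, here is a small Python sketch. The `detect_carrier` name and the dictionary layout are my own assumptions; the patterns are copied from the answer, with `re.fullmatch` standing in for the `\b` anchors because whitespace is stripped first.

```python
import re
from typing import Optional

# Patterns copied from the answer above; they assume whitespace is already removed.
CARRIER_PATTERNS = {
    "UPS": re.compile(r"1Z[A-Z0-9]{16}"),
    "USPS": re.compile(
        r"[A-Z]{2}\d{9}[A-Z]{2}"
        r"|(420\d{9}(9[2345])?)?\d{20}"
        r"|(420\d{5})?(9[12345])?(\d{24}|\d{20})"
        r"|82\d{8}"
    ),
    "FedEx": re.compile(r"[0-9]{12}|100\d{31}|\d{15}|\d{18}|96\d{20}|96\d{32}"),
}


def detect_carrier(raw: str) -> Optional[str]:
    """Return the first carrier whose pattern matches the whole number, else None."""
    cleaned = re.sub(r"\s+", "", raw).upper()
    for carrier, pattern in CARRIER_PATTERNS.items():
        if pattern.fullmatch(cleaned):
            return carrier
    return None


print(detect_carrier("1Z 5R8 939 03 5756 7127"))
print(detect_carrier("9611020987654312345672"))
```

Since the patterns are meant not to overlap, the order of the dictionary should not matter for the sample numbers in the table.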

I need to verify JUST United States Postal Service (USPS) tracking numbers. WikiAnswers says that my number formats are as follows:
USPS only offers tracking with Express
mail, with usually begins with an "E",
another letter, followed by 9 digits,
and two more letters. USPS does have
"Label numbers" for other services
that are between 16 and 22 digits
long.
http://wiki.answers.com/Q/How_many_numbers_in_a_USPS_tracking_number
I'm adding in that the Label numbers start with a "9" as all the ones I have from personal shipments for the past 2 years start with a 9.
So, assuming that WikiAnswers is correct, here is my regex that matches both:
/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/
It's pretty simple. Here is the breakdown:
^E - Begins with E (for Express numbers)
\D{1} - followed by one non-digit character (intended to be a letter, though \D also matches spaces and punctuation)
\d{9} - followed by 9 digits
\D{2} - followed by 2 more non-digit characters (intended to be letters)
$ - End of string
| - OR
^9 - Basic Track & Ship number
\d{15,21} - followed by 15 to 21 digits
$ - End of string
Using www.gummydev.com's regex tester, this pattern matches both of my test strings:
EXPRESS MAIL : EK225651436US
LABEL NUMBER: 9410803699300003725216
**Note: If you're using ColdFusion (I am), remove the first and last "/" from the pattern
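One thing worth checking with this pattern is that `\D` means "any non-digit", so it accepts more than letters. The Python sketch below compares the pattern from this answer with a stricter variant; the `STRICT` pattern and the third sample string are my own assumptions, not part of the original answer.

```python
import re

# Pattern from the answer: \D matches any non-digit, not only letters.
LOOSE = re.compile(r"^E\D{1}\d{9}\D{2}$|^9\d{15,21}$")
# Hypothetical stricter variant: restrict the non-digit positions to A-Z.
STRICT = re.compile(r"^E[A-Z]\d{9}[A-Z]{2}$|^9\d{15,21}$")

samples = ["EK225651436US", "9410803699300003725216", "E-123456789!!"]

# Map each sample to (matches LOOSE, matches STRICT).
results = {s: (bool(LOOSE.search(s)), bool(STRICT.search(s))) for s in samples}

for sample, (loose, strict) in results.items():
    print(sample, loose, strict)
```

Both test strings from the answer pass either pattern; only the loose one accepts the punctuation-filled sample.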

I pressed Royal Mail for a regex for the Recorded Delivery & Special Delivery tracking references but didn't get very far. Even a full set of rules so I could roll my own was beyond them.
Basically, even after they had taken about a week and came back with various combinations of letters denoting service type, I was able to provide examples from our experience that showed there were additional combinations that were obviously valid but that they had not documented.
The references follow the apparently standard international format that I think Jefe's /^[A-Za-z]{2}[0-9]+GB$/ regex would describe:
XX123456789GB
Even though this seems to be a standard format, i.e. most international mail has the same format where the last two letters denote the country of origin, I've not been able to find out any more about this 'standard' or where it originates from (any clarification welcome!).
Particular to Royal Mail seems to be the use of the first two letters to denote service level. I have managed to compile a list of prefixes that denote Special Delivery, but am not convinced that it is 100% complete:
AD AE AF AJ AK AR AZ BP CX DS EP HC HP KC KG
KH KI KJ KQ KU KV KW KY KZ PW SA SC SG SH SI
SJ SL SP SQ SU SW SY SZ TX WA WH XQ WZ
Without one of these prefixes the service is Recorded Delivery which gives delivery confirmation but no tracking.
It seems generally that inclusion of an S, X or Z denotes a higher service level and I don't think I've ever seen a normal Recorded Delivery item with any of those letters in the prefix.
However, as you can see there are many prefixes that would need to be tested if service level were to be checked using regex, and given the fact that Royal Mail seem incapable of providing a comprehensive rule set then trying to test for service level may be futile.
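If you do want to attempt a service-level check, a set lookup on the two-letter prefix may be easier to maintain than one large regex. This is only a sketch in Python: it assumes the XX123456789GB shape with exactly nine digits, uses the prefix list above (which may be incomplete), and the function name is hypothetical.

```python
import re
from typing import Optional

# Prefixes listed above as Special Delivery; the list may be incomplete.
SPECIAL_DELIVERY_PREFIXES = frozenset(
    "AD AE AF AJ AK AR AZ BP CX DS EP HC HP KC KG "
    "KH KI KJ KQ KU KV KW KY KZ PW SA SC SG SH SI "
    "SJ SL SP SQ SU SW SY SZ TX WA WH XQ WZ".split()
)
# Assumed shape: two letters, nine digits, "GB" (as in XX123456789GB).
ROYAL_MAIL_REF = re.compile(r"([A-Z]{2})\d{9}GB")


def royal_mail_service(reference: str) -> Optional[str]:
    """Guess the service level from the prefix, or None if the shape is wrong."""
    match = ROYAL_MAIL_REF.fullmatch(reference.replace(" ", "").upper())
    if match is None:
        return None
    if match.group(1) in SPECIAL_DELIVERY_PREFIXES:
        return "Special Delivery"
    return "Recorded Delivery"
```

Adding a newly discovered prefix then only means extending the set, not rewriting the expression.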

Here are some sample numbers from the main US carriers:
USPS:
70160910000108310009 (certified)
23153630000057728970 (signature confirmation)
RE360192014US (registered mail)
EL595811950US (priority express)
9374889692090270407075 (regular)
FEDEX:
810132562702 (all seem to follow same pattern regardless)
795223646324
785037759224
UPS:
K2479825491 (UPS ground)
J4603636537 (UPS next day express)
1Z87585E4391018698 (regular)
Patterns I am using (PHP code). Yes, I gave up and started testing against all the patterns at my disposal. I had to write the second UPS one myself.
public function getCarrier($trackingNumber) {
    $matchUPS1 = '/\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b/';
    $matchUPS2 = '/^[kKJj]{1}[0-9]{10}$/';

    $matchUSPS0 = '/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)/';
    // Stray spaces before the ^ anchors removed; with them, those alternatives could never match.
    $matchUSPS1 = '/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)|(\b\d{26}\b)|^E\D{1}\d{9}\D{2}$|^9\d{15,21}$|^91[0-9]+$|^[A-Za-z]{2}[0-9]+US$/i';
    $matchUSPS2 = '/^E\D{1}\d{9}\D{2}$|^9\d{15,21}$/';
    $matchUSPS3 = '/^91[0-9]+$/';
    $matchUSPS4 = '/^[A-Za-z]{2}[0-9]+US$/';

    $matchFedex1 = '/(\b96\d{20}\b)|(\b\d{15}\b)|(\b\d{12}\b)/';
    $matchFedex2 = '/\b((98\d\d\d\d\d?\d\d\d\d|98\d\d) ?\d\d\d\d ?\d\d\d\d( ?\d\d\d)?)\b/';
    $matchFedex3 = '/^[0-9]{15}$/';

    if (preg_match($matchUPS1, $trackingNumber) ||
        preg_match($matchUPS2, $trackingNumber)) {
        return 'UPS';
    }
    if (preg_match($matchUSPS0, $trackingNumber) ||
        preg_match($matchUSPS1, $trackingNumber) ||
        preg_match($matchUSPS2, $trackingNumber) ||
        preg_match($matchUSPS3, $trackingNumber) ||
        preg_match($matchUSPS4, $trackingNumber)) {
        return 'USPS';
    }
    if (preg_match($matchFedex1, $trackingNumber) ||
        preg_match($matchFedex2, $trackingNumber) ||
        preg_match($matchFedex3, $trackingNumber)) {
        return 'FedEx';
    }
    // DHL: no pattern yet.
    return null;
}

Been researching this for a while, and made these based mostly on the answers here.
These should cover everything, without being too lenient.
UPS:
/^(1Z\s?[0-9A-Z]{3}\s?[0-9A-Z]{3}\s?[0-9A-Z]{2}\s?[0-9A-Z]{4}\s?[0-9A-Z]{3}\s?[0-9A-Z]|[\dT]\d{3}\s?\d{4}\s?\d{3})$/i
USPS:
/^(EA|EC|CP|RA)\d{9}(\D{2})?$|^(7\d|03|23|91)\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}(\s\d{2})?$|^82\s?\d{3}\s?\d{3}\s?\d{2}$/i
FEDEX:
/^((96|98)\d{5}\s?\d{4}|(96|98)\d{2})\s?\d{4}\s?\d{4}(\s?\d{3})?$/i

I'm working in an Angular 2+ app and just put together a component to handle common US tracking numbers. It tests them using standard JavaScript RegExps that I put together from two references (cited in the code comment below) and sets the href on an anchor tag to the tracking URL if the number is recognized. You don't have to be using Angular or TypeScript to adapt this to your application. I tested it with different dummy numbers and it seems to work so far. Please note, you can also swap the null in the last else statement for the URL in the inline comment, and it will send you to a Google search instead.
If you have any feedback (or if your tracking numbers don't work), please let me know and I will update the answer. Thanks!
USAGE IN HTML:
<app-tracking-number [trackNum]="myTrackingNumberInput"></app-tracking-number>
COMPONENT .TS
import { Component, OnInit, Input } from '@angular/core';
@Component({
selector: 'app-tracking-number',
templateUrl: './tracking-number.component.html',
styleUrls: ['./tracking-number.component.scss']
})
export class TrackingNumberComponent implements OnInit {
@Input() trackNum:string;
trackNumHref:string = null;
// Carrier tracking numbers patterns from https://www.iship.com/trackit/info.aspx?info=24 AND https://www.canadapost.ca/web/en/kb/details.page?article=how_to_track_a_packa&cattype=kb&cat=receiving&subcat=tracking
isUPS:RegExp = new RegExp('^1Z[A-HJ-NPR-Z0-9]{16}$'); // UPS tracking numbers usually begin with "1Z", contain 18 characters, and do not contain the letters "O", "I", or "Q".
isFedEx:RegExp = new RegExp('^[0-9]{12}$|^[0-9]{15}$'); // FedEx Express tracking numbers are normally 12 digits long and do not contain letters AND FedEx Ground tracking numbers are normally 15 digits long and do not contain letters.
isUSPS:RegExp = new RegExp('^[0-9]{20,22}$|^[A-Z]{2}[0-9A-Z]{9}US$'); // USPS Tracking numbers are normally 20-22 digits long and do not contain letters AND USPS Express Mail tracking numbers are normally 13 characters long, begin with two letters, and end with "US".
isDHL:RegExp = new RegExp('^[0-9]{10,11}$'); // DHL tracking numbers are normally 10 or 11 digits long and do not contain letters.
isCAPost:RegExp = new RegExp('^[0-9]{16}$|^[A-Z]{2}[0-9]{9}[A-Z]{2}$'); // 16 numeric digits (0000 0000 0000 0000) AND 13 numeric and alphabetic characters (AA 000 000 000 AA).
constructor() { }
ngOnInit() {
this.setHref();
}
setHref() {
if(!this.trackNum) this.trackNumHref = null;
else if(this.isUPS.test(this.trackNum)) this.trackNumHref = `https://wwwapps.ups.com/WebTracking/processInputRequest?AgreeToTermsAndConditions=yes&loc=en_US&tracknum=${this.trackNum}&Requester=trkinppg`;
else if(this.isFedEx.test(this.trackNum)) this.trackNumHref = `https://www.fedex.com/apps/fedextrack/index.html?tracknumber=${this.trackNum}`;
else if(this.isUSPS.test(this.trackNum)) this.trackNumHref = `https://tools.usps.com/go/TrackConfirmAction?tLabels=${this.trackNum}`;
else if(this.isDHL.test(this.trackNum)) this.trackNumHref = `http://www.dhl.com/en/express/tracking.html?AWB=${this.trackNum}&brand=DHL`;
else if(this.isCAPost.test(this.trackNum)) this.trackNumHref =`https://www.canadapost.ca/trackweb/en#/search?searchFor=${this.trackNum}`;
else this.trackNumHref = null; // Google search as fallback... `https://www.google.com/search?q=${this.trackNum}`;
}
}
COMPONENT .HTML
<a *ngIf="trackNumHref" [href]="trackNumHref" target="_blank">{{trackNum}}</a>
<span *ngIf="!trackNumHref">{{trackNum}}</span>

Here is a great resource which captures just about all possibilities and is as tight as I have found:
https://andrewkurochkin.com/blog/code-for-recognizing-delivery-company-by-track
string[] upsPattern = new string[]
{
"^(1Z)[0-9A-Z]{16}$",
"^(T)+[0-9A-Z]{10}$",
"^[0-9]{9}$",
"^[0-9]{26}$"
};
string[] uspsPattern = new string[]
{
"^(94|93|92|94|95)[0-9]{20}$",
"^(94|93|92|94|95)[0-9]{22}$",
"^(70|14|23|03)[0-9]{14}$",
"^(M0|82)[0-9]{8}$",
"^([A-Z]{2})[0-9]{9}([A-Z]{2})$"
};
string[] fedexPattern = new string[]
{
"^[0-9]{20}$",
"^[0-9]{15}$",
"^[0-9]{12}$",
"^[0-9]{22}$"
};

You can try these (not guaranteed):
UPS:
\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b
UPS:
\b(1Z ?\d\d\d ?\d\w\w ?\d\d ?\d\d\d\d ?\d\d\d ?\d|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d)\b
USPost:
\b(\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d|\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d ?\d\d\d\d)\b
But please test before you use them. I recommend RegexBuddy.

I use these in an eBay application I wrote:
USPS Domestic:
/^91[0-9]+$/
USPS International:
/^[A-Za-z]{2}[0-9]+US$/
FedEx:
/^[0-9]{15}$/
However, this might be eBay/Paypal specific, as all USPS Domestic labels start with "91". All USPS International labels start with two characters and end with "US". As far as I know, FedEx just uses 15 random digits.
(Please note that these regular expressions assume all spaces are removed. It would be fairly easy to allow for spaces though)

Check out this github project that lists a lot of PHP tracking regexes. https://github.com/darkain/php-tracking-urls

Here are the ones I am now using in my Java app. These are determined by my experience of sucking tracking numbers out of shipping confirmation emails from a whole pile of drop ship services. I just made a new USPS one from scratch since none of the ones I found worked for some of my numbers based on example numbers on the USPS site. These only work for US tracking codes because we only sell in the US.
private final Pattern UPS_TRACKING_NUMBER =
Pattern.compile("[^A-Za-z0-9](1Z[A-Za-z0-9]{6,})",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
private final Pattern FEDEX_TRACKING_NUMBER =
Pattern.compile("\\b((96|98)\\d{18,20}|\\d{15}|\\d{12})\\b",
Pattern.MULTILINE);
private final Pattern USPS_TRACKING_NUMBER =
Pattern.compile("\\b(9[2-4]\\d{20}(?:(?:EA|RA)\\d{9}US)?|(?:03|23|14|70)\\d{18})\\b",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

I believe FedEx is 12 digits:
^[0-9]{12}$

I also came across tracking numbers from FedEx with 22 digits recently, so watch out!
I haven't found any good reference for the FedEx's general format yet.
FedEx Example #: 9612019059803563050071

Late to the party; however, the pattern below also works with 26-character USPS numbers.
/(\b\d{30}\b)|(\b91\d+\b)|(\b\d{20}\b)|(\b\d{26}\b)|^E\D{1}\d{9}\D{2}$|^9\d{15,21}$|^91[0-9]+$|^[A-Za-z]{2}[0-9]+US$/i

I know there are already lots of answers and that this was asked a long time ago, but I don't see a single one that addresses all the possible USPS tracking numbers with a single expression.
Here is what I came up with:
((\d{4})(\s?\d{4}){4}\s?\d{2})|((\d{2})(\s?\d{3}){2}\s?\d{2})|((\D{2})(\s?\d{3}){3}\s?\D{2})
See it working here: http://regexr.com/3e61u

//UPS - UNITED PARCEL SERVICE
final String UPS = "\\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|T\\d{3} ?\\d{4} ?\\d{3})\\b";
//USPS - UNITED STATES POSTAL SERVICE - FORMAT 1
final String USPS_FORMAT1 = "\\b((420 ?\\d{5} ?)?(91|92|93|94|01|03|04|70|23|13)\\d{2} ?\\d{4} ?\\d{4} ?\\d{4} ?\\d{4}( ?\\d{2,6})?)\\b";
//USPS - UNITED STATES POSTAL SERVICE - FORMAT 2
final String USPS_FORMAT2 = "\\b((M|P[A-Z]?|D[C-Z]|LK|E[A-C]|V[A-Z]|R[A-Z]|CP|CJ|LC|LJ) ?\\d{3} ?\\d{3} ?\\d{3} ?[A-Z]?[A-Z]?)\\b";
//USPS - UNITED STATES POSTAL SERVICE - FORMAT 3
final String USPS_FORMAT3 = "\\b(82 ?\\d{3} ?\\d{3} ?\\d{2})\\b";
//FEDEX - FEDERAL EXPRESS
final String FED_EX = "\\b(((96\\d\\d|6\\d)\\d{3} ?\\d{4}|96\\d{2}|\\d{4}) ?\\d{4} ?\\d{4}( ?\\d{3})?)\\b";
//ONTRAC
final String ONTRAC = "\\b(C\\d{14})\\b";
//DHL
final String DHL = "\\b(\\d{4}[- ]?\\d{4}[- ]?\\d{2}|\\d{3}[- ]?\\d{8}|[A-Z]{3}\\d{7})\\b";
Sample tracking number
UPS
//"1Z 999 AA1 01 2345 6784"
Fed-ex
// "449044304137821"
USPS
//"9400 1000 0000 0000 0000 00"
// Compile the pattern for the carrier being tested (UPS here, to match the sample number).
final Pattern pattern = Pattern.compile(UPS, Pattern.CASE_INSENSITIVE |
        Pattern.UNICODE_CASE);
final Matcher matcher = pattern.matcher("1Z 999 AA1 01 2345 6784");
if (matcher.find()) {
    System.out.println(true);
}
It works in Java and on Android.
https://regex101.com/
You can use this site to convert a regex to another language's syntax and to generate code as well.

Here's an up-to-date regex for UPS. It works with standard and Mail Innovation type tracking numbers:
\b(1Z ?[0-9A-Z]{3} ?[0-9A-Z]{3} ?[0-9A-Z]{2} ?[0-9A-Z]{4} ?[0-9A-Z]{3} ?[0-9A-Z]|[\dT]\d\d\d ?\d\d\d\d ?\d\d\d|\d\d\d ?\d\d\d ?\d\d\d|\d{22,34})\b

I solved this by using an external API: https://shippingcarrierdetector.com/
If your project allows external APIs, it might be a much quicker and easier solution than trying to build the logic yourself.

Related

How do I find a change point in a time series in Power BI

I have a group of people who started receiving a specific type of social benefit called BenefitA. I am interested in knowing which social benefits (if any) the people in the group received immediately before they started receiving BenefitA.
My ideal result would be a table with the number of people who were receiving BenefitB, receiving BenefitC, or not receiving any benefit ("BenefitNon") immediately before they started receiving BenefitA.
My data is organized as a relational database with a fact table containing an ID for each person and several dimension tables connected to the fact table. The important ones here are DimDreamYdelse (showing the type of benefit received) and DimDreamTid (showing week and year). Here is an example of the raw data.
Data Example
I'm not sure how to approach this in Power BI, as I am fairly new to the program. Any advice is most welcome.
I have tried to solve the problem in SQL, but since I need this as part of a running report, I need to do it in Power BI. This bit of code might, however, give some context for what I want to do.
USE FLISDATA_Beskaeftigelse;
SELECT dbo.FactDream.DimDreamTid, dbo.FactDream.DimDreamBenefit, dbo.DimDreamTid.Aar, dbo.DimDreamTid.UgeIAar, dbo.DimDreamBenefit.Benefit
FROM dbo.FactDream INNER JOIN
dbo.DimDreamTid ON dbo.FactDream.DimDreamTid = dbo.DimDreamTid.DimDreamTidID INNER JOIN
dbo.DimDreamYdelse ON dbo.FactDream.DimDreamBenefit = dbo.DimDreamYdelse.DimDreamBenefitID
WHERE (dbo.DimDreamYdelse.Ydelse LIKE 'Benefit%') AND (dbo.DimDreamTid.Aar = '2019')
ORDER BY dbo.DimDreamTid.Aar, dbo.DimDreamTid.UgeIAar

I suggest using Power Query to transform your table into a form more suitable for your analysis. Things would be much easier if each row of the table represented a "change" of benefit plan, like this.
| Person ID | Benefit From | Benefit To | Date |
|-----------|--------------|------------|------------|
| 15 | BenefitNon | BenefitA | 2019-07-01 |
| 15 | BenefitA | BenefitNon | 2019-12-01 |
| 17 | BenefitC | BenefitA | 2019-06-01 |
| 17 | BenefitA | BenefitB | 2019-08-01 |
| 17 | BenefitB | BenefitA | 2019-09-01 |
| ...
Then you can simply count the rows with COUNTROWS(BenefitChanges), filtering or slicing on both Benefit From and Benefit To.
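The reshaping itself would be done in Power Query, but the underlying logic can be sketched in a few lines of Python to make the idea concrete. Everything here is an assumption for illustration: the record layout (one row per person per week), the sample values, and the function name.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical weekly records: (person_id, week, benefit), one row per person per week.
records = [
    (15, "2019-26", "BenefitNon"),
    (15, "2019-27", "BenefitA"),
    (15, "2019-28", "BenefitA"),
    (17, "2019-22", "BenefitC"),
    (17, "2019-23", "BenefitA"),
    (17, "2019-31", "BenefitB"),
]


def benefit_changes(rows):
    """Return (person_id, benefit_from, benefit_to, week) for every change."""
    changes = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for person_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, week, benefit in group:
            if previous is not None and benefit != previous:
                changes.append((person_id, previous, benefit, week))
            previous = benefit
    return changes


changes = benefit_changes(records)

# Count what people received immediately before starting BenefitA.
before_a = {}
for _, benefit_from, benefit_to, _ in changes:
    if benefit_to == "BenefitA":
        before_a[benefit_from] = before_a.get(benefit_from, 0) + 1

print(before_a)
```

The sketch relies on zero-padded week labels so that a plain string sort puts the weeks in chronological order.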

Keep words starting with character/letter in Pandas | Python

I'm not sure how to do this in a DataFrame context.
I have the table below with text information:
TEXT |
-------------------------------------------|
"Get some new #turbo #stacks today!" |
"Is it one or three? #phone" |
"Mayhaps it be three afterall..." |
"So many new issues with phone... #iphone" |
And I want to edit it down to where only the words with a '#' symbol are kept, like in the result below.
TEXT |
-----------------|
"#turbo #stacks" |
"#phone" |
"" |
"#iphone" |
In some cases, I'd also like to know whether it's possible to eliminate the empty rows, either by checking for NaN or with some other kind of condition, to get this result:
TEXT |
-----------------|
"#turbo #stacks" |
"#phone" |
"#iphone" |
Python 2.7 and pandas for this.

You could try using regex and extractall:
df.TEXT.str.extractall(r'(#\w+)').groupby(level=0)[0].apply(' '.join)
Output:
0 #turbo #stacks
1 #phone
3 #iphone
Name: 0, dtype: object
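Note that row 2 is missing from the output because it contains no hashtag, which also covers the "drop empty rows" case. For comparison, here is the same regex applied in plain Python without pandas, as a rough sketch; the list name and variable names are assumptions, and it shows both the "keep empty rows" and "drop empty rows" results.

```python
import re

texts = [
    "Get some new #turbo #stacks today!",
    "Is it one or three? #phone",
    "Mayhaps it be three afterall...",
    "So many new issues with phone... #iphone",
]

# Keep only the words that start with '#', joined back into one string per row.
hashtags = [" ".join(re.findall(r"#\w+", text)) for text in texts]

# Drop the rows that ended up empty.
non_empty = [tags for tags in hashtags if tags]

print(hashtags)
print(non_empty)
```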

Regex to match last sentence of a line

Got some text:
[23/07 | DEV | FARO | QC Billable | #2032] Unable to Load label
[30/07 | QC | ROLAWN ] Selling products as a bundle
[11/08 | EST | QC BILLABLE | #2015 ISUOG ] On Demand website looping
[05/08 | EST | ROLAWN | Problems with 'find a stockist'
[29/07 | DEV | QUBA] Blog comments loading to error
[24/07 | FROG | EST| QC BILLABLE #2033] Carousel banner not working correctly
I'm trying to match the last sentence at the end of each line so the matches are as follows:
Unable to Load label
Selling products as a bundle
On Demand website looping
Problems with 'find a stockist'
Blog comments loading to error
Carousel banner not working correctly
Unfortunately, I can't depend on the structure of the line to conform, but the information I'm trying to extract should always be the last sentence. I've tried quite a few different things, but I'm struggling here.

If there is always some kind of non-word character before the last sentence, try:
[\w\s']+$
DEMO

Edit: The answer above by m.cekiera [\w\s']+$ is better.
](.+)$
Here's a pretty naive solution: https://regex101.com/r/yT8jJ7/1.
If you give more details about the actual structure it could be refined.
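To try the `[\w\s']+$` pattern against the sample lines from the question, a short Python sketch follows. It applies the pattern one line at a time (since `\s` also matches newlines) and strips the leading space that the match picks up; the variable names are assumptions.

```python
import re

lines = [
    "[23/07 | DEV | FARO | QC Billable | #2032] Unable to Load label",
    "[30/07 | QC | ROLAWN ] Selling products as a bundle",
    "[11/08 | EST | QC BILLABLE | #2015 ISUOG ] On Demand website looping",
    "[05/08 | EST | ROLAWN | Problems with 'find a stockist'",
    "[29/07 | DEV | QUBA] Blog comments loading to error",
    "[24/07 | FROG | EST| QC BILLABLE #2033] Carousel banner not working correctly",
]

# Longest run of word characters, whitespace, and apostrophes that reaches the end.
LAST_SENTENCE = re.compile(r"[\w\s']+$")

titles = [LAST_SENTENCE.search(line).group().strip() for line in lines]

for title in titles:
    print(title)
```

A title containing any other punctuation (a comma or hyphen, say) would be cut short, so the character class may need extending for real data.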

cucumber Repeat steps

I am learning Cucumber and trying to write a feature file.
The following is my feature file.
Feature: Doctors handover Notes Module
Scenario: Search for patients on the bases of filter criteria
Given I am on website login page
When I put username, password and select database:
| Field | Value |
| username | test |
| password | pass |
| database | test|
Then I login to eoasis
Then I click on doctors hand over notes link
And I am on doctors handover notes page
Then I select sites, wards, onCallTeam, grades,potential Discharge, outstanding task,High priority:
| siteList | wardsList | onCallTeamList | gradesList | potentialDischargeCB | outstandingTasksCB | highPriorityCB |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | null | null | null | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | null | null | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | null | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | true | null | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | true | true | null |
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | GENERAL MEDICINE | CONSULTANT | true | true | true |
Then I click on search button
Then I should see search results
I want to repeat the last three steps: select the search criteria, click the search button, and then check the search results. How should I break up this feature file? If I use a Scenario Outline, there would be two different scenarios, one for login and one for the search criteria. Is that fine? Will the session be maintained in that case? What's the best way to write such a feature file?
Or is this the right way to write it?

I don't think we can have multiple example sets in a Scenario Outline.
Most of the scenario steps in the example are too procedural to each have their own step.
The first three steps could be reduced to something like:
Given I am logged into eoasis as a <user>
The code goes in the step definition, which could call a separate login method that takes care of entering the username and password and selecting the database.
Another rule is to avoid statements like "When I click the doctor's handover link". The keyword to avoid here is "click". Today it's a click; tomorrow it could be a drop-down or a button. So the focus should be on the functional expectation of the user, which is viewing the handover notes. So we modify this to
When I view the doctor's handover notes link
To summarize, this is how I would write this test.
Scenario Outline: Search for patients on the basis of filter criteria
Given I am logged into eoasis as a <user>
When I view the doctor's handover notes link
And I select sites, wards, onCallTeam, grades, potential Discharge, outstanding task, High priority
And perform a search
Then I should see the search results
Examples:
|sites |wards |onCallTeam |grades |potential Discharge |outstanding task |High priority|
| THE INFIRMARY | INFIRMARY WARD 9 - ASSESSMENT | null | null | null | null | null |

This really is the wrong way to write features. This feature is very imperative; it's all about HOW you do something. What a feature should do is explain WHY you are doing something.
Another bad thing this feature does is mix up the details of two different operations: signing in and searching for patients. Write a feature for each one, e.g.
Feature: Signing in
As a doctor
I want my patients' data to only be available if I sign in
So I ensure their confidentiality
Scenario: Sign in
Given I am a doctor
When I sign in
Then I should be signed in
Feature: Search for patients
Explain why searching for patients gives value to the doctor
...
You should focus on the name of the feature and the bit at the top that explains why this has value first. If you do that well then the scenarios are much easier to write (look how simple my sign in scenario is).
The art of writing features is doing this bit well, so that you end up with simple scenarios.

The best way to generate path pattern for materialized path tree structures

Browsing through examples all over the web, I can see that people generate the path using something like "parent_id.node_id". Examples:-
uid | name | tree_id
--------------------
1 | Ali | 1.
2 | Abu | 2.
3 | Ita | 1.3.
4 | Ira | 1.3.
5 | Yui | 1.3.4
But as explained in this question (Sorting tree with a materialized path?), zero-padding the tree_id makes it easy to sort by creation order.
uid | name | tree_id
--------------------
1 | Ali | 0001.
2 | Abu | 0002.
3 | Ita | 0001.0003.
4 | Ira | 0001.0003.
5 | Yui | 0001.0003.0004
Using a fixed-length string like this also makes it easy for me to calculate the level: length(tree_id)/5. What I'm worried about is that it would limit me to a maximum of 9999 users in total rather than 9999 per branch. Am I right here?
9999 | Tar | 0001.9999
10000 | Tor | 0001.??

You are correct -- zero-padding each node ID would allow you to sort the entire tree quite simply. However, you have to make the padding width match the upper limit of digits of the ID field, as you have pointed out in your last example. E.g., if you're using an int unsigned field for your ID, the highest value would be 4,294,967,295. This is ten digits, meaning that the record set from your last example might look like:
uid | name | tree_id
9999 | Tar | 0000000001.0000009999
10000 | Tor | 0000000001.0000010000
As long as you know you're not going to need to change your ID field to bigint unsigned in the future, this will continue to work, though it might be a bit data-hungry depending on how huge your tables get. You could shave off two bytes per node ID by storing the values in hexadecimal, which would still be sorted correctly in a string sort:
uid | name | tree_id
9999 | Tar | 00000001.0000270F
10000 | Tor | 00000001.00002710
I can imagine this would make things a real headache when trying to update the paths (pruning nodes, etc) though.
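To illustrate why the padding matters, here is a small Python sketch that builds both the decimal and the hexadecimal paths and compares string sort order with and without padding. The helper names and default widths are assumptions chosen to match the examples above.

```python
def decimal_path(ids, width=10):
    """Join node IDs into a zero-padded decimal path, e.g. 0000000001.0000009999."""
    return ".".join(str(node_id).zfill(width) for node_id in ids)


def hex_path(ids, width=8):
    """Join node IDs into a zero-padded uppercase hexadecimal path."""
    return ".".join(format(node_id, "0{}X".format(width)) for node_id in ids)


branches = [[1, 10000], [1, 9999]]

# Without padding, a plain string sort puts 10000 before 9999.
unpadded = sorted(".".join(str(n) for n in ids) for ids in branches)

# With fixed-width padding, string order agrees with numeric order.
padded = sorted(decimal_path(ids) for ids in branches)
padded_hex = sorted(hex_path(ids) for ids in branches)

print(unpadded)
print(padded)
print(padded_hex)
```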
You can also create extra fields for sorting, e.g.:
uid | name | tree_id | name_sort
9999 | Tar | 00000001.0000270F | Ali.Tar
10000 | Tor | 00000001.00002710 | Ali.Tor
There are limitations, however, as laid out in an answer to a similar materialized path sorting question. The name field would have to be padded to a set length (fortunately, in your example, each name seems to be three characters long), and it would take up a lot of space.
In conclusion, given the above issues, I've found that the most versatile way to do sorting like this is to simply do it in your application logic -- say, using a recursive function that builds a nested array, sorting the children of each node as it goes.
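A minimal sketch of that application-side approach in Python might look like the following. The row layout (dicts with `uid`, `name`, and `parent_id` keys) and the sample data are assumptions for illustration; it rescans the full row list for every node, which is fine for small trees but would want an index by parent for large ones.

```python
def build_tree(rows, parent_id=None):
    """Recursively nest rows under their parent, sorting siblings by name."""
    children = [row for row in rows if row["parent_id"] == parent_id]
    children.sort(key=lambda row: row["name"])
    return [
        {
            "uid": row["uid"],
            "name": row["name"],
            "children": build_tree(rows, row["uid"]),
        }
        for row in children
    ]


# Hypothetical rows loosely based on the question's example table.
rows = [
    {"uid": 1, "name": "Ali", "parent_id": None},
    {"uid": 2, "name": "Abu", "parent_id": None},
    {"uid": 3, "name": "Ita", "parent_id": 1},
    {"uid": 4, "name": "Ira", "parent_id": 1},
    {"uid": 5, "name": "Yui", "parent_id": 3},
]

tree = build_tree(rows)
print([node["name"] for node in tree])
```

Because the sort key is just a function, switching from name order to creation order (sorting on `uid`) is a one-line change and needs no padding at all.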
