Match all characters in group except for first and last occurrence - regex

Say I request
parent/child/child/page-name
in my browser. I want to extract the parent, children as well as page name. Here are the regular expressions I am currently using. There should be no limit as to how many children there are in the url request. For the time being, the page name will always be at the end and never be omitted.
^([\w-]{1,}){1} -> Match parent (returns 'parent')
(/(?:(?!/).)*[a-z]){1,}/ -> Match children (returns /child/child/)
[\w-]{1,}(?!.*[\w-]{1,}) -> Match page name (returns 'page-name')
The more I play with this, the more I feel how clunky this solution is. This is for a small CMS I am developing in ASP Classic (:(). It is sort of like the MVC routing paths. But instead of calling controllers and functions based on the URL request. I would be travelling down the hierarchy and finding the appropriate page in the database. The database is using the nested set model and is linked by a unique page name for each child.
I have tried using the split function to split with a / delimiter however I found I was nested so many split statements together it became very unreadable.
All said, I need an efficient way to parse out the parent, children as well as page name from a string. Could someone please provide an alternative solution?
To be honest, I'm not even sure if a regular expression is the best solution to my problem.
Thank you.

You could try using:
^([\w-]+)(/.*/)([\w-]+)$
And then access the three matching groups created using Match.SubMatches. See here for more details.
EDIT
Actually, assuming that you know that [\w-] is all that is used in the names of the parts, you can use ^([\w-]+)(.*)([\w-]+)$ instead and it will handle the no-child case fine by itself as well.

Related

Shopify regular expressions for checkout process

I am working with the Google Ads team in my company on a Shopify store and they asked me for some regular expressions for the several steps of the checkout process. I created them and everything was running fine, until the guys noticed that sometimes Analytics added a _ga paremeter to the URL query parameters.
My original expressions are:
1. When in cart - no problem here
\/cart
2. First step of checkout - Contact Information - In several lines for easier reading
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)$
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=contact_information
)
In this part I added the step=contact_information as an OR option. It isn't normally there except for when you go back to contact information it is added to the URL as step. I know this is not the ideal way, but I am far from fluent in regex.
3. Shipping information
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=shipping_method
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)?(.*)&step=shipping_method
)
In this part it always has step=shipping_method but it can also have previous_step=contact_information. This is also not ideal, but I am not sure how to do it.
4. Payment information
(
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\?step=payment_method
|
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)?(.*)&step=payment_method
)
The same as step 3, in this case it always has step=payment_method but it can also have previous_step=shipping_method. As points 2 and 3, not ideal.
5. Processing - this part works fine, because I am not interested in the query parameters
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\/processing
6. Thank you page - this also works fine, because I am not interested in the query parameters
\/([0-9]*)\/checkouts\/([a-z0-9\-]*)\/thank_you
Issue with _ga parameter
Those regular expressions work fine with the regular URLs, but when I add the _ga parameter to the URL they don't match. I think there was a way to match query parameters, but I am not sure how to match certain and exclude others.
The _ga parameter normally persists on the next steps
The list of all the possible matches for points 2., 3. and 4.:
Contact information without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=contact_information
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=contact_information
Shipping method without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=shipping_method&previous_step=contact_information
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=shipping_method&previous_step=contact_information
Payment method without and with _ga
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=payment_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?step=payment_method&previous_step=shipping_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=payment_method
/25931284564/checkouts/df24e48ecc81f767583c4a26680bcb82?_ga=2.150710640.738515769.1576779089-71346777.1571176760%26_gac%3D1.16451458.1576260301.EAIaIQobChMI9v2c5Zqz5gIVr__jBx1VAgxPEAAYBCAAEgLccPD_BwE&locale=es&step=payment_method&previous_step=shipping_method
Any ideas how I could solve this? I am pretty sure it's simple, but my brain just doesn't get around more complex regular expressions :)
UPDATE
Just to clear this up a bit more, what I need to achieve with the regular expressions is to identify specifically the step of the funnel.
The Google Ads guys from my team are creating a funnel in Analytics and they add the corresponding steps from the checkout as stages of the funnel.
So basically I just need my regexes to be able to work with or without the _ga query, BUT always detecting a specific step.
UPDATE 2
I added all the possible matches. I need to be able to identify the specific step through the regular expression. So basically I need one regular expression for contact information, one for shipping method and one for payment method, each identifying only the specific step with or without _ga in the URL.
I believe for the checkout url you can simply use this regex:
/([0-9]+)/checkouts/([a-z0-9-]+)(?:.*step=([a-z0-9_-]+))?
no matter if the url is with/without _ga parameter.
Basically it will provide you three groups in a match, the third group will contain step parameter value, e.g.: contact_information
Example:
https://regex101.com/r/C1GuDY/1

What's the correct way to create a REST service that allows for different types of identifiers?

I need to create a RESTful webservice that allows for addressing entities by using different types of IDs. I will give you an example based on books (which is not what I need to process but I want to build a common understanding this way).
Books can be identifier by:
ISBN 13
ID
title
I can create a book by POSTing to /api/v1/books/The%20Bible. This book can then later be addressed by its ISBN /api/v1/books/12312312301 or ID /api/v1/books/A9471IZ1. If I implemented it this way I would need to analyze whatever identifier gets sent and convert it internally.
Is it 'legal' to add the type of identifier to the URL ? Like /api/v1/books/title/The%20Bible?
It seems that what you need is not simply retrieving resources, but searching for them by certain criteria (in your case, by ISBN, title or ID). In that case, rather than complicate your /books endpoint (which, ideally, should only returns books by ID), I'd create a separate /search function. You can then use it search for books by any field.
For example, you would have:
GET /search?title=bible
GET /search?isbn=12312312301
It can even be easily expanded to add more fields later on.
First: A RESTful URl should only contain nouns and not verbs. You can find a lot of best-practices online, as example: RESTful API Design: nouns are good, verbs are bad
One approach would be to detect the id/identifier in code.
The pattern would be, as you already mentioned:
GET /api/v1/books/{id}, like /api/v1/books/12312312301 or /api/v1/books/The%20Bible
Another approach, similar to this.lau_, would be with a query parameter. But I suggest to add the query parameter to the books URL (because only nouns, no verbs):
GET /api/v1/books?isbn=12312312301
The better solution? Not sure…
Because you are selecting “one book by id” (except title), rather than performing a query/search, I prefer the first approach (…/books should return “a collection of books” and .../books/{id} should return only one book).
But maybe someone has a better approach/idea?
Edit:
I suggest to avoid adding the identifier to the URL, it has “bad smell”. But is also a possible approach and I saw that a lot in other APIs. Let’s see if I can find some information on that, if its “ok” or should be avoided.
Edit 2:
See REST API DESIGN - Getting a resource through REST with different parameters but same url pattern and REST - supporting multiple possible identifiers

Express 4 router implying parameter regex?

I have got a simple issue where I have 2 routes which do different things, one is:
blah\groups\:group_id
and then
blah\groups\count
Now the first returns a specific group, the latter returns the amount of groups the user has access to. Now the problem is that the first route is hit even when I use the 2nd route url. Which makes sense as it doesnt know that there is a different route for count. I was looking at doing regex to tell it to use group_id if it does not contain count however then I cannot use the router.param with it, so is there a way to tell express to use count first then if that isnt matched try the group_id one? or if not any way to keep the parameter name but attach some regex so it has the context of what to look for but retains the parameter name?
Routes work like middleware and are executed in the order they are placed.
Having blah\groups\count before blah\groups\:group_id will make sure a match of count comes before :group_id.

Hierarchical URLs in Django

Is there a way to implement hierarchical query pattern in Django? As far as I know, the framework only allows to route to views by parsing URLs of a specific format, like:
/customers/{order} -> customer.views.show_orders(order)
But what if I need something like this:
/book1/chapter1/section1/paragraph1/note5 -> notes.view.show(note_id)
where note_id is the id of the last part of the URL, but the URL could have different number of components:
/book1/chapter1
/book1/chapter1/section1
etc.
Each time, it would point to the relevant part of the book depth depending on the depth. Is this doable?
I know there is this: https://github.com/MrKesn/django-mptt-urls, but I am wondering if there is another solution. This isn't ideal for me.
Django URLs are just regular expressions, so the simplest way would be to just ignore everything prior to the "note" section of the URL. For example:
url(r'^.*/note(?P<note_id>[0-9]+)$', 'notes.view.show'),
However, this would ignore the book, chapter, paragraph components. Which would mean your notes would need unique ids across the system, not just within the book. If you needed to capture any number of the interim parts it would be more complicated.
I can't confirm this will work right now, but using non-capture groups in regular expressions, you should be able to capture an optional book and chapter like so:
url(r'^(?:book(?P<book_id>[0-9]+)/)?(?:chapter(?P<chapter_id>[0-9]+)/)?note(?P<note_id>[0-9]+)$', 'notes.view.show'),
Use named groups to accomplish this: https://docs.djangoproject.com/en/dev/topics/http/urls/#named-groups
url(r'^book(?P<book_id>\d+)/chapter(?P<chapter_id>\d+)/section(?P<section_id>\d+)/paragraph(?P<paragraph_id>\d+)/note(?P<note_id>\d+)$', notes.view.show(book_id, chapter_id, section_id, paragraph_id, note_id)
For those who really need a variable-depth URL structure and need the URL to consist strictly of slugs, not IDs, knowing all the components of the URL is critical to retrieve the correct record from the database. Then, the only solution I can think of is using:
url(r'^.*/$', notes.views.show, name='show')
and then parsing the content of the URL to get the individual components after retrieving the URL in the view using the request.path call. This doesn't sound ideal, but it is a way to accomplish it.

Recursive URL Patterns CMS Style

Whenever I learn a new language/framework, I always make a content management system...
I'm learning Python & Django and I'm stuck with making a URL pattern that will pick the right page.
For example, for a single-level URL pattern, I have:
url(r'^(?P<segment>[-\w]+)/$', views.page_by_slug, name='pg_slug'),
Which works great for urls like:
http://localhost:8000/page/
Now, I'm not sure if I can get Django's URL system to bring back a list of slugs ala:
http://localhost:8000/parent/child/grandchild/
would return parent, child, grandchild.
So is this something that Django does already? Or do I modify my original URL pattern to allow slashes and extract the URL data there?
Thanks for the help in advance.
That's because your regular expression does not allow middle '/' characters. Recursive definition of url segments pattern may be possible, but anyway it would be passed as a chunk to your view function.
Try this
url(r'^(?P<segments>[-/\w]+)/$', views.page_by_slug, name='pg_slug'),
and split segments argument passed to page_by_slug() by '/', then you will get ['parent', 'child', 'grandchild']. I'm not sure how you've organized the page model, but if it is not much sophiscated, consider using or improving flatpages package that is already included in Django.
Note that if you have other kind of urls that does not indicate user-generated pages but system's own pages, you should put them before the pattern you listed because Django's url matching mechanism follows the given order.