Case Insensitive Search parameters for API endpoint


I am working on a project that integrates the PUBG API. On my site, a player can look up stats using their player name, platform and season. One issue I am facing is that the player name has to be exact and is case sensitive. I assumed that to be the case from the beginning. However, after searching for a name on this site I found that it does not require the name to be in matching case. Also, this post from the PUBG Dev community here confirmed my initial assumption. So my question is: if the PUBG API requires names to be case sensitive, how can the linked site find the player even when the name is not provided in the exact, matching case? For example:
I looked up the player name MyCholula. On the PUBG API page for player lookup, it returns the proper value. When I tried mycholula, it does not and returns a 404. On the linked site above, both variants seem to work. If spaces or other separators were involved in the name, it would be easy to convert it by assuming that each separated word is capitalized (a somewhat naive assumption, though). For this name, I don't see any way of converting mycholula to MyCholula. I also tried many other case variants on the linked site (including different user names I got from my friends) to confirm that it really returns the expected data for any casing of a user name. I also tried other sites like this one, and there it did not work, just as it does not work from the PUBG Dev API page or from my page.
I am really confused as to how they are doing it. The only explanation I can come up with is that they have the player records stored in their own database, where they can perform a regexp-based or case-insensitive search to get the actual name. However, this sounds far-fetched, since there are millions of players and it would require them to know all the player names and associated IDs. Also, as far as I know, it is not possible to use regex or other string manipulation to convert the input to the actual name, because there can be many possible case combinations (I am not an expert on regex, so I can't be definitive on this).
Any help or suggestions will be greatly appreciated. Thanks.
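To make the hypothesis above concrete, here is a minimal sketch of how a site could resolve a mistyped case if it kept its own table of names it has already seen. Nothing here is known about how the linked site actually works; the table layout, the `build_cache` and `canonical_name` helpers, and the account ID value are all hypothetical.

```python
import sqlite3


def build_cache(players):
    """Create an in-memory cache of (exact player name, account id) pairs.

    The schema is an assumption for illustration only. COLLATE NOCASE makes
    comparisons on the name column ignore ASCII case.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players ("
        "name TEXT PRIMARY KEY COLLATE NOCASE, "
        "account_id TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?)", players)
    return conn


def canonical_name(conn, typed_name):
    """Return the exact-case name stored for typed_name, or None if unknown."""
    row = conn.execute(
        "SELECT name FROM players WHERE name = ?", (typed_name,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical cached record; the account id is made up.
conn = build_cache([("MyCholula", "account.hypothetical-1")])
resolved = canonical_name(conn, "mycholula")
```

The resolved exact-case name could then be passed to the case-sensitive API lookup. Such a cache only helps for players the site has already stored, which is the limitation raised in the question.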

Related

REGEXP_MATCH multiple words in a string using CASE statement in Google DataStudio

I am using Google Data Studio to write a CASE statement that takes a multi-word string and splits it out into categories. I was asked to use REGEXP_MATCH (nothing else; I know the CONTAINS function would be easier).
I need a solution to match the following words:
HouseBrochure
home brochure
HomeBrochure
house brochure
Bathroom brochure
Bathroombrochure
FloorBrochure
floor brochure
To complicate matters, these words come in via a website request system, meaning people can request a house, bathroom and floor brochure in one request. When such requests reach my server, it compiles into a list(string) which looks like this:
# (with the pipes included)
HouseBrochure|Bathroom brochure|floor brochure
This is just an example of 1 request, there are many variations and multiple requests that come through (I've also only included a few of these brochures, there are many more)
I need to separate out all the house brochures, all the bathroom brochures and all the floor brochures etc, so I can count how many requests have been made for each brochure.
Being new to Regex, I have a basic understanding but nowhere near advanced.
My current attempt in Data studio looks like this:
CASE
WHEN REGEXP_MATCH(Event Label,'^.*(HouseBrochure.*|home brochure.*|HomeBrochure.*|house brochure.*).*$') THEN 'Home Brochure'
END
This is just for the home brochure, yet it's not working, can someone help?
Also, as an FYI, Data Studio uses RE2.

My approach would be:
convert everything to lower case (avoid messing with upper/lower case differences)
Use regex to replace variations with base form:
e.g.
(house|home)\s*brochure
replace with
HomeBrochure
Test here.
Do some counting as needed, using just the base keywords.
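The steps above can be sketched outside Data Studio. This is a Python illustration of the same idea (lowercase, match variants, count by base keyword), not a Data Studio formula; the category names and the `count_brochures` helper are assumptions made for the example.

```python
import re
from collections import Counter

# One pattern per base category; \s* tolerates "house brochure" and "HouseBrochure".
CATEGORIES = {
    "Home Brochure": re.compile(r"^(house|home)\s*brochure$"),
    "Bathroom Brochure": re.compile(r"^bathroom\s*brochure$"),
    "Floor Brochure": re.compile(r"^floor\s*brochure$"),
}


def count_brochures(event_labels):
    """Count brochure requests per category across pipe-separated labels."""
    counts = Counter()
    for label in event_labels:
        # Lowercase first so case differences no longer matter.
        for item in label.lower().split("|"):
            item = item.strip()
            for category, pattern in CATEGORIES.items():
                if pattern.match(item):
                    counts[category] += 1
                    break
    return counts


requests = ["HouseBrochure|Bathroom brochure|floor brochure", "home brochure"]
totals = count_brochures(requests)
```

Splitting on the pipe before matching means each request item is counted once, even when one label contains several brochures.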

What's the correct way to create a REST service that allows for different types of identifiers?

I need to create a RESTful webservice that allows for addressing entities by using different types of IDs. I will give you an example based on books (which is not what I need to process but I want to build a common understanding this way).
Books can be identified by:
ISBN 13
ID
title
I can create a book by POSTing to /api/v1/books/The%20Bible. This book can then later be addressed by its ISBN /api/v1/books/12312312301 or ID /api/v1/books/A9471IZ1. If I implemented it this way I would need to analyze whatever identifier gets sent and convert it internally.
Is it 'legal' to add the type of identifier to the URL ? Like /api/v1/books/title/The%20Bible?

It seems that what you need is not simply retrieving resources, but searching for them by certain criteria (in your case, by ISBN, title or ID). In that case, rather than complicate your /books endpoint (which, ideally, should only return books by ID), I'd create a separate /search function. You can then use it to search for books by any field.
For example, you would have:
GET /search?title=bible
GET /search?isbn=12312312301
It can even be easily expanded to add more fields later on.

First: A RESTful URL should only contain nouns and not verbs. You can find a lot of best practices online, for example: RESTful API Design: nouns are good, verbs are bad
One approach would be to detect the id/identifier in code.
The pattern would be, as you already mentioned:
GET /api/v1/books/{id}, like /api/v1/books/12312312301 or /api/v1/books/The%20Bible
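As a rough sketch of what "detecting the identifier in code" could look like: the formats below are assumptions, not something the question specifies. An ISBN-13 is taken to be exactly 13 digits, an internal ID is guessed to be 8 uppercase letters or digits (based on the example A9471IZ1), and everything else is treated as a title. `classify_identifier` is a hypothetical helper name.

```python
import re

ISBN13_RE = re.compile(r"\d{13}")          # assumption: bare 13-digit ISBN
BOOK_ID_RE = re.compile(r"[A-Z0-9]{8}")    # assumption: format guessed from "A9471IZ1"


def classify_identifier(raw):
    """Guess which kind of identifier was sent in /api/v1/books/{id}."""
    if ISBN13_RE.fullmatch(raw):
        return "isbn"
    if BOOK_ID_RE.fullmatch(raw):
        return "id"
    return "title"


kinds = [classify_identifier(v) for v in ("9780306406157", "A9471IZ1", "The Bible")]
```

The weakness of this approach is ambiguity: a title that happens to be eight uppercase characters would be classified as an ID, which is one reason to consider making the identifier type explicit.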
Another approach, similar to the one from this.lau_, would be to use a query parameter. But I suggest adding the query parameter to the books URL (because only nouns, no verbs):
GET /api/v1/books?isbn=12312312301
The better solution? Not sure…
Because you are selecting “one book by id” (except title), rather than performing a query/search, I prefer the first approach (…/books should return “a collection of books” and .../books/{id} should return only one book).
But maybe someone has a better approach/idea?
Edit:
I suggest avoiding adding the identifier type to the URL; it has a “bad smell”. But it is also a possible approach, and I have seen it a lot in other APIs. Let’s see if I can find some information on whether it's “ok” or should be avoided.
Edit 2:
See REST API DESIGN - Getting a resource through REST with different parameters but same url pattern and REST - supporting multiple possible identifiers

Correct non existent domain name to nearest match

I'm looking for a service that tells you the nearest match for a non-existent domain that was misspelled by the user. For example, if a user writes 'hotmail.con', I would send a query with that and obtain 'hotmail.com' as a result.

You've picked a hard problem. A domain label can be 1-63 characters long, may contain only the characters [a-z0-9-], and must not start with a hyphen. Brute forcing is not an option. If the user types in hotmail.con you could search misspellings of it, which would try homail.com and hotmale.com, which may or may not be real domain names. Who is to know WHICH correction is the right one? The computer would have to return a list of options to the user: "Did you mean this domain name, or maybe that domain name?".
You might be interested in Peter Norvig's spelling corrector that Google uses to spell check queries that come in. It's one of the best spelling correctors on the planet.
http://norvig.com/spell-correct.html
Peter Norvig's Spell checker should work provided you had a body of correct domain names which is up to date. You could create your own list on the fly, by keeping a list of which sites the user has been to, and using those as the body of domain names to check against. That way, when the user selects "hotmail.con" it finds hotmail.com in your list. However, this does not protect the user from accidentally visiting: "hotmale.com". Because that is a valid site.
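A minimal sketch of checking a typed domain against a list of known-good domains. This uses Python's standard `difflib.get_close_matches` as a stand-in rather than Norvig's corrector, and the `suggest_domains` helper, the 0.8 cutoff and the sample list are assumptions chosen for illustration.

```python
import difflib


def suggest_domains(typed, known_domains, limit=3):
    """Return up to `limit` known domains that closely resemble `typed`.

    known_domains stands in for the user's history or any curated list;
    the 0.8 similarity cutoff is an arbitrary choice for this sketch.
    """
    return difflib.get_close_matches(
        typed.lower(), known_domains, n=limit, cutoff=0.8
    )


visited = ["hotmail.com", "gmail.com", "example.org"]
suggestions = suggest_domains("hotmail.con", visited)
```

Because the function returns a list, the caller can present the candidates as "Did you mean ...?" options instead of silently rewriting what the user typed.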
Here is a Stack Overflow question about how to get all the domain names:
https://stackoverflow.com/questions/4539155/how-to-get-all-the-domain-names
The best idea is to think outside the box and do it like firefox does it. When the user starts typing hotmail.com, what they usually do is click a textbox, type "h", then "o". Have a dropdown come out with recently visited domain names that start with that.

Regular Expressions - Parsing Domain Issues

I am trying to find the domain -- everything but the subdomain.
I have this regexp right now:
(?:[-a-zA-Z0-9]+\.)*([-a-zA-Z0-9]+(?:\.[a-zA-Z]{2,3})){1,2}
This works for things like:
domain.tld
subdomain.tld
But it runs into trouble with tld's like ".com.au" or ".co.uk":
domain.co.uk (finds co.uk, should find domain.co.uk)
subdomain.domain.co.uk (finds co.uk, should find domain.co.uk)
Any ideas?

I'm not sure this problem is "reasonably solvable"; Mozilla maintains a list of 'public suffix' domains that is intended to help browser authors accept cookies for only domains within one administrative control (e.g., prevent someone from setting a cookie valid for *.co.uk. or *.union.aero.). It obviously isn't perfect (near the end, you'll find a long list of is-a-caterer.com-style domains, so foo.is-a-caterer.com couldn't set a cookie that would be used by bar.is-a-caterer.com, but is-a-caterer.com is perfectly well a "domain" as you've defined it.)
So, if you're prepared to use the list as provided, you could write a quick little parser that would know how to apply the general rules and exceptions to determine where in the given input string your "domain" comes, and return just the portion you're interested in.
I think simpler approaches are doomed to failure: some ccTLDs such as .ca don't use second-level domains, some such as .br use dozens, and some, like lib.or.us are several levels away from the "domain" such as multnomah.lib.or.us. Unless you're using curated lists of which domains are a public suffix, you're doomed to being wrong for some non-trivial set of input strings.
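A rough sketch of the "quick little parser" idea, assuming the suffix rules have already been loaded into a set. The `SAMPLE_RULES` below are a tiny hand-picked excerpt for illustration, not the real list, and `registrable_domain` is a hypothetical helper that only covers plain rules, `*.` wildcard rules and `!` exception rules.

```python
# Hand-picked sample rules for illustration; a real parser would load the full list.
SAMPLE_RULES = {"com", "uk", "co.uk", "au", "com.au", "us", "or.us", "lib.or.us"}


def registrable_domain(hostname, rules=SAMPLE_RULES):
    """Return the public suffix plus one label, or None if hostname is a suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    suffix_len = 1  # default: treat the last label as the suffix
    # Try the longest candidate suffix first so the most specific rule wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        rest = ".".join(labels[i + 1:])
        if "!" + candidate in rules:
            # Exception rule: the suffix is the candidate minus its first label.
            suffix_len = len(labels) - i - 1
            break
        if candidate in rules or (rest and "*." + rest in rules):
            suffix_len = len(labels) - i
            break
    if len(labels) <= suffix_len:
        return None  # the input is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])


examples = {
    host: registrable_domain(host)
    for host in ("subdomain.domain.co.uk", "domain.com", "multnomah.lib.or.us")
}
```

With the sample rules, `subdomain.domain.co.uk` resolves to `domain.co.uk`, which is the case the single regular expression in the question could not handle.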

Match all characters in group except for first and last occurrence

Say I request
parent/child/child/page-name
in my browser. I want to extract the parent, children as well as page name. Here are the regular expressions I am currently using. There should be no limit as to how many children there are in the url request. For the time being, the page name will always be at the end and never be omitted.
^([\w-]{1,}){1} -> Match parent (returns 'parent')
(/(?:(?!/).)*[a-z]){1,}/ -> Match children (returns /child/child/)
[\w-]{1,}(?!.*[\w-]{1,}) -> Match page name (returns 'page-name')
The more I play with this, the more I feel how clunky this solution is. This is for a small CMS I am developing in ASP Classic (:(). It is sort of like MVC routing paths, but instead of calling controllers and functions based on the URL request, I would be travelling down the hierarchy and finding the appropriate page in the database. The database uses the nested set model and is linked by a unique page name for each child.
I have tried using the Split function with a / delimiter; however, I found I was nesting so many Split statements together that it became very unreadable.
All said, I need an efficient way to parse out the parent, children as well as page name from a string. Could someone please provide an alternative solution?
To be honest, I'm not even sure if a regular expression is the best solution to my problem.
Thank you.

You could try using:
^([\w-]+)(/.*/)([\w-]+)$
And then access the three matching groups created using Match.SubMatches. See here for more details.
EDIT
Actually, assuming that you know that [\w-] is all that is used in the names of the parts, you can use ^([\w-]+)(.*/)([\w-]+)$ instead and it will handle the no-child case fine by itself as well.
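For a quick check outside ASP, here is a sketch in Python (not VBScript) of the first three-group pattern, alongside a single split on / as an alternative. The function names are made up for the example, and the split version assumes the page name is always the last segment, as stated in the question.

```python
import re

# Same three groups as the first pattern: parent, middle section, page name.
PATH_RE = re.compile(r"^([\w-]+)(/.*/)([\w-]+)$")


def parse_with_regex(path):
    """Return (parent, [children], page) or None if the path has no children."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    parent, middle, page = m.groups()
    children = [c for c in middle.split("/") if c]
    return parent, children, page


def parse_with_split(path):
    """Same result from one split; also handles a path with no children."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        return None
    return parts[0], parts[1:-1], parts[-1]


parsed = parse_with_split("parent/child/child/page-name")
```

A single split followed by taking the first element, the last element and everything in between avoids the nested split calls mentioned in the question.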
