Sublime Snippet Regex Replacement - regex

I've recently been creating quite a few Sublime Text 3 plugins/snippets/etc. to automate repetitive tasks. The current one I am stuck on uses regex in a snippet to get my default skeleton for a new function.
Ideally, I would like the snippet to generate something similar to:
// Multiple Args (one arg would obviously look like (..." + "a: " + a + ")");)
function Foo(a, b, c)
{
Log.AppendFolder("Foo(" + "a: " + a + ", b: " + b + ", c: " + c + ")");
//body
Log.PopLogFolder();
}
// Zero Args
function Foo()
{
Log.AppendFolder("Foo()");
//body
Log.PopLogFolder();
}
So far, I can get it formatted with 1 argument or many arguments, not all possible combos (zero, one, many).
The outline is currently this; I just need to figure out the second ${2} with regex:
<snippet>
<content><![CDATA[
function ${1:function_name}(${2:arguments})
{
Log.AppendFolder("$1(" + ${2/(?#stuck here)//} + ")");
${3://body}
Log.PopLogFolder();
}$0]]></content>
<tabTrigger>fun</tabTrigger>
<scope>source.js</scope>
<description>Function Template</description>
</snippet>
One Arg:
"$1(" + ${2/^([A-z0-9_-]*),?.*/"\1\: " + \1 + /}");"
Many Args (with 1 arg, this shows "a: " + a + a):
"$1(" + ${2/^([A-z0-9_-]*),?(.*)/"\1\: " + \1 + /}${2/([A-z0-9_-]*)(?:, *([A-z0-9_-]*))/"$2\: " + $2 + /g}");"
One method worked but had an extra + "" + in there, which I'd like to avoid:
${2/([A-z_0-9]+)((?:, ?)?)/"\1\: " + \1 + "\2" + /g}
I've tried a conditional look-ahead based on commas, but that gets messed up >1 arg, probably due to my lack of understanding of them:
${2/(?(?!,)^([A-z0-9_-]*)$|([A-z0-9_-]*), *)/"\1\: " + \1/g}
I could easily do this via a normal plugin (this is easy programmatically), but ideally this can remain a snippet/code-completion since I can just override the JS "fun" code-completion.
What am I missing to accomplish this (or is it simply the wrong avenue - if that's the case, I'd still like to know to learn more about regex)?

I finally figured this out: there is a conditional replacement option:
?n:then:else
So the final format looks like:
<snippet>
<content><![CDATA[
function ${1:function_name}(${2:args})
{
Log.AppendFolder("$1(${2/.+/" + /}${2/([A-z_0-9-]+) *(,)? */"$1\: " + $1 ?2: + "$2 " + :+ /g}${2/.+/"/})");
${3:// body...}
Log.PopLogFolder();
}$0]]></content>
<tabTrigger>fun</tabTrigger>
<scope>source.js</scope>
<description>Function</description>
</snippet>
Which will give the desired result:
function function_name()
{
Log.AppendFolder("function_name()");
// body...
Log.PopLogFolder();
}
function function_name(a)
{
Log.AppendFolder("function_name(" + "a: " + a + ")");
// body...
Log.PopLogFolder();
}
function function_name(a, b)
{
Log.AppendFolder("function_name(" + "a: " + a + ", " + "b: " + b + ")");
// body...
Log.PopLogFolder();
}
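To check the format-string logic outside Sublime, the three ${2/.../.../} transforms can be mimicked with JavaScript's String.prototype.replace. This is a minimal sketch, not Sublime's actual engine: the ?2:then:else conditional becomes a ternary inside a replacement callback, logLine is a hypothetical helper name, and the argument list is assumed to be a single line of comma-separated names. It uses [A-Za-z] instead of [A-z], since the A-z range also covers the characters [ \ ] ^ _ and the backtick.

```javascript
// Sketch: reproduce the snippet's Log.AppendFolder line for a function name and argument list.
function logLine(fnName, args) {
  // ${2/.+/" + /}: emits `" + ` only when the argument list is non-empty
  const prefix = args.replace(/.+/, '" + ');
  // ${2/([A-z_0-9-]+) *(,)? */.../g}: one `"name: " + name` per argument;
  // whether the (,)? group matched plays the role of ?2:then:else
  const body = args.replace(/([A-Za-z_0-9-]+) *(,)? */g, (_match, name, comma) =>
    `"${name}: " + ${name}` + (comma ? ' + ", " + ' : " + "));
  // ${2/.+/"/}: reopens the string literal only when there were arguments
  const suffix = args.replace(/.+/, '"');
  return `Log.AppendFolder("${fnName}(${prefix}${body}${suffix})");`;
}

console.log(logLine("function_name", ""));     // Log.AppendFolder("function_name()");
console.log(logLine("function_name", "a"));    // Log.AppendFolder("function_name(" + "a: " + a + ")");
console.log(logLine("function_name", "a, b")); // Log.AppendFolder("function_name(" + "a: " + a + ", " + "b: " + b + ")");
```

The three calls cover the zero, one, and many argument cases from the question.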

Related

Possible encoding issue between PS and C++

I have a C++ program written using Qt that I'm using as a front end to create AD accounts. Essentially, I launch an elevated process that executes PowerShell commands within an elevated PowerShell session. I can create the accounts fine, but when I attempt to pull the group membership of a pre-existing user to copy it over to the new one, it fails. I need to understand why it's failing and resolve the issue; any help is greatly appreciated. It fails with the following error:
Get-ADUser : Cannot validate argument on parameter 'Identity'. The argument is null. Provide a valid value for the
argument, and then try running the command again.
At line:2 char:28
+ "}); $groups = (Get-ADUser $tmpusr -Properties MemberOf).MemberOf; $u ...
+ ~~~~~~~
+ CategoryInfo : InvalidData: (:) [Get-ADUser], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.ActiveDirectory.Management.Commands.GetADUser
The $tmpusr variable is just the value of duser.template_user, which is pulled from a QComboBox and is definitely not null, because it outputs correctly and shows the selected template account. In C++ it's just data added to a struct member:
duser.set_groups_command = "$tmpusr = (Get-ADUser -Filter {Name -like " "\"" + duser.template_user + "\"" "}); "
"$groups = (Get-ADUser $tmpusr -Properties MemberOf).MemberOf; "
"$usr = " "\"" + duser.sam_name + "\"" + "; "
"Foreach ($group in $groups) {Add-ADGroupMember -Identity (Get-ADGroup $group).name -Members $usr} ";
If I strip the C++ and run the same command within PowerShell it executes fine:
$tmpusr = (Get-ADUser -Filter {Name -like "Example User"}); $groups = (Get-ADUser $tmpusr -Properties MemberOf).MemberOf; $usr = "TestUser"; Foreach ($group in $groups) {Add-ADGroupMember -Identity (Get-ADGroup $group).name -Members $usr}
The purpose of the command is to determine which groups "Example User" belongs to and then to add "TestUser" to the same groups. Again, creating the user works fine. That is done with:
duser.complete_command = p + "New-ADUser -Name " + "\"" + duser.employe_name +"\"" + " -GivenName " + "\"" + duser.given_name + "\""
+ " -Surname " + "\"" + duser.surname + "\"" + " -AccountPassword $sec " + " -UserPrincipalName " + "\"" + duser.userpname + "\""
" -DisplayName " + "\"" + duser.display_name + "\"" + " -EmailAddress " + "\"" + duser.email_address + "\"" + " -SamAccountName " +
"\"" + duser.sam_name + "\"" + " -Enabled " + duser.is_enabled;
You'll note the existence of "p" which is another QString created earlier on to convert to a secure string. The only other component is the function that elevates and executes:
void MainWindow::elevate_and_execute(QString param)
{
QProcess *process = new QProcess();
QStringList params = QStringList();
params = QStringList({"-Command", QString("Start-Process -Verb runAs powershell; "), param});
process->startDetached("powershell", params);
process->waitForFinished();
process->terminate();
}
I was able to resolve the issue. The so-called template user I pulled from the QComboBox contained a carriage return. When I wrote the logs to a text file, I saw that it pushed the rest of the command onto the next line, so $tmpusr was broken. I fixed it by stripping the "\r" out of duser.template_user when it is first filled, using remove(QChar('\r')).
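The same defensive step applies to any UI-supplied value spliced into a command string: drop stray line-break characters first. A minimal JavaScript sketch of the idea, building the same PowerShell one-liner as the C++ code above; stripLineBreaks and buildSetGroupsCommand are hypothetical names, and the Qt fix itself is the remove(QChar('\r')) call described above.

```javascript
// Hypothetical helper: remove CR/LF characters a UI control may leave on a value.
function stripLineBreaks(value) {
  return value.replace(/[\r\n]+/g, "");
}

// Builds the group-copy command from sanitized values (same PowerShell text as the C++ version).
function buildSetGroupsCommand(templateUser, samName) {
  const tmp = stripLineBreaks(templateUser);
  const usr = stripLineBreaks(samName);
  return `$tmpusr = (Get-ADUser -Filter {Name -like "${tmp}"}); ` +
    "$groups = (Get-ADUser $tmpusr -Properties MemberOf).MemberOf; " +
    `$usr = "${usr}"; ` +
    "Foreach ($group in $groups) {Add-ADGroupMember -Identity (Get-ADGroup $group).name -Members $usr} ";
}

// A "\r" picked up from the combo box no longer splits the command across lines.
console.log(buildSetGroupsCommand("Example User\r", "TestUser"));
```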

Cleaning up formatting after deletion using regex

I have a function similar to the one below appearing in multiple files. I want to use regex to get rid of all references to outputString, since clearly, they're wasteful.
... other functions, class declarations, etc
public String toString()
{
String outputString = "";
return ... some stuff
+ outputString;
}
... other functions, class declarations, etc
I'm happy to do this in multiple passes. So far I've got regexes to find the first and last lines (String outputString = "";$ and ( \+ outputString;)$). However, I have two problems: first, I want to get rid of the whitespace left behind when those two lines are deleted. Second, I need the final ; on the second-to-last line to move up to the line above it.
As a bonus, I'd also like to know what's wrong with adding the line start anchor (^) to either of the regexes I specified. It seems like doing so would tighten them up, but when I try something like ^( \+ outputString;)$ I get zero results.
After all's said and done the function above should look like this:
... other functions, class declarations, etc
public String toString()
{
return ... some stuff;
}
... other functions, class declarations, etc
Here's an example of what "some stuff" might be:
"name" + ":" + getName()+ "," +
"id" + ":" + getId()+ "]" + System.getProperties().getProperty("line.separator") +
" " + "student = "+(getStudent()!=null?Integer.toHexString(System.identityHashCode(getStudent())):"null")
Here's a concrete example:
Current:
public void delete()
{
Student existingStudent = student;
student = null;
if (existingStudent != null)
{
existingStudent.delete();
}
}
public String toString()
{
String outputString = "";
return super.toString() + "["+
"name" + ":" + getName()+ "," +
"id" + ":" + getId()+ "]" + System.getProperties().getProperty("line.separator") +
" " + "student = "+(getStudent()!=null?Integer.toHexString(System.identityHashCode(getStudent())):"null")
+ outputString;
}
public String getId()
{
return id;
}
Required:
public void delete()
{
Student existingStudent = student;
student = null;
if (existingStudent != null)
{
existingStudent.delete();
}
}
public String toString()
{
return super.toString() + "["+
"name" + ":" + getName()+ "," +
"id" + ":" + getId()+ "]" + System.getProperties().getProperty("line.separator") +
" " + "student = "+(getStudent()!=null?Integer.toHexString(System.identityHashCode(getStudent())):"null");
}
public String getId()
{
return id;
}
1st pass:
Find:
.*outputString.*\R
Replace with empty string.
Demo:
https://regex101.com/r/g3aYnp/2
2nd pass:
Find:
(toString\(\)[\s\S]+\))(\s*\R\s*?\})
Replace:
$1;$2
https://regex101.com/r/oxsNRW/3
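The two passes can also be replayed outside the editor. JavaScript regexes have no \R, so this sketch substitutes \r?\n, assuming the files use LF or CRLF line endings; cleanToString is a hypothetical name. Like the original second pass, the greedy [\s\S]+ relies on the end of the return expression being the only ")" that is followed by a line break and a closing "}".

```javascript
// Pass 1: delete every line that mentions outputString, together with its line break.
// Pass 2: after toString(), put ";" behind the last ")" followed by a line break and "}".
function cleanToString(source) {
  const withoutOutputString = source.replace(/.*outputString.*\r?\n/g, "");
  return withoutOutputString.replace(/(toString\(\)[\s\S]+\))(\s*\r?\n\s*?\})/, "$1;$2");
}

const before = [
  "public String toString()",
  "{",
  '    String outputString = "";',
  '    return super.toString() + "["+',
  '        "name" + ":" + getName()',
  "        + outputString;",
  "}",
  "public String getId()",
  "{",
  "    return id;",
  "}",
].join("\n");

console.log(cleanToString(before));
```

Because "." in JavaScript does not match "\r", the first pass also removes the whole line on CRLF files.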
Assuming that the wanted part of the return expression does not contain any semicolons (i.e. ;), you can do it in one replace. Search for:
^ +String outputString = "";\R( +return [^;]+?)\R +\+ outputString;
and replace with:
\1;
The idea is to match all three lines in one go, to keep the wanted part and to add the ;.
An interesting point in this replacement. My first attempt had ... return [^;]+)\R +\+ ... and it failed whereas ... return [^;]+)\r\n +\+ ... worked. The \R version appeared to leave a line-break before the final ;. Turning on menu => View => Show symbol => Show end of line reveals that the greedy term within the capture group collected the \r and the \R matched only the \n. Changing to a non-greedy form allowed the \R to match the entire \r\n.
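Here is the one-replace version as a JavaScript sketch, with \r?\n standing in for \R and the m flag so that ^ matches at every line start; removeOutputString is a hypothetical name. Running the lazy and greedy forms on CRLF input reproduces the effect described above: with greedy [^;]+, group 1 keeps the "\r", because \r?\n can then match the "\n" alone. Note also that ^ + accepts any amount of indentation, which is probably why a pattern starting ^( \+ (exactly one space) found nothing on indented lines.

```javascript
// Lazy [^;]+? stops as early as possible, so \r?\n consumes the whole CRLF.
const LAZY = /^ +String outputString = "";\r?\n( +return [^;]+?)\r?\n +\+ outputString;/m;
// Greedy variant for comparison: group 1 swallows the "\r" before the final "\n".
const GREEDY = /^ +String outputString = "";\r?\n( +return [^;]+)\r?\n +\+ outputString;/m;

function removeOutputString(source, re = LAZY) {
  return source.replace(re, "$1;");
}

const crlf = [
  "    {",
  '        String outputString = "";',
  '        return "id" + ":" + getId()',
  "            + outputString;",
  "    }",
].join("\r\n");

console.log(JSON.stringify(removeOutputString(crlf)));
// "    {\r\n        return \"id\" + \":\" + getId();\r\n    }"
console.log(JSON.stringify(removeOutputString(crlf, GREEDY)));
// "    {\r\n        return \"id\" + \":\" + getId()\r;\r\n    }"
```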

Need help to make a RegExp filter to replace redundant Parentheses

For the past few days (weeks, months, years maybe if you count my on-again off-again search and attempts) I've been trying to make or find a RegEx filter to help me remove all redundant parentheses found in my code.
A worst-case scenario of what the regex filter will have to deal with is below, followed by a best-case scenario:
return ((((((((((((((((((((((((((getHumanReadableLine("avHardwareDisable") + getHumanReadableLine("hasAccessibility")) + getHumanReadableLine("hasAudio")) + getHumanReadableLine("hasAudioEncoder")) + getHumanReadableLine("hasEmbeddedVideo")) + getHumanReadableLine("hasIME")) + getHumanReadableLine("hasMP3")) + getHumanReadableLine("hasPrinting")) + getHumanReadableLine("hasScreenBroadcast")) + getHumanReadableLine("hasScreenPlayback")) + getHumanReadableLine("hasStreamingAudio")) + getHumanReadableLine("hasStreamingVideo")) + getHumanReadableLine("hasTLS")) + getHumanReadableLine("hasVideoEncoder")) + getHumanReadableLine("isDebugger")) + getHumanReadableLine("language")) + getHumanReadableLine("localFileReadDisable")) + getHumanReadableLine("manufacturer")) + getHumanReadableLine("os")) + getHumanReadableLine("pixelAspectRatio")) + getHumanReadableLine("playerType")) + getHumanReadableLine("screenColor")) + getHumanReadableLine("screenDPI")) + getHumanReadableLine("screenResolutionX")) + getHumanReadableLine("screenResolutionY")) + getHumanReadableLine("version")));
return ((((name + ": ") + Capabilities[name]) + "\n"));
As you can see there's... a few... redundant parentheses in my code. Been working actively with these for a very long time but have always tried to clean up what I come across and been trying to find a faster way to do it.
So one example of how the "clean" code would look, I'm hoping at least!
return (name + ": " + Capabilities[name] + "\n");
return name + ": " + Capabilities[name] + "\n";
Either one is acceptable, to be completely honest, as long as the code itself doesn't get mucked up and change how it works.
I greatly appreciate any answers anyone can give me. Please don't mock what I do or am trying to achieve; I haven't worked much with regex or similar things before...
And just to humour you... Here's my "RegExp" for my "clean" example
(return) ({1,}((.[^)]{1,}))(.{1,}))(.{1,})){1,}
$1 $2 $3 $4 // output
oh... Forgot to mention
(!(testCrossZ()))
Might appear at times as well but those aren't as big of an issue to clean up manually if needed.
P.S. There are a LOT of occurrences of the redundant parentheses... Like... maybe thousands. Most likely thousands.
Not sure if it applies to ActionScript, but for Java in IntelliJ IDEA you can do: Main Menu | Analyze | Run Inspection by Name | type "parentheses" | select "Unnecessary parentheses" | run it on the whole project and fix all problems
Result:
return getHumanReadableLine("avHardwareDisable") + getHumanReadableLine("hasAccessibility")
+ getHumanReadableLine("hasAudio") + getHumanReadableLine("hasAudioEncoder")
+ getHumanReadableLine("hasEmbeddedVideo") + getHumanReadableLine("hasIME")
+ getHumanReadableLine("hasMP3") + getHumanReadableLine("hasPrinting")
+ getHumanReadableLine("hasScreenBroadcast") + getHumanReadableLine("hasScreenPlayback")
+ getHumanReadableLine("hasStreamingAudio") + getHumanReadableLine("hasStreamingVideo")
+ getHumanReadableLine("hasTLS") + getHumanReadableLine("hasVideoEncoder")
+ getHumanReadableLine("isDebugger") + getHumanReadableLine("language")
+ getHumanReadableLine("localFileReadDisable") + getHumanReadableLine("manufacturer")
+ getHumanReadableLine("os") + getHumanReadableLine("pixelAspectRatio")
+ getHumanReadableLine("playerType") + getHumanReadableLine("screenColor")
+ getHumanReadableLine("screenDPI") + getHumanReadableLine("screenResolutionX")
+ getHumanReadableLine("screenResolutionY") + getHumanReadableLine("version");
I honestly haven't understood the exact output format you want, but as a start, clearing off the unnecessary parentheses can at least be done with pure JavaScript as follows.
var text = 'return ((((((((((((((((((((((((((getHumanReadableLine("avHardwareDisable") + getHumanReadableLine("hasAccessibility")) + getHumanReadableLine("hasAudio")) + getHumanReadableLine("hasAudioEncoder")) + getHumanReadableLine("hasEmbeddedVideo")) + getHumanReadableLine("hasIME")) + getHumanReadableLine("hasMP3")) + getHumanReadableLine("hasPrinting")) + getHumanReadableLine("hasScreenBroadcast")) + getHumanReadableLine("hasScreenPlayback")) + getHumanReadableLine("hasStreamingAudio")) + getHumanReadableLine("hasStreamingVideo")) + getHumanReadableLine("hasTLS")) + getHumanReadableLine("hasVideoEncoder")) + getHumanReadableLine("isDebugger")) + getHumanReadableLine("language")) + getHumanReadableLine("localFileReadDisable")) + getHumanReadableLine("manufacturer")) + getHumanReadableLine("os")) + getHumanReadableLine("pixelAspectRatio")) + getHumanReadableLine("playerType")) + getHumanReadableLine("screenColor")) + getHumanReadableLine("screenDPI")) + getHumanReadableLine("screenResolutionX")) + getHumanReadableLine("screenResolutionY")) + getHumanReadableLine("version")));',
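r = /\(((getHumanReadableLine\("\w+"\)[\s\+]*)+)\)/g, // "(", one or more getHumanReadableLine("...") calls separated by "+", then ")"
temp = "";
// Each pass strips one layer of such parentheses; loop until the text stops changing.
while (text != temp) {
temp = text;
text = text.replace(r,"$1"); // keep only what was inside the matched parentheses
}
document.write('<pre>' + text + '</pre>');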
From this point on, it shouldn't be a big deal to convert the reduced text into the desired output format.
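JavaScript regexes cannot match arbitrarily nested parentheses, so for code beyond the getHumanReadableLine case a small scanner is a safer alternative (a different technique from the regex loop above). This sketch, with hypothetical function names, removes a pair of grouping parentheses only when the group directly follows "(" or "return", is directly followed by "+", ")" or ";", and its top level is a plain a + b + c chain of identifiers, calls, indexing, and string or number literals. Because + is left-associative, (a + b) + c and a + b + c evaluate the same way; a + (b + c) is left alone, since it can differ when numbers and strings mix. It assumes no comments or regex literals contain parentheses, so review the resulting diff.

```javascript
// Returns the index of the closing quote of the string literal starting at src[i].
function skipString(src, i) {
  const quote = src[i];
  for (i++; i < src.length && src[i] !== quote; i++) {
    if (src[i] === "\\") i++; // skip the escaped character
  }
  return i;
}

// Index of the ")" closing the "(" at src[open], or -1; parentheses in strings are ignored.
function matchParen(src, open) {
  let depth = 0;
  for (let i = open; i < src.length; i++) {
    const c = src[i];
    if (c === '"' || c === "'") i = skipString(src, i);
    else if (c === "(") depth++;
    else if (c === ")" && --depth === 0) return i;
  }
  return -1;
}

// Top-level text of src[from..to), with each string and nested (...) or [...] collapsed to "_".
function skeleton(src, from, to) {
  let out = "";
  let depth = 0;
  for (let i = from; i < to; i++) {
    const c = src[i];
    if (c === '"' || c === "'") {
      i = skipString(src, i);
      if (depth === 0) out += "_";
    } else if (c === "(" || c === "[") {
      if (depth++ === 0) out += "_";
    } else if (c === ")" || c === "]") {
      depth--;
    } else if (depth === 0) {
      out += c;
    }
  }
  return out;
}

// True for a plain "a + b + c" chain whose operands are single identifier-like tokens.
function isPlusChain(top) {
  return top.split("+").every((operand) => /^\s*[\w$.]+\s*$/.test(operand));
}

// Repeatedly removes the first pair of parentheses that is safe to drop.
function removeRedundantParens(src) {
  for (;;) {
    let changed = false;
    for (let i = 0; i < src.length && !changed; i++) {
      const c = src[i];
      if (c === '"' || c === "'") { i = skipString(src, i); continue; }
      if (c !== "(") continue;
      // Only grouping parens right after "(" or "return"; calls, if (...), x = (...) are skipped.
      let k = i - 1;
      while (k >= 0 && /\s/.test(src[k])) k--;
      const afterOpener = src[k] === "(" || /\breturn$/.test(src.slice(Math.max(0, k - 6), k + 1));
      if (!afterOpener) continue;
      const close = matchParen(src, i);
      if (close < 0) continue;
      let n = close + 1;
      while (n < src.length && /\s/.test(src[n])) n++;
      if (!["+", ")", ";"].includes(src[n]) || !isPlusChain(skeleton(src, i + 1, close))) continue;
      src = src.slice(0, i) + src.slice(i + 1, close) + src.slice(close + 1);
      changed = true;
    }
    if (!changed) return src;
  }
}

console.log(removeRedundantParens('return ((((name + ": ") + Capabilities[name]) + "\\n"));'));
// return name + ": " + Capabilities[name] + "\n";
```

Cases such as (!(testCrossZ())) are deliberately left untouched, matching the question's note that those can be cleaned up by hand.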

Eclipse Conditional replace with regex

Given the text
public void MyFunction(int i, String str, boolean doIt) {
Log.i(TAG, "Enter MyFunction(int i, String str, boolean doIt)");
I want to make some replacements on the second line, but not the first:
public void MyFunction(int i, String str, boolean doIt) {
Log.i(TAG, "Enter MyFunction( i:" + i + ", str:" + str + ", doIt:" + doIt + ")");
So far using the following regex I manage to get these results:
find "\w+\s+(\w+)([,\)])"
replace with "$1:" + $1 + "$2"
public void MyFunction(i:" + i + ", str:" + str ", doIt:" + doIt + ") ") {
Log.i(TAG, "Enter MyFunction( i:" + i + ", str:" + str ", doIt:" + doIt + ") ");
Is there any way to force the replace to be executed only on the Log.i lines?
EDIT:
I tried the following regex
"Log\.i\(.*?\((\s*(\w+\s+(\w+)([,\)]))+"
but $1, $2, $3 only contain the last match (the last argument: doIt)
$1=boolean doIt)
$2=doIt
$3=)
when there should be 3 sets of $1,$2,$3, one for each argument.
If you know how to retrieve multiple matches, that would also make for a solution
I caved and used this little Perl script to do the job:
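next unless /Log\.i/;   # with -p, lines without Log.i are still printed, just unchanged
s/TAG,/TAGG/;           # temporarily rename "TAG," so the argument substitution below leaves it alone
s/(final\s+)?[^ \(]+\s+(\w+)([,\)])/$2:\" \+ $2 \+ \"$3/g;   # "type name," -> name:" + name + ",
s/TAGG/TAG,/;           # restore "TAG,"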
with the command line:
perl -pi <scriptname> <file>
If someone still wants to contribute: I understand I could have run Perl as an Eclipse external tool to process the Java files. How do I do that?
UPDATE:
I wrote a post on how to use an external Perl tool to run the script from within the Eclipse IDE.
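For comparison, the same rewrite as a JavaScript sketch (for example, a Node script run as an Eclipse external tool). Instead of the TAGG trick, it restricts the change to the quoted "Enter Name(...)" text on Log.i lines and rebuilds that text from the parameter names in a replacement callback, which also sidesteps the fact that a repeated capture group only keeps its last repetition. expandLogLine is a hypothetical name; it assumes each parameter is written as "type name" (optionally with modifiers such as final) and that parameter types contain no commas, so generics like Map<K, V> are not handled. For the example line it produces the same output as the Perl script above.

```javascript
// Rewrites the "Enter Name(type a, type b)" text of a Log.i call so each parameter is
// logged with its value; any other line (e.g. the method signature) is returned unchanged.
function expandLogLine(line) {
  if (!/Log\.i\(/.test(line)) return line;
  return line.replace(/"Enter (\w+)\(([^)]*)\)"/, (_match, name, params) => {
    // "int i, String str, final boolean doIt" -> ["i", "str", "doIt"]
    const names = params
      .split(",")
      .map((p) => p.trim().split(/\s+/).pop())
      .filter(Boolean);
    const parts = names.map((n, k) => `${k === 0 ? "" : ", "}${n}:" + ${n} + "`);
    return `"Enter ${name}(${parts.join("")})"`;
  });
}

const source = [
  "public void MyFunction(int i, String str, boolean doIt) {",
  '    Log.i(TAG, "Enter MyFunction(int i, String str, boolean doIt)");',
].join("\n");
console.log(source.split("\n").map(expandLogLine).join("\n"));
```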

Regular expression for a language tag (as defined by BCP47)

I need a regular expression for a language tag as defined by BCP 47.
I know that the full BNF syntax is available at http://www.rfc-editor.org/rfc/bcp/bcp47.txt and that I could use it to write my own, but hopefully there is one already out there.
Looks like this:
^((?<grandfathered>(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)|(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang))|((?<language>([A-Za-z]{2,3}(-(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2}))?)|[A-Za-z]{4}|[A-Za-z]{5,8})(-(?<script>[A-Za-z]{4}))?(-(?<region>[A-Za-z]{2}|[0-9]{3}))?(-(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*(-(?<extension>[0-9A-WY-Za-wy-z](-[A-Za-z0-9]{2,8})+))*(-(?<privateUse>x(-[A-Za-z0-9]{1,8})+))?)|(?<privateUse>x(-[A-Za-z0-9]{1,8})+))$
Here is the code to generate it (in C#):
var regular = "(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang)";
var irregular = "(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)";
var grandfathered = "(?<grandfathered>" + irregular + "|" + regular + ")";
var privateUse = "(?<privateUse>x(-[A-Za-z0-9]{1,8})+)";
var singleton = "[0-9A-WY-Za-wy-z]";
var extension = "(?<extension>" + singleton + "(-[A-Za-z0-9]{2,8})+)";
var variant = "(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})";
var region = "(?<region>[A-Za-z]{2}|[0-9]{3})";
var script = "(?<script>[A-Za-z]{4})";
var extlang = "(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2})";
var language = "(?<language>([A-Za-z]{2,3}(-" + extlang + ")?)|[A-Za-z]{4}|[A-Za-z]{5,8})";
var langtag = "(" + language + "(-" + script + ")?" + "(-" + region + ")?" + "(-" + variant + ")*" + "(-" + extension + ")*" + "(-" + privateUse + ")?" + ")";
var languageTag = "^(" + grandfathered + "|" + langtag + "|" + privateUse + ")$";
Console.WriteLine(languageTag);
I cannot guarantee its correctness (I may have made typos), but it works fine on the examples in Appendix A.
Depending on your environment, you might need to remove the named capturing groups "?<...>".
An optimized version that works in PHP.
/^(?<grandfathered>(?:en-GB-oed|i-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|pwn|t(?:a[oy]|su))|sgn-(?:BE-(?:FR|NL)|CH-DE))|(?:art-lojban|cel-gaulish|no-(?:bok|nyn)|zh-(?:guoyu|hakka|min(?:-nan)?|xiang)))|(?:(?<language>(?:[A-Za-z]{2,3}(?:-(?<extlang>[A-Za-z]{3}(?:-[A-Za-z]{3}){0,2}))?)|[A-Za-z]{4}|[A-Za-z]{5,8})(?:-(?<script>[A-Za-z]{4}))?(?:-(?<region>[A-Za-z]{2}|[0-9]{3}))?(?:-(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*(?:-(?<extension>[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+))*)(?:-(?<privateUse>x(?:-[A-Za-z0-9]{1,8})+))?$/Di
JavaScript rejects duplicate named capture groups, so you have to rename the second ?<privateUse>, e.g. to ?<privateUse1>. The result is:
/^((?<grandfathered>(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)|(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang))|((?<language>([A-Za-z]{2,3}(-(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2}))?)|[A-Za-z]{4}|[A-Za-z]{5,8})(-(?<script>[A-Za-z]{4}))?(-(?<region>[A-Za-z]{2}|[0-9]{3}))?(-(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*(-(?<extension>[0-9A-WY-Za-wy-z](-[A-Za-z0-9]{2,8})+))*(-(?<privateUse>x(-[A-Za-z0-9]{1,8})+))?)|(?<privateUse1>x(-[A-Za-z0-9]{1,8})+))$/
Here's a way to construct it:
let privateUseUsed = 0
const privateUse = () => "(?<privateUse" + (privateUseUsed++) + ">x(-[A-Za-z0-9]{1,8})+)"
const grandfathered = "(?<grandfathered>" +
/* irregular */ (
"en-GB-oed" +
"|" + "i-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|pwn|tao|tay|tsu)" +
"|" + "sgn-(?:BE-FR|BE-NL|CH-DE)"
) +
"|" + /* regular */ (
"art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang"
) +
")"
const langtag = "(" +
"(?<language>" + (
"([A-Za-z]{2,3}(-" +
"(?<extlang>[A-Za-z]{3}(-[A-Za-z]{3}){0,2})" +
")?)|[A-Za-z]{4,8})"
) +
"(-" + "(?<script>[A-Za-z]{4})" + ")?" +
"(-" + "(?<region>[A-Za-z]{2}|[0-9]{3})" + ")?" +
"(-" + "(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})" + ")*" +
"(-" + "(?<extension>" + (
/* singleton */ "[0-9A-WY-Za-wy-z]" +
"(-[A-Za-z0-9]{2,8})+)"
) +
")*" +
"(-" + privateUse() + ")?" +
")"
const languageTagReStr = "^(" + grandfathered + "|" + langtag + "|" + privateUse() + ")$";
Edit: at the time, Firefox didn't support named capture groups, so you had to strip them out with .replace(/\?<[A-Za-z0-9]+>/g, '') or just leave them out in the first place:
const grandfathered = "(" +
/* irregular */ "(en-GB-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-BE-FR|sgn-BE-NL|sgn-CH-DE)" +
"|" +
/* regular */ "(art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|zh-min|zh-min-nan|zh-xiang)" +
")";
const langtag = "(" +
"(" + (
"([A-Za-z]{2,3}(-" +
"([A-Za-z]{3}(-[A-Za-z]{3}){0,2})" +
")?)|[A-Za-z]{4}|[A-Za-z]{5,8})"
) +
"(-" + "([A-Za-z]{4})" + ")?" +
"(-" + "([A-Za-z]{2}|[0-9]{3})" + ")?" +
"(-" + "([A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})" + ")*" +
"(-" + "(" + (
/* singleton */ "[0-9A-WY-Za-wy-z]" +
"(-[A-Za-z0-9]{2,8})+)"
) +
")*" +
"(-" + "(x(-[A-Za-z0-9]{1,8})+)" + ")?" +
")";
const languageTag = RegExp("^(" + grandfathered + "|" + langtag + "|" + "(x(-[A-Za-z0-9]{1,8})+)" + ")$");
Test with languageTag.test('en-us')
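Engines that support ES2018 named capture groups can keep the names, provided each name appears only once. The sketch below assumes such an engine: it reuses the pattern from the answers above with non-capturing (?:...) groups for the unnamed parts, renames the standalone private-use group to privateUseOnly, and wraps it in a hypothetical parseLanguageTag helper. Like the other versions, it checks syntax only; it does not consult the IANA subtag registry.

```javascript
// Langtag, private-use and grandfathered branches with unique group names (ES2018+).
const LANGUAGE_TAG = new RegExp(
  "^(?:" +
    "(?<language>[A-Za-z]{2,3}(?:-(?<extlang>[A-Za-z]{3}(?:-[A-Za-z]{3}){0,2}))?|[A-Za-z]{4,8})" +
    "(?:-(?<script>[A-Za-z]{4}))?" +
    "(?:-(?<region>[A-Za-z]{2}|[0-9]{3}))?" +
    "(?:-(?<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*" +
    "(?:-(?<extension>[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+))*" +
    "(?:-(?<privateUse>x(?:-[A-Za-z0-9]{1,8})+))?" +
    "|(?<privateUseOnly>x(?:-[A-Za-z0-9]{1,8})+)" +
    "|(?<grandfathered>en-GB-oed|i-(?:ami|bnn|default|enochian|hak|klingon|lux|mingo|navajo|pwn|tao|tay|tsu)" +
    "|sgn-(?:BE-FR|BE-NL|CH-DE)|art-lojban|cel-gaulish|no-bok|no-nyn|zh-(?:guoyu|hakka|min|min-nan|xiang))" +
  ")$"
);

// Returns the named subtags of a well-formed tag, or null if the syntax is invalid.
function parseLanguageTag(tag) {
  const m = LANGUAGE_TAG.exec(tag);
  return m ? m.groups : null;
}

const { language, script, region } = parseLanguageTag("zh-Hans-CN");
console.log(language, script, region);       // zh Hans CN
console.log(parseLanguageTag("de-419-DE"));  // null
```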
If using a CLDR-based function set, like PHP's intl extension, you can check if a locale exists in the intl database using a function like:
<?php
function is_locale($locale=''){
// STANDARDISE INPUT
$locale=locale_canonicalize($locale);
// LOAD ARRAY WITH LOCALES
$locales=resourcebundle_locales('');
// RETURN WHETHER FOUND
return (array_search($locale,$locales)!==false);
}
?>
It takes about half a millisecond to load and search the data, so it won't be too much of a performance hit.
Of course, it will only find those in the database of the CLDR version supplied with the PHP version used, but will be updated with each subsequent PHP release.
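In JavaScript, a comparable data-driven check can use the ECMA-402 Intl API: Intl.getCanonicalLocales throws a RangeError for a structurally invalid tag, and supportedLocalesOf reports whether the runtime's locale data (ICU/CLDR in Node.js and browsers) has a match. A sketch with a hypothetical isLocale name; as with PHP, the answer depends on which locale data the runtime ships.

```javascript
// True when `tag` is well-formed and the runtime's Intl data has a matching locale.
function isLocale(tag) {
  let canonical;
  try {
    [canonical] = Intl.getCanonicalLocales(tag); // e.g. "EN-us" -> "en-US"
  } catch (e) {
    if (e instanceof RangeError) return false; // not a structurally valid tag
    throw e;
  }
  // "lookup" falls back by truncating subtags (en-US -> en) when checking availability.
  return Intl.DateTimeFormat.supportedLocalesOf([canonical], { localeMatcher: "lookup" }).length > 0;
}

console.log(isLocale("EN-us")); // true
console.log(isLocale("EN_US")); // false: "_" is not a valid subtag separator
```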