I want to concatenate my Access Key ID and Secret Access Key together so I can easily rotate the credentials with Azure Key Vault. I'm having trouble finding out which characters will not be used by either the generated Access Key ID or the Secret Access Key to keep them separated in the concatenated string. Is it safe to use a semicolon or a colon?
Edit: https://docs.aws.amazon.com/IAM/latest/APIReference/API_AccessKey.html indicates that the Access Key ID can contain any nonspace character, although I'm not sure if generated IDs are more limited in practice. Unfortunately, no guidelines are given for Secret Access Keys. Is a space a reasonable separator?
Amazon actually provides regular expressions for finding access key IDs and secret access keys in this article, and we can use them to tell which characters appear in each:
Search for access key IDs: (?<![A-Z0-9])[A-Z0-9]{20}(?![A-Z0-9]). In English, this regular expression says: Find me 20-character, uppercase, alphanumeric strings that don’t have any uppercase, alphanumeric characters immediately before or after.
Search for secret access keys: (?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=]). In English, this regular expression says: Find me 40-character, base-64 strings that don’t have any base 64 characters immediately before or after.
So the access key ID contains only uppercase letters and digits, while the secret access key can contain letters, digits, and the characters /, +, and =. This means a semicolon or a colon would be a safe choice for a separator.
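As a minimal sketch of how the separator could be used, assuming the character sets implied by the two regular expressions above really do hold for generated keys: the helper names join_credentials and split_credentials are hypothetical, and the sample values are the placeholder credentials used in AWS documentation, not real keys.

```python
import re

# Character sets taken from the two regular expressions quoted above.
ACCESS_KEY_ID = re.compile(r"[A-Z0-9]{20}")
SECRET_ACCESS_KEY = re.compile(r"[A-Za-z0-9/+=]{40}")
SEPARATOR = ":"  # appears in neither character set


def join_credentials(key_id: str, secret: str) -> str:
    """Combine both values into one string (hypothetical helper)."""
    if not ACCESS_KEY_ID.fullmatch(key_id):
        raise ValueError("unexpected access key ID format")
    if not SECRET_ACCESS_KEY.fullmatch(secret):
        raise ValueError("unexpected secret access key format")
    return f"{key_id}{SEPARATOR}{secret}"


def split_credentials(combined: str) -> tuple[str, str]:
    """Split on the first separator; neither part should contain it."""
    key_id, secret = combined.split(SEPARATOR, 1)
    return key_id, secret


# Placeholder credentials from the AWS documentation.
combined = join_credentials(
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
print(split_credentials(combined))
```

Validating both parts before joining means a key that ever falls outside the assumed character sets is rejected up front instead of producing a string that cannot be split reliably.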
I’ve developed a function in PowerShell (.NET Framework) to retrieve data from any given Azure Blob Storage: so far, so good.
When it comes to validating users’ input, unfortunately, I cannot rely on external modules or libraries such as the NameValidator Class of the Azure SDK for .NET.
Nevertheless, the article Naming and Referencing Containers, Blobs, and Metadata goes into the details of naming rules and thus regex patterns might come to the rescue.
For Container Names I’ve come up with this, and it seems to fit:
(?=^.{3,63}$)(?!.*--)[^-][a-z0-9-]*[^-]
Container Names
A container name must be a valid DNS name, conforming to the following naming rules:
Container names must start or end with a letter or number, and can contain only letters, numbers, and the dash (-) character.
Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
All letters in a container name must be lowercase.
Container names must be from 3 through 63 characters long.
For Blob Names however I’m not able to get around the counting of path segments:
(?=^.{1,1024}$)(?<=^|\/)(\S*?)[^\.] (?=\/|$)
NB: the Azure Storage emulator has been deprecated and is therefore out of scope.
Blob Names
A blob name must conform to the following naming rules:
A blob name can contain any combination of characters.
A blob name must be at least one character long and cannot be more than 1,024 characters long, for blobs in Azure Storage.
The Azure Storage emulator supports blob names up to 256 characters long. For more information, see Use the Azure storage emulator for development and testing.
Blob names are case-sensitive.
Reserved URL characters must be properly escaped.
The number of path segments comprising the blob name cannot exceed 254. A path segment is the string between consecutive delimiter characters (e.g., the forward slash '/') that corresponds to the name of a virtual directory.
Note: Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two. No path segments should end with a dot (.).
The Blob service is based on a flat storage scheme, not a hierarchical scheme. However, you may specify a character or string delimiter within a blob name to create a virtual hierarchy. For example, the following list shows valid and unique blob names. Notice that a string can be valid as both a blob name and as a virtual directory name in the same container:
/a
/a.txt
/a/b
/a/b.txt
You can take advantage of the delimiter character when enumerating blobs.
NB: Just before asking this question, I found these, which either answer what I’ve already solved on my own or use the aforementioned class:
Azure Container Name RegEx
How to validate Azure storage blob names
By the way, does anybody know which flavor of regex is used by PowerShell?
You can use either of the following patterns:
^(?!.{1025})/?[^/]*[^/.](?:/[^/]*[^/.]){0,253}$
^(?=.{1,1024}$)/?[^/]*[^/.](?:/[^/]*[^/.]){0,253}$
See the regex demo.
Details:
^ - start of string
(?=.{1,1024}$) - the string should contain from 1 to 1024 chars
(?!.{1025}) - the string cannot contain 1025 or more chars
/? - an optional /
[^/]*[^/.] - zero or more chars other than / and then a char other than / and .
(?:/[^/]*[^/.]){0,253} - zero to 253 occurrences of / followed by zero or more chars other than / and then a char other than / and .
$ - end of string.
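For illustration, here is a minimal sketch that exercises the second pattern. It is written in Python rather than PowerShell purely so it is easy to run; the pattern uses only lookaheads, character classes, and bounded repetition, which the .NET regex engine behind PowerShell's -match operator also supports. The function name is_valid_blob_name is hypothetical.

```python
import re

# Second pattern from the answer above, unchanged.
BLOB_NAME = re.compile(r"^(?=.{1,1024}$)/?[^/]*[^/.](?:/[^/]*[^/.]){0,253}$")


def is_valid_blob_name(name: str) -> bool:
    """True if the name passes the length, segment-count and trailing-dot checks."""
    return BLOB_NAME.match(name) is not None


# The four names from the documentation excerpt, plus three that should fail
# because a segment ends with a dot or the name ends with a slash.
for candidate in ["/a", "/a.txt", "/a/b", "/a/b.txt", "a.", "a/", "a./b"]:
    print(f"{candidate!r}: {is_valid_blob_name(candidate)}")
```

Note that the pattern only covers length, the number of path segments, and trailing dots or slashes; it does not check whether reserved URL characters have been escaped.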
I am trying to solve an email domain co-existence problem with Exchange Online. Basically, when a message is sent to one tenant (domain.com) and forwarded to another tenant (newdomain.com), I need the To and/or CC headers to be replaced with the endpoint (newdomain.com) email addresses before the message is delivered to its final destination.
For Example:
1) A Gmail (or any other) user sends an email to sally.sue#domain.com; the MX record for that domain is looked up, and the message is delivered to the Office 365 tenant for domain.com
2) That same Office 365 tenant is set to forward emails to sally.sue#newdomain.com (a different tenant)
3) When the message arrives for Sally Sue at newdomain.com and she hits "Reply All", the original sender AND her old address (sally.sue#domain.com) are added to the To: line of the email.
The way to fix that is to use Header Replacement with Proofpoint, which, as mentioned below, works on a single-user basis. The entire question below is me trying to get it to work using regex (as that's the only solution) for a large number of users.
I need to convert the following users email address:
username#domain.com to username#newdomain.com
This has to be done using Proofpoint, which is a cloud-hosted MTA. Their support has provided an answer of sorts, but it's not working.
Proofpoint support has suggested using this:
Header Name : To
Find Value : domain\.com$
Replace : newdomain\.com$ or just newdomain.com
Neither of the above works. In both cases the values are simply ignored.
This seems to find the values:
Header Name : To
Find Value : \b[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b
Replace : $1#fake.com
But the above simply and only replaces the To: line (in the email) with the literal string: $1#fake.com
I would also need to match lowercase letters and numbers in email addresses. I believe the above example only matches uppercase letters.
I need it to do the following:
Header Name : To
Find Value : \b[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b (find users email address, domain)
Replace : user.name#newdomain.com
This is for a large number of users so there is no way to manually update or create separate rules for each user.
If I do create an individual rule, it works as expected, but as stated, that requires manually typing out each user's To: address and their new desired To: address.
This solution here almost worked: Regex to replace email address domains?
I have a couple of observations from general experience, although I have not worked with Office365 specifically.
First, a regex used for replacement usually needs to have a "capture group". This is often expressed with parentheses, as in:
match : \b([A-Z0-9._%-]+)#domain.com$
replacement : $1#newdomain.com
The idea is that the $1 in the replacement pattern is replaced with whatever was found within the () in the matching pattern.
Note that some regex engines use a different symbol for the replacement, so it might be \1#newdomain.com or some such. Note also that some regex engines need the parentheses escaped, so the matching pattern might be something like \b\([A-Z0-9._%-]+\)#domain.com$
Second, if you want to include - inside a "character class" set (that is, inside square brackets []), then the - should be first; otherwise it's ambiguous because - is also used for a range of characters. The regex engine in question might not care, but I suggest writing your matching pattern as:
\b([-A-Z0-9._%]+)#domain.com$
This way, the first - is unambiguously itself, because there is nothing before it to indicate the start of a range.
Third, for lowercase letters, it's easiest to just expand your character class set to include them, like so:
[-A-Za-z0-9._%]
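To make the capture-group idea concrete, here is a minimal sketch in Python. It is not Proofpoint's syntax, and whether Proofpoint accepts the same pattern is an assumption that would need testing. The separator is written as # exactly as in the thread, the dot in the domain is escaped so that it only matches a literal dot, and the function name rewrite_recipient is hypothetical.

```python
import re

# Capture the local part (everything before the separator) in group 1.
# The separator is written as "#" exactly as in the thread.
OLD_DOMAIN = re.compile(r"\b([-A-Za-z0-9._%]+)#domain\.com$")


def rewrite_recipient(address: str) -> str:
    """Swap the old domain for the new one, keeping the captured local part."""
    # Python spells the backreference \1 (or \g<1>) rather than $1.
    return OLD_DOMAIN.sub(r"\1#newdomain.com", address)


print(rewrite_recipient("sally.sue#domain.com"))  # sally.sue#newdomain.com
print(rewrite_recipient("Bob_99#domain.com"))     # Bob_99#newdomain.com
print(rewrite_recipient("someone#other.com"))     # unchanged
```

Python happens to be one of the engines that uses \1 instead of $1 in the replacement, which is the kind of difference worth checking for if the replacement string shows up literally in the header.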
I am trying to create Elasticsearch indexes with names like xxx/yyy and xxx yyy, but these are not permitted because they contain illegal characters (/ and the space character). These names are largely user-created and out of my control, so changing the names for the sake of fitting Elasticsearch's requirements is not really an option.
This is the exact error message:
[Error: InvalidIndexNameException[[XXX\%FFZZZ] Invalid index name [XXX\%FFZZZ], must not contain the following characters [\, /, *, ?, ", <, >, |, , ,]]]
Anyways, I've tried URL encoding the strings, but that doesn't work because those include capital letters which are not permitted and backslash escaping is out of the question because it is in the list of illegal characters.
Is there a conventional solution to this problem, or do I have to come up with some sketchy serialization and/or hashing scheme to solve this?
Hmm, letting users control things like the index name is asking for trouble :)
But if you're willing to pursue that route, what I suggest is simply to remove any character that is not alphanumeric and lowercase the result in the process.
In PHP that would be:
$index = strtolower(preg_replace("/[^a-z0-9]+/i", "", $index));
In Java:
index = index.replaceAll("(?i)[^a-z0-9]+", "").toLowerCase();
In JavaScript:
index = index.replace(/[^a-z0-9]+/gi, "").toLowerCase();
Please do not allow users to define the index name. You can try to filter out illegal characters, but your regexp might have an issue, and you might run into trouble later.
Also, users might not understand why they run into problems when one user writes to My_Index and the next user, trying to access myindex, ends up in the same index.
BTW: The regexp given above is more strict than the list of legal characters asks for. For example _ is legal (but not at the beginning of the name), if you wanted to create a regexp that allows everything that is legal by ES standards, your regexp becomes more complicated and more error prone.
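A minimal sketch of the strip-and-lowercase approach, which also makes the collision risk visible. The function name sanitize_index_name is hypothetical, and the rule is the strict alphanumeric-only one suggested earlier, not the full set of names Elasticsearch allows.

```python
import re


def sanitize_index_name(raw: str) -> str:
    """Drop every non-alphanumeric ASCII character, then lowercase the rest."""
    cleaned = re.sub(r"[^a-z0-9]+", "", raw, flags=re.IGNORECASE | re.ASCII)
    return cleaned.lower()


print(sanitize_index_name("xxx/yyy"))  # xxxyyy
print(sanitize_index_name("xxx yyy"))  # xxxyyy

# Distinct user-facing names can collapse into the same index name.
print(sanitize_index_name("My_Index") == sanitize_index_name("myindex"))  # True
```

The two names from the question end up identical after sanitizing, so a scheme like this only works if such collisions are acceptable or are detected separately.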
I've recently upgraded a CloudSearch instance from the 2011 to the 2013 API. Both instances have a field called sid, which is a text field containing a two-letter code followed by some digits e.g. LC12345. With the 2011 API, if I run a search like this:
q=12345*&return-fields=sid,name,desc
...I get back 1 result, which is great. But the sid of the result is LC12345 and that's the way it was indexed. The number 12345 does not appear anywhere else in any of the resulting document fields. I don't understand why it works. I can only assume that this type of query is looking for any terms in any fields that even contain the number 12345.
The reason I'm asking is because this functionality is now broken when I query using the 2013 API. I need to use the structured query parser, but even a comparable wildcard query using the simple parser is not working e.g.
q.parser=simple&q=12345*&return=sid,name,desc
...returns nothing, although the document is definitely there i.e. if I query for LC12345* it finds the document.
If I could figure out how to get the simple query working like it was before, that would at least get me started on how to do the same with the structured syntax.
Why it's not working
CloudSearch v1 (2011) had a different way of tokenizing mixed alpha+numeric strings. Here's the logic as described in the archived docs (emphasis mine).
If a string contains both alphabetic and numeric characters and is at least three and no more than nine characters long, the alphabetic and numeric portions of the string are treated as separate tokens. For example, the string DOC298 is tokenized into two terms: doc 298
CloudSearch v2 (2013) text processing follows Unicode Text Segmentation, which does not specify that behavior:
Do not break within sequences of digits, or digits adjacent to letters (“3a”, or “A3”).
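The difference can be sketched as follows. This is only an approximation of the 2011 rule as quoted above, not CloudSearch's actual tokenizer; the function names are hypothetical, and lowercasing is assumed from the "doc 298" example.

```python
import re


def tokenize_v1_like(term: str) -> list[str]:
    """Approximate the quoted 2011 rule: split short mixed alphanumeric strings."""
    has_alpha = any(c.isalpha() for c in term)
    has_digit = any(c.isdigit() for c in term)
    if has_alpha and has_digit and 3 <= len(term) <= 9:
        return re.findall(r"[a-z]+|[0-9]+", term.lower())
    return [term.lower()]


def prefix_matches(tokens: list[str], prefix: str) -> bool:
    """A prefix query such as 12345* matches if any token starts with the prefix."""
    return any(token.startswith(prefix) for token in tokens)


v1_tokens = tokenize_v1_like("LC12345")  # ['lc', '12345']
v2_tokens = ["lc12345"]                  # 2013: digits adjacent to letters stay together

print(prefix_matches(v1_tokens, "12345"))  # True
print(prefix_matches(v2_tokens, "12345"))  # False
```

Under the old rule the digits form their own token, so a prefix query on 12345 has something to match; under the new rule the only token starts with lc.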
Solution
You should just be able to search *12345 to get back results with any prefix. There may be some edge cases like getting back results you don't want (things with more preceding digits like AB99912345); I don't know enough about your data to say whether those are real concerns.
Another option would be to index the alphabetic prefix separately from the numeric suffix, but that's additional work that may be unnecessary.
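If you do go the separate-field route, the split itself is simple. This is a minimal sketch assuming every sid is exactly two letters followed by digits, as described in the question; the field names sid_prefix and sid_number are hypothetical and would need to be added to the domain's index configuration.

```python
import re

# Two letters followed by one or more digits, e.g. LC12345.
SID = re.compile(r"([A-Za-z]{2})([0-9]+)")


def split_sid(sid: str) -> dict[str, str]:
    """Return the original sid plus its two parts as separate (hypothetical) fields."""
    match = SID.fullmatch(sid)
    if match is None:
        raise ValueError(f"unexpected sid format: {sid!r}")
    return {
        "sid": sid,
        "sid_prefix": match.group(1),
        "sid_number": match.group(2),
    }


print(split_sid("LC12345"))
```

A prefix query against the numeric field would then behave like the old 12345* search without relying on how the tokenizer treats mixed strings.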
I'm guessing you are using Cloudsearch in English, so maybe this isn't your specific problem, but also watch out for Stopwords in your search queries:
https://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-analysis-schemes.html#stopwords
In your example, the word "jo" is a stop word in Danish and some other languages, and each supported language has its own dictionary of very common stop words. If you don't specify a language for your text field, it defaults to English. You can see the dictionaries here: https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html#text-processing-settings
It looks like, from inspection, that the form of a Google Tag Manager ID is "GTM-XXXXXX", where the X's are [A-Z]|\d. Is this accurate? I need to verify whether the IDs being submitted to a CMS are valid.
I've just written this one for our framework:
/^GTM-[A-Z0-9]{1,7}$/
Tested with 25 GTM container IDs in our account, all passed validation. You can try this expression out here.
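For a CMS-side check, the same expression could be applied like this. It is a minimal Python sketch; the function name is_valid_gtm_id is hypothetical, and the 1 to 7 character limit is simply what the expression above allows, not a documented format.

```python
import re

# Same rule as the expression above: "GTM-" plus 1 to 7 uppercase letters or digits.
GTM_ID = re.compile(r"GTM-[A-Z0-9]{1,7}")


def is_valid_gtm_id(value: str) -> bool:
    """True if the whole string looks like a GTM container ID under the assumed rule."""
    return GTM_ID.fullmatch(value) is not None


for candidate in ["GTM-ABC123", "GTM-AB12", "GTM-", "gtm-abc123", "GTM-ABCDEFGH"]:
    print(f"{candidate}: {is_valid_gtm_id(candidate)}")
```

Using fullmatch plays the role of the ^ and $ anchors, so an ID embedded in a longer string is rejected.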
The format varies. I see various combinations of numbers and letters, some just letters, none just numbers, most 6 characters, and few with 4 characters. There's no clear pattern. They begin with either a letter or number, and end with a letter or number.