Regular Expression to strip sensitive information from a JSON object - regex

I have a JSON object like the one below, from which I want to strip sensitive information such as password, mobile number, etc. using regular expressions.
Example JSON
{
"username":"abc",
"password":"xyz123",
"Security":{
"SecurityQuestion":"what is your first pet name",
"SecurityAnswer": "snoopy"
}
}
From the above JSON object, I want to strip sensitive fields such as "password" and "SecurityAnswer". I tried various regular expression patterns, but each one removed only one of the fields.
I need help constructing a regular expression in which I can list any field names and have those fields stripped from the JSON.
Expected Output:
{
"username":"abc",
"Security":{
"SecurityQuestion":"what is your first pet name"
}
}
Note: If the removed property is the last one in its object, the expression should also remove the comma (,) after the previous property.
I tried the expression from "Regex remove json property" in various combinations, but none met my requirement.

If you want to get values from JSON, you don't need a regex, let alone a very complex one.
var data = {
"username":"abc",
"password":"xyz123",
"Security":{
"SecurityQuestion":"what is your first pet name",
"SecurityAnswer": "snoopy"
}
}
That is your object. To retrieve the data, simply treat it as a regular JavaScript object.
function retrieveData( Obj ) {
return {
username: Obj.username,
Security:{
SecurityQuestion: Obj.Security.SecurityQuestion
}
}
}
var extractedData = retrieveData(data);
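If the list of sensitive field names needs to be configurable, as the question asks, one possible sketch is to parse the text and let JSON.stringify drop the unwanted keys through its replacer argument. The helper name stripSensitive and its parameters are hypothetical, and the sketch assumes the input is valid JSON text.

```javascript
// Hypothetical helper: returns JSON text with the listed keys removed at any depth.
function stripSensitive(jsonText, sensitiveKeys) {
  const blocked = new Set(sensitiveKeys);
  // When the replacer returns undefined for an object property,
  // JSON.stringify omits that property (and handles the commas itself).
  return JSON.stringify(
    JSON.parse(jsonText),
    (key, value) => (blocked.has(key) ? undefined : value),
    2
  );
}

const input = `{
  "username": "abc",
  "password": "xyz123",
  "Security": {
    "SecurityQuestion": "what is your first pet name",
    "SecurityAnswer": "snoopy"
  }
}`;

const cleaned = stripSensitive(input, ["password", "SecurityAnswer"]);
console.log(cleaned);
```

Because the serializer rebuilds the text, the trailing-comma case from the question does not need special handling.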

Related

Extract JSON from String using flutter dart

Hello, I want to extract JSON from the input string below.
I have tried the regex below in Java and it works fine:
private static final Pattern shortcode_media = Pattern.compile("\"shortcode_media\":(\\{.+\\})");
I want the equivalent regex for Dart.
Input String
<script type="text/javascript">window.__initialDataLoaded(window._sharedData);</script><script type="text/javascript">window.__additionalDataLoaded('/p/B9fphP5gBeG/',{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});</script><script type="text/javascript">
<script type="text/javascript">window.__initialDataLoaded(window._newData);</script><script type="text/javascript">window._newData('/p/B9fphP5gBeG/',{"graphql":{"post":{"__typename":"id","id":"2260708142683789190","new_code":"B9fphP5gBeG"}}});</script><script type="text/javascript">
(function(){
function normalizeError(err) {
var errorInfo = err.error || {};
var getConfigProp = function(propName, defaultValueIfNotTruthy) {
var propValue = window._sharedData && window._sharedData[propName];
return propValue ? propValue : defaultValueIfNotTruthy;
};
return {}
}
)
Expected json
{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}}
Note: There are multiple JSON strings in the input string; I need the JSON of the shortcode_media tag.
Please use:
void main() {
String json = '''
{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}
''';
RegExp regExp = new RegExp(
"\"shortcode_media\":(\\{.+\\})",
caseSensitive: false,
multiLine: false,
);
print(regExp.stringMatch(json).toString());
}
output
"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}
The corresponding Dart RegExp would be:
static final RegExp shortcodeMedia = RegExp(r'"shortcode_media":(\{.+\})');
It does not work, though. JSON is not a regular language, so you can't parse it using regular expressions.
The value of "shortcode_media" in your example JSON ends with several } characters. The RegExp will stop the match at the third of those, even though the second } is the one matching the leading {. If your JSON text contains any further values after the shortcode_media entry, those might be included as well.
Stopping at the first } would also be too short.
If someone reorders the JSON source code to the equivalent
"shortcode_media":{"dimensions":{"height":1326,"width":1080},"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG"}
(that is, putting the "dimensions" entry first), then you would only capture until the end of the dimensions block.
I would recommend either using a proper JSON parser, or at least improving the RegExp to be able to handle a single nested JSON object - since you seem to already know that it will happen.
Such a RegExp could be:
RegExp(r'"shortcode_media":(\{(?:[^{}]*(?:\{.*?\})?)*?\})')
This RegExp will capture the correct number of braces for the example code, but still won't work if there are more nested JSON objects. Only a real parser can handle the general case correctly.
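As an illustration of the "proper parser" route, here is a minimal sketch that finds the key, scans forward while counting braces (ignoring braces inside string literals), and hands the balanced text to a JSON parser. It is written in JavaScript for illustration rather than Dart; the helper name extractObjectAfterKey is hypothetical, and it assumes the value that follows the key is an object.

```javascript
// Hypothetical helper: returns the balanced {...} text that follows `"key":`, or null.
// Assumes the value of the key is a JSON object.
function extractObjectAfterKey(source, key) {
  const marker = `"${key}":`;
  const keyAt = source.indexOf(marker);
  if (keyAt === -1) return null;
  const start = source.indexOf("{", keyAt + marker.length);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === "\\") i++;                 // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

const page = `window.__additionalDataLoaded('/p/B9fphP5gBeG/',{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});`;

const media = JSON.parse(extractObjectAfterKey(page, "shortcode_media"));
console.log(media.shortcode, media.dimensions.width);
```

Unlike the regular expressions above, the scan does not depend on the nesting depth or on the order of the entries.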

Regex for getting content of a html property when another specific property doesn't exist

I'm struggling to find a solution to what is probably a simple problem; despite going through a lot of questions, I can't make it work.
Here are 2 HTML elements:
Test1
Test2
I want to get ONLY the content of the 1st element's href property (#content1). It must match because the html element contains no "onclick" property.
This regex works for matching the 1st element only:
^<a href="#"((?!onclick).)*$
but I can't figure out how to get the HREF content.
I've tried this:
^<a href="#(.*)"((?!onclick).)*$
but in this case, both elements are matching.
Thanks for your help !
I strongly suggest that you should do that in two steps. For one thing, parsing arbitrary html with a regexp is a notoriously slippery and winding road. For the other: there is no achievement in doing everything with one illegible regex.
And there's more to it: "contains no "onclick" attribute" is not the same as "href attribute is not directly followed by onclick attribute". So, a one-regex solution would be either very complicated or very fragile (HTML attributes can appear in any order).
var a = [
'Test1',
'Test2'
];
console.log(
a.filter(i => i.match(/onclick/i) == null)
.map(i => i.match(/href="([^"]+)"/i)[1])
);
This assumes that your href attribute values are valid and do not contain quotes (which is, of course, technically possible).
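For a version that can be run as is, here is a sketch of the same two-step idea. The two anchor strings are assumptions made up for illustration (the original markup is not shown in full); the second one puts onclick before href to show that attribute order does not matter with this approach.

```javascript
// Hypothetical markup, for illustration only.
const anchors = [
  '<a href="#content1">Test1</a>',
  '<a onclick="doSomething()" href="#content2">Test2</a>',
];

const hrefs = anchors
  .filter((tag) => !/\bonclick\s*=/i.test(tag))   // step 1: drop tags that have an onclick attribute
  .map((tag) => tag.match(/\bhref="([^"]*)"/i))   // step 2: capture the quoted href value
  .filter((m) => m !== null)                      // skip tags without a quoted href
  .map((m) => m[1]);

console.log(hrefs); // [ '#content1' ]
```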
Regex is not made for this; working with the DOM in JavaScript is a better fit. This code stores the hrefs of the links that have no onclick handler in the variable hrefArray.
var hrefArray = [];
for (var elem of document.getElementsByTagName('a')) {
// keep only links without an onclick handler
// (elem.href is the resolved URL; elem.getAttribute('href') gives the raw value)
if (!elem.onclick) hrefArray.push(elem.href)
}
An example with your HTML is in the snippet below:
var hrefArray = [];
for (var elem of document.getElementsByTagName('a')) {
if (!elem.onclick) hrefArray.push(elem.href)
}
console.log(hrefArray);
body {
background-color: gray;
}
Test1
Test2

Custom validator to ban a specific wordlist

I need a custom validator to ban a specific list of banned words from a textarea field.
I need exactly this type of implementation, I know that it's not logically correct to let the user type part of a query but it's exactly what I need.
I tried with a regExp but it has a strange behaviour.
My RegExp
/(drop|update|truncate|delete|;|alter|insert)+./gi
my Validator
export function forbiddenWordsValidator(sqlRe: RegExp): ValidatorFn {
return (control: AbstractControl): { [key: string]: any } | null => {
const forbidden = sqlRe.test(control.value);
return forbidden ? { forbiddenSql: { value: control.value } } : null;
};
}
my formControl:
whereCondition: new FormControl("", [
Validators.required,
forbiddenWordsValidator(this.BAN_SQL_KEYWORDS)...
It works only in certain cases, and I don't understand why the same string is rejected one time and accepted after I delete a character and retype it; sometimes, if I type a whitespace, the validator returns OK.
There are several issues here:
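The global g modifier makes RegExp#test (and similar methods) advance the regex's lastIndex after each successful match, so repeated calls on the same regex object give alternating results. It must be removed.
The . at the end requires one more character (any character other than a line break) after the keyword, so a keyword at the very end of the input is not matched. It must be removed too.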
Use
/drop|update|truncate|delete|;|alter|insert/i
Or, to match the words as whole words use
/\b(?:drop|update|truncate|delete|alter|insert)\b|;/i
This way, insert in insertion and drop in dropout won't get "caught" (=matched).
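The effect of the g flag can be reproduced outside Angular with a short sketch. The variable names are hypothetical; the two patterns are the one from the question and the whole-word one suggested above.

```javascript
// Pattern from the question: the g flag makes test() remember lastIndex between calls.
const original = /(drop|update|truncate|delete|;|alter|insert)+./gi;
const input = "drop table";
const alternating = [original.test(input), original.test(input), original.test(input)];
console.log(alternating); // [ true, false, true ]

// Without g the result is stable, but the trailing "." still misses a keyword at the end of the input.
const withoutG = /(drop|update|truncate|delete|;|alter|insert)+./i;
console.log(withoutG.test("drop")); // false

// Whole-word pattern without g and without the trailing ".".
const fixed = /\b(?:drop|update|truncate|delete|alter|insert)\b|;/i;
const checks = {
  drop: fixed.test("drop"),
  dropout: fixed.test("dropout"),
  insertion: fixed.test("insertion"),
  semicolon: fixed.test("id = 1; x"),
};
console.log(checks); // { drop: true, dropout: false, insertion: false, semicolon: true }
```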
See the regex demo.
it's not a great idea to give such power to the user

Regular Expression If 2nd parameter is Enrollment

I have below response
{
"id": "3452",
"enrollable_id": "3452",
"enrollable_type": "Enrollment"
}
{
"id": "3453",
"enrollable_id": "3453",
"enrollable_type": "Task"
}
{
"id": "3454",
"enrollable_id": "3454",
"enrollable_type": "Enrollment"
}
{
"id": "3455",
"enrollable_id": "3455",
"enrollable_type": "Task"
}
I would like to get id [3452 and 3454] only if enrollable_type= Enrollment. This is for jmeter regex extractor so it would be great if I can just use one liner regex to fetch 3452 and 3454.
The RegEx you are looking for is:
_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))
Explanation
_id":\s*"            Finds the place where the enrollable_id is
[^"]+(?=             Matches the ID if:
[^\0}]+_type":\s*    Finds the place where enrollable_type is
"E                   Checks if the enrollable type begins with an uppercase E
)                    End if
( )                  Captures the ID
It's important to note that the full match is longer than the ID (it includes the leading _id": " text), while the ID itself is in capture group 1. This means you will need to read each match's capture group rather than the match itself.
Disclaimer
The above RegEx contains backslashes, which you will need to escape if using the RegEx as a string literal.
This is the RegEx with all necessary-to-escape characters escaped:
_id":\\s*"([^"]+(?=[^\\0}]+_type":\\s*"E))
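To show the difference between the match and its capture, here is a small sketch that runs the unescaped pattern over the sample response. It uses JavaScript for illustration only; JMeter's regex engine may behave differently in edge cases, and the variable names are hypothetical.

```javascript
const response = `{
"id": "3452",
"enrollable_id": "3452",
"enrollable_type": "Enrollment"
}
{
"id": "3453",
"enrollable_id": "3453",
"enrollable_type": "Task"
}
{
"id": "3454",
"enrollable_id": "3454",
"enrollable_type": "Enrollment"
}
{
"id": "3455",
"enrollable_id": "3455",
"enrollable_type": "Task"
}`;

const pattern = /_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))/g;

// m[0] is the whole match (it starts with _id": "), m[1] is the captured ID.
const matches = Array.from(response.matchAll(pattern));
const ids = matches.map((m) => m[1]);

console.log(ids); // [ '3452', '3454' ]
```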
It's usually a bad idea to parse structured data with just a regex, but if you're intent on going this route then here you go:
"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")
This assumes that enrollable_type always follows enrollable_id and that everything is quoted consistently, with a little allowance for variance in white space. You should be able to handle a little more variance if necessary, such as if you're unsure whether you can depend on keys or data being quoted (["']?). However, if you can't depend on the order of the properties (such as if the type comes before the id), then you should abandon using a regex.
Here's a sample working in JavaScript
const text = `{ "id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment" } { "id": "3453", "enrollable_id": "3453", "enrollable_type": "Task" } { "id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment" } { "id": "3455", "enrollable_id": "3455", "enrollable_type": "Task" }`;
const re = /"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")/g;
var match;
while(match = re.exec(text)) {
console.log(match[1]);
}
Your response seems to be a JSON one (however it's malformed). If this is the case and it's really JSON - I would recommend going for JSON Extractor instead as regular expressions are fragile, sensitive to markup change, new lines, order of elements, etc. while JSON Extractor looks only into the content.
The relevant JSON Path query would be something like:
$..[?(@.enrollable_type == 'Enrollment')].enrollable_id
More information: JMeter's JSON Path Extractor Plugin - Advanced Usage Scenarios
You can extract the data in 2 ways:
Using the JSON Extractor.
To extract data using the JSON Extractor, the response data must follow JSON syntax rules.
Use the following JSON path in the JSON Extractor:
$..[?(@.enrollable_type=="Enrollment")].id
and set Match No. to -1.
Using the Regular Expression Extractor.
To extract data using the Regular Expression Extractor, use the following regex:
id": "(.+?)",\s*(.+?)\s*"enrollable_type": "Enrollment
template : $1$2$3$4$
Match no -1
You can see the stored variables using the Debug Sampler.

Replace variable names with actual class Properties - Regex? (C#)

I need to send a custom email message to every User in a list (List<User>) I have. (I'm using C# .NET)
I need to replace every expression that starts with "[?&=", has a variable name in the middle, and ends with "]" with the value of the corresponding User property.
So for example if I have a text like this:
"Hello, [?&=Name]. A gift will be sent to [?&=Address], [?&=Zipcode], [?&=Country].
If [?&=Email] is not your email address, please contact us."
I would like to get this for the user:
"Hello, Mary. A gift will be sent to Boulevard Spain 918, 11300, Uruguay.
If marytech@gmail.com is not your email address, please contact us."
Is there a practical and clean way to do this with Regex?
This is a good place to apply regex.
The regular expression you want looks like this: /\[\?&=(\w*)\]/
You will need to do a replace on the input string using a method that accepts a custom function for the replacement values. Inside that function, use the first capture group as the key and look up the corresponding value.
Since you did not specify what language you are using I will be nice and give you an example in C# and JS that I made for my own projects just recently.
Pseudo-Code
Loop through matches
Key is in first capture group
Check if replacements dict/obj/db/... has value for the Key
if Yes, return Value
else return ""
C#
email = Regex.Replace(email, @"\[\?&=(\w*)\]",
match => //match contains a Key & Replacements dict has value for that key
match?.Groups[1].Value != null
&& replacements.ContainsKey(match.Groups[1].Value)
? replacements[match.Groups[1].Value]
: "");
JS
var content = text.replace(/\[\?&=(\w*)\]/g,
function (match, p1) {
return replacements[p1] || "";
});
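A self-contained sketch of the JS variant, using the sample text from the question, might look like the following. The user object and the fillTemplate helper are hypothetical. The lookup uses hasOwnProperty instead of ||, so that falsy values such as 0 are still inserted; unknown keys are replaced with an empty string, as in the pseudo-code.

```javascript
// Hypothetical user record; property names mirror the placeholders in the question.
const user = {
  Name: "Mary",
  Address: "Boulevard Spain 918",
  Zipcode: "11300",
  Country: "Uruguay",
  Email: "marytech@gmail.com",
};

const template =
  "Hello, [?&=Name]. A gift will be sent to [?&=Address], [?&=Zipcode], [?&=Country].\n" +
  "If [?&=Email] is not your email address, please contact us.";

// Replaces each [?&=Key] placeholder with values[Key], or "" when the key is unknown.
function fillTemplate(text, values) {
  return text.replace(/\[\?&=(\w*)\]/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : ""
  );
}

const message = fillTemplate(template, user);
console.log(message);
```

To send one message per user, the same function can be called in a loop over the list, passing each user in turn.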