Matching and storing part of a string in a variable using JScript.NET - regex

I am fiddling with a script for Fiddler, which uses JScript.NET. I have a string of the format:
{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue", ...
I want to match and show "key2":"someothervalue" where someothervalue could be any value but the key is static.
Using good old sed and bash I can replace the part I am looking for with:
$ a='{"params":{"key1":"somevalue","key2":"someothervalue","key3":"whatevervalue", ...'
$ echo $a | sed -r 's/"key2":"[^"]+"/replaced/g'
{"params":{"key1":"somevalue",replaced,"key3":"whatevervalue", ...
Now, instead of replacing it, I want to extract that part into a variable using JScript.NET. How can that be done?

The most graceful way is to use a JSON parser. My personal preference is to import IE's JSON parser using the htmlfile COM object.
import System;
var str:String = '{"params":{"key1":"foo","key2":"bar","key3":"baz"}}',
htmlfile = new ActiveXObject('htmlfile');
// force htmlfile COM object into IE9 compatibility
htmlfile.IHTMLDocument2_write('<meta http-equiv="x-ua-compatible" content="IE=9" />');
// clone JSON object and methods into familiar syntax
var JSON = htmlfile.parentWindow.JSON,
// deserialize your JSON-formatted string
obj = JSON.parse(str);
// access JSON values as members of a hierarchical object
Console.WriteLine("params.key2 = " + obj.params.key2);
// beautify the JSON
Console.WriteLine(JSON.stringify(obj, null, '\t'));
Compiling, linking, and running results in the following console output:
params.key2 = bar
{
"params": {
"key1": "foo",
"key2": "bar",
"key3": "baz"
}
}
Alternatively, there are also at least a couple of .NET namespaces which provide methods to serialize objects into a JSON string, and to deserialize a JSON string into objects. Can't say I'm a fan, though. The ECMAScript notation of JSON.parse() and JSON.stringify() are certainly a lot easier and profoundly less alien than whatever neckbeard madness is going on at Microsoft.
And while I certainly don't recommend scraping JSON (or any other hierarchical markup if it can be helped) as complicated text, JScript.NET will handle a lot of familiar Javascript methods and objects, including regex objects and regex replacements on strings.
sed syntax:
echo $a | sed -r 's/("key2"):"[^"]*"/\1:"replaced"/g'
JScript.NET syntax:
print(a.replace(/("key2"):"[^"]*"/, '$1:"replaced"'));
JScript.NET, just like JScript and JavaScript, also allows for calling a lambda function for the replacement.
print(
a.replace(
/"(key2)":"([^"]*)"/,
// $0 = full match; $1 = (key2); $2 = ([^"]*)
function($0, $1, $2):String {
var replace:String = $2.toUpperCase();
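// a string returned from a function is used literally, so concatenate $1 rather than embedding "$1"
return '"' + $1 + '":"' + replace + '"';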
}
)
);
... Or to extract the value of key2 using the RegExp object's exec() method:
var extracted:String = /"key2":"([^"]*)"/.exec(a)[1];
print(extracted);
Just be careful with that, though: exec() returns null when there is no match, so reading element [1] of the result will throw. You might want to guard it with if (/"key2":/.test(a)) or wrap it in a try...catch. Or better yet, just do what I said earlier and deserialize your JSON into an object.
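For comparison, here is a minimal sketch of the same two approaches in Python rather than JScript.NET: deserialize first, and fall back to a guarded regex only if parsing is not an option. The sample string and the helper name extract_key2 are illustrative assumptions, not part of the original script.

```python
import json
import re

raw = '{"params":{"key1":"foo","key2":"bar","key3":"baz"}}'

# Preferred: deserialize, then read the value as ordinary data.
obj = json.loads(raw)
parsed_value = obj["params"]["key2"]


def extract_key2(text):
    """Regex fallback that returns None instead of failing when nothing matches."""
    m = re.search(r'"key2":"([^"]*)"', text)
    return m.group(1) if m else None


print(parsed_value)
print(extract_key2(raw))
print(extract_key2('{"params":{}}'))
```

The explicit None check plays the same role as the test() guard or try...catch suggested above.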

Related

Regex for finding the name of a method containing a string

I've got a Node module file containing about 100 exported methods, which looks something like this:
exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};
Goal: What I'd like to do is figure out how to grab the name of any method which contains a call to fooMethod, and return the correct method names: methodTwo and methodThree. I wrote a regex which gets kinda close:
exports\.(\w+).*(\n.*?){1,}fooMethod
Problem: using my example code from above, though, it would effectively match methodOne and methodThree, because it finds the first instance of exports and then the first instance of fooMethod and goes on from there. Here's a regex101 example.
I suspect I could make use of lookaheads or lookbehinds, but I have little experience with those parts of regex, so any guidance would be much appreciated!
Edit: Turns out regex is poorly suited for this type of task. @ctcherry advised using a parser, and using that as a springboard, I was able to learn about Abstract Syntax Trees (ASTs) and the recast tool, which lets you traverse the tree after using various tools (acorn and others) to parse your code into tree form.
With these tools in hand, I successfully built a script to parse and traverse my node app's files, and was able to find all methods containing fooMethod as intended.
Regex isn't the best tool to tackle all the parts of this problem. Ideally we could rely on something higher level: a parser.
One way to do this is to let the javascript parse itself during load and execution. If your node module doesn't include anything that would execute on its own (or at least anything that would conflict with the below), you can put this at the bottom of your module, and then run the module with node mod.js.
console.log(Object.keys(exports).filter(fn => exports[fn].toString().includes("fooMethod(")));
(In the comments below it is revealed that the above isn't possible.)
Another option would be to use a library like https://github.com/acornjs/acorn (there are other options) to write some other javascript that parses your original target javascript, then you would have a tree structure you could use to perform your matching and eventually return the function names you are after. I'm not an expert in that library so unfortunately I don't have sample code for you.
This regex matches (only) the method names that contain a call to fooMethod();
(?<=exports\.)\w+(?=[^{]+\{[^}]+fooMethod\(\)[^}]+};)
See live demo.
Assuming that all methods have their body enclosed within { and }, I would make an approach to get to the final regex like this:
First, find a regex to get the individual methods. This can be done using this regex:
exports\.(\w+)(\s|.)*?\{(\s|.)*?\}
Next, we are interested in those methods that have fooMethod in them before they close. So, look for } or fooMethod.*}, in that order. So, let us name the group searching for fooMethod as FOO and the name of the method calling it as METH. When we iterate the matches, if group FOO is present in a match, we will use the corresponding METH group, else we will reject it.
exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})
Explanation:
exports\.(?<METH>\w+): Till the method name (you have already covered this)
(\s|.)*?\{(\s|.)*?: Some code before { and after, non-greedy so that the subsequent group is given preference
(\}|(?<FOO>fooMethod)(\s|.)*?\}): This has 2 parts:
\}: Match the method close delimiter, OR
(?<FOO>fooMethod)(\s|.)*?\}): The call to fooMethod followed by optional code and method close delimiter.
Here's JavaScript code that demonstrates this:
let p = /exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})/g
let input = `exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};`;
let match = p.exec( input );
while( match !== null) {
if( match.groups.FOO !== undefined ) console.log( match.groups.METH );
match = p.exec( input )
}
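A different, simpler technique than the single regex above is to cut the source at each export boundary and then search every chunk on its own. The sketch below does that in Python. It assumes every export starts at the beginning of a line as "exports.name =", and it would report a false positive if fooMethod( appeared only inside a comment or string; the function name methods_calling is made up for illustration.

```python
import re

source = """exports.methodOne = async user_id => {
  // other method contents
};
exports.methodTwo = async user_id => {
  // other method contents
  fooMethod();
};
exports.methodThree = async user_id => {
  // other method contents
  fooMethod();
};
"""


def methods_calling(source, callee):
    """Return names of exported methods whose chunk contains a call to callee."""
    # Each export is assumed to start a line with "exports.<name> =".
    starts = list(re.finditer(r'^exports\.(\w+)\s*=', source, flags=re.MULTILINE))
    found = []
    for i, m in enumerate(starts):
        # A chunk runs from this export up to the next one (or the end of the text).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(source)
        chunk = source[m.end():end]
        if re.search(r'\b' + re.escape(callee) + r'\s*\(', chunk):
            found.append(m.group(1))
    return found


print(methods_calling(source, "fooMethod"))
```

Because each chunk is searched separately, a match can never start in one method and end in another, which is the failure mode of the original pattern.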

Extract JSON from String using flutter dart

Hello, I want to extract JSON from the input string below.
I have tried the regex below in Java and it works fine:
private static final Pattern shortcode_media = Pattern.compile("\"shortcode_media\":(\\{.+\\})");
I want the equivalent regex for Dart.
Input String
<script type="text/javascript">window.__initialDataLoaded(window._sharedData);</script><script type="text/javascript">window.__additionalDataLoaded('/p/B9fphP5gBeG/',{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}});</script><script type="text/javascript">
<script type="text/javascript">window.__initialDataLoaded(window._newData);</script><script type="text/javascript">window._newData('/p/B9fphP5gBeG/',{"graphql":{"post":{"__typename":"id","id":"2260708142683789190","new_code":"B9fphP5gBeG"}}});</script><script type="text/javascript">
(function(){
function normalizeError(err) {
var errorInfo = err.error || {};
var getConfigProp = function(propName, defaultValueIfNotTruthy) {
var propValue = window._sharedData && window._sharedData[propName];
return propValue ? propValue : defaultValueIfNotTruthy;
};
return {}
}
)
Expected json
{"graphql":{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}}
Note: There are multiple JSON strings in the input string; I need the JSON of the shortcode_media tag.
Please use this:
void main() {
String json = '''
{"graphql":
{"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}},
"abc":{"def":"test"}
}
''';
RegExp regExp = new RegExp(
"\"shortcode_media\":(\\{.+\\})",
caseSensitive: false,
multiLine: false,
);
print(regExp.stringMatch(json).toString());
}
output
"shortcode_media":{"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG","dimensions":{"height":1326,"width":1080}}}
Dartpad
The corresponding Dart RegExp would be:
static final RegExp shortcodeMedia = RegExp(r'"shortcode_media":(\{.+\})');
It does not work, though. JSON is not a regular language, so you can't parse it using regular expressions.
The value of "shortcode_media" in your example JSON ends with several } characters. The RegExp will stop the match at the third of those, even though the second } is the one matching the leading {. If your JSON text contains any further values after the shortcode_media entry, those might be included as well.
Stopping at the first } would also be too short.
If someone reorders the JSON source code to the equivalent
"shortcode_media":{"dimensions":{"height":1326,"width":1080},"__typename":"GraphSidecar","id":"2260708142683789190","shortcode":"B9fphP5gBeG"}
(that is, putting the "dimensions" entry first), then you would only capture until the end of the dimensions block.
I would recommend either using a proper JSON parser, or at least improving the RegExp to be able to handle a single nested JSON object - since you seem to already know that it will happen.
Such a RegExp could be:
RegExp(r'"shortcode_media":(\{(?:[^{}]*(?:\{.*?\})?)*?\})')
This RegExp will capture the correct number of braces for the example code, but still won't work if there are more nested JSON objects. Only a real parser can handle the general case correctly.
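To illustrate what "use a real parser" can look like when the JSON is embedded in other text, here is a sketch in Python (not Dart): locate the key with a plain string search, then let the JSON decoder read exactly one value from that position. The helper name extract_object_after and the shortened page string are assumptions for the example; json.JSONDecoder.raw_decode is a standard-library call.

```python
import json

page = (
    "<script>window.__additionalDataLoaded('/p/B9fphP5gBeG/',"
    '{"graphql":{"shortcode_media":{"__typename":"GraphSidecar",'
    '"id":"2260708142683789190","shortcode":"B9fphP5gBeG",'
    '"dimensions":{"height":1326,"width":1080}}}});</script>'
)


def extract_object_after(text, key):
    """Decode the JSON value that directly follows "key": in text, or return None."""
    marker = '"' + key + '":'
    pos = text.find(marker)
    if pos == -1:
        return None
    start = pos + len(marker)
    # raw_decode parses one JSON value beginning at `start` and ignores whatever
    # follows it; it does not skip leading whitespace, so this sketch assumes
    # the value starts immediately after the colon.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


media = extract_object_after(page, "shortcode_media")
print(media["dimensions"])
```

The decoder tracks nesting itself, so the result does not depend on how many closing braces follow or on the order of the entries.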

How to extract the dynamically generated access token from the text file in perl?

I am a beginner at Perl programming and I want to know the solution to this problem. I have the information below in a text file called token.txt. I want to extract only the dynamically generated access_token value and store that value in a MySQL database. As mentioned, access_token is generated anew every time, so I need to store its value each time. Can anyone help me with the Perl code? Thanks in advance.
{
"access_token" : "JgV8Ln1lRGE8JTz4olEQW0rJJHUYsq2LO8Ny9o6m",
"token_type" : "abcdef",
"expires_in" : 123456
}
This is JSON formatted text, so I would suggest reading the file into a string and decoding it, e.g.:
parse.pl
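use strict;
use warnings;
use v5.10;
use File::Slurp;
use JSON;
# read the whole file as a single string, then decode the JSON text
my $token = decode_json( scalar read_file('token.txt') );
say $token->{'access_token'};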
Test it like this:
perl parse.pl
Output:
JgV8Ln1lRGE8JTz4olEQW0rJJHUYsq2LO8Ny9o6m
Or as a one-liner; the token will be in $tok:
perl -ne 's/"access_token"\s:\s"([^"]+)"/$tok=$1;print $1/e' token.txt

Running regex from windows BATCH

I'm trying to use a regex from a batch file
<zed>(.*?)<zed>
to find values I've stored in a file
<process>34593845387<process>
<zed>M567<zed>
<encode>UTF16<encode>
I'm able to do that from Java, but not from batch.
You'll probably have to use something like powershell, or another tool. The basics of what you can get in batch won't be enough. Once you do, you'll probably want a regex like:
<zed>([^<]+)<
That way if later the format changes a bit from:
<zed>234<zed>
to
<zed>234< /zed>
or something, it will still work. It's happened to me before :)
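As one example of "another tool", here is a short Python sketch that applies the suggested pattern to the sample data. The data is inlined as a string only to keep the example self-contained; in practice it would be read from the file.

```python
import re

data = """<process>34593845387<process>
<zed>M567<zed>
<encode>UTF16<encode>
"""

# Capture everything after <zed> up to the next "<", whatever tag follows.
ZED = re.compile(r'<zed>([^<]+)<')

values = [m.group(1) for m in ZED.finditer(data)]
print(values)
```

Because the pattern stops at the next "<" instead of requiring a specific closing tag, it keeps working if the closing marker changes.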
Findstr can technically use regex, but it's limited to character sets and can't handle capturing.
If your data looks exactly like that (nothing to the left of <zed>), you can tokenize the string using < and > as delimiters and store the value of <zed> as the second token in the string.
for /F "tokens=1,2 delims=<>" %%A in (data.txt) do if "%%A"=="zed" set "zed_value=%%B"
You can then access the variable with %zed_value%. If you have multiple <zed> fields, the variable will contain the value of the last one.
Why not use cscript to access JavaScript regex?
type data.txt | cscript //nologo match.js "<zed>(.*)<zed>"
Where match.js is defined as:
if (WScript.Arguments.Count() !== 1) {
WScript.Echo("Syntax: match.js regex");
WScript.Quit(1);
}
var rx = new RegExp(WScript.Arguments(0), "i");
var matched = false;
while (!WScript.StdIn.AtEndOfStream) {
var str = WScript.StdIn.ReadLine();
if (str.match(rx)) {
WScript.Echo(str);
matched = true;
}
}
if (!matched) {
WScript.Quit(1);
}

Regex to parse querystring values to named groups

I have HTML with the following content:
... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...
I would like to parse that and get a match with named groups:
match 1
group["user"]=123
group["section"]=2
match 2
group["user"]=678
group["section"]=5
I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.
Thank you!
In my case I had to parse a URL because the utility HttpUtility.ParseQueryString is not available in WP7. So, I created an extension method like this:
public static class UriExtensions
{
private static readonly Regex queryStringRegex;
static UriExtensions()
{
queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
}
public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
{
if (uri == null)
throw new ArgumentNullException("uri");
var matches = queryStringRegex.Matches(uri.OriginalString);
for (int i = 0; i < matches.Count; i++)
{
var match = matches[i];
yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
}
}
}
Then it's a matter of using it, for example:
var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
var userId = parameters["userId"];
var section = parameters["section"];
NOTE: I'm returning the IEnumerable instead of the dictionary directly because I'm assuming that there might be duplicate parameter names. If there are duplicate names, then ToDictionary will throw an exception.
Why use regex to split it out?
You could first extract the query string. Split the result on & and then create a map by splitting each piece on =
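The split-based approach can be sketched in a few lines of Python. The function name parse_query and the sample URL are illustrative assumptions; the sketch returns a list of pairs so that repeated parameter names are not lost.

```python
from urllib.parse import urlsplit


def parse_query(url):
    """Split the query string on '&', then split each piece on the first '='."""
    query = urlsplit(url).query
    pairs = []
    for part in query.split("&"):
        if not part:
            continue  # skip empty pieces such as a trailing '&'
        name, _, value = part.partition("=")
        pairs.append((name, value))
    return pairs


params = dict(parse_query("file.aspx?section=5&userId=678"))
print(params["userId"], params["section"])
```

The order of the parameters no longer matters once they are in a map. Python's standard urllib.parse.parse_qs does the same job and also handles percent-decoding.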
You didn't specify what language you are working in, but this should do the trick in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace RegexTest
{
class Program
{
static void Main(string[] args)
{
string subjectString = @"... some text ...
link ... some text ...
... some text ...
link ... some text ...
... some text ...";
Regex regexObj =
new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&userId=(?<user>.+?)""))");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
string user = matchResults.Groups["user"].Value;
string section = matchResults.Groups["section"].Value;
Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
matchResults = matchResults.NextMatch();
}
Console.ReadKey();
}
}
}
Using regex to first find the key value pairs and then doing splits... doesn't seem right.
I'm interested in a complete regex solution.
Anyone?
Check this out
\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>
You can get pairs with something like Groups["key"].Captures[i] & Groups["value"].Captures[i]
Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):
/href="[^?]*([?&](userId=(?<user>\d+))|section=(?<section>\d+))*"/
(By the way, the XHTML is malformed; & should be &amp; in the attributes.)
Another approach is to put the capturing groups inside lookaheads:
Regex r = new Regex(@"<a href=""file\.aspx\?" +
@"(?=[^""<>]*?user=(?<user>\w+))" +
@"(?=[^""<>]*?section=(?<section>\w+))");
If there are only two parameters, there's no reason to prefer this way over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need another lookahead just like the two existing ones.
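The lookahead technique also ports to Python, where it is handy because Python's re module does not allow two groups with the same name, so the alternation-based patterns cannot be copied over directly. This is a sketch under assumptions: the sample HTML is made up (the question's markup was lost), and the parameter name userId is borrowed from the other answers.

```python
import re

# Each lookahead scans forward inside the attribute value for one parameter,
# so the parameters may appear in any order.
LINK = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?userId=(?P<user>\w+))'
    r'(?=[^"<>]*?section=(?P<section>\w+))'
)

html = (
    '<a href="file.aspx?userId=123&section=2">link</a> ... '
    '<a href="file.aspx?section=5&userId=678">link</a>'
)

for m in LINK.finditer(html):
    print(m.group("user"), m.group("section"))
```

Supporting a third parameter would only take one more lookahead line of the same shape.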
By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex
userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)
This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
A simple Python implementation that overcomes the ordering problem:
In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')
In [3]: t = 'href="file.aspx?section=2&userId=123"'
In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]
In [5]: t = 'href="file.aspx?userId=123&section=2"'
In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]