Extracting key-value pairs from a string using ruby & regex - regex

I want to accomplish the following with ruby and if possible a regex:
Input: "something {\"key\":\"value\",\"key2\":3}"
Output: [["\"key\"", "\"value\""], ["\"key2\"", "3"]]
My attempt so far:
s = "something {\"key\":\"value\",\"key2\":3}"
s.scan(/.* {(?:([^:]+):([^,}]+),?)+}$/)
# Output: [["\"key2\"", "3"]]
For some reason the regex above only matches the last key value pair. Does someone know how to retrieve all the pairs?
Just to be clear, "something" can be any kind of string. For this reason, solutions such as (1) splitting the text directly on the equal or (2) a regex as used in s.scan(/(?:([^:]+):([^,}]+),?)/) don't work for me.
I know there are similar questions on SO. Still, from what I saw, they mostly tend towards the solutions 1 & 2 or focus on a single key value pair.

Your string looks like a JSON data structure encoded as a string. You can use JSON.parse for this, as long as you first remove the leading "something " from the string:
require 'json'
string = "something {\"key\":\"value\",\"key2\":3}"
# the following line removes the word something
string = string[string.index("{")..-1]
x = JSON.parse(string)
puts x["key"]
puts x["key2"]
You can then convert that to an array if required.
Alternatively, if you want to use regular expressions, try:
string.scan(/(?:"(\w+)":"?(\w+)"?)/)
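Since the pairs must come only from the braced part while the prefix can be any text, one option is to isolate the braces first and scan afterwards. A repeated capture group keeps only its last iteration, which is why the single-pattern attempt in the question returned just the final pair. The sketch below is a minimal illustration rather than a JSON parser: the helper name extract_pairs is hypothetical, and it assumes double-quoted keys and values that are either quoted strings without escaped quotes or bare tokens without commas or braces.

```ruby
# Hypothetical helper: returns [key, value] pairs from the trailing {...} block.
def extract_pairs(text)
  # Take everything from the first "{" to a closing "}" at the end of the string.
  body = text[/\{.*\}\z/m]
  return [] if body.nil?

  # Scan only the isolated body, so the prefix can be any text.
  body.scan(/("[^"]+"):("[^"]*"|[^,}]+)/)
end

pairs = extract_pairs("something {\"key\":\"value\",\"key2\":3}")
p pairs
# prints [["\"key\"", "\"value\""], ["\"key2\"", "3"]]
```

For nested objects or escaped quotes, JSON.parse as shown above is the safer route.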

Related

parse URL params in Perl

I am working on some tutorials to explain things like GET/POST requests and need to parse the URI manually. The following Perl code works, but I am trying to do two things:
list each key/value
be able to look up one specific value
What I do NOT care about is replacing the special chars to spaces or anything, the one value I need to get should be a number. In other languages I have used, the regular expression in question should group each key/value into one grouping with a part 1/part 2, does Perl do the same? If so, how do I put that into a map?
my @paramList = split /(?:\?|&|;)([^=]+)=([^&|;]+)/, $ENV{'REQUEST_URI'};
if(@paramList)
{
print "<h1>The Params</h1><ul>";
foreach my $i (@paramList) {
if($i) {
print "<li>$i</li>";
}
}
print "</ul>";
}
Per the request, here is a basic example of the input:
REQUEST_URI = /cgi-bin/printenv_html.pl?customer_name=fdas&phone_number=fdsa&email_address=fads%40fd.com&taxi=van&extras=tip&pickup_time=2020-01-14T20%3A45&pickup_place=&dropoff_place=Airport&comments=
The goal is the following, where the left of the equals sign is the key and the right is the value:
customer_name=fdas
phone_number=fdsa
email_address=fads%40fd.com
taxi=van
extras=tip
pickup_time=2020-01-14T20%3A45
pickup_place=
dropoff_place=Airport
comments=
How about feeding your list of key-value pairs into a hash?
my %paramList = $ENV{'REQUEST_URI'} =~ /(?:\?|&|;)([^=]+)=([^&|;]+)/g;
(no reason for the split as far as I can tell)
This relies crucially on there being an even-sized list of matches, where each "before-=" thing becomes a key in the hash, with the value being its pairing "after-=" thing.
In order to also get "pairs" without a value (like comments=), change the + in the last group of the pattern, ([^&|;]+), to *.
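For comparison, the same "pairs into a hash" idea can be sketched in Ruby. This is only an illustration under assumptions: the uri variable stands in for the REQUEST_URI value from the question, the pattern is a simplified variant of the one above with * in the value group so empty values are kept, and no percent-decoding is attempted.

```ruby
# Sample request URI taken from the question.
uri = "/cgi-bin/printenv_html.pl?customer_name=fdas&phone_number=fdsa" \
      "&email_address=fads%40fd.com&taxi=van&extras=tip" \
      "&pickup_time=2020-01-14T20%3A45&pickup_place=&dropoff_place=Airport&comments="

# Each match yields a [key, value] pair; * instead of + keeps empty values.
params = uri.scan(/[?&;]([^=&;]+)=([^&;]*)/).to_h

# List each key/value, then look up one specific value.
params.each { |key, value| puts "#{key}=#{value}" }
puts params["taxi"]
```

Because every match contributes exactly one key and one value, the list always has an even number of elements, which is the property the hash assignment relies on.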

Regular Expression to extract the digits comes after 36th character in a String

In JMeter, I need to extract the digits that come after the 36th character.
Example
Response: {"data":{"paymentId":"DOM1234567890111243"}}
I need to extract: 11243 (sometimes it will be only 1, 2, 3, or 4 digits)
Right boundary: DOM12345678901 keeps changing too. But the right boundary length will always be 36 characters.
Any help will be highly appreciated.
Your response data seems to be JSON, therefore I wouldn't rely on this "36 characters" offset, as its format might be different.
I would suggest extracting the paymentId value first and then applying a regular expression to the DOMxxx bit.
Add JSR223 PostProcessor as a child of the request which returns the above data
Put the following code into "Script" area:
def dom = new groovy.json.JsonSlurper().parse(prev.getResponseData()).data.paymentId
log.info("DOM: " + dom)
def myValue = ((dom =~ ".{14}(\\d+)")[0][1]) as String
log.info("myValue: " + myValue)
vars.put("myValue", myValue)
That's it, you should be able to access the extracted data as ${myValue} where required.
More information:
Groovy: Parsing and producing JSON
Groovy: Match Operator
Apache Groovy - Why and How You Should Use It
If there isn't anything else in the string you're checking, you could use something like:
.{36}(\d+)
The first group of this regex will be the number you're looking for.
Test and explanation: https://regex101.com/r/iDOO8T/2
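Both approaches from the answers above can be checked against the sample response outside JMeter. The Ruby sketch below is only a stand-alone illustration (it is not JMeter code); the variable names are hypothetical, and it assumes the 14-character "DOM..." prefix and the 36-character overall offset stated in the question.

```ruby
require 'json'

response = '{"data":{"paymentId":"DOM1234567890111243"}}'

# Option 1: rely on the fixed 36-character prefix of the raw response.
by_offset = response[/\A.{36}(\d+)/, 1]

# Option 2: parse the JSON first, then skip the 14-character "DOM..." prefix.
payment_id = JSON.parse(response)["data"]["paymentId"]
by_field = payment_id[/\A.{14}(\d+)/, 1]

puts by_offset
puts by_field
```

Option 2 keeps working if whitespace or extra fields are added to the JSON, whereas Option 1 depends on the exact byte layout.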

How to access the results of .match as string value in Crystal lang

In many other programming languages, there is a function which takes a regular expression as a parameter and returns an array of string values. This is true of JavaScript and Ruby. In Crystal, however, .match 1) does not seem to accept the global flag and 2) does not return an array but rather a struct of type Regex::MatchData. (https://crystal-lang.org/api/0.25.1/Regex/MatchData.html)
As an example the following code:
str = "Happy days"
re = /[a-z]+/i
matches = str.match(re)
puts matches
returns Regex::MatchData("Happy")
I am unsure how to convert this result into a string or why this is not the default as it is in the inspiration language (Ruby). I understand this question probably results from my inexperience dealing with structs and compiled languages but I would appreciate an answer in hopes that it might also help someone else coming from a JS/Ruby background.
What if I want to convert to a string merely the first match?
puts "Happy days"[/[a-z]+/i]?
puts "Happy days".match(/[a-z]+/i).try &.[0]
It will try to match a string against /[a-z]+/i regex and if there is a match, Group 0, i.e. the whole match, will be output. Note that the ? after [...] will make it fail gracefully if there is no match found. If you just use puts "??!!"[/[a-z]+/i], an exception will be thrown.
See this online demo.
If you want functionality similar to String#scan that returns all matches found in the input, you may use (only the shortened version is left, as per @Amadan's remark):
matches = str.scan(re).map(&.[0])
Output of the code above:
["Happy", "days"]
Note that:
String#scan will return an array of Regex::MatchData, one for each match.
Calling [0] on a match returns the actual matched text, whereas .string returns the whole original string ("Happy days").
Actually, the posted example returns a #<MatchData "Happy"> in Ruby, which also has no "global" flag. That's what String#scan(Regex) is for, as mentioned by others.
If you want only a single match without going through Regex::MatchData, you can use String#[](Regex):
str = "Happy days"
p str[/[a-z]+/i] # => "Happy"
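For readers coming from Ruby, the three behaviours being compared in this thread can be put side by side. The snippet is plain Ruby (not Crystal) and only illustrates the Ruby side of the comparison; the variable names are arbitrary.

```ruby
str = "Happy days"
re = /[a-z]+/i

first_match = str[re]        # first match as a String, or nil
match_data  = str.match(re)  # MatchData object, or nil
all_matches = str.scan(re)   # every match, as an Array of Strings

p first_match    # prints "Happy"
p match_data[0]  # prints "Happy"
p all_matches    # prints ["Happy", "days"]
```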

Remove all text from string after a sequence of words in Scala

I am trying to assemble a UDF in Scala that takes a column from a data frame and manipulates it to remove HTML and other useless pieces of text.
The column I need to modify is very messy, sometimes there is HTML, sometimes there is not... Searching SO I have found a regex solution to remove HTML
What I'd like to accomplish now is to find a regex that can find a specific word in the text and delete all the text after that word.
I think I understand from this SO answer that the regex should be something like \).* if you want to remove all after ), so I am trying to adapt this to my case, unsuccessfully due to my lack of knowledge about regex.
I have strings like:
I am interested to hear from you, thanks Sent from iPhone other stuff I want to delete....
I'd like to retain the first part of the string up to "Sent from" excluded, so a perfect output would be:
I am interested to hear from you, thanks
What I have so far is something like:
val toStringNoHTML = udf[String, String](_.toString
// code from SO as linked above
.replaceAll("""<(?!\/?a(?=>|\s.*>))\/?.*?>""", " ")
// delete all text after key word
.replaceAll("""'Sent from'.*""", "")
// remove all punctuation
.replaceAll("""[\p{Punct}\n]""", " ")
)
While the HTML gets removed, the "Sent from" and all the text after it does not. Any hint on how to adjust the regex to make it work?
EDIT
As pointed out in the comments, a small typo prevented my code from working. Thanks for the help:
.replaceAll("""'Sent from'.*""", "")
should be
.replaceAll("""Sent from.*""", "")
Instead of doing multiple replaceAll(pattern, blank) I'd be tempted to start with an extraction.
val msgRE = "(.*>)?(.*)Sent from.*".r
val result = udfStr match {
case msgRE(_, msg) => Some(msg.trim) // .replaceAll() can be added here
case _ => None
}
Here the result is an Option[String] but that really depends on how you want to handle the non-matching input.
If more cleaning is needed after the extraction then replaceAll() can be added where indicated (or the extraction pattern can be better refined).
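The effect of the typo discussed in this thread is easy to reproduce in any regex flavour. The Ruby sketch below is only an illustration of the pattern behaviour (it is not Spark or Scala code); the helper name strip_signature is hypothetical.

```ruby
# Hypothetical helper: drop "Sent from" and everything after it.
def strip_signature(text)
  text.sub(/Sent from.*/m, "").strip
end

message = "I am interested to hear from you, thanks Sent from iPhone other stuff I want to delete...."

# The quoted pattern looks for literal apostrophes, so it leaves the text untouched.
unchanged = message.sub(/'Sent from'.*/m, "")

puts strip_signature(message)
# prints I am interested to hear from you, thanks
```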

Checking if a string contains a character in Scala

I have a collection of Strings and I'm checking if they're correctly masked or not.
They're in a map and so I'm iterating over it, pulling out the text value and then checking. I'm trying various different combinations but none of which are giving me the finished result that I need. I have gotten it working by iterating over each character but that feels very java-esque.
My collection is something like:
"text"-> "text"
"text"-> "**xt"
"text"-> "****"
in the first two cases I need to confirm that the value is not all starred out and then add them to another list that can be returned.
Edit
My question: I need to check if the value contains anything other than a '*'. How might I accomplish this in the most efficient Scala-esque way?
My attempt at regex also failed giving many false positives and it seems like such a simple task. I'm not sure if regex is the way to go, I also wondered if there was a method I could apply to .contains or use pattern matching
!string.matches("\\*+") will tell you if the string contains characters other than *.
If I understand correctly, you want to find the keys in your map for which the value is not just stars. You can do this with a regex:
val reg = "\\*+".r
yourMap.filter{ case (k,v) => !reg.matches(v) }.keys
If you're not comfortable with a regex, you can use forall and negate it, so that only values that are not entirely stars are kept:
yourMap.filter{ case(k,v) => !v.forall(_ == '*') }.keys
Perhaps I misunderstood your question, but if you started with a Map you could try something like:
val myStrings = Map("1"-> "text", "2"-> "**xt", "3"-> "****")
val newStrings = myStrings.filterNot( _._2.contains("*") )
This would give you a Map with just Map(1 -> "text").
Try:
val myStrings = Map("1"-> "text", "2"-> "**xt", "3"-> "****")
val goodStrings = myStrings.filter(_._2.exists(_ !='*'))
This finds all cases where the value in the map contains something other than an asterisk. It will remove all empty and asterisk-only strings. For something this simple, i.e. one check, a regex is overkill: you're just looking for strings that contain any non-asterisk character.
If you only need the values and not the whole map, use:
val goodStrings = myStrings.values.filter(_.exists(_ !='*'))
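The same "contains at least one non-asterisk character" check can be written in other languages too. The Ruby sketch below mirrors the exists-based answer for comparison only; the sample hash extends the one from the answers with a hypothetical empty entry to show that empty strings are dropped as well.

```ruby
my_strings = { "1" => "text", "2" => "**xt", "3" => "****", "4" => "" }

# Keep entries whose value has at least one character other than "*".
good_strings = my_strings.select { |_key, value| value.match?(/[^*]/) }

p good_strings.keys
# prints ["1", "2"]
```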