Extracting numeric data from string using regex in Python

I would like to extract numeric data from multiple strings in a list. For example, consider the following string:
'\nReplies:\r\n\t\t\t\t\t\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\t\r\n\t\t\t\t\nViews: 20,087\nRating0 / 5\n'
I would like to extract the number of views, i.e. 20,087, and likewise the number of replies, i.e. 20.
I use the following regex code in Python:
view = re.findall("\W*Views*:\D*(\d+)*,(\d+)", str(string_name))
replies = re.findall("\W*Views*:\D*(\d+)", str(string_name))
I get the following output:
views: [('20', '087')]
replies: ['20']
But the problem arises when I run the same code on the following string:
'\nReplies:\r\n\t\t\t\t\t\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\t\r\n\t\t\t\t\nViews: 208\nRating0 / 5\n'
I get an empty list, which is not what I want. I also run the whole thing in a loop over a list of 34 different strings:
views = []
replies = []
for data in data_container:
    statistics = data.find("ul", class_ = 'threadstats')
    view = re.findall("\W*Views*:\D*(\d+)*,(\d+)", str(statistics))
    views.append(view)
    repl = re.findall("\W*Replies*:\D*(\d+)", str(statistics))
    replies.append(repl)
When I run it in a loop, I get the following output, which is not what I am looking for:
Views: [[('20', '087')], [('44', '467')], [('6', '975')], [('43', '287')], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
I am missing the numeric data that consists of only 2-3 digits. Any help would be really appreciated.

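Your views pattern requires a literal comma, so numbers without a thousands separator (such as 208) never match. I suggest extracting a digit (\d) followed by any 0+ chars that are digits or commas ([\d,]*), so that you get the whole formatted number in the resulting list: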
view = re.findall(r"\bViews:\D*(\d[\d,]*)", string_name)
replies = re.findall(r"\bReplies:\D*(\d[\d,]*)", string_name)
See the Python demo:
import re
string_names = ['\nReplies:\r\n\t\t\t\t\t\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\t\r\n\t\t\t\t\nViews: 208\nRating0 / 5\n',
                '\nReplies:\r\n\t\t\t\t\t\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\t\r\n\t\t\t\t\nViews: 20,087\nRating0 / 5\n']
for string_name in string_names:
    view = re.findall(r"\bViews:\D*(\d[\d,]*)", string_name)
    replies = re.findall(r"\bReplies:\D*(\d[\d,]*)", string_name)
    print("view = {}; replies = {}".format(view, replies))
Output:
view = ['208']; replies = ['20']
view = ['20,087']; replies = ['20']
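If the counts are needed as integers rather than strings, the thousands separators have to be removed before conversion. A minimal sketch, assuming the same `Label: number` layout as above; the helper name `extract_count` is made up for illustration:

```python
import re

def extract_count(label, text):
    """Return the first number following `label:` as an int, or None if absent."""
    match = re.search(r"\b" + re.escape(label) + r":\D*(\d[\d,]*)", text)
    if match is None:
        return None
    # Drop thousands separators before converting, e.g. "20,087" -> 20087
    return int(match.group(1).replace(",", ""))

sample = "\nReplies:\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\nViews: 20,087\nRating0 / 5\n"
print(extract_count("Views", sample))    # 20087
print(extract_count("Replies", sample))  # 20
```

Returning None for a missing label keeps the loop over many strings from silently collecting empty lists.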

Try this.
Views\s*\:\s*([0-9\,\.]*?)\\

Try this:
(\W\w)*[rR]eplies:(\W\w)*(?<replies>\d+)(\W\w)*[vV]iews:\s(?<views>\d+,?\d+).*
It will give you both replies and views in separate groups:
E.g. for the input
'\nReplies:\r\n\t\t\t\t\t\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\t\r\n\t\t\t\t\nViews: 208\nRating0 / 5\n'
'replies' group: 20
'views' group: 208
See it on regex101
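Note that Python's `re` module spells named groups as `(?P<name>...)`, so the `(?<name>...)` form above needs adapting before it can be used there. The following is only a sketch with a simplified pattern of my own (not the pattern from the regex101 link), assuming the string contains real whitespace characters between the label and the number:

```python
import re

# Hypothetical pattern for illustration; re.DOTALL lets ".*?" cross line breaks.
STATS_RE = re.compile(
    r"Replies:\s*(?P<replies>\d[\d,]*).*?Views:\s*(?P<views>\d[\d,]*)",
    re.DOTALL,
)

text = "\nReplies:\r\n\t\t\t\t\t\t20\r\n\t\t\t\t\nViews: 208\nRating0 / 5\n"
match = STATS_RE.search(text)
if match:
    # Both values come back from a single match object
    print(match.group("replies"), match.group("views"))
```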

How to create a filter that finds words with special characters when the input is typed without special characters

This is my first post.
I work with Quasar (Vue.js).
I have a list of jobs, and some of the entries contain special characters.
Example:
[ ...{ "libelle": "Agent hôtelier" },{"libelle": "Agent spécialisé / Agente spécialisée des écoles maternelles -ASEM-"},{ "libelle": "Agriculteur / Agricultrice" },{ "libelle": "Aide aux personnes âgées" },{ "libelle": "Aide de cuisine" },...]
In the input, I would like to find "Agent spécialisé" by typing either "agent specialise" (without special characters) or the original name. Both should work and autocomplete my input.
I just can't find a way to add this to my filter code.
My input:
<q-select
  filled
  v-model="model"
  use-input
  hide-selected
  fill-input
  input-debounce="0"
  :options="options"
  hint="Votre métier"
  style="width: 250px; padding-bottom: 32px"
  @filter="filterFn"
>
</q-select>
</div>
My code:
export default {
  props: ['data'],
  data() {
    return {
      jobList: json,
      model: '',
      options: [],
      stringOptions: []
    }
  },
  methods: {
    jsonJobsCall(e) {
      this.stringOptions = []
      json.forEach(res => {
        this.stringOptions.push(res.libelle)
      })
    },
    filterFn(val, update) {
      if (val === '') {
        update(() => {
          this.jsonJobsCall(val)
          this.options = this.stringOptions
        })
        return
      }
      update(() => {
        const regex = /é/i
        const needle = val.toLowerCase()
        this.jsonJobsCall(val)
        this.options = this.stringOptions.filter(
          v => v.replace(regex, 'e').toLowerCase().indexOf(needle) > -1
        )
      })
    },
  }
}
To sum up: I need a filter that lets me type with or without special characters and still finds the jobs in my list that contain them.
I hope I was clear; please ask if I wasn't.
Thank you very much.
I am not sure if this will work for you, but you can use a regex to build a filter for your need. For example, wherever the search text has an "e", you want to match either "e" or "é" (if I understand correctly).
// Say we want to match "Agent spécialisé" with the given search text
let searchText = "Agent spe";
// Map each plain character to the characters it should be treated as equivalent to
let characterMap = {
  e: ['e', 'é'],
  a: ['a', '#']
};
// Replace each mapped character with a character class listing all its equivalents.
// Inside [...] the characters are listed without "|", which would otherwise be matched literally.
// A global regex with replace() is used because replaceAll() is missing in older environments.
Object.keys(characterMap).forEach((key) => {
  let replaceReg = new RegExp(key, "g");
  searchText = searchText.replace(replaceReg, `[${characterMap[key].join("")}]`);
});
// Build the final regex from the expanded search text
let reg = new RegExp(searchText + ".*");
console.log("Agent spécialisé".match(reg) != null);
Another approach is the reverse of this: normalize "Agent spécialisé" (i.e. replace every é with a plain e, using a regex as above) and store the normalized string in the object along with the original text. Then search on the normalized string instead of the original.
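The normalization idea can be sketched as follows. It is written in Python purely for illustration (JavaScript has a comparable `String.prototype.normalize`), and it swaps the per-character regex for Unicode decomposition, which handles all accented letters at once. The names `fold`, `index` and `search` are invented for this example:

```python
import unicodedata

def fold(text):
    """Lowercase `text` and strip combining accents (é -> e, â -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

jobs = [
    "Agent hôtelier",
    "Agent spécialisé / Agente spécialisée des écoles maternelles -ASEM-",
    "Aide aux personnes âgées",
    "Aide de cuisine",
]
# Store the folded form next to the original and search on the folded form
index = [(fold(job), job) for job in jobs]

def search(needle):
    folded = fold(needle)
    return [original for key, original in index if folded in key]

print(search("agent specialise"))
print(search("Agent spécialisé"))
```

Because the needle is folded the same way as the stored strings, typing with or without accents returns the same entries.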

Query Django JSONFields that are a list of dictionaries

Given a Django JSONField that is structured as a list of dictionaries:
# JSONField "materials" on MyModel:
[
{"some_id": 123, "someprop": "foo"},
{"some_id": 456, "someprop": "bar"},
{"some_id": 789, "someprop": "baz"},
]
and given a list of values to look for:
myids = [123, 789]
I want to query for all MyModel instances that have a matching some_id anywhere in those lists of dictionaries. I can do this to search in dictionaries one at a time:
# Search inside the third dictionary in each list:
MyModel.objects.filter(materials__2__some_id__in=myids)
But I can't seem to construct a query to search in all dictionaries at once. Is this possible?
Given the clue here from Davit Tovmasyan to do this by incrementing through the match_targets and building up a set of Q queries, I wrote this function that takes a field name to search, a property name to search against, and a list of target matches. It returns a new list containing the matching dictionaries and the source objects they come from.
from iris.apps.claims.models import Claim
from django.db.models import Q

def json_list_search(
    json_field_name: str,
    property_name: str,
    match_targets: list
) -> list:
    """
    Args:
        json_field_name: Name of the JSONField to search in
        property_name: Name of the dictionary key to search against
        match_targets: List of possible values that should constitute a match
    Returns:
        List of dictionaries: [
            {"claim_id": 123, "json_obj": {"foo": "y"}},
            {"claim_id": 456, "json_obj": {"foo": "z"}}
        ]
    Example:
        results = json_list_search(
            json_field_name="materials_data",
            property_name="material_id",
            match_targets=[1, 22]
        )
        # (results truncated):
        [
            {
                "claim_id": 1,
                "json_obj": {
                    "category": "category_kmimsg",
                    "material_id": 1,
                },
            },
            {
                "claim_id": 2,
                "json_obj": {
                    "category": "category_kmimsg",
                    "material_id": 23,
                }
            },
        ]
    """
    # OR together one containment lookup per target value
    q_keys = Q()
    for match_target in match_targets:
        kwargs = {
            f"{json_field_name}__contains": [{property_name: match_target}]
        }
        q_keys |= Q(**kwargs)
    claims = Claim.objects.filter(q_keys)
    # Now we know which ORM objects contain references to any of the match_targets
    # in any of their dictionaries. Extract *relevant* objects and return them
    # with references to the source claim.
    results = []
    for claim in claims:
        data = getattr(claim, json_field_name)
        for datum in data:
            if datum.get(property_name) and datum.get(property_name) in match_targets:
                results.append({"claim_id": claim.id, "json_obj": datum})
    return results
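The in-memory extraction step at the end of that function can be tried without a database. The sketch below uses plain `(id, list_of_dicts)` pairs as stand-ins for ORM objects, so `matching_entries` and `rows` are hypothetical names and no Django API is involved. It also tests key membership explicitly, so a falsy value such as 0 can still match, which the truthiness check `datum.get(property_name) and ...` above would skip. It assumes the target values are hashable:

```python
def matching_entries(rows, property_name, match_targets):
    """rows: iterable of (row_id, list_of_dicts) pairs standing in for ORM objects."""
    targets = set(match_targets)
    results = []
    for row_id, entries in rows:
        for entry in entries:
            # Membership test only, so falsy values such as 0 can still match
            if property_name in entry and entry[property_name] in targets:
                results.append({"claim_id": row_id, "json_obj": entry})
    return results

rows = [
    (1, [{"some_id": 123, "someprop": "foo"}, {"some_id": 456, "someprop": "bar"}]),
    (2, [{"some_id": 789, "someprop": "baz"}]),
    (3, [{"some_id": 0, "someprop": "qux"}]),
]
print(matching_entries(rows, "some_id", [123, 789]))
```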
contains might help you. Since the field holds a list of dictionaries, wrap the lookup value in a list as well. Should be something like this:
q_keys = Q()
for _id in myids:
    q_keys |= Q(materials__contains=[{'some_id': _id}])
MyModel.objects.filter(q_keys)

Elixir/Phoenix enum through tuple to replace paths

I'm parsing some HTML with Floki and receive the following tuple:
{"html", [{"lang", "en"}],
[{"head", [],
[{"title", [], ["My App"]},
{"link", [{"rel", "stylesheet"}, {"href", "/css/app.css"}], []}]},
{"body", [],
[{"main", [{"id", "main_container"}, {"role", "main"}], []},
{"script", [{"src", "/js/app.js"}], [""]},
{"iframe",
[{"src", "/phoenix/live_reload/frame"}, {"style", "display: none;"}],
[]}]}]}
Is it possible to enumerate through all the elements and, for those that have an href or src attribute, prepend the full path? For example, in this case replace them with http://localhost/css/app.css and http://localhost/js/app.js
Here's one way you could do it using a recursive function.
defmodule HTML do
  # Element node: rewrite its attributes, then recurse into its children
  def use_full_path({el, attrs, children}) do
    {el, update_attrs(attrs), Enum.map(children, &use_full_path/1)}
  end

  # Text node: return unchanged
  def use_full_path(string) do
    string
  end

  defp update_attrs(attrs) do
    Enum.map(attrs, fn {key, val} ->
      if key in ["href", "src"] do
        {key, "http://localhost" <> val}
      else
        {key, val}
      end
    end)
  end
end
tree = {"html", [{"lang", "en"}],
[{"head", [],
[{"title", [], ["My App"]},
{"link", [{"rel", "stylesheet"}, {"href", "/css/app.css"}], []}]},
{"body", [],
[{"main", [{"id", "main_container"}, {"role", "main"}], []},
{"script", [{"src", "/js/app.js"}], [""]},
{"iframe",
[{"src", "/phoenix/live_reload/frame"}, {"style", "display: none;"}],
[]}]}]}
HTML.use_full_path(tree) |> IO.inspect

groovy: create a list of values with all strings

I am trying to iterate through a map and create a new map from its values. Below is the input:
def map = [[name: 'hello', email: ['on', 'off'] ], [ name: 'bye', email: ['abc', 'xyz']]]
I want the resulting data to be like:
[hello: ['on', 'off'], bye: ['abc', 'xyz']]
The code I have right now -
result = [:]
map.each { key ->
result[random] = key.email.each {random ->
"$random"
}
}
return result
The above code returns
[hello: [on, off], bye: [abc, xyz]]
As you can see above, the quotes around on, off and abc, xyz have disappeared, which is causing problems for me when I try to do checks on the list value [on, off].
It should not matter. If you see the result in Groovy console, they are still String.
Below should be sufficient:
map.collectEntries {
[ it.name, it.email ]
}
If you still need the single quotes to create a GString instead of a String, then below tweak would be required:
map.collectEntries {
[ it.name, it.email.collect { "'$it'" } ]
}
I personally do not see any reason for doing it the latter way. BTW, map is not a Map, it is a List; you can rename it to avoid unnecessary confusion.
You could convert it to a JSON object, and then everything will have quotes.
This does it. There should/may be a groovier way though.
def listOfMaps = [[name: 'hello', email: ['on', 'off'] ], [ name: 'bye', email: ['abc', 'xyz']]]
def result = [:]
listOfMaps.each { map ->
    def list = map.collect { k, v ->
        v
    }
    result[list[0]] = ["'${list[1][0]}'", "'${list[1][1]}'"]
}
println result

How to search comma separated data in mongodb

I have a movie database with different fields. The genre field contains a comma separated string like:
{genre: 'Action, Adventure, Sci-Fi'}
I know I can use regular expression to find the matches. I also tried:
{'genre': {'$in': genre}}
The problem is the running time: it takes a lot of time to return a query result. The database has about 300K documents, and I have created a normal index on the 'genre' field.
I would use Map-Reduce to create a separate collection that stores the genre as an array, with the values obtained by splitting the comma separated string. You can then run the Map-Reduce job and run your queries against the output collection.
For example, I've created some sample documents in the foo collection:
db.foo.insert([
    {genre: 'Action, Adventure, Sci-Fi'},
    {genre: 'Thriller, Romantic'},
    {genre: 'Comedy, Action'}
])
The following map/reduce operation will then produce the collection from which you can apply performant queries:
map = function() {
    var array = this.genre.split(/\s*,\s*/);
    emit(this._id, array);
}
reduce = function(key, values) {
    return values;
}
result = db.runCommand({
    "mapreduce" : "foo",
    "map" : map,
    "reduce" : reduce,
    "out" : "foo_result"
});
Querying is then straightforward and can take advantage of a multikey index on the value field:
db.foo_result.createIndex({"value": 1});
var genre = ['Action', 'Adventure'];
db.foo_result.find({'value': {'$in': genre}})
Output:
/* 0 */
{
"_id" : ObjectId("55842af93cab061ff5c618ce"),
"value" : [
"Action",
"Adventure",
"Sci-Fi"
]
}
/* 1 */
{
"_id" : ObjectId("55842af93cab061ff5c618d0"),
"value" : [
"Comedy",
"Action"
]
}
Well you cannot really do this efficiently so I'm glad you used the tag "performance" on your question.
If you want to keep the "comma separated" data in a string as it is, you have two options.
Either use a regex, if it suits:
db.collection.find({ "genre": { "$regex": "Sci-Fi" } })
But this is not really efficient.
Or by JavaScript evaluation via $where:
db.collection.find(function() {
    return (
        this.genre.split(",")
            .map(function(el) {
                // Trim the leading whitespace left after each comma
                return el.replace(/^\s+/, "");
            })
            .indexOf("Sci-Fi") != -1
    );
})
This is not really efficient either, and is probably on par with the above.
Or, better yet, and something that can use an index: separate the values into an array and use a basic query:
{
"genre": [ "Action", "Adventure", "Sci-Fi" ]
}
With an index:
db.collection.createIndex({ "genre": 1 })
Then query:
db.collection.find({ "genre": "Sci-Fi" })
When you do it that way, it's that simple. And really efficient.
You make the choice.
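The one-off conversion from a comma separated string to an array can be sketched in plain Python. This only shows the transformation and the resulting lookup on in-memory dictionaries; no MongoDB driver calls are made, and writing the arrays back to the collection is left out. The helper name `split_genres` is made up for the example:

```python
import re

def split_genres(genre):
    """Split a comma separated genre string into a list of trimmed names."""
    return [part for part in re.split(r"\s*,\s*", genre.strip()) if part]

docs = [
    {"genre": "Action, Adventure, Sci-Fi"},
    {"genre": "Thriller, Romantic"},
    {"genre": "Comedy, Action"},
]
# Each document now carries an array, the shape a multikey index works on
converted = [{"genre": split_genres(doc["genre"])} for doc in docs]

# In-memory equivalent of an $in query: keep documents sharing any wanted genre
wanted = {"Action", "Adventure"}
hits = [doc for doc in converted if wanted.intersection(doc["genre"])]
print(hits)
```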