Escape colon in a Scorched/Sunburnt/Solr search - django

I'm using scorched for interfacing my Django code with my Solr instance:
response = si.query(DismaxString(querydict.get('q'))).execute()
In the data I'm searching I have colons (e.g. Col: Mackay's Reel) and I don't want Solr to interpret that colon as a 'field' query:
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field Col"
How can I escape all the colons the user enters in the query?
I get the error even when running the query directly in the Solr query interface.

For now my workaround is adding a backslash to each colon via JavaScript before the form is submitted, so that the colons are sent as '\:'.
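The same escaping can also be done server-side in the Django view, before the string reaches scorched. This is a minimal sketch (the helper name `escape_solr_query` is mine, and the character list follows Solr's standard query-parser specials — adjust it if you only want to escape colons):

```python
# Characters the Solr query parser treats as operators; backslash-escaping
# a colon stops Solr from reading "Col:" as a field query.
SOLR_SPECIAL_CHARS = r'+-&|!(){}[]^"~*?:\/'

def escape_solr_query(raw):
    # Prefix each special character with a backslash.
    return ''.join('\\' + ch if ch in SOLR_SPECIAL_CHARS else ch for ch in raw)

escaped = escape_solr_query("Col: Mackay's Reel")
# escaped is now the string  Col\: Mackay's Reel
```

Doing it server-side means the escaping still happens if the form is ever submitted without JavaScript.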

Related

How to execute a structured query containing symbols in AWS Cloudsearch

I'm trying to execute a structured prefix query in Cloudsearch.
Here's a snippet of the query args (csattribute is of type text)
{
"query": "(prefix field=csattribute '12-3')",
"queryParser": "structured",
"size": 5
}
My above query will result in No matches for "(prefix field=csattribute '12-3')".
However, if I change my query to
{
"query": "(prefix field=csattribute '12')",
"queryParser": "structured",
"size": 5
}
Then I will get a list of results I expect.
I haven't found much in my brief googling. How do I include the - in the query? Does it need to be escaped? Are there other characters that need to be escaped?
I was pointed in the right direction by this SO question: How To search special symbols AWS Search
Below is a snippet from https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html
Text Processing in Amazon CloudSearch ... During tokenization, the
stream of text in a field is split into separate tokens on detectable
boundaries using the word break rules defined in the Unicode Text
Segmentation algorithm.
According to the word break rules, strings separated by whitespace
such as spaces and tabs are treated as separate tokens. In many cases,
punctuation is dropped and treated as whitespace. For example, strings
are split at hyphens (-) and the at symbol (@). However, periods that
are not followed by whitespace are considered part of the token.
From what I understand, text and text-array fields are tokenized based upon the analysis scheme (in my case it's english). The text was tokenized, and the - symbol is a word break token.
This field doesn't need to be tokenized. Updating the index type to literal prevents all tokenization on the field, which allows the query in my question to return the expected results.
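As a rough illustration (this is not CloudSearch's actual tokenizer, just the word-break behavior the docs describe), splitting on hyphens and whitespace shows why a prefix query for '12-3' finds nothing in a text field:

```python
import re

def tokenize(text):
    # Crude stand-in for word-break tokenization: split on whitespace
    # and hyphens, as the AWS docs describe for text fields.
    return [t for t in re.split(r'[\s\-]+', text) if t]

tokens = tokenize('12-345-678')
# The stored tokens are ['12', '345', '678']; no single token starts
# with '12-3', so a structured prefix query for '12-3' cannot match.
```

With a literal field there is no tokenization step at all, so the stored value is '12-345-678' and the prefix '12-3' matches it.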

Get an error when trying to import a CSV using Google BigQuery CLI with backslash as an escape character

When trying to upload a csv file to BigQuery with the following params:
bq load --quote='"' --null_marker='\\N' --source_format=CSV sendpulse.user_orders gs://data-sendpulse/user_orders_3.csv
I get an error when trying to parse the following row:
"0","63800.00","1","0","Service \"Startup Pack\""
Obviously, BigQuery doesn't treat the backslash as an escape character for inner quotes, but is there a way to specify backslash as the escape character?
I tried different options and always got errors.
Update:
A quote inside a quoted CSV value is escaped by doubling it, and there is no setting for a custom escape character.
I don't see a better workaround than replacing all \" with ' or "" in your files.
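That replacement can be done in a small preprocessing pass before loading (a sketch — `fix_backslash_quotes` is a name I made up, and it assumes `\"` only ever appears as an escaped inner quote):

```python
def fix_backslash_quotes(line):
    # BigQuery's CSV parser expects "" for a quote inside a quoted value,
    # so rewrite every backslash-escaped quote before loading.
    return line.replace('\\"', '""')

row = '"0","63800.00","1","0","Service \\"Startup Pack\\""'
fixed = fix_backslash_quotes(row)
# fixed is:  "0","63800.00","1","0","Service ""Startup Pack"""
```

Run the transformed file through `bq load` with the same `--quote='"'` flag and the doubled quotes parse cleanly.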

Mariadb: Regexp_substr not working with non-matching group regular expression

I am using a query to pull a user ID from a column that contains text. This is for a forum system I am using, and I want to get the user ID portion out of a text field that contains the full message. The query I am using is
SELECT REGEXP_SUBSTR(message, '(?:member: )(\d+)') AS user_id
FROM posts
WHERE message LIKE '%quote%';
Now, ignoring the fact that this is ugly SQL and not final, I just need to get to the point where it reads the user ID. The following is an example of the text that you would see in the message column:
`QUOTE="Moony, post: 967760, member: 22665"]I'm underwhelmed...[/QUOTE]
Hopefully we aren’t done yet and this is nothing but a depth signing!`
Is there something different about regular expressions when used in MariaDB's REGEXP_SUBSTR? This should be PCRE; it works in regex testers and should read correctly. It should look for the group "member: " and then take the numbers after it, giving a single match on each of those posts.
This is an ugly hack/workaround that works by using a lookbehind for "member: " and a lookahead for the following "], however it will not work if there are multiple quotes in a post:
(?<=member: )\d+(?="])
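A likely explanation (my reading — worth verifying against the MariaDB docs): REGEXP_SUBSTR returns the entire matched substring, not a capture group, and a non-capturing group `(?:...)` is still consumed as part of the match. Python's re module can demonstrate both behaviors:

```python
import re

text = 'QUOTE="Moony, post: 967760, member: 22665"]I\'m underwhelmed...'

# The non-capturing group is consumed, so the full match still includes
# the "member: " prefix -- this mirrors what REGEXP_SUBSTR returns.
full = re.search(r'(?:member: )(\d+)', text).group(0)
# full == 'member: 22665'

# A lookbehind matches at a position without consuming, so the prefix
# stays out of the matched text entirely.
just_id = re.search(r'(?<=member: )\d+', text).group(0)
# just_id == '22665'
```

This is why the lookbehind version extracts the bare ID while the grouped version appears "not to work" in MariaDB even though the same pattern behaves as expected in regex testers that show group 1.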

Django url pattern to ignore extra characters

I have a django url shown below :
url(r'^update_status/field1/(?P<field1_id>.*)/field2/(?P<field2_id>.*)/$', 'update_status', name='update_status')
This catches both the urls like :
update_status/field1/0445df4d8e1c43ae9/field2/f12b6b5c98/mraid.js/
and
update_status/field1/0445df4d8e1c43ae9/field2/f12b6b5c98
I want to capture only the second URL. What should be changed in the Django URL pattern?
Your regular expression (?P<field1_id>.*) catches any character, including / characters. You want to restrict it to the format of the field IDs, like this: (?P<field1_id>[0-9a-f]+) (and the same for field2_id).
Note: I'm assuming your IDs consist only of hexadecimal characters.
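The tightened pattern can be checked directly with Python's re module (assuming hexadecimal IDs and Django's usual trailing slash):

```python
import re

# Restricting each capture to hex characters stops the pattern from
# swallowing extra path segments like /mraid.js/.
pattern = re.compile(
    r'^update_status/field1/(?P<field1_id>[0-9a-f]+)'
    r'/field2/(?P<field2_id>[0-9a-f]+)/$'
)

good = 'update_status/field1/0445df4d8e1c43ae9/field2/f12b6b5c98/'
bad = 'update_status/field1/0445df4d8e1c43ae9/field2/f12b6b5c98/mraid.js/'

# good matches; bad does not, because 'm' is not a hex character and
# the $ anchor rejects anything after the final slash.
```

The same regexes drop straight into the url() call in urls.py.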

Django forms to restrict chinese characters

I am using a form wizard for step-by-step forms in our django application.
Now the requirement is to restrict Chinese characters in the form fields. I managed to get the regex, but the problem is that if I try to use that validator regex on the email field it doesn't allow the '@' symbol and raises the invalid flag.
Here is the regex I am using:
restrict_chinese_characters_regex = RegexValidator(r'^[\u4E00-\u9FFF\u3400-\u4DFF\uF900-\uFAFF]*$', 'Chinese characters are restricted.')
I am kind of lost on how to make it also work with the email field, i.e. allow the '@' symbol.
Any ideas please?
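A quick check with Python's re module shows what goes wrong: the anchored pattern only matches strings made up entirely of CJK characters, so any email address fails to match and the validator flags it. One possible direction (my suggestion, not from the question) is an unanchored pattern that merely detects a Chinese character, combined with RegexValidator's inverse_match=True so that a match means "invalid":

```python
import re

# The validator's pattern as written: anchored, so the WHOLE value must
# consist of CJK characters -- 'user@example.com' can never match.
anchored = re.compile(r'^[\u4E00-\u9FFF\u3400-\u4DFF\uF900-\uFAFF]*$')

# An unanchored pattern that only *detects* a Chinese character; paired
# with RegexValidator(..., inverse_match=True), a hit would mean invalid.
detects_chinese = re.compile(r'[\u4E00-\u9FFF\u3400-\u4DFF\uF900-\uFAFF]')

# anchored.match('user@example.com') is None  -> validator rejects emails
# detects_chinese.search('user@example.com') is None  -> email passes
# detects_chinese.search('\u4f60\u597d') matches     -> Chinese rejected
```

That way the same validator can sit on every field, including the email field, without tripping over '@'.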