SubSonic DAL fails compilation on columns whose names are keywords - subsonic3

I'm using SubSonic 3 (Active Record), VS2010, and .NET Framework 3.5 against a MySQL database. Someone named a column "string". I s#*t you not.
OK, I've named some database objects some dumb names (Like naming a SQL Server table "User") before but... c'mon! And I cannot change the table because of dependencies in the app!
OK, I feel a little better now that I've vented a little. Thanks for listening.
So, of course, in the generated code I get errors all over the place from lines like:
private string string {}
I don't see this as a priority bug for Subsonic unless others are having similar issues with other column names. Any plans to do anything about this?
Thanks
Paul
p.s. I will try to talk the other developers into changing, or allowing me to change the column name, I know that is the real solution, but Subsonic could gracefully handle illegal names, maybe something like the way .netTiers handles it (add _safeName to the name, so it would become string_safeName).

I perhaps used the wrong language. Before I went ahead and made modifications, I wanted to make sure this was not already on some priority list somewhere or being worked on. Here's the way I went with it.
In Settings.ttinclude:
string[] reservedWords = new string[]{"abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char", "checked",
    "class", "const", "continue", "decimal", "default", "delegate", "do", "double", "else", "enum", "event", "explicit",
    "extern", "finally", "fixed", "float", "for", "foreach", "goto", "if", "implicit", "in", "int", "interface", "internal",
    "is", "lock", "long", "namespace", "new", "null", "object", "operator", "out", "override", "params", "private",
    "protected", "public", "readonly", "ref", "return", "sbyte", "sealed", "short", "sizeof", "stackalloc", "static", "string",
    "struct", "switch", "this", "throw", "try", "typeof", "uint", "ulong", "unchecked", "unsafe", "ushort", "using", "virtual",
    "void", "volatile", "while", "false", "true", "yield", "by", "descending", "from", "group", "into", "orderby", "select",
    "var", "where" };
string CleanUp(string tableName){
    string result = tableName;
    //strip blanks
    result = result.Replace(" ", "");
    if(reservedWords.Contains(result)){
        result += "_SafeName";
    }
    //put your logic here...
    return result;
}
The CleanUp function already fixes table and column names, so I put my logic there. I hope I got all the reserved words...
This is also on a smaller project where the database is MySQL, so this hack/patch is only for C#/MySQL, but it's very easy to adapt to other DBs/languages.

You can fix this yourself very easily.
Open up the SQLServer.ttinclude file. It's a T4 template file that SubSonic uses to generate your code.
Almost halfway down, on line 155, you will find the LoadColumns function:
List<Column> LoadColumns(Table tbl){
    var result = new List<Column>();
    var cmd = GetCommand(COLUMN_SQL);
    cmd.Parameters.AddWithValue("@tableName", tbl.Name);
    using(IDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection)){
        while(rdr.Read()){
            Column col = new Column();
            col.Name = rdr["ColumnName"].ToString();
            col.CleanName = CleanUp(col.Name);
            col.DataType = rdr["DataType"].ToString();
            col.SysType = GetSysType(col.DataType);
            col.DbType = GetDbType(col.DataType);
            col.AutoIncrement = rdr["IsIdentity"].ToString() == "1";
            col.IsNullable = rdr["IsNullable"].ToString() == "YES";
            int.TryParse(rdr["MaxLength"].ToString(), out col.MaxLength);
            result.Add(col);
        }
    }
    return result;
}
Simply add logic to this function where it assigns the name, so that the column name in your database, "string", is changed to something a little more sane.
List<Column> LoadColumns(Table tbl){
    var result = new List<Column>();
    var cmd = GetCommand(COLUMN_SQL);
    cmd.Parameters.AddWithValue("@tableName", tbl.Name);
    using(IDataReader rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection)){
        while(rdr.Read()){
            Column col = new Column();
            var rawName = rdr["ColumnName"].ToString();
            if (rawName.Equals("string")) {
                col.Name = "StringColumn";
            } else {
                col.Name = rawName;
            }
            col.CleanName = CleanUp(col.Name);
            col.DataType = rdr["DataType"].ToString();
            col.SysType = GetSysType(col.DataType);
            col.DbType = GetDbType(col.DataType);
            col.AutoIncrement = rdr["IsIdentity"].ToString() == "1";
            col.IsNullable = rdr["IsNullable"].ToString() == "YES";
            int.TryParse(rdr["MaxLength"].ToString(), out col.MaxLength);
            result.Add(col);
        }
    }
    return result;
}
Also, were you to put some time into it and make it handle all keywords (say, through a lookup in a static dictionary, changing the names by appending a common suffix or by dictionary substitution), then you could submit a patch back to the project and contribute, instead of trying to talk the other developers into doing it.
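For illustration, here is a minimal C# sketch of that idea. It is only a sketch under my own assumptions: the helper name, the suffix, and the substitution table are made up for this example and are not part of SubSonic.
// A hedged sketch, not SubSonic's API: rename identifiers that collide with C# keywords.
// In a .ttinclude this needs System and System.Collections.Generic imported.
static readonly HashSet<string> CSharpKeywords = new HashSet<string>(StringComparer.Ordinal)
{
    "abstract", "as", "base", "bool", "event", "int", "object", "string" // ...plus the rest of the keyword list
};
// Optional explicit substitutions for names where a plain suffix would read badly.
static readonly Dictionary<string, string> PreferredRenames = new Dictionary<string, string>
{
    { "string", "StringColumn" },
    { "event", "EventColumn" }
};
static string MakeSafeName(string name)
{
    string renamed;
    if (PreferredRenames.TryGetValue(name, out renamed))
        return renamed;                 // dictionary substitution
    if (CSharpKeywords.Contains(name))
        return name + "_SafeName";      // common suffix fallback
    return name;                        // already a legal identifier
}
CleanUp could then simply call MakeSafeName on its result instead of hard-coding the "string" case.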

If you change the LoadColumns method you'll still have problems with the foreign keys, but you can replace your CleanUp(string tableName) method in Settings.ttinclude with:
string CleanUp(string tableName){
    string result = tableName;
    //strip blanks
    result = result.Replace(" ", "");
    //put your logic here...
    if (reservedWords.Contains(result)) {
        result = "_" + result;
    }
    return result;
}
This worked like a charm for me.

Related

How to match a string exactly OR exact substring from beginning using Regular Expression

I'm trying to build a regex query for a database and it's got me stumped. If I have a string with a varying number of elements that has an ordered structure, how can I find out whether it matches another string exactly OR some exact substring when read from the left?
For example I have these strings
1. Canada.Ontario.Toronto.Downtown
2. Canada.Ontario
3. Canada.EasternCanada.Ontario.Toronto.Downtown
4. England.London
5. France.SouthFrance.Nice
They are structured from the most general location to the most specific, left to right. However, the number of elements varies, with some specifying country.region.state and so on, and some just country.town. I need to match not only the words but also the order.
So if I want to match "Canada.Ontario.Toronto.Downtown" I would want to get both #1 and #2 and nothing else. How would I do that? Basically, run through the string, and as soon as a different character comes up it's not a match, but still allow a substring that ends "early" to match, like #2.
I've tried making groups and using "?" like (canada)?.?(Ontario)?.? etc., but it doesn't seem to work in all situations, since it can match nothing as well.
Edit as requested:
Mongodb Database Collection:
[
  {
    "_id": "doc1",
    "context": "Canada.Ontario.Toronto.Downtown",
    "useful_data": "Some Data"
  },
  {
    "_id": "doc2",
    "context": "Canada.Ontario",
    "useful_data": "Some Data"
  },
  {
    "_id": "doc3",
    "context": "Canada.EasternCanada.Ontario.Toronto.Downtown",
    "useful_data": "Some Data"
  },
  {
    "_id": "doc4",
    "context": "England.London",
    "useful_data": "Some Data"
  },
  {
    "_id": "doc5",
    "context": "France.SouthFrance.Nice",
    "useful_data": "Some Data"
  },
  {
    "_id": "doc6",
    "context": "",
    "useful_data": "Some Data"
  }
]
User provides "Canada", "Ontario", "Toronto", and "Downtown" values in that order, and I need to use that to query doc1 and doc2 and no others. So I need a regex pattern to put in here: collection.find({"context": {$regex: <pattern here>}}). If it's not possible I'll just have to restructure the data and use different methods of finding those docs.
At each dot, start a nested optional group for the next term, and add start and end anchors:
^Canada(\.Ontario(\.Toronto(\.Downtown)?)?)?$
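Since the parts come from the user in order, the pattern can also be built programmatically. Here is a small sketch in mongo-shell/Node-style JavaScript (the function name and the escaping step are my own additions, not from the original answer):
// Build ^Canada(\.Ontario(\.Toronto(\.Downtown)?)?)?$ from an ordered list of parts.
function buildContextPattern(parts) {
    // Escape regex metacharacters so arbitrary part values stay literal.
    var escaped = parts.map(function (p) {
        return p.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    });
    // Fold from the right into nested optional groups.
    var tail = escaped.slice(1).reduceRight(function (acc, part) {
        return '(\\.' + part + acc + ')?';
    }, '');
    return new RegExp('^' + escaped[0] + tail + '$');
}
var pattern = buildContextPattern(['Canada', 'Ontario', 'Toronto', 'Downtown']);
// pattern.source is '^Canada(\.Ontario(\.Toronto(\.Downtown)?)?)?$'
db.collection.find({ "context": { $regex: pattern } });  // or collection.find(...) with a driver collection object
This matches doc1 and doc2 and nothing else, including the document with an empty context.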

Cannot convert std::string to QJsonArray in Qt

The following text is a bit of std::string text that is generated by another app (I do not have control over what the app sends me). I have tried for days to get this converted into a QJsonArray and cannot figure it out. I am using C++ within Qt. Does anyone have a bit of direction or sample C++ code that could solve this?
{
  "saved_mik_yous": {
    "2120ce2d-a5b1-49b8-8384-3781b7b2d73b": {
      "name": null,
      "id": "2120ce2d-a5b1-49b8-8384-3781b7b2d73b",
      "start": 1565288936.1127193,
      "end": 1565289128.1236603,
      "mixxer": 128.567505,
      "mik_source": "algo"
    },
    "bf855c0d-a71d-42ea-b3ef-7cbe0e2c7a3d": {
      "name": null,
      "id": "bf855c0d-a71d-42ea-b3ef-7cbe0e2c7a3d",
      "start": 1565301673.4609745,
      "end": 1565301832.665656,
      "mixxer": 308.485107,
      "mik_source": "algo"
    }
  },
  "mik_you_state": "completed"
}
All you have to do is this:
QJsonDocument doc = QJsonDocument::fromJson(QByteArray::fromStdString(str));
Then, you can access the values for the keys for example as:
doc["saved_mik_yous"]
And so on.
Mind you, the JSON you are showing seems to be an object rather than an array, since it contains key-value pairs rather than a list of elements inside square brackets. So, whilst it does not matter when you are converting the std::string into a QJsonDocument, you need to access the values by keys rather than indices.
If you are getting dynamic json which can be either an array or object, you can always check for the type with isArray() or isObject() to convert it to the right type.
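A slightly fuller, hedged sketch in case it helps (the function and variable names are illustrative; QByteArray::fromStdString needs Qt 5.4+, and indexing the document directly as in doc["saved_mik_yous"] needs Qt 5.10+):
// Parse the payload and walk the nested objects.
#include <QJsonDocument>
#include <QJsonObject>
#include <QJsonParseError>
#include <QByteArray>
#include <QDebug>
#include <string>

void parsePayload(const std::string &jsonText)
{
    QJsonParseError err;
    const QJsonDocument doc = QJsonDocument::fromJson(QByteArray::fromStdString(jsonText), &err);
    if (err.error != QJsonParseError::NoError || !doc.isObject()) {
        qWarning() << "Not a valid JSON object:" << err.errorString();
        return;
    }
    const QJsonObject root = doc.object();
    qDebug() << root.value("mik_you_state").toString();      // "completed"
    // "saved_mik_yous" is itself an object keyed by UUID, so iterate its keys.
    const QJsonObject saved = root.value("saved_mik_yous").toObject();
    for (auto it = saved.begin(); it != saved.end(); ++it) {
        const QJsonObject entry = it.value().toObject();
        qDebug() << it.key() << entry.value("mixxer").toDouble();
    }
}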

distinct value with count and condition mongo DB

I am new to MongoDB, and so far it seems like it is trying to go out of its way to make doing simple things overly complex.
I am trying to run the equivalent of the MySQL query below:
SELECT userid, COUNT(*)
FROM userinfo
WHERE userdata LIKE '%PC%' OR userdata LIKE '%wire%'
GROUP BY userid
I have MongoDB version 3.0.4 and I am running MongoChef.
I tried using something like the below:
db.userinfo.group({
    "key": {
        "userid": true
    },
    "initial": {
        "countstar": 0
    },
    "reduce": function(obj, prev) {
        prev.countstar++;
    },
    "cond": {
        "$or": [{
            "userdata": /PC/
        }, {
            "userdata": /wire/
        }]
    }
});
but that did not like the OR.
When I took out the OR, thinking I'd do half at a time and combine the results in Excel, I got the error "group() can't handle more than 20000 unique keys", and the result table should be much bigger than that.
From what I can tell online, I could do this using aggregation pipelines, but I cannot find any clear examples of how to do that.
This seems like something simple that should be built into any DB, and it makes no sense to me that it is not.
Any help is much appreciated.
Works "sooo" much better with the .aggregate() method, as .group() is a very outmoded way of approaching this:
db.userinfo.aggregate([
    { "$match": {
        "userdata": { "$in": [/PC/, /wire/] }
    }},
    { "$group": {
        "_id": "$userid",
        "count": { "$sum": 1 }
    }}
])
The $in here is a much shorter way of writing your $or condition as well.
This is native code as opposed to JavaScript translation as well, so it runs much faster.
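For comparison, here is the same $match stage written with $or (equivalent to the $in form above, just more verbose; shown only to illustrate the point):
{ "$match": {
    "$or": [
        { "userdata": /PC/ },
        { "userdata": /wire/ }
    ]
}}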
Here is an example which counts the number of distinct first_name values for records with a last_name value of "smith":
db.collection.distinct("first_name", {"last_name": "smith"}).length;
output
3

How to wisely combine shingles and edgeNgram to provide flexible full text search?

We have an OData-compliant API that delegates some of its full text search needs to an Elasticsearch cluster.
Since OData expressions can get quite complex, we decided to simply translate them into their equivalent Lucene query syntax and feed it into a query_string query.
We do support some text-related OData filter expressions, such as:
startswith(field,'bla')
endswith(field,'bla')
substringof('bla',field)
name eq 'bla'
The fields we're matching against can be analyzed, not_analyzed or both (i.e. via a multi-field).
The searched text can be a single token (e.g. table), only a part thereof (e.g. tab), or several tokens (e.g. table 1., table 10, etc).
The search must be case-insensitive.
Here are some examples of the behavior we need to support:
startswith(name,'table 1') must match "Table 1", "table 100", "Table 1.5", "table 112 upper level"
endswith(name,'table 1') must match "Room 1, Table 1", "Subtable 1", "table 1", "Jeff table 1"
substringof('table 1',name) must match "Big Table 1 back", "table 1", "Table 1", "Small Table12"
name eq 'table 1' must match "Table 1", "TABLE 1", "table 1"
So basically, we take the user input (i.e. what is passed into the 2nd parameter of startswith/endswith, resp. the 1st parameter of substringof, resp. the right-hand side value of the eq) and try to match it exactly, whether the tokens fully match or only partially.
Right now, we're getting away with a clumsy solution highlighted below which works pretty well, but is far from being ideal.
In our query_string, we match against a not_analyzed field using the Regular Expression syntax. Since the field is not_analyzed and the search must be case-insensitive, we do our own tokenizing while preparing the regular expression to feed into the query in order to come up with something like this, i.e. this is equivalent to the OData filter endswith(name,'table 8') (=> match all documents whose name ends with "table 8")
"query": {
"query_string": {
"query": "name.raw:/.*(T|t)(A|a)(B|b)(L|l)(E|e) 8/",
"lowercase_expanded_terms": false,
"analyze_wildcard": true
}
}
So, even though, this solution works pretty well and the performance is not too bad (which came out as a surprise), we'd like to do it differently and leverage the full power of analyzers in order to shift all this burden at indexing time instead of searching time. However, since reindexing all our data will take weeks, we'd like to first investigate if there's a good combination of token filters and analyzers that would help us achieve the same search requirements enumerated above.
My thinking is that the ideal solution would contain some wise mix of shingles (i.e. several tokens together) and edge-nGram (i.e. to match at the start or end of a token). What I'm not sure of, though, is whether it is possible to make them work together in order to match several tokens, where one of the tokens might not be fully input by the user. For instance, if the indexed name field is "Big Table 123", I need substringof('table 1',name) to match it, so "table" is a fully matched token, while "1" is only a prefix of the next token.
Thanks in advance for sharing your braincells on this one.
UPDATE 1: after testing Andrei's solution
=> Exact match (eq) and startswith work perfectly.
A. endswith glitches
Searching for substringof('table 112', name) yields 107 docs. Searching for a more specific case such as endswith(name, 'table 112') yields 1525 docs, while it should yield less docs (suffix matches should be a subset of substring matches). Checking in more depth I've found some mismatches, such as "Social Club, Table 12" (doesn't contain "112") or "Order 312" (contains neither "table" nor "112"). I guess it's because they end with "12" and that's a valid gram for the token "112", hence the match.
B. substringof glitches
Searching for substringof('table',name) matches "Party table", "Alex on big table" but doesn't match "Table 1", "table 112", etc. Searching for substringof('tabl',name) doesn't match anything
UPDATE 2
It was sort of implied but I forgot to explicitly mention that the solution will have to work with the query_string query, mainly due to the fact that the OData expressions (however complex they might be) will keep getting translated into their Lucene equivalent. I'm aware that we're trading off the power of the Elasticsearch Query DSL for Lucene's query syntax, which is a bit less powerful and less expressive, but that's something we can't really change. We're pretty d**n close, though!
UPDATE 3 (June 25th, 2019):
ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
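For reference, here is a minimal sketch of what that looks like (the index and field names are illustrative, not from this project); a search_as_you_type field is typically queried with a multi_match of type bool_prefix against its generated subfields:
PUT my_index
{
  "mappings": {
    "properties": {
      "name": { "type": "search_as_you_type" }
    }
  }
}

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "table 1",
      "type": "bool_prefix",
      "fields": ["name", "name._2gram", "name._3gram"]
    }
  }
}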
This is an interesting use case. Here's my take:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        },
        "my_edge_ngram_analyzer": {
          "tokenizer": "my_edge_ngram_tokenizer",
          "filter": ["lowercase"]
        },
        "my_reverse_edge_ngram_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "reverse", "substring", "reverse"]
        },
        "lowercase_keyword": {
          "type": "custom",
          "filter": ["lowercase"],
          "tokenizer": "keyword"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "25"
        },
        "my_edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": "2",
          "max_gram": "25"
        }
      },
      "filter": {
        "substring": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 25
        }
      }
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "my_ngram_analyzer",
          "fields": {
            "starts_with": {
              "type": "string",
              "analyzer": "my_edge_ngram_analyzer"
            },
            "ends_with": {
              "type": "string",
              "analyzer": "my_reverse_edge_ngram_analyzer"
            },
            "exact_case_insensitive_match": {
              "type": "string",
              "analyzer": "lowercase_keyword"
            }
          }
        }
      }
    }
  }
}
my_ngram_analyzer is used to split every text into small pieces; how large the pieces are depends on your use case. I chose, for testing purposes, 25 chars. lowercase is used since you said case-insensitive. Basically, this is the analyzer used for substringof('table 1',name). The query is simple:
{
  "query": {
    "term": {
      "text": {
        "value": "table 1"
      }
    }
  }
}
my_edge_ngram_analyzer is used to split the text starting from the beginning and this is specifically used for the startswith(name,'table 1') use case. Again, the query is simple:
{
  "query": {
    "term": {
      "text.starts_with": {
        "value": "table 1"
      }
    }
  }
}
I found the one for endswith(name,'table 1') to be the trickiest part. For this I defined my_reverse_edge_ngram_analyzer, which uses a keyword tokenizer together with a lowercase filter and an edgeNGram filter preceded and followed by a reverse filter. What this analyzer basically does is split the text into edgeNGrams, but with the edge being the end of the text, not the start (as with the regular edgeNGram).
The query:
{
  "query": {
    "term": {
      "text.ends_with": {
        "value": "table 1"
      }
    }
  }
}
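To sanity-check what that analyzer actually emits, you can run a sample value through the _analyze API (the index name here is illustrative, and this request shape assumes a reasonably recent Elasticsearch version). For "Big Table 123" it should return the lowercased suffixes of 2 to 25 characters: "23", "123", " 123", "e 123", ... up to "big table 123".
GET test_index/_analyze
{
  "analyzer": "my_reverse_edge_ngram_analyzer",
  "text": "Big Table 123"
}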
For the name eq 'table 1' case, a simple keyword tokenizer together with a lowercase filter should do it.
The query:
{
  "query": {
    "term": {
      "text.exact_case_insensitive_match": {
        "value": "table 1"
      }
    }
  }
}
Regarding query_string, this changes the solution a bit, because I was counting on term to not analyze the input text and to match it exactly with one of the terms in the index.
But this can be "simulated" with query_string if the appropriate analyzer is specified for it.
The solution would be a set of queries like the following (always use that analyzer, changing only the field name):
{
  "query": {
    "query_string": {
      "query": "text.starts_with:(\"table 1\")",
      "analyzer": "lowercase_keyword"
    }
  }
}

Mongo db query with and

I have my data in a MongoDB database, and my collection has two fields, namely created_at and text. I want to extract documents that have words like bank, chile and fin in the text field and have a created_at value of jan 15. I am new to MongoDB, and when I tried to use the query below it gives the error "unexpected token".
query:
db.tweet.find({$and : [{"created_at" : /.*jan 15.*/i}, {"text : /.*bank.*/i, /.*chile.*/i, /.*fin.*/i "}]})
Please suggest corrections. Thanks in advance.
This is written wrong, partly in the redundant use of $and where it is not needed, and secondly I think you mean $or for the second condition. That actually translates to using the regex alternation form in the easiest sense:
db.tweet.find({
    "created_at": /.*jan 15.*/i,
    "text": /bank|chile|fin/i
})
Actually, use "word boundaries" for more exact "word" matching:
db.tweet.find({
    "created_at": /.*jan 15.*/i,
    "text": /\bbank\b|\bchile\b|\bfin\b/i
})
If you do in fact mean "and" which means the "text" field must contain "all" of those strings, then you need an $and operator. But different to how you did it:
db.tweet.find({
    "created_at": /.*jan 15.*/i,
    "$and": [
        { "text": /.*bank.*/i },
        { "text": /.*chile.*/i },
        { "text": /.*fin.*/i }
    ]
})
The purpose of $and is to allow an "array" construct where the "same" field name is referenced for different conditions. This keeps the structure valid, since "key names" are not duplicated.
Otherwise, all MongoDB query arguments are always implicitly an "and" condition.