wxStyledTextCtrl how to style keywords - c++

Having some trouble getting wxStyledTextCtrl to colourise my word listings.
x->m_ctlEdit->SetKeyWords(0,"true false");
x->SetWordChars(wxT("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ._") );
x->StyleSetForeground(wxSTC_HPHP_WORD, wxColour(0x67,0xa6,0xff));
true and false receive no colouring this way. I've used StyleSetForeground on many of the other style definitions and it all works fine; I'm just having trouble with the word lists.
As a second question, how do I set separate colours for different word listings? I'm aware I can set different keyword lists with the number identifier, but how do I apply a style per keyword list, since StyleSetForeground doesn't take a list identifier?
Note:
Using the HTML/PHP lexer that comes as a default option with wxStyledTextCtrl

For the wxSTC_LEX_HTML HTML lexer or the wxSTC_LEX_PHPSCRIPT PHP lexer, you need to specify keyword set 4. For example:
x->m_ctlEdit->SetKeyWords(4,"true false");
If you're using the HTML lexer, you can discover this by calling x->m_ctlEdit->DescribeKeyWordSets();, which returns the following list:
HTML elements and attributes
JavaScript keywords
VBScript keywords
Python keywords
PHP keywords
SGML and DTD keywords
In this case, the 0-based index of the PHP keywords is 4, so this would be the number to pass in to the SetKeyWords method.
However, this check fails when using the PHP lexer, since calling DescribeKeyWordSets will only return "PHP keywords". You might therefore think you should call SetKeyWords with 0, but in fact you still need to use 4, because the PHP script lexer is the same lexer as the HTML one. That just seems to be an oddity of Scintilla.
On an unrelated note, I think the call to SetWordChars is unnecessary. According to the documentation, that method affects searching by words, not keyword recognition.
As a second question, how do I set separate colours for different word listings?
That depends on the lexer. For example, the C lexer offers the following keyword sets
Primary keywords and identifiers
Secondary keywords and identifiers
Documentation comment keywords
Global classes and typedefs
Preprocessor definitions
which correspond to the lexer states wxSTC_C_WORD, wxSTC_C_WORD2, wxSTC_C_COMMENTDOCKEYWORD, and so on.
Unfortunately, as described above, the HTML lexer only offers one keyword set for PHP.
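Putting the answer together for the PHP case, the relevant calls might look like this (a sketch, assuming ctl points at a wxStyledTextCtrl that has already been created, as with x->m_ctlEdit in the question):

```cpp
#include <wx/stc/stc.h>

// Sketch: colour PHP keywords in a wxStyledTextCtrl using the
// HTML/PHP lexer. Keyword set 4 is the PHP set for this lexer,
// and wxSTC_HPHP_WORD is the matching style number.
void SetupPhpKeywordStyling(wxStyledTextCtrl* ctl)
{
    ctl->SetLexer(wxSTC_LEX_HTML);        // or wxSTC_LEX_PHPSCRIPT
    ctl->SetKeyWords(4, "true false");    // set 4 = PHP keywords
    ctl->StyleSetForeground(wxSTC_HPHP_WORD,
                            wxColour(0x67, 0xa6, 0xff));
}
```

The key point is that the keyword-set index passed to SetKeyWords and the style number passed to StyleSetForeground are independent: the lexer maps words from set 4 to the wxSTC_HPHP_WORD state, and the style call colours that state.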

Related

What does “\p” in comments mean?

While reading the LLVM source code, I found something unfamiliar in the comments, e.g.
/// If \p DebugLogging is true, we'll log our progress to llvm::dbgs().
What does \p mean here?
LLVM uses Doxygen for generating documentation, the /// marker is one of the many ways of creating a special comment block that Doxygen will parse to form documentation.
Within a special comment block, \p is simply one of the markup commands; this particular one renders the following word in a typewriter font (fixed-width rather than proportional). The \c command is an alias for the same thing.
Three slashes is one of the ways that Doxygen comments are identified.
The \p command has a specific meaning; see its documentation: https://www.doxygen.nl/manual/commands.html#cmdp
Displays the parameter using a typewriter font. You can use this command to refer to member function parameters in the running text.
I agree. These seem to be Doxygen commands for typewriter-font formatting, but since they sit inside comments, the raw source shows the command characters themselves rather than the rendered font.
When you read the raw source, the comments are not rendered; Doxygen only processes them when generating the documentation. The \c and \p commands precede important keywords (methods, members, parameters, etc.), not arbitrary words. The author, with good intentions, wanted readers to be able to pick out the keywords, but in the raw comments all words look the same.

Scanning a language with non-delimited strings with nested tokens

I want to create a lexer/parser for a language that has non-delimited strings.
Which part of the language is a string is defined by the command preceding it.
For example it has statements that look like this:
pause 5
alert Hello world[CRLF] this contains 'pause' once (1)
alert in this instance can be followed by any string, including keywords and numbers.
Further complicating things, the text can contain tags like [CRLF] that I want to separate too.
Ideally I'd want this to be broken up into:
[PAUSE][INT 5]
[ALERT][STR "Hello world"][CRLF][STR " this contains 'pause' once (1)"]
I'm currently using flex, but from what I've gathered this kind of thing isn't possible with flex.
How can I achieve what I want here?
(Since one of your tags is "regex", I'll suggest a non-flex approach.)
From the example, it seems like you could just:
match each line against ^(\w+) (.+) to obtain command and arguments-text, and then
get individual arguments by splitting the arguments-text on (\[\w+\]) (assuming your regex library's split function can return both the splitter-strings and the split-strings).
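The two steps above can be sketched with Python's re module (a sketch of the approach, not a full lexer; lex_line is a name invented here):

```python
import re


def lex_line(line):
    """Split a statement into its command and a list of argument parts,
    where tags like [CRLF] become separate parts."""
    # Step 1: split the line into a command and its argument text.
    m = re.match(r"^(\w+) (.+)$", line)
    command, args_text = m.group(1), m.group(2)
    # Step 2: split the argument text on tags like [CRLF]; because the
    # pattern has a capturing group, re.split keeps the tags themselves.
    parts = [p for p in re.split(r"(\[\w+\])", args_text) if p]
    return command, parts


# e.g. lex_line("alert Hello world[CRLF] this contains 'pause' once (1)")
# → ("alert", ["Hello world", "[CRLF]", " this contains 'pause' once (1)"])
```

Everything after the command is treated as raw string data, which sidesteps the non-delimited-string problem entirely: keywords like pause inside the argument text are never examined as tokens.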
It's possible your actual situation is more complex and something like flex makes more sense, but I'm not really seeing it so far.

Use reserved keyword as alias in doctrine query builder

$em->createQueryBuilder()
    ->select('MIN(m.price) AS min')
    ->addSelect('MAX(m.price) AS max')
    ->from('AppBundle:Sites', 'm');
How can I escape min to make this work? I tried changing the min alias to something like _min instead, but there should be a better way.
I tried both single quotes and backticks but neither worked.
You won't be able to use min or max as aliases since they are simply not available in the current grammar of DQL. You can find this info in the section of the documentation that defines the DQL grammar. There you will find the following:
In Select Expressions:
SimpleSelectExpression ::= (...) [["AS"] AliasResultVariable]
In Identifiers:
/* Alias ResultVariable declaration (the "total" of "COUNT(*) AS total") */
AliasResultVariable = identifier
And eventually, in Terminals:
identifier (name, email, …) must match [a-z_][a-z0-9_]*
As you can see, there is nothing in there to help you escape your keyword in any way. Thus, as it is, when stumbling upon min, the lexer will identify it as the MIN function (see this section of the code) and not as an identifier, hence the error.
Long story short, you will have to either rely on a native query or use an alias name that is not one of the reserved keywords listed here.
Note: Doctrine allows you to implement your own quoting strategy as discussed in this post but the issue is unrelated. Here, the problem with your alias is that it is matched as a function by the DQL parser which is unexpected at this position.
Using MySQL, you could escape it using backticks: `min`
Using Postgres, you could escape it using double quotes: "min"
You can use another word like minimum as an alias to avoid being database-dependent.

Using variables in reStructuredText

I'm writing a long HOWTO in reStructuredText format and wondering if there's a way to let user specify values for a couple variables (hostname, ip address) at the top so the rest of the document would be filled with those automatically?
Like me, you are probably looking for substitution. At the bottom of the section you'll find how to replace text.
Substitution Definitions
Doctree element: substitution_definition.
Substitution definitions are indicated by an explicit markup start
(".. ") followed by a vertical bar, the substitution text, another
vertical bar, whitespace, and the definition block. Substitution text
may not begin or end with whitespace. A substitution definition block
contains an embedded inline-compatible directive (without the leading
".. "), such as "image" or "replace".
Specifically about text replacement:
Replacement text
The substitution mechanism may be used for simple macro substitution. This may be appropriate when the replacement text is
repeated many times throughout one or more documents, especially if it
may need to change later. A short example is unavoidably contrived:
|RST|_ is a little annoying to type over and over, especially
when writing about |RST| itself, and spelling out the
bicapitalized word |RST| every time isn't really necessary for
|RST| source readability.
.. |RST| replace:: reStructuredText
.. _RST: http://docutils.sourceforge.net/rst.html
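Applied to the question's hostname/IP use case, a substitution-based sketch could look like this (the hostname and address values are placeholders the user would edit at the top of the document):

```rst
.. |hostname| replace:: server01.example.com
.. |ipaddr| replace:: 192.0.2.10

Connect to |hostname| (|ipaddr|) over SSH, then verify that
|hostname| resolves to |ipaddr| before continuing.
```

Every occurrence of |hostname| and |ipaddr| in the body is replaced when the document is processed, so the values only need to be changed in one place.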
reStructuredText is a markup language for defining static content. HTML content (I assume the desired output format is HTML) is typically generated from reStructuredText at build time and then released/shipped to the user.
To allow users to specify variables, you would need a solution on top of reStructuredText, for example:
Ship the content with a JavaScript plugin that dynamically replaces specific strings in the HTML document with user input.
Generate the documentation on-the-fly after the user has specified the variables.
Note that these examples are not necessarily particularly viable solutions.

CloudSearch wildcard query not working with 2013 API after migration from 2011 API

I've recently upgraded a CloudSearch instance from the 2011 to the 2013 API. Both instances have a field called sid, which is a text field containing a two-letter code followed by some digits e.g. LC12345. With the 2011 API, if I run a search like this:
q=12345*&return-fields=sid,name,desc
...I get back 1 result, which is great. But the sid of the result is LC12345 and that's the way it was indexed. The number 12345 does not appear anywhere else in any of the resulting document fields. I don't understand why it works. I can only assume that this type of query is looking for any terms in any fields that even contain the number 12345.
The reason I'm asking is because this functionality is now broken when I query using the 2013 API. I need to use the structured query parser, but even a comparable wildcard query using the simple parser is not working e.g.
q.parser=simple&q=12345*&return=sid,name,desc
...returns nothing, although the document is definitely there i.e. if I query for LC12345* it finds the document.
If I could figure out how to get the simple query working like it was before, that would at least get me started on how to do the same with the structured syntax.
Why it's not working
CloudSearch v1 (2011) had a different way of tokenizing mixed alpha+numeric strings. Here's the logic as described in the archived docs (emphasis mine).
If a string contains both alphabetic and numeric characters and is at
least three and no more than nine characters long, the alphabetic and
numeric portions of the string are treated as separate tokens. For
example, the string DOC298 is tokenized into two terms: doc 298
CloudSearch v2 (2013) text processing follows Unicode Text Segmentation, which does not specify that behavior:
Do not break within sequences of digits, or digits adjacent to letters (“3a”, or “A3”).
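The v1 rule quoted above can be sketched in Python (a hypothetical re-implementation for illustration only, not the actual CloudSearch code):

```python
import re


def tokenize_v1(term):
    """Mimic the documented 2011 rule: for mixed alpha+numeric strings
    of length 3-9, split the alphabetic and numeric runs into separate
    tokens; otherwise keep the (lowercased) string whole."""
    term = term.lower()
    has_alpha = re.search(r"[a-z]", term)
    has_digit = re.search(r"[0-9]", term)
    if 3 <= len(term) <= 9 and has_alpha and has_digit:
        # Each maximal run of letters or digits becomes its own token.
        return re.findall(r"[a-z]+|[0-9]+", term)
    return [term]


# Under this rule "LC12345" indexes as ["lc", "12345"], which is why
# the prefix query 12345* matched with the 2011 API; v2 keeps
# "lc12345" as a single token, so the same query finds nothing.
```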
Solution
You should just be able to search *12345 to get back results with any prefix. There may be some edge cases like getting back results you don't want (things with more preceding digits like AB99912345); I don't know enough about your data to say whether those are real concerns.
Another option would be to index the numeric prefix separately from the alphabetical suffix, but that's additional work that may be unnecessary.
I'm guessing you are using CloudSearch in English, so maybe this isn't your specific problem, but also watch out for stop words in your search queries:
https://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-analysis-schemes.html#stopwords
For example, the word "jo" is a stop word in Danish and other languages; each supported language has a dictionary of very common stop words. If you don't specify a language for your text field, it defaults to English. You can see the stop word lists here: https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html#text-processing-settings