I need to check a string, in this case called "word," to see if it contains a letter (or character if you prefer).
I don't really need to know the location of the letter, simply if it is present. Currently I have this:
if character in word then //both "word" and "character" are string variables.
begin
{some code}
end;
The trouble is that this is just me ripping off a Python construct:
if character in word: //In python I would use an array for "word"
//some code
And this doesn't seem to work in Pascal.
This may seem like a dumb question, but I am very new to Pascal and indeed to asking for help on Stack Exchange. Any help on how to check for characters in strings would be greatly appreciated.
if pos(character,word)>0 then
... some code
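pos is overloaded for both characters and strings (for substring matches). It returns the 1-based position of the first match, or 0 if there is none.
Note that the search is case-sensitive. Apply UpperCase() to both character and word if you want a case-insensitive match.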
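For comparison with the Python idiom from the question, here is a rough sketch of what the check amounts to, written in Python. The helper name pos is hypothetical: it only mimics the behaviour described above (1-based index, 0 when absent) and is not the Pascal implementation.

```python
def pos(needle: str, haystack: str) -> int:
    """Hypothetical stand-in for Pascal's Pos: 1-based index of the first match, 0 if absent."""
    # str.find returns -1 when nothing is found, so adding 1 yields 0 in that case.
    return haystack.find(needle) + 1


word = "Pascal"
character = "p"

# Case-sensitive check: lowercase "p" does not occur in "Pascal".
found_exact = pos(character, word) > 0

# Case-insensitive check: upper-case both sides before searching.
found_any_case = pos(character.upper(), word.upper()) > 0

print(found_exact, found_any_case)
```

The second check is the equivalent of upper-casing both character and word before calling pos.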
I am struggling with a little problem concerning regular expressions.
I want to replace all odd length substrings of a specific character with another substring of the same length but with a different character.
All even sequences of the specified character should remain the same.
Simplified example: A string contains the letters a,b and y and all the odd length sequences of y's should be replaced by z's:
abyyyab -> abzzzab
Another possible example might be:
ycyayybybcyyyyycyybyyyyyyy
becomes
zczayybzbczzzzzcyybzzzzzzz
I have no problem matching all the sequences of odd length using a regular expression.
Unfortunately I have no idea how to incorporate the length information from these matches into the replacement string.
I know I have to use backreferences/capture groups somehow, but even after reading lots of documentation and Stack Overflow articles I still don't know how to pursue the issue correctly.
Concerning possible regex engines, I am working mainly with Emacs or Vim.
In case I have overlooked an easier general solution without a complicated regular expression (e.g. a small and fixed series of simple search and replace commands), this would help too.
Here's how I'd do it in vim:
:s/\vy@<!y(yy)*y@!/\=repeat('z', len(submatch(0)))/g
Explanation:
The regex we're using is \vy@<!y(yy)*y@!. The \v at the beginning turns on the very magic option, so we don't have to escape as much. Without it, we would have y\@<!y\(yy\)*y\@!.
The basic idea for this search is that we're looking for a single 'y', y, followed by a run of pairs of 'y's, (yy)*. Then we add y@<! to guarantee there isn't a 'y' before our match, and add y@! to guarantee there isn't a 'y' after our match.
Then we replace this using the eval register, i.e. \=. From :h sub-replace-\=:
*sub-replace-\=* *s/\=*
When the substitute string starts with "\=" the remainder is interpreted as an
expression.
The special meaning for characters as mentioned at |sub-replace-special| does
not apply except for "<CR>". A <NL> character is used as a line break, you
can get one with a double-quote string: "\n". Prepend a backslash to get a
real <NL> character (which will be a NUL in the file).
The "\=" notation can also be used inside the third argument {sub} of
|substitute()| function. In this case, the special meaning for characters as
mentioned at |sub-replace-special| does not apply at all. Especially, <CR> and
<NL> are interpreted not as a line break but as a carriage-return and a
new-line respectively.
When the result is a |List| then the items are joined with separating line
breaks. Thus each item becomes a line, except that they can contain line
breaks themselves.
The whole matched text can be accessed with "submatch(0)". The text matched
with the first pair of () with "submatch(1)". Likewise for further
sub-matches in ().
TL;DR, :s/foo/\=blah replaces foo with blah evaluated as vimscript code. So the code we're evaluating is repeat('z', len(submatch(0))), which simply makes one 'z' for each 'y' we've matched.
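Outside of Vim the same idea carries over to any engine that has lookarounds and lets you compute the replacement per match. A minimal sketch in Python, assuming the standard re module; the names ODD_RUN and replace_odd_runs are made up for illustration:

```python
import re

# Same shape as the Vim pattern: one 'y' plus any number of 'yy' pairs,
# with no 'y' immediately before or after the run.
ODD_RUN = re.compile(r"(?<!y)y(?:yy)*(?!y)")


def replace_odd_runs(text: str) -> str:
    # The replacement is computed from each match, much like \= in Vim:
    # one 'z' per matched 'y'.
    return ODD_RUN.sub(lambda m: "z" * len(m.group(0)), text)


print(replace_odd_runs("abyyyab"))
```

Even-length runs never match, because every candidate start either has a 'y' before it or leaves a 'y' right after the match.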
Recently, in an interview, I was asked the following question: I have a string with a couple of billion characters in it, containing both ASCII and non-ASCII characters. The task was to remove all the non-ASCII characters so that the output contains only ASCII characters. The solution had to be a time-efficient algorithm.
I suggested two approaches:
Make an array of ASCII characters. Loop over the string and check whether the current character is in the ASCII array. If it is, skip over it; otherwise replace it with null.
Obviously, it's not a time-efficient solution.
Secondly, I suggested partitioning the array in half, then in half again, and so on. I would still be checking ASCII characters as in the approach above.
This conversation led to a discussion in which the interviewer was looking for a solution where we don't have to go character by character, and he suggested using regular expressions.
My question is: when we match a pattern using regular expressions, does the engine check the string character by character, or does it use some other approach? I was sure that regular expressions find matches character by character.
Can anyone please clear my doubt?
Thanks
You could use a range like this:
[\x20-\x7E]
This range matches every character from [space] to ~, i.e. the printable ASCII range.
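To actually remove characters, the range has to be negated and used in a substitution. A minimal sketch in Python, assuming the standard re module; the function name keep_printable_ascii is hypothetical. Note that this range also drops ASCII control characters such as tab and newline, which may or may not be what you want.

```python
import re

# Anything outside the printable range [space]..~ is matched and removed.
NOT_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]+")


def keep_printable_ascii(text: str) -> str:
    return NOT_PRINTABLE_ASCII.sub("", text)


print(keep_printable_ascii("héllo wörld"))
```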
Regular expressions do indeed use optimisations for cases where a sequence of characters is matched: simply put, if you're looking for "XXXXXXX", you know you can test every 7th character, and only look closer once you find an X there. However, you need to filter every single character, which means a regular expression would not be more efficient (and indeed it would be less efficient, because you would need to go in and out of the regex engine to process each match).
Instead, the efficient method (assuming C-like architecture) would be to start with two indices (source and result) at zero, and process the string: if the character has the high-bit clear, it's ASCII: copy from source to result, increment both indices. If the high-bit is set, it's non-ASCII: just increment source index.
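#include <stddef.h>

/* Remove every non-ASCII byte from str in place. */
void removeNonAscii(char *str) {
    size_t s, r; /* source and result indices; size_t so billions of characters do not overflow an int */
    for (s = 0, r = 0; str[s]; s++) {
        if (!(str[s] & 0x80)) { /* high bit clear: ASCII, keep it */
            str[r++] = str[s];
        }
    }
    str[r] = '\0';
}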
(or you can make a non-destructive one, by copying into a new string instead of overwriting the current one; the algorithm is the same.)
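As a rough illustration of the non-destructive variant, here is a sketch in Python rather than C. It assumes the input is available as a bytes object and uses bytes.translate with a delete set covering 128..255; the names NON_ASCII and remove_non_ascii are hypothetical. The work is still one pass over every byte, it just happens inside the built-in.

```python
# Every byte value with the high bit set (128..255).
NON_ASCII = bytes(range(128, 256))


def remove_non_ascii(data: bytes) -> bytes:
    # table=None means "no remapping"; the second argument lists bytes to delete.
    # A new bytes object is returned and the input is left untouched.
    return data.translate(None, NON_ASCII)


original = "héllo".encode("utf-8")
print(remove_non_ascii(original))
```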
What I mean is that I need a regular expression that can match either something like this...
"I am a sentence."
or something like this...
"I am a sentence.
(notice the missing quotation mark at the end of the second one). My attempt at this so far is
["](\\.|[^"])*["]*
but that isn't working. Thanks for the help!
Edit for clarity: I am intending for this to be something like a C style string. I want functionality that will match with a string even if the string is not closed properly.
You could write the pattern as:
["](\\.|[^"\n])*["]?
which only has two small changes:
It excludes newline characters inside the string, so that the invalid string will only match to the end of the line. (. does not match newline, but a negated character class does, unless of course the newline is explicitly negated.)
It makes the closing double quote optional rather than arbitrarily repeated.
However, it is hard to imagine a use case in which you just want to silently ignore the error. So I would recommend writing two rules:
["](\\.|[^"\n])*["] { /* valid string */ }
["](\\.|[^"\n])* { /* invalid string */ }
Note that the first pattern is guaranteed to match a valid string because it will match one more character than the other pattern and (f)lex always goes with the longer match.
Also, writing two overlapping rules like that does not cause any execution overhead, because of the way (f)lex compiles the patterns. In effect, the common prefix is automatically factored out.
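If you want to try the two-rule idea outside of (f)lex, here is a rough sketch in Python. It is only an approximation: Python's re is a backtracking engine, not a longest-match scanner, so the closed pattern is simply tried first. As a deliberate deviation from the patterns above, the character class also excludes the backslash so that an escaped quote can never be re-read as a closing quote. The names CLOSED, OPEN and classify are hypothetical.

```python
import re

# Closed string: opening quote, body, closing quote.
CLOSED = re.compile(r'"(?:\\.|[^"\\\n])*"')
# Unterminated string: same body, but no closing quote before end of line.
OPEN = re.compile(r'"(?:\\.|[^"\\\n])*')


def classify(text: str):
    """Return (kind, matched_text) for a string literal at the start of text."""
    m = CLOSED.match(text)
    if m:
        return ("valid", m.group(0))
    m = OPEN.match(text)
    if m:
        return ("unterminated", m.group(0))
    return ("no match", "")


print(classify('"I am a sentence."'))
print(classify('"I am a sentence.'))
```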
We need a Java regex to check whether a given String contains a set of characters in the same order as their occurrence.
E.g. if the given String is "TYPEWRITER",
the following strings should return a match:
"YERT", "TWRR" & "PEWRR" (character by character match in the order of occurrence),
but not
"YERW" or "YERX" (this contains characters either not present in the given string or doesn't match the order of occurrence).
This can be done by character by character matching in a for loop, but it will be more time consuming. A regex for this or any pointers will be highly appreciated.
First of all, regex is not really the tool for this. It is powerful, but it is a poor fit for this problem.
What you are asking for is a special case of the Longest Common Subsequence (LCS) problem. For your case you need to change the algorithm a bit: instead of matching parts of both strings, you need to match one string as a whole subsequence of the larger one.
LCS is a dynamic programming algorithm, and as far as I know it is the fastest way to achieve this. If you take a look at the LCS example here, you'll see what I am talking about.
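For the specific case in the question (is one string a subsequence of the other?), a full LCS table is not strictly required. The sketch below is a plain single-pass scan, not the LCS dynamic program, and is written in Python rather than Java for brevity; the function name is_subsequence is hypothetical.

```python
def is_subsequence(candidate: str, text: str) -> bool:
    """True if the characters of candidate appear in text in the same order."""
    i = 0  # index of the next character of candidate still to be found
    for ch in text:
        if i < len(candidate) and ch == candidate[i]:
            i += 1
    # Every character of candidate was found, in order.
    return i == len(candidate)


for probe in ("YERT", "TWRR", "PEWRR", "YERW", "YERX"):
    print(probe, is_subsequence(probe, "TYPEWRITER"))
```

Each character of the longer string is visited once, so the cost grows linearly with its length.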
I am writing a C++ program to solve a common problem of message decoding. Part of the problem requires me to get a bunch of random characters, including '\', and map them to a key, one by one.
My program works fine in most cases, except that when I read characters such as '\' from a string, I obviously get a completely different character representation (e.g. '\0' yields a null character, or '\\' simply escapes itself when it needs to be treated as a character).
Since I am not supposed to have any control on what character keys are included, I have been desperately trying to find a way to treat special control characters such as the backslash as the character itself.
My questions are basically these:
Is there a way to turn all special characters off within the scope of my program?
Is there a way to override the current digraph definitions of special characters and define them as something else (like digraphs using very rare keys)?
Is there some obscure method on the String class that I missed which can force the actual character on the string to be read instead of the pre-defined constant?
I have been trying to look for a solution for hours now but all possible fixes I've found are for other languages.
Any help is greatly appreciated.
If you read in a string like "\0" from stdin or a file, it will be treated as two separate characters: '\\' and '0'. There is no additional processing that you have to do.
Escaping characters is only used for string/character literals. That is to say, when you want to hard-code something into your source code.
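The same distinction between literals and input data exists in most languages. A small sketch in Python (not C++), where io.StringIO stands in for stdin or a file and the raw string r"\0" is just a convenient way to put a backslash and a zero into that simulated input:

```python
import io

# In a literal, the escape is processed when the source is parsed: one NUL character.
literal = "\0"

# Text that arrives as data is not re-interpreted: backslash and '0' stay separate.
from_input = io.StringIO(r"\0").read()

print(len(literal))
print(len(from_input))
print(list(from_input))
```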