Blue Prism Replace non English characters - replace

Looking for a function in Blue Prism to replace non-English characters with English characters.
Example:
Input: Andrés Chávez 
Output: Andres Chavez

I have some code prepared just for that :)
It's C# code with one input string and one output string, conveniently named "input" and "output".
string help = input.Normalize(System.Text.NormalizationForm.FormD);
System.Text.StringBuilder sb = new System.Text.StringBuilder();
for (int i = 0; i < help.Length; i++)
{
System.Globalization.UnicodeCategory uc =
System.Globalization.CharUnicodeInfo.GetUnicodeCategory(help[i]);
if (uc != System.Globalization.UnicodeCategory.NonSpacingMark)
{
sb.Append(help[i]);
}
}
output = sb.ToString().Normalize(System.Text.NormalizationForm.FormC);
That code requires the namespace "System.Globalization", which needs to be added in the Code Options of your business object.
I hope you'll be able to get that working easily.

Thanks to @Andrzej Kaczor from 2020! The code works perfectly. Just make sure that you use C# as the language in your Object, that you have the System.Globalization namespace imported as shown, and that you have input/output set in the corresponding tabs of your Code stage.

Hey you could just do this. It replaces upper and lower case of some diacritic letters.
I know it's not looking pretty. But it does what I want it to do.
textEdit1 = textEdit.Replace("ě","e").Replace("š","s").Replace("ř","r").Replace("č", "c").Replace("ž", "z").Replace("ý", "y").Replace("á", "a").Replace("í", "i").Replace("é", "e").Replace("ň", "n").Replace("ť", "t").Replace("ď", "d").Replace("Ě", "E").Replace("Š", "S").Replace("Č", "C").Replace("Ř", "R").Replace("Ž", "Z").Replace("Ý", "Y").Replace("Á", "A").Replace("Í", "I").Replace("É", "E").Replace("Ň", "N").Replace("Ť", "T").Replace("Ď", "D");


C++ Recognize UTF-8 or Hebrew language

I'm working on some code whose goal is to check whether two strings are equal.
There are two strings: string 1 comes from a text file, and string 2 comes from the server side, from a chat packet.
I have tried many different options. This is my latest attempt, but nothing works: the sentences are never recognized as equal. For example, the string in the text file is "בדיקה" and the string that comes from the packet is "בדיקה" too, and they still do not compare equal.
if(gSentenceEvent.IsRunning())
{
std::string s = lpMsg->message;
int Len = strlen(gSentenceEvent.RandomSentence);
std::string str;
str.assign(gSentenceEvent.RandomSentence, gSentenceEvent.RandomSentence + Len);
if (str.compare(s) == 0)
{
gSentenceEvent.SetRunning(false);
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,gMessage.GetMessage(1130));
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,gMessage.GetMessage(1127),lpObj->Name);
}
else
{
// A std::string cannot be passed to a printf-style "%s"; pass c_str() instead.
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"%s Try %s\n",lpObj->Name,s.c_str());
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"Answer Is %s\n",str.c_str());
}
}
If anyone has an idea for solving this, I would be happy to hear how to compare these strings reliably.
Thanks in advance!
I tried converting the text to wstring as well, but still nothing.
When I print the hex value of both sentences to check whether they are equal:
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"%.2X",lpMsg->message);
gNotice.GCNoticeSendToAll(0,0,0,0,0,0,"%.2X",gSentenceEvent.RandomSentence);
the output really is different. For example, with "בדיקהה 3" on both sides:
ServerSide = 22B6970
TextFile = D9C0B0
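The two values above look like memory addresses rather than the bytes of the text, most likely because "%.2X" was given a pointer, so they would differ even for identical strings. Below is a minimal sketch for dumping the actual bytes of each buffer so they can be compared. It assumes both buffers are NUL-terminated char arrays; hexDump is a hypothetical helper, not part of the code above. If the text file and the packet use different encodings (for example Windows-1255 on one side and UTF-8 on the other), the dumps will differ even though the text looks the same on screen.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of a NUL-terminated string
// as space-separated, two-digit uppercase hex values.
std::string hexDump(const char* text)
{
    std::string out;
    char buf[4];
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(text); *p != 0; ++p)
    {
        // Format one byte as exactly two hex digits.
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(*p));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

Printing hexDump(lpMsg->message).c_str() and hexDump(gSentenceEvent.RandomSentence).c_str() with a "%s" format should show whether the two sides really contain the same bytes. The letter ב, for instance, is the two bytes D7 91 in UTF-8 but the single byte E1 in Windows-1255.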

wxStyledTextCtrl non ASCII characters

I realized that in wxStyledTextCtrl if the user's comments contains non-ASCII characters, the positions reported by WordStartPosition and WordEndPosition are wrong. What is a good way of dealing with non-ASCII characters in wxStyledTextCtrl? How can I identify the characters that are non-ASCII?
You've probably answered this question by now, but in the experiments I've done, WordStartPosition and WordEndPosition still work with non-ASCII characters. The data internally in the control is stored in UTF-8 format, and those functions give the number of bytes in that data where the word starts and ends. If that's not what's happening for you, can you post a sample where they don't work?
As for determining which characters are and aren't ASCII, something like the following seems to work (assuming a is the start and b is the end position):
wxString s = m_stc->GetTextRange(a,b);
for (wxString::const_iterator i = s.begin(); i != s.end(); ++i)
{
wxUniChar uni_ch = *i;
if(uni_ch.IsAscii())
{
//something
}
else
{
//something else
}
}
One thing I did notice is that if you use a value for a or b that falls in the middle of one of the non-ASCII characters, the resulting string will be empty. I hope this is of some help if you haven't already found a solution.
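Since the positions are byte offsets into UTF-8 data, one way to avoid landing in the middle of a character is to check the byte at that offset. The following is a rough sketch in plain C++ that works on a copy of the UTF-8 bytes and does not use any wxWidgets API; alignToCharStart and isAsciiOnly are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuationByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: moves a byte offset back to the first byte of the
// UTF-8 character it falls inside. Offsets past the end are clamped.
std::size_t alignToCharStart(const std::string& utf8, std::size_t pos)
{
    if (pos >= utf8.size())
        return utf8.size();
    while (pos > 0 && isContinuationByte(static_cast<unsigned char>(utf8[pos])))
        --pos;
    return pos;
}

// True if every byte is in the ASCII range (0x00 to 0x7F).
bool isAsciiOnly(const std::string& utf8)
{
    for (unsigned char byte : utf8)
    {
        if (byte > 0x7F)
            return false;
    }
    return true;
}
```

In UTF-8, every byte of a non-ASCII character has its high bit set, so a byte-level check like isAsciiOnly is enough to tell whether a range contains any non-ASCII text.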

C++ search a string

I am having a really hard time with this problem...
Write a program that reads two strings (that do not contain blanks)
called searchPattern and longSequence.
The program will display in the screen the positions where
searchPattern appears in longSequence.
For example, when
searchPattern is asd
and longSequence is asdfasdfasdfasdf
(the positions are 0123456789012345)
the program will display 0, 4, 8, 12.
Another example, when
searchPattern is jj
and longSequence is kjlkjjlkjjjlkjjjkl
(the positions are 012345678901234567)
the program will display 4, 8, 9, 13, 14.
Can anyone help?
Some hints:
Read in the two strings. Look up "std::cin" for how to read and "std::string" for how to store the strings.
Look at the std::string class's find() method to search for the substring in the long string.
Have a go and then post what you have done on here. You will find plenty of people happy to help you, but you have to make some effort yourself. :-)
As a starting point, maybe just write the part that reads in the strings. When that is working well, you can add features.
Good luck.
To start thinking about the solution of problems like this, the best way is to think how you would solve it using a pen and paper in as much detail as possible and then try to translate that to code.
I would use Test Driven Development and start out small and build up.
For example, forget about user I/O, and stick with hard-coded data:
#include <iostream>
#include <string>
using std::cout;
using std::endl;
using std::string;
int main(void) // For now, can be modified later.
{
const char pattern[] = "asd";
const char sequence[] = "asdfasdfasdfasdf";
std::string::size_type position = 0;
const std::string longSequence(sequence);
position = longSequence.find(pattern, position);
while (position != std::string::npos)
{
cout << "pattern found at position: " << position << endl;
// Resume one past the last hit; searching from the same position would find it again forever.
position = longSequence.find(pattern, position + 1);
}
cout << "Paused. Press ENTER to continue." << endl;
std::cin.ignore(100000, '\n');
return 0;
}
You may want to convert the above into using a state machine rather than using std::string::find(). Again, this is just a foundation to build upon.
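As one possible next step, the search loop can be pulled into a function that returns every position, which makes it easy to test against the two examples from the assignment. This is only a sketch; findAllPositions is a made-up name, and it treats an empty pattern as having no matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns every index at which searchPattern starts inside longSequence.
// Overlapping occurrences are reported, as in the "jj" example.
std::vector<std::size_t> findAllPositions(const std::string& longSequence,
                                          const std::string& searchPattern)
{
    std::vector<std::size_t> positions;
    if (searchPattern.empty())
        return positions; // assumption: an empty pattern matches nowhere

    std::size_t position = longSequence.find(searchPattern);
    while (position != std::string::npos)
    {
        positions.push_back(position);
        // Continue one character after the last hit so overlaps are found.
        position = longSequence.find(searchPattern, position + 1);
    }
    return positions;
}
```

Reading the two strings with std::cin and printing the returned positions can then be added around this function.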
It's a recursive backtracking problem. Just like getting the mouse out of the maze. Define your base cases and your paths through the data. In the end all you need is a single function of maybe 15 - 20 lines.

C++ Text File, Chinese characters

I have a C++ project which is supposed to add <item> to the beginning of every line and </item > to the end of every line. This works fine with normal English text, but I have a Chinese text file I would like to do this to, but it does not work. I normally use .txt files, but for this I have to use .rtf to save the Chinese text. After I run my code, it becomes gibberish. Here's an example.
{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff31507\deff0\stshfdbch31506\stshfloch31506\stshfhich31506\stshfbi31507\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f2\fbidi
\fmodern\fcharset0\fprq1{*\panose
02070309020205020404}Courier
New;}
Code:
int main()
{
ifstream in;
ofstream out;
string lineT, newlineT;
in.open("rawquote.rtf");
if(in.fail())
exit(1);
out.open("itemisedQuote.rtf");
do
{
getline(in,lineT,'\n');
newlineT += "<item>";
newlineT += lineT;
newlineT += "</item>";
if (lineT.length() >5)
{
out<<newlineT<<'\n';
}
newlineT = "";
lineT = "";
} while(!in.eof());
return 0;
}
That looks like RTF, which makes sense since you say this is an .rtf file.
Basically, if you dump the raw contents of that file after opening it, you'll see that it looks like that...
Also, you should revisit your loop:
std::string line;
while(getline(in, line, '\n'))
{
// Do stuff here; the loop condition above checks that a line was actually read.
out << "<item>" << line << "</item>" << '\n';
}
You can't read the RTF code the same way as plain text as you'll just ignore format tags, etc. and might just break the code.
Try to save your Chinese text as a plain text file using UTF-8 (without BOM) and your code should work. However, this might fail if some other UTF-8 encoded character essentially contains a line break (not sure about this part right now), so you should try to do real UTF-8 conversion and read the file using wide chars instead of regular chars (as Chan suggested), which is a little bit tricky in C++.
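For the plain-text route, here is a minimal sketch that wraps each line of a UTF-8 stream in <item> tags without any conversion. In UTF-8 every byte of a multi-byte character has its high bit set, so the '\n' byte cannot occur inside a Chinese character and byte-wise getline is safe. wrapLines is a hypothetical function name, and unlike the question's code this sketch only skips empty lines rather than lines of five characters or fewer.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies each non-empty line from in to out, wrapped in <item>...</item>.
// The bytes of each line are passed through unchanged.
void wrapLines(std::istream& in, std::ostream& out)
{
    std::string line;
    while (std::getline(in, line))
    {
        // Drop a trailing '\r' left over from Windows line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (line.empty())
            continue;
        out << "<item>" << line << "</item>" << '\n';
    }
}
```

With files, this could be called with a std::ifstream and a std::ofstream, both opened with std::ios::binary so that no newline translation touches the bytes.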
It's kind of a miracle that this works for non-Chinese text. "\n" is not the line separator in RTF, "\par" is. The odds that more damage is done to the RTF header are certainly greater for Chinese.
C++ is not the best language to tackle this. It is a trivial 5 minute program in C# as long as the file doesn't get too large:
using System;
using System.Windows.Forms; // Add reference
class Program {
static void Main(string[] args) {
var rtb = new RichTextBox();
rtb.LoadFile(args[0], RichTextBoxStreamType.RichText);
var lines = rtb.Lines;
for (int ix = 0; ix < lines.Length; ++ix) {
lines[ix] = "<item>" + lines[ix] + "</item>";
}
rtb.Lines = lines;
rtb.SaveFile(args[0], RichTextBoxStreamType.RichText);
}
}
If C++ is a hard requirement then you'll have to find an RTF parser.
I think you should use 'wchar_t' (wide characters) for the string instead of regular 'char'.
If I'm understanding the objective of this code, your solution is not going to work. A line break in an RTF document does not correspond to a line break in the visible text.
If you can't just use plain text (Chinese characters are not a problem with a valid encoding), take a look at the RTF spec. You'll discover that it is a nightmare. So your best bet is probably a third-party library that can parse RTF and read it "line" by "line." I have never looked for such a library, so I do not have any suggestions off the top of my head, but I'm sure they are out there.

Regex Rejecting matches because of Instr

What's the easiest way to do an "instring" type function with a regex? For example, how could I reject a whole string because of the presence of a single character such as :? For example:
this - okay
there:is - not okay because of :
More practically, how can I match the following string:
//foo/bar/baz[1]/ns:foo2/@attr/text()
For any node test on the xpath that doesn't include a namespace?
(/)?(/)([^:/]+)
Will match the node tests but includes the namespace prefix which makes it faulty.
I'm still not sure whether you just wanted to detect if the Xpath contains a namespace, or whether you want to remove the references to the namespace. So here's some sample code (in C#) that does both.
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main(string[] args)
{
string withNamespace = @"//foo/ns2:bar/baz[1]/ns:foo2/@attr/text()";
string withoutNamespace = @"//foo/bar/baz[1]/foo2/@attr/text()";
ShowStuff(withNamespace);
ShowStuff(withoutNamespace);
}
static void ShowStuff(string input)
{
Console.WriteLine("'{0}' does {1}contain namespaces", input, ContainsNamespace(input) ? "" : "not ");
Console.WriteLine("'{0}' without namespaces is '{1}'", input, StripNamespaces(input));
}
static bool ContainsNamespace(string input)
{
// a namespace must start with a character, but can have characters and numbers
// from that point on.
return Regex.IsMatch(input, @"/?\w[\w\d]+:\w[\w\d]+/?");
}
static string StripNamespaces(string input)
{
return Regex.Replace(input, @"(/?)\w[\w\d]+:(\w[\w\d]+)(/?)", "$1$2$3");
}
}
Hope that helps! Good luck.
Match on :? I think the question isn't clear enough, because the answer is so obvious:
if (Regex.IsMatch(input, ":")) // reject
You might want \w which is a "word" character. From javadocs, it is defined as [a-zA-Z_0-9], so if you don't want underscores either, that may not work....
I don't know regex syntax very well, but could you not do:
[any alpha numeric]\*:[any alphanumeric]\*
I think something like that should work no?
Yeah, my question was not very clear. Here's a solution but rather than a single pass with a regex, I use a split and perform iteration. It works as well but isn't as elegant:
string xpath = "//foo/bar/baz[1]/ns:foo2/@attr/text()";
string[] nodetests = xpath.Split( new char[] { '/' } );
for (int i = 0; i < nodetests.Length; i++)
{
if (nodetests[i].Length > 0 && Regex.IsMatch( nodetests[i], @"^(\w|\[|\])+$" ))
{
// does not have a ":", we can manipulate it.
}
}
xpath = String.Join( "/", nodetests );
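The same split-and-test idea can be sketched in C++ with std::regex. This is an illustration rather than a full XPath parser: it splits on '/', assumes that no predicate contains a '/' or ':' of its own, and keeps each non-empty step that has no ':' anywhere in it. nodeTestsWithoutNamespace is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Splits an XPath on '/' and returns the non-empty steps that contain no ':'.
std::vector<std::string> nodeTestsWithoutNamespace(const std::string& xpath)
{
    // regex_match must consume the whole step, so this rejects empty
    // steps and any step containing a colon.
    static const std::regex noColon("[^:]+");

    std::vector<std::string> result;
    std::size_t start = 0;
    while (start <= xpath.size())
    {
        std::size_t end = xpath.find('/', start);
        if (end == std::string::npos)
            end = xpath.size();
        const std::string step = xpath.substr(start, end - start);
        if (std::regex_match(step, noColon))
            result.push_back(step);
        start = end + 1;
    }
    return result;
}
```

The negated character class [^:]+ is what provides the "reject the whole string if it contains this character" behaviour asked about at the top of the thread.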