Aspose Word Merge - aspose

I am using the Aspose library for Word mail merge.
In the previous version of Aspose, a field whose value was only whitespace was not treated as empty during a merge. After upgrading to the latest version, whitespace is treated as blank and those fields are removed when the cleanup setting is on.
In my case, I want to keep whitespace-only or empty fields for a few fields but remove them for the rest.
I tried to find a field-level setting that keeps or removes empty fields, but I haven't found one.
Is there any way I can achieve this?

If a paragraph contains only whitespace, it is considered empty and is removed. For example, with code like the following:
string[] fieldNames = new string[] { "FirstName", "MidName", "LastName" };
string[] fieldValues = new string[] { "Alexey", " ", "Noskov" };
Document doc = new Document(@"C:\Temp\in.docx");
doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveEmptyParagraphs;
doc.MailMerge.Execute(fieldNames, fieldValues);
doc.Save(@"C:\Temp\out.docx");
If the MidName merge field is in its own paragraph, that paragraph is removed as empty.
However, you can work around this behavior using IFieldMergingCallback: insert hidden text at the merge field so that the paragraph is no longer considered empty. For example:
string[] fieldNames = new string[] { "FirstName", "MidName", "LastName" };
string[] fieldValues = new string[] { "Alexey", " ", "Noskov" };
Document doc = new Document(@"C:\Temp\in.docx");
doc.MailMerge.FieldMergingCallback = new MergeWhitespaceCallback("MidName");
doc.MailMerge.CleanupOptions = MailMergeCleanupOptions.RemoveEmptyParagraphs;
doc.MailMerge.Execute(fieldNames, fieldValues);
doc.Save(@"C:\Temp\out.docx");
// Requires "using System.Linq;" for Contains() on the array.
private class MergeWhitespaceCallback : IFieldMergingCallback
{
    private readonly string[] mRetainParagraphsWithFields;
    public MergeWhitespaceCallback(params string[] retainParagraphsWithFields)
    {
        mRetainParagraphsWithFields = retainParagraphsWithFields;
    }
    public void FieldMerging(FieldMergingArgs args)
    {
        // Only act when the value is null, empty, or whitespace.
        if (args.FieldValue != null && !string.IsNullOrEmpty(args.FieldValue.ToString().Trim()))
            return;
        // Only act on the fields whose paragraphs must be retained.
        if (!mRetainParagraphsWithFields.Contains(args.FieldName))
            return;
        // Write hidden placeholder text so the paragraph is not considered empty.
        DocumentBuilder builder = new DocumentBuilder(args.Document);
        builder.MoveTo(args.Field.Start);
        builder.Font.Hidden = true;
        builder.Write("<empty paragraph>");
    }
    public void ImageFieldMerging(ImageFieldMergingArgs args)
    {
        // Do nothing.
    }
}
Later, you can remove hidden text if required.

Assuming you're actually executing a mailmerge (not just overwriting mergefields), you should be able to control most, if not all, of that via mailmerge field coding in the mailmerge main document.
On PCs, you can use the mergefield \b and/or \f switches to suppress a space before or after an empty mergefield. For example, suppose you have:
«Title» «FirstName» «SecondName» «LastName»
but «SecondName» is sometimes empty and you don’t want that to leave two spaces in the output. To deal with that:
select the «SecondName» field and press Shift-F9 so that you get
{MERGEFIELD SecondName};
edit the field code so that you end up with-
{MERGEFIELD SecondName \b " "} or
{MERGEFIELD SecondName \f " "}
depending on whether the space to be suppressed is before or following the mergefield;
delete, as appropriate, the corresponding space before or following the mergefield;
position the cursor anywhere in this field and press F9 to update it.
Note 1: the \b and \f switches don't work on Macs or in conjunction with other switches. In such cases you need to use an IF test instead, coded along the lines of-
{IF{MERGEFIELD SecondName}<> "" " {MERGEFIELD SecondName}"} or
{IF{MERGEFIELD SecondName}<> "" "{MERGEFIELD SecondName} "}
Even so, you can use the \b and \f switches to express other mergefields that do have switches of their own. For example, suppose you have four fields ‘Product’, ‘Supplier’, ‘Quantity’ and ‘UnitPrice’, and you don’t want to output the ‘Product’, ‘Quantity’ or ‘UnitPrice’ fields if the ‘Supplier’ field is empty. In that case, you might use a field coded along the lines of:
{MERGEFIELD "Supplier" \b "{MERGEFIELD Product}→" \f "→{MERGEFIELD Quantity \# 0}→{MERGEFIELD UnitPrice \# "$0.00"}¶
"}
Note 2: The field brace pairs (i.e. '{ }') for the above example are all created in the document itself, via Ctrl-F9 (Cmd-F9 on a Mac or, if you’re using a laptop, you might need to use Ctrl-Fn-F9); you can't simply type them or copy & paste them from this message. Nor is it practical to add them via any of the standard Word dialogues. Likewise, the chevrons (i.e. '« »') are part of the actual mergefields - which you can insert from the 'Insert Merge Field' dropdown (i.e. you can't type or copy & paste them from this message, either). The spaces represented in the field constructions are all required. Instead of the →, ↵ and ¶ symbols, you should use real tabs and line/paragraph breaks, respectively.
For more Mailmerge Tips & Tricks, see: https://www.msofficeforums.com/mail-merge/21803-mailmerge-tips-tricks.html
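As a rough illustration of what the \b and \f switches do, here is a small Python model, not Word itself. The function names are hypothetical, and it assumes "empty" means an empty string: the text before or after is only emitted when the field has a value, so no double space is left behind.

```python
def render_mergefield(value, text_before="", text_after=""):
    """Model of MERGEFIELD \\b / \\f: the extra text appears only if the field is non-empty."""
    if not value:  # assumption: "empty" means an empty string
        return ""
    return f"{text_before}{value}{text_after}"


def render_name(title, first, second, last):
    # Models: «Title» «FirstName»{MERGEFIELD SecondName \b " "} «LastName»
    # The literal space before SecondName has been deleted from the template.
    return f"{title} {first}{render_mergefield(second, text_before=' ')} {last}"


print(render_name("Dr", "Ann", "", "Lee"))
print(render_name("Dr", "Ann", "B.", "Lee"))
```

With an empty second name the output keeps single spaces throughout, which is the effect the field coding above is after.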

Related

Refactoring starting place for regex

I have a function that strips HTML markup so the text can be displayed inside a text element.
stripChar: function stripChar(string) {
    string = string.replace(/<\/?[^>]+(>|$)/g, "")
    string = string.trim()
    string = string.replace(/(\n{2,})/gm,"\n\n");
    string = string.replace(/…/g,"...")
    string = string.replace(/&nbsp;/g,"")
    let changeencode = entities.decode(string);
    return changeencode;
}
This has worked great for me, but I have a new requirement and I'm struggling to work out where I should start refactoring the code above. I still need to strip out the above, but I have 2 exceptions:
List items, <ul><li>, I need to handle these so that they still appear as a bullet point
Hyperlinks, I want to use react-native-hyperlink, so I need to leave the <a> intact for me to handle separately
Whilst the function is great for general tag replacement, it's less flexible for my needs above.
You may use
stripChar: function stripChar(string) {
    string = string.replace(/&nbsp;|<(?!\/?(?:li|ul|a)\b)\/?[^>]+(?:>|$)/g, "");
    string = string.trim();
    string = string.replace(/\n{2,}/g,"\n\n");
    string = string.replace(/…/g,"...")
    let changeencode = entities.decode(string);
    return changeencode;
}
The main changes:
.replace(/&nbsp;/g,"") is merged into the first replace as an alternative
The first replace is now used with a new regex pattern where the li, ul and a tags are excluded from the matches using a negative lookahead (?!\/?(?:li|ul|a)\b).
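To see the negative lookahead in isolation, here is the same pattern transcribed to Python's re module. This is only a sketch for experimenting with the regex: the function name and the sample markup are made up for illustration, and it assumes the entity being removed is &nbsp;.

```python
import re

# Remove &nbsp; and every tag except <li>, <ul> and <a> (opening or closing).
# The \b stops "a" from also protecting tags such as <abbr> or <article>.
TAGS_EXCEPT_LISTS_AND_LINKS = re.compile(r'&nbsp;|<(?!/?(?:li|ul|a)\b)/?[^>]+(?:>|$)')


def strip_markup(html):
    """Hypothetical helper: strip markup but keep list and anchor tags."""
    return TAGS_EXCEPT_LISTS_AND_LINKS.sub("", html).strip()


sample = '<p>Hello&nbsp;<b>world</b></p><ul><li>one</li></ul><a href="x">link</a>'
print(strip_markup(sample))
```

The kept <ul>, <li> and <a> tags still have to be turned into bullets and hyperlinks in a later step; the regex only leaves them in place.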

How to generate regex patterns in python using re.compile

I am trying to write Python code that can extract the information from strings such as the one below, using regular expressions.
date=2019-10-26 time=17:59:00 logid="0000000020" type="traffic" subtype="forward" level="notice" vd="root" eventtime=1572127141 srcip=192.168.6.15 srcname="TR" srcport=522 srcintf="port1" srcintfrole="lan" dstip=172.217.15.194 dstport=43 dstintf="wan2" dstintfrole="wan" poluuid="feb1fa32-d08b-51e7-071f-19e3b5d2213c" sessionid=195421734 proto=6 action="accept" policyid=4 policytype="policy" service="HTTPS" dstcountry="United States" srccountry="Reserved" trandisp="snat" transip=168.168.140.247 transport=294 appid=537 app="Google.Ads" appcat="General.Interest" apprisk="elevated" applist="Seniors" appact="detected" duration=719 sentbyte=2691 rcvdbyte=2856 sentpkt=19 rcvdpkt=25 shapingpolicyid=1 sentdelta=449 rcvddelta=460 devtype="Linux" devcategory="Linux" mastersrcmac="fa:cc:4e:a3:56:2d" srcmac="fa:cc:4e:a3:56:2d" srcserver=0
I found someone's code on github and he uses the lines below to extract the information, however, his code doesn't extract all of the fields I require, most notably srcip=192.168.1.105
I don't want to post the guy's entire code as it's not mine. However, if it is required I can.
I am hoping all the fields will be extracted from the jumble of information so I can save them as a .csv file.
The regex \w+=([^\s"]+|"[^"]*") matches
The field name (at least one word character), then
An = sign, then
Either:
An unquoted field value (at least one character, excluding whitespace and quotes), or
A quoted field value (", then any number of non-quotes, then ").
By adding parentheses around the parts of the regex which match field name, and the unquoted and quoted values, we can extract the relevant parts and put them into a dictionary using a comprehension, using the findall method:
import re
pattern = re.compile(r'(\w+)=(([^\s"]+)|"([^"]*)")')
def parse_fields(text):
    return {
        name: (value or quoted_value)
        for name,_,value,quoted_value in pattern.findall(text)
    }
Same as kaya3, but I don't keep the quotes
s = '''date=2019-10-26 time=17:59:00 logid="0000000020" type="traffic"
subtype="forward" level="notice" vd="root" eventtime=1572127141
srcip=192.168.6.15 srcname="TR" srcport=522 srcintf="port1" srcintfrole="lan"
dstip=172.217.15.194 dstport=43 dstintf="wan2" dstintfrole="wan"
poluuid="feb1fa32-d08b-51e7-071f-19e3b5d2213c" sessionid=195421734 proto=6
action="accept" policyid=4 policytype="policy" service="HTTPS"
dstcountry="United States" srccountry="Reserved" trandisp="snat"
transip=168.168.140.247 transport=294 appid=537 app="Google.Ads"
appcat="General.Interest" apprisk="elevated" applist="Seniors"
appact="detected" duration=719 sentbyte=2691 rcvdbyte=2856 sentpkt=19
rcvdpkt=25 shapingpolicyid=1 sentdelta=449 rcvddelta=460 devtype="Linux"
devcategory="Linux" mastersrcmac="fa:cc:4e:a3:56:2d" srcmac="fa:cc:4e:a3:56:2d"
srcserver=0'''
import re
matches = re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)=(?:"([^"]+)"|(\S+))', s)
d = {
    name: quoted or unquoted
    for name, quoted, unquoted in matches
}
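Since the stated goal is a .csv file, the parsed dictionaries can be handed to csv.DictWriter. The following is a minimal sketch under a few assumptions: one log entry per line, column order taken from first appearance, and hypothetical function names (parse_line, to_csv). It writes to a string so it can be tried without touching the file system.

```python
import csv
import io
import re

# Field name, then either a quoted value (quotes dropped) or an unquoted one.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_line(line):
    """Return {field name: value} for one log line."""
    return {name: quoted or unquoted for name, quoted, unquoted in FIELD.findall(line)}


def to_csv(lines):
    """Build CSV text with one row per log line; columns are the union of all field names."""
    rows = [parse_line(line) for line in lines]
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(['date=2019-10-26 srcip=192.168.6.15 srcname="TR"',
              'date=2019-10-27 dstcountry="United States"']))
```

To write a real file instead, open it with newline="" and pass the file object to csv.DictWriter in place of the StringIO buffer.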

Add empty option to List Validator

On a List Validator, I'm trying to let my users select either one of the listed options or a blank option. SpreadJS has the IgnoreBlanks setting, which I use, so when the user clears the cell with the Delete or Backspace key it validates correctly.
However, I would love to have the same functionality as in Excel, which allows a blank option as part of the validator's list.
I've tried to target the <select> element that holds the list and programmatically add the empty element, however, it crashes after the user selects the empty option.
I've also tried to add different escaped characters to the list. If I select a character that represents an empty string or a tab, it won't add a new option to the list. If I use any strange character, or even the null character \0 you get a new option to select, but the content is that typical rectangle you see when your font doesn't have the character you're trying to display.
I've also tested a regular ListValidator like the one in the example pages, rather than our custom functionality, and that doesn't work either.
https://www.grapecity.com/demos/spread/JS/TutorialSample/#/demos/basicDataValidator
I have also tried creating a FormulaListValidator, and if my range has empty cells I could then get an empty option on my list, however, because the range may have duplicates, I get duplicated options.
After researching a little bit I found a workaround in a different language, which I adapted to TypeScript (Angular 6):
export const getListValidatorFromArray = (spread: GC.Spread.Sheets.Workbook, data: any[]) => {
    // saving validation list values in a hidden sheet
    spread.addSheet(spread.getSheetCount());
    const sheet = spread.getSheet(spread.getSheetCount() - 1);
    sheet.visible(false);
    for (let i = 0; i < data.length; i++) {
        sheet.setValue(i, 0, data[i]);
    }
    // create validator based on the values
    const dv = GC.Spread.Sheets.DataValidation.createFormulaListValidator(
        '=' + sheet.name() + '!$A$1:' + sheet.name() + '!$A$' + data.length
    );
    return dv;
};
Note: This creates an extra sheet for each validator you create. Make sure you reuse validators as much as possible (i.e. assign one to a variable when it's created, and reuse that variable for other columns/rows that need the same list).

Search for an item in a text file using UIMA Ruta

I have been trying to search for an item which is there in a text file.
The text file is like
Eg: `
>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ`
So I did a dictionary search for XYZ initially and found the positions, but I want only the first XYZ and not the rest. XYZ has one property: it will always be between the 5-digit code and the text MethodName.
I am unable to do that.
WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document
Document{-> MARKFAST(Zip, ZipList)};
DECLARE Method;
"MethodName" -> Method;
WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};
Also how do we use REGEX in UIMA RUTA?
There are many ways to specify this. Here are some examples (not tested):
// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};
// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # @type{-> UNMARK(type)};
// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} @Method;
There are two options to use regex in UIMA Ruta:
(find) simple regex rules like "[A-Za-z]+" -> Type;
(matches) REGEXP conditions for validating the match of a rule element like
ANY{REGEXP("[A-Za-z]+")-> Type};
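The difference between the two options can be pictured with Python's re module. This is only an analogy, not Ruta code, and it assumes that a REGEXP condition has to match the whole covered text, as the "(matches)" label suggests; the function names are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+")


def find_spans(text):
    """'find' style: every occurrence anywhere in the text, like a simple regex rule."""
    return [m.group(0) for m in WORD.finditer(text)]


def token_matches(token):
    """'matches' style: the whole candidate must match, like a REGEXP condition."""
    return WORD.fullmatch(token) is not None


print(find_spans("00345 XYZ fdsafk"))
print(token_matches("XYZ"), token_matches("XYZ1"))
```

In the first style the regex itself locates the spans; in the second, something else proposes a candidate (here a token) and the regex only accepts or rejects it.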
Let me know if something is not clear. I will extend the description then.
DISCLAIMER: I am a developer of UIMA Ruta

How to remove text inside of parentheses with VB script RegExp

I am using a labeling software and I don't want any text inside of parentheses to display on the labels. Here is what I have so far
Function RemovePara(TextToBeEdited)
Set myRegEx = New RegExp
myRegEx.IgnoreCase = True
myRegEx.Global = True
myRegEx.Pattern = "\(([a-z]+?)\)(.+)"
Set RemovePara = myRegEx.Replace(txt, "")
End Function
Now I'm pretty new to this, and when I try to save this code in the labeling software it says "The script did not read the "Value" property, which means the current specified data source was ignored. This may not be what you intended". I put the name of the field I want edited where "TextToBeEdited" is. What am I missing here?
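You could use lookaround assertions in engines that support them, but VBScript's RegExp has no lookbehind. Match the parentheses together with their contents instead, and assign the result without Set, since Replace returns a string rather than an object:
myRegEx.Pattern = "\([^()]*\)"
RemovePara = myRegEx.Replace(TextToBeEdited, "")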
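For comparison, the goal in this thread (dropping any parenthesised text) can be sketched in Python. The function name is hypothetical, and the pattern also swallows the whitespace in front of each group so that no double spaces are left; it handles only non-nested parentheses.

```python
import re

# A parenthesised group without nested parentheses, plus any whitespace before it.
PARENTHESIZED = re.compile(r"\s*\([^()]*\)")


def remove_parenthesized(text):
    """Hypothetical helper: delete every "(...)" group from the label text."""
    return PARENTHESIZED.sub("", text)


print(remove_parenthesized("Widget (blue) 10mm (sale)"))
```

Nested groups such as "a (b (c) d)" would need the substitution applied repeatedly until nothing changes.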