I'm trying to compare two hex values, as in the code at the bottom. I expected the IF statement that compares field A with field WS-FLD-X to evaluate to true, but it doesn't.
In other words, when I move 12 to WS-FLD, the value seen through WS-FLD-X should be stored as X'0C', right? I expected that to be the same value as in field A, so comparing the two should be true, but this is not happening.
Why? What is the difference between the value held in field A and the value in WS-FLD-X?
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO-WORLD.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 FF.
05 A PIC XX.
05 B PIC XXXXXX.
01 F.
05 WS-FLD PIC S9(4) COMP.
05 WS-FLD-X REDEFINES WS-FLD PIC XX.
PROCEDURE DIVISION.
DISPLAY 'Hello, world' UPON CONSOLE.
MOVE X'0C' TO A.
MOVE "SOME TEXT" TO B.
DISPLAY FF UPON CONSOLE.
MOVE 12 TO WS-FLD
DISPLAY "HEX OF 12 IS:" WS-FLD-X UPON CONSOLE.
IF WS-FLD-X = A THEN DISPLAY "SAME" UPON CONSOLE END-IF.
You moved a single byte to a 2-byte field, so the MOVE padded the value on the right with a space (per Simon).
MOVE X'0C' TO A. *> A now contains X'0C20' (X'20' is the ASCII space), which is not 12
You need to move both bytes to keep the value of the number intact:
MOVE X'000C' TO A
With that change the program displays 'SAME'.
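The padding behaviour can be mimicked outside COBOL. The following is a minimal Python sketch, not COBOL itself; it assumes an ASCII space (X'20') as the pad character and a big-endian two-byte binary layout for PIC S9(4) COMP. The helper name move_to_alphanumeric is hypothetical.

```python
def move_to_alphanumeric(value: bytes, size: int) -> bytes:
    """Mimic an alphanumeric MOVE: left-justify, truncate or pad on the right with spaces."""
    return value[:size].ljust(size, b" ")

# MOVE X'0C' TO A    (A is PIC XX, so one pad byte is added)
a_short = move_to_alphanumeric(b"\x0c", 2)
# MOVE X'000C' TO A  (both bytes supplied, no padding needed)
a_full = move_to_alphanumeric(b"\x00\x0c", 2)
# MOVE 12 TO WS-FLD  (assumed big-endian 16-bit binary storage)
ws_fld = (12).to_bytes(2, "big", signed=True)

print(a_short.hex(), a_full.hex(), ws_fld.hex())
print(a_short == ws_fld, a_full == ws_fld)
```

Only the two-byte literal produces the same bytes as the binary field, which is why the comparison succeeds after the change.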
I have a table like this :
When I select a product with data validation in column B I get automatically the unit in column F
For example I type Product 1 in B3, and in F3 I get "Kilo" , if its product 2 I get "Piece"
What I'm trying to do is:
If F3 <> "Kilo" (meaning it is Piece, Box, etc.), allow only whole numbers in D3, but if F3 = "Kilo", allow decimal numbers as well.
I need this to apply to the entire column.
I'm also trying to format column D based on column F:
if F3 = "Kilo", use a format like "#.##",
but if F3 <> "Kilo", use the format "##" without decimals, so only 1, 3, 7, 15, 30, etc.
I have been looking for a solution but haven't found one.
Any help, please?
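Try this as a custom formula for data validation (here "pz" is the unit that must be a whole number; any other unit is accepted as typed):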
=((F2="pz")*(NOT(REGEXMATCH(""&D2, "\.|,"))))+(F2<>"pz")
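To make the logic of that formula easier to follow, here is a minimal Python sketch of the same rule. It is only an illustration, not something Sheets runs; the function name quantity_is_valid and the whole_only_unit parameter are hypothetical.

```python
import re

def quantity_is_valid(unit: str, quantity: str, whole_only_unit: str = "pz") -> bool:
    """Mirror of the formula: reject a decimal separator only for the whole-number unit."""
    # REGEXMATCH(""&D2, "\.|,") : does the typed text contain "." or ","?
    has_separator = re.search(r"[.,]", quantity) is not None
    # (F2="pz") * NOT(has separator) + (F2<>"pz")
    return unit != whole_only_unit or not has_separator

print(quantity_is_valid("pz", "3"))      # whole number for a piece unit
print(quantity_is_valid("pz", "3.5"))    # decimal for a piece unit
print(quantity_is_valid("Kilo", "3.5"))  # decimal for a weight unit
```

The multiplication and addition in the formula act as AND and OR on TRUE/FALSE values, which is what the single boolean expression reproduces.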
In Google Sheets, the MID formula seems to output a weird value type that doesn't work well with comparison functions, namely IF. This issue also applies to LEFT and RIGHT functions.
Below, row 1 shows the formula in each cell, row 2 shows the column names, and row 3 shows the values.
Each cell with a number is of type Custom Number Format: 123
SOURCE    =IF($A2>123,$A2-1,$A2)    =MID($A2,1,3)    =IF($C2>123,$C2-1,$C2)
Col A     Col B                     Col C            Col D
----------------------------------------------------------------------------
123       123                       123              122
The expected output from the IF check in Col D on the MID output is 123, yet it's outputting 122 (even though 123 is NOT greater than 123).
Even if I change the formats of each cell to Number 1,000.12, the IF check on MID's output is wrong.
Why is this?
EDIT: My hunch is that MID LEFT and RIGHT accept string inputs and passing in a number to substring somehow still works in output, but operating on the output gets wonky?
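It's because MID returns text, so C2 holds the text "123" rather than the number 123, and D2 ends up comparing text with a number instead of comparing two numbers. Sheets orders text after numbers, so $C2>123 is TRUE, and the subtraction then coerces the text back to a number, giving 122. MID therefore needs to be wrapped in VALUE: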
=VALUE(MID($A2,1,3))
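The same text-versus-number distinction shows up in most languages. Below is a minimal Python sketch of the idea; the mid helper is a hypothetical stand-in for MID, and int() plays the role of VALUE.

```python
def mid(value, start: int, length: int) -> str:
    """Stand-in for MID: convert the input to text and slice it (start is 1-based)."""
    text = str(value)
    return text[start - 1:start - 1 + length]

a2 = 123
c2 = mid(a2, 1, 3)
print(type(c2).__name__, repr(c2))  # the result is text, not a number

# Equivalent of wrapping MID in VALUE before comparing.
c2_number = int(c2)
d2 = c2_number - 1 if c2_number > 123 else c2_number
print(d2)
```

Once the substring is converted back to a number, the comparison against 123 behaves as expected.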
I am using the below date/time format in gSheets:
01 Apr at 11:00
I wonder whether it is possible to use Data Validation (or any other function) to report an error (add the small red triangle to the corner of the cell) when the format differs in any way.
Possible values in the given format:
01 -> any number between 01-31 (but not "1", there must be the leading zero)
space
Apr -> 3 letters for month (Jan, Feb, Mar... Dec)
space
at
space
11 -> hours in 24h format (00, 01...23)
:
00 -> minutes (00, 01,...59)
Is there any way to validate that the cell contains "text/data" exactly in the above mentioned format?
The right way to do this is with a regular expression and the REGEXMATCH() function in Google Sheets. For the given example, I made the regular expression below:
[0-3][0-9] (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) at [0-2][0-9]\:[0-5][0-9]
Process:
Select range of cells to be validated
Go to Data > Data Validation
Under Criteria select "Custom formula is" (my UI is not in English, so the exact label may differ)
Paste: =regexmatch(to_text(K4); "[0-3][0-9] (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) at [0-2][0-9]\:[0-5][0-9]")
Make sure that K4 in "to_text(K4)" is replaced with the upper-left cell of the selected range
Save
Hope it helps someone :)
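Note that the pattern above is not anchored and its character classes are fairly loose, so values such as "39 Apr at 29:00" or text with extra characters around the timestamp would still pass. A minimal Python sketch of a stricter variant follows; it is an illustration of the pattern only, and the names STAMP_PATTERN and is_valid_stamp are hypothetical.

```python
import re

# Day 01-31, three-letter English month, literal " at ", hour 00-23, minute 00-59.
STAMP_PATTERN = re.compile(
    r"(0[1-9]|[12][0-9]|3[01]) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) at "
    r"([01][0-9]|2[0-3]):[0-5][0-9]"
)

def is_valid_stamp(text: str) -> bool:
    """True only if the whole string matches the format (fullmatch anchors both ends)."""
    return STAMP_PATTERN.fullmatch(text) is not None

for sample in ["01 Apr at 11:00", "1 Apr at 11:00", "39 Apr at 11:00",
               "01 Apr at 29:00", "01 Apr at 11:00 pm"]:
    print(sample, is_valid_stamp(sample))
```

In the Sheets formula the equivalent anchoring would be a leading ^ and a trailing $ in the pattern. This check still does not know month lengths, so "31 Feb at 10:00" passes.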
You may try the formula for data validation:
=not(iserror(SUBSTITUTE(A1," at","")*1))*(len(A1)=15)*(right(A1,2)*1<60)
not(iserror(SUBSTITUTE(A1," at","")*1)) checks that the whole entry is a legal date
(len(A1)=15) checks that the values are entered with 2 digits
(right(A1,2)*1<60) checks for too many minutes; for some reason 01 Apr at 11:99 is otherwise accepted as a legal date
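The "parse it as a date and check the length" idea can be sketched in Python as well. This is a rough analogue under stated assumptions, not a translation of the formula: it relies on the default English month abbreviations for %b, and it appends an arbitrary leap year (2000) so that 29 Feb is accepted. The function name is hypothetical.

```python
from datetime import datetime

def is_valid_stamp_by_parsing(text: str) -> bool:
    """Length check plus date parsing, in the spirit of the formula above."""
    # Same role as len(A1)=15: forces two-digit day, hour and minute.
    if len(text) != 15:
        return False
    try:
        # A year is appended only so that the parser has a complete date.
        datetime.strptime(text + " 2000", "%d %b at %H:%M %Y")
    except ValueError:
        return False
    return True

for sample in ["01 Apr at 11:00", "1 Apr at 11:00", "01 Apr at 11:99", "31 Feb at 10:00"]:
    print(sample, is_valid_stamp_by_parsing(sample))
```

Unlike a plain pattern match, parsing also rejects impossible dates such as 31 Feb. It is looser in other respects, for example it does not enforce the letter case of the month.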
Select the range of fields, where you need the data validation to occur to.
Press on -> Data -> Data validation
For "Criteria" select "Custom formula is"
Enter the following in the textfield next to "Custom formula is":
=regexmatch(Tablename!B2; "^[a-z_]*$")
Here "Tablename" should be replaced by the name of the sheet and "B2" by the first cell of the range.
Inside the "" you then enter your regular expression. The one shown here allows only lowercase letters and underscores.
Adding the to_text() function didn't work for me, so you may want to leave it out to make sure the validation works.
Press save
I used qpdf to uncompress a PDF file, and the output is below. You can see that both Encoding and ToUnicode are present. If there were only a ToUnicode entry, I would know how to map individual characters with the CMap. But the content stream looks like this:
Tf
0.999402 0 0 1 71.9995 759.561 Tm
[()-2.11826()-1.14177()2.67786()-2.11826()8.55269()-5.44998()-4.70186()2.67786()-2.32338()2.67786()12.679( )-3.75591()9.73429()]TJ
Inside the brackets there is some garbage data that is not visible. So how do I link this data to the CMap?
My other question is: what do the values in the Differences array of /Encoding mean?
10 0 obj
<< /BaseEncoding /WinAnsiEncoding /Differences [ 1 /g100 /g28 /g94 /g3 /g87 /g24 /g38 /g47 /g62 ] /Type /Encoding >>
I also passed the values of the Differences array one by one to the FreeType function FT_Get_Name_Index, which returned values like [ 100 28 94 3 87 24 38 47 62 ].
What are those values? How do I map them?
here is pdf
run following cmd
qpdf --stream-data=uncompress input.pdf output.text
output.text
I get the same output if I pass the content stream data through zlib. Kindly check the output.txt file from the link.
Firstly the general question
how to extract the text in a PDF if Encoding and ToUnicode are both present? how to map it?
[...] if you see there are encoding and ToUnicode both are present in pdf. i know if only ToUnicode is there so how to map individual char with Cmap file.
In such a case, i.e. when you have both a sufficiently complete and correct ToUnicode map and an Encoding for a font, you can ignore the Encoding and only use the ToUnicode map.
This follows from the PDF specification, which in section 9.10.2 "Mapping Character Codes to Unicode Values" lists the following as the highest-priority method for mapping a character code to a Unicode value:
If the font dictionary contains a ToUnicode CMap (see 9.10.3, "ToUnicode CMaps"), use that CMap to convert the character code to Unicode.
Thus, if you (as you say) already know how to extract text if there only is a ToUnicode map, you can use the same algorithm unchanged. And as a corollary, if that doesn't work, the ToUnicode map in question is insufficiently complete or incorrect, or your knowledge itself on how to extract text using only a ToUnicode map actually is incomplete.
Secondly the sample document
You wrote
[()-2.11826()-1.14177()2.67786()-2.11826()8.55269()-5.44998()-4.70186()2.67786()-2.32338()2.67786()12.679( )-3.75591()9.73429()]TJ
in break-at there are some garbag data that is not visible. so how to link data to cmap file ?
Inside the parentheses are the byte values identifying your glyphs, so they definitely are not garbage. They only look empty because codes 01 to 09 are non-printable characters.
Thus, here are the byte values from within the brackets:
[(
01
)-2.11826(
02
)-1.14177(
03
)2.67786(
01
)-2.11826(
04
)8.55269(
05
)-5.44998(
06
)-4.70186(
07
)2.67786(
04
)-2.32338(
07
)2.67786(
08
)12.679(
09
)-3.75591(
02
)9.73429(
04
)]TJ
Using the ToUnicode map of the font in question
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CMapType 2 def
1 begincodespacerange
<00><ff>
endcodespacerange
9 beginbfrange
<01><01><0054>
<02><02><0045>
<03><03><0053>
<04><04><0020>
<05><05><0050>
<06><06><0044>
<07><07><0046>
<08><08><0049>
<09><09><004c>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end end
the byte values from within the brackets map to:
01 0054 "T"
02 0045 "E"
03 0053 "S"
01 0054 "T"
04 0020 " "
05 0050 "P"
06 0044 "D"
07 0046 "F"
04 0020 " "
07 0046 "F"
08 0049 "I"
09 004c "L"
02 0045 "E"
04 0020 " "
Thus,
"TEST PDF FILE "
which matches the rendered file just fine.
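The lookup above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a general CMap parser: it handles only one-byte codes and bfrange entries whose destination is a single four-digit hex value, which is all this sample needs. The names BFRANGE_LINES and parse_bfrange are hypothetical.

```python
import re

# The bfrange entries copied from the ToUnicode CMap above.
BFRANGE_LINES = """
<01><01><0054>
<02><02><0045>
<03><03><0053>
<04><04><0020>
<05><05><0050>
<06><06><0044>
<07><07><0046>
<08><08><0049>
<09><09><004c>
""".split()

def parse_bfrange(lines):
    """Build {character code: Unicode string} from '<lo><hi><dst>' entries."""
    mapping = {}
    for line in lines:
        lo, hi, dst = (int(h, 16) for h in re.findall(r"<([0-9A-Fa-f]+)>", line))
        # Each code in lo..hi maps to dst, dst+1, ... in order.
        for offset, code in enumerate(range(lo, hi + 1)):
            mapping[code] = chr(dst + offset)
    return mapping

to_unicode = parse_bfrange(BFRANGE_LINES)

# Byte values taken from the TJ array; the kerning numbers between them are ignored.
codes = bytes([1, 2, 3, 1, 4, 5, 6, 7, 4, 7, 8, 9, 2, 4])
text = "".join(to_unicode.get(code, "\ufffd") for code in codes)
print(repr(text))  # 'TEST PDF FILE '
```

Codes without an entry are replaced by U+FFFD here, which makes gaps in an incomplete ToUnicode map easy to spot.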
Thirdly the encoding
and one another question is that in /Encoding what are values contain in Difference ?
10 0 obj << /BaseEncoding /WinAnsiEncoding /Differences [ 1 /g100 /g28 /g94 /g3 /g87 /g24 /g38 /g47 /g62 ] /Type /Encoding >>
According to the PDF specification,
The value of the Differences entry shall be an array of character codes and character names organized as follows:
code_1 name_1,1 name_1,2 …
code_2 name_2,1 name_2,2 …
…
code_n name_n,1 name_n,2 …
Each code shall be the first index in a sequence of character codes to be changed. The first character name after the code becomes the name corresponding to that code. Subsequent names replace consecutive code indices until the next code appears in the array or the array ends. These sequences may be specified in any order but shall not overlap.
Thus, the encoding entry in your case says that the encoding basically is WinAnsiEncoding with the difference that the codes 1, ..., 9 instead represent the glyphs named /g100, /g28, /g94, /g3, /g87, /g24, /g38, /g47, and /g62 respectively.
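The rule quoted from the specification can be sketched as a short Python function. This is a minimal illustration, assuming the array has already been parsed into Python integers (codes) and strings (glyph names without the leading slash); the function name apply_differences is hypothetical.

```python
def apply_differences(differences):
    """Expand a Differences array into {character code: glyph name} overrides."""
    overrides = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            # A number starts a new run of consecutive codes.
            code = item
        else:
            if code is None:
                raise ValueError("a glyph name appeared before any character code")
            # A name is assigned to the current code, then the code advances.
            overrides[code] = item
            code += 1
    return overrides

# [ 1 /g100 /g28 /g94 /g3 /g87 /g24 /g38 /g47 /g62 ]
diffs = [1, "g100", "g28", "g94", "g3", "g87", "g24", "g38", "g47", "g62"]
overrides = apply_differences(diffs)
for code, name in overrides.items():
    print(code, name)
```

Any code not listed in the result keeps its meaning from the base encoding, WinAnsiEncoding in this document.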
As these glyph names are not standard glyph names, the PDF specification does not consider this encoding helpful for text extraction, because it only describes a method for a simple font
that has an encoding whose Differences array includes only character names taken from the Adobe standard Latin character set and the set of named characters in the Symbol font (see Annex D)
The "/gXX" names in your sample clearly are not among them.
It's worth observing that most of the time the /Encoding map is a map from character codes (meaning the encoded bytes of a string) to CIDs, where a CID (Character ID) in most font types corresponds to a glyph index/identifier. The exception appears to be CIDFontType2 fonts, which have separate CID and GID (Glyph ID) concepts and supply a /CIDToGIDMap to convert between them. In these cases the /Encoding map has nothing to do with decoding a Unicode representation of the string.
To decode the Unicode representation you definitely should use the /ToUnicode map when available, as pointed out by @mkl. If it is not available, you either have a predefined encoding (optionally with a /Differences array) or CMap, or you are in a case where the font program supplies an implicit encoding, as in Type1 fonts. This is all stated in @mkl's very good answer as well.
/Encoding could possibly correspond to the map between character codes and Unicode code points when it is a predefined encoding (like MacRomanEncoding, MacExpertEncoding, or WinAnsiEncoding; I have also seen the possibly non-compliant use of Identity-H, which is a predefined CMap name, not a predefined encoding) or in a supposedly malformed font. In this regard the PDF reference/standard is often confusing about what is legal and what is not, so a library decoding encoded strings in PDF should always be as lenient as possible. The PDF reference/standard itself is also not very clear in explaining the distinction between character codes, CIDs, GIDs and Unicode representations.
I found this code, which does exactly what I want: it takes a list of integers and generates every permutation.
perm([H|T], Perm) :-
perm(T, SP),
insert(H, SP, Perm).
perm([], []).
insert(X, T, [X|T]).
insert(X, [H|T], [H|NT]) :-
insert(X, T, NT).
Now what I want to do is, if one permutation does not meet some criteria, I want perm to return another result. So, and sorry for the lack of vocabulary, I want the same effect that would happen if I would execute that code, got a solution, and typed ; to get more results. I believe this is a very simple idea but I can't see it right now.
So, pseudocode would be:
enumerate(InputList, OutputNodes, OutputArcs) :-
    perm(InputList, OutputPermutation),
    % Build OutputArcs from the permutation, then check that every arc number is unique.
    % If one is not, generate another list with perm; if all are, return the list as accepted.
    getArcs(OutputPermutation, OutputArcs),
    % TODO: this is the call I do not know how to make: if it is valid, end; if it isn't, call perm again.
    areArcsNumberUniques(OutputArcs, OutputArcs).
So I need to understand how to go about this. Any other ideas about the problem are also welcome, since I'm brute-forcing my way through because I'm unable to find any algorithm or pattern that solves the actual problem (which I've asked about before; this is my attempted solution, just to be able to give an actual answer to the exercise).
edit: query:
enumerate([a-b,b-c], EnumNodos, EnumArcos).
expected output:
EnumNodos = [enum(3,a), enum(1,b), enum(2,c)],
EnumArcos = [enum(2,a-b), enum(1,b-c)]
This is the end goal: I get a list of arcs where each arc has a unique value equal to the difference between the values of its two nodes (every node also has a unique value).
So far, since I have not found any way to do this algorithmically, I thought about trying every possibility (basically I cannot find a single systematic way to do this; trees with different branches seem different to me, and the only restriction is that there are N nodes and N-1 arcs).
edit more examples:
6a
5 4
1b 2e
2 3
3c 5f
1
4d
EnumNodos = [enum(6,a), enum(1,b), enum(2,e), enum(3,c), enum(5,f), enum(4,d)],
EnumArcos = [enum(5,a-b), enum(4,a-e), enum(3,e-f), enum(2,b-c), enum(1,c-d)]
5a
4 3
1b 2e
1 2
3c 4f
EnumNodos = [enum(5,a), enum(1,b), enum(2,e), enum(3,c), enum(4,f)],
EnumArcos = [enum(4,a-b), enum(3,a-e), enum(1,b-c), enum(2,e-f)]
5a
4 3
1b 2e
2
3c
1
4d
9a
8 7
1b 2e
6 4
7c 6f
2 2
5d 4g
3 1
8h 3i