Interpreting / Reading text files written for Assembly application

I am just starting out in C++.
I am writing a console application to read in an .evt file (a custom format, not to be confused with Windows Event Viewer files) and its contents, but now I need to write a method to:
a) Store each block of 'EVENT X' including but also ending at 'END'.
b) Make the contents of each block searchable/selectable.
If the content wasn't so 'wildly' varied, I would be happy to put this into some SQL table or experiment with an array but I don't know a starting point to do this as the number of 'fields' or parameters varies. The maximum number of lines I have seen in a block is around 20, the maximum number of parameters per line I have seen is around 13.
I'm not asking for an explicit answer or the whole code to do it although it is welcome, just a generic sample of code to get started that might be appropriate.
This is my function to just load the data as it is.
void event_data_loader()
{
    string evt_data;
    string response2;
    cout << "You have chosen to Create/Load Soma events\n\n";
    ifstream named_EVT("C:/evts/1.evt");
    while (getline(named_EVT, evt_data))
    {
        // Output the text from the file
        cout << evt_data << "\n"; // Iterate out each line of the EVT file including spaces
        //name_EVT.close();*/
    }
    cout << "Does the output look ok?(Y/N)";
    cin >> response2;
    if (response2 == "Y")
    {
        // Vectors? Dynamic array? to re-arrange the data?
    }
}
The files themselves have content like this. I know what most of the functions do, less so all of the parameters. For some reason putting this on the page it puts them into a single line.
EVENT 01
A CHECK_HUMAN
A CHECK_POSITION 1 250 90 350 90
E BBS_OPEN 1 0
END
EVENT 02
E SELECT_MSG 336 363 314 337 03 338 12 -1 -1
END
EVENT 03
E RUN_EVENT 761
E RUN_EVENT 04
E RUN_EVENT 05
END
EVENT 761
A EXIST_ITEM 373 1
E SELECT_MSG 857 315 762 316 763 -1 -1 -1 -1
E RETURN
END
EVENT 762
A EXIST_ITEM 373 1
E ROB_ITEM 373 1
E SHOW_MAGIC 6
E CHANGE_HP 1 10000
E CHANGE_MP 1 10000
E MESSAGE_NONE 858
E RETURN
END
EVENT 1862
A ABSENT_EVENT 1582
A EXIST_ITEM 1800 1
A EXIST_ITEM 1801 1
A EXIST_ITEM 1802 1
A EXIST_ITEM 1803 1
A EXIST_ITEM 1804 1
A EXIST_ITEM 1805 1
A EXIST_ITEM 1806 1
A EXIST_ITEM 1807 1
A WEIGHT 365 1854 1 1832 1 -1 1 -1 -1 -1 -1
A CHECK_ITEMSLOT 393 1854 1 1832 1 -1 1 -1 -1 -1 -1
A GENDER 1
E ADD_EVENT 1582
E MESSAGE_NONE 3237
E ROB_ITEM 1800 1
E ROB_ITEM 1801 1
E ROB_ITEM 1802 1
E ROB_ITEM 1803 1
E ROB_ITEM 1804 1
E ROB_ITEM 1805 1
E ROB_ITEM 1806 1
E ROB_ITEM 1807 1
E GIVE_ITEM 1854 1
E GIVE_ITEM 1832 1
E RETURN
END

I would do something like this:
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct Subevent {
    std::string selector;
    std::string name;
    std::vector<int> params;
};
struct Event {
    int id;
    std::vector<Subevent> subevents;
};
std::vector<Event> load_events(std::istream& input_stream) {
    std::vector<Event> out;
    Event current_event {}; // current event being built
    std::string line;
    bool inside_event = false; // are we inside the scope of an event?
    while (std::getline(input_stream, line)) {
        // strip trailing whitespace
        // (check for empty first: calling back() on an empty string is undefined)
        while (!line.empty() && std::isspace(static_cast<unsigned char>(line.back()))) {
            line.pop_back();
        }
        // skip empty lines
        if (line.empty()) {
            continue;
        }
        // read first token (until first space)
        std::stringstream ss(line);
        std::string first_token;
        ss >> first_token;
        bool is_new_event_line = first_token == "EVENT";
        bool is_end_line = first_token == "END";
        if (is_new_event_line) {
            // line: EVENT <id>
            if (inside_event) {
                // error: "not expecting new event"
                // choose your own error messaging method
            }
            int id;
            ss >> id; // read <id>
            // setup new event
            current_event.id = id;
            inside_event = true;
        }
        else if (is_end_line) {
            // line: END
            if (!inside_event) {
                // error: "unexpected END"
            }
            // record and clear current event
            out.push_back(current_event);
            inside_event = false;
            current_event = Event();
        }
        else {
            // line: <selector> <name> <params...>
            // e.g.: A GENDER 1
            if (!inside_event) {
                // error: "unexpected property entry"
            }
            // read subevent
            Subevent subevent {};
            subevent.selector = first_token;
            ss >> subevent.name;
            // copy over the int params from the line
            std::copy(
                std::istream_iterator<int>(ss),
                std::istream_iterator<int>(),
                std::back_inserter(subevent.params)
            );
            // push back subevent onto the event being built
            current_event.subevents.push_back(subevent);
        }
    }
    return out;
}

Related

How do I send text from TextView to 'Export to PDF'?

I'm new to GTK4, an open-source toolkit, which I use through its C++ binding (gtkmm4). I want to send the text from the current TextView buffer (loaded from a saved file) to a PDF file using the GTK libraries, but nothing gets printed.
This is the code I have started from reading the documentation:
void MainWindow::export_note() {
auto op = Gtk::PrintOperation::create();
// setup op
cout << save_file_path << endl;
string content = editor.get_buffer()->get_text();
ofstream out(work_dir + save_file_path);
out << content;
out.close();
curr_state = edit_file;
op->set_export_filename("test.pdf");
auto res = op->run(Gtk::PrintOperation::Action::EXPORT);
return;
}
This only exports to a blank PDF, but I'm expecting text to show up on PDF.

It looks like you are not using anything that actually converts the text to PDF (MIME type application/pdf, which is largely a binary format). You cannot simply stuff text data into a container called Test.pdf.
There are simple means to convert text to PDF: traditionally via PostScript printer files, but more commonly nowadays via a PDF printer driver directly.
To start at the most basic level, a Hello World needs a file something like this, where the body is built up from stacked vectors or font-labelled strings at X and Y coordinates.
Test.pdf
%PDF-1.1
%âãÏÓ
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
2 0 obj<</Type/Pages/Kids [3 0 R]/Count 1/MediaBox [0 0 594 792]>>endobj
3 0 obj<</Type/Page/Parent 2 0 R/Resources<</Font<</F1<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>>>>>/Contents 4 0 R>>endobj
4 0 obj<</Length 81
>>
stream
BT /F1 18 Tf 036 740 Td (Hello) Tj ET
BT /F1 18 Tf 036 720 Td (World!) Tj ET
endstream
endobj xref
0 5
0000000000 65535 f
0000000019 00000 n
0000000063 00000 n
0000000137 00000 n
0000000267 00000 n
trailer<</Root 1 0 R /Size 5>>startxref
399 %%EOF

How to create in C or C++ the contents value of Sig type object for digital signature in PDF?

We are programmatically creating PDF using our in house lib (C++) by adding all the required objects so that PDF readers can render them properly.
Currently we are enhancing the lib to support digital signatures in PDF. Our users will use USB token or Windows certificates to sign the PDF.
On studying raw PDF file with digital signature, we were able to make sense of all the objects except for the contents of Sig type object.
18 0 obj
<<
/Type /Sig
/Filter /Adobe.PPKLite --> signature handler for authenticating the fields contents
/SubFilter /adbe.pkcs7.sha1 --> submethod of the handler
/Contents <....> --> signature token
/ByteRange [ 0 101241 105931 7981
] --> byte range for digest calc
/M (D:20210505094125+05'30') --> timestamp
/Reason ()
/Location ()
/ContactInfo ()
>>
endobj
We have referred
https://www.adobe.com/devnet-docs/acrobatetk/tools/DigSigDC/Acrobat_DigitalSignatures_in_PDF.pdf
to understand what all constitutes the signature token.
We need direction on how to programmatically create the signature token for PDF using windows APIs. Currently we are not looking at 3rd party lib solutions.
Thanks in advance.
Update
We tried the following:
Updated our in-house PDF lib to support incremental updates so that digital signing related objects can be added. We added something like this apart from the obj# 18 mentioned above:
16 0 obj --> new Acroform obj
<<
/Fields [ 17 0 R ]
/SigFlags 3
>>
endobj
2 0 obj --> Updating root to add AcroForm
<<
/Type /Catalog
/Pages 3 0 R
/AcroForm 16 0 R
>>
endobj
17 0 obj --> new obj for signature field
<<
/T (SignatureField1)
/Type /Annot
/Subtype /Widget
/FT /Sig
/F 4
/Rect [ 270 159 503 201 ] --> field position. this will have image of sign
/P 5 0 R
/V 18 0 R
/AP <<
/N 19 0 R
>>
>>
endobj
5 0 obj --> updating existing page obj with Annots
<<
/Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 595 841 ]
/Resources 4 0 R
/Contents 6 0 R
/Annots [ 17 0 R ]
>>
endobj
18 0 obj
<<
/Type /Sig
/Filter /Adobe.PPKLite
/SubFilter /adbe.pkcs7.sha1 --> we tried with adbe.pkcs7.detached as well
/Contents <> --> updated contents using windows APIs
/ByteRange [ 0 100381 102645 7322
] --> updated ByteRange with right offsets and lengths
/M (D:20210610125837+05'30') --> sign verified time
/Reason ()
/Location ()
/ContactInfo ()
>>
endobj
19 0 obj --> new obj
<<
/Length 7
/BBox [ 0 0 233 42 ]
/Type /XObject
/Subtype /Form
/Resources <<
/XObject <<
/FRM 20 0 R
>>
>>
>>
stream
/FRM Do
endstream
endobj
20 0 obj --> new obj for image manipulation
<<
/Length 29
/Type /XObject
/Subtype /Form
/Resources <<
/XObject <<
/Im1 21 0 R
>>
>>
/BBox [ 0 0 233 42 ]
>>
stream
q 233 0 0 42 0 0 cm /Im1 Do Q
endstream
endobj
21 0 obj --> image obj which contains sign info. Generated by us
<<
/Length 6166
/Type /XObject
/Subtype /Image
/Width 372
/Height 82
/ColorSpace /DeviceRGB
/BitsPerComponent 8
/Filter /DCTDecode
>>
stream
---------------------------------> image stream
endstream
endobj
xref --> updated xref
0 1
0000000000 65535 f
2 1
0000099954 00000 n
5 1
0000100172 00000 n
16 6
0000099901 00000 n
0000100020 00000 n
0000100297 00000 n
0000102944 00000 n
0000103096 00000 n
0000103271 00000 n
trailer --> updated trailer
<<
/Root 2 0 R
/Info 1 0 R
/Size 22
/ID [ <982AAACB948CE1AD9FDD976D177BF316> <982AAACB948CE1AD9FDD976D177BF316> ]
--> ID generated via windows API
/Prev 99491
>>
startxref
109605
%%EOF
For contents data, we used the below API:
bool SignMessageBySubjectName (BytePtr pMessage, ULong pMessageSize, StrPtr pSubjectName, CRYPT_DATA_BLOB * pSignBlob)
{
HCERTSTORE store_handle = NULL;
PCCERT_CONTEXT cert_context = NULL;
BYTE * signed_blob = NULL;
ULong signed_blob_size;
ULong message_size;
CRYPT_SIGN_MESSAGE_PARA signature_params;
BYTE * message;
pSignBlob->cbData = 0;
pSignBlob->pbData = NULL;
message = (BYTE *) pMessage;
message_size = (pMessageSize + 1) * sizeof(Char); //Size in bytes
const BYTE * message_array[] = {message};
DWORD message_array_size[1];
message_array_size[0] = message_size;
store_handle = CertOpenStore(CERT_STORE_PROV_SYSTEM, 0, NULL,
CERT_SYSTEM_STORE_CURRENT_USER, L"MY");
cert_context = CertFindCertificateInStore( store_handle, PKCS_7_ASN_ENCODING | X509_ASN_ENCODING, 0,
CERT_FIND_SUBJECT_STR, pSubjectName, NULL);
signature_params.cbSize = sizeof(CRYPT_SIGN_MESSAGE_PARA);
signature_params.dwMsgEncodingType = PKCS_7_ASN_ENCODING | X509_ASN_ENCODING;
signature_params.pSigningCert = cert_context;
signature_params.HashAlgorithm.pszObjId = szOID_RSA_SHA1RSA;
signature_params.HashAlgorithm.Parameters.cbData = NULL;
signature_params.cMsgCert = 1;
signature_params.rgpMsgCert = &cert_context;
signature_params.cAuthAttr = 0;
signature_params.dwInnerContentType = 0;
signature_params.cMsgCrl = 0;
signature_params.cUnauthAttr = 0;
signature_params.dwFlags = 0;
signature_params.pvHashAuxInfo = NULL;
signature_params.rgAuthAttr = NULL;
//Get size of signed message
CryptSignMessage(&signature_params, TRUE, 1, message_array, message_array_size,NULL, &signed_blob_size);
signed_blob = (BYTE *) Malloc(signed_blob_size);
CryptSignMessage(&signature_params, TRUE, 1, message_array, message_array_size, signed_blob, &signed_blob_size);
pSignBlob->cbData = signed_blob_size;
pSignBlob->pbData = signed_blob;
CertFreeCertificateContext(cert_context);
CertCloseStore(store_handle, CERT_CLOSE_STORE_FORCE_FLAG);
return true;
}
When calling CryptSignMessage() with the detached parameter set to TRUE, we get a signature token of around 850 bytes, which we convert to hex and add to the /Contents part. That comes to approximately 1700 characters.
In case of the image used in the Field newly added, we generated our own image and added it as a PDF obj.
For the ID in trailer part, we generated the same using API from Bcrypt.lib (BCryptGenRandom()), converted its output to hex and updated the ID part.
Listing out the steps we did:
We generated 2 buffers. Both buffers are identical with respect to all the PDF objects required, the ID generated from BCryptGenRandom() and ByteRange array updated with actual values. buffer1 has contents data as 0s for a definite length acting as a placeholder. buffer2 has empty contents data (/Contents <>)
buffer2 will be passed onto CryptSignMessage() to generate the sign token. This will be converted to hex.
The hex sign token will be added to contents part of buffer1 replacing the 0s based on its length.
buffer1 will be written to a PDF file.
When we did all these, and opened the PDF in readers, we got errors like
Signature is invalid
Document has been corrupted or altered since the signature was applied.
But with these errors too, the reader was able to identify the user, certificate, hash algorithm and signature algorithm used.
We think we need to somehow add timestamp data as part of the sign token so as to avoid this error. Or something else we would have missed.
PFA sample PDF here:https://drive.google.com/file/d/1Udog4AmGoq2ls3Tu3Wq5s2xU9LxaI3fH/view?usp=sharing
Kindly help us solve this issue. Thanks in advance.

We used a different set of APIs to make this work.
Pasting the code here:
bool SignatureHandler::SignMessageTest (BytePtr pMessage, ULong pMessageSize, StrPtr pSubjectName, CRYPT_DATA_BLOB * pSignBlob, LPSTR pOid, DWORD pFlag, DWORD pType)
{
HCERTSTORE store_handle = NULL;
PCCERT_CONTEXT cert_context = NULL;
BYTE * signed_blob = NULL;
ULong signed_blob_size = 0;
CRYPT_SIGN_MESSAGE_PARA signature_params;
BYTE * message;
BOOL rc;
pSignBlob->cbData = 0;
pSignBlob->pbData = NULL;
store_handle = CertOpenStore (CERT_STORE_PROV_SYSTEM, 0, NULL, CERT_SYSTEM_STORE_CURRENT_USER, L"MY");
cert_context = CertFindCertificateInStore (store_handle, (PKCS_7_ASN_ENCODING | X509_ASN_ENCODING), 0, CERT_FIND_SUBJECT_STR, pSubjectName, NULL);
HCRYPTPROV_OR_NCRYPT_KEY_HANDLE a = 0;
DWORD ks = 0;
BOOL bfr = false;
HCRYPTPROV_OR_NCRYPT_KEY_HANDLE PrivateKeys;
CERT_BLOB CertsIncluded;
CMSG_SIGNER_ENCODE_INFO Signers;
HCRYPTMSG hMsg;
rc = CryptAcquireCertificatePrivateKey (cert_context, 0, 0, &a, &ks, &bfr);
CMSG_SIGNER_ENCODE_INFO SignerEncodeInfo = {0};
SignerEncodeInfo.cbSize = sizeof (CMSG_SIGNER_ENCODE_INFO);
if (a)
SignerEncodeInfo.hCryptProv = a;
if (bfr)
PrivateKeys = a;
CERT_BLOB SignerCertBlob;
SignerCertBlob.cbData = cert_context->cbCertEncoded;
SignerCertBlob.pbData = cert_context->pbCertEncoded;
CertsIncluded = SignerCertBlob;
SignerEncodeInfo.cbSize = sizeof (CMSG_SIGNER_ENCODE_INFO);
SignerEncodeInfo.pCertInfo = cert_context->pCertInfo;
SignerEncodeInfo.dwKeySpec = ks;
SignerEncodeInfo.HashAlgorithm.pszObjId = pOid;
SignerEncodeInfo.HashAlgorithm.Parameters.cbData = NULL;
SignerEncodeInfo.pvHashAuxInfo = NULL;
Signers = SignerEncodeInfo;
CMSG_SIGNED_ENCODE_INFO SignedMsgEncodeInfo = {0};
SignedMsgEncodeInfo.cbSize = sizeof (CMSG_SIGNED_ENCODE_INFO);
SignedMsgEncodeInfo.cSigners = 1;
SignedMsgEncodeInfo.rgSigners = &Signers;
SignedMsgEncodeInfo.cCertEncoded = 1;
SignedMsgEncodeInfo.rgCertEncoded = &CertsIncluded;
SignedMsgEncodeInfo.rgCrlEncoded = NULL;
signed_blob_size = 0;
signed_blob_size = CryptMsgCalculateEncodedLength ((PKCS_7_ASN_ENCODING | X509_ASN_ENCODING), pFlag, pType, &SignedMsgEncodeInfo, 0, pMessageSize);
if (signed_blob_size) {
signed_blob_size *= 2;
hMsg = CryptMsgOpenToEncode (CERTIFICATE_ENCODING_TYPE,
pFlag,
pType,
&SignedMsgEncodeInfo,
0,
NULL);
if (hMsg) {
signed_blob = (BYTE *)malloc (signed_blob_size);
BOOL CU = CryptMsgUpdate (hMsg, (BYTE *)pMessage, (DWORD)pMessageSize, true);
if (CU) {
if (CryptMsgGetParam (
hMsg, // Handle to the message
CMSG_CONTENT_PARAM, // Parameter type
0, // Index
signed_blob, // Pointer to the BLOB
&signed_blob_size)) // Size of the BLOB
{
signed_blob = (BYTE *)realloc (signed_blob, signed_blob_size);
if (hMsg) {
CryptMsgClose (hMsg);
hMsg = 0;
}
}
}
if (hMsg)
CryptMsgClose (hMsg);
hMsg = 0;
}
}
CryptReleaseContext (a, 0);
pSignBlob->cbData = signed_blob_size;
pSignBlob->pbData = signed_blob;
CertFreeCertificateContext (cert_context);
CertCloseStore (store_handle, CERT_CLOSE_STORE_FORCE_FLAG);
return true;
}
The oid, flag and type we used are szOID_RSA_SHA1RSA, CMSG_DETACHED_FLAG and CMSG_SIGNED respectively.
On converting pSignBlob->pbData to hex and adding it to /Contents, the PDF and signature became valid when opened in PDF readers.

Ok, the signature container is embedded correctly.
But there are issues with the signature container itself:
Both in the SignedData.digestAlgorithms collection and in the SignerInfo.digestAlgorithm value you have used the OID of SHA1withRSA, but that is a full signature algorithm, not the mere digest algorithm SHA1 expected there.
Then the SHA1 hash of the signed bytes is BB78A402F7A537A34D6892B83881266501A691A8 but the hash you signed is 90E28B8A0D8E48691DAFE2BA10A4761FFFDCCD3D. This might be because you hash buffer2 and, as you wrote, "buffer2 has empty contents data (/Contents <>)". The hex string delimiters '<' and '>' also belong to the contents value and, therefore, must also be removed in buffer2.
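To make the point about the delimiters concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself uses C++ and the Windows APIs). It assumes a toy byte string standing in for the PDF, and the helper names byte_range_for_contents and signed_bytes are hypothetical. It shows that the bytes to hash are everything before the '<' and everything after the '>' of the /Contents value, with nothing of the hex string left in between.

```python
import hashlib


def byte_range_for_contents(pdf: bytes, marker: bytes = b"/Contents "):
    """Return [start1, len1, start2, len2] so that the gap between the two
    ranges is the whole hex string, including its '<' and '>' delimiters."""
    value_start = pdf.index(marker) + len(marker)
    lt = pdf.index(b"<", value_start)  # opening delimiter of the hex string
    gt = pdf.index(b">", lt)           # closing delimiter of the hex string
    return [0, lt, gt + 1, len(pdf) - (gt + 1)]


def signed_bytes(pdf: bytes, byte_range):
    """Concatenate the two ranges: these are the bytes that get hashed."""
    s1, l1, s2, l2 = byte_range
    return pdf[s1:s1 + l1] + pdf[s2:s2 + l2]


# Toy stand-in for a PDF with a zero-filled /Contents placeholder (assumption).
placeholder = b"<" + b"0" * 16 + b">"
sample = (b"%PDF-1.7\n1 0 obj\n<< /Type /Sig /Contents "
          + placeholder + b" >>\nendobj\n%%EOF\n")

byte_range = byte_range_for_contents(sample)
to_hash = signed_bytes(sample, byte_range)
digest = hashlib.sha1(to_hash).hexdigest()  # SHA1 only to mirror the thread
print(byte_range, digest)
```

In a real file the /ByteRange array must already contain its final values before hashing, since it lies inside the signed bytes; the toy sample leaves it out for brevity.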
Furthermore, your signature is very weak:
It uses SHA1 as hash algorithm. SHA1 meanwhile has been recognized as too weak a hash algorithm for document signatures.
It doesn't use signed attributes, neither the ESS signing certificate nor the algorithm identifier protection attribute. Many validation policies require such special attributes.

pandas - group by: create aggregation function using multiple columns

I have the following data frame:
id my_year my_month waiting_time target
001 2018 1 95 1
002 2018 1 3 3
003 2018 1 4 0
004 2018 1 40 1
005 2018 2 97 1
006 2018 2 3 3
007 2018 3 4 0
008 2018 3 40 1
I want to groupby my_year and my_month, then in each group I want to compute the my_rate based on
(# of records with waiting_time <= 90 and target = 1)/ total_records in the group
i.e. I am expecting output like:
my_year my_month my_rate
2018 1 0.25
2018 2 0.0
2018 3 0.5
I wrote the following code to compute the desired value my_rate:
def my_rate(data):
    waiting_time_list = data['waiting_time']
    target_list = data['target']
    total = len(data)
    my_count = 0
    for i in range(len(data)):
        if waiting_time_list[i] <= 90 and target_list[i] == 1:
            my_count += 1
    rate = float(my_count)/float(total)
    return rate
df.groupby(['my_year','my_month']).apply(my_rate)
However, I got the following error:
KeyError 0
KeyErrorTraceback (most recent call last)
<ipython-input-29-5c4399cefd05> in <module>()
17
---> 18 df.groupby(['my_year','my_month']).apply(my_rate)
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, func, *args, **kwargs)
714 # ignore SettingWithCopy here in case the user mutates
715 with option_context('mode.chained_assignment', None):
--> 716 return self._python_apply_general(f)
717
718 def _python_apply_general(self, f):
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/groupby.pyc in _python_apply_general(self, f)
718 def _python_apply_general(self, f):
719 keys, values, mutated = self.grouper.apply(f, self._selected_obj,
--> 720 self.axis)
721
722 return self._wrap_applied_output(
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/groupby.pyc in apply(self, f, data, axis)
1727 # group might be modified
1728 group_axes = _get_axes(group)
-> 1729 res = f(group)
1730 if not _is_indexed_like(res, group_axes):
1731 mutated = True
<ipython-input-29-5c4399cefd05> in conversion_rate(data)
8 #print total_waiting_time_list[i], target_list[i]
9 #print i, total_waiting_time_list[i], target_list[i]
---> 10 if total_waiting_time_list[i] <= 90:# and target_list[i] == 1:
11 convert_90_count += 1
12 #print 'convert ', convert_90_count
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/indexes/base.pyc in get_value(self, series, key)
2426 try:
2427 return self._engine.get_value(s, k,
-> 2428 tz=getattr(series.dtype, 'tz', None))
2429 except KeyError as e1:
2430 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4363)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value (pandas/_libs/index.c:4046)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13913)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13857)()
KeyError: 0
Any idea what I did wrong here? And how do I fix it? Thanks!

I believe it is better to use the mean of a boolean mask per group:
def my_rate(x):
    return ((x['waiting_time'] <= 90) & (x['target'] == 1)).mean()
df = df.groupby(['my_year','my_month']).apply(my_rate).reset_index(name='my_rate')
print (df)
my_year my_month my_rate
0 2018 1 0.25
1 2018 2 0.00
2 2018 3 0.50
Any idea what I did wrong here?
The problem is that waiting_time_list and target_list are not lists, but Series:
waiting_time_list = data['waiting_time']
target_list = data['target']
print (type(waiting_time_list))
<class 'pandas.core.series.Series'>
print (type(target_list))
<class 'pandas.core.series.Series'>
So indexing with i fails, because a Series is looked up by index label and the second group has the labels 4, 5, not 0, 1:
if waiting_time_list[i] <= 90 and target_list[i] == 1:
To avoid this, it is possible to convert the Series to lists:
waiting_time_list = data['waiting_time'].tolist()
target_list = data['target'].tolist()
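As a cross-check on the expected numbers, the same rate can be computed without pandas. This is only a sketch: the rows are copied from the question's table, and the names rows, flags and rates are made up for illustration. It uses the same idea as the boolean-mask answer, namely that the rate is the mean of a True/False condition within each group.

```python
from collections import defaultdict

# (my_year, my_month, waiting_time, target), copied from the question's table
rows = [
    (2018, 1, 95, 1), (2018, 1, 3, 3), (2018, 1, 4, 0), (2018, 1, 40, 1),
    (2018, 2, 97, 1), (2018, 2, 3, 3),
    (2018, 3, 4, 0), (2018, 3, 40, 1),
]

# Collect one True/False flag per record, grouped by (year, month).
flags = defaultdict(list)
for year, month, waiting_time, target in rows:
    flags[(year, month)].append(waiting_time <= 90 and target == 1)

# The rate is the mean of the flags: True counts as 1, False as 0.
rates = {key: sum(hits) / len(hits) for key, hits in flags.items()}

for (year, month), rate in sorted(rates.items()):
    print(year, month, rate)
```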

How to write binary operator with two post operands syntax with Boost Spirit x3?

I am following this example: https://github.com/boostorg/spirit/blob/develop/example/x3/calc/calc9/expression_def.hpp
What I am trying to accomplish is to write a rule that parses and generates like min{x}{y}. Mostly the code is using expression grammar like x + y, but now I want to place and parse both operands to the rhs of the operator.
I added the following code in expression_def.hpp file:
...
x3::symbols<ast::optoken> additive_op;
x3::symbols<ast::optoken> multiplicative_op;
x3::symbols<ast::optoken> binarypost_op;
x3::symbols<ast::optoken> unary_op;
x3::symbols<> keywords;
...
binarypost_op.add
("min", ast::op_divide) // Dummy operation usage for now
;
...
struct binarypost_expr_class;
struct unary_expr_class;
...
typedef x3::rule<binarypost_expr_class, ast::expression>
binarypost_expr_type;
...
binarypost_expr_type const binarypost_expr = "binarypost_expr";
...
auto const multiplicative_expr_def =
binarypost_expr
>> *(multiplicative_op > binarypost_expr)
;
auto const binarypost_expr_def = // See the chaining operation
('{' > unary_expr > '}')
>> *(binarypost_op > ('{' > unary_expr > '}'))
;
auto const unary_expr_def =
primary_expr
| (unary_op > primary_expr)
;
This works fine, but it can only parse something like {x} min {y}. I want to be able to parse min {x} {y}. I tried many combinations, such as:
binarypost_op >> ('{' > unary_expr > '}') > ('{' > unary_expr > '}') etc. But I can't seem to figure out the right way to write this. Any suggestions or comments?

Ok, here's the changes. The hard part is actually code-generating the builtin function.
Parsing
Step 1: extend AST
Always start with the AST. We want operands that can be function calls:
In ast.hpp:
struct function_call; // ADDED LINE
// ...
struct operand :
x3::variant<
nil
, unsigned int
, variable
, x3::forward_ast<unary>
, x3::forward_ast<expression>
, x3::forward_ast<function_call> // ADDED LINE
>
{
using base_type::base_type;
using base_type::operator=;
};
// ...
enum funtoken
{
fun_min,
fun_max,
};
// ...
struct function_call : x3::position_tagged
{
funtoken fun;
std::list<operand> args;
};
In ast_adapted.hpp:
BOOST_FUSION_ADAPT_STRUCT(client::ast::function_call,
fun, args
)
Step 2: extend grammar
(This is all in expression_def.hpp)
Let's be generic, so parse function name tokens using a symbol table:
x3::symbols<ast::funtoken> functions;
Which we have to initialize in add_keywords:
functions.add
("min", ast::fun_min)
("max", ast::fun_max)
;
Now declare a rule for function calls:
struct function_call_class;
typedef x3::rule<function_call_class, ast::function_call> function_call_type;
function_call_type const function_call = "function_call";
That's all red-tape. The "interesting thing" is the rule definition:
auto const function_call_def =
functions
>> '(' >> expression % ',' >> ')'
;
Well. That's underwhelming. Let's integrate into our primary expression rule:
auto const primary_expr_def =
uint_
| bool_
| function_call
| (!keywords >> identifier)
| ('(' > expression > ')')
;
Note the ordering. If you want to be able to add function names that collide with a keyword, you'll need to add precautions.
Also, let's make AST annotation work for our node:
struct function_call_class : x3::annotate_on_success {};
Code generation
It's easy to find where to add support for the new AST node:
In compiler.hpp:
bool operator()(ast::function_call const& x) const;
Now comes the hard part.
What's really required for general n-ary is an accumulator. Since we don't have registers, this would need to be a temporary (local). However, since the VM implementation doesn't have these, I've limited the implementation to a fixed binary function call only.
Note that the VM already has support for function calls. Functions can have locals. So, if you code-gen a variable-argument built-in function you can implement a left-fold recursive solution.
In compiler.cpp:
bool compiler::operator()(ast::function_call const& x) const
{
auto choice = [&](int opcode) {
BOOST_ASSERT(x.args.size() == 2); // TODO FIXME hardcoded binary builtin
auto it = x.args.begin();
auto& a = *it++;
if (!boost::apply_visitor(*this, a))
return false;
auto& b = *it++;
if (!boost::apply_visitor(*this, b))
return false;
program.op(opcode); // the binary fold operation
program.op(op_jump_if, 0);
size_t const branch = program.size()-1;
if (!boost::apply_visitor(*this, a))
return false;
program.op(op_jump, 0);
std::size_t continue_ = program.size()-1;
program[branch] = int(program.size()-branch);
if (!boost::apply_visitor(*this, b))
return false;
program[continue_] = int(program.size()-continue_);
return true;
};
switch (x.fun) {
case ast::fun_min: return choice(op_lt);
case ast::fun_max: return choice(op_gt);
default: BOOST_ASSERT(0); return false;
}
return true;
}
I've just taken inspiration from the surrounding code on how to generate the jump labels.
Trying It Out
A simplistic example would be: var x = min(1,3);
Assembler----------------
local x, #0
start:
op_stk_adj 1
op_int 1
op_int 3
op_lt
op_jump_if 13
op_int 1
op_jump 15
13:
op_int 3
15:
op_store x
end:
-------------------------
Results------------------
x: 1
-------------------------
Running it with some random contrived input:
./test <<< "var a=$(($RANDOM % 100)); var b=$(($RANDOM % 100)); var contrived=min(max(27,2*a), 100+b);"
Prints e.g.:
Assembler----------------
local a, #0
local b, #1
local contrived, #2
start:
op_stk_adj 3
op_int 31
op_store a
op_int 71
op_store b
op_int 27
op_int 2
op_load a
op_mul
op_gt
op_jump_if 24
op_int 27
op_jump 29
24:
op_int 2
op_load a
op_mul
29:
op_int 100
op_load b
op_add
op_lt
op_jump_if 58
op_int 27
op_int 2
op_load a
op_mul
op_gt
op_jump_if 51
op_int 27
op_jump 56
51:
op_int 2
op_load a
op_mul
56:
op_jump 63
58:
op_int 100
op_load b
op_add
63:
op_store contrived
end:
-------------------------
Results------------------
a: 31
b: 71
contrived: 62
-------------------------

Find sum of the column values based on some other column

I have an input file like this:
j,z,b,bsy,afj,upz,343,13,ruhwd
u,i,a,dvp,ibt,dxv,154,00,adsif
t,a,a,jqj,dtd,yxq,540,49,kxthz
j,z,b,bsy,afj,upz,343,13,ruhwd
u,i,a,dvp,ibt,dxv,154,00,adsif
t,a,a,jqj,dtd,yxq,540,49,kxthz
c,u,g,nfk,ekh,trc,085,83,xppnl
For every unique value of Column1, I need to find out the sum of column7
Similarly, for every unique value of Column2, I need to find out the sum of column7
Output for 1 should be like:
j,686
u,308
t,98
c,83
Output for 2 should be like:
z,686
i,308
a,98
u,83
I am fairly new in Python. How can I achieve the above?

This could be done using Python's Counter and csv library as follows:
from collections import Counter
import csv
c1 = Counter()
c2 = Counter()
with open('input.csv') as f_input:
    for cols in csv.reader(f_input):
        col7 = int(cols[6])
        c1[cols[0]] += col7
        c2[cols[1]] += col7
print "Column 1"
for value, count in c1.iteritems():
    print '{},{}'.format(value, count)
print "\nColumn 2"
for value, count in c2.iteritems():
    print '{},{}'.format(value, count)
Giving you the following output:
Column 1
c,85
j,686
u,308
t,1080
Column 2
i,308
a,1080
z,686
u,85
A Counter is a type of Python dictionary that is useful for counting items automatically. c1 holds all of the column 1 entries and c2 holds all of the column 2 entries. Note, Python numbers lists starting from 0, so the first entry in a list is [0].
The csv library loads each line of the file into a list, with each entry in the list representing a different column. The code takes column 7 (i.e. cols[6]) and converts it into an integer, as all columns are held as strings. It is then added to the counter using either the column 1 or 2 value as the key. The result is two dictionaries holding the totaled counts for each key.
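The code above is written for Python 2 (print statements, iteritems()). A rough Python 3 equivalent is sketched below; to keep it self-contained it reads the question's sample rows from an in-memory string (the name data is made up for illustration) instead of input.csv. Note that in Python 3 the totals print in first-seen order, so the line order differs from the output shown above.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the question; a real script would open the file.
data = """\
j,z,b,bsy,afj,upz,343,13,ruhwd
u,i,a,dvp,ibt,dxv,154,00,adsif
t,a,a,jqj,dtd,yxq,540,49,kxthz
j,z,b,bsy,afj,upz,343,13,ruhwd
u,i,a,dvp,ibt,dxv,154,00,adsif
t,a,a,jqj,dtd,yxq,540,49,kxthz
c,u,g,nfk,ekh,trc,085,83,xppnl
"""

c1 = Counter()  # totals of column 7 keyed by column 1
c2 = Counter()  # totals of column 7 keyed by column 2

for cols in csv.reader(io.StringIO(data)):
    col7 = int(cols[6])  # int("085") parses as 85
    c1[cols[0]] += col7
    c2[cols[1]] += col7

print("Column 1")
for value, total in c1.items():
    print(f"{value},{total}")

print("\nColumn 2")
for value, total in c2.items():
    print(f"{value},{total}")
```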

You can use pandas:
df = pd.read_csv('my_file.csv', header=None)
print(df.groupby(0)[6].sum())
print(df.groupby(1)[6].sum())
Output:
0
c 85
j 686
t 1080
u 308
Name: 6, dtype: int64
1
a 1080
i 308
u 85
z 686
Name: 6, dtype: int64
The data frame should look like this:
print(df.head())
Output:
0 1 2 3 4 5 6 7 8
0 j z b bsy afj upz 343 13 ruhwd
1 u i a dvp ibt dxv 154 0 adsif
2 t a a jqj dtd yxq 540 49 kxthz
3 j z b bsy afj upz 343 13 ruhwd
4 u i a dvp ibt dxv 154 0 adsif
You can also use your own names for the columns. Like c1, c2, ... c9:
df = pd.read_csv('my_file.csv', index_col=False, names=['c' + str(x) for x in range(1, 10)])
print(df)
Output:
c1 c2 c3 c4 c5 c6 c7 c8 c9
0 j z b bsy afj upz 343 13 ruhwd
1 u i a dvp ibt dxv 154 0 adsif
2 t a a jqj dtd yxq 540 49 kxthz
3 j z b bsy afj upz 343 13 ruhwd
4 u i a dvp ibt dxv 154 0 adsif
5 t a a jqj dtd yxq 540 49 kxthz
6 c u g nfk ekh trc 85 83 xppnl
Now, group by column 1 c1 or column c2 and sum up column 7 c7:
print(df.groupby(['c1'])['c7'].sum())
print(df.groupby(['c2'])['c7'].sum())
Output:
c1
c 85
j 686
t 1080
u 308
Name: c7, dtype: int64
c2
a 1080
i 308
u 85
z 686
Name: c7, dtype: int64

SO isn't supposed to be a code writing service, but I had a few minutes. :) Without Pandas you can do it with the CSV module;
import csv
def sum_to(results, key, add_value):
    if key not in results:
        results[key] = 0
    results[key] += int(add_value)
column1_results = {}
column2_results = {}
with open("input.csv", 'rt') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        sum_to(column1_results, row[0], row[6])
        sum_to(column2_results, row[1], row[6])
print column1_results
print column2_results
Results:
{'c': 85, 'j': 686, 'u': 308, 't': 1080}
{'i': 308, 'a': 1080, 'z': 686, 'u': 85}
Your expected results don't seem to match the math that Mike's answer and mine got using your spec. I'd double check that.
