pseudo steps for parsing http post data

pseudo steps for parsing http post data - c++

I am programming a CGI script using c++ where I get post-data from client
If I have post-data like this
sam=1&sam2=3&sam5=65
what are the steps to parse options where I will put each pair in map object in c++?
I have thought of splitting values by '&' into vector array then splitting each entry in the vector array by '=' and put them in map object, what do you think? Also I want to know what can I do to deal with malformed post-data.

You have the first two steps correct. Separate the name/value pairs at the & and separate the name from the value at the =. But then you have to URL decode the name and URL decode the value before inserting them into the map.
For example, a + in the name or the value should be changed to a space. A %2B should be replaced with a space. A %3D with an equals sign.
As for malformed post data, it will really depend on your application what you consider malformed. You probably should check for an ambiguous value like x=2=3. Is the value 2=3 or 3? And you can check for a missing name, like =3 or just 3.
But the rest is up to the specifics of what's legal in your particular application.

Related

PDI - Check data types of field

I'm trying to create a transformation read csv files and check data types for each field in that csv.
Like this : the standard field A should string(1) character and field B is integer/number.
And what I want is to check/validate: If A not string(1) then set Status = Not Valid also if B not a integer/number to. Then all file with status Not Valid will be moved to error folder.
I know I can use Data Validator to do it, but how to move the file with that status? I can't find any step to do it.

You can read files in loop, and
add step as below,
after data validation, you can filter rows with the negative result(not matched) -> add constant values step and with error = 1 -> add set variable step for error field with default values 0.
after transformation finishes, you can do add simple evaluation step in parent job to check value of ERROR variable.
If it has value 1 then move files else ....
I hope this can help.

You can do same as in this question. Once read use the Group by to have one flag per file. However, this time you cannot do it in one transform, you should use a job.
Your use case is in the samples that was shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open the Run all sample transformations.kjb and replace the Filter 2 of the Get Files - Get all transformations.ktr by your logic which includes a Group by to have one status per file and not one status per row.
In case you wonder why you need such a complex logic for such a task, remember that the PDI starts all the steps of a transformation at the same time. That's its great power, but you do not know if you have to move the file before every row has been processed.
Alternatively, you have the quick and dirty solution of your similar question. Change the filter row by a type check, and the final Synchronize after merge by a Process File/Move
And a final advice: instead of checking the type with a Data validator, which is a good solution in itself, you may use a Javascript like
there. It is more flexible if you need maintenance on the long run.

SQLITE CHECK constraint for hex values

I want to store IP Address using c++ in sqlite3 DB in hex format. I am using TEXT as the format for storing hex values. I want to perform a check on the value being inserted in the DB. The value range I want to check for is 0x00000000 - 0xFFFFFFFF. For INTEGER type you can check by - CHECK (num BETWEEN 0 AND 100). Is there any way to add check constraint for Hex Values?
If there is an smarter way to store IP address in SQLITE please share with me.
Thanks

I think you have two main choices: (a) store as hex (i.e. text) and check "hex conformity", or (b) store as integer and print it as hex when reading data.
There may be several reasons for preferring the one over the other, e.g. if the application actually provides and receives integer values. Anyway, some examples for option (a) and option (b).
Option (a) - store as text and check hex conformity
the probably simplest way is to use a check based on GLOB as remarked by CL:
value TEXT CONSTRAINT "valueIsHex" CHECK(value GLOB "0x[a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9]")
If logic goes beyond that supported by GLOB, you could install a user defined function, either a general regexp() function or a custom function for your specific needs. This could be the case, if you want to store a complete IP-address in one field and still want to do a conformity check. For help on general regexp() function, confer this SO answer. For help on user defined functions confer, for example, this StackOverflow answer.
Option (b) - store as int and print as hex
If your application is actually working with integers, then change your db schema to integer and add a check like:
value INTEGER CONSTRAINT "valueIsValid" CHECK(value <= 0xFFFFFFFF)
For convenience, you can still print the value as hex using a query (or defining a corresponding view) like the following:
select printf('%08X', value) from theTable

RapidMiner: Can I use a wildcard as an attribute value for training a decision tree model?

I am working on a fairly simple process in RapidMiner 5.3.013, which reads a CSV file and uses it as a training set to train the decision tree classifier. The result of the process is the model. A second CSV is read and used as the unlabeled set. The model (calculated earlier) is applied to the unlabeled test set, in an effort to label it properly.
Each line of the CSVs contains a few attributes, for example:
15, 0, 1555, abc*15, label1
but some lines of the training set may be like this:
15, 0, *, abc*15, label2
This is done because the third value may take various values, so the creator of the training set used a star as a wildcard in the place of the value.
What I would like to do is let the decision tree know that the star there means "match anything", so that it does not literally only match a star.
Notes:
the star in the 4th field (abc*15) should be matched literally and not as a wildcard.
if the 3rd field always contained stars, I could just not include it in the attributes, but that's not the case. Sometimes the 3rd field contains integer values, which should be matched literally.
I tried leaving the field blank, but it doesn't work
So, is there a way to use regular expressions, or at least a simple wildcard while training the classifier or using the model?
A different way to put it is: Can I instruct the classifier to not use some of the attributes in some of the entries (lines in the CSV)?
Thanks!

I would process the data so the missing value is valid in its own right and I would discretize the valid numbers to be in ranges.
In more detail, what I meant by missing is the situation where the value of an attribute is something like *. I would simply allow this to be one valid value that the attribute takes. For all the other values of this attribute, these are numerical so they need to be converted to a nominal value to be compatible with the now valid *.
It's fairly fiddly to do this and I haven't tried this but I would start with the operator Declare Missing Value to detect the * and make them missing. From there, I would use the operator Discretize by Binning to convert numbers into nominal values. Finally, I would use Replace Missing Values to change the missing values to a nominal value like Missing. You might ask why bother with the first Declare Missing step above? The reason is that it will allow the Discretizing operation to work because it will be working on numbers alone given that non-numbers are marked as missing.
The resulting example set then be passed to a model in the normal way. Obviously, the model has to be able to cope with nominal attributes (Decision trees does).
It occurred to me that some modelling operators are more tolerant of missing data. I think k-nearest-neighbours may be one. In this case, you could simply mark the missing ones as above and not bother with the discretizing step.
The whole area of missing data does need care because it's important to understand the source of missingness. If missing data is correlated with other attributes or with the label itself, handling it inappropriately can skew results.

How can I obfuscate/de-obfuscate integer properties?

My users will in some cases be able to view a web version of a database table that stores data they've entered. For various reasons I need to include all the stored data, including a number of integer flags for each record that encapsulate adjacencies and so forth within the data (this is for speed and convenience at runtime). But rather than exposing them one-for-one in the webview, I'd like to have an obfuscated field that's just called "reserved" and contains a single unintelligible string representing those flags that I can easily encode and decode.
How can I do this efficiently in C++/Objective C?
Thanks!

Is it necessary that this field is exposed to the user visually, or just that it’s losslessly captured in the HTML content of the webview? If possible, can you include the flags as a hidden input element with each row, i.e., <input type=“hidden” …?

Why not convert each of the fields to hex, and append them as a string and save that value?
As long as you always append the strings in the same order, breaking them back apart and converting them back to numbers should be trivial.

Use symmetric encryption (example) to encode and decode the values. Of course, only you should know of the key.
Alternatively, Assymetric RSA is more powerfull encryption but is less efficient and is more complex to use.
Note: i am curios about the "various reasons" that require this design...

Multiply your flag integer by 7, add 3, and convert to base-36. To check if the resulting string is modified, convert back to base-2, and check if the result modulo 7 is still 3. If so, divide by 7 to get the flags. note that this is subject to replay attacks - users can copy any valid string in.

Just calculate a CRC-32 (or similar) and append it to your value. That will tell you, with a very high probability, if your value has been corrupted.

Making an index-creating class

I'm busy with programming a class that creates an index out of a text-file ASCII/BINARY.
My problem is that I don't really know how to start. I already had some tries but none really worked well for me.
I do NOT need to find the address of the file via the MFT. Just loading the file and finding stuff much faster by searching for the key in the index-file and going in the text-file to the address it shows.
The index-file should be built up as follows:
KEY ADDRESS
1 0xABCDEF
2 0xFEDCBA
. .
. .
We have a text-file with the following example value:
1, 8752 FW,
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++,
******************************************************************************,
------------------------------------------------------------------------------;
I hope that this explains my question a bit better.
Thanks!

It seems to me that all your class needs to do is store an array of pointers or file start offsets to the key locations in the file.
It really depends on what your Key locations represent.
I would suggest that you access the file through your class using some public methods. You can then more easily tie in Key locations with the data written.
For example, your Key locations may be where each new data block written into the file starts from. e.g. first block 1000 bytes, key location 0; second block 2500 bytes, key location 1000; third block 550 bytes; key location 3500; the next block will be 4050 all assuming that 0 is the first byte.
Store the Key values in a variable length array and then you can easily retrieve the starting point for a data block.
If your Key point is signified by some key character then you can use the same class, but with a slight change to store where the Key value is stored. The simplest way is to step through the data until the key character is located, counting the number of characters checked as you go. The count is then used to produce your key location.

Your code snippet isn't so much of an idea as it is the functionality you wish to have in the end.
Recognize that "indexing" merely means "remembering" where things are located. You can accomplish this using any data structure you wish... B-Tree, Red/Black tree, BST, or more advanced structures like suffix trees/suffix arrays.
I recommend you look into such data structures.
edit:
with the new information, I would suggest making your own key/value lookup. Build an array of keys, and associate their values somehow. this may mean building a class or struct that contains both the key and the value, or instead contains the key and a pointer to a struct or class with a value, etc.
Once you have done this, sort the key array. Now, you have the ability to do a binary search on the keys to find the appropriate value for a given key.
You could build a hash table in a similar manner. you could build a BST or similar structure like i mentioned earlier.

I still don't really understand the question (work on your question asking skillz), but as far as I can tell the algorithm will be:
scan the file linearly, the first value up to the first comma (',') is a key, probably. All other keys occur wherever a ';' occurs, up to the next ',' (you might need to skip linebreaks here). If it's a homework assignment, just use scanf() or something to read the key.
print out the key and byte position you found it at to your index file
AFAIUI that's the algorithm, I don't really see the problem here?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js