Translate C++ functions into TypeScript - c++

Given the following functions written in C++:
#define getbit(s,i) ((s)[(i)/8] & 0x01<<(i)%8)
#define setbit(s,i) ((s)[(i)/8] |= 0x01<<(i)%8)
How can I turn them into compatible TypeScript functions?
I came up with:
function setbit(s: string, i: number): number {
return +s[i / 8] | 0x01 << i % 8;
}
function getbit(s: string, i: number): number {
return +s[i / 8] & 0x01 << i % 8;
}
I found out that the a |= b equivalent is a = a | b, but I'm not sure about the getbit function implementation. Also, I don't really understand what those functions are supposed to do. Could someone explain them, please?
Thank you.
EDIT:
Using the ideas from @Thomas, I ended up doing this:
function setBit(x: number, mask: number) {
return x | 1 << mask;
}
// not really get, more like a test
function getBit(x: number, mask: number) {
return ((x >> mask) % 2 !== 0);
}
since I don't really need a string for the binary representation.

Strings aren't a good storage type here. Also, JS strings use 16-bit code units, so storing one byte per character leaves half of each character's bits unused.
function setbit(string, index) {
//you could do `index >> 3`, but bitwise operators work on 32-bit signed integers, so this may fail if index > 0x7FFFFFFF
//fail as in produce wrong results, not as in throwing an error.
var position = Math.floor(index/8),
bit = 1 << (index&7),
char = string.charCodeAt(position);
return string.substr(0, position) + String.fromCharCode(char|bit) + string.substr(position+1);
}
function getbit(string, index) {
var position = Math.floor(index/8),
bit = 1 << (index&7),
char = string.charCodeAt(position);
return Boolean(char & bit);
}
A (typed) array would be better.
function setBit(array, index){
var position = Math.floor(index/8),
bit = 1 << (index&7);
array[position] |= bit; //JS knows `|=` too
return array;
}
function getBit(array, index) {
var position = Math.floor(index/8),
bit = 1 << (index&7);
return Boolean(array[position] & bit)
}
var storage = new Uint8Array(100);
setBit(storage, 42);
console.log(storage[5]);
var data = [];
setBit(data, 42);
console.log(data);
This works with both, but:
All typed arrays have a fixed length that cannot be changed after creation (memory allocation).
Regular arrays have no fixed element type, such as 8 bits per index. Numbers are floats with 53 bits of integer precision, but for performance reasons you should stay within INT31 (31, not 32), which means 30 bits plus sign. In that range the JS engine can optimize a bit behind the scenes, which reduces the memory impact and is a little faster.
But if performance is the concern, use typed arrays, although you have to know in advance how big the data can get.
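For reference, the original C++ macros treat s as an array of bytes and i as a bit index: i/8 selects the byte and i%8 selects the bit inside it. A minimal C++ sketch of the same idea written as functions follows. The names get_bit and set_bit and the choice of std::vector<std::uint8_t> as storage are illustrative assumptions, not part of the question. Unlike the getbit macro, which yields the masked value, this get_bit returns a plain bool.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical function versions of the getbit/setbit macros.
// The buffer is a packed bit array: bit i lives in byte i / 8, at position i % 8.
bool get_bit(const std::vector<std::uint8_t>& s, std::size_t i) {
    return (s[i / 8] & (1u << (i % 8))) != 0;
}

// Sets bit i in place; the caller must make sure the buffer has at least i / 8 + 1 bytes.
void set_bit(std::vector<std::uint8_t>& s, std::size_t i) {
    s[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
}
```

With a zeroed 100-byte buffer, set_bit(buf, 42) changes byte 5 (42 / 8) by turning on bit 2 (42 % 8), which matches what the typed-array version above does.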

Related

Convert hex integer into form "\x" (c++ - memory)

DWORD FindPattern(DWORD base, DWORD size, char *pattern, char *mask)
{
// Get length for our mask, this will allow us to loop through our array
DWORD patternLength = (DWORD)strlen(mask);
for (DWORD i = 0; i + patternLength <= size; i++)
{
bool found = true;
for (DWORD j = 0; j < patternLength; j++)
{
// If we have a ? in our mask then we have true by default,
// or if the bytes match then we keep searching until finding it or not
found &= mask[j] == '?' || pattern[j] == *(char*)(base + i + j);
}
// Found = true, our entire pattern was found
// Return the memory addy so we can write to it
if (found)
{
return base + i;
}
}
return 0;
}
Above is my FindPattern function, which I use to find bytes in a given section of memory. Here's how I call it:
DWORD PATTERN = FindPattern(0xC0000000, 0x20000,"\x1F\x37\x66\xE3", "xxxx");
PrintStringBottomCentre("%02x", PATTERN);
Now, say I had an integer for example: 0xDEADBEEF
I want to convert this into a char pointer like: "\xDE\xAD\xBE\xEF", this is so that I can put it into my FindPattern function. How would I do this?
You have to be careful here. On many architectures, including x86, ints are stored in little-endian order, meaning that the int 0xDEADBEEF is stored in memory in this order: EF BE AD DE. But the char array is stored in the order DE AD BE EF.
So the question is, are you trying to find an int 0xDEADBEEF stored in memory, or do you actually want the sequence of bytes DE AD BE EF?
If you want the int, don't use a char* array at all. Pass in your pattern and mask as DWORDs, and you can simplify that function a lot.
If you want to find the sequence of bytes, then don't store it as an int in the first place. Just get the input as a char array and pass it directly in as your pattern.
Edit: you can try something like this, which I think will give you what you want:
unsigned int a = 0xDEADBEEF;
char pattern[4];
pattern[0] = (a >> 24) & 0xFF;
pattern[1] = (a >> 16) & 0xFF;
pattern[2] = (a >> 8) & 0xFF;
pattern[3] = a & 0xFF;
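The same byte extraction can be wrapped in a small helper. This is only a sketch: the name to_big_endian_bytes and the use of std::array are assumptions for illustration. Because it uses shifts rather than reinterpreting memory, the result is DE AD BE EF on both little-endian and big-endian hosts.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit value into bytes, most significant first,
// so 0xDEADBEEF becomes the byte sequence DE AD BE EF regardless of host endianness.
std::array<char, 4> to_big_endian_bytes(std::uint32_t value) {
    return {
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8) & 0xFF),
        static_cast<char>(value & 0xFF),
    };
}
```

The array's data() pointer could then be passed as the pattern argument together with the mask "xxxx". The array is not NUL-terminated, which is fine here because FindPattern takes its length from the mask.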
The \ character in C/C++ is an escape character, so whatever follows it is interpreted as an escape sequence; \x starts a hexadecimal escape in your string. To avoid that, put another \ in front of it so that it is treated as a normal character.
Example: \\xDE\\xAD\\xBE\\xEF

How to extract one specific bit from a uint16_t variable properly

Long story short, I am currently coding a wrapper in C++ for a C - library which extracts the value of registers on an embedded system. To monitor what happens, I need to read the value of a bit for some registers and make a getter for each of them.
Basically, I would like my method to return one bool from a bit stored in a uint16_t variable. With a 'naive' and uncaffeinated approach I was doing something like this:
bool getBusyDevice(int fd) // fd stands for file descriptor, for each instance of the class
{
uint16_t statusRegVal = 0;
get_commandReg(fd, &statusRegVal); // C-library function to get the value of status register
uint16_t shift = 0; // depends on the bit to access - for reusability
bool Busy = (bool) (statusRegVal >> shift);
return Busy;
}
I am not quite happy with the result and I would like to know if there was a 'proper' way to do that...
Thanks a lot for your advice !
The normal way to get just a single bit is to use the bitwise AND operator &, e.g. statusRegVal & bitValue. If the bit is set, the result equals bitValue, so to get a boolean result you can do a simple comparison: (statusRegVal & bitValue) == bitValue. The parentheses are required because == binds more tightly than & in C and C++.
So if you want to check whether bit zero (which has the value 0x0001) is set, you could simply do
return (statusRegVal & 0x0001) == 0x0001;
For better understanding of what you want, take a look at the following link
Masking: https://en.wikipedia.org/wiki/Mask_(computing)
and
Bit Manipulation: https://en.wikipedia.org/wiki/Bit_manipulation
Conclusion:
If you want to read specific bits of a variable (register), AND it with a mask that has ones at those bit positions.
Say you have a 2-byte variable (u16Reg) and you want to read bits 5 through 7:
value = ((u16Reg & 0x00E0) >> 5).
In your case, you want to read one bit and return its status as TRUE or FALSE.
value = ((u16Reg & (0x0001 << n)) >> n)
where n is the bit number you want to read.
Let's walk through it.
Say u16Reg = 0x529D = 0b0101001010011101, so bit[0] = 1 and bit[15] = 0, and you want to get bit number 9.
First, make sure that all bits are zero except yours (9).
(0b0101001010011101 & (0x0001 << 9)) =
(0b0101001010011101 & 0x0200) =
(0b0101001010011101 & 0b0000001000000000) =
(0b0000001000000000) = 0x0200
This means TRUE if any nonzero value counts as TRUE. But if TRUE must be 0x01, you should move this bit down to bit[0] as follows:
(0x0200 >> 9) = 0x0001, which is TRUE.
If you understand this, you can make it simpler:
value = ((u16Reg >> n) & 0x0001)
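The shift-then-mask form can be written as a small function. This is a minimal sketch; the name test_bit is a hypothetical choice and n is assumed to be in the range 0..15.

```cpp
#include <cstdint>

// Returns true when bit n (0 = least significant) of reg is set.
// Assumes n is in the range 0..15.
bool test_bit(std::uint16_t reg, unsigned n) {
    return ((reg >> n) & 0x0001u) != 0;
}
```

For the worked value 0x529D, test_bit(0x529D, 9) is true because 0x529D >> 9 is 0x29, whose lowest bit is 1, while test_bit(0x529D, 15) is false.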
Why not use templates:
template<int SHIFT>
bool boolRegVal(uint16_t val) {
return val & (1 << SHIFT);
}
And then usage:
boolRegVal<4>(statusRegVal);
Casting to bool won't help, because bool is not a 1-bit type: the cast yields true for any nonzero value.
You must clear the rest of the bits and then check whether the result is 0.
You can do something like this:
bool Busy = ((statusRegVal >> shift) & 1) ? true : false;
The standard library provides std::bitset for manipulating bits. Here's an example; I'm sure you can guess what it does.
#include <bitset>
#include <iostream>
using namespace std;
int main(int, char**){
typedef bitset<sizeof(int)*8> BitsType; //or uint16_t or whatever
BitsType bits(0xDEADBEEF);
for(int i = 0; i < 5; ++i) //access the bits
cout << "bits[" << i << "] = " << bits[i] << '\n';
cout << "bit[3] = " << bits[3] << '\n'; //original
bits.flip(3);
cout << "bit[3] = " << bits[3] << '\n'; //b[3] = !b[3]
return 0;
}
The operator [](size_t) is overloaded to return a reference so you can assign to it too. bits[4] = false for example. And finally, when done playing with your bits :) you can convert back to long (or ulong) or in your case uint16_t value = static_cast<uint16_t>(bits.to_ulong()). Kudos to stdlib.
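Applied to the register question, a std::bitset<16> can wrap the 16-bit status value directly. The sketch below is an assumption about how that might look: the function name busy_from_status is hypothetical, and the register value is passed in rather than read through the C library.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical getter: reports whether the given bit of a 16-bit status value is set.
// std::bitset::test throws std::out_of_range when bit >= 16.
bool busy_from_status(std::uint16_t statusRegVal, std::size_t bit) {
    return std::bitset<16>(statusRegVal).test(bit);
}
```

Using test() instead of operator[] trades a little speed for a bounds check, which may be worthwhile when the bit index comes from configuration rather than a constant.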

platform difference?

I was trying out the bitset class in C++, using the number 137 as an example.
I converted it to binary, which gave me 10001001. I then wanted to cut off the MSB and store the remaining bits, 0001001, in another bitset instance called bitarray. I expected to see those bits in bitarray, but it did not hold the right value. What could be the problem? I was just trying to split the MSB from the rest of the bits in the binary representation of 137. Here is the code:
bitset<8> bitarray;
bitset<8> bitsetObject(num);
int val = bitsetObject.size();
for (int i = 0; i <= (val - 1); i++)
{
if (i == 6)
break;
else
bitarray[i] = bitsetObject[i + 1];
}
If anyone knows how I could easily slice from the second element to the last element in the bitsetObject array, let me know. Thanks..
If you're just trying to make a new bitset object with the most significant set bit reset, then consider the following:
template<std::size_t N>
std::bitset<N> strip_mssb(std::bitset<N> bitarray)
{
for (std::size_t i = bitarray.size(); i--;)
if (bitarray[i])
{
bitarray.reset(i);
break;
}
return bitarray;
}
You set bitarray[0] equal to bitsetObject[1], which is 0 (assuming num is really 137).
You seem to expect the least bit of bitarray to be equal to 1.
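If the goal is to keep the lower bits 0001001 at their original positions, no copy loop is needed: clearing the top bit is enough. A minimal sketch, with drop_top_bit as a hypothetical name:

```cpp
#include <bitset>
#include <cstddef>

// Clears the top bit (index N - 1) and leaves the lower N - 1 bits in place.
template <std::size_t N>
std::bitset<N> drop_top_bit(std::bitset<N> bits) {
    static_assert(N > 0, "bitset must not be empty");
    bits.reset(N - 1);
    return bits;
}
```

For std::bitset<8>(137), which is 10001001, this gives 00001001. The loop in the question instead copies bit i + 1 into bit i, which shifts the remaining bits down by one position.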

add 1 to c++ bitset

I have a C++ bitset of a given length. I want to generate all possible combinations of this bitset, and I thought of doing so by adding 1 to it 2^length times. How can I do this? A Boost library solution is also acceptable.
Try this:
/*
 * This function adds 1 to the bitset.
 *
 * Since the bitset does not natively support addition we do it manually.
 * Each bit is flipped by XOR with 1. If the result is one, the bit was zero before,
 * so there is no carry and we can break out. Otherwise the bit is now zero, meaning
 * it was previously one, so we have a carry that must be added to the next bit, etc.
 */
void increment(boost::dynamic_bitset<>& bitset)
{
    // Visit every bit position: size() is the number of bits, count() only the set ones.
    for (std::size_t loop = 0; loop < bitset.size(); ++loop)
    {
        if ((bitset[loop] ^= 0x1) == 0x1)
        {
            break;
        }
    }
}
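The same ripple-carry idea also works with std::bitset, without Boost. The sketch below is an illustrative variant, not the answer's code: it swaps in std::bitset<N> and returns a flag so the caller can tell when the value has wrapped around.

```cpp
#include <bitset>
#include <cstddef>

// Adds 1 to the bitset, treating bit 0 as least significant.
// Returns false when the value wrapped around to all zeros.
template <std::size_t N>
bool increment(std::bitset<N>& bits) {
    for (std::size_t i = 0; i < N; ++i) {
        bits.flip(i);
        if (bits[i]) {
            return true;   // 0 became 1: no carry left to propagate
        }
        // 1 became 0: carry into the next bit
    }
    return false;          // every bit carried: overflow
}
```

Starting from an all-zero bitset and calling increment until it returns false visits all 2^N combinations exactly once.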
All possible combinations? Just use 64-bit unsigned integer and make your life easier.
Not the best way, but as a brute-force approach you can add 1 by converting with to_ulong():
bitset<32> b (13);
b = b.to_ulong() + 1;
Using boost library, you can try the following:
For example, a bitset of length 4
boost::dynamic_bitset<> bitset;
for (int i = 0; i < pow(2.0, 4); i++) {
bitset = boost::dynamic_bitset<>(4, i);
std::cout << bitset << std::endl;
}

Efficiently check string for one of several hundred possible suffixes

I need to write a C/C++ function that would quickly check if string ends with one of ~1000 predefined suffixes. Specifically the string is a hostname and I need to check if it belongs to one of several hundred predefined second-level domains.
This function will be called a lot so it needs to be written as efficiently as possible. Bitwise hacks etc anything goes as long as it turns out fast.
Set of suffixes is predetermined at compile-time and doesn't change.
I am thinking of either implementing a variation of Rabin-Karp or write a tool that would generate a function with nested ifs and switches that would be custom tailored to specific set of suffixes. Since the application in question is 64-bit to speed up comparisons I could store suffixes of up to 8 bytes in length as const sorted array and do binary search within it.
Are there any other reasonable options?
If the suffixes don't contain any expansions/rules (like a regex), you could build a Trie of the suffixes in reverse order, and then match the string based on that. For instance
suffixes:
foo
bar
bao
reverse order suffix trie:
o
-a-b (matches bao)
-o-f (matches foo)
r-a-b (matches bar)
These can then be used to match your string:
"mystringfoo" -> reverse -> "oofgnirtsym" -> trie match -> foo suffix
You mention that you're looking at second-level domain names only, so even without knowing the precise set of matching domains, you could extract the relevant portion of the input string.
Then simply use a hashtable. Dimension it in such a way that there are no collisions, so you don't need buckets; lookups will be exactly O(1). For small hash types (e.g. 32 bits), you'd want to check if the strings really match. For a 64-bit hash, the probability of another domain colliding with one of the hashes in your table is already so low (order 10^-17) that you can probably live with it.
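A rough sketch of that lookup follows, under stated assumptions: the helper names second_level_domain and is_known_domain are hypothetical, and std::unordered_set stands in for the collision-free table described above. It uses ordinary buckets and compares the full string, so it is not a perfect hash.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns the last two dot-separated labels of a hostname,
// e.g. "www.example.com" -> "example.com". Returns the whole string when it
// contains fewer than two dots.
std::string second_level_domain(const std::string& host) {
    const std::size_t last = host.rfind('.');
    if (last == std::string::npos || last == 0) return host;
    const std::size_t prev = host.rfind('.', last - 1);
    if (prev == std::string::npos) return host;
    return host.substr(prev + 1);
}

// True when the hostname's second-level domain is in the predefined set.
bool is_known_domain(const std::unordered_set<std::string>& known,
                     const std::string& host) {
    return known.count(second_level_domain(host)) != 0;
}
```

Since the set of domains is fixed at compile time, a generated perfect hash could replace the unordered_set to remove bucket traversal.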
I would reverse all of the suffix strings, build a prefix tree of them and then test the reverse of your hostname string against that.
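The reversed prefix tree can be sketched in a few lines. This is an illustrative, unoptimized version: the class name SuffixTrie and the std::map-based nodes are assumptions, and a production version would likely use flat arrays instead.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reverse-suffix trie: each suffix is inserted back to front,
// so matching walks the input from its last character.
class SuffixTrie {
public:
    explicit SuffixTrie(const std::vector<std::string>& suffixes) {
        for (const std::string& s : suffixes) {
            Node* node = &root_;
            for (auto it = s.rbegin(); it != s.rend(); ++it) {
                std::unique_ptr<Node>& child = node->children[*it];
                if (!child) child = std::make_unique<Node>();
                node = child.get();
            }
            node->terminal = true;
        }
    }

    // True when text ends with at least one of the stored (non-empty) suffixes.
    bool matches(const std::string& text) const {
        const Node* node = &root_;
        for (auto it = text.rbegin(); it != text.rend(); ++it) {
            auto found = node->children.find(*it);
            if (found == node->children.end()) return false;  // dead end
            node = found->second.get();
            if (node->terminal) return true;                  // full suffix seen
        }
        return false;  // ran out of input before completing any suffix
    }

private:
    struct Node {
        bool terminal = false;
        std::map<char, std::unique_ptr<Node>> children;
    };
    Node root_;
};
```

Iterating with reverse iterators avoids making a reversed copy of either the suffixes or the input string.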
I think that building your own automaton would be the most efficient way. It's a variant of your second solution: starting from a finite set of suffixes, you generate an automaton fitted to those suffixes.
I think you can easily use flex to do it, taking care to reverse the input or to handle in a special way the fact that you are looking only for suffixes (just for efficiency reasons).
By the way, a Rabin-Karp approach would be efficient too, since your suffixes will be short. You can fill a hash set with all the suffixes needed and then
take a string
take the suffix
calculate the hash of the suffix
check if suffix is in the table
Just create a 26x26 array of sets of domains, e.g. thisArray[0][0] will be the domains that end in 'aa', thisArray[0][1] is all the domains that end in 'ab', and so on...
Once you have that, just search your array for thisArray[2nd last char of hostname][last char of hostname] to get the possible domains. If there's more than one at that stage, just brute force the rest.
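A sketch of that two-character index follows. The class name TwoCharIndex is hypothetical, the 26x26 table is flattened into a single array of 676 buckets, and the code assumes suffixes and hostnames end in lowercase ASCII letters.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index keyed by the last two characters of a suffix.
// Assumption: suffixes and hostnames end in lowercase ASCII letters a-z;
// anything else is never indexed and never matches.
class TwoCharIndex {
public:
    explicit TwoCharIndex(const std::vector<std::string>& suffixes) {
        for (const std::string& s : suffixes) {
            const int slot = slot_for(s);
            if (slot >= 0) buckets_[static_cast<std::size_t>(slot)].push_back(s);
        }
    }

    // True when host ends with one of the indexed suffixes.
    bool matches(const std::string& host) const {
        const int slot = slot_for(host);
        if (slot < 0) return false;
        for (const std::string& s : buckets_[static_cast<std::size_t>(slot)]) {
            // Brute-force the few candidates that share the last two characters.
            if (host.size() >= s.size() &&
                host.compare(host.size() - s.size(), s.size(), s) == 0) {
                return true;
            }
        }
        return false;
    }

private:
    // Maps the last two characters to a slot in 0..675, or -1 if they are not a-z.
    static int slot_for(const std::string& text) {
        if (text.size() < 2) return -1;
        const char a = text[text.size() - 2];
        const char b = text[text.size() - 1];
        if (a < 'a' || a > 'z' || b < 'a' || b > 'z') return -1;
        return (a - 'a') * 26 + (b - 'a');
    }

    std::array<std::vector<std::string>, 26 * 26> buckets_;
};
```

How well this performs depends on how evenly the suffixes spread over the buckets; many real domain suffixes share endings, so some buckets may stay fairly long.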
I think that the solution should be very different depending on the type of input strings. If the strings are some kind of string class that can be iterated from the end (such as stl strings) it is a lot easier than if they are NULL-terminated C-strings.
String Class
Iterate the string backwards (don't make a reverse copy - use some kind of backward iterator). Build a Trie where each node consists of two 64-bit words, one pattern and one bitmask. Then check 8 characters at a time in each level. The mask is used if you want to match less than 8 characters - e.g. deny "*.org" would give a mask with 32 bits set. The mask is also used as termination criteria.
C strings
Construct an NDFA that matches the strings on a single-pass over them. That way you don't have to first iterate to the end but can instead use it in one pass. An NDFA can be converted to a DFA, which will probably make the implementation more efficient. Both construction of the NDFA and conversion to DFA will probably be so complex that you will have to write tools for it.
After some research and deliberation I've decided to go with trie/finite state machine approach.
The string is parsed starting from the last character and going backwards, using a trie as long as the portion of the suffix parsed so far can correspond to multiple suffixes. At some point we either hit the first character of one of the possible suffixes, which means we have a match; hit a dead end, which means there are no more possible matches; or reach a situation where only one suffix candidate remains. In that last case we just compare the remainder of the suffix.
Since each trie step is a constant-time lookup, worst-case complexity is O(maximum suffix length). The function turned out to be pretty fast. On a 2.8 GHz Core i5 it can check 33,000,000 strings per second against 2K possible suffixes. The 2K suffixes total 18 kilobytes and expand to a 320 KB trie/state machine table. I guess I could have stored it more efficiently, but this solution seems to work well enough for the time being.
Since the suffix list was so large, I didn't want to code it all by hand, so I ended up writing a C# application that generates C code for the suffix checking function:
public static uint GetFourBytes(string s, int index)
{
byte[] bytes = new byte[4] { 0, 0, 0, 0};
int len = Math.Min(s.Length - index, 4);
Encoding.ASCII.GetBytes(s, index, len, bytes, 0);
return BitConverter.ToUInt32(bytes, 0);
}
public static string ReverseString(string s)
{
char[] chars = s.ToCharArray();
Array.Reverse(chars);
return new string(chars);
}
static StringBuilder trieArray = new StringBuilder();
static int trieArraySize = 0;
static void Main(string[] args)
{
// read all non-empty lines from input file
var suffixes = File
.ReadAllLines(@"suffixes.txt")
.Where(l => !string.IsNullOrEmpty(l));
var reversedSuffixes = suffixes
.Select(s => ReverseString(s));
int start = CreateTrieNode(reversedSuffixes, "");
string outFName = @"checkStringSuffix.debug.h";
if (args.Length != 0 && args[0] == "--release")
{
outFName = @"checkStringSuffix.h";
}
using (StreamWriter wrt = new StreamWriter(outFName))
{
wrt.WriteLine(
"#pragma once\n\n" +
"#define TRIE_NONE -1000000\n"+
"#define TRIE_DONE -2000000\n\n"
);
wrt.WriteLine("const int trieArray[] = {{{0}\n}};", trieArray);
wrt.WriteLine(
"inline bool checkSingleSuffix(const char* str, const char* curr, const int* trie) {\n"+
" int len = trie[0];\n"+
" if (curr - str < len) return false;\n"+
" const char* cmp = (const char*)(trie + 1);\n"+
" while (len-- > 0) {\n"+
" if (*--curr != *cmp++) return false;\n"+
" }\n"+
" return true;\n"+
"}\n\n"+
"bool checkStringSuffix(const char* str, int len) {\n" +
" if (len < " + suffixes.Select(s => s.Length).Min().ToString() + ") return false;\n" +
" const char* curr = (str + len - 1);\n"+
" int currTrie = " + start.ToString() + ";\n"+
" while (curr >= str) {\n" +
" assert(*curr >= 0x20 && *curr <= 0x7f);\n" +
" currTrie = trieArray[currTrie + *curr - 0x20];\n" +
" if (currTrie < 0) {\n" +
" if (currTrie == TRIE_NONE) return false;\n" +
" if (currTrie == TRIE_DONE) return true;\n" +
" return checkSingleSuffix(str, curr, trieArray - currTrie - 1);\n" +
" }\n"+
" --curr;\n"+
" }\n" +
" return false;\n"+
"}\n"
);
}
}
private static int CreateTrieNode(IEnumerable<string> suffixes, string prefix)
{
int retVal = trieArraySize;
if (suffixes.Count() == 1)
{
string theSuffix = suffixes.Single();
trieArray.AppendFormat("\n\t/* {1} - {2} */ {0}, ", theSuffix.Length, trieArraySize, prefix);
++trieArraySize;
for (int i = 0; i < theSuffix.Length; i += 4)
{
trieArray.AppendFormat("0x{0:X}, ", GetFourBytes(theSuffix, i));
++trieArraySize;
}
retVal = -(retVal + 1);
}
else
{
var groupByFirstChar =
from s in suffixes
let first = s[0]
let remainder = s.Substring(1)
group remainder by first;
string[] trieIndexes = new string[0x60];
for (int i = 0; i < trieIndexes.Length; ++i)
{
trieIndexes[i] = "TRIE_NONE";
}
foreach (var g in groupByFirstChar)
{
if (g.Any(s => s == string.Empty))
{
trieIndexes[g.Key - 0x20] = "TRIE_DONE";
continue;
}
trieIndexes[g.Key - 0x20] = CreateTrieNode(g, g.Key + prefix).ToString();
}
trieArray.AppendFormat("\n\t/* {1} - {2} */ {0},", string.Join(", ", trieIndexes), trieArraySize, prefix);
retVal = trieArraySize;
trieArraySize += 0x60;
}
return retVal;
}
So it generates code like this:
inline bool checkSingleSuffix(const char* str, const char* curr, const int* trie) {
int len = trie[0];
if (curr - str < len) return false;
const char* cmp = (const char*)(trie + 1);
while (len-- > 0) {
if (*--curr != *cmp++) return false;
}
return true;
}
bool checkStringSuffix(const char* str, int len) {
if (len < 5) return false;
const char* curr = (str + len - 1);
int currTrie = 81921;
while (curr >= str) {
assert(*curr >= 0x20 && *curr <= 0x7f);
currTrie = trieArray[currTrie + *curr - 0x20];
if (currTrie < 0) {
if (currTrie == TRIE_NONE) return false;
if (currTrie == TRIE_DONE) return true;
return checkSingleSuffix(str, curr, trieArray - currTrie - 1);
}
--curr;
}
return false;
}
Since len in checkSingleSuffix was never more than 9 for my particular data set, I tried replacing the comparison loop with a switch (len) and hardcoded comparison routines that compared up to 8 bytes of data at a time, but it didn't affect overall performance either way.
Thanks to everyone who contributed their ideas!