Grouping lines with same leading word using Javascript Regex Engine

Suppose you have the following multi-line string:
C1 10
C2 20
C3 30
C2 40
C4 50
C3 60
And you want to match only those lines which have the same leading word, so as to build the following result:
C1 10
C2 20 40
C3 30 60
C4 50
I am trying to find a solution using pure regex, but I am stuck.
I tried the following regex, but it didn't work:
Regex: /(^\w+\b)(.*$)([\s\S]*?\n)(\1)(.*$)/gm
Substitution: $1$2$5$3
Result:
C1 10
C2 20 40
C3 30
C4 50
C3 60
As you can see, it only merges the first repeated leading word (C2), even though I used a lazy quantifier in the third capturing group.
Any help?

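You could also accomplish this without a regex, using reduce():
const data = `C1 10
C2 20
C3 30
C2 40
C4 50
C3 60`;
// Build an object that maps each leading word to its space-joined values.
const result = data.split("\n").reduce((acc, val) => {
  const vals = val.split(" ");
  if (!acc[vals[0]]) acc[vals[0]] = vals[1];
  else acc[vals[0]] += ` ${vals[1]}`;
  return acc;
}, {});
console.log(result);
// To turn the object back into lines of text:
console.log(Object.entries(result).map(([key, vals]) => `${key} ${vals}`).join("\n"));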
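The regex in the question stops after one merge because its first match swallows the "C3 30" line inside the third group, so that line can never start a match of its own in the same pass. One way to stay close to the regex approach is to apply a single replacement repeatedly until the text stops changing. The sketch below assumes that idea; the function name groupByLeadingWord and the slightly adjusted pattern are illustrative, not taken from the original posts.

```javascript
// Hypothetical helper: merges lines that share a leading word by
// re-applying one regex replacement until the text stops changing.
function groupByLeadingWord(text) {
  // ^(\w+)\b(.*)$   first line of a pair: leading word in $1, rest of line in $2
  // ([\s\S]*?)      whatever lies between the two lines (lazy), in $3
  // \n\1\b(.*)$     a later line starting with the same word, its rest in $4
  const pair = /^(\w+)\b(.*)$([\s\S]*?)\n\1\b(.*)$/m;
  let previous;
  do {
    previous = text;
    // No "g" flag: merge one pair per pass, then rescan from the top.
    text = text.replace(pair, "$1$2$4$3");
  } while (text !== previous);
  return text;
}

const input = ["C1 10", "C2 20", "C3 30", "C2 40", "C4 50", "C3 60"].join("\n");
console.log(groupByLeadingWord(input));
```

Each successful replacement removes one line, so the loop always terminates. The \b after \1 keeps a word such as C1 from matching the start of C10.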

Related

Reading CSV file lines with hex content and convert it to decimal

Here is my CSV file and its contents are in hex.
a3 42 fe 9e 89
a3 43 14 9d cd
a3 43 1e 02 82
a3 43 23 bd 85
a3 43 39 d5 83
a3 43 3e b9 8d
a3 43 3f 44 c0
a3 43 50 c9 49
a3 43 67 29 c8
a3 43 67 43 0d
I need only the second-last value and the code to extract that value is this.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

void getvalues() {
    std::ifstream data("mydata1.CSV");
    int row_count = 0;
    std::string line;
    while (std::getline(data, line)) {
        row_count += 1;
        std::stringstream lineStream(line);
        std::string cell;
        int column_count = 0;
        while (std::getline(lineStream, cell, ' ')) {
            column_count += 1;
            if (column_count == 5) {
                // Prints only the first character of the cell, not its numeric value.
                std::cout << std::dec << cell[0] << std::endl;
            }
        }
    }
}
Since I am reading the lines into a string cell, I am unable to do any conversion. At first I tried wrapping the cell with an int but it returns me the ASCII value of the character which is quite obvious and I shouldn't have done that.

If you want to convert a string to an integer, you can use std::stoi, which is declared in the <string> header. By default it parses the string as a base-10 number:
int num = std::stoi(cell);
However, since we want to parse a base-16 (hex) number, we need to pass the base as the third argument:
int num = std::stoi(cell, nullptr, 16);
Quick article about this: https://www.includehelp.com/stl/convert-hex-string-to-integer-using-stoi-function-in-cpp-stl.aspx

c++ replacing order of items in byte array

I'm writing code for Arduino in C++.
I have a byte array with hex byte values, for example:
20 32 36 20 E0 EC 20 F9 F0 E9 E9 E3 F8 5C 70 5C 70 5C 73 20 E3 E2 EC 20 F8 E0 E5 E1 EF 20 39 31 5C
There are four ASCII digits in these bytes:
HEX 0x32 is number 2 in ascii code
HEX 0x35 is number 5 in ascii code
HEX 0x39 is number 9 in ascii code
and so on....
https://www.ascii-codes.com/cp862.html
So the hex values 32, 36 represent the number 26, and 39, 31 represent 91.
I want to find these numbers and reverse each group, so that (in this example) 62 and 19 are represented instead of 26 and 91.
The output would thus have to look like this:
20 36 32 20 E0 EC 20 F9 F0 E9 E9 E3 F8 5C 70 5C 70 5C 73 20 E3 E2 EC 20 F8 E0 E5 E1 EF 20 31 39 5C
The numbers don't have to be two digits but could be anything in 0-1000
I also know that each group of such numbers is preceded by the hex value 20, if that helps.
I have done this in C# (with some help from Stack Overflow users :-) ):
string result = Regex.Replace(HexMessage1,
    @"(?<=20\-)3[0-9](\-3[0-9])*(?=\-20)",
    match => string.Join("-", Transform(match.Value.Split('-'))));
private static IEnumerable<string> Transform(string[] items)
{
    // Either terse Linq:
    // return items.Reverse();
    // Or good old for loop:
    string[] result = new string[items.Length];
    for (int i = 0; i < items.Length; ++i)
        result[i] = items[items.Length - i - 1];
    return result;
}
Can someone help me make it work in C++?

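Loop over the array, element by element, looking for 0x32 or 0x39. If found, check the next byte (if within bounds) to see whether it is 0x36 or 0x31, respectively. If it is, swap the current and the next byte, then continue the loop after both bytes. To handle arbitrary numbers rather than just the two in the example, treat any byte in the range 0x30 to 0x39 as a digit and reverse each run of consecutive digit bytes in place.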
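The scan can be generalized to numbers of any length by treating every byte from 0x30 to 0x39 as an ASCII digit and reversing each run of digits that follows a 0x20 byte. Below is a minimal sketch of that loop, written in JavaScript on a Uint8Array for brevity; it uses only index arithmetic and swaps, so it should port to a C++ byte array almost line for line. The name reverseDigitRuns is hypothetical, and the sample array is a shortened version of the message from the question.

```javascript
// Hypothetical helper: reverses, in place, every run of ASCII digit bytes
// (0x30..0x39) that directly follows a 0x20 byte.
function reverseDigitRuns(bytes) {
  const isDigit = (b) => b >= 0x30 && b <= 0x39;
  let i = 0;
  while (i < bytes.length) {
    const startsRun = isDigit(bytes[i]) && i > 0 && bytes[i - 1] === 0x20;
    if (!startsRun) {
      i += 1;
      continue;
    }
    // Find the end of the run (exclusive).
    let end = i;
    while (end < bytes.length && isDigit(bytes[end])) end += 1;
    // Swap from both ends toward the middle.
    for (let lo = i, hi = end - 1; lo < hi; lo += 1, hi -= 1) {
      const tmp = bytes[lo];
      bytes[lo] = bytes[hi];
      bytes[hi] = tmp;
    }
    i = end; // continue scanning after the run
  }
  return bytes;
}

// Shortened sample: 20 32 36 20 E0 EC 20 39 31 5C
const message = Uint8Array.from([0x20, 0x32, 0x36, 0x20, 0xe0, 0xec, 0x20, 0x39, 0x31, 0x5c]);
reverseDigitRuns(message);
```

After the call, the sample holds 20 36 32 20 E0 EC 20 31 39 5C, so 26 becomes 62 and 91 becomes 19. Only the leading 0x20 is required; the byte after a run can be anything, as with the trailing 5C in the question's data.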

CUDA - check repeated values and add two values

I have two group of arrays
a1 a2 a3 a4 a5 a6 a7 a8 <= name it as key1
b1 b2 b3 b4 b5 b6 b7 b8 <= val1
c1 c2 c3 c4 c5 c6 c7 c8
and
d1 d2 d3 d4 d5 d6 d7 d8 <= key2
e1 e2 e3 e4 e5 e6 e7 e8 <= val2
f1 f2 f3 f4 f5 f6 f7 f8
The arrays a1,...,an and d1,...,dn are sorted and may contain repeated values, e.g. 1 1 2 3 4 6 7 7 7 ... For each tuple (di, ei) I want to check whether it is equal to any tuple (aj, bj). If it is (di == aj and ei == bj), then I have to combine fi and cj using some function, e.g. addition, and store the result in fi.
Firstly, is it possible to solve this efficiently using zip iterators and transformations in the Thrust library?
Secondly, the simplest method I can imagine is to count the occurrences of each key (ai), compute a prefix sum over the counts, and use both to get the start and end index of each key. Then, for each di, use those indices to iterate over the matching range, check whether ei equals the value stored there, and perform the transformation.
i.e. If I have
1 1 2 3 5 6 7
2 3 4 5 2 4 6
2 4 5 6 7 8 5
as the first group of arrays, I count the occurrences of 1, 2, 3, 4, 5, 6, 7, ...:
2 1 1 0 1 1 1 <=name it as count
and then do prefix sum to get:
2 3 4 4 5 6 7 <= name it as cumsum
and use this to do:
for each element di,
    for i in (cumsum[di] - count[di]) to cumsum[di]:
        if ei == val1[i] then performAddition;
What I fear is that since not all threads are equal, this will lead to warp divergence, and I may not have efficient performance.

You could treat your data as two key-value tables, Table1: (a,b) -> c and Table2: (d,e) -> f, where the pairs (a,b) and (d,e) are keys, and c and f are values.
Then your problem simplifies to
foreach key in Table2
    if key in Table1
        Table2[key] += Table1[key]
If a and b are non-negative and have a limited range, such as unsigned char, a simple way to combine a and b into one key is
unsigned short key = (unsigned short)(a) * 256 + b;
If the range of key is still not too large, as in the above example, you could create your Table1 as
int Table1[65536];
Checking whether key is in Table1 then becomes
if (Table1[key] != INVALID_VALUE)
....
With all these restrictions, an implementation with Thrust should be very simple.
A similar combining method could still be used if a and b have a larger range, like int.
But if the range of key is too large, you have to go to the method suggested by Robert Crovella.

Grep: find lines only matching unknown character once

I have a list with hexadecimal lines. For example:
0b 5a 3f 5a 7d d0 5d e6 2b c4 7e 7d c2 c0 e6 9a
84 bd aa 74 f3 85 da 9d ac b6 e0 b6 62 0f b5 d5
c0 b0 f5 60 02 8b 1c a4 41 7c 53 f2 85 20 a0 d1
...
I'm trying to use grep to find all the lines that contain a character occurring only once in the line.
For example, 'd' occurs only once in the third line.
I tried this, but it's not working:
egrep '^.*([a-f0-9])[^\1]*$'

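This can be done with a regex, but it has to be verbose.
It can't really be generalized: each candidate character needs its own alternative. Note that (?: ) is not POSIX ERE syntax, so with GNU grep this needs the -P option rather than egrep.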
# ^(?:[^a]*a[^a]*|[^b]*b[^b]*|[^c]*c[^c]*|[^d]*d[^d]*|[^e]*e[^e]*|[^f]*f[^f]*|[^0]*0[^0]*|[^1]*1[^1]*|[^2]*2[^2]*|[^3]*3[^3]*|[^4]*4[^4]*|[^5]*5[^5]*|[^6]*6[^6]*|[^7]*7[^7]*|[^8]*8[^8]*|[^9]*9[^9]*)$
^
(?:
[^a]* a [^a]*
| [^b]* b [^b]*
| [^c]* c [^c]*
| [^d]* d [^d]*
| [^e]* e [^e]*
| [^f]* f [^f]*
| [^0]* 0 [^0]*
| [^1]* 1 [^1]*
| [^2]* 2 [^2]*
| [^3]* 3 [^3]*
| [^4]* 4 [^4]*
| [^5]* 5 [^5]*
| [^6]* 6 [^6]*
| [^7]* 7 [^7]*
| [^8]* 8 [^8]*
| [^9]* 9 [^9]*
)
$
To discover which character is unique, put capture groups around the letters and digits
and use a branch reset:
^
(?|
[^a]* (a) [^a]*
| [^b]* (b) [^b]*
| [^c]* (c) [^c]*
| [^d]* (d) [^d]*
| [^e]* (e) [^e]*
| [^f]* (f) [^f]*
| [^0]* (0) [^0]*
| [^1]* (1) [^1]*
| [^2]* (2) [^2]*
| [^3]* (3) [^3]*
| [^4]* (4) [^4]*
| [^5]* (5) [^5]*
| [^6]* (6) [^6]*
| [^7]* (7) [^7]*
| [^8]* (8) [^8]*
| [^9]* (9) [^9]*
)
$
This is the output:
** Grp 0 - ( pos 0 , len 50 )
0b 5a 3f 5a 7d d0 5d e6 2b c4 7e 7d c2 c0 e6 9a
** Grp 1 - ( pos 7 , len 1 )
f
-----------------------
** Grp 0 - ( pos 50 , len 51 )
84 bd aa 74 f3 85 da 9d ac b6 e0 b6 62 0f b5 d5
** Grp 1 - ( pos 77 , len 1 )
c
-----------------------
** Grp 0 - ( pos 101 , len 51 )
c0 b0 f5 60 02 8b 1c a4 41 7c 53 f2 85 20 a0 d1
** Grp 1 - ( pos 148 , len 1 )
d

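I don't know a way to do it with a regex. However, you can use this stupid awk script:
awk -F '' '{delete a; for(i=1;i<=NF;i++){a[$i]++}; for(c in a){if(a[c]==1){print;next}}}' input
The script counts the number of occurrences of every character in the line, clearing the counts with delete a at the start of each line so that totals do not carry over from earlier lines. At the end of the line it checks all totals and prints the line if at least one of them equals 1. Splitting the line into single characters with -F '' relies on an awk, such as gawk, that treats an empty field separator this way.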

Here is a piece of code that uses a number of shell tools beyond grep.
It reads the input line by line and generates a character frequency table for each line. Upon finding a character with frequency 1, it outputs that character and the entire line.
cat input | while read -r line ; do
    export line ;
    echo "$line" | grep -o . | sort | uniq -c | \
        awk '/[ ]+1[ ]/ {print $2 ":" ENVIRON["line"] ; exit }' ;
done
Note that if you are interested in the letters a-f only, you could replace grep -o . with grep -o "[a-f]"; use grep -o "[0-9a-f]" to consider all hex digits while ignoring the spaces.

Regular expression sequence matching

Is it possible to create a regular expression to find an incrementing sequence of hex numbers? I am trying to find number sequences (4 numbers long) inside seemingly random hex number strings.
... 59 fd 25 bf b1 b2 b3 b4 39 ca ...
... 35 c1 55 c4 c5 c6 c7 74 92 e1 ...
I was hoping to find the pattern b1 b2 b3 b4 in line 1 and c4 c5 c6 c7 in line 2.
Group matching will find same number sequences... /(\w\w)\1{3}/ will find c4 c4 c4 c4 but I haven't found a way to match the incrementing sequence.
Any ideas?

A regex can match text that repeats (for example via backreferences), but it cannot do arithmetic, so it has no way to express "the previous value plus one".
You are better off parsing the values and comparing them with your own code.
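Parsing the bytes and comparing neighbours could look like the sketch below. It assumes the input is a line of space-separated two-digit hex values, as in the question; the function name findIncrementingRuns and its minLength parameter are hypothetical. It returns maximal runs, so a run of five incrementing bytes is reported as one five-byte run rather than two overlapping four-byte ones, and a wrap from ff to 00 is not treated as an increment.

```javascript
// Hypothetical helper: returns every maximal run of at least `minLength`
// consecutive hex bytes in which each value is the previous one plus 1.
function findIncrementingRuns(line, minLength = 4) {
  const tokens = line.match(/\b[0-9a-f]{2}\b/gi) || [];
  const values = tokens.map((t) => parseInt(t, 16));
  const runs = [];
  let start = 0; // index where the current run begins
  for (let i = 1; i <= values.length; i += 1) {
    const continues = i < values.length && values[i] === values[i - 1] + 1;
    if (!continues) {
      // The run [start, i) has ended; keep it if it is long enough.
      if (i - start >= minLength) runs.push(tokens.slice(start, i).join(" "));
      start = i;
    }
  }
  return runs;
}

console.log(findIncrementingRuns("59 fd 25 bf b1 b2 b3 b4 39 ca")); // [ 'b1 b2 b3 b4' ]
console.log(findIncrementingRuns("35 c1 55 c4 c5 c6 c7 74 92 e1")); // [ 'c4 c5 c6 c7' ]
```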
