What is the hyperscan::Regex equivalent of regex::Regex::replace?

I'm rewriting this code:
use regex::Regex;
use std::fs;
use std::io;
fn main() {
    let contents = fs::read_to_string("/home/jerzy/trackers_best.txt")
        .expect("Something went wrong reading the file");
    println!("Give me the magnet link.");
    let mut magnet = String::new();
    io::stdin()
        .read_line(&mut magnet)
        .expect("Failed to read line");
    magnet = magnet.trim().to_string();
    let re = Regex::new(r"\&tr=.{3,1000}").expect("Failed to initialize the variable `re`.");
    magnet = re.replace(&magnet, contents).to_string();
    println!("The updated magnet link:\n{}", magnet);
}
/home/jerzy/trackers_best.txt is a text file that gets automatically downloaded every evening to my home directory from https://raw.githubusercontent.com/ngosang/trackerslist/master/trackers_best.txt with the help of wget and cron.
I want to use the hyperscan crate instead of the regex crate to improve the performance.
The new code:
use hyperscan::regex::Regex;
use std::fs;
use std::io;
fn main() {
    let contents = fs::read_to_string("/home/jerzy/trackers_best.txt")
        .expect("Something went wrong reading the file");
    println!("Give me the magnet link.");
    let mut magnet = String::new();
    io::stdin()
        .read_line(&mut magnet)
        .expect("Failed to read line");
    magnet = magnet.trim().to_string();
    let re = Regex::new(r"\&tr=.{3,1000}").expect("Failed to initialize the variable `re`.");
    magnet = re.replace(&magnet, contents).to_string();
    println!("The updated magnet link:\n{}", magnet);
}
The dependencies section of my Cargo.toml:
[dependencies]
hyperscan = "0.2.2"
This code does not compile. I get the following output from the compiler:
error[E0599]: no method named `replace` found for struct `Regex` in the current scope
--> src/main.rs:26:14
|
26 | magnet = re.replace(&magnet, contents).to_string();
| ^^^^^^^ method not found in `Regex`
How might I rewrite the line in question? What method should I use instead of .replace()?
I looked for resources about using hyperscan, but there is not much on their GitHub.
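Hyperscan itself is a match-reporting engine: it tells you where a pattern matched rather than rewriting the input. One way to approach the problem is therefore to take the offsets of a match and splice the replacement in by hand. The sketch below shows only that splicing step, using the standard library. The helper name `splice_match` is hypothetical, and the offsets are found here with `str::find` as a stand-in; how to obtain them from the hyperscan crate depends on its API and is not shown.

```rust
/// Hypothetical helper: returns `haystack` with the bytes in `start..end`
/// replaced by `replacement`. Both offsets must fall on UTF-8 char boundaries.
fn splice_match(haystack: &str, start: usize, end: usize, replacement: &str) -> String {
    let mut out = String::with_capacity(haystack.len() - (end - start) + replacement.len());
    out.push_str(&haystack[..start]);
    out.push_str(replacement);
    out.push_str(&haystack[end..]);
    out
}

fn main() {
    // Made-up sample data, for illustration only.
    let magnet = "magnet:?xt=urn:btih:abc&tr=udp://old.example:1337";
    let trackers = "&tr=udp://new.example:6969";

    // Stand-in for the matcher: locate the first "&tr=" by hand and treat the
    // rest of the string as the match. With a real matcher, `start` and `end`
    // would come from the reported match instead.
    let updated = match magnet.find("&tr=") {
        Some(start) => splice_match(magnet, start, magnet.len(), trackers),
        None => magnet.to_string(),
    };
    println!("{updated}");
}
```

Building a new String keeps the original input untouched. If mutating in place is acceptable, `String::replace_range` does the same job on an owned String.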

Related

How do I un-gzip a file without saving it?

I am new to Rust and I am trying to port Go code that I had written previously. The Go code downloaded files from S3 and gunzipped them directly (without writing to disk) before parsing them.
Currently, the only solution I have found is to save the gzipped files to disk, then gunzip and parse them.
The ideal pipeline would gunzip and parse them directly.
How can I accomplish this?
const ENV_CRED_KEY_ID: &str = "KEY_ID";
const ENV_CRED_KEY_SECRET: &str = "KEY_SECRET";
const BUCKET_NAME: &str = "bucketname";
const REGION: &str = "us-east-1";
use anyhow::{anyhow, bail, Context, Result}; // (xp) (thiserror in prod)
use aws_sdk_s3::{config, ByteStream, Client, Credentials, Region};
use std::env;
use std::io::{Write};
use tokio_stream::StreamExt;
#[tokio::main]
async fn main() -> Result<()> {
    let client = get_aws_client(REGION)?;
    let keys = list_keys(&client, BUCKET_NAME, "CELLDATA/year=2022/month=06/day=06/").await?;
    println!("List:\n{}", keys.join("\n"));
    let dir = Path::new("input/");
    let key: &str = &keys[0];
    download_file_bytes(&client, BUCKET_NAME, key, dir).await?;
    println!("Downloaded {key} in directory {}", dir.display());
    Ok(())
}
async fn download_file_bytes(client: &Client, bucket_name: &str, key: &str, dir: &Path) -> Result<()> {
    // VALIDATE
    if !dir.is_dir() {
        bail!("Path {} is not a directory", dir.display());
    }
    // create file path and parent dir(s)
    let mut file_path = dir.join(key);
    let parent_dir = file_path
        .parent()
        .ok_or_else(|| anyhow!("Invalid parent dir for {:?}", file_path))?;
    if !parent_dir.exists() {
        create_dir_all(parent_dir)?;
    }
    file_path.set_extension("json");
    // BUILD - aws request
    let req = client.get_object().bucket(bucket_name).key(key);
    // EXECUTE
    let res = req.send().await?;
    // STREAM result to file
    let mut data: ByteStream = res.body;
    let file = File::create(&file_path)?;
    let Some(bytes)= data.try_next().await?;
    let mut gzD = GzDecoder::new(&bytes);
    let mut buf_writer = BufWriter::new( file);
    while let Some(bytes) = data.try_next().await? {
        buf_writer.write(&bytes)?;
    }
    buf_writer.flush()?;
    Ok(())
}
fn get_aws_client(region: &str) -> Result<Client> {
    // get the id/secret from env
    let key_id = env::var(ENV_CRED_KEY_ID).context("Missing S3_KEY_ID")?;
    let key_secret = env::var(ENV_CRED_KEY_SECRET).context("Missing S3_KEY_SECRET")?;
    // build the aws cred
    let cred = Credentials::new(key_id, key_secret, None, None, "loaded-from-custom-env");
    // build the aws client
    let region = Region::new(region.to_string());
    let conf_builder = config::Builder::new().region(region).credentials_provider(cred);
    let conf = conf_builder.build();
    // build aws client
    let client = Client::from_conf(conf);
    Ok(client)
}
Your snippet doesn't tell where GzDecoder comes from, but I'll assume it's flate2::read::GzDecoder.
flate2::read::GzDecoder is already built in a way that it can wrap anything that implements std::io::Read:
GzDecoder::new expects an argument that implements Read => deflated data in
GzDecoder itself implements Read => inflated data out
Therefore, you can use it just like a BufReader: wrap your reader and use the wrapped value in its place:
use flate2::read::GzDecoder;
use std::fs::File;
use std::io::Cursor;
fn main() {
    // Placeholder bytes; real input must be a valid gzip stream,
    // otherwise the copy below fails at runtime
    let data = [0u8, 1, 2, 3];
    // Something that implements `std::io::Read`
    let c = Cursor::new(data);
    // A dummy output
    let mut out_file = File::create("/tmp/out").unwrap();
    // Using the raw data would look like this:
    // std::io::copy(&mut c, &mut out_file).unwrap();
    // To inflate on the fly, "pipe" the data through the decoder, i.e. wrap the reader
    let mut stream = GzDecoder::new(c);
    // Consume the `Read`er somehow
    std::io::copy(&mut stream, &mut out_file).unwrap();
}
You don't mention what "and parse them" entails, but the same concept applies: If your parser can read from an impl Read (e.g. it can read from a std::fs::File), then it can also read directly from a GzDecoder.
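To illustrate that last point, here is a minimal standard-library sketch of a "parser" that is generic over `Read`. The function name `count_records` and the sample input are hypothetical, and an in-memory `Cursor` stands in for the decoder so that the example needs no external crates.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read};

/// Hypothetical "parser": counts the non-empty lines of any `Read` source.
fn count_records<R: Read>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(reader).lines() {
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for the decoder here. A `GzDecoder`
    // wrapping another reader could be passed in the same way, because it
    // implements `Read` as well.
    let source = Cursor::new("{\"a\":1}\n\n{\"a\":2}\n");
    let n = count_records(source)?;
    println!("parsed {n} records");
    Ok(())
}
```

Because the function only asks for `R: Read`, it does not care whether the bytes come from a file, a network body, or a decompressor.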

Replacing Path parts in Rust

Suppose I have the following three paths:
let file = path::Path::new("/home/meurer/test/a/01/foo.txt");
let src = path::Path::new("/home/meurer/test/a");
let dst = path::Path::new("/home/meurer/test/b");
Now, I want to copy file into dst, but for that I need to correct the paths, so that I can have new_file with a path that resolves to /home/meurer/test/b/01/foo.txt. In other words, how do I remove src from file and then append the result to dst?
/home/meurer/test/a/01/foo.txt -> /home/meurer/test/b/01/foo.txt
Note that we can't assume that src will always be this similar to dst.
You can use Path::strip_prefix and Path::join:
use std::path::Path;
fn main() {
    let file = Path::new("/home/meurer/test/a/01/foo.txt");
    let src = Path::new("/home/meurer/test/a");
    let dst = Path::new("/home/meurer/test/b");
    let relative = file.strip_prefix(src).expect("Not a prefix");
    let result = dst.join(relative);
    assert_eq!(result, Path::new("/home/meurer/test/b/01/foo.txt"));
}
As usual, you probably don't want to use expect in your production code; it is used here only to keep the answer terse.
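A sketch of the same approach that hands the error back to the caller rather than panicking might look like this. The helper name `rebase` is hypothetical; `strip_prefix` returns a `StripPrefixError` when `src` is not a prefix of `file`, and the `?` operator propagates it.

```rust
use std::path::{Path, PathBuf, StripPrefixError};

/// Hypothetical helper: re-roots `file` from `src` onto `dst`.
fn rebase(file: &Path, src: &Path, dst: &Path) -> Result<PathBuf, StripPrefixError> {
    // Fails if `file` does not start with `src`.
    let relative = file.strip_prefix(src)?;
    Ok(dst.join(relative))
}

fn main() {
    let file = Path::new("/home/meurer/test/a/01/foo.txt");
    let src = Path::new("/home/meurer/test/a");
    let dst = Path::new("/home/meurer/test/b");

    match rebase(file, src, dst) {
        Ok(new_file) => println!("{}", new_file.display()),
        Err(err) => eprintln!("cannot rebase {}: {err}", file.display()),
    }
}
```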

Cannot use `replace_all` from the regex crate: expected (), found String

I'm trying to find and replace all instances of a string with a shortened version, and I want to maintain references to a capture if it's found.
I've written this code:
extern crate regex;
use regex::{Regex, Captures};
//... get buffer from stdin
let re = Regex::new(r"(capture something1) and (capture 2)").unwrap();
let out = re.replace_all(&buffer, |caps: &Captures| {
if let ref = caps.at(2).unwrap().to_owned() {
refs.push(ref.to_owned());
}
caps.at(1).unwrap().to_owned();
});
Unfortunately compilation fails with the error:
src/bin/remove_links.rs:16:18: 16:29 error: type mismatch resolving `for<'r, 'r> <[closure#src/bin/remove_links.rs:16:39: 22:6] as std::ops::FnOnce<(&'r regex::Captures<'r>,)>>::Output == std::string::String`:
expected (),
found struct `std::string::String` [E0271]
src/bin/remove_links.rs:16 let out = re.replace_all(&buffer, |caps: &Captures| {
^~~~~~~~~~~
src/bin/remove_links.rs:16:18: 16:29 help: run `rustc --explain E0271` to see a detailed explanation
src/bin/remove_links.rs:16:18: 16:29 note: required because of the requirements on the impl of `regex::Replacer` for `[closure#src/bin/remove_links.rs:16:39: 22:6]`
I can't make sense of it. I've also tried adding use regex::{Regex, Captures, Replacer} but that doesn't change the error at all.
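As @BurntSushi5 pointed out, your closure should return a String. Here is a complete example for future reference (it uses the same early regex API as the question; in regex 1.x, Captures::at was replaced by caps.get(1) and by indexing such as &caps[1]):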
extern crate regex;
use regex::{Regex, Captures};
fn main() {
    let buffer = "abcdef";
    let re = Regex::new(r"(\w)bc(\w)").unwrap();
    let out = re.replace_all(&buffer, |caps: &Captures| {
        caps.at(1).unwrap().to_owned()
    });
    println!("{:?}", out); // => "aef"
}

Need the Groovy way to do partial file substitutions

I have a file that I need to modify. The part I need to modify (not the entire file) is similar to the properties shown below. The problem is that I only need to replace part of the "value", the "ConfigurablePart" if you will. I receive this file, so I cannot control its format.
alpha.beta.gamma.1 = constantPart1ConfigurablePart1
alpha.beta.gamma.2 = constantPart2ConfigurablePart2
alpha.beta.gamma.3 = constantPart3ConfigurablePart3
I made this work this way, though I know it is really bad!
def updateFile(String pattern, String updatedValue) {
    def myFile = new File(".", "inputs/fileInherited.txt")
    StringBuffer updatedFileText = new StringBuffer()
    def ls = System.getProperty('line.separator')
    myFile.eachLine{ line ->
        def regex = Pattern.compile(/$pattern/)
        def m = (line =~ regex)
        if (m.matches()) {
            def buf = new StringBuffer(line)
            buf.replace(m.start(1), m.end(1), updatedValue)
            line = buf.toString()
        }
        println line
        updatedFileText.append(line).append(ls)
    }
    myFile.write(updatedFileText.toString())
}
The passed in pattern is required to contain a group that is substituted in the StringBuffer. Does anyone know how this should really be done in Groovy?
EDIT -- to define the expected output
The file that contains the example lines needs to be updated such that the "ConfigurablePart" of each line is replaced with the updated text provided. For my ugly solution, I would need to call the method 3 times, once to replace ConfigurablePart1, once for ConfigurablePart2, and finally for ConfigurablePart3. There is likely a better approach to this too!!!
UPDATED -- Answer that did what I really needed
In case others ever hit a similar issue: the Groovy code improvements I asked about are best reflected in the accepted answer. However, that did not quite solve my problem. Because I needed to substitute only a portion of the matched lines, I needed to use groups and back-references. The only way I could make this work was to define a three-part regex like:
(.*)(matchThisPart)(.*)
Once that was done, I was able to use:
it.replaceAll(~/$pattern/, "\$1$replacement\$3")
Thanks to both replies - each helped me out a lot!
It can be made more expressive by passing a closure as an argument. Here is how this can be done:
//abc.txt
abc.item.1 = someDummyItem1
abc.item.2 = someDummyItem2
abc.item.3 = someDummyItem3
alpha.beta.gamma.1 = constantPart1ConfigurablePart1
alpha.beta.gamma.2 = constantPart2ConfigurablePart2
alpha.beta.gamma.3 = constantPart3ConfigurablePart3
abc.item.4 = someDummyItem4
abc.item.5 = someDummyItem5
abc.item.6 = someDummyItem6
Groovy code:
//Replace the pattern in file and write to file sequentially.
def replacePatternInFile(file, Closure replaceText) {
    file.write(replaceText(file.text))
}
def file = new File('abc.txt')
def patternToFind = ~/ConfigurablePart/
def patternToReplace = 'NewItem'
//Call the method
replacePatternInFile(file){
    it.replaceAll(patternToFind, patternToReplace)
}
println file.getText()
//Prints:
abc.item.1 = someDummyItem1
abc.item.2 = someDummyItem2
abc.item.3 = someDummyItem3
alpha.beta.gamma.1 = constantPart1NewItem1
alpha.beta.gamma.2 = constantPart2NewItem2
alpha.beta.gamma.3 = constantPart3NewItem3
abc.item.4 = someDummyItem4
abc.item.5 = someDummyItem5
abc.item.6 = someDummyItem6
Check the contents of abc.txt to confirm. I have not used your updateFile() method, but you can easily parameterize the call as below:
def updateFile(file, patternToFind, patternToReplace){
    replacePatternInFile(file){
        it.replaceAll(patternToFind, patternToReplace)
    }
}
For a quick answer I'd just go this route:
patterns = [pattern1 : constantPart1ConfigurablePart1,
            pattern2 : constantPart2ConfigurablePart2,
            pattern3 : constantPart3ConfigurablePart3]
def myFile = new File(".", "inputs/fileInherited.txt")
StringBuffer updatedFileText = new StringBuffer()
def ls = System.getProperty('line.separator')
myFile.eachLine{ line ->
    patterns.each { pattern, replacement ->
        line = line.replaceAll(pattern, replacement)
    }
    println line
    updatedFileText.append(line).append(ls)
}
myFile.write(updatedFileText.toString())

How to access file given to cilly in my CIL module

I have added a new feature to CIL (C Intermediate Language). I am able to execute my new module using
$ cilly --dotestmodule --save-temps -D HAPPY_MOOD -o test test.c
Now, in my testmodule, I want to call Cfg.computeFileCFG for the test.c file, but I don't know how to access test.c in my module.
I tried using Cil.file, but the compiler says "Unbound value Cil.file".
My code:
open Pretty
open Cfg
open Cil
module RD = Reachingdefs
let () = Cfg.computeFileCFG Cil.file
let rec fact n = if n < 2 then 1 else n * fact(n-1)
let doIt n = fact n
let feature : featureDescr =
  { fd_name = "testmodule";
    fd_enabled = ref false;
    fd_description = "simple test 1240";
    fd_extraopt = [];
    fd_doit = (function (f: file) -> ignore (doIt 10));
    fd_post_check = true;
  }
Please tell me how to compute the CFG for the test.c file.
I am not a CIL expert, but here are a few remarks:
the CIL online documentation states that Cil.file is an OCaml type. Passing a type as an argument to a function is probably not what you want to do here;
it seems like the fd_doit function in your feature descriptor takes the file you are looking to process as its argument f;
according to the Cilly manual, the type of f is Cil.file. Conveniently, this seems to be the type of the argument required by the function computeFileCFG.
Hopefully you can take it from here. Good luck!