How to compose two calls to Regex::replace_all? - regex

With the regex and the replacement fixed, Regex::replace_all effectively has the signature fn(text: &str) -> Cow<str>. How would two calls to this be composed, f(g(x)), giving the same signature?
Here's some code I'm trying to write. This has the two calls separated out into two functions, but I couldn't get it working in one function either. Here's my lib.rs in a fresh Cargo project:
#![allow(dead_code)]
/// Plaintext and HTML manipulation.
use lazy_static::lazy_static;
use regex::Regex;
use std::borrow::Cow;
lazy_static! {
static ref DOUBLE_QUOTED_TEXT: Regex = Regex::new(r#""(?P<content>[^"]+)""#).unwrap();
static ref SINGLE_QUOTE: Regex = Regex::new(r"'").unwrap();
}
fn add_typography(text: &str) -> Cow<str> {
add_double_quotes(&add_single_quotes(text)) // Error! "returns a value referencing data owned by the current function"
}
fn add_double_quotes(text: &str) -> Cow<str> {
DOUBLE_QUOTED_TEXT.replace_all(text, "“$content”")
}
fn add_single_quotes(text: &str) -> Cow<str> {
SINGLE_QUOTE.replace_all(text, "’")
}
#[cfg(test)]
mod tests {
use crate::{add_typography};
#[test]
fn converts_to_double_quotes() {
assert_eq!(add_typography(r#""Hello""#), "“Hello”");
}
#[test]
fn converts_a_single_quote() {
assert_eq!(add_typography("Today's Menu"), "Today’s Menu");
}
}
Here's the best I could come up with, but this will get ugly fast when composing three or four functions:
fn add_typography(input: &str) -> Cow<str> {
match add_single_quotes(input) {
Cow::Owned(output) => add_double_quotes(&output).into_owned().into(),
_ => add_double_quotes(input),
}
}

A Cow contains maybe-owned data.
We can infer from what the replace_all function does that it returns borrowed data only if substitutions did not happen, otherwise it has to return new, owned data.
The problem arises when the inner call makes a substitution but the outer one does not. In that case, the outer call will simply pass its input through as Cow::Borrowed, but it borrows from the Cow::Owned value returned by the inner call, whose data now belongs to a Cow temporary that is local to add_typography(). The function would therefore return a Cow::Borrowed, but would borrow from the temporary, and that's obviously not memory-safe.
Basically, this function will only ever return borrowed data when no substitutions were made by either call. What we need is a helper that can propagate owned-ness through the call layers whenever the returned Cow is itself owned.
We can construct a .map() extension method on top of Cow that does exactly this:
use std::borrow::{Borrow, Cow};
trait CowMapExt<'a, B>
where B: 'a + ToOwned + ?Sized
{
fn map<F>(self, f: F) -> Self
where F: for <'b> FnOnce(&'b B) -> Cow<'b, B>;
}
impl<'a, B> CowMapExt<'a, B> for Cow<'a, B>
where B: 'a + ToOwned + ?Sized
{
fn map<F>(self, f: F) -> Self
where F: for <'b> FnOnce(&'b B) -> Cow<'b, B>
{
match self {
Cow::Borrowed(v) => f(v),
Cow::Owned(v) => Cow::Owned(f(v.borrow()).into_owned()),
}
}
}
Now your call site can stay nice and clean:
fn add_typography(text: &str) -> Cow<str> {
add_single_quotes(text).map(add_double_quotes)
}
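To show how this scales to three or more steps, here is a minimal sketch that uses only the standard library. The free function cow_map is a hypothetical stand-in for the .map() extension method above, and the three replacement helpers are invented for illustration; they mimic replace_all by returning Cow::Borrowed when nothing changes.

```rust
use std::borrow::Cow;

/// Hypothetical free-function variant of the `.map()` helper: if the input is
/// already owned, the result is forced to be owned as well.
fn cow_map<'a, F>(input: Cow<'a, str>, f: F) -> Cow<'a, str>
where
    F: for<'b> FnOnce(&'b str) -> Cow<'b, str>,
{
    match input {
        Cow::Borrowed(s) => f(s),
        Cow::Owned(s) => Cow::Owned(f(s.as_str()).into_owned()),
    }
}

// Illustrative stand-ins for `Regex::replace_all`: borrow when nothing changes.
fn curl_apostrophes(text: &str) -> Cow<'_, str> {
    if text.contains('\'') {
        Cow::Owned(text.replace('\'', "’"))
    } else {
        Cow::Borrowed(text)
    }
}

fn collapse_spaces(text: &str) -> Cow<'_, str> {
    if text.contains("  ") {
        Cow::Owned(text.replace("  ", " "))
    } else {
        Cow::Borrowed(text)
    }
}

fn ellipsize(text: &str) -> Cow<'_, str> {
    if text.contains("...") {
        Cow::Owned(text.replace("...", "…"))
    } else {
        Cow::Borrowed(text)
    }
}

/// Three steps chained; the result borrows only if no step changed anything.
fn tidy(text: &str) -> Cow<'_, str> {
    let step1 = curl_apostrophes(text);
    let step2 = cow_map(step1, collapse_spaces);
    cow_map(step2, ellipsize)
}
```

Calling tidy("plain") should hand back a borrowed value, while any input that triggers at least one replacement comes back owned.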


How to mock `std::fs::File` so can check if `File::set_len` was used correctly in unit tests?

As far as I can tell, File::set_len(..) is implemented directly on the File struct, not via a trait.
Goal: test foo, which takes a file opened for reading and writing and performs reads, writes, seeks, and finally trims the file to a certain size. We would like to provide the initial state of the file in the test and check the result, preferably in memory.
How can I test code that relies on set_len? (io::Seek and other traits haven't helped so far.)
I would like to mock it.
Let's make a toy example, to make discussion easier:
#![allow(unused_variables)]
use std::error::Error;
use std::fs::File;
use std::io::Cursor;
// assumes that file is open in Read/Write mode
// foo performs reads and writes and Seeks
// at the end wants to trim size of file to certain size.
fn foo(f: &mut File) -> Result<(), Box<dyn Error>> {
f.set_len(0)?;
Ok(())
}
fn main () -> Result<(), Box<dyn Error>> {
let mut buf = Vec::new();
let mut mockfile = Cursor::new(&buf);
// we would like to supply foo
// with "test" representation of initial file state
foo(&mut mockfile)
// and check afterwards if resulting contents (=> size)
// of file match expectations
}
on rust-play : https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=950a94504168d51f043966288fae3bca
Error:
error[E0308]: mismatched types
--> src/main.rs:15:9
|
15 | foo(&mut mockfile)
| ^^^^^^^^^^^^^ expected struct `File`, found struct `std::io::Cursor`
P.S. before receiving answers I started giving a shot to tempfile crate: https://docs.rs/tempfile/3.1.0/tempfile/#structs . Still, ideal solution is "in-memory" so can't wait for answers to question :).
In short, you can't mock std::fs::File given a function that requires that exact type - it's just not how Rust works.
However, if you have control over foo, you can easily invent a trait that has set_len and make foo generic over that trait. Since it's your trait, you can implement it for types defined elsewhere (such as File), which will make foo() accept File as it did before. But it will also accept anything else that implements the trait, including the mock types you create in the test suite. And thanks to monomorphization, its execution will be just as efficient as the original code. For example:
use std::error::Error;
use std::fs::File;
use std::io;
pub trait SetLen {
fn set_len(&mut self, len: u64) -> io::Result<()>;
}
impl SetLen for File {
fn set_len(&mut self, len: u64) -> io::Result<()> {
File::set_len(self, len)
}
}
pub fn foo(f: &mut impl SetLen) -> Result<(), Box<dyn Error>> {
f.set_len(0)?;
Ok(())
}
// You can always pass a `File` to `foo()`:
fn main() -> Result<(), Box<dyn Error>> {
let mut f = File::create("bla")?;
foo(&mut f)?;
Ok(())
}
To mock it, you would just define a type that implements the trait and records whether it's been called:
#[derive(Debug, Default)]
struct MockFile {
set_len_called: Option<u64>,
}
impl SetLen for MockFile {
fn set_len(&mut self, len: u64) -> io::Result<()> {
self.set_len_called = Some(len);
Ok(())
}
}
#[test]
fn test_set_len_called() {
let mut mf = MockFile::default();
foo(&mut mf).unwrap();
assert_eq!(mf.set_len_called, Some(0));
}
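If the test should also check the resulting contents in memory, as the question asks, one possible sketch is to implement the same trait for Cursor<Vec<u8>>. The trim_to function is a hypothetical stand-in for foo, and resizing the underlying Vec is an assumption about what "set_len" should mean for an in-memory buffer (it mirrors File::set_len in that it zero-fills on growth and leaves the cursor position alone).

```rust
use std::io::{self, Cursor};

pub trait SetLen {
    fn set_len(&mut self, len: u64) -> io::Result<()>;
}

// In-memory "file": truncating or extending resizes the backing Vec.
impl SetLen for Cursor<Vec<u8>> {
    fn set_len(&mut self, len: u64) -> io::Result<()> {
        // Assumes `len` fits in usize, which is fine for small test buffers.
        self.get_mut().resize(len as usize, 0);
        Ok(())
    }
}

/// Hypothetical stand-in for `foo`: trims the "file" to `len` bytes.
pub fn trim_to(f: &mut impl SetLen, len: u64) -> io::Result<()> {
    f.set_len(len)
}
```

A test can then start from Cursor::new(b"hello world".to_vec()), call trim_to, and inspect the buffer with into_inner().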

Unit testing, mocking and traits in rust [duplicate]

I am currently building an application which relies heavily on file I/O, so naturally many parts of my code call File::open(file).
Integration tests are fine: I can easily set up folders with the files and scenarios needed for them.
The problem comes when I want to unit test individual code branches. I know there are lots of mocking libraries out there, but I feel my biggest problem is the code design itself.
For instance, if I wrote the same code in an object-oriented language (Java in this example), I could write some interfaces and, in tests, simply override the default behavior I want to mock: plug in a fake ClientRepository reimplemented with a fixed return value, or use a mocking framework such as Mockito.
public interface ClientRepository {
Client getClient(int id)
}
public class ClientRepositoryDB {
private ClientRepository repository;
//getters and setters
public Client getClientById(int id) {
Client client = repository.getClient(id);
//Some data manipulation and validation
}
}
But I couldn't manage to get the same result in Rust, since we end up mixing data with behavior.
The RefCell documentation has an example similar to the Java one above. Some answers point to traits, closures, or conditional compilation.
Consider a couple of scenarios to test. The first is a public function in some mod.rs:
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct SomeData {
pub name: Option<String>,
pub address: Option<String>,
}
pub fn get_some_data(file_path: PathBuf) -> Option<SomeData> {
let mut contents = String::new();
match File::open(file_path) {
Ok(mut file) => {
match file.read_to_string(&mut contents) {
Ok(result) => result,
Err(_err) => panic!("Problem reading file"),
};
}
Err(_err) => panic!("File not found"),
}
// use serde to parse the file contents
let some_data: SomeData = match serde_json::from_str(&contents) {
Ok(some_data) => some_data,
Err(err) => panic!("An error occurred while parsing: {:?}", err),
};
// we might do some checks here and return Some(some_data) or None
Some(some_data)
}
mod test {
use super::*;
#[test]
fn test_if_scenario_a_happen() -> std::io::Result<()> {
// tied to File::open
let result = get_some_data(PathBuf::new());
assert!(result.is_some());
Ok(())
}
#[test]
fn test_if_scenario_b_happen() -> std::io::Result<()> {
// We might need to write two files, but what we want to test is the logic, not the file loading itself
let result = get_some_data(PathBuf::new());
assert!(result.is_none());
Ok(())
}
}
The second is the same function turned into a trait, with a struct implementing it:
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct SomeData {
pub name: Option<String>,
pub address: Option<String>,
}
trait GetSomeData {
fn get_some_data(&self, file_path: PathBuf) -> Option<SomeData>;
}
pub struct SomeDataService {}
impl GetSomeData for SomeDataService {
fn get_some_data(&self, file_path: PathBuf) -> Option<SomeData> {
let mut contents = String::new();
match File::open(file_path) {
Ok(mut file) => {
match file.read_to_string(&mut contents) {
Ok(result) => result,
Err(_err) => panic!("Problem reading file"),
};
}
Err(_err) => panic!("File not found"),
}
// use serde to parse the file contents
let some_data: SomeData = match serde_json::from_str(&contents) {
Ok(some_data) => some_data,
Err(err) => panic!("An error occurred while parsing: {:?}", err),
};
// we might do some checks here and return Some(some_data) or None
Some(some_data)
}
}
impl SomeDataService {
pub fn do_something_with_data(&self) -> Option<SomeData> {
self.get_some_data(PathBuf::new())
}
}
mod test {
use super::*;
#[test]
fn test_if_scenario_a_happen() -> std::io::Result<()> {
// tied to File::open
let service = SomeDataService {};
let result = service.do_something_with_data();
assert!(result.is_some());
Ok(())
}
}
In both examples we have a hard time unit testing, since the code is tied to File::open, and this surely extends to any non-deterministic function, such as time, database connections, etc.
How would you design this or similar code to make it easier to unit test and better designed?
One way is to make get_some_data() generic over the input stream. The std::io module defines a Read trait for all things you can read from, so it could look like this (untested):
use std::io::Read;
pub fn get_some_data(mut input: impl Read) -> Option<SomeData> {
let mut contents = String::new();
input.read_to_string(&mut contents).unwrap();
...
}
You'd call get_some_data() with the input, e.g. get_some_data(File::open(file_name).unwrap()) or get_some_data(io::stdin().lock()), etc. When testing, you can prepare the input in a string and call it as get_some_data(io::Cursor::new(prepared_data)).
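As a minimal, self-contained sketch of this idea (without serde), the hypothetical function below is generic over Read and returns the first non-empty line of its input. The function name and its parsing rule are invented for illustration; the point is only that the same code accepts a File in production and a Cursor in tests.

```rust
use std::io::{self, Read};

/// Hypothetical example: read everything from `input` and return the first
/// non-empty line, trimmed, or `None` if there is no such line.
fn first_nonempty_line(mut input: impl Read) -> io::Result<Option<String>> {
    let mut contents = String::new();
    input.read_to_string(&mut contents)?;
    Ok(contents
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .map(str::to_owned))
}
```

In a test, the input can be prepared in memory, for example first_nonempty_line(io::Cursor::new("\n  Alice \nBob")), with no file system involved.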
As for the trait example, I think you misunderstood how to apply the pattern to your code. You're supposed to use the trait to decouple getting the data from processing the data, sort of how you'd use an interface in Java. The get_some_data() function would receive an object known to implement the trait.
Code more similar to what you'd find in an OO language might choose to use a trait object:
trait ProvideData {
fn get_data(&self) -> String;
}
struct FileData(PathBuf);
impl ProvideData for FileData {
fn get_data(&self) -> String {
// read_to_string borrows the path and returns the contents as a String
std::fs::read_to_string(&self.0).unwrap()
}
}
pub fn get_some_data(data_provider: &dyn ProvideData) -> Option<SomeData> {
let contents = data_provider.get_data();
...
}
// normal invocation:
// let some_data = get_some_data(&FileData("file name".into()));
In test you'd just create a different implementation of the trait - for example:
#[cfg(test)]
mod test {
use super::*;
struct StaticData(&'static str);
impl ProvideData for StaticData {
fn get_data(&self) -> String {
self.0.to_string()
}
}
#[test]
fn test_something() {
let some_data = get_some_data(&StaticData("foo bar"));
assert!(...);
}
}
First of all, I would like to thank @user4815162342 for the enlightenment on traits. Using his answer as a base, I came up with my own solution to the problem.
First, as mentioned, I built traits to better design my code:
trait ProvideData {
fn get_data(&self) -> String;
}
But I ran into some problems, since there was a lot of badly designed code, and a lot of code I had to mock before running the tests, something like the code below.
pub fn some_function() -> Result<()> {
let some_data1 = some_non_deterministic_function(PathBuf::new())?;
let some_data2 = some_non_deterministic_function_2(some_data1);
match some_data2 {
Ok(ok) => Ok(()),
Err(err) => panic!("something went wrong"),
}
}
I would need to change almost all function signatures to accept Fn (or trait object) parameters. This would not only change most of my code but also make it harder to read, since most of the changes would be for testing purposes only.
pub fn some_function(func1: Box<dyn ProvideData>, func2: Box<dyn SomeOtherFunction>) -> Result<()> {
let some_data1 = func1(PathBuf::new())?;
let some_data2 = func2(some_data1);
match some_data2 {
Ok(ok) => Ok(()),
Err(err) => panic!("something went wrong"),
}
}
After reading the Rust documentation a little more deeply, I slightly changed the implementation.
I changed almost all of my code to use traits and structs (a lot of the code consisted of public functions):
trait ProvideData {
fn get_data(&self) -> String;
}
struct FileData(PathBuf);
impl ProvideData for FileData {
fn get_data(&self) -> String {
String::from(format!("Pretend there is something going on here with file {}", self.0.to_path_buf().display()))
}
}
I added a new() constructor to each struct that installs the default implementation behind a trait object (dynamic dispatch):
struct SomeData(Box<dyn ProvideData>);
impl SomeData {
pub fn new() -> SomeData {
let file_data = FileData(PathBuf::new());
SomeData {
0: Box::new(file_data)
}
}
pub fn get_some_data(&self) -> Option<String> {
let contents = self.0.get_data();
Some(contents)
}
}
Since the field holding the provider is private, users of the struct cannot inject their own implementation, yet we can freely swap the internal implementation for testing purposes, and the integration tests keep running smoothly.
fn main() {
// When the user calls this function, they do not know that there are multiple implementations behind it.
let some_data = SomeData::new();
assert_eq!(Some(String::from("Pretend there is something going on here with file ")),some_data.get_some_data());
println!("HEY WE CHANGE THE INJECT WITHOUT USER INTERATION");
}
And finally, since the tests live inside the declaring module, they can change the injected implementation even though the field is private:
mod test {
use super::*;
struct MockProvider();
impl ProvideData for MockProvider {
fn get_data(&self) -> String {
String::from("Mocked data")
}
}
#[test]
fn test_internal_data() {
let some_data = SomeData(Box::from(MockProvider()));
assert_eq!(Some(String::from("Mocked data")), some_data.get_some_data())
}
#[test]
fn test_ne_internal_data() {
let some_data = SomeData(Box::from(MockProvider()));
assert_ne!(Some(String::from("Not the expected data")), some_data.get_some_data())
}
}
The resulting code can be seen in the Rust playground; I hope this helps others design their code.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=62348977502accfed55fa4600d149bcd

Comparing functions for equality in Rust

I have a function which takes a number as an argument, and then returns a function based on the number. Depending on many different things, it might return any of ~50 functions, and the cases for which one it should return get pretty complicated. As such, I want to build some tests to make sure the proper functions are being returned. What I have so far looks roughly like this.
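// Assumed definition; the original post does not show SomeStruct.
struct SomeStruct {
a: i32,
b: i32,
}
fn pick_a_function(decider: u32) -> fn(&mut SomeStruct) {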
match decider {
1 => add,
2 => sub,
_ => zero,
}
}
fn add(x: &mut SomeStruct) {
x.a += x.b;
}
fn sub(x: &mut SomeStruct) {
x.a -= x.b;
}
fn zero(x: &mut SomeStruct) {
x.a = 0;
}
fn main() {
let mut x = SomeStruct { a: 2, b: 3 };
pick_a_function(1)(&mut x);
println!("2 + 3 = {}", x.a);
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn picks_correct_function() {
assert_eq!(pick_a_function(1), add);
}
}
The problem is that the functions don't seem to implement the Eq or PartialEq traits, so assert_eq! just says that it can't compare them. What options do I have for comparing the returned function to the correct function?
So it turns out that functions in Rust actually do implement PartialEq as long as there is no lifetime attached, and as long as the function takes fewer than 10 arguments. This restriction exists because each form of function signature has to have the traits implemented directly, since the compiler considers all of them to be completely unrelated types.
The functions I was returning took a mutable reference to a struct, which implicitly gives the function a lifetime, so they no longer had a type signature which implemented PartialEq. All that Rust really does internally to compare functions for equality, though, is cast both of them to pointers and then compare, so we can just do the same thing ourselves.
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn picks_correct_function() {
assert_eq!(
pick_a_function(1) as usize,
add as usize
);
}
}
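The cast can be wrapped in a small helper so that tests read more naturally. The sketch below is self-contained; the same_fn helper and the i32 fields of SomeStruct are assumptions made for illustration. Note that function addresses are not guaranteed to be unique in every build configuration (for example, the optimizer may merge functions with identical bodies), so address comparison is best treated as a test convenience.

```rust
struct SomeStruct {
    a: i32,
    b: i32,
}

fn add(x: &mut SomeStruct) {
    x.a += x.b;
}

fn sub(x: &mut SomeStruct) {
    x.a -= x.b;
}

fn pick_a_function(decider: u32) -> fn(&mut SomeStruct) {
    match decider {
        1 => add,
        _ => sub,
    }
}

/// Hypothetical helper: compares two function pointers by address.
fn same_fn(lhs: fn(&mut SomeStruct), rhs: fn(&mut SomeStruct)) -> bool {
    lhs as usize == rhs as usize
}
```

A test can then assert same_fn(pick_a_function(1), add) directly, since the function item add coerces to the function pointer type at the call site.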
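You should compare the results instead of the functions themselves, for example:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn picks_correct_function() {
// Run the picked function and the expected one on identical inputs
let mut picked = SomeStruct { a: 2, b: 3 };
let mut expected = SomeStruct { a: 2, b: 3 };
pick_a_function(1)(&mut picked);
add(&mut expected);
assert_eq!(picked.a, expected.a);
}
}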
Or, in more complex scenarios, you can compare the inputs: if one function takes one parameter and another takes two, try calling each of them and see whether you get a compiler error.

How do I 'stop' the spread of generic type definitions when translating C++ interface classes to Rust traits? [duplicate]

I have a configuration struct that looks like this:
struct Conf {
list: Vec<String>,
}
The implementation was internally populating the list member, but now I have decided that I want to delegate that task to another object. So I have:
trait ListBuilder {
fn build(&self, list: &mut Vec<String>);
}
struct Conf<T: Sized + ListBuilder> {
list: Vec<String>,
builder: T,
}
impl<T> Conf<T>
where
T: Sized + ListBuilder,
{
fn init(&mut self) {
self.builder.build(&mut self.list);
}
}
impl<T> Conf<T>
where
T: Sized + ListBuilder,
{
pub fn new(lb: T) -> Self {
let mut c = Conf {
list: vec![],
builder: lb,
};
c.init();
c
}
}
That seems to work fine, but now everywhere that I use Conf, I have to change it:
fn do_something(c: &Conf) {
// ...
}
becomes
fn do_something<T>(c: &Conf<T>)
where
T: ListBuilder,
{
// ...
}
Since I have many such functions, this conversion is painful, especially since most usages of the Conf class don't care about the ListBuilder - it's an implementation detail. I'm concerned that if I add another generic type to Conf, now I have to go back and add another generic parameter everywhere. Is there any way to avoid this?
I know that I could use a closure instead for the list builder, but I have the added constraint that my Conf struct needs to be Clone, and the actual builder implementation is more complex and has several functions and some state in the builder, which makes a closure approach unwieldy.
While generic types can seem to "infect" the rest of your code, that's exactly why they are beneficial! Knowing exactly which type is used, and how big it is, allows the compiler to make better optimization decisions.
That being said, it can be annoying! If you have a small number of types that implement your trait, you can also construct an enum of those types and delegate to the child implementations:
enum MyBuilders {
User(FromUser),
File(FromFile),
}
impl ListBuilder for MyBuilders {
fn build(&self, list: &mut Vec<String>) {
use MyBuilders::*;
match self {
User(u) => u.build(list),
File(f) => f.build(list),
}
}
}
// Support code
trait ListBuilder {
fn build(&self, list: &mut Vec<String>);
}
struct FromUser;
impl ListBuilder for FromUser {
fn build(&self, list: &mut Vec<String>) {}
}
struct FromFile;
impl ListBuilder for FromFile {
fn build(&self, list: &mut Vec<String>) {}
}
Now the concrete type would be Conf<MyBuilders>, which you can use a type alias to hide.
I've used this to good effect when I wanted to be able to inject test implementations into code during testing, but had a fixed set of implementations that were used in the production code.
The enum_dispatch crate helps construct this pattern.
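To make the type-alias step concrete, here is a minimal sketch. The name GenericConf, the entry_count function, and the strings pushed by the two builders are invented for illustration; everything derives Clone because the question requires Conf to be Clone.

```rust
trait ListBuilder {
    fn build(&self, list: &mut Vec<String>);
}

#[derive(Clone)]
struct FromUser;
impl ListBuilder for FromUser {
    fn build(&self, list: &mut Vec<String>) {
        list.push("from user".to_string());
    }
}

#[derive(Clone)]
struct FromFile;
impl ListBuilder for FromFile {
    fn build(&self, list: &mut Vec<String>) {
        list.push("from file".to_string());
    }
}

// The enum delegates to whichever concrete builder it wraps.
#[derive(Clone)]
enum MyBuilders {
    User(FromUser),
    File(FromFile),
}

impl ListBuilder for MyBuilders {
    fn build(&self, list: &mut Vec<String>) {
        match self {
            MyBuilders::User(u) => u.build(list),
            MyBuilders::File(f) => f.build(list),
        }
    }
}

#[derive(Clone)]
struct GenericConf<T: ListBuilder> {
    list: Vec<String>,
    builder: T,
}

impl<T: ListBuilder> GenericConf<T> {
    fn new(builder: T) -> Self {
        let mut list = Vec::new();
        builder.build(&mut list);
        GenericConf { list, builder }
    }
}

// The alias hides the generic parameter from the rest of the code.
type Conf = GenericConf<MyBuilders>;

// Callers keep the original, non-generic signature.
fn entry_count(c: &Conf) -> usize {
    c.list.len()
}
```

Functions such as entry_count never mention the builder type, so adding another variant to MyBuilders should not ripple through their signatures.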
You can use the trait object Box<dyn ListBuilder> to hide the type of the builder. Some of the consequences are dynamic dispatch (calls to the build method will go through a virtual function table), additional memory allocation (boxed trait object), and some restrictions on the trait ListBuilder.
trait ListBuilder {
fn build(&self, list: &mut Vec<String>);
}
struct Conf {
list: Vec<String>,
builder: Box<dyn ListBuilder>,
}
impl Conf {
fn init(&mut self) {
self.builder.build(&mut self.list);
}
}
impl Conf {
pub fn new<T: ListBuilder + 'static>(lb: T) -> Self {
let mut c = Conf {
list: vec![],
builder: Box::new(lb),
};
c.init();
c
}
}
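One restriction worth spelling out: the question requires Conf to be Clone, and Box<dyn ListBuilder> is not Clone on its own. A common workaround, sketched here under the assumption that every builder is itself Clone, is a helper supertrait with a clone_box method. The names ListBuilderClone, clone_box, and Fixed are hypothetical.

```rust
trait ListBuilder: ListBuilderClone {
    fn build(&self, list: &mut Vec<String>);
}

// Helper supertrait that lets a boxed trait object clone itself.
trait ListBuilderClone {
    fn clone_box(&self) -> Box<dyn ListBuilder>;
}

// Blanket impl: any cloneable builder gets `clone_box` for free.
impl<T: ListBuilder + Clone + 'static> ListBuilderClone for T {
    fn clone_box(&self) -> Box<dyn ListBuilder> {
        Box::new(self.clone())
    }
}

impl Clone for Box<dyn ListBuilder> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}

#[derive(Clone)]
struct Conf {
    list: Vec<String>,
    builder: Box<dyn ListBuilder>,
}

impl Conf {
    pub fn new<T: ListBuilder + 'static>(lb: T) -> Self {
        let mut list = Vec::new();
        lb.build(&mut list);
        Conf { list, builder: Box::new(lb) }
    }
}

// Illustrative builder that appends a fixed set of entries.
#[derive(Clone)]
struct Fixed(Vec<String>);

impl ListBuilder for Fixed {
    fn build(&self, list: &mut Vec<String>) {
        list.extend(self.0.iter().cloned());
    }
}
```

With this in place, #[derive(Clone)] on Conf works, and each clone carries its own boxed copy of the builder.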

Creating an `std::env::Args` iterator for testing

Is there a way in Rust to create a std::env::Args from a Vec<String> in order to use it in a #[test] function?
I wish to test a function that gets a std::env::Args as an argument, but I don't know how to create such an object with a list of arguments I supply for the test.
I wasn't able to figure this one out from the docs, the source nor from Google searches.
The fields of std::env::Args are not documented, and there doesn't appear to be a public function to create one with custom fields. So, you're outta luck there.
But since it's just "An iterator over the arguments of a process, yielding a String value for each argument" your functions can take a String iterator or Vec without any loss of functionality or type safety. Since it's just a list of Strings, it doesn't make much sense to arbitrarily limit your functions to strings which happen to come from the command line.
Looking through Rust's own tests, that's just what they do. There's a lot of let args: Vec<String> = env::args().collect();
There's even an example in rustbuild where they strip off the name of the program and just feed the list of arguments.
use std::env;
use bootstrap::{Config, Build};
fn main() {
let args = env::args().skip(1).collect::<Vec<_>>();
let config = Config::parse(&args);
Build::new(config).build();
}
And bootstrap::Config::parse() looks like so:
impl Config {
pub fn parse(args: &[String]) -> Config {
let flags = Flags::parse(&args);
...
I'm not a Rust expert, but that seems to be how the Rust folks handle the problem.
@Schwern's answer is good and it led me to this simpler version. Since std::env::Args implements Iterator with Item = String you can do this:
use std::env;
fn parse<T>(args: T)
where
T: Iterator<Item = String>,
{
for arg in args {
// arg: String
print!("{}", arg);
}
}
fn main() {
parse(env::args());
}
To test, you provide parse with an iterator over String:
#[test]
fn test_parse() {
let args = ["arg1", "arg2"].iter().map(|s| s.to_string());
parse(args);
}
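For a test to assert anything, the function needs to return a value rather than print. As a sketch, the hypothetical flags function below skips the program name and collects every argument starting with "--"; the name and the filtering rule are invented for illustration.

```rust
/// Hypothetical example: skip the program name and collect arguments that
/// look like long flags.
fn flags<I>(args: I) -> Vec<String>
where
    I: Iterator<Item = String>,
{
    args.skip(1).filter(|arg| arg.starts_with("--")).collect()
}
```

In production this would be called as flags(std::env::args()), while a test can pass any iterator of String values, for example one built from a Vec.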
I've written a little macro to make this easier, based on @Rossman's answer (and therefore also based on @Schwern's answer; thanks go to both):
macro_rules! make_string_iter {
($($element: expr), *) => {
{
let mut v = Vec::new();
$( v.push(String::from($element)); )*
v.into_iter()
}
};
}
It can be used like this:
macro_rules! make_string_iter {
($($element: expr), *) => {
{
let mut v = Vec::new();
$( v.push(String::from($element)); )*
v.into_iter()
}
};
}
// We're using this function to test our macro
fn print_args<T: Iterator<Item = String>>(args: T) {
for item in args {
println!("{}", item);
}
}
fn main() {
// Prints a, b and c
print_args(make_string_iter!("a", "b", "c"))
}
Or try it out on the Rust Playground.
I'm not (yet) an expert in rust, any suggestions are highly welcome :)