"Borrowed Value Does Not Live Long Enough" when pushing into a vector - list

I am trying a daily programmer problem to shuffle a list of arguments and output them.
I'm not sure if this is the correct approach but it sounded like a good idea: remove the element from the args vector so it doesn't get repeated, and insert it into the result vector.
extern crate rand; // 0.7.3
use std::io;
use std::cmp::Ordering;
use std::env;
use rand::Rng;
fn main() {
    let mut args: Vec<_> = env::args().collect();
    let mut result: Vec<_> = Vec::with_capacity(args.capacity());
    if args.len() > 1 {
        println!("There are(is) {} argument(s)", args.len() - 1)
    }
    for x in args.iter().skip(1) {
        let mut n = rand::thread_rng().gen_range(1, args.len());
        result.push(&args.swap_remove(n));
    }
    for y in result.iter() {
        println!("{}", y);
    }
}
I get the error:
error[E0716]: temporary value dropped while borrowed
  --> src/main.rs:18:22
   |
18 |         result.push(&args.swap_remove(n));
   |                      ^^^^^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
   |                      |
   |                      creates a temporary which is freed while still in use
...
21 |     for y in result.iter() {
   |              ------ borrow later used here
   |
   = note: consider using a `let` binding to create a longer lived value
Older compilers said:
error[E0597]: borrowed value does not live long enough
  --> src/main.rs:18:42
   |
18 |         result.push(&args.swap_remove(n));
   |                      ------------------- ^ temporary value dropped here while still borrowed
   |                      |
   |                      temporary value created here
...
24 | }
   | - temporary value needs to live until here
   |
   = note: consider using a `let` binding to increase its lifetime

Let's start with a smaller example. This is called a Minimal, Reproducible Example, and it is very valuable both for you as a programmer and for us when answering your question. Additionally, it can run on the Rust Playground, which is convenient.
fn main() {
    let mut args = vec!["a".to_string()];
    let mut result = vec![];
    for _ in args.iter() {
        let n = args.len() - 1; // Pretend this is a random index
        result.push(&args.swap_remove(n));
    }
    for y in result.iter() {
        println!("{}", y);
    }
}
The problem arises because when you call swap_remove, the item is moved out of the vector and given to you: ownership is transferred. You then take a reference to the item and try to store that reference in the result vector. But nothing owns the item, so it is a temporary that is dropped at the end of that statement. If you were allowed to keep that reference, it would be a dangling reference, one that points to invalid memory. Using that reference could cause a crash, so Rust prevents it.
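The compiler's note about a `let` binding can be illustrated with a small sketch. The helper name `describe_removed` is made up for this example. Binding the removed String gives it an owner, so borrowing it is fine for as long as that binding is in scope. It would not be enough for the loop in the question, where the reference has to outlive the iteration.

```rust
/// Hypothetical helper: removes element `n` and describes it.
/// Like `swap_remove` itself, this panics if `n` is out of bounds.
fn describe_removed(args: &mut Vec<String>, n: usize) -> String {
    // `removed` owns the String that `swap_remove` moved out of the vector.
    let removed: String = args.swap_remove(n);
    // Borrowing is fine here: the owner outlives the reference.
    let borrowed: &String = &removed;
    format!("removed {} ({} bytes)", borrowed, borrowed.len())
    // `removed` is dropped here, after the last use of `borrowed`.
}

fn main() {
    let mut args = vec!["a".to_string(), "bc".to_string()];
    println!("{}", describe_removed(&mut args, 1));
    println!("left: {:?}", args);
}
```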
The immediate fix is to not take a reference, but instead transfer ownership from one vector to the other. Something like:
for _ in args.iter() {
    let n = args.len() - 1; // Pretend this is a random index
    result.push(args.swap_remove(n));
}
The problem with this is that you will get
error[E0502]: cannot borrow `args` as mutable because it is also borrowed as immutable
 --> src/main.rs:7:21
  |
5 |     for _ in args.iter() {
  |              -----------
  |              |
  |              immutable borrow occurs here
  |              immutable borrow later used here
6 |         let n = args.len() - 1;
7 |         result.push(args.swap_remove(n));
  |                     ^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
See the args.iter? That creates an iterator that refers to the vector. If you changed the vector, then the iterator would become invalid, and allow access to an item that may not be there, another potential crash that Rust prevents.
I'm not making any claim that this is a good way to do it, but one solution would be to iterate while there are still items:
while !args.is_empty() {
    let n = args.len() - 1; // Pretend this is a random index
    result.push(args.swap_remove(n));
}
I'd solve the overall problem by using shuffle:
use rand::seq::SliceRandom; // 0.8.3
use std::env;
fn main() {
    let mut args: Vec<_> = env::args().skip(1).collect();
    args.shuffle(&mut rand::thread_rng());
    for y in &args {
        println!("{}", y);
    }
}

Related

Rust regexes live long enough for match but not find

I'm trying to understand why behavior for the match regex is different from the behavior for find, from documentation here.
I have the following for match:
use regex::Regex;
{
    let meow = String::from("This is a long string that I am testing regexes on in rust.");
    let re = Regex::new("I").unwrap();
    let x = re.is_match(&meow);
    dbg!(x)
}
And get:
[src/lib.rs:142] x = true
Great, now let's identify the location of the match:
{
    let meow = String::from("This is a long string that I am testing regexes on in rust.");
    let re = Regex::new("I").unwrap();
    let x = re.find(&meow).unwrap();
    dbg!(x)
}
And I get:
let x = re.find(&meow).unwrap();
^^^^^ borrowed value does not live long enough
}
^ `meow` dropped here while still borrowed
`meow` does not live long enough
I think I'm following the documentation. Why does the string meow live long enough for a match but not long enough for find?
Writing a value without ; at the end of a { } scope effectively returns that value out of the scope. For example:
fn main() {
    let x = {
        let y = 10;
        y + 1
    };
    dbg!(x);
}
[src/main.rs:7] x = 11
Here, because we don't write a ; after the y + 1, it gets returned from the inner scope and written to x.
If you write a ; after it, you will get something different:
fn main() {
    let x = {
        let y = 10;
        y + 1;
    };
    dbg!(x);
}
[src/main.rs:7] x = ()
Here you can see that the ; now prevents the value from being returned. Because no value gets returned from the inner scope, the block implicitly evaluates to the unit type (), which gets stored in x.
The same happens in your code:
use regex::Regex;
fn main() {
    let z = {
        let meow = String::from("This is a long string that I am testing regexes on in rust.");
        let re = Regex::new("I").unwrap();
        let x = re.is_match(&meow);
        dbg!(x)
    };
    dbg!(z);
}
[src/main.rs:9] x = true
[src/main.rs:12] z = true
Because you don't write a ; after the dbg!() call, its return value gets returned from the inner scope. The dbg!() macro simply returns the value that gets passed to it, so the return value of the inner scope is x. And because x is just a bool, which borrows nothing, it gets returned without a problem.
Now let's look at your second example:
use regex::Regex;
fn main() {
    let z = {
        let meow = String::from("This is a long string that I am testing regexes on in rust.");
        let re = Regex::new("I").unwrap();
        let x = re.find(&meow).unwrap();
        dbg!(x)
    };
    dbg!(z);
}
error[E0597]: `meow` does not live long enough
  --> src/main.rs:8:25
   |
4  |     let z = {
   |         - borrow later stored here
...
8  |         let x = re.find(&meow).unwrap();
   |                         ^^^^^ borrowed value does not live long enough
9  |         dbg!(x)
10 |     };
   |     - `meow` dropped here while still borrowed
And now it should be more obvious what's happening: It's basically the same as the previous example, just that the returned x is now a type that internally borrows meow. And because meow gets destroyed at the end of the scope, x cannot be returned, as it would outlive meow.
x borrows from meow because a regular expression Match doesn't actually copy the data it matched; it just stores a reference to it.
So if you add a ;, you prevent the value from being returned from the scope, changing the scope return value to ():
use regex::Regex;
fn main() {
    let z = {
        let meow = String::from("This is a long string that I am testing regexes on in rust.");
        let re = Regex::new("I").unwrap();
        let x = re.find(&meow).unwrap();
        dbg!(x);
    };
    dbg!(z);
}
[src/main.rs:9] x = Match {
    text: "This is a long string that I am testing regexes on in rust.",
    start: 27,
    end: 28,
}
[src/main.rs:12] z = ()
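If the match is needed outside the block, one option is to return owned data instead of a value that borrows from meow. The sketch below uses the standard library's `str::find` as a stand-in for `Regex::find`, so that it runs without the regex crate; the function name `find_owned` is made up. With the regex crate, the analogous calls would be `x.start()`, `x.end()` and `x.as_str().to_string()`.

```rust
/// Returns the byte range and an owned copy of the first occurrence of `needle`.
fn find_owned(needle: &str) -> Option<(usize, usize, String)> {
    let z = {
        let meow = String::from("This is a long string that I am testing regexes on in rust.");
        // `str::find` stands in for `Regex::find` here (an assumption of this sketch).
        let start = meow.find(needle)?;
        let end = start + needle.len();
        // Copy the matched text so that nothing returned borrows from `meow`.
        (start, end, meow[start..end].to_string())
    };
    // `meow` has already been dropped; `z` only holds owned values.
    Some(z)
}

fn main() {
    println!("{:?}", find_owned("I"));
}
```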

Return Result of a Regex::replace_all as str [duplicate]

I am trying to create a function that takes a &str as input and gives a &str as output. However, I can't find a way to satisfy the borrow checker. It seems that the returned type is Cow, but I haven't found a way to convert it to a &str. Maybe I should keep this type?
Here is the function that I want to write:
fn replace_by_regex<'a>(text: &'a str) -> &'a str {
    let re = Regex::new(r"(PATTERN)").unwrap();
    let after = re.replace_all(&text, "").to_string().as_str();
    // println!("{}", after);
    after
}
But I get this error:
error[E0515]: cannot return value referencing temporary value
  --> src/lib.rs:91:5
   |
89 |     let after = re.replace_all(&text, "").to_string().as_str();
   |                 ------------------------------------- temporary value created here
90 |     // println!("{}", after);
91 |     after
   |     ^^^^^ returns a value referencing data owned by the current function
But at the end of the day I aim to collect a modified &str.
How about returning a String instead? Got it working in the playground and included a test.
use regex::Regex;
pub fn replace_by_regex(text: &str) -> String {
    let re = Regex::new(r"(PATTERN)").unwrap();
    let after = re.replace_all(text, "");
    println!("{}", after);
    after.into_owned()
}
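On the question of whether to keep the Cow: returning `Cow<'a, str>` is also an option, since it avoids allocating when nothing was replaced. The sketch below shows the shape of such a function using only the standard library; `str::contains` and `str::replace` stand in for the regex, and the name `remove_pattern` is made up for illustration.

```rust
use std::borrow::Cow;

/// Removes every occurrence of `pattern`, allocating only when something changes.
/// `str::contains` / `str::replace` stand in for the regex (an assumption).
fn remove_pattern<'a>(text: &'a str, pattern: &str) -> Cow<'a, str> {
    if !pattern.is_empty() && text.contains(pattern) {
        // Something was replaced: the result is a new, owned String.
        Cow::Owned(text.replace(pattern, ""))
    } else {
        // Nothing to do: hand back the input without copying it.
        Cow::Borrowed(text)
    }
}

fn main() {
    let unchanged = remove_pattern("hello", "PATTERN");
    let changed = remove_pattern("a PATTERN b", "PATTERN");
    // A Cow<str> derefs to &str, so callers can use it like a string slice.
    println!("{} | {}", unchanged, changed);
}
```

A caller that really needs a String can still call `into_owned()` on the result, as in the answer above.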

Creating a serializable fixed size char array in F#

I am dealing with a very large amount of data I need to load / save to disk where speed is the key.
I wrote this code:
// load from cache
let loadFromCacheAsync<'a when 'a: (new: unit -> 'a) and 'a: struct and 'a :> ValueType> filespec =
    async {
        let! bytes = File.ReadAllBytesAsync(filespec) |> Async.AwaitTask
        let result =
            use pBytes = fixed bytes
            let sourceSpan = Span<byte>(NativePtr.toVoidPtr pBytes, bytes.Length)
            MemoryMarshal.Cast<byte, 'a>(sourceSpan).ToArray()
        return result
    }
// save to cache
let saveToCacheAsync<'a when 'a: unmanaged> filespec (data: 'a array) =
    Directory.CreateDirectory cacheFolder |> ignore
    let sizeStruct = sizeof<'a>
    use ptr = fixed data
    let nativeSpan = Span<byte>(NativePtr.toVoidPtr ptr, data.Length * sizeStruct).ToArray()
    File.WriteAllBytesAsync(filespec, nativeSpan) |> Async.AwaitTask
and it requires the data structures to be unmanaged.
For example, I have:
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortTradeData =
    {
        [<FieldOffset(00)>] Timestamp: DateTime
        [<FieldOffset(08)>] Price: double
        [<FieldOffset(16)>] Quantity: double
        [<FieldOffset(24)>] Direction: int
    }
or
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortCandleData =
    {
        [<FieldOffset(00)>] Timestamp: DateTime
        [<FieldOffset(08)>] Open: double
        [<FieldOffset(16)>] High: double
        [<FieldOffset(24)>] Low: double
        [<FieldOffset(32)>] Close: double
    }
etc...
I'm now facing a case where I need to store a string. I know the max length of the strings but I'm trying to find out how I can do this with un-managed types.
I'm wondering if I could do something like this (for 256 bytes):
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type TestData =
    {
        [<FieldOffset(00)>] Timestamp: DateTime
        [<FieldOffset(08)>] Text: char
        [<FieldOffset(264)>] Dummy: int
    }
Would it be safe then to get a pointer to Text, cast it to a char array, read / write what I want in it and then save / load it as needed?
Or am I asking for some random troubles at some point?
As a side question, any way to speed up the loadFromCache function is very welcome too :)
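To make the idea of a fixed-size inline text field concrete, here is a small sketch in Rust rather than F#. The field names mirror the TestData struct above, but the layout, the explicit length field, and the choice of UTF-8 bytes are assumptions of this sketch and say nothing about how the .NET runtime lays out the F# struct.

```rust
/// Fixed-size, pointer-free record; field names mirror the question's TestData.
/// The layout is an assumption of this sketch, not the F# struct's layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct TestData {
    timestamp: i64,
    text: [u8; 256], // UTF-8 bytes, zero-padded
    text_len: u32,   // number of valid bytes in `text`
}

impl TestData {
    /// Returns `None` when `s` does not fit in the inline buffer.
    fn new(timestamp: i64, s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.len() > 256 {
            return None;
        }
        let mut text = [0u8; 256];
        text[..bytes.len()].copy_from_slice(bytes);
        Some(Self { timestamp, text, text_len: bytes.len() as u32 })
    }

    /// Reads the text back; yields "" if the length or bytes are not valid.
    fn text(&self) -> &str {
        self.text
            .get(..self.text_len as usize)
            .and_then(|b| std::str::from_utf8(b).ok())
            .unwrap_or("")
    }
}

fn main() {
    let record = TestData::new(1, "hello").expect("fits in 256 bytes");
    println!("{} {}", record.timestamp, record.text());
}
```

Storing the length next to the buffer means a reader never has to scan for a terminator, and validating on read guards against a record that was loaded from a damaged file.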
Edit:
I came up with this for now. It converts a list of complex event objects into something serializable.
The line:
let bytes = Pipeline.serializeBinary event
turns the original event data into a byte array.
Then I create the struct that will hold the binary stream, write the length, create a span representing the struct and copy the bytes. Then I marshal the span into the struct type (ShortEventData).
I can't use Marshal copy since I can't put a destination offset, so I have to copy the bytes with a loop. But there has to be a better way.
And I think, there has to be a better way for everything else in this as well :D Any suggestion would help, I just don't really like this solution.
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortEventData =
    {
        [<FieldOffset(00)>] Timestamp: DateTime
        [<FieldOffset(08)>] Event: byte
        [<FieldOffset(1032)>] Length: int
    }
events
|> List.map (fun event ->
    let bytes = Pipeline.serializeBinary event
    let serializableEvent : DataCache.ShortEventData =
        {
            Timestamp = event.GetTimestamp()
            Event = byte 0
            Length = bytes.Length
        }
    use ptr = fixed [|serializableEvent|]
    let nativeSpan = Span<byte>(NativePtr.toVoidPtr ptr, serializableEvent.Length * sizeStruct)
    for i = 0 to bytes.Length - 1 do
        nativeSpan[8 + i] <- bytes[i]
    MemoryMarshal.Cast<byte, DataCache.ShortEventData>(nativeSpan).ToArray()[0]
)
Edit:
Adding benchmarks for different serialization models:
open System
open System.IO
open System.Runtime.InteropServices
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open MBrace.FsPickler
open Microsoft.FSharp.NativeInterop
open Newtonsoft.Json
#nowarn "9"
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type TestStruct =
    {
        [<FieldOffset(00)>] SomeValue: int
        [<FieldOffset(04)>] AnotherValue: int
        [<FieldOffset(08)>] YetAnotherValue: double
    }
    static member MakeOne(r: Random) =
        {
            SomeValue = r.Next()
            AnotherValue = r.Next()
            YetAnotherValue = r.NextDouble()
        }
[<MemoryDiagnoser>]
type Benchmarks () =
    let testData =
        let random = Random(1000)
        Array.init 1000 (fun _ -> TestStruct.MakeOne(random))
    // inits, outside of the benchmarks
    // FSPickler
    let FSPicklerSerializer = FsPickler.CreateBinarySerializer()
    // APEX
    let ApexSettings = Apex.Serialization.Settings().MarkSerializable(typeof<TestStruct>)
    let ApexBinarySerializer = Apex.Serialization.Binary.Create(ApexSettings)
    [<Benchmark>]
    member _.Thomas() = // thomas' save to disk
        let sizeStruct = sizeof<TestStruct>
        use ptr = fixed testData
        Span<byte>(NativePtr.toVoidPtr ptr, testData.Length * sizeStruct).ToArray()
    [<Benchmark>]
    member _.Newtonsoft() =
        JsonConvert.SerializeObject(testData)
    [<Benchmark>]
    member _.FSPickler() =
        FSPicklerSerializer.Pickle testData
    [<Benchmark>]
    member _.Apex() =
        let outputStream = new MemoryStream()
        ApexBinarySerializer.Write(testData, outputStream)
[<EntryPoint>]
let main _ =
    let _ = BenchmarkRunner.Run<Benchmarks>()
    0
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |-------------:|-------------:|-------------:|---------:|--------:|--------:|----------:|
| Thomas | 878.4 ns | 11.74 ns | 10.41 ns | 2.5444 | 0.1411 | - | 16 KB |
| Newtonsoft | 880,641.2 ns | 16,346.50 ns | 15,290.52 ns | 103.5156 | 79.1016 | 48.8281 | 508 KB |
| FSPickler | 71,786.6 ns | 1,373.89 ns | 1,349.35 ns | 13.6719 | 2.0752 | - | 84 KB |
| Apex | 1,088.8 ns | 20.59 ns | 22.03 ns | 2.6093 | 0.0725 | - | 16 KB |
It looks like Apex is very close to what I did, but it's probably a lot more flexible and more optimized, so it could make sense to switch to it, UNLESS what I have can be a lot more optimized.
I have to see how @JL0PD's excellent comments can improve the speed.
Out of interest, I took the lambda at the end of your question, wrote three similar implementations, and ran them with BenchmarkDotNet.
Reference - as you have shown
Mutable Struct - as I might have done it with a mutable struct
Record - using a plain old dumb record
See the results for yourself. Plain old dumb record is the fastest (though only marginally faster than my attempt and ~10x faster than your example). Write dumb code first. Benchmark it. Then try to improve.
#nowarn "9"
open System
open System.Runtime.InteropServices
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open Microsoft.FSharp.NativeInterop
type ShortEventDataRec =
    {
        Timestamp: DateTime
        Event: byte[]
        Length: int
    }
[<Struct>]
[<StructLayout(LayoutKind.Explicit)>]
type ShortEventData =
    {
        [<FieldOffset(00)>] Timestamp: DateTime
        [<FieldOffset(08)>] Event: byte
        [<FieldOffset(1032)>] Length: int
    }
[<StructLayout(LayoutKind.Explicit)>]
type MutableShortEventData =
    struct
        [<FieldOffset(00)>] val mutable Timestamp: DateTime
        [<FieldOffset(08)>] val mutable Event: byte
        [<FieldOffset(1032)>] val mutable Length: int
    end
[<MemoryDiagnoser>]
type Benchmarks () =
    let event =
        Array.init 1024 (fun i -> byte (i % 256))
    let time = DateTime.Now
    let sizeStruct = sizeof<ShortEventData>
    [<Benchmark>]
    member __.Reference() =
        let bytes = event
        let serializableEvent =
            {
                ShortEventData.Timestamp = time
                Event = byte 0
                Length = bytes.Length
            }
        use ptr = fixed [|serializableEvent|]
        let nativeSpan = Span<byte>(NativePtr.toVoidPtr ptr, sizeStruct)
        for i = 0 to bytes.Length - 1 do
            nativeSpan.[8 + i] <- bytes.[i]
        MemoryMarshal.Cast<byte, ShortEventData>(nativeSpan).[0]
    [<Benchmark>]
    member __.MutableStruct() =
        let bytes = event
        let targetBytes = GC.AllocateUninitializedArray(sizeStruct)
        let targetSpan = Span(targetBytes)
        let targetStruct = MemoryMarshal.Cast<_, MutableShortEventData>(targetSpan)
        targetStruct.[0].Timestamp <- time
        let targetEvent = bytes.CopyTo(targetSpan.Slice(8, 1024))
        targetStruct.[0].Length <- event.Length
        targetStruct.[0]
    [<Benchmark>]
    member __.Record() =
        let bytes = event
        let serializableEvent =
            {
                ShortEventDataRec.Timestamp = time
                Event =
                    let eventBytes = GC.AllocateUninitializedArray(bytes.Length)
                    System.Array.Copy(bytes, eventBytes, bytes.Length)
                    eventBytes
                Length = bytes.Length
            }
        serializableEvent
[<EntryPoint>]
let main _ =
    let _ = BenchmarkRunner.Run<Benchmarks>()
    0
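| Method        |      Mean |    Error |   StdDev |  Gen 0 |  Gen 1 | Allocated |
|-------------- |----------:|---------:|---------:|-------:|-------:|----------:|
| Reference     | 526.88 ns | 6.318 ns | 5.909 ns | 0.0629 |      - |      1 KB |
| MutableStruct |  49.50 ns | 0.966 ns | 1.074 ns | 0.0636 |      - |      1 KB |
| Record        |  42.73 ns | 0.672 ns | 0.628 ns | 0.0650 | 0.0002 |      1 KB |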

rust how to collapse if let - clippy suggestion

I ran cargo clippy to get some feedback on my code, and clippy told me that I can somehow collapse an if let.
Here is the exact "warning":
warning: this `if let` can be collapsed into the outer `if let`
   --> src\main.rs:107:21
    |
107 | /                     if let Move::Normal { piece, from, to } = turn {
108 | |                         if i8::abs(from.1 - to.1) == 2 && piece.getColor() != *color && to.0 == x {
109 | |                             let offsetX = x - to.0;
110 | |
...   |
116 | |                         }
117 | |                     }
    | |_____________________^
I thought I could maybe just append the inner if using &&, but then I get a warning (`let` expressions in this position are experimental; I am using Rust version 1.57.0, not nightly).
Any idea what clippy wants me to do?
Edit:
the outer if let is itself again inside another if let:
if let Some(turn) = board.getLastMove() {
And it seems you can indeed combine them like so:
if let Some(Move::Normal { piece, from, to }) = board.getLastMove() {
In my opinion, the clippy lint should include the line above, as it is otherwise, at least for me, somewhat confusing.
Edit 2:
Turns out I just can't read: below the warning listed above was some more information telling me exactly what to do.
    = note: `#[warn(clippy::collapsible_match)]` on by default
help: the outer pattern can be modified to include the inner pattern
   --> src\main.rs:126:29
    |
126 |                 if let Some(turn) = board.getLastMove() {
    |                             ^^^^ replace this binding
127 |                     if let Move::Normal { piece, from, to } = turn {
    |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ with this pattern
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_match
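For reference, here is a self-contained sketch of the collapsed form. The `Color` and `Move` types and the `is_double_step` function are made up to resemble the names in the question (in particular, `piece` is simplified to a plain colour), so treat them as assumptions rather than the original code.

```rust
// Hypothetical types modelled on the names in the question.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    White,
    Black,
}

#[derive(Clone, Copy, Debug)]
enum Move {
    Normal { piece: Color, from: (i8, i8), to: (i8, i8) },
    Castle,
}

/// True when the last move was a two-square step by the other colour ending in column `x`.
fn is_double_step(last_move: Option<Move>, color: Color, x: i8) -> bool {
    // One nested pattern replaces `if let Some(turn) = ...` wrapped around
    // `if let Move::Normal { .. } = turn`.
    if let Some(Move::Normal { piece, from, to }) = last_move {
        return (from.1 - to.1).abs() == 2 && piece != color && to.0 == x;
    }
    false
}

fn main() {
    let last = Some(Move::Normal { piece: Color::Black, from: (3, 6), to: (3, 4) });
    println!("{}", is_double_step(last, Color::White, 3));
    println!("{}", is_double_step(Some(Move::Castle), Color::White, 3));
}
```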

antlr visitor: lookup of reserved words efficiently

I'm learning Antlr. At this point, I'm writing a little stack-based language as part of my learning process -- think PostScript or Forth. An RPN language. For instance:
10 20 mul
This would push 10 and 20 on the stack and then perform a multiply, which pops two values, multiplies them, and pushes 200. I'm using the visitor pattern. And I find myself writing some code that's kind of insane. There has to be a better way.
Here's a section of my WaveParser.g4 file:
any_operator:
value_operator |
stack_operator |
logic_operator |
math_operator |
flow_control_operator;
value_operator:
BIND | DEF
;
stack_operator:
DUP |
EXCH |
POP |
COPY |
ROLL |
INDEX |
CLEAR |
COUNT
;
BIND is just the bind keyword, etc. So my visitor has this method:
antlrcpp::Any WaveVisitor::visitAny_operator(Parser::Any_operatorContext *ctx);
And now here's where I'm getting to the very ugly code I'm writing, which leads to the question.
Value::Operator op = Value::Operator::NO_OP;
WaveParser::Value_operatorContext * valueOp = ctx->value_operator();
WaveParser::Stack_operatorContext * stackOp = ctx->stack_operator();
WaveParser::Logic_operatorContext * logicOp = ctx->logic_operator();
WaveParser::Math_operatorContext * mathOp = ctx->math_operator();
WaveParser::Flow_control_operatorContext * flowOp = ctx->flow_control_operator();
if (valueOp) {
    if (valueOp->BIND()) {
        op = Value::Operator::BIND;
    }
    else if (valueOp->DEF()) {
        op = Value::Operator::DEF;
    }
}
else if (stackOp) {
    if (stackOp->DUP()) {
        op = Value::Operator::DUP;
    }
    ...
}
...
I'm supporting approximately 50 operators, and it's insane that I'm going to have this series of if statements to figure out which operator this is. There must be a better way to do this. I couldn't find a field on the context that mapped to something I could use in a hashmap table.
I don't know if I should give every one of my operators a separate rule and use the corresponding method in my visitor, or what else I'm missing.
Is there a better way?
With ANTLR, it's usually very helpful to label components of your rules, as well as the high level alternatives.
If part of a parser rule can only be one thing with a single type, usually the default accessors are just fine. But if you have several alternatives that are essentially alternatives for the "same thing", or perhaps you have the same sub-rule reference in a parser rule more than one time and want to differentiate them, it's pretty handy to give them names. (Once you start doing this and see the impact to the Context classes, it'll become pretty obvious where they provide value.)
Also, when rules have multiple top-level alternatives, it's very handy to give each of them a label. This will cause ANTLR to generate a separate Context class for each alternative, instead of dumping everything from every alternative into a single class.
(making some stuff up just to get a valid compile)
grammar WaveParser
;
any_operator
: value_operator # val_op
| stack_operator # stack_op
| logic_operator # logic_op
| math_operator # math_op
| flow_control_operator # flow_op
;
value_operator: op = ( BIND | DEF);
stack_operator
: op = (
DUP
| EXCH
| POP
| COPY
| ROLL
| INDEX
| CLEAR
| COUNT
)
;
logic_operator: op = (AND | OR);
math_operator: op = (ADD | SUB);
flow_control_operator: op = (FLOW1 | FLOW2);
AND: 'and';
OR: 'or';
ADD: '+';
SUB: '-';
FLOW1: '>>';
FLOW2: '<<';
BIND: 'bind';
DEF: 'def';
DUP: 'dup';
EXCH: 'exch';
POP: 'pop';
COPY: 'copy';
ROLL: 'roll';
INDEX: 'index';
CLEAR: 'clear';
COUNT: 'count';
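Once each operator rule labels its token as `op`, the visitor can work from that single token's type instead of probing every accessor. The table lookup the question asks about could then look roughly like the sketch below. It is written in Rust rather than the C++ of the question, and the token type ids, the `Operator` enum, and the function names are all hypothetical; a generated ANTLR lexer defines its own token type values.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Operator {
    NoOp,
    Bind,
    Def,
    Dup,
    Exch,
}

// Hypothetical token type ids; a generated lexer would supply the real ones.
const BIND: u32 = 1;
const DEF: u32 = 2;
const DUP: u32 = 3;
const EXCH: u32 = 4;

/// Builds the lookup table once; extend it with one entry per operator token.
fn operator_table() -> HashMap<u32, Operator> {
    HashMap::from([
        (BIND, Operator::Bind),
        (DEF, Operator::Def),
        (DUP, Operator::Dup),
        (EXCH, Operator::Exch),
    ])
}

/// Maps a token type to its operator, falling back to `NoOp` for unknown ids.
fn lookup(table: &HashMap<u32, Operator>, token_type: u32) -> Operator {
    table.get(&token_type).copied().unwrap_or(Operator::NoOp)
}

fn main() {
    let table = operator_table();
    println!("{:?} {:?}", lookup(&table, DUP), lookup(&table, 99));
}
```

With around 50 operators this stays a flat list of pairs, and adding an operator means adding one table entry rather than another branch.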