How to build multiple concurrent servers with Rust and Tokio?

I'm looking to build multiple concurrent servers on different ports with Rust and Tokio:
let mut core = Core::new().unwrap();
let handle = core.handle();
// I want to bind to multiple ports here if it's possible with simple addresses
let addr = "127.0.0.1:80".parse().unwrap();
let addr2 = "127.0.0.1:443".parse().unwrap();
// Or here if there is a special function on the TcpListener
let sock = TcpListener::bind(&addr, &handle).unwrap();
// Or here if there is a special function on the sock
let server = sock.incoming().for_each(|(client_stream, remote_addr)| {
    // And then retrieve the current port in the callback
    println!("Receive connection on {}!", mysterious_function_to_retrieve_the_port);
    Ok(())
});
core.run(server).unwrap();
Is there an option with Tokio to listen to multiple ports or do I need to create a simple thread for each port and run Core::new() in each?
Thanks to rust-scoped-pool, I have:
let pool = Pool::new(2);
let mut listening_on = ["127.0.0.1:80", "127.0.0.1:443"];
pool.scoped(|scope| {
    for address in &mut listening_on {
        scope.execute(move || {
            let mut core = Core::new().unwrap();
            let handle = core.handle();
            let addr = address.parse().unwrap();
            let sock = TcpListener::bind(&addr, &handle).unwrap();
            let server = sock.incoming().for_each(|(client_stream, remote_addr)| {
                println!("Receive connection on {}!", address);
                Ok(())
            });
            core.run(server).unwrap();
        });
    }
});
rust-scoped-pool is the only solution I have found for spawning multiple threads and then waiting forever. I think it works, but I was wondering whether a simpler solution exists.

You can run multiple servers from one thread. core.run(server).unwrap(); is just a convenience method and not the only/main way to do things.
Instead of running the single ForEach to completion, spawn each individually and then just keep the thread alive:
let mut core = Core::new().unwrap();
let handle = core.handle();

let addr = "127.0.0.1:80".parse().unwrap();
let addr2 = "127.0.0.1:443".parse().unwrap();

let sock = TcpListener::bind(&addr, &handle).unwrap();
let sock2 = TcpListener::bind(&addr2, &handle).unwrap();

// One ForEach future per listener; the bound address is moved into each
// closure so the callback knows which port the connection arrived on.
let server = sock.incoming().for_each(move |(client_stream, remote_addr)| {
    println!("Receive connection on {}!", addr);
    Ok(())
});
let server2 = sock2.incoming().for_each(move |(client_stream, remote_addr)| {
    println!("Receive connection on {}!", addr2);
    Ok(())
});

// Handle::spawn expects futures with Item = () and Error = (), so discard the
// io::Error before spawning.
handle.spawn(server.map_err(|_| ()));
handle.spawn(server2.map_err(|_| ()));

// Keep turning the event loop so the spawned servers stay alive.
loop {
    core.turn(None);
}

I'd just like to follow up that there seems to be a slightly less manual way to do things than 46bit's answer (at least as of 2019).
let addr1 = "127.0.0.1:80".parse().unwrap();
let addr2 = "127.0.0.1:443".parse().unwrap();

// tokio 0.1's tokio::net::TcpListener::bind takes just the address (no handle).
let sock1 = TcpListener::bind(&addr1).unwrap();
let sock2 = TcpListener::bind(&addr2).unwrap();

// Runtime::spawn also wants Item = () / Error = (), so map the io::Error away.
let server1 = sock1.incoming().for_each(|_| Ok(())).map_err(|_| ());
let server2 = sock2.incoming().for_each(|_| Ok(())).map_err(|_| ());

let mut runtime = tokio::runtime::Runtime::new().unwrap();
runtime.spawn(server1);
runtime.spawn(server2);
runtime.shutdown_on_idle().wait().unwrap();
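For completeness: on current Tokio (1.x with async/await) the same idea needs even less machinery, since you can just spawn one accept loop per port onto the same runtime. Here is a minimal sketch, assuming the tokio crate with the "full" feature enabled (and note that binding to 80/443 usually requires elevated privileges):
use tokio::net::TcpListener;

// One accept loop per address; both loops run concurrently on the runtime.
async fn listen_on(addr: &str) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr).await?;
    loop {
        let (_stream, remote) = listener.accept().await?;
        println!("Receive connection on {} from {}!", addr, remote);
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let a = tokio::spawn(listen_on("127.0.0.1:80"));
    let b = tokio::spawn(listen_on("127.0.0.1:443"));
    // The accept loops only return on error; join both so main keeps running.
    let (ra, rb) = tokio::try_join!(a, b).expect("listener task panicked");
    ra?;
    rb?;
    Ok(())
}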

Related

How do I un-gzip a file without saving it?

I am new to Rust and I am trying to port Go code that I had written previously. The Go code basically downloaded files from S3 and un-gzipped and parsed them directly, without writing to disk.
Currently the only solution I have found is to save the gzipped files on disk, then un-gzip and parse them.
The ideal pipeline would un-gzip and parse them directly.
How can I accomplish this?
const ENV_CRED_KEY_ID: &str = "KEY_ID";
const ENV_CRED_KEY_SECRET: &str = "KEY_SECRET";
const BUCKET_NAME: &str = "bucketname";
const REGION: &str = "us-east-1";

use anyhow::{anyhow, bail, Context, Result}; // (xp) (thiserror in prod)
use aws_sdk_s3::{config, ByteStream, Client, Credentials, Region};
use std::env;
use std::fs::{create_dir_all, File};
use std::io::{BufWriter, Write};
use std::path::Path;
use tokio_stream::StreamExt;

#[tokio::main]
async fn main() -> Result<()> {
    let client = get_aws_client(REGION)?;
    // list_keys: helper not shown in this snippet
    let keys = list_keys(&client, BUCKET_NAME, "CELLDATA/year=2022/month=06/day=06/").await?;
    println!("List:\n{}", keys.join("\n"));

    let dir = Path::new("input/");
    let key: &str = &keys[0];
    download_file_bytes(&client, BUCKET_NAME, key, dir).await?;
    println!("Downloaded {key} in directory {}", dir.display());

    Ok(())
}

async fn download_file_bytes(client: &Client, bucket_name: &str, key: &str, dir: &Path) -> Result<()> {
    // VALIDATE
    if !dir.is_dir() {
        bail!("Path {} is not a directory", dir.display());
    }

    // create file path and parent dir(s)
    let mut file_path = dir.join(key);
    let parent_dir = file_path
        .parent()
        .ok_or_else(|| anyhow!("Invalid parent dir for {:?}", file_path))?;
    if !parent_dir.exists() {
        create_dir_all(parent_dir)?;
    }
    file_path.set_extension("json");

    // BUILD - aws request
    let req = client.get_object().bucket(bucket_name).key(key);

    // EXECUTE
    let res = req.send().await?;

    // STREAM result to file -- this is the step where I would like to un-gzip
    // on the fly instead of writing the gzipped bytes to disk
    let mut data: ByteStream = res.body;
    let file = File::create(&file_path)?;
    let mut buf_writer = BufWriter::new(file);
    while let Some(bytes) = data.try_next().await? {
        buf_writer.write_all(&bytes)?;
    }
    buf_writer.flush()?;

    Ok(())
}

fn get_aws_client(region: &str) -> Result<Client> {
    // get the id/secret from env
    let key_id = env::var(ENV_CRED_KEY_ID).context("Missing S3_KEY_ID")?;
    let key_secret = env::var(ENV_CRED_KEY_SECRET).context("Missing S3_KEY_SECRET")?;

    // build the aws cred
    let cred = Credentials::new(key_id, key_secret, None, None, "loaded-from-custom-env");

    // build the aws client
    let region = Region::new(region.to_string());
    let conf_builder = config::Builder::new().region(region).credentials_provider(cred);
    let conf = conf_builder.build();
    let client = Client::from_conf(conf);
    Ok(client)
}
Your snippet doesn't say where GzDecoder comes from, but I'll assume it's flate2::read::GzDecoder.
flate2::read::GzDecoder is already built in a way that it can wrap anything that implements std::io::Read:
GzDecoder::new expects an argument that implements Read => deflated data in
GzDecoder itself implements Read => inflated data out
Therefore, you can use it just like a BufReader: wrap your reader and use the wrapped value in its place:
use flate2::read::GzDecoder;
use std::fs::File;
use std::io::BufReader;
use std::io::Cursor;

fn main() {
    // Placeholder bytes, just to have something that implements `std::io::Read`;
    // real input would of course be gzipped data.
    let data = [0, 1, 2, 3];
    let c = Cursor::new(data);

    // A dummy output
    let mut out_file = File::create("/tmp/out").unwrap();

    // Using the raw data would look like this:
    // std::io::copy(&mut c, &mut out_file).unwrap();

    // To inflate on the fly, "pipe" the data through the decoder, i.e. wrap the reader
    let mut stream = GzDecoder::new(c);

    // Consume the `Read`er somehow
    std::io::copy(&mut stream, &mut out_file).unwrap();
}
You don't mention what "and parse them" entails, but the same concept applies: If your parser can read from an impl Read (e.g. it can read from a std::fs::File), then it can also read directly from a GzDecoder.
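Applied to your S3 download, this means you never have to write the gzipped bytes to disk at all. Here is a minimal sketch of that idea (untested against your setup): it aggregates the ByteStream in memory and inflates it through GzDecoder, assuming the objects fit in RAM and that flate2 is the decoder in use; the helper name download_and_ungzip is made up for illustration:
use anyhow::Result;
use aws_sdk_s3::Client;
use flate2::read::GzDecoder;
use std::io::Read;

// Hypothetical helper: download `key` from `bucket`, gunzip it in memory,
// and return the decompressed text, ready for parsing.
async fn download_and_ungzip(client: &Client, bucket: &str, key: &str) -> Result<String> {
    let res = client.get_object().bucket(bucket).key(key).send().await?;
    // Aggregate the whole ByteStream into one contiguous buffer.
    let bytes = res.body.collect().await?.into_bytes();
    // `&[u8]` implements Read, so GzDecoder can wrap it directly.
    let mut decoder = GzDecoder::new(&bytes[..]);
    let mut out = String::new();
    decoder.read_to_string(&mut out)?;
    Ok(out)
}
If the objects are too large to buffer, the same wrapping idea works with an async-aware decoder (for example the async-compression crate) around the stream instead.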

How do I get difficulty over time from Kulupu (polkadotjs)?

// Import
import { ApiPromise, WsProvider } from "@polkadot/api";
// Construct
/*
https://rpc.kulupu.network
https://rpc.kulupu.network/ws
https://rpc.kulupu.corepaper.org
https://rpc.kulupu.corepaper.org/ws
*/
(async () => {
  //const wsProvider = new WsProvider('wss://rpc.polkadot.io');
  const wsProvider = new WsProvider("wss://rpc.kulupu.network/ws");
  const api = await ApiPromise.create({ provider: wsProvider });

  // Do something
  const chain = await api.rpc.system.chain();
  console.log(`You are connected to ${chain} !`);
  console.log(await api.query.difficulty.pastDifficultiesAndTimestamps.toJSON());
  console.log(api.genesisHash.toHex());
})();
The storage item pastDifficultiesAndTimestamps only holds the last 60 blocks' worth of data. To get that information, you just need to fix the following:
console.log(await api.query.difficulty.pastDifficultiesAndTimestamps());
If you want to query the difficulty of blocks in general, a loop like this will work:
let best_block = await api.derive.chain.bestNumber();
// Could be 0, but that is a lot of queries...
let first_block = best_block - 100;
for (let block = first_block; block < best_block; block++) {
  let block_hash = await api.rpc.chain.getBlockHash(block);
  let difficulty = await api.query.difficulty.currentDifficulty.at(block_hash);
  console.log(block, difficulty);
}
Note that this requires an archive node, which has information about all blocks. Otherwise, by default, a node only stores ~256 previous blocks before state pruning cleans things up.
If you want to see how to make a query like this, but much more efficiently, look at my blog post here:
https://www.shawntabrizi.com/substrate/porting-web3-js-to-polkadot-js/

What is the idiomatic way to write Rust microservice with shared db connections and caches?

I'm writing my first Rust microservice with hyper. After years of development in C++ and Go, I tend to use a controller for processing requests (like here: https://github.com/raycad/go-microservices/blob/master/src/user-microservice/controllers/user.go), where the controller stores shared data such as a DB connection pool and different kinds of caches.
I know, with hyper, I can write it this way:
use hyper::{Body, Request, Response};

pub struct Controller {
    // pub cache: Cache,
    // pub db: DbConnectionPool
}

impl Controller {
    pub fn echo(&mut self, req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
        // extensively using db and cache here...
        let mut response = Response::new(Body::empty());
        *response.body_mut() = req.into_body();
        Ok(response)
    }
}
and then use it:
use hyper::{Server, Request, Response, Body, Error};
use hyper::service::{make_service_fn, service_fn};
use std::{convert::Infallible, net::SocketAddr, sync::Arc, sync::Mutex};

async fn route(controller: Arc<Mutex<Controller>>, req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    let mut c = controller.lock().unwrap();
    c.echo(req)
}

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    let controller = Arc::new(Mutex::new(Controller {}));

    let make_svc = make_service_fn(move |_conn| {
        let controller = Arc::clone(&controller);
        async move {
            Ok::<_, Infallible>(service_fn(move |req| {
                let c = Arc::clone(&controller);
                route(c, req)
            }))
        }
    });

    let server = Server::bind(&addr).serve(make_svc);
    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}
Since the compiler doesn't let me share a mutable structure between threads, I had to use the Arc<Mutex<T>> idiom. But I'm afraid that the let mut c = controller.lock().unwrap(); part will block the entire controller while a single request is being processed, i.e. there's no concurrency here.
What is the idiomatic way to address this problem?
&mut always acquires a (compile-time or runtime) exclusive lock on the value. Only acquire a &mut in the exact scope you want locked. If a value owned by the locked value needs separate locking management, wrap it in its own Mutex.
Assuming your DbConnectionPool is structured like this:
struct DbConnectionPool {
    conns: HashMap<ConnId, Conn>,
}
We need to &mut the HashMap when we add or remove items, but we don't need a &mut on the value in Conn. So Arc allows us to separate the mutability boundary from its parent, and Mutex allows us to add our own interior mutability.
Moreover, our echo method doesn't want to be &mut, so another layer of interior mutability needs to be added around the HashMap.
So we change this to:
struct DbConnectionPool {
    conns: Mutex<HashMap<ConnId, Arc<Mutex<Conn>>>>,
}
Then when you want to get a connection,
fn get(&self, id: ConnId) -> Arc<Mutex<Conn>> {
    // unwrap: don't bother handling the case where another thread panicked while holding the lock
    let mut pool = self.db.conns.lock().unwrap();
    if let Some(conn) = pool.get(&id) {
        Arc::clone(conn)
    } else {
        // here we utilize the interior mutability of `pool`
        let arc = Arc::new(Mutex::new(new_conn()));
        pool.insert(id, Arc::clone(&arc));
        arc
    }
}
(The ConnId param and the if-exists-else logic are just there to simplify the code; you can change the logic.)
On the returned value you can do
self.get(id).lock().unwrap().query(...)
For convenience of illustration I changed the logic so that the user supplies the ID. In reality, you should be able to find a Conn that has not been acquired yet and return it. You could then return an RAII guard for the Conn, similar to how MutexGuard works, to automatically free the connection when the user stops using it.
Also consider using RwLock instead of Mutex if that might result in a performance boost.
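To make that concrete, here is a compact, self-contained sketch of the resulting shape (ConnId, Conn and new_conn are hypothetical stand-ins): the Controller is shared behind a plain Arc, echo takes &self, and only the individual pieces of state carry locks, so one slow request no longer serializes everything:
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real connection type.
type ConnId = u32;
struct Conn;
fn new_conn() -> Conn {
    Conn
}

struct DbConnectionPool {
    conns: Mutex<HashMap<ConnId, Arc<Mutex<Conn>>>>,
}

impl DbConnectionPool {
    fn get(&self, id: ConnId) -> Arc<Mutex<Conn>> {
        // The pool lock is held only for the duration of this lookup/insert.
        let mut pool = self.conns.lock().unwrap();
        pool.entry(id)
            .or_insert_with(|| Arc::new(Mutex::new(new_conn())))
            .clone()
    }
}

struct Controller {
    db: DbConnectionPool,
    // cache: Cache,  // each shared field gets its own lock
}

impl Controller {
    // &self, not &mut self: many requests can call this concurrently.
    fn echo(&self, id: ConnId) {
        let conn = self.db.get(id);
        let _guard = conn.lock().unwrap(); // only this one connection is locked
        // ... use the connection here ...
    }
}

fn main() {
    let controller = Arc::new(Controller {
        db: DbConnectionPool {
            conns: Mutex::new(HashMap::new()),
        },
    });
    // Clone the Arc into each request handler; no outer Mutex is needed.
    let c = Arc::clone(&controller);
    c.echo(1);
}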

persistent HTTP connection using cohttp library (or other)

It seems (based on Wireshark) that the cohttp client closes its connection automatically after the response to a GET request is received.
Is there a way to keep this connection alive (to make it persistent)?
If not, is there any other HTTP library that can create persistent connections?
Looking at the code on GitHub, it doesn't look like there is such an option.
let call ?(ctx=default_ctx) ?headers ?(body=`Empty) ?chunked meth uri =
  ...
  Net.connect_uri ~ctx uri >>= fun (conn, ic, oc) ->
  let closefn () = Net.close ic oc in
  ...
  read_response ~closefn ic oc meth
Where read_response is:
let read_response ~closefn ic oc meth =
  ...
  match has_body with
  | `Yes | `Unknown ->
    let reader = Response.make_body_reader res ic in
    let stream = Body.create_stream Response.read_body_chunk reader in
    let closefn = closefn in
    Lwt_stream.on_terminate stream closefn;
    let gcfn st = closefn () in
    Gc.finalise gcfn stream;
    let body = Body.of_stream stream in
    return (res, body)
If I am reading this correctly, the connection will close as soon as the GC cleans up the stream.

Sharing mutable self between multiple threads

I have a server that accepts connections from multiple clients. Each client could send a message to the server, which is broadcast to all other clients. The problem is that the function that handles each connection should have a reference to the server. However, I want to handle the connections in separate threads, so I cannot use a reference directly.
Since scoped is deprecated, I tried wrapping self in an Arc, but more problems ensued. Below is my attempt:
struct Server {
    listener: TcpListener,
    clients: Vec<TcpStream>
}

impl Server {
    fn new() -> Server {
        Server {
            listener: TcpListener::bind("127.0.0.1:8085").unwrap(),
            clients: vec![]
        }
    }

    fn handle(&self) {
        println!("test");
    }

    fn start(mut self) {
        let mut handles = vec![];
        let a: Arc<Mutex<Server>> = Arc::new(Mutex::new(self));
        let mut selfm = a.lock().unwrap();
        // cannot borrow as mutable... ?
        for stream in selfm.listener.incoming() {
            match stream {
                Ok(stream) => {
                    selfm.clients.push(stream);
                    let aa = a.clone();
                    handles.push(thread::spawn(move || {
                        aa.lock().unwrap().handle();
                    }));
                },
                Err(e) => { println!("{}", e); },
            }
        }
    }
}
I don't understand what to do anymore, and I fear deadlocks will arise with all these locks. Do you have any suggestions?
The error is pretty much unrelated to having multiple threads. The issue is, as the compiler says, that selfm is already borrowed in the line
for stream in selfm.listener.incoming() {
so it cannot be mutably borrowed in the line
selfm.clients.push(stream);
One way to fix this is to destructure selfm before the loop, so the borrows don't conflict. Your start method will then look as follows:
fn start(mut self) {
    let mut handles = vec![];
    let a: Arc<Mutex<Server>> = Arc::new(Mutex::new(self));
    let mut selfm = a.lock().unwrap();
    // destructure selfm here to get a reference to the listener and a mutable reference to the clients
    let Server { ref listener, ref mut clients } = *selfm;
    for stream in listener.incoming() { // listener can be used here
        match stream {
            Ok(stream) => {
                clients.push(stream); // clients can be mutated here
                let aa = a.clone();
                handles.push(thread::spawn(move || {
                    aa.lock().unwrap().handle();
                }));
            },
            Err(e) => { println!("{}", e); },
        }
    }
}
(That being said, you're right to be concerned about the locking, since the mutex will remain locked until selfm goes out of scope, i.e. only when start terminates, i.e. never. I would suggest an alternative design, but it's not really clear to me why you want the threads to have access to the server struct.)
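For illustration only, here is one shape such an alternative could take (a sketch under the assumption that the server mainly needs the client list for broadcasting): keep the listener out of the mutex entirely and share just the clients vector, so no lock is ever held across accept and the per-connection threads only lock what they actually need:
use std::net::{TcpListener, TcpStream};
use std::sync::{Arc, Mutex};
use std::thread;

struct Server {
    listener: TcpListener,
    // Only the client list is shared and locked, not the whole server.
    clients: Arc<Mutex<Vec<TcpStream>>>,
}

impl Server {
    fn new() -> Server {
        Server {
            listener: TcpListener::bind("127.0.0.1:8085").unwrap(),
            clients: Arc::new(Mutex::new(vec![])),
        }
    }

    fn start(self) {
        for stream in self.listener.incoming() {
            match stream {
                Ok(stream) => {
                    let clients = Arc::clone(&self.clients);
                    // Keep a handle to this client for later broadcasts.
                    clients.lock().unwrap().push(stream.try_clone().unwrap());
                    thread::spawn(move || {
                        // Handle `stream` here; lock `clients` only while broadcasting.
                        let peers = clients.lock().unwrap();
                        println!("{:?} connected; {} client(s) total", stream.peer_addr(), peers.len());
                    });
                }
                Err(e) => println!("{}", e),
            }
        }
    }
}

fn main() {
    Server::new().start();
}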