Ported Python code directly to Rust—how can I optimize it to fully leverage Rust’s performance benefits?
I've recently converted some Python code to Rust, hoping for a big performance boost. While it works, I suspect I'm not fully tapping into Rust's strengths as I learned Rust just for this and its my first "project". Below, I’ve shared the code and a screenshot displaying a profiling of the code—I'd really appreciate any tips on how to optimize it further to make the most of Rust’s capabilities.
struct PolyState {
a: Integer,
b: Integer,
c: Integer,
b_list: Vec<Integer>,
bainv: HashMap<u32, Vec<u32>>,
soln_map: HashMap<u32, (u32, u32)>,
s: u32,
afact: HashSet<u32>,
}
fn generate_first_polynomial(
qs_state: &mut QsState,
n: &Integer,
m: u32,
factor_base: &Vec<i32>,
poly_a_list: &mut Vec<Integer>,
) -> PolyState {
let (a, qli, afact) = generate_a(n, m, factor_base, poly_a_list);
let s = qli.len();
let mut b_list: Vec<Integer> = Vec::new();
for l in 0..s {
let p = factor_base[qli[l]] as u32;
let r1 = qs_state.root_map[&p].0 as i32;
let aq = (&a / p).complete();
let invaq = modinv(aq.clone(), p as i32);
let mut gamma = (r1 * invaq).rem_euclid(p as i32) as u32; // if code doesn't work, ensure overflow isn't happening here
if gamma > p / 2 {
gamma = p - gamma;
}
b_list.push(aq * gamma);
}
let b: Integer = b_list.iter().sum::<Integer>().modulo(&a);
let c = (&b * &b - n).complete() / &a;
let mut bainv: HashMap<u32, Vec<u32>> = HashMap::new();
let mut soln_map: HashMap<u32, (u32, u32)> = HashMap::new();
for p in factor_base {
if (&a % *p).complete() == 0 || *p < 3 {
continue;
}
bainv.insert(*p as u32, vec![0; s]);
let ainv = modinv(a.clone(), *p);
// store bainv
let vector = bainv
.get_mut(&(*p as u32))
.expect("Could not get mutable reference");
let mut value: Integer;
for j in 0..s {
value = Integer::from(&b_list[j]) * 2;
value *= ainv;
vector[j] = value.mod_u(*p as u32);
}
// store roots
let (r1, r2) = qs_state.root_map.get(&(*p as u32)).unwrap();
let mut r1 = (r1 - &b).complete();
let mut r2 = (r2 - &b).complete();
r1 *= ainv;
r2 *= ainv;
r1 %= p;
r2 %= p;
soln_map.insert(*p as u32, (r1.to_u32().unwrap(), r2.to_u32().unwrap()));
}
PolyState {
a,
b,
c,
b_list,
bainv,
soln_map,
s: s as u32,
afact,
}
}
A lot of the code involves huge Integers so I can't use Rust's native types (even u128) easily. I do not think I am utilizing the correct data structures either but am not really sure how else to approach this. Here is a screenshot of the profiled code using Instruments:
EDIT 1: Here is the code after incorporating many of the given improvements. I have achieved a 2.55x speedup over my initial code in both release and debug mode:
fn generate_first_polynomial(
qs_state: &mut QsState,
n: &Integer,
m: u32,
factor_base: &Vec<i32>,
poly_a_list: &mut Vec<Integer>,
) -> PolyState {
let (a, qli, afact) = generate_a(n, m, factor_base, poly_a_list);
let s = qli.len();
let mut b_list: Vec<Integer> = Vec::new();
for l in 0..s {
let p = factor_base[qli[l]] as u32;
let r1 = qs_state.root_map[&p].0 as i32;
let aq = (&a / p).complete();
let invaq = modinv(&aq, p as i32);
let mut gamma = (r1 * invaq).rem_euclid(p as i32) as u32; // if code doesn't work, ensure overflow isn't happening here
if gamma > p / 2 {
gamma = p - gamma;
}
b_list.push(aq * gamma);
}
let b: Integer = b_list.iter().sum::<Integer>().modulo(&a);
let c = (&b * &b - n).complete() / &a;
let mut bainv: FxHashMap<u32, Vec<u32>> = FxHashMap::default();
let mut soln_map: FxHashMap<u32, (u32, u32)> = FxHashMap::default();
bainv.reserve(factor_base.len());
soln_map.reserve(factor_base.len());
let mut r1 = Integer::new();
let mut r2 = Integer::new();
let mut res = Integer::new();
for p in factor_base {
res.assign(&a % *p);
if res == 0 || *p < 3 {
continue;
}
let ainv = modinv(&a, *p);
// store bainv
let mut vector = vec![0; s];
let mut value = Integer::new();
for j in 0..s {
value.assign(&b_list[j]);
value *= 2;
value *= ainv;
vector[j] = value.mod_u(*p as u32);
}
bainv.insert(*p as u32, vector);
// store roots
let (r1_val, r2_val) = qs_state.root_map.get(&(*p as u32)).unwrap();
r1.assign(r1_val - &b);
r2.assign(r2_val - &b);
r1 *= &ainv;
r2 *= &ainv;
soln_map.insert(*p as u32, (r1.mod_u(*p as u32), r2.mod_u(*p as u32)));
}
PolyState {
a,
b,
c,
b_list,
bainv,
soln_map,
s: s as u32,
afact,
}
}
EDIT 2: Probably my last edit unless I see another huge performance jump. Incorporating the suggestions of replacing HashMaps with Vectors(at the cost of a lot of memory) and some other minor inefficiencies I found in the code, the code now runs 3.5x faster than the original code.
As a note I did try using Rayon alongside DashMap but only got a 1.2x performance boost at the expense of 600% CPU utilization and deemed it was not worth it.
Vectors initialized outside of the function:
// b is a bit larger than largest prime in factor_base
let mut bainv: Vec<Vec<u32>> = vec![vec![0; 30]; (b + 1) as usize];
let mut soln_map: Vec<(u32, u32)> = vec![(0, 0); (b + 1) as usize];
And the new function:
fn generate_first_polynomial(
qs_state: &mut QsState,
n: &Integer,
m: u32,
bainv: &mut Vec<Vec<u32>>,
soln_map: &mut [(u32, u32)],
factor_base: &Vec<i32>,
poly_a_list: &mut Vec<Integer>,
) -> PolyState {
let (a, qli, afact) = generate_a(n, m, factor_base, poly_a_list);
let s = qli.len();
let mut b_list: Vec<Integer> = vec![Integer::new(); s];
for i in 0..s {
let p = factor_base[qli[i]] as u32;
let r1 = qs_state.root_map[&p].0 as i32;
let aq = (&a / p).complete();
let invaq = modinv(&aq, p as i32);
let mut gamma = (r1 * invaq).rem_euclid(p as i32) as u32; // if code doesn't work, ensure overflow isn't happening here
if gamma > p / 2 {
gamma = p - gamma;
}
b_list[i] = aq * gamma;
}
let b: Integer = b_list.iter().sum::<Integer>().modulo(&a);
let c = (&b * &b - n).complete() / &a;
let mut r1 = Integer::new();
let mut r2 = Integer::new();
let mut res = Integer::new();
let mut value = Integer::new();
factor_base.iter()
.for_each(|p| {
res.assign(&a % *p);
if res == 0 || *p < 3 {
return;
}
let ainv = modinv(&a, *p);
let ainv2 = 2 * ainv;
// store bainv
for j in 0..s {
value.assign(&b_list[j]);
value *= ainv2;
bainv[*p as usize][j] = value.mod_u(*p as u32);
}
// store roots
let (r1_val, r2_val) = qs_state.root_map.get(&(*p as u32)).unwrap();
r1.assign(r1_val - &b);
r2.assign(r2_val - &b);
r1 *= &ainv;
r2 *= &ainv;
soln_map[*p as usize].0 = r1.mod_u(*p as u32);
soln_map[*p as usize].1 = r2.mod_u(*p as u32)
});
PolyState {
a,
b,
c,
b_list,
s: s as u32,
afact,
}
}