Ported Python code directly to Rust—how can I optimize it to fully leverage Rust’s performance benefits?

I've recently converted some Python code to Rust, hoping for a big performance boost. While it works, I suspect I'm not fully tapping into Rust's strengths as I learned Rust just for this and its my first "project". Below, I’ve shared the code and a screenshot displaying a profiling of the code—I'd really appreciate any tips on how to optimize it further to make the most of Rust’s capabilities.

struct PolyState {
    a: Integer,
    b: Integer,
    c: Integer,
    b_list: Vec<Integer>,
    bainv: HashMap<u32, Vec<u32>>,
    soln_map: HashMap<u32, (u32, u32)>,
    s: u32,
    afact: HashSet<u32>,
}

fn generate_first_polynomial(
    qs_state: &mut QsState,
    n: &Integer,
    m: u32,
    factor_base: &Vec<i32>,
    poly_a_list: &mut Vec<Integer>,
) -> PolyState {
    let (a, qli, afact) = generate_a(n, m, factor_base, poly_a_list);
    let s = qli.len();
    let mut b_list: Vec<Integer> = Vec::new();
    for l in 0..s {
        let p = factor_base[qli[l]] as u32;
        let r1 = qs_state.root_map[&p].0 as i32;
        let aq = (&a / p).complete();
        let invaq = modinv(aq.clone(), p as i32);
        let mut gamma = (r1 * invaq).rem_euclid(p as i32) as u32; // if code doesn't work, ensure overflow isn't happening here
        if gamma > p / 2 {
            gamma = p - gamma;
        }
        b_list.push(aq * gamma);
    }

    let b: Integer = b_list.iter().sum::<Integer>().modulo(&a);
    let c = (&b * &b - n).complete() / &a;
    let mut bainv: HashMap<u32, Vec<u32>> = HashMap::new();
    let mut soln_map: HashMap<u32, (u32, u32)> = HashMap::new();

    for p in factor_base {
        if (&a % *p).complete() == 0 || *p < 3 {
            continue;
        }
        bainv.insert(*p as u32, vec![0; s]);
        let ainv = modinv(a.clone(), *p);

        // store bainv
        let vector = bainv
            .get_mut(&(*p as u32))
            .expect("Could not get mutable reference");

        let mut value: Integer;
        for j in 0..s {
            value = Integer::from(&b_list[j]) * 2;
            value *= ainv;
            vector[j] = value.mod_u(*p as u32);
        }

        // store roots

        let (r1, r2) = qs_state.root_map.get(&(*p as u32)).unwrap();
        let mut r1 = (r1 - &b).complete();
        let mut r2 = (r2 - &b).complete();
        r1 *= ainv;
        r2 *= ainv;
        r1 %= p;
        r2 %= p;
        soln_map.insert(*p as u32, (r1.to_u32().unwrap(), r2.to_u32().unwrap()));
    }
    PolyState {
        a,
        b,
        c,
        b_list,
        bainv,
        soln_map,
        s: s as u32,
        afact,
    }
}

A lot of the code involves huge Integers so I can't use Rust's native types (even u128) easily. I do not think I am utilizing the correct data structures either but am not really sure how else to approach this. Here is a screenshot of the profiled code using Instruments:

Profiling Results

EDIT 1: Here is the code after incorporating many of the given improvements. I have achieved a 2.55x speedup over my initial code in both release and debug mode:

fn generate_first_polynomial(
    qs_state: &mut QsState,
    n: &Integer,
    m: u32,
    factor_base: &Vec<i32>,
    poly_a_list: &mut Vec<Integer>,
) -> PolyState {
    let (a, qli, afact) = generate_a(n, m, factor_base, poly_a_list);
    let s = qli.len();
    let mut b_list: Vec<Integer> = Vec::new();
    for l in 0..s {
        let p = factor_base[qli[l]] as u32;
        let r1 = qs_state.root_map[&p].0 as i32;
        let aq = (&a / p).complete();
        let invaq = modinv(&aq, p as i32);
        let mut gamma = (r1 * invaq).rem_euclid(p as i32) as u32; // if code doesn't work, ensure overflow isn't happening here
        if gamma > p / 2 {
            gamma = p - gamma;
        }
        b_list.push(aq * gamma);
    }

    let b: Integer = b_list.iter().sum::<Integer>().modulo(&a);
    let c = (&b * &b - n).complete() / &a;
    let mut bainv: FxHashMap<u32, Vec<u32>> = FxHashMap::default();
    let mut soln_map: FxHashMap<u32, (u32, u32)> = FxHashMap::default();
    bainv.reserve(factor_base.len());
    soln_map.reserve(factor_base.len());

    let mut r1 = Integer::new();
    let mut r2 = Integer::new();
    let mut res = Integer::new();
    for p in factor_base {
        res.assign(&a % *p);
        if res == 0 || *p < 3 {
            continue;
        }
        let ainv = modinv(&a, *p);

        // store bainv
        let mut vector = vec![0; s];

        let mut value = Integer::new();
        for j in 0..s {
            value.assign(&b_list[j]);
            value *= 2;
            value *= ainv;
            vector[j] = value.mod_u(*p as u32);
        }
        bainv.insert(*p as u32, vector);

        // store roots

        let (r1_val, r2_val) = qs_state.root_map.get(&(*p as u32)).unwrap();
        r1.assign(r1_val - &b);
        r2.assign(r2_val - &b);
        r1 *= &ainv;
        r2 *= &ainv;
        soln_map.insert(*p as u32, (r1.mod_u(*p as u32), r2.mod_u(*p as u32)));
    }
    PolyState {
        a,
        b,
        c,
        b_list,
        bainv,
        soln_map,
        s: s as u32,
        afact,
    }
}

EDIT 2: Probably my last edit unless I see another huge performance jump. Incorporating the suggestions of replacing HashMaps with Vectors(at the cost of a lot of memory) and some other minor inefficiencies I found in the code, the code now runs 3.5x faster than the original code.

As a note I did try using Rayon alongside DashMap but only got a 1.2x performance boost at the expense of 600% CPU utilization and deemed it was not worth it.

Vectors initialized outside of the function:

 // b is a bit larger than largest prime in factor_base
let mut bainv: Vec<Vec<u32>> = vec![vec![0; 30]; (b + 1) as usize];
let mut soln_map: Vec<(u32, u32)> = vec![(0, 0); (b + 1) as usize];

And the new function:

fn generate_first_polynomial(
    qs_state: &mut QsState,
    n: &Integer,
    m: u32,
    bainv: &mut Vec<Vec<u32>>,
    soln_map: &mut [(u32, u32)],
    factor_base: &Vec<i32>,
    poly_a_list: &mut Vec<Integer>,
) -> PolyState {
    let (a, qli, afact) = generate_a(n, m, factor_base, poly_a_list);
    let s = qli.len();
    let mut b_list: Vec<Integer> = vec![Integer::new(); s];
    for i in 0..s {
        let p = factor_base[qli[i]] as u32;
        let r1 = qs_state.root_map[&p].0 as i32;
        let aq = (&a / p).complete();
        let invaq = modinv(&aq, p as i32);
        let mut gamma = (r1 * invaq).rem_euclid(p as i32) as u32; // if code doesn't work, ensure overflow isn't happening here
        if gamma > p / 2 {
            gamma = p - gamma;
        }
        b_list[i] = aq * gamma;
    }

    let b: Integer = b_list.iter().sum::<Integer>().modulo(&a);
    let c = (&b * &b - n).complete() / &a;

    let mut r1 = Integer::new();
    let mut r2 = Integer::new();
    let mut res = Integer::new();
    let mut value = Integer::new();

    factor_base.iter()
        .for_each(|p| {
        res.assign(&a % *p);
        if res == 0 || *p < 3 {
            return;
        }
        let ainv = modinv(&a, *p);
        let ainv2 = 2 * ainv;

        // store bainv

        for j in 0..s {
            value.assign(&b_list[j]);
            value *= ainv2;
            bainv[*p as usize][j] = value.mod_u(*p as u32);
        }

        // store roots

        let (r1_val, r2_val) = qs_state.root_map.get(&(*p as u32)).unwrap();
        r1.assign(r1_val - &b);
        r2.assign(r2_val - &b);
        r1 *= &ainv;
        r2 *= &ainv;
        soln_map[*p as usize].0 = r1.mod_u(*p as u32);
        soln_map[*p as usize].1 = r2.mod_u(*p as u32)
    });

    PolyState {
        a,
        b,
        c,
        b_list,
        s: s as u32,
        afact,
    }
}