10 years and 30,000 files of audit data

Greetings! I am a data hoarder/curator in my spare time and a compliance engineer by trade. After our last audit, I'm starting to dig into curating all of our previous audit responses so that answers are easier to look up for future audits.

To that end, I'm looking for a tool or combination of tools that can process all 30,000 files (Word, Excel, PDF, TXT and image files) and curate them: auto-tag them, pull everything into one big searchable database that I can search by keywords and phrases, etc.
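To make the goal concrete, here is a rough sketch of the kind of pipeline I have in mind, not a real solution. It only handles TXT files and runs entirely locally; Word, Excel, PDF and image files would need text extraction or OCR first, which is not shown. The tag rules and all function names (`TAG_RULES`, `build_index`, `search`) are made up for illustration, and the "auto-tagging" is plain keyword matching, not AI.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical tag rules: tag name -> keywords that trigger it.
TAG_RULES = {
    "access-control": {"password", "mfa", "access"},
    "backup": {"backup", "restore"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def auto_tag(words):
    """Return every tag whose keyword set overlaps the document's words."""
    present = set(words)
    return sorted(tag for tag, keys in TAG_RULES.items() if keys & present)


def build_index(root):
    """Index every .txt file under root.

    Returns (index, tags): index maps word -> set of file paths,
    tags maps file path -> list of tag names.
    """
    index = defaultdict(set)
    tags = {}
    for path in sorted(Path(root).rglob("*.txt")):
        words = tokenize(path.read_text(encoding="utf-8", errors="ignore"))
        for word in words:
            index[word].add(str(path))
        tags[str(path)] = auto_tag(words)
    return index, tags


def search(index, query):
    """Return the paths that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Something like `index, tags = build_index("/path/to/audit/files")` followed by `search(index, "password mfa")` is the workflow I'm after, just at 30,000 files, across all the file types, and with smarter tagging.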

Because this is audit data, it would have to stay on-prem, but in my early searches the tools I've found that leverage AI for auto-tagging aren't on-prem.

Any suggestions are appreciated. Really just trying to wrap my arms around it at this point.