Arguing with lead engineer about incremental file approach
We are using autoloader. However, the incoming files are .gz compressed archives coming from a data sync utility, so we have an intermediary process that unzips the archives and moves them to the autoloader directory.
This means we need a way to identify which archives coming from data sync are new.
My proposal has been to use the LastModifiedDate from the file metadata, using a control table to store the watermark.
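To make the proposal concrete, here is a rough sketch of the watermark idea in plain Python. It is not our actual pipeline: the function names, the directory arguments, and the JSON file standing in for the control table are all hypothetical, and it uses the local file modification time as a stand-in for LastModifiedDate.

```python
import gzip
import json
import shutil
from pathlib import Path


def load_watermark(control_file: Path) -> float:
    """Return the last processed modification time, or 0.0 on the first run."""
    if control_file.exists():
        return json.loads(control_file.read_text())["last_modified"]
    return 0.0


def process_new_archives(landing_dir: Path, autoloader_dir: Path, control_file: Path) -> list:
    """Unzip only the .gz archives modified after the stored watermark."""
    watermark = load_watermark(control_file)

    # Keep only archives newer than the watermark, oldest first.
    candidates = sorted(
        (p for p in landing_dir.glob("*.gz") if p.stat().st_mtime > watermark),
        key=lambda p: p.stat().st_mtime,
    )

    processed = []
    for archive in candidates:
        # Path.stem drops the final suffix: "data.csv.gz" -> "data.csv".
        target = autoloader_dir / archive.stem
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        processed.append(archive.name)
        # Hypothetical control table: a JSON file holding the watermark.
        # It is advanced after each successful unzip.
        control_file.write_text(
            json.dumps({"last_modified": archive.stat().st_mtime})
        )
    return processed
```

One caveat I am aware of: a strict timestamp comparison can miss an archive that arrives late with an older modification time, or one that shares a timestamp with the watermark after a partial failure. Storing the processed archive names in the control table alongside the watermark would be one way to cover that.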
The lead engineer has now decided they want to unzip and copy ALL files every day to the autoloader directory. Meaning, if we have 1,000 archives today, we will unzip and copy 1,000 files to the autoloader directory. If we receive 1 new archive tomorrow, we will unzip and copy the same 1,000 archives + the 1 new archive.
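To put rough numbers on the difference, here is a back-of-the-envelope calculation. The 1,000 starting archives come from the example above; the steady rate of 1 new archive per day and the 30-day window are assumptions I picked purely for illustration, and the function name is made up.

```python
def archives_unzipped(initial: int, new_per_day: int, days: int) -> tuple:
    """Total archives unzipped over `days` daily runs under each approach."""
    # Full reprocess: every run unzips everything received so far.
    full = sum(initial + new_per_day * day for day in range(days))
    # Incremental: the first run unzips the backlog, later runs only new arrivals.
    incremental = initial + new_per_day * (days - 1)
    return full, incremental


full, incremental = archives_unzipped(initial=1000, new_per_day=1, days=30)
print(full, incremental)  # 30435 1029
```

Under those assumptions the full-reprocess approach does roughly 30 times the unzip and copy work in the first month, and the gap keeps growing because every new archive gets reprocessed on every later run.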
While I understand the idea and how it supports data resiliency, it is going to blow up our budget, hinder our ability to meet SLAs, and in my opinion goes against the basic principle of a lakehouse to avoid data redundancy.
What are your thoughts? Are there technical reasons I can use to argue against their approach?