How can I measure the response quality of my RAG?

I want to measure the quality of my RAG outputs to determine if the changes I’m making improve or worsen the results.

Is there a way to score RAG outputs quantitatively, similar to evaluating on held-out test data in machine learning regression or classification tasks?
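To make the analogy concrete, here is a minimal sketch of the kind of test I have in mind. It assumes a small hand-written set of question and reference-answer pairs, and scores each generated answer by token-overlap F1 against its reference. The names `token_f1`, `evaluate`, and the `rag_answer` callable are hypothetical placeholders for illustration, not part of any library, and token overlap is only one possible metric.

```python
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(rag_answer: Callable[[str], str],
             test_set: list[tuple[str, str]]) -> float:
    """Mean token F1 of a RAG pipeline over (question, reference) pairs."""
    scores = [token_f1(rag_answer(q), ref) for q, ref in test_set]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical test set; in practice these would come from my own domain.
    test_set = [
        ("What is the capital of France?", "Paris"),
        ("What is the capital of Germany?", "Berlin"),
    ]

    # Stand-in for the real pipeline (assumption): always answers "Paris".
    def dummy_rag(question: str) -> str:
        return "Paris"

    print(f"mean token F1: {evaluate(dummy_rag, test_set):.2f}")
```

The idea would be to run `evaluate` before and after each change to the pipeline and compare the two scores, the same way I would compare validation metrics for a classifier.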

Does any such method exist, or is this based more on intuition?