Goodreads book reviews dataset - 10 million books, 6 million reviews
Just thought I'd share this Goodreads dataset here. It took me quite a lot of internet sleuthing to find an interesting, complete and large dataset to practice machine learning on, recommender systems in particular.
This data was originally pulled from Goodreads in 2017 by Zygmunt Zając. It contains detailed metadata for 10 000 books (sorry about the typo in the title), as well as 6 million individual numerical ratings collected from 53 000 users. There is no demographic information available for users, but the different files included in the release form an interesting basis for a recommender system.
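As a rough illustration of how the ratings alone can drive a recommender, here is a minimal item-item collaborative filtering sketch in plain Python. It assumes the ratings come as (user_id, book_id, rating) rows, e.g. a ratings CSV with those column names; that layout, `load_ratings` and the other function names are my own assumptions, not something the release defines. Two books count as similar when the users who rated them gave them similar ratings, measured as the cosine of their rating vectors.

```python
import csv
import math
from collections import defaultdict


def load_ratings(path):
    """Hypothetical loader: assumes a CSV with a header row user_id,book_id,rating."""
    with open(path, newline="") as f:
        return [
            (int(row["user_id"]), int(row["book_id"]), int(row["rating"]))
            for row in csv.DictReader(f)
        ]


def build_item_vectors(ratings):
    """Group (user_id, book_id, rating) triples into per-book {user_id: rating} dicts."""
    items = defaultdict(dict)
    for user_id, book_id, rating in ratings:
        items[book_id][user_id] = float(rating)
    return items


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_books(items, book_id, k=5):
    """Return up to k (book_id, similarity) pairs, most similar first."""
    target = items[book_id]
    scores = [
        (other, cosine(target, vector))
        for other, vector in items.items()
        if other != book_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Toy ratings standing in for the real file.
    toy = [(1, 10, 5), (1, 20, 4), (2, 10, 4), (2, 20, 5), (3, 10, 1), (3, 30, 2)]
    items = build_item_vectors(toy)
    print(similar_books(items, 10, k=2))
```

With all 6 million ratings this pure-Python version gets slow, since every query scores every other book; a sparse matrix library would be the more practical choice at full scale.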
I have released an expansion pack of sorts for this dataset that adds book descriptions, genres and other features, enabling the use of various NLP strategies. See here for the augmented dataset. Cheers.
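As one example of what the descriptions make possible, here is a minimal content-based sketch: it turns each description into a TF-IDF vector (raw term counts times a smoothed IDF, log((1 + n) / (1 + df)) + 1) and ranks books by cosine similarity. It assumes you have already loaded the descriptions into a dict mapping book IDs to text; the actual column names in the augmented dataset may differ, and the function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Map each book ID to a sparse {term: tf-idf weight} dict.

    docs is assumed to be a dict of book_id -> description text.
    """
    tokenized = {book_id: tokenize(text) for book_id, text in docs.items()}
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency: count each term once per book
    vectors = {}
    for book_id, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[book_id] = {
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(vectors, book_id, k=5):
    """Return up to k (book_id, similarity) pairs by description, most similar first."""
    target = vectors[book_id]
    scores = [
        (other, cosine(target, vector))
        for other, vector in vectors.items()
        if other != book_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Made-up descriptions standing in for the real ones.
    docs = {
        1: "A young wizard attends a school of magic.",
        2: "A wizard and his friends fight dark magic.",
        3: "A detective solves a murder in London.",
    }
    vectors = tfidf_vectors(docs)
    print(most_similar(vectors, 1, k=2))
```

A stop-word list or a proper tokenizer would help a lot in practice, since very common words like "a" still carry some weight in this version.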