Optimal Algorithms for Stochastic Linear Bandits

I am wondering if anyone has seen a paper matching a regret lower bound for the following setting. A linear bandit where, at each time step, one of n loss functions is chosen stochastically according to a distribution over {1, ..., n}. The learner chooses an action in a d-dimensional continuous space. I'm assuming bandit feedback: at each time step, the learner only observes the loss of the action it chose, not the losses of other actions.
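To make the setting concrete, here is a minimal sketch of the interaction protocol I have in mind. It assumes (my assumption, not from any particular paper) that each of the n loss functions is linear, f_i(x) = <theta_i, x>, and that the index i is drawn i.i.d. each round from a fixed distribution p over {1, ..., n}; the names `thetas`, `p`, and `play_round` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, T = 3, 5, 1000

# Hypothetical instance: n linear loss functions f_i(x) = <theta_i, x>,
# one drawn i.i.d. each round from a fixed distribution p over {1, ..., n}.
thetas = rng.normal(size=(n, d))
p = rng.dirichlet(np.ones(n))

def play_round(x):
    """Bandit feedback: return only the loss of the chosen action x."""
    i = rng.choice(n, p=p)
    return float(thetas[i] @ x)

# A (deliberately naive) learner that plays one fixed unit-norm action;
# any real algorithm would update x based on the observed scalar losses.
x = np.ones(d) / np.sqrt(d)
losses = [play_round(x) for _ in range(T)]

# In expectation the learner faces the single mean loss vector p @ thetas,
# which is what the O(d sqrt(T))-type regret bounds compete against.
expected_theta = p @ thetas
```

The point of the sketch is just that the feedback is a single scalar per round, so the learner must trade off estimating the mean loss vector against exploiting it.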

From what I've read, a lower bound of \Omega( d \sqrt{T} ) has been proven; however, I don't think any algorithm has been shown to have expected regret upper bounded by O( d \sqrt{T} ). The best algorithms I've seen, searching as broadly as I can, guarantee O( d \sqrt{T \log T} ) expected regret. If there were an algorithm guaranteeing O( d \sqrt{T} ), I would expect the paper to be easy to find.

Does anyone know of an algorithm that shaves off the logarithmic factor and guarantees O( d \sqrt{T} ) expected regret under bandit feedback?