Supervised by Debabrota Basu and Philippe Preux
For the $t$-th patient
$\color{blue}{\epsilon}$: Privacy budget
Privacy Constraint: Rewards may contain sensitive information about individuals
Privacy as a "stability" constraint: How similar is the sequence of actions when $\pi$ interacts with two environments $\nu$ and $\nu'$?
Proof. Construct a maximally "coupled environment"
Plugging the KL upper bound into classic lower bound proof, we generate:
i.e. the sequence of actions only depend on these statistics
2. Estimate the sequence of "sufficient" statistics privatelywhere $ I_a(t-1) = \hat{\mu}_a(t - 1) + \sqrt{\frac{\alpha \log(t)}{2 N_a(t - 1)}} $
Setting | Leakage Score |
---|---|
Empirical mean | $\frac{1}{n} \| z^\star - \mu \|_{C_\sigma^{-1}}^2$ |
Gaussian Noise $(\gamma > 0)$ | $\frac{\sigma^2}{\sigma^2 + \gamma^2} \frac{1}{n} \| z^\star - \mu \|_{C_\sigma^{-1}}^2$ |
Sub-sampling $(\rho < 1)$ | $\frac{\rho}{n} \| z^\star - \mu \|_{C_\sigma^{-1}}^2$ |
Target Misspecification | $\frac{1}{n} \left(z^\star_{\mathrm{targ}} - \mu\right)^T C_\sigma^{-1}\left(z^\star_{\mathrm{true}} - \mu \right)$ |
Setting | Minimax | Problem Dependent |
---|---|---|
Finite-armed bandits | $\max \biggl(\frac{1}{27} \sqrt{T(K-1)}, \frac{1}{22} \frac{K-1}{\epsilon} \biggr)$ | $\sum_{a: \Delta_{a}>0} \frac{\Delta_{a}\log(T)}{ \min (d_a, \epsilon t_a) } $ |
Linear bandits | $\max \biggl (\frac{\exp (-2)}{8} d\sqrt{T}, \frac{\exp (-1)}{4} \frac{d}{\epsilon} \biggr )$ |
$ \inf _{\alpha \in[0, \infty)^{\mathcal{A}}} \sum_{a \in \mathcal{A}} \alpha(a) \Delta_{a}\log(T) $ $\text { s.t. }\|a\|_{H_{\alpha}^{-1}}^{2} \leq 0.5 \Delta_a \min \left (\Delta_{a}, \epsilon \rho(\mathcal{A}) \right )$ |
Minimax | |
---|---|
Stochastic Multi-armed bandit | $\max \biggl(\frac{1}{27} \sqrt{T(K-1)}, \frac{1}{124} \sqrt{\frac{K-1}{\rho}} \biggr)$ |
Stoachastic Linear bandit | $\max \biggl (\frac{\exp (-2)}{8} d\sqrt{T}, \frac{\exp (-2.25)}{4} \frac{d}{\sqrt{\rho}} \biggr )$ |
Bandit Setting | Regret Upper Bound | Regret Lower Bound |
---|---|---|
Finite-armed | $ \mathcal{O}\left(\sqrt{K T \log(T)}\right) + \color{blue}{\mathcal{O} \left( \frac{ K }{\sqrt{\rho}} \sqrt{\log(T)} \right)}$ | $\Omega\left(\max \left ( \sqrt{KT}, \color{blue}{\sqrt{\frac{K}{\rho}}} \right) \right)$ |
Linear | $\mathcal{O} \left ( \sqrt{d T \log(KT)} \right) + \color{blue}{ \mathcal{O} \left (\frac{d}{\sqrt{\rho}}\log^{\frac{3}{2}}(KT) \right)}$ | $ \Omega\left(\max \left ( d \sqrt{T}, \color{blue}{\frac{d}{\sqrt{\rho}}} \right) \right)$ |
Contextual | $\mathcal{O}\left( d \log(T) \sqrt{T} \right) + \color{blue}{ \mathcal{O} \left ( \frac{d^2}{\sqrt{\rho}} \log^2(T) \right)}$ |