For more details, please see Understanding Black-box Predictions via Influence Functions. Theano Development Team. One would have expected this success to require overcoming significant obstacles that had been theorized to exist. Here are the materials. For the Colab notebook and paper presentation, you will form a group of 2-3 and pick one paper from a list. Influence functions are useful for understanding model behavior, debugging models, and detecting dataset errors (Understanding Black-box Predictions via Influence Functions). Metsis, V., Androutsopoulos, I., and Paliouras, G. Spam filtering with naive Bayes -- which naive Bayes? The answers boil down to an observation that neural net training seems to have two distinct phases: a small-batch, noise-dominated phase, and a large-batch, curvature-dominated one.
nimarb/pytorch_influence_functions - GitHub. The model was ResNet-110. Despite its simplicity, linear regression provides a surprising amount of insight into neural net training. The algorithm will then calculate the influence functions for all images; the first mode is called calc_img_wise. The datasets for the experiments can also be found at the Codalab link. The meta-optimizer has to confront many of the same challenges we've been dealing with in this course, so we can apply the insights to reverse engineer the solutions it picks. Noisy natural gradient as variational inference. Understanding black-box predictions via influence functions. In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction. Insights from a noisy quadratic model. Often we want to identify an influential group of training samples in a particular test prediction. Which optimization techniques are useful at which batch sizes? In many cases, they have far more than enough parameters to memorize the data, so why do they generalize well?
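The README fragment above mentions the calc_img_wise mode of nimarb/pytorch_influence_functions. Here is a minimal usage sketch; the function names (init_logging, get_default_config, calc_img_wise), the pip package name, and the toy model and data are assumptions based on that repository's README rather than a verified API, so check the README before relying on them.

```python
# Sketch only: API names (init_logging, get_default_config, calc_img_wise) and the
# pip package name are taken from the nimarb/pytorch_influence_functions README and
# should be double-checked; the model and data below are random stand-ins.
# pip install pytorch-influence-functions
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_influence_functions as ptif

# Tiny stand-in classifier and loaders so the call signature is concrete.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
train = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
test = TensorDataset(torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
trainloader = DataLoader(train, batch_size=8)
testloader = DataLoader(test, batch_size=1)

ptif.init_logging()
config = ptif.get_default_config()  # recursion depth, damping, scaling, output dir, ...
results = ptif.calc_img_wise(config, model, trainloader, testloader)
# In calc_img_wise mode, influences are computed per test image, listing the most
# harmful and most helpful training images for each prediction.
```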
Understanding Black-box Predictions via Influence Functions. If the influence function is calculated for multiple ... Understanding Black-box Predictions via Influence Functions. Pang Wei Koh & Percy Liang. Presented by Theo, Aditya, and Patrick. 1. Influence functions: definitions and theory. 2. Efficiently calculating influence functions. 3. ... Haoping Xu, Zhihuan Yu, and Jingcheng Niu.
ICML 2017 Best Paper. As noted above, keeping the grad_zs only makes sense if they can be loaded faster than they can be recalculated.
Understanding Black-box Predictions via Influence Functions. Agarwal, N., Bullins, B., and Hazan, E. Second order stochastic optimization in linear time. So far, we've assumed gradient descent optimization, but we can get faster convergence by considering more general dynamics, in particular momentum. We calculate which training images had the largest effect on the classification. Alex Adam, Keiran Paster, and Jenny (Jingyi) Liu. 25%: Colab notebook and paper presentation. G. Zhang, S. Sun, D. Duvenaud, and R. Grosse.
Understanding Black-box Predictions via Influence Functions. GitHub - kohpangwei/influence-release. We show that even on non-convex and non-differentiable models where the theory breaks down, approximations to influence functions can still provide valuable information. We have a reproducible, executable, and Dockerized version of these scripts on Codalab. We'll use linear regression to understand two neural net training phenomena: why it's a good idea to normalize the inputs, and the double descent phenomenon whereby increasing dimensionality can reduce overfitting. Amershi, S., Chickering, M., Drucker, S. M., Lee, B., Simard, P., and Suh, J. Modeltracker: Redesigning performance analysis tools for machine learning. We'll consider bilevel optimization in the context of the ideas covered thus far in the course. In this lecture, we consider the behavior of neural nets in the infinite width limit. To run the tests, further requirements are: ... You can either install this package directly through pip: ... arXiv preprint arXiv:1703.04730 (2017). Which algorithmic choices matter at which batch sizes?
Understanding Black-box Predictions via Influence Functions (2017). For logistic regression, $p(y \mid x) = \sigma(y\,\theta^{\top} x)$, where $\sigma$ is the sigmoid function. See the original paper linked here. "Why should I trust you?": Explaining the predictions of any classifier. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al. ImageNet large scale visual recognition challenge. Simonyan, K., Vedaldi, A., and Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. Fortunately, influence functions give us an efficient approximation. Not just a black box: Learning important features through propagating activation differences. Model-agnostic meta-learning for fast adaptation of deep networks.
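To connect the logistic model above to the closed-form influence expression that appears further below, the missing intermediate step is the per-example gradient and the empirical Hessian (a standard derivation, not quoted from the paper):

$$L(z, \theta) = \log\bigl(1 + \exp(-y\,\theta^{\top} x)\bigr), \qquad \nabla_{\theta} L(z, \theta) = -\sigma(-y\,\theta^{\top} x)\, y\, x,$$

$$H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \sigma(\hat{\theta}^{\top} x_i)\, \sigma(-\hat{\theta}^{\top} x_i)\, x_i x_i^{\top},$$

with $y \in \{-1, +1\}$. Substituting these into $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta})$ yields the logistic-regression expression given below.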
Second-Order Group Influence Functions for Black-Box Predictions. Influence functions help you to debug the results of your deep learning model. Existing influence functions tackle this problem by using first-order approximations of the effect of removing a sample from the training set on the model; a sketch of that first-order approximation follows below. On the limited memory BFGS method for large scale optimization. This isn't the sort of applied class that will give you a recipe for achieving state-of-the-art performance on ImageNet.
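For the group case mentioned above, the simplest first-order treatment (stated here as a plain consequence of the linearity of the approximation, not as the cited paper's final estimator) sums the individual influences of the removed points:

$$\mathcal{I}_{\text{up,loss}}(\mathcal{U}, z_{\text{test}}) \approx \sum_{z \in \mathcal{U}} \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \sum_{z \in \mathcal{U}} \nabla_{\theta} L(z, \hat{\theta}),$$

which ignores interactions among the removed samples; the second-order group influence functions referenced above add correction terms for exactly those interactions.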
WhiteBox Part 2: Interpretable Machine Learning - TooTouch. Theano: A Python framework for fast computation of mathematical expressions, 2016. Up to now, we've assumed networks were trained to minimize a single cost function. Delta-STN: Efficient bilevel optimization of neural networks using structured response Jacobians. Kansagara, D., Englander, H., Salanitro, A., Kagen, D., Theobald, C., Freeman, M., and Kripalani, S. Risk prediction models for hospital readmission: a systematic review. How can we explain the predictions of a black-box model? Yuwen Xiong, Andrew Liao, and Jingkang Wang. Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., and Vaughan, J. W. A theory of learning from different domains.

This ICML 2017 best paper comes from Stanford's Pang Wei Koh and Percy Liang. Upweighting a training point $z$ by an infinitesimal amount $\epsilon$ gives the perturbed minimizer

$$\hat{\theta}_{\epsilon, z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta),$$

and the influence of $z$ on the parameters is

$$\mathcal{I}_{\text{up,params}}(z) \stackrel{\text{def}}{=} \left. \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon} \right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}).$$

By the chain rule, the influence of upweighting $z$ on the loss at a test point $z_{\text{test}}$ is

$$\begin{aligned} \mathcal{I}_{\text{up,loss}}(z, z_{\text{test}}) &\stackrel{\text{def}}{=} \left. \frac{d L(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d \epsilon} \right|_{\epsilon=0} \\ &= \left. \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} \frac{d \hat{\theta}_{\epsilon, z}}{d \epsilon} \right|_{\epsilon=0} \\ &= -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}). \end{aligned}$$

Setting $\epsilon = -1/n$ approximates the effect of removing $z$ from the training set. For input perturbations, write $z = (x, y)$ and $z_{\delta} \stackrel{\text{def}}{=} (x + \delta, y)$, and define

$$\hat{\theta}_{\epsilon, z_{\delta}, -z} \stackrel{\text{def}}{=} \arg\min_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z_{\delta}, \theta) - \epsilon L(z, \theta).$$

Then

$$\begin{aligned} \left. \frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon} \right|_{\epsilon=0} &= \mathcal{I}_{\text{up,params}}(z_{\delta}) - \mathcal{I}_{\text{up,params}}(z) \\ &= -H_{\hat{\theta}}^{-1} \left( \nabla_{\theta} L(z_{\delta}, \hat{\theta}) - \nabla_{\theta} L(z, \hat{\theta}) \right), \end{aligned}$$

and for small $\delta$, a Taylor expansion in $\delta$ gives

$$\left. \frac{d \hat{\theta}_{\epsilon, z_{\delta}, -z}}{d \epsilon} \right|_{\epsilon=0} \approx -H_{\hat{\theta}}^{-1} \left[ \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \right] \delta, \qquad \hat{\theta}_{z_{\delta}, -z} - \hat{\theta} \approx -\frac{1}{n} H_{\hat{\theta}}^{-1} \left[ \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}) \right] \delta.$$

The influence of perturbing $z$ on the test loss is therefore

$$\begin{aligned} \mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^{\top} &\stackrel{\text{def}}{=} \left. \nabla_{\delta} L(z_{\text{test}}, \hat{\theta}_{z_{\delta}, -z})^{\top} \right|_{\delta=0} \\ &= -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta}). \end{aligned}$$

For binary logistic regression, $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ reduces to

$$-y_{\text{test}}\, y \cdot \sigma(-y_{\text{test}} \theta^{\top} x_{\text{test}}) \cdot \sigma(-y \theta^{\top} x) \cdot x_{\text{test}}^{\top} H_{\hat{\theta}}^{-1} x.$$

Influence functions can be used to debug training data: for a given test prediction, the training points with the largest positive and negative $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ are the most harmful and most helpful ones. Forming and inverting $H_{\hat{\theta}}$ explicitly is infeasible for large models, so the paper uses stochastic estimation based on Hessian-vector products, bringing the cost down to $O(np)$. The experiments include a dog-vs-fish subset of ImageNet (900 training images per class) with an Inception v3 network and an SVM with RBF kernel, as well as data-poisoning (training-set) attacks; the related-work discussion contrasts training-set attacks with adversarial examples. Influence functions also help with fixing mislabeled examples: ranking training points by self-influence $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$ surfaces mislabeled points (e.g., when 10% of labels are flipped) much faster than ranking by training loss or checking points at random. The key quantities are $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$, the self-influence $\mathcal{I}_{\text{up,loss}}(z_i, z_i)$, $\mathcal{I}_{\text{pert,loss}}(z, z_{\text{test}})^{\top}$, and $H_{\hat{\theta}}^{-1} \nabla_{x} \nabla_{\theta} L(z, \hat{\theta})$. See also Less Is Better: Unweighted Data Subsampling via Influence Function. The influence estimates track actual leave-one-out retraining closely (correlations around 0.86, and around 0.95 for an SVM with a smoothed hinge loss). The method is straightforward, but it is a deserving best paper.

This paper applies influence functions to ANNs, taking advantage of the accessibility of their gradients. Jianxin Ma, Peng Cui, Kun Kuang, Xin Wang, and Wenwu Zhu. Understanding black-box predictions via influence functions. Proc. 34th Int. Conf. on Machine Learning, pp. 1885-1894. To scale up influence functions to modern machine learning settings, we develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. Influence functions are useful for understanding model behavior, debugging models, detecting dataset errors, and even creating visually indistinguishable training-set attacks. The degree of influence of a single training sample $z$ on all model parameters is calculated as $\mathcal{I}_{\text{up,params}}(z)$ above, where $\epsilon$ is the weight of sample $z$ relative to the other training samples. Cook, R. D. Assessment of local influence. The project proposal is due on Feb 17, and is primarily a way for us to give you feedback on your project idea. Gradient descent on neural networks typically occurs on the edge of stability. S. McCandlish, J. Kaplan, D. Amodei, and the OpenAI Dota Team. Metrics give a local notion of distance on a manifold. Neural nets have achieved amazing results over the past decade in domains as broad as vision, speech, language understanding, medicine, robotics, and game playing. Stochastic Optimization and Scaling [Slides].
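For models small enough to form the Hessian explicitly, the formulas above translate almost line-for-line into code. The following NumPy sketch estimates $\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})$ for binary logistic regression; the toy data, the small damping term added to the Hessian, and all variable names are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_loss(theta, x, y):
    # Per-example gradient of L(z, theta) = log(1 + exp(-y * theta^T x)), y in {-1, +1}.
    return -sigmoid(-y * (theta @ x)) * y * x

def hessian(theta, X, damping=0.01):
    # Empirical Hessian (1/n) sum_i sigma(theta^T x_i) sigma(-theta^T x_i) x_i x_i^T,
    # plus a small damping term (an extra assumption here) to keep it invertible.
    s = sigmoid(X @ theta) * sigmoid(-(X @ theta))
    return (X * s[:, None]).T @ X / X.shape[0] + damping * np.eye(X.shape[1])

def influence_up_loss(theta, X, z_train, z_test):
    # I_up,loss(z, z_test) = -grad L(z_test, theta)^T H^{-1} grad L(z, theta)
    H_inv = np.linalg.inv(hessian(theta, X))
    return -grad_loss(theta, *z_test) @ H_inv @ grad_loss(theta, *z_train)

# Toy data and a crude fit by full-batch gradient descent (stand-in for theta_hat).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_theta = rng.normal(size=5)
Y = np.where(X @ true_theta + 0.1 * rng.normal(size=200) > 0, 1, -1)

theta = np.zeros(5)
for _ in range(500):
    theta -= 0.5 * np.mean([grad_loss(theta, x, y) for x, y in zip(X, Y)], axis=0)

# Influence of training point 0 on the loss at an arbitrary "test" point.
z_test = (rng.normal(size=5), 1)
print(influence_up_loss(theta, X, (X[0], Y[0]), z_test))
```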
Lectures will be delivered synchronously via Zoom, and recorded for asynchronous viewing by enrolled students. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1885-1894. Calculating the influence of the individual samples of your training dataset on the final predictions is straightforward.
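For models too large to invert the Hessian directly, the abstract above stresses that the implementation needs only gradients and Hessian-vector products. Below is a PyTorch sketch of that route, using a LiSSA-style recursion (in the spirit of Agarwal et al., "Second order stochastic optimization in linear time", cited above) to estimate $s_{\text{test}} = H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})$; the model, the random data, and the damping/scale constants are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def flat_grad(output, params, create_graph=False):
    grads = torch.autograd.grad(output, params, create_graph=create_graph)
    return torch.cat([g.reshape(-1) for g in grads])

def hvp(loss, params, v):
    # Hessian-vector product via double backward: H v = d/dtheta [ (dL/dtheta) . v ].
    g = flat_grad(loss, params, create_graph=True)
    return flat_grad(g @ v, params).detach()

def lissa_inverse_hvp(loss_fn, params, v, batches, damp=0.01, scale=25.0):
    # Recursion h <- v + (1 - damp) h - (H h) / scale; returning h / scale gives an
    # estimate of (H + damp * scale * I)^{-1} v when the scaled Hessian is well-behaved.
    h = v.clone()
    for batch in batches:
        h = v + (1 - damp) * h - hvp(loss_fn(batch), params, h) / scale
    return h / scale

# Toy usage with a placeholder linear model and random data.
torch.manual_seed(0)
model = torch.nn.Linear(5, 1)
params = list(model.parameters())

def loss_fn(batch):
    xb, yb = batch
    return F.binary_cross_entropy_with_logits(model(xb).squeeze(-1), yb)

train_batches = [(torch.randn(32, 5), torch.randint(0, 2, (32,)).float())
                 for _ in range(100)]

# s_test = H^{-1} grad L(z_test, theta_hat)
x_test, y_test = torch.randn(1, 5), torch.tensor([1.0])
v = flat_grad(loss_fn((x_test, y_test)), params)
s_test = lissa_inverse_hvp(loss_fn, params, v, train_batches)

# Influence of one training example on the test loss: -grad L(z)^T s_test
x0, y0 = train_batches[0][0][:1], train_batches[0][1][:1]
influence = -flat_grad(loss_fn((x0, y0)), params) @ s_test
print(influence.item())
```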
Understanding Black-box Predictions via Influence Functions. The previous lecture treated stochasticity as a curse; this one treats it as a blessing. The reference implementation can be found here: link. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., and Darrell, T. DeCAF: A deep convolutional activation feature for generic visual recognition.
Understanding Black-box Predictions via Influence Functions. Negative momentum for improved game dynamics.
[ICML] Understanding Black-box Predictions via Influence Functions. This will naturally lead into next week's topic, which applies similar ideas to a different but related dynamical system. Reference: Understanding Black-box Predictions via Influence Functions. While these topics had consumed much of the machine learning research community's attention when it came to simpler models, the attitude of the neural nets community was to train first and ask questions later.