Bernhard Schölkopf

Born	20 February 1968 (age 56)
Alma mater	University of London (1992, MSc in Mathematics); University of Tübingen (1994, Diplom in Physics); TU Berlin (1997, PhD in Computer Science)
Known for	Machine learning; kernel methods; causal inference
Awards	J. K. Aggarwal Prize of the International Association for Pattern Recognition (2006); Max Planck Research Award (2011); Academy Prize of the Berlin-Brandenburg Academy of Sciences and Humanities (2012); Milner Award (2014); Member of the German National Academy of Sciences (Leopoldina) (2017); Fellow of the ACM (Association for Computing Machinery) (2018); Leibniz Prize (2018); Causality in Statistics Education Award, American Statistical Association; Körber European Science Prize (2019); BBVA Foundation Frontiers of Knowledge Award (2020)
Scientific career
Institutions	Max Planck Institute for Intelligent Systems
Thesis	Support Vector Learning (1997)
Doctoral advisor	Stefan Jähnichen; Vladimir Vapnik
Doctoral students	Ulrike von Luxburg

Bernhard Schölkopf (born 20 February 1968) is a German computer scientist known for his work in machine learning, especially on kernel methods and causality. He is a director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he heads the Department of Empirical Inference. He is also an affiliated professor at ETH Zürich, honorary professor at the University of Tübingen and the Technical University Berlin, and chairman of the European Laboratory for Learning and Intelligent Systems (ELLIS).

Research

Kernel methods

Schölkopf developed SVM methods that achieved what was then world-record performance on the MNIST pattern recognition benchmark.^[2] With the introduction of kernel PCA, Schölkopf and coauthors argued that SVMs are a special case of a much larger class of methods, and that any algorithm that can be expressed in terms of dot products can be generalized to a nonlinear setting by means of what are known as reproducing kernels.^[3]^[4]^[5] Another significant observation was that the data on which the kernel is defined need not be vectorial, as long as the kernel Gram matrix is positive definite.^[3] Together, these two insights laid the foundation for the field of kernel methods, which encompasses SVMs and many other algorithms. Kernel methods are now textbook knowledge and one of the major machine learning paradigms in research and applications.
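Both observations can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the function names (`poly_kernel`, `feature_map`, `bigram_kernel`) and the sample inputs are assumptions chosen for illustration. The first part checks that a degree-2 polynomial kernel equals an ordinary dot product in an explicit nonlinear feature space, so an algorithm that only uses dot products can switch to the kernel without ever computing the features. The second part defines a kernel on strings, which are non-vectorial data, as the dot product of their character-bigram counts.

```python
import math
from collections import Counter


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def feature_map(x):
    """Explicit feature map for 2-D inputs with phi(x) . phi(y) = (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def bigram_counts(s):
    """Count the overlapping character bigrams of a string."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def bigram_kernel(s, t):
    """Kernel on strings: dot product of the two bigram-count vectors."""
    cs, ct = bigram_counts(s), bigram_counts(t)
    return sum(cs[b] * ct[b] for b in cs)


# Illustrative inputs (assumed, not from the source).
x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly_kernel(x, y)                     # kernel evaluation only
k_explicit = dot(feature_map(x), feature_map(y))   # explicit feature space

print(math.isclose(k_implicit, k_explicit))
print(bigram_kernel("kernel", "kennel"))
```

Because `bigram_kernel` is a dot product of count vectors, any Gram matrix built from it is positive semidefinite, which is the condition the paragraph above refers to as positive definiteness in the kernel literature.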

After developing kernel PCA, Schölkopf extended it to extract invariant features and to design invariant kernels,^[4]^[6]^[7] and showed how other major dimensionality reduction methods such as LLE and Isomap can be viewed as special cases. In further work with Alex Smola and others, he extended the SVM method to regression and classification with pre-specified sparsity^[8] and to quantile/support estimation.^[9] He proved a representer theorem implying that SVMs, kernel PCA, and most other kernel algorithms regularized by a norm in a reproducing kernel Hilbert space have solutions that take the form of kernel expansions on the training data, which reduces an infinite-dimensional optimization problem to a finite-dimensional one. He co-developed kernel embeddings of distributions, methods that represent probability distributions in Hilbert spaces,^[10]^[11]^[12]^[13] with links to Fraunhofer diffraction^[14] as well as applications to independence testing.^[15]^[16]^[17]
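As an illustration of how distributions can be compared through such embeddings, the following Python sketch computes a biased estimate of the squared maximum mean discrepancy (MMD), the distance between the kernel mean embeddings of two samples that underlies the kernel two-sample test. The Gaussian kernel, the bandwidth of 1.0, the function names, and the toy samples are assumptions made for this example and do not reproduce the experiments of the cited papers.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: the squared Hilbert-space distance
    between the empirical kernel mean embeddings of xs and ys."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


# Toy one-dimensional samples (assumed for illustration only).
sample_a = [(0.0,), (0.5,), (1.0,)]
sample_b = [(3.0,), (3.5,), (4.0,)]

print(mmd_squared(sample_a, sample_a))  # identical samples
print(mmd_squared(sample_a, sample_b))  # well-separated samples
```

The estimate is zero when a sample is compared with itself and grows as the two samples move apart. A full two-sample test would additionally need a significance threshold, which this sketch omits.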

Causality

Starting in 2005, Schölkopf turned his attention to causal inference. Causal mechanisms in the world give rise to statistical dependencies as epiphenomena, but only the latter are exploited by popular machine learning algorithms. Knowledge of causal structures and mechanisms is useful because it lets us predict not only future data coming from the same source, but also the effect of interventions in a system, and because it facilitates the transfer of detected regularities to new situations.^[18]

Schölkopf and co-workers addressed (and in certain settings solved) the problem of causal discovery for the two-variable setting^[19]^[20]^[21]^[22]^[23] and connected causality to Kolmogorov complexity.^[24]

Around 2010, Schölkopf began to explore how to use causality for machine learning, exploiting assumptions of independence of mechanisms and invariance.^[25] His early work on causal learning was presented to a wider machine learning audience in his Posner lecture^[26] at NeurIPS 2011, as well as in a keynote talk at ICML 2017.^[27] He studied how to exploit underlying causal structures in order to make machine learning methods more robust to distribution shifts^[18]^[28]^[29] and systematic errors,^[30] the latter leading to the discovery of a number of new exoplanets,^[31] including K2-18b, which was subsequently found to contain water vapour in its atmosphere, a first for an exoplanet in the habitable zone.

Education and employment

Schölkopf studied mathematics, physics, and philosophy in Tübingen and London. He was supported by the Studienstiftung and won the Lionel Cooper Memorial Prize for the best M.Sc. in Mathematics at the University of London.^[32] He completed a Diplom in Physics, and then moved to Bell Labs in New Jersey, where he worked with Vladimir Vapnik, who became co-adviser of his PhD thesis at the TU Berlin (with Stefan Jähnichen). His thesis, defended in 1997, won the annual award of the German Informatics Association.^[33] In 2001, following positions in Berlin, Cambridge and New York, he founded the Department for Empirical Inference at the Max Planck Institute for Biological Cybernetics, which grew into a leading center for research in machine learning. In 2011, he became founding director at the Max Planck Institute for Intelligent Systems.^[34]^[35]

With Alex Smola, Schölkopf co-founded the series of Machine Learning Summer Schools.^[36] He also co-founded a Cambridge-Tübingen PhD Programme^[37] and the Max Planck-ETH Center for Learning Systems.^[38] In 2016, he co-founded the Cyber Valley research consortium.^[39] He participated in the IEEE Global Initiative on "Ethically Aligned Design".^[40]

Schölkopf is co-editor-in-chief of the Journal of Machine Learning Research, a journal he helped found after taking part in a mass resignation of the editorial board of the journal Machine Learning. He is among the world’s most cited computer scientists.^[41] Alumni of his lab include Ulrike von Luxburg, Carl Rasmussen, Matthias Hein, Arthur Gretton, Gunnar Rätsch, Matthias Bethge, Stefanie Jegelka, Jason Weston, Olivier Bousquet, Olivier Chapelle, Joaquin Quinonero-Candela, and Sebastian Nowozin.^[42]

Awards

Schölkopf’s awards include the Royal Society Milner Award and, shared with Isabelle Guyon and Vladimir Vapnik, the BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category. He was the first scientist working in Europe to receive this award.^[43]

References

[1] "Causality in Statistics Education Award". www.amstat.org.
[2] Decoste, Dennis; Schölkopf, Bernhard (1 January 2002). "Training Invariant Support Vector Machines". Machine Learning. 46 (1): 161–190. doi:10.1023/A:1012454411458. hdl:11858/00-001M-0000-0013-E06A-A. S2CID 85843 – via Springer Link.
[3] Schölkopf, Bernhard (1997). Support vector learning. GMD-Berichte. München Wien: Oldenbourg. ISBN 978-3-486-24632-2.
[4] Schölkopf, Bernhard; Smola, Alexander; Müller, Klaus-Robert (1 July 1998). "Nonlinear Component Analysis as a Kernel Eigenvalue Problem". Neural Computation. 10 (5): 1299–1319. doi:10.1162/089976698300017467. ISSN 0899-7667. S2CID 6674407.
[5] Burges, Christopher J.C. (1 June 1998). "A Tutorial on Support Vector Machines for Pattern Recognition". Data Mining and Knowledge Discovery. 2 (2): 121–167. doi:10.1023/A:1009715923555. S2CID 221627509 – via Springer Link.
[6] B. Schölkopf, P. Simard, A. J. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10, pages 640–646, Cambridge, MA, USA, 1998. MIT Press
[7] O. Chapelle and B. Schölkopf. Incorporating invariances in nonlinear SVMs. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 609–616, Cambridge, MA, USA, 2002. MIT Press
[8] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207–1245, 2000
[9] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001
[10] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf and A. Smola. A Kernel Method for the Two-Sample-Problem. Advances in Neural Information Processing Systems 19: 513–520, 2007
[11] A. J. Smola, A. Gretton, L. Song and B. Schölkopf. A Hilbert Space Embedding for Distributions. Algorithmic Learning Theory: 18th International Conference: 13–31, 2007
[12] B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf and G. Lanckriet. Hilbert Space Embeddings and Metrics on Probability Measures. Journal of Machine Learning Research, 11: 1517–1561, 2010
[13] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf and A. J. Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13: 723–773, 2012
[14] S. Harmeling, M. Hirsch, and B. Schölkopf. On a link between kernel mean maps and Fraunhofer diffraction, with an application to super-resolution beyond the diffraction limit. In Computer Vision and Pattern Recognition (CVPR), pages 1083–1090. IEEE, 2013
[15] A. Gretton, R. Herbrich, A. J. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005
[16] A. Gretton, O. Bousquet, A. J. Smola and B. Schölkopf. Measuring Statistical Dependence with Hilbert-Schmidt Norms. Algorithmic Learning Theory: 16th International Conference, 2005
[17] A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schölkopf and A. J. Smola. A Kernel Statistical Test of Independence. Advances in Neural Information Processing Systems 20, 2007
[18] B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress
[19] P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 689–696, Red Hook, NY, USA, 2009. Curran
[20] D. Janzing, P. Hoyer, and B. Schölkopf. Telling cause from effect based on high-dimensional observations. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning, pages 479–486, Madison, WI, USA, 2010. International Machine Learning Society
[21] J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102, 2016
[22] J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014
[23] P. Daniusis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf. Inferring deterministic causal relations. In P. Grünwald and P. Spirtes, editors, 26th Conference on Uncertainty in Artificial Intelligence, pages 143–150, Corvallis, OR, 2010. AUAI Press. Best student paper award
[24] Janzing, Dominik; Schölkopf, Bernhard (6 October 2010). "Causal Inference Using the Algorithmic Markov Condition". IEEE Transactions on Information Theory. 56 (10): 5168–5194. arXiv:0804.3678. doi:10.1109/TIT.2010.2060095. S2CID 11867432 – via IEEE Xplore.
[25] Schölkopf, Bernhard; Janzing, Dominik; Peters, Jonas; Sgouritsa, Eleni; Zhang, Kun (27 June 2012). "On Causal and Anticausal Learning" (PDF). International Conference of Machine Learning.
[26] "From kernels to causal inference". videolectures.net.
[27] "Causal Learning --- Bernhard Schölkopf". 15 October 2017 – via Vimeo.
[28] K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of JMLR Workshop and Conference Proceedings, pages 819–827, 2013
[29] Schölkopf, Bernhard (6 February 2015). "Learning to see and act". Nature. 518 (7540): 486–487. doi:10.1038/518486a. PMID 25719660. S2CID 4461791 – via www.nature.com.
[30] Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas (5 July 2016). "Modeling confounding by half-sibling regression". Proceedings of the National Academy of Sciences. 113 (27): 7391–7398. Bibcode:2016PNAS..113.7391S. doi:10.1073/pnas.1511656113. PMC 4941423. PMID 27382154.
[31] D. Foreman-Mackey, B. T. Montet, D. W. Hogg, T. D. Morton, D. Wang, and B. Schölkopf. A systematic search for transiting planets in the K2 data. The Astrophysical Journal, 806(2), 2015
[32] "Curriculum Vitae Prof. Dr. Bernhard Schölkopf" (PDF). Leopoldina (in German).
[33] "TU Berlin – Medieninformation Nr. 209 – 17. September 1998". archiv.pressestelle.tu-berlin.de.
[34] "History of the Institute". www.kyb.tuebingen.mpg.de.
[35] "Prescriptions for the Medicine of Tomorrow" (PDF). The Science Magazine of the Max Planck Society. 2011.
[36] "Machine Learning Summer Schools – MLSS". mlss.cc.
[37] "Cambridge Machine Learning Group". Cambridge Machine Learning Group.
[38] Williams, Jonathan. "Max Planck ETH Center for Learning Systems". cls-staging.is.localnet.
[39] "Service". Baden-Württemberg.de. 15 December 2016.
[40] "Ethically Aligned Design" (PDF). IEEE. 13 December 2016.
[41] "World's Top Computer Scientists: H-Index Computer Science Ranking". www.guide2research.com.
[42] "Alumni". people.tuebingen.mpg.de.
[43] Williams, Jon. "Bernhard Schölkopf receives Frontiers of Knowledge Award | Empirical Inference". Max Planck Institute for Intelligent Systems.

External links

Scholia has an author profile for Bernhard Schölkopf.

Bernhard Schölkopf publications indexed by Google Scholar
