Because of deep learning, there has been a surge in interest in automatic differentiation, especially from the functional programming community. As a result, there are many recent papers that look at automatic differentiation from a Category Theory perspective. However, Category Theorists have been studying differentiation, and calculus in general, since the late ’60s in the context of Synthetic Differential Geometry, yet this work seems to be largely ignored by those interested in AD. In this talk, we will provide a gentle introduction to the ideas behind SDG by relating them to dual numbers, and show how it provides a simple, axiomatic, purely algebraic approach to (automatic) differentiation and integration. And no worries if you suffer from arithmophobia: there will be plenty of Kotlin code that turns the math into something fun you can play with for real.
Question: Is there a connection between Dual numbers and Complex numbers? Both have a similar form: (a + Ae) and (Real + i Imaginary)
Answer: Absolutely. Instead of i^2 = -1 we have e^2 = 0.
A complex number is a pair a + bi where i^2 = -1, but what is so special about -1? Why not 0, or even 1?
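The rule e^2 = 0 is exactly what makes dual numbers compute derivatives: multiplying two dual numbers and dropping the e^2 term leaves the product rule in the e coefficient. Here is a minimal sketch in Kotlin (the names `Dual` and `derivative` are illustrative, not from the talk):

```kotlin
// A dual number a + b*e, where e^2 = 0.
data class Dual(val a: Double, val b: Double) {
    operator fun plus(o: Dual) = Dual(a + o.a, b + o.b)
    // (a + b*e)(c + d*e) = ac + (ad + bc)e, since the e^2 term vanishes.
    operator fun times(o: Dual) = Dual(a * o.a, a * o.b + b * o.a)
}

// Forward-mode differentiation: seed the e coefficient with 1.0
// and read the derivative back out of the result.
fun derivative(f: (Dual) -> Dual, x: Double): Double = f(Dual(x, 1.0)).b

fun main() {
    val f = { x: Dual -> x * x * x }   // f(x) = x^3
    println(derivative(f, 2.0))        // 12.0, since f'(x) = 3x^2
}
```

Note that no symbolic manipulation or numeric approximation happens: the derivative falls out of ordinary arithmetic on pairs.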
Question: Could you explain Skolemization again?
If you remember logic programming, it is a technique used there as well.
Basically, you are eliminating existentially quantified variables by a function of the universally quantified variables on which that existentially quantified variable depends.
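Sketched in logical notation (assuming P is some binary predicate):

```latex
\forall x.\, \exists y.\, P(x, y)
\quad\rightsquigarrow\quad
\forall x.\, P(x, f(x))
```

Here f is a fresh "Skolem function": since the choice of y may depend on x, the existential is replaced by a function of the universally quantified variables in scope.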
Question: Can you provide the book names that you showed in the talk
Question: Could you elaborate a bit further on some examples of programming 1.0 (algebra) and 2.0 (calculus), and how the concepts in the talk related to programming? from my current (naive) viewpoint, this feels super abstract.
Answer: Often Software 2.0 is defined as differentiable programming, i.e. all your programs are differentiable. In other words, you do calculus.
Traditional programming is algebra: https://books.google.com/books/about/AlgebraofProgramming.html?id=P5NQAAAAMAAJ
Question: Could you explain how these algebraic methods apply to deep learning? What's the link?
Answer: Training a NN uses automatic differentiation: to train a NN you use backpropagation to learn the parameters, and backpropagation is (backwards) differentiation. But you can "learn" any parameter of any program, as long as the program is differentiable.
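As a toy sketch of "learning a parameter of a program" (not code from the talk; a forward-difference derivative stands in for backpropagation here), gradient descent can tune a single weight so the program w * x maps 3.0 to 6.0:

```kotlin
// Hypothetical loss for the one-parameter program w * x, with
// the training example (input = 3.0, target = 6.0).
fun loss(w: Double): Double {
    val err = w * 3.0 - 6.0
    return err * err
}

// Forward-difference approximation of the derivative; real frameworks
// would use (reverse-mode) automatic differentiation instead.
fun grad(f: (Double) -> Double, w: Double, h: Double = 1e-6): Double =
    (f(w + h) - f(w)) / h

fun main() {
    var w = 0.0
    repeat(200) { w -= 0.01 * grad(::loss, w) }   // gradient descent
    println(w)  // converges to roughly 2.0, the learned parameter
}
```

The same loop, scaled up to millions of parameters and with the derivative computed by backpropagation, is all that neural-network training is.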
Question: Are you saying one of the points of this talk was to be able to identify the classes of program that can be replaced with a NN?
Answer: Not really, but I would argue that many programs could have some of their parameters learned.
Question: When you're talking about software 2.0, are you talking more about self-learning software? For those less initiated with calculus and the topics covered, what would be some examples of using the maths presented?
Answer: Here is a really nice example https://fluxml.ai/blog/2019/03/05/dp-vs-rl.html
Yes, it is all about learning parameters (read: initializing variables) from examples.
Question: This talk inspires me to re-learn calculus. Has anyone here used calculus as part of their day to day software development? If yes, please share.
Answer: Nice! All of deep learning relies on this stuff. But they make a big deal of it. What I am trying to show is that it is not that different from what you already know.
Question: When the building blocks of differentiation are then available, are you progressing to work on higher-level constructs such as defining distributions and then allowing operations on distributions, such as marginalisation of posterior parameters as well as repeated draws such as in tools like Stan?
Answer: Yup, that is Probabilistic programming!
As you know, that is really all about integration. The sad thing here is that the axiomatization allows you to derive all the theory of integration, but it does not suggest an implementation. For that you need Monte Carlo simulation.
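To make the implementation side concrete, here is a minimal Monte Carlo sketch (illustrative names, not code from the talk): estimate an integral by averaging the integrand at uniformly random sample points.

```kotlin
import kotlin.random.Random

// Monte Carlo estimate of the integral of f over [0, 1]:
// average f at n uniform random samples. Seeded for reproducibility.
fun monteCarlo(f: (Double) -> Double, n: Int, rng: Random = Random(42)): Double =
    (1..n).sumOf { f(rng.nextDouble()) } / n

fun main() {
    // Integral of x^2 over [0, 1] is exactly 1/3.
    println(monteCarlo({ x -> x * x }, 100_000))  // roughly 0.333
}
```

The estimate converges at rate 1/sqrt(n) regardless of dimension, which is why Monte Carlo methods are the workhorse of probabilistic-programming inference.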
Question: Is it leading towards a general domain-specific language for probabilistic programming or would be rather more leading towards lower-level building blocks enabling those more domain-specific applications?
Answer: For probabilistic programming, the trend is more towards DSLs because that makes it easier to implement inference efficiently.
We went from adding PPL primitives to Hack/PHP to defining a DSL Bean Machine in Python.
Question: So will the underlying methods you're working on eventually make this a fundamental part of the language, instead of a hack on top of the various libraries? Do you have a view on what is happening with the Swift for TensorFlow API, or is that also a hack on top of other libraries (and perhaps even Python)?
Answer: We are building something similar to Swift for TensorFlow. Our PPL is more like a DSL: https://pgm2020.cs.aau.dk/wp-content/uploads/2020/09/tehrani20.pdf
Inside Every Calculus Is A Little Algebra Waiting To Get Out
Erik Meijer is a Dutch computer scientist and entrepreneur. He received his Ph.D. from Nijmegen University in 1992 and has contributed to both academic institutions and major technology corporations.
Erik's research has included functional programming (particularly Haskell), compiler implementation, parsing, programming language design, XML, and foreign function interfaces. He has worked as an associate professor at Utrecht University, an adjunct professor at the Oregon Graduate Institute, a part-time professor of Cloud Programming within the Software Engineering Research Group at Delft University of Technology, and an Honorary Professor of Programming Language Design at the School of Computer Science of the University of Nottingham, associated with the Functional Programming Laboratory.
From 2000 to early 2013 Erik was a software architect for Microsoft where he headed the Cloud Programmability Team. His work at Microsoft included C#, Visual Basic, LINQ, Volta, and the reactive programming framework (Reactive Extensions) for .NET. He founded Applied Duality Inc. in 2013 and since 2015 has been a Director of Engineering at Facebook.