In many applications, one wants to identify identical subtrees of a program syntax tree. This identification should ideally be robust to alpha-renaming of the program, but no existing technique has been shown to achieve this with good efficiency (better than O(n^2) in expression size). We present a new, asymptotically efficient way to hash modulo alpha-equivalence. A key insight of our method is to use a weak (commutative) hash combiner at exactly one point in the construction, which admits an algorithm with O(n*(log n)^2) time complexity. We prove that the use of the commutative combiner nevertheless yields a strong hash with low collision probability.
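To fix ideas, here is a minimal Haskell sketch. It is not the paper's algorithm: it shows only the naive baseline, hashing one closed term modulo alpha-equivalence by converting it to de Bruijn indices first, so bound-variable names disappear before hashing. Redoing that conversion for every subtree is what costs O(n^2); the paper's commutative-combiner construction hashes all subtrees in O(n*(log n)^2). The names Expr, DB and hashAlpha are made up for this example.

-- A naive baseline, NOT the paper's algorithm: hash one closed term
-- modulo alpha-equivalence via de Bruijn conversion. Applying this
-- independently to every subtree is O(n^2); the paper's algorithm
-- achieves O(n (log n)^2) for all subtrees at once.
import Data.Hashable (hash)   -- from the 'hashable' package
import Data.List (elemIndex)

data Expr = Var String | Lam String Expr | App Expr Expr

-- De Bruijn form: a variable becomes its distance to its binder.
data DB = DVar Int | DLam DB | DApp DB DB deriving Show

toDB :: [String] -> Expr -> DB
toDB env (Var x)   = maybe (error ("free variable: " ++ x)) DVar (elemIndex x env)
toDB env (Lam x b) = DLam (toDB (x : env) b)
toDB env (App f a) = DApp (toDB env f) (toDB env a)

-- Alpha-invariant hash of a closed term: hash its de Bruijn skeleton.
hashAlpha :: Expr -> Int
hashAlpha = hash . show . toDB []

main :: IO ()
main = do
  let t1 = Lam "x" (Lam "y" (App (Var "x") (Var "y")))
      t2 = Lam "a" (Lam "b" (App (Var "a") (Var "b")))
  print (hashAlpha t1 == hashAlpha t2)   -- True: t1 and t2 are alpha-equivalent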
Q&A
Question: What effect does this have on GHC compile times?
Answer: I have not tried this in GHC. We are using it at MSR in an optimiser for machine-learning programs.
Question: I recall a video where you jokingly/seriously talked about how you think "Haskell is useless!". Limited side effects, but not 'useful' like other languages.
What do you think about that 10 years on?
How do you think passionate everyday FP'ers can drag Haskell/FP to the "Useful" Nirvana?
Answer: It was indeed jokingly. Haskell is now increasingly used in production, I'm happy to say; so much so that we've started the Haskell Foundation https://haskell.foundation/ to support this increasingly mission-critical role.
Question: Is there a measure of how much improvement there is in sharing, given that the de Bruijn approach etc. is incorrect? Assuming the point here is to find common sub-expressions and share references to them, the overall expression size should become smaller. I'm wondering how much better this tends to be at that, due to finding all common sub-expressions accurately.
Answer: You mean: for CSE we could put up with false negatives, provided we don't get any false positives. Yes, that's true. Indeed, in GHC, we use an even simpler CSE that has lots of false negatives. But I don't know of any measurements that tell us how much difference that makes in practice. I bet it's not much!
So maybe this talk won't change the world of compiler writers after all. But it's much more satisfying to have an algorithm that finds precisely the right answer, and does so reasonably quickly. My inner geek is happy.
Plus, I suppose, best-effort algorithms can be fragile to "I made a small change to my program and it runs way slower now... the compiler missed some vital optimisation".
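For concreteness, here is a hedged sketch of the bookkeeping behind CSE as discussed above; it is illustrative, not GHC's CSE. Subtrees are bucketed by a key, and only genuinely equal subtrees are reported as sharing candidates, so false positives cannot reach the output; a weaker key (here the purely structural one, which is not alpha-aware) merely misses some candidates, i.e. false negatives. The names subtrees and cseCandidates are made up for the example.

-- An illustrative sketch of CSE candidate-finding, not GHC's CSE.
-- Bucketing is by the subtree itself (structural equality), which is
-- sound but not alpha-aware: \x -> x and \y -> y land in different
-- buckets, a false negative. An alpha-aware hash would merge those
-- buckets, finding strictly more sharing.
import qualified Data.Map.Strict as Map

data Expr = Var String | Lam String Expr | App Expr Expr
  deriving (Eq, Ord, Show)

-- Every subtree of an expression, including the expression itself.
subtrees :: Expr -> [Expr]
subtrees e = e : case e of
  Var _   -> []
  Lam _ b -> subtrees b
  App f a -> subtrees f ++ subtrees a

-- Subtrees occurring more than once: candidates for sharing.
cseCandidates :: Expr -> [(Expr, Int)]
cseCandidates e = [ (t, n) | (t, n) <- Map.toList counts, n > 1 ]
  where counts = Map.fromListWith (+) [ (t, 1 :: Int) | t <- subtrees e ]

main :: IO ()
main = mapM_ print (cseCandidates ex)
  where
    shared = App (Var "f") (Var "x")
    ex     = App shared shared   -- 'f x' occurs twice: reported as a candidate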
Hashing Modulo Alpha Equivalence
Simon Peyton Jones
Simon Peyton Jones, MA, MBCS, CEng, graduated from Trinity College Cambridge in 1980. Simon was a key contributor to the design of the now-standard functional language Haskell, and is the lead designer of the widely-used Glasgow Haskell Compiler (GHC). He has written two textbooks about the implementation of functional languages.
After two years in industry, he spent seven years as a lecturer at University College London, and nine years as a professor at Glasgow University before moving to Microsoft Research (Cambridge) in 1998.
His main research interest is in functional programming languages, their implementation, and their application. He has led a succession of research projects focused on the design and implementation of production-quality functional-language systems for both uniprocessors and parallel machines.
More generally, he is interested in language design, rich type systems, software component architectures, compiler technology, code generation, runtime systems, virtual machines, and garbage collection. He is particularly motivated by the direct application of principled theory to practical language design and implementation; that is one reason he loves functional programming so much.