Please log in to watch this conference skillscast.
Software systems form an intrinsic feedback loop, the more we use and rely on a system, the more we demand of that system. This loop drives a cost and complexity that if left unchecked eventually inhibits growth and improvements to the software or in the worst case brings it crashing down.
The crux of this complexity in many (most?) systems is the management of state and data over time, and importantly how different systems co-ordinate around that state. If we look to modern, and even not so modern, software we can see a recurring pattern to dealing with this through the use of immutable, persistent data. We see this pattern from databases and distributed systems through to user interface frameworks. Building complex systems that work correctly, reliably and efficiently often comes down to building systems that don't forget.
In this talk we will look at the design, benefits and challenges of building a system around the ideas of immutable and persistent data using a specific example of a system that I work on to support urban planning & climate policy modelling.
Q&A
Question: #[derive(CacheHash)] looks interesting. Is that something internal to your codebase or is there a cool crate I'm not aware of?
Answer: Yeh, it is internal, using custom derive. But there are a few similar style crates around I think.
https://doc.rust-lang.org/reference/attributes/derive.html
Question: Question to anyone that knows… I'm not extremely familiar with Zippers. Am I right in saying that it's a data structure with a focus, and a collection of things to its left, and a collection of things to its right?
How does rewriting a Tree into a Zipper flip the arrows?
Answer Thread:
David Overton
A zipper doesn’t flip all the arrows, just those pointing back to the root of the tree.
Alexey Kotlyarov
I kind of get this, but I'd still like to see a type definition in Haskell/Rust/whatever
particularly how would you use it to patch a structure
George Wilson
I did a talk on this once https://m.youtube.com/watch?v=woK7ntZRwXQ
I don't remember if this talk was good; YMMV
Ed Kmett
Alexey: think of a data type describing a 'path' down into the tree. if your tree is a binary tree, at each node you could maybe stop there, go left or go right, if you go left, store the sibling and maybe the value on the current node, if you go right store the other sibling and the value, if you stay you're at the leaf. So now you can represent a tree with a selected element as a path + the node at that point. if you have an octree, then you tell me which child you walk into and all 7 of the other children. the zipper for a data type will depend on the recursive structure of it.
David Overton
I get all that (also did a talk on it once :) ), but it didn’t look like what Mark was doing.
Ed Kmett
Well, we can come up with a different way to store the spine, e.g. you can store the entire path as a flattened array. http://ekmett.github.io/transients/src/Data-Transient-WordMap-Internal.html#TWordMap
That is a 'word map' which is like an intmap, but modified to have a finger at the last accessed element. (it uses a funny trick i posted at some point to get away with fewer operations compared to an intmap) the finger makes it so you get a nice fast access near where you last accessed it
Tony Morris
I start with an (very loose) initial intuition for "make a data structure, make it physically, such as a list, now go stand on an element, look around, what do you see?" Well, you'd see elements to your left, elements to your right, and the element you are standing on. Now, what about a tree? Make a physical tree, go stand on an element, look around, what do you see?
Sanjiv Sahayam
it’s also covered in LYAHFGG: http://learnyouahaskell.com/zippers
Alexey Kotlyarov
So let's say I have a Node (Leaf 1) (Node (Leaf 2) (Leaf 3)) , and I want to replace 2 with 22, what would would (zipper versions of) both trees look like? I'm assuming they'll somehow still be useful as trees afterwards.
Tony Morris
loosely, toZipper > moveRightSibling > setFocus(22) > unZipper
You can calculate the zipper for a data structure by taking the partial derivative, if that helps; I have docs on it somewhere.
e.g. d/da. List a comes to List a * List a (focus is missing)
Mark Hibberd
For slightly additional context on my slides, the structure in rust is far from an immutable/persistent structure, and hence the hand-wavy nature of the explanation which could of been better, was trying to just provide a rough analogue for what it looks like from the outside of the system. +1 for Tony's explanation above for zippers.
Alexey Kotlyarov
Ah, I think I misunderstood that part in the talk then. I thought using zippers somehow achieves even greater sharing than possible with the regular tree structure.
Mark Hibberd
Yeh, apologies, I kind of fumbled the explanation a bit. The crux of the technique that it uses is to store a densely packed array of the path to nodes, the nodes contain references to the array rather than other nodes, such that you can efficiently do copy-on-write of that array changing a node without having to rebuild all the nodes (as the pointers between nodes are indirect, kind of like names). The way it is designed, relies on some overlap in techniques/concepts, and I made the mistake of being a bit fluffy in the comparison. (edited)
Question: why Rust in this context? What made it a good fit?
Answer: Query performance, I didn't get time to go into it in much detail, but the reality is that I didn't want to (and couldn't) spend very much time on it. I spent 3 days, building the query engine 18months ago, and have not had to worry about it and been able to concentrate on more important things. In most other situations I would of had to spend a lot of time tuning to get where I was with very little effort (partially because of libraries - decent enough array/simd support etc..., partially because I could write what I needed fairly explicitly).
Something about, the biggest bottleneck to performance is freeing up your time to optimise. And it had a good balance for this particular problem (and a decent enough fit for the team skillset).
Question: Do you have some particular query mechanism like Datomic, a loosely similar system that claims to never forgets, uses an implementation of Datalog ?
Answer: Not really, it far coarser. Queries are effectively parameterised by the scenario identifier + the table identifier, that will always point at the same data. The data-set it points to can be large, and of various different shapes (graphs, tables, vectors), and isn't all indexed by time as facts, but is a value that is repeatable as a whole.
YOU MAY ALSO LIKE:
Systems That Don't Forget
Mark Hibberd
CTOKinesis