Revisioned Python dict
Concept: An object that acts like a dict, but keeps track of revisions of itself (and any child dicts/lists).
Usecase: In this particular case, a dict holds information for a particular task. This dict is serialized and handed off to various processing scripts - potentially simultaneously. Each of these scripts may modify the dict or any of its children at any level. We need to ensure that we can diff the result of the script, against the original revision of the dict that the script received. This way, multiple scripts can simultaneously process the same task and return different results, without either of the scripts overriding the results of another.
Challenges
- Handling merge conflicts - what if a value is added to a key that was previously deleted?
Methodology
- Object that implements dict-like methods, and has an internal dict of revisions - each of which has an ID and a corresponding dict.
- The dict also has a _rev key inserted, that holds the revision ID of the current revision. This information is used to match the origin revision of a task, when a script returns a result.
- The update() method will function differently - instead of doing a standard merge, it will diff the new data against the known original revision (according to the reference ID in the new data). Changes between the two are applied to the latest (not the original!) revision.
- Later destructive changes will override earlier ones. Data loss is therefore possible, but only when done intentionally (ie. by overwriting a value, rather than adding to it).
- Child lists and dicts are replaced with a custom revisioned list-like or dict-like, and instead of storing the dict/list directory, a tuple (rev_id, obj) is stored, where rev_id is the corresponding revision in the child object and obj is the child object itself.
- Merges (using a diff) are recursively propagated.
- List diffs will involve item removals and additions - a moved item will simply be a removal and an addition at a different index.