OpRemoveText: A well-engineered stick in the eye
Hi folks,
So for today's light reading, I thought I'd take a few moments and describe the
intent and workings of OpRemoveText. Recent activity in iterators and with images
has been bumping into the black-box nature of this, and it seemed like a good
topic to discuss.
History
=======
OpRemoveText started life sometime in November, 2012. Originally written as a direct
complement to OpInsertText, the earliest version was simply able to remove text
data within a text node.
Eventually, OpRemoveText grew to support complete removal of empty text nodes
(early April, 2013, Friedrich), then support for removing text across multiple
text nodes (late April, 2013, Aditya), and finally space element removal,
paragraph merging and list item removal.
The initial implementation was 66 lines long (including comments). By late July,
this operation had grown to 250 lines, and was accruing a significant number of
known limitations.
These include things such as:
- Inability to remove images, tables and other ODF objects
- Erratic removal of ODF character elements (e.g., spaces & tabs)
- Destruction of non-ODF children when removing ODF containers
The number of non-text items that needed to have special handling to be removed
made it somewhat clear that the existing text neighbourhood approach was slowly
being outgrown.
Current
=======
In mid-July 2013, an alternative approach for OpRemoveText was put forward. The
specific goals of this rewrite were to:
- Improve the ability to support ODF objects, and make it easy to extend this for
future object support
- Eliminate the text-oriented design of the existing implementation
- Preserve non-ODF elements during deletion (e.g., cursor nodes) without specialised
handling
The original OpRemoveText is best described as delete-by-default.
The implementation carried specific logic to handle and clean up editinfo nodes,
and to keep the cursor nodes safely intact. The key issue with this approach was
that foreign elements integrated into an ODF document could not be preserved
across a removal without patching OpRemoveText directly.
In contrast, the current design is actually save-by-default. This is achieved
by performing all node removal using the DomUtils.mergeIntoParent function.
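
To make the save-by-default idea concrete, here is a minimal sketch of what a
mergeIntoParent-style removal does. It is not the WebODF implementation: it runs
on a hypothetical plain-object tree (makeNode and serialize are made up for the
example) rather than on real DOM nodes, and only shows that removing a container
leaves its children, including a foreign cursor node, in place.

```javascript
// Hypothetical tree model. WebODF itself works on real DOM nodes.
function makeNode(name, children = []) {
    const node = { name, parent: null, children: [] };
    for (const child of children) {
        child.parent = node;
        node.children.push(child);
    }
    return node;
}

// Remove `target` but keep its children, splicing them into the parent
// at the position the target used to occupy. Returns the old parent.
function mergeIntoParent(target) {
    const parent = target.parent;
    const index = parent.children.indexOf(target);
    for (const child of target.children) {
        child.parent = parent;
    }
    parent.children.splice(index, 1, ...target.children);
    target.parent = null;
    target.children = [];
    return parent;
}

function serialize(node) {
    if (node.children.length === 0) {
        return node.name;
    }
    return node.name + "(" + node.children.map(serialize).join(",") + ")";
}

// A frame that happens to contain a foreign (non-ODF) cursor node.
const cursor = makeNode("cursor");
const frame = makeNode("draw:frame", [cursor]);
const paragraph = makeNode("text:p", [makeNode("#text"), frame, makeNode("#text")]);

mergeIntoParent(frame);
console.log(serialize(paragraph)); // text:p(#text,cursor,#text)
```

The frame is gone, but nothing had to know that a cursor lived inside it.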
This new approach landed late July, and thus far has survived with very few major
changes or bug fixes.
How the magic works
===================
Removal of content has a few major responsibilities attached:
- Paragraph merging: Merging two paragraphs together has some special edge cases
around what the resulting joined paragraph style should be.
- Empty container cleanup: In order to keep the document in a relatively performant
state, removal needs to do things such as discard unnecessary span elements,
remove empty frames etc.
- Maintain document structure: Removal of some elements should automatically clean
up parent containers. For example, removing the last item in a list should
result in the parent list also being removed.
The basic approach used in OpRemoveText to meet these needs is as follows:
1. Given a position & length, translate this to a DOMRange (via position iterators)
2. Fetch a list of all ODF objects or text elements FULLY contained within this range
   - For each ODF object in this list, eliminate the object using domUtils.mergeIntoParent
     to remove the object whilst preserving all its children
   - For each text node, delete the node entirely
   - Check the parent of each object/text node that was removed. If it is a container
     that should automatically collapse, and it is now empty, remove the parent using
     mergeIntoParent. Repeat this process on its own parent.
3. Fetch a list of all ODF paragraphs that intersect the current range. Merge these
   together as all content within the range will have been removed.
   (Special rules for styling apply, which I won't go into in detail here)
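
The steps above can be sketched roughly as follows. This is an illustration
under several assumptions, not the actual OpRemoveText code: it uses a
hypothetical plain-object tree instead of the DOM, it skips step 1 entirely (the
caller hands in the already-collected lists), it ignores paragraph styling, and
its collapse rule is a simplified stand-in for the real one. All function names
here are invented for the example.

```javascript
// Hypothetical plain-object tree. The real operation works on DOM nodes and ranges.
function makeNode(name, children = [], text = "") {
    const node = { name, text, parent: null, children: [] };
    for (const child of children) {
        child.parent = node;
        node.children.push(child);
    }
    return node;
}
const textNode = (value) => makeNode("#text", [], value);

// Delete a node together with everything inside it. Returns the old parent.
function detach(node) {
    const parent = node.parent;
    parent.children.splice(parent.children.indexOf(node), 1);
    node.parent = null;
    return parent;
}

// Delete a node but keep its children in place. Returns the old parent.
function mergeIntoParent(target) {
    const parent = target.parent;
    const index = parent.children.indexOf(target);
    for (const child of target.children) {
        child.parent = parent;
    }
    parent.children.splice(index, 1, ...target.children);
    target.parent = null;
    target.children = [];
    return parent;
}

function hasText(node) {
    return node.name === "#text" || node.children.some(hasText);
}

// Simplified collapse rule for this sketch: not the root, not a paragraph,
// and no text left anywhere inside.
function shouldCollapse(node) {
    return node.parent !== null && node.name !== "text:p" && !hasText(node);
}

function collapseUpwards(node) {
    let current = node;
    while (shouldCollapse(current)) {
        current = mergeIntoParent(current);
    }
}

function removeContent(fullyContained, intersectingParagraphs) {
    // Step 2: the caller collected this list before any node was touched.
    for (const item of fullyContained) {
        if (item.parent === null) {
            continue; // already collapsed while cleaning up after an earlier item
        }
        const parent = item.name === "#text" ? detach(item) : mergeIntoParent(item);
        collapseUpwards(parent);
    }
    // Step 3: move the content of every later paragraph into the first one.
    const [first, ...rest] = intersectingParagraphs;
    for (const paragraph of rest) {
        for (const child of paragraph.children) {
            child.parent = first;
            first.children.push(child);
        }
        paragraph.children = [];
        collapseUpwards(detach(paragraph));
    }
    return first;
}

function serialize(node) {
    if (node.name === "#text") {
        return JSON.stringify(node.text);
    }
    if (node.children.length === 0) {
        return node.name;
    }
    return node.name + "(" + node.children.map(serialize).join(",") + ")";
}

// Paragraph "Hello big " followed by a list item "world"; "big " is fully in range.
const big = textNode("big ");
const firstParagraph = makeNode("text:p", [
    textNode("Hello "),
    makeNode("text:span", [big, makeNode("cursor")])
]);
const secondParagraph = makeNode("text:p", [textNode("world")]);
const root = makeNode("office:text", [
    firstParagraph,
    makeNode("text:list", [makeNode("text:list-item", [secondParagraph])])
]);

removeContent([big], [firstParagraph, secondParagraph]);
console.log(serialize(root)); // office:text(text:p("Hello ",cursor,"world"))
```

Note how the span, the list item and the list all disappear as a side effect of
becoming empty, while the cursor survives without any cursor-specific code.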
An auto-collapsing container is an element that:
- Is not a paragraph element (OpRemoveText, line 107)
- Is not a root element. E.g., office:text (OpRemoveText, line 107)
- Is not a character element (OpRemoveText, line 82)
- Contains no ODF character elements (OpRemoveText, line 90)
- Has no text content (OpRemoveText, line 86)
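
As a rough sketch, the five conditions above combine into a single predicate
along these lines. The element name sets are my assumptions for illustration
(they are not copied from WebODF), the nodes are hypothetical plain objects
rather than DOM nodes, and the predicate is only meant to be asked of parent
elements, never of text nodes themselves.

```javascript
// All name sets below are assumptions for illustration, not lists taken from WebODF.
const PARAGRAPH_NAMES = new Set(["text:p", "text:h"]);
const ROOT_NAMES = new Set(["office:text"]);
const CHARACTER_ELEMENT_NAMES = new Set(["text:s", "text:tab", "text:line-break"]);

// Hypothetical plain-object nodes standing in for DOM elements and text nodes.
const element = (name, children = []) => ({ name, text: "", children });
const textNode = (value) => ({ name: "#text", text: value, children: [] });

function isCharacterElement(node) {
    return CHARACTER_ELEMENT_NAMES.has(node.name);
}

function containsCharacterElement(node) {
    return node.children.some(
        (child) => isCharacterElement(child) || containsCharacterElement(child)
    );
}

function hasTextContent(node) {
    return node.text.length > 0 || node.children.some(hasTextContent);
}

function isCollapsibleContainer(node) {
    return !PARAGRAPH_NAMES.has(node.name)
        && !ROOT_NAMES.has(node.name)
        && !isCharacterElement(node)
        && !containsCharacterElement(node)
        && !hasTextContent(node);
}

console.log(isCollapsibleContainer(element("text:list-item")));                 // true
console.log(isCollapsibleContainer(element("text:span", [element("cursor")]))); // true
console.log(isCollapsibleContainer(element("text:span", [element("text:s")]))); // false
console.log(isCollapsibleContainer(element("text:span", [textNode("x")])));     // false
console.log(isCollapsibleContainer(element("text:p")));                         // false
```

A span holding only a cursor counts as empty here, which is what lets the
cursor be merged up into the parent instead of being destroyed.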
The key concepts here are:
1. Collecting the set of objects/text elements that will be removed BEFORE doing
any removal prevents counting issues.
2. Deletion of elements is order-agnostic. If the child is processed after the
parent element, the parent can still collapse automatically.
3. Auto-collapsing containers vs. specifically removable content starts to clarify
ODF elements that are navigable content vs. containers for content.
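
To illustrate point 2, here is a small sketch showing that processing a parent
before its child, or the child before its parent, ends in the same tree. As
before, this is a toy model on hypothetical plain objects with a deliberately
simplified collapse rule (empty, not the root, not a paragraph), not the real
OpRemoveText logic.

```javascript
// Hypothetical tree model, for illustration only.
function makeNode(name, children = []) {
    const node = { name, parent: null, children: [] };
    for (const child of children) {
        child.parent = node;
        node.children.push(child);
    }
    return node;
}

// Remove a node, keeping its children in place. Returns the old parent.
function mergeIntoParent(target) {
    const parent = target.parent;
    const index = parent.children.indexOf(target);
    for (const child of target.children) {
        child.parent = parent;
    }
    parent.children.splice(index, 1, ...target.children);
    target.parent = null;
    target.children = [];
    return parent;
}

function removeAll(items) {
    for (const item of items) {
        if (item.parent === null) {
            continue; // already collapsed as the empty ancestor of an earlier item
        }
        let parent = mergeIntoParent(item);
        // Simplified collapse rule: empty, not the root, not a paragraph.
        while (parent.parent !== null && parent.name !== "text:p"
                && parent.children.length === 0) {
            parent = mergeIntoParent(parent);
        }
    }
}

function serialize(node) {
    if (node.children.length === 0) {
        return node.name;
    }
    return node.name + "(" + node.children.map(serialize).join(",") + ")";
}

// Build the same small document twice so each ordering starts from scratch.
function build() {
    const image = makeNode("draw:image");
    const frame = makeNode("draw:frame", [image]);
    const paragraph = makeNode("text:p", [makeNode("cursor"), frame]);
    const root = makeNode("office:text", [paragraph]);
    return { root, frame, image };
}

const parentFirst = build();
removeAll([parentFirst.frame, parentFirst.image]);

const childFirst = build();
removeAll([childFirst.image, childFirst.frame]);

console.log(serialize(parentFirst.root)); // office:text(text:p(cursor))
console.log(serialize(childFirst.root));  // office:text(text:p(cursor))
```

In the child-first run the frame collapses on its own once the image is gone,
so by the time the loop reaches the frame there is nothing left to do.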
Worked Example: Merging two paragraphs
======================================
Philip Peitsch