Hi folks,
So for today's light reading, I thought I'd take a few moments to describe the
intent and workings of OpRemoveText. Recent work on iterators and images
has been bumping into the black-box nature of this operation, so it seemed like a
good topic to discuss.
History
=======
OpRemoveText started life sometime in November, 2012. Originally written as a direct
complement to OpInsertText, the earliest version was simply able to remove text
data within a text node.
Eventually, OpRemoveText grew to support complete removal of empty text nodes
(early April, 2013, Friedrich), then support for removing text across multiple
text nodes (late April, 2013, Aditya), and finally space element removal,
paragraph merging and list item removal.
The initial implementation was 66 lines long (including comments). By late July,
this operation had grown to 250 lines, and was accruing a significant number of
known limitations.
These include things such as:
- Inability to remove images, tables and other ODF objects
- Erratic removal of ODF character elements (e.g., spaces & tabs)
- Destruction of non-ODF children when removing ODF containers
The number of non-text items that needed to have special handling to be removed
made it somewhat clear that the existing text neighbourhood approach was slowly
being outgrown.
Current
=======
In mid-July 2013, an alternative approach for OpRemoveText was put forward. The
goals of this rewrite were to:
- Improve the ability to support ODF objects, and make it easy to extend this for
future object support
- Eliminate the text-oriented design of the existing implementation
- Preserve non-ODF elements during deletion (e.g., cursor nodes) without specialised
handling
The original OpRemoveText could best be described as delete-by-default.
The implementation carried specific logic to handle and clean up editinfo nodes,
and to keep the cursor nodes safely intact. The key issue with this approach was
that foreign elements integrated into an ODF document could not be preserved
during removal without patching OpRemoveText directly.
In contrast, the current design is actually save-by-default. This is achieved
by performing all node removal using the DomUtils.mergeIntoParent function.
This new approach landed late July, and thus far has survived with very few major
changes or bug fixes.
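
To make the save-by-default idea a bit more concrete, here is a rough sketch of
what a merge-into-parent style removal does. It runs on a made-up plain-object
tree rather than the real DOM, and makeNode, appendChild and
mergeIntoParentSketch are hypothetical names for illustration only, not the
WebODF DomUtils code.

```javascript
// Hypothetical in-memory tree; an assumption for illustration, not the WebODF DOM.
function makeNode(name, children = []) {
    const node = { name, parent: null, children: [] };
    children.forEach((child) => appendChild(node, child));
    return node;
}

function appendChild(parent, child) {
    child.parent = parent;
    parent.children.push(child);
}

// Replace `target` with its own children, in place, and return the parent.
// Nothing inside `target` is lost; only the wrapper itself disappears.
function mergeIntoParentSketch(target) {
    const parent = target.parent;
    const index = parent.children.indexOf(target);
    target.children.forEach((child) => { child.parent = parent; });
    parent.children.splice(index, 1, ...target.children);
    target.children = [];
    target.parent = null;
    return parent;
}

const cursorNode = makeNode("cursor");
const span = makeNode("text:span", [makeNode("#text"), cursorNode]);
const paragraph = makeNode("text:p", [makeNode("#text"), span]);

mergeIntoParentSketch(span);
const childNames = paragraph.children.map((child) => child.name);
console.log(childNames); // the span is gone, its children (cursor included) remain
```

The point of the design is that the removal code never needs to know the cursor
exists. Anything it was not explicitly asked to delete simply moves up a level.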
How the magic works
===================
Removal of content has a few major responsibilities attached:
- Paragraph merging: Merging two paragraphs together has some special edge cases
around what the resulting joined paragraph style should be.
- Empty container cleanup: In order to keep the document in a relatively performant
state, removal needs to do things such as discard unnecessary span elements,
remove empty frames etc.
- Maintain document structure: Removal of some elements should automatically clean
up parent containers. For example, removing the last item in a list should
result in the parent list also being removed.
The basic approach used in OpRemoveText to meet these needs is as follows:
1. Given a position & length, translate this to a DOMRange (via position iterators)
2. Fetch a list of all ODF objects or text elements FULLY contained within this range
- For each ODF object in this list, eliminate the object using domUtils.mergeIntoParent
to remove the object whilst preserving all its children
- For each text node, delete the node entirely
- Check the parent of each object/text node that was removed. If it is a container
that should automatically collapse, and it is now empty, remove the parent using
mergeIntoParent. Repeat this process on its own parent.
3. Fetch a list of all ODF paragraphs that intersect the current range. Merge these
together as all content within the range will have been removed.
(Special styling rules apply here, which I won't go into in detail.)
An auto-collapsing container is an element that:
- Is not a paragraph element (OpRemoveText, line 107)
- Is not a root element, e.g., office:text (OpRemoveText, line 107)
- Is not a character element (OpRemoveText, line 82)
- Contains no ODF character elements (OpRemoveText, line 90)
- Has no text content (OpRemoveText, line 86)
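
As a sketch of how the rules above and the "repeat on the parent" step fit
together, something along these lines would do it. The tree model, the three
name sets and the helpers (element, textNode, hasContent,
isCollapsibleContainer, mergeUp, removeAndCollapse) are all assumptions made up
for this example and are not taken from OpRemoveText.

```javascript
// Toy tree model and element sets: assumptions for illustration only.
const PARAGRAPH_NAMES = new Set(["text:p", "text:h"]);
const ROOT_NAMES = new Set(["office:text"]);
const CHARACTER_NAMES = new Set(["text:s", "text:tab", "text:line-break"]);

function element(name, children = []) {
    const node = { name, data: "", parent: null, children };
    children.forEach((child) => { child.parent = node; });
    return node;
}

function textNode(data) {
    return { name: "#text", data, parent: null, children: [] };
}

// True if the subtree holds any text or any ODF character element.
function hasContent(node) {
    if (node.name === "#text") {
        return node.data.length > 0;
    }
    return CHARACTER_NAMES.has(node.name) || node.children.some(hasContent);
}

function isCollapsibleContainer(node) {
    return node.name !== "#text"
        && !PARAGRAPH_NAMES.has(node.name)
        && !ROOT_NAMES.has(node.name)
        && !CHARACTER_NAMES.has(node.name)
        && !hasContent(node);
}

// Replace `node` with its children inside its parent; returns the parent.
function mergeUp(node) {
    const parent = node.parent;
    node.children.forEach((child) => { child.parent = parent; });
    parent.children.splice(parent.children.indexOf(node), 1, ...node.children);
    node.children = [];
    node.parent = null;
    return parent;
}

// Remove `node`, then keep collapsing emptied containers up the tree.
function removeAndCollapse(node) {
    let container = mergeUp(node);
    while (container.parent && isCollapsibleContainer(container)) {
        container = mergeUp(container);
    }
    return container;
}

const emptyParagraph = element("text:p");
const list = element("text:list", [element("text:list-item", [emptyParagraph])]);
const root = element("office:text", [element("text:p", [textNode("Paragr")]), list]);

const stoppedAt = removeAndCollapse(emptyParagraph);
console.log(stoppedAt.name, root.children.map((child) => child.name));
```

Here removing the empty paragraph collapses the list-item and then the list,
and the walk stops at office:text because a root element never collapses.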
The key concepts here are:
1. Collecting the set of objects/text elements that will be removed BEFORE doing
any removal prevents counting issues.
2. Deletion of elements is order-agnostic. If the child is processed after the
parent element, the parent can still collapse automatically.
3. Distinguishing auto-collapsing containers from specifically removable content
starts to clarify which ODF elements are navigable content and which are merely
containers for content.
Worked Example: Merging two paragraphs
======================================
    <text:p>Paragr<anchor/>aph 1 t<text:s/>ext</text:p>
    <text:p>Paragraph 2<cursor/> text</text:p>
Assuming the user has selected the range between the anchor and cursor nodes and
pressed the delete key:
1. Split text nodes at the selection boundaries. This ensures odtDocument.getTextElements
will never return partially selected text content
2. Fetch all text elements (line 211). For this selection, this will return:
    ["aph 1 t", <text:s/>, "ext", "Paragraph 2"]
3. Fetch all intersecting paragraphs:
    [<text:p>, <text:p>]
4. For each text element, remove it from the DOM using mergeIntoParent (line 216).
After this step the DOM looks like:
    <text:p>Paragr<anchor/></text:p>
    <text:p><cursor/> text</text:p>
5. For each paragraph intersecting the selection, merge the paragraph contents
into the first paragraph touching the selection (line 220). After this step,
the DOM looks like:
    <text:p>Paragr<anchor/><cursor/> text</text:p>
    <text:p/>
6. Finally, remove each paragraph that was merged except the first.
    <text:p>Paragr<anchor/><cursor/> text</text:p>
7. Fix the cursor positions (i.e., collapse any selections that are now empty,
etc.). In this instance, there is no position difference between the start and
end of the selection (i.e., 0 steps difference), so the cursor is collapsed.
    <text:p>Paragr<cursor/> text</text:p>
Note, we never had to explicitly find and save the cursor or anchor nodes. These
migrated up the hierarchy via the mergeIntoParent removal process of the ODF
elements we wanted to delete.
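
For anyone who prefers code to prose, steps 2 to 6 of this example can be
mimicked on a toy model. This is only a sketch under stated assumptions: each
paragraph is a flat array of children, `anchor` and `cursor` are placeholder
names for the selection marker nodes, and none of the helper names come from
OpRemoveText.

```javascript
// Toy model: each paragraph is a flat array of child nodes. The anchor and
// cursor names are placeholders for the selection marker nodes.
const textItem = (data) => ({ name: "#text", data });
const anchor = { name: "anchor" };
const cursor = { name: "cursor" };
const selectedItems = [
    textItem("aph 1 t"),
    { name: "text:s" },
    textItem("ext"),
    textItem("Paragraph 2"),
];

const paragraph1 = [textItem("Paragr"), anchor, selectedItems[0], selectedItems[1], selectedItems[2]];
const paragraph2 = [selectedItems[3], cursor, textItem(" text")];

// Steps 2 and 4: collect the selected text elements first, then delete exactly those.
const doomed = new Set(selectedItems);
const remaining = [paragraph1, paragraph2].map(
    (children) => children.filter((node) => !doomed.has(node))
);

// Steps 5 and 6: move everything into the first paragraph and drop the others.
const mergedParagraph = remaining.flat();

const serialise = (nodes) => nodes
    .map((node) => (node.name === "#text" ? node.data : `<${node.name}/>`))
    .join("");
const result = `<text:p>${serialise(mergedParagraph)}</text:p>`;
console.log(result); // <text:p>Paragr<anchor/><cursor/> text</text:p>
```

Note that the sketch never mentions the anchor or cursor in the removal logic.
They survive simply because they were not in the collected set.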
Worked Example: Removing a list item
====================================
Now, a slightly more complex removal example.
    <text:list>
      <text:list-item>
        <text:p>Paragr<anchor/>aph 1 t<text:s/>ext</text:p>
      </text:list-item>
      <text:list-item>
        <text:p>Paragraph 2<cursor/> text</text:p>
      </text:list-item>
    </text:list>
Assuming the user has selected the range between the anchor and cursor nodes:
1. Split text nodes at the selection boundaries. This ensures odtDocument.getTextElements
will never return partially selected text content
2. Fetch all text elements (line 211). For this selection, this will return:
    ["aph 1 t", <text:s/>, "ext", "Paragraph 2"]
3. Fetch all intersecting paragraphs (referred to below as p1 and p2):
    [<text:p>, <text:p>]
4. For each text element, remove it from the DOM using mergeIntoParent (line 216).
After this step the DOM looks like:
    <text:list>
      <text:list-item>
        <text:p>Paragr<anchor/></text:p>
      </text:list-item>
      <text:list-item>
        <text:p><cursor/> text</text:p>
      </text:list-item>
    </text:list>
5. For each paragraph intersecting the selection, merge the paragraph contents
into the first paragraph touching the selection (line 220). After this step,
the DOM looks like:
    <text:list>
      <text:list-item>
        <text:p>Paragr<anchor/><cursor/> text</text:p>
      </text:list-item>
      <text:list-item>
        <text:p/>
      </text:list-item>
    </text:list>
6. Finally, remove each paragraph that was merged except the first. Paragraph
removal allows for the same container collapsing behaviour that text element
removal does (line 166 & 169). Both a list and a list-item fit the definition
of an auto-collapsing container under the rules above, so either can be
cleaned up automatically when the paragraph element is removed.
After removing p2, the auto-collapse check finds that its text:list-item
is now empty: it has no ODF character elements or text content left, so the
list-item is collapsed. The list-item's parent list is then checked as well.
As the list still has another list-item, which contains a paragraph with
text and character data, the list will NOT be collapsed.
    <text:list>
      <text:list-item>
        <text:p>Paragr<anchor/><cursor/> text</text:p>
      </text:list-item>
    </text:list>
7. Fix the cursor positions.
    <text:list>
      <text:list-item>
        <text:p>Paragr<cursor/> text</text:p>
      </text:list-item>
    </text:list>
Worked Example: Removing a list
===============================
Finally, removing a list entirely.
    <text:p>Paragr<anchor/>aph 1 t<text:s/>ext</text:p>
    <text:list>
      <text:list-item>
        <text:p>Paragraph 2<cursor/> text</text:p>
      </text:list-item>
    </text:list>
Assuming the user has selected the range between the anchor and cursor nodes:
1. Split text nodes at the selection boundaries. This ensures odtDocument.getTextElements
will never return partially selected text content
2. Fetch all text elements (line 211). For this selection, this will return:
    ["aph 1 t", <text:s/>, "ext", "Paragraph 2"]
3. Fetch all intersecting paragraphs (referred to below as p1 and p2):
    [<text:p>, <text:p>]
4. For each text element, remove it from the DOM using mergeIntoParent (line 216).
After this step the DOM looks like:
    <text:p>Paragr<anchor/></text:p>
    <text:list>
      <text:list-item>
        <text:p><cursor/> text</text:p>
      </text:list-item>
    </text:list>
5. For each paragraph intersecting the selection, merge the paragraph contents
into the first paragraph touching the selection (line 220). After this step,
the DOM looks like:
    <text:p>Paragr<anchor/><cursor/> text</text:p>
    <text:list>
      <text:list-item>
        <text:p/>
      </text:list-item>
    </text:list>
6. Finally, remove each paragraph that was merged except the first. Remember
that paragraph removal includes auto-collapsing containers. When p2 is removed,
its text:list-item is checked to see if it can collapse. In this case, there
is no content left, so the list-item is removed. The list is then checked in
turn and, as it is now empty too, it also collapses. As a result of these checks,
the list is properly cleaned up once the content is removed.
    <text:p>Paragr<anchor/><cursor/> text</text:p>
7. Fix the cursor positions as per usual
    <text:p>Paragr<cursor/> text</text:p>
Well... if that hasn't put you to sleep yet I'm impressed. Any queries, questions,
or complaints... as ever, I'm happy to answer anything I can.
Happy hacking!
Philip