On 14/09/2013, at 12:14 AM, Friedrich W. H. Kossebau wrote:
[…]
So these properties: * a filter (at least one for empty textnodes, but can be extended by one passed as arg) * a root node from pure, unfiltered DOM, its subtree is where the iterator moves inside * a position, defined by a container and the current position in it Any other things queried are only derived from these core properties. Q: Correct?
Yep. Your summary looks correct.
Q: What is the actual semantic of the container and the positions in it, to what possible points in the DOM can/should container and position point to?
Given the following fragment: <p> a <span>simple</span> fragment </p> For the aforementioned fragment, a list of all exhaustive positions is: <p>, 0 "a"#text, 0 "a"#text, 1 <p>, 1 <span>, 0 "simple"#text, 0 … "simple"#text, 6 <span>, 1 <p>, 2 "fragment"#text, 0 … "fragment"#text, 8 <p>, 3 The iteration logic in nextPosition and previousPosition will report the following subset of these positions (calling container() and unfilteredDomOffset() ): "a"#text, 0 <p>, 1 "simple"#text, 0 … "simple"#text, 5 <span>, 1 "fragment"#text, 0 … "fragment"#text, 7 <p>, 3 Note, it doesn't return a mapping that needs to be translated, but simply a subset of the actual positions. The reasoning behind the subset I'm going to guess is because the eliminated positions are semantically identical to one or more of the included positions. E.g., (<p>, 0) means "before the first child", while ("a"#text, 0) means before the first character. These two positions would be visually indistinguishable in a selection, so one of them can safely be ignored. In the same way, ("a"#text, 1) means after the last character, which is basically the same as (<p>, 1) which is after the first child, but before the second child of <p>. I cannot really explain why this filtering has been done this way, but it appears that various places up the stack rely on some of these things. E.g., TextPositionFilter.acceptPosition will have a runtime assert if the unfilteredDomOffset === text.length (OdtDocument, line 204)
Q: Most API ignores the concept of container and offset and instead talks about nodes/siblings. But both use the same value of the "currentPos" var. Confuses me completely. Why is the container concept irrelevant in those calls? Or do these two different views on the iterator perfectly go along?
The views go together. nodes/siblings are used for navigation and searching. Once the desired position is found, container & unfilteredDomOffset return the location the iterator has ended up at. currentPos is a really confusingly named variable, as it's behaviour follows one of two patterns: For a text node, currentPos = index in text string (0 => text.length - 1) For an element node, currentPos === 0 until all children within the node are iterated. In this sense, it is almost the direction of travel. I.e., - When currentPos === 0: report current node, then progress to child or sibling (travel direction is down the tree towards children) - When currentPos === 1: report current node, then progress back to the parent (travel is up the tree towards parents)
So the states: The filter is at least the one for empty textnodes, so always exists and works. The root node is required in the API dox, so always exists (but could be moved out of the DOM during the lifetime of the iterator, API dox needs to require that this is not happening perhaps)
Handling of this is already defined in the standard for the TreeWalker. Potentially the API dox just need to link to this for the technical detail.
The position is stored via the "walker" object and the "currentPos" variable. The passed root node could have a subtree with no positions (after filtering). Q: What values would the properties have in that state? Q: Or do we have to require that the root node allows at least one position?
(I want to prefix these answers with the quick disclaimer that PositionIterator lacks almost any protection from abuse. It works generally in practice at the moment because there are not that many consumers of the iterator. It is relatively straightforward to cause the behaviour to run off the rails with the scenarios you've listed.) There is some constructor checks (as per one of the later questions) that happen in init(). This basically means the first call to previousPosition/nextPosition will return false, and the TreeWalker will not change position. If nextPosition is called and there are no walkable nodes, the function call will return false, but calls to container() will return rootNode, and unfilteredDomOffset will return container().childNodes.length. So it will return valid nodes and positions… just potentially ones that do not pass the supplied node filter (not likely desirable behaviour). If setUnfilteredPosition is called when there are no walkable nodes, an assertion will be thrown (line 295 & 326).
Q: Should the iterator always be in positions which are allowed by the filter? "moveToEndOfNode()" could resolve in a position that is not valid, because of "walker.currentNode = node;" which will not check against the root node or the filter (/me to blame for that code, but noone cleaned it as well ;) ). "setUnfilteredPosition()" is also not checking if the container is inside the root. Seems to me the states of the position are not properly defined, or at least slightly fragile with some methods.
Yep. +9000 votes from me on this. In my opinion, PositionIterator should be further stripped back to essentially be a TreeWalker, with the unfilteredDomOffset behaviour. Basically, the behaviour of root node and first container after setUnfilteredOffset should follow the much clearer TreeWalker standard behaviour in that they may not pass the supplied node filter. The current holes (thus far) have not resulted in crashes or incorrect behaviour *yet*, hence why none of them have been fixed. I think the correct approach to this is actually more around encouraging consumers of the position iterator to have less strict expectations about the positions returned, as this leads to the consumers being far more robust, and able to be re-used with arbitrary positions. At the moment, there are a bunch of places where a PositionIterator is used purely to try and find a "valid" position that the TextPositionFilter will understand (e.g., SessionController, line 209). I suspect there are undiscovered bugs that will be caused by "raw" positions being passed into the TextPositionFilter.
Questions for the implementation: Q: Why does container() return the parentNode in case of currentPos === 0 && t !== Node.TEXT_NODE, and not only if t !== Node.TEXT_NODE?
See previous answer for my best guess. I think this is eliminating some semantically equivalent positions.
Q: So what is the semantics of "container", where is the difference to "currentNode"?
container() will report an element's position within it's parent container for the first position. E.g., in the demo fragment above, it reports (<p>, 1) when the iterator is actually at (<span>, 0). These are almost identical points. For any position other than the first, container() === currentNode I can't see a clear reason as to why this is chosen, other than it makes the implementation of the unfilteredDomOffset() relatively straight forward.
Q: In init() why is the currentPos set to 1 if walker.firstChild() === null?
As noted in one of the previous examples, currentPos is the direction of travel. Setting it to 1 means "this node is completely iterated and there are no further positions available". This would actually catch your example of a root node with no walkable children.
Q: Why seems currentPos === 0 only valid with existing visible child nodes, what?
Previously answered. This is the direction of travel behaviour again :-)
And than these naming issues continue to confuse me: * getter methods named "get*" vs "*" * getter and setter methods too similar, e.g. "nextPosition" vs. "rightNode" * mixing of "right"/"left" and "next"/previous"
+1 agree with this. There is a difference in query vs move, but I don't think this is consistently applied (or even consistently used…).
* "dom" in the "*Offset" methods is redundant, no other offset method
There is another offset method: unfilteredDomOffset I actually want domOffset eliminated :-). At the moment it is only used in OdfDocument.getPositionInTextNode. Usage of this particular method is fraught with danger, as it is very difficult to translate this back into *real* DOM positions, because the returned offset is the number of filtered siblings. As such, one can't just do: container().childNodes[domOffset()] As this is will be incorrect if container() contains a filtered out node such as an empty text block or a cursor.
* "setUnfilteredPosition" sounds to me as if there is an unfiltered position possible and that gets set
This naming difference used to make sense when there was a method called setPosition which could only handle "filtered" container + offset as input pairs. It's name (similar to unfilteredDomOffset) is a remnant of a time bygone when PositionIterator exposed 3 different coordinate systems. I'd feel comfortable removing the "unfiltered" part out of setUnfilteredPosition and unfilteredDomOffset.
Q: What do you think about these renamings: // modifying moveToNextPosition() <- nextPosition() moveToPreviousPosition() <- previousPosition() moveToEnd() moveToEndOfNode(node)
+1 for these
setPositionByUnfilteredPosition(container, offset) <- setUnfilteredPosition(*) (hm, ideas for better name? moveCloseToUnfilteredPosition?)
setPosition seems clearest to me. The API dox detail the behaviour when this is called with a position that would not pass the node filter. There is no longer the confusion of multiple ways to set the position on a PositionIterator either which was the whole reason for the "unfiltered" part in the name.
// querying getContainer() <- container()
+1
getOffset() <- domOffset()
-1. As listed above, this is most likely NEVER the function a consumer wants to use. I think unfilteredDomOffset should become getOffset :-). And domOffset one should be renamed to "notTheDomOffsetYoureLookingFor" =)
getCurrentNode()
+1
getUnfilteredOffset() <- unfilteredDomOffset()
-1. See previous explanation
getRightNode() <- rightNode() getLeftNode() <- leftNode() getPreviousSibling() getNextSibling()
// querying, just used from tests getNodeFilter()
+1 for all these.
Sorry, I too long postponed getting a clue of all this (again), but better now than never.
As you said… better late than never :-). A different pair of eyes is always good. I fear I have become accustomed to the sharp edges and unthinkingly avoid them :-/. It would definitely help my sanity if we could simplify this and reduce the number of ways it can break :-)