August 2013 - WebODF - lists.nlnet.nl

Advancing WebODF paste support
by Philip Peitsch 01 Nov '13

Hi everyone,

I'm starting to work towards improving the paste support in webodf so that it can do more than just paste plain text in a single line. The first step here is obviously designing what might work. I've put some documents together on this and have them sitting out in a branch (insert-fragment in my repo if anyone is interested).

Here is where I'm at so far. Both of these are available in the branch as well, at:
- https://gitorious.org/webodf/peitschie-webodf/source/insert-fragment:webodf…
- https://gitorious.org/webodf/peitschie-webodf/source/insert-fragment:webodf…

Copy & paste behaviour and support
==================================

Overview
--------
Paste support is extremely important to get right for an editor. Users have an expectation that data can be replicated with a high degree of accuracy when copied and pasted. This includes things such as pasting images, styled text (e.g., bold, underline), paragraph breaks, lists etc.

This README does not cover details about how data is written to or retrieved from the clipboard. For that information, please see README_clipboard.txt.

Desired paste support
---------------------
* Plain text with paragraphs, tabs, spaces
* HTML text with direct formatting
* HTML tables
* HTML table rows & columns
* Images (standalone)
* Images within mixed HTML fragments (e.g., HTML fragment with paragraphs & images etc.)
* Lists (bulleted & numbered)

Requirements
------------
1. Paste should be able to be undone/redone safely
2. Extra "formatting" steps should be able to be undone without removing the pasted content. E.g., automatically converting to a list or table (optional advanced feature)
3. Want to avoid duplicating logic in other operations (e.g., paragraph splitting & merging behaviours, image insertion, style addition, adding new list items etc.)
4. Any new operations must be able to be OT'ed easily

Design
------
There are effectively two opposing approaches that can be taken to handle pasting of new data:
1. Create a complex operation (e.g., OpPasteData) that is responsible for determining how to insert a fragment into the document
2. Create a paste handler that attempts to break the paste fragment into a series of smaller operations

Option 1
- Pro: Easily integrates with existing undo/redo manager
- Con: OpPasteData is likely to duplicate a lot of logic from existing ops
- Pro: Fewer operations generated (less on-the-wire traffic)

Option 2
- Pro: Better re-use of existing operations
- Pro: Less complex operations required
- Con: Need significant re-work of undo operation grouping to allow paste to be undone/redone

OT adaptation of a paste command is relatively straightforward for both options, as both largely generate insert-only operations. This means that usually the start position just needs to be shifted around to cope with added or removed characters.

Based on the pros and cons, Option 2 is the best approach for paste handling. The key argument for this is that it makes better use of existing operations (requirement #3). The existing undo manager grouping logic is not very extensible, and should be reworked anyway.

Example paste steps
-------------------
1. Extract data from clipboard. Order of preference is
   - Custom webodf fragment ("application/vnd.webodf")
   - LO/MSWord fragment (??)
   - RTF fragment (??)
   - HTML ("text/html")
   - Plain text ("text/plain")
2. Convert data into webodf fragment (and associated styles) using appropriate import filter
3. Split the fragment up into separate paragraphs
4. Start a new transaction/undo group (this is a new feature...)
5. Add any new named styles (Op???)
6. Add any new auto styles (Op???)
7. For each paragraph
   - start a new paragraph at the current position (OpSplitParagraph)
   - insert the new paragraph (OpInsertFragment)
8. After all paragraphs have been inserted, remove the FIRST created split to merge the first inserted paragraph with its previous sibling (allows the paragraphs to merge with the correct paragraph merge logic)
9. Finish transaction/undo group (actually, this probably happens on the next edit op start)
10. Auto-convert things to lists, links, etc. (optional). This should be in a new transaction/undo group

Questions
---------
* Should pasting multiple paragraphs into a list result in a new list item per paragraph?
* Should links be automatically converted?

Anyone have thoughts, concerns, feedback or cookies (I'm a little hungry after all this research…)?

Cheers,
Philip
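
As a rough illustration of the Option 2 approach in the message above, the sketch below picks a clipboard format by the stated order of preference and breaks a multi-paragraph fragment into a flat list of small operations (steps 1, 7 and 8). It is a language-neutral model written in Python, not WebODF code. The `Op` tuple, `pick_clipboard_format`, `plan_paste` and the `RemoveFirstSplit` operation name are hypothetical; only OpSplitParagraph and OpInsertFragment come from the proposal. The LO/MSWord and RTF formats are left out because the message gives no MIME type for them, and the position arithmetic assumes each paragraph boundary and each character occupies one position.

```python
from typing import Iterable, List, NamedTuple, Optional


class Op(NamedTuple):
    """A minimal stand-in for an editing operation (hypothetical shape)."""
    name: str          # operation type
    position: int      # document position the operation applies at
    payload: str = ""  # text carried by the operation, if any


# Formats the message names with a MIME type, most preferred first.
CLIPBOARD_PREFERENCE = ["application/vnd.webodf", "text/html", "text/plain"]


def pick_clipboard_format(available: Iterable[str]) -> Optional[str]:
    """Return the most preferred format the clipboard offers, or None."""
    offered = set(available)
    for mime in CLIPBOARD_PREFERENCE:
        if mime in offered:
            return mime
    return None


def plan_paste(paragraphs: List[str], position: int) -> List[Op]:
    """Turn a list of paragraph texts into one group of small operations."""
    ops: List[Op] = []
    cursor = position
    for text in paragraphs:
        # Step 7a: start a new paragraph at the current position.
        ops.append(Op("OpSplitParagraph", cursor))
        cursor += 1  # assumption: a paragraph boundary is one position wide
        # Step 7b: insert the paragraph content after the split.
        ops.append(Op("OpInsertFragment", cursor, text))
        cursor += len(text)
    if paragraphs:
        # Step 8: undo the FIRST split so the first pasted paragraph merges
        # with the paragraph it was pasted into (hypothetical op name).
        ops.append(Op("RemoveFirstSplit", position))
    return ops
```

The returned list would be submitted as a single transaction/undo group, which is the part that needs the undo manager rework mentioned under Option 2.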

Foreign elements & cursor positioning
by Philip Peitsch 27 Aug '13

Hi everyone,

I have been looking at some funky "valid" ODT documents and discovering some fun behaviours around cursor positioning. I thought this was a good time to bring up some potential changes, as I noticed on IRC the other night that Friedrich had also discovered that sometimes the user is unable to place the cursor inside a text:a block.

The existing rules for valid cursor positions only allow placing the cursor inside what is known as a "grouping element", which is either a span, p or h. This has already caused me some challenges, because for one of the highlight overlays I'm doing on top of webodf I need to wrap document text in a normal HTML span, which then prevents the cursor from entering. In the situation Friedrich found, there is actually no requirement saying the text content within a text:a element must be placed in a span.

The suggestion on IRC yesterday was to just add text:a into the grouping element definitions (a reasonable one), but I got to thinking a little more about whether this is the best long-term solution, especially as there are other elements that could possibly contain character data. And for extensibility purposes, ideally we don't want to have to redo this core cursor positioning logic every time the UI requires some extra containers and wrappers to help display things :).

Re-reading the ODF specs[1], the preferred approach laid out is actually a blacklist, not a whitelist as we're currently doing. The blacklist as required for processing is already defined & used in StyleHelper.isAcceptedNode, and is used for OpRemoveText and OpApplyDirectStyling.

Would anyone have any problems if I changed OdtDocument.TextPositionFilter to use the blacklist approach instead? The blacklist will need an additional entry to exclude the cursor as well, so I'll put that in also.

From my testing with this, it appears to function identically to the existing approach, with the added bonus of being able to navigate within text:a tags that don't contain a span :)

Cheers,
Philip

P.S., sorry for all the list spam lately! Apparently I have too much time for philosophy :)

[1] http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#F…
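
To make the whitelist versus blacklist difference in the message above concrete, here is a minimal sketch in Python rather than WebODF code. A text node is modelled only by the tag names of its ancestors, nearest first. The function names are hypothetical, and the blacklist entries are placeholders: `cursor` stands for the cursor element the message says must be excluded, and `example:skipped` stands in for whatever the ODF specification and StyleHelper.isAcceptedNode actually list.

```python
from typing import List

# Whitelist described in the message: the "grouping elements".
GROUPING_ELEMENTS = {"text:span", "text:p", "text:h"}

# Placeholder blacklist, for illustration only (not the real list).
BLACKLIST = {"cursor", "example:skipped"}


def whitelist_accepts(ancestors: List[str]) -> bool:
    """Accept a text node only if its direct parent is a grouping element."""
    return bool(ancestors) and ancestors[0] in GROUPING_ELEMENTS


def blacklist_accepts(ancestors: List[str]) -> bool:
    """Accept a text node unless any ancestor is explicitly excluded."""
    return not any(tag in BLACKLIST for tag in ancestors)


# Text sitting directly inside text:a, which itself sits inside text:p.
link_text = ["text:a", "text:p"]
# Text wrapped in a plain HTML span added by a UI overlay.
overlay_text = ["span", "text:p"]
# Text that belongs to the cursor element itself.
cursor_text = ["cursor", "text:p"]
```

With these inputs the whitelist rejects both `link_text` and `overlay_text`, while the blacklist accepts both and still rejects `cursor_text`, so new wrappers would not need a change to the filter.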

2 month performance blitz for webodf
by Philip Peitsch 14 Aug '13

Hi all,

I'm just putting together a quick plan of what I'll be looking at with regards to performance, to address some queries & requests raised as part of MR#114 (https://gitorious.org/webodf/webodf/merge_requests/114).

As part of this next cycle, my goal is to have editing of a 20 page ODT and flat ODT with images up to a level that is responsive. At the moment, deleting a single character at the end of a sample 11 page document produces an extremely noticeable 500ms delay before the character is gone.

From my initial investigation so far, my plan of attack for this is:

1. Improve obviously suboptimal paths:
   OdtDocument
   * TextPositionFilter - Most of the container checks should be filtered nodes
2. Reduce the number of times the average run needs to step through the document to find a position:
   * e.g., upgradeWhiteSpace is called at a specific position. Any Op using this therefore steps through the document up to that specific position usually 2 or 3 times.
3. Implement a bookmark system to quickly retrieve iterators at specific positions within the document. I've used one of these internally for several months now at a different layer above webodf, and have found this to be the most significant improvement.

The plan with each of these is to introduce benchmarking numbers to allow the performance improvements to be proven. As such, I don't plan on addressing any of the performance related concerns in MR#114, as they are literally (from initial profiling checks) drops in a very large ocean of improvement.

If people have other ideas, complaints, etc., as ever, I'm open to anything :)

Cheers,
Philip
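
The message does not describe how the bookmark system in item 3 works, so the following is only one plausible reading, sketched in Python: remember the start position of every few text runs, binary-search for the nearest bookmark at or before the target, and step forward from there instead of from the start of the document. The class name, the `spacing` parameter and the model of a document as a list of run lengths are all assumptions made for the illustration.

```python
import bisect
from typing import List, Tuple


class BookmarkedDocument:
    """Toy document: a sequence of text runs, each with a character length."""

    def __init__(self, run_lengths: List[int], spacing: int = 2) -> None:
        if not run_lengths:
            raise ValueError("document needs at least one run")
        self.run_lengths = list(run_lengths)
        self.total = sum(self.run_lengths)
        self.bookmark_positions: List[int] = []  # sorted start positions
        self.bookmark_runs: List[int] = []       # run index per bookmark
        start = 0
        for index, length in enumerate(self.run_lengths):
            if index % spacing == 0:  # bookmark every `spacing` runs
                self.bookmark_positions.append(start)
                self.bookmark_runs.append(index)
            start += length

    def locate(self, position: int) -> Tuple[int, int, int]:
        """Return (run index, offset within run, runs stepped over)."""
        if not 0 <= position <= self.total:
            raise ValueError("position outside document")
        # Nearest bookmark at or before the requested position.
        slot = bisect.bisect_right(self.bookmark_positions, position) - 1
        start = self.bookmark_positions[slot]
        run = self.bookmark_runs[slot]
        steps = 0
        # Walk forward from the bookmark rather than from position 0.
        last = len(self.run_lengths) - 1
        while run < last and position >= start + self.run_lengths[run]:
            start += self.run_lengths[run]
            run += 1
            steps += 1
        return run, position - start, steps
```

For runs of length 5, 3, 4, 6 and 2, locating position 13 steps over one run with a bookmark every two runs, and over three runs when the only bookmark is at the start. A real implementation would also have to update or invalidate bookmarks whenever an operation changes the document, which this sketch ignores.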

MR#122
by Philip Peitsch 13 Aug '13

Hi,

Just wanted to politely bump https://gitorious.org/webodf/webodf/merge_requests/122 for review. It's been waiting about a week now for any feedback.

Cheers,
Philip
