I need support for a ListBox that can have hundreds of thousands of items in it. One of the bottlenecks for these huge lists is managing a selection - a SparseSet won’t cut it. I have my own implementation for dealing with a huge selection that most operations run in constant time (with some running in O(N) where N ~= selection size), in exchange for some per-item storage.
Would it be possible to make the necessary adjustments to ListBox (and TableListBox) so that the selection related calls are virtual and overridable?
Or should I just make my own copy of ListBox and TableListBox and diverge? What is planned in terms of improvements to these classes?
I’ve no plans to add anything like that… But perhaps a better approach would be to make SparseSet smarter?
The current SparseSet obviously works best when you have a small number of contiguous ranges, but perhaps it should be able to change its internal structure when its content becomes highly fragmented? If it could decide when to change from using ranges to using a BigInteger, that’d use only one bit per item, so would be at least as efficient as any other mechanism you could use.
Interesting approach I will have to think about that. What sticks out is that iterating over the selected elements might be a little bit painful.
As far as efficiency goes I’m not so much concerned about the storage, but more about the performance.
Looking at ListBox, I’m a little bit confused about how sorts interact with the selection. Is it the case that the SparseSet representing the selection is based on visible row indexes? It seems that way. This means that any sort would by necessity have to re-map the selection, or else functions like selectRangeOfRows() would not work properly. After a sort, lastRowSelected is not updated either, and there seems to be no way to change it from the model…
How are people dealing with multiple selection in a sorted TableListBox?
Maybe before sorting, set a ‘selected’ flag in each of your source items, based on the state of the SparseSet. Then sort them. Then reverse the process and create a new SparseSet from the new order.
I would really love it if the ListBox and TableListBox were refactored so that everything that deals with the selection passed through a tight set of overridable virtual members and I could provide my own implementation…
Remapping the SparseSet on a sort is really not an option, Sometimes a stream of fresh data is added to the list once per second and if there’s both a sort, and a selection, this will get ugly really fast.
Using a BigInteger and 1 bit per row to indicate the selected status is even worse for this…
Instead of calling the ListBoxModel with a SparseSet, it would be nice if the model had to call the ListBox to retrieve the SparseSet (i.e. listBox.getSelectedRowSet()) this way in my model I would not call that routine I would call my own functions that implement the constant-time selection algorithms.
Consider the benefits of this alternative implementation of selections:
The model provides a small amount of storage associated with each logical row
Each logical row is initialized with an integer ‘selection timestamp’ = 0
The list has an integer ‘selection timestamp’ = 1
List has a boolean ‘invert’ indicating if the selection is inverted
A row is considered selected if
row timestamp equals list timestamp, and invert == false, OR
row timestamp does not equal list timestamp and invert == true
To add rows to the selection:
set row timestamp = list timestamp if invert == false OR
set row timestamp = 0 if invert == true
To remove a row from the selection
set row timestamp = 0 if invert == false OR
set row timestamp = list timestamp if invert == true
To deselect all rows:
increment list timestamp, set invert == false
To select all rows
increment list timestamp, set invert == true
To invert the selection:
set invert = ! invert
How to iterate over rows? Maintain a doubly linked list using the per-row storage. When invert == true there is a little extra trickery needed but its straightforward (the list becomes the list of rows to skip, and must be sorted first).
Advantage of this algorithm:
* All operations execute in constant time regardless of list size (HOLY SCHITTE!!!) * The selection is preserved after a sort
Disadvantage:
Requires refactoring of ListBox and TableListBox
Passing a copy of the selected rows does not execute in O(1) time (have to build the SparseSet at that point)
This technique is not perfectly suitable for general use so it wouldn’t be ideal to throw out the SparseSet implementation, but it would be incredibly convenient to have them both available, that is why I suggest a refactoring.
I’m traveling for the week so no progress on this but two things come to mind:
Seems like you’ve already neatly grouped all of the selection operations together in a small set of member functions on ListBox. I think just making those routines virtual would be 90% of the work
I think that any change is going to have to combine sorting and selection into one interface, for the simple reason that when the sort changes, without knowledge of the sort, any selection method is going to have to perform some kind of expensive remapping operation.