Writing Performant Spreadsheets
Computer scientists refer to the amount of time a program takes to execute as its performance.
Spreadsheets are computer programs. Consequently, they are subject to the same computational limitations as other computer languages. Programmers in different computer languages, like Java, Python, and JavaScript, need to understand how fast their programs will run against large amounts of data. The same considerations hold for spreadsheet programmers.
A spreadsheet's performance depends on the amount of time each worksheet takes to complete. In turn, each worksheet's performance depends on the amount of time each cell takes to complete.
We have designed this section to help Pebblers create spreadsheets that can run against arbitrarily large input and still complete in the fastest possible time.
To understand whether or not your spreadsheet will execute quickly, Pebblers must first understand the different categories of cell-level operations in spreadsheet computing. Each of these different categories of operations has different performance profiles. Some categories perform better than others.
Pebblers should reference the following list of the different categories of cell level operations in order of fastest to slowest.
Relative Speed varies depending on the speed of your CPU and the amount of data being processed
The provided relative times are for worksheets that contain a thousand rows of data or less. An individual Pebblers' experience may vary, but the relative performance of each category of cell operation is still helpful to understand.
Cell Level Operation | Example | Relative Speed | Big O Notation |
---|---|---|---|
Arithmetic Calculation in a single cell | 43 + 17 * 3 | Less than a millisecond | Constant time O(1) |
String Operation in a single cell | MID, TEXT, LEFT, RIGHT, LEN | Less than a millisecond | Linear time O(n) where n is the length of the string. |
Cell Reference | A1, BA$54, $M$1024 | milliseconds | Constant time O(1) |
Single approximate match lookup on a sorted range | MATCH(D6,B6:B14,1) MATCH(D6,B6:B14,-1) | tens of milliseconds | Logarithmic time O(logn) where n is the number of cells in the searched range. |
Single exact match lookup on a range | MATCH(D6,B6:B14,0) | tens of milliseconds | Linear time O(n) where n is the number of cells in the searched range. |
Join | join, outer-join | tens of milliseconds | Linear time O(n), where n is the number of cells in the worksheets to be joined. |
Approximate match applied to every row in a worksheet | Drag down using the match-rows of MATCH(D6,B6:B14,1) | hundreds of milliseconds | Linear time O(m * logn), where n is the number of cells in the search range and m is the number of dragged down cells the search is executed in. |
Exact match applied to every row in a worksheet | Drag down using the match-rows directive of MATCH(D6,B6:B14,0) | hundreds of milliseconds | Quadratic time O(m * n) where n is the number of cells in the searched range and m is the number of dragged down cells the search is executed in. |
Cartesian product | cartesian-product | hundreds of milliseconds | Quadratic time O(m * n) where m is the number of rows in the first sheet and n is the number of rows in the second sheet. |
Lookups performed on multiple columns on every row in the worksheet. | MATCH(D6,B6:B14,0) | tens of seconds | Quadratic time O(m n c) where n is the number of cells in the search range and m is the number of dragged down cells and c is the number of columns the search is executed in. |
Big O Notation is how computer scientist classify an algorithm's speed.
Pebblers do not have to understand Big (O) notation to write fast-running worksheets; however, if you are interested in how computer scientists think about performance, Wikipedia does a fantastic job of explaining how this works.
Pebblers should use this table when designing worksheets. For instance, it is better to use the join directives to match rows of data in separate worksheets than to perform lookups. The improved performance is why join directives are preferred to lookups when designing Pebble Stream enhanced spreadsheets.
Updated over 2 years ago