Writing Performant Spreadsheets

Computer scientists refer to the amount of time a program takes to execute as its performance.

Spreadsheets are computer programs. Consequently, they are subject to the same computational limitations as other computer languages. Programmers in different computer languages, like Java, Python, and JavaScript, need to understand how fast their programs will run against large amounts of data. The same considerations hold for spreadsheet programmers.

A spreadsheet's performance depends on the amount of time each worksheet takes to complete. In turn, each worksheet's performance depends on the amount of time each cell takes to complete.

We have designed this section to help Pebblers create spreadsheets that can run against arbitrarily large input and still complete in the fastest possible time.

To understand whether or not your spreadsheet will execute quickly, Pebblers must first understand the different categories of cell-level operations in spreadsheet computing. Each of these different categories of operations has different performance profiles. Some categories perform better than others.

Pebblers should reference the following list of the different categories of cell level operations in order of fastest to slowest.

📘

Relative Speed varies depending on the speed of your CPU and the amount of data being processed

The provided relative times are for worksheets that contain a thousand rows of data or less. An individual Pebblers' experience may vary, but the relative performance of each category of cell operation is still helpful to understand.

Cell Level Operation

Example

Relative Speed

Big O Notation

Arithmetic Calculation in a single cell

43 + 17 * 3

Less than a millisecond

Constant time O(1)

String Operation in a single cell

MID, TEXT, LEFT, RIGHT, LEN

Less than a millisecond

Linear time O(n) where n is the length of the string.

Cell Reference

A1, BA$54, $M$1024

milliseconds

Constant time O(1)

Single approximate match lookup on a sorted range

MATCH(D6,B6:B14,1)
MATCH(D6,B6:B14,-1)

tens of milliseconds

Logarithmic time O(logn) where n is the number of cells in the searched range.

Single exact match lookup on a range

MATCH(D6,B6:B14,0)

tens of milliseconds

Linear time O(n) where n is the number of cells in the searched range.

Join

join, outer-join

tens of milliseconds

Linear time O(n), where n is the number of cells in the worksheets to be joined.

Approximate match applied to every row in a worksheet

Drag down using the match-rows of MATCH(D6,B6:B14,1)

hundreds of milliseconds

Linear time O(m * logn), where n is the number of cells in the search range and m is the number of dragged down cells the search is executed in.

Exact match applied to every row in a worksheet

Drag down using the match-rows directive of MATCH(D6,B6:B14,0)

hundreds of milliseconds

Quadratic time O(m * n) where n is the number of cells in the searched range and m is the number of dragged down cells the search is executed in.

Cartesian product

cartesian-product

hundreds of milliseconds

Quadratic time O(m * n) where m is the number of rows in the first sheet and n is the number of rows in the second sheet.

Lookups performed on multiple columns on every row in the worksheet.

MATCH(D6,B6:B14,0)

tens of seconds

Quadratic time O(m n c) where n is the number of cells in the search range and m is the number of dragged down cells and c is the number of columns the search is executed in.

📘

Big O Notation is how computer scientist classify an algorithm's speed.

Pebblers do not have to understand Big (O) notation to write fast-running worksheets; however, if you are interested in how computer scientists think about performance, Wikipedia does a fantastic job of explaining how this works.

Pebblers should use this table when designing worksheets. For instance, it is better to use the join directives to match rows of data in separate worksheets than to perform lookups. The improved performance is why join directives are preferred to lookups when designing Pebble Stream enhanced spreadsheets.


Did this page help you?