Writing Performant Spreadsheets

Spreadsheets are computer programs. Consequently, they are subject to the same computational limitations as other computer languages. Programmers in different computer languages, like Java, Python, and JavaScript, need to understand how fast their programs will run against large amounts of data. The same considerations hold for spreadsheet programmers.

A spreadsheet's performance depends on the amount of time each worksheet takes to complete. In turn, each worksheet's performance depends on the amount of time each cell takes to complete.

We have designed this section to help Pebblers create spreadsheets that can run against arbitrarily large input and still complete in the fastest possible time.

To understand whether or not your spreadsheet will execute quickly, Pebblers must first understand the different categories of cell-level operations in spreadsheet computing. Each of these different categories of operations has different performance profiles. Some categories perform better than others.

Pebblers should reference the following list of the different categories of cell level operations in order of fastest to slowest.

📘
Relative Speed varies depending on the speed of your CPU and the amount of data being processed
The provided relative times are for worksheets that contain a thousands of rows of data. An individual Pebblers' experience may vary. Don't focus too much on the timings. Consider the relative performance of each category of cell or range operation to better understand performance impact.

Cell Level Operation	Example	Relative Speed	Scalable?	Big O Notation
Arithmetic Calculation in a single cell	43 + 17 * 3	Less than a millisecond	Yes	Constant time O(1)
String Operation in a single cell	MID, TEXT, LEFT, RIGHT, LEN	Less than a millisecond	Yes	Linear time O(n) where n is the length of the string. String operations on long strings may take longer than on short strings.
Cell Reference	A1, BA$54, $M$1024	milliseconds	Yes	Constant time O(1)
Single approximate match lookup on a sorted range	MATCH(D6,B6:B14,1) MATCH(D6,B6:B14,-1)	tens of milliseconds	Yes	Logarithmic time O(logn) where n is the number of cells in the searched range. The range must be sorted!
Single summation on an unsorted range	SUM(A:A)	tens of milliseconds	Yes	Linear time O(n) where n is the number of rows in the summed range.
Single exact match lookup on a range	MATCH(D6,B6:B14,0)	tens of milliseconds	Yes	Linear time O(n) where n is the number of cells in the searched range.
grouping	sum, mult, balance	tens of milliseconds	Yes	Linear time O(n), where n is the number of rows in the worksheet to be grouped
joining	join, outer-join	tens of milliseconds	Yes	Linear time O(n), where n is the number of rows in the worksheets to be joined.
sorting	sort	hundreds of milliseconds	not really	Log-linear time O(nlogn), where n is the number of rows in the worksheet to be joined. Only apply sorting when grouping won't do!
Approximate match applied to every row in a worksheet	Drag down using the match-rows of MATCH(D6,B6:B14,1)	hundreds of milliseconds	No	Linear time O(m * logn), where n is the number of cells in the search range, and m is the number of dragged-down cells the search is executed in. Consider joining instead!
Exact match applied to every row in a worksheet	Drag down using the match-rows directive of MATCH(D6,B6:B14,0)	hundreds of milliseconds	No	Quadratic time O(m * n) where n is the number of cells in the searched range, and m is the number of dragged-down cells the search is executed in. Consider joining instead!
Cartesian product	cartesian-product	hundreds of milliseconds	No	Quadratic time O(m * n) where m is the number of rows in the first sheet and n is the number of rows in the second sheet. Consider joining instead!
Lookups (vlookup, xlookup, match) performed on multiple columns on every row in the worksheet.	MATCH(D6,B6:B14,0)	tens of seconds	No	Quadratic time O(m n c) where n is the number of cells in the search range, m is the number of dragged-down cells, and c is the number of columns the search is executed in. Consider joining instead!
*IFS performed on multiple columns	SUMIF, SUMIFS, COUNTIFS	tens of seconds	No	Quadratic time O(m n c) where n is the number of cells in the search range, m is the number of dragged-down cells, and c is the number of columns the search is executed in. Consider joining, or group summing instead!

📘
Big O Notation is how computer scientists classify an algorithm's speed.
Pebblers do not have to understand Big (O) notation to write fast-running worksheets; however, if you are interested in how computer scientists think about performance, Wikipedia does a fantastic job of explaining how Big (O) analysis predicts a spreadsheets scalabilty.

Pebblers should use this table when designing worksheets. For instance, it is better to use the join directives to match rows of data in separate worksheets than to perform lookups. The improved performance is why join directives are preferred to lookups when designing spreadsheets with Pebble Stream.

❗️
In general, if you are looking to process millions of rows quickly, avoid unscalable operations, particularly quadratic and n-squared time operations.

Writing Performant Spreadsheets

📘
Relative Speed varies depending on the speed of your CPU and the amount of data being processed

📘
Big O Notation is how computer scientists classify an algorithm's speed.

❗️
In general, if you are looking to process millions of rows quickly, avoid unscalable operations, particularly quadratic and n-squared time operations.

📘Relative Speed varies depending on the speed of your CPU and the amount of data being processed

📘Big O Notation is how computer scientists classify an algorithm's speed.

❗️In general, if you are looking to process millions of rows quickly, avoid unscalable operations, particularly quadratic and n-squared time operations.

📘
Relative Speed varies depending on the speed of your CPU and the amount of data being processed

📘
Big O Notation is how computer scientists classify an algorithm's speed.

❗️
In general, if you are looking to process millions of rows quickly, avoid unscalable operations, particularly quadratic and n-squared time operations.