The computing language Python appeared on my radar 10 years ago. I was having a great conversation with a college student who loved Excel. I started asking what class at school taught her about Excel and how she used Excel. But soon, she started telling me about an analytics course that used Python and how she could do so much more with Python than she could do in Excel.
That day, back in 2013, I bought a book on Python. I read through the examples, such as a customer segmentation process called k-means clustering, that seemed very powerful. But I made no progress with Python because I couldn’t follow the steps to get it installed on my computer. Any install process that starts with “Open a command prompt with administrative rights in Windows” is an immediate barrier to entry. The steps to be followed were something along the lines of “Download and install Python. Install Jupyter Notebook so you have a place to edit and run code.” Then it directed me to install a dizzying array of different open-source libraries—all from a command prompt. This was all before I could attempt my first line of Python code. It was all too complicated. The Python book collected dust. Eventually, the book was donated to a thrift store.
Microsoft Removes Many Barriers to Entry
To use today’s preview of Python in Excel, you need to be running Microsoft 365 and join the Microsoft Insiders program at the beta level. Restart Excel, and you’ll see a large Insert Python icon between the Function Library and the Name Manager (see Figure 1).
To start running Python, you can use the Insert Python icon, simply type “=PY(” or press Ctrl+Alt+Shift+P. After any of these actions, the Excel Formula Bar changes to a Python Editor. Paste in some Python code from the web or ChatGPT. A drop-down icon to the left of the Formula Bar offers a choice to return the Python results as Excel Values or as a Python object that you can reuse in later cells.
A Simple “Hello World” Example of Python in Excel
Open any spreadsheet that has some existing Excel data, including at least one column of numbers.
Select a blank cell to the right of your data.
Press Ctrl+Alt+Shift+P to change the Excel formula bar to a Python editor.
Most people keep their Excel formula bar set to display just one line of the formula. Use the “V” icon on the right side of the formula bar to expand the formula bar. Hover the mouse on the thin gray bar just below the formula bar to reveal a vertical two-headed arrow. Click and drag so you can see four to six lines in the formula bar.
A table in Python is called a “data frame.” Most examples on the internet store the table in a variable called “df,” which is short for data frame. Type “df=” to get your Excel data into Python. Using your mouse, click the top-left corner cell of your data. Use either the mouse or Ctrl+* or Ctrl+Shift+Down arrow, Right arrow to select the Excel data. The Python editor will create the following line of code:
df=xl("A1:C1262", headers=True)
In this code, the XL function is a new function designed to pass Excel data to Python. The headers=True is necessary to tell Python that the first row of your data set contains descriptive headers instead of numbers.
In the Python version of the Excel formula bar, use Enter to go to a new line and Ctrl+Enter to accept the formula. Press Enter to go to the second line of your code. Type df.describe(). Use the drop-down arrow to the left of the formula bar and choose to return the results as an Excel Value (see Figure 2).
Press Ctrl+Enter to run your code.
After a short delay, Python returns descriptive statistics about your data, as shown in Figure 3.
This is a simple example that could easily be replicated with a handful of Excel functions. The important concept here is that you were able to run your first Python script without ever leaving Excel. Behind the scenes, Excel set up an environment with all of the libraries shown in Figure 4 already loaded. As you copy code from ChatGPT or the internet, you can safely eliminate any line that’s already shown in Figure 4.
Actually, the code in Figure 4 might change from time to time. You can see the current initialization code by using the Initialization icon shown back in Figure 1.
An Advanced Customer Segmentation Example in Python
When I first read the Python book back in 2013, I was intrigued by the process called k-means clustering. You start with a data set that shows all of the items purchased by all of your customers. Your goal is to find clusters of similar customers so you can send them a similar marketing offer. Python uses machine learning to group customers into k-clusters with similar buying patterns.
One simple example is students at a middle school dance. If you plotted the location of each student in the gym, the computer would be able to identify a cluster of boys on one side of the gym and a cluster of girls on the other side of the gym. Actually, k-means will simply identify that there’s a cluster here and another cluster there. It will be up to you to figure out what each cluster has in common. If you ran the k-means process again, but this time asking for 10 clusters, you might find cliques, such as the football players here and the chess club members over there.
Another example is from a big-box retailer. If it can identify expectant mothers from customer purchases, it can develop a relationship with the mom-to-be, as she’s going to be needing a crib, stroller, and a lot of diapers. Find all the customers who are buying folic acid and have k-means look at all of the other purchases, and a cluster of expectant mothers will become clear.
As for me, I sell books about Excel, Visual Basic for Applications (VBA), Power Query, and Power BI. If I have a new book coming out about Power BI, I’d want to send a targeted email to customers who have shown an interest in Power BI in the past.
For my first experiment with k-means, I used 100 random customers with scores along an X and Y dimension. I guessed that there should be five clusters of customers in my database. Here’s a simplified explanation of the k-means process: The k-means algorithm randomly draws five cluster centers. It measures the distance from each cluster center to the nearest customer. The goal is to move the cluster centers around in order to minimize the sum of squares of the distance from the cluster center to the customers within that cluster. Python will run the algorithm again and again until it arrives at the smallest sum of squares.
Any time I start thinking about building this routine into Excel, my head starts spinning as I consider the VBA or circular reference formulas required.
Figure 5 shows the Python code that I mostly adapted from examples on the internet or by asking ChatGPT. It’s five basic steps: (1) Import the library that includes the k-means algorithm; (2) identify my Excel data using the xl() function; (3) include two lines to perform k-means; (4) most importantly, add a new column to my Excel data that identifies which cluster number each customer belongs to; and (5) chart the results.
The code from Figure 5 is entered as a formula in cell D2. This code returns an image of the chart showing the results of the k-means clustering, as shown in Figure 6. Each group of customers is in a different color, and the center of the five cluster centers is drawn in red.
If I edit the code in D2 to use three clusters instead of five, the chart will update with a new assignment of clusters, as shown in Figure 7. In this case, the center of the teal cluster in the top left is in between two obvious clusters of customers. It looks like five clusters is better than three clusters.
However, another edit of the code to look for eight clusters and the algorithm suggests that the cluster in the lower right could possibly be two distinct clusters, as shown in Figure 8.
Adapting This K-Means Example to Real-Life Data
The example above demonstrates a simple customer segmentation process with 100 customers and their purchases of two products. The chart looks great. But what do you do with this? And how are you supposed to translate your customer purchase information into the X and Y values? I started with a download of Customer ID, Product ID, and Revenue. Using an XLOOKUP table, I classified each product as belonging to one of four categories. I used a query using the Get & Transform tools (Power Query) on the Data tab in Excel to build a table showing Customer ID and then four columns of revenue, one for each product category. This table is shown in columns A:F of Figure 9. (Column B, with the actual customer names, is hidden in the screenshot.) Rather than editing the number of clusters, I built a small pivot table and a slicer so I could easily choose a number of clusters. The result of the slicer is out of view, in cell W11.
Notice how Python in Excel allows me to integrate my existing knowledge of Power Query and Pivot Tables in Excel to do familiar tasks, and then I can call Python for the heavy-lifting portions of the process.
The code shown in Figure 9 doesn’t try to build charts in four dimensions. Instead, the code simply adds a new column for Cluster Number to the right of the original data and then returns the data frame to Excel. There are more than 18,000 customers in my database, and the k-means algorithm runs in under 10 seconds. It’s an amazing second attempt at writing Python.
This is just one powerful example showing just one algorithm available from just one library designed for Python. While Excel loads the five popular Python libraries shown in Figure 4, you can import other libraries such as:
- Astropy for astronomy
- Beautifulsoup4 for screen scraping
- Gensim for natural language processing
- Matplotlib and Seaborn for charting
The complete set of supported libraries is here.
Perform a Google search for Seaborn gallery, and you’ll see 49 new chart types that aren’t available in Excel. A subset of those charts is shown in Figure 10. Click on any chart in the gallery, and simple Python code is presented.
With the addition of Python to Excel, Microsoft has added hundreds of new features to Excel at one time. Most importantly for me, Microsoft has removed much of the complexity of getting Python running on a Windows computer and has eliminated the need to run any code from a command prompt. I can use my existing Excel skills for most of the tasks and then call on Python when I need to do something that would be very difficult in Excel.
November 2023