In this edition, we delve into the evolution of creating cost-effective alternatives to traditional products, today focusing on expense management software. A decade ago, innovations like the initial releases of Expensify required hefty investments for simple point solutions. Nowadays, such solutions need to expand in multiple directions to maintain their competitive edge, as businesses were once willing to pay dearly for software that often led to fragmented systems and excessive subscriptions.
The ongoing joke (but not really) in the industry is that many B2B SaaS products are just fancy versions of Excel. The future might see a shift away from these visually appealing but complex interfaces towards using AI for back-office tasks through simple APIs, potentially reducing the need for many subscriptions.
Today, we'll concentrate on a particularly generic but crucial area of automation: expense software. The uniformity in accounting standards has made expense management a prime target for early automation efforts.
However, managing expenses remains cumbersome until it's possible to upgrade to a more comprehensive system. I’ll guide you through replicating an expense management tool, focusing on how to reconcile receipts with expenses after a project or event. This is especially useful if you're organizing an event or managing a project and don't have access to sophisticated company credit card software, making receipt management cumbersome.
The final system will look like this:
I’m going to break this project into three sections:
Environment Set-Up
Development
Execution
1. Environment Set Up
Our stack is going to look similar to last time - using just three tools:
Google Colab
OpenAI’s API
Google Drive
Let’s set up a new environment with Google Colab. You can create a new notebook here: https://colab.research.google.com/.
Name your notebook, and before we write any of our main code, we will need to install some packages. In the first block, copy and paste this code and hit the play button to the left of the code:
!pip install PyPDF2 openai
Next, let’s set up and obtain an OpenAI API key. I’ve gone over this before, and if you’ve been following my previous articles, I outline how to get one. But as a refresher (you'll probably need a new key here anyways), here’s a refresher article.
Finally, let’s set up our Google Drive environment. We will read a folder of PDF receipts directly from Google Drive and return a compiled Excel file to the Drive. So, to make things easy, let’s set up a folder as close to your home directory as possible with the name “Expenses.” Inside the folder, let’s create another folder called “Receipts.”
The official path to the receipts folder would be: “/content/drive/My Drive//ExpenseApp/Receipts”
2. Development
This part is an easy copy and paste with a few edits, but essentially, the steps we are coding serve these sequential functions:
Extract text from PDF
Using OpenAI’s API - pass the text through the model and prompt it to deliver information in a specific format. This helps account for the variability in the structure/format of different invoices.
Write to Excel - Create a new Excel sheet with each record being a new line.
Here’s the full code to copy and paste into a new block in Google Colab:
# Import necessary libraries
import os
import re
import pandas as pd
from PyPDF2 import PdfReader
from openai import OpenAI
client = OpenAI(api_key='INSERT API KEY HERE')
# Function to extract text from PDF
def extract_text_from_pdf(pdf_path):
text = ""
with open(pdf_path, "rb") as file:
pdf = PdfReader(file)
num_pages = len(pdf.pages)
for page in range(num_pages):
text += pdf.pages[page].extract_text() if pdf.pages[page].extract_text() else ""
return text
# Function to use OpenAI API for parsing text
def parse_receipt(text):
completion = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": f"Extract the AMOUNT, DATE, and VENDOR from this receipt text: {text}. Please respond exactly in this format: AMOUNT: $amount DATE: MM/DD/YYYY VENDOR: vendor"}
]
)
print(completion.choices[0].message.content)
return completion.choices[0].message.content
# Function to parse the structured string into components
def extract_info(formatted_string):
pattern = r"AMOUNT:\s*(.*?)\s*DATE:\s*(.*?)\s*VENDOR:\s*(.*)"
match = re.search(pattern, formatted_string, re.DOTALL)
if match:
return match.groups()
return None, None, None
# Function to write data to Excel
def write_to_excel(data, filename="output.xlsx"):
df = pd.DataFrame(data, columns=["Date", "Vendor", "Amount"])
df.to_excel(filename, index=False)
# Main function to process all PDFs in a folder
def process_receipts(folder_path):
data = []
for filename in os.listdir(folder_path):
if filename.endswith('.pdf'):
pdf_path = os.path.join(folder_path, filename)
text = extract_text_from_pdf(pdf_path)
if text is None:
print(f"No text extracted from {filename}. Skipping.")
continue
print(f"Extracting from {filename}:")
print("Extracted Text:", text[:500]) # Print the first 500 characters of the extracted text
formatted_string = parse_receipt(text)
print("AI Response:", formatted_string) # Debug print of what the AI parsed
if formatted_string:
amount, date, vendor = extract_info(formatted_string)
print("Parsed Data:", amount, date, vendor) # Debug print of the parsed data
if all([amount, date, vendor]):
data.append((date, vendor, amount))
else:
print(f"Data missing in fields for file {filename}")
else:
print(f"No formatted data returned for file {filename}")
# Writing the data to Excel, adjust the filename or path as necessary
write_to_excel(data, filename='INSERT PATH TO FOLDER HERE'/Processed Receipts.xlsx')
print("Data written to Excel successfully.") # Adjust path as needed
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
# Set folder path and process receipts
folder_path = 'INSERT PATH HERE' # Adjust the path to your receipts folder
process_receipts(folder_path)
You’ll need to make three main edits here:
Insert your OpenAI API Key
Insert your key at the line near the top that is “
client = OpenAI(api_key='INSERT API KEY HERE'):
Insert the path to your receipt folder
Insert the path near the bottom at
folder_path = 'INSERT PATH HERE'
Insert the path to where you want the program to deposit your new receipts
Insert the path where it says:
write_to_excel(data, filename='INSERT PATH TO FOLDER HERE/Processed Receipts.xlsx').
Keep the “/Processed Receipts.xlsx” part, as that is what your new Excel file will be called.
That’s it! You’re ready to run it.
3. Execution
Drop a few PDF invoices in the newly created “Receipts” folder, navigate to Google Colab, and run your code by clicking “Runtime” on the menu bar and navigating to “Run All.”
You’ll be prompted to connect to Google Drive, and your script will run! After a few seconds of running it, you should see a new Excel file, that looks like this, with your invoices nicely extracted:
Now this is just an example of a small slice of what this program could do, if you want, you can have ChatGPT edit your code to have it organize your personalized expenses off your credit card bill!
Limitations and Next Steps
Why Not Use Google Sheets?
You might wonder why we're not using Google Sheets for our project. The reason is straightforward: accessing Google Sheets via its API can be complex, and we aim to keep our tools simple and accessible for everyone. I'll gradually introduce more technically challenging projects, but for now, the focus is on keeping things simple.
What's Next?
At the stage this project marks a significant turning point as everyone has unique accounting and reconciliation needs. In this project, we explore a new approach that uses OpenAI to parse data from invoices, making the process more intuitive than using traditional methods like Regex. Using ChatGPT, take a swing at having it help you code a the program to meet your specifications. It’s very good at error handling for simple applications like this as well!
A Thank You to Our Subscribers
A special shout-out to all our new subscribers, many of whom joined after my recent LinkedIn post. Sharing content with peers I respect is always a bit daunting, but your support has been incredibly encouraging.
Help Spread the Word
If you found this project useful, please consider sharing it. Your support is crucial as we aim to equip the workforce for a new era of AI and automation.
See you in two weeks!
— Nishad