Quick and dirty hack to automate tedious expense report task
anelson October 18, 2024 #hack #genai #claude #productivityI really dislike business travel, but I really REALLY dislike expense reports. They are always so tedious, made worse by user-hostile war crimes like Concur (cursed be its name!). My most recent trip from Kyiv to Boston and back, however, presented a particularly unappealing task: nearly a dozen receipts spread across four currencies and two languages. Thankfully Elastio is not yet a big enough company to waste money on Concur licenses, and we still have an informal process whereby we email receipts to our accountant. However, for her sanity and mine, I needed to make a spreadsheet organizing each expense, including its description, amount, date, and some reference back to the PDF containing the receipt (recall that these receipts are in multiple languages). The prospect of manually entering all this data into a spreadsheet was, to put it mildly, unappealing. It would take me at least 10 minutes of tedious labor.
So I decided to spend 15 minutes fucking about with Anthropic Claude Sonnet 3.5 to make it do it for me!
The Problem
Hereās what I was dealing with:
- 11 individual receipts
- 4 different currencies (USD, EUR, HUF, UAH)
- Documents in both English and Ukrainian
- A mix of formats: hotel bookings, flight receipts, train tickets, and taxi fares
I can read the languages involved, but our accountant doesnāt read Ukrainian, and Iām not sure what our rules are for currency conversion.
I decided to leverage the power of LLMs, specifically in this case Claude 3.5 Sonnet.
The obvious solution that doesnāt work.
At first I thought this would be extremely trivial. I just dragged all of the PDF receipts over into Claudeās web UI andā¦
But I have more than 5 PDFs! FML!
The Quick and dirty solution that does work
When it comes to being lazy, I can be very persistent.
I invoked GPT-4o from my terminal (this would have also worked with Claude) to find out what the terminal one-liner to concat a bunch of PDFs would be. It came up with:
pdfunite *.pdf output.pdf
But I donāt seem to have pdfunite
in my path. A follow-up question revealed that itās part of the poppler
Brew package on macOS:
brew install poppler
Having run pdfunite
successfully, I had concatenated a bunch of small PDFs into one big 8MB PDF with all of my receipts. I wasnāt sure if Claude would be smart enough to understand that this was in fact multiple receipts, especially since some of them were multi-page receipts (Bolt, why!??). To ensure it had plenty of context, I used eza --oneline --no-icons *.pdf
to make a list of all of the PDF file names that went in to the monster PDF, and hoped for the best:
From the attached PDF containing all receipts from my recent business trip, generate a CSV with the following:
- Date of expense
- Description
- Amount
- Currency
I had to concatenate all receipts into one PDF file due to limitations on attachments. This PDF contains the following individual PDFs:
[List of PDF filenames produced with `eza --oneline --no-icons *.pdf`]
After getting the initial results, I was pleased to discover that my hack actually worked! But the description didnāt quite match the file names, and yet the description was more informative than the file names that I used, so I decided the Excel sheet should contain both the PDF file name of the receipt, and a description of the expense. That was easy:
Add another column, after āDateā, called āFilenameā with the name of the PDF containing that particular expense, from the list I provided earlier.
The Results
The LLM produced a neatly formatted CSV that looked like this (most rows redacted):
To my amazement, Claude correctly figured out which original PDF filename went with which expense in the concatenated PDF, even though there was nothing in the concatenated PDF to link any particular page to its original file name!
This quick hack saved me at least 10 minutes of mind-numbing tedium, and took no more than 15 minutes including fucking around with the PDF concatenation. If this were a thing I had to do regularly, I could get Claude to package that up into a shell script and it would take a minute at most to prepare the spreadsheet.
In fact, Iāve spent much more time writing this up than I did actually doing it. I just wanted to capture the satisfaction of turning a tedious task into the much more interesting task of automating the tedium. I also think it will be fun to read this in a year or two, when presumably this kind of thing is built into Office and kids point and laugh at the old people who gasped in wonder at the capabilities of primitive LLMs. Even in that bright future, though, Concur will be a piece of shit! Fuck you Concur!!