Quick and dirty hack to automate tedious expense report task

anelson October 18, 2024 #hack #genai #claude #productivity

I really dislike business travel, but I really REALLY dislike expense reports. They are always so tedious, made worse by user-hostile war crimes like Concur (cursed be its name!). My most recent trip from Kyiv to Boston and back, however, presented a particularly unappealing task: nearly a dozen receipts spread across four currencies and two languages. Thankfully Elastio is not yet a big enough company to waste money on Concur licenses, and we still have an informal process whereby we email receipts to our accountant. However, for her sanity and mine, I needed to make a spreadsheet organizing each expense, including its description, amount, date, and some reference back to the PDF containing the receipt (recall that these receipts are in multiple languages). The prospect of manually entering all this data into a spreadsheet was, to put it mildly, unappealing. It would take me at least 10 minutes of tedious labor.

So I decided to spend 15 minutes fucking about with Anthropic Claude Sonnet 3.5 to make it do it for me!

The Problem

Here’s what I was dealing with:

11 individual receipts
4 different currencies (USD, EUR, HUF, UAH)
Documents in both English and Ukrainian
A mix of formats: hotel bookings, flight receipts, train tickets, and taxi fares

I can read the languages involved, but our accountant doesn’t read Ukrainian, and I’m not sure what our rules are for currency conversion.

I decided to leverage the power of LLMs, specifically in this case Claude 3.5 Sonnet.

The obvious solution that doesn’t work.

At first I thought this would be extremely trivial. I just dragged all of the PDF receipts over into Claude’s web UI and…

Claude says “no”

But I have more than 5 PDFs! FML!

The Quick and dirty solution that does work

When it comes to being lazy, I can be very persistent.

I invoked GPT-4o from my terminal (this would have also worked with Claude) to find out what the terminal one-liner to concat a bunch of PDFs would be. It came up with:

pdfunite *.pdf output.pdf

But I don’t seem to have pdfunite in my path. A follow-up question revealed that it’s part of the poppler Brew package on macOS:

brew install poppler

Having run pdfunite successfully, I had concatenated a bunch of small PDFs into one big 8MB PDF with all of my receipts. I wasn’t sure if Claude would be smart enough to understand that this was in fact multiple receipts, especially since some of them were multi-page receipts (Bolt, why!??). To ensure it had plenty of context, I used eza --oneline --no-icons *.pdf to make a list of all of the PDF file names that went in to the monster PDF, and hoped for the best:

From the attached PDF containing all receipts from my recent business trip, generate a CSV with the following:

Date of expense

Description

Amount

Currency

I had to concatenate all receipts into one PDF file due to limitations on attachments. This PDF contains the following individual PDFs:
[List of PDF filenames produced with `eza --oneline --no-icons *.pdf`]

After getting the initial results, I was pleased to discover that my hack actually worked! But the description didn’t quite match the file names, and yet the description was more informative than the file names that I used, so I decided the Excel sheet should contain both the PDF file name of the receipt, and a description of the expense. That was easy:

Add another column, after “Date”, called “Filename” with the name of the PDF containing that particular expense, from the list I provided earlier.

The Results

The LLM produced a neatly formatted CSV that looked like this (most rows redacted):

Date,Filename,Description,Amount,Currency
2024-09-26,2024-09-24 Train from Kyiv to Western Ukraine.pdf,Train from Kyiv to Kolomyia,1689.65,UAH
2024-09-12,2024-09-12 BUD to BOS Flight Receipt.pdf,Air France flight from Budapest to Boston,996.10,USD
2024-10-12,2024-10-13 Taxi from BUD airport.pdf,Taxi from Budapest Airport,12630.00,HUF
2024-10-19,2024-10-19 Train from Western Ukraine to Kyiv - Part 1.pdf,Train from Uzhhorod to Kyiv,858.98,UAH

To my amazement, Claude correctly figured out which original PDF filename went with which expense in the concatenated PDF, even though there was nothing in the concatenated PDF to link any particular page to its original file name!

This quick hack saved me at least 10 minutes of mind-numbing tedium, and took no more than 15 minutes including fucking around with the PDF concatenation. If this were a thing I had to do regularly, I could get Claude to package that up into a shell script and it would take a minute at most to prepare the spreadsheet.

In fact, I’ve spent much more time writing this up than I did actually doing it. I just wanted to capture the satisfaction of turning a tedious task into the much more interesting task of automating the tedium. I also think it will be fun to read this in a year or two, when presumably this kind of thing is built into Office and kids point and laugh at the old people who gasped in wonder at the capabilities of primitive LLMs. Even in that bright future, though, Concur will be a piece of shit! Fuck you Concur!!