I Taught a Robot to Read the News So I Don’t Have To
Newsletters are dead! AI is now the first and last word in the daily news. If you are a journalist or a content creator, it is time to find a new line of work.
By now, you have probably heard a declaration like this in the context of every field and every job. Maybe one day, AI optimists will be correct, and computers will do everything for us. We aren't there yet but we are getting closer.
I have spent the last few months working to build a newsletter compiled and written entirely by AI. This newsletter grew out of my frustrations with product management news. Product is a field that is rich with long-form and strategic content. But searching for daily product management news doesn't return much. And I wanted to consume this content, not create it. So I decided to see if AI, specifically LLMs and GPT could live up to their hype. Could they serve me daily news, that was relevant to product managers, and explain to me why I should care? So far the answer is maybe. It is not perfect, but it does as well as I could.
A lot went into creating the newsletter, if you are curious to learn more, you can check out the README on GitHub. But I did want to give some details on how I used GPT. I took some conventional and non-conventional approaches. LLMs are often misused but when properly applied, they can do some incredible things. So let's look at some of the ways I used GPT in the newsletter.
Getting started
For everything I am going to describe here, I used either the gpt-3.5-turbo or gpt-3.5-turbo-16k APIs. The non-16k model was desirable because it was the best mix of performance and cost. For portions where I needed to process a lot of content, the 16k model can handle more tokens, as implied by its name.
I first started testing the concept for this newsletter from my phone, while on vacation. That meant I didn't have a lot of screen real estate or great tooling. But I was able to test the basic premise. I passed GPT the text of an article and asked it to tell me: if it was relevant to product management, and if it was, to provide me with a summary of the article. This worked better than I expected. But I learned at scale this approach was not reliable. Asking GPT to do too many things at once leads to some weird results. So I broke that one prompt up into a few. And then as the newsletter got more complex I added a few more along the way. Let's dive into each of those prompts.
Screen for Relevance
First off there are some topics that I do not want to include in the newsletter. The topics are all based on personal preference, they may be relevant to other product managers but didn't make sense for my needs. The nice thing about using a prompt is that you can turn topics on and off by updating the prompt. Someone could take these prompts and adjust them to a different domain or area of interest and have a newsletter personalized for them.
Then I asked it to review individual articles and determine if they are relevant for product managers. This prompt also asked it to summarize the relevant ones but I moved that to another part of the flow. As I mentioned, asking it to do too many things at once was leading to weird outputs.
Key learnings:
GPT is good at returning a consistent result when you give it certain options to return. I was getting prepared to write a function to catch all the possible responses but I have yet to see it return anything aside from the options I gave it.
Break things up. These used to be one prompt and I split them up. Asking it to filter topics and individual stories in one shot was leading to odd stories being selected. And lots of the items I told it to ignore were also being included.
GPT is solid at categorization exercises like this. If you can give it clear directions, it will do a great job of evaluating content.
Summarization and Intro Generation
Once articles have been vetted for relevance we send the text to GPT and ask it to create a summary. There is a limit to the amount of text that can be sent and many articles surpass the limit. The options were either to use the more expensive 16k model or limit the amount of text per article. Doing some testing I found that sending partial text still led to quality summaries. Each summary details what the article is about and explains why it is relevant to product managers.
Once all the summaries are generated they are all passed to GPT with a unique theme for the day. GPT uses the theme and the stories for the day to generate a unique intro for that day's newsletter.
Key learnings:
This type of work is the stuff that LLMs are made for. If you have large volumes of data that need summarization there's no reason not to use GPT or other models.
GPT can use emojis and understand which ones make sense in context. I added this to the prompt as a bit of a joke but have been surprised with its ability to use them in context.
Talk to GPT in the way you want it to talk to you. In my first attempt at the prompt for the intro, I told it to "use a casual and fun tone". I took that prompt to ChatGPT to ask for improvements and it told me that instead of asking for that tone, I should write the prompt in that tone. It re-wrote the prompt for me, and the intros are much more in-line with what I was after.
Deduping and Domain Authority
In the initial scrape of articles, more than one article will likely cover the same story. A recent example was the launch of Threads. There were at least 10 articles on this topic pulled into the newsletter flow. When this happens I do not want to include them all in the final output, I only want to include one. This leads to a need for two things to happen.
First, I need to identify the duplicate stories. To do this, I send all the URLs for stories to GPT in one payload and ask it to dedupe based on the keywords in the URLs. I found a way to do this through scikit-learn which is likely a more technically sound approach. But the whole point of this was to test the limits of GPT. I also tested sending the text of the articles with the URLs to help in deduping. But I found that GPT did almost as well with just the URLs, so I pulled the text out to keep things simple. Once it has a list of deduped stories it returns the URLs to include in the newsletter.
The second thing to think about with duplicates is deciding which source to use for a story. I looked at using a Domain Authority tool like Moz or Ahrefs, but they were expensive. Instead, I asked GPT if it knew how tools like Moz generate their domain authority scores. It explained to me how they do so and generated some sample scores. I updated the prompt to include instructions to use domain authority when selecting a source.
Key learnings:
If you can get creative with GPT it can save you a lot of money. Domain authority APIs can run hundreds of dollars a month. While they may do better at the same task the price-to-value trade-offs are astronomic. The entire daily flow for the newsletter runs between 1-2 cents. That is for everything, not just approximate domain authority score.
If you intend to send GPT things that aren't plain text or ask it to send you responses that aren't straightforward be ready to get creative with parsing. In the case of the URLs, I send them as one long string with commas separating them. I explain the format in the prompt and GPT can parse the URLs. When it is done processing it sends them back to me as a string. But, it would sometimes include a '\n' between each URL. In the prompt, I told it not to do this but it would still do it occasionally. So I had to change my parser for the responses to account for the fact that this might happen. For the most part, I had great success with GPT returning responses exactly as I ask. But when you get into complicated formats it may not always function exactly as expected.
All these pieces get combined into a flow that looks like the above. The result is an email that looks like this recent version of the newsletter.
This is a great time to be building things. The speed and availability of technology make it possible for everyone to be a builder. Incredibly powerful tools are available to anyone willing to put in the effort. They let someone like me, generate content each day with no effort and almost no cost (less than $1 a month). And while I did this for product management, it could be done for any topic. It would just take a few changes to the sources and prompts. So take a chance on AI and have it build a newsletter for you. If you want to learn more about how I built the entire flow and build your newsletter check out GitHub.