In Part 1 of this series I outlined the high-level ideas supporting the use of LLMs for transforming product-level carbon accounting methodologies into software specifications. I believe that by creating functional specifications for Product Carbon Footprinting methodologies, solution providers can assist organisations that wish to create reliable primary data through PCFs, and rapidly accelerate decarbonisation action at the lowest possible cost. It is my (and by proxy, ZeroTwentyFifty’s) ambition to scale the use of Product Carbon Footprints to get the planet to Zero by 2050, using granular data access to guide informed embedded-emissions reductions.
Come with me as I break down how to interact with an LLM for Product Carbon Footprint work: defining the outputs we want, how we define success, and how to manage some of the pitfalls of working with LLMs.
ZeroTwentyFifty’s use of LLMs
I have used LLMs extensively whilst building ZeroTwentyFifty. They have provided benefits through code understanding, documentation help, drilling down into the PACT technical specification, and the production of high-quality code. I am a very large proponent of LLMs, both for their helpfulness in the development of software and for their vital role in the wholesale reduction of the trivial issues that are usually the bane of a programmer's working experience.
This is not to say that they don’t have their problems. I have had “conversations” with LLMs that have driven me insane, and anyone who has asked a text-to-image model for there to be no text in the image knows that their selective hearing is akin to a dog’s immediately after you use the word “walkies”.
Overall though, corralling and managing an LLM is a skill like any other, and with practice and a bit of education (of both man and machine) the efficiency, speed and quality benefits of an LLM-enabled development workflow can result in hard-to-believe shifts in the way engineering is performed.
How do I see the process of using an LLM here working?
My initial concept for this article was to show readers a practical method for how to break down a PCF methodology, including all the prompts, source documents and otherwise. However, after I wrote a draft and then had conversations with some people far smarter than I (thanks Martin), it became clear that this piece was going to be a scene-setter for the ways you may want to structure your conversations with an LLM before simply jumping into it all starry-eyed.
Here is a rough guide of that process:
1. Choose the source document
2. Define our outputs
3. Craft a prompt
4. Feed the document to an LLM of our choice
5. Get the outputs
6. Iterate on this process, feeding back into steps 2 and 3 until we get a clean output
I think this makes for a fairly reasonable order of events, and I expect step 6 to occur a few times. As a side note, I suspect that there will be some issues around document loading as well as retrieval to and from the LLM.
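To make that loop a little more concrete, below is a minimal sketch in Python of how I picture it hanging together. Everything in it is an assumption at this stage: call_llm is a hypothetical stand-in for whichever provider we end up choosing, and the “clean output” check is deliberately naive.

```python
# A minimal sketch of the process above. Nothing here is tied to a specific
# vendor; the names and checks are placeholders for illustration only.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real provider call (steps 4-5)."""
    raise NotImplementedError("Swap in your chosen LLM provider here.")

def looks_clean(spec: str) -> bool:
    """Stand-in for the 'clean output' judgement in step 6 (naive structural check)."""
    return "## " in spec and "[source:" in spec

def generate_spec(source_text: str, output_requirements: str, max_rounds: int = 3) -> str:
    spec = ""
    prompt = f"{output_requirements}\n\n---\n\n{source_text}"            # steps 2-3
    for _ in range(max_rounds):                                          # step 6
        spec = call_llm(prompt)                                          # steps 4-5
        if looks_clean(spec):
            return spec
        # Feed back into steps 2 and 3: tighten the instructions and retry.
        prompt = (f"{output_requirements}\n\nThe previous draft was incomplete. "
                  f"Revise it against the source below.\n\n---\n\n{source_text}")
    return spec
```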
Choosing our Product Carbon Footprinting Methodology
Now I am going to once again have to call Martin out, because I had grand and illustrious plans here to be talking about a specific PCF Methodology. I had planned a significant chunk of the article around this decision, based on an assumption that I had. During the course of our most recent conversation I mentioned that I was working on this article, and that it was going to form a large portion of the product roadmap for ZTF and he simply stated that he didn’t believe that my choice reflected reality, and that his conversations with practitioners of GHG accounting had led him to that opinion. So if you wish to thank anyone for this article actually being a series, thank him.
This is all to say that I have scheduled a new piece of writing covering Product Carbon Footprinting Methodologies. Honestly, it’s a bit overdue, so keep your eyes peeled for that. Maybe you want to sign up for the newsletter here?
Defining the outputs that we want from the process
I believe that an easy way to get garbage out is to put garbage in, and whilst this statement could easily be applied to my love affair with McDonald’s cheeseburgers, in this situation I am referring to LLMs. I feel comfortable stating that I am confident in my ability to interact with an LLM, and I have seen first-hand the type of detritus that a model can produce if you’re feeling particularly blasé in your attempts to explain yourself. Much like in your personal relationships, good communication is key to not being misunderstood or offended by what such bad communication inevitably produces. Take this as a professional or personal tip, whichever you’re most needing right now.
So, to alleviate the risk of getting back poor-quality output, let’s define what success looks like for this project. If we gave our LLM all the inputs we have and our minds were blown by its first response, what exactly would that look like?
The functional specification itself
Naturally, the output we want from the LLM is a complete and practical functional specification, with some core requirements:
- Completeness: We don’t want it to miss things that are included in the standard itself, it’d be a pretty terrible outcome to unintentionally omit core elements.
- Assurance: We’d ideally like to not have to check over it with a fine-tooth comb, because at that point we could have just translated from the standard and produced the functional specification ourselves.
  - Side Note: I believe that both this point and the first will not be a one-and-done type arrangement, hence my positioning of step 6 in the original process layout.
- Tone: We’d like it to be engaging, not at all dry. Users need to be able to read and understand it, should they wish to verify against the original.
- Verifiability: We need it to have direct references back to the undoctored source, in order to make it easy to fact check.
A standard format for specifications?
Now I’m going to say something fairly unsettling here, so prepare yourself (this could only come from a warped mind): is there a specification for specifications? And if so, would we even want to use it?
Honestly, I could talk about this a bit more but I don’t think I’d do it justice and being frank, I’d be parroting what I’d read from Spolsky. I’d recommend giving this part a read, from which I have taken what I consider to be the most useful excerpt:
“A spec is a document that you want people to read. In that way, it is no different than an essay in The New Yorker or a college paper. Have you ever heard of a professor passing out templates for students to write their college papers? Have you ever read two good essays that could be fit into a template? Just drop the idea.”
Surely Spolsky’s 4-part series gives a template or format for us to use?
So, first, read one paragraph back, and second, not really. However, he does give a list of tips/rules for us to use when coming up with our own specs, so let’s aggregate those.
- Be Funny
- Write for humans to understand
- Write as simply as possible
- Review and reread several times
- Don’t use templates
Can we rely on our LLM to produce a good quality format?
I honestly do think so, but without meaning to sound like a broken record, it is going to come down to what we give it. So the next most important thing once we have defined our outputs and what we want from the model, is to craft this into a coherent prompt for us to give to the LLM, which we will be doing very soon!
What file type do we want the functional specification to be in?
If the intention of all of this work is for it to be read by people and understood, then we need it to be well formatted, easy to read, and in a format suitable for sharing, web viewing and editing. I believe that the simplest option here is markdown, so we’re going to go with that for the time being.
Crafting a prompt
Coming up with a good-quality prompt is the single most important and effective way to get good-quality output from a model. Having a record of your prompts is also useful because you can, with slight adjustments, iterate towards a very good prompt over time. This process is known as Prompt Engineering, with Wikipedia providing the following definition:
Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative artificial intelligence model.
There is an absolutely excellent guide to Prompt Engineering from Digital Ocean, and as I have done already in this piece, I will probably just refer you to it instead of trying to rehash it. However, I am going to list the best practices at a high level here, so we have a conceptual structure for our work, and I’ll follow the list with a rough sketch of a prompt that puts some of them into practice.
Prompt Engineering Best Practices
- Be as specific as possible.
- Supply the AI with examples.
- Get better answers by providing data.
- Specify your desired output.
- Provide instructions on what to do instead of what not to do.
- Give the model a persona or frame of reference.
- Try chain of thought prompting.
- Split complex tasks into simpler ones.
- Understand the model’s shortcomings.
- Take an experimental approach to prompting.
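To ground those a little, here is a rough illustration of what a first-pass prompt might look like, built as a simple Python template. The wording, the persona, the section format and the [source: x.y.z] citation style are all assumptions of mine rather than anything prescribed by a methodology; treat it as a sketch to iterate on, not a finished prompt.

```python
# A first pass at a prompt that applies several of the practices above:
# a persona, a specific task, a declared output format (markdown) and an
# instruction to cite the source. The wording is a draft, not a final prompt.

PROMPT_TEMPLATE = """You are a software architect writing a functional specification.

Task: translate the Product Carbon Footprint methodology below into a functional
specification for a software implementation.

Requirements:
- Cover every normative requirement in the source document; do not omit core elements.
- Write for human readers: plain language, no template boilerplate.
- Format the output as markdown with one `##` section per methodology chapter.
- After every requirement, cite the section of the source it comes from, e.g. [source: 4.2.1].

Source document:
---
{source_text}
---
"""

def build_prompt(source_text: str) -> str:
    # Fill the template with the extracted text of the methodology document.
    return PROMPT_TEMPLATE.format(source_text=source_text)
```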
Are there any important considerations when crafting our prompt?
Yes, a primary consideration to keep in mind is that our source document is a PDF file, and this will 100% cause some problems. There are countless posts about the difficulty of PDF querying, with many of the major platforms coming to support direct loading of PDF files over the last 12 months.
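If direct PDF loading does prove unreliable, one low-effort fallback is to extract the text ourselves before prompting. A minimal sketch using the pypdf library is below; extraction quality varies a lot depending on how the PDF was produced, so this is an option to test rather than a recommendation.

```python
# Extract plain text from the methodology PDF so it can be pasted into a prompt.
# Requires: pip install pypdf

from pypdf import PdfReader

def pdf_to_text(path: str) -> str:
    reader = PdfReader(path)
    # Join page text with separators so the model can still "see" page boundaries.
    return "\n\n--- page break ---\n\n".join(
        page.extract_text() or "" for page in reader.pages
    )
```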
Choosing an LLM for our use case
The defining element of what we are trying to do here is create a usable output from a pre-existing PDF document, so it stands to reason that we need to pick the right LLM for that. However, if we define our prompt, input and outputs correctly, we should be able to have a fairly transferable set of artefacts, enabling us to move around and try the various different vendors, and see which one we get the best results from.
In light of the tone of this article, however, we are going to be deferring much of this to follow up pieces in the series, otherwise this article will become a behemoth. I intend to dig deeper into:
- PDF loading - How effectively we can work with PDFs without having to do much additional work formatting them into an intermediate format.
- Context windows - How much data we can insert into the model. This is a semi-serious consideration given we have a 150-page PDF, along with a prompt and a need for output, all of which take up space in the context window.
- Tools - The most recent development, one that is rapidly gaining traction: basically interfaces to additional features that models can use contextually.
We are going to be focusing our attention largely on three providers, all among the most popular options. However, as mentioned above, if we get all of our ducks in a row, it won’t really matter later on if we want to expand our search towards other AI platforms. A minimal sketch of what that provider-agnostic setup could look like follows the list.
- ChatGPT
- Google Gemini
- Claude
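To keep those artefacts genuinely transferable, the only thing the rest of the process really needs from a provider is a single “send a prompt, get text back” call. Below is a minimal sketch of that idea; the LLMClient name and generate method are my own placeholders, and the concrete vendor clients are deliberately left out.

```python
# A thin, provider-agnostic seam: the prompt and evaluation code stay identical
# no matter which provider sits behind it.

from typing import Protocol

class LLMClient(Protocol):
    def generate(self, prompt: str) -> str:
        """Send a prompt, return the model's text response."""
        ...

def generate_spec_with(client: LLMClient, prompt: str) -> str:
    # Swapping ChatGPT, Gemini or Claude means swapping the object passed in here,
    # not rewriting the pipeline.
    return client.generate(prompt)
```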
What is done with the outputs of our chat with the LLM?
Once we’ve defined our inputs, outputs and prompt, chosen our model and actually gotten a response back, we will also need to put the results somewhere.
The core requirements I have for this are:
- I want it to be in a friendly and editable format.
- I want it to be easily viewable and accessible by anyone.
- I want it to be easy to host.
With this, what I currently see this looking like is as follows:
- There will be a new area on the zerotwentyfifty.com website, for example “example.com/specs” which hosts the functional specifications. I will then post them as we break down the methodology and progress the design plans.
- There will be a new section in the ZeroTwentyFifty documentation hosted on our Github (where exactly is still up for discussion and debate), and I am honestly wrestling with the idea of building a monorepo for ZTF’s projects to operate from. If it’s good enough for Google, why not for ZeroTwentyFifty?
Iterating and improving on the process
There will be times when we are going to need to track back through our process and touch up our inputs, prompts, requirements and anything else we’ve defined along the way, so taking a moment to consider when this may be the case is probably wise. Largely, I think it will depend on conditions that do not align with the success criteria listed in “Defining the outputs that we want from the process” above. Here are a few more specific examples:
- Completeness: Are there core elements that are missing? Does it use the definitions provided by the document itself?
- Assurance: Are we having to repeatedly check for the same things on follow-up runs? Has it hallucinated information?
- Tone: Has the output got the wrong tone? Too robotic? Too stern? Too casual?
- Verifiability: Does the output fail to identify and insert clear references back to the source material for easy lookup?
From a more anecdotal point of view, backtracking will be required in the case of hallucination, or when the model seems to be displaying a misunderstanding of, or ambiguity in the face of, the requirements. This will be slightly more difficult to pick up on, and I suspect it will require knowledge of the source document; this is where domain expertise comes in.
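As a rough illustration of where automation could help flag the need to backtrack, here is a sketch of the kind of crude checks we might run over the output. Completeness and verifiability are the easiest to approximate mechanically; tone still needs a human read. The section names and the [source: citation convention are illustrative assumptions, not taken from any methodology.

```python
# Crude automated review of a generated spec against two of the success criteria.

REQUIRED_SECTIONS = ["Scope", "Data collection", "Allocation", "Reporting"]  # illustrative only

def review_spec(spec: str) -> list[str]:
    problems = []
    for section in REQUIRED_SECTIONS:                      # Completeness
        if section.lower() not in spec.lower():
            problems.append(f"Missing expected section: {section}")
    if "[source:" not in spec:                             # Verifiability
        problems.append("No references back to the source document")
    return problems
```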
How do we adjust when we are faced with a less than satisfactory response?
Without “hard data” from performing the steps listed in our plan, this will be a finger-in-the-air type of answer right now, but I imagine most editing will occur at the levels of:
- Editing the prompt and clarifying it further, seeking to drive further “compliance” with the best practices listed in our 10 dot points above and further elaborated on in the Digital Ocean article.
- Providing more clarity on our outputs.
- Moving to a different LLM provider.
- Finding better ways to load the PDF.
In my mind this is also the hierarchy for “ease of edit”, with point 1 being the absolute easiest thing to adjust with a relatively small amount of time, and point 4 being the thing I want to do least. In reality, if we get to the point of having to find better ways of loading the PDF, we’re probably going to be writing code from scratch and adjusting a custom LLM to suit our needs, which would be a significant time and resource investment.
The ethics of using LLMs for Climate work
The current conversation around LLMs ranges from blue-sky hype paired with utopian dreams of a future driven by a benevolent AGI, to the stark reality of the energy and resource hunger of training AI models. This leaves us in a twilight zone regarding the use of LLMs for climate work, with recent developments promising Small Modular Reactors (SMRs) to meet the energy supply needs of the large data centres powering the training and serving of popular AI models. These developments have been heralded as both a saviour and a possible test bed for future developments in SMR technology. Simultaneously, the news has been derided as an attempt to simply maintain the status quo, on the belief that it will have no impact on pre-existing demand for energy and therefore no meaningful effect on grid shift, because ever-increasing demand will soak up any clean energy the projects generate.
A personal viewpoint
More recently I’ve attempted to provide more questions or hypotheticals in lieu of more rigid opinions. I think that it’s important to maintain flexibility in this space, as dogmatism in a fast moving environment will inevitably leave you stranded. So I would instead pose my dear readers a few questions regarding the use of LLMs in the pursuit of effective climate change solutions. In many ways it does tap back into the idea of Product Carbon Footprinting, however I don’t have the necessary data to back anything up, only a flimsy version of the dreaded thought experiment, truly the last bastion of a mind bereft.
To double down on my intellectual non-committal, I will use the words “it depends”. It depends on what you’re using the technology for. I believe that if your intention is to scale emissions solutions that will, over time, produce an accretive and significant reduction in global CO2 levels, then you can be justified in using LLMs. However, I also believe that if you asked a person from each industry and sector whether their use of LLMs is justified and a good use of the energy, they’d probably also say yes, so I don’t know how much weight my statement really holds. I also think that whilst LLMs are excellent for helping with domains as they stood six months ago, they are fundamentally less useful for domains at the edge, because of the training data gap. Try asking a model for help with a major programming library change: the hallucinations an LLM is capable of producing are often applaudable in how bad they are. This is but a small example, and stretching it to more complex issues only furthers the argument.
Final Thoughts
In Part 1 of this series I outlined the benefits and high-level ideas supporting the use of LLMs for translating a modern Product Carbon Footprint methodology into a software specification, in order to improve the uptake and reliability of product-level carbon accounting. In this blog I’ve covered how we intend to structure our process for working with an LLM. In our next piece, I will use this process to craft the real inputs that we will use to generate a meaningful functional specification from an LLM of our choice. I hope to educate readers throughout this series by showing how LLMs can be used to advance the aims of product-level carbon accounting, and by translating those advancements into meaningful carbon emissions reductions through more reliable and well-designed software.
If this article has resonated with you, I’d really appreciate you sharing it on whatever platforms you use. Alternatively, you can follow ZeroTwentyFifty or add me on LinkedIn. I release all writing on our free newsletter. You can also book a 30 minute no-obligation call with me.