REST API calls (including calls to an LLM API) are I/O operations: blocking by default, but they can optionally be made asynchronous and non-blocking.
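To make the blocking/non-blocking distinction concrete, here is a minimal sketch that uses sleeps as stand-ins for network latency. No real API is called: `time.sleep` plays the role of a blocking client and `asyncio.sleep` the role of an async one, and the latency value is made up. Both versions hand three coroutines to `asyncio.gather`, but only the non-blocking one actually overlaps the waits.

```python
import asyncio
import time

LATENCY = 0.2  # simulated network latency per call, in seconds (made-up value)


async def blocking_call():
    # time.sleep blocks the whole thread, so the event loop cannot
    # switch to another coroutine while this one is "waiting".
    time.sleep(LATENCY)
    return "done"


async def non_blocking_call():
    # asyncio.sleep yields control to the event loop while waiting,
    # like an awaited request on an async HTTP client would.
    await asyncio.sleep(LATENCY)
    return "done"


async def timed_gather(make_call, n=3):
    # Run n copies of the call via gather and measure wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(make_call() for _ in range(n)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed_gather(blocking_call))
non_blocking_elapsed = asyncio.run(timed_gather(non_blocking_call))
print(f"blocking:     {blocking_elapsed:.2f}s")      # roughly 3 x LATENCY
print(f"non-blocking: {non_blocking_elapsed:.2f}s")  # roughly 1 x LATENCY
```

This is why the example further down uses `AsyncOpenAI` rather than the synchronous `OpenAI` client: declaring a function with `async def` does not make a blocking call inside it non-blocking.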

LLMs are stateless, and multiple LLM calls in an app are often independent of one another. Not always: you may have a complex flow where one call builds on the output of the previous one, but in many cases the calls are independent.
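A minimal sketch of the two shapes, assuming a hypothetical `fake_llm` coroutine that stands in for a real API call: a dependent step has to `await` the previous result before it can even build its prompt, while independent calls can be started together and awaited as a group.

```python
import asyncio


async def fake_llm(prompt):
    # Hypothetical stand-in for an LLM API call: waits, then echoes the prompt.
    await asyncio.sleep(0.1)
    return f"answer to: {prompt}"


async def main():
    # Dependent flow: the second prompt needs the first answer,
    # so these two calls must run one after the other.
    outline = await fake_llm("Outline an essay on async I/O")
    draft = await fake_llm(f"Write a draft from this outline: {outline}")

    # Independent calls: none needs another's output, so they can run concurrently.
    questions = ["What is a coroutine?", "What is an event loop?", "What is gather?"]
    answers = await asyncio.gather(*(fake_llm(q) for q in questions))
    return draft, answers


draft, answers = asyncio.run(main())
print(draft)
for answer in answers:
    print(answer)
```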

When multiple LLM calls are independent of each other, they can easily be run in parallel using async I/O, so the total time is roughly that of the slowest call rather than the sum of all of them.
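To see the time saving without spending API credits, here is a sketch that replaces the LLM call with a hypothetical `fake_llm_call` that just sleeps for a simulated latency (the latencies are made-up values). Awaiting the calls one at a time costs about the sum of the latencies; handing them all to `asyncio.gather` costs about the longest one.

```python
import asyncio
import time

# Simulated per-call latencies in seconds (made-up values for illustration).
LATENCIES = [0.1, 0.2, 0.3]


async def fake_llm_call(prompt, latency):
    # Hypothetical stand-in for an independent LLM request with a given latency.
    await asyncio.sleep(latency)
    return f"response to {prompt!r}"


async def run_serially():
    # Each call starts only after the previous one has finished.
    return [await fake_llm_call(f"prompt {i}", t) for i, t in enumerate(LATENCIES)]


async def run_concurrently():
    # All calls are started together; gather waits until every one has finished.
    return await asyncio.gather(
        *(fake_llm_call(f"prompt {i}", t) for i, t in enumerate(LATENCIES))
    )


def timed(coro_fn):
    # Run a coroutine function to completion and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_fn())
    return results, time.perf_counter() - start


serial_results, serial_time = timed(run_serially)
concurrent_results, concurrent_time = timed(run_concurrently)
print(f"serial:     {serial_time:.2f}s")      # about sum(LATENCIES), i.e. 0.6s
print(f"concurrent: {concurrent_time:.2f}s")  # about max(LATENCIES), i.e. 0.3s
```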

I already knew this but somehow forgot, so I'm publishing a quick note to make it extra embarrassing for myself if I ever forget it again.

In MkFlashcards, multiple “chunks” of text are processed in independent LLM calls, each generating flashcards from the text in its chunk. An earlier, naive implementation executed these calls serially, so the complete set took longer the more text was submitted. I fixed this in 627c7d41174c28b1ad7342b11f3347ce452ce734 by processing all the chunks in parallel using async I/O. Now the processing time is almost constant for texts of any length (almost, because there’s one single LLM call that takes a larger part of the text).
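The commit itself isn't reproduced here, but the general shape can be sketched. In this minimal sketch, `split_into_chunks` and `generate_flashcards` are hypothetical names (not MkFlashcards' actual code), the chunk size is arbitrary, and the LLM call is replaced by a stub so the example runs offline; with a real client such as `AsyncOpenAI`, the stub's body would be an awaited `chat.completions.create` call.

```python
import asyncio
import time


def split_into_chunks(text, max_words=50):
    # Hypothetical chunker: groups the text into runs of at most max_words words.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


async def generate_flashcards(chunk):
    # Hypothetical stand-in for one LLM call that turns a chunk into flashcards.
    await asyncio.sleep(0.1)  # simulated request latency
    return [f"Q: What does this chunk say? A: {chunk[:30]}..."]


async def process_text(text):
    chunks = split_into_chunks(text)
    # One independent call per chunk, all in flight at once; gather returns
    # the results in chunk order, so they can be flattened directly.
    per_chunk = await asyncio.gather(*(generate_flashcards(c) for c in chunks))
    return [card for cards in per_chunk for card in cards]


# Ten times more text means ten times more calls, but about the same wall-clock time.
timings = {}
for n_words in (200, 2000):
    start = time.perf_counter()
    cards = asyncio.run(process_text("word " * n_words))
    timings[n_words] = (len(cards), time.perf_counter() - start)
    print(f"{n_words} words -> {len(cards)} cards in {timings[n_words][1]:.2f}s")
```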

For a very simple implementation of this pattern, see this example:

import asyncio
import random

from openai import AsyncOpenAI

oai = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

numbers = [str(random.randint(0, 123456789)) for _ in range(7)]


async def explain_number(number):
    print(f'START: Explain the number {number}')  # DEBUG
    # Awaiting the request yields control to the event loop,
    # so the other calls can start while this one is in flight.
    result = await oai.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': f'Explain the number {number}'}],
        temperature=1.0,
        max_tokens=23,
    )
    print(f'END: Explain the number {number}')  # DEBUG
    return result.choices[0].message.content


async def main():
    # Build one coroutine per number; gather runs them concurrently and
    # returns the results in the same order as the input list.
    tasks = [explain_number(number) for number in numbers]
    explanations = await asyncio.gather(*tasks)
    for explanation in explanations:
        print(explanation + '\n---\n')


asyncio.run(main())

##########
# Output #
##########
# START: Explain the number 67124777
# START: Explain the number 116823779
# START: Explain the number 43996485
# START: Explain the number 2485252
# START: Explain the number 20357363
# START: Explain the number 19815246
# START: Explain the number 41277191
# END: Explain the number 67124777
# END: Explain the number 43996485
# END: Explain the number 41277191
# END: Explain the number 2485252
# END: Explain the number 20357363
# END: Explain the number 116823779
# END: Explain the number 19815246
# The number 67,124,777 can be understood in different contexts, such as mathematics, numerology, or
# ---
#
# The number 116823779 is simply a numerical value and can be examined in various mathematical contexts. Here are a
# ---
#
# The number 43996485 can be analyzed in various ways, such as its mathematical properties, significance, or context
# ---
#
# The number 2485252 can be analyzed from various perspectives:
#
# 1. **Numerical Properties**:
# -
# ---
#
# The number 20,357,363 can be analyzed in various ways depending on the context you want to explore.
# ---
#
# The number 19815246 is simply a numerical value, and without specific context, it can represent various things.
# ---
#
# The number 41277191 can be interpreted or analyzed in various ways depending on the context. Here are a few
# ---
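Two things stand out in that output: every START line is printed before any END line, because all seven requests are in flight at once, and the END lines arrive in a different order from the final explanations, because `asyncio.gather` returns results in the order the awaitables were passed in, not the order they finish. A small offline sketch of the same behaviour, with made-up delays standing in for varying response times (`respond` is a hypothetical stand-in for an API call):

```python
import asyncio

finish_order = []  # records the order in which the calls complete


async def respond(name, delay):
    await asyncio.sleep(delay)  # made-up delay standing in for response time
    finish_order.append(name)
    return f"result for {name}"


async def main():
    # "c" is fastest and "a" is slowest, so they finish in reverse order.
    return await asyncio.gather(respond("a", 0.3), respond("b", 0.2), respond("c", 0.1))


results = asyncio.run(main())
print(finish_order)  # ['c', 'b', 'a']
print(results)       # ['result for a', 'result for b', 'result for c']
```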