
Building internal AI tools with Streamlit

Published on August 03, 2024

Most companies have a ton of valuable data internally. This could be analytics data on your customers’ interactions with your product, or an audit log of actions taken within the product (which is also a handy way to see when different features are enabled).

Even if you are a small startup, you likely have valuable data in the form of support tickets, which can show you the areas of the product that need the most attention. You also likely have feature requests scattered throughout those support tickets.
Pre-LLMs, trying to extract insights from any of this data required specialized knowledge. You often needed to train your own model, which meant feature engineering & NLP, choosing a model, and most onerous of all… gathering your own training data.
Nowadays, you can just write a prompt like:
Categorize the following ticket using these categories: Uptime, Security, Bug, Feature Request, Other
{put ticket here}
And voila, you have a classifier and it’s probably decent without much tuning (although you absolutely should modify it).

What we’re building today

Users should be able to:
  • Log in, so we know who they are and what data they have access to
  • Write a prompt. In this case, it’s for a ticket classification system.
  • Test the prompt on some sample data and see the output (including errors).
  • Save the prompt for others to use.

Step 1: Loading and visualizing our data

What good is a data application without data? To get a sense for what we can do, let’s start by hard coding some data:
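Something like this works as a starting point (the sample tickets and column names here are just placeholders):

```python
import pandas as pd
import streamlit as st

# Placeholder tickets, hard coded for now; real data comes from a database later
sample_tickets = pd.DataFrame(
    [
        {"id": 1, "subject": "Site is down", "body": "We're seeing 503s on every request."},
        {"id": 2, "subject": "Export to CSV", "body": "Would love a way to export reports."},
        {"id": 3, "subject": "Login loop", "body": "After entering my password, the page just reloads."},
    ]
)

st.title("Ticket classifier")
st.dataframe(sample_tickets)
```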
Streamlit does have built-in support for a number of data sources. For example, if we wanted to connect to Postgres, we’d first tell Streamlit how to connect to our Postgres database via the .streamlit/secrets.toml file. We’d then install a Postgres driver such as psycopg2-binary, along with sqlalchemy (both are needed for st.connection to talk to Postgres). Finally, we can update our data-loading function:
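Here’s a sketch of what that might look like, with the table and column names as placeholders:

```python
import streamlit as st

# .streamlit/secrets.toml holds the credentials, e.g. a [connections.postgresql]
# section with dialect, host, port, database, username, and password.

def load_data():
    # st.connection reads the credentials from .streamlit/secrets.toml
    conn = st.connection("postgresql", type="sql")
    # conn.query caches results for the given TTL; table/columns are placeholders
    return conn.query("SELECT id, subject, body FROM tickets;", ttl=600)

df = load_data()
st.dataframe(df)
```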
Similarly, we can connect to Snowflake or even a Google sheet. It’ll always end up as a dataframe that we can easily visualize.
A brief note on caching
The query() call on a Streamlit connection has a built-in caching mechanism in the form of a TTL. There are, however, two other options for caching: st.cache_resource and st.cache_data.
st.cache_resource is commonly used for caching connections, so you can use it for caching the database connection or for the OpenAI client that we’ll construct later on.
st.cache_data is commonly used for caching the results of expensive queries. You can annotate the data-loading function with it, which will speed up subsequent requests to load the data.
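As a rough sketch of how the two decorators split up, reusing the same placeholder table as before:

```python
import streamlit as st
from openai import OpenAI

@st.cache_resource
def get_openai_client():
    # One shared client, reused across reruns and sessions
    return OpenAI(api_key=st.secrets["OPENAI_API_KEY"])

@st.cache_data(ttl=600)
def load_data():
    # An alternative to passing ttl to conn.query: cache the whole function's result
    conn = st.connection("postgresql", type="sql")
    return conn.query("SELECT id, subject, body FROM tickets;")
```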

Step 2: Running our data through the prompt

We started by taking in a prompt from the user. Then we loaded the data. Now it’s time to execute the prompt on that data.
For us, we’re going to ask our users to make sure their prompt outputs valid JSON, for example an object with a single category field.
We can then do a fairly simple transformation of our dataframe that adds three columns: the parsed category, the raw model response, and an error (in the event that something went wrong).
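The exact schema and transformation aren’t reproduced here, but a sketch of the idea, assuming the prompt returns an object with a single category field and using hypothetical column names (category, raw_response, error) plus an assumed model choice, might look like this:

```python
import json
import streamlit as st
from openai import OpenAI

@st.cache_resource
def get_openai_client():
    return OpenAI(api_key=st.secrets["OPENAI_API_KEY"])

@st.cache_data(ttl=3600)
def classify_ticket(prompt_template: str, ticket_body: str) -> dict:
    # Cached per (prompt, ticket) pair, so small edits elsewhere don't re-bill us
    client = get_openai_client()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[{"role": "user", "content": f"{prompt_template}\n\n{ticket_body}"}],
    )
    raw = response.choices[0].message.content
    try:
        return {"category": json.loads(raw)["category"], "raw_response": raw, "error": None}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"category": None, "raw_response": raw, "error": str(exc)}

def classify_all(df, prompt_template):
    # "body" is the placeholder ticket-text column from the earlier sketches
    results = df["body"].apply(lambda body: classify_ticket(prompt_template, body))
    return df.assign(
        category=results.map(lambda r: r["category"]),
        raw_response=results.map(lambda r: r["raw_response"]),
        error=results.map(lambda r: r["error"]),
    )
```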
Easy enough! The big question now is… how frequently do we want this to run? We’ve cached the most expensive part of this (both time and money-wise), which is the call to OpenAI.
But, we want to be careful to not run this every time someone makes a small change to the prompt. The easiest fix? Let’s just add a button to trigger the re-running of the prompt on the data.
Streamlit makes this really easy: st.button will return True when the button is clicked, and only then do we modify the dataframe to add the new columns. A user who has never clicked the button will see just the unclassified sample data for inspiration.
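Wiring that up might look something like this, reusing the load_data and classify_all sketches from above (the button label and default prompt are placeholders):

```python
import streamlit as st

prompt_template = st.text_area(
    "Classification prompt",
    value="Categorize the following ticket using these categories: "
          "Uptime, Security, Bug, Feature Request, Other",
)

df = load_data()  # from Step 1

if st.button("Run prompt on sample data"):
    # Only classify when explicitly asked; individual OpenAI calls are cached
    df = classify_all(df, prompt_template)

st.dataframe(df)
```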
And that’s… most of it!
A user can open up our app, view some sample data, write a prompt, and see the results of running that prompt on the sample data. The only thing that’s a bit scary is that we haven’t added any authentication: any user can interact with our data, and we don’t know who wrote which prompts.

Step 3: Adding authentication

One quirk of Streamlit today is that authentication is difficult. The out-of-the-box options aren’t great for a company building an internal app, where you might want sign-in with Okta or Azure via SSO/SAML, or you might want to require 2FA before anyone can use the application.
PropelAuth is a great fit here as PropelAuth provides full authentication UIs that you can use directly with Streamlit.
Looking to set up a really basic application with just email and password login? Or an advanced application that requires users to log in via SAML? The integration code is the same in both cases.
For a full guide on how to get started, check out our documentation here. The gist is that we will create a small helper file which exports an auth object.
At the top of our script, we just need to make sure we can load the current user, or stop the rest of the script from running. We can then use the user’s ID in any of our queries to make sure they only see data they have access to.
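As a rough sketch, assuming a hypothetical auth.py helper that wraps PropelAuth’s Python SDK and exposes a get_user() function (the module name and helper are stand-ins for what the docs have you generate, not PropelAuth’s documented API):

```python
import streamlit as st
# Hypothetical local module wrapping PropelAuth's Python SDK; the module name
# and its get_user() helper are assumptions for this sketch.
from auth import auth

user = auth.get_user()
if user is None:
    st.error("Please log in to continue.")
    st.stop()  # nothing below this line runs for unauthenticated users

# From here on, scope queries to the logged-in user
# (user_id is an assumed attribute on the user object)
conn = st.connection("postgresql", type="sql")
tickets = conn.query(
    "SELECT id, subject, body FROM tickets WHERE owner_id = :uid;",
    params={"uid": user.user_id},
)
```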

Step 4: Saving the prompt

As an optional last step — let’s say we wanted to provide a way for users to save their prompts. We can re-use basically everything we’ve learned so far and make this really easy:
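Here’s a sketch of that, assuming a prompts table keyed by the user’s ID (the table and column names are invented for illustration):

```python
import streamlit as st
from sqlalchemy import text

# prompt_template and user come from the earlier sketches
if st.button("Save prompt"):
    conn = st.connection("postgresql", type="sql")
    # conn.session gives a SQLAlchemy session for writes (conn.query is meant for reads)
    with conn.session as session:
        session.execute(
            text("INSERT INTO prompts (user_id, prompt) VALUES (:uid, :prompt);"),
            {"uid": user.user_id, "prompt": prompt_template},
        )
        session.commit()
    st.success("Prompt saved!")
```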

Summary

In this guide, we’ve explored how to build powerful internal AI tools using Streamlit. We’ve covered loading and visualizing data, running prompts on that data, adding authentication with PropelAuth, and even saving user-generated prompts. By leveraging these techniques, you can create robust, secure, and interactive data applications that harness the power of AI for your organization’s specific needs.
Tags:
#Python
