John R. Schuller

what exactly are models?

03 July, 2023 - 7 min read

We hear a lot of talk about AI models these days.

Scroll through any social media feed and the concept is thrown around so much that it's easy to let it enter your vernacular without ever really grasping what it is.

What I'm going to do here, then, is write as short and simple a post as I can, to give you an easy way of understanding what the hell a model is. That way, the next time you see one mentioned in the wild, you'll have a better basis for understanding what exactly it is and what you should ask about it.

Because as this AI moment progresses, as the world of AI gets larger, models are going to become an ever more present part of daily life.

And you know what? They already are; most of us were just unaware. Until recently they hadn't made it into the mainstream, and it wasn't until GPT that anyone could interact with one directly and intentionally, with just an internet connection and a web browser.

but first, a database

This association could annoy some people in the deep learning community, but I want a layman's definition, one that anybody off the street can understand. So, here goes.

The easiest way to understand a model is as a database.

But what is a database?

A database is an organized collection of information within a framework. For example, a restaurant menu is essentially a database: it organizes food into categories of courses, gives each item a unique ID, and lays it all out in a framework on paper.

A menu serves as a database of what is available at that restaurant, coordinated and categorized in sections, with each item named uniquely so that when you order, say, a mac and cheese, it doesn't take several rounds of communication to know what you want (or end in disappointment when something else gets to the table). You have your appetizers, your main courses, your salads, your drinks, all of that stuff to make it easier to traverse the database.

You can make a digital database on a computer that holds all of that information, and you know what? These days, with point of sale systems, restaurants do.
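To make that a little more concrete, here is a minimal sketch of a menu as a digital database. The item IDs, dish names, and the `items_in` helper are all made up for illustration; a real point of sale system would be far more involved.

```python
# A hypothetical menu stored as a tiny database: every item has a unique ID,
# a name, and the category (menu section) it is filed under.
menu = {
    101: {"name": "chicken tenders", "category": "appetizers"},
    201: {"name": "mac and cheese", "category": "main courses"},
    202: {"name": "cheeseburger", "category": "main courses"},
    301: {"name": "chocolate cake", "category": "desserts"},
}


def items_in(category):
    """Return the names of every item filed under one section of the menu."""
    return sorted(
        item["name"] for item in menu.values() if item["category"] == category
    )


# Look up one item by its unique ID, then list a whole section.
print(menu[201]["name"])
print(items_in("main courses"))
```

The unique ID is what lets you ask for exactly one thing, and the category is what lets you browse a section without reading the whole menu.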

what's it a model of?

Now a model is what you get when a deep learning algorithm is used to iterate, or put more plainly, go over, and over, and over, and over, a set of data in order to find and record associations within the data.

Hang with me because I know I'm already getting very technical, but I'm quickly going to come back to menus and burgers.

After a model is trained, which is what this going over and over is called, someone can put new data into the model and it'll come back with the answer that has the strongest associations, along with a measure of how strong they are, called a confidence score.

Let's keep with our menu example. We make a model that's built up from the menus of several restaurant chains. Maybe when it's trained, it starts finding associations between the ingredients and where they end up on the menu.

There's beef in the appetizers. There's also beef in the main courses, but there's no beef in any of the desserts. Now say you ask it where a new menu item, beef tartare, should be placed.

It could see that it comes with sourdough and is meant to be shared, and tell you it's most confident it's an appetizer. It can also say that it's not very confident in that, because the dish also scores high enough to be a potential main course.

Chicken tenders could show up like this too, being one of those peculiar items that could just as easily start a meal split among folks as feed one person on their lonesome, or, before millennials grew up, sit exclusively on the children's menu.
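If it helps to see the idea in code, here is a toy sketch of "find associations, then report a confidence score". To be clear, this is simple counting and not deep learning, and the training menus, feature names, and the `train` and `confidence` functions are all assumptions I made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (features of a dish, section it was listed in).
training_menus = [
    ({"beef", "shareable", "sourdough"}, "appetizer"),
    ({"beef", "fries"}, "main course"),
    ({"beef", "potatoes"}, "main course"),
    ({"chicken", "shareable"}, "appetizer"),
    ({"chocolate", "cream"}, "dessert"),
    ({"sugar", "cream"}, "dessert"),
]


def train(examples):
    """Record how often each feature shows up in each menu section."""
    associations = defaultdict(Counter)
    for features, section in examples:
        for feature in features:
            associations[feature][section] += 1
    return associations


def confidence(associations, features):
    """Turn the recorded associations into a score per section that sums to 1."""
    votes = Counter()
    for feature in features:
        votes.update(associations.get(feature, {}))
    total = sum(votes.values())
    if total == 0:
        return {}  # nothing in the training data resembles this dish
    return {section: count / total for section, count in votes.items()}


model = train(training_menus)
beef_tartare = {"beef", "sourdough", "shareable"}
print(confidence(model, beef_tartare))
```

With this made-up data, beef tartare lands at roughly two thirds appetizer and one third main course, and dessert never shows up at all because nothing in the training menus connects beef to dessert.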

you only know what you know

This confidence score, this likelihood that a menu item is a cocktail versus something on the kids' menu, comes from the many paths that all of the data points it was trained on could take. It's worth remembering, underlining, and putting in bold: what it presents back is the highest likelihood based on the data it was trained on.

You see, what we tend to forget right now, but what is the ground truth for a model itself, is that the results are entirely dependent on the data it was trained on.

What's the likelihood that something I put in there is going to end up on the appetizer list? What's the likelihood that something I put in there is going to end up in the desserts?

The answer to that from our fictitious model would be completely tied to the data it ran over and over during training, and what was in it. Throw a bunch of miscategorized data in there, and you could end up with spicy pepper dishes in your desserts. Or enchiladas next to chocolate cake.
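Here is a small sketch of that dependence, using the same counting idea rather than a real deep learning model. The two training sets and the `train` and `best_guess` helpers are hypothetical; the only difference between the two runs is how the chili pepper dishes were labeled.

```python
from collections import Counter


def train(examples):
    """Count how often each ingredient was filed under each menu section."""
    counts = {}
    for ingredient, section in examples:
        counts.setdefault(ingredient, Counter())[section] += 1
    return counts


def best_guess(counts, ingredient):
    """Return the section this ingredient was most often filed under."""
    return counts[ingredient].most_common(1)[0][0]


# Same dishes, but the second set has the chili pepper dishes miscategorized.
clean_data = [("chili pepper", "main course")] * 3 + [("chocolate", "dessert")] * 3
bad_data = [("chili pepper", "dessert")] * 3 + [("chocolate", "dessert")] * 3

print(best_guess(train(clean_data), "chili pepper"))  # main course
print(best_guess(train(bad_data), "chili pepper"))    # dessert
```

The code is identical in both runs. Only the data changed, and the answer changed with it.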

visualizing a model

The easiest way to think about how a model is structured is to think about graph paper.

Imagine for a moment you have a stack of graph paper ( or look up to the image on this post ).

The paper has squares that are all lined up on top of each other when the paper is in a stack. Well, just imagine each corner of those squares has a dot on it, and from those dots lines jut out and connect that dot to all the other dots on the paper above it and below it in the stack.

Now, attempting to keep this as simple as I am able to, we're going to say that each dot to dot to dot connection represents a route the data can take, built up from that training run that happened over and over.

Dumplings ending up in the main courses could be a dot to dot connection, with each of the ingredients being points that, when put together, represent the full menu item. Then when you feed in pierogi or empanadas, enough is known about those associations of ingredients to give a good guess that it's another dumpling-like dish and should be a main course.

The larger the data the model is trained on, and the more complex the connections it's asked to make, the larger the model will be.
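For the curious, the stack of graph paper can be sketched in a few lines of code. Each sheet is a list of connection strengths from the dots on one page to the dots on the next. Everything here is an assumption for illustration: the three input features, the two sheets, and especially the numbers, which I picked by hand where real training would learn them.

```python
import math


def layer(inputs, weights):
    """One sheet of paper: every input dot connects to every output dot."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]


def relu(values):
    """Keep positive signals and drop negative ones."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into confidence scores that sum to 1."""
    exps = [math.exp(v - max(values)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-picked, hypothetical connection strengths (training would learn these).
hidden_weights = [
    [1.0, 1.0, 0.0],  # dot that lights up for dough plus a savory filling
    [0.0, 0.0, 1.0],  # dot that lights up for sugar
]
output_weights = [
    [2.0, -1.0],  # routes leading to "main course"
    [-1.0, 2.0],  # routes leading to "dessert"
]


def predict(features):
    """Send one dish through the stack, sheet by sheet."""
    hidden = relu(layer(features, hidden_weights))
    scores = softmax(layer(hidden, output_weights))
    return dict(zip(["main course", "dessert"], scores))


# Features: [has dough, has savory filling, has sugar]
pierogi = [1.0, 1.0, 0.0]
print(predict(pierogi))
```

With these made-up numbers the dumpling-like dish comes out overwhelmingly as a main course, because the route through the "dough plus filling" dot is the strong one.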

And in essence, that's what a model is. It's not a sentient thing that's going to take over the world. It's not Terminator. But if trained properly off of data that already acts like the Terminator, it can be used to mimic the Terminator.

It does this off of assumptions. It can make guesses based on a lot of different things, which is why we have these large language models, LLMs, like GPT, Falcon, and Red Pajama, doing such amazing things.

That's because they're fed massive amounts of curated information and asked to train over that data, breaking down the semantic structure of individual words and the relations they tend to make to other words, building up a massive stack of graph paper for the routes they can take.

In the simplest way I've thought of yet, this is how you can copy and paste a restaurant menu along with a prompt like "can you just choose for me one appetizer, and one main course, for someone who wants to try something classic but a little adventurous", and have GPT provide a detailed response:
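As a very rough sketch of "the relations words tend to make to other words", here is a toy that only counts which word follows which. A real LLM does something far richer than this, and the tiny corpus and function names are made up, but the flavor is the same: associations in, confidence scores out.

```python
from collections import Counter, defaultdict

# A tiny, made-up corpus standing in for the massive text an LLM trains on.
corpus = (
    "crab cake with lemon . lamb chops with garlic . "
    "crab cake with capers . lamb chops with rosemary . "
    "crab salad with lemon ."
)


def train_bigrams(text):
    """Record, for every word, which words followed it and how often."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        follows[current][following] += 1
    return follows


def next_word_confidence(follows, word):
    """Confidence score for each word that has been seen after `word`."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


follows = train_bigrams(corpus)
print(next_word_confidence(follows, "crab"))
```

In this corpus "crab" was followed by "cake" twice and "salad" once, so the toy is about twice as confident in "cake".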

Certainly! Based on your criteria of wanting something classic but a little adventurous, I would recommend the following:

Appetizer: Baked Crab Cake - This appetizer offers a classic dish with a twist. The combination of blue crab, preserved lemon, and rémoulade adds a touch of adventure to the traditional crab cake.

Main Course: Lamb Chops* - For a classic yet adventurous main course, the lamb chops with roasted garlic and rosemary would be a great choice. Lamb chops offer a unique and flavorful dining experience that is both delicious and slightly adventurous.

These choices allow you to explore flavors that are both familiar and intriguing. Enjoy your meal!