🐳 A brief overview of DeepSeek 💭

Game Changer or a Risk?

what is open source?

Open source, put very simply, means you can access and view the code that is used to run an application. It typically means users can view, modify, and distribute the code according to the license. Circumventing the license gets you into ethical and legal considerations.

There’s a varying degree of open source licenses too, from what you’re agreeing to do with the software to whether you can take the code and use it to power another application that you’re charging money for.

open source models

Open source machine learning models are ones where you can see the data used to build them, download them to use ( in the manner agreed to in the open source license, like MIT ), and see how the models were built.

Some models limit whether you’re able to plug them into a commercial application, and when you use a platform like HuggingFace to download them, you have to agree to terms before taking them. Others, like Meta’s Llama, require you to be granted access– even if they are open source in the sense that their research, dataset, and means to build are available, you’re not able to use them however you’d like, or get them with no gate.

Something that's slippery with the definition of open source, particularly in the context of GenAI models, is being able to see the full source code and datasets used to build them. Timnit Gebru rightfully points out in a recent post that the vast majority of these models don't make the dataset they're trained on publicly available, nor the code. I think valid speculation is that this is due to them using a non-trivial amount of data that is copyrighted or scraped in an unethical way.

what about DeepSeek’s models

DeepSeek is an open source model, by the standard these open source models have come to be judged by. That being: they’ve released the research, and the models are free to download ( so long as you provide your contact information on HuggingFace to get access ).

( Like all of these open source models ) there are restrictions as to how you use it, and in what way. You can use it in a commercial application so long as you agree to their terms, which honestly are very broad and easy to comply with.

In short: document any changes you make to the model, make it known you’ve modified it, and be up front about what model may be powering your app.
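In practice that can be as simple as a notice file in your repo ( a hypothetical example– check the actual license text for what it requires ):

```
NOTICE
This application is powered by a modified version of a DeepSeek model,
used under the terms of its model license.
Modifications: fine-tuned on our internal dataset; system prompt changed.
```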

what they did is impressive | Necessity is the mother of invention

The whole crowd in the AI field is moving in one direction: when you have massive racks of GPUs and a fleet of people available to you from venture capital injections, the way things are being done seems like the way they need to be done.

Driven by many factors, namely a limitation on the availability of GPUs from sanctions, DeepSeek was able to train differently, with, in short, a more efficient means of training. They trained on less state-of-the-art GPUs ( Nvidia H800s vs. the standard Nvidia H100 ), but something being missed here is that they still trained on 2,048 GPUs. That puts into scale how many GPUs are being used to train some of these other models: tens of thousands of H100s. Just think for a second about the power draw there.
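To make that power draw concrete, here’s a back-of-the-envelope calculation, assuming roughly 700 W of board power per H100-class card ( an assumption, and compute only– it ignores cooling, networking, and the rest of the data center ):

```python
# Rough cluster power draw: GPUs only, assumed ~700 W per card.
WATTS_PER_GPU = 700  # assumed H100-class board power

def cluster_megawatts(num_gpus: int, watts_per_gpu: int = WATTS_PER_GPU) -> float:
    """Convert a GPU count into an approximate megawatt figure."""
    return num_gpus * watts_per_gpu / 1_000_000

deepseek_scale = cluster_megawatts(2_048)    # DeepSeek's reported cluster
big_lab_scale = cluster_megawatts(20_000)    # "tens of thousands" of H100s
print(f"{deepseek_scale:.1f} MW vs {big_lab_scale:.1f} MW")
```

Roughly 1.4 MW against 14 MW– an order of magnitude apart before you even count the supporting infrastructure, which is why electricity keeps coming up.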

It’ll make a whole lot more sense why Mark Zuckerberg or Sam Altman are talking about electricity as the limitation in model training, and about a desire to build power plants dedicated to a data center. That is, if they see the way they’re doing it as the only way– something DeepSeek has proven isn’t the case.

That’s the case in anything, and something I find myself having to mention to clients: there are many paths to a destination, but as humans, when a group of us catches a drift we forget that and move together thinking it’s the only way.

Some of this comes down to training efficiency, which depends on more than just having the right type or number of GPUs. Factors like model architecture, data quality, training duration, and optimizations play a significant role in training costs and performance. I’m painting with a broad brush, but trying to clear some of that noise.

Which brings up another way they did things differently. In their training runs, they didn’t have a human come into the loop to label data or provide context to the model as it iterates over it. Coming in with labels on the data, so the model can better identify what the data is as it’s training, or Supervised Fine-Tuning as it’s called, is considered the best practice ( at the moment ).

What they did instead is something similar to what AlphaGo Zero did to learn Go by playing itself over and over again, called Reinforcement Learning. Doing this for LLM development is somewhat novel; it’s been done before, but not to the effect this model is able to accomplish with it.
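The gist of removing the human labeler is that the reward can be computed by rules instead of people. This is a toy sketch of that idea ( my simplification, not DeepSeek’s actual reward code ): score a completion on whether it follows the expected format and whether its final answer matches a known-correct one.

```python
# Toy rule-based reward: no human labeler, just programmatic checks.
# (Illustrative sketch; tag names and weights are made up.)
import re

def reward(completion: str, expected_answer: str) -> float:
    score = 0.0
    # Format reward: did the model wrap its reasoning in <think> tags?
    if re.search(r"<think>.*</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: does the final answer match the known-correct one?
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == expected_answer:
        score += 1.0
    return score

good = "<think>2+2 is 4</think><answer>4</answer>"
bad = "The answer is 4."
print(reward(good, "4"), reward(bad, "4"))  # 1.5 0.0
```

Because both checks are automatic, the training loop can grade millions of attempts without a person ever touching the data– which is what makes the approach scale.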

Between the hardware they trained on and the output performance they got, being able to do it with fewer resources is something to applaud. ⭐️

now what about their API

DeepSeek, like Mistral, is open source but also provides an API that you can call if you don’t have your own hardware, you want to build this inside an existing app, or it just makes sense for you.

This is where some of the criticisms of DeepSeek start to unfold, and where I think they’re valid ( though they should also extend further than just a company owned in an adversarial nation ).

If you’re going to send data through their API, you should be very careful as to what that data is. If it’s proprietary information about your company, industry, or someone’s personal information, you should refrain from sending it.

Not just because they’re a Chinese company with requirements to the CCP as to how data is shared ( which they do have ), but because your data is valuable, you have agreements with your customers as to what you do with their data, to say nothing of government requirements where you’re operating.
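One habit worth building, whichever provider you use, is scrubbing obvious personal data from a prompt before it leaves your infrastructure. A minimal sketch– real redaction needs far more than two regexes, this just illustrates the reflex:

```python
# Minimal PII scrub before sending a prompt to any third-party API.
# (Illustrative only; production redaction needs a real PII pipeline.)
import re

def scrub(text: str) -> str:
    # Replace email addresses with a placeholder.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    # Replace US-style phone numbers with a placeholder.
    text = re.sub(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b", "[PHONE]", text)
    return text

prompt = "Follow up with jane@acme.com at 555-867-5309 about the renewal."
print(scrub(prompt))
# Follow up with [EMAIL] at [PHONE] about the renewal.
```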

Though, this isn’t any different from the advice I give clients and strangers on LinkedIn about any other platform. The nuance as to why DeepSeek is different is something anyone who’s been operating within China would know already, or could speak to at greater length, but I would caution any company against handing data over to OpenAI or Anthropic or anyone else as well.

You may find a super nifty solution for your business vertical by doing so that you can model out super quickly with an API, but you also may be handing them a use case to build themselves, not to mention data they can use to train.

There are ways you can avoid this with OpenAI by going through Azure to encrypt your requests, and other agreements you can have with providers like AWS. But as the CEO of HuggingFace has ( rightfully ) been touting, you should host your own models: it’s cheaper, you’ll have more control over the quality ( and over breakage– I can tell you from implementing OpenAI that they change their models more than they admit ), and you can swap in better tech as it comes available, on your own terms.

Like a new model that comes out which blows the existing models out of the water, something that is going to keep happening for the foreseeable future.

Should you use DeepSeek?

One point that should be considered heavily is censorship. As more people use it, this will become more widely known than it is now, but similar to what’s become known with the Falcon models, if you ask DeepSeek questions like whether Taiwan is a country, or any number of items the CCP would like to spin, it will give you the answer they want it to give. To a Western audience, that will likely not match up with what you’d expect.

Only a few days into this, people are already noticing the difference between the API / chatbot interface on the web and the model itself. Turns out, to no surprise, the model DeepSeek hosts is more heavily censored than if you downloaded it and used it locally. For many tasks, I’d imagine that wouldn’t be an issue.

From the time I spent studying in Taiwan, I’m more versed than most in my field in the complexities of China, Chinese companies, and how the CCP operates. With that as context, I still think their models could work in many instances, and I wouldn’t advocate avoiding them altogether.

What they’ve built is very impressive and is going to send shockwaves through research and development in the field that are exciting. If it makes sense for you, I’d give the same thoughts as for using a Llama model from Meta: host the model yourself and use it within the license they provide. Fine-tune it if it makes sense, but be cautious using an API, and consider that predicting how a Chinese company operates, especially in the digital domain, is going to be tricky.

If you have any questions on how to build anything with these LLMs, other ML or really anything, feel free to shoot me a message via email or LinkedIn.