March 11, 2025

Episode 410: 🧠🧐 Local LLMs: Open-Source AI meets Business Central 🧐🧠

1️⃣ In this insightful episode, the conversation explores the growing world of open-source large language models (LLMs) and their transformative potential when deployed locally. 
2️⃣ Guest Stefano Demiliani joins Kris Ruyeras and Brad Prendergast to break down the technical challenges and rewards of running models like DeepSeek on local hardware, from navigating hefty resource demands to leveraging techniques like quantization and distillation for efficiency. 
3️⃣ The discussion dives into practical business applications: inventory management, autonomous AI agents, forecasting, and even image recognition, all powered by offline models to prioritize data security and cost control.
4️⃣ Listeners will discover how integrating these customizable, secure solutions with tools like Microsoft Dynamics 365 Business Central can streamline operations and unlock new efficiencies. From setup essentials to the collaborative future of AI agents, this episode offers a clear-eyed look at how local AI is reshaping business innovation with privacy, precision, and purpose.

Send us a text

#MSDyn365BC #BusinessCentral #BC #DynamicsCorner

Follow Kris and Brad for more content:
https://matalino.io/bio
https://bprendergast.bio.link/

Chapters

00:24 - Welcome to Dynamics Corner

04:05 - Introducing Local LLMs

11:14 - Meet Your Hosts: Chris and Brad

27:53 - The Benefits of Running LLMs Locally

48:20 - AI Applications with Stefano Demiliani

01:52:55 - Future of Local AI Models

06:40:58 - How Local Models Improve Security

21:43:09 - Real

Transcript

WEBVTT

00:00:00.240 --> 00:00:04.051
Welcome everyone to another exciting episode of Dynamics Corner.

00:00:04.051 --> 00:00:09.029
What are local LLMs? DeepSeek? Phi-4?

00:00:09.029 --> 00:00:10.612
I'm your co-host, Chris.

00:00:11.141 --> 00:00:11.882
And this is Brad.

00:00:11.882 --> 00:00:15.473
This episode was recorded on February 20th 2025.

00:00:15.473 --> 00:00:16.801
Chris, Chris, Chris.

00:00:16.801 --> 00:00:18.966
Local language models.

00:00:18.966 --> 00:00:20.850
Local large language models.

00:00:21.170 --> 00:00:21.632
Is that what that?

00:00:21.672 --> 00:00:21.992
means.

00:00:21.992 --> 00:00:22.653
Yes.

00:00:22.653 --> 00:00:27.532
This was another mind blowing conversation.

00:00:27.532 --> 00:00:48.198
In this conversation, we learned about large language models, running large language models locally, what all of these models are, and how we can communicate with these models with Business Central. With us today, we had the opportunity to learn about many things.

00:00:48.198 --> 00:00:50.281
AI with Stefano Demiliani.

00:00:50.281 --> 00:01:01.133
Good morning Good afternoon.

00:01:01.173 --> 00:01:02.274
Good morning for me.

00:01:02.274 --> 00:01:03.856
Good morning for me Good night.

00:01:04.477 --> 00:01:07.944
Good morning for me, good night, good afternoon for you.

00:01:11.001 --> 00:01:17.340
It feels like nighttime here, but it's early morning, it always feels like nighttime here.

00:01:17.760 --> 00:01:21.248
I always forget the time zone.

00:01:21.248 --> 00:01:22.832
Yes, you are early morning.

00:01:24.320 --> 00:01:32.307
Well, you are six hours ahead of me, okay, and then nine hours ahead of Chris, okay, so perfect.

00:01:32.307 --> 00:01:34.730
So, yeah, it is, it's perfect.

00:01:34.730 --> 00:01:41.189
It's perfect for me because it's not nighttime, it's perfect for you because it's late.

00:01:41.189 --> 00:01:45.859
Yeah, exactly, it's perfect for Chris because it's very early, so it's perfect for everybody.

00:01:45.859 --> 00:02:09.716
Is it normal that I have an uploading, uh, message on top? Yeah, yes, yes, yes, it collects the local audio and video so that we have some high-quality files to put together to make you sound amazing.

00:02:09.716 --> 00:02:10.962
But you already sound amazing.

00:02:11.022 --> 00:02:15.133
No, not too much you are amazing with your podcast.

00:02:17.062 --> 00:02:21.408
Yeah, thank you we're only amazing because of individuals like you.

00:02:21.408 --> 00:02:25.436
And what's the greeting in Italy?

00:02:25.436 --> 00:02:28.247
It's not ciao, it's how do you say?

00:02:28.247 --> 00:02:30.669
You know, usually we'll say good morning.

00:02:31.408 --> 00:02:35.025
Hello, we say ciao or no.

00:02:35.025 --> 00:02:38.680
We usually use ciao, it's the standard.

00:02:38.680 --> 00:02:40.064
Buongiorno.

00:02:41.329 --> 00:02:41.810
Buongiorno.

00:02:42.682 --> 00:02:49.079
Buongiorno is another way, ciao is more informal.

00:02:49.098 --> 00:02:52.885
Ok, and then when you say bye, do you say arrivederci?

00:02:52.925 --> 00:02:58.110
or ciao again? Arrivederci, exactly. You speak.

00:02:58.110 --> 00:03:01.713
Italian perfectly, I'm ready to go to Italy.

00:03:02.650 --> 00:03:03.520
I'm ready to go to Italy.

00:03:03.520 --> 00:03:05.219
Still haven't made it over to Europe.

00:03:06.651 --> 00:03:21.830
It's a struggle, but hopefully this year I'll be able to make, I'll be able to make one of the appearances over there, one of the conferences out there. It's always a challenge. One of the next European conferences? Yes, yes, there are several coming up.

00:03:22.060 --> 00:03:25.948
It's a matter of trying to find the one that works out best logistically.

00:03:25.948 --> 00:03:26.890
Yeah, I agree.

00:03:27.360 --> 00:03:39.051
It's not always easy to balance every event that's out there, so balancing events, work, family and so on is not easy.

00:03:40.141 --> 00:04:04.186
No, it's not easy. In Europe, we spoke about this before, I think casually: Europe is like the United States in the sense that, oh, excuse me, the United States in itself is like Europe, where you have the United States as a large continent or a large country, excuse me, and it has many states, like the countries in Europe. Now, probably one more if you join Canada.

00:04:05.540 --> 00:04:06.806
Don't even get me started on that, I don't want Canada.

00:04:06.806 --> 00:04:08.877
If you join Canada, don't even get me started on that, I don't want Canada.

00:04:08.877 --> 00:04:09.319
They can keep Canada.

00:04:09.319 --> 00:04:13.050
Let's give Canada to somebody else.

00:04:13.050 --> 00:04:22.629
But we travel in the United States across states, like Europeans travel across countries.

00:04:24.403 --> 00:04:27.891
So when there's European conferences it's a little bit easier for you to move around.

00:04:27.891 --> 00:04:28.701
I understand.

00:04:28.701 --> 00:04:40.812
Also, for you to come over to the United States it's a little difficult because you understand, in essence it's a day of travel somewhere and then you have to attend a conference or do something, then a day of travel back.

00:04:40.812 --> 00:04:49.517
So you don't usually do something like that without trying to get additional time to do yeah it's easier for you though because you're east coast.

00:04:49.557 --> 00:04:52.826
If you're flying East Coast to Europe, it's a much shorter flight. Like for me.

00:04:52.846 --> 00:05:04.230
I have to cross the country, yeah, and then go the other way. I remember when I was in the US, uh, some years ago, moving from Los Angeles to New York.

00:05:04.230 --> 00:05:11.098
It was about, if I remember, four or five hours of flight, something like that?

00:05:11.178 --> 00:05:13.382
Yeah, some of the flights, like you said.

00:05:13.382 --> 00:05:19.788
Yeah, this is about five to six hours, depending on where on the East Coast that you go, so that is just itself going one side to the other.

00:05:19.788 --> 00:05:23.303
It's a little challenging, chris, it becomes.

00:05:23.303 --> 00:05:24.709
Which airport do you go to?

00:05:24.709 --> 00:05:32.394
Yeah, and Europe is fortunate that they have a great rail system because you can go from country to country easily.

00:05:32.394 --> 00:05:32.882
And.

00:05:32.922 --> 00:05:34.946
I often forget that, so I see some of these events.

00:05:35.581 --> 00:05:41.252
I was talking with someone they said they were recommending if I wanted to go to one of the events, we'll fly to this airport.

00:05:41.252 --> 00:05:43.529
You could probably get a direct flight.

00:05:43.529 --> 00:05:57.050
Then you can take a train easily for a few hours to get to the destination, which was much shorter, when I looked at it, than flying oh yeah, for sure, to an airport and having the connections yeah, they do have good.

00:05:57.451 --> 00:05:58.980
You do have good transportation.

00:05:58.980 --> 00:06:03.709
Ours is like a greyhound bus, but that takes like forever to get around.

00:06:05.411 --> 00:06:08.456
I wish I do wish we had a better transit system.

00:06:08.456 --> 00:06:10.687
Some of the cities have great transit systems.

00:06:10.687 --> 00:06:15.773
Boston has a subway and they have some rail, exactly.

00:06:15.773 --> 00:06:21.050
And then New York has it used to be a good system, but now, from my understanding, it's a disaster.

00:06:21.050 --> 00:06:22.533
You avoid it.

00:06:22.533 --> 00:06:40.540
There's ways that you can get around, uh, but if you want to go from Boston to Florida, for example, you can take a train, but the train will take you a day, so it's challenging. It's challenging, but thank you for taking the time to speak with us.

00:06:40.579 --> 00:06:52.321
I've been looking forward to speaking with you about a topic that is interesting to most people these days, even more so, I think, in the development point of view.

00:06:52.321 --> 00:06:55.651
But before we jump into it, can you tell everyone a little bit about yourself?

00:06:57.281 --> 00:06:58.362
A little bit about myself.

00:06:58.362 --> 00:07:00.464
My name is Stefano.

00:07:00.464 --> 00:07:09.326
I'm working mainly in the Business Central area and in the Azure area, so this is the topic that I cover.

00:07:09.326 --> 00:07:09.889
In my company.

00:07:09.889 --> 00:07:14.511
I am responsible for all the development team inside my group.

00:07:14.511 --> 00:07:30.307
My group is called Lodestar and we are quite a large group in Italy, and I have the responsibility of managing the development part of the Business Central area and the Azure area, so serverless applications and so on.

00:07:30.307 --> 00:07:54.872
Recently, as you can imagine, we have also started working on the AI stuff, and so I'm currently also leading a team, at the moment small, but I hope it will grow, that is involved in providing AI solutions to the customers.

00:07:55.973 --> 00:08:00.805
I have a long history in the Business Central area, previously NAV.

00:08:00.805 --> 00:08:09.415
I started in NAV in version 2.1, when it was Navision 2.1.

00:08:09.415 --> 00:08:15.011
Then it was acquired by Microsoft and so on.

00:08:15.011 --> 00:08:21.644
So I follow all the roadmap of this product and now we are here.

00:08:21.644 --> 00:08:22.387
We are in the cloud.

00:08:22.387 --> 00:08:33.091
So there was lots of evolution in the product, lots of steps and there really was.

00:08:33.472 --> 00:08:33.613
I.

00:08:33.613 --> 00:08:56.821
One day we'll have to sit down with a few people that have been working with it as long as you have and just talk about the evolution of the product from where it was, back with the classic client, with the native database that they had, then when they added SQL, then when they added the RoleTailored client, you know, continue through the progression of the evolution of both the product and the language.

00:08:57.361 --> 00:09:01.551
And I said it before, originally they had three versions, if you recall.

00:09:01.551 --> 00:09:06.283
They had the financials version, the distribution version and the manufacturing version.

00:09:06.283 --> 00:09:11.004
So depending on which customer type you were, you would get a specific version of.

00:09:11.245 --> 00:09:19.808
Navision. And that was... Navision has had a lot of evolutions over the years.

00:09:19.808 --> 00:09:40.602
I remember we started with the classic client and the native database, so this was extremely fast, so very, very great on that, with a lot of limitations, probably, when going to big customers. And unfortunately we started.

00:09:40.602 --> 00:10:08.229
My first Navision project in my life was with a very big customer, because we decided to move to Navision all the healthcare systems that we have. Historically, in my company we have a healthcare-dedicated sector, and we had a solution, previously a handmade solution based on an Oracle database.

00:10:08.229 --> 00:10:38.523
Two or three years before the introduction of the euro, we decided to move this solution to Navision with the classic database, because that was the only possibility. This solution had, if I remember, four or five hundred users, and it was a very big solution.

00:10:38.543 --> 00:10:40.005
And then we moved to SQL Server.

00:10:40.005 --> 00:11:05.490
When we moved to SQL Server from classic, there were a lot of problems, conversion of data and something like that, but the solution is still live. And the curious part is that we are in 2025, with Business Central online and so on, but we still have customers today that are using the old Navision, converted to NAV 2009.

00:11:05.490 --> 00:11:11.928
But we have still today live customers and also big customers that are still on the platform.

00:11:11.928 --> 00:11:13.264
We are trying to convince them.

00:11:13.264 --> 00:11:13.907
Wow.

00:11:14.912 --> 00:11:15.318
Is that it?

00:11:15.318 --> 00:11:24.278
I know of a customer as well that's using Nav 2009, and I think they have close to 400 users and they haven't decided to make a move.

00:11:24.480 --> 00:11:25.504
The curious part of that.

00:11:25.504 --> 00:11:41.268
What sometimes makes me crazy is that in my everyday job at the office, maybe during the day, I need to switch from VS Code, the AL language and so on, to opening the classic client and NAV 2009 to fix something or to add something.

00:11:41.268 --> 00:11:46.067
So also today we need to switch between totally different environments.

00:11:46.349 --> 00:12:01.947
Wow, it is interesting to see the difference and, as you had mentioned, you get used to working with AL and VS Code and all the tools that you have within VS Code, all the things that were added, and you go back to 2009,.

00:12:01.947 --> 00:12:08.547
You see what we really had to do to do code, even when they added the separation for functions.

00:12:08.547 --> 00:12:14.447
It was a big deal for me that they had the gray bar where you could separate between the functions, which was an even fine reference.

00:12:14.447 --> 00:12:14.707
It was good.

00:12:14.707 --> 00:12:17.808
Also, I didn't get a chance to speak with you in person.

00:12:17.808 --> 00:12:22.083
I know we've communicated with text and written, but congratulations on the book that you and Duilio put out.

00:12:22.083 --> 00:12:22.926
It's a great book.

00:12:22.926 --> 00:12:23.726
I did pick it up.

00:12:23.807 --> 00:12:24.729
I have it.

00:12:24.729 --> 00:12:26.932
Yeah, we have worked quite a lot on that.

00:12:28.419 --> 00:12:29.649
So we hope that I can only imagine.

00:12:29.649 --> 00:12:30.799
I can only imagine.

00:12:31.360 --> 00:12:34.447
We receive a lot of positive feedback from the community Very useful.

00:12:35.529 --> 00:12:36.191
It's very useful.

00:12:36.191 --> 00:12:36.972
It is on my shelf.

00:12:36.972 --> 00:12:38.224
I have it right behind me.

00:12:38.745 --> 00:12:50.006
Yes, yes. I have it as well. No, so, uh, thank you for doing that and creating that, and congratulations on putting together something so informative, uh, for users.

00:12:50.006 --> 00:12:54.121
But now let's jump into this LLM stuff.

00:12:54.121 --> 00:13:13.130
Yeah, because you have been doing some things that I don't know, if I can say I understand or don't understand, but anytime I see something that you post, you're always doing something new with local language, large language models, but you're also doing a lot locally, exactly.

00:13:13.130 --> 00:13:20.673
I see so you're installing and setting up AI or language models on your computer.

00:13:21.274 --> 00:13:22.684
Yes, your local machine.

00:13:22.804 --> 00:13:23.861
Wow, exactly, computer.

00:13:23.861 --> 00:13:24.563
Yes, your local machine.

00:13:24.563 --> 00:13:34.653
Wow, exactly uh what we have uh, but I think that everyone that is following uh technology information uh today on socials or on internet or something like that.

00:13:34.653 --> 00:13:44.121
You read about AI everywhere. Uh, AI is a topic that is absolutely exploding, and uh, there are.

00:13:44.201 --> 00:13:47.885
I don't think you can go five minutes without hearing it Exactly.

00:13:47.885 --> 00:13:51.746
I really don't, except when you're sleeping, I think, even maybe what you're talking about.

00:13:51.746 --> 00:14:05.950
If you're listening to the news, if you're having a conversation with someone at work, if you're reading something online, I think you can't go five minutes unless you, like you had mentioned, Chris unless you're sleeping or you just are sitting by yourself in the woods somewhere without hearing AI Exactly.

00:14:06.610 --> 00:14:07.773
And I totally agree.

00:14:07.773 --> 00:14:38.192
And the history behind these things that I'm doing today is... I think that the majority of us know the big AI vendors like OpenAI, Microsoft, Google, something like that, and now also Twitter or X, sorry, not Twitter, X, Grok, as they recently released Grok 3, which is extremely powerful.

00:14:38.192 --> 00:14:49.120
So the concept that we embraced some years ago is that we start providing AI solutions by using standard AI models.

00:14:49.120 --> 00:14:58.889
So Azure OpenAI was our first choice, and this was absolutely easy to do.

00:14:58.889 --> 00:15:03.589
Just go on Azure, set up a model, deploy.

00:15:03.589 --> 00:15:08.970
Deploy a model and then you can use your model in Business Central or in different applications you want to use.

00:15:08.970 --> 00:15:12.260
We.

00:15:12.280 --> 00:15:32.572
We had some problems with that in some scenarios, and the problem is that sometimes it is not easy to convince customers that an AI solution is something that can be a winning choice for them.

00:15:32.572 --> 00:15:50.134
So you need to demonstrate something, and some customers also are not so willing to leave their data accessible to the internet, or maybe they have some devices, particular devices.

00:15:50.134 --> 00:16:04.542
We have, for example, scenarios in manufacturing plants where they cannot access the internet or don't want to access the internet for different reasons, or cannot access the browser, for example.

00:16:04.542 --> 00:16:09.208
This was another limitation no browser as the way to interact.

00:16:09.208 --> 00:16:22.005
And so, for that reason, this was one of the reasons that turned on the light for me to start exploring something different.

00:16:22.005 --> 00:16:41.587
And the second reason was that there are a lot of scenarios, at least in my experience, where AI can be useful, but these scenarios don't absolutely need the full power of a giant LLM.

00:16:41.587 --> 00:16:56.591
For example, why do I need to pay for, I don't know, GPT-4o when I only need small stuff, or I only need to do function calling or something like that?

00:16:56.591 --> 00:17:04.913
Sometimes AI for a big company can be costly for probably nothing.

00:17:04.913 --> 00:17:19.534
For a big company it can be costly for probably nothing, and choosing the best-performing LLM does not always give an advantage to the final customer.

00:17:23.644 --> 00:17:38.792
So, with these reasons, I started exploring a new world, that is, the open source LLMs, because it's probably a world that is not so spread everywhere.

00:17:38.792 --> 00:17:56.769
But the AI world is also full of open source LLMs, and these open source LLMs are also provided by big vendors: Microsoft is providing open source LLMs, Google is providing open source LLMs, Meta with Llama, and more.

00:17:56.769 --> 00:18:01.727
So DeepSeek is also provided as an open LLM.

00:18:01.727 --> 00:18:39.867
These LLMs are, in many scenarios, absolutely powerful, can be executed offline, and sometimes can give the customer the same results as using one of the full versions that you have available in OpenAI or Azure OpenAI or X or something like that, giving absolutely the same results but without going to the internet, totally private, and so on.

00:18:39.867 --> 00:18:43.609
So that's why I started exploring this world.

00:18:45.442 --> 00:18:46.887
My mind is full of questions.

00:18:46.887 --> 00:18:57.915
So you're working with open source LLMs to run AI locally, the language models locally, versus running them online.

00:18:57.915 --> 00:18:59.144
I have several questions.

00:18:59.144 --> 00:19:00.902
With that One we'll get to.

00:19:00.902 --> 00:19:03.888
How do you set all that up, but we'll talk about that after.

00:19:03.888 --> 00:19:28.875
How do you determine the differences between the models that you choose to use? And you had mentioned some of the big names that we hear of outside of the open source ones, Microsoft, Google, Meta and now xAI. How do you know which model to use, or what's the difference between the models?

00:19:28.875 --> 00:19:37.146
Because I see, like, GPT-4o, Grok 3, Grok 2, Claude Sonnet 3.5.

00:19:37.146 --> 00:19:44.071
I see all these different language models and how do you know what the difference is between them?

00:19:44.071 --> 00:19:44.500
Or is?

00:19:44.560 --> 00:19:48.332
it just all the same, and it's a different name based upon who creates it.

00:19:48.332 --> 00:19:49.664
Are they created equal?

00:19:50.660 --> 00:19:55.952
No, if I can try to share a screen, if possible, so that we can.

00:19:56.559 --> 00:19:57.859
Yes, that would be wonderful.

00:19:58.221 --> 00:20:02.551
We can talk probably now.

00:20:03.941 --> 00:20:05.585
Very cool Excellent.

00:20:06.086 --> 00:20:06.929
I'm excited about this.

00:20:06.929 --> 00:20:07.932
I'm excited.

00:20:08.000 --> 00:20:10.500
There's some cool stuff on your screen with graphs moving.

00:20:10.599 --> 00:20:11.826
And you're a Mac user.

00:20:15.027 --> 00:20:15.900
But now it's working.

00:20:15.900 --> 00:20:26.406
Sorry for the problem, but I don't know why no one will know, so we can see your screen.

00:20:26.488 --> 00:20:30.520
You have a window open with some graphs and some things moving.

00:20:30.520 --> 00:20:31.142
Yes, what?

00:20:31.241 --> 00:20:40.691
I will start first showing is this, this window.

00:20:40.691 --> 00:20:42.333
So Hugging Face.

00:20:42.333 --> 00:21:26.127
Hugging Face is probably one of the main portals and platforms where open source LLMs are distributed from all the different vendors, and so every vendor that wants to distribute an AI model today in the open source world releases it on Hugging Face, and on Hugging Face you can see, if you click on models, that there are tons of models deployed here.

00:21:26.127 --> 00:21:38.570
Some are models completely open source, models like and not very known models like, as you can see, a lot of names that are not so famous.

00:21:38.570 --> 00:21:59.490
But there are models that instead are extremely famous, and they also have their counterpart that is not open source and is released as a paid service. For example, probably one of the most famous today is DeepSeek.

00:21:59.490 --> 00:22:05.031
DeepSeek is a very powerful model.

00:22:05.031 --> 00:22:29.945
DeepSeek, the full DeepSeek model, is a big model with 671 billion parameters, so it's an extremely large model that, in order to be executed locally, requires more than 400 gigabytes of RAM.

00:22:29.945 --> 00:22:30.607
Wow.

00:22:32.140 --> 00:22:34.906
So you need 400 gig of RAM to run this locally.

00:22:34.906 --> 00:22:35.127
Wow.

00:22:38.522 --> 00:22:40.348
That was one of my questions.

00:22:40.900 --> 00:22:42.286
The hardware requirements.

00:22:42.286 --> 00:22:50.012
Well, you have a large model that is run online, such as DeepSeek and the ones that we had mentioned.

00:22:50.012 --> 00:23:03.594
That was the first question I had is if you want to run these locally, what are the requirements that you have to run them locally, Because I don't know of many people that have a 400 gig of RAM computer?

00:23:10.220 --> 00:23:12.269
people that have a 400 gig of ram computer.

00:23:12.269 --> 00:23:24.088
It's uh, it's something that uh you cannot execute uh in a local, uh local machine, but here for uh open source model, that's uh an important concept to understand.

00:23:24.088 --> 00:23:25.772
That is called quantization.

00:23:25.772 --> 00:23:40.431
So quantization is, in simple terms, is a technique that an LLM vendor can use to reduce the computational and memory cost requirements of a model.

00:23:40.431 --> 00:23:53.919
So, to try to explain that in simple terms, it's like starting from a full-power LLM.

00:23:53.919 --> 00:24:02.529
So an LLM that is provided by the vendor cannot be executed locally, because it requires a data center in order to be executed.

00:24:03.672 --> 00:24:17.326
These models pass through a process that reduces the precision of the model, so it can reduce the floating-point representation required by those models.

00:24:17.326 --> 00:24:34.784
So it's something like compressing that model and creating from that model a smaller model with the same capacity but with less precision.

00:24:34.784 --> 00:24:37.207
That's the idea.

00:24:37.207 --> 00:24:38.768
So you start from a giant.

00:24:38.768 --> 00:24:51.900
You can detach smaller children of that giant with a bit of smaller precision.

00:24:51.900 --> 00:24:58.894
But smaller precision doesn't mean precision in terms of responses or in terms of capacity.

00:24:58.894 --> 00:25:06.133
It's something like reducing the neural network inside that model.
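To make the quantization idea concrete: the memory a model's weights need is roughly the parameter count times the bytes stored per weight, so keeping weights at a lower numeric precision shrinks the footprint. A minimal sketch of that arithmetic, with illustrative figures that are not from the episode:

```python
# Rough illustration: weight memory ~= (number of parameters) x (bytes per parameter),
# so lowering the precision of the stored weights shrinks the model in RAM/on disk.

def weights_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the raw weights in gigabytes."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight  # billions of params x bytes each = GB

for bits in (16, 8, 4):  # fp16, int8, int4 quantization levels
    print(f"32B parameters at {bits}-bit: ~{weights_size_gb(32, bits):.0f} GB of weights")

# fp16 ~64 GB, int8 ~32 GB, int4 ~16 GB -- before runtime overhead such as the
# KV cache and activations, which is why practical RAM estimates run higher.
```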

00:25:07.141 --> 00:25:32.435
So if you can see here, for example, without going into the mathematical concept, because quantization is honestly a mathematical concept, if you can see here, this is the full DeepSeek model, 671 billion parameters.

00:25:32.435 --> 00:25:49.751
These models cannot be executed offline unless you have a cluster of machines, because they require not less than 400 gigabytes of RAM and GPUs in order to be executed.

00:25:49.751 --> 00:26:01.548
So I cannot execute it offline and probably you cannot execute it offline in your machines and probably also many of them, Unless you got a data center there, Brad somewhere.

00:26:02.963 --> 00:26:05.509
It's under my desk.

00:26:05.509 --> 00:26:15.525
This is why these models are provided as services from the cloud, so you can activate a subscription to DeepSeek or deploy DeepSeek today.

00:26:15.525 --> 00:26:16.228
Also on Azure.

00:26:16.228 --> 00:26:18.962
It's available on Azure AI Foundry.

00:26:18.962 --> 00:26:23.367
You can deploy the full DeepSeek and you can use it as a service.

00:26:23.367 --> 00:26:27.778
But here you can use as services.

00:26:27.778 --> 00:26:35.148
But here you can see that the distilled models are also available, and those distilled models are a reduced version of DeepSeek, in this case.

00:26:36.059 --> 00:26:51.199
So, models that have passed through a process called quantization and, in the case of DeepSeek, through a second process called distillation.

00:26:51.199 --> 00:27:03.770
And distillation, as you can see here, is another technique that is used with open source AI.

00:27:03.770 --> 00:27:27.248
So distillation is a machine learning technique that involves transferring knowledge from a large model to a smaller one, in order to create a model that has the same features and knowledge as the big one, but in a smaller size.

00:27:27.248 --> 00:27:32.429
In this case, DeepSeek transferred it to a smaller model.

00:27:32.429 --> 00:27:43.208
So in this case you can see that DeepSeek is providing several distillations of DeepSeek, coming from these models.

00:27:43.208 --> 00:27:46.829
These are the base model that is used to.

00:27:46.829 --> 00:27:54.590
DeepSeek has trained these models in order to have new models with these names. Ah.
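For readers curious what "transferring knowledge" means in practice, the classic distillation recipe trains the small student model to match the large teacher model's softened output probabilities. A minimal sketch of that objective, assuming the standard textbook formulation rather than DeepSeek's actual training pipeline:

```python
# Minimal sketch of the classic knowledge-distillation objective: the small
# "student" is nudged toward the softened output distribution of the large
# "teacher". Illustrative only; not DeepSeek's specific recipe.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's and student's softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12)))

# Toy example: the teacher strongly prefers token 0, the student less so.
teacher = np.array([4.0, 1.0, 0.5])
student = np.array([2.0, 1.5, 1.0])
print(distillation_loss(teacher, student))
```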

00:27:56.222 --> 00:27:57.287
It's a voluntary model.

00:27:59.284 --> 00:28:09.142
So with this process, just to take it back, so in the cloud, they have a model that has billions of parameters, as you had mentioned.

00:28:09.142 --> 00:28:16.746
They go through a distillation process and they reduce it so that it can run locally on a reasonable machine.

00:28:16.746 --> 00:28:24.568
Exactly, you said that the precision is off, is there a difference in the results?

00:28:24.568 --> 00:28:27.086
What's the difference with them?

00:28:27.086 --> 00:28:29.787
Reducing it versus running it in the cloud?

00:28:29.787 --> 00:28:31.767
Is it speed in response?

00:28:31.767 --> 00:28:33.224
Is it accuracy?

00:28:33.224 --> 00:28:35.285
I don't even want to use the word accuracy.

00:28:35.859 --> 00:28:59.866
The main difference that you can experience on some scenarios is probably accuracy, because the full model has obviously more parameters, so accuracy is sometimes at least not always but for some tasks accuracy is probably better.

00:29:00.789 --> 00:29:33.886
If you have followed some of the posts that I have done, I've done, for example, some tests on auto-generating complex JavaScript scripts for creating animations or something like that, and for these tasks, probably, the full model is more accurate. With the distilled model, so the local model, it is a bit less accurate and you need to tune the prompt more in order to have the same result.

00:29:33.886 --> 00:29:59.626
But here, for example, for interaction with Business Central, for example, or for creating agents or something like that, these models are absolutely comparable to the online model, with the advantage that you don't pay anything and that you can deploy them offline with a reasonable amount of RAM.

00:29:59.626 --> 00:30:07.554
It depends of the number of parameters that the model has.

00:30:07.554 --> 00:30:14.394
So this number that appears here is the number of parameters that this model has.

00:30:14.394 --> 00:30:17.829
So, for example, this is 70 billion parameters.

00:30:17.829 --> 00:30:21.490
This is 32 billion parameters.

00:30:21.839 --> 00:30:28.891
This, for example, is the model that I used and I'm still using for my tests with DeepSeek.

00:30:29.520 --> 00:30:31.766
Which model of DeepSeek are you using for your tests?

00:30:32.567 --> 00:30:34.752
32 billion parameters.

00:30:34.752 --> 00:30:41.888
Here's a distillation of DeepSeek using 32 billion parameters, and this works absolutely great.

00:30:44.642 --> 00:30:46.440
But how do you tell which like?

00:30:46.440 --> 00:30:52.009
If you look at the 32 billion parameters, like you're running it clearly on a MacBook.

00:30:52.730 --> 00:30:52.932
Yes.

00:30:53.579 --> 00:30:57.731
And how do you know if your MacBook will handle that?

00:31:03.324 --> 00:31:07.621
To know if the local machine can handle that you can.

00:31:07.621 --> 00:31:23.510
From the number of parameters there's a calculation that gives you a rough estimate of the gigabytes of RAM that you need in order to run the model.

00:31:23.510 --> 00:31:30.992
Very, very rough number is if you multiply this number.

00:31:30.992 --> 00:31:46.346
Multiply this number by 1.5, for example, is usually a large estimate of the number of gigabytes that you need to run.

00:31:47.063 --> 00:31:54.625
So you multiply the number of parameters by 1.5, or which number Exactly this gives you about the number of gigabytes that these?

00:31:54.746 --> 00:32:01.508
1.2, 1.5, 1.5 if you want to stay large.

00:32:01.508 --> 00:32:08.346
This is the number of gigabytes required to efficiently run this model locally.

00:32:08.346 --> 00:32:18.111
So, for example, this requires to have at least 40 gigabytes of RAM to run locally.

00:32:19.419 --> 00:32:21.104
Okay, oh, wow Okay.

00:32:21.224 --> 00:32:27.502
If you have strict requirements or, I don't know, if you have, for example, a local machine with 16 GB of RAM.

00:32:27.502 --> 00:32:32.007
Probably this is the model to use.

00:32:32.047 --> 00:32:41.615
So if you have 32 billion parameters, you multiply that by 1.5, roughly, again 1.2 to 1.5, so that's where you get the 40.

00:32:41.615 --> 00:32:46.607
So it's not 32 billion times 1.5, it's 32.

00:32:46.607 --> 00:32:47.398
So it's a number of billions.

00:32:47.398 --> 00:32:48.403
Okay, to be clear.

00:32:49.339 --> 00:33:00.671
There's a more precise number, so more precise calculation that compares not only the big parameters but also other set of parameters.

00:33:00.671 --> 00:33:15.801
But in my experience, when I have to quickly evaluate if I can use a model online or offline or not, taking into consideration the resources that I have, I use this estimate.

00:33:15.801 --> 00:33:31.759
So, uh, multiplying by one point two, something like that, 1.2, or 1.5 if I want to stay large, tells me whether this model is able to run on my machine or not. 16 gigs, 16, 17 gigs for that one.
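The rule of thumb described here turns into a one-line helper; the 1.2-1.5 multipliers are the rough figures from the conversation, not a vendor specification:

```python
# Rule of thumb from the conversation: billions of parameters x ~1.2-1.5
# gives a rough figure for the gigabytes of RAM needed to run a model locally.
def estimated_ram_gb(params_billions: float, factor: float = 1.5) -> float:
    return params_billions * factor

print(estimated_ram_gb(32, 1.2))   # ~38 GB for the 32B DeepSeek distillation
print(estimated_ram_gb(32, 1.5))   # ~48 GB if you "want to stay large"
print(estimated_ram_gb(14, 1.2))   # ~17 GB, the range mentioned for a smaller distilled model
```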

00:33:31.859 --> 00:33:33.205
It can also be run on iPhones.

00:33:33.205 --> 00:33:38.229
It did take me to a whole different world here.

00:33:38.409 --> 00:33:44.624
So you can run this on the phone, but I just want to take it back up a notch before we get my mind.

00:33:44.624 --> 00:33:47.618
Has this whole list of questions Amazing?

00:33:47.618 --> 00:34:05.173
So we have a large language model that's in the cloud that went through a distillation process to now run locally, where there's different models, or mini models, I guess you could say, or distilled models that have different parameters where you had mentioned.

00:34:05.173 --> 00:34:12.748
In some cases, what you may lose is some accuracy In some cases, not always, not always.

00:34:12.748 --> 00:34:25.333
Now I hear about language models being trained constantly with information on the Internet or trained by different sources, with this being run locally.

00:34:25.333 --> 00:34:30.652
Does it have all of that information and what happens if the model gets updated?

00:34:30.652 --> 00:34:38.992
Is that the whole point of having different model versions, that it has a different set of data, or is it because of a different set of parameters?

00:34:39.012 --> 00:34:42.286
Let's just say we index the internet for a website.

00:34:42.286 --> 00:34:48.990
So let's's just say we index Microsoft Learn today and have a model that's focused on Microsoft Learn.

00:34:48.990 --> 00:34:51.480
They constantly add documents.

00:34:51.480 --> 00:34:59.050
I now have a local copy of DeepSeek that use that Learn source.

00:34:59.050 --> 00:35:01.987
How do I get updated information?

00:35:03.771 --> 00:35:04.132
Exactly.

00:35:04.132 --> 00:35:15.646
The main limitation of the local LLMs is that they are periodically refreshed, so it's local.

00:35:15.646 --> 00:35:28.188
When you have downloaded a local LLM, like, for example, here in my machine, I have this set of local LLMs, some from Microsoft, some from Llama and DeepSeek.

00:35:28.188 --> 00:35:31.905
Let me try to do this.

00:35:31.905 --> 00:36:00.414
Local LLMs are downloaded with the knowledge of when the vendor releases that model, so, for example, their latest update date, and sometimes they respond giving you that date.

00:36:04.460 --> 00:36:06.704
For example, Phi-4 is not a recent model.

00:36:06.704 --> 00:36:18.579
It has knowledge, so it has knowledge, public knowledge of facts, internet facts until this date, probably now.

00:36:18.579 --> 00:36:20.385
I have not updated it yet.

00:36:20.385 --> 00:36:23.603
Probably if I download a new update you can.

00:36:23.603 --> 00:36:28.742
The technology is something like Docker, so you can download the model.

00:36:28.742 --> 00:36:31.422
It creates a local model.

00:36:31.422 --> 00:36:35.501
Then you can pull again in order to see if there are updates of that model.

00:36:35.501 --> 00:36:44.835
So when I used this model this is, for example, one of the most powerful, in my opinion, small language models that can run locally.

00:36:44.835 --> 00:36:46.362
So Phi-4 from Microsoft.

00:36:48.456 --> 00:36:49.199
Which model is that?

00:36:49.199 --> 00:36:51.125
Again, did you say Phi-4?

00:36:52.036 --> 00:36:54.603
Microsoft Phi-4 is good. Microsoft Phi-4?

00:36:54.603 --> 00:36:54.603
.

00:36:54.603 --> 00:36:56.420
Yes, it's this model here.

00:37:00.297 --> 00:37:09.445
It's one of the best, in my opinion, models from Microsoft that can run fully offline.

00:37:09.445 --> 00:37:23.840
So probably the main limitation of open source and local language models is if you intend to use them as a model that knows the internet.

00:37:23.840 --> 00:37:38.614
So this can be probably the scenarios where they can have the main limitations Because they are created and deployed in a particular way.

00:37:38.614 --> 00:37:43.139
They know the knowledge until that particular date, and then you can download updates.

00:37:43.139 --> 00:37:47.324
But honestly, this is absolutely not my scenarios.

00:37:47.324 --> 00:37:55.815
So my scenario is not having a chat GPT offline.

00:37:55.815 --> 00:38:02.583
That works perfectly because I can here it fails only if I know internet facts.

00:38:02.875 --> 00:38:23.018
So if I ask who the USA president is, I don't know if it's able to... So you're saying that when you download these small LLMs to run locally, do they not have access to the internet at all, or can you tell them to have access to the internet?

00:38:24.942 --> 00:38:31.362
Yeah, usually the model that runs offline by default has no access to the internet.

00:38:31.362 --> 00:38:36.695
You can enable access to internet, but as default it has no access to internet.

00:38:36.695 --> 00:38:43.894
So because it's trained with the knowledge when the vendor releases it.

00:38:44.376 --> 00:38:45.940
At that time it was published, got it Exactly.

00:38:45.960 --> 00:38:49.972
So, for example, if I'm asking DeepSeek, when the vendor released it At that time it was published, got it Okay?

00:38:49.972 --> 00:38:53.965
So, for example, yeah, if I ask DeepSeek who the USA president is, it's giving me that.

00:38:53.965 --> 00:38:58.476
As of my last update, Joe Biden is the president, because it's not.

00:38:59.639 --> 00:39:01.422
Right, October 2023 is the last.

00:39:01.443 --> 00:39:24.126
It's not an online model. So the question is, if you want to have a reliable ChatGPT, an offline model can sometimes fail, because you need to be sure that it was updated with the latest data coming from the internet.

00:39:25.074 --> 00:39:27.141
So that's a good point that you make it.

00:39:27.141 --> 00:39:35.436
It's all a matter of, or it is a matter of, how you're going to use, or what you need to use for the model that you're running locally.

00:39:35.436 --> 00:39:40.286
Right, I want to get into this and I hope that you publish it someday.

00:39:40.286 --> 00:39:41.789
How do I install this?

00:39:41.789 --> 00:39:48.224
But can you train it with your own data as well on a local model?

00:39:49.347 --> 00:40:09.856
Yes, so if I had, if I was an organization that had security reasons that had policies for my, my employees, or I had other documents that I wanted to put into the AI so that the members of our team could use the AI to find something simple.

00:40:09.856 --> 00:40:21.695
So we may have a handbook for an employee, handbook that has the policies for taking time off for holidays, where an employee could just type to the model what are the holidays we have?

00:40:23.278 --> 00:40:23.659
Exactly.

00:40:23.659 --> 00:40:29.289
Here is exactly the point where these models are, in my opinion, interesting.

00:40:29.289 --> 00:40:42.079
So I think that these models are not extremely interesting if you just want to have a ChatGPT offline.

00:40:42.079 --> 00:40:45.324
There are scenarios where they are extremely interesting.

00:40:45.324 --> 00:40:58.666
For example, if I need to ask something for coding, they can give me an answer without going to internet, so I can also use it on an airplane or everywhere I want.

00:40:58.666 --> 00:41:05.088
Also, from an iPhone, for example, I can use these models.

00:41:05.088 --> 00:41:33.106
But the second scenario is with company data, and here is where I've spent my last months, and we also have live projects on that using these models, because you can use these models fully locally, without paying anything and without having access to the internet, for doing business stuff.

00:41:33.106 --> 00:41:51.438
So, for example, at least in my case, I don't have customers that ask me to provide an AI solution for going to the internet and asking everything they want, because there are copilots or there's ChatGPT for that.

00:41:51.438 --> 00:41:59.557
All the customers that are asking us for AI solutions want AI solutions that work with their business.

00:41:59.557 --> 00:42:12.646
So they want to have AI solutions that are able to talk with Business Central, AI solutions that are able to talk with their documents or reason with data coming from their corporate data, and something like that.

00:42:12.646 --> 00:42:19.775
So these are the AI solutions that are useful for that customer.

00:42:19.775 --> 00:42:22.164
So business solution, not a general chat.

00:42:22.164 --> 00:42:32.144
So an offline model is great on that because you can use function calling.

00:42:32.144 --> 00:42:48.567
You can use every feature that you have in one of the online models, something like GPT-4o or something like that. So, for example, this model that is very small can also be executed on 60-gigabyte machines.

00:42:48.567 --> 00:42:59.829
It has the same power as GPT-4o in terms of function calling, agent creation and manipulation, something like that.

00:42:59.829 --> 00:43:06.838
And this can work completely offline and I can show you some examples completely offline.
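As a sketch of what offline function calling can look like: many local runtimes (Ollama, for example) expose an OpenAI-compatible chat endpoint, so the tool-calling flow mirrors the cloud API. The endpoint, model name, and the get_inventory tool below are illustrative assumptions, not details from the episode:

```python
# Hedged sketch: function calling against a locally hosted model exposing an
# OpenAI-compatible /v1/chat/completions endpoint (Ollama serves one on port
# 11434 by default). Model name and tool are illustrative only.
import json
import requests

LOCAL_ENDPOINT = "http://localhost:11434/v1/chat/completions"  # assumption

def get_inventory(item_no: str) -> dict:
    # In a real solution this would call the Business Central API; stubbed here.
    return {"item_no": item_no, "quantity_on_hand": 42}

tools = [{
    "type": "function",
    "function": {
        "name": "get_inventory",
        "description": "Get the on-hand inventory for an item number",
        "parameters": {
            "type": "object",
            "properties": {"item_no": {"type": "string"}},
            "required": ["item_no"],
        },
    },
}]

messages = [{"role": "user", "content": "How many pieces of item 1000 do we have?"}]
resp = requests.post(LOCAL_ENDPOINT, json={
    "model": "qwen2.5:14b",   # any locally installed model that supports tool calls
    "messages": messages,
    "tools": tools,
}).json()

# If the model decided to call the tool, run it and send the result back.
tool_calls = resp["choices"][0]["message"].get("tool_calls") or []
if tool_calls:
    call = tool_calls[0]
    args = json.loads(call["function"]["arguments"])
    messages.append(resp["choices"][0]["message"])
    messages.append({"role": "tool", "tool_call_id": call["id"],
                     "content": json.dumps(get_inventory(**args))})
    final = requests.post(LOCAL_ENDPOINT, json={"model": "qwen2.5:14b",
                                                "messages": messages}).json()
    print(final["choices"][0]["message"]["content"])
```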

00:43:06.858 --> 00:43:09.923
And then I can show you some example.

00:43:09.923 --> 00:43:19.577
So very, very stupid example, but just to show you something, let me move this here.

00:43:19.577 --> 00:43:30.409
For example, I don't want to go into much details into the code, so take this only as an example, but let me reduce this.

00:43:30.409 --> 00:43:37.204
So here, for example, I have a very stupid code that uses a local model.

00:43:37.204 --> 00:43:42.407
So this is my local model running in my local.

00:43:42.407 --> 00:43:45.927
It uses DeepSeek, so the version of DeepSeek that I previously mentioned, running in my local environment.

00:43:45.927 --> 00:44:12.782
So, the version of DeepSeek that I mentioned previously, in my local environment. And in this example here, I imagine that I am a Business Central company and I want to have the possibility to pass my data to this model, in order to be able to have an AI solution where I can ask something about my data.

00:44:14.215 --> 00:44:53.306
If you want to do that using online models, for example, staying in the Microsoft family, you need to, for example, deploy, I don't know, GPT-4o for having the LLM, and then you need a vector database and a vector model like text-embedding-ada or something like that, because you need to convert data coming from Business Central into something that the model can understand, and for doing that you also need to have a vector database.

00:44:53.306 --> 00:44:59.601
Microsoft has Azure AI search for that, and this costs a lot.

00:44:59.601 --> 00:45:20.411
So this solution can cost not less than $400 per month minimum to have a full RAG solution working with Business Central data and an online model.

00:45:20.411 --> 00:45:30.175
The same result of this can also be executed totally offline, and this is a very quick, stupid example.

00:45:30.175 --> 00:45:39.201
So here I have my model running locally, so it runs on my machine.

00:45:40.476 --> 00:45:46.074
The model is DeepSeek in this case, but you can use one of the available models.

00:45:46.074 --> 00:45:50.003
I use DeepSeek here in this example because DeepSeek is a reasoning model.

00:45:50.003 --> 00:45:56.367
Now, one of the latest trends in AI is reasoning models.

00:45:56.367 --> 00:46:05.284
Reasoning models are models that, before giving you the final response, performs a long reasoning process.

00:46:05.284 --> 00:46:14.985
They can explain all the steps that they use for reasoning and then they can give you the result.

00:46:14.985 --> 00:46:26.106
And here I also use the embeddings because I want to pass data, and this is, for example, one of the available embedding models open source models.

00:46:26.106 --> 00:46:27.992
I use this because it's the smallest.

00:46:27.992 --> 00:46:29.155
One of the available embedding models open source models.

00:46:29.155 --> 00:46:31.041
I use this because it's the smallest.

00:46:34.094 --> 00:46:35.800
So you have a local language model, DeepSeek, installed.

00:46:35.800 --> 00:46:40.222
You want to train it on your business central data all local so it doesn't go out to the internet.

00:46:40.222 --> 00:46:44.545
So you also now need to create or install another model.

00:46:44.545 --> 00:46:46.677
What was the model?

00:46:46.677 --> 00:46:49.202
You called that To process or to hold your data.

00:46:49.994 --> 00:47:00.523
The model is the embedding model, it is this one. I use this, but you can use a different one. So the embedding model is used to work with your data within the language model that you're using.

00:47:00.635 --> 00:47:01.981
Without going to the old steps.

00:47:01.981 --> 00:47:18.371
Yeah, I'm personally a big fan of this tool, this SDK called Microsoft Semantic Kernel.

00:47:18.371 --> 00:47:42.119
Microsoft Semantic Kernel is an SDK from Microsoft that permits you to create AI solutions that are independent from the model, plus many other features, but one of the main features is that it abstracts away the creation of your AI solution from the model that you use.

00:47:42.119 --> 00:47:55.512
So, with this tool, here I'm creating my service and in this service I'm passing data.

00:47:55.512 --> 00:48:04.268
Here I put stupid data, but imagine that I pass data from the sales coming from business center.

00:48:04.268 --> 00:48:06.952
Yeah, I, I simply passed my data.

00:48:06.972 --> 00:48:12.583
Just just to provide you an example, uh, as a list of data.

00:48:12.583 --> 00:48:30.280
So the concept is that, to the memory of my AI model, I need to pass all the data that it needs to know, and this data can be the content of Business Central tables or a summarization of the Business Central tables, I don't know.

00:48:30.280 --> 00:48:34.746
Here, just to provide a very easy example, I passed a set of data.

00:48:34.746 --> 00:48:50.936
So, for example, the sum of the amount of the sales for a customer in a month, and then, for each customer, the same amount for each of the product categories that I am using.

00:48:50.936 --> 00:49:02.460
So the model now knows that he has a total amount and a total amount for this customer category, uh, this uh item category.

00:49:02.460 --> 00:49:06.606
So here, each data for each customer.

00:49:06.606 --> 00:49:13.141
Imagine that this can be your rough business central table, or what you want.

00:49:13.161 --> 00:49:26.206
So you could pick the data that you want to load the customer table vendor table, customer ledger all of the whichever specific things that you want your model to know Any specific thing that I want that my model knows.

00:49:26.206 --> 00:49:32.288
That's the idea, and then I can start asking DeepSeek.
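A rough Python sketch of the pattern being demonstrated: embed a handful of summarized Business Central figures once, retrieve the most relevant ones for a question, and let the local model answer from them. The episode does this with Microsoft Semantic Kernel in .NET; the endpoints, model names, and sample figures below assume an Ollama-style local runtime and are only illustrative:

```python
# Hedged sketch of local "memory" over summarized Business Central data:
# embed facts once, retrieve by cosine similarity, answer with the local model.
import numpy as np
import requests

EMBED_URL = "http://localhost:11434/api/embeddings"      # local embedding model (assumption)
CHAT_URL = "http://localhost:11434/v1/chat/completions"  # local chat model (assumption)

facts = [
    "Contoso, January 2025, Digital Services sales: 12,000",
    "Contoso, February 2025, Digital Services sales: 9,500",
    "Adatum, January 2025, Digital Services sales: 4,200",
]

def embed(text: str) -> np.ndarray:
    r = requests.post(EMBED_URL, json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Load the data one time": embed every fact once and keep the vectors around.
memory = [(fact, embed(fact)) for fact in facts]

question = "What is the total Digital Services sales amount in 2025?"
q_vec = embed(question)
top = sorted(memory, key=lambda m: cosine(q_vec, m[1]), reverse=True)[:3]
context = "\n".join(fact for fact, _ in top)

answer = requests.post(CHAT_URL, json={
    "model": "deepseek-r1:32b",
    "messages": [
        {"role": "system", "content": "Answer using only the provided figures."},
        {"role": "user", "content": f"{context}\n\n{question}"},
    ],
}).json()
print(answer["choices"][0]["message"]["content"])
```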

00:49:32.288 --> 00:49:34.099
So here, for example, I run this.

00:49:34.099 --> 00:49:49.364
It will not be extremely quick because here I use the biggest model I can use for that also a small distillation.

00:49:49.364 --> 00:49:54.514
So also the 16 billion parameters is okay.

00:49:54.514 --> 00:50:03.601
But here my model has memorized all this data and now DeepSeek is reasoning.

00:50:03.775 --> 00:50:08.226
I don't love too much the reasoning part of DeepSeek, because it's long.

00:50:08.226 --> 00:50:08.835
You can.

00:50:08.835 --> 00:50:11.706
There's also a way to avoid DeepSeek's reasoning.

00:50:11.706 --> 00:50:16.014
But yeah, I've asked in the program.

00:50:16.014 --> 00:50:23.585
I've asked the model to give me the sales amount for digital services in 2025.

00:50:23.585 --> 00:50:35.702
So the models need to go into each customer, retrieve the sales amount for that particular category and do the output.

00:50:35.702 --> 00:50:38.922
And here is, you can see the reasoning.

00:50:38.922 --> 00:50:41.541
So my model is responsible for that.

00:50:41.541 --> 00:50:44.461
First of all, sorry I forgot to mention.

00:50:44.461 --> 00:50:47.242
I opened this for that reason.

00:50:47.242 --> 00:50:51.824
When you run that, I can rerun again.

00:50:51.824 --> 00:50:52.284
You can see.

00:50:52.284 --> 00:51:02.543
When you run a local model, you will see that your GPU is going to the max, because the local model first uses GPU.

00:51:03.867 --> 00:51:05.610
Oh, okay, then memory.

00:51:06.253 --> 00:51:09.402
So I will later relaunch the process.

00:51:09.402 --> 00:51:24.264
You will see that my GPU will go to the top, because the model any LLM uses GPU at max in order to perform reasoning, calculations and so on.

00:51:24.264 --> 00:51:33.164
Then when GPU is not available, it uses RAM and CPU, but first of all it's GPU that is used.

00:51:33.164 --> 00:51:35.695
But now you can see that my model has responded.

00:51:35.695 --> 00:51:39.565
So DeepSeek has done that reasoning.

00:51:39.565 --> 00:51:43.219
Okay, I need to figure out the total sales amount for Jesus.

00:51:43.219 --> 00:51:43.721
Blah, blah, blah.

00:51:43.721 --> 00:51:46.663
It's explaining all this mental reasoning.

00:51:46.663 --> 00:51:51.643
So first look at Contoso and it retrieved that in these two months.

00:51:51.643 --> 00:52:02.202
Contoso has done that for digital services, then Adatum only one month Then Kronos and so on.

00:52:02.202 --> 00:52:04.461
Then it gives you all the explanation.

00:52:04.474 --> 00:52:16.981
Okay, now I need to sum and blah, blah, blah and the total result is this so it basically you can see what it's doing to come up with the number when you loaded the data.

00:52:16.981 --> 00:52:20.083
You only have to load that data one time, correct.

00:52:20.083 --> 00:52:25.083
Yes, one time so you don't have to do it for each query or each question or each prompt Data.

00:52:25.304 --> 00:52:26.565
do it for each query or each question or each prompt you can.

00:52:26.565 --> 00:52:29.389
So if we had a business, we had a business central database.

00:52:29.715 --> 00:52:34.856
we could, in essence, in your example, load the sales Every day.

00:52:34.856 --> 00:52:42.664
We could export or import, however you phrase it the sales information into our language model.

00:52:42.664 --> 00:52:46.034
So now it has up-to-date sales data.

00:52:46.034 --> 00:52:51.101
So anytime we run this it will have the most accurate information.

00:52:51.101 --> 00:52:53.427
Exactly, oh, that's excellent.

00:52:53.755 --> 00:52:59.820
And as a data store, so data store for these embeddings.

00:52:59.820 --> 00:53:02.983
Now you can have different type of embedding.

00:53:02.983 --> 00:53:29.782
For example, Microsoft has now released support for embeddings also in SQL Server or Azure SQL, and Azure SQL is absolutely a good choice in terms of money if you also want to use the online version, because having embeddings in Azure AI Search is very costly.

00:53:29.782 --> 00:53:33.824
While Azure SQL is absolutely cheaper than that.

00:53:33.824 --> 00:53:49.925
But here, just to show that here is I have asked a question to my LLM running locally about a set of data that I have done and he has done reasoning and he has provided me a result.

00:53:49.925 --> 00:54:02.724
So this can be useful if you want, for example, to have a service that is able to analyze your business central data and gives you the query according to the user question.

00:54:04.717 --> 00:54:05.822
I can't wait to play with this.

00:54:05.822 --> 00:54:14.342
I'm calling you later and we're going to set this up on my machine, but once you have Just to show what I forgot to mention before.

00:54:15.675 --> 00:54:18.545
If I do that again you will see.

00:54:18.545 --> 00:54:53.164
Yeah, so during the process of reasoning, so during the process of reasoning of your local machine is increasing.

00:54:53.164 --> 00:54:53.728
So imagine a data center.

00:54:53.728 --> 00:54:54.230
What happens?

00:54:54.230 --> 00:55:00.065
So data center, I read, the latest data center in US.

00:55:00.065 --> 00:55:06.998
The consumption of energy in US data center, if I remember, is consumes the.

00:55:06.998 --> 00:55:12.541
The consumption of energy in the US data center, if I remember, consumes 13% of the energy in the US.

00:55:12.541 --> 00:55:15.663
All the power that we have in the data center.

00:55:15.663 --> 00:55:16.476
So what?

00:55:16.476 --> 00:55:16.978
The main?

00:55:22.590 --> 00:55:31.945
What is that that you're running that shows the graph of the usage, the GPU? The tool that shows the graph of the GPU usage? Yeah, it's called MacTop.

00:55:34.641 --> 00:55:35.184
It's this tool.

00:55:35.184 --> 00:55:39.559
Let me open it, browse.

00:55:39.559 --> 00:55:41.746
I use this.

00:55:41.746 --> 00:55:43.059
There are different.

00:55:43.059 --> 00:55:44.143
I use this.

00:55:44.143 --> 00:55:46.297
It's an open source.

00:55:46.297 --> 00:55:55.021
You can simply this resource monitoring for Mac.

00:55:55.021 --> 00:55:55.666
Yes, it's a resource monitoring.

00:55:55.666 --> 00:55:56.474
It's quite useful.

00:55:56.534 --> 00:55:59.322
So you're using an open source resource monitoring tool for a Mac.

00:55:59.342 --> 00:56:00.846
Yes, it's open source, absolutely, that's good.

00:56:01.494 --> 00:56:02.295
This is excellent.

00:56:02.317 --> 00:56:05.161
You can easily install with this.

00:56:06.945 --> 00:56:08.807
So we install our language model.

00:56:08.807 --> 00:56:22.001
We can, I use the word export, but we can send the data to the language model from our Business Central environment or anything else, any other data that we want to send to it.

00:56:22.001 --> 00:56:30.436
The model will learn the data, train on the data, and you can ask, in this case, DeepSeek a prompt.

00:56:30.436 --> 00:56:31.900
It will show you the reason.

00:56:31.900 --> 00:56:35.177
I like that so you can see exactly what it's doing to come up with the calculation.

00:56:35.177 --> 00:56:37.081
And now we have the result.

00:56:37.081 --> 00:56:40.186
So now we're doing this completely offline.

00:56:40.186 --> 00:56:52.465
So those that have questions of security, of data being transmitted in the cloud somewhere or teaching a model that somebody else could potentially get the data we eliminated that because this doesn't go out to the Internet.

00:56:53.235 --> 00:56:56.547
Now that we have that language model installed locally.

00:56:56.547 --> 00:56:59.094
Can we use it with Business Central itself?

00:56:59.094 --> 00:57:03.385
So Business Central with the newer versions has Copilot where we can prompt or ask questions and it will do things.

00:57:03.385 --> 00:57:06.192
Has co -pilot where we can prompt or ask questions and it will do things.

00:57:06.192 --> 00:57:09.844
Is there a way that we could use our local model?

00:57:09.844 --> 00:57:10.396
Within.

00:57:10.396 --> 00:57:10.737
Business.

00:57:10.777 --> 00:57:12.304
Central to get that information.

00:57:12.304 --> 00:57:20.302
Every local model, in my opinion, is suitable for certain types of scenarios.

00:57:20.302 --> 00:57:32.885
So, first of all, as you can see from here, every local model is available as a local service.

00:57:32.885 --> 00:57:46.905
So it runs as a service on your local machine, or on a machine in your local network, and you can use it with the same APIs as the online models.

00:57:46.905 --> 00:57:53.864
So if I use DeepSeek offline, it's exactly like using DeepSeek online.

00:57:53.864 --> 00:58:04.045
If I use Phi-4, one of the Microsoft models available offline, it is the same as using GPT-4o online.

00:58:04.045 --> 00:58:11.599
So in terms of API calls and so on, obviously a local model is local because it runs in your local network.

00:58:11.599 --> 00:58:22.184
So for Business Central Online to call a local model directly, you would need to expose the local model to Business Central Online.

00:58:22.184 --> 00:58:37.121
So honestly, you can do that, maybe not directly, but with a middle layer in between that Business Central can talk to.

00:58:37.121 --> 00:58:40.829
You call something like an Azure function and then an Azure function can call your local service.

00:58:40.829 --> 00:58:47.485
This is absolutely possible. The Azure Function can expose this in a virtual network.

00:58:47.485 --> 00:58:51.945
So in order to have the security of the messages.

00:58:51.945 --> 00:59:02.661
But, and this is possible, Business Central can call a local model; you just need something in order to expose the local service to Business Central.
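
To make the middle-layer idea more concrete, here is a hedged sketch of an Azure Function (Python v2 programming model) that Business Central Online could call over HTTPS, and that in turn forwards the prompt to a model host reachable inside the virtual network. The private address, model tag and route are illustrative assumptions, not the exact setup described in the episode.

```python
# Hedged sketch of the "middle layer": an HTTP-triggered Azure Function that forwards
# prompts from Business Central Online to an Ollama host inside the VNet.
# `requests` must be listed in requirements.txt.
import azure.functions as func
import requests

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="ask-local-model", methods=["POST"])
def ask_local_model(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()
    upstream = requests.post(
        "http://10.0.1.4:11434/api/generate",   # model host on the VNet (assumed address)
        json={"model": body.get("model", "phi4"), "prompt": body["prompt"], "stream": False},
        timeout=120,
    )
    return func.HttpResponse(upstream.json()["response"], mimetype="text/plain")
```

From AL, Business Central would then simply call this function URL (with its function key) using the standard HttpClient data type.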

00:59:02.661 --> 00:59:09.103
That is, if you want to have something like a local copilot inside Business Central using a local model.

00:59:10.835 --> 00:59:19.054
Honestly, the scenarios that I have used at the moment in real projects are the opposite.

00:59:19.054 --> 00:59:24.588
So it's a local model that needs to interact with Business Central.

00:59:24.588 --> 00:59:25.518
So I need to.

00:59:25.518 --> 00:59:38.726
The scenario is: I am a company that has Business Central online, but I want my AI solutions to run offline.

00:59:38.726 --> 00:59:44.262
So my AI solution is offline, but it needs to interact with Business Central in some way.

00:59:44.262 --> 00:59:57.681
So, for example, among the AI solutions that we have done in real projects with customers, there's a customer that is working in the manufacturing industry.

00:59:57.681 --> 01:00:20.485
In the production departments they cannot use a browser, for different reasons, and they want the possibility to have a chat that is able to work with Business Central data.

01:00:20.485 --> 01:00:29.596
So an example is I am in the production machine and I want to know where this item is used in my production order.

01:00:29.596 --> 01:00:46.686
I can directly open my console and type "where is this item used in my production orders", and then the local model calls Business Central and can give the response.
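
As a rough sketch of how such a chat can work (and not the exact implementation used in the project described here), the example below assumes Ollama's OpenAI-compatible endpoint, a tool-calling capable model such as Llama 3.3, and a hypothetical Business Central API page for production order components; the endpoint URL, credentials and field names are placeholders.

```python
# Sketch: the local model decides to call a tool, and the tool queries Business Central.
import json
import requests
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # local, offline

def find_item_usage(item_no: str) -> str:
    """Hypothetical helper: query a BC API page listing production order components."""
    r = requests.get(
        "https://bc.mycompany.local:7048/BC/api/custom/v1.0/prodOrderComponents",
        params={"$filter": f"itemNo eq '{item_no}'"},
        auth=("USER", "WEB_SERVICE_ACCESS_KEY"),
    )
    return json.dumps(r.json())

tools = [{
    "type": "function",
    "function": {
        "name": "find_item_usage",
        "description": "List production orders where an item is used as a component",
        "parameters": {
            "type": "object",
            "properties": {"item_no": {"type": "string"}},
            "required": ["item_no"],
        },
    },
}]

messages = [{"role": "user", "content": "Where is item 1000 used in my production orders?"}]
reply = client.chat.completions.create(model="llama3.3", messages=messages, tools=tools)
call = reply.choices[0].message.tool_calls[0]
result = find_item_usage(**json.loads(call.function.arguments))

# Feed the tool result back so the model can phrase the final answer for the operator.
messages += [reply.choices[0].message, {"role": "tool", "tool_call_id": call.id, "content": result}]
print(client.chat.completions.create(model="llama3.3", messages=messages).choices[0].message.content)
```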

01:00:46.686 --> 01:00:47.788
That's helpful.01:00:48.175 --> 01:00:49.822


So they don't have to go to Business Central, right?01:00:49.822 --> 01:00:52.219


They just ask local, exactly, or?01:00:52.259 --> 01:00:57.005


something like, can you set the inventory of this item to five pieces?01:00:57.005 --> 01:01:03.163


Can you move the starting date of this production order to tomorrow?01:01:03.163 --> 01:01:17.963


And we have a solution for that, fully running locally that permits you to interact with your production orders, manufacturing inventory movements, something like that fully offline.01:01:18.688 --> 01:01:33.443


So the language model that you are talking about, or what you have set up, is not only learning on the Business Central data but it's interacting with Business Central to where it's updating information in Business Central.01:01:34.014 --> 01:01:35.876


Exactly, all locally, exactly.01:01:35.876 --> 01:01:57.947


Another example that I have here on my machine, that I can maybe quickly show, is one that we have recently deployed in a solution. One second.01:01:57.947 --> 01:02:03.400


Is this One second?01:02:03.400 --> 01:02:04.764


I need to open the window.01:02:04.764 --> 01:02:09.137


Okay, this is, for example.01:02:15.864 --> 01:02:46.405


We have some scenarios where we need to do image recognition and, for example, we have a customer that asked us for the possibility to recognize if a warehouse bin is full, something like this.01:02:46.405 --> 01:02:47.969


Yes, something like this.01:02:47.969 --> 01:03:11.268


So, pictures of the warehouse are taken from the camera, and they want to know if a location is over a certain fill level in order to block the possibility to do put-aways on those locations.01:03:11.268 --> 01:03:22.985


So what happens in this scenario is that there are some cameras in this warehouse that take a picture of these locations every X minutes.01:03:22.985 --> 01:03:33.518


They store the camera images into a folder, in this case, and then here we have a local model, Llama Vision.01:03:33.518 --> 01:03:42.340


In this case, Llama Vision is a powerful open source local model that is able to do image recognition, OCR, things like that.01:03:42.340 --> 01:03:44.097


Offline too, right, offline.01:03:44.338 --> 01:03:50.840


Oh, I want to set something up to analyze all of my, I have like 60,000 photos that I had taken over the course of my life.01:03:50.840 --> 01:03:56.018


I wonder if I could use the language model to organize them for me.01:03:56.018 --> 01:03:59.206


Yes, absolutely oh.01:03:59.568 --> 01:04:00.289


It's possible.01:04:00.309 --> 01:04:01.233


yes, I'm emailing you.01:04:01.373 --> 01:04:06.679


We're setting up a date, I'll send you some wine If I we're setting up a date.01:04:06.679 --> 01:04:14.101


Oh yeah, for example, if I launch this application, it starts analyzing. I hope I have not changed the parameters. So it starts analyzing the photos.01:04:14.101 --> 01:04:17.981


Can you see here that is going to?01:04:17.981 --> 01:04:24.681


The CPU is going to the max, or rather the GPU is going to the max, because images are being processed.01:04:24.681 --> 01:04:28.481


And what happens here is that, yeah, it's analyzing my warehouse images.01:04:28.481 --> 01:04:32.423


The prompt that I have under that is analysis image.01:04:32.423 --> 01:04:34.802


Try to recognize the level of fill.01:04:34.802 --> 01:04:39.641


The fill level of this warehouse Gives me a JSON response.01:04:39.641 --> 01:04:50.364


This is the JSON response with a fill level, and we store that fill level inside the bins in Business Central.01:04:50.364 --> 01:04:51.860


So the location in Business Central.01:04:51.860 --> 01:05:07.360


So the model is first analyzing the image locally and then calls Business Central in order to update a field that we have in the location card.01:05:07.380 --> 01:05:10.880
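
A minimal sketch of that warehouse-camera flow, assuming Ollama serving a vision model (for example llama3.2-vision) and a hypothetical custom API page in Business Central that exposes a fill-level field on the location card; the file name, URLs, JSON shape and field names are placeholders, not the real project's names.

```python
# Sketch: push a camera image to a local vision model (Ollama's /api/generate takes
# base64-encoded pictures in "images"), force a JSON answer, then write the fill level
# back to Business Central through an assumed custom API page.
import base64
import json
import requests

with open("warehouse_cam_01.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

vision = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2-vision",
        "prompt": "Estimate how full this warehouse bin is. "
                  'Answer only as JSON: {"fillLevelPercent": <0-100>}',
        "images": [image_b64],
        "format": "json",      # ask Ollama to constrain the output to valid JSON
        "stream": False,
    },
    timeout=600,
)
fill_level = json.loads(vision.json()["response"])["fillLevelPercent"]

# Hypothetical custom API page exposing a fill-level field on the location card.
requests.patch(
    "https://bc.mycompany.local:7048/BC/api/custom/v1.0/locations('WH-01-A')",
    headers={"If-Match": "*"},
    json={"fillLevelPercent": fill_level},
    auth=("USER", "WEB_SERVICE_ACCESS_KEY"),
)
```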


Wow, so make recommendations for you.01:05:12.023 --> 01:05:12.364


In order.01:05:12.364 --> 01:05:12.806


Exactly.01:05:12.806 --> 01:05:29.018


This is a local service, or a local agent, that is able to periodically analyze the images coming from the warehouse and store the data in Business Central. In order to...01:05:29.018 --> 01:05:38.543


In Business Central we have an extension that blocks the possibility to do put-aways in certain locations that are filled over a certain level.01:05:38.543 --> 01:05:48.072


So this is handled by an agent running automatically that every time checks that camera images.01:05:48.072 --> 01:05:51.266


It is handled by an agent, running automatically, that each time checks the camera images, analyzes the images and blocks the locations.01:05:51.266 --> 01:05:58.222


Another example here is one that we have not yet deployed; the previous one is deployed in a live environment.01:05:58.222 --> 01:06:10.748


Another example, related to that, which we are testing at the moment, relates to object counting.01:06:10.748 --> 01:06:15.045


So we have customers that do that.01:06:16.402 --> 01:06:17.193


Oh, and it counts that.01:06:17.275 --> 01:06:25.885


So we have customers that sell apples, and each apple must be placed into a box.01:06:28.936 --> 01:06:31.204


And how many apples can you fit in the box Exactly?01:06:31.295 --> 01:06:48.434


And this box contains apples, and we are testing a local agent that scans every box of apples and returns the content.01:06:49.496 --> 01:06:52.077


So it takes an image, a picture? Exactly, it takes the pictures here.01:06:52.077 --> 01:07:16.862


Now it's just text, but I have this agent, and now you can see it working: it starts analyzing each image and gives me the count of the number of apples that there are in the image, in a JSON format that I can use in order to do actions.01:07:18.097 --> 01:07:19.119


Wow, that's so cool.01:07:19.119 --> 01:07:20.143


This is amazing.01:07:20.143 --> 01:07:21.340


So your own local.01:07:21.894 --> 01:07:39.338


So, as you can see, here I have a description and a count. Well, the impressive thing here, besides it being local and not having to use a cloud service, which may have a cost, or if you're working with sensitive data...01:07:39.338 --> 01:08:02.402


But these are just additional practical uses of AI within a business, or even a Business Central implementation, where you can easily see, in your scenario where you're counting apples, where you may have had an individual count those before, now you can use AI to count them, or even manage your warehouse without sending someone out to see it. Now...01:08:02.422 --> 01:08:06.530


AI can analyze your warehouse and tell you.01:08:08.289 --> 01:08:15.277


It's an autonomous agent that can work when you want and.01:08:17.979 --> 01:08:18.502


I'm sold.01:08:18.502 --> 01:08:24.784


This to me has opened up a lot of thought.01:08:24.784 --> 01:08:29.182


Even geez in my house, I could use this in my house to do stuff.01:08:29.574 --> 01:08:38.649


Quick question on the inventory Can you use these mini LLMs to do maybe even forecasting?01:08:39.875 --> 01:08:41.962


Yes, you can absolutely do.01:08:41.962 --> 01:08:46.363


There are LLMs that are good on that.01:08:46.363 --> 01:08:49.801


Wow, deepseek, for example, is good on that.01:08:51.074 --> 01:08:52.868


So you can have your own local LLMs.01:08:54.899 --> 01:09:03.646


Obviously, as in the previous example, the LLM needs to have the knowledge.01:09:03.646 --> 01:09:11.461


So if you pass, for example, your I don't know purchase order, sales order or Item ledger entries.01:09:11.814 --> 01:09:13.341


Item ledger entries, something like that.01:09:14.175 --> 01:09:23.722


If you pass that to the model, the model is able to reason on that, and it can analyze your trends and give you the response.01:09:23.722 --> 01:09:27.905
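
As a small, hedged sketch of that forecasting idea, using the official `ollama` Python client (pip install ollama): the monthly figures are invented and the model tag is just an assumption of a reasoning-capable local model.

```python
# Sketch: hand aggregated item ledger entry quantities to a local reasoning model and
# ask for a short forecast. Everything runs against the local Ollama service.
import ollama

monthly_sales = {"2024-10": 120, "2024-11": 135, "2024-12": 180, "2025-01": 150}

reply = ollama.chat(
    model="deepseek-r1:14b",   # a reasoning model fits this kind of question well
    messages=[{
        "role": "user",
        "content": (
            "These are monthly sold quantities for item 1000, taken from the item "
            f"ledger entries: {monthly_sales}. "
            "Forecast the next three months and briefly explain the trend."
        ),
    }],
)
print(reply["message"]["content"])
```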


That's amazing because you know how many times where people want to do that.01:09:27.905 --> 01:09:29.527


It absolutely works.01:09:31.938 --> 01:09:35.606


There are so many practical uses of this with the different models.01:09:35.606 --> 01:09:40.002


I'm speechless in a sense.01:09:40.002 --> 01:09:48.389


I can see so many different uses of it because now we can interact with business central data bi-directionally.01:09:48.389 --> 01:09:55.185


So you're getting information from a JSON in a JSON format that you can send back and update business central, but you can also teach it on the data.01:09:55.185 --> 01:09:57.802


Yeah, and it's all local, so it's secure.01:09:58.817 --> 01:09:59.841


It's pretty local.01:10:01.796 --> 01:10:10.787


Say that, chris, so it's more conversational now, versus just looking at numbers and then like trying to figure out okay, this is, this is what's recommending.01:10:10.787 --> 01:10:26.686


Now you can to like I'm thinking ahead a little bit here where you can use this tool to make the recommendation and forecast and possibly perhaps you can send that information back to Business Central based on the results.01:10:26.686 --> 01:10:29.289


Exactly that's crazy.01:10:30.195 --> 01:10:40.502


Yeah, the power of answering, so forcing the model to not just answer with text but to answer in a structured format.01:10:40.502 --> 01:11:01.509


In the prompt you can tell the model: I always want the response in this particular format. That is powerful, because you can then extract data from the response and do actions, like in this example where we update the content, we update the location card, and things like that.01:11:01.509 --> 01:11:52.467


I have, for example, here another example that I'm currently testing, for example, in our company we have I think for you is the same we have a lot of customers, business center, online customers deployed on different tenants and sometimes when we update an app, one of the apps that we have on AppSource we have quite a large set of apps in AppSource we would like to update those apps also to the online customers immediately, because maybe we have a fix or something like that, and sometimes this requires minimum standard.01:11:52.467 --> 01:11:57.002


The standard way is going to each of the tenants in the admin center and updating the app.01:11:57.002 --> 01:12:13.670


Otherwise, you can use APIs for that in order to spread the apps everywhere, but APIs are something that, in the end, not everyone uses.01:12:13.670 --> 01:12:24.082


So our consultants, for example, are not used to using the automation APIs or something like that in order to update the apps.01:12:24.715 --> 01:12:48.908


So here we are testing an agent for that, an AI agent. Here there's a set of AI agents that are able to talk with our consultants, asking what they want to do and performing actions.01:12:48.908 --> 01:13:09.567


So, yeah, just very quickly to show, because it's a prototype at the moment. We have different agents, a team of agents working together, and in the team of agents there's what I call here a customer support agent.01:13:09.567 --> 01:13:14.921


That is the agent that is responsible to talk with my consultant.01:13:14.921 --> 01:13:24.625


There's a manager that is responsible to decide if an action can be done and there's what I call a PowerShell developer.01:13:24.625 --> 01:13:27.184


That is the agent that is responsible to do the actions.01:13:27.184 --> 01:13:39.457
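
The real prototype is more elaborate, but as a rough, hand-rolled sketch of the routing idea only (not the actual implementation shown here), each "agent" below is the same local model called with a different system prompt, and the admin center call is stubbed out; the model tag and endpoint are assumptions.

```python
# Sketch of a three-agent team: customer support (talks to the consultant), manager
# (validates the request), "PowerShell executor" (would call the admin center APIs).
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # local Ollama
MODEL = "llama3.3"

def agent(system_prompt: str, user_text: str) -> str:
    reply = client.chat.completions.create(model=MODEL, messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ])
    return reply.choices[0].message.content

MANAGER = (
    "You validate Business Central app update requests. Reply only with JSON: "
    '{"ok": bool, "tenantId": str or null, "appId": str or null, "problem": str or null}. '
    "tenantId must be a valid GUID; appId may be null when all apps should be updated."
)
SUPPORT = "You are a polite customer support agent. Turn the note you receive into a short message for the consultant."

def handle(consultant_message: str) -> None:
    plan = json.loads(agent(MANAGER, consultant_message))   # sketch: assumes clean JSON back
    if not plan["ok"]:
        print(agent(SUPPORT, f"Ask the consultant to fix this: {plan['problem']}"))
        return
    # "PowerShell executor" stub: the real prototype calls the admin center APIs here.
    print(f"[executor] updating {plan['appId'] or 'all apps'} on tenant {plan['tenantId']}")

handle("Update app 9b3f on tenant 1a2b")   # invalid GUIDs, so support will ask for corrections
```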


So, just to show you something here, if I run this agent, okay, I have here a customer support agent that is talking to me and this give you.01:13:39.476 --> 01:13:41.222


Okay, hello, blah, blah, blah.01:13:41.222 --> 01:13:45.180


May I kindly ask if you have a Business Central app that you would like to update today?01:13:45.180 --> 01:13:49.384


If so, please provide the app ID and the tenant ID.01:13:49.384 --> 01:13:55.262


If you would like to update all apps, so update all apps in a given tenant.01:13:55.262 --> 01:13:56.961


Please only provide the tenant ID.01:13:56.961 --> 01:14:01.016


Okay, yeah, I can write, let me.01:14:03.637 --> 01:14:12.914


So you designed this agent and you told it to create the prompt or the question to the consultant to answer Exactly.01:14:13.195 --> 01:14:14.698


Here is the agent that is.01:14:14.698 --> 01:14:19.636


I've made my prompt later I will show you is just simply a question to ask.01:14:19.636 --> 01:14:22.140


Politely, ask to my agent what you want to do.01:14:22.140 --> 01:14:33.377


I've given the instruction that if the consultant wants to update an app, he needs to provide the app ID and the tenant ID.01:14:33.377 --> 01:14:46.680


If he wants to update all the apps in the tenant, he needs to provide the tenant ID and not the app ID and you have different agents within the agent working together.01:14:46.881 --> 01:15:04.765


yeah, wow, so that goes back to where we're having a conversation brad like, remember how, how it's different agents doing specific tasks, and this is a perfect example where it's calling all the different agents say you need to work together to do this specific task well, you have an agent manager, right that?01:15:04.765 --> 01:15:05.567


Is so amazing.01:15:06.454 --> 01:15:14.823


So you have agents that have specific functions, and then you have an agent that manages the agents and uses the specific agents.01:15:15.795 --> 01:15:16.899


Yeah, it's exactly like this.01:15:16.899 --> 01:15:33.355


So, yeah, if I put, for example, this if I ask update, app this and I forgot to insert the tenant ID, the manager.01:15:33.355 --> 01:15:42.680


The manager asks the customer support agent to tell the customer that the tenant ID must be provided.01:15:42.680 --> 01:15:46.734


And then the customer support agent asks me: okay, thank you for providing the app ID.01:15:46.734 --> 01:15:49.720


In order to proceed with updates, could you please provide the tenant ID?01:15:49.720 --> 01:15:53.479


That's so crazy.01:15:53.479 --> 01:15:58.367


And I put another GUID, for example.01:15:58.367 --> 01:16:00.300


Let me copy another.01:16:00.319 --> 01:16:00.400


GUID.01:16:00.400 --> 01:16:01.560


I'm so excited about this.01:16:01.560 --> 01:16:06.006


This is a perfect showing of how agents work together.01:16:06.475 --> 01:16:08.722


Okay, now I provided this.01:16:08.722 --> 01:16:11.721


The manager analyze.01:16:11.721 --> 01:16:13.145


Okay, everything provided.01:16:13.145 --> 01:16:18.823


Now the PowerShell executor is called and now there's a third agent that updates the app.01:16:18.823 --> 01:16:28.353


Here a call to the admin center API is done via function calling, passing the tenant ID and the app ID.01:16:28.353 --> 01:16:31.479


A call to the admin center APIs is done via function calling, passing the tenant ID and the app ID.01:16:31.479 --> 01:16:42.563


So there are three agents that work together in order to complete a task: the customer support agent is responsible for asking what I need to do in order to complete an action.01:16:42.563 --> 01:16:53.375


The manager is responsible for involving each agent according to the task, and the PowerShell executor, for me, is an agent as well.01:16:53.475 --> 01:16:54.533


This is a perfect illustration.01:16:54.635 --> 01:17:00.979


Perfect illustration of how, what the future is gonna be, with different agents doing specific tasks.01:17:01.876 --> 01:17:03.381


This is amazing.01:17:03.381 --> 01:17:04.719


I mean we've gone through.01:17:05.595 --> 01:17:18.762


Without a GUID, the manager as you can see, that is the agent, the model here in this case is able to recognize that this is not a valid GUID.01:17:18.762 --> 01:17:36.367


So the manager says: okay, customer support, talk to the customer and tell him that the GUID is not correct. And here the customer support agent says to me: please ensure that both are valid GUIDs.01:17:36.367 --> 01:17:52.304


So here is an example of interaction between agents and this can be useful in order to, for example, provide a user interface for consultants in order to update apps on tenants.01:17:54.958 --> 01:17:57.485


This is mind-numbing to me.01:17:57.485 --> 01:18:05.025


I can see so many different practical uses of this, so let's take it back If somebody wanted to work with this.01:18:05.025 --> 01:18:08.877


So let's just take a sequence of steps which I keep telling you.01:18:08.877 --> 01:18:12.546


I'm calling you later and we're going to set this up on one of my.01:18:12.546 --> 01:18:13.555


I use a mac.01:18:13.555 --> 01:18:17.310


I use parallel, so I'll create a mac vm and we'll set all this stuff up.01:18:17.310 --> 01:18:19.658


What are the steps that someone has to go through?01:18:19.658 --> 01:18:24.038


So the first thing is they have to determine which model they want to use correct, exactly.01:18:24.038 --> 01:18:25.975


The first is determine which model you want to use correct Exactly.01:18:26.114 --> 01:18:29.885


The first is determine which model you want to use.01:18:29.885 --> 01:18:31.087


Based on your scenario.01:18:33.519 --> 01:18:36.345


And starting point.01:18:36.345 --> 01:18:42.965


So first of all let me go a step back.01:18:42.965 --> 01:18:47.518


The first is my opinion.01:18:47.518 --> 01:18:48.559


The first is okay.01:18:48.578 --> 01:19:06.032


If you want to run a local model, first of all select the platform to host your local model, and there are different platforms to host local models, some more complex, some less complex.01:19:06.032 --> 01:19:15.494


I honestly suggest using Ollama.01:19:15.494 --> 01:19:21.971


Ollama is a great platform for hosting local models.01:19:21.971 --> 01:19:42.506


You simply download Ollama for Windows, for Linux, for macOS, and when you have Ollama downloaded... Ollama has a set of models here, the same models that I previously showed, divided by vendor.01:19:42.506 --> 01:19:46.563


For each model there are.01:19:46.563 --> 01:20:00.877


If a model is unique, like Microsoft Phi-4, there's only this one model to download: simply write "ollama pull phi4" and it downloads the model locally for you.01:20:00.877 --> 01:20:19.943


If you have a more complex model, like DeepSeek, you can download one of the available distillations of the model, starting from this.01:20:19.943 --> 01:20:45.202


That is the big DeepSeek, the biggest available in Ollama that can be run locally, down to the smallest. I've previously used this one, so for DeepSeek simply run this, and your model is ready to be executed on your local machine and available as a local service.01:20:46.796 --> 01:20:49.484
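
Assuming Ollama's defaults, a quick way to confirm from code that the local service is up and to talk to a pulled model looks roughly like this; "phi4" here assumes you have already run "ollama pull phi4" from the command line.

```python
# Sketch: list the locally pulled models and send a first prompt, all via Ollama's
# documented REST endpoints on the default port 11434.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print("Models available locally:", [m["name"] for m in tags["models"]])

answer = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "phi4", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
).json()["response"]
print(answer)
```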


If you don't want to use Ollama, there's LM Studio.01:20:49.484 --> 01:20:56.271


That is another available tool for running local model.01:20:56.271 --> 01:21:01.106


LM Studio is much more user-friendly, because Ollama has no user interface.01:21:01.106 --> 01:21:09.613


Ollama runs as a service, like I have here.01:21:09.613 --> 01:21:11.634


LM Studio instead has a user interface where you can chat with the model.01:21:11.634 --> 01:21:16.266


It's something like more user-friendly.01:21:16.266 --> 01:21:27.247


Otherwise, there are other tools like this, like this.01:21:28.256 --> 01:21:35.069


This, llama.cpp, is another tool available to run local models.01:21:35.069 --> 01:21:40.597


I don't remember where is the repo, if it is yes, it is.01:21:40.597 --> 01:22:02.082


You can download this tool and easily run a model with a simple command, using "-m" followed by the name of your model, or the URL of a model that you can download from here.01:22:02.082 --> 01:22:05.744


The URL of this that you have appears here.01:22:05.744 --> 01:22:10.381


Or you can launch a server.01:22:10.381 --> 01:22:14.465


Honestly, all the samples that I use are using Ollama.01:22:14.465 --> 01:22:16.980


It's easy and it's powerful.01:22:16.980 --> 01:22:24.717


When you have the platform, you can then decide the model Our model to use.01:22:24.717 --> 01:22:29.711


Obviously, it depends on your needs.01:22:29.711 --> 01:22:41.128


Sometimes you need a lot of power; DeepSeek, for example, is able to do reasoning.01:22:41.128 --> 01:22:54.823


So if you have something that needs advanced reasoning, for example, I have to create a forecasting application, probably DeepSeek is better because it can do complex reasoning.01:22:54.823 --> 01:22:56.565


More parameters.01:22:58.128 --> 01:23:00.158


Exactly Parameters.01:23:00.158 --> 01:23:04.025


Yes, and the size of the model in parameters obviously depends on your local resources.01:23:04.025 --> 01:23:10.307


So download accordingly to your local resources.01:23:10.307 --> 01:23:30.751


So if I have, for example, I don't know, 60 GB on my machine, probably here is my limit. I cannot download these, because otherwise it will be too slow to get a response.01:23:30.751 --> 01:23:35.025


But these are absolutely tests that you can do.01:23:35.025 --> 01:23:37.579


So you can download the model, try.01:23:37.579 --> 01:23:43.010


If it's too slow, go to the smallest version.01:23:45.519 --> 01:23:50.266


In my personal experience, DeepSeek is a great model for advanced reasoning.01:23:50.266 --> 01:23:59.925


So if you require advanced reasoning, general text questions or code generation, DeepSeek is good.01:23:59.925 --> 01:24:28.362


In the open source family, my absolute favorite models are these. Llama 3.3, for me, is one of the models that is able to satisfy every need I have today, especially when working with Business Central.01:24:28.362 --> 01:24:31.755


It is able to perform function calling; it is able to do, honestly, quite everything.01:24:31.755 --> 01:24:33.497


It's not a reasoning model.01:24:33.497 --> 01:24:42.761


So if you require complex reasoning, DeepSeek is better, but for every other task, Llama 3.3 is great.01:24:42.761 --> 01:24:48.971


Otherwise, my preferred choice is Phi-4 from Microsoft.01:24:48.971 --> 01:25:05.306


That is another great open source model, honestly quite comparable to the results that you get with GPT-4o in many fields, and these are also listed here in this order because they are the most downloaded.01:25:05.988 --> 01:25:15.628


Open source models. Okay, so we take a platform, we take a model, we install it, and we're up and running, basically.01:25:16.936 --> 01:25:17.717


You are up and running.01:25:17.717 --> 01:25:20.601


Obviously your model is up and running.01:25:20.601 --> 01:25:24.648


You can use your model like a local chat, like I've done here.01:25:24.648 --> 01:25:33.920


So here I have all my local models and I can select one and start using the local model as a chat.01:25:34.335 --> 01:25:35.740


What are you using to connect to?01:25:35.740 --> 01:25:38.663


Which application are you using to connect to your models?01:25:39.215 --> 01:25:42.064


This is another open source application called.01:25:45.997 --> 01:25:46.539


This, MSTY.01:25:48.957 --> 01:25:54.484


If you want to have a user interface Otherwise, via command line, you can.01:25:54.484 --> 01:25:58.466


Every model offers the command line interface to interact.01:25:58.466 --> 01:26:05.746


So when you download the model, the model starts and then from command line you can start typing and the model answers.01:26:05.746 --> 01:26:07.990


Model starts and then from command line you can start typing and the model answers.01:26:07.990 --> 01:26:15.926


If you want something more user friendly, a local user interface is required.01:26:15.926 --> 01:26:17.921


I use this because it permits.01:26:17.921 --> 01:26:26.367


This is an open source user interface that is able to work with local models.01:26:27.395 --> 01:26:28.117


What is the name of it?01:26:28.117 --> 01:26:31.625


Again, misty, misty.01:26:32.086 --> 01:26:32.386


M-S-T-Y.01:26:37.550 --> 01:26:58.444


M-S-T-Y Exactly, so you can download for the platform you want and automatically it discovers if you have downloaded local models and all your local models are available here and you can also add online providers.01:26:58.444 --> 01:27:07.686


So if you have, I don't know, an account with OpenAI, an account with DeepSeek online and so on, you can also use those models from here.01:27:08.435 --> 01:27:10.863


So then you have a local desktop application.01:27:13.319 --> 01:27:13.680


Exactly.01:27:13.680 --> 01:27:22.984


I always use this because it's useful for testing, for example, if you want to test a prompt or something like that it is nice for testing.01:27:24.957 --> 01:27:47.184


Then, instead, for creating applications... applications are created in code in my case, so I have some... So we now have an interface to the model via command prompt or via a tool. Then, as far as sending our data to it, does that vary from model to model?01:27:49.755 --> 01:28:01.384


Sending data to a model is not related to the model. For sending data to a model, you have essentially two ways.01:28:01.384 --> 01:28:11.456


First of all, you can use the REST APIs exposed by the model itself when you download the model.01:28:11.456 --> 01:28:23.426


The model is available, as I previously showed, as a service, local service, so you can use REST APIs to talk with the model and these possible scenarios.01:28:23.426 --> 01:28:29.818


But in this case you need to know the format of each REST APIs of each model.01:28:29.818 --> 01:28:34.863


Usually they're quite the same, but you need to know the format of these models.01:28:34.863 --> 01:28:47.523


It's always explained if you go on Hugging Face Hugging Face is the main portal for open source models Each model has the explanation of their APIs.01:28:48.855 --> 01:28:50.682


I honestly don't never do that.01:28:50.682 --> 01:28:57.645


That's why I previously show you my examples here.01:28:57.645 --> 01:29:02.760


Always use use example that I show.01:29:02.760 --> 01:29:10.762


I also use abstraction tools, like, for example, here I'm using Semantic Kernel.01:29:10.762 --> 01:29:32.029


Semantic Kernel is an abstraction tool, so with Semantic Kernel I don't need to take care of knowing the REST API that I need to use with GPT-4o, with DeepSeek, with OpenAI or something like that, because it does that for me.01:29:32.234 --> 01:29:36.194


So you downloaded Semantic Kernel and you installed Semantic Kernel, and that interfaces with your local model.01:29:36.295 --> 01:29:44.621


Exactly this is when creating advanced solutions and you don't want to rely on REST APIs.01:29:44.621 --> 01:30:10.055


It's a recommended approach because this solution can be easily swapped between different providers and, honestly, when I create an AI solution or an AI agent or something like that, I would like to be able to use different providers Also.01:30:10.055 --> 01:30:26.399


The previous example that I show, where I show a solution where three agents work together, I would like also to have the freedom to have previously in my solution there was three agents the manager, the customer support and the PowerShell executor.01:30:26.399 --> 01:30:42.359


In that solution, I can say that the PowerShell executor uses GPT-4o, while the customer support agent only uses GPT-3.5 because it costs less.01:30:42.359 --> 01:31:10.283


So I can also spread the models across agents, and creating AI solutions that are platform agnostic is sometimes great, because the same solution can be executed on any platform, and if, a month later, I want to change the platform, I can change that easily.01:31:11.845 --> 01:31:14.365
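
Stefano's solutions use Semantic Kernel for this abstraction. As a rough illustration of the same provider-swapping idea in Python (not Semantic Kernel itself), the sketch below leans on the fact that Ollama, DeepSeek's online API and OpenAI all expose OpenAI-compatible chat endpoints, so only the configuration changes between providers; the keys and model names are placeholders.

```python
# Sketch: the calling code never changes, only the provider configuration does.
from openai import OpenAI

PROVIDERS = {
    "local-ollama":    {"base_url": "http://localhost:11434/v1", "api_key": "ollama",         "model": "llama3.3"},
    "deepseek-online": {"base_url": "https://api.deepseek.com",  "api_key": "<DEEPSEEK_KEY>", "model": "deepseek-chat"},
    "openai":          {"base_url": None,                        "api_key": "<OPENAI_KEY>",   "model": "gpt-4o"},
}

def ask(provider: str, question: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
    reply = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": question}],
    )
    return reply.choices[0].message.content

# Switching from the local model to DeepSeek online is a one-word change in the caller:
print(ask("local-ollama", "Summarize the open sales orders I pasted earlier."))
```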


For example, one example was DeepSeek Online.01:31:14.365 --> 01:31:32.743


DeepSeek online, when it was released, was the cheapest model in history; DeepSeek costs really almost nothing and it's very powerful.01:31:32.743 --> 01:31:42.186


So compared, for example, to Microsoft's GPT-4o: GPT-4o costs 10,000 times more for each call compared to DeepSeek.01:31:42.186 --> 01:32:01.979


We have some solutions deployed many months ago, when DeepSeek was not available, that, simply by changing the parameters and setting that parameter to DeepSeek, work without changing anything else.01:32:01.998 --> 01:32:02.841


So that's the key then.01:32:02.841 --> 01:32:13.462


So it's: choose the platform, install the model, use Semantic Kernel, and then you're on your way. And then, as you had just mentioned... Exactly, you can use totally cross-platform applications.01:32:16.056 --> 01:32:17.341


So you're not tied to a model.01:32:17.341 --> 01:32:22.542


At that point, the semantic kernel will communicate with the model.01:32:22.542 --> 01:32:25.043


You just tell it which model to use.01:32:25.043 --> 01:32:31.439


So in your case, as you had mentioned, you had started with one model.01:32:31.439 --> 01:32:39.029


A new model was released and, simply by changing which model to use, your application was still functional using the new model exactly this is great.01:32:39.088 --> 01:32:43.301


This is great. Exactly, it's now platform agnostic, and that suits you too.01:32:45.064 --> 01:32:47.488


And my favorite part is this all is running on a Mac.01:32:49.015 --> 01:32:59.844


And this runs on a Mac. Obviously I love Windows, but honestly, for AI stuff, the Macs have something more.01:32:59.844 --> 01:33:02.279


The.01:33:02.319 --> 01:33:02.600


Macs.01:33:02.600 --> 01:33:11.037


Listen, I like Windows too, don't get me wrong, but the Macs always have something more, and I'm thankful that we can communicate with Business Central with VS Code, Especially for.01:33:11.057 --> 01:33:11.520


AI stuff.01:33:11.520 --> 01:33:17.485


The Mac has a lot more power compared to Windows.01:33:19.516 --> 01:33:20.421


I'm glad you said that.01:33:20.421 --> 01:33:22.661


Yeah, can you repeat that again?01:33:22.661 --> 01:33:23.623


We're best friends.01:33:26.555 --> 01:33:30.527


Well, Stefano, you had blown my mind this is amazing.01:33:30.675 --> 01:33:34.889


I just downloaded MSTY, by the way, just so that I can interact with this stuff.01:33:35.015 --> 01:33:42.689


I'm just telling you I hope you're not going to bed soon because I'm going to send you a text with a question asking for all these links and a meeting.01:33:42.689 --> 01:33:45.042


So just give me a few minutes.01:33:45.042 --> 01:33:47.301


This is amazing.01:33:47.301 --> 01:34:06.684


You've covered so much and you've inspired I know me and I'm sure, anyone listening to see how you could utilize a local language model or AI in that sense, there are lots of scenarios where this fits.01:34:06.684 --> 01:34:08.881


It fits everywhere.01:34:08.881 --> 01:34:15.228


Just your scenarios of the warehouse the apples, the agents.01:34:15.247 --> 01:34:15.828


The vision.01:34:16.655 --> 01:34:30.640


It's just to show that you've just, in those examples that you've given us, have crossed many areas, how you could use AI to gain efficiency within an implementation, and I think it's wonderful.01:34:30.640 --> 01:34:39.288


It's amazing and I'm sort of speechless because my mind is thinking of all the applications of this.01:34:40.375 --> 01:34:48.849


Yes, I think it fits, especially when talking about the term that now is a lot of topic today.01:34:48.849 --> 01:35:09.582


So, the agentic AI features. Because, especially in the Business Central world, the most common features are: I click an action in Business Central and this action calls an LLM and does something, so it's user-centric.01:35:09.582 --> 01:35:16.266


Here we are moving a step over of that.01:35:16.266 --> 01:35:23.618


So local LLMs, in my opinion, are extremely powerful when you want to create agents that work with Business Central.01:35:23.618 --> 01:35:33.020


So I am a company and inside my company, I want to have offline agents that does actions or decision or something like that.01:35:33.020 --> 01:35:35.279


Also we business center data.01:35:35.279 --> 01:35:42.743


Or they do actions on Business Central data, like in this example, a stupid example, but I think it can give the idea.01:35:42.743 --> 01:36:06.345


So these are local applications running autonomously, and also maybe in teams. Not Teams the application, but teams in terms of groups, organization teams. Yeah. Exactly, you can have multiple agents that work together in order to achieve a task.01:36:07.155 --> 01:36:18.291


I think it eases the minds of organizations or businesses where they may be afraid of using LLMs online and they want to maintain their data within their organization.01:36:18.291 --> 01:36:26.426


This, right here, is a game changer of seeing a good example of use of local LLMs.01:36:26.787 --> 01:36:36.038


It's not only the security concerns of sharing sensitive data, and I use the word sensitive meaning anything that someone feels they don't want to share with someone else.01:36:36.038 --> 01:36:52.823


It doesn't have to be sensitive in the sense of identification; maybe I just don't want to share my sales, for example. But it's also a way to help control your cost, so it's a factor: you have a fixed cost, in a sense, because you have the machine or the hardware to run it.01:36:52.844 --> 01:37:05.853


Yes, if you have the right hardware, but that's a fixed cost in a sense, outside of the electricity to power that hardware, whereas with some of these other models, depending upon how much you use it, your cost could fluctuate or vary.01:37:05.853 --> 01:37:15.376


Where this gives you a fixed cost and you have control of the data, um, I don't, I think I don't even know what to say anymore.01:37:15.376 --> 01:37:33.034


My mind is full of all of this and now I have a greater appreciation, uh, for all of the things that you've been sharing and posting about local language models or running language models locally, large language models locally, see I thought it's not just locally in your machine.01:37:33.135 --> 01:37:35.962


You can I mean you could technically have this on azure.01:37:35.962 --> 01:37:37.365


It just means it's offline, right?01:37:37.365 --> 01:37:42.032


It just right so you could put in a virtual machine.01:37:42.032 --> 01:37:52.527


It just means that you don't need to give it access to the online world yes absolutely well, mr stefano.01:37:52.546 --> 01:37:55.030


Thank you, I I was sold.01:37:55.431 --> 01:37:56.954


He had me at hello, as they say.01:37:56.954 --> 01:38:04.542


I know that for you is evening, so having this year no, this, this is great.01:38:04.582 --> 01:38:06.426


Thank you very much for taking the time to speak with us.01:38:06.426 --> 01:38:09.539


This was wonderful.01:38:09.539 --> 01:38:23.956


You shared so much information to help break down what running a large language model locally entails, and also extremely valuable scenarios on how it can be applied.01:38:23.956 --> 01:38:37.600


If anyone would like to contact you, has additional questions or may want to learn more about large language models and or see some of the other great things that you've done, what is the best way to contact you?01:38:39.515 --> 01:38:45.542


I'm always available on LinkedIn or on X or on Blue Sky.01:38:45.542 --> 01:38:54.061


You can reach me directly on that social it's probably the best or directly from the contact of my website.01:38:54.061 --> 01:38:58.079


It's the best way to reach me directly.01:38:58.079 --> 01:39:02.501


I always ask there to as many of you know.01:39:02.501 --> 01:39:10.244


I'm always available to ask there, so feel free to contact me if you have follow-ups.01:39:11.568 --> 01:39:20.154


Excellent, excellent, and I definitely would like to have you back on again to follow up to this, because, seeing all the great things that you've been doing, I can only imagine where you'll be in a few months.01:39:20.154 --> 01:39:27.082


So we'll have to see if we can get you on later in the year to see where you have taken some of this.01:39:27.082 --> 01:39:29.944


Are you going to directions in North America?01:39:31.315 --> 01:39:47.850


Unfortunately, I will skip Directions North America this year. I will be at DynamicsMinds, and we are organizing the Business Central Day event in Italy with Directions.01:39:47.850 --> 01:40:02.523


We have a lot of work in order to be able to do this event and it is extremely near to Direction NA.01:40:02.523 --> 01:40:15.143


My initial plan was to go with Duilio to do the session that we have done in Direction NA about large customers.01:40:15.143 --> 01:40:29.002


This was a very appreciated session and we would like to repeat that session at Directions NA, but when we started the organization of Directions Italy...01:40:29.002 --> 01:40:52.627


Unfortunately we are forced to have a fixed date by Microsoft Italy because they give us the headquarter for the event and so it's extremely near to direction and for me it's not possible to be outside my company.01:40:52.627 --> 01:40:53.467


It's so large.01:40:54.778 --> 01:40:55.582


We understand that.01:40:55.582 --> 01:41:04.944


We ourselves run into the challenges of which conferences, of which events to attend, because there are many and, as we talked about, there's some travel considerations as well.01:41:05.095 --> 01:41:10.145


The problem is that sometimes these events are really near each other.01:41:10.145 --> 01:41:16.622


So when you have a get a major choice that requires uh.01:41:16.622 --> 01:41:27.868


So sometimes my company is flexible to permit me to go a week outside for events about two weeks uh when you're doing all this great stuff.01:41:27.908 --> 01:41:28.389


I can see that.01:41:28.389 --> 01:41:29.136


That's okay.01:41:29.136 --> 01:41:30.621


We'll have pizza with Duilio again.01:41:30.681 --> 01:41:36.998


So it does, it's, it's uh, we, I will be, we will be us for sure in this year.01:41:36.998 --> 01:41:39.625


It's a promise that I've made with Duilio.01:41:39.625 --> 01:41:46.786


If not, the ration na may be, uh, the other direction US, or something like that, but we will do it.01:41:46.805 --> 01:41:48.288


Well, there's Days of Knowledge, we'll do it.01:41:48.288 --> 01:41:49.701


And then Summit is in October.01:41:49.701 --> 01:41:54.835


And the call for speakers opened up for that, and that's in October.01:41:54.854 --> 01:41:56.802


We are planning to go in one of that.01:41:58.735 --> 01:42:00.239


I'll be looking forward to see you in person.01:42:01.382 --> 01:42:01.864


Yeah yeah.01:42:01.864 --> 01:42:02.667


That would be excellent.01:42:02.667 --> 01:42:04.421


Well, sir, thank you very much.01:42:04.494 --> 01:42:11.885


We appreciate you taking the time with us, and I look forward to speaking with you soon and, as always, for your great podcast and your great initiative you are doing.01:42:11.885 --> 01:42:13.828


Thank you, thank you very much.01:42:14.215 --> 01:42:15.537


We appreciate you.01:42:15.537 --> 01:42:16.279


Thank you very much, sir.01:42:16.520 --> 01:42:17.944


Thank you, stefan.01:42:17.944 --> 01:42:18.987


All right, ciao, ciao, bye-bye.01:42:19.367 --> 01:42:20.256


Ciao, bye-bye.01:42:20.256 --> 01:42:28.863


Thank you, chris, for your time for another episode of In the Dynamics Corner Chair, and thank you to our guests for participating.01:42:29.175 --> 01:42:30.662


Thank you, brad, for your time.01:42:30.662 --> 01:42:34.125


It is a wonderful episode of Dynamics Corner Chair.01:42:34.125 --> 01:42:37.645


I would also like to thank our guests for joining us.01:42:37.645 --> 01:42:40.663


Thank you for all of our listeners tuning in as well.01:42:40.663 --> 01:42:55.188


You can find Brad at dvlprlife.com, that is D-V-L-P-R-L-I-F-E dot com, and you can interact with him via Twitter, D-V-L-P-R-L-I-F-E.01:42:40.663 --> 01:42:55.188


You can also find me at matalinoio, m-a-t-a-l-i-n-o dot I-O, and my Twitter handle is Mattelino16.01:43:08.546 --> 01:43:12.243


And you can see those links down below in the show notes.01:43:12.243 --> 01:43:13.606


Again, thank you everyone.01:43:13.606 --> 01:43:15.161


Thank you and take care.

Stefano Demiliani Profile Photo

Stefano Demiliani

Microsoft MVP - Microsoft MCT Regional Lead for Italy - CTO@EID|Lodestar Group