It gives high speed at any scale and is apt for fast and flexible app development. In this video we will be covering an overview of Cosmos DB: what exactly is Azure Cosmos DB, creating an Azure Cosmos DB account, what are request units, choosing a partition key, creating a database and container in Cosmos DB, and some questions. It will be helpful especially when you're preparing for the Azure data engineering certification, that's Data Engineering on Microsoft Azure, DP-203, which will earn you the Azure Data Engineer certification. Welcome to another episode of the Azure video series from K21 Academy, where we take you from complete beginner, covering implementing data storage and designing for data security, all the way to designing for resilience, including batch processing, analytics architecture and monitoring, as well as how to prepare for the Azure Data Engineer certification. We have taken a clip from one of our certification training programs on Data Engineering on Microsoft Azure, that's DP-203, and in this clip a Microsoft Certified Trainer will talk about Azure Cosmos DB. So this is a clip taken from a module on building a globally distributed database with Cosmos DB. Now let's hear from my expert trainer on the same. As part of module number four, we are talking about Cosmos database, and we'll understand what's the meaning of Cosmos database, how do we create it, how do we query Cosmos DB, and how do we create an application with Cosmos database. Lesson number four is optional: it's not part of your certification as such, it's just FYI information, and we'll also talk about distributing your data globally with Azure Cosmos DB. Right, so let's get started.

So the very first thing is lesson number one: what is Cosmos DB? What is a DB account? What is a request unit? What is a partition key? We'll understand each component and element of Cosmos database. So just FYI, Cosmos databases are very highly scalable, globally distributed databases. In the beginning, when we create Cosmos DB for the first time, it has one endpoint; you can see that here one Cosmos database account is getting created. An account is more like a wrapper, and inside the wrapper you can have multiple databases. Once the account is created, you can select multiple regions in the world where you want to place your endpoints. Maybe if I have a distributed application, one of my endpoints can be the South India data center, another endpoint of the Cosmos database can be the West India data center, or another can be in East US, another can be in UAE North maybe, or another endpoint may be in West Europe. So the application might be globally distributed, all these databases are syncing with each other, and wherever the customers are, the applications are pointing to the nearest endpoint. So if any company requires a database which is globally distributed, then Cosmos DB comes into the picture. Now, something which I want to talk about over here is scalability.
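The "point to the nearest endpoint" idea above can be sketched in a few lines. This is a minimal illustration only: the region names and latency numbers are made up for the example, and real clients let the Cosmos DB SDK handle region selection for them.

```python
# Hypothetical sketch: routing a client to the lowest-latency Cosmos DB endpoint.
# Latencies (in ms) are invented for illustration, not real measurements.
ENDPOINT_LATENCY_MS = {
    "South India": 8,
    "West India": 12,
    "East US": 180,
    "UAE North": 40,
    "West Europe": 120,
}

def nearest_endpoint(latencies: dict) -> str:
    """Pick the region with the lowest observed latency."""
    return min(latencies, key=latencies.get)

print(nearest_endpoint(ENDPOINT_LATENCY_MS))  # -> South India
```

In practice the SDKs do this automatically once you list your preferred regions, but the selection logic amounts to exactly this kind of "closest first" choice.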

We have options such as autopilot; autopilot means that, as per the requirement, Cosmos DB will automatically scale out and scale in, so it's highly scalable. When it comes to performance, there are a couple of numbers I want you to remember: read latency is less than 10 milliseconds, and write latency is less than 15 milliseconds. Remember these two numbers for how performant Cosmos database is. It's highly available: it has multiple endpoints globally, so when it comes to availability, even if one of the endpoints goes down, we have more endpoints available from where we can read. It's like active-active sites, and it's pretty highly available. And SDKs are available in all the major programming languages with which you can create applications using Cosmos DB. Now I'm taking you back to the next slide, and this is the account page, how it looks and feels, and in the lab we will be creating it, so you have to hang in for the hands-on part. Now, before I go ahead, I want you to understand something pretty important over here. There are four things I want you to understand, so let me use the Microsoft documentation for that purpose. Now check this out over here: we have various types of non-relational data stores.
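The autopilot (autoscale) behaviour mentioned above can be sketched numerically. Azure's autoscale throughput keeps the provisioned RU/s between 10% of the configured maximum and the maximum itself; the function below is an illustrative model of that band, not the billing engine.

```python
def autoscale_rus(observed_demand: float, max_rus: float) -> float:
    """Sketch of autoscale: provisioned throughput floats between
    10% of the configured maximum and the maximum itself."""
    floor = 0.1 * max_rus
    return min(max(observed_demand, floor), max_rus)

print(autoscale_rus(50, 4000))    # quiet period: held at the 400 RU/s floor
print(autoscale_rus(2500, 4000))  # mid-range: follows demand
print(autoscale_rus(9000, 4000))  # spike: capped at the configured maximum
```

So with a 4,000 RU/s maximum you never scale below 400 RU/s and never above 4,000, which is why autoscale suits spiky, unpredictable workloads.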

One of the data stores is called a key-value store, and this is how a key-value pair looks. If I want Cosmos database to become a key-value store and store information in key and value format, then I will select, you see this API option over here, right. In that drop-down menu we will select the Table API. If you select the Table API over there, it will become a key-value store and the information will be stored in this format. Secondly, if I want a document database: a lot of times document databases are pretty easy to work with, they are schemaless and very flexible as such, and schema is imposed on read, not on write, so you can write very fast as well. If I want Cosmos database to behave like a document database, I have two options which I can go for. The first option is I can leave it as the SQL (Core) API, or I can select the MongoDB API over there. If I select Cosmos DB with the SQL API, it is compatible with SQL, so SQL queries can run over there. If I select the MongoDB API, it is still a document database, but now Mongo queries can run over there. Apart from this, there is what is called a graph database. If we select the Gremlin API while creating the Cosmos database, this is how it stores information, in graph format.
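To make the key-value versus document distinction concrete, here is the same logical record shaped both ways. The field names and values are invented for illustration; the point is the flat partition-key/row-key shape of a Table API entity versus the nested, schemaless JSON of a SQL or MongoDB API document.

```python
import json

# Key-value shape (Table API style): a flat entity addressed by
# a partition key plus a row key, with simple scalar properties.
kv_entity = {
    "PartitionKey": "customers",
    "RowKey": "c-1001",
    "name": "Asha",
    "city": "Chennai",
}

# Document shape (SQL/Core or MongoDB API style): nested, schemaless JSON;
# two documents in the same container need not share this structure.
doc = {
    "id": "c-1001",
    "name": "Asha",
    "address": {"city": "Chennai", "country": "India"},
    "orders": [{"sku": "A1", "qty": 2}],
}

print(json.dumps(doc, indent=2))
```

Because the schema is imposed on read, the document version can grow new nested fields per record without any migration, which is the flexibility the trainer refers to.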

Similarly, we have a column-family database. If we select the Cassandra API, this is how it stores information, in the column-family format, and the best-known example of a column-family database in the world is Cassandra. So these are the various modes of Cosmos DB. If somebody says that Cosmos database is multi-model, the meaning is that it can have multiple modes like that: it can have various ways in which it stores information. It's a non-relational database, but it can store information in multiple ways: in a graph manner, as key-value pairs, as documents, or as a column-family database. Now, how do we pay for it? Remember this line which I am saying: one request unit is equal to the amount of resources required to read one KB of document in one second. Remember this. So if I have a one KB document and I want to read it in one second, I have to provision one request unit to do so. If I have 400 KB of documents and I want to read those 400 KB in one second, I need to provision 400 RUs over there. Remember this. So what is a request unit? See, in infrastructure as a service we have RAM, CPU, IOPS and so on and so forth, by which you can provision resources, or, in database terms,
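The rule of thumb above turns into trivial arithmetic. The helper below is just that rule written out, a back-of-the-envelope sizing sketch rather than Azure's actual RU pricing model (real RU charges also depend on operation type, indexing, and consistency level).

```python
def rus_for_read(document_kb: float, seconds: float = 1.0) -> float:
    """Rule of thumb from the lesson: 1 RU = the resources needed
    to read 1 KB in 1 second, so RUs scale linearly with size."""
    return document_kb / seconds

print(rus_for_read(1))    # 1 KB in 1 second  -> 1.0 RU
print(rus_for_read(400))  # 400 KB in 1 second -> 400.0 RUs
```

So doubling the data you want to read per second doubles the throughput you must provision, which is why RU estimation starts from your document sizes and access rates.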

you can provision, let's say, throughput over there: how much we can read and write per second. However, Cosmos database is not infrastructure as a service; it's more like a platform service, a globally distributed database. So what did they do? They gave you one new term, and the new term is called a request unit. A request unit means an amount of resources, and the resources can be any resource: CPU, RAM, IOPS, storage, anything behind the scenes which is required for you to read or write with some sort of throughput. So we pay in request units. You can assign request units to your database or to the containers, and you will be able to read and write accordingly. If you try to read more than what is allocated, obviously we are supposed to get some error, and the error which we get is called 429. 429 basically means the rate limit, the upper limit, was hit. So what do we do? Either we give more resources to the Cosmos database, or we simply enable autopilot over there. Autopilot means that it will automatically scale out and scale in; that's a pretty interesting functionality. Now, whenever databases come into the picture, partitioning will be there, and one of the arts of any database administrator or data engineer is how to select the partition key. If I select the wrong partition key, my queries will not be optimized.
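The standard client-side reaction to a 429 is to back off and retry, which the real SDKs do for you. Below is a self-contained simulation of that pattern: `flaky_read` is an invented stand-in for a rate-limited backend (it is not a Cosmos DB API), succeeding only on the third attempt.

```python
import time

class RateLimited(Exception):
    """Stand-in for the 429 'request rate too large' error."""

def flaky_read(attempt: int) -> str:
    # Simulated backend: the first two attempts exceed provisioned RUs.
    if attempt < 3:
        raise RateLimited("429: provisioned RUs exceeded")
    return "document"

def read_with_backoff(max_retries: int = 5) -> str:
    """Retry on simulated 429s with exponential backoff between attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return flaky_read(attempt)
        except RateLimited:
            time.sleep(0.01 * 2 ** attempt)  # wait longer after each failure
    raise RuntimeError("gave up after repeated 429s")

print(read_with_backoff())  # -> document
```

If you see sustained 429s even with retries, that is the signal to raise the provisioned RUs or switch the database to autoscale, as the trainer says.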

If I don't do partitioning at all, it will be, you can say, pretty bad. And for selecting the right partition key there is no rule of thumb as such; for every scenario, every type of data, every type of queries, the partition key can change. So what do we do? We do reverse engineering. First of all, we go and check what type of queries we'll be running. Maybe let's say I have 10 queries which are running over there; we cannot make everyone happy, we can make maybe 70 or 80 percent of them happy. So out of those 10 queries, you will try to do the partitioning in a manner which keeps six, seven, or maybe eight queries happy. So why have a partition strategy? Having a partition strategy ensures that when your database needs to grow, it can do so easily and continue to perform efficient queries and transactions. What is a partition key? A partition key is the value by which Azure organizes data into logical divisions, and behind the scenes these logical divisions are mapped onto physical partitions by ranges of values. The more values your partition key has, the more scalability you have, but remember one thing: do not select a partition key which has an extremely unique value, for example a cell phone number as such, or a primary key of some sort, something too unique.
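How a partition key's value distribution affects placement can be simulated. The toy hash below stands in for Cosmos DB's real hash partitioning (the service uses its own hash function, not this one), and the sample data is invented: 100 orders that all share one `country` value but spread across eight `city` values.

```python
def partition_of(key_value: str, partitions: int = 4) -> int:
    """Toy hash standing in for Cosmos DB's hash partitioning of the
    partition key value onto a fixed set of partitions."""
    return sum(key_value.encode()) % partitions

# Invented sample data: one dominant country, eight distinct cities.
orders = [{"id": f"o{i}", "country": "IN", "city": f"city-{i % 8}"}
          for i in range(100)]

by_country, by_city = {}, {}
for o in orders:
    p = partition_of(o["country"])
    by_country[p] = by_country.get(p, 0) + 1
    p = partition_of(o["city"])
    by_city[p] = by_city.get(p, 0) + 1

print(by_country)  # one hot partition holds all 100 documents
print(by_city)     # documents spread across all four partitions
```

A single-valued key like `country` here piles everything onto one "hot" partition, while a key with more distinct values spreads the load, which is exactly the trade-off the trainer describes when weighing cardinality.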

If the uniqueness, what we call the cardinality, is too high, if the keys are not repeating at all, then your data will be too distributed, and that is also part of a wrong partition strategy. During the session I will explain more about partitioning; I'll take more time explaining it using the whiteboard and so on and so forth. However, as we were just discussing: to determine the best partition key for a read-heavy workload, review the top three queries out of the five queries you are planning to run, so do reverse engineering and do the partitioning accordingly. For a write-heavy workload, you need to understand the transactional needs of your workload, how we are doing the transactions, ACID, and based on that we choose the partition key. Now, inside the Cosmos database account you have the database over there. A Cosmos database is equal to a database in the relational, SQL sense, you can imagine it like that, and a container is equal to a table in a relational database. If you understand what SQL is, SQL basically has some databases and tables, and tables have rows inside them and so on and so forth. So a container is equal to a table, and the rows inside a table are equal to the documents inside a container.
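The account → database → container → document hierarchy just described can be modelled in a few dataclasses. This is purely an illustrative mental model with invented names like `retaildb` and `/customerId`, not the azure-cosmos SDK's object model.

```python
from dataclasses import dataclass, field

# Mapping from the lesson: database ≈ SQL database, container ≈ table,
# document ≈ row; the account is the wrapper around it all.
@dataclass
class Container:
    name: str
    partition_key_path: str          # e.g. "/customerId"
    documents: list = field(default_factory=list)

@dataclass
class Database:
    name: str
    throughput_rus: int              # throughput can be assigned at this level
    containers: dict = field(default_factory=dict)

@dataclass
class Account:
    name: str
    databases: dict = field(default_factory=dict)

account = Account("retail-account")
db = Database("retaildb", throughput_rus=400)
orders = Container("orders", partition_key_path="/customerId")
orders.documents.append({"id": "o1", "customerId": "c-1001", "total": 42})
db.containers["orders"] = orders
account.databases["retaildb"] = db

print(len(account.databases["retaildb"].containers["orders"].documents))  # -> 1
```

Reading the access path aloud, account to database to container to document, is a handy way to remember which relational concept each level corresponds to.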

Remember that, and a database is equal to a database. So we create databases, we assign the throughput over there, we specify the containers, these containers have some partitions, and inside those we store our data as documents. So let's have a quick revision and a couple more figures I want you to understand. Check this out: you want to ensure there is 99.999 percent availability for reads and writes of your data; how can this be achieved? What we can do is configure multi-region accounts and enable multi-region writes. You just need to enable multiple Cosmos DB endpoints, and then that high availability, you can say the number of nines, will increase. What are the main advantages of using Cosmos database? The main advantage of Cosmos database is basically that it is global in nature, multiple models are there, minimum availability is 99.99 percent, and latency is pretty good, in the order of tens of milliseconds; the actual numbers, as we said, are read latency less than 10 milliseconds and write latency less than 15 milliseconds, to be exact in that particular sense. So Cosmos DB offers globally distributed capabilities out of the box, it has a minimum availability of 99.99 percent, and the response time, the read/write latency, is typically in the order of tens of milliseconds. And remember the actual numbers: less than 10 milliseconds and less than 15 milliseconds over there.
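The jump from four nines to five nines with extra regions can be made intuitive with simple probability. The formula below assumes region outages are independent, which is a teaching simplification, not the wording of the Azure SLA.

```python
def combined_read_availability(per_region: float, regions: int) -> float:
    """If any one region can serve a read, the combined unavailability
    is the product of the per-region unavailabilities (independence
    assumed for illustration; this is not the official SLA formula)."""
    return 1 - (1 - per_region) ** regions

single = 0.9999  # four nines for a single region
multi = combined_read_availability(single, 2)
print(f"{multi:.8f}")  # two regions push read availability past five nines
```

This is the intuition behind "just add endpoints and the nines go up": each extra region multiplies down the chance that no endpoint can serve your read.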

So that was a clip taken from one of the lessons from our step-by-step training program on Data Engineering on Microsoft Azure, DP-203. Now I would like to invite you to a free 90-minute session with a Microsoft Certified expert trainer, where we talk about Azure data engineer training and share information about getting certified by using a step-by-step roadmap to go from complete beginner to a certified Azure Data Engineer. If you are interested, register for a free class by going to 20302. Additionally, we will show a live demo, and we will also share information about the certification exam. So you can register for free by going to this URL, db20302. I will see you in another episode of the Azure video series.