What does it mean that Azure Cosmos DB is multi-model?

Cosmos DB is a single NoSQL data engine, an evolution of DocumentDB. When you create a Cosmos DB account you choose the most relevant API for your use case, which optimises the way you interact with the underlying data store and how the data is persisted into that store.

So, depending on the API chosen, the engine projects the desired model (graph, column-family, key-value, or document) onto the underlying store.

You can only use one API against an account; using multiple is not possible because of the way the data is stored and retrieved. The API dictates the storage model (graph, key-value, column-family, and so on), but they all map back onto the same technology under the hood.

Multi-model, multi-API support

Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family. The core content-model of Cosmos DB’s database engine is based on atom-record-sequence (ARS). Atoms consist of a small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are arrays consisting of atoms, records, or sequences. The database engine can efficiently translate and project different data models onto the ARS-based data model. The core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be exposed as-is as JSON.
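
A small illustration of how an ARS tree lines up with JSON; the document below is invented for the example:

```python
# Sketch of how the atom-record-sequence (ARS) model lines up with JSON.
# Atoms are primitives, records are structs, sequences are arrays.
import json

document = {                      # record: a struct of named values
    "id": "order-42",             # atom: string
    "total": 99.95,               # atom: number
    "shipped": True,              # atom: bool
    "items": [                    # sequence: an array of records
        {"sku": "A1", "qty": 2},  # record nested inside the sequence
        {"sku": "B7", "qty": 1},
    ],
}

# The same ARS tree can be exposed as-is as JSON, as noted above.
print(json.dumps(document, indent=2))
```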

The service also supports popular database APIs for data access and querying. Cosmos DB’s database engine currently supports DocumentDB SQL, MongoDB, Azure Tables (preview), and Gremlin (preview). You can continue to build applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed database service.
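
As a minimal sketch of one of those APIs, the snippet below queries a container through the SQL (DocumentDB) API using the azure-cosmos Python SDK; the endpoint, key, database, container, and query fields are placeholders:

```python
# Minimal sketch: querying a container through the SQL (DocumentDB) API.
# Assumes the azure-cosmos package; endpoint, key, and names are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("orders")

# The SQL API projects a document model onto the same underlying ARS store.
query = "SELECT c.id, c.total FROM c WHERE c.shipped = true"
for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item["id"], item["total"])
```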

This article is referenced here:

https://stackoverflow.com/questions/44304947/what-does-it-mean-that-azure-cosmos-db-is-multi-model

Difference between SSIS, Azure Data Factory and Azure Databricks

For data engineering workloads within the Microsoft landscape, there are multiple options for carrying out data engineering tasks and extracting data from a myriad of data sources. Currently three options are available:

  • SQL Server Integration Services (SSIS): Part of the Microsoft SQL Server suite, SSIS is a well-known and popular ETL tool for data integration, with rich built-in transformations such as aggregations, splits, and joins. Introduced in 2005, it is used mainly on-premises.
  • Azure Data Factory (ADF): Unlike SSIS, ADF is an ELT tool as well as a data orchestration tool for building pipelines that move data across different layers, both from on-premises to the cloud and within the cloud landscape. Its focus areas are:
    • Data movement & orchestration
    • Extract, Load & Transform
    • Transformation activities

People familiar with SSIS can continue to use it, and existing SSIS packages can also be migrated to run within ADF.

Azure Databricks: Azure Databricks is the latest entry in this space. Unlike SSIS and ADF, which are primarily Extract-Transform-Load (ETL), Extract-Load-Transform (ELT), and data orchestration tools, Azure Databricks can handle both data engineering and data science workloads.

In a nutshell, although you can compare and contrast these tools, they actually complement each other. For example, you can call existing SSIS packages from Azure Data Factory and trigger Azure Databricks notebooks from Azure Data Factory, as the sketch below shows.
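
As a rough illustration of that hand-off, the snippet below uses the azure-mgmt-datafactory SDK to kick off an ADF pipeline run from Python; such a pipeline might wrap an Execute SSIS Package activity or a Databricks Notebook activity. The subscription, resource group, factory, pipeline name, and parameters are all placeholders:

```python
# Sketch: triggering an ADF pipeline (which could wrap an Execute SSIS Package
# or a Databricks Notebook activity) from Python. All names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-adf",
    pipeline_name="RunDatabricksNotebook",   # hypothetical pipeline name
    parameters={"inputPath": "raw/2024/"},   # hypothetical pipeline parameter
)
print(run.run_id)  # track the run in the ADF monitoring view
```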

The 7Ws Framework

1-Who is involved?
Person or organization of interest to the enterprise. That is, “Who is important to the business?” Often a ‘who’ is associated with a role such as Customer or Vendor.
For example: Employee, Patient, Gambler, Suspect, Customer, Vendor, Student, Passenger, Competitor. An invoice can have this info: who sold it.

2-What did they do? To what is it done?
Product or service of interest to the enterprise. It often refers to what the organization makes that keeps it in business. That is, “What is important to the business?” For example: Product, Service, Raw Material, Finished Good, Course, Song, Photograph. An invoice can have this info: what was sold.

3-When did it happen?
Calendar or time interval of interest to the enterprise. That is, “When is the business in operation?” For example: Time, Date, Month, Quarter, Year, Calendar, Semester, Fiscal Period, Minute. An invoice can have this info: when it was sold.

4-Where did it happen?
Location of interest to the enterprise. Location can refer to actual places as well as electronic places. That is, “Where is business conducted?” For example: Mailing Address, Distribution Point, Website URL, IP Address. An invoice can have this info: where it was shipped.

5-Why did it happen?
Event or transaction of interest to the enterprise. These events keep the business afloat. That is, “Why is the business in business?” For example: Order, Return, Complaint, Withdrawal, Deposit, Compliment, Inquiry, Trade, Claim. An invoice can have this info: it happened because of an Order.

6-How did it happen – in what manner?
Documentation of the event of interest to the enterprise. Documents record the events, such as a Purchase Order recording an Order event. That is, “How does the business stay in business?” For example: Invoice, Contract, Agreement, Account, Purchase Order, Speeding Ticket. An invoice can have this info: the invoice itself is the document that records the sale.

7-How many or how much was recorded – how can it be measured?
Measurements or counts of interest to the enterprise: the quantities and amounts recorded on the documents. An invoice can have this info: how many items were sold and for how much.

The 7Ws are interrogatives: question-forming words.
The fact table represents verbs; the dimensions that surround it are nouns. Of the 7Ws, the first five map to dimensions, and the sixth and seventh map to the fact, as the sketch below illustrates.
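
A quick sketch of that mapping as a single fact row in a star schema; every name below is invented for illustration:

```python
# Sketch: mapping the 7Ws onto a star-schema fact row. Names are invented.
# The first five Ws become dimension keys on the fact row; the sixth is the
# document reference, and the seventh supplies the measures.
fact_invoice_line = {
    "who_customer_key": 101,           # 1. Who bought it
    "what_product_key": 555,           # 2. What was sold
    "when_date_key": 20240131,         # 3. When it was sold
    "where_location_key": 7,           # 4. Where it was shipped
    "why_order_key": 9001,             # 5. Why (the Order event behind it)
    "how_invoice_number": "INV-0042",  # 6. How it was documented
    "quantity_sold": 3,                # 7. How many...
    "amount": 89.97,                   #    ...and how much
}
```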

Data project design that reflects the ETL approach

An Azure approach that reflects ETL:

  • Source: Identify the source systems to extract from.

In Azure, data sources include Azure Cosmos DB, Azure Data Lake, files, and Azure Blob storage.

  • Ingest: Identify the technology and method to load the data.

During a load, many Azure destinations can accept data formatted as JavaScript Object Notation (JSON), files, or blobs. You might need to write code to interact with application APIs.

Azure Data Factory offers built-in support for Azure Functions. You’ll also find support for many programming languages, including Node.js, .NET, Python, and Java. Although Extensible Markup Language (XML) was common in the past, most systems have migrated to JSON because of its flexibility as a semistructured data type.
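
As a minimal sketch of the ingest step, assuming the Azure Functions Python v2 programming model and the azure-storage-blob SDK: an HTTP-triggered function that lands an incoming JSON payload in Blob storage. The connection string, container, and blob path are placeholders.

```python
# Sketch: HTTP-triggered Azure Function (Python v2 model) that lands an
# incoming JSON payload into Blob storage. Names and paths are placeholders.
import azure.functions as func
from azure.storage.blob import BlobServiceClient

app = func.FunctionApp()

@app.route(route="ingest", methods=["POST"])
def ingest(req: func.HttpRequest) -> func.HttpResponse:
    payload = req.get_body()  # raw JSON bytes from the source system
    blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob = blob_service.get_blob_client(container="raw", blob="landing/payload.json")
    blob.upload_blob(payload, overwrite=True)  # land the payload in the raw zone
    return func.HttpResponse("ingested", status_code=201)
```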

  • Prepare: Identify the technology and method to transform or prepare the data.

The most common tool is Azure Data Factory, which provides robust resources and nearly 100 enterprise connectors. Data Factory also allows you to transform data by using a wide variety of languages.
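
As one hedged example of the prepare step, the PySpark snippet below shows a transformation as it might run in an Azure Databricks notebook triggered by a Data Factory pipeline; the paths and column names are invented:

```python
# Sketch: a prepare/transform step as it might run in a Databricks notebook
# that a Data Factory pipeline triggers. Paths and column names are invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

raw = spark.read.json("/mnt/raw/orders/")         # semistructured JSON landed at ingest
curated = (
    raw.filter(F.col("shipped") == True)          # basic cleansing
       .groupBy("customer_id")
       .agg(F.sum("total").alias("total_spent"))  # simple aggregation
)
curated.write.mode("overwrite").parquet("/mnt/curated/customer_totals/")
```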

  • Analyze: Identify the technology and method to analyze the data.
  • Consume: Identify the technology and method to consume and present the data.

In traditional descriptive analytics projects, we might have transformed data in Azure Analysis Services and then used Power BI to consume the analyzed data. New AI technologies such as Azure Machine Learning services and Azure Notebooks provide a wider range of technologies to automate some of the required analysis.

You might find that you also need a repository to maintain information about your organization’s data sources and dictionaries. Azure Data Catalog can store this information centrally.