This Lab is organized into the following 4 Challenges:
Challenge | Description | Est. Time |
---|---|---|
Challenge 1 | Create a free ADX cluster | 15 Min |
Challenge 2 | Load Data from Azure Storage | 30 Min |
Challenge 3 | Starting with the basics of KQL | 1 Hour |
Challenge 4 | Explore and Transform Data | 45 min |
Each challenge has a set of tasks that need to be completed in order to move on to the next challenge. It is advisable to complete the challenges and tasks in the prescribed order.
In order to receive the "ADX-In-A-Day" digital badge, you will need to complete the tasks marked with ✅ in Lab 1 & Lab 2. Submit your answers for Lab 1 and Lab 2 quizzes in order to receive the "ADX in a Day" digital badge. You may edit your answers after or try again.
ℹ️ Note |
---|
For Lab 1, please submit the results for the tasks marked with ✅ in the following link: Quiz ADX in A Day Lab 1 |
ℹ️ Note |
---|
Please allow us 5 working days to issue the badge |
To use Azure Data Explorer (ADX), you first have to create a free ADX cluster, and create one or more databases in that cluster. Each database has tables. Then you can ingest data into a database so that you can run queries against it.
In this Challenge, you will create a Free cluster and a database. You will run simple KQL query in Kusto Web Explorer (KWE UI).
Tasks:
- Task 1: Create an ADX cluster and Database
- Task 2: Review the free cluster home page and the Azure Data Explorer Web UI
- Task 3: Write your first Kusto Query Language (KQL) query
Expected Learning Outcomes:
- Create and work with Free ADX cluster.
-
Create your free cluster and database here: Free ADX Cluster. If you want to open the ADX Web UI in another Tab click on the link holding down the
CTRL
Key.
-
Have a look at the cluster home page.
-
Click on the Icon My Cluster in the left navigation pane.
On the page My Cluster, you'll see the following:
- Your cluster's name, the option to upgrade to a full cluster, and the option to delete the cluster.
- Cluster details like: cluster's location, and URI links for connecting to your cluster via APIs or other tools.
- Quick actions you can take to get started with your cluster.
- A list of databases in your cluster.
-
Click on the Button Create in the tile Create Database
-
Enter a name for the database in field Database. As an example for the name you can use
FreeTestDB
.
ℹ️ Note |
---|
If you already have a free cluster and just want to create a new database for this lab, use the Create button in the Create database tile. |
What is a Kusto query?
Azure Data Explorer provides a web experience that enables you to connect to your Azure Data Explorer clusters and write and run Kusto Query Language queries. The web experience is available in the Azure portal and as a stand-alone web application, the Azure Data Explorer Web UI, which we will use later.
A Kusto query is a read-only request to process data and return results. The request is stated in plain text that's easy to read. A Kusto query has one or more query statements and returns data in a tabular or graph format.
In the next Challenge, we'll ingest data to the cluster, and then learn the most important concepts in KQL and write interesting queries. In this task, you will write a basic query to get an understanding of the environment.
In this example, you'll use the Azure Data Explorer web interface as a query editor.
Kusto Query Language can also be used in other services that are built on-top of Azure Data Explorer, like:
- Azure Monitor Logs
- Azure Sentinel
- Microsoft Defender for IoT
- Microsoft Defender for Endpoint
- Microsoft Defender for Cloud
- Application Insights
-
We can see our cluster and the database that we created. If you followd the steps above the database is named
FreeTestDB
. -
To run KQL queries, you must select the Query button on the Free Cluster page.
-
Now you can write a simple KQL query:
print "Hello World"
-
Hit the Run button. The query will be executed and its result can be seen in the result grid at the bottom of the page.
ℹ️ Note Windows users can also download Kusto Explorer, a desktop client to run the queries and benefit from advanced features available in the client.
Data ingestion to ADX is the process used to load data records from one or more sources into a table in your ADX cluster. Once ingested, the data becomes available for query.
ADX supports several ingestion methods, including ingestion tools, connectors and plugins, Azure managed pipelines, programmatic ingestion using SDKs, and direct access to ingestion.
Tasks:
- Task 1: Create the raw table - logsRaw
- Task 2: Use the “One-click” UI (User Interface) to ingest data from Azure blob storage
Expected Learning Outcomes:
- Ingest data using one-click ingestion from Azure Blob Storage to your ADX cluster.
-
Go to the Query tab and run the following command to create our table:
.create table logsRaw( Timestamp:datetime, Source:string, Node:string, Level:string, Component:string, ClientRequestId:string, Message:string, Properties:dynamic )
Challenge 2, Task 2: Use the "One-click" UI (User Interface) to ingest data from Azure blob storage container
You need to analyze the system logs for Contoso. The logs have been saved to an Azure Storage Container as CSVs. Let's import them to do the analysis in the platform.
-
Go to the One-Click UI via https://dataexplorer.azure.com/oneclick > click Ingest. Select the name of your free-cluster and database name.
-
Make sure the cluster and the Database fields are correct. In our example the cluster is named
MyFreeCluster
and the database is namedADX in a day
. Select the option Existing table. SelectlogsRaw
from the list. Click Next: Source to proceed.ℹ️ Note We used an example table name as logsRaw
here. You can give any name to your table but be sure to use it in all your queries going forward. -
Select Source Type: Blob container. Leave "Ingestion type" to the default One-Time and "Select source" to the default value Add URL.
-
In Link to source paste the following URL:
https://adxsamplefiles.blob.core.windows.net/publiccsvsamples/logsbenchmark-onegb/2014
-
The Schema defining file should automatically select file:
03/08/00/data.csv.gz
. One file will be used to determine the schema of the data. For this lab, it does not matter which file you choose because all files have the same structure. Click Next: Schema to proceed. -
Under "Data format" select Keep current table schema and deselect Ignore the first record. Click Next: Start ingestion to proceed.
-
The ingestion should be completed within a few seconds, and click Close to finish.
-
Go to the Query page. Run the following KQL query to verify that data was ingested to the table.
logsRaw | count
The
logsRaw
table should have 3,834,012 records.
In this challenge you’ll write queries in Kusto Query Language (KQL) to explore and gain insights from your data.
Tasks:
- Task 0: Journey from SQL to KQL!
- Task 1: Basic KQL queries - explore the data
- Task 2: Explore the table and columns✅
- Task 3: Keep the columns of your interest✅
- Task 4: Filter the output✅
- Task 5: Sorting the results✅
- Task 6: Data profiling✅
- Task 7: Total number of records✅
- Task 8: Aggregations and string operations✅
- Task 9: Render a chart✅
- Task 10: Create bins and visualize time series✅
- Task 11: Shortcuts
Expected Learning Outcomes:
- Know how to write queries with KQL.
- Use KQL to explore data by using the most common operators.
What is a Kusto query?
A Kusto query is a read-only request to process data and return results. The request is stated in plain text that's easy to read, author, and automate. A Kusto query has one or more query statements and returns data in a tabular or graph format.
What is a tabular statement?
The most common kind of query statement is a tabular expression statement. Both its input and its output consist of tables or tabular datasets.
Tabular statements contain zero or more operators. Each operator starts with a tabular input and returns a tabular output. Operators are sequenced by a pipe (|
). Data flows, or is piped, from one operator to the next. The data is filtered or manipulated at each step and then fed into the following step.
It's like a funnel, where you start out with an entire data table. Each time the data passes through another operator, it's filtered, rearranged, or summarized. Because the piping of information from one operator to another is sequential, the query's operator order is important. At the end of the funnel, you're left with a refined output. Let's look at an example query:
logsRaw
| take 10
This query has a single tabular expression statement. The statement begins with a reference to the table logsRaw
and contains the operators take
. Each operator is separated by a pipe.
References:
For all the SQL pros out there, Azure data explorer allows a subset of TSQL queries. Try running the following SQL query in web UI
SELECT COUNT() FROM logsRaw
ℹ️ Note |
---|
Intellisense will not work for SQL queries. |
The primary language to interact with Kusto is KQL (Kusto Query Language). To make the transition and learning experience easier, you can use the explain
operator to translate SQL queries to KQL.
explain SELECT MAX(Timestamp) AS MaxTimestamp FROM logsRaw WHERE Level='Error'
Output of the above query will be a corresponsing KQL query
logsRaw
| where (Level == "Error")
| summarize MaxTimestamp=max(Timestamp)
| project MaxTimestamp
References:
In this task, you will see some KQL examples. For this task, we will use the table logsRaw, which has data we loaded in previous challenge from storage account.
-
Execute the queries and view the results. KQL queries can be used to filter data and return specific information. Now, you'll learn how to choose specific rows of data. The
where
operator filters results that satisfy a certain condition.logsRaw | where Level=="Error" | take 10
The
take
operator samples any number of records from our table without any order. In the above example, we asked to provide 10 random records. -
Find out how many records are in the table
logsRaw | summarize count() // or: count
-
Find out the minimum and maximum Timestamp
logsRaw | summarize min(Timestamp), max(Timestamp)
Azure Data Explorer provides a set of system data types that define all the types of data that can be stored.
Some data types for example are: string
, int
, decimal
, GUID
, bool
, datetime
.
Note, that the data type of the Properties column is dynamic
. The dynamic
data type is special in that it can take on any value of other data types, as well as arrays and property bags (dictionaries).
Our dataset has trace records written by Contoso's DOWNLOADER program (| where Component == "DOWNLOADER"
), which downloads files from blob storage as part of its business operations.
This is how a typical Properties column looks like:
The dynamic
type is extremely beneficial when it comes to storing JSON data, since KQL makes it simple to access fields in JSON and treat them like an independent column: just use either the dot notation (dict.key
) or the bracket notation (dict["key"]
).
The extend
operator adds a new calculated column to the result set, during query time. This allows for the creation of new standalone columns to the result set, from the JSON data in dynamic
columns.
logsRaw
| where Component == "DOWNLOADER"
| take 100
| extend originalSize=Properties.OriginalSize, compressedSize=Properties.compressedSize
Note that although the dynamic type appears JSON-like, it can hold values that the JSON model does not represent because they don't exist in JSON (e.g., long
, real
, datetime
, timespan
, and GUID
).
After subscripting a dynamic object, it is necessary to cast (convert) the value to a simple type in order to utilize them (for example, if you want to summarize the sizes of all the OriginalSize
, you should convert the dynamic
type to a numeric type, like long
).
-
Write a query to get the table that is shown in the image below (we want to convert the
OriginalSize
andCompressedSize
columns tolong
).- Hint 1: Observe there are 2 new columns originalSize and compressedSize with datatype
long
- Hint 2: Accessing a sub-object of a dynamic value yields another dynamic value, even if the sub-object has a different underlying type.
- Hint 3: After subscripting a dynamic object, you must cast the value to a simple type.
- Hint 1: Observe there are 2 new columns originalSize and compressedSize with datatype
❓ Question 3.2 |
---|
What is the "Datatype" of "ColumnType = long ? |
Example result:
References:
You are investigating an incident and wish to review only several columns of the dataset.
- Write a query to get only specific desired columns: Timestamp, ClientRequestId, Level, Message. Take arbitrary 10 records.
❓ Question 3.3 |
---|
If we have to change ClientRequestId column from string to guid datatype, what is the function we should use? |
References:
You are investigating an incident that occurred within a specific time frame.
-
Write a query to get only specific desired columns:
Timestamp
,ClientRequestId
,Level
,Message
. Take all the records between2014-03-08 01:00
and2014-03-08 10:00
.- Hint 1: In case you see 0 records, remember that operators are sequenced by a pipe (
|
). Data is piped, from one operator to the next. The data is filtered or manipulated at each step and then fed into the following step. By using the ‘Take’ operator, there is no guarantee which records are returned
- Hint 1: In case you see 0 records, remember that operators are sequenced by a pipe (
❓ Question 3.4 |
---|
What is the count of records in this timeframe? |
References:
Your system generated an alert indicating a significant decrease in incoming data. You want to check the traces of the "INGESTOR_EXECUTER" [sic] component of the program.
-
Write a query that returns 20 sample records in which the
Component
column equals the word "INGESTOR_EXECUTER" [sic]. -
Once done, rewrite the query to take the top 1 records by the value of
rowCount
(for the "INGESTOR_EXECUTER" [sic] records).-
Hint 1: Extract
rowCount
fromProperties
column -
Hint 2: Think about the datatype and conversion
-
Hint 3: Note the "Executer" [sic] spelling
-
❓ Question 3.5 |
---|
What is the value of rowCount column of this record? |
Example result:
References:
As part of the incident investigation, you want to extract format
and rowCount
from INGESTOR_EXECUTER [sic] component. Rename the calculated fields to fileFormat and rowCount respectively. Also, Make Sure Timestamp
, fileFormat
and rowCount
are the first 3 columns.
❓ Question 3.6 |
---|
How many different file formats are present in this data? |
Example result:
References:
The system comprises of several "components", but you don't know their names or how many records were generated by each.
- Write a query to find out how many records were generated by each component. Use the
Component
column.
❓ Question 3.7 |
---|
What is the count of "DATAACCESS" component? |
References:
You assume that the incident being investigated has a connection to the ingestion process run by Contoso's program.
- Write a query to find out how many records contain the string
'ingestion'
in theMessage
column. Aggregate the results byLevel
.
❓ Question 3.8 |
---|
What is count of "Error" records? |
Example result:
References:
- Write a query to find out how many total records are present per
Level
(aggregated byLevel
) and render a piechart.
❓ Question 3.9 |
---|
What is the "Warning" %? |
Example result:
References:
- Write a query to show a timechart of the number of records in 30 minute bins (buckets). Each point on the timechart represent the number of logs in that bucket.
❓ Question 3.10 |
---|
What is the count of records on March 8th, 10:30 ? |
Example result:
References:
Purpose of this task is to expose some cool productivity features of Azure Data Explorer Web Interface. This task is not evaluated.
There are many keyboard shortcuts available in ADX Web UI and Kusto Explorer to increase productivity while working with KQL.
Below are a few examples
-
You don't have to select a block of code. Based on current cursor position, code that is separated by empty lines is considered a single block of code.
-
You can execute a block of code using
Shift
+Enter
-
You can directly insert filters based on data cells selections using
Ctrl
+Shift
+Space
References:
In this challenge we will explore 3 capabilities of Data Explorer
-
User-defined functions are reusable KQL subqueries that can be defined as part of the query itself (ad-hoc functions), or persisted as part of the database metadata (stored functions - reusable KQL query, with the given name). Stored functions are invoked through a name, are provided with zero or more input arguments (which can be scalar or tabular), and produce a single value (which can be scalar or tabular) based on the function body.
-
Update Policy is like an internal ETL. It can help you manipulate or enrich the data as it gets ingested into the source table (e.g. extracting JSON into separate columns, creating a new calculated column, joining the newly ingested records with a static dimension table that is already in your database, etc). For these cases, using an update policy is a very common and powerful practice.
Each time records get ingested into the source table, the update policy's query (which we'll define in the update policy) will run on them (only on newly ingested records - other existing records in the source table aren’t visible to the update policy when it runs), and the results of the query will be appended to the target table. This function's output schema and target table schema should exactly match.
Tasks:
Expected Learning Outcomes:
- Create user defined functions to use repeatable logic
- Create an update policy to transform the data at ingestion time
For the next task, we will use the logsRaw
table.
-
Create a stored functions, named
ManiputatelogsRaw
, that will contain the code below. Make sure the function works.logsRaw | where Component in ( 'INGESTOR_EXECUTER', 'INGESTOR_GATEWAY', 'INTEGRATIONDATABASE', 'INTEGRATIONSERVICEFLOWS', 'INTEGRATIONSERVICETRACE', 'DOWNLOADER')
❓ Question 4.1 |
---|
What is property used when creating function that is used for UI functions categorization? |
References:
- See the
create function
article.
In this task, we will use an update policy
to filter the raw data in the logsRaw
table (the source table) for ingestion logs, that will be ingested into the new table ingestionLogs
that we’ll create.
-
Build the target table
.create table ingestionLogs ( Timestamp: datetime, Source: string, Node: string, Level: string, Component: string, ClientRequestId: string, Message: string, Properties: dynamic)
-
Create a function for the update policy
**Use the function created in Task 1**
-
Create the update policy(Fill in the blanks) ✅
.alter table ...... policy update @'[{ "IsEnabled": true, "Source": "....", "Query": ".....", "IsTransactional": true, "PropagateIngestionProperties": false} ]'
-
Update policy can transform and move the data from source table from the time it is created. It cannot look back at already existing data in source table. We will ingest new data into logsraw table and see new data flowing into ingestionLogs table
// Note, the following .ingest commands set creationTime to 2014 as you may notice in the file path.
// This param allows to backfill the table with historical data and index it according ot the creationTime setting.
.execute database script <|
.ingest async into table logsRaw (h'https://adxsamplefiles.blob.core.windows.net/publiccsvsamples/logsbenchmark-onegb/2014/03/08/00/data.csv.gz') with (format='csv', creationTime='2024-03-08T00:00:00Z');
.ingest async into table logsRaw (h'https://adxsamplefiles.blob.core.windows.net/publiccsvsamples/logsbenchmark-onegb/2014/03/08/01/data.csv.gz') with (format='csv', creationTime='2024-03-08T01:00:00Z');
.ingest async into table logsRaw (h'https://adxsamplefiles.blob.core.windows.net/publiccsvsamples/logsbenchmark-onegb/2014/03/08/02/data.csv.gz') with (format='csv', creationTime='2024-03-08T02:00:00Z');
.ingest async into table logsRaw (h'https://adxsamplefiles.blob.core.windows.net/publiccsvsamples/logsbenchmark-onegb/2014/03/08/03/data.csv.gz') with (format='csv', creationTime='2024-03-08T03:00:00Z');
.ingest async into table logsRaw (h'https://adxsamplefiles.blob.core.windows.net/publiccsvsamples/logsbenchmark-onegb/2014/03/08/04/data.csv.gz') with (format='csv', creationTime='2024-03-08T04:00:00Z');
ℹ️ Note |
---|
The above database script does not complete immediately. Because we used the async parameter for the .ingest commands, the output of the above script will be OperationId & Result for each of the 5 commands in the database script itself. The detailed progress of each operation in the script can be checked by used the below command |
-
Check progress of the commands
.show operations <operationId>
-
Make sure the data is transformed correctly in the destination table
ingestionLogs | count
Check if the count of ingestionLogs table is 93,648.
ℹ️ Note If the count is not matching for ingestionLogs table, it means that one of the above .ingest
commands have throttled or failed. -
Please run the following command to clean ingestionLogs table
.clear table ingestionLogs data
-
You can then run the above
.ingest
commands one by one and this will result in 93,648 count in ingestionLogs table.- Hint 1: Remember we have already ingested data into logsRaw in Challenge 2. We need the count of records from latest ingestion only.
- Hint 2:
ingestion_time()
is a hidden column that stores ingested datetime of extents. - Hint 3: Use
ago()
to filter for latest records inserted.
-
What is the count of records that were ingested into
ingestionLogs(target)
table with this update policy?
❓ Question 4.2 |
---|
Calculate the ratio => ingestionLogs count / logsRaw count (Only the latest ingestion). Consider 4 digits after decimal points in the output. |
References:
- Kusto update policy - Azure Data Explorer | Microsoft Docs
- Kusto Ingest from Storage | Microsoft Docs
🎉 Congrats! You've completed ADX in a Day Lab 1. Keep going with Lab 2: Advanced KQL, Policies and Visualization