{"id":80959,"date":"2019-02-07T17:44:39","date_gmt":"2019-02-07T17:44:39","guid":{"rendered":"https:\/\/news.microsoft.com\/?p=431199"},"modified":"2019-02-07T17:44:39","modified_gmt":"2019-02-07T17:44:39","slug":"azure-data-explorer-now-available-can-query-1-billion-records-in-under-a-second","status":"publish","type":"post","link":"https:\/\/sickgaming.net\/blog\/2019\/02\/07\/azure-data-explorer-now-available-can-query-1-billion-records-in-under-a-second\/","title":{"rendered":"Azure Data Explorer, now available, can query 1 billion records in under a second"},"content":{"rendered":"<p>As Julia White mentioned in her blog today, we\u2019re pleased to announce the general availability of Azure Data Lake Storage Gen2 and Azure Data Explorer. We also announced the preview of Azure Data Factory Mapping Data Flow. With these updates, Azure continues to be the best cloud for analytics with unmatched price-performance and security. In this blog post, we\u2019ll take a closer look at the technical capabilities of these new features.<\/p>\n<h2>Azure Data Lake Storage &#8211; The no-compromise Data Lake<\/h2>\n<p>Azure Data Lake Storage (ADLS) combines the scalability, cost effectiveness, security model, and rich capabilities of Azure Blob Storage with a high-performance file system that is built for analytics and is compatible with the Hadoop Distributed File System. Customers no longer have to trade off cost effectiveness against performance when choosing a cloud data lake.<\/p>\n<p>One of our key priorities was to ensure that ADLS is compatible with the Apache ecosystem. We accomplished this by developing the Azure Blob File System (ABFS) driver. The ABFS driver is officially part of Apache Hadoop and Spark and is incorporated in many commercial distributions. 
The ABFS driver defines a URI scheme that allows files and folders to be distinctly addressed in the following manner:<\/p>\n<pre>abfs[s]:\/\/file_system@account_name.dfs.core.windows.net\/&lt;path&gt;\/&lt;path&gt;\/&lt;filename&gt;<\/pre>\n<p>It is important to note that the file system semantics are implemented server-side. This approach eliminates the need for a complex client-side driver and ensures high-fidelity file system transactions.<\/p>\n<p>To further boost analytics performance, we implemented a hierarchical namespace (HNS) which supports atomic file and folder operations. This is important because it reduces the overhead associated with processing big data on blob storage. This speeds up job execution and lowers cost because fewer compute operations are required.<\/p>\n<p>The ABFS driver and HNS significantly improve ADLS\u2019 performance, removing scale and performance bottlenecks. This performance enhancement is now available at the same low cost as Azure Blob Storage.<\/p>\n<p>ADLS offers the same powerful data security capabilities built into Azure Blob Storage, such as:<\/p>\n<ul>\n<li>Encryption of data in transit (via TLS 1.2) and at rest<\/li>\n<li>Storage account firewalls<\/li>\n<li>Virtual network integration<\/li>\n<li>Role-based access security<\/li>\n<\/ul>\n<p>In addition, ADLS\u2019 file system provides support for POSIX-compliant access control lists (ACLs). 
With this approach, you can provide granular security protection that restricts access to only authorized users, groups, or service principals and provides file and object data protection.<\/p>\n<p><a href=\"https:\/\/azurecomcdn.azureedge.net\/mediahandler\/acomblog\/media\/Default\/blog\/c8fb8bba-1654-4506-bc96-396ad740b1c8.png\"><img loading=\"lazy\" decoding=\"async\" alt=\"Azure Data Lake Storage diagram.jpg\" border=\"0\" height=\"450\" src=\"http:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2019\/02\/azure-data-explorer-now-available-can-query-1-billion-records-in-under-a-second.png\" title=\"Azure Data Lake Storage diagram.jpg\" width=\"800\" \/><\/a><\/p>\n<p>ADLS is tightly integrated with Azure Databricks, Azure HDInsight, Azure Data Factory, Azure SQL Data Warehouse, and Power BI, enabling an end-to-end analytics workflow that delivers powerful business insights throughout all levels of your organization. Furthermore, ADLS is supported by a global network of big data analytics ISVs and system integrators, including Cloudera and Hortonworks.<\/p>\n<h2>Azure Data Explorer \u2013 The fast and highly scalable data analytics service<\/h2>\n<p>Azure Data Explorer (ADX) is a fast, fully managed data analytics service for real-time analysis on large volumes of streaming data. ADX can query 1 billion records in under a second with no modification of the data or metadata required. ADX also includes native connectors to Azure Data Lake Storage, Azure SQL Data Warehouse, and Power BI and comes with an intuitive query language so that customers can get insights in minutes.<\/p>\n<p>Designed for speed and simplicity, ADX is architected with two distinct services that work in tandem: the Engine service and the Data Management (DM) service. 
Both services are deployed as clusters of compute nodes (virtual machines) in Azure.<\/p>\n<p><a href=\"https:\/\/azurecomcdn.azureedge.net\/mediahandler\/acomblog\/media\/Default\/blog\/1f1a14f4-a0b2-4e12-a9aa-8d48ce6419fc.png\"><img loading=\"lazy\" decoding=\"async\" alt=\"Azure Data Explorer diagram\" border=\"0\" height=\"604\" src=\"http:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2019\/02\/azure-data-explorer-now-available-can-query-1-billion-records-in-under-a-second-1.png\" title=\"Azure Data Explorer diagram\" width=\"1024\" \/><\/a><\/p>\n<p>The Data Management (DM) service ingests various types of raw data and manages failure, backpressure, and data grooming tasks when necessary. The DM service also enables fast data ingestion through a unique method of automatic indexing and compression.<\/p>\n<p>The Engine service is responsible for processing the incoming raw data and serving user queries. It uses a combination of autoscaling and data sharding to achieve speed and scale. The read-only query language is designed to make the syntax easy to read, author, and automate. The language provides a natural progression from one-line queries to complex data processing scripts for efficient query execution.<\/p>\n<p>ADX is available in 41 Azure regions and is supported by a growing ecosystem of partners, including ISVs and system integrators.<\/p>\n<h2>Azure Data Factory Mapping Data Flow \u2013 Visual, zero-code experience for data transformation<\/h2>\n<p>Azure Data Factory (ADF) is a hybrid cloud-based data integration service for orchestrating and automating data movement and transformation. 
ADF provides over 80 built-in connectors to structured, semi-structured, and unstructured data sources.<\/p>\n<p>With Mapping Data Flow in ADF, customers can visually design, build, and manage data transformation processes without learning Spark or having a deep understanding of their distributed infrastructure.<\/p>\n<p><a href=\"https:\/\/azurecomcdn.azureedge.net\/mediahandler\/acomblog\/media\/Default\/blog\/3688c704-5294-460b-9385-13c3b4ee817c.png\"><img loading=\"lazy\" decoding=\"async\" alt=\"Azure Data Factory Mapping Data Flow\" border=\"0\" height=\"885\" src=\"http:\/\/www.sickgaming.net\/blog\/wp-content\/uploads\/2019\/02\/azure-data-explorer-now-available-can-query-1-billion-records-in-under-a-second-2.png\" title=\"Azure Data Factory Mapping Data Flow\" width=\"1532\" \/><\/a><\/p>\n<p>Mapping Data Flow combines a rich expression language with an interactive debugger to easily execute, trigger, and monitor ETL jobs and data integration processes.<\/p>\n<p>Azure Data Factory is available in 21 regions and expanding, and is supported by a broad ecosystem of partners including ISVs and system integrators.<\/p>\n<h2>Azure is the best place for data analytics<\/h2>\n<p>With these technical innovations announced today, Azure continues to be the best cloud for analytics. Learn more about why <a href=\"http:\/\/aka.ms\/simply-unmatched\" target=\"_blank\">analytics in Azure is simply unmatched<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As Julia White mentioned in her blog today, we\u2019re pleased to announce the general availability of Azure Data Lake Storage Gen2 and Azure Data Explorer. We also announced the preview of Azure Data Factory Mapping Data Flow. With these updates, Azure continues to be the best cloud for analytics with unmatched price-performance and security. 
In [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":80960,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[49],"tags":[54,50],"class_list":["post-80959","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-news","tag-azure","tag-recent-news"],"_links":{"self":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/80959","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/comments?post=80959"}],"version-history":[{"count":0,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/posts\/80959\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media\/80960"}],"wp:attachment":[{"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/media?parent=80959"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/categories?post=80959"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sickgaming.net\/blog\/wp-json\/wp\/v2\/tags?post=80959"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}