Querying Data in Azure Data Lake Storage Gen 2 from Power BI

Decorative cloud

Post last updated: Dec 6, 2022

There are a few ways to interact with Azure Data Lake Storage Gen 2 from Power BI:

  • Dataflows functionality in the Power BI Service, or

  • Data connector for ADLS Gen 2 in Power BI Desktop <—focus of this post

  • Other more indirect paths, such as when you have Azure Databricks or another compute technology in the middle.

If you want to get familiar with dataflows, check out this summary I wrote about the 3 ways to use Power BI Dataflows. Matthew Roche has a lot of helpful content - start with his table of contents for dataflows posts.

Matthew also has a very good post about using Power BI in conjunction with ADLS Gen 2 which is not associated with dataflows. Great food for thought.

If you’re not yet familiar with what ADLS Gen 2 is, check out this post I wrote for BlueGranite - it’s an older post but still useful to understand what it is: 10 Things to Know About Azure Data Lake Storage Gen 2.

The remainder of this post is on the URLs for using the connector in Power BI Desktop.

Two Ways to Access Azure Data Lake Storage Gen 2

To get data from an ADLS Gen 2 account directly into Power BI Desktop from the data lake (without going through dataflows for this particular scenario), there are two connectivity options:

  1. Use the Azure Data Lake Storage Gen 2 connector within Power BI Desktop in conjunction with the DFS endpoint. This is recommended.

  2. Use the Azure Blob Storage connector within Power BI Desktop in conjunction with the blob endpoint. This utilizes a new feature called Multi-Protocol Access (MPA) so you can use the blob endpoint instead of the DFS endpoint. This is really just for backward compatibility only.

Option 1 above is strongly preferred.

Option 1: Using the ADLS Gen 2 Connector in Power BI Desktop

The ADLS Gen 2 connector is available in Power BI Desktop:

ADLS Gen 2 connector in Power BI Desktop
 

The connector asks for a URL. The URL can be any of the following formats.

All files in the file system:

https://storageacctname.dfs.core.windows.net/filesystemname

or

All files under a directory (including subdirectories):

https://storageacctname.dfs.core.windows.net/filesystemname/directoryname/directoryname

or

One file:

https://storageacctname.dfs.core.windows.net/filesystemname/directoryname/directoryname/filename.ext

What that looks like for me is as follows:

ADLS Gen 2 URL

What gets returned is a flattened list of every file that resides under the path specified in the URL:

ADLS Gen 2 Connector Flattened Data Results

At this point we choose which files we want to combine and then continue on with normal Power BI data cleansing, modeling, and visualization activities which I won’t cover in this post.

Since I specified the entire container (aka filesystem) in the URL above, that means I’m getting all of the data I have permission to view. That could be raw data, curated data, master data, etc. My recommendation is:

Light bulb icon for emphasis

Make your URL as specific as possible, down to the lowest directory you need. Import as few files as possible into Power BI.

If your data lake is designed well, then all of the files underneath a specific path should be of the same file type & same structure which will make it easy to combine those files in Power BI.

Option 2: Using the Blob Storage Connector in Power BI Desktop (or Excel) to Access Data in Azure Data Lake Storage Gen 2

This second option refers to using the Blob Storage connector in Power BI Desktop. This Azure Blob Storage connector does now work when pointed to an ADLS Gen 2 account since Multi-Protocol Access (MPA) is supported, which which allows access to an ADLS Gen 2 account from either the new dfs endpoint or the older blob endpoint. (If you want to know more about why two endpoints are helpful, check out this post.)

Light bulb icon for emphasis

At this point, I do **not** suggest that you use this route for connecting to Power BI. Multi-protocol access, and backwards compatibility with the blob endpoint, is really all about allowing tools that cannot easily use the dfs endpoint to still connect.

Having said that, let me give you the URLs in case you want to use the blob storage connector.

All files in the file system:

https://storageacctname.blob.core.windows.net/filesystemname

or

All files under a directory:

https://storageacctname.blob.core.windows.net/filesystemname/directoryname/directoryname

or

One file:

https://storageacctname.blob.core.windows.net/filesystemname/directoryname/directoryname/filename.ext

Troubleshooting the Power BI Connector

If you get the error “Blob API is not supported for hierarchical namespace accounts” that means you’re trying to use the new connector with the older endpoint. If you change the URL to use DFS instead of Blob, you should be good to go.

You do need specific permissions to the data in ADLS Gen 2 to be able to retrieve the data. There are two levels of permissions to be aware of: role-based access control (RBAC) on the account and access control lists (ACLs) at the directory and file level.