Accessing Azure Blob Storage in Azure Databricks

There are two different ways to access Azure Blob Storage from Azure Databricks:
  1. Mounting the storage to DBFS
  2. Directly accessing the storage

Mounting Azure Blob Storage

When we mount Azure Blob Storage in Databricks, it behaves like DBFS: we can run all the DBFS commands on the mount point, and the mount point persists until we unmount it. If a Blob Storage container is mounted using a storage account access key, DBFS uses temporary SAS tokens derived from the storage account key when it accesses the mount point. Below is the code to mount Azure Blob Storage.

Python code

dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}) 

  • <conf-key> can be either fs.azure.account.key.<storage-account-name>.blob.core.windows.net (when authenticating with a storage account access key) or fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net (when authenticating with a SAS token)

Scala code 

dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = Map("<conf-key>" -> dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")))
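Once the container is mounted (from either language), it can be used like any other DBFS path. Below is a minimal Python sketch; the mount name matches the placeholder used above, and the file name data.csv is a hypothetical example.

# List the contents of the mounted container
display(dbutils.fs.ls("/mnt/<mount-name>"))

# Read a file through the mount point like any DBFS path
df = spark.read.format("csv").option("header", "true").load("/mnt/<mount-name>/data.csv")

# Unmount when the mount point is no longer needed
dbutils.fs.unmount("/mnt/<mount-name>")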

Directly accessing Azure Blob Storage

Using account key (Python)

spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  "<storage-account-access-key>")

Using SAS (Python)

spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
  "<complete-query-string-of-sas-for-the-container>")

When you access Azure Storage directly, the scope of access is limited to your notebook, since the credential lives in the Spark session configuration. If you mount the storage instead, you can access the data across different notebooks.
