
# Databricks Python pip authentication

Before the release of Databricks Unity Catalog, we used init scripts stored in DBFS to generate the pip.conf file during cluster startup, giving each cluster its own auth token. But with init scripts no longer available in Unity Catalog's shared access mode, an alternative approach is required.

I haven't tested all of the methods below: only Method 1 and Method 2. Method 3 and Method 4 are from here.

Unity Catalog requires Databricks Runtime 11.3 LTS or above.

## Method 1: Preparing a pip.conf file in advance

A workaround is to place a prepared pip.conf in the Databricks workspace and set the PIP_CONFIG_FILE environment variable to point to this file. This method, however, raises security concerns: the pip.conf file, which contains the auth token, is accessible to the entire workspace, potentially exposing the token to all users and clusters. See here for the details of this workaround.
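
As a minimal sketch, assuming the file lives at /Workspace/Shared/pip/pip.conf and the private index is an Azure DevOps feed (both hypothetical), the file would look like:

```
[global]
index-url = https://build:<auth-token>@pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/
```

The cluster then gets the environment variable PIP_CONFIG_FILE=/Workspace/Shared/pip/pip.conf (cluster settings, Advanced options, Spark, Environment variables) so that pip picks the file up.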

In contrast, Unity Catalog's single user access mode retains init script support. There, the pip auth token is stored securely in a vault and accessed via a Databricks secret scope: upon cluster startup, the init script fetches the token and generates the pip.conf file. This approach is considerably more secure than the shared mode workaround.
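
For illustration, such an init script might look like the sketch below. All names are placeholders, and it assumes the cluster defines an environment variable PIP_TOKEN={{secrets/pip-scope/pip-token}} so that Databricks injects the secret before the script runs:

```bash
#!/bin/bash
# Sketch of a cluster init script (hypothetical names throughout).
# Assumes the cluster's environment variables include
#   PIP_TOKEN={{secrets/pip-scope/pip-token}}
# so the secret is resolved by Databricks at startup.
set -euo pipefail

mkdir -p /etc/pip

# pip reads /etc/pip/pip.conf globally on Linux, so no PIP_CONFIG_FILE is needed.
cat > /etc/pip/pip.conf <<EOF
[global]
index-url = https://build:${PIP_TOKEN}@pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/
EOF
```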

## Method 2: Keep using init scripts, but store them in Azure ADLS Gen2 instead of DBFS

Unity Catalog's shared access mode does not allow init scripts stored in DBFS. However, an init script can be stored in Azure ADLS Gen2 and referenced via an ABFSS URI. You also need to configure credentials so the cluster can connect to Azure ADLS Gen2, as sketched below.
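
A rough sketch of the cluster configuration, with placeholder names throughout (storage account, container, secret scope, service principal):

```
# Init script path (Advanced options -> Init scripts, source type ABFSS):
abfss://scripts@mystorageaccount.dfs.core.windows.net/init/pip-auth.sh

# Spark config so the cluster can read the script with a service principal;
# the client secret is pulled from a Databricks secret scope:
spark.hadoop.fs.azure.account.auth.type.mystorageaccount.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.mystorageaccount.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.mystorageaccount.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.mystorageaccount.dfs.core.windows.net {{secrets/pip-scope/sp-secret}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.mystorageaccount.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```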

Refer to this PDF file for details.

Databricks Runtime versions from 11.3 LTS up to, but not including, 13.3 LTS might not be supported; I haven't tested them.

## Method 3: Keep using init scripts in DBFS with an allowlist

Refer to this link, and be aware that this feature is only available in Databricks Runtime 13.3 LTS or above.
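
For reference, the allowlist can be maintained in Catalog Explorer or, as in the hypothetical sketch below, through the Unity Catalog artifact-allowlists REST API; the workspace URL and script path are placeholders, and a metastore admin token is required:

```bash
# Allowlist every init script under dbfs:/databricks/init-scripts/ (hypothetical path).
curl -X PUT "https://<workspace-url>/api/2.1/unity-catalog/artifact-allowlists/INIT_SCRIPT" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -d '{
        "artifact_matchers": [
          {"artifact": "dbfs:/databricks/init-scripts/", "match_type": "PREFIX_MATCH"}
        ]
      }'
```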

## Method 4: Keep using init scripts, but store them in a UC volume instead of DBFS

Refer to this PDF file for details, and be aware that this feature is only available in Databricks Runtime 13.3 LTS or above. The cluster references the script by its volume path, as sketched below.
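
As a sketch, the init script entry of the cluster spec simply points into a volume; the catalog, schema, and volume names below are hypothetical, and the cluster needs READ VOLUME permission on the volume:

```
"init_scripts": [
  {"volumes": {"destination": "/Volumes/main/ops/scripts/pip-auth.sh"}}
]
```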
