Runtime¶
Knowing how we run your code will help you prepare your app for Databricks and troubleshoot any issues.
Operating System¶
We host your app in Linux VMs. For security reasons, we also limit the OS calls your app can make. Most apps won't have a problem, given Python's OS interoperability.
Resources¶
Every app gets 2 vCPUs and 8 GB of memory. We will continue to evaluate the best default allocation.
Python¶
We use Python 3.11, and all dependencies are installed with `pip` in a virtual environment. The app command (from `app.yaml`) is executed within the virtual environment context, which means it only has access to executables from the virtual environment.
Contract¶
We expect your app to follow a few guidelines we set. These are fairly standard in runtime environments.
- Must not implement TLS encryption on its own. See Proxy.
- Must handle HTTP/2 cleartext (h2c) requests.
- Should shut down gracefully within 10 seconds of receiving `SIGTERM`; after that, `SIGKILL` is sent.
- Must not require running in a privileged security context.
- Must listen on host `0.0.0.0` and the port given by the `DATABRICKS_APP_PORT` environment variable (currently `8000`); see the sketch after this list.
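For example, here is a minimal sketch of an app honoring the host, port, and graceful-shutdown rules above, assuming the pre-installed `fastapi` and `uvicorn` packages (see Pre-installed libraries below):

```python
# Minimal sketch: bind to 0.0.0.0 and DATABRICKS_APP_PORT, shut down on SIGTERM.
import os

import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def index():
    return {"status": "ok"}

if __name__ == "__main__":
    # Use the Databricks-assigned port; fall back to 8000 for local runs.
    port = int(os.environ.get("DATABRICKS_APP_PORT", "8000"))
    # uvicorn traps SIGTERM and drains in-flight requests, which covers
    # the graceful-shutdown requirement above.
    uvicorn.run(app, host="0.0.0.0", port=port)
```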
Environment Variables¶
The following environment variables are automatically set for the app.
General¶
| Variable | Description |
|---|---|
| `DATABRICKS_APP_NAME` | Name of the app. |
| `DATABRICKS_WORKSPACE_ID` | ID of the Databricks workspace the app belongs to. |
| `DATABRICKS_HOST` | URL of the Databricks workspace. |
| `DATABRICKS_APP_PORT` | Network port to listen on. See Runtime Contract. |
| `DATABRICKS_CLIENT_ID` | Client ID for the app's service principal. |
| `DATABRICKS_CLIENT_SECRET` | Client secret for the app's service principal. |
In addition to the above, we set a few framework-specific environment variables so they just work.
Optionally, you can inject your own environment variables in `app.yaml`.
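In practice, you rarely need to read the credential variables yourself. A sketch, assuming the pre-installed `databricks-sdk`: its default authentication chain picks up `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` from the environment automatically.

```python
# Sketch: the Databricks SDK authenticates from the environment variables above.
import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # no arguments needed; config comes from the environment
print(f"{os.environ.get('DATABRICKS_APP_NAME')} talking to {w.config.host}")
```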
Info
The injected environment variables are also logged, so you can use the logs to troubleshoot; treat them as the source of truth, since docs can be out of date.
Logs¶
Apps must write to `stdout` and `stderr`. There is currently no support for custom log files.
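For example, with Python's standard `logging` module, a sketch that routes everything to `stdout`:

```python
# Sketch: send application logs to stdout so the platform captures them.
import logging
import sys

logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).info("app started")
```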
Pre-installed libraries¶
To make your life easier, we pre-install several libraries, so you don't even need to add a `requirements.txt` file.
```
databricks-sql-connector~=3.0
databricks-connect~=14.0
databricks-sdk~=0.26
gradio~=4.0
streamlit~=1.0
dash~=2.0
flask~=3.0
fastapi~=0.110
uvicorn[standard]~=0.29
gunicorn~=22.0
pandas
numpy
```
Warning
Please note that `PySpark` is NOT compatible with `databricks-connect`. We deliberately pre-install `databricks-connect`, as it is the recommended approach for interacting with Spark from Databricks Apps. See Best Practices.
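Since `databricks-connect` is the supported route to Spark, here is a hedged sketch of acquiring a session with it; the table name is purely illustrative, and depending on your workspace you may need to target a specific cluster or serverless compute.

```python
# Sketch: get a Spark session via databricks-connect (pre-installed).
# Credentials come from the environment variables listed above.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# "samples.nyctaxi.trips" is an illustrative table; substitute your own.
df = spark.read.table("samples.nyctaxi.trips")
print(df.limit(5).toPandas())
```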
Proxy¶
The app runs behind a reverse proxy, so make sure your app does not make assumptions about the request origin. We set the necessary configuration for Streamlit and Gradio.
Some frameworks use `X-Forwarded-*` headers. We follow industry standards and set the headers below.
| Header | Description |
|---|---|
| `X-Forwarded-Host` | Original request URL. |
| `X-Forwarded-Preferred-Username` | Username provided by the IdP. |
| `X-Forwarded-User` | User ID provided by the IdP. |
| `X-Forwarded-Email` | User email provided by the IdP. |
| `X-Forwarded-Access-Token` | Databricks access token of the end user. |
| `X-Real-Ip` | Real IP of the user's machine. |
| `X-Request-Id` | UUID of the request. |
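For example, a sketch (again assuming the pre-installed `fastapi`) of reading the identity headers the proxy injects:

```python
# Sketch: read the per-request identity headers set by the reverse proxy.
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/whoami")
def whoami(request: Request):
    # Header lookups are case-insensitive in Starlette/FastAPI.
    return {
        "user": request.headers.get("x-forwarded-preferred-username"),
        "email": request.headers.get("x-forwarded-email"),
        "request_id": request.headers.get("x-request-id"),
    }
```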
TODO¶
- Maximum request size
- Default settings on the reverse proxy that may be important to customers
- Any limitations on WebSockets, keepalive, and timeouts
- Protocols we support or limit (gRPC, WebSocket, etc.) beyond the HTTP/2 cleartext (h2c) requirement
- CORS headers and `X-Frame-Options` we control or recommend for embedding the app
- Number of concurrent connections