Runtime

Knowing how we run your code will help you prepare your app for Databricks and troubleshoot any issues.

Operating System

We host your app in Linux VMs. For security reasons, we also limit the OS calls your app can make. Most apps are unaffected, given Python's OS interoperability.

Resources

Every app gets 2 vCPUs and 8 GB of memory. We will continue to evaluate whether this is the best default.

Python

We use Python 3.11, and all dependencies are installed with pip in a virtual environment. The app command (from app.yaml) is executed within the virtual environment context, which means that it only has access to executables from the virtual environment.
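Because the command runs inside the virtual environment, console scripts installed by pip resolve without absolute paths. For example, an app.yaml command like the following sketch (app.py is a hypothetical entry point) finds the streamlit executable from the virtual environment:

```yaml
# "streamlit" resolves to the executable installed in the app's virtual environment.
command: ["streamlit", "run", "app.py"]
```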

Contract

We expect your app to follow a few guidelines; these are fairly standard in runtime environments. Your app:

  1. Must not implement TLS encryption on its own. See Proxy.
  2. Must handle HTTP/2 requests in cleartext (h2c) format.
  3. Should shut down gracefully within 10 seconds of receiving SIGTERM; after that, it receives SIGKILL.
  4. Must not require a privileged security context.
  5. Must listen on host 0.0.0.0 and the DATABRICKS_APP_PORT network port (currently 8000), as shown in the sketch below.
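For illustration, a minimal sketch that satisfies the port and shutdown requirements, assuming the pre-installed FastAPI and uvicorn (uvicorn traps SIGTERM and drains connections on shutdown):

```python
import os

import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def index():
    return {"status": "ok"}

if __name__ == "__main__":
    # Bind to 0.0.0.0 and the platform-assigned port (currently 8000).
    port = int(os.environ.get("DATABRICKS_APP_PORT", "8000"))
    # uvicorn handles SIGTERM and shuts down gracefully, which fits the
    # 10-second window before SIGKILL.
    uvicorn.run(app, host="0.0.0.0", port=port)
```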

Environment Variables

The following environment variables are automatically set for the app.

General

| Variable | Description |
| --- | --- |
| DATABRICKS_APP_NAME | Name of the app. |
| DATABRICKS_WORKSPACE_ID | ID of the Databricks workspace the app belongs to. |
| DATABRICKS_HOST | URL of the Databricks workspace. |
| DATABRICKS_APP_PORT | Network port to listen on. See Runtime Contract. |
| DATABRICKS_CLIENT_ID | Client ID for the app's service principal. |
| DATABRICKS_CLIENT_SECRET | Client secret for the app's service principal. |
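For instance, the pre-installed databricks-sdk picks these up automatically, so an app can authenticate as its service principal without explicit configuration; a minimal sketch:

```python
from databricks.sdk import WorkspaceClient

# With no arguments, the SDK reads DATABRICKS_HOST, DATABRICKS_CLIENT_ID,
# and DATABRICKS_CLIENT_SECRET from the environment and authenticates as
# the app's service principal (OAuth machine-to-machine).
w = WorkspaceClient()
print(w.current_user.me().user_name)
```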

In addition to the above, we set a few framework-specific environment variables so supported frameworks work out of the box.

Optionally, you can inject your own environment variables in app.yaml, as shown below.
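A sketch of an app.yaml that sets a hypothetical LOG_LEVEL variable alongside the start command:

```yaml
command: ["python", "app.py"]
env:
  - name: "LOG_LEVEL"   # hypothetical variable, for illustration only
    value: "DEBUG"
```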

Info

The environment variables we inject are also logged, so you can check the logs when troubleshooting; treat the logs as the source of truth if these docs fall out of date.

Logs

Apps must write their logs to stdout and stderr. There is currently no support for custom log files.
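A minimal sketch of routing Python's standard logging to stdout so messages show up in the app's log stream:

```python
import logging
import sys

# Send all log records to stdout; the platform captures stdout/stderr.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).info("app started")
```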

Pre-installed libraries

To make your life easier, we pre-install several libraries so you can get started without a requirements.txt file.

  • databricks-sql-connector~=3.0
  • databricks-connect~=14.0
  • databricks-sdk~=0.26
  • gradio~=4.0
  • streamlit~=1.0
  • dash~=2.0
  • flask~=3.0
  • fastapi~=0.110
  • uvicorn[standard]~=0.29
  • gunicorn~=22.0
  • pandas
  • numpy

Warning

Please note that PySpark is not compatible with databricks-connect. We deliberately pre-install databricks-connect, as it is the recommended way to interact with Spark from Databricks Apps. See Best Practices.
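For illustration, a minimal sketch of running a Spark query with the pre-installed databricks-connect, assuming compute is configured via the environment; the table name is a placeholder:

```python
from databricks.connect import DatabricksSession

# DatabricksSession reads connection details (host, credentials, compute)
# from the environment.
spark = DatabricksSession.builder.getOrCreate()

# "samples.nyctaxi.trips" is an example dataset; substitute your own table.
df = spark.read.table("samples.nyctaxi.trips")
print(df.limit(5).toPandas())
```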

Proxy

The app runs behind a reverse proxy, so make sure your app does not make assumptions about the request origin. We set the necessary configuration for Streamlit and Gradio automatically.

Some frameworks rely on X-Forwarded-* headers. We follow industry standards and set the following headers:

| Header | Description |
| --- | --- |
| X-Forwarded-Host | Original request URL. |
| X-Forwarded-Preferred-Username | Username provided by the IdP. |
| X-Forwarded-User | User ID provided by the IdP. |
| X-Forwarded-Email | User email provided by the IdP. |
| X-Forwarded-Access-Token | Databricks access token of the end user. |
| X-Real-Ip | Real IP of the user's machine. |
| X-Request-Id | UUID of the request. |
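As an illustration, a minimal sketch (using the pre-installed FastAPI) that reads the identity headers; header names are case-insensitive in HTTP:

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/whoami")
async def whoami(request: Request):
    # The reverse proxy injects the end user's identity on every request.
    return {
        "user": request.headers.get("x-forwarded-user"),
        "username": request.headers.get("x-forwarded-preferred-username"),
        "email": request.headers.get("x-forwarded-email"),
        "request_id": request.headers.get("x-request-id"),
    }
```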

TODO

  • Maximum request size
  • Default reverse-proxy settings that may matter to customers
  • Any limitations around WebSockets, keepalive, and timeouts
  • Protocols we support or limit (gRPC, WebSocket, etc.) beyond the h2c requirement
  • CORS headers / X-Frame-Options we control or recommend for embedding the app
  • Number of concurrent connections