Runtime¶
Knowing how we run your code will help you prepare your app for Databricks and troubleshoot any issues.
Operating System¶
We host your app in Linux VMs. For security reasons, we also limit the OS calls your app can make. Most apps won't have a problem, given Python's OS interoperability.
Resources¶
Every app gets 2 vCPUs and 8 GB of memory. We will continue to evaluate the best default allocation.
Python¶
We use Python 3.11, and all dependencies are installed with `pip` in a virtual environment. The app command (from `app.yaml`) is executed within the virtual environment context, which means it only has access to executables from the virtual environment.
Contract¶
We expect your app to follow a few guidelines we set. These are fairly standard in runtime environments.
- Must not implement TLS encryption on its own. See Proxy.
- Must handle HTTP/2 cleartext (h2c) requests.
- Should shut down gracefully within 10 seconds of receiving `SIGTERM`; after that, `SIGKILL` is sent.
- Must not require running in a privileged security context.
- Must listen on host `0.0.0.0` and the port given by the `DATABRICKS_APP_PORT` environment variable (currently `8000`); see the sketch after this list.
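For example, here is a minimal sketch of an app honoring the host, port, and graceful-shutdown rules above, assuming the pre-installed `fastapi` and `uvicorn` packages (see Pre-installed libraries below):

```python
# Minimal sketch: bind to 0.0.0.0 and DATABRICKS_APP_PORT, shut down on SIGTERM.
import os

import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def index():
    return {"status": "ok"}

if __name__ == "__main__":
    # Use the Databricks-assigned port; fall back to 8000 for local runs.
    port = int(os.environ.get("DATABRICKS_APP_PORT", "8000"))
    # uvicorn traps SIGTERM and drains in-flight requests, which covers
    # the graceful-shutdown requirement above.
    uvicorn.run(app, host="0.0.0.0", port=port)
```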
Environment Variables¶
The following environment variables are automatically set for the app.
General¶
| Variable | Description |
|---|---|
| `DATABRICKS_APP_NAME` | Name of the app. |
| `DATABRICKS_WORKSPACE_ID` | ID of the Databricks workspace the app belongs to. |
| `DATABRICKS_HOST` | URL of the Databricks workspace. |
| `DATABRICKS_APP_PORT` | Network port to listen on. See Runtime Contract. |
| `DATABRICKS_CLIENT_ID` | Client ID for the app's service principal. |
| `DATABRICKS_CLIENT_SECRET` | Client secret for the app's service principal. |
In addition to the above, we set a few framework-specific environment variables so they just work.
Optionally, you can inject your own environment variables in `app.yaml`.
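In practice, you rarely need to read the credential variables yourself. A sketch, assuming the pre-installed `databricks-sdk`: its default authentication chain picks up `DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` from the environment automatically.

```python
# Sketch: the Databricks SDK authenticates from the environment variables above.
import os

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # no arguments needed; config comes from the environment
print(f"{os.environ.get('DATABRICKS_APP_NAME')} talking to {w.config.host}")
```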
Info
The injected environment variables are also logged, so you can use the logs to troubleshoot; treat them as the source of truth, since docs can be out of date.
Logs¶
Apps must write to `stdout` and `stderr`. There is currently no support for custom log files.
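For example, with Python's standard `logging` module, a sketch that routes everything to `stdout`:

```python
# Sketch: send application logs to stdout so the platform captures them.
import logging
import sys

logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logging.getLogger(__name__).info("app started")
```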
Pre-installed libraries¶
To make your life easier, we pre-install several libraries, so you don't even need to add a `requirements.txt` file.
```
databricks-sql-connector~=3.0
databricks-connect~=14.0
databricks-sdk~=0.26
gradio~=4.0
streamlit~=1.0
dash~=2.0
flask~=3.0
fastapi~=0.110
uvicorn[standard]~=0.29
gunicorn~=22.0
pandas
numpy
```
Warning
Please note that `PySpark` is NOT compatible with `databricks-connect`. We deliberately pre-install `databricks-connect`, as it is the recommended approach for interacting with Spark from Databricks Apps. See Best Practices.
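Since `databricks-connect` is the supported route to Spark, here is a hedged sketch of acquiring a session with it; the table name is purely illustrative, and depending on your workspace you may need to target a specific cluster or serverless compute.

```python
# Sketch: get a Spark session via databricks-connect (pre-installed).
# Credentials come from the environment variables listed above.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# "samples.nyctaxi.trips" is an illustrative table; substitute your own.
df = spark.read.table("samples.nyctaxi.trips")
print(df.limit(5).toPandas())
```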
Proxy¶
The app runs behind a reverse proxy, so make sure your app does not make assumptions about the request origin. We set the necessary configuration for Streamlit and Gradio.
Some frameworks use `X-Forwarded-*` headers. We follow industry standards and set the headers below.
| Header | Description |
|---|---|
| `X-Forwarded-Host` | Original request URL. |
| `X-Forwarded-Preferred-Username` | Username provided by the IdP. |
| `X-Forwarded-User` | User ID provided by the IdP. |
| `X-Forwarded-Email` | User email provided by the IdP. |
| `X-Forwarded-Access-Token` | Databricks access token of the end user. |
| `X-Real-Ip` | Real IP of the user's machine. |
| `X-Request-Id` | UUID of the request. |
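For example, a sketch (again assuming the pre-installed `fastapi`) of reading the identity headers the proxy injects:

```python
# Sketch: read the per-request identity headers set by the reverse proxy.
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/whoami")
def whoami(request: Request):
    # Header lookups are case-insensitive in Starlette/FastAPI.
    return {
        "user": request.headers.get("x-forwarded-preferred-username"),
        "email": request.headers.get("x-forwarded-email"),
        "request_id": request.headers.get("x-request-id"),
    }
```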
TODO¶
- Maximum request size
- Default settings on the reverse proxy that may be important to customers
- Any limitations on WebSockets, keepalive, and timeouts
- Protocols we support or limit (gRPC, WebSocket, etc.) beyond the HTTP/2 cleartext (h2c) requirement
- CORS headers and `X-Frame-Options` we control or recommend for embedding the app
- Number of concurrent connections