TechWriter at work blog

Living and writing documentation at Documatt, a small team of programmers who write documentation too.


Creating URL-encoded URLs in Falcon web framework

REST APIs sometimes pass another URL or path as part of the API’s own URL. For example, /cache/foo%2Fbar.html passes foo/bar.html to the /cache/ endpoint. Building such APIs in Python is difficult because of an unfortunate omission in the WSGI specification: it provides no means to access the raw request URL. WSGI gives web frameworks only the already URL-decoded PATH_INFO CGI variable. For the previous example, a framework actually receives /cache/foo/bar.html, which likely leads to 404 Not Found because no route matches cache/foo/bar.html.

Consider our example caching endpoint /cache/<path>. Getting foo.html is as easy as /cache/foo.html. However, if the path itself contains characters that have a special meaning in a URL, such as /, it must be URL-encoded. To obtain foo/bar.html, request /cache/foo%2Fbar.html.
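To make the problem concrete, here is a minimal sketch using only the standard library. It is not Falcon code: urllib.parse.unquote() merely stands in for the decoding a WSGI server applies before filling PATH_INFO, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# The path we want to pass as a single URL segment.
target = "foo/bar.html"

# safe="" forces "/" to be percent-encoded as well.
encoded = quote(target, safe="")
request_path = f"/cache/{encoded}"

# Roughly what a framework sees in PATH_INFO after the server decodes the URL.
decoded_path = unquote(request_path)

raw_segments = request_path.strip("/").split("/")
decoded_segments = decoded_path.strip("/").split("/")

print(request_path, raw_segments)
print(decoded_path, decoded_segments)
```

The raw path has two segments, so it can match a route like /cache/<path>. The decoded path has three, and the information about which slash was part of the data is gone.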

We love the Falcon web framework at Documatt, and it’s the foundation of our backends. Unfortunately, Falcon also suffers from this WSGI heritage; its FAQ explains the forward slash issue well.

However, the complete workaround for correctly routing %-encoded slashes in Falcon involves a little more.

Step 1: Middleware

The workaround assumes a WSGI HTTP server that provides a non-standard CGI variable holding the raw request URI, i.e., the URI as received, still percent-encoded. For example, Gunicorn exposes the RAW_URI variable.

Again, the Falcon documentation does an excellent job and offers a ready-made middleware recipe that patches the request path.

I only improve on it by raising an error if the Falcon app is launched under a WSGI server that does not provide the raw URI CGI variable.

class RawPathComponent:
    def process_request(self, req, resp):
        raw_uri = req.env.get("RAW_URI") or req.env.get("REQUEST_URI")

        if not raw_uri:
            raise AssertionError(
                "To properly route request URIs containing percent encoded "
                "slashes (%2F), Falcon must be started with WSGI server "
                "providing raw URI via non-standard CGI variable. See "
                "https://falcon.readthedocs.io/en/stable/user/recipes/raw-url-path.html."
            )

        # NOTE: Reconstruct the percent-encoded path from the raw URI.
        req.path, _, _ = raw_uri.partition("?")

The resource class:

class URLResource:
    def on_get(self, req, resp, url):
        resp.media = {'url': url}

The middleware registration:

app = falcon.App(middleware=[RawPathComponent()])

And the route declaration:

app.add_route('/cache/{url}', URLResource())
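The core of the middleware can be tried without Falcon or a server. The sketch below pulls the same logic into a hypothetical helper, raw_path_from_environ(), and feeds it two hand-written environ dictionaries. Both the helper name and the sample dictionaries are assumptions made for illustration.

```python
def raw_path_from_environ(environ):
    """Return the percent-encoded path from a non-standard raw URI variable."""
    raw_uri = environ.get("RAW_URI") or environ.get("REQUEST_URI")
    if not raw_uri:
        raise LookupError("WSGI server provides neither RAW_URI nor REQUEST_URI")
    # Everything before the first "?" is the path; the rest is the query string.
    path, _, _ = raw_uri.partition("?")
    return path


# Hand-written examples, not captured from real servers.
with_raw_uri = {
    "PATH_INFO": "/cache/foo/bar.html",
    "RAW_URI": "/cache/foo%2Fbar.html?refresh=1",
}
without_raw_uri = {"PATH_INFO": "/cache/foo/bar.html"}

print(raw_path_from_environ(with_raw_uri))

try:
    raw_path_from_environ(without_raw_uri)
except LookupError as error:
    print("refused:", error)
```

The query string is dropped because only the path is assigned to req.path in the middleware.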

Step 2: Converter

Unfortunately, the responder method on_get() receives the URI template field {url} in the url argument in a raw form, i.e., percent-encoded (foo%2Fbar.html).

To translate it to foo/bar.html, call falcon.uri.decode(value):

class URLResource:
    def on_get(self, req, resp, url):
        # NOTE: url here is potentially percent-encoded.
        url = falcon.uri.decode(url)

        resp.media = {'url': url}

A better approach is to create a custom field converter that performs the decoding for us:

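import falcon
from falcon.routing import BaseConverter


class EncodedUrlConverter(BaseConverter):
    """
    Decode a percent-encoded field value, e.g., ``foo%2Fbar.html`` to ``foo/bar.html``.
    """
    def convert(self, value):
        return falcon.uri.decode(value)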

Register it with the Falcon app under a name of your choice, e.g., percent:

app.router_options.converters["percent"] = EncodedUrlConverter

And use it in the route declaration:

app.add_route('/cache/{url:percent}', URLResource())
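The order of operations is what makes the two steps work together: the route is matched against the still-encoded path first, and only the captured field is decoded afterwards. The following sketch imitates that order with a regular expression and urllib.parse.unquote(). It is not how Falcon's router is implemented, and unquote() is only a stand-in for falcon.uri.decode(), which may differ in details.

```python
import re
from urllib.parse import unquote

# Toy equivalent of the '/cache/{url}' template: one segment without "/".
CACHE_ROUTE = re.compile(r"^/cache/(?P<url>[^/]+)$")


def match_cache_route(path):
    """Match first, decode second; return None when the route does not match."""
    match = CACHE_ROUTE.match(path)
    if match is None:
        return None
    return unquote(match.group("url"))


# Raw path, as patched in by the middleware: matches, then decodes.
print(match_cache_route("/cache/foo%2Fbar.html"))

# Already-decoded path, as in plain PATH_INFO: the extra "/" prevents a match.
print(match_cache_route("/cache/foo/bar.html"))
```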

Step 3: Replace simple_server with Gunicorn

From now on, your Falcon app relies on a WSGI server that provides the raw URI in a non-standard CGI variable, like Gunicorn.

A typical Falcon app often contains wsgiref.simple_server.make_server(), a very simple built-in WSGI server implementation that is useful for local debugging because it works with pdb.

from wsgiref.simple_server import make_server
from resources import URLResource
import falcon

app = falcon.App()

app.add_route('/cache/{url:percent}', URLResource())

if __name__ == '__main__':
    with make_server('', 8000, app) as httpd:
        print('Serving on port 8000...')

        # Serve until process is killed
        httpd.serve_forever()

Running the Falcon app this way will break our workaround since simple_server provides no raw URI CGI variable.
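You can check this yourself with a short standard-library experiment. The sketch below starts simple_server on a free local port, sends one request with a percent-encoded slash, and records the environ passed to a throwaway WSGI app. The helper capture_environ() and the QuietHandler class are names invented for this example.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


class QuietHandler(WSGIRequestHandler):
    """Request handler that skips the per-request log line."""

    def log_message(self, format, *args):
        pass


def capture_environ(path):
    """Serve one request with simple_server and return the environ the app saw."""
    captured = {}

    def app(environ, start_response):
        captured.update(environ)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    # Port 0 asks the operating system for any free port.
    with make_server("127.0.0.1", 0, app, handler_class=QuietHandler) as httpd:
        port = httpd.server_address[1]
        worker = threading.Thread(target=httpd.handle_request)
        worker.start()
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()
        worker.join()
    return captured


environ = capture_environ("/cache/foo%2Fbar.html")
print("PATH_INFO:", environ["PATH_INFO"])
print("RAW_URI present:", "RAW_URI" in environ)
print("REQUEST_URI present:", "REQUEST_URI" in environ)
```

PATH_INFO arrives already decoded and neither raw URI variable is set, so the middleware from Step 1 would raise its error under this server.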

Let’s replace simple_server with a minimal but real Gunicorn server (see Gunicorn custom integration). It works with pdb too.

if __name__ == "__main__":
    from gunicorn.app.base import BaseApplication

    class StandaloneApplication(BaseApplication):
        def load_config(self):
            self.cfg.set("bind", "127.0.0.1:8000")

        def load(self):
            return app

    StandaloneApplication().run()

Debugging Gunicorn-powered apps

Gunicorn applies a worker timeout, after which the worker process is killed. It’s useful in production, but when you debug, pause at a breakpoint, and think too long, the timeout might kill the debugger session. Fortunately, setting the timeout to zero disables it.

For example, when debugging in VS Code, add --timeout 0 to the Falcon app debug configuration:

{
    "configurations": [
        {
            "name": "my-falcon-app",
            "type": "python",
            "request": "launch",
            "program": ".venv/bin/gunicorn",
            "args": ["--reload", "--timeout", "0", "api.app"]
        }
    ]
}
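If you prefer to keep debug settings out of the command line, the same options can live in a Gunicorn configuration file, which is plain Python. This is only a sketch: the file name gunicorn.conf.py and the bind address are assumptions, and the file is meant for local debugging, not production.

```python
# gunicorn.conf.py (debug-only settings)

# Listen on localhost only.
bind = "127.0.0.1:8000"

# Restart workers when source files change.
reload = True

# 0 disables the worker timeout, so a paused debugger is not killed.
timeout = 0
```

Pass it explicitly with gunicorn -c gunicorn.conf.py followed by the app module.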

Step 4: TestClient tests

Important

December 2023: The described issue has been fixed in Falcon 3.1.2.

The last struggle comes with testing. At the time of writing, a bug was breaking TestClient tests, and a fix was already on its way.
