Published on Jun 15, 2023
URL encoding#
URL encoding (also known as percent-encoding) is a way to pass characters that are otherwise prohibited in URLs and HTML forms because they have special meanings. For example, to use http:// as part of a URL rather than at its beginning, it has to be %-encoded as http%3A%2F%2F.

URL anatomy#
scheme://host-or-ip:port/path/to/somewhere?query=param&yet=another
where
scheme - the type of service (like http or https)
host-or-ip - the textual or IP address of the server
port - the port number at the host (the default for http is 80)
path/to/somewhere - the request path
query=param - an additional parameter name and its value
yet=another - query parameters may occur multiple times, and they are separated by &
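The parts listed above can be pulled out of a URL with the standard library. This is a minimal sketch; the host example.com and port 8080 are placeholder values chosen for illustration, not taken from the pattern itself.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL that follows the pattern above.
url = "http://example.com:8080/path/to/somewhere?query=param&yet=another"

parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /path/to/somewhere
print(parts.query)     # query=param&yet=another

# parse_qs() splits the query string on & and = into a dict of lists.
params = parse_qs(parts.query)
print(params)          # {'query': ['param'], 'yet': ['another']}
```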
Characters allowed in URL#
Many applications embrace URL-friendly strings as identifiers, names, or allowed values. A URL-friendly string is sometimes called a slug.
The characters that may appear inside a URL are split into two groups:
reserved characters
! * ' ( ) ; : @ & = + $ , / ? # [ ]
have a special meaning in a URL and must be %-encoded to pass them as data in a URL.
unreserved characters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~
are allowed in URLs as-is.
All other characters (e.g., non-English letters, math symbols) must also be URL-encoded.
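The three cases can be checked with quote() from Python's urllib.parse. This is only a sketch with made-up sample strings, and it assumes Python 3.7 or newer, where ~ is left unencoded. The safe="" argument switches off the function's default exception for /.

```python
from urllib.parse import quote

# Unreserved characters pass through unchanged.
unreserved = quote("Hello-World_1.0~beta")
print(unreserved)  # Hello-World_1.0~beta

# Reserved characters used as data are %-encoded.
reserved = quote("a&b=c", safe="")
print(reserved)  # a%26b%3Dc

# Other characters are encoded byte by byte from their UTF-8 form.
other = quote("café")
print(other)  # caf%C3%A9
```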
Troubles with slashes#
Slashes are especially problematic. A literal / is a path separator, while %2F is a slash passed as data.
For an imaginary REST API endpoint on /get-file/<path>, compare two completely different URLs.
https://api.somewhere.com/get-file/sweet/cheescake.html
will end up with 404 Not Found because there is no /get-file/sweet/cheescake.html endpoint.
However,
https://api.somewhere.com/get-file/sweet%2Fcheescake.html
will be correctly routed to the /get-file/<path> endpoint because the file path sweet/cheescake.html is URL-encoded as sweet%2Fcheescake.html.
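To make the difference concrete, here is a toy matcher for the imaginary endpoint. The function match_get_file is hypothetical, and it assumes that <path> must be exactly one path segment and that decoding happens only after routing. Real servers and frameworks differ in how they treat %2F, so this is a sketch of the idea rather than a description of any particular router.

```python
from urllib.parse import unquote

def match_get_file(raw_path):
    """Return the decoded <path> if raw_path matches /get-file/<path>, else None.

    Hypothetical router: <path> must be exactly one path segment.
    """
    # Split on literal slashes first; %2F is still encoded at this point.
    segments = raw_path.strip("/").split("/")
    if len(segments) == 2 and segments[0] == "get-file":
        # Decode only after routing, so %2F becomes data, not a separator.
        return unquote(segments[1])
    return None

print(match_get_file("/get-file/sweet/cheescake.html"))    # None
print(match_get_file("/get-file/sweet%2Fcheescake.html"))  # sweet/cheescake.html
```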
HTML forms#
HTML forms are the second percent-encoding domain. When data entered in an HTML form is submitted, the browser percent-encodes the field names and values using the application/x-www-form-urlencoded MIME type.
The slight difference between percent encoding for forms and URLs is described below.
For example, submitting a form with two fields:
POST /send-feedback HTTP/1.1
Content-Type: application/x-www-form-urlencoded

who=Matt&text=I+want+more+examples
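The request body above can be reproduced with urlencode() from Python's urllib.parse, and decoded again with parse_qs(). This is a small sketch; the field values are taken from the example, and the variable names are illustrative.

```python
from urllib.parse import urlencode, parse_qs

fields = {"who": "Matt", "text": "I want more examples"}

# urlencode() produces an application/x-www-form-urlencoded string.
body = urlencode(fields)
print(body)  # who=Matt&text=I+want+more+examples

# parse_qs() reverses it; every value comes back as a list.
decoded = parse_qs(body)
print(decoded)  # {'who': ['Matt'], 'text': ['I want more examples']}
```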
Troubles with spaces#
The space character is also special, because URLs cannot contain spaces.
Within a URL, it is encoded as %20. For example, to obtain the file sweet cheescake.html:
https://api.somewhere.com/get-file/sweet%20cheescake.html
(Using space for file names is not a wise idea, anyway.)
However, when a space occurs in an HTML form field name or value, it is encoded as +.
URL quoting in Python#
How to perform URL encoding in Python? The standard library module urllib.parse provides (among others) these functions:
quote() (and unquote()) for encoding and decoding URL components
quote_plus() (and unquote_plus()) for encoding and decoding HTML form values
By default, the quote() function doesn’t encode / to %2F because it is a “safe” character.
from urllib.parse import quote
path = "some/file with space.html"
# some/file%20with%20space.html
print(quote(path))
To encode all disallowed characters, including /, pass safe="":
# some%2Ffile%20with%20space.html
print(quote(path, safe=""))
quote_plus() and unquote_plus() work the same way, except that a space is encoded/decoded as + and quote_plus() has no safe characters by default:
# some%2Ffile+with+space.html
print(quote_plus(path))
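Decoding works in the opposite direction. The sketch below feeds the two encoded strings from the examples above back through unquote() and unquote_plus(); the variable names are illustrative. Note that unquote() leaves + untouched, so form-encoded data should be decoded with unquote_plus().

```python
from urllib.parse import unquote, unquote_plus

encoded_url = "some%2Ffile%20with%20space.html"
encoded_form = "some%2Ffile+with+space.html"

# some/file with space.html
print(unquote(encoded_url))
# some/file with space.html
print(unquote_plus(encoded_form))
# some/file+with+space.html
print(unquote(encoded_form))
```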