URL encoding (also known as percent-encoding) is a way to pass characters that are otherwise prohibited in URLs and HTML forms because they have special meanings. For example, to use
http:// as part of a URL rather than at its beginning, it has to be %-encoded to http%3A%2F%2F.
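As a minimal sketch of that case, Python's urllib.parse.quote() can encode a whole URL so that it can travel as data inside another URL. The redirect endpoint and both host names below are hypothetical and serve only as an illustration.

```python
from urllib.parse import quote

# Hypothetical target URL that should travel as data inside another URL.
target = "http://example.com/"

# safe="" asks quote() to encode every reserved character, including "/".
encoded = quote(target, safe="")
print(encoded)  # http%3A%2F%2Fexample.com%2F

# Hypothetical redirect endpoint that carries the encoded URL as a parameter.
redirect_url = "https://example.org/redirect?to=" + encoded
print(redirect_url)
```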
A URL has the following general structure:
    scheme://host-or-ip:port/path/to/somewhere?query=parameter&yet=another
- scheme: the type of service (like http or https)
- host-or-ip: the textual or IP address of the server
- port: the port number at the host (the default for http is 80)
- path/to/somewhere: the request path
- query=parameter: an additional parameter name and its value
- yet=another: query parameters may occur multiple times, and they are separated by &
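The components listed above can be pulled apart with urlsplit() and parse_qsl() from Python's urllib.parse module. This is only a sketch; the host name and port are made up for the example.

```python
from urllib.parse import urlsplit, parse_qsl

# Example URL; the host and port are placeholders.
url = "http://example.com:8080/path/to/somewhere?query=parameter&yet=another"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /path/to/somewhere

# The query string is split on "&" into (name, value) pairs.
params = parse_qsl(parts.query)
print(params)          # [('query', 'parameter'), ('yet', 'another')]
```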
Many applications use URL-friendly strings as identifiers, names, or allowed values. A URL-friendly string is sometimes called a slug.
The characters that may appear in a URL fall into two groups:
- reserved characters
  ! * ' ( ) ; : @ & = + $ , / ? # [ ] have a special meaning in a URL and must be %-encoded when they are passed as data.
- unreserved characters
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~ are allowed in URLs as-is.
All other characters (e.g., non-English letters, math symbols) must also be URL-encoded.
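The three cases can be checked quickly with Python's urllib.parse.quote(). The sample strings are arbitrary, and the sketch assumes Python 3.7 or newer, where ~ is treated as unreserved.

```python
from urllib.parse import quote

# Unreserved characters pass through unchanged.
unreserved = quote("AZaz09-_.~", safe="")
print(unreserved)  # AZaz09-_.~

# Reserved characters are %-encoded when treated as data (safe="").
reserved = quote("a&b=c", safe="")
print(reserved)    # a%26b%3Dc

# Other characters are encoded as their UTF-8 bytes.
other = quote("café", safe="")
print(other)       # caf%C3%A9
```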
Slashes are especially problematic. A slash written as / is a path separator, while a slash written as %2F is data.
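For an imaginary REST API endpoint at /get-file/<path>, compare two completely different URLs:
- /get-file/sweet/cheescake.html will end up with 404 Not Found because there is no endpoint for a path with the extra sweet/ segment.
- /get-file/sweet%2Fcheescake.html will be correctly routed to the /get-file/<path> endpoint because the file path sweet/cheescake.html is URL-encoded as sweet%2Fcheescake.html.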
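The difference between the two URLs can be illustrated with a toy router. The route() function below is hypothetical and only mimics an endpoint that accepts exactly one path segment after /get-file/; real frameworks differ in how and when they decode %2F.

```python
from urllib.parse import quote, unquote

def route(url_path):
    """Toy router: matches only /get-file/<path> with a single <path> segment."""
    segments = url_path.strip("/").split("/")
    if len(segments) == 2 and segments[0] == "get-file":
        # Decode the segment only after routing has been decided.
        return 200, unquote(segments[1])
    return 404, None

file_path = "sweet/cheescake.html"

# The raw slash creates a third path segment, so nothing matches.
raw_result = route("/get-file/" + file_path)
print(raw_result)      # (404, None)

# The encoded slash stays inside a single segment.
encoded_result = route("/get-file/" + quote(file_path, safe=""))
print(encoded_result)  # (200, 'sweet/cheescake.html')
```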
HTML forms are the second percent-encoding domain. When data entered in an HTML form is submitted, the browser percent-encodes the field names and values and sends them with the
application/x-www-form-urlencoded MIME type.
The slight difference between percent encoding for forms and URLs is described below.
For example, sending a form with two fields:
    POST /send-feedback HTTP/1.1
    Content-Type: application/x-www-form-urlencoded

    who=Matt&text=I+want+more+examples
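The request body above can be reproduced with urlencode() from Python's urllib.parse, which encodes field names and values the way forms do. The sketch builds only the body; it does not send a request.

```python
from urllib.parse import urlencode, parse_qs

# The two form fields from the example request.
fields = {"who": "Matt", "text": "I want more examples"}

# urlencode() joins the fields with "&" and encodes spaces as "+".
body = urlencode(fields)
print(body)     # who=Matt&text=I+want+more+examples

# parse_qs() reverses the encoding; each value comes back as a list.
decoded = parse_qs(body)
print(decoded)  # {'who': ['Matt'], 'text': ['I want more examples']}
```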
The space character is also special. URLs cannot contain spaces.
Within a URL, a space is encoded as
%20. For example, to obtain the
sweet cheescake.html file:
    /get-file/sweet%20cheescake.html
(Using spaces in file names is not a wise idea, anyway.)
However, when a space occurs in an HTML form field name or value, it is encoded as +.
How to perform URL encoding in Python? The standard library module
urllib.parse provides (among others) these functions:
- quote() and unquote() for encoding and decoding URLs
- quote_plus() and unquote_plus() for encoding and decoding HTML form values
By default, the quote() function doesn't encode / as
%2F because it is a "safe" character.
    from urllib.parse import quote

    path = "some/file with space.html"
    # some/file%20with%20space.html
    print(quote(path))
To encode all disallowed characters, set safe="":
    # some%2Ffile%20with%20space.html
    print(quote(path, safe=""))
quote_plus() and unquote_plus() work the same, but the space is encoded/decoded as
+ and quote_plus() has no safe characters by default:
    from urllib.parse import quote_plus

    # some%2Ffile+with+space.html
    print(quote_plus(path))
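Decoding works in the opposite direction. The short sketch below reuses the encoded strings from the examples above and shows that only unquote_plus() turns + back into a space.

```python
from urllib.parse import unquote, unquote_plus

# Both functions decode %XX sequences.
from_url = unquote("some%2Ffile%20with%20space.html")
print(from_url)    # some/file with space.html

from_form = unquote_plus("some%2Ffile+with+space.html")
print(from_form)   # some/file with space.html

# unquote() leaves "+" untouched; unquote_plus() treats it as a space.
plus_kept = unquote("I+want+more")
print(plus_kept)   # I+want+more

plus_space = unquote_plus("I+want+more")
print(plus_space)  # I want more
```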