I was writing a small Python script recently to use a web API over HTTPS. To use HTTPS properly, I naturally wanted to validate the server's certificate before trusting the HTTPS connection. Sadly, doing something so simple required more digging than I would have thought.
I was using Python 3's
http.client module for HTTPS, so I needed to use the
ssl.SSLContext.load_verify_locations method of my SSL context to specify the path of the trusted CA certificate store for the cert verification. I wanted to use the pre-existing CA store on my computer, which I thought should be straightforward.
Like many crypto APIs in higher-level libraries, the Python function turned out to be pretty much a pass-through wrapper to the OpenSSL API of the same name. I hadn't used that API before, so I read the brief Python documentation, then skimmed the OpenSSL documentation from which I gleaned:
int SSL_CTX_load_verify_locations(SSL_CTX *ctx, const char *CAfile, const char *CApath);
If CAfile is not NULL, it points to a file of CA certificates in PEM format. The file can contain several CA certificates identified by
-----BEGIN CERTIFICATE-----
... (CA certificate in base64 encoding) ...
-----END CERTIFICATE-----
If CApath is not NULL, it points to a directory containing CA certificates in PEM format. [...]
That seemed relatively straightforward. I looked at the "ca-certificates" package on my Arch Linux system to see where the system's default CA certificates were installed. (The ca-certificates package is a bundle of default, often pre-installed, CA certificates deemed worthy of shipping out of the box by organizations like Mozilla. Other Linux distros may use a different name for the package of certs.) I tried
load_verify_locations with the
CApath argument pointed to the ca-certificates store, which in my case was
/usr/share/ca-certificates/mozilla. Even though the PEM certificates were there (the ".crt" files are PEM; a PEM file may hold one or more concatenated certificates, so a single-certificate ".crt" file is simply the trivial case), the Python module didn't find the certificates and the HTTPS connection returned an
SSL: CERTIFICATE_VERIFY_FAILED error. I tried pointing the
CAfile argument at the X509 cert files, but that failed too.
I searched for example usage of
load_verify_locations in Python and C. I was very disappointed to find no useful examples among the first 20 or so results. I found a lot of code that ignored certificate validation (which was disturbing), code that only used arguments pulled from external settings, and code that used paths to app-specific certificates. But I found no literal examples of using the environment's existing store.
I went back to the OpenSSL documentation. Whoops, I hadn't read it fully:
If CApath is not NULL, it points to a directory containing CA certificates in PEM format. The files each contain one CA certificate. The files are looked up by the CA subject name hash value, which must hence be available.
Prepare the directory /some/where/certs containing several CA certificates for use as CApath:
cd /some/where/certs
c_rehash .
OpenSSL needs the cert directory to be set up first. The cert filenames within the prepared directory are the CA subject name hashes. OK, where was that set up on my system?
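To see whether a directory has been prepared this way, it can help to list the entries whose names look like subject name hashes. The sketch below is only an illustration: the helper name list_hashed_entries is hypothetical, and it assumes the usual c_rehash naming of eight hex digits, a dot, and a sequence number (for example "1a2b3c4d.0"). It checks names only and does not parse or verify any certificate.

```python
import os
import re

# Assumed naming convention for a c_rehash-prepared directory:
# eight lowercase hex digits, a dot, then a sequence number.
HASHED_NAME = re.compile(r"^[0-9a-f]{8}\.\d+$")


def list_hashed_entries(cert_dir):
    """Return (name, target) pairs for entries in cert_dir with hash-style names.

    target is the resolved symlink destination, or None for a regular file.
    """
    entries = []
    for name in sorted(os.listdir(cert_dir)):
        if not HASHED_NAME.match(name):
            continue  # e.g. "README.txt" or "some-ca.pem"
        path = os.path.join(cert_dir, name)
        target = os.path.realpath(path) if os.path.islink(path) else None
        entries.append((name, target))
    return entries
```

Calling list_hashed_entries("/etc/ssl/certs/") on a system with a prepared store should return a non-empty list; an empty list suggests the directory holds plain certificate files that OpenSSL cannot look up by hash.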
After more searching I stumbled across a very recent, helpful comment on a Python bug report. (Unsurprisingly, the bug was advocating that Python should make the system CA cert store easier to access, although the suggested method itself was not a good idea.) The comment provided a list of example paths for
CAfile and CApath on different common *nix systems:
cert_paths = [
    # Debian, Ubuntu, Arch, SuSE
    # NetBSD (security/mozilla-rootcerts)
    "/etc/ssl/certs/",
    # Debian, Ubuntu, Arch: maintained by update-ca-certificates
    "/etc/ssl/certs/ca-certificates.crt",
    # Red Hat 5+, Fedora, Centos
    "/etc/pki/tls/certs/ca-bundle.crt",
    # Red Hat 4
    "/usr/share/ssl/certs/ca-bundle.crt",
    # FreeBSD (security/ca-root-nss package)
    "/usr/local/share/certs/ca-root-nss.crt",
    # FreeBSD (deprecated security/ca-root package, removed 2008)
    "/usr/local/share/certs/ca-root.crt",
    # FreeBSD (optional symlink)
    # OpenBSD
    "/etc/ssl/cert.pem",
    # Mac OS X
    "/System/Library/OpenSSL/certs/cert.pem",
]
These are locations set up for OpenSSL's
load_verify_locations use. On my system, the majority of the files were symlinks to certs in the store, where the symlink name was the CA subject name hash. The tools
c_rehash (a part of OpenSSL) and
update-ca-certificates (a part of ca-certificates) set up this path for OpenSSL.
It was a little disappointing that there was no de facto path, environment variable, or the like for finding the prepared OpenSSL-ready CA store. But while you can't rely on the CA store being in the same place across platforms, if this list is accurate then
/etc/ssl/certs/ seems to be a very popular choice on Linux.
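Since the list mixes directories and bundle files, a script that wants to be portable has to decide per path whether it belongs in CApath or CAfile. Here is a rough sketch of that idea, not something taken from the bug report: the names CANDIDATE_CA_PATHS, find_ca_location, and build_context are made up for illustration, the candidate list is a shortened copy of the one above, and ssl.PROTOCOL_TLS_CLIENT assumes Python 3.6 or later.

```python
import os
import ssl

# Shortened, illustrative copy of the candidate list; extend as needed.
CANDIDATE_CA_PATHS = [
    "/etc/ssl/certs/",
    "/etc/ssl/certs/ca-certificates.crt",
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/ssl/cert.pem",
]


def find_ca_location(candidates=CANDIDATE_CA_PATHS):
    """Return (cafile, capath) for the first candidate that exists.

    A directory is returned as capath, a regular file as cafile.
    Returns (None, None) when no candidate exists.
    """
    for path in candidates:
        if os.path.isdir(path):
            return None, path
        if os.path.isfile(path):
            return path, None
    return None, None


def build_context(candidates=CANDIDATE_CA_PATHS):
    """Build a verifying client context from the first CA location found."""
    cafile, capath = find_ca_location(candidates)
    if cafile is None and capath is None:
        raise FileNotFoundError("no CA store found in the candidate list")
    # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and hostname checking.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile, capath=capath)
    return ctx
```

Note that this only checks that a path exists; a directory that was never run through c_rehash or update-ca-certificates would be accepted here and still fail at verification time.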
So in my Python script, I used:
ssl_ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
ssl_ctx.verify_mode = ssl.CERT_REQUIRED
ssl_ctx.load_verify_locations(None, "/etc/ssl/certs/")
and got SSL certificate verification to work using my pre-installed certificate store.
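For anyone reading this on a newer Python, the standard library has since grown helpers that may make the hard-coded path unnecessary. The following is a sketch under that assumption (Python 3.4 or later), not what my script did: ssl.create_default_context() loads the platform's default CA certificates and turns on verification and hostname checking, and ssl.get_default_verify_paths() reports where OpenSSL looks by default. The function names make_verifying_context and fetch_status are hypothetical, and ssl.PROTOCOL_TLSv1 used above has been deprecated in the meantime.

```python
import http.client
import ssl


def make_verifying_context():
    """Return a client context that verifies against the default CA store."""
    # Sets verify_mode to CERT_REQUIRED and check_hostname to True.
    return ssl.create_default_context()


def fetch_status(host, path="/"):
    """Illustrative helper: GET https://host/path and return the HTTP status.

    The TLS handshake fails if the server's certificate chain cannot be
    verified or does not match the host name.
    """
    conn = http.client.HTTPSConnection(host, context=make_verifying_context())
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Shows the default cafile/capath OpenSSL was built to use on this system.
    print(ssl.get_default_verify_paths())
```

If the defaults turn out to be empty or wrong on a given system, falling back to an explicit load_verify_locations call as shown earlier remains an option.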
The solution was indeed fairly straightforward. I was just surprised by how long it took to find it. I had originally thought there would be lots of examples since, well, scripts aren't carrying their own CA list around and they are performing certificate validation. Right?
On that disturbing note: Please, devs, do verify certificates! HTTPS is just HTTP with obfuscation if you don't do cert validation. Anyone can MITM the connection with a self-signed cert. Even if the application is doing something mundane, like some seemingly boring API queries, you probably still want to protect your API key. If using the pre-installed cert store was confusing before, I hope this article helped address that issue.