While OpenSSL has become one of the de facto
libraries for performing SSL and TLS operations, the library is surprisingly
opaque and its documentation is, at times, abysmal. As part of our recent
research, we have been performing Internet-wide scans of HTTPS hosts in order to
better understand the HTTPS ecosystem (see “Analysis of the HTTPS Certificate
Ecosystem” and “ZMap: Fast Internet-Wide Scanning and its Security Applications”). We use
OpenSSL for many of these operations including parsing X.509 certificates.
However, in order to parse and validate certificates, our team had to dig
through parts of the OpenSSL code base and multiple sources of documentation to
find the correct functions to parse each piece of data. This post is intended to
document many of these operations in a single location in order to hopefully
alleviate this painful process for others.
If you have found other pieces of code particularly helpful, please don’t
hesitate to send them along and we’ll update the post.
I want to note that if you’re starting to develop against OpenSSL, O’Reilly’s
Network Security with OpenSSL is an incredibly helpful resource; the book contains many snippets and pieces of
documentation that I was not able to find anywhere online. I also want to thank
James Kasten, who helped find and document several of
these solutions.
Creating an OpenSSL X509 Object
All of the operations we discuss start with either a single X.509 certificate or
a “stack” of certificates. OpenSSL represents a single certificate with an
X509 struct and a list of certificates, such as the certificate chain
presented during a TLS handshake, as a STACK_OF(X509). Given that the parsing
and validation stems from here, it only seems reasonable to start with how to
create or access an X509 object. A few common scenarios are:
1. You have initiated an SSL or TLS connection using OpenSSL.
In this case, you have access to an OpenSSL SSL struct from which you can
extract the presented certificate as well as the entire certificate chain that
the server presented to the client. In our specific case, we use libevent to
perform TLS connections and can access the SSL struct from the libevent
bufferevent: SSL *ssl = bufferevent_openssl_get_ssl(bev). This will clearly be
different depending on how you complete your connection. However, once you have
your SSL object, the server certificate and presented chain can be extracted as
follows:
We have found that at times, OpenSSL will produce an empty certificate chain
(SSL_get_peer_cert_chain will come back NULL) even though the server has
presented a certificate (the server certificate is generally the first
certificate in the stack, followed by the rest of the chain). It’s unclear
to us why this happens, but it’s not a deal breaker, as it’s easy to create a
new stack of certificates:
2. You have stored a certificate on disk as a PEM file.
For reference, a PEM file contains the Base64 encoding of the certificate’s DER bytes, wrapped in 64-character lines between a -----BEGIN CERTIFICATE----- line and an -----END CERTIFICATE----- line.
In this case, you can access the certificate as follows:
3. You have access to the raw certificate in memory.
In the case that you have access to the raw encoding of the certificate in
memory, you can parse it as follows. This is useful if you have stored raw
certificates in a database or similar data store.
4. You have access to the Base64-encoded PEM in memory.
Parsing Certificates
Now that we have access to a certificate in OpenSSL, we’ll focus on how to
extract useful data from the certificate. We don’t repeat the #include lines in
every snippet, but use the following headers throughout our codebase:
You will also need the development versions of the OpenSSL libraries, and you will need to link with -lssl -lcrypto (the X.509 functions live in libcrypto).
Subject and Issuer
The certificate subject and issuer can be easily extracted and represented as a
single string as follows:
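Both strings are allocated by OpenSSL and should be freed with OPENSSL_free.
By default, the subject and issuer are returned in OpenSSL’s slash-separated one-line form, for example /C=US/ST=California/O=Example Inc/CN=www.example.com.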
It is also possible to extract particular elements from the subject. For example, the following code will iterate over all the values in the subject:
Alternatively, to visit only the entries of one type, such as the common name (CN), search by NID:
Cryptographic (e.g. SHA-1) Fingerprint
We can calculate the SHA-1 fingerprint (or any other fingerprint) with the
following code:
This will produce the raw fingerprint, which can be converted to the
human-readable hex form as follows:
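A sketch of a plain lowercase hex encoder; hex_encode is a hypothetical name. The output buffer must hold 2 * len + 1 characters (41 for a SHA-1 fingerprint).

```c
#include <stddef.h>

/* Hypothetical helper: write two lowercase hex digits per input byte into
 * out, followed by a terminating NUL. out must hold 2 * len + 1 chars. */
void hex_encode(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[in[i] >> 4];        /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0f];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```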
Version
Parsing the certificate version is straightforward; the only oddity is that the
stored value is zero-based (a value of 0 means version 1):
Serial Number
Serial numbers can be arbitrarily large as well as positive or negative. As
such, we handle it as a string instead of a typical integer in our processing.
Signature Algorithm
The signature algorithm on a certificate is stored as an object identifier (OID), which OpenSSL maps to a numeric identifier (NID):
This can be translated into a string representation (either short name or long
description):
The long name is a string such as sha1WithRSAEncryption or md5WithRSAEncryption; the corresponding short names are RSA-SHA1 and RSA-MD5.
Public Key
Parsing the public key on a certificate is type-specific. Here, we provide
information on how to extract which type of key is included and to parse RSA and
DSA keys:
Validity Period
OpenSSL represents the not-valid-after (expiration) and not-valid-before as ASN1_TIME objects, which can be extracted as follows:
These can be converted into ISO-8601 timestamps using the following code:
CA Status
Checking whether a certificate is a valid CA certificate is not the simple boolean
operation you might expect. There are several avenues through which a
certificate can be interpreted as a CA certificate. As such, instead of directly
checking various X.509 extensions, it is more reliable to use X509_check_ca.
Any value >= 1 is considered a CA certificate, whereas 0 is not a CA certificate.
Other X.509 Extensions
Certificates can contain arbitrary additional extensions. The following code will
loop through all of the extensions on a certificate and print them out:
Misordered Certificate Chains
At times, we’ll receive misordered certificate chains. The following code will
attempt to reorder certificates to construct a rational certificate chain based
on each certificate’s subject and issuer string. The algorithm is O(n^2), but we
generally only receive two or three certificates and, in the majority of cases, they
will already be in the correct order.
Validating Certificates
In our scans, we oftentimes use multiple CA stores in order to emulate different
browsers. Here, we describe how we create specialized stores and validate
against them.
We can create a store based on a particular file with the following:
And then validate certificates against the store with the following:
It’s worth noting that self-signed certificates will always fail OpenSSL’s
validation unless they are already in the trust store. While this might make sense in most client applications, we are
oftentimes interested in other errors that might be present. We validate
self-signed certificates by adding them into a temporary store and then
validating against it. It’s a bit hackish, but is much easier than
re-implementing OpenSSL’s validation techniques.
Sometimes you will also find that you just need to check whether a certificate
was issued by a particular certificate (such as a trusted CA) rather than whether it is
currently valid, which can be done using X509_check_issued. For example, if
you wanted to check whether a certificate was self-signed:
Helper Functions
There are several other functions that were used in troubleshooting and might be
of help while you’re developing code against OpenSSL.
Print out the basic information about a certificate:
Print out each certificate in a given stack:
Check whether two certificate stacks are identical:
Check whether the subject and issuer string on a certificate are identical:
Convert an OpenSSL error constant into a human readable string:
I hope this helps. As I stated earlier, if you find other pieces of information
useful, let me know and we’ll get things updated. Similarly, if you find that
any of the examples don’t work, let me know.
Thanks to Jordan Whitehead for various corrections.