ASN.1 is a generic notation for describing data structures: you define a schema, then create messages that conform to it. The idea is the same as in Google Protobuf or XSD+XML. Here's an example of a User schema:
User
DEFINITIONS
AUTOMATIC TAGS ::=
BEGIN
  Address ::= SEQUENCE {
    city UTF8String (SIZE(1..32)),
    street UTF8String (SIZE(1..32))
  }
  User ::= SEQUENCE {
    firstName UTF8String (SIZE(1..16)),
    lastName UTF8String (SIZE(1..16)),
    role ENUMERATED { admin, user },
    address Address
  }
END
SEQUENCE means the same as XSD's sequence: it's just an ordered list of fields. Such a schema roughly corresponds to a class like this:
class User {
  String firstName, lastName;
  Role role;
  Address address;
}
Many programming platforms have libraries that compile ASN.1 schemas into your Java/C++/whatever types. Once the schema is defined, we can create messages that follow it. In ASN.1 such a message is called a PDU (protocol data unit):
value User ::= {
  firstName "Jerry",
  lastName "Smith",
  role admin,
  address {
    city "A",
    street "B"
  }
}
While this is a text format, it's just one of many. You can also serialize messages into JSON, XML, DER, etc. The workflow then looks like this:
- Create an ASN.1 schema and compile it, e.g. into Java and C++ classes
- Create an object in Java and serialize it into one of the formats supported by ASN.1
- Send it to software written e.g. in C++
- Deserialize it into a C++ object on the other end
You can play with ASN.1 (compile a schema and create PDUs) using this online tool. In Java you can read and write ASN.1 messages using Bouncy Castle.
The most important serialization format that comes with ASN.1 is DER, a compact binary representation of ASN.1 messages. Most crypto software uses this format to encode keys and certificates. E.g. our Jerry Smith message takes only 27 bytes (remember that it's binary).
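To see where the 27 bytes come from, here is a minimal hand-rolled sketch of the encoding. It assumes that AUTOMATIC TAGS assigns context-specific tags [0], [1], ... in field order and that every length fits in a single byte. The helper name `tlv` is made up for illustration; in a real project you'd use classes generated from the schema or a library.

```python
import base64

def tlv(tag: int, content: bytes) -> bytes:
    """Encode one tag-length-value triple (short-form length only, i.e. < 128 bytes)."""
    assert len(content) < 128, "long-form lengths are out of scope for this sketch"
    return bytes([tag, len(content)]) + content

CONTEXT = 0x80      # context-specific tag class: [0], [1], ...
CONSTRUCTED = 0x20  # set when the value itself contains nested TLVs

# Address ::= SEQUENCE { city, street }
address = tlv(CONTEXT | 0, "A".encode("utf-8")) + tlv(CONTEXT | 1, "B".encode("utf-8"))

# User ::= SEQUENCE { firstName, lastName, role, address }
user = (
    tlv(CONTEXT | 0, "Jerry".encode("utf-8"))
    + tlv(CONTEXT | 1, "Smith".encode("utf-8"))
    + tlv(CONTEXT | 2, bytes([0]))            # admin is the first enum item, so 0
    + tlv(CONTEXT | CONSTRUCTED | 3, address)
)

der = tlv(0x30, user)  # 0x30 = universal SEQUENCE, constructed

print(len(der))                               # 27
print(der.hex())
print(base64.b64encode(der).decode("ascii"))  # MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC
```

Every field costs two bytes of overhead (tag and length) plus its content, which is why the whole message stays so small.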
You can use this decoder to see the content of the file. Note that field names are replaced with indices, and the "admin" role turned into 0. The tool can't turn this into nicely printed human-readable data because it doesn't have the compiled ASN.1 schema: field and "enum" names are not present in DER.
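A schema-less decoder can only do what the following sketch does: walk the tag-length-value triples and print tag numbers and raw values. It is a simplified illustration (single-byte tags and lengths below 128 only), and `parse_tlv` is a hypothetical name, not part of any library.

```python
import base64

def parse_tlv(data: bytes, depth: int = 0) -> list:
    """Walk DER bytes and return (depth, tag_class, tag_number, value) tuples.

    Sketch only: handles single-byte tags and short-form lengths (< 128).
    Constructed items carry value None; their children follow at depth + 1.
    """
    classes = {0x00: "universal", 0x40: "application", 0x80: "context", 0xC0: "private"}
    out, pos = [], 0
    while pos < len(data):
        tag, length = data[pos], data[pos + 1]
        value = data[pos + 2 : pos + 2 + length]
        constructed = bool(tag & 0x20)
        out.append((depth, classes[tag & 0xC0], tag & 0x1F, None if constructed else value))
        if constructed:
            out.extend(parse_tlv(value, depth + 1))
        pos += 2 + length
    return out

der = base64.b64decode("MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC")
for depth, tag_class, number, value in parse_tlv(der):
    print("  " * depth + f"[{tag_class} {number}] {value!r}")
```

The output contains `[context 0] b'Jerry'` and `[context 2] b'\x00'`, but nothing that says "firstName" or "admin": that knowledge lives only in the schema.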
DER is a stricter, canonical subset of BER, and an even more compact encoding exists (PER), but DER is the one used by most software.
While DER is very compact, it's inconvenient when we need to copy-paste data; for that we need a text form. The solution, as always, is to encode it further into Base64. And if we add a header and footer in the right form, we end up with PEM:
-----BEGIN OUR ENCODED DATA-----
MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC
-----END OUR ENCODED DATA-----
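The wrapping step can be sketched in a few lines. The function name `to_pem` is made up for this example; the 64-character line width and the five dashes on each side of the label follow RFC 7468.

```python
import base64

def to_pem(der: bytes, label: str) -> str:
    """Wrap DER bytes into a PEM block: Base64 body in 64-character lines."""
    body = base64.b64encode(der).decode("ascii")
    lines = [body[i : i + 64] for i in range(0, len(body), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"

# The 27 DER bytes of the Jerry Smith message.
der = bytes.fromhex(
    "30 19 80 05 4a 65 72 72 79 81 05 53 6d 69 74 68"
    " 82 01 00 a3 06 80 01 41 81 01 42"
)
print(to_pem(der, "OUR ENCODED DATA"), end="")
```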
Private keys and certificates often come in this form. A PEM file can include metadata and multiple entries (useful for certificate chains):
This is metadata, I can put whatever I want
-----BEGIN OUR ENCODED DATA-----
MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC
-----END OUR ENCODED DATA-----
Next piece of PEM (actually it's the same data)
-----BEGIN OUR ENCODED DATA-----
MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC
-----END OUR ENCODED DATA-----
In practice, instead of BEGIN OUR ENCODED DATA there's a label that says which format the underlying ASN.1 structure represents: a public key, a private key, a certificate, etc.
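Reading such a file back is mostly a matter of finding the BEGIN/END pairs and ignoring everything else. Here is a rough sketch; `read_pem_blocks` is a hypothetical helper, and the regular expression assumes labels without hyphens, which covers the usual key and certificate labels.

```python
import base64
import re

# A block is a BEGIN line, a Base64 body, and an END line with the same label.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([^-]+)-----\s*(.*?)\s*-----END \1-----",
    re.DOTALL,
)

def read_pem_blocks(text: str) -> list:
    """Return (label, der_bytes) for each block; text outside the markers is ignored."""
    return [
        (label, base64.b64decode("".join(body.split())))
        for label, body in PEM_BLOCK.findall(text)
    ]

bundle = """\
This is metadata, I can put whatever I want
-----BEGIN OUR ENCODED DATA-----
MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC
-----END OUR ENCODED DATA-----
Next piece of PEM (actually it's the same data)
-----BEGIN OUR ENCODED DATA-----
MBmABUplcnJ5gQVTbWl0aIIBAKMGgAFBgQFC
-----END OUR ENCODED DATA-----
"""

blocks = read_pem_blocks(bundle)
for label, der in blocks:
    print(label, len(der), "bytes")
```

For a certificate chain the same loop would yield one ("CERTIFICATE", der) pair per certificate.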
You'll see below how ASN.1 is used to describe standard structures like keys and certificates. You'll notice that crypto formats use OIDs instead of field names. These OIDs look like this: 1.3.6.1.4.1.343, and their interpretation (human-readable names) is described in the respective specs. Looks like over-engineering, but who am I to criticize.
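For the curious, this is roughly how an OID is packed into bytes in DER: the first two numbers are merged into one (40 * first + second), and every number is then written in base 128 with the high bit marking "more bytes follow". The function names below are illustrative only.

```python
def encode_oid(dotted: str) -> bytes:
    """DER content bytes for an OID (without the 0x06 tag and the length byte)."""
    arcs = [int(part) for part in dotted.split(".")]
    out = bytearray()
    for arc in [40 * arcs[0] + arcs[1], *arcs[2:]]:
        chunk = [arc & 0x7F]                   # last byte: high bit clear
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))  # earlier bytes: high bit set
            arc >>= 7
        out.extend(reversed(chunk))
    return bytes(out)

def decode_oid(content: bytes) -> str:
    """Inverse of encode_oid."""
    arcs, value = [], 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                    # high bit clear: number is complete
            arcs.append(value)
            value = 0
    first = min(arcs[0] // 40, 2)              # the first arc can only be 0, 1 or 2
    return ".".join(str(a) for a in [first, arcs[0] - 40 * first, *arcs[1:]])

print(encode_oid("1.3.6.1.4.1.343").hex())            # 2b06010401 8257 without the space
print(decode_oid(bytes.fromhex("2b060104018257")))    # 1.3.6.1.4.1.343
```

So the seven-number OID takes seven bytes, with 343 being the only number that needs two.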
You'll also notice that there are many fields that look like this:
SEQUENCE
  OBJECT IDENTIFIER 2.5.4.6
  PrintableString US
This is a key-value pair: first we see the OID 2.5.4.6 (which is a country name) and then the string "US". Crypto formats heavily use sets of such key-value pairs for extensibility: first a spec describes the most important fields (their OIDs), and then other specifications can add new fields that describe less important parts.
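A small sketch of how software might read such a pair: decode the OID, then look its name up in a table taken from the relevant spec. The table below lists only three well-known X.520 attribute OIDs, and all function names are made up for the example. Unknown OIDs are returned as-is, which is what lets old software skip fields added by newer specs.

```python
OID_NAMES = {
    "2.5.4.3": "commonName",
    "2.5.4.6": "countryName",
    "2.5.4.10": "organizationName",
}

def read_tlv(data: bytes, pos: int):
    """Return (tag, value, next_pos); short-form lengths only."""
    tag, length = data[pos], data[pos + 1]
    start = pos + 2
    return tag, data[start : start + length], start + length

def decode_oid(content: bytes) -> str:
    arcs, value = [], 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            arcs.append(value)
            value = 0
    first = min(arcs[0] // 40, 2)
    return ".".join(str(a) for a in [first, arcs[0] - 40 * first, *arcs[1:]])

def decode_attribute(der: bytes):
    """Decode SEQUENCE { OBJECT IDENTIFIER, PrintableString } into (name, value)."""
    tag, body, _ = read_tlv(der, 0)
    assert tag == 0x30, "expected SEQUENCE"
    oid_tag, oid_bytes, pos = read_tlv(body, 0)
    assert oid_tag == 0x06, "expected OBJECT IDENTIFIER"
    str_tag, text, _ = read_tlv(body, pos)
    assert str_tag == 0x13, "expected PrintableString"
    oid = decode_oid(oid_bytes)
    return OID_NAMES.get(oid, oid), text.decode("ascii")

# DER bytes of the key-value pair shown above.
attribute = bytes.fromhex("30 09 06 03 55 04 06 13 02 55 53")
print(decode_attribute(attribute))  # ('countryName', 'US')
```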
X.509 is a standard that defines the ASN.1 schema for certificates. There is already a great article that describes what a certificate consists of, so I won't duplicate it.
Such a cert can be stored as PEM, in which case it has this header:
-----BEGIN CERTIFICATE-----
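PKCS is a family of specifications, several of which define standard ASN.1 schemas for storing crypto data:
- PKCS#1 - public or private RSA key
- PKCS#8 - private key of any type (including RSA), optionally encrypted; public keys of any type usually use the SubjectPublicKeyInfo structure from X.509 instead
- CMS (based on the older PKCS#7) - signed or encrypted data; also commonly used as a container for X.509 certificates and chains
- PKCS#12 - X.509 certificates together with their private keys, usually password-protected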
If you see a PEM file with some header and you want to know what format is encoded there, you can find some of the common headers and their descriptions in this article.
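As a quick reference, the most common labels can be put into a lookup table. The sketch below covers labels defined in RFC 7468 or written by OpenSSL; the table is not exhaustive, and `describe_pem` is a hypothetical helper name.

```python
import re

PEM_LABELS = {
    "CERTIFICATE": "X.509 certificate",
    "CERTIFICATE REQUEST": "PKCS#10 certificate signing request",
    "RSA PRIVATE KEY": "PKCS#1 RSA private key",
    "RSA PUBLIC KEY": "PKCS#1 RSA public key",
    "PRIVATE KEY": "PKCS#8 private key (unencrypted)",
    "ENCRYPTED PRIVATE KEY": "PKCS#8 private key (encrypted)",
    "PUBLIC KEY": "X.509 SubjectPublicKeyInfo public key",
    "PKCS7": "PKCS#7 / CMS structure",
    "CMS": "CMS structure",
}

def describe_pem(text: str) -> list:
    """Return a description for every BEGIN label found in the text."""
    labels = re.findall(r"-----BEGIN ([^-]+)-----", text)
    return [PEM_LABELS.get(label, f"unknown label: {label}") for label in labels]

print(describe_pem("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"))
# ['X.509 certificate']
```

Note that "PRIVATE KEY" and "RSA PRIVATE KEY" are different formats (PKCS#8 vs PKCS#1) even when they hold the same RSA key.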