What is Base64 and How Does It Work?
Base64 is a binary-to-text encoding scheme that represents binary data as an ASCII string. It is typically used to carry binary data over text-based media without changing the original byte stream.
What Is Base64 Encoding Used For?
Base64 is used to transmit data over text-based media where the original byte values of the data are important. Some systems may alter the original bytes of a string due to differing text encodings. Additionally, text-based systems typically can't handle full binary data. To verify a hash fingerprint or digital signature, for example, a receiver needs exactly the same byte values that the originator sent. If any byte value were to change, verification would fail. You'll find Base64 used in SSL certificates, SAML assertions for single sign-on systems, data embedded in XML and HTML, and many other use cases.
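As a rough illustration of that round trip, here is a minimal sketch using Python's standard `base64` module. The sample bytes are arbitrary values chosen for this example, not data from any real certificate or signature.

```python
import base64

# Arbitrary binary data, including bytes that are not printable ASCII.
original = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F, 0x0A])

encoded = base64.b64encode(original)  # bytes made up only of Base64 characters
text = encoded.decode("ascii")        # safe to place in a text-based medium
decoded = base64.b64decode(text)      # the receiver recovers the raw bytes

print(text)                 # AP8QgH8K
print(decoded == original)  # True
```

The receiver ends up with exactly the bytes that were sent, which is the property a signature or hash check depends on.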
How Does Base64 Work Exactly?
The Base64 alphabet is a subset of the ASCII character set (A-Z, a-z, 0-9, +, /, =). If you're paying very close attention, you may have noticed that's actually 65 different characters. You are right: the character "=" has a very special meaning in the Base64 lexicon. More on that later. Each Base64 character represents 6 bits of data, and characters are grouped in fours, so each group of 4 Base64 characters carries 24 bits. Conveniently, that is exactly 3 bytes of data, so 3 bytes can be encoded evenly.
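The arithmetic behind those numbers can be checked with a short sketch. The constant names below are made up for illustration, and the standard `base64` module is used only to confirm the output lengths for inputs that are a multiple of 3 bytes.

```python
import base64

BITS_PER_BYTE = 8
BITS_PER_CHAR = 6

distinct_values = 2 ** BITS_PER_CHAR           # 6 bits can take 64 different values
group_bits = 3 * BITS_PER_BYTE                 # 3 input bytes are 24 bits
chars_per_group = group_bits // BITS_PER_CHAR  # 24 bits fill 4 Base64 characters

print(distinct_values, group_bits, chars_per_group)  # 64 24 4

# Inputs of 3, 6 and 9 zero bytes encode to 4, 8 and 12 characters.
lengths = {n: len(base64.b64encode(bytes(n))) for n in (3, 6, 9)}
print(lengths)  # {3: 4, 6: 8, 9: 12}
```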
Let's look at an example. The ASCII character string "Dog" is represented by:
- D | ASCII: 0100 0100
- o | ASCII: 0110 1111
- g | ASCII: 0110 0111
Now let's arrange the bits in groups of 6 for Base64 encoding. You can refer to the Base64 Alphabet Chart below to map each value to its character.
- R | Dec: 17 | Bin: 010 001
- G | Dec: 6 | Bin: 000 110
- 9 | Dec: 61 | Bin: 111 101
- n | Dec: 39 | Bin: 100 111
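The regrouping shown above can be reproduced with a few lines of Python. This is only an illustrative sketch: `ALPHABET` is a name chosen here for the 64-character table, and the bits are handled as a text string for readability rather than speed.

```python
import string

# The 64 Base64 characters in index order: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

data = "Dog".encode("ascii")

# Write every byte as 8 bits and join them into one 24-bit string.
bits = "".join(f"{byte:08b}" for byte in data)

# Cut the bit string into groups of 6 and read each group as a number.
groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
indices = [int(group, 2) for group in groups]
encoded = "".join(ALPHABET[index] for index in indices)

print(groups)   # ['010001', '000110', '111101', '100111']
print(indices)  # [17, 6, 61, 39]
print(encoded)  # RG9n
```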
As you can see, the resulting Base64 encoded string for "Dog" is "RG9n". Let's do another example. Take the ASCII character string "Dogs", which is represented by:
- D | ASCII: 0100 0100
- o | ASCII: 0110 1111
- g | ASCII: 0110 0111
- s | ASCII: 0111 0011
Again, let's arrange the bits in groups of 6 for Base64 encoding.
- R | Dec: 17 | Bin: 010 001
- G | Dec: 6 | Bin: 000 110
- 9 | Dec: 61 | Bin: 111 101
- n | Dec: 39 | Bin: 100 111
- c | Dec: 28 | Bin: 011 100
- ? | Dec: ?? | Bin: 11? ???
We have a problem! We have 2 remaining bits, but each Base64 character encodes 6 bits. What do we do? Remember I said the "=" character has a special meaning? It's used to pad the end of a Base64 encoded string. Remember too that Base64 encodes 24 bits at a time, as 4 Base64 characters of 6 bits each. To build our last group of 24 bits, we first add zero bits to fill out the last, incomplete group of 6 bits.
- R | Dec: 17 | Bin: 010 001
- G | Dec: 6 | Bin: 000 110
- 9 | Dec: 61 | Bin: 111 101
- n | Dec: 39 | Bin: 100 111
- c | Dec: 28 | Bin: 011 100
- w | Dec: 48 | Bin: 110 000
Since we need 2 additional Base64 characters to complete our second group of 4 Base64 characters, we add the "=" character twice as padding.
- R | Dec: 17 | Bin: 010 001
- G | Dec: 6 | Bin: 000 110
- 9 | Dec: 61 | Bin: 111 101
- n | Dec: 39 | Bin: 100 111
- c | Dec: 28 | Bin: 011 100
- w | Dec: 48 | Bin: 110 000
- = | Dec: -- | Bin: --- ---
- = | Dec: -- | Bin: --- ---
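Both padding steps, zero bits first and "=" characters second, fit in a small function. The sketch below is a teaching aid and not how production libraries do it; `b64encode_sketch` and `ALPHABET` are hypothetical names, and the result is compared with Python's standard `base64` module as a sanity check.

```python
import base64
import string

# The 64 Base64 characters in index order: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def b64encode_sketch(data: bytes) -> str:
    """Hypothetical helper: encode bytes as padded Base64 text."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Step 1: add zero bits until the last group holds a full 6 bits.
    bits += "0" * (-len(bits) % 6)
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Step 2: add "=" until the character count is a multiple of 4.
    padding = "=" * (-len(chars) % 4)
    return "".join(chars) + padding

result = b64encode_sketch(b"Dogs")
print(result)                                               # RG9ncw==
print(result == base64.b64encode(b"Dogs").decode("ascii"))  # True
```

For "Dogs" the 32 input bits get 4 zero bits appended to make 36, which gives 6 characters, and then 2 "=" characters bring the total to 8.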
Now we have 2 groups of 4 Base64 characters. If we had the ASCII character string "Dogs!", our Base64 encoded data would look like this:
- R | Dec: 17 | Bin: 010 001
- G | Dec: 6 | Bin: 000 110
- 9 | Dec: 61 | Bin: 111 101
- n | Dec: 39 | Bin: 100 111
- c | Dec: 28 | Bin: 011 100
- y | Dec: 50 | Bin: 110 010
- E | Dec: 4 | Bin: 000 100
- = | Dec: -- | Bin: --- ---
Notice there is only one "=" character this time. Try it for yourself, and remember that Base64 input is case sensitive. Experiment with different input strings and notice how the output changes as you add each character.
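One way to run that experiment is with Python's standard `base64` module, as sketched below. The `padding_count` helper is a made-up name for this example; it derives the number of "=" characters from the input length alone.

```python
import base64

def padding_count(n_bytes: int) -> int:
    """Hypothetical helper: number of "=" characters for n_bytes of input."""
    return (3 - n_bytes % 3) % 3

results = {}
for text in ["Dog", "Dogs", "Dogs!", "dogs!"]:
    raw = text.encode("ascii")
    encoded = base64.b64encode(raw).decode("ascii")
    results[text] = encoded
    print(f"{text!r:8} -> {encoded:10} padding: {padding_count(len(raw))}")

# "Dog"   encodes to RG9n      (no padding)
# "Dogs"  encodes to RG9ncw==  (two "=")
# "Dogs!" encodes to RG9ncyE=  (one "=")
# "dogs!" encodes to ZG9ncyE=  (changing the case of the input changes the output)
```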
Base64 Alphabet Chart
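- 0 | A
- 1 | B
- 2 | C
- 3 | D
- 4 | E
- 5 | F
- 6 | G
- 7 | H
- 8 | I
- 9 | J
- 10 | K
- 11 | L
- 12 | M
- 13 | N
- 14 | O
- 15 | P
- 16 | Q
- 17 | R
- 18 | S
- 19 | T
- 20 | U
- 21 | V
- 22 | W
- 23 | X
- 24 | Y
- 25 | Z
- 26 | a
- 27 | b
- 28 | c
- 29 | d
- 30 | e
- 31 | f
- 32 | g
- 33 | h
- 34 | i
- 35 | j
- 36 | k
- 37 | l
- 38 | m
- 39 | n
- 40 | o
- 41 | p
- 42 | q
- 43 | r
- 44 | s
- 45 | t
- 46 | u
- 47 | v
- 48 | w
- 49 | x
- 50 | y
- 51 | z
- 52 | 0
- 53 | 1
- 54 | 2
- 55 | 3
- 56 | 4
- 57 | 5
- 58 | 6
- 59 | 7
- 60 | 8
- 61 | 9
- 62 | +
- 63 | /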
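The chart does not need to be memorized, because it follows a simple order: uppercase letters, lowercase letters, digits, then "+" and "/". A small sketch that rebuilds it in Python follows; the names `ALPHABET`, `chart`, and `reverse_chart` are chosen for illustration only.

```python
import string

# The 64 Base64 characters in index order: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Index -> character, as in the chart above.
chart = dict(enumerate(ALPHABET))

# Character -> index, which is the direction a decoder needs.
reverse_chart = {char: index for index, char in chart.items()}

print(len(chart))                            # 64
print([chart[i] for i in (17, 6, 61, 39)])   # ['R', 'G', '9', 'n']
print([reverse_chart[c] for c in "RG9n"])    # [17, 6, 61, 39]
```

Note that "=" is deliberately absent from the table: it is padding and never stands for a 6-bit value.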