Java & Stream Ciphers
By: Rich Helton
Oct. 1, 2003 12:00 AM
In the 1990s, I worked extensively with the Winsock 2 interface and encryption when Microsoft first released Winsock 2 in beta form. It was an exciting time in networking because Winsock 2 made it easy to encrypt data sent across a network.
When Java sockets came out, encryption could be managed easily through a stream of data. After reading data from a socket stream and encrypting the stream as it passed through the Internet, I was hooked on Java. C++ was prevalent, but it didn't seem to have streaming built into the language as thoroughly as Java did. Java offered a secure programming language that separated itself from the operating system and network internals, and its streams provided a layer for processing a continuous flow of data that could be encrypted along the way, advancing the evolution of programming techniques.
In this article I discuss streams from the cipher perspective and provide an example of how to design and build a stream algorithm so you can practice proper techniques rather than rely on the technology to do it for you (an Ant script to deploy is included with the source code, which can be downloaded from www.sys-con.com/java/sourcec.cfm). The basic terminology to remember throughout the article is that a cipher is an encryption/decryption algorithm, and a stream is data processed a piece (either bit or byte) at a time. Knowing only those terms, you can build on the rest.
As I mentioned earlier, a stream is data that's processed one bit or, more likely, one byte at a time. Most algorithms work on whole bytes, not on individual bits. A stream cipher is an encryption and decryption algorithm for streams. Encryption changes readable text (plaintext) into an unreadable form (ciphertext); decryption does the reverse. While most ciphers are block ciphers operating in modes that trace back to the original DES, a true stream cipher (like RC4) comes in handy when the length of the data is unknown or doesn't divide evenly into blocks.
For those who are unfamiliar with ciphers and keys, a cipher is the engine that encrypts and decrypts data, and the key is the extra data the engine needs to complete the task. In the early days of cryptography, only a handful of people knew the algorithms, and only they could encrypt and decrypt data. Over time, most algorithms became published specifications that anyone could access, and the key became the missing piece that ensures not everyone can encrypt and decrypt the data. The key must be protected at all costs, especially if it is symmetric. A symmetric, or secret, key is one in which the same key is used for both encryption and decryption, so anyone who has access to the key can easily decrypt a message. Identifying the algorithm that was used is often not difficult because, like a virus, an encrypted message may contain signatures that reveal its originating algorithm. The size of the key determines how easy or difficult it is to break the encryption by trying keys, simply because a smaller key has fewer possible values, while a larger key has more.
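To make the key-size point concrete, the following sketch counts the possible keys for a given key length using BigInteger: an n-bit key has 2^n values, and a brute-force search succeeds on average after trying half of them. The class and method names are illustrative assumptions, not part of the article's source code.

```java
import java.math.BigInteger;

public class KeySpace {
    // An n-bit key has 2^n possible values
    public static BigInteger keyCount(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    // A brute-force search finds the key, on average, after trying half the key space
    public static BigInteger averageTries(int bits) {
        return keyCount(bits).shiftRight(1);
    }

    public static void main(String[] args) {
        for (int bits : new int[] {40, 56, 128}) {
            System.out.println(bits + "-bit key: " + keyCount(bits) + " keys, about "
                    + averageTries(bits) + " tries on average");
        }
    }
}
```

Each extra bit doubles the work, which is why a 40-bit key (1,099,511,627,776 possible keys) is far weaker than a 128-bit one.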
To understand stream ciphers, let's start with the theory of a key stream, sometimes called a running key. In theory, a key stream is a continuous sequence of randomly generated key bytes used to produce the ciphertext: each key byte is XORed with a plaintext byte to produce a ciphertext byte, for the full length of the plaintext. If the key stream is truly random, at least as long as the message, and never reused (a one-time pad), the encryption cannot be broken. The larger the key, the more secure the ciphertext, because there are more possible keys to try. The more random the key, the harder it is to break, because there are fewer patterns an attacker can exploit to reverse it.
The algebraic notation for the previous discussion is $c_i = p_i \oplus k_i$, where $c_i$ is the ciphertext at index $i$, $p_i$ is the plaintext at index $i$, $k_i$ is the key at index $i$, and $\oplus$ denotes XOR. As we'll see in the RC4 algorithm, XOR is useful because the same operation reverses itself: $p_i = c_i \oplus k_i$, which means I can recover the plaintext by XORing the ciphertext with the original key.
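A minimal Java sketch of these two equations, assuming the key stream is simply a byte array at least as long as the data (filled here from SecureRandom, one-time-pad style). The same xor method both encrypts and decrypts, which is the reversibility the equations describe. The class name KeyStreamXor is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class KeyStreamXor {
    // c[i] = p[i] ^ k[i]; applying it again with the same key stream gives p[i] back
    public static byte[] xor(byte[] data, byte[] keyStream) {
        if (keyStream.length < data.length) {
            throw new IllegalArgumentException("key stream is shorter than the data");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ keyStream[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "This is a test, hackers beware".getBytes(StandardCharsets.US_ASCII);
        // One-time-pad style: a random key stream as long as the message, used once
        byte[] keyStream = new byte[plain.length];
        new SecureRandom().nextBytes(keyStream);

        byte[] cipher = xor(plain, keyStream);
        byte[] recovered = xor(cipher, keyStream);
        System.out.println("Recovered plaintext matches: " + Arrays.equals(plain, recovered));
    }
}
```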
One practice that evolved from the running-key idea is using each ciphertext byte as the key for the next plaintext byte. Another evolution of the running key is using the key to generate a larger set of key material by hashing initialization data into an S-box (a substitution box). S-boxes are discussed later, but they can be described as a vector derived from the key that the algorithm uses to manipulate the data.
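The ciphertext-feedback idea can be illustrated with a toy autokey sketch: a secret seed byte keys the first plaintext byte, and each ciphertext byte then keys the next plaintext byte. This is an illustration under stated assumptions only (the class name CiphertextAutokey and the seed are hypothetical); because every key byte after the first is visible ciphertext, it offers almost no security on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CiphertextAutokey {
    // Toy only: the seed keys the first byte, then each ciphertext byte keys the next plaintext byte
    public static byte[] encrypt(byte[] plain, byte seed) {
        byte[] out = new byte[plain.length];
        byte key = seed;
        for (int i = 0; i < plain.length; i++) {
            out[i] = (byte) (plain[i] ^ key);
            key = out[i]; // feed the ciphertext byte back as the next key byte
        }
        return out;
    }

    public static byte[] decrypt(byte[] cipher, byte seed) {
        byte[] out = new byte[cipher.length];
        byte key = seed;
        for (int i = 0; i < cipher.length; i++) {
            out[i] = (byte) (cipher[i] ^ key);
            key = cipher[i]; // the receiver sees the same ciphertext byte
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "running key".getBytes(StandardCharsets.US_ASCII);
        byte seed = (byte) 0x5A; // hypothetical secret seed byte
        byte[] cipher = encrypt(plain, seed);
        System.out.println("Round trip OK: " + Arrays.equals(plain, decrypt(cipher, seed)));
    }
}
```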
When all is said and done, we need a key (say, 40 bits in some cases) with as few patterns as possible, and the key needs to be kept secure. If you remember anything from this article, remember that safeguarding the key must be the highest priority, since it controls access just like the keys to your car or house. The other point to remember about keys is that size does matter.
Other utilities that play a big role in secure programming in Java are the KeyStore, the jarsigner, and the security manager. While this article is too brief to describe them in detail, you need to know that Java provides the keytool utility to manage keys and certificates in a secure key store, the jarsigner utility to sign a JAR file so that any tampering with its contents can be detected, and a security manager that uses a security policy to control which resources code can access at runtime. These utilities can control access to vital resources and data. Note that they come out of the box with Java, as do many encryption algorithms, which again is a benefit of using Java.
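As a hedged example of the KeyStore API, this sketch stores an AES secret key in an in-memory JCEKS key store protected by a password, writes the store out, and reloads it. JCEKS is a key store type supplied by the SunJCE provider that can hold secret keys; the alias, the password, and the class name are illustrative assumptions, and a real application would persist the store to a file and never hard-code the password.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class SecretKeyStoreDemo {
    private static final String ALIAS = "demo-key"; // illustrative alias

    // Stores a secret key in a JCEKS key store, serializes the store, reloads it, and reads the key back
    public static SecretKey roundTrip(SecretKey key, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance("JCEKS");
        store.load(null, null); // start with an empty key store
        KeyStore.PasswordProtection protection = new KeyStore.PasswordProtection(password);
        store.setEntry(ALIAS, new KeyStore.SecretKeyEntry(key), protection);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        store.store(out, password); // the password also protects the store's integrity

        KeyStore reloaded = KeyStore.getInstance("JCEKS");
        reloaded.load(new ByteArrayInputStream(out.toByteArray()), password);
        return (SecretKey) reloaded.getKey(ALIAS, password);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        SecretKey restored = roundTrip(key, "changeit".toCharArray()); // never hard-code real passwords
        System.out.println("Key survived the round trip: "
                + Arrays.equals(key.getEncoded(), restored.getEncoded()));
    }
}
```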
The de facto stream algorithm is RC4. RC4 stands for "Ron's Code" number 4, Ron being Ron Rivest, the "R" in RSA. Because it's the de facto stream algorithm, I'll use it as the example for designing, building, and deploying a stream algorithm. You should understand cipher algorithms not just to know how to use them, but to understand when to use them, what their strengths and vulnerabilities are, and how to develop your own algorithms.
After spending many years as a consultant, I've heard programmers proclaim, "I just need to know how to use it, not what it does; the builders of the JCEs are the ones concerned with those details." Yet I have gotten a lot of consulting work reworking organizations' code, usually because someone didn't understand the algorithm correctly. Just as an e-commerce programmer should understand the internals of JSPs and EJBs, a security programmer needs to know the internals of RC4, RSA, and other algorithms. From the IT security officer's point of view, programmers should be able to explain the reasoning, strengths, weaknesses, and history behind the algorithms they're using. Misusing a cipher algorithm because you don't understand it in enough detail can be worse than not having an algorithm at all.
Unlike a stream cipher, a block cipher processes data in fixed-size chunks, called blocks, usually 64 bits (8 bytes) in the case of DES. If the final block is shorter than the block size, the algorithm pads the remainder. For data that may be only a few bytes long, this can seem like a lot of overhead, and for I/O-heavy data, breaking it into blocks may seem to take a lot of time. For many, the solution is a stream cipher, which handles data of any size and is quick to process. One place where a stream cipher may be a detriment is document files, where you may not want the plaintext and ciphertext lengths to match. I tend to use stream ciphers with stream I/O, especially Java sockets, when speed is important; some users of RC4 state that it is 10 times faster than DES. When using RC4, pay careful attention to the keys. If the same key is used over and over again, every message is XORed with the same key stream; XORing two such ciphertexts cancels the key stream and leaves the XOR of the two plaintexts, so constant observation can compromise the traffic. And if the key is not adequately randomized, it could be weak.
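Both quantitative points in this paragraph can be checked with a short sketch. paddedLength assumes PKCS#5-style padding, which always adds between 1 and blockSize bytes, so a 31-byte message grows to 32 bytes with 8-byte DES blocks, while a stream cipher leaves it at 31. The second half shows why key reuse hurts a stream cipher; random bytes stand in for RC4 output here, an assumption made to keep the example self-contained, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class StreamVsBlock {
    // PKCS#5-style padding always adds 1..blockSize bytes, even to a full block
    public static int paddedLength(int dataLength, int blockSize) {
        return dataLength + (blockSize - dataLength % blockSize);
    }

    // XORs two arrays up to the shorter length
    public static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[Math.min(a.length, b.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // 8-byte (64-bit) blocks, as in DES; a stream cipher keeps the original length
        System.out.println("31 bytes -> " + paddedLength(31, 8) + " bytes"); // 32
        System.out.println(" 3 bytes -> " + paddedLength(3, 8) + " bytes");  // 8

        // Key reuse: one key stream (random stand-in) encrypts two different messages
        byte[] keyStream = new byte[16];
        new SecureRandom().nextBytes(keyStream);
        byte[] p1 = "attack at dawn!!".getBytes(StandardCharsets.US_ASCII);
        byte[] p2 = "retreat at noon!".getBytes(StandardCharsets.US_ASCII);
        byte[] c1 = xor(p1, keyStream);
        byte[] c2 = xor(p2, keyStream);
        // The key stream cancels out: c1 ^ c2 == p1 ^ p2
        System.out.println("c1^c2 equals p1^p2: " + Arrays.equals(xor(c1, c2), xor(p1, p2)));
    }
}
```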
When using a cipher in Java, understanding the cipher itself, like RC4, is only one piece of the puzzle. An understanding of the Java Cryptography Extension (JCE) and its service providers becomes paramount when using any cipher. It is crucial to understand how a provider is located through the provider chain, how to request a specific provider, and how the algorithm is exposed through KeyGenerator and CipherSpi. These concepts matter because programmers may be using a service provider without knowing the origin of the cipher they're using. In other words, programmers need to know where the code they rely on comes from in order to guard against Trojans and backdoors.
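To see the provider chain on your own JVM, this sketch lists the installed providers in preference order and asks the JCE which provider actually serves a given transformation. It also shows that requesting a provider by a name that isn't installed fails with NoSuchProviderException. The transformation AES/CBC/PKCS5Padding and the provider name NoSuchVendor are arbitrary choices for illustration, and the output depends on your JDK's java.security configuration.

```java
import java.security.NoSuchProviderException;
import java.security.Provider;
import java.security.Security;
import javax.crypto.Cipher;

public class ProviderChainDemo {
    // Asks the JCE which provider it selects for a transformation
    public static String providerFor(String transformation) throws Exception {
        return Cipher.getInstance(transformation).getProvider().getName();
    }

    public static void main(String[] args) throws Exception {
        // Providers are consulted in preference order, starting at position 1
        Provider[] providers = Security.getProviders();
        for (int i = 0; i < providers.length; i++) {
            System.out.println((i + 1) + ". " + providers[i].getName() + ": " + providers[i].getInfo());
        }
        System.out.println("AES/CBC/PKCS5Padding is served by " + providerFor("AES/CBC/PKCS5Padding"));

        // Requesting a provider by name only succeeds if that provider is installed
        try {
            Cipher.getInstance("AES/CBC/PKCS5Padding", "NoSuchVendor");
        } catch (NoSuchProviderException e) {
            System.out.println("Not installed: " + e.getMessage());
        }
    }
}
```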
All the key generators and ciphers in Java are built on the Service Provider Interface (SPI) layer. The idea of an SPI layer is to let vendors supply their own algorithm implementations behind a common interface. Because Sun supplies this interface, others can commercially produce extensions that work within the 1.4.1 SDK without having to be built with the SDK. This article provides the code necessary to create a provider (code can be downloaded from www.sys-con.com/java/sourcec.cfm). All providers are registered with Sun through the code-signing certificates Sun issues, which ensures that Sun knows who is integrating and interfacing with the 1.4.1 SDK.
The example class com.richware.cipher.RichProvider is a simple provider class, but there are a few things to remember. I declared it final so that it cannot be extended, and it extends the java.security.Provider class shipped with the Java 1.4.1 SDK. RichProvider registers information that describes its origin, such as its name, version, and info. If you are running providers, possibly including one you downloaded, this information is important because it allows you to discover where the provider came from. A lot of security goes into the provider interface because an organization's valuable data could be encrypted by a rogue provider and sent across the Internet.
Another piece of the provider is that it associates aliases to classes, usually both a KeyGenerator and a Cipher alias depending on the algorithm. However, this is totally dependent on matching a corresponding key type to the algorithm. For example, I used the following code:
This code simply means that when I pass RC4 in a KeyGenerator, it calls my KeyGenerator service provider code com.richware.cipher.RichRC4KeyGenerator. It will likewise call the provider's corresponding code when I pass RC4 in a Cipher instance. The code fragment executes this code in a block as a privileged action, which gets the JVM's Security Manager involved. All provider code must be signed in a Java Archive (JAR) file with a certificate from Sun so that security providers can possibly be tracked. In my example, when the richprovider.jar gets loaded, it has to be authenticated with a trusted certificate. You have to use the keytool to get the trusted certificate and the jarsigner utility to sign it in the provider's JAR file. Take a look at the sidebar "How to Get a Service Provider Certificate" for a set of simplified steps.
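A minimal sketch of a provider along these lines, using hypothetical class names (SketchProvider, com.example.cipher.SketchRC4KeyGenerator, com.example.cipher.SketchRC4Cipher) rather than the article's RichWare classes. The class is final, passes its name, version, and info to the java.security.Provider constructor, and maps the KeyGenerator.RC4 and Cipher.RC4 aliases to implementation class names. Unlike the article's code, it does not wrap the registration in a privileged action, since the Security Manager is deprecated for removal in current JDKs. The (String, double, String) constructor is the one available in the 1.4.1 SDK; it is deprecated since Java 9 but still works.

```java
import java.security.Provider;
import java.security.Security;

// Hypothetical provider; declared final so it cannot be extended
public final class SketchProvider extends Provider {
    private static final long serialVersionUID = 1L;

    public SketchProvider() {
        // Name, version, and info describe where the provider comes from
        super("SketchProvider", 1.0, "Sketch provider mapping RC4 aliases to hypothetical SPI classes");
        // Alias -> implementation class name (both classes are hypothetical)
        put("KeyGenerator.RC4", "com.example.cipher.SketchRC4KeyGenerator");
        put("Cipher.RC4", "com.example.cipher.SketchRC4Cipher");
    }

    public static void main(String[] args) {
        Provider provider = new SketchProvider();
        System.out.println(provider.getName() + ": " + provider.getInfo());
        System.out.println("KeyGenerator.RC4 -> " + provider.getProperty("KeyGenerator.RC4"));
        // addProvider appends to the end of the chain; it returns -1 if already installed
        System.out.println("Installed at position " + Security.addProvider(provider));
    }
}
```

On the 1.4.1 SDK, the JAR containing such a provider would still need to be signed with the certificate from Sun before the JCE would use it.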
Looking at the steps in the sidebar, it's obvious that there are a lot of security provisions and traceability for using Java JCE providers. It was a lot different in the C++ days, when we just added a Dynamic Link Library (DLL) to the System32 path in Windows or a shared library in Unix. Without being preachy about the robustness of Java, the point is that you can examine the origin and execution of a JAR file. For instance, when in doubt about a JAR file, move it to another system and examine its execution there in isolation. You could also trace the JVM as it executes code from the JAR to see whether the JAR has been Trojaned, but that is another discussion.
With native libraries, it becomes more difficult to trace Trojans because doing so requires an understanding of the operating system the native calls are integrated with. Some security consulting involves isolating Trojan signatures in libraries stored on the computer; these techniques help in host-based intrusion detection.
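In the spirit of examining a JAR before trusting it, this sketch uses java.util.jar.JarFile with verification enabled to report which entries carry no code signer. An entry's signers are only known after the entry has been read completely, and reading a signed entry whose contents were altered throws a SecurityException. The class name JarSignerCheck is hypothetical, and the check skips META-INF entries, where the signature files themselves live.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.security.CodeSigner;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarSignerCheck {
    // Returns the names of entries that have no code signer attached
    public static List<String> unsignedEntries(File jar) throws IOException {
        List<String> unsigned = new ArrayList<>();
        byte[] buffer = new byte[8192];
        // true = verify signatures while entries are read
        try (JarFile jarFile = new JarFile(jar, true)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                // Skip directories and the META-INF files that hold the signatures
                if (entry.isDirectory() || entry.getName().startsWith("META-INF/")) {
                    continue;
                }
                // Signers are only available after the entry is read to the end;
                // a tampered signed entry throws SecurityException here
                try (InputStream in = jarFile.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // drain the stream
                    }
                }
                CodeSigner[] signers = entry.getCodeSigners();
                if (signers == null || signers.length == 0) {
                    unsigned.add(entry.getName());
                }
            }
        }
        return unsigned;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarSignerCheck some.jar
        for (String name : unsignedEntries(new File(args[0]))) {
            System.out.println("Unsigned entry: " + name);
        }
    }
}
```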
The code checks the key size in bits and, if it's the wrong size, it will throw an exception, otherwise it will generate a key based on the size. Most of the work is ensuring that the correct key size, given in bits and generated in bytes, is returned as a SecretKeySpec class.
Notice in Listing 1 that the SecureRandom class is used to create the random key. A SecretKeySpec is returned because the key is a secret key. One feature of the design is that if you're not happy with Java's SecureRandom class and feel you can build a better one, you can extend it and pass an instance to the KeyGenerator to use instead. A fragment of the RichRC4KeyGenerator code is provided in Listing 1.
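The key generator described above can be sketched as a javax.crypto.KeyGeneratorSpi subclass. This is not the article's Listing 1; the class name and the accepted key sizes (40 to 2048 bits, in whole bytes) are assumptions. It validates the size in bits, fills a byte array of the matching length from the supplied SecureRandom, and returns the bytes as a SecretKeySpec.

```java
import java.security.InvalidAlgorithmParameterException;
import java.security.InvalidParameterException;
import java.security.SecureRandom;
import java.security.spec.AlgorithmParameterSpec;
import javax.crypto.KeyGeneratorSpi;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical RC4 key generator SPI
public class SketchRC4KeyGenerator extends KeyGeneratorSpi {
    private static final int MIN_BITS = 40;   // assumed lower bound
    private static final int MAX_BITS = 2048; // RC4 keys are at most 256 bytes

    private int keyBits = 128; // assumed default size
    private SecureRandom random;

    @Override
    protected void engineInit(SecureRandom random) {
        this.random = random;
    }

    @Override
    protected void engineInit(AlgorithmParameterSpec params, SecureRandom random)
            throws InvalidAlgorithmParameterException {
        throw new InvalidAlgorithmParameterException("RC4 key generation takes no parameters");
    }

    @Override
    protected void engineInit(int keysize, SecureRandom random) {
        // Key size is given in bits; reject sizes that are out of range or not whole bytes
        if (keysize < MIN_BITS || keysize > MAX_BITS || keysize % 8 != 0) {
            throw new InvalidParameterException("Unsupported RC4 key size: " + keysize + " bits");
        }
        this.keyBits = keysize;
        this.random = random;
    }

    @Override
    protected SecretKey engineGenerateKey() {
        if (random == null) {
            random = new SecureRandom();
        }
        byte[] keyBytes = new byte[keyBits / 8]; // requested in bits, generated in bytes
        random.nextBytes(keyBytes);
        return new SecretKeySpec(keyBytes, "RC4");
    }
}
```

When such a generator is installed in a provider, callers reach engineInit(int, SecureRandom) through KeyGenerator.init(int keysize, SecureRandom random), which is where a custom SecureRandom subclass can be passed in.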
Building the S-box involves mixing the key into an initialized S-box to produce a new substitution box for the RC4 algorithm. Simply put, the S-box is derived from the key and an initial state, much like a new, expanded key that is difficult to work backward from. The idea is to swap data from a known index into an unknown index to defeat Gaussian elimination in attempts to reverse the algorithm. This is accomplished by using the random key to define the position of the next swap through the index2 variable. The counter variable, along with the index1 variable, ensures that every S-box byte is swapped at least once during the calculation. The goal is to avoid any pattern or factoring of the S-box while ensuring that the same key always produces the same S-box.
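A compact sketch of the RC4 key-scheduling step, based on the published RC4 algorithm rather than the article's own listing. Here i plays the role of the counter and j the role of index2, while i % key.length is the key position that index1 probably tracks; that mapping to the article's variable names is an assumption. The S-box starts as the identity permutation 0..255, and each step swaps entry i with an entry chosen by the key, so the result is always a permutation and the same key always yields the same S-box.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RC4KeySchedule {
    // Builds the 256-entry RC4 S-box from a 1- to 256-byte key
    public static int[] buildSBox(byte[] key) {
        if (key.length == 0 || key.length > 256) {
            throw new IllegalArgumentException("RC4 key must be 1 to 256 bytes");
        }
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i; // initialized S-box: identity permutation
        }
        int j = 0; // next swap position, chosen by the key (probably index2 in the article)
        for (int i = 0; i < 256; i++) { // i visits every entry once (probably the article's counter)
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int tmp = s[i];
            s[i] = s[j];
            s[j] = tmp;
        }
        return s;
    }

    public static void main(String[] args) {
        byte[] key = "Key".getBytes(StandardCharsets.US_ASCII);
        int[] first = buildSBox(key);
        int[] second = buildSBox(key);
        System.out.println("Same key, same S-box: " + Arrays.equals(first, second));
        System.out.println("First 8 entries: " + Arrays.toString(Arrays.copyOf(first, 8)));
    }
}
```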
After the S-box is built from the key, the RC4 algorithm can be used to encrypt or decrypt the data. Listing 3 demonstrates the RC4 cipher.
From the code, you can see that each output byte of the RC4 algorithm is the input byte XORed with the S-box value at xorIndex. First, the x index is advanced by one, and the y index is updated by adding the S-box value at x. The S-box entries at x and y are then swapped. Next, xorIndex is computed as the sum of the S-box values at x and y (modulo 256), and the S-box value at xorIndex is the key byte that is XORed with the input. Again, the idea is to keep swapping the S-box entries and indexes so that the values and their locations are difficult to recover by factoring. Making pattern finding and reverse factoring difficult is the guiding principle for developing ciphers.
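Putting both phases together, the following is a minimal RC4 sketch in plain Java that follows the published algorithm rather than the article's Listing 3; the class name RC4Sketch is hypothetical. The process method is the generation loop just described: advance x, update y, swap, compute xorIndex, and XOR. Because the cipher keeps state between calls, decryption uses a fresh instance with the same key. The main method uses the widely published test vector with key "Key" and plaintext "Plaintext".

```java
import java.nio.charset.StandardCharsets;

public class RC4Sketch {
    private final int[] s = new int[256];
    private int x = 0; // advances by one for every byte processed
    private int y = 0; // moves by the S-box value found at x

    public RC4Sketch(byte[] key) {
        if (key.length == 0 || key.length > 256) {
            throw new IllegalArgumentException("RC4 key must be 1 to 256 bytes");
        }
        // Key schedule: start from the identity permutation, then let the key drive the swaps
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        int j = 0;
        for (int i = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            swap(i, j);
        }
    }

    private void swap(int a, int b) {
        int tmp = s[a];
        s[a] = s[b];
        s[b] = tmp;
    }

    // Encrypts or decrypts: XORs each input byte with the next key stream byte
    public byte[] process(byte[] input) {
        byte[] output = new byte[input.length];
        for (int n = 0; n < input.length; n++) {
            x = (x + 1) & 0xFF;
            y = (y + s[x]) & 0xFF;
            swap(x, y);
            int xorIndex = (s[x] + s[y]) & 0xFF;
            output[n] = (byte) (input[n] ^ s[xorIndex]);
        }
        return output;
    }

    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = "Key".getBytes(StandardCharsets.US_ASCII);
        byte[] ciphertext = new RC4Sketch(key).process("Plaintext".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(ciphertext)); // BBF316E8D940AF0AD3
        // A fresh instance with the same key regenerates the same key stream
        byte[] plaintext = new RC4Sketch(key).process(ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.US_ASCII)); // Plaintext
    }
}
```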
Testing the Program
The program encrypts the message "This is a test, hackers beware," which is 31 bytes, into a 31-byte ciphertext that Base64-encodes to something like "cUi8DZfy+IQti6xl4Z4FhzRZl2mY2Pa7RmZygnVXnA==", depending on the key. After encrypting, I decrypt and compare the output to the original message to verify that nothing changed. Listing 4 provides the code fragment that accomplishes this.
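For readers without the signed RichWare provider, a round-trip test in the same spirit can be sketched with the JDK's built-in providers. This sketch swaps in AES in CTR mode (AES/CTR/NoPadding), which turns a block cipher into a stream-style cipher, so the ciphertext length equals the plaintext length just as with RC4; it is not RC4 and not the article's provider. The random key, the random 16-byte IV, and the class name RoundTripTest are illustrative assumptions. As in the article's test, the same secret key (and here the same IV) is used for encryption and decryption.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class RoundTripTest {
    // Encrypts or decrypts with AES in counter mode; no padding, so lengths are preserved
    public static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, key, new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] message = "This is a test, hackers beware".getBytes(StandardCharsets.UTF_8);
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv); // never reuse a key/IV pair in CTR mode

        byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, key, iv, message);
        System.out.println("Ciphertext (Base64): " + Base64.getEncoder().encodeToString(encrypted));
        System.out.println("Lengths: " + message.length + " -> " + encrypted.length);

        byte[] decrypted = crypt(Cipher.DECRYPT_MODE, key, iv, encrypted);
        System.out.println("Round trip OK: " + Arrays.equals(message, decrypted));
    }
}
```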
The source code includes a "-v" option that puts the test code and provider in verbose mode for readers who want to trace the provider calls and S-box information. A point to note in Listing 4 is that the same secret key used for encryption is used for decryption. The provider won't work unless richprovider.jar is signed with the certificate from Sun; without it, you should see the exception "java.security.NoSuchProviderException: JCE cannot authenticate the provider RichWare".
To keep the key pattern simple: if my key were all 0s and XORed with the plaintext, the ciphertext would be identical to the plaintext, making the cipher useless. If the key contained only 1s, each ciphertext byte would simply be the bitwise complement of the plaintext byte, an obvious relationship that is just as easy to reverse. The more evenly 1s and 0s are mixed throughout the key, the more difficult it becomes to see any relationship between the plaintext and ciphertext.
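A tiny sketch that makes the weak-key point visible by XORing the two-byte message "Hi" first with a repeated all-0s key byte and then with an all-1s key byte (0xFF). The class and method names are hypothetical; the comments show the hex output.

```java
import java.nio.charset.StandardCharsets;

public class WeakKeyDemo {
    // XORs every byte of data with a single repeated key byte
    public static byte[] xorWithByte(byte[] data, byte keyByte) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ keyByte);
        }
        return out;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] plain = "Hi".getBytes(StandardCharsets.US_ASCII);
        System.out.println("plaintext:  " + hex(plain));                             // 48 69
        System.out.println("all-0s key: " + hex(xorWithByte(plain, (byte) 0x00)));  // 48 69 (unchanged)
        System.out.println("all-1s key: " + hex(xorWithByte(plain, (byte) 0xFF)));  // B7 96 (bits flipped)
    }
}
```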
How to Get a Service Provider Certificate
Here are simplified steps for getting a service provider certificate from Sun; the Java 1.4.1 SDK documentation covers the process in much more detail. Allow some time for Sun to issue the certificate. The following steps take you through the process:
2. JAR the provider class code (e.g., RichProvider) using the jar utility.
3. Sign the JAR (e.g., richprovider.jar) with the certificate from Sun using the jarsigner utility.
4. Ensure that the JAR (e.g., richprovider.jar) is in the %JAVA_HOME%\jre\lib\ext directory.