Protect Your Source Code: Obfuscation 101
Obfuscation Example
The University of Arizona has developed a very nice Java tool called Sandmark that is used to test and study software watermarking. As part of that study, it implements many well-known obfuscation algorithms and provides a GUI for a tool called Soot, an optimization tool that can also decompile bytecode. These two tools together allow you to take Java code, obfuscate it, and then decompile it to see the effect of the obfuscations. Let's get started with an example to put all of this together:
Download Sandmark. Get the executable sandmark.jar file and all the supporting jar files: BCEL.jar, bloat-1.0.jar, dynamicjava.jar, and junit.jar. Place these files in /Library/Java/Home.
Download Soot. You'll want the three precompiled jar files: sootclasses-2.2.1.jar, jasminclasses-2.2.1.jar, and polyglotclasses-1.3.jar available from the main page. Place them in /Library/Java/Home.
Download the jDecompile script. Add the directory that contains this script to your $PATH by typing export PATH=$PATH:/path/to/directory. Also, make the script executable with chmod u+x jDecompile. If you decide to use this tool a lot, you'll want to add it to your path permanently by modifying your .bashrc file. Make sure you run Sandmark from the same Terminal window in which you set the path, or else Sandmark won't find the script and you'll get errors.
To start up Sandmark, navigate to the sandmark.jar file with Terminal and execute it by typing java -jar sandmark.jar. The toolbar up top expands with a button on the far right, and each tab has its own specialized help menu, which actually is pretty helpful. For instance, the jDecompile script you downloaded is an adaptation of a script from the "Decompile" tab that I tailored for OS X.
Sandmark has an easy-to-use GUI interface and a great help system.
To do a quick obfuscation of your code, you can choose a particular obfuscation algorithm from the "Obfuscate" tab, or let Sandmark apply a variety of obfuscation algorithms from the "Quick Protect" tab.
For an overview of the algorithms available in Sandmark, check here. The only caveat is that Sandmark expects a jar file. (If you'd like an overview of creating and working with jar files before jumping into an example with Sandmark, check here.)
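Since Sandmark works on jar files, it can help to confirm that a jar declares an entry point before loading it into a tool. The following is a minimal sketch using the standard java.util.jar classes; the class name JarManifestSketch, its helper methods, and the entry-point name DemoMain are made up for illustration and are not part of Sandmark.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarManifestSketch {

    // Writes a jar that contains only a manifest naming the given entry-point class.
    static void writeJar(Path jarPath, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, or the main attributes are not written out.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        try (OutputStream out = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            // No class entries are added; only the manifest matters for this sketch.
        }
    }

    // Returns the Main-Class attribute, or null if the jar does not declare one.
    static String readMainClass(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // "DemoMain" is a hypothetical entry-point name used only for this demo.
        Path jarPath = Files.createTempFile("sketch", ".jar");
        writeJar(jarPath, "DemoMain");
        System.out.println("Main-Class: " + readMainClass(jarPath));
        Files.delete(jarPath);
    }
}
```

The same readMainClass check could be pointed at any jar you plan to obfuscate; a null result would suggest the manifest is missing its Main-Class line.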
Save the following Java code to a file called IfElseDemo.java:
public class IfElseDemo {
    public static void main(String[] args) {
        int testscore = 76;
        char grade;
        if (testscore >= 90) {
            grade = 'A';
        } else if (testscore >= 80) {
            grade = 'B';
        } else if (testscore >= 70) {
            grade = 'C';
        } else if (testscore >= 60) {
            grade = 'D';
        } else {
            grade = 'F';
        }
        System.out.println("Grade = " + grade);
    }
}
Let's apply an obfuscation algorithm to this simple example and then decompile the obfuscated bytecode to see the difference.
Compile IfElseDemo in Terminal.
Save a file called IfElseDemo.java containing the example code above.
Type javac IfElseDemo.java to compile.
Type java IfElseDemo to verify that the code runs.
Wrap IfElseDemo.class into an executable jar file.
With VIM or a text editor of your choice, create a file called "mainClass" in this same directory.
In "mainClass", place this line: "Main-Class: IfElseDemo" (no quotes), and end it with a newline so that the jar tool parses it.
Back on the command line, type jar cmf mainClass IfElseDemo.jar IfElseDemo.class.
Verify that the jar file is created and type java -jar IfElseDemo.jar to verify that it executes properly.
Obfuscate the IfElseDemo.jar in Sandmark.
Choose the "Obfuscate" tab, and select the "Merge Local Integers" algorithm. Since our example code is primarily dependent upon integers for its logic, this looks like a good choice.
Name the output file IfElseDemo_obfuscated.jar.
Click on "Obfuscate".
Verify that IfElseDemo_obfuscated.jar exists and execute it with java -jar IfElseDemo_obfuscated.jar.
Decompile IfElseDemo_obfuscated.jar with Sandmark to see the difference.
Choose the "Decompile" tab by extending the tabs with the arrow button on the far right.
Choose the IfElseDemo_obfuscated.jar as your input file.
Type "IfElseDemo" (no quotes) into the "Class" text box.
Leave the "Classpath" text box blank.
Click on "Decompile".
If all goes well, a preview of the obfuscated source code opens up that is quite a bit harder to understand. If you have trouble with the decompiling portion, make sure your path is set correctly for the Terminal window in which you're running Sandmark.
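To give a feel for the kind of change to expect, here is a hand-written sketch of the idea behind merging local integers, assuming the transformation packs two int-sized locals into a single long. It is not actual Sandmark or Soot output, and the class and method names are invented for illustration; real decompiled code will look different and usually messier.

```java
public class MergedIntsSketch {

    // Original logic from IfElseDemo, kept here for comparison.
    static char gradePlain(int testscore) {
        char grade;
        if (testscore >= 90) {
            grade = 'A';
        } else if (testscore >= 80) {
            grade = 'B';
        } else if (testscore >= 70) {
            grade = 'C';
        } else if (testscore >= 60) {
            grade = 'D';
        } else {
            grade = 'F';
        }
        return grade;
    }

    // Same logic with both locals packed into one long:
    // the high 32 bits hold the score, the low 32 bits hold the grade.
    static char gradeMerged(int testscore) {
        final long HIGH = 0xFFFFFFFF00000000L;
        long v = ((long) testscore) << 32;
        if ((int) (v >> 32) >= 90) {
            v = (v & HIGH) | 'A';
        } else if ((int) (v >> 32) >= 80) {
            v = (v & HIGH) | 'B';
        } else if ((int) (v >> 32) >= 70) {
            v = (v & HIGH) | 'C';
        } else if ((int) (v >> 32) >= 60) {
            v = (v & HIGH) | 'D';
        } else {
            v = (v & HIGH) | 'F';
        }
        return (char) (v & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println("Grade = " + gradeMerged(76)); // prints "Grade = C"
    }
}
```

Both methods return the same grade for every score; the merged version simply hides which value is which behind shifts and masks.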
Although this example doesn't unlock any of the secrets of the universe, it does illustrate how effective obfuscation can be for even a simple example. Now imagine applying various obfuscation techniques to thousands of lines of more complex code.
If you take a look at the algorithms Sandmark offers, you'll notice that there are scores of confusing transformations. Refactoring inheritance hierarchies, introducing confusing arithmetic operations, and introducing buggy variations of existing code blocks that never get executed are just a few of the possibilities. Keep in mind that you might want to obfuscate just the sensitive portions of your code, because obfuscation can impose size and performance penalties. The penalties may or may not make a difference; it's a trade-off you have to measure and consider.
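Two of those techniques, confusing arithmetic and buggy code blocks that never execute, can be combined in what is often called an opaque predicate. The sketch below is a hand-written illustration, not Sandmark output; the class and method names are invented for the example.

```java
public class OpaquePredicateSketch {

    // Always true: x*x + x equals x*(x+1), a product of consecutive integers,
    // so it is even. Integer overflow does not change the lowest bit.
    static boolean opaquelyTrue(int x) {
        return (x * x + x) % 2 == 0;
    }

    // Sums the array. The seed only feeds the opaque predicate and has no
    // effect on the result.
    static int sum(int[] values, int seed) {
        int total = 0;
        if (opaquelyTrue(seed)) {
            // Real code path.
            for (int v : values) {
                total += v;
            }
        } else {
            // Deliberately buggy clone that never runs: it skips the first
            // element and subtracts one from every other element.
            for (int i = 1; i < values.length; i++) {
                total += values[i] - 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}, 42)); // prints 6
    }
}
```

Someone reading the decompiled code has to work out that the condition never fails before they can tell which loop matters. The cost is extra bytecode plus a few arithmetic operations per call, which is the size and performance trade-off mentioned above.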
Final Thoughts
In a world where everyone follows license agreements and no one wants to reverse engineer government secrets, obfuscation techniques wouldn't be of much use. Since we don't live in the Shire, however, security measures have their place and are just one of the many things that keep the world spinning. Hopefully, you now have a better feel for the compilation process and understand how obfuscation is a powerful tool you can use to protect your code from exploitation and hacking.
Matthew Russell is a computer scientist from middle Tennessee and serves Digital Reasoning Systems as the Director of Advanced Technology. Hacking and writing are two activities essential to his renaissance man regimen.
-
Extremely slow applications
2005-04-09 13:50:15 elanthis
-
Extremely slow applications
2005-04-09 17:00:50 Matthew Russell
Well, I'd be careful with making such a generalization as that. Clearly, if you obfuscate your entire performance-sensitive application, then you'd probably see some major meltdown (especially for an interpreted language), but it's almost never going to make sense to obfuscate an entire application.
It's likely just the sensitive portions of code that you want to protect: maybe something during startup that makes sure you've registered it, a portion that determines whether to fire another nag screen, or some other routine that handles how you encrypt some data that's stored to disk -- or something along those lines, i.e., the portions that are likely to be examined and patched.
-
Extremely slow applications
2005-04-10 20:03:32 michaelartemiw
What generalization! The code obfuscations in the article will without a doubt slow your software to a crawl. In 90% of the cases your "sensitive" code areas are going to be ones that need to run fast.
Just as you want it to take the hacker longer to break your code than it would take to write it from scratch, you want it to be faster for the user to use your software than to do it by hand or use the competition's product.
It should be noted that we are talking about control flow obfuscation, which is bleeding edge. You can still perform name obfuscation, which will not cause a performance penalty and will still deter most snoopers.
-
Extremely slow applications
2005-04-11 06:11:31 Matthew Russell
Ok, so first let me say that I'm not making the case that obfuscation doesn't incur a performance penalty in many cases. But there are many techniques that aren't such a performance drain as you'd think and instead impose more of a size penalty -- something we might be able to live with a little bit more. One of my favorites is when you duplicate a code block several times, introduce small bugs in each of the blocks, and then make it difficult to determine which of the blocks is actually executed. Sandmark does things along these lines.
BTW, the examples in the article for HelloWorld are illustrative and intended to communicate the concept, and weren't intended to be taken as being state-of-the-art. Sandmark, on the other hand, does some pretty clever stuff.
If you tailor your obfuscation approach and actually think about what it's doing (as opposed to blindly picking a technique), you can often keep the performance penalty fairly minimal. In the case of Java or even ObjC, name obfuscation (as you mentioned) can go a long way, and it's trivial to do with a few Perl scripts.
I don't know that I'd agree that 90% of the time your sensitive code areas are going to have to run fast. Maybe so, but maybe not. I wouldn't feel comfortable saying that it's anywhere near 90% of the time though. Certainly, protecting against nag screen removal, registration code patching and basic security mechanisms along those lines doesn't need to run at lightning fast speeds, and those are some of the most common cases that come to mind -- I'd think that this is especially true for "small" independent developers, who often have created more of a software engineering masterpiece than some cutting edge new algorithm for something.
-
Extremely slow applications
2005-05-18 16:11:26 zeon_uk
What you're forgetting is that compilers generate code in specific ways. A hacker over time gets used to these layouts and would immediately notice any obscured code. That in itself is saving the hacker time!
Secondly, a lot of protection code jumps through various hoops but nearly always ends up with a simple binary decision. Most hackers would just alter this bottleneck. It's not true for all situations, but you would be surprised how often it does happen. Literally changing one bit in a branch instruction can circumvent an awful lot of protection code. Not every developer has the ability to check the object code to see if his techniques are effective.
Third, I agree with the remark regarding looking for the distinct features of Sandmark, and writing something to reverse its effects. Many years ago, when version 3 of a well known dongle maker came out, I downloaded a copy of their developers kit and removed the protection on a $3000 application in 45 minutes. Their marketing campaign quoted them as spending a million man hours in its development. I did this by request...the user was having problems and it was impacting his ability to work.
I've seen attempts that cause many problems, including response times. I remember some apps performing so many security checks that they consumed 10-15% of the running time.
My opinion is that security by obscurity never works...obfuscation is just another form of this.
As has already been mentioned...at best you can slow a hacker down. If you make it require too much time rather than skill he may not bother. Obviously, this applies to those that do it for the challenge. Commercial pirates will spend the time. Eastern Europe used to be the hotbed for this kind of thing.
That's my two cents' worth...
-
God save us all...
2005-04-11 16:35:55 matthewmusgrove
...from developers like you.
Obfuscated source code can still be deobfuscated. Obfuscation only slows would-be attackers. You should be fired for claiming that obfuscation is a means of securing programs.
-
God save us all...
2005-04-13 08:26:03 Matthew Russell
In response: you already know that any code can always be hacked. Nothing is ever really secure from a theoretical standpoint. But to quote you, "Obfuscation only slows would-be attackers." That's the whole point, and I'm glad that you mentioned it. And for the record, obfuscation can and does secure programs. Not in the same sense that encryption secures a message being communicated (which I mentioned), but you'd have to redefine secure to say that it doesn't help to protect source code from attack. Nothing can ever totally secure programs, but it certainly does secure them from elementary patches and the like.
-
Those who do not learn from history.
2005-04-12 00:37:11 Michael Schwern
"the vast majority of software pirates won't spend 500 hours reverse engineering and patching a simple $10 shareware application"
Every anti-piracy method is based on this assumption. Every one fails because it is not true. It was not true back when it was a bunch of computer nerds copying 5 1/4" floppies using BitNibbler downloaded off a dial-up BBS, and it's not even remotely true now with your grandma downloading music over BitTorrent.
The vast majority do not need to break the obfuscation. Just one. The Internet takes care of the rest. And there's always somebody who does this sort of thing for fun and is shockingly good at it.
An article investigating automated obfuscation and discussing how it's done might have been interesting. But to conclude that obfuscation is security? Sophomoric.
-
Those who do not learn from history.
2005-04-13 08:34:11 Matthew Russell
You quoted me as saying that "the vast majority of software pirates...." and then make your point by saying that "The vast majority do not need to break the obfuscation. Just one." I agree with you, since that seems to agree with what I was trying to communicate in the first place: deterrence goes a long way (not that obfuscation is your end-all solution for securing anything).
Ditto on my other response about obfuscation and security. I still stand by it.
BTW, I totally appreciate the feedback and thought that readers put into these replies. I'm engaging in a dialogue, because I find it interesting and even a little bit fun. Hopefully, others find it just as fun to read through and think about.
-
I think you missed the point.
2005-04-13 13:38:58 Michael Schwern
The deterrence goes nowhere. Once one person breaks the obfuscation, they post the unobfuscated code up on the Internet and you're shot. And there's a lot of folks on the Internet who love to break anti-piracy measures for breakfast.
-
You are teaching people to throw away their money.
2005-04-13 14:16:25 Michael Schwern
A friend of mine pointed out that I still hadn't made the core point clear enough. So here it is.
Obfuscation is a waste of your time and money. By teaching people that obfuscation adds any deterrence you are teaching them to throw away their time and money.
Why?
Because you are fighting against the entire Internet. You will lose.
Because some bored teenager is going to reverse the obfuscation and post it on the Internet regardless of its value to him. The more interesting the anti-piracy measure the better; it's about challenge, not economics. Then anyone who can use Google can find and download a copy.
This has been true for the last 20 years and it gets more true all the time.
With this in mind, your obfuscation presents about as much a barrier to pirating your software as byte-compiling it does. That is, it changes it from simply opening up the file and looking at it to doing about 5 minutes of work. For byte-compiling: finding and using a decompiler. For obfuscation: a Google search. Since obfuscating is not adding any more anti-piracy value to your product than you're already getting by byte-compiling, any amount of time you put into it is a waste of your employer's money.
Furthermore.
Because it increases code complexity which increases code maintenance costs.
If you obfuscate the code by hand, woe be unto the next person who has to maintain that code. Or even if it's you, six months later, when you've forgotten what in the hell you did. The automated obfuscator solves this problem, but because it is using rote transforms (i.e., refactoring) and well-known obfuscation techniques the pirates will go through it like tissue paper.
I wouldn't even be surprised if someone came out with an automated deobfuscator to undo each Sandmark transform.
That said, I would like to reiterate that the Sandmark automated obfuscator is interesting. An article focusing on how that accomplishes its obfuscations (not just how to use the thing) and how it measures code complexity would be very interesting.
Just don't try to say that it's security.
-
You are teaching people to throw away their money.
2005-04-13 15:08:05 Matthew Russell
Some thoughts on your response (BTW, this is interesting):
"Obfuscation is a waste of your time and money" -- really? So if you do have some software and want people to register it by purchasing a code or something along those lines, should you take any protection measures at all to protect your algorithm that checks the registration code? I'd have to think that you would. Yeah, people will try and may eventually break it, but that doesn't mean you just raise the white flag. Serious businesses and corporations seem to agree since they don't give up quite so easily. After all, everyone can't become rich through Google ads. Paying real money for real software eventually comes into play.
Also, keep in mind that "hard" algorithms, if correct, are at least as hard to break as they can be proven to be hard.
"Because you are fighting against the entire Internet. You will lose" -- That's just not a core philosophy I personally embrace for any endeavor. I see it as a defeatist attitude that ultimately leads to failure and pessimism if taken to the extreme. At any rate, I'd rather it take someone some time and frustration to reverse engineer or hack some of my work than to hand it to them on a silver platter.
"With this in mind, your obfuscation presents about as much a barrier to pirating your software as byte-compiling it does" -- If you're already assuming that a patch is out there, then I suppose a Google search does take care of the job in five minutes or less, but I'd defer to my previous points about not just surrendering and raising the white flag.
"I wouldn't even be surprised if someone came out with an automated deobfuscator to undo each Sandmark transform." -- I would be a bit surprised actually. And until someone actually accomplishes this by building a general purpose tool, I think I can remain rather unsurprised.
"Just don't try to say that it's security." -- I call putting a padlock on a door security even though someone can take a pair of bolt cutters and rip it off...so I think I will have to remain of the opinion that obfuscation is indeed a measure of security. I'm actually the one that's surprised to hear so much of the contrary.
No security is bulletproof, but I still think that a little bit can go a long way, even if there is an internet vehicle that can be used to share the piracy with the rest of the world.
Out of curiosity, what would you think that people should do for "security" rather than shouldn't do? You've said a lot about security, but it's all been "don't do it that way" rather than "here's a specific thing that you should do".
-
Article purpose
2005-04-13 09:23:25 Florijan
I am a beginner programmer, and I work with Java a lot at the moment. I read the article, and my thoughts were: oh well, not such a big wisdom described, however I can imagine it to be useful. I also thought that obfuscation is NOT such a nice method of "hiding" your source code. Performance and size being the main reasons.
But I still thought it was a decent article because it introduces some tools and techniques I (and many others I think) have considered. I did not see any nasty presumptuous sentences like: "If you obfuscate (and obfuscation is beautiful and costs you nothing), you are free from all the possible hacking threats there are or ever will be in this universe!" And those comments I read on the article seemed to be attacking exactly that one line which WASN'T THERE. So, read carefully people.
I think it is an OK article, even though I don't think it will help many people.
-
Statements via association
2005-04-13 13:52:40 Michael Schwern
Part of the art of writing is putting things in the right context. Associating one point with another without having to come right out and saying it.
So, the setup talking about bank accounts and espionage.
"With a few magic keystrokes, the entire system would somehow just start transferring funds to foreign bank accounts..."
"Well, that might account for the money laundering and espionage back in the 1980s"
The conclusion talking about government secrets.
"In a world where everyone follows license agreements and no one wants to reverse engineer government secrets, obfuscation techniques wouldn't be of much use."
And with just one sentence between...
"Hopefully, you now have a better feel for the compilation process and understand how obfuscation is a powerful tool you can use to protect your code from exploitation and hacking."
What sort of code are we protecting? Why, code that protects government secrets, foreign bank accounts and espionage of course! That's the impression the reader will walk away with, especially if they're reading quickly.
The author might not have been aware of what was implied, but since he's not sitting next to everyone reading the article to correct their impressions it doesn't matter.
-
Obfuscators can compact/ improve code
2005-04-14 12:54:38 JohnGrant
Pro obfuscators need not hurt performance.
DashO http://preemptive.com/products/dasho/Benefits.html
can compact / improve performance.
John
-
Obfuscators can compact/ improve code
2005-04-14 14:55:12 Matthew Russell
Just as a quick note for our readers: be careful here because some compilers remove 'dead code' when it's detected during the optimization phase (code that never actually gets executed). Be sure to check the compiler flags and make sure you don't unintentionally remove 'dead code' that you didn't mean to remove, especially when dealing with compiled languages and mature compilers like GCC.
-
Ulterior motives?
2005-04-15 18:22:58 scartinfuffle
I suspect these detractors of obfuscation are hackers who would love
people to believe that obfuscation is pointless. Especially schwern,
who tries stupidly to raise a red flag by stating that obfuscation :
| ... increases code complexity which increases code maintenance costs.
| If you obfuscate the code by hand, woe be unto the next person who has
| to maintain that code.
He obviously doesn't understand (or doesn't want potential customers of
obfuscating software to understand) that the original project source code
(on which the maintenance is performed) is never obfuscated, just the
source code that can be reverse engineered from the executables.
Sure there may still be some hackers who see obfuscation as a challenge
but the longer those criminal idiots spend hacking one piece of
software the less time they have to hack the next. And every day
a hack is unavailable is another day that unscrupulous downloaders
who want the product now but can't find a hack may consider buying a
licence.
So in short - yes .. obfuscation is a waste of time and resources...
the hackers' time and resources! Which is a good thing.






You are inserting gobs and gobs and gobs of useless, pointless code that the compiler doesn't optimize out (and is in fact incapable of optimizing out). What might have been done in a dozen opcodes now takes hundreds or thousands. Doesn't matter if you're using C, Java, or any other language, such obfuscation techniques make your final application far worse performance-wise.