Java Compiler Generating Secret Methods

I'm going to show you a little trick that will add two methods to any Java class, without actually defining them. Furthermore, these methods will be given package visibility, accessible by any class in the same package.

First and foremost, credit for showing me this interesting javac tidbit goes to Ted Neward. Ted recently presented an introduction to Java bytecode at a local JUG. The entire presentation was incredibly interesting, but one of the best bits to come out of it was a little trickery the Java compiler performs in certain cases.

Let's take a look at two classes. One we'll call CompTest, which will be a simple class that contains a private String and a method to print it to the screen. The second will be CompExecutive, which will simply make a CompTest instance and call the one method it defines. It will also use Java reflection to count the number of methods on CompTest.

    // CompTest.java
    public class CompTest {
        private String myVariable = "This is a private variable";

        public void printVar() {
            System.out.println(myVariable);
        }
    }

    // CompExecutive.java
    import java.lang.reflect.*;

    public class CompExecutive {
        public static void main(String[] args) {
            CompTest ct = new CompTest();
            ct.printVar();

            // Get all of the methods on the class.
            Method[] declaredMethods = ct.getClass().getDeclaredMethods();
            System.out.println("Number of Methods on CompTest: " + declaredMethods.length);
        }
    }

When we run this program, we get what we would expect: the printed string, followed by the number of methods defined in CompTest, which is one.

This is a private variable
Number of Methods on CompTest: 1

Now let's change CompTest slightly. Let's give CompTest an inner class. Inner classes are allowed to access private variables inside the containing class, so we'll make an Inner class that changes the private variable, then prints it.

    // CompTest2.java
    public class CompTest2 {
        private String myVariable = "This is a private variable";

        public class InnerClass {
            public void alsoPrintVar() {
                myVariable = "Is it still private?";
                System.out.println("From Inner Class: " + myVariable);
            }
        }

        public void printVar() {
            System.out.println(myVariable);
            InnerClass ic = new InnerClass();
            ic.alsoPrintVar();
        }
    }

    // CompExecutive2.java
    import java.lang.reflect.*;

    public class CompExecutive2 {
        public static void main(String[] args) throws Exception {
            CompTest2 ct = new CompTest2();
            ct.printVar();

            // Get all of the methods on the class.
            Method[] declaredMethods = ct.getClass().getDeclaredMethods();
            System.out.println("Methods on CompTest2: " + declaredMethods.length);
        }
    }

What do we expect the output to be? We haven't added any methods to CompTest, just an inner class declaration. The output should still say that there is only one declared method, right? And yet, when we run it...

This is a private variable
From Inner Class: Is it still private?
Methods on CompTest2: 3

Three? Where did the other two methods come from?

The answer lies in the way that the Java compiler deals with inner classes. If you've ever worked with inner classes and packaged them, say into a .jar file, you may have noticed that each inner class is actually compiled to its own class file. In the above example, compilation yields three files: CompExecutive2.class, CompTest2.class, and CompTest2$InnerClass.class. This third class file is an independent, compiled class. One might wonder, given that this is a separate class, how it is able to access the private variables inside CompTest2. Answering this question also gives us the secret behind the extra two methods.

We can run CompTest2 through javap, the Java disassembler included with the JDK. Running javap -c CompTest2 shows the disassembled code that makes up our class:

public class CompTest2 extends java.lang.Object{
public CompTest2();
  Code:
   0:   aload_0
   1:   invokespecial   #2; //Method java/lang/Object."<init>":()V
   4:   aload_0
   5:   ldc     #3; //String This is a private variable
   7:   putfield        #1; //Field myVariable:Ljava/lang/String;
   10:  return
 
public void printVar();
  Code:
   0:   getstatic       #4; //Field java/lang/System.out:Ljava/io/PrintStream;
   3:   aload_0
   4:   getfield        #1; //Field myVariable:Ljava/lang/String;
   7:   invokevirtual   #5; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   10:  new     #6; //class CompTest2$InnerClass
   13:  dup
   14:  aload_0
   15:  invokespecial   #7; //Method CompTest2$InnerClass."<init>":(LCompTest2;)V
   18:  astore_1
   19:  aload_1
   20:  invokevirtual   #8; //Method CompTest2$InnerClass.alsoPrintVar:()V
   23:  return
 
static java.lang.String access$002(CompTest2, java.lang.String);
  Code:
   0:   aload_0
   1:   aload_1
   2:   dup_x1
   3:   putfield        #1; //Field myVariable:Ljava/lang/String;
   6:   areturn
 
static java.lang.String access$000(CompTest2);
  Code:
   0:   aload_0
   1:   getfield        #1; //Field myVariable:Ljava/lang/String;
   4:   areturn
 
}

We can ignore the specifics of how JVM bytecode works and simply look at which methods are defined on the class. First is a constructor - no surprises there. It doesn't count as a method, so that's not one of the three. Next is printVar, which looks just how we declared it. But after that, things get strange. There are two extra methods, access$000 and access$002. They are both static, they both take a CompTest2 instance, and they both return a String. What's going on here?

What's going on is that, since the compiler has to put CompTest2$InnerClass inside its own class file, it has to make the private variable, myVariable, accessible to it. Every inner class instance is quietly constructed with a reference to its containing instance (you can see CompTest2 being passed to the InnerClass constructor at offset 15 of printVar above), so when its methods need access to private members, they call these access methods, passing in the instance of the outer class they were given. These access methods can't be private (since the separate class couldn't see them), and Java doesn't support friend classes like C++ does, so the least permissive accessibility the access methods can have is package-level visibility.
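To make that concrete, here is a hand-written approximation of what the compiled pair of classes amounts to, expressed as ordinary Java source. This is only a sketch: the class names DesugaredOuter and DesugaredInner are hypothetical, and the real compiler output is bytecode, not source. The access$000 and access$002 names are copied from the javap listing above.

```java
// Hand-written approximation of the compiled output; the "Desugared" names are hypothetical.
final class DesugaredOuter {
    private String myVariable = "This is a private variable";

    // Stand-in for the generated getter: static and package-private.
    static String access$000(DesugaredOuter outer) {
        return outer.myVariable;
    }

    // Stand-in for the generated setter; like the bytecode, it returns the assigned value.
    static String access$002(DesugaredOuter outer, String value) {
        outer.myVariable = value;
        return value;
    }

    public void printVar() {
        System.out.println(myVariable);
        // The outer instance is handed to the inner object explicitly.
        DesugaredInner ic = new DesugaredInner(this);
        ic.alsoPrintVar();
    }

    public static void main(String[] args) {
        new DesugaredOuter().printVar();
    }
}

// Stand-in for CompTest2$InnerClass: a separate class with no special privileges.
final class DesugaredInner {
    private final DesugaredOuter outer; // javac names this hidden field this$0

    DesugaredInner(DesugaredOuter outer) {
        this.outer = outer;
    }

    public void alsoPrintVar() {
        // Every touch of the private field goes through the package-level accessors.
        DesugaredOuter.access$002(outer, "Is it still private?");
        System.out.println("From Inner Class: " + DesugaredOuter.access$000(outer));
    }
}
```

Nothing in DesugaredInner needs private access any more, which is exactly why any other class in the same package can call the same two static methods.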

In other words, javac silently adds package-level methods to CompTest2, which allow CompTest2's private variables to be modified and accessed. If CompTest2 didn't have the line that changed myVariable, the compiler would have only added one method. As it is, two methods were added: one that simply returns the value, and the other that allows it to be changed.

And yes, you can use reflection to call these methods from a class in the same package as CompTest2. Case in point:

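    // Sneaky2.java
    import java.lang.reflect.*;

    public class Sneaky2 {
        public static void main(String[] args) throws Exception {
            CompTest2 ct = new CompTest2();

            // Look up the compiler-generated setter by its synthetic name.
            Method secretSetMethod = ct.getClass().getDeclaredMethod("access$002", CompTest2.class, String.class);
            // The method is static, so the first argument to invoke() is ignored;
            // the second ct is the CompTest2 parameter of access$002 itself.
            secretSetMethod.invoke(ct, ct, " -- Not so private anymore, huh? -- ");

            ct.printVar();
        }
    }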

Since myVariable is private, you would hope that this code wouldn't compile, or wouldn't run, or something. Certainly that it wouldn't actually change the value of myVariable inside ct. And yet, when you run this program, you get this:

 -- Not so private anymore, huh? --
From Inner Class: Is it still private?

If you are wondering why it still prints "Is it still private?", it's because the method printVar() tells an InnerClass instance to run its own alsoPrintVar() method, which sets the variable (overwriting what we set using reflection).

There you go. Every time you use inner classes or anonymous inner classes, javac makes a package-visibility method to access or mutate any private variable your inner class needs to access or mutate. If your inner class only reads the variable, it makes the method that only returns it, but if your inner class changes it as well, it makes a second method to change the value.
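If you want to check what your own compiler does, you don't have to guess at names like access$002. A minimal sketch, assuming nothing beyond the standard reflection API: Method.isSynthetic() reports whether a method was generated by the compiler. The class names below (SyntheticMethodLister, Outer, Inner) are made up for illustration. As far as I know, the numbering of the access$ methods varies between compiler versions, and compilers targeting JDK 11 or later use nest-based access control instead and generate no such methods at all, so treat the result as version-dependent.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; none of these names come from the original examples.
final class SyntheticMethodLister {

    // An outer class whose inner class both reads and writes a private field.
    static class Outer {
        private String secret = "This is a private variable";

        class Inner {
            void touch() {
                secret = "changed by Inner";  // write access
                System.out.println(secret);   // read access
            }
        }
    }

    // Returns the names of compiler-generated (synthetic) methods declared on a class.
    static List<String> syntheticMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Older compilers typically list two access$... methods here;
        // newer ones typically list none.
        System.out.println("Synthetic methods on Outer: " + syntheticMethodNames(Outer.class));
    }
}
```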

Thanks again to Ted Neward for his excellent presentation.

Language Marathon: First Impressions of Ruby, Python, and C#.

For the past two months, I've been learning three different programming languages simultaneously. I've been wanting to learn Python for a while, and I finally made some time to read Learning Python cover to cover on camping trips. I've also been thrown into the world of Ruby (as well as Rails) for work, since we are developing our next web application using that technology. Lastly, I've had to learn C# and .NET for school - the last class for my Master's degree.

I've wanted to learn all three of these technologies for a while, but I've never had the time. This has been a very exciting experience for me, and I wanted to share my first impressions of these languages.

C#: First Impressions

I've been working with C# less than the other languages, so I still have a great deal to learn. Overall, the experience has been pretty positive. The language is almost exactly like Java. Java stole its syntax from C for the most part (for developer familiarity), but C# is sort of a syntactic hybrid between Java and C++ (more towards the Java side). Overall, I like it.

I used to use C++ in high school and C in college. I never really liked C++ - I felt like the language hacked object orientation onto a fully procedural language, and it really showed. The language never seemed well designed to me, and it kind of had a "worst of both worlds" feel to it. C# seems like a natural progression in the C family - something even higher level than C++, without the procedural vestiges of C.

When I was a Windows user, I did most of my Windows application development in Visual Basic. Generally if I needed a Windows program, this was the easiest route to take - the form designer let me focus on the look and feel of the application, which was generally important (otherwise I'd be writing it as a console application in a different language entirely). I dabbled in the MFC a bit, but I generally found that writing Windows applications in C++ reminded me of all the things I hated about C++, and made me hate them even more. Visual Basic, for all of its faults, gave me access to the same Win32 API, without the need to jump through so many of MFC's hoops.

C# and .NET are a substantial improvement over this type of development. The easy-to-work-with GUI designer from Visual Basic is preserved, but the code now makes more sense, and uses a far less offensive language than either Basic or C++. I really liked the fact that when I changed properties of my GUI elements, rather than those being saved magically as they were in Visual Basic (last time I used it, anyway), the IDE simply changed the appropriate lines in the constructor for my form class. Partial classes are a very neat feature as well - allowing me to define a single class in separate files allows me to separate the event processing GUI code in a class from the look and feel GUI code in the same class.

C# makes me almost regret abandoning Windows a while back - if I were still using Windows, C# would become my development language of choice for native GUI applications. My fiancee and I both have Windows on our phones, so I may start to look into developing mobile applications.

All in all, C# is a lot like Java - if I wanted to write a native Windows GUI application using Java syntax, C# would be the way to go.

Ruby: First Impressions

Ruby fanbois like to mention that Ruby has been around for over 10 years, but doing so denies reality. The truth is Ruby had a tiny section of fans (but nothing approaching widespread adoption), but didn't really hit the scene until 2004, when it became the driving force behind Ruby on Rails. As such, I can't mention much about Ruby without mentioning Rails.

Ruby, to be honest, does not impress me. Generally, I'm not a fan of languages that let you do the same thing in multiple ways. It allows personal style to influence code far too much. As a simple example, I like writing my loops out explicitly, so I use a for statement with a collection. A co-worker prefers to use the .each method on a collection, passing in a code closure. There isn't anything 'wrong' with either one of these approaches, but each of us dislikes the way the other writes loops. This means that our shared codebase has inconsistent looping mechanisms all over it, making it less readable and less elegant. This is obviously not a fault of Ruby, but a fault of my co-worker and myself, since we could just flip a coin or something and commit to a certain way, but my point is that this situation is common in languages that allow multiple ways to do the same thing.

Ruby simply seems like a crappy language to me. Scripting languages don't get run through compilers, which means that everything which would be a compiler error in a language like Java or C++ becomes a runtime error when the application is actually running. The way a lot of scripting languages handle this is by being more tolerant - more situations that would lead to a compile error are tolerated by the interpreter, and handled as elegantly as possible, sometimes without even moving into an error state. Ruby doesn't seem very tolerant to me, but since it's not compiled, I see a lot of errors (which, in my experience, tend to be pretty poor) at runtime.

Rails, however, is excellent. Whenever I work with RoR, I can't help but be a bit disappointed that something as good as Rails is built on top of something like Ruby. Normally I'm not a big fan of convention over configuration, since the list of "conventions" can tend to grow so quickly that it's difficult for someone to learn all of the rules. However, having done Java web development for a long time, I definitely find convention over configuration tempting - Java web development suffers from configuration poisoning. I put my preconceptions aside when I started working with Rails, and though some of the magic it does makes my skin crawl sometimes, I definitely like it.

It didn't take my company very long to get past the RoR learning curve and start being productive - and I definitely have experienced how much easier and faster it is to make changes and improvements to the web application in RoR. We've been consistently overestimating our stories, since we're trying to estimate based on how long changes would take in Java. For the past three iterations, we've finished early and had to pull stories in from the next iteration since we were able to enhance the product so quickly. I have no doubt that RoR is the reason for this productivity gain. This is particularly impressive, since the Ruby and Rails documentation efforts are both godawful.

Most of what Rails does isn't new or original, though. I feel like the biggest advantage of Rails is simply how forceful it is about the MVC pattern - making filenames and actions line up is all that is needed to separate the view from the controller, which simply takes a lot of boilerplate code off my back. The templating and tag definitions are simply very good implementations of very old ideas. The ORM library actually kind of sucks. All in all, Rails is just a convenient packaging of ideas that can (and do) exist on other platforms. I think that 90% of what I actually care about in Rails could be written into a framework for nearly any other language, even Java.

Overall, RoR has ruined me a bit on Java web development. I finally understand why people who work with RoR dislike going back to Java for web development. I used to like Java web development, but there's a small part of me that would dread returning to it as well. Ruby on Rails really is a fantastic web application development platform.

Python: First Impressions

When I started using Linux early this year, the hardest adjustment for me was that I had lost the ability to write native GUI applications for my operating system. Like I said earlier, I used to throw together GUI apps using Visual Basic in Windows, but no such thing existed in Linux.

The deeper I got into the world of Linux, the more I noticed how often applications were written in Python. Python is extremely popular in the world of Linux. I would say that about half of the GUI apps I run regularly in Ubuntu are written in Python, or at least have Python components.

A friend in college was a big fan of Python, so I decided I would learn Python and it would become my application language of choice in Linux.

Python blew my fucking mind

Python is, quite simply, amazing. I've worked with a lot of programming languages, and I don't think I've ever come across one that is designed as elegantly and perfectly as Python.

There are hardly any exceptions to the language rules in Python - everything is incredibly consistent. As I was reading my Python book, I would often have questions along the lines of "How would Python handle this...", and I got to the point where, instead of reading to find the answer, I would close the book and simply think about it. Based on the other rules I knew about Python, I would try to predict what Python would do if it were designed consistently, and sure enough that's exactly what the book would tell me on the next page - every single time.

I still have some gripes about Python, of course. I dislike having to pass self references into class methods. It irks me that there's no way to make a private variable, and the fact that any method can overwrite anything on any class makes my hair stand on end. However, for each of these complaints, I UNDERSTAND why it works the way it does, and it makes complete sense to me. Unlike my complaints with Java, my complaints with Python come not from language inconsistencies, but simply from my own personal preference. Even if I had the power to change Python to allow private members, I wouldn't, because it would be inconsistent with the design of Python.

I've been writing all of my console apps and shell scripts in Python lately, and I'm excited to write some GUI apps using Glade and Python together soon.

In terms of language design and comfort, Python is officially my new favorite language. After just one book (which was one of the best programming books I've ever read), I feel like I've completely mastered the language. It's so straightforward, consistent, and easy to learn - I'm kind of annoyed I didn't learn it sooner.

Conclusion

That about covers my first impressions of my whirlwind language tour these past two months. C# is exactly the language I would have loved when I was still a Windows user, Ruby sucks, Rails is awesome, and Python is the best language ever made.

I'm worried that I may get rusty with Java - there's so much exciting stuff to work with, I can't really see myself starting a new Java project any time soon. I haven't written a single line of Java in a couple of months (I'm doing Ruby exclusively at work, C# at school, and Python at home), but I seriously doubt I'll ever really abandon Java - I still like it a great deal, and I'm excited to see how it continues to evolve.

Java Security FUD

As soon as Java started to grow in popularity, the misinformation about it began to spread. There are, of course, a number of legitimate criticisms of Java, but most of them have been used constructively to improve it. One of the oldest criticisms of Java was that it is slow - a criticism that was valid once-upon-a-time, but has become decreasingly relevant as Sun has made improvements to the JVM (particularly with Just-In-Time compilation). My friend Brandon has thoroughly deflated this criticism in a blog post that grew into a JDJ article (congrats, Brandon!), so I won't spend any more time on it.

A great deal of the invalid criticism of Java actually comes from a misunderstanding of the purpose of Java. Though the goals of Java have certainly changed since it was conceived as an embedded systems language to run appliances, there is no denying that, currently, the goals of Java are clear: write once, run anywhere. Though there are a number of areas Java stands to improve on this goal, it is definitely the idea. This inherently means that Java applications don't get to access low-level system calls that bypass the JVM. This limitation is the core cause of the common criticism that Java doesn't give you enough rope to hang yourself, as C++ does.

I am unsure of what it is about Java that inspires such a dogmatic loathing of it by so many developers that prefer not to use it. Though dogmatism is no stranger to the world of software development, it rarely reaches the level of vitriol that is common in the "Java vs. Anything" debate.

One such example is a web site entitled "Sun Redefines Randomness". The wording of even the title of this page makes it clear that its contents are meant as a criticism against Sun and Java, but unsurprisingly a bit of investigation reveals that it is simply FUD.

The page contains a simple applet. This applet generates a field of black and white boxes, using the java.util.Random class and taking the lowest-order bit from each integer generated to decide the color of the box. It does this repeatedly in an animation, and there is no denying that when you look at the applet's output, there is a distinct pattern to it. From the page:

As you can probably see by the horizontal stripes, this 'random' method exibits significant periodic behavoiour.

Rather than looking like random static, the applet's display looks like a badly tuned-in television.

All Java virtual machines that are available to me appear to exhibit the same problem.

It is interesting, his point about how all virtual machines do this. Of course they do - the implementation of Random lives in the class library's Random class file, not in the JVM itself, so every JVM runs the same algorithm.

In any case, the author of this page doesn't seem to understand a few basic things about java.util.Random. This isn't surprising, since these things aren't exactly well-documented or obviously apparent.

RNGs

Random Number Generators (RNGs) are not, generally speaking, actually random. Picking a random number is actually a relatively difficult task, which is why random number generators pick what are called "pseudorandom" numbers.

With computers, it is not enough to simply use the last digit of the computer clock whenever you need a random number. There are actually patterns to when the OS schedules your process to run, and those patterns emerge when you try such a thing, but that's not the point. Random Number Generators need to run from a seed, meaning that you need to supply the RNG with a starting number, and it needs to generate a sequence of numbers based on that starting number. If you were to give an RNG the same seed again, it would need to generate the same sequence of pseudorandom numbers. Why bother with the seed? Because without it, you might not be able to easily reproduce a bug in your program consistently, which means you can't fix it. Being able to make the computer go through the same sequence of events is requisite for consistent debugging, which means even if you use random numbers in your program you must be able to get to the same sequence of random numbers.

This means that every RNG needs to work by performing a series of mathematical operations on a number. The RNG starts with a seed, and performs a series of mathematical operations on it to get the first "random" number. The next time a random number is needed, it performs the same operations on the last random number to get a new one, and so on. This ensures that, given the same starting number (the seed) the same sequence of numbers would be generated.
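The seed behavior described above is easy to see with java.util.Random itself. A minimal sketch (the class name SeedReproducibility and the seed value are arbitrary choices of mine): two generators built from the same seed produce the same sequence of dice rolls.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class: same seed in, same "random" sequence out.
final class SeedReproducibility {

    // Rolls a six-sided die `count` times using a generator created from `seed`.
    static int[] roll(long seed, int count) {
        Random rng = new Random(seed);
        int[] rolls = new int[count];
        for (int i = 0; i < count; i++) {
            rolls[i] = rng.nextInt(6) + 1; // nextInt(6) yields 0..5, so shift to 1..6
        }
        return rolls;
    }

    public static void main(String[] args) {
        // Both lines print the same ten rolls, because the seed is the same.
        System.out.println(Arrays.toString(roll(12345L, 10)));
        System.out.println(Arrays.toString(roll(12345L, 10)));
    }
}
```

Constructing Random with no arguments picks a different seed on each run, which is what you want in production and exactly what you don't want while chasing a bug.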

As you might imagine, a very bad random number generator would likely reveal a predictable pattern, given these constraints. So, is Java's RNG bad?

Java's Random class

When it was time to write an RNG for Java, a decision had to be made. Typically, when programmers use a random number generator they just want, effectively, a dice-roller. It is rarely important that a sequence of a million dice rolls generated in this manner not reveal a pattern. It is far more important, to your TYPICAL development purpose, that the dice-rolls occur quickly. Secondarily, it is important that the dice-rolls be fair, meaning that if you were using an RNG to generate a random number from 1-6, each number would come up approximately 1/6th of the time. Thirdly, it is important that the numbers be non-periodic, which is to say lacking predictable patterns in number generation.

There are a LOT of algorithms to take a number and generate a new pseudorandom number based on it. Most of these algorithms work by (and this is a gross oversimplification) multiplying the number by an extremely large number in order to put it far outside of the range of desired numbers, then adding another offset number, and then modding that number to get it back into the range. This is called a "linear congruential algorithm".

Java's Random class uses a 48-bit version of this algorithm. Here is the entire code that does the work of turning a number into the next number in the sequence:

seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);

Just as described, this algorithm multiplies the previous number (seed), adds another number, then munges it back within range. The & at the end keeps only the low 48 bits, which is the same as taking the result mod 2^48.
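To show how little code is involved, here is a sketch that re-implements the generator by hand and compares it against the real class. It relies on two details from the java.util.Random documentation that are not shown above: the seed is first scrambled by XOR-ing it with the multiplier, and nextInt() returns the top 32 of the 48 bits. The class name TinyLcg is my own.

```java
import java.util.Random;

// Hypothetical re-implementation of the java.util.Random step, for illustration only.
final class TinyLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private long seed;

    TinyLcg(long initialSeed) {
        // Random.setSeed documents this initial scramble of the user-supplied seed.
        this.seed = (initialSeed ^ MULTIPLIER) & MASK;
    }

    int nextInt() {
        // The one-line linear congruential step from the text.
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        // Drop the low 16 bits and keep the top 32 of the 48-bit state.
        return (int) (seed >>> 16);
    }

    // Counts how many of the first `count` values differ from java.util.Random.
    static int mismatches(long initialSeed, int count) {
        TinyLcg mine = new TinyLcg(initialSeed);
        Random real = new Random(initialSeed);
        int differences = 0;
        for (int i = 0; i < count; i++) {
            if (mine.nextInt() != real.nextInt()) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        System.out.println("Mismatches against java.util.Random: " + mismatches(42L, 1000));
    }
}
```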

As you can imagine, because of the small number of mathematical operations, this algorithm is extremely fast. The constants used here have been selected because they yield random numbers that are fair.

Could the algorithm have also been non-periodic? Yes, but it would also be slower. Sun selected an algorithm that, first and foremost, is quick. They made this call based on their best guess for what your average developer using the class needed. Had they made it more cryptographically secure (more "random") it would have been slower.
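The stripes in the applet can be reproduced without any graphics. This is my own reading of the arithmetic rather than anything stated on the original page: in a linear congruential generator whose modulus is a power of two, bit k of the state repeats every 2^(k+1) steps. nextInt() throws away the lowest 16 bits of the 48-bit state, so its lowest-order bit is bit 16 of the state, which would repeat every 2^17 = 131072 calls. The sketch below (class name LowBitPeriod is hypothetical) checks that directly.

```java
import java.util.Random;

// Hypothetical demo: the lowest bit of Random.nextInt() repeats with a short period.
final class LowBitPeriod {
    static final int PERIOD = 1 << 17; // 131072 draws

    // Returns true if the low-bit sequence is identical in two consecutive windows of PERIOD draws.
    static boolean lowBitRepeats(long seed) {
        Random rng = new Random(seed);
        boolean[] bits = new boolean[2 * PERIOD];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = (rng.nextInt() & 1) == 1;
        }
        for (int i = 0; i < PERIOD; i++) {
            if (bits[i] != bits[i + PERIOD]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Low bit repeats every " + PERIOD + " draws: " + lowBitRepeats(2024L));
    }
}
```

The higher-order bits have far longer periods, which is why nextInt(6) or nextBoolean() behave much better than masking off the bottom bit yourself.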

Java's SecureRandom class

Of course, there ARE people who need numbers that are more cryptographically random. Sun made the logical choice: extend the Random class into a new class that overrides the generating function. Inside of the java.security package is a class called SecureRandom that has the same methods as Random. It re-implements the main seed-modifying method to use a completely different algorithm.

In fact, you can ask for one of a variety of different random number generating algorithms by name, using the static SecureRandom.getInstance method. If you simply call the no-argument constructor, SecureRandom will be constructed with a default generator chosen for you.
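Because SecureRandom extends Random, any code written against Random accepts either one. A minimal sketch of that substitution (RandomSwap and lowBits are names I made up; the applet's actual code is not reproduced here):

```java
import java.security.SecureRandom;
import java.util.Random;

// Hypothetical demo: one method, two interchangeable generators.
final class RandomSwap {

    // Collects the lowest-order bit of `count` ints, as the applet does for its boxes.
    static boolean[] lowBits(Random rng, int count) {
        boolean[] bits = new boolean[count];
        for (int i = 0; i < count; i++) {
            bits[i] = (rng.nextInt() & 1) == 1;
        }
        return bits;
    }

    // Renders the bits as a row of '#' and '.' characters.
    static String render(boolean[] bits) {
        StringBuilder row = new StringBuilder();
        for (boolean bit : bits) {
            row.append(bit ? '#' : '.');
        }
        return row.toString();
    }

    public static void main(String[] args) {
        Random fast = new Random();
        Random secure = new SecureRandom(); // default algorithm, seeds itself
        System.out.println("Random:       " + render(lowBits(fast, 64)));
        System.out.println("SecureRandom: " + render(lowBits(secure, 64)));
    }
}
```

The caller never needs to know which generator it was given, which is what makes the one-line change to the applet possible.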

The author's source code is located here. If you simply change the code to construct an instance of java.security.SecureRandom rather than java.util.Random, without changing a single line of code otherwise, the pattern in the graph instantly vanishes.

On line 37, I have changed

Random rnd = new Random();

to

Random rnd = new java.security.SecureRandom();

I have uploaded the changed version here.

There are two things worth noticing about this new version of the applet. One, it lacks the periodicity illustrated in the non-secure applet. Two, the framerate is much lower.

Like I said, Random was written to be fast, which it is. SecureRandom is designed to be secure, so it is much slower.

Long story short, if you are writing code that needs somewhat random numbers and needs them quickly, use java.util.Random. If you are writing code that needs extremely random numbers less quickly, use java.security.SecureRandom. Of course, if you are writing code that needs extremely random numbers extremely quickly, use a degree in mathematics.

As you can see, this is just another unwarranted criticism of Java that is as trivial as it is un-researched. Criticism is meant to be constructive, which means it is meant to help a thing improve. It is impossible to improve upon something that isn't actually broken, so this kind of FUD accomplishes little more than further polarizing the world of software development.

Responsibility Of Data Warehouses

Imagine that you are going on a long vacation, and you ask a good friend to take care of your cat while you're away. Your cat seems pretty content with your home, so you give your friend the key to your apartment so he can come over once every few days and refill the water bowl, empty litter, etc.

One day, though, your friend forgets to lock your front door when he leaves. The next day, someone enters your home and steals your television, your jewelry, your computer, and everything else of value.

Who is to blame?

Obviously, we blame the robber. But do we blame the friend? I think most of us would blame the friend, as it was his responsibility to lock the door, and it was this failure to do so that caused your items to be stolen. We may feel a bit bad yelling at this friend (since we asked him to come over) but in the end we would still blame him.

Watch how this problem changes when you catch the robber, though. Right now, if you imagine yourself in this scenario, you're angry with your friend. Change the scenario to imagine the robber is caught the next day and all of your stuff is returned. Now, likely, you're angry with the robber. You don't feel any animosity toward the friend at all, yet his role in this is completely unchanged. He still should have locked the door, he still failed to do so, and he still allowed a stranger into your home as a result. Why are we less angry with him in this scenario? It is not because he is less responsible, but because we now have someone else to be more angry with. The robber is the MOST responsible person for the robbery, your friend a close second. Because of this, when you can hold the most responsible person accountable, you are likely to ignore the role played by the next-most-responsible individual.

Next time you go on vacation, however, you are likely to not ask this same friend to take care of your cat, because you still know he is partially responsible for this security breach.

Data Warehouses

Let's change the scenario one more time. Instead of a friend, imagine the person in your home has not been invited. Imagine that the way things really work is that, whenever you buy a home, or a car, or a television, or even a pizza, you have to pay with money as well as part of your housekey. It's not the whole housekey, but it's a chunk of the housekey - and each purchase requires a different chunk of the key. You are trusting every company you make a purchase from in this way not to do anything with the portion of the key, but you're not too worried; after all, it's only part of the key.

Then all of those companies turn around and make a copy of your portion of the key, then send the portion to another company, which we'll call Acxiom. Eventually Acxiom collects enough portions of your key that they can form your entire housekey, which they then use to enter your home when you aren't there. They don't steal anything or take anything, but they do take notice of your home, your car, your television, and your empty pizza boxes.

This allows them to figure out what kind of thing you might be likely to buy next if asked. In your case, based on the stuff in your house, it seems likely that you'd be willing to purchase a DVD Player if given a little push. Acxiom then tells Sony that you might want to buy a DVD Player, and Sony pays them for the tip (giving Acxiom profit enough to keep collecting portions of housekeys). Sony then sends you a DVD Player catalog, or calls you on the phone to tell you about a great deal, or sends you an e-mail.

Now, I don't want to get into how annoying this entire business model is, or how Acxiom has no right to be entering your home, even if you had to give portions of housekeys to everyone from whom you made a purchase. This post is not about how a company that does this shouldn't exist.

This post is about what happens when that company leaves your door unlocked.

Responsibility

Surely in this scenario, if Acxiom left your door unlocked, you would find Acxiom completely responsible. After all, who the hell invited them in the first place?

ChoicePoint (Acxiom's major competitor) had a data breach not too long ago, and it was held responsible and fined. Only a portion of the fine was actually given to the victims of the data breach (people whose identities were stolen), but the fine was still large. We also hold the robber responsible in these cases, but we do not allow the responsibility of the robber to overshadow the responsibility of the person who left the door unlocked. If anything, because we are placing so much trust in these kinds of companies, we hold the data company more responsible than the attacker.

To that point, ChoicePoint was fined $15 million, and the attacker sentenced to 16 months in prison.

Clearly we, as a society, said to ChoicePoint, "look, we aren't huge fans of you holding on to all of this data, but if you're going to do so, you had better damn well protect it." ChoicePoint is responsible for the theft of its data, and we have sent them that message loud and clear, to the tune of 15 million dollars. ChoicePoint invested tons of money in improving their security as a result.

If you hold data and do not protect it adequately, it is your fault when it is stolen.

Acxiom's Breach

A few years ago, a man who had legitimate access to part of Acxiom's data broke into other Acxiom databases and gained access to a lot of information about various people. He then sold some of this information to advertisers so they could launch an ad campaign using it.

He was caught, and much like when the robber was caught in the first scenario, that seemed to overshadow the fact that the breach happened in the first place.

The article explains:

Prosecutors said Levine had permission to access part of Acxiom's database but that he used decryption software to obtain passwords and go beyond his authorized access. Data stolen included names, telephone numbers, street addresses and e-mail addresses, along with highly detailed demographic information.

If Levine used "decryption software to obtain passwords" to other databases, it means that the passwords were, first, stored in a place where they could be retrieved without authorization and, second, stored in a cryptographically reversible manner. Passwords are usually stored as a one-way hash. You type your password to be saved, and the software hashes it into a special code. This code cannot feasibly be reversed to get back to the original password, but every time the same password is hashed it yields the same code. Next time you try to verify your identity, the software performs the same hash and compares the hash codes. If they are equal, you must have typed the correct password.
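The hash-and-compare scheme described above can be sketched in a few lines of Java. This is only an illustration, not a description of what Acxiom ran: the class and method names are made up for this example, and a bare SHA-256 is used purely to show the one-way idea. A real system would also add a per-user salt and use a deliberately slow function such as PBKDF2 or bcrypt.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: store only the hash, never the password itself.
final class PasswordHashSketch {

    // Turns a password into a fixed-length code. The same input always
    // produces the same code.
    static byte[] hash(String password) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Verifies a login attempt by hashing it and comparing against the stored hash.
    static boolean verify(String attempt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(attempt), storedHash);
    }

    public static void main(String[] args) {
        byte[] stored = hash("my secret password"); // this is all the database keeps
        System.out.println(verify("my secret password", stored));
        System.out.println(verify("a wrong guess", stored));
    }
}
```

Nothing in this class can turn the stored bytes back into the password. All it can do is check a guess, which is the whole point.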

If an attacker gets access to these hash codes, they cannot reverse them to get back to the password. The article indicates Levine did recover the passwords, which means that the passwords were being stored in a way that allowed them to be decrypted. This was Acxiom's mistake - they should not have stored passwords in this manner.

Moreover, they should have had better access control, so that people with access to part of the system wouldn't be able to access a different part of the system.

These are both failures on the part of Acxiom to protect your data, yet the person being held fully responsible is Levine. As a matter of fact, Levine has to pay $153,395 to Acxiom, so they are actually being rewarded for failing to secure your data. Why?

Why did the government hold ChoicePoint responsible for a similar breach, but only hold the attacker responsible when it happens to Acxiom? On what grounds have we decided Acxiom has done nothing wrong?

They left the door unlocked.

Saying Goodbye To Windows

I've been a lifelong Windows user. The first computer I ever had ran Windows 3.1 and I eagerly upgraded to Windows 95, 98, and 2000. I considered myself a power user of these operating systems. Yes, I ran Windows, but I also developed in Windows frequently, and I understood Windows at an accomplished level.

Though I've been an Open-Source advocate for quite some time, I frequently experienced major problems when trying out Linux.

I tried a version of RedHat when I was in high school, and I tried Mandrake, Gentoo, and Fedora when I was in college. Whenever I tried to use Linux, I was met with some kind of problem that I couldn't overcome by myself. A "deal-breaker", as I called it, that sent me back to Windows in frustration.

I lived on the Computer Interest Floor when I was in college, and a lot of my friends ran Linux, so I figured it would be a great time to evaluate Linux. I tried many times to run Linux with the help of these friends, but even they, as Linux Gurus, discovered my problems couldn't be overcome. I used my computer for everything - work, school, multimedia, and even television. Not being able to do my job (which was writing Flash applications) was a dealbreaker one year. Not being able to use my tv-card in Linux was a dealbreaker another year.

Although I was a huge fan of open-source as a philosophy, Linux was never quite ready for me as a user. By the time I had "given up" on Linux, I was running Windows 2000 exclusively (I liked 2000 a lot more than XP, which I found obnoxious). Every machine I set up ran Windows 2000. I replaced the shell with the open-source LiteStep and a custom theme I wrote. I replaced the file manager with the freeware x2explorer. I ran OpenOffice rather than MS Office, and Firefox rather than IE. I liked joking that the only part of Windows I used was the kernel - everything on top of it was free, and usually open.

A year or so ago, when I started reading about Vista, I knew I was in for trouble. I resisted XP because I didn't like the direction it took, and Vista seemed even worse. DRM drivers, "call-home" spyware, and a general lack of control in the hands of users all really irked me. I kept reading articles about planned features for Vista, and eventually I discovered something that was new to me.

A dealbreaker. In Windows. At first I tried to convince myself that it was just a rumor, but as more articles were published it became clear that there was no way around it. Windows Vista binds itself to your computer hardware. If I install Windows Vista on a certain machine, then decide to replace the motherboard in that machine, Vista considers that to be a "new" computer. Despite the fact that the "old computer" is just a scrapped motherboard sitting in a box in my closet, and despite the fact that the hard drive upon which Vista was installed remains, Vista considers it a brand new computer.

They give the first "new computer" categorized in this manner a free pass. Upgrade the motherboard again, however, and you need to buy a new copy of Vista.

My main desktop, which I call "wrath", has been my main desktop for many years. It has run Windows 2000 as long as it has existed, and it has been through at least 5 motherboards, 10 hard drives, 10 RAM sticks, 3 cases, and 3 video cards. The hardware has changed regularly, but I always considered it the same machine, because it was my ONLY desktop machine and the components that made up my previous desktop went into a box in the closet. This means that the copy of Windows 2000 I purchased for use as my desktop OS has always been active on only one machine. This is a legitimate use of my Windows 2000 CD, well within legality and with a clear intention NOT to unfairly pirate the OS in any way. Yet, as of Vista, I would have needed to buy 2 or 3 copies of Vista for this. That's simply unacceptable. That's a dealbreaker.

As the days ticked by, they approached two important dates: the day of release for Windows Vista and the official day that Windows 2000 would stop being supported. No more patches, no more security upgrades for 2000 users. I was a Windows fan, but I'm not stupid: running Windows without security patches is technological suicide. I could buy myself some time by biting the bullet and upgrading to XP, but that wasn't really a permanent solution. I absolutely would never be willing to install Vista, so I had to come up with a way to continue using my computer in spite of that. It was time to return to my old rival, Linux.

About 8 months ago, I began a process of migrating to Linux. This was not my usual "install Linux and see how I like it" process - this was a full-on switch, with the intention of being permanent. When the process was complete, I'd be using Linux as my main desktop operating system. I was out of options for Windows, so I was embracing Linux entirely.

I needed to be competent with Linux by the time Vista came out. That meant no copping-out and dual-booting, and it meant not building a spare Linux box to "play around with". I had to immerse myself in Linux if I were going to really learn it.

Instead, I would build a spare Windows box to "play around with" so that I could continue running games and video editing tools. I did all of my multimedia tv-watching on an XP Box in the living room, so I wouldn't have to depend on Linux for that (since it has always given me trouble). With my requirements from my desktop machine relaxed, I had a much better chance of being successful with Linux.

It took some time to get the multimedia box stable, the Windows machine built, and a network-storage solution enabled in my home so I could share things like music across my network. I was able to finally switch to Ubuntu Linux about 3 months ago.

How has it been?

Getting my mouse to work correctly has been a pain. Making Ubuntu play nicely with my Western Digital NetCenter was something of a nightmare. Linux can't seem to handle my KVM switch without disabling my mouse wheel. Every torrent app for Linux is inferior to uTorrent. I've definitely dealt with a lot of frustration in Ubuntu - frustration with things that I took for granted when I used Windows. Despite these frustrations, there has been a noticeable lack of something important: a dealbreaker.

As obnoxious as Ubuntu can be at times, nothing so far has made me give up and re-install Windows. Nothing has gone past the level of annoyance.

This week Vista was released to the world. Linux has no dealbreakers, only annoyances. Windows has a dealbreaker. For the first time since I started using a computer, the roles of Linux and Windows have switched for me. I've enjoyed Ubuntu so much that I'm considering installing it on my laptop.

Have I learned enough about Linux to consider myself "competent" with it in time for Vista's release? Not as much as I'd like, but I'm quick enough performing tasks in Linux that I feel like I've moved past the hardest part of the learning curve. By the time XP stops being patched, I think I will be comfortable enough with Linux to put it on my multimedia machine. By the time Windows 2000 stops being supported, I think I'll be okay with the idea of shutting down my backup Windows machine permanently.

A new version of Windows is out, and for the first time I don't care.

I'm a Linux User now.

My Interview With Google (Continued)

I didn't expect the story of My Interview With Google to be a two-parter, but it turns out the story didn't end where I expected.

Not too long after I made the post, it was submitted to Reddit.com where it enjoyed front-page status for two days. During that time, I got a lot of visitors and a lot of comments, some even from Google engineers.

I also got a private e-mail. It was from someone at Google. He explained that my post had been circulating around the Google office and when it got to him, it piqued his interest.

Essentially, he wanted me to come work for him in Mountain View. He was looking for Java folks for his team, and he thought I'd be a good fit. I jumped out of my chair when I read this, amazed some additional life had been breathed into my foray into the world of Google. The more I considered the e-mail, however, the more a part of me wanted to say no. Why?

His offer was essentially doing some semi-internal development for Google. I wanted to work on their web application back-ends, so that was a tad disappointing. Could that be the reason I wanted to turn him down? That didn't seem right; I had been joking for a while that I'd be happy to clean toilets at Google. Writing code is writing code.

The position was also contract-to-hire, which didn't roll my socks up and down. But I had been saying that once I got my foot in the door, I'd be alright. I knew I'd do fine at Google if I worked there, so I wasn't too concerned I wouldn't be hired permanently at the end of the contract work. No, it wasn't the contract aspect that bothered me.

He also told me that I'd have to spend three months in California doing the job. I'd then have to spend three months in California in a permanent position in order to "culturally integrate" before I could go back to Colorado and work in the Boulder office. This definitely bothered me. Since I would want to continue living in Colorado, I'd basically have to live in a hotel in California while Julia (my fiancee) stayed here in Colorado for 6 months. I just got engaged a month ago, and the idea of abandoning the family I'm just starting for Google seemed completely unfair. If I had gotten the job I originally interviewed for, I'd only have to be in CA for one week for training, so 6 months was a pretty big deal. When I told Julia, she told me that she could handle 6 months, and if I wanted to take this position I should. She was completely supportive of whatever I wanted to do. So it wasn't even the 6 months away from my home that was driving me to turn the position down.

I thought about this for days. I couldn't figure out what about the offer I didn't like, so shouldn't I take it?

Eventually I figured out what I didn't like about the situation and I turned it down. I don't think I could explain my rationale better than I did in my e-mail to the guy from Google, so here is what I told him:

I've been thinking about your e-mail for a few days and I've finally made a decision. This was not a decision I made lightly by any stretch.

Let me start out by saying thank you for e-mailing me and giving me another potential shot at Google. I hope you don't mind, but I'd like to update my blog story with this additional bit, though I won't be using your name or any details.

As I said in e-mail and via the blog post, there is no place I'd rather work than Google. Google, to me, is Mecca for software developers. Google does amazing work that improves the entire world. There is no better way to put my software development skills to use than at Google, where I'd be doing good work to make life better for countless individuals.

My personality, my desire to learn, my goal of improving the world - all of these tell me that Google would be the best place I could work. I know Google is right for me.

But am I right for Google? The interview process concluded with a resounding "no". Google decided that I am not a good fit for the company, and sent me back to Colorado. The fact that I made a funny blog post describing my journey doesn't change the fact that, from a technical standpoint, Google considers me below their standards.

Despite the conclusion of the interview, I believe I *AM* right for Google. I believe that, if I interview again after improving my algorithm skills and becoming more confident in my own abilities, Google will see that I am a good fit and hire me.

In short, I want to work at Google more than I can describe, but I want to work there because I earned it. I want to start my first day at Google knowing that I belong there, and knowing that Google knows I belong there.

As tempting as your offer is, I feel like it's sneaking into Google via a backdoor. I want to enter Google through the front door.

I intend on improving my abilities and learning new skills, as I do all the time as a developer. When I am ready, I will re-apply to Google, and hopefully I will meet you in the cafeteria during my week of training in California. :)

Thank you again for your e-mail.

I never imagined I would pass up a chance to work at Google, but there it is. I think I very well may look back and regret this, but for the time-being I'm comfortable with my decision.

This, I imagine, actually concludes this story. At least for a while.

Remote IFrame Detector

I saw a presentation yesterday about cross-site scripting. It was pretty interesting (though a bit overdramatic) but ultimately almost all of the attacks came down to the ability to send the user to a site which contains an iframe for the site they wanted to visit.

This lets the attacker run JavaScript code on the user's machine, while the iframe makes the user believe he or she is at the real site. This can allow for keylogging and some degree of remote control.

All of this hinges on the ability to load the real site in an iframe from within the hacker site. It seems to me that all you need to do is prevent an iframe from loading a page on a different host than the one containing the iframe.
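The check at the heart of that idea is just a host comparison. Here is a rough sketch of it in Java, to match the other code on this blog. A browser would of course run this kind of check in JavaScript, and the class name, method name, and example URLs below are all made up for illustration.

```java
import java.net.URI;

// Hypothetical sketch of the "is this iframe on a different host?" check.
final class IframeHostCheck {

    // Returns true when the iframe's src resolves to a different host
    // than the page that contains it.
    static boolean isCrossHost(String parentUrl, String iframeSrc) {
        URI parent = URI.create(parentUrl);
        // Relative values such as "/login" are resolved against the parent page.
        URI frame = parent.resolve(iframeSrc);
        String frameHost = frame.getHost();
        if (frameHost == null) {
            // Opaque URIs such as "about:blank" have no host to compare.
            return false;
        }
        // Host names are case-insensitive.
        return !frameHost.equalsIgnoreCase(parent.getHost());
    }

    public static void main(String[] args) {
        // An attacker page framing the real site: hosts differ.
        System.out.println(isCrossHost("http://evil.example/page.html", "http://bank.example/login"));
        // A site framing one of its own pages: hosts match.
        System.out.println(isCrossHost("http://bank.example/index.html", "/login"));
    }
}
```

This compares only the host. A stricter version would also compare the scheme and port, the way the browser's same-origin rule does.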

A simple way to accomplish a task like this is to load a Greasemonkey script that draws a border around any iframe that is at a different location than the parent site. Having never written a Greasemonkey script before, I set about doing this.

This is my first greasemonkey script, so it likely sucks, but it seems to work on the very limited set of tests I performed.

To install the script, click here.

Constructive feedback and criticisms welcome. I'm happy to improve this script, though I may not know exactly how to do so. ;)