Wednesday, December 30, 2009

"Python Trick" % locals()

I just learned this python trick for string formatting. Here's an example:
def compileCommand(sourcePath, outputPath):
switches = "-O0 -g"
return "gcc -c %(switches)s %(sourcePath)s -o %(outputPath)" % locals()
This trick uses the dict locals() to supply the values for formatting the string with the '%' signs in it. The names inside the parens are resolved with the local variable names in the function.

Example usage of the above function:
cmd = compileCommand("hello.c", "hello.o")
assert cmd == "gcc -c -O0 -g hello.c -o hello.o"

Wednesday, December 23, 2009

Someone Doesn't Know How make Works

I tried to look up documentation on makefile macro modifiers, and the first result gave the syntax as this:
$(name,modifier[,modifier ...])
I don't know what kind of 'make' this author was using, but that syntax doesn't work at all in mine. I sought out the real make documentation, and the concept of a macro modifier doesn't even seem to exist. The closest we get is a Substitution Reference, which uses a colon, not a comma, and has limited functionality.

In the end, the $(subst from,to,text) function solved my original problem.

Friday, December 18, 2009

cygwin make Hates Windows

I installed 'make' from cygwin's package manager and it doesn't work with windows absolute paths. Here is my Makefile:
C:/TEMP/asdf.txt:
touch C:/TEMP/asdf.txt
(that's a tab)

Invoking 'make' from cygwin gives the following error:
Makefile:1: *** target pattern contains no `%'.  Stop.
The workaround is redoing the paths that 'make' cares about to use the '/cygdrive/c/' path:
/cygdrive/c/TEMP/asdf.txt:
touch C:/TEMP/asdf.txt
(that's a tab)

This works. 'touch' knows how to read windows absolute paths and 'make' doesn't.

Python's eval Function Hates Windows

(Python 2.4) I'm loading options from a file that's just a big python dict literal:
options = eval(open(argv[1]).read())
The eval function always worked when I gave it a literal string (like with """), but every time I gave it a string that was read out of a file, it would give me a syntax error on the first newline. The problem is that python's eval function can't stand '\r' characters. My line endings were windows-style, and the python interpreter didn't have a problem with them, just the eval function. Here's the solution:
options = eval(open(argv[1], "rU").read())
UPDATE: python admits this limitation for the exec function, telling you to use universal newline mode "U" when opening files for reading. However, there is no mention of this limitation in the documentation for the eval function.

Sunday, December 13, 2009

My Favorite Runtime.exec Wrapper

If you want to run another process in java (like a python script), the internet should lead you to Runtime.exec. This method doesn't block, so you might be tempted to call Process.waitFor to make sure it's done. However, sometimes this method will hang forever when the process writes to stderr or returns != 0. That seems like a bug to me, so the wrapper below works around that by catching exceptions from Process.exitValue instead of asking sun to do the waiting for us. Furthermore, the wrapper below reports stdout and stderr to you via StringBuffer arguments.
/**
* Executes a command and waits for the process to exit.
* The process is queried every 100 milliseconds for completion.
* stdoutBuffer and stderrBuffer can be the same object.
*
* @param cmd the command to be passed to {@link Runtime#exec(String[])}.
* @param stdoutBuffer buffer to which the process's stdout will be written. can be null.
* @param stderrBuffer buffer to which the process's stderr will be written. can be null.
* @return the exit value of the process
* @throws RuntimeException IOException and InterruptedException are wrapped in RuntimeException and thrown
*/
public static int exec(String[] cmd, StringBuffer stdoutBuffer, StringBuffer stderrBuffer)
{
if (stdoutBuffer == null)
stdoutBuffer = new StringBuffer();
if (stderrBuffer == null)
stderrBuffer = new StringBuffer();
try {
Process p = Runtime.getRuntime().exec(cmd);
InputStreamReader stdout = new InputStreamReader(p.getInputStream());
InputStreamReader stderr = new InputStreamReader(p.getErrorStream());
while (true) {
int exitValue = -1;
boolean done;
try {
exitValue = p.exitValue();
done = true;
} catch (IllegalThreadStateException e) {
done = false;
}
while (stdout.ready())
stdoutBuffer.append((char)stdout.read());
while (stderr.ready())
stderrBuffer.append((char)stderr.read());
if (done)
return exitValue;
Thread.sleep(100);
}
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}

Wednesday, December 2, 2009

Keyboard Manufacturers Are All Slackers

Have you ever tried to press more than 3 buttons at once on your keyboard?
Modern keyboards ... [block] the 3rd key in certain key combinations, [which] means that when two keys are depressed simultaneously, many of the other keys on the keyboard will not respond until one of the two depressed keys is lifted. With better keyboards designs, this seldom happens in office programs, but it remains a problem in games even on expensive keyboards, due to wildly different and/or configurable key/command layouts in different games

-wikipedia
Am I missing something here? Why is it so hard to design a keyboard that actually does what it's supposed to do. It's 2009 and I can't play Metroid III on my snes emulator because I have to choose between charging my gun and jumping while pressing the run button.

Friday, November 20, 2009

Python 2.4 problem with reading binary files

In Python 2.4, you must specify "rb" as the mode for reading binary files. In Python 2.6 you can just say "r" and it will work.

Python 2.4:
>>> f = open(filename, "r")
>>> s = f.read()
>>> len(s)
114
>>> f.close()
>>> f = open(filename, "rb")
>>> s = f.read()
>>> len(s)
230
>>> f.close()
The problem is that it only reads part of the file. I don't know why it thinks the file is done in the middle like that, but to fix it, specify "rb" instead of "r".

I ran into this problem writing a program in Python 2.6 using just "r" and didn't run into the problem until I tried running it in Python 2.4.

Sunday, November 15, 2009

Not short circuited

If you want to do the following:

boolean pass = true;
pass = pass && test1();
pass = pass && test2();

It is NOT equivalent to the following:

boolean pass = true;
pass &= test1();
pass &= test2();

Because the &= operator is not short circuited. It compiles and works just fine for booleans, but it is a bitwise operator at heart and only works for booleans out of sympathy.

What we need is a &&= operator.

Wednesday, October 28, 2009

[Ljava.lang.String;@445c445c

If you've ever seen the above trying to do a println with an array, I have the solution for you!

wrong:
System.out.println(array);
right
System.out.println(Arrays.toString(array));

My Favorite Join Function

My favorite implementation of a join function (in Java):

public static <T> String join(T[] array, String delimiter)
{
if (array.length == 0)
return "";
StringBuilder builder = new StringBuilder();
builder.append(array[0]);
for (int i = 1; i < array.length; i++)
builder.append(delimiter).append(array[i]);
return builder.toString();
}

Wednesday, October 7, 2009

How Haskell Works

The most baffling programming language I ever learned was Haskell. It is famed as one of the only purely functional programming languages in use today. A hiring rep from Microsoft told me that it was his favorite language, which sparked my interest.

Basic Syntax


You can define a function called "divide" that takes two arguments, divides them, and returns the result.

divide a b = a / b

You invoke the function like this:

two_thirds = divide 2 3

No parens, no commas. Just spaces.

Furthermore, since this is a functional language, you can define functions by simply assigning them from other existing functions.

div = divide
two_thirds = div 2 3

This shouldn't be anything too profound since Python, even JavaScript, can do this.

Now let's make a function that inverts a number:

invert a = divide 1 a
one_half = invert 2

The Confusing Part


Unlike in Python (or any other language I've seen), you can define a function that supplies only some of the arguments to another function.

invert = divide 1
one_half = invert 2

But I thought "divide" took 2 arguments, you might say. It actually doesn't.

How Haskell Works


Every function in Haskell takes exactly 1 argument. To explain this, I'll use Python. Here's the Python equivalent of our divide function from above:

def divide(a):
return lambda b: a / b

Here's how to call our python divide function:

two_thirds = divide(2)(3)

The result of divide(2) is a function that takes 1 argument called b. When it gets b it divides 2/b and returns that. All functions in Haskell behave this way.

Some Examples



negate a = -a
divide a b = a / b

-- here's a list of numbers:
numbers = [1,2,5,10]


-- the map function takes a function and
-- applies it to each item of a list
negated_numbers = map negate numbers
-- produces: [-1,-2,-5,-10]

-- here's our invert function
invert = divide 1
inverted_numbers = map invert numbers
-- produces: [1, 0.5, 0.2, 0.1]
-- equivalently:
inverted_numbers = map (divide 1) numbers

-- more complex:
-- this function inverts a list of numbers
invert_list = map invert

inverted_numbers = invert_list numbers

Obviously, there's a lot more to Haskell than presented here, but hopefully this helps explain some of the semantics of Haskell I had trouble with when I learned it.

Tuesday, October 6, 2009

It's all C's fault: Part II

In a previous post, I vented about the ambiguity of expressions such as (a)-b. Are we subtracting (a) from b, or are we casting -b to type a? I have news: the Java compiler struggles with this same problem.

Integer i1 = (int)3; // valid
Integer i2 = (int)-3; // valid
Integer i3 = (int)-(3); // valid
Integer i4 = (Integer)3; // valid
Integer i5 = (Integer)-3; // compile error
Integer i6 = (Integer)-(3); // compile error
Integer i7 = (Integer)(-3); // valid

There's some automatic boxing of primitive types into wrapper types going on here, a feature of Java 5. The fact that the compiler can handle 3 but not -3 in any particular location is to me a fault. This is with jdk1.5.0_16.

At least I've got evidence that the (Type)expression casting syntax is troublesome.

Sunday, October 4, 2009

Operators VB vs Java

What's the output from these two functions in VB and Java respectively:

Public Sub PrintThings()
Console.WriteLine("3" + "6")
Console.WriteLine(3 & 6)
End Sub


public void printThings() {
System.out.println("3" + "6");
System.out.println(3 & 6);
}

Friday, October 2, 2009

Name That Function


let a, b, c be Non-Negative Integers
let f(x, y) be a function such that:
f(a, b) = c
f(b, a) = c
f(a, c) = b

what is f?

I'm not going to mess around with the formal notation of math, because the answer isn't something you can't figure out.

How to check your answer: pick arbitrary values for a and b. Use what you think f is and the formula f(a, b) = c to calculate c. Then see if the other two formulas are true with those values of a, b, and c.

Tuesday, September 29, 2009

No One Cares

What happens to the operand stack in the JVM when an exception is caught locally? The internet seems to say that it's emptied. The JVM Specification doesn't say.

... I care.

Monday, September 28, 2009

Java Riddle: instanceof and casting

Is it possible that the System.out.println line ever gets executed?

void brokenFunction(Object stringMaybe) {
if (stringMaybe instanceof String)
return;
String stringForSure = (String)stringMaybe;
System.out.println(stringForSure);
}

Friday, September 25, 2009

It's-A* Me

youNeedToWatchThis = youCareAboutVideoGames * youCareAboutProgramming

Ai Mario

proof it's real

What does Turing Complete mean?

Consulting the magnifluous fount of perpetual enlightenment:

Turing completeness:

A computational system that can compute every Turing-computable function is called Turing-complete (or Turing-powerful). Alternatively, such a system is one that can simulate a universal Turing machine.


Thank you, circular definitions.

In English:


A turing-complete programming language is a "real" programming language. The quick test is whether it has control structures that can go backwards: loops and/or recursion. That's pretty much it.

Examples:


C/C++, Java, Python, Ruby: Yes
a calculator: No
spreadsheet formulas (Excel, etc.): No
Lisp, Haskell, Prolog: Yes
assembly languages: Yes

Who cares:


The big deal with turing-complete programming languages is that any problem you can solve with one of them, you can solve with any other of them. So you could theoretically write gcc in BrainF if you had enough patience and system resources. You cannot, however, write gcc in Excel formulas.

Name that/those Programming Language(s)

What non-esoteric, turing-complete programming languages allow this fragment of a line to show up in source code?
code: " 1 2 "
(not including the quotes. can't be inside a literal string or comment.)
To help you out, C is NOT an answer.

There are many answers. How many can you come up with? So far I'm at 5.

Thursday, September 24, 2009

The Beauty of Python

OOffice pastes tables into plain text 1 cell per line with no indication of row breaks. Dumb. this prompted me to convert the text into a list of tuples in python. I got so far as getting a list of cell data, then I needed to take every 4 items in the list and make a tuple out of them.

Here's the problem I was faced with:

input: ["a", "b", "c", "d", "w", "x", "y", "z", ...]
output: [("a", "b", "c", "d"), ("w", "x", "y", "z"), ...]

After asking around in the python irc channel, Andy got the following solution, and while writing this post, I also found it in Python's documentation for the zip function:

output = zip(*[iter(input)]*4)

It's hard to overstate my satisfaction, this is such an intricate, concise solution.

What is going on


from the inside out:
iter(input): this makes an iterator object that traverses the input list. The iterator stores the state of which item it's on, and what comes next. [iter(input)]: this makes a list with 1 item in it, the iterator.
[iter(input)]*4: this takes the list, duplicates it 4 times, and concats it all together. ex: [1,2]*3 => [1,2,1,2,1,2]. An important thing to note in this step is that just the one iterator object with its state information occupies all four slots in the list.
zip: python's documentation. Note that the function takes lists as separate arguments
zip(*[iter(input)]*4): the * expands the list to its right into an argument list for the zip function. So it's equivalent to giving 4 arguments to the zip function. Here's where the magic happens.

the zip functions traverses each of its four arguments in parallel. Before an iterator has been moved, its "next" item is the list's first item. The zip function asks the first list's iterator for the "next" item, giving the first item in the list. Then the zip function asks the second iterator for its "next" item, which is typically the first item in the second list. However, the second iterator is the same object as the first iterator, and its "next" item is the second item in the list, not anyone's first item. And so the zip function is tricked into thinking that it is traversing 4 separate lists of length 2 each when really it is traversing over a single list of length 8. The result is exactly what we want.

Tuesday, September 15, 2009

I beat n 88.4

I don't care if you don't care, this is getting blogged.

Saturday, September 12, 2009

postfix/prefix lol

C/C++ and Java

Valid:

int a = 0;
int b = a++ + ++a;

Not valid:

int a = 0;
int b = a+++++a;

Too bad, right? It looks so fun.

Friday, September 11, 2009

Logical XOR operator


operator bitwise logical
not ~ !
and & &&
or | ||
xor ^ ???????


Where's the logical XOR operator?? One does exist in Java. Do you know what it is?

Java Quiz

In order for a method to be inlined, the method must NOT be overridable (virtual in C++ terms).

Under what 5 conditions is a method not overridable?

1. The method is explicitly marked as "final".
2. The method is _________.
3. The method is _________.
4. The method's class is _________.
5. The method's class is _________.

The answer is on the internet if you want to cheat.

Thursday, September 10, 2009

Selfish Promotion: Jax

When I look at the following code:

Scanner scanner;
try {
scanner = new Scanner(filename);
} catch (FileNotFoundException e) {
throw new RuntimeException(e);
}
ArrayList lines = new ArrayList();
while (scanner.hasNextLine())
lines.add(scanner.nextLine());

All I can think of is what a shame it is that I have to declare scanner before the try-catch block (and that the curlies are mandatory).

Here's the equivalent in my experimental programming language Jax:

Scanner scanner = try new Scanner(filename)
catch (FileNotFoundException e)
throw new RuntimeException(e);
ArrayList lines = new ArrayList();
while (scanner.hasNextLine())
lines.add(scanner.nextLine());

That first line shows how try-catch blocks evaluate to an expression. Everyone should be thrilled about Jax so I have motivation to continue working on it!

Tuesday, September 1, 2009

Failure Shorthand

Why type
throw new NullPointerException();
when you could just type
throw null;
IT TOTALLY WORKS!!

Sunday, August 30, 2009

Usability Please! - Part 1

ATTENTION PROGRAMMERS

if you put an icon on your UI, you MUST tell the user what it means IN ENGLISH. Either with text right beside it or in the TOOLTIP TEXT.

Thursday, August 27, 2009

Monday, August 24, 2009

Eclipse's Forum Needs To Get with It

Why is Eclipse's message board archive sooooo archaic?

http://dev.eclipse.org/mhonarc/lists/platform-debug-dev/msg01042.html

You read the issue backwards tolerating more and more layers of ">" cushioning. Then if you want to read the responses, you have to go down to the "Follow-Ups" section at the very bottom of the page and visit a separate page for each response.

Why doesn't eclipse consolidate all that information and make it presentable? They've already got the information organized as a hierarchy. Why can't they make it show up in a tree-thread view?

Also some css couldn't hurt.

Sunday, August 23, 2009

Answer to Name That Band

The answer can be found at a link starting with "http://is.gd/" and ending with 5 characters each of which have clues bellow:

1. 4 8 15 16 _3 42
2.
  e
ha_e
i
l

3. DRA_ = dull
4. Less Talk More Ro_k
5. Tritone of C#

Java Facepalmers

Just fyi, the following are retarded:


SomeType thing = new SomeType(paramerters);
if (thing == null)
// ...
// ...



if (this == null)
return;
// ...


These are both real mistakes I've found working on a Java project at work. It makes me feel dumb explaining that a constructor never returns null, and that this is never null.

Please write good code and don't make these mistakes. At least it gives me a job.

Abc's lost Lost episodes

abc.com has all Lost episodes available online. That's episodes 100-518. But for some reason, episodes 501-511 are missing. Why would they have the end of season 5 online, but not the beginning and middle of it?

Well now i'm stuck until December 8 when season 5 comes out on pirate bay DVD.

Friday, June 19, 2009

The Death of Mibbit

I was using mibbit for my irc client, then they made this decision:


Notice -- Connections via mibbit are no longer supported on freenode. You may wish to consider using http://webchat.freenode.net instead. Further information over at http://bit.ly/19JILF


R.I.P. Mibbit

Saturday, May 30, 2009

Java Riddle

I change the value of a constant in one file, and it causes a compile error in a different file. How is this possible?

Sunday, May 17, 2009

It's all C's fault

C's type casting syntax creates a lot of trouble for parsers.

(a)-b

What does the above code mean? are you casting negative b to type a? or are you subtracting b from a? It's ambiguous and depends on whether a is a type or a value.

Well, actually that's not the whole story. In Java, the only two ambiguous binary/unary operators are + and - which are only applicable as unary operators to primitive types. So any instance of (___)-b could be disambiguated by whether or not the ___ was one of the 7 primitive types. This is because you can't cast a primitive to anything but another primitive. In C++ this is not the case.

C++ has operator overloading, but barely dodges the bullet by enforcing the prototyping rule. To paraphrase, "every identifier must be defined prior to its usage." including types and where "prior" refers to position in source code. So if the C++ parser is traveling forward, it will have an accurate bank of identified types whenever it needs to disambiguate code such as (a)-b. All it needs to do is look up what category a belongs to--type, variable, function, etc,--and it will be able to make the right decision from there.

C++ can't cleanly separate parsing from semantic analysis, but C++ sucks anyway so who cares.

Wednesday, May 13, 2009

Java's "protected" access fine print

In Java, protected means that subclasses have access, even if the classes are in different packages. This is actually not always the case. Right from Java's language spec:

A protected member or constructor of an object may be accessed from outside the package in which it is declared only by code that is responsible for the implementation of that object.


The following code produces a compile error.


package p1;

public class C1
{
protected void foo()
{
}
}

...

package p2;

import p1.C1;

public class C2 extends C1
{
private C1 c1;
private void bar()
{
c1.foo(); // compile error
}
}


I have yet to understand why this is the case. I'm guessing it has something to do with virtual methods and identifier resolution tokens.

Sunday, May 3, 2009

Name that regex

What do these regular expressions match?

  1. /0x[0-9A-Fa-f]+/

  2. /\-?\d+/

  3. /\/{2}.*?\n|\/\*.*?\*\//

  4. /<!\-{2}.*?\-{2}>/

  5. /^\d{5}(\-\d{4})?$/

  6. /[A-Za-z_][A-Za-z0-9_]*/

Tuesday, April 28, 2009

Electric Traps and a High-Tech Marker

You are lost deep in an enormous maze.

The maze is made of rooms, automatic sliding doors, and fluorescent-lit hallways. The maze is so large and uniform-looking, and your memory is so bad, that you aren't able to remember where you've been or how you got wherever you happen to be. Rooms are connected by hallways with doors at both ends. Hallways can have stairs and ladders to other stories of the maze. There is exactly one exit.

Halfway through every hallway, the walls, floor, and ceiling are lined by strange metal. Every time you step on the metal, the hairs on your arms stand up and you hear a high-pitched squealing coming from the walls. The squealing continues until you're off the metal on one side or another. If you come back to a hallway and step on the same metal a second time, you hear the squealing again, but even higher-pitched. If you step on the metal in any hallway a third time, you get electrocuted and killed. There is no way to get through a hallway without stepping on the metal.

You find in your pocket a high-tech marker that can only write on the metal walls in the hallways, and only when they're squealing. You have no other way to leave markings in the maze.

How do you find your way out alive?

Sunday, April 26, 2009

Consistent Arbitrary Colors

Mibbit gives usernames colors, i think, depending on their users' client. XChat gives everyone but me the same color. I don't like either of these choices because many usernames get the same color, and I therefore can't use color to tell who said something without reading their username.

I'd prefer it if everyone had a different color (to a reasonable extent), but still have the same color the next time they log in. Short of users assigning colors to anything, how could an IRC client accomplish this?

Wednesday, April 22, 2009

Tuesday, April 14, 2009

No Excuse for Ignorance

Java's ArrayList and Scanner classes are two that anyone writing in Java should know about, or else they're stupid.

The following two atrocities are summaries of actual code I had to review, the first at work, the second a submission in a programming competition.

What's an ArrayList?


Inexcusable ignorance:

int size = 0;

int optAIndex = -1;
if (weNeedOptA)
optAIndex = size++;

int optBIndex = -1;
if (weNeedOptB)
optBIndex = size++;

int optCIndex = -1;
if (weNeedOptC)
optCIndex = size++;

String[] options = new String[size];

if (optAIndex != -1)
options[optAIndex] = "-a";

if (optBIndex != -1)
options[optBIndex] = "-b";

if (optCIndex != -1)
options[optCIndex] = "-c";

return options;


Enlightened version:


ArrayList<String> options = new ArrayList<String>();

if (weNeedOptA)
options.add("-a");

if (weNeedOptB)
options.add("-b");

if (weNeedOptC)
options.add("-c");

return options.toArray(new String[options.size()]);

The concept that this code is accomplishing is really pretty basic: add options to an array as we need them. That's just what ArrayList's are for. The ignorant solution uses 18 lines to solve an 8-line problem, and that's the reduced version of the actual code that I've had to deal with.

The toArray(T[]) function is the hardest part of the class to learn, but even the programmers who just copy and paste code they don't understand should be able to pick up that one-liner pretty easily.

If you think you know how to program in Java and you don't know about the ArrayList class, then what you really know is how to be a pain in the neck to other programmers.

The ArrayList api consists of 3 important methods. The constructor, add(T), and toArray(T[]). Don't click on the links, just look at what I did in the example, and that's all you need to know.

Now use it!

I Can Read Numbers All by Myself!


Inexcusable ignorance:

String numberStr = "";
int nextChar = System.in.read();
while (nextChar != -1) { // check for EOF
char c = (char)nextChar;
if ('0' <= c && c <= '9') // check for whitespace
numberStr += c;
else
break; // end of number
nextChar = System.in.read();
}
int number = Integer.parseInt(numberStr);


Enlightened version:


Scanner scanner = new Scanner(System.in);
int number = scanner.nextInt();

Two lines!! The task we're trying to perform is to read a number from standard input. Really, did you not think that someone else ran into this problem before you did? The first line wraps a scanner around standard input, so you can read from it many times successively. Really the enlightened version is a 1-line solution with a line for setup.

There's another issue here as well, folks. The inexcusable ignorance can't read negative numbers. The first time through the while loop breaks because '-' is not in the range [0-9]. Then the Integer.parseInt() call will throw a NumberFormatException because numberStr is still the empty string. This was a real submission to a programming competition I helped run, and the negative number issue is the reason the participant got this question wrong.

Not only is the bad solution bulky, it's broken. Use a Scanner!

The Scanner api has lots of nice functions for you. The code below outlines what I've found to be useful constructors and methods:

// reads from a text file
Scanner file = new Scanner(new File("<filepath>"));
while (file.hasNextLine()) {
String line = file.nextLine();
// ...
}

// reads through a string
Scanner scanner = new Scanner("123 hello 6.67e-11");
int number = scanner.nextInt(); // 123
String token = scanner.next(); // "hello" stops at whitespace
double gravity = scanner.nextDouble(); // Newton's constant
boolean done = !scanner.hasNext(); // true

I hope you, the reader, agree that knowledge these two classes and their api's are an absolute minimum requirement for Java programmers. Please do not put the rest of us through the pain of dealing with their absence in your code.

Thursday, April 9, 2009

Interned Arrays?

Intro to String.intern()


In Java, string comparison is done with the String.equals() method as opposed to using the == operator. This is because the == operator only compares references.

String s1 = "first ";
s1 += "second"; // All these lines are to prevent Java from optimizing
String s2 = "first"; // and concatenating the literals at compile time and
s2 += " second"; // defeating the purpose of this snippet.
assert s1.equals(s2) && !(s1 == s2);

That is to say, s1 and s2 were constructed in different ways but now represent equals strings (they're both "first second".). The .equals() method knows this but the strings are at different locations in memory.

Strings in Java (unlike in C) are immutable. That is to say that once I have a reference to a String, I know that it's value will never change because it is downright impossible for me or anyone else to do so (without using reflection. that's cheating). What does this imply?

When you spell out two different literal strings in your source code that actually have the same value, Java gets smart and really only makes one string and points both references to it.

String s1 = "hello", s2 = "hello";
assert s1 == s2;

It's harmless to do so because String's are immutable. This does a few good things. It takes up less space (which many people might not care about) and it optimizes comparisons. You can see this optimization in Java's String class source code (which Sun has so generously opened):

public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
}

You can see that a call to .equals() with the object as its own parameter will return in one comparison. But is this optimization only applicable to literal String's?

No, you can create this effect at runtime with the String.intern() method. It returns a unique reference (possibly its own) to a String with the same value as itself. If two String's have the same value, then their .intern() methods will return the same reference.

String s1 = "first ";
s1 += "second";
String s2 = "first";
s2 += " second";
assert s1 != s2;
s1 = s1.intern();
s2 = s2.intern();
assert s1 == s2;

This is done by Java managing a pool of String's that are the official intern representatives, so to speak. When you call .intern(), a match is found from pool and returned, or if there isn't one yet, then you your string goes in the pool and becomes the official intern.

The call to .intern() is potentially expensive, but the idea is to save time later during comparisons and memory usage.

Why use it?


All literal String's go into the pool:

String s1 = "first ";
s1 += "second";
s1 = s1.intern();
String literal = "first second";
assert s1 == literal;

This means that if you are using some arbitrary String constants as property keys, you can just use the equality operator instead of the .equals() method.

public static final String HEIGHT_PROPERTY = "HEIGHT_PROPERTY";

public void setProperty(String key, int value) {
if (key == HEIGHT_PROPERTY)
this.height = value;
}
public void example() {
setProperty(HEIGHT_PROPERTY, 5);
}

When is this not the case? If you're reading property key/value pairs out of a file or from a stream, then String's are constructed from scratch and the newly created property maps don't use the same memory as your literal.

Property File:

HEIGHT_PROPERTY=5

Parsing Code:

String line = scanner.nextLine();
String[] keyAndValue = line.split("=");
String key = keyAndValue[0];
String value = keyAndValue[1];

Here, the key is equal to HEIGHT_PROPERTY according to the .equals() method, but not the == operator. Instead of changing all your code to use .equals() instead of the == operator, you can just intern the key.

String key = keyAndValue[0].intern();

Now you can still use the == operator throughout your code, and, as long as there are enough comparisons performed after the value is read from the file, you've increased performance.

Interned Arrays?



In Java, the size of an array is immutable. In order to increase the size of an array, you have to create a new one, then copy the contents into it.


Object[] objects = new Object[10];
fillThisArray(objects);
// now we need it to be bigger
Object[] newObjects = new Object[20];
System.arraycopy(objects, 0, newObjects, 0, 10);
objects = newObjects;
fillMore(objects);


This doesn't mean that arrays are themselves immutable in the same way that String's are because you can change the contents of an array. The only exception to this is, of course, zero-length arrays, who don't have any contents to change. This means that one zero-length array of a type is indistinguishable from another, leading to programmers getting smart and keeping just one zero-length array around and using it instead of creating new ones all the time.

Instead of writing this:

public Result[] getResults() {
if (resultGenerator == null)
return new Result[0];
else
return resultGenerator.getResults();
}

they write this:

private static final Result[] NO_RESULTS = new Result[0];

public Result[] getResults() {
if (resultGenerator == null)
return NO_RESULTS;
else
return resultGenerator.getResults();
}


This is a good idea, in my opinion, but also creates clutter just for the sake of efficiency. And what if another class were also storing a zero-length array of Result's? Then the purpose is partially defeated because there isn't just one instance. Is there a solution?

If the Java language 'interned' zero-length arrays, then this problem would be solved (and look much nicer).


Object[] a1 = new Object[0];
Object[] a2 = new Object[0];
assert a1 == a2; // fails :(


Currently, Java does not intern literal zero-length arrays, like the ones above. I agree that the "new" keyword looks like it should actually create something new. If, however, Java kept a special copy of zero-length arrays around for each type, it would take care of some optimizations that are ugly to write in code. The reference to the zero-length array would be inserted into the bytecode instead of the call to create a new one of length zero. It would only work for explicitly zero-length arrays. It wouldn't work for the following code.


private Object[] makeArray(int size) {
return new Object[size];
}
private void example() {
Object[] a1 = new Object[0];
Object[] a2 = makeArray(0);
assert a1 == a2; // hopeless
}


In this situation, there's no way around having two instances of zero-length arrays. Having a .intern() function on arrays would be more hassle than it's worth, because it only applies to zero-length arrays, which is a property that's usually only known at runtime.

What I'm proposing is that the code

new SomeType[0]

will never allocate memory, but will return a reference to an already existing zero-length array of type SomeType.

What do you think?

Sunday, April 5, 2009

Programming Languages Quiz

I made a little quiz covering new stuff about programming languages I learned that I thought was interesting.

It's as language-neutral as I could make it. Like you don't have to know C# specifically or anything.

It's really easy to look up the answers on the internet, so your score is only as honest as you are.

Programming Languages Quiz

I'm open for corrections or suggestions.

Saturday, April 4, 2009

Nice Guys Sleep Alone

v1:
You always loved the jerk in me and when I didn't treat you right.
You said I should change, be gentle and kind, but you really wanted to fight.
I blew you off and beat you down. It only made you want me more.
You knew I wasn't right for you, just one more thing to like me for.

Then I changed per your request, softened sweet and meek.
I served you, loved you, promised to wed you. But you just saw me as weak.
Your twisted heart's desires and your longing to pursue,
You bored of me and left me. And I'll never forgive you.

c:
I gave you my life. The game is done.
Rejecting this mere trophy heart that you've won.
You showed me you loved me. I tried to be true.
You left me with nothing. I'll never forgive you.

v2:
If jerks you like, then jerks you'll get. You'll never be happy in life.
You love the hate and lust the fight. You'll never be anyone's wife.
I'll be the jerk you love again. You can be sure of that.
I'll win you back and catch your heart. And then I'll leave you flat.

bridge:
All my devotion to you was in vain.
All the hurt I'll remit.
I'll give you your drama, abuse, and pain.
You'll beg to be mine, and I won't give you shit.

Friday, March 20, 2009

Tower and Fuses

You are imprisoned in a stone tower. The cell you’re in is lit by a torch just outside the cell door within your reach. There is a guard stationed at the only entrance to the tower who is far out of earshot. You know that during the changing of the guard, the tower entrance will be unguarded for about 1 minute. The changing of the guard happens at a different time each day. The cell door can only be opened with the blast from a stick of dynamite, which will go unnoticed only in the 1 minute window of time during the changing of the guard.

Your friend, who has climbed up the side of the tower, is able to give you two things through the window. The first thing is a tip: the guard will change in exactly 50 minutes. The second thing is a bag with four items in it: a stick of dynamite and three fuses. One fuse takes exactly 5 minutes to burn from one end to the other; the other two each take exactly 1 hour to burn from one end to the other. While the time it takes to burn the entire fuse is very predictable, the fuses do not burn consistently throughout their length making it impossible to divide the fuse in parts and burn only the parts predictably.

You have no watch or any other means by which to measure the passage of time. You have no further contact with any other ally. How do you escape?