Monday, September 14, 2009

A lenient URL decoder for Java


The URLDecoder class in the JDK parses escape sequences in an encoded URL string strictly. Sometimes an application wants to decode the correctly escaped sequences and leave the malformed ones intact. The Sun documentation itself says that the handling of illegal strings is left to the implementation: a decoder may either leave illegal characters alone or throw an exception. Sun's implementation takes the strict route: it throws an IllegalArgumentException when it encounters an improper escape sequence rather than treating it as regular text.
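To make the strict behaviour concrete, here is a minimal sketch that feeds a few inputs to the JDK decoder. The class name `StrictDecodeDemo` and the helper `tryDecode` are made up for this illustration; only `URLDecoder.decode` comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative only: class and helper names are not part of the JDK.
class StrictDecodeDemo {

    /** Returns the decoded string, or a marker if the JDK rejects the input. */
    static String tryDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException e) {
            // Thrown by the JDK decoder for malformed escape sequences.
            return "rejected: " + s;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("a%20b"));  // well-formed escape
        System.out.println(tryDecode("100%"));   // incomplete trailing escape
        System.out.println(tryDecode("50%zz"));  // non-hex characters after %
    }
}
```

The last two inputs are the kind of string a lenient decoder would pass through unchanged.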

I couldn't find a lenient implementation, so I hand-crafted this one from the original source of the URLDecoder class, found here.

The lenient decoder follows.

    import java.io.UnsupportedEncodingException;

    // The class name is a placeholder; only the method matters.
    public class LenientURLDecoder {

        /**
         * Decodes an application/x-www-form-urlencoded string, leaving
         * malformed escape sequences in place instead of throwing.
         */
        public static String decodeLenient(String s, String enc)
                throws UnsupportedEncodingException {

            boolean needToChange = false;
            StringBuilder sb = new StringBuilder(s.length());
            int numChars = s.length();
            int i = 0;

            if (enc.length() == 0) {
                throw new UnsupportedEncodingException(
                        "URLDecoder: empty string enc parameter");
            }

            while (i < numChars) {
                char c = s.charAt(i);
                switch (c) {
                case '+':
                    sb.append(' ');
                    i++;
                    needToChange = true;
                    break;
                case '%':
                    /*
                     * Starting with this instance of %, process all
                     * consecutive substrings of the form %xy. Each
                     * substring %xy will yield a byte. Convert all
                     * consecutive bytes obtained this way to whatever
                     * character(s) they represent in the provided
                     * encoding.
                     */

                    // (numChars-i)/3 is an upper bound for the number
                    // of remaining bytes
                    byte[] bytes = new byte[(numChars - i) / 3];
                    int pos = 0;

                    while (((i + 2) < numChars) && (c == '%')) {
                        // Character.digit rejects sign characters, which
                        // Integer.parseInt would accept (e.g. "%-1").
                        int hi = Character.digit(s.charAt(i + 1), 16);
                        int lo = Character.digit(s.charAt(i + 2), 16);
                        if (hi >= 0 && lo >= 0) {
                            bytes[pos++] = (byte) ((hi << 4) | lo);
                        } else {
                            // Malformed escape: flush the bytes decoded
                            // so far, then copy the three characters
                            // through unchanged.
                            sb.append(new String(bytes, 0, pos, enc));
                            sb.append(s, i, i + 3);
                            pos = 0;
                        }

                        i += 3;
                        if (i < numChars) {
                            c = s.charAt(i);
                        }
                    }

                    sb.append(new String(bytes, 0, pos, enc));

                    // A trailing, incomplete byte encoding such as
                    // "%x" will be treated as unencoded text
                    if ((i < numChars) && (c == '%')) {
                        for (; i < numChars; i++) {
                            sb.append(s.charAt(i));
                        }
                    }

                    needToChange = true;
                    break;
                default:
                    sb.append(c);
                    i++;
                    break;
                }
            }

            return (needToChange ? sb.toString() : s);
        }
    }
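For comparison, here is a compact variation on the same idea. It is a sketch, not the code from this post: the class `LenientDecodeSketch` and its helpers are names invented for the example. It validates the two hex digits by hand instead of relying on a parse failure, and buffers decoded bytes in a `ByteArrayOutputStream` until a non-escape character forces a flush. One behavioural difference to be aware of: after a malformed `%`, it resumes scanning at the very next character rather than skipping three.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

// Illustrative sketch; names are hypothetical, not from the JDK or the post.
class LenientDecodeSketch {

    /** Value of an ASCII hex digit, or -1 if ch is not one. */
    private static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    /** Appends any buffered bytes to out, decoded with enc, and clears the buffer. */
    private static void flush(ByteArrayOutputStream run, StringBuilder out, String enc)
            throws UnsupportedEncodingException {
        if (run.size() > 0) {
            out.append(run.toString(enc));
            run.reset();
        }
    }

    static String decode(String s, String enc) throws UnsupportedEncodingException {
        StringBuilder out = new StringBuilder(s.length());
        ByteArrayOutputStream run = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = hexValue(s.charAt(i + 1));
                int lo = hexValue(s.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    // Well-formed %xy: collect the byte and keep going.
                    run.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            // Ordinary character, or a '%' that does not start a valid escape.
            flush(run, out, enc);
            out.append(c == '+' ? ' ' : c);
            i++;
        }
        flush(run, out, enc);
        return out.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(decode("a%20b+c", "UTF-8"));
        System.out.println(decode("100%", "UTF-8"));
        System.out.println(decode("50%zz", "UTF-8"));
    }
}
```

With this sketch, `a%20b+c` becomes `a b c`, while `100%` and `50%zz` come back unchanged.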

1 comment:

Anonymous said...

This was a life saver for me. Thank you very much, sir!