Well, if you look at every call site of that method, our code consistently hardcodes "UTF-8". So maybe we should ask why we're passing in a charset at all.
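If every caller passes UTF-8 anyway, one option is a convenience overload that defaults the charset while keeping the explicit version for the rare caller that needs it. Here is a minimal sketch. The class and method names are hypothetical, since the actual method isn't named here:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical names; stands in for the method under discussion.
final class TextDecoder {
    // Existing shape: callers pass the charset explicitly (today always UTF-8).
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    // Convenience overload: callers no longer hardcode "UTF-8" themselves.
    static String decode(byte[] data) {
        return decode(data, StandardCharsets.UTF_8);
    }
}
```

Taking a `Charset` rather than a charset name string also avoids the checked `UnsupportedEncodingException` that `new String(byte[], String)` can throw.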
As for determining the charset: I've done a lot of reading on this, and you can't always rely on magic bytes (such as a byte-order mark) to tell you the encoding, so some portion of the bytes has to be examined and matched against conversion tables to find a likely candidate. ICU4J's homepage states that "ICU's conversion tables are based on charset data collected by IBM over the course of many decades, and is the most complete available anywhere." I'm inclined to believe this, since I've read elsewhere that much of the I18N code in JDK 1.1 and later came from IBM and was contributed directly to Java through a partnership: "IBM and the ICU team played a key role in providing globalization technology into Sun's Java".
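To illustrate why magic bytes alone aren't enough, here is a rough stdlib-only sketch. It is not how ICU4J works: ICU4J's `com.ibm.icu.text.CharsetDetector` (`setText(byte[])`, then `detect()` returning a `CharsetMatch`) scores many candidate charsets against its statistical data. This sketch trusts a BOM when one is present. Otherwise it strictly validates the bytes as UTF-8 and, as an assumed fallback, treats anything else as ISO-8859-1:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class CharsetGuess {
    static Charset guess(byte[] b) {
        // 1. Magic bytes: a byte-order mark is conclusive when present.
        //    (UTF-32 BOMs are not handled in this sketch.)
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(b, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(b, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        // 2. No BOM: inspect the content. Strict UTF-8 decoding rejects
        //    most byte sequences produced by other encodings.
        if (isValid(b, StandardCharsets.UTF_8)) return StandardCharsets.UTF_8;
        // 3. Assumed fallback: every byte sequence is valid ISO-8859-1.
        return StandardCharsets.ISO_8859_1;
    }

    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    private static boolean isValid(byte[] b, Charset cs) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Step 3 is exactly where this approach breaks down. Without statistical tables it cannot tell ISO-8859-1 from windows-1252 or any other single-byte charset, which is the gap a detector backed by data like ICU's is meant to fill.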