
A somewhat related story dealing with MaxInt in Javascript.

One of the worst bugs I encountered, years ago, involved JavaScript converting integers from strings to numbers. JavaScript has no 64-bit integer type: every number is an IEEE 754 double, whose 53-bit significand can represent integers exactly only up to 2^53, while most other languages have a 64-bit long. When the backend language generated JSON containing integers larger than that, the horror started at the frontend. JavaScript happily rounded them to the nearest representable double while parsing. It was not a happy tale, since those long integers were account numbers: the wrong accounts ended up getting updated, seemingly at random.



I think the lesson there is that numeric types should only be used for things you actually want to do arithmetic with. An account ID that just happens to be all digits should still be stored and transmitted as a string.


The lesson I got was to be very careful about data type limitations when crossing language boundaries. The problem is not limited to numeric types: different encodings and code pages can screw up string values as well.


If you're not using UTF-8 everywhere then you're doing it wrong. Exceptions made for legacy systems, but you should get that data into UTF-8 as soon as possible.


It's unwise to lazily adopt a silver bullet without understanding the context and thinking through the consequences. I could just as well say: if you are not using XML with an explicit encoding declaration to encode everything everywhere, you are doing it wrong, and you should get all your data into XML as soon as possible. Of course, that sounds ludicrous.


XML is just one data storage and exchange format among many, with no particularly interesting properties and no compelling reason to use it. UTF-8 is the only encoding that is ASCII compatible, widely accepted and expected, and able to represent any text you'll ever encounter.

I can come up with half a dozen reasons to use something other than XML for data storage. I've yet to hear anyone give me a compelling reason to use something other than UTF-8 for encoding strings. Just because what I said is absurd when you replace UTF-8 with XML doesn't mean the original was absurd.


UTF-8 is not efficient for random access.

I don't have a problem with UTF-8. I have a problem with the silver-bullet attitude of advocating one approach for all cases without thought. That's just intellectually lazy.


No encoding that can handle all the necessary languages will be efficient for random access.

I'm not saying don't think about it. But once you think about it, I think there's really only one sane conclusion to reach.


Never say never. UTF-32 handles them just fine.


Precomposed versus decomposed accents? Jamo versus precomposed Hangul syllables? A Unicode code point is rarely a useful thing to know about on its own, and code that assumes one code point equals one "character", for whatever definition of character is in use, is likely to work poorly with UTF-32.



