NYPHP.org

[nycphp-talk] character set filtering

Allen Shaw ashaw at iifwp.org
Tue Aug 16 11:53:51 EDT 2005


Hi All,

I've googled around a little but am probably not using the right 
keywords, so I'm asking for a few suggestions:

Our online database system is meant, sooner or later, to let several 
thousand of our contacts update their own data records (with careful 
data screening on our side, of course).  The big sticking point for me 
is that we can't have them submitting data in whatever character set 
they like.  For example, we don't want a Japanese user to send in his 
name in Chinese characters or in any kind of kana, Korean users 
shouldn't be able to submit Hangul, and so on.  So somewhere in the 
system I have to screen user input to be sure it's limited to a certain 
character set.

Questions I'm struggling with along this line are these:
* What character set shall we use?  (For example, of course we don't 
allow Chinese, Thai, Arabic, etc., but what about umlauts and the 
occasional eñe?)  That's an internal decision for us, I'm sure, but do 
you know of technical points I should be sure to consider?
* How will I screen the incoming data?  Do I just hack some regex 
together and run everything through it, or is there a library I should 
consider, etc.?
* How totally without clue am I about this whole topic?
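
To make the screening question concrete, here is a rough sketch of the kind of check I have in mind. It is written in Python only for illustration. It assumes the input has already been decoded to Unicode text, and the allowed set (Latin-script letters, including accented ones, plus a few punctuation marks) is a hypothetical policy, not something we have decided. The function names are made up for this example.

```python
import unicodedata

# Hypothetical policy: Latin-script letters (including accented ones such
# as "ü" and "ñ") plus a few punctuation characters common in names.
ALLOWED_PUNCTUATION = set(" -'.")


def is_allowed_char(ch):
    """Return True if a single character fits the assumed policy."""
    if ch in ALLOWED_PUNCTUATION:
        return True
    # unicodedata.name() returns the given default ("") for unnamed
    # characters instead of raising ValueError.
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN ")


def rejected_chars(text):
    """List the characters in text that fall outside the assumed policy."""
    # NFC composes "n" + combining tilde into a single "ñ" first, so both
    # ways of typing the same letter are treated alike.
    normalized = unicodedata.normalize("NFC", text)
    return [ch for ch in normalized if not is_allowed_char(ch)]


def is_acceptable(text):
    """True when every character passes; an empty string also passes."""
    return not rejected_chars(text)


if __name__ == "__main__":
    for sample in ["Müller", "Peña", "O'Brien", "Kim 김", "山田"]:
        print(sample, is_acceptable(sample), rejected_chars(sample))
```

Checking character names like this is only one possible approach. A regex with Unicode script classes, where the regex engine supports them, might do the same job. Note that this sketch also rejects digits and fullwidth Latin letters, which may or may not be what we want.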

If you have specific examples of sites that are doing a good job with 
this, or links to more I could read on the topic, that would be great, 
but I'd love to hear any suggestions or experience you can share.

Thanks,
Allen

-- 
Allen Shaw                                
Polymer (http://polymerdb.org)

Fine-grained control over how your users access your data: 
user permissions, reports, forms, ad-hoc queries -- all 
centrally managed.



