Non-ASCII characters in URLs and parsing those characters at the string level

Haluk Karamete
First off, let me show which non-ASCII characters I'm talking about.

For instance, type 'Slobodan Milosevic' into Google Search and go to
the first suggested Wikipedia link.

You will see that the URL contains unusual characters that are well
beyond the common ASCII set. I'm simply curious whether WordPress supports
that.

Though this is not a feature I particularly like (to say the least), I do
confess that I find it quite interesting from an HTTP point of view.

But my real question (or pain, to put it better) is this.
Say you are scraping that data and you come across that title with those
funny characters, and you want to create a tag out of it.

Is there a conversion function that I can pass that string to and get back
a version translated to plain ASCII (code points below 128)?

So I pass in 'slobodan_milo%c5%a1evi%c4%87', and I get back the good old
'Slobodan Milosevic'.

Does such a function exist? Or how do you deal with that situation?
_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Non-ASCII characters in URLs and parsing those characters at the string level

Jason LeVan-3
urldecode() mixed with remove_accents() perhaps?

https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L794
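
A minimal sketch of that two-step idea (percent-decode, then fold accents),
written in Python so it runs without WordPress loaded. It is not
remove_accents() itself: WordPress uses its own character mapping table, while
this stand-in relies on Unicode NFKD decomposition, so results can differ for
some characters. The function name slug_to_ascii_title is hypothetical.

```python
import unicodedata
from urllib.parse import unquote


def slug_to_ascii_title(slug: str) -> str:
    """Percent-decode a URL slug and fold accented letters to plain ASCII."""
    # Step 1: percent-decode, reading the bytes as UTF-8
    # (roughly the role urldecode() plays in PHP).
    text = unquote(slug, encoding="utf-8")

    # Step 2: NFKD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)

    # Step 3: drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Step 4: discard whatever is still outside ASCII. Letters such as
    # U+0111 or U+00F8 have no decomposition, so this sketch simply loses them.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")

    # Step 5: turn underscores into spaces and capitalize each word.
    return ascii_only.replace("_", " ").title()


print(slug_to_ascii_title("slobodan_milo%c5%a1evi%c4%87"))  # Slobodan Milosevic
```

If dropping undecomposable letters is not acceptable, an explicit replacement
table applied before step 4 is one option, which is closer in spirit to what a
mapping-based function does.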

___________________________________

Jason LeVan

Email: [hidden email]

Twitter: @codeclarified

On Tue, Sep 9, 2014 at 6:03 PM, Haluk Karamete <[hidden email]>
wrote:

> Is there a conversion function that I can pass that string to and get back
> a version translated to plain ASCII (code points below 128)?

Re: Non-ASCII characters in URLs and parsing those characters at the string level

Haluk Karamete
I will look into that. Your suggestion looks very promising. Thank you for
that.
I also discovered this resource, http://www.acc.umu.se/~saasha/charsets/, for
my own DIY approach.

On Tue, Sep 9, 2014 at 4:53 PM, Jason LeVan <[hidden email]> wrote:

> urldecode() mixed with remove_accents() perhaps?
>
>
> https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L794