Re: Extending query_posts?

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

Scott johnson-5
I'd be careful with that.  My understanding of randomization in mysql
is that the performance is awful.  I haven't done it myself but that's
hwat I understand.

Scott

On 1/29/06, Sam Angove <[hidden email]> wrote:

> On 1/29/06, Mattias Winther <[hidden email]> wrote:
> >
> > 1. Are there any other approaches that might work better? I realize there are ways to do a hack that does this, but I really want to make it feel like it was integrated in the wordpress core.
> > 2. Is there a hook for looking at and modifying the query just before sending it to the SQL server?
>
> 2. There's a bunch of them. posts_where, posts_join, posts_orderby
> etc. Browse WP_Query::get_posts() in classes.php.
>
> 1. I don't know how fast it is, but you can do it with ORDER BY RAND().
>
> This seems to work:
>
> <?php
> /*
> Plugin Name: Random Posts
> */
>
> function query_random_posts($query) {
>         return query_posts($query . '&random=true');
> }
>
> class RandomPosts {
>         function orderby($orderby) {
>                 if ( get_query_var('random') == 'true' )
>                         return "RAND()";
>                 else
>                         return $orderby;
>         }
>         function register_query_var($vars) {
>                 $vars[] = 'random';
>                 return $vars;
>         }
> }
> add_filter( 'posts_orderby', array('RandomPosts', 'orderby') );
> add_filter( 'query_vars', array('RandomPosts', 'register_query_var') );
> ?>
> _______________________________________________
> wp-hackers mailing list
> [hidden email]
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>


--
-------------------------------------------------------
J. Scott Johnson
Ookles launches 2/28/06 - have you signed up yet?
new startup: http://ookles.com/
blog: http://fuzzyblog.com/
podcast: http://techwarstories.com/
[hidden email]
aim: fuzzygroup
cell: 857 222 6459
-------------------------------------------------------
_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

MWinther

I can't find any specific references to the performance problems you are referring to... Anyone else out there who recognizes this, preferrably with some kind of link for further analysis? Or, someone with a huge database that can run the numbers?

/Mattias

On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson <[hidden email]> wrote:

> I'd be careful with that.  My understanding of randomization in mysql
> is that the performance is awful.  I haven't done it myself but that's
> hwat I understand.
>
> Scott
>
> On 1/29/06, Sam Angove <[hidden email]> wrote:
>> On 1/29/06, Mattias Winther <[hidden email]> wrote:
>> >
>> > 1. Are there any other approaches that might work better? I realize
> there are ways to do a hack that does this, but I really want to make it
> feel like it was integrated in the wordpress core.
>> > 2. Is there a hook for looking at and modifying the query just before
> sending it to the SQL server?
>>
>> 2. There's a bunch of them. posts_where, posts_join, posts_orderby
>> etc. Browse WP_Query::get_posts() in classes.php.
>>
>> 1. I don't know how fast it is, but you can do it with ORDER BY RAND().
>>
>> This seems to work:
>>
>> <?php
>> /*
>> Plugin Name: Random Posts
>> */
>>
>> function query_random_posts($query) {
>>         return query_posts($query . '&random=true');
>> }
>>
>> class RandomPosts {
>>         function orderby($orderby) {
>>                 if ( get_query_var('random') == 'true' )
>>                         return "RAND()";
>>                 else
>>                         return $orderby;
>>         }
>>         function register_query_var($vars) {
>>                 $vars[] = 'random';
>>                 return $vars;
>>         }
>> }
>> add_filter( 'posts_orderby', array('RandomPosts', 'orderby') );
>> add_filter( 'query_vars', array('RandomPosts', 'register_query_var') );
>> ?>
>> _______________________________________________
>> wp-hackers mailing list
>> [hidden email]
>> http://lists.automattic.com/mailman/listinfo/wp-hackers
>>
>
>
> --
> -------------------------------------------------------
> J. Scott Johnson
> Ookles launches 2/28/06 - have you signed up yet?
> new startup: http://ookles.com/
> blog: http://fuzzyblog.com/
> podcast: http://techwarstories.com/
> [hidden email]
> aim: fuzzygroup
> cell: 857 222 6459
> -------------------------------------------------------
> _______________________________________________
> wp-hackers mailing list
> [hidden email]
> http://lists.automattic.com/mailman/listinfo/wp-hackers

_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

David Chait
Yes, RAND is awful for ORDER BY.

http://jan.kneschke.de/projects/mysql/order-by-rand/

At the end, he shows times for 100, 1000, ... , 1M records.  RAND starts to
'hurt' even when just 1000 records.  He walks through building an alternate
approach, still in SQL (though using sub-selects, so MySQL 4.1+).

Do you need a completely random entry from the entire data set?  Or would a
random-but-recent entry work? (i.e., select post_id limit 100, pick a random
entry or entries, then do the 'full' query with post_id in {list of selected
ids}...)

-d

----- Original Message -----
From: "Mattias Winther" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, January 31, 2006 12:27 PM
Subject: Re: [wp-hackers] Extending query_posts?


|
| I can't find any specific references to the performance problems you are
referring to... Anyone else out there who recognizes this, preferrably with
some kind of link for further analysis? Or, someone with a huge database
that can run the numbers?
|
| /Mattias
|
| On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson <[hidden email]>
wrote:
| > I'd be careful with that.  My understanding of randomization in mysql
| > is that the performance is awful.  I haven't done it myself but that's
| > hwat I understand.
| >
| > Scott

_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

MWinther

Thanks, great link!

Well, as for my own blog, I'm using to generate recent posts in the same category, meaning that I'll have quite the long way to go before 1000 hits is becoming an issue. Then again, I would like to do a nice plug-in that can be trusted to perform at about the same speed with a ton of rows.

I guess it's time to brush up on my SQL.

/Mattias

On Tue, 31 Jan 2006 13:36:45 -0500, "David Chait" <[hidden email]> wrote:

> Yes, RAND is awful for ORDER BY.
>
> http://jan.kneschke.de/projects/mysql/order-by-rand/
>
> At the end, he shows times for 100, 1000, ... , 1M records.  RAND starts
> to
> 'hurt' even when just 1000 records.  He walks through building an
> alternate
> approach, still in SQL (though using sub-selects, so MySQL 4.1+).
>
> Do you need a completely random entry from the entire data set?  Or would
> a
> random-but-recent entry work? (i.e., select post_id limit 100, pick a
> random
> entry or entries, then do the 'full' query with post_id in {list of
> selected
> ids}...)
>
> -d
>
> ----- Original Message -----
> From: "Mattias Winther" <[hidden email]>
> To: <[hidden email]>
> Sent: Tuesday, January 31, 2006 12:27 PM
> Subject: Re: [wp-hackers] Extending query_posts?
>
>
> |
> | I can't find any specific references to the performance problems you are
> referring to... Anyone else out there who recognizes this, preferrably
> with
> some kind of link for further analysis? Or, someone with a huge database
> that can run the numbers?
> |
> | /Mattias
> |
> | On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson <[hidden email]>
> wrote:
> | > I'd be careful with that.  My understanding of randomization in mysql
> | > is that the performance is awful.  I haven't done it myself but that's
> | > hwat I understand.
> | >
> | > Scott
>
> _______________________________________________
> wp-hackers mailing list
> [hidden email]
> http://lists.automattic.com/mailman/listinfo/wp-hackers

_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

David Chait
If you are trying to do anything like CG-SameCat, the approach I used was to
not use random in the select, but select posts (well, post ID and title
really...) where the category matches, ordered by post ID (or date)
descending, and limit of some factor time the number of random items asked
for... Thus, you want say three random, but recent, related posts -- so grab
the last 10 or 20 and then use PHP to pick a random few from the set.  Just
a thought.

Completely random from entire posts table != completely random from recent
posts in matching category... ;)

-d

----- Original Message -----
From: "Mattias Winther" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, January 31, 2006 7:12 PM
Subject: Re: [wp-hackers] Extending query_posts?


|
| Thanks, great link!
|
| Well, as for my own blog, I'm using to generate recent posts in the same
category, meaning that I'll have quite the long way to go before 1000 hits
is becoming an issue. Then again, I would like to do a nice plug-in that can
be trusted to perform at about the same speed with a ton of rows.
|
| I guess it's time to brush up on my SQL.
|
| /Mattias
|
| On Tue, 31 Jan 2006 13:36:45 -0500, "David Chait" <[hidden email]>
wrote:
| > Yes, RAND is awful for ORDER BY.
| >
| > http://jan.kneschke.de/projects/mysql/order-by-rand/
| >
| > At the end, he shows times for 100, 1000, ... , 1M records.  RAND starts
| > to
| > 'hurt' even when just 1000 records.  He walks through building an
| > alternate
| > approach, still in SQL (though using sub-selects, so MySQL 4.1+).
| >
| > Do you need a completely random entry from the entire data set?  Or
would
| > a
| > random-but-recent entry work? (i.e., select post_id limit 100, pick a
| > random
| > entry or entries, then do the 'full' query with post_id in {list of
| > selected
| > ids}...)
| >
| > -d
| >
| > ----- Original Message -----
| > From: "Mattias Winther" <[hidden email]>
| > To: <[hidden email]>
| > Sent: Tuesday, January 31, 2006 12:27 PM
| > Subject: Re: [wp-hackers] Extending query_posts?
| >
| >
| > |
| > | I can't find any specific references to the performance problems you
are
| > referring to... Anyone else out there who recognizes this, preferrably
| > with
| > some kind of link for further analysis? Or, someone with a huge database
| > that can run the numbers?
| > |
| > | /Mattias
| > |
| > | On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson
<[hidden email]>
| > wrote:
| > | > I'd be careful with that.  My understanding of randomization in
mysql
| > | > is that the performance is awful.  I haven't done it myself but
that's
| > | > hwat I understand.
| > | >
| > | > Scott
| >
| > _______________________________________________
| > wp-hackers mailing list
| > [hidden email]
| > http://lists.automattic.com/mailman/listinfo/wp-hackers
|
| _______________________________________________
| wp-hackers mailing list
| [hidden email]
| http://lists.automattic.com/mailman/listinfo/wp-hackers 

_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

Scott johnson-5
What I used to do back at my last gig was, for sequential data, get a min /
max and then use php's rand function and then use a select (since auto
increments are consecutive).  And then use a recursive function if the
autoincrement id didn't exist for example.

The problem here is that whenever we did it we found that users rarely if
ever wanted true randomness.  Its almost always some type of "random but
within a set" and "wait for the details" type of thing.

now bear in mind my last gig was "index the blogosphere" (Feedster) which is
worlds different.  A lot of times 2 random selections would get data from
the same feed since we indexed data in order it was received. (hence the
comment about users not really wanting random).

Scott

On 1/31/06, David Chait <[hidden email]> wrote:

>
> If you are trying to do anything like CG-SameCat, the approach I used was
> to
> not use random in the select, but select posts (well, post ID and title
> really...) where the category matches, ordered by post ID (or date)
> descending, and limit of some factor time the number of random items asked
> for... Thus, you want say three random, but recent, related posts -- so
> grab
> the last 10 or 20 and then use PHP to pick a random few from the
> set.  Just
> a thought.
>
> Completely random from entire posts table != completely random from recent
> posts in matching category... ;)
>
> -d
>
> ----- Original Message -----
> From: "Mattias Winther" <[hidden email]>
> To: <[hidden email]>
> Sent: Tuesday, January 31, 2006 7:12 PM
> Subject: Re: [wp-hackers] Extending query_posts?
>
>
> |
> | Thanks, great link!
> |
> | Well, as for my own blog, I'm using to generate recent posts in the same
> category, meaning that I'll have quite the long way to go before 1000 hits
> is becoming an issue. Then again, I would like to do a nice plug-in that
> can
> be trusted to perform at about the same speed with a ton of rows.
> |
> | I guess it's time to brush up on my SQL.
> |
> | /Mattias
> |
> | On Tue, 31 Jan 2006 13:36:45 -0500, "David Chait" <[hidden email]
> >
> wrote:
> | > Yes, RAND is awful for ORDER BY.
> | >
> | > http://jan.kneschke.de/projects/mysql/order-by-rand/
> | >
> | > At the end, he shows times for 100, 1000, ... , 1M records.  RAND
> starts
> | > to
> | > 'hurt' even when just 1000 records.  He walks through building an
> | > alternate
> | > approach, still in SQL (though using sub-selects, so MySQL 4.1+).
> | >
> | > Do you need a completely random entry from the entire data set?  Or
> would
> | > a
> | > random-but-recent entry work? (i.e., select post_id limit 100, pick a
> | > random
> | > entry or entries, then do the 'full' query with post_id in {list of
> | > selected
> | > ids}...)
> | >
> | > -d
> | >
> | > ----- Original Message -----
> | > From: "Mattias Winther" <[hidden email]>
> | > To: <[hidden email]>
> | > Sent: Tuesday, January 31, 2006 12:27 PM
> | > Subject: Re: [wp-hackers] Extending query_posts?
> | >
> | >
> | > |
> | > | I can't find any specific references to the performance problems you
> are
> | > referring to... Anyone else out there who recognizes this, preferrably
> | > with
> | > some kind of link for further analysis? Or, someone with a huge
> database
> | > that can run the numbers?
> | > |
> | > | /Mattias
> | > |
> | > | On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson
> <[hidden email]>
> | > wrote:
> | > | > I'd be careful with that.  My understanding of randomization in
> mysql
> | > | > is that the performance is awful.  I haven't done it myself but
> that's
> | > | > hwat I understand.
> | > | >
> | > | > Scott
> | >
> | > _______________________________________________
> | > wp-hackers mailing list
> | > [hidden email]
> | > http://lists.automattic.com/mailman/listinfo/wp-hackers
> |
> | _______________________________________________
> | wp-hackers mailing list
> | [hidden email]
> | http://lists.automattic.com/mailman/listinfo/wp-hackers
>
> _______________________________________________
> wp-hackers mailing list
> [hidden email]
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>
_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

MWinther
In reply to this post by David Chait

Well, my thought is to get completely random from all posts in matching categor(y|ies), so I'm somewhere in between. But, since ordering by random works without hurting up to about 1000 hits, and considering the amount of posts I have, combined with the amount of posts/day I write, I figure I'll have a few years before this becomes an actual problem for me personally... And I'm guessing that once I start hitting the numbers, I will indeed choose to pick between more recent posts.

Since I'm modifying the SQL ordering, from what I understand, the posts are first selected from the criteria, and then ordered according to, in my case, random order, which should mean that with selective query_posts-usage one should be able to use the simple version in a lot of cases, even with large number of posts, without suffering any real performance problems. Optimizing seems like a great idea, and once I get my new iMac, I'll have the hardware to test on too. :) What is the easiest way to populate a test instance of the database with, say, 10000 posts?

On Tue, 31 Jan 2006 20:13:53 -0500, "David Chait" <[hidden email]> wrote:

> If you are trying to do anything like CG-SameCat, the approach I used was
> to
> not use random in the select, but select posts (well, post ID and title
> really...) where the category matches, ordered by post ID (or date)
> descending, and limit of some factor time the number of random items asked
> for... Thus, you want say three random, but recent, related posts -- so
> grab
> the last 10 or 20 and then use PHP to pick a random few from the set.
> Just
> a thought.
>
> Completely random from entire posts table != completely random from recent
> posts in matching category... ;)
>
> -d
>
> ----- Original Message -----
> From: "Mattias Winther" <[hidden email]>
> To: <[hidden email]>
> Sent: Tuesday, January 31, 2006 7:12 PM
> Subject: Re: [wp-hackers] Extending query_posts?
>
>
> |
> | Thanks, great link!
> |
> | Well, as for my own blog, I'm using to generate recent posts in the same
> category, meaning that I'll have quite the long way to go before 1000 hits
> is becoming an issue. Then again, I would like to do a nice plug-in that
> can
> be trusted to perform at about the same speed with a ton of rows.
> |
> | I guess it's time to brush up on my SQL.
> |
> | /Mattias
> |
> | On Tue, 31 Jan 2006 13:36:45 -0500, "David Chait"
> <[hidden email]>
> wrote:
> | > Yes, RAND is awful for ORDER BY.
> | >
> | > http://jan.kneschke.de/projects/mysql/order-by-rand/
> | >
> | > At the end, he shows times for 100, 1000, ... , 1M records.  RAND
> starts
> | > to
> | > 'hurt' even when just 1000 records.  He walks through building an
> | > alternate
> | > approach, still in SQL (though using sub-selects, so MySQL 4.1+).
> | >
> | > Do you need a completely random entry from the entire data set?  Or
> would
> | > a
> | > random-but-recent entry work? (i.e., select post_id limit 100, pick a
> | > random
> | > entry or entries, then do the 'full' query with post_id in {list of
> | > selected
> | > ids}...)
> | >
> | > -d
> | >
> | > ----- Original Message -----
> | > From: "Mattias Winther" <[hidden email]>
> | > To: <[hidden email]>
> | > Sent: Tuesday, January 31, 2006 12:27 PM
> | > Subject: Re: [wp-hackers] Extending query_posts?
> | >
> | >
> | > |
> | > | I can't find any specific references to the performance problems you
> are
> | > referring to... Anyone else out there who recognizes this, preferrably
> | > with
> | > some kind of link for further analysis? Or, someone with a huge
> database
> | > that can run the numbers?
> | > |
> | > | /Mattias
> | > |
> | > | On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson
> <[hidden email]>
> | > wrote:
> | > | > I'd be careful with that.  My understanding of randomization in
> mysql
> | > | > is that the performance is awful.  I haven't done it myself but
> that's
> | > | > hwat I understand.
> | > | >
> | > | > Scott
> | >
> | > _______________________________________________
> | > wp-hackers mailing list
> | > [hidden email]
> | > http://lists.automattic.com/mailman/listinfo/wp-hackers
> |
> | _______________________________________________
> | wp-hackers mailing list
> | [hidden email]
> | http://lists.automattic.com/mailman/listinfo/wp-hackers
>
> _______________________________________________
> wp-hackers mailing list
> [hidden email]
> http://lists.automattic.com/mailman/listinfo/wp-hackers

_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

Scott johnson-5
Another way to handle ordering by random ness is to hash a key in table and
then sort by the hash. Since the data distribution is basically fairly even,
that actually works ok (provided you clear the table periodically).

And yes my perspective is forever warped by terabyte + size mysql stuff.
Many things to me are fraught with danger (and I bloody well love MySQL just
its damn easy to run amok with it).

One thing I'd comment on is be careful with perfomance if your server is
getting any kind of load.  Use explain on your queries and watch ram
consumption.  When you use a lot of rand stuff even w/ little data involved
my gut feeling is *be careful*.  Be very, very careful.

S

On 2/1/06, Mattias Winther <[hidden email]> wrote:

>
>
> Well, my thought is to get completely random from all posts in matching
> categor(y|ies), so I'm somewhere in between. But, since ordering by random
> works without hurting up to about 1000 hits, and considering the amount of
> posts I have, combined with the amount of posts/day I write, I figure I'll
> have a few years before this becomes an actual problem for me personally...
> And I'm guessing that once I start hitting the numbers, I will indeed choose
> to pick between more recent posts.
>
> Since I'm modifying the SQL ordering, from what I understand, the posts
> are first selected from the criteria, and then ordered according to, in my
> case, random order, which should mean that with selective query_posts-usage
> one should be able to use the simple version in a lot of cases, even with
> large number of posts, without suffering any real performance problems.
> Optimizing seems like a great idea, and once I get my new iMac, I'll have
> the hardware to test on too. :) What is the easiest way to populate a test
> instance of the database with, say, 10000 posts?
>
> On Tue, 31 Jan 2006 20:13:53 -0500, "David Chait" <[hidden email]>
> wrote:
> > If you are trying to do anything like CG-SameCat, the approach I used
> was
> > to
> > not use random in the select, but select posts (well, post ID and title
> > really...) where the category matches, ordered by post ID (or date)
> > descending, and limit of some factor time the number of random items
> asked
> > for... Thus, you want say three random, but recent, related posts -- so
> > grab
> > the last 10 or 20 and then use PHP to pick a random few from the set.
> > Just
> > a thought.
> >
> > Completely random from entire posts table != completely random from
> recent
> > posts in matching category... ;)
> >
> > -d
> >
> > ----- Original Message -----
> > From: "Mattias Winther" <[hidden email]>
> > To: <[hidden email]>
> > Sent: Tuesday, January 31, 2006 7:12 PM
> > Subject: Re: [wp-hackers] Extending query_posts?
> >
> >
> > |
> > | Thanks, great link!
> > |
> > | Well, as for my own blog, I'm using to generate recent posts in the
> same
> > category, meaning that I'll have quite the long way to go before 1000
> hits
> > is becoming an issue. Then again, I would like to do a nice plug-in that
> > can
> > be trusted to perform at about the same speed with a ton of rows.
> > |
> > | I guess it's time to brush up on my SQL.
> > |
> > | /Mattias
> > |
> > | On Tue, 31 Jan 2006 13:36:45 -0500, "David Chait"
> > <[hidden email]>
> > wrote:
> > | > Yes, RAND is awful for ORDER BY.
> > | >
> > | > http://jan.kneschke.de/projects/mysql/order-by-rand/
> > | >
> > | > At the end, he shows times for 100, 1000, ... , 1M records.  RAND
> > starts
> > | > to
> > | > 'hurt' even when just 1000 records.  He walks through building an
> > | > alternate
> > | > approach, still in SQL (though using sub-selects, so MySQL 4.1+).
> > | >
> > | > Do you need a completely random entry from the entire data set?  Or
> > would
> > | > a
> > | > random-but-recent entry work? (i.e., select post_id limit 100, pick
> a
> > | > random
> > | > entry or entries, then do the 'full' query with post_id in {list of
> > | > selected
> > | > ids}...)
> > | >
> > | > -d
> > | >
> > | > ----- Original Message -----
> > | > From: "Mattias Winther" <[hidden email]>
> > | > To: <[hidden email]>
> > | > Sent: Tuesday, January 31, 2006 12:27 PM
> > | > Subject: Re: [wp-hackers] Extending query_posts?
> > | >
> > | >
> > | > |
> > | > | I can't find any specific references to the performance problems
> you
> > are
> > | > referring to... Anyone else out there who recognizes this,
> preferrably
> > | > with
> > | > some kind of link for further analysis? Or, someone with a huge
> > database
> > | > that can run the numbers?
> > | > |
> > | > | /Mattias
> > | > |
> > | > | On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson
> > <[hidden email]>
> > | > wrote:
> > | > | > I'd be careful with that.  My understanding of randomization in
> > mysql
> > | > | > is that the performance is awful.  I haven't done it myself but
> > that's
> > | > | > hwat I understand.
> > | > | >
> > | > | > Scott
> > | >
> > | > _______________________________________________
> > | > wp-hackers mailing list
> > | > [hidden email]
> > | > http://lists.automattic.com/mailman/listinfo/wp-hackers
> > |
> > | _______________________________________________
> > | wp-hackers mailing list
> > | [hidden email]
> > | http://lists.automattic.com/mailman/listinfo/wp-hackers
> >
> > _______________________________________________
> > wp-hackers mailing list
> > [hidden email]
> > http://lists.automattic.com/mailman/listinfo/wp-hackers
>
> _______________________________________________
> wp-hackers mailing list
> [hidden email]
> http://lists.automattic.com/mailman/listinfo/wp-hackers
>



--
-------------------------------------------------------
J. Scott Johnson
Ookles launches 2/28/06 - have you signed up yet?
new startup: http://ookles.com/
blog: http://fuzzyblog.com/
podcast: http://techwarstories.com/
[hidden email]
aim: fuzzygroup
cell: 857 222 6459
-------------------------------------------------------
_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Extending query_posts?

David Chait
In reply to this post by MWinther
If you will only ever have say 200-300 posts with the category filtering,
then you are probably okay -- if you really don't care about the overhead.

Note that I don't know the internals of MySQL, but I'd assume that in order
to do an ORDER BY RAND(), I'd expect it has to randomize the ENTIRE table.
Even with a LIMIT in place, I think it still computes the entire result
table first.  That's why you are actually better off doing something like
(butchering it here) select post id,post title from posts where category in
(list of categories) order by post timestamp desc limit 100, then do a
random routine that picks N items from the results.  Why?

1. select less data whenever possible.  with post id and title, you can make
a link.

2. select a smaller result set.  category is your filter already, we'll
stick with that.

3. don't randomize the entire set.  even in PHP, many of the randomization
approaches will, say, randomize an entire array -- whereas you just want N
random entries out of the array.  randomizing the entire set is wasteful.

4. overhead will continue to grow.  remember that this thing happens every
pageload of a single page.  my approach could actually be further optimized
by caching the results of the query for say 12h or something, and doing only
the random-pick from the cache (I don't do this internal to CG-SameCat
currently, but I think my home-grown query caching does it one level higher,
for some number of hours).

best of luck,
-d

----- Original Message -----
From: "Mattias Winther" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, February 01, 2006 12:18 AM
Subject: Re: [wp-hackers] Extending query_posts?


|
| Well, my thought is to get completely random from all posts in matching
categor(y|ies), so I'm somewhere in between. But, since ordering by random
works without hurting up to about 1000 hits, and considering the amount of
posts I have, combined with the amount of posts/day I write, I figure I'll
have a few years before this becomes an actual problem for me personally...
And I'm guessing that once I start hitting the numbers, I will indeed choose
to pick between more recent posts.
|
| Since I'm modifying the SQL ordering, from what I understand, the posts
are first selected from the criteria, and then ordered according to, in my
case, random order, which should mean that with selective query_posts-usage
one should be able to use the simple version in a lot of cases, even with
large number of posts, without suffering any real performance problems.
Optimizing seems like a great idea, and once I get my new iMac, I'll have
the hardware to test on too. :) What is the easiest way to populate a test
instance of the database with, say, 10000 posts?
|
| On Tue, 31 Jan 2006 20:13:53 -0500, "David Chait" <[hidden email]>
wrote:
| > If you are trying to do anything like CG-SameCat, the approach I used
was
| > to
| > not use random in the select, but select posts (well, post ID and title
| > really...) where the category matches, ordered by post ID (or date)
| > descending, and limit of some factor time the number of random items
asked
| > for... Thus, you want say three random, but recent, related posts -- so
| > grab
| > the last 10 or 20 and then use PHP to pick a random few from the set.
| > Just
| > a thought.
| >
| > Completely random from entire posts table != completely random from
recent
| > posts in matching category... ;)
| >
| > -d
| >
| > ----- Original Message -----
| > From: "Mattias Winther" <[hidden email]>
| > To: <[hidden email]>
| > Sent: Tuesday, January 31, 2006 7:12 PM
| > Subject: Re: [wp-hackers] Extending query_posts?
| >
| >
| > |
| > | Thanks, great link!
| > |
| > | Well, as for my own blog, I'm using to generate recent posts in the
same
| > category, meaning that I'll have quite the long way to go before 1000
hits
| > is becoming an issue. Then again, I would like to do a nice plug-in that
| > can
| > be trusted to perform at about the same speed with a ton of rows.
| > |
| > | I guess it's time to brush up on my SQL.
| > |
| > | /Mattias
| > |
| > | On Tue, 31 Jan 2006 13:36:45 -0500, "David Chait"
| > <[hidden email]>
| > wrote:
| > | > Yes, RAND is awful for ORDER BY.
| > | >
| > | > http://jan.kneschke.de/projects/mysql/order-by-rand/
| > | >
| > | > At the end, he shows times for 100, 1000, ... , 1M records.  RAND
| > starts
| > | > to
| > | > 'hurt' even when just 1000 records.  He walks through building an
| > | > alternate
| > | > approach, still in SQL (though using sub-selects, so MySQL 4.1+).
| > | >
| > | > Do you need a completely random entry from the entire data set?  Or
| > would
| > | > a
| > | > random-but-recent entry work? (i.e., select post_id limit 100, pick
a
| > | > random
| > | > entry or entries, then do the 'full' query with post_id in {list of
| > | > selected
| > | > ids}...)
| > | >
| > | > -d
| > | >
| > | > ----- Original Message -----
| > | > From: "Mattias Winther" <[hidden email]>
| > | > To: <[hidden email]>
| > | > Sent: Tuesday, January 31, 2006 12:27 PM
| > | > Subject: Re: [wp-hackers] Extending query_posts?
| > | >
| > | >
| > | > |
| > | > | I can't find any specific references to the performance problems
you
| > are
| > | > referring to... Anyone else out there who recognizes this,
preferrably
| > | > with
| > | > some kind of link for further analysis? Or, someone with a huge
| > database
| > | > that can run the numbers?
| > | > |
| > | > | /Mattias
| > | > |
| > | > | On Mon, 30 Jan 2006 08:10:03 -0500, Scott johnson
| > <[hidden email]>
| > | > wrote:
| > | > | > I'd be careful with that.  My understanding of randomization in
| > mysql
| > | > | > is that the performance is awful.  I haven't done it myself but
| > that's
| > | > | > hwat I understand.
| > | > | >
| > | > | > Scott
| > | >
| > | > _______________________________________________
| > | > wp-hackers mailing list
| > | > [hidden email]
| > | > http://lists.automattic.com/mailman/listinfo/wp-hackers
| > |
| > | _______________________________________________
| > | wp-hackers mailing list
| > | [hidden email]
| > | http://lists.automattic.com/mailman/listinfo/wp-hackers
| >
| > _______________________________________________
| > wp-hackers mailing list
| > [hidden email]
| > http://lists.automattic.com/mailman/listinfo/wp-hackers
|
| _______________________________________________
| wp-hackers mailing list
| [hidden email]
| http://lists.automattic.com/mailman/listinfo/wp-hackers
|

_______________________________________________
wp-hackers mailing list
[hidden email]
http://lists.automattic.com/mailman/listinfo/wp-hackers
Loading...