Trimming the CPAN - "Automatic Purging"

Currently on PAUSE you have to explicitly delete old uploads.

How about changing it so you have to explicitly KEEP old uploads
that appear to have been superseded?

PAUSE already has a mechanism to delete files at some future point in
time. That's currently only used as part of a safety/sanity check to
delay deletions that were manually invoked.

I envisage PAUSE having a set of rules it would apply monthly, say,
to automatically select files for "purging".

The rules might look something like this:

    File does not have deletion date set, and
    File is older than 3 months, and
    File has a later upload
        - in the same directory
        - with the same major version
        - with a higher minor version
        - which is also more than 3 months old

(Naturally these are just suggestions. Let's not bikeshed the fine
details yet. It's the approach we need to discuss first.)
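
As a rough illustration only, the rules above could be written as a filter over upload records. The sketch is in Python rather than anything taken from PAUSE. The `Upload` record, its field names, the explicit `dist` check, and the 90-day approximation of "3 months" are all assumptions made for the example, and versions are simplified to integer major/minor pairs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical record for one uploaded file. PAUSE's real schema is not
# described in this thread, so every field name here is an assumption.
@dataclass
class Upload:
    directory: str                         # author directory
    dist: str                              # distribution name, no version
    major: int
    minor: int
    uploaded: datetime
    delete_on: Optional[datetime] = None   # scheduled deletion date, if any


MIN_AGE = timedelta(days=90)               # "3 months", approximated


def select_for_purging(uploads, now):
    """Return the uploads that the suggested rules would select."""
    selected = []
    for old in uploads:
        if old.delete_on is not None:       # deletion date already set
            continue
        if now - old.uploaded <= MIN_AGE:   # not older than 3 months
            continue
        superseded = any(
            new.directory == old.directory      # same directory
            and new.dist == old.dist            # same distribution (assumed)
            and new.major == old.major          # same major version
            and new.minor > old.minor           # higher minor version
            and new.uploaded > old.uploaded     # a later upload
            and now - new.uploaded > MIN_AGE    # itself over 3 months old
            for new in uploads
        )
        if superseded:
            selected.append(old)
    return selected
```

Note that a release is only selected once its replacement has itself been on CPAN for three months, so a fresh upload never triggers the purge of its predecessor straight away.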

Files selected in this way would be scheduled to be deleted in a month
and an email would be sent to the authors, just as if they'd selected
the files for deletion via PAUSE.
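
A minimal sketch of that scheduling step, again in Python with hypothetical names. The dictionary stands in for wherever PAUSE keeps its deletion dates, and the 30-day delay is an approximation of "a month"; neither is taken from the real system.

```python
from datetime import datetime, timedelta

PURGE_DELAY = timedelta(days=30)   # "deleted in a month", approximated


def schedule_purges(delete_on, selected, now):
    """Set a deletion date one month out for each selected file.

    delete_on maps filename -> scheduled deletion datetime (a stand-in
    for the database). Files that already have a date are left alone.
    Returns the newly scheduled filenames, i.e. the ones whose authors
    would be sent an email.
    """
    newly_scheduled = []
    for filename in selected:
        if filename in delete_on:
            continue
        delete_on[filename] = now + PURGE_DELAY
        newly_scheduled.append(filename)
    return newly_scheduled


def due_for_deletion(delete_on, now):
    """Files whose scheduled deletion date has arrived."""
    return sorted(name for name, when in delete_on.items() if when <= now)
```

Because files with an existing date are skipped, running the job every month does not keep pushing back a deletion that is already pending.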

All that's needed, in addition to the above script, is a way for authors
to indicate that a particular file shouldn't be purged. The database
could use a far-future date for that which the UI could present as
"do not purge" checkbox against the file.

Tim.
0
Tim
3/25/2010 11:12:32 AM
perl.module-authors

On Thu, Mar 25, 2010 at 11:12:32AM +0000, Tim Bunce wrote:
> Currently on PAUSE you have to explicitly delete old uploads.

Which often is a good thing. While BACKPAN exists, it isn't somewhere
that many go to look for old distributions. For me and probably others,
BACKPAN only distributions are ones that have been specifically marked
by the maintainers as obsolete, badly broken or similar.

Automatic deletes from CPAN would change that.

There are many distributions on CPAN whose older versions work on a
particular perl/OS but whose more recent ones don't. Latest isn't
necessarily the greatest.

If you are going to perform this then it should really feed off the CPAN
Testers to know if a specific release has been marked as being the
latest working release for a particular perl/os.

I would also suggest extending the timeframe considerably to perhaps 3
or maybe 5 years.

Lastly I would also personally be annoyed if only the latest versions
were available, as I often make great use of the diff tool on
search.cpan.org. Having only the latest version renders that great tool
redundant :(

> Files selected in this way would be scheduled to be deleted in a month
> and an email would be sent to the authors, just as if they'd selected
> the files for deletion via PAUSE.

There are already many authors who have non-responding email addresses
(I will get around to publicising that list at some point), so some
will likely disappear down a blackhole. What if you're about to delete a
set of distributions that should really be kept available? No one would
be listening to know that it should still be kept.

I would prefer a suggestion email to authors to delete, rather than an
email telling them that their distributions will be deleted unless they
do something.

Cheers,
Barbie.
-- 
Birmingham Perl Mongers <http://birmingham.pm.org>
Memoirs Of A Roadie <http://barbie.missbarbell.co.uk>
CPAN Testers Blog <http://blog.cpantesters.org>
YAPC Conference Surveys <http://yapc-surveys.org>


0
barbie
3/25/2010 1:42:58 PM
On Mar 25, 2010, at 8:42 AM, Barbie wrote:
>
> Lastly I would also personally be annoyed if only the latest versions
> were available, as I often make great use of the diff tool on
> search.cpan.org. Having only the latest version renders that great tool
> redundant :(

I use that too :-) and it is very annoying that some authors automatically
delete previous releases when they upload a new one.

Graham.

0
gbarr
3/25/2010 1:46:30 PM
I have one case where the v1 and v2 of a module are simply
incompatible, but v1 still works, and unless the users have a
compelling reason, they won't migrate.  Pulling the rug from under
them would be quite unsportsmanlike.

Deletion should be opt-in, and there should be a way to "pin" some
releases as unreapable.  And warning emails (yes, some email addresses
are blackholes) to the author well in advance: "your module X version
Y will be deleted as you requested in Z weeks because there are P
newer releases ..."

-- 
There is this special biologist word we use for 'stable'. It is
'dead'. -- Jack Cohen
0
jhi
3/25/2010 3:00:42 PM
On Mar 25, 2010, at 4:12, Tim Bunce wrote:

> Currently on PAUSE you have to explicitly delete old uploads.
>
> How about changing it so you have to explicitly KEEP old uploads
> that appear to have been superseded?

I like it.

I agree with Jarkko that there should be a way to "pin" some versions
and the configuration should be "more than N newer releases" or some
such.

I think it should be on by default though.  Older than 3 (or 6?) months
and at least 2 or 3 (or more?) newer releases or some such.
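
A rule of that shape might look roughly like the Python sketch below. The function name, the tuple representation of versions, and the default thresholds (90 days, 2 newer releases) are assumptions chosen for illustration; the thread leaves the actual numbers open.

```python
from datetime import datetime, timedelta


def reapable(releases, now, min_age_days=90, min_newer=2, pinned=()):
    """Pick the releases of one distribution that could be reaped.

    releases: list of (version_tuple, upload_datetime) with unique versions.
    A release qualifies when it is older than min_age_days, at least
    min_newer newer releases exist, and it has not been pinned.
    """
    # Newest first, so the index equals the number of newer releases.
    ordered = sorted(releases, key=lambda r: r[0], reverse=True)
    cutoff = now - timedelta(days=min_age_days)
    result = []
    for newer_count, (version, uploaded) in enumerate(ordered):
        if version in pinned:
            continue
        if newer_count >= min_newer and uploaded < cutoff:
            result.append(version)
    return result
```

Pinning is just an exclusion list here, so an incompatible but still-used v1 can be kept regardless of how many v2 releases follow it.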

For most authors this won't change anything -- but it'll help those who
unhelpfully _never_ delete anything.

On Search CPAN maybe BackPAN could be used to pull in older versions for
diffs etc...


  - ask
0
ask
3/25/2010 3:10:47 PM
What Jarkko said.

On Mar 25, 2010, at 08:00, Jarkko Hietaniemi wrote:

> I have one case where the v1 and v2 of a module are simply
> incompatible, but v1 still works, and unless the users have a
> compelling reason, they won't migrate.  Pulling the rug from under
> them would be quite unsportsmanlike.
> 
> Deletion should be opt-in, and there should be a way to "pin" some
> releases as unreapable.  And warning emails (yes, some email addresses
> are blackholes) to the author well in advance: "your module X version
> Y will be deleted as you requested in Z weeks because there are P
> newer releases ..."
> 
> -- 
> There is this special biologist word we use for 'stable'. It is
> 'dead'. -- Jack Cohen


-- 
Chris Nandor             pudge@pobox.com             http://pudge.net/
Slashdot / Geeknet       pudge@slashdot.org       http://slashdot.org/

0
pudge
3/25/2010 3:14:19 PM
On Mar 25, 2010, at 08:10, Ask Bjørn Hansen wrote:

> I agree with Jarkko that there should be a way to "pin" some versions
> and the configuration should be "more than N newer releases" or some
> such.
>
> I think it should be on by default though.  Older than 3 (or 6?)
> months and at least 2 or 3 (or more?) newer releases or some such.

I like that solution better, BUT, there's a significant chance that some
things will fall through the cracks (for authors who don't get the
notices, for example), and because we put out release software on the
CPAN that people rely on, I have to agree with Jarkko and vote to err on
the side of safety first.

I'd rather spend more energy getting people to opt in, than opt them in
by default.

-- 
Chris Nandor             pudge@pobox.com             http://pudge.net/
Slashdot / Geeknet       pudge@slashdot.org       http://slashdot.org/

0
pudge
3/25/2010 3:36:29 PM
On 25 Mar 2010, at 15:36, Chris Nandor wrote:
> I like that solution better


[snip]

But solution to what? Are we convinced there's actually a problem here?

-- 
Andy Armstrong, Hexten



0
andy
3/25/2010 3:38:46 PM
On Mar 25, 2010, at 8:38, Andy Armstrong wrote:

>> I like that solution better
>
>
> [snip]
>
> But solution to what? Are we convinced there's actually a problem here?

CPAN has almost 200k files.  www.cpan.org says there are "17627
modules".  rsyncing a gazillion files doesn't work that well (on the
server).  Helping authors remember to delete things that are now
irrelevant from the main CPAN system will make it easier to run mirrors
and keep them fresh.


 - ask
0
ask
3/25/2010 3:55:27 PM
On Thu, Mar 25, 2010 at 4:55 PM, Ask Bjørn Hansen <ask@perl.org> wrote:
>
> On Mar 25, 2010, at 8:38, Andy Armstrong wrote:
>
>>> I like that solution better
>>
>> [snip]
>>
>> But solution to what? Are we convinced there's actually a problem here?
>
> CPAN has almost 200k files.  www.cpan.org says there are "17627
> modules".  rsyncing a gazillion files doesn't work that well (on the
> server).  Helping authors remember to delete things that are now
> irrelevant from the main CPAN system will make it easier to run
> mirrors and keep them fresh.

I appreciate that the number of files on CPAN has implications for the
infrastructure, but I feel a need to have some more factual info
before conceding to such measures.

Also, having _software_ determine what is 'irrelevant' is a dangerous
path indeed.

One of the strengths of CPAN is the low barrier of entry. If we lower
the barrier of exit, I'm not at all convinced we end up in a
significantly better place.

/Lars
0
lars
3/26/2010 9:55:13 AM
On Mar 26, 2010, at 4:55 AM, Lars Thegler wrote:

> I appreciate that the number of files on CPAN has implications for the
> infrastructure, but I feel a need to have some more factual info
> before conceding to such measures.

Absolutely.  This factual info would ideally look like this:

"Of the 17,000 distros on CPAN, there are 8,000 that have versions more
than a year older than the most recent one.  If those distros with
versions more than a year out of date were purged, the number of files
would decrease from 200,000 to 120,000.  This would save 7GB out of the
12GB that a full CPAN mirror takes now.  Removing that 7GB would mean
Benefit X to mirror owners."
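
Figures of that kind could be gathered from a listing of the archive. The Python sketch below assumes a hypothetical input of `(dist_name, upload_datetime, size_bytes)` tuples and a one-year lag threshold; how such a listing would be produced from CPAN is not covered here, and the sketch says nothing about what the real numbers are.

```python
from datetime import datetime, timedelta


def purge_savings(files, max_lag=timedelta(days=365)):
    """Count files uploaded more than max_lag before the newest upload
    of the same distribution.

    files: iterable of (dist_name, upload_datetime, size_bytes).
    """
    files = list(files)

    # First pass: newest upload date per distribution.
    newest = {}
    for dist, uploaded, _size in files:
        if dist not in newest or uploaded > newest[dist]:
            newest[dist] = uploaded

    # Second pass: tally what a purge would remove.
    stats = {"files_total": len(files), "bytes_total": 0,
             "files_purged": 0, "bytes_purged": 0, "dists_affected": 0}
    affected = set()
    for dist, uploaded, size in files:
        stats["bytes_total"] += size
        if newest[dist] - uploaded > max_lag:
            stats["files_purged"] += 1
            stats["bytes_purged"] += size
            affected.add(dist)
    stats["dists_affected"] = len(affected)
    return stats
```

The output covers the file and byte counts only; the "Benefit X" part still has to come from the mirror operators.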

Without that, how can module authors be bothered to care?


xoxo,
Andy


--
Andy Lester => andy@petdance.com => www.theworkinggeek.com => AIM:petdance




0
andy
3/26/2010 4:02:39 PM
> -----Original Message-----
> From: Ask Bjørn Hansen [mailto:ask@perl.org]
> Sent: Thursday, March 25, 2010 5:11 PM
> To: Tim Bunce
> Cc: cpan-workers@perl.org; module-authors@perl.org; Andreas J. Koenig
> Subject: Re: Trimming the CPAN - "Automatic Purging"
>


> On Search CPAN maybe BackPAN could be used to pull in older versions for diffs
> etc...

There is also gitPAN for that stuff: http://github.com/gitpan


0
burakgursoy
3/26/2010 4:41:23 PM
On Fri, 26 Mar 2010, Andy Lester wrote:

> Absolutely.  This factual info would ideally look like this:
>
> "Of the 17,000 distros on CPAN, there are 8,000 that have versions more than a year older than the most recent one.  If those distros with versions more than a year out of date were purged, the number of files would decrease from 200,000 to 120,000.  This would save 7GB out of the 12GB that a full CPAN mirror takes now.  Removing that 7GB would mean Benefit X to mirror owners."
>
> Without that, how can module authors be bothered to care?

If you don't mind me interjecting, I still can't be bothered to care.  We
have basically a 12GB data set, and we're worried about that?  I see that as a
small barrier to bringing on new mirrors on constrained pipes, but
ultimately that's not that big a deal.  Hell, there's single versions of
some Linux distros that are bigger than that.

End sum:  I personally don't think this is the most pressing issue facing
CPAN.  Just issue a best practices guide to all the module authors (or
include it as on-line documentation in PAUSE) and be done with it.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/26/2010 5:20:11 PM
On Friday-201003-26 13:20, Arthur Corliss wrote:
> On Fri, 26 Mar 2010, Andy Lester wrote:
>
>> Absolutely.  This factual info would ideally look like this:
>>
>> "Of the 17,000 distros on CPAN, there are 8,000 that have versions more than a year older than the most recent one.  If those distros with versions more than a year out of date were purged, the number of files would decrease from 200,000 to 120,000.  This would save 7GB out of the 12GB that a full CPAN mirror takes now.  Removing that 7GB would mean Benefit X to mirror owners."
>>
>> Without that, how can module authors be bothered to care?
>
> If you don't mind me interjecting, I still can't be bothered to care.  We
> have basically a 12GB data set, and we're worried about that?  I see that as a
> small barrier to bringing on new mirrors on constrained pipes, but
> ultimately that's not that big a deal.  Hell, there's single versions of
> some Linux distros that are bigger than that.

The total size is not the problem.  The number of files is.  Vanilla
rsync is horribly inefficient (not the protocol, which is genius, mind)
because a client coming by and asking for updates basically ends up
requiring the moral equivalent of
"find . -type f -print".  Let me repeat that: each client.  Not fun.
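
To make the scaling point concrete: the walk below is not rsync's implementation, only a Python sketch of the kind of traversal that any full file-list comparison implies. The function name and the tuple layout are made up for the example.

```python
import os


def build_file_list(root):
    """Walk the tree and stat every regular file, as a full listing must.

    Returns (relative_path, size, mtime) tuples. The number of stat()
    calls equals the number of files, whatever their total size.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries.append(
                (os.path.relpath(path, root), st.st_size, int(st.st_mtime))
            )
    return entries
```

For roughly 200k files that is roughly 200k stat calls per listing, which is why trimming the file count helps even when the byte count barely moves.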

0
jhi
3/26/2010 10:43:04 PM
On Fri, 26 Mar 2010, Jarkko Hietaniemi wrote:

> The total size is not the problem.  The number of files is.  Vanilla
> rsync is horribly inefficient (not the protocol, which is genius, mind)
> because a client coming by and asking for updates basically ends up
> requiring the moral equivalent of
> "find . -type f -print".  Let me repeat that: each client.  Not fun.

Why use rsync, then?  Why not have checkpointed logs on cpan with
additions/removals logged by date so you can roll forward on the client,
processing only those files?  It would be trivial to set up and a lot more
efficient.
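
A sketch of what rolling such a log forward on the client might look like, in Python. The log format (`<epoch> <add|remove> <path>`, one entry per line, in chronological order) is invented for this example; nothing in the thread says PAUSE writes a log like this, and the actual fetching and deleting are left out.

```python
def parse_log(lines):
    """Parse lines of the assumed form '<epoch> <add|remove> <path>'."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        stamp, action, path = line.split(" ", 2)
        yield int(stamp), action, path


def roll_forward(lines, checkpoint):
    """Work out what changed since the client's last checkpoint.

    Returns (to_fetch, to_delete, new_checkpoint). Assumes the log is
    in chronological order, so a later entry for a path wins.
    """
    state = {}                      # path -> last action seen
    newest = checkpoint
    for stamp, action, path in parse_log(lines):
        if stamp <= checkpoint:
            continue                # already applied on an earlier run
        if action not in ("add", "remove"):
            raise ValueError(f"unknown action: {action!r}")
        state[path] = action
        newest = max(newest, stamp)
    to_fetch = sorted(p for p, a in state.items() if a == "add")
    to_delete = sorted(p for p, a in state.items() if a == "remove")
    return to_fetch, to_delete, newest
```

The client only touches paths named in the log, so the work scales with the number of changes rather than the number of files. A path that was added and removed between two runs ends up in `to_delete`, so deleting a file that is not present locally has to be treated as a no-op.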

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/26/2010 11:02:22 PM
On Friday-201003-26 19:02, Arthur Corliss wrote:
> On Fri, 26 Mar 2010, Jarkko Hietaniemi wrote:
>
>> The total size is not the problem.  The number of files is.  Vanilla
>> rsync is horribly inefficient (not the protocol, which is genius, mind)
>> because a client coming by and asking for updates basically ends up
>> requiring the moral equivalent of
>> "find . -type f -print".  Let me repeat that: each client.  Not fun.
>
> Why use rsync, then?  Why not have checkpointed logs on cpan with
> additions/removals logged by date so you can roll forward on the client,
> processing only those files?  It would be trivial to set up and a lot more
> efficient.

We await your implementation breathlessly.  By the time all the CPAN 
mirrors have started using that, we probably will be rather blue in
the face.

>   	--Arthur Corliss
>   	  Live Free or Die
>

0
jhi
3/26/2010 11:06:54 PM
On Fri, 26 Mar 2010, Jarkko Hietaniemi wrote:

> We await your implementation breathlessly.  By the time all the CPAN mirrors 
> have started using that, we probably will be rather blue in
> the face.

Now, let's not be that way.  :-)  You need to pick your problem domain.  You
guys can try to go through a lot of machinations to establish storage
policies which account for the million corner cases necessary to support all
the various versions of libraries & perl, and are relatively painless to
implement without raising the ire of all the contributors.... or just
improve the efficiency of synchronizing the mirrors.

<G> I know what sounds a hell of a lot easier and faster to me...  *Really*
fast for anyone familiar with the PAUSE code base.

Rsync by itself is definitely a bad idea for the number of files, I agree
whole-heartedly.  But it's the weakest and simplest link to replace.

Would I be happy to help?  Sure.  But do I feel like diving into a
foreign code base all by myself?  No.  I don't have that many spare cycles.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/26/2010 11:32:59 PM
On Mar 26, 2010, at 16:02, Arthur Corliss wrote:

> Why use rsync, then?  Why not have checkpointed logs on cpan with
> additions/removals logged by date so you can roll forward on the client,
> processing only those files?  It would be trivial to set up and a lot more
> efficient.


I find it curious that everyone who's actually involved in syncing the
files or running mirror servers seems to think it generally sounds like a
good idea and everyone who isn't says it's "not worth the effort".

Anyway -- we have some other ideas for cutting down the number of files
that we already agreed on but that just need an announcement (which I
promised to write up, oops).  No, I'm not going to make Tim's mistake and
suggest it here first.

Tim: Next time just get the paint in your preferred color.  :-)


 - ask

0
ask
3/26/2010 11:44:26 PM
On Fri, 26 Mar 2010, Ask Bjørn Hansen wrote:

> I find it curious that everyone who's actually involved in syncing the
> files or running mirror servers seems to think it generally sounds like a
> good idea and everyone who isn't says it's "not worth the effort".

Sure, I don't run a CPAN mirror, but I do manage many, many terabytes of
storage as part of my day job.  I think it's a tad presumptuous to disregard
input just because we're not in your inner sanctum.  As I mentioned in a
follow up e-mail:  this is simply a matter of selecting the correct problem
domain.  I believe that streamlining the mirroring process will provide
greater gains for less effort.

That's not to say that pursuing other efficiencies isn't worthwhile, just
that you need to prioritize.

But what the hell do I know.  I don't run a *CPAN* mirror, so I must be
freaking clueless...

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/27/2010 12:23:08 AM
On Fri, 26 Mar 2010, Arthur Corliss wrote:
> But what the hell do I know.  I don't run a *CPAN* mirror, so I must be
> freaking clueless...

It's not about what you know, but about what you are willing to
do yourself.

At some point you have to accept that the people who *do* the work
decide *how* they do it.

There is not much point in just talking to volunteers that they should
not be doing something but instead be doing something else if you are
not willing to take the burden of doing this other thing yourself.

Volunteers are not free labor that the talking masses can direct with
majority votes. :)

Cheers,
-Jan


0
jand
3/27/2010 12:54:24 AM
On Mar 26, 2010, at 8:23 PM, Arthur Corliss wrote:
> 
> Sure, I don't run a CPAN mirror, but I do manage many, many terabytes of
> storage as part of my day job.  I think it's a tad presumptuous to disregard
> input just because we're not in your inner sanctum.  As I mentioned in a
> follow up e-mail:  this is simply a matter of selecting the correct problem
> domain.  I believe that streamlining the mirroring process will provide
> greater gains for less effort.
> 
> That's not to say that pursuing other efficiencies isn't worthwhile, just
> that you need to prioritize.
> 
> But what the hell do I know.  I don't run a *CPAN* mirror, so I must be
> freaking clueless...

Oh, don't be such a drama queen. I rebuilt and helped run nic.funet.fi for 2 years which is the canonical mirror for a large number of mirrors and the perspective of having a few terabytes spinning in storage changes quite dramatically when you are actually serving a few terabytes to thousands of clients. CPAN grew to be quite a burden on the site not only because of the high demand, but also because of the multitude of small files and I'm sure other mirrors feel similarly burdened. 

The sort of pruning Tim brought up has long been an idea, but with the current and growing size of the archive, something does need to be done to alleviate the burden not only on the canonical mirrors, but also on the random folks who want to grab a local mirror for themselves. In my present work environment, 12gb isn't a lot of disk space, but it's a lot considering I don't need to install perl modules daily and the vast majority of it I'll likely never use. It would be a kindness to both the mirror operators and to the end-users to trim it down to a manageable size. 

As for efficiency, rsync remains a good tool for the job that works on nearly every platform which is a rather tall order to match with any other solution. Relegating the cruft to BackPAN to make the current CPAN slimmer and less demanding on all fronts is an idea that would be welcomed by more than just mirror ops.

The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version of the required module (I bumped into this recently but can't remember exactly which module it was) but I think it's rare and the practice should be discouraged.

e.
0
eashton
3/27/2010 12:59:30 AM
On 26 Mar 2010, at 23:32, Arthur Corliss wrote:
> But it's the weakest and simplest link to replace.


Quite a bit of the discussion here on this topic has revolved around an
explanation of why that isn't the case. Setting up rsync is trivial for
mirror operators. Any alternative would likely be less so.

-- 
Andy Armstrong, Hexten



0
andy
3/27/2010 7:45:17 AM
On 27 Mar 2010, at 00:59, Elaine Ashton wrote:
> The only snag I can foresee in trimming back on the abundance of
> modules is the case where some modules have version requirements for
> other modules where it will barf with a mismatch/newer version of the
> required module (I bumped into this recently but can't remember exactly
> which module it was) but I think it's rare and the practice should be
> discouraged.


Maybe that could be solved by having the clients (and maybe
search.cpan.org) automagically fall back to a backpan mirror?

And, yes, if it's considered a good idea I /am/ prepared to do something
about it.

-- 
Andy Armstrong, Hexten



0
andy
3/27/2010 7:49:37 AM
On Fri, 26 Mar 2010, Elaine Ashton wrote:

> Oh, don't be such a drama queen. I rebuilt and helped run nic.funet.fi for 2 years which is the canonical mirror for a large number of mirrors and the perspective of having a few terabytes spinning in storage changes quite dramatically when you are actually serving a few terabytes to thousands of clients. CPAN grew to be quite a burden on the site not only because of the high demand, but also because of the multitude of small files and I'm sure other mirrors feel similarly burdened.

Don't be such an arrogant prick.  You guys made baseless assumptions about
people's experience with storage management in an attempt to disregard their
opinions.  That's being a dick by any metric.

> The sort of pruning Tim brought up has long been an idea, but with the current and growing size of the archive, something does need to be done to alleviate the burden not only on the canonical mirrors, but also on the random folks who want to grab a local mirror for themselves. In my present work environment, 12gb isn't a lot of disk space, but it's a lot considering I don't need to install perl modules daily and the vast majority of it I'll likely never use. It would be a kindness to both the mirror operators and to the end-users to trim it down to a manageable size.

I think I was quite explicit in saying that efficiencies should be pursued
in multiple areas, but the predominant bitch I took away from your thread
dealt with the burden of synchronizing mirrors.  What's the easiest way to
address that pain?  I don't believe it's your method.  I'd look into the
size issue *after* you address the incredible inefficiencies of a simple
rsync.

> As for efficiency, rsync remains a good tool for the job that works on nearly every platform which is a rather tall order to match with any other solution. Relegating the cruft to BackPAN to make the current CPAN slimmer and less demanding on all fronts is an idea that would be welcomed by more than just mirror ops.

Rsync is an excellent tool for smaller file sets.  I use it to sync my own
mirrors, those mirrors are typically ~10k files.  Am I surprised that it
doesn't scale when you're stat'ing every single file?  No.  Which is why
alternatives should be considered.  A simple FTP client playing a
transaction log forward is trivial.

I maintain several mirrors, most with rsync.  But that's with a clear
understanding of the size of the file set.  Use the right tool for the job.
And it seems apparent to me that rsync isn't the right tool for ~200k files.

> The only snag I can foresee in trimming back on the abundance of modules is the case where some modules have version requirements for other modules where it will barf with a mismatch/newer version of the required module (I bumped into this recently but can't remember exactly which module it was) but I think it's rare and the practice should be discouraged.

Try doing a simple cost-benefit analysis.  What you guys are proposing will
help.  But not as much as simpler alternatives.  Like replacing rsync with a
perl script and modifying PAUSE to log the transactions.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/27/2010 6:52:05 PM
> Oh, I understand that fully.  And I'd be happy to lend some of my time.  But
> you don't make people inclined to help when people are lobbing snarky
> comments like "we'll wait breathlessly for you to do it."

The time-honored tradition of many open source communities is to talk. 
And talk.  And talk.  The problem is that this solves nothing.  To do, does.

You are free to decide to take this as a personal insult.

0
jhi
3/27/2010 9:40:58 PM
# from Arthur Corliss
# on Saturday 27 March 2010 12:52:

>...should it appear that we have some kind of elitist cabal that will
>make their decision in isolation.

More likely there will not be some decision made because there will be 
no action taken.

>If that's going to be the case then this should have never been raised
>on an open forum like the module author's list.

I'll agree, but not for that reason.


# from Shlomi Fish on Tuesday 23 March 2010 02:14:
>>>>So I've been thinking that maybe we should trim the CPAN and remove
>>>>older versions like that so it will contain much less cruft. What do
>>>>you think?

   I think I am not going to take the trouble to delete anything and
   don't want anybody doing it on my behalf.  Thanks for asking.


Though the list has, as usual, moved on from that question to something 
which is off-topic for module authors.

>Quite frankly, at times some discussions on this list fail the concept
> of a technical meritocracy, and tend towards an established
> aristocracy. 

And you should win stuff for reading it!

--Eric
-- 
Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it.
--Alan Kay
---------------------------------------------------
    http://scratchcomputing.com
---------------------------------------------------
0
enobacon
3/27/2010 11:24:43 PM
On Mar 27, 2010, at 2:52 PM, Arthur Corliss wrote:
> 
> Don't be such an arrogant prick.  You guys made baseless assumptions about
> people's experience with storage management in an attempt to disregard their
> opinions.  That's being a dick by any metric.

Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :) 

> I think I was quite explicit in saying that efficiencies should be pursued
> in multiple areas, but the predominant bitch I took away from your thread
> dealt with the burden of synchronizing mirrors.  What's the easiest way to
> address that pain?  I don't believe it's your method.  I'd look into the
> size issue *after* you address the incredible inefficiencies of a simple
> rsync.

And you're disregarding a considerable problem that rsync is a well-established tool for mirroring that is easy to use and works on a very wide range of platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when they often have several or more, likely won't be met with much enthusiasm and would create two tiers of CPAN mirrors, those using rsync and those not, which would not only complicate something which should remain simple but, again, doesn't address the size of the archive and the multitude of small files that are always a consideration no matter what you're serving them up with.

> Rsync is an excellent tool for smaller file sets.  I use it to sync my own
> mirrors, those mirrors are typically ~10k files.  Am I surprised that it
> doesn't scale when you're stat'ing every single file?  No.  Which is why
> alternatives should be considered.  A simple FTP client playing a
> transaction log forward is trivial.

FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't remember the last time I even used ftp come to think of it. I had to go through 2 layers of network red tape just to get rsync for a particular system I wanted to mirror CPAN to at work. Asking for FTP would have been met with a big no or a cackle, depending on which of the nyetwork masters got the request first.

> Try doing a simple cost-benefit analysis.  What you guys are proposing will
> help.  But not as much as simpler alternatives.  Like replacing rsync with a
> perl script and modifying PAUSE to log the transactions.

How is replacing rsync, a standard and widely used tool, simpler for mirror ops? I suppose I don't understand the opposition to trimming off the obvious cruft on CPAN to lighten the load when BackPAN exists to archive them. There is already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB) so it's not as though lightening the load is a new idea or an unwelcome one.

e.
0
eashton
3/28/2010 1:38:16 AM
I agree with Elaine
I can't get rsync through the firewall at work. Not even tunneled.
For CPAN I use CPAN::Mini. It uses http and it does the job though it does force the local CPAN to blead. My local solution to other things we need such as Blastwave (we run Solaris) I have a special squid proxy with restrictive acls lots of disk space and long retention. That means we only download any package once: what needed when needed. That does create a problem of using bandwidth during the day but it works out ok in the end.

Rsync is better than having to hack reverse caching proxies for each site of interest.

Sent from my BlackBerry® smartphone with Nextel Direct Connect

-----Original Message-----
From: Elaine Ashton <eashton@mac.com>
Date: Sat, 27 Mar 2010 21:38:16
To: Arthur Corliss<corliss@digitalmages.com>
Cc: Elaine Ashton<eashton@mac.com>; <cpan-workers@perl.org>; <module-authors@perl.org>
Subject: Re: Trimming the CPAN - "Automatic Purging"


On Mar 27, 2010, at 2:52 PM, Arthur Corliss wrote:
>
> Don't be such an arrogant prick.  You guys made baseless assumptions about
> people's experience with storage management in an attempt to disregard their
> opinions.  That's being a dick by any metric.

Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :)

> I think I was quite explicit in saying that efficiencies should be pursued
> in multiple areas, but the predominant bitch I took away from your thread
> dealt with the burden of synchronizing mirrors.  What's the easiest way to
> address that pain?  I don't believe it's your method.  I'd look into the
> size issue *after* you address the incredible inefficiencies of a simple
> rsync.

And you're disregarding a considerable problem that rsync is a well-established tool for mirroring that is easy to use and works on a very wide range of platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when they often have several or more, likely won't be met with much enthusiasm and would create two tiers of CPAN mirrors, those using rsync and those not, which would not only complicate something which should remain simple but, again, doesn't address the size of the archive and the multitude of small files that are always a consideration no matter what you're serving them up with.

> Rsync is an excellent tool for smaller file sets.  I use it to sync my own
> mirrors, those mirrors are typically ~10k files.  Am I surprised that it
> doesn't scale when you're stat'ing every single file?  No.  Which is why
> alternatives should be considered.  A simple FTP client playing a
> transaction log forward is trivial.

FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't remember the last time I even used ftp come to think of it. I had to go through 2 layers of network red tape just to get rsync for a particular system I wanted to mirror CPAN to at work. Asking for FTP would have been met with a big no or a cackle, depending on which of the nyetwork masters got the request first.

> Try doing a simple cost-benefit analysis.  What you guys are proposing will
> help.  But not as much as simpler alternatives.  Like replacing rsync with a
> perl sc
cmlwdCBhbmQgbW9kaWZ5aW5nIFBBVVNFIHRvIGxvZyB0aGUgdHJhbnNhY3Rpb25zLg0KDQpIb3cg
aXMgcmVwbGFjaW5nIHJzeW5jLCBhIHN0YW5kYXJkIGFuZCB3aWRlbHkgdXNlZCB0b29sLCBzaW1w
bGVyIGZvciBtaXJyb3Igb3BzPyBJIHN1cHBvc2UgSSBkb24ndCB1bmRlcnN0YW5kIHRoZSBvcHBv
c2l0aW9uIHRvIHRyaW1taW5nIG9mZiB0aGUgb2J2aW91cyBjcnVmdCBvbiBDUEFOIHRvIGxpZ2h0
ZW4gdGhlIGxvYWQgd2hlbiBCYWNrUEFOIGV4aXN0cyB0byBhcmNoaXZlIHRoZW0uIFRoZXJlIGlz
IGFscmVhZHkgQ1BBTjo6TWluaSAod2hpY2ggd2FzIGNyZWF0ZWQgYmFjayB3aGVuIENQQU4gd2Fz
IGFuIGV2ZXItc28tdGlueSAxLjJHQikgc28gaXQncyBub3QgYXMgdGhvdWdoIGxpZ2h0ZW5pbmcg
dGhlIGxvYWQgaXMgYSBuZXcgaWRlYSBvciBhbiB1bndlbGNvbWUgb25lLg0KDQplLg0K

0
dhudes
3/28/2010 2:16:29 AM
>>>>> On Sat, 27 Mar 2010 16:44:49 -0800 (AKDT), Arthur Corliss <corliss@digitalmages.com> said:

  > On Sat, 27 Mar 2010, Jarkko Hietaniemi wrote:
 >> The time-honored tradition of many open source communities is to
 >> talk. And talk.  And talk.  The problem is that this solves nothing.
 >> To do, does.
 >> 
 >> You are free to decide to take this as a personal insult.

  > I didn't take it as an insult, I took it as what it was -- a dodge.  You
  > already have your minds made up and are not willing to evaluate options
  > on their merits.

Says the author of a module named Paranoid. A lovely coincidence.

  > Let's just be honest about what's going on here.

If you want to study the CPAN "checkpointed logs" solution running on
the very CPAN for exactly one year now: File::Rsync::Mirror::Recent

What needs to be done is really extremely trivial: rewrite it in C and
convince the rsync people to include it in the rsync code base. Just that.

So are you a taker, Arthur?

-- 
andreas
0
andreas
3/28/2010 4:02:14 AM
On Sat, 27 Mar 2010, Elaine Ashton wrote:

> Actually, I thought I was merely offering my opinion both as the sysadmin for the canonical CPAN mothership and as an end-user. If that makes me a prick, well, I suppose I should go out and buy one :)

:-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
from multiple directions.  My point still stands, however.  Your experience,
however worthy, has zero bearing on whether or not my experience is
just as worthy.  Even moreso when you guys have zero clue who you're talking
to.  And you shouldn't have to know.  I would have thought simple communal 
and professional courtesy would be extended and all points considered in 
earnest.  Which does not appear to be the case.

> And you're disregarding a considerable problem that rsync is a well-established tool for mirroring that is easy to use and works on a very wide range of platforms. Asking mirror ops to adopt a new tool for mirroring one mirror, when they often have several or more, likely won't be met with much enthusiasm and would create two tiers of CPAN mirrors, those using rsync and those not, which would not only complicate something which should remain simple but, again, doesn't address the size of the archive and the multitude of small files that are always a consideration no matter what you're serving them up with.

Ah, you're one of them.  All objects look like nails when all you have is a
hammer, eh?  Rsync is a good tool, but like Perl, it isn't the perfect tool
for all tasks.  You've obviously exceeded what the tool was designed for,
it's only logical to look for (or write) another tool.  Ironically, what I'm 
suggesting is so basic that rsync can be replaced by a script which will 
likely run on every mirror out there with no more fuss than rsync.

> FTP? It's 2010 and very few corp firewalls allow ftp in or out. I can't remember the last time I even used ftp come to think of it. I had to go through 2 layers of network red tape just to get rsync for a particular system I wanted to mirror CPAN to at work. Asking for FTP would have been met with a big no or a cackle, depending on which of the nyetwork masters got the request first.

Sounds like you may be hamstrung by your own bureaucracy, but that's rarely
the case in most of the places I've worked.  Not to mention that between
passive mode FTP or even using an HTTP proxy (most of which support FTP
requests) what I'm proposing is relatively painless, simple, and easy to
secure.  This concern I suspect is a non-issue for most mirror operators.
Even if it was, allow them to pull it via HTTP for all I care.  Either one
is significantly more efficient than rsync.

> How is replacing rsync, a standard and widely used tool, simpler for mirror ops? I suppose I don't understand the opposition to trimming off the obvious cruft on CPAN to lighten the load when BackPAN exists to archive them. There is already CPAN::Mini (which was created back when CPAN was an ever-so-tiny 1.2GB) so it's not as though lightening the load is a new idea or an unwelcome one.

I'm not opposed to trimming the cruft, but I am opposed to ignorant
knee-jerk reactions bereft of any empirical data (or at least you haven't
shared).  The cruft, while being cruft, isn't inherently evil.  You have a
basic I/O and state problem.  And the I/O generated is predominantly caused 
by rsync trying to (re)assemble state on the file set, *per* request.  More
appallingly, most of that state image being generated is state that hasn't
changed in quite awhile.  Literally years in many cases.  So why are we
wasting cycles & I/O performing massively redundant work?

That's why having PAUSE implement a transaction log, and perhaps a cron job
on the master server doing daily checkpointed file manifests is so much more
efficient.  An in-sync mirror only needs to download the latest transaction
logs and play them forward (delete certain files, download others, etc).
And, gee, just about every author on the list could write *that* sync agent
in an evening.  Out-of-sync mirrors can start by working off the checkpoint
manifest, get what's missing, and roll forward.
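
For readers following along, the log-replay idea in the paragraph above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything PAUSE provides: the `LogEntry` record, its `seq`/`action`/`path` fields and both function names are hypothetical, and the mirror is modelled as an in-memory set of paths so that no network or disk I/O is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One line of a hypothetical transaction log published by the master."""
    seq: int      # monotonically increasing transaction number (assumed)
    action: str   # "add" or "delete"
    path: str     # path relative to the mirror root


def bootstrap_from_manifest(local_files, manifest_files):
    """Compare a local file listing against a checkpoint manifest.

    Returns (to_download, to_delete) as sorted lists, which is what an
    out-of-sync mirror would need before it can start replaying logs.
    """
    local, wanted = set(local_files), set(manifest_files)
    return sorted(wanted - local), sorted(local - wanted)


def replay_log(local_files, entries, last_seq):
    """Apply every log entry newer than last_seq to a copy of local_files.

    Returns (new_file_set, new_last_seq). A real agent would fetch or
    unlink each path here instead of editing a set.
    """
    files = set(local_files)
    for entry in sorted(entries, key=lambda e: e.seq):
        if entry.seq <= last_seq:
            continue  # already applied on an earlier run
        if entry.action == "add":
            files.add(entry.path)
        elif entry.action == "delete":
            files.discard(entry.path)
        else:
            raise ValueError(f"unknown action: {entry.action!r}")
        last_seq = entry.seq
    return files, last_seq
```

The point of the sketch is that the work per sync is proportional to the number of new log entries, not to the number of files in the archive; whether that holds up against real mirror traffic is exactly what the thread is arguing about.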

What you're overlooking is that CPAN has, and will, continue to grow.  Even 
if you remove the cruft now at some point it might grow to the same size 
just with fresh files.  When that happens, you're right back where you are 
now.  Rsync can't cut it, it wasn't designed for this.

Whether you like it or not, even on a pared down CPAN rsync is easily your
most inefficient process on the server.  If you're not willing to optimize
that, then you really don't care about optimization at all.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/28/2010 4:52:22 AM
You are misunderstanding the problem of changing the mirroring mechanism.

Making new software is nice and good -- Andreas already has something that's better for the PAUSE data.

Getting 1000s of mirrors to use your software (rather than rsync which they use for ALL OTHER mirrors) -- not so easy.
0
ask
3/28/2010 9:45:36 AM
On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:
> 
> :-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
> from multiple directions.  My point still stands, however.  Your experience,
> however worthy, has zero bearing on whether or not my experience is
> just as worthy.  Even moreso when you guys have zero clue who you're talking
> to.  And you shouldn't have to know.  I would have thought simple communal and professional courtesy would be extended and all points considered in earnest.  Which does not appear to be the case.

I'm not sending any barbs, only my reasonable opinion borne from years on the reality-based operations side of this equation. As for who you are, it doesn't matter as I work daily with those who wrote, and continue to write, large chunks of operating systems, X, etc., and though their legend may precede them when it comes to my having to implement what works fabulously in their imagination, I do my best to bring them back to the grim reality that is operations. It's a frequent problem of engineers and those of us stuck having to live with and fix their grand ideas. Lofty goals usually die somewhere between dreams and production.

> Ah, you're one of them.  All objects look like nails when all you have is a
> hammer, eh?  Rsync is a good tool, but like Perl, it isn't the perfect tool
> for all tasks.  You've obviously exceeded what the tool was designed for,
> it's only logical to look for (or write) another tool.  Ironically, what I'm suggesting is so basic that rsync can be replaced by a script which will likely run on every mirror out there with no more fuss than rsync.

Well, you'll have to forgive those who mock your naïveté as, if it were so basic and trivial to replace rsync, it would have been done several times over by now, as its limitations are well known to all who use it on any large scale. However, it is a well-known, well-used, multi-platform and time-tested tool that will not be unseated very easily without good reason, and a reason that reads something along the lines of improving performance on an archive that should have been trimmed back a bit is not a compelling reason for adoption.

> What you're overlooking is that CPAN has, and will, continue to grow.  Even if you remove the cruft now at some point it might grow to the same size just with fresh files.  When that happens, you're right back where you are now.  Rsync can't cut it, it wasn't designed for this.

And this is a good point to make, yes, it will continue to grow and I know that the current manager(s) of nic.funet.fi have commented on the burden it presents to the system which is also home to a number of other mirrors. You cannot assume that the generosity and the resources of the mirror ops are limitless and finding out where that limit lies will come too late to make amends.

Pruning back the archive is a good compromise until and unless another solution can be done that will not bother the mirror ops terribly much in terms of real work.

e.
0
eashton
3/28/2010 2:13:45 PM
The entire point of rsync is to send only changes.
Therefore once your mirror initially syncs, the old versions of modules are
not the issue. Indeed, removing the old versions would present additional
burden on synchronization! The ongoing burden is the ever-growing CPAN.

The danger in a CPAN::Mini and in removing old versions is that one is
assuming that the latest and greatest is the one to use. This is false.
Take the case of someone running old software. I personally support
systems still running Informix Dynamic Server 7.31 as well as systems
running the latest IDS 11.5 build. We have Perl code that talks to IDS. If
DBD::Informix withdrew support for IDS 7.31 I would need both the last
version that supported it as well as the current.  I can get away with
upgrading Perl, maybe, but to upgrade the dbms is much more problematic
(license, for one thing; SQL changes another).



0
dhudes
3/28/2010 2:28:48 PM
On Sunday 28 Mar 2010 17:28:48 dhudes@hudes.org wrote:
> The entire point of rsync is to send only changes.
> Therefore once your mirror initially syncs the old versions of modules is
> not the issue. Indeed, removing the old versions would present additional
> burden on synchronization! The ongoing burden is the ever-growing CPAN.
> 
> The danger in a CPAN::Mini and in removing old versions is that one is
> assuming that the latest and greatest is the one to use. This is false.
> Take the case of someone running old software. I personally support
> systems still running Informix Dyanmic Server 7.31 as well as systems
> running the latest IDS 11.5 build. We have Perl code that talks to IDS. If
> DBD::Informix withdrew support for IDS 7.31 I would need both the last
> version that supported it as well as the current.  I can get away with
> upgrading Perl, maybe, but to upgrade the dbms is much more problematic
> (license, for one thing; SQL changes another).

You can always get the old versions from the Backpan, which keeps all 
historical versions - so it's a non-issue.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Best Introductory Programming Language - http://shlom.in/intro-lang

Deletionists delete Wikipedia articles that they consider lame.
Chuck Norris deletes deletionists whom he considers lame.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
0
shlomif
3/28/2010 3:31:51 PM
> -----Original Message-----
> From: Shlomi Fish [mailto:shlomif@iglu.org.il]
> Sent: Sunday, March 28, 2010 6:32 PM
> To: module-authors@perl.org
> Cc: dhudes@hudes.org
> Subject: Re: Trimming the CPAN - "Automatic Purging"
> 
> On Sunday 28 Mar 2010 17:28:48 dhudes@hudes.org wrote:

> > The danger in a CPAN::Mini and in removing old versions is that one is
> > assuming that the latest and greatest is the one to use. This is false.

Ok, what about this project then?

http://cp5.5.3an.barnyard.co.uk/

0
burakgursoy
3/28/2010 4:14:09 PM
On Sat, Mar 27, 2010 at 08:52:22PM -0800, Arthur Corliss wrote:
> On Sat, 27 Mar 2010, Elaine Ashton wrote:
> 
> >Actually, I thought I was merely offering my opinion both as the sysadmin 
> >for the canonical CPAN mothership and as an end-user. If that makes me a 
> >prick, well, I suppose I should go out and buy one :)
> 
> :-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
> from multiple directions.  My point still stands, however.  Your experience,
> however worthy, has zero bearing on whether or not my experience is
> just as worthy.  Even moreso when you guys have zero clue who you're talking

Are you running a large public mirror site, where you don't even have
knowledge of who is mirroring from you?

(Not even knowledge, let alone channels of communication with, let alone
control over)

Because (as I see it, not having done any of this) the logistics of that is
going to have as much bearing on trying to change protocols as the actual
technical merits of the protocol itself.

Most of the cost of rsync is an externality to the clients. If one has an
existing mirror, one is using rsync to keep it up to date, what's the
incentive to change?

> Sounds like you may be hamstrung by your own bureacracy, but that's rarely
> the case in most the places I've worked.  Not to mention that between
> passive mode FTP or even using an HTTP proxy (most of which support FTP
> requests) what I'm proposing is relatively painless, simple, and easy to
> secure.  This concern I suspect is a non-issue for most mirror operators.
> Even if it was, allow them to pull it via HTTP for all I care.  Either one
> is significantly more efficient than rsync.

I'm missing something here, I suspect. How can HTTP be more efficient than
rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to
instruct a client (such as wget) to get it all. In which case, in the course
of doing this the client is going to recurse over the entire directory tree
of the server, which, I thought, was functionally equivalent to the behaviour
of the rsync server.

Nicholas Clark
0
nick
3/28/2010 4:20:34 PM
On 2010-03-28, at 9:13 AM, Elaine Ashton wrote:

> On Mar 28, 2010, at 12:52 AM, Arthur Corliss wrote:
> 
>> What you're overlooking is that CPAN has, and will, continue to grow.  Even if you remove the cruft now at some point it might grow to the same size just with fresh files.  When that happens, you're right back where you are now.  Rsync can't cut it, it wasn't designed for this.
> 
> And this is a good point to make, yes, it will continue to grow and I know that the current manager(s) of nic.funet.fi have commented on the burden it presents to the system which is also home to a number of other mirrors. You cannot assume that the generosity and the resources of the mirror ops are limitless and finding out where that limit lies will come too late to make amends.
> 
> Pruning back the archive is a good compromise until and unless another solution can be done that will not bother the mirror ops terribly much in terms of real work.
> 
> e.

Has some sort of disk quota system for CPAN author accounts ever been considered?

-- 
best regards,
Randy


0
randy
3/28/2010 4:48:00 PM
On Sun, 28 Mar 2010, Ask Bjørn Hansen wrote:

> You are misunderstanding the problem of changing the mirroring mechanism.

I am not misunderstanding, I'm just willing to accept the reality for what
it is.  Rsync does not scale.  Period.

> Making new software is nice and good -- Andreas already has something that's better for the PAUSE data.

<G>  That makes my point all the more compelling, then.  Some of the work
has already been done.

> Getting 1000s of mirrors to use your software (rather than rsync which they use for ALL OTHER mirrors -- not so easy.

Perhaps, but it's also possible that it might not be as bad as you think,
either.  You have a strong case to be made that the entire ecosystem
benefits from making this change (particularly in a tiered mirroring
environment), and I'd be surprised if the majority of the mirror operators
aren't sympathetic and cooperative.  As a sys-admin I watch my SAR reports
like a hawk, I'm sure they're no different.

And that's not to say you have to eliminate rsync.  If you can get half of
them to stop, you'll still have some significant long term gains.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/28/2010 4:50:49 PM
But you can't use CPAN.pm on the Backpan.

------Original Message------
From: Shlomi Fish
To: module-authors@perl.org
Cc: dhudes@hudes.org
Sent: Mar 28, 2010 11:31 AM
Subject: Re: Trimming the CPAN - "Automatic Purging"

On Sunday 28 Mar 2010 17:28:48 dhudes@hudes.org wrote:
> The entire point of rsync is to send only changes.
> Therefore once your mirror initially syncs the old versions of modules is
> not the issue. Indeed, removing the old versions would present additional
> burden on synchronization! The ongoing burden is the ever-growing CPAN.
> 
> The danger in a CPAN::Mini and in removing old versions is that one is
> assuming that the latest and greatest is the one to use. This is false.
> Take the case of someone running old software. I personally support
> systems still running Informix Dyanmic Server 7.31 as well as systems
> running the latest IDS 11.5 build. We have Perl code that talks to IDS. If
> DBD::Informix withdrew support for IDS 7.31 I would need both the last
> version that supported it as well as the current.  I can get away with
> upgrading Perl, maybe, but to upgrade the dbms is much more problematic
> (license, for one thing; SQL changes another).

You can always get the old versions from the Backpan, which keeps all 
historical versions - so it's a non-issue.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Best Introductory Programming Language - http://shlom.in/intro-lang

Deletionists delete Wikipedia articles that they consider lame.
Chuck Norris deletes deletionists whom he considers lame.

Please reply to list if it's a mailing list post - http://shlom.in/reply .


0
dhudes
3/28/2010 4:55:39 PM
Use of wget and http to download an entire site means numerous TCP opens and HTTP GET requests. The entire point of rsync is that it knows there are numerous downloads. It does ONE open. This allows TCP slow start to ramp up.

A multi-download session with ftp is also efficient. Clients like ncftp have batch transfer built in. If setting up an initial mirror you might do better with ftp but maintaining it is where rsync rules.

I haven't looked closely but I have the impression from watching wget work that wget using HTTP::Date opens two TCP connections per file: it opens a socket and issues a request for timestamp then closes it then opens a socket to issue an http GET if it wants the file. Then it closes that socket and the process repeats for next file. It keeps hoping for the timestamp even if the server doesn't support http::Date

Rsync and ftp are stateful; http is not. For absolute getting one file http is better since you skip the whole login thing and setting up data and control sockets.
So a CPAN client session will do better with an http mirror: it gets a tar.gz opens it up processes it and then goes back many seconds from original request for the first dependency. Repeat until entire dependency tree is completed

-----Original Message-----
From: Nicholas Clark <nick@ccl4.org>
Date: Sun, 28 Mar 2010 17:20:34
To: Arthur Corliss<corliss@digitalmages.com>
Cc: Elaine Ashton<eashton@mac.com>; <cpan-workers@perl.org>; <module-authors@perl.org>
Subject: Re: Trimming the CPAN - "Automatic Purging"

On Sat, Mar 27, 2010 at 08:52:22PM -0800, Arthur Corliss wrote:
> On Sat, 27 Mar 2010, Elaine Ashton wrote:
> 
> >Actually, I thought I was merely offering my opinion both as the sysadmin 
> >for the canonical CPAN mothership and as an end-user. If that makes me a 
> >prick, well, I suppose I should go out and buy one :)
> 
> :-) You'll have to pardon my indiscriminate epithets.  The barbs are coming
> from multiple directions.  My point still stands, however.  Your experience,
> however worthy, has zero bearing on whether or not my experience is
> just as worthy.  Even moreso when you guys have zero clue who you're talking

Are you running a large public mirror site, where you don't even have
knowledge of who is mirroring from you?

(Not even knowledge, let alone channels of communication with, let alone
control over)

Because (as I see it, not having done any of this) the logistics of that is
going to have as much bearing on trying to change protocols as the actual
technical merits of the protocol itself.

Most of the cost of rsync is an externality to the clients. If one has an
existing mirror, one is using rsync to keep it up to date, what's the
incentive to change?

> Sounds like you may be hamstrung by your own bureacracy, but that's rarely
> the case in most the places I've worked.  Not to mention that between
> passive mode FTP or even using an HTTP proxy (most of which support FTP
> requests) what I'm proposing is relatively painless, simple, and easy to
> secure.  This concern I suspect is a non-issue for most mirror operators.
> Even if it was, allow them to pull it via HTTP for all I care.  Either one
> is significantly more efficient than rsync.

I'm missing something here, I suspect. How can HTTP be more efficient than
rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to
instruct a client (such as wget) to get it all. In which case, in the course
of doing this the client is going to recurse over the entire directory tree
of the server, which, I thought, was functionally equivalent to the behaviour
of the rsync server.

Nicholas Clark

0
dhudes
3/28/2010 5:15:20 PM
On Sun, 28 Mar 2010, Elaine Ashton wrote:

> I'm not sending any barbs, only my reasonable opinion borne from years on the reality-based operations side of this equation. As for who you are, it doesn't matter as I work daily with those who wrote, and continue to write, large chunks of operating systems, X, etc., and though their legend may precede them when it comes to my having to implement what works fabulously in their imagination, I do my best to bring them back to the grim reality that is operations. It's a frequent problem of engineers and those of us stuck having to live with and fix their grand ideas. Lofty goals usually die somewhere between dreams and production.

Ah, let the chest thumping begin.  My point is that regardless of where the
idea comes from if it comes from a solid rationale it should be given
consideration.  And to date I have yet to see any one of you refute my
technical understanding of the problem, only my political understanding of
the problem.  I/O is the issue, and it is driven predominantly by rsync.

> Well, you'll have to forgive those who mock your naïveté as, if it were so basic and trivial to replace rsync, it would have been done several times over by now, as its limitations are well known to all who use it on any large scale. However, it is a well-known, well-used, multi-platform and time-tested tool that will not be unseated very easily without good reason, and a reason that reads something along the lines of improving performance on an archive that should have been trimmed back a bit is not a compelling reason for adoption.

Naivete?  Again:  show me where my assertions about the primary root of your
problem are incorrect?  Show me how pruning CPAN isn't a temporary band-aid
that fails to address a fundamental weakness in the syncing process?  You
haven't.  You can try to dress it up any way you like in an effort to discredit
me, but until you do based on the facts, you have nothing.

Rsync is a good tool, but for different use case scenarios.

> And this is a good point to make, yes, it will continue to grow and I know that the current manager(s) of nic.funet.fi have commented on the burden it presents to the system which is also home to a number of other mirrors. You cannot assume that the generosity and the resources of the mirror ops are limitless and finding out where that limit lies will come too late to make amends.

<G> And you make my point for me.  I'm sure he would love to find a more
efficient use of his I/O.  I assume nothing, I only allow that you'll find
more interest than you assume in managing I/O.  Nor does what I'm proposing
preclude the intractable from continuing to use rsync.  Given that rsync is
your driver of the I/O problem, taking away any significant percentage of the
problem will have the largest dividends.

> Pruning back the archive is a good compromise until and unless another solution can be done that will not bother the mirror ops terribly much in terms of real work.

At least you admit you're only treating the symptoms now, not the disease
itself.  Sure, it will buy you some time, but there'll also be some
political problems to work through which will likely burn as much if not
more manhours than just treating the disease.  And in the end time runs
out and the problem remains.

Look, I don't care if you guys decide against it, but let's be honest about
the compromises you're making.  Hell, pruning isn't even a compromise, it's
not a solution, it's only a delaying tactic.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/28/2010 5:20:51 PM
Why is rsync a problem? Where is the bottleneck in the protocol or the code implementing it?
Specifics!
SAR is antiquated doesn't give the info you really need. Using a linux system? Use procallator and feed resulting collected data to ORCA. Better yet, use DTrace or at least truss.  Compile rsync with profiling code -- use Sun Studio 12 it runs on Linux as well as Solaris and its a free download.

From a network protocol perspective rsync is quite good. If your network capacity is so large that it exceeds bandwidth or IOPs of your disks you probably can afford better disks or a more efficient disk storage layout.
Are mirrors like nic.funet.fi running multiple gigabit WAN connections?  If so they could sure demand stream more than a bunch of SATA2 disks can provide.

Without performance data its a waste of time to argue against rsync

Sent from my BlackBerry® smartphone with Nextel Direct Connect

0
dhudes
3/28/2010 5:46:37 PM
On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:

> 
> Has some sort of disk quota system for CPAN author accounts ever been considered?

Not specifically, no, at least not that I'm aware of. That would have to be implemented on PAUSE and quotas frequently end up not solving the real problem and create a headache both for the sysadmin and the users. 

Jarkko and I were talking about it this morning - as he's not in favour of pruning - while trying to think of a way around the size problem and he reminded me of the idea (if I recall correctly, Andreas' suggestion a while back) that there be an A, B and C 'PAN' of sorts where you could pull varying degrees of content - sort of CPAN::Mini writ large. I don't think that idea ever got any traction because it wouldn't really solve some of the issues for the major upstream mirrors and the mechanics of deciding where to draw the lines between them. I still think it's a good idea though.

I do very much like Tim's proposal for giving old modules a push to BackPAN since, with proper communication of the changes to the authors along with a way to mark exceptions, this would rid CPAN of a lot of cruft that should be on BackPan anyway.

e.
0
eashton
3/28/2010 6:32:40 PM
On 28 Mar 2010, at 19:32, Elaine Ashton wrote:
> Jarkko and I were talking about it this morning - as he's not in favour of
> pruning - while trying to think of a way around the size problem and he
> reminded me of the idea that, if I recall correctly was Andreas' suggestion a
> while back, there be an A, B and C 'PAN' of sorts where you could pull
> varying degrees of content - sort of CPAN::Mini writ large. I don't think
> that idea ever got any traction because it wouldn't really solve some of the
> issues for the major upstream mirrors and the mechanics of deciding where to
> draw the lines between them. I still think it's a good idea though.

We're nearly there if A == a CPAN::Mini style mirror, B == the current
mirror pruned and C == backpan.

So the actions to make that happen are:

* give the current clients specific support for this
* generate a master mini mirror that other mini mirrors can pull from.
* prune

If we agree that this is a good solution I'm happy to do some work on it -
I could host the mini master and I'd be happy to send Andreas a patch for
CPAN.pm to support this scheme.

-- 
Andy Armstrong, Hexten



0
andy
3/28/2010 6:39:56 PM
On Sun, Mar 28, 2010 at 12:55 PM, Dana Hudes <dhudes@hudes.org> wrote:
> But you can't use CPAN.pm on the Backpan.
Can't you? It's just a mirror, so if you point CPAN.pm to the backpan,
you should be able to install packages from there (though to get the
version you want you'll need to specify the author/package name
manually I think).

Of course, I've never done this myself, so I could be mistaken
>
> ------Original Message------
> From: Shlomi Fish
> To: module-authors@perl.org
> Cc: dhudes@hudes.org
> Sent: Mar 28, 2010 11:31 AM
> Subject: Re: Trimming the CPAN - "Automatic Purging"
>
> On Sunday 28 Mar 2010 17:28:48 dhudes@hudes.org wrote:
>> The entire point of rsync is to send only changes.
>> Therefore once your mirror initially syncs the old versions of modules is
>> not the issue. Indeed, removing the old versions would present additional
>> burden on synchronization! The ongoing burden is the ever-growing CPAN.
>>
>> The danger in a CPAN::Mini and in removing old versions is that one is
>> assuming that the latest and greatest is the one to use. This is false.
>> Take the case of someone running old software. I personally support
>> systems still running Informix Dynamic Server 7.31 as well as systems
>> running the latest IDS 11.5 build. We have Perl code that talks to IDS. If
>> DBD::Informix withdrew support for IDS 7.31 I would need both the last
>> version that supported it as well as the current.  I can get away with
>> upgrading Perl, maybe, but to upgrade the dbms is much more problematic
>> (license, for one thing; SQL changes another).
>
> You can always get the old versions from the Backpan, which keeps all
> historical versions - so it's a non-issue.
>
> Regards,
>
>         Shlomi Fish
>
> --
> -----------------------------------------------------------------
> Shlomi Fish       http://www.shlomifish.org/
> Best Introductory Programming Language - http://shlom.in/intro-lang
>
> Deletionists delete Wikipedia articles that they consider lame.
> Chuck Norris deletes deletionists whom he considers lame.
>
> Please reply to list if it's a mailing list post - http://shlom.in/reply .
>
>
> Sent from my BlackBerry® smartphone with Nextel Direct Connect
0
jawnsy
3/28/2010 9:43:00 PM
On Sun, Mar 28, 2010 at 5:43 PM, Jonathan Yu <jawnsy@cpan.org> wrote:
> On Sun, Mar 28, 2010 at 12:55 PM, Dana Hudes <dhudes@hudes.org> wrote:
>> But you can't use CPAN.pm on the Backpan.
> Can't you? It's just a mirror, so if you point CPAN.pm to the backpan,
> you should be able to install packages from there (though to get the
> version you want you'll need to specify the author/package name
> manually I think).

As always with perl, "it depends".  They are laid out just as a normal
CPAN repository, so if you have one in your urllist, something
specified as author/distribution.tar.gz might well resolve. *However*,
they don't necessarily have up-to-date index files.  Compare
timestamps on 02packages.details.txt


  http://backpan.cpan.org/modules/
  http://backpan.perl.org/modules/

Anything in your urllist might at some point be used for an index, so
make sure you use backpan.cpan.org and not backpan.perl.org since the
former seems to keep other necessary CPAN index files up-to-date.
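
One way to make the timestamp comparison concrete: each 02packages.details.txt starts with a header block that includes a Last-Updated line. The following is a minimal Python sketch, assuming that header layout. The two sample headers and the mirror labels are made up for illustration and are not real index contents.

```python
from email.utils import parsedate_to_datetime


def index_last_updated(index_text):
    """Return the Last-Updated header of an index file as a datetime, or None."""
    for line in index_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "last-updated":
            return parsedate_to_datetime(value.strip())
    return None


def freshest(indexes):
    """Given {label: index_text}, return the label with the newest Last-Updated."""
    dated = {}
    for label, text in indexes.items():
        stamp = index_last_updated(text)
        if stamp is not None:
            dated[label] = stamp
    return max(dated, key=dated.get) if dated else None


# Hypothetical header excerpts, for illustration only.
SAMPLE_INDEXES = {
    "mirror-a": (
        "File:         02packages.details.txt\n"
        "Last-Updated: Sun, 28 Mar 2010 21:27:01 GMT\n"
        "\n"
        "Some::Module  1.00  E/EX/EXAMPLE/Some-Module-1.00.tar.gz\n"
    ),
    "mirror-b": (
        "File:         02packages.details.txt\n"
        "Last-Updated: Tue, 05 Jan 2010 03:10:44 GMT\n"
        "\n"
        "Some::Module  0.99  E/EX/EXAMPLE/Some-Module-0.99.tar.gz\n"
    ),
}

if __name__ == "__main__":
    print("freshest index:", freshest(SAMPLE_INDEXES))
```

In practice the index text would be fetched from each host in the urllist; only the header needs to be read to decide which index is stale.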

-- David
0
xdaveg
3/28/2010 10:04:03 PM
* Graham Barr <gbarr@pobox.com> [2010-03-26 10:20]:
> On Mar 25, 2010, at 8:42 AM, Barbie wrote:
> >Lastly I would also personally be annoyed if only the latest
> >versions were available, as I often make great use of the diff
> >tool on search.cpan.org. Having only the latest version
> >renders that great tool redundant :(
>
> I use that too :-) and it is very annoying that some authors
> automatically delete previous releases when they upload a new
> one.

Why does that have to be constrained by the current availability
of modules? Couldn’t search.cpan.org simply not honour deletions?
Would there be any serious reason against this?

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
0
pagaltzis
3/29/2010 1:53:08 AM
* Nicholas Clark <nick@ccl4.org> [2010-03-28 18:20]:
> I'm missing something here, I suspect.

Yes, you are.

> How can HTTP be more efficient than rsync? The only obvious
> method to me of mirroring a CPAN site by HTTP is to instruct
> a client (such as wget) to get it all.

As Arthur has repeatedly pointed out: by first fetching
a transaction log from the remote end, then playing it forward
from the last synch point.

(This is essentially what CPAN::Mini already does.)

It’s not very efficient protocol-wise, but it sure is rather
cheap in terms of server I/O.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
0
pagaltzis
3/29/2010 2:13:47 AM
I think that Andreas's concept of treating these mirrors as a database is good. Checkpoint logical log replay is better than a simple rsync for large numbers of files.

The replication problem for databases is well-understood and open-source code for it is available from at least Postgresql.

Grab the current log and any logs you're missing since last update and off you go
Another approach which is a non-starter practically speaking but I will mention anyway:
Use zfs. Make one filesystem for each mirrored project (CPAN, freshmeat, etc). Daily or at other regular interval make a zfs snapshot. Purge old ones after some reasonable time such as 2 days. Mirror sites request a zfs incremental stream with the name of their last rec'd snapshot and that of the current.
While zfs is available for Solaris 10, OpenSolaris and I believe FreeBSD (the Mac OSX port halted IIRC) this isn't available enough for major mirrors to use
Sent from my BlackBerry® smartphone with Nextel Direct Connect

0
dhudes
3/29/2010 2:18:38 AM
Using http for this is inefficient
It makes for slower file transfer because you keep rerunning path mtu probes and tcp slow start  It makes extra socket handles opening and closing

Between major mirrors you don't have proxies (maybe NAT and firewall with stateful multilayer inspection but that's different) So ftp is available and is more suitable protocol for bulk file transfer

In the case of CPAN you don't have to go the log route. If the mirror knows its last synch time it can use rsync to get the modlist et al and import to SQLITE then query by date to come up with the list of files to fetch -- via ftp.

------Original Message------
From: Aristotle Pagaltzis
To: module-authors@perl.org
Sent: Mar 28, 2010 10:13 PM
Subject: Re: Trimming the CPAN - "Automatic Purging"

* Nicholas Clark <nick@ccl4.org> [2010-03-28 18:20]:
> I'm missing something here, I suspect.

Yes, you are.

> How can HTTP be more efficient than rsync? The only obvious
> method to me of mirroring a CPAN site by HTTP is to instruct
> a client (such as wget) to get it all.

As Arthur has repeatedly pointed this out: by first fetching
a transaction log from the remote end, then playing it forward
from the last synch point.

(This is essentially what CPAN::Mini already does.)

It’s not very efficient protocol-wise, but it sure is rather
cheap in terms of server I/O.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>


Sent from my BlackBerry® smartphone with Nextel Direct Connect

0
dhudes
3/29/2010 2:27:11 AM
Hi Elaine,

Elaine Ashton wrote:
> On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:
> Jarkko and I were talking about it this morning - as he's not in
> favour of pruning - while trying to think of a way around the size
> problem and he reminded me of the idea that, if I recall correctly
> was Andreas' suggestion a while back, there be an A, B and C 'PAN' of
> sorts where you could pull varying degrees of content - sort of
> CPAN::Mini writ large. I don't think that idea ever got any traction
> because it wouldn't really solve some of the issues for the major
> upstream mirrors and the mechanics of deciding where to draw the
> lines between them. I still think it's a good idea though.

This sounds a bit like the CPAN -> backpan scheme but with some 
additional levels?

> I do very much like Tim's proposal for giving old modules a push to
> BackPAN since, with proper communication of the changes to the
> authors along with a way to mark exceptions, this would rid CPAN of a
> lot of cruft that should be on BackPan anyway.

I'm not even going to throw in my considerable weight on this whole 
debate of pruning*. But if backpan became the "official" way to access 
old versions starting from yesterday's, wouldn't that mean:

a) That the toolchain would have to be adapted to a tiered 
infrastructure (think of the indexes...)
and more importantly:
b) The backpan would have to be mirrored all over the place as well, 
thus pushing the problem to the next level?

Best regards,
Steffen

* If you must know, I don't like the means but sympathize with the goals.

PS: This isn't targeted at Elaine specifically, but can everybody please 
take a step back and relax? Please be civil.
0
smueller
3/29/2010 6:25:25 AM
On Sun, 28 Mar 2010, dhudes@hudes.org wrote:

> The entire point of rsync is to send only changes.
> Therefore once your mirror initially syncs the old versions of modules is
> not the issue. Indeed, removing the old versions would present additional
> burden on synchronization! The ongoing burden is the ever-growing CPAN.

That's not entirely true, particularly when you're talking about rsync.
Remember, old synced data doesn't have to be transferred, but it still needs
to be checked for potential changes, something rsync does for every request.
That generates a crap load of I/O in the form of stats on the server.

> The danger in a CPAN::Mini and in removing old versions is that one is
> assuming that the latest and greatest is the one to use. This is false.
> Take the case of someone running old software. I personally support
> systems still running Informix Dynamic Server 7.31 as well as systems
> running the latest IDS 11.5 build. We have Perl code that talks to IDS. If
> DBD::Informix withdrew support for IDS 7.31 I would need both the last
> version that supported it as well as the current.  I can get away with
> upgrading Perl, maybe, but to upgrade the dbms is much more problematic
> (license, for one thing; SQL changes another).

This is a good example of the potential pitfalls of pruning, to be certain.
Even if all the authors dutifully documented all the necessary scenarios that
would require pinning specific versions on CPAN it's almost guaranteed that
there's still going to be collateral damage.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 7:39:12 AM
On Sun, 28 Mar 2010, Nicholas Clark wrote:

> Are you running a large public mirror site, where you don't even have
> knowledge of who is mirroring from you?
>
> (Not even knowledge, let alone channels of communication with, let alone
> control over)
>
> Because (as I see it, not having done any of this) the logistics of that is
> going to have as much bearing on trying to change protocols as the actual
> technical merits of the protocol itself.

I do run mirrors and am mirrored from.  Not on the scale of CPAN (in terms
of file count), but having been long aware of the effect of rsync servers I
have explored the scalability aspects of it.

It should have been obvious that trying to facilitate a cut-over to a new
syncing tool can't be done on this scale in one fell swoop.  Obviously,
there'd have to be a gradual migration where protocols are supported
concurrently, much like FTP & rsync are currently both supported.  We add a
new option and encourage people to move over.  Since we already have a list
of the public mirrors we should have some idea of where to start that
conversation.

> Most of the cost of rsync is an externality to the clients. If one has an
> existing mirror, one is using rsync to keep it up to date, what's the
> incentive to change?

Common sense and professional courtesy.  Especially because it's likely that
some "clients" running public mirrors may be a sync source for some private
mirrors.  They may not feel the pain of the master repositories, but they
certainly share a portion.  And it's not likely that many mirrors have a 
capital budget to support scaling a free service, so it would be best to 
make efficient use of those resources.

> I'm missing something here, I suspect. How can HTTP be more efficient than
> rsync? The only obvious method to me of mirroring a CPAN site by HTTP is to
> instruct a client (such as wget) to get it all. In which case, in the course
> of doing this the client is going to recurse over the entire directory tree
> of the server, which, I thought, was functionally equivalent to the behaviour
> of the rsync server.

You are missing something, but I may have not been explicit enough.  HTTP or
FTP can easily be the payload transport, once you know the precise files
that need to be transferred.  That is tremendously more efficient than what
rsync does on the server.  So, use rsync (or FTP mgets, etc.) to transfer
your transaction logs, compile a list of new files to retrieve, and use the
very common and low-overhead protocols to transfer the files...
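
To illustrate the log-driven approach described above, here is a minimal Python sketch of the replay step. The log format (one "sequence operation path" entry per line), the function name, and the sample paths are all assumptions made for illustration; no transaction-log format is specified in this thread.

```python
def replay(log_lines, last_seq):
    """Replay log entries newer than last_seq.

    Returns (to_fetch, to_delete, new_seq): the files to retrieve over
    HTTP or FTP, the files to remove locally, and the new checkpoint.
    """
    to_fetch, to_delete = set(), set()
    new_seq = last_seq
    for line in log_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        seq_text, op, path = line.split(None, 2)
        seq = int(seq_text)
        if seq <= last_seq:
            continue  # already applied during an earlier sync
        if op == "add":
            to_fetch.add(path)
            to_delete.discard(path)
        elif op == "delete":
            to_delete.add(path)
            to_fetch.discard(path)
        else:
            raise ValueError("unknown operation: " + op)
        new_seq = max(new_seq, seq)
    return sorted(to_fetch), sorted(to_delete), new_seq


# Hypothetical log contents, for illustration only.
SAMPLE_LOG = [
    "1 add authors/id/E/EX/EXAMPLE/Foo-1.00.tar.gz",
    "2 add authors/id/E/EX/EXAMPLE/Foo-1.01.tar.gz",
    "3 delete authors/id/E/EX/EXAMPLE/Foo-1.00.tar.gz",
    "4 add authors/id/E/EX/EXAMPLE/Bar-0.02.tar.gz",
]

if __name__ == "__main__":
    fetch, delete, checkpoint = replay(SAMPLE_LOG, last_seq=1)
    print("fetch:", fetch)
    print("delete:", delete)
    print("checkpoint:", checkpoint)
```

The client only ever asks the server for the log and for the named files, so the server never has to walk its whole tree on the client's behalf.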

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 7:50:16 AM
On Sun, 28 Mar 2010, Elaine Ashton wrote:

> I do very much like Tim's proposal for giving old modules a push to BackPAN since, with proper communication of the changes to the authors along with a way to mark exceptions, this would rid CPAN of a lot of cruft that should be on BackPan anyway.

I'm not trying to be a dick (not intentionally, anyway), but isn't that
basically making your problem BackPan's problem?

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 7:52:40 AM
On Sun, 28 Mar 2010, Andy Armstrong wrote:

> We're nearly there if A == a CPAN::Mini style mirror, B == the current mirror pruned and C == backpan.
>
> So the actions to make that happen are:
>
> * give the current clients specific support for this
> * generate a master mini mirror that other mini mirrors can pull from.
> * prune
>
> If we agree that this is a good solution I'm happy to do some work on it - I could host the mini master and I'd be happy to send Andreas a patch for CPAN.pm to support this scheme.

It should be pointed out that this is only viable under the assumption that
you have a separate pool of servers for each tier.  Again, this is just
load balancing, not load optimization.

That said, if you have the volunteers, then why not.  Perhaps I can offer a
system to support mirroring up here in Alaska.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 7:56:12 AM
On Sun, 28 Mar 2010, Andreas J. Koenig wrote:

> Says the author of a module named Paranoid. A lovely coincidence.

:-) As they say, just because you may be paranoid, it doesn't mean that no
one's out to get you.

> If you want to study the CPAN "checkpointed logs" solution running on
> the very CPAN for exactly one year now: File::Rsync::Mirror::Recent
>
> What needs to be done is really extremely trivial: rewrite it in C and
> convince the rsync people to include it in rsync code base. Just that.
>
> So are you a taker, Arthur?

Heh, nice.  That sounds much more involved than my proposal, plus it leaves
us entirely at the mercy of an outside organization (the rsync folks) who
may or may not care about our needs.

I think it would be a worthy cause ultimately, but certainly a much longer
time to implementation, and considerably more effort.  Kind of sounds like
the normal stonewalling I've been getting these last few days by our
resident rsync fetishists.

Very ironic.  I use the hell out of rsync, just more discriminately than you
guys, and yet I'm public enemy number one.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 8:02:11 AM
On Sun, 28 Mar 2010, Dana Hudes wrote:

> Use of wget and http to download an entire site means numerous TCP opens
> and HTTP GET requests. The entire point of rsync is that it knows there are
> numerous downloads. It does ONE open. This allows TCP slow start to ramp up

That wasn't exactly what I was suggesting.  And we'll ignore HTTP's
Keep-Alive support for the time being which negates your TCP open issue.  If
you're fetching transaction logs by which you can determine beforehand
precisely what files to retrieve HTTP or FTP will beat the pants off of
allowing rsync to tell you what you need to retrieve and delivering it.
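
The Keep-Alive point can be demonstrated with Python's standard library alone. This is a minimal sketch, not a mirroring tool: it starts a throwaway HTTP/1.1 server on the loopback interface, fetches several paths through one client connection, and counts how many TCP connections the server accepted. The handler and function names are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows persistent connections
    connections = 0  # handler instances, one per accepted TCP connection
    requests = 0     # GET requests served

    def setup(self):
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        CountingHandler.requests += 1
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """Fetch every path through a single persistent HTTP connection."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    bodies = []
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        try:
            for path in paths:
                conn.request("GET", path)
                bodies.append(conn.getresponse().read().decode("ascii"))
        finally:
            conn.close()  # lets the server-side handler finish
    finally:
        server.shutdown()
        server.server_close()
    return bodies


if __name__ == "__main__":
    fetched = fetch_over_one_connection(["/a", "/b", "/c"])
    print("bodies:", fetched)
    print("requests:", CountingHandler.requests)
    print("connections:", CountingHandler.connections)
```

Each response carries a Content-Length header, which is what lets the client know where one response ends and reuse the socket for the next request.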

> A multi-download session with ftp is also efficient. Clients like ncftp
> have batch transfer built in. If setting up an initial mirror you might do
> better with ftp but maintaining it is where rsync rules.
>
> I haven't looked closely but I have the impression from watching wget work
> that wget using HTTP::Date opens two TCP connections per file: it opens a
> socket and issues a request for timestamp then closes it then opens a socket
> to issue an http GET if it wants the file. Then it closes that socket and the
> process repeats for next file. It keeps hoping for the timestamp even if the
> server doesn't support http::Date
>
> Rsync and ftp are stateful; http is not. For absolute getting one file http
> is better since you skip the whole login thing and setting up data and
> control sockets.
> So a CPAN client session will do better with an http mirror: it gets a
> tar.gz opens it up processes it and then goes back many seconds from original
> request for the first dependency. Repeat until entire dependency tree is
> completed

Dude, you definitely don't understand what we're discussing.  And neither
rsync, ftp, nor http is stateful -- that's the problem.  Rsync has to
build a picture of the repository's state *per* request, even the old files
that haven't been touched in years.  It then uses that information to select
and deliver the new files you need.  Maintaining state means that you
maintain knowledge of state over time, across multiple requests.  And rsync
doesn't do that, it simulates that.  Quite cleverly, but in a very
expensive way which is borne by the server.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 8:09:58 AM
On Sun, 28 Mar 2010, Dana Hudes wrote:

> I agree with Elaine
> I can't get rsync through the firewall at work. Not even tunneled.
> For CPAN I use CPAN::Mini. It uses http and it does the job though it does force the local CPAN to blead. My local solution to other things we need such as Blastwave (we run Solaris) I have a special squid proxy with restrictive acls lots of disk space and long retention. That means we only download any package once: what needed when needed. That does create a problem of using bandwidth during the day but it works out ok in the end.
>
> Rsync is better than having to hack reverse caching proxies for each site of interest.

Your use of a proxy is commendable.  The whole proxy thing was just thrown
out there as a possible option to address concerns Elaine brought up that are
debatably non-pertinent to the majority of the public mirror operators.  I
know it's a complete non-issue for me.  The whole point was that if she
didn't want to get permission from her network folks to run another protocol
(in her case she was scoffing at FTP -- that's totally 1980s, man! ;-) she
could use one that was very likely already open, like HTTP, and use that as
the payload transport layer.  Making the CPAN mirror HTTP-browseable is
completely palatable to me.  Not for crawling, but for specific file
retrievals, assuming you're working off of the transaction logs.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/29/2010 8:18:06 AM
On Sun, 28 Mar 2010, Dana Hudes wrote:

> Why is rsync a problem? Where is the bottleneck in the protocol or the code implementing it?
> Specifics!
> SAR is antiquated doesn't give the info you really need. Using a linux system? Use procallator and feed resulting collected data to ORCA. Better yet, use DTrace or at least truss.  Compile rsync with profiling code -- use Sun Studio 12 it runs on Linux as well as Solaris and its a free download.

Wow.  You kids and your new shiny toys...  Look, here's a nice little
specific example for you.  I run an rsync server that contains 8,700+ files
and directories.  Now, say I want to sync a mere thirty-two new files.
Making that request on my server causes the rsync daemon to stat the entire
hierarchy to the tune of 18,000+ f & lstats.  Per request.  Freaking ouch.
And that's a tolerable use-case in my mind for rsync.  That's a hell of a lot
of I/O generated for something which would take but a couple of stats to
retrieve via HTTP or FTP.  Assuming you knew what you needed already.
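
The difference in stat volume can be modelled with a short Python sketch. This is a simplified model and not rsync's actual implementation: it only contrasts "lstat every entry under the root" with "lstat just the files already known to be needed". The tree layout, sizes, and function names are assumptions chosen for illustration.

```python
import os
import tempfile


def build_tree(root, dirs, files_per_dir):
    """Create a small test tree; return the relative paths of its files."""
    paths = []
    for d in range(dirs):
        dname = "d%03d" % d
        os.mkdir(os.path.join(root, dname))
        for f in range(files_per_dir):
            rel = os.path.join(dname, "f%03d.tar.gz" % f)
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("x")
            paths.append(rel)
    return paths


def full_scan_stats(root):
    """Count the lstat calls needed to examine every entry under root."""
    calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            calls += 1
    return calls


def targeted_stats(root, wanted):
    """Count the lstat calls needed when the wanted files are already known."""
    calls = 0
    for rel in wanted:
        os.lstat(os.path.join(root, rel))
        calls += 1
    return calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        files = build_tree(root, dirs=10, files_per_dir=20)
        print("full scan:", full_scan_stats(root))
        print("targeted: ", targeted_stats(root, files[:32]))
```

The full-scan count grows with the size of the archive, while the targeted count grows only with the number of files a client actually needs.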

Now, when you add in a file set of sufficient size to exhaust filesystem
caching, plus a crap load of concurrent requests, my archaic SAR reports
written on stone tablets tend to say your I/O wait states start pushing the
load levels unacceptably high, not to mention the pages being thrashed from
memory's cache pool, high interrupts and excessive seeks on the drives, and
so on and so forth.  <sniff>  Cavemen are people, too.

Now, look at the size of CPAN with *hundreds* of thousands of files.  Can
you imagine that amount of I/O *per* request?!

> From a network protocol perspective rsync is quite good. If your network capacity is so large that it exceeds bandwidth or IOPs of your disks you probably can afford better disks or a more efficient disk storage layout.
> Are mirrors like nic.funet.fi running multiple gigabit WAN connections?  If so they could sure demand stream more than a bunch of SATA2 disks can provide.
>
> Without performance data its a waste of time to argue against rsync

And without having examined how rsync works on both ends it should have
been a waste of time to argue the merits of rsync.

 	--Arthur Corliss
 	  Live Free or Die
0
acorliss
3/29/2010 8:31:50 AM
While major mirror can use HTTP1.1 if no proxy, Squid proxies don't do 1.1 on other than experimental basis.
As to whether wget does use it or not I will have to check
Don't assume either way

------Original Message------
From: Aristotle Pagaltzis
To: module-authors@perl.org
Sent: Mar 28, 2010 11:45 PM
Subject: Re: Trimming the CPAN - "Automatic Purging"

* Dana Hudes <dhudes@hudes.org> [2010-03-29 04:30]:
> Using http for this is inefficient It makes for slower file
> transfer because you keep rerunning path mtu probes and tcp
> slow start  It makes extra socket handles opening and closing

Errm, you missed the last decade. (HTTP/1.1 has keep-alive and
pipelining and it’s 10 years old now.)

> In the case of CPAN you don't have to go the log route. If the
> mirror knows it last synch time it can use rsync to get the
> modlist et al and import to SQLITE then query by date to come
> up with the list of files to fetch -- via ftp.

Say what? Stat via rsync to feed an SQLite database that drives
an FTP transfer? Could you even possibly come up with a more
Rube-Goldbergian construction?

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>


Sent from my BlackBerry® smartphone with Nextel Direct Connect

0
dhudes
3/29/2010 11:17:11 AM
On Mon, 29 Mar 2010, Dana Hudes wrote:

> Orcallator, procallator and friends aren't shiny new toys
> Adrian Cockroft wrote initial version of orcallator in the early 90s for his book "Solaris Performance Tuning. The 2nd edition is I think 1998.
> The current version of ORCA (processes the collected data) is from I believe 2007 or so
> www.orcaware.org i think it was

I was being facetious.  Your immediate dismissal of SAR is ill-advised.  I'm
wearing my asbestos-lined boxers, so I'll lob this little inflammatory gem
out there:  if you're running a server (especially in production) and you're
*not* running SAR, you're a freaking idiot.

Profiling individual programs is all well and good for occasional or
developer use, but the point of SAR is to give you a global view into the
health of your system and to identify architectural bottlenecks.  I think it
would be greatly entertaining for Elaine or any of the other mirror
operators to post their SAR reports so you guys can see the huge amount of
abuse being heaped on their servers.

SAR is debatably one of the lowest overhead methods of gaining that
macroscopic view, and it still has profiling value on development systems
when you're testing a specific workload.

To ignore SAR is to show zero competence as a sys-admin.

 	--Arthur Corliss
 	  Live Free or Die
0
acorliss
3/29/2010 5:12:09 PM
Arthur your ignorance is apalling
Go look at what ORCA does
SAR doesn't give you the info
With ORCA i have any thing from kstat or iostat. It goes into roundrobin database with rrdtool.

Procallaotr does for linux what
orcallator does for solaris where it is the standard performance toool
------Original Message------
From: Arthur Corliss
To: Dana Hudes
Cc: module-authors@perl.org
Sent: Mar 29, 2010 1:12 PM
Subject: Re: Trimming the CPAN - "Automatic Purging"

On Mon, 29 Mar 2010, Dana Hudes wrote:

> Orcallator, procallator and friends aren't shiny new toys
> Adrian Cockroft wrote initial version of orcallator in the early 90s for his book "Solaris Performance Tuning. The 2nd edition is I think 1998.
> The current version of ORCA (processes the collected data) is from I believe 2007 or so
> www.orcaware.org i think it was

I was being facetious.  Your immediate dismissal of SAR is ill-advised.  I'm
wearing my abestos-lined boxers, so I'll lob this little inflammatory gem
out there:  if you're running a server (especially in production) and you're
*not* running SAR, you're a freaking idiot.

Profiling individual programs is all well and good for occasional or
developer use, but the point of SAR is to give you a global view into the
health of your system and to identify architectural bottlenecks.  I think it
would be greatly entertaining for Elaine or any of the other mirror
operators to post their SAR reports so you guys can see the huge amount of
abuse being heaped on their servers.

SAR is debatably one of the lowest overhead methods of gaining that
macroscopic view, and it still has profiling value on development systems
when you're testing a specific workload.

To ignore SAR is to show zero competence as a sys-admin.

 	--Arthur Corliss
 	  Live Free or Die


Sent from my BlackBerry® smartphone with Nextel Direct Connect

0
dhudes
3/29/2010 5:27:35 PM
On Mon, 29 Mar 2010, Dana Hudes wrote:

> Arthur your ignorance is appalling
> Go look at what ORCA does
> SAR doesn't give you the info
> With ORCA i have any thing from kstat or iostat. It goes into roundrobin database with rrdtool.
>
> Procallator does for linux what
> orcallator does for solaris where it is the standard performance tool

*My* ignorance is appalling?  Let's see, in this discussion alone you've
shown us that:

   * you didn't know that decade-old support for multiple HTTP requests
     over a single TCP connection existed
   * you claimed that rsync & ftp are stateful, when they're obviously
     not
   * you obviously had zero clue of the I/O impacts of running an rsync
     server (with the massive number of stats per request)
   * apparently you don't know that SAR gives you everything in iostat,
     vmstat, etc. as well.

And based on all this, I'm willing to bet you don't understand how RRD 
works, particularly with how the archive data is stored.

I never claimed that SAR is better than the other tools, but it's
universally available on UNIX (and clones) making it an excellent global
tool for use on heterogeneous systems platforms, and more than capable of
identifying architectural bottlenecks with virtually no overhead.  That
makes it a necessity.

Don't try to cover up your previously displayed areas of ignorance by
pursuing a pointless and very stupid tangent.  That's not the point of this
discussion.

 	--Arthur Corliss
 	  Live Free or Die
0
acorliss
3/29/2010 5:55:05 PM
On 29/03/2010 09:39, Arthur Corliss wrote:
> On Sun, 28 Mar 2010, dhudes@hudes.org wrote:
>
>> The entire point of rsync is to send only changes.
>> Therefore once your mirror initially syncs the old versions of modules is
>> not the issue. Indeed, removing the old versions would present additional
>> burden on synchronization! The ongoing burden is the ever-growing CPAN.
>
> That's not entirely true, particularly when you're talking about rsync.
> Remember, old synced data doesn't have to be transferred, but it still needs
> to be checked for potential changes, something rsync does for every
> request.
> That generates a crap load of I/O in the form of stats on the server.

I believe cvsup (FreeBSD's source distribution mechanism) knows how to 
avoid this cost by serialising context between runs.

That may be an avenue worth exploring, since it should be a less risky 
proposition for a mirror operator to download a tried and true 
technology rather than some pie-in-the-sky new system that may run out 
of steam in a year's time.

David

>> The danger in a CPAN::Mini and in removing old versions is that one is
>> assuming that the latest and greatest is the one to use. This is false.
>> Take the case of someone running old software. I personally support
>> systems still running Informix Dynamic Server 7.31 as well as systems
>> running the latest IDS 11.5 build. We have Perl code that talks to
>> IDS. If DBD::Informix withdrew support for IDS 7.31 I would need both
>> the last version that supported it as well as the current. I can get
>> away with upgrading Perl, maybe, but to upgrade the dbms is much more
>> problematic (license, for one thing; SQL changes another).
>
> This is a good example of the potentials of pruning, to be certain. Even if
> all the authors dutifully documented all the necessary scenarios that would
> require pinning specific versions on CPAN it's almost guaranteed that
> there's still going to be collateral damage.
>
> --Arthur Corliss
> Live Free or Die
>


-- 
naked, but wearing blinding lights! were it a pretty girl, she'd be 
surrounded as a flame by moths
0
david
3/30/2010 10:50:35 AM
On Sun, Mar 28, 2010 at 07:28:48AM -0700, dhudes@hudes.org wrote:

> The danger in a CPAN::Mini and in removing old versions is that one is
> assuming that the latest and greatest is the one to use. This is false.

And this is why I run cp5.6.2an.barnyard.co.uk etc.  

It wouldn't be difficult for someone to take my code and customise it
further to, eg, also "pin" a few modules that rely on the particular
versions of third-party libraries that you use.

-- 
David Cantrell | Bourgeois reactionary pig

Eye have a spelling chequer / It came with my pea sea
It planely marques four my revue / Miss Steaks eye kin knot sea.
Eye strike a quay and type a word / And weight for it to say
Weather eye am wrong oar write / It shows me strait a weigh.
0
david
3/30/2010 10:55:16 AM
On Sun, Mar 28, 2010 at 06:04:03PM -0400, David Golden wrote:

> As always with perl, "it depends".  They are laid out just as a normal
> CPAN repository, so if you have one in your urllist, something
> specified as author/distribution.tar.gz might well resolve.

Not just "might well resolve".  It *will* work.  If you use one of my
cpXXXan mirrors, you're hitting a BackPAN mirror with a custom index.

>                                                              *However*,
> they don't necessarily have up-to-date index files.  Compare
> timestamps on 02packages.details.txt

Indeed.  I don't imagine that that would be hard for Andreas to keep in
sync!

-- 
David Cantrell | even more awesome than a panda-fur coat

"IMO, the primary historical significance of Unix is that it marks the
time in computer history where CPUs became so cheap that it was possible
to build an operating system without adult supervision."
                         -- Russ Holsclaw in a.f.c
0
david
3/30/2010 11:51:22 AM
On Tue, 30 Mar 2010, David Landgren wrote:

> I believe cvsup (FreeBSD's source distribution mechanism) knows how to avoid 
> this cost by serialising context between runs.
>
> That may be an avenue worth exploring, since it should be a less risky 
> proposition for a mirror operator to download a tried and true technology 
> rather than some pie-in-the-sky new system that may run out of steam in a 
> year's time.

You had me excited at first, but then the home page said:

   To update non-RCS files, CVSup uses the highly efficient rsync algorithm,
   developed by Andrew Tridgell and Paul Mackerras.

Looks like its speed benefits are due to knowledge of specific file types
(RCS and log files) so it can grab just the new content for transfer.  For
all other types it falls back onto rsync, which they say is built into
CVSup.

If there isn't an existing (and portable) solution out there, I've got a few
ideas I may have to mock up and try out myself.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/30/2010 4:59:44 PM
Arthur Corliss wrote:
> You had me excited at first, but then the home page said:
>
>   To update non-RCS files, CVSup uses the highly efficient rsync algorithm,
>   developed by Andrew Tridgell and Paul Mackerras.
>
> Looks like its speed benefits are due to knowledge of specific file types
> (RCS and log files) so it can grab just the new content for transfer.  For
> all other types it falls back onto rsync, which they say is built into
> CVSup.
Er, not exactly. Read
http://www.cvsup.org/howsofast.html

From what I can see, cvsup uses the rsync algorithm on a file-by-file 
basis (it uses just the differential send part of the rsync algorithm). 
It doesn't rsync the whole tree, which was what I understood to be the 
original problem (wasn't the complaint about the flood of stats?).

So if you want to make a tool that works fine for large mirrors, your 
priority apparently should be to reduce the "lots of stats" part which 
is used to determine exactly what files need to be considered for 
checking. (Rsync already makes sure all the *other* I/O operations are 
minimized).

Now the key, as I see it, is that unlike all the other use cases where 
rsync is used, large mirrors are likely to have their directories 
directly transferred from another mirror. So, the client that pulled the 
tree update down could store a list of changed files, and the server 
could then just use that list to determine which files
need to be synced to the downstream mirror. (Sure, the original site has 
to generate the list, but if they use a tool like PAUSE to upload the 
files, that shouldn't be hard to do).
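
A minimal sketch of that changed-file list, assuming each mirror appends the
paths it has just pulled to a plain-text journal and each downstream mirror
remembers the byte offset it last read up to. The journal layout and the
function names (append_journal, changes_since) are hypothetical; nothing like
this is part of rsync or PAUSE as far as this thread establishes.

```python
def append_journal(journal_path, changed_paths):
    """Append the paths changed by one upstream sync, one per line."""
    with open(journal_path, "a", encoding="utf-8") as fh:
        for path in changed_paths:
            fh.write(path + "\n")


def changes_since(journal_path, offset):
    """Return (paths, new_offset) for everything journalled after `offset`.

    Only the journal is read; no file in the mirrored tree is stat()ed.
    """
    with open(journal_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Drop duplicate paths but keep first-seen order.
    paths = list(dict.fromkeys(data.decode("utf-8").splitlines()))
    return paths, offset + len(data)
```

A downstream mirror would store the returned offset and pass it back on its
next run, so the server's work is proportional to what changed rather than to
the size of the tree.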
0
matija
3/30/2010 5:54:48 PM
On Tue, 30 Mar 2010, Matija Grabnar wrote:

> Er, not exactly. Read
> http://www.cvsup.org/howsofast.html

I had read  http://www.cvsup.org/faq.html#features  item #3.

> From what I can see, cvsup uses the rsync algorithm on a file-by-file basis 
> (it uses just the differential send part of the rsync algorithm). It doesn't 
> rsync the whole tree, which was what I understood to be the original problem 
> (wasn't the complaint about the flood of stats?).

Sounds like I may have interpreted the FAQ incorrectly, then.  Thanks for
pointing that out.  I have a few questions, though: the explanation says:

    "At the same time, the Tree Differ generates a list of the server's
    files."

That seems to imply that it's doing the exact same thing as rsync, so all 
the stats are still present on the server, right?

Nowhere do I see it mentioning that the daemon is maintaining state between
requests.  The primary speed-up (beyond special file update handling) is
better use of bidirectional bandwidth.

Do you have access to a cvsup server so you can verify its behavior?

> So if you want to make a tool that works fine for large mirrors, your 
> priority apparently should be to reduce the "lots of stats" part which is 
> used to determine exactly what files need to be considered for checking. 
> (Rsync already makes sure all the *other* I/O operations are minimized).

Agreed.

> Now the key, as I see it, is that unlike all the other use cases where rsync 
> is used, large mirrors are likely to have their directories directly 
> transferred from another mirror. So, the client that pulled the tree update 
> down could store a list of changed files, and the server could then just use 
> that list to determine which files
> need to be synced to the downstream mirror. (Sure, the original site has to 
> generate the list, but if they use a tool like PAUSE to upload the files, 
> that shouldn't be hard to do).

Agreed, but I'm not sure we've gotten past the stat storm on the server,
though.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/30/2010 6:33:16 PM
Hi!

>> Now the key, as I see it, is that unlike all the other use cases where 
>> rsync is used, large mirrors are likely to have their directories 
>> directly transferred from another mirror. So, the client that pulled 
>> the tree update down could store a list of changed files, and the 
>> server could then just use that list to determine which files
>> need to be synced to the downstream mirror. (Sure, the original site 
>> has to generate the list, but if they use a tool like PAUSE to upload 
>> the files, that shouldn't be hard to do).
> 
> Agreed, but I'm not sure we've gotten past the stat storm on the server,
> though.

Ok, this might be a completely wacky idea, but couldn't we use some kind 
of version control system?

Before you kick my backside, hear me out: This is of course very 
theoretical at the moment, there are probably quite a number of pitfalls 
and kinks to work out...

Currently, there's CPAN and Backpan. With Backpan playing the archive.

Suppose, just suppose we see that as some kind of old style, simplistic 
version control system, e.g. CPAN is a checkout of the latest version of 
all files and Backpan holding the older versions.

Now, suppose we were to put all files into mercurial, git or the like, 
renaming the files so they don't have version numbers in their names and 
storing them sequentially as commits, so that new versions update old ones.

Now, a new mirror would (once) ask for the latest version without the 
history of all the files, meaning it will have to make a complete 
"checkout" of the latest version. No way around it, really. We call that 
version FOO.

But, suppose 100 modules get updated on the main server, so the server 
stores 100 changesets, which in many version control systems are stored 
sequentially in a single file. Call that version BAR.

Now the mirror wants to update again, calls the server and says, "I have 
version FOO, give me all updates". So the server looks up version FOO in 
the file (via some shorter index list), opens the main file, seeks to the 
indicated position and basically dumps the rest of the file via network 
to the mirror. The mirror then applies this changeset by taking each 
chunk as a patch and applying it to the corresponding file(s).

For fast mirroring and legacy clients, the main server still would have 
a full directory checkout, allowing the oldstyle sync. Compressed, 
slurpable tarballs can also be autogenerated like once a month.

This could also solve some long-standing problems, like having modules 
available for legacy production environments. A user might still be able 
to checkout a specific version of CPAN depending on his/her needs, like 
"give me CPAN as it was on 23rd December 2007".
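
One way to picture that "CPAN as it was on a given date" request: a minimal
sketch, assuming the archive keeps an upload log of (date, distribution,
filename) records. The log format and the function name snapshot are invented
for illustration, and it replays plain records rather than real VCS
changesets.

```python
from datetime import date


def snapshot(upload_log, as_of):
    """Return {distribution: filename} as the archive stood on `as_of`.

    `upload_log` is an iterable of (date, distribution, filename) tuples.
    Replaying them in date order means a later upload replaces an earlier one.
    """
    state = {}
    for when, dist, filename in sorted(upload_log):
        if when > as_of:
            break  # everything after this point is newer than requested
        state[dist] = filename
    return state
```

Calling snapshot(log, date(2007, 12, 23)) would return the newest file per
distribution uploaded on or before that day, ignoring anything uploaded later.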


This could work like any modern, distributed version control system. 
That way, the user would also be able to apply local patches and/or 
decide which changesets to pull in from the main server. Or have a 
complete, local mirror and one for the production systems where he/she 
pulls in changes after they have been reviewed.


NOW it's time to kick my butt, if you want to.

LG
Rene
0
rene
3/30/2010 8:08:57 PM
On Tue, 30 Mar 2010, Rene Schickbauer wrote:

<snip>

> This could work like any modern, distributed version control system. That 
> way, the user would also be able to apply local patches and/or decide which 
> changesets to pull in from the main server. Or have a complete, local mirror 
> and one for the production systems where he/she pulls in changes after they 
> have been reviewed.
>
>
> NOW it's time to kick my butt, if you want to.

:-) No one can accuse you of not being ambitious.  It's a neat idea, but
definitely an involved solution.  While it could solve a lot of problems I
think the human component is going to be your biggest obstacle.  As we've
seen from the reaction to the heretical notion of ditching rsync I have to
imagine getting everyone to ditch their favorite RCS tool would be even
worse.

Basically, we should just all get onboard with git (disclaimer:  I don't use
git myself, so my understanding may be deficient), a decentralized
distributed RCS.  And have developers periodically merge their branches.

Tough sell.  It probably would solve a bunch of issues, but you're treading
into vi versus emacs territory.  ;-)

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
3/30/2010 10:26:45 PM
I've said nothing till now, because I figured more noise wouldn't help much.

But I quite like the rsync daemon/proxy idea, and as it so happens I'm
attending the OzLabs Unconference in 3 weeks time to hang out with
Tridge, Rusty and the other Australian C/Kernel/Samba/RSync elites.

So I'd be happy to raise any issues or ideas in this area with them in
person over beers.

Adam K

On Sun, Mar 28, 2010 at 7:08 PM, Eric Wilhelm <enobacon@gmail.com> wrote:
> Or even write an rsync daemon (or proxy perhaps) in Perl.  So, when the
> client asks for a file, you can answer without checking the disk.  Can
> something like that work with an unmodified client, or does the amount
> of data needed to answer a naive client overwhelm any potential gain?
>
> Unfortunately the protocol is not formally documented and the perl code
> I've seen (File::RsyncP) seems to be lagging:
0
adam
3/31/2010 2:03:51 AM
On Sun, Mar 28, 2010 at 2:32 PM, Elaine Ashton <eashton@mac.com> wrote:
>
> On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:
>
>>
>> Has some sort of disk quota system for CPAN author accounts ever been considered?
>
> Not specifically, no, at least not that I'm aware of. That would have to be implemented on PAUSE and quotas frequently end up not solving the real problem and create a headache both for the sysadmin and the users.

new proposal: Make modules "pay rent" in order to remain on a mirror.
Rent could be in the form of actual user interest, or good reviews.

Use as a dependency could count as rent.

Or simple downloading.  A mirror server that functioned more as a
cache than a mirror would also work: only the files that are actually
requested need be stored, as long as the mirror server knows how to
get something else if requested.  If the root cause of The Pain turns
out to be "full mirroring" then do partial mirroring, and automate the
partition with a policy instead of trying to plan the partition.
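
A minimal sketch of the mirror-as-cache idea, assuming a caller-supplied
function that can fetch any path from an upstream mirror. The CachingMirror
class and the fetch_upstream callback are hypothetical names; expiry of files
nobody asks for any more is left out.

```python
import os


class CachingMirror:
    """Serve files from a local cache, fetching from upstream only on a miss."""

    def __init__(self, cache_dir, fetch_upstream):
        self.cache_dir = cache_dir
        self.fetch_upstream = fetch_upstream  # callable: relpath -> bytes
        self.misses = 0

    def get(self, relpath):
        norm = os.path.normpath(relpath)
        # Refuse absolute paths and anything climbing out of the cache.
        if os.path.isabs(norm) or norm.split(os.sep)[0] == "..":
            raise ValueError("path escapes the cache: %r" % relpath)
        local = os.path.join(self.cache_dir, norm)
        if not os.path.exists(local):
            data = self.fetch_upstream(relpath)
            os.makedirs(os.path.dirname(local), exist_ok=True)
            with open(local, "wb") as fh:
                fh.write(data)
            self.misses += 1
        with open(local, "rb") as fh:
            return fh.read()
```

Only files that someone has actually requested ever land on disk, so the
partition between "stored" and "not stored" is set by usage rather than
planned up front.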




-- 
question doubt
0
davidnicol
3/31/2010 4:52:20 AM
David Nicol wrote:
> On Sun, Mar 28, 2010 at 2:32 PM, Elaine Ashton <eashton@mac.com> wrote:
>> On Mar 28, 2010, at 12:48 PM, Randy Kobes wrote:
>>
>>> Has some sort of disk quota system for CPAN author accounts ever been considered?
>> Not specifically, no, at least not that I'm aware of. That would have to be implemented on PAUSE and quotas frequently end up not solving the real problem and create a headache both for the sysadmin and the users.
> 
> new proposal: Make modules "pay rent" in order to remain on a mirror.
> Rent could be in the form of actual user interest, or good reviews.

Hmm, this can *only* work as long as that model is not applied to the 
main server: Just because a module is seldom used doesn't 
automatically mean it is not vital to *someone*.

Modules that might fit into this category are many Acme modules. For 
example, I use Acme::Don't sometimes, because it's better for 
temporarily commenting out code sections than "if(0)"....

LG
Rene
0
rene
3/31/2010 9:21:01 AM
On Wed, Mar 31, 2010 at 01:03:51PM +1100, Adam Kennedy wrote:
> I've said nothing till now, because I figured more noise wouldn't help much.
> 
> But I quite like the rsync daemon/proxy idea, and as it so happens I'm
> attending the OzLabs Unconference in 3 weeks time to hang out with
> Tridge, Rusty and the other Australia C/Kernel/Samba/RSync elites.
> 
> So I'd be happy to raise any issues or ideas in this area with them in
> person over beers.

I can see two possibly useful things (and I have no idea if either is yet
possible, or a great understanding of how the protocol works)

1: stateful rsync daemon which doesn't scan all the time, either by
   a: Actually having a means to update
   b: Simply telling fibs, and pretending that the file system it scanned
      $n minutes ago is still current. (Which I think would work, at least for
      a mirror where files aren't edited (much) - if the server discovers that
      the client's view of that file *is* out of date, then scan that file for
      real, and give the up to date truth)

2: federated (or federate-able) server (or proxy) - so that you can say
   "hand this subtree off to that other server"
   This would allow the (fast, existing, C) rsync server to serve most of
   (say) funet.fi, handing off to a stateful server for the CPAN subtree.
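
Option 1b could be sketched roughly as follows, assuming the server keeps one
in-memory listing of the tree and only rescans when that listing is older than
a configurable age. The CachedListing class, its max_age parameter and the
refresh_one method are hypothetical; this is not how the existing rsync daemon
behaves.

```python
import os
import time


class CachedListing:
    """Answer "what files do you have?" from a scan up to `max_age` seconds old."""

    def __init__(self, root, max_age=600.0, clock=time.monotonic):
        self.root = root
        self.max_age = max_age
        self.clock = clock
        self.scans = 0  # number of full tree walks performed
        self._listing = None
        self._scanned_at = None

    def _scan(self):
        listing = {}
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                rel = os.path.relpath(full, self.root)
                listing[rel] = (st.st_size, int(st.st_mtime))
        self._listing = listing
        self._scanned_at = self.clock()
        self.scans += 1

    def listing(self):
        """Return the cached listing, rescanning only if it has expired."""
        expired = (self._listing is None
                   or self.clock() - self._scanned_at > self.max_age)
        if expired:
            self._scan()
        return self._listing

    def refresh_one(self, relpath):
        """Stat a single file for real when a client's view disagrees."""
        listing = self.listing()
        try:
            st = os.stat(os.path.join(self.root, relpath))
        except FileNotFoundError:
            listing.pop(relpath, None)
            return None
        listing[relpath] = (st.st_size, int(st.st_mtime))
        return listing[relpath]
```

Between rescans every client is answered from memory, which is the "telling
fibs" part; refresh_one is the escape hatch for the single file that turns out
to be stale.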

Nicholas Clark
0
nick
3/31/2010 10:11:18 AM
On Mar 31, 2010, at 6:52, David Nicol wrote:

> new proposal: Make modules "pay rent" in order to remain on a mirror.
> Rent could be in the form of actual user interest, or good reviews.

Now you are proposing purging useless stuff from CPAN -- that's a lot more radical than Tim's proposal of just purging _old_ useless stuff.


 - ask
0
ask
3/31/2010 10:41:39 AM
Just to summarize (and this is going to be the last mail I send in this thread):

Old releases (more than a few releases back) are virtually useless.  Just how useless is up for debate, but BACKPAN is there.

We've always encouraged CPAN authors to purge old releases as appropriate.

Tim noticed (I'm guessing) that while many authors do this, some just don't at all.  He suggests that we could make the computers help them remember or do it, one way or another.

Everyone who doesn't run mirrors says "oh, who cares - it doesn't bother me".

Some of us who do run mirrors say "actually, that sort of thing is important and an actual issue.".

Others reply "then you're doing it wrong".   But nobody came up with something reality based that'd be "right".


The main point here is that we can't use 20 inodes per distribution.  It's Just Nuts.   Sure, it's only something like 400k files/inodes now - but at the rate it's going it'll be a lot more soon enough.

HOWEVER: Right now more of those are wasted on other things (.readme files, symlinks, ...) -- some of which have solutions in progress already.

I don't think anyone is arguing that we NEED to delete the old distributions; only that they do indeed have a cost to keep around in the main CPAN.


 - ask

0
ask
3/31/2010 11:43:54 AM
On 30/03/2010 20:33, Arthur Corliss wrote:
> On Tue, 30 Mar 2010, Matija Grabnar wrote:
>
>> Er, not exactly. Read
>> http://www.cvsup.org/howsofast.html
>
> I had read http://www.cvsup.org/faq.html#features item #3.
>
>> From what I can see, cvsup uses the rsync algorithm on a file-by-file
>> basis (it uses just the differential send part of the rsync
>> algorithm). It doesn't rsync the whole tree, which was what I
>> understood to be the original problem (wasn't the complaint about the
>> flood of stats?).
>
> Sounds like I may have interpreted the FAQ incorrectly, then. Thanks for
> pointing that out. I have a few questions, though: the explanation says:
>
> "At the same time, the Tree Differ generates a list of the server's
> files."
>
> That seems to imply that it's doing the exact same thing as rsync, so
> all the stats are still present on the server, right?
>
> Nowhere do I see it mentioning that the daemon is maintaining state between
> requests. The primary speed-up (beyond special file update handling) is
> better use of bidirectional bandwidth.

Well I do know the client has a .sup file that runs into the dozens of 
megabytes for each kernel tree you track.

If you want to avoid a stat storm you are going to trade stats for disk 
space, by way of a cache. And that may be what these sup files are, but 
it may also be a red herring (they look like CVS descriptors).

I've never dived into the protocol. The fact that the client is written 
in Modula-3 scares me.

> Do you have access to a cvsup server so you can verify its behavior?

On any FreeBSD machine that syncs the kernel tree. If you're stuck, send 
me an SSH public key (RSA >= 2048 if possible, non-blank local pass 
phrase) and I shall set you up.

David

-- 
There's bum trash in my hall and my place is ripped
I've totaled another amp, I'm calling in sick
0
david
3/31/2010 2:42:27 PM
On 31/03/2010 06:52, David Nicol wrote:

> new proposal: Make modules "pay rent" in order to remain on a mirror.
> Rent could be in the form of actual user interest, or good reviews.
>
> Use as a dependency could count as rent.

Put a value tag on things and people will game the system to ensure 
their files are up on top. Doomed to failure.

David

-- 
There's bum trash in my hall and my place is ripped
I've totaled another amp, I'm calling in sick
0
david
3/31/2010 2:45:15 PM
On Wed, Mar 31, 2010 at 10:45 AM, David Landgren <david@landgren.net> wrote:
> On 31/03/2010 06:52, David Nicol wrote:
>
>> new proposal: Make modules "pay rent" in order to remain on a mirror.
>> Rent could be in the form of actual user interest, or good reviews.
>>
>> Use as a dependency could count as rent.
>
> Put a value tag on things and people will game the system to ensure their
> files are up on top. Doomed to failure.

I'm not suggesting that there be any kind of who-is-on-top game, the
game is who falls out the bottom. If someone cares enough to want to
game the system to ensure their files don't fall out, those files will
surely stay.  "pay rent" here is intended to mean something like
tracking usage over a long period in order to authoritatively identify
"old and useless" based on metrics and a policy.  Especially combined
with a Dnews-like trick file server that's really a cache and only
stores things people actually ask it for, which responds to the OP's
pain as I understand it, which is a frustration that their CPAN mirror
contains a lot of cruft. Although it still isn't clear why that is a
problem.

Purpose-based partitioning could be performed like deferred sidewalks:
put the pavement where the students make the trails in the grass.
0
davidnicol
3/31/2010 6:20:44 PM
On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen <ask@perl.org> wrote:
> The main point here is that we can't use 20 inodes per distribution.

so don't. How much reengineering would be needed to keep CPAN in a
database instead of a file system?
0
davidnicol
4/1/2010 4:39:27 AM
On Thursday 01 April 2010 05:39:27 David Nicol wrote:
> On Wed, Mar 31, 2010 at 7:43 AM, Ask Bjørn Hansen <ask@perl.org> wrote:
> > The main point here is that we can't use 20 inodes per distribution.
>
> so don't. How much reengineering would be needed to keep CPAN in a
> database instead of a file system?

It'd mean each and every mirror operator changing how they sync their mirrors,
and how access is provided...

Currently, it's dead simple to sync a copy of CPAN via rsync, offer it up via
whatever combination of HTTP, FTP and rsync you prefer, and job done - you're
doing a valuable public service by offering a CPAN mirror.

Make that process a lot harder (setting up database replication, custom
scripts, etc etc) and a lot of people just won't do it.

There's a lot to be said for keeping things simple.

(FWIW, I run mirrors.uk2.net, and appreciated the fact it was simple and easy
to get a mirror up and running without investing much time at all.
Personally, I have no real problem with the current size of CPAN or the
overhead of updating via rsync, but that's just my opinion.)

Cheers

Dave P
0
davidp
4/1/2010 9:50:16 AM
Much of this discussion is beyond my depth but in terms of keeping it
simple, and trying to limit the stat calls on the upstream servers,
what about DNS as a replication model?  You could break up the tree at
logical divisions similar to zones and assign them serial numbers
(say a .serial file) and then still use rsync, but broken up into modules to
avoid recursion into sub-trees where the serial number is up to date?
The rsyncd.conf could be published also so replicas use the same
include/exclude logic.
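
A rough sketch of the serial-number comparison, assuming each subtree carries
a .serial file holding a single integer that the upstream bumps whenever
anything below it changes. The helper names (read_serial, subtrees_to_sync)
and the dictionary layout are hypothetical; how the upstream serials get
published is left open.

```python
import os


def read_serial(subtree_dir):
    """Return the integer in <subtree_dir>/.serial, or -1 if there is none."""
    try:
        with open(os.path.join(subtree_dir, ".serial"), encoding="ascii") as fh:
            return int(fh.read().strip())
    except FileNotFoundError:
        return -1


def subtrees_to_sync(local_serials, upstream_serials):
    """Return the subtrees whose upstream serial is newer than the local one.

    Both arguments map subtree name -> serial number. A subtree missing
    locally counts as out of date.
    """
    return sorted(
        name
        for name, serial in upstream_serials.items()
        if local_serials.get(name, -1) < serial
    )
```

Only the subtrees returned here would need an rsync run at all, much as a
secondary DNS server compares SOA serials before deciding whether to transfer
a zone.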
-lee

0
leakin
4/1/2010 4:16:23 PM
On Wed, 31 Mar 2010, Ask Bjørn Hansen wrote:

<snip>

> Everyone who doesn't run mirrors says "oh, who cares - it doesn't bother me".
>
> Some of us who does run mirrors say "actually, that sort of thing is important and an actual issue.".
>
> Others reply "then you're doing it wrong".   But nobody came with something reality based that'd be "right".

Some revisionist history here.  I run mirrors (not CPAN) and know full well
the limitations and inefficiencies of rsync.  To date, not one of you has
been able to refute that for this scale rsync is hurting you.  But most of
you have been obstinately against finding a more efficient way of doing things.

I've made a viable suggestion, and offered some time to work on it.  But
you've made it abundantly clear that it's not welcome.

> The main point here is that we can't use 20 inodes per distribution.  It's Just Nuts.   Sure, it's only something like 400k files/inodes now - but at the rate it's going it'll be a lot more soon enough.

That's a problem, but not likely the biggest drag on server I/O you're
suffering.  Might that be <ahem> rsync?

> HOWEVER: Right now more of those are wasted on other things (.readme files, symlinks, ...) -- some of which have solutions in progress already.
>
> I don't think anyone is arguing that we NEED to delete the old distributions; only that they do indeed have a cost to keep around in the main CPAN.

You're right, I'm not arguing the need for the cruft.  I've only pointed out
the obvious reality that trimming files only postpones the I/O management
issues that at some time are likely going to have to be addressed, anyway.
And that you'll get less bang for the buck (or man hour) by treating the
symptoms, not the disease.

For the record:  if that's what you want to do, have at it.  Let's just not
be disingenuous about the fact that we're abrogating our responsibilities as
technologists by refusing to address the real problems and weaknesses of the
platform.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
4/1/2010 5:49:05 PM
On Apr 1, 2010, at 19:49, Arthur Corliss wrote:

> I've made a viable suggestion, and offered some time to work on it.  But
> you've made it abundantly clear that it's not welcome.

Talk = ZzZz.
Code = Interesting.
Deployment = Useful.


  - ask

0
ask
4/1/2010 10:58:41 PM
On Apr 1, 2010, at 19:49, Arthur Corliss wrote:

I can't believe I'm doing this, but ...

>> The main point here is that we can't use 20 inodes per distribution.  =
It's Just Nuts.   Sure, it's only something like 400k files/inodes now - =
but at the rate it's going it'll be a lot more soon enough.
>=20
> Thats a problem, but not likely the biggest drag on server I/O you're
> suffering.  Might that be <ahem> rsync?

That reply doesn't even make sense.

>> HOWEVER: Right now more of those are wasted on other things (.readme
>> files, symlinks, ...) -- some of which have solutions in progress
>> already.
>>
>> I don't think anyone is arguing that we NEED to delete the old
>> distributions; only that they do indeed have a cost to keep around in
>> the main CPAN.
>
> You're right, I'm not arguing the need for the cruft.  I've only pointed out
> the obvious reality that trimming files only postpones the I/O management
> issues that at some time are likely going to have to be addressed, anyway.
> And that you'll get less bang for the buck (or man hour) by treating the
> symptoms, not the disease.
>
> For the record:  if that's what you want to do, have at it.  Let's just not
> be disingenuous about the fact that we're abrogating our responsibilities as
> technologists by refusing to address the real problems and weaknesses of the
> platform.

You are confusing "we", "I" and "you" again.

.....

Yes, I (and I'm guessing everyone else who has thought about it for
more than say 5 seconds) agree that having rsync remember the file tree
to save the disk IO for each sync sounds like an "obvious solution".

But reality is more complicated.  If it was such an obviously good
solution someone would have done it by now.  (For starters play this
question: "What is the kernel cache?").

Andreas' solution is much more sensible -- and as has been pointed out
before we DO USE THAT; but the problem here is not with clients who are
interested enough to do something special and dedicate resources to
their CPAN mirroring.


 - ask

0
ask
4/1/2010 11:13:33 PM
On Apr 2, 2010, at 1:50, Arthur Corliss wrote:

> And my assertion has been that the excessive stats by the server are a bigger
> impediment to synchronization than the inode count.

Well, then one of us doesn't understand how file systems etc work.  :-)


  - ask
0
ask
4/2/2010 2:37:02 PM
> It hasn't been done because it's outside of the scope of design for rsync.
> It's meant to sync arbitrary filesets in which many, if not all, changes are
> made out of band.  It's decidedly non-trivial to implement in that mode
> unless you're willing to accept a certain window in which your database may
> be out of date.
>
> But, in a situation like PAUSE, where the avenues in which files can be
> introduced into the file sets are controlled, it does become trivial.  It's
> the gatekeeper, it knows who's been in or out.
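
[Illustration of the "gatekeeper" idea quoted above. This is only a minimal
sketch under stated assumptions, not anything PAUSE or rsync actually does:
the Gatekeeper and Mirror classes, the journal layout and the file paths are
all hypothetical. It shows how a mirror could apply just the changes recorded
since its last sync instead of having the server stat the whole tree.]

```python
from dataclasses import dataclass, field


@dataclass
class Gatekeeper:
    """Hypothetical stand-in for PAUSE: every add/delete passes through it."""
    journal: list = field(default_factory=list)  # entries: (serial, action, path)

    def record(self, action, path):
        # Serials increase by one per change, so a mirror can resume from any point.
        serial = len(self.journal) + 1
        self.journal.append((serial, action, path))
        return serial

    def changes_since(self, serial):
        # Return only the entries newer than the serial the mirror last applied.
        return [entry for entry in self.journal if entry[0] > serial]


@dataclass
class Mirror:
    """Hypothetical mirror that tracks the last journal serial it applied."""
    files: set = field(default_factory=set)
    last_serial: int = 0

    def sync(self, gatekeeper):
        changes = gatekeeper.changes_since(self.last_serial)
        for serial, action, path in changes:
            if action == "add":
                self.files.add(path)
            elif action == "delete":
                self.files.discard(path)
            self.last_serial = serial
        return len(changes)  # number of journal entries applied


if __name__ == "__main__":
    pause = Gatekeeper()
    mirror = Mirror()
    pause.record("add", "authors/id/E/EX/EXAMPLE/Foo-1.00.tar.gz")  # made-up path
    pause.record("add", "authors/id/E/EX/EXAMPLE/Foo-1.01.tar.gz")  # made-up path
    print(mirror.sync(pause), sorted(mirror.files))
```

[The cost of a sync in this sketch depends on the number of changes since the
last run rather than on the number of files in the archive. The window in
which the journal can be stale, raised in the quoted text, is ignored here.]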

so the requirements for the Solution To The Problem Which Solves A
More General Problem Than The Immediate Problem And Will Therefore
Make Whoever Sets It Up A Hero include a replacement for the current
mirroring technology stack that is tailored to mirroring distributions
possibly including on-demand caching and expiration and that is
trivial to install -- something like

  perl -MCPAN -e 'install STTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite'
  nohup nice nice perl -MSTTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite -e 'mirror cpan.org .' &
0
davidnicol
4/4/2010 9:11:11 PM
On Sun, 4 Apr 2010, David Nicol wrote:

> so the requirements for the Solution To The Problem Which Solves A
> More General Problem Than The Immediate Problem And Will Therefore
> Make Whoever Sets It Up A Hero include a replacement for the current
> mirroring technology stack that is tailored to mirroring distributions
> possibly including on-demand caching and expiration and that is
> trivial to install -- something like
>
>  perl -MCPAN -e 'install STTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite'
>  nohup nice nice perl -MSTTPWSAMGPTTIPAWTMWSIUAH::Mirrorsuite -e 'mirror cpan.org .' &

Gee, kind of looks like your tongue got superglued to your cheek.  You're
mischaracterizing the problem.  The immediate problem *is* the I/O load
caused by synchronizing mirrors with rsync, *not* supporting CPAN clients,
right?  If you have data indicating something different, then please provide
it so we can all get educated.

Regardless, it should be that easy to install, but it should also install a
script into bin/ to make ye ole cron job just as succinct as what's
currently being used with rsync.

 	--Arthur Corliss
 	  Live Free or Die
0
corliss
4/5/2010 4:24:19 PM