
Saving cache pages (was: Different types of geocaching.com users)


JusticeMercy

Recommended Posts

I have a question for people who use features that aren't available on the site by saving the data and developing their own tools to meet their needs (route planning, multiple home points à la GPXSpinner, imports to mapping software, or syncing with PDAs) without going the standard PQ -> GPX -> ? route. Assuming their bandwidth wouldn't be any greater than a casual user's, because they're viewing the same number of cache pages and just saving them for later (using GeoClipper, good ole' Save As, or another web-capture tool like Plucker or AvantGo), is this use welcome? I have never read that people are unwelcome to save their page views, but I get that impression from the general make-up of the site. For instance, a &num=50 flag on the search, similar to Google's, would make it much easier to get a decent .loc file without having to merge five little ones, but I've always assumed GC.com didn't want people to see that much information on their screen, because then they could save it. If you just save pages you'd look at anyway, is this OK?

 

[Message has been edited for clarity]

 

[This message was edited by PointingTheWay on July 08, 2003 at 09:04 AM.]


In short, I was wondering what percentage of users cause what percentage of load on the servers?

 

Also, would adding a flag like &num=XX to pull down a larger .loc file increase the server load?

 

I should have just said that to start with. :rolleyes:

 

Thanks, PTW


I'm not sure of the question either, but I fall into the second and fourth categories. In doing so, I probably use LESS bandwidth than active non-members.

 

Here's why: as a member, I can get up to five Pocket Queries, which eliminate the need for me to search around the site looking for a cache (or, more likely, five or six caches) to find. That probably substitutes for several dozen page views, more if you consider just how many waypoints I take with me. Once I have my GPX files, I'm not using ANY bandwidth from the GC website until I log my cache finds or travel bugs.

I run my own custom software on the sets of GPX files, which reduces them to a single GPX file containing around 600 or 700 waypoints: clumps of caches near the areas I visit most, plus (here's the really cool part!!) all caches within 1 mile of the 17 highways along which I frequently travel. My GPS takes up to 1000 waypoints, so I just dump them all in. Meanwhile, my software also creates a specially formatted HTML file for each cache and builds a menu structure for easy navigation. Then Plucker converts that entire directory structure into a single document, which is loaded into my Palm Tungsten T.
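The highway-corridor filter described above can be sketched in a few lines. This is a hypothetical reconstruction, not the poster's actual software: the data layout and function names are made up, and a real route would be sampled at many points, not one.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in statute miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def caches_near_route(caches, route_points, radius_miles=1.0):
    """Keep only caches within radius_miles of any sampled route point."""
    return [
        c for c in caches
        if any(haversine_miles(c["lat"], c["lon"], lat, lon) <= radius_miles
               for lat, lon in route_points)
    ]

# Hypothetical sample data: a route sampled at one point, two candidate caches.
route = [(47.6062, -122.3321)]
caches = [
    {"name": "GC0001", "lat": 47.6100, "lon": -122.3300},  # well under a mile away
    {"name": "GC0002", "lat": 47.7000, "lon": -122.0000},  # many miles away
]
near = caches_near_route(caches, route)
```

Run against every waypoint from the merged Pocket Queries, a filter like this trims thousands of caches down to the few hundred that actually sit along the corridors you drive.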

 

So, I probably fall quite high on "Jeremy's happiness meter" since I'm not using large amounts of bandwidth to get my huge list of waypoints.


quote:
Originally posted by PointingTheWay:

Assuming, that their bandwidth wouldn't be any greater than the casual user because they are seeing the same number of cache pages, just saving it for later (using geoclipper, good ole' Save as, or another web-capture tool like Plucker or AvantGo) is this use welcome?


Unfortunately, in our experience, this assumption doesn't hold true. Save As is obviously not a problem, but automated tools that grab data off the site never simulate a user's actual browsing; they're always much harder on the server than a real person. The reason is that these apps need some way to know which pages to grab, so you give them a start page and then a "depth". Any depth greater than 1 or 2 causes an enormous number of pages to be queued for download. And in the case of AvantGo, in an effort to increase performance, it queues up multiple servers to process the requests for a channel. I've seen 14 AvantGo servers hitting our site at one time. So take one AvantGo user with a channel depth of 3 and you get somewhere on the order of 5000 pages to download. AvantGo spawns 10 servers, each grabbing 500 pages as fast as they can. Ouch! And I know that some people have used larger depths just to see what they get.
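The depth numbers above follow from simple geometric growth: if every page links to roughly `b` other pages, a crawl of depth `d` queues about 1 + b + b² + … + b^d pages. The branching factor of 17 below is an assumed figure chosen to illustrate the order of magnitude, not a measured property of the site.

```python
def pages_queued(branching_factor, depth):
    """Total pages queued by a naive crawl: 1 + b + b^2 + ... + b^depth."""
    return sum(branching_factor ** d for d in range(depth + 1))

# With ~17 links per page (assumed), depth 3 queues
# 1 + 17 + 289 + 4913 = 5220 pages -- the ~5000 figure mentioned above.
depth_3_total = pages_queued(17, 3)
```

Each extra level of depth multiplies the load by the branching factor, which is why a depth of 1 or 2 is tolerable but 3 or more is not.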

 

Another example happened last week when the server blocked someone's IP address for excessive requests. When he contacted me, I asked him if he was doing anything that might have caused it. He mentioned that he was trying out a new shareware application to grab cache pages. Unfortunately, the app had a bug in it so it ran continuously. He thought it might just be slow, so he just let it run for a while. I'm not sure how long he was running that tool before the server blocked him, but basically in a matter of minutes, he had downloaded 300 MB of data from the site. It was clearly an honest mistake and he had no way of knowing this might happen, but that's typical of what we're up against with automated tools.

 

Pocket Queries are designed to use cached (no pun intended) versions of the cache pages. We can send out thousands of PQs with 300 caches each with little to no impact on the web server or database. That's why we so heavily encourage their use. And as you pointed out, since they're GPX files, it makes it easy for you to use the data in a way that works best for you.

 

quote:
In short, I was wondering what percentage of users cause what percentage of load on the servers?

This may seem a little out of whack, but with the few imprecise indicators we have I'd estimate that 5% of the users cause over 30% of the server load.

 

Elias


quote:
Originally posted by Elias:

This may seem a little out of whack, but with the few imprecise indicators we have I'd estimate that 5% of the users cause over 30% of the server load.

 



 

So, then 5% of cheap, inconsiderate users cause geocaching.com to slow to a crawl or stop every weekend for the other 95% of us. Great!

 

Tae-Kwon-Leap is not a path to a door, but a road leading forever towards the horizon.


quote:
Originally posted by Elias:

This may seem a little out of whack, but with the few imprecise indicators we have I'd estimate that 5% of the users cause over 30% of the server load.


For the record, my use of Plucker and my own tools is limited ONLY to GPX files already in my email inbox. The only exception is a CGI script that scrapes a single page (the California state page) looking for nearby caches with FTF potential. It runs only when I manually request it, which is not very often, and since it doesn't download the graphics, it's actually cheaper in terms of bandwidth than a normal browser. Being a web administrator myself, I know what NOT to do on other people's websites. I wish others had the same consideration...
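A single-page, HTML-only scrape like the one described above amounts to parsing one page's markup for cache links and never requesting the images. This is a hypothetical sketch; the HTML snippet and `cache_details` URL pattern are made up for illustration, and a real script would fetch the page over HTTP first.

```python
from html.parser import HTMLParser

class CacheLinkParser(HTMLParser):
    """Collect hrefs of cache-detail links; ignore everything else (img, css...)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "cache_details" in href:
                self.links.append(href)

# Illustrative page fragment: two cache links and one image that is never fetched.
snippet = """
<html><body>
  <a href="/seek/cache_details.aspx?ID=1234">New cache A</a>
  <img src="/images/big_banner.gif">
  <a href="/seek/cache_details.aspx?ID=5678">New cache B</a>
</body></html>
"""
parser = CacheLinkParser()
parser.feed(snippet)
```

Because only the HTML document itself is ever transferred, one run of a script like this costs less bandwidth than a browser loading the same page with all its graphics.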


quote:
Originally posted by Elias:

Pocket Queries are designed to use cached (no pun intended) versions of the cache pages. We can send out thousands of PQs with 300 caches each with little to no impact on the web server or database.


If that is the case, then why haven't we seen the limits bumped up, as so many people have requested?
This topic is now closed to further replies.