Overview of Web Site Traffic Analysis Tools

The first of the year is a time for looking back at what was accomplished in the past year and for setting goals for the coming one. If you run a web site, one way to do that is to look at your site’s traffic and see what happened. I run my own server, so I have access to my complete web server logs. Historically I’ve always used Webalizer to run my stats.

But is Webalizer still the tool I should be using? There are other tools out there, so I looked at six different applications that generate reports from web server logs. The tools I looked at are: Analog, AWStats, PWebStats, Visitors, Webalizer, and W3Perl.

Analog (Final Grade: B)
  • Slightly quirky command-line syntax
  • Had to manually copy the images to the output directory
  • Output is a single HTML file

AWStats (Final Grade: A-)
  • Command-line configuration is intimidating
  • Each month must be processed separately, in a two-step process: one pass to read the logs, another to generate the reports (see the sketch after this table)
  • Command-line installation is quirky, with numerous path headaches
  • Gives the best, most detailed reports
  • Easily scriptable with shell scripts
  • 22 HTML files created (per month of data) at the highest level of detail

PWebStats (Final Grade: D)
  • Ugh: lots of configuration work
  • Tedious to work on
  • Doesn’t seem to be able to easily produce historical reports
  • Outputs many files

Visitors (Final Grade: B+)
  • Simple configuration; works immediately after installation
  • Hardcoded Google traffic report; no other search engines included
  • Lots of command-line options and no configuration file, so a good candidate for scripting once you figure out which options you want to use
  • Report is a single file with no graphic files needed; the images in the report are actually shaded table cells
  • Apparently handles data from multiple years

Webalizer (Final Grade: B)
  • Requires a config file; command-line options available
  • Creates easy-to-read charts and reports
  • Outputs three files per month (one HTML page and two images) plus a 12-month summary (one HTML page and one image)
  • Does not keep annual reports: once any part of 2008 has been analyzed, the 2007 annual report is overwritten, so you need to manually rename and tweak it after all of December has been processed

W3Perl (Final Grade: I)
  • Difficult to install and set up for offline usage
  • I was never able to get it to find the language files, so it never ran
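As the AWStats entry notes, the two-step run is easily scriptable with shell scripts; the sketch below shows the same idea in Python. The config name, install paths, and output directory are my assumptions to adjust for your own setup; awstats.pl with -update and the bundled awstats_buildstaticpages.pl tool are AWStats’ actual entry points.

    #!/usr/bin/env python
    """Automate AWStats' two-step process: update the data files,
    then build static report pages for each month of 2007.
    All paths and the config name below are assumptions; change
    them to match your installation."""
    import subprocess

    CONFIG = "example.com"                    # hypothetical config name
    AWSTATS = "/usr/lib/cgi-bin/awstats.pl"   # assumed install path
    BUILD = "/usr/share/awstats/tools/awstats_buildstaticpages.pl"
    OUTDIR = "/var/www/stats"                 # assumed report directory

    # Step 1: read the new log records into AWStats' data files.
    subprocess.check_call(["perl", AWSTATS, "-config=" + CONFIG, "-update"])

    # Step 2: generate the static HTML reports, one month at a time.
    for month in range(1, 13):
        subprocess.check_call([
            "perl", BUILD,
            "-config=" + CONFIG,
            "-month=%02d" % month, "-year=2007",
            "-awstatsprog=" + AWSTATS,
            "-dir=" + OUTDIR,
        ])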

Notes:

  1. I was testing offline processing. Some of these tools can be installed to run directly from a web site’s cgi-bin (W3Perl apparently prefers that setup).
  2. I ran these reports on the log of one of my rarely used domains, which is also used for exploring software. The log file had 7,746 records in it for 2007.
  3. I edited the sample reports by hand to remove spam referral links, as I don’t want to link to “bad places.”
  4. Processing speed can be important, but generally reporting tools like these are scripted and run overnight, so I did not track processing time. When I ran these programs on my larger sites’ logs (PlanetMike.com, ChristmasMusic247.com and ShowBizRadio.net) they all finished in an acceptable amount of time.
  5. The sample reports show default options. Read the tool’s docs for details on customizing your reports.
  6. Final Grade is entirely subjective, my own opinion.

Keep in mind that reading server logs is a black art: each of these tools makes its own assumptions about the data. A key assumption is how to define a visit. To one tool, requests from the same IP address and user agent within 30 minutes of each other are a single visit; to another tool, the same data may be two visits. Consider this example: a user at the IP address 257.258.259.260 visits a site at 10:00pm, reads that page, and then follows links on the site, with pages accessed at 10:10pm, 10:50pm, 11:00pm, 11:10pm, 11:45pm, 12:05am, and 12:10am. Webalizer would report one site and three visits. Visitors would report two unique visitors.
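To make the 30-minute rule concrete, here is a minimal sketch of the session-timeout logic in Python. It is my own illustration, not code from any of these tools:

    from datetime import datetime, timedelta

    TIMEOUT = timedelta(minutes=30)  # a common, but not universal, choice

    def count_visits(timestamps):
        """Count visits for one IP/user-agent pair: any gap longer
        than TIMEOUT between consecutive requests starts a new visit."""
        visits = 0
        last = None
        for t in sorted(timestamps):
            if last is None or t - last > TIMEOUT:
                visits += 1
            last = t
        return visits

    # The example from the text: eight requests spanning midnight.
    hits = [datetime(2007, 12, 31, 22, 0),  datetime(2007, 12, 31, 22, 10),
            datetime(2007, 12, 31, 22, 50), datetime(2007, 12, 31, 23, 0),
            datetime(2007, 12, 31, 23, 10), datetime(2007, 12, 31, 23, 45),
            datetime(2008, 1, 1, 0, 5),     datetime(2008, 1, 1, 0, 10)]
    print(count_visits(hits))  # 3, the Webalizer-style answer

A tool that instead counts unique visitors per calendar day would split the same data at midnight, which is how the same log yields two visitors in Visitors.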

And that raises the issue of defining terms. Someone (or something) that accesses a site may be called a visitor, a host, or a site. Hits may be called hits, requests, or accesses. Some tools count as hits only the HTML pages served; others count every file the server delivers. So you can’t meaningfully compare stats from one tracking package to another until you understand what each one is actually reporting.

Another major issue is removing robots, spiders, and crawlers from your reports. Most webmasters aren’t interested in how many automated critters are devouring their site; they only want to know how many people are reading their articles. That’s where third-party embedded tools come into play, and I will discuss those tools next week. The one time you do want to know about automated traffic is when you’re trying to identify Bad Things: spammers, thieves, crackers, and other abusers are out there leaving fingerprints in your log files, and third-party tools can’t help you when you’re looking for bots.
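A rough first cut at separating bots from people can be made on the user-agent string. This sketch is my own illustration, not something these tools ship with; the pattern list is deliberately tiny, and real bot lists are long and change constantly:

    import re

    # A small, incomplete set of user-agent substrings; in practice
    # you would maintain a much longer list.
    BOT_PATTERN = re.compile(r"googlebot|slurp|msnbot|crawl|spider|bot",
                             re.IGNORECASE)

    def split_log(lines):
        """Separate human and bot requests in Apache 'combined'
        log format, where the user agent is the last quoted field."""
        humans, bots = [], []
        for line in lines:
            parts = line.rstrip().rsplit('"', 2)   # ... "referer" "agent"
            agent = parts[-2] if len(parts) == 3 else ""
            (bots if BOT_PATTERN.search(agent) else humans).append(line)
        return humans, bots

Matching on user agents only catches bots that identify themselves honestly; the abusers described above usually fake a browser user agent, so finding them means looking at request patterns, not just names.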

Conclusion

Each of these tools has a place in a webmaster’s toolbox. I will continue to use Webalizer for my public traffic report for PlanetMike.com. And for my own knowledge of how my sites are doing, I’ll probably use Webalizer, Visitors, and AWStats.

Update: September 8, 2009: I’ve removed the sample reports I generated. They were attracting a huge amount of attention from referer log spammers.

One Comment

  1. Bubba says:

    I DO NOT like Webalizer.
    Main reasons:
    1) Stats are not very accurate
    2) Refspam through Webalizer logs

    Refspam is popular in my country, and if they do it often enough, a site running Webalizer may be DDoSed.

    That’s what I think