Inside Paulo Abrantes' head

Log Analyzer: Integration of awstats with the Apache http server

Created by pabrantes. Last edited by pabrantes, 2 years and 62 days ago. Viewed 1,350 times. #1

Lately I've been asking myself some questions like:
  1. How many daily visits am I getting?
  2. Where do those visits come from?
  3. Which pages are the most viewed?
  4. ...
You know, the usual questions you ask yourself when you have a website.

I had noticed a traffic increase in the AdSense reports, but the information wasn't very useful: all I could see was that more ads were being displayed. That didn't tell me for sure that I was getting more visitors, so I decided to install a log analyzer for Apache.
After some searching I decided to go for awstats.

In this post you'll find an introduction to what awstats is and what it can do, along with how to deploy it.

Keep in mind that the solutions presented here are the ones I designed. They are probably similar to some out there and very different from others. It's up to you to see whether they fit your needs or whether you have to adapt them.

The following subjects will be presented:

  • Before starting
  • Presenting awstats
  • Installing awstats
  • Configuring Virtual Hosts in Apache
  • Configuring awstats
  • Final notes

Before Starting

You need to have installed the following software:

  • Apache HTTP Server: other http servers will also work with awstats, but I'll be using Apache in my examples since it's the one I use.
  • Perl 5.00503 or higher: you probably already have perl installed with your linux distro, and it will be higher than 5.00503.
  • Awstats: if I manage to convince you that awstats is a nice webstats tool and you want to install it.

There are no special compilation options for Apache, since the directives we'll be using (<VirtualHost> and <Directory>) are part of the core system.

Presenting awstats

Awstats is a perl script that reports your website statistics using the server's access log. It also supports ftp and mail logs, although that is outside the scope of this post.

It can operate as a cgi-bin script within your http server, or it can run in batch mode and generate static HTML files.

Awstats supports the following features (this is not an exhaustive list; for one, please check the awstats comparison with other log analyzers):

  • Unique visitors, number of visits, number of pages retrieved and bandwidth used, by month and by day of the month
  • Top day of the week, top hour of the day
  • Countries and hosts that have been accessing your site (with some plugins you can even show the city)
  • Spiders that have crawled your website
  • Visit Duration
  • Top pages requested
  • Operating systems of the visitors
  • Browsers of the visitors
  • Referrers
  • Search terms

Installing awstats

Installing it is pretty straightforward. First, download awstats.

Untar the file into your /usr/local directory, then run perl awstats_configure.pl and answer the simple questions. When you finish this process you'll have a configuration file in /etc/awstats/awstats.YOUR_HOST.conf

It has probably also added a few lines to httpd.conf, or generated a file with lines that it asks you to add. For now we'll ignore those lines.

You'll probably be interested in at least one plugin, called geoipfree, which lets awstats report more accurately which country a given IP address belongs to.

So you can download GeoIPfree and install it.

Now before we configure awstats, we'll first configure the virtual hosts in apache.

Configuring Virtual Hosts in Apache

The idea is to set up two different virtual hosts: one that holds your website and another that displays the stats. This isn't strictly necessary, but I prefer it this way.

Imagine that your apache installation is located at /usr/local/apache, your log directory is logs and your httpd documents directory is htdocs.
Your vhost configuration has to look something like this:

NameVirtualHost *

<VirtualHost *>
ServerName www.serverdomain.tld
DocumentRoot /usr/local/apache/htdocs/mainsite/
ErrorLog /usr/local/apache/logs/mainsite-errors.log
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
TransferLog /usr/local/apache/logs/mainsite-access.log
</VirtualHost>

<VirtualHost *>
ServerName awstats.serverdomain.tld
DocumentRoot /usr/local/apache/htdocs/stats/
ErrorLog /usr/local/apache/logs/stats-errors.log
TransferLog /usr/local/apache/logs/stats-access.log
</VirtualHost>

The only line that probably left you thinking is the LogFormat one. That format is the NCSA combined log format, a widely adopted standard. With it you get details such as the referrer, browser and operating system of the clients accessing your website.
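
To see what those extra fields look like in practice, here is a small sketch that pulls the referrer and the user agent out of one log line in this format. The sample line and the function name show_referrer_and_agent are made up for illustration, and it assumes the quoted fields contain no escaped double quotes.

```bash
#!/bin/bash

# Hypothetical helper: print the referrer and user agent of each
# combined-format log line read from stdin.
# Splitting on double quotes gives: $2 = request, $4 = referrer, $6 = user agent.
show_referrer_and_agent() {
    awk -F'"' '{ print "referrer=" $4; print "agent=" $6 }'
}

# Made-up sample line in the format configured in the vhost above.
sample='192.168.0.10 - - [24/May/2006:10:15:32 +0100] "GET /index.html HTTP/1.1" 200 5120 "http://www.example.com/links.html" "Mozilla/5.0 (X11; Linux i686)"'

printf '%s\n' "$sample" | show_referrer_and_agent
```

This is only meant to show what is in the log; awstats does its own, far more thorough parsing.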

We might want to restrict access to our statistics, though, so we'll configure an access policy for the awstats vhost. To do that you need the <Directory> directive.

So inside the awstats VirtualHost we add the following:

<Directory /usr/local/apache/htdocs/stats/>
Order Deny,Allow
Allow from 192.168.0.0/24
Deny from all
</Directory>

With this configuration only IPs in the private network 192.168.0.0/24 will be allowed to access the awstats reports.

Configuring awstats

Open the configuration file located in /etc/awstats and make sure that:

  1. LogFile is set to your mainsite TransferLog file
  2. LogType is set to W
  3. LogFormat is set to 1 (the NCSA combined format)
  4. SiteDomain is set to your site domain
  5. AllowToUpdateStatsFromBrowser is set to 0 (not that we will actually use it)
  6. DNSLookup is set to 1 (this means that DNS lookup is fully enabled and every IP address will be resolved to a hostname. If you have a high traffic site, this might take a while to process)
  7. LoadPlugin="geoipfree" is uncommented
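
If you'd rather not eyeball the file, the simple Key=Value settings from the checklist can be verified with a few lines of shell. This is just a sketch: check_setting is a helper I made up, the default path is the placeholder used earlier in this post, and it only looks at the last uncommented assignment of each key (so it is not suited to LoadPlugin, which can appear several times).

```bash
#!/bin/bash

# Hypothetical helper: report whether an awstats config file sets KEY to
# the expected value, ignoring quotes and spaces around the value.
check_setting() {
    local file=$1 key=$2 expected=$3 actual
    # Take the last uncommented assignment of the key (comments start with #).
    actual=$(grep -E "^[[:space:]]*${key}[[:space:]]*=" "$file" | tail -n 1 \
             | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//; s/^"(.*)"$/\1/')
    if [ "$actual" = "$expected" ]; then
        echo "ok: $key=$actual"
    else
        echo "CHECK: $key is '$actual', expected '$expected'"
        return 1
    fi
}

# Path is an assumption; pass the file awstats_configure.pl created for you.
conf=${1:-/etc/awstats/awstats.YOUR_HOST.conf}

if [ -r "$conf" ]; then
    check_setting "$conf" LogType W
    check_setting "$conf" LogFormat 1
    check_setting "$conf" AllowToUpdateStatsFromBrowser 0
    check_setting "$conf" DNSLookup 1
fi
```

LogFile and SiteDomain depend on your own setup, so they are left out of the automatic checks.
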
Now we have to set up a cron job that will update the reports. You can choose daily or hourly; that's up to you.

You now have to choose between generating static HTML and running the cgi-bin script. The first is more secure but less flexible: if, for example, you want to check the daily visits for a previous month, you'll have to generate a new report by hand, whereas with the cgi-bin you just pick the month you want from a drop-down box. On the other hand, awstats has had security problems in the past, so it's up to you to decide which one to choose.

Let's first configure for a static output.

Write the following script:

#!/bin/bash

perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=YOUR_DOMAIN -update -output -staticLinks > /usr/local/apache/htdocs/stats/index.html

Store it in the cron.daily or cron.hourly directory so it gets executed daily or hourly. Don't forget to give it execute permission.
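
Copying the script into place can be sketched like this. The helper name install_cron_script and the file names are my own assumptions, and so is the location of your cron directory (commonly /etc/cron.daily or /etc/cron.hourly).

```bash
#!/bin/bash

# Hypothetical helper: copy an update script into a cron directory and
# make it executable. Both arguments depend on your own layout.
install_cron_script() {
    local script=$1 cron_dir=$2
    # Name the copy without a dot: run-parts on Debian-style systems
    # skips file names that contain characters such as ".".
    local target="$cron_dir/awstats-update"
    cp "$script" "$target" && chmod 755 "$target" && echo "$target"
}

# Example (run as root): install_cron_script ./awstats-update.sh /etc/cron.daily
```

The function prints the path of the installed copy, so you can run that once by hand to confirm the report is generated before waiting for cron.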

You can now access your website statistics reports by simply typing awstats.serverdomain.tld in your browser. And you're done!

Now let's write the script to use with awstats in a cgi-bin way.

#!/bin/bash

perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=YOUR_DOMAIN -update

That's not all, though. You still have to do two other things:

1. Create a symlink at /usr/local/apache/htdocs/stats that points to /usr/local/awstats/wwwroot

2. Add the following lines within the awstats vhost:

Alias /awstatsclasses "/usr/local/apache/htdocs/stats/classes/"
Alias /awstatscss "/usr/local/apache/htdocs/stats/css/"
Alias /awstatsicons "/usr/local/apache/htdocs/stats/icon/"
ScriptAlias /cgi-bin/ "/usr/local/apache/htdocs/stats/cgi-bin/"
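
The symlink from step 1 could be created with something like the following sketch. The function name link_stats_dir and the backup suffix are my own choices; the idea is that if the stats directory already exists as a real directory (for example from the static setup), it has to be moved aside first, otherwise ln would create the link inside it.

```bash
#!/bin/bash

# Hypothetical helper: make STATS a symlink to the awstats wwwroot.
# Pass both paths without a trailing slash.
link_stats_dir() {
    local wwwroot=$1 stats=$2
    # A real directory at $stats would make ln put the link inside it,
    # so move it out of the way first.
    if [ -d "$stats" ] && [ ! -L "$stats" ]; then
        mv "$stats" "$stats.static-backup"
    fi
    # -n replaces an existing symlink instead of following it.
    ln -sfn "$wwwroot" "$stats"
}

# Example (run as root):
# link_stats_dir /usr/local/awstats/wwwroot /usr/local/apache/htdocs/stats
```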

Now you can access your website statistics reports by going to the following URL: awstats.serverdomain.tld/cgi-bin/awstats.pl?config=The_Server_Name_You_Gave_In_The_Config_Param

Final notes

As you have probably already realized, it's pretty simple to scale this solution: you just need more awstats configuration files, and the cron job has to run an update for each of them.
If you use the cgi-bin version you just change the config parameter to access each site's stats, but if you are using the static HTML solution you have to generate a different HTML file for each one (which is trivial).
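
For the static HTML case, a cron script covering several sites might look like the sketch below. The variable names AWSTATS and OUT_DIR, the function update_all and the second site name in the example are made up; the paths follow the layout used in this post.

```bash
#!/bin/bash

# Paths follow the layout used in this post; override the variables
# if your installation differs.
AWSTATS=${AWSTATS:-/usr/local/awstats/wwwroot/cgi-bin/awstats.pl}
OUT_DIR=${OUT_DIR:-/usr/local/apache/htdocs/stats}

# Update the statistics for each config name given and write one
# static report per site, e.g. www.serverdomain.tld.html
update_all() {
    local config
    for config in "$@"; do
        perl "$AWSTATS" -config="$config" -update -output -staticLinks \
            > "$OUT_DIR/$config.html"
    done
}

# Example: update_all www.serverdomain.tld blog.serverdomain.tld
```

Each config name must match an awstats.NAME.conf file in /etc/awstats. For the cgi-bin case the same loop works if you drop the -output and -staticLinks options and the redirection.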

I don't think there's much else to add. I suggest you read the awstats documentation and also the comments in the configuration files.

Who am I?
My name is Paulo Abrantes AKA pabrantes and I'm a software developer. I'm currently employed at CIIST working as a Java developer in FenixEDU.

This blog is mostly about Java programming, domain driven design and snipsnap bliki developing. Everything written in this blog is my personal opinion and it may not reflect the opinions of my employer and co-workers.

