<rdf:RDF
    xmlns:s='http://snipsnap.org/rdf/snip-schema#'
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
    xml:base='http://pabrantes.net/blog/rdf'>
    <s:Snip rdf:about='http://pabrantes.net/blog/rdf#start/2006-05-24/1'
         s:name='start/2006-05-24/1'
         s:cUser='pabrantes'
         s:oUser='pabrantes'
         s:mUser='pabrantes'>
        <s:content>1 Log Analyzer: Integration of awstats with the Apache http server {anchor:Log Analyzer: Integration of awstats with the Apache http server}&#xA;Lately I&apos;ve been asking myself some questions like: &#xD;&#xA;1. How many daily visits am I getting?&#xD;&#xA;1. Where do those visits come from?&#xD;&#xA;1. Which pages are the most viewed?&#xD;&#xA;1. ~~...~~&#xD;&#xA;&#xD;&#xA;Well, you know, the normal questions you ask yourself when you have a website. &#xD;&#xA;&#xD;&#xA;I had noticed a traffic increase in the AdSense reports, but that information wasn&apos;t very useful: all I could see was that more ads were being displayed. That didn&apos;t tell me, for sure, that I was getting more visitors. So I decided to install a log analyzer for Apache.&#xD;&#xA;\\After some searching I decided to go for {link:awstats|url=http://awstats.sourceforge.net|newWindow=true}. &#xD;&#xA;&#xD;&#xA;In this post you&apos;ll find an introduction to what awstats is and its capabilities, along with how to deploy it.&#xD;&#xA;&#xD;&#xA;Keep in mind that the solutions presented here are the ones I designed; they are probably similar to some out there and very different from others. It&apos;s up to you to see if they fit your needs or if you have to adapt them. 
&#xD;&#xA;&#xD;&#xA;The following subjects will be presented:&#xD;&#xA;&#xD;&#xA;* Before starting&#xD;&#xA;* Presenting awstats&#xD;&#xA;* Installing awstats&#xD;&#xA;* Configuring Virtual Hosts in Apache&#xD;&#xA;* Configuring awstats &#xD;&#xA;&#xD;&#xA;&#xD;&#xA;__Before Starting__&#xD;&#xA;&#xD;&#xA;You need to have the following software installed:&#xD;&#xA;&#xD;&#xA;* {link:Apache HTTP Server|url=http://www.apache.org|newWindow=true}: other http servers will also work with awstats, but I&apos;ll be using Apache in the examples since it&apos;s the one I use.&#xD;&#xA;* {link:Perl 5.00503 or higher|url=http://www.perl.org/|newWindow=true}: you probably already have perl installed with your linux distro, and it will be higher than 5.00503.&#xD;&#xA;* {link:Awstats|url=http://awstats.sourceforge.net|newWindow=true}: if I manage to convince you that awstats is a nice webstats tool and you want to install it.&#xD;&#xA;&#xD;&#xA;There are no special compilation options for Apache, since the directives we&apos;ll be using (&lt;__VirtualHost__&gt; and &lt;__Directory__&gt;) are part of the core system. &#xD;&#xA;&#xD;&#xA;__Presenting awstats__&#xD;&#xA;&#xD;&#xA;Awstats is a perl script which reports your website statistics using the access log of the server. It also supports ftp and mail logs, although that is outside the scope of this post.&#xD;&#xA;&#xD;&#xA;It can operate as a cgi-bin script within your http server, or it can run in batch mode and generate a static HTML file. &#xD;&#xA;&#xD;&#xA;Awstats supports the following features (this is not an exhaustive list of the features. 
For the full list, please check the {link: awstats comparison|url=http://awstats.sourceforge.net/docs/awstats_compare.html|newWindow=true} with other log analyzers):&#xD;&#xA;&#xD;&#xA;* Unique visitors, number of visits, number of pages retrieved and bandwidth spent, counted by month and day of the month&#xD;&#xA;* Top day of the week, top hour of the day&#xD;&#xA;* Countries and hosts that have been accessing your site (with some plugins you can even show the city)&#xD;&#xA;* Spiders that have crawled your website&#xD;&#xA;* Visit duration&#xD;&#xA;* Top pages requested&#xD;&#xA;* Operating systems of the visitors&#xD;&#xA;* Browsers of the visitors&#xD;&#xA;* Referrers&#xD;&#xA;* Search terms&#xD;&#xA;&#xD;&#xA;__Installing awstats__&#xD;&#xA;&#xD;&#xA;Installing it is pretty straightforward: you first download {link:awstats|url=http://awstats.sourceforge.net/#DOWNLOAD|newWindow=true}. &#xD;&#xA;&#xD;&#xA;You ~~untar~~ the file into your /usr/local directory, then run ~~perl awstats_configure.pl~~ and answer the simple questions. When you finish this process you&apos;ll have a configuration file in __/etc/awstats/awstats.YOUR_HOST.conf__.&#xD;&#xA;&#xD;&#xA;It has probably also added a few lines to your httpd.conf, or generated a file with the lines that it then asks you to add. For now we&apos;ll ignore those lines.&#xD;&#xA;&#xD;&#xA;You&apos;ll probably be interested in at least one plugin, called geoipfree, which allows awstats to report more accurately which country a given IP address comes from. &#xD;&#xA;&#xD;&#xA;So you can {link: download GeoIP free|url=http://search.cpan.org/~gmpassos/Geo-IPfree-0.2/|newWindow=true} and install it.&#xD;&#xA;&#xD;&#xA;Now, before we configure awstats, we&apos;ll first configure the virtual hosts in Apache. 
&#xD;&#xA;&#xD;&#xA;__Configuring Virtual Hosts in Apache__&#xD;&#xA; &#xD;&#xA;The idea is to set up two different virtual hosts, one that holds your website and another that displays the stats. This is not strictly necessary, but I prefer it this way.&#xD;&#xA;&#xD;&#xA;Imagine that your Apache installation is located at __/usr/local/apache__, your log directory is __logs__ and your httpd documents directory is __htdocs__.\\&#xD;&#xA;Your vhost configuration has to look something like this:&#xD;&#xA;&#xD;&#xA;{quote}&#xD;&#xA;&#xD;&#xA;NameVirtualHost *&#xD;&#xA;&#xD;&#xA;&lt;VirtualHost *&gt;\\&#xD;&#xA;ServerName www.serverdomain.tld\\&#xD;&#xA;DocumentRoot /usr/local/apache/htdocs/mainsite/\\&#xD;&#xA;ErrorLog /usr/local/apache/logs/mainsite-errors.log\\&#xD;&#xA;LogFormat &quot;%h %l %u %t \\\&quot;%r\\\&quot; %&gt;s %b \\\&quot;%{Referer}i\\\&quot; \\\&quot;%{User-agent}i\\\&quot;&quot;\\&#xD;&#xA;TransferLog /usr/local/apache/logs/mainsite-access.log\\&#xD;&#xA;&lt;/VirtualHost&gt;\\&#xD;&#xA;&#xD;&#xA;&lt;VirtualHost *&gt;\\&#xD;&#xA;ServerName awstats.serverdomain.tld\\&#xD;&#xA;DocumentRoot /usr/local/apache/htdocs/stats/\\&#xD;&#xA;ErrorLog /usr/local/apache/logs/stats-errors.log\\&#xD;&#xA;TransferLog /usr/local/apache/logs/stats-access.log\\&#xD;&#xA;&lt;/VirtualHost&gt;&#xD;&#xA;&#xD;&#xA;{quote}&#xD;&#xA;&#xD;&#xA;The only line that probably left you thinking is the LogFormat one. That format is called the NCSA combined format and it&apos;s a widely adopted standard. Using it you have access to details such as the referrer, browser and operating system of the clients that are accessing your website. &#xD;&#xA;&#xD;&#xA;We might, however, want to restrict access to our statistics, so we&apos;ll configure the access policy for the awstats vhost. To do that you need to use the &lt;__Directory__&gt; directive. 
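
As a brief aside on the LogFormat line above: the following is a minimal Python sketch, not part of awstats, of how one line written in that format breaks down into fields. The regular expression, the parse_combined function name and the sample log line are my own illustrative assumptions, and the pattern does not try to handle requests that contain escaped quotes.

```python
import re

# One regex group per field of the LogFormat string used in the vhost:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

FIELDS = ("host", "ident", "user", "time", "request",
          "status", "size", "referer", "user_agent")


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


# Hypothetical sample line, made up for illustration.
sample = ('192.168.0.7 - - [24/May/2006:20:23:18 +0100] '
          '"GET /index.html HTTP/1.1" 200 5120 '
          '"http://www.example.org/" "Mozilla/5.0 (X11; Linux i686)"')

entry = parse_combined(sample)
print(entry["host"], entry["status"], entry["referer"])
```

The referer and user_agent fields are the ones a log analyzer needs for its referrer, browser and operating system reports, which is why the plain common log format (without those two fields) is not enough here.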
&#xD;&#xA;&#xD;&#xA;So inside the awstats VirtualHost we add the following:&#xD;&#xA;&#xD;&#xA;{quote}&#xD;&#xA;&lt;Directory /usr/local/apache/htdocs/stats/&gt;\\&#xD;&#xA;Order Deny,Allow\\&#xD;&#xA;Allow from 192.168.0.0/24\\&#xD;&#xA;Deny from all\\&#xD;&#xA;&lt;/Directory&gt;&#xD;&#xA;{quote}&#xD;&#xA;&#xD;&#xA;With this configuration only IPs in the private network 192.168.0.0/24 will be allowed to access the awstats reports. Note that there must be no space after the comma in the Order directive. &#xD;&#xA;&#xD;&#xA;__Configuring awstats__&#xD;&#xA;&#xD;&#xA;Open the configuration file located in __/etc/awstats__ and make sure that:&#xD;&#xA;&#xD;&#xA;1. __LogFile__ is set to your mainsite TransferLog file&#xD;&#xA;1. __LogType__ is set to W&#xD;&#xA;1. __LogFormat__ is set to 1 (the NCSA combined format)&#xD;&#xA;1. __SiteDomain__ is set to your site domain&#xD;&#xA;1. __AllowToUpdateStatsFromBrowser__ is set to 0 (not that we will actually use it)&#xD;&#xA;1. __DNSLookup__ is set to 1 (this means that DNS lookup is fully enabled and every IP address will be resolved to a hostname. If you have a high traffic site, this might take a while to process) &#xD;&#xA;1. __LoadPlugin=&quot;geoipfree&quot;__ is uncommented&#xD;&#xA;&#xD;&#xA;Now we have to set up a cron job that will update the reports. You can run it daily or hourly; that&apos;s up to you.&#xD;&#xA;&#xD;&#xA;You now have to choose between static HTML and a cgi-bin script. The first is more secure but less flexible: if you want, for example, to check the daily visits for a previous month, you&apos;ll have to generate a new report by hand, whereas with the cgi-bin you just pick the month you want from a drop-down box. On the other hand, awstats has had some security problems in the past, so it&apos;s up to you to decide which one to choose. &#xD;&#xA;&#xD;&#xA;Let&apos;s first configure for a static output. 
&#xD;&#xA;&#xD;&#xA;Write the following script:&#xD;&#xA;&#xD;&#xA;{quote}&#xD;&#xA;#!/bin/bash&#xD;&#xA;&#xD;&#xA;perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=YOUR_DOMAIN -update -output -staticLinks &gt; /usr/local/apache/htdocs/stats/index.html &#xD;&#xA;{quote}&#xD;&#xA;&#xD;&#xA;Store it in the __cron.daily__ or __cron.hourly__ directory so it gets executed daily or hourly. Don&apos;t forget to make it executable.&#xD;&#xA;&#xD;&#xA;You can now access your website statistics reports by simply typing __awstats.serverdomain.tld__ in your browser. And you&apos;re done! &#xD;&#xA;&#xD;&#xA;Now let&apos;s write the script to use awstats as a cgi-bin.&#xD;&#xA;&#xD;&#xA;{quote}&#xD;&#xA;#!/bin/bash&#xD;&#xA;&#xD;&#xA;perl /usr/local/awstats/wwwroot/cgi-bin/awstats.pl -config=YOUR_DOMAIN -update &#xD;&#xA;{quote}&#xD;&#xA;&#xD;&#xA;That&apos;s not all, though; you still have to do two other things:&#xD;&#xA;&#xD;&#xA;\1. Create a symlink from /usr/local/apache/htdocs/stats to /usr/local/awstats/wwwroot&#xD;&#xA;&#xD;&#xA;2. Add the following lines within the awstats vhost:&#xD;&#xA;&#xD;&#xA;{quote}&#xD;&#xA;&#xD;&#xA;Alias /awstatsclasses &quot;/usr/local/apache/htdocs/stats/classes/&quot;\\&#xD;&#xA;Alias /awstatscss &quot;/usr/local/apache/htdocs/stats/css/&quot;\\&#xD;&#xA;Alias /awstatsicons &quot;/usr/local/apache/htdocs/stats/icon/&quot;\\&#xD;&#xA;ScriptAlias /cgi-bin/ &quot;/usr/local/apache/htdocs/stats/cgi-bin/&quot;\\&#xD;&#xA;&#xD;&#xA;{quote}  &#xD;&#xA;&#xD;&#xA;Now you can access your website statistics reports by going to the following URL: awstats.serverdomain.tld/cgi-bin/awstats.pl?config=The_Server_Name_You_Give_In_The_Config_Param&#xD;&#xA;&#xD;&#xA;__Final notes__&#xD;&#xA;&#xD;&#xA;As you have probably already realized, it&apos;s pretty simple to scale this solution: you just need to create more awstats configuration files and run each one in the cron job. 
\\&#xD;&#xA;If you use the cgi-bin version you just have to change the __config__ parameter to access each site&apos;s stats, but if you are using the static HTML solution you have to generate a different HTML file for each one (which is trivial).&#xD;&#xA;&#xD;&#xA;I don&apos;t think there&apos;s much else to add. I suggest you read the awstats documentation and also the comments in the configuration files.&#xD;&#xA;</s:content>
        <s:mTime>2006-05-24 20:23:18.154</s:mTime>
        <s:cTime>2006-05-24 20:23:18.154</s:cTime>
        <s:comments>
            <rdf:Bag>
                <rdf:li>
                    <s:Comment rdf:about='http://pabrantes.net/blog/rdf#comment-start/2006-05-24/1-1'
                         s:name='comment-start/2006-05-24/1-1'
                         s:cUser='MANOWAR^'
                         s:oUser='MANOWAR^'
                         s:mUser='MANOWAR^'>
                        <s:content>Ok, this is not really related to how useful awstats is, or any other number/stats producing software out there (we have awstats, webalizer and nagios running here nonstop), but at what point do you get number/information overload, to where the information is no longer really useful because there is so much of it? I find myself in that position more often than not, which is not really good considering I am a system admin, and I can only imagine what it is like for a normal user who logs into their webhosting company, looks at their awstats and goes &quot;WTF IS THIS, WHAT DOES IT ALL MEAN?! /slams head against desk...&quot;</s:content>
                        <s:mTime>2006-05-25 17:35:07.936</s:mTime>
                        <s:cTime>2006-05-25 17:35:07.828</s:cTime>
                        <s:commentedSnip rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-24/1'/>
                    </s:Comment>
                </rdf:li>
                <rdf:li>
                    <s:Comment rdf:about='http://pabrantes.net/blog/rdf#comment-start/2006-05-24/1-2'
                         s:name='comment-start/2006-05-24/1-2'
                         s:cUser='pabrantes'
                         s:oUser='pabrantes'
                         s:mUser='pabrantes'>
                        <s:content>Well, I don&apos;t know nagios, but I do know awstats and webalizer.\\&#xD;&#xA;They both spit out plenty of data, but I think it&apos;s well organized. Maybe awstats has too many things, but I guess that most of them are self-explanatory and you can extract valuable information from some of them, for example:&#xD;&#xA;&#xD;&#xA;* Whether your visits are going up (always a good sign)&#xD;&#xA;* What your most viewed pages are&#xD;&#xA;* The 404s your server is issuing.&#xD;&#xA;&#xD;&#xA;But since I&apos;ve never had to analyze awstats for a server that has, for example, 10000 hits a day, I don&apos;t actually have a feel for doing such work. I do believe that in such cases it might be difficult to process the data... Maybe if you keep your head clear and restrict the working set to just the top 10-20 lines of each table that displays relevant info, you might get along (or not).&#xD;&#xA;&#xD;&#xA;&#xD;&#xA;</s:content>
                        <s:mTime>2006-05-25 20:11:47.912</s:mTime>
                        <s:cTime>2006-05-25 20:11:47.825</s:cTime>
                        <s:commentedSnip rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-24/1'/>
                    </s:Comment>
                </rdf:li>
                <rdf:li>
                    <s:Comment rdf:about='http://pabrantes.net/blog/rdf#comment-start/2006-05-24/1-3'
                         s:name='comment-start/2006-05-24/1-3'
                         s:cUser='MANOWAR^'
                         s:oUser='MANOWAR^'
                         s:mUser='MANOWAR^'>
                        <s:content>Well, I was just talking about information overload in general; webstats was just the simplest example... It seems like these days everyone&apos;s goal is to give you an information overload headache :) with rss feeds, mailing lists, blogs, forums. It seems we have gotten into the age of Information Overload, not the Information Age... but that is just my opinion.</s:content>
                        <s:mTime>2006-05-25 20:41:29.15</s:mTime>
                        <s:cTime>2006-05-25 20:41:29.086</s:cTime>
                        <s:commentedSnip rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-24/1'/>
                    </s:Comment>
                </rdf:li>
                <rdf:li>
                    <s:Comment rdf:about='http://pabrantes.net/blog/rdf#comment-start/2006-05-24/1-4'
                         s:name='comment-start/2006-05-24/1-4'
                         s:cUser='pabrantes'
                         s:oUser='pabrantes'
                         s:mUser='pabrantes'>
                        <s:content>Well, it&apos;s not overload, it&apos;s redundancy! When an information link fails you still have plenty of others ~~laugh~~. &#xD;&#xA;&#xD;&#xA;For example, I have subscribed to plenty of rss feeds and mailing lists... Most of the time I just browse through the titles, and maybe 1 in every 20 or even 30 catches my attention... So I don&apos;t actually get overloaded. And to tell the truth, I&apos;ve never had that overload headache, maybe it&apos;s just me!  </s:content>
                        <s:mTime>2006-05-25 23:13:35.765</s:mTime>
                        <s:cTime>2006-05-25 23:13:35.7</s:cTime>
                        <s:commentedSnip rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-24/1'/>
                    </s:Comment>
                </rdf:li>
                <rdf:li>
                    <s:Comment rdf:about='http://pabrantes.net/blog/rdf#comment-start/2006-05-24/1-5'
                         s:name='comment-start/2006-05-24/1-5'
                         s:cUser='MANOWAR^'
                         s:oUser='MANOWAR^'
                         s:mUser='MANOWAR^'>
                        <s:content>Well, see, that&apos;s what I am talking about. Why waste the resources/time to sift through 20/30 things and actually read 1... that&apos;s exactly the waste of time I am talking about :).</s:content>
                        <s:mTime>2006-05-26 17:19:57.461</s:mTime>
                        <s:cTime>2006-05-26 17:19:57.375</s:cTime>
                        <s:commentedSnip rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-24/1'/>
                    </s:Comment>
                </rdf:li>
                <rdf:li>
                    <s:Comment rdf:about='http://pabrantes.net/blog/rdf#comment-start/2006-05-24/1-6'
                         s:name='comment-start/2006-05-24/1-6'
                         s:cUser='pabrantes'
                         s:oUser='pabrantes'
                         s:mUser='pabrantes'>
                        <s:content>Well, we never know when such a topic might come in handy in the future. Going through them __might__ ring a bell when, in the future, I need something related to that topic; or maybe it won&apos;t, or I will never actually need it. &#xD;&#xA;&#xD;&#xA;But I don&apos;t actually see it as a waste of resources/time as you say. It&apos;s information and it&apos;s always valuable. &#xD;&#xA;&#xD;&#xA;I see this mostly as a graph problem (I really do). Imagine the following:&#xD;&#xA;&#xD;&#xA;1. Map all the information you have in a graph&#xD;&#xA;1. Give weights to the edges according to your interests on that day&#xD;&#xA;1. Browse the graph in a way that maximizes your interest &#xD;&#xA;&#xD;&#xA;Now if your interests change, which they do... You can always go back to the graph and learn something else, even if you didn&apos;t even look at it the 1st time! &#xD;&#xA;&#xD;&#xA;The point here is that reading the subjects helps me ~~index~~ my ideas, even if I don&apos;t read the message body. It&apos;s like reading the newspaper and only looking at the headlines. It&apos;s not a waste of time, it&apos;s actually a maximization of used time! ~~laugh~~&#xD;&#xA;&#xD;&#xA;But that&apos;s my point of view, you&apos;re totally free to have your own! </s:content>
                        <s:mTime>2006-05-27 00:37:58.138</s:mTime>
                        <s:cTime>2006-05-27 00:37:58.034</s:cTime>
                        <s:commentedSnip rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-24/1'/>
                    </s:Comment>
                </rdf:li>
            </rdf:Bag>
        </s:comments>
        <s:snipLinks>
            <rdf:Bag>
                <rdf:li rdf:resource='#snipsnap-notfound'/>
                <rdf:li rdf:resource='http://pabrantes.net/blog/rdf#pabrantes/post-history'/>
                <rdf:li rdf:resource='#snipsnap-search'/>
                <rdf:li rdf:resource='http://pabrantes.net/blog/rdf#vista/main.html'/>
                <rdf:li rdf:resource='http://pabrantes.net/blog/rdf#start/2006-05-01/1'/>
            </rdf:Bag>
        </s:snipLinks>
        <s:attachments
             rdf:type='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
    </s:Snip>
</rdf:RDF>
