Tuesday, April 01, 2008

proxy servers 1:

I have already pointed out (here) the rather odd logs at John Walker's fourmilab.ch, in which three Rogers IPs accounted for an unusually large share of the site's traffic in January 2004. The volume of that traffic apparently attracted the attention of the site owner, who discusses it here. His conclusion? That these are proxy servers. This is what he writes:
    The hosts with very high hit rates appear to be HTTP proxy servers which are relaying requests from hosts behind them. Here's a dump of a packet from the host at the very top of the heavy hitters report:
      wc07.wlfdle.rnc.net.cable.rogers.com -> vitesse HTTP GET / HTTP/1.1
      0: 0800 20a1 4ca0 0030 1e05 2758 0800 4500 .. .L..0..'X..E.
      16: 00ce e6a2 4000 2c06 28f1 42b9 544a c108 ....@.,.(.B.TJ..
      32: e68a b959 0050 0b64 6c2d 798e e202 8018 ...Y.P.dl-y.....
      48: 4470 c88b 0000 0101 080a 2e4f 0050 594f Dp.........O.PYO
      64: daa9 4745 5420 2f20 4854 5450 2f31 2e31 ..GET / HTTP/1.1
      80: 0d0a 486f 7374 3a20 666f 7572 6d69 6c61 ..Host: fourmila
      96: 622e 6368 0d0a 436f 6e6e 6563 7469 6f6e b.ch..Connection
      112: 3a20 6b65 6570 2d61 6c69 7665 0d0a 5072 : keep-alive..Pr
      128: 6167 6d61 3a20 6e6f 2d63 6163 6865 0d0a agma: no-cache..
      144: 582d 466f 7277 6172 6465 642d 466f 723a X-Forwarded-For:
      160: 2032 342e 3135 332e 3539 2e38 340d 0a56
      176: 6961 3a20 312e 3020 7763 3037 2028 4e65 ia: 1.0 wc07 (Ne
      192: 7443 6163 6865 204e 6574 4170 702f 352e tCache NetApp/5.
      208: 322e 3152 3144 3929 0d0a 0d0a 2.1R1D9)....
    As you can see, this is a proxy server forwarding a packet for a host with IP address […], which resolves to CPE00e0184e28fb-CM00803785d6b6.cpe.net.cable.rogers.com, another host in the same domain (albeit a very different IP address: the proxy server is […]). Monitoring packets from the proxy server show it forwarding requests from an assortment of hosts behind it.
Thus John Walker. There are two important points for us here: (1) […] is a proxy server that was forwarding requests for a number of Rogers' customers, and (2) each of Rogers' customers will have had their own IP. (In this case the customer is […].)
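
Walker's reading of the packet can be checked mechanically. The Python sketch below rebuilds the 220-byte frame from the hex words of the dump (offset column kept, ASCII column dropped), skips the Ethernet, IPv4 and TCP headers using their own length fields, and splits the HTTP request into its headers. The function names are illustrative only, not part of any tool Walker used, and the sketch assumes a single unfragmented IPv4/TCP frame like the one quoted.

```python
from typing import Dict, Tuple

# Hex words copied from the dump quoted above (ASCII column omitted).
DUMP = """\
0: 0800 20a1 4ca0 0030 1e05 2758 0800 4500
16: 00ce e6a2 4000 2c06 28f1 42b9 544a c108
32: e68a b959 0050 0b64 6c2d 798e e202 8018
48: 4470 c88b 0000 0101 080a 2e4f 0050 594f
64: daa9 4745 5420 2f20 4854 5450 2f31 2e31
80: 0d0a 486f 7374 3a20 666f 7572 6d69 6c61
96: 622e 6368 0d0a 436f 6e6e 6563 7469 6f6e
112: 3a20 6b65 6570 2d61 6c69 7665 0d0a 5072
128: 6167 6d61 3a20 6e6f 2d63 6163 6865 0d0a
144: 582d 466f 7277 6172 6465 642d 466f 723a
160: 2032 342e 3135 332e 3539 2e38 340d 0a56
176: 6961 3a20 312e 3020 7763 3037 2028 4e65
192: 7443 6163 6865 204e 6574 4170 702f 352e
208: 322e 3152 3144 3929 0d0a 0d0a
"""


def dump_to_bytes(dump: str) -> bytes:
    """Rebuild the raw frame from 'offset: word word ...' lines."""
    data = bytearray()
    for line in dump.strip().splitlines():
        offset, *words = line.split()
        # Offsets are decimal; each line must start where the last one ended.
        if int(offset.rstrip(":")) != len(data):
            raise ValueError(f"unexpected offset {offset}")
        data += bytes.fromhex("".join(words))
    return bytes(data)


def tcp_payload(frame: bytes) -> bytes:
    """Strip the Ethernet, IPv4 and TCP headers from a captured frame."""
    if frame[12:14] != b"\x08\x00":            # EtherType 0x0800 = IPv4
        raise ValueError("not an IPv4 frame")
    ip = 14                                     # Ethernet II header is 14 bytes
    ip_header_len = (frame[ip] & 0x0F) * 4      # IHL field, in 32-bit words
    ip_total_len = int.from_bytes(frame[ip + 2:ip + 4], "big")
    if frame[ip + 9] != 6:                      # protocol 6 = TCP
        raise ValueError("not a TCP segment")
    tcp = ip + ip_header_len
    tcp_header_len = (frame[tcp + 12] >> 4) * 4  # data offset, incl. options
    return frame[tcp + tcp_header_len:ip + ip_total_len]


def parse_request(payload: bytes) -> Tuple[str, Dict[str, str]]:
    """Split an HTTP/1.x request head into its request line and headers."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *lines = head.split("\r\n")
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return request_line, headers


frame = dump_to_bytes(DUMP)
request_line, headers = parse_request(tcp_payload(frame))
print(request_line)     # GET / HTTP/1.1
print(headers["Host"])  # fourmilab.ch
print(headers["Via"])   # 1.0 wc07 (NetCache NetApp/5.2.1R1D9)
print(sorted(headers))  # ['Connection', 'Host', 'Pragma', 'Via', 'X-Forwarded-For']
```

The two headers that matter here are X-Forwarded-For, which carries the customer's address, and Via, which names the proxy (wc07) and its software.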

There is, however, another point. To the right is a screen capture of the last line of the packet quoted above. It identifies the server software that is doing the proxying (see the red arrow): NetCache by Network Appliance (NetApp).
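
The software name comes from the Via header, which HTTP/1.1 (RFC 2616, section 14.45) defines as a received-protocol, a received-by host or pseudonym, and an optional comment in parentheses. The Python sketch below pulls those fields apart. Treating `Product/Version` tokens in the comment as software versions is an assumption, since the comment is free text, and the function names are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class ViaHop(NamedTuple):
    protocol: str           # e.g. "HTTP/1.0"
    received_by: str        # proxy host name or pseudonym, e.g. "wc07"
    comment: Optional[str]  # free text, often naming the proxy software


# One Via entry: [protocol-name "/"] protocol-version received-by [(comment)]
_VIA_ENTRY = re.compile(
    r"[\s,]*"
    r"(?:(?P<name>[^/\s,]+)/)?(?P<version>[^\s,]+)"
    r"\s+(?P<by>[^\s,(]+)"
    r"(?:\s*\((?P<comment>[^)]*)\))?"
)


def parse_via(value: str) -> List[ViaHop]:
    """Split a Via header value into hops, in the order they were added."""
    hops = []
    for m in _VIA_ENTRY.finditer(value):
        name = m.group("name") or "HTTP"  # the name may be omitted only for HTTP
        hops.append(ViaHop(f"{name}/{m.group('version')}", m.group("by"), m.group("comment")))
    return hops


def software_versions(comment: Optional[str]) -> Dict[str, str]:
    """Heuristic: read 'Product/Version' tokens out of a Via comment."""
    found = {}
    for token in (comment or "").split():
        product, sep, version = token.partition("/")
        if sep and product and version:
            found[product] = version
    return found


hop = parse_via("1.0 wc07 (NetCache NetApp/5.2.1R1D9)")[0]
print(hop.received_by)                 # wc07
print(software_versions(hop.comment))  # {'NetApp': '5.2.1R1D9'}
```

A Via header can list several hops, one per proxy, in the order they were added; the fourmilab request carries just one.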

This is something that we see elsewhere. A Thai web directory, truehits.net, provides a variety of services for its customers, including daily traffic statistics that are available on the web. One of the reported statistics is a list of top proxy servers, and the same proxy server discussed at fourmilab in January 2004 was listed by truehits as one of the top proxy servers visiting sunncity.com on April 19, 2003, and again on Dec. 23 and Dec. 26, 2004.

The screen cap on the left is from April 19. It shows not only that the same server software is in use (see red arrow), but that it is the exact same version (5.2.1R1D9) and, since it is set up to "Forward IP", has much the same configuration.
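
A proxy set up to forward the client's IP presumably does so the way the fourmilab packet shows: by adding an X-Forwarded-For header. By the widely followed (but informal) convention, each proxy appends the address it received the request from, so the header reads from the originating client outward, while the last proxy is seen by the server only as the TCP peer. The Python sketch below rebuilds that chain. The addresses are hypothetical documentation addresses (RFC 5737 ranges), not ones from the logs, and the function names are illustrative.

```python
from typing import List, Optional


def forwarding_chain(peer_ip: str, xff: Optional[str]) -> List[str]:
    """Hops implied by an X-Forwarded-For header, originating client first.

    Each proxy conventionally appends the address it received the request
    from; the last proxy is not listed in the header because the server
    sees it directly as the TCP peer, so it is appended here. The header
    is plain text that a client can forge, so only entries added by
    proxies you trust are reliable.
    """
    hops = [part.strip() for part in (xff or "").split(",") if part.strip()]
    hops.append(peer_ip)
    return hops


def originating_client(peer_ip: str, xff: Optional[str]) -> str:
    """The leftmost hop: the client the first proxy says it was serving."""
    return forwarding_chain(peer_ip, xff)[0]


# Hypothetical documentation addresses standing in for the proxy and a customer.
proxy, customer = "198.51.100.7", "192.0.2.10"
print(forwarding_chain(proxy, customer))    # ['192.0.2.10', '198.51.100.7']
print(forwarding_chain(proxy, None))        # ['198.51.100.7']
print(originating_client(proxy, customer))  # 192.0.2.10
```

The second call models a request without the header, where only the proxy is visible. Even with the header present, a standard access log records only the TCP peer unless it is configured to log the header as well.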

Now, according to the fourmilab logs, the forwarding was being done for […] (a Rogers IP). In the case of this truehits log, we can again identify the specific IP of the individual user, not merely that of the proxy. Note that the proxy had forwarded 63 hits (green arrow).

Among the top daily visitors (see right) is […], which is marked here as NETBLK-RNS-EAST. (To judge from this, RNS stands for Rogers Network Services.) And its number of hits is identical to the proxy's: 63. (It is only to be expected, of course, that a small site in Thailand might receive only one Rogers customer in a daily log.)

The important point for us is that Rogers was using this IP as a proxy throughout 2003 and 2004: the logs place it in use from at least April 2003 through December 2004.

One final point. In most cases the proxy hides the individual IP, and the IP that appears in the logs is that of the proxy, not that of the individual Rogers customer. In the case of the fourmilab logs, of course, Mr. Walker has fished out the individual IP for us. Normally, however, we don't see the individual's IP, although we do see details about his browser and operating system (the "user agent"). Consider these examples from two log entries for the same proxy ([…]) during the period when this proxy was in use:
  • Tue 29-June-2004 08:07 - wc08.wlfdle.rnc.net.cable.rogers.com [=, ed.] - "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" - "http://search.yahoo.com/"
  • - - [25/Sep/2004:09:59:00 +0200] "GET /downloads/hovtext/1.0/HovText.exe HTTP/1.1" 200 1136640 "http://hovklan.com/hovtext/index.php/?page=download&lang=en" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; SV1; .NET CLR 1.1.4322)"
In both cases, the IP is the proxy's, and the user agent data (browser, operating system, etc.) is the individual's.
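
The second entry is in Apache's "combined" log format: host, identity, user, timestamp, request line, status, size, referrer and user agent. The Python sketch below parses such a line with a regular expression. Because the host field is missing from the entry above, the address 192.0.2.1 stands in for the proxy and is purely hypothetical; the function names are illustrative as well. The host field is whatever address connected to the server (here, the proxy), while the user-agent string is passed through from the individual's browser.

```python
import re
from typing import Dict, List, Optional

# host ident authuser [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one combined-format log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def agent_details(agent: str) -> List[str]:
    """The ';'-separated tokens in the first parenthesised part of a user agent."""
    m = re.search(r"\(([^)]*)\)", agent)
    return [token.strip() for token in m.group(1).split(";")] if m else []


# 192.0.2.1 is a hypothetical stand-in for the proxy address missing above.
LINE = (
    '192.0.2.1 - - [25/Sep/2004:09:59:00 +0200] '
    '"GET /downloads/hovtext/1.0/HovText.exe HTTP/1.1" 200 1136640 '
    '"http://hovklan.com/hovtext/index.php/?page=download&lang=en" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts; SV1; .NET CLR 1.1.4322)"'
)

entry = parse_combined(LINE)
print(entry["host"])                  # 192.0.2.1 (the proxy)
print(agent_details(entry["agent"]))  # browser and OS tokens (the individual)
```

In the output, the host is the proxy, while the tokens (MSIE 6.0 on Windows NT 5.1, i.e. Windows XP) describe the individual's machine.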