As a moderately large company we rent mailboxes for our employees at a hosting provider; a lot of mailboxes. These come in varying sizes, and naturally the larger you go the more expensive they become.
The other day I received an email requesting several new accounts and set about creating these when I came across what seemed to be a rather inefficient allocation. The user had a mid-size tier, costing about €150 per year, while he could seemingly make do with the very smallest tier at about €50 annually.
This, of course, made me curious about our other allocations and I went looking for an overview of all our mail accounts' usage. No such luck. The only way to see how much of the rented space was actually being used was by navigating the - non-REST and stateful - web interface of our hosting provider and looking up the statistics for each user individually.
Challenge accepted!
I've gotten tired of using Selenium lately, and had been meaning to look into some of the PhantomJS-based alternatives. My eye had fallen on CasperJS and I decided to give it a spin.
Using brew I downloaded the latest development release:
[shell]
$ brew update
$ brew install casperjs --devel
[/shell]
The script in its entirety can be found in this gist, but let's walk through it section by section:
[javascript]
var casper = require('casper').create({ verbose: true, logLevel: 'info' });
var credentials = JSON.parse(require('fs').read('./credentials.json'));
var url = 'private';
casper.start(url + '/user/login', function() {
    this.fill('form#login_form', credentials, true);
});
[/javascript]
The first line initialises CasperJS, with some logging enabled. The second line reads in a simple JSON file containing the form fields and values of the login page, while the third line contains the base URL of our hosting provider.
In the next section Casper is told to navigate to said URL's login page, fill in the specified form with the credentials and submit it (the third argument to fill, true, triggers the submit). It's that easy!
[javascript]
casper.thenOpen(url + '/subscription/config/xebia.com/exchange_mailbox', function() {
    this.getElementsInfo('tr td a').forEach(function (node) {
        if (node.attributes.nicetitle === "View") {
[/javascript]
All logged in, it's time to navigate to the exchange's overview page. Here every user's account details are linked from an anchor with the attribute nicetitle="View". Naturally, we want to iterate over these. This is where a small hitch in the plan was encountered: the HTML is completely unstructured. It is simply a table of varying dimensions with label, value pairs. I decided to postpone the problem, and for now simply fetch the entire element:
[javascript]
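            casper.thenOpen(url + node.attributes.href, function() {
                // 'a' is the write mode (append), so each account's markup is
                // added to the file instead of overwriting the previous one.
                require('fs').write('output', JSON.stringify(this.getElementInfo('div.contentleft').html), 'a');
            });
        }
    });
});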
[/javascript]
Ending it all with a:
[javascript]
casper.run();
[/javascript]
It's time to dive into the console and give it a spin:
[shell]
$ casperjs fetch.js
[/shell]
Excellent! Casper is spinning along, discovering and fetching the data, and I can see a tail of the generated output file streaming in. Unreadable, but the data is all there, bringing us nicely to the second topic of this post.
BASH data analysis:
To begin with, let's put this malformed HTML through tidy. Since we're not interested in the many warnings tidy will give us, we'll redirect stderr to /dev/null, yielding:
[shell]
$ tidy <fetched 2>/dev/null
\n\t\t\t\t
<div class=\"clear\"></div>\n\t\t
<div>\n\t\t\t<label>Current mailbox size</label>\n\t\t\t1921 MB\n\t\t</div>\n\t\t
<div>\n\t\t\t<label>Warning quota</label>\n\t\t\t2250 MB\n\t\t</div>\n\t\t
<div>\n\t\t\t<label>Block send quota</label>\n\t\t\t2375
[/shell]
[shell]
$ tidy <fetched 2>/dev/null | grep label
<div>\n\t\t\t<label><label>Email </label></label>
<div>\n\t\t\t<label>Email aliases</label>\n\t\t\t
<div>\n\t\t\t<label>Current mailbox size</label>\n\t\t\t1921
<div>\n\t\t\t<label>Warning quota</label>\n\t\t\t2250
<div>\n\t\t\t<label>Block send quota</label>\n\t\t\t2375
<div>\n\t\t\t<label>Block send and receive quota</label>\n\t\t\t2500 MB\n\t\t</div>
<div>\n\t\t\t<label>Pop enabled</label>\n\t\t\t<img alt="" src="<br" />
[/shell]
[shell]
$ tidy <first-fetch 2>/dev/null | grep label | sed 's/^.*\/label>//' | sed 's/\\[nt]//g'
<div><label><label>Exchange </label></label> <div><label>SMTP 1921 2250 2375 2500 MB</label></div> <label><label> <img alt="" src="<br" /><img alt="" src="<br" />...
[/shell]
[shell]
$ tidy <first-fetch 2>/dev/null | grep label | sed 's/^.*\/label>//' | sed 's/\\[nt]//g' | grep -Ev 'label|img|<br>|HOSTED'
Sunil Prakash
Sunil Prakash
REDACTED</div>
REDACTED</div>
1921
2250
2375
2500 MB
[/shell]
[shell]
$ tidy <first-fetch 2>/dev/null | grep label | sed 's/^.*\/label>//' | sed 's/\\[nt]//g' | grep -Ev 'label|img|<br>|HOSTED' | sed 's/<.*//' | sed 's/ MB$//'
Sunil Prakash
Sunil Prakash
REDACTED
REDACTED
1921
2250
2375
2500
[/shell]
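The clean-up chain is easier to follow on a single line of input. The sketch below assumes the fetched file holds literal `\n` and `\t` escape sequences (which is what JSON.stringify writes for newlines and tabs); the function name `strip_markup` and the sample line are made up for illustration.

```shell
# strip_markup: reduce one "label, value" line to just its value.
strip_markup() {
  sed 's/^.*\/label>//' |   # drop everything up to and including the last </label>
    sed 's/\\[nt]//g' |     # drop literal \n and \t escape sequences
    sed 's/<.*//' |         # drop any trailing markup such as </div>
    sed 's/ MB$//'          # drop the unit so the value is a bare number
}

# %s keeps the backslashes literal, mimicking the JSON-escaped file.
printf '%s\n' '<div>\n\t\t\t<label>Current mailbox size</label>\n\t\t\t1921 MB\n\t\t</div>' |
  strip_markup
```

For this sample line the chain leaves only `1921`.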
[shell]
$ cat data | paste -d , - - - - - - - - | sed 's/,,/,.,/' | head -n 2
REDACTED,REDACTED,REDACTED,REDACTED,1921,2250,2375,2500
REDACTED,REDACTED,REDACTED,REDACTED,40,2250,2375,2500
[/shell]
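Each `-` tells paste to read one more line from standard input, so eight dashes fold every eight consecutive lines into one comma-separated row. The sed step then puts a `.` into the first empty field, presumably so that later column-oriented tools do not collapse it. A minimal sketch with placeholder values (`fold_rows` is a name invented here, and the field contents are not real account data):

```shell
# fold_rows: turn a stream of one-value-per-line records (8 lines each)
# into CSV rows, marking the first empty field with a dot.
fold_rows() {
  paste -d , - - - - - - - - | sed 's/,,/,.,/'
}

# One hypothetical account whose third field (the alias) is empty.
printf '%s\n' first last '' address 40 2250 2375 2500 | fold_rows
```

This prints `first,last,.,address,40,2250,2375,2500`.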
[shell]
$ cat data | paste -d , - - - - - - - - | sed 's/,,/,.,/' | awk -F, '{print $0 "," $5/$8*100"%" }' | head -n 2
REDACTED,REDACTED,REDACTED,REDACTED,1921,2250,2375,2500,76.84%
REDACTED,REDACTED,REDACTED,REDACTED,40,2250,2375,2500,1.6%
[/shell]
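The awk step appends the usage percentage as a ninth column: field 5 (current size) divided by field 8 (the hard quota), times 100. A small sketch using the two sets of numbers shown above, with placeholder text in the first four fields; `usage_pct` is a hypothetical helper name, and it assumes field 8 is never zero.

```shell
# usage_pct: append used/quota as a percentage to every CSV row.
# Field 5 = current mailbox size in MB, field 8 = hard quota in MB.
usage_pct() {
  awk -F, '{ print $0 "," $5 / $8 * 100 "%" }'
}

printf '%s\n' \
  'a,b,c,d,1921,2250,2375,2500' \
  'a,b,c,d,40,2250,2375,2500' |
  usage_pct
```

The two rows come out ending in `76.84%` and `1.6%`, matching the figures in the post.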
[shell]
$ cat data | paste -d , - - - - - - - - | sed 's/,,/,.,/' | awk -F, '{print $0 "," $5/$8*100"" }' | sort -t, -k +9 -n -r | tail -n 2
REDACTED,REDACTED,REDACTED,REDACTED,0,2250,2375,2500,0
REDACTED,REDACTED,REDACTED,REDACTED,0,225,237,250,0
[/shell]
[shell]
$ cat data | paste -d , - - - - - - - - | sed 's/,,/,.,/' | awk -F, '{print $0 "," $5/$8*100 }' | sort -t, -k +9 -n -r | column -t -s , | tail -n 5
REDACTED  REDACTED  REDACTED  REDACTED  1  2250  2375  2500  0.04
REDACTED  REDACTED  REDACTED  REDACTED  0  225   237   250   0
[/shell]
[shell]
$ cat data | paste -d , - - - - - - - - | sed 's/,,/,.,/' | awk -F, '{print $0 "," $5/$8*100 }' | sort -t, -k +9 -n -r | awk -F, '{ if ($8 > 250 && $9 < 10) print $3 "," $9"%" }' | column -t -s,
REDACTED  0.32%
REDACTED  0.2%
[/shell]
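The final awk filter keeps only accounts that sit on a larger tier (hard quota above 250 MB) yet use less than 10 percent of it. The sketch below isolates that filter; `downgrade_candidates` and the sample rows are invented for illustration, and reading 250 MB as the smallest tier's quota is an assumption based on the filter itself.

```shell
# downgrade_candidates: print field 3 and the usage percentage (field 9)
# for rows with a hard quota above 250 MB and usage below 10 percent.
downgrade_candidates() {
  awk -F, '$8 > 250 && $9 < 10 { print $3 "," $9 "%" }'
}

printf '%s\n' \
  'a,b,big-and-full,d,1921,2250,2375,2500,76.84' \
  'a,b,big-and-empty,d,8,2250,2375,2500,0.32' \
  'a,b,small-and-empty,d,0,225,237,250,0' |
  downgrade_candidates
```

Only the second row passes, printing `big-and-empty,0.32%`: the first is well used and the third is already on the 250 MB quota.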
[shell]
$ cat data | paste -d , - - - - - - - - | sed 's/,,/,.,/' | awk -F, '{ if ($8 > 250 && ($5/$8) < 10) print $0}' | wc -l | xargs echo "100 *" | bc
6100
[/shell]
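The last pipeline counts the matching rows and multiplies the count by 100, which reads as the yearly price difference between the roughly €150 and roughly €50 tiers. The same arithmetic can be done in a single awk pass. This is a sketch rather than the original method: `potential_saving` is a hypothetical name, the default of 100 per account is an assumption taken from those two prices, and it compares the percentage (`$5/$8*100`) against 10 as the earlier filter does.

```shell
# potential_saving: count rows with a hard quota above 250 MB and usage
# under 10 percent, then multiply by the saving per downgraded account.
# The optional first argument overrides the assumed saving of 100.
potential_saving() {
  awk -F, -v saving="${1:-100}" \
    '$8 > 250 && $5 / $8 * 100 < 10 { n++ } END { print n * saving }'
}

printf '%s\n' \
  'a,b,c,d,1921,2250,2375,2500' \
  'a,b,c,d,8,2250,2375,2500' \
  'a,b,c,d,5,2250,2375,2500' \
  'a,b,c,d,0,225,237,250' |
  potential_saving
```

With these four placeholder rows, two qualify, so the sketch prints `200`.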
Written by

Joshua Appelman