What or Who is Ufi learndirect?
“The concept of a ‘University for Industry’ led to the creation of Ufi in 1998. The organisation then set-up learndirect, a nationally recognised brand for learning. In six years learndirect has become the largest e-learning network of its kind in the world, and has individualised the delivery of learning to a mass audience through a unique combination of flexibility, accessibility and support.”
In this piece I plan to talk a bit about our e-learning platform and the part that open source tools and systems have played in our success.
Technical and Service Context
VOLUMES
447,000 learners last year
4,000 concurrent learners at peak
consuming 70 mb/s of bandwidth
99.98% systems availability
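To put the availability figure in perspective, the arithmetic can be sketched as follows. This is an illustration only: the function name and the choice of a 365-day year of continuous 24 x 7 service are assumptions, not part of any reporting tool described here.

```python
def downtime_minutes(availability_percent: float, days: int = 365) -> float:
    """Minutes of downtime implied by an availability percentage over a period."""
    total_minutes = days * 24 * 60  # assumes round-the-clock service
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    print(f"{downtime_minutes(99.98):.1f} minutes of downtime per year")
```

On those assumptions, 99.98% availability leaves a budget of well under two hours of downtime across a whole year.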
The learndirect learner management system (LMS), like most learning management systems, is more than a website with lots of content.
Content sites like the BBC or CNN, while they have some personalisation, typically present their consumers with a collection of web pages. If they are personalised at all, they present their consumers with a subset of content according to preferences or tracked activity. Critically, the content itself does not change from consumer to consumer and, as a result, it can be load-balanced across a number of servers or caches and requires relatively little tracking.
Learner management systems such as the learndirect system track a learner’s progress through a piece of learning and adapt in response to on-programme formative assessment. Such systems do expect to modify content according to consumer behaviour and as a result the use of multiple content servers only works to an extent. Such systems require a single authoritative data source for each course.
Additionally, consumers visiting a news or similar site have plenty of choice. If the BBC site is slow or not there for whatever reason, there are plenty of other such sites for a consumer to visit.
With web-delivered learning, the consumer intends to engage in a formal learning activity that they have formally enrolled in, and in many cases they have travelled to one of our learning centres to take their course. There is no other site for them to go to. If the site is slow or closed, then their journey was a waste of time.
For this reason the system must be both available and perform well. It is not enough that a system is available and returns content. If e-learning is to be effective, the medium needs to be as unobtrusive as possible; content has to render without the consumer becoming aware of any wait.
This presents us with a double bind: each user’s content is customised, yet there is a service expectation of 100% availability and responsiveness. Add the issues of large scale and 24 x 7 availability, and we can see that constructing such a service is a serious web engineering exercise.
If you are not monitoring the service, then you are just running software.
It’s never good when the first person to tell you that your service has a problem is one of your consumers. Without appropriate monitoring software this will inevitably be the case, and in all probability they won’t tell you immediately.
So, the first key differentiator between a service and a system is monitoring.
Choose the right tools.
When our service was first constructed, a very expensive piece of software was purchased to perform availability monitoring. However, Mr. Heisenberg was forgotten, and the load associated with that particular tool was sufficient to detrimentally impact the system. The tool itself was sold as the usual universal panacea; in implementation, however, it was clear that its forte was component monitoring and not service monitoring.
Running a live system with this tool gave us all sorts of problems. The tool required agents on all machines and was really only designed around component availability and even then this was often measured from the wrong place (inside the firewall).
We took a look at the open source offerings available at that time and selected two.
Event monitoring
Nagios has won lots of awards. We use it to monitor events from two locations.
- Our DMZ, where it looks at all of our components every 90 seconds and, critically, has thresholds set for Green, Amber and Red. While most components in our large system are duplicated to provide resilience, it’s absolutely vital to know when one of your resilient components has failed in order to prevent a systems failure.
- The public Internet. From this location, we can look at the service(s) from the perspective of the end user.
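The kind of threshold check described above can be sketched as a small Nagios-style plugin. This is a minimal sketch, not our production check: the function names, the default thresholds and the mapping of Green, Amber and Red onto the standard plugin exit codes OK (0), WARNING (1) and CRITICAL (2) are assumptions made for illustration.

```python
import sys
import time
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(seconds: float, amber: float, red: float) -> int:
    """Map a response time onto Green (OK), Amber (WARNING) or Red (CRITICAL)."""
    if seconds >= red:
        return CRITICAL
    if seconds >= amber:
        return WARNING
    return OK


def check_url(url: str, amber: float = 2.0, red: float = 5.0):
    """Time one HTTP GET and return (exit_code, status_line)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=red * 2) as response:
            response.read()
    except Exception as exc:  # any failure to fetch is treated as Red
        return CRITICAL, f"CRITICAL - {url} unreachable: {exc}"
    elapsed = time.monotonic() - start
    code = classify(elapsed, amber, red)
    return code, f"{LABELS[code]} - {url} answered in {elapsed:.2f}s"


# Hypothetical usage: python check_service.py https://example.org/
if __name__ == "__main__" and len(sys.argv) == 2:
    exit_code, line = check_url(sys.argv[1])
    print(line)
    sys.exit(exit_code)
```

Run from outside the firewall, a check of this shape measures what the end user experiences rather than whether an individual component is up.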
Nagios is used to provide event monitoring. Implementing such a tool is not to be undertaken lightly. Getting the sensitivity correct so as not to cry wolf, and embedding the culture such that when an alert is sent out, the operational staff respond rapidly is, in my opinion, more difficult than installing the system in the first place.
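One common way to avoid crying wolf is to alert only after several consecutive failed checks; Nagios expresses a similar idea through its max_check_attempts setting. The sketch below illustrates the principle only: the class name, the default of three attempts and the returned strings are hypothetical and are not Nagios code or our configuration.

```python
from typing import Optional


class AlertDamper:
    """Raise an alert only after `max_attempts` consecutive failed checks."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.failures = 0       # consecutive failures seen so far
        self.alerting = False   # True once an alert has been sent

    def record(self, check_ok: bool) -> Optional[str]:
        """Feed in one check result; return "ALERT", "RECOVERY" or None."""
        if check_ok:
            self.failures = 0
            if self.alerting:
                self.alerting = False
                return "RECOVERY"
            return None
        self.failures += 1
        if self.failures >= self.max_attempts and not self.alerting:
            self.alerting = True
            return "ALERT"
        return None


if __name__ == "__main__":
    damper = AlertDamper(max_attempts=3)
    for result in [False, False, True, False, False, False, True]:
        print(result, damper.record(result))
```

A single blip is absorbed silently, while a sustained failure produces exactly one alert and one recovery notice. The trade-off is a delay before the first alert of roughly the check interval multiplied by the number of attempts.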
Trend and volume monitoring
The second open source monitoring tool we use provides trend monitoring. After looking around, we found Cacti.
While Nagios tells us when we have a specific issue/problem, Cacti provides us with the information to understand or diagnose the root cause. In measuring volumes and their trends, Cacti allows us to look across the whole application stack at any point in time and examine critical volumes.
Cacti is used to measure volumes. If a system can return a number, Cacti can capture, store and trend it. These can be business or technical volumes: examples might include the number of users logged into the system over time, or critical system volumes such as bandwidth, disk space, CPU or memory usage.
When you want to compare historical volumes or activity at a particular moment in time, Cacti can provide it.
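As an illustration of "if a system can return a number", a Cacti data input script can simply print one or more values as space-separated name:value pairs. The sketch below is hypothetical: the field names and the choice of disk usage as the metric are assumptions, and a business volume such as logged-in users would need a query against your own system.

```python
import shutil


def format_fields(fields: dict) -> str:
    """Render values as space-separated name:value pairs for Cacti to parse."""
    return " ".join(f"{name}:{value}" for name, value in fields.items())


def collect_disk(path: str = "/") -> dict:
    """Gather disk figures for one filesystem (hypothetical field names)."""
    usage = shutil.disk_usage(path)
    return {
        "disk_used_pct": round(100 * usage.used / usage.total, 2),
        "disk_free_gb": round(usage.free / 1024 ** 3, 2),
    }


if __name__ == "__main__":
    print(format_fields(collect_disk("/")))
```

Each named field can then be stored and graphed over time, which is what makes historical comparison possible.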
Culture and tools
As you might expect from an open source tool set, both of these tools are highly extensible. With the exception of our database monitoring, we have been able to write and adapt agents to interface with them, and we have been able to monitor and trend all our services.
I spoke above about getting the culture right. Putting these critical volumes onto big flat screens, and so making them obvious to everyone in your operations and service team, was the single most important cultural change we made, next to implementing an ITIL service culture.
The real question here is how we’ve been allowed to put all this instrumentation all over our application. Most government contracts are outsourced, but we chose to in-source our operations and development teams.