Big Data

Written by Jonathan Kettleborough on 1 June 2014 in Features

In the third and final article of this series, Jonathan Kettleborough highlights a number of emerging issues and potential pitfalls to be aware of when using Big Data.

In this final article about L&D and Big Data, the focus is on demonstrating some of the emerging issues and trends that L&D will need to be aware of if they are to make the very best use of Big Data and avoid a number of possible hazards.

The good

Let’s start on a positive note. The emergence of Big Data and data analytics can be of massive benefit to the L&D industry. With the right data and application we can identify and address a wide range of issues, including:

  • The ability to identify at the earliest opportunity those who are in danger of leaving our organisations and do what’s necessary to retain our best talent
  • The ability to identify the really high performers using ‘new’ live data on performance, ratings and profitability, rather than ‘old’ assumptive data such as university attended and the grade of their degree
  • The ability to measure – at a more granular level – the real drivers of performance within our business and thereby discover those ‘hidden gems’ that are making a real difference
  • The ability to ‘fine tune’ businesses based on fact rather than fiction – as L&D professionals we handle ambiguity well – but now it’s time to deal with evidence rather than emotion
  • The ability to understand that data sets can be combined to form a more intricate and accurate picture than ever before – no longer will ‘average training hours per year’ or ‘delegate feedback per trainer’ be considered enough evidence on their own – they will require support by more and more data sources.

There’s no doubt that, handled correctly, Big Data has the opportunity to deliver the L&D profession some massive benefits. But it’s not all good news, and there’s a real necessity to tread carefully as we shall explore in this article.

The bad

Let’s be frank: learning and development folks have never been that hot with numbers. Sure, we can handle ‘happy sheets’ and some basic performance data but, with few exceptions, we’re not the type of people who can readily handle data in a detailed way. I explored this situation in my last article; suffice to say, we’re lacking in both ability and comfort with data [1].

Here’s a simple, but very real example. A few years ago I spoke at an industry conference and after the event I asked for details of the feedback I’d gained. “I’ll send it to you, but you might not like all the results,” I was told. When I got the results I realised that a fundamental error had been made. As with so many feedback systems, delegates were asked to rate the presenter on a scale of 1 to 5, where 1 was poor and 5 excellent. What happened was that anyone who hadn’t answered a question was entered as zero – massively affecting the score.

Let’s see how this worked in reality: there were 60 people at the event. If all had voted the session as a 4 (on the same scale of 1 to 5, where 1 is poor and 5 is excellent) then the scores would have been:

60 x 4 = 240

240 / 60 = 4

So in this case an average of 4 would have been recorded. But what actually happened was: of the 60 people at the event, 50 had voted the session a 4, but 10 had failed to respond. Here’s how the numbers should have been calculated:

50 x 4 = 200

200 / 50 = 4

However, here’s how the numbers were calculated:

50 x 4 = 200

10 x 0 = 0

200 + 0 = 200

200 / 60 = 3.33

A simple mistake, I’d agree, but one that, if used for recognition, reward or promotion, could have had dire consequences. Our ability to handle data, even at a basic level, is critical when we enter the Big Data arena and if we get it wrong we’d better be prepared for the backlash.
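The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch: the function names are my own, and it assumes that a skipped question is recorded as `None` rather than as a number.

```python
from statistics import mean


def average_rating(ratings):
    """Mean of the ratings actually given; None marks a skipped question."""
    answered = [r for r in ratings if r is not None]
    if not answered:
        return None  # nobody answered, so there is no average to report
    return mean(answered)


def average_rating_zero_filled(ratings):
    """The flawed approach: count every skipped question as a score of 0."""
    return mean(0 if r is None else r for r in ratings)


# 60 delegates: 50 rated the session 4, 10 left the question blank
ratings = [4] * 50 + [None] * 10

correct = average_rating(ratings)
flawed = average_rating_zero_filled(ratings)

print(correct)           # average over the 50 people who answered
print(round(flawed, 2))  # average dragged down by the 10 blanks
```

The design choice that matters is keeping ‘no answer’ distinct from any real score, so that blanks are dropped from both the total and the head count.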

Our ability to make simple mistakes goes further. You’ll be familiar with the Kirkpatrick model for training evaluation, in which Level 1 metrics are all about the “reaction” of the delegates to the learning, or the location, or the trainer. Now that all sounds fair and reasonable, but a variety of commentators and research studies show that there is little correlation between learner reactions and measures of learning, or subsequent measures of changed behaviour. It’s been suggested that “satisfaction” is not necessarily related to good learning and that sometimes learner discomfort is actually essential for success.

Mixed results may indicate that what is measured at the reaction level stage might be important and more focused reaction-level questionnaires may be more informative about the value of training. But even ignoring the negative research, stand back for a moment and ask yourself if you really believe that there’s a direct link between someone’s liking for a location, a trainer or a lesson and their long-term performance. Crazy eh, but given these facts, why on earth do we continue to use these metrics?

I believe (and I hasten to add it’s just my belief) that these metrics provide the senior L&D professionals with what can only be described as “comfort metrics”. Ask someone to vote on how they felt about having root canal treatment (and yes, I have been there) and I suggest that most people would not rate it as a pleasurable experience. But does it matter? Of course not! What matters most are the outcomes and not the processes!

Neil Rackham, the best-selling author of SPIN Selling, was asked to assess his best trainers. Surprisingly, trainers with the worst feedback from the students were the ones who actually delivered better salespeople. Put simply, the level 1 data was giving the wrong picture! And this is not just a one-off occurrence. Roger Chevalier, vice president of Performance for Century 21 Real Estate, found that there was very little correlation between level 1 evaluations and how well people performed in the field. Echoing those thoughts, Donald Taylor, chairman of the Learning and Performance Institute, has said:

“When I was running a 17-room training centre in London, I noted the 3L effect (lunch-loos-liking). If there was something wrong with the lunch or the loos, the scores for the trainer were marked down, even if it was the same trainer, on the same course, in the same room. And no, it wasn’t just that trainer having a bad day – all classroom scores were marked down.”

We are going to have to learn to look at data differently and to be prepared to throw away many of our old beliefs. A great example of this is the growing view that forced ranking – the application of normal distribution to employee performance – is deeply flawed. Although many still stick to the forced rank, or forced stack as it is sometimes known, there is building evidence that these approaches have actually damaged organisations, and some believe that Microsoft’s ‘lost decade’ was a direct result of misplaced or misunderstood data techniques [2].

So there’s the bad part – we know that we’re not good with data and until we become more familiar and comfortable with data then we’ll never reap the deeper benefits of Big Data – indeed, we could do untold harm along the way, as we shall now see.

At a basic level, if we don’t understand data then we can make some massive mistakes – that much is already understood. However, the really ugly impact of Big Data is, I believe, yet to really hit us.

There’s a classic Big Data story of how retailer Target marketed pregnancy-related products to a teenager even though her father did not know she was pregnant. Luckily Target was right with its analysis, but you may not be so lucky. As a customer who used a store card, the teenager had given Target the right to access and use her purchase history [3]. That’s fair enough. But what about your employees: have they given you the right to access all their data?

We know that recruiters are already ‘looking’ at our social media lives and this in itself has caused some issues, although it’s generally felt that this is justifiable, especially when using work-related sites such as LinkedIn. But where companies attempt to mine or scrape data from your non-work-related social media life, there could be some potentially ugly consequences.

According to the CIPD, using social media in recruitment or as part of career progression carries the risk of a number of different claims if a candidate is not appointed as a result of information gleaned [4]. Depending on the circumstances, the claims available include:

  • A breach of the Human Rights Act 1998 (incorporating Article 8 of the European Convention on Human Rights). This provides a right to respect for private and family life. Although some case law has established (in another context) that employees should have little expectation of privacy with material they post online, claims remain possible.
  • A breach of the Data Protection Act 1998, which states that data controllers such as prospective employers should not hold excessive information and should process any information in a fair way.
  • Age discrimination – it has been suggested that the over 50s age group will be more cautious with their social media presence than the under 30s, resulting in more potential for negative recruitment decisions for younger people.
  • Sex discrimination – information about people’s marital status, numbers of children etc may incorrectly influence a selection decision.
  • Disability discrimination – information about people’s physical or mental state, such as revealing depression to their friends on social media, may also lead to a claim.
  • Sexual orientation discrimination – if a prospective employer reacts negatively to information disclosed on social media about sexuality, this may lead to a claim.

Even the emergence of massive open online courses (MOOCs) is now causing a stir in the world of Big Data, with news that student details, including performance data, are being sold online. If recruiters are looking for ‘smart self-motivated candidates’ then this sounds like an ideal hunting ground, until, that is, you realise that the MOOC population is overwhelmingly male and that this merely increases sex and race bias, which is not a good thing at all [5, 6].

There are also some other considerations we need to take into account when using Big Data within L&D.

Quality and accuracy: If we’re going to make judgments about people we’d better make sure that the quality and accuracy of our data are beyond reproach – as shown by the feedback form example earlier. The accuracy of data is even more critical if we are to make potentially life-changing judgments about someone. What are the consequences if we hire, promote or fire someone based on bad data? I’m not aware of any legal actions at the moment, but I honestly think it’s only a matter of time before someone sues an employer for Big Data discrimination – and I’m not alone; Kate Crawford, a Principal Researcher at Microsoft Research, would agree [7].

Quantity: We need to make sure that we have enough data to make decisions. As the old proverb states, ‘One swallow does not a summer make’. When analysing data it’s critically important to make sure that you have enough of it to really paint a picture. I don’t propose to go into the science of probability, sample sizes or statistical significance, but be aware that just 20 people in a population of 1,000 exhibiting certain traits, behaviours or trends doesn’t mean that this is statistically significant for the whole population. Picking a few and assuming that what you find holds for the many is no longer good enough.

Correlation and Causation: I covered this at length in my last article [8] and won’t go over too much old ground here. Suffice to say, the key issue is that just because two items correlate does not mean that one causes the other. Ignore this at your peril!

Privacy and anonymity: I’ve mentioned the problems of data mining or scraping social media sites and there’s a real danger that people’s personal data could be used against them – even inadvertently. Even data which has been scrubbed to remove personal references can be reconnected to individuals. Mobile phone carriers are selling collections of data about phone movements with all personal details removed. But a group of researchers from MIT, the Université catholique de Louvain in Belgium and other institutions looked at one such collection and were able to uniquely identify 95% of the users by analysing just four time and location stamps per person [9].

Several years ago, researchers at Carnegie Mellon University were able to create a system to uncover Social Security numbers from birthday and hometown information listed on social networking sites like Facebook [10, 11].

And within the UK the sale of NHS records has caused uproar, as the potential for reversing anonymous data is huge [12].

And of course most major businesses will conduct internal research and surveys where the employee is assured anonymity. Examples could be safety surveys, culture surveys, employee engagement surveys and so on. The idea behind anonymity is that the employee will give a more truthful response and the business therefore sees a picture that is closer to the truth than the rose-coloured image it may otherwise see. During these surveys employees usually have to complete a range of demographic information – age, location, role, time in service etc – which could, if the data were used for the wrong reasons, actually identify either very small groups of respondents or actual individuals. If used incorrectly this ‘honest data’ could be turned against the employees. Dangerous times indeed.
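One way to see how demographic questions can undo anonymity is to count how many respondents share each combination of answers. The Python sketch below is illustrative only: the field names, the made-up survey rows and the minimum group size of five are all my own assumptions, not a standard.

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # assumed threshold; pick one that suits your organisation


def small_groups(responses, fields, min_size=MIN_GROUP_SIZE):
    """Return each demographic combination shared by fewer than min_size people."""
    counts = Counter(tuple(r[f] for f in fields) for r in responses)
    return {combo: n for combo, n in counts.items() if n < min_size}


# Made-up survey rows, for illustration only
responses = (
    [{"age_band": "30-39", "location": "Leeds", "role": "Advisor"}] * 6
    + [{"age_band": "50-59", "location": "Leeds", "role": "Manager"}]
)

risky = small_groups(responses, ["age_band", "location", "role"])
print(risky)  # combinations small enough to point at an individual
```

Any combination that appears in the result is one where the ‘anonymous’ answers could be traced to a handful of people, or to one person, so results for those groups are better merged into a broader category or withheld.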

Tomorrow’s Big Data challenge isn’t just technical; it’s whether L&D managements have algorithms and analytics that are both fairly transparent and transparently fair. Big Data champions and practitioners had better be discriminating about how discriminating they want to be.

The ugly

The obvious thing about Big Data is that it’s big! No pun intended here, but whenever large amounts of data are collected they become the target of the unscrupulous. Keeping your data safe is becoming a bigger and bigger issue on almost a daily basis. And it’s not just the ‘USB left on a train’ syndrome anymore – there are now individuals and groups willing to sell and exploit valuable data. One example is the two Aviva employees who sold details of people who had recently had accidents to claims companies. The flag was raised when hundreds received calls from firms urging them to pursue personal injury claims. The employees have been dismissed and police have arrested two people on suspicion of fraud [13].

Added to that, Morrisons, the Bradford-based UK retailer, has also been hit: the personnel records of around 100,000 employees were stolen. According to the BBC, this information, which includes bank account details, was published online and sent on a disc to a newspaper. According to Morrisons, its initial investigation does not point to the work of an outside hacker, and it has sought to allay shoppers’ fears by saying there had been no loss of customer data [14].

Not all data loss will be reported or even known about but here’s a list of some of the UK government departments and agencies known to have ‘lost’ data. 

  • Serious Fraud Office
  • Greater Manchester Police
  • Powys County Council
  • Department for Work and Pensions
  • Ministry of Defence
  • Insolvency Service
  • Home Office
  • Royal Navy
  • Foreign and Commonwealth Office
  • HM Revenue and Customs
  • Ministry of Justice
  • Driving Standards Agency

But data loss isn’t all down to hackers. According to various sources, the majority of data loss is due to employees. Aviva was one example, and the continued focus on the exploits of ex-NSA contractor Edward Snowden shows just how much damage one employee can do [15].

Of course the majority of employees are honest, and in the past the theft of data would have been targeted at bank account details. But why clear out a bank account once when you can have an income for years? Knowing details of an employee’s marital status, their sexuality or their deepest medical details may seem somewhat boring – until, that is, someone decides to take the information to a newspaper or to blackmail the individual.


There’s no doubt that Big Data is here to stay and as the connection of data grows then we’ll be hard pushed to avoid its impact. Where L&D is concerned, however, we need to take care. We need to ensure we use the right data, in the right way, for the right reasons and allow a right of reply. We also need to exercise a massive duty of care – we’re potentially in charge of data that could change lives for the worse and we must always remain cognisant of that fact.

It’s bound to be a bumpy ride and it’ll be one I’ll watch closely.

A fully-referenced version of this article is available on request.

About the author

Jonathan Kettleborough is a consultant, author, blogger and lecturer. He can be contacted on Twitter @JKettleborough or via

