Saturday, February 2, 2013

Essential: Not Bouncy Enough?

Admin note: Following the farcically misinterpreted Galaxy poll of "best" Prime Ministers of the last few decades (in which a huge lead for Howard based largely on him being the only Liberal in the sample while the Labor vote was split four ways was supposed to prove he was the best), I thought it would be fun to have a "Not-A-Poll" opt-in for "best" Tasmanian Premier!  It's on the right. Vote early stack often! Results are almost totally meaningless and for amusement/interest purposes only.  Ballot order random.

------------------------------------------------------------------------------------------------------------------------ 
 Advance Summary

1. The regular online pollster Essential Report has recently polled figures that lean to the Coalition compared to other pollsters (by about 1 point or slightly more) and that change very little from poll-to-poll.

2. Essential's two-party preferred results became much less volatile from early to mid 2010 onwards.  Prior to that they had been very bouncy and tended to have a large house effect in favour of Labor.

3. The current lean to the Coalition is a more recent development.

4. It would be expected that Essential would bounce less from poll to poll compared to other polls because of its use of a two-week rolling average, which gives it a larger effective sample size and reduces the impact of random factors on the result.

5. However, even with sample size taken into account, Essential polling in 2011 and 2012 was 30% less bouncy than a simulated "random" poll of similar size, even after assumptions are made that make the random poll less bouncy than such a poll normally would be.

6. This suggests that (i) Essential's scaling system is remarkably effective in reducing poll-to-poll bounce, or (ii) there is a problem with the underlying subsampling methods that is causing overly repetitive results, or (iii) both of these things are happening. At this stage I think (ii) is most likely.

7. It would be easier to trust a poll that produced remarkably constant results if it did not drift off the 2PP average of the other pollsters.

Disclaimer: These are provisional findings relating to a complex modelling question.  They may be amended based on any feedback received.  If so, significant edits will be noted at the bottom of the post. Oh, and this post comes with a Level Three Wonk Alert.  You have been warned!



-----------------------------------------------------------------------------------------------------------------

Well, what a week it's been.  Election date announced, Craig Thomson arrested and charged with eating an icecream (oh and massive sweeping fraud and corruption of course), Roxon and Evans resigning, Obeid saga continues, World Heritage Area nomination.  Capped off with Tony Sheldon's glorious comment about NSW Labor hacks as "B-grade politicians able to thrive forever on corruption and detritus [..] like cockroaches", raising the question of why Labor was too gutless to bring in the pest controllers back around, oh, 2005 or earlier.  In the midst of this, Andrew Wilkie called for a complete ban on election betting. I may have been the only person in Australia who noticed, and I didn't agree! We should see the first polling data that is influenced by these strange events very soon.

Note that this week's Newspoll would be expected to be, on average, at least a 52 to Coalition based on bounceback trends after a 3-point shift alone.  Don't therefore get too excited (yet) by anything in the range 51-54, while a 55 or up would be strong evidence that Labor has sustained serious (but not necessarily permanent) damage.  

[Update (Monday morning): The Newspoll was indeed very unpleasant for Labor, a 56 to the Coalition, although this was mitigated by a rather mild 54-46 (=53-47 accounting for house effect) Galaxy.  With two such different results more evidence of the new state of play from other pollsters is needed to say whether we have only seen a modest shift from 52.5-land or a major one.

Further Update (Wednesday morning): Only a modest shift with Morgan and Essential in the mix now; the Newspoll looks at least 2 points, probably 2.5, out of whack.]

The great thing about such an early announcement of an election date (assuming it lasts) is the unprecedented (for Australian federal elections) chance to benchmark the government's polling progress against past governments, and being able to accurately compare like with like in terms of time until a "known" polling date.  If the government remains in a competitive polling position, it will be possible to show how it scrubs up compared to 1954, 1993, 2001 and other such comebacks (as well as losses like 1996) and I hope to have some graphs on that stuff soon.  On the other hand, the announcement of the election so far in advance provides yet another way in which this election leadup will be unique and all attempts to project it based on past patterns could fail.

At the moment, and especially after a week like this, the government remaining in a competitive position is at least a medium-sized "if".  Abbott notwithstanding (his unpopularity probably is a drag on the Coalition vote, but apparently a finite one), I shan't be greatly shocked anymore if the last four months is about as close as it gets.  I often see parallels with 1996, in which the government was within reach in polling but never got clear air before yet another mistake had it hosing down self-created problems again.  But that's just my subjective view at the moment; the science is silent.

--------------------------------------------------------------------------------------------------------------

A funny thing happened in federal polling last year.  You can see it on the Mark the Ballot graph here, on the right-hand side just below the trend line.  There's a line of little circles, five of them, in a row.  Amid all the chaos and noise, the rogues and the bouncing, online pollster Essential Report polled the same national 2PP result, 53-47 to Coalition, in five consecutive fortnights.  Like this:

(excerpt from MTB here)
You wouldn't have noticed this if you were just following the Essential weekly figures.  In the same period they went 53-53-53-53-54-53-52-53-53-53, and you might have rightly thought four 53s in a row from a pollster that rolls its c. 1000-vote samples together over consecutive fortnights isn't really all that odd.  But to get five independent fortnightly samples in a row with the same rounded 2PP is the sort of thing that should only happen randomly maybe once every four or five years.  And yet, last year, it actually happened twice - from June 12 to August 6, five independent fortnightly Essential totals produced five 56-44s.

Now, of course, funny patterns of some sort or other will appear in any load of data that you look at, and at the moment it seems in Australia that one in a hundred year floods or heatwaves happen about two years out of three.  But there's apparently a reason for that, and in Essential's case I've been wondering if there is something strange going on with their results.  And I haven't been alone in wondering.

The poster reception to recent Essential polls in the comments section on Pollbludger has often been one of familiar dismay.  This is to be expected, given that, for whatever reason, Pollbludger posters lean heavily to Labor, and the company keeps showing the Coalition way ahead, even while Newspoll doesn't.  But there is more to the reception than just the concern about house effects that accompanies every Galaxy poll that appears with its standard extra point to the Coalition.  In the case of Essential, there's concern that its online polling methods may be making the poll repetitive and unresponsive to actual changes.  Is the company really "fishing in a stagnant pool"?

With that in mind I decided to test the following hypothesis: that in 2011 and 2012 Essential has been less bouncy than would be expected for a truly random sample even if voting intention was stable.

Essential's Nature and History

Essential is unlike other significant Australian pollsters in that it uses sub-sampling from a much larger online panel, rather than random sampling from the broader community, to make up its group of subjects for each poll. The pollster's methods are currently described as follows:

---------------------------------------------------------------------------------------------------------------

Essential Research has been utilizing the Your Source online panel to conduct research on a week by week basis since November 2007.  Each Monday, the team at Essential Media Communications discusses issues that are topical.  From there a series of questions are devised to put to the Australian  public.  Some questions are repeated each week (such as political preference and social perspective), while others are unique to each week and reflect prominent media and social issues that are present at the time.   
 
Your Source has a self-managed consumer online panel of over 100,000 members. The majority of panel members have been recruited using off line methodologies, effectively ruling out concerns associated with online self-selection.  Your Source has validation methods in place that prevent panelist over use and ensure member authenticity.    Your Source randomly selects 18+ males and females (with the aim of targeting 50/50 males/females) from its Australia wide panel.   An invitation is sent out to approximately 7000 – 8000 of their panel members.  The response rate varies each week, but usually delivers 1000+ responses.  The Your Source online omnibus is live from the Wednesday night of each week and closed on the following Sunday.  Incentives are offered to participants in the form of points.
 
EMC uses the Statistical Package for the Social Sciences (SPSS) software to analyse the data.  The data is weighted against Australian Bureau of  Statistics (ABS) data.  

-------------------------------------------------------------------------------------------------------------------

When publishing a weekly poll rate, Essential merges each week's sample with the previous week's sampling, to create a sample size that averages around 1900 respondents for the voting intention question.  (Uncertain respondents are excluded from the percentage calculations and hence the sample size).  An archive of Essential Report releases from Sep 2008 to 2010 is stored here and recent figures can be found here (with many pages of previous results to scroll back through).  From these sources plus the frequent old reporting of the poll at Pollytics I was able to compile a full list of Essential 2PPs since Sep 2008.  (I have not yet been able to find the late 2007-early 2008 figures, although they evidently exist.)

When its polling first became prominent, Essential was difficult to take seriously, because although it asked a wider range of interesting questions than other pollsters (and still does), its results were very Labor-skewed.  This chart from Pollytics shows that through most of 2008, and through the first half of 2009, Essential's reading of the Labor primary vote was about three points higher than Newspoll's. The pollster was locked in mortal combat with Morgan Face-2-face for the title of the most house-effected of the major pollsters.  In early 2009 it was giving the Coalition 2PPs as low as 37 while Newspoll was bottoming out around 42.

Furthermore, for its sample size, the poll was remarkably bouncy.  In June 2009 (just before the OzCar saga dented Turnbull's leadership) it changed in the Coalition's favour by seven points in a fortnight (though they were still behind 55:45) but it soon returned to its overly Labor-favouring ways. 

From around the start of 2010 onwards, though, the pollster became less out of step with others, and it recorded a good result at the 2010 election when its final poll (51:49 to Labor) missed the mark by less than a point.  In Feb 2011 the poll was assessed as favouring Labor by 0.7 points.  But this has shifted, and the current Mark the Ballot aggregator post shows it as now the most Coalition-friendly of the major pollsters, slightly more so than Galaxy, with a lean to the Coalition that exceeds a point. For the history of this shift over time, see the Mark the Ballot post linked here (it is also linked in the first comment).   Bludgertrack gives Essential a reasonable accuracy rating, although this is based on between-election comparisons with Newspoll, and also considers it to skew Coalition by about a point.

This graph tracks the Essential weekly reading of the Coalition 2PP since September 2008:


Note that in 2008 and 2009 the line is much more erratic than in at least the second half of 2010 (probably most of 2010 actually, as the dip midway through 2010 is the short-lived Gillard transition bounce for Labor, and clearly not random bouncing).  The poll became less erratic at about the same time as it stopped wildly favouring the ALP compared to other pollsters (as discussed above).  I don't know if the company changed its herbs and spices around this time, but it's a remarkable difference. 

To assess how much Essential bounces from sample to sample, I decided to split the overlapping fortnightly samples into two groups labelled A and B, so that each group consisted of a string of completely distinct fortnightly samples. The first weekly sample from each year goes in group A, the second in group B, the third in group A, the fourth in group B, etc.  I think this is easier than trying to model/simulate the complications of samples that are batched with other samples when you don't know the exclusion rate for each.  As a result there are two runs of fortnightly samples, which describe almost exactly the same polling (except at the end of the year) but compile it differently, leading to somewhat different results.
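The A/B split described above can be sketched in a few lines. This is only an illustration, not the spreadsheet actually used: the function names are made up for the example, and the input is the run of ten weekly readings quoted earlier in this post (53-53-53-53-54-53-52-53-53-53), treated as if it started at the beginning of a year.

```python
def split_fortnightly(weekly_2pp):
    """Split rolling two-week readings into two runs of non-overlapping fortnights."""
    group_a = weekly_2pp[0::2]  # 1st, 3rd, 5th ... weekly release
    group_b = weekly_2pp[1::2]  # 2nd, 4th, 6th ... weekly release
    return group_a, group_b


def mean_abs_change(series):
    """Average absolute poll-to-poll change within one run."""
    changes = [abs(later - earlier) for earlier, later in zip(series, series[1:])]
    return sum(changes) / len(changes)


# Weekly Coalition 2PP readings quoted earlier in the post.
weekly = [53, 53, 53, 53, 54, 53, 52, 53, 53, 53]
group_a, group_b = split_fortnightly(weekly)

print(group_a, mean_abs_change(group_a))  # [53, 53, 54, 52, 53] 1.0
print(group_b, mean_abs_change(group_b))  # [53, 53, 53, 53, 53] 0.0
```

Run on those ten readings, one run shows the five identical 53s while the other run, built from the same polling, bounces by a point on average, which is why both runs are reported.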

The following are the yearly averages for the level of fortnightly poll-to-poll bouncing in Essential, for the two groups.  I could also include standard deviations, but when dealing with clearly non-standard distributions, that would be a bit spurious. (For 2008 I have only results from September onwards.)


In 2008 and 2009 the average poll to poll bounce for Essential was about 2 points.  But since then, it was barely over 1 point in 2010, and below that on an average across the two samples in 2011, with 2012 the least bouncy yet (an average of 0.827 points change per fortnight across the two sampling runs).

How Bouncy "Should" An Opinion Poll Be?

In my article Morgan and the Myth of Excessive Bouncing I refuted the myth that both Newspoll and Morgan Face-to-Face readings are excessively prone to bouncing around from poll to poll.  I showed that both these polls - both of which rely mainly on random selection from the wider voting population - were not much bouncier than a simulated random poll involving respondents who always had a 54% chance of preferring the Coalition.

This is as it should be. Newspolls aren't completely randomised, in that they do employ geographic scaling, but I don't think that would greatly reduce poll-to-poll bounciness.  Some poll to poll movement comes from genuine long-term changes and some probably from short-term issues, and some of it comes from things like leadership bounces, so for a poll to be bouncing at slightly above the expected random poll-to-poll bounce is to be expected.  If a poll is much less bouncy than a random series of similar size, it's either doing something very clever, or something wrong.

In the case of Essential, if we assume a sample size of 2000 (it is actually usually a bit smaller) and a 2PP of 54, then the standard deviation for the result of the whole sample (roughly, the typical difference between the sample result and the expected score) would be about 1.11 points.  But that's before rounding, which has some impact on it.  I'm not familiar with (or can't remember!) the maths required to model the impact of rounding precisely, or to calculate the expected poll-to-poll difference (except that it should be at least as big as the SD), so I've decided to model it experimentally.  Enter my little simulation, FakeEssential.
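For reference, the arithmetic behind the 1.11 figure can be sketched as below. The second function is an addition that is not part of the original analysis: it uses the standard result that the mean absolute difference between two independent normal draws with the same SD is 2·SD/√π. It assumes a normal approximation, fully independent samples and no rounding, so it is a rough cross-check rather than a replacement for the simulation.

```python
import math


def sampling_sd_points(p, n):
    """SD of a sample percentage, in points, for true proportion p and sample size n."""
    return 100 * math.sqrt(p * (1 - p) / n)


def expected_abs_diff(sd):
    """Mean absolute difference between two independent normal draws sharing this SD."""
    return 2 * sd / math.sqrt(math.pi)


sd = sampling_sd_points(0.54, 2000)
print(round(sd, 2))                     # 1.11
print(round(expected_abs_diff(sd), 2))  # 1.26
```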

At 17am on the 16th and 32nd of each month, FakeEssential pseudo-randomly polls 2000 fake residents of the land of Ausfakia by using the Excel RAND() function to ask them to pick a number between 0 and 1.  (I even borrowed one of my partner's computers so I could use Excel 2007, as pre-2003 Excel has a primitive pseudo-random number generator and I'm not sure what Libre Office's is like.) A figure above 0.46 is interpreted as support for the Coalition, while below 0.46 means that the fake voter fears even imaginary Tony Abbotts and won't vote for one.  With a total of 5174 polls by FakeEssential in the can, it's been running for over 215 fake years and polled over 10 million fake voters (most of the Ausfakian electorate!)

FakeEssential is designed to simulate a dreary land in which Absolutely Phony Tony's Fake Coalition leads The Unreal Julia's Faker Party 54-46 forever (54-46 being close to the average real position of the last two years); however, there is never an election.  After 215 fake years of this it had the Fake Coalition on an average 2FP (two-fake-preferred) of 54.038% with a standard deviation of 1.139 points.  Consistent with experience that virtually no data lie more than three standard deviations from the mean, the highest result it recorded was 58 (five times in 5174 samples) and the lowest 50 (three times).
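For anyone who would rather not borrow a copy of Excel, a rough Python equivalent of FakeEssential might look like the sketch below. It is not the original spreadsheet: it uses Python's own pseudo-random generator instead of RAND(), the function names are invented for the example, and rounding halves upward is an assumption (the original rounding rule is not stated).

```python
import random


def fake_poll(rng, n_voters=2000, labor_cutoff=0.46):
    """Poll n_voters fake voters and return the rounded Coalition 2FP in points."""
    # A draw above the cutoff counts as support for the Fake Coalition.
    coalition = sum(1 for _ in range(n_voters) if rng.random() > labor_cutoff)
    # Integer arithmetic that rounds halves up (an assumed rounding rule).
    return (100 * coalition + n_voters // 2) // n_voters


def run_fake_essential(n_polls, seed=0):
    """Run a series of independent fake polls with a fixed, never-changing 2FP."""
    rng = random.Random(seed)
    return [fake_poll(rng) for _ in range(n_polls)]


def mean_abs_change(results):
    """Average absolute change between consecutive polls."""
    changes = [abs(later - earlier) for earlier, later in zip(results, results[1:])]
    return sum(changes) / len(changes)


results = run_fake_essential(500)
print("mean 2FP:", sum(results) / len(results))
print("mean poll-to-poll change:", mean_abs_change(results))
```

A run of 500 polls is much shorter than the 5174 used here, so its averages will wander a little more from run to run; raising `n_polls` narrows that at the cost of running time.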

The mean fortnightly poll-to-poll change of FakeEssential is 1.250 points.  Here's what the frequency distribution of poll-to-poll net bounces looks like:


(I pressed the invert colours button by mistake and decided I liked it that way.  That isn't quite a zero for a six-point bounce - it actually happened three times in the sample.)

Fake vs Real Essential 

My original hypothesis was that Essential's results for 2011-2 would be less bouncy than FakeEssential's.  Before I make the comparison, I should note that there are three assumptions I am making that should make such a finding less likely, and that should make FakeEssential bounce less than Essential if Essential is anything like a simple random sample from a large pool.  These are:

1. FakeEssential's underlying 2FP never changes.  As already noted, poll-to-poll changes in real life are not just random but can also be a result of long-term trend changes and short-term influences.

2. FakeEssential's sample size is larger than Essential's usual effective sample size.  A smaller poll bounces more than a larger one.  (To illustrate this, I referred to the two cases last year of Essential releasing the same result five fortnights in a row.  Newspoll, with a somewhat smaller sample size, has only recorded the same result four times in a row once in its 27-year history (early 2003, oddly enough, 51-49 to Coalition)). 

3. And this is a sneaky one: by setting 54 as the average 2PP instead of 54-point-something, I'm making it more likely that a single specific value, 54, will be recorded after the rounding.  If the random variation in the sample is less than half a point either way from the average, it rounds to 54.  If, on the other hand, I set it as 54.5, then a trivial variation in one direction rounds to 54, while in the other it rounds to 55.  (The difference this makes to the average poll to poll change is small - a few hundredths of a point in a test I did earlier.)
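The effect of the third assumption can also be checked without simulation. The sketch below is an addition, not part of the original test: it approximates each poll result as a normal draw (an assumption), works out the chance of each rounded value, and from that the share of zero-point changes and the mean poll-to-poll change for a true 2PP of 54.0 versus 54.5. Function names are made up for the example.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def rounded_distribution(mean, sd, span=10):
    """Probability of each rounded result within `span` points of the mean."""
    centre = round(mean)
    return {
        k: normal_cdf((k + 0.5 - mean) / sd) - normal_cdf((k - 0.5 - mean) / sd)
        for k in range(centre - span, centre + span + 1)
    }


def bounce_stats(mean, n=2000):
    """Return (share of zero-point changes, mean absolute poll-to-poll change)."""
    p = mean / 100
    sd = 100 * math.sqrt(p * (1 - p) / n)
    dist = rounded_distribution(mean, sd)
    zero_share = sum(q * q for q in dist.values())
    mean_abs = sum(
        abs(a - b) * qa * qb for a, qa in dist.items() for b, qb in dist.items()
    )
    return zero_share, mean_abs


zero_54, bounce_54 = bounce_stats(54.0)
zero_545, bounce_545 = bounce_stats(54.5)
print("true 2PP 54.0:", zero_54, bounce_54)
print("true 2PP 54.5:", zero_545, bounce_545)
```

Because this treats consecutive polls as fully independent and ignores the small difference between a binomial and a normal distribution, it should be read as a cross-check on the simulated figures, not as a more exact answer.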

Also I'm not including end-of-year to start-of-next-year changes, because they occur over a slightly longer period, although as it happens the last three have been zero, zero and one points.


Pooling the 2011 A and 2012 A samples, I get an average poll-to-poll bounce of 0.896 points from 48 fortnightly changes.  For the 2011 B and 2012 B samples pooled, it's 0.870 points from 46.

Here's the graphs of the frequency of poll-to-poll changes in the two samples:



Compared to the FakeEssential graph above, the distribution is broadly similar but 0-point changes are much closer in frequency to 1-point changes, while 2-point and 3-point changes are much less common in relative terms.  In fact, zero-point changes (same result two fortnights in a row) have been about 40% of the total for Essential in 2011-2 compared to about 25% for its fake equivalent. 

Overall, Essential is about 30% less bouncy from poll to poll than FakeEssential.  (For those who need the relatively obvious blessed by a significance indicator, I thought Mann-Whitney would be the least violated, and it gives p=0.0129 for the A sample and p=0.0078 for the Bs.  Note that these are not independent tests but two different ways of testing the bounciness of more or less the same sample.  It's a result that is statistically significant, and bordering on being strongly so.)
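The percentage comes straight from the averages already quoted (0.896 and 0.870 points for the real A and B runs, 1.250 points for FakeEssential). A minimal sketch of that arithmetic, with variable names chosen just for the example:

```python
FAKE_BOUNCE = 1.250    # mean poll-to-poll change, FakeEssential
REAL_BOUNCE_A = 0.896  # Essential 2011-12, A run
REAL_BOUNCE_B = 0.870  # Essential 2011-12, B run


def percent_less_bouncy(real, fake):
    """How much smaller the real average bounce is, as a percentage of the fake one."""
    return 100 * (1 - real / fake)


print(round(percent_less_bouncy(REAL_BOUNCE_A, FAKE_BOUNCE), 1))  # 28.3
print(round(percent_less_bouncy(REAL_BOUNCE_B, FAKE_BOUNCE), 1))  # 30.4
```

So the two runs come out at roughly 28% and 30% below the simulated poll.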

The result is very likely to also be significant if data from 2010 is added in.  But that wasn't my original hypothesis so I haven't tested it for now.  

Ah! But What Does It Mean?

I've concluded that there appears to be strong evidence that Essential Report is less bouncy than a poll of the same size would normally be, even when I skew the experiment against such a conclusion.

A pollster who manages to record poll-to-poll shifts that are well below those that would occur randomly could be doing something right, or something wrong, or both.  The "something right" would possibly be to do with the company's scaling.  To give an extreme and hopelessly simplistic example, suppose there are two electorates, in one of which everyone always votes Coalition and in one of which everyone always votes Labor.  If I sample voters from these two electorates combined, and select them at random, then my poll results will bounce like any other, as one poll includes more from the Coalition electorate, and the next includes more from the Labor electorate.  That's even though voting intention never changes.  But if I instead ensure my sample is taken evenly from both electorates, or weight the values if it turns out that it isn't, I'll end up with a result of 50-50 every time, with no bouncing, which is actually the true result.
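The two-electorate example is easy to play out in code. The sketch below is purely illustrative: the sample size of 1000, the seed and the function names are all assumptions made for the example, and the two equal-sized electorates are the hypothetical ones described above.

```python
import random


def random_sample_poll(rng, n=1000):
    """Draw n voters at random from both electorates combined."""
    # Each respondent is equally likely to live in the all-Coalition electorate.
    coalition = sum(1 for _ in range(n) if rng.random() < 0.5)
    return 100 * coalition / n


def stratified_poll(n=1000):
    """Take exactly half the sample from each electorate."""
    coalition = n // 2  # everyone sampled in the Coalition electorate votes Coalition
    return 100 * coalition / n


rng = random.Random(42)
random_results = [random_sample_poll(rng) for _ in range(20)]

print("random sampling:", min(random_results), "to", max(random_results))
print("stratified sampling:", stratified_poll(), "every time")
```

The randomly drawn polls wander around 50 even though nobody ever changes their vote, while the evenly drawn sample returns 50 on every run.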

Pollsters don't use scaling specifically to control bouncing; they use it mainly so that they can sample an unrepresentative base and still convert it to a representative one, and thus avoid house effects.  But when a poll is using lots of sophisticated scaling, it's quite possible that this would reduce poll-to-poll bouncing.  But by about 30%? (Note added: sophisticated scaling can also quite easily, and indeed would usually, increase bouncing, since the weighting of certain respondents above others makes the result more susceptible to variation between the highly-weighted respondents and thus reduces the effective sample size.)
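The point about weighting reducing the effective sample size can be illustrated with Kish's approximation, in which the effective sample size is the squared sum of the weights divided by the sum of the squared weights. That formula is brought in here for illustration only; nothing is known about the weights Essential actually applies, and the weights below are invented.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size for a weighted sample."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical weights: 2000 respondents, equally weighted versus unevenly weighted.
equal_weights = [1.0] * 2000
uneven_weights = [0.5] * 1000 + [1.5] * 1000

print(effective_sample_size(equal_weights))   # 2000.0
print(effective_sample_size(uneven_weights))  # 1600.0
```

In this made-up case, weighting half the respondents three times as heavily as the other half makes a 2000-person sample behave like one of 1600, which would make the poll bounce more, not less.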

The other possibility is that something is causing overly repetitive responses.   It's clear that the panel source imposes limits on panellist overuse, but it would be interesting to know more about the nature of these.  For instance, is it possible for the same subject to be interviewed twice in a four-week period (a significant chance of which might explain the patterns I have observed here), or is there a cooling off phase between polls?  If there is such a phase, how long is it?  An interesting question here (and readers might know some psychological literature on this) is whether polling the same person twice within a certain period - even one longer than four weeks - makes it more likely they will repeat their previous answer even if they were in the process of changing their mind, or even may make them more likely to maintain their pre-existing opinion.

Readers may have other information or statistical theories about what could be going on here.   (However, if commenting, please ensure that reasonable evidence is provided for any new claim that might damage the company's reputation.)

If the source of Essential's remarkably unbouncy polling was sheer modelling genius on the company's part one would expect this to come with 2PP readings that consistently aligned strongly with the established pollsters - Newspoll, Nielsen and Morgan phone especially - barring the remote possibility that all those polls have started leaning to Labor at the same time and Essential is now the only one getting it right.  Bounce-free polls that are also responsive to changes in the underlying trend are something pollsters would love to produce, but an insufficiently dynamic poll can be too slow to move when something actually happens.  The combination of low bouncing and a now 1-point-plus slippage from the aggregate of other pollsters makes me wonder if Essential's methods may be in need of a tin of this stuff:


"More bounce with every ounce!"

--------------------------------------------------------------------------------------------------------------------
Any response from Essential will be welcome and of course published (or not) as requested. 


7 comments:

  1. Like you, I have wondered for some time about the apparent under dispersion in Essential polls (especially as opinion polls are often a touch over dispersed - probably an artifact of post stratified sample re-weighting). I have also wondered about the slow drift in Essential's relative house effect over the years (http://marktheballot.blogspot.com.au/2012/12/house-effects-over-time.html).

    I'd assumed there was quite a lot of resampling going on (it would explain both observations). But it is only a hypothesis. I really don't know.

    Replies
    1. Thanks! I've included a link to that piece. Also added a comment that scaling can easily create increased variation rather than reduce it.

    2. If the Essential poll is highly resampled on (say) a fortnightly basis it would act like panel data. (The most famous Australian example of which is HILDA - http://www.melbourneinstitute.com/hilda/).

      The advantage of panel data is that each point of time observation has much the same sampling error as the previous sample. As a consequence, panel data is good at capturing the dynamics of change.

      I know, more than for any other pollster, I sit up and notice when the Essential poll changes.

  2. I've wondered for a long time about the apparent pro-coalition leaning of Essential's polling. What occurred to me was basically that it must have something to do with the core polling sample base.

    Could it be possible that the underlying split of the entire 100,000 database just happened randomly to favour the coalition by, say, 1%? If this was so, then continual sampling from subsets of this pool might carry forward that bias as an overlying bias above any normal trend which other pollsters would be reporting. To put it another way, if the original 100k pool happened to have a 1% random bias towards labor voters, the Essential polling figure might now reflect a steady 1% bias towards labor. So basically the problem arises simply from the fact that Essential samples from a fixed 100k base, while the others sample from the entire population.

    Replies
    1. Since Essential's "lean" has changed over time - quite dramatically - for the "lean" of the whole panel to cause the Coalition-leaning results in that way, there would have had to be changes in the composition of the panel over time. There would have to be people who tend to be more pro-Labor than average leaving, and people who tend to be more pro-Coalition than average coming in. I don't know anything about their turnover rate, so I don't eliminate panel change as a possible source of the sort of change Mark the Ballot mentioned in the link in the first comment.

      When a pollster uses weighting (and uses the general population as the basis for that weighting), an underlying "lean" in the overall panel's views is not necessarily a big problem anyway.

  3. Could it be Essential's apparent lean and changes thereto don't really exist, at least not on a strict Coalition v. Labor basis? Could it be that since Essential doesn't move with the trend that its "lean" is always toward the party against whom the trend is moving?

    Replies
    1. It seems that this pattern of temporary lean against the trend (or indifference to it) has indeed been Essential's behaviour for a while now based on its form since this article was written. Though it wasn't the case in early 2012 when it was following the trend line down as much as anyone.

      It responded slowly to the trend back to Labor in late 2012, slowly and incompletely to the blowout to the Coalition in the first half of this year, incompletely to the Rudd bounce, and then by going in the opposite direction when the Rudd bounce went down. It's now at its highest value for ALP since early 2011.

      It may well be (I haven't checked) that over the whole of that period the differences from others all average out at more or less zero and the "leans" are only fairly short-term functions of its strange behaviour relative to trend. In the last few weeks it appears to have been rampantly ALP-leaning, though perhaps it just got two dud samples in a row.

      I've been finding this makes it very difficult to model in an aggregate and Mark the Ballot has kicked it out of his entirely.
