China Database Methodology

It is important to explain in some detail how the historical data has been analysed and how the forecasts have been modelled.  China’s demographic data, while extensive, contains a number of issues that must be dealt with in order to get a proper understanding of the population.  While it is often said that China’s published data is questionable, China is one of the few countries in the world where the entire population is ‘registered’.  As such, in terms of demographics, there is a comprehensive and quite reliable database available.  However, such is the size of China that the aggregation of the statistics does cause some issues, which need to be dealt with before forecasts can be made.


Data Sources

All data used in this Report is obtained from publications of the National Bureau of Statistics of China, hereafter referred to as the NBS.  The NBS makes available a wide range of data in a number of publications, including the National Statistical Yearbook as well as the Statistical Yearbook for each province.  There are also publications by the NBS giving county and city level data.  While some of this data is available online, at the time of writing this Report the majority of the most recent data is still only obtainable in print form (currently the web is about six months behind print, although this will no doubt change in time).  As such, Global Demographics Ltd acquires the entire range of publications each year and they are the input source for the population models.

Global Demographics Ltd also purchases and uses publications by various Ministries in China including Health, Labour, and Education.  In many respects these provide a useful cross check on the data obtained directly from the NBS.  For example, the number of students registered as entering primary school aged six should correlate with the number of births reported six years earlier.

The source data is then entered into our own templates and checked for consistency with previous years’ data and for whether the components sum to the reported totals.



The next step is to ‘clean’ the data.  This is necessary in the case of China because almost invariably the sum of the components does not equal the total.  For example, the reported total population of the provinces varies marginally from the reported total population of the country.  The same applies for virtually every major variable.  This issue exists for multiple reasons, not least the sheer size of the population being reported on.  Other factors include non-reporting of specific sub-populations for reasons of State Security, and incorrect reporting of data for political reasons.  However, explaining the reasons for these differences is not really a productive exercise.  The issue is how they are resolved to create a coherent dataset that is suitable for analysis and for building the models used to forecast.

To deal with this issue, Global Demographics Ltd has made the assumption that the national figure is always the most correct.  This is done on the basis that the NBS employs some very talented individuals and those in the central office would probably be most aware of the true situation.  They will also be in a position to identify and remove factors such as double counting.  Given that assumption, Global Demographics Ltd then systematically modifies the component data so that it sums to the national figure.  Because the exact source of the error is not known, the correction is applied proportionately to all the components.  For example, because the sum of the total populations of the provinces does not equal the national population, we adjust the province totals by multiplying each of them by the ratio of the national figure to the sum of the provinces.  If the sum of the provinces is 98% of the national figure, then all of the individual provinces are multiplied by 1 / 0.98 = 1.0204.
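The proportional adjustment can be sketched in a few lines of Python.  This is a minimal illustration only, not the actual procedure used for the database; the function name rescale_to_total and the province figures are hypothetical.

```python
def rescale_to_total(components, national_total):
    """Scale every component by the same factor so that they sum to national_total."""
    reported_sum = sum(components.values())
    if reported_sum <= 0:
        raise ValueError("component sum must be positive")
    # Same correction factor for every component, e.g. 1 / 0.98 = 1.0204
    factor = national_total / reported_sum
    return {name: value * factor for name, value in components.items()}


# Hypothetical figures in millions (not NBS data): provinces sum to 98% of national.
provinces = {"A": 490.0, "B": 294.0, "C": 196.0}  # sums to 980
adjusted = rescale_to_total(provinces, national_total=1000.0)
```

After the call, the adjusted province values sum to the national figure while each province keeps its original share of the total.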

It is recognized that this is not ideal, but as the exact location of the error is not known, only that there is an error, the process effectively spreads the error across all the possible components.  This process has other implications.  Many of the demographic variables are interdependent, and if one of them is altered that has implications for others.  A good example is that reweighting the population and the number of households in the provinces, so that each sums to the national total, changes the average household size (number of persons per household).  In these instances, we have used the newly derived figure rather than the published figure so that the overall database is internally consistent.
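As an illustration of this knock-on effect, the sketch below recomputes average household size after population and households have each been rescaled by their own correction factor.  The function name and all figures are hypothetical and chosen only to show the arithmetic.

```python
def derived_household_size(population, households, pop_factor, hh_factor):
    """Persons per household after population and households are each rescaled."""
    return (population * pop_factor) / (households * hh_factor)


# Hypothetical province: 50.0m people and 16.0m households (published size 3.125).
# Assumed correction factors: provinces sum to 98% of national population
# and to 96% of national households.
size = derived_household_size(50.0, 16.0, pop_factor=1 / 0.98, hh_factor=1 / 0.96)
```

Because the two factors differ, the derived household size moves away from the published 3.125, which is why the derived figure is the one kept in the database.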

The implication of this process is that the underlying historic data shown in this publication matches the published data only at the national level.  Almost all underlying variables have been subject to some modification so that they sum to the national figure or their weighted average equals the national average.  Generally, the variation of the sub-level data from the published values is not significant, but it is different, and for good reason: the published data contains error.

Timeliness of Data

China publishes the previous year’s data at a province level starting in October.  For example, in October 2017 the first of the province data for 2016 was published.  There is quite a time lag between the first province being available (October) and the last (March of the subsequent year).  Detailed prefecture and county level data, as well as some Ministry level data, is also not available until March or April of the subsequent year.  For example, 2015 data at county level is not fully available until April 2017.  Because the cleaning process requires that the sub regions add to the master regions, the full county level model cannot be updated until April of the subsequent year.

In terms of forecasts, they are run forward twenty years.  The error for the first ten years is generally acceptable, and reducing.  The error for 20 years out will be higher as a result of migration trends and potential changes in government policy on issues such as birth rates and freedom of movement of the population.  As such the longer term forecasts should be treated as indicative only, given current trends and relationships.



Having cleaned the data, the next stage is to model the historic relationships that exist within it.  Global Demographics has a prior model of demographic relationships, as shown in Figure 1-1 earlier in Section 1 of this Report.  The only exogenous variables to this model are birth rates and death rates.  Birth rates have to be exogenous because in the case of China they are subject to government policy.  Death rates tend to have a highly lagged relationship to various socio-economic variables such as education and income, and we do not have historical data sufficiently far back in time to model and then forecast the relationship.  For example, the nutritional value of a person’s diet in their first eight years of life significantly influences lifespan.  Unfortunately, there is no data available on the per capita spend on food 40 years ago in China.  As it happens, death rates do follow quite a consistent trend reflecting improvements in nutrition, health care and education, and therefore we use these trends by age and gender for forecasting purposes.


Overall population change

Apart from birth rates and death rates, the projected values of the other variables are largely derived from a logical model of the population.  We start with the latest known age distribution of the population (at national, province and county level), re-express it in one-year age steps if not already published in that form, and then move everybody forward one year of age, less the proportion of each age and gender group expected to die in that year.  Births are added in at the bottom, and are estimated from the trend in the propensity of a woman to have a child in that year given her present age.  Birth rates by age of mother are published and tend to have a relatively consistent trend, subject to changes in government policy in respect of the One Child Policy, which is discussed in detail in Section Three.
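The one-year ageing step can be sketched as follows.  This is a simplified illustration under stated assumptions, not the production model: the function name, the male_birth_share parameter, the treatment of the oldest age as an open-ended group and the toy three-age population are all hypothetical, and deaths among newborns within the year are ignored.

```python
def project_one_year(pop, death_rate, fertility, male_birth_share=0.5):
    """Move a population forward one year.

    pop, death_rate: {"F": [...], "M": [...]} indexed by single year of age.
    fertility: births per woman in the year, indexed by the mother's age.
    """
    # Births come from women alive at the start of the year (a simplification).
    births = sum(women * rate for women, rate in zip(pop["F"], fertility))
    new_pop = {}
    for sex, share in (("F", 1.0 - male_birth_share), ("M", male_birth_share)):
        survivors = [n * (1.0 - q) for n, q in zip(pop[sex], death_rate[sex])]
        # Newborns enter at age 0 and every survivor moves up one year of age.
        aged = [births * share] + survivors[:-1]
        # The last index is an open-ended top age group, so its survivors stay in it.
        aged[-1] += survivors[-1]
        new_pop[sex] = aged
    return new_pop


# Toy example with three ages only (hypothetical numbers).
pop = {"F": [100.0, 100.0, 100.0], "M": [100.0, 100.0, 100.0]}
death = {"F": [0.1, 0.1, 0.1], "M": [0.1, 0.1, 0.1]}
fertility = [0.0, 0.2, 0.0]
next_year = project_one_year(pop, death, fertility)
```

In the toy example, 20 births are split evenly between the sexes, and 90 of every 100 persons survive to the next age.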

This process is run at a national level first, then repeated at a province level, with annual corrections so that the provinces sum to the national level.  It is then repeated at prefecture level and finally county level, with the prefectures summing to the province level and the counties summing to the prefecture total.  Note that Section Two describes the geopolitical structure of China and the inter-relationship between counties, prefectures and provinces.  For the moment it is sufficient to appreciate that counties are sub regions of prefectures, which in turn are sub regions of provinces.  There are 27 provinces (including the autonomous regions) and 4 municipalities.
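The top-down correction through the hierarchy can be illustrated with a short recursive sketch.  The nested dictionary layout, the function name reconcile and the figures are assumptions made for the example; each level is simply rescaled proportionately to match its parent, in line with the cleaning approach described earlier.

```python
def reconcile(node, target):
    """Set node["value"] to target, then rescale children so they sum to it.

    node: {"value": float, "children": [node, ...]} with "children" optional.
    Assumes the children of any node have a positive sum.
    """
    node["value"] = target
    children = node.get("children", [])
    if children:
        child_sum = sum(child["value"] for child in children)
        for child in children:
            # Each child keeps its share of the parent; its own children follow.
            reconcile(child, child["value"] * target / child_sum)


# Hypothetical hierarchy: nation -> provinces -> prefectures (figures in millions).
nation = {
    "value": 1000.0,
    "children": [
        {"value": 600.0, "children": [{"value": 300.0}, {"value": 290.0}]},
        {"value": 380.0},
    ],
}
reconcile(nation, 1000.0)
```

After the call the two provinces sum to 1000.0 and the two prefectures sum to the adjusted total of their province.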

This process would be very simple and reliable if it were not for one other variable, and that is migration.  In the case of China, external migration is almost insignificant at this stage and there is little need to consider it.  In contrast, rural to urban migration is very significant and needs to be taken into account.  Fortunately, good data is published on the age and gender profile of the urban and rural populations at county and province level and it is therefore possible to derive who has migrated.  It will be no surprise to learn that historically rural to urban migrants have been biased to persons aged 15 to 39 years of age.  As such it has been possible to develop a migration model which is incorporated into the population forecast model, where a certain proportion by age and gender of the rural population in a county is estimated to have moved to an urban location in the same county in the year and the two populations are adjusted accordingly.  This introduces some error because, while rural to urban migration might initially be within the same county, a proportion is between provinces (typically rural to urban or urban to urban), so a province migration model has also been created and applied.
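The within-county transfer can be sketched as below.  This is an illustrative simplification only: the function name, the age grouping and the migration rates are hypothetical, and the inter-province component mentioned above is not modelled.

```python
def apply_rural_urban_migration(rural, urban, move_rate):
    """Shift a share of each rural age group to the urban population.

    rural, urban: population by age group (same ordering in both lists).
    move_rate: assumed share of each rural age group that moves in the year.
    """
    movers = [r * rate for r, rate in zip(rural, move_rate)]
    new_rural = [r - m for r, m in zip(rural, movers)]
    new_urban = [u + m for u, m in zip(urban, movers)]
    return new_rural, new_urban


# Hypothetical county, three age groups (0-14, 15-39, 40+), figures in thousands.
rural = [100.0, 200.0, 150.0]
urban = [80.0, 120.0, 90.0]
rates = [0.0, 0.02, 0.005]  # migration concentrated in the 15 to 39 age group
new_rural, new_urban = apply_rural_urban_migration(rural, urban, rates)
```

The county total is unchanged by the transfer; only the split between rural and urban changes.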

In the case of China there is also a need to adjust for the ‘population’ being reported.  As discussed in detail in Section Two, the published data reports on two populations: the Hukou population, which is essentially counted where people are born (the traditional family base), and the registered population, which consists of persons registered with the Public Security Bureau because they are living for more than six months away from their Hukou county.  The latter of course includes the huge number of persons who have migrated from rural to urban areas over the last two decades.  The larger urban areas tend to report the registered population as this gives a more accurate measure of the population there, whereas the smaller towns and villages report the Hukou population, even though many of those people are no longer living there (the Hukou does not move with the person).  As such, when the ‘reported’ populations are summed there is a systematic overstatement of the population, and it is necessary to reweight the populations of the smaller urban areas and villages down to adjust for this overstatement.  Apart from the issue of measurement, anyone working with population statistics in China should check which population the information covers: Hukou or registered.


Education Profile of the adult population

In terms of education there is good data on enrolments by age of child, and we are able to express this as a proportion of each age group, and continue this trend forward. This gives the expected number of people at each level of education (number of persons aged 5 to 9 years multiplied by the propensity of 5 to 9 year olds attending primary – which in China is 100%) as well as the education profile of those leaving the education system and joining the workforce.  This also gives us a measure of how the education profile of the workforce will change over time with the potential error being the estimated education profile of those retiring.  In the absence of data on that it has been assumed that the retirees have the worst education profile available in historic statistics, which is typically primary or lower secondary education only.


Household Size

The combination of age profile and education profile of the population is a good determinant of average household size.  The older the population the greater the proportion of households that are two person or one person only households. The better educated the population the fewer children and lesser acceptance of extended family situations. The historic relationship is very strong and the relationship is used to forecast average household size, which combined with the forecast population gives the total number of households.  This is done separately for urban and rural populations, and of course by province and county.  Again, in terms of historic data,  there is a cross check that the sum of the estimated number of households in each province and county equals the national total.

Labour Force

The size of the labour force is forecast by multiplying the number of people of working age by their propensity to be employed.  Working age in the case of China is defined as ages 15 to 64 inclusive.  The lower bound is raised over time as a result of increased educational opportunities delaying entry into the workforce.  The upper bound (which is higher than the official retirement age for men) reflects the understanding that globally people are living longer and working longer.  In rural China particularly, there is an increasing number of persons aged 60 to 74 still in full time employment.  While a small percentage of the total labour force in 2015, the older worker starts to become important by 2035, especially, by then, in urban areas.  Retirement is delayed as people live longer.  Both trends are consistent in nature and can be forecast with some reliability.  Therefore our definition of working age by 2034 is 18 to 74 years.

The propensity to be employed (that is, the proportion of persons of working age who are employed, not to be confused with unemployment, which is the proportion of persons of working age seeking but not finding work) also tends to follow quite a consistent trend, although there are fluctuations around the trend line depending on the state of the economy and the availability of work.  Given that the working-age population is quite reliably measured, and the drivers behind the trend in propensity to be employed are relatively stable, the estimates of the future total labour force have quite a high confidence.
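The labour force calculation can be sketched as the working-age population multiplied by the propensity to be employed, with the age bounds passed in so that they can shift over time.  Holding the propensity by single year of age is an assumption made for this example, as are the function name and the flat illustrative figures.

```python
def labour_force(pop_by_age, employment_propensity, min_age, max_age):
    """Sum population times propensity to be employed over the working-age range.

    Both lists are indexed by single year of age; bounds are inclusive.
    """
    return sum(
        pop_by_age[age] * employment_propensity[age]
        for age in range(min_age, max_age + 1)
    )


# Hypothetical flat population (1.0m per year of age) and flat 70% propensity.
pop_by_age = [1.0] * 100
propensity = [0.7] * 100
lf_initial = labour_force(pop_by_age, propensity, 15, 64)  # ages 15 to 64
lf_later = labour_force(pop_by_age, propensity, 18, 74)    # ages 18 to 74
```

With these illustrative inputs the wider later definition covers 57 single years of age rather than 50, so the labour force is larger even though the lower bound has risen.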


GDP Forecast

Forecasting household incomes and expenditure is somewhat more problematic.  Put simply, household incomes are a function of GDP, the share of it that reaches the worker, and the number of workers per household.  Forecasting GDP is difficult, and it would appear that few economists actually get it right and many hype it in order to attract investors’ money.

Global Demographics Ltd uses the IMF forecast for the current and subsequent year on the basis that they are probably the most knowledgeable independent forecaster for GDP.  Their short term forecasts would take into account the many factors that potentially influence total real GDP.

To forecast the GDP in future years Global Demographics Ltd uses the following approach.  GDP in its simplest form is productivity per worker multiplied by the number of workers.  Global Demographics Ltd has a good forecast of the number of workers and therefore can rely on that input to the equation.  Productivity per worker is historically a function of the change in the education standard of the adult population and the change in the accumulated Fixed Capital Investment per worker.  In short, the growth of GDP per worker is a function of the growth in education and capital behind the worker.
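The identity of GDP as productivity per worker multiplied by the number of workers can be sketched as follows.  The productivity growth rates are taken here as given inputs; how they are derived from education and capital is not specified in this section, so the function name, the growth rates and all figures below are purely hypothetical.

```python
def gdp_path(base_productivity, productivity_growth, workers_by_year):
    """Return GDP for each forecast year as productivity per worker times workers.

    productivity_growth: assumed annual growth rate of productivity per worker.
    workers_by_year: forecast number of workers for the same years.
    """
    gdp = []
    productivity = base_productivity
    for growth, workers in zip(productivity_growth, workers_by_year):
        productivity *= 1.0 + growth
        gdp.append(productivity * workers)
    return gdp


# Hypothetical inputs: productivity of 100 per worker growing 5% a year,
# with the number of workers falling from 10 to 9.
path = gdp_path(100.0, [0.05, 0.05], [10.0, 9.0])
```

The example shows that total GDP can fall even while productivity per worker rises if the number of workers declines fast enough.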

In terms of education we have a strong basis for forecasting the overall education profile of the adult population as explained earlier.  China publishes both the existing education profile of the adult population as well as current school enrolments, and with that Global Demographics Ltd are able to estimate quite reliably how the education profile will change (improve) over the future years.  The potential error in this is quite low as the trends in terms of education profile of those entering adult age are quite solid.

Accumulated Fixed Capital Investment per worker is slightly less reliable.  Total Accumulated Fixed Capital Investment per worker is the sum of the Fixed Capital Investment of each of the previous 10 years, depreciated at a rate of 10% per annum, divided by the number of workers.  Using this measure avoids inter-year fluctuations as a result of short-term government policy.  However, the level of Fixed Capital Investment in future years is a function of many things, not least of which are the cost of labour and government policy.  The assumption is that Fixed Capital Investment in China will continue to increase, but probably not at the rate evident between 2008 and 2016 when the government stimulus policy was in play.  Instead it is projected to increase at the rate prior to 2007.
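One way to read this definition is sketched below.  It assumes geometric depreciation, with investment made k years ago counted at (1 - 0.10) to the power k and the most recent year depreciated once; that timing convention, the function name and the figures are assumptions for illustration rather than the exact convention used in the model.

```python
def accumulated_fci_per_worker(annual_fci, workers, years=10, rate=0.10):
    """Depreciated sum of the last `years` of investment, divided by workers.

    annual_fci: yearly Fixed Capital Investment, oldest first, most recent last.
    """
    recent = annual_fci[-years:]
    # age = 1 for the most recent year, up to `years` for the oldest year kept
    stock = sum(
        investment * (1.0 - rate) ** age
        for age, investment in enumerate(reversed(recent), start=1)
    )
    return stock / workers


# Hypothetical series: 100 units invested every year, 10 workers.
per_worker = accumulated_fci_per_worker([100.0] * 10, workers=10.0)
# Investment older than 10 years is ignored, whatever its size.
same = accumulated_fci_per_worker([1e9] + [100.0] * 10, workers=10.0)
```

Because ten years of investment are combined, a single unusually high or low year moves the accumulated figure far less than it moves the annual figure.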


Household Incomes

Household incomes, while significantly influenced by GDP, are also a function of the share of GDP that reaches the worker.  Globally, the share of productivity that reaches the worker is a function of the supply of labour and the education standard.  If there is a large number of poorly educated people looking for work, then the share of productivity that reaches the worker is low.  This is because the ability of the worker to negotiate a greater share of the benefit of their labour is constrained.  Conversely, in a well-educated society with a slow growing, or even declining, labour force the need to pay higher wages is greater and as a result private consumption is a greater proportion of GDP.

In the case of China, although the overall education standard is middle level by global standards, it is improving rapidly.  The supply of labour is still growing in urban China, but the rate of growth is slowing, and labour supply is actually declining in the rural areas.  Both these trends would argue for the share of GDP paid out in wages to increase and the model reflects that.  However, as discussed later in the chapter on household incomes, China is unusual in that it currently distributes a very low proportion of its productivity back to its workers, with 58% paid in wages.

The combination of these factors means that China will have to move away from its present position where only 38% of its GDP ends up in private consumption.  From a ‘balanced economy’ point of view it needs to increase to a higher level.  The question is, what level?  There are really two constraints operating here.  First, it cannot increase to the point where the wage is 100% of productivity.  Second, if the share of productivity that goes to the worker increases too much, then return on investment drops and that reduces future investment.

A rationale is developed for a most likely case and an optimistic case, and this is explained in detail in Section Eight on GDP growth.  The current version released by Global Demographics Ltd assumes the Wage Component of GDP per worker will reach 70% by 2037.

Average wage multiplied by the number of workers per household gives average household income for the county.  The average propensity to save, pay taxes and spend by income is published in the annual Household Income and Expenditure Report and the trends in that are used to forecast forward by income level.  The distribution of households by income (and their spending patterns by income) as reported in the Household Income and Expenditure Survey is then used to estimate the distribution of households around the average.  The model is structured such that average household income minus savings minus tax gives an average expenditure per household which, when multiplied by the number of households, equals the Private Consumption Expenditure share of GDP (a useful sanity check).
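The chain from wages to total private consumption can be sketched as below.  Expressing savings and tax as simple rates on income is an assumption made for the example, as are the function name and the figures; the result would be compared with the Private Consumption Expenditure component of GDP as the sanity check.

```python
def household_accounts(avg_wage, workers_per_household, households,
                       saving_rate, tax_rate):
    """Return (average income, average expenditure, total private consumption)."""
    income = avg_wage * workers_per_household
    # Average income minus savings minus tax gives average expenditure.
    expenditure = income * (1.0 - saving_rate - tax_rate)
    return income, expenditure, expenditure * households


# Hypothetical county: wage of 50,000, 1.5 workers per household, 1,000 households,
# with 30% of income saved and 10% paid in tax.
income, expenditure, total_consumption = household_accounts(
    avg_wage=50_000.0, workers_per_household=1.5, households=1_000.0,
    saving_rate=0.30, tax_rate=0.10,
)
```

If the total differs materially from published private consumption, one of the inputs (wage, workers per household, savings or tax rate) needs to be revisited.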

While clearly it is possible to challenge any such methodology, Global Demographics Ltd are of the view that this probably gives a good estimate of what the average household income will be in future years and also the distribution of households around the mean.  The NBS Household Income and Expenditure Survey potentially has errors in terms of under reporting income but it probably does not have systematic errors in terms of the distribution around the mean or the rates of saving or tax by reported income level.  Of course there will be some but the Household Income and Expenditure Survey is nonetheless probably the most reliable data on the shape of the distribution.

It is fortunate that in the case of China private consumption expenditure is published separately for urban and rural households by province, and that net value added, which is a version of GDP, is published at county level.  Using these additional pieces of information, it is possible to estimate what household incomes are down to city level as well as the distribution of households around the mean.

The final stage of the forecast model is household expenditure and this uses the relationships evident in the Household Income and Expenditure Survey.  Specifically, we look at the propensity to spend on each of 42 categories of consumption by income level of the household.  Again this data is published by region and provides a good statistical framework to model and use for forecasting purposes.