Methodology – China Database

NOTE: Financial Data is Expressed in Real Values.

Introduction

It is essential to explain in detail how the historical data analysis and the forecasts’ modelling have been done. This is because China’s demographic data, while quite extensive, nonetheless contains a few issues that must be dealt with to properly understand what the population is really like. While it is often said that China’s published data is questionable, the fact is that China is one of the few countries in the world where the entire population is ‘registered’. As such, a comprehensive and quite reliable database is available regarding demographics.

Data Sources

All data used is obtained from publications of the National Bureau of Statistics of China, hereafter referred to as the NBS. The NBS makes a wide range of data available in several publications, including the National Statistical Yearbook and the Statistical Yearbook for each province. The NBS also publishes county- and city-level data.

In recent years, an increasing amount of data at a national and provincial level has become available through their online database. As this tends to be more up-to-date, we use it where available.

Global Demographics Ltd acquires the entire range of publications each year, and these are the input source for the population models.

Global Demographics Ltd also purchases and uses publications by various Ministries in China, including Health, Labour, and Education. In many respects, these provide a useful cross-check on the data obtained directly from the NBS. For example, the number of students registered as entering primary school at the age of six should correlate with the number of births reported six years earlier.

The source data is then entered into our own templates and checked for consistency with previous years’ data, summing over totals, etc.

Cleaning

The next step is to ‘clean’ the data. This is necessary in the case of China because, almost invariably, the sum of the components does not equal the total. For example, the reported total population of the provinces varies marginally from the reported total population of the country. The same applies to virtually every significant variable.

This issue exists for multiple reasons, not least the sheer size of the population being reported on. Other factors include non-reporting of specific subpopulations for reasons of State Security. However, explaining the reasons for these differences is not a productive exercise. The issue is how the differences are resolved to create a coherent dataset that is suitable for analysis and for building the forecast models.

To deal with this issue, Global Demographics Ltd has assumed that the national figure is always the most correct. This is because the NBS employs talented individuals, and those in the central office are probably the most aware of the proper situation. They will also be able to identify and remove factors such as double counting. Given that assumption, Global Demographics Ltd then systematically modifies the component data to sum to the national figure. Because the exact source of the error is unknown, the error is applied proportionately to all the components. For example, because the sum of the total populations of the provinces does not equal the national population, we multiply each province total by a correction factor: the national figure divided by the sum of the provinces. If the sum of the provinces is 98% of the national figure, then each individual province is multiplied by 1/0.98 = 1.0204.
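The proportional adjustment can be illustrated with a minimal sketch. The province names and figures below are hypothetical, chosen only so that the provinces sum to 98% of the national total as in the example above; the function name is likewise an assumption and not part of any published tool.

```python
def scale_to_national(components, national_total):
    """Scale component values proportionately so they sum to the national total."""
    reported_sum = sum(components.values())
    factor = national_total / reported_sum  # e.g. 1 / 0.98 when the sum is 98%
    adjusted = {name: value * factor for name, value in components.items()}
    return adjusted, factor

# Hypothetical province populations (millions) summing to 98.0.
provinces = {"Province A": 49.0, "Province B": 29.4, "Province C": 19.6}
national = 100.0

adjusted, factor = scale_to_national(provinces, national)
print(round(factor, 4))  # 1.0204
```

Because every component is multiplied by the same factor, each province keeps its share of the total; only the level changes.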

It is recognized that this is not ideal, but as the exact location of the error is not known, just that there is an error, the process effectively spreads the error across all the possible components. This process has other implications. Many demographic variables have interdependencies, and if one is altered, then that has implications for others. A good example is reweighting the population and number of households in the provinces so that each sums to the national total, which has implications for average household size (number of persons per household). In these instances, we have used the newly derived figure rather than the published figure so that the overall database is internally consistent.
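A small sketch of the household-size interdependency follows. All figures are hypothetical and the helper name is an assumption; the point is only that once population and households are each rescaled, the derived persons-per-household figure differs from the published ratio.

```python
def rescale(values, national_total):
    """Scale values proportionately so that they sum to national_total."""
    factor = national_total / sum(values.values())
    return {k: v * factor for k, v in values.items()}

# Hypothetical reported figures (millions); neither set sums to its national total.
reported_population = {"Province A": 49.0, "Province B": 49.0}
reported_households = {"Province A": 15.0, "Province B": 16.0}
national_population, national_households = 100.0, 30.0

population = rescale(reported_population, national_population)
households = rescale(reported_households, national_households)

# Average household size is re-derived from the adjusted figures rather than
# taken from the published ratio, so the database stays internally consistent.
derived_size = {k: population[k] / households[k] for k in population}
published_size = {k: reported_population[k] / reported_households[k]
                  for k in reported_population}
```

In this toy case the derived size for Province A is higher than the published one, because population was scaled up while households were scaled down.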

This process implies that the underlying historical data in our publications and online services matches the published data only at the national level. Almost all underlying variables have been subject to some modification so that they sum to the national figure or their weighted average equals the national average. Generally, the variation of the sub-level data from the published values is not significant, but it does exist.

Timeliness of Data

China publishes the previous year’s data at a province level starting in October. For example, in October 2017, the first province data for 2016 was published. There is quite a time lag between the first province’s availability (October) and the last (March of the subsequent year). Detailed prefecture and county-level data and some Ministry-level data are unavailable until March or April of the following year. For example, 2015 data at the county level is not fully available until April 2017.

Because of the cleaning process and the need to ensure that the sub-regions sum to the master regions, the full county-level model cannot be updated until April of the subsequent year.

Forecasts are run forward forty years. The error for the first ten years is generally acceptable and reducing. The error for 20 years out will be higher due to migration trends and potential changes in government policy on issues such as birth rates and freedom of movement of the population. As such, the longer-term forecasts should be treated as indicative only, given current trends and relationships.

Modelling

After cleaning the data, the next stage is to model its historical relationships. Global Demographics has a prior model of demographic relationships. The only exogenous variables in this model are birth rates and death rates. Birth rates have to be exogenous because, in the case of China, they are subject to government policy. Death rates tend to have a highly lagged relationship to various socio-economic variables such as education and income, and we do not have historical data sufficiently far back in time to model and forecast that relationship. For example, the nutritional value of a person’s diet in their first eight years of life significantly influences lifespan. Unfortunately, no data is available on the per capita spend on food 40 years ago in China. As it happens, death rates follow a consistent trend reflecting improvements in nutrition, health care, and education; therefore, we use these trends by age and gender for forecasting purposes.

Overall population change

Except for the birth and death rates, the projected values of the other variables are primarily derived from a logical population model.

We start with the latest known age distribution of the population (at national, province and county level) and express it in one-year age steps, if not already published in that form. We then move everybody forward one year of age, less the proportion expected to die in that year for that age and gender. Births are added at the bottom and are estimated from the trend in the propensity of a woman to have a child in that year, given her present age. Birth rates by mother’s age are published and tend to have a relatively consistent trend, subject to changes in government policy on the One Child Policy.
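The one-year ageing step described above can be sketched as follows. This is a simplified illustration, not the production model: the four-age toy population, the rates, the female share of births, and the treatment of the oldest age as an open-ended group are all assumptions made for the example.

```python
def project_one_year(pop, death_rate, fertility, female_share_of_births=0.48):
    """Advance a single-year-of-age population by one year.

    pop, death_rate: {"F": [...], "M": [...]} indexed by age; the last index
    is treated as an open-ended oldest group (an assumption of this sketch).
    fertility: births per woman in the year, indexed by age of mother.
    """
    # Births come from the start-of-year female population and age-specific fertility.
    births = sum(women * rate for women, rate in zip(pop["F"], fertility))
    newborns = {"F": births * female_share_of_births,
                "M": births * (1 - female_share_of_births)}
    new_pop = {}
    for sex in ("F", "M"):
        # Remove the proportion expected to die at each age and gender.
        survivors = [n * (1 - d) for n, d in zip(pop[sex], death_rate[sex])]
        # Everyone moves up one year of age; newborns enter at age 0.
        aged = [newborns[sex]] + survivors[:-1]
        aged[-1] += survivors[-1]  # survivors of the oldest group stay in it
        new_pop[sex] = aged
    return new_pop

# Toy inputs: four "ages", uniform 10% mortality, fertility only at ages 1 and 2.
pop = {"F": [10.0, 10.0, 10.0, 10.0], "M": [10.0, 10.0, 10.0, 10.0]}
death_rate = {"F": [0.1] * 4, "M": [0.1] * 4}
fertility = [0.0, 0.5, 0.5, 0.0]

next_year = project_one_year(pop, death_rate, fertility)
```

Running the function repeatedly, with trended death and fertility rates for each year, gives a multi-year projection before any migration adjustment.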

This process starts at the national level, then is repeated at the provincial level, with annual corrections so that the provinces sum to the national level. This is then repeated at the prefecture level and finally at the county level, with the prefectures summing to the province total and the counties summing to the prefecture total. Counties are sub-regions of prefectures, which are sub-regions of provinces. There are 27 provinces and four municipalities.

This process would be straightforward and reliable if it were not for one other variable: migration. In the case of China, external migration is almost insignificant at this stage, and there is little need to consider it. In contrast, rural-to-urban migration is very significant and must be considered. Fortunately, good data has been published on the age and gender profile of the urban and rural populations at the county and province levels, and it is therefore possible to determine who has migrated. It will be no surprise to learn that historically, rural-to-urban migrants have been biased toward persons aged 15 to 39. As such, it has been possible to develop a migration model, which is incorporated into the population forecast model. In it, a certain proportion of the rural population in a county, by age and gender, is estimated to have moved to an urban location in the same county in the year, and the two populations are adjusted accordingly. This introduces some error: although rural-to-urban migration might initially be within the same county, a proportion of migration is between provinces, typically rural to urban or urban to urban. A province migration model has therefore also been created and applied.
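The within-county adjustment can be sketched as below. The age bands, populations and migration proportions are hypothetical (the rates are simply made highest for ages 15 to 39, in line with the bias noted above), and a real model would also split by gender.

```python
def apply_rural_to_urban_migration(rural, urban, migration_rate):
    """Move a proportion of each rural age group to the urban population."""
    moved = [r * rate for r, rate in zip(rural, migration_rate)]
    new_rural = [r - m for r, m in zip(rural, moved)]
    new_urban = [u + m for u, m in zip(urban, moved)]
    return new_rural, new_urban

age_bands = ["0-14", "15-39", "40-64", "65+"]
rural = [30.0, 40.0, 35.0, 15.0]            # hypothetical, thousands
urban = [25.0, 50.0, 40.0, 10.0]            # hypothetical, thousands
migration_rate = [0.005, 0.03, 0.005, 0.001]  # illustrative annual proportions

new_rural, new_urban = apply_rural_to_urban_migration(rural, urban, migration_rate)
```

The county total is unchanged by this step; only the rural and urban split shifts. Movement between provinces would need a separate adjustment.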

In the case of China, there is also a need to adjust for the ‘population’ being reported. As discussed in detail in Section Two, the published data reports on two populations – the Hukou population, which is essentially where people are born (traditional family base) and the registered population, which is persons registered with the Public Security Bureau if living for more than six months away from their Hukou county. This includes the significant number of persons who have migrated from rural to urban areas over the last two decades. The larger urban areas tend to report the registered population, which gives a more accurate measure of the population there.

In contrast, the smaller towns and villages report the Hukou population, even though many of those people are no longer living there (the Hukou does not change with the person’s relocation). As such, when the ‘reported’ populations are summed, there is a systematic overstatement of the population, and it is necessary to reweight the populations of the smaller urban areas and villages down to adjust for this overstatement. Apart from the measurement issue, anyone working with population statistics in China should check which population the information refers to: Hukou or registered.

Education Profile of the adult population

In terms of education, there is good data on enrolments by age of child, and we can express this as a proportion of each age group and continue this trend forward. This gives the expected number of people at each level of education (number of persons aged 5 to 9 years multiplied by the propensity of 5 to 9-year-olds attending primary – which in China is 100%) as well as the education profile of those leaving the education system and joining the workforce. This also gives us a measure of how the education profile of the workforce will change over time, with the potential error being the estimated education profile of those retiring. Without data, it has been assumed that retirees have the worst education profile available in historical statistics, typically primary or lower secondary education only.
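The enrolment calculation is a simple product, sketched here. The 100% primary propensity for 5 to 9-year-olds comes from the text; the population figures, the second age group and its rate are hypothetical and included only to show the structure.

```python
def enrolment_by_level(population_by_age_group, propensity_by_level):
    """Expected enrolment = age-group population x propensity to attend."""
    return {
        level: population_by_age_group[age_group] * propensity
        for level, (age_group, propensity) in propensity_by_level.items()
    }

# Populations in millions (hypothetical).
population_by_age_group = {"5-9": 80.0, "15-17": 45.0}
propensity_by_level = {
    "primary": ("5-9", 1.00),            # 100%, as stated in the text
    "upper_secondary": ("15-17", 0.85),  # illustrative assumption
}

enrolment = enrolment_by_level(population_by_age_group, propensity_by_level)
```

Applying trended propensities to the projected age groups for each future year gives the forecast enrolments.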

Household Size

The combination of the population’s age profile and education profile is a good determinant of average household size. The older the population, the greater the proportion of households with only one or two persons. The better educated the population, the fewer the children and the lower the acceptance of extended family situations. The historical relationship is strong and is used to forecast the average household size, which, combined with the forecast population, gives the total number of households. This is done separately for urban and rural populations and, of course, by province and county.

Again, in terms of historical data, there is a cross-check that the sum of the estimated number of households in each province and county equals the national total.
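As an illustration of the household calculation, the sketch below assumes a linear relationship between average household size and two summary measures of the age and education profiles. The functional form, the choice of measures and every coefficient are hypothetical; the source does not specify them. Only the final step, population divided by average household size, follows directly from the text.

```python
def forecast_households(population, share_aged_65_plus, share_upper_secondary_plus,
                        intercept, coef_age, coef_education):
    """Estimate average household size, then the number of households."""
    # Assumed linear form: older and better-educated populations have smaller households.
    avg_size = (intercept
                + coef_age * share_aged_65_plus
                + coef_education * share_upper_secondary_plus)
    return population / avg_size, avg_size

households, avg_size = forecast_households(
    population=300.0,               # millions, hypothetical
    share_aged_65_plus=0.2,
    share_upper_secondary_plus=0.4,
    intercept=4.0, coef_age=-3.0, coef_education=-1.0,  # illustrative only
)
```

In practice the coefficients would be fitted to the historical data separately for urban and rural populations.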

Labour Force

The size of the labour force is forecast by multiplying the number of people of working age by their propensity to be employed. Working age in the case of China is defined as ages 15 to 64 inclusive. The lower bound is raised over time as increased educational opportunities delay entry into the workforce. The upper bound (which is higher than the official retirement age for men) reflects the understanding that globally, people are living longer and working longer. Mainly in rural China, there is an increasing number of persons aged 60 to 74 still in full-time employment. While a small percentage of the total labour force in 2015, older workers will start to become important by 2035, especially in urban areas. Retirement is delayed as people live longer. Both trends are consistent in nature and can be forecast with some reliability. Therefore, our definition of working age by 2034 is 18 to 74 years.

The propensity to be employed (that is, persons of working age who are employed – and not to be confused with unemployment, which is the proportion of persons of working age seeking but not finding work) also tends to follow quite a consistent trend. However, there are fluctuations around the trend line depending on the state of the economy and the availability of work.

Given that the working-age population is quite reliably measured, and the drivers behind the trend in propensity to be employed are relatively stable, the estimates of the future total labour force have quite a high confidence.
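The labour force calculation can be sketched as follows. The age bounds (15 to 64, moving to 18 to 74) come from the text; the flat population of one million per year of age and the 70% propensity to be employed are hypothetical values used only to make the arithmetic visible.

```python
def labour_force(population_by_age, lower_age, upper_age, propensity_to_be_employed):
    """Working-age population (bounds inclusive) x propensity to be employed."""
    working_age = sum(n for age, n in population_by_age.items()
                      if lower_age <= age <= upper_age)
    return working_age * propensity_to_be_employed

# Hypothetical: one million persons at every single year of age from 0 to 89.
population_by_age = {age: 1.0 for age in range(90)}

early = labour_force(population_by_age, 15, 64, 0.70)  # 50 ages in scope
later = labour_force(population_by_age, 18, 74, 0.70)  # 57 ages in scope
```

Holding the propensity constant, widening the definition from 15 to 64 to 18 to 74 adds more people at the top than it removes at the bottom, so the measured labour force rises.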

GDP Forecast

Forecasting household incomes and expenditures is somewhat more problematic. To put it simply, household incomes are a function of GDP, the share of it that reaches the worker, and the number of workers per household. Forecasting GDP is difficult, and it would appear that few economists actually get it right, and many hype it to attract investors’ money.

Global Demographics Ltd uses the following approach to forecast the GDP in future years. GDP, in its simplest form, is productivity per worker multiplied by the number of workers. Global Demographics Ltd has a reasonable forecast of the number of workers and, therefore, can rely on that input to the equation. In terms of productivity per worker, historically, this has been a function of the change in the education standard of the adult population and the change in the accumulated fixed capital investment per worker. In short, GDP growth per worker is a function of the growth in education and capital behind the worker.
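The identity behind this approach can be written out in a few lines. The worker count, productivity level and growth rates below are hypothetical; the sketch shows only that GDP growth compounds the growth in workers and the growth in productivity per worker, and does not attempt to model how education and capital drive productivity.

```python
def gdp(workers, productivity_per_worker):
    """GDP in its simplest form: productivity per worker x number of workers."""
    return workers * productivity_per_worker

def gdp_growth(worker_growth, productivity_growth):
    """Growth in GDP implied by growth in workers and in productivity per worker."""
    return (1 + worker_growth) * (1 + productivity_growth) - 1

# Hypothetical base year: 700 million workers, 100 units of output each.
base = gdp(700.0, 100.0)
# Hypothetical next year: workforce shrinks 0.5%, productivity rises 5%.
following = gdp(700.0 * (1 - 0.005), 100.0 * (1 + 0.05))

implied_growth = following / base - 1
```

With a shrinking workforce, GDP still grows as long as productivity per worker grows faster than the workforce declines.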

In terms of education, we have a solid basis for forecasting the overall education profile of the adult population. China publishes both the existing education profile of the adult population as well as current school enrolments, and with that, Global Demographics Ltd is able to estimate quite reliably how the education profile will change (improve) over the future years. The potential error in this is quite low as the trends in terms of the education profile of those entering adult age are quite solid.

Accumulated Fixed Capital Investment per worker is slightly less reliable. Total Accumulated Fixed Capital Investment per worker is the Fixed Capital Investment of each of the previous ten years, depreciated at a rate of 10% per annum. Using this measure avoids inter-year fluctuations that result from short-term government policy. However, the level of Fixed Capital Investment in future years is a function of many things, not least the cost of labour and government policy. The assumption is that Fixed Capital Investment in China will continue to increase, in line with the size of rural-to-urban migration.
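One possible reading of the accumulated investment measure is sketched below. It assumes geometric depreciation, with investment made k years ago retaining (1 - 0.10)^k of its value and the most recent of the ten years counted as one year old; the source does not state these details, so they are assumptions, as are the figures used.

```python
def accumulated_fci_per_worker(annual_investment, workers,
                               depreciation_rate=0.10, window=10):
    """Depreciated sum of the last `window` years of investment, per worker.

    annual_investment is ordered oldest to newest. Investment made k years
    ago is assumed to retain (1 - depreciation_rate) ** k of its value.
    """
    recent = annual_investment[-window:]
    stock = sum(investment * (1 - depreciation_rate) ** years_ago
                for years_ago, investment in enumerate(reversed(recent), start=1))
    return stock / workers

# Hypothetical: constant investment of 100 per year for ten years, 10 workers.
investment_history = [100.0] * 10
per_worker = accumulated_fci_per_worker(investment_history, workers=10.0)
```

Because ten years of investment feed into each value, a single unusually high or low year moves the measure far less than it moves that year's raw investment figure.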

Household Incomes

Forecasting household incomes, while significantly influenced by GDP, is also a function of the share of GDP the worker receives in wages. Globally, this share of productivity that reaches the worker is a function of the supply of labour, accumulated Fixed Capital Investment per worker (capital investment behind the worker), and education standards. If there are a large number of poorly educated people looking for work and there is a high capital investment to support them, then the share of productivity that reaches the worker is low.

This is because the worker’s ability to negotiate a greater share of the benefit of their labour is constrained. Conversely, in a well-educated society with a slow-growing, or even declining, labour force, the need to pay higher wages is greater, and as a result, private consumption is a greater proportion of GDP.

In the case of China, although the overall education standard is middle-level by global standards, it is improving rapidly. The supply of labour is still growing in urban China, but the rate of growth is slowing, and labour supply is declining in the rural areas. Both these trends would argue for the share of GDP paid out in wages to increase, and the model reflects that. However, as discussed later in the chapter on household incomes, China is unusual because it currently distributes a very low proportion of its productivity back to its workers: 58% in wages.

The combination of these factors means that China will have to move away from its present position, where only 38% of its GDP ends up in private consumption. From a ‘balanced economy’ point of view, it needs to increase to a higher level. The question is, what level? Two constraints are operating here. It cannot grow to the point where the wage is 100% of productivity. Also, if the share of productivity that goes to the worker increases too much, then the return on investment drops and that reduces future investment (and productivity).

A rationale is developed for a most likely case and an optimistic case, and this is explained in detail in Section Eight on GDP growth. The current version released by Global Demographics Ltd assumes the wage component of GDP per worker will reach 70% by 2037.

The average wage multiplied by the number of workers per household gives the average household income for the county.

The average propensity to save, pay taxes and spend by income is published in the annual Household Income and Expenditure Report and the trends in that are used to forecast forward by income level. The distribution of households by income (and their spending patterns by income) as reported in the Household Income and Expenditure Survey is then used to estimate the distribution of households around the average. The model is structured such that average household income minus savings minus tax gives an average expenditure per household, which, when multiplied by the number of households, equals the Private Consumption Expenditure share of GDP (a helpful sanity check).
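The chain from wage to private consumption can be sketched as follows. Every number is hypothetical, and treating savings and tax as simple proportions of income is an assumption made to keep the example short.

```python
def household_accounts(avg_wage, workers_per_household, savings_rate, tax_rate, households):
    """Average household income, average expenditure, and total private consumption."""
    income = avg_wage * workers_per_household
    # Income minus savings minus tax leaves expenditure per household.
    expenditure = income * (1 - savings_rate - tax_rate)
    total_consumption = expenditure * households
    return income, expenditure, total_consumption

income, expenditure, total_consumption = household_accounts(
    avg_wage=50_000.0,          # hypothetical average wage per worker
    workers_per_household=1.5,
    savings_rate=0.30,
    tax_rate=0.10,
    households=1_000.0,
)
```

The sanity check is then a comparison of the total consumption figure with the Private Consumption Expenditure component of the GDP forecast for the same area and year.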

While it is clearly possible to challenge any such methodology, Global Demographics Ltd believes that this probably gives a reasonable estimate of the average household income in future years and the distribution of households around the mean. The NBS Household Income and Expenditure Survey potentially has errors in underreporting income. Still, there are probably no systematic errors in the distribution around the mean or the savings or tax rates by reported income level. Of course, there will be some, but the Household Income and Expenditure Survey is probably the most reliable data on the shape of the distribution.

It is fortunate that in the case of China, private consumption expenditures are published separately for urban and rural households by province, and at the county level, the net value added is also published, which is a version of GDP. Using these additional pieces of information, it is possible to estimate household incomes down to the city level and the distribution of households around the mean.

The final stage of the forecast model is household expenditure, and this uses the relationships evident in the Household Income and Expenditure Survey. Specifically, we look at the propensity to spend on each of 42 categories of consumption by household income level. Again, this data is published by region and provides a good statistical framework to model and use for forecasting purposes.
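The category-level calculation can be sketched with a reduced example. The model uses 42 categories; the three categories, two income bands, household counts and spending shares below are hypothetical and serve only to show how propensities by income level are applied.

```python
def spend_by_category(households_by_band, avg_expenditure_by_band, category_share_by_band):
    """Total spend per category, summed across household income bands."""
    totals = {}
    for band, households in households_by_band.items():
        band_spend = households * avg_expenditure_by_band[band]
        for category, share in category_share_by_band[band].items():
            totals[category] = totals.get(category, 0.0) + band_spend * share
    return totals

households_by_band = {"low": 100.0, "high": 50.0}
avg_expenditure_by_band = {"low": 20_000.0, "high": 60_000.0}
category_share_by_band = {  # propensity to spend on each category, by income band
    "low": {"food": 0.50, "housing": 0.30, "other": 0.20},
    "high": {"food": 0.25, "housing": 0.35, "other": 0.40},
}

totals = spend_by_category(households_by_band, avg_expenditure_by_band,
                           category_share_by_band)
```

As households move into higher income bands over the forecast period, the mix of spending shifts even if the propensities within each band stay fixed.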