The occupational classifications used in the population censuses carried out between 1985 and 2005 differ slightly. A harmonization must be developed if we want to follow overall trends during this period. Uchikoshi and Mugiyama (2020) [UM2020] created such a harmonization to cover censuses carried out between 1980 and 2005. This task creates a similar harmonization covering censuses carried out between 1985 and 2005. Census information for 1980 is available only as scanned PDF images and would need to be keyed in manually, so it is omitted from this task.
The harmonization presented here was created independently of Uchikoshi and Mugiyama (2020) and may differ from theirs.
Coding for the harmonization has the following features:
- No leading zeros so that sorting as numbers and sorting as text produce the same results
- Maintain same ordering as for the 2005 population census.
- Enable "group by" aggregation of detailed occupations to produce mid-level and top-level occupation class totals.
- Correspondence table should support merging with data downloaded from e-Stat (https://www.e-stat.go.jp/)
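The no-leading-zeros feature can be checked mechanically. The sketch below uses hypothetical code values (none are taken from the crosswalk) and assumes that all codes have the same number of digits, which is the condition under which text and numeric sorting agree.

```python
# Hypothetical code values for illustration; not taken from the crosswalk.
codes = ["101", "245", "110", "302", "120"]

def sorts_consistently(codes):
    """Return True if sorting as text and sorting as numbers give the same order."""
    as_text = sorted(codes)
    as_numbers = sorted(codes, key=int)
    return as_text == as_numbers

# Equal-width codes without leading zeros sort identically either way.
print(sorts_consistently(codes))

# Codes of differing width do not: "10" sorts before "9" as text.
print(sorts_consistently(["9", "10", "11"]))
```

A check like this could be run on the code column of the correspondence table after it is loaded, to confirm that the import step has not changed the ordering.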
The harmonization and correspondence table can be downloaded from this project's repository.
The harmonization task was done in a Microsoft Excel workbook with one worksheet for harmonizing and one worksheet for each of the census years.
In the harmonizing worksheet, column 'OccCodeAgg' is a temporary coding created for stepping through the set of five classifications. The five columns to its right contain the 'OccCode' values described in the task on preprocessing the data; they are used to merge downloaded data into the crosswalk. The MinorTitle entries are from the 2005 census.
The harmonization task consists of stepping down these classifications, rearranging and aggregating as needed. Here are some illustrative examples:
- The first two categories are unchanged across the censuses, so two sequential codes, Agg001 and Agg002, are entered under OccCodeAgg.
- The third category is in position 3 for censuses 1990 through 2005, but in position 10 for the 1985 census. This is reflected by entering code '1985010' in position 3 under 1985. Most such rearrangements stay within the same mid-level category and result in no changes when aggregated at the mid or major levels. Some rearrangements, however, change a minor occupation's mid and major classifications. An example is tobacco workers, who are classified at position 246 under construction workers in the 1985 census, at position 207 under food and beverage workers in the 1990 and 1995 censuses, and under beverage and tobacco workers in the 2005 census.
- Mining and smelting technicians, the fourth category, were split into two categories in the 1985 census, but combined for later censuses. Assigning the same OccCodeAgg, 'Agg004', to both rows will allow "group by" aggregation of 1985 data when it is imported.
- Similarly, when a minor occupation is split, such as for computer processing technicians in the 2005 census, assigning the same OccCodeAgg, in this case 'Agg010', to both rows will enable aggregation to the broader definition. Here, the MinorTitle is taken from the broader definition used in the older census.
- When new occupations appear in later censuses, such as certified social insurance and tax accountants, and other management specialists in the 2000 and 2005 censuses, OccCodeAgg entries are assigned without provision for aggregation.
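The split and rearrangement cases above can be sketched in plain Python. The codes 'Agg004' and '1985010' come from the text. Everything else is an assumption made for illustration: the other 1985 OccCode values, the pairing of '1985010' with 'Agg003', and all counts.

```python
from collections import defaultdict

# Crosswalk rows as (OccCodeAgg, 1985 OccCode). Agg004 appears twice because
# the 1985 census split that category in two. All 1985 codes except "1985010"
# are hypothetical placeholders, as is the label "Agg003".
crosswalk_1985 = [
    ("Agg001", "1985001"),
    ("Agg002", "1985002"),
    ("Agg003", "1985010"),  # rearranged: position 10 in the 1985 census
    ("Agg004", "1985004"),
    ("Agg004", "1985005"),
]

# Hypothetical worker counts keyed by 1985 OccCode.
counts_1985 = {
    "1985001": 120,
    "1985002": 80,
    "1985010": 45,
    "1985004": 30,
    "1985005": 25,
}

def aggregate(crosswalk, counts):
    """Sum census counts by harmonized code (a plain-Python "group by")."""
    totals = defaultdict(int)
    for agg_code, occ_code in crosswalk:
        totals[agg_code] += counts[occ_code]
    return dict(totals)

harmonized_1985 = aggregate(crosswalk_1985, counts_1985)
print(harmonized_1985["Agg004"])  # 55: the two 1985 rows combined
```

The same function would apply to a census year in which a category was split later, since only the shared OccCodeAgg value matters for the aggregation.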
Where occupations have been removed from the classification, they are aggregated into the not otherwise classified (NOC) slot for that mid-level group if one exists. Otherwise, they are replaced by NA. Elbers's (2021) segregation package accommodates omitted occupations.
The mid-level group of food, beverage and tobacco workers presents a special case. In 1985, tobacco workers were not in this group, and the group had a single NOC category. In 1990, tobacco workers were moved into this group, and the group was split into a mid-level food workers group and a mid-level beverage and tobacco workers group, each with its own NOC category. For this crosswalk, these two mid-level groups were recombined and the NOC categories coded for aggregation.
Finally, a hierarchical numbering scheme of major, mid and minor occupations is created by deriving the minor level OccMinor from OccCodeAgg, and adding OccMajor, OccSub and OccMid codes to be compatible with the occupation levels used for the Population Census of Japan. The added OccSub level reflects the fact that the population census classifications actually have 4 levels.
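As a rough illustration of how the hierarchy supports roll-ups, the sketch below totals minor-level counts at the mid and major levels. The column names follow the text; the code values and counts are invented and do not reflect the actual numbering scheme.

```python
from collections import Counter

# Hypothetical rows: column names follow the text, while the code values and
# counts are invented for illustration.
rows = [
    {"OccMajor": 1, "OccSub": 11, "OccMid": 111, "OccMinor": 1111, "count": 10},
    {"OccMajor": 1, "OccSub": 11, "OccMid": 111, "OccMinor": 1112, "count": 5},
    {"OccMajor": 1, "OccSub": 11, "OccMid": 112, "OccMinor": 1121, "count": 7},
    {"OccMajor": 2, "OccSub": 21, "OccMid": 211, "OccMinor": 2111, "count": 3},
]

def totals_by(rows, level):
    """Total the minor-level counts at the requested level of the hierarchy."""
    totals = Counter()
    for row in rows:
        totals[row[level]] += row["count"]
    return dict(totals)

mid_totals = totals_by(rows, "OccMid")      # {111: 15, 112: 7, 211: 3}
major_totals = totals_by(rows, "OccMajor")  # {1: 22, 2: 3}
```

Because every minor row carries its own OccMajor, OccSub and OccMid values, each level can be totaled directly without parsing the minor code.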
This harmonization has 261 to 264 occupations (see descriptive statistics), in contrast to the 245 of UM2020's harmonization. One reason for this may be differing starting points for creating the crosswalks. The harmonization in this task prioritizes the 2005 occupational classification, while UM2020 appears to prioritize the 1980 classification. Occupational classifications for the 1990 through 2005 censuses were derived from the 3rd Revision Edition of the Japan Standard Occupational Classification; classifications for the 1980 and 1985 censuses were derived from the 2nd Revision.
The CSV file is created by hiding column OccCodeAgg, copying all cells to a new spreadsheet while pasting formulas as values, and saving the new spreadsheet in CSV format.
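The export was done by hand in Excel. For readers who prefer a scripted equivalent, the following is a minimal sketch that drops the temporary OccCodeAgg column and writes the remaining columns as CSV. The rows and titles are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io

def drop_column(rows, column):
    """Return copies of the rows without the named column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

# Hypothetical worksheet rows, with formulas already evaluated to values.
worksheet = [
    {"OccCodeAgg": "Agg001", "OccMinor": "101", "MinorTitle": "Title A"},
    {"OccCodeAgg": "Agg002", "OccMinor": "102", "MinorTitle": "Title B"},
]

exported = drop_column(worksheet, "OccCodeAgg")

buffer = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(
    buffer, fieldnames=list(exported[0].keys()), lineterminator="\n"
)
writer.writeheader()
writer.writerows(exported)
csv_text = buffer.getvalue()
print(csv_text)
```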
Datasets:
- Occupational crosswalk in CSV format
- For public critique, the XLSX spreadsheet used for this task. (This spreadsheet also contains data for 2010 and 2015, but these are not used in the crosswalk.)