An Analysis of Covid-19 Deaths by Age and Income in São Paulo, Brazil

Bernardo Loureiro | Medida SP
Dec 17, 2020


This is a translation by the author of this article, originally published in Portuguese on June 11.

I’ve analyzed data from the Brazilian Health Ministry to understand how Covid-19 deaths in the São Paulo Metro area are distributed according to age and income. I found that residents of lower-income areas are dying from the disease in greater numbers, even when controlling for population size in each age and income bucket.

Covid-19 Deaths by Age and Income

I analyzed 3,959 confirmed Covid-19 deaths in the São Paulo Metro area, reported through May 18. I mapped the postal codes of the victims and joined them to 2010 Brazilian Census data to get the average income of each deceased person’s postal code. I also divided the deaths into age buckets, since age is a known risk factor for Covid-19.

The map above seems to display a larger number of deaths in São Paulo’s periphery (which is poorer than downtown), but doesn’t allow for an accurate reading of the relationship between deaths, age, and income. To facilitate this comparison, I’ve plotted these variables as a heatmap, seen below.

We can see that the majority of deaths were concentrated among older individuals (over 50 years old) and poorer individuals (average household income of less than 3,000 BRL, or about 590 USD).
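The cell counts behind a heatmap like this can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code from the notebooks: the records, the `income_by_postal_code` lookup, the exact bucket edges, and the function names are all hypothetical. Only the 3,000 BRL and 6,500 BRL thresholds come from the text.

```python
from collections import Counter

# Hypothetical lookup standing in for the 2010 Census join:
# average household income (BRL) per postal code. Values are made up.
income_by_postal_code = {
    "01000-000": 7200.0,
    "05000-000": 4100.0,
    "08000-000": 1800.0,
}

# Hypothetical death records as (postal code, age in years). Not real data.
deaths = [
    ("08000-000", 67),
    ("08000-000", 28),
    ("01000-000", 81),
    ("05000-000", 55),
    ("99999-999", 40),  # postal code missing from the lookup
]

def age_bucket(age):
    """Illustrative age buckets: '0-30', then decades, then '81+'."""
    if age <= 30:
        return "0-30"
    if age > 80:
        return "81+"
    low = (age - 1) // 10 * 10 + 1  # e.g. ages 31..40 map to 31
    return f"{low}-{low + 9}"

def income_bucket(income):
    """Illustrative income buckets using the thresholds quoted in the text."""
    if income < 3000:
        return "under 3,000 BRL"
    if income <= 6500:
        return "3,000-6,500 BRL"
    return "over 6,500 BRL"

def count_cells(deaths, income_lookup):
    """Count deaths per (income bucket, age bucket) cell."""
    cells = Counter()
    for postal_code, age in deaths:
        income = income_lookup.get(postal_code)
        if income is None:
            continue  # records without a matching postal code are dropped
        cells[(income_bucket(income), age_bucket(age))] += 1
    return cells

cells = count_cells(deaths, income_by_postal_code)
for (income_label, age_label), n in sorted(cells.items()):
    print(income_label, age_label, n)
```

Note that records whose postal code cannot be matched simply drop out of the counts in this sketch, which is one way a dataset can end up under-representing some groups.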

Deaths among younger individuals

I was also struck by the considerable number of deaths among younger low-income persons, especially compared with younger high-income persons. In the 30 years or less age group, there was only 1 death attributed to Covid-19 in the period in higher-income postal codes (average household income above 6,500 BRL or 1,270 USD). In comparison, there were 52 deaths among young persons in the lower-income postal codes during the same period.

Note: How to read the chart above?

On the vertical axis, I’ve divided deaths into income buckets; on the horizontal axis, into age buckets. Each cell represents the number of deaths for that specific combination of age and income bucket. The diagram below explains how to read the chart:

Considering population size

Population size varies according to income and age, so I did further analysis to take this fact into consideration.

To do this, I summed the deaths in each age bucket (ignoring income). Then, I distributed the deaths in that age bucket across income buckets in proportion to each income bucket’s share of the population. This allows me to approximate the number of “expected” deaths if income had no impact on Covid-19 mortality rates.

Here’s an example: let’s say there were 100 deaths in the 31–40 age bucket and, in the same bucket, 7,000 poor individuals and 3,000 rich individuals in the population. If income didn’t matter, we’d expect to see 70 deaths among poorer individuals and 30 deaths among the richer. If we subtract this expected number from the real number of deaths, we can get a better picture of the impact of income on mortality.
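The same arithmetic can be written as a short Python sketch. The function names are illustrative, and the observed counts (85 and 15) are invented purely to show the subtraction step; only the 100 deaths and the 7,000 / 3,000 population split come from the example above.

```python
def expected_deaths(total_deaths, population_by_income):
    """Split an age bucket's deaths in proportion to each income group's population."""
    total_population = sum(population_by_income.values())
    return {
        group: total_deaths * population / total_population
        for group, population in population_by_income.items()
    }

def excess_deaths(observed, expected):
    """Observed minus expected: positive means more deaths than expected."""
    return {group: observed[group] - expected[group] for group in expected}

# Numbers from the example: 100 deaths in the 31-40 age bucket.
population = {"poorer": 7000, "richer": 3000}
expected = expected_deaths(100, population)

# Hypothetical observed split, made up for illustration only.
observed = {"poorer": 85, "richer": 15}
excess = excess_deaths(observed, expected)

print(expected)
print(excess)
```

With these made-up observed counts, the poorer group would sit 15 deaths above the expected number and the richer group 15 below it.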

This analysis is shown in the chart below. It shows the significant impact of income on the number of deaths attributed to Covid-19. The number in each cell indicates the number of deaths above (positive) or below (negative) the expected number, for each age and income bucket.

We can see a high number of deaths above the expected number for the lower income bucket, and fewer deaths than expected for the higher income buckets.

Covid-19 is increasingly affecting the poorer

Lastly, I’ve analyzed the deaths by income and notification date. We can see that, as time progresses, deaths are increasingly concentrated in lower-income areas. The median of the average household income of the deceased’s postal codes was almost 4,000 BRL (780 USD) at the beginning of March. By the beginning of May, it had fallen to 2,200 BRL (430 USD).
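A rough sketch of this kind of calculation is shown here, assuming each record carries a notification date and the average household income of the deceased’s postal code. The records are invented, and grouping by calendar month is an assumption made for simplicity; the original analysis may use a different time window.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical records: (notification date, average household income
# of the postal code in BRL). These values are made up, not real data.
records = [
    (date(2020, 3, 20), 4200.0),
    (date(2020, 3, 25), 3600.0),
    (date(2020, 4, 10), 3000.0),
    (date(2020, 4, 18), 2600.0),
    (date(2020, 4, 27), 2000.0),
    (date(2020, 5, 5), 2500.0),
    (date(2020, 5, 12), 1900.0),
]

def median_income_by_month(records):
    """Median postal-code income of the deceased, grouped by (year, month)."""
    incomes = defaultdict(list)
    for notified_on, income in records:
        incomes[(notified_on.year, notified_on.month)].append(income)
    return {period: median(values) for period, values in sorted(incomes.items())}

medians = median_income_by_month(records)
for (year, month), value in medians.items():
    print(f"{year}-{month:02d}: {value:.0f} BRL")
```

A falling median over time, as in this toy data, is what "deaths shifting toward lower-income areas" looks like in this summary.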

Is low income a risk factor?

Although low income might not be a direct risk factor, it can be associated with a series of other risk factors, such as:

  • Limited access to healthcare
  • Overcrowded households (as I pointed out in this other analysis)
  • Being required to work and the inability to work remotely
  • Exposure to pollution and other environmental risk factors
  • Incidence of underlying conditions, such as diabetes and heart disease

The writer Rebecca Solnit has talked about some of these aspects in her piece titled “Coronavirus does discriminate, because that’s what humans do.” The New York Times also published a piece on the most impacted neighborhoods in the city, and another about racial disparities and the impact of the virus.

I believe that environmental justice is a relevant perspective to address these issues. Just as different populations are exposed to different environmental risks (following historic exclusionary processes), different populations are also exposed to different risks during the pandemic. This series of short texts establishes some parallels between environmental justice, climate justice, and the pandemic.

To make matters worse, exposure to several risk factors can combine to increase mortality rates even further. For example, someone who lives in an area more exposed to pollution, is more likely to have an untreated pre-existing medical condition, has less access to healthcare, and is required to leave home to work faces all of these risks at once.

Limitations of this study

Deaths and not cases

I’ve chosen to analyze deaths rather than cases because underreporting tends to be a smaller issue for deaths. In addition, underreporting of cases is likely biased by income, since higher-income individuals likely have more access to testing.

Analyses by color / race

Analyses that show relationships between color / race and mortality rates are fundamental to explain the unequal impact of the disease. However, over a third of the deaths reported in the dataset I used had no color / race information, which made it impossible for me to analyze this relationship. This is an essential variable that must be collected properly so as not to make these issues invisible in the discussions about the pandemic.

Postal code and addresses

I used the Brazilian postal code (CEP) to georeference the residential location of the deceased, which entails a few limitations. Persons living in informal housing, who are generally poorer, often don’t have a postal code; Google’s georeferencing API doesn’t work as well in these areas; and homeless persons have no postal code to report. All of these factors might mean that the dataset under-represents poorer individuals.

Income by location

I’ve used the most recent available Census data (2010) to attribute income by postal code. I know that the average income of a postal code doesn’t necessarily represent the exact income of the individual. However, it is a good starting point, and the most granular data available right now.


The full methodology is available as open-source code here, in Jupyter Notebooks. The dataset used for this analysis is now published without the postal code of the deceased (as pointed out here by LabCidade). In the methodology I’ve included a full copy of the dataset used.