Abstract:
The thesis addresses the challenge of estimating socioeconomic status (SES) at an intraurban
level using digital data sources. Traditional methods for measuring SES, such as censuses and
surveys, are often limited by their infrequency and coarse spatial granularity, which hinders
timely and accurate assessments, especially at the neighborhood level. The study proposes
leveraging alternative digital data sources, including mobile phone top-up transactions and
supermarket purchase data, to model and predict SES, which could enable more
frequent, cost-effective, and spatially granular analysis. The research focuses on urban
neighborhoods in Ecuador, aiming to develop machine learning models that can accurately
predict Neighborhood SES (NSES).
The research employs two machine learning models: a Regression Model using mobile
phone top-up transactions and a Graph Neural Network (GNN) Model using supermarket
transaction data. The first model captures linear relationships between NSES and variables
derived from top-up transactions: it estimates NSES from two neighborhood-level aggregates,
the average top-up denomination and the diversity of denominations. The second model
captures the complex, non-linear relationships inherent in supermarket transactions. It
transforms these transactions into a graph in which items purchased together are linked,
and the frequency and diversity of these links are analyzed to infer SES. This makes the
model particularly suited to capturing the socioeconomic patterns that emerge from the
co-purchase behavior of individuals within a neighborhood.
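The abstract names the two top-up aggregates but not how they are computed. The following is a minimal sketch under stated assumptions, not the thesis's implementation: transactions are assumed to arrive as `(neighborhood_id, denomination)` pairs, "denomination diversity" is assumed to be the Shannon entropy of each neighborhood's denomination distribution, and NSES is treated as a numeric score fitted by ordinary least squares. The function names `neighborhood_features` and `fit_ols` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def neighborhood_features(transactions):
    """Aggregate top-up transactions into per-neighborhood features.

    transactions: iterable of (neighborhood_id, denomination) pairs
    (hypothetical schema). Returns {neighborhood_id: (mean, diversity)}.
    """
    by_nbhd = defaultdict(list)
    for nbhd, denom in transactions:
        by_nbhd[nbhd].append(denom)
    features = {}
    for nbhd, denoms in by_nbhd.items():
        n = len(denoms)
        mean = sum(denoms) / n
        # Assumption: "denomination diversity" is measured as the Shannon
        # entropy (in nats) of the neighborhood's denomination distribution.
        counts = Counter(denoms)
        diversity = -sum((c / n) * math.log(c / n) for c in counts.values())
        features[nbhd] = (mean, diversity)
    return features


def fit_ols(rows, targets):
    """Least-squares fit of target = b0 + b1*x1 + ... via the normal equations.

    Returns [b0, b1, ...]. Uses Gaussian elimination with partial pivoting,
    which is adequate for a handful of features.
    """
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    k = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]  # X^T X
    b = [sum(x[i] * t for x, t in zip(X, targets)) for i in range(k)]  # X^T y
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in reversed(range(k)):  # back-substitution
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef
```

For usage, compute `feats = neighborhood_features(transactions)`, then call `fit_ols([feats[n] for n in ids], [nses[n] for n in ids])` for a list of neighborhood ids `ids` with known NSES scores `nses` (both hypothetical inputs). Under this sketch, a positive coefficient on the mean-denomination feature would correspond to higher average denominations in wealthier neighborhoods.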
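The graph step can be illustrated with a minimal sketch of one way to build a co-purchase graph. It assumes each supermarket transaction is a basket of item ids, takes link "frequency" to be how many baskets contain an item pair, and takes link "diversity" to be the entropy of the edge-weight distribution; these are assumptions, not details confirmed by the abstract. The GNN itself (architecture, node features, and how a neighborhood-level prediction is produced) is not shown. The sketch only emits the weighted graph in the `[src, dst]` COO edge-index layout accepted by libraries such as PyTorch Geometric. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def co_purchase_graph(baskets):
    """Build a weighted, undirected co-purchase graph.

    baskets: iterable of baskets, each an iterable of item ids (hypothetical
    schema). Items bought in the same basket are linked; an edge's weight is
    the number of baskets containing that pair.
    """
    edges = Counter()
    for basket in baskets:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        items = sorted(set(basket))
        for a, b in combinations(items, 2):
            edges[(a, b)] += 1
    return edges


def link_summary(edges):
    """Return (total link frequency, number of distinct links, link entropy).

    Entropy (in nats) of the edge-weight distribution is one possible,
    assumed measure of link diversity.
    """
    total = sum(edges.values())
    if total == 0:
        return 0, 0, 0.0
    entropy = -sum((w / total) * math.log(w / total) for w in edges.values())
    return total, len(edges), entropy


def to_edge_index(edges):
    """Convert to the COO layout used by common GNN libraries.

    Returns (node list, [src, dst], weights); each undirected edge is emitted
    in both directions so message passing flows both ways.
    """
    nodes = sorted({item for pair in edges for item in pair})
    idx = {item: i for i, item in enumerate(nodes)}
    src, dst, weight = [], [], []
    for (a, b), w in sorted(edges.items()):
        src += [idx[a], idx[b]]
        dst += [idx[b], idx[a]]
        weight += [w, w]
    return nodes, [src, dst], weight
```

Counting a repeated item only once per basket keeps a single key per item pair; whether repeats should add weight is itself a modeling choice.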
Both models show strong predictive performance in estimating SES at the intraurban
level. The Regression Model achieves a prediction accuracy of up to 74% and is particularly
effective at capturing the relationship between average top-up denomination and
neighborhood SES: higher average denominations indicate wealthier neighborhoods.
The GNN Model outperforms the Regression Model, achieving a prediction accuracy of
up to 91%. It captures the intricate co-purchase patterns within neighborhoods, yielding
a more detailed and accurate representation of NSES. These results highlight the
potential of digital data sources as viable complements to traditional SES measurement
methods.