Metrics for Analyzing Urban Societies

To analyze specific regions or communities within a city, we need indicators that act as differentiators: for example, carbon dioxide concentration, literacy rate, water hardness, or diabetes prevalence. Several quantitative metrics can be used together. The hard part, however, is finding relationships between these metrics and interpreting them.
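As a first pass at finding relationships between two metrics, one option is to compute their Pearson correlation across regions. The sketch below is only an illustration: the function name `pearson` and the numbers are made-up placeholders, not measurements from any city, and a strong correlation would still need interpreting, since it says nothing about cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values, one entry per region (NOT real data).
co2_ppm = [410.0, 420.0, 430.0, 440.0]
literacy_pct = [80.0, 78.0, 76.0, 74.0]

r = pearson(co2_ppm, literacy_pct)
print(f"correlation between CO2 and literacy: {r:.2f}")
```

With real data the value will rarely sit near -1 or +1, and several metrics would have to be compared pairwise before any pattern is worth interpreting.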

With regard to Jaipur, there are various ways to collect data and apply these metrics. The first, simpler method is to set up our own grid of sensors around the city, with each sensor sending its data in files keyed by its latitude and longitude rather than by an identification number. The second would be to get in touch with, say, the state government or a sufficiently large private company.
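One way the "files keyed by latitude and longitude" idea could look in practice is sketched below. The naming scheme, the helper names `file_key` and `parse_file_key`, and the four-decimal precision are all assumptions made for illustration; the coordinates are only roughly those of central Jaipur.

```python
def file_key(latitude, longitude, precision=4):
    """Build a file name stem from a sensor's coordinates (hypothetical scheme)."""
    return f"lat{latitude:.{precision}f}_lon{longitude:.{precision}f}"


def parse_file_key(key):
    """Recover (latitude, longitude) from a stem produced by file_key."""
    lat_part, lon_part = key.split("_")
    # Strip the 'lat' / 'lon' prefixes before converting back to floats.
    return float(lat_part[len("lat"):]), float(lon_part[len("lon"):])


# Approximate coordinates, used purely as an example.
key = file_key(26.9124, 75.7873)
print(key)
print(parse_file_key(key))
```

Keying on coordinates means a file can be placed on a map without a separate lookup table of sensor IDs, at the cost of having to pick a fixed precision up front.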

Along with collecting numerical data, we will also need some qualitative data, with regions assigned to classes, to establish patterns. For example, if a region with a certain PIN code has an abnormally high concentration of carbon dioxide and heavy cigarette use, we would expect severe health problems, or at least reduced comfort, there.
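The class assignment described above could start out as a simple threshold rule per PIN code. Everything in this sketch is hypothetical: the function `risk_class`, the two cut-off values, the region labels, and the readings are placeholders chosen only to show the shape of the logic, not recommended limits or real data.

```python
def risk_class(co2_ppm, smokers_pct, co2_limit=1000.0, smoking_limit=25.0):
    """Assign a coarse class from two metrics; both limits are placeholder values."""
    high_co2 = co2_ppm > co2_limit
    high_smoking = smokers_pct > smoking_limit
    if high_co2 and high_smoking:
        return "high"
    if high_co2 or high_smoking:
        return "moderate"
    return "low"


# Placeholder regions: label -> (CO2 in ppm, share of smokers in %). NOT real data.
regions = {
    "PIN-A": (1200.0, 30.0),
    "PIN-B": (1200.0, 10.0),
    "PIN-C": (450.0, 10.0),
}

classes = {pin: risk_class(co2, smokers) for pin, (co2, smokers) in regions.items()}
print(classes)
```

A rule like this is easy to explain and to adjust, which matters more at this stage than accuracy; the limits would need to be chosen from real data before the classes mean anything.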

Getting the data is a hard problem in its own right. Kaggle has a dataset on Jaipur, but it only covers weather, which is not much use here. There are various other governmental sources, but most are written in Hindi and are consequently very hard to feed into a model, since a translation model or framework would be required first. If all the sheets follow a standard template, we could type in the data manually for the first few sheets and then figure out a scraper to do the rest for us.