Summary of Reading – July ’20

Speaker for the Dead (Orson Scott Card) – Very beautiful and elegant. The slow pace and depth of understanding here, along with the level of detail and complexity, contrast well with the pace, seriousness, and action of the first book in the series. It’s great to see Card take a respite from pure action and thrills and focus on world-building, which sets up the sequel to this book pretty well. It’s amazing how similar the situation set up by the ending is to the plot of the first book: an invitation to xenocide.

I really like how the story progresses and how Card transforms the apparent murders of Pipo and Libo into something done for a very honourable and specific purpose. What seemed like a monstrous act was instead an act of transformation, of sending your best to the afterlife (which is very real for the piggies). It also exposed the gap in communication that would exist between any two species: Card first makes the human interpretation of Pipo’s and Libo’s deaths look horrific, and then makes that interpretation itself look deeply biased. Personally, I thought Pipo was killed because he stumbled onto one of the piggies’ closely guarded secrets, one they would kill to protect, and that the same fate befell Libo independently. It wasn’t so; both were killed for refusing to kill and send their brothers into their third lives (note, too, how much weight ‘third’ carries now compared to what it meant at the start of the book).

Ender’s level of control and influence is also brilliant. How he uses his skills to unravel the situation almost perfectly and guide everyone towards a rebellion is amazing. It’s beautiful character development. Over the course of two books, Ender goes from being a compassionate killer to a wonderful father, a man who has fully redeemed himself by bringing the species he destroyed back to life and by nurturing a new species to keep it from meeting the same fate the buggers did.

It’s also quite interesting to read about the ecology of the planet, with every living thing sharing a plant-animal life cycle. It raises a lot of questions about our basic assumptions about any form of alien life.


Xenocide (Orson Scott Card) – One of the best books I’ve read so far, and that’s saying a lot. While the scientific reasoning behind many things may not be sound (anything regarding philotes, for instance), the book more than makes up for it through world-building, action, and plot. It brings so many concepts, worldviews, and beliefs together that it is just exhilarating. The concept of genetic enslavement was rather interesting to read about. The people of Path were genetically enhanced to be more intelligent than any other human being, but carried an engineered gene that led them to believe in the power of the gods and that the gods spoke to them to keep them on their path, making them slaves to Congress. I like how the planet itself is named Path, and how the tampered gene is an attempt to make everyone fall onto a designated path, yet as the story ends, the planet goes on to forge its own path. We also see genetic enslavement in the form of the descolada. I find the entire concept dictatorial and bizarre, and also worrisome, as it isn’t that hard to imagine something like that being done in real life.

In terms of storytelling, Peter’s return might just breathe the intense action and absorbing quality of the first book back into the series. It was definitely a huge surprise. However, the brilliance of Peter’s return is equally matched by Novinha’s stupidity. I mean, what was Novinha even doing? Why did she become a nun? While the dependence of every single character on mysterious religious power is troubling, as any advanced future society should have freed itself of the shackles of religious belief by then, what’s even more disturbing is the big question: what happened to Jane? I take it she will be weakened, but what happens? The ending felt like a small calm before the storm, and I personally believe that this is great, as the next book heralds the arrival of the fleet, Peter’s bid to destroy Congress, and the end of Ender’s journey.


We are the Nerds (Christine Lagorio-Chafkin) – Not as good a read as I expected it to be, primarily because I use reddit a lot and I expected an account of its history to focus more on the product than on the people running it or their social lives. Before reading this book, I was expecting an account of reddit’s journey as a product, not as a business. Furthermore, the book goes off on various tangents, everything from Ohanian’s relationship with Serena Williams to Aaron Swartz and net neutrality. While I feel this may have been necessary to attract a larger audience (Aaron Swartz attracts tech-savvy people and Williams is just famous), much of it isn’t related to reddit at all. However, there were some aspects of this book I found fascinating. One of these was Huffman’s return, which sounded quite poetic. I was surprised that he wasn’t able to fit in and needed a counsellor/therapist. I also found the entire “spezgiving” saga to be quite childish, but it really brought out how big businesspeople are the same as the rest of us, and how they’re prone to the same mistakes. I also found it interesting to read about how people are targeted online and the internet’s potential to be downright toxic (I’ve encountered a small portion of this toxicity myself), but also how it’s countered by the internet’s ability to be wholesome, helpful, and amazing. The best examples of this in the book were perhaps Barack Obama’s reddit AMA and the “Mr Splashy Pants” saga. While there were some likeable facets to the book, it doesn’t really do justice to its title.

Modelling COVID-19 Cases for South Korea, India, and Sweden.

A while back, I wrote a paper that modelled the spread of covid-19 in Italy using a logistic fit. I was curious about how such a logistic function would behave now, so I decided to look at how a logistic fit could be applied to India, South Korea, and Sweden. I’ve chosen these three countries because they’ve adopted entirely different approaches to tackling the pandemic, and I was curious as to whether the effects of these approaches would be graphically discernible. Here, I modelled the cumulative confirmed cases for each country against the number of days since the beginning of its outbreak. I’ll go through only the results; you can find the full code here.

I’m using covid19api.com for obtaining the required data. It gives you a good deal of data and has excellent support, so I’d recommend this API for all things covid-19. After obtaining the data, I treated it by filtering out undesired columns/rows, converting the date string to an integer representing the number of days since the beginning of the outbreak, and breaking the data into a feature matrix and a target vector. I’ve used a logistic function for fitting the data.
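To give a rough sense of what that pipeline looks like (the full code is linked above), here is a minimal sketch; the exact API route and the starting guesses passed to curve_fit are assumptions on my part:

import numpy as np
import pandas as pd
import requests
from scipy.optimize import curve_fit

def logistic(t, t0, k, c):
    # c is the projected final size, k the growth rate, t0 the inflection day
    return c / (1 + np.exp(-k * (t - t0)))

# Cumulative confirmed cases for one country (route as documented by covid19api.com)
resp = requests.get("https://api.covid19api.com/total/country/india/status/confirmed")
df = pd.DataFrame(resp.json())[["Date", "Cases"]]

# Turn the date string into "days since the first reported case"
df["Date"] = pd.to_datetime(df["Date"])
df = df[df["Cases"] > 0].reset_index(drop=True)
df["Day"] = (df["Date"] - df["Date"].min()).dt.days

X = df["Day"].to_numpy()    # feature matrix (a single feature: day number)
y = df["Cases"].to_numpy()  # target vector: cumulative confirmed cases

# Fit the logistic curve; p0 is a rough starting guess for the optimiser
params, _ = curve_fit(logistic, X, y, p0=[np.median(X), 0.1, y.max()], maxfev=10000)
print("fitted (inflection day, growth rate, plateau):", params)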

South Korea

Here are the results, graphically, for South Korea.

It’s evident that the logistic model can’t correctly explain the epidemic in South Korea, but that’s because of the South Korean government’s near-perfect handling of their localised cluster. South Korea rolled out tests and implemented strict social distancing policies not just more effectively, but more quickly, than any other country. The graph reflects this: real-world cases overtake the logistic fit, but then fall back down. South Korea’s strict implementation of social distancing and mass-scale testing can be seen taking effect at around the 50-day mark, where the curve flattens out to some extent before resuming an upward slope.

India

India demonstrates a nearly perfect logistic fit. You can see that the logistic model explains real-world data with what appears to be near-perfect accuracy. I believe this is because of some inherent conditions here. Firstly, population density: it’s hard not to come into contact with someone else while sick in India. While modelling a disease’s spread, you use an R0 value, the average number of people an infected person goes on to infect. In other countries, population density may restrict how many people an infected person infects; in India, however, any individual who is sick and not quarantined will come into contact with more people than that R0 number would suggest.

Sweden

A logistic model also seems to describe the situation in Sweden pretty well; the logistic fit largely tracks the real-world data. This is somewhat of a surprise, as Sweden implemented a vastly different policy regarding COVID-19. Instead of carrying out mass-scale testing or enforcing a lockdown, Sweden opted for ‘herd immunity’, which is immunity to a disease as a consequence of a large proportion of the population having recovered from it. On a social level, this didn’t exactly go well for Sweden.

I think it’s quite interesting to look at how accurate logistic models are in countries with different approaches. For instance, South Korea succeeded in pushing down the R0 in their cluster through testing and quarantine. India, on the other hand, has so far failed to implement any successful policy that reduces the R0 value. Sweden is actually trying to increase its R0 so as to develop herd immunity. One inherent assumption in any logistic model, and in most epidemiological models, is that the R0 value remains largely constant. This assumption appears to hold for India and Sweden, but breaks down when it comes to South Korea.

Summary of Reading (June ’20)

Rafa (Rafael Nadal) – I’ve never really been that engrossed in tennis, so reading this book was an insight into a world I’ve never been part of, but it was fun nonetheless. Major parts of the book seemed devoted to more tennis-tuned readers, given that there’s a lot on game specifics and a lot of tennis terminology that went flying over my head. I found it really insightful to read about the impact his social and mental stability had on his physical stability and his game. I especially liked the sections on the importance of family to Nadal, and how his circle of stability helps him. It was also inspiring to read about having such a concrete routine that you’re at the court at 5 am no matter what and no matter how much sleep you managed to get. He talks a lot about having a very focused mental state during the game, filtering everything out but the game so that you have very high concentration, and I relate to that not just because I’ve played cricket in the past, but because that state of mind where nothing else can impact you and all your mind is bent on a particular task is something I try to attain, and do attain, while programming or reading. The sections on humility are also wonderful to read, and mildly humorous at times, especially the parts where he talks about how his uncle and coach, Toni, makes him perform small, seemingly irrelevant gestures like not walking in the middle of the group and not violating dress codes. I also enjoyed reading about Mallorca and the societal setup there; the amount of peace Nadal gets is incredibly valuable given that not many athletes can afford that in the modern world, and seeing him talk about how important a factor that is is brilliant and wholly justified. This book deviated from my usual sci-fi adventures, and I enjoyed it, though the long stretches describing games were a bit too much for me.

The Three Body Problem (Cixin Liu) – An incredible book that brings together two vastly different story-lines in a manner I’ve not seen before. The book opens by throwing you straight into the action: the antagonist’s father dies, forming the motivation for an act we see quite a while later. Everything that occurs at the start, the countdown, weird results from experiments, the universe flashing, seems like pure sci-fi that the author either fails to or chooses not to explain. The beauty of this book lies in turning that around into something rational and believable. Even more so, it’s amazing to see the Three Body video game turn out to describe something that exists in the real world. Setting aside the question of how life even came to be on such a world, it’s fascinating to see alternate theories play out as to how the environment functions, and to see hundreds of years of scientific progress occur in the span of a few pages. The different scenarios that play out, tri-solar syzygies, triple-sun days, chaotic eras and so on, make this book incredibly fascinating. Add to that the depth to which all the characters have been created: we have Wang, who is oblivious to the grand scheme of things but a good man at his core; then we have Da Shi, who is oblivious to the big picture and not a man of science, but the guy who always solves the problem through his core set of principles; and lastly we have Ye, an almost psychopathic personality with a deep hatred for humankind, such that she has effectively brought an end to it. Learning about the three-body problem was an incredible experience for me. I also loved the author’s note about how most ideas that take off fall back to the ground because the gravity of reality is too strong.

Delta-V (Daniel Suarez) – Very new, and very relevant. While some things in the book were downright outlandish, the way Suarez portrayed everything and how real he made the dangers of space travel seem made this book amazing to read. The candidate selection process had a lot of time and space devoted to it, and it paid off, making for one of the best pre-climax sections I’ve personally read. In fact, I’d say some of the candidate selection section was better than the rest of the book and the climax; it was really good at hooking me in. Some things that stood out during the candidate selection process: the high-CO2 atmosphere puzzle-solving event, the psychological test, and most importantly, how bonds were being formed. When the actual crew went up to Hotel LEO to pursue other projects, I thought the book was dying down, and I was personally betting on the chosen crew dying in their first attempt and the actual crew substituting for them, but what Suarez did was much better, although admittedly a bit rushed. I also couldn’t quite comprehend why a spaceship that had 14 billion dollars invested in it had so many software errors that made living unbearable and potentially fatal for the crew. I mean, come on, if you have 14 billion dollars, surely you can also afford a decent programming team who don’t mess their job up. The entire concept of mining was made so much better by the sample schematics attached by Suarez towards the end; they really paid off. The arrival of the Argo was also surprising, and Joyce’s downfall was kind of expected but tragic nonetheless. The worst part of the book, in my opinion, was how the new investors treated the astronauts. That seemed like pure fiction, unlike the rest of the book, which was genuinely believable and inspiring. This book also hammered into me the sort of challenges future astronauts may face, and the sacrifices they might have to make. It really drove the point home; space is hard.

Ender’s Game (Orson Scott Card) – Pretty good read. What really held me back was the sheer outlandishness of an eleven-year-old boy leading an entire army and representing Earth’s military WHILE not even realizing he was doing it. I know it’s purely fictional, and the story is quite impressive, but the entire notion of such a young person doing all that is beyond me. Another thing I did not understand was why Mazer Rackham didn’t lead the armies. He told Ender that he wouldn’t be alive by the date of the future battle, when in fact that very battle was what Ender’s ‘training’ turned out to be. The simple explanation is that Ender is better than Mazer. That being said, no matter how good Ender is, why would the authorities choose an 11-year-old boy with an incredible skill set over an experienced veteran and celebrated hero? That doesn’t make any sense to me. I love the elegance in how Ender is compassionate at heart and Peter is more hurtful, but how events reverse their roles, making Ender seem hurtful and Peter seem compassionate. Throughout the book, Ender’s desire to not be like Peter holds him back and dominates him, whereas Peter’s desire to be more compassionate drives him to greater heights. It’s a stunning depiction of how events in the real world can pan out, and how roles can be reversed even when characters are not. In very cliché fashion, I’m going to say that the character I bonded with the most was the main character, Ender, but only because of his tendency to look at things the way they are rather than the way they are conventionally taken to be. The best example of this is how every team thought of the battleroom as horizontally aligned, but Ender was the only one who viewed it as a place where you’re going down towards your enemy rather than straight at them, and that changed a lot. His tendency to challenge the rules is also brought out in the match against two teams, where he sends a man through the gate rather than eliminating both teams. The ending is a bit heartbreaking, especially when Ender realizes that he didn’t just obliterate the opposing side, which had its own culture and legacy, but also sacrificed soldiers on his own side without knowing it, and seeing how the guilt of this plays out is fascinating.

Recursion (Blake Crouch) – A book that takes science and throws it out the front door. The reasoning that time has no linearity, and that the past, the present, and the future exist as one, just makes no sense. The simplest way to gauge time is to drink tea: your cup of tea cools down gradually, and that’s the progression of time. However, time isn’t a thing in this book. Instead, it’s a virtual construct made by our brains to make everything much simpler. And guess what this means: you can travel back and forth, not in time, but in your memories. In normal conditions this would be time travel, but since time apparently does not exist in this book, or is as traversable as a physical dimension, it’s just memory travel. The part that struck me the most was how Helena and Barry were not able to figure out how to nullify everything after the original timeline in what was about 198 years, when Slade did it on his own in just a single timeline. Both of them thought about it so much that Barry, who was a police detective in the first timeline, became an astrophysicist/quantum physicist in the second and was trying to calculate the Schwarzschild radius of a memory (what?). Neither of them thought about returning to the previous timeline by activating a dead memory. It’s only logical that if you can time travel, the only way to nullify your existing timeline is to go back and cut out the event that birthed it. Instead, Barry and Helena decided to try to find a way to remove dead memories themselves rather than the events that created those dead memories; had they pursued the latter over those 198 years, they would’ve succeeded. I also don’t get why people need to be killed to be sent back into their own timeline, or why the U-shaped building just appeared and caused mass FMS rather than having been there forever, with people getting FMS when the cut-off date finally came.

Currently Reading

The Amazing Adventures of Kavalier and Clay (Michael Chabon)

The Brain (David Eagleman)

Implementing Hierarchical Clustering (Python)

Clustering is an important unsupervised learning technique. It aims to split data into distinct clusters. For instance, if you input data pertaining to shoppers in the local grocery market, clustering could output age-based clusters of, say, < 12 years, 12-18, 18-60, and > 60. Another intuitive example is banking: clustering financial data for a large group of individuals could output income-based clusters, say three pertaining to the lower middle, upper middle, and upper classes.

The most basic and intuitive method of clustering is K-means, which identifies K clusters. It randomly initialises K centroids, assigns each point to its nearest centroid, recomputes each centroid as the average of its assigned points, and repeats until the K centroids stop moving. Hierarchical clustering, however, is based on a much different principle.

There are two methods of hierarchical clustering: agglomerative and divisive. Agglomerative puts each data point into a cluster of its own. Hence, if you input 6000 points, you start out with 6000 clusters. It then merges the closest clusters and repeats this process until only one giant cluster encompassing the entire dataset is left. Divisive does the exact opposite: it starts with one giant cluster and splits clusters repeatedly until each point is its own cluster. The results are plotted on a special plot called a dendrogram, and the longest vertical line segment on the dendrogram indicates the optimum number of clusters for analysis. This will be much easier to understand when I show a dendrogram below.

Here is a link to the dataset I’ve used. You can access the full code here. This tutorial on analyticsvidhya was also immensely helpful to me in understanding how hierarchical clustering works.

Note the libraries I’ve imported. The dataset is fairly straightforward: you have an ID for each customer, and financial data corresponding to that ID. You’ll notice that the standard deviation or range differs a lot between features. While balance frequency tends to stay close to 1 for each ID, account balance varies wildly. This can cause issues during clustering, which is why I’ve scaled my data so that all features are on a comparable scale.
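For reference, the loading and scaling step might look something like this; the file and column names are from the Kaggle credit-card dataset and may need adjusting:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("CC GENERAL.csv")        # the Kaggle credit-card dataset
df = df.drop("CUST_ID", axis=1).dropna()  # the ID column carries no numeric information

# Bring every feature onto a comparable scale so BALANCE doesn't dominate BALANCE_FREQUENCY
scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)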

Here is the dendrogram for the data. The y-axis represents the ‘closeness’ of each individual data point/cluster. You’d obviously expect y to be maxed out when there’s only one cluster, so that’s no surprise. Now, looking at this graph, we must select the number of clusters for our model. A general rule of thumb is to take the number of clusters corresponding to the longest vertical line visible here, since the length of a vertical line represents how far apart the clusters it joins are. Hence, you want a small number of clusters (not strictly necessary, but optimal in this application), but you also want your clusters to be spaced far apart, so that they clearly represent different groups of people (in this context).
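A dendrogram along these lines can be produced with scipy; this is a sketch that assumes the scaled dataframe from the snippet above:

import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as shc

plt.figure(figsize=(10, 6))
plt.title("Customer dendrogram")
# Ward linkage merges the pair of clusters that increases within-cluster variance the least
shc.dendrogram(shc.linkage(scaled, method="ward"))
plt.ylabel("Linkage distance ('closeness' of the merged clusters)")
plt.show()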

I’m taking 3 clusters, which corresponds to a vertical axis value of 23. 3 clusters also intuitively makes sense to me as any customer can broadly be classified into lower, middle, and upper class. Of course, there are subdivisions inside these 3 broad categories too, and you might argue that the lower class wouldn’t even be represented here, so we can say that these 3 clusters correspond to the lower middle, upper middle, and upper classes.

Here is a diagrammatic representation of what I’ve chosen.

After building the model, all that’s left is visualising the results. There are more than two features, so I’m arbitrarily selecting two, plotting all points using those features for my axes, and giving each point a color corresponding to its cluster.
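A sketch of that modelling and plotting step, again assuming the scaled data from above; the two plotted columns are an arbitrary choice on my part:

import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(scaled)

# Two arbitrarily chosen features for the axes; color encodes the cluster label
plt.scatter(scaled["BALANCE"], scaled["PURCHASES"], c=labels, cmap="viridis", s=10)
plt.xlabel("BALANCE (scaled)")
plt.ylabel("PURCHASES (scaled)")
plt.show()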

You’ll notice that there is a lot of data here, but also a clear pattern. Points belonging to the purple cluster visibly tend towards the upper left corner. Similarly, points in the teal cluster tend to the bottom left corner, and points in the yellow cluster tend to the bottom right corner.

Implementing PCA and UMAP in Python

You can find the full code for PCA here, and the full code for UMAP here.

Dimensionality reduction is an important part of constructing machine learning models. It is basically the process of combining multiple features into a smaller number of features. Features that have a higher contribution to the target value get a greater representation in the final combined features than features that contribute less. For instance, if you have 8 features, the first 6 of which have a summed contribution of around 95% and the last 2 of which contribute only about 5%, then those 6 features will have a greater representation in the final combined feature. In terms of advantages, the most significant is lower memory use and hence higher modeling and processing speed. Other advantages include simplicity and easier visualization: you can easily plot the contribution of two combined features to the target, compared to plotting, say, 20 initial features. Another significant aspect is that features with less contribution, which would otherwise add useless ‘weight’ to the model, are removed early on.

The two methods of dimensionality reduction I will be using are PCA and UMAP. I won’t be going through how they work in detail, as I’ve given a short overview of their purpose above. Instead, I’ll go through the code I implemented for each and visualize the results. For this exercise, I’m using the WHO Life Expectancy dataset that can be found on Kaggle, as it’s very small and easy to work with. My target variable will be life expectancy, and my features will be aspects like adult mortality, schooling, GDP etc. I selected these features from the dataset more or less at random.

Here is a list of the modules we will be using. train_test_split will help us break our data into a training set and a testing set (about a 7:3 ratio). While this isn’t significant right now, it aids in the detection of underfitting and overfitting: underfitting shows up as bad performance on both the training set and the testing set, whereas overfitting shows up as really good performance on the training set but bad performance on the testing set. StandardScaler has been used to standardise the features, rescaling each one so that no single feature dominates simply because of its range. Lastly, we’ve imported both PCA and UMAP, which will be used for the dimensionality reduction itself.
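For reference, the import list described above boils down to something like this (umap comes from the umap-learn package):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt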

Here we just load our dataset, extract the features that will be used (see column names in the dataframe), and rename them for the sake of simplicity. As you can see, there are some random spaces and not all use underscores as notation, so I decided to have one uniform way of typing out each feature. Now, to extract a feature matrix and a target vector, just drop the life_expectancy column from the dataframe and convert it into a numpy array, and convert the life_expectancy column into a separate numpy array. I won’t show the code for splitting and normalising, because that’s pretty much irrelevant here.
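A minimal sketch of that loading and renaming step; the original column names are from memory of the Kaggle file (some carry stray spaces) and may need adjusting:

df = pd.read_csv("Life Expectancy Data.csv")

# Keep a handful of columns and give them one uniform snake_case naming scheme
cols = {
    "Life expectancy ": "life_expectancy",   # note the stray trailing space in the original
    "Adult Mortality": "adult_mortality",
    "Schooling": "schooling",
    "GDP": "gdp",
}
df = df[list(cols)].rename(columns=cols).dropna()

X = df.drop("life_expectancy", axis=1).to_numpy()   # feature matrix
y = df["life_expectancy"].to_numpy()                # target vector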

Implementing PCA in itself is very simple, as shown above. You’ll notice that I’ve specified n_components to be equal to 2. This is because the number of combined features you want at the end can be set by you; if you don’t specify a number, scikit-learn keeps all the components, so I’ve set it to two explicitly since I only want two axes to plot. After that, I’ve fitted the training data to PCA.
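The splitting, scaling, and PCA step might look like this:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)

pca = PCA(n_components=2)                    # keep the two directions of greatest variance
pca_result = pca.fit_transform(X_train_scaled)
print(pca.explained_variance_ratio_)         # how much variance each component captures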

Here’s a bit of data treatment before I finally plot the results. I’ve basically converted PCA’s output, which was a numpy array, to a pandas dataframe, and then added life_expectancy as a column because that will be used for the color-bar you will see below.

Here is the code for my plot, and here is the plot:
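In case the embedded snippet doesn’t render, the data treatment and plot boil down to something like this:

pca_df = pd.DataFrame(pca_result, columns=["pca_1", "pca_2"])
pca_df["life_expectancy"] = y_train          # drives the color bar

plt.figure(figsize=(8, 6))
sc = plt.scatter(pca_df["pca_1"], pca_df["pca_2"],
                 c=pca_df["life_expectancy"], cmap="viridis", s=12)
plt.colorbar(sc, label="Life expectancy")
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.show()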

Each marker shows a data point’s values on the two principal components, with its target value, life expectancy, represented by the color of the marker. While I don’t see any patterns straightaway (specific colors being clustered somewhere etc.), the primary thing that does stand out is how heavily green dots (~ 70 expectancy) are clustered towards the bottom left. There are other colors as well, but there don’t seem to be many green dots anywhere else.

The code for UMAP is exactly the same, except with UMAP as our reducer instead of PCA. Here’s the plot.
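For completeness, the UMAP version is essentially:

reducer = umap.UMAP(n_components=2, random_state=42)
umap_result = reducer.fit_transform(X_train_scaled)

umap_df = pd.DataFrame(umap_result, columns=["umap_1", "umap_2"])
umap_df["life_expectancy"] = y_train

sc = plt.scatter(umap_df["umap_1"], umap_df["umap_2"],
                 c=umap_df["life_expectancy"], cmap="viridis", s=12)
plt.colorbar(sc, label="Life expectancy")
plt.show()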

You can straightaway see that the results of UMAP are quite different. Once again, there are no noticeable patterns in terms of specific colors being clustered in specific locations, but the overall structure is quite different from that of PCA. We can see that each color is distributed throughout.

There’s no way to say which method is better without modelling your target variable on each set of reduced features and comparing accuracy on the testing set. This post just aims to illustrate how both of them work without going into specific details.

Sustainability: Why it Matters and What I’m Trying to do about it.

There are so many facets to global warming and climate change that people are aware of but don’t understand. This leads to a lot of focus on very specific issues, which, in turn, leads to the neglect of other problems. For instance, one such facet is decreasing tree cover in urban areas and the loss of forested areas. This problem is very simple to understand: people need space to live and shelter to live in, so they take a plot of land with trees and cut the trees down to get space and materials for building those shelters. Sustainability here would mean planting more saplings elsewhere and restricting yourself to the plot you cleared out initially. If, with time, you lack space, build vertically, not horizontally. Leave room for mother nature. Instead, due to an ever-increasing need for residential and industrial areas, cities continue to expand at a breathtaking pace. Find an image comparing what cities looked like ten years ago with what they look like now, and you’ll see what I’m talking about. Urban areas constantly demand more homes, offices, power generation facilities, shopping malls, roads etc. The list goes on and on. Nature doesn’t. Due to urban expansion and hence reduced green cover, carbon dioxide levels are ramped up, with more vehicles to produce it and fewer trees to remove it. Describing the different ways in which carbon dioxide is produced is futile. More urban areas mean more airports, not just one for each city, but more than one for the large cities. Airports need vast patches of land, which means less tree cover. More airports mean more flights, and aircraft are among the biggest singular producers of greenhouse gases, so that just keeps adding on. This is just one impact of not having enough green cover; other consequences include higher temperatures (hence more need for air conditioners in buildings and vehicles, which have a negligible unit impact but a very large aggregate impact), soil erosion, the elimination of natural flood barriers etc. It’s bad. You might say that cutting down a small patch of trees has no impact in the larger scheme. You’re right, it doesn’t. But when millions of people say that, the impact adds up, and it leads to the world we’re in today.

Now, reduced green cover is just one facet of the problem, as I said at the start. You might’ve realized the magnitude of this problem, but there are other problems out there that carry the same, if not heavier, consequences: rising temperatures, rising sea levels, desertification etc. And these are just other aspects of the same overarching problem, global warming. Beyond that, there are a host of other issues out there: space junk, war, plastic pollution etc.

But then, the reason one can’t solve all these issues is that they’re too large. That doesn’t just mean each one needs too much effort and time; it means each one needs too much money, on the scale of billions of dollars. Knowing this, I set out to build my own small-scale, high-impact solution to the problem of declining tree numbers: Treephillia.

Planting trees is something we’re all taught as kids. Kindergarten and elementary school are full of activities where PT teachers show you how to plant a sapling; then you go ahead and plant yours, and it feels like you’re doing your bit for the world. It’s something we all read about, and I do believe that the mass media deserves a lot of credit for showing the world just how important planting trees can be. In the developed world, and in a significant proportion of the developing world, most people know about this issue, even if they don’t do anything about it or don’t care. Awareness is there, but no one knows what to do. Take person X, for example. X wants to contribute by planting trees, but he has no idea what to plant or how to plant it. He doesn’t know where he should plant it, and he doesn’t know which saplings he should buy, or from where. X is also aware of how important planting trees is, but is unsure about the impact just one plantation can have. X is representative of the majority of Earth’s population in this matter.

My application tries to remedy these issues. With inputs from experts who have field experience, I can tell users what they should plant; the Eucalyptus, for instance, is a 100% no-no: it might look pretty, but it stunts the growth of other trees. With a plantation-site feature that enables officials within the local forestry department to mark spots ripe for public plantation campaigns, I can tell X where to go to plant his tree. With a map that enables X to see plantations, he gets to know the larger impact. If you have a city with a population of 1 million people, and 1 out of 100 plant just a single sapling, you still have 10,000 plantations. If you have a country with a population of 100 million people, then by the same 1-in-100 assumption you have 1 million plantations. That’s a lot, and that would really matter. My application gives its users the personal interface they need to plant trees smartly in the modern era. And no, it’s not just limited to the features I listed above. There is a serious lack of incentive surrounding planting trees, so I also decided to implement a voucher feature that rewards users who plant trees. While a voucher should not be the reason to want to plant trees, it serves as a pathway to doing so, and that works.

Now, how exactly does my application work in reality? It’s not just for individual users who want more information and who want to record what they actually do. Businesses can use it to have each employee plant a tree in the event of an office birthday and track the trees planted on a map. People can plant a tree on the important occasions of their life, say birthdays, anniversaries etc. Hotels can get employees to plant a tree when a guest has a birthday, or maybe even plant one without any occasion, to visually improve the setting their guests stay in. The government can use it to track tree plantations and mark planting sites for the public. NGOs in this sector can use it to reach people who genuinely care and want to have an impact on the world through planting trees.

Tree plantation is very important to me personally. As a 7-year-old, I helped Dad plant a sapling outside our old house. For four years, I watched that sapling grow from a timid little thing into a leafy tree. Every time I return to that house, I think about just how much that one tree has grown. With this in mind, I’ve always drawn an analogy between planting a sapling and a mass tree-plantation movement. Like a sapling, any plantation trend would be small at first. But, with time, it’ll grow. It’ll sprout branches and twigs, wear leaves, and most importantly, grow roots deep enough to keep it stable and strong. However, there is one difference here. Unlike a tree, a movement can’t be ripped out of its foundations or chopped off at its base. It will persist, and so will our planet.

Plotting Shapefile Data Using Geopandas, Bokeh and Streamlit in Python.

I was recently introduced to geospatial data in Python. It’s represented in .shp files, in the same way any other form of data might be represented in, say, .csv files. However, each record in a .shp file corresponds to either a polygon, a line, or a point. A polygon represents a shape, so in the context of maps and geospatial data, a polygon could act as a country, or perhaps an ocean, and so on. Lines are used to represent boundaries, roads, railway lines etc. Points are used to represent cities, landmarks, features of interest etc. This was very new to me, so I found it fascinating to see how any sort of map can be broken down into polygons, lines, and points and then played around with in code. I was also introduced to streamlit, which provides an alternative to Jupyter Notebooks with more interaction and, in my opinion, better visual appeal. I think one distinct advantage Jupyter Notebook has is compartmentalisation, and how good code and markdown look next to each other, whereas streamlit seems to be more visually appealing. However, one big advantage streamlit has is that it is operated from the command line, making it much more efficient for a person like me who’s very comfortable typing out commands and running stuff, rather than dragging a pointer around to click objects.

I used the geopandas library for dealing with shapefile data. It’s incredibly efficient and sort of extends native pandas commands to shapefile data, making everything much easier to work with. One thing I didn’t like was how streamlit didn’t have inbuilt functionality to view GeoDataFrames, so essentially that means I have to output geospatial data using the st.write() method, and that just results in some ugly, green-colored output, very much unlike the clean, tabular output you get when you use st.write() for displaying dataframes. It’s also a bit surprising how st.dataframe() doesn’t extend to a GeoDataFrame, but eh, it works for now.

Bokeh is new to me, so I decided to start out making plots using inbuilt geopandas and matplotlib functionality rather than moving straight to Bokeh. Hence, in this post, I’ll be going through how I made and annotated some maps using geopandas, then extended that to Bokeh to make my code much more efficient. A huge advantage Bokeh brings to the table is that it can be used to store plots, so there’s no going back to earlier cells to find a plot. Just output it to an html file, write some code to save the contents of that html file, and you’re good to go.

The very first thing I did was create a very basic plot showing Germany filled with a blue color. Here’s a small piece of code that accomplishes that.

Plotting Germany
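Since the snippet itself doesn’t appear inline here, here is a sketch of what it amounts to (variable names are mine):

import geopandas as gpd
import matplotlib.pyplot as plt
import streamlit as st

# Low-resolution country polygons that ship with geopandas
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
germany = world[world["name"] == "Germany"]

fig, ax = plt.subplots()
germany.plot(ax=ax, color="blue")
ax.axis("off")        # purely cosmetic: hide the lat/long axes
st.pyplot(fig)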

This simply takes the geodata for the entire world, in low resolution. I then select the data specific to Germany, plot it using geopandas, turn the plot axes off just to make it look better visually, and output the plot to my streamlit notebook (do I call it that?). Here’s the result.

As you can see, it’s a very simple plot showing the nation state of Germany. Now, I thought I’d extend this, make it look better, and annotate it with the names of some cities and the capital, Berlin. The very first thing to do here is to get some data for each city, their latitudes and longitudes, to be specific. You could import that via a shapefile, which would have each location as a point, but I manually inputted 6 cities and their data from an internet source into a dataframe, and then used that to annotate my figure, which I’ll talk about now.

The dataframe at the top contains my city position data. Below that, I’m creating another GeoDataFrame that holds each city’s data as a Point object. You’ll notice that I’ve used the points_from_xy() method while creating the city dataframe. points_from_xy() wraps around the Point() constructor; you can view it as equivalent to [Point(x, y) for x, y in zip(df.Longitude, df.Latitude)]. It works really well and removes the need for a for loop, making my code much more efficient. I’ve then plotted the same map as above, except with a white fill and a black outline (better looking imo). After that, I’ve gone over each point using a for loop and added a label, which is the city name (stored as just “Name”). I’ve also increased the size of the marker for Berlin, given that it’s the capital. The last step is just adding a red marker to indicate each city’s position. Note that st.pyplot() is a streamlit method that outputs any figures we might have. Here is the output of the code above.
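A sketch of what that looks like; the coordinates below cover three of the cities (the post used six), and the exact styling is approximate:

import pandas as pd

cities = pd.DataFrame({
    "Name": ["Berlin", "Hamburg", "Munich"],
    "Latitude": [52.52, 53.55, 48.14],
    "Longitude": [13.40, 9.99, 11.58],
})
city_gdf = gpd.GeoDataFrame(
    cities, geometry=gpd.points_from_xy(cities.Longitude, cities.Latitude)
)

fig, ax = plt.subplots()
germany.plot(ax=ax, color="white", edgecolor="black")
city_gdf.plot(ax=ax, color="red",
              markersize=[40 if n == "Berlin" else 15 for n in city_gdf.Name])
for x, y, label in zip(city_gdf.geometry.x, city_gdf.geometry.y, city_gdf.Name):
    ax.annotate(label, xy=(x, y), xytext=(3, 3), textcoords="offset points")
ax.axis("off")
st.pyplot(fig)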

I think this looks much better.

Now, I decided to plot a map showing all the main railway lines in India on top of a blank map of India, and output this to a new .html page as a bokeh plot.

As you can see, the code for this is very simple. I’ve first plotted the map of India using the get_path functionality in geopandas. Then, for the sake of visibility, the axis lines have been turned off. Then, I’ve read the railway shapefile, which consists entirely of line objects, and plotted it using the pandas_bokeh library, outputting to an html file. Here’s the result.
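Roughly, and with the caveat that I’m writing the pandas_bokeh calls from memory (parameter names may need checking), it comes down to:

import pandas_bokeh

pandas_bokeh.output_file("india_railways.html")     # write the interactive plot to an html file

india = world[world["name"] == "India"]
railways = gpd.read_file("india_railways.shp")      # placeholder path to the railway shapefile

fig = india.plot_bokeh(fill_color="white", line_color="black", show_figure=False)
railways.plot_bokeh(figure=fig, line_color="darkblue")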

I find working with geospatial data tremendously useful, and very exciting. I think that’s partly because I love data science, playing around with data and modelling it, and working with geospatial data opens up an entire realm of possibilities. I’d describe the experience of making my first shapefile plot as something akin to working for the first time with time series data. Much like time, it adds another dimension to what one can do. In the coming weeks, I’ll be blogging more often, hopefully once every week, on geospatial data specifically. The contents of this post are not even an introduction to what can be accomplished using geospatial data. The next point of exploration, for me, is going to be how to depict elevations, and how to use color density to indicate population density, average cost of living, GDP etc. One immediate application that comes to mind is making a map that uses color to reflect how much covid-19 has impacted a country, based not just on cumulative confirmed cases, but also on factors like healthcare expenditure, economic downturn, unemployment etc. I think it’ll be interesting.

You can find the entire code here.

Nihilism and Importance.

At the start, we humans thought we were the center of the universe. How naive. We thought this lie to be a universal truth that granted us special status: we were at the center of God’s creation, and thus the most prized of his inventions. Then, the heliocentric model placed the Sun at the center of the universe. In what was a huge blow to human pride, we came out as self-appointed winners on the basis of still being God’s cherished inventions. We then learned that the Sun wasn’t the center of the universe, but instead the center of one of trillions of solar systems in constant revolution around the Milky Way’s center. Then, we found out that we were just another planet around another star in another galaxy that was one of trillions upon trillions. Now, there is a fair chance that our universe may just be one of quadrillions.

While our insignificance in the cosmic scheme of things has done nought but grow, we’ve still found ways to make ourselves feel unnecessarily special. The first of these is the proclamation that we’re the only known form of intelligent life in the known universe. Now that might hold true, but of what importance is being intelligent when, regardless of that intelligence, the Horsehead Nebula still remains untraveled, the TRAPPIST system remains unexplored, and the flat earth theory still remains believable to an absurd number of people? We’re intelligent relative to the natural life on Earth (although maybe not; octopuses display far more intelligent behavior than us). On a cosmic scale, the human species is to the universe what the flutter of a butterfly’s wing is to storms on Jupiter: irrelevant. In fact, the argument that being intelligent makes us special is in itself flawed; an intelligent species wouldn’t knowingly harm its ecosystem and put politics above science.

So, one may ask, what does make us significant? Is it love? Friendship? Science? The former two make individual lives important, but not an entire civilization. Love isn’t a metric for progress; scientific progress is, and it’s measured by the Kardashev scale. The Kardashev scale measures a species’ technological progress: Type I denotes optimal utilization of all planetary resources, Type II denotes optimal utilization of the host star’s energy output, and Type III denotes optimal utilization of the entire galaxy’s energy resources. We haven’t even reached Type I yet, and there’s no saying we ever will, given that the way we’ve used our planet’s resources is destroying it. The only way to become significant in a cosmic sense is to expand our barriers and become a space-faring civilization. Our solar system provides a variety of environments, everything from the hellish landscape of Venus to the subsurface oceans of Europa; populating them is both an engineering challenge and a stepping stone to going beyond.

Having a measurable impact requires scientific progress, but humans would rather keep playing tic-tac-toe than switch to Minecraft. Rather than adopt incredible challenges in the form of navigating the harsh environment of space and settling humans on other bodies, our politicians choose to fight and debate and argue endlessly. That’s because the entire concept of modern politics is somewhat flawed. Politicians can’t make decisions that balance on the scale of decades, even centuries. They cannot comprehend the importance of science, because anyone who is even mildly scientific would never go into politics, and they fixate on which country did what and on their endless agendas rather than the one big agenda that should occupy their minds: what is our place in the universe?

And it goes without saying that space offers more economic potential than anything else, so the entire notion of immense space exploration budgets drawing money away from the economy and from facets like healthcare and education is founded on quicksand. Asteroid mining offers returns on the level of trillions of dollars, if not more. Lunar colonization offers dramatic reductions in the cost of space exploration and a gateway to other bodies in the Solar System. Moons like Europa offer the chance to find life beyond Earth, or at least to understand how modern Earth came to be. Space offers a dizzying array of possibilities, but we ignore them all.

As the Ancient One put it, we’re a “momentary speck within an indifferent universe,” and having an impact requires understanding just how important technology and science are.

But then again, do people really get that any longer? Ancient humans had access to the universe via a tool that is ironically useless today: the naked eye. Looking up offered a window into a larger world, unlike today, where the night sky is pitch black, like in all of Enid Blyton’s stories. This sparked curiosity; it led to the development of calendars and timekeeping and birthed the very concept of study, which has historically served us well. Even astrology came into being, one of our first attempts to explain how the universe impacts us. We’ve lost sight of that burning curiosity now. People think they have access to a whole new world through their smartphones, but no one realises what we’ve lost along the way. We’ve lost the reason we became intelligent in the first place: the night sky.

Solving the UEFA Champions League problem on Code Chef with OOP

So I decided to try out code chef, and chose to solve the UCL problem listed on the home page of the ‘practice’ portion of the site. The problem statement was detailed and easy to understand; you can view it here. It basically involves inputting the match results for a league in the following format:

<home team> <home team goals> vs. <away team goals> <away team>

for instance, fcbarca 5 vs. 1 realmadrid

12 match results are inputted, and match results for T leagues are taken in. The output for each league should be the names of the winning team and the runner-up. For instance, the output for the following league schedule/results,

manutd 8 vs. 2 arsenal
lyon 1 vs. 2 manutd
fcbarca 0 vs. 0 lyon
fcbarca 5 vs. 1 arsenal
manutd 3 vs. 1 fcbarca
arsenal 6 vs. 0 lyon
arsenal 0 vs. 0 manutd
manutd 4 vs. 2 lyon
arsenal 2 vs. 2 fcbarca
lyon 0 vs. 3 fcbarca
lyon 1 vs. 0 arsenal
fcbarca 0 vs. 1 manutd

would be

manutd fcbarca

Here are the basic rules we observe for football tournaments: each victory gives you 3 points, each loss gives you 0 points, and each draw gives you 1 point. If two teams have the same point tally, the team with the higher goal difference is placed first. For simplicity’s sake, we’re assuming here that no two teams with the same point total will have the same goal difference.

I defined two classes, League and Team, right at the start. The only attribute an instance of League has is table, a dictionary wherein the key is a string holding the team’s name and the value is a Team object containing all the necessary information. I also put five methods inside the League class, which I’ll explain in just a bit.

League Class

The does_team_exist() method just returns True or False depending on whether the inputted string is a key in the table dictionary. In my main code, if does_team_exist() returns False, the add_team() method is called and the self.table attribute is updated with a new Team object. If does_team_exist() returns True, the program simply continues. match_won() and match_drawn() are pretty self-explanatory. The former calls the win() method on the Team object corresponding to the team that won and the lose() method on the Team object corresponding to the team that lost; you’ll notice that we also pass in the goal difference here. The latter calls the same draw() method on both teams, and no gd is passed in because it equals 0 for both teams in the event of a drawn match.

return_top_two() is our most important method. It creates a list of all the values in the table dictionary. Then, I used a lambda function to sort on the basis of two parameters: points and goal difference. You’ll notice that the first element of the tuple returned by the lambda is the number of points associated with that object, and the second is the goal difference for that team. After sorting all the Team objects, I return the team names of the last and second-last objects.
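Putting the description above into code, a minimal sketch of the League class might look like this (I’ve named the drawn-match handler match_drawn(); the Team class it relies on is sketched in the next section):

class League:
    """One league; self.table maps a team name to its Team object."""

    def __init__(self):
        self.table = {}

    def does_team_exist(self, name):
        return name in self.table

    def add_team(self, name):
        self.table[name] = Team(name)   # Team is defined further below

    def match_won(self, winner, loser, gd):
        self.table[winner].win(gd)
        self.table[loser].lose(gd)

    def match_drawn(self, team_a, team_b):
        self.table[team_a].draw()
        self.table[team_b].draw()

    def return_top_two(self):
        # Sort ascending by (points, goal difference); the best team ends up last
        teams = sorted(self.table.values(),
                       key=lambda t: (t.get_points(), t.get_gd()))
        return teams[-1].get_name(), teams[-2].get_name()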

Now, I’ll show you the Team Class.

Team Class

This is much simpler. When initializing, the only argument is the team name, while points and goal difference are initialized to 0. The win() method increases points by 3 and goal difference by the passed-in gd, the draw() method only increments points by 1, and the lose() method decrements goal difference by gd. This is followed by three get methods, two of which are used in the lambda function shown above, and one which is used to produce the desired output for our program.
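Here’s the corresponding sketch of the Team class:

class Team:
    def __init__(self, name):
        self.name = name
        self.points = 0
        self.gd = 0        # goal difference

    def win(self, gd):
        self.points += 3
        self.gd += gd

    def draw(self):
        self.points += 1   # a draw doesn't change goal difference

    def lose(self, gd):
        self.gd -= gd      # no points for a loss

    def get_points(self):
        return self.points

    def get_gd(self):
        return self.gd

    def get_name(self):
        return self.name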

Now, the only part left is the main working code. Here it is.

Code

We take in T as our first input, as defined by the problem statement, then iterate over each league. Before going into the results for each league, I’ve initialized league to an instance of the League class. I then iterate over each of the 12 fixtures in the league. Splitting the input and extracting certain indices (see code) gives us the values for each team’s name and the number of goals it scored. You can see that I have two “if not”s up there that initialize a new Team object inside the table attribute of the League class if the team in question isn’t there already. Then, I compare home team goals and away team goals and call the appropriate method where needed. Our output is the value returned by return_top_two().
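A sketch of that driver loop, using the two classes above:

T = int(input())
for _ in range(T):
    league = League()
    for _ in range(12):
        parts = input().split()
        # "<home team> <home goals> vs. <away goals> <away team>"
        home, home_goals = parts[0], int(parts[1])
        away_goals, away = int(parts[3]), parts[4]

        for name in (home, away):
            if not league.does_team_exist(name):
                league.add_team(name)

        if home_goals > away_goals:
            league.match_won(home, away, home_goals - away_goals)
        elif away_goals > home_goals:
            league.match_won(away, home, away_goals - home_goals)
        else:
            league.match_drawn(home, away)

    print(*league.return_top_two())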

You can check out the full code here.

Credits to code chef for the problem statement.

UI to mail NASA APOD image to inputted email [#2 NASA APOD Series]

I’ll be going through using Flask to create a two-page website with a simple form on the first page that takes in your email; clicking the submit button sends the latest NASA APOD image to the inputted email address. For this, I’m using the script written and explained in the last post in this series, which can be found here. So in this post, I’ll be going through the Flask templates I created, and my form functionality, which is pretty simple.

The first step is to create a .html page and a main.py file for our website. Here’s the code for both.

<body> tag of .html file

The image above shows the code I’ve written for the body tag, which is all that’s worth looking at. The form object returned from the main.py file is used just to show the email field and a submit button; it’s very simple functionality, so I’m not going to go into much detail for this part. We only have an email field and a submit button. Clicking the submit button first checks whether or not the email field has a valid input. If it doesn’t, the field is cleared and a red outline is put on it. If it does, an email containing the latest NASA APOD image is mailed to the inputted address, and the user is redirected to another webpage.

This is the code in my main.py file. I’ve initialised my Flask app with the first line and given it a secret key (needed for forms). The send_email() function contains the code explained in the previous post. It’s called if and only if the submit button is clicked on the home page and a valid email address is filled out. If these conditions are satisfied, send_email() is called with the inputted email as the only argument. The code then redirects the user to the bye.html page, which is just a small html file displaying the word “bye.” I did that for the sake of seeing whether or not the send_email() function has been called inside home().
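Since the snippet itself isn’t shown inline, here is a sketch of roughly what main.py does; the template names, the module holding send_email(), and the EmailButton class name (described in the next paragraph) are my own placeholders:

from flask import Flask, render_template, redirect, url_for
from apod_mailer import send_email   # hypothetical module name holding the script from the previous post

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"      # required for form handling

@app.route("/", methods=["GET", "POST"])
def home():
    form = EmailButton()                    # the wtforms class sketched further below
    if form.validate_on_submit():           # True only on a POST with a valid email
        send_email(form.email.data)         # mail the latest APOD image to the inputted address
        return redirect(url_for("bye"))
    return render_template("home.html", form=form)

@app.route("/bye")
def bye():
    return render_template("bye.html")

if __name__ == "__main__":
    app.run(debug=True)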

My functionality for setting up the email form is very simple. I use the wtforms module. I’ve created an emailButton class wherein email is set to a StringField object and submit is set to a SubmitField object. The library provides validator methods: DataRequired() ensures that some data is inputted into the field, and Email() ensures that the given input is a properly formatted email.
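A sketch of that form class, assuming Flask-WTF’s FlaskForm wrapper (the usual way of using wtforms with Flask); I’ve capitalised the class name PEP8-style:

from flask_wtf import FlaskForm
from wtforms import StringField, SubmitField
from wtforms.validators import DataRequired, Email

class EmailButton(FlaskForm):
    email = StringField("Email", validators=[DataRequired(), Email()])
    submit = SubmitField("Submit")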

This is what our webpage displays. After entering a valid email and clicking submit, here is what I’ve received in my inbox.

Next time I’ll have a more styled layout, and I’ll go through the countdown functionality that will keep sending emails to the inputted addresses until the number of inputted days reaches zero. The project should ideally be complete when that’s done.

You can check out the full code here.