Mental strength


Many people out there forget how important mental strength is. If you don’t believe you can do something then you won’t…simple as that. If you can visualize yourself being successful and gain confidence through training and hard work…you will succeed!

-Brian Shaw

LRU Cache


I had heard a lot about this interview question from people but honestly, I never studied it in school. Maybe I skipped one of those classes that talk about it 🙂

LRU stands for least recently used (add "cache" and it becomes an LRU cache, though LRU can apply to anything: an LRU disk, LRU threads, etc.). When an interviewer asks you this design question (I consider it a design question), they are not expecting you to explain what a cache is or how caching works; they are in fact looking for an algorithmic solution to the LRU problem.

So what do we mean by LRU? Suppose you have two cats, aged 1 and 2, and you are asked to let go of the older one – you let go of the second cat. What did we do here? We compared the ages of the two cats and kept only the younger one. Something similar happens in LRU design questions. We want to keep X (could be cache entries, drives, threads, etc.), but when we run out of capacity, we have to let go of something, and that something is decided by what was least recently used.


Fig 1. High-level LRU working

How do we keep track of the least recently used item? One way is to maintain something like a hashmap with an age (timestamp) per entry, and remove whichever entry is oldest. But what would be the time complexity of this solution? We would have to traverse the entire map to find the oldest entry, so eviction takes O(n) time. See HERE for this solution.
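The timestamp-per-entry idea above can be sketched as follows. This is a minimal illustration, not the linked solution; the class and member names (`NaiveLRU`, `stamp`, `clock_`) are my own. Note the linear scan in `put` when the cache is full:

```cpp
#include <string>
#include <unordered_map>

// Naive LRU: store a logical timestamp with each entry and scan the whole
// map on eviction. get/put bump the timestamp; eviction is O(n).
class NaiveLRU {
    struct Entry { std::string value; long stamp; };
    std::unordered_map<int, Entry> map_;
    long clock_ = 0;
    size_t capacity_;
public:
    explicit NaiveLRU(size_t capacity) : capacity_(capacity) {}

    bool get(int key, std::string& out) {
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        it->second.stamp = ++clock_;   // mark as most recently used
        out = it->second.value;
        return true;
    }

    void put(int key, const std::string& value) {
        auto it = map_.find(key);
        if (it == map_.end() && map_.size() == capacity_) {
            // O(n) scan for the entry with the smallest (oldest) timestamp
            auto oldest = map_.begin();
            for (auto jt = map_.begin(); jt != map_.end(); ++jt)
                if (jt->second.stamp < oldest->second.stamp) oldest = jt;
            map_.erase(oldest);
        }
        map_[key] = {value, ++clock_};
    }
};
```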

Can we do better? It turns out we can if we maintain a doubly linked list (DLL): the oldest item always sits at the back of the list and the youngest at the front, so we can evict an item in O(1) time. Pair the list with a hashmap from key to list node and lookups become O(1) too. Whenever we access an element, we move it to the front of the list, since it then becomes the most recently used item.

A doubly linked list solution in C++ ideally shouldn’t hand-roll pointers if the STL is available. But if you still want to write your own doubly linked list, HERE is a solution; if you’d rather use the STL, take a look at THIS.
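For a sense of what the STL version looks like, here is a minimal sketch (my own, not the linked solution) using `std::list` as the DLL and `std::unordered_map` to map each key to its list node, so lookup, move-to-front (`splice`), and eviction are all O(1):

```cpp
#include <list>
#include <string>
#include <unordered_map>

// O(1) LRU cache: the list keeps usage order (front = most recently used),
// the map points each key at its list node.
class LRUCache {
    using Item = std::pair<int, std::string>;            // key, value
    std::list<Item> order_;                              // front = newest
    std::unordered_map<int, std::list<Item>::iterator> index_;
    size_t capacity_;
public:
    explicit LRUCache(size_t capacity) : capacity_(capacity) {}

    bool get(int key, std::string& out) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        order_.splice(order_.begin(), order_, it->second); // move to front, O(1)
        out = it->second->second;
        return true;
    }

    void put(int key, const std::string& value) {
        auto it = index_.find(key);
        if (it != index_.end()) {                          // update existing key
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {                  // evict LRU at the back
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
    }
};
```

`std::list::splice` is the key trick here: it relinks an existing node without invalidating iterators, which is what lets the map keep pointing at the right nodes.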

To summarize, LRU implementation is a popular interview question, and the concept is not that hard to grasp. Give yourself a few hours and make sure you fully understand what LRU demands and the ways of designing it.

Feel free to leave comments if you have any questions.

Hadoop on containers


I have started working on a project to make Hadoop/Spark run effectively in a containerized environment. While microservices like web servers are well suited to running in a container (being stateless), running any big data technology like Hadoop/Spark/Kafka can be a challenge because they need to be stateful (the opposite of what containers provide). There are some workarounds available, and I have written a one-page report – it takes no more than 2-3 minutes to read – that provides a brief, high-level overview. If you have any questions, please leave a comment and I’ll try to address them.

Monitors: An Operating System Structuring Concept


While revisiting the differences between semaphores and monitors, I came across one of my OS class homework assignments in which I had written a summary of monitors. So I thought I might as well blog about it.

Paper: Monitors: An Operating System Structuring Concept

Author: C. A. R. Hoare

Date: 1974

What is it about?

  1. One of the major problems in building an operating system is to allow sharing of computer installations among different processes that may make unpredictable demands upon its resources. The paper attempts to address this issue by introducing the concepts of “signal” and “wait”.
  2. Creating resource allocation algorithms for different kinds of resources can be a challenge for a developer. There wasn’t an easy way to construct separate schedulers for each class of resource. The paper introduces the concept of monitors in an attempt to solve this issue.
  3. The paper also talks about the problem where multiple processes are trying to access the same resource and what can be done to prevent this race condition.
  4. The author has built upon the work done in the paper “Proof of correctness of data representations. Acta Informatica” by Hoare.

Some key contributions

  1. The paper talks about monitors – a synchronization structure. A monitor is a class-like structure that contains data and procedures. One significant property of monitors is that only one program may be executing a monitor procedure at any given point in time.
  2. The paper demonstrates the usefulness of monitors in structuring operating system designs. Monitors provide an effective way to enforce synchronization of processes (like critical sections).
  3. It also introduces the concept of condition variables. A condition variable prevents one process from acquiring a resource that is being used by another process; it keeps a queue of the processes waiting in line to acquire the resource.
  4. “Signal” and “wait” operations are also introduced in the paper. As the names suggest, “signal” resumes the execution of a process waiting in the queue, while “wait” delays the execution of a process.
  5. How to implement semaphores using monitors is also described in the paper. Monitors are powerful like semaphores but provide a higher level of abstraction by providing data and code encapsulation.
  6. It introduces a new type of variable called “condition” for cases where there may be more than one reason for waiting, and these need to be distinguished by both the waiting and the signalling operations.

Experimental methodology

  1. The paper talks about how semaphores can be implemented using monitors and vice versa. It also provides a proof that the monitor/condition concepts are no less powerful than semaphores and can be used for all the same purposes.
  2. The paper makes some good suggestions towards the end, one of them being: “Never seek to make an optimal decision; merely seek to avoid persistently pessimal decisions.” This observation motivates the need for good scheduling algorithms, and the paper presents one.

The above is only a high-level summary and doesn’t contain all the details mentioned in the paper. For a thorough review, please go through the entire paper. If you have any questions, feel free to leave a comment.

Happy reading!