Thursday, June 30, 2022

Problems Asked in Competitive Programming for Hashing

This blog is dedicated to the questions asked in technical rounds of competitive programming. These questions are like a Bible/Gita for programmers: you should be aware of these concepts, because the solutions can be given using hashing alone. I would suggest solving these problems using hashing concepts.

Problems are below:

1. Count Non-Repeated Elements 

Hashing is very useful to keep track of the frequency of the elements in a list.

You are given an array of integers. You need to print the count of non-repeated elements in the array.

Example 1:

Input:

10

1 1 2 2 3 3 4 5 6 7

Output: 

4

Explanation: 

4, 5, 6 and 7 are the elements with frequency 1; the rest of the elements are repeated, so the number of non-repeated elements is 4.
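As a sketch of the hashing approach (class and method names here are mine), count each element's frequency in a HashMap and then count the entries whose frequency is exactly 1:

```java
import java.util.HashMap;

public class CountNonRepeated {
    // Count elements whose frequency in the array is exactly 1
    static int countNonRepeated(int[] arr) {
        HashMap<Integer, Integer> freq = new HashMap<>();
        for (int x : arr) {
            freq.merge(x, 1, Integer::sum); // increment count, starting at 1
        }
        int count = 0;
        for (int f : freq.values()) {
            if (f == 1) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] arr = {1, 1, 2, 2, 3, 3, 4, 5, 6, 7};
        System.out.println(countNonRepeated(arr)); // 4
    }
}
```

Building the frequency map is O(n), so the whole solution runs in linear time.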

2. Print Non-Repeated Elements 

Hashing is very useful to keep track of the frequency of the elements in a list.

You are given an array of integers. You need to print the non-repeated elements as they appear in the array.

Example 1:

Input:

n = 10

arr[] = {1,1,2,2,3,3,4,5,6,7}

Output: 4 5 6 7

Explanation: 4, 5, 6 and 7 are the only elements with frequency 1 and hence are non-repeating.

3. First Repeating Element 

Given an array arr[] of size n, find the first repeating element. The element should occur more than once, and the index of its first occurrence should be the smallest.

Example 1:

Input:

n = 7

arr[] = {1, 5, 3, 4, 3, 5, 6}

Output: 2

Explanation: 5 appears twice, and its first appearance is at index 2 (1-based), which is smaller than index 3, where 3 first occurs.
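One possible two-pass sketch (names are mine): record frequencies first, then scan for the first element whose frequency exceeds 1, returning its 1-based index:

```java
import java.util.HashMap;

public class FirstRepeating {
    // Return the 1-based index of the first element that occurs more than once,
    // or -1 if every element is distinct
    static int firstRepeating(int[] arr) {
        HashMap<Integer, Integer> freq = new HashMap<>();
        for (int x : arr) freq.merge(x, 1, Integer::sum);
        for (int i = 0; i < arr.length; i++) {
            if (freq.get(arr[i]) > 1) return i + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeating(new int[]{1, 5, 3, 4, 3, 5, 6})); // 2
    }
}
```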

4. Intersection of two arrays 

Given two arrays a[] and b[] respectively of size n and m, the task is to print the count of elements in the intersection (or common elements) of the two arrays.

For this question, the intersection of two arrays can be defined as the set containing distinct common elements between the two arrays. 

Example 1:

Input:

n = 5, m = 3

a[] = {89, 24, 75, 11, 23}

b[] = {89, 2, 4}

Output: 1

Explanation: 89 is the only element in the intersection of the two arrays.
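A hashing sketch (names are mine): put one array into a HashSet and probe it with the other, collecting the distinct common elements:

```java
import java.util.HashSet;

public class IntersectionCount {
    // Count distinct common elements of a[] and b[]
    static int intersectionCount(int[] a, int[] b) {
        HashSet<Integer> setA = new HashSet<>();
        for (int x : a) setA.add(x);
        HashSet<Integer> common = new HashSet<>();
        for (int x : b) {
            if (setA.contains(x)) common.add(x); // distinct by construction
        }
        return common.size();
    }

    public static void main(String[] args) {
        System.out.println(intersectionCount(
                new int[]{89, 24, 75, 11, 23}, new int[]{89, 2, 4})); // 1
    }
}
```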

5. Union of two arrays 

Given two arrays a[] and b[] of sizes n and m respectively, the task is to find the union of these two arrays.

Union of the two arrays can be defined as the set containing distinct elements from both the arrays. If there are repetitions, then only one occurrence of element should be printed in the union.

Example 1:

Input:

5 3

1 2 3 4 5

1 2 3

Output: 

5

Explanation: 1, 2, 3, 4 and 5 are the elements that appear in the union set of both arrays, so the count is 5.

6. Check if two arrays are equal or not 


Given two arrays A and B of equal size N, the task is to find whether the given arrays are equal. Two arrays are said to be equal if both contain the same set of elements; the arrangement (or permutation) of the elements may differ.

Note: If there are repetitions, then the counts of repeated elements must also be the same for the two arrays to be equal.

Example 1:

Input:

N = 5

A[] = {1,2,5,4,0}

B[] = {2,4,5,0,1}

Output: 1

Explanation: Both arrays can be rearranged to {0,1,2,4,5}.

7. Subarray with 0 sum 

Given an array of positive and negative numbers, find if there is a subarray (of size at least one) with sum 0.

Example 1:

Input:

5

4 2 -3 1 6

Output: 

Yes

Explanation: 2, -3, 1 is the subarray with sum 0.
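The standard hashing trick here is prefix sums: a zero-sum subarray exists exactly when a prefix sum repeats (or equals 0, counting the empty prefix). A minimal sketch (names are mine):

```java
import java.util.HashSet;

public class ZeroSumSubarray {
    // A zero-sum subarray exists iff some prefix sum is seen twice
    // (the empty prefix with sum 0 is seeded up front)
    static boolean hasZeroSumSubarray(int[] arr) {
        HashSet<Long> seen = new HashSet<>();
        seen.add(0L); // empty prefix
        long prefix = 0;
        for (int x : arr) {
            prefix += x;
            if (!seen.add(prefix)) return true; // prefix already seen
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(
            hasZeroSumSubarray(new int[]{4, 2, -3, 1, 6}) ? "Yes" : "No"); // Yes
    }
}
```

For the example, the prefix sums are 4, 6, 3, 4; the repeated 4 means the elements between the two occurrences (2, -3, 1) sum to 0.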

8. Winner of an election 

Given an array of names (consisting of lowercase characters) of candidates in an election, where each name in the array represents a vote cast for that candidate, print the name of the candidate that received the maximum number of votes. If there is a tie, print the lexicographically smaller name.

Example 1:

Input:

n = 13

Votes[] = {john, johnny, jackie, johnny, john, jackie, jamie, jamie, john, johnny, jamie, johnny, john}

Output: john 4

Explanation: john has 4 votes cast for him, but so does johnny. john is lexicographically smaller, so we print john and the number of votes he received.
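One way to sketch this (names are mine) is with a TreeMap: it iterates keys in lexicographic order, so taking the first name with the maximum count resolves ties automatically:

```java
import java.util.Map;
import java.util.TreeMap;

public class ElectionWinner {
    // Count votes per name; iterating a TreeMap visits names in sorted
    // order, so on a tie the lexicographically smaller name wins
    static String winner(String[] votes) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String v : votes) counts.merge(v, 1, Integer::sum);
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestVotes) { // strictly greater keeps earlier name on ties
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best + " " + bestVotes;
    }

    public static void main(String[] args) {
        String[] votes = {"john", "johnny", "jackie", "johnny", "john", "jackie",
                          "jamie", "jamie", "john", "johnny", "jamie", "johnny", "john"};
        System.out.println(winner(votes)); // john 4
    }
}
```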

9. Subarray range with given sum 

Given an unsorted array of integers and a sum, the task is to count the number of subarrays that add up to the given sum.

Example 1:

Input:

n = 5

arr[] = {10,2,-2,-20,10}

sum = -10

Output: 3

Explanation: Subarrays with sum -10 are: [10, 2, -2, -20], [2, -2, -20, 10] and [-20, 10].
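A prefix-sum HashMap sketch (names are mine): for each running prefix p, the number of subarrays ending at the current index with the target sum equals the number of earlier prefixes equal to p - sum:

```java
import java.util.HashMap;

public class SubarraySumCount {
    // Count subarrays summing to target using a map of prefix-sum frequencies
    static int countSubarrays(int[] arr, int target) {
        HashMap<Long, Integer> seen = new HashMap<>();
        seen.put(0L, 1); // empty prefix
        long prefix = 0;
        int count = 0;
        for (int x : arr) {
            prefix += x;
            // each earlier prefix equal to (prefix - target) ends one valid subarray here
            count += seen.getOrDefault(prefix - target, 0);
            seen.merge(prefix, 1, Integer::sum);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSubarrays(new int[]{10, 2, -2, -20, 10}, -10)); // 3
    }
}
```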


10. Positive Negative Pair 

Given an array of distinct integers, find all the pairs having both negative and positive values of a number in the array.

Example 1:

Input:

n = 8

arr[] = {1,3,6,-2,-1,-3,2,7}

Output: -1 1 -3 3 -2 2

Explanation: 1, 3 and 2 are present as positive-negative pairs. 6 and 7 have no pair.


11. Zero Sum Subarrays 

You are given an array arr[] of size n. Find the total count of sub-arrays having their sum equal to 0.

Example 1:

Input:

n = 6

arr[] = {0,0,5,5,0,0}

Output: 6

Explanation: The 6 subarrays are [0], [0], [0], [0], [0,0], and [0,0].

12. Subarrays with equal 1s and 0s 

Given an array containing 0s and 1s. Find the number of subarrays having equal number of 0s and 1s.

Example 1:

Input:

n = 7

A[] = {1,0,0,1,0,1,1}

Output: 8

Explanation: The index ranges for the 8 subarrays are: (0, 1), (2, 3), (0, 3), (3, 4), (4, 5), (2, 5), (0, 5), (1, 6).
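This reduces to zero-sum counting: treat each 0 as -1, and then count the subarrays with sum 0 via repeated prefix sums. A sketch (names are mine):

```java
import java.util.HashMap;

public class EqualZeroOneSubarrays {
    // Treat each 0 as -1; subarrays with equal 0s and 1s are exactly
    // the subarrays with sum 0, i.e. pairs of equal prefix sums
    static long countEqualSubarrays(int[] a) {
        HashMap<Integer, Integer> seen = new HashMap<>();
        seen.put(0, 1); // empty prefix
        int prefix = 0;
        long count = 0;
        for (int x : a) {
            prefix += (x == 1) ? 1 : -1;
            int prev = seen.getOrDefault(prefix, 0);
            count += prev;          // each earlier equal prefix ends one subarray here
            seen.put(prefix, prev + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEqualSubarrays(new int[]{1, 0, 0, 1, 0, 1, 1})); // 8
    }
}
```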

13. Sort an array according to the other 


Given two integer arrays A1[ ] and A2[ ] of sizes N and M respectively, sort the first array A1[ ] so that the relative order of its elements matches the order of the elements in the second array A2[ ].

See example for better understanding.

Note: If elements are repeated in the second array, consider their first occurrence only.

Example 1:


Input:

N = 11 

M = 4

A1[] = {2, 1, 2, 5, 7, 1, 9, 3, 6, 8, 8}

A2[] = {2, 1, 8, 3}

Output: 

2 2 1 1 8 8 3 5 6 7 9

Explanation: The elements of A1[] are sorted according to A2[]: first come the 2s, then the 1s, then the 8s, and finally the 3s; the remaining elements are appended in sorted order.


14. Sorting Elements of an Array by Frequency 


Given an array of integers, sort the array according to the frequency of its elements, so that elements with higher frequency come first. If the frequencies of two elements are the same, the smaller number comes first.

Example 1:

Input:

N = 5

A[] = {5,5,4,6,4}

Output: 4 4 5 5 6

Explanation: The highest frequency here is 2. Both 5 and 4 have that frequency. Since the frequencies are the same, the smaller element comes first: 4 4 comes first, then 5 5, and finally 6. The output is 4 4 5 5 6.
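A sketch using a frequency map plus a custom comparator (class and method names are mine):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SortByFrequency {
    // Sort by frequency (descending); ties broken by smaller value first
    static int[] sortByFrequency(int[] arr) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x : arr) freq.merge(x, 1, Integer::sum);
        Integer[] boxed = Arrays.stream(arr).boxed().toArray(Integer[]::new);
        Arrays.sort(boxed, (a, b) -> {
            int byFreq = freq.get(b) - freq.get(a); // higher frequency first
            return byFreq != 0 ? byFreq : a - b;    // then smaller value first
        });
        return Arrays.stream(boxed).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortByFrequency(new int[]{5, 5, 4, 6, 4})));
        // [4, 4, 5, 5, 6]
    }
}
```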

Note: If you are able to solve these problems, consider yourself well prepared for hashing questions.


Wednesday, June 29, 2022

Implementing Hashing in Java || Kotlin


Java/Kotlin provides many built-in classes and interfaces to implement hashing easily, that is, without writing your own hash table or hash function. Java/Kotlin mainly provides us with the following classes to implement hashing:

1. Hashtable (a synchronized implementation of hashing): This class implements a hash table, which maps keys to values. Any non-null object can be used as a key or as a value.

// Java program to demonstrate working of HashTable

import java.util.*;

class AbhiSolution {

    public static void main(String args[])

    {

        // Create a HashTable to store 

        // String values corresponding to integer keys

        Hashtable<Integer, String>

            hm = new Hashtable<Integer, String>();

        // Input the values

        hm.put(1, "Abhinav");

        hm.put(12, "Geek");

        hm.put(15, "A computer");

        hm.put(3, "Mobile");

        // Printing the Hashtable

        System.out.println(hm);

    }

}

// Kotlin program to demonstrate working of HashTable

import java.util.*

fun main(args: Array<String>) {

 val hm: Hashtable<Int, String> = Hashtable<Int, String>()

 hm[1] = "Abhinav"

 hm[12] = "Geek"

 hm[15] = "A computer"

 hm[3] = "Mobile"

 println(hm)

}

Output: {15=A computer, 3=Mobile, 12=Geek, 1=Abhinav}

2. HashMap (a non-synchronized, faster implementation of hashing): HashMap is similar to Hashtable in Java/Kotlin, but it is faster in comparison as it is not synchronized. HashMap is used to store key-value pairs, i.e., to map a given value to a given key. A typical application of HashMap is counting the frequencies of elements present in an array or a list.

import java.util.HashMap;

public class Main {

    public static void main(String[] args) {

        int arr[] = { 10, 34, 5, 10, 3, 5, 10 };

        createHashMap(arr);

    }

    // Function to create HashMap from array

    static void createHashMap(int arr[])

    {

        // Creates an empty HashMap

        HashMap<Integer, Integer> hmap = new HashMap<Integer, Integer>();

        // Traverse through the given array

        for (int i = 0; i < arr.length; i++) {

            // Get the current count of the element, if any

            Integer c = hmap.get(arr[i]);

            // If this is the first occurrence of the element,

            // insert it with count 1

            if (c == null) {

                hmap.put(arr[i], 1);

            }

            // If the element already exists in the hash map,

            // increment its count by 1

            else {

                hmap.put(arr[i], ++c);

            }

        }

        // Print HashMap

        System.out.println(hmap);

    }

}

3. LinkedHashMap (similar to HashMap, but preserves the insertion order of elements):

import java.util.LinkedHashMap;

public class Main {

    public static void main(String[] args) {

        LinkedHashMap<String, String> lhm =

                new LinkedHashMap<String, String>();

        lhm.put("one", "androidcodingworld.blog.com");

        lhm.put("two", "androidcode.blog.com");

        lhm.put("four", "codingworld.blog.com");

        // It prints the elements in same order

        // as they were inserted

        System.out.println(lhm);

        System.out.println("Getting value for key 'one': "

                + lhm.get("one"));

        System.out.println("Size of the map: " + lhm.size());

        System.out.println("Is map empty? " + lhm.isEmpty());

        System.out.println("Contains key 'two'? " +

                lhm.containsKey("two"));

        // Checking for a value that is not present

        System.out.println("Contains value  "

                + lhm.containsValue("  "));

        System.out.println("delete element 'one': " +

                lhm.remove("one"));

        System.out.println(lhm);

    }

}

Output :

{one=androidcodingworld.blog.com, two=androidcode.blog.com, four=codingworld.blog.com}

Getting value for key 'one': androidcodingworld.blog.com

Size of the map: 3

Is map empty? false

Contains key 'two'? true

Contains value  false

delete element 'one': androidcodingworld.blog.com

{two=androidcode.blog.com, four=codingworld.blog.com}


4. HashSet (similar to HashMap, but maintains only keys, not key-value pairs): The HashSet class implements the Set interface, backed by a hash table which is actually a HashMap instance. The class offers constant-time performance for the basic operations like add, remove, contains, and size, assuming that the hash function disperses the elements properly among the buckets. HashSet is generally used to check whether an element is present in a list or not.

import java.util.HashSet;

public class Main {

    public static void main(String[] args) {

        HashSet<String> h = new HashSet<String>();

        // Adding elements into HashSet using add()

        h.add("India");

        h.add("Australia");

        h.add("South Africa");

        h.add("India"); // adding duplicate elements

        // Displaying the HashSet

        System.out.println(h);

        // Checking if India is present or not

        System.out.println("\nHashSet contains India or not:"

                + h.contains("India"));

        // Removing items from HashSet using remove()

        h.remove("Australia");

        // Printing the HashSet

        System.out.println("\nList after removing Australia:" + h);

        // Iterating over hash set items

        System.out.println("\nIterating over list:");

        for (String s : h) System.out.println(s);

    }

}

Output :

[South Africa, Australia, India]

HashSet contains India or not:true

List after removing Australia:[South Africa, India]

Iterating over list:

South Africa

India


5. LinkedHashSet (similar to LinkedHashMap, but maintains only keys, not key-value pairs):

import java.util.LinkedHashSet;

public class Main {

    public static void main(String[] args) {

        LinkedHashSet<String> linkedset =

                new LinkedHashSet<String>();


        // Adding element to LinkedHashSet   

        linkedset.add("A");

        linkedset.add("B");

        linkedset.add("C");

        linkedset.add("D");

        // This will not add new element as A already exists

        linkedset.add("A");

        linkedset.add("E");

        System.out.println("Size of LinkedHashSet = " +

                linkedset.size());

        System.out.println("Original LinkedHashSet:" + linkedset);

        System.out.println("Removing D from LinkedHashSet: " +

                linkedset.remove("D"));

        System.out.println("Trying to Remove Z which is not "+

                "present: " + linkedset.remove("Z"));

        System.out.println("Checking if A is present=" +

                linkedset.contains("A"));

        System.out.println("Updated LinkedHashSet: " + linkedset);

    }

}

Output:

Size of LinkedHashSet = 5

Original LinkedHashSet:[A, B, C, D, E]

Removing D from LinkedHashSet: true

Trying to Remove Z which is not present: false

Checking if A is present=true

Updated LinkedHashSet: [A, B, C, E]


6. TreeSet (implements the SortedSet interface; objects are stored in sorted, ascending order):

import java.util.Iterator;

import java.util.TreeSet;

public class Main {

    public static void main(String[] args) {

        TreeSet<String> ts1 = new TreeSet<String>();

        // Elements are added using add() method

        ts1.add("A");

        ts1.add("B");

        ts1.add("C");

        // Duplicates will not get inserted

        ts1.add("C");

        // Elements get stored in default natural

        // Sorting Order(Ascending)

        System.out.println("TreeSet: " + ts1);

        // Checking if A is present or not

        System.out.println("\nTreeSet contains A or not:"

                + ts1.contains("A"));

        // Removing items from TreeSet using remove()

        ts1.remove("A");

        // Printing the TreeSet

        System.out.println("\nTreeSet after removing A:" + ts1);

        // Iterating over TreeSet items

        System.out.println("\nIterating over TreeSet:");

        Iterator<String> i = ts1.iterator();

        while (i.hasNext())

            System.out.println(i.next());

    }

}

Output:

TreeSet: [A, B, C]

TreeSet contains A or not:true

TreeSet after removing A:[B, C]

Iterating over TreeSet:

B

C


Tuesday, June 28, 2022

Hashing - Peeling Concept and Knowledge for Interview

 Hashing is a method of storing and retrieving data from a database efficiently.


Suppose that we want to design a system for storing employee records keyed using phone numbers. And we want the following queries to be performed efficiently:
  1. Insert a phone number and the corresponding information.
  2. Search a phone number and fetch the information.
  3. Delete a phone number and the related information.
We can think of using the following data structures to maintain information about different phone numbers.

  1. An array of phone numbers and records.
  2. A linked list of phone numbers and records.
  3. A balanced binary search tree with phone numbers as keys.
  4. A direct access table.
For arrays and linked lists, we need to search in a linear fashion, which can be costly in practice. If we use arrays and keep the data sorted, then a phone number can be searched in O(Logn) time using Binary Search, but insert and delete operations become costly as we have to maintain sorted order.

With a balanced binary search tree, we get a moderate search, insert and delete time. All of these operations can be guaranteed to be in O(Logn) time.

Another solution that one can think of is to use a direct access table where we make a big array and use phone numbers as indexes in the array. An entry in the array is NIL if the phone number is not present, else the array entry stores pointer to records corresponding to the phone number. Time complexity wise this solution is the best of all, we can do all operations in O(1) time. For example, to insert a phone number, we create a record with details of the given phone number, use the phone number as an index and store the pointer to the record created in the table.
This solution has many practical limitations. The first problem is that the extra space required is huge: if the phone number has n digits, we need O(m * 10^n) space for the table, where m is the size of a pointer to a record. Another problem is that an integer type in a programming language may not be able to store all n digits.

Due to the above limitations, the Direct Access Table cannot always be used. Hashing is the solution that can be used in almost all such situations and performs extremely well as compared to above data structures like Array, Linked List, Balanced BST in practice. With hashing, we get O(1) search time on average (under reasonable assumptions) and O(n) in the worst case.

Hashing is an improvement over Direct Access Table. The idea is to use a hash function that converts a given phone number or any other key to a smaller number and uses the small number as an index in a table called a hash table.

Hash Function: A function that converts a given big phone number to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a big number or string to a small integer that can be used as an index in the hash table.

A good hash function should have following properties:

  1. It should be efficiently computable.
  2. It should uniformly distribute the keys (Each table position be equally likely for each key).

For example, for phone numbers, a bad hash function is to take the first three digits. A better function will consider the last three digits. Please note that this may not be the best hash function. 

There may be better ways.

Hash Table: An array that stores pointers to records corresponding to a given phone number. An entry in hash table is NIL if no existing phone number has hash function value equal to the index for the entry.

Collision Handling: Since a hash function gets us a small number for a big key, there is a possibility that two keys result in the same value. The situation where a newly inserted key maps to an already occupied slot in the hash table is called collision and must be handled using some collision handling technique. 

Following are the ways to handle collisions:

  • Chaining: The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. Chaining is simple, but it requires additional memory outside the table.

  • Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we one by one examine the table slots until the desired element is found or it is clear that the element is not present in the table.

Open Addressing: Like separate chaining, open addressing is a method for handling collisions. In Open Addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (Note that we can increase table size by copying old data if needed).

Important Operations:

  • Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
  • Search(k): Keep probing until the slot's key doesn't become equal to k or an empty slot is reached.
  • Delete(k): Delete operation is interesting. If we simply delete a key, then the search may fail. So slots of the deleted keys are marked specially as "deleted".
Insert can insert an item in a deleted slot, but the search doesn't stop at a deleted slot.

Open Addressing is done in the following ways:

  1. Linear Probing: In linear probing, we linearly probe for the next slot. For example, the typical gap between the two probes is 1 as taken in the below example also.
    let hash(x) be the slot index computed using a hash function and S be the table size.

    If slot hash(x) % S is full, then we try (hash(x) + 1) % S
    If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
    If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
    ..................................................

Let us consider a simple hash function as “key mod 7” and a sequence of keys as 50, 700, 76, 85, 92, 73, 101.
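Tracing these keys through a size-7 table can be sketched with a minimal linear-probing implementation (class and method names are mine; deletion uses the tombstone marking described above, and the tombstone value is assumed never to be used as a real key):

```java
// A minimal open-addressing hash table with linear probing; supports
// insert, search, and lazy deletion via a tombstone marker.
public class LinearProbingTable {
    private static final Integer DELETED = Integer.MIN_VALUE; // tombstone (assumed unused as a key)
    private final Integer[] slots;
    private final int size;

    LinearProbingTable(int size) {
        this.size = size;
        this.slots = new Integer[size];
    }

    private int hash(int key) {
        return ((key % size) + size) % size; // non-negative slot index
    }

    // Keep probing until an empty or deleted slot is found
    boolean insert(int key) {
        int start = hash(key);
        for (int step = 0; step < size; step++) {
            int slot = (start + step) % size;
            if (slots[slot] == null || slots[slot].equals(DELETED)) {
                slots[slot] = key;
                return true;
            }
        }
        return false; // table full
    }

    // Keep probing until the key or an empty slot; do NOT stop at tombstones
    boolean search(int key) {
        int start = hash(key);
        for (int step = 0; step < size; step++) {
            int slot = (start + step) % size;
            if (slots[slot] == null) return false;
            if (slots[slot].equals(key)) return true;
        }
        return false;
    }

    // Mark the slot as deleted instead of emptying it,
    // so later searches keep probing past it
    boolean delete(int key) {
        int start = hash(key);
        for (int step = 0; step < size; step++) {
            int slot = (start + step) % size;
            if (slots[slot] == null) return false;
            if (slots[slot].equals(key)) {
                slots[slot] = DELETED;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(7);
        int[] keys = {50, 700, 76, 85, 92, 73, 101};
        for (int k : keys) t.insert(k);
        System.out.println(t.search(85)); // true
        t.delete(85);
        System.out.println(t.search(92)); // still found past the tombstone
    }
}
```

With "key mod 7", 50 and 700 land in slots 1 and 0; 85 collides with 50 and probes to slot 2, 92 probes to slot 3, and so on, illustrating how a cluster forms around slot 1.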


Clustering: The main problem with linear probing is clustering, many consecutive elements form groups and it starts taking time to find a free slot or to search an element.

  2. Quadratic Probing: We look for the i^2'th slot in the i'th iteration.
    let hash(x) be the slot index computed using hash function.
    If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
    If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
    If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
    ..................................................
    ..................................................
  3. Double Hashing: We use another hash function hash2(x) and look for the i*hash2(x)'th slot in the i'th iteration.
    let hash(x) be the slot index computed using hash function.
    If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
    If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
    If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
    ..................................................
    ..................................................
Comparison of above three:

  • Linear probing has the best cache performance, but it suffers from clustering. Another advantage of linear probing is that it is easy to compute.
  • Quadratic probing lies between the two in terms of cache performance and clustering.
  • Double hashing has poor cache performance but no clustering. Double hashing requires more computation time as two hash functions need to be computed.
Separate Chaining vs. Open Addressing:
  1. Chaining is simpler to implement; open addressing requires more computation.
  2. In chaining, the hash table never fills up; we can always add more elements to a chain. In open addressing, the table may become full.
  3. Chaining is less sensitive to the hash function or load factor; open addressing requires extra care to avoid clustering and high load factors.
  4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted; open addressing is used when the frequency and number of keys are known.
  5. Cache performance of chaining is not good, as keys are stored in linked lists; open addressing provides better cache performance, as everything is stored in the same table.
  6. Chaining can waste space (some slots of the hash table are never used); in open addressing, a slot can be used even if no input maps to it.
  7. Chaining uses extra space for links; open addressing needs no links.

Performance of Open Addressing: Like Chaining, the performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing).

 m = Number of slots in the hash table
n = Number of keys to be inserted in the hash table

Load factor α = n/m ( < 1 )

Expected time to search/insert/delete < 1/(1 - α)

So search, insert, and delete take O(1/(1 - α)) expected time.

The implementation will come in an upcoming blog post.


TOP 10 CLOUD PROGRAMMING LANGUAGES TO LOOK OUT FOR IN 2022

Organizations these days don't think twice before relying on programming languages, for they know how much easier it has become for them to achieve their goals. Speaking of programming, cloud programming languages have started to gain huge popularity. However, it is always important to have a fair understanding of which among them are worth it. In this article, I will talk about the top 10 cloud programming languages to look out for in 2022.

Python

Python has gained wide popularity as a server-side language that caters to a wide range of applications. From simple scripting to advanced web applications, this cloud programming language has you covered. Python enables developers to use a variety of programming styles, including reflective and functional programming. In addition to all of this, Python is considered one of the easiest and most marketable programming languages to learn.

JavaScript

For organizations to create dynamic web elements such as animated graphics, interactive maps, etc., there cannot be a better programming language to rely on than JavaScript. This programming language has wide applications in the area of web development, building web servers, game development, etc.

Golang (Go)

Go, as known to many, is a cloud programming language developed by Google. Its ability to handle multicore and networked systems and massive codebases is the very reason for its growing popularity. This has led huge companies such as Google, Uber, Twitch, and Dropbox to rely on it for APIs and web applications.

Java

When it comes to web development, application development, or big data, Java needs no special mention. It is a general-purpose programming language with an object-oriented structure that is owned by Oracle Corporation. Though it is a somewhat complex language, it is extensively used.

C#

C# has gained wide recognition for all the right reasons: it supports the concepts of object-oriented programming. C# is considered an ideal cloud programming language for applications on Windows, Android, and iOS.

R

R, yet another remarkable programming language, is used for statistical processing, including linear and nonlinear modeling, calculation, testing, visualization, and analysis. If you are good at mathematics, R is just the language for you.

C++

C++ is a well-known cross-platform cloud programming language that boasts of a bag full of features such as data abstraction, polymorphism, inheritance, etc. Be it desktop application development, GUI application development, 3D game development, or building real-time mathematical solutions, C++ has got you covered.

Kotlin

Kotlin is an open-source programming language that companies such as Netflix, Pinterest, and Amazon Web Services rely on heavily, thanks to features such as support for lambda functions, smart casts, null safety, and operator overloading.

Ruby

Ruby has made its way into the top 10 cloud programming languages to look out for in 2022 because it has become extremely popular among web developers. Ruby has an easy-to-read, easy-to-write syntax.

Disclaimer: The information provided in this article is solely my opinion; it is provided for educational purposes only.

Monday, June 27, 2022

Hard things in Computer Science and in Software Development




If you have more than a couple of years of experience in IT, you have probably stumbled upon the following quote:

There are only two hard things in computer science: cache invalidation and naming things.

— Phil Karlton

Then, because it's such a great quote, it evolved:

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

However, I think that the initial quote is misleading. A lot of things are hard in computer science. This post aims to describe some of them.

Cache invalidation

I had a few use cases for caching in my professional career. When I did, it was mostly to cache Hibernate entities. Only once did I implement my own cache: I used a simple HashMap, as it was for a batch job and the cache size was small, so no invalidation was needed.

Most caches follow the Cache-Aside strategy: the application first tries to load data from the cache. If the cache has the data, it returns it; if not, the application reads from the source of truth, in general a database.

While data sits in the cache, it may change in the database. In short, the probability that the data in the cache and in the source of truth diverge increases as time passes. For this reason, we sometimes need to re-synchronize with the source of truth. This is achieved by evicting data from the cache: cache invalidation. The most common invalidation mechanism is a time-to-live, or TTL.

The TTL specifies how long an entry is valid. After the time has elapsed, the cache removes it, and the next read will load data from the source of truth.
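As a sketch of Cache-Aside with a TTL (all class and method names here are mine, and the loader function stands in for the source of truth): each read returns a still-valid cached entry directly, while a missing or expired entry triggers a reload and is re-cached with a fresh expiry:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A minimal Cache-Aside sketch with a per-entry TTL; the "loader"
// stands in for the database (the source of truth).
public class TtlCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> store = new HashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader; // reads from the source of truth

    TtlCache(long ttlMillis, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    V get(K key) {
        Entry<V> e = store.get(key);
        long now = System.currentTimeMillis();
        if (e != null && now < e.expiresAt) {
            return e.value;                      // cache hit, entry still valid
        }
        V fresh = loader.apply(key);             // miss or expired: reload
        store.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }

    public static void main(String[] args) {
        TtlCache<Integer, String> cache = new TtlCache<>(60_000, k -> "record-" + k);
        System.out.println(cache.get(42)); // loads from the "database"
        System.out.println(cache.get(42)); // served from the cache within the TTL
    }
}
```

Note that between the load and the expiry, the entry can silently diverge from the source of truth, which is exactly the trade-off discussed next.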

The tricky thing, in this case, is to choose the correct TTL:

If the reference data in the source of truth changes before invalidation, clients read stale data

If the invalidation happens before the change, an unnecessary reload happens.

In essence, the smaller the TTL, the lower the chance of reading stale data, but the less useful the cache becomes.

Naming things

If you have any experience as a developer, you probably are convinced that naming things is challenging indeed. If not, let’s introduce another quote:

Programs are meant to be read by humans and only incidentally for computers to execute

— Donald Knuth

Here’s another one, slightly more inflammatory:

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

— Martin Fowler

For example, here's some code:

data class Foo(val bar: String)

class Baz : IllegalArgumentException()

data class Qux(val quux: Double, val foo: Foo) {
    operator fun plus(qux: Qux) =
        if (foo != qux.foo) throw Baz()
        else Qux(quux + qux.quux, foo)
}


Thanks to the use of well-known APIs, namely IllegalArgumentException and plus(), you may infer roughly what the code does. However, correctly renaming the classes and fields reveals our intent:

data class Currency(val symbol: String)

class MismatchCurrencyException : IllegalArgumentException()

data class Amount(val number: Double, val currency: Currency) {
    operator fun plus(amount: Amount) =
        if (currency != amount.currency) throw MismatchCurrencyException()
        else Amount(number + amount.number, currency)
}

When I worked on projects, I always cautioned about the following issues when talking with the business:

Using different words to cover the same reality

Using the same word to cover different realities

The second is much worse, as you may think you’re talking about the same thing with the business or fellow developers, but you’re not. If the talk ends in agreement, but each party has a different reality in their head, it’s a recipe for a future disaster.

Within the scope of verbal communication, it’s possible to ask questions about the meaning of such and such words. In code, it’s not! Hence, naming classes and variables must convey the exact meaning.

It’s hard because you need to be precise without being verbose.

Dates, times, and timezones

I already wrote about Date and time gotchas. To sum up:

The move from the Julian calendar to the Gregorian one happened at different dates depending on the country involved

Some countries' time zones are not entirely congruent with their geographical location

Some countries implement DST, and some do not. Even worse, some did and no longer do.

Countries sometimes change their timezones. While it's rare, it happens more often than most think.

Not all timezone offsets are whole hours. For example, India is UTC+5:30, and three timezones even have 45-minute offsets.
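On the JVM, these quirks are captured by the IANA time zone database that java.time ships with. A minimal Kotlin sketch showing non-whole-hour offsets:

```kotlin
import java.time.Instant
import java.time.ZoneId
import java.time.ZonedDateTime

fun main() {
    val instant = Instant.parse("2022-06-30T12:00:00Z")
    // India's offset is 5 hours 30 minutes...
    val kolkata = ZonedDateTime.ofInstant(instant, ZoneId.of("Asia/Kolkata"))
    println(kolkata.offset) // +05:30
    // ...and Nepal's is 5 hours 45 minutes: offsets are not whole hours
    val kathmandu = ZonedDateTime.ofInstant(instant, ZoneId.of("Asia/Kathmandu"))
    println(kathmandu.offset) // +05:45
}
```

Whenever possible, store instants in UTC and apply a named zone such as Asia/Kathmandu only at display time, so the database, not your code, owns these rules.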

Estimates

Estimates in software development projects are so hard to get right that some people working on the development side started the "No Estimate" movement. I won’t delve into the pros and cons of not doing estimates; my point is just that estimating a non-trivial software project is challenging.

I think that entire books have been written on the reasons why estimates are hard and how to improve them. Let’s summarize the reasons:

Not only software projects

Many who are involved in software projects but do not know how they work are eager to compare the activity with house building. Unfortunately, "house" can encompass different levels of industrialization. What these people refer to is an off-the-shelf building, which involves little or no customization. Software development projects sit at the opposite end of the scale, akin to unique architectural masterpieces. For a more detailed explanation, see the On house building and software development projects post.

Problems along the way

Estimates are provided before the start of the project, with all parameters available at the time. Unfortunately, while risk management accounts for the known knowns and the known unknowns, it’s impossible to forecast the unknown unknowns. With the highly-mutable state of software projects, it’s practically a given that something unexpected happens, invalidating our previous estimates.

The nature of estimates

By definition, an estimate is a guess. The problem is that most people treat estimates as deadlines. At that point, the organization focuses on meeting the deadline regardless of the trade-offs - a recipe for failure.

Distributed systems

There’s only so much one can do with a single computer, even a multicore one. Adding more resources to a computer rapidly hits the point of diminishing returns. At this point, you can do nothing but distribute the load across several computers. Welcome to the realm of distributed systems!

The problem with distributed systems is that they are easy to get wrong. Here is the classic list of fallacies of distributed computing:

The network is reliable

Latency is zero

Bandwidth is infinite

The network is secure

Topology doesn’t change

There is one administrator

Transport cost is zero

The network is homogeneous

Tons of books and theses have been written on distributed systems. People who can get them right deserve all my respect.

In my career, I stumbled upon two distributed systems problems:

Dual writes

Leader election

Dual writes

Imagine a system with two distributed data stores. We require that they contain the same data.

It qualifies as the dual write problem. In a single database, we can solve it with database transactions. In two different databases, two-phase commits are available, even if they might not be reliable. If the stores that you need to write to cannot be enrolled in 2PC, as in microservices architectures, you need to rely on compensating transactions. Compensating transactions are fragile and complex to implement correctly, if at all.
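As a rough illustration of a compensating transaction (not a production pattern - real stores fail in more ways than this), here is a Kotlin sketch with two in-memory maps standing in for the stores; all names are made up:

```kotlin
class StoreUnavailableException : RuntimeException()

// An in-memory stand-in for a remote data store
class Store(private val failOnWrite: Boolean = false) {
    val data = mutableMapOf<String, String>()
    fun write(key: String, value: String) {
        if (failOnWrite) throw StoreUnavailableException()
        data[key] = value
    }
    fun delete(key: String) { data.remove(key) }
}

// Write to both stores; if the second write fails, compensate by
// undoing the first so the stores stay consistent
fun dualWrite(a: Store, b: Store, key: String, value: String): Boolean {
    a.write(key, value)
    return try {
        b.write(key, value)
        true
    } catch (e: StoreUnavailableException) {
        a.delete(key) // the compensating transaction
        false
    }
}
```

The fragility shows immediately: if the process crashes between the failed write and the compensation, or if the compensation itself fails, the stores diverge anyway.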

From a theoretical point of view, the CAP theorem teaches us that we can choose only two out of three capabilities: Consistency, Availability, and Partition tolerance. In reality, the choice is no choice at all. Because the system is distributed, we must choose P; because modern requirements forbid us to block, we must choose A. The trade-off is consistency: both stores will only converge to the same state over time.

While working on this problem, I discovered Change-Data-Capture. The idea behind CDC is to send updates to a single store and stream the diffs of the new state to the other one. Implementing it yourself is not trivial; I’d recommend using an existing product. I’ve used Debezium successfully in the past for my demos.
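The CDC idea can be caricatured in a few lines of Kotlin: all writes go to the primary, which streams change events that followers replay in order. Debezium does this for real by tailing the database's transaction log; everything below is illustrative.

```kotlin
// Change events are the diffs CDC streams out of the primary's log
sealed interface ChangeEvent {
    data class Upsert(val key: String, val value: String) : ChangeEvent
    data class Delete(val key: String) : ChangeEvent
}

// A follower converges by replaying events in order
class Follower {
    val state = mutableMapOf<String, String>()
    fun replay(event: ChangeEvent) {
        when (event) {
            is ChangeEvent.Upsert -> state[event.key] = event.value
            is ChangeEvent.Delete -> state.remove(event.key)
        }
    }
}

// Writes go to a single primary, which publishes the diffs
class Primary(private val followers: List<Follower>) {
    val state = mutableMapOf<String, String>()
    private fun publish(event: ChangeEvent) = followers.forEach { it.replay(event) }
    fun upsert(key: String, value: String) {
        state[key] = value
        publish(ChangeEvent.Upsert(key, value))
    }
    fun delete(key: String) {
        state.remove(key)
        publish(ChangeEvent.Delete(key))
    }
}
```

Because the follower only ever sees an ordered stream of diffs, it lags the primary but never conflicts with it - exactly the eventual-consistency trade-off described above.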

Leader election

Distributed systems rely on multiple nodes, and coordination across them is mandatory. Some rely on a specific node referred to as the leader to manage other nodes, while others are leaderless.

Most modern implementations are leader-based: leaderless implementations seem to be less reliable, though, truth be told, I cannot explain why given my current understanding at the moment of this writing. Such implementations require all nodes to agree on which node is the leader among them - consensus. When a network partition happens, some nodes cannot communicate with others. Reaching consensus within a partition is relatively easy; it’s more challenging when the network reunites and a single leader must be selected among all previous ones.

It’s why the Paxos algorithm, or rather the Paxos family of algorithms, was invented. However, experts seem to think that Paxos is error-prone to implement: the Raft algorithm is an attractive, easier-to-implement alternative. In any case, easier doesn’t mean easy.
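The safety core both algorithm families share is the quorum rule: a candidate wins an election only with votes from a strict majority, and two disjoint majorities of the same cluster cannot exist. A Kotlin sketch of just that rule (nothing like a full Raft implementation):

```kotlin
// A strict majority of the cluster. Two disjoint quorums are impossible,
// so at most one leader can win a given election term.
fun quorum(clusterSize: Int): Int = clusterSize / 2 + 1

fun winsElection(votes: Int, clusterSize: Int): Boolean =
    votes >= quorum(clusterSize)

fun main() {
    // A 5-node cluster partitioned 3/2: only the majority side can elect
    println(winsElection(3, 5)) // true
    println(winsElection(2, 5)) // false
}
```

This is also why such clusters use odd node counts: a 4-node cluster needs 3 votes, so it tolerates no more failures than a 3-node one.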

Proving code is bug-free

Traditional software engineering mandates testing to avoid bugs. Unfortunately, whatever approach you favor - unit, integration, end-to-end, or a mix of the three - doesn’t guarantee your code has no bugs. It’s, in fact, quite common to find bugs in production despite the infamous 100% code coverage. The only reliable way to have bug-free code is to prove it correct. That requires solid mathematical foundations and a programming language that allows formal proofs.
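To make the coverage point concrete, here is a Kotlin function whose two tests achieve 100% branch coverage yet miss a real bug; the illustration is mine, not from any particular project:

```kotlin
// Looks correct, and both branches are exercised by the checks below
fun abs(x: Int): Int = if (x < 0) -x else x

fun main() {
    check(abs(-3) == 3) // covers the negative branch
    check(abs(3) == 3)  // covers the non-negative branch
    // 100% coverage, yet the bug remains: negating Int.MIN_VALUE
    // overflows in two's complement, so abs(Int.MIN_VALUE) is negative
    println(abs(Int.MIN_VALUE)) // -2147483648
}
```

Coverage measures which lines ran, not whether the code is correct for all inputs; a proof would have to establish the property abs(x) >= 0 for every x, which this function simply does not have.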

A couple of such languages exist. Unfortunately, all of them still belong to academia; the ones I’ve heard of are Coq, Idris, and Isabelle.

Until any of them makes it into the mainstream industry, writing bug-free code will remain one of the hard things in computer science.

Summary

Claiming there are only two hard things in computer science is a strong statement. In this post, I tried to list several hard things I’ve been exposed to during my career. I believe there are many others: I’d be interested in the ones you’ve encountered.



TOP 10 OUTDATED PROGRAMMING LANGUAGES TO FORGET IN 2022

Programming languages that take the coding world by storm are introduced only once in a few years. Meanwhile, quite many coding languages developed on the lines of these languages find their way into a programmer’s tech stack for a short term and then disappear. At times mainstream programming languages have to face a similar fate when equally competent programming languages, with better features, are released into the market. Here are the top 10 outdated programming languages which are worth forgetting.


1. VB.NET:
Visual Basic .NET, a language created by Microsoft, has a syntax similar to BASIC and a coding style similar to that of C#. With developers actively adopting modern .NET languages like C#, it might see itself shelved very soon.

2. Elm:
It was developed as a domain-specific language for declaratively creating browser-based graphical user interfaces. It is considered a dying language, as it has been dormant for the past two years with no significant updates.

3. CoffeeScript:
Though known for its prettier syntax and efficient destructuring, it is quite notorious for the ambiguity it adds to programs. CoffeeScript lacks explicit scoping and real functions, a primary reason for its decline in the popularity charts.

4. Haskell:
A culmination of less popular languages like Miranda and Clean, it has been the language that firms and projects like Facebook and GitHub resorted to. Of late, it has been slipping in the RedMonk rankings despite having major updates in recent years.

5. Erlang:
A general-purpose language adopted largely by telecommunication systems, it has not performed well in the job market. Though it maintained a constant presence last year, its community engagement was lackluster.

6. Pascal:
A direct descendant of ALGOL 60, it once ruled as an imperative and procedural programming language. Now that Delphi has taken over that role, Pascal is being pushed into the corner.

7. COBOL:
COBOL was created primarily for business applications, but it had to give way to the more comfortable static typing of Java and the dynamic typing of Python. Because of its rigid typing rules and the difficulty of parsing it, companies are opting for other languages.

8. ColdFusion:
Offered by Adobe, it provides several advanced features for web and mobile application development, and it comes in different editions that developers and enterprises can use according to their needs. But due to poor debugging support and the lack of a package manager, it is gradually losing its popularity.

9. Objective-C:
Once the go-to language for developing Apple’s applications, it is quickly being replaced by Swift. More developers are choosing Swift over Objective-C because Swift is feature-rich. However, Objective-C has not fallen as low as one might expect, thanks to its legacy code.

10. Perl:
It is the collective name for a family of high-level, general-purpose, interpreted, dynamic programming languages. Though its main drawback is heavy CPU usage, it is being rendered redundant mostly because programmers are being drawn to more modern languages.