<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Kornesh Kanan]]></title><description><![CDATA[Yolo]]></description><link>https://kornesh.com/blog/</link><image><url>http://kornesh.com/blog/favicon.png</url><title>Kornesh Kanan</title><link>https://kornesh.com/blog/</link></image><generator>Ghost 3.35</generator><lastBuildDate>Tue, 14 Apr 2026 21:34:41 GMT</lastBuildDate><atom:link href="https://kornesh.com/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[SIREN: Implicit Neural Representations with Sine]]></title><description><![CDATA[<p>Machine learning is just a function that takes an encoded representation of an input $x$ and maps it to an encoded representation of an output $y$. Does the way we choose to encode the input have an impact on the model? Can we represent an image better than a grid of pixels with 3 channels</p>]]></description><link>https://kornesh.com/blog/siren-implicit-neural-representations-with-sine/</link><guid isPermaLink="false">6120bc01357e1a0083b462c7</guid><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Fri, 26 Jun 2020 18:27:00 GMT</pubDate><content:encoded><![CDATA[<p>Machine learning is just a function that takes an encoded representation of an input $x$ and maps it to an encoded representation of an output $y$. Does the way we choose to encode the input have an impact on the model? Can we represent an image better than a grid of pixels with 3 channels for RGB? Can we represent an audio clip better than a spectrogram? </p><p>Imagine we hadn't chosen to encode images as grids of pixels: would we be using CNNs today? I think this is a very intriguing question for understanding how much the method of encoding has shaped the algorithm. 
If you had a different representation, would you come up with different algorithms? To me, it's not at all obvious that one particular representation is especially good or bad. Maybe there are better ones.</p><p>Perhaps we can train a neural network to represent them better? That's the key idea behind the recent, mind-blowing paper <a href="https://vsitzmann.github.io/siren/">SIREN</a>: they encode an image in the weights of a single neural network. Think of overfitting an entire neural network by training it with a single image. The network has to learn a compact representation of just that one image. If neural networks are capable of learning to differentiate between a thousand image classes in ImageNet, surely they can overfit a single image, right?</p><p>In this paper they use a very simple MLP, but the conventional non-linear activation functions like ReLU and tanh are replaced with the periodic sine function, because ReLUs are not that great at learning such representations.</p><!--kg-card-begin: html-->
<video width="100%" playsinline autoplay loop preload muted>
                    <source src="https://vsitzmann.github.io/siren/img/image_convergence_15s_label.mp4" type="video/mp4">
                </video><!--kg-card-end: html--><p>They also seem to suggest an initialization scheme in the paper, but it looks similar to the default initializer in Tensorflow. Why does sine outperform ReLU, and what's so special about it?</p><h2 id="what-is-sine">What is sine?</h2><p>I had not bothered to understand or appreciate sine beyond "SOH CAH TOA" until now. It is beautifully explained in detail <a href="https://betterexplained.com/articles/intuitive-understanding-of-sine-waves/">here</a>, with some missing pieces. I will attempt to summarize it here.</p><p>Sine is a repeating, one-dimensional pattern. It moves up and down: it starts from 0, rises to 1, dives to -1, and finally returns to 0. Sine is a gentle back-and-forth rocking pattern.</p><figure class="kg-card kg-image-card"><img src="https://betterexplained.com/wp-content/uploads/2016/12/Simple_harmonic_motion_animation.gif" class="kg-image"></figure><p>The speed of sine is non-linear; it speeds up &amp; slows down in cycles. Let's say it takes 10 seconds for sine to move from 0 to 1. After the first 5 seconds it will have traveled about 70% of the distance, and it will take the remaining 5 seconds to cover the last 30%. Going from 98% to 100%, the final 2% takes more than a full second! </p><p>How does sine relate to circles? Just as squares are examples of lines, circles are examples of sines.</p><p>Let's define $\pi$ as the time sine takes to go from 0 to 1 and back to 0. Similarly, $\pi$ is the time from 0 to -1 and back to 0. $\pi$ is about returning to the center, 0. So a full cycle takes $2\pi$. </p><figure class="kg-card kg-image-card"><img src="https://upload.wikimedia.org/wikipedia/commons/a/a2/Sine.svg" class="kg-image"></figure><h2 id="what-s-special-about-sine">What's special about sine?</h2><p>The derivative of a sine is also a sine, since cosine is just a shifted sine. 
Whut?</p><p>$\frac{d}{dx} \sin(x) = \cos(x) \\ \frac{d}{dx} \cos(x) = -\sin(x)$</p><p>If we plot the graphs, $\cos(x)$ is just $\sin(x)$ horizontally shifted by $\frac{\pi}{2}$:</p><p>$\begin{aligned} \sin(x)  &amp;= \cos(x - \frac{\pi}{2}) \\ \cos(x)  &amp;= \sin(x + \frac{\pi}{2}) \end{aligned}$</p><figure class="kg-card kg-image-card"><img src="https://upload.wikimedia.org/wikipedia/commons/7/71/Sine_cosine_one_period.svg" class="kg-image"></figure><p>None of the other commonly used non-linear activation functions have this property. This allows us to represent not only the image itself but its derivatives too!</p><p>Another benefit is that neural representations are continuous and, in a sense, have unlimited resolution, just like reality. When we take a picture, the camera sensor is actually sampling reality discretely, a couple of micrometers apart from one pixel to the next. If you want to know the color of an image at a coordinate (x, y), you can't ask for the RGB value between two discrete pixels; x and y must be integers within the range of the image's height and width. A neural representation doesn't have this limitation: you can query the color of the image at (1.2, 200.5). You can even query multiple resolutions of the image with the same representation!</p><p>Alright, let's dive into the code. The original paper was implemented in Pytorch, but the following is my attempt to reproduce it in Tensorflow.</p><pre><code class="language-python">import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

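# (My addition, not from the paper) A quick numerical sanity check of the
# property discussed above: the derivative of sine is cosine, i.e. just a
# shifted sine. np.gradient approximates the derivative by finite differences.
xs_check = np.linspace(-3, 3, 601)
sine_grad = np.gradient(np.sin(xs_check), xs_check)
assert np.allclose(sine_grad, np.cos(xs_check), atol=1e-2)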
in_features = 2
out_features = 1
hidden_features = 256
hidden_layers = 3
outermost_linear = True
first_omega_0 = 30.
hidden_omega_0 = 30.

class Siren(keras.layers.Layer):
    def __init__(self, in_features = 2, hidden_features=256, is_first=False, is_linear=False, omega_0=30.):
        super(Siren, self).__init__()
        self.omega_0 = omega_0
        self.is_first = is_first
        self.is_linear = is_linear

        if is_first:
          init = tf.keras.initializers.RandomUniform(minval=-1 / in_features, maxval=1 / in_features)
        else:
          init = tf.keras.initializers.RandomUniform(minval=-np.sqrt(6 / in_features) / omega_0, maxval=np.sqrt(6 / in_features) / omega_0)

        #From https://www.tensorflow.org/guide/keras/custom_layers_and_models
        self.w = self.add_weight(shape=(in_features, hidden_features), initializer=init, trainable=True)
        self.b = self.add_weight(shape=(hidden_features,), initializer="zeros", trainable=True)

    def call(self, inputs):
        if self.is_linear:
          return tf.matmul(inputs, self.w) + self.b
        return tf.sin(tf.multiply(self.omega_0, tf.matmul(inputs, self.w) + self.b))</code></pre><p>That's it; the only different part is that we use the initializer exactly as described in the paper. Before we train the model, we need a couple of helper functions (which I have not ported over to Tensorflow).</p><pre><code class="language-python">import torch
from torch import nn
from PIL import Image
import skimage
from torchvision.transforms import Resize, Compose, ToTensor, Normalize
import time
import matplotlib.pyplot as plt

def get_mgrid(sidelen, dim=2):
    '''Generates a flattened grid of (x,y,...) coordinates in a range of -1 to 1.
    sidelen: int
    dim: int'''
    tensors = tuple(dim * [torch.linspace(-1, 1, steps=sidelen)])
    mgrid = torch.stack(torch.meshgrid(*tensors), dim=-1)
    mgrid = mgrid.reshape(-1, dim)
    return mgrid
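# (My addition) A quick sanity check mirroring get_mgrid(3): 3 points per
# axis flatten to 9 (x, y) pairs spanning [-1, 1] in each dimension.
check_axis = torch.linspace(-1, 1, steps=3)
check_grid = torch.stack(torch.meshgrid(check_axis, check_axis), dim=-1).reshape(-1, 2)
assert check_grid.shape == (9, 2)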

def get_cameraman_tensor(sidelength):
    img = Image.fromarray(skimage.data.camera())        
    transform = Compose([
        Resize(sidelength),
        ToTensor(),
        Normalize(torch.Tensor([0.5]), torch.Tensor([0.5]))
    ])
    img = transform(img)
    return img</code></pre><p>Finally, we can train the model and the output should be close to the input image.</p><pre><code class="language-python">BATCH_SIZE = 8192
EPOCHS = 100
sidelength = 256
img = get_cameraman_tensor(sidelength)
print(img.shape)
pixels = img.permute(1, 2, 0).view(-1, 1)
coords = get_mgrid(sidelength, 2)

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
  inputs = tf.keras.Input(shape=(2,))
  x = Siren(in_features=2, is_first=True)(inputs)
  x = Siren(in_features=256)(x)
  x = Siren(in_features=256)(x)
  x = Siren(in_features=256)(x)
  outputs = Siren(in_features=256, hidden_features=1, is_linear=True)(x)

  model = keras.Model(inputs=inputs, outputs=outputs, name="Siren")
  model.summary()

  train_dataset = tf.data.Dataset.from_tensor_slices((coords, pixels))
  train_dataset = train_dataset.batch(BATCH_SIZE).cache() #.shuffle(10000)
  train_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)

  optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)
  loss = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
  model.compile(optimizer, loss=loss)
  model.fit(train_dataset, epochs=EPOCHS, verbose=0)
  result = model.predict(train_dataset)
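  # (My addition, not in the paper's code) Because the representation is
  # continuous, we can also query colors at fractional coordinates that do
  # not exist on the original pixel grid (coordinates are normalized to [-1, 1]).
  print(model(tf.constant([[0.1234, -0.5678]])))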
  plt.imshow(result.reshape(256,256))</code></pre>]]></content:encoded></item><item><title><![CDATA[Setting up nginx-ingress without Google Load Balancer in GKE]]></title><description><![CDATA[<p>It's nice to have a load balancer to dynamically distribute incoming traffic across nodes inside a Kubernetes cluster. However, it is a bit of overkill when you are running your side projects on GKE with a single node, and it costs an additional ~$20 a month for an otherwise inexpensive cluster.</p><p>We</p>]]></description><link>https://kornesh.com/blog/setting-up-gke-nginx-ingress-without-google-loadbalancer/</link><guid isPermaLink="false">5ee9fe3840800900f48bdee9</guid><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Wed, 17 Jun 2020 12:45:14 GMT</pubDate><content:encoded><![CDATA[<p>It's nice to have a load balancer to dynamically distribute incoming traffic across nodes inside a Kubernetes cluster. However, it is a bit of overkill when you are running your side projects on GKE with a single node, and it costs an additional ~$20 a month for an otherwise inexpensive cluster.</p><p>We could avoid this by running the ingress controller on <code>hostPort</code>, which exposes the HTTP ports via the host IP. Changing the service type to <code>NodePort</code> prevents GCP from creating forwarding rules in Google Load Balancer. Using the nginx-ingress Helm chart, we can easily achieve this by setting the values as below: </p><figure class="kg-card kg-code-card"><pre><code class="language-yaml">controller:
  kind: DaemonSet
  daemonset:
    useHostPort: true
  service:
    type: NodePort</code></pre><figcaption>values.yml</figcaption></figure><p>I'm not a big fan of Helm charts in general, so here I'm just using it to generate the yaml configurations and then manually apply it using  <code>kubectl</code>.</p><pre><code class="language-bash">helm fetch stable/nginx-ingress --untar --untardir nginx
helm template nginx/nginx-ingress --name nginx-ingress --values values.yml &gt; k8s.yml
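# (My addition) optionally sanity-check that the hostPort settings landed
# in the rendered manifests before applying them
grep -n "hostPort" k8s.yml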
kubectl apply -f k8s.yml</code></pre><p>Ideally, in a production system I would recommend using a load balancer, as the benefits outweigh the cost.</p>]]></content:encoded></item><item><title><![CDATA[Programmatically accessing Google Sheets via Service Account]]></title><description><![CDATA[<p>I was working on automation that involves reading Google Sheets from the server side and then writing some values back.</p><p>Initially this seemed straightforward, but it led me to waste a couple of hours pursuing the false start of "Domain Wide Delegation" which would allow anyone to impersonate a user without</p>]]></description><link>https://kornesh.com/blog/programmatically-accessing-google-sheets-via-service-account/</link><guid isPermaLink="false">610be65dd3d10e0091bfbe19</guid><category><![CDATA[DevBytes]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Tue, 19 May 2020 13:41:00 GMT</pubDate><content:encoded><![CDATA[<p>I was working on automation that involves reading Google Sheets from the server side and then writing some values back.</p><p>Initially this seemed straightforward, but it led me to waste a couple of hours pursuing the false start of "Domain Wide Delegation", which would allow anyone to impersonate a user without the whole UI-based OAuth2 flow. Turns out you don't actually need this.</p><p>Instead, simply create a service account and then share your Google Sheet with the email address of that service account (<code>sa-name@project-id.iam.gserviceaccount.com</code>), just like you would share a document with an actual person.</p><p>Now, the official docs are pretty confusing and vague on how to authenticate to Google Sheets using a service account. Ironically, after a ton of Googling, I finally managed to include Google Sheets' access scope with the service account in the Python API client via <code>from_service_account_file</code>.</p><pre><code class="language-python">from googleapiclient.discovery import build
from google.oauth2 import service_account
import pandas as pd
from datetime import datetime

credentials = service_account.Credentials.from_service_account_file(
    '/tmp/serviceaccount.json', scopes=['https://www.googleapis.com/auth/spreadsheets'])
service = build('sheets', 'v4', credentials=credentials)
sheet = service.spreadsheets()

sheetId = 'XXXXXXXXXXXXXXXXX'
tabName = 'Sheet1'

# Read
result = sheet.values().get(spreadsheetId=sheetId, range=tabName).execute()
values = result.get('values', [])
df = pd.DataFrame(values)
print(df)

# Write
data = df.values.tolist()
data.append(["Updated at", datetime.now().isoformat()])
result = sheet.values().update(spreadsheetId=sheetId, range=tabName, valueInputOption='RAW', body={
    'values': data
}).execute()</code></pre><p>This reminds me of a favorite quote from a colleague:</p><blockquote>Straight forward does not mean easy.</blockquote><p>Hope this helps someone save a couple of hours.</p>]]></content:encoded></item><item><title><![CDATA[On GKE Cluster Management Fee]]></title><description><![CDATA[<blockquote>On June 6, 2020, your Google Kubernetes Engine (GKE) clusters will start accruing a management fee, with an exemption for all Anthos GKE clusters and one free zonal cluster. ...</blockquote><p>I was utterly disappointed when I received this email last night. It introduces an additional $75 per month per cluster. Apparently, I</p>]]></description><link>https://kornesh.com/blog/on-gke-cluster-management-fee/</link><guid isPermaLink="false">5efb012940800900f48bdf97</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Google Cloud Platform]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Thu, 05 Mar 2020 09:09:00 GMT</pubDate><content:encoded><![CDATA[<blockquote>On June 6, 2020, your Google Kubernetes Engine (GKE) clusters will start accruing a management fee, with an exemption for all Anthos GKE clusters and one free zonal cluster. ...</blockquote><p>I was utterly disappointed when I received this email last night. It introduces an additional $75 per month per cluster. Apparently, I wasn't the only one who felt this way; folks on <a href="https://news.ycombinator.com/item?id=22485625">HackerNews</a> were furious. Some of them were willing to migrate their entire stack off GCP, and others would even go to the extreme of rolling out their own DIY cluster. However, after going through the thread, I think this fee is fairly justifiable. </p><p>I discovered people were creating tons of small clusters purely for isolation purposes without realizing the additional overhead this implies. 
This fee will be a slap on the wrist (or face, in my case) and will encourage teams to create larger clusters instead and use proper isolation at the namespace level, perhaps in conjunction with Istio and gVisor.</p><p>This new pricing matches its AWS counterpart, EKS, which already charges $0.10 per hour per cluster. Given that GKE's maturity and feature set far outweigh EKS's, it is not completely unreasonable. Plus, there's one free cluster exempted from this fee per account.</p><p>In the thread, I noticed people tend to get irrational when they're upset and feel powerless. Instead of fixing their existing architecture, they would rather do something completely irrational. Weirdly enough, I could relate to them. I think we sometimes tend to resist change under the false pretense of stability. Clearly, I'm guilty of this too; something for me to reflect on.</p><p>I also discovered a gem on <a href="https://cloud.google.com/kubernetes-engine/docs/best-practices/enterprise-multitenancy">multi-tenancy on GKE,</a> hidden deep in the thread.</p>]]></content:encoded></item><item><title><![CDATA[10x faster RoBERTa tokenizer with Custom Tokens support]]></title><description><![CDATA[<p>For my Question Answering Kaggle competition, I wanted to experiment with replacing the BERT model with RoBERTa. This means I needed to re-encode and re-tokenize the entire <a href="https://ai.google.com/research/NaturalQuestions/">Natural Questions</a> dataset into TFRecords. This process was already taking hours with the WordPiece tokenizer used for the BERT models. 
RoBERTa uses a faster</p>]]></description><link>https://kornesh.com/blog/10x-faster-roberta-tokenizer-with-custom-tokens-support/</link><guid isPermaLink="false">61054827e6154c0088729093</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Tensorflow]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Tue, 21 Jan 2020 20:16:00 GMT</pubDate><content:encoded><![CDATA[<p>For my Question Answering Kaggle competition, I wanted to experiment with replacing the BERT model with RoBERTa. This means I needed to re-encode and re-tokenize the entire <a href="https://ai.google.com/research/NaturalQuestions/">Natural Questions</a> dataset into TFRecords. This process was already taking hours with the WordPiece tokenizer used for the BERT models. RoBERTa uses a faster, language-agnostic tokenizer called SentencePiece. However, in my experiment, the SentencePiece tokenizer was significantly slower and would have taken close to 12 hours to complete if I had let it continue.</p><p>Fortunately, I came across <a href="https://github.com/huggingface/tokenizers">HuggingFace's Rust tokenizer</a>, which was 10x faster, but it is still in its early days and doesn't support Custom Tokens out of the box. While you might (rightly) think I clickbaited you (as I didn't actually write the 10x Rust tokenizer), I did write a wrapper on top of the Rust implementation to support Custom Tokens for separating Questions <code>[Q]</code> and Answers <code>[A]</code>. This might seem straightforward, but it can be really tricky to implement right if you're not aware of how all these tokenizers actually work underneath the abstractions. Hope this helps someone.</p><pre><code class="language-python">import json
from transformers import RobertaTokenizer

from tokenizers import Tokenizer, pre_tokenizers, decoders
from tokenizers.models import BPE, WordPiece

class CustomRobertaTokenizer:
    def __init__(self, path):
        self.tokenizer = RobertaTokenizer.from_pretrained(path)
        vocab = path+"/vocab.json"
        merges = path+"/merges.txt"
        # Create a Tokenizer using BPE
        self.rust = Tokenizer(BPE.from_files(vocab, merges))
        # Use ByteLevel PreTokenizer
        self.rust.pre_tokenizer = pre_tokenizers.ByteLevel.new(add_prefix_space=True)
        # Use ByteLevel Decoder
        self.rust.decoder = decoders.ByteLevel.new()

        with open(path+'/added_tokens.json', 'r') as f:
            self.added_token = json.load(f)

        special_tokens = {"bos_token": "&lt;s&gt;", "eos_token": "&lt;/s&gt;", "unk_token": "&lt;unk&gt;", "sep_token": "&lt;/s&gt;", "pad_token": "&lt;pad&gt;", "cls_token": "&lt;s&gt;", "mask_token": "&lt;mask&gt;"}
        # self.special_token_map = {v: k for k, v in special_tokens.items()}

        for k, v in special_tokens.items():
            self.added_token[v] = k
            setattr(self, k, v)

    def tokenize(self, txt, add_prefix_space=True):
        streams = []
        tmp = []
        for w in txt.split():
            # print(w, tmp, streams)
            # if w in self.added_token or w in self.special_token_map:
            if w in self.added_token:
                if len(tmp) != 0:
                    streams.append(tmp)
                    tmp = []
                streams.append(w)
            else:
                tmp.append(w)

        # print(streams)
        if len(tmp) != 0:
            streams.append(tmp)
            tmp = []

        bpes = [" " + " ".join(x) for x in streams if isinstance(x, list)]

        # print("bpes", bpes)
        bpes_result = self.rust.encode_batch(bpes)
        # print(bpes_result)

        final_result = []
        for w in streams:
            if isinstance(w, list):
                final_result.extend(bpes_result.pop(0).tokens)
            else:
                final_result.append(w)
        return final_result

    def convert_tokens_to_ids(self, tokens):
        return self.tokenizer.convert_tokens_to_ids(tokens)</code></pre><p>And you can use it as below:</p><pre><code class="language-python"># Provide the custom tokens [Q] and [A] in added_tokens.json
txt = "&lt;s&gt; [Q] Who founded Google? [A] Google was founded on September 4, 1998, by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. &lt;/s&gt;"

#Default tokenizer
tokenizer = RobertaTokenizer.from_pretrained('./nq-vocab')

# With Custom Tokenizer support
tokenizer = CustomRobertaTokenizer('./nq-vocab')
print(tokenizer.tokenize(txt))
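# (My addition) The wrapper still delegates id conversion to the original
# tokenizer, so the output can be fed to a model as usual
print(tokenizer.convert_tokens_to_ids(tokenizer.tokenize(txt)))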
</code></pre><p>Also, remember that the SentencePiece tokenizer requires additional effort for post-processing, as we don't know what encoding rules were used. We have to use heuristics like the longest-common-substring algorithm to map the outputs back to the input tokens, but this is not guaranteed to work every time. I'm not sure how others have solved this problem.</p><p>You can get an excellent overview of various tokenizers <a href="https://blog.floydhub.com/tokenization-nlp/">here</a> and <a href="https://huggingface.co/transformers/tokenizer_summary.html">here</a>.</p>]]></content:encoded></item><item><title><![CDATA[Supporting clients that do not support SNI in EKS]]></title><description><![CDATA[<p>SNI is something that's been enabled by default for most modern browsers and HTTP clients. It allows us to serve multiple different SSL certs on the same IP address and TCP port. This is incredibly useful for multi-tenancy in EKS when used alongside the Nginx Ingress Controller and cert-manager.</p>]]></description><link>https://kornesh.com/blog/supporting-clients-without-sni-in-eks/</link><guid isPermaLink="false">610d5d8ed3d10e0091bfc09d</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[DevBytes]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Tue, 12 Nov 2019 16:06:00 GMT</pubDate><content:encoded><![CDATA[<p>SNI is something that's been enabled by default for most modern browsers and HTTP clients. It allows us to serve multiple different SSL certs on the same IP address and TCP port. This is incredibly useful for multi-tenancy in EKS when used alongside the Nginx Ingress Controller and cert-manager. Unsurprisingly, there are still some legacy clients that do not support this feature. 
When such clients make a request to an endpoint that uses SNI to route requests to their respective services (think multi-tenancy), we might get an SSL connection error.</p><pre><code class="language-bash">$ java -Djsse.enableSNIExtension=true SSLPoke example.com 443 # works
$ java -Djsse.enableSNIExtension=false SSLPoke example.com 443 # SSL Error</code></pre><p>To support such clients, we just need to give the endpoint a dedicated IP address (or set it as the default) and avoid any services that require SNI, like the Nginx Ingress Controller. There are a couple of options to achieve this in AWS.</p><h2 id="service-level-classical-elb">Service level Classical ELB</h2><p>The most straightforward option is to simply create a <em>Service</em> level ELB with <code>type: LoadBalancer</code>. You can use the <code>external-dns</code> annotation to link the ELB to a domain in Route53. Or you can manually create a CNAME record in Route53 and map a subdomain (<em>e.g.</em> <code>non-sni.example.com</code>) to the ELB that will be provisioned (<em>e.g.</em> <code>elb-url.us-west-2.elb.amazonaws.com</code>). The following is all the YAML we need for this method to work.</p><pre><code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  annotations:
    #external-dns.alpha.kubernetes.io/hostname: non-sni.example.com
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-west-2:0000:certificate/XXXX
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
  labels:
    app: frontend
  name: frontend-proxy-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    name: http
    protocol: TCP
  - port: 443
    targetPort: 8080
    name: https
    protocol: TCP
  selector:
    app: frontend
</code></pre><p>Note that we are completely bypassing the Nginx Ingress Controller and cert-manager here, as that would require us to create a dedicated Kubernetes <em>Ingress</em> level ELB.</p><pre><code>www.example.com -&gt; ELB -&gt; ingress-nginx + cert-manager -&gt; service

non-sni.example.com -&gt; ELB -&gt; service
</code></pre><ul><li>Requires a <em>Service</em> level ELB</li><li>We need a dedicated ACM cert; we can't use Let's Encrypt</li><li>Requires a manual CNAME record update every time the ELB changes, though we can cook up a bash script to automate it</li><li>Additional ELB cost</li><li>Terminates SSL at the ELB level</li></ul><h2 id="ingress-level-alb">Ingress level ALB</h2><p>This option has a high maintenance cost if you're not already using ALB, as we need to install an additional <em>Operator</em> called the <a href="https://github.com/kubernetes-sigs/aws-alb-ingress-controller">AWS ALB Ingress Controller</a>.</p><ul><li>Still needs a dedicated SSL cert</li></ul><blockquote>Although the AWS Application Load Balancer (ALB) is a modern load balancer offered by AWS that can be provisioned from within EKS, at the time of writing, the alb-ingress-controller is only capable of serving sites using certificates stored in AWS Certificate Manager (ACM). <a href="https://docs.cert-manager.io/en/latest/tutorials/venafi/securing-ingress.html">source</a></blockquote><ul><li>Requires manual CNAME record updates (according to <a href="https://docs.cert-manager.io/en/latest/tutorials/venafi/securing-ingress.html">the only cert-manager docs on ALB</a>), but this can probably be achieved using <code>external-dns</code></li><li>Incurs the additional cost of an ALB</li></ul><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:    
    app: frontend
  name: frontend-proxy-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:0000:certificate/XXXX
spec:
  tls:
  - hosts:
    - "non-sni.example.com"
    secretName: ca-star-example-com-key-pair
  rules:
  - host: "non-sni.example.com"
    http:
      paths:
      - path: /
        backend:
          serviceName: frontend-proxy-svc
          servicePort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: frontend
  name: frontend-proxy-svc
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    name: http
    protocol: TCP
  selector:
    app: frontend
</code></pre><h2 id="ingress-level-classical-elb">Ingress level classical ELB</h2><p>Another potential solution but I didn't spend time investigating this.</p><h2 id="how-we-could-have-done-it-in-gcp">How we could have done it in GCP</h2><p>I can't help myself but to compare how this whole ordeal could have been so much easier in GCP. Simply create a reserved static IP named <code>frontend-proxy-static-ip</code></p><pre><code class="language-bash">$ gcloud compute addresses create frontend-proxy-static-ip --global 
$ gcloud compute addresses describe frontend-proxy-static-ip --global --format 'value(address)'
# 35.186.228.000</code></pre><p>Attach the static IP to the <em>Ingress</em> by just adding the one-line annotation <code>kubernetes.io/ingress.global-static-ip-name</code>. The <em>Service</em> just needs the usual <code>type: NodePort</code>. Then we create an <em>A</em> record for <code>non-sni.example.com</code> pointing to the reserved static IP.</p><pre><code class="language-yaml">apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  labels:    
    app: frontend
  name: frontend-proxy-ingress
  annotations:
    certmanager.k8s.io/cluster-issuer: ca-issuer-ent-frontend-com
    kubernetes.io/ingress.global-static-ip-name: frontend-proxy-static-ip
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - "non-sni.example.com"
    secretName: ent-default-ssl-certificate
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: frontend-svc
          servicePort: 80
</code></pre><ul><li>No load balancers are involved</li><li>We can still use Let's Encrypt certs via cert-manager; we don't need dedicated certs</li><li>No additional cost; in-use static IPs are free</li><li>No need to create an additional service; note that we're using <code>frontend-svc</code> instead of <code>frontend-proxy-svc</code></li></ul>]]></content:encoded></item><item><title><![CDATA[Understanding Computational Model of Quantum Computers]]></title><description><![CDATA[<p>When I look at computers, I at least have some intuition for how it all works in theory. There were a bunch of theories we had to learn in university to reason about the mathematical models of their computation. However, I had no idea theoretically how the quantum computers are supposed</p>]]></description><link>https://kornesh.com/blog/understanding-computational-model-of-quantum-computers/</link><guid isPermaLink="false">5ee20c80a3320900f4a291f8</guid><category><![CDATA[Quantum]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Mon, 07 Oct 2019 10:51:00 GMT</pubDate><content:encoded><![CDATA[<p>When I look at computers, I at least have some intuition for how it all works in theory. There were a bunch of theories we had to learn in university to reason about the mathematical models of their computation. However, I had no idea, theoretically, how quantum computers are supposed to be faster. So I went down the rabbit hole, and I want to share what I have learnt so far before I forget it all.</p><h1 id="the-intuition">The intuition</h1><p>Most explanations of how quantum computers work are presented from a physicist's perspective. We often hear that quantum computers have qbits, which can be both 0 and 1 at the same time, called a superposition. And when we measure one using a filter, it collapses to either 0 or 1.</p><p>Based on that explanation alone, I got the wrong idea that quantum computers can represent 3 distinct states at any given time. 
But this doesn't really explain how it would outperform classical computers by a substantial margin. In complexity theory terms, representing $3^n$ states with $n$ three-valued units instead of $2^n$ states with $n$ bits isn't that significant -- both grow exponentially.</p><p>The important piece of information that's often missing is that each qbit has some <em>probability</em> to be 0 and some <em>probability</em> to be 1 at the same time -- in other words, a qbit can be both 0 and 1, with some <strong>probability</strong> of collapsing to either state when we measure it. We exploit this quantum property in our favor.</p><p>Trying to explain how quantum computers work with metaphors and analogies (like in pop science articles) makes it hard for us to grasp the intuition behind quantum computers. I think our natural language is not equipped to deal with the level of intricacy required in the quantum world. So let's start with something more formal.</p><h2 id="single-cbit-vs-single-qbit">Single cbit vs Single qbit</h2><p>Let's consider a classical bit (cbit) with the value 0, which can be written as $\begin{pmatrix} 1 \\ 0\end{pmatrix}$ or $| 0 \rangle$ using Dirac vector notation. A cbit with the value 1 can be written as $\begin{pmatrix} 0 \\ 1\end{pmatrix}$ or $| 1 \rangle$. Formally, a cbit can be represented by $\begin{pmatrix} a \\ b \end{pmatrix}$ where $a$ and $b$ are numbers and $\lVert a \rVert^2 + \lVert b \rVert^2 = 1$; for a cbit, one of them is 1 and the other is 0.</p><p>We can use the same notation to represent quantum bits (qbits). However, a qbit lives in a richer space where it exists as both 0 <em>and</em> 1, each with some probability of being the outcome when it collapses. Its entries are probability <em>amplitudes</em> -- numbers whose squared magnitudes give probabilities -- instead of binary 0s and 1s.</p><p>For example, a qbit $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ has a 100% chance of collapsing to 0, and a qbit $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ has a 100% chance of collapsing to 1. 
Similarly, a qbit $\begin{pmatrix} \sqrt{0.25} \\ \sqrt{0.75} \end{pmatrix}$ has a 25% chance of collapsing to 0 and a 75% chance of collapsing to 1. Accordingly, a qbit $\begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}$ has a 50% chance of collapsing to either 0 or 1. We can summarize this as: if a qbit has value $\begin{pmatrix} a \\ b \end{pmatrix}$ then it collapses to 0 with probability $\lVert a \rVert^2$ and to 1 with probability $\lVert b \rVert^2$.</p><h2 id="two-cbits-vs-two-qbits">Two cbits vs two qbits</h2><p>Two cbits can exist in any one of the states $|00\rangle, |01\rangle, |10\rangle$ or $|11\rangle$; it's important to emphasise that <em>only one</em> of those states can exist at any given time. The possible states can be represented as the tensor products shown below.<br>$| 00 \rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$</p><p>$| 01 \rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$</p><p>$| 10 \rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}$</p><p>$| 11 \rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$</p><p>A single qbit can exist as both $|0\rangle$ and $|1\rangle$. Two qbits exist in all four states $|00\rangle, |01\rangle, |10\rangle$ and $|11\rangle$ simultaneously 🤯. 
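</p><p>These rules are concrete enough to check numerically. Below is a toy sketch in plain numpy (an illustration, not a quantum computing library): squaring an amplitude's magnitude gives its collapse probability, and multi-bit basis states are Kronecker products of the single-bit vectors.</p><pre><code class="language-python">import numpy as np

# Single-bit basis states as vectors
zero = np.array([1.0, 0.0])  # |0>
one = np.array([0.0, 1.0])   # |1>

# A qbit with amplitudes (sqrt(0.25), sqrt(0.75))
q = np.array([np.sqrt(0.25), np.sqrt(0.75)])
probs = np.abs(q) ** 2  # collapse probabilities: 0.25 and 0.75

# Two-cbit basis states are tensor (Kronecker) products
state_00 = np.kron(zero, zero)  # [1, 0, 0, 0]
state_01 = np.kron(zero, one)   # [0, 1, 0, 0]
state_10 = np.kron(one, zero)   # [0, 0, 1, 0]
state_11 = np.kron(one, one)    # [0, 0, 0, 1]
</code></pre><p>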
For example, qbits $\begin{pmatrix} \sqrt{0.5} \\ \sqrt{0.5} \end{pmatrix}$ and $\begin{pmatrix} \sqrt{0.25} \\ \sqrt{0.75} \end{pmatrix}$ exist in all four possible states $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ simultaneously, with probabilities $0.125, 0.375, 0.125$ and $0.375$ respectively.</p><p>$\begin{pmatrix} \sqrt{0.5} \\ \sqrt{0.5} \end{pmatrix} \otimes \begin{pmatrix} \sqrt{0.25} \\ \sqrt{0.75} \end{pmatrix} = \begin{pmatrix} \sqrt{0.125} \\ \sqrt{0.375} \\ \sqrt{0.125} \\ \sqrt{0.375} \end{pmatrix}$</p><p>2 cbits contain 2 bits of information. 2 qbits need 4 numbers to describe the current state of the system -- ie you need to give 4 pieces of information (the four amplitudes, whose squares are the probabilities $0.125, 0.375, 0.125$ and $0.375$) to describe the current state.</p><h2 id="eleven-cbits-vs-eleven-qbits">Eleven cbits vs eleven qbits</h2><p>If you have $n$ qbits you get to work with $2^n$ amplitudes. So every single additional qbit doubles the number of amplitudes you get to work with. Let's see why this is a big deal.</p><p>Imagine you have 11 cbits and 11 qbits. With 11 cbits, we can store a single number at any given time, eg. 2019 as $|11111100011\rangle$. With 11 qbits, we can store all the numbers from 0 to 2047 ($2^{11}$ values) simultaneously, over a probability distribution. With just 300 qbits we could work with more amplitudes at once than there are atoms in the observable universe 🤯 🤯.</p><h2 id="operations-on-cbits-vs-qbits">Operations on cbits vs qbits</h2><p>When we instruct a quantum computer to do something, we're just changing these amplitudes to favor the computation that we're trying to do. We can combine primitive logic gates like AND, OR, XOR and NAND to perform operations on cbits and compute almost anything. Similarly, we can use quantum logic gates like NOT, CNOT, Z and the Hadamard gate to perform quantum operations. 
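</p><p>As a toy illustration (again in plain numpy, not a quantum SDK), a quantum gate is just a matrix multiplying the amplitude vector. The standard Hadamard gate, for example, turns $|0\rangle$ into an equal superposition, and the two-qbit probabilities above fall out of a Kronecker product of amplitude vectors:</p><pre><code class="language-python">import numpy as np

# Hadamard gate: a 2x2 matrix acting on amplitude vectors
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

zero = np.array([1.0, 0.0])      # |0>
superposed = H @ zero            # amplitudes (1/sqrt(2), 1/sqrt(2))
probs = np.abs(superposed) ** 2  # 50/50 chance of collapsing to 0 or 1

# Two qbits: kron the amplitude vectors, then square the
# magnitudes to recover the joint probabilities
a = np.array([np.sqrt(0.5), np.sqrt(0.5)])
b = np.array([np.sqrt(0.25), np.sqrt(0.75)])
pair_probs = np.abs(np.kron(a, b)) ** 2  # 0.125, 0.375, 0.125, 0.375
</code></pre><p>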
By carefully manipulating the amplitudes so that correct answers are amplified and incorrect ones cancel out, a quantum algorithm can solve certain problems exponentially faster than the best known classical algorithms, eg. simulating complicated chemical interactions.</p><p>We will have to take some classical bits, put them into a quantum computer, and do a bunch of quantum operations. At the end of it, we measure, collapsing the qbits to zeros and ones, so we get classical bits back out. We cannot measure all the states in a quantum computer, but we can measure a few final states to get the answer that we are after. We usually run the program multiple times and gather statistics to deal with the noise caused by stability issues in current-gen quantum computers.</p><p>Quantum computers are still in their early days and there are plenty of Turing Awards waiting to be collected.</p>]]></content:encoded></item><item><title><![CDATA[Improving slow EFS throughput when copying thousands of small files]]></title><description><![CDATA[<p>We were in the process of migrating our customers from a private cloud to AWS EKS. During our dry-runs, we noticed that Customer A's 21GB worth of files transferred in 32 minutes from a local disk in the private cloud to EFS. However, it took us 2.5</p>]]></description><link>https://kornesh.com/blog/extremely-slow-efs-throughput/</link><guid isPermaLink="false">610d3fcdd3d10e0091bfbf68</guid><category><![CDATA[DevBytes]]></category><dc:creator><![CDATA[Kornesh Kanan]]></dc:creator><pubDate>Wed, 25 Sep 2019 14:58:00 GMT</pubDate><content:encoded><![CDATA[<p>We were in the process of migrating our customers from a private cloud to AWS EKS. During our dry-runs, we noticed that Customer A's 21GB worth of files transferred in 32 minutes from a local disk in the private cloud to EFS. However, it took us 2.5 hours to transfer Customer B with only 12GB worth of files. 
</p><p>This wouldn't be an issue if we were only planning to migrate one or two customer instances per week. Furthermore, we have a pretty narrow weekly deployment window that is only a few hours long. In order to migrate more customers in the same week, we had to do something about this bottleneck.</p><p>First, I repeated the dry-run to make sure this was not a one-off event. It wasn't. However, this time I noticed that the EFS throughput was not reaching anywhere near the dedicated maximum throughput that we had allocated, which was 40Mbps.</p><p>So I ran a couple of benchmarks on EFS using the fio tool. Strangely, now the throughput was actually hitting the max.</p><p>It finally hit me and I checked the file counts for both customers using a simple <code>find /data | wc -l</code>. Turns out Customer A only had a total of 10,000 files whereas Customer B had a whopping 122,000 files!</p><p>I realized rsync and cp are single-threaded processes and we’d need to parallelize the copy using something like GNU Parallel. After some googling, I came across AWS’s EBS-to-EFS throughput guide, in which they suggested the use of fpart + cpio + GNU Parallel for optimal performance.</p><p>fpart is a tool that gets a list of all files in a directory and splits the list into equal-sized partitions based on the number of files and total file size. We could then feed the output lists into individual rsync or cpio threads to speed up the copying process, so each thread would have a similar workload.</p><p>However, the fpart binary was not readily available and I couldn't find any Docker images that had it. So I had to create our own Docker image and build it from source.</p><pre><code class="language-docker">FROM ubuntu:18.04

RUN apt-get update &amp;&amp; \
    apt-get install -y build-essential autoconf rsync ca-certificates cpio parallel nload &amp;&amp; \
    gcc --version &amp;&amp; make --version

ADD https://github.com/martymac/fpart/archive/fpart-1.1.0.tar.gz /

# Build and install fpart from source
RUN tar -xvzf /fpart-1.1.0.tar.gz &amp;&amp; \
    ls -lsh / &amp;&amp; \
    cd /fpart-fpart-1.1.0/ &amp;&amp; \
    autoreconf -i &amp;&amp; \
    ./configure &amp;&amp; \
    make &amp;&amp; \
    make install &amp;&amp; \
    PATH=$PATH:/usr/local/bin &amp;&amp; which fpart &amp;&amp; fpart -V

# Smoke-test fpsync (bundled with fpart) by copying the source tree, then clean up
RUN export THREADS=$(($(nproc --all) * 16)) &amp;&amp; \
    echo $THREADS &amp;&amp; \
    ls -lsh /home &amp;&amp; \
    fpsync -n $THREADS -v -o "-achv --delete" /fpart-fpart-1.1.0/ /home/ &amp;&amp; \
    ls -lsh /home &amp;&amp; \
    rm -rf /home/* &amp;&amp; \
    ls -lsh /home

CMD ["/bin/sh"]</code></pre><p>I couldn't get the exact command mentioned in the AWS guide to work – even after hours of trial and error. We finally settled on using just fpsync (rsync with fpart) for simplicity. fpsync is part of the fpart package we built in the Docker image above.</p><pre><code class="language-bash">export THREADS=$(($(nproc --all) * 16))
echo $THREADS
fpsync -n $THREADS -v -o "-achv --delete" /tmp/data /data</code></pre><p>Even with this suboptimal solution, we managed to reduce the EFS copying time from 2h 30mins to just 20 mins.</p><p>Ideally, we should move these files to S3 and serve them from there, but that requires application-level changes and we don't have that kind of bandwidth right now.</p>]]></content:encoded></item></channel></rss>