
Projects

I like writing code. I like building product. I like making things that people like. - Paul Buchheit


Table of Contents

Ask & Answer

Last updated: 19 November 2020
Figure 1. First version of the Home Page
Figure 2. Redesign of the Home Page
Figure 3. Profile Detail Page
Figure 4. Topic Detail Page
Figure 5. Question Detail Page, highlighting the answer question feature
Figure 6. Question Detail Page, highlighting the comment and reaction feature

You might have asked yourself once (and now twice) - how hard is it to build a social media website like Facebook? Someone else might be asking themselves a similar question - how do you build Quora?

Let me tell you something: there is no such thing as a dumb question.

It's hard. Period.

I have always been intrigued by how to create an application like Quora or Facebook. Ask & Answer is a hybrid of both. I wanted to recreate the bare minimum of each with the least resources. So here's the tech stack for it:

  • Tech Stack: Golang GraphQL (Backend), React + Apollo Client v3 + Storybook (Frontend)
  • Platform: Web
  • Storage: Postgres (with a nicely done database design)

Users can ask questions, comment, post answers, leave reactions, as well as connect with other users by sending friend requests. Designing the backend is indeed a challenge, and I put a lot of consideration into the database schema, built on Postgres, a Relational Database Management System (RDBMS). Given the nature of the application, a graph database would have been an interesting choice, but since Postgres (or MySQL) is more readily available, I decided to go with it.

Both the backend and the frontend have undergone several revisions, just to get the structure right. Since GraphQL makes it possible to traverse relationships infinitely (a question has answers, an answer belongs to a question), I had to be careful to avoid circular dependencies when writing the Go types.
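One way to break that cycle, sketched below with hypothetical names (not the actual project code), is to store only IDs across the relationship and resolve the other side lazily, so the Go structs never reference each other directly:

```go
package main

import "fmt"

// Instead of embedding Answer values inside Question (and vice versa),
// each side stores only the other's ID.
type Question struct {
	ID    int
	Title string
}

type Answer struct {
	ID         int
	QuestionID int
	Body       string
}

type Store struct {
	Questions map[int]Question
	Answers   map[int]Answer
}

// AnswersFor resolves Question -> []Answer on demand.
func (s *Store) AnswersFor(questionID int) []Answer {
	var out []Answer
	for _, a := range s.Answers {
		if a.QuestionID == questionID {
			out = append(out, a)
		}
	}
	return out
}

// QuestionOf resolves Answer -> Question on demand.
func (s *Store) QuestionOf(a Answer) Question {
	return s.Questions[a.QuestionID]
}

func main() {
	s := &Store{
		Questions: map[int]Question{1: {ID: 1, Title: "How to build Quora?"}},
		Answers:   map[int]Answer{10: {ID: 10, QuestionID: 1, Body: "It's hard. Period."}},
	}
	fmt.Println(len(s.AnswersFor(1)), s.QuestionOf(s.Answers[10]).Title)
}
```

Each GraphQL resolver then calls a lookup like `AnswersFor` or `QuestionOf` as needed, so traversal depth is bounded by the query rather than by the type definitions.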

It is still a work in progress, and I have barely scratched the surface.

HIRE | Paperless Applicant Tracking System

Last updated: 19 November 2020
Figure 1. HIRE - People Page
Figure 2. HIRE - Company Page
Figure 3. HIRE - Job Page

HIRE serves the needs of three different target audiences - recruiters, job seekers and companies - and attempts to close the gap between them by covering their different needs.

Recruiters want a better way to manage candidates - from having a place to store their contact details and documents, to keeping track of each application's status. Recruiters can also add notes at the different hiring stages, keeping things transparent to other collaborators. This helps minimize duplicate work and leaves a digital footprint of hiring decisions for future reference.

Job seekers want to keep track of their current job applications, and to be notified whenever their application status changes. Even if an application is rejected, job seekers want that confirmation in order to move on.

Companies want a space where they can work closely with recruiters and view their hiring metrics.

HIRE is a mesh of features that covers all of these needs. The technology stack is as follows:

  • Tech Stack: TypeScript GraphQL + DataLoader (Backend), TypeScript React + Storybook (Frontend)
  • Platform: Web
  • Storage: Postgres

The backend is designed as a monolith, but with clear separation between services. Each service lives in its own folder, so it can be broken out into a microservice when needed. However, structuring it as a monolith was a clear choice for the following reasons:

  1. The boundaries had not been clearly defined in the beginning; they only became clearer after several iterations.
  2. It is easier to work with database transactions in a monolith.
  3. Cross-cutting concerns such as authorization, logging, dataloaders can be shared more easily.

The code is well architected and production ready. Unit and integration tests are also written to cover the basic cases. I rarely do TDD (test-driven development), but rather TAD (test-after development). Coverage of more than 80% is usually sufficient for me, with more focus on the code that produces side effects (a.k.a. write operations). The integration tests also run reads and writes against a database spun up through Docker Compose - something I see as essential, since a lot of silly mistakes happen at that layer and go undetected if you just mock it.

Another interesting aspect of this project is code generation. Most of the CRUD operations are generated from templates, including the unit and integration tests. The only manual changes are the mappings to the database columns and the custom business logic. Software writing software.
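As a rough illustration of the idea (the real templates are more elaborate, and `Execer` below is just a placeholder name, not part of the project), Go's standard `text/template` package is enough to stamp out such CRUD boilerplate:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// A toy CRUD template: fill in the entity and table name to generate
// the repetitive service code.
const crudTmpl = `func Create{{.Entity}}(db Execer, m {{.Entity}}) error {
	// INSERT INTO {{.Table}} ...
	return nil
}
`

// generateCRUD renders the template for one entity/table pair.
func generateCRUD(entity, table string) (string, error) {
	t, err := template.New("crud").Parse(crudTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, struct{ Entity, Table string }{entity, table}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	code, _ := generateCRUD("User", "users")
	fmt.Print(code)
}
```

The same template set can emit the matching unit and integration tests, which is how the generated code stays covered for free.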

Github Recommender

Last updated: 16 June 2020

This is an interesting one. After working on the expense tracker, I wanted to deal with more data-driven stuff. So I scraped GitHub's data (calling an API is not technically scraping, but I just want to sound cool).

Yes, again, I have worked on different iterations of this app. The present frontend is v3.

There are several interesting domains here, mainly the scraping and the matching parts.

Scraping

Scraping is easy if you only need to get the data once. There were two problems I had to deal with - stale data and GitHub's rate limiting. To keep the data up to date, I have to periodically fetch newly created users in Malaysia and Singapore. This is done by keeping track of the last created user, and using that timestamp as a cursor to fetch users created after it.

For each user, I only need to fetch repositories updated since the user's data was last scraped. Using this delta timestamp, we minimize fetching and avoid unnecessary calls. I wanted to add additional logic to prioritize active users (those with repositories they are still updating), though I haven't found the time to do so.

To prevent the API calls from being rate limited, I had to throttle them, pause when the rate limit has been exceeded, and then resume scraping. Due to this limitation, I only allow GitHub users from Malaysia and Singapore to be scraped.

The scraper runs every day to fetch users created the day before, and every minute to fetch users' repositories. I can only conclude that building a resilient scraper is not easy.

Matching

Matching GitHub users is probably the most exciting feature. I initially created it to find users similar to my profile, based on the programming languages used, each repository's name, description and tags, as well as workplace.

Since it is a personal project, I wanted to avoid using libraries (and ended up writing some 😊).

The algorithm used for the GitHub recommender is TF-IDF, which stands for term frequency, inverse document frequency. I got interested in string algorithms after reading about natural language processing and text mining. While the results are relevant (at least to me), the matching can never be perfect, since it relies heavily on information scraped from GitHub.
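A minimal TF-IDF over tokenized profiles (say, a user's repository names, descriptions and languages) might look like this - a sketch, not the production code:

```go
package main

import (
	"fmt"
	"math"
)

// tfidf scores each term in each document: term frequency within the
// document times the log of inverse document frequency across all
// documents. Terms appearing in every document score zero.
func tfidf(docs [][]string) []map[string]float64 {
	df := map[string]int{}
	for _, doc := range docs {
		seen := map[string]bool{}
		for _, term := range doc {
			if !seen[term] {
				df[term]++
				seen[term] = true
			}
		}
	}
	n := float64(len(docs))
	scores := make([]map[string]float64, len(docs))
	for i, doc := range docs {
		tf := map[string]float64{}
		for _, term := range doc {
			tf[term]++
		}
		s := map[string]float64{}
		for term, count := range tf {
			s[term] = (count / float64(len(doc))) * math.Log(n/float64(df[term]))
		}
		scores[i] = s
	}
	return scores
}

func main() {
	docs := [][]string{{"go", "graphql"}, {"go", "react"}}
	fmt.Println(tfidf(docs)[0])
}
```

A term shared by every profile (like `go` above) scores zero, which is exactly why the rarer repository tags and descriptions end up driving the similarity.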

Aside from the accuracy of the matching algorithm, another issue I faced initially was how expensive the matching was to perform. At one point, the server just crashed from the intensive calculation and sorting (sorting drove the CPU to 100% on my cheap Linode instance running Docker). Loading everything into memory wasn't the smartest choice either, as I soon faced out-of-memory (OOM) issues that caused constant crashes. After several attempts at profiling (another reason to choose Golang over Node - its profiling tools were much more mature at the time and let me find the bottleneck in the application), I managed to optimize the algorithm and rewrote it to work in batches instead of loading everything into memory.
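The batching rewrite boils down to processing fixed-size index ranges instead of one giant in-memory slice; a simplified version of that idea:

```go
package main

import "fmt"

// batchRanges splits [0, total) into half-open [lo, hi) ranges of at
// most size elements, so each batch can be loaded, scored and released
// before the next one.
func batchRanges(total, size int) [][2]int {
	var out [][2]int
	for lo := 0; lo < total; lo += size {
		hi := lo + size
		if hi > total {
			hi = total
		}
		out = append(out, [2]int{lo, hi})
	}
	return out
}

func main() {
	fmt.Println(batchRanges(10, 4)) // [[0 4] [4 8] [8 10]]
}
```

Peak memory then depends on the batch size, not the total number of profiles, which is what stopped the OOM crashes.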

😊 After everything worked, I just ditched this project and continued working on other things. You can see a screenshot of the application below (it is no longer hosted, but a picture paints a thousand words).

Github Recommender

Finanz | Visualize Your Expenses

Last updated: 15 June 2020

Your Personal Finance Manager. Link here

An application to manage and visualize your expenses.

Finanz

Finanz is an application that I wrote for myself to manage my expenses. While most people experiment with different frontend frameworks through a TODO app, I do it with Finanz. If you have visited my Behance page, you might have seen the different iterations of this app - in fact, it is something I have been working on progressively.

I have rewritten this app with different tech stacks:
  • Frameworks: from jQuery to Backbone, to React, Vue and back to React (with React Hooks and styled components)
  • Platform: Google Chrome Extension, Electron, Web App (present), React Native (not pursued)
  • Storage: From nedb, indexedDB (attempted to do offline storage, but couldn't get it to sync when online), Firebase (present)

Though the frameworks changed (mainly due to code structure, such as when React Hooks were introduced), the business logic is plain and simple - aggregate transactions and present them to the user, whether as a daily, weekly, monthly, yearly or total balance.

The attempts with various frameworks also taught me their limitations. For example, the present stack uses Firebase Realtime Database as the storage (because I did not want to pay for storage, and since I am the only user, I can go lean). Yes, I know there's Firestore now, but the point is that Firebase is not the most ideal choice for several reasons - the primary one being that the business logic is handled on the client side, when it should ideally be handled on the server. Also, to aggregate the total balance, all the data has to be fetched from Firebase and computed on the client. Snapshotting the data might work, but comes with a risk of data inconsistency (transactions in Firebase only work with Firestore at present, if I am not mistaken).

In terms of presentation, the UI has definitely changed a lot too. The very first version had only two screens (and it was a Chrome Extension).

Home Page Create

😊 I will probably rewrite it again in the near future.

My Personal Website

Last updated: 19 May 2020

You are looking at it now! 😃

Why is my first website also my favorite? Because it is simple.

It was June 2014 when I decided to create my personal website.

With my limited knowledge of HTML and CSS, I designed my website with as little code as possible. That aligns with the principle: make it work, make it right, make it fast. 😊

Technology Stack

I have rewritten the site a few times over the years, mainly to keep up to date with new technologies. But simplicity is still at its core.

  • Past: from vanilla JavaScript to jQuery to Backbone.js to hyperapp, with plain CSS3
  • Present: Svelte, now with CSS Variables and CSS Grid layout

Keeping things simple is hard, especially when you are equipped with more knowledge. When I look back at the old code I wrote years ago, I tend to ponder - was I the one who wrote this? Then I look at my current code and rewrite it to be simpler. Somehow I just need to remind myself not to complicate things just because I can.

Design Principles

I have learned and adopted several design principles when designing this site. These include:

  • limiting color choices to just red, white and shades of black
  • structuring components using Atomic Design Pattern
  • using the Block Element Modifier (BEM) naming methodology (before CSS was scoped, this is the convention I used to avoid clashing class names)
  • mixing two different fonts, a.k.a. font pairing, to make the titles stand out from the content
  • applying Modular Scale (perfect fourth: 4 / 3) and Vertical Rhythm to improve readability of the site

I think applying patterns, and being aware of the philosophy behind each decision, makes a difference in the end result. See the difference in one of the pages below.

Before:

Old Website

After:

New Website

It keeps getting better and better. 😄