Casual Inference

Creating Executable Shiny App for Local Use

Posted on December 27, 2024 | 6 minutes | 1110 words | John Lee

I recently developed an R Shiny app for my team. Hosting a Shiny app on shinyapps.io is an easy way to quickly deploy it online and share it with others. The site lets you host a few Shiny apps for free, but there are some limitations: there is a monthly cap on the number of hours apps can run, and anyone can access the app once it’s deployed. [Read More]

visualization

woRdle Play

Posted on August 23, 2022 | 8 minutes | 1683 words | John Lee

Intro After watching 3Blue1Brown’s video on solving Wordle using information theory, I decided to try my own approach, a similar one based on probability. His take on combining word frequency with expected information gain, quantified in bits, to find the solution was interesting. This is a great approach, especially when playing against a person who may choose to play a word that’s not in the predefined list of the official Wordle website. [Read More]

coding prediction

wordle_guesser

Posted on August 18, 2022 | 1 minute | 73 words | John Lee

Wordle is a game currently owned and published by The New York Times that became massively popular during the COVID-19 pandemic. A player has six attempts to guess a five-letter word that changes each day, and each attempt comes with hints for each letter of the word. You can find more detail about the game on Wikipedia or on the game’s website itself. [Read More]

coding

Linear Regression on Coffee Rating Data

Posted on January 7, 2021 | 14 minutes | 2813 words | John Lee

As I read through Elements of Statistical Learning, I figured it would be a good idea to try out the machine learning methods introduced in the book. I just finished the chapter on linear regression and learned more about it and about the penalized methods (Ridge and Lasso). Since abundant resources are available online, it would be redundant to go into the details. I’ll briefly go over Ordinary Least Squares, Ridge, and Lasso regression, and show an application of those methods in R. [Read More]

linear regression regression

UIUC Public GPA Dataset Exploration with Shiny

Posted on December 28, 2020 | 1 minute | 205 words | John Lee

Last year, I thought it would be a good idea to dig through the GPA data set available from here. I started building a Shiny app that lets the user explore certain aspects of the data. Now it’s been almost a year, and I haven’t had the chance or the will to work on it until now. I kept it really simple so that I can quickly move on to other topics instead of dragging this on for another year with an unfinished product. [Read More]

coding visualization

Grasping Power

Posted on November 10, 2019 | 12 minutes | 2377 words | John Lee

I was reading a paper on calculating sample sizes, and I inevitably came across the topic of statistical power. Essentially, when you’re designing an experiment, the sample size is an important factor to consider because resources are limited. You want a sample size that is neither too small (which could result in a high chance of failing to detect true differences) nor too big (a potential waste of resources, albeit yielding better estimates). [Read More]

coding theory visualization

The Phi Function

Posted on October 10, 2019 | 3 minutes | 446 words | John Lee

I frequently encounter the $\Phi$ and $\Phi^{-1}$ functions in statistical texts. For some reason, the notation always catches me off guard, and I have to spend a few minutes visualizing. This post draws a definitive link between the functions and their corresponding graphs. This ought to help me save some time and build a more solid understanding of the concepts that make use of them. The $\Phi$ function is simply the cumulative distribution function, $F$, of a standard normal distribution. [Read More]

Sorting Comparison Pt. 2

Posted on August 4, 2019 | 13 minutes | 2641 words | John Lee

Load all the datasets that I’ve saved from the previous benchmarks:

set.seed(12345)
library(microbenchmark)
library(tidyverse)
library(knitr)
library(kableExtra)
load("2019-03-01-sorting-comparison/sort_comparisons")

Blowing off the Dust

I see that two variables, special_case_sort_time and trend_sort_time, are loaded in my environment. It’s been a long time since I created these data, so I have only an unclear memory of what these objects are. I usually use str and class to understand what they are. I also usually use head to quickly glance at the data if it is a data. [Read More]

coding visualization

Sorting Comparison

Posted on March 8, 2019 | 20 minutes | 4185 words | John Lee

As I’m self-studying algorithms and data structures with Python from here, I figured I could try some experiments with different sorting algorithms using my own implementations in R. Types of sorting algorithms I will use: Bubble Sort, Insertion Sort, Selection Sort, Shell Sort, Merge Sort, and Quick Sort. I will be dealing with a vector of type double. It can be a collection of any positive real numbers. [Read More]

coding

Two-Dimension LDA

Posted on February 4, 2019 | 6 minutes | 1236 words | John Lee

LDA, Linear Discriminant Analysis, is a classification method and a dimension reduction technique. I’ll focus more on classification. LDA calculates a linear discriminant function (which arises from assuming a Gaussian distribution) for each class, and chooses the class that maximizes this function. The linear discriminant function therefore dictates a linear decision boundary for choosing a class. The decision boundary should be linear in the feature space. Discriminant analysis itself isn’t inherently linear. [Read More]

regression classification