Samples

Here are a few examples of projects I have worked on.

ISTRBoost: Importance Sampling Transfer Regression using Boosting

ISTRBoost: Importance Sampling Transfer Regression using Boosting (arxiv.org link)

An implementation of a machine learning method that improves transfer learning for regression while reducing the risk of overfitting and negative transfer.

Partial abstract:

Current Instance Transfer Learning (ITL) methodologies use domain adaptation and sub-space transformation to achieve successful transfer learning. However, these methodologies, in their processes, sometimes overfit on the target dataset or suffer from negative transfer if the test dataset has a high variance. Boosting methodologies have been shown to reduce the risk of overfitting by iteratively re-weighting instances with high residuals. However, this balance is usually achieved with parameter optimization, as well as reducing the skewness in weights produced due to the size of the source dataset. While the former can be achieved, the latter is more challenging and can lead to negative transfer. We introduce a simpler and more robust fix to this problem by building upon the popular boosting ITL regression methodology, two-stage TrAdaBoost.R2.
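
To make the re-weighting idea in the abstract concrete, here is a minimal sketch of one weight update in the style of AdaBoost.R2 with a linear loss. This is not the ISTRBoost or two-stage TrAdaBoost.R2 procedure from the paper; the function name `adaboost_r2_reweight` and its arguments are hypothetical and only illustrate how instances with large residuals gain relative weight after an update.

```python
import numpy as np


def adaboost_r2_reweight(weights, y_true, y_pred):
    """One AdaBoost.R2-style weight update with a linear loss (illustrative only)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # work with a normalized distribution

    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    max_residual = residual.max()
    if max_residual == 0:
        # Perfect fit: nothing to re-weight.
        return weights

    # Linear loss per instance, scaled into [0, 1].
    loss = residual / max_residual
    avg_loss = float(np.sum(weights * loss))
    if avg_loss >= 0.5:
        # AdaBoost.R2 stops boosting at this point; leave weights unchanged.
        return weights

    # beta < 1, so well-predicted instances (loss near 0) are shrunk the most,
    # which raises the relative weight of high-residual instances.
    beta = avg_loss / (1.0 - avg_loss)
    new_weights = weights * beta ** (1.0 - loss)
    return new_weights / new_weights.sum()


if __name__ == "__main__":
    w = adaboost_r2_reweight([0.25] * 4, [1, 2, 3, 4], [1, 2, 3, 6])
    print(w)
```

In this toy call, only the last instance is mispredicted, so its share of the total weight grows while the other three shrink equally.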

Paradise: real-time, generalized, and distributed provenance-based intrusion detection

Paradise: real-time, generalized, and distributed provenance-based intrusion detection (Google Scholar Link)

This paper proposes a distributed intrusion detection system using machine learning techniques.

Partial abstract:

Identifying intrusion from massive and multi-source logs accurately and in real-time presents challenges for today’s users. This paper presents Paradise, a real-time, generalized, and distributed provenance-based intrusion detection method. Paradise introduces a novel extract strategy to prune and extract process feature vectors from provenance dependencies at the system log level, and it stores them in high-efficiency memory databases. Using this strategy, Paradise does not depend on the specific operating system type or provenance collection framework. Provenance-based dependencies are calculated independently during the detection phase; thus, Paradise can negotiate all detection results from multiple detectors without extra communication overhead between detectors.

SoK: Identifying Mismatches Between Microservice Testbeds and Industrial Perceptions of Microservices

SoK: Identifying Mismatches Between Microservice Testbeds and Industrial Perceptions of Microservices (JSys link)

This paper identifies gaps between academic microservice testbeds and industrial practice, and proposes ways to make microservice benchmarks more realistic.

Partial abstract:

Microservices have increasingly become a popular way of designing and building large-scale distributed systems. The challenges in developing microservices-based distributed applications have given rise to much academic research. However, the benchmarks used in academia are far from real-world microservices-based applications. This paper fills this gap and proposes ways microservices benchmarks should evolve to be more realistic. The authors walk readers through the limitations of existing testbeds, report on interviews with industry participants that examine the distance between benchmarks and real-world microservice-based applications, and propose ways to improve existing testbeds.