Model vs data

How do we choose a reasonable starting point when modeling some data? In statistical inference this question matters, because we typically begin the analysis with a fairly simple model that represents the system or process with reasonable accuracy. That model can then be used to run nested sampling, or an equivalent method, to obtain a posterior p.d.f. or estimate.

Often, when building complex models from data, it is useful to start with a least-squares-optimized model, fitted to the data from an experiment, as a baseline.
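As a rough illustration of a least-squares baseline, here is a minimal sketch that fits a straight line to noisy synthetic data with NumPy. The linear model, the synthetic data, and names such as `true_slope` are assumptions made for this example only; a real analysis would use the experimental data and whatever simple model suits it.

```python
import numpy as np

# Synthetic "experimental" data (an assumption for illustration):
# a straight line plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
true_slope, true_intercept = 2.0, 1.0
y = true_slope * x + true_intercept + rng.normal(0.0, 0.5, size=x.size)

# Design matrix for the model y = slope * x + intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least-squares solution; the extra return values
# (residuals, rank, singular values) are not needed here.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"baseline slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The fitted `slope` and `intercept` can then serve as the starting point, or as the centre of a prior, for a more involved inference step.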


During the last couple of weeks, I’ve been simulating some data for a statistical inference project. While the topic of simulating data is quite broad and depends on the application, I thought that a more general approach of simulating data points that follow a particular probability distribution may be useful to those working on scientific experiments and mathematical modeling.

This post combines a few basic techniques to generate simulated data that follow a given probability density function (p.d.f.).
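One common way to draw samples from an arbitrary density is rejection sampling. The sketch below is only an illustration of that idea, not necessarily the technique used in this post. The target density `target_pdf` (a triangular density f(x) = 2x on [0, 1]) and the helper `rejection_sample` are hypothetical stand-ins chosen for this example.

```python
import numpy as np

def target_pdf(x):
    # Hypothetical target density for illustration: f(x) = 2x on [0, 1].
    return 2.0 * x

def rejection_sample(pdf, low, high, pdf_max, n, rng):
    """Draw n samples from pdf on [low, high] by rejection sampling.

    pdf_max must be an upper bound on pdf over [low, high].
    """
    samples = []
    while len(samples) < n:
        # Propose points uniformly in the bounding box of the density.
        x = rng.uniform(low, high, size=n)
        u = rng.uniform(0.0, pdf_max, size=n)
        # Keep the proposals that fall under the curve.
        samples.extend(x[u < pdf(x)])
    return np.array(samples[:n])

rng = np.random.default_rng(0)
samples = rejection_sample(target_pdf, 0.0, 1.0, 2.0, 10_000, rng)
print("sample mean:", samples.mean())  # theoretical mean of f(x) = 2x is 2/3
```

A histogram of `samples` with `density=True` can be overlaid on the analytic curve to check that the simulated data follow the intended distribution.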

Let’s assume that we have a p.d.f of the form,

Plotting this to verify,

The probability density function of a log-transformed random variable, whose p.d.f was a standard normal

I’ve been working in the log domain over the last couple of weeks, specifically using the natural logarithm, denoted by “ln”. Life has been easier this way.

The dataset I’m working on has a component driven by a random variable X that is distributed normally (a Gaussian distribution) with mean = 0 and standard deviation = 1.

In my last post, I transformed this dataset entirely to the log domain by applying the transformation x → ln(x) using NumPy and Python,

In case you missed it, the resulting discussion around this can be found here.

The component that makes…

Data was right. Data is always right.

I’ve been pretty busy working with some data from an experiment. I’m trying to fit a subset of the data to one or more model distributions, one of which is a normal distribution (in linear space). Sounds pretty simple, right?

Based on domain knowledge of this problem, I also know that the data can probably be fitted by a mixture model, and more specifically a Gaussian mixture model. Brilliant, you say! Why not try something like,

from sklearn.mixture import GaussianMixture
model = GaussianMixture(*my arguments/params*)
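For illustration, a minimal sketch of such a fit on synthetic data might look like the following. The two-component setup, the synthetic sample, and the variable names are assumptions made for this example; they are not the parameters or the data from the experiment.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data drawn from two Gaussians (an assumption for illustration).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(-2.0, 0.5, size=500),
    rng.normal(3.0, 1.0, size=500),
])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = data.reshape(-1, 1)

# Fit a two-component Gaussian mixture model.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Sort the fitted means so the components can be compared across runs.
means = np.sort(gmm.means_.ravel())
print("fitted component means:", means)
print("fitted component weights:", gmm.weights_)
```

On clean, well-separated synthetic data like this, the fitted means land close to the values used to generate the sample.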

But try as I might I couldn’t find parameters that should model the…

This was my second project at MAS Innovations and was extremely rewarding as we were able to build a completely new line of business for MAS Holdings with a brand presence in New York (and now global). The product is a reusable women’s undergarment capable of managing stress incontinence and light period flow with antimicrobial and anti-odour technologies embedded in the gusset.

We took a three-pronged approach based on the MIT Model of Innovation (Market, Implementation, and Technology) and were able to take this product to market in the US within a relatively short 12-month period. I served as…

I covered some asteroid and NEA basics in my last post. In this post, we will examine some NEA/NEO data and try to understand some metrics related to potentially hazardous NEOs (Near Earth Objects). Let’s begin.

Our exploration starts with some publicly available data from the CNEOS API. Specifically, I will be examining the following dataset.

Setting the filters to “Observed anytime”, “Any impact probability”, “Any Palermo scale” and “Any H” returns a dataset which, at the time of this writing, has 990 rows. I have converted the dataset to CSV and connected it to a…

Fig 1 — Asteroid 243 Ida as seen by the Galileo probe on August 28, 1993. Image Credit: NASA/JPL/Processed by Kevin M. Gill, Ida’s moon Dactyl is on the right

Our Solar System is a strange place and there’s a lot we don’t know and don’t fully understand. There’s no better way to reflect on this point than to take a historical perspective. The discovery and characterisation of the planets and other bodies in our Solar System can serve as a great starting point. I’ve been spending some time on Solar System dynamics and thought I’d take a close look at asteroids, specifically the Main Asteroid Belt between Mars and Jupiter as well as Near Earth Asteroids (NEAs). There’s much to learn here so let’s dive in!

What are asteroids?

Open edX Logo

Digital Ocean (DO) is a public cloud service provider and a good alternative to the more popular AWS, Azure and Google Cloud. I came across DO thanks to their excellent documentation which, in my opinion, blows AWS out of the water. While AWS will try to sell you on getting their training and accreditation, DO assumes that you just want to learn straight away and start implementing.

Having said that, I hadn’t really implemented anything on DO; I had just played around with something my friend Giles had put together for his shiny-server.

Giles and I have been working on Programming…

Praveen Jayasuriya
