Tuesday, November 21, 2017

Dynamic information equilibrium: UK CPI

The dynamic information equilibrium CPI model doesn't just apply to the US. Here is the UK version (data is yellow, model is blue):


The forecast is for inflation to run at about 2.1% (close to the US dynamic equilibrium of 2.5%) in the absence of shocks:
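For concreteness, this forecast comes from the dynamic information equilibrium form used throughout these posts: the log of the CPI grows at a constant rate α except during shocks, which enter as logistic functions. Here is a minimal sketch in Python (the shock parameters are made-up placeholders, not the fitted UK values):

```python
import numpy as np

def dynamic_equilibrium_log_cpi(t, alpha, shocks, c=0.0):
    """log CPI(t) = c + alpha * t + sum of logistic shocks.

    alpha  : dynamic equilibrium rate (continuously compounded inflation)
    shocks : list of (a, b, t0) with amplitude a, width b, center t0
    """
    log_cpi = c + alpha * t
    for a, b, t0 in shocks:
        log_cpi += a / (1.0 + np.exp(-(t - t0) / b))
    return log_cpi

# Illustrative only: a ~2.1% equilibrium with one made-up demographic shock.
t = np.arange(1960, 2025, 0.25)
log_cpi = dynamic_equilibrium_log_cpi(t, alpha=0.021,
                                      shocks=[(0.5, 3.0, 1978.0)])
inflation = np.gradient(log_cpi, t)  # -> approaches alpha away from the shock
```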



Monday, November 20, 2017

Numerical experiments and the paths to scientific validity

Christiano et al got much more attention than their paper deserved by putting a few choice lines in it (Dilettantes! Ha!). Several excellent discussions of the paper — in particular this aspect — are available from Jo Mitchell, Brad DeLong (and subsequent comments), and Noah Smith.

I actually want to defend one particular concept in the paper (although as with most attempts at "science" by economists, it comes off as a nefarious simulacrum). This will serve as a starting point to expand on how exactly we derive knowledge from the world around us. The idea of "DSGE experiments" was attacked by DeLong, but I think he misidentifies the problem [1]. Here is Christiano et al:
The only place that we can do experiments is in dynamic stochastic general equilibrium (DSGE) models.
This line was attacked for its apparent misuse of the word "experiment", as well as for the use of "only". It's unscientific! the critics complain. But here's an excerpt from my thesis:
The same parameters do a good job of reproducing the lattice data for several other quark masses, so the extrapolation to the chiral limit shown in Fig. 2.3 is expected to allow a good qualitative comparison with the instanton model and the single Pauli-Villars subtraction used in the self-consistent calculations.
Lattice data. I am referring to the output of lattice QCD computations (somewhat analogous to using e.g. the trapezoid rule to compute an integral) as "data", i.e. as the output of observations. Robert Waldmann, in comments on DeLong's post, makes a distinction between hypothesis (science) and conjecture (math) that would rule out this "lattice QCD data" as the result of "lattice QCD experiments". But this distinction is far too strict, as it would rule out actual science done by actual scientists (i.e. physicists, e.g. me).

Saying "all simulations derived from theory are just math, not science" misses the nuance provided by understanding how we derive knowledge from the world around us, and lattice QCD provides us with a nice example. The reason we can think of lattice QCD simulations as "experiments" that produce "data" is that we can define a font of scientific validity sourced from empirical success. The framework lattice QCD works with (quantum field theory) has been extensively empirically validated. The actual theory lattice QCD uses (QCD) has been empirically validated at high energy. As such, we can believe the equations of QCD represent some aspect of the world around us, and therefore simulations using them are a potential source of understanding that world. Here's a graphic representing this argument:


Of course, the lattice data could disagree with observations. In that case we'd likely try to understand the errors in the assumptions we made in order to produce tractable simulations, or possibly limit the scope of QCD (e.g. QCD fails at low energy, Q² < 1 GeV²).

The reason the concept of DSGE models as "experiments" is laughable is that it fails every step in this process:


Not only does the underlying framework (utility maximization) disagree with data in many cases, but the final output of DSGE models disagrees as well. The methodology isn't flawed — its execution is.

*  *  *

The whole "dilettantes" fracas is a good segue into something I've been meaning to write for a while now about the sources of knowledge about the real world. I had an extended Twitter argument with Britonomist about whether having a good long think about a system is a scientific source of knowledge about that system (my take: it isn't).

Derivation

The discussion above represents a particular method of acquiring knowledge about the world around us that I'll call derivation for obvious reasons. Derivation is a logical inference tool: it takes empirically validated mathematical descriptions of some piece of reality and attempts to "derive" new knowledge about the world. In the case of lattice QCD, we derive some knowledge about the vacuum state based on the empirical success of quantum field theory (math) used to describe e.g. magnetic moments and deep inelastic scattering. The key here is understanding the model under one scope condition well enough that you can motivate its application to others. Derivation uses the empirical validity of the mathematical framework as its source of scientific validity.

Observation

Another method is the use of controlled experiments and observation. This is what a lot of people think science is, and it's how it's taught in schools. Controlled experiments can give us information about causality, but one of the key things all forms of observation do is constrain the complexity of what the underlying theory can be through what is sometimes derided as "curve fitting" (regressions). Controlled experiments and observation mostly exclude potential mathematical forms that could be used to describe the data. A wonderful example of this is blackbody radiation in physics. The original experiments basically excluded various simple calculations based on Newton's mechanics and Maxwell's electrodynamics. Fitting the blackbody radiation spectrum with functional forms of decreasing complexity ultimately led to Planck's single-parameter formula that paved the way for quantum mechanics. The key assumption here is essentially Hume's uniformity of nature, to varying degrees depending on the degree of control in the experiment. Observation uses its direct connection to empirical reality as its source of scientific validity.
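As a toy illustration of how curve fitting excludes functional forms (purely illustrative, with synthetic data and made-up parameters, not the historical analysis), here is a sketch fitting both a Planck-like form and a classical Rayleigh-Jeans-like form to the same "spectrum":

```python
import numpy as np
from scipy.optimize import curve_fit

def planck_form(x, amp, T):
    # Planck-like spectrum in a dimensionless frequency x
    return amp * x**3 / np.expm1(x / T)

def classical_form(x, amp):
    # Rayleigh-Jeans-like form: rises without bound at high frequency
    return amp * x**2

rng = np.random.default_rng(0)
x = np.linspace(0.5, 15.0, 200)
data = planck_form(x, 1.0, 1.0) + 0.02 * rng.normal(size=x.size)

p1, _ = curve_fit(planck_form, x, data, p0=[1.0, 1.0])
p2, _ = curve_fit(classical_form, x, data, p0=[1.0])

sse = lambda f, p: np.sum((f(x, *p) - data) ** 2)
print(sse(planck_form, p1), sse(classical_form, p2))  # the classical form is clearly excluded
```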
 
Indifference

A third method is the application of the so-called "principle of indifference" that forms the basis of statistical mechanics in physics and is codified in various "maximum entropy" approaches (such as the one used on this blog). We as theorists plead ignorance of what is "really happening" and just assume what we observe is the most likely configuration of many constituents given various constraints (observational or theoretical). Roderick Dewar has a nice paper explaining how this process is a method of inference giving us knowledge about the world, and not just additional assumptions in a derivation. As mentioned, the best example is statistical mechanics: Boltzmann assumed simply that there were lots of atoms underlying matter (which was not established at the time) and used probability to draw conclusions about the most likely states — setting up a framework that accurately describes thermodynamic processes. The key assumption here is that the number of underlying degrees of freedom is large (making our probabilistic conclusions sharper), and "indifference" uses the empirical accuracy of its conclusions as the source of its scientific validity.
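A minimal sketch of the Boltzmann-style argument (illustrative only, not a derivation): pick a configuration of energies uniformly at random subject only to a fixed total, and the single-particle energy distribution comes out exponential (Boltzmann-like).

```python
import numpy as np

rng = np.random.default_rng(42)
n_particles, total_energy = 10_000, 10_000.0  # average energy = 1

# Pick a configuration uniformly at random from all ways of splitting
# total_energy among n_particles (uniform on the simplex, via spacings).
cuts = np.sort(rng.uniform(0, total_energy, n_particles - 1))
energies = np.diff(np.concatenate(([0.0], cuts, [total_energy])))

# The marginal distribution of a single particle's energy comes out
# roughly exponential, even though we assumed nothing but indifference
# over configurations subject to the constraint.
hist, edges = np.histogram(energies, bins=50, density=True)
print(hist[:5])  # approximately exp(-E/<E>) with <E> = 1
```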
 
Other paths?

This list isn't meant to be exhaustive, and there are probably other (yet undiscovered!) paths to scientific validity. The main conclusion here is that empirical validity in some capacity is necessary to achieve scientific validity. Philosophizing about a system may well be fun and lead to convincing plausible stories about how that system behaves. And that philosophy might be useful for e.g. making decisions in the face of an uncertain future. But regardless of how logical it is, it does not produce scientific knowledge about the world around us. At best it produces rational results, not scientific ones.

In a sense, it's true that the descriptions above form a specific philosophy of science, but they're also empirically tested methodologies. They're the methodologies that have been used in the past to derive accurate representations of how the world around us works at a fundamental physical level. It is possible that economists (including Christiano et al) have come up with another path to knowledge about the world around us where you can make invalid but prima facie sensible assumptions about how things work and derive conclusions, but it isn't a scientific one.

...

Footnotes:

[1] Actually, the problem seems misidentified in a way similar to how Friedman's "as if" methodology is misidentified: the idea is fine (in science it is called "effective theory"), but the application is flawed. Friedman seemed to first say that matching the data is what matters (yes!), but then didn't seem to care when his preferred theories didn't match the data (gah!).

Friday, November 17, 2017

The "bottom up" inflation fallacy

Tony Yates has a nice succinct post from a couple of years ago about the "bottom up inflation fallacy" (brought up in my Twitter feed by Nick Rowe):
This "inflation is caused by the sum of its parts" problem rears its head every time new inflation data gets released. Where we can read that inflation was 'caused' by the prices that went up, and inhibited by the prices that went down.
I wouldn't necessarily attribute the forces that make this fallacy a fallacy to the central bank as Tony does — at the very least, if central banks can control inflation, why are many countries (US, Japan, Canada) persistently undershooting their stated or implicit targets? But you don't really need a mechanism to understand this fallacy, because it's actually a fallacy of general reasoning. If we look at the components of inflation for the US (data from here), we can see various components rising and falling:


While the individual components move around a lot, the distribution remains roughly stable — except for the case of the 2008-9 recession (see more here). It's a bit easier to see the stability using some data from MIT's billion price project. We can think of the "stable" distribution as representing a macroeconomic equilibrium (and the recession being a non-equilibrium process). But even without that interpretation, the fact that an individual price moves still tells us almost nothing about the other prices in the distribution if that distribution is constant. And it's definitely not a causal explanation.

It does seem to us as humans that if there is something maintaining that distribution (central banks per Tony), then an excursion by one price (oil) is being offset by another (clothing) in order to maintain that distribution. However, there does not have to be any force acting to do so.

For example, if the distribution is a maximum entropy distribution, then it is maintained simply by the fact that it is the most likely distribution (consistent with constraints). In the same way that it is unlikely that all the air molecules in your room will move to one side of it, it is just unlikely that all the prices will move in one direction — but they easily could. For molecules, that probability is tiny because there are huge numbers of them. For prices, that probability is not as negligible. In physics, the pseudo-force "causing" the molecules to maintain their distribution is called an entropic force. The molecules that make up the smell of cooking bacon spread around a room in a way that looks like they're being pushed away from their source, but there is no force on the individual molecules making that happen. There is a macro pseudo-force (diffusion), but there is no micro force corresponding to it.
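A back-of-the-envelope version of that comparison, assuming independent 50/50 up-or-down moves (a deliberately crude model, just to show the scales involved):

```python
from math import log10

def log10_prob_all_same_direction(n):
    # P(all n independent 50/50 components move the same way) = 2 * 0.5**n
    return log10(2.0) - n * log10(2.0)

print(log10_prob_all_same_direction(200))     # ~ -60: a few hundred price components
print(log10_prob_all_same_direction(10**23))  # astronomically small: molecules in a room
```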

I've speculated that this general idea is involved in so-called sticky prices in macroeconomics. Macro mechanisms like Calvo pricing are in fact just effective descriptions at the macro scale, and therefore studies that look at individual prices (e.g. Eichenbaum et al 2008) will not see sticky prices.

In a sense, yes, macro inflation is due to the price movements of thousands of individual prices. And it is entirely possible that you could build a model where specific prices offset each other via causal forces. But you don't have to, and there are ways of constructing a model in which there isn't necessarily any way to match up macro inflation with specific individual changes, because macro inflation is about the distribution of all price changes. That's why I say the "bottom up" fallacy is a fallacy of general reasoning, not just a fallacy according to the way economists understand inflation today: it assumes a peculiar model. And as Tony tells us, that's not a standard macroeconomic model (which is based on central banks setting e.g. inflation targets).

You can even take this a bit further and argue against the position that microfoundations are necessary for a macroeconomic model. It is entirely possible for macroeconomic forces to exist for which there are no microeconomic analogs. Sticky prices are a possibility; Phillips curves are another. In fact, even rational representative agents might not exist at the scale of human beings, but could be perfectly plausible effective degrees of freedom at the macro scale (per Becker 1962 "Irrational Behavior and Economic Theory", which I use as the central theme in my book).

Thursday, November 16, 2017

Unemployment rate step response over time

One of the interesting effects I noticed in looking at the unemployment rate in earlier recessions with the dynamic equilibrium model was what looked like "overshooting" (step response "ringing" transients). For fun, I thought I'd try to model the recession responses using a simple "two pole" model (a second-order low-pass system).
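For reference, here is a minimal sketch of the step response being fit — the standard result for an underdamped second-order low-pass system (the parameter values below are placeholders, not the fitted ones):

```python
import numpy as np

def step_response(t, omega, zeta, amplitude=1.0):
    """Unit step response of a second-order low-pass system (zeta < 1).

    omega : natural frequency, zeta : damping ratio.
    The "ringing" oscillates at the damped frequency omega_d = omega*sqrt(1 - zeta**2).
    """
    omega_d = omega * np.sqrt(1.0 - zeta**2)
    envelope = np.exp(-zeta * omega * t)
    return amplitude * (1.0 - envelope * (np.cos(omega_d * t)
                                          + (zeta * omega / omega_d) * np.sin(omega_d * t)))

t = np.linspace(0, 10, 500)                 # time after the shock
u = step_response(t, omega=2.0, zeta=0.3)   # overshoots, then rings down
```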

For example, here is the log-linear transformation of the unemployment rate that minimizes entropy:


If we zoom in on one of the recessions in the 1950s, we can fit it to the step response:


I then fit several more recessions. Here are the results, transformed back to the original data representation (unemployment rate in percent) and compiled:


Overall, this was just a curve fitting exercise. However, what was interesting was how the parameters change over time. These graphs show the frequency parameter ω and the damping parameter ζ:


Over time, the frequency falls and the damping increases. We can also show the damped frequency ω_d = ω √(1 − ζ²), a particular combination of the two (this is the frequency we'd actually estimate from looking directly at the oscillations in the plot):


With the exception of the 1970 recession, this shows a roughly constant, fairly high frequency that falls after the 1980s to a lower, roughly constant frequency.

At this point, this is just a series of observations. This model adds far too many parameters to really be informative (for e.g. forecasting). What is interesting is that the step response in physics results from a sharp shock hitting a system with a band-limited response (i.e. the system cannot support all the high frequencies present in the sharp shock). This would make sense — in order to support higher frequencies, you'd probably have to have people entering and leaving jobs at rates close to monthly or even weekly. While some people might take a job for a month and quit, they likely don't make up the bulk of the labor force. This doesn't really reveal any deep properties of the system, but it does show how unemployment might well behave like a natural process (contra many suggestions e.g. that it is definitively a social process that cannot be understood in terms of mindless atoms or mathematics).

Wednesday, November 15, 2017

New CPI data and forecast horizons

New CPI data is out, and here is the "headline" CPI model, last updated a couple of months ago:


Compared to the last update, I did change the error bars on the derivative data to show the 1-sigma errors instead of the median error. The level forecast still shows the 90% confidence interval for the parameter estimates.

Now why wasn't I invited to this? One of the talks was on forecasting horizons:
How far can we forecast? Statistical tests of the predictive content
Presenter: Malte Knueppel (Bundesbank)
Coauthor: Jörg Breitung
A version of the talk appears here [pdf]. One of the measures they look at is year-over-year CPI, which according to their research seems to have a forecast horizon of 3 quarters — relative to a stationary ergodic process. The dynamic equilibrium model is approaching 4 quarters:


The thing is, however, the way the authors define whether the data is uninformative is relative to a "naïve forecast" that's constant. The dynamic equilibrium forecast does have a few shocks — one centered at 1977.7 associated with the demographic transition of women entering the workforce, and one centered at 2015.1 that I've tentatively associated with baby boomers leaving the workforce [0] after the Great Recession (the one visible above) [1]. But the forecast for the period from the mid-90s (after the 70s shock ends) until the start of the Great Recession would in fact be this "naïve forecast":


The post-recession period does involve a non-trivial (i.e. not constant) forecast, so it could be "informative" in the sense of the authors above. We will see if it continues to be accurate beyond their forecast horizon. 
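To make the comparison concrete, here is a toy sketch of the underlying idea (this is not the authors' actual statistical test, just the compare-to-a-naive-benchmark notion behind it; the numbers are hypothetical):

```python
import numpy as np

def relative_rmse(actual, forecast, naive_level):
    """RMSE of a forecast divided by RMSE of a constant 'naive' forecast.

    Values below 1 mean the forecast beats the naive benchmark; values
    near or above 1 mean it adds no predictive content at that horizon.
    """
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    rmse = lambda err: np.sqrt(np.mean(err ** 2))
    return rmse(forecast - actual) / rmse(naive_level - actual)

# Hypothetical year-over-year inflation numbers, illustration only:
print(relative_rmse([2.0, 2.2, 1.9], [2.1, 2.1, 2.0], naive_level=1.7))
```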

...

Footnotes

[0] Part of the reason for positing this shock is its existence in other time series.

[1] In the model, there is a third significant negative shock centered at 1960.8 associated with a general slowdown in the prime age civilian labor force participation rate. I have no firm evidence of what caused this, but I'd speculate it could be about women leaving the workforce in the immediate post-war period (the 1950s-60s "nuclear family" presented in propaganda advertising) and/or the big increase in graduate school attendance.

Friday, November 10, 2017

Why k = 2?

I put up my macro and ensembles slides as a "Twitter talk" (Twalk™?) yesterday and it reminded me of something that has always bothered me since the early days of this blog: Why does the "quantity theory of money" follow from the information equilibrium relationship N ⇄ M for information transfer index k = 2?

From the information equilibrium relationship dN/dM = k N/M (with the abstract price P ≡ dN/dM), we can show log N ~ k log M and therefore log P ~ (k − 1) log M. This means that for k = 2

log P ~ log M

That is to say the rate of inflation is equal to the rate of money growth for k = 2. Of course, this is only empirically true for high rates of inflation:


But why k = 2? It seems completely arbitrary. In fact, it is so arbitrary that we shouldn't really expect the high inflation limit to obey it. The information equilibrium model allows all positive values of k. Why does it choose k = 2? What is making it happen?

I do not have a really good reason. However, I do have some intuition.

One of the concepts in physics that the information equilibrium approach is related to is diffusion. In that case, most values of k represent "anomalous diffusion". But ordinary diffusion with a Wiener process (a random walk based on a normal distribution) results in a spread of distances traveled that goes as the square root of time: σ ~ √t. That square root arises from the normal distribution, which is universal in the sense of the central limit theorem (a large class of distributions converges to it). Put another way:

2 log σ ~ log t

is an information equilibrium relationship t ⇄ σ with k = 2.
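As a quick numerical check of that scaling (not part of the argument, just a sketch): simulate Wiener-process paths and regress log σ against log t; the slope comes out close to 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 5000, 1000
steps = rng.normal(size=(n_paths, n_steps))
paths = np.cumsum(steps, axis=1)   # Wiener-process (random walk) paths

t = np.arange(1, n_steps + 1)
sigma = paths.std(axis=0)          # spread across paths at each time

slope = np.polyfit(np.log(t), np.log(sigma), 1)[0]
print(slope)  # ~0.5, i.e. sigma ~ sqrt(t), or 2 log sigma ~ log t
```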

If we think of output as a diffusion process (distance is money, time is output), we can say that in the limit of a large number of steps, we obtain

2 log M ~ log N

as a diffusion process, which implies log P ~ log M.

Of course, there are some issues with this besides it being hand-waving. For one, output is the independent variable corresponding to time. This does not reproduce the usual intuition that money should be causing the inflation, but rather the reverse (the spread of molecules in diffusion is not causing time to go forward [1]). But then applying the intuition from a physical process to an economic one via an analogy is not always useful.

I tried to see if it came out of some assumptions about money M mediating between nominal output N and aggregate supply S, i.e. the relationship

N ⇄ M ⇄ S

But all that told me was that if the IT index k in the first half is k = 2 (per above), then the IT index k' for M ⇄ S would have to be 1 + φ or 2 − φ, where φ is the golden ratio, in order for the equations to be consistent. The latter value k' = 2 − φ ≈ 0.38 implies that the IT index for N ⇄ S is k k' ≈ 0.76, while the former implies k k' ≈ 5.24. But that's not important right now. It doesn't tell us why k = 2.

Another place to look would be the symmetry properties of the information equilibrium relationship, but k = 2 doesn't seem to be anything special there.

I thought I'd blog about this because it gives you a bit of insight into how physicists (or at least this particular physicist) tend to think about problems — as well as pointing out flaws (i.e. the ad hoc nature) in the information equilibrium approach to the quantity theory of money/AD-AS model in the aforementioned slides. I'd also welcome any ideas in comments.

...

Footnotes:

[1] Added in update. You could make a case for the "thermodynamic arrow of time", in which case the increase in entropy is actually equivalent to "time going forward".

Interest rates and dynamic equilibrium

What if we combine an information equilibrium relationship A ⇄ B with a dynamic information equilibrium description of the inputs A and B? Say, the interest rate model (described here) with dynamic equilibrium for investment and the monetary base? Turns out that it's interesting:



The first graph is the long term (10-year) rate and the second is the short term (3 month secondary market) rate. Green is the information equilibrium model alone (i.e. the data as input), while the gray curves show the result if we use the dynamic equilibria for GPDI and AMBSL (or CURRSL) as input.
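Schematically, the combination works like this (a minimal sketch only; the functional form, parameters, and shock values are placeholders, not the fitted model): build dynamic equilibrium paths for investment and the monetary base, then run them through a single information equilibrium relationship to get a rate.

```python
import numpy as np

def dyn_eq(t, alpha, shocks):
    """Dynamic equilibrium form: log X(t) = alpha*t + sum of logistic shocks."""
    out = alpha * t
    for a, b, t0 in shocks:
        out += a / (1.0 + np.exp(-(t - t0) / b))
    return out

t = np.arange(1960, 2020, 0.25)
log_I = dyn_eq(t, 0.05, [(-0.3, 1.0, 2008.8)])   # placeholder investment path
log_M = dyn_eq(t, 0.06, [(0.8, 1.5, 2009.0)])    # placeholder monetary base path

# Placeholder information equilibrium mapping to an interest rate:
# log r = (log(I/M) - b) / c, with c and b treated as fit parameters.
c, b = 3.0, -1.0
log_r = (log_I - log_M - b) / c
```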

Here is the GPDI dynamic equilibrium description for completeness (the link above uses fixed private investment instead of gross private domestic investment which made for a better interest rate model):


Wednesday, November 8, 2017

A new Beveridge curve or, Science is Awesome

What follows is speculative, but it is also really cool. A tweet about how the unemployment rate would be higher if labor force participation were at its previous higher level intrigued me. Both the unemployment rate and labor force participation are pretty well described by the dynamic information equilibrium model. Additionally, if you have two variables obeying dynamic equilibrium models, you end up with a Beveridge curve as the long-run behavior if you plot them parametrically.
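Here is a minimal sketch of that last claim (all parameter values are made up): give two series the dynamic equilibrium form with different equilibrium rates, and away from the shocks the parametric plot of one against the other traces out a Beveridge-curve-like curve.

```python
import numpy as np

def dyn_eq(t, alpha, shocks):
    """Dynamic equilibrium form: log X(t) = alpha*t + sum of logistic shocks."""
    out = alpha * t
    for a, b, t0 in shocks:
        out += a / (1.0 + np.exp(-(t - t0) / b))
    return out

t = np.arange(0, 40, 0.1)
log_u = dyn_eq(t, alpha=-0.09, shocks=[(0.6, 0.5, 20.0)])   # unemployment-like series
log_v = dyn_eq(t, alpha=+0.07, shocks=[(-0.5, 0.8, 20.5)])  # participation/vacancy-like series

# Away from the shocks, log_v vs log_u is a straight line with negative slope
# (-0.07/-0.09 in ratio), i.e. a Beveridge-curve-like hyperbola in levels.
u, v = np.exp(log_u), np.exp(log_v)
```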

The first interesting discovery happened when I plotted out the two dynamic equilibrium models side by side:


The first thing to note is that the shocks to CLF [marked with red arrows, down for downward shocks, up for upward] are centered later, but are wider than the unemployment rate shocks [marked with green arrows]. This means that both shocks end up beginning at roughly the same time, but the CLF shock doesn't finish until later. In fact, this particular piece of information led me to notice that there was a small discrepancy in the data from 2015-2016 in the CLF model — there appears to be a small positive shock. A positive shock would be predicted by the positive shock to the unemployment rate in 2014! Sure enough, it turns out that adding a shock improves the agreement with the CLF data. Since the shock roughly coincides with the ending of the Great Recession shock, it would have otherwise been practically invisible.

Second, because the centers don't match up and the CLF shocks are wider, you need a really long period without a shock to observe a Beveridge curve. The shocks to vacancies and the unemployment rate are of comparable size and duration, so that Beveridge curve jumps right out. However, the CLF/U Beveridge curve is practically invisible if you just look at the data:


And without the dynamic equilibrium model, it would never be noticed because of a) the short periods between recessions, and b) the fact that most of the data before the 1990s contains a large demographic shock of women entering the workforce. This means that, assuming there isn't another major demographic shock, a Beveridge-curve-like relationship will appear in future data. You could count this as a prediction of the dynamic equilibrium model. As you can see, the curve is not terribly apparent in the post-1990s data (the dots represent the arrows in the earlier graph above):


[The gray lines indicate the "long run" relationship between the dynamic equilibria. The dotted lines indicate the behavior of data in the absence of shocks. As you can see, only small segments are unaffected by shocks (the 90s data at the beginning, and the 2017 data at the end).]

I thought the illumination of the small positive shock to CLF in 2015-2016, as well as the prediction of a future Beveridge-curve-like relationship between CLF and U, was fascinating. Of course, they're both speculative conclusions. But if this is correct, then the tweet that set this all off is talking about a counterfactual world that couldn't exist: if CLF were higher, then either we would have had a different series of recessions or the unemployment rate would be lower. That is to say, we can't move straight up and down (choosing a CLF) in the graph above without moving side to side (changing U).

[Added some description of the graphs in edit 9 Nov 2017.]

...

Update 9 November 2017

Here are the differences between the original prime age CLF participation forecast and the new "2016-shock" version:



Tuesday, November 7, 2017

Presentation: forecasting with information equilibrium

I've put together a draft presentation on information equilibrium and forecasting after presenting it earlier today as a "twitter talk". A pdf is available for download from my Google Drive as well. Below the fold are the slide images.



JOLTS data out today

Nothing definitive with the latest data — just a continuation of a correlated negative deviation from the model trend. The last update was here.


I also tried a "leading edge" counterfactual (replacing the logistic function by an an exponential approximation for time t << y₀ where y₀ is the transition year which is somewhat agnostic about the amplitude of the shock) and made an animation adding the post-forecast data one point at a time:


Essentially we're in the same place we were with the last update. I also updated the Beveridge curve with the latest data points: