A collection of articles & related content on information theory curated by Chris Aldrich.
I have uploaded a revised edition of my 2010 primer in Information Theory and Entropy.
You may also download a copy here (PDF).
In this paper, we present an innovative method for lossless compression of discrete-color images such as map images, graphics, and GIS images, as well as binary images. This method comprises two main components. The first is a fixed-size codebook encompassing 8×8-bit blocks of two-tone data along with their corresponding Huffman codes and their relative probabilities of occurrence. The probabilities were obtained from a very large data set of two-color (binary) images and are used for arithmetic coding. The second component is row-column reduction coding, which encodes those blocks that are not in the codebook. The proposed method has been successfully applied to two major image categories: (i) images with a predetermined number of discrete colors, such as digital maps, graphs, and GIS images; and (ii) binary images. The results show that our method compresses images from both categories (discrete color and binary images) by 90% in most cases, and outperforms JBIG-2 by 5% to 20% for binary images and by 2% to 6.3% for discrete-color images on average.
Around September 2011, I decided to replace my existing system of personal organization with a new, much-publicized system known colloquially as GTD for Getting Things Done, developed by David Allen over a decade ago.
In GTD, Allen describes a personal organization system which purportedly increases productivity in a stress-free manner. (I’ll succinctly summarize the main points of that system below.) I write “purportedly” because it is in my nature to almost always assume a skeptical position when reading and examining a particular matter. Allen’s writing is based on an experientialist approach wherein the phrase “trust me because I’ve seen it” seems to be hinted at incessantly throughout the flow of the book. (In fact, at some point, Allen does write: “…but in the meantime, trust me.”) Thus, I wondered whether there could be any scientific (neuro-psychological) basis behind the foundational claims of superiority of such an organization system.
I didn’t find a plausible answer to my question until I came across Mihaly Csikszentmihalyi’s seminal book “Flow.” It is my intention to elucidate that answer in this post. First, I’ll illustrate Allen’s GTD system along its structural points. Next, I’ll attempt to use some psychological results from Csikszentmihalyi’s work to rationalize, to a degree, some, if not all, of Allen’s points.
To motivate the reader into his system, Allen introduces the concept of “incompletes” or “open loops” (pp. 12–15), which essentially entail “stuff” (thoughts, ideas, intentions, created or discovered information) that is occupying one’s mind. This stuff (or, commitments) needs to be thoroughly identified (for one may not be fully aware of all of it) and manifested into some sort of tangible system (electronic or physical) so that one may then take the necessary action(s). Thus, the basic principle behind the GTD framework is to corral all the stuff that bugs your mind and decide whether and how to act on it. The trick to effectual organization is to keep reminders about such actions and consult (review) them frequently so one doesn’t miss doing any “incompletes.”
Furthermore, on pages 15–17, Allen discusses why “incompletes” are on one’s mind. In particular, he states (p. 16):
Until those thoughts [incompletes] have been clarified and those decisions [to act on them] made, and the resulting data has been stored in a system that you absolutely know you will think about as often as you need to, your brain can’t give up the job.
It is precisely the latter statement that I intend to rationalize moving forward.
Allen’s Getting Things Done (GTD) system consists of five phases:
- Collect. One must always collect everything that is deemed an incomplete in one’s mind. Every thought, desire, intention, extravagant idea or opinion that has a “…‘should,’ ‘need to,’ or ‘ought to’ attached to it” (p. 26) needs to be expelled from the mind and written down somewhere outside the mind. On page 29, Allen allegorically states that one’s mind is like a computer’s RAM: it can only hold and process so much. Thus, one needs to “get it all out of one’s head.” Naturally, getting these items out is neither sufficient nor efficient unless one processes them, which brings about the next phase.
- Process. Now that one has their stuff out of their mind, one needs to process what each elicited item means. Allen provides a nice, intuitive flowchart of this “algorithm” of his on page 32. In simple terms it works as follows: Take an item from your system and ask yourself what you can do about it. That is, what’s the next physical, unitary action you can take to do that item? If there is no action, trash it or incubate it for prospective use. If there is an action, determine what it is and if it takes less than 2-3 minutes, do it; otherwise, put a reminder somewhere in your system to do it in due time. How does one organize reminders, though? That’s the next phase.
- Organize. Here you want to separate actionable from non-actionable items. The latter can either be trashed if they have no value or incubated/referenced otherwise. The referencing should be general (e.g. file-folder-based). Actionable items come with a twist: if an incomplete you’ve taken out of your mind takes more than one action step to complete, then Allen refers to it as a project. Thus, you can’t “do” a project; you can only “do” the actions associated with that project. For instance, if the item “Save the world before I die” was on your mind and you took it out, it most certainly needs more than one action step and is, therefore, a project. Allen advises one to keep projects in a separate list and the actions associated with them in another list for frequent reviewing, which leads to the fourth phase.
- Review. Now that you’ve organized your stuff, you must review it so that you actually do it. Reviewing starts from the most exigent element, the calendar, and proceeds with the next actions that you intend to do. Also, a weekly review of all you’ve got (including your list of projects) is essential to keep your system up and running.
- Do. The final phase of Allen’s GTD system is to actually do things, mainly based on your intuition, but also on the contextualization of your settings. For instance, if you’re on a plane using your laptop, you’ll most probably do things that can be done only on your laptop (e.g. email, write a poem, etc.).
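The Process phase above reads like a small decision procedure, so here is a minimal Python sketch of it. The `Item` fields, the returned labels, and the 3-minute cutoff (the upper end of the “2-3 minutes” mentioned above) are illustrative assumptions, not anything taken from Allen’s flowchart.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff in minutes; the text only says "less than 2-3 minutes".
QUICK_ACTION_LIMIT = 3.0


@dataclass
class Item:
    """One collected "incomplete" (hypothetical structure for illustration)."""
    description: str
    next_action: Optional[str] = None  # None means no action can be taken
    minutes_needed: float = 0.0        # estimated time for the next action
    worth_keeping: bool = False        # only relevant when there is no action


def process(item: Item) -> str:
    """Decide what to do with a single item from the collection bucket."""
    if item.next_action is None:
        # No action possible: keep it for later use or throw it away.
        return "incubate" if item.worth_keeping else "trash"
    if item.minutes_needed < QUICK_ACTION_LIMIT:
        # Quick enough to handle on the spot.
        return "do now"
    # Otherwise park a reminder in the system for the Organize phase.
    return "add reminder"


if __name__ == "__main__":
    inbox = [
        Item("Old conference flyer"),
        Item("Idea for a novel", worth_keeping=True),
        Item("Reply to a short email", next_action="write reply", minutes_needed=1),
        Item("Prepare tax return", next_action="gather receipts", minutes_needed=90),
    ]
    for entry in inbox:
        print(f"{entry.description}: {process(entry)}")
```

The sketch deliberately stops at the decision itself; where reminders live and how projects are tracked belong to the Organize phase.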
The rest of the book goes at length to describe the actual implementation of this system and to get projects going through several tricks and tips based on the author’s expertise as well as on other practical knowledge.
Now, if you look back at the most critical point of this five-phase system, the first point, you might naturally ask: Why is it that one needs to get things done by first getting them out of one’s mind? That is, why won’t the brain give up the job? Why would one want to free up one’s psyche of stuff that bugs one? What’s the rationale behind this? Should I just trust the author and invest my time in implementing a system of personal organization with utter skepticism and risk ending up not leveraging it? These questions were sitting at the back of my mind as I was reading and implementing Allen’s GTD system. These questions were the only “open loops” that kept me from assembling the system back when I finished reading the book months ago. But I answered them comfortably as I was reading Csikszentmihalyi’s “Flow” and I can reassure you that after implementing GTD with less skepticism it does work smoothly, overall. Here’s why.
The essence of Csikszentmihalyi’s “Flow” is about positive human experiences that enable one’s life to flow, i.e. to be exhilarating, vivacious, enjoyable, and so forth. Csikszentmihalyi (whose name sesquipedalophobics might want to eschew pronouncing) provides rigorous psychological arguments (based on decades-long research) on what flow is and how to attain and preserve it using a phenomenological, rather than anatomical/biochemical, approach to the functioning of the nervous system and the mind.
After revisiting happiness in the first chapter, Csikszentmihalyi elucidates the anatomy and limitations of consciousness, interestingly, in light of the information-theoretic concept of entropy (p. 25). On page 24 he writes:
A person can make himself happy, or miserable, regardless of what is actually happening ‘outside,’ just by changing the contents of consciousness. […] To develop this trait, one must find ways to order consciousness so as to be in control of feelings and thoughts.
According to the author, information theory is relevant to understanding what goes on in one’s mind through such a model of consciousness. This could be a valid undertaking because after all: “The function of consciousness is to represent information about what is happening outside and inside the organism in such a way that it can be evaluated and acted upon by the body” (p. 24). As such, consciousness may be construed as the central “clearinghouse” which handles sensations, thoughts, intentions, desires, etc., noting that intentions are what provide and keep order in consciousness (p. 27). And these intentions are organized in a hierarchy of goals/outcomes (p. 28). Hence, we can freely control which outcome to pursue based on our subjective prioritization of objectives. In David Allen’s sense, these outcomes could be viewed as project outcomes, but also as the next actions to take to do whatever incomplete has been collected.
That being stated, consciousness does have a major limitation: it can process only so much information. The empirically observed bound on the amount of information consciousness can process at a time is about 7 bits. According to Csikszentmihalyi (and, independently, Robert Lucky), the mind can discriminate these 7-bit chunks about every 1/18th of a second, which implies that an average human can process up to 7 × 18 = 126 bits of information per second (roughly 120 in round numbers). If we assume that a human is awake (“conscious”) for 16 hours a day and lives on average 70 years, the lifelong amount of information the mind can maximally process is roughly 185 billion bits (p. 29). Pretty fascinating!
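As a quick arithmetic check of the figures above, the short Python sketch below multiplies them out. Seven bits times 18 discriminations per second gives 126 bits per second, slightly above the rounded 120, and it is the 126 figure that reproduces the lifetime total of roughly 185 billion bits. The 365-day year is an assumption; the other constants come from the paragraph.

```python
# Constants taken from the paragraph above.
BITS_PER_CHUNK = 7
CHUNKS_PER_SECOND = 18
WAKING_HOURS_PER_DAY = 16
LIFESPAN_YEARS = 70

# Assumption: a plain 365-day year, ignoring leap days.
DAYS_PER_YEAR = 365
SECONDS_PER_HOUR = 3600

bits_per_second = BITS_PER_CHUNK * CHUNKS_PER_SECOND
waking_seconds = SECONDS_PER_HOUR * WAKING_HOURS_PER_DAY * DAYS_PER_YEAR * LIFESPAN_YEARS
lifetime_bits = bits_per_second * waking_seconds

print(bits_per_second)                            # 126
print(f"{lifetime_bits / 1e9:.1f} billion bits")  # 185.4 billion bits
```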
To understand what that limit on the mind’s information-processing bandwidth implies, think of the following scenario. If it takes a person 60 bits of information per second to comprehend what another person is saying, then theoretically one could be listening to two persons at the same time and understand both of them readily, since the total processing power of the mind is roughly 120 bits (per second). Yet, we all have experienced such a situation and we all are aware of the practical impossibility of such an event. The reason lies in the fact that one’s consciousness is already occupied with existing thoughts, feelings, “incompletes” or “open loops”. Consequently, if one liberates one’s mind by controlling one’s consciousness, then one could achieve flow. It is exactly this conclusion that supports the ostensibly efficient system of personal organization put forth by Allen. In other words, if one frees up one’s mind by means of controlling consciousness (the processing unit of information), then one gives it the chance to process even more information (discovered or created), which one wouldn’t otherwise process. In Csikszentmihalyi’s words, what Allen refers to as “incompletes” are, as a matter of fact, “things that occupy the mind and reduce its capacity of processing information” (p. 30).
This brings about the discussion on attention, whose intentional ordering of consciousness averts increased (psychic) entropy, thus reducing the chances of yielding disorder in the mind. On page 31 we read:
It is attention that selects the relevant bits of information from the potential millions of bits available. It takes attention to retrieve the appropriate references from memory, to evaluate the event, and then to choose the right thing to do.
Hence, learning where to direct attention implies controlling one’s consciousness. That is to say, one’s personal productivity could be flowing incessantly if one induces order in one’s consciousness. This is exactly what David Allen alludes to throughout his GTD book. (“The Art of Stress-Free Productivity” is, in fact, the book’s subtitle.)
In all, “The mark of a person who is in control of consciousness is the ability to focus attention at will, to be oblivious to distractions, to concentrate for as long as it takes to achieve a goal, and not longer. And the person who can do this usually enjoys the normal course of everyday life” (Flow, p. 31). If we can at least control what we want to get out of our brains and efficiently manage it through a working, actionable organization system, we should attain not only flow, but also professional productivity. Based on my preliminary evaluation of the GTD system, this has been indeed the case in my professional and academic life.
The main point to remember here is this: The reason why Allen claims you want to get things out of your brain is what Csikszentmihalyi concludes: to free up your mind so that the information-processing power of your consciousness increases. If you (learn to) control your consciousness by steering attention deliberately, then you bring about order in your mind (increased psychic energy). And, thereafter, flow comes into play and psychic entropy dissipates. Flow, in turn, provides for a stress-free life, which consequently yields efficient personal productivity through an effectively implemented organization system, such as Getting Things Done. But the prime step to take is to corral “incompletes” in one bucket, whence processing, organization, reviewing, and doing follow. I highly recommend that the reader study both books, especially Flow.
To make it fun (!), I decided to compute the information entropy of ICSE ’12 from its tag cloud. Here’s the fun I experienced:
First, I assigned a relative weight to the tags based merely on the visual perception of their font size, which in a tag cloud implies “gravity” or “importance”. For instance, the word ‘code’ got a relative weight of 85 whereas the word ‘pattern’ a relative weight of 9. The total weight for the 149 tags was 2387.5.
Next, I computed the empirical probabilities of each tag by using a frequency approach: the probability of a tag equals its relative weight over 2387.5, the total weight. It turns out the tags appear to follow a power-law distribution:
Having the probability distribution, I could compute the Shannon entropy, which turned out to be 7.01 bits per tag. That is, in principle, you’d need on average 7.01 bits to encode each tag in the tag cloud. Note that other constructs (such as articles, verbs, etc.) are excluded from the software engineering context. If one were to include those, then, no worries: the entropy of the English language, especially in academia, is probably less than 1 bit per character! Naturally, a more accurate first-order entropy can be computed if the count for each word is provided. That way, the empirical probabilities would be derived directly by dividing each word count by the total number of words (1,373,011). The perceived entropy value of 7.01 bits/tag as well as the more accurate entropy value could help determine the lossless compression limit of the 393 MB ICSE Proceedings archive! But, hey, bring it all to Canada; no need for data compression in our land.
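The computation described above (weights, then empirical probabilities, then Shannon entropy) can be sketched in a few lines of Python. Only the weights for ‘code’ (85) and ‘pattern’ (9) come from the text; the other tags and weights in `sample_weights` are made-up placeholders, so the printed value is not expected to match 7.01. For scale, 149 equally weighted tags would give the maximum possible entropy of log2(149) ≈ 7.22 bits, so 7.01 bits is close to that ceiling.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy in bits of the distribution implied by relative weights."""
    total = sum(weights.values())
    entropy = 0.0
    for weight in weights.values():
        if weight > 0:  # zero-weight tags contribute nothing
            p = weight / total  # empirical probability of the tag
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    # 'code' and 'pattern' are from the text; the rest are hypothetical.
    sample_weights = {"code": 85, "pattern": 9, "testing": 40, "model": 30}
    print(f"{shannon_entropy(sample_weights):.2f} bits per tag")

    # Upper bound for a cloud of 149 tags if all were equally weighted.
    print(f"{math.log2(149):.2f} bits maximum for 149 tags")
```

With true word counts in place of visually estimated weights, the same function would give the more accurate first-order entropy mentioned above.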
Here’s the raw data, for reference:
I’ve been following the news on the upcoming Facebook IPO. Most analysts and news makers describe the trading process as betting (in the sense of gambling). Here’s an example. Yet, insightful financiers and economists would agree that trading and investing are not random processes. Nor is the market efficient. Where does the market thus stand?
Malkiel would have us believe that the market is a process entirely modeled by random walks. This is the Random Walk Theory (RWT) of the markets. That is, prices are randomly generated because ‘true’ news is entirely unpredictable and, thus, entirely random. On the other side of the coin, Fama promotes the hypothesis (and later an elaborated theory) that the markets are entirely efficient, i.e. informationally predictable. This is the Efficient Market Theory (EMT). Both diametrically opposed views are, nonetheless, cynical, as writer and investor Thomsett has put it. In light of Information Theory and its power to quantify and perspicaciously study information, I’d argue that, far from being efficient or random, the market is entropic.
I’ve written an introductory article to information theory and entropy located here. At a glance, entropy measures the uncertainty in a system or process with a number of outcomes (discrete or not). If the experiment has three outcomes (e.g. the price could be $3, $10, or $18), then these price levels are associated with probabilities, i.e. the probability of getting a price equal to $3 is 45%, etc. Entropy is the average surprise (the negative logarithm of the probability) over those outcomes; it depends only on the probabilities, not on the prices themselves. If the outcomes are equiprobable, i.e. all three prices have a 1/3 chance of popping up, then there is maximum uncertainty because one just can’t predict which of the three prices is going to pop up next. This latter view conforms with that of the random walk theory, according to which true information cannot be predicted and thus the market actors cannot know (or predict) what price levels to expect next. On the other hand, if it is almost sure that one can predict true news (information), then one would have no problem anticipating price levels, given the almost total confidence in the probabilities. This is the view promoted by the efficient market theory. Both views are academically appealing, but not realistic enough to be embraced by the field actors. The latter embrace the entropic view of the market (though not cognizant of what it means) that I’m alluding to here.
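As a worked illustration of the three-price example, consider the following. Only the 45% probability for $3 comes from the text; the 35% and 20% assigned here to $10 and $18 are assumptions chosen purely for illustration. Note that the prices never enter the formula, only their probabilities do.

```latex
\begin{align*}
H(X) &= -\sum_{i=1}^{3} p_i \log_2 p_i \\[4pt]
\text{Assumed } (p_1, p_2, p_3) = (0.45,\ 0.35,\ 0.20):\quad
H(X) &\approx 0.518 + 0.530 + 0.464 \approx 1.51 \text{ bits} \\[4pt]
\text{Equiprobable } (p_i = \tfrac{1}{3}):\quad
H(X) &= \log_2 3 \approx 1.585 \text{ bits}
\end{align*}
```

The equiprobable case gives the largest value any three-outcome distribution can reach, which is the “maximum uncertainty” of the random-walk picture; a distribution concentrated almost entirely on one price would push the entropy toward 0 bits.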
The entropic perspective is simple: When one carries out fundamental and technical analyses, the average information one gains should increase. In other words, there should be an average decrease in uncertainty, or entropy, in the system. Thus, the decision a trader or investor makes is indeed based on existing information (financial indicators in company reports, price patterns on charts, etc.) as well as possible future information (insider hints, unpublished news, etc.). As such, the predictive power increases because entropy decreases. The market is thus not truly random, but not yet efficient. It lies somewhere in between on the spectrum. It’s thus entropic. Here’s a simple conceptualization of what I’m babbling on about:
According to the entropic view of the market, if new information comes in and it is surprising, then the entropy tends to increase because the number of factors that affect the outcomes in the system increases. If the new information is expected, such as Greece’s exit from the Eurozone, then it won’t add much entropy (and thus randomness) to current price-level trends, although other factors might. In the long run, however, the system is less entropic, as moving averages reveal.
The main lesson to be learned from the entropic view of the market: Do your homework when investing! Read, read, read more and study, in order to decrease your uncertainty (which is also tantamount to ignorance in the naive sense).