This is an article based on my M.Sc. thesis, published in 2015 in the rather serious venue IEEE Transactions on Image Processing (PDF here):

In this paper, we present an innovative method for lossless compression of discrete-color images, such as map images, graphics, and GIS images, as well as binary images. The method comprises two main components. The first is a fixed-size codebook encompassing 8×8 bit blocks of two-tone data along with their corresponding Huffman codes and their relative probabilities of occurrence. The probabilities were obtained from a very large data set of two-color (binary) images and are used for arithmetic coding. The second component is row–column reduction coding, which encodes those blocks that are not in the codebook. The proposed method has been successfully applied to two major image categories: (i) images with a predetermined number of discrete colors, such as digital maps, graphs, and GIS images; and (ii) binary images. The results show that our method compresses images from both categories (discrete-color and binary) by 90% in most cases, outperforming JBIG-2 by 5% to 20% for binary images and by 2% to 6.3% for discrete-color images on average.

Around September 2011, I decided to replace my existing system of personal organization with a new, much-publicized system known colloquially as GTD for Getting Things Done, developed by David Allen over a decade ago.

In GTD, Allen describes a personal organization system that purportedly increases productivity in a stress-free manner. (I’ll succinctly summarize the main points of that system below.) I write “purportedly” because it is in my nature to almost always assume a skeptical position when reading and examining a particular matter. Allen’s writing is based on an experientialist approach wherein the phrase “trust me because I’ve seen it” seems to be hinted at incessantly throughout the flow of the book. (In fact, at some point, Allen does write: “…but in the meantime, trust me.”) Thus, I wondered whether there could be any scientific (neuro-psychological) basis behind the foundational claims of superiority of such an organization system.

I didn’t find a plausible answer to my question until I came across Mihaly Csikszentmihalyi’s seminal book “Flow.” It is my intention to elucidate that answer in this post. First, I’ll outline Allen’s GTD system along its main structural points. Next, I’ll attempt to use some psychological results from Csikszentmihalyi’s work to rationalize, to a degree, some, if not all, of Allen’s points.

To motivate the reader into his system, Allen introduces the concept of “incompletes” or “open loops” (pp. 12–15), which essentially denote “stuff” (thoughts, ideas, intentions, created or discovered information) that occupies one’s mind. This stuff (or, rather, these commitments) needs to be thoroughly identified (for one may not be fully aware of all of it) and captured in some sort of tangible system (electronic or physical) so that one may then take the necessary action(s). Thus, the basic principle behind the GTD framework is to corral all the stuff that bugs your mind and decide whether and how to act on it. The trick to effectual organization is to keep reminders about such actions and consult (review) them frequently so one doesn’t leave any “incompletes” undone.

Furthermore, on pages 15–17, Allen discusses why “incompletes” are on one’s mind. In particular, he states (p. 16):

Until those thoughts [incompletes] have been clarified and those decisions [to act on them] made, and the resulting data has been stored in a system that you absolutely know you will think about as often as you need to, your brain can’t give up the job.

It is precisely the latter statement that I intend to rationalize moving forward.

Allen’s Getting Things Done (GTD) system consists of five phases:

  1. Collect. One must always collect everything that is deemed an incomplete in one’s mind. Every thought, desire, intention, extravagant idea, or opinion that has a “…’should,’ ‘need to,’ or ‘ought to’ attached to it” (p. 26) needs to be expelled from the mind and written down somewhere outside it. On page 29, Allen allegorically states that one’s mind is like a computer’s RAM: it can only hold and process so much. Thus, one needs to “get it all out of one’s head.” Naturally, getting things out is not sufficient unless one processes them, which brings about the next phase.
  2. Process. Now that one has their stuff out of their mind, one needs to process what each elicited item means. Allen provides a nice, intuitive flowchart of this “algorithm” of his on page 32; a code sketch of it follows this list. In simple terms it works as follows: Take an item from your system and ask yourself what you can do about it. That is, what’s the next physical, unitary action you can take on that item? If there is no action, trash it or incubate it for prospective use. If there is an action, determine what it is; if it takes less than two or three minutes, do it, otherwise put a reminder somewhere in your system to do it in due time. How does one organize reminders, though? That’s the next phase.
  3. Organize. Here you want to separate actionable from non-actionable items. The latter can either be trashed, if they have no value, or incubated/referenced otherwise. The referencing should be general (e.g. file-folder-based). Actionable items involve a subtlety: if an incomplete you’ve taken out of your mind requires more than one action step to complete, then Allen refers to it as a project. Thus, you can’t “do” a project; you can only “do” the actions associated with it. For instance, if the item “Save the world before I die” was on your mind and you took it out, it most certainly needs more than one action step and is, therefore, a project. Allen advises one to keep projects in one list and the actions associated with them in another for frequent reviewing, which leads to the fourth phase.
  4. Review. Now that you’ve organized your stuff, you must review it so that you actually do it. Reviewing starts from the most exigent element, the calendar, and proceeds with the next actions that you intend to do. Also, a weekly review of all you’ve got (including your list of projects) is essential to keep your system up and running.
  5. Do. The final phase of Allen’s GTD system is to actually do things, mainly based on your intuition, but also on the context of your setting. For instance, if you’re on a plane using your laptop, you’ll most probably do things that can be done only on your laptop (e.g. process email or write a poem).
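
For the programmatically inclined, here is how I read Allen’s processing flowchart as a minimal Python sketch. To be clear, the Item and System containers, their field names, and the two-minute threshold are my own simplifications for illustration, not Allen’s terminology or any established API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Item:
    description: str
    next_action: Optional[str] = None  # the next physical, unitary step, if any
    minutes: int = 0                   # rough time estimate for that step
    has_future_value: bool = False     # worth incubating for later?

@dataclass
class System:
    trash: list = field(default_factory=list)
    someday: list = field(default_factory=list)   # incubated items
    reminders: list = field(default_factory=list) # deferred next actions
    done: list = field(default_factory=list)      # handled on the spot

def process(item: Item, system: System) -> None:
    """One pass of the Process phase (cf. Allen's flowchart on p. 32)."""
    if item.next_action is None:                  # non-actionable item
        bucket = system.someday if item.has_future_value else system.trash
        bucket.append(item)
    elif item.minutes <= 2:                       # quick enough: do it now
        system.done.append(item)
    else:
        system.reminders.append(item)             # track it to do in due time
```

Collecting, in this reading, amounts to appending raw Items to an inbox and then calling process() on each of them, which mirrors the ordering of the first two phases.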

The rest of the book goes to great lengths to describe the actual implementation of this system and how to get projects going, through several tricks and tips based on the author’s expertise as well as on other practical knowledge.

Now, if you look back at the most critical point of this five-phase system, the first phase, you might naturally ask: Why is it that one needs to get things done by first getting them out of one’s mind? That is, why won’t the brain give up the job? Why would one want to free up one’s psyche of the stuff that bugs one? What’s the rationale behind this? Should I just trust the author and invest my time in implementing a system of personal organization with utter skepticism, risking ending up not leveraging it? These questions were sitting at the back of my mind as I was reading and implementing Allen’s GTD system. They were the only “open loops” that thwarted me from assembling the system when I finished reading the book months ago. But I answered them comfortably as I was reading Csikszentmihalyi’s “Flow,” and I can reassure you that, after implementing GTD with less skepticism, it does work smoothly overall. Here’s why.

The essence of Csikszentmihalyi’s “Flow” is about positive human experiences that enable one’s life to flow, i.e. to be exhilarating, vivacious, enjoyable, and so forth. Csikszentmihalyi (whose name sesquipedalophobics might want to eschew pronouncing) provides rigorous psychological arguments (based on decades-long research) on what flow is and how to attain and preserve it, using a phenomenological, rather than anatomical/biochemical, approach to the functioning of the nervous system and the mind.

After revisiting happiness in the first chapter, Csikszentmihalyi elucidates the anatomy and limitations of consciousness, interestingly in light of the information-theoretic concept of entropy (p. 25). On page 24 he writes:

A person can make himself happy, or miserable, regardless of what is actually happening ‘outside,’ just by changing the contents of consciousness. […] To develop this trait, one must find ways to order consciousness so as to be in control of feelings and thoughts.

According to the author, information theory is relevant for apprehending what goes on in one’s mind through such a model of consciousness. This could be a valid undertaking because, after all, “The function of consciousness is to represent information about what is happening outside and inside the organism in such a way that it can be evaluated and acted upon by the body” (p. 24). As such, consciousness may be construed as the central “clearinghouse” that handles sensations, thoughts, intentions, desires, etc., noting that intentions are what provide and keep order in consciousness (p. 27). And these intentions are organized in a hierarchy of goals/outcomes (p. 28). Hence, we can freely control which outcome to pursue based on our subjective prioritization of objectives. In David Allen’s sense, these outcomes could be viewed as project outcomes, but also as the next actions to take on whatever incomplete has been collected.

That being stated, consciousness does have a major limitation: it can process only so much information. The empirically observed bound on the amount of information consciousness can process at a time is at most 7 bits. According to Csikszentmihalyi (and, independently, Robert Lucky), the mind can discriminate these 7-bit chunks every 1/18th of a second, which implies that an average human can process roughly 126 bits of information per second. If we assume that a human is awake (“conscious”) for 16 hours a day and lives on average 70 years, the lifelong amount of information the mind can maximally process is roughly 185 billion bits (p. 29). Pretty fascinating!
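
As a quick sanity check on those figures, here is the back-of-the-envelope arithmetic (the 7-bit and 1/18-second inputs are Csikszentmihalyi’s; the rest is plain multiplication):

```python
bits_per_chunk = 7          # at most 7 bits discriminated at a time
chunks_per_second = 18      # one discrimination every 1/18th of a second
bits_per_second = bits_per_chunk * chunks_per_second   # 126 bits/s

waking_seconds_per_day = 16 * 3600   # 16 "conscious" hours a day
days_in_a_lifetime = 70 * 365        # 70 years, ignoring leap days

lifetime_bits = bits_per_second * waking_seconds_per_day * days_in_a_lifetime
print(f"{lifetime_bits:,}")          # 185,431,680,000, i.e. ~185 billion bits
```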

To understand what that limit on the mind’s information-processing bandwidth implies, think of the following scenario. If it takes a person 60 bits of information per second to comprehend what another person is saying, then theoretically one could listen to two people at the same time and readily understand both of them, since together they would still fit within the mind’s capacity of roughly 126 bits per second. Yet we have all experienced such a situation, and we are all aware of the practical impossibility of such an event. The reason lies in the fact that one’s consciousness is already occupied with existing thoughts, feelings, “incompletes,” or “open loops.” Consequently, if one liberates one’s mind by controlling one’s consciousness, then one can achieve flow. It is exactly this conclusion that supports the ostensibly efficient system of personal organization put forth by Allen. In other words, if one frees up one’s mind by means of controlling consciousness (the processing unit of information), then one gives it the chance to process even more information (discovered or created), which one wouldn’t otherwise process. In Csikszentmihalyi’s words, what Allen refers to as “incompletes” are, as a matter of fact, “things that occupy the mind and reduce its capacity of processing information” (p. 30).

This brings about the discussion of attention, the intentional direction of which orders consciousness, averts increased (psychic) entropy, and thus reduces the chances of disorder arising in the mind. On page 31 we read:

It is attention that selects the relevant bits of information from the potential millions of bits available. It takes attention to retrieve the appropriate references from memory, to evaluate the event, and then to choose the right thing to do.

Hence, learning where to direct attention implies controlling one’s consciousness. That is to say, one’s personal productivity could be flowing incessantly if one induces order in one’s consciousness. This is exactly what David Allen alludes to throughout his GTD book. (“The Art of Stress-Free Productivity” is, in fact, the book’s subtitle.)

All in all, “The mark of a person who is in control of consciousness is the ability to focus attention at will, to be oblivious to distractions, to concentrate for as long as it takes to achieve a goal, and not longer. And the person who can do this usually enjoys the normal course of everyday life” (Flow, p. 31). If we can at least control what we want to get out of our brains and efficiently manage it through a working, actionable organization system, we should attain not only flow but also professional productivity. Based on my preliminary evaluation of the GTD system, this has indeed been the case in my professional and academic life.

The main point to remember here is this: The reason Allen claims you want to get things out of your brain is precisely what Csikszentmihalyi concludes: to free up your mind so that the information-processing power of your consciousness increases. If you (learn to) control your consciousness by steering attention deliberately, then you bring about order in your mind and free up psychic energy. Thereafter, flow comes into play and psychic entropy dissipates. Flow, in turn, provides for a stress-free life, which consequently yields efficient personal productivity through an effectively implemented organization system, such as Getting Things Done. But the prime step to take is to corral “incompletes” into one bucket, whence processing, organizing, reviewing, and doing follow. I highly recommend that the reader study both books, especially Flow.


Revised 2015.11.07

Adrian Kuhn shared the following image of the ICSE ’12 tag cloud built from the 1,373,011 words sampled from the main conference submissions:

To make it fun (!), I decided to compute the ICSE ’12 information entropy from the tag cloud. Here’s the fun I experienced:

First, I assigned a relative weight to each tag based merely on the visual perception of its font size, which in a tag cloud conveys “gravity” or “importance.” For instance, the word ‘code’ got a relative weight of 85, whereas the word ‘pattern’ got a relative weight of 9. The total weight for the 149 tags was 2387.5.

Next, I computed the empirical probability of each tag using a frequency approach: the probability of a tag equals its relative weight divided by 2387.5, the total weight. It turns out the tags appear to follow a power-law distribution:

Having the probability distribution, I could compute the Shannon entropy, which turned out to be 7.01 bits per tag. That is, in principle, you’d need on average 7.01 bits to encode each tag in the tag cloud. Note that other constructs (such as articles, common verbs, etc.) are excluded, as they carry no software engineering context. If one were to include those, then, no worries: the entropy of the English language, especially in academia, is probably less than 1 bit per character! Naturally, a more accurate first-order entropy could be computed if the count for each word were provided; that way, the empirical probabilities would be derived directly by dividing each word count by the total number of words (1,373,011). The perceived entropy value of 7.01 bits/tag, as well as the more accurate entropy value, could help determine the lossless compression limit of the 393 MB ICSE Proceedings archive! But, hey, bring it all to Canada; no need for data compression in our land.
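
For reference, here is a minimal sketch of the computation in Python; the weights are the perceived ones transcribed in the table below (only a few of the 149 tags are shown inline):

```python
import math

TOTAL_WEIGHT = 2387.5   # sum of the perceived relative weights of all 149 tags

def entropy_contribution(weight):
    """One tag's contribution -p * log2(p) to the total entropy, in bits."""
    p = weight / TOTAL_WEIGHT        # empirical probability of the tag
    return -p * math.log2(p)

# Excerpt of the perceived weights (the full list is in the table below).
weights = {"code": 85, "software": 70, "test": 45, "pattern": 9}
for tag, w in weights.items():
    print(tag, round(entropy_contribution(w), 4))

# 'code' alone contributes ~0.1713 bits; summing the contributions of
# all 149 tags yields the 7.01 bits/tag reported above.
```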

Here’s the raw data, for reference:

i  Tag  Relative weight  p_i (= weight / 2387.5)  H_i (= -p_i log2 p_i)
1 code 85 0.0356020942408377 0.171313506572643
2 requirements 10 0.00418848167539267 0.0330863117190497
3 find 9 0.0037696335078534 0.0303506765014925
4 engineering 12 0.0050261780104712 0.0383815163162604
5 programs 13 0.00544502617801047 0.0409511995374668
6 cases 15 0.00628272251308901 0.0459543105059809
7 similar 11 0.00460732984293194 0.0357614188024733
8 result 8 0.00335078534031414 0.0275477613162236
9 pattern 9 0.0037696335078534 0.0303506765014925
10 three 13 0.00544502617801047 0.0409511995374668
11 language 12 0.0050261780104712 0.0383815163162604
12 user 13 0.00544502617801047 0.0409511995374668
13 often 9 0.0037696335078534 0.0303506765014925
14 large 11 0.00460732984293194 0.0357614188024733
15 search 9 0.0037696335078534 0.0303506765014925
16 students 15 0.00628272251308901 0.0459543105059809
17 models 17 0.00712041884816754 0.0507958018854544
18 source 19 0.00795811518324607 0.0554947822337051
19 tool 18.5 0.00774869109947644 0.0543325175142862
20 data 38 0.0159162303664921 0.095073334100918
21 number 32 0.0134031413612565 0.0833847625423812
22 one 32 0.0134031413612565 0.0833847625423812
23 II 10 0.00418848167539267 0.0330863117190497
24 also 32 0.0134031413612565 0.0833847625423812
25 approach 38 0.0159162303664921 0.095073334100918
26 e.g 14 0.00586387434554974 0.0434743544881844
27 projects 13 0.00544502617801047 0.0409511995374668
28 ACM 13 0.00544502617801047 0.0409511995374668
29 call 10 0.00418848167539267 0.0330863117190497
30 information 20.5 0.00858638743455497 0.0589346708986153
31 systems 17 0.00712041884816754 0.0507958018854544
32 feature 10 0.00418848167539267 0.0330863117190497
33 ICSE 10 0.00418848167539267 0.0330863117190497
34 task 11 0.00460732984293194 0.0357614188024733
35 based 18 0.00753926701570681 0.0531620859872783
36 file 9 0.0037696335078534 0.0303506765014925
37 approaches 10.5 0.0043979057591623 0.034431061674485
38 terms 9 0.0037696335078534 0.0303506765014925
39 generated 9 0.0037696335078534 0.0303506765014925
40 use 32 0.0134031413612565 0.0833847625423812
41 bugs 13 0.00544502617801047 0.0409511995374668
42 pages 11 0.00460732984293194 0.0357614188024733
43 classes 11 0.00460732984293194 0.0357614188024733
44 process 19 0.00795811518324607 0.0554947822337051
45 Figure 31 0.0129842931937173 0.0813737172482226
46 problem 12 0.0050261780104712 0.0383815163162604
46 quality 12 0.0050261780104712 0.0383815163162604
47 execution 16.5 0.00691099476439791 0.0495994554238569
48 shown 10 0.00418848167539267 0.0330863117190497
49 et 11.5 0.00481675392670157 0.0370780377843622
50 knowledge 10 0.00418848167539267 0.0330863117190497
51 line 11 0.00460732984293194 0.0357614188024733
52 several 8 0.00335078534031414 0.0275477613162236
53 class 13 0.00544502617801047 0.0409511995374668
54 usage 8 0.00335078534031414 0.0275477613162236
55 project 20.5 0.00858638743455497 0.0589346708986153
56 IEEE 13 0.00544502617801047 0.0409511995374668
57 need 10 0.00418848167539267 0.0330863117190497
58 existing 9.5 0.00397905759162304 0.0317264487084756
59 tasks 9.5 0.00397905759162304 0.0317264487084756
60 features 12 0.0050261780104712 0.0383815163162604
61 first 18 0.00753926701570681 0.0531620859872783
62 state 14 0.00586387434554974 0.0434743544881844
63 However 15 0.00628272251308901 0.0459543105059809
64 example 24 0.0100523560209424 0.0667106766115785
65 well 8 0.00335078534031414 0.0275477613162236
66 used 38 0.0159162303664921 0.095073334100918
67 testing 16.5 0.00691099476439791 0.0495994554238569
68 changes 14 0.00586387434554974 0.0434743544881844
69 paper 12 0.0050261780104712 0.0383815163162604
70 possible 10.5 0.0043979057591623 0.034431061674485
71 support 12 0.0050261780104712 0.0383815163162604
72 pp 32 0.0134031413612565 0.0833847625423812
73 function 8 0.00335078534031414 0.0275477613162236
74 system 24 0.0100523560209424 0.0667106766115785
75 using 26 0.0108900523560209 0.0710123467189126
76 framework 8 0.00335078534031414 0.0275477613162236
77 reports 10 0.00418848167539267 0.0330863117190497
78 Section 15 0.00628272251308901 0.0459543105059809
79 level 10 0.00418848167539267 0.0330863117190497
80 development 19 0.00795811518324607 0.0554947822337051
81 University 12 0.0050261780104712 0.0383815163162604
82 design 11 0.00460732984293194 0.0357614188024733
83 vol 10 0.00418848167539267 0.0330863117190497
84 important 9 0.0037696335078534 0.0303506765014925
85 Conference 10 0.00418848167539267 0.0330863117190497
86 analysis 24 0.0100523560209424 0.0667106766115785
87 methods 18 0.00753926701570681 0.0531620859872783
88 evaluation 9 0.0037696335078534 0.0303506765014925
89 Java 12 0.0050261780104712 0.0383815163162604
90 algorithm 10.5 0.0043979057591623 0.034431061674485
91 programming 12 0.0050261780104712 0.0383815163162604
92 time 24 0.0100523560209424 0.0667106766115785
93 method 24 0.0100523560209424 0.0667106766115785
94 participants 11 0.00460732984293194 0.0357614188024733
95 values 12 0.0050261780104712 0.0383815163162604
96 provide 10.5 0.0043979057591623 0.034431061674485
97 new 19 0.00795811518324607 0.0554947822337051
98 bug 23 0.00963350785340314 0.0645225677153213
99 i.e 11.5 0.00481675392670157 0.0370780377843622
100 work 23 0.00963350785340314 0.0645225677153213
101 study 18 0.00753926701570681 0.0531620859872783
102 application 16 0.00670157068062827 0.0483939519518189
103 case 19.5 0.00816753926701571 0.0566490951118284
104 API 13 0.00544502617801047 0.0409511995374668
105 tools 12 0.0050261780104712 0.0383815163162604
106 International 9 0.0037696335078534 0.0303506765014925
107 program 24 0.0100523560209424 0.0667106766115785
108 techniques 12 0.0050261780104712 0.0383815163162604
109 input 16 0.00670157068062827 0.0483939519518189
110 related 8 0.00335078534031414 0.0275477613162236
111 research 14 0.00586387434554974 0.0434743544881844
112 specific 10 0.00418848167539267 0.0330863117190497
113 developers 24 0.0100523560209424 0.0667106766115785
114 applications 12 0.0050261780104712 0.0383815163162604
115 Proceedings 10 0.00418848167539267 0.0330863117190497
116 shows 11 0.00460732984293194 0.0357614188024733
117 performance 14 0.00586387434554974 0.0434743544881844
118 variables 8 0.00335078534031414 0.0275477613162236
119 Software 32 0.0134031413612565 0.0833847625423812
120 order 10 0.00418848167539267 0.0330863117190497
121 rules 9 0.0037696335078534 0.0303506765014925
122 software 70 0.0293193717277487 0.149294299501816
123 al 12 0.0050261780104712 0.0383815163162604
124 following 9 0.0037696335078534 0.0303506765014925
125 many 12 0.0050261780104712 0.0383815163162604
126 elements 10.5 0.0043979057591623 0.034431061674485
127 type 10 0.00418848167539267 0.0330863117190497
128 tests 11 0.00460732984293194 0.0357614188024733
129 patterns 14 0.00586387434554974 0.0434743544881844
130 show 8 0.00335078534031414 0.0275477613162236
131 Engineering 16.5 0.00691099476439791 0.0495994554238569
132 developer 13 0.00544502617801047 0.0409511995374668
133 context 10 0.00418848167539267 0.0330863117190497
134 type 9 0.0037696335078534 0.0303506765014925
135 tests 10 0.00418848167539267 0.0330863117190497
136 control 9 0.0037696335078534 0.0303506765014925
137 given 10 0.00418848167539267 0.0330863117190497
138 two 32 0.0134031413612565 0.0833847625423812
139 found 12 0.0050261780104712 0.0383815163162604
140 test 45 0.018848167539267 0.107989292760895
141 results 24 0.0100523560209424 0.0667106766115785
142 Table 13 0.00544502617801047 0.0409511995374668
143 set 26 0.0108900523560209 0.0710123467189126
144 value 12 0.0050261780104712 0.0383815163162604
145 refactoring 10 0.00418848167539267 0.0330863117190497
146 model 32 0.0134031413612565 0.0833847625423812
147 technique 11 0.00460732984293194 0.0357614188024733
148 different 24 0.0100523560209424 0.0667106766115785
149 change 13.5 0.00565445026178011 0.0422183733869045
Total 2387.5 1.0000 7.01030881472941

I’ve been following the news on the upcoming Facebook IPO. Most analysts and newsmakers describe the trading process as betting (in the sense of gambling). Here’s an example. Yet insightful financiers and economists would agree that trading and investing are not random processes. Nor is the market efficient. Where, then, does the market stand?

Malkiel would have us believe that the market is a process entirely modeled by random walks. This is the Random Walk Theory (RWT) of the markets: prices move randomly because ‘true’ news is entirely unpredictable and, thus, so are prices. On the other side of the coin, Fama promotes the hypothesis (and later an elaborated theory) that the markets are entirely efficient, i.e. that prices fully reflect all available information. This is the Efficient Market Theory (EMT). These two diametrically opposed views are, nonetheless, cynical, as writer and investor Thomsett has put it. In light of information theory and its power to quantify and perspicaciously study information, I’d argue that, far from being efficient or random, the market is entropic.

I’ve written an introductory article on information theory and entropy, located here. At a glance, entropy measures the uncertainty in a system or process over a number of outcomes (discrete or continuous). If the experiment has three outcomes (e.g. the price could be $3, $10, or $18), then these price levels are associated with probabilities, e.g. the probability of the price being $3 is 45%, and so on. Entropy is then the average information content (the expected surprise) of the outcomes, not the average price itself. If the outcomes are equiprobable, i.e. all three prices have a 1/3 chance of popping up, then there is maximum uncertainty, because one simply can’t predict which of the three prices is going to pop up next. This latter view conforms with that of the random walk theory, according to which true information cannot be predicted, and thus market actors cannot know (or predict) what price levels to expect next. On the other hand, if one could almost surely predict true news (information), then one would have no problem anticipating price levels, given the near-total confidence in the probabilities. This is the view promoted by the efficient market theory. Both views are academically appealing, but not realistic enough to be embraced by the field’s actors. The latter embrace the entropic view of the market (though perhaps not cognizant of what it means) that I’m alluding to here.
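
To make the three-price example concrete, here is a minimal sketch in Python. Note that only the 45% probability for the $3 level appears above; the 35% and 20% figures for the other two levels are hypothetical, chosen merely to contrast a skewed distribution with the equiprobable one:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the expected surprise, -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Skewed beliefs about the three price levels ($3, $10, $18); only the
# 45% comes from the text, the other two figures are made up.
print(entropy([0.45, 0.35, 0.20]))   # ~1.513 bits
# Equiprobable outcomes: maximum uncertainty for three outcomes.
print(entropy([1/3, 1/3, 1/3]))      # log2(3) ~= 1.585 bits
```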

The entropic perspective is simple: when one carries out fundamental and technical analyses, the average information one gains should increase. In other words, there should be an average decrease in uncertainty, or entropy, in the system. Thus, the decision a trader or investor makes is indeed based on existing information (financial indicators in company reports, price patterns on charts, etc.) as well as possible future information (insider hints, unpublished news, etc.). As such, predictive power increases because entropy decreases. The market is thus not truly random, but not yet efficient either; it sits somewhere in between on that spectrum. It is, in a word, entropic. Here’s a simple conceptualization of what I’m babbling on about:

[Figure: Entropic market hypothesis]
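
As a rough numerical companion to that conceptualization, suppose doing one’s homework shifts one’s beliefs about the three price levels from a uniform prior toward one outcome. The posterior numbers below are hypothetical; the point is only that concentrating probability mass lowers entropy, i.e. yields an information gain:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

prior = [1/3, 1/3, 1/3]         # before any analysis: the random-walk end
posterior = [0.70, 0.20, 0.10]  # hypothetical beliefs after the homework

print(f"entropy: {entropy(prior):.3f} -> {entropy(posterior):.3f} bits")
print(f"information gained: {entropy(prior) - entropy(posterior):.3f} bits")
# ~1.585 -> ~1.157 bits, a gain of ~0.428 bits
```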

According to the entropic view of the market, if new information comes in and it is surprising, then entropy tends to increase, because the number of factors that affect the outcomes in the system grows. If the new information is expected, such as Greece’s exit from the Eurozone, then it won’t add much entropy (randomness) to current price-level trends, although other factors might. In the long run, however, the system is less entropic, as moving averages reveal.

The main lesson to be learned from the entropic view of the market: do your homework when investing! Read, read, read more, and study, in order to decrease your uncertainty (which is also tantamount to ignorance, in the naive sense).