Important privacy note: DuckDuckGo is my search engine of choice as well as the default browser on my smartphone. Check it out at https://duckduckgo.com/#1.

Also, for the curious, here’s how to get rid of Google: https://spreadprivacy.com/how-to-remove-google/.


Header photo: Dax the Duck, logo screenshot from duckduckgo.com

Quoted

Replicated from a tweet by @marnanel, and it checks out! Note the highlighted portion:

CC-BY-NC-ND 2018 itheorist. Facebook’s IPv6 address ($ host facebook.com)

Thanks to Lee Phillips for sharing this tweet about Julia, a relatively new programming language with the power of C and the usability of Python.

This is an article based on my M.Sc. thesis, published in 2015 in the rather serious venue IEEE Transactions on Image Processing (PDF here):

In this paper, we present an innovative method for lossless compression of discrete-color images such as map images, graphics, and GIS images, as well as binary images. This method comprises two main components. The first is a fixed-size codebook encompassing 8×8-bit blocks of two-tone data along with their corresponding Huffman codes and their relative probabilities of occurrence. The probabilities were obtained from a very large data set of two-color (binary) images and are used for arithmetic coding. The second component is row-column reduction coding, which encodes those blocks that are not in the codebook. The proposed method has been successfully applied to two major image categories: (i) images with a predetermined number of discrete colors, such as digital maps, graphs, and GIS images; and (ii) binary images. The results show that our method compresses images from both categories (discrete-color and binary images) by 90% in most cases, and outperforms JBIG-2 by 5% to 20% for binary images and by 2% to 6.3% for discrete-color images on average.
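For intuition, here is a toy sketch of the codebook stage the abstract describes: the image is cut into 8×8 blocks, and each block either hits the codebook or falls through to the second coder. Everything concrete below (the codebook contents, the code strings, and the "RAW" placeholder standing in for the row-column reduction coder) is a hypothetical illustration, not the paper's actual implementation:

```python
def blocks_8x8(image):
    """Yield the 8x8 blocks of a binary image (a list of rows of 0/1)
    whose dimensions are multiples of 8, in raster order."""
    for r in range(0, len(image), 8):
        for c in range(0, len(image[0]), 8):
            yield tuple(tuple(image[r + i][c + j] for j in range(8))
                        for i in range(8))

def encode(image, codebook):
    """Look each 8x8 block up in the codebook and emit its code; blocks
    not in the codebook would go to the row-column reduction coder
    (represented here by the placeholder 'RAW')."""
    return [codebook.get(block, "RAW") for block in blocks_8x8(image)]

# A one-entry codebook: the all-white block, assigned a short code.
white = tuple(tuple(0 for _ in range(8)) for _ in range(8))
codebook = {white: "0"}
print(encode([[0] * 16 for _ in range(8)], codebook))  # prints ['0', '0']
```

In the real method the codes are Huffman codes derived from block probabilities over a large training corpus, and the probabilities also drive an arithmetic coder; this sketch only shows the lookup-or-fallback structure.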

During my research work at IBM Ottawa in 2012, I had the opportunity to carry out ethnographic studies of software developers and testers in order to understand how they collaborate and how requirements are managed among them. From the large amount of data I gathered from interviews, work artifacts, and observations, I coauthored “Openness and Requirements: Opportunities and Tradeoffs in Software Ecosystems” with Eric Knauss et al. Download a PDF here.

A growing number of software systems are characterized by continuous evolution as well as by significant interdependence with other systems (e.g. services, apps). Such software ecosystems promise increased innovation power and support for consumer-oriented software services at scale, and are characterized by a certain openness of their information flows. While such openness supports project and reputation management, it also brings some challenges to Requirements Engineering (RE) within the ecosystem. We report from a mixed-method study of IBM’s CLM ecosystem, which uses an open commercial development model. We analyzed data from interviews with several ecosystem actors, participatory observation, and software repositories to describe the flow of product requirements information through the ecosystem, how the open communication paradigm in software ecosystems provides opportunities for ‘just-in-time’ RE, and some of the challenges faced when traditional requirements engineering approaches are applied within such an ecosystem. More importantly, we discuss two tradeoffs brought about by the openness in software ecosystems: i) allowing open, transparent communication while keeping intellectual property confidential within the ecosystem, and ii) having the ability to act globally on a long-term strategy while empowering product teams to act locally to answer end-users’ context-specific needs in a timely manner.

Research presented in this paper follows preliminary investigations reported in another paper we published in 2012.

Suppose I’m a professor at some institution directing a lab conducting research in computer science or related fields. A new graduate student is going to join soon. Below is a welcome package I’d send over to my student before they join my lab.

Dear graduate student,
Congratulations on being accepted into the program and on considering doing research in my lab!

Before you finalize whatever you’re doing and prepare for the adventure, I’d like to offer you the following welcome package. Please accept it, or at least review it, with all due wisdom.

I. Books you should read or reread

Graduate studies are fun, but they constitute an over-the-edge adventure. As such, you should temper your enthusiasm with a reality check. Are you organized? Are you generally a happy person? Do you tend to nourish or to avert negative thoughts? Can you cope with challenges and polish your skills? Do you have sufficient foundational knowledge for working with me? I’ve sent you several books you should read, if not study, before you head over to my lab. Here’s a brief overview of each of them:

1. Flow by M. Csikszentmihalyi

This book will probably draw your attention to its main theme: in order to enjoy whatever you do in life (even if your job is to mow grass), your skills must match your challenges; that balance is what produces optimal flow and contentment in your work. If, for whatever reason, the balance between challenges and skills becomes asymmetric, you either have to acquire new skills to cope with the increased challenges, or you have to seek greater challenges in order to apply your skills. If you live and work out of balance, you will feel listless, which in turn will negatively affect your success.

Csikszentmihalyi’s book (I know, the author’s name is a nightmare) is an amazing intellectual work. I highly recommend you read and study its points.

2. Getting Things Done by D. Allen

You may be an organized person. You may already have a system of personal organization in place. Or you may not. This book by Allen provides a practical personal organization system, either to augment your current system or to help you get started afresh. I’ve used this system in digital form, partly thanks to a rationale I came up with for adopting it without hesitation. The rationale, ironically, came after I read Flow: here’s what I observed.

3. The Mindful Brain by D. Siegel

Mindful awareness entails focusing your attention on your present activity, be it breathing, running, thinking, etc. It’s part of ancient wisdom, but it has also become a topic of research in neurology, psychology, and related fields. This book by psychiatrist Dan Siegel is sufficient to apprehend the concept and some of the practices related to maintaining a mindful brain in your life. I highly recommend you read it in order to learn how to eliminate minor worries and negative thoughts from your life.

4. The Conquest of Happiness by B. Russell

I’m probably biased here, since Russell is one of my favorite intellectuals. I don’t have heroes in my life, but Russell would be one of them if I were to posit a modern Mount Olympus. This book of his is an intellectual treatise on happiness for all sorts of people — for me, for you, for the janitor, for the philosopher, and so forth.

It’s about helping you realize that no matter how you feel before, during, or after doing something, the universe won’t change. Happiness, in Russell’s terms, is expanding your knowledge and nurturing feelings of affection towards others so as to widen your social circle. This is also my definition of happiness, and for me it supersedes all others. Thus, why be stressed before talking to an audience? Why entertain negative thoughts about life? What does it mean to be happy? Read this book to find out, and come up with your own definition of happiness or adapt Russell’s — you’ll need it during your stay in my lab.

5. Foundational knowledge

Naturally, I expect you to have good foundational knowledge in computer science. Otherwise, what’s the point of philosophizing and becoming a doctor in the end? You should know what the fuss about the P vs. NP problem is and why the Clay Mathematics Institute folks are willing to give away a million bucks to the one who comes up with a solution. Some venerable books in the field of algorithms, data structures, and computability are: Introduction to Algorithms by Cormen et al.; Introduction to the Theory of Computation by M. Sipser; and the Data Structures and Algorithm Analysis series by Weiss. Do review them and keep them for future reference.

6. The Logic of Scientific Discovery by K. Popper

This is one last book you should read and refer to, for several reasons. First, Popper introduces the concept of falsifiability: a theory is not scientific if it’s not falsifiable. Here’s an example of such a theory. Second, I should warn you that most research in applied fields of computer science, such as empirical software engineering, takes a neo-positivist (or logical positivist) stance. I, personally, am not a big fan of that stance, which mingles empiricism with rationalism to establish scientific truth by making verifiable, rather than falsifiable, claims. On this matter I align myself with Hume and with Popper, who criticized neo-positivism precisely for its criterion of verification.

That’s all your package contains with respect to books. Do your best to read them. As for myself, cultural legitimacy has always been crucial in my life — I wouldn’t trade it for anything at all.

II. A collaboration model

For the sake of not falling prey to natural, neurobiological wirings for collaboration, let’s give the following model a shot:

III. Your bank account

Now the nasty news. Unless you have no financial worries at all, I should let you know that you need to come to my lab with a positive financial cushion, say $10,000 at the least. The reason is simple: academia has been trapped by the publishing business. As such, you have to publish. Then, if successful, you have to travel to scientific, esoteric venues. Here’s the fun part: you’ll be reimbursed for your expenses, but the initial burden of covering the travel costs is yours; hence the financial cushion above, a figure that comes from my own experience. Like it or not, even if you are reimbursed completely once all is in order, you’ll still have incurred a financial loss. You’ll know why eventually. This is just a friendly warning.

IV. Gaining knowledge, augmenting your intellect

Here’s some practical advice: never refrain from reading books. You’ll probably be a little overwhelmed with reading scientific papers, some of which will fall into the ‘kerfuffle’ class of works, but do not let this deter you from reading books. Read a lot and read mindfully. Keep an open mind and read critically. Exploit local libraries. Also, write about your reflections on what you read, or record whatever thoughts you get in a personal blog. This way, you’ll polish your skills and augment your intellect.

V. Traveling

Finally, don’t become apathetic. Travel. Get acquainted with the local zeitgeist. Be a contrarian, but stay positive. In the end, you’ll succeed.

I hope you enjoy exploring this welcome package. Looking forward to collaborating with you.

Welcome to my lab!

Following my ethnographic research work at IBM Ottawa, my colleagues and I published the following paper (PDF here):

IT ecosystems are large software systems that consist of various, constantly interacting and partly autonomous subsystems as well as stakeholders of the overall system. Because of these specific properties, such systems are a highly relevant research area in the field of requirements engineering. In this paper we describe our approach to investigate and to model the flow of requirements in IT ecosystems. We are currently applying this approach in a case study in the IBM Collaborative Lifecycle Management project. This project is of particular relevance to the requirements engineering community because of its open commercial approach. This paper contributes by highlighting challenges of requirements engineering in IT ecosystems, i.e. contextualizing requirements, mapping them to subsystems, and communicating them to stakeholders. We define research questions and describe a mixed method approach to answer them.

We had set out to understand three phenomena:

  1. How requirement mapping is done
  2. How context is captured during elicitation
  3. How teams coordinate and communicate during a project

Adrian Kuhn shared the following image of the ICSE ’12 tag cloud built from the 1,373,011 words sampled from the main conference submissions:

To make it fun (!), I decided to compute the ICSE ’12 information entropy from the tag cloud. Here’s the fun I experienced:

First, I assigned a relative weight to the tags based merely on the visual perception of their font size, which in a tag cloud implies “gravity” or “importance”. For instance, the word ‘code’ got a relative weight of 85 whereas the word ‘pattern’ a relative weight of 9. The total weight for the 149 tags was 2387.5.

Next, I computed the empirical probabilities of each tag by using a frequency approach: the probability of a tag equals its relative weight over 2387.5, the total weight. It turns out the tags appear to follow a power-law distribution:
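As a quick sanity check of that power-law impression, one can regress log weight on log rank and look for a roughly linear fit with negative slope. This is an admittedly crude method, and the list below is only a representative subset of the top perceived weights from the raw data table, not the full set:

```python
import math

# A representative subset of the perceived tag weights, sorted by rank.
weights = [85, 70, 45, 38, 38, 38, 32, 32, 32, 32, 32, 26, 26, 24, 24, 24, 24, 24]

# Least-squares fit of log(weight) against log(rank): a power law
# w ~ rank^a would appear as a line with slope a < 0 on log-log axes.
xs = [math.log(rank) for rank in range(1, len(weights) + 1)]
ys = [math.log(w) for w in weights]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(f"fitted exponent: {slope:.2f}")  # negative => power-law-like decay
```

A proper test would use the full distribution (and something better than a naive log-log regression), but even this subset shows the heavy-tailed decay the plot suggests.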

Having the probability distribution, I could compute the Shannon entropy, which turned out to be 7.01 bits per tag. That is, in principle, you’d need on average 7.01 bits to encode each tag in the tag cloud. Note that function words (articles, auxiliary verbs, etc.) are excluded from the cloud, as they carry no software engineering context. If one were to include those, then, no worries — the entropy of English text, especially in academia, is probably less than 1 bit per character! Naturally, a more accurate first-order entropy could be computed if the count for each word were provided; the empirical probabilities would then be derived directly by dividing each word count by the total number of words (1,373,011). The perceived entropy value of 7.01 bits/tag, as well as that more accurate value, could help determine the lossless compression limit of the 393 MB ICSE Proceedings archive! But, hey, bring it all to Canada — no need for data compression in our land.
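The whole computation takes only a few lines. Here is a minimal sketch, where the weights dictionary is just a small stand-in for the full table of perceived weights given in the raw data:

```python
import math

def shannon_entropy(weights):
    """Shannon entropy, in bits per tag, of the distribution obtained
    by normalizing the relative weights to probabilities."""
    total = sum(weights.values())
    return -sum((w / total) * math.log2(w / total) for w in weights.values())

# A stand-in for the full table of perceived weights:
weights = {"code": 85, "software": 70, "test": 45, "data": 38, "pattern": 9}
print(f"{shannon_entropy(weights):.2f} bits per tag")
```

Feeding the full table (total weight 2387.5) through this function should reproduce the 7.01 bits/tag figure; substituting true word counts for perceived weights would yield the more accurate first-order entropy.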

Here’s the raw data, for reference:

i Tag Relative-weight p_i H_i (where p_i = relative weight / 2387.5 and H_i = −p_i log2 p_i, in bits)
1 code 85 0.0356020942408377 0.171313506572643
2 requirements 10 0.00418848167539267 0.0330863117190497
3 find 9 0.0037696335078534 0.0303506765014925
4 engineering 12 0.0050261780104712 0.0383815163162604
5 programs 13 0.00544502617801047 0.0409511995374668
6 cases 15 0.00628272251308901 0.0459543105059809
7 similar 11 0.00460732984293194 0.0357614188024733
8 result 8 0.00335078534031414 0.0275477613162236
9 pattern 9 0.0037696335078534 0.0303506765014925
10 three 13 0.00544502617801047 0.0409511995374668
11 language 12 0.0050261780104712 0.0383815163162604
12 user 13 0.00544502617801047 0.0409511995374668
13 often 9 0.0037696335078534 0.0303506765014925
14 large 11 0.00460732984293194 0.0357614188024733
15 search 9 0.0037696335078534 0.0303506765014925
16 students 15 0.00628272251308901 0.0459543105059809
17 models 17 0.00712041884816754 0.0507958018854544
18 source 19 0.00795811518324607 0.0554947822337051
19 tool 18.5 0.00774869109947644 0.0543325175142862
20 data 38 0.0159162303664921 0.095073334100918
21 number 32 0.0134031413612565 0.0833847625423812
22 one 32 0.0134031413612565 0.0833847625423812
23 II 10 0.00418848167539267 0.0330863117190497
24 also 32 0.0134031413612565 0.0833847625423812
25 approach 38 0.0159162303664921 0.095073334100918
26 e.g 14 0.00586387434554974 0.0434743544881844
27 projects 13 0.00544502617801047 0.0409511995374668
28 ACM 13 0.00544502617801047 0.0409511995374668
29 call 10 0.00418848167539267 0.0330863117190497
30 information 20.5 0.00858638743455497 0.0589346708986153
31 systems 17 0.00712041884816754 0.0507958018854544
32 feature 10 0.00418848167539267 0.0330863117190497
33 ICSE 10 0.00418848167539267 0.0330863117190497
34 task 11 0.00460732984293194 0.0357614188024733
35 based 18 0.00753926701570681 0.0531620859872783
36 file 9 0.0037696335078534 0.0303506765014925
37 approaches 10.5 0.0043979057591623 0.034431061674485
38 terms 9 0.0037696335078534 0.0303506765014925
39 generated 9 0.0037696335078534 0.0303506765014925
40 use 32 0.0134031413612565 0.0833847625423812
41 bugs 13 0.00544502617801047 0.0409511995374668
42 pages 11 0.00460732984293194 0.0357614188024733
43 classes 11 0.00460732984293194 0.0357614188024733
44 process 19 0.00795811518324607 0.0554947822337051
45 Figure 31 0.0129842931937173 0.0813737172482226
46 problem 12 0.0050261780104712 0.0383815163162604
46 quality 12 0.0050261780104712 0.0383815163162604
47 execution 16.5 0.00691099476439791 0.0495994554238569
48 shown 10 0.00418848167539267 0.0330863117190497
49 et 11.5 0.00481675392670157 0.0370780377843622
50 knowledge 10 0.00418848167539267 0.0330863117190497
51 line 11 0.00460732984293194 0.0357614188024733
52 several 8 0.00335078534031414 0.0275477613162236
53 class 13 0.00544502617801047 0.0409511995374668
54 usage 8 0.00335078534031414 0.0275477613162236
55 project 20.5 0.00858638743455497 0.0589346708986153
56 IEEE 13 0.00544502617801047 0.0409511995374668
57 need 10 0.00418848167539267 0.0330863117190497
58 existing 9.5 0.00397905759162304 0.0317264487084756
59 tasks 9.5 0.00397905759162304 0.0317264487084756
60 features 12 0.0050261780104712 0.0383815163162604
61 first 18 0.00753926701570681 0.0531620859872783
62 state 14 0.00586387434554974 0.0434743544881844
63 However 15 0.00628272251308901 0.0459543105059809
64 example 24 0.0100523560209424 0.0667106766115785
65 well 8 0.00335078534031414 0.0275477613162236
66 used 38 0.0159162303664921 0.095073334100918
67 testing 16.5 0.00691099476439791 0.0495994554238569
68 changes 14 0.00586387434554974 0.0434743544881844
69 paper 12 0.0050261780104712 0.0383815163162604
70 possible 10.5 0.0043979057591623 0.034431061674485
71 support 12 0.0050261780104712 0.0383815163162604
72 pp 32 0.0134031413612565 0.0833847625423812
73 function 8 0.00335078534031414 0.0275477613162236
74 system 24 0.0100523560209424 0.0667106766115785
75 using 26 0.0108900523560209 0.0710123467189126
76 framework 8 0.00335078534031414 0.0275477613162236
77 reports 10 0.00418848167539267 0.0330863117190497
78 Section 15 0.00628272251308901 0.0459543105059809
79 level 10 0.00418848167539267 0.0330863117190497
80 development 19 0.00795811518324607 0.0554947822337051
81 University 12 0.0050261780104712 0.0383815163162604
82 design 11 0.00460732984293194 0.0357614188024733
83 vol 10 0.00418848167539267 0.0330863117190497
84 important 9 0.0037696335078534 0.0303506765014925
85 Conference 10 0.00418848167539267 0.0330863117190497
86 analysis 24 0.0100523560209424 0.0667106766115785
87 methods 18 0.00753926701570681 0.0531620859872783
88 evaluation 9 0.0037696335078534 0.0303506765014925
89 Java 12 0.0050261780104712 0.0383815163162604
90 algorithm 10.5 0.0043979057591623 0.034431061674485
91 programming 12 0.0050261780104712 0.0383815163162604
92 time 24 0.0100523560209424 0.0667106766115785
93 method 24 0.0100523560209424 0.0667106766115785
94 participants 11 0.00460732984293194 0.0357614188024733
95 values 12 0.0050261780104712 0.0383815163162604
96 provide 10.5 0.0043979057591623 0.034431061674485
97 new 19 0.00795811518324607 0.0554947822337051
98 bug 23 0.00963350785340314 0.0645225677153213
99 i.e 11.5 0.00481675392670157 0.0370780377843622
100 work 23 0.00963350785340314 0.0645225677153213
101 study 18 0.00753926701570681 0.0531620859872783
102 application 16 0.00670157068062827 0.0483939519518189
103 case 19.5 0.00816753926701571 0.0566490951118284
104 API 13 0.00544502617801047 0.0409511995374668
105 tools 12 0.0050261780104712 0.0383815163162604
106 International 9 0.0037696335078534 0.0303506765014925
107 program 24 0.0100523560209424 0.0667106766115785
108 techniques 12 0.0050261780104712 0.0383815163162604
109 input 16 0.00670157068062827 0.0483939519518189
110 related 8 0.00335078534031414 0.0275477613162236
111 research 14 0.00586387434554974 0.0434743544881844
112 specific 10 0.00418848167539267 0.0330863117190497
113 developers 24 0.0100523560209424 0.0667106766115785
114 applications 12 0.0050261780104712 0.0383815163162604
115 Proceedings 10 0.00418848167539267 0.0330863117190497
116 shows 11 0.00460732984293194 0.0357614188024733
117 performance 14 0.00586387434554974 0.0434743544881844
118 variables 8 0.00335078534031414 0.0275477613162236
119 Software 32 0.0134031413612565 0.0833847625423812
120 order 10 0.00418848167539267 0.0330863117190497
121 rules 9 0.0037696335078534 0.0303506765014925
122 software 70 0.0293193717277487 0.149294299501816
123 al 12 0.0050261780104712 0.0383815163162604
124 following 9 0.0037696335078534 0.0303506765014925
125 many 12 0.0050261780104712 0.0383815163162604
126 elements 10.5 0.0043979057591623 0.034431061674485
127 type 10 0.00418848167539267 0.0330863117190497
128 tests 11 0.00460732984293194 0.0357614188024733
129 patterns 14 0.00586387434554974 0.0434743544881844
130 show 8 0.00335078534031414 0.0275477613162236
131 Engineering 16.5 0.00691099476439791 0.0495994554238569
132 developer 13 0.00544502617801047 0.0409511995374668
133 context 10 0.00418848167539267 0.0330863117190497
134 type 9 0.0037696335078534 0.0303506765014925
135 tests 10 0.00418848167539267 0.0330863117190497
136 control 9 0.0037696335078534 0.0303506765014925
137 given 10 0.00418848167539267 0.0330863117190497
138 two 32 0.0134031413612565 0.0833847625423812
139 found 12 0.0050261780104712 0.0383815163162604
140 test 45 0.018848167539267 0.107989292760895
141 results 24 0.0100523560209424 0.0667106766115785
142 Table 13 0.00544502617801047 0.0409511995374668
143 set 26 0.0108900523560209 0.0710123467189126
144 value 12 0.0050261780104712 0.0383815163162604
145 refactoring 10 0.00418848167539267 0.0330863117190497
146 model 32 0.0134031413612565 0.0833847625423812
147 technique 11 0.00460732984293194 0.0357614188024733
148 different 24 0.0100523560209424 0.0667106766115785
149 change 13.5 0.00565445026178011 0.0422183733869045
Total 2387.5 1.0000 7.01030881472941