Sunday, December 18, 2016

Postmortem: Deducely

by Aswin Vayiravan, Deducely (@Deducely
Editor: Mei Nagappan (@MeiNagappan), University of Waterloo, Canada

Editor's Note: In a new series of posts (modeled after the Postmortems in, we are looking into what worked great and what challenges remain for software developers. We hope to curate many such postmortems in the future! Do email me directly if you have ideas on improving this series of posts.  In the first post of this series, I reached out to a small startup company called Deducely in India.

About Us
Deducely is an AI-powered sales lead generation platform. Usually, software companies have a lead generation team that does a lot of manual work like researching, filtering and qualifying prospects before the sales pitch happens. We have a tool that would remove the burden of repeated manual work in these teams. We use Tensor Flow to learn and track patterns and categorize leads. Also, we use NLTK to learn and extract the required information from unstructured data. We are a small bootstrapped startup headquartered in California but working out of a nondescript village in South India called Thiruparankundram.

What worked great for us?

  1. The SDLC model: Back in university we had meticulously studied the Pros and Cons of various Software Development Lifecycle models like the Waterfall model, Spiral model, and Agile model. etc. However, when I started my career in Freshdesk (A startup back then), it was surprising to find that I couldn’t fit the software development happening there into any one of these theoretical models! It was a hybrid of everything!  Similarly, in our startup Deducely we do not strictly follow any specific model, but the closest model that we could relate to would spiral model. We plan what has to be built, brainstorm the features with our customers and we finally start building one small module at a time, test it, release it and iterate.

  1. Development Platform: We are Linux lovers and to get Linux into our development devices, we use Vagrant. Although this technology might be a tad bit old, we are huge fans of it! It helps us in isolating various development environments. Even if we ruin the configuration of a particular vagrant environment, we can always spin up a new one from a previous snapshot. This gives us the freedom to go and SUDO without caring much about the consequences! We do not use any IDE. Atom is our text editor of choice.

  1. Maintaining the code: Git has become the de-facto version control system for code these days. Its decentralized approach takes some time to master but once you start making Git work for you the benefits it offers is unparalleled. Whilst many bigger companies use Github enterprise version or Atlassian Bitbucket, we went ahead with the Gitlab - an open source, self-hosted version control system with a plethora of features like revision control, continuous integration, a container registry, and an issue tracker.

  1. Tracking the tasks: It all happens with a very simple ToDo board on Trello. We are just two people (Myself and my co-founder Arun Kumar) working full time with a two more remote part-timers. We prioritize tasks and assign it to one person. Before we actually write and integrate different functions, we have a small chat about the function prototype and once the function is coded we run ad-hoc unit tests on the functions. Apart from this, the bugs in our code are tracked using GitLab’s inbuilt bug tracker.

Screen Shot 2016-12-10 at 12.11.29 AM.png
A screenshot of our task board

Screen Shot 2016-12-10 at 12.23.52 AM.png
A screenshot of the issues in one of our repositories

  1. Storing the data: Initially, there was a lot of arguments for using a conventional database like MySQL or PostgreSQL, but we settled with MongoDB because of its loosely typed Schema. All the queries in MongoDB are simple JS function calls and this is a huge advantage for a full stack JS company like ours. Also, say if we had to edit the schema of a MySQL table containing a billion records it would have been a plain disaster, but with MongoDB, we have better control over the schema and data types. Also backing up, and restoring the DB is fairly painless. Plus replication, fault tolerance and disaster recovery are made simple through MongoDBs replica sets. Though the mongoshell is the best way to connect to MongoDB, we prefer robomongo.

Our biggest challenges

  1. Callback Hell: We are Javascript aficionados, especially NodeJS. It gives us the ability to do multiple tasks in parallel instead of waiting for IO. We are a two person company and the NPM registry for NodeJS has a lot of trusted open source third party module that we generously incorporate while development. Also, the community support for NodeJS is very mature. If we face a problem, it can be solved in a matter of a few Google searches. However, writing code free of callback hell is a huge challenge not only with node but with any JS flavor. Callbacks, inside callbacks, makes the code unmaintainable and cluttered. We get around callback hell with a library called Bluebird. It makes code, more maintainable and certain code flows that are easily achievable in other languages aren’t easily possible in JS. Such code flows can be achieved with bluebird.

  1. Sequential execution: We had to scrape data out millions of web pages Javascript gave us a lot of power as it interacted natively with the DOM. However, it had one nasty side effect. In our case, we had to sequentially make millions of HTTP requests. In our initial days, I wrote a snippet to read the list of websites from the database and make those HTTP requests. Since NodeJS is non-blocking controlling the code flow was a huge challenge, and millions of HTTP requests went out but we were never able to get the responses back as these million requests went in one shot! At once! controlling this behaviour of NodeJS was the biggest challenge. JS isn’t very kind to sequential tasks and we work around this restriction via messaging queues - specifically RabbitMQ. It has a powerful API and an easy to use GUI to monitor the status of the queue.

Screen Shot 2016-12-10 at 12.57.51 AM.png

The RabbitMQ dashboard

Data Box
Developer Deducely
Platform Linux
Number of Developers 2 Full time and 2 part time
Length of development 1 year
Lines of Code 10K - 100K

Sunday, December 11, 2016

FSE 2016 Panel: The State of Software Engineering Research

by Matthieu Foucault, Carlene Lebeuf (@CarlyLebeuf), and Margaret-Anne Storey (@margaretstorey)- University of Victoria
Cross Posted from Margaret-Anne Storey's Blog.

The 2016 International Symposium on the Foundations of Software Engineering hosted a panel of prominent software engineering researchers moderated by Margaret-Anne Storey. The slides presented during the panel can be found here
Our panelists:

Tao Xie
University of Illinois at Urbana-Champaign
Tao is an ACM distinguished researcher. His Research focuses on automated software testing, mobile security, and software analytics.

Laurie Williams
North Carolina State University
Laurie is a founder of the Extreme Programming / Agile Conference. Her research focuses on software security, testing, and agile programming.

Peri Tarr
IBM Research
Peri is a principal research staff member at IBM TJ Watson Lab and a technical lead for Cognitive Tools and Methods at IBM. Her research focuses on software composition and aspect oriented software development.

Prem Devanbu
University of California at Davis
Prem started his career as an industrial software developer, then worked at Bell Labs and AT&T Research before beginning to teach at University of California at Davis. His research focuses on empirical software engineering, naturalness of software, and social analytics.

Lionel Briand
University of Luxembourg
Lionel currently leads the Software Verification and Validation Lab at the University of Luxembourg. He strongly advocates that research is practical to industry.
Our panelists were asked to reflect on three questions related to research in software engineering:
  • Do you believe our community as a whole is achieving the right balance of science, engineering, and design in our combined research efforts?
  • What new or existing areas of research do you think our community should pay more attention to?
  • Do you have novel suggestions for how we could improve our research methods to increase the impact of software engineering research in the near and distant future?
Each panelist was asked to briefly present their thoughts on these questions. Then we opened the floor to questions, and the rest of the panel was dedicated to a discussion between panelists and members of the audience. Our summary of the panel discussion focuses on the panelist responses to the three questions posed as well as the themes that emerged from their responses.

Balancing Science and Engineering

A common theme that quickly emerged was the importance of the role of industry in research. To kickstart the group discussion, panelists were asked to reflect on a statement made by Jan Bosch of Chalmers University at a research conference a few weeks earlier:
“Research does not start in universities anymore, it starts in industry.”

A quick show of hands at the conference demonstrated that the majority of people in attendance seemed to agree with Jan Bosch’s claim. Williams also agreed with this statement and expanded it further by stating that “research starts in industry, because that’s the context”. Briand felt that because “we are in a discipline where most of the phenomenon we are studying cannot be reproduced in a lab environment”, as software engineers, “our lab is the industry”.
All panelists agreed that collaborating with practitioners (not only industry, but also open-source communities, governments, etc.) is essential to solve real problems. Williams drew connections between research in software engineering and biology:
“If we try to come up with problems that we think are interesting, that would be similar to a biologist never going outside. We have to go out there and see the problems that they have and then help with it.”
However, even if practitioners are aware of these problems, they may not be able to solve them. Briand mentioned that a lack of expertise and a lack of freedom to look at novel solutions might be to blame. Devanbu observed that researchers have this advantage:
“As a researcher, you can have a broader perspective that spans over several languages, and not only try to generalize observations, but also find effects that are only observable at an ecosystem level. It’s not only a question of freedom, but also of perspective that industrials don’t have because they are not considering different projects at the same time.”
Xie suggested we engage in practitioners in research that is currently outside of their scope:
“If we show [practitioners] things outside of their scope (that in the longer term may be important), they may be more open, and may engage in collaborations with academic researchers, […] along with providing data, problems, or discussions.”
Members of the audience, namely Daniel Jackson (Massachusetts Institute of Technology) and Tom Ball (Microsoft Research), emphasized that a balance is needed, and that looking at basic science should not be left out. Notable examples, such as UNIX, Simula, ALGOL 60, and distributed systems, were not the product of massive empirical studies, but of academic researchers sitting in a room and brainstorming.
The discussion above illustrates the importance of making a conscious effort to reach out to practitioners. However, this is not an easy task and it requires real commitment and patience from researchers, as Peri Tarr mentioned:
“One of the problems that we face all the time as industrial researchers is gaining the trust of the people whose problem we’re going to help them solve […]. It can take months or years to get on the same page with the people who have a problem, to establish that yes, you’re looking for a way to solve their problem that will actually work for them, within, as Lionel points out, their real-world constraints.”
The audience (at FSE and listening to the broadcast) questioned (via Twitter) about our role as researchers and how we collaborate with industry, for example:

More discussion on these questions is needed! We invite you to participate in the blog discussion below.

Paying Attention to Other Areas of Research

In their opening statements, all panelists mentioned other areas of research that our community should look at.
Devanbu mentioned DevOps and IoT as other areas that the SE community tends to neglect. Xie mentioned SE research results that had a broad impact outside of the SE community, such as symbolic execution, delta debugging, or Representational State Transfer (REST). Xie further suggested that we consider more of the societal impact of SE research, advocating for a “bigger social responsibility” for researchers. He referred to the previous day’s keynote from Margaret Burnett about gender inclusiveness of software, and cited David Notkin’s 2013 quote:
“Anybody who thinks that we are just here because we are smart forgets that we’re also privileged, and we have to extend that further. So we have got to educate and help every generation”
Williams addressed the problem of cybersecurity as one of the main challenges for our community, stating that “we haven’t yet provided software engineers the means to write secure code without impacting their own workflow.” She said that software engineering researchers need to “situate [their] work in this world where there is someone working against [them], whether it’s an attacker or someone doing something they aren’t supposed to.” The second research area Williams highlighted was “agile software development on steroids”; the world of continuous integration, continuous deployment, devOps, continuous experimentation, testing in production, etc. We need to explore ways of adopting these practices as well as understand their benefits and the risks they introduce.
Tarr insisted on focusing our research efforts at the intersection of Software Engineering and other “high impact, societally important, value creation areas”, such as health care, environment, cognitive sciences, security and privacy, and education. She said, “In every one of these areas, these people are trying to get new generations of software done, but they don’t know how to do it […]. We desperately need software engineers at the intersection of these areas.” She noted that the traditional areas of software engineering research are now being driven by practitioners and that, as researchers, we are privileged to have the opportunity to take bigger risks that lead to bigger rewards. We “shouldn’t be working in areas where we aren’t afraid to fail”.
Briand considers that, although all topics covered by our community are relevant to practitioners, our “research is largely disconnected from practical engineering needs and priorities” and we “fail to recognize the variations across domains and contexts”. The needs and constraints of people developing software across these varying domains are completely different and “there is no such things as a universal solution to any software engineering problem”. In the domain of software engineering, our working assumptions and contextual factors make a huge difference. Because there is a disconnect from particular needs and priorities, there is a gap in the research literature – “the gap between what I needed and what I could find was significant” – that is too large to deal with.
What are your thoughts on the panelists’ suggestions for future software engineering research directions? Do you agree or disagree? Or do you suggest other areas we should pay attention to, e.g., are there other disciplines we should apply our results to, as the tweet below suggests? Let us know in the discussion below!

Widening Our Vision of What is Research

While discussing whether we need to pay more attention to different areas of research, the panelists were asked to comment on Jane Cleland-Huang’s (University of Notre Dame) tweet regarding fostering more diverse areas of research:

Cleland-Huang commented on her tweet by adding:
“It is easy for us as a community to lock into the same area. For example a lot of people do research that benefits from open-source systems, but other areas [are left out], such as immersive studies in industry, or areas where I do research in, such as requirements and traceability, where datasets are not so available. If we want to make a difference in those areas, what do we, as a community, need to do in terms of the review process and encouraging that kind of research?”
As discussed in the previous sections, if we want to do impactful research, we need to reach out to practitioners and look at the intersection of software engineering and other fields. However, this cannot happen because if, as Xie mentions, we keep a “narrow-minded definition of what is a research contribution”, a large number of papers that may have a high impact on industry will not make it in our venues. Too often our community rejects contributions that look at real-world problems because it’s not research, it’s engineering. A notable example of this is William’s comment regarding her research on agile software development: “My research and the research my lab did was initially rejected by the community because they considered that practitioners shouldn’t do that (using agile methods). But they were doing it, so we have to accept what practitioners are doing on a widespread basis.”
Xie mentioned that “we don’t have enough expertise or experience in the program committees to really judge whether there is a real problem or not.” Tarr furthered this with, “As a community, one of the most important things that we can do is to […] start establishing norms and bars for people who are conducting high risk, important research in important places and are going out into the world to get this information.”
In response to a question posed by Storey regarding how we know when our methods have crossed the line from research to pure engineering, Williams gave the advice that when we are shifting more to the engineering side of things, we should take a step back and reframe the problem in a more scientific way. For example, we can ask “what are the independent variables?” that will allow us to switch to a scientific way of thinking about our research.
This discussion is related to one tweet we received ahead of the panel:

It would seem that our community may need to accept contributions that differ according to their engineering, scientific and design content, but if that is so, do we need to establish different criteria when assessing papers? Jonathan Bell suggested in a tweet that we consider not just evaluation approaches, but also our datasets and tools:

Finally, some discussion that occurred on Twitter suggests we rethink how our community considers negative results and that we look to how other research areas embrace not just positive results:

In summary, we wish to thank the conference organizers for suggesting this panel, and we thank the panelists and the FSE community for participating in this discussion! And we hope to continue the discussion in the comments below!