Stopping Starting-up

I love starting up. The early ‘wouldn’t it be great if …’ conversations, the commitment to going on a journey into the unknown, the first days when absolutely nothing is in place and things like name, logo, and website are the focus of endless creative conversation, the first customer, the first hire …

It’s a hell of a ride.

There’s a huge amount of start-up advice around for the would-be entrepreneur. My favourites are the writings of Steve Blank and Eric Ries; Customer Development and Lean Startup have more to say about how to go from a vague idea to a functioning startup than any other business approach I’ve seen. There’s an increasing amount of support for start-ups, in the UK at least, from various tax breaks for would-be investors to the well-intentioned but somewhat ill-executed Startup Britain. There’s also a lot of very early stage investment money washing around from Angels and Micro-VCs; let’s face it, if you’re going to invest in a hare-brained, high-risk scheme dreamt up by a bunch of wide-eyed idealists you’re probably better off taking a punt on someone’s idea for a new type of social network than buying Euro-zone bonds.

Start-ups will save us from the global economic crisis!*

But once you’ve started your startup, you’ve done some customer development, you’ve perhaps pivoted your idea, you’ve reached the core of a product and you have some paying customers, then what? It’s time to stop being a start-up and establish your business. Steve Blank calls this phase ‘Company Growth’ in The Four Steps to the Epiphany.

In my experience** there are a few things you need in order to stop being a start-up:

A sustainable business model

On day one of your start-up you are concerned with the necessary detail of the company: what are we called, what do we do, where do we work? On day two you should be out there discovering your customers and understanding your potential market. On day three you should be trying to show those potential customers why they should become actual customers. And so on. You shouldn’t be worrying about sustainability, about what you need to keep being successful in 12-18 months’ time, because unless you focus on finding your customers and your product-market fit you won’t be around in 12-18 months’ time.

But at some point you’re going to know roughly who your customers are and your product will have demonstrated a reasonable fit with the market. At this point you need to be worrying about sustainability because things like cash-flow, excessive overheads, technical debt and the like might just prove fatal if you don’t deal with them.

A sustainable business model is simply one where the revenue exceeds the cost (and beware hidden costs; many lean start-ups rely on the early joiners living with a reduced or waived salary in return for a stake in the long-term success of the company, which is an inherently unsustainable situation). It doesn’t necessarily have to be by a lot and it doesn’t necessarily have to make month-on-month profits, but it does need to support itself. But whilst it is defined by financial security, sustainability is about more than just the money: a sustainable company will retain its employees, improve its processes, and learn from its internal influences as well as its external ones.

Less Product-Market Fit, more Market-Product Fit

A good product, one that meets the needs of its customers and has established itself in the market, should start to distort that market, no matter how niche (“Find your niche and dominate it”† was one of the excellent pieces of advice we were given on starting Singletrack). Disproportionately successful products like the iPad and Facebook demonstrate this in extremis; the newly-perceived tablet market is actually an iPad market, many times greater in size than the tablet market ever was, and early social networks simply never imagined the market was as big as Facebook has demonstrated it to be.

But all successful products demonstrate this effect to a degree. The current dominant player in our market does one thing well and many other things poorly (according to their customers we’ve talked to). They have distorted the market to be all about their core strength and we are actively seeking to displace them by redefining that market. Some of their customers will buy into us, some won’t, but we hope that in the next couple of years, when people talk about the market we’re in, the conversation will be much more diverse than it is today.

What I’m not saying is that Customer Development is a short-term process, or that an established company no longer listens to or learns from its customers. What I am saying is that in the early days a lean startup will do almost anything (within boundaries of ability, desire and possibly legality) that customers commit to paying for. But as time goes on you have a lot more data to go on and a lot more experience in your market and there will come a time when leading your customers on a market-defining journey is more valuable to you and them than focusing on fitting your product to the market.

Weaning off the founders

In the early days the founders are the company. As the company grows this effect lessens but it will be quite some time until the company is immune to the loss of some or all of its founders. But that is precisely what an established business needs to be. In the early days people will buy into the founders of the business as much as they buy into the product.

In fact I’d go so far as to say that belief in the founder has far more to do with an early customer committing to buy something that doesn’t actually exist yet than whatever the product is portrayed as shortly being able to do.

But as time goes on the founders need to replace themselves with others who can do the jobs they’ve been doing better than they can. They need to reduce their day-to-day involvement and ensure they are steering the company in the right direction. This doesn’t mean they need to make themselves redundant or irrelevant, just that the company should be able to continue if and when the founders decide to leave.††

In Conclusion

These are the things I think you need to stop being a start-up and get established. But I’m sure there are more, and I’m interested in other people’s experience. It seems to me that with all the buzz about start-ups around at the moment, a good body of experience and knowledge about how to successfully stop being a start-up is going to be increasingly important.

*Actually start-ups won’t save us from the global economic crisis but they might just create a few much-needed jobs, create a bit of excitement and confidence, and instil more entrepreneurial spirit in this country.

** Background: I’ve started three companies. The first never got out of being a start-up and died midway through its second year. The second was modestly successful and is still around after 10 years. The third is currently making the transition from start-up to established business.

† Parker Harris, co-founder of Salesforce.com

†† I’ll know Singletrack is doing okay when the team start telling me to shut up and let them do their jobs. I’m looking forward to that day.

Optimising Custom Applications on Force.com

One of the greatest challenges of developing on someone else’s PaaS offering is performance optimisation. When I built large-scale enterprise web systems from the ground up there were so many levers we could pull to improve performance without changing a single line of code: more/better CPUs, more memory, more/better use of indexing/caching/threading, and so on. And if we wanted to optimise code we could choose where to apply our efforts, from the lowest of the low-level infrastructure code to the client code running in the browser.

But when you build code that runs on someone else’s platform you have only one thing you can optimise: your own code.

One of the things that amazes me about building on force.com is how infrequently we need to do performance optimisation. Create a simple custom page with a record or two’s worth of data and a smattering of dynamic client-side functionality and an end user will be hard pushed to tell that it’s not a native force.com tab. Even more complex pages, with more than a couple of records and more than just a smattering of client-side functionality, render pretty damn quickly and are perfectly usable. But the Singletrack system also has a few pages that are really very complex, covering a few hundred records and providing a lot of client-side tools. This post covers how we optimised one such custom force.com page: ~40s to render in its first version, down to < 2s now.

The problem

Deliver information about a list of contacts, usually around 150-300 in length. Which contacts are returned may be manually set or may be dynamically chosen using a set of non-trivial criteria. What information is delivered is configurable on a per-user basis but typically consists of ~20 fields covering the Contact, its Account, and recent Activity History. Add in a number of tools for manipulating and interacting with the information on the list. If it helps, think of it as a Salesforce list view on steroids.

The first solution

Work out what information the user wants to see about each contact (from Custom Settings). Work out the criteria for selecting the contacts (stored on the record underpinning the view). Dynamically construct the query string and execute it. Render the results as a table using Visualforce (think JSP/ASP/ERB for non-force.com’ers) along with embedded Javascript for all the client-side functionality. The result: ~40s from request to availability in Chrome.
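
That first version’s server side can be sketched roughly as follows – in JavaScript purely for illustration, since the real code is Apex/SOQL; the field names, criteria shape, and the buildContactQuery helper are all invented for the example:

```javascript
// Sketch of dynamically constructing the query string from configuration.
// Illustrative only: the real implementation is Apex/SOQL and all the
// names here are invented.
function buildContactQuery(fields, criteria) {
  // 'fields' would come from per-user Custom Settings,
  // 'criteria' from the record underpinning the view.
  const select = fields.join(', ');
  const where = criteria
    .map(c => `${c.field} ${c.op} '${c.value}'`)
    .join(' AND ');
  return `SELECT ${select} FROM Contact WHERE ${where}`;
}

const query = buildContactQuery(
  ['Name', 'Account.Name', 'Email'],
  [{ field: 'Account.Industry', op: '=', value: 'Media' }]
);
// 'query' is now a single SOQL-style string, ready for dynamic execution
```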

The first optimisation … and some lessons learned

Rule #1 of optimisation: profile the shit out of the system and work from facts, not opinions. But profiling isn’t well supported in force.com (basically, debug statements with timestamps are required) so we made some guesses as to where we thought the problem was likely to be in order to focus our instrumentation efforts. Given we were still quite new to force.com at the time we were probably a bit too influenced by our fears and immediately set about instrumenting all the querying. A waste of time: even increasing the number of contacts tenfold, the querying accounted for less than 300ms of the request. In general the server processing was really very fast.
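
The timestamped-debug-statement approach boils down to a checkpoint timer. A minimal sketch, in JavaScript for illustration (on force.com this would be System.debug calls with timestamps):

```javascript
// A minimal checkpoint timer: the poor man's profiler you fall back on
// when the platform offers nothing better than timestamped debug output.
function makeTimer() {
  const marks = [];
  let last = Date.now();
  return {
    mark(label) {
      const now = Date.now();
      marks.push({ label, ms: now - last }); // time since the previous mark
      last = now;
    },
    report() {
      return marks;
    },
  };
}

const timer = makeTimer();
// ... run the querying step ...
timer.mark('query');
// ... run the rendering step ...
timer.mark('render');
// timer.report() now shows where the time actually went: facts, not opinions
```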

Lesson #1 of optimising in the cloud: profile the shit out of the system and work from facts not opinions.

Instead we turned our attention to the page rendering, and this turned up a surprising result. We needed two <apex:repeat /> loops to construct the table; one for the rows and one for the columns. Rendering a table of 2000 rows (requiring 2000 iterations in a single loop) was pretty fast; rendering a table of 400 rows with 5 columns (also 2000 iterations, but across two nested loops) was not. In fact it was 10 times slower, and rendering 200 rows with 10 columns – our most typical use case – was much slower still.

This is when lesson #2 of optimising in the cloud really hit home: you can either do less or you can do differently; you don’t have the option of adding more of a vital resource. We could remove the ability of users to choose which columns they saw (making the column set fixed and removing the need for the nested loop) or we could change the way we rendered the table. In the end we decided to do differently and return all the results as JSON data, constructing the table in Javascript on the client. Our first version of this approach halved the load time: 20s from request to availability.
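
The ‘do differently’ option – ship the records as JSON and build the table on the client – looks roughly like this (a heavily simplified sketch; the real page carried far more tooling, and the column/row shapes here are invented):

```javascript
// Build the table markup on the client from JSON rows, replacing the
// nested server-side <apex:repeat/> loops (simplified sketch).
function renderTable(columns, rows) {
  const head =
    '<tr>' + columns.map(c => `<th>${c.label}</th>`).join('') + '</tr>';
  const body = rows
    .map(r =>
      '<tr>' + columns.map(c => `<td>${r[c.field] || ''}</td>`).join('') + '</tr>'
    )
    .join('');
  return `<table>${head}${body}</table>`;
}

const html = renderTable(
  [{ field: 'name', label: 'Name' }, { field: 'email', label: 'Email' }],
  [{ name: 'Jo Bloggs', email: 'jo@example.com' }]
);
// 'html' can now be injected into the page in one go
```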

However we quickly worked out that our JSON-based solution (this was before force.com released native support for JSON) was still pretty slow, due to using ‘+’ for String concatenation when creating the JSON string. Replacing this with extensive use of String.format() more than halved the time again: 8.5s.
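
The underlying problem is a classic one: Apex strings are immutable, so ‘+’ in a loop copies the whole growing string on every iteration. Collecting the parts and joining once avoids that. The shape of the fix, sketched in JavaScript with invented record shapes:

```javascript
// '+' in a loop copies the growing string each time round (O(n²) in Apex,
// where strings are immutable); collecting parts and joining once is O(n).
function toJsonSlow(records) {
  let out = '[';
  for (let i = 0; i < records.length; i++) {
    if (i > 0) out = out + ',';                    // repeated concatenation
    out = out + '{"id":"' + records[i].id + '"}';  // copies 'out' every time
  }
  return out + ']';
}

function toJsonFast(records) {
  const parts = records.map(r => `{"id":"${r.id}"}`);
  return '[' + parts.join(',') + ']';              // one join at the end
}
```

Both produce identical output; the difference is purely how much copying happens along the way (and native JSON serialisation beats both, as we found later).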

The second optimisation … and more lessons

We lived with this for a while. ~10s wasn’t great but it was no slower than opening up an Excel spreadsheet (what users had been doing before they used our system) and the general consensus was that it was okay. Of course, what seems acceptable on day one rapidly becomes irritatingly slow, and within a couple of months there was a lot of grumbling about performance, especially as some people were reporting that the page ‘regularly’ took 20+s to load. This turned out to be rooted in some environmental issues: i) browser caches were being cleared out every time the browser was closed – a not uncommon admin setting in our customers’ domain – and ii) the ISP routing for salesforce.com subdomains (used for accessing custom code in managed packages) turned out to be less than optimal, sometimes adding 6-8s to a request. The latter was a real eye-opener and we still haven’t got to the bottom of why it happened, but switching to a back-up ISP and resolving the browser cache issue ensured the customers were getting consistent ~10s response times.

Once we’d resolved these problems we noticed that the latency in requesting Static Resources from force.com could be pretty high: ~1.5-2s in some cases (as is commonly the case, all our Javascript and CSS files were packaged up as a zip file and deployed to force.com as a Static Resource). Moving our Javascript and CSS outside the package and delivering them via Amazon Cloudfront won’t improve performance in the common circumstance of someone accessing the system with a fully populated browser cache, but it can shave a second or two off the overall response time for a browser with an empty cache. It also isolates us from whatever circumstances cause force.com to update the timestamps used for Static Resource caching (it seemed that every new force.com release required all browsers to completely repopulate their caches, causing a rash of performance complaints directly after each major Salesforce upgrade).

Lesson #3 of optimising in the cloud: there’s not much you can optimise outside your own code, but there is more than nothing.

This round of work got us looking at our implementation again. One thing that really stood out was the amount of time we spent on a ‘blank page’ before any of our data was even being downloaded (ignoring client-side processing entirely): 4s. Create a custom Visualforce page with just some text in it and it will have a sub-second response time, and we knew that querying and processing the data took only ~300ms, so it seemed surprising that it was taking ~3s to get to downloading the data. Some investigation turned up a very surprising result: the viewstate for the page was huge even though there wasn’t much real state within the page. By ensuring that the scope of the <apex:form/> tags was as narrow as possible (just encompassing the fields and actions within the form) and by judicious use of transient variables we were able to significantly reduce the size of the viewstate, which brought the ‘blank page’ time down to below 2s and the overall response time to ~4.5s. Additionally, the psychological effect of not staring at a blank page for a few seconds meant that people felt the page was a lot faster than you might expect from a 60% improvement.

Something else we looked at here, after realising that Static Resources can be a bit slow, was how long it took for force.com to download its own Javascript and Stylesheets. By removing the Salesforce header and sidebar, and opting not to load standard style sheets we took another 0.5s off the page load.

Lesson #4 of optimising in the cloud: a bit of understanding about how the platform works goes a long way to spotting areas for optimisation even when you think your actual code doesn’t have a lot of room for improvement.

With the new native support for JSON (worth ~1.5s over our homegrown implementation, and resulting in vastly simpler code) and a few other minor tweaks we’re now down to a steady 2s for page load; a whopping 20-fold improvement over that first, not-too-naive implementation.

Summary

Far from being stuck with whatever performance you get out of force.com on your first effort, there is plenty of opportunity for optimisation if you find you need it. However, don’t look too hard at the platform for bottlenecks: apart from the few specific cases mentioned, the answer lies in your use of the platform and in your custom code. String concatenation and viewstate in particular seem to be areas where you can gain quite significant improvements with relatively little effort and, with the new support for JSON, shipping compact JSON strings that you then render on the client rather than having Visualforce do all your rendering on the server side is definitely a good option even if you don’t need nested <apex:repeat/> loops. And if you see inconsistent performance in your customers’ environments there are certain things to go looking for there that can make a dramatic improvement.

Why we don’t Continuously Deploy

Continuous Deployment is very much en vogue; it’s seen as the natural extension of Continuous Integration and is a part of the lean startup philosophy. But we don’t do it, and I thought it might be interesting to explain why.

  1. Our users like improvement but they don’t like change. We build a business system for business users and each of our users spends 2-8 hours a day using our system. They want to see the system improve but having a consistent, recognisable user experience is more important than having one that is constantly changing, even if that change brings improvement. So we gather all their feedback as they use the system and package it into a release. When that release is ready we do demos and distribute documentation to make sure everyone is aware of the changes coming. Bundling all the improvements into one big change allows us to make sure everyone understands what is coming, is happy about what’s new, and that there are no surprises when they log in.
  2. We do QA as well as testing. To me QA is a separate activity to writing and running automated tests. My experience of developer-written tests (unit, functional, integration and so on) is that they tend to take a rational and logical view of the functionality under test: what happens when it’s used correctly, what happens when expected parameters aren’t supplied, what happens when parameters are supplied with unexpected values? But users aren’t necessarily ‘rational’ or ‘logical’ as developers see it, and having humans bash away at the system, hitting the back button randomly, pasting parameter values from surprising sources, leaving the browser open for an hour while they have lunch, and so on helps us to spot the gaps in the automated tests and the assumptions we made whilst writing them. But manual QA takes time, and continuous pressure to have it done as quickly as possible to support continuous deployment would be counter-productive. So we bundle our changes up into weekly QA releases, which means QA can be done at a pace that suits the assurance of quality rather than the speed of deployment.
  3. We’re still learning. To me there’s a bit of a paradox in Continuous Deployment as applied to lean startup. On the one hand, multi-variate testing and continuous deployment allow you to gather and act on lots of feedback very quickly. On the other hand, a constantly changing user experience makes it harder to make sense of that feedback. In our system there is a piece of core functionality that users use for ~3 hours each day. If that changed on a daily basis they’d quickly get pissed off with it and wouldn’t use it until it was ‘ready’ or ‘done’. We’re now on our third major re-working of that functionality and the latest version is vastly improved, based on real user feedback. But each new version has been introduced as a whole, with all the change management mentioned above, and so we’ve kept our users onside whilst making quite radical changes to the product.
  4. ‘Continuous’ is a relative, not an absolute, term. That sounds like a stupid statement but it’s a lesson I learned a while back. In a previous job I was doing some work with a company and their supplier to improve their responsiveness to change. We were pushing for weekly releases but quickly realised that, to a pair of companies used to 18-24 month release cycles, quarterly releases would be a big improvement and much more achievable than weekly. Were we being unambitious? Perhaps from our own point of view as eXtremists, but in terms of the goals our clients had, quarterly would take them a long way towards achieving what they wanted. At Singletrack we do releases every 6-10 weeks, which is far more continuous than the other systems our users use (upgrades every 12-24 months are more the norm) but avoids the disruption that ‘true’ Continuous Deployment would cause.
So we currently do: Continuous Integration in development; Daily Builds with full runs of automated tests (a full run takes about 3 hours, otherwise we’d do this more frequently); Weekly QA releases; End-user releases every 6-10 weeks. A lot of this stems from our business: we sell a complex, powerful SaaS product to sophisticated and demanding business users. If we were running a B2C website with a host of irregular and infrequent users (from the point of view of our system, a user that doesn’t log in for hours every day is infrequent) then we would certainly go for a much more continuous process than the one we have now.

An economic model of technical debt?

A lot of what I’ve read about technical debt assumes that it is generally a bad thing. Many of these articles also use credit card analogies to compare technical debt to personal finance, and I think this misses a trick. Businesses take a different view of debt to individuals, and I wonder if using a more mature model of technical debt would allow us to take a more nuanced view? [Caveat: I’m not an economist or an accountant, I’ve just run SME businesses for a few years and so have a little knowledge, possibly just enough to get this badly wrong.]

Start-ups like ours run on credit. Whether it’s founders taking no salary, people working for sweat equity, friends-and-family or Angel funding, buying kit and services on personal credit cards, whatever – there’s going to have to be a bit of debt incurred to get from having nothing but an idea to having enough paying customers to sustain you. When we incur technical debt in our start-up it is in very much the same spirit: if things don’t pan out, the debt isn’t going to have to be paid back. That doesn’t mean you take on as much credit as is being offered, because loading the business with debt may quickly become one of the causes of its failure. But you take sensible risks, take the credit you need, and work bloody hard to pay it off before it becomes a burden.

As a business starts to establish itself, things like cashflow become more important. A business might use credit (e.g. order factoring or bridging loans) to smooth the flow and reduce the risk of running out of money. Many projects I’ve run experience a similar ebb-and-flow of requirements, and I wonder if taking a longer-term view of the ‘requirements flow’ of a project might allow a more structured approach to technical debt: when there are lots of requirements to deliver in a short period, be prepared to take on a bit more technical debt; when there is less pressure to deliver, pay the debt down. And if the ‘less pressure’ phase doesn’t look like arriving, take extraordinary action before the technical debt gets out of hand.

Once a business is established and looking to grow, it often takes on debt in order to achieve strategic objectives. What would be the equivalent in software delivery?

The problem with all of the above is that it is still very much an analogy, and I’m no great fan of analogies in software development. But what struck me about the conversation leading up to and following my previous post on Technical Debt is that technical debt isn’t really a metaphor as such. It’s not like monetary debt – where taking a bit of extra risk and incurring a greater overall cost in the long term allows you to achieve important things in the short term – it is monetary debt. By accepting technical debt you are (presumably, or why else are you doing it?) incurring a lower cost, or enjoying a higher income, today at the price of a higher cost in the long run.

So what if we stopped treating technical debt as a metaphor and started considering it as monetary debt? Could we quantify the costs we save by not doing something today (e.g. pairing, refactoring, resolving limits issues, optimising, or even just applying good old YAGNI) that may cause us pain at a later date? Could we quantify the value gained by doing something valuable sooner? Could we quantify how much more it will cost at that later date when we have to deal with the pain? Could we use these numbers as a basis for decisions about whether or not to incur technical debt?

Here’s a simple example from our start-up. Two customers wanted broadly similar features adding to our product but had different timescales and some differences in detailed requirements. We took the decision to build two separate versions of the feature, one for each customer, even though we knew that this would give us two broadly similar sets of code to maintain and, if we wanted to sell the same feature to other customers, we would not only need to refactor them into one code set, we’d also have to do some additional work to migrate the customer’s data from their individual versions to the new unified model.

In this instance it wasn’t about cost saving as such. Both customers were willing to pay for the feature to be added if we could do it to their timeframe. So we got some money up front, let’s say $100 (I’ll preserve the ratios but I’m not prepared to reveal the real sums). The cost of building it twice was a bit more than building it just the once, let’s say $20 instead of $15. We then had the additional cost of maintaining two code sets for a while, let’s say $3 instead of $2, and then the cost of the rework and migration: $12. All in, the gross profit was $100 – $20 – $3 – $12 = $65.

Now suppose we’d built it once for one customer and then evolved it for the second customer. Immediate income is $50. Gross profit is $50 – $15 – $2 = $33. If we then sell to the second customer the profit goes up, but probably not to $83; let’s say $80, because there’s bound to be some migration or re-work required to keep customer 1 in alignment with what customer 2 wants.

In this economic model it is $15 more profitable not to incur the technical debt. But, and it’s a big but, in any business, and particularly in a startup, cash is king. $100 this month is way better than $50 this month with the expectation of another $50 three months later. And that assumes customer 2 still wants to pay in three months’ time; by then they may have bought or built an alternative, in which case you’ve only made $33, not $65. By any conventional business model, a certain $65 profit is vastly preferable to a certain $33 profit with the possibility of an additional $47.
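
The arithmetic above is simple enough to write down directly (same disguised ratios as in the text; the grossProfit helper is just for illustration):

```javascript
// The worked example as plain arithmetic, using the disguised ratios above.
function grossProfit({ income, buildCost, maintainCost, reworkCost }) {
  return income - buildCost - maintainCost - reworkCost;
}

// Option A: build it twice now, incur the debt, rework and migrate later.
const buildTwice = grossProfit(
  { income: 100, buildCost: 20, maintainCost: 3, reworkCost: 12 }
); // $65

// Option B: build once; this much is certain even if customer 2 never signs...
const buildOnceCertain = grossProfit(
  { income: 50, buildCost: 15, maintainCost: 2, reworkCost: 0 }
); // $33

// ...and roughly $80 if they do (allowing for some alignment rework):
// $15 better than option A, but only if the second sale actually happens.
```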

But whether or not you agree with the decision to go for the $100 now and incur the technical debt isn’t the point. The question isn’t whether we made the right choice in this instance, it’s whether quantifying things like this makes it easier to surface the benefits and liabilities of technical debt. A lot of the blog articles about technical debt say it’s hard to quantify, but making rough guesses about this stuff isn’t that hard, and rough guesses should be all you need (let’s face it, most successful businesses are run on far rougher guesses about predicted revenue, predicted costs, etc.). And it seems to me that if, as techies, we could get better at using terms, equations and numbers the business understands, we could get much better at communicating why we should or should not incur technical debt, what our current debt level is, what it will cost to pay down, what the financial, reputational or other forms of impact might be if the risks inherent in the debt don’t pan out, and so on.

Perhaps we could stop treating technical debt as a metaphor and start using it as a real tool for planning and delivering our products and projects.

Technical Debt and the Lean Startup

A conversation on twitter between @david_harvey, @rachelcdavies, @RonJeffries and myself made me think it was about time to post my view of technical debt in a lean startup, and how that view has changed since the days of running XP-based projects for established companies.

There’s a lot of information about Technical Debt out there on the web (including Ward’s Video, a clear summary by Martin Fowler, and an interesting, albeit somewhat flawed, article by Rod Hilton) and I’m not going to get into a detailed description here, but at its simplest level it says that making some quick-and-dirty choices today may lead to pain later on, and that pain is like a debt that you’re going to have to pay back at some point.

It’s that ‘at some point’ which has led me to believe there is a difference between technical debt as applied to a lean startup and technical debt as applied to a project for an established business. In a project you are trying to deliver something which meets some kind of business case. You will know roughly who the customer is, roughly what their objectives are, roughly what you can spend (time and money) in order to achieve them, and so on. There’s going to be a lot that isn’t known, but there are some constraints in place. When I used to use XP to deliver these kinds of projects we had a very simple view of technical debt: don’t incur it if you can avoid it and, if you can’t avoid it, pay it down at the earliest possible time. Why? Because in these projects sustainability is a primary objective: you know you’ll have to pay the debt down at some point, so why let it grow uncontrollably?

In a lean startup not only do you not know what your customers want, you don’t even know who they are, and you usually have less time and money to spend on finding out than most established companies would try to run a project with. Your primary objective is finding those customers and learning what they want, not sustainability. Which isn’t to say sustainability is unimportant, just that it’s not your primary objective; once you have a business, then you worry about how to make it sustainable. It’s a High Quality Problem.

Here are some examples of technical debt we incurred at various stages of our search for customers and what they wanted:

  • Areas of the system understood by only one person. I’m a pair-programming fanatic and would love for everything in our system to be developed by pairs. But when you have only one developer in the company, that isn’t possible. When you have two developers and one has to go support a sales meeting, that isn’t possible. In fact, our experience is that until you have 5+ full-time developers there are going to be times when it isn’t possible to have everything paired. And do you know what? That’s okay. Code can be developed by a single developer but the fact that they built it on their own is a form of technical debt: until others have worked on the code, it is more widely understood, and has had the benefit of many eyes on it, you are in debt to the system.
  • Large scale refactorings left undone. We use a number of underlying platforms and tools. If the platform or tool doesn’t do what we need we develop custom code. It isn’t uncommon for the tool or platform provider to then release something that does what our custom code does at a later date and, until we refactor to use the new platform or tool feature, we have some debt to the system. But that’s okay – the custom code still works, it’s just not as simple as it could be, nor as future-proof if the provider develops the feature further.
  • Limitations left in place. Sometimes a truly scalable solution is much harder than a simple but limited one. We had a situation where using in-memory collections led to a hard limit on the number of objects we could handle but the in-memory version of the code took a couple of hours to produce. We knew how to get around the limit – by adding in a system of creating permanent records and then processing them asynchronously using scheduled jobs – but that was several days work. The limit was okay – the function still worked, just for small collections of objects, and we had a debt to the system as and when our customers couldn’t live with the size limit.
  • Sub-optimal design approaches. Sometimes it is easier to deliver a simplistic design than a well-crafted one. For example, we delivered a quick-and-dirty reporting function that was simple but slow: we had made it work, then made it right, but hadn’t yet got to making it fast. But that was okay – the information was available to the users and correct, just not very quickly delivered.
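The third example above – a deliberate limitation left in place – can be sketched roughly as follows. The names and numbers are hypothetical, but the shape is the trade-off we made: a hard cap on the quick in-memory version, with the scalable path (persist records, process them asynchronously in a scheduled job) deferred until a customer actually hits the cap.

```python
# Hypothetical sketch of a deliberate limitation: the quick in-memory
# version enforces a hard cap; the scalable asynchronous version is
# deferred technical debt, paid down only if customers hit the limit.

MAX_IN_MEMORY = 1000  # the hard limit we chose to live with


def transform(item):
    # Stand-in for the real per-item work.
    return item * 2


def process_collection(items):
    """Quick version: a couple of hours' work, but capped in size."""
    if len(items) > MAX_IN_MEMORY:
        # The known fix -- write permanent records and process them in a
        # scheduled background job -- is several days' work, so for now we
        # fail loudly rather than silently misbehave.
        raise ValueError(
            f"Collection too large ({len(items)} > {MAX_IN_MEMORY}); "
            "asynchronous processing not yet built."
        )
    return [transform(item) for item in items]
```

Failing loudly at the limit is part of managing the debt: it tells us exactly when a customer can no longer live with the limitation.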

But why would we live with such limitations? Aren’t we just delivering a shoddy system?

The answer to the first question is that we’re prioritising learning what makes our business work over some abstract notion of quality. If we develop a feature and it turns out that customers don’t want to pay for it, does it matter if it is only understood by one developer, that it could be refactored, that it is limited or sub-optimal? We develop the feature to get it into customers’ hands as quickly as possible, to understand whether it is something valuable or not, to understand how it should work. Only if that feature is valuable is it worth paying down any debt we have incurred building it. In fact, never mind the feature, the same applies to the whole system: unless the system is sufficiently valuable to customers we have no business, and if we have no business the whole question of quality is moot. Customers pay for features and benefits, not fully-paired, beautifully factored, limitation-free, optimised code.

The answer to the second question is an emphatic ‘no’. We produce production quality, tested code. The testing is something we don’t compromise on, simply because it is the mechanism by which we will pay down the debt if and when the time comes to do so. I can live with the fact that a design is sub-optimal if we have tests which will help us optimise it when we have to. I can live with code that needs refactoring if we have tests which will help us refactor it when we have to. I can even live with something written by one person if … well, you get the picture.
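That principle can be illustrated with a hypothetical behaviour-pinning test for the quick-and-dirty reporting function mentioned earlier. The test only cares about the output, not how it is produced, so it survives the later optimisation that pays down the design debt – that is the sense in which tests are the paydown mechanism.

```python
# Hypothetical sketch: pin the behaviour of the slow-but-correct
# reporting code so that a later, faster implementation can be checked
# against the same expectations.

def build_report(sales):
    # Simplistic version: correct, but re-scans the list per region.
    regions = sorted({s["region"] for s in sales})
    return {
        region: sum(s["amount"] for s in sales if s["region"] == region)
        for region in regions
    }


def test_report_totals_by_region():
    sales = [
        {"region": "north", "amount": 10},
        {"region": "south", "amount": 5},
        {"region": "north", "amount": 7},
    ]
    # The assertion says nothing about *how* the report is built, so it
    # remains valid when we eventually make it fast.
    assert build_report(sales) == {"north": 17, "south": 5}
```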

In a startup, technical debt is something to be managed, not minimised. We make sure we understand how much debt we have and which bits of the system it affects. We make sure we have the ability to pay down that debt as and when we need to. And we make sure we work the time and money required to pay down debt into any timescales or budgets we agree.

Anything less would be irresponsible. Anything more is prioritising some notion of ‘code quality’ over learning what makes our business work and equally irresponsible.

What is the cloud? Three little words: As-A-Service

I’ve been talking to a few people recently about what this cloud stuff I’ve been doing is really all about. Isn’t it just some marketing hype? If I set up some servers have I just created a cloud? Isn’t ‘cloud’ just another word for ‘Internet’?

Clearly it is a marketing term; witness Microsoft’s embarrassing and somewhat desperate attempt to brand nearly everything they do and sell as ‘cloud’. And there are many ways to set up a ‘private’ or ‘hybrid’ cloud on your own servers, some of which have been around for years and have been re-badged to appease the marketeers.

But what I see, what excites me about what we do, is so much more than the hype. To me the cloud is a computing platform defined by those three little words: as-a-service.

Having spent a professional lifetime building large-scale enterprise systems from the ground up, it is mind-boggling to me that I can integrate an apparently infinite amount of versioned, replicated, backed-up and edge-cached storage into our system with nothing more than a credit card and a surprisingly small amount of coding. And Amazon S3/CloudFront is the least sophisticated of the services we utilise.

As-a-service I can deploy to a powerful application platform, complete with user management, sophisticated security, reporting, and more (force.com). Or I can deploy web applications built on Ruby on Rails (Heroku, soon to support Clojure for FP fans) or Java (Amazon Elastic Beanstalk or Google App Engine). I can store and manipulate data, send bulk email, instrument and report on my services; the list is growing daily.

As-a-service: as I need it, paying for only what I use, with scalability, availability, security and performance all built in.

And it’s this as-a-service nature that I believe makes ‘the cloud’ a truly different and fundamentally better place to build software. The vision of Service Oriented Architecture is finally being realised, but not in terms of the truly dreadful SOAP and WSDL that were bound too tightly into the concept, or auto-discoverable service white-pages, or the centrally-controlled service architectures that simply re-packaged much of the distributed-objects thinking into distributed services.

The as-a-service nature of the cloud creates a free market where service providers compete to capture our attention and take our money. To do this the emphasis isn’t on standardisation or ‘discoverability’ but on simplicity, effectiveness (compare the wonderful JSON with SOAP) and value. The ‘mash-up’ approach to service integration pioneered by web developers is an increasingly viable approach for building large-scale, enterprise-strength systems.
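To make the JSON point concrete, here is a minimal, hypothetical sketch. The responses are stubbed strings rather than live HTTP calls, but the handful of standard-library lines needed to combine two independent services’ payloads is the whole contrast with a WSDL-and-SOAP toolchain.

```python
import json

# Stubbed responses from two hypothetical, independent services; in a
# real mash-up these would come back over HTTP from different providers.
geo_response = '{"city": "London", "lat": 51.5, "lon": -0.12}'
weather_response = '{"temp_c": 11, "conditions": "rain"}'


def mash_up(geo_json, weather_json):
    """Combine two services' JSON payloads into one view -- no envelope,
    no generated stubs, just dictionaries."""
    geo = json.loads(geo_json)
    weather = json.loads(weather_json)
    return {
        "city": geo["city"],
        "temp_c": weather["temp_c"],
        "conditions": weather["conditions"],
    }
```

Simplicity like this is exactly what lets the mash-up approach scale from hobby pages to enterprise-strength integration.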

Old Dog, New Tricks – Lessons learned from a year with my head in the Cloud

Coming to the end of my first full year developing systems built exclusively on and for cloud platforms (primarily force.com) I figured this was a good time to reflect on some of the lessons learned and some old lessons unlearned.

How I learned to stop worrying and love the platform

[With apologies to Stanley Kubrick]

For 13 years or so prior to committing to the cloud I was a professional worrier. I had no doubt the various teams I worked in could deliver features and good code, but part of my job was to worry about the non-functional characteristics of the system. How would it perform, how would it scale, would it be available, would it be secure, was it manageable, was it maintainable, and so on? I like to think I got pretty good at making sure our systems did what they needed to in all these areas by building up a toolkit of common issues and their solutions.

Building on force.com and S3, there’s just no point worrying about most of these. I have no control whatsoever over the scalability or availability of the system. The security is what it is: secure to a degree that no project I’ve run ever had the budget to achieve, and we’d have to be trying pretty hard to leave security holes in our product. Manageability is built into the force.com platform, and following the conventions means we get a high degree of manageability in our application ‘for free’. Maintainability is still under our control, but this goes back to good old-fashioned code quality: make sure the code is understandable, well factored, and covered in tests and you’ll be okay. Even performance, to a large degree out of our hands, requires a new way of thinking if we’re going to do anything about it.

So lesson learned/un-learned number one: take all the time and energy you used to spend worrying about the architecture and non-functional characteristics of the system you’re building and divert this into delivering great products and services. You need to understand what the platform offers in terms of security, etc. but there’s not much you can actually do to change it so you might as well learn to love it.

Don’t upset the Algorithm, baby

[With anguished apologies to The Noisettes]

On a walk through the Scottish hills with @johnsnolan he mourned the death of the algorithm. John’s view – which I share – is that many developers no longer understand the wide variety of algorithmic approaches available to them, nor is there enough discussion of algorithms as a subject. As John said: “Not everything can be solved with fucking Map-Reduce!”. Design patterns, object-orientation, frameworks, libraries and so on have led to a homogenisation of development approach. Perhaps the current slightly hysterical vogue for functional programming languages is in part due to the frustration that, as developers, we spend more time assembling other people’s clever solutions than we do writing our own?

But this homogenisation is not a wholly bad thing. In teams of any significant size, having clear code understandable by the whole team is often better than having incredibly clever or efficient but somewhat obscure code. And you can have your cake and eat it: if your simple code is a bit slow or memory inefficient you can always spec a bigger server, throw a few new database indexes into the mix, up the thread pool size, up the cache size and so on.

But not on someone else’s platform.

On someone else’s platform, algorithm is about the only tool you have at your disposal if you want to speed things up or make them more efficient (and on force.com, as on Google’s App Engine, there are some hard and fast limits on things like heap size to ensure your application plays nicely on their servers, which makes a degree of efficiency imperative). Force.com is an extremely well-optimised platform, but it’s still possible to screw up end-user performance by doing too much calculation or returning too much data for any given request.

Lesson number two: spending time crafting efficient algorithms and designing a UX that supports these (by, for example, loading large data sets in pages rather than all in one go) is the best and possibly only option for ensuring consistently good performance.
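The paging idea can be sketched in a few lines. This is a hypothetical illustration – `fetch_page` stands in for whatever LIMIT/OFFSET-style query your platform offers – but it shows the shape: the client pulls fixed-size pages on demand instead of materialising the full result set in one request.

```python
def fetch_page(dataset, offset, limit):
    # Stand-in for a real platform query with LIMIT/OFFSET semantics;
    # returns a single page of results (empty when exhausted).
    return dataset[offset:offset + limit]


def paged(dataset, page_size=200):
    """Yield results a page at a time, so per-request work and heap use
    stay bounded however large the total data set grows."""
    offset = 0
    while True:
        page = fetch_page(dataset, offset, page_size)
        if not page:
            return
        yield page
        offset += page_size
```

Paired with a UX that loads a page at a time, no single request ever touches more than `page_size` records, which is exactly what platform limits like App Engine’s or force.com’s reward.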

It’s a Mashable, Mashable, Mashable, Mashable World

[No, I’m not apologising again]

Probably the nicest lesson to learn, and one that I addressed in my previous post, is that the cloud isn’t about one platform or one language but about creating and consuming services written and integrated in a variety of different languages running on a variety of different technologies. I hate the term ‘Mash-Up’, but there is something enduringly wonderful about the idea that you can take all of these services and platforms and splurge them together in different and interesting ways. It’s like the anti-SOA … a primordial mass of open and accessible stuff with no-one curating, standardising or organising it.

Lesson number three: developing applications for the cloud is about having some knowledge of a broad range of platforms, technologies and languages at your disposal, and an appreciation of what each of them can do for you, rather than understanding one technology or language inside-out.

So my New Year’s resolution is to worry less, reacquaint myself with Dijkstra, and learn a bit of Ruby and Heroku and Node.js and …
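In that spirit, a reacquaintance with Dijkstra might as well start with his shortest-path algorithm – here in a minimal Python sketch using a heap-based priority queue, the kind of hand-rolled algorithm John was mourning:

```python
import heapq


def dijkstra(graph, start):
    """Shortest distances from start in a weighted graph given as
    {node: [(neighbour, weight), ...]} with non-negative weights."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist
```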