<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:sa="http://socialagency.com/rss/events-artists.dtd"
>

<channel>
	<title>Scott Martin &#187; Uncategorized</title>
	<atom:link href="http://scottmartin.net/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://scottmartin.net</link>
	<description></description>
	<lastBuildDate>Sun, 12 Jul 2009 01:56:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Creating a virtual datacenter with Scalr and Amazon Web Services</title>
		<link>http://scottmartin.net/2009/07/11/creating-a-virtual-datacenter-with-scalr-and-amazon-web-services/</link>
		<comments>http://scottmartin.net/2009/07/11/creating-a-virtual-datacenter-with-scalr-and-amazon-web-services/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 18:09:27 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/?p=29</guid>
		<description><![CDATA[I have recently been tasked with putting together a new server setup under Amazon Web Services for our company&#8217;s web products.  We do a bit of everything, hosting both our own PHP-based products and open-source projects like Drupal and Wordpress.  We need the hosting to be both reliable and scalable to very high traffic on [...]]]></description>
			<content:encoded><![CDATA[<p>I have recently been tasked with putting together a new server setup under <a href="http://aws.amazon.com/">Amazon Web Services</a> for our company&#8217;s web products.  We do a bit of everything, hosting both our own PHP-based products and open-source projects like <a href="http://drupal.org">Drupal</a> and <a href="http://wordpress.org">Wordpress</a>.  We need the hosting to be both reliable and scalable to very high traffic on short notice.  We&#8217;ve been using AWS since the beginning, but I decided to create a new setup using <a href="http://code.google.com/p/scalr/">Scalr</a>, the free open-source EC2 management solution.  This has turned out to be a simply phenomenal solution for anyone who needs to do large-scale web hosting, so I&#8217;ll go through some of the details.</p>
<p><strong>MySQL</strong></p>
<p>Since we host Drupal and Wordpress-based sites, we have to have a reliable MySQL server with fault-tolerance and a solid backup system.  This is where Scalr really shines.  Out of the box, it can create a MySQL host that will:</p>
<ul>
<li>automatically spin up new slave servers when load has become too high and take them down when load is too low</li>
<li>host up to 60GB of data locally (faster), 1TB on EBS (slightly slower), or more on an array of 1TB EBS partitions</li>
<li>automatically back up to S3</li>
</ul>
<p>The only part that isn&#8217;t done automatically is the configuration of your software to use the correct master or slave MySQL server depending on the server configuration of the moment and whether a read or write query is happening.  At first, I thought this was going to be a difficult thing to deal with, especially considering we have Drupal and Wordpress installations that I don&#8217;t want to custom-patch to do this.  But then I discovered <a href="http://forge.mysql.com/wiki/MySQL_Proxy">MySQL Proxy</a>.</p>
<p><strong>MySQL Proxy</strong></p>
<p>MySQL Proxy is a very small, simple piece of software that has some truly amazing functionality.  We run it on each web-serving instance, where it provides a virtual MySQL server on localhost that proxies write queries to the master server and read queries to a random slave.  It will even fail over to another slave if necessary.  And it is easy to configure dynamically.  We use an <a href="http://scottmartin.net/files/mysql-proxy">init.d script</a> to start or restart the proxy whenever there is a Scalr host-up/host-down event.  This lets any software on the box simply use a default MySQL config and leaves the heavy lifting to a centralized system.  If someday we scale to the point where we exceed the write throughput of an extra large EC2 MySQL master, we can use the proxy&#8217;s scripting capabilities to create universal partitioning and sharding policies that are distributed to the web nodes.  Wonderful.</p>
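<p>To make the read/write split concrete, here is a minimal sketch of the routing decision described above.  MySQL Proxy itself is scripted in Lua; this Python version only illustrates the idea, and the names <code>is_read_query</code> and <code>choose_backend</code> and the list of read keywords are assumptions made for the example, not part of MySQL Proxy.</p>

```python
import random

# Statement keywords treated as reads; everything else goes to the master.
# (Assumed list for illustration, not taken from MySQL Proxy.)
READ_KEYWORDS = ("select", "show", "describe", "explain")


def is_read_query(sql):
    """Return True when the statement starts with a read-only keyword."""
    words = sql.lstrip().split(None, 1)
    return bool(words) and words[0].lower() in READ_KEYWORDS


def choose_backend(sql, master, slaves, rng=random):
    """Pick the server for one query: master for writes, a random slave for reads."""
    if is_read_query(sql) and slaves:
        return rng.choice(slaves)
    # Writes, and reads when no slave is up, fall back to the master.
    return master
```

<p>A real setup also has to handle cases this sketch ignores, such as reads inside a transaction or <code>SELECT ... FOR UPDATE</code>, which should go to the master.</p>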
<p><a href="http://www.gluster.org/"><strong>GlusterFS</strong></a></p>
<p>One way that standard PHP software is not built for scalability is that many packages assume the existence of a shared filesystem to store uploaded files in addition to a shared MySQL data store.  In an ideal world, all software would store its files on S3, as all our in-house software does.  But we don&#8217;t live in an ideal world.  Again, I thought this was going to be a difficult problem to solve with NFS, which I&#8217;ve used before and which works but is difficult to set up, not very fast, and not very configurable.  But then I discovered this <a href="http://www.gluster.org/docs/index.php/GlusterFS_on_Scalr/EC2">guide</a> to using GlusterFS on Scalr, which is another simple but amazing solution.   We have one Gluster server which serves a 1TB EBS partition to all the web servers.  We&#8217;re storing all our code for everything on there for now, but if that turns out to slow things down, we&#8217;ll store code locally with an rsync script that will sync from the share.  The partition is automatically backed up with incremental snapshots using built-in Amazon functionality.  This server is a single point of failure for the entire cluster, so we will eventually use a mirrored configuration, which GlusterFS easily supports.</p>
<p><strong>Web servers</strong></p>
<p>Scalr provides auto-scaling for web servers, and out-of-the-box load balancing of multiple servers using a separate load-balancing instance.  This is simple and solid, a great solution.  Each web server also runs a 64MB memcached server, but Scalr can also provide dedicated memcached servers.  The share lets us distribute Apache configurations and code to all the instances simultaneously.  I set up Postfix to use an <a href="http://authsmtp.com/">Authsmtp</a> account, so outgoing mail can be sent normally by PHP but not be blocked as spam as most mail coming from EC2 instances is.</p>
<p><strong>Backend servers</strong></p>
<p>We have backend servers performing certain periodic tasks (DB maintenance, web crawling, etc).  We control these with cron jobs that insert events into SQS.  Scalr scales this role based on the length of the SQS queue, so it only runs one server unless there&#8217;s a backlog, in which case it magically creates enough servers to take care of it and then shuts them down.</p>
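<p>The queue-based scaling rule can be sketched roughly as follows.  This is not Scalr&#8217;s actual algorithm, which I haven&#8217;t reproduced here; the function name <code>desired_workers</code> and the thresholds (<code>jobs_per_worker</code>, <code>min_workers</code>, <code>max_workers</code>) are hypothetical values chosen for illustration.</p>

```python
import math


def desired_workers(queue_length, jobs_per_worker=100, min_workers=1, max_workers=20):
    """Return how many backend instances to run for the current backlog.

    All thresholds are hypothetical: one worker is assumed to clear
    `jobs_per_worker` queued events per scaling interval.
    """
    needed = math.ceil(queue_length / jobs_per_worker)
    # Never drop below the floor or grow past the ceiling.
    return max(min_workers, min(max_workers, needed))
```

<p>With an empty queue this keeps the single always-on server, and a backlog of 950 events would ask for 10 servers under these assumed numbers.</p>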
<p><strong>Scalr highs and lows</strong></p>
<p>Scalr is great, great software, but it&#8217;s not perfect.  It doesn&#8217;t handle roles like our Gluster master well, where exactly one server must be running with a certain disk mounted all the time.  I&#8217;m not convinced it would auto-recover from a Gluster master failure, but I think it would from just about any other detected failure (as always, the real risk of downtime comes from undetected failure).  If the MySQL host goes down, new servers are created to replace it, and the same goes for web servers and load balancers.  The scripting interface lets your software respond to events, which makes it fairly easy to keep software in step with the dynamic configuration of the cluster.  The rebundling feature kicks ass; it means that all you have to do to install new server software is log into one instance, make the change, and then tell Scalr, which packages up the instance, creates an AMI, and then replaces all instances of that role with new instances based on the new AMI.  This beats writing AMI creation scripts hands-down.  On the other hand, it basically only supports Ubuntu, and only Ubuntu from the base AMIs that they have prepackaged for you.  You can easily build whatever you want on that, but for some applications that could be a dealbreaker.  For me, it&#8217;s fine; Ubuntu is a great distro.  The Scalr interface definitely has some rough edges and some real bugs, and there are some places where I see we might need to patch to get the functionality we need.  But that&#8217;s open-source software for you.  The only real competitor to this is Rightscale, which costs big money.  I&#8217;ll take the free one with the bugs that I can fix if I have to, thanks.</p>
<p><strong>Tips</strong></p>
<ul>
<li>Run Scalr on a non-AWS host that doesn&#8217;t disappear, because it has no watchdog watching it.  It needs at least 512MB of RAM, so the 512MB Slicehost slice works well.  Don&#8217;t try to install it on anything that&#8217;s not Ubuntu; it needs obscure PHP modules that are easily available as Ubuntu packages.</li>
<li>Security: lock down your Scalr install to only accept connections over an SSH tunnel (except for requests from the AWS farm to /query-env, /event_handler.php, or /config_opts.php).  The master keys to everything are protected by nothing more than over-the-wire passwords if you don&#8217;t, which is not necessarily unacceptable, but makes me nervous in a way that the good old SSH private key doesn&#8217;t.</li>
</ul>
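<p>The access rule from the security tip can be sketched like this.  In practice it would live in the web server or firewall configuration; the Python below only spells out the logic.  The three paths come from the tip above, while the function name <code>is_allowed</code>, the prefix matching, and the assumption that tunneled requests arrive from the loopback address are mine.</p>

```python
# Paths that farm instances must reach without the tunnel (from the tip above).
FARM_PATHS = ("/query-env", "/event_handler.php", "/config_opts.php")

# Assumption: the SSH tunnel terminates on the Scalr host, so tunneled
# requests appear to come from the loopback address.
LOOPBACK_ADDRESSES = ("127.0.0.1", "::1")


def is_allowed(remote_addr, path):
    """Allow tunneled requests to any path, and outside requests only to farm paths."""
    if remote_addr in LOOPBACK_ADDRESSES:
        return True
    # Strip any query string before comparing against the allowlist.
    bare_path = path.split("?", 1)[0]
    # Assumption: sub-paths under an allowed path are also allowed.
    return any(bare_path == p or bare_path.startswith(p + "/") for p in FARM_PATHS)
```

<p>A stricter version would also check that outside requests really come from your own farm&#8217;s instances, which this sketch does not attempt.</p>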
<p><strong>Advantages<br />
</strong></p>
<p>This is a game-changer for startups, or for anyone who needs a datacenter solution.  I created this setup in under a week of work with next to no up-front investment.  No buying servers from Dell, no forecasting of usage patterns, no driving to datacenters, no tape backups.  And because of a combination of Amazon&#8217;s amazing pricing and the on-demand nature of Scalr, I expect to end up paying less than half of what a traditional datacenter solution would cost.  When you have traditional servers, you have to buy enough of them to handle the biggest amount of traffic you expect in the next month + 50%, or risk being caught trying to rush-ship new servers to meet demand, which is a place you don&#8217;t want to be.  With this solution, if we have clients with campaigns that take off overnight, we just pay for what we actually need during each hour, which I think will probably average out to be about half of the iron we would need under the max + 50% rule.</p>
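<p>Here is the arithmetic behind that comparison as a small sketch.  The load profile is entirely made up for illustration (it is not our traffic), and the function names are hypothetical; the real ratio depends on how spiky your traffic is.</p>

```python
def provisioned_servers(peak_servers, headroom=0.5):
    """Servers to own under the 'expected peak + 50%' rule."""
    return peak_servers * (1 + headroom)


def on_demand_average(hourly_servers):
    """Average number of servers paid for when billed by the hour."""
    return sum(hourly_servers) / len(hourly_servers)


# Hypothetical day: 4 servers for 18 quiet hours, 10 servers for 6 busy hours.
day = [4] * 18 + [10] * 6
owned = provisioned_servers(max(day))   # hardware bought up front
rented = on_demand_average(day)         # average hourly usage actually billed
ratio = rented / owned
```

<p>With these invented numbers the hourly bill averages 5.5 servers against 15 owned.  A flatter traffic curve would narrow the gap, and a spikier one would widen it.</p>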
<p>There are reasons beyond the strict bottom line as well.  Much of the money you spend for web hosting of any kind goes into power and cooling from electricity sources inside the United States, which today translates into mostly coal.  So it&#8217;s in all of our interests to minimize our datacenter costs, because every dollar that we spend indirectly increases global warming just a bit.  Additionally, because of their simply absurd scale, Amazon&#8217;s datacenters are, I&#8217;m sure, more than marginally more electricity-efficient than any colocation facility per computing unit, increasing the carbon savings.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2009/07/11/creating-a-virtual-datacenter-with-scalr-and-amazon-web-services/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>The new Facebook application ecosystem</title>
		<link>http://scottmartin.net/2008/11/14/the-new-facebook-application-ecosystem/</link>
		<comments>http://scottmartin.net/2008/11/14/the-new-facebook-application-ecosystem/#comments</comments>
		<pubDate>Fri, 14 Nov 2008 16:37:13 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/?p=21</guid>
		<description><![CDATA[A post written for Social Agency&#8217;s corporate blog, Socialsama.
One of the interesting parts of developing social applications is that the platforms and technologies we interact with are periodically changing, and we must keep pace both with our platform and with our application marketing strategies.
Facebook recently completed a fundamental redesign of much of its site.  Transitions [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p><em>A post written for <a href="http://socialagency.com/">Social Agency</a>&#8217;s corporate blog, <a href="http://www.socialsama.com/">Socialsama</a>.</em></p></blockquote>
<p>One of the interesting parts of developing social applications is that the platforms and technologies we interact with are periodically changing, and we must keep pace both with our platform and with our application marketing strategies.</p>
<p>Facebook recently completed a fundamental redesign of much of its site.  Transitions on such a big platform are always painful, and a portion of the Facebook user base was vocal in its opposition (and was covered widely in the media).  Nevertheless, the new Facebook is now fully implemented, and we are starting to see the application ecosystem adapt to the new restrictions and features.</p>
<p>The biggest change from the application point of view is the fact that most of the profile boxes that were the prime source of application virality have been pushed onto a rarely-visited tab of the profile. Previously, going to a friend&#8217;s profile displayed all the boxes from their various applications, and users tended to discover new applications by seeing cool things on their friends&#8217; profiles.  This meant that the most successful applications were those that made the user say &#8220;I want that&#8221; when seeing the profile box on their friend&#8217;s profile.  Of course, this led to a competition over profile real estate from applications, and a top user complaint about the old Facebook was that the profile pages were crowded and difficult to use. The new Facebook has made the profile box irrelevant for most apps.</p>
<p>The new profile page is centered around a feed of user activities on Facebook and its applications.  This means that the way a user discovers a new application is by seeing an action by one of their friends and saying &#8220;I want to do that&#8221;.  The feed is the central interface for Facebook, and much of the social interaction a user has now on Facebook is to consume new feed entries created by their friends.</p>
<p>So, what does a current successful Facebook app look like?  It invites meaningful and frequent user interaction, and posts these interactions to the user&#8217;s feed.  Users can discuss feed entries in the feed itself, so a successful app&#8217;s feed entries invite conversations between users.  Additionally, the new Publisher interface lets applications integrate completely into the feed interface, letting users compose entries from their home page with an application-defined interface, lowering the barrier to creating a new entry.</p>
<p>In a future post, we&#8217;ll look at what makes a successful MySpace app and what an app needs to do in order to succeed in both worlds.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2008/11/14/the-new-facebook-application-ecosystem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Definition: palin</title>
		<link>http://scottmartin.net/2008/10/01/definition-palin/</link>
		<comments>http://scottmartin.net/2008/10/01/definition-palin/#comments</comments>
		<pubDate>Wed, 01 Oct 2008 20:04:25 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/?p=18</guid>
		<description><![CDATA[palin (/ˈpeɪlɪn/)
v.
To attempt to answer a question for which the speaker knows no answer by saying random phrases related to the question&#8217;s domain in the hopes that the phrases will spontaneously create meaning.
Palining is almost always obvious to an audience with knowledge of the question&#8217;s domain, but palining sometimes works when the audience is less [...]]]></description>
			<content:encoded><![CDATA[<p>palin (<span class="IPA" title="Pronunciation in the International Phonetic Alphabet (IPA)"><span class="mw-redirect">/ˈpeɪlɪn/)</span></span></p>
<p>v.</p>
<p>To attempt to answer a question for which the speaker knows no answer by saying random phrases related to the question&#8217;s domain in the hopes that the phrases will spontaneously create meaning.</p>
<p>Palining is almost always obvious to an audience with knowledge of the question&#8217;s domain, but palining sometimes works when the audience is less knowledgeable than the speaker. Example: &#8220;I tried to palin my way through that essay exam, but the professor saw right through it.&#8221;</p>
<p>Palining in real life:</p>
<p style="padding-left: 30px;">COURIC: Why isn&#8217;t it better, Governor Palin, to spend $700 billion helping middle-class families who are struggling with health care, housing, gas and groceries; allow them to spend more and put more money into the economy instead of helping these big financial institutions that played a role in creating this mess?</p>
<p style="padding-left: 30px;">PALIN: That&#8217;s why I say I, like every American I&#8217;m speaking with, were ill about this position that we have been put in where it is the taxpayers looking to bail out. But ultimately, what the bailout does is help those who are concerned about the health-care reform that is needed to help shore up our economy, helping the—it&#8217;s got to be all about job creation, too, shoring up our economy and putting it back on the right track. So health-care reform and reducing taxes and reining in spending has got to accompany tax reductions and tax relief for Americans. And trade, we&#8217;ve got to see trade as opportunity, not as a competitive, scary thing. But one in five jobs being created in the trade sector today, we&#8217;ve got to look at that as more opportunity. All those things under the umbrella of job creation. This bailout is a part of that.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2008/10/01/definition-palin/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>FUTURESHOCK</title>
		<link>http://scottmartin.net/2008/08/06/futureshock/</link>
		<comments>http://scottmartin.net/2008/08/06/futureshock/#comments</comments>
		<pubDate>Wed, 06 Aug 2008 22:03:14 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/?p=16</guid>
		<description><![CDATA[My new driver&#8217;s license is good until 2014.  Has our society decided between saying twenty-fourteen or two thousand fourteen yet?  Either way, FUTURESHOCK.
]]></description>
			<content:encoded><![CDATA[<p>My new driver&#8217;s license is good until 2014.  Has our society decided between saying twenty-fourteen or two thousand fourteen yet?  Either way, FUTURESHOCK.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2008/08/06/futureshock/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		</item>
		<item>
		<title>MySQL is obsolete</title>
		<link>http://scottmartin.net/2008/03/20/mysql-is-obsolete/</link>
		<comments>http://scottmartin.net/2008/03/20/mysql-is-obsolete/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 02:30:22 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/2008/03/20/mysql-is-obsolete/</guid>
		<description><![CDATA[It&#8217;s hard for me to admit, but the landscape of web application technology has recently shifted.  MySQL, and relational databases in general, have started to become obsolete.  The problem is that MySQL doesn&#8217;t scale.  Well, it does, but only so far.  A typical story:
Your new database-backed website is starting to become popular, and load times [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s hard for me to admit, but the landscape of web application technology has recently shifted.  MySQL, and relational databases in general, have started to become obsolete.  The problem is that MySQL doesn&#8217;t scale.  Well, it does, but only so far.  A typical story:</p>
<p>Your new database-backed website is starting to become popular, and load times at peak hours are starting to rise.  You add a slave server or two to spread out the reads, and put a lot of the reads in a cache.  That helps, and load times fall.   You can repeat this a few times, but eventually you hit a wall where you have too many writes to handle, no matter how much hardware you have.  Then you&#8217;re stuck.  You have to start partitioning your data into separate clusters, but your application was written with a lot of joins, and a lot of stuff no longer works if everything isn&#8217;t in the same database.  So, you&#8217;ve got a big development effort that has a high possibility of failure.</p>
<p>I suspect that this story would happen to the majority of startups out there if they had dramatic success.  It happened to Friendster, and it cost them global success as the social networking space which they basically invented and completely owned slipped through their fingers.</p>
<p>Relational databases suck for another reason.  If you want any kind of user-defined data, where the user needs to customize the fields gathered, you are faced with either dynamically altering tables (maintenance nightmare, not scalable), or entity-attribute-value tables (hard to query, not scalable).   Scalability is great, but this is the real flaw that&#8217;s been hitting me lately.  There are no good solutions to this problem in the relational world.  Google had all these problems years ago with their web indexing database, so they made BigTable, which currently holds their entire 800TB index of the web with metadata in a single table.  That technology now holds almost everything they do.</p>
<p>BigTable is Google-proprietary, unfortunately.  But as of the last year or so, there are reasonably mature open-source implementations of BigTable and the Google File System that it runs on.  Hadoop (through its distributed file system, HDFS) is the GFS equivalent; it takes care of managing an arbitrarily large set of constantly changing data.  HBase is the BigTable equivalent, providing a REST interface to a simple query language.  These are both very new technologies, but Yahoo is using Hadoop to manage its crawled web data, and HBase is running Powerset, one of the most database-intensive web ideas I can think of.</p>
<p>These are column-oriented databases.  You don&#8217;t define columns, you define column families, and a row can have zero to many of each family, each uniquely tagged.  This lets you do away with many-to-many joins, and have infinite capability for user-defined fields that are stored with the main data.  Empty columns are never even stored.</p>
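<p>A toy model of that layout may help.  This is a plain Python dictionary, not the HBase API, and the row keys, family names, and helper functions (<code>put</code>, <code>get_family</code>) are invented for the example.</p>

```python
# One table: row key -> column family -> qualifier -> value.
# Only cells that exist are stored, so sparse rows cost nothing extra.
table = {}


def put(row_key, family, qualifier, value):
    """Store one cell, creating the row and family on first use."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get_family(row_key, family):
    """Return every cell in one family of a row (empty dict if none)."""
    return table.get(row_key, {}).get(family, {})


# Hypothetical user rows: fixed profile data plus user-defined fields.
put("user:1", "profile", "name", "Ada")
put("user:1", "custom", "favorite_color", "green")
put("user:1", "friends", "user:2", "2008-03-01")
put("user:2", "profile", "name", "Grace")
```

<p>Here the <code>friends</code> family stands in for what would be a many-to-many join table in MySQL, and the <code>custom</code> family holds user-defined fields right next to the main data.  The second user has no custom fields, so nothing is stored for them.</p>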
<p>We&#8217;re planning to run Hadoop and HBase on a cluster of Amazon EC2 machines.  We&#8217;ll use S3 as the disk that the Hadoop cluster reads from and writes to.  What this means is that since every system (web, DB, storage) scales with the number of machines in operation, all more traffic will ever mean to us is a bigger bill from Amazon.  And I bet that bill will be lower than the total cost of ownership of the equivalent traditional setup, because we can run fewer servers at night and save money.</p>
<p>So, the next bit of my life will be spent writing a solid database abstraction layer between CakePHP and HBase, and an EC2 monitor and dynamic machine allocator.  I hope to open-source these so we can make this a realistic choice for more people out there.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2008/03/20/mysql-is-obsolete/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SXSW</title>
		<link>http://scottmartin.net/2008/03/11/sxsw/</link>
		<comments>http://scottmartin.net/2008/03/11/sxsw/#comments</comments>
		<pubDate>Wed, 12 Mar 2008 04:09:53 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[scalability work mac]]></category>

		<guid isPermaLink="false">http://scottmartin.net/2008/03/11/sxsw/</guid>
		<description><![CDATA[I was privileged enough to go to South by Southwest Interactive this weekend.  I&#8217;m quite tired, but it was definitely worth it.  I went to as many panels as I could, and there were a lot of really great ones.
Highlights:

The creator of Slideshare showing off the really slick Ajax he did to absolutely maximize usability.  [...]]]></description>
			<content:encoded><![CDATA[<p>I was privileged enough to go to <a href="http://2008.sxsw.com/interactive/">South by Southwest Interactive</a> this weekend.  I&#8217;m quite tired, but it was definitely worth it.  I went to as many panels as I could, and there were a lot of really great ones.</p>
<p>Highlights:</p>
<ul>
<li>The creator of <a href="http://www.slideshare.net/">Slideshare</a> showing off the really slick Ajax he did to absolutely maximize usability.  <a href="http://swfupload.org/">SWFUpload</a> is a technology I will definitely be using in the future.</li>
<li><a href="http://www.randsinrepose.com/">Michael Lopp</a>, a product manager at Apple whose opinions I really respect, showing how the creative design process works there, which is something I&#8217;ve wondered about.</li>
<li>A panel composed of old-school marketing dinosaurs completely collapsing under the weight of the Twitter-enabled audience in a mini version of the <a href="http://blog.wired.com/underwire/2008/03/sxsw-mark-zucke.html">Zuckerberg debacle</a>.  Beautiful.  I missed the Zuckerberg thing in person, but these kinds of audience revolts were happening all over the place.  If most of your audience is actively talking about you behind your back, they&#8217;re not going to put up with anything.</li>
<li>Architects from <a href="http://twitter.com/">Twitter</a>, <a href="http://www.sixapart.com/">Six Apart</a>, <a href="http://www.meebo.com/">Meebo</a> and more discussing their architecture struggles, which was great.</li>
<li>Several of the people behind <a href="http://openid.net/">OpenID</a> discussing the various implications of it.  I am now confident in saying that I don&#8217;t know how people will authenticate themselves on the web in twenty years, but I know they&#8217;ll be doing it over the OpenID protocol.  If you&#8217;re a small company today that wants to maximize user registrations (and who doesn&#8217;t), you should be accepting OpenIDs with the Yahoo button.  You can make it more generic after the public is better-educated.</li>
<li>The creators of every major JavaScript library (<a href="http://jquery.com/">jQuery</a>, <a href="http://www.prototypejs.org/">Prototype</a>, <a href="http://script.aculo.us/">Scriptaculous</a>, <a href="http://dojotoolkit.org/">Dojo</a>) on one panel discussing highly technical JavaScript techniques.  I vote for SXSW to have more panels like this.</li>
<li>Developers from <a href="http://digg.com">Digg</a>, <a href="http://flickr.com">Flickr</a>, and <a href="http://wordpress.com/">Wordpress</a> discussing their insane scalability.  I&#8217;ve been doing a lot of scalability research lately, and now I&#8217;m fairly confident I can build systems that can be scaled in the future.</li>
<li>I stood next to Jeff Bezos for a couple of minutes, and had conversations with other important people.</li>
</ul>
<p>All in all, it was a great experience.   I hope I get to solve my own scalability problems soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2008/03/11/sxsw/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What to do right from the beginning</title>
		<link>http://scottmartin.net/2007/12/14/what-do-to-right-from-the-beginning/</link>
		<comments>http://scottmartin.net/2007/12/14/what-do-to-right-from-the-beginning/#comments</comments>
		<pubDate>Sat, 15 Dec 2007 04:46:48 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/2007/12/14/what-do-to-right-from-the-beginning/</guid>
		<description><![CDATA[I was talking with my coworker Paul today about what we would do differently from the beginning of a big software project.  The platform that we work on is great and getting better every day, and some of these decisions were made right from the start, but some weren&#8217;t, and we&#8217;ve had to suffer [...]]]></description>
			<content:encoded><![CDATA[<p>I was talking with my coworker <a href="http://blog.paulbonser.com/">Paul</a> today about what we would do differently from the beginning of a big software project.  The platform that we work on is great and getting better every day, and some of these decisions were made right from the start, but some weren&#8217;t, and we&#8217;ve had to suffer through fixing them.</p>
<p>So, what&#8217;s important from the very start? Most of these items are applicable to any large software project, but some of them are specific to large web sites run by PHP.  In no particular order:</p>
<ul>
<li> Internationalization. If your project is successful at all, someone will eventually want to use it in a language other than English.  This will probably happen sooner than you think, because even though you&#8217;ve never met them, there are a billion Chinese people with internet access.  This means that every bit of English text you put into your application needs to have hooks built-in to translate it.  Thankfully, the foundation to do this right is there, with GNU <a href="http://www.gnu.org/software/gettext/">gettext</a> and the locale setting in the environment, but you have to be aware of this from the beginning, even if you have no plans to internationalize.  It costs almost nothing to put those hooks in now, just in case.</li>
<li>UTF-8.  If you don&#8217;t use UTF-8, you make it so that no one can post content to your site in any language other than one that uses Latin characters, and you&#8217;ll eventually be forced into a messy, annoying, never-quite-done transition from ISO-8859 to UTF-8.  If you use UTF-8 from the beginning, you&#8217;ll never really think about the fact that user 375209 posts his blogs mostly in Korean but sometimes in Thai. The fact that pretty much everything out there defaults to 8859 is like a bad joke.  Do you not understand what I&#8217;m talking about?  Read <a href="http://www.joelonsoftware.com/articles/Unicode.html">this</a>.</li>
<li>HTML filtering (XSS prevention).  If you are making a site that allows non-authenticated users to post text that other users will see (which is just about every dynamic web site ever), not doing complex, difficult, slow filtering on every last bit of textual user input means that a malicious user can redirect anyone who would have seen their text to a site of their choosing. And yes, in real life, people do this, usually with redirects to porn.  It happened to one of my clients this week (not my fault).  This is so difficult to get right that even huge sites like Myspace get this wrong regularly. Again, there are some great open source libraries out there, such as the <a href="http://htmlpurifier.org/">HTML Purifier</a>.  And don&#8217;t think you can get away with filtering out all HTML; it&#8217;s the future, people want to post rich text. When the first user posts the first porn redirect on your front page, you need a solution you can implement in a few hours, and if you haven&#8217;t done filtering at all, that&#8217;s just not possible.  Don&#8217;t use a BBCode-style solution; it&#8217;s not good for a variety of reasons.</li>
<li>Model-view-controller.  We don&#8217;t use a particular framework for this, but it&#8217;s the philosophy that&#8217;s important.  In any request, there should be a controller that reads user input, a model that does data transformation/processing, and a view that controls the format of the output sent to the user.  We use <a href="http://smarty.php.net/">Smarty</a> for the view.  Don&#8217;t make your own templating system or use simple PHP files, use Smarty.  With Smarty, you aren&#8217;t tempted to do complex stuff.</li>
<li>Code standards.  Decide on spaces or tabs (important for source control). Use PHPDoc religiously.  Decide on camel case or underscores.  Don&#8217;t use functions (well, sometimes), use objects.  Write these decisions down.</li>
<li>Scalability.  Maybe you&#8217;ll be successful and you&#8217;ll get to the point where it all can&#8217;t run on a single rented box, and you&#8217;ll start to have to worry about performance.  At that point, you need to start caching your models (this is one of the reasons you need MVC).  We use memcache.   One of the things you can do now is keep all methods that change data in the same place.   Later you can put code there to kill the cache entry for this object so it can be regenerated with new data.</li>
<li>Don&#8217;t worry about performance.  The number of CPU cycles you take up processing your text is no longer important up to a pretty absurd point in today&#8217;s world of cheap fast parallel web servers.  Your DB server is what&#8217;s important, since it&#8217;s pretty difficult to have more than one of them, and extraordinarily difficult to have more than one that you can write to.  And if you notice someday that one particular page is really slow, well, then cache it.</li>
<li>Database abstraction.  Do not write SQL queries in your code.  This is probably the most controversial item in this list, but I stand by it.  Never write SQL queries.  Generate them programmatically.  We use <a href="http://pear.php.net/package/DB_DataObject/">DB_DataObject</a>.  Today, I&#8217;d probably use <a href="http://pear.php.net/package/MDB_QueryTool/">MDB_QueryTool</a>, but they&#8217;re basically equivalent.  It can be a pain, especially when you&#8217;re talking multi-join, but this is also where it&#8217;s most useful.  A big query with a bunch of variables which could be filled with where clauses and joins (but maybe not, make sure and put an and at the beginning of your additional condition unless you&#8217;re the first condition and spaces at the beginning and end of all variables, just to be sure!) is impossible to maintain, and the business rules in it are easy to overlook.  SQL is a 30-year-old language and sucks hard in some pretty basic ways; let a library deal with the pain for you.</li>
<li>No stored procedures. No triggers.  No views.  No foreign keys.  The world has moved on from the days of the all-powerful DBMS whose status at any moment can be relied upon by a court of law.  Your database should just be tables that you can add to and delete from.  Simple.  There&#8217;s nothing you can do in a stored procedure that you can&#8217;t do in your main code, but there are plenty of things you can&#8217;t do from a stored procedure.</li>
<li>Security.  Security is important.  Write every piece of code with an eye toward who the user is and whether they can do this.  This probably means roles, etc.  It&#8217;s a pain in the ass.  It&#8217;s important.  Do periodic security audits.</li>
<li>SQL injection.  SQL injection prevention is important and hard to get right every time, so you need to do it systematically.  Here, more than any other security concern, is where one bad line will totally screw you.</li>
<li>Standardized HTML/CSS.  This is our next big project, since we didn&#8217;t do this one right at all.  I don&#8217;t know the right answer to this one yet, but I will soon.  <a href="http://developer.yahoo.com/yui/">YUI</a>?</li>
<li>URL rewriting.  As strange as this seems, people really do care about what the URLs of pages look like, even if they&#8217;re never going to use them.  We use <a href="http://pear.php.net/package/Net_URL_Mapper">Net_URL_Mapper</a>, but I&#8217;m not sure what the best answer is here either.</li>
<li>No code generation.  Also a controversial statement.  My experience with code generation has been universally negative. My primary argument is that changes in the code that transforms your configuration into your implementation should have real-time effects, not effects that show up months later when you want to change the configuration but the regeneration you just did broke something else because configurations like this were phased out six months ago, except for this one&#8230; you get the idea.  If the performance of this transformation is too slow for real-time, then just cache it forever in memcache, but don&#8217;t keep it around on the disk.</li>
<li>All warnings on in development, all fatal errors logged and examined.  If you do this, you will increase your software reliability significantly.</li>
<li>No configuration in the database. There are many reasons for this, but the best one is that it&#8217;s pretty much impossible to usefully version control the contents of database tables.  Configuration should go in files that are version controlled.  We use <a href="http://json.org/">JSON</a>.  Simple, human editable, machine editable.</li>
<li>Open source.  Use all open source software.  Do you really trust your business to the whims of some other company?  What if there&#8217;s some bug in their software that&#8217;s really screwing you up, but you can&#8217;t fix it?  There&#8217;s plenty about web development you can&#8217;t control without voluntarily using closed software in your architecture, especially when there&#8217;s such good stuff out there.</li>
<li>Database layer isolation.  That&#8217;s not a good phrase for what I&#8217;m talking about, but I&#8217;m not sure what the industry word is for this.  To put it simply, don&#8217;t have queries for a specific table all over your code; call one piece of code that queries that table.  What if you want to add a hidden flag to that table?  Then you&#8217;ve got to go add a where clause to all those queries, and you&#8217;ll miss one, and then the hidden thing will show up on the front page, and people will be pissed.</li>
</ul>
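<p>To make the database layer isolation and SQL injection items concrete, here is a minimal sketch, written in Python with the standard-library sqlite3 module rather than our PHP stack.  The <code>posts</code> table, the <code>PostStore</code> class and its method names are all made up for illustration.  Every read of the table goes through one helper, so a hidden flag is enforced in exactly one place, and values are bound as parameters instead of being pasted into the query text.</p>

```python
import sqlite3


class PostStore:
    """The only code that queries the (hypothetical) posts table."""

    def __init__(self, conn):
        self.conn = conn

    def _visible(self, extra_where="", params=()):
        # Every read goes through here, so the hidden filter lives in one spot.
        sql = "SELECT id, author, body FROM posts WHERE hidden = 0"
        if extra_where:
            sql += " AND " + extra_where
        sql += " ORDER BY id"
        return self.conn.execute(sql, params).fetchall()

    def front_page(self):
        return self._visible()

    def by_author(self, author):
        # The value is bound as a parameter, never concatenated into the SQL.
        return self._visible("author = ?", (author,))


# Illustrative data only: an in-memory database with one hidden row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, author TEXT, body TEXT, "
    "hidden INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO posts (author, body, hidden) VALUES (?, ?, ?)",
    [("scott", "hello", 0), ("scott", "secret draft", 1), ("lyle", "hi", 0)],
)
store = PostStore(conn)

print(store.front_page())
print(store.by_author("scott"))
```

<p>Adding another rule later (say, a deleted flag) would mean changing <code>_visible</code> and nothing else, which is the whole point.</p>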
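<p>The scalability item (keep every method that changes data in the same place, so it can kill the cache entry later) can be sketched roughly like this.  This is an illustration in Python, not our actual code: plain dicts stand in for both the database and memcache, and <code>ProfileModel</code>, <code>rename</code> and the <code>profile:</code> key format are assumed names.</p>

```python
class ProfileModel:
    """Owns all reads and writes for one kind of object."""

    def __init__(self, db, cache):
        self.db = db          # dict standing in for the database
        self.cache = cache    # dict standing in for memcache
        self.db_reads = 0     # counts how often the "database" is hit

    def _key(self, user_id):
        return "profile:%d" % user_id

    def get(self, user_id):
        key = self._key(user_id)
        if key not in self.cache:
            # Cache miss: load from the database and remember the result.
            self.db_reads += 1
            self.cache[key] = dict(self.db[user_id])
        return self.cache[key]

    def rename(self, user_id, name):
        # All writes live in the model, so this is the one place that
        # has to drop the cache entry after changing the data.
        self.db[user_id]["name"] = name
        self.cache.pop(self._key(user_id), None)


model = ProfileModel({375209: {"name": "Scott"}}, {})
model.get(375209)
model.get(375209)            # served from the cache, no second database read
model.rename(375209, "Lyle")
print(model.get(375209), model.db_reads)
```

<p>If writes were scattered around the codebase instead, each one would need its own invalidation call, and a forgotten one would mean stale data on the page.</p>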
<p>This is just what&#8217;s important to do right from the beginning.  There&#8217;s a whole other set of processes that are just as important, but it&#8217;s the kind of thing you can do as you grow.  That&#8217;s for next time.</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2007/12/14/what-do-to-right-from-the-beginning/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>First post</title>
		<link>http://scottmartin.net/2007/09/05/first-post/</link>
		<comments>http://scottmartin.net/2007/09/05/first-post/#comments</comments>
		<pubDate>Thu, 06 Sep 2007 02:27:22 +0000</pubDate>
		<dc:creator>Scott</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scottmartin.net/2007/09/05/first-post/</guid>
		<description><![CDATA[I&#8217;ve decided to start using my web presence at scottmartin.net if for no other reason than that it&#8217;s not cool to have the &#8220;your name&#8221; domain sit idle.  A blog seems like as good a thing as any.  I don&#8217;t really care if anyone other than me ever reads it.  You can [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve decided to start using my web presence at scottmartin.net if for no other reason than that it&#8217;s not cool to have the &#8220;your name&#8221; domain sit idle.  A blog seems like as good a thing as any.  I don&#8217;t really care if anyone other than me ever reads it.  You can comment if you want, but I almost certainly don&#8217;t care about what you have to say unless I actually know you or you are Lyle.</p>
<p>So for a start, I&#8217;ve moved away from using LiveJournal to read RSS feeds and <a href="http://scottmartin.net/feeds/">written my own reader</a>.  It&#8217;ll work better on my phone anyway.  <a href="http://simplepie.org">SimplePie</a> makes RSS reading laughably easy.  Why are the best libraries for actually doing stuff always hard to find? I was looking for work and all I could find was Magpie, which is GPLed and therefore useless (does no one understand that you might as well put your library under the You Can&#8217;t Use This Unless You&#8217;re a Hippie license?), and some other libraries that care what format an RSS feed is in.  Why are we building the future of the internet on a standard with at least three incompatible versions that do the same thing?  And that store dates using some weird text format?  It takes SimplePie 10K lines of code to sufficiently abstract all the nonsense.</p>
<p>Another hidden but great library: <a href="http://pear.php.net/package/Net_URL_Mapper">Net_URL_Mapper</a>, a PEAR library that neatly does the Ruby URL mapping thing but that has almost nothing written about it in English.  It works perfectly, by the way.  This page will probably show up on the first Google result page for it, and that&#8217;s pathetic.   (Google searcher: look at the test cases, they are all the documentation you need.)</p>
]]></content:encoded>
			<wfw:commentRss>http://scottmartin.net/2007/09/05/first-post/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
