  Staying Alive

    When we announced that Fiesta was shutting down, the reaction was beyond anything we could’ve expected. Support poured in via email, Twitter, and comments on the blog post. That support inspired us to spend the past several weeks scrambling to find a way to keep Fiesta alive, and I’m very happy to announce that it won’t be shutting down after all.

    A good friend of mine and one of the earliest Fiesta users, Joseph Perla, noticed the overwhelming response and reached out to us. He will be taking over the maintenance and development of Fiesta and setting the course for its future. Joe is a great developer (and person), and I’m confident he’s the right person to keep Fiesta alive and flourishing for a long time to come. Expect to hear more from him in the coming weeks and months. Joe is “committed to keeping Fiesta groups alive forever as a core enhancement to email.”

    Sorry for the confusion/concern caused by the shutdown notice, but know that we wouldn’t have been able to find a way to keep it alive if it weren’t for all of your support. So, thank you, and happy Fiesta’ing!

    Mike

  So long

    Update (2/28/2012): We’ve found a way to keep Fiesta alive for good. See the post above for details.

    We’ve got some sad news for you: we’ve decided to stop working on Fiesta and to shut the service down. If you’re interested in why, there are more details at the end of this post. If not, here’s the quick overview of how this affects you: we’ve added an export tool to the list management page so you can export your lists. We’ve turned off the ability to create new lists, but your current lists will continue to function until March 1st. Hopefully that gives you time to find an alternate solution and coordinate with the other members of your lists. After March 1st, we’ll be turning off the servers that handle incoming email (so lists will no longer function). At that point all personal data will be permanently deleted.

    We owe all of you a huge debt of gratitude for using Fiesta and making it so fun for us to work on over the past 13 months. We sincerely hope that it’s been a useful tool for you. If you have any questions or concerns please feel free to get in touch with me directly: mike@corp.fiesta.cc

    We wish you way more than luck,

    Mike and Dan




    If you’ve read this far, it means you’re curious to hear about *why* we’re shutting down.

    I started work on Fiesta as a reaction to a real problem - communicating with groups online is harder than it should be. I think that we’ve come really far with Fiesta and have solved a lot of the problems that we set out to solve. That said, it really has been a labor of love. Dan and I have worked tirelessly to get things to the point where they are now. While Fiesta has been growing rapidly, there is still a long way to go.

    To be totally honest, I’ve lost the enthusiasm that is necessary to continue to work tirelessly on solving the problem. The reasons for that are probably too uninteresting to list here. The summary is that it seems to me that life is too short to spend time on something that I’m not 100% passionate about, so I’ve decided to stop spending time on Fiesta.

    You may have some more questions:

    Why not keep Fiesta running but spend less time on it?

    Running a service like Fiesta is relatively expensive. There are financial costs (like paying for servers to host the service), but there are also time costs. Behind the scenes we do things every day like fighting spam (still partially a manual process even at really large scale services) and dealing with outages and unexpected issues. All of these things make the idea of putting Fiesta on “autopilot” impractical.

    Why not sell Fiesta to someone who can keep it running?

    If we found the right match, someone committed to the well-being of the users who have put their trust in us, we’d definitely consider this option. If you think you’re the right person to take over (or if you know the right person) definitely get in touch.

    P.S. There are a lot of people who have supported us along the way and we owe a huge thanks to all of you. I’d like to single out Dogpatch Labs and Polaris for a particular thanks: we never would have made it this far without your support.

  New standards are needed to fix reply-by-email

    There is a growing (and welcome) trend of web services tightly integrating email into their application workflows. One of the most useful (and frequent) types of integration is reply-by-email. Services like GitHub, Disqus and Facebook all support this.

    To get reply-by-email working, services set the From header to a special address that goes back to the service in question. Here’s what the From header looks like for messages from GitHub, Disqus and Facebook, respectively:

    John Doe <reply+i-...@reply.github.com>
    Disqus <notifications-...@disqus.net>
    Reply to Comment <c+...@reply.facebook.com>

    In every case, the ... is actually a long, presumably unique, identification string.

    This works well in general, but presents a couple of subtle user-experience problems. The most notable problems we’ve found:

    • Many clients (like Gmail) will automatically add addresses you contact to your address book. They’ll be included in the autocomplete hints when you’re composing new messages.
    • When searching for messages from John Doe using his email address (say john.doe@example.com) you won’t see any of the messages from him that came via GitHub.

    The autocomplete problem isn’t as much of a sticking point for Disqus and Facebook style headers, since they don’t use the sender’s name. For GitHub-style headers it’s a big problem: when I start typing “John Doe” the autocomplete will include the GitHub reply address. That said, GitHub-style headers are nice as they provide more information/context about who is responsible for the message you’re receiving.

    To solve these problems, there should be a header that services can use to indicate that they are doing this type of From-header munging. When a client sees that header it can avoid adding the From address to the autocomplete index and, for the purposes of search, can index the message using a different address that the service optionally provides.

    Let’s say the header chosen is X-Indexable-From. For a service to indicate that a message’s From address shouldn’t be added to contacts/auto-complete, it just sets:

    X-Indexable-From: false

    If it’s compatible with a service’s privacy policy, it can also specify the address of the person who is actually responsible for the notification being received. That way, the proper name/address can be added to the receiver’s contacts and search index:

    X-Indexable-From: john.doe@example.com

    or

    X-Indexable-From: John Doe <john.doe@example.com>

    In any of these examples, the From header stays the same as above - it’s just ignored by clients for the purposes of autocomplete & search indexing.
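
    As a rough sketch of what this might look like on the sending side, here’s the proposed header set with Python’s stdlib email package (the addresses are illustrative, and the header is just our proposal - not a standard):

    >>> from email.mime.text import MIMEText
    >>> msg = MIMEText('John Doe pushed some commits.')
    >>> msg['From'] = 'John Doe <reply+i-...@reply.github.com>'
    >>> msg['To'] = 'jane@example.com'
    >>> # Tell clients which address to use for contacts & search indexing:
    >>> msg['X-Indexable-From'] = 'John Doe <john.doe@example.com>'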

    Since we’re proposing this, it may come as no surprise that we’re running into similar issues at Fiesta with our implementation of Reply-Thru. We thought we’d publish this post and see if others have thoughts on this or can point out anything we’re overlooking. Consider it a very informal RFC. Is anybody else interested in pushing something like this forward as an open standard? Alternatively, is there an existing solution to this problem?

    Edit: We’ve already heard from a couple of people who run services that have the same issue and are looking for solutions. The best thing you can do to help is to spread the word about this problem and the proposed solution. It doesn’t really become viable until we get the attention of at least one big email client, so I’m not proposing that people start adding this header to outbound emails yet. Or at least not until there is consensus built around syntax, header name, etc.

  Using (and abusing) MongoDB ObjectIds as created-on timestamps

    One of my favorite MongoDB tricks is the ability to use an ObjectId (the default type for MongoDB’s _id primary key) as a timestamp for when a document was created. Here’s how it works:

    >>> import pymongo
    >>> db = pymongo.Connection().test
    >>> db.test.insert({'hello': 'world'})
    ObjectId('4f202e64e6fb1b56ff000000')
    >>> doc = db.test.find_one()
    >>> doc['_id'].generation_time
    datetime.datetime(2012, 1, 25, 16, 31, 32, tzinfo=<...>)

    We’re inserting a single document and then immediately querying for it. The generation_time property of the automatically generated _id gives us a datetime representing when that ObjectId was generated (precise to the second). This is great for those times when you would’ve otherwise added an extra “created_on” field with just a timestamp.

    Going in Reverse

    PyMongo’s ObjectId class also has a method that lets us generate an ObjectId from a datetime, for use in querying (the other drivers have this too). Let’s insert another document and give it a try:

    
    >>> import pprint
    >>> import datetime
    >>> from bson.objectid import ObjectId
    
    >>> db.test.insert({'hello': 'a little later'})
    ObjectId('4f2030d9e6fb1b56ff000001')
    >>> pprint.pprint(list(db.test.find()))
    [{u'_id': ObjectId('4f202e64e6fb1b56ff000000'), u'hello': u'world'},
     {u'_id': ObjectId('4f2030d9e6fb1b56ff000001'), u'hello': u'a little later'}]
    >>> timestamp = datetime.datetime(2012, 1, 25, 16, 35)
    >>> pprint.pprint(list(db.test.find({'_id': {'$gt': ObjectId.from_datetime(timestamp)}})))
    [{u'_id': ObjectId('4f2030d9e6fb1b56ff000001'), u'hello': u'a little later'}]
    

    The call to ObjectId.from_datetime() is what lets us create a special ObjectId just for querying. If you look at the API docs you’ll see a note that I wrote a long time ago about when this method is safe to use. That leads us into our next section:

    Abusing ObjectIds

    At Fiesta we use ObjectIds to get the timestamps to display in our new archiving UI. Recently we had to import some existing archives for a group that was migrating to Fiesta. This presents a problem: when we import the archives we are creating new documents with new ObjectIds, but we want them to have timestamps that make them look much older. There are a couple of ways we could’ve approached this problem. I’ll start with what we did and then discuss why it’s wrong and what we probably should’ve done instead :).

    We wrote some code to generate ObjectIds with timestamps that occurred in the past, and manually generated _id values to match the messages we were importing. Here’s the code:

    import calendar
    import struct
    
    from bson.objectid import ObjectId
    
    
    # Current ObjectId increment
    INC = 0
    
    def generate_objectid(generation_time):
        '''
        This is unsafe.
    
        We generate fake ObjectIds. Set the five (machine id/PID) bytes to
        '\xFA' so we can at least recognize OIDs we generated.
    
        We don't lock around the INC, so this method isn't re-entrant.
        '''
        global INC
    
        # Timestamp
        oid = struct.pack(">i", int(calendar.timegm(generation_time.timetuple())))
    
        # Machine ID / PID
        oid += "\xFA" * 5
    
        # Increment
        oid += struct.pack(">i", INC)[1:4]
        INC = (INC + 1) % 0xFFFFFF
    
        return ObjectId(oid)
    

    We couldn’t use ObjectId.from_datetime() here because, as noted in the docs, it’s unsafe for use in anything but queries. The method above is marginally safer by virtue of using an actual increment and a canary for the machine ID & PID (from_datetime() uses all \x00s). But it’s still unsafe: if we need to do another import we need to be sure not to use the same canary, and we need to be sure that the canary never matches any of our actual machine ID / PID bytes.

    What we should’ve done

    What we probably should do is add a “created_on” field with a regular datetime timestamp for messages that are being imported. When we go to display a message, we use created_on if it exists and fall back to the _id otherwise. That way we’re never resorting to improperly generated ObjectIds, but we still get the benefit of built-in timestamps when we can. I figured I’d do this post in case anybody comes across the same problem, and as a neat way of exposing some of the internals of ObjectIds.
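
    A minimal sketch of that fallback (the helper is ours, not part of any driver):

    def message_timestamp(doc):
        # Prefer an explicit created_on field (set only for imported
        # messages); otherwise use the timestamp embedded in the ObjectId.
        if 'created_on' in doc:
            return doc['created_on']
        return doc['_id'].generation_time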

  Solving the Reply Problem: Reply-Thru

    The great thing about mailing lists is that they make it simple to discuss things with an entire group. A problem with mailing lists is that they can make it too simple to discuss things with an entire group. Everyone has seen it or had it happen to them: you receive a message through a list and reply with a personal note to the message’s sender. Your reply, however, actually goes to the entire list. Embarrassing.

    The Problem

    This happens because most mailing list software (including Fiesta, by default) sets a “Reply-To” header pointing back to the list when a message is distributed. When you press Reply, your email client sees that header and composes a new message to the list rather than to the sender. Mailing lists do this to foster group communication; if replies go to the whole group then everyone stays in the loop. Otherwise, it’s very easy for discussions to become fragmented.
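
    For example, a message distributed through a list might carry a header like this (list address hypothetical):

    Reply-To: my-group@fiesta.cc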

    The problem seems intractable: we want people to be able to send personal replies, but we also want to keep group discussion on the list as much as possible (even if somebody hits Reply instead of Reply-All). To date, all mailing list software has sort of punted on the problem. The list software picks a default behavior and, at best, gives users the ability to customize the behavior themselves.

    Customization might seem like a great solution, but it’s bad for a couple of reasons. First, customization requires work and management. Who gets to decide how a list behaves? When? Is the setting mandatory, or is it hidden away where only advanced users will find it? Second, customization means that different lists behave in different ways, even on the same service. This is the same problem seen with all modal UIs: users can never be certain exactly how a list will behave without knowing how it’s configured.

    A New Solution: Reply-Thru

    We’ve been working on what we think is a better solution to this problem. We’re calling it Reply-Thru. Reply-Thru is enabled on all lists that are using NewFiesta.

    When you get a message from a Fiesta list you can Reply or Reply-All, just like with a normal email message. If you Reply-All, your message will go to the entire list. If you Reply, it will go only to the sender. What makes Reply-Thru special, however, is how that direct message is sent. It gets sent through Fiesta, and a note is added telling the recipient that it was sent directly to them. The recipient has the option of sending a Reply directly back to you, but they also have the option of hitting Reply-All and taking the conversation back to the list. They can also distribute the message to the rest of the list with a single click.

    Reply-Thru makes direct responses the default, but makes it dead simple to bring the discussion back to the list when that’s where it belongs. Best of all, there’s no configuration: every list behaves the same way. There are some issues with this approach as well, most notably that we have to manipulate the From address of emails being sent to lists. Instead of seeing an email from mike@example.com, users will see mike-at-example.com+via@fiesta.cc. This might cause some confusion, especially for those who are used to the old behavior.
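
    The munging itself is mechanical; it looks something like this (a sketch of the transformation, not our exact implementation):

    def reply_thru_from(address):
        # 'mike@example.com' -> 'mike-at-example.com+via@fiesta.cc'
        local, domain = address.split('@', 1)
        return '%s-at-%s+via@fiesta.cc' % (local, domain)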

    We think the benefits of Reply-Thru outweigh the issues, and we’ll continue to iterate on it as more Fiesta lists transition to NewFiesta. Give it a try and let us know what you think!

  Announcing NewFiesta

    Since we launched Fiesta, one of the most requested features has been the ability to browse list messages from the web. We’ve been hard at work on getting this right, and we’re psyched to announce that it’s available now. Here’s an example thread that explains a bit more about it. For new groups it’s enabled by default, but for existing groups you’ll need to go to your settings page and manually enable it (we explain why in that thread). There was some good coverage of the launch in Fast Company, as well.

    Now that this is out the door, we’ll start talking about some of the tech behind it here in the blog, as usual. If you have any questions or feedback - let us know.

  Cool Python Module: html2text

    Parts of the Fiesta API (as well as some new features we’re rolling out for Fiesta itself) rely on the ability to automatically generate clean text (markdown) versions of incoming messages. Our parser tends to prefer using the text version of a message if possible, as it’s generally easier to parse than the HTML version. That said, sometimes a message only contains an HTML version - we need a way to generate our canonical markdown representation from that alone. Enter html2text.

    html2text is a great little Python module by Aaron Swartz that takes HTML as input and generates a markdown version as output. Here’s an example:

    >>> import html2text
    >>> print html2text.html2text('''<html><body><p>Hello World</p>
    <ul><li>Here's one thing</li>
    <li>And here's another!</li></ul></body></html>''')
    Hello World
    
      * Here's one thing
      * And here's another!
    

    The output is nicely formatted markdown text, exactly what we were looking for. The only problem we’ve noticed is that the module has some trouble dealing with malformed HTML. Our approach has been to run things through BeautifulSoup first, which tends to do a great job even with crappy markup.
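
    Here’s roughly what that pipeline looks like (a sketch assuming BeautifulSoup 3, where str() re-serializes the parsed, cleaned-up tree):

    from BeautifulSoup import BeautifulSoup
    import html2text

    def to_markdown(html):
        # Re-parse and re-serialize malformed markup before converting.
        cleaned = str(BeautifulSoup(html))
        return html2text.html2text(cleaned)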

  Some Quick Tips for Securing Cookies

    Last week we did a post with some tips for improving web app security, focused primarily on the Strict-Transport-Security, X-Frame-Options and X-Content-Security-Policy headers. Even though it was a pretty quick/simple post, the overall reaction was positive. There were some requests for more similar posts, so consider this the next in the series. This post is about a couple of options you can set on cookies to improve security.

    HttpOnly

    When your server sets a cookie the client’s browser will include it in future requests. Cookies are probably the easiest way to maintain state across requests, so most web applications use them to store users’ logged-in state, etc. That makes them a good target for attacks: if an attacker can get ahold of a user’s cookie they’ll be able to take actions in the app as if they were that user.

    In addition to including them in future requests, the browser also exposes cookies to JavaScript running on the client side (as document.cookie). This makes document.cookie a great target for exploitation if an attacker is able to find an XSS vulnerability: they can embed a script that does something evil with the values of the user’s cookies.

    The HttpOnly option tells browsers that support it not to allow client-side access to the cookie. The cookie will still get sent along with future requests (so it can be used to maintain state on the server side), but it won’t be visible to an attacker who manages to run a script on the page. Here’s what a Set-Cookie header that includes HttpOnly might look like:

    Set-Cookie: x=5; path=/; HttpOnly

    This option limits the surface area of XSS attacks, but (just like our discussion of X-Content-Security-Policy) is really just treating the symptoms and not the cause. The important thing is to prevent code injection altogether.

    Secure

    Cookies are also vulnerable when they are sent in the clear. Just like in the last post, the first step is to be sure you’re using SSL for all requests. The Strict-Transport-Security header comes in handy there (n.b. we recognize that this blog doesn’t use SSL - we’re hosting it on Tumblr. We do use both of these options on Fiesta). Cookies also have a Secure option, which tells browsers to only transmit them over HTTPS. If you’re already using SSL for all requests, set the Secure bit on your cookies, too. Here’s an example header:

    Set-Cookie: x=5; path=/; HttpOnly; Secure

    Once you set the Secure option, browsers that support it will never transmit the cookie in the clear, only over HTTPS.
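
    If you’re using a web framework, both options are usually just keyword arguments. Here’s a minimal sketch assuming Flask (most frameworks with cookie support look similar):

    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route('/')
    def index():
        resp = make_response('hello')
        # Emits a header equivalent to:
        #   Set-Cookie: x=5; Path=/; HttpOnly; Secure
        resp.set_cookie('x', '5', httponly=True, secure=True)
        return resp

    That’s it for this post. If you have any questions or ideas for a future post let us know in the comments.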

  Deploying MongoDB for High Availability - MongoSV Live-Blog

    We’re live-blogging MongoSV today. This is the last post, but here’s a link to all of the posts from the event.

    This talk is being given by Eliot (the first of his that I’ve seen all day!).

    Going to go over HA best practices: keeping data online and safe.

    What about a single node? This will have downtime: if the node crashes, intervention might be necessary, and if it disappears you’ll need a backup.

    Replica set v1: Single datacenter, single switch, single power source. But automatically recovers from a single node crash. A good start but not great.

    The next step up is still a single datacenter, but w/ multiple power/network zones - like EC2 in a single region but multiple AZs. There are still some points of failure (datacenter failure, or two nodes failing). With an arbiter, we can’t do w=2 writes and remain up if a data-bearing node is down. With 3 non-arbiters we can use w=2 but are still vulnerable to datacenter failure.

    The next step up is multi datacenter w/ a single DR (disaster recovery) node in a different DC. We can’t always stay up, but we at least have a DR option now.

    Now let’s look at the ideal: three datacenters, five nodes. One DC has a single delayed slave (which helps recover from fat-finger incidents like an accidental database drop). The other two DCs each have 2 active nodes. We can lose an entire DC and still have a majority w/ the other two. Can do `w={dc: 2}` to guarantee a write hits 2 DCs.
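
    In replica set terms, that guarantee comes from tagging members by datacenter and defining a getLastError mode; roughly like this (hostnames and the 'multiDC' mode name are invented):

    config = {
        '_id': 'rs0',
        'members': [
            {'_id': 0, 'host': 'ny1.example.com', 'tags': {'dc': 'ny'}},
            {'_id': 1, 'host': 'ny2.example.com', 'tags': {'dc': 'ny'}},
            {'_id': 2, 'host': 'sf1.example.com', 'tags': {'dc': 'sf'}},
            {'_id': 3, 'host': 'sf2.example.com', 'tags': {'dc': 'sf'}},
            # Delayed slave in the third DC for fat-finger recovery:
            {'_id': 4, 'host': 'dr1.example.com', 'tags': {'dc': 'dr'},
             'priority': 0, 'slaveDelay': 3600},
        ],
        # With this mode, w='multiDC' means "acknowledged in 2 datacenters":
        'settings': {'getLastErrorModes': {'multiDC': {'dc': 2}}},
    }
    # Apply with rs.initiate(config) / rs.reconfig(config) from the shell;
    # then from a driver: db.messages.insert(doc, w='multiDC')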

    Moving on to HA sharding

    Each shard needs to be a replica set - the same rules apply as above. Balancing can be run in a window (this is cool!): you can set an activeWindow so the balancer only runs at night. Sweet!
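
    Setting the window is just an update against the config database through a mongos; something like this (the times are an example):

    import pymongo

    conn = pymongo.Connection('mongos.example.com')
    conn.config.settings.update(
        {'_id': 'balancer'},
        {'$set': {'activeWindow': {'start': '23:00', 'stop': '6:00'}}},
        upsert=True)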

    Config servers need to be on at least 2 different power/network zones; ideally, just put them in three separate DCs. Use host names rather than IP addresses: it’s much easier to move a config server. Take backups of config servers. Important note (saw this earlier in Richard’s talk too): config servers are not a replica set, so to bring new nodes online you need to manually move the data. It’s not a problem if a config server is down for a day or something: the cluster just won’t do splits/migrates.

    Run one mongos per app server. You don’t need to worry about scaling mongos, and it saves a network hop for many ops. If you really don’t like this, run a pool per power region with a load balancer in front.

    Application-level tips:

    Handle spikes: queue non-synchronous writes, isolate components and features. Can your site handle going into a read-only mode? That helps a lot when dealing w/ issues.

    Monitor! Load, disk, CPU, but most importantly I/O (iostat). Alerts go hand in hand w/ monitoring.

    Have good procedures for backups, adding replica set members, adding shards, etc. Practice in staging (which ought to be the same as prod). Randomly shut down boxes & load test as much as possible.

    Recap

    That’s all for today. Another great event by 10gen; I think my favorite talks of the day were Kyle’s and Richard’s, but all of them were great. Hope you enjoyed the blog posts!

  Live-Blogging MongoSV: Architecture of a MongoDB-powered Event Processing System

    We’re live-blogging from MongoSV today. Here’s a link to the entire series of posts.

    Presented by Greg Brockman from Stripe

    The actual title of this talk is “There’s a Monster in My Closet”, but I thought the subtitle would be more elucidating. This talk is packed! Actually, all of the talks so far today have been pretty packed - great crowd here.

    Monster is the name of the event processing system Greg built for Stripe. Been using it in production for a few months now, and it’s built on top of MongoDB. The concept of event processing is that you want to glean some information from lots of real-time events that are happening (incremental stats, real time analytics, trending topics, etc.). Stripe uses it for fraud detection, dashboards, and more. Now we’re going to get a live demo!

    He’s showing a blog-post generator that he’s written; he’s going to use Monster to monitor the content of the posts that it’s spitting out. Live coding a “model”, which looks like a sort of quantum of reporting. Logging a new event per sentence that gets generated. Now we need a consumer to actually do something with the events. The consumer gets streamed events and just needs to “do something” - it doesn’t worry about storage, generation, etc. It registers for classes of events and has a `consume()` method. Pretty simple, but flexible. The demo consumer logs when generated sentences are “too long”.
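
    Based on that description, a consumer might look something like this (a sketch in Python; the class names and event shape are invented, not Monster’s actual API):

    class Consumer(object):
        # Stand-in for Monster's consumer base class.
        subscriptions = []

        def consume(self, event):
            raise NotImplementedError

    class LongSentenceConsumer(Consumer):
        # Gets streamed sentence events and flags the ones that are too long.
        subscriptions = ['sentence.generated']

        def consume(self, event):
            if len(event['text']) > 140:
                print 'sentence too long: %r' % event['text']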

    Question: Monster vs celery/beanstalkd/resque? Answer: when using a job queue the act of logging implies an “action”/job. With Monster/event queuing the goal is to totally decouple logging from performing actions on logs. Can add new consumers later, etc. Events persist, not ephemeral.

    Consumer uses polling to get new events.

    Now we’re hearing why they chose MongoDB. Replica sets are a major reason, for HA. They also wanted a document store: easy to use, so developers will all use it. They need atomic operations (talking about things like findAndModify). Seems like a lot of the talks today have been mentioning findAndModify. They like automatic collection creation, from a deployment perspective. No migrations, etc. Finally, background index building is really important for Stripe. Can create new indexes w/o compromising availability.
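
    For context, findAndModify is what lets a consumer atomically claim work; a typical pattern looks like this (our guess at the usage; the field names are invented):

    >>> db.events.find_and_modify(
    ...     query={'state': 'new'},
    ...     update={'$set': {'state': 'claimed'}})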

    Tradeoff: no transactions. This is the one thing they’d really like for Monster (mainly for DR). The particular case where they need transactions is what they call a Stateful Consumer, which can modify the state of an event while consuming it. They basically build transactions at the application layer here.

    Like the previous talk, they aren’t using capped collections - they don’t expire old events. They also aren’t using sharding (again, these answers are in response to audience questions). The environment is a 3-node replica set on AWS (large instances), not using EBS except on one of the secondaries.
