Sunday, March 18, 2012

Someone please disrupt the furniture industry!


Many things are cheaper online than in brick-and-mortar stores. Not so for furniture. Based on my (limited) experience buying furniture for my new home over the last 2 months, online furniture stores are way more expensive than your local furniture showroom. For example, http://www.homelement.com/Living-Room/Living-Room-Sets/Lambeth-Sofa-Set-Homelegance-p-25911.html lists a sofa + loveseat combo for $1838. I got the identical sofa and loveseat, plus the coffee table (listed for $309), for just $1325 after tax with free delivery and setup from a local store! Against the online total of $2147, that is a savings of over $800!

I visited three stores before I bought the sofa set. If I had bought it from the first store I visited, my savings would have been just $400. This is the strategy I followed.

The first store is where you pick what you like. Browse the catalogs of various manufacturers, find what you like, and then ask for a price quote. The salesman will consult his secret price list to determine what the furniture costs him, and then typically quote you double that amount. Negotiate a little to get the price down.

Then, armed with that price, you go to the next store. Almost every store carries the same set of catalogs, and even gets its items from the same warehouses. This time, you waste no time - directly ask the salesman what his best price for item X on page Y of manufacturer Z's catalog is. Use the price from the first store as the starting point and negotiate the price down. Since the prices are often marked up 100% or more, there is a LOT of room to negotiate. Then you go to the next store and do the same. If you have nothing better to do, you can keep doing this. However, after the first 3-4 stores, the price stops going down, and there is no point wasting another valuable weekend day on furniture shopping. At that point, you close the deal. But wait - offer to pay cash if you can and you will most likely get a further discount.

This strategy saves you hundreds of dollars, but takes a lot of time. Visiting three furniture stores can take a whole Saturday. It would be great if buying furniture were like shopping for a car online. You just pick the items you want from a multi-manufacturer online catalog and then different local dealers bid for your business. That's it - no driving from store to store. That would be ideal. But that's not the way it is today.

I hope some startup is working to disrupt the furniture industry and make this seamless online furniture shopping experience possible. I only know of one startup in the furniture space - dealdecor.com, which is trying to become the Groupon of furniture. Dealdecor offers one particular piece of furniture every few weeks for half price. That is great, if the item of the month is what you are looking for. It most likely won't be, and then you have no option but to visit the local retailers.

The US furniture market is a few hundred billion dollars a year (http://www.marketsize.com/blog/index.php/category/furniture/). There are many challenges to disrupting this industry - resistance from the entrenched local dealers (now, that's a surprise!), finding a good way for shoppers to try out furniture before buying, shipping bulky items, etc. Hopefully, someone will find a way to address these challenges and seize this huge opportunity. As for me, I am done with buying furniture for a while.

Sunday, December 04, 2011

Accessing recursive Apache Hive partitions in CDH3

In this post, I describe the minor Hadoop (0.20.2-cdh3u2) patches required to access data deep inside a multi-level directory structure using Hive 0.7. Consider the following directory structure:
Logs/
    2011_01
        01
        02
        ..
        31
    2011_02
        01
        02
        ..
        28
We want to issue Hive queries involving individual days as well as whole months. For accessing individual days, we define one Hive partition per day. For example, we define a partition 2011_01_02 with LOCATION Logs/2011_01/02. To access the whole month of 2011_01, we define a partition 2011_01 with LOCATION Logs/2011_01. However, if you query the 2011_01 partition, you will get no results. This is because the data files sit one level below the partition location, inside the per-day directories, and Hadoop 0.20.2 does not support recursive directory listing.
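
To see why the month-level partition comes back empty, here is a small local-filesystem analogy in Python. It is only a sketch: the part-00000 file names and the helper functions are made up for illustration, and it does not reproduce Hadoop's actual input-listing code. It just shows that a listing which stops at Logs/2011_01 finds directories but no data files, while a recursive walk finds the per-day files.

```python
import os
import shutil
import tempfile


def flat_files(directory):
    """Files directly inside directory, without descending into subdirectories."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)))


def recursive_files(directory):
    """Files anywhere below directory, as paths relative to it."""
    found = []
    for current, _subdirs, names in os.walk(directory):
        for name in names:
            found.append(os.path.relpath(os.path.join(current, name), directory))
    return sorted(found)


def build_sample_month(root):
    """Create Logs/2011_01/{01,02}/part-00000 under root (file names are hypothetical)."""
    month = os.path.join(root, "Logs", "2011_01")
    for day in ("01", "02"):
        day_dir = os.path.join(month, day)
        os.makedirs(day_dir)
        with open(os.path.join(day_dir, "part-00000"), "w") as handle:
            handle.write("sample log line\n")
    return month


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    try:
        month = build_sample_month(root)
        print(flat_files(month))       # nothing: the month directory holds only day directories
        print(recursive_files(month))  # the per-day files
    finally:
        shutil.rmtree(root)
```
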
In order to get this monthly query working, you must first apply the following patch (based on MAPREDUCE-1501, which did not make it into Hadoop 0.20.2) to Hadoop 0.20.2-cdh3u2. After applying the patch, compile Hadoop and point HADOOP_HOME on the machine running the Hive client to the patched Hadoop jars. You do NOT have to replace the Hadoop jars on the Hadoop cluster; the recursive directory listing feature is only needed by the Hive client.


In addition to the patched jars, you should also add the following lines to your hive-site.xml:
<property>
  <name>mapred.input.dir.recursive</name>
  <value>true</value>
</property>
After this, querying the 2011_01 partition will work fine.
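
If there are many days to register, the daily and monthly partition definitions described above can be generated rather than typed by hand. The following Python sketch is an illustration under assumptions: the table name logs, the partition column dt, and the function name partition_statements are all hypothetical, and the LOCATION values simply mirror the relative paths used in this post, so adjust them to wherever your data actually lives.

```python
import calendar


def partition_statements(table, year, month, root="Logs", column="dt"):
    """Build one ADD PARTITION statement per day, plus one for the whole month.

    The table and column names are placeholders; only the naming scheme
    (2011_01_02 -> Logs/2011_01/02, 2011_01 -> Logs/2011_01) comes from the post.
    """
    month_dir = "%04d_%02d" % (year, month)
    days_in_month = calendar.monthrange(year, month)[1]
    statements = []
    for day in range(1, days_in_month + 1):
        statements.append(
            "ALTER TABLE %s ADD PARTITION (%s='%s_%02d') LOCATION '%s/%s/%02d';"
            % (table, column, month_dir, day, root, month_dir, day))
    # The month-level partition points at the parent directory of the day directories.
    statements.append(
        "ALTER TABLE %s ADD PARTITION (%s='%s') LOCATION '%s/%s';"
        % (table, column, month_dir, root, month_dir))
    return statements


if __name__ == "__main__":
    for statement in partition_statements("logs", 2011, 1):
        print(statement)
```

The month-level statement only returns data once the patched client and the mapred.input.dir.recursive setting are in place.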

Thursday, November 24, 2011

Quickly find and open files

I frequently need to find a file that is located deep within the current directory and operate on it, like opening it in vim or running svn diff on it. I can never remember the exact path to the file, and sometimes can't even remember the full name. All I know is that the file is somewhere within the current directory and its sub-directories. So, I end up running the UNIX find command, and then copy-pasting the returned file path into the command of interest. This wastes time. So I wrote a small python script to make it easier.

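Copy the script f.py (located at the end of the post) into some directory that is on your PATH and make it executable. Suppose you are looking for a file whose name starts with Foo. You just run the following (quote the pattern so that the shell passes it to the script unexpanded):

$ f.py "Foo*"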
1) ./subdir1/subdir2/Foo1.java
2) ./subdir1/Foo.java
Enter file number:

Enter the number of the file you are interested in.  That will bring up the following menu of operations you can perform on the selected file.

Process ./subdir1/subdir2/Foo1.java
1. vim
2. emacs
3. svn add
4. svn diff
5. open (OSX only)
Enter choice (Default is 1):

If the pattern you specify matches only a single file, the script directly jumps to the operation selection menu.  Hope this will save you some key-strokes.

#!/usr/bin/python
# This program is used to easily locate a file matching 
# the user specified pattern within the current directory
# and to quickly perform some common operations (like
# opening it in vim) against it.
# Note: written for Python 2 (it uses print statements and raw_input).
import subprocess
import sys
import os

def processFile(fileName):
    """
    Show the user the possible actions with the specified file,
    and prompt the user to select a particular action.

    """

    fileName = fileName.strip()
    print "Process %s" % fileName
    print "1. vim"
    print "2. emacs"
    print "3. svn add"
    print "4. svn diff"
    print "5. open (OSX only)"

    choice = raw_input("Enter choice (Default is 1):").strip()
    
    if choice == "1" or choice == "":
        cmd = "vim %s" % fileName
    elif choice == "2":
        cmd = "emacs %s" % fileName
    elif choice == "3":
        cmd = "svn add %s" % fileName
    elif choice == "4":
        cmd = "svn diff %s" % fileName
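    elif choice == "5":
        cmd = "open %s" % fileName
    else:
        # Unrecognized choice: bail out rather than use an undefined cmd
        print "Invalid choice: %s" % choice
        return
    print cmd
    os.system(cmd)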


def listFiles(fileNames):
    """ 
    Show the list of files and prompt user to select one 
    """

    fileIndex = 1
    for fileName in fileNames:
        print "%d) %s" % (fileIndex, fileName.strip())
        fileIndex += 1
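    choice = raw_input("Enter file number:").strip()
    # Reject non-numeric or out-of-range input instead of crashing
    # (0 would otherwise silently select the last file)
    if not choice.isdigit() or not 1 <= int(choice) <= len(fileNames):
        print "Invalid file number: %s" % choice
        return
    chosenFileName = fileNames[int(choice)-1].strip()
    processFile(chosenFileName)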


if __name__ == "__main__":

    if len(sys.argv) < 2:
        print "Usage: f.py FILE_PATTERN_OF_INTEREST"
        sys.exit(-1)

    pattern = sys.argv[1]
    # Exclude only .svn metadata directories, not every path that contains "svn"
    proc = subprocess.Popen("find . -name \"%s\" | grep -v \"/\\.svn/\"" % pattern,
        shell=True, stdout=subprocess.PIPE)
    lines = proc.stdout.readlines()
    if len(lines) == 0:
        print "No matching files found. Note you can use wild cards like *"
    elif len(lines) == 1:
        processFile(lines[0])
    else:
        listFiles(lines)
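
For comparison, the two steps of the script (finding matches, then building a command) can also be done without going through a shell. The sketch below is an illustration, not a drop-in replacement for f.py: the names find_matches, command_for and ACTIONS are made up here. It uses os.walk and fnmatch instead of find | grep, skips only directories named .svn, and builds argument lists so that file names containing spaces survive intact.

```python
import fnmatch
import os

# Menu choice -> command prefix, mirroring the menu in f.py.
ACTIONS = {
    "1": ["vim"],
    "2": ["emacs"],
    "3": ["svn", "add"],
    "4": ["svn", "diff"],
    "5": ["open"],
}


def find_matches(root, pattern, skip_dirs=(".svn",)):
    """Return sorted paths under root whose file name matches the shell-style pattern."""
    matches = []
    for current, subdirs, names in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        subdirs[:] = [d for d in subdirs if d not in skip_dirs]
        for name in fnmatch.filter(names, pattern):
            matches.append(os.path.join(current, name))
    return sorted(matches)


def command_for(choice, path):
    """Return the argument list for a menu choice, or None if the choice is unknown.

    An empty choice falls back to "1" (vim), like the default in f.py.
    """
    base = ACTIONS.get(choice.strip() or "1")
    if base is None:
        return None
    return base + [path]
```

For example, command_for("4", "./subdir1/Foo.java") returns ["svn", "diff", "./subdir1/Foo.java"], which can be handed to subprocess.call as is.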