Thursday, 24 May 2012

Putting data from a tab-separated (.tsv) file into a MySQL database

Quite often we need to load data that someone has dumped into a tab-separated file into our own database. The file can be read with the following Ruby code:

      row = []
      begin
        # Read the file line by line and split each line on tabs.
        File.open("/path/to/file.tsv") do |f|
          f.each_line do |tsv|
            row << tsv.chomp.split(/\t/)
          end
        end
      rescue StandardError => e
        puts "------exception------#{e.inspect}"
      end

However, if the file runs into hundreds of MBs, reading it and inserting the rows this way will be far too slow. Instead, we can use the following MySQL query.

LOAD DATA LOCAL INFILE '/path/to/file' REPLACE INTO TABLE table_name IGNORE 1 LINES (column1, column2, column3, column4);

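The "IGNORE 1 LINES" part ensures that the first line, which contains the header, is ignored. If the file has no header, leave this part out. By default LOAD DATA expects tab-separated fields and newline-terminated lines, so a .tsv file needs no extra clauses. The LOCAL keyword makes the client read the file and send it to the server; if it is dropped, the file must be on the server host and is read by the server itself. REPLACE means that an input row replaces any existing row with the same primary or unique key. This approach is much faster than inserting rows one by one from a script, but application-level validations are bypassed.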
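When the table or column list changes from file to file, the statement above can be assembled in Ruby before it is handed to whatever MySQL client is in use. The following is only a minimal sketch: `load_data_sql` and its `header:` option are hypothetical names made up for this illustration, and the helper does no escaping, so it is only suitable for trusted paths and identifiers.

```ruby
# Hypothetical helper (not part of any library): builds the
# LOAD DATA statement shown above as a plain string.
# It performs no escaping, so pass only trusted values.
def load_data_sql(path, table, columns, header: true)
  sql = "LOAD DATA LOCAL INFILE '#{path}' REPLACE INTO TABLE #{table}"
  # Skip the first line only when the file starts with a header row.
  sql += " IGNORE 1 LINES" if header
  sql + " (#{columns.join(', ')});"
end

puts load_data_sql("/path/to/file", "table_name",
                   %w[column1 column2 column3 column4])
```

The columns must be listed in the same order as the fields appear in the file.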

Tuesday, 15 May 2012


While translating a Perl script to Ruby, I needed to try things out in an interactive interpreter. Ruby provides IRB for that, and if you are using Ruby on Rails you might prefer the Rails console over IRB. However, I could not find anything similar built into Perl. I could do some tasks from the command line as follows:

perl -e "print 'Hello World';"

However, whenever I tried an assignment, it did not work. For example,

perl -e "$k = 'Hello World';print $k;"

did not work. I later found out that this was because I had put the script in a double-quoted string, so bash expanded $k (an unset shell variable, hence an empty string) before Perl ever saw it. Putting the script in single quotes works fine.

perl -e '$k = "Hello"; print $k;'

A quick search showed that I can use Devel::REPL for this purpose. So, I obtained it from CPAN.

cpan -i Devel::REPL

I prefer to run the Perl REPL through the re.pl script that Devel::REPL installs.

[blog@domain ~]$ re.pl
$ my $k = "Hello World"
Hello World
$ print $k;
1$ Hello World

Thursday, 3 May 2012

The "Google" experience

The "Google" experience used to be very simple. In the typical "Google" workflow, i.e. the workflow followed in most of its products, interfaces were uncluttered and never got in your way. Recently, however, they have become increasingly cluttered. YouTube, for example, now has a lot of things on its homepage, whereas earlier it just had videos. When you go to YouTube, you want to watch videos, and that was what it showed you. Wil Wheaton calls this a huge mistake on Google's part. He is not alone.

Another important aspect of the "Google" experience was speed. Now I notice a considerable slowdown in almost all Google products, be it Gmail, Blogger or YouTube. They have implemented instant search, but their older, faster search pages were much more helpful. I can always do without instant search. When I come to the search page, I already know what I want to search for. Instant suggestions only reduce the number of letters I type.

The next aspect of the "Google" experience is coolness. By coolness, I mean a positive rapport with open source developers. Shutting down the code search facility is not cool at all. Google used to inspire developers worldwide, but it is not that cool any more. DuckDuckGo is way cooler than Google.

Semantic search is the milestone that Google and its competitors are heading for. Social sites like Facebook are becoming important because they give better semantic search results, although in a restricted domain. The same is true of many other tools that provide, or have the potential to provide, better semantic search results in various domains. For the ads market, the better the semantic search result, the more likely the ad is to get attention and be clicked. So, instead of ruining its own experience, Google should work on finding ways to improve semantic search in more and more domains.