What I learned at JavaOne 2007

I am just back from a week in San Francisco at JavaOne, where I was one of 12,000 geeks who got together to learn more about the latest and greatest in Java development. It is almost two years since I worked full time with Java development, so this almost felt like a trip back in time for me, back to when Java was equal to my professional life. What I have worked with over the last few years (Web 2.0, mashups, web scraping etc.) is still considered new and exotic in the Java world.

It was also very interesting for me to compare JavaOne with the Web 2.0 Expo that I attended a few weeks ago. JavaOne is almost strictly for developers who are using a mature technology that has not really taken any real leaps lately. The Web 2.0 Expo was for both techies and business people, and it looked forward to new technologies and business opportunities.

Keynotes and General Sessions

Most of the Keynotes and the General Sessions were about running Java on all kinds of devices (cell phones, ATMs etc.) and generally about how great Java is, so nothing new there. But there were a few things that caught my eye:

  • JavaFX Script – Sun has decided to take up the fight with Microsoft’s Silverlight and Adobe’s Flash. Unfortunately they decided to give it a name that most people will confuse with “JavaScript” and Adobe’s “Flex” (just try saying JavaFX Script 10 times quickly). This is a very interesting move since it suddenly makes it easy for the whole Java community to develop Rich Internet Applications. This and Silverlight have the potential to make the web a much more interesting place.
  • NetBeans 6 – There was a quite impressive demo of NetBeans 6 and how it can be used for programming Ruby on Rails, much better than RadRails, which I am using now. Of course it was a demo and I haven’t had time to test the NetBeans 6 preview yet, but as soon as I have to dig down into Rails again NetBeans is my choice.
  • Blu-ray – There were some semi-desperate pleas (=competitions) to get Java developers involved in the Blu-ray vs HD-DVD fight. Sun is squarely behind Blu-ray (they mentioned about 100 times that it runs Java). The Blu-ray demos were cool, but personally I just want one format to win quickly so I know what player to buy.

Java and Web 2.0

Most of the presentations were of course about hardcore Java stuff, and I skipped those. Instead I went to all the presentations about Mashups, RSS, Atom and REST (actually I held a presentation about Mashups myself, more about that in a later post). It is pretty clear that all the Web 2.0 technologies are viewed as some distant hype by most of the Java community.

The only really cool thing I saw in regard to Java and Mashups was a couple of demos of jMaki. It is a project developed by Sun, and it is basically a framework where Java developers can easily program Mashups. The great thing was what is called the “Glue”, which is an event bus that enables widgets from different providers like Yahoo and Dojo to talk to each other. jMaki has a great future if it ever moves into the Enterprise world, and it could be a real step forward for both Java and Web 2.0.

Another interesting Sun research project is Project Caroline, which enables Java developers to control all resources from code, i.e. create new server instances on the fly, set up new file systems and so on. If this ever moves beyond being a half-implemented research project, it could open up a lot of competition for Amazon’s S3 and EC2.

HTML is the world’s most common API

Most folks who are working with Mashups just assume that services and APIs will magically appear, but unfortunately there are not that many public APIs around today. Just check out programmableweb and you will see. More and more are added every day, but it will never reach the level where a majority of systems have an API, especially not if you think about systems behind the corporate firewalls. Simply put, there is a painful lack of APIs, and if that is not addressed it will stop the mashup wave in its tracks. Fortunately there are already smart people working on this, and one of the solutions is to start using HTML as if it were an API. That’s right, start using all the data and functionality that is available in HTML today to build new innovative mashups and solutions.

The potential of HTML

All new interesting applications (Skype being the exception that proves the rule) have an HTML interface. And this is true not just for the consumer-facing applications, but for Enterprise-level applications as well. So with the millions and millions of HTML pages in existence today it is not unlikely that HTML is one of the world’s most common data formats (I wonder how it compares to printed text and audio, for example). The great thing with HTML is that it does not just contain data, it is also the interface to a whole lot of functionality (when you search Google you do that via HTML, don’t you?). What if we could use HTML as one big API? That would make HTML the world’s most widespread API, and that would give mashup developers and programmers access to more data and more functionality than ever before.

The problem with HTML

Almost no sites on the web today follow the HTML 4 standard, so today’s browsers have become very good at interpreting the tag soup that most pages consist of (i.e. broken HTML, missing end tags etc.). Furthermore, HTML is used both to mark up data in a document, for example with the <title> tag, and to mark up how the data should be presented, for example with the <b> tag. All this together makes HTML documents unstructured documents (by implementation, not by nature) with data in very application-specific formats (microformats will help here, but it will be some time before they are widespread enough to be really useful).

Another problem is of course that there are fewer and fewer pages on the web that use pure (albeit broken) HTML; there is more and more JavaScript around. Especially in Web 2.0 applications, most of the really interesting functionality is available via AJAX. So it is not only HTML but also JavaScript that has to be taken into consideration when one wants to get to a web application’s functionality.

Parsing

So we have huge amounts of data and functionality in HTML and we want to use it to make our latest funky Mashup. The good old approach is to try to parse the page in question. That used to mean Perl, but now it can be done pretty well in almost any modern programming language. There are several problems with parsing though:

  • It is damn complicated to get to work on serious web pages, and once it is done it breaks easily
  • Good luck handling a real tag soup; that alone breaks most parsers (an XML parser used for this simply stops at the first error it encounters)
  • It is boring to program those parsers (if you haven’t tried, then lucky you)
  • It cannot access functionality that uses JavaScript and AJAX
  • It is hard to handle things like logging in to a web application (i.e. session handling) and navigating over several pages

Still, this is a very common approach for getting to data and functionality in HTML. But there is a much easier way…
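
The tag soup problem in the list above can be illustrated with a small sketch. It uses Python and only its standard library, which is a choice made for illustration here and not something the tools mentioned in this post prescribe. The `SOUP` string is made-up markup, and `TextCollector`, `parse_strict` and `parse_lenient` are hypothetical names. The sketch shows a strict XML parser giving up on broken markup while a lenient HTML parser still recovers the text and the link.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Made-up tag soup: unclosed <p>, <b> and <a>, plus an unquoted attribute value.
SOUP = "<html><body><p>First <b>bold<p>Second <a href=/page>link</body></html>"


class TextCollector(HTMLParser):
    """Collects text chunks and link targets without requiring well-formed markup."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep only non-empty text between tags.
        if data.strip():
            self.texts.append(data.strip())


def parse_strict(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def parse_lenient(markup):
    """Return (texts, links) found by the forgiving HTML parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.texts, collector.links


if __name__ == "__main__":
    print("strict XML parser accepts it:", parse_strict(SOUP))
    print("lenient parser found:", parse_lenient(SOUP))
```

Even the lenient version only solves the second bullet. It still knows nothing about JavaScript, sessions or navigation between pages.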

Web Scraping

I bet that a fair portion of the people reading the words “Web Scraping” think of old mainframe terminals and “Screen Scraping” and frown. Don’t worry, technology has moved forward light-years since the days of mainframes. Web Scraping means interacting with HTML (including JavaScript if it is a good scraper) and either extracting data from the HTML or repackaging the functionality in the HTML. The data can be saved into a database or a file, for example, and the functionality can be made available as a REST service, a programming language API or whatever else makes sense. Suddenly HTML is wide open. Just imagine that you wanted to get data from Digg for some reason before the Digg API came out a few weeks ago. Without an API that would be hard, but using a web scraper you could for example build a REST service out of the search on Digg only by accessing the HTML. Web Scrapers are used more and more for things like collecting large amounts of job ads or flight information and then repackaging that data into sites that allow users to search for a job or a cheap flight.
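
As a rough sketch of the “extract data and repackage it” idea, the Python snippet below pulls story titles, links and vote counts out of a page and returns them as JSON, which is the kind of payload a REST service could hand out. Everything specific in it is an assumption made for illustration: the `PAGE` markup is invented and does not reflect the real structure of Digg or any other site, and `StoryScraper`, `scrape_stories` and `stories_as_json` are hypothetical names. It uses only the Python standard library and does not handle JavaScript, logins or multi-page navigation.

```python
import json
from html.parser import HTMLParser

# Invented markup for illustration; not the structure of any real site.
PAGE = """
<div class="story"><h3><a href="/story/1">First story</a></h3><span class="votes">42</span></div>
<div class="story"><h3><a href="/story/2">Second story</a></h3><span class="votes">7</span></div>
"""


class StoryScraper(HTMLParser):
    """Turns the assumed story markup into a list of dictionaries."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._field = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "story":
            # A new story block starts here.
            self.stories.append({"title": "", "url": None, "votes": None})
        elif not self.stories:
            return  # ignore anything before the first story block
        elif tag == "a":
            self.stories[-1]["url"] = attrs.get("href")
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "votes":
            self._field = "votes"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field == "title":
            self.stories[-1]["title"] += data
        elif self._field == "votes" and data.strip().isdigit():
            self.stories[-1]["votes"] = int(data.strip())


def scrape_stories(html):
    """Extract the stories from the HTML as plain Python data."""
    scraper = StoryScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.stories


def stories_as_json(html):
    """Repackage the scraped data as a JSON string."""
    return json.dumps(scrape_stories(html))


if __name__ == "__main__":
    print(stories_as_json(PAGE))
```

The scraper is tied to the exact class names and tag layout it expects, so it has to be updated whenever the page changes.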

Openkapow

The web scraper of my choice is the one supplied on openkapow.com (disclaimer: I am working for Kapow Technologies, the company behind openkapow.com, but trust me that I am not plugging openkapow to make my boss happy – it is really a great product). Using openkapow one can get to the data and functionality on any web page and access it as a REST service or an RSS/Atom feed. Of course JavaScript is handled automatically, and it is possible to navigate multiple pages, log in to restricted pages and have full control over the process flow with conditions and error handling. I recommend that anybody who is interested in how to use HTML as an API takes a look at openkapow.

An Eye Opener…

Thinking of HTML as an API significantly expands your horizons as a developer. I have literally seen a light go on in fellow geeks’ eyes when they realize the potential. Suddenly the web is really yours to use in your programs and mashups. And once APIs and services are abundant, you can start using the other cool mashup tools around (Teqlo, jMaki etc.).

How vs. Why

Somebody recently mentioned that they were always more interested in the “Why” than the “How” of any technical challenge. That made me think, and my (current) conclusion is that there are two major groups of people (at least in the tech world): people who primarily ask “How?” and people who primarily ask “Why?”

The “How?” Group

“How” is the focus of anybody who works daily with solving problems through technology, and it is firmly rooted in us by all those Math and Programming courses at University. I am definitely in this group, even if I am slowly drifting over to the Why group. If there is any occupational hazard for computer geeks (except for destroying our hands on keyboards), it is probably that as soon as we hear about a problem we start to design the system to solve it. Our subconscious is constantly working on solving some trivial or non-trivial problem. Things have to have a solution, and if we don’t know what it is, that is just because we haven’t thought it through properly.

The “Why?” Group

“Why” to do something is (in my humble opinion) not that important to most techies, and for the ones who do consider the “Why” important, a pure development job is probably quite unfulfilling. This is the group that most entrepreneurs belong to. They ask questions like “Why should I have to use FTP to upload my photos to the web?” before they ask “How should we implement that?” (see Flickr for the answer). This group of people is much smaller than the “How” group, but everybody in the “How” group can really benefit from thinking more about the “Why”. The best programmers I have ever worked with ask “Why” all day long. And without “Why” we wouldn’t have most of the Web 2.0 apps of today.

More Why’s!

I am finding myself asking “why?” more and more, and “how?” less and less, and now that I am aware of this I will make sure to always ask “why?” before asking “how?”. This will be hard to do, since as a programmer asking “how?” has been my focus for many, many years.

If I could just expand this post to about 300 pages I could publish a management book! I also need to get a picture of myself in a suit and a tie for the cover, but that should be it…

links for 2007-05-15
