May 26, 2009

so long and thanks for the phish, goodbye


Two years of writing this blog, 99 posts (what a nice round number), and it's time for a new blog.
I decided to write the new blog in English, which, besides letting me type faster, will also let me reach a wider audience. It's a blog about software development, like this one, and maybe I'll bring in a different perspective from the new startup I've started working at.
So without further ado, please welcome PrettyPrint.me
I hope you enjoy the new blog as much as you enjoyed this one.


After two years of writing this blog, and 99 posts, I'm starting a new blog at PrettyPrint.me.
This time my blog will be purely in English, for the benefit of the international audience. The topics will naturally be software development, and maybe some insight into the new startup I've recently joined. I hope you will enjoy my new blog as much as you did this one.

May 14, 2009

Count - an improved WC bookmarklet


The Hebrew announcement will be followed by an English one ;-)

To install the bookmarklet right-click this: Count and add it to your Bookmarks/Links (or just drag and drop if you use a Mac). Make sure the Links toolbar is visible if you use IE.



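I decided to write in Hebrew and English together, to be friendly to all audiences.
So after I implemented the "mo'adafon" (a Hebrew word I just made up for bookmarklet) for counting characters, I was left with the feeling that it still wasn't quite it, and decided to improve it further. I wanted something more flexible that counts all the time and stays floating on the page, and above all something that would serve as an anchor for future development; that is, the bookmarklet would call code on the server, and I could improve that code over time without bothering the user with uninstalling and reinstalling.
In addition, I also wanted to get some practice with jQuery, and this looked like a good opportunity.
The result of all this is a charming bookmarklet that I'm very proud of. It's useful for counting characters (or also words and sentences) and, no less important, it also looks good. So that it doesn't get tedious for me, I'll write the technical details in English only. I hope you enjoy the counter.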



After implementing the first version of WC I was left with the feeling that it wasn't good enough, that something was missing. What I wanted was, first, something more flexible in its look and feel; second, something that doesn't make you click it again and again but simply counts chars whenever you select some text; and third, something that is easy to upgrade.
I finally rewrote the bookmarklet and came up with this: Count
To host it I created a Google App Engine application.
The idea is simple - there are cases where you want to count characters, for example when using Topify to write a DM. All you need to do is click the bookmarklet and select some text (in any order; you may first select text and only then click the bookmarklet) and you get the char count right there as a floating window in your browser. You don't have to close it to continue your work, it'll just stay floating and you can drag it somewhere else if it gets in your way. What's nice is that next time you select some text it will listen and count the chars again.

Now the tech details.

What is a bookmarklet?
Bookmarklets are small pieces of javascript code wrapped in a bookmark. They are usually small applications or utilities you apply to the current browser page. In some sense they are like browser toolbars, add-ons or extensions, but they are usually more lightweight, their installation process is VERY easy (drag and drop) and they are by definition cross-browser, so the same bookmarklet (when implemented correctly) should work on FF, IE, Opera, Safari etc. Examples are bookmarklets that save links to delicious, bookmarklets that alert the page cookie etc.

How can javascript run inside a bookmark?
Bookmarks contain hyperlinks such as http://google.com. A hyperlink consists of several parts, the first being the protocol. In this example the protocol is http. After the protocol comes the colon delimiter and then the rest of the hyperlink. Browsers know many protocol types, to name a few: http, https, ftp and javascript. Yes, javascript is just another protocol the browser knows about, one which tells the browser to treat the rest of the text as actual javascript code and run it. So the following code will pop up an alert box.

<a href="javascript:alert('hi')">Say hi</a>

Try it here: Say hi

Now, the same code can get saved as a bookmark, so if you drag that "Say hi" link to your link toolbar (or right click and add to bookmarks/link), every time you click it you get "hi".
So, this is the trick behind all bookmarklets. They are all hyperlinks using the javascript protocol, so they just run javascript on the current page without a page reload, which is important because they usually want to communicate with the page.
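
As a small illustration of the point about protocols, the standard URL parser found in current browsers and in Node.js reports javascript as the protocol of such a link, just as it reports http for an ordinary one. This is only a sketch; protocolOf is a made-up helper name and the sample links are the ones used above.

```javascript
// Return the protocol part of a hyperlink, including the trailing colon.
function protocolOf(href) {
  return new URL(href).protocol;
}

var webLink = protocolOf('http://google.com');               // 'http:'
var bookmarkletLink = protocolOf("javascript:alert('hi')");  // 'javascript:'
```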

Why do you need the (function(){...})() construct in a bookmarklet?
If you look at real-life bookmarklets, they all look like this:

(function(){...bookmarklet code here...})()

If you're wondering why the bookmarklet code needs to get wrapped inside this weird function definition, wonder no more.
There are two compelling reasons to do this, but first let's see what is actually going on here with all the curly braces and parentheses. What we have here is a function definition (an anonymous function)
function(){...}

This function is wrapped inside parentheses
(function(){...})

And after the parentheses we see another pair of parentheses:
(function(){...})()


What this construct accomplishes is an anonymous function definition and an immediate invocation of this function using the last (). So when the parser gets to the () it simply runs the code inside the function.
OK, so why do we need this? What's wrong with just having the actual code of the bookmarklet? Why do we need to wrap it in a function?
There are two reasons for this:
1. Keeping the link a single expression with no value. Browsers do accept several statements after javascript:, so javascript:alert(1);alert(2) runs, but if the last statement evaluates to a value (a string, for example) the browser may replace the current page with that value. A non-trivial bookmarklet has more than one statement, so using the above function construct lets you bundle them into one expression that returns nothing: javascript:(function(){alert(1);alert(2)})()
2. Namespace safety. The bookmarklet code may define local variables, for example var d = document; Now, keep in mind that a bookmarklet runs in the context of another page which you have no control over, so what if the page already has a variable d? Not good... The solution is simple, in javascript (as well as in other languages) variables are scoped, so if a bookmarklet defines a local variable d inside a function block it will not affect nor be affected by another page variable with the same name.

As a matter of fact, the function wrap trick is extensively used not only by bookmarklets, but by many other javascript applications that care about not polluting the default namespace. jQuery uses it as well.
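
The namespace point can be seen in a few lines. This is a minimal sketch: the outer d stands in for a variable that some host page happens to define, and both values are made up for the illustration.

```javascript
// Pretend this variable belongs to the host page.
var d = 'page value';

// The bookmarklet body, wrapped in an immediately invoked anonymous function.
var result = (function () {
  var d = 'bookmarklet value'; // local: shadows, but does not overwrite, the page's d
  return d;
})();

// After the call the page's d is still 'page value'; result is 'bookmarklet value'.
```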

How does one dynamically load server code to the current page?
Let's recall our setting: We write a bookmarklet, say the Count bookmarklet, which runs in the context of some arbitrary page. When the user operates the bookmarklet by clicking it, the bookmarklet must not reload the page, but it needs to run scripts from a remote host. How do you do that? If the script was a simple alert() then you may simply include the code inline in the bookmarklet. But if the code is considerably more complex, and maybe you want to use 3rd party code such as jQuery, what do you do?
As it turns out, browsers allow dynamic loading of remote scripts by means of script injection into the head section (or the body section, all the same). To achieve that you need to create a script DOM element and attach it to the document. The moment you do that, the browser notices, loads the script and executes it. That's the same trick jsonp uses, BTW, which I've blogged about in the past.
This is how you load a script dynamically:

var s = document.createElement('script');
s.setAttribute('src', 'http://charcount.appspot.com/s/jquery.js');
s.setAttribute('type', 'text/javascript');
document.body.appendChild(s);
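
If the bookmarklet needs to know when the injected script is ready (for example before calling jQuery), the same steps can be wrapped in a helper that takes a callback. This is a sketch under assumptions, not the Count bookmarklet's actual code: loadScript and its parameters are made-up names, and it relies on the script element's onload event, which older Internet Explorer versions did not fire for scripts (they used onreadystatechange instead).

```javascript
// Hypothetical helper: inject a script into the page and call onLoad once it has run.
// doc is the page's document; it is passed in so the helper can also be exercised
// with a stand-in object outside a browser.
function loadScript(doc, url, onLoad) {
  var s = doc.createElement('script');
  s.setAttribute('type', 'text/javascript');
  s.setAttribute('src', url);
  if (onLoad) s.onload = onLoad; // called after the browser has loaded and executed the script
  (doc.head || doc.body).appendChild(s);
  return s;
}
```

In a page it would be called as loadScript(document, 'http://charcount.appspot.com/s/jquery.js', function () { /* jQuery is available here */ });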


What else is interesting about this bookmarklet?
I've had fun and challenges creating it. I enjoyed learning about jQuery, a kickass javascript library, I fought a lot of browser CSS quirks, learned a lot about how one finds the currently selected text in a document (believe me, that's not half as easy as it may sound) and exercised my Fireworks Ninja. I did enough writing for one day, so I won't write about all these, but if you have questions ping me at @rantav or just leave a comment here.

Where's your code? Can I see it?
Sure, it's all open sourced, hosted on Google Code here
And the web app that supports it is here.

April 29, 2009

WC - word count and char count


Update 2: There was a silly typo in the code which created a bug in gmail. I fixed that now, just remove the old WC and re-add the one just below.

Update: I have improved the bookmarklet to work with all types of frames and iframes (and not only gmail ones). Bookmarklet and code are updated.

Have you ever needed to count words in a document that you're authoring, in an email or a web page? Or even better, count characters and make sure you conform to Twitter's de facto 140-character standard?
You have to try this bookmarklet then: WC

Again I'm blogging in English on this blog, which is usually kept in Hebrew, but since the last post on the subject was in Hebrew I think that's fine.

A few days ago I was asked to give a twintterview, a short interview in email for which every question has to be answered with only 140 characters. Sounds like fun, right? But now, what if you had to actually count the characters for each and every answer? Not so much fun now... and twitter can't help you b/c you're in gmail...
So I did the interview and then took the time to write a simple bookmarklet that does that - counts words and chars of the selected text on a web page.

Many other applications already have this simple functionality built-in (e.g. Word or Unix) but unfortunately I couldn't find the same functionality as a bookmarklet or something else available to my gmail web client. Why does it have to be a bookmarklet? Well, it doesn't have to be one but I prefer it to be a bookmarklet b/c I want it to be available for my gmail and I want it to be browser agnostic. So, no Firefox add-ons, no Explorer toolbar, just a simple and straightforward bookmarklet.

First I thought, hey, this is such simple and useful functionality, there must be dozens of bookmarklets or add-ons out there doing just that. I googled it and found a few, but none of them were good enough for me, so I took the liberty to write my own (and use them as reference). For example the first one I found only counted words, while what I really need is chars. None of the bookmarklets, Greasemonkey scripts, extensions etc. actually worked with gmail, which is really where I needed it.

The code is right here (a bit compacted) for your enjoyment

javascript:(function(){
  // Function: finds selected text on document d.
  // @return the selected text or null
  function f(d){
    var t = null;
    if (d.getSelection) t = d.getSelection();
    else if (d.selection) t = d.selection.createRange();
    // Old IE returns a TextRange; its text is in the .text property
    if (t && t.text != undefined) t = t.text;
    if (!t || t == '') {
      // Fall back to a selection made inside a textarea
      var a = d.getElementsByTagName('textarea');
      for (var i = 0; i < a.length; ++i) {
        if (a[i].selectionStart != undefined && a[i].selectionStart != a[i].selectionEnd) {
          t = a[i].value.substring(a[i].selectionStart, a[i].selectionEnd);
          break;
        }
      }
    }
    return t;
  }
  // Function: finds selected text in document d and frames and subframes of d
  // @return the selected text or null
  function g(d){
    var t = null;
    // A frame's document may be unavailable (e.g. a frame from another site)
    if (!d) return t;
    try { t = f(d); } catch (e) {}
    if (!t || t == '') {
      var fs = d.getElementsByTagName('frame');
      for (var i = 0; i < fs.length; ++i) {
        // Accessing a foreign frame may throw, so skip frames we cannot read
        try { t = g(fs[i].contentDocument); } catch (e) {}
        if (t && t.toString() != '') break;
      }
      if (!t || t.toString() == '') {
        fs = d.getElementsByTagName('iframe');
        for (var i = 0; i < fs.length; ++i) {
          try { t = g(fs[i].contentDocument); } catch (e) {}
          if (t && t.toString() != '') break;
        }
      }
    }
    return t;
  }
  var t = g(document);
  if (!t || t == '') alert('please select some text');
  else {
    var s = t.toString();
    var w = s.match(/(\S+)/g); // null when the selection contains only whitespace
    alert('Chars: ' + s.length + '\nWords: ' + (w ? w.length : 0));
  }
})()
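
The counting step at the end of the bookmarklet can be pulled out into a small function, which makes it easy to try on its own without a browser. This is a sketch rather than the bookmarklet's actual code; the name countText is made up here. It uses the same \S+ pattern for words and returns 0 words for whitespace-only text, where match returns null.

```javascript
// Count characters and whitespace-separated words in a piece of text.
function countText(text) {
  var s = String(text);
  var words = s.match(/\S+/g); // null when there is no non-whitespace character
  return { chars: s.length, words: words ? words.length : 0 };
}

var sample = countText('so long and thanks');
// sample.chars is 18, sample.words is 4
```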


Disclaimer:
- Tested on FF and Safari. Didn't test on IE, so could be buggy

WC: counting words and characters


I was asked to do a Twitter interview, which means that every answer to an interview question must be no longer than 140 characters.
I've already completed the interview, but I finished it with a sense of dissatisfaction. Why? Because I wasn't sure that all my answers really passed the 140 test. I counted some of them, but that's exhausting, and clearly an automatic tool is needed here.
I looked for something that counts and found something, sort of; that is, not exactly what I was looking for. So I copied a little and improved a lot and wrote the following bookmarklet, and now I'm much calmer. You can count words and characters by selecting text and clicking the bookmarklet

WC

Interestingly, the bigger challenge was getting it to work in gmail, with the crazy system of iframes going on there.
I tested in Safari and Firefox. I don't have Explorer, so all that's left is to keep my fingers crossed, but if there's a bug please report it.

The full code, slightly compacted, follows:


javascript:(function(){
  var t,d=document;
  // gmail renders inside nested iframes, so look for the selection there
  if(location.href.indexOf('mail.google.com')>0){
    d=d.getElementById('canvas_frame').contentDocument;
    if(d.getElementsByTagName('iframe').length)d=d.getElementsByTagName('iframe')[0].contentDocument;
  }
  if(d.getSelection)t=d.getSelection();
  else if(d.selection)t=d.selection.createRange();
  // old IE returns a TextRange whose text is in .text
  if(t&&t.text!=undefined)t=t.text;
  if(!t||t==''){
    // fall back to a selection inside a textarea
    var a=document.getElementsByTagName('textarea');
    for(var i=0;i<a.length;i++){
      if(a[i].selectionStart!=undefined&&a[i].selectionStart!=a[i].selectionEnd){
        t=a[i].value.substring(a[i].selectionStart,a[i].selectionEnd);
        break;
      }
    }
  }
  if(!t||t=='')alert('please select some text');
  else{
    alert('Chars: '+t.toString().length+'\nWords: '+t.toString().match(/(\S+)/g).length);
  }
})()


April 10, 2009

The Reverse with Platform job board


For several months now Ori and I have been recording a podcast for software people, Reverse with Platform (רברס עם פלטפורמה). We enjoy the work itself and the enriching encounters with our guests, the listeners and the comments.
Recently, as the job market deteriorated, we decided to do what we can to help unemployed software people, and set up the Reverse with Platform job board.
This board is a completely free service with no strings attached, and it has one goal: to match job seekers with employers. What's special about this board is that Ori and I think there are many questions that interest job seekers in the software industry and go unanswered on today's job boards, which makes the job search harder. That is why every employer who posts a position is asked to answer the questionnaire. Beyond that, since our audience is very focused, we believe we can make good matches. So without further ado, anyone who is looking for a job or has a position to offer is welcome to visit http://jobs.reversim.com As of today there are already three open positions. Many thanks to Adam Matan, who took this project upon himself and is helping us build the site.

If you found a job through the board, please leave us a message or an email; we'll surely want to publish success stories and also remove the position from the board.

Anyone looking for good software engineers: go to the site, fill in the questionnaire (in Hebrew or in English) and send it to Adam (who volunteered to help with this important task). From there we take care of publishing it and promoting it on blogs, on Twitter and of course on the podcast.

Good luck to everyone and... off to work.

March 19, 2009

TwitGraph-en


By popular demand I'm reposting this one in English. I usually keep this blog Hebrew only, as I wanted it to reach Israeli developers, but as I said, I was asked to post this one in English as well, and who could resist such a request? ;)

For a while I've been trying to solve the following problem: how to effectively get feedback from users? Specifically online users. How does one measure success of a product launch or a campaign?
It's quite obvious I'm not the first one to think of this problem, and I'm sure there's already an established industry out there working on just that, but still... I had one specific problem that I just couldn't get solved any other way. So I built that solution with my own hands. It was fun doing so, so I'm sharing the experience here.

Problem definition: I work on a product that gets massive web attention. Once we release a new feature of the product I'd like to see how the community reacts. To do so I've set up several blog searches and fed them to Google Reader. I've also subscribed to a twitter search and fed that rss feed to GReader as well, but that wasn't enough. Some days I get dozens or hundreds of tweets and it gets too hard to measure both buzz and users' happiness. What I wanted to know was: 1) how many tweets are tweeted about my product and 2) was their attitude positive or negative?
So I've put the rest of the blogosphere and the rest of the world aside and simply concentrated on twitter.

I created this: a web application that graphs how many tweets a day there are on your subject of interest, plus how many of them were positive, negative or neutral in attitude.



And now... to the gory tech details. You can stop reading now if you don't care about fun software.

I used Google's App Engine as my server, so a lot of the code is written in Python. Naturally I also used a bit of javascript and family for the client side implementation. Arguably, I could have used more CSS to make it prettier, but I didn't (any volunteers that want to help with that?)

There were several interesting challenges, so I'll speak about them here and explain how I solved them.

1. Getting the data
What I want to do is query a date range, say the past 7 days, and for each day graph how many tweets there are for that specific query term. So, challenge #1 is how to get that data.
Twitter has a pretty cool and slick search API which lets you search API-ish for all kinds of stuff. Here are just a few examples of it:
http://search.twitter.com/search.atom?q=twitter
http://search.twitter.com/search.atom?q=from%3Aalexiskold (from a user)
http://search.twitter.com/search.atom?q=twitter until:2009-03-01 (Until a date. You may also use since: with a date)

So, this API is quite nice. It not only allows you to query for an atom (xml) result but also a json and even jsonp, which is json with a callback and is useful for crossdomain requests. I'll talk a little bit about that later.
But the API has some very painful restrictions, the most important of which is the limit on the number of results: at most 100 for each call. Now, coming from a search engine company I can clearly see why they do that; they can't possibly leave the number of results unlimited. But as a developer who uses that API... that was a challenge.
It's important to me to get all the results and to get the raw results, i.e. not aggregates, b/c I'd like to run some text categorization algorithm on them later. Not that there's a way of getting aggregate results on twitter, but even if there were I wouldn't have used it.
So here's what I did: I called the API to get the first 100 results. Then I called it to get the next 100 results, and then the next 100 results... etc. That's mean, I know... API servers don't necessarily like that and they might block me sooner or later, but what other choice did I have?
This method actually works quite well for micro-trends, i.e. less popular terms. Don't try to feed twitgraph with popular search terms such as "google" or "youtube" b/c it'll simply drown the server. That's one of the most annoying shortcomings of my service. But for micro-trends it actually works pretty well, so that was really nice.
The first version was implemented almost entirely on the client side, which means that all logic was implemented in javascript and the google appengine server was hardly involved. But the second version changed that and nowadays all logic is actually implemented on the server, including recursively fetching the results.

Problem 1 solved (but only for micro-trends) by fetching results recursively.
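
The page-by-page fetching described above can be sketched as follows. This is not twitgraph's server code (which, as said, runs in Python on the server); fetchAll, fetchPage, maxPages and PAGE_SIZE are names assumed for the illustration, and fetchPage stands for whatever function performs one search API call and hands back the tweets of one page.

```javascript
// Assumed page size: the API returns at most 100 results per call.
var PAGE_SIZE = 100;

// Hypothetical: fetchPage(query, page, cb) calls cb with an array of at most
// PAGE_SIZE tweets for the given 1-based page number.
function fetchAll(fetchPage, query, maxPages, done) {
  var all = [];
  function next(page) {
    fetchPage(query, page, function (tweets) {
      all = all.concat(tweets);
      // A short page means there is nothing more to fetch; maxPages caps the
      // number of requests so a popular term cannot flood the API.
      if (tweets.length < PAGE_SIZE || page >= maxPages) done(all);
      else next(page + 1);
    });
  }
  next(1);
}
```

The maxPages cap is one possible way to keep popular terms from drowning the server; how page numbers map onto the real API's parameters is not shown in this post, so that part is left to fetchPage.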

2. How to analyze the data for positive and negative attitude?
The twitter search API has a neat feature. If you append a ":)" or a ":(" to a search term you get back happy/sad results. That's the first API that I've seen which actually uses emoticons, cool. Only problem is that it sucks :( and :( again. The results are absolute rubbish, of very poor quality, so I could not use them at all. Indeed, they would sometimes get the happy/sad sentiment right, but in most cases they would just say "don't know" and in some cases they would return the wrong answer. Bottom line: nice try, but can't use it.
So I had to do it myself. I didn't know what to do, so I posted a question on stackoverflow.
I got plenty of answers there, so now I knew what to do :)
Here's what I did: I fetched all the results from the server and then used a Naive Bayesian Classifier to tag them as :-), :-( and :-|.
Basically a naive Bayesian classifier works like this:
First you train it by feeding it examples of :) tweets, of :( tweets and of :-| tweets which you prepared beforehand, and then you ask it to guess the sentiment of the next tweet. That works surprisingly well!
I used a Bayesian classifier from here, which was pretty simple to use. To bootstrap the system I fed it a list of known good words and a list of known bad words that I found somewhere. That is, BTW, not ideal for a Bayesian classifier, but it worked reasonably well. Then I added a dynamic learning feature: as you, the user, get the search results back, you can teach twitgraph the correct sentiment of each and every tweet. Next time we use this data as a signal, and it turns out to be a very good one. I've now tagged several dozen tweets and classification is already getting really, really good.

So - problem 2 solved - fetch all results and use an open-source bayesian classifier. Happy happy :)
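
To make the train-then-guess idea concrete, here is a minimal naive Bayes classifier. It is a sketch written for this explanation, not the open-source library twitgraph uses; the names NaiveBayes, tokenize, train and classify, the add-one smoothing and the tiny training set are all assumptions.

```javascript
// Minimal naive Bayes text classifier (illustration only).
function NaiveBayes() {
  this.docCount = Object.create(null);   // label -> number of training examples
  this.wordCount = Object.create(null);  // label -> { word -> occurrences }
  this.totalWords = Object.create(null); // label -> total word occurrences
  this.vocab = Object.create(null);      // every word seen during training
  this.totalDocs = 0;
}

// Lowercase the text and split it into words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

NaiveBayes.prototype.train = function (text, label) {
  if (!(label in this.docCount)) {
    this.docCount[label] = 0;
    this.wordCount[label] = Object.create(null);
    this.totalWords[label] = 0;
  }
  this.docCount[label]++;
  this.totalDocs++;
  var words = tokenize(text);
  for (var i = 0; i < words.length; ++i) {
    var w = words[i];
    this.wordCount[label][w] = (this.wordCount[label][w] || 0) + 1;
    this.totalWords[label]++;
    this.vocab[w] = true;
  }
};

NaiveBayes.prototype.classify = function (text) {
  var words = tokenize(text);
  var vocabSize = Object.keys(this.vocab).length;
  var best = null, bestScore = -Infinity;
  for (var label in this.docCount) {
    // log P(label) + sum of log P(word | label), with add-one smoothing
    var score = Math.log(this.docCount[label] / this.totalDocs);
    for (var i = 0; i < words.length; ++i) {
      var c = this.wordCount[label][words[i]] || 0;
      score += Math.log((c + 1) / (this.totalWords[label] + vocabSize));
    }
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
};

// Train on a few hand-tagged examples, then guess the sentiment of a new tweet.
var nb = new NaiveBayes();
nb.train('I love this great product', ':-)');
nb.train('awesome release works great', ':-)');
nb.train('I hate this terrible bug', ':-(');
nb.train('awful broken release', ':-(');
nb.train('new release is out today', ':-|');
var guess = nb.classify('great product, love it'); // ':-)' with this training set
```

The dynamic learning feature described above corresponds to calling train again whenever a user corrects the sentiment of a tweet.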

3. How to graph the data?
That was actually the easiest part of them all!
I used the Google Visualization API javascript library which is pretty easy to use. Really, with only a few lines of code I created those nice graphs. To prove that I'll paste the two functions that draw the graphs here.


twitgraph.Grapher.prototype.drawLineChart = function() {
var aggregate = this.result.aggregate;
// Create and populate the data table.
var data = new google.visualization.DataTable();
data.addColumn('string', 'Date');
data.addColumn('number', ':-(');
data.addColumn('number', ':-)');
data.addColumn('number', ':-|');
data.addRows(aggregate.length);
for (var i = 0; i < aggregate.length; ++i) {
data.setCell(i, 0, aggregate[i].date);
data.setCell(i, 1, aggregate[i].neg);
data.setCell(i, 2, aggregate[i].pos);
data.setCell(i, 3, aggregate[i].neu);
}

// Create and draw the visualization.
twitgraph.Utils.$('twg-graph').innerHTML = '';
var chart = new google.visualization.AreaChart(twitgraph.Utils.$('twg-graph'));
chart.draw(data, {legend: 'bottom',
isStacked: true,
width: 600,
height: 300,
colors: ["#FF4848", "#4AE371", "#2F74D0"]});
}

twitgraph.Grapher.prototype.drawPieChart = function() {
var stats = this.result.stats;
// Create and populate the data table.
var data = new google.visualization.DataTable();
data.addColumn('string', 'Sentiment');
data.addColumn('number', 'Tweet count');
data.addRows(3);
data.setValue(0, 0, ':-(');
data.setValue(0, 1, stats.neg);
data.setValue(1, 0, ':-)');
data.setValue(1, 1, stats.pos);
data.setValue(2, 0, ':-|');
data.setValue(2, 1, stats.neu);

// Create and draw the visualization.
twitgraph.Utils.$('twg-graph-pie').innerHTML = '';
var chart = new google.visualization.PieChart(twitgraph.Utils.$('twg-graph-pie'));
chart.draw(data, {legend: 'none',
is3D: true,
width: 300,
height: 300,
colors: ["#FF4848", "#4AE371", "#2F74D0"]});
}

Problem 3 solved.
Well, almost... I also wanted to have the option of static images, e.g. gif. A common use case for static images is to be able to include the graph in an email, which doesn't allow running any js. I solved that too, but this time using the Google Charts Service. So now static graph images (with dynamic data that gets updated every day) are also available.

Now 3 is really solved :)
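
A static chart of this kind is just an image URL whose query string carries the data. The sketch below shows roughly what building such a URL could look like; it is not twitgraph's code. The function name pieChartUrl, the chart size and the labels are made up, and it assumes the chart service's cht (chart type), chs (size), chd (data) and chl (labels) query parameters with text-encoded data given as percentages.

```javascript
// Build an image-chart URL for a 3D pie of sentiment counts.
// stats is assumed to look like { neg: 1, pos: 2, neu: 1 }.
function pieChartUrl(stats) {
  var total = stats.neg + stats.pos + stats.neu;
  // Text-encoded data is expressed here as percentages of the total.
  function pct(n) { return total ? Math.round(100 * n / total) : 0; }
  var params = [
    'cht=p3',       // 3D pie chart
    'chs=300x150',  // width x height in pixels
    'chd=t:' + [pct(stats.neg), pct(stats.pos), pct(stats.neu)].join(','),
    'chl=' + ['negative', 'positive', 'neutral'].join('|')
  ];
  return 'http://chart.apis.google.com/chart?' + params.join('&');
}

var chartUrl = pieChartUrl({ neg: 1, pos: 2, neu: 1 });
```

The resulting URL can be used as the src of an img tag, which is what makes it usable in an email.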

4. How to embed in a 3rd party site?
I think twitgraph is useful if it can be embedded in 3rd party sites. But to do that you'd need to run an XmlHttp call across domains, which most browsers just won't let you do. The solution to that problem is already well known and is called jsonp, json with padding. It's a technique that is widely used across other web services, so I won't get into details and will just lay out the short concepts and code here.
The idea is that you can't make an XmlHttp request across domains, but you can include JS across domains. And you can also load javascript dynamically into any web page by adding a <script> element to its head at any given time, even if the page is already loaded. The javascript that you added to the page will call a callback function that you name, once it's loaded, and there - you have your data.
More about json and jsonp.
The code from my app is here:

function jsonp(url, callbackName) {
url += '&callback=' + callbackName;
addScript(url);
}
function addScript(url) {
var script = document.createElement("script");
script.setAttribute("src", url);
script.setAttribute("type", "text/javascript");
document.body.appendChild(script);
}
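
One detail worth showing is how the callback and the URL fit together. The variant below only builds the request URL, choosing '?' or '&' depending on whether the URL already has a query string, and defines the callback that the returned script will call. It is a sketch: jsonpUrl, onTweets and lastResults are names made up here, and the search.json URL is assumed from the json option mentioned earlier.

```javascript
// Hypothetical variant of jsonp(): build the request URL only.
function jsonpUrl(url, callbackName) {
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + 'callback=' + encodeURIComponent(callbackName);
}

// The callback that the injected script is expected to call with the data.
var lastResults = null;
function onTweets(data) {
  lastResults = data;
}

var requestUrl = jsonpUrl('http://search.twitter.com/search.json?q=twitter', 'onTweets');
// The server is expected to answer with a script of the form: onTweets({...})
```

Passing requestUrl to addScript then makes the browser fetch that script and run it, which in turn calls onTweets with the results.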

5. Design
Well, what can I say... I didn't solve that one yet. I'm bad at design so the application is still ugly. I'd be very happy to get professional help with this...

Hope the post was useful to all ya developers out there.

Would you like to contribute? The source code is here and you're welcome to drop me a line in the post comments.

March 18, 2009

Wanted: volunteer 2.0


The popularity of the podcast I run with Ori, Reverse with Platform, is rising nicely, and Ori had an idea that maybe we could use it for the common good.
The idea is to help people in the software industry who are looking for work in this difficult period find a job. But to make it happen we need help, so if anyone has a few spare minutes a day and a bit of volunteer spirit, they are welcome to contact us by email or leave a comment on this post and we can talk about the details.
ran [at] reversim [dot] com or ori [at] reversim [dot] com