HiredNetwork

The Art of Being a Webmaster

How to build traffic, sustainable websites and most of all profit online.


Mandatory Review Of Google Chrome - It's not a browser.

September 4th, 2008

Well, every other blogger has one, so I decided I'd throw my hat into the ring.

So here it is.

Google's Chrome - it's not a browser.

Not by a long shot.

Yes, yes, the marketing blurb says it is, what with its blinding speed, multiple processes running for its tabs and its future expandability.

But they would say that, wouldn't they?

Well, if it's not a browser, what is Google Chrome? It's a desktop application that's being designed to allow Google to encroach on your desktop and provide a single point of access to its (future) services.

Oh, and it processes web pages as well, so yeah, maybe some bloggers got it right: it's partly a browser.

What am I ranting about? Returning to Google's version of the future, the future for the plebs is in cloud computing. We'll all own desktops with very little memory, and all our pictures, documents, spreadsheets and files will be stored somewhere in a data center: cloud computing. Google will, of course, own these data centers. So wouldn't you need an application that is able to spawn multiple processes to make each tab faster? In other words, to multitask over the internet? After all, if you are working on a spreadsheet in one tab, listening to music in another and video chatting with your friends in yet another, do you really want one slow-assed process handling all this? Nah. Better to have multiple processes running.

We are one step closer to Google's cloud computing. And to the absence of any sort of privacy.

Tags: Google

Mounting a USB external hard drive on Linux

August 27th, 2008

I'm using CentOS 5, but the commands will be similar for every flavour.

First, unpack the shiny new hard disk from its Walmart / Tesco / cheapest-discount-store-in-your-neighbourhood wrapping. Remember to recycle. I'll know if you don't. Plug it into your computer (really, am I going too fast here?!).

Odds are that if you bought this from a general retailer it's formatted for Windows use, so all those Microsofters can plug and play to their hearts' content. We Linux folk don't like having it so. So we need to give it a Linux partition and file system, and then mount it.

Formatting the External Drive
First, run the following command to list the drives attached to your system:
fdisk -l

For God's sake, please remember to run this command as root. Otherwise you'll get some weird error messages.
You should see output like the following:
Disk /dev/hdb: 64 heads, 63 sectors, 621 cylinders
Units = cylinders of 4032 * 512 bytes

Device Boot Start End Blocks Id System
/dev/hdb1 * 1 184 370912+ 83 Linux
/dev/hdb2 185 368 370944 83 Linux
/dev/hdb3 369 552 370944 83 Linux
/dev/hdb4 553 621 139104 82 Linux swap

Disk /dev/sda 500GB 50000001001010 bytes
255 heads …

Device Boot Start End Blocks ID System
/dev/sda 1 60801 488384001 86 Linux

It can be hard to recognise your external hard drive if you're not used to looking at crappy output from Linux terminal windows! Normally Linux will list your external drive as /dev/sda or /dev/sdb etc., but it can vary. Look for the size of your drive (be specific) and stay away from anything labelled boot, cdrom, dvd etc.!
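If the fdisk output is too cryptic, the kernel's own list of block devices in /proc/partitions gives just names and sizes. Below is a minimal Java sketch that prints each device with its size in GB. It assumes the usual /proc/partitions layout (sizes in 1 KiB blocks), and the class name ProcPartitions is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Lists block devices from /proc/partitions with their sizes, to help spot the
// external drive by size. Assumes the usual layout: a header line
// ("major minor  #blocks  name"), a blank line, then one row per disk or
// partition, with sizes given in 1 KiB blocks.
class ProcPartitions {

    static final class Entry {
        final String name;     // e.g. "sda" (whole disk) or "sda1" (partition)
        final long kibBlocks;  // size in 1 KiB blocks

        Entry(String name, long kibBlocks) {
            this.name = name;
            this.kibBlocks = kibBlocks;
        }

        // Decimal gigabytes, the unit drive makers print on the box.
        double sizeGB() {
            return kibBlocks * 1024.0 / 1e9;
        }
    }

    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Data rows have four columns and start with a numeric major number;
            // this skips the header and blank lines.
            if (cols.length == 4 && cols[0].matches("\\d+")) {
                entries.add(new Entry(cols[3], Long.parseLong(cols[2])));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        for (Entry e : parse(Files.readAllLines(Paths.get("/proc/partitions")))) {
            System.out.printf("/dev/%-8s %8.1f GB%n", e.name, e.sizeGB());
        }
    }
}
```

A 500GB external drive should stand out as a whole-disk entry of roughly 500 GB, listed next to any partitions it already has.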

Once you've found your hard drive, we'll set it up. I'll presume from here on in that your hard drive is /dev/sda. If it's not, just substitute wherever it is; for example, you may be substituting /dev/sdb.

Partition your hard drive
Be warned: this wipes whatever is on the drive. Run:
fdisk /dev/sda
A prompt appears; type 'm' for the list of options. If the drive came with a Windows partition on it, press 'd' to delete it first. Then press 'n' to create a new partition, 'p' to make it a primary partition and '1' for the partition number. If you wish to use the hard drive as one big file system, then just accept the defaults. If you wish to split it up into smaller sections, then feel free to do so. Finally, press 'w' to write the new partition table to the disk and exit (nothing is changed until you do). If you run 'fdisk -l' again, your new partition will show up as /dev/sda1.

Formatting an External Hard Drive on Linux
There are a few file systems available for your hard drive. I'm not going to get into specifics on them. The one I use is ext3. If that really excites you, feel free to look it up elsewhere. To format the new partition as ext3, type the following command:
mkfs.ext3 /dev/sda1

And that's it! Your external drive will now be formatted. It may take a few minutes, depending on your system.

Mount your external drive
The final step. If you wish to mount it straight away, first CREATE the folder you want to access it from (don't forget this!!!!), then mount the partition on it:
mkdir -p /home/user/somewhere/you/want/to/access/it
mount /dev/sda1 /home/user/somewhere/you/want/to/access/it

Replace the path with wherever you want the access to be. It may simply be 'drive1' on your desktop.

With this method, we have to mount the hard drive every time the system starts. More useful would be to have the system mount it for us on start-up, and then you wouldn't have to keep reading riveting blogs like this to figure stuff out. To mount it automatically we must edit our /etc/fstab file. To do so:

vi /etc/fstab

Press 'G' to jump to the last line of whatever writing you see, then press 'o' to open a new line below it and type this:
/dev/sda1 /path/to/where/you/want/access ext3 defaults 0 0

Hit Escape. Then type :wq to write the file and exit.

You can now restart your system, or for now mount the drive manually using the previous command. At the moment only root can write to the drive! So if you wish others to have access to it, and to be able to write to it, make a cup of tea and think about what permissions you want them to have. With the drive mounted, navigate to the parent folder of your mount point in your terminal window and type:

chgrp group folder

where 'group' is the name of the group you want to give access to, and 'folder' is the name of your mount point folder. Obviously! If you wish to make a particular user the owner instead, then use 'chown user folder' in place of the chgrp command.

Now let's define their permissions:

chmod g+rwx folder

Now our group (g) has read (r), write (w) and execute (x) permissions on the external drive; for a folder, execute means being able to enter it. Again, if you used chown in the previous command, then for this one the command would be "chmod u+rwx folder" (u is the owning user; o would mean everybody else).
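If you'd rather make the same group read/write/execute change from a Java program, the standard java.nio.file API covers both the chgrp and the chmod step. A minimal sketch, assuming a POSIX file system such as ext3 and that it runs as root or as the folder's owner; the class GroupAccess and its argument order are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFilePermission;
import java.util.HashSet;
import java.util.Set;

// Java versions of "chgrp group folder" and "chmod g+rwx folder", using the
// standard java.nio.file API. POSIX file systems only (ext3 qualifies); run it
// as root or as the folder's owner.
class GroupAccess {

    // Equivalent of: chgrp groupName folder
    static void changeGroup(Path folder, String groupName) throws IOException {
        GroupPrincipal group = folder.getFileSystem().getUserPrincipalLookupService()
                .lookupPrincipalByGroupName(groupName);
        Files.getFileAttributeView(folder, PosixFileAttributeView.class).setGroup(group);
    }

    // Equivalent of: chmod g+rwx folder (existing permissions are kept)
    static void grantGroupRwx(Path folder) throws IOException {
        Set<PosixFilePermission> perms = new HashSet<>(Files.getPosixFilePermissions(folder));
        perms.add(PosixFilePermission.GROUP_READ);
        perms.add(PosixFilePermission.GROUP_WRITE);
        perms.add(PosixFilePermission.GROUP_EXECUTE);
        Files.setPosixFilePermissions(folder, perms);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the mount point folder, args[1]: the group to give access to
        Path mountPoint = Paths.get(args[0]);
        changeGroup(mountPoint, args[1]);
        grantGroupRwx(mountPoint);
    }
}
```

As with the shell commands, run it while the drive is mounted, so the change lands on the drive's own top-level folder rather than on the empty mount point underneath.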

Hope this helps!

Tags: Linux

I was away

August 25th, 2008

Well, I'm after a week's break! Had a great time, most of it spent playing computer games and zoning out.

Back to the grind now. I'll be picking up on the Distributed File System shortly. Part 3 is well overdue. I've just answered a whole bunch of emails and comments, so sorry, everyone, for the late replies!!

Tags: Ramblings

Cannot be resolved to a type error / Only a type can be imported / xxx resolves to a package

August 15th, 2008

This type of error normally occurs when trying to use a JavaBean in a JSP page or when importing a class.

First, let's assume it's a form you are processing, but the principles will be the same for whatever you're at. I'll try to keep this as easy and as step-by-step as possible.

Compile the Java file that you are calling with the javac command. Let's say this file is called 'Processing.java' and it's in a directory 'myDir'. The resulting class file must be placed in 'WEB-INF/classes/myDir/' inside your web application, because Tomcat only loads your classes from WEB-INF/classes (or from JAR files in WEB-INF/lib). You will of course need a package declaration at the top of Processing.java; classes in the default package (with no package declaration) can't be imported into a JSP at all, which is a common cause of this error:

package myDir;

You also need a get and a set method for each field in your form. If you are unsure how to do this, just type 'jsp form tutorial' into Google; it's well documented.
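A minimal sketch of what Processing.java might look like, assuming the form has a single text field named userName (a hypothetical field; add one private field plus a getter/setter pair for each field in your own form):

```java
package myDir;

// Form-handling bean for index.jsp. Assumes one hypothetical form field,
// "userName": <jsp:setProperty property="*"/> calls setUserName(...) for a
// request parameter named exactly "userName".
public class Processing {

    private String userName = "";

    // jsp:useBean creates the bean through this public no-argument constructor.
    public Processing() {
    }

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }
}
```

Compiling with `javac -d /path/to/yourapp/WEB-INF/classes Processing.java` writes the class to WEB-INF/classes/myDir/Processing.class, because javac's -d option creates the package directories for you.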

In your JSP file (for argument's sake we'll call this index.jsp) you now need to import this class:

<%@ page import="java.util.*,myDir.Processing" %>

I've included java.util.* just to show you how to import several packages or classes at once (use a comma!).

Next we need to use the JSP bean tags to make use of all this.

Somewhere in index.jsp we'll start using all of the above. Let's declare our bean:

<jsp:useBean id="formHandler" class="myDir.Processing" scope="request" >
<jsp:setProperty name="formHandler" property="*" />
</jsp:useBean>

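If we had a field named 'userName' in our form, we could now obtain its value using the get method from our Processing.java file. The field name must match the bean property exactly, including case, because property="*" only fills in properties whose names match the request parameters: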

<jsp:getProperty name="formHandler" property="userName" />
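To see why the names have to match exactly, here is a rough Java model of what `<jsp:setProperty property="*" />` does: for each bean property, look for a request parameter with the same name and call the setter. This is a simplified illustration built on java.beans.Introspector, not Tomcat's actual code; it only handles String properties, and FormBean is a hypothetical stand-in for Processing.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.Map;

// Rough model of <jsp:setProperty name="formHandler" property="*" />:
// for each bean property, look for a request parameter with exactly the same
// name and, if there is one, call the property's setter with it.
// Simplified: String properties only, no type conversion; empty values leave
// the property unchanged, as the JSP spec requires.
class WildcardSetter {

    static void setAll(Object bean, Map<String, String> params)
            throws IntrospectionException, IllegalAccessException, InvocationTargetException {
        BeanInfo info = Introspector.getBeanInfo(bean.getClass());
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method setter = pd.getWriteMethod();
            String value = params.get(pd.getName()); // case-sensitive lookup
            if (setter != null && pd.getPropertyType() == String.class
                    && value != null && !value.isEmpty()) {
                setter.invoke(bean, value);
            }
        }
    }

    // Hypothetical bean standing in for Processing: one property, "userName".
    public static class FormBean {
        private String userName;

        public String getUserName() { return userName; }

        public void setUserName(String userName) { this.userName = userName; }
    }

    public static void main(String[] args) throws Exception {
        FormBean bean = new FormBean();
        setAll(bean, Collections.singletonMap("username", "david")); // wrong case: ignored
        System.out.println(bean.getUserName());                        // null
        setAll(bean, Collections.singletonMap("userName", "david")); // exact match: set
        System.out.println(bean.getUserName());                        // david
    }
}
```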

If you've followed all of the above and the error is still occurring, there are two things you can do (besides praying). First, in your web.xml file (located in tomcat/conf), look for the following lines:
<param-name>fork</param-name>
<param-value>false</param-value>

Change 'false' to 'true' and restart Tomcat. It should work at this stage.

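Try the above again. If it still refuses to work, then let's change our 'class' attribute to 'type'. In theory this shouldn't make any difference to the error you're getting, but it's our last hope. Be aware that with only 'type' set, jsp:useBean won't create the bean for you: it expects a bean of that type to already exist in the given scope, and throws an InstantiationException otherwise.

So in our index.jsp file above, our bean tag would actually read: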
<jsp:useBean id="formHandler" type="myDir.Processing" scope="request" >

If none of the above works, then check your code for errors...

Tags: JSP

JSP files not being parsed: mod_jk and Apache httpd

August 14th, 2008

This was one of those problems where I dreamed of writing my Great Novel. I distinctly remember wondering why I hadn't spent all that time lovingly writing each page instead of frustrating myself with reams of code.

This problem arose with Apache, PHP, MySQL and Apache Tomcat, all fresh vanilla installs. Ordinary pages were served fine by Apache, but JSP pages were simply sent out as source code rather than being parsed.

It took me a while (most of the night - it's now 4am) to track this down, and I hope I never have to touch it again!

First off, let's get very basic.

Somewhere in your Apache conf file (httpd.conf), add the line:
Include /usr/tomcat/conf/mod_jk.conf

Substitute whatever your path to Tomcat is. In the directory /usr/tomcat/conf, create the file 'mod_jk.conf' and copy the below into it. It's a very simple config.

#for mod_jk
# Load the module. Use whichever path matches your Apache layout:
LoadModule jk_module modules/mod_jk.so
# ...or, on some installs:
#LoadModule jk_module libexec/mod_jk.so

JkWorkersFile /usr/local/apache2/conf/workers.properties
# Where to put jk shared memory
JkShmFile /var/log/httpd/mod_jk.shm
# Where to put jk logs
JkLogFile /var/log/httpd/mod_jk.log
# Set the jk log level [debug/error/info]
JkLogLevel debug
# Select the timestamp log format
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "

#end modJK

As you can see from the above, we now need to create a workers.properties file. Again, substitute the path to wherever you wish to place the file. It's very important that your Apache user and Tomcat user can access these files! In other words, check your file permissions!

The workers.properties file is as follows:
workers.tomcat_home=/usr/tomcat
workers.java_home=/usr/java/jdk1.6.0_03
ps=/

# Define 1 real worker using ajp13
worker.list=worker1
# Set properties for worker1 (ajp13)
worker.worker1.type=ajp13
worker.worker1.host=192.168.1.5
worker.worker1.port=8009

You will need to substitute the correct path for java_home and tomcat home.
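The naming scheme in workers.properties is easy to get wrong: worker.list names the workers, and each worker's settings live under worker.<name>.<setting>. As a quick sanity check, here is a small Java sketch that reads the file with java.util.Properties and prints one summary line per listed worker. It is only a reading aid (mod_jk has its own parser), and the defaults it fills in (type ajp13, host localhost, port 8009) are my understanding of mod_jk's documented defaults for ajp13 workers.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Summarises a workers.properties file: worker.list names the workers, and each
// worker's settings live under worker.<name>.<setting>. Missing settings fall
// back to ajp13 / localhost / 8009 (assumed mod_jk defaults for ajp13 workers).
// Only a reading aid for checking your file; mod_jk has its own parser.
class WorkersFile {

    static List<String> describe(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source); // understands key=value lines and '#' comments
        List<String> summary = new ArrayList<>();
        for (String raw : props.getProperty("worker.list", "").split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            String prefix = "worker." + name + ".";
            summary.add(name + " -> "
                    + props.getProperty(prefix + "type", "ajp13") + " "
                    + props.getProperty(prefix + "host", "localhost") + ":"
                    + props.getProperty(prefix + "port", "8009"));
        }
        return summary;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to your file, e.g. /usr/local/apache2/conf/workers.properties
        try (Reader in = new FileReader(args[0])) {
            describe(in).forEach(System.out::println);
        }
    }
}
```

Run against the file above, it should print `worker1 -> ajp13 192.168.1.5:8009`. If a worker you reference in JkMount doesn't show up, it's missing from worker.list.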

This is where it gets tricky. I'm sure there are other ways of doing the following, but I couldn't get any other to work.
Apache will have its doc root as path/to/apache/htdocs. Tomcat's webroot is something like path/to/tomcat/webapps.
You need to somehow get Apache to also include Tomcat's webroot (i.e. Apache will look in it for files). A very simple method, if rather strange at first, is to move Apache's entire document root to Tomcat's webroot. That would mean in your httpd.conf file you would set:
DocumentRoot "/usr/tomcat/webapps/"

If you don't want to do this, you can set the document root for each VirtualHost in your httpd.conf file instead. This would involve something along the lines of:

<Directory "/usr/local/tomcat/mysite">
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Alias /mysite/ "/usr/local/tomcat/mysite/"

#to block off WEB-INF insert the following also
<Directory "/usr/local/tomcat/mysite/WEB-INF">
    AllowOverride None
    Order allow,deny
    Deny from all
</Directory>

Apache should now be able to find all your static files. If you try to reach a JSP file, though, the code won't be parsed; it'll be printed out to screen exactly as is.

This is where the JkMount directive got tricky for me.
A random VirtualHost from my httpd.conf file is below (these directives sit inside its VirtualHost block):

ServerAdmin me@email.com
ServerName www.mysite.com
DocumentRoot "/usr/tomcat/webapps/mysite/"

JkMount /*.jsp worker1

The JkMount line sends every request for a .jsp file to Tomcat (through worker1, as defined in workers.properties), while Apache carries on handling all your normal PHP and HTML files.
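To make the routing concrete, here is a simplified Java model of how a pattern like /*.jsp picks out requests: '*' matches any run of characters (including '/'), and everything else must match literally. This is an illustration of the idea only; mod_jk's real URI mapping has more rules (exclusions, exact matches and so on).

```java
// Simplified model of how a JkMount pattern like "/*.jsp" selects requests:
// '*' matches any run of characters (including '/'); every other character
// must match literally. mod_jk's real matching has more rules; this only shows
// why /index.jsp goes to Tomcat while /button.php stays with Apache.
class JkMountPattern {

    static boolean matches(String pattern, String uri) {
        return matchFrom(pattern, 0, uri, 0);
    }

    private static boolean matchFrom(String p, int pi, String s, int si) {
        if (pi == p.length()) {
            return si == s.length(); // pattern used up: URI must be used up too
        }
        if (p.charAt(pi) == '*') {
            // Let '*' swallow 0, 1, 2, ... characters and try the rest of the pattern.
            for (int k = si; k <= s.length(); k++) {
                if (matchFrom(p, pi + 1, s, k)) {
                    return true;
                }
            }
            return false;
        }
        return si < s.length() && p.charAt(pi) == s.charAt(si)
                && matchFrom(p, pi + 1, s, si + 1);
    }

    public static void main(String[] args) {
        for (String uri : new String[] {"/index.jsp", "/shop/cart.jsp", "/button.php"}) {
            System.out.println(uri + " -> "
                    + (matches("/*.jsp", uri) ? "worker1 (Tomcat)" : "Apache"));
        }
    }
}
```

This is also why /button.php shows up in the debug log below as a URI that mod_jk tried and failed to map: mod_jk checks every request against its mounts and only passes on the ones that match.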

But we also need to define our site in Tomcat's server.xml. I think this is where I was tripping up, as requests passed on by mod_jk were returning blanks. If you wish to see a more detailed picture of what's going on, make sure JkLogLevel in your mod_jk.conf is set to 'debug' rather than 'info'. Opening the log at its location (the JkLogFile specified in mod_jk.conf!) might show the following type of error:
map_uri_to_worker::jk_uri_worker_map.c (682): Attempting to map URI '/button.php' from 1 maps
...
...
...
jk_map_to_storage::mod_jk.c (3211): no match for YOURFILE found

So in server.xml we add your site's details, as Tomcat is now working on your site too. For those not familiar with Tomcat, the server.xml file will be somewhere like 'path/to/tomcat/conf/server.xml'.

Open this up and insert the following inside the Engine element, modifying the paths of course.

<Host name="mysite.com" appBase="/usr/tomcat/webapps/mysite" xmlValidation="false">
<Alias>www.mysite.com</Alias>
<Context path="" reloadable="true" docBase="/usr/tomcat/webapps/mysite" />
</Host>

Hopefully your jsp files will now be displayed correctly.

Tags: Servers · JAVA

Speed up phpFox?

August 12th, 2008

I'm currently working on a client's site running on phpFox where the speed of the site has become awful. It's taking nearly 30 seconds to load any page on the site.

The strange thing is that there are other sites hosted on the server as well. These all load at normal speed without any problem - no drag whatsoever.

The membership is creeping just over 1,000, with about 300 of those quite active. The script is running on a VPS with about 1GB of RAM and 2GB of virtual memory, which should be capable of handling the requests.

There are two possible causes: either the script is memory/CPU intensive, or the server is at fault.

I've read about a possible patch to the site's JavaScript that may speed it up, so I'll try this, but I can't imagine that loading the JavaScript would take even 20 seconds. For those that are interested, it's a patch written by pawnage and can be found here:
phpFox Forum

Tags: phpFox

Java Distributed File System - part 2

August 11th, 2008

In this second part of the Java distributed file system tutorial we'll focus on setting up a system to monitor 'heartbeats' from each slave.

You can read PART ONE for an overview of the system.

On my main 'master server' - the one that keeps track of everything - there are essentially two Java server sockets running. One handles the heartbeats. The second one handles the tasks. That's all pretty cool, but the obvious question is why use two? Why not just use the task one to find out when a slave is lost? Won't the connection fail anyway?
Well, yes. But if you're processing big tasks on each slave that take a while, you want to know they're at least still processing. You could work a 'heartbeat' check into each of your tasks, but I found it much easier just to have a separate thread running. It won't consume much processing or network bandwidth, so if one of your slaves fails mid-task, at least you know it's failed from the lost heartbeat, rather than waiting days on a large task to complete and finding out the fecking thing crashed halfway through.

So onwards with the code. There's nothing complex about it. On one side we have a basic server socket (the master server); on the slaves we have client sockets connecting. Each connection then enters a loop, sending messages ('heartbeats') to say that all is well and working.

You will need to work these files into your application. For instance, you might have two executable files that start the heartbeat thread and the task thread. Or you may have a Java class that does it. I've tried to keep the file names similar to the ones in the Java Tutorial trails (or whatever they're called!) to aid comparison.

On the master server we have two files: our server and our thread class to deal with multiple clients.


import java.net.*;
import java.io.*;

public class HServer extends Thread
{
    public void run()
    {
        try {
            System.out.println("Starting HServer");

            ServerSocket serverSocket = null;
            boolean listening = true;

            //master server - gets the address of this server
            InetAddress localIP = InetAddress.getLocalHost();
            String master = localIP.getHostAddress();

            try {
                //this opens a port on 8600 - you can use a different port if you wish
                serverSocket = new ServerSocket(8600);
            } catch (IOException e) {
                e.printStackTrace();
                System.exit(-1);
            }
            System.out.println("HServer running...waiting for connections");

            while (listening)
            {
                //accepts incoming connections, one thread per slave
                new HMultiServerThread(serverSocket.accept()).start();
            }//end while

            System.out.println("closing Hserver");
            serverSocket.close();
        }//end try
        catch (Exception F) {
            F.printStackTrace();
        }//end catch
    }
}

Now the thread that processes the incoming connections:


//accepts incoming connections for heartbeats
import java.net.*;
import java.io.*;

public class HMultiServerThread extends Thread
{
    private Socket socket = null;

    public HMultiServerThread(Socket socket)
    {
        super("HMultiServerThread");
        this.socket = socket;
    }

    public void run()
    {
        System.out.println("HMultiServerThread starting.." + socket);

        try
        {
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

            String inputLine = null;
            String outputLine = "0";

            //here you should enter this slave into a database/file of active slaves that can
            // be read by your task server.

            out.println(outputLine);

            //here we enter a simple loop. we send '0' to the client and it returns '1' if active.
            // once a response is received we wait 5 seconds before the next '0' to save network bandwidth
            while ((inputLine = in.readLine()) != null)
            {
                if (inputLine.equals("1"))
                {
                    try {
                        Thread.sleep(5000);
                        out.println(outputLine);
                    }//end try
                    catch (Exception f)
                    {
                        System.out.println(f);
                    }//end catch
                }//end if
            }//end while

            //here you should remove the slave from your database / file that you created earlier.

            out.close();
            in.close();
            socket.close();
        }//end try
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }//end run
}//end class
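One thing to be aware of with the thread above: in.readLine() blocks until the slave answers or closes the connection. A slave that hangs, or whose network cable gets pulled, may never do either, so the loop just waits. One way to put a deadline on each heartbeat is Socket.setSoTimeout. A minimal sketch, reusing the same "0"/"1" messages; the HeartbeatCheck class, pingOnce method and timeout value are illustrative choices, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.Socket;
import java.net.SocketTimeoutException;

// One heartbeat round with a deadline: send "0", then wait at most
// timeoutMillis for the slave's "1". Returns false if the slave stays silent,
// closes the connection or answers with anything else.
class HeartbeatCheck {

    static boolean pingOnce(Socket socket, PrintWriter out, BufferedReader in, int timeoutMillis)
            throws IOException {
        socket.setSoTimeout(timeoutMillis); // readLine() now gives up instead of blocking forever
        out.println("0");
        try {
            String reply = in.readLine(); // null if the slave closed the connection
            return "1".equals(reply);
        } catch (SocketTimeoutException e) {
            return false; // no heartbeat within the deadline: treat the slave as MIA
        }
    }
}
```

Inside HMultiServerThread you would call pingOnce in the loop (with the 5 second sleep between rounds) and drop the slave from your active list the first time it returns false.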

Next, the client side.

Again we have two files. One is just a simple class that starts the client socket connection. In case you're wondering why I'm using these, it's because it's easier to call these from a central class and start our socket connection with one command. I'll show you how to do this in part 3. The files for the clients are:


public class StartSlave
{
    //this simply closes standard output and error. You should include this
    // if you are using a logging system.
    static public void daemonize()
    {
        System.out.close();
        System.err.close();
    }

    public static void main(String[] args)
    {
        HClient hClient = new HClient();
        hClient.start();
        daemonize();
    }//end main
}//end class

The HClient class launches a client socket connection to our server above.


import java.io.*;
import java.net.*;

public class HClient extends Thread {

    public void run()
    {
        Socket cSocket = null;
        PrintWriter out = null;
        BufferedReader in = null;

        try {
            System.out.println("Starting HClient");

            //You need to have an IP address for your server socket here. this might
            // be something like 192.168.1.15 etc. I've used 192.168.1.20 for illustration purposes.
            // the 8600 is the port of the server to connect to.
            cSocket = new Socket("192.168.1.20", 8600);
            out = new PrintWriter(cSocket.getOutputStream(), true);
            //reads in data from server
            in = new BufferedReader(new InputStreamReader(cSocket.getInputStream()));

            String fromServer;
            while ((fromServer = in.readLine()) != null) {
                if (fromServer.equals("0")) {
                    //sends an alive signal to the server
                    out.println("1");
                } else {
                    System.out.println("HClient Invalid Command Received : " + fromServer);
                }
            }//end while

            out.close();
            in.close();
            cSocket.close();

        } catch (UnknownHostException e) {
            System.err.println("Don't know about host: 192.168.1.20");
            System.exit(1);
        } catch (IOException e) {
            System.err.println("Client:: Couldn't get I/O for the connection to: 192.168.1.20.");
            System.exit(1);
        } catch (Exception f) {
            System.err.println("Client:: " + f.toString());
            System.exit(1);
        }
    }
}

This is essentially an endless loop that carries on until broken by a lost connection. In later versions we implemented a count on the server side to wait for clients: if no heartbeat was received within this time limit, the client was taken off the list of active slaves until a heartbeat was received.
In part 3 I’ll show you how to start all these classes from a central command on the server.
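The time-limit idea can be sketched as a small shared registry: each heartbeat thread records when it last heard from its slave, and the task server asks which slaves are still within the limit. This is a minimal sketch under my own assumptions (class and method names are hypothetical, and timestamps are passed in so it's easy to test; in real use you'd pass System.currentTimeMillis()), not the code from our later versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Shared list of active slaves with a heartbeat time limit. A slave that
// misses the limit drops off the active list and comes back automatically
// on its next heartbeat.
class HeartbeatTracker {

    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by a heartbeat thread (e.g. HMultiServerThread) whenever a "1" arrives.
    void heartbeat(String slave, long nowMillis) {
        lastSeen.put(slave, nowMillis);
    }

    // Called when the heartbeat connection closes for good.
    void remove(String slave) {
        lastSeen.remove(slave);
    }

    // Slaves the task server may send work to at time nowMillis, sorted by name.
    List<String> activeSlaves(long nowMillis) {
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() <= timeoutMillis) {
                active.add(e.getKey());
            }
        }
        Collections.sort(active);
        return active;
    }
}
```

ConcurrentHashMap lets the many heartbeat threads update the map while the task server reads it, without extra locking.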

Tags: Distributed file system

Java Distributed File System - part 1

August 6th, 2008

I've been literally inundated with one comment about my Java distributed file system, so I've decided to blog a bit about it.
The code itself is nothing spectacular. It'll take me time to rip out the relevant sections, so I'll give a brief overview of how the system works in this post and post more detailed code later in the week. I've tried to keep the code as easy to read as possible. All comments welcome!!

First, the basic requirements of a distributed file system are:
1. Expandability - the system must be capable of expanding
2. Data Redundancy - the data should not be lost if a node fails
3. Error recovery - the system should not crash when a node fails

What do we need to implement the above?
1. A master server. The main purpose of our master server was to track the files on the distributed nodes. The master's 'memory' of where each file is should be capable of being reconstructed from data on all of the slave nodes in case of failure. Similarly, the master server should be capable of handling the loss of any slave node and redistributing the work and data that was held on that slave.
2. A heartbeat check - I've implemented a heartbeat check that simply sends a signal to each slave every 10 seconds or so. If no response is received, the node is presumed MIA. God only knows what booby trap occurred, or if a bunker buster took out the hard drive, but until further notice the master server no longer sends work its way. This is the first process.
3. The second process comprises the actual tasks and jobs that the slaves must perform. What this means is entirely up to you. For us it meant crawling URLs, processing data and distributing that data. For you it might mean attempting to download all the pictures of elephants on the WWW, copying files to slaves or duplicating important pieces of information. Allied to this second process is a third process:
4. The third process is part of the second process. The slaves must be able to communicate with each other. We don't want all data having to travel from each slave to the master server and then back out to another slave. All commands begin at the master server, but with a proper file system layout the master knows that each slave will duplicate the data to the other slaves directly, so there is no need to place additional workload on the master server and clog up our internal network. This is important for both speed and time saved. Notification is sent to the master when slave-to-slave updates are completed.
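Points 1 and 2 together can be sketched as a small data structure: the master maps each file to the set of slaves holding a copy, rebuilds that map from the file lists the slaves report, and, when a slave goes MIA, lists the files that have dropped below the wanted number of copies. A toy Java sketch; the names (FileLocations, report, slaveLost) and the replication factor are hypothetical, not taken from our system.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the master's "memory": which slaves hold a copy of each file.
// It can be rebuilt from scratch from the file lists the slaves report, and
// when a slave goes MIA it lists the files that are now short of copies and
// need redistributing.
class FileLocations {

    private final Map<String, Set<String>> holders = new HashMap<>();
    private final int wantedCopies;

    FileLocations(int wantedCopies) {
        this.wantedCopies = wantedCopies;
    }

    // Rebuild step: a slave reports the files it currently holds.
    synchronized void report(String slave, Collection<String> files) {
        for (String file : files) {
            holders.computeIfAbsent(file, f -> new TreeSet<>()).add(slave);
        }
    }

    // A slave missed its heartbeats: forget it and return (sorted) every file
    // that now has fewer copies than wanted.
    synchronized List<String> slaveLost(String slave) {
        List<String> shortOfCopies = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : holders.entrySet()) {
            e.getValue().remove(slave);
            if (e.getValue().size() < wantedCopies) {
                shortOfCopies.add(e.getKey());
            }
        }
        Collections.sort(shortOfCopies);
        return shortOfCopies;
    }

    // Which slaves hold a copy of this file (a snapshot copy of the set).
    synchronized Set<String> whereIs(String file) {
        return new TreeSet<>(holders.getOrDefault(file, Collections.<String>emptySet()));
    }
}
```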

Right, I hope that's suitably simple or complex. How do we actually achieve the above in Java? Our own work began with a series of sockets and server sockets. Our master server runs two server sockets (one for the heartbeat check, one for carrying out tasks), and our nodes (or slaves - I use the terms interchangeably) run client sockets. Our nodes also run one server socket, which is reserved specifically for slave-to-slave communication. So at any one time a node will have two client sockets connected to the master's server sockets, and it may also have a server socket listening for connections from other slaves.

Because of the number of processes that could occur simultaneously, one of the biggest problems we ran into was threads banging heads on our databases. I got around this by implementing a strict write once, read many rule for our data. That means no matter how many threads are connected, only one of them ever writes a given piece of data, and it writes it once; everyone else only reads. For this reason there are instances where, instead of updating the data directly, a 'shard' of information is first built up on a slave and, when directed, this shard is sent to another slave to be applied there. This reduces the network traffic and ensures that there is only ever one process writing to the data at a time.

In all likelihood you will run into the same problem, unless you are using an SQL database with multiple connections. That wasn't possible in our case given the size of our data, so I'll be presuming that everyone can only use a 'write once, read many' process. My presumptions have been known to be wrong.
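The write once, read many rule can be illustrated in a few lines of Java: a store where each key can be written exactly once, with ConcurrentHashMap.putIfAbsent guaranteeing that if several threads race to write the same key, exactly one of them wins. This is a sketch of the rule only, not our actual storage code, and the WriteOnceStore name is hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Write once, read many: each key can be written exactly once. If several
// threads race to write the same key, putIfAbsent lets exactly one of them
// win; every thread can then read the value without extra locking.
// Values must be non-null (ConcurrentHashMap does not allow null values).
class WriteOnceStore<V> {

    private final ConcurrentMap<String, V> data = new ConcurrentHashMap<>();

    // Returns true if this call did the (single) write, false if the key was already written.
    boolean writeOnce(String key, V value) {
        return data.putIfAbsent(key, value) == null;
    }

    Optional<V> read(String key) {
        return Optional.ofNullable(data.get(key));
    }
}
```

A thread whose writeOnce returns false knows someone else already wrote that piece of data, which is exactly the point where, in our system, it would instead build up a shard to be applied elsewhere.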

My next post will focus on heartbeat checking and setting up any information we need for the task process, e.g. the number of replications we require, who our slaves are and how we carry information over the network.

You can view Part 2 of the Java Distributed File System here

Tags: JAVA

Cuil Investment and backers

July 31st, 2008

I was curious to see how much investment Cuil has raised. Reports suggest over $25 million.

I would presume that this has been split roughly between hardware costs (data centers / servers / offices etc.) and staff costs. But Cuil isn't falling down on either of these. It's failing to separate the spam sites from real sites. It is also failing to correctly pick the most important sites on the internet as sources for its top results. The reason for this failure? Cuil is based solely on content analysis, whereas Google relies on link analysis AND content analysis.

Which means spam sites that repeat search terms over and over are currently ranking well in Cuil. They really need to tweak their algorithm to demote pages where a term appears far more often than normal.

It's not a hard process to do. The terms are already being counted. They just need to decide on a normal percentage for any one term relative to the total number of terms on a page.

So, for instance, to prevent a page somewhere else ranking well for the term 'hirednetwork', we would want to filter a page that had this:

HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks! HiredNetwork rocks!

This is the type of page that currently ranks well on Cuil. To filter it, Cuil needs to check the results and figure out that 'hirednetwork' makes up 50% of the total tokens (or terms) on the page, an abnormal result. So this page should be pushed some way down the results order rather than being displayed on the first page for a query term.
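The check described above fits in a few lines of Java: split the page into whitespace-separated tokens, count how many equal the query term (ignoring case), and flag the page if that share is above a cut-off. A minimal sketch; the tokenisation is deliberately naive, and the 0.2 cut-off used in the usage example is an arbitrary illustration, not a figure Cuil or anyone else is known to use.

```java
// Share of a page's tokens taken up by one query term, and a simple
// "keyword stuffing" flag based on it. Tokens are split on whitespace and
// compared case-insensitively, so "HiredNetwork rocks!" is two tokens.
class TermShare {

    static double share(String pageText, String term) {
        String[] tokens = pageText.trim().split("\\s+");
        if (tokens.length == 1 && tokens[0].isEmpty()) {
            return 0.0; // empty page
        }
        int hits = 0;
        for (String token : tokens) {
            if (token.equalsIgnoreCase(term)) {
                hits++;
            }
        }
        return (double) hits / tokens.length;
    }

    static boolean looksStuffed(String pageText, String term, double maxShare) {
        return share(pageText, term) > maxShare;
    }

    public static void main(String[] args) {
        StringBuilder spam = new StringBuilder();
        for (int i = 0; i < 24; i++) {
            spam.append("HiredNetwork rocks! ");
        }
        System.out.println(share(spam.toString(), "hirednetwork"));              // 0.5
        System.out.println(looksStuffed(spam.toString(), "hirednetwork", 0.2)); // true
    }
}
```

Rather than dropping flagged pages outright, a search engine would more likely use the share as one penalty among many, which matches the idea of pushing such pages down the results rather than removing them.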

On a side note it’ll be interesting to see if I can get this post to rank first for ‘hirednetwork rocks!’ on Google!

Tags: Google · Search Engine Development

New Search Engine Cuil

July 29th, 2008

It was bound to happen at some point.

Google employees leave and set up their own search engine.

Its always the way!

Cuil has been set up by three search veterans, two of them former Google employees. Anna Patterson (worked on Google's search engine index) and her husband Tom Costello (worked on search engines at Stanford University and IBM, no less!). The final piece of the jigsaw is Russell Power, Patterson's chief colleague at Google.

With an all-star cast like that, it'll be interesting to see where Cuil goes. I couldn't find any figures for what they've invested in the start-up, but it must be big. Presumably software development costs would have been minimal given their pedigree, so most of the investment would have gone on hardware: servers, data centers etc.

Tom Costello said they have more of the internet indexed than Google. This means a wider search index for users. Most people are surprised that not all the pages on the internet are indexed, but doing so would be a huge task. Consider it like this. A simple URL looks like:

http://www.mySite.com/index.php?greeting=hello

Which is easy enough to index. But what when you could have an url like:

http://www.mySite.com/index.php?greeting=hello&name=david&age=16&friend=tara&dog=sam

Now we know there could be many different pages, one for each combination of the five options contained in the URL, each with different amounts of data. Do we index each and every one, or is all we need the greeting and the name 'david'? Do we really want to consume valuable time and bandwidth indexing his friend tara and his dog sam?
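To put a number on "many": if each of the five parameters can either be present or absent, there are 2^5 = 32 possible query strings for this one page, before the parameter values themselves change. A small Java sketch that enumerates them (keeping the parameters in their original order, which is a simplifying assumption; real sites may also accept them in any order):

```java
import java.util.ArrayList;
import java.util.List;

// Every URL you can build by keeping or dropping each query parameter of the
// example URL, in their original order. n parameters give 2^n variants,
// including the bare URL with no query string at all.
class UrlVariants {

    static List<String> variants(String base, String[] params) {
        List<String> urls = new ArrayList<>();
        int n = params.length;
        for (int mask = 0; mask < (1 << n); mask++) {
            StringBuilder query = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) { // bit i set: keep parameter i
                    query.append(query.length() == 0 ? "?" : "&").append(params[i]);
                }
            }
            urls.add(base + query);
        }
        return urls;
    }

    public static void main(String[] args) {
        String[] params = {"greeting=hello", "name=david", "age=16", "friend=tara", "dog=sam"};
        List<String> urls = variants("http://www.mySite.com/index.php", params);
        System.out.println(urls.size());  // 32
        System.out.println(urls.get(3));  // http://www.mySite.com/index.php?greeting=hello&name=david
    }
}
```

Add a sixth parameter and the count doubles to 64, which is why crawlers have to decide which parameters actually change the content.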

The search engines may crawl the page above, but that is not to say it is indexed. The page may be contained within an option to 'view similar pages'. In fact, thousands of pages with a myriad of differing options may be contained within this option.

Anyway, I digress. What chance does Cuil have? The odds are stacked against it. Google has developed a stranglehold on the search market. Its AdSense and AdWords give both publishers and advertisers a mass market. Cuil does not. So it is reliant on its search results being significantly better than Google's. Are they?

Yes and no. The actual results thrown up are, in my opinion, of lower quality than Google's. But it does accurately predict what you may be searching for next, based on your previous search query. It's a neat feature.
Plus the layout is completely different from Google's. When Google appeared on the scene it was a breath of fresh air to have an empty screen compared to the clutter and text on previous 'search engines' like Yahoo and Microsoft. Cuil has a layout more like a magazine or blog entries, but it's still easy to scan over them looking for what you want.

One interesting thing about Cuil is its promise to display a wider set of results than simply focusing on the more popular pages on the internet. Its search is content-based rather than relying on PageRank and link popularity to rate its results. It's a double-edged sword. Content-based searches may throw up a lot of spam sites or sites made for AdSense (MFA sites). If these become a regular result, it will turn a lot of users away. On the other hand, focusing on the content rather than the link popularity of a site may bring great results for certain searches. It must be said, though, that Google has clamped down on an awful lot of the abuse of link popularity. Directories, for instance, have had much of their PageRank stripped away, and it's debatable how much submitting to directories can affect your PageRank now.

Even the last Google PageRank update was rather strange, in that anecdotal evidence suggests it was a PageRank 'downgrade' rather than an update. None of my sites received PageRank updates (feel free to send your condolences). Most of the sites I visit lost PageRank or had no change. Is Google making it harder to obtain PageRank? Or is Google's standard for links tightening (like the aforementioned loss of PR amongst directories)?

Either way, Cuil has its work cut out. It always takes a long time for a search engine to get a snapshot of the estimated 120 billion pages out there. So in its early days it would not be surprising to see lower-quality search results than Google's. What I'll be looking for is how its search results compare with Google's in 9-10 months (unless they have fantastic hardware, it's a big job that takes time).

In the end, it's good to see competition to Google emerging. Expect more.

Tags: Google · Search Engine Development