Google News: September 2006

Sunday, September 10, 2006

Google Maps Updates Satellite Photos

The satellite imagery for Google Earth and Google Maps was updated, and several countries, states and cities now have higher-resolution coverage. The Google Earth Blog compiled a list. I’m still getting a weird chessboard pattern for many locations...

Thursday, September 07, 2006

Google Seeks Help with Recognition

Google does not shy away from gargantuan projects. The search giant is known, after all, for indexing the World Wide Web and mapping the entire planet in three dimensions. But the company's latest endeavor may be too big for even the Internet goliath to complete alone.

Google (GOOG) wants to index all the world's printed material for inclusion in a comprehensive online library. To that end, it launched Google Books, a service featuring free-to-download classics and excerpts from copyrighted works, and Google Scholar, a database of academic and scientific research (see BusinessWeek.com, 8/31/06, "Google Offers Classics for Free").

On Sept. 6, it added another service to further its goal: Google News Archive. The new application allows computer users to search back issues of various publications, such as The Washington Post (WPO), The New York Times (NYT), and The Wall Street Journal (DJ). Articles from some journals date back as far as 200 years and, in some cases, must be purchased from the original publication (see BusinessWeek.com, 9/06/06, "Google Digs Into the Archives").

Despite its relentless release of virtual library material, Google is asking the greater engineering community for help developing the technology it needs to index and archive all published works.

COMMUNITY EFFORT. On Aug. 30, Google's "über tech lead" Luc Vincent announced that the company was turning to the tech community for help improving Optical Character Recognition (OCR) technology, which enables computers to decipher words in scanned texts. The first step: Google debugged an old Hewlett-Packard (HP) OCR engine, named Tesseract, that HP had released to university researchers in Nevada. Before that, the application had sat idle at HP since 1995, when the company decided to leave the OCR business and concentrate on its line of home office products, computers, printers, and cameras.

Google then released the cleaned version to the open-source community. Bdale Garbee, chief technologist at Hewlett-Packard, says the company is pleased others will build on its efforts. "We're happy to see good code being put to good use," he says, "and we look forward to seeing where the community takes this technology in the future."

Google hopes the community will take the technology far beyond its current capabilities, says Chris DiBona, Google's open-source program manager. OCR technology is central to Google's cause because it enables search engines to "read" documents. Without OCR, the computer sees a scanned page of print only as an image and cannot find keywords or phrases in the text. In the search world, OCR means the difference between being able to find a book only if you know the complete title and being able to find it if all you know is a few key quotes.
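To make the point concrete, here is a minimal Python sketch of why recognized text matters for search, assuming each scanned page has already been run through an OCR engine. The page data and the helper names (`tokenize`, `build_index`, `find_quote`) are hypothetical, and this toy inverted index is only an illustration, not how Google Book Search actually works. A page the OCR engine could not read contributes no words, so no quote can ever lead a searcher to it.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of (book, page) locations.

    `pages` maps (book, page) to the text an OCR engine produced.
    A page the engine could not read has empty text and adds nothing.
    """
    index = defaultdict(set)
    for location, text in pages.items():
        for word in tokenize(text):
            index[word].add(location)
    return index

def find_quote(index, pages, quote):
    """Return the locations whose text contains `quote` as an exact phrase."""
    words = tokenize(quote)
    if not words:
        return set()
    # Narrow down to pages that contain every word of the quote...
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # ...then keep only pages where those words appear consecutively.
    phrase = " " + " ".join(words) + " "
    return {loc for loc in candidates
            if phrase in " " + " ".join(tokenize(pages[loc])) + " "}

# Hypothetical OCR output for three scanned pages.
pages = {
    ("A Tale of Two Cities", 1): "It was the best of times, it was the worst of times",
    ("A Tale of Two Cities", 2): "",  # blurry scan: the OCR engine produced no text
    ("Moby-Dick", 1): "Call me Ishmael. Some years ago...",
}
index = build_index(pages)
print(find_quote(index, pages, "best of times"))
print(find_quote(index, pages, "Call me Ishmael"))
```

Knowing only "best of times" is enough to locate the first page, while the blurry second page stays invisible to search no matter what the reader types, which is exactly the gap better OCR is meant to close.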

Because it was essentially abandoned, the program lags well behind current commercial OCR engines. Tesseract has trouble reading grayscale images and text on colored backgrounds, for example. Google, however, is hopeful that the technology community, by tinkering with Tesseract's formerly proprietary code, will come up with solutions to problems that plague even the paid technology.

LOOKING TO LEAP FORWARD. DiBona says the OCR engines out there are 99.5% accurate at reading Latin characters, but still have some trouble with other languages, handwriting, highly stylized fonts, and unique layouts. In the past, Google has had some problems with blurry or off-center scans that can sometimes confuse the OCR engines. For example, a poorly scanned book with blurry characters could prevent the OCR engine from deciphering the letters and words on a page, and that page would then not be properly indexed for search (see BusinessWeek.com, 12/22/05, "Google's Great Works in Progress").

"If you look at OCR over the past 10 years, not much has happened. There are some programs out there that are pretty good, but we wanted to see if by putting OCR out there we could improve it," DiBona says, adding it would be "really good if OCR gets better for everybody."

As more offices began moving from paper to digital, they needed OCR technology to help computers recognize the text in their scanned documents and allow them to edit the new digital versions. Over the past three years, search engines and other online companies expanded the use of OCR by applying the technology to search, says Robert Weideman, senior marketing vice-president for Nuance Communications (NUAN), maker of one of the market-leading OCR engines, OmniPage. Both Google and Amazon (AMZN), for example, use OCR technology to match search phrases with specific passages in books.

OCR PROLIFERATION. But who uses OCR outside of online search and commerce? Well, increasingly, everybody who has ever scanned a document or read a scanned document. "When you think about who touches on our OCR technology, it is literally millions of people worldwide…any industry that deals with paper uses OCR," Weideman says. Nuance experienced 8% growth and reaped more than $70 million in revenues last year from its OCR and digital imaging business, which includes PDF conversion software, Weideman says.

That revenue may seem surprising for a technology that, at first, didn't seem to have many practical applications. When inventor Raymond Kurzweil created the first OCR system in 1974, he struggled to find a use for it (see BusinessWeek.com, 5/02/01, "How Ray Kurzweil Keeps Changing the World"). The mass-market answer eventually came in the form of a scanner.

Nuance has provided OCR technology to Google, though a confidentiality agreement keeps it from saying whether its OCR systems power Google's current book search. The company has also supplied Microsoft (MSFT) with OCR technology for XPS, its upcoming document format and rival to Adobe's (ADBE) PDF, which will be included in the new Windows Vista operating system.

FOLLOW THE LEADER. OCR is now built into many scanners and comes standard on some computers. It has even made its way into cell phones, via software that lets people photograph text such as business cards and then index the pertinent words in their address books.

Whether Google's open-source release will lead to better OCR technology and more future users is uncertain. Analysts say Nuance has built such a big lead in the industry that competitors remain also-rans, unlikely to contribute major advances any time soon. "For Nuance, they are leaps and bounds ahead of any of their competitors in the space," says Daniel Ives, an equity analyst at Friedman, Billings, Ramsey & Co., a Web-based investment bank based in Arlington, Va. "Nuance dominates the industry," adds Jeff Van Rhee, an equity research analyst at Craig-Hallum Capital in Minneapolis. "I don't see [Tesseract] being a big impact."

Over time, however, Ives sees the demand for OCR and related technology increasing as more people seek to switch effortlessly between the paper and digital worlds. "There's no doubt that we believe it is going to be a very fertile market area," he says.

So fertile, in fact, that Google is turning to OCR for help moving the world's print into a digital archive of everything.

Google to offer print archives searches

WASHINGTON, Sept. 6 (Xinhua) -- Google plans to offer a service that will let Internet users search through the archives of newspapers, magazines and other publications.

The service will help Internet searchers uncover material that in some cases dates back more than 200 years, The New York Times reported on Wednesday.

The new feature, to be named Google News Archive Search, will direct Google searchers to both paid and free digital content on publishers' Web sites, but will not directly generate revenue for Google, according to the report.

Google would not say how many publishers were taking part in the new service, for which it has independently indexed material from online databases. The results will be displayed both as part of standard searches and through a new archive search page, news.google.com/archivesearch, the report said.

However, it announced a number of partners including The Wall Street Journal, The New York Times, The Washington Post, Time, Guardian Unlimited, Factiva, Lexis-Nexis, HighBeam Research and Thomson Gale, the report added.

In contrast to Google's book scanning project, which has led to legal skirmishes with some publishers over copyright issues, some of the partners involved with the new service said they had been pressing Google to offer access to their archives for several years.

The databases included in the service are part of what some have called the "dark Web," because they cannot be "spidered," or indexed, by standard search engines and so have not been accessible through them.

http://news.xinhuanet.com/english/2006-09/07/content_5060556.htm

Google and the Pedophiles

It doesn't have the cachet or name recognition of Myspace or Facebook, but in Brazil, Google's relationship site Orkut boasts a reach that its more celebrated rivals cannot match. Almost half of Brazil's 32 million internet users have a profile on Orkut. Go to any cyber café in Rio de Janeiro or Sao Paulo and wired youngsters will be leaving messages for friends, checking out potential dates and surfing through hundreds of thousands of communities, from the spiritual ("We love God"; 254,072 members) to the intellectual ("Addicted to books"; 46,203 members) and the physical ("I've got a big butt, what about it?"; 62,673 members).

But Brazilian prosecutors say that pedophiles, anti-Semites and racists are also using Orkut to peddle less innocuous messages. And they accuse Google of protecting them by balking at revealing the IP addresses and other information that could help law enforcement track them down. A judge last week gave Google 15 days to hand over the incriminating data or face a daily fine equal to $900,000. "Making it easier for those Brazilians who use anonymity of Orkut to commit crimes of child pornography and racism reflects a profound disrespect for national sovereignty," Judge José Marcos Lunardelli said in last week's ruling. "Brazilian law is applicable here."

Google argued otherwise, saying that because the information on Orkut is stored on U.S. servers, its Brazilian subsidiary has no access to it and thus cannot hand it over. The company asked prosecutors to withdraw the summons against Google Brasil and address new ones to parent company Google Inc. Only then, Google officials said, would the company hand over incriminating data, as it has done in more than 70 similar cases elsewhere in the world.

Brazilian authorities complied and rewrote the court orders, and Google now says it will hand over the data on pedophiles and other criminals. But Google's argument infuriated Brazilians, who charged the company was putting bureaucratic niceties in the way of tracking down pedophiles and racists. To Internet watchdogs, however, the company stood up for the important principle of establishing international norms on what information global Internet companies should hand over to local authorities and what procedures both sides must follow.

"I think Google's decision to make the legal procedures go through the American justice system is a good thing, not because of Brazil but because of the world," said Julien Pain, director of the Internet freedom desk at Reporters Without Borders. "This way, if you make a request to Google in the U.S., the request can be supervised by American justice. This kind of procedure may seem useless in the case of Brazil, which is a democracy and respects human rights. But it's crucial when Google has to deal with repressive regimes. If a Chinese or a Syrian judge asks information about a dissident or a journalist, it's important that Google could say no."

Google denies it is consciously trying to set a precedent, and, recognizing that the issue of child porn is a sensitive one, is anxious to play down the controversy. However, experts praise the company for taking a more principled stance than some of its rivals have. According to a Human Rights Watch report issued last month, Yahoo voluntarily handed over incriminating info that led to the arrest of four Chinese dissidents; Microsoft censored searches and deleted blogs in China; and Skype configured its Chinese software to censor certain words in its chat function.

"You could argue Google is wrong in protecting child pornographers, but maybe they are not being driven by the fear of bad PR, but rather by what they think is right," said Esther Dyson, the former chairperson of the Internet Corporation for Assigned Names and Numbers who now writes a blog on developing technologies. "The next time the Brazilian authorities say we want e-mail addresses for some reason other than child porn, Google will have a much stronger position, because they have established, both here and in the U.S., that they do not blindly accede to government requests."

Now the question is whether other governments and Internet giants are watching.
[via time]

Google Office: Image Gallery

In a ZDNet Image Gallery, I've gone through 7 products that may become part of a future Google Web Office. Right now, Google doesn't have a full web-based office suite on the market, but this year they've gradually been compiling Web Office parts. For example, if you click on "all my services" in the top left corner of your Gmail, you'll go to your Google account and see a list of products that Google offers. Many of them are Web Office parts, or could easily become part of a Web Office. Here is the current list:

Analytics
Base
Calendar
Co-op
Gmail (including GTalk)
Page Creator
Personalized Homepage
Personalized Search
Spreadsheets


So there are 9 current Google services listed - the 6 I've highlighted are Office candidates. You can add word processing app Writely to that, which makes 7 possible Web Office suite parts. Some of the pre-beta products from the Google Labs page are possible additions in the future, as well as Labs "graduates" like Google Desktop. But let's not worry too much about what's missing (presentations and project management aren't even Google products yet).


Indeed there's a lot of work to be done to integrate the 7 office-like products listed above. While recently Google released the oddly named Google Apps for Your Domain - which bundles together Gmail, Google Talk, Calendar and Page Creator - it's just the start of what could be done to integrate products into an office suite.

Even so it's worth looking at the current product mix, for clues to a future Google Office. In the Image Gallery I've compiled, I've focused on the 7 office-like products listed above. I've highlighted a few promising Web Office features from most of the products, even if there's work to be done by Google yet.

[ths gseeker]

[via readwriteweb]

And the Desktop Gadget winners are

With all the great entries we received for the Google Desktop Gadget Contest, we've learned that there are some very talented developers out there. Amongst all the gadgets submitted, these three really stood out:

diGGGadget by Marius and Yannick Stucki – Stay on top of the latest stories from digg.com. Click on a few buttons and you'll know why we think it's so great. It also takes advantage of our advanced APIs to enable sharing news with friends plus personalization based on your interests.
.......

[via googleblog.blogspot.com]

Tuesday, September 05, 2006

10 Great Uses For Google Desktop

Google Desktop lets you search your computer and have tiny bits of information at your fingertips if you use Google Gadgets. Everybody knows that. But Google Desktop can have other interesting uses.

1. Program launcher

If a program has a shortcut in the Start Menu, type the first letters of the program name and you can launch it.

2. Control Panel replacement

Do you want to change your mouse settings or the network settings? Type "mouse" or "network" and you can open the configuration tool without using Control Panel.

3. Address bar

Open files, folders and Internet addresses.

4. Browser history

Your browser can keep the sites you visit for a limited period (a week or more). Google Desktop keeps them indefinitely and makes your history searchable.

5. Browser cache

Google Desktop keeps all the versions of the pages you see in your browser, so you can use it for reference if you want to see how a page has changed.

6. File recovery

If you delete a document, Google Desktop keeps it in its cache, so you'll still be able to recover the content.

7. File versioning

You can use Google Desktop to revert to a previous version of a document (text file, Office document, HTML file). Google Desktop keeps all the versions of a file, so it may be useful if you don't back up your files.

8. Most recent documents

Google Desktop has a timeline that lets you see the files created or modified recently.

9. Office / PDF viewer

If you don't have Microsoft Office or Adobe Reader, Google Desktop lets you see a text version for PDF, DOC, XLS, PPT files. It doesn't look great, but it's useful as a text preview.

10. Gmail replacement

Suppose Gmail or your Internet connection is down, you don't use a desktop mail client, and you need to find an important message. Fortunately, Google Desktop can index your mail (if you want it to), so you can search your messages and read them offline. The attachments, however, aren't saved on your PC.

For most of these features, you need to adjust a few Google Desktop preferences: enable indexing of web history and of most file types, disable "remove deleted items", enable Gmail indexing, and turn on "launch programs/files by default" in Quick Find.

If you don't like the sidebar, you can disable it in Preferences / Display by choosing Deskbar, Floating Deskbar or None. You can still open the Quick Search Box by pressing Ctrl twice.

Saturday, September 02, 2006

Google appeals to Brazilian court to review order on information disclosure

Google Brazil on Friday asked a federal judge to review his decision that it disclose the data of users of Google's social networking site, Orkut, who were accused of criminal activities.

Judge Jose Marcos Lunardelli ruled on Thursday that Google Brazil must turn over, within 15 days, user information from websites that promote crimes such as racism or child pornography, threatening the company with a daily fine of 50,000 reais (23,400 U.S. dollars) if it does not comply.

[via People's Daily Online]

Yahoo! and Google Play Tag

Earlier this week Yahoo! (Nasdaq: YHOO) launched a new feature that allows users to "geotag" pictures on its Flickr photo-management website, using a simple drag-and-drop interface. The process is so easy that the company reported that more than 1.2 million photos were geotagged in the first 24 hours.
......
On the same day Yahoo! announced its new geotagging feature, Google reportedly registered three new domain names, including googleimagetagger.com.
......
I can't begin to say who will win in this global game of tag, but I do see healthy new revenue streams for both Yahoo! and Google, and maybe even Microsoft. And, at a minimum, investors should begin considering this potentially lucrative line of new revenue when assessing the value of these stocks.
[via fool.com]

Google top of the website pops

Statistics released by web monitor ComScore revealed that Google properties attracted 156.3 million unique visitors during the month.

In comparison Microsoft sites racked up 144.1 million visitors, while the Yahoo network managed 99.5 million.

In addition to topping the charts, Google also had a higher share of surfers in Europe than in the US.

[via vnunet]

Gmail Spam Filter Bug

Gmail’s spam filter has quite an oversight, as discovered by a Lifehacker reader. Basically, if a spammer sends you a message with your own email address as the sender, it can wind up in your Sent Mail folder instead of your Spam folder, even if it is obvious, easily caught spam. If spammers exploit this hole and start sending Gmail users a flood of such messages, it could ruin people’s inboxes, filling them with useless, everlasting spam.

[via inside google]
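The reported hole is easy to picture as a problem of rule ordering. The Python sketch below is purely hypothetical: Gmail's real filing logic is not public, and `Message`, `looks_like_spam`, `file_message_buggy` and `file_message_fixed` are invented for illustration. It shows how a rule that trusts the From: address, and runs before the spam check, lets a forged message skip filtering entirely.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str     # the From: address, which a spammer can forge freely
    recipient: str
    body: str

def looks_like_spam(msg):
    """Toy stand-in for a real spam classifier: keyword matching only."""
    return any(word in msg.body.lower() for word in ("lottery", "viagra", "wire transfer"))

def file_message_buggy(msg, account):
    """Hypothetical rule with the reported hole: anything 'from' the account
    owner is treated as sent mail and never reaches the spam check."""
    if msg.sender.lower() == account.lower():
        return "Sent"
    return "Spam" if looks_like_spam(msg) else "Inbox"

def file_message_fixed(msg, account):
    """Same rule with the checks reordered: spam filtering runs first."""
    if looks_like_spam(msg):
        return "Spam"
    if msg.sender.lower() == account.lower():
        return "Sent"
    return "Inbox"

me = "someone@gmail.com"  # hypothetical address
forged = Message(sender=me, recipient=me, body="You won the LOTTERY!")
print(file_message_buggy(forged, me))  # Sent
print(file_message_fixed(forged, me))  # Spam
```

Reordering has its own cost in this toy version: a genuine message you sent yourself that happens to trip the filter would now be filed as spam, which is why the sender check alone is a weak signal either way.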

Download the classics free

You can download full copies of out-of-copyright books from Google Book Search and read them at your own pace.

The trick is the "Full view" option: to find out-of-copyright books you can download, simply select the "Full view" radio button when you search on books.google.com.

[via Inside Google Blog]

Google talks with eBay

Google Talk will soon be able to talk with Skype. Google and eBay have signed an agreement covering text-based advertising and "click-to-call" advertising, in which Google Talk and Skype will power voice calls between customers and merchants. That's a big step in Google's commitment to interoperability via open industry standards, and good news for users of both Google Talk and Skype.

[via Google Talkabout]

Friday, September 01, 2006

Google Code Jam 2006

There are only 5 days left until registration closes for Google Code Jam 2006. So far, about 16,000 competitors have signed up to show off their programming skills -- and perhaps win an all-expenses paid trip to our New York City engineering office to compete in the finals on October 27. The winner gets $10,000 and global bragging rights: people have registered in huge numbers not only from the U.S., but from India, China, Canada, Brazil, the Russian Federation, Poland, Pakistan, Iran, Australia, the U.K., Germany, Singapore, Japan, Hungary -- you get the idea.

[via Google Blog]
