Twitter Poll – How Does Google Index Content on the Web?


Google Indexes by Websites, Pages, or URLs

I thought this was an interesting question to ask people because I think it's often misunderstood. Google treats content found at different URLs as if it were different content, even though it might be the same, as in the following examples:

http://www.example.com
https://www.example.com
http://example.com
http://example.com/index.htm
http://example.com/Index.htm
http://example.com/default.asp

One of the most interesting papers I've come across on this topic is this one (one of its authors, Ziv Bar-Yossef, joined Google shortly after it was released):

Do Not Crawl in the DUST: Different URLs with Similar Text
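To make "different URLs, similar text" concrete, here is a minimal sketch that collapses the six example URLs above into one canonical key. It is not Google's method and not the DUST paper's algorithm: the paper describes mining such rules from lists of URLs, while the rules here are hand-written assumptions (that www and non-www, http and https, path case, and default document names like `index.htm` all point to the same content). The names `normalize`, `group_variants`, and `DEFAULT_DOCUMENTS` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical "default document" names that many servers return for a bare
# directory URL. Real sites vary; this list is an assumption.
DEFAULT_DOCUMENTS = {"index.htm", "index.html", "default.asp"}


def normalize(url: str) -> str:
    """Collapse a URL to a canonical key using illustrative, hand-written rules."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # urlsplit already lowercases the hostname
    if host.startswith("www."):
        host = host[len("www."):]        # assume www and non-www serve the same site
    path = parts.path.lower() or "/"     # assume the server ignores path case
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment in DEFAULT_DOCUMENTS:
        path = path[: -len(last_segment)]  # "/index.htm" -> "/"
    # The scheme is dropped, assuming http and https serve identical content.
    return host + path


def group_variants(urls):
    """Group URLs that normalize to the same key (candidate duplicates)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


urls = [
    "http://www.example.com",
    "https://www.example.com",
    "http://example.com",
    "http://example.com/index.htm",
    "http://example.com/Index.htm",
    "http://example.com/default.asp",
]
print(group_variants(urls))  # a single group keyed by "example.com/"
```

Each rule is a bet: on a case-sensitive server, `/Index.htm` and `/index.htm` may really be different pages, which is exactly why a search engine cannot simply assume these URLs are duplicates.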

What do you think?


Copyright © 2016 SEO by the Sea. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately.
Plugin by Taragana

The post Twitter Poll – How Does Google Index Content on the Web? appeared first on SEO by the Sea.






How Google May Respond to Reverse Engineering of Spam Detection



The ultimate goal of any spam detection system is to penalize “spammy” content.

~ Reverse engineering circumvention of spam detection algorithms (Linked to below)

Four years ago, I wrote a post about a Google patent titled, The Google Rank-Modifying Spammers Patent. It told us that Google might be keeping an eye out for people attempting to manipulate search results by spamming pages, and that Google might delay responding to someone's manipulative actions to make them think that whatever they were doing had no impact upon search results. That patent focused upon organic search results, and Google's Head of Web Spam, Matt Cutts, responded to my post with a video in which he insisted that just because Google produced a patent on something doesn't mean that they were going to use it. The video is titled, "What's the latest SEO misconception that you would like to put to rest?" Matt's response is as follows:

I'm not sure how effective the process in that patent was, but there is now a similar patent from Google that focuses upon rankings in local search results. The patent describes this spam problem in this way:

The business listing search results, or data identifying a business, its contact information, web site address, and other associated content, may be displayed to a user such that the most relevant businesses may be easily identified. In an attempt to generate more customers, some businesses may employ methods to include multiple different listings to identify the same business. For example, a business may contribute a large number of listings for nonexistent business locations to a search engine, and each listing is provided with a contact telephone number that is associated with the actual business location. The customer may be defrauded by contacting or visiting an entity believed to be at a particular location only to learn that the business is actually operating from a completely different location. Such fraudulent marketing tactics are commonly referred to as “fake business spam”.

The patent tells us that search engines will sometimes modify how they rank businesses to keep fake businesses from showing, and that they want to stop people from spamming local search results. The patent, developed in response to fake business spam listings, is:

Reverse engineering circumvention of spam detection algorithms
Inventors: Douglas Richard Grundman
Assigned to: Google
Patent 9,372,896
Granted June 21, 2016
Filed: November 26, 2013

Abstract

A spam score is assigned to a business listing when the listing is received at a search entity. A noise function is added to the spam score such that the spam score is varied. In the event that the spam score is greater than a first threshold, the listing is identified as fraudulent and the listing is not included in (or is removed from) the group of searchable business listings. In the event that the spam score is greater than a second threshold that is less than the first threshold, the listing may be flagged for inspection. The addition of the noise to the spam scores prevents potential spammers from reverse engineering the spam detecting algorithm such that more listings that are submitted to the search entity may be identified as fraudulent and not included in the group of searchable listings.
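The abstract describes a two-threshold check on a spam score that has noise added to it. Below is a minimal sketch of that idea, assuming zero-mean Gaussian noise and made-up threshold values; the patent names neither the noise function nor the thresholds, and `screen_listing` is a hypothetical name.

```python
import random

# Hypothetical values; the abstract only says the second threshold is lower
# than the first.
FIRST_THRESHOLD = 0.8    # above this, the listing is treated as fraudulent
SECOND_THRESHOLD = 0.6   # above this (but not the first), flag for inspection


def screen_listing(spam_score: float, rng: random.Random, noise_scale: float = 0.1) -> str:
    """Return 'removed', 'flagged', or 'accepted' for one business listing."""
    # Add zero-mean Gaussian noise (an assumed noise function) so the same
    # underlying score does not always produce the same outcome.
    noisy_score = spam_score + rng.gauss(0.0, noise_scale)
    if noisy_score > FIRST_THRESHOLD:
        return "removed"
    if noisy_score > SECOND_THRESHOLD:
        return "flagged"
    return "accepted"


rng = random.Random(42)
# A listing scored just below the first threshold sometimes lands above it,
# so a spammer probing the system sees inconsistent results.
outcomes = [screen_listing(0.79, rng) for _ in range(1000)]
print({outcome: outcomes.count(outcome) for outcome in set(outcomes)})
```

Because outcomes near a threshold are no longer deterministic, submitting slightly varied listings and watching which ones survive tells a spammer much less about where the real cut-off sits.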

A WebmasterWorld thread discussed the older patent I mentioned and provides some interesting commentary on it that is worth reading through: Google's Rank Modifying Patent for Spam Detection

The patent describes how the search engine might not show any positive results in response to fake business spam, to throw off people spamming results and to make it more difficult for them to reverse engineer spam detection patterns. I wasn't convinced that being aware of this patent would make it easier for people to spam local search results.

The process may sometimes not demote a business after fake business spam has been submitted on its behalf, if the listing's spam score (with noise added to it) doesn't rise above a certain threshold, as shown in this flowchart from the patent:

spam score

It's difficult to say whether Google is using the process described in this patent (or the one in the patent I wrote about four years ago).



The post How Google May Respond to Reverse Engineering of Spam Detection appeared first on SEO by the Sea.






The US is Asking for Help Understanding the Impacts of Artificial Intelligence



Artificial Intelligence, by Global Panorama. Some Rights Reserved

As we approach the celebration of the 4th of July, I thought it might be interesting to share a request for information published in the US Federal Register and a post on the White House blog. The US government is interested in what Artificial Intelligence might mean to the people of the United States, and how it could learn more about it. To find out, it is asking for comments by July 22, 2016.

Ed Felten, Deputy U.S. Chief Technology Officer, wrote the following blog post about what the government would like to learn: How to Prepare for the Future of Artificial Intelligence. He tells us that the reason for the request for public input is to learn from a wide range of people about what we can do to become ready:

Broadly, OSTP is interested in developing a view of AI across all sectors for the purpose of recommending directions for research and determining challenges and opportunities in this field. The views of the American people, including stakeholders such as consumers, academic and industry researchers, private companies, and charitable foundations, are critical to informing an understanding of current and future needs for AI in diverse fields.

The request for information can be found on the Federal Register here:
Request for Information on Artificial Intelligence

The Office of Science and Technology Policy is asking for comments that tell them about:

  1. The legal and governance implications of AI;
  2. The use of AI for public good;
  3. The safety and control issues for AI;
  4. The social and economic implications of AI;
  5. The most pressing, fundamental questions in AI research, common to most or all scientific fields;
  6. The most important research gaps in AI that must be addressed to advance this field and benefit the public;
  7. The scientific and technical training that will be needed to take advantage of harnessing the potential of AI technology, and the challenges faced by institutions of higher education in retaining faculty and responding to explosive growth in student enrollment in AI-related courses and courses of study;
  8. The specific steps that could be taken by the federal government, research institutes, universities, and philanthropies to encourage multi-disciplinary AI research;
  9. Specific training data sets that can accelerate the development of AI and its application;
  10. The role that “market shaping” approaches such as incentive prizes and Advanced Market Commitments can play in accelerating the development of applications of AI to address societal needs, such as accelerated training for low and moderate income workers (see https://www.usaid.gov/cii/market-shaping-primer); and
  11. Any additional information related to AI research or policymaking, not requested above, that you believe OSTP should consider.

If you would like to submit comments, you can do so by fax, by mail, or online at:

https://www.whitehouse.gov/webform/rfi-preparing-future-artificial-intelligence

On July 7th, a workshop and presentation on the near-term impacts of AI, titled The Social and Economic Implications of Artificial Intelligence Technologies in the Near-Term, will be livestreamed at 5:30 pm ET. Its program committee includes members from Microsoft, Google, the White House, New York University School of Law, Harvard, and Washington University. This presentation looks like it may be really interesting.



The post The US is Asking for Help Understanding the Impacts of Artificial Intelligence appeared first on SEO by the Sea.






Machine Learning Inside Google



By OSX - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=12890983

Understanding Systems

When I was in high school, one of the required classes I had to take was a shop class. I had mostly been taking what the school called "enriched" courses, which were academic classes focused primarily on reading, writing, and arithmetic. A shop class had more of a trade focus. I was surprised when the first lesson on the first day of my shop class was a richer academic experience than any of the enriched classes I had taken.

The instructor started talking about systems, and about how many manufacturing processes involve breaking products down into different systems. We were going to start off by building an electric motor, which was an important part of the electrical systems within automobiles. This idea of looking at the internal functions of vehicles, classifying their parts, and understanding how they fit together was an exciting and interesting perspective. I'm reminded of that approach to understanding systems by a newly granted Google patent that uses a machine learning algorithm to classify support pages and understand how they might fit together.

Google Refocusing Upon Machine Learning

Steven Levy, author of In the Plex, which reveals much about the earliest days of Google, has been sharing more with us, including a look at how Google has started relying upon machine learning approaches, in a recent post titled How Google is remaking itself as a machine learning first company.

I found the machine learning article interesting after reading a recently granted Google patent that attempts to understand what pages on the Web are about using a classification approach. The patent had me asking myself, "Is Google going to say goodbye to PageRank for a new way of ranking web pages that doesn't rely upon links from other sites?" Google has used PageRank to rank pages since its earliest days.

This new patent focuses upon a way of classifying data that uses an approach based upon "a hierarchical taxonomy of clustered data."

The patent starts off by using an example of how information for a support center works. The patent tells us that keywords might be extracted from documents about providing support to users in a way that generates clusters of documents with similar keywords.

A classification algorithm might be used where classifications are based upon a taxonomy and documents are classified with a confidence level.

This is an interesting way of looking at the Web, and understanding its different parts and how they fit together.

The Patent

It wasn't until I looked at the LinkedIn profile for Nadav Benbarak that I gained a sense of why this patent came about. Describing his experience at Google, the profile tells us about one project he worked on:

Product manager for new product development effort. Managed product vision, roadmap, design, and implementation. Led 15-person team of engineers and operations specialists.

• Created a new project to develop a suite of tools and data sources for reporting on the quality and effectiveness of Google’s customer service. Secured buy-in from senior management and garnered funding for 10 engineers.

• Launched a new machine learning algorithm for summarizing customer feedback from millions of users. This information drove significant product and operations improvements for the AdWords business.

• Core member of internal consulting team advising Google’s President of Sales on customer service strategy. Drove thought leadership for analysis plan. Team’s recommendations led to a reorganization of Google’s service team and a new initiative to increase customer satisfaction

This project, on which he was the product manager, appears to have been the inspiration behind this patent and how it works:

Methods and systems for classifying data using a hierarchical taxonomy
Inventors: Glenn M. Lewis, Kirill Buryak, Aner Ben-Artzi, Jun Peng, Nadav Benbarak
Assignee: Google
US Patent 9,367,814
Granted June 14, 2016
Filed: June 22, 2012

Abstract

A method and system for classifying documents is provided. A set of document classifiers is generated by applying a classification algorithm to a trusted corpus that includes a set of training documents representing a taxonomy. One or more of the generated document classifiers are executed against a plurality of input documents to create a plurality of classified documents. Each classified document is associated with a classification within the taxonomy and a classification confidence level. One or more classified documents that are associated with a classification confidence level below a predetermined threshold value are selected to create a set of low-confidence documents. The low-confidence documents are disassociated from each of the associated classifications. A user is prompted to enter a classification within the taxonomy for at least one low-confidence document. The low-confidence document is associated with the entered classification and with a predetermined confidence level to create a newly classified document.
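The second half of the abstract describes a human-in-the-loop step: documents the classifiers are unsure about are stripped of their labels, a person assigns a classification, and the new label gets a predetermined confidence. Here is a minimal sketch of just that step, assuming the classifiers have already run; the threshold values, the `ask_user` callback standing in for prompting a person, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical values; the abstract speaks only of "a predetermined threshold
# value" and "a predetermined confidence level".
LOW_CONFIDENCE_THRESHOLD = 0.5
USER_ASSIGNED_CONFIDENCE = 1.0


@dataclass
class ClassifiedDocument:
    doc_id: str
    classification: Optional[str]  # a node in the taxonomy, e.g. "billing/refunds"
    confidence: float


def relabel_low_confidence(
    docs: List[ClassifiedDocument],
    ask_user: Callable[[ClassifiedDocument], str],
) -> List[ClassifiedDocument]:
    """Send low-confidence documents to a person and record their label."""
    low_confidence = [d for d in docs if d.confidence < LOW_CONFIDENCE_THRESHOLD]
    for doc in low_confidence:
        # Disassociate the document from its machine-assigned classification.
        doc.classification = None
        doc.confidence = 0.0
        # Prompt a user (here, a callback) for a classification in the taxonomy,
        # then attach the predetermined confidence level.
        doc.classification = ask_user(doc)
        doc.confidence = USER_ASSIGNED_CONFIDENCE
    return low_confidence


docs = [
    ClassifiedDocument("ticket-1", "billing/payment processing", 0.92),
    ClassifiedDocument("ticket-2", "policy", 0.31),
]
relabeled = relabel_low_confidence(docs, ask_user=lambda d: "billing/refunds")
print([(d.doc_id, d.classification, d.confidence) for d in docs])
```

Only `ticket-2` falls below the threshold, so only it is sent for human review; the confident classification for `ticket-1` is left alone.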

Take Aways

We have an idea of how the process described in this patent was used at Google: to help build a customer support taxonomy. It focused upon classifying customer support issues involving things such as "account management, billing, campaign management, performance, policy, etc."

The patent explains how useful it would be to collect information and make it available to customer support representatives, by breaking topics such as billing down into details such as "payment processing, credits, refunds, etc."

For instance, the patent tells us that in the subcategory of payment processing, there are issues such as:

1) Customer has questions on activation fee;
2) Customer’s account is marked delinquent;
3) Customer has questions on account cancellation; and
4) Customer has questions on forms of payment and/or invoicing.

The patent provides a rich look at how this taxonomy may have been helpful when having to supply information to advertising customers.

The patent shows us information about how the classification algorithm it uses might work to cluster documents and organize them in a hierarchical manner, like this:

In the above example, the clustering module may define a cluster that contains documents (or references to documents) having both the words “inbox” and “capacity” in their text. Another cluster may include documents having both the words “drop” and “call,” and so on. In some implementations, one or more rules can specify, e.g., what words may be used for clustering, the frequency of such words, and the like. For example, the clustering module can be configured to group together documents where a given word or synonyms of the given word are present more than five times. In another example, the clustering module can be configured to group together documents where any of a pre-defined set of words is present at least once.
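The rules in that passage (all of several words present, a word plus its synonyms appearing more than five times, any word from a pre-defined set present at least once) can be expressed as simple predicates over a document's words. The following is a minimal sketch under stated assumptions: the tokenizer, the hand-supplied synonym list, and every function name are hypothetical, and a document is allowed to join every cluster whose rule it matches.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

Rule = Callable[[str], bool]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def contains_all(*words: str) -> Rule:
    """Rule: every listed word appears in the document."""
    return lambda text: set(words) <= set(tokenize(text))


def word_or_synonyms_more_than(word: str, synonyms: List[str], times: int) -> Rule:
    """Rule: the word and its synonyms together appear more than `times` times."""
    def rule(text: str) -> bool:
        counts = Counter(tokenize(text))
        return counts[word] + sum(counts[s] for s in synonyms) > times
    return rule


def contains_any(*words: str) -> Rule:
    """Rule: at least one word from a pre-defined set appears at least once."""
    return lambda text: bool(set(words) & set(tokenize(text)))


def cluster(docs: Dict[str, str], rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Assign each document id to every cluster whose rule it satisfies."""
    return {name: [doc_id for doc_id, text in docs.items() if rule(text)]
            for name, rule in rules.items()}


rules = {
    "inbox capacity": contains_all("inbox", "capacity"),
    "dropped calls": contains_all("drop", "call"),
    "refunds": word_or_synonyms_more_than("refund", ["reimbursement"], 5),
}
docs = {
    "t1": "My inbox is at capacity and new mail bounces.",
    "t2": "Every call I make seems to drop after a minute.",
    "t3": "Inbox full? Check the capacity settings.",
}
print(cluster(docs, rules))
```

Exact word matching is a deliberate simplification here; "dropped" would not match "drop" without stemming, which a real clustering module would likely need.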

Google has started using machine learning processes to solve problems like customer support. This approach aims at making it easier for people inside of Google to help solve customer problems by better understanding those problems and organizing information about how to solve them.

As an SEO, I was a little excited to see a section that described how Google may rank solutions to problems. This doesn't appear to be a replacement for PageRank, at least not yet. But the roots of organizing a web full of information may be found in solving smaller tasks first. This is the passage about ranking documents from the patent. It feels (to me) like there are some hints in there about how documents on the Web might be ranked in response to queries from searchers:

In some implementations, the document clusters may be ranked using the ranking module, which may also be executed on the server.

In some implementations, the ranking module ranks document clusters according to one or more metrics. For example, the ranking module may rank the clusters according to the quantity of documents in each cluster, as a cluster with many documents may represent a relatively significant topic (e.g., product issue).

As another example, the ranking module may rank the clusters according to an estimated time to resolution of an issue represented by the cluster (e.g., issues represented by a cluster “software update” may typically be resolved faster than issues represented by a cluster “hardware malfunction”), a label assigned to the cluster, a number of documents in a cluster, a designated importance of subject matter associated with a cluster, identities of authors of documents in a cluster, or a number of people who viewed documents in a cluster, etc.

In an example, a cluster representing an issue that historically has taken a longer time to resolve may be ranked higher than a cluster representing an issue with a shorter historical time to resolution.

In another example, several metrics are weighted and factored to rank the clusters. The ranking module can be configured to output the rankings to a storage device (e.g., in the form of a list or other construct).
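The passage lists candidate metrics and says several may be "weighted and factored" to rank clusters. A minimal sketch of one way to do that, a weighted sum, follows. The weights, the choice of three metrics, and all names are assumptions for illustration; raw values are used for simplicity, though in practice metrics on very different scales would probably need normalizing first.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClusterStats:
    name: str
    num_documents: int
    avg_hours_to_resolution: float
    num_viewers: int


# Hypothetical weights; the patent does not give any. A positive weight on
# resolution time ranks slow-to-resolve issues higher, as in the example.
WEIGHTS: Dict[str, float] = {
    "num_documents": 1.0,
    "avg_hours_to_resolution": 2.0,
    "num_viewers": 0.1,
}


def cluster_score(cluster: ClusterStats) -> float:
    """Weighted sum of the cluster's metrics."""
    return sum(weight * getattr(cluster, metric) for metric, weight in WEIGHTS.items())


def rank_clusters(clusters: List[ClusterStats]) -> List[ClusterStats]:
    """Return clusters ordered from highest to lowest score."""
    return sorted(clusters, key=cluster_score, reverse=True)


clusters = [
    ClusterStats("software update", num_documents=40, avg_hours_to_resolution=2.0, num_viewers=100),
    ClusterStats("hardware malfunction", num_documents=30, avg_hours_to_resolution=48.0, num_viewers=50),
]
print([c.name for c in rank_clusters(clusters)])  # ['hardware malfunction', 'software update']
```

With these weights, the "hardware malfunction" cluster outranks "software update" despite having fewer documents, because its longer time to resolution dominates the score.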

We've heard about a machine learning approach from Google used on web pages called RankBrain, which appears to be focused upon rewriting queries in a way that helps produce relevant search results for them. We've been told by Google that there is no RankBrain score and you don't optimize for it.

What role may machine learning play in how information on the Web is returned in response to queries? We don't know at this point, and it's possible that there's a lot of learning about machine learning going on at Google these days, as in the building of the automated customer support taxonomy algorithm described in this patent. It appears to have helped solve some problems Google was experiencing. We'll see if it helps Google fulfill its mission to "organize the world's information and make it universally accessible and useful."



The post Machine Learning Inside Google appeared first on SEO by the Sea.






How Google May Map a Query to an Entity for Suggestions



Search predictions come from:

– The terms you’re typing.

– What other people are searching for, including trending searches. Trending searches are popular stories in your area that change throughout the day. Trending searches aren’t related to your search history.

– Relevant searches you’ve done in the past (if you’re signed in to your Google Account and have Web & App Activity turned on).

Note: Search predictions aren’t the answer to your search, and they’re not statements by other people or Google about your search terms.

~ Search on Google using autocomplete

A website by the name of SourceFed produced a video that claimed that Google was intentionally manipulating search results to make Hillary Clinton look good, because it wasn’t showing results tied to her name that SourceFed insisted Google should be showing.

SEO Consultant Rhea Drysdale posted a response on Medium that shot holes in their argument. Rhea started off with:

SourceFed believes Google is manipulating search results in favor of Hillary Clinton, because “Hillary Clinton cri-” did not return “Hillary Clinton criminal charges” and “Hillary Clinton in-” did not return “Hillary Clinton indictment.”

I thought it was interesting that on May 31, Google was granted a new patent describing one way it might generate suggestions and autocomplete responses to queries, and I thought it was worth looking at. It also interested me because it addresses how entity information might be used with autocomplete suggestions. The patent is:

Associating an entity with a search query
Inventors: Olivier Jean Andre Bousquet, Oskar Sandberg, Sylvain Gelly, Randolph Gregory Brown
Assignee: Google
US Patent 9,355,140
Granted: May 31, 2016
Filed: March 13, 2013

Abstract

Methods and apparatus for associating an entity with at least one search query. Some implementations are directed to methods and apparatus for identifying multiple queries associated with an entity and identifying one or more of the queries as an entity search query that provides desired search results for the entity. Some implementations are directed to methods and apparatus for identifying a particular entity and, in response to identifying the particular entity, identifying an entity search query corresponding to the particular entity.

The process described in this patent provides search suggestions to searchers using a query-to-entity mapping intended to highlight new aspects of entities and queries and to provide improved search results. It is a fairly complicated process, and it is worth looking at to get a better sense of what goes on behind the curtain, so that we don't make poor assumptions when Google doesn't do what we expect it to.

When we type Hillary Clinton into a Google search box, we see a number of query terms that Google presents as autosuggestions.

Hillary Clinton Auto Suggestions

When we choose one of those, like the term “email,” we see some additional words added to that query term:

Hillary Clinton email query suggestions

If we follow the suggestion [hillary clinton email charges], we see a story about the possibility of criminal charges being filed against the candidate:

Hillary Clinton email query charges results

Google's algorithm mapped a query to the entity "Hillary Clinton" using the terms "email charges" rather than "criminal charges," which is how SourceFed guessed Google should map the topic of that query. SourceFed didn't map out the query the way that Google did, but Google did have autosuggestions that covered that topic. If we compare Google Trends information for both terms added to the entity "Hillary Clinton," they seem to be close to each other in how much interest searchers appear to have shown in each query:

Email Charges vs. Criminal Charges trends

Take Aways

I was left wondering why this patent doesn't discuss trends, and whether I would have to look for another patent that did (I chose to look).

This patent doesn't mention the use of Google Trends in identifying queries to map to entities, but we do know that Google Trends has used the machine identifiers (MIDs) assigned to entities in Freebase.

This patent does tell us that properties associated with some entities may be identified at online encyclopedias such as Freebase, and that entities may be assigned unique entity identifiers.

This patent does focus upon how it might help tell one entity apart from another using properties associated with different entities, and it uses the entity "Sting" as an example, since a well-known musician and a well-known professional wrestler both use that name, and they are different people:

Also, for example, in some implementations, the query suggestion system 135 may identify one or more entities associated with a received query via the query to entity association database 125. The query suggestion system 135 may provide one or more query suggestions based on the identified entities, with each of the query suggestions being particularly formulated to focus on a particular entity. For example, the musician Gordon Matthew Thomas Sumner and the wrestler Steve Borden may be associated with the query “sting” in the query to entity association database 125. In response to a received query “sting”, the query suggestion system 135 may identify the musician Gordon Matthew Thomas Sumner as the dominant entity from the query to entity association database 125 and suggest an alternative query suggestion to the user, with the alternative query suggestion being particularly formulated for the musician Gordon Matthew Thomas Sumner (e.g., “sting musician”).

The query-to-entity mapping described in this patent is based upon terms describing properties, found in a knowledge base such as Freebase, that can help tell that one Sting is a musician and the other is an athlete. Basing autosuggestions on properties of those entities, and using those properties to find query terms to map to each entity, shows how carefully query terms may be selected.
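The "sting" example can be sketched as a lookup table from a query to associated entities, each with a distinguishing property and an association strength. This is a minimal sketch, not the patent's implementation: the entity IDs are placeholders rather than real Freebase MIDs, and the weights, property labels, and function names are hypothetical. It returns one entity-focused suggestion per entity, and picks the most strongly associated entity as the dominant one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EntityAssociation:
    entity_id: str                # placeholder identifier, not a real Freebase MID
    name: str
    distinguishing_property: str  # a property that tells same-named entities apart
    weight: float                 # hypothetical strength of the query-entity association


# Hypothetical contents of a query-to-entity association table.
QUERY_TO_ENTITIES: Dict[str, List[EntityAssociation]] = {
    "sting": [
        EntityAssociation("ent:musician-sting", "Gordon Matthew Thomas Sumner", "musician", 0.8),
        EntityAssociation("ent:wrestler-sting", "Steve Borden", "wrestler", 0.2),
    ],
}


def entity_suggestions(query: str) -> List[str]:
    """One suggestion per associated entity, most strongly associated first."""
    entities = sorted(QUERY_TO_ENTITIES.get(query, []), key=lambda e: e.weight, reverse=True)
    return [f"{query} {e.distinguishing_property}" for e in entities]


def dominant_entity_suggestion(query: str) -> str:
    """Suggest a query formulated for the dominant entity, or the query itself."""
    suggestions = entity_suggestions(query)
    return suggestions[0] if suggestions else query


print(entity_suggestions("sting"))          # ['sting musician', 'sting wrestler']
print(dominant_entity_suggestion("sting"))  # sting musician
```

Appending a distinguishing property is only one way to "formulate" an entity-specific query; the point is that the suggestion wording comes from the entity's properties, not from whatever phrase a particular searcher expects to see.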

Since that patent focuses upon queries that might fit best with different entities, I looked at other patents that involved autocomplete to see what they said about using trend information. This one showed how trend information and personalized search histories could be used to generate suggestions using autocomplete:

Providing customized autocomplete data
Inventors: Nicholas B. Weininger and Radu C. Cornea
Assigned to: Google
US Patent 8,868,592
Granted: October 21, 2014
Filed: May 18, 2012

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing customized autocomplete suggestions. First profile data is obtained for a first user. Second profile data is obtained for second users that submitted search queries, where the second users are different from the first user. Based on the first profile data and the second profile data, similarity scores are determined. The similarity scores are each indicative of a degree of similarity between the first user and at least one of the second users. A proper subset of the search queries is selected based on the similarity scores, and an update for an autocomplete cache of a computing device associated with the first user is generated using the selected subset of search queries. The update is provided to the computing device associated with the first user.

This patent tells us that autocomplete suggestions may be customized or personalized, and that they may also reflect trends in word usage:

Autocomplete suggestions can be customized for the interests, attributes, and behavior of a particular user or a group of users. Using an autocomplete cache, personalized autocomplete suggestions can be generated when a network connection is unavailable. Using the autocomplete cache, personalized autocomplete suggestions can be presented in a manner that limits network latencies. The autocomplete cache can be updated to reflect current topics and trends in word usage, especially topics and trends among users with similarities to a particular user.
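The abstract describes comparing one user's profile with other users' profiles, keeping queries from the similar users, and shipping them as an update to a local autocomplete cache. Here is a minimal sketch under loud assumptions: profiles are modeled as sets of interests, similarity is Jaccard similarity (a stand-in, since the patent does not name a measure here), the threshold is made up, and every name is hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

SIMILARITY_THRESHOLD = 0.5  # hypothetical cut-off


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two interest profiles (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_cache_update(
    first_profile: Set[str],
    other_users: List[Tuple[Set[str], List[str]]],
) -> Counter:
    """Collect queries from users similar to the first user into a cache update."""
    update: Counter = Counter()
    for profile, queries in other_users:
        if jaccard(first_profile, profile) >= SIMILARITY_THRESHOLD:
            update.update(queries)
    return update


def complete(cache: Counter, prefix: str, limit: int = 3) -> List[str]:
    """Offline completion: most frequent cached queries starting with the prefix."""
    matches = [(query, count) for query, count in cache.items() if query.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))  # by count, then alphabetically
    return [query for query, _ in matches[:limit]]


me = {"hiking", "cameras"}
others = [
    ({"hiking", "cameras", "travel"}, ["hiking boots", "camera lens", "hiking boots"]),
    ({"cooking"}, ["cake recipe"]),
]
cache = build_cache_update(me, others)
print(complete(cache, "hi"))  # ['hiking boots']
print(complete(cache, "ca"))  # ['camera lens']
```

Because the cache only holds queries from similar users, "cake recipe" never becomes a suggestion for this user, which is one way the "trends" behind a personalized suggestion can differ from overall search interest.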

So the "trend" information used in autocomplete for most people may not be quite the same as what is shown in Google Trends, and may be customized for each searcher performing a search.

Regardless of which autocomplete process Google is following, rather than charging Google with bias, it may be best to look at the query suggestions Google provides and see what range of topics and concepts they cover, instead of expecting certain words to show up. In this instance, "email charges" was a suggestion and "criminal charges" wasn't, but Google appeared to be covering very similar concepts with those suggestions.

Google wasn’t purposefully avoiding a topic; it was just using words it preferred to use to offer as a query suggestion.



The post How Google May Map a Query to an Entity for Suggestions appeared first on SEO by the Sea.





Source link

Share Button

How Google May Map a Query to an Entity for Suggestions



Search predictions come from:

– The terms you’re typing.

– What other people are searching for, including trending searches. Trending searches are popular stories in your area that change throughout the day. Trending searches aren’t related to your search history.

– Relevant searches you’ve done in the past (if you’re signed in to your Google Account and have Web & App Activity turned on).

Note: Search predictions aren’t the answer to your search, and they’re not statements by other people or Google about your search terms.

~ Search on Google using autocomplete

A website by the name of SourceFed produced a video that claimed that Google was intentionally manipulating search results to make Hillary Clinton look good, because it wasn’t showing results tied to her name that SourceFed insisted Google should be showing.

SEO Consultant Rhea Drysdale posted a response on Medium that shot holes in their argument. Rhea started off with:

SourceFed believes Google is manipulating search results in favor of Hillary Clinton, because “Hillary Clinton cri-” did not return “Hillary Clinton criminal charges” and “Hillary Clinton in-” did not return “Hillary Clinton indictment.”

I thought it was interesting that Google was just granted a new patent that describes one way they might be generating suggestions and autocomplete responses to queries on May 31, and thought it was worth looking at. I also thought it was interesting because it was trying to address how entity information might be used with autocomplete suggestions. The patent is:

Associating an entity with a search query
Inventors: Olivier Jean Andre Bousquet, Oskar Sandberg, Sylvain Gelly, Randolph Gregory Brown
Assignee: Google
US Patent 9,355,140
Granted: May 31, 2016
Filed: March 13, 2013

Abstract

Methods and apparatus for associating an entity with at least one search query. Some implementations are directed to methods and apparatus for identifying multiple queries associated with an entity and identifying one or more of the queries as an entity search query that provides desired search results for the entity. Some implementations are directed to methods and apparatus for identifying a particular entity and, in response to identifying the particular entity, identifying an entity search query corresponding to the particular entity.

The process described in this patent provides search suggestions using a query-to-entity mapping intended to surface new aspects of entities and queries, and to improve search results for searchers. It’s a fairly complicated process, but it is worth looking at to get a better sense of what goes on behind the curtain, so that we don’t make poor assumptions when Google doesn’t do what we expect it to.

When we search for Hillary Clinton in a Google Search Box, we see a number of query terms that Google is presenting as autosuggestions.

Hillary Clinton Auto Suggestions

When we choose one of those, like the term “email,” we see some additional words added to that query term:

Hillary Clinton email query suggestions

If we follow the suggestion [hillary Clinton email charges], we see a story that is about the possibility of criminal charges being filed against the candidate:

Hillary Clinton email query charges results

Google’s algorithm mapped a query using the terms “email charges” to the entity “Hillary Clinton,” rather than the “criminal charges” that SourceFed expected Google to use for that topic. SourceFed didn’t map out the query the way that Google did, but Google did have autosuggestions that covered that topic. If we compare Google Trends information for both terms added to the entity “Hillary Clinton,” they appear close to each other in how much interest searchers have shown in each of those queries:

Email Charges vs. Criminal Charges trends

Take Aways

I was left wondering why this patent doesn’t discuss trends, and whether I would have to look for another one that did (I chose to do that).

This patent doesn’t mention the use of Google Trends in identifying queries to map to entities, but we do know that Google Trends has used the machine IDs (MIDs) assigned to entities at Freebase.

This patent does tell us that properties associated with some entities may be identified at online encyclopedias such as Freebase, and that entities may be assigned unique entity identifiers.

This patent does focus upon how properties associated with different entities might help tell one entity apart from another, and it uses the entity “Sting” as an example, since a well-known musician and a well-known professional wrestler both use that name, and they are different people:

Also, for example, in some implementations, the query suggestion system 135 may identify one or more entities associated with a received query via the query to entity association database 125. The query suggestion system 135 may provide one or more query suggestions based on the identified entities, with each of the query suggestions being particularly formulated to focus on a particular entity. For example, the musician Gordon Matthew Thomas Sumner and the wrestler Steve Borden may be associated with the query “sting” in the query to entity association database 125. In response to a received query “sting”, the query suggestion system 135 may identify the musician Gordon Matthew Thomas Sumner as the dominant entity from the query to entity association database 125 and suggest an alternative query suggestion to the user, with the alternative query suggestion being particularly formulated for the musician Gordon Matthew Thomas Sumner (e.g., “sting musician”).
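
To make the “sting” example a little more concrete, here is a minimal Python sketch of picking a dominant entity for a query and formulating an entity-focused suggestion. It is only an illustration under stated assumptions: the association scores, the hypothetical QueryEntityAssociation structure, and the rule of appending a descriptor like “musician” to the query are all made up here, and the patent doesn’t say how its database or scoring actually work.

```python
from dataclasses import dataclass


@dataclass
class QueryEntityAssociation:
    """One row of a hypothetical query-to-entity association table."""
    entity_id: str    # hypothetical unique entity identifier
    name: str         # the entity's full name
    descriptor: str   # a property that tells same-named entities apart
    score: float      # hypothetical strength of the query-entity association


# Hypothetical stand-in for the patent's query to entity association database.
ASSOCIATIONS = {
    "sting": [
        QueryEntityAssociation("e1", "Gordon Matthew Thomas Sumner", "musician", 0.8),
        QueryEntityAssociation("e2", "Steve Borden", "wrestler", 0.2),
    ],
}


def dominant_entity(query):
    """Return the highest-scoring entity associated with the query, or None."""
    candidates = ASSOCIATIONS.get(query.strip().lower(), [])
    return max(candidates, key=lambda a: a.score, default=None)


def entity_focused_suggestion(query):
    """Formulate an alternative query focused on the dominant entity."""
    entity = dominant_entity(query)
    if entity is None:
        return None
    return f"{query.strip().lower()} {entity.descriptor}"


print(entity_focused_suggestion("Sting"))  # sting musician
```

Offering one suggestion per associated entity (so “sting wrestler” appears too) would only require returning every candidate, sorted by score, instead of the maximum.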

The query-to-entity mapping described in this patent is based upon terms describing properties found in a knowledge base such as Freebase, which can help tell that one Sting is a musician and the other is an athlete. Basing autosuggestions on properties of those entities, to find query terms that map to each entity, shows how carefully query terms may be selected.

Since that patent focuses upon queries that might fit best with different entities, I looked at other patents that involved autocomplete to see what they said about using trend information. This one showed how trend information and personalized search histories could be used to generate suggestions using autocomplete:

Providing customized autocomplete data
Inventors: Nicholas B. Weininger and Radu C. Cornea
Assigned to: Google
US Patent 8,868,592
Granted: October 21, 2014
Filed: May 18, 2012

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing customized autocomplete suggestions. First profile data is obtained for a first user. Second profile data is obtained for second users that submitted search queries, where the second users are different from the first user. Based on the first profile data and the second profile data, similarity scores are determined. The similarity scores are each indicative of a degree of similarity between the first user and at least one of the second users. A proper subset of the search queries is selected based on the similarity scores, and an update for an autocomplete cache of a computing device associated with the first user is generated using the selected subset of search queries. The update is provided to the computing device associated with the first user.
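
The abstract above describes a pipeline: compare one user’s profile to other users’ profiles, keep queries from the most similar users, and push those to the first user’s autocomplete cache. Here is a rough Python sketch of that idea. Cosine similarity over topic-weight profiles, the similarity threshold, and the max_queries limit are all assumptions made for illustration; the patent doesn’t specify how its similarity scores are computed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two profiles given as {topic: weight} dicts."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_cache_update(first_profile, other_users, threshold=0.5, max_queries=5):
    """Pick queries from users similar to the first user for an autocomplete cache update.

    other_users: list of (profile, queries) pairs for the other ("second") users.
    threshold and max_queries are hypothetical tuning knobs.
    """
    scores = {}
    for profile, queries in other_users:
        similarity = cosine_similarity(first_profile, profile)
        if similarity < threshold:
            continue  # dissimilar users contribute nothing to the update
        for query in queries:
            # Queries shared by several similar users accumulate more weight.
            scores[query] = scores.get(query, 0.0) + similarity
    ranked = sorted(scores, key=lambda q: (-scores[q], q))
    return ranked[:max_queries]


me = {"cooking": 1.0, "travel": 0.5}
others = [
    ({"cooking": 0.9, "travel": 0.4}, ["sourdough starter", "paella recipe"]),
    ({"cars": 1.0}, ["oil change cost"]),
    ({"cooking": 0.5}, ["paella recipe"]),
]
print(build_cache_update(me, others))  # ['paella recipe', 'sourdough starter']
```

With a profile leaning toward cooking and travel, queries from the cooking-focused users make it into the update, while the query from a user interested only in cars is left out.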

This patent tells us that autocomplete suggestions may be customized or personalized, and could also use trends in word usage when offering suggestions:

Autocomplete suggestions can be customized for the interests, attributes, and behavior of a particular user or a group of users. Using an autocomplete cache, personalized autocomplete suggestions can be generated when a network connection is unavailable. Using the autocomplete cache, personalized autocomplete suggestions can be presented in a manner that limits network latencies. The autocomplete cache can be updated to reflect current topics and trends in word usage, especially topics and trends among users with similarities to a particular user.
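
The cache is what lets personalized suggestions appear without a network round trip. A minimal sketch of such an on-device cache might look like the following, assuming a sorted list searched with binary search for prefix matches. The patent doesn’t describe a data structure, and the AutocompleteCache class is hypothetical.

```python
import bisect


class AutocompleteCache:
    """Hypothetical on-device suggestion cache that works without a network connection."""

    def __init__(self):
        self._queries = []  # kept sorted so prefix lookups can use binary search

    def apply_update(self, queries):
        """Merge an update (e.g. queries picked from similar users) into the cache."""
        merged = set(self._queries) | {q.lower() for q in queries}
        self._queries = sorted(merged)

    def suggest(self, prefix, limit=3):
        """Return up to `limit` cached queries starting with `prefix`, alphabetically."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._queries, prefix)
        results = []
        for query in self._queries[start:]:
            # Sorted order means all matches are contiguous, so stop at the first miss.
            if len(results) == limit or not query.startswith(prefix):
                break
            results.append(query)
        return results


cache = AutocompleteCache()
cache.apply_update(["paella recipe", "Paris weather", "sourdough starter"])
print(cache.suggest("pa"))  # ['paella recipe', 'paris weather']
```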

So, the “trend” information used in autocomplete for most people may not be quite the same as what is shown in Google Trends, but may be customized for each searcher performing a search.

Regardless of which autocomplete process Google is following, rather than charging Google with bias, it may be best to look at the query suggestions Google provides and the range of topics and concepts they cover, instead of expecting certain words to show up. In this instance, “email charges” was a suggestion and “criminal charges” wasn’t, but Google appeared to be covering very similar concepts with those suggestions.

Google wasn’t purposefully avoiding a topic; it was just using words it preferred to use to offer as a query suggestion.

The post How Google May Map a Query to an Entity for Suggestions appeared first on SEO by the Sea.

Yahoo Assigns 2648 Patents to Mystery Excalibur IP, LLC Group


Google is possibly best known for patenting an algorithm that sorted and ordered search results based upon a metric known as PageRank, named after Google co-founder Lawrence Page, who worked on it while he was a student at Stanford University. Yahoo started off as a web directory that became a search engine, and the patent it might be best known for is one describing paid search, which it acquired when it purchased Overture (originally GoTo.com), and which it used to sue Google successfully (winning a settlement out of the litigation). That patent appears to have been assigned by Yahoo, along with a number of other patents, last month.

A couple of weeks ago, an article on Yahoo’s patent portfolio ran and provided some insights into what value those patents might hold. The article, Yahoo Has a Strong Patent Portfolio, But Reported Valuation is Too High, gives us some ideas regarding how much the search engine’s patents might be worth (4 billion?), and what Yahoo has been doing with them. Has Yahoo sold a good number of those patents roughly a week after that article ran? We don’t know for certain. It’s possible that they may have sold them to one of the remaining bidders for the company: Exclusive: Yahoo’s bidder shortlist points to cash deal -sources.

On April 18th, 2016, an assignment was recorded at the United States Patent and Trademark Office (USPTO) for a transaction that appears to have been executed on April 18th, 2016, involving the assignment of 2648 patents from Yahoo! Inc. to Excalibur IP, LLC. It’s possible that the name was created just to hold the patents temporarily. The address the assignment gives for Excalibur is “701 FIRST AVENUE SUNNYVALE, CALIFORNIA UNITED STATES OF AMERICA 94089”. A search for that address points to the headquarters of Yahoo!, as we see in the knowledge panel below, so the actual purchaser remains unknown.

Yahoo knowledge panel

The transaction includes a patent that was at the heart of litigation between Yahoo! and Google. Yahoo! had purchased Overture, which had patented paid search, and Yahoo! sued Google for adopting a paid search model that was said to be similar. In August of 2004, Yahoo! and Google settled the case. In the article Yahoo! and Google Resolve Disputes, we are told about the settlement between the two companies:

“Under the terms of the settlement agreement, Google will take a license to U.S. Patent No. 6,269,361 and several related patents, held by Yahoo!’s wholly-owned subsidiary, Overture, and Yahoo! dismissed its patent lawsuit against Google. The two parties have also resolved a dispute regarding shares issuable to Yahoo! pursuant to a warrant to purchase Google shares in connection with a 2000 services agreement.”

The patent is:

System and Method For Influencing a Position on a Search Result List Generated by a Computer Network Search Engine
Inventors: Thomas A. Soulanille, James B. Gallinatti Jr., Darren J. Davis, Matthew Derer, Johann Garcia, Larry Greco, Tod E. Kurt, Thomas Kwong, Jonathan C. Lee, Ka Luk Lee, Preston Pfarner, Steve Skovran
US Patent 6,269,361
Filed: May 28, 1999
Granted: July 31, 2001

We don’t know the financial terms of this transaction, or who the ultimate parties may be. The USPTO site search tells us that there are 2179 granted Yahoo! patents listed there, and 3151 pending Yahoo! patent applications. In addition to a number of patents involving paid search, the assignment also includes a number that involve web search, semantic search, image search, and a wide range of software applications. Have they been transferred to a new owner already, or are they now being held by a holding company? The patent behind paid search is included in that portfolio. We will see who it ends up going to.

The post Yahoo Assigns 2648 Patents to Mystery Excalibur IP, LLC Group appeared first on SEO by the Sea.

Google’s Reasonable Surfer Patent Updated

Systems and methods consistent with the principles of the invention may provide a reasonable surfer model that indicates that when a surfer accesses a document with a set of links, the surfer will follow some of the links with higher probability than others. This reasonable surfer model reflects the fact that not all of the links associated with a document are equally likely to be followed. Examples of unlikely followed links may include “Terms of Service” links, banner advertisements, and links unrelated to the document.

Google’s original PageRank algorithm is based upon what its inventor referred to as the Random Surfer model, which ranks pages on the Web based upon the probability that a person following links at random might end up on a particular page:

The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links. The constant α in the formula is interpreted as the probability that the web surfer will jump randomly to any web page instead of following a forward link.
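
The random surfer idea in that quote can be computed with a simple power iteration. In this Python sketch, alpha plays the role of the constant α, the probability of jumping to a random page; the value 0.15 and the handling of pages without outgoing links are assumptions, not something taken from the quote.

```python
def pagerank(links, alpha=0.15, iterations=100):
    """Random-surfer PageRank computed by power iteration.

    links: dict mapping every page to the list of pages it links to
           (each link target must also be a key).
    alpha: probability that the surfer jumps to a random page instead of
           following a forward link (the constant alpha in the quote).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives its share of random jumps.
        new_rank = {page: alpha / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Follow a forward link, chosen uniformly at random.
                share = (1 - alpha) * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dead end: spread this page's rank evenly (a common convention).
                for other in pages:
                    new_rank[other] += (1 - alpha) * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({page: round(r, 3) for page, r in ranks.items()})
```

For this tiny three-page web, where A links to B and C, B links to C, and C links back to A, page C ends up with the highest rank because it collects links from both other pages.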

Years later, some search engineers at Google came out with a newer patent based upon something referred to as the Reasonable Surfer model. It looks at the different probabilities that a person might click on different links, and uses those probabilities to estimate how likely it is that someone will follow links to specific pages on the web and end up at one of those pages.

I wrote about this patent in a post from 2010 which I titled, Google’s Reasonable Surfer: How The Value Of A Link May Differ Based Upon Link And Document Features And User Data

Patents do sometimes get updated by the people who originally file them. These updates often take the shape of changes to the claims within the patents.

These changes may reflect a change in the way that the processes described within the patent operate.

It’s the claims section that is changed when one of these continuation patents is filed, because patent examiners at the patent office look at the claims and compare them to the claims of other patents, to make sure that the new claims don’t copy those of other granted patents in a way that could be said to infringe them.

A continuation patent is called that because it “continues” the protection given by the original version of the patent and is given a date of coverage that begins with the original filing date of the original version of the patent.

The continuation patent is:

Ranking documents based on user behavior and/or feature data
Inventors: Jeffrey A. Dean, Corin Anderson, and Alexis Battle
Assigned to: Google
US Patent 9,305,099
Granted: April 5, 2016
Filed: January 10, 2012

Abstract

A system generates a model based on feature data relating to different features of a link from a linking document to a linked document and user behavior data relating to navigational actions associated with the link. The system also assigns a rank to a document based on the model.

As I pointed out in my original post about the Reasonable Surfer patent, it changes the amount of PageRank that might flow through a link based upon different features associated with that link. If a link is in the main content area of a page, uses a font and color that make it stand out, and uses text that makes it likely that someone will click on it, then it could pass along a fair amount of PageRank. On the other hand, if it combines features that make it less likely to be clicked, such as being in the footer of a page, in the same text color and font as the rest of the text on that page, with anchor text that doesn’t interest people, it may not pass along much PageRank.
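
Here is a rough illustration of the difference between the two models. Under a random surfer, a page’s rank is split evenly across its links; under a reasonable surfer, it could be split according to how likely each link is to be clicked. The feature names and multipliers below are entirely hypothetical, since the patent doesn’t publish any weights; the sketch only shows how feature-based weights would change the share of PageRank each link passes along.

```python
# Hypothetical multipliers; the patent does not publish any feature weights.
FEATURE_WEIGHTS = {
    "main_content": 1.0,
    "footer": 0.1,
    "stands_out": 1.5,   # distinctive font or color
    "blends_in": 0.7,    # same font and color as the surrounding text
}


def link_weight(features):
    """Combine a link's features into a relative likelihood of being clicked."""
    weight = 1.0
    for feature in features:
        weight *= FEATURE_WEIGHTS.get(feature, 1.0)  # unknown features are neutral
    return weight


def distribute_rank(page_rank, links):
    """Split a page's rank across its links in proportion to each link's weight.

    links: list of (target_url, features) pairs.
    """
    weights = [link_weight(features) for _, features in links]
    total = sum(weights)
    shares = {}
    if total == 0:
        return shares
    for (target, _), weight in zip(links, weights):
        shares[target] = shares.get(target, 0.0) + page_rank * weight / total
    return shares


shares = distribute_rank(1.0, [
    ("/article", ["main_content", "stands_out"]),
    ("/terms-of-service", ["footer", "blends_in"]),
])
print({url: round(s, 3) for url, s in shares.items()})
# {'/article': 0.955, '/terms-of-service': 0.045}
```

With these made-up weights, a prominent link in the main content passes along about 95% of the page’s rank, while a footer link that blends in passes along under 5%, instead of the even 50/50 split a random surfer would give them.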

So, how have the claims for this patent changed, and how does that change the Reasonable Surfer model?

I’m seeing the claims refer to anchor text more frequently, and to how much weight might be passed along based upon the probability that people might click on a link. Here is some language that stands out to me, from the first new claim in the patent:

… a rank for a particular document, generating the rank including: determining particular feature data associated with a link to the particular document, the particular feature data identifying one or more attributes of the link, determining a weight indicating a probability of the link being selected, the weight being determined based on the particular feature data and selection data, the selection data identifying user behavior relating to links to other documents …the weight indicating a higher probability of the link being selected when the particular feature data corresponds to feature data associated with the one or more links than when the particular feature data corresponds to feature data associated with the one or more other links…words in anchor text associated with the links, and a quantity of the words in the anchor text

The claims in the original version of Ranking documents based on user behavior and/or feature data are different, and these newer claims seem to place more emphasis on the weight passed along by a link being based upon the probability that people will click on that link when they find it on a page.
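
The new claim ties a link’s weight to feature data plus selection data from user behavior on other links. One simple way to picture that is to estimate a click-through rate for each combination of link features, then reuse that rate for new links with matching features. In this sketch, a feature signature is a hypothetical (location, number of anchor-text words) pair, echoing the claim’s mention of “a quantity of the words in the anchor text,” and the smoothing priors are assumptions rather than anything the patent describes.

```python
from collections import defaultdict

# Hypothetical smoothing priors: act as if every signature had 1 click in 10 views.
PRIOR_CLICKS = 1.0
PRIOR_VIEWS = 10.0


def learn_selection_rates(observations):
    """Estimate a selection probability for each feature signature.

    observations: iterable of (signature, views, clicks) logged for links to
    other documents; a signature here is a hypothetical
    (location, anchor_word_count) tuple.
    """
    views = defaultdict(float)
    clicks = defaultdict(float)
    for signature, seen, selected in observations:
        views[signature] += seen
        clicks[signature] += selected
    return {
        sig: (clicks[sig] + PRIOR_CLICKS) / (views[sig] + PRIOR_VIEWS)
        for sig in views
    }


def link_selection_weight(signature, rates):
    """Weight for a new link: the rate learned for matching features, else the prior."""
    return rates.get(signature, PRIOR_CLICKS / PRIOR_VIEWS)


rates = learn_selection_rates([
    (("main_content", 3), 1000, 120),
    (("footer", 1), 1000, 5),
    (("main_content", 3), 500, 60),
])
print(round(link_selection_weight(("main_content", 3), rates), 3))  # 0.12
print(round(link_selection_weight(("footer", 1), rates), 3))        # 0.006
```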

It’s no longer a “random” probability, but now seems to be even more “reasonable” than it was in the first version of the Reasonable Surfer patent.

The post Google’s Reasonable Surfer Patent Updated appeared first on SEO by the Sea.

Satisfaction a Future Ranking Signal in Google Search Results?


Do you search through Google on your phone? How do you know whether or not Google is watching you as you do, and keeping an eye on whether or not you like the results you receive during your searches? Could satisfaction with search results be a ranking signal that Google may use now, or in the future?

A newly published Google patent application describes technology that would modify the scoring and ranking of query results using biometric indicators of user satisfaction or negative engagement with a search result. In other words, Google would track how satisfied or unsatisfied someone might be with search results and, using machine learning, build a model based upon that satisfaction, raising or lowering search results for a query. This kind of reaction might be captured using the camera on a searcher’s phone to see their reaction to a search result, as depicted in the following screenshot from the patent:

Google biometric responses

This satisfaction would be determined by Google tracking and measuring a user’s biometric parameters after a search result is presented to them, to determine whether those parameters indicate negative engagement with that search result.

For example, someone searches for “Seafood Restaurants,” and the top result is a restaurant they have visited before and didn’t like, causing them to frown, which may be captured by their phone’s camera. That reaction may be seen as a negative signal by the search engine, and could potentially count against that restaurant ranking highly for that query. The patent tells us that such a reaction may influence search results for multiple searchers:

The actions include providing a search result to a user; receiving one or more biometric parameters of the user and a satisfaction value; and training a ranking model using the biometric parameters and the satisfaction value. Determining that one or more biometric parameters indicate likely negative engagement by the user with the first search result comprises detecting:

  • Increased body temperature
  • Pupil dilation
  • Eye twitching
  • Facial flushing
  • Decreased blink rate
  • Increased heart rate.
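
The list above names the signals but not how they would be measured or combined. A minimal Python sketch could look like the following, assuming readings taken before and after a result is shown, made-up thresholds, and a hypothetical rule that two or more signals count as likely negative engagement.

```python
from dataclasses import dataclass


@dataclass
class Biometrics:
    """One hypothetical reading of the signals the patent lists."""
    body_temp_c: float
    pupil_diameter_mm: float
    blink_rate_per_min: float
    heart_rate_bpm: float
    eye_twitching: bool = False
    facial_flushing: bool = False


# Made-up cutoffs; the patent names the signals but gives no numbers.
TEMP_RISE_C = 0.3
PUPIL_DILATION_MM = 0.5
BLINK_DROP_PER_MIN = 3.0
HEART_RATE_RISE_BPM = 8.0


def negative_signals(baseline, after):
    """List which of the patent's negative-engagement signals appear after a result is shown."""
    signals = []
    if after.body_temp_c - baseline.body_temp_c > TEMP_RISE_C:
        signals.append("increased body temperature")
    if after.pupil_diameter_mm - baseline.pupil_diameter_mm > PUPIL_DILATION_MM:
        signals.append("pupil dilation")
    if after.eye_twitching:
        signals.append("eye twitching")
    if after.facial_flushing:
        signals.append("facial flushing")
    if baseline.blink_rate_per_min - after.blink_rate_per_min > BLINK_DROP_PER_MIN:
        signals.append("decreased blink rate")
    if after.heart_rate_bpm - baseline.heart_rate_bpm > HEART_RATE_RISE_BPM:
        signals.append("increased heart rate")
    return signals


def likely_negative_engagement(baseline, after, min_signals=2):
    """Hypothetical rule: treat two or more signals as likely negative engagement."""
    return len(negative_signals(baseline, after)) >= min_signals


before = Biometrics(36.8, 3.0, 15.0, 70.0)
after_result = Biometrics(36.9, 3.1, 10.0, 82.0)
print(negative_signals(before, after_result))  # ['decreased blink rate', 'increased heart rate']
```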

The patent is:

Ranking Query Results Using Biometric Parameters
Inventors: Jason Sanders, Gabriel Taubman
Assignee: Google
US Patent Application 20160103833
Published: April 14, 2016
Filed: February 28, 2013

Abstract

Methods, systems, and apparatus, including computer program products, for providing query results using biometric parameters. One of the methods includes providing a search result in response to receiving a search query. If one or more of biometric parameters of a user indicate likely negative engagement by the user with the first search result, an additional search result is obtained and provided in response to the search query.
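
The abstract’s flow (provide a result, check for negative engagement, and if it is found, obtain and provide an additional result) can be sketched as a small loop. The is_negative callback is a hypothetical stand-in for whatever biometric check the system would run, and the feedback list is just one assumed way the (result, reaction) pairs might be kept for training a ranking model.

```python
def serve_results(ranked_results, is_negative):
    """Show results in order, moving to an additional result only after a
    negative reaction to the previous one.

    ranked_results: candidate results for the query, best first.
    is_negative: hypothetical callback returning True when the user's biometric
                 parameters after seeing a result indicate negative engagement.
    Returns the results shown and (result, negative) pairs that could later
    serve as training data for a ranking model.
    """
    shown, feedback = [], []
    for result in ranked_results:
        shown.append(result)
        negative = is_negative(result)
        feedback.append((result, negative))
        if not negative:
            break  # the user seems satisfied, so stop here
    return shown, feedback


shown, feedback = serve_results(
    ["Restaurant A", "Restaurant B", "Restaurant C"],
    is_negative=lambda result: result == "Restaurant A",  # the searcher frowned at A
)
print(shown)  # ['Restaurant A', 'Restaurant B']
```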

Take Aways

When I think of how often I get my face right up to my phone’s screen while searching for something, the idea that Google might use the phone’s camera to capture my facial expressions as I’m looking at results doesn’t surprise me. Would Google use such signals to rank search results, or build a model of biometric reactions to search results? It’s an interesting question. Instead of social media likes or dislikes, these rankings would be based upon what would be perceived as actual likes or dislikes.

Could you envision Google using an approach like this one in ranking search results?

The post Satisfaction a Future Ranking Signal in Google Search Results? appeared first on SEO by the Sea.

Selecting Entities on Sites and Performing Tasks On Them Through Google


Visitors to a website may want to perform certain actions related to Entities (specific places or people or things) that are displayed to them on the Web.

For example, at a page for a restaurant (an entity), a person viewing the site may want to make a reservation or get driving directions to the restaurant from their current location. Doing those things may require a person to take a number of steps, such as selecting and copying the name of the restaurant, pasting it into a search box and submitting it as a search query, selecting the site from search results, determining whether making a reservation is possible on the site, and then providing the information necessary to make a reservation. Getting driving directions may also require multiple steps.

Doing this on a touch screen device may be even more difficult, because the person would then be limited to touch input.

A patent granted to Google this week describes a way to easily identify and select an entity, such as a restaurant, on a touch device, and take some action associated with that entity based upon the context of the site the entity is found on. Actions such as booking a reservation at a restaurant found on a website, getting driving directions to that restaurant, or other actions could then be easily selected by the site’s visitor.

The patent is:

Semantic selection and purpose facilitation
Inventors: Paul Nordstrom, Casey Stuart Whitelaw
Assignee: Google
US Patent 9,305,108
Granted: April 5, 2016
Filed: October 5, 2012

Abstract

Computer-implemented methods for proposing actions to a user to select based on the user’s predicted purpose for selecting content are provided. In one aspect, a method includes receiving an identifier of a referent entity associated with user-selectable content, identifying, based on a prediction of a purpose in selecting the content, at least one action to be executed that is associated with the entity, and providing, for display, at least one identifier of the at least one action to the device for selection by a user. Systems, graphical user interfaces, and machine-readable media are also provided.

How an entity and actions might be selected by a site visitor

A person searches for a site using text such as “sushi restaurants in Mountain View.” That person then uses touch input to circle the text “we love Ramen Sushi out of all of the places we’ve been to” on the web page they found with that search. Based on the content they chose and the context of their selection, the system decides that the viewer of the page has selected “Ramen Sushi,” and it proposes that entity to the user. The user can confirm that, and is then given a number of actions to perform on the entity based on the context of that selection.

Someone circles an entity on a touch screen to perform actions on it.

The context can include:

  • The current location of the device
  • A past location of the device
  • The type of the device
  • A previous action associated with the entity taken by the user or another user
  • A search query
  • Information on another user associated with the user
  • The file from which the user-selectable content was selected
  • The remaining content from which the user-selectable content was selected

Actions might then be displayed that could include:

  • Directions to Ramen Sushi
  • Make a reservation at Ramen Sushi
  • Operating hours for Ramen Sushi
  • Reviews of Ramen Sushi
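
The Ramen Sushi walkthrough boils down to two lookups: find a known entity inside the selected text, then look up actions based on the entity’s descriptor. This Python sketch assumes a tiny hypothetical entity database and plain case-insensitive substring matching; it ignores the context signals listed above (location, device type, and so on), which the patent says may also shape the actions offered.

```python
# Hypothetical entity database: lowercase name -> descriptor.
ENTITY_DB = {
    "ramen sushi": "restaurant",
    "george washington": "notable person",
}

# Hypothetical action templates keyed by descriptor.
ACTIONS_BY_DESCRIPTOR = {
    "restaurant": [
        "Directions to {name}",
        "Make a reservation at {name}",
        "Operating hours for {name}",
        "Reviews of {name}",
    ],
    "notable person": ["Learn about {name}"],
}


def find_entity(selected_text):
    """Return (db_key, text_as_written) for the longest known entity in the selection."""
    lowered = selected_text.lower()
    best = None
    for key in ENTITY_DB:
        index = lowered.find(key)
        if index != -1 and (best is None or len(key) > len(best[0])):
            best = (key, selected_text[index:index + len(key)])
    return best


def propose_actions(selected_text):
    """Propose an entity and its descriptor-based actions for the selected text."""
    match = find_entity(selected_text)
    if match is None:
        return None, []
    key, display_name = match
    templates = ACTIONS_BY_DESCRIPTOR.get(ENTITY_DB[key], [])
    return display_name, [t.format(name=display_name) for t in templates]


entity, actions = propose_actions("we love Ramen Sushi out of all of the places we've been to")
print(entity)   # Ramen Sushi
print(actions)
```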

Once an action is chosen, it can be performed by the system.

Entities are contained in an entity database, which may contain attributes or properties associated with each entity. Those can be pre-defined, and can have associated descriptors such as “location,” “restaurant,” and “phone number.” An entity that is a person, such as George Washington, can have the associated descriptor “notable person.”

The patent tells us that entities that are listed in the entity database can be associated with one or many user purposes and/or actions based on an associated descriptor.

A purpose is something that a user would want to do or find out with respect to an entity that they selected. These actions are shown to the user in a menu as choices to take regarding selected entities. These purposes may also be referred to as “tasks.” The patent provides a number of examples, including:

“play” (e.g. for games and sports), “rate” or “evaluate,” “travel to,” “contact,” “communicate,” “share,” “record,” “remember,” “dine,” “consume,” “experience” or “enjoy” (e.g. art, music), “reserve” (tickets, etc.), “compare,” “learn,” “study,” “understand,” “purchase,” “repair,” “fix,” “teach,” “cook,” and “make.” For the example purpose “dine,” an example sub-purpose can be “eat dinner,” from which example sub-purposes can be “make reservation,” “get directions,” and “find parking.”
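
The “dine” example describes a hierarchy of purposes and sub-purposes. Here is a small Python sketch that stores that hierarchy as nested dictionaries and collects the leaf actions for an entity’s descriptor. The nested-dict representation and the mapping from “restaurant” to “dine” are assumptions for illustration; the patent doesn’t say how purposes are stored.

```python
# The "dine" example from the patent as nested dicts (the storage format is assumed).
PURPOSES = {
    "dine": {
        "eat dinner": {
            "make reservation": {},
            "get directions": {},
            "find parking": {},
        },
    },
}

# Hypothetical link from an entity descriptor to top-level purposes.
DESCRIPTOR_PURPOSES = {"restaurant": ["dine"]}


def leaf_paths(tree, path=()):
    """Depth-first walk returning the full purpose path to every leaf."""
    paths = []
    for name, children in tree.items():
        current = path + (name,)
        if children:
            paths.extend(leaf_paths(children, current))
        else:
            paths.append(current)
    return paths


def actions_for_descriptor(descriptor):
    """Collect the most specific sub-purposes (leaf actions) for a descriptor."""
    actions = []
    for purpose in DESCRIPTOR_PURPOSES.get(descriptor, []):
        subtree = {purpose: PURPOSES.get(purpose, {})}
        actions.extend(path[-1] for path in leaf_paths(subtree))
    return actions


print(actions_for_descriptor("restaurant"))
# ['make reservation', 'get directions', 'find parking']
```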

The patent tells us that users can select multiple entities of the same type at the same time to compare them.

Entities, purposes, and actions can be added to the entity database either automatically or manually, with a user (or even an owner of the entity) adding information. The patent provides some examples of how information might be added to the entity database, but the process seems to be fairly wide open under the patent.

The patent doesn’t mention Schema vocabulary, which would be one way for a site owner to add entity information to an entity database.

Entities may be products, and actions presented to a user could include providing a review of the product, identifying a seller of the product, providing a price for the product, or providing an offer (e.g., discount or coupon) associated with the product. If the entity is a service, such as watching a movie or a plumber for hire, the actions that may be presented to the user could include “providing a review of the service, identifying an availability of the service (e.g., show times), identifying a location where the service is being provided (e.g., an address of the plumber), or providing an option to purchase the service (e.g., purchasing tickets for the movie or rates offered by the plumber).”

Take Aways

The entity database described in this patent could be a very big one, containing multiple businesses (like those from Google Maps), multiple products, multiple people (like those found at a knowledge base like Wikipedia), and multiple potential actions and tasks associated with those entities.

This seems to be a fairly aspirational patent, which might require a lot of pieces to be put into place before it could be implemented. It does present a vision of how entities on the web could eventually be acted upon by people who see them in web pages.

This could be something that Google may intend to do, and some of the pieces for it are in place, such as a knowledge graph filled with entities, and a schema system that is extendable. It’s interesting seeing a patent that lays out a framework like this one does. Is this a future path that Google will follow? We may need to wait to see.

The post Selecting Entities on Sites and Performing Tasks On Them Through Google appeared first on SEO by the Sea.