The Monkey-Spider project

Frequently asked questions

What is the Monkey-Spider?

The Monkey-Spider is a crawler-based low-interaction client honeypot.

Where can I get an overview of client honeypots?

Have a look at the Wikipedia article.

Which license is applicable to Monkey-Spider?

Monkey-Spider is licensed under the terms of the GPLv3.

So what is it good for?

A simplified explanation: you can think of it as a virus scanner for Web sites. It crawls Web sites and looks for threats on them.

What kind of threats is Monkey-Spider looking for on the Web?

In the current implementation phase it looks for any malware (virus, trojan, worm, spyware, adware, phishing, hoax etc.) that is detectable using predefined malware signatures, i.e. any malware that malware scanners such as anti-virus and anti-spyware scanners can detect. But this is only the beginning. In the future it might also use automated behavioral or emulation-based malware analysis techniques to detect unknown threats.
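To illustrate what detection by predefined signatures means, here is a minimal sketch in Python. It is not the Monkey-Spider's code and does not reflect the signature format of any real scanner. The signature tables, the labels and the function name scan_bytes are all hypothetical and exist only for this example.

```python
import hashlib
from typing import List

# Hypothetical signature set: SHA-256 digests of known-bad files mapped to labels.
# Real scanners use far richer signature formats; this is only an illustration.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"malicious sample payload").hexdigest(): "Example.Trojan.A",
}

# Hypothetical byte-pattern signatures: a match anywhere in the content counts.
KNOWN_BAD_PATTERNS = {
    b"EVIL_MARKER": "Example.Pattern.B",
}


def scan_bytes(data: bytes) -> List[str]:
    """Return the labels of all signatures matching data (empty list if none)."""
    hits = []

    # Whole-file signature: compare the digest against the known-bad set.
    digest = hashlib.sha256(data).hexdigest()
    if digest in KNOWN_BAD_HASHES:
        hits.append(KNOWN_BAD_HASHES[digest])

    # Pattern signature: look for known byte sequences inside the content.
    for pattern, label in KNOWN_BAD_PATTERNS.items():
        if pattern in data:
            hits.append(label)

    return hits
```

A sketch like this can only report content that matches a signature it already knows, which is why unknown threats need the behavioral or emulated analysis mentioned above.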

Of what use is the Monkey-Spider for me as a normal Web user?

Not much. It might be interesting for companies, organizations and security researchers who want to automatically find threats on portions of the Web, e.g. their own Web site or intranet. Community Web sites and forums with user-uploadable content especially pose a risk for the hosting provider. These kinds of threats could be detected.

Where can I ask questions and get help?

There is a mailing list for this. You can subscribe, ask questions or look into the archived posts here.

How can I support this project?

You can ask questions, give feedback, send patches, submit ideas, participate in the project and/or donate money.

What does 'low-interaction' mean?

Low-interaction, in contrast to high-interaction, classifies how much an attacker is able to do on the attacked system. If we let a malicious Web site (the attacker) attack our Web crawler (the victim), the attacker has no (or only limited) ability to interact with the system being attacked.

What is the difference between a 'client honeypot' and a 'honeyclient'?

Nothing. The terms are used interchangeably. Client honeypot is the correct classification; honeyclient is the other term that is generally used. However, honeyclient is also the name of the first open source client honeypot implementation.

What are the requirements/dependencies for the Monkey-Spider?

Look at the documentation page.

Does the Monkey-Spider have a Web interface?

No. In contrast to the system proposed in the thesis, the current implementation has only a command line interface. The old Web interface was not fully functional and not a good solution, so I decided to drop it. Adding an appropriate Web interface is part of the future work and may come in one of the future releases.

Does the Monkey-Spider have support for Google search results?

No, not anymore. The initial version discussed in the thesis had an interface to the Google SOAP Search API, which would provide up to 1000 search results per day. Because Google no longer issues new API keys for the SOAP Search API, I have decided to drop this support.

Does the mail seeder analyze attached binaries (i.e. malicious mail attachments)?

Not yet.

The websearch seeder scripts don't work for me. Why?

For every Web service (Yahoo or Microsoft Live Search) you need to provide a valid Application ID to authenticate requests. These IDs are unique per Yahoo/Microsoft Live user, which is why I can't provide any with the source code.

How can I get Application IDs for Yahoo/Microsoft Live Search?

The Yahoo Application ID can be obtained from http://developer.yahoo.com/wsregapp/ and the Windows Live ID from http://dev.live.com/liveid/.

What is an ARC file?

The ARC file format is used by Heritrix to store the crawled content. The ARC file format is described here. The Monkey-Spider extracts (dumps) these files to recover the original content.
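To give an idea of what extracting an ARC file involves, here is a minimal sketch in Python. It is not the Monkey-Spider's extraction code. It assumes the uncompressed version 1 layout, in which each record starts with a header line of five space-separated fields (URL, IP address, archive date, content type, length) followed by that many bytes of content. The names ArcRecord and iter_arc_records are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple


class ArcRecord(NamedTuple):
    url: str
    ip: str
    date: str
    content_type: str
    body: bytes


def iter_arc_records(stream: BinaryIO) -> Iterator[ArcRecord]:
    """Yield the records of an uncompressed ARC stream (assumed v1 layout)."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator line between records
        fields = line.decode("utf-8", errors="replace").strip().split(" ")
        if len(fields) != 5:
            raise ValueError("unexpected ARC record header: %r" % line)
        url, ip, date, content_type, length = fields
        # The last header field gives the number of content bytes that follow.
        body = stream.read(int(length))
        yield ArcRecord(url, ip, date, content_type, body)


# Usage with a small hand-made record (not real crawl data).
content = b"<html>hello</html>"
header = b"http://example.com/ 192.0.2.1 20080101000000 text/html %d\n" % len(content)
for record in iter_arc_records(io.BytesIO(header + content + b"\n")):
    print(record.url, record.content_type, len(record.body))
```

In files written by a crawler the content of a record would usually also contain the HTTP response headers, and the file is often gzip-compressed, so a real extractor has to handle both; this sketch leaves that out.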