by mikecanulkan
ExCoPe (extract, copy, and paste)
import logging

from gunicorn.app.base import BaseApplication

from app_init import create_initialized_flask_app

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Flask app creation should be done by create_initialized_flask_app to avoid circular dependency problems.
app = create_initialized_flask_app()


class StandaloneApplication(BaseApplication):
    def __init__(self, app, options=None):
        self.application = app
        self.options = options or {}
        super().__init__()

    def load_config(self):
        # Apply configuration to Gunicorn
        for key, value in self.options.items():
            if key in self.cfg.settings and value is not None:
                self.cfg.set(key.lower(), value)

    def load(self):
Frequently Asked Questions
- Journalists collecting online information for stories
The ability to quickly extract and format text for different platforms makes ExCoPe an efficient tool for professional content workflows.
Q2: How can ExCoPe improve productivity in a business setting?
A: ExCoPe significantly enhances productivity by:
- Reducing manual copy-pasting time by up to 70%
- Automatically removing ads, images, and irrelevant content
- Providing instant formatting for popular business tools
- Enabling quick transfer to collaborative platforms like Google Docs
- Maintaining consistent formatting across team members
This streamlines the content collection process and allows teams to focus on analysis rather than formatting.
Q3: What sets ExCoPe apart from standard web scrapers?
A: Unlike basic web scrapers, ExCoPe is designed for business users with features like:
- Intelligent main content detection
- Built-in formatting for common business applications
- User-friendly interface requiring no technical knowledge
- Cross-platform compatibility
- Clean, distraction-free text extraction
This makes it more suitable for professional environments where ease of use and formatted output are priorities.
Q4: How can I customize ExCoPe's content detection algorithm?
A: You can modify the content detection logic in routes.py by adjusting the main content selection criteria. Here's an example:
```python
# Add custom content selectors
main_content_tags = ['article', 'main', '[role="main"]', '.main-content', '#main-content']
custom_selectors = ['.your-custom-class', '#your-custom-id']
main_content_tags.extend(custom_selectors)

# Add custom content filtering
# (ad_patterns and text_elements are assumed to be defined elsewhere in routes.py)
def is_valid_content(text):
    min_length = 100  # Minimum character length
    max_ads_keywords = 3  # Maximum number of ad-related keywords
    lowered = text.lower()
    ad_hits = sum(1 for kw in ad_patterns if kw in lowered)
    return len(text) >= min_length and ad_hits < max_ads_keywords

# Apply custom filtering
extracted_text = [text for text in text_elements if is_valid_content(text)]
```
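To try the filtering rule outside routes.py, here is a minimal self-contained sketch. The `ad_patterns` values, the `filter_text_elements` helper, and the sample strings are assumptions made for illustration; the actual list used by ExCoPe is not shown in this template.

```python
# Hypothetical ad keywords; the real ad_patterns list in routes.py may differ.
ad_patterns = ["sponsored", "advertisement", "subscribe now", "buy now"]


def is_valid_content(text, min_length=100, max_ads_keywords=3):
    """Keep text that is long enough and matches fewer than max_ads_keywords ad patterns."""
    lowered = text.lower()
    ad_hits = sum(1 for kw in ad_patterns if kw in lowered)
    return len(text) >= min_length and ad_hits < max_ads_keywords


def filter_text_elements(text_elements):
    """Return only the elements that pass is_valid_content, preserving order."""
    return [text for text in text_elements if is_valid_content(text)]


if __name__ == "__main__":
    samples = [
        "Too short to keep.",
        "A long paragraph of article text. " * 5,
        "Sponsored advertisement: subscribe now and buy now! " * 5,
    ]
    for kept in filter_text_elements(samples):
        print(kept[:40])
```

Exposing the thresholds as keyword arguments makes it easy to experiment with stricter or looser values before editing the defaults in routes.py.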
Q5: How can I implement custom export formats in ExCoPe?
A: You can add new export formats by extending the formatting functions in home.js. Here's an example:
```javascript
// Add custom format function
function formatForMarkdown(text) {
  const paragraphs = text.split('\n\n');
  return paragraphs.map(p => `${p.trim()}\n\n`).join('');
}

// Add new button to HTML
const markdownButton = document.getElementById('markdownButton');

// Add event listener
markdownButton.addEventListener('click', () => {
  if (copyToClipboard(formatForMarkdown(extractedText))) {
    handleCopySuccess(markdownButton);
    openInNewTab('https://your-markdown-editor.com');
  } else {
    showError('Copy failed');
  }
});
```
This allows you to create custom export formats tailored to your specific needs while maintaining ExCoPe's clean interface and user experience.
ExCoPe - Web Text Extraction Tool
ExCoPe is a web application that extracts main text content from web pages while removing ads, images, and other distracting elements. It provides formatting options to easily transfer the extracted text to Google Docs, Microsoft Word, Excel, or Google Sheets.
Getting Started
- Click "Start with this Template" to begin using ExCoPe
Testing the Application
- Click the "Test" button to deploy the application
- Once deployed, you'll receive a URL to access the web interface
Using the Application
- Open the provided URL in your browser to access ExCoPe
- Enter a webpage URL in the input field (must start with https://)
- Click "Extract Text" to process the webpage
- The extracted text will appear below with several formatting options:
- Copy to Clipboard - Copies raw text
- Word Format - Formats text for Microsoft Word
- Google Docs Format - Formats text for Google Docs
- Excel Format - Formats text for Microsoft Excel
- Sheets Format - Formats text for Google Sheets
- When using the formatting buttons:
- The text will be automatically copied to your clipboard
- You'll be redirected to the corresponding application (Word, Docs, Excel, or Sheets)
- Simply paste the formatted text into your document
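The `https://` requirement on the input field could be enforced server-side with a check along these lines. This is only a sketch: the function name `is_acceptable_url` is hypothetical, and the template does not show how ExCoPe actually validates URLs.

```python
from urllib.parse import urlsplit


def is_acceptable_url(url):
    """Return True if url uses the https scheme and names a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs (e.g. bad IPv6 brackets)
        return False
    return parts.scheme == "https" and bool(parts.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com/article", "http://example.com", "example.com"]:
        print(candidate, is_acceptable_url(candidate))
```

Parsing the URL rather than testing a string prefix also rejects inputs such as a bare `https://` with no host.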
The application automatically removes ads, navigation elements, footers, and other non-content elements to provide clean, readable text from any webpage.
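To illustrate the idea of dropping non-content elements, the following sketch uses only Python's standard `html.parser` module. It is not ExCoPe's actual implementation, which is not shown here; the `SKIPPED_TAGS` set, the `TextExtractor` class, and the `extract_text` function are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-content in this sketch (an assumption).
SKIPPED_TAGS = {"script", "style", "nav", "footer", "header", "aside", "form"}


class TextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text outside skipped elements, one chunk per text node."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><body><nav>Home | About</nav>"
        "<article><p>First paragraph.</p><p>Second paragraph.</p></article>"
        "<footer>Copyright</footer></body></html>"
    )
    print(extract_text(page))
```

Because each text node becomes its own chunk, inline markup such as bold or links inside a paragraph would split that paragraph; a fuller implementation would group text by block-level element.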
Template Benefits
- Content Research Efficiency
  - Streamlines the process of gathering web content for market research, competitive analysis, and content curation
  - Saves significant time by automatically removing ads, images, and irrelevant content
- Cross-Platform Content Management
  - Enables seamless transfer of extracted content to popular productivity tools (Word, Google Docs, Excel, Sheets)
  - Facilitates efficient content organization and sharing across different business platforms
- Data Analysis Preparation
  - Simplifies the process of collecting web data for business intelligence and market analysis
  - Provides clean, formatted text that can be easily imported into spreadsheets for further analysis
- Documentation Automation
  - Accelerates the creation of business documentation by quickly extracting and formatting relevant web content
  - Reduces manual copy-paste efforts and formatting time for report creation
- Cost-Effective Content Processing
  - Eliminates the need for expensive web scraping tools or multiple software subscriptions
  - Provides a self-hosted solution that can be customized to specific business needs without ongoing service fees