December 2009 ~ Vinay's Blog

Dec 30, 2009

How to reset the BlackBerry smartphone to factory defaults

Products

BlackBerry® Desktop Software
BlackBerry® Devices

Environment

BlackBerry® Device Software version 4.3 to 5.0
BlackBerry® Desktop Software version 4.7 to 5.0.1

Overview

To reset a BlackBerry smartphone to factory defaults, complete the following steps:

Warning: Back up the BlackBerry smartphone before you reset it.

On a 32-bit Windows® XP to Windows® 7 operating system, do the following:

Connect the BlackBerry smartphone to the computer.
Open Start > Programs > Accessories > Command Prompt.
Type cd C:\Program Files\Common Files\Research In Motion\Apploader
Type loader.exe /resettofactory

On a 64-bit Windows XP or Windows Vista™ operating system, do the following:

Connect the BlackBerry smartphone to the computer.
Open Start > Programs > Accessories > Command Prompt.
Type cd C:\Program Files (x86)\Common Files\Research In Motion\Apploader
Type loader.exe /resettofactory

Note: BlackBerry Desktop Manager 4.7 to 5.0 must be installed on the computer. The BlackBerry smartphone must be running BlackBerry Device Software versions 4.3 to 4.7.
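
The only difference between the 32-bit and 64-bit steps above is the Program Files folder. As a rough sketch, the choice can be written as a small Python helper. The function name reset_command is hypothetical, and the script only prints the commands to type; it does not run loader.exe.

```python
import ntpath
from typing import Tuple

# Folder layout copied from the steps above.
APPLOADER_SUBDIR = r"Common Files\Research In Motion\Apploader"


def reset_command(is_64bit_windows: bool) -> Tuple[str, str]:
    """Return (Apploader directory, command line) for the factory reset.

    Hypothetical helper: it only builds strings and never touches the device.
    """
    if is_64bit_windows:
        program_files = r"C:\Program Files (x86)"
    else:
        program_files = r"C:\Program Files"
    directory = ntpath.join(program_files, APPLOADER_SUBDIR)
    return directory, "loader.exe /resettofactory"


if __name__ == "__main__":
    for is_64bit in (False, True):
        directory, command = reset_command(is_64bit)
        print("cd " + directory)
        print(command)
```
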

Windows 7 Library

Windows 7 Libraries are a great way to organize your data. However, some commonly needed library tasks, such as backing up libraries or adding network locations to them, cannot easily be performed by a nontechnical user.

The Windows 7 Library Tool is a small app which lets you perform such tasks with just a couple of clicks.

At the bottom you will find buttons for working with your library settings: you can create a new library, modify an existing one, delete a library, and make backups. Key features of the tool are that it lets you add non-indexed folders to a library, back up your libraries, and change library icons.

Select a library and hit the Edit button to open a dialogue box where you can change the library’s properties.

It works on both 32-bit and 64-bit versions of Windows 7.

The Library Tool can be downloaded from http://zornsoftware.talsit.info/?page_id=37&did=2

Drag and Drop to Command Prompt feature in Windows 7

The Windows 7 Command Prompt now supports drag & drop of files. This means that to enter the path of any file you can just drag & drop it into the command prompt window.

This small feature is not very useful for most users, but for those who work frequently in the Windows command prompt it is a lifesaver.

Whether the files are executables, text files, documents, or anything else, they can now all be dragged and dropped into the command prompt. Sadly, the drag & drop functionality works only for files; it does not apply to text, such as commands you might want to copy and paste into the command prompt quickly.

Hidden Blackberry Features

Advanced Enterprise Activation Settings

ALT+CNFG - Options > Advanced Options > Enterprise Activation - Settings for Enterprise Activation

Address Book

ALT+VALD - Address book list - Validate the data structure and look for inconsistencies

ALT+RBLD - Address book list - Force a data structure rebuild

Browser

ALT+RBVS - Any HTML/WML web page - View web page source code

Calendar

ALT+VIEW - Inside any Calendar item - Show extra info for a Calendar event

(For the following codes, just type the letters on the indicated screen.)

SYNC - Calendar app > Options - Enable Calendar slow sync

RSET - Calendar app > Options - Prompt for a reload of the calendar from the BES

RCFG - Calendar app > Options - Request BES configuration

SCFG - Calendar app > Options - Send device configuration

DCFG - Calendar app > Options - Get CICAL configuration

SUPD - Calendar app > Options - Enable detailed Calendar report for backup

SUPS - Calendar app > Options - Disable detailed Calendar report for backup

SUPN - Calendar app > Options - Disable Calendar report database

LUID - Calendar app > Options - Enable view by UID

SRSL - Calendar app > Options - Show Reminder status log

Messaging

ALT+VIEW - For messages, displays the RefId and FolderId for that particular message. For PIM items, displays only the RefId.

Search Application

ALT+ADVM - Search Application - Enable Advanced Global Search

MMS

MMSC - Options > MMS - Show MMS hidden options

Home Screen

ALT+JKVV - Home Screen - Display cause of PDP reject

ALT+CAP+H - Home Screen - Display the Help Me screen

ALT+NMLL - Home Screen - Switch the signal strength from bars to a numeric value

ALT+LGLG - Home Screen - Display the Java™ event log

WLAN

ALT+SMON - WLAN Wizard Screen - Enable Simulated Wizard mode

ALT+SMOF - WLAN Wizard Screen - Disable Simulated Wizard mode

Theme

ALT+THMN - Any menu - Change to no theme (B&W)

ALT+THMD - Any menu - Change to default theme

Date/Time

LOLO - Options > Date/Time - Show Network time values

SIM Card

MEPD - Options > Advanced Options > SIM Card - Display MEP info

MEP1 - Options > Advanced Options > SIM Card - Disable SIM personalization

MEP2 - Options > Advanced Options > SIM Card - Disable Network personalization

MEP3 - Options > Advanced Options > SIM Card - Disable Network subset personalization

MEP4 - Options > Advanced Options > SIM Card - Disable Service provider personalization

MEP5 - Options > Advanced Options > SIM Card - Disable Corporate personalization

Other Useful Commands

*#06# - Home Screen - Display your device's International Mobile Equipment Identity (IMEI, your serial number) on-screen

ALT+CAP+DEL - Any menu - Soft Reset (“battery pull”)

Oracle External tables - Overview

Overview on Oracle External tables

External tables can read flat files (that follow some rules) as though they were ordinary (although read-only) Oracle tables. Therefore, it is convenient to use external tables to load flat files into the DB. External tables can be queried like ordinary tables, so they can serve as the source for inserting and updating data in permanent tables. No DML can be performed on the external tables themselves, but they can be used in query, join and sort operations. Views and synonyms can be created against external tables. They are useful in the ETL process of data warehouses since the data doesn't need to be staged and can be queried in parallel.

A directory object must be created for the location of the reference file, i.e. the file that contains the data to be loaded. Directory objects can be created by DBAs or by any user with the CREATE ANY DIRECTORY privilege. After a directory is created, the user who created the directory object needs to grant READ or WRITE permission on the directory to other users.

Ex:

To create directory

create or replace directory ext_dir as '/home/ext_dir';

To grant access:

GRANT READ ON DIRECTORY ext_dir to fnuser;

After creating the directory for external table usage, we can place the reference file in that directory. Typically the source data is an Excel file, which is converted to a comma-separated (CSV) file and used as the input file for the external table.

External tables are read-only. No data manipulation language (DML) operations or index creation are allowed on an external table. When the external table is accessed through a SQL statement, the fields of the external table can be used just like any other field in a normal table.

Syntax for creating external tables

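-- The column list is illustrative (it was missing from the original);
-- match the names and types to the fields in inputfile.csv.
create table sample (
  id    number,
  name  varchar2(100)
)
organization external (
  type              oracle_loader
  default directory ext_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
    optionally enclosed by '"'
    missing field values are null
  )
  location ('inputfile.csv')
)
reject limit unlimited;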

The ORGANIZATION EXTERNAL clause tells Oracle that we are creating an external table.

The TYPE clause names the access driver. ORACLE_LOADER is the driver that "powers" external tables by reading the flat file in the SQL*Loader style.

We can set a DEFAULT DIRECTORY once for the entire table. In most cases, we will wish to write log/bad/discard files to a logging directory and read our incoming data files from a data directory.

The ACCESS PARAMETERS clause contains the SQL*Loader-style reference that enables Oracle to parse the flat file into rows and columns. Note that if we have made any syntactical errors in our ACCESS PARAMETERS, we can still create the external table. The access parameters themselves are not parsed until we issue a SELECT against the external table.

The LOCATION clause is where we specify the input file(s).
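
To make the access parameters more concrete, here is a rough Python sketch of how a file described that way splits into rows: comma-terminated fields, optional double-quote enclosure, and absent trailing fields treated as null. It uses Python's csv module and is only an approximation for illustration, not the ORACLE_LOADER driver itself; the function name parse_flat_file and the sample data are made up.

```python
import csv
import io


def parse_flat_file(text, column_count):
    """Split CSV text into rows, padding short rows with None.

    Illustrative stand-in for the access parameters: ',' as the field
    terminator, '"' as the optional enclosure, missing fields become None.
    """
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Pad absent trailing fields so every row has column_count entries.
        rows.append(fields + [None] * (column_count - len(fields)))
    return rows


if __name__ == "__main__":
    # Hypothetical file content: the second record lacks its last field.
    sample = '1,"Smith, John",Sales\n2,Jones\n'
    for row in parse_flat_file(sample, 3):
        print(row)
```

The quoted value keeps its embedded comma as part of a single field, and the short second record is padded with None instead of being rejected.
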

Accessing external tables:

After creating an external table, the data in the table can be queried like a normal SQL table.

Ex:

Select * from sample;

A log file is also written to the directory specified by ext_dir whenever we select from the external table.

Advantages of External tables:

Data in external tables can be queried before it is loaded into permanent tables.

External tables are suitable for large data loads that may have a one-time use in the database.

External tables eliminate the need to create staging or temporary tables.

No database storage is needed even for the largest external tables. Once the data files are placed on the operating system, external tables can be created over them and queried with SQL.

An external table load allows modification of the data being loaded by using SQL functions and PL/SQL functions as part of the INSERT statement that loads the data from the external table.

Disadvantages:

Up to Oracle 11g there is no option to execute DML against an external table. External tables support SELECT only.

No index can be created on external tables.

Apex- Oracle’s Best Kept Secret

What is Oracle Apex?

Oracle Application Express (Oracle Apex, previously named Oracle HTML-DB) is a freeware software development environment based on the Oracle database. It enables a very fast development cycle for creating web-based applications.

APEX is Oracle’s answer to wizard-based Web Development. It certainly is full of wizards! APEX contains metadata (lots of it) on everything in the tool, too. It produces dynamic HTML and it’s fast! APEX uses the PL/SQL module for Apache (mod_plsql). It doesn’t use any Java code. Anything you can do in SQL or PL/SQL can be done using APEX. APEX provides a nice Web-based team development environment for your organization.

If you’re an Oracle developer at heart (which means you know SQL and PL/SQL really well), Apex takes the mystery out of Web development. Write a query and view it in a browser! The data can be viewed as an HTML table (i.e. a report) or an Adobe Flash graph. Data can easily be edited in forms. A page can be made up of any number of components (i.e. charts, reports, forms, etc.). If you would like to execute a PL/SQL procedure or function before or after your page displays - fear not, Apex can easily handle this task with its processes.

Oracle Apex can be installed in an Oracle 9.2 or higher database, and starting from Oracle 11g it is preinstalled along with the database.

Apex 3.1 includes a major new feature known as Interactive Reporting, which enables end users to extensively customize a report without programmer intervention, using techniques such as filtering, sorting, group-by, choosing displayed columns, etc. Users can even save multiple versions of their customized reports. The programmer can limit which features are enabled.

APEX’s origin

One popular misconception is that it's a new version of Web DB. Mike Hichwa created Web DB, a successful web front-end for Oracle, but the development of Web DB started to move in a direction that diverged from Mike's vision. When tasked with building an internal web calendar, Mike enlisted the help of Joel Kallman and started "Flows". They co-developed the Web Calendar and Flows for several years, adding features to Flows as they needed them to develop the calendar. In the earliest days of Flows, there was no front-end for it, so all changes to an application were made in SQL*Plus via inserts, updates and deletes. In some ways APEX is an evolution of Web DB, but it was really a fresh start with new code and no upgrade path.

One of the most well known applications developed in Application Express is the AskTom application developed by Thomas Kyte. Oracle's online store also runs on APEX.

Oracle Apex Architecture

Oracle Application Express consists of a metadata repository that stores the definitions of applications and an engine (called the Application Express engine) that renders and processes pages. It lives completely within your Oracle database. It is comprised of nothing more than data in tables and large amounts of PL/SQL code. The essence of Oracle Application Express is approximately 215 tables and 200 PL/SQL objects containing 300,000+ lines of code.

The Application Express engine performs:

Session state management
Authentication services
Authorization services
Page flow control
Validations processing
Rendering and page processing

The asynchronous session state management architecture ensures that minimal CPU resources are consumed. The browser sends a URL request that is translated into the appropriate Oracle Application Express PL/SQL call. After the database processes the PL/SQL, the results are relayed back to the browser as HTML. This cycle happens each time a page is requested or submitted. The session state is managed in the database and does not use a dedicated database connection. Each page view results in a new database session, so database resources are only consumed when the Application Express engine processes or renders a page.

The Application Express engine is accessed from a Web browser through the Oracle HTTP Server (Apache) and mod_plsql. Applications are rendered in real time from the metadata repository stored in database tables. Building or extending applications does not cause code to be generated; instead, metadata is created or modified.
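
As a rough illustration of the URL request mentioned above: APEX pages are addressed with the f?p syntax, where f is the PL/SQL procedure being called and p carries colon-separated arguments (App:Page:Session:Request:Debug:ClearCache:itemNames:itemValues). The Python sketch below only assembles such a URL. The helper name apex_page_url, the base URL, the session id and the item name P1_ID are all made-up examples, not values from a real installation.

```python
def apex_page_url(base_url, app_id, page_id, session_id,
                  item_names=(), item_values=()):
    """Build an APEX page URL in the f?p colon-separated form.

    Hypothetical helper for illustration; it performs no HTTP request.
    """
    if len(item_names) != len(item_values):
        raise ValueError("item_names and item_values must have the same length")
    parts = [
        str(app_id),
        str(page_id),
        str(session_id),
        "",    # Request (left empty here)
        "NO",  # Debug flag
        "",    # ClearCache (left empty here)
        ",".join(item_names),
        ",".join(str(value) for value in item_values),
    ]
    return base_url.rstrip("/") + "/f?p=" + ":".join(parts)


if __name__ == "__main__":
    # Assumed values: a mod_plsql style path, application 100, page 1.
    print(apex_page_url("http://example.com/pls/apex", 100, 1, 1234567890,
                        item_names=["P1_ID"], item_values=[42]))
```
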

With Oracle Database 11g you can replace the Oracle HTTP Server (Apache) and mod_plsql from the architecture with the Embedded PL/SQL Gateway (EPG). The EPG provides the Oracle Database with a Web Server and also the necessary infrastructure to create dynamic applications. The EPG runs in the XML Database HTTP Server, part of the Oracle Database, and includes the core features of mod_plsql, but does not require the Oracle HTTP Server powered by Apache. Oracle Database 10g Express Edition (XE) also utilizes the EPG.

APEX components

APEX has a number of components, including:

SQL Workshop

Interact with your database as with SQL*Plus, but visually
Data dictionary and object browsing, query by example

Utilities

Load and extract data from the database
Turn a spreadsheet into a table in a few seconds
Generate DDL
Object reports
Monitor the DB and the applications

Application Builder

Centerpiece of APEX
Loaded with wizards
Create reports, forms and charts
Connect pages using branches
75 predefined widgets
Basic HTML, popup lists, calendars, etc.
Full data entry validation

Oracle APEX Pros & Cons

Pros

Fast development.
100% web based
Ready to use components
Professional looking
Easy to create mock-ups
Easy to deploy (end user just needs to open a URL to access an Apex application)
Easy to understand
Fast (no overhead)
Easy to scale
All processing and validation is done on the server side
Strong and supportive user community (especially Oracle Apex forum)
Price
Basic support for group development
Free hosting of demo applications provided by Oracle

Cons

APEX applications are created using Oracle's own tools and can only be hosted in an Oracle database, making an implementer susceptible to vendor lock-in. However, this problem applies to any alternative technology, such as .NET.
While applications are exportable to a script form that can be version controlled, the underlying PL/SQL code is not intended to be human-readable or writable, meaning that it is not easy to compare source code revisions.
As an application framework, it can be difficult to customize an application outside of a set of expectations about how an APEX application is supposed to operate.
Large installation (V3.1.2 is 532 MB)
Very few web hosts offer Apex (Oracle Database) on their hosting service package (most of them offer PHP + MySQL or ASP + SQL Server). As a result, Apex applications are limited in their choice of web hosts.
The framework itself adds as little as 0.04 second of overhead to each page request; how well an application scales is primarily based on the efficiency of the SQL queries used by the application developer

Recommendations

APEX IS SUITABLE FOR DEVELOPING WEB APPLICATIONS FOR ORACLE

Enterprises looking for a quick and easy-to-use application development tool for Oracle will find APEX a good fit. APEX is not a replacement for Java or .NET programming environments, but it can help develop and deploy Web applications quickly. Enterprises that are planning to use APEX should:

Initially limit the application development only to tech-savvy users. Although APEX can be used by end users and business users, limit the application development mainly to tech-savvy users including developers and DBAs. Having an understanding of the data model and knowing what data to access are key requirements for developing applications using APEX.
Look at consolidating spreadsheet and desktop databases. One of the key benefits of APEX is data sharing, and enterprises can leverage this by consolidating spreadsheets and desktop databases.
Start small and grow. Initially, deploy APEX on one or two Oracle databases mainly to understand the benefits and issues. Also consider developing internal standards and policies, documenting best practices, and sharing templates and forms to maximize productivity.
Nail down any data-security-related issues. Extra security measures should be taken when deploying APEX, especially if the database deals with private data. Perform routine audits of role and user access on databases that run APEX applications.
Limit the development of APEX on production databases. Similar to application development policies for other programming environments, APEX should not be used directly on production environments.
Train the developers and DBAs. Although APEX is relatively easy to learn and use, to take full advantage of the product, consider training developers and DBAs.

Optical Character Recognition (OCR)

This article looks at implementing OCR solutions in transaction handling processes to further optimize accuracy and productivity.

What is OCR?

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable and searchable text/content. In today’s world, where information is critical, the ability to create live content where before there were only static images can make a document worth many times more than its source image.

Why do we need it?

Technologies like electronic data transfer, workflow optimization and OCR can significantly increase the efficiency and accuracy of operations by automating processes and enabling proactive management.

A recent case study published by Accenture, covering the finance operations of 120 leading organizations, indicates that only about 15 percent of companies currently transact 60 percent or more of their accounts payable and accounts receivable functions on a fully automated basis. This essentially means that a significant number of organizations are still using manual and labor-intensive methods of transaction processing.

While the cost of new tools and technologies is one of the main barriers, the absence of an automated solution most often stems from a lack of awareness within organizations about which technologies could actually improve transaction processing.

What setup suits me?

Generally, a desktop solution is most appropriate for low-run workflows or environments where the quality of the scanned document is very poor and requires inline quality control and validation. Desktop solutions can add value if users need to be directly involved in the recognition process. If one anticipates requirements like scanning the original paper document, manually selecting the zone on the page to extract content from, and then validating the recognized text, desktop might be the way to go, particularly if the content is to be repurposed.

Companies supplying SaaS or outsourcing solutions often possess cutting-edge, high-performance hardware and software, and access to off-shore resources to rapidly turn images into searchable output. However, this approach can compromise the security and confidentiality of the images. Some organizations have serious concerns about the confidentiality and security of external document processing, which rules this approach out for them.

Server-based OCR is the optimal solution for the vast majority of today’s enterprises, enabling them to extract maximum value from corporate documentation at a reasonable cost. While desktop solutions offer more functionality, most businesses find these extra add-ons unnecessary. In a typical workflow the scanned images already exist, created by scanning solutions. The content is most often destined for searchable content repositories where it needs to be indexed for later retrieval while its relationship with the original image remains intact. In these instances an “original image + hidden text” PDF maintains the visual and printable fidelity of the original scan while providing a fully searchable and indexable layer of content underneath.

Conversion Formats and Features

One of the features to consider when selecting an OCR solution is its output formats. It begins with PDF (portable document format), the globally recognized format for standardized electronic documents. PDF offers the advantage of maintaining the original image fidelity while creating a searchable document, but it doesn’t end there. Often, companies may simply want text output for import into databases or other applications. With an OCR solution, one can transform a scanned image into a fully editable format such as:

MS Word: where even the headers, footers, tables and page numbering are all properly formatted.

MS Excel: where data is extracted into cells based on the layout of the original PDF files.

HTML: facilitates sharing document contents that were originally locked in an image format.

Some crucial features include:

Zonal OCR: performing OCR on a specific zone of the page to extract or read specific information. With zonal OCR we can define what pages and page regions to perform OCR.

Barcode Recognition: Whether a simple 1D barcode or a more advanced 2D barcode, OCR can extract information for use in routing, management, storage, profiling and more.

Optical Mark Recognition (OMR): Working together with zonal OCR, OMR examines a specific region of a page to determine whether it contains a mark. Performing zonal recognition on specific boxes on an exam paper or mail-in survey can return a true or false value depending on whether the box was selected or not.

Compression: Scanned images are significantly larger than their electronic counterparts. For example, a 25 KB MS Word document could result in a 1 MB TIFF when scanned. The ability to compress output files reduces server storage costs and allows faster, more efficient distribution or download via web or email.

Page orientation automation: In an automated environment, documents that have been rotated or arrive in varying orientations can be automatically rotated to enable optimal reading by the OCR application.

Blank page detection: OCR automatically separates documents when a blank page is found during conversion.

User-definable dictionaries: Allow users to set up specific collections of terms that the OCR engine can compare against what it thinks it is seeing on a page. Law firms, for example, have a wide collection of terms that can supplement a standard OCR engine dictionary to help the engine guess more accurately what the correct word might be.

Despeckle / Deskew: Image clean-up processes are often handled upstream by the imaging software, but many OCR technologies offer additional clean-up to further improve output quality. Any extra black ‘dots’ present when the OCR process starts can result in inaccurate output or longer conversion time, because the tool tries to determine whether the dots are actual characters.

Forms/Document Recognition: Match the structure with a known document type or format which can then drive a specific process.

Intelligent Character Recognition (ICR): OCR falls short when it comes to reading handwriting. In theory, ICR could “learn” to read handwriting, but the current technology is far from perfect and is used only for specialized tasks. A feature that identifies handwriting would be useful, but the technology is still being developed to perform accurately.
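To make the OMR feature described above more concrete, here is a minimal sketch of a zonal mark check. It assumes the page is already a binary image held as nested lists (1 for black, 0 for white). The function name `mark_is_filled`, the zone format and the 30% threshold are illustrative assumptions, not the behaviour of any particular OCR product.

```python
def mark_is_filled(image, zone, threshold=0.3):
    """Return True if the share of black pixels inside the zone exceeds threshold.

    image: list of rows, each a list of 0 (white) / 1 (black) pixels.
    zone: (top, left, bottom, right); bottom and right are exclusive.
    threshold: hypothetical fill ratio above which the box counts as marked.
    """
    top, left, bottom, right = zone
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not pixels:
        return False
    return sum(pixels) / len(pixels) > threshold


# A tiny 4x6 "page" with one filled box and one empty box.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
checked_box = (1, 1, 3, 3)
empty_box = (1, 3, 3, 5)

print(mark_is_filled(page, checked_box))  # True
print(mark_is_filled(page, empty_box))    # False
```

A real system would first deskew the page and locate the boxes, so that each zone actually lines up with the printed form.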

Accuracy of Conversion

OCR in a commercial domain such as free-form invoice processing or claims data entry faces significant challenges in image quality. Since every operation performed on an image after its acquisition hinges on the quality of the raw data, it is important to maximize the semantic content even before OCR takes place. It has been found that data extraction operations have error rates equal to about twice the OCR error rates in normal domains.

It is therefore imperative to identify the right set of tools, with the right set of features, for the needs of the business. A blob of pixels, a line, a part of an image, or noise can result in inaccurate conversion. Applying noise removal and character enhancement algorithms before recognition can improve the accuracy of the OCR output.

There are four main types of semantic enhancement for a document before OCR. These can be treated as pre-OCR steps that improve the quality of conversion.
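As a rough illustration of the "about twice" relationship mentioned above, the following sketch turns an OCR error rate into an estimated data extraction error rate. The factor of 2 comes from the text. The function name and the cap at 100% are assumptions added for the example.

```python
def extraction_error_rate(ocr_error_rate, factor=2.0):
    """Estimate the data extraction error rate from the OCR error rate.

    factor=2.0 reflects the rule of thumb quoted in the text; the result is
    capped at 1.0 because an error rate cannot exceed 100%.
    """
    return min(1.0, ocr_error_rate * factor)


for ocr_error in (0.01, 0.02, 0.05):
    estimate = extraction_error_rate(ocr_error)
    print(f"OCR error {ocr_error:.0%} -> extraction error about {estimate:.0%}")
```

Under this rule of thumb, improving OCR accuracy from 95% to 98% would cut the estimated extraction error from roughly 10% to roughly 4%.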

Line Removal

In claims processing, graphical lines that interfere with text present a significant problem. Even a small misalignment against pre-printed forms can result in a majority of the text on a page being partially obscured by horizontal or vertical lines. Line removal handles this misalignment using a three-stage approach: initial detection, line removal, and obscured text enhancement.

Consider an image of an invoice that suffers from misalignment, resulting in a line defect.

[Figure: invoice image before line removal (Pre) and after line removal (Post)]
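The detect-then-remove stages of line removal can be sketched on a small binary image as follows. This is a simplified illustration, not the algorithm of any specific product. It only handles horizontal lines, treats any row containing a long run of black pixels as a line, and keeps line pixels that touch ink directly above or below, as a crude stand-in for obscured text enhancement. All names and the `min_length` parameter are hypothetical.

```python
def longest_run(row):
    """Length of the longest run of consecutive black (1) pixels in a row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


def detect_horizontal_lines(image, min_length):
    """Stage 1: rows whose longest black run is at least min_length."""
    return [r for r, row in enumerate(image) if longest_run(row) >= min_length]


def remove_horizontal_lines(image, min_length):
    """Stages 2 and 3: erase line pixels, but keep those touching text.

    A line pixel is kept when the pixel directly above or below it is black
    and belongs to a non-line row, so strokes crossing the line survive.
    """
    line_rows = set(detect_horizontal_lines(image, min_length))
    height = len(image)
    cleaned = [row[:] for row in image]
    for r in line_rows:
        for c, pixel in enumerate(image[r]):
            if not pixel:
                continue
            above = r > 0 and (r - 1) not in line_rows and image[r - 1][c]
            below = r + 1 < height and (r + 1) not in line_rows and image[r + 1][c]
            if not (above or below):
                cleaned[r][c] = 0
    return cleaned


# A vertical stroke (column 2) crossed by a form line (row 1).
form = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
for row in remove_horizontal_lines(form, min_length=5):
    print(row)
# [0, 0, 1, 0, 0, 0]
# [0, 0, 1, 0, 0, 0]
# [0, 0, 1, 0, 0, 0]
```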

Noise Removal

Noise comes in three flavors: patterned noise, associated noise, and random noise. Patterned noise comes from graphical patterns, especially halftone shaded areas (commonly seen on scanned forms).

Associated noise occurs when a scanned document is incorrectly thresholded and stray pixels appear around valid content in the image.

Random noise comes from a bad threshold or a garbled source document or scanner.

All types of noise are removed using a combination of global and local statistical analysis of blob sizes and shapes.

[Figure: example of noise removal in a patterned area]
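A much-simplified version of blob-based noise removal is sketched below. Instead of the global and local statistical analysis described above, it uses a single fixed size threshold, which is an assumption made to keep the example short. Blobs are groups of 4-connected black pixels, and the names `find_blobs`, `remove_small_blobs` and `min_pixels` are illustrative.

```python
from collections import deque


def find_blobs(image):
    """Return all blobs as lists of (row, col) black pixels (4-connected)."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def remove_small_blobs(image, min_pixels):
    """Erase every blob with fewer than min_pixels pixels (treated as noise)."""
    cleaned = [row[:] for row in image]
    for blob in find_blobs(image):
        if len(blob) < min_pixels:
            for y, x in blob:
                cleaned[y][x] = 0
    return cleaned


# Two isolated specks plus one 2x2 block standing in for a character stroke.
scan = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
for row in remove_small_blobs(scan, min_pixels=2):
    print(row)
```

In practice the threshold would be derived from the page itself, for example from the distribution of blob sizes, so that small punctuation marks are not erased along with the specks.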

Dot Matrix Enhancement (Blob Aggregation)

Once noise and lines have been removed, any blob separated from a neighboring blob by less than the average horizontal character separation distance is a good candidate for aggregation. A statistical analysis is performed on all text lines over a certain height to determine whether the form has been filled out with a dot matrix printer. If it has not, the aggregation routine is bypassed.
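The aggregation rule above can be illustrated in one dimension. In this sketch each blob on a text line is reduced to a horizontal extent `(left, right)`, and neighbors are merged when the gap between them is smaller than `max_gap`, which stands in for the average horizontal character separation distance. The data layout and names are assumptions for the example.

```python
def aggregate_blobs(boxes, max_gap):
    """Merge horizontally adjacent blobs whose gap is smaller than max_gap.

    boxes: iterable of (left, right) pixel extents on one text line.
    Returns the merged extents sorted from left to right.
    """
    merged = []
    for left, right in sorted(boxes):
        if merged and left - merged[-1][1] < max_gap:
            # Close enough to the previous blob: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


# Dots of two dot-matrix characters, separated by a wider inter-character gap.
dots = [(0, 1), (2, 3), (4, 5), (12, 13), (14, 15)]
print(aggregate_blobs(dots, max_gap=3))  # [(0, 5), (12, 15)]
```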

Conclusion/Summary

OCR preprocessing and image enhancement greatly increase the accuracy of OCR, even on lower-quality images. Since every point of decline in OCR accuracy causes a two-point decline in data extraction accuracy, this part of the process is critical to the productivity gains realized when using a complete system for data extraction.

Once the setup is accurate and the tools are in place, OCR deployment results in a productivity improvement of 150–180%.

References


Version Control with Subversion

subversion, version control (no comments)

Introduction

Version control is the art of managing changes to information. It has long been a critical tool for programmers, who typically spend their time making small changes to software and then undoing those changes the next day. But the usefulness of version control software extends far beyond the bounds of the software development world. Anywhere you can find people using computers to manage information that changes often, there is room for version control.

And that's where Subversion comes into play.

What is Subversion?

Subversion is a free/open-source version control system. That is, Subversion manages files and directories over time. A tree of files is placed into a central repository. The repository is much like an ordinary file server, except that it remembers every change ever made to your files and directories. This allows you to recover older versions of your data, or examine the history of how your data changed. Subversion can access its repository across networks, which allows it to be used by people on different computers.

At some level, the ability for various people to modify and manage the same set of data from their respective locations fosters collaboration. And because the work is versioned, you need not fear that quality is the trade-off for losing that conduit—if some incorrect change is made to the data, just undo that change.

Some version control systems are also software configuration management (SCM) systems. These systems are specifically tailored to manage trees of source code, and have many features that are specific to software development— such as natively understanding programming languages, or supplying tools for building software. Subversion, however, is not one of these systems. It is a general system that can be used to manage any collection of files.

Subversion provides:

Directory versioning

CVS only tracks the history of individual files, but Subversion implements a “virtual” versioned filesystem that tracks changes to whole directory trees over time. Files and directories are versioned.

True version history

Since CVS is limited to file versioning, operations such as copies and renames—which might happen to files, but which are really changes to the contents of some containing directory—aren't supported in CVS. Additionally, in CVS you cannot replace a versioned file with some new thing of the same name without the new item inheriting the history of the old—perhaps completely unrelated—file. With Subversion, you can add, delete, copy, and rename both files and directories. And every newly added file begins with a fresh, clean history all of its own.

Atomic commits

A collection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.

Versioned metadata

Each file and directory has a set of properties—keys and their values—associated with it. You can create and store any arbitrary key/value pairs you wish. Properties are versioned over time, just like file contents.

Choice of network layers

Subversion has an abstracted notion of repository access, making it easy for people to implement new network mechanisms. Subversion can plug into the Apache HTTP Server as an extension module. This gives Subversion a big advantage in stability and interoperability, and instant access to existing features provided by that server—authentication, authorization, wire compression, and so on. A more lightweight, standalone Subversion server process is also available.

Consistent data handling

Subversion expresses file differences using a binary differencing algorithm, which works identically on both text (human-readable) and binary (human-unreadable) files. Both types of files are stored equally compressed in the repository, and differences are transmitted in both directions across the network.

Efficient branching and tagging

The cost of branching and tagging need not be proportional to the project size. Subversion creates branches and tags by simply copying the project, using a mechanism similar to a hard-link. Thus these operations take only a very small, constant amount of time.

Hackability

Subversion has no historical baggage; it is implemented as a collection of shared C libraries with well-defined APIs. This makes Subversion extremely maintainable and usable by other applications and languages.
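Of the features listed above, atomic commits are perhaps the easiest to illustrate in a few lines. The toy class below is only a sketch of the all-or-nothing idea and is not how Subversion is implemented. It builds the new tree on a copy and publishes it as a new revision only if every change in the set applies cleanly. The class and method names are hypothetical.

```python
class AtomicRepo:
    """Toy repository in which a commit either fully applies or not at all."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is an empty tree

    def commit(self, changes):
        """Apply changes (path -> new content, or None to delete) atomically.

        Returns the new revision number. If any change fails, no new
        revision is recorded and the repository is left untouched.
        """
        tree = dict(self.revisions[-1])  # work on a private copy
        for path, content in changes.items():
            if content is None:
                if path not in tree:
                    raise KeyError(f"cannot delete missing path: {path}")
                del tree[path]
            else:
                tree[path] = content
        self.revisions.append(tree)  # publish only after everything succeeded
        return len(self.revisions) - 1


repo = AtomicRepo()
repo.commit({"a.txt": "one"})
try:
    # The second change is invalid, so the first must not be recorded either.
    repo.commit({"b.txt": "two", "missing.txt": None})
except KeyError as error:
    print("commit rejected:", error)
print(sorted(repo.revisions[-1]))  # ['a.txt']
```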

Basic Concepts

This chapter is a short, casual introduction to Subversion. If you're new to version control, this chapter is definitely for you. We begin with a discussion of general version control concepts, work our way into the specific ideas behind Subversion, and show some simple examples of Subversion in use.

Even though the examples in this chapter show people sharing collections of program source code, keep in mind that Subversion can manage any sort of file collection—it's not limited to helping computer programmers.

The Repository

Subversion is a centralized system for sharing information. At its core is a repository, which is a central store of data. The repository stores information in the form of a filesystem tree—a typical hierarchy of files and directories.

Any number of clients connect to the repository, and then read or write to these files. By writing data, a client makes the information available to others; by reading data, the client receives information from others.

When a client reads data from the repository, it normally sees only the latest version of the filesystem tree. But the client also has the ability to view previous states of the filesystem. For example, a client can ask historical questions like, “What did this directory contain last Wednesday?” or “Who was the last person to change this file, and what changes did they make?” These are the sorts of questions that are at the heart of any version control system: systems that are designed to record and track changes to data over time.
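The kind of historical question described above can be mimicked with a toy in-memory repository. This sketch keeps a full snapshot of the tree for every revision purely for simplicity, which should not be read as a description of how Subversion stores data. The class, its methods and the sample file names are all assumptions.

```python
class Repository:
    """Toy repository that remembers every revision of a flat file tree."""

    def __init__(self):
        # Revision 0 is the empty tree with no author.
        self.revisions = [{"author": None, "tree": {}}]

    def commit(self, author, changes):
        """Record a new revision with the given path -> content changes."""
        tree = dict(self.revisions[-1]["tree"])
        tree.update(changes)
        self.revisions.append({"author": author, "tree": tree})
        return len(self.revisions) - 1

    def read(self, path, revision=None):
        """Content of path at a revision (latest by default), or None."""
        if revision is None:
            revision = len(self.revisions) - 1
        return self.revisions[revision]["tree"].get(path)

    def last_changed_by(self, path):
        """(revision, author) of the most recent change to path, or None."""
        for number in range(len(self.revisions) - 1, 0, -1):
            current = self.revisions[number]["tree"].get(path)
            previous = self.revisions[number - 1]["tree"].get(path)
            if current != previous:
                return number, self.revisions[number]["author"]
        return None


repo = Repository()
repo.commit("harry", {"notes.txt": "draft"})
repo.commit("sally", {"notes.txt": "final"})
repo.commit("harry", {"todo.txt": "ship it"})

print(repo.read("notes.txt"))             # final
print(repo.read("notes.txt", 1))          # draft
print(repo.last_changed_by("notes.txt"))  # (2, 'sally')
```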

Versioning Models

The core mission of a version control system is to enable collaborative editing and sharing of data. But different systems use different strategies to achieve this.

The Problem of File-Sharing

All version control systems have to solve the same fundamental problem: how will the system allow users to share information, but prevent them from accidentally stepping on each other's feet? It's all too easy for users to accidentally overwrite each other's changes in the repository.

Suppose we have two co-workers, Harry and Sally. They each decide to edit the same repository file at the same time. If Harry saves his changes to the repository first, then it's possible that (a few moments later) Sally could accidentally overwrite them with her own new version of the file. While Harry's version of the file won't be lost forever (because the system remembers every change), any changes Harry made won't be present in Sally's newer version of the file, because she never saw Harry's changes to begin with. Harry's work is still effectively lost—or at least missing from the latest version of the file—and probably by accident. This is definitely a situation we want to avoid!

The Lock-Modify-Unlock Solution

Many version control systems use a lock-modify-unlock model to address this problem. In such a system, the repository allows only one person to change a file at a time. First Harry must “lock” the file before he can begin making changes to it. Locking a file is a lot like borrowing a book from the library; if Harry has locked a file, then Sally cannot make any changes to it. If she tries to lock the file, the repository will deny the request. All she can do is read the file, and wait for Harry to finish his changes and release his lock. After Harry unlocks the file, his turn is over, and now Sally can take her turn by locking and editing.

The problem with the lock-modify-unlock model is that it's a bit restrictive, and often becomes a roadblock for users:

Locking may cause administrative problems. Sometimes Harry will lock a file and then forget about it. Meanwhile, because Sally is still waiting to edit the file, her hands are tied. And then Harry goes on vacation. Now Sally has to get an administrator to release Harry's lock. The situation ends up causing a lot of unnecessary delay and wasted time.
Locking may cause unnecessary serialization. What if Harry is editing the beginning of a text file, and Sally simply wants to edit the end of the same file? These changes don't overlap at all. They could easily edit the file simultaneously, and no great harm would come, assuming the changes were properly merged together. There's no need for them to take turns in this situation.
Locking may create a false sense of security. Pretend that Harry locks and edits file A, while Sally simultaneously locks and edits file B. But suppose that A and B depend on one another, and the changes made to each are semantically incompatible. Suddenly A and B don't work together anymore. The locking system was powerless to prevent the problem—yet it somehow provided a false sense of security. It's easy for Harry and Sally to imagine that by locking files, each is beginning a safe, insulated task, and thus not bother discussing their incompatible changes early on.
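The lock-modify-unlock model and its administrative problem can be sketched with a small lock table. This is an illustrative toy with hypothetical names, not the API of any real version control system. The `force` flag models an administrator releasing a lock that someone else holds.

```python
class LockingRepository:
    """Toy lock table: at most one user may hold the lock on a path."""

    def __init__(self):
        self.locks = {}  # path -> user currently holding the lock

    def lock(self, path, user):
        """Try to lock path for user; return False if someone else holds it."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            return False
        self.locks[path] = user
        return True

    def unlock(self, path, user, force=False):
        """Release the lock if user holds it, or unconditionally with force."""
        if force or self.locks.get(path) == user:
            self.locks.pop(path, None)
            return True
        return False


repo = LockingRepository()
print(repo.lock("A", "harry"))    # True: Harry gets the lock
print(repo.lock("A", "sally"))    # False: Sally has to wait
print(repo.unlock("A", "sally"))  # False: she cannot release Harry's lock
print(repo.unlock("A", "admin", force=True))  # True: administrator steps in
print(repo.lock("A", "sally"))    # True: now Sally can take her turn
```

Note that nothing in this table knows about file B, which is why locking cannot catch the semantic incompatibility described in the third point.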

The Copy-Modify-Merge Solution

Subversion, CVS, and other version control systems use a copy-modify-merge model as an alternative to locking. In this model, each user's client contacts the project repository and creates a personal working copy—a local reflection of the repository's files and directories. Users then work in parallel, modifying their private copies. Finally, the private copies are merged together into a new, final version. The version control system often assists with the merging, but ultimately a human being is responsible for making it happen correctly.

Here's an example. Say that Harry and Sally each create working copies of the same project, copied from the repository. They work concurrently, and make changes to the same file A within their copies. Sally saves her changes to the repository first. When Harry attempts to save his changes later, the repository informs him that his file A is out-of-date. In other words, that file A in the repository has somehow changed since he last copied it. So Harry asks his client to merge any new changes from the repository into his working copy of file A. Chances are that Sally's changes don't overlap with his own; so once he has both sets of changes integrated, he saves his working copy back to the repository.
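The out-of-date check in this example can be sketched as follows. The toy repository holds a single file and rejects any commit from a working copy that was not based on the latest revision. The names `SingleFileRepo`, `OutOfDateError` and the dictionary layout of a working copy are assumptions made for illustration.

```python
class OutOfDateError(Exception):
    """Raised when a working copy is based on an older revision."""


class SingleFileRepo:
    """Toy repository holding one file and a revision counter."""

    def __init__(self, content):
        self.revision = 0
        self.content = content

    def checkout(self):
        """Create a working copy that remembers which revision it came from."""
        return {"base_revision": self.revision, "content": self.content}

    def commit(self, working_copy):
        """Accept the working copy only if it is based on the latest revision."""
        if working_copy["base_revision"] != self.revision:
            raise OutOfDateError("file changed in the repository; update first")
        self.revision += 1
        self.content = working_copy["content"]
        working_copy["base_revision"] = self.revision
        return self.revision


repo = SingleFileRepo("original text")
harry = repo.checkout()
sally = repo.checkout()

sally["content"] = "original text + Sally's change"
repo.commit(sally)  # Sally saves first

harry["content"] = "original text + Harry's change"
try:
    repo.commit(harry)
except OutOfDateError as error:
    print("Harry's commit was rejected:", error)
```

The rejection is what protects Sally's work. Harry now has to bring her changes into his working copy before he can commit.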

But what if Sally's changes do overlap with Harry's changes? What then? This situation is called a conflict, and it's usually not much of a problem. When Harry asks his client to merge the latest repository changes into his working copy, his copy of file A is somehow flagged as being in a state of conflict: he'll be able to see both sets of conflicting changes, and manually choose between them. Note that software can't automatically resolve conflicts; only humans are capable of understanding and making the necessary intelligent choices. Once Harry has manually resolved the overlapping changes—perhaps after a discussion with Sally—he can safely save the merged file back to the repository.
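The difference between a clean merge and a conflict can be shown with a deliberately simple line-by-line three-way merge. This sketch assumes all three versions have the same number of lines (no insertions or deletions), which real merge tools do not require. It is not Subversion's merge algorithm, and the function name is hypothetical. Conflicting lines are flagged rather than resolved, leaving the choice to a human.

```python
def three_way_merge(base, mine, theirs):
    """Merge two edited versions of a file against their common base.

    All three arguments are lists of lines of equal length (a simplifying
    assumption). Returns (merged_lines, conflict_indexes); for a conflicting
    line the merged result provisionally keeps "mine".
    """
    merged, conflicts = [], []
    for index, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)  # both agree, or only my side changed the line
        elif m == b:
            merged.append(t)  # only their side changed the line
        else:
            merged.append(m)  # both changed it differently: flag a conflict
            conflicts.append(index)
    return merged, conflicts


base = ["intro", "body", "outro"]
sally = ["INTRO", "body", "outro"]   # Sally edited the first line
harry = ["intro", "body", "OUTRO"]   # Harry edited the last line

print(three_way_merge(base, harry, sally))
# (['INTRO', 'body', 'OUTRO'], [])

harry_overlap = ["Intro!", "body", "outro"]  # now both touch the first line
print(three_way_merge(base, harry_overlap, sally))
# (['Intro!', 'body', 'outro'], [0])
```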

The copy-modify-merge model may sound a bit chaotic, but in practice, it runs extremely smoothly. Users can work in parallel, never waiting for one another. When they work on the same files, it turns out that most of their concurrent changes don't overlap at all; conflicts are infrequent. And the amount of time it takes to resolve conflicts is far less than the time lost by a locking system.

In the end, it all comes down to one critical factor: user communication. When users communicate poorly, both syntactic and semantic conflicts increase. No system can force users to communicate perfectly, and no system can detect semantic conflicts. So there's no point in being lulled into a false promise that a locking system will somehow prevent conflicts; in practice, locking seems to inhibit productivity more than anything else.

References


Dec 26, 2009

What is Oracle Apex?

APEX’s origin

Oracle Apex Architecture

APEX components

Oracle APEX Pros & Cons

Recommendations

Reference:
