FreedomLy's blog.

# Generating Word Clouds with Python

2017/03/25

## How to Generate a Word Cloud in Python

### Installing WordCloud

#### Installing with pip

pip install wordcloud


#### Installing from a downloaded WHL package

pip install yourfilepath\wordcloud-1.3.1-cp36-cp36m-win_amd64.whl


### Using WordCloud

from wordcloud import WordCloud
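A minimal sketch of a typical workflow, assuming the package is installed as described above. The helper name `make_cloud`, the `SETTINGS` values, and the output file name are made up for illustration; `generate` and `to_file` are methods of the `WordCloud` class.

```python
# Hypothetical settings chosen for illustration; every key is a
# WordCloud constructor parameter described in the Parameters section.
SETTINGS = {
    "width": 800,
    "height": 400,
    "background_color": "white",
    "max_words": 100,
}


def make_cloud(text, out_path="cloud.png", **overrides):
    """Render `text` as a word cloud and save it to `out_path`."""
    # Imported lazily so this module still loads before wordcloud is installed.
    from wordcloud import WordCloud

    options = {**SETTINGS, **overrides}
    cloud = WordCloud(**options).generate(text)  # tokenize, count, lay out
    cloud.to_file(out_path)                      # write the image to disk
    return cloud
```

Calling `make_cloud(open("article.txt", encoding="utf-8").read())` would write `cloud.png` to the current directory; `article.txt` is a placeholder file name.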


#### Parameters

font_path: Path to the font that will be used (OTF or TTF). Defaults to the DroidSansMono path on a Linux machine. If you are on another OS or don’t have this font, you need to adjust this path.

width: Width of the canvas.

height: Height of the canvas.

prefer_horizontal: The ratio of times to try horizontal fitting as opposed to vertical. If prefer_horizontal < 1, the algorithm will try rotating the word if it doesn’t fit. (There is currently no built-in way to get only vertical words.)

mask: If not None, gives a binary mask on where to draw words. If mask is not None, width and height will be ignored and the shape of mask will be used instead. All white (#FF or #FFFFFF) entries will be considered “masked out” while other entries will be free to draw on. [This changed in the most recent version!]
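To make the mask convention concrete, here is a small sketch that builds a circular mask in plain Python: 255 (white) marks pixels that are masked out and 0 marks pixels that are free to draw on. The function name `circle_mask` is hypothetical. The nested list would still have to be converted to a NumPy array before being passed as `mask`.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside a centered circle, 255 outside."""
    center = (size - 1) / 2
    radius_sq = (size / 2) ** 2
    return [
        [
            # 0 = free to draw on, 255 (white) = masked out
            0 if (x - center) ** 2 + (y - center) ** 2 <= radius_sq else 255
            for x in range(size)
        ]
        for y in range(size)
    ]


grid = circle_mask(9)

# Assumed usage (requires numpy and wordcloud, so it is left commented out):
# import numpy as np
# from wordcloud import WordCloud
# cloud = WordCloud(mask=np.array(circle_mask(400), dtype="uint8"))
```

With this mask the words are confined to the circle, and `width` and `height` are ignored in favor of the mask's shape.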

scale: Scaling between computation and drawing. For large word-cloud images, using scale instead of a larger canvas size is significantly faster, but might lead to a coarser fit for the words.

min_font_size: Smallest font size to use. Will stop when there is no more room in this size.

font_step: Step size for the font. font_step > 1 might speed up computation but give a worse fit.

max_words: The maximum number of words.

stopwords: The words that will be eliminated. If None, the built-in STOPWORDS list will be used.
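The effect of the stop word list can be sketched in a few lines. This is a simplified illustration, not the library's code: the mini list `STOP` and the helper `drop_stopwords` are hypothetical, and the case-insensitive comparison is an assumption about how the filtering behaves.

```python
# Hypothetical mini stop list; the real default is wordcloud.STOPWORDS.
STOP = {"the", "a", "of", "is"}


def drop_stopwords(tokens, stopwords=STOP):
    """Keep only tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stopwords]


kept = drop_stopwords(["The", "cloud", "of", "words"])
```

To extend the built-in list rather than replace it, one common pattern is `stopwords = set(STOPWORDS)` followed by `stopwords.add("said")`, then passing the result as `stopwords=`.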

background_color: Background color for the word cloud image.

max_font_size: Maximum font size for the largest word. If None, the height of the image is used.

mode: A transparent background will be generated when mode is “RGBA” and background_color is None.

relative_scaling: Importance of relative word frequencies for font size. With relative_scaling=0, only word ranks are considered. With relative_scaling=1, a word that is twice as frequent will have twice the size. If you want to consider the word frequencies and not only their rank, a relative_scaling around .5 often looks good.
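The rule above can be written as a small formula. The sketch below is modeled on my reading of the library source and should be treated as an assumption, not as the official implementation; the function name `next_font_size` is made up. Each word's starting size is derived from the previous (more frequent) word's size.

```python
def next_font_size(prev_size, freq, prev_freq, relative_scaling):
    """Starting font size for a word, given the previous word's size."""
    rs = relative_scaling
    if rs == 0:
        # Rank only: the frequency ratio is ignored; in this mode the size
        # would shrink only when the word no longer fits on the canvas.
        return prev_size
    # Blend between the frequency ratio (rs=1) and no change at all (rs=0).
    return int(round((rs * (freq / float(prev_freq)) + (1 - rs)) * prev_size))


half_as_frequent = next_font_size(100, 5, 10, 1.0)
blended = next_font_size(100, 5, 10, 0.5)
```

With relative_scaling=1 a word that is half as frequent starts at half the size; with 0.5 the same word starts at three quarters of it.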

color_func: Callable with parameters word, font_size, position, orientation, font_path, random_state that returns a PIL color for each word. Overwrites “colormap”. See colormap for specifying a matplotlib colormap instead.
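A sketch of a callable with that signature, assuming that random_state is a `random.Random` instance and that an `hsl(...)` string is an acceptable PIL color. The name `grey_color_func` and the lightness range are illustrative choices.

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    font_path=None, random_state=None):
    """Return a random shade of light grey as a PIL color string."""
    # Fall back to a fresh generator when no random_state is supplied.
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(60, 100)  # 60% to 100% lightness
    return "hsl(0, 0%, {}%)".format(lightness)


sample = grey_color_func("python", 40, (0, 0), None,
                         random_state=random.Random(0))
```

It would be passed as `WordCloud(color_func=grey_color_func)`. Because it draws from the supplied random_state, the colors are reproducible when that generator is seeded.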

regexp: Regular expression to split the input text into tokens in process_text. If None is specified, r"\w[\w']+" is used.
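The default pattern can be tried directly with the standard `re` module. The sample sentence is made up. Note that the pattern requires at least two characters, so single-letter words are dropped, and a hyphen splits a word in two.

```python
import re

DEFAULT_PATTERN = r"\w[\w']+"  # the default quoted above

text = "It's a word-cloud of Python's words"
tokens = re.findall(DEFAULT_PATTERN, text)
print(tokens)
# ["It's", 'word', 'cloud', 'of', "Python's", 'words']
```

A different pattern could be passed via `regexp=` if, for example, hyphenated words should stay together.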

collocations: Whether to include collocations (bigrams) of two words.

colormap: Matplotlib colormap to randomly draw colors from for each word. Ignored if “color_func” is specified.

normalize_plurals: Whether to remove trailing ‘s’ from words. If True and a word appears with and without a trailing ‘s’, the one with trailing ‘s’ is removed and its counts are added to the version without trailing ‘s’, unless the word ends with ‘ss’.
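The merging rule can be illustrated with a simplified sketch. It is not the library's implementation (which, as far as I know, also deals with letter case); the function name `merge_plurals` and the sample counts are hypothetical.

```python
def merge_plurals(counts):
    """Fold 'words' into 'word' when both occur, skipping words ending in 'ss'."""
    merged = dict(counts)
    for word, n in counts.items():
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            if singular in counts:
                merged[singular] += n  # add the plural's count to the singular
                del merged[word]       # and drop the plural entry
    return merged


result = merge_plurals({"cat": 2, "cats": 3, "glass": 4, "dogs": 1})
```

Here "cats" is folded into "cat", "glass" is left alone because it ends with "ss", and "dogs" is kept because "dog" never appears on its own.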

End~

[1]: Image source: https://github.com/amueller/word_cloud